A Study on L2-Loss (Squared Hinge-Loss) Multi-Class SVM


Ching-Pei Lee and Chih-Jen Lin
Department of Computer Science, National Taiwan University, Taipei 10617, Taiwan

Keywords: Support vector machines, Multi-class classification, Squared hinge loss, L2 loss.

Abstract

Crammer and Singer's method is one of the most popular multi-class SVMs. It considers L1 loss (hinge loss) in a complicated optimization problem. In SVM, squared hinge loss (L2 loss) is a common alternative to L1 loss, but surprisingly we have not seen any paper studying details of Crammer and Singer's method using L2 loss. In this note, we conduct a thorough investigation. We show that the derivation is not trivial and has some subtle differences from the L1 case. Details provided in this work can be a useful reference for those who intend to use Crammer and Singer's method with L2 loss. They do not need a tedious process to derive everything by themselves. Further, we present some new results/discussion for both L1- and L2-loss formulations.

1 Introduction

Support Vector Machines (SVM) (Boser et al., 1992; Cortes and Vapnik, 1995) were originally designed for binary classification. In recent years, many approaches have been proposed to extend SVM to handle multi-class classification problems; see, for example, a detailed comparison in Hsu and Lin (2002). Among these works, the method proposed by Crammer and Singer (2001, 2002) has been widely used. They extend the optimization problem of L1-loss (hinge-loss) SVM to a multi-class formulation. In binary classification, squared hinge loss (L2 loss) is a common alternative to L1 loss, but surprisingly we have not found any paper studying details of Crammer and Singer's method with L2 loss. This is inconvenient because, for example, we do not know what the dual problem is.¹ Although the dual problem of two-class SVM using L2 loss is well known, it cannot be directly extended to the multi-class case. In fact, the derivation is non-trivial and has some subtle differences from the L1 case. Also, the algorithm to solve the dual problem (for both kernel and linear situations) must be modified. We think there is a need to give all the details for future reference.

¹ Indeed, the dual problem has been provided in some works of structured SVM, where Crammer and Singer's multi-class SVM is a special case. However, their form can be simplified for multi-class SVM; see a detailed discussion in Section 2.2.

Then those who intend to use Crammer and Singer's method with L2 loss do not need a tedious procedure to derive everything by themselves. In addition to the main contribution of investigating L2-loss multi-class SVM, we present some new results for both L1- and L2-loss cases. First, we discuss the differences between the dual problem of Crammer and Singer's multi-class SVM and that of structured SVM, although we focus more on the comparison of the L2-loss formulation. Second, we give a simpler derivation for solving the sub-problem in the decomposition method to minimize the dual problem.

This paper is organized as follows. In Section 2, we introduce L2-loss multi-class SVM and derive its dual problem. We discuss the connection to structured SVM, which is a generalization of Crammer and Singer's multi-class SVM. Then in Section 3, we extend a decomposition method to solve the optimization problem. In particular, we obtain the sub-problem to be solved at each iteration. A procedure to find the solution of the sub-problem is given in Section 4. Our derivation and proof are simpler than Crammer and Singer's. In Section 5, we discuss some implementation issues and extensions. Experiments in Section 6 compare the performance of L1-loss and L2-loss multi-class SVM using both linear and nonlinear kernels. Section 7 then concludes this paper.

2 Formulation

Given a set of instance-label pairs (x_i, y_i), x_i ∈ R^n, y_i ∈ {1,...,k}, i = 1,...,l, Crammer and Singer (2002) proposed a multi-class SVM approach by solving the following optimization problem.

  min_{w_1,...,w_k, ξ}   (1/2) Σ_{m=1}^k w_m^T w_m + C Σ_{i=1}^l ξ_i
  subject to   w_{y_i}^T x_i − w_m^T x_i ≥ e_i^m − ξ_i,   i = 1,...,l, m = 1,...,k,   (1)

where

  e_i^m = 0 if y_i = m,  and  e_i^m = 1 if y_i ≠ m.   (2)

Note that if y_i = m, the constraint is the same as ξ_i ≥ 0. The decision function for predicting the label of an instance x is

  arg max_{m=1,...,k} w_m^T x.
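The decision function above is simply a per-class linear score followed by an arg max. The following minimal sketch of the prediction step is an illustration only; the array layout (rows of W holding w_1,...,w_k) is an assumption made here and not part of the paper.

    import numpy as np

    def predict(W, X):
        """Predict labels by arg max_m w_m^T x.

        W : array of shape (k, n); row m-1 holds w_m.
        X : array of shape (l, n); one instance per row.
        Returns predicted labels in {1, ..., k}.
        """
        scores = X @ W.T                 # scores[i, m-1] = w_m^T x_i
        return np.argmax(scores, axis=1) + 1

Training produces the rows of W, for example via Equation (7) below after solving the dual problem.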

The dual problem of (1) is

  min_α   (1/2) Σ_{i=1}^l Σ_{j=1}^l K_{i,j} Σ_{m=1}^k α_i^m α_j^m + Σ_{i=1}^l Σ_{m=1}^k α_i^m e_i^m
  subject to   Σ_{m=1}^k α_i^m = 0,   i = 1,...,l,   (3)
               α_i^m ≤ 0,   i = 1,...,l, m = 1,...,k, m ≠ y_i,   (4)
               α_i^{y_i} ≤ C,   i = 1,...,l,   (5)

where

  α = [α_1^1, ..., α_1^k, ..., α_l^1, ..., α_l^k]^T  and  K_{i,j} = x_i^T x_j.   (6)

Constraints (4) and (5) are often combined as

  α_i^m ≤ C_{y_i}^m,  where  C_{y_i}^m = 0 if y_i ≠ m,  and  C_{y_i}^m = C if y_i = m.

We separate them in order to compare with the dual problem of using L2 loss. After solving (3), one can compute the optimal w_m by

  w_m = Σ_{i=1}^l α_i^m x_i,   m = 1,...,k.   (7)

In this paper, we extend problem (1) to use L2 loss. By changing the loss term from Σ_i ξ_i to Σ_i ξ_i², the primal problem becomes

  min_{w_1,...,w_k, ξ}   (1/2) Σ_{m=1}^k w_m^T w_m + C Σ_{i=1}^l ξ_i²
  subject to   w_{y_i}^T x_i − w_m^T x_i ≥ e_i^m − ξ_i,   i = 1,...,l, m = 1,...,k.   (8)

The constraint w_{y_i}^T x_i − w_m^T x_i ≥ e_i^m − ξ_i when m = y_i can be removed because for L2 loss, ξ_i ≥ 0 holds at an optimum without this constraint. We keep it here in order to compare with the formulation of using L1 loss. We will derive the following dual problem.

  min_α   f(α)
  subject to   Σ_{m=1}^k α_i^m = 0,   i = 1,...,l,   (9)
               α_i^m ≤ 0,   i = 1,...,l, m = 1,...,k, m ≠ y_i,

where

  f(α) = (1/2) Σ_{i=1}^l Σ_{j=1}^l K_{i,j} Σ_{m=1}^k α_i^m α_j^m + Σ_{i=1}^l Σ_{m=1}^k α_i^m e_i^m + Σ_{i=1}^l (α_i^{y_i})² / (4C).
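To make the relation between the dual variables and the primal solution concrete, the sketch below evaluates (7) and the L2-loss dual objective f(α) of (9). The dense arrays alpha (l × k) and K (l × l) and the 0-based labels are assumptions introduced only for illustration.

    import numpy as np

    def primal_w(alpha, X):
        """w_m = sum_i alpha_i^m x_i, Equation (7); returns an array of shape (k, n)."""
        return alpha.T @ X

    def dual_objective_L2(alpha, K, y, C):
        """f(alpha) of problem (9).

        alpha : (l, k) array, alpha[i, m] = alpha_i^m.
        K     : (l, l) kernel matrix, K[i, j] = x_i^T x_j.
        y     : (l,) array of labels in {0, ..., k-1} (the paper uses 1, ..., k).
        """
        l, k = alpha.shape
        quad = 0.5 * np.sum(K * (alpha @ alpha.T))   # (1/2) sum_{i,j} K_{i,j} sum_m alpha_i^m alpha_j^m
        e = np.ones((l, k))
        e[np.arange(l), y] = 0.0                     # e_i^m = 0 iff y_i = m
        linear = np.sum(alpha * e)                   # sum_{i,m} alpha_i^m e_i^m
        quad_y = np.sum(alpha[np.arange(l), y] ** 2) / (4.0 * C)
        return quad + linear + quad_y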

Problem (9) is similar to (3), but it possesses an additional quadratic term in the objective function. Further, the constraint on α_i^{y_i} is different. In (5), α_i^{y_i} ≤ C, but in (9), α_i^{y_i} is unconstrained.² We discuss two methods to derive the dual problem. The first is from a direct calculation, while the second follows from the derivation of structured SVM.

2.1 A Direct Calculation to Obtain the Dual Problem

The Lagrange function of (8) is

  L(w_1,...,w_k, ξ, α̂) = (1/2) Σ_{m=1}^k w_m^T w_m + C Σ_{i=1}^l ξ_i² − Σ_{i=1}^l Σ_{m=1}^k α̂_i^m (w_{y_i}^T x_i − w_m^T x_i − e_i^m + ξ_i),

where α̂_i^m ≥ 0, m = 1,...,k, i = 1,...,l, are Lagrange multipliers. The Lagrange dual problem is

  max_{α̂: α̂_i^m ≥ 0, ∀i,m}  ( inf_{w_1,...,w_k, ξ} L(w_1,...,w_k, ξ, α̂) ).

To minimize L under fixed α̂, we rewrite the following term in the Lagrange function

  Σ_{i=1}^l Σ_{m=1}^k α̂_i^m w_{y_i}^T x_i = Σ_{m=1}^k Σ_{i: y_i = m} (Σ_{s=1}^k α̂_i^s) w_m^T x_i = Σ_{m=1}^k w_m^T Σ_{i=1}^l (1 − e_i^m) (Σ_{s=1}^k α̂_i^s) x_i,   (10)

and have

  ∇_{w_m} L = 0  ⇒  w_m − Σ_{i=1}^l ((1 − e_i^m) Σ_{s=1}^k α̂_i^s − α̂_i^m) x_i = 0,   m = 1,...,k,   (11)
  ∂L/∂ξ_i = 2Cξ_i − Σ_{m=1}^k α̂_i^m = 0  ⇒  ξ_i = Σ_{m=1}^k α̂_i^m / (2C),   i = 1,...,l.   (12)

We simplify (11) by defining

  α_i^m ≡ (1 − e_i^m) Σ_{s=1}^k α̂_i^s − α̂_i^m,   i = 1,...,l, m = 1,...,k.   (13)

This definition is equivalent to

  α_i^m = −α̂_i^m,   m ≠ y_i,   (14)
  α_i^{y_i} = Σ_{m: m≠y_i} α̂_i^m = −Σ_{m: m≠y_i} α_i^m.   (15)

² Indeed, using Σ_{m=1}^k α_i^m = 0 and α_i^m ≤ 0, m ≠ y_i, we have α_i^{y_i} ≥ 0 for both dual problems of the L1 and L2 cases.

Therefore, we can rewrite the solution of minimizing L under fixed α̂ as

  w_m = Σ_{i=1}^l α_i^m x_i,   m = 1,...,k,   (16)
  ξ_i = (α̂_i^{y_i} + α_i^{y_i}) / (2C),   i = 1,...,l.   (17)

By (2), (10), (11), and (14)-(17), the Lagrange dual function is

  L(w_1,...,w_k, ξ, α̂)
  = (1/2) Σ_{m=1}^k w_m^T w_m − C Σ_{i=1}^l ξ_i² − Σ_{i=1}^l Σ_{m=1}^k α̂_i^m (w_{y_i}^T x_i − w_m^T x_i) + Σ_{i=1}^l Σ_{m=1}^k α̂_i^m e_i^m   (18)
  = (1/2) Σ_{m=1}^k w_m^T w_m − Σ_{m=1}^k w_m^T Σ_{i=1}^l ((1 − e_i^m) Σ_{s=1}^k α̂_i^s − α̂_i^m) x_i − C Σ_{i=1}^l ξ_i² + Σ_{i=1}^l Σ_{m=1}^k α̂_i^m e_i^m
  = −(1/2) Σ_{m=1}^k w_m^T w_m − C Σ_{i=1}^l ξ_i² + Σ_{i=1}^l Σ_{m=1}^k α̂_i^m e_i^m
  = −(1/2) Σ_{m=1}^k || Σ_{i=1}^l α_i^m x_i ||² − Σ_{i=1}^l (α̂_i^{y_i} + α_i^{y_i})² / (4C) − Σ_{i=1}^l Σ_{m=1}^k α_i^m e_i^m
  = −(1/2) Σ_{i=1}^l Σ_{j=1}^l K_{i,j} Σ_{m=1}^k α_i^m α_j^m − Σ_{i=1}^l Σ_{m=1}^k α_i^m e_i^m − Σ_{i=1}^l (α̂_i^{y_i} + α_i^{y_i})² / (4C),   (19)

where w_m and ξ_i are the minimizers in (16)-(17). Because α̂_i^m, m ≠ y_i, do not explicitly appear in (19) after applying (14) and (15), the dual problem is

  min_{α, α̂}   (1/2) Σ_{i=1}^l Σ_{j=1}^l K_{i,j} Σ_{m=1}^k α_i^m α_j^m + Σ_{i=1}^l Σ_{m=1}^k α_i^m e_i^m + Σ_{i=1}^l (α̂_i^{y_i} + α_i^{y_i})² / (4C)
  subject to   α̂_i^{y_i} ≥ 0,   i = 1,...,l,
               Σ_{m=1}^k α_i^m = 0,   i = 1,...,l,   (20)
               α_i^m ≤ 0,   i = 1,...,l, m = 1,...,k, m ≠ y_i.   (21)

Because (20) and (21) imply α_i^{y_i} ≥ 0 and α̂_i^{y_i} appears only in the term (α̂_i^{y_i} + α_i^{y_i})², the optimal α̂_i^{y_i} must be zero. After removing α̂_i^{y_i}, the derivation of the dual problem is complete.

We discuss the difference from L1 loss. If L1 loss is used, (12) becomes

  Σ_{m=1}^k α̂_i^m = C,   (22)

and the term C Σ_{i=1}^l ξ_i² in (18) disappears. Equation (22) and the fact α̂_i^{y_i} ≥ 0 lead to the constraint (5) as follows.

  α_i^{y_i} = Σ_{m: m≠y_i} α̂_i^m = C − α̂_i^{y_i} ≤ C.

For L2 loss, without the condition in (22), α_i^{y_i} is unconstrained.

2.2 Using Structured SVM Formulation to Obtain the Dual Problem

It is well known that Crammer and Singer's multi-class SVM is a special case of structured SVM (Tsochantaridis et al., 2005). By defining

  w ≡ [w_1; ...; w_k] ∈ R^{kn×1}  and  δ(i,m) ∈ R^{kn×1},

with

  δ(i,m) ≡ [0; ...; 0; x_i; 0; ...; 0; −x_i; 0; ...; 0]  (x_i in the y_i-th block and −x_i in the m-th block)  if y_i ≠ m,  and  δ(i,m) ≡ 0  if y_i = m,

problem (8) can be written as

  min_{w, ξ}   (1/2) ||w||² + C Σ_{i=1}^l ξ_i²
  subject to   w^T δ(i,m) ≥ e_i^m − ξ_i,   i = 1,...,l, m = 1,...,k.   (23)
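A small sketch of the stacked representation may help: it builds the kn-dimensional vector δ(i, m) from x_i, y_i, and m, so that w^T δ(i, m) equals w_{y_i}^T x_i − w_m^T x_i. The function and array names are ours for illustration, and class indices are taken as 0-based.

    import numpy as np

    def stack_w(W):
        """Stack w_1, ..., w_k (rows of the (k, n) array W) into one kn-vector w."""
        return W.reshape(-1)

    def delta(x_i, y_i, m, k):
        """delta(i, m): x_i in the y_i-th block, -x_i in the m-th block, zero elsewhere;
        the zero vector when m == y_i."""
        n = x_i.shape[0]
        d = np.zeros(k * n)
        if m != y_i:
            d[y_i * n:(y_i + 1) * n] = x_i
            d[m * n:(m + 1) * n] = -x_i
        return d

    # With these definitions, stack_w(W) @ delta(x_i, y_i, m, k)
    # equals W[y_i] @ x_i - W[m] @ x_i, the left-hand side of (23).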

This problem is in a similar form to L2-loss binary SVM, so the derivation of the dual problem is straightforward. Following Tsochantaridis et al. (2005), the dual problem is³

  min_{α̂}   (1/2) Σ_{i=1}^l Σ_{j=1}^l Σ_{m=1}^k Σ_{s=1}^k δ(i,m)^T δ(j,s) α̂_i^m α̂_j^s − Σ_{i=1}^l Σ_{m=1}^k α̂_i^m e_i^m + Σ_{i=1}^l (Σ_{m=1}^k α̂_i^m)² / (4C)
  subject to   α̂_i^m ≥ 0,   i = 1,...,l, m = 1,...,k.   (24)

Also, at an optimal solution, we have

  w = Σ_{i=1}^l Σ_{m=1}^k α̂_i^m δ(i,m)  and  ξ_i = Σ_{m=1}^k α̂_i^m / (2C).   (25)

Problem (24) seems to be very different from problem (9) obtained in Section 2.1. In fact, problem (24) is an intermediate result in our derivation. A careful check shows

1. α̂ is the same as the Lagrange multiplier used in Section 2.1.
2. w in (25) is the same as that in (7); see Equation (11).

In Section 2.1, we introduce a new variable α and simplify the two terms

  Σ_{i=1}^l Σ_{j=1}^l Σ_{m=1}^k Σ_{s=1}^k δ(i,m)^T δ(j,s) α̂_i^m α̂_j^s   and   Σ_{i=1}^l (Σ_{m=1}^k α̂_i^m)² / (4C)

to

  Σ_{i=1}^l Σ_{j=1}^l K_{i,j} Σ_{m=1}^k α_i^m α_j^m   and   Σ_{i=1}^l (α_i^{y_i})² / (4C),

respectively. An advantage of problem (9) is that K_{i,j} = x_i^T x_j explicitly appears in the objective function. In contrast, δ(i,m)^T δ(j,s) does not reveal details of the inner product between instances. However, a caveat of (9) is that it contains some linear constraints. An interesting question is whether the simplification from (24) to (9) allows us to apply a simpler or more efficient optimization algorithm. This issue already occurs for using L1 loss because we can either solve problem (3) or a form similar to (24). However, the dual problem of L1-loss structured SVM contains a linear constraint, but problem (24) does not.⁴ Therefore, for the L1 case, it is easy to see that the simplified form (3) should be used. However, for L2 loss, problem (24) possesses an advantage of being a bound-constrained problem. We will give some discussion about solving (9) or (24) in Section 5.5. In all remaining places we focus on problem (9) because existing implementations for the L1-loss formulation all solve the corresponding problem (3).

³ Problem (24) is slightly different from that in Tsochantaridis et al. (2005) because they remove the constraints ξ_i ≥ 0 by setting m ≠ y_i in (23).
⁴ See Proposition 5 in Tsochantaridis et al. (2005).

3 Decomposition Method and Sub-problem

Decomposition methods are currently the major approach to solve the dual problem (3) of the L1 case (Crammer and Singer, 2002; Keerthi et al., 2008). At each iteration, the k variables α_i^1, ..., α_i^k associated with an instance x_i are selected for updating, while other variables are fixed. For (3), the following sub-problem is solved.

  min_{α_i^1,...,α_i^k}   Σ_{m=1}^k ( (1/2) A (α_i^m)² + B_m α_i^m )
  subject to   Σ_{m=1}^k α_i^m = 0,   (26)
               α_i^m ≤ C_{y_i}^m,   m = 1,...,k,

where

  A = K_{i,i}  and  B_m = Σ_{j=1}^l K_{j,i} ᾱ_j^m + e_i^m − A ᾱ_i^m.   (27)

In (27), ᾱ is the solution obtained in the previous iteration. We defer the discussion on the selection of the index i to Section 5. For problem (9), we show that the sub-problem is:

  min_{α_i^1,...,α_i^k}   Σ_{m=1}^k ( (1/2) A (α_i^m)² + B_m α_i^m ) + (α_i^{y_i})² / (4C)
  subject to   Σ_{m=1}^k α_i^m = 0,   (28)
               α_i^m ≤ 0,   m = 1,...,k, m ≠ y_i,

where A and B_m are the same as in (27). The derivation of (28) is as follows. Because all elements except α_i^1, ..., α_i^k are fixed, the objective function of (9) becomes

  (1/2) ( K_{i,i} Σ_{m=1}^k (α_i^m)² + 2 Σ_{j: j≠i} K_{i,j} Σ_{m=1}^k ᾱ_j^m α_i^m ) + Σ_{m=1}^k α_i^m e_i^m + (α_i^{y_i})² / (4C) + constants
  = Σ_{m=1}^k ( (1/2) K_{i,i} (α_i^m)² + ( Σ_{j=1}^l K_{j,i} ᾱ_j^m + e_i^m − K_{i,i} ᾱ_i^m ) α_i^m ) + (α_i^{y_i})² / (4C) + constants.   (29)

Equation (29) then leads to the objective function of (28), while the constraints are directly obtained from (9). Note that

  B_m = w_m^T x_i + e_i^m − K_{i,i} ᾱ_i^m   (30)

if w_m = Σ_{i=1}^l ᾱ_i^m x_i, m = 1,...,k, are maintained.⁵

⁵ See details of solving linear Crammer and Singer's multi-class SVM in Keerthi et al. (2008).
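For concreteness, the sketch below assembles the sub-problem data A and (B_1, ..., B_k) of (27) for a selected index i in the kernel setting. The array names (K, alpha_bar) and the 0-based labels are assumptions for illustration only.

    import numpy as np

    def subproblem_coefficients(i, K, alpha_bar, y):
        """Return A and the vector (B_1, ..., B_k) of (27) for the selected instance i.

        K         : (l, l) kernel matrix.
        alpha_bar : (l, k) array holding the current dual solution.
        y         : (l,) array of 0-based labels.
        """
        l, k = alpha_bar.shape
        A = K[i, i]
        e_i = np.ones(k)
        e_i[y[i]] = 0.0                                      # e_i^m = 0 iff m = y_i
        # B_m = sum_j K_{j,i} abar_j^m + e_i^m - A * abar_i^m
        B = K[:, i] @ alpha_bar + e_i - A * alpha_bar[i]
        return A, B

In the linear case, B_m would instead be obtained from the maintained vectors w_m via (30), which avoids the O(lk) cost of the matrix-vector product above.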

4 Solving the Sub-problem

We discuss how to solve the sub-problem when A > 0. If A = 0, then x_i = 0. Thus this instance gives a constant value ξ_i = 1 to the primal objective function (8), and the values of α_i^m, m = 1,...,k, have no effect on w_m defined in (16), so we can skip solving the sub-problem.

We follow the approach by Crammer and Singer to solve the sub-problem, although there are some interesting differences. Their method first computes

  D_m = B_m + A C_{y_i}^m,   m = 1,...,k.

Then it starts with a set Φ = ∅ and sequentially adds one index m to Φ by the decreasing order of D_m until the following inequality is satisfied.

  β = ( −AC + Σ_{m∈Φ} D_m ) / |Φ|  ≥  max_{m∉Φ} D_m.   (31)

The optimal solution of (26) is computed by:

  α_i^m = min( C_{y_i}^m, (β − B_m) / A ),   m = 1,...,k.   (32)

Crammer and Singer gave a lengthy proof to show the correctness of this method. Our contribution here is to derive the algorithm and prove its correctness by easily analyzing the KKT optimality condition.

We now derive an algorithm for solving (28). Let us define

  A_y ≡ A + 1/(2C).

The KKT conditions of (28) indicate that there are scalars β, ρ_m, m = 1,...,k, such that

  Σ_{m=1}^k α_i^m = 0,   (33)
  α_i^m ≤ 0,   m ≠ y_i,   (34)
  ρ_m α_i^m = 0,  ρ_m ≥ 0,   m ≠ y_i,   (35)
  A α_i^m + B_m − β = −ρ_m,   m ≠ y_i,   (36)
  A_y α_i^{y_i} + B_{y_i} − β = 0.   (37)

Using (34), Equations (35) and (36) are equivalent to

  A α_i^m + B_m − β = 0,   if α_i^m < 0, m ≠ y_i,   (38)
  A α_i^m + B_m − β = B_m − β ≤ 0,   if α_i^m = 0, m ≠ y_i.   (39)

Now the KKT conditions become (33)-(34), (37), and (38)-(39). If β is known, we prove that

  α_i^m = min( 0, (β − B_m)/A )  if m ≠ y_i,   and   α_i^{y_i} = (β − B_{y_i}) / A_y,   (40)

satisfies all KKT conditions except (33). Clearly, the way to get α_i^m in (40) ensures α_i^m ≤ 0, m ≠ y_i, so (34) holds. From (40), when β < B_m, we have α_i^m < 0 and β − B_m = A α_i^m. Thus, (38) is satisfied. Otherwise, β ≥ B_m and α_i^m = 0 satisfy (39). Also notice that α_i^{y_i} is directly obtained from (37). The remaining task is how to find β such that (33) holds. From (33), (37), and (38) we obtain

  A_y Σ_{m: α_i^m < 0} (β − B_m) + A (β − B_{y_i}) = 0.

Hence,

  β = ( A B_{y_i} + A_y Σ_{m: α_i^m < 0} B_m ) / ( A + A_y |{m : α_i^m < 0}| )
    = ( (A/A_y) B_{y_i} + Σ_{m: α_i^m < 0} B_m ) / ( A/A_y + |{m : α_i^m < 0}| ).   (41)

Combining (41) and (39), we begin with a set Φ = ∅, and then sequentially add one index m to Φ by the decreasing order of B_m, m = 1,...,k, m ≠ y_i, until

  h = ( (A/A_y) B_{y_i} + Σ_{m∈Φ} B_m ) / ( A/A_y + |Φ| )  ≥  max_{m∉Φ} B_m.   (42)

Let β = h when (42) is satisfied. Algorithm 1 lists the details for solving the sub-problem (28). To prove (33), it is sufficient to show that β and α_i^m, ∀m, obtained by Algorithm 1 satisfy (41). This is equivalent to showing that the set Φ of indices included in step 5 of Algorithm 1 satisfies

  Φ = {m : α_i^m < 0}.

From (40), we prove the following equivalent result.

  β < B_m, ∀m ∈ Φ,   and   β ≥ B_m, ∀m ∉ Φ.   (43)

The second inequality immediately follows from (42). For the first, assume t is the last element added to Φ. When t is considered, (42) is not satisfied yet, so

  ( (A/A_y) B_{y_i} + Σ_{m∈Φ\{t}} B_m ) / ( A/A_y + |Φ| − 1 )  <  B_t.   (44)

ALGORITHM 1: Solving the sub-problem
1. Given A, A_y, and B = {B_1, ..., B_k}.
2. D ← B.
3. Swap D_1 and D_{y_i}, then sort D \ {D_1} in decreasing order.
4. r ← 2, β ← D_1 · A/A_y.
5. While r ≤ k and β / (r − 2 + A/A_y) < D_r
   5.1. β ← β + D_r.
   5.2. r ← r + 1.
6. β ← β / (r − 2 + A/A_y).
7. α_i^m ← min( 0, (β − B_m)/A ), ∀m ≠ y_i.
8. α_i^{y_i} ← (β − B_{y_i}) / A_y.

Using (44) and the fact that elements in Φ are added in the decreasing order of B_m,

  (A/A_y) B_{y_i} + Σ_{m∈Φ} B_m = (A/A_y) B_{y_i} + Σ_{m∈Φ\{t}} B_m + B_t
    < ( |Φ| − 1 + A/A_y ) B_t + B_t
    = ( |Φ| + A/A_y ) B_t
    ≤ ( |Φ| + A/A_y ) B_s,   ∀s ∈ Φ.

Thus, we have the first inequality in (43). With all KKT conditions satisfied, Algorithm 1 obtains an optimal solution of (28).

By comparing (31), (32) and (42), (40), respectively, we can see that the procedures for L1 loss and L2 loss are similar but differ in several aspects. In particular, because α_i^{y_i} is unconstrained, B_{y_i} is considered differently from the other B_m's in (42).
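The following is a direct transcription of Algorithm 1 as a small Python function, assuming A > 0 and 0-based class indices; the function name and array layout are illustrative and not taken from BSVM or LIBLINEAR.

    import numpy as np

    def solve_subproblem_L2(A, B, yi, C):
        """Solve sub-problem (28) by Algorithm 1 (requires A = K_{i,i} > 0).

        A  : the scalar K_{i,i}.
        B  : length-k array of the B_m values in (27).
        yi : 0-based index of the class y_i.
        C  : regularization parameter.
        Returns the length-k array (alpha_i^1, ..., alpha_i^k).
        """
        k = B.shape[0]
        Ay = A + 1.0 / (2.0 * C)
        ratio = A / Ay

        # Candidates B_m, m != y_i, in decreasing order (step 3 of Algorithm 1).
        D = np.sort(np.delete(B, yi))[::-1]

        # Grow Phi while the current estimate of beta is below the next largest B_m.
        beta_num = ratio * B[yi]          # (A / A_y) * B_{y_i} + sum_{m in Phi} B_m
        size = 0                          # |Phi|
        while size < k - 1 and beta_num / (ratio + size) < D[size]:
            beta_num += D[size]
            size += 1
        beta = beta_num / (ratio + size)  # Equation (42) holds now

        alpha = np.minimum(0.0, (beta - B) / A)   # (40), m != y_i
        alpha[yi] = (beta - B[yi]) / Ay           # (40), m = y_i
        return alpha

Checking that the returned vector satisfies the KKT conditions (33)-(37), for example that its entries sum to zero, is a convenient way to validate such an implementation on random inputs.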

5 Other Issues and Extensions

In this section, we discuss other details of the decomposition method. Some of them are similar to those for the L1 case. We also extend problems (8)-(9) to more general settings. In the end we discuss advantages and disadvantages of solving the two dual forms (9) and (24).

5.1 Extensions to Use Kernels

It is straightforward to extend our algorithm to use kernels. The only change is to replace K_{i,j} = x_i^T x_j in (6) with

  K_{i,j} = φ(x_i)^T φ(x_j),   (45)

where φ(x) is a function mapping data to a higher dimensional space.

5.2 Working Set Selection

We mentioned in Section 3 that at each iteration of the decomposition method, an index i is selected so that α_i^1, ..., α_i^k are updated. This procedure is called working set selection. If kernels are not used, we follow Keerthi et al. (2008) to sequentially select i ∈ {1,...,l}.⁶ For linear SVM, it is known that more sophisticated selections such as using gradient information may not be cost-effective; see the detailed discussion in Section 4.1 of Hsieh et al. (2008). For kernel SVM, we can use gradient information for working set selection because the cost is relatively low compared to that of kernel evaluations. In Crammer and Singer (2001), to solve problems with L1 loss, they select an index by

  i = arg max_{i ∈ {1,...,l}} ϕ_i,   (46)

where

  ϕ_i = max_{1≤m≤k} ĝ_i^m − min_{m: α_i^m < C_{y_i}^m} ĝ_i^m,   (47)

and ĝ_i^m, i = 1,...,l, m = 1,...,k, are the gradient of (3)'s objective function. The reason behind (46) is that ϕ_i shows the violation of the optimality condition. Note that for problem (3), α is optimal if and only if α is feasible and

  ϕ_i = 0,   i = 1,...,l.   (48)

See the derivation in Crammer and Singer (2001, Section 5). For L2 loss, we can apply a similar setting by

  ϕ_i = max_{1≤m≤k} g_i^m − min_{m: α_i^m < 0 or m = y_i} g_i^m,   i = 1,...,l,

where

  g_i^m = Σ_{j=1}^l K_{i,j} α_j^m + e_i^m + (1 − e_i^m) α_i^m / (2C),   i = 1,...,l, m = 1,...,k,

are the gradient of the objective function in (9). Note that C_{y_i}^m in (47) becomes 0 here.

5.3 Stopping Condition

From (48), a stopping condition of the decomposition method can be

  max_i ϕ_i ≤ ε,

where ε is the stopping tolerance. The same stopping condition can be used for the L2 case.

⁶ In practice, for faster convergence, at each cycle of l steps, they sequentially select indices from a permutation of {1,...,l}.
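For the kernel case, the violation ϕ_i above can be computed directly from the gradient of (9). The sketch below, with illustrative array names and 0-based labels, could serve both for the greedy selection (46) and for the stopping condition of Section 5.3.

    import numpy as np

    def violations_L2(alpha, K, y, C):
        """Return the vector (phi_1, ..., phi_l) for the L2-loss dual problem (9).

        alpha : (l, k) array of dual variables.
        K     : (l, l) kernel matrix.
        y     : (l,) array of 0-based labels.
        """
        l, k = alpha.shape
        e = np.ones((l, k))
        e[np.arange(l), y] = 0.0
        grad = K @ alpha + e                                          # sum_j K_{i,j} alpha_j^m + e_i^m
        grad[np.arange(l), y] += alpha[np.arange(l), y] / (2.0 * C)   # extra term for m = y_i
        # phi_i = max_m grad_i^m - min over {m : alpha_i^m < 0 or m = y_i} of grad_i^m
        allowed = (alpha < 0)
        allowed[np.arange(l), y] = True
        masked = np.where(allowed, grad, np.inf)
        return grad.max(axis=1) - masked.min(axis=1)

    # Greedy selection (46): i = int(np.argmax(violations_L2(alpha, K, y, C)));
    # stopping condition:    violations_L2(alpha, K, y, C).max() <= epsilon.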

5.4 Extension to Assign Different Regularization Parameters to Each Class

In some applications, we may want to assign a different regularization parameter C_i to each class. This can be easily achieved by replacing C in the earlier discussion with the corresponding C_i.

5.5 Solving Problem (9) Versus Problem (24)

In Section 2.2, we mentioned an issue of solving problem (9) or problem (24). Based on the investigation of decomposition methods so far, we give some brief discussion. Some works for structured SVM have solved the dual problem, where (24) is a special case. For example, in Chang et al. (2010), a dual coordinate descent method is applied for solving the dual problem of L2-loss structured SVM. Because (24) does not contain any linear constraint, they are able to update a single α̂_i^m at a time.⁷ This setting is related to the decomposition method discussed in Section 3, although ours updates k variables at a time. If α̂_i^m is selected for update, the computational bottleneck is on calculating

  w^T δ(i,m) = w_{y_i}^T x_i − w_m^T x_i   (49)

for constructing a one-variable sub-problem.⁸ From (11), Equation (49) involves the calculation of

  Σ_{j=1}^l α̂_j^m K_{j,i}   and   Σ_{j=1}^l α̂_j^{y_i} K_{j,i}.   (50)

The cost of 2l kernel evaluations is O(ln) if each kernel evaluation takes O(n). For our decomposition method to solve (9), to update the k variables α_i^m, m = 1,...,k, together, the number of kernel evaluations is only l; see Equations (27) and (29). More precisely, the complexity of Algorithm 1 to solve the sub-problem (28) is

  O(k log k + ln + kl),   (51)

where O(k log k) is for sorting B_m, m ≠ y_i, and O(kl) is for obtaining B_m, m = 1,...,k, in Equation (27). If k is not large, O(ln) is the dominant term in (51). This analysis indicates that regardless of how many elements in α_i^m, m = 1,...,k, are updated, we always need to calculate the i-th kernel column K_{j,i}, j = 1,...,l. In this regard, the decomposition method for problem (9) by solving a sequence of sub-problems (28) nicely allows us to update as many variables as possible under a similar number of kernel evaluations.

If a kernel is not applied, interestingly the situation becomes different. The O(ln) cost of computing (50) is reduced to O(n) because w_{y_i} and w_m are available.

⁷ This is not possible for the dual problem of L1-loss structured SVM. We have mentioned in Section 2.2 that it contains a linear constraint.
⁸ We omit details because the derivation is similar to that for deriving the sub-problem (28).

If Algorithm 1 is used, from (30), the complexity in (51) for updating k variables becomes

  O(k log k + kn).

For updating an α̂_i^m by (49), the cost is O(n). Therefore, if log k < n, the cost of updating α_i^m, m = 1,...,k, together is about k times that of updating a single variable. Then, the decomposition method for solving problem (9) and sub-problem (28) may not be better than a coordinate descent method for solving problem (24). Note that we have focused on the cost per sub-problem, but there are many other issues such as the convergence speed (i.e., the number of iterations). Memory access also affects the computational time. For the coordinate descent method to update a variable α̂_i^m, the corresponding w_m, x_i, and α̂_i^m must be accessed. In contrast, the approach of solving sub-problem (28) accesses data and variables more systematically. An important future work is to conduct a serious comparison and identify the better approach.

6 Experiments

In this section, we compare the proposed method for L2 loss with an existing implementation for L1 loss. We check linear as well as kernel multi-class SVMs. Moreover, a comparison of sensitivity to parameters is also conducted. Our implementation is extended from those in LIBLINEAR (Fan et al., 2008) and BSVM (Hsu and Lin, 2002), which respectively include solvers for linear and kernel L1-loss Crammer and Singer multi-class SVM. Programs for experiments in this paper are available at codes.zip. All data sets used are available at

6.1 Linear Multi-class SVM

We check both training time and test accuracy of using L1 and L2 losses. We consider the four data sets used in Keerthi et al. (2008): news20, MNIST, sector and rcv1. We select the regularization parameter C by checking five-fold cross-validation (CV) accuracy of using values in {2^-5, 2^-4, ..., 2^5}. The stopping tolerance is ε = 0.1. The details of the data sets are listed in Table 1, and the experiment results can be found in Table 2. The accuracy values are comparable. One may observe that the training time of using L1 loss is less. This result is opposite to that of binary classification; see experiments in Hsieh et al. (2008). In binary classification, when C approaches zero, the Hessian matrix of L2-loss SVM is close to the matrix I/(2C), where I is the identity matrix. Thus, the optimization problem is easier to solve. However, for Crammer and Singer's multi-class SVM, when C approaches zero, only l of the Hessian's kl diagonal elements become close to 1/(2C). This may be the reason why for multi-class SVM, using L2 loss does not lead to faster training time.

Table 1: Data sets for experiments of linear multi-class SVMs. n is the number of features and k is the number of classes.

  data set   #training   #testing   n   k   C for L1 loss   C for L2 loss
  news20     15,395      3,
  MNIST      60,
  sector     6,412       3,
  rcv1       15,

Table 2: Linear multi-class SVMs: we compare training time (in seconds) and test accuracy between L1 loss and L2 loss.

                      L1 loss                          L2 loss
  data set   training time   test accuracy    training time   test accuracy
  news20                     %                                 %
  MNIST                      %                                 %
  sector                     %                                 %
  rcv1                       %                                 %

6.2 Kernel Multi-class SVM

We use the same data sets and the same procedure in Hsu and Lin (2002) to compare test accuracy, training time and sparsity (i.e., percentage of training data as support vectors) of using L1 and L2 losses. We use the RBF kernel

  K(x_i, x_j) = e^{−γ ||x_i − x_j||²}.

We fix the cache size for the kernel matrix as 2048 MB. The stopping tolerance is ε = 0.1 for letter and shuttle (to avoid lengthy training time); a smaller tolerance is used for all other data sets. The data set description is in Table 3 and the results are listed in Table 4. For dna, satimage, letter and shuttle, both training and test sets are available. We follow Hsu and Lin (2002) to split the training data to 70% training and 30% validation for finding parameters among C = {2^-2, 2^-1, ..., 2^12} and γ = {2^-10, 2^-9, ..., 2^4}. We then train on the whole training set with the best parameters and report the test accuracy and the model sparsity. For the remaining data sets, whose test sets are not available, we report the best ten-fold CV accuracy and the model sparsity.⁹

From Table 4, we can see that L2-loss multi-class SVM gives comparable accuracy to L1-loss SVM. Note that the accuracy and the parameters of L1-loss multi-class SVM on some data sets are slightly different from those in Hsu and Lin (2002) because of the random data segmentation in the validation procedure and the different versions of the BSVM code. Training time and sparsity are very different between using L1 and L2 losses because they highly depend on the parameters used. To remove the effect of different parameters, in Section 6.3, we present the average result over a set of parameters.

⁹ The sparsity is the average of the 10 models in the CV procedure.

Table 3: Data sets for experiments of kernel multi-class SVMs. n is the number of features and k is the number of classes.

  data set   #training   #testing   n   k   (C, γ) for L1 loss   (C, γ) for L2 loss
  iris                                      (2 3, 2 5)           (2 10, 2 7)
  wine                                      (2 0, 2 2)           (2 1, 2 3)
  glass                                     (2 1, 2 3)           (2 1, 2 3)
  vowel                                     (2 2, 2 1)           (2 4, 2 1)
  vehicle                                   (2 7, 2 3)           (2 5, 2 4)
  segment    2,                             (2 3, 2 3)           (2 7, 2 0)
  dna        2,000       1,                 (2 1, 2 6)           (2 1, 2 6)
  satimage   4,435       2,                 (2 2, 2 2)           (2 4, 2 2)
  letter     15,000      5,                 (2 4, 2 2)           (2 11, 2 4)
  shuttle    43,                            (2 10, 2 4)          (2 9, 2 4)
  *: ε = 0.1 is used.

6.3 Sensitivity to Parameters

Parameter selection is a time consuming process. To avoid checking many parameters, we hope a method is not sensitive to parameters. In this section, we compare the sensitivity of L1 and L2 losses by presenting the average performance over a set of parameters. For linear multi-class SVM, 11 values of C are selected: {2^-5, 2^-4, ..., 2^5}, and we present the average and the standard deviation of training time and test accuracy. The results are in Table 5. For the kernel case, we pick C and γ from the two sets {2^-1, 2^2, 2^5, 2^8} and {2^-6, 2^-3, 2^0, 2^3}, respectively, so 16 different results are generated.¹⁰ We then report average and standard deviation in Table 6. From Tables 5 and 6, L2 loss is worse than L1 loss on the average training time and sparsity. The higher percentage of support vectors is the same as the situation in binary classification because the squared hinge loss leads to many small but nonzero α_i^m. Interestingly, the average performance (test or CV accuracy) of L2 loss is better. Therefore, if using L2 loss, it may be easier to locate a good parameter setting. We find that the same situation occurs in binary classification, although this result was not clearly mentioned in previous studies. An investigation shows that L2 loss gives better accuracy when C is small. In this situation, both L1- and L2-loss SVM suffer from the underfitting of training data. However, because L2 loss gives a higher penalty than L1 loss, underfitting is less severe.

¹⁰ We use a subset of the (C, γ) values in Section 6.2 to save the running time. To report the average training time, we must run all jobs on the same machine. In contrast, several machines were used in Section 6.2 to obtain CV accuracy of all parameters.

6.4 Summary of Experiments

Based on the experiments, we have the following findings.

1. If using the best parameter, L2 loss gives comparable accuracy to L1 loss. For the training time and the number of support vectors, L2 loss is better for some problems, but worse for some others. The situation highly depends on the chosen parameter.

2. If we take the whole procedure of parameter selection into consideration, L2 loss is worse than L1 loss on training time and sparsity. However, the region of suitable parameters is larger. Therefore, we can check fewer parameters if using L2 loss.

Table 4: Kernel multi-class SVMs: we compare training time (in seconds), test or CV accuracy, and sparsity between L1 loss and L2 loss. nSV represents the percentage of training data that are support vectors.

                      L1 loss                                         L2 loss
  data set   training time   test or CV accuracy   nSV       training time   test or CV accuracy   nSV
  iris                        %                    37.33%                     %                    16.67%
  wine                        %                    28.65%                     %                    33.96%
  glass                       %                    80.01%                     %                    98.91%
  vowel                       %                    67.93%                     %                    72.31%
  vehicle                     %                    53.73%                     %                    65.29%
  segment                     %                    46.65%                     %                    19.08%
  dna                         %                    46.90%                     %                    56.10%
  satimage                    %                    60.41%                     %                    60.92%
  letter                      %                    42.56%                     %                    78.56%
  shuttle                     %                    0.66%                      %                    1.41%
  *: ε = 0.1 is used.

Table 5: Sensitivity to parameters: linear multi-class SVMs. We present average ± standard deviation.

                      L1 loss                                 L2 loss
  data set   training time   test accuracy         training time   test accuracy
  news20          ±              ± 1.33%           3.59 ±              ± 0.41%
  MNIST           ±              ± 0.20%                ±              ± 0.21%
  sector      5.96 ±             ± 0.77%           7.66 ±              ± 0.46%
  rcv1            ±              ± 2.42%           3.83 ±              ± 0.89%

7 Conclusions

This paper extends Crammer and Singer's multi-class SVM to apply L2 loss. We give detailed derivations and discuss some interesting differences from the L1 case. Our results serve as a useful reference for those who intend to use Crammer and Singer's method with L2 loss.

Finally, we have extended the software BSVM (after version 2.07) to include the proposed implementation.

Table 6: Sensitivity to parameters: kernel multi-class SVMs. We present average ± standard deviation. nSV represents the percentage of training data that are support vectors.

  L1 loss
  data set   training time   test or CV accuracy   nSV
  iris       0.10 ±              ± 6.93%               ± 17.74%
  wine       0.09 ±              ± 0.95%               ± 31.15%
  glass      5.57 ±              ± 5.24%               ± 8.74%
  vowel           ±              ± 15.12%              ± 13.17%
  vehicle         ±              ± 5.99%               ± 16.68%
  segment         ±              ± 2.91%               ±
  dna        6.29 ±              ± 7.85%               ± 20.68%
  satimage        ±              ± 2.83%               ± 19.66%
  letter          ±              ± 9.18%               ± 18.30%
  shuttle         ±              ± 1.89%           8.07 ± 6.63%

  L2 loss
  data set   training time   test or CV accuracy   nSV
  iris       0.18 ±              ± 2.12%               ± 25.68%
  wine       0.11 ±              ± 1.11%               ± 30.97%
  glass           ±              ± 3.40%               ± 9.61%
  vowel           ±              ± 12.15%              ± 12.72%
  vehicle         ±              ± 4.91%               ± 16.52%
  segment         ±              ± 2.07%               ± 20.86%
  dna             ±              ± 7.73%               ± 19.72%
  satimage        ±              ± 2.21%               ± 18.89%
  letter          ±              ± 7.24%               ± 21.32%
  shuttle         ±              ± 1.62%               ± 11.00%
  *: ε = 0.1 is used.

Acknowledgment

This work was supported in part by the National Science Council of Taiwan via the grant E MY3. The authors thank the anonymous reviewers and Ming-Wei Chang for valuable comments. We also thank Yong Zhuang and Wei-Sheng Chin for their help in finding errors of this paper.

References

Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.

Ming-Wei Chang, Vivek Srikumar, Dan Goldwasser, and Dan Roth. Structured output learning with indirect supervision. In Proceedings of the Twenty-Seventh International Conference on Machine Learning (ICML), 2010.

Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.

Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265-292, 2001.

Koby Crammer and Yoram Singer. On the learnability and design of output codes for multiclass problems. Machine Learning, 47(2-3):201-233, 2002.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008. URL ~cjlin/papers/liblinear.pdf.

Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008. URL cddual.pdf.

Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, 2002.

S. Sathiya Keerthi, Sellamanickam Sundararajan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. A sequential dual method for large scale multi-class linear SVMs. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453-1484, 2005.

A Solving the Sub-problems when A ≤ 0

Our decomposition method only solves the sub-problem when A > 0. To cover the case when K is not a valid kernel and K_{i,i} is any possible value, we still need to solve the sub-problems when A ≤ 0.

A.1 A = 0

When A = 0, for L1 loss, the sub-problem (26) reduces to a linear programming problem. Define

  m* ≡ arg max_{m: m≠y_i} B_m.

Then the optimal solution is

  α_i^m = 0, m = 1,...,k,   if B_{y_i} − B_{m*} ≥ 0,
  α_i^{y_i} = C,  α_i^{m*} = −α_i^{y_i},  α_i^m = 0 for m ≠ y_i and m ≠ m*,   if B_{y_i} − B_{m*} < 0.

It is more complicated in the L2-loss case, because there is a quadratic term of α_i^{y_i}. To solve the sub-problem (28), we reformulate it by the following procedure. From footnote 2, we know α_i^{y_i} ≥ 0. For any fixed α_i^{y_i}, the sub-problem becomes

  min_{α_i^m, m≠y_i}   Σ_{m: m≠y_i} B_m α_i^m
  subject to   Σ_{m: m≠y_i} α_i^m = −α_i^{y_i},
               α_i^m ≤ 0,  m ≠ y_i.

Clearly, the solution is

  α_i^m = −α_i^{y_i}  if m = m*,  and  α_i^m = 0  otherwise.   (52)

Therefore, the sub-problem (28) is reduced to the following one-variable problem.

  min_{α_i^{y_i} ≥ 0}   (α_i^{y_i})² / (4C) + (B_{y_i} − B_{m*}) α_i^{y_i}.   (53)

The solution of (53) is

  α_i^{y_i} = max( 0, −2(B_{y_i} − B_{m*})C ).   (54)

Using (52) and (54), the optimal solution can be written as

  α_i^m = 0, m = 1,...,k,   if B_{y_i} − B_{m*} ≥ 0,
  α_i^{y_i} = −2(B_{y_i} − B_{m*})C,  α_i^{m*} = −α_i^{y_i},  α_i^m = 0 for m ≠ y_i and m ≠ m*,   if B_{y_i} − B_{m*} < 0.
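The closed form above for the L2-loss sub-problem with A = 0 is easy to state in code; the following short sketch uses illustrative names and 0-based labels.

    import numpy as np

    def solve_subproblem_L2_A0(B, yi, C):
        """Closed-form solution of (28) when A = K_{i,i} = 0, following (52) and (54)."""
        k = B.shape[0]
        others = [m for m in range(k) if m != yi]
        m_star = max(others, key=lambda m: B[m])     # m* = arg max_{m != y_i} B_m
        alpha = np.zeros(k)
        diff = B[yi] - B[m_star]
        if diff < 0:
            alpha[yi] = -2.0 * C * diff              # alpha_i^{y_i} = -2(B_{y_i} - B_{m*})C > 0
            alpha[m_star] = -alpha[yi]
        return alpha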

A.2 A < 0

For any given α_i^{y_i} that satisfies its corresponding constraints, both (26) and (28) are equivalent to

  min_{α_i^m, m≠y_i}   Σ_{m: m≠y_i} ( (1/2) A (α_i^m)² + B_m α_i^m )
  subject to   Σ_{m: m≠y_i} α_i^m = −α_i^{y_i},
               α_i^m ≤ 0,  m ≠ y_i.

When A < 0, it is equivalent to

  max_{α_i^m, m≠y_i}   Σ_{m: m≠y_i} (α_i^m)² + Σ_{m: m≠y_i} 2 (B_m / A) α_i^m
  subject to   Σ_{m: m≠y_i} α_i^m = −α_i^{y_i},   (55)
               α_i^m ≤ 0,  m ≠ y_i.   (56)

By constraints (55) and (56), we have

  Σ_{m: m≠y_i} (α_i^m)² ≤ ( Σ_{m: m≠y_i} α_i^m )² = (α_i^{y_i})²,

and

  Σ_{m: m≠y_i} (B_m / A) α_i^m ≤ (B_{m*} / A) Σ_{m: m≠y_i} α_i^m = −(B_{m*} / A) α_i^{y_i}.

Note that when A < 0,

  m* = arg max_{m: m≠y_i} B_m = arg min_{m: m≠y_i} B_m / A.

Thus clearly the optimal solution is

  α_i^m = −α_i^{y_i}  if m = m*,  and  α_i^m = 0  otherwise.

Sub-problem (26) is then reduced to the following one-variable problem.

  min_{0 ≤ α_i^{y_i} ≤ C}   A (α_i^{y_i})² + (B_{y_i} − B_{m*}) α_i^{y_i} = A ( α_i^{y_i} + (B_{y_i} − B_{m*}) / (2A) )² + constants.

Its solution is

  α_i^{y_i} = 0  if −(B_{y_i} − B_{m*}) / (2A) ≥ C/2,  and  α_i^{y_i} = C  otherwise.

Combining them together, the optimal solution of (26) when A < 0 is

  α_i^m = 0, m = 1,...,k,   if B_{y_i} − B_{m*} ≥ −AC,
  α_i^{y_i} = C,  α_i^{m*} = −α_i^{y_i},  α_i^m = 0 for m ≠ y_i and m ≠ m*,   if B_{y_i} − B_{m*} < −AC.   (57)

When A = 0, −AC = 0. Therefore, (57) can be used in the L1 case for A ≤ 0. For problem (28), when A < 0, it is reduced to another one-variable problem.

  min_{α_i^{y_i} ≥ 0}   Ā (α_i^{y_i})² + (B_{y_i} − B_{m*}) α_i^{y_i},   (58)

where Ā ≡ A + 1/(4C). If Ā = 0, then (58) reduces to a trivial problem with optimal solution

  α_i^{y_i} = 0  if B_{y_i} − B_{m*} ≥ 0,  and  α_i^{y_i} = ∞  if B_{y_i} − B_{m*} < 0.

Thus the optimal solution of (28) when Ā = 0 is

  α_i^m = 0, m = 1,...,k,   if B_{y_i} − B_{m*} ≥ 0,
  α_i^{y_i} = ∞,  α_i^{m*} = −∞,  α_i^m = 0 for m ≠ y_i and m ≠ m*,   if B_{y_i} − B_{m*} < 0.

If Ā ≠ 0, (58) is equivalent to

  min_{α_i^{y_i} ≥ 0}   Ā ( α_i^{y_i} + (B_{y_i} − B_{m*}) / (2Ā) )².

When Ā < 0, the optimal solution of (28) is α_i^{y_i} = ∞, α_i^{m*} = −∞, and α_i^m = 0 for m ≠ y_i and m ≠ m*. While if Ā > 0 and A < 0, the optimum occurs at

  α_i^m = 0, m = 1,...,k,   if B_{y_i} − B_{m*} ≥ 0,
  α_i^{y_i} = −(B_{y_i} − B_{m*}) / (2Ā),  α_i^{m*} = −α_i^{y_i},  α_i^m = 0 for m ≠ y_i and m ≠ m*,   if B_{y_i} − B_{m*} < 0.   (59)

Note that when A = 0, 1/(2Ā) = 2C. Thus (59) can be used in the L2 case for Ā > 0, A ≤ 0.


More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Solving the Quadratic Eigenvalue Complementarity Problem by DC Programming

Solving the Quadratic Eigenvalue Complementarity Problem by DC Programming Solvng the Quadratc Egenvalue Complementarty Problem by DC Programmng Y-Shua Nu 1, Joaqum Júdce, Le Th Hoa An 3 and Pham Dnh Tao 4 1 Shangha JaoTong Unversty, Maths Departement and SJTU-Parstech, Chna

More information

A fast iterative algorithm for support vector data description

A fast iterative algorithm for support vector data description https://do.org/10.1007/s13042-018-0796-7 ORIGINAL ARTICLE A fast teratve algorthm for support vector data descrpton Songfeng Zheng 1 Receved: 9 February 2017 / Accepted: 26 February 2018 Sprnger-Verlag

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression 11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k) ISSN 1749-3889 (prnt), 1749-3897 (onlne) Internatonal Journal of Nonlnear Scence Vol.17(2014) No.2,pp.188-192 Modfed Block Jacob-Davdson Method for Solvng Large Sparse Egenproblems Hongy Mao, College of

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Feature Selection in Multi-instance Learning

Feature Selection in Multi-instance Learning The Nnth Internatonal Symposum on Operatons Research and Its Applcatons (ISORA 10) Chengdu-Juzhagou, Chna, August 19 23, 2010 Copyrght 2010 ORSC & APORC, pp. 462 469 Feature Selecton n Mult-nstance Learnng

More information

Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis Fsher Lnear Dscrmnant Analyss Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan Fsher lnear

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Lecture 6: Support Vector Machines

Lecture 6: Support Vector Machines Lecture 6: Support Vector Machnes Marna Melă mmp@stat.washngton.edu Department of Statstcs Unversty of Washngton November, 2018 Lnear SVM s The margn and the expected classfcaton error Maxmum Margn Lnear

More information

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem Appled Mathematcal Scences Vol 5 0 no 65 3 33 Interactve B-Level Mult-Objectve Integer Non-lnear Programmng Problem O E Emam Department of Informaton Systems aculty of Computer Scence and nformaton Helwan

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information