A fast iterative algorithm for support vector data description

ORIGINAL ARTICLE

A fast iterative algorithm for support vector data description

Songfeng Zheng¹

Received: 9 February 2017 / Accepted: 26 February 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
Support vector data description (SVDD) is a well-known model for pattern analysis when only positive examples are reliable. SVDD is usually trained by solving a quadratic programming problem, which is time consuming. This paper formulates the Lagrangian of a simply modified SVDD model as a differentiable convex function over the nonnegative orthant. The resulting minimization problem can be solved by a simple iterative algorithm. The proposed algorithm is easy to implement, without requiring any particular optimization toolbox. Theoretical and experimental analysis show that the algorithm converges r-linearly to the unique minimum point. Extensive experiments on pattern classification were conducted, and compared to the quadratic programming based SVDD (QP-SVDD), the proposed approach is much more computationally efficient (hundreds of times faster) and yields similar performance in terms of receiver operating characteristic curve. Furthermore, the proposed method and QP-SVDD extract almost the same set of support vectors.

Keywords: Support vector data description · Quadratic programming · Penalty function method · Lagrangian dual function · Support vectors

1 Introduction

There is a class of pattern recognition problems, such as novelty detection, where the task is to discriminate the pattern of interest from outliers. In such a situation, positive examples for training are relatively easy to obtain and more reliable. Negative examples, although very abundant, are usually difficult to sample in a way that accurately models the outliers, since they may belong to any class. In this case, it is reasonable to assume that the positive examples cluster in a certain way, and the goal is to accurately describe the class of positive examples, as opposed to the wide range of negative examples. For this purpose, Tax et al. [30, 31, 33] proposed the support vector data description (SVDD) method, which fits a tight hypersphere in the nonlinearly transformed feature space to include most of the positive examples. Thus, SVDD can be regarded as a description of the data distribution of interest. Extensive experiments [30, 31, 33] showed that SVDD is able to correctly identify negative examples in testing even though it has not seen any during training.

* Songfeng Zheng, SongfengZheng@MissouriState.edu
¹ Department of Mathematics, Missouri State University, Springfield, MO 65897, USA

Like the support vector machine (SVM) [34, Chap. 10], SVDD is a kernel based method, possessing all the related advantages of kernel machines. SVDD has been applied to various problems, including image classification [38], handwritten digit recognition [32], face recognition [18], remote sensing image analysis [22], medical image analysis [29], and multi-class problems [17, 37], to name a few. In addition, SVDD is a preliminary step for support vector clustering [4].

The formulation of SVDD leads to a quadratic programming problem. Although decomposition techniques [24, 25] or the sequential minimal optimization method [26] can be employed to solve the quadratic programming, the training of SVDD has time complexity roughly of order O(n^3), where n is the training set size (see Sects. 4.2 and 4.3 for experimental verification). Thus, training an SVDD model can be very expensive for a large dataset. Given the wide application of SVDD, it is therefore highly desirable to develop a time-efficient yet accurate training algorithm for SVDD.
In this paper, we first slightly modify the formulation of the SVDD model, resulting in a more convex minimization problem with simpler constraints. We then apply the quadratic penalty function method [28] from

optimization theory to absorb an equality constraint in the dual problem, obtaining a differentiable convex function over the nonnegative orthant as the approximated Lagrangian function, which can be efficiently minimized by a simple iterative algorithm. We thus call the proposed model Lagrangian SVDD (L-SVDD). The proposed L-SVDD algorithm is easy to implement, requiring no particular optimization toolbox besides basic standard matrix operations. Theoretical and experimental analysis show that the algorithm converges r-linearly to the global minimum point, and that the algorithm has computational complexity of order O(n^2) multiplied by the iteration number. We test the proposed approach on face detection and handwritten digit recognition problems, and detailed performance comparison demonstrates that L-SVDD often yields testing accuracy very close to or slightly better than that of the quadratic programming based SVDD (QP-SVDD). More importantly, L-SVDD is much more computationally efficient than QP-SVDD (i.e., hundreds of times faster on the considered experiments). Furthermore, the two methods extract almost identically the same set of support vectors.

The following are several words about our notation. All scalars are represented by symbols (i.e., English or Greek letters) in normal font. All vectors are denoted by bold lower case symbols, and all are column vectors unless transposed to a row vector by a prime superscript. All matrices are denoted by bold upper case symbols. For a vector x in R^n, the plus function x_+ is defined as (x_+)_i = max{0, x_i} for i = 1, …, n. For two vectors a and b in R^n, a ≤ b means a_i ≤ b_i for each i = 1, …, n. The notation a ⊥ b means the two vectors a and b are perpendicular, that is, a_1 b_1 + a_2 b_2 + ⋯ + a_n b_n = 0. For a vector x ∈ R^n, ‖x‖ stands for its 2-norm, that is, ‖x‖ = (x_1^2 + ⋯ + x_n^2)^{1/2}. For a square matrix A of size n × n, ‖A‖ represents the matrix norm, that is, ‖A‖ = sup_{x ≠ 0_n} ‖Ax‖/‖x‖; thus ‖Ax‖ ≤ ‖A‖‖x‖. If A is a positive definite matrix, ‖A‖ is just the largest eigenvalue of A.

The rest of this paper is organized as follows: Sect. 2 briefly reviews the formulation of SVDD and presents a simple modification to the original SVDD model; Sect. 3 formulates the approximated Lagrangian dual problem, proposes a simple iterative algorithm to solve it, investigates the convergence properties of the algorithm, and also discusses the feasibility of two alternative schemes; Sect. 4 compares the performance, in terms of receiver operating characteristic curve and training time, of the proposed L-SVDD algorithm to that of QP-SVDD on two publicly available real-world datasets, compares the support vectors extracted by the two methods, and experimentally verifies the role of each parameter in the convergence behavior of the algorithm as well as the computational complexity; finally, Sect. 5 summarizes the paper and discusses some possible future research directions.

2 Support vector data description

Given training data {x_i, i = 1, …, n} with feature vectors x_i ∈ R^p, let Φ(·) be a nonlinear transformation which maps the original data vector into a high dimensional Hilbert feature space. (In the SVM and SVDD literature, the explicit form of the function Φ(·) is not important, and in fact, it is often difficult to write out Φ(·) explicitly. What is important is the kernel function, that is, the inner product of Φ(x_i) and Φ(x_j).) SVDD looks for a hypersphere in the feature space, with radius R > 0 and center c, which has minimum volume while containing most of the data. Therefore, we have to minimize R^2 under the constraints ‖Φ(x_i) − c‖^2 ≤ R^2, for i = 1, …, n. In addition, since the training sample might contain outliers, we can introduce a set of slack variables ξ_i ≥ 0, as in the framework of the support vector machine (SVM) [34, Chap. 10]. The slack variable ξ_i measures how much the squared distance from the training example x_i to the center c exceeds the radius squared.
Therefore, the slack variable can be understood as a measure of error. Taking all these considerations into account, the SVDD model is obtained by solving the following optimization problem

\[ \min_{R,\,\mathbf{c},\,\boldsymbol{\xi}} F(R,\mathbf{c},\boldsymbol{\xi}) = R^2 + C\sum_{i=1}^{n}\xi_i \qquad(1) \]

with constraints

\[ \|\Phi(\mathbf{x}_i)-\mathbf{c}\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad \text{for } i=1,\dots,n, \qquad(2) \]

where ξ = (ξ_1, …, ξ_n)' is the vector of slack variables, and the parameter C > 0 controls the tradeoff between the volume of the hypersphere and the permitted errors. The Lagrangian dual of the above optimization problem is (refer to [30, 31, 33] for detailed derivations)

\[ \min_{\boldsymbol{\alpha}} L(\boldsymbol{\alpha}) = \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i) \qquad(3) \]

with constraints

\[ \sum_{i=1}^{n}\alpha_i = 1, \qquad 0 \le \alpha_i \le C \quad\text{for } i=1,\dots,n, \qquad(4) \]

where K(x_i, x_j) = Φ(x_i)'Φ(x_j) is the kernel function, which satisfies Mercer's condition [34, Chap. 10], and α = (α_1, …, α_n)' with α_i being the Lagrangian multiplier for the i-th constraint in Eq. (2).

Similar to the works on SVM [21] and support vector regression [23], we consider the sum of squared errors in the objective function given in Eq. (1), that is, we modify Eq. (1) to

\[ \min_{R,\,\mathbf{c},\,\boldsymbol{\xi}} F(R,\mathbf{c},\boldsymbol{\xi}) = R^2 + C\sum_{i=1}^{n}\xi_i^2. \qquad(5) \]

With this slight modification, the objective function becomes more convex because of the squared terms. Furthermore, the nonnegativity constraint in Eq. (2) can be removed, which is proved in the following.

Proposition 1 If (R̄, c̄, ξ̄) is the minimum point of the function F(R, c, ξ) defined in Eq. (5), with the constraints ‖Φ(x_i) − c‖^2 ≤ R^2 + ξ_i for i = 1, …, n, then all components of ξ̄ are nonnegative.

Proof Assume ξ̄ = (ξ̄_1, ξ̄_2, …, ξ̄_n)' and, without loss of generality, assume ξ̄_1 < 0. Let ξ̃ = (0, ξ̄_2, …, ξ̄_n)', that is, we replace the first component of ξ̄ (which is negative) by 0 and keep the others unchanged. Since ξ̄_1 satisfies the constraint ‖Φ(x_1) − c̄‖^2 ≤ R̄^2 + ξ̄_1, we must have ‖Φ(x_1) − c̄‖^2 ≤ R̄^2 since ξ̄_1 < 0. By assumption, the constraints are satisfied at ξ̄_2, …, ξ̄_n. Hence, the constraints are satisfied at all components of ξ̃. However, there is

\[ F(\bar R, \bar{\mathbf{c}}, \tilde{\boldsymbol{\xi}}) = \bar R^2 + C\sum_{i=2}^{n}\bar\xi_i^2 < \bar R^2 + C\sum_{i=1}^{n}\bar\xi_i^2 = F(\bar R, \bar{\mathbf{c}}, \bar{\boldsymbol{\xi}}) \]

since ξ̄_1 < 0 by assumption, and this contradicts the assumption that (R̄, c̄, ξ̄) is the minimum point. Thus, at the minimum point, there must be ξ̄_1 ≥ 0. In the same manner, it can be argued that all components of ξ̄ must be nonnegative. Consequently, the nonnegativity constraint on ξ is not necessary and can be removed. ∎

The Lagrangian function of this new problem is

\[ L(R,\mathbf{c},\boldsymbol{\xi},\boldsymbol{\alpha}) = R^2 + C\sum_{i=1}^{n}\xi_i^2 + \sum_{i=1}^{n}\alpha_i\left[\|\Phi(\mathbf{x}_i)-\mathbf{c}\|^2 - R^2 - \xi_i\right], \]

with the α_i's being the nonnegative Lagrangian multipliers. At the optimal point, the partial derivatives with respect to the primal variables are zero. Thus, there are

\[ \frac{\partial L}{\partial R} = 0 \;\Rightarrow\; 2R - 2R\sum_{i=1}^{n}\alpha_i = 0 \;\Rightarrow\; \sum_{i=1}^{n}\alpha_i = 1, \qquad(6) \]

\[ \frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; 2C\xi_i - \alpha_i = 0 \;\Rightarrow\; \xi_i = \frac{\alpha_i}{2C}, \]

and

\[ \frac{\partial L}{\partial \mathbf{c}} = 0 \;\Rightarrow\; 2\sum_{i=1}^{n}\alpha_i\big(\Phi(\mathbf{x}_i)-\mathbf{c}\big) = \mathbf{0} \;\Rightarrow\; \mathbf{c} = \sum_{i=1}^{n}\alpha_i\Phi(\mathbf{x}_i). \]

Substituting these results into the Lagrangian function, we have the dual function

\[ L(\boldsymbol{\alpha}) = -\frac{1}{4C}\sum_{i=1}^{n}\alpha_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) + \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i). \]

The purpose is now to maximize L(α), or equivalently minimize −L(α), with respect to the nonnegative α_i's under the constraint in Eq. (6), that is,

\[ \min_{\boldsymbol{\alpha}} L(\boldsymbol{\alpha}) = \frac{1}{4C}\sum_{i=1}^{n}\alpha_i^2 + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i) \qquad(7) \]

with constraints

\[ \sum_{i=1}^{n}\alpha_i = 1, \qquad \alpha_i \ge 0 \quad\text{for } i=1,2,\dots,n. \qquad(8) \]

(We slightly abuse the notation here because L(α) was used in Eq. (3); however, this will not cause any confusion because all of the following discussion is based on Eq. (7).)

With the sum of errors replaced by the sum of squared errors, the resulting dual problem in Eq. (7) has an extra quadratic term compared to the original dual problem in Eq. (3), and this improves the convexity of the objective function. Moreover, by comparing the constraints in Eqs. (4) and (8), it is clear that the new optimization problem has simpler constraints, without any upper bound on the dual variables.

As implemented in popular SVM toolboxes [7, 14, 15], the quadratic programming problems in Eqs. (3) and (7) can be solved by decomposition methods [24, 25] or the sequential minimal optimization method [26]. However, these algorithms are computationally expensive, with time complexity roughly O(n^3). Thus, a fast training algorithm for SVDD which can achieve accuracy similar to the quadratic programming method is highly desirable.
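For concreteness, the quantities appearing in Eq. (7) are easy to assemble once the kernel matrix is available. The following is a minimal NumPy sketch (the implementation described in Sect. 4 was written in MATLAB; the function names here are made up, and the Gaussian kernel with σ = 8 and C = 2 only mirrors the settings later used in Sect. 4.1):

```python
# Illustrative sketch, not the paper's MATLAB code.
import numpy as np

def gaussian_kernel_matrix(X, sigma=8.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), the kernel used in Sect. 4.1."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def dual_objective(alpha, K, C=2.0):
    """L(alpha) of Eq. (7): (1/4C) sum_i alpha_i^2 + alpha' K alpha - sum_i alpha_i K(x_i, x_i)."""
    return (alpha @ alpha) / (4.0 * C) + alpha @ K @ alpha - alpha @ np.diag(K)
```

The constraints of Eq. (8) are not handled in this sketch; the equality constraint is absorbed by the penalty formulation developed in the next section, and the nonnegativity constraint is enforced there by the iterative scheme itself.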

3 Lagrangian support vector data description

3.1 The algorithm

Let K be the n × n kernel matrix, that is, K_ij = K(x_i, x_j); let the vector u be formed by the diagonal elements of the kernel matrix K; let I_n be the n × n identity matrix; and let 1_n (0_n) be the n-dimensional vector of all 1's (0's). The optimization problem in Eq. (7) can be written compactly in matrix form as

\[ \min_{\boldsymbol{\alpha}} L(\boldsymbol{\alpha}) = \frac{1}{2}\,\boldsymbol{\alpha}'\left(\frac{\mathbf{I}_n}{2C} + 2\mathbf{K}\right)\boldsymbol{\alpha} - \mathbf{u}'\boldsymbol{\alpha} \qquad(9) \]

with constraints

\[ \mathbf{1}_n'\boldsymbol{\alpha} = 1 \quad\text{and}\quad \boldsymbol{\alpha} \ge \mathbf{0}_n. \qquad(10) \]

To deal with the equality constraint in Eq. (10), we consider the penalty function method [28] from optimization theory. The basic idea is to combine the original objective function with a function which incorporates some of the constraints, in order to approximate a constrained optimization problem by an unconstrained problem or one with simpler constraints. For our problem, we consider the following function

\[ f(\boldsymbol{\alpha}) = \frac{1}{2}\,\boldsymbol{\alpha}'\left(\frac{\mathbf{I}_n}{2C} + 2\mathbf{K}\right)\boldsymbol{\alpha} - \mathbf{u}'\boldsymbol{\alpha} + \rho\,(\mathbf{1}_n'\boldsymbol{\alpha} - 1)^2 = \frac{1}{2}\,\boldsymbol{\alpha}'\left(\frac{\mathbf{I}_n}{2C} + 2\mathbf{K} + 2\rho\mathbf{J}_n\right)\boldsymbol{\alpha} - (\mathbf{u} + 2\rho\mathbf{1}_n)'\boldsymbol{\alpha} + \rho, \qquad(11) \]

where J_n is the n × n matrix with all elements being 1. As proved in [28], as the penalty parameter ρ → ∞, the minimum point of Eq. (11) with α ≥ 0_n converges to the solution of Eq. (9) under the constraints in Eq. (10). Let

\[ \mathbf{Q} = \frac{\mathbf{I}_n}{2C} + 2\mathbf{K} + 2\rho\mathbf{J}_n \quad\text{and}\quad \mathbf{v} = \mathbf{u} + 2\rho\mathbf{1}_n. \qquad(12) \]

The matrix Q is positive definite because both K and J_n are positive semi-definite while I_n is positive definite. Ignoring the constant term in Eq. (11), we can formulate the approximated minimization problem as

\[ \min_{\boldsymbol{\alpha} \ge \mathbf{0}_n}\; \frac{1}{2}\,\boldsymbol{\alpha}'\mathbf{Q}\boldsymbol{\alpha} - \mathbf{v}'\boldsymbol{\alpha}. \qquad(13) \]

The Kuhn–Tucker stationary-point problem [20, p. 94, KTP 7.2.4] for Eq. (13) is

\[ \mathbf{Q}\boldsymbol{\alpha} - \mathbf{v} - \mathbf{w} = \mathbf{0}_n, \quad \mathbf{w}'\boldsymbol{\alpha} = 0, \quad \boldsymbol{\alpha} \ge \mathbf{0}_n, \quad \mathbf{w} \ge \mathbf{0}_n. \]

From these equations, we have

\[ \mathbf{w} = \mathbf{Q}\boldsymbol{\alpha} - \mathbf{v} \ge \mathbf{0}_n \quad\text{and}\quad (\mathbf{Q}\boldsymbol{\alpha} - \mathbf{v})'\boldsymbol{\alpha} = 0, \]

which can be summarized as solving the classical linear complementarity problem [10], that is, solving for α such that

\[ \mathbf{0}_n \le (\mathbf{Q}\boldsymbol{\alpha} - \mathbf{v}) \;\perp\; \boldsymbol{\alpha} \ge \mathbf{0}_n. \qquad(14) \]

Since the matrix Q is symmetric positive definite, the existence and uniqueness of the solution to Eq. (14) are guaranteed [9]. The optimality condition in Eq. (14) is satisfied if and only if, for any γ > 0, the relationship

\[ \mathbf{Q}\boldsymbol{\alpha} - \mathbf{v} = (\mathbf{Q}\boldsymbol{\alpha} - \mathbf{v} - \gamma\boldsymbol{\alpha})_+ \qquad(15) \]

holds. See the Appendix for a proof. To obtain a solution to the above problem, we start from an initial point α^0 and apply the following iterative scheme

\[ \boldsymbol{\alpha}^{k+1} = \mathbf{Q}^{-1}\left(\mathbf{v} + (\mathbf{Q}\boldsymbol{\alpha}^k - \mathbf{v} - \gamma\boldsymbol{\alpha}^k)_+\right). \qquad(16) \]

The initial point α^0 could be any vector, but in our implementation we take α^0 = Q^{-1}v. We summarize the algorithm for L-SVDD as Algorithm 1 below; the convergence analysis is given in Sect. 3.2.

Algorithm 1: Lagrangian Support Vector Data Description
0. Initialization: choose the starting point α^0 = Q^{-1}v, find α^1 using Eq. (16), set k = 1, set the iteration number M, and set the error tolerance ε.
1. while k < M and ‖α^k − α^{k−1}‖ > ε do:
2.   Set k = k + 1.
3.   Find α^k using Eq. (16).
4. end while
5. Return the vector α^k.

Remark 1 In Algorithm 1, we terminate the program when the solution does not change too much. Since the purpose is to find a solution to Eq. (14), we could also terminate the program if the absolute value of the inner product (α^k)'(Qα^k − v) falls below a certain level. However, since this involves a matrix–vector multiplication, this stopping criterion is more expensive to evaluate than the one in Algorithm 1. Thus, in our implementation, we use the stopping rule given in Algorithm 1. Our experimental results show that when the algorithm stops, the inner product indeed is very close to 0; see Sects. 4.2 and 4.3 for the detailed results.

Remark 2 Each iteration of Algorithm 1 includes matrix–vector multiplication, vector addition/subtraction, and taking the positive part of a vector component-wise, among which the most expensive operation is the matrix–vector multiplication, which has computational complexity of order O(n^2).
We thus expect the computational complexity of Algorithm 1 to be about the iteration number multiplied by O(n^2). Sections 4.2 and 4.3 verify this analysis experimentally.
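As a concrete illustration of Algorithm 1 and of the quantities Q and v in Eq. (12), the following is a minimal NumPy sketch (the paper's own implementation was in MATLAB; names such as lsvdd_train and the default parameter values, which mirror the settings later used in Sect. 4, are illustrative only):

```python
# Minimal sketch of Algorithm 1; assumptions: K is the precomputed kernel matrix,
# and the default C, rho, gamma, max_iter, tol echo the settings of Sect. 4.
import numpy as np

def lsvdd_train(K, C=2.0, rho=200.0, gamma=None, max_iter=3000, tol=1e-5):
    """Solve Eq. (13) by the fixed-point iteration of Eq. (16)."""
    n = K.shape[0]
    if gamma is None:
        gamma = 0.95 / C                      # gamma = 0.95/C, as in Sect. 4.1
    Q = np.eye(n) / (2.0 * C) + 2.0 * K + 2.0 * rho * np.ones((n, n))
    v = np.diag(K) + 2.0 * rho * np.ones(n)
    Q_inv = np.linalg.inv(Q)                  # Q is symmetric positive definite
    alpha = Q_inv @ v                         # starting point alpha^0 = Q^{-1} v
    for _ in range(max_iter):
        residual = Q @ alpha - v
        alpha_new = Q_inv @ (v + np.maximum(residual - gamma * alpha, 0.0))
        if np.linalg.norm(alpha_new - alpha) <= tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha
```

Each pass through the loop performs two matrix–vector products plus a few vector operations, which is exactly the O(n^2) per-iteration cost discussed in Remark 2; the inverse of Q is computed once before the loop.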

Similar to SVM, we call the training examples whose corresponding α_i's are nonzero support vectors. Once the α_i's are obtained, the radius R can be computed from the set of support vectors [30, 31]. In the decision stage, if the distance from a new example x to the center is less than the radius R, it is classified as a positive example; otherwise, it is classified as a negative example. That is, the decision rule is

\[ f(\mathbf{x}) = \operatorname{sgn}\left(R^2 - \Big\|\Phi(\mathbf{x}) - \sum_{i=1}^{n}\alpha_i\Phi(\mathbf{x}_i)\Big\|^2\right) = \operatorname{sgn}\left(2\sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}) - K(\mathbf{x},\mathbf{x}) + b\right), \qquad(17) \]

where b = R^2 − Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j K(x_i, x_j).
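Using the same illustrative NumPy conventions as before (not the paper's MATLAB code), the decision rule of Eq. (17) can be sketched as follows; the offset b is assumed to have been computed from R and the trained α as defined above:

```python
# Sketch of the decision rule in Eq. (17); names are illustrative.
import numpy as np

def lsvdd_decision(K_test_train, K_test_diag, alpha, b):
    """f(x) = sgn(2 sum_i alpha_i K(x_i, x) - K(x, x) + b) for each test point x.

    K_test_train[t, i] = K(x_t, x_i) for test point t and training point i;
    K_test_diag[t]     = K(x_t, x_t).
    """
    scores = 2.0 * K_test_train @ alpha - K_test_diag + b
    return np.sign(scores), scores   # +1: inside the hypersphere (positive class)
```

Varying b and recording the true and false positive rates at each value traces out the ROC curves used for evaluation in Sect. 4.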

3.2 Convergence analysis

To analyze the convergence property of Algorithm 1, we need the following lemma.

Lemma 1 Let a and b be two points in R^p; then

\[ \|\mathbf{a}_+ - \mathbf{b}_+\| \le \|\mathbf{a} - \mathbf{b}\|. \qquad(18) \]

Proof For two real numbers a and b, there are four situations: (1) a ≥ 0 and b ≥ 0, then |a_+ − b_+| = |a − b|; (2) a ≥ 0 and b ≤ 0, then |a_+ − b_+| = |a − 0| ≤ |a − b|; (3) a ≤ 0 and b ≥ 0, then |a_+ − b_+| = |0 − b| ≤ |a − b|; (4) a ≤ 0 and b ≤ 0, then |a_+ − b_+| = |0 − 0| ≤ |a − b|. In summary, for the one dimensional case, there is (a_+ − b_+)^2 ≤ (a − b)^2.

Assume that Eq. (18) is true for p dimensional vectors a_p and b_p. Denote the (p+1) dimensional vectors a and b as a = (a_p, a_{p+1}) and b = (b_p, b_{p+1}), where a_{p+1} and b_{p+1} are real numbers. (For notational convenience, in this proof we assume all the vectors are row vectors; clearly, the result also applies to column vectors.) Then,

\[ \|\mathbf{a}_+ - \mathbf{b}_+\|^2 = \|(\mathbf{a}_p)_+ - (\mathbf{b}_p)_+\|^2 + \big((a_{p+1})_+ - (b_{p+1})_+\big)^2 \le \|\mathbf{a}_p - \mathbf{b}_p\|^2 + (a_{p+1} - b_{p+1})^2 = \|\mathbf{a} - \mathbf{b}\|^2, \qquad(19) \]

where in Eq. (19) we used the induction assumption on the p dimensional vectors, the special result for the one dimensional case, and the definition of the Euclidean norm. By induction, Eq. (18) is proved. ∎

With the aid of Lemma 1, we are ready to study the convergence behavior of Algorithm 1, and we have the following conclusion.

Proposition 2 With 0 < γ < 1/C, the sequence α^k obtained by Algorithm 1 converges r-linearly [2] to the unique solution ᾱ of Eq. (13), that is,

\[ \limsup_{k\to\infty} \|\boldsymbol{\alpha}^k - \bar{\boldsymbol{\alpha}}\|^{1/k} < 1. \]

Proof The convexity of the objective function in Eq. (13) and the convexity of the feasible region ensure the existence and uniqueness of the solution ᾱ to Eq. (13). Since ᾱ is the solution to Eq. (13), it must satisfy the optimality condition in Eq. (15), that is, for any γ > 0,

\[ \mathbf{Q}\bar{\boldsymbol{\alpha}} = \mathbf{v} + (\mathbf{Q}\bar{\boldsymbol{\alpha}} - \mathbf{v} - \gamma\bar{\boldsymbol{\alpha}})_+. \qquad(20) \]

Multiplying Eq. (16) by Q, subtracting Eq. (20), and then taking norms gives us

\[ \|\mathbf{Q}\boldsymbol{\alpha}^{k+1} - \mathbf{Q}\bar{\boldsymbol{\alpha}}\| = \|(\mathbf{Q}\boldsymbol{\alpha}^{k} - \mathbf{v} - \gamma\boldsymbol{\alpha}^{k})_+ - (\mathbf{Q}\bar{\boldsymbol{\alpha}} - \mathbf{v} - \gamma\bar{\boldsymbol{\alpha}})_+\|. \qquad(21) \]

Applying Lemma 1 to the vectors Qα^k − v − γα^k and Qᾱ − v − γᾱ in Eq. (21), we have

\[ \|\mathbf{Q}\boldsymbol{\alpha}^{k+1} - \mathbf{Q}\bar{\boldsymbol{\alpha}}\| \le \|(\mathbf{Q} - \gamma\mathbf{I})(\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}})\| = \|(\mathbf{I} - \gamma\mathbf{Q}^{-1})(\mathbf{Q}\boldsymbol{\alpha}^{k} - \mathbf{Q}\bar{\boldsymbol{\alpha}})\| \le \|\mathbf{I} - \gamma\mathbf{Q}^{-1}\|\,\|\mathbf{Q}\boldsymbol{\alpha}^{k} - \mathbf{Q}\bar{\boldsymbol{\alpha}}\|. \qquad(22) \]

From the definition of Q in Eq. (12), it is clear that the matrix 2K + 2ρJ_n is positive semi-definite because both K and J_n are, and we denote its eigenvalues as λ_i, i = 1, 2, …, n. Then the eigenvalues of Q are 1/(2C) + λ_i and the eigenvalues of Q^{-1} are (1/(2C) + λ_i)^{-1}, i = 1, 2, …, n. To make the sequence ‖Qα^k − Qᾱ‖ converge, Eq. (22) indicates that we need ‖I − γQ^{-1}‖ < 1, that is, the eigenvalues of I − γQ^{-1} must all be between −1 and 1, or

\[ -1 < 1 - \gamma\left(\frac{1}{2C} + \lambda_i\right)^{-1} < 1, \quad\text{that is,}\quad 0 < \gamma < 2\left(\frac{1}{2C} + \lambda_i\right) = \frac{1}{C} + 2\lambda_i \quad\text{for } i = 1, 2, \dots, n. \]

Thus, with the choice of 0 < γ < 1/C, we have c = ‖I − γQ^{-1}‖ < 1.

Recursively applying Eq. (22), we have that for any k, ‖Qα^k − Qᾱ‖ ≤ c^k ‖Qα^0 − Qᾱ‖. Consequently,

\[ \|\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}}\| = \|\mathbf{Q}^{-1}\mathbf{Q}(\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}})\| \le \|\mathbf{Q}^{-1}\|\,\|\mathbf{Q}\boldsymbol{\alpha}^{k} - \mathbf{Q}\bar{\boldsymbol{\alpha}}\| \le c^{k}\,\|\mathbf{Q}^{-1}\|\,\|\mathbf{Q}\boldsymbol{\alpha}^{0} - \mathbf{Q}\bar{\boldsymbol{\alpha}}\| = A c^{k}, \qquad(23) \]

where A = ‖Q^{-1}‖‖Qα^0 − Qᾱ‖ > 0. Hence,

\[ \limsup_{k\to\infty} \|\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}}\|^{1/k} \le \limsup_{k\to\infty} A^{1/k}\, c = c < 1. \]

This proves the proposition. ∎

Remark 3 The proof of Proposition 2 enables us to estimate the iteration number M. If we require the accuracy of the solution to be ε, in the sense that ‖α^M − ᾱ‖ < ε, then by Eq. (23) it is sufficient to have Ac^M = ε, where A = ‖Q^{-1}‖‖Qα^0 − Qᾱ‖ and c = ‖I − γQ^{-1}‖. This enables us to solve for M as

\[ M = \frac{\log\varepsilon - \log A}{\log c}. \]

However, A cannot be calculated because ᾱ is unknown. Hence, in the implementation, we set M to a large number and terminate the program according to the criterion in Algorithm 1.

From the proof of Proposition 2, the convergence rate of Algorithm 1 depends on c, the norm of the matrix I − γQ^{-1}. The analysis in the proof shows that a smaller value of c gives a faster convergence rate. By the definition of the matrix norm, there is

\[ c = \|\mathbf{I} - \gamma\mathbf{Q}^{-1}\| = \max_{i}\left|1 - \gamma\left(\frac{1}{2C} + \lambda_i\right)^{-1}\right|, \]

from which it is clear that a larger value of γ makes c smaller and consequently makes the algorithm converge faster. In accord with Proposition 2, let us assume that γ = a/C for some constant 0 < a < 1. We have

\[ c = \max_{i}\left|1 - \frac{a}{C}\left(\frac{1}{2C} + \lambda_i\right)^{-1}\right| = \max_{i}\left|1 - \frac{2a}{1 + 2C\lambda_i}\right|. \]

Thus, a small value of C and a small smallest eigenvalue will make 2a/(1 + 2Cλ_i) large, hence c small, and consequently the algorithm will converge faster. However, to the best of our knowledge, there is no theoretical conclusion about the dependence between the eigenvalues of 2K + 2ρJ_n and ρ. Fortunately, our numerical tests revealed that, as ρ changes, the largest eigenvalue of 2K + 2ρJ_n changes dramatically while the smallest eigenvalue does not change much. Since c depends on the smallest eigenvalue of 2K + 2ρJ_n, we thus conclude that the convergence rate of Algorithm 1 is not affected much by ρ.

In summary, we reach the conclusion that, to achieve a faster convergence rate, we should set γ large and C small (γ depends on C in our implementation), while ρ does not significantly impact the convergence behavior. We should mention that C also controls the error of the model, so setting C small might make the resulting model perform poorly in classification. We will numerically verify this analysis in Sect. 4.2.

3.3 Discussion on two alternatives

We train the L-SVDD model by solving the linear complementarity problem in Eq. (14), and Algorithm 1 is based on the condition in Eq. (15) at the optimum point. Alternatively, the optimality condition can be written as

\[ \boldsymbol{\alpha} = \big(\boldsymbol{\alpha} - \bar\gamma(\mathbf{Q}\boldsymbol{\alpha} - \mathbf{v})\big)_+, \qquad(24) \]

where γ̄ > 0. In principle, similar to Algorithm 1, we can design an algorithm based on the recursive relation

\[ \boldsymbol{\alpha}^{k+1} = \big(\boldsymbol{\alpha}^{k} - \bar\gamma(\mathbf{Q}\boldsymbol{\alpha}^{k} - \mathbf{v})\big)_+, \qquad(25) \]

with some appropriately selected γ̄. Intuitively, the algorithm based on Eq. (25) should be more computationally efficient in each iteration than Algorithm 1, which is based on Eq. (16): Eq. (16) involves three vector addition/subtraction operations and two matrix–vector multiplications, while Eq. (25) only includes two vector addition/subtraction operations and one matrix–vector multiplication.

To choose γ̄, as in the proof of Proposition 2, we let the unique solution to Eq. (14) be ᾱ, which must satisfy

\[ \bar{\boldsymbol{\alpha}} = \big(\bar{\boldsymbol{\alpha}} - \bar\gamma(\mathbf{Q}\bar{\boldsymbol{\alpha}} - \mathbf{v})\big)_+. \qquad(26) \]

Subtracting Eq. (26) from Eq. (25), taking norms, and applying Lemma 1, we have

\[ \|\boldsymbol{\alpha}^{k+1} - \bar{\boldsymbol{\alpha}}\| \le \|(\mathbf{I} - \bar\gamma\mathbf{Q})(\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}})\| \le \|\mathbf{I} - \bar\gamma\mathbf{Q}\|\,\|\boldsymbol{\alpha}^{k} - \bar{\boldsymbol{\alpha}}\|. \]

Thus, to make the potential algorithm converge, we must have ‖I − γ̄Q‖ < 1.

Denote the eigenvalues of the matrix 2K + 2ρJ_n as λ_i, i = 1, 2, …, n; then the eigenvalues of I − γ̄Q are 1 − γ̄(1/(2C) + λ_i). To ensure ‖I − γ̄Q‖ < 1, we need all the eigenvalues of I − γ̄Q to be between −1 and 1, that is,

\[ -1 < 1 - \bar\gamma\left(\frac{1}{2C} + \lambda_i\right) < 1, \quad\text{or}\quad 0 < \bar\gamma < \frac{2}{\frac{1}{2C} + \lambda_i} = \frac{4C}{1 + 2C\lambda_i} \quad\text{for } i = 1, 2, \dots, n. \]

Thus, we should choose γ̄ such that

\[ 0 < \bar\gamma < \frac{4C}{1 + 2C\lambda_{\max}}, \quad\text{where } \lambda_{\max} = \max\{\lambda_1, \lambda_2, \dots, \lambda_n\}. \]

However, our experimental results show that λ_max is large because ρ needs to be large, as required by the penalty function method (to the best of our knowledge, there is no theoretical result regarding the dependence between the largest eigenvalue of the matrix 2K + 2ρJ_n and the parameter ρ). As a result, γ̄ becomes very small and, consequently, the potential algorithm based on Eq. (25) converges slowly. We tested this alternative idea in our experiments, and the result showed that, under the same stopping criterion, although each iteration is more time efficient than in Algorithm 1, the algorithm based on Eq. (25) needs many more iterations to converge (for instance, on the face detection problem in Sect. 4.2, to achieve the error tolerance of 10^{-5}, the algorithm based on Eq. (25) needs more than 20,000 iterations), hence spending more time overall.
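For completeness, a minimal NumPy sketch of this alternative update is given below, continuing the illustrative conventions used earlier; the step size rule and the starting point are assumptions made only for illustration, since the paper does not prescribe them beyond the bound on γ̄ derived above:

```python
# Sketch of the alternative iteration in Eq. (25); the step size is chosen just
# below the bound 4C / (1 + 2C * lambda_max) derived above (illustrative only).
import numpy as np

def lsvdd_train_projected(K, C=2.0, rho=200.0, max_iter=20000, tol=1e-5):
    n = K.shape[0]
    Q = np.eye(n) / (2.0 * C) + 2.0 * K + 2.0 * rho * np.ones((n, n))
    v = np.diag(K) + 2.0 * rho * np.ones(n)
    lam_max = np.linalg.eigvalsh(2.0 * K + 2.0 * rho * np.ones((n, n))).max()
    gamma_bar = 0.95 * 4.0 * C / (1.0 + 2.0 * C * lam_max)   # just below the bound
    alpha = np.maximum(np.linalg.solve(Q, v), 0.0)           # an arbitrary nonnegative start
    for _ in range(max_iter):
        alpha_new = np.maximum(alpha - gamma_bar * (Q @ alpha - v), 0.0)
        if np.linalg.norm(alpha_new - alpha) <= tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```

Each iteration here costs only one matrix–vector product, which matches the per-iteration comparison made above, but the small admissible γ̄ is what makes the overall iteration count much larger in practice.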
The purpose of Algorithm 1 is to solve the quadratic minimization problem given in Eq. (13), and another alternative is to apply an interior point method (IPM) [2, 5, 28] to this problem. Let z^k be the k-th step solution to the problem of minimizing some convex function g(z) under some constraints using IPM, and denote the global minimum point as z_μ. In [2], it was proved that z^k converges to z_μ not only r-linearly, but also q-superlinearly, in the sense that ‖z^{k+1} − z_μ‖/‖z^k − z_μ‖ → 0. Proposition 2 shows that the proposed L-SVDD algorithm also has an r-linear convergence rate; however, the L-SVDD algorithm cannot achieve a q-superlinear convergence rate. This means that, theoretically, IPM should converge in fewer iterations than L-SVDD.

In each iteration of the L-SVDD algorithm, the operations are quite simple, with the most expensive computation being a matrix–vector multiplication. Each IPM iteration, however, is much more complicated, because it includes evaluating the objective function and constraint values, calculating the gradient and Hessian, finding the search direction, and conducting a backtracking line search to update the solution. These operations include several matrix inversions and more matrix multiplications. Thus, each iteration of IPM is much more expensive than that of L-SVDD. Hence, although IPM converges in fewer iterations, it might spend more computing time and resources (e.g., memory) than L-SVDD. We developed the interior point method based SVDD model (IPM-SVDD) by adapting the MATLAB code from https://pcarbo.github.io/convexprog.html. Section 4.2 presents the comparison between IPM-SVDD and L-SVDD in terms of iteration number and CPU time needed to achieve convergence.

4 Experimental results and analysis

On a face dataset and the USPS handwritten digit dataset, we compared the performance of the proposed Lagrangian SVDD (L-SVDD) with that of the ordinary quadratic programming based SVDD (QP-SVDD), which is obtained by applying a quadratic programming solver to Eq. (7) with the constraints in Eq. (8).

4.1 Experiment setup and performance measures

The program for L-SVDD was developed using MATLAB, and we did not do any specific code optimization; QP-SVDD was implemented based on the MATLAB SVM toolbox [14] with the core quadratic programming solver written in C++. The source code of this work is available upon request.

All the experiments were conducted on a laptop computer with an Intel(R) Core(TM) M CPU at 2.50 GHz and 4 GB memory, with Windows 7 Professional as the operating system and MATLAB R2007b as the platform. During all experiments that involved measuring running time, one core was used solely for the experiments, and the number of other processes running on the system was minimized.

In our experiments, we adopted the Gaussian kernel K(u, v) = exp{−‖u − v‖^2/(2σ^2)} with σ = 8, and we set the SVDD control parameter C = 2 in the algorithms. The parameter setting in our experiments might not be optimal for achieving the best testing performance. Nonetheless, our purpose is not to achieve the least testing error, but to compare the performance of L-SVDD and QP-SVDD; therefore, the comparison is fair as long as the parameter settings are the same for the two algorithms. In general, we could select the optimal parameter setting (C, σ) by applying cross validation, generalized approximate cross validation [36], or other criteria mentioned in [8, 13], but

since this is not the focus of this paper, we choose not to pursue the issue further.

To make the L-SVDD algorithm converge fast, following the conclusion at the end of Sect. 3.2, we should set γ to a large value. In all of our experiments, we chose γ = 0.95/C to ensure the convergence of Algorithm 1. According to the theoretical properties of the penalty function method [28], the parameter ρ in the definition of Q in Eq. (12) should be as large as possible. In our experiments, we found that setting ρ = 200 is enough to ensure the closeness of 1_n'α and 1 (in fact, in all the experimental results, |1_n'α − 1| is very small when the L-SVDD training algorithm terminates). Section 4.2 also studies the roles of the parameters C, γ, and ρ in the convergence behavior of Algorithm 1. The iteration number M in Algorithm 1 was set to 3000, although in our experiments we found that most of the time the algorithm converges within 1000 iterations. The tolerance parameter ε in Algorithm 1 was set to a small value (the face detection experiment in Sect. 4.2, for example, used an error tolerance of 10^{-5}).

In the decision rule given by Eq. (17), we changed the value of the parameter b, and for each value of b we calculated the true positive rate and the false positive rate. We then plot the true positive rate vs. the false positive rate, resulting in the receiver operating characteristic (ROC) curve [11], which is used to illustrate the performance of the classifier. The classifier performs better if the corresponding ROC curve is higher. In order to numerically compare the ROC curves of different methods, we calculate the area under the curve (AUC) [11]. The larger the AUC, the better the overall performance of the classifier.

To compare the support vectors extracted by the two algorithms, we regard the support vectors given by QP-SVDD as the true support vectors and denote them as the set SV_Q, and we denote the support vectors from L-SVDD as SV_L. We define precision and recall [27] as

\[ \text{precision} = \frac{|SV_Q \cap SV_L|}{|SV_L|} \quad\text{and}\quad \text{recall} = \frac{|SV_Q \cap SV_L|}{|SV_Q|}, \]

where |S| represents the size of a set S. High precision means that L-SVDD finds substantially more correct support vectors than incorrect ones, while high recall means that L-SVDD extracts most of the correct support vectors.
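These two set-overlap measures are straightforward to compute once both solutions are available; the following continues the illustrative NumPy conventions used earlier (the threshold for deciding that an α_i is "nonzero" is an assumption, not specified in the paper):

```python
# Sketch of the support-vector precision/recall comparison defined above.
import numpy as np

def sv_precision_recall(alpha_qp, alpha_l, thresh=1e-6):
    sv_q = set(np.flatnonzero(alpha_qp > thresh))   # "true" support vectors (QP-SVDD)
    sv_l = set(np.flatnonzero(alpha_l > thresh))    # support vectors found by L-SVDD
    overlap = len(sv_q & sv_l)
    precision = overlap / len(sv_l)
    recall = overlap / len(sv_q)
    return precision, recall
```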

4.2 Face detection

This experiment used the face dataset provided by the Center for Biological and Computational Learning (CBCL) at MIT. The training set consists of 6977 images, with 2429 face images and 4548 non-face images; the testing set consists of 24,045 images, including 472 face images and 23,573 non-face images. Each image is of size 19 × 19, with pixel values between 0 and 1. We did not extract any specific features for face detection (e.g., the features used in [35]); instead, we directly used the pixel values as input to L-SVDD and QP-SVDD.

Fig. 1 The ROC curves of (a) QP-SVDD and (b) L-SVDD on the CBCL testing dataset (true positive rate plotted against false positive rate).

Figure 1 shows the ROC curves of QP-SVDD and L-SVDD on the testing set. The ROC curves are so close that, if we plotted them in the same figure, they would be indistinguishable. The closeness of the ROC curves indicates that the two resulting classifiers should perform very similarly. Indeed, the AUC values of QP-SVDD and L-SVDD are almost identical, with the L-SVDD classifier performing even slightly better. On our computer, it took the L-SVDD training algorithm 862 iterations to converge, and the proposed L-SVDD was almost 560 times faster than QP-SVDD in training. Thus, we can conclude that L-SVDD is much more time-efficient in training than QP-SVDD, while still possessing similar classification performance.

QP-SVDD found 79 support vectors, while L-SVDD extracted 85 support vectors, with a precision rate of 92.94% and a recall rate of 100%. These numbers indicate that L-SVDD and QP-SVDD obtain models with similar complexity. The precision and recall rates show that L-SVDD correctly finds all the support vectors while only mistakenly regarding a few training examples as support vectors.

We should mention that, by mathematical and experimental analysis, the original SVDD papers [31, 33] conclude that, for the SVDD model with Gaussian kernel, the number of support vectors decreases as the kernel parameter σ or the control parameter C increases. This conclusion should also hold for L-SVDD, because it is just another (although approximate) solution to the same problem. We tested this conjecture with different parameter settings and found that the number of support vectors for L-SVDD indeed follows the conclusion presented in [31, 33]. Since this issue is not particular to L-SVDD, we choose not to present the results.

Fig. 2 (a) On the CBCL face training dataset, the convergence of the L-SVDD solution α^k to the QP-SVDD solution α_Q in terms of the norm of the difference vector, together with a fitted exponential curve. (b) The evolution of the inner product (α^k)'(Qα^k − v) with the iteration number.

We denote the solution of QP-SVDD as α_Q, and calculate the norm of the difference between the k-th iteration solution of L-SVDD and the QP-SVDD solution, that is, ‖α^k − α_Q‖. The black curve in Fig. 2a represents the evolution of the norm ‖α^k − α_Q‖, which shows that the difference indeed decreases to zero exponentially. We further fit the obtained values of ‖α^k − α_Q‖ to an exponential function Ac^k, and we plot the fitted curve in Fig. 2a as the red dashed curve. The calculated values and the fitted values are close to each other, and this numerically verifies our theoretical result in Eq. (23).

The purpose of Algorithm 1 is to solve Eq. (14) with respect to α. To observe the evolution of the solution, Fig. 2b plots the inner product (α^k)'(Qα^k − v), which approaches 0 as the training program proceeds. In fact, when the algorithm terminated, the inner product value was very close to 0.

Table 1 The training time (t, in seconds) of QP-SVDD with different training set sizes (n); for the purpose of complexity analysis, the values of t/n^3 (× 10^{-6}) are also listed.

To numerically investigate the relationship between the training time of QP-SVDD and the training set size, we gradually increase the training set size and record the training time (in seconds) of QP-SVDD; the results are given in Table 1, in which we also list the ratio between the training time and the cube of the training set size. Table 1 shows that, as the training set size increases, the training time increases dramatically, and the majority of the ratios t/n^3 are around a constant (the average of the third row). This suggests that the training time complexity of QP-SVDD is around O(n^3).

We conducted the same experiment with L-SVDD, and Table 2 gives the training time of L-SVDD for different training set sizes; the numbers of iterations needed are

also shown. To verify our theoretical analysis in Remark 2, we calculated the time per iteration (tp) and the ratio between tp and n^2, which are also presented in Table 2.

Table 2 The training time (t, in seconds) of L-SVDD with different training set sizes (n); for the purpose of time complexity analysis, the iteration numbers needed to converge, the time per iteration (tp), and tp/n^2 (× 10^{-8}) are also listed.

Table 2 shows that the values of tp/n^2 are around a constant (the average of the fifth row). This indicates that the time per iteration is of order O(n^2); hence, the total training time complexity of L-SVDD is about the iteration number multiplied by O(n^2), which is consistent with the theoretical analysis in Remark 2.

In the derivation of Eq. (11), to absorb the constraint 1_n'α = 1 in Eq. (10), we introduced a parameter ρ in the penalty function, and the penalty function method requires that ρ be large enough to ensure the validity of the constraint [28]. To verify this, we randomly selected 600 face images as training data and ran the L-SVDD algorithm with different settings of ρ and C (since γ depends on C through γ = 0.95/C). Table 3 gives the values of 1_n'α for different parameter settings.

Table 3 On a subset of the face data, the values of 1_n'α with different parameter settings of ρ and C.

The results show that, as long as ρ is large enough, 1_n'α is reasonably close to 1. Moreover, for a fixed ρ, 1_n'α is almost the same for a variety of C values. This paper sets ρ = 200, which is sufficient to make the constraint hold approximately.

To experimentally investigate the impact of the parameters C, ρ, and γ on the convergence behavior of L-SVDD, we randomly select 600 face images and run the L-SVDD training algorithm with different parameter settings. We first fix C = 4 and ρ = 200, and change the value of γ. Table 4 lists the number of iterations needed to converge for different values of γ; it is observed that as γ increases, the algorithm converges faster.

Table 4 On a subset of the face data, the number of iterations needed for L-SVDD to converge with different values of γ (0.9/C, 0.92/C, 0.94/C, 0.95/C, 0.97/C, 0.98/C, and 0.99/C).

Next, we fix ρ = 100, set γ = 0.95/C, and change the value of C; Table 5 gives the number of iterations needed to converge for different values of

C. We see that a smaller C results in faster convergence.

Table 5 On a subset of the face data, the number of iterations needed for L-SVDD to converge with different values of C.

Finally, we fix C = 4 and γ = 0.95/C, and change the parameter ρ. Table 6 shows the number of iterations needed to converge with different values of ρ; it is evident that ρ hardly has any significant influence on the convergence rate. All these numerical results are consistent with our theoretical analysis in Sect. 3.2.

Table 6 On a subset of the face data, the number of iterations needed for L-SVDD to converge with different values of ρ.

To compare the interior point method based SVDD (IPM-SVDD) with the proposed L-SVDD, we randomly select 600 face images and train the two SVDD models with different parameter settings; the convergence measures are given in Table 7.

Table 7 The iteration numbers needed and the training times (in seconds) of IPM-SVDD and L-SVDD with different parameter settings (γ was set to 0.95/C).

Table 7 shows that, for all the parameter settings, although IPM-SVDD converges in fewer iterations, it spends considerably more training time than L-SVDD. This is expected, for the reason stated in Sect. 3.3. We also notice that in all cases the difference between the IPM solution (α_I) and the L-SVDD solution (α_L) was within 10^{-4} in terms of the norm of the difference vector, i.e., ‖α_I − α_L‖ < 10^{-4}, which means that the two methods obtain very close models. In our experiments, it was also observed that if we set the training set size to n = 1200, the IPM-SVDD training program ran into memory issues on our computer, while L-SVDD did not have such a problem even for larger training sets. This is because IPM needs variables for the gradient, Hessian, search directions, and other intermediate results, while L-SVDD does not have such intermediate results to store in memory. From our analysis and experimental results, we may claim that the proposed L-SVDD is not only time efficient but also space efficient, compared to IPM-SVDD.

4.3 Handwritten digit recognition

The handwritten digit dataset consists of 7291 training examples and 2007 testing examples, and is available at https://web.stanford.edu/~hastie/ElemStatLearn/. The dataset consists of normalized handwritten digits ("0" to "9"), each a 16 × 16 gray-scale image. As in the experiment in Sect. 4.2, we simply use the 256 pixel values as inputs to the algorithms.

To compare the performance of L-SVDD to that of QP-SVDD, we created ten binary classification problems, with each digit as the positive class, respectively. The performance measures of QP-SVDD and L-SVDD are given in Table 8, which lists the number of support vectors, the AUC on the testing set, and the training time for each algorithm. We observe that for all ten problems, the AUC values of the two algorithms are quite close: the difference between the AUCs is at most 0.001, and most problems have an even smaller AUC difference. The ROC curves of QP-SVDD and L-SVDD, similar to the results on the CBCL face data, are almost indistinguishable (as also indicated by the AUC values), so we choose not to present them.

Table 8 also lists the training times of QP-SVDD and L-SVDD on all ten problems, along with the sizes of the considered problems. It is observed that for the problems with size under 1000, L-SVDD terminates within 1 second; for the two problems with larger size, L-SVDD needs 2–3 s. To clearly illustrate the speed advantage of L-SVDD, Table 8 lists the ratio between the training times of QP-SVDD and L-SVDD for the different problems. It is clear that on most of the problems, L-SVDD is hundreds of times faster than the QP counterpart. More importantly, we should mention that in our implementation, the core quadratic programming code for QP-SVDD was developed in C++, which is much more computationally efficient than MATLAB, in which L-SVDD was developed.
Taking this factor into account, L-SVDD would be even more time-efficient relative to QP-SVDD if they were implemented in the same programming language and run on the same platform.

To gain further insight into the computational complexity of QP-SVDD, we compare the training time of QP-SVDD on each problem. Since the problem for digit "8" has the smallest size, we use it as the baseline. We calculate the ratio of the training time on each digit to that on digit "8", along with the cube of the problem size ratio, and the results are given in Table 9. Table 9 shows that the two numbers are close for each problem, which indicates that the training time of QP-SVDD grows roughly at the rate of O(n^3).

Table 11 lists the number of iterations needed for the L-SVDD training algorithm to converge. We see that the algorithm usually terminates within 600 iterations, with only one exception. To analyze the computational complexity of L-SVDD, we first calculate the average time spent on one iteration for each problem, using the information given in Tables 8 and 11. We denote the results as tp_0 through tp_9 ("tp" stands for time per iteration). As in the analysis

for QP-SVDD, we use digit "8" as the baseline, and calculate the ratio of the time per iteration in training on each digit to that on digit "8", along with the square of the problem size ratio. The results are presented in Table 10, which shows that these numbers are reasonably close for each problem, indicating that the time per iteration grows roughly with order O(n^2). Consequently, the time complexity of training L-SVDD (i.e., Algorithm 1) is about O(n^2) multiplied by the iteration number, which is consistent with the result in Sect. 4.2. This verifies our theoretical analysis in Remark 2.

Table 8 On the handwritten digit dataset, the performance comparison between QP-SVDD and L-SVDD: for each digit, the table lists the training set size, the number of support vectors found by each method, the AUC on the testing set for each method, and the training times in seconds; the last column gives the ratio between the training times of the two algorithms on each problem.

Table 9 For QP-SVDD, the cube of the problem size ratios and the training time ratios on the handwritten digit dataset, with digit "8" as the baseline (the results show that QP-SVDD roughly has computational complexity O(n^3)):
(n_0/n_8)^3 =        (n_1/n_8)^3 = 6.38   (n_2/n_8)^3 = 2.45   (n_3/n_8)^3 = 1.79   (n_4/n_8)^3 = 1.74
t_0/t_8 =            t_1/t_8 = 7.16       t_2/t_8 = 2.48       t_3/t_8 = 1.82       t_4/t_8 = 1.76
(n_5/n_8)^3 = 1.08   (n_6/n_8)^3 = 1.84   (n_7/n_8)^3 = 1.68   (n_8/n_8)^3 = 1      (n_9/n_8)^3 = 1.67
t_5/t_8 = 1.06       t_6/t_8 = 1.87       t_7/t_8 = 1.69       t_8/t_8 = 1          t_9/t_8 = 1.71

Table 10 For L-SVDD, the square of the problem size ratios and the ratios of the time per iteration in training on the handwritten digit dataset, with digit "8" as the baseline (the results show that L-SVDD roughly has computational complexity O(n^2) multiplied by the iteration number):
(n_0/n_8)^2 = 4.85   (n_1/n_8)^2 = 3.44   (n_2/n_8)^2 = 1.82   (n_3/n_8)^2 = 1.47   (n_4/n_8)^2 = 1.45
tp_0/tp_8 = 4.54     tp_1/tp_8 = 2.76     tp_2/tp_8 = 2.00     tp_3/tp_8 = 1.43     tp_4/tp_8 = 1.58
(n_5/n_8)^2 = 1.05   (n_6/n_8)^2 = 1.50   (n_7/n_8)^2 = 1.42   (n_8/n_8)^2 = 1      (n_9/n_8)^2 = 1.41
tp_5/tp_8 = 0.99     tp_6/tp_8 = 1.39     tp_7/tp_8 = 1.43     tp_8/tp_8 = 1        tp_9/tp_8 = 1.43

In our implementations of QP-SVDD and L-SVDD, the kernel matrix was calculated beforehand; thus the time spent on calculating the kernel matrix was not counted as training time. In fact, computing the kernel matrix took much more time than Algorithm 1 itself in our experiments (for digit "0", for example, constructing the kernel matrix took far longer than running Algorithm 1). Our theoretical

analysis of the computational complexity of QP-SVDD and L-SVDD is consistent with the experimental results, but the order of complexity alone does not seem to explain the results presented in Table 8, which show that L-SVDD is much more time-efficient than QP-SVDD. To explain this apparent paradox, we write the training time of QP-SVDD as t_Q = k_1 n^3, and that of L-SVDD as t_L = k_2 M n^2, where M is the iteration number of L-SVDD, n is the training set size, and k_1 and k_2 are constants. Using the information from Table 8 we can calculate the average value of k_1, and from Tables 8 and 11 we can calculate k_2; these values are comparable to those obtained in Sect. 4.2 from Tables 1 and 2. On average, k_1 is much larger than k_2 (about 170 times), and this explains why L-SVDD is so much more time-efficient than the QP counterpart.

Table 8 shows that QP-SVDD and L-SVDD extract roughly the same number of support vectors. To investigate the overlap of the sets of support vectors, we calculate the precision and recall rates, given in Table 11. The results show that all the recall rates are 100%, which means that L-SVDD extracts all the support vectors found by QP-SVDD, and the very high precision rates demonstrate that L-SVDD only mistakenly regards a few training examples as support vectors.

Table 11 Comparison of the support vectors extracted by QP-SVDD and those extracted by L-SVDD, in terms of precision and recall rates; the table also presents the iteration numbers needed for L-SVDD to converge:
Digit       0        1        2       3        4
Precision   98.20%   94.74%   100%    97.62%   98.86%
Recall      100%     100%     100%    100%     100%
Digit       5        6        7       8        9
Precision   98.95%   98.78%   98.41%  98.85%   97.01%
Recall      100%     100%     100%    100%     100%

Fig. 3 (a) On digit "4", the convergence of the L-SVDD solution α^k to the QP-SVDD solution α_Q in terms of the norm of the difference vector, together with a fitted exponential curve. (b) On digit "4", the evolution of the inner product (α^k)'(Qα^k − v) with the iteration number.

Similar to the experiment on the face dataset, on the problem for digit "4" we calculate the norm of the difference between the vector α^k from the k-th iteration of L-SVDD and the vector α_Q from QP-SVDD, shown as the black curve in Fig. 3a, which also gives the fitted exponential function as the red dashed curve. We once again see that the calculated values and the fitted values are very close to each other, and this numerically verifies the theoretical analysis presented in Eq. (23). Figure 3b presents the inner product (α^k)'(Qα^k − v) against the iteration number on the problem for digit "4", which clearly shows that the inner product approaches 0 as the training algorithm proceeds. In fact, when the algorithm terminated, the inner product value was again very close to 0. The

curves on the other digits show the same pattern, so we do not present them all.

5 Conclusion and future works

Support vector data description (SVDD) is a well known tool for data description and pattern classification, with wide applications. In the literature, the SVDD model is often trained by solving the dual of a constrained optimization problem, resulting in a quadratic programming problem. However, the quadratic programming is computationally expensive to solve, with time complexity about O(n^3), where n is the training set size. Using the sum of squared errors and the idea of the quadratic penalty function method, we formulate the Lagrangian dual problem of SVDD as minimizing a convex quadratic function over the nonnegative orthant, which can be solved efficiently by a simple iterative algorithm. The proposed Lagrangian SVDD (L-SVDD) algorithm is very easy to implement, requiring no particular optimization toolbox other than basic matrix operations. Theoretical and experimental analysis show that the L-SVDD solution converges to the quadratic programming based SVDD (QP-SVDD) solution r-linearly.

Extensive experiments were conducted on various pattern classification problems, and we compared the performance of L-SVDD to that of QP-SVDD in terms of ROC curve measures and training time. Our results show that L-SVDD has classification performance similar to its QP counterpart, and both L-SVDD and QP-SVDD extract almost the same set of support vectors. However, L-SVDD is a couple of hundred times faster than QP-SVDD in training. The experiments also verified the theoretical analysis of the convergence rate and the training time complexity of L-SVDD.

Due to the limit of available computing resources, we did not test L-SVDD on larger training sets. However, if the training set is too large, we conjecture that the performance of L-SVDD might deteriorate. One reason is that Algorithm 1 for L-SVDD needs to calculate the inverse of an n × n matrix Q, where n is the training set size; it is well known that inverting a large matrix is numerically unreliable and time/memory consuming. Another reason is that Algorithm 1 involves matrix–vector multiplication, whose computing cost scales with n^2. This work verifies the speed advantage of L-SVDD when the training set size is on the order of several thousand; it would be insightful to investigate the performance of L-SVDD on larger scale problems.

This paper formulates the optimization problem for L-SVDD as the linear complementarity problem shown in Eq. (14), and we propose Algorithm 1 to solve it. In the optimization literature, there are many methods for solving linear complementarity problems, for example, Newton's method [1], pivoting methods [16], and the methods in [10]. Thus, as a next step, we plan to apply these methods to the L-SVDD model. Similar to the works on support vector machines for classification [12] and regression [3], we could also remove the constraint in Eq. (13) by adding extra penalty terms, obtaining an unconstrained optimization problem for an approximated SVDD model. Then gradient based optimization methods could be applied to this approximation, for example, Newton's method [5], coordinate gradient descent [6], block-wise coordinate descent [19], or conjugate gradient methods [39, 40]. This is another possible extension of the current work.

Acknowledgements The author would like to thank the editors and four anonymous reviewers for their constructive suggestions which greatly helped improve the paper. This work was supported by a Summer Faculty Fellowship from Missouri State University.

Appendix: Orthogonality condition for two nonnegative vectors

We show that two nonnegative vectors a and b are perpendicular if and only if a = (a − γb)_+ for any real γ > 0.
If two nonnegative real numbers a and b satisfy ab = 0, then at least one of a and b is 0. If a = 0 and b ≥ 0, then for any γ > 0, a − γb ≤ 0, so that (a − γb)_+ = 0 = a; if a > 0, we must have b = 0, and then for any real γ > 0, (a − γb)_+ = (a)_+ = a. In both cases, a = (a − γb)_+ for any real number γ > 0.

Conversely, assume that two nonnegative real numbers a and b satisfy a = (a − γb)_+ for any real number γ > 0. If a and b are both strictly positive, then a − γb < a since γ > 0. Consequently, (a − γb)_+ < a, which contradicts the assumption that a = (a − γb)_+. Thus at least one of a and b must be 0, i.e., ab = 0.

Now assume that the nonnegative vectors a and b in R^p are perpendicular, that is, Σ_{i=1}^{p} a_i b_i = 0. Since both a and b are nonnegative, there must be a_i b_i = 0 for i = 1, 2, …, p. By the argument above, this is equivalent to a_i = (a_i − γb_i)_+ for any γ > 0 and any i = 1, 2, …, p. In vector form, we have a = (a − γb)_+.

References

1. Aganagić M (1984) Newton's method for linear complementarity problems. Math Program 28(3)
2. Armand P, Gilbert JC, Jan-Jégou S (2000) A feasible BFGS interior point algorithm for solving convex minimization problems. SIAM J Optim 11(1)


More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Maximal Margin Classifier

Maximal Margin Classifier CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution. Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b Int J Contemp Math Scences, Vol 3, 28, no 17, 819-827 A New Refnement of Jacob Method for Soluton of Lnear System Equatons AX=b F Naem Dafchah Department of Mathematcs, Faculty of Scences Unversty of Gulan,

More information

Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis Fsher Lnear Dscrmnant Analyss Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan Fsher lnear

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Lagrange Multipliers Kernel Trick

Lagrange Multipliers Kernel Trick Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Norms, Condition Numbers, Eigenvalues and Eigenvectors

Norms, Condition Numbers, Eigenvalues and Eigenvectors Norms, Condton Numbers, Egenvalues and Egenvectors 1 Norms A norm s a measure of the sze of a matrx or a vector For vectors the common norms are: N a 2 = ( x 2 1/2 the Eucldean Norm (1a b 1 = =1 N x (1b

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

An Interactive Optimisation Tool for Allocation Problems

An Interactive Optimisation Tool for Allocation Problems An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Approximate Smallest Enclosing Balls

Approximate Smallest Enclosing Balls Chapter 5 Approxmate Smallest Enclosng Balls 5. Boundng Volumes A boundng volume for a set S R d s a superset of S wth a smple shape, for example a box, a ball, or an ellpsod. Fgure 5.: Boundng boxes Q(P

More information

CHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG

CHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG Chapter 7: Constraned Optmzaton CHAPER 7 CONSRAINED OPIMIZAION : SQP AND GRG Introducton In the prevous chapter we eamned the necessary and suffcent condtons for a constraned optmum. We dd not, however,

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

Large-scale packing of ellipsoids

Large-scale packing of ellipsoids Large-scale packng of ellpsods E. G. Brgn R. D. Lobato September 7, 017 Abstract The problem of packng ellpsods n the n-dmensonal space s consdered n the present work. The proposed approach combnes heurstc

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k) ISSN 1749-3889 (prnt), 1749-3897 (onlne) Internatonal Journal of Nonlnear Scence Vol.17(2014) No.2,pp.188-192 Modfed Block Jacob-Davdson Method for Solvng Large Sparse Egenproblems Hongy Mao, College of

More information

Support Vector Machines. Jie Tang Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University 2012

Support Vector Machines. Jie Tang Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University 2012 Support Vector Machnes Je Tang Knowledge Engneerng Group Department of Computer Scence and Technology Tsnghua Unversty 2012 1 Outlne What s a Support Vector Machne? Solvng SVMs Kernel Trcks 2 What s a

More information

Linear Classification, SVMs and Nearest Neighbors

Linear Classification, SVMs and Nearest Neighbors 1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush

More information

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

= = = (a) Use the MATLAB command rref to solve the system. (b) Let A be the coefficient matrix and B be the right-hand side of the system.

= = = (a) Use the MATLAB command rref to solve the system. (b) Let A be the coefficient matrix and B be the right-hand side of the system. Chapter Matlab Exercses Chapter Matlab Exercses. Consder the lnear system of Example n Secton.. x x x y z y y z (a) Use the MATLAB command rref to solve the system. (b) Let A be the coeffcent matrx and

More information

SIO 224. m(r) =(ρ(r),k s (r),µ(r))

SIO 224. m(r) =(ρ(r),k s (r),µ(r)) SIO 224 1. A bref look at resoluton analyss Here s some background for the Masters and Gubbns resoluton paper. Global Earth models are usually found teratvely by assumng a startng model and fndng small

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Random Walks on Digraphs

Random Walks on Digraphs Random Walks on Dgraphs J. J. P. Veerman October 23, 27 Introducton Let V = {, n} be a vertex set and S a non-negatve row-stochastc matrx (.e. rows sum to ). V and S defne a dgraph G = G(V, S) and a drected

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1 C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information