9 Adaptive Soft K-Nearest-Neighbour Classifiers with Large Margin


Abstract - A novel classifier is introduced to overcome the limitations of k-NN classification systems. It estimates the posterior class probabilities using a local Parzen window estimation with the k nearest-neighbour prototypes (in the Euclidean sense) to the pattern to classify. A learning algorithm to reduce the number of prototypes, one that maximises the confidence, or margin, on the correct classification of training patterns, is also presented. Experimental results in two handwritten classification problems demonstrate the potential of the proposed classification system.

Index Terms - Soft Nearest Neighbour Classifiers, Online Gradient Descent, Handwritten Character Recognition.

1. Introduction

Nearest neighbour (NN) techniques are simple but powerful non-parametric classification systems. Since their introduction in the fifties, many sophistications of the basic scheme have appeared, concerned with different topics such as condensing, editing, learning, or extensions to include rejection. (See (Dasarathy, 1991) for a review of much of the existing literature.) In recent years, interest in these methods has flourished again in several fields (including statistics, machine learning and pattern recognition), since they remain the best choice in many classification problems.

This chapter presents a novel classification learning architecture in the context of condensed k-NN classifiers, with the goal of enhancing their generalization properties. In learning systems, generalization performance is affected by a trade-off between the number of training examples and the capacity (e.g. the number of parameters) of the learning machine. This global trade-off can be reinterpreted in local learning systems (like NN methods) as a trade-off between capacity and locality (how local the solution is) (Bottou & Vapnik, 1992). One implication is that the learning system must control the effective number of training samples available for training the system locally. k-NN techniques are good examples of such local learning systems: for every testing pattern, they estimate posterior class probabilities with a fixed number of training points. In this way, good generalization is guaranteed, since there is always enough data to compute the estimations. By contrast, these systems have poor approximation capabilities, because they estimate using a ratio of integers. The quality of the k-NN approximation can be simply improved if, instead of the usual ratio, we use a Parzen window estimate computed with the k nearest training samples. The resulting soft estimation could then benefit from the virtues of both k-NN and Parzen estimates, since it improves the approximation quality of crisp k-NN estimators by means of a smooth interpolation while retaining their control over the training points that contribute to the estimations. However, an important disadvantage of the method, inherited from its ancestors, is that all of the training data must be retained to compute the estimations. So we also present a more sophisticated version of the algorithm that allows fewer data points to be used. This includes a learning algorithm to compute a condensed set of prototypes from the training data.

The organization of this chapter is as follows. Section 2 derives the soft k-NN classifier in the context of local learning algorithms (Bottou & Vapnik, 1992). In Section 3 we introduce an adaptive version of the previous classification rule: the proposed learning algorithm designs a condensed set of prototypes that minimizes the empirical average number of misclassifications.

Section 4 shows experimental results that successfully compare the adaptive soft k-NN classifier with other algorithms using two NIST handwritten character databases (Garris, 1997). Finally, in Sections 5 and 6 some discussion and conclusions are given.

2. The soft k-NN classifier

Let c: X → {1, ..., C} be a maximum classifier that assigns an input vector x to one of the C existing classes. This kind of classifier is defined in terms of a set of discriminant functions d_l(x), l = 1, ..., C: the classifier c assigns x to class i if d_i(x) > d_j(x) for all j ≠ i. In the case of equal misclassification costs, c achieves the minimum expected number of misclassifications when d_l is the posterior probability of belonging to class l, P(C_l | x). Suppose we have N observation pairs T = {(x_i, y_i)}, i = 0, ..., N-1, where x_i is a random sample taken from the input space X that belongs to one of the classes, and y_i is an indicator variable: if x_i belongs to the nth class, the nth coefficient of y_i is equal to 1 and the other coefficients are equal to 0. To ensure that c is a good classification procedure, each discriminant function d_l(x) must estimate the binary random variable y_l (equal to 1 if x belongs to the lth class and 0 otherwise). In local algorithms this estimation is defined, for each neighbourhood around the testing pattern x, as the one that minimizes a weighted empirical average loss function of the following form ((Bottou & Vapnik, 1992), p. 892):

    I_{emp}^{local}[d_l] = \sum_{i=0}^{N-1} G(x_i - x; \gamma) \, J(y_{il}, d_l(x_i; \gamma)), \qquad d_l^*(x; \gamma) = \arg\min_{d_l} I_{emp}^{local}[d_l], \quad l \in \{1, ..., C\}    (1)

where the kernel G is a bounded and even function on X that is peaked about 0, γ denotes the locality control parameters associated with G, and d_l^*(·; γ), l = 1, ..., C, are the optimal discriminant functions. If we use a constant approximation d_l (with respect to x) of y_l and a quadratic loss J(y_l, d_l) = (y_l - d_l)^2, solving ∂I_{emp}^{local}[d_l]/∂d_l = 0, l = 1, ..., C, yields

    d_l^*(x; \gamma) = \frac{\sum_{i=0}^{N-1} G(x_i - x; \gamma) \, y_{il}}{\sum_{i=0}^{N-1} G(x_i - x; \gamma)}    (2)

As we see from inspecting (2), these optimal local discriminant functions compute kernel-based estimates of the posterior probabilities of the classes ((Bishop, 1995), §2.5). The shape of the kernel G, controlled by γ, then determines different well-known solutions to the same original problem. If G = G_H, where G_H is a square kernel whose width is adjusted to contain exactly k samples, then equation (2) is the k-NN algorithm. On the other hand, if G = G_S, where G_S is a smooth kernel with a width modulated by a locality parameter σ, then (2) is the Parzen window algorithm.

The k-NN technique considers a hypersphere centered at the testing pattern x that contains exactly k training samples. This feature prevents the estimator from running out of training data, ensuring good generalization; in practice, however, its estimates approximate the posterior probabilities poorly, since the region around x is not small enough to keep the estimations, a ratio of integers (k_l/k), valid. By contrast, the Parzen window algorithm can estimate better, since it is based on a smooth interpolation between training points, although it does not effectively control the trade-off between the capacity of the learning device and the number of training samples. This is due to using a fixed width parameter σ for all the training points: if the distribution of the training patterns in the input space is uneven, it will be impossible to ensure that enough training data contribute significantly to the estimations for every testing pattern x.

We propose an approach that benefits from the virtues of these two techniques and can be understood as a compromise between them: for each testing pattern x, a Parzen window estimate is computed using the k nearest (in the Euclidean sense) training points. In fact, this algorithm results from using a composed kernel G_soft = G_H · G_S in equation (2). As we show in the Appendix, the resulting equation also estimates the posterior class probabilities. Equation (2) is then called the soft k-NN algorithm and can be written as follows:

    d_l^*(x; \gamma) = \frac{\sum_{u=1}^{k_l} G(x_{lu} - x; \gamma)}{\sum_{m=1}^{C} \sum_{u=1}^{k_m} G(x_{mu} - x; \gamma)}    (3)

where C_l^{k-NN} = {x_{lu}, u = 1, ..., k_l} are those prototypes of class l among the k nearest prototypes to x, and \sum_{l=1}^{C} k_l = k. We remark that the (crisp) k-NN algorithm is recovered from equation (3) in the limit σ → ∞. In the particular case k = 2, the resulting classifier is equivalent to the 1-nearest-neighbour classification rule, since only two prototypes are involved in the decision. Figure 1 shows the classification borders for Ripley's synthetic problem (Ripley, 1994) in the case of the soft k-NN and crisp k-NN (σ → ∞) rules. Note that the soft k-NN algorithm can smooth the crisp classification border, where σ controls the degree of smoothness.
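As a concrete illustration of rule (3), the following sketch (in Python with NumPy; the function and variable names are ours, not the chapter's) computes the soft k-NN posterior estimates from a set of labelled prototypes with a Gaussian kernel:

```python
import numpy as np

def soft_knn_posteriors(x, prototypes, labels, n_classes, k, sigma):
    """Soft k-NN estimate of P(C_l|x), as in equation (3): a Gaussian
    Parzen window evaluated over the k nearest prototypes only.
    prototypes: (P, dim) array; labels: (P,) int array of their classes."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(d2)[:k]                 # indices of the k nearest
    w = np.exp(-d2[nearest] / sigma ** 2)        # Gaussian kernel weights
    post = np.bincount(labels[nearest], weights=w, minlength=n_classes)
    return post / post.sum()
```

A hard decision assigns x to the class with the highest returned value. As sigma grows, all weights tend to 1 and the estimate reduces to the crisp vote k_l/k, while with k = 2 the rule behaves as 1-NN, as noted above.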

[Figure 1: panels show classification borders for k = 3, 11, 33, 55 and 99, each with σ → ∞ (the crisp rule) and several finite values of σ between 0.001 and 0.1.]

Fig. 1. Soft & crisp k-nearest-neighbour classification borders on Ripley's synthetic problem for several values of σ and k. (The soft version uses Gaussian kernels.)

3. The adaptive soft k-NN classifier

Given a training set T, we can consider applying (2) using G_soft, classifying a new input into the class whose discriminant function has the highest value. However, when the training set size is large, storage and processing requirements make this simple algorithm unusable in engineering applications. In the case of (crisp) k-NN classifiers, many heuristic sophistications of the basic algorithm allow fewer data points to be used (e.g. Condensed NN (Hart, 1968)). By contrast, our approach can simply be extended by replacing the local kernel estimation with a simplified adaptive mixture model (McLachlan & Basford, 1988). Suppose we define for each class density function a mixture as follows:

    f_{X|C_l}(x) \, P(C_l) = \sum_{j=1}^{M} f_{X|C_l, j}(x)    (4)

where {f_{X|C_l}, l = 1, ..., C} are the class density probabilities, {P(C_l), l = 1, ..., C} are the priors, and {f_{X|C_l, j}, j = 1, ..., M, l = 1, ..., C} are known parametric modes restricted to the form G(x - w_{lj}; γ), with G(u) a decreasing function peaked about u = 0. Applying Bayes' theorem, we arrive at the following expression for the posterior probabilities:

    P(C_l | x) = \frac{\sum_{j=1}^{M} f_{X|C_l, j}(x)}{\sum_{m=1}^{C} \sum_{u=1}^{M} f_{X|C_m, u}(x)}    (5)

Since each mode of the mixture is locally defined, we can approximate equation (5) using only the k nearest modes to the testing pattern x. If we use this simplification to construct the classifier, the discriminant functions give

    P(C_l | x) \approx \frac{\sum_{j=1}^{k_l} f_{X|C_l, j}(x)}{\sum_{m=1}^{C} \sum_{u=1}^{k_m} f_{X|C_m, u}(x)}    (6)

where {f_{X|C_l, u}(x), l = 1, ..., C, u = 1, ..., k_l} are the k nearest modes to x. Since the mixture modes are locally defined around their centers w_{lj}, P(C_l | x) can then be approximated by

    P(C_l | x) \approx \frac{\sum_{u=1}^{k_l} G(w_{lu} - x; \gamma)}{\sum_{m=1}^{C} \sum_{u=1}^{k_m} G(w_{mu} - x; \gamma)}    (7)

where {w_{lu}, u = 1, ..., k_l, l = 1, ..., C} are the k nearest mixture centers to x and \sum_{l=1}^{C} k_l = k. The total number of mixture modes (M·C) will typically be much smaller than the number of training points (N), and the mixture mode centers (w_{lj}) are no longer constrained to coincide with the training points. The parameters of the mixture modes could be estimated using unsupervised procedures such as the EM algorithm (Dempster, 1977), although the approach followed here makes use of a supervised learning procedure based on minimizing the number of misclassifications.

3.1. The loss function

Consider a training set of random pairs T = {(x_i, y_i)}, i = 0, ..., N-1. If the pattern x_i belongs to the nth class, the nth coefficient of y_i is equal to 1 and the other coefficients are equal to 0. As a loss function we propose a modification of the cross-entropy error function for multiple classes ((Bishop, 1995), §6.9):

    L = 1 - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{l=1}^{C} y_{il} \, d_l(x_i)    (8)

(We will see in the next subsection, when we derive the learning equations, why equation (8) can be of more practical interest than the cross-entropy error.) The absolute minimum with respect to the {d_l(x_i)} occurs when all the training points are correctly classified with the highest confidence (or maximum margin), that is, when d_l(x_i) = y_{il} for all i and l. In the limit in which the size N of the training set goes to infinity, we can substitute the sum over patterns with the following integral:

    \langle L \rangle = \lim_{N \to \infty} L = 1 - \sum_{l=1}^{C} \int_{\mathbb{R}^d} y_l(x) \, d_l(x) \, f_X(x) \, dx    (9)

where f_X is the probability density of the input space.
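In code, the empirical loss (8) is simply one minus the average discriminant value assigned to the correct classes. A minimal sketch under our own naming (D holds the soft k-NN outputs, Y the one-hot targets):

```python
import numpy as np

def average_margin_loss(D, Y):
    """Empirical loss of equation (8): L = 1 - (1/N) sum_i sum_l y_il * d_l(x_i).
    D: (N, C) discriminant outputs; Y: (N, C) one-hot labels.  L attains its
    absolute minimum, 0, when every pattern is classified with maximum
    confidence, i.e. d(x_i) = y_i."""
    return 1.0 - np.mean(np.sum(Y * D, axis=1))
```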

Since y_l(x) = 1(x ∈ class l), where 1 is the indicator function

    1(\text{condition}) = \begin{cases} 1 & \text{if condition is true} \\ 0 & \text{otherwise} \end{cases}    (10)

and f_X(x) = \sum_{l=1}^{C} f_{X|C_l}(x) P(C_l), equation (9) can easily be rewritten as

    \langle L \rangle = 1 - \sum_{l=1}^{C} \left[ \int_{R_l} d_l(x) \, f_{X|C_l}(x) P(C_l) \, dx + \int_{\mathbb{R}^d \setminus R_l} d_l(x) \, f_{X|C_l}(x) P(C_l) \, dx \right]    (11)

where R_l is the decision region of class l, that is, the input region in which d_l has the highest value among the discriminant functions. ⟨L⟩ would be minimized when d_l = 1 for all x in R_l and f_{X|C_l}(x) P(C_l) = max_m f_{X|C_m}(x) P(C_m) for all x in R_l. But, as we know, the best possible choice for d_l is P(C_l | x). If we use an optimization method to minimize ⟨L⟩ iteratively, then, since the algorithm will try to minimize the number of misclassifications, a convergence of d_l to P(C_l | x) is possible (at least near the Bayes borders). In this case, if class overlapping is moderate, the maximum expected number of correct classifications could be approximated from ⟨L⟩_min,

    \langle L \rangle_{min} \approx 1 - \sum_{l=1}^{C} \int_{R_l} P(C_l | x) \, f_{X|C_l}(x) P(C_l) \, dx    (12)

This latter result gives us, when N is large enough, a possible tendency of the system from minimizing equation (8) during the learning phase.

3.2. The learning algorithm

In real-life pattern recognition problems, training sets are usually large and high dimensional. So algorithmic simplicity of the learning procedure, or of the optimization process, is a requirement in order to achieve a solution with moderate computational resources. Online gradient descent algorithms (Bottou, 1998) are among the simplest computational approaches to the learning problem. Other features of these algorithms include better convergence speed than their batch counterparts on redundant training sets (e.g. real-world databases) and a good mathematical characterization (Benveniste, 1990)(Bottou, 1998).

3.2.1. Derivation of the online gradient learning algorithm

We restrict the mixture model parameters to the centers and a globally defined parameter σ that controls locality. The goal of the learning phase is thus reduced to computing a set of labelled codebooks C_l = {w_{lj}, j = 1, ..., M}, l = 1, ..., C, one for each class, that minimizes an estimator of the expected loss function given in equation (9),

    \langle L \rangle(\{w_{lj}\}) = E_X[\, Q(x, \{w_{lj}\}) \,]    (13)

Since E_X[·] is unknown, an empirical estimator is formed with a (training) set of random sample pairs L = {(x_i, class index(x_i))}, i = 0, ..., N-1. The online gradient learning algorithm estimates using the instantaneous loss function Q(x, {w_{lj}}). Each step of this algorithm consists of picking (cyclically or randomly) one sample pair from the training set L and applying the following update equation:

    w_{lj}[k+1] = w_{lj}[k] - \eta[k] \, H(w_{lj}[k], x[k])    (14)

The learning rates η[k] are positive numbers, usually made to decrease monotonically with the discrete time k, although they can also be constant in time. The term H(w_{lj}, x) is defined as

    H(w_{lj}, x) = \frac{\partial Q(x, \{w_{lj}\})}{\partial w_{lj}} \ \text{where the derivative exists, and } 0 \text{ otherwise}    (15)

Given a training set of infinite size, the necessary (but not sufficient, see (Bottou, 1998), §5) condition to ensure the convergence of the iterative equation (14) to a local minimum of ⟨L⟩ is that

    E_X[H(w_{lj}, x)] = \frac{\partial \langle L \rangle(\{w_{lj}\})}{\partial w_{lj}}    (16)

Since Q is non-differentiable only on a finite number of points and the iterative algorithm has zero probability of reaching them, this condition is met (see (Bottou, 1998), p. 10). In the case of using a constant η, we must ensure that η is small enough (see (Benveniste et al., 1990), p. 48). Practically, an optimal value of η can be estimated using a validation set.
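The update (14) amounts to a generic online loop of the following form; this is a schematic rendering with our own names, where H is supplied as a callable and eta maps the step counter to a learning rate:

```python
def online_gradient_descent(w, training_pairs, eta, H, sweeps=1):
    """Online gradient descent of equation (14):
    w[k+1] = w[k] - eta[k] * H(w[k], x[k]), picking samples up cyclically."""
    k = 0
    for _ in range(sweeps):
        for x, label in training_pairs:       # one cyclic pass over L
            w = w - eta(k) * H(w, x, label)   # instantaneous gradient step
            k += 1
    return w
```

Here eta can implement a monotonically decreasing schedule, e.g. eta = lambda k: eta0 / (1 + k / tau), or simply return a small constant, as discussed above.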

Let G(x - w_{lj}) be exp(-\|x - w_{lj}\|^2 / \sigma^2), and let L' = (\sigma^2/2) L and \eta'[k] = (\sigma^2/2) \eta[k]. We have modified the loss function for analytical convenience, since the added constant does not affect the desired points of convergence. Hence equation (15) yields

    H(w_{lj}, x) = \begin{cases}
      -d_{W,lj}(x) \, (1 - d_c(x)) \, (x - w_{lj}) & \text{if } w_{lj} \in C^{k\text{-NN}}(x) \text{ and } l = c \\
      \ \ d_{W,lj}(x) \, d_c(x) \, (x - w_{lj}) & \text{if } w_{lj} \in C^{k\text{-NN}}(x) \text{ and } l \neq c \\
      \ \ 0 & \text{otherwise}
    \end{cases}    (17)

where c = class index(x), C^{k-NN}(x) are the k nearest prototypes (in the Euclidean sense) to x, and d_{W,lj} is equal to

    d_{W,lj}(x) = \frac{\exp(-\|x - w_{lj}\|^2 / \sigma^2)}{\sum_{m=1}^{C} \sum_{u=1}^{k_m} \exp(-\|x - w_{mu}\|^2 / \sigma^2)}    (18)

As σ goes to infinity, exp(-\|\cdot\|^2/\sigma^2) tends to unity, so H can be simplified, since d_{W,lj} tends to 1/k and d_l to k_l/k. Finally, the learning equation gives

    w_{lj}[k+1] = \begin{cases}
      w_{lj}[k] + \eta[k] \, d_{W,lj}(x[k]) \, (1 - d_c(x[k])) \, (x[k] - w_{lj}[k]) & \text{if } w_{lj}[k] \in C^{k\text{-NN}}(x[k]) \text{ and } l = c \\
      w_{lj}[k] - \eta[k] \, d_{W,lj}(x[k]) \, d_c(x[k]) \, (x[k] - w_{lj}[k]) & \text{if } w_{lj}[k] \in C^{k\text{-NN}}(x[k]) \text{ and } l \neq c \\
      w_{lj}[k] & \text{otherwise}
    \end{cases}    (19)

where c = class index(x[k]) and x[k] = x_{(k \bmod N)} if we pick the samples up cyclically.
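Putting (17)-(19) together, one online step of the learning rule can be sketched as follows (our reconstruction and naming; W stacks all M·C prototype centers row-wise):

```python
import numpy as np

def soft_knn_update(W, proto_labels, x, c, k, sigma, eta):
    """One online update of rule (19).  W: (M*C, dim) prototype matrix,
    proto_labels: (M*C,) int array of their class indices, x: training
    pattern, c: its class index, eta: the current learning rate."""
    d2 = np.sum((W - x) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]                  # the k nearest prototypes
    g = np.exp(-d2[nearest] / sigma ** 2)
    d_W = g / g.sum()                             # d_W,lj of equation (18)
    d_c = d_W[proto_labels[nearest] == c].sum()   # discriminant of the true class
    for dwj, idx in zip(d_W, nearest):
        if proto_labels[idx] == c:                # attract same-class winners
            W[idx] += eta * dwj * (1.0 - d_c) * (x - W[idx])
        else:                                     # repel other-class winners
            W[idx] -= eta * dwj * d_c * (x - W[idx])
    return W
```

Prototypes outside the k nearest are left untouched, matching the third case of (19).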

3.2.2. Properties of the learning equation

As we pointed out before, the proposed loss function has the same minimum points as the cross-entropy error function

    L_{cross\text{-}entropy} = -\frac{1}{N} \sum_{i=0}^{N-1} \sum_{l=1}^{C} y_{il} \ln d_l(x_i)    (20)

If we derive the online gradient equations for this error function, we obtain equation (17) divided by d_l(x). So the practical differences between performing online gradient descent with one loss function or the other must be determined by analysing how this term can affect the evolution of the optimization process. Suppose that several outliers are present in the training set; this is a typical situation in real-world databases. When we pick an outlier of class l, the amount of change in a winning prototype w_{lj} will be

    \Delta w_{lj}^{cross\text{-}entropy} = \eta \, d_{W,lj}(x) \, \frac{1 - d_l(x)}{d_l(x)} \, (x - w_{lj})    (21)

Since it is probable that d_l(x) takes a low value when x is an outlier of class l, the term (1 - d_l(x))/d_l(x) tends to infinity. In this way, Δw^{cross-entropy} can be arbitrarily large, and the normal evolution of the learning process can be enormously degraded by the presence of outliers. By contrast, if we use our proposed loss function, outliers have little impact, since the term d_{W,lj} dominates the amount of change and tends to zero as the distance between the winning prototype and the outlier grows large.

4. Empirical Results

In this section, we compare Kohonen's LVQX algorithms (Kohonen, 1996) (where X is 1, 2.1 and 3) and k-NN classifiers with our proposal. We have used the NIST (uppercase & lowercase) handwritten characters database (Garris, 1997), which can be found in the directories train/hsf_4, train/hsf_6 and train/hsf_7. We have split the directory train/hsf_7 into two sets, one for validation and one for test (over 10,000 samples), preserving the original class distribution of the data; the other two directories were chosen for training. The images (32x32 pixels) have been preprocessed with a Principal Component Analyzer that extracts 64 components from each image (NIST's mis2evt utility). The correlation matrix of the PCA was computed using the training set.

4.1. Simulations #1: Adaptive Soft k-NN vs. LVQ

Ten runs have been made for each classification problem, learning algorithm and several codebook sizes. We have initialized the codebooks using LVQ_PAK's eveninit program (Kohonen et al., 1995). We have then applied every learning algorithm until the classification error on the validation set increases or stalls. (We have monitored this error at a fixed number of iterations; in the case of the LVQX algorithms, every time we monitored the error we restarted their α value.) Optimal parameters of the algorithms have been estimated using the validation set.

4.2. Results

Two main aspects are of interest in this section. The first concerns the comparison of our classifier with its crisp counterparts when our system is not constrained to use the crisp k-NN estimation. The second compares the classifiers when all of them make use of this classification rule. In this way we can evaluate whether the use of a more complex classification procedure, the soft k-NN estimation, is of practical interest and, secondly, what happens when we force our system to compete with other well-established methods for designing prototype sets for crisp k-NN classifiers.

Tables 1 and 2 summarize the average classification error on the uppercase and lowercase test sets. Each result is the average of ten trials for each learning algorithm (except in the case of k-NN classifiers, whose test error is computed once using the whole training set as the prototype set). In Table 1, we show the average misclassification rate of our proposed classifier in four different cases. In the first case, we fix the number of nearest prototypes k equal to 2 and σ to infinity. In the second case, with k fixed to 2, we make use of the best σ value estimated by cross-validation. In these first two cases, the resulting classification systems are crisp 1-NN classifiers, since only the two nearest prototypes contribute to the soft estimations. In the third case, we fix σ to infinity and make use of the best estimated k value (with k greater than 2); our classifier is then the crisp k-NN classifier. Finally, in the fourth case, we constrain k to be greater than 2 and allow σ to take a finite value; both parameters are estimated with the validation set under these constraints. Table 2 shows the classification results of 1-NN classifiers whose codebooks are designed using Kohonen's LVQ algorithms, and of crisp k-NN classifiers that use the whole training database as the prototype set.

The condensed soft k-NN classifiers (the fourth case) always outperform crisp 1-NN classifiers based on LVQ algorithms for each tested codebook size. In uppercase and lowercase recognition, the best results are achieved when the codebook size is 800; in this case, our classification systems also outperform k-NN classifiers that use the whole training database as the prototype set. On average, our learning system correctly recognized 94,62% of the uppercase handwritten letters and 87,46% of the lowercase handwritten characters, against 91,77% and 85,94% respectively for LVQ2.1, the best of the LVQ algorithms. Differences between our procedure and the LVQ algorithms grow notably greater as the codebook size decreases. We also note that the evolution of the misclassification rate as the codebook size grows is less abrupt with our procedure. This behaviour suggests that our procedure achieves a more graceful degradation of the classification accuracy as the codebook size gets smaller. Our best crisp 1-NN classifiers, the best solutions of the first two cases, clearly outperform the LVQ-based crisp 1-NN classifiers except in one case: when the codebook size is equal to 200, the codebooks designed with the LVQ2.1 algorithm are better than ours.

It is interesting to compare the classification accuracy between case 1 and case 2, since both learning algorithms design codebooks for a crisp 1-NN classifier, although they differ in the way of doing it. While the first algorithm uses an infinite σ, the second one does not, and can better define how local the solution is, since σ regulates how many training data contribute to updating each prototype during the learning phase. In both classification problems we observe the same behaviour: for small codebooks (100 and 200), the effect of using a finite σ allows an optimal solution to be achieved, whereas the use of an infinite σ works better for medium codebooks. Classifiers of the third case only achieve better classification accuracy than k-NN classifiers (which use the whole training set as prototypes) when their codebook size is 800, and they are inferior to the best solutions achieved in the above crisp cases. Finally, we compare the best solutions of cases 1, 2 and 4: the soft estimation is superior to the crisp estimation in six out of eight cases. Only in lowercase handwritten recognition is the crisp estimation superior on average, when the codebook sizes are 400 and 800.

Codebook   Upper                                                        Lower
size       σ→∞, k=2   best σ<∞, k=2   σ→∞, best k>2   best σ<∞, best k>2   σ→∞, k=2   best σ<∞, k=2   σ→∞, best k>2   best σ<∞, best k>2
100        18,2       14,52           25,95 (3)       11,94 (5)            24,7       20,92           31,0 (3)        18,8 (11)
200        10,98      10,63           15,8 (3)        9,0 (5)              17,62      17,26           22,69 (3)       15,93 (5)
400        7,36       7,97            10,09 (3)       6,53 (5)             13,3       15,28           17,3 (3)        14,02 (5)
800        5,45       7,22            5,74 (3)        5,38 (11)            11,28      14,1            11,82 (3)       12,54 (5)

Table 1. Experimental results for our proposed classifier. We show for each algorithm the average test error over ten runs. The number in parentheses denotes the value of k.

Codebook        Upper                                 Lower
size            LVQ1    LVQ2.1   LVQ3    k-NN         LVQ1    LVQ2.1   LVQ3    k-NN
100             21,5    15,82    16,02   -            27,89   22,4     22,1    -
200             11,3    9,94     11,4    -            20,85   16,8     18,39   -
400             11,8    8,24     9,08    -            17,27   14,84    15,1    -
800             11,6    8,23     8,26    -            15,53   14,06    14,43   -
Whole database  -       -        -       ,84 ()       -       -        -       , (7)

Table 2. Experimental results for LVQ-based and k-NN classifiers. We show for each algorithm the average test error over ten runs, except in the case of k-NN classification. For k-NN classifiers, the number in parentheses denotes the value of k.

4.3. Simulations #2: The utility of soft outputs for post-processing purposes

An additional feature of the soft k-NN approach is that soft labels can be assigned to test patterns. This can enhance the overall classification accuracy when some post-processing is performed using context (e.g. relations between patterns in time). This section explores how the misclassification rate improves when we accept that the correct solution was among the top n greatest activations of the discriminant functions. In this way, we can detect whether a correct solution is among the top n greatest activations, so that a post-processor could then recover the correct solution. We use the best soft k-NN classifiers of Section 4.2 for the upper problem, and now we perform ten runs for each classifier, checking the validation error at fixed intervals of steps. Tables 3 and 4 show the averaged validation and test error, respectively, using the top n highest activations. (The classifier makes a correct decision if the label of the training pattern agrees with one of the labels of the discriminant functions with highest activation.) Note that the top-1 results in Table 4 are slightly different from the fourth column of Table 1, since the classification error is now checked at different points, so the algorithm stops at different solutions. As Tables 3 and 4 show, if a correct solution is allowed to be among the top 2 highest activations, the classification error is reduced by more than a half (except in the case of 100 prototypes).

[Table 3: averaged validation error for the top 1 to top 4 highest activations, for each codebook size.]

Table 3. Averaged validation error over ten runs for the best soft k-NN classifiers in the upper problem. (Note: a classification error is produced if the label of the training pattern does not agree with one of the labels of the discriminant functions with highest activation.)

[Table 4: averaged test error for the top 1 to top 4 highest activations, for each codebook size.]

Table 4. Averaged test error over ten runs for the best soft k-NN classifiers in the upper problem.
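The top-n errors reported in Tables 3 and 4 can be computed from the soft outputs as in the following sketch (a hypothetical helper; D holds one row of discriminant activations per pattern and y the true labels):

```python
import numpy as np

def top_n_error(D, y, n):
    """Fraction of patterns whose true label is not among the n classes
    with the highest discriminant activations."""
    top = np.argsort(D, axis=1)[:, -n:]          # n highest activations per row
    correct = np.any(top == y[:, None], axis=1)
    return 1.0 - correct.mean()
```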

4.4. Simulations #3: Overtraining and overfitting in adaptive soft k-NN classifiers

Overtraining (that is, the excessive tuning of the solution to the training set due to reaching a deep minimum of the cost function) can lead to solutions with poor generalization capabilities. If the number of free parameters of the classifier is excessive, overtraining leads to overfitting (the solution is too fitted to noisy samples). The usual way of avoiding them is to stop at a (local) minimum of the validation error. However, this practice can be too conservative, since the learning algorithm could converge to a local minimum of the training cost function far from the global minimum, in which case the risk of overtraining would be low. (See e.g. multi-layer perceptrons trained with the back-propagation algorithm (Lawrence et al., 1997).) In Table 5 we show the results of the best soft k-NN classifiers for the upper problem over 50 runs when the learning algorithm stops at a minimum of the training and validation errors. Observe that the classifiers achieve their best results when they stop at a minimum of the training error.

Stop at a minimum of   Codebook size
                       100             200         400         800
Training error         (1.30)          (0.4689)    (0.2336)    (0.43)
Validation error       12.7 (1.4342)   (0.837)     (0.297)     (0.8)

Table 5. Averaged test error and variance (in parentheses) over 50 runs for the best soft k-NN classifiers in the upper problem when the learning algorithm stops at a minimum of the training and validation errors.

5. Discussion

5.1. Soft k-NN vs. crisp k-NN

The soft k-NN algorithm uses the distance to test patterns in its estimation of the posterior class probabilities. This additional information enhances the approximation quality of the estimation, since it introduces a smooth interpolation relative to the crisp k-NN estimate. We have shown in Figure 1 that the classification borders of the soft k-NN estimate are a smooth version of the crisp k-NN borders. As Section 4.1 demonstrated, the resulting smoothed borders increase the classification rate for a hard decision at the expense of a little more computation. Besides, the soft k-NN rule allows a fuzzy classification (Bezdek & Pal, 1992), where soft labels can be used for post-processing purposes. Section 4.3 showed that if a post-processor detects the correct label among the two greatest outputs of the soft k-NN classifier, the classification error can be notably reduced (by more than a half).

5.2. The adaptive soft k-NN algorithm vs. LVQ algorithms

The adaptive soft k-NN algorithm computes a reduced set of prototypes for the soft k-NN classification rule. The learning algorithm minimises a modification of the cross-entropy function whose absolute minimum points ensure the minimisation of the training classification error. The learning rule, however, resembles a fuzzified version of Kohonen's LVQ algorithms (Kohonen, 1996), which indicates some similarity between our algorithm and LVQ when hard decisions are employed (σ → ∞) and k = 2. Suppose that m_i[k] and m_j[k] are the two nearest prototypes to the pattern x[k]; then equation (19) yields

    m_i[k+1] = m_i[k] + \frac{\eta}{4} (x[k] - m_i[k]) \quad \text{if } m_i[k] \text{ and } x[k] \text{ belong to the same class}
    m_j[k+1] = m_j[k] - \frac{\eta}{4} (x[k] - m_j[k]) \quad \text{if } m_j[k] \text{ and } x[k] \text{ do not belong to the same class}    (22)

Equation (22) is the LVQ2.1 algorithm when the training pattern x[k] falls in the window defined in the LVQ2.1 and LVQ3 algorithms. This result is not a surprise, since (Bottou, 1998) showed that the LVQ2.1 algorithm performs gradient descent over an instantaneous loss function based on a continuous approximation of the (binary) classification error function.
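Under those conditions (k = 2, σ → ∞, one winner per class), d_W = 1/2 and d_c = 1/2 in rule (19), so both prototypes move by η/4, as the following sketch of (22) makes explicit (assuming m_i is the nearest same-class prototype and m_j the nearest different-class one):

```python
def lvq21_like_step(m_i, m_j, x, eta):
    """Special case of rule (19) for k = 2 and sigma -> infinity, i.e.
    equation (22): an LVQ2.1-style update with effective rate eta/4."""
    m_i = m_i + (eta / 4.0) * (x - m_i)   # attract the same-class prototype
    m_j = m_j - (eta / 4.0) * (x - m_j)   # repel the different-class prototype
    return m_i, m_j
```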

5.3. The loss function

The loss function that we use in the learning phase has interesting properties. It is less sensitive to outliers, as Section 3.2.2 shows (see (Bermejo, 2000) for an additional discussion). Besides, our error function is, for a two-class problem, the margin function of the classifier averaged over the whole set of training patterns. If k = 2 for a two-class problem, the learning rule coincides with the Learn1NN algorithm (Bermejo, 2000), so the computed prototypes use the support vectors to build Optimal Margin (OM) hyperplanes (Schölkopf, 1997), as was empirically shown in (Bermejo, 2000).

5.4. Overtraining and overfitting

Experimental Section 4.4 shows that if we stop at a local minimum of the training classification error instead of using the early stopping criterion, the test error is reduced. This result suggests that overtraining is harder than expected (Lawrence, 1997). Since the learning algorithm can hardly achieve a global minimum, the risk of overtraining is reduced, so the use of a validation set makes the algorithm stop too early. Furthermore, if overtraining is difficult, then the risk of overfitting is notably reduced.

6. Conclusion

Crisp k-nearest-neighbour classifiers estimate posterior class probabilities using a fixed number of training data. This feature prevents them, unlike other approaches, from not having enough data in some input regions to ensure good generalization, although the quality of their estimations (a simple ratio of integers) is too poor to properly approximate the true probabilities when finite data sets are employed. An extension of k-NN estimation was presented in order to overcome this limitation: for each testing pattern x, a Parzen window estimate is computed using the k nearest (in the Euclidean sense) training points. This simple algorithm also estimates the posterior class probabilities, although all the training data points must be retained in order to compute the estimations. An improvement of this approach can be achieved, in terms of storage and processing requirements, by replacing the local kernel estimation with a simplified adaptive mixture model: since each mode of the mixture is locally defined, for each testing pattern we only take into account its k nearest modes. The total number of mixture modes will typically be much smaller than the number of training points, and the mixture mode centers are no longer constrained to coincide with the training points. We have proposed a supervised learning procedure to estimate the mixture mode parameters, based on minimizing a modification of the cross-entropy error function. This error function has the same minimum points as the cross-entropy error function, while it allows the derived online gradient learning algorithm to behave more robustly in the presence of outliers in the training set. Experimental results in two handwritten classification problems demonstrate the potential and versatility of the proposed classification system. Since the soft k-NN classifier includes the crisp version as a special case, in the numerical simulations we always find an optimal solution that achieves better classification accuracy than crisp 1-NN classifiers designed with LVQ algorithms and than k-NN classifiers.

References

Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations, Berlin: Springer-Verlag.

Bermejo, S. (2000). Large Margin Adaptive K-Nearest Neighbour Classifiers, in Bermejo, S., Learning with Nearest Neighbour Classifiers, Ph.D. Thesis, Barcelona: Universitat Politècnica de Catalunya.

Bezdek, J.C. & Pal, S.K. (Eds.) (1992). Fuzzy Models for Pattern Recognition, NY: IEEE Press.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford: Oxford University Press.

Bottou, L. (1998). Online Learning and Stochastic Approximations, in D. Saad (Ed.), Online Learning and Neural Networks, Cambridge: Cambridge University Press.

Bottou, L. & Vapnik, V. (1992). Local Learning Algorithms, Neural Computation, 4, 888-900.

Dasarathy, B.V. (Ed.) (1991). Nearest Neighbor Pattern Classification Techniques, Los Alamitos, CA: IEEE Computer Society Press.

Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1-38.

Garris, M. et al. (1997). NIST Form-Based Handprint Recognition System (Release 2.0), National Institute of Standards and Technology.

Hart, P.E. (1968). The Condensed Nearest Neighbor Rule, IEEE Transactions on Information Theory (Corresp.), 14, 515-516.

Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., & Torkkola, K. (1995). LVQ_PAK: The Learning Vector Quantization Program Package, Version 3.1, Helsinki: Helsinki University of Technology, Laboratory of Computer and Information Science.

Kohonen, T. (1996). Self-Organizing Maps, 2nd Edition, Berlin: Springer-Verlag.

Lawrence, S., Giles, C.L. & Tsoi, A.C. (1997). Lessons in Neural Network Training: Overfitting May be Harder than Expected, Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI-97, 540-545, Menlo Park, CA: AAAI Press.

McLachlan, G.J., & Basford, K.E. (1988). Mixture Models: Inference and Applications to Clustering, New York: Marcel Dekker.

Ripley, B.D. (1994). Neural Networks and Related Methods for Classification, Journal of the Royal Statistical Society, Series B, 56, 409-456.

Schölkopf, B. (1997). Support Vector Learning, Ph.D. Thesis, Berlin: Informatik der Technischen Universität Berlin.

Appendix: The Soft k-NN Kernel Estimation

Consider a training set of random pairs T = {(x_i, class label(x_i))}, i = 0, ..., N-1, where x_i is a random sample belonging to one of the C existing classes. Suppose we take the region R(x) to be a small hypersphere centered at the point x that contains precisely k training data points. Let us denote by R the event that X belongs to R(X). This event is always true, since any vector belongs to a region centered at itself. So the following equation can be written:

    P(C_l | x) = P(C_l | x, R)    (A.1)

Making use of Bayes' theorem in the right-hand side of equation (A.1), we have

    P(C_l | x, R) = \frac{f_{X|C_l, R}(x) \, P(C_l | R)}{f_{X|R}(x)}    (A.2)

If the training data belonging to class l that fall in R(x) are denoted by {x_{lj}}, j = 1, ..., k_l, l = 1, ..., C, we obtain the following Parzen window estimates (equation (2.57) of (Bishop, 1995)) for the probability density functions f_{X|R} and f_{X|C_l, R}:

    f_{X|C_l, R}(x) = \frac{1}{k_l h^d} \sum_{j=1}^{k_l} G\!\left(\frac{x - x_{lj}}{h}\right), \qquad f_{X|R}(x) = \frac{1}{k h^d} \sum_{m=1}^{C} \sum_{j=1}^{k_m} G\!\left(\frac{x - x_{mj}}{h}\right)    (A.3)

Since the probability of belonging to class l conditioned on belonging to the region R, P(C_l | R), can be approximated by k_l/k, we obtain as an estimate of P(C_l | x)

    P(C_l | x) = \frac{\sum_{j=1}^{k_l} G\!\left(\frac{x - x_{lj}}{h}\right)}{\sum_{m=1}^{C} \sum_{j=1}^{k_m} G\!\left(\frac{x - x_{mj}}{h}\right)}    (A.4)


Pop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing Advanced Scence and Technology Letters, pp.164-168 http://dx.do.org/10.14257/astl.2013 Pop-Clc Nose Detecton Usng Inter-Frame Correlaton for Improved Portable Audtory Sensng Dong Yun Lee, Kwang Myung Jeon,

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

Application of support vector machine in health monitoring of plate structures

Application of support vector machine in health monitoring of plate structures Appcaton of support vector machne n heath montorng of pate structures *Satsh Satpa 1), Yogesh Khandare ), Sauvk Banerjee 3) and Anrban Guha 4) 1), ), 4) Department of Mechanca Engneerng, Indan Insttute

More information

Training Convolutional Neural Networks

Training Convolutional Neural Networks Tranng Convolutonal Neural Networks Carlo Tomas November 26, 208 The Soft-Max Smplex Neural networks are typcally desgned to compute real-valued functons y = h(x) : R d R e of ther nput x When a classfer

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Active Learning with Support Vector Machines for Tornado Prediction

Active Learning with Support Vector Machines for Tornado Prediction Actve Learnng wth Support Vector Machnes for Tornado Predcton Theodore B. Trafas, Indra Adranto, and Mchae B. Rchman Schoo of Industra Engneerng, Unversty of Okahoma, 0 West Boyd St, Room 4, Norman, OK

More information

A New Modified Gaussian Mixture Model for Color-Texture Segmentation

A New Modified Gaussian Mixture Model for Color-Texture Segmentation Journa of Computer Scence 7 (): 79-83, 0 ISSN 549-3636 0 Scence Pubcatons A New Modfed Gaussan Mxture Mode for Coor-Texture Segmentaton M. Sujartha and S. Annadura Department of Computer Scence and Engneerng,

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Analysis of Non-binary Hybrid LDPC Codes

Analysis of Non-binary Hybrid LDPC Codes Anayss of Non-bnary Hybrd LDPC Codes Luce Sassate and Davd Decercq ETIS ENSEA/UCP/CNRS UMR-5 954 Cergy, FRANCE {sassate,decercq}@ensea.fr Abstract Ths paper s egbe for the student paper award. In ths paper,

More information

L-Edge Chromatic Number Of A Graph

L-Edge Chromatic Number Of A Graph IJISET - Internatona Journa of Innovatve Scence Engneerng & Technoogy Vo. 3 Issue 3 March 06. ISSN 348 7968 L-Edge Chromatc Number Of A Graph Dr.R.B.Gnana Joth Assocate Professor of Mathematcs V.V.Vannaperuma

More information

ON AUTOMATIC CONTINUITY OF DERIVATIONS FOR BANACH ALGEBRAS WITH INVOLUTION

ON AUTOMATIC CONTINUITY OF DERIVATIONS FOR BANACH ALGEBRAS WITH INVOLUTION European Journa of Mathematcs and Computer Scence Vo. No. 1, 2017 ON AUTOMATC CONTNUTY OF DERVATONS FOR BANACH ALGEBRAS WTH NVOLUTON Mohamed BELAM & Youssef T DL MATC Laboratory Hassan Unversty MORO CCO

More information

Support Vector Machines

Support Vector Machines Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

CSE 252C: Computer Vision III

CSE 252C: Computer Vision III CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Research Article Green s Theorem for Sign Data

Research Article Green s Theorem for Sign Data Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization Journa of Machne Learnng Research 18 17 1-5 Submtted 9/16; Revsed 1/17; Pubshed 1/17 A Genera Dstrbuted Dua Coordnate Optmzaton Framework for Reguarzed Loss Mnmzaton Shun Zheng Insttute for Interdscpnary

More information

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of

More information

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems

Research Article H Estimates for Discrete-Time Markovian Jump Linear Systems Mathematca Probems n Engneerng Voume 213 Artce ID 945342 7 pages http://dxdoorg/11155/213/945342 Research Artce H Estmates for Dscrete-Tme Markovan Jump Lnear Systems Marco H Terra 1 Gdson Jesus 2 and

More information