Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 5: Linear Discriminant Functions (Sections 5.1-5.3)
Introduction
Linear Discriminant Functions and Decision Surfaces
Generalized Linear Discriminant Functions
Introduction
In Ch. 3, the underlying probability densities were known (or given); the training sample was used to estimate the parameters of these probability densities (ML, MAP estimations).
In this chapter, we only know the proper forms for the discriminant functions: we use the samples to estimate the values of the parameters of the classifier.
Goal: determining the discriminant functions (no knowledge of the underlying prob. dist. is required).
Discriminant functions: linear in x, or functions of x. They may not be optimal, but they are very simple to use.
Finding a linear discriminant function: minimizing a criterion function. Sample criterion function: the sample risk, i.e. the training error (a small training error does not guarantee a small test error).
Linear discriminant functions and decision surfaces
Linear discriminant function: g(x) = wᵀx + w₀ (1), where w is the weight vector and w₀ the bias.
A two-category classifier with a discriminant function of the form (1) uses the following rule:
Decide ω₁ if g(x) > 0 and ω₂ if g(x) < 0; equivalently, decide ω₁ if wᵀx > -w₀ and ω₂ otherwise.
If g(x) = 0, x is assigned to either class.
The equation g(x) = 0 defines the decision surface that separates points assigned to the category ω₁ from points assigned to the category ω₂.
When g(x) is linear, the decision surface is a hyperplane.
Algebraic measure of the distance from x to the hyperplane (interesting result!)
x = x_p + r·(w/‖w‖) (since w is collinear with x - x_p, and w/‖w‖ has unit norm).
Since g(x_p) = 0 and g(x) = wᵀx + w₀, therefore r = g(x)/‖w‖; in particular, d(0, H) = w₀/‖w‖.
In conclusion, a linear discriminant function divides the feature space by a hyperplane decision surface.
The orientation of the surface is determined by the normal vector w, and the location of the surface is determined by the bias w₀.
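The signed-distance result r = g(x)/‖w‖ can be checked numerically. A minimal sketch, with made-up values for w and w₀ (not from the slides):

```python
import numpy as np

# Hypothetical weight vector and bias, chosen only for illustration.
w = np.array([3.0, 4.0])   # normal to the hyperplane H
w0 = -5.0                  # bias

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def signed_distance(x):
    """Signed distance r = g(x) / ||w||; positive on the omega_1 side."""
    return g(x) / np.linalg.norm(w)
```

For x = 0 this reproduces d(0, H) = w₀/‖w‖ with the sign indicating on which side of H the origin lies.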
The multi-category case
We define c linear discriminant functions
g_i(x) = w_iᵀx + w_{i0},  i = 1, …, c
and assign x to ω_i if g_i(x) > g_j(x) for all j ≠ i; in case of ties, the classification is undefined.
In this case, the classifier is a linear machine. A linear machine divides the feature space into c decision regions, with g_i(x) being the largest discriminant if x is in the region R_i.
For two contiguous regions R_i and R_j, the boundary that separates them is a portion of the hyperplane H_ij defined by:
g_i(x) = g_j(x), i.e. (w_i - w_j)ᵀx + (w_{i0} - w_{j0}) = 0
w_i - w_j is normal to H_ij, and d(x, H_ij) = (g_i(x) - g_j(x))/‖w_i - w_j‖.
It is easy to show that the decision regions for a linear machine are convex; this restriction limits the flexibility and accuracy of the classifier.
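The argmax rule of a linear machine is straightforward to implement. A sketch with hypothetical weights for c = 3 classes in R² (the matrix W and biases w0 are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical linear machine for c = 3 classes in R^2: row i of W is the
# weight vector w_i, and w0[i] is the corresponding bias w_i0.
W = np.array([[ 1.0, 0.0],
              [-1.0, 0.0],
              [ 0.0, 1.0]])
w0 = np.array([0.0, 0.0, 0.0])

def classify(x):
    """Assign x to the class i with the largest g_i(x) = w_i^T x + w_i0."""
    g = W @ x + w0
    return int(np.argmax(g))
```

Each decision region here is an intersection of half-spaces g_i(x) ≥ g_j(x), which is why the regions come out convex.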
Linear Discriminant Functions
Non-linear Discriminant Functions
Higher Dimensional Space
Find a function Φ(x) to map x to a different space. [Figure: original features vs. constructed features]
Generalized Linear Discriminant Functions
Decision boundaries which separate between classes may not always be linear. The complexity of the boundaries may sometimes require the use of highly non-linear surfaces.
A popular approach to generalize the concept of linear decision functions is to consider a generalized decision function:
g(x) = w₁f₁(x) + w₂f₂(x) + … + w_N f_N(x) + w_{N+1} (1)
where f_i(x), 1 ≤ i ≤ N, are scalar functions of the pattern x, x ∈ Rⁿ (Euclidean space).
Introducing f_{N+1}(x) = 1 we get:
g(x) = Σ_{i=1}^{N+1} w_i f_i(x) = wᵀf(x)
where w = (w₁, w₂, …, w_N, w_{N+1})ᵀ and f(x) = (f₁(x), f₂(x), …, f_N(x), f_{N+1}(x))ᵀ.
This latter representation of g(x) implies that any decision function defined by equation (1) can be treated as linear in the (N+1)-dimensional space (N+1 > n). g(x) maintains its non-linearity characteristics in Rⁿ.
The most commonly used generalized decision function is g(x) for which the f_i(x) (1 ≤ i ≤ N) are polynomials:
g(x) = w̃ᵀf(x)
(ᵀ denotes the vector transpose), where w̃ is a new weight vector, which can be calculated from the original w and the original linear f_i(x), 1 ≤ i ≤ N.
Quadratic decision function for a 2-dimensional feature space:
g(x) = w₁x₁² + w₂x₁x₂ + w₃x₂² + w₄x₁ + w₅x₂ + w₆
where w = (w₁, w₂, …, w₆)ᵀ and f(x) = (x₁², x₁x₂, x₂², x₁, x₂, 1)ᵀ.
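The 2-D quadratic case can be sketched directly: map x through f(x) and take an inner product with a weight vector. The weights below are made up to give a circular boundary; they are only an illustration of the idea:

```python
import numpy as np

def f(x):
    """Quadratic feature map f(x) = (x1^2, x1*x2, x2^2, x1, x2, 1)^T."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2, x1, x2, 1.0])

# Hypothetical weights giving g(x) = x1^2 + x2^2 - 1: a circular decision
# boundary in R^2, although g is linear in the 6-dimensional feature space.
w = np.array([1.0, 0.0, 1.0, 0.0, 0.0, -1.0])

def g(x):
    return w @ f(x)
```

The boundary g(x) = 0 is the unit circle, a non-linear surface in Rⁿ produced by a function that is linear in f(x).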
For patterns x ∈ Rⁿ, the most general quadratic decision function is given by:
g(x) = Σ_{i=1}^{n} w_{ii} x_i² + Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} w_{ij} x_i x_j + Σ_{i=1}^{n} w_i x_i + w_{n+1} (2)
The number of terms on the right-hand side is:
l = N + 1 = n + n(n-1)/2 + n + 1 = (n+1)(n+2)/2
This is the total number of weights, which are the free parameters of the problem.
If, for example, n = 3, the vector f(x) is 10-dimensional; if n = 10, the vector f(x) is 66-dimensional.
In the case of polynomial decision functions of order m, a typical f_i(x) is given by:
f_i(x) = x_{i₁}^{e₁} x_{i₂}^{e₂} … x_{i_m}^{e_m}
where 1 ≤ i₁, i₂, …, i_m ≤ n and each e_i is 0 or 1. It is a polynomial with a degree between 0 and m. To avoid repetitions, we require i₁ ≤ i₂ ≤ … ≤ i_m.
g^m(x) = Σ_{i₁=1}^{n} Σ_{i₂=i₁}^{n} … Σ_{i_m=i_{m-1}}^{n} w_{i₁i₂…i_m} x_{i₁} x_{i₂} … x_{i_m} + g^{m-1}(x)
(where g⁰(x) = w_{n+1}) is the most general polynomial decision function of order m.
Example 1: Let n = 3 and m = 2; then:
g²(x) = w₁₁x₁² + w₁₂x₁x₂ + w₁₃x₁x₃ + w₂₂x₂² + w₂₃x₂x₃ + w₃₃x₃² + w₁x₁ + w₂x₂ + w₃x₃ + w₄
Example 2: Let n = 2 and m = 3; then:
g³(x) = w₁₁₁x₁³ + w₁₁₂x₁²x₂ + w₁₂₂x₁x₂² + w₂₂₂x₂³ + g²(x)
where g²(x) = w₁₁x₁² + w₁₂x₁x₂ + w₂₂x₂² + g¹(x) and g¹(x) = w₁x₁ + w₂x₂ + w₃
The commonly used quadratic decision function can be represented as the general n-dimensional quadratic surface:
g(x) = xᵀAx + xᵀb + c
where the matrix A = (a_ij), the vector b = (b₁, b₂, …, b_n)ᵀ, and c depend on the weights w_ii, w_ij, w_i of equation (2).
If A is positive definite, then the decision function is a hyperellipsoid with axes in the directions of the eigenvectors of A. In particular, if A = I_n (identity), the decision function is simply the n-dimensional hypersphere.
If A is negative definite, the decision function describes a hyperhyperboloid.
In conclusion: it is only the matrix A which determines the shape and characteristics of the decision function.
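The case analysis on A reduces to checking the signs of its eigenvalues. A small sketch following the slide's cases (the function name and the "other" label are ours; A is assumed symmetric):

```python
import numpy as np

def surface_type(A):
    """Classify the surface g(x) = x^T A x + x^T b + c by the definiteness
    of A, following the slide's cases (sketch; A assumed symmetric)."""
    eig = np.linalg.eigvalsh(A)      # real eigenvalues of symmetric A
    if np.all(eig > 0):
        return "hyperellipsoid"      # hypersphere when A = I
    if np.all(eig < 0):
        return "hyperhyperboloid"    # negative definite case
    return "other"                   # indefinite / singular A
```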
Objective Functions J
Linear Separability
Perceptron
Relaxation Procedures
Convergence
No Linear Separability
Min-Squared Error
Misclassified Samples
All Samples
Linear Separability
2-Category Linear Case
Terminology
Weight vector: a
A: weight space
Solutions: not unique
Normalized: ‖a‖ = 1
Margin: aᵀy ≥ b
[Figure: margin b > 0]
Gradient Descent
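The slides show gradient descent only as a figure; the iteration itself is a(k+1) = a(k) − η(k)·∇J(a(k)). A minimal sketch with a constant learning rate and a made-up criterion J(a) = ‖a‖²:

```python
import numpy as np

def gradient_descent(grad_J, a0, eta=0.1, n_steps=100):
    """Basic gradient descent: a(k+1) = a(k) - eta * grad J(a(k))."""
    a = np.asarray(a0, dtype=float)
    for _ in range(n_steps):
        a = a - eta * grad_J(a)
    return a

# Example criterion J(a) = ||a||^2, whose gradient is 2a and whose
# minimum is at the origin.
a_min = gradient_descent(lambda a: 2.0 * a, [4.0, -2.0])
```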
Newton Descent
Perceptron
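A sketch of the fixed-increment perceptron rule on "normalized" augmented samples (class-2 samples negated, so a solution satisfies aᵀy > 0 for every sample). The data set is made up for illustration and is linearly separable:

```python
import numpy as np

def perceptron(Y, eta=1.0, max_epochs=100):
    """Fixed-increment perceptron sketch. Rows of Y are augmented samples
    with class-2 samples negated, so a solution has a^T y > 0 for all y."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:
            if a @ y <= 0:        # y is misclassified
                a = a + eta * y   # move a toward y
                errors += 1
        if errors == 0:           # all samples correct: converged
            break
    return a

# Made-up separable data: rows are (1, x1, x2), class-2 rows negated.
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  1.0,  2.0],
              [-1.0,  1.0,  1.0],
              [-1.0,  2.0,  0.5]])
a = perceptron(Y)
```

For separable data such as this, the perceptron convergence theorem guarantees the loop terminates with a separating a.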
As a Neural Network
Relaxation Algorithm
Mean-Squared Error
Error-correcting procedures
Separable samples: perceptron, relaxation procedures
Nonseparable samples: corrections in the error-correction procedure will NEVER converge; heuristic modifications give acceptable performance on nonseparable samples!
MSE: no guarantee that the solution is a separating vector
Min-Squared Error
MSE Pseudo-Inverse
Direct relation to Fisher's linear discriminant.
If b = 1, MSE approximation to the Bayes discriminant function:
g(x) = aᵀy(x) ≈ g₀(x) = P(ω₁|x) − P(ω₂|x)
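The MSE solution is a = Y⁺b, the pseudo-inverse applied to the margin vector. A sketch with made-up augmented samples (class-2 rows negated) and b = 1:

```python
import numpy as np

# MSE sketch: solve min ||Ya - b||^2 via the pseudo-inverse a = Y^+ b.
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  1.0,  2.0],
              [-1.0,  1.0,  1.0],
              [-1.0,  2.0,  0.5]])   # made-up data, rows (1, x1, x2)
b = np.ones(4)                       # margin vector b = 1
a = np.linalg.pinv(Y) @ b            # least-squares solution
```

The resulting a satisfies the normal equations YᵀYa = Yᵀb, i.e. it minimizes ‖Ya − b‖² even when Ya = b has no exact solution.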
Widrow-Hoff or LMS
Minimize ‖Ya − b‖² using gradient descent.
Avoids the problems of a singular pseudo-inverse and the inversion of large matrices.
Ho-Kashyap Procedure
If the training samples are linearly separable, there exist â and b̂ with b̂ > 0 such that Yâ = b̂.
Gradient descent w.r.t. both a and b.
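A sketch of the Ho-Kashyap alternation: hold b fixed and set a = Y⁺b, then take a gradient step on b that only increases components (so b stays positive). The step size, iteration count, and data are illustrative assumptions:

```python
import numpy as np

def ho_kashyap(Y, eta=0.5, n_steps=1000):
    """Ho-Kashyap sketch: alternate a = Y^+ b with a step on b that only
    adds the positive part of the error e = Ya - b, keeping b > 0."""
    Yp = np.linalg.pinv(Y)
    b = np.ones(len(Y))
    a = Yp @ b
    for _ in range(n_steps):
        e = Y @ a - b
        b = b + eta * (e + np.abs(e))   # add only the positive part of e
        a = Yp @ b
    return a, b

# Made-up linearly separable data (augmented, class-2 rows negated).
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  1.0,  2.0],
              [-1.0,  1.0,  1.0],
              [-1.0,  2.0,  0.5]])
a, b = ho_kashyap(Y)
```

For separable samples the error shrinks and a becomes a separating vector; for nonseparable samples a persistent all-nonpositive error is the evidence of nonseparability mentioned in the summary.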
Support Vector Machines
Map x to a higher dimensional space.
Support Vector Machine (SVM)
The decision surface is a hyperplane in feature space.
In summary:
Map the data to a predetermined very high-dimensional space via a kernel function.
Find the hyperplane that maximizes the margin between the two classes.
If the data are not separable, find the hyperplane that maximizes the margin and minimizes (a weighted average of) the misclassifications.
Separating Hyperplanes? [Figure: several candidate hyperplanes]
Idea: Maximize the Margin
Select the separating hyperplane that maximizes the margin! [Figure: margin widths of two candidate hyperplanes]
Support Vectors [Figure: support vectors and margin width]
Constrained Optimization
The width of the margin is 2k/‖w‖, with the margin bounded by the hyperplanes wᵀx + b = k, wᵀx + b = 0, and wᵀx + b = −k.
Optimization problem:
max 2k/‖w‖
s.t. wᵀxᵢ + b ≥ k, for xᵢ of class 1
     wᵀxᵢ + b ≤ −k, for xᵢ of class 2
Constrained Quadratic Optimization
If class 1 corresponds to +1 and class 2 corresponds to −1, we can rewrite the constraints as:
wᵀxᵢ + b ≥ 1, with yᵢ = +1
wᵀxᵢ + b ≤ −1, with yᵢ = −1
i.e. yᵢ(wᵀxᵢ + b) ≥ 1, for all i.
So the problem becomes:
max 2/‖w‖ s.t. yᵢ(wᵀxᵢ + b) ≥ 1
or, equivalently,
min ‖w‖²/2 s.t. yᵢ(wᵀxᵢ + b) ≥ 1
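A minimal numeric check of the constraints and the margin-width formula on made-up points. The candidate (w, b) here is a hand-derived max-margin solution for this particular symmetric toy set, not the output of a solver:

```python
import numpy as np

# Made-up 2-D data: class +1 at (2,0),(0,2); class -1 at (-2,0),(0,-2).
X = np.array([[ 2.0,  0.0], [ 0.0,  2.0], [-2.0,  0.0], [ 0.0, -2.0]])
y = np.array([ 1.0,  1.0, -1.0, -1.0])

# Hypothetical max-margin hyperplane for this symmetric data set.
w = np.array([0.5, 0.5])
b = 0.0

margins = y * (X @ w + b)        # y_i (w^T x_i + b): all must be >= 1
width = 2.0 / np.linalg.norm(w)  # margin width 2 / ||w||
```

All four points meet the constraint with equality, so every point is a support vector and the width 2/‖w‖ equals the geometric gap between the classes.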
Comparison: SVM vs. NN
SVMs:
Kernel maps to a very high-dimensional space
Search space has a unique minimum
Training is extremely efficient
Classification extremely efficient
Kernel and cost are the two parameters to select
Very good accuracy in typical domains
Extremely robust
Neural Networks:
Hidden layers map to lower-dimensional spaces
Search space has multiple local minima
Training is expensive
Classification extremely efficient
Requires choosing the number of hidden units and layers
Very good accuracy in typical domains
General Summary
Perceptron and relaxation procedures: do not converge on nonseparable data.
MSE: works regardless of separability, but no guarantee of separation without error.
Ho-Kashyap: provides a separating vector, or evidence of nonseparability (no bound on the number of steps required).