Machine learning and pattern recognition Part 2: Classifiers


1 Machine learning and pattern recognition Part 2: Classifiers. Bulbous iris I. histrio (Photo F. Virecoulon). Dwarf botanical iris I. pumila attica (Virecoulon). Tall iris 'Ecstatic Echo'. Tarik AL ANI, Département Informatique et Télécommunication, ESIEE-Paris. E-mail: t.alani@esiee.fr

2 0. Training-based modelling. Machine learning has largely been devoted to solving problems related to data mining, text categorisation [6], biomedical problems such as data analysis [7], magnetic resonance imaging [8, 9], signal processing [10], speech recognition [11, 12], image processing [13-19] and other fields. In general, machine learning or pattern recognition is used as a technique for modelling data, patterns or a physical process.

3 0. Training-based modelling. It is only after the raw data acquisition, preprocessing, and the extraction and selection of the most informative features from representative data (see the first part of this course, RF1) that we are finally ready to choose the type of classifier and its corresponding training algorithm in order to construct a model of the object or process of interest.

4 0. Training-based classifiers and regressors. Supervised learning. Supervised learning framework: consider the problem of separating (according to some given criterion, by a line, a hyperplane, ...) a set of training vectors {p_qi ∈ R^R}, i ∈ {1, 2, ..., n_q}, q ∈ {1, 2, ..., Q}, called training feature vectors, where Q is the maximum number of classes. Given a set of pairs (p_qi, y_qi), i = 1, 2, ..., n_q, q = 1, 2, ..., Q:
D = { (p_11, y_11), ..., (p_{n_1 1}, y_{n_1 1}), (p_12, y_12), ..., (p_{n_2 2}, y_{n_2 2}), ..., (p_1Q, y_1Q), ..., (p_{n_Q Q}, y_{n_Q Q}) }
The n_q could be the same for all q. y_qi is the desired output (target) corresponding to the input feature vector p_qi in class q. For example, y_qi ∈ {0, 1} or y_qi ∈ {-1, 1} in 2-class classification, or y_qi ∈ R in regression.

5 0. Training-based classifiers and regressors. In theory, the problem of classification (or regression) is to find a function f that maps an R×1 input feature vector to an output: a class label (in the classification case) or a real value (in the regression case), in which the information is encoded in an appropriate manner. p → y = f(p), from the feature input space to the output space.

6 0. Training-based classifiers and regressors. Once the problem of classification (or regression) is defined, a variety of mathematical tools, such as optimisation algorithms, can be used to build a model.

7 0. Training-based classifiers and regressors. The classification problem. Recall that a classifier considers a set of feature vectors {p_i ∈ R^R} (or scalars), i = 1, 2, ..., N, from objects or processes, each of which belongs to a known class q, q ∈ {1, ..., Q}. This set is called the training feature vectors. Once the classifier is trained, the problem is then to assign to new given feature vectors (field feature vectors) p_i = [p_i1 p_i2 ... p_iR]^T ∈ R^R, i ∈ {1, 2, ..., M}, the best class label (classifier) or the best real value (regressor). In this course we focus mainly on the classification problem.

8 0. Training-based classifiers and regressors. Example: classification of Iris flowers. Fisher's Iris data is a set of multivariate data introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis. Sir Ronald Aylmer Fisher FRS (17 February 1890 - 29 July 1962) was an English statistician, evolutionary biologist, eugenicist and geneticist. Bulbous iris I. histrio (Photo F. Virecoulon). Dwarf botanical iris I. pumila attica (Virecoulon). Tall iris 'Ecstatic Echo'.

9 0. Training-based classifiers and regressors. Example: classification of Iris flowers (cont.). In botany, a sepal is one of the leafy, generally green parts which together make up the calyx and support the corolla of the flower. A petal is a floral piece that surrounds the reproductive organs of the flower; it is one of the foliose parts which together make up the corolla, and it is a modified leaf.

10 0. Training-based classifiers and regressors. Example: classification of Iris flowers (cont.). The data consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured for each sample: the length and width of the sepals and petals, in centimetres. The data set therefore contains 3 classes, where each class refers to a type of iris plant. One class is linearly separable from the other 2; class 2 and class 3 ('versicolor' and 'virginica') are NOT linearly separable from each other. [Table: columns Ls, Ws, Lp, Wp and species, with sample rows labelled 'setosa', 'versicolor' and 'virginica'.]

11 0. Training-based classifiers. 1) Statistical pattern recognition approaches [26]. Several approaches exist; the most important are:
1.1 Bayes classifier [26]
1.2 Naive Bayes classifier [26]
1.3 Linear and Quadratic Discriminant Analysis [26]
1.4 Support vector machines (SVM) [1, 2, 26]
1.5 Hidden Markov models (HMMs) [26, 27, 31, 32]
2) Neural networks [26]
3) Decision trees [26]
In this course, we introduce only 1.1, 1.2, 1.4 and 2.

12 1. Statistical classifiers. Although the most common pattern recognition algorithms are classified as statistical approaches as opposed to neural network approaches, it is possible to show that they are closely related, and even that there is a certain equivalence relation between statistical approaches and their corresponding neural networks.

13 1.1 Bayes classifier. 1. Statistical classifiers. Thomas Bayes (c. 1701 - 7 April 1761) was an English mathematician and Presbyterian minister, known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death by Richard Price. We introduce the techniques inspired by Bayes decision theory. In statistical approaches, feature instances (data samples) are treated as random variables (scalars or vectors) drawn from a probability distribution, where each instance has a certain probability of belonging to a class, determined by its probability distribution in the class. To build a classifier, these distributions must either be known in advance or be learned from data.

14 1.1 Bayes classifier. 1. Statistical classifiers. The feature vector p belonging to class c_q is considered as an observation drawn at random from the conditional probability distribution over the class c_q, pr(p | c_q). Remark: pr denotes the probability in the discrete feature case, or the probability density function in the continuous feature case. This distribution is called the likelihood: it is the conditional probability of observing a feature vector p given that the true class is c_q.

15 1.1 Bayes classifier. 1. Statistical classifiers. Maximum a posteriori probability (MAP). Two cases: 1. All classes have equal prior probabilities pr(c_q). In this case, the class with the greatest likelihood is the most likely to be the right class, i.e., it also maximises the posterior probability, which is the conditional probability that the true class is c_q*, given the feature vector p_i:
pr(c_q* | p_i) = max_q pr(p_i | c_q), q ∈ {1, 2, 3, ..., Q}
2. The classes do not always have equal prior probabilities (some classes may be inherently more likely). The likelihood is then converted by Bayes' theorem into a posterior probability pr(c_q | p_i).

16 1.1 Bayes classifier. 1. Statistical classifiers. Bayes theorem: the a posteriori probability is
pr(c_q | p_i) = pr(p_i | c_q) pr(c_q) / pr(p_i)
where pr(p_i | c_q) is the likelihood, pr(c_q) is the a priori probability and pr(p_i) is the evidence (total probability):
pr(p_i) = sum_{q=1}^{Q} pr(p_i | c_q) pr(c_q)
Note that the evidence is the same for all classes, and therefore its value is inconsequential to the final classification.

17 1.1 Bayes classifier. 1. Statistical classifiers. The prior probability can be estimated from prior experience. If such an experiment is not possible, it can be estimated: either by the ratios between the number of feature vectors in each class and the total number of feature vectors; or by considering that all these probabilities are equal, if the number of feature vectors is not sufficient to make this estimation.
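To make the MAP rule of the previous slides concrete, here is a minimal MATLAB sketch for two classes with scalar Gaussian likelihoods; all numerical values and variable names are illustrative assumptions, not part of the course material.

mu    = [0 3];          % assumed class-conditional means
sg    = [1 1.5];        % assumed class-conditional standard deviations
prior = [0.7 0.3];      % priors estimated from the class proportions
p     = 1.2;            % a new scalar feature to classify
lik  = exp(-(p-mu).^2 ./ (2*sg.^2)) ./ (sqrt(2*pi)*sg);  % likelihoods pr(p | c_q)
post = lik .* prior;                                     % numerators of Bayes' theorem
post = post / sum(post);                                 % divide by the evidence pr(p)
[~, q_star] = max(post);                                 % MAP decision: class index q*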

18 1.1 Bayes classifier. 1. Statistical classifiers. A classifier constructed in this way is usually called the Bayes classifier or Bayes decision rule, and it can be shown that this classifier is optimal, with minimal error in the statistical sense.

19 1.1 Bayes classifier. 1. Statistical classifiers. More general form. This approach no longer assumes that all errors are equally costly, and instead tries to minimise the expected risk R(a_q | p_i), the expected loss of taking action a_q.

20 1.1 Bayes classifier. 1. Statistical classifiers. While taking action a_q is usually understood as selecting a class c_q, refusing to take an action may also be considered as an action, allowing the classifier not to make a decision if the estimated risk of doing so is smaller than that of selecting one of the classes.

21 1.1 Bayes classifier. 1. Statistical classifiers. The expected risk can be calculated by
R(c_q | p) = sum_{q'=1}^{Q} λ(c_q | c_q') pr(c_q' | p)
where λ(c_q | c_q') is the loss incurred in taking action c_q when the correct class is c_q'. If one associates the action a_q with the selection of c_q, and if all the errors are equally costly, the zero-one loss is obtained:
λ(c_q | c_q') = 0 if q = q', and 1 if q ≠ q'.

22 1.1 Bayes classifier. 1. Statistical classifiers. This loss function assigns no loss to a correct classification and a loss of 1 to a misclassification. The risk corresponding to this loss function is then
R(c_q | p) = sum_{q' ≠ q} pr(c_q' | p) = 1 - pr(c_q | p),  q = 1, ..., Q
proving that the class that maximises the posterior probability minimises the expected risk.

23 1.1 Bayes classifier. 1. Statistical classifiers. Out of the three terms in the optimal Bayes decision rule, the evidence is unnecessary, the prior probability can be easily estimated, but we have not mentioned how to obtain the key third term, the likelihood. Yet, it is this critical likelihood term whose estimation is usually very difficult, particularly for high dimensional data, rendering the Bayes classifier impractical for most applications of practical interest.

24 1.1 Bayes classifier. 1. Statistical classifiers. One cannot discard the Bayes classifier outright, however, as several ways exist in which it can still be used: (1) if the likelihood is known, it is the optimal classifier; (2) if the form of the likelihood function is known (e.g., Gaussian), but its parameters are unknown, they can be estimated using the parametric approach: maximum likelihood estimation (MLE) [26]*; * see the Appendix in the first part of the lecture (RF1).

25 1.1 Bayes classifier. 1. Statistical classifiers. (3) Even the form of the likelihood function can be estimated from the training data using a non-parametric approach, for example by using Parzen windows [1]*; however, this approach becomes computationally expensive as dimensionality increases; (4) the Bayes classifier can be used as a benchmark against the performance of new classifiers by using artificially generated data whose distributions are known. * See the first part of the lecture (RF1).

26 1. Statistical classifiers. 1.2 Naive Bayes classifier. As mentioned above, the main disadvantage of the Bayes classifier is the difficulty of estimating the likelihood (class-conditional) probabilities, particularly for high dimensional data, because of the curse of dimensionality: a large number of training instances must be available to obtain a reliable estimate of the corresponding multidimensional probability density function (pdf), assuming that features may be statistically dependent on each other.

27 1.2 Naive Bayes classifier. 1. Statistical classifiers. There is a highly practical solution to this problem, however, and that is to assume class-conditional independence of the primitives p_i in p = [p_1, ..., p_R]^T:
pr(p | c_q) = prod_{i=1}^{R} pr(p_i | c_q)
which yields the so-called Naive Bayes classifier. This equation basically requires that the i-th primitive p_i of instance p is independent of all other primitives in p, given the class information.

28 1.2 Naive Bayes classifier. 1. Statistical classifiers. It should be noted that this is not nearly as restrictive as assuming full independence, that is,
pr(p) = prod_{i=1}^{R} pr(p_i)

29 1.2 Naive Bayes classifier. 1. Statistical classifiers. The classification rule corresponding to the Naive Bayes classifier is then to compute the discriminant function representing the posterior probabilities,
g_q(p) = pr(c_q) prod_{i=1}^{R} pr(p_i | c_q)
for each class c_q, and then to choose the class for which the discriminant function g_q(p) is largest. The main advantage of this approach is that it only requires the univariate densities pr(p_i | c_q) to be computed, which are much easier to estimate than the multivariate densities pr(p | c_q).
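As an illustration of this discriminant, the following MATLAB sketch evaluates g_q(p) for one class with Gaussian univariate densities; the feature vector and the per-feature parameters are illustrative assumptions, not taken from the course.

p    = [5.1 3.5 1.4 0.2];      % one feature vector (R = 4)
mu_q = [5.0 3.4 1.5 0.25];     % per-feature means estimated for class q
sg_q = [0.35 0.38 0.17 0.10];  % per-feature standard deviations for class q
pr_q = 1/3;                    % prior of class q
dens = exp(-(p-mu_q).^2 ./ (2*sg_q.^2)) ./ (sqrt(2*pi)*sg_q);  % pr(p_i | c_q)
g_q  = pr_q * prod(dens);      % g_q(p) = pr(c_q) * prod_i pr(p_i | c_q)

Repeating this for every class and keeping the largest g_q(p) gives the Naive Bayes decision.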

30 1.2 Naive Bayes classifier. 1. Statistical classifiers. In practice, Naive Bayes has been shown to provide respectable performance, comparable with that of neural networks, even under mild violations of the independence assumptions.

31 1.3 Linear and Quadratic Discriminant Analysis. 1. Statistical classifiers. Linear discriminant analysis (LDA). In the first part of this course (RF1), we introduced the principle of LDA, which can be used for linear classification. In the following we will deal briefly with the problems of its practical implementation as a classifier.

32 1.3 Linear and Quadratic Discriminant Analysis. Linear discriminant analysis (LDA). 1. Statistical classifiers. In practice, the means and covariances of a given class are not known. They can, however, be estimated from the training set. Either the maximum likelihood estimate or the maximum a posteriori estimate can be used instead of the exact values in the equations given in the first part of the course. Although the estimate of the covariance can be considered optimal in some sense, this does not mean that the resulting discrimination obtained by substituting these parameters is optimal in all directions, even if the hypothesis of a normal distribution of the classes is correct.

33 1.3 Linear and Quadratic Discriminant Analysis. Linear discriminant analysis (LDA). 1. Statistical classifiers. Another complication in the application of LDA and Fisher discrimination to real data occurs when the number of features in the feature vectors of each class is greater than the number of instances in this class. In this case, the estimate of the covariance does not have full rank, and therefore cannot be inverted.

34 1.3 Linear and Quadratic Discriminant Analysis. Linear discriminant analysis (LDA). 1. Statistical classifiers. There are a number of methods to address this problem: the first is to use a pseudo-inverse instead of the usual inverse of the matrix S_W; the second is to use a shrinkage estimator of the covariance matrix, using a parameter δ ∈ [0, 1] called the shrinkage intensity or regularisation parameter. For more details, see e.g. [29, 30].

35 1.3 Linear and Quadratic Discriminant Analysis. 1. Statistical classifiers. Quadratic Discriminant Analysis (QDA). A quadratic classifier is used in machine learning and statistical classification to separate data from two or more classes of objects or events by a quadric surface (*). It is a more general version of the linear classifier. (*) A quadric is a second-order algebraic surface given by the general equation ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy + 2px + 2qy + 2rz + d = 0. Quadratic surfaces are also called quadrics, and there are 17 standard-form types. A quadratic surface intersects every plane in a (proper or degenerate) conic section. In addition, the cone consisting of all tangents from a fixed point to a quadratic surface cuts every plane in a conic section, and the points of contact of this cone with the surface form a conic section.

36 1.3 Linear and Quadratic Discriminant Analysis. Quadratic Discriminant Analysis (QDA). 1. Statistical classifiers. In statistics, if p is a feature vector consisting of R random features, and A is an R×R square symmetric matrix, then the scalar quantity p^T A p is known as a quadratic form in p.

37 1.3 Linear and Quadratic Discriminant Analysis. Quadratic Discriminant Analysis (QDA). 1. Statistical classifiers. The classification problem. For a quadratic classifier, the correct classification is assumed to be of second degree in the features; the class c_q is then decided on the basis of the quadratic discriminant function g(p) = p^T A p + b^T p + c. In the special case where each feature vector consists of two features (R = 2), this means that the separation surfaces between the classes are conic sections (a line, a circle or an ellipse, a parabola or a hyperbola). For more details, see e.g. [26].
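As a small numerical illustration, the MATLAB sketch below evaluates the quadratic discriminant g(p) = p^T A p + b^T p + c for a two-feature vector; the values of A, b and c are illustrative assumptions (in practice they are derived from the estimated class means and covariances).

A = [1 0.2; 0.2 2];      % R x R symmetric matrix
b = [-1; 0.5];
c = 0.3;
p = [0.7; -1.1];
g = p.'*A*p + b.'*p + c; % in the 2-class case the decision is taken from the sign of g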

38 1.3 Linear and Quadratic Discriminant Analysis. Quadratic Discriminant Analysis (QDA). 1. Statistical classifiers. Types of conic sections: 1. Parabola, 2. Circle or ellipse, 3. Hyperbola.

39 1. Statistical classifiers. Using the Matlab Statistics Toolbox for linear, quadratic and naive Bayes classifiers. Run Matlab. 1. In the Matlab menu click on "Help". A separate help window will open; then click on "Product Help" and wait for the window that displays all the toolboxes to open. 2. In the search command field on the left, type "classification". You get a tutorial on classification at the right side of this window.

40 1. Statistical classifiers. 1.3 Linear, quadratic and naive Bayes classifiers. 3. Start with the introduction and follow the tutorial, which guides you in using these methods, by clicking each time the arrow at the bottom right. Learn the use of the two methods introduced in the lecture: "Naive Bayes Classification" and "Discriminant Analysis". Exercise: explore the other methods.
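A possible toolbox session on Fisher's iris data is sketched below. The data set fisheriris ships with the Statistics Toolbox; the classifier constructors depend on the toolbox release (recent releases provide fitcnb and fitcdiscr, older ones use NaiveBayes.fit and classify instead), so adapt the calls to your version.

load fisheriris                     % meas (150x4 features), species (class labels)
nb  = fitcnb(meas, species);        % naive Bayes classifier
lda = fitcdiscr(meas, species);     % (linear) discriminant analysis
err_nb  = mean(~strcmp(predict(nb,  meas), species))   % resubstitution error rates
err_lda = mean(~strcmp(predict(lda, meas), species))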

41 1. Statistical classifiers. 1.4 Support vector machines (SVM). Since the 1990s, Support Vector Machines (SVM) have been a major theme in theoretical developments and applications (see for example [1-5]). The theory of SVM is based on the combined contributions of optimisation theory, statistical learning, kernel theory and algorithmics. Recently, SVMs have been applied successfully to solve problems in different areas. Vladimir Vapnik is a leading developer of the theory of SVM.

42 1.4 Support vector machines (SVM). 1. Statistical classifiers. Let p_i: input feature vector (point), p_i = [p_i1 p_i2 ... p_iR]^T, R = maximum number of attributes, i ∈ {1, 2, ..., n_q}; P = [p_1 p_2 ... p_Q], Q = maximum number of classes (q ∈ {1, 2, ..., Q}); w: weight vector (trained classifier parameters), w = [w_1 w_2 ... w_R]^T.

43 1.4 Support vector machines (SVM). 1. Statistical classifiers / 2. Neural networks. General scheme for training a classifier: 1. Given a couple (P, y_d) of an input matrix P containing the feature vectors of all classes (P = [p_i], p_i ∈ R^R, i = 1, 2, ..., (n_1 + n_2 + ... + n_Q)) and a desired output vector y_d = [y_1d, y_2d, ..., y_(n1+n2+...+nQ)d]; 2. when an input p_i is presented to the classifier, the stable output y_i of the classifier is calculated; 3. the error vector E = [e_1, e_2, ..., e_(n1+n2+...+nQ)] = [y_1 - y_1d, y_2 - y_2d, ..., y_(n1+n2+...+nQ) - y_(n1+n2+...+nQ)d] is calculated; 4. E is minimised by adjusting the vector w using a specific training algorithm based on some optimisation method. [Diagram: input p → classifier with parameters (weights) w → output y; the error e = y - y_d drives the training algorithm that adjusts w.]

44 1.4 Support vector machines (SVM). 1. Statistical classifiers. Traditional optimisation approaches apply a procedure based on the minimum mean square error (MMSE) between the desired result (desired classifier output: y_id = +1 for samples p_i from class 1 and y_jd = -1 for samples p_j from class 2) and the real result (classifier output: y_i).

45 1.4 Support vector machines (SVM). 1. Statistical classifiers. Linear discriminant functions and decision hyperplanes. Two-class case. The decision hypersurface in the R-dimensional feature space is a hyperplane, that is, the linear discriminant function g(p) = w^T p + b = 0, where b is known as the threshold or bias. Let p_1 and p_2 be two points on the decision hyperplane; then the following is valid: w^T p_1 + b = w^T p_2 + b = 0, hence w^T (p_1 - p_2) = 0. Since the difference vector p_1 - p_2 obviously lies on the decision hyperplane (for any p_1, p_2), it is apparent from this last expression that the vector w is always orthogonal to the decision hyperplane. [Figure: hyperplane w^T p + b = 0 with bias b, two points p_1, p_2 on the plane and the normal vector w.]

46 1.4 Support vector machines (SVM). 1. Statistical classifiers. Linear discriminant functions and decision hyperplanes. Two-class case. Let w_1 > 0, w_2 > 0 and b < 0. Then we can show that the distance of the hyperplane from the origin is
z(w, b) = |b| / ||w|| = |b| / sqrt(w_1^2 + w_2^2)
and the distance of a point p from the hyperplane is
d(w, b; p) = |w^T p + b| / ||w|| = |g(p)| / sqrt(w_1^2 + w_2^2)
i.e., g(p) is a measure of the Euclidean distance of the point p from the decision hyperplane. On one side of the plane g(p) takes positive values and on the other negative. In the special case b = 0, the hyperplane passes through the origin. [Figure: the hyperplane crosses the axes at -b/w_1 and -b/w_2; z is its distance from the origin and d the distance of p from it.]
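A quick numerical check of these formulas in MATLAB (all values illustrative):

w = [2; 1];  b = -4;     % hyperplane w'*p + b = 0
p = [3; 1];              % a test point
g = w.'*p + b;           % signed value of the discriminant, here g = 3
d = abs(g) / norm(w);    % Euclidean distance of p from the hyperplane
side = sign(g);          % +1 on one side of the plane, -1 on the other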

47 1.4 Support vector machines (SVM). 1. Statistical classifiers. Two-class linear SVM. If the hyperplane passes through the origin (w^T p = 0): for a point p_1 making an angle of less than 90° with w, w^T p_1 > 0; for a point p_2 making an angle of more than 90° with w, w^T p_2 < 0; the angle is exactly 90° for points on the hyperplane. [Figure: hyperplane through the origin with the two half-spaces w^T p > 0 and w^T p < 0.]

48 1.4 Support vector machines (SVM). Linear discriminant functions and decision hyperplanes. Two-class case. 1. Statistical classifiers. If the hyperplane is biased (does not pass through the origin), the discriminant function is g(p) = w^T p + b = 0. Let w_1 > 0, w_2 > 0 and b < 0; then z(w, b) = |b| / ||w|| and d(w, b; p) = |w^T p + b| / ||w|| = |g(p)| / sqrt(w_1^2 + w_2^2), i.e., g(p) is a measure of the Euclidean distance of the point p from the decision hyperplane. On one side of the plane g(p) takes positive values and on the other negative. In the special case b = 0, the hyperplane passes through the origin. [Figure: biased hyperplane crossing the axes at -b/w_1 and -b/w_2.]

49 1.4 Support vector machines (SVM). 1. Statistical classifiers. Support vector classifier (SVC). Traditional approach of adjusting the separation plane. Generalisation capacity: the main question now is how to find a separating hyperplane that classifies the data in an optimal way. What we really want is to minimise the probability of misclassification when classifying a set of feature vectors (field feature vectors) that are different from those used to adjust the weight parameters w and b of the hyperplane (i.e., the training feature vectors).

50 1.4 Support vector machines (SVM). 1. Statistical classifiers. Suppose two possible hyperplane solutions: both hyperplanes do the job for the training set. However, which one of the two should be chosen as the classifier for operation in practice, where data outside the training set (field data) will be fed to it? No doubt the answer is: the full-line one. This hyperplane leaves more space on either side, so that data in both classes can move a bit more freely, with less risk of causing an error. Thus such a hyperplane can be trusted more when it is faced with the challenge of operating with unknown data, i.e., it increases the generalisation performance of the classifier. We can now accept that a very sensible choice for the hyperplane classifier would be the one that leaves the maximum margin from both classes. [Figure: two candidate separating lines, a dashed one and a full one, between the two classes.]

51 1.4 Support vector machines (SVM). Support vector classifier (SVC). Linear classifiers. 1. Statistical classifiers. We consider that the data is linearly separable and wish to find the best line (or hyperplane) separating them into two classes:
w^T p_i + b >= +1 for p_i in class 1 (y_i = +1)
w^T p_i + b <= -1 for p_i in class 2 (y_i = -1)
The hypothesis space is then defined by the set of functions (decision surfaces) y_i = f_{w,b}(p_i) = sign(w^T p_i + b), y_i ∈ {-1, 1}. If the parameters w and b are scaled by the same amount, the decision surface is not changed.

52 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Why is the number x equal to +1 or -1 in w^T p_i + b = ±x, and not any other number? The parameter x can take any value, which means that the two planes can be close to or distant from one another. By fixing the value of x, and dividing both sides of the above equation by x, we obtain ±1 on the right side. However, the direction and position in space of the hyperplane do not change in the two cases. The same applies to the hyperplane described by the equation w^T p_i + b = 0: normalising by a constant value x has no effect on the points that are on (and define) the hyperplane. [Figure: the three parallel hyperplanes w^T p + b = +1, 0, -1.]

53 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. To avoid this redundancy, and to match each decision surface to a unique pair (w, b), it is appropriate to constrain the parameters w, b by min_i |w^T p_i + b| = 1. The set of hyperplanes defined by this constraint are called canonical hyperplanes [1]. This constraint is just a normalisation that is suitable for the optimisation problem.

54 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Here we assume that the data are linearly separable, which means that we can draw a line on a graph of p_1 vs. p_2 separating the two classes when R = 2, and a hyperplane on the graphs of p_1, p_2, ..., p_R when R > 2. As shown before, the distance from the nearest instance in the data set to the line or hyperplane is d(w, b; p_i) = |w^T p_i + b| / ||w||. [Figure: separating line or hyperplane w^T p + b = 0 with the normal vector w.]

55 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. The optimal separating hyperplane is the one that minimises the mean square error (mse) between the desired results (+1 or -1) and the actual results obtained when classifying the given data into the 2 classes 1 and 2, respectively. This mse criterion turns out to be optimal when the statistical properties of the data are Gaussian. But if the data is not Gaussian, the result will be biased.

56 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Example: in the following figure, two Gaussian clusters of data are separated by a hyperplane. It is adjusted using a minimum mse criterion. Samples of both classes have the minimum possible mean squared distance to the hyperplanes w^T p + b = ±1. [Figure: two Gaussian clusters with the hyperplanes w^T p + b = +1, 0, -1.]

57 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Example (cont.): in the following figure, the same procedure is applied to a data set that is non-Gaussian (or Gaussian corrupted by some outliers that are far from the central group), thus biasing the result. [Figure: the same hyperplanes w^T p + b = +1, 0, -1 pulled away by outliers.]

58 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. SVM approach. In the classical classification approaches, it is considered that a classification error is committed by a point if it is on the wrong side of the decision hyperplane formed by the classifier. In the SVM approach more constraints are imposed: not only do instances on the wrong side of the classifier contribute to the counting of the error, but so does any instance that lies between the hyperplanes w^T p + b = ±1, even if it is on the right side of the classifier. Only instances that are outside these limits and on the right side of the classifier do not contribute to the error cost.

59 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Example: two overlapping classes and two linear classifiers, denoted by a dash-dotted and a solid line, respectively. For both cases, the limits have been chosen to include five points. Observe that for the case of the dash-dotted classifier, in order to include five points the margin had to be made narrow.

60 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Imagine that the open and filled circles in the previous figure are houses in two nearby villages and that a road must be constructed between the two villages. One has to decide where to construct the road so that it will be as wide as possible and incur the least cost (in the sense of demolishing the smallest number of houses). No sensible engineer would choose the dash-dotted option. The idea is similar when designing a classifier. It should be placed between the highly populated (high probability density) areas of the two classes and in a region that is sparse in data, leaving the largest possible margin. This is dictated by the requirement for good generalisation performance that any classifier has to exhibit. That is, the classifier must exhibit good error performance when it is faced with data outside the training set (validation, test or field data).

61 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. To solve the above problem, we maintain for now the assumption that the data are separable without misclassification by a linear hyperplane. The optimality criterion is: put the separating hyperplane as far as possible from the nearest instances, while keeping all the instances on their correct side. [Figure: separating hyperplane w^T p + b = 0.] (*) CAUTION: in some books and papers, the margin is defined as the distance 2d.

62 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. This translates into: maximising the margin d between the separating hyperplane and the nearest instances, while placing the margin hyperplanes w^T p + b = ±1 at the edges of the separation margin. [Figure: separating hyperplane w^T p + b = 0 with the margin hyperplanes w^T p + b = +1 and w^T p + b = -1.]

63 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. One can reformulate the SVM criterion as: maximise the distance d between the separating hyperplane and the nearest samples, subject to the constraints y_i [w^T p_i + b] >= 1, where y_i ∈ {+1, -1} is the class label associated with the instance p_i.

64 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. The margin width (= 2d) between the margin hyperplanes is
M(w, b) = 2 / ||w||
where ||.|| (sometimes denoted ||.||_2) denotes the Euclidean norm. Demonstration: the margin is the sum of the distances from the separating hyperplane to the nearest instance of each class,
M(w, b) = d(w, b; p_+) + d(w, b; p_-) = |w^T p_+ + b| / ||w|| + |w^T p_- + b| / ||w||
and the canonical-hyperplane normalisation makes |w^T p_i + b| = 1 for those nearest instances, so
M(w, b) = 1/||w|| + 1/||w|| = 2 / ||w||.

65 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Maximisation of d is equivalent to solving a quadratic optimisation problem: minimise the norm of the vector w. This gives a more useful expression of the SVM criterion:
min_{w,b} ||w||  subject to the constraints y_i [w^T p_i + b] >= 1, i = 1, 2, ..., nq
Minimising ||w|| is equivalent to minimising (1/2)||w||^2, and the use of this term then allows optimisation by quadratic programming:
min_{w,b} (1/2) ||w||^2  subject to the constraints y_i [w^T p_i + b] - 1 >= 0, i = 1, 2, ..., nq
Reminder: ||w||^2 = w^T w.

66 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. In practical situations the samples are not linearly separable, so the previous constraints cannot all be satisfied. For that reason, slack variables must be introduced to account for the non-separable samples [33]. The optimisation criterion then consists of minimising the (primal) functional [33, 21]:
min_{w,b} (1/2) ||w||^2 + C sum_{i=1}^{nq} ξ_i
subject to the constraints y_i [w^T p_i + b] >= 1 - ξ_i, with ξ_i >= 0, i = 1, 2, 3, ..., nq.
For a simple introduction to the derivation of SVM optimisation procedures, see for example [20-23].

67 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. If the instance p_i is correctly classified by the hyperplane and it is outside the margin, its corresponding slack variable is ξ_i = 0. If it is well classified but it lies inside the margin, then 0 < ξ_i < 1. If the sample is misclassified, then ξ_i > 1. The value of C is a trade-off between the maximisation of the margin and the minimisation of the errors. [Figure: SVM of 2 non linearly separable classes.]

68 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Once the criterion of optimality has been established, we need a method for finding the parameter vector w which meets it. The optimisation problem in the last equations is a classical constrained optimisation problem. In order to solve it, one must apply a Lagrange optimisation procedure with as many Lagrange multipliers λ_i as constraints [22].

69 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Optimal estimation of w (for the demonstration, see for example [34, 21]). Minimising the cost is a compromise between a large margin and few margin errors. The solution is given as a weighted average of the training instances:
w* = sum_{i=1}^{nq} λ_i y_i p_i
The coefficients λ_i, with 0 <= λ_i <= C, are the Lagrange multipliers of the optimisation task; they are zero for all instances outside the margin and on the right side of the classifier. These instances do not contribute to the determination of the direction of the classifier (the direction of the hyperplanes defined by w). The remaining instances, with non-zero λ_i, which contribute to the construction of w*, are called support vectors.

70 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Selecting a value for the parameter C. The free parameter C controls the relative importance of minimising the norm of w (which is equivalent to maximising the margin) and satisfying the margin constraint for each data point. The margin of the solution increases as C decreases. This is natural, because reducing C makes the margin term in the functional (1/2)||w||^2 + C sum_{i=1}^{nq} ξ_i more important. In practice, several SVM classifiers must be trained, using training as well as test data, with different values of C (e.g., from min. to max. in {0.1, 0.2, 0.5, 1, 2, 20}), and the classifier which gives the minimum test error is selected.
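A sketch of this search over C is given below; train_linear_svc is a hypothetical helper that wraps the quadprog-based two-class training shown on the following slides, and Ptrain, ytrain, Ptest, ytest are assumed to be available.

Cs = [0.1 0.2 0.5 1 2 20];
test_err = zeros(size(Cs));
for k = 1:numel(Cs)
    [w, b] = train_linear_svc(Ptrain, ytrain, Cs(k));  % hypothetical helper (see next slides)
    ypred  = sign(Ptest*w + b);
    test_err(k) = mean(ypred ~= ytest);                % test error for this value of C
end
[~, kbest] = min(test_err);
C_best = Cs(kbest);                                    % retained trade-off parameter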

71 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Estimation of b. Any instance p_s which is a support vector, together with its desired response y_s, satisfies
y_s [w*^T p_s + b] = 1,  i.e.  y_s ( sum_{j ∈ S} λ_j y_j p_j^T p_s + b ) = 1
where S is the index set of the support vectors (λ_j > 0). Multiplying by y_s and using (y_s)^2 = 1 gives
b = y_s - sum_{j ∈ S} λ_j y_j p_j^T p_s
Instead of using an arbitrary support vector p_s, it is better to average over all support vectors in S:
b = (1/N_S) sum_{s ∈ S} ( y_s - sum_{j ∈ S} λ_j y_j p_j^T p_s ).

72 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Practical implementation of the SVM algorithm for the separation of 2 linearly separable classes. 1. Create a matrix H, with H_ij = y_i y_j p_i^T p_j, i, j = 1, 2, ..., nc. 2. Select a value for the parameter C (from min to max, e.g. C in {0.1, 0.2, 0.5, 1, 2, 20}). 3. Find Λ = {λ_1, λ_2, ..., λ_nc} such that the quantity
sum_{i=1}^{nc} λ_i - (1/2) λ^T H λ
is maximised (using a quadratic programming solver), subject to the constraints 0 <= λ_i <= C and sum_{i=1}^{nc} λ_i y_i = 0.

73 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Practical implementation of the SVM algorithm for the separation of 2 linearly separable classes (cont.). 4. Calculate w* = sum_{i=1}^{nq} λ_i y_i p_i. 5. Determine the set of support vectors S by finding the indexes such that 0 < λ_i <= C. 6. Calculate b = (1/N_S) sum_{s ∈ S} ( y_s - sum_{j ∈ S} λ_j y_j p_j^T p_s ). 7. Each new vector p is classified by the following evaluation: if w^T p + b >= 1, then y = +1 and p ∈ class 1; if w^T p + b <= -1, then y = -1 and p ∈ class 2. 8. Calculate the training and the test errors using test data. 9. Repeat from step 2 (construct another classifier) with the next value of C. 10. Choose the best classifier, i.e. the one that minimises the test error with the minimum number of support vectors.

74 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. Example [20]: linear SVM classifier (SVC) in MATLAB. An easy way to program a linear SVC is to use the MATLAB quadratic programming routine "quadprog.m". First, generate a set of data in two dimensions with a few instances from two classes with this simple code:
p = [randn(1,10)-1 randn(1,10)+1; randn(1,10)-1 randn(1,10)+1]';
y = [-ones(1,10) ones(1,10)]';
This generates a matrix of n = 20 row vectors in two dimensions. We study the performance of the SVC on a non-separable set. The first 10 samples are labelled -1 (class 1) and the rest +1 (class 2).

75 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers.
%% Linear Support Vector Classifier
%%%%%%%%%%%%%%%%%%%%%%%
%%% Data Generation %%%
%%%%%%%%%%%%%%%%%%%%%%%
x=[randn(1,10)-1 randn(1,10)+1;randn(1,10)-1 randn(1,10)+1]';
y=[-ones(1,10) ones(1,10)]';
%%%%%%%%%%%%%%%%%%%%%%%%
%%% SVC Optimization %%%
%%%%%%%%%%%%%%%%%%%%%%%%
R=x*x';                        % Dot products
Y=diag(y);
H=Y*R*Y+1e-6*eye(length(y));   % Matrix H regularized
f=-ones(size(y));
a=y'; K=0;
Kl=zeros(size(y));
C=100;                         % Functional trade-off
Ku=C*ones(size(y));
alpha=quadprog(H,f,[],[],a,K,Kl,Ku);   % Solver
w=x'*(alpha.*y);               % Parameters of the hyperplane
%%% Computation of the bias b %%%
e=1e-6;                        % Tolerance to errors in alpha
ind=find(alpha>e & alpha<C-e)  % Search for 0 < alpha_i < C
b=mean(y(ind) - x(ind,:)*w)    % Averaged result
%%%%%%%%%%%%%%%%%%%%%%
%%% Representation %%%
%%%%%%%%%%%%%%%%%%%%%%
data1=x(find(y==1),:);
data2=x(find(y==-1),:);
svc=x(find(alpha>e),:);
plot(data1(:,1),data1(:,2),'o')
hold on
plot(data2(:,1),data2(:,2),'*')
plot(svc(:,1),svc(:,2),'s')
% Separating hyperplane
plot([-3 3],[(3*w(1)-b)/w(2) (-3*w(1)-b)/w(2)])
% Margin hyperplanes
plot([-3 3],[(3*w(1)-b)/w(2)+1 (-3*w(1)-b)/w(2)+1],'--')
plot([-3 3],[(3*w(1)-b)/w(2)-1 (-3*w(1)-b)/w(2)-1],'--')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Test Data Generation %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x=[randn(1,10)-1 randn(1,10)+1;randn(1,10)-1 randn(1,10)+1]';
y=[-ones(1,10) ones(1,10)]';
y_pred=sign(x*w+b);            % Test
error=mean(y_pred~=y);         % Error computation

76 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. [Figure: generated data.]

77 [Figure: the values of the Lagrange multipliers λ_i. Circles and squares correspond to the circles and stars of the data in the previous and the next figures.]

78 1.4 Support vector machines (SVM). Support vector classifier (SVC). 1. Statistical classifiers. [Figure: resulting margin and separating hyperplanes. Support vectors are marked by squares.]

79 1. Statistical classifiers. Linear support vector regressor (LSVR). A linear regressor is a function f(p) = w^T p + b which allows for an approximation of a mapping from a set of vectors p ∈ R^R to a set of scalars y ∈ R. [Figure: linear regression line w^T p + b fitted to the data.]

80 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). Instead of trying to classify new variables into two categories y = ±1, we now want to predict a real-valued output y ∈ R. The main idea of an SVR is to find a function which fits the data with a deviation less than a given quantity ε for every single pair (p_i, y_i). At the same time, we want the solution to have a minimum norm ||w||. This means that the SVR does not minimise errors smaller than ε, but only larger errors.

81 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). 1. Statistical classifiers. Formulation of the SVR. The idea of adjusting the linear regressor can be formulated in the following primal functional, in which we minimise the norm of w plus the total error:
L = (1/2) ||w||^2 + C sum_{i=1}^{n} (ξ_i + ξ'_i)
subject to the constraints
y_i - w^T p_i - b <= ε + ξ_i
-y_i + w^T p_i + b <= ε + ξ'_i
ξ_i, ξ'_i >= 0.

82 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). The previous constraints mean: for each instance, if the error is > 0 and |error| > ε, then the error is forced to be less than ξ_i + ε; if the error is < 0 and |error| > ε, then the error is forced to be less than ξ'_i + ε. If |error| < ε, then the corresponding slack variable will be zero, as this is the minimum allowed value for the slack variables in the previous constraints. This is the concept of ε-insensitivity [2].

83 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). Concept of ε-insensitivity: only instances outside the ±ε tube around the regressor will have a nonzero slack variable, so they will be the only ones that will be part of the solution. [Figure: ε-insensitive tube around the regression line.]

84 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). The functional is intended to minimise the sum of the slack variables ξ_i and ξ'_i. Only the losses of samples for which the error is greater than ε appear, so the solution will be a function of those samples only. The applied cost function is a linear one, so the described procedure is equivalent to the application of the so-called Vapnik or ε-insensitive cost function:
l_ε(e_i) = 0 for |e_i| < ε, and l_ε(e_i) = |e_i| - ε for |e_i| >= ε.
[Figure: Vapnik or ε-insensitive cost function.]
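In MATLAB the ε-insensitive cost can be written as a one-line anonymous function (the value of ε is illustrative):

epsilon = 0.1;
l_eps = @(e) max(0, abs(e) - epsilon);  % zero inside the tube |e| <= epsilon, linear outside
l_eps([-0.05 0.05 0.3 -0.4])            % returns [0 0 0.2 0.3]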

85 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). This procedure is similar to the one applied to the SVC. In principle, we should force the errors to be less than ε while minimising the norm of the parameters. Nevertheless, in practical situations it may not be possible to force all the errors to be less than ε. In order to be able to solve the functional, we introduce slack variables in the constraints, and then we minimise them.

86 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). To solve this constrained optimisation problem, we can apply the Lagrange optimisation procedure to convert it into an unconstrained one. The resulting dual functional is [20-23, 34]:
L_d = -(1/2) sum_{i=1}^{nc} sum_{j=1}^{nc} (λ_i - λ'_i)(λ_j - λ'_j) p_i^T p_j + sum_{i=1}^{nc} ( (λ_i - λ'_i) y_i - (λ_i + λ'_i) ε )
with the additional constraint 0 <= λ_i, λ'_i <= C.

87 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). The important result of this derivation is that the expression of the parameter vector w is
w = sum_{i=1}^{nc} (λ_i - λ'_i) p_i   and   sum_{i=1}^{nc} (λ_i - λ'_i) = 0
In order to find the bias, we just need to recall that for all samples that lie on one of the two margins, the error is exactly ε; for those samples λ_i, λ'_i < C. Once these samples are identified, we can solve for b from the following equations:
y_i - w^T p_i - b + ε = 0
-y_i + w^T p_i + b + ε = 0
for the instances p_i for which 0 <= λ_i, λ'_i < C.

88 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). In matrix notation we get
L_d = -(1/2) (λ - λ')^T R (λ - λ') + (λ - λ')^T y - (λ + λ')^T 1 ε
where R is the dot product matrix, R_ij = p_i^T p_j. This functional can be maximised using the same procedure used for the SVC. Very small eigenvalues may eventually appear in the matrix, so it is convenient to regularise it numerically by adding a small diagonal matrix to it. The functional becomes
L_d = -(1/2) (λ - λ')^T [R + γI] (λ - λ') + (λ - λ')^T y - (λ + λ')^T 1 ε
where γ is a regularisation constant. This numerical regularisation is equivalent to the application of a modified version of the cost function.

89 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). We need to compute the dot product matrix R and then the quadratic form (λ - λ')^T [R + γI] (λ - λ'), but the Lagrange multipliers λ_i and λ'_i must be kept separate so that they can be identified after the optimisation. To achieve this we use the equivalent form
[λ; λ']^T ( [R, -R; -R, R] + γ [I, -I; -I, I] ) [λ; λ']
We can use the Matlab function quadprog.m to solve the optimisation problem.

90 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). Example [20]: linear SVR in MATLAB. We start by writing a simple linear model of the form y(x) = ax + b + n, where x is a random variable and n is a Gaussian process.
P = rand(100,1);               % Generate 100 uniform instances in [0, 1]
y = 1.5*P+1+0.1*randn(100,1);  % Linear model plus noise
[Figure: generated data.]

91 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). 1. Statistical classifiers.
%% Linear Support Vector Regressor
%%%%%%%%%%%%%%%%%%%%%%%
%%% Data Generation %%%
%%%%%%%%%%%%%%%%%%%%%%%
x=rand(30,1);                 % Generate 30 samples
y=1.5*x+1+0.2*randn(30,1);    % Linear model plus noise
%%%%%%%%%%%%%%%%%%%%%%%%
%%% SVR Optimization %%%
%%%%%%%%%%%%%%%%%%%%%%%%
R_=x*x';
R=[R_ -R_;-R_ R_];
a=[ones(size(y')) -ones(size(y'))];
y2=[y;-y];
H=(R+1e-9*eye(size(R,1)));
epsilon=0.1;
C=100;
f=-y2'+epsilon*ones(size(y2'));
K=0;
K1=zeros(size(y2'));
Ku=C*ones(size(y2'));
alpha=quadprog(H,f,[],[],a,K,K1,Ku);   % Solver
beta=(alpha(1:end/2)-alpha(end/2+1:end));
w=beta'*x;
%% Computation of bias b %%
e=1e-6;                       % Tolerance to errors in alpha
ind=find(abs(beta)>e & abs(beta)<C-e)  % Search for 0 < alpha_i < C
b=mean(y(ind) - x(ind,:)*w)   % Averaged result
%%%%%%%%%%%%%%%%%%%%%%
%%% Representation %%%
%%%%%%%%%%%%%%%%%%%%%%
plot(x,y,'.')                 % All data
hold on
ind=find(abs(beta)>e);
plot(x(ind),y(ind),'s')       % Support vectors
plot([0 1],[b w+b])           % Regression line
plot([0 1],[b+epsilon w+b+epsilon],'--')   % Margins
plot([0 1],[b-epsilon w+b-epsilon],'--')
plot([0 1],[1 2.5],':')       % True model y = 1.5*x + 1

92 1. Statistical classifiers. 1.4 Support vector machines (SVM). Linear support vector regressor (LSVR). Results: continuous line: SVR; dotted line: real linear model; dashed lines: margins; square points: support vectors. Figure adapted from [20].

93 1. Statistical classifiers. LMCSVM: linear multiclass SVM. Although mathematical generalisations for the multiclass case are available, the task tends to become rather complex. When more than two classes are present, there are several different approaches that evolve around the 2-class case. The most used methods are called one-versus-all and one-versus-one. These techniques are not tailored to the SVM; they are general and can be used with any classifier developed for the 2-class problem.

94 1.4 Support vector machines (SVM). Linear multiclass SVM (LMCSVM). 1. Statistical classifiers. One-versus-all. The one-versus-all method builds Q binary classifiers by assigning the label +1 to instances from one class and the label -1 to instances from all the others. For example, in a 4-class problem, we construct 4 binary classifiers: {c1/{c2, c3, c4}, c2/{c1, c3, c4}, c3/{c1, c2, c4}, c4/{c1, c2, c3}}. For each one of the classes, we seek to design an optimal discriminant function g_q(p), q = 1, 2, ..., Q, so that g_q(p) > g_q'(p), q' ≠ q, if p ∈ c_q. Adopting the SVM methodology, we can design the discriminant functions so that g_q(p) = 0 is the optimal hyperplane separating class c_q from all the others. Thus, each classifier is designed to give g_q(p) > 0 for p ∈ c_q and g_q(p) < 0 otherwise.

95 1.4 Support vector machines (SVM). Linear multiclass SVM (LMCSVM). 1. Statistical classifiers. According to the one-versus-all method, Q classifiers have to be designed. Each one of them is designed to separate one class from the rest. For the SVM paradigm, we have to design Q linear classifiers: w_k^T p + b_k, k = 1, 2, ..., Q. For example, to design classifier c_1, we consider the training data of all classes other than c_1 to form the second class. Obviously, unless an error is committed, we expect all points from class c_1 to result in w_1^T p + b_1 >= +1, and the data from the rest of the classes to result in negative outcomes, w_m^T p + b_m <= -1, m ≠ 1. A vector p is classified in c_l if w_l^T p + b_l > w_m^T p + b_m for all m ≠ l, l, m = 1, 2, ..., Q.

96 1.4 Support vector machines (SVM). Linear multiclass SVM (LMCSVM). 1. Statistical classifiers. The classifier giving the highest margin wins the vote: assign p to c_q if q = arg max_{q'} g_{q'}(p). A drawback of one-versus-all is that, after training, there are regions in the space where no training data lie for which more than one hyperplane gives a positive value, or all of them result in negative values.
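The following MATLAB sketch implements the one-versus-all scheme around a two-class linear SVC; train_linear_svc is the same hypothetical helper as before (wrapping the quadprog-based code of the previous slides), and the labels in ytrain are assumed to be 1, ..., Q.

Q = max(ytrain);                       % number of classes
W = zeros(size(Ptrain,2), Q);  B = zeros(1, Q);
for q = 1:Q
    yq = 2*(ytrain == q) - 1;          % +1 for class q, -1 for all the others
    [W(:,q), B(q)] = train_linear_svc(Ptrain, yq, C);
end
scores = Ptest*W + repmat(B, size(Ptest,1), 1);   % g_q(p) for every test vector and class
[~, ypred] = max(scores, [], 2);                  % assign the class with the largest g_q(p)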

97 1.4 Support vector machines (SVM). Linear multiclass SVM (LMCSVM). 1. Statistical classifiers. One-versus-one. The more widely used one-versus-one method constructs Q(Q-1)/2 binary classifiers (each classifier separates a pair of classes) by confronting each one of the Q classes with each of the others. For example, in a 4-class problem, we construct 6 binary classifiers: {{c1/c2}, {c1/c3}, {c1/c4}, {c2/c3}, {c2/c4}, {c3/c4}}. In a 3-class problem, we construct 3 binary classifiers: {{c1/c2}, {c1/c3}, {c2/c3}}. In the classification phase, the instance to classify is analysed by each classifier and a majority vote determines its class. The obvious disadvantage of the technique is that a relatively large number of binary classifiers has to be trained. In [Platt 00] a methodology is suggested that may speed up the procedure. [Platt 00] Platt J.C., Cristianini N., Shawe-Taylor J., "Large margin DAGs for multiclass classification", in Advances in Neural Information Processing Systems (Solla S.A., Leen T.K., Müller K.R., eds.), Vol. 12, MIT Press, 2000.

98 1.4.1 Nonlinear SVM. 1. Statistical classifiers. Non-linear mapping of the feature vectors p into a high-dimensional space. We adopt the philosophy of non-linear mapping of the feature vectors into a space of higher dimension, where we expect, with high probability, that the classes are linearly separable. This is guaranteed by the famous theorem of Cover [34, 35].

99 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. [Diagram: non-linear low dimensional space (p) → non-linear kernel function φ(p) → linear high dimensional space → linear SVM.]

100 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. Mapping: p_i ∈ R^R → φ(p_i) ∈ H, where the dimension of H is greater than R, depending on the choice of the nonlinear function φ(·). In addition, if the function φ(·) is carefully chosen from a known family of functions that have specific desirable properties, the inner (or dot) product <φ(p_i), φ(p_j)> between the images of two input vectors p_i, p_j can be written as <φ(p_i), φ(p_j)> = k(p_i, p_j), where <·,·> denotes the inner product in H and k(·,·) is a known function, called the kernel function.

101 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. In other words, inner products in the high-dimensional space can be computed in terms of the kernel function acting in the original space of low dimension. The space H associated with k(·,·) is known as a reproducing kernel Hilbert space (RKHS) [35, 36].

102 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. Two typical examples of kernel functions: (a) the radial basis function (RBF) is a real-valued function k(p_i, p_j) = φ(||p_i - p_j||). The norm is usually the Euclidean distance, although other distance functions are also possible. Sums of radial functions are typically used to approximate a given function. This approximation process can also be interpreted as a kind of simple neural network.

103 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. Examples of RBFs. Let r = ||p_i - p_j||:
Gaussian: φ(r) = exp(-r^2 / (2σ^2)), where σ is a user-defined parameter which specifies the decay rate of k(p_i, p_j) towards zero.
Multiquadric: φ(r) = sqrt(1 + (εr)^2)
Inverse quadratic: φ(r) = 1 / (1 + (εr)^2)
Inverse multiquadric: φ(r) = 1 / sqrt(1 + (εr)^2)
Polyharmonic spline: φ(r) = r^k, k = 1, 3, 5, ...; φ(r) = r^k ln(r), k = 2, 4, 6, ...
Thin plate spline (a special polyharmonic spline): φ(r) = r^2 ln(r)

104 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. (b) Polynomial function (PF): k(p_i, p_j) = (p_i^T p_j + β)^n, where β and n are user-defined parameters. Note that the resolution of a linear problem in the high-dimensional space is equivalent to the resolution of a non-linear problem in the original space.
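The MATLAB sketch below builds the Gaussian (RBF) and polynomial kernel matrices for a data matrix X whose rows are feature vectors; sigma, beta and n are the user-defined parameters mentioned above, and the data are random for illustration only.

X = randn(20, 2);  sigma = 1;  beta = 1;  n = 3;
D2 = bsxfun(@plus, sum(X.^2,2), sum(X.^2,2)') - 2*(X*X');  % squared Euclidean distances
K_rbf  = exp(-D2 / (2*sigma^2));   % k(p_i,p_j) = exp(-||p_i - p_j||^2 / (2 sigma^2))
K_poly = (X*X' + beta).^n;         % k(p_i,p_j) = (p_i^T p_j + beta)^n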

105 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. Although a (linear) classifier is formed in the RKHS, due to the non-linearity of the mapping function φ(·), the method is equivalent to a nonlinear function in the original space. Moreover, since every operation can be expressed in terms of inner products, explicit knowledge of φ(·) is not necessary. All that is necessary is to adopt the kernel function which defines the inner product.

106 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. (c) Sigmoid: k(p_i, p_j) = tanh(γ p_i^T p_j + µ). (d) Dirichlet: k(p_i, p_j) = sin((n + 1/2)(p_i - p_j)) / (2 sin((p_i - p_j)/2)).

107 1.4 Support vector machines (SVM). Nonlinear SVM. Construction of a nonlinear SVC. 1. Statistical classifiers. The solution of the linear SVC is given by a linear combination of a subset of the training data:
w = sum_{i=1}^{nc} λ_i y_i p_i
If, before optimisation, the data are mapped into a Hilbert space, then the solution becomes
w = sum_{i=1}^{nc} λ_i y_i φ(p_i)   (A)
where φ is a nonlinear mapping function.

108 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. The parameter vector w is a combination of vectors in the Hilbert space, but recall that many transformations φ(·) are unknown. Thus, we may not have an explicit form for them. But the problem can still be solved, because the SVM just needs the dot products of the vectors, not an explicit form of them.

109 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. We cannot use the expression y_j = w^T φ(p_j) + b (B) directly, because the parameters w live in an infinite-dimensional space, so no explicit expression exists for them. However, by substituting equation (A) into equation (B), we obtain
y_j = sum_{i=1}^{nc} y_i λ_i φ(p_i)^T φ(p_j) + b = sum_{i=1}^{nc} y_i λ_i K(p_i, p_j) + b   (C)

110 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. In linear algebra, the Gram matrix (or Gramian) of a set of vectors v_1, ..., v_n in an inner product space is the Hermitian matrix of inner products, whose entries are given by G_ij = <v_i, v_j>. (D) Jørgen Pedersen Gram (June 27, 1850 - April 29, 1916) was a Danish actuary and mathematician who was born in Nustrup, Duchy of Schleswig, Denmark and died in Copenhagen, Denmark.

111 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. The resulting SVM can now be expressed directly in terms of the Lagrange multipliers and the kernel dot products. In order to solve the dual functional which determines the Lagrange multipliers, the transformed vectors φ(p_i) and φ(p_j) are not required either, but only the Gram matrix K of the dot products between them. Again, the kernel is used to compute this matrix: K_ij = K(p_i, p_j). (E)

112 1.4 Support vector machines (SVM). Nonlinear SVM. 1. Statistical classifiers. Once this matrix has been computed, solving for a nonlinear SVM is as easy as solving for a linear one, as long as the matrix is positive definite. It can be shown that if the kernel satisfies the Mercer theorem, the matrix will be positive definite [25]. In order to compute the bias b, we can still make use of the expression y_j (w^T p_j + b) - 1 = 0, but for the nonlinear SVC it becomes
y_j ( sum_{i=1}^{nc} y_i λ_i K(p_i, p_j) + b ) - 1 = 0   (F)
for the instances p_j for which λ_j < C. We just need to extract b from expression (F) and average it over all samples with λ_j < C. (*) Explicit conditions that must be met by a kernel function: it must be symmetric and positive semi-definite.

113 1.4 Support vector machines (SVM). Nonlinear SVM. Example [20]: 1. Statistical classifiers. Nonlinear support vector classifier (NLSVC) in MATLAB. In this example, we try to classify a set of data which cannot reasonably be classified using a linear hyperplane. We generate a set of 40 training vectors using this code:
k=20;                    % Number of training data per class
ro=2*pi*rand(k,1);
r=5+randn(k,1);
x1=[r.*cos(ro) r.*sin(ro)];
x2=[randn(k,1) randn(k,1)];
x=[x1;x2];
y=[-ones(1,k) ones(1,k)]';

114 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. 1. Statistical classifiers. Example: NLSVC (cont.). We generate a set of 100 test vectors using this code:
ktest=50;                % Number of test data per class
ro=2*pi*rand(ktest,1);
r=5+randn(ktest,1);
p1=[r.*cos(ro) r.*sin(ro)];
p2=[randn(ktest,1) randn(ktest,1)];
ptest=[p1;p2];
ytest=[-ones(1,ktest) ones(1,ktest)]';

115 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.). 1. Statistical classifiers. [Figure: an example of the generated data, axes x_1 and x_2.]

116 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.). 1. Statistical classifiers. The steps of the SVC procedure: 1. Calculate the inner product matrix K_ij = K(p_i, p_j). Since we want a non-linear classifier, we compute the inner product matrix using a kernel. Choose a kernel: in this example, we choose a Gaussian kernel K(p_i, p_j) = exp(-γ ||p_i - p_j||^2), with γ = 1/(2σ^2).

117 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.). 1. Statistical classifiers. The steps of the SVC procedure (cont.):
nc=2*k;      % Number of data
sigma=1;     % Parameter of the kernel
D=buffer(sum([kron(x,ones(nc,1)) - kron(ones(1,nc),x')'].^2,2),nc,0)
% This is a recipe for fast computation of a matrix of distances in MATLAB
% using the Kronecker product
R=exp(-D/(2*sigma));   % Kernel matrix
* In mathematics the Kronecker product is an operation on matrices. It is a special case of the tensor product. It is so named in honour of the German mathematician Leopold Kronecker.

118 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.). 1. Statistical classifiers. The steps of the SVC procedure (cont.). 2. Optimisation procedure: once the matrix has been obtained, the optimisation procedure is exactly the same as the one for the linear case, except for the fact that we cannot have an explicit expression for the parameters w.

119 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.), training. 1. Statistical classifiers.
%%%%%%%%%%%%%%%%%%%%%%%
%%% Data Generation %%%
%%%%%%%%%%%%%%%%%%%%%%%
k=20;       % Number of training data per class
ro=2*pi*rand(k,1);
r=5+randn(k,1);
x1=[r.*cos(ro) r.*sin(ro)];
x2=[randn(k,1) randn(k,1)];
x=[x1;x2];
y=[-ones(1,k) ones(1,k)]';
ktest=50;   % Number of test data per class
ro=2*pi*rand(ktest,1);
r=5+randn(ktest,1);
x1=[r.*cos(ro) r.*sin(ro)];
x2=[randn(ktest,1) randn(ktest,1)];
xtest=[x1;x2];
ytest=[-ones(1,ktest) ones(1,ktest)]';
%%%%%%%%%%%%%%%%%%%%%%%%
%%% SVC Optimization %%%
%%%%%%%%%%%%%%%%%%%%%%%%
N=2*k;      % Number of data
sigma=2;    % Parameter of the kernel
D=buffer(sum([kron(x,ones(N,1)) - kron(ones(1,N),x')'].^2,2),N,0);
% This is a recipe for fast computation of a matrix of distances in MATLAB
R=exp(-D/(2*sigma));            % Kernel matrix
Y=diag(y);
H=Y*R*Y+1e-6*eye(length(y));    % Matrix H regularized
f=-ones(size(y));
a=y'; K=0;
Kl=zeros(size(y));
C=100;      % Functional trade-off
Ku=C*ones(size(y));
e=1e-6;     % Tolerance to errors in alpha
alpha=quadprog(H,f,[],[],a,K,Kl,Ku);   % Solver
ind=find(alpha>e);              % Support vector indices
x_sv=x(ind,:);                  % Extraction of the support vectors
N_SV=length(ind);               % Number of SV
%%% Computation of the bias b %%%
ind_margin=find(alpha>e & alpha<C-e);  % Search for 0 < alpha_i < C
N_margin=length(ind_margin);
D=buffer(sum([kron(x_sv,ones(N_margin,1)) - kron(ones(1,N_SV),x(ind_margin,:)')'].^2,2),N_margin,0);
% Kernel matrix between margin points and support vectors
R_margin=exp(-D/(2*sigma));
y_margin=R_margin*(y(ind).*alpha(ind));
b=mean(y(ind_margin) - y_margin);      % Averaged result

120 1.4 Support vector machines (SVM). Nonlinear SVM-SVC. Example: NLSVC (cont.), test. 1. Statistical classifiers.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Support Vector Classifier %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N_test=2*ktest;   % Number of test data
%% Computation of the kernel matrix %%
D=buffer(sum([kron(x_sv,ones(N_test,1)) - kron(ones(1,N_SV),xtest')'].^2,2),N_test,0);
R_test=exp(-D/(2*sigma));
% Output of the classifier
y_output=sign(R_test*(y(ind).*alpha(ind))+b);
errors=sum(ytest~=y_output)   % Error computation
%%%%%%%%%%%%%%%%%%%%%%
%%% Representation %%%
%%%%%%%%%%%%%%%%%%%%%%
data1=x(find(y==1),:);
data2=x(find(y==-1),:);
svc=x(find(alpha>e),:);
plot(data1(:,1),data1(:,2),'o')
hold on
plot(data2(:,1),data2(:,2),'*')
plot(svc(:,1),svc(:,2),'s')
g=(-8:0.1:8)';    % Grid between -8 and 8
x_grid=[kron(g,ones(length(g),1)) kron(ones(length(g),1),g)];
N_grid=length(x_grid);
D=buffer(sum([kron(x_sv,ones(N_grid,1)) - kron(ones(1,N_SV),x_grid')'].^2,2),N_grid,0);
% Computation of the kernel matrix on the grid
R_grid=exp(-D/(2*sigma));
y_grid=(R_grid*(y(ind).*alpha(ind))+b);
contour(g,g,buffer(y_grid,length(g),0),[0 0])   % Boundary drawing

121 1.4 Support vector machines (SVM) Nonlinear SVM-SVC NLSVM 1. Statistical classifiers
Separating boundary, margins and support vectors for the nonlinear SVC example

122 1.4 Support vector machines (SVM) Nonlinear SVM - Nonlinear support vector regressor (NLSVR) 1. Statistical classifiers
The solution of the linear SVR is
    w = Σ_{i=1}^{nc} (λ_i - λ'_i) p_i
Its nonlinear counterpart will have the expression
    w = Σ_{i=1}^{nc} (λ_i - λ'_i) φ(p_i)
Following the same procedure as in the SVC, one can find the expression of the nonlinear SVR:
    y_j = Σ_{i=1}^{nc} (λ_i - λ'_i) φ(p_i)^T φ(p_j) + b = Σ_{i=1}^{nc} (λ_i - λ'_i) K(p_i, p_j) + b
The construction of a nonlinear SVR is almost identical to the construction of the nonlinear SVC.
Exercise: write MATLAB code for this NLSVR.
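One possible solution sketch for the exercise is given below. It is only an illustrative outline, not the course's reference solution: it assumes training inputs x (n-by-d) and real-valued targets y (n-by-1) already exist, uses the same Gaussian kernel convention exp(-D/(2*sigma)) as the SVC code, and the hyperparameters sigma, epsilon and C are arbitrary choices.
sigma=1; epsilon=0.1; C=100;       % assumed hyperparameters
n=size(x,1);
D=zeros(n);                        % pairwise squared distances
for i=1:n
    for j=1:n
        D(i,j)=sum((x(i,:)-x(j,:)).^2);
    end
end
K=exp(-D/(2*sigma));               % kernel matrix
% Dual variables z = [lambda; lambda'] (2n-by-1), epsilon-insensitive loss
H=[K -K; -K K]+1e-6*eye(2*n);      % regularized quadratic term
f=[epsilon-y; epsilon+y];
Aeq=[ones(1,n) -ones(1,n)]; beq=0; % sum_i (lambda_i - lambda'_i) = 0
lb=zeros(2*n,1); ub=C*ones(2*n,1);
z=quadprog(H,f,[],[],Aeq,beq,lb,ub);
beta=z(1:n)-z(n+1:end);            % lambda_i - lambda'_i
% Bias from margin points with 0 < lambda_i < C: y_i - K(i,:)*beta = epsilon
e=1e-6;
ind=find(z(1:n)>e & z(1:n)<C-e);
b=mean(y(ind)-K(ind,:)*beta-epsilon);
% Prediction on a new point p (1-by-d):
% yhat = exp(-sum((x-repmat(p,n,1)).^2,2)/(2*sigma))'*beta + b;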

123 1.4 Support vector machines (SVM) Nonlinear SVM-SVR 1. Statistical classifiers
Some useful Web sites on SVM:

124 2. Neural networks

125 2. Neural networks
Artificial neural networks (ANN) are composed of simple elements which operate in parallel. These elements are inspired by biological nervous systems. As in nature, the connections between these elements largely determine the functioning of the network. We can build a neural network to perform a particular function by adjusting the connections (weights) between the elements.
In the following, we will use the Matlab toolbox, version 2011a or more recent versions, to illustrate this course. For more details, see
1. Neural Network Toolbox, Getting Started Guide
2. Neural Network Toolbox, User's Guide
An ANN is an assembly of elementary processing elements. The processing capacity of the network is stored as the weights of the interconnections, obtained by a process of adaptation or training from a set of training examples.

126 Types of neural networks: 2. Neural networks
1. Perceptron (P): one or more (adaline) formal neurons in one layer;
2. Static networks such as "multilayer feed-forward neural networks" (MLFFNR) or "multilayer perceptron" (MLP), without feedback from the outputs to the first input layer;
3. Static networks such as "radial basis function neural networks" (RBFNN), with one layer and without feedback to the input layer:
   3.1 Generalized regression neural network (GRNN)
   3.2 Probabilistic neural networks (PNN)
4. Partially recurrent multilayer feed-forward networks (PRFNN) (Elman or Jordan), where only the outputs are looped back to the first layer;
5. Recurrent networks with one layer and total connectivity (associative networks);
6. Self-organizing neural networks (SONN) or competitive neural networks (CNN);
7. Dynamic neural networks (DNN)

127 Types of neural networks 2. Neural networks
Supervised training: given couples of input feature vectors P and their associated desired outputs (targets) Y_d,
(P, Y_d) = ([p_1, p_2, ..., p_N] of size RxN, [y_d1, y_d2, ..., y_dN] of size SxN)
and an error matrix E = [E_1, E_2, ..., E_N] containing the error vectors E_i = [e_1, e_2, ..., e_S]^T = Y_i - Y_di

128 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
1. Given a couple of input and desired output vectors (p, y_d);
2. when an input p is presented to the network, the activations of the neurons are calculated once the network is stabilized;
3. the errors E are calculated;
4. E is minimized by a given training algorithm.
Fig. 1. Training of a neural network: the input p produces the output y, which is compared with y_d to give the error E. (Figure adapted from "Neural Network Toolbox 6 User's Guide", by The MathWorks, Inc.)

129 Types of neural networks (cont.) 2. Neural networks
1. Perceptron (P): one or more formal neurons with one layer
One neuron with one scalar input (diagrams: input and neuron without bias, input and neuron with bias, biological neuron)
p: input (scalar)
ω: weight (scalar)
b: bias (scalar), considered as a threshold with constant input = 1; it acts as an activation threshold of the neuron
n: activation of the neuron, sum of the weighted inputs, n = wp, or n = wp + b
f: transfer function (or activation function)

130 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Transfer function - step function
A = hardlim(N) and A = hardlims(N) take an SxQ matrix N of Q net-input column vectors and return an SxQ matrix A of output vectors.
hardlim: A = 1 where the corresponding element of N is 0 or greater, and A = 0 elsewhere.
Example: N = -5:0.1:5; A = hardlim(N); plot(N,A)
hardlims: A = 1 where N is 0 or greater, and A = -1 elsewhere.
Example: N = -5:0.1:5; A = hardlims(N); plot(N,A)

131 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Saturating linear transfer functions
A = satlin(N) and A = satlins(N) take an SxQ matrix of net-input column vectors and return an SxQ matrix A of output vectors.
satlin: A = 0 where N <= 0, A = N where N is in the interval [0, 1], and A = 1 where N >= 1.
Example: N = -5:0.1:5; A = satlin(N); plot(N,A)
satlins: A = -1 where N <= -1, A = N where N is in the interval [-1, 1], and A = 1 where N >= 1.
Example: N = -5:0.1:5; A = satlins(N); plot(N,A)

132 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Linear transfer function
A = purelin(N) = N takes an SxQ matrix of net-input column vectors and returns an SxQ matrix A of output vectors equal to N.
Example: N = -5:0.1:5; A = purelin(N); plot(N,A)

133 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Logarithmic sigmoid transfer function
A = logsig(N) (= 1/(1+exp(-N))) takes an SxQ matrix of net-input column vectors and returns an SxQ matrix A of output vectors, where each element of N is squashed from the interval [-inf, inf] into the interval [0, 1] with an "S-shaped" function.
Example: N = -5:0.01:5; plot(N,logsig(N)); set(gca,'dataaspectratio',[1 1 1],'xgrid','on','ygrid','on')

134 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Symmetric sigmoid transfer function
A = tansig(N) (= 2/(1+exp(-2*N)) - 1) takes an SxQ matrix of net-input column vectors and returns an SxQ matrix A of output vectors, where each element of N is squashed from the interval [-inf, inf] into the interval [-1, 1] with an "S-shaped" function.
Example: N = -5:0.01:5; plot(N,tansig(N)); set(gca,'dataaspectratio',[1 1 1],'xgrid','on','ygrid','on')

135 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
One neuron with a feature vector input (R features) and a bias
P: input feature vector: P = [p_1 p_2 ... p_R]' (' denotes transpose)
W: weight vector: W = [w_{1,1} w_{1,2} ... w_{1,R}]
b: bias (scalar), considered as a threshold with constant input = 1
n: activation of the neuron, sum of the weighted inputs: n = WP + b = Σ_{j=1}^{R} w_{1,j} p_j + b, for one formal neuron
f: transfer function (or activation function)
a: output, a = f(n)
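A minimal numerical sketch of this single neuron follows; the input, weights and bias values are invented for illustration, and the log-sigmoid transfer function introduced above is used.
p = [2; -1; 0.5];       % input feature vector (R = 3)
W = [0.4 -0.3 0.1];     % weight row vector (1-by-R)
b = 0.2;                % bias
n = W*p + b;            % activation: weighted sum of the inputs plus the bias
a = logsig(n)           % neuron output a = f(n)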

136 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Abbreviated notation: feature vector input (R features), neuron with bias

137 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Linear separation of classes
Critical condition of separation: activation = threshold.
Let b = -w_0; then W·p = w_0 is the equation of the decision line.
In 2-D space, if w_2 ≠ 0, we obtain a straight-line equation with slope -w_1/w_2:
    p_2 = -(w_1/w_2) p_1 + w_0/w_2
(plot: the decision line in the (p_1, p_2) plane)

138 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Example of a 2-D input space with 2 classes (class 1: •, class 2: ○), with w_1 = w_2 = 1 and threshold w_0 = 1.5, p = [p_1 p_2]'.
(table of p_1, p_2, activation and output for the four binary inputs, and plot of the decision line separating (1,1) from (0,1) and (1,0))
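As a quick check of this example, the short sketch below (plain MATLAB, names chosen for illustration) evaluates the four binary points against the weights w_1 = w_2 = 1 and the threshold w_0 = 1.5; only (1,1) lies on the positive side of the decision line.
w = [1 1]; w0 = 1.5;            % weights and threshold of the example
P = [0 0; 0 1; 1 0; 1 1];       % the four 2-D input points, one per row
a = double(P*w' > w0)           % outputs: 1 only for the point (1,1)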

139 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Matlab demo: to study the effect of changing the bias (b) for a given weight (w), or vice versa, type in the Matlab window:
>> nnd2n1

140 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Abbreviated model of one layer composed of S neurons: feature vector input layer (R features), one weighted-sum layer (S weighted-sum units), one output layer (S transfer functions).
IW 1,1: the indices refer to the number of the input layer and the number of the layer containing the weighted-sum units.

141 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Historical ADALINE (single-layer feedforward network)
(diagram: units 1 to n, each with input X_e = (-1, x_1, ..., x_N), weights ω_0 (bias), ω_1, ..., ω_N, a summing node s, a transfer function f and a ±1 quantizer at the output; the weights are adapted with the delta algorithm so that the output matches the desired value d(X_e))

142 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
Supervised training: given couples of input feature vectors P and their associated desired outputs (targets) Y_d,
(P, Y_d) = ([p_1, p_2, ..., p_N] of size RxN, [y_d1, y_d2, ..., y_dN] of size SxN)
and an error matrix E = [E_1, E_2, ..., E_N] containing the error vectors E_i = [e_1, e_2, ..., e_S]^T = Y_i - Y_di
(diagram: input layer of R features receiving a training example p, one layer of S neurons producing the outputs y_1, ..., y_S, which are compared with the desired outputs y_d1, ..., y_dS to give the errors e_1, ..., e_S)

143 Types of neural networks 1. Perceptron (cont.) 2. Neural networks
1. Given a couple of input and desired output vectors (p, y_d);
2. when an input p is presented to the network, the activations of the neurons are calculated once the network is stabilized;
3. the errors E are calculated;
4. E is minimized by a given training algorithm.
Fig. 1. Training of a neural network. (Figure adapted from "Neural Network Toolbox 6 User's Guide", by The MathWorks, Inc.)

144 Types of neural networks (cont.) 2. Neural networks
2. Static networks: "multilayer feed-forward neural networks" (MLFFNR) or "multilayer perceptron" (MLP), without feedback from the outputs to the first input layer
Some nonlinear problems (or logical ones, e.g. the exclusive OR) are not solvable by a one-layer perceptron.
Solution: one or more intermediate (hidden) layers are added between the input and output layers to allow the network to create its own representation of the inputs. In this case it is possible to approximate a nonlinear function or to perform any sort of logic function.

145 Types of neural networks 2. MLP (cont.) 2. Neural networks
Model with two formal neurons
α: the adaptation step (learning rate); ω_ij: synaptic weight; f_i and f_j: activation functions; s_i and s_j: activation values; x_n: one component of the network input
(diagram: inputs x_1, ..., x_N and bias -1 feeding neuron i in the hidden layer, whose output s_i feeds neuron j in the output layer, which produces y_j)

146 Types of neural networks 2. MLP (cont.) 2. Neural networks
(diagram: input layer of R features receiving a training example p, one or more hidden layers, one output layer of S neurons; the outputs y_1, ..., y_S are compared with the desired outputs y_d1, ..., y_dS to give the errors e_1, ..., e_S)

147 Types of neural networks (cont.) 2. Neural networks
3. Static networks: "radial basis function neural networks" (RBFNN), with one layer and without feedback to the input layer
The RBF may require more neurons than the FFNR (it needs only one hidden layer), but often it can be trained more quickly than the FFNR. These networks work well for a limited number of training examples.

148 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
RBFs can be used for:
Regression: generalized regression networks (GRNN)
Classification: probabilistic neural networks (PNN)

149 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
3.1 Generalized regression networks (GRNN)
An arbitrary continuous function can be approximated by a linear combination of a large number of well-chosen Gaussian functions.
Regression: build a good approximation of a function that is known only through a finite number of slightly noisy "experimental" couples {x_i, y_i}.
Local regression: the Gaussian bases affect only small areas around their mean values.
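A minimal GRNN sketch in MATLAB is given below; the toy data and the spread value are invented for illustration, and newgrnn is the toolbox function that builds a generalized regression network.
x = 0:0.1:2*pi;                     % inputs of the toy regression problem
y = sin(x) + 0.05*randn(size(x));   % slightly noisy targets
net = newgrnn(x, y, 0.2);           % GRNN with a Gaussian spread of 0.2
yhat = sim(net, x);                 % network approximation of the function
plot(x, y, '.', x, yhat, '-')       % compare noisy samples and approximation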

150 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
This network can be used for classification problems. When an input is presented, the first (radial basis) layer calculates the distances between the input vector and the weight vectors and produces a vector, multiplied by the bias. (Matlab NN toolbox)

151 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
3.2 Probabilistic neural networks (PNN)
This network can be used for classification problems. When an input is presented, the first layer calculates the distances between the input vector and all the training vectors and produces a vector whose elements describe how close this input vector is to each training vector. The second layer adds these contributions for each class of inputs to produce, at the output of the network, a vector of probabilities. (Matlab NN toolbox)
Finally, a competitive transfer function at the output of the second layer selects the maximum of these probabilities, and produces a 1 for the corresponding class and a 0 for the other classes.
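The following minimal PNN sketch illustrates this behaviour in MATLAB; the one-dimensional training points, class indices and spread are invented for illustration, and newpnn, ind2vec and vec2ind are the toolbox functions used to build and query the network.
P  = [1 2 3 4 5 6 7];        % training inputs (one feature)
Tc = [1 1 2 2 3 3 3];        % class index of each training input
T  = ind2vec(Tc);            % convert class indices to target vectors
net = newpnn(P, T, 0.5);     % PNN with a radial basis spread of 0.5
Yc  = vec2ind(sim(net, P))   % classes assigned to the training points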

152 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
General remarks: replicating the function throughout the data space means scanning the space with a large number of Gaussians. In practice, the RBF is centred and normalised: f(P) = exp(-P^T P)

153 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
RBF networks are inefficient:
- in a large input feature space (R)
- on very noisy data.
Local reconstruction of the function prevents the network from "averaging" the noise over the whole space (in contrast with linear regression, whose objective is precisely to average out the noise on the data).

154 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
Training RBF networks
Training by optimization procedures: considerable computation time, hence very slow or impossible training in practice.
Solution: use a heuristic training approximation; the construction of an RBF network is then quick and easy, but such networks are less efficient than multilayer perceptron networks (MLP).

155 Types of neural networks 3. RBFNN (cont.) 2. Neural networks
Conclusion on RBFNN: used as a credible alternative to the MLP on problems that are not too difficult. Speed and ease of use.
For more on RBFNN, see
[I] Chen, S., C.F.N. Cowan, and P.M. Grant, "Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks," IEEE Transactions on Neural Networks, Vol. 2, No. 2, March 1991.
[II] P.D. Wasserman, Advanced Methods in Neural Computing, New York: Van Nostrand Reinhold, 1993.

156 Types of neural networks (cont.) 2. Neural networks
4. Partially recurrent multilayer feed-forward networks (PRFNN) (Elman or Jordan), where only the outputs are looped back to the first layer.
These are networks of the "feedforward" type, except that a feedback is performed between the output layer and the hidden layer, or between the hidden layers themselves, through additional layers called state layers (Jordan) or context layers (Elman).

157 Types of neural networks 4. PRFNN (cont.) 2. Neural networks
Since the information processing in recurrent networks depends on the network state at the previous iteration, these networks can be used to model temporal sequences (dynamic systems).

158 Types of neural networks 4. PRFNN (cont.) 2. Neural networks
Jordan network [Jordan 86a, b]
(diagram: input layer, one hidden layer, output layer and desired output layer; the outputs are fed back to additional state units, and e denotes the error y - y_d)

159 Types of neural networks 4. PRFNN (cont.) 2. Neural networks
Elman network [Elman 1990]
In this case an additional layer of context units is introduced. The inputs of these units are the activations of the units in the hidden layer.

160 Types of neural networks 4. PRFNN (cont.) 2. Neural networks
(diagram: input layer, one hidden layer, output layer and desired output layer; the hidden-layer activations are fed back to the context units, and e denotes the error y - y_d)
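A minimal Elman-style sketch in MATLAB follows; it mirrors the toolbox's own elmannet example, with the built-in simpleseries_dataset used as an assumed toy sequence and the delays and hidden size chosen arbitrarily.
[X, T] = simpleseries_dataset;           % built-in example temporal sequence
net = elmannet(1:2, 10);                 % layer delays 1:2, 10 hidden neurons with context feedback
[Xs, Xi, Ai, Ts] = preparets(net, X, T); % arrange the sequence and initial states
net = train(net, Xs, Ts, Xi, Ai);
Y = net(Xs, Xi, Ai);                     % simulate the trained network
perf = perform(net, Ts, Y)               % performance on the sequence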

161 Types of neural networks 4. PRFNN (cont.) 2. Neural networks
Extended Elman network
However, there is a limitation in the Elman network: it cannot deal with complex structure such as long-distance dependencies, hence the following extension: the extension of the number of generations in the context layers.

162 Types of neural networks (cont.) 2. Neural networks
5. Recurrent networks (RNN) with one layer and total connectivity (associative networks)
Training type: unsupervised.
In these models each neuron is connected to all the others and theoretically (but not in practice) has a feedback connection onto itself. These models are not motivated by a biological analogy but by their analogy with statistical mechanics.

163 Types of neural networks 5. RNN (cont.) 2. Neural networks
(diagram: a fully connected network with a single layer; each neuron i receives the external input p_i and, through the weights w_ij, the outputs of all the other neurons, sums them and applies the transfer function f to produce y_i)

164 Types of neural networks (cont.) 2. Neural networks
6. Self-organizing neural networks (SONN) or competitive neural networks (CNN)
These networks are similar to static single-layer networks, except that there are connections, usually with negative signs, between the output units. (diagram: input layer p, output layer y)

165 Types of neural networks 6. SONN (cont.) 2. Neural networks
Training: a set of examples is presented to the network, one example at a time. For each presented example, the weights are modified. If a degraded version of one of these examples is later presented to the network, the network will then rebuild the degraded example. Through these connections the output units tend to compete to represent the current example presented to the network input.
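As a small illustration of competitive (unsupervised) training, the sketch below clusters invented 2-D points with a competitive layer; competlayer and vec2ind are the toolbox functions assumed here, and the data are arbitrary.
X = [rand(2,20)*0.5, rand(2,20)*0.5 + 0.5];   % two clouds of 2-D points (columns)
net = competlayer(2);                          % competitive layer with 2 output units
net = train(net, X);                           % unsupervised training on the inputs only
classes = vec2ind(net(X))                      % winning unit (cluster index) for each point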

166 Function fitting and pattern recognition problems 2. Neural networks
In fact, it has been proved that a simple neural network can adapt to all practical functions.
Defining a problem (supervised learning)
To define a fitting problem, arrange a set of Q input vectors as the columns of a matrix. Then arrange another series of Q target vectors (the correct output vectors for each of the input vectors) in a second matrix. For example, for a logic "AND" function:
Q = 4;
Inputs  = [0 0 1 1; 0 1 0 1];
Outputs = [0 0 0 1];
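To make this concrete, the following minimal sketch (assuming the AND data just defined) trains a one-neuron perceptron on these four couples and checks that it reproduces the targets.
Inputs  = [0 0 1 1; 0 1 0 1];     % Q = 4 input column vectors
Outputs = [0 0 0 1];              % AND targets
net = newp(Inputs, Outputs);      % one-neuron perceptron
net = train(net, Inputs, Outputs);
Y = sim(net, Inputs)              % should reproduce [0 0 0 1]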

167 Supervised learning 2. Neural networks
We can construct an ANN in 3 different ways:
Using command-line functions *
Using the graphical user interface nftool **
Using nntool ***
* Using Command-Line Functions, "Neural Network Toolbox 6 User's Guide", by The MathWorks, Inc., page 1-7.
** Using the Neural Network Fitting Tool GUI, "Neural Network Toolbox 6 User's Guide", by The MathWorks, Inc., page 1-13.
*** Graphical User Interface, "Neural Network Toolbox 6 User's Guide", by The MathWorks, Inc.

168 Supervised learning 2. Neural networks
Input-output processing units
The majority of the processing methods are provided by default when you create a network. You can override the default functions for processing inputs and outputs when you call a network creation function, or by setting network properties after creating the network.
net.inputs{1}.processFcns: property of the network that displays the list of input processing functions.
net.outputs{2}.processFcns: property of the network that displays the list of output processing functions of a 2-layer network.
You can use these properties to change the processing functions applied to the inputs and outputs of your network (but Matlab recommends using the default properties).
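A short sketch of these properties is given below; the 10-neuron feed-forward network is an arbitrary example, and the listed defaults are those of feedforwardnet rather than a statement about every network type.
net = feedforwardnet(10);                 % example two-layer network (10 hidden neurons)
net.inputs{1}.processFcns                 % e.g. {'removeconstantrows','mapminmax'}
net.outputs{2}.processFcns                % output processing functions of the 2-layer network
net.inputs{1}.processFcns = {'mapstd'};   % override the default (not recommended)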

169 Supervised learning - Input-output processing units 2. Neural networks
Several functions have default settings which define their operation. You can access or change the i-th parameter of the input or output processing:
net.inputs{1}.processParams{i} for the input processing functions
net.outputs{2}.processParams{i} for the output processing functions of a 2-layer network.

170 Supervised learning - Input-output processing units 2. Neural networks
For MLP networks the default functions are:
IPF - structure of the input processing functions. Default: IPF = {'fixunknowns','removeconstantrows','mapminmax'}.
OPF - structure of the output processing functions. Default: OPF = {'removeconstantrows','mapminmax'}.
fixunknowns: this function recodes unknown data (represented in the user's data with NaN values) into a numerical form for the network. It preserves information about which values are known and which values are unknown.
removeconstantrows: this function removes rows with constant values in a matrix.

171 Supervised learning - Input-output processing units 2. Neural networks
Pre- and post-processing in Matlab:
1. Min and Max (mapminmax)
Prior to training, it is often useful to scale the inputs and targets so that they always fall within a specified range, e.g. [-1, 1] (normalized inputs and targets).
[pn,ps] = mapminmax(p);
[tn,ts] = mapminmax(t);
net = train(net,pn,tn);   % training a created network
an = sim(net,pn);         % simulation (an: normalized outputs)

172 Supervised learning - Input-output processing units 2. Neural networks
To convert the outputs back to the units of the original targets:
a = mapminmax('reverse',an,ts);
If mapminmax has already been used to preprocess the training set data, then whenever the trained network is used with new inputs, these inputs must be preprocessed with mapminmax. Let pnew be a new input set for the already trained network:
pnewn = mapminmax('apply',pnew,ps);
anewn = sim(net,pnewn);
anew = mapminmax('reverse',anewn,ts);

173 Supervised learning - Input-output processing units 2. Neural networks
2. Mean and standard deviation (mapstd)
4. Principal component analysis (processpca)
5. Processing unknown inputs (fixunknowns)
6. Processing unknown targets (or "Don't Care") (fixunknowns)
7. Post-training analysis (postreg)

174 Supervised learning - Input-output processing units 2. Neural networks
The performance of a trained network can be measured to some extent by the errors on the training, validation and test sets, but it is often useful to analyse the network response further. One way to do this is to perform a linear regression:
a = sim(net,p);
[m,b,r] = postreg(a,t)
m: slope
b: intersection of the best straight line (relating the outputs to the target values) with the y-axis
r: Pearson's correlation coefficient

175 Supervised learning 2. Neural networks
PERCEPTRON

176 Supervised learning - Perceptron 2. Neural networks
Creation of a one-layer perceptron with R inputs and S outputs:
net = newp(p,t,tf,lf)
p: RxQ1 matrix of Q1 input feature vectors, each with R input features.
t: SxQ2 matrix of Q2 target vectors.
tf: transfer function, default = 'hardlim'.
lf: learning function, default = 'learnp'.
With 'hardlim': a = 1 if n >= 0, a = 0 otherwise.

177 2. Neural networks - Supervised learning - Perceptron
Classification example:
% DEFINITION
% Creation of a new perceptron using net = newp(pr,s,tf,lf)
% Description
% Perceptrons are used to solve simple classification problems
% (i.e. linearly separable classes)
% net = newp(pr,s,tf,lf)
% pr - Rx2 matrix of min and max values of the R input elements
% s  - Number of neurons
% tf - transfer function, default = 'hardlim' (hard-limit transfer function)
% lf - learning function, default = 'learnp' (perceptron weight/bias learning function)
p1 = 7*rand(2,50);
p2 = 7*rand(2,50)+5;
p = [p1,p2];
t = [zeros(1,50),ones(1,50)];
pr = minmax(p);   % pr is an Rx2 matrix of the min and max values of the RxQ matrix p
net = newp(pr,t,'hardlim','learnpn');
% Display the initial values of the network
w0 = net.IW{1,1}
b0 = net.b{1}
E = 1;
iter = 0;
% sse: sum squared error performance function
while (sse(E) > .1) && (iter < 1000)
    [net,Y,E] = adapt(net,p,t);   % adapt: adapt the neural network
    iter = iter + 1;
end
% Display the final values of the network
w = net.IW{1,1}
b = net.b{1}
% TEST
test = rand(2,1000)*15;
ctest = sim(net,test);
figure
% scatter: scatter/bubble plot
% scatter(x,y,s,c) displays colored circles at the locations specified by the vectors x and y (same size)
% s determines the area of each marker (in points^2)
% c determines the colors of the markers
% 'filled' fills the markers
scatter(p(1,:),p(2,:),100,t,'filled')
hold on
scatter(test(1,:),test(2,:),10,ctest,'filled');
hold off
% plotpc(W,B): draw a classification line
% W - SxR weight matrix (R <= 3)
% B - Sx1 bias vector
plotpc(net.IW{1},net.b{1});
% Plot regression
figure
[m,b,r] = postreg(Y,t);

178 Supervised learning - Perceptron 2. Neural networks
Example (cont.):
>> run('perceptron.m')
w0 = 0 0
b0 = 0
w =
b =

179 Supervised learning - Perceptron 2. Neural networks
Linear regression: (regression plot of the network outputs against the targets)

180 Supervised learning - Perceptron 2. Neural networks
Data structure: to study the effect of data formats on the network structure.
There are two main types of input vectors: those that are independent of time, or concurrent vectors (e.g. static images), and those that occur sequentially in time, or sequential vectors (e.g. time signals or dynamic images).
For concurrent vectors, the ordering of the vectors is not important, and if there were a number of networks operating in parallel (static networks), we could present one input vector to each network.
For sequential vectors, the order in which the vectors appear is important; the networks used in this case are called dynamic networks.

181 Supervised learning - Perceptron - Static network 2. Neural networks
Static network: these networks have neither delays nor feedback.
Simulation with concurrent input vectors in batch mode
When the presentation order of the inputs is not important, all these inputs can be introduced simultaneously (batch mode). For example, assume that the target values produced by the network are 100, 50, -100 and 25.
Q = 4
P = [ ; ];             % form a batch matrix P
t1 = [100], t2 = [50], t3 = [-100], t4 = [25]
T = [100 50 -100 25];  % form a batch matrix T
net = newlin(P, T);    % all the input vectors and their target values are given at once
W = [1 2]; b = [0];    % assign values to the weights and to b (without training)
net.IW{1,1} = W;
net.b{1} = b;
y = sim(net, P);
After the simulation, we obtain: y =

182 Supervised learning - Perceptron - Static network 2. Neural networks
Simulation with concurrent input vectors in batch mode (cont.)
In the previous network, a single matrix containing all the concurrent vectors is presented to the network, and the network simultaneously produces a single matrix of output vectors.
The result would be the same if there were four networks operating in parallel, each receiving one of the input vectors (p_1, ..., p_4) and producing one of the outputs (t_1, ..., t_4). The ordering of the input vectors is not important, because they do not interact with each other.

183 Supervised learning - Perceptron 2. Neural networks
Dynamic network: these networks have delays and feedback.
Simulation with inputs in incremental mode
When the order of presentation of the inputs is important, the inputs may be introduced sequentially (on-line mode). The network has a delay. For example, p1 = [1], p2 = [2], p3 = [3], p4 = [4]:
P = {1 2 3 4};        % form a "cell array", the input sequence
Suppose the target values are given in the order 10, 3, 3, -7: t(1) = [10], t(2) = [3], t(3) = [3], t(4) = [-7]
T = {10, 3, 3, -7};
net = newlin(P, T, [0 1]);        % create the network with delays of 0 and 1
net.biasConnect = 0;
W = [1 2]; net.IW{1,1} = [1 2];   % assign the weights
y = sim(net, P);
After the simulation we obtain a "cell array", the output sequence:
y = [1] [4] [7] [10]

184 Supervised learning - Perceptron - Dynamic network 2. Neural networks
The presentation order of the inputs is important when they are presented as a sequence. In this case, the current output is obtained by multiplying the current input by 1 and the previous input by 2, and adding the two results. If the order of the inputs were changed, the numbers obtained at the output would change.

185 Supervised learning - Perceptron - Dynamic network 2. Neural networks
Simulation with concurrent input vectors in incremental mode
Although this choice makes little sense, we can always use a dynamic network with concurrent input vectors (the order of presentation is not important): p1 = [1], p2 = [2], p3 = [3], p4 = [4]
P = [1 2 3 4];    % form P
y = sim(net, P);  % the network is created without delay ([0 0])
After the simulation we obtain the concurrent outputs: y =

186 Supervised learning 2. Neural networks
Multiple-layer neural network (MLNN) or feed-forward backpropagation network (FFNN)
(diagram: input, layer 1, layer 2, layer 3)

187 Supervised learning - FFNN 2. Neural networks
(diagram: input, layer 1, layer 2, layer 3)

188 Multiple-layer neural network (MLNN) or feed-forward backpropagation network (FFNN)
Creating an MLNN with N layers: feedforward neural network.
Two (or more) layer feedforward networks can implement any finite input-output function arbitrarily well, given enough hidden neurons.
feedforwardnet(hiddenSizes,trainFcn) takes a 1xN vector of N hidden layer sizes and a backpropagation training function, and returns a feed-forward neural network with N+1 layers. The input and output sizes are set to 0; these sizes will automatically be configured to match particular data by train, or the user can manually configure inputs and outputs with configure. Defaults are used if feedforwardnet is called with fewer arguments; the default arguments are (10,'trainlm').
Here a feed-forward network is used to solve a simple fitting problem:
[x,t] = simplefit_dataset;
net = feedforwardnet(10);
net = train(net,x,t);
view(net)
y = net(x);
perf = perform(net,t,y)

189 Multiple-layer neural network (MLNN) or feed-forward backpropagation network (FFNN)
Example 1: Regression
Suppose, for example, that you have data from a housing application [HaRu78]. The aim is to design a network to predict the value of a house (in 1000s of U.S. dollars) given 13 items of geographical and real-estate information. We have a total of 506 example houses for which we have these 13 features and their associated market values.
[HaRu78] Harrison, D., and Rubinfeld, D.L., "Hedonic prices and the demand for clean air", J. Environ. Economics & Management, Vol. 5, 1978.

190 MLNN or FFNN - Example 1 (cont.)
Given p input vectors and t target vectors:
load housing;              % load the data: p (13x506 batch input matrix) and t (1x506 batch target matrix)
[pmm, ps] = mapminmax(p);  % scale the rows of the matrix p to values in [-1 1]
[tmm, ts] = mapminmax(t);
Divide the data into three sets: training, validation and testing. The validation set is used to ensure that there will be no overfitting in the final results. The test set provides an independent measure of the performance on data not used for training. Take 20% of the data for the validation set and 20% for the test set, leaving 60% for the training set. Choose the sets randomly from the original data.
[trainV, val, test] = dividevec(pmm, tmm, 0.20, 0.20);   % 3 structures: training (60%), validation (20%) and testing (20%)
pr = minmax(pmm);          % pr is an Rx2 matrix of the min and max values of the RxQ matrix pmm
net = newff(pr, [20 1]);   % create a "feed-forward backpropagation network" with one hidden layer of 20 neurons and one output layer with 1 neuron; the default training function is 'trainlm'

191 MLNN or FFNN - Example 1 (cont.)
Training vector set (trainV): presented to the network during training; the network is adjusted according to its errors.
Validation vector set (val): used to measure the generalisation of the network and to interrupt training when the generalisation stops improving.
Test vector set (test): has no effect on training and thus provides an independent measure of network performance during and after training.
[net, tr] = train(net, trainV.P, trainV.T, [], [], val, test);
% Train the neural network: this function presents all the input and target vectors to the network simultaneously, in "batch" mode. To evaluate the performance it uses the function mse (mean squared error). net is the structure of the resulting network; tr is the training record (epochs and performance).

192 MLNN or FFNN - Example 1 (cont.)

193 MLNN or FFNN - Example 1 (cont.)
The training is stopped at iteration 9 because the performance of the validation set (measured by mse) starts increasing after this iteration. The performance is quite sufficient. Training several times will produce different results due to different initial conditions.
The mean squared error (mse) is the average of the squares of the differences between the outputs and the targets. Zero means no error, while larger values signify higher error.

194 MLNN or FFNN - Example 1 (cont.)
Analysis of the network response
Present the entire data set to the network (training, validation and test) and perform a linear regression between the network outputs, after they have been brought back to the original range, and the corresponding targets.
ymm = sim(net, pmm);                 % simulate the ANN
y = mapminmax('reverse', ymm, ts);   % map the values in [-1 1] of the matrix ymm back to their real minimal and maximal values
[m, b, r] = postreg(y, t);           % perform a linear regression between the outputs and the targets
m: slope of the linear regression.
b: y-intercept of the linear regression.
r: correlation value of the linear regression.

195 MLNN or FFNN - Example 1 (cont.)
The regression value r measures the correlation between the (un-normalised) outputs and the targets. A value of r = 1 means a perfect relationship, 0 a random relationship. Here the outputs follow the targets, with r = 0.9. If greater accuracy is required, then:
- reset the weights and the biases of the network and train again, using the functions init(net) and train
- increase the number of neurons in the hidden layer
- increase the number of training feature vectors
- increase the number of features, if more useful information is available
- try another training algorithm


More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Radial-Basis Function Networks

Radial-Basis Function Networks Radal-Bass uncton Networs v.0 March 00 Mchel Verleysen Radal-Bass uncton Networs - Radal-Bass uncton Networs p Orgn: Cover s theorem p Interpolaton problem p Regularzaton theory p Generalzed RBN p Unversal

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Relevance Vector Machines Explained

Relevance Vector Machines Explained October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes

More information

18-660: Numerical Methods for Engineering Design and Optimization

18-660: Numerical Methods for Engineering Design and Optimization 8-66: Numercal Methods for Engneerng Desgn and Optmzaton n L Department of EE arnege Mellon Unversty Pttsburgh, PA 53 Slde Overve lassfcaton Support vector machne Regularzaton Slde lassfcaton Predct categorcal

More information

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis Fsher Lnear Dscrmnant Analyss Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan Fsher lnear

More information

Statistical Foundations of Pattern Recognition

Statistical Foundations of Pattern Recognition Statstcal Foundatons of Pattern Recognton Learnng Objectves Bayes Theorem Decson-mang Confdence factors Dscrmnants The connecton to neural nets Statstcal Foundatons of Pattern Recognton NDE measurement

More information

Multilayer Perceptron (MLP)

Multilayer Perceptron (MLP) Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne

More information

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/

More information