CS407 Neural Computation
1 CS407 Neural Computation. Lecture 4: Single Layer Perceptron (SLP) Classifiers. Lecturer: A/Prof. M. Bennamoun
2 Outline: What is a SLP and what is classification? Limitations of a single perceptron. Foundations of classification and Bayes decision making theory. Discriminant functions, linear machine and minimum distance classification. Training and classification using the discrete perceptron. Single-layer continuous perceptron networks for linearly separable classifications. Appendix A: Unconstrained optimization techniques. Appendix B: Perceptron convergence proof. Suggested reading and references.
3 What is a perceptron and what is a Single Layer Perceptron (SLP)?
4 Perceptron: the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and bias, and performs pattern classification with only two classes. Perceptron convergence theorem: if the pattern vectors are drawn from two linearly separable classes, then during training the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes by adjusting the synaptic weights.
5 What is a perceptron? [Neuron diagram.] Input signals x1, ..., xm with synaptic weights wk1, ..., wkm feed a summing junction with bias bk: vk = sum_j wkj xj + bk. The activation function φ(.) produces the output yk = φ(vk). Discrete perceptron: φ = sgn. Continuous perceptron: φ is S-shaped.
6 Activation Function of a perceptron. Discrete perceptron: the signum function, φ(v) = +1 for v >= 0 and -1 for v < 0. Continuous perceptron: φ(v) is S-shaped (sigmoid).
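The two activation functions can be sketched in a few lines of Python (illustrative helper names; lam is the steepness parameter of the S-shaped curve):

```python
import math

def sgn(v):
    """Signum activation of the discrete perceptron: +1 for v >= 0, else -1."""
    return 1.0 if v >= 0 else -1.0

def bipolar_sigmoid(v, lam=1.0):
    """S-shaped activation of the continuous perceptron; output in (-1, 1),
    approaching sgn(v) as the steepness lam grows."""
    return 2.0 / (1.0 + math.exp(-lam * v)) - 1.0
```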
7 SLP Architecture. A single layer perceptron: input layer and output layer.
8 Where are we heading? Different non-linearly separable problems. Single-layer structure: decision regions are half planes bounded by a hyperplane; cannot solve the exclusive-OR problem or classes with meshed regions. Two-layer structure: convex open or closed decision regions. Three-layer structure: decision regions of arbitrary complexity (limited by the number of nodes); handles the exclusive-OR problem, meshed regions, and the most general region shapes.
9 Review from last lectures:
10 Implementing Logic Gates with Perceptrons. We can use the perceptron to implement the basic logic gates AND, OR and NOT. All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs. We saw how we can construct simple networks that perform NOT, AND, and OR. It is then a well-known result from logic that we can construct any logical function from these three operations. The resulting networks, however, will usually have a much more complex architecture than a simple perceptron. We generally want to avoid decomposing complex problems into simple logic gates, by finding the weights and thresholds that work directly in a perceptron architecture.
11 Implementation of Logical NOT, AND, and OR. In each case we have inputs in_i and outputs out, and need to determine the weights and thresholds. It is easy to find solutions by inspection:
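By-inspection solutions are easy to check in code. The weights and thresholds below are one of the infinitely many valid choices, picked for illustration:

```python
def tlu(weights, theta, inputs):
    """Threshold logic unit: fire (output 1) iff the weighted sum exceeds theta."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > theta else 0

# One valid choice of weights/thresholds per gate (many others work too).
def logic_and(a, b): return tlu([1.0, 1.0], 1.5, [a, b])
def logic_or(a, b):  return tlu([1.0, 1.0], 0.5, [a, b])
def logic_not(a):    return tlu([-1.0], -0.5, [a])
```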
12 The Need to Find Weights Analytically. Constructing simple networks by hand is one thing, but what about harder problems? How long do we keep looking for a solution? We need to be able to calculate appropriate parameters rather than looking for solutions by trial and error. Each training pattern produces a linear inequality for the output in terms of the inputs and the network parameters; these can be used to compute the weights and thresholds.
13 Finding Weights Analytically for the AND Network. We have two weights w1 and w2 and the threshold θ, and each training pattern imposes one condition on the output, so the training data lead to four inequalities. It is easy to see that there are an infinite number of solutions. Similarly, there are an infinite number of solutions for the NOT and OR networks.
14 Limitations of Simple Perceptrons. We can follow the same procedure for the XOR network. Clearly the second and third inequalities are incompatible with the fourth, so there is in fact no solution. We need more complex networks, e.g. networks that combine together many simple networks, or that use different activation/thresholding/transfer functions. It then becomes much more difficult to determine all the weights and thresholds by hand; these weights are instead adapted using learning rules. Hence, we need to consider learning rules (see previous lecture), and more complex architectures.
15 E.g. Decision Surface of a Perceptron. [Figures: a linearly separable pattern set vs. a non-linearly separable one.] The perceptron is able to represent some useful functions, but functions that are not linearly separable (e.g. XOR) are not representable.
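The XOR impossibility can be verified numerically. This sketch brute-forces an arbitrarily chosen parameter grid: it finds threshold-unit parameters for AND but none for XOR:

```python
import itertools

def realizes(table, w1, w2, theta):
    """True iff a single TLU out = [w1*a + w2*b > theta] reproduces the truth table."""
    return all((1 if w1 * a + w2 * b > theta else 0) == t
               for (a, b), t in table.items())

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

grid = [x / 2.0 for x in range(-8, 9)]  # candidate values -4.0 .. 4.0
xor_found = any(realizes(XOR, *p) for p in itertools.product(grid, repeat=3))
and_found = any(realizes(AND, *p) for p in itertools.product(grid, repeat=3))
```

No grid, however fine, can succeed for XOR: the inequalities for (0,1) and (1,0) add up to w1 + w2 > 2θ, while (0,0) forces θ >= 0 and (1,1) demands w1 + w2 <= θ, a contradiction.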
16 What is classification?
17 Classification? Pattern classification/recognition: assign the input data (a physical object, event, or phenomenon) to one of the pre-specified classes (categories). [Block diagram of the recognition and classification system.]
18 Classification: an example (Duda & Hart, Chapter 1). Automate the process of sorting incoming fish on a conveyor belt according to species (salmon or sea bass). Set up a camera, take some sample images, and note the physical differences between the two types of fish: length, lightness, width, number & shape of fins, position of the mouth.
19 Classification: an example. [Figure.]
20 Classification: an example. The cost of misclassification depends on the application: is it better to misclassify salmon as bass or vice versa? Put salmon in a can of bass: lose profit. Put bass in a can of salmon: lose the customer. There is a cost associated with our decision, so make the decision that minimizes a given cost. Feature extraction is problem- and domain-dependent and requires knowledge of the domain; a good feature extractor would make the job of the classifier trivial.
21 Bayesian decision theory
22 Bayesian Decision Theory (Duda & Hart, Chapter 2). Bayesian decision theory is a fundamental statistical approach to the problem of pattern classification: decision making when all the probabilistic information is known. For given probabilities the decision is optimal; when new information is added, it is assimilated in optimal fashion to improve decisions.
23 Bayesian Decision Theory. Fish example: each fish is in one of two states, sea bass or salmon. Let ω denote the state of nature: ω = ω1 for sea bass, ω = ω2 for salmon.
24 Bayesian Decision Theory. The state of nature is unpredictable: ω is a variable that must be described probabilistically. If the catch produced as much salmon as sea bass, the next fish is equally likely to be sea bass or salmon. Define P(ω1): the a priori probability that the next fish is sea bass, and P(ω2): the a priori probability that the next fish is salmon.
25 Bayesian Decision Theory. If other types of fish are irrelevant: P(ω1) + P(ω2) = 1. Prior probabilities reflect our prior knowledge (e.g. time of year, fishing area, ...). Simple decision rule, made without seeing the fish: decide ω1 if P(ω1) > P(ω2); ω2 otherwise. This is OK when deciding for one fish, but if there are several fish, all are assigned to the same class.
26 Bayesian Decision Theory. In general, we will have some features and more information. Feature: a lightness measurement x. Different fish yield different lightness readings, so x is a random variable.
27 Bayesian Decision Theory. Define p(x|ω1), the class-conditional probability density: the probability density function for x given that the state of nature is ω1. The difference between p(x|ω1) and p(x|ω2) describes the difference in lightness between sea bass and salmon.
28 Class-conditional probability density p(x|ω). [Figure: hypothetical class-conditional probability density functions.] The densities are normalized: the area under each curve is 1.0.
29 Bayesian Decision Theory. Suppose that we know the prior probabilities P(ω1) and P(ω2) and the conditional densities p(x|ω1) and p(x|ω2), and we measure the lightness of a fish, x. What is the category of the fish?
30 Bayes' Formula. Given the prior probabilities P(ωj), the conditional densities p(x|ωj), and a measurement of a particular item (feature value x), Bayes' formula states: P(ωj|x) = p(x|ωj) P(ωj) / p(x), i.e. Posterior = (Likelihood x Prior) / Evidence, where p(x) = sum_i p(x|ωi) P(ωi), so that sum_j P(ωj|x) = 1.
31 Bayes' formula. p(x|ωj) is called the likelihood of ωj with respect to x: the ωj category for which p(x|ωj) is large is more "likely" to be the true category. p(x) is the evidence: how frequently we will measure a pattern with feature value x. It is a scale factor that guarantees that the posterior probabilities sum to 1.
32 Posterior Probability. [Figure: posterior probabilities for the particular priors P(ω1) = 2/3 and P(ω2) = 1/3.] At every x the posteriors sum to 1.
33 Error. If we decide ω2, P(error|x) = P(ω1|x); if we decide ω1, P(error|x) = P(ω2|x). For a given x, we can minimize the probability of error by deciding ω1 if P(ω1|x) > P(ω2|x) and ω2 otherwise.
34 Bayes' Decision Rule. The rule that minimizes the probability of error: decide ω1 if P(ω1|x) > P(ω2|x); ω2 otherwise. Equivalently: decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); ω2 otherwise. Then P(error|x) = min[P(ω1|x), P(ω2|x)]. In likelihood-ratio form: decide ω1 if the likelihood ratio p(x|ω1)/p(x|ω2) exceeds the threshold P(ω2)/P(ω1), and ω2 otherwise.
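The two-category rule can be written directly from these formulas. A minimal sketch for a single observed feature value x, taking the likelihood values as given:

```python
def bayes_decide(px_given_w1, px_given_w2, prior1, prior2):
    """Two-class Bayes decision: returns (decision, posteriors, P(error|x)).
    Decide class 1 iff p(x|w1)P(w1) > p(x|w2)P(w2)."""
    evidence = px_given_w1 * prior1 + px_given_w2 * prior2
    post1 = px_given_w1 * prior1 / evidence
    post2 = px_given_w2 * prior2 / evidence
    decision = 1 if post1 > post2 else 2
    return decision, (post1, post2), min(post1, post2)
```

For example, with priors P(ω1) = 2/3, P(ω2) = 1/3 and likelihoods p(x|ω1) = 0.6, p(x|ω2) = 0.2 (made-up values), the rule decides ω1.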
35 Decision Boundaries. Classification is the division of feature space into non-overlapping regions X1, ..., XR such that x in Xk is assigned to ωk. The boundaries between these regions are known as decision surfaces or decision boundaries.
36 Optimum decision boundaries. Criterion: minimize misclassification, i.e. maximize correct classification. Classify x in Xk as ωk if p(x|ωk)P(ωk) > p(x|ωj)P(ωj) for all j != k, i.e. if P(ωk|x) > P(ωj|x): maximum posterior probability. Here P(correct) = sum_k P(x in Xk, ωk) = sum_k integral over Xk of p(x|ωk)P(ωk) dx.
37 Discriminant functions. Discriminant functions determine classification by comparison of their values: classify x in Xk if gk(x) > gj(x) for all j != k. Optimum classification is based on the posterior probability, gk(x) = P(ωk|x). Any monotone function may be applied without changing the decision boundaries, e.g. gk(x) = ln P(ωk|x).
38 The Two-Category Case. Use discriminant functions g1 and g2, and assign x to ω1 if g1 > g2. Alternative: define a single discriminant function g(x) = g1(x) - g2(x), and decide ω1 if g(x) > 0, otherwise decide ω2. For the two-category case: g(x) = P(ω1|x) - P(ω2|x), or g(x) = ln[p(x|ω1)/p(x|ω2)] + ln[P(ω1)/P(ω2)].
39 Summary. Bayes approach: estimate the class-conditional probability density, combine it with the prior class probability, determine the posterior class probability, and derive the decision boundaries. Alternate approach (implemented by NNs): estimate the posterior probability directly, i.e. determine the decision boundaries directly.
40 DISCRIMINANT FUNCTIONS
41 Discriminant Functions. The classifier determines membership in a category based on the comparison of R discriminant functions g1(x), g2(x), ..., gR(x): x is within the region Xk if gk(x) has the largest value. Do not mix up n = the dimension of each input vector (dimension of feature space), P = the number of input vectors, and R = the number of classes.
42-46 Discriminant Functions. [Figure slides.]
47 Linear Machine and Minimum Distance Classification. Find the linear-form discriminant function for two-class classification when the class prototypes are known. Example 3.1: select the decision hyperplane that contains the midpoint of the line segment connecting the center points of the two classes.
48 Linear Machine and Minimum Distance Classification (dichotomizer). The dichotomizer's discriminant function is g(x) = (x1 - x2)^t x + (1/2)(||x2||^2 - ||x1||^2), where x1 and x2 are the two class centers.
49 Linear Machine and Minimum Distance Classification (multiclass classification). The linear-form discriminant functions for multiclass classification: there are up to R(R-1)/2 decision hyperplanes for R pairwise separable classes (i.e. classes next to or touching one another).
50 Linear Machine and Minimum Distance Classification (multiclass classification). Linear machine or minimum-distance classifier: assume the class prototypes xi are known for all classes. The Euclidean distance between an input pattern x and the center of class i is ||x - xi||; minimizing it is equivalent to maximizing gi(x) = xi^t x - (1/2) xi^t xi.
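A minimum-distance classifier follows directly from gi(x) = xi^t x - (1/2) xi^t xi. The sketch below, with made-up prototype points, picks the class whose linear discriminant is largest, which is exactly the nearest prototype:

```python
def make_linear_machine(prototypes):
    """Build g_i(x) = x_i . x - 0.5 * ||x_i||^2 for each class prototype and
    return a classifier that reports the index of the largest discriminant."""
    def g(p, x):
        return sum(pi * xi for pi, xi in zip(p, x)) - 0.5 * sum(pi * pi for pi in p)

    def classify(x):
        scores = [g(p, x) for p in prototypes]
        return scores.index(max(scores))

    return classify

# Hypothetical class centers (not the ones from Examples 3.1/3.2).
classify = make_linear_machine([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
```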
51 Linear Machine and Minimum Distance Classification (multiclass classification). [Figure.]
52 Linear Machine and Minimum Distance Classification. P1, P2, P3 are the centres of gravity of the prototype points; we need to design a minimum distance classifier. Using the formulas from the previous slide, we obtain the weights wi. Note: to find the decision surface S12 we need to compute g1 - g2.
53 Linear Machine and Minimum Distance Classification. If R linear discriminant functions exist for a set of patterns such that gi(x) > gj(x) for x in Class i, i = 1, 2, ..., R, j = 1, 2, ..., R, i != j, then the classes are linearly separable.
54 Linear Machine and Minimum Distance Classification. Example:
55 Linear Machine and Minimum Distance Classification. Example.
56 Linear Machine and Minimum Distance Classification. Examples 3.1 and 3.2 have shown that the coefficients (weights) of the linear discriminant functions can be determined if a priori information about the sets of patterns and their class membership is known. In the next section (the discrete perceptron) we will examine neural networks that derive their weights during the learning cycle.
57 Linear Machine and Minimum Distance Classification. An example of linearly non-separable patterns.
58 Linear Machine and Minimum Distance Classification. o = sgn(.) maps the input space (x1, x2) into the image space. [Figure: image space and input space.]
59 Linear Machine and Minimum Distance Classification. [Figure.] These inputs map to the same point in the image space.
60 The Discrete Perceptron
61 Discrete Perceptron Training Algorithm. So far, we have shown that the coefficients of linear discriminant functions, called weights, can be determined based on a priori information about sets of patterns and their class membership. In what follows, we will begin to examine neural network classifiers that derive their weights during the learning cycle. The sample pattern vectors x1, x2, ..., xp, called the training sequence, are presented to the machine along with the correct response.
62 Discrete Perceptron Training Algorithm: Geometrical Representations (Zurada, Chapter 3). [Figure: each decision hyperplane in the augmented weight space intersects the origin, point 0; 5 prototype patterns in this case: y1, ..., y5.] If the dimension of the augmented pattern vector is > 3, our powers of visualization are no longer of assistance; in this case, the only recourse is the analytical approach.
63 Discrete Perceptron Training Algorithm: Geometrical Representations. Devise an analytic approach based on the geometrical representations. E.g. the decision surface in weight space for the training pattern y1 (y1 in Class 1, see previous slide) is w^t y1 = 0. If a misclassified y is in Class 1: w' = w + cy; if it is in Class 2: w' = w - cy. The gradient gives the direction of steepest increase; c > 0, the correction increment, controls the size of the adjustment (it is two times the learning constant ρ introduced before), and the correction is in the negative gradient direction.
64 Discrete Perceptron Training Algorithm: Geometrical Representations. [Figure.]
65 Discrete Perceptron Training Algorithm: Geometrical Representations. Choosing c = |w^t y| / (y^t y) in w' = w ± cy moves the weights onto the decision plane; note that p = |w^t y| / ||y|| > 0 is the distance from the current weight vector to that plane. Note: c is then not constant, and depends on the current training pattern, as expressed by the equation above.
66 Discrete Perceptron Training Algorithm: Geometrical Representations. For the dynamic correction rule the initial weight should be different from 0: if w = 0, then c = 0, cy = 0, and w' = w + cy = 0, so no adjustments are possible. For the fixed correction rule (c = constant), the correction of weights is always the same fixed portion of the current training vector, and the weight can be initialised at any value. For the dynamic correction rule, c depends on the distance from the weight vector to the decision surface in the weight space; hence it depends on both the current weight and the current input pattern.
67 Discrete Perceptron Training Algorithm: Geometrical Representations. Dynamic correction rule: using the value of c from the previous slide as a reference, we devise an adjustment technique which depends on a parameter λ. λ = 2: symmetrical reflection w.r.t. the decision plane; λ = 0: no weight adjustment. Note: λ is the ratio of the distance between the old weight vector w and the new w', to the distance from w to the pattern hyperplane.
68 Discrete Perceptron Training Algorithm: Geometrical Representations. Example: the one-dimensional patterns x1 = 1, x2 = -0.5, x3 = 3, x4 = -2, with desired responses d1 = d3 = 1 (class 1) and d2 = d4 = -1 (class 2). The augmented input vectors are y1 = [1, 1]^t, y2 = [-0.5, 1]^t, y3 = [3, 1]^t, y4 = [-2, 1]^t. The decision lines w^t yi = 0, for i = 1, 2, 3, 4, are sketched in the augmented weight space as follows:
69 Discrete Perceptron Training Algorithm: Geometrical Representations. [Figure.]
70 Discrete Perceptron Training Algorithm: Geometrical Representations. For c = 1 and w1 = [-2.5, 1.75]^t, using w' = w ± cy, the weight training at each step can be summarized as w^(k+1) = w^k + (c/2)[d^k - sgn(w^(k)t y^k)] y^k. We obtain the following outputs and weight updates. Step 1: pattern y1 is input: o1 = sgn(w1^t y1) = -1 != d1 = 1, so w2 = w1 + y1 = [-1.5, 2.75]^t.
71 Discrete Perceptron Training Algorithm: Geometrical Representations. Step 2: pattern y2 is input; since o2 = sgn(w2^t y2) != d2, we set w3 = w2 - y2. Step 3: pattern y3 is input; since o3 = sgn(w3^t y3) != d3, we set w4 = w3 + y3.
72 Discrete Perceptron Training Algorithm: Geometrical Representations. Since we have no evidence of correct classification of the weight w4, the training set consisting of the ordered sequence of patterns needs to be recycled (the superscript denotes the training step number). Steps 4, 5: no misclassification, thus no weight adjustments. One can check that the adjustments in steps 6 through 10 proceed similarly, yielding w7 = [2.5, 1.75]^t and eventually a weight vector in the solution area.
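The whole fixed-correction procedure can be sketched as a loop that cycles through the training sequence until one error-free pass. The data below reconstruct the slide's example (values partly inferred from the garbled original, so treat them as illustrative):

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def train_discrete_perceptron(patterns, targets, w, c=1.0, max_cycles=100):
    """Fixed-correction rule on augmented patterns y:
    w <- w + (c/2) * (d - sgn(w . y)) * y, recycled until an error-free pass."""
    for _ in range(max_cycles):
        errors = 0
        for y, d in zip(patterns, targets):
            o = sgn(sum(wi * yi for wi, yi in zip(w, y)))
            if o != d:
                errors += 1
                w = [wi + 0.5 * c * (d - o) * yi for wi, yi in zip(w, y)]
        if errors == 0:
            break
    return w

ys = [(1.0, 1.0), (-0.5, 1.0), (3.0, 1.0), (-2.0, 1.0)]
ds = [1.0, -1.0, 1.0, -1.0]
w_final = train_discrete_perceptron(ys, ds, [-2.5, 1.75])
```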
73 The Continuous Perceptron
74 Continuous Perceptron Training Algorithm (Zurada, Chapter 3). Replace the TLU (Threshold Logic Unit) with the sigmoid activation function for two reasons: to gain finer control over the training procedure, and to provide the differential characteristics that enable computation of the gradient of the current error function E = (1/2)(d - o)^2. The factor 1/2 does not affect the location of the error minimum.
75 Continuous Perceptron Training Algorithm. The new weights are obtained by moving in the direction of the negative gradient along the multidimensional error surface. By definition of the steepest descent concept, each elementary move should be perpendicular to the current error contour.
76 Continuous Perceptron Training Algorithm. Define the error as the (halved) squared difference between the desired output and the actual output: E = (1/2)(d - o)^2, with o = f(net). Since net = w^t y, we have d(net)/d(wi) = yi, i = 1, 2, ..., n+1, which gives the update w' = w + η(d - o) f'(net) y: the training rule of the continuous perceptron, equivalent to the delta training rule.
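The delta rule for the continuous bipolar perceptron, with f(net) = 2/(1 + exp(-λ net)) - 1 and hence f'(net) = (λ/2)(1 - o^2), can be sketched as follows (training data reused from the discrete example, so regard the numbers as illustrative):

```python
import math

def train_delta(patterns, targets, w, eta=0.5, lam=1.0, epochs=300):
    """Continuous perceptron / delta rule:
    o = f(w . y), w <- w + eta * (d - o) * f'(net) * y."""
    for _ in range(epochs):
        for y, d in zip(patterns, targets):
            net = sum(wi * yi for wi, yi in zip(w, y))
            o = 2.0 / (1.0 + math.exp(-lam * net)) - 1.0
            fprime = 0.5 * lam * (1.0 - o * o)
            w = [wi + eta * (d - o) * fprime * yi for wi, yi in zip(w, y)]
    return w

ys = [(1.0, 1.0), (-0.5, 1.0), (3.0, 1.0), (-2.0, 1.0)]
ds = [1.0, -1.0, 1.0, -1.0]
w_cont = train_delta(ys, ds, [-2.5, 1.75])
```

Unlike the discrete rule, the updates never stop exactly; the outputs only approach ±1 asymptotically, which is why training runs for a fixed number of epochs here.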
77 Continuous Perceptron Training Algorithm. [Figure.]
78 Continuous Perceptron Training Algorithm. Same as the previous example of the discrete perceptron, but with a continuous activation function and using the delta rule; same training pattern set as in the discrete perceptron example.
79 Continuous Perceptron Training Algorithm. With the bipolar sigmoid f(net) = 2/(1 + exp(-λ net)) - 1, the error contributed by the k-th pattern is Ek = (1/2)[dk - f(w^t yk)]^2; expanding this for each training pattern in turn, and reducing terms, gives the per-pattern error surfaces shown on the previous slide.
80 Continuous Perceptron Training Algorithm. [Figure: error surface with its minimum.]
81 Multicategory SLP
82 Multi-category Single Layer Perceptron nets. Treat the last, fixed component of the input pattern vector as the neuron activation threshold: y_(n+1) = ±1 (it is irrelevant whether it is equal to +1 or -1).
83 Multi-category Single Layer Perceptron nets. R-category linear classifier using R discrete bipolar perceptrons. Goal: the i-th TLU response of +1 is indicative of class i, and all other TLUs respond with -1.
84 Multi-category Single Layer Perceptron nets. Example 3.5: the desired response for a class-1 pattern is [1, -1, -1]^t. Indecision regions: regions where no class membership of an input pattern can be uniquely determined based on the response of the classifier; patterns in the shaded areas are not assigned any reasonable classification. E.g. the point Q, for which the response o is indecisive; however, no patterns such as Q have been used for training in the example.
85 Multi-category Single Layer Perceptron nets. For c = 1 and initial weight vectors w1, w2, w3. Step 1: pattern y1 is input, and each TLU computes sgn(wi^t y1). Since the only incorrect response is provided by TLU3, its weights are adjusted (w3 <- w3 - y1) while the weights of TLU1 and TLU2 are unchanged.
86 Multi-category Single Layer Perceptron nets. Step 2: pattern y2 is input; again only the TLU giving an incorrect response (marked *) has its weights adjusted.
87 Multi-category Single Layer Perceptron nets. Step 3: pattern y3 is input. One can verify which TLU's weights are the only ones adjusted from now on; during the second cycle the responses sgn(wi^t y) are recomputed for each pattern until all three TLUs respond correctly.
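The R-TLU scheme, with only the incorrectly responding units adjusted as in Steps 1-3 above, can be sketched with hypothetical training points (not the ones from Example 3.5):

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def train_multicategory(patterns, labels, W, c=1.0, cycles=50):
    """R discrete bipolar TLUs: the TLU of the true class should answer +1,
    all others -1; only a TLU whose response is wrong gets adjusted."""
    for _ in range(cycles):
        for y, label in zip(patterns, labels):
            for i in range(len(W)):
                d = 1.0 if i == label else -1.0
                o = sgn(sum(wi * yi for wi, yi in zip(W[i], y)))
                if o != d:
                    W[i] = [wi + 0.5 * c * (d - o) * yi for wi, yi in zip(W[i], y)]
    return W

# Three hypothetical augmented patterns, one per class.
ys = [(0.0, 2.0, 1.0), (-2.0, -1.0, 1.0), (2.0, -1.0, 1.0)]
labels = [0, 1, 2]
W = train_multicategory(ys, labels, [[0.0] * 3 for _ in range(3)])
```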
88 Multi-category Single Layer Perceptron nets. R-category linear classifier using R continuous bipolar perceptrons.
89 Comparison between the Perceptron and the Bayes Classifier. The perceptron operates on the premise that the patterns to be classified are linearly separable (otherwise the training algorithm will oscillate), while the Bayes classifier can work on nonseparable patterns. The Bayes classifier minimizes the probability of misclassification, which is independent of the underlying distribution. The Bayes classifier is a linear classifier under the assumption of Gaussianity. The perceptron is non-parametric, while the Bayes classifier is parametric (its derivation is contingent on the assumption of the underlying distributions). The perceptron is adaptive and simple to implement; the Bayes classifier could be made adaptive too, but at the expense of increased storage and more complex computations.
90 APPENDIX A: Unconstrained Optimization Techniques
91 Unconstrained Optimization Techniques (Haykin, Chapter 3). The cost function E(w), continuously differentiable, is a measure of how to choose the weight vector w of an adaptive filtering algorithm so that it behaves in an optimum manner. We want to find an optimal solution w* that minimizes E(w): the gradient of E vanishes at w*. Local iterative descent: starting with an initial guess denoted by w(0), generate a sequence of weight vectors w(1), w(2), ..., such that the cost function E(w) is reduced at each iteration of the algorithm: E(w(n+1)) < E(w(n)). Methods: steepest descent, Newton's, Gauss-Newton's.
92 Method of Steepest Descent. Here the successive adjustments applied to w are in the direction of steepest descent, that is, in a direction opposite to the gradient: w(n+1) = w(n) - a g(n), where a is a small positive constant called the step size or learning-rate parameter, and g is the gradient of E(w). The method of steepest descent converges to the optimal solution w* slowly, and the learning-rate parameter a has a profound influence on its convergence behavior (overdamped, underdamped, or even unstable/divergent).
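A steepest-descent sketch on a simple quadratic bowl (an illustrative cost, not one from the slides) shows the update w(n+1) = w(n) - a g(n) in action:

```python
def steepest_descent(grad, w, a=0.1, n_iter=500):
    """Iterate w(n+1) = w(n) - a * g(n), moving opposite the gradient."""
    for _ in range(n_iter):
        g = grad(w)
        w = [wi - a * gi for wi, gi in zip(w, g)]
    return w

# E(w) = (w0 - 1)^2 + 2 * (w1 + 3)^2 has its minimum at (1, -3).
grad_E = lambda w: [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 3.0)]
w_star = steepest_descent(grad_E, [5.0, 5.0])
```

Too large a step size makes the iteration diverge: with a = 1.1 the same quadratic oscillates with growing amplitude, illustrating the unstable regime mentioned above.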
93 Newton's Method. Using a second-order Taylor series expansion of the cost function around the point w(n): ΔE(w(n)) = E(w(n+1)) - E(w(n)) ≈ g^t(n) Δw(n) + (1/2) Δw^t(n) H(n) Δw(n), where Δw(n) = w(n+1) - w(n) and H is the Hessian matrix of E. We want the Δw* that minimizes ΔE(w(n)), so differentiating with respect to Δw: g(n) + H(n) Δw* = 0, and therefore Δw* = -H^(-1)(n) g(n).
94 Newton's Method. Finally, w(n+1) = w(n) - H^(-1)(n) g(n). Newton's method converges quickly asymptotically and does not exhibit the zigzagging behavior; however, the Hessian H(n) has to be a positive definite matrix for all n.
95 Gauss-Newton Method. The Gauss-Newton method is applicable to a cost function expressed as a sum of error squares, E(w) = (1/2) sum_i e^2(i). Because the error signal e(i) is a function of w, we linearize the dependence of e(i) on w by writing e'(i, w) = e(i) + [de(i)/dw]^t (w - w(n)). Equivalently, in matrix notation, e'(n, w) = e(n) + J(n)(w - w(n)).
96 Gauss-Newton Method. Here J(n) is the n-by-m Jacobian matrix of e(n), i.e. the matrix of partial derivatives of the errors with respect to the weights. We want the updated weight vector w(n+1) = arg min over w of (1/2) ||e'(n, w)||^2. A simple algebraic calculation gives (1/2)||e'(n, w)||^2 = (1/2)||e(n)||^2 + e^t(n) J(n)(w - w(n)) + (1/2)(w - w(n))^t J^t(n) J(n)(w - w(n)). Differentiating this expression with respect to w and setting the result to 0, we obtain J^t(n) e(n) + J^t(n) J(n)(w - w(n)) = 0.
97 Gauss-Newton Method. Thus we get w(n+1) = w(n) - (J^t(n) J(n))^(-1) J^t(n) e(n). To guard against the possibility that the matrix product J^t(n) J(n) is singular, the customary practice is to add a diagonal loading term: w(n+1) = w(n) - (J^t(n) J(n) + δI)^(-1) J^t(n) e(n), where δ is a small positive constant. The effect of this modification is progressively reduced as the number of iterations, n, is increased.
98 Linear Least-Squares Filter. The single neuron around which it is built is linear, and the cost function consists of the sum of error squares. Using y(i) = x^t(i) w and e(i) = d(i) - y(i), the error vector is e(n) = d(n) - X(n) w(n); differentiating it with respect to w(n) gives the Jacobian. Substituting into the Gauss-Newton update yields w(n+1) = (X^t(n) X(n))^(-1) X^t(n) d(n) = X^+(n) d(n), where X^+ denotes the pseudoinverse of X.
99 LMS Algorithm. Based on the use of instantaneous values for the cost function: E(w) = (1/2) e^2(n). Differentiating with respect to w: dE(w)/dw = e(n) de(n)/dw. The error signal in the LMS algorithm is e(n) = d(n) - x^t(n) w(n); hence de(n)/dw(n) = -x(n), so dE(w)/dw(n) = -x(n) e(n).
100 LMS Algorithm. Using -x(n) e(n) as an estimate of the gradient vector, and substituting it into the steepest descent update, the LMS algorithm follows: w^(n+1) = w^(n) + η x(n) e(n), where η is the learning-rate parameter. The inverse of η is a measure of the memory of the LMS algorithm: when η is small, the adaptive process progresses slowly, more of the past data are remembered, and a more accurate filtering action results.
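An LMS sketch (a hypothetical two-tap example with a noise-free target d(n) = 2 x0 - x1) shows the update w(n+1) = w(n) + η x(n) e(n):

```python
def lms(xs, ds, m, eta=0.05):
    """LMS: e(n) = d(n) - x(n)^T w(n); w(n+1) = w(n) + eta * x(n) * e(n)."""
    w = [0.0] * m
    for x, d in zip(xs, ds):
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * xi * e for wi, xi in zip(w, x)]
    return w

# Alternating one-hot inputs; the desired response comes from w_true = (2, -1).
xs = [(1.0, 0.0), (0.0, 1.0)] * 500
ds = [2.0 * x[0] - 1.0 * x[1] for x in xs]
w_hat = lms(xs, ds, 2)
```

Because the example is noise-free, the estimate converges to the true taps; with noisy data it would instead fluctuate about them, as described on the next slide.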
101 LMS Characteristics. The LMS algorithm produces an estimate of the weight vector, sacrificing a distinctive feature of steepest descent: the steepest descent algorithm follows a well-defined trajectory, while the LMS estimate follows a random trajectory. As the number of iterations goes to infinity, the estimate performs a random walk about the optimal solution. But importantly, the LMS algorithm does not require knowledge of the statistics of the environment.
102 Convergence Considerations. Two distinct quantities, η and x(n), determine the convergence: the user supplies η, and the selection of x(n) is important for the LMS algorithm to converge. Convergence of the mean: E[w(n)] tends to the optimal solution as n tends to infinity; this alone is of limited practical value. Convergence in the mean square: E[e^2(n)] tends to a constant as n tends to infinity. The convergence condition for the LMS algorithm in the mean square is 0 < η < 2 / (sum of mean-square values of the sensor inputs).
103 APPENDIX B: Perceptron Convergence Proof
104 Perceptron Convergence Proof (Haykin, Chapter 3). Consider the following perceptron: v = sum over i = 0..m of wi xi = w^t x, with the goal w^t x > 0 for every input vector x belonging to class C1, and w^t x <= 0 for every input vector x belonging to class C2.
105 Perceptron Convergence Proof. The algorithm for the weight adjustment of the perceptron: if x(n) is correctly classified, no adjustment is made, i.e. w(n+1) = w(n) if w^t(n) x(n) > 0 and x(n) belongs to class C1, or if w^t(n) x(n) <= 0 and x(n) belongs to class C2. Otherwise: w(n+1) = w(n) - η x(n) if w^t(n) x(n) > 0 and x(n) belongs to class C2; w(n+1) = w(n) + η x(n) if w^t(n) x(n) <= 0 and x(n) belongs to class C1. The learning-rate parameter η controls the adjustment applied to the weight vector.
106 Perceptron Convergence Proof. Take η = 1 and w(0) = 0. Suppose the perceptron incorrectly classifies the vectors x(1), x(2), ..., such that w^t(n) x(n) <= 0 while x(n) belongs to C1, so that w(n+1) = w(n) + x(n) for x(n) belonging to C1. Since w(0) = 0, we iteratively find w(n+1) = x(1) + x(2) + ... + x(n)   (B1). Since the classes C1 and C2 are assumed to be linearly separable, there exists a solution w0 for which w0^t x > 0 for all vectors x(1), ..., x(n) belonging to the subset H1 (the subset of training vectors that belong to class C1).
107 Perceptron Convergence Proof. For a fixed solution w0, we may then define a positive number α as α = min over x(n) in H1 of w0^t x(n). Hence equation (B1) implies w0^t w(n+1) = w0^t x(1) + ... + w0^t x(n)   (B2). Since each term is greater than or equal to α, we have w0^t w(n+1) >= n α. Now we use the Cauchy-Schwarz inequality: ||w0||^2 ||w(n+1)||^2 >= [w0^t w(n+1)]^2.
108 Perceptron Convergence Proof. This implies that ||w(n+1)||^2 >= n^2 α^2 / ||w0||^2   (B3). Now let us follow another development route (notice the index k): w(k+1) = w(k) + x(k) for k = 1, ..., n and x(k) in H1. Taking the squared Euclidean norm of both sides gives ||w(k+1)||^2 = ||w(k)||^2 + ||x(k)||^2 + 2 w^t(k) x(k). But under the assumption that the perceptron incorrectly classifies an input vector x(k) belonging to the subset H1, we have w^t(k) x(k) <= 0 and hence ||w(k+1)||^2 <= ||w(k)||^2 + ||x(k)||^2.
109 Perceptron Convergence Proof. Or, equivalently, ||w(k+1)||^2 - ||w(k)||^2 <= ||x(k)||^2 for k = 1, ..., n. Adding these inequalities for k = 1, ..., n, and invoking the initial condition w(0) = 0, we get ||w(n+1)||^2 <= sum over k = 1..n of ||x(k)||^2 <= n β   (B4), where β is a positive number defined by β = max over x(k) in H1 of ||x(k)||^2. Eq. (B4) states that the squared Euclidean norm of w(n+1) grows at most linearly with the number of iterations n.
110 Perceptron Convergence Proof. The second result (B4) is clearly in conflict with Eq. (B3) for sufficiently large n. Indeed, n cannot be larger than some value n_max for which Eqs. (B3) and (B4) are both satisfied with the equality sign; that is, n_max is the solution of n_max^2 α^2 / ||w0||^2 = n_max β. Solving for n_max given a solution vector w0, we find n_max = β ||w0||^2 / α^2. We have thus proved that for η = 1 for all n, and for w(0) = 0, given that a solution vector w0 exists, the rule for adapting the synaptic weights of the perceptron must terminate after at most n_max iterations.
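The bound can be checked empirically. In the sketch below (with made-up training vectors), class-C2 vectors would be multiplied by -1 so that every z must satisfy w^t z > 0; the η = 1, w(0) = 0 rule then makes at most n_max = β ||w0||^2 / α^2 mistakes:

```python
def run_perceptron(zs, max_passes=100):
    """eta = 1, w(0) = 0; count mistakes until a pass with w . z > 0 for all z."""
    w = [0.0] * len(zs[0])
    mistakes = 0
    for _ in range(max_passes):
        clean = True
        for z in zs:
            if sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                w = [wi + zi for wi, zi in zip(w, z)]
                mistakes += 1
                clean = False
        if clean:
            break
    return w, mistakes

zs = [(1.0, 0.2), (0.8, -0.5), (2.0, 1.0)]   # all satisfy (1, 0) . z > 0
w, mistakes = run_perceptron(zs)

w0 = (1.0, 0.0)                              # a known separating solution
alpha = min(sum(a * b for a, b in zip(z, w0)) for z in zs)
beta = max(sum(a * a for a in z) for z in zs)
n_max = beta * sum(a * a for a in w0) / alpha ** 2
```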
111 MORE READING
112 Suggested Reading. 1. S. Haykin, Neural Networks, Prentice-Hall, 1999, chapter 3. 2. L. Fausett, Fundamentals of Neural Networks, Prentice-Hall, 1994, chapter 2. 3. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, Wiley, 2001, appendix A4, chapter 2, and chapter 5. 4. J. M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992, chapter 3.
113 References. These lecture notes were based on the references on the previous slide, and on the following: 1. Berlin Chen, lecture notes, National Taiwan Normal University, Taipei, Taiwan, ROC. 2. Ehud Rivlin, IIT. 3. Jin Hyung Kim, KAIST Computer Science Dept., CS679 Neural Network lecture notes. 4. Dr John A. Bullinaria, course material, Introduction to Neural Networks.
More informationPrinciple Of Superposition
ecture 5: PREIMINRY CONCEP O RUCUR NYI Priciple Of uperpositio Mathematically, the priciple of superpositio is stated as ( a ) G( a ) G( ) G a a or for a liear structural system, the respose at a give
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques
More informationMathematical Foundations -1- Sets and Sequences. Sets and Sequences
Mathematical Foudatios -1- Sets ad Sequeces Sets ad Sequeces Methods of proof 2 Sets ad vectors 13 Plaes ad hyperplaes 18 Liearly idepedet vectors, vector spaces 2 Covex combiatios of vectors 21 eighborhoods,
More informationChapter 7: The z-transform. Chih-Wei Liu
Chapter 7: The -Trasform Chih-Wei Liu Outlie Itroductio The -Trasform Properties of the Regio of Covergece Properties of the -Trasform Iversio of the -Trasform The Trasfer Fuctio Causality ad Stability
More informationOptimization Methods MIT 2.098/6.255/ Final exam
Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short
More information3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,
3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [
More informationSeptember 2012 C1 Note. C1 Notes (Edexcel) Copyright - For AS, A2 notes and IGCSE / GCSE worksheets 1
September 0 s (Edecel) Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright www.pgmaths.co.uk - For AS, A otes ad IGCSE / GCSE worksheets September 0 Copyright
More informationClustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.
Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.
More informationFMA901F: Machine Learning Lecture 4: Linear Models for Classification. Cristian Sminchisescu
FMA90F: Machie Learig Lecture 4: Liear Models for Classificatio Cristia Smichisescu Liear Classificatio Classificatio is itrisically o liear because of the traiig costraits that place o idetical iputs
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More information6.003 Homework #3 Solutions
6.00 Homework # Solutios Problems. Complex umbers a. Evaluate the real ad imagiary parts of j j. π/ Real part = Imagiary part = 0 e Euler s formula says that j = e jπ/, so jπ/ j π/ j j = e = e. Thus the
More informationWe are mainly going to be concerned with power series in x, such as. (x)} converges - that is, lims N n
Review of Power Series, Power Series Solutios A power series i x - a is a ifiite series of the form c (x a) =c +c (x a)+(x a) +... We also call this a power series cetered at a. Ex. (x+) is cetered at
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More information6 Integers Modulo n. integer k can be written as k = qn + r, with q,r, 0 r b. So any integer.
6 Itegers Modulo I Example 2.3(e), we have defied the cogruece of two itegers a,b with respect to a modulus. Let us recall that a b (mod ) meas a b. We have proved that cogruece is a equivalece relatio
More informationProblem Cosider the curve give parametrically as x = si t ad y = + cos t for» t» ß: (a) Describe the path this traverses: Where does it start (whe t =
Mathematics Summer Wilso Fial Exam August 8, ANSWERS Problem 1 (a) Fid the solutio to y +x y = e x x that satisfies y() = 5 : This is already i the form we used for a first order liear differetial equatio,
More informationPAPER : IIT-JAM 2010
MATHEMATICS-MA (CODE A) Q.-Q.5: Oly oe optio is correct for each questio. Each questio carries (+6) marks for correct aswer ad ( ) marks for icorrect aswer.. Which of the followig coditios does NOT esure
More informationDifferentiable Convex Functions
Differetiable Covex Fuctios The followig picture motivates Theorem 11. f ( x) f ( x) f '( x)( x x) ˆx x 1 Theorem 11 : Let f : R R be differetiable. The, f is covex o the covex set C R if, ad oly if for
More informationThe picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled
1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how
More informationCS537. Numerical Analysis and Computing
CS57 Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 Jauary 9 9 What is the Root May physical system ca be
More informationMa 530 Introduction to Power Series
Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationAxis Aligned Ellipsoid
Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple
More information6.867 Machine learning, lecture 7 (Jaakkola) 1
6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit
More informationChapter 2 The Solution of Numerical Algebraic and Transcendental Equations
Chapter The Solutio of Numerical Algebraic ad Trascedetal Equatios Itroductio I this chapter we shall discuss some umerical methods for solvig algebraic ad trascedetal equatios. The equatio f( is said
More informationMath 257: Finite difference methods
Math 257: Fiite differece methods 1 Fiite Differeces Remember the defiitio of a derivative f f(x + ) f(x) (x) = lim 0 Also recall Taylor s formula: (1) f(x + ) = f(x) + f (x) + 2 f (x) + 3 f (3) (x) +...
More informationComplex Analysis Spring 2001 Homework I Solution
Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle
More informationMultilayer perceptrons
Multilayer perceptros If traiig set is ot liearly separable, a etwork of McCulloch-Pitts uits ca give a solutio If o loop exists i etwork, called a feedforward etwork (else, recurret etwork) A two-layer
More informationGeneralized Semi- Markov Processes (GSMP)
Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual
More informationThe axial dispersion model for tubular reactors at steady state can be described by the following equations: dc dz R n cn = 0 (1) (2) 1 d 2 c.
5.4 Applicatio of Perturbatio Methods to the Dispersio Model for Tubular Reactors The axial dispersio model for tubular reactors at steady state ca be described by the followig equatios: d c Pe dz z =
More informationCS321. Numerical Analysis and Computing
CS Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 September 8 5 What is the Root May physical system ca
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationSubject: Differential Equations & Mathematical Modeling -III. Lesson: Power series solutions of Differential Equations. about ordinary points
Power series solutio of Differetial equatios about ordiary poits Subject: Differetial Equatios & Mathematical Modelig -III Lesso: Power series solutios of Differetial Equatios about ordiary poits Lesso
More informationNotes on iteration and Newton s method. Iteration
Notes o iteratio ad Newto s method Iteratio Iteratio meas doig somethig over ad over. I our cotet, a iteratio is a sequece of umbers, vectors, fuctios, etc. geerated by a iteratio rule of the type 1 f
More informationNaïve Bayes. Naïve Bayes
Statistical Data Miig ad Machie Learig Hilary Term 206 Dio Sejdiovic Departmet of Statistics Oxford Slides ad other materials available at: http://www.stats.ox.ac.uk/~sejdiov/sdmml : aother plug-i classifier
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationPattern Classification
Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More information1 Duality revisited. AM 221: Advanced Optimization Spring 2016
AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationSection 1.1. Calculus: Areas And Tangents. Difference Equations to Differential Equations
Differece Equatios to Differetial Equatios Sectio. Calculus: Areas Ad Tagets The study of calculus begis with questios about chage. What happes to the velocity of a swigig pedulum as its positio chages?
More informationChapter 7 z-transform
Chapter 7 -Trasform Itroductio Trasform Uilateral Trasform Properties Uilateral Trasform Iversio of Uilateral Trasform Determiig the Frequecy Respose from Poles ad Zeros Itroductio Role i Discrete-Time
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More informationSupport vector machine revisited
6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector
More informationMost text will write ordinary derivatives using either Leibniz notation 2 3. y + 5y= e and y y. xx tt t
Itroductio to Differetial Equatios Defiitios ad Termiolog Differetial Equatio: A equatio cotaiig the derivatives of oe or more depedet variables, with respect to oe or more idepedet variables, is said
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More information10.2 Infinite Series Contemporary Calculus 1
10. Ifiite Series Cotemporary Calculus 1 10. INFINITE SERIES Our goal i this sectio is to add together the umbers i a sequece. Sice it would take a very log time to add together the ifiite umber of umbers,
More informationΩ ). Then the following inequality takes place:
Lecture 8 Lemma 5. Let f : R R be a cotiuously differetiable covex fuctio. Choose a costat δ > ad cosider the subset Ωδ = { R f δ } R. Let Ωδ ad assume that f < δ, i.e., is ot o the boudary of f = δ, i.e.,
More informationCourse Outline. Designing Control Systems. Proportional Controller. Amme 3500 : System Dynamics and Control. Root Locus. Dr. Stefan B.
Amme 3500 : System Dyamics ad Cotrol Root Locus Course Outlie Week Date Cotet Assigmet Notes Mar Itroductio 8 Mar Frequecy Domai Modellig 3 5 Mar Trasiet Performace ad the s-plae 4 Mar Block Diagrams Assig
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 90 Itroductio to classifictio Ae Solberg ae@ifiuioo Based o Chapter -6 i Duda ad Hart: atter Classificatio 90 INF 4300 Madator proect Mai task: classificatio You must implemet a classificatio
More informationLecture 8: Solving the Heat, Laplace and Wave equations using finite difference methods
Itroductory lecture otes o Partial Differetial Equatios - c Athoy Peirce. Not to be copied, used, or revised without explicit writte permissio from the copyright ower. 1 Lecture 8: Solvig the Heat, Laplace
More informationCALCULUS BASIC SUMMER REVIEW
CALCULUS BASIC SUMMER REVIEW NAME rise y y y Slope of a o vertical lie: m ru Poit Slope Equatio: y y m( ) The slope is m ad a poit o your lie is, ). ( y Slope-Itercept Equatio: y m b slope= m y-itercept=
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECURE 4 his lecture is partly based o chapters 4-5 i [SSBD4]. Let us o give a variat of SGD for strogly covex fuctios. Algorithm SGD for strogly covex
More informationAnalytic Continuation
Aalytic Cotiuatio The stadard example of this is give by Example Let h (z) = 1 + z + z 2 + z 3 +... kow to coverge oly for z < 1. I fact h (z) = 1/ (1 z) for such z. Yet H (z) = 1/ (1 z) is defied for
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More informationSubject: Differential Equations & Mathematical Modeling-III
Power Series Solutios of Differetial Equatios about Sigular poits Subject: Differetial Equatios & Mathematical Modelig-III Lesso: Power series solutios of differetial equatios about Sigular poits Lesso
More informationMath 113 Exam 4 Practice
Math Exam 4 Practice Exam 4 will cover.-.. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for
More informationsubject to A 1 x + A 2 y b x j 0, j = 1,,n 1 y j = 0 or 1, j = 1,,n 2
Additioal Brach ad Boud Algorithms 0-1 Mixed-Iteger Liear Programmig The brach ad boud algorithm described i the previous sectios ca be used to solve virtually all optimizatio problems cotaiig iteger variables,
More informationAssignment 1 : Real Numbers, Sequences. for n 1. Show that (x n ) converges. Further, by observing that x n+2 + x n+1
Assigmet : Real Numbers, Sequeces. Let A be a o-empty subset of R ad α R. Show that α = supa if ad oly if α is ot a upper boud of A but α + is a upper boud of A for every N. 2. Let y (, ) ad x (, ). Evaluate
More informationU8L1: Sec Equations of Lines in R 2
MCVU U8L: Sec. 8.9. Equatios of Lies i R Review of Equatios of a Straight Lie (-D) Cosider the lie passig through A (-,) with slope, as show i the diagram below. I poit slope form, the equatio of the lie
More informationDefinitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.
Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,
More informationChapter 7. Support Vector Machine
Chapter 7 Support Vector Machie able of Cotet Margi ad support vectors SVM formulatio Slack variables ad hige loss SVM for multiple class SVM ith Kerels Relevace Vector Machie Support Vector Machie (SVM)
More informationMa 530 Infinite Series I
Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li
More informationPattern Classification
Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher
More informationPC5215 Numerical Recipes with Applications - Review Problems
PC55 Numerical Recipes with Applicatios - Review Problems Give the IEEE 754 sigle precisio bit patter (biary or he format) of the followig umbers: 0 0 05 00 0 00 Note that it has 8 bits for the epoet,
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More information1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More information