Minimum Squared Error
LDF: Minimum Squared-Error Procedures
Idea: convert to an easier and better understood problem.
Perceptron: $a^t y_i > 0$ for all samples $y_i$, which means solving a system of linear inequalities.
MSE procedure: $a^t y_i = b_i$ for all samples $y_i$, which means solving a system of linear equations.
Choose positive constants $b_1, b_2, \ldots, b_n$ and try to find a weight vector $a$ such that $a^t y_i = b_i$ for all samples $y_i$.
If we can find such a weight vector, then $a$ is a solution, because the $b_i$ are positive.
The MSE procedure considers all the samples, not just the misclassified ones.
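LDF: MSE Margins
[Figure: hyperplane $g(y) = 0$ with samples $y_i$ and $y_k$]
Since we want $a^t y_i = b_i$, we expect sample $y_i$ to be at distance $b_i$ from the separating hyperplane (normalized by $\|a\|$).
Thus $b_1, b_2, \ldots, b_n$ give the relative expected distances, or "margins", of the samples from the hyperplane.
We should make $b_i$ small if sample $i$ is expected to be near the separating hyperplane, and larger otherwise.
In the absence of any additional information, there are good reasons to set $b_1 = b_2 = \cdots = b_n = 1$.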
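LDF: MSE Matrix Notation
We need to solve $n$ equations:
$$a^t y_1 = b_1, \quad a^t y_2 = b_2, \quad \ldots, \quad a^t y_n = b_n$$
Introduce matrix notation:
$$\underbrace{\begin{bmatrix} y_1^{(0)} & y_1^{(1)} & \cdots & y_1^{(d)} \\ y_2^{(0)} & y_2^{(1)} & \cdots & y_2^{(d)} \\ \vdots & \vdots & & \vdots \\ y_n^{(0)} & y_n^{(1)} & \cdots & y_n^{(d)} \end{bmatrix}}_{Y} \underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_d \end{bmatrix}}_{a} = \underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{b}$$
Thus we need to solve the linear system $Ya = b$.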
LDF: Exact Solution is Rare
We need to solve the linear system $Ya = b$, where $Y$ is an $n \times (d+1)$ matrix.
In general an exact solution is available only when $Y$ is square and nonsingular, in which case the inverse $Y^{-1}$ exists and
$$a = Y^{-1} b$$
This requires (number of samples) = (number of features + 1), which almost never happens in practice.
In this case we are guaranteed to find the separating hyperplane.
LDF: Approximate Solution
Typically $Y$ is overdetermined, that is, it has more rows (examples) than columns (features).
If it has more features than examples, we should reduce the dimensionality.
We need $Ya = b$, but no exact solution exists for an overdetermined system of equations: there are more equations than unknowns.
Find an approximate solution $a$, that is, $Ya \approx b$.
Note that the approximate solution $a$ does not necessarily give the separating hyperplane in the separable case.
But the hyperplane corresponding to $a$ may still be a good solution, especially if there is no separating hyperplane.
LDF: MSE Criterion Function
Minimum squared error approach: find $a$ which minimizes the length of the error vector
$$e = Ya - b$$
Thus minimize the minimum squared error criterion function:
$$J_s(a) = \|Ya - b\|^2 = \sum_{i=1}^{n} \left(a^t y_i - b_i\right)^2$$
Unlike the perceptron criterion function, the minimum squared error criterion function can be optimized analytically by setting the gradient to 0.
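LDF: Optimizing $J_s(a)$
$$J_s(a) = \|Ya - b\|^2 = \sum_{i=1}^{n} \left(a^t y_i - b_i\right)^2$$
Let's compute the gradient:
$$\nabla J_s(a) = \begin{bmatrix} \partial J_s / \partial a_0 \\ \vdots \\ \partial J_s / \partial a_d \end{bmatrix} = \sum_{i=1}^{n} 2\left(a^t y_i - b_i\right) y_i = 2Y^t(Ya - b)$$
Setting the gradient to 0:
$$2Y^t(Ya - b) = 0 \;\Rightarrow\; Y^t Y a = Y^t b$$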
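LDF: Pseudo Inverse Solution
$$Y^t Y a = Y^t b$$
The matrix $Y^t Y$ is square (it has $d+1$ rows and columns) and it is often nonsingular.
If $Y^t Y$ is nonsingular, its inverse exists and we can solve for $a$ uniquely:
$$a = (Y^t Y)^{-1} Y^t b$$
The matrix $(Y^t Y)^{-1} Y^t$ is the pseudo inverse of $Y$, since
$$\left((Y^t Y)^{-1} Y^t\right) Y = (Y^t Y)^{-1} (Y^t Y) = I$$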
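LDF: Minimum Squared-Error Procedures
If $b_1 = \cdots = b_n = 1$, the MSE procedure is equivalent to finding a hyperplane of best fit through the samples $y_1, \ldots, y_n$:
$$J_s(a) = \|Ya - 1_n\|^2, \qquad 1_n = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$
Then we shift this line to the origin. If the line was a good fit, all samples will be classified correctly.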
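LDF: Minimum Squared-Error Procedures
We are only guaranteed the separating hyperplane if $Ya > 0$, that is, if all elements of the vector
$$Ya = \begin{bmatrix} a^t y_1 \\ \vdots \\ a^t y_n \end{bmatrix}$$
are positive.
We have $Ya \approx b$, that is,
$$Ya = \begin{bmatrix} b_1 + \varepsilon_1 \\ \vdots \\ b_n + \varepsilon_n \end{bmatrix}$$
where each $\varepsilon_i$ may be negative.
If $\varepsilon_1, \ldots, \varepsilon_n$ are small relative to $b_1, \ldots, b_n$, then each element of $Ya$ is positive, and $a$ gives a separating hyperplane.
If the approximation is not good, $\varepsilon_i$ may be large and negative for some $i$; then $b_i + \varepsilon_i$ is negative and $a$ is not a separating hyperplane.
Thus in the linearly separable case, the least squares solution $a$ does not necessarily give a separating hyperplane. But it will give a "reasonable" hyperplane.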
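LDF: Minimum Squared-Error Procedures
We are free to choose $b$. We may be tempted to make $b$ large as a way to ensure $Ya \approx b > 0$. This does not work.
Let $\beta$ be a positive scalar, and let's try $\beta b$ instead of $b$.
If $a^*$ is a least squares solution to $Ya = b$, then for any scalar $\beta$ the least squares solution to $Ya = \beta b$ is $\beta a^*$:
$$\arg\min_a \|Ya - \beta b\|^2 = \arg\min_a \beta^2 \left\|Y(a/\beta) - b\right\|^2 = \beta a^*$$
Thus if the $i$-th element of $Ya^*$ is less than 0, that is $y_i^t a^* < 0$, then $y_i^t(\beta a^*) < 0$ as well.
The relative difference between the components of $b$ matters, but not the size of each individual component.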
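The same scaling result can also be read off the pseudo inverse formula, assuming $Y^t Y$ is nonsingular. The notation $a^*(b)$ for "the least squares solution obtained with margin vector $b$" is introduced here only for this sketch:

```latex
a^{*}(\beta b) = (Y^{t}Y)^{-1}Y^{t}(\beta b)
               = \beta\,(Y^{t}Y)^{-1}Y^{t}b
               = \beta\,a^{*}(b),
\qquad
Y a^{*}(\beta b) = \beta\,Y a^{*}(b).
```

For $\beta > 0$ every component of $Ya$ keeps its sign, so a sample on the wrong side of the hyperplane stays on the wrong side.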
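LDF: Example
Class 1: (6 9), (5 7)
Class 2: (5 9), (0 4)
Set vectors $y_1, y_2, y_3, y_4$ by adding the extra feature and "normalizing":
$$y_1 = \begin{bmatrix} 1 \\ 6 \\ 9 \end{bmatrix}, \quad y_2 = \begin{bmatrix} 1 \\ 5 \\ 7 \end{bmatrix}, \quad y_3 = \begin{bmatrix} -1 \\ -5 \\ -9 \end{bmatrix}, \quad y_4 = \begin{bmatrix} -1 \\ 0 \\ -4 \end{bmatrix}$$
The matrix $Y$ is then
$$Y = \begin{bmatrix} 1 & 6 & 9 \\ 1 & 5 & 7 \\ -1 & -5 & -9 \\ -1 & 0 & -4 \end{bmatrix}$$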
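LDF: Example
Choose $b = [1\ 1\ 1\ 1]^t$.
In Matlab, a = Y\b solves the least squares problem:
$$a = \begin{bmatrix} 2.7 \\ 1.0 \\ -0.9 \end{bmatrix}$$
Note that $a$ is only an approximate solution of $Ya = b$, since no exact solution exists:
$$Ya = \begin{bmatrix} 0.4 \\ 1.3 \\ 0.6 \\ 1.1 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$
This solution does give a separating hyperplane, since $Ya > 0$.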
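The example above can be reproduced without Matlab. The following is a minimal sketch in plain Python that solves the normal equations $Y^t Y a = Y^t b$ by Gaussian elimination. The function name `least_squares` is an illustrative choice and not part of the lecture; for larger or ill-conditioned problems a library routine would be the safer option.

```python
def least_squares(Y, b):
    """Return a minimizing ||Y a - b||^2 by solving (Y^t Y) a = Y^t b."""
    n, m = len(Y), len(Y[0])
    # Augmented matrix [Y^t Y | Y^t b] of the normal equations.
    M = [[sum(Y[k][i] * Y[k][j] for k in range(n)) for j in range(m)]
         + [sum(Y[k][i] * b[k] for k in range(n))]
         for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Y^t Y is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    a = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(M[i][j] * a[j] for j in range(i + 1, m))
        a[i] = (M[i][m] - tail) / M[i][i]
    return a


# Augmented, sign-normalized samples from the example (class 2 rows negated).
Y = [[1, 6, 9], [1, 5, 7], [-1, -5, -9], [-1, 0, -4]]
b = [1, 1, 1, 1]

a = least_squares(Y, b)
margins = [sum(w * v for w, v in zip(a, row)) for row in Y]

print([round(v, 1) for v in a])        # [2.7, 1.0, -0.9]
print([round(v, 1) for v in margins])  # [0.4, 1.3, 0.6, 1.1]
print(all(m > 0 for m in margins))     # True: a separates the two classes
```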
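LDF: Example
Class 1: (6 9), (5 7)
Class 2: (5 9), (0 10)
The last sample is very far from the separating hyperplane compared to the others.
$$y_1 = \begin{bmatrix} 1 \\ 6 \\ 9 \end{bmatrix}, \quad y_2 = \begin{bmatrix} 1 \\ 5 \\ 7 \end{bmatrix}, \quad y_3 = \begin{bmatrix} -1 \\ -5 \\ -9 \end{bmatrix}, \quad y_4 = \begin{bmatrix} -1 \\ 0 \\ -10 \end{bmatrix}$$
Matrix
$$Y = \begin{bmatrix} 1 & 6 & 9 \\ 1 & 5 & 7 \\ -1 & -5 & -9 \\ -1 & 0 & -10 \end{bmatrix}$$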
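LDF: Example
Choose $b = [1\ 1\ 1\ 1]^t$.
In Matlab, a = Y\b solves the least squares problem:
$$a = \begin{bmatrix} 3.2 \\ 0.2 \\ -0.4 \end{bmatrix}$$
Note that $a$ is only an approximate solution of $Ya = b$, since no exact solution exists:
$$Ya = \begin{bmatrix} 0.2 \\ 0.9 \\ -0.04 \\ 1.16 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$
This solution does not give a separating hyperplane, since $a^t y_3 < 0$.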
LDF: Example
MSE pays too much attention to isolated "noisy" examples (such examples are called outliers).
[Figure: the outlier, the MSE solution, and the desired solution]
There are no problems with convergence though, and the solution it gives ranges from reasonable to good.
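LDF: Example
We know that the 4th point is far from the separating hyperplane. In practice we don't know this.
Thus an appropriate choice is $b = [1\ 1\ 1\ 10]^t$.
In Matlab, solve a = Y\b:
$$a = \begin{bmatrix} -1.1 \\ 1.7 \\ -0.9 \end{bmatrix}$$
Note that $a$ is only an approximate solution of $Ya = b$:
$$Ya = \begin{bmatrix} 0.9 \\ 1.0 \\ 0.8 \\ 10.0 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 1 \\ 1 \\ 10 \end{bmatrix}$$
This solution does give the separating hyperplane, since $Ya > 0$.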
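The effect of the margin vector on the outlier example can be checked numerically. The sketch below is an illustration only: because there are just three weights, it solves the normal equations with Cramer's rule, and the helper names `det3` and `mse_weights` are hypothetical.

```python
def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def mse_weights(Y, b):
    """Solve (Y^t Y) a = Y^t b for a 3-component a using Cramer's rule."""
    G = [[sum(row[i] * row[j] for row in Y) for j in range(3)] for i in range(3)]
    rhs = [sum(row[i] * bi for row, bi in zip(Y, b)) for i in range(3)]
    d = det3(G)
    a = []
    for col in range(3):
        Gc = [row[:] for row in G]
        for i in range(3):
            Gc[i][col] = rhs[i]      # replace one column by the right-hand side
        a.append(det3(Gc) / d)
    return a


# Outlier example: class 2 contains the far-away sample (0, 10).
Y = [[1, 6, 9], [1, 5, 7], [-1, -5, -9], [-1, 0, -10]]

weights_by_b = {}
margins_by_b = {}
for b in [(1, 1, 1, 1), (1, 1, 1, 10)]:
    a = mse_weights(Y, b)
    margins = [sum(w * v for w, v in zip(a, row)) for row in Y]
    weights_by_b[b] = a
    margins_by_b[b] = margins
    print(b, all(m > 0 for m in margins))
# (1, 1, 1, 1) False
# (1, 1, 1, 10) True
```

With equal margins the third sample ends up slightly on the wrong side; asking for a larger margin on the fourth sample only changes the relative sizes of the components of $b$, which is what moves the hyperplane.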
LDF: Gradient Descent for MSE solution
$$J_s(a) = \|Ya - b\|^2$$
We may wish to find the MSE solution by gradient descent, for two reasons:
1. Computing the inverse of $Y^t Y$ may be too costly.
2. $Y^t Y$ may be close to singular if the features are highly correlated (columns of $Y$ are almost linear combinations of each other), in which case computing the inverse of $Y^t Y$ is not numerically stable.
At the beginning of the lecture we computed the gradient:
$$\nabla J_s(a) = 2Y^t(Ya - b)$$
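LDF: Widrow-Hoff Procedure
$$\nabla J_s(a) = 2Y^t(Ya - b)$$
Thus the update rule for gradient descent is
$$a^{(k+1)} = a^{(k)} - \eta^{(k)} Y^t\left(Ya^{(k)} - b\right)$$
If $\eta^{(k)} = \eta^{(1)}/k$, the weight vector $a^{(k)}$ converges to the MSE solution $a$, that is, to $a$ satisfying $Y^t(Ya - b) = 0$.
The Widrow-Hoff procedure reduces storage requirements by considering single samples sequentially:
$$a^{(k+1)} = a^{(k)} - \eta^{(k)}\, y_i \left(y_i^t a^{(k)} - b_i\right)$$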
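The single-sample rule above can be sketched in a few lines of plain Python. The function name `widrow_hoff`, the zero starting vector, and the cyclic order in which samples are visited are assumptions made for this illustration; the slide only specifies the update and the step size $\eta^{(k)} = \eta^{(1)}/k$.

```python
def widrow_hoff(Y, b, eta1=0.01, epochs=1000):
    """Sequential MSE updates a <- a - eta_k * y_i * (y_i^t a - b_i), eta_k = eta1 / k."""
    a = [0.0] * len(Y[0])          # assumed starting point
    k = 1
    for _ in range(epochs):
        for y, target in zip(Y, b):            # visit samples cyclically
            eta = eta1 / k
            error = target - sum(w * v for w, v in zip(a, y))   # b_i - y_i^t a
            a = [w + eta * error * v for w, v in zip(a, y)]
            k += 1
    return a


# Toy system with an exact solution a = (2, 3), chosen so the behaviour is easy to follow.
Y_toy = [[1.0, 0.0], [0.0, 1.0]]
b_toy = [2.0, 3.0]
a_toy = widrow_hoff(Y_toy, b_toy, eta1=1.0, epochs=2000)
print([round(v, 2) for v in a_toy])
```

Only one sample is needed in memory at a time, which is the storage saving the slide refers to. The price is speed: with the $1/k$ step size the second weight in this toy run is still short of 3 after 2000 passes.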
LDF: Ho-Kashyap Procedure
In the MSE procedure, if $b$ is chosen arbitrarily, finding a separating hyperplane is not guaranteed.
Suppose the training samples are linearly separable. Then there exist $a_s$ and a positive $b_s$ such that
$$Ya_s = b_s > 0$$
If we knew $b_s$, we could apply the MSE procedure to find the separating hyperplane.
Idea: find both $a_s$ and $b_s$.
Minimize the following criterion function, restricting to positive $b$:
$$J_{HK}(a, b) = \|Ya - b\|^2$$
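LDF: Ho-Kashyap Procedure
$$J_{HK}(a, b) = \|Ya - b\|^2$$
As usual, take partial derivatives with respect to $a$ and $b$:
$$\nabla_a J_{HK} = 2Y^t(Ya - b) = 0$$
$$\nabla_b J_{HK} = -2(Ya - b) = 0$$
Use a modified gradient descent procedure to find a minimum of $J_{HK}(a, b)$.
Alternate the two steps below until convergence:
1) Fix $b$ and minimize $J_{HK}(a, b)$ with respect to $a$
2) Fix $a$ and minimize $J_{HK}(a, b)$ with respect to $b$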
LDF: Ho-Kashyap Procedure
Step (1) can be performed with the pseudoinverse.
For fixed $b$, the minimum of $J_{HK}(a, b)$ with respect to $a$ is found by solving
$$\nabla_a J_{HK} = 2Y^t(Ya - b) = 0$$
Thus
$$a = (Y^t Y)^{-1} Y^t b$$
LDF: Ho-Kashyap Procedure
Step 2: fix $a$ and minimize $J_{HK}(a, b)$ with respect to $b$.
We can't use $b = Ya$, because $b$ has to be positive.
Solution: use modified gradient descent. Start with a positive $b$ and follow the negative gradient, but refuse to decrease any component of $b$.
This can be achieved by setting all the positive components of $\nabla_b J$ to 0.
We are not doing steepest descent anymore, but we are still doing descent and we ensure that $b$ stays positive.
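LDF: Ho-Kashyap Procedure
The Ho-Kashyap procedure:
0) Start with arbitrary $a^{(1)}$ and $b^{(1)} > 0$, let $k = 1$
repeat steps (1) through (4)
1) $e^{(k)} = Ya^{(k)} - b^{(k)}$
2) Solve for $b^{(k+1)}$ using $a^{(k)}$ and $b^{(k)}$: $b^{(k+1)} = b^{(k)} + \eta\left[e^{(k)} + |e^{(k)}|\right]$
3) Solve for $a^{(k+1)}$ using $b^{(k+1)}$: $a^{(k+1)} = (Y^t Y)^{-1} Y^t b^{(k+1)}$
4) $k = k + 1$
until $e^{(k)} \geq 0$ or $k > k_{max}$ or $b^{(k+1)} = b^{(k)}$
For convergence, the learning rate should be fixed with $0 < \eta < 1$.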
LDF: Ho-Kashyap Procedure
In the linearly separable case:
if $e^{(k)} = 0$, a solution has been found, stop;
if one of the components of $e^{(k)}$ is positive, the algorithm continues.
In the non-separable case, $e^{(k)}$ will eventually have only negative components, which is a proof of nonseparability.
There is no bound on how many iterations are needed for the proof of nonseparability.
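LDF: Ho-Kashyap Procedure Example
Class 1: (6 9), (5 7)
Class 2: (5 9), (0 10)
Matrix
$$Y = \begin{bmatrix} 1 & 6 & 9 \\ 1 & 5 & 7 \\ -1 & -5 & -9 \\ -1 & 0 & -10 \end{bmatrix}$$
Start with $a^{(1)} = [1\ 1\ 1]^t$ and $b^{(1)} = [1\ 1\ 1\ 1]^t$.
Use a fixed learning rate $\eta = 0.9$.
At the start,
$$Ya^{(1)} = \begin{bmatrix} 16 \\ 13 \\ -15 \\ -11 \end{bmatrix}$$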
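LDF: Ho-Kashyap Procedure Example
Iteration 1:
$$e^{(1)} = Ya^{(1)} - b^{(1)} = \begin{bmatrix} 16 \\ 13 \\ -15 \\ -11 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 15 \\ 12 \\ -16 \\ -12 \end{bmatrix}$$
Solve for $b^{(2)}$ using $a^{(1)}$ and $b^{(1)}$:
$$b^{(2)} = b^{(1)} + 0.9\left[e^{(1)} + |e^{(1)}|\right] = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + 0.9\left(\begin{bmatrix} 15 \\ 12 \\ -16 \\ -12 \end{bmatrix} + \begin{bmatrix} 15 \\ 12 \\ 16 \\ 12 \end{bmatrix}\right) = \begin{bmatrix} 28 \\ 22.6 \\ 1 \\ 1 \end{bmatrix}$$
Solve for $a^{(2)}$ using $b^{(2)}$:
$$a^{(2)} = (Y^t Y)^{-1} Y^t b^{(2)} = \begin{bmatrix} -2.6 & 4.7 & 1.6 & -0.5 \\ 0.16 & -0.08 & -0.09 & 0.17 \\ 0.26 & -0.47 & -0.17 & -0.05 \end{bmatrix} \begin{bmatrix} 28 \\ 22.6 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 34.6 \\ 2.7 \\ -3.8 \end{bmatrix}$$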
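LDF: Ho-Kashyap Procedure Example
Continue the iterations until $Ya > 0$.
In practice, continue until the minimum component of $Ya$ exceeds a small positive threshold, such as 0.01.
After 104 iterations the procedure converged to the solution
$$a \approx \begin{bmatrix} -34.9 \\ 27.3 \\ -11.3 \end{bmatrix}, \qquad b \approx \begin{bmatrix} 28 \\ 23 \\ 1 \\ 147 \end{bmatrix}$$
This $a$ does give a separating hyperplane, since every component of $Ya$ is positive:
$$Ya \approx \begin{bmatrix} 27.2 \\ 22.5 \\ 0.1 \\ 148 \end{bmatrix}$$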
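The procedure and the example above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the lecture's own code: the helper names (`solve`, `pseudoinverse`, `matvec`, `ho_kashyap`), the tolerance `tol`, the status strings, and the choice to stop as soon as $Ya > 0$ are all introduced here. The pseudoinverse is formed once, since $Y$ does not change between iterations.

```python
def solve(G, R):
    """Solve G X = R (R may have several columns) by Gauss-Jordan elimination."""
    m = len(G)
    M = [[float(v) for v in G[i]] + [float(v) for v in R[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * q for v, q in zip(M[r], M[col])]
    return [row[m:] for row in M]


def pseudoinverse(Y):
    """Return (Y^t Y)^{-1} Y^t, assuming Y^t Y is nonsingular."""
    n, m = len(Y), len(Y[0])
    G = [[sum(Y[k][i] * Y[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
    Yt = [[Y[k][i] for k in range(n)] for i in range(m)]
    return solve(G, Yt)


def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def ho_kashyap(Y, a, b, eta=0.9, max_updates=10000, tol=1e-9):
    """Alternate the b-update and the pseudoinverse a-update; return (a, b, updates, status)."""
    P = pseudoinverse(Y)
    updates = 0
    while True:
        margins = matvec(Y, a)
        if min(margins) > 0:
            return a, b, updates, "separating"
        e = [m - bi for m, bi in zip(margins, b)]
        if max(e) <= tol:
            # No positive component left while Ya is not positive: evidence of nonseparability.
            return a, b, updates, "nonseparable"
        if updates == max_updates:
            return a, b, updates, "undecided"
        b = [bi + eta * (ei + abs(ei)) for bi, ei in zip(b, e)]   # never decreases b
        a = matvec(P, b)                                          # a = pseudoinverse(Y) b
        updates += 1


Y = [[1, 6, 9], [1, 5, 7], [-1, -5, -9], [-1, 0, -10]]
a, b, updates, status = ho_kashyap(Y, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], eta=0.9)
print(status, updates)
print([round(v, 1) for v in a], [round(v, 1) for v in b])
```

Only the fourth component of $b$ keeps growing in this run; the other components of the error stay negative and are therefore left untouched, which is the "refuse to decrease" rule in action.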
LDF: MSE for Multiple Classes
Suppose we have $m$ classes.
Define $m$ linear discriminant functions
$$g_i(x) = w_i^t x + w_{i0}, \qquad i = 1, \ldots, m$$
Given $x$, assign class $c_i$ if
$$g_i(x) \geq g_j(x) \quad \forall j \neq i$$
Such a classifier is called a linear machine.
A linear machine divides the feature space into $m$ decision regions, with $g_i(x)$ being the largest discriminant if $x$ is in the region $R_i$.
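LDF: MSE for Multiple Classes
For each class $i$, find a weight vector $a_i$ such that
$$a_i^t y = 1 \quad \forall y \in \text{class } i, \qquad a_i^t y = 0 \quad \forall y \notin \text{class } i$$
Let $Y_i$ be the matrix whose rows are the samples from class $i$, so it has $d+1$ columns and $n_i$ rows.
Let's pile all samples in an $n$ by $d+1$ matrix $Y$:
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix} = \begin{bmatrix} \text{sample from class 1} \\ \text{sample from class 1} \\ \vdots \\ \text{sample from class } m \\ \text{sample from class } m \end{bmatrix}$$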
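LDF: MSE for Multiple Classes
Let $b_i$ be a column vector of length $n$ which is 0 everywhere except in the rows corresponding to samples from class $i$, where it is 1:
$$b_i = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad \text{(the 1 entries are the rows corresponding to samples from class } i\text{)}$$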
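LDF: MSE for Multiple Classes
Let's pile all the $b_i$ as columns in an $n$ by $m$ matrix $B$:
$$B = [\,b_1 \ \cdots \ b_m\,]$$
Let's pile all the $a_i$ as columns in a $d+1$ by $m$ matrix $A$:
$$A = [\,a_1 \ \cdots \ a_m\,]$$
The $m$ LSE problems can be represented as $YA = B$.
[Illustration: six samples from 3 classes stacked as the rows of $Y$, the last three of them from class 3. Each row of $B$ contains a single 1, in the column of the class of the corresponding sample, and 0 elsewhere.]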
LDF: MSE for Multiple Classes
Our objective function is
$$J(A) = \sum_{i=1}^{m} \|Ya_i - b_i\|^2$$
$J(A)$ is minimized with the use of the pseudoinverse:
$$A = (Y^t Y)^{-1} Y^t B$$
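A minimal sketch of the multi-class solution in plain Python is given below. The function names, the 0-based class labels, and the toy data are assumptions made for illustration. Note that in the multi-class setting the samples are augmented with a leading 1 but are not sign-normalized.

```python
def solve(G, R):
    """Solve G X = R (R may have several columns) by Gauss-Jordan elimination."""
    m = len(G)
    M = [[float(v) for v in G[i]] + [float(v) for v in R[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Y^t Y is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * q for v, q in zip(M[r], M[col])]
    return [row[m:] for row in M]


def train_linear_machine(X, labels, num_classes):
    """Return A = (Y^t Y)^{-1} Y^t B, one weight column a_i per class."""
    Y = [[1.0] + [float(v) for v in x] for x in X]               # augmented samples
    B = [[1.0 if lab == c else 0.0 for c in range(num_classes)] for lab in labels]
    d1, n = len(Y[0]), len(Y)
    G = [[sum(Y[k][i] * Y[k][j] for k in range(n)) for j in range(d1)] for i in range(d1)]
    R = [[sum(Y[k][i] * B[k][c] for k in range(n)) for c in range(num_classes)]
         for i in range(d1)]
    return solve(G, R)


def classify(A, x):
    """Assign x to the class whose discriminant g_i(x) = a_i^t y is largest."""
    y = [1.0] + [float(v) for v in x]
    scores = [sum(y[i] * A[i][c] for i in range(len(y))) for c in range(len(A[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: one sample per class, so YA = B can be met exactly.
X = [(0, 0), (1, 0), (0, 1)]
labels = [0, 1, 2]
A = train_linear_machine(X, labels, num_classes=3)
print([classify(A, x) for x in X])   # [0, 1, 2]
print(classify(A, (0.9, 0.1)))       # 1
```

Solving for all columns of $A$ in one elimination reflects the fact that the $m$ least squares problems share the same matrix $Y^t Y$ and differ only in their right-hand sides.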
LDF: Summary
Perceptron procedures:
find a separating hyperplane in the linearly separable case;
do not converge in the non-separable case;
can force convergence by using a decreasing learning rate, but are not guaranteed a reasonable stopping point.
MSE procedures:
converge in both the separable and the non-separable case;
may not find a separating hyperplane even if the classes are linearly separable;
use the pseudoinverse if $Y^t Y$ is not singular and not too large;
use gradient descent (Widrow-Hoff procedure) otherwise.
Ho-Kashyap procedures:
always converge;
find a separating hyperplane in the linearly separable case;
are more costly.