CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9


Announcements
- Final project proposal due Nov. 1
  - 1-2 paragraph description
  - Late penalty: 1 point off for each day late
- Assignment 3 due November 10
- Data for final project due Nov. 15
  - Must be ported to Matlab; send me a .mat file with the data and a short description file of what the data is
  - Late penalty is 1 point off for each day late
- Final project progress report
  - Meet with me the week of November 22-26
  - 5 points off if I see that you have done NOTHING yet
- Assignment 4 due December 1
- Final project due December 8

Today
- Linear Discriminant Functions
  - Introduction
  - 2 classes
  - Multiple classes
- Optimization with gradient descent
- Perceptron Criterion Function
  - Batch perceptron rule
  - Single sample perceptron rule

Linear discriminant functions on the Road Map
- No probability distribution (no shape or parameters are known)
- Labeled data
- The shape of discriminant functions is known
- Need to estimate parameters of the discriminant function (parameters of the line in case of a linear discriminant)
[Figure: salmon and bass samples plotted as lightness vs. length with a linear discriminant function; road-map scale running from "a lot is known" to "little is known".]

Linear Discriminant Functions: Basic Idea
- Have samples from 2 classes $x_1, x_2, \dots, x_n$
- Assume the 2 classes can be separated by a linear boundary $l(\theta)$ with some unknown parameters $\theta$
- Fit the "best" boundary to the data by optimizing over the parameters $\theta$
- What is best?
  - Minimize classification error on training data?
  - Does not guarantee small testing error
[Figure: two panels of bass and salmon samples, lightness vs. length; one shows a bad boundary, the other a good boundary.]

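Parametric Methods vs. Discriminant Functions
Parametric methods:
- Assume the shape of the density for each class is known: $p_1(x \mid \theta_1), p_2(x \mid \theta_2), \dots$
- Estimate $\theta_1, \theta_2, \dots$ from data
- Use a Bayesian classifier to find decision regions
Discriminant functions:
- Assume the discriminant functions are of known shape $l(\theta_1), l(\theta_2), \dots$ with parameters $\theta_1, \theta_2, \dots$
- Estimate $\theta_1, \theta_2, \dots$ from data
- Use the discriminant functions for classification
[Figure: decision regions $c_1$, $c_2$, $c_3$ for each approach.]
- In theory, the Bayesian classifier minimizes the risk
  - In practice, we do not have confidence in the assumed model shapes
  - In practice, we do not really need the actual density functions in the end
- Estimating accurate density functions is much harder than estimating accurate discriminant functions
- Some argue that estimating densities should be skipped
  - Why solve a harder problem than needed?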

LDF: Introduction
- Discriminant functions can be more general than linear
- For now, we will study linear discriminant functions
  - Simple model (should try simpler models first)
  - Analytically tractable
- Linear discriminant functions are optimal for Gaussian distributions with equal covariance
- May not be optimal for other data distributions, but they are very simple to use
- Knowledge of class densities is not required when using linear discriminant functions
  - we can say that this is a non-parametric approach

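LDF: 2 Classes
- A discriminant function is linear if it can be written as $g(x) = w^t x + w_0$
- $w$ is called the weight vector and $w_0$ is called the bias or threshold
- Decision rule:
  - $g(x) > 0 \Rightarrow x \in$ class 1
  - $g(x) < 0 \Rightarrow x \in$ class 2
  - $g(x) = 0 \Rightarrow$ either class
[Figure: feature plane ($x^{(1)}$, $x^{(2)}$) split by the decision boundary $g(x) = 0$ into region $R_1$ where $g(x) > 0$ and region $R_2$ where $g(x) < 0$.]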
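The two-class decision rule above can be sketched in a few lines of Python. This is only an illustration: the function names `g` and `classify` and the weights `w`, `w0` are hypothetical choices, not values from the lecture.

```python
def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify(x, w, w0):
    """Return 1 if g(x) > 0, 2 if g(x) < 0, and None on the boundary g(x) = 0."""
    value = g(x, w, w0)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return None  # on the decision boundary either class may be assigned


# Hypothetical weights: the boundary is the line x(1) + x(2) - 3 = 0.
w, w0 = [1.0, 1.0], -3.0

print(classify([2.0, 2.0], w, w0))  # g = 1, positive side
print(classify([1.0, 1.0], w, w0))  # g = -1, negative side
print(classify([1.0, 2.0], w, w0))  # g = 0, on the boundary
```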

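LDF: 2 Classes
- The decision boundary $g(x) = w^t x + w_0 = 0$ is a hyperplane
  - the set of vectors $x$ which, for some scalars $\alpha_0, \dots, \alpha_d$, satisfy $\alpha_0 + \alpha_1 x^{(1)} + \dots + \alpha_d x^{(d)} = 0$
- A hyperplane is
  - a point in 1D
  - a line in 2D
  - a plane in 3D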

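LDF: 2 Classes
- $g(x) = w^t x + w_0$
- $w$ determines the orientation of the decision hyperplane
- $w_0$ determines the location of the decision surface
[Figure: decision boundary $g(x) = 0$ in the ($x^{(1)}$, $x^{(2)}$) plane with normal vector $w$; the distance from a point $x$ to the boundary is $g(x) / \lVert w \rVert$ and the distance from the origin to the boundary is $w_0 / \lVert w \rVert$.]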
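As a small numeric check of the geometry on this slide, the sketch below computes the signed distance $g(x) / \lVert w \rVert$ from a point to the decision hyperplane. The function name `signed_distance` and the weights are hypothetical and chosen only so that $\lVert w \rVert = 5$.

```python
import math


def signed_distance(x, w, w0):
    """Signed distance g(x) / ||w|| from x to the hyperplane w^t x + w0 = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + w0) / norm_w


# Hypothetical weights with ||w|| = 5.
w, w0 = [3.0, 4.0], -5.0

# At the origin the signed distance reduces to w0 / ||w||.
print(signed_distance([0.0, 0.0], w, w0))
# A point on the positive side of the hyperplane.
print(signed_distance([3.0, 4.0], w, w0))
```

The sign tells which side of the hyperplane the point lies on, so the same quantity serves both as a classifier output and as a distance.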


LDF: Many Classes
- Suppose we have $m$ classes
- Define $m$ linear discriminant functions $g_i(x) = w_i^t x + w_{i0}$, $i = 1, \dots, m$
- Given $x$, assign class $c_i$ if $g_i(x) \ge g_j(x)$ for all $j \ne i$
- Such a classifier is called a linear machine
- A linear machine divides the feature space into $m$ decision regions, with $g_i(x)$ being the largest discriminant if $x$ is in the region $R_i$
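A linear machine is just an argmax over the $m$ discriminants. The following is a minimal sketch under assumed values: the function name `linear_machine`, the three weight vectors in `W`, and the biases in `b` are made up for illustration, and classes are indexed from 0.

```python
def linear_machine(x, weights, biases):
    """Return the index i of the largest discriminant g_i(x) = w_i^t x + w_i0."""
    scores = [
        sum(wi * xi for wi, xi in zip(w, x)) + w0
        for w, w0 in zip(weights, biases)
    ]
    # Ties are broken in favour of the lowest class index.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 3-class machine in a 2-D feature space.
W = [[1, 0], [0, 1], [-1, -1]]
b = [0, 0, 0]

print(linear_machine([2, 1], W, b))    # scores are 2, 1, -3
print(linear_machine([1, 3], W, b))    # scores are 1, 3, -4
print(linear_machine([-2, -1], W, b))  # scores are -2, -1, 3
```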


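LDF: Many Classes
- For two contiguous regions $R_i$ and $R_j$, the boundary that separates them is a portion of the hyperplane $H_{ij}$ defined by:
  $g_i(x) = g_j(x) \;\Leftrightarrow\; w_i^t x + w_{i0} = w_j^t x + w_{j0} \;\Leftrightarrow\; (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0$
- Thus $w_i - w_j$ is normal to $H_{ij}$
- And the distance from $x$ to $H_{ij}$ is given by
  $d(x, H_{ij}) = \dfrac{g_i(x) - g_j(x)}{\lVert w_i - w_j \rVert}$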

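LDF: Many Classes
- Decision regions for a linear machine are convex:
  $y, z \in R_i \;\Rightarrow\; \alpha y + (1 - \alpha) z \in R_i$ for $0 \le \alpha \le 1$
- Reason: for all $j \ne i$, $g_i(y) \ge g_j(y)$ and $g_i(z) \ge g_j(z)$ imply
  $g_i(\alpha y + (1 - \alpha) z) \ge g_j(\alpha y + (1 - \alpha) z)$
- In particular, decision regions must be spatially contiguous
[Figure: a single connected region $R_j$ is a valid decision region; a region $R_j$ split into disconnected pieces is not a valid decision region.]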

LDF: Many Classes
- Thus the applicability of a linear machine is mostly limited to unimodal conditional densities $p(x \mid \theta)$
  - even though we did not assume any parametric models
- Example: when non-contiguous decision regions are needed, a linear machine will fail

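LDF: Augmented feature vector
- Linear discriminant function: $g(x) = w^t x + w_0$
- Can rewrite it: $g(x) = \begin{bmatrix} w_0 & w^t \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix} = a^t y = g(y)$
  - $a = [w_0, w^t]^t$ is the new weight vector
  - $y = [1, x^t]^t$ is the new feature vector
- $y$ is called the augmented feature vector
- Added a dummy dimension to get a completely equivalent new homogeneous problem
  - old problem: $g(x) = w^t x + w_0$ with $x = [x_1, \dots, x_d]^t$
  - new problem: $g(y) = a^t y$ with $y = [1, x_1, \dots, x_d]^t$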

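LDF: Augmented feature vector
- Feature augmenting is done for simpler notation
- From now on we always assume that we have augmented feature vectors
  - Given samples $x_1, \dots, x_n$, convert them to augmented samples $y_1, \dots, y_n$ by adding a new dimension of value 1: $y_i = \begin{bmatrix} 1 \\ x_i \end{bmatrix}$
[Figure: ($y^{(1)}$, $y^{(2)}$) plane with boundary $g(y) = 0$, region $R_1$ where $g(y) > 0$, region $R_2$ where $g(y) < 0$, and distance $g(y) / \lVert a \rVert$ from a point to the boundary.]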
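The equivalence of the original and augmented problems can be checked numerically. The sketch below assumes hypothetical helper names (`augment`, `g_original`, `g_augmented`) and made-up weights; it prepends the constant 1 to each sample and packs the bias into the first weight.

```python
def augment(samples):
    """Prepend a constant 1 to every feature vector: y = [1, x(1), ..., x(d)]."""
    return [[1.0] + list(x) for x in samples]


def g_original(x, w, w0):
    """g(x) = w^t x + w0 on the original features."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def g_augmented(y, a):
    """g(y) = a^t y on the augmented features."""
    return sum(ai * yi for ai, yi in zip(a, y))


# Hypothetical weights and sample.
w, w0 = [2.0, -1.0], 0.5
a = [w0] + w          # augmented weight vector [w0, w(1), w(2)]
x = [3.0, 4.0]
y = augment([x])[0]   # [1.0, 3.0, 4.0]

print(g_original(x, w, w0), g_augmented(y, a))  # both evaluate the same discriminant
```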

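LDF: Training Error
- For the rest of the lecture, assume we have 2 classes
- Samples $y_1, \dots, y_n$, some in class 1, some in class 2
- Use these samples to determine the weights $a$ in the discriminant function $g(y) = a^t y$
- What should be our criterion for determining $a$?
  - For now, suppose we want to minimize the training error (that is, the number of misclassified samples $y_1, \dots, y_n$)
- Recall that
  - $g(y_i) > 0 \Rightarrow y_i$ classified $c_1$
  - $g(y_i) < 0 \Rightarrow y_i$ classified $c_2$
- Thus the training error is 0 if
  - $g(y_i) > 0 \;\; \forall y_i \in c_1$
  - $g(y_i) < 0 \;\; \forall y_i \in c_2$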

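LDF: Problem "Normalization"
- Thus the training error is 0 if
  - $a^t y_i > 0 \;\; \forall y_i \in c_1$
  - $a^t y_i < 0 \;\; \forall y_i \in c_2$
- Equivalently, the training error is 0 if
  - $a^t y_i > 0 \;\; \forall y_i \in c_1$
  - $a^t (-y_i) > 0 \;\; \forall y_i \in c_2$
- This suggests problem "normalization":
  1. Replace all examples from class $c_2$ by their negative: $y_i \rightarrow -y_i \;\; \forall y_i \in c_2$
  2. Seek a weight vector $a$ s.t. $a^t y_i > 0 \;\; \forall y_i$
- If such an $a$ exists, it is called a separating or solution vector
- The original samples $x_1, \dots, x_n$ can then indeed be separated by a line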
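Augmentation and "normalization" can be combined in one preprocessing step. The sketch below uses a hypothetical function name, `augment_and_normalize`, and applies it to the five two-feature samples that this lecture uses later in its nonseparable example (class 1: [2,1], [4,3], [3,5]; class 2: [1,3], [5,6]).

```python
def augment_and_normalize(samples, labels):
    """Prepend 1 to each sample, then negate every sample whose label is class 2."""
    result = []
    for x, label in zip(samples, labels):
        y = [1] + list(x)
        if label == 2:
            y = [-v for v in y]
        result.append(y)
    return result


samples = [[2, 1], [4, 3], [3, 5], [1, 3], [5, 6]]
labels = [1, 1, 1, 2, 2]

ys = augment_and_normalize(samples, labels)
for y in ys:
    print(y)
```

After this step a single condition, $a^t y_i > 0$ for every $i$, expresses zero training error for both classes.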

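LDF: Problem "Normalization"
- Before normalization: seek a hyperplane that separates patterns from different categories
- After normalization: seek a hyperplane that puts the normalized patterns on the same (positive) side
[Figure: two panels in the ($y^{(1)}$, $y^{(2)}$) plane, before normalization and after normalization.]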

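LDF: Solution Region
- Find a weight vector $a$ s.t. for all samples $y_1, \dots, y_n$
  $a^t y_i = \sum_{k=0}^{d} a_k y_i^{(k)} > 0$
- In general, there are many such solutions $a$
[Figure: normalized samples in the ($y^{(1)}$, $y^{(2)}$) plane with a solution vector labeled "best".]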

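LDF: Solution Region
- Solution region for $a$: the set of all possible solutions
  - defined in terms of the normal $a$ to the separating hyperplane
[Figure: the solution region shown as a shaded area in the ($y^{(1)}$, $y^{(2)}$) plane.]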

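Optimization
- Need to minimize a function of many variables $J(x) = J(x_1, \dots, x_d)$
- We know how to minimize $J(x)$
  - Take partial derivatives and set them to zero (the gradient):
    $\nabla J(x) = \left[ \dfrac{\partial J(x)}{\partial x_1}, \dots, \dfrac{\partial J(x)}{\partial x_d} \right]^t = 0$
- However, solving analytically is not always easy
  - Would you like to solve this system of nonlinear equations?
    [Slide example: a system of two nonlinear equations combining sin, cos, exponential, and log terms in several variables; the exact expressions are not recoverable from the extracted text.]
  - Sometimes it is not even possible to write down an analytical expression for the derivative; we will see an example later today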

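Optimization: Gradient Descent
- The gradient $\nabla J(x)$ points in the direction of steepest increase of $J(x)$, and $-\nabla J(x)$ in the direction of steepest decrease
[Figure: two panels, "one dimension" (curve $J(x)$ with the slope $dJ/dx$ marked at several points) and "two dimensions".]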

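Optimization: Gradient Descent
[Figure: curve $J(x)$ with iterates $x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(k)}$, steps $s^{(1)}, s^{(2)}$ taken along $-\nabla J(x^{(1)})$, $-\nabla J(x^{(2)})$, and $\nabla J(x^{(k)}) = 0$ at the minimum.]
Gradient Descent for minimizing any function $J(x)$:
  set $k = 1$ and $x^{(1)}$ to some initial guess for the weight vector
  while $\eta^{(k)} \lVert \nabla J(x^{(k)}) \rVert > \varepsilon$
    choose learning rate $\eta^{(k)}$
    $x^{(k+1)} = x^{(k)} - \eta^{(k)} \nabla J(x^{(k)})$   (update rule)
    $k = k + 1$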
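The loop above translates almost line by line into Python. This is a sketch rather than a reference implementation: the function name `gradient_descent`, the `max_iter` safeguard, and the quadratic test function are assumptions added for illustration.

```python
import math


def gradient_descent(grad, x0, eta, eps=1e-6, max_iter=10_000):
    """Minimize J given its gradient `grad`, starting from x0.

    `eta` is a function k -> learning rate eta(k); iteration stops once
    eta(k) * ||grad J(x(k))|| <= eps, or after max_iter steps (a safeguard
    that is not part of the slide's pseudocode).
    """
    x = list(x0)
    for k in range(1, max_iter + 1):
        g = grad(x)
        step = eta(k)
        if step * math.sqrt(sum(gi * gi for gi in g)) <= eps:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]  # update rule
    return x


# Hypothetical example: J(x) = (x1 - 1)^2 + (x2 + 2)^2 has its minimum at (1, -2).
def grad_J(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


x_min = gradient_descent(grad_J, [0.0, 0.0], eta=lambda k: 0.1)
print(x_min)  # close to [1, -2]
```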

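Optimization: Gradient Descent
- Gradient descent is guaranteed to find only a local minimum
[Figure: curve $J(x)$ with iterates $x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(k)}$ converging to a local minimum, with the global minimum elsewhere.]
- Nevertheless, gradient descent is very popular because it is simple and applicable to any function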

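Optimization: Gradient Descent
- Main issue: how to set the parameter $\eta$ (learning rate)
- If $\eta$ is too small, too many iterations are needed
- If $\eta$ is too large, may overshoot the minimum and possibly never find it (if we keep overshooting)
[Figure: two plots of $J(x)$, one with many tiny steps and one where $x^{(1)}, x^{(2)}$ jump across the minimum.]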
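The effect of the learning rate is easy to see on a one-dimensional toy problem. The sketch below assumes $J(x) = x^2$ (so $dJ/dx = 2x$), a choice made here for illustration only; the helper name `run` and the three rates are hypothetical. With this $J$, each step multiplies $x$ by $(1 - 2\eta)$.

```python
def run(eta, x=1.0, steps=20):
    """Apply `steps` gradient descent updates to J(x) = x^2 with a fixed rate eta."""
    for _ in range(steps):
        x = x - eta * 2.0 * x  # x(k+1) = x(k) - eta * dJ/dx
    return x


print(run(0.01))  # too small: still far from the minimum at 0 after 20 steps
print(run(0.5))   # well chosen for this J: lands on the minimum
print(run(1.1))   # too large: every step overshoots and |x| grows
```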

Today
- Continue Linear Discriminant Functions
  - Perceptron Criterion Function
  - Batch perceptron rule
  - Single sample perceptron rule

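LDF: Augmented feature vector
- Linear discriminant function: $g(x) = w^t x + w_0$
  - need to estimate the parameters $w$ and $w_0$ from data
- Augment samples $x$ to get an equivalent homogeneous problem in terms of samples $y$:
  $g(x) = \begin{bmatrix} w_0 & w^t \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix} = a^t y = g(y)$
- "Normalize" by replacing all examples from class $c_2$ by their negative: $y_i \rightarrow -y_i \;\; \forall y_i \in c_2$
[Figure: bass and salmon samples, lightness vs. length.]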

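LDF
- Augmented and "normalized" samples $y_1, \dots, y_n$
- Seek a weight vector $a$ s.t. $a^t y_i > 0 \;\; \forall y_i$
[Figure: two panels in the ($y^{(1)}$, $y^{(2)}$) plane, before normalization and after normalization.]
- If such an $a$ exists, it is called a separating or solution vector
  - the original samples $x_1, \dots, x_n$ can then indeed be separated by a line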

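Optimization: Gradient Descent
[Figure: curve $J(x)$ with iterates $x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(k)}$, steps $s^{(1)}, s^{(2)}$, and $\nabla J(x^{(k)}) = 0$ at the minimum.]
Gradient Descent for minimizing any function $J(x)$:
  set $k = 1$ and $x^{(1)}$ to some initial guess for the weight vector
  while $\eta^{(k)} \lVert \nabla J(x^{(k)}) \rVert > \varepsilon$
    choose learning rate $\eta^{(k)}$
    $x^{(k+1)} = x^{(k)} - \eta^{(k)} \nabla J(x^{(k)})$   (update rule)
    $k = k + 1$
The step taken at iteration $k$ is $s^{(k)} = x^{(k+1)} - x^{(k)} = \eta^{(k)} \left( -\nabla J(x^{(k)}) \right)$.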

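LDF: Criterion Function
- Find a weight vector $a$ s.t. for all samples $y_1, \dots, y_n$
  $a^t y_i = \sum_{k=0}^{d} a_k y_i^{(k)} > 0$
- Need a criterion function $J(a)$ which is minimized when $a$ is a solution vector
- Let $Y_M$ be the set of examples misclassified by $a$:
  $Y_M(a) = \{ \text{sample } y_i \text{ s.t. } a^t y_i < 0 \}$
- First natural choice: the number of misclassified examples
  $J(a) = \lvert Y_M(a) \rvert$
  - piecewise constant, so gradient descent is useless
[Figure: plot of the piecewise constant $J(a)$.]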

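LDF: Perceptron Criterion Function
- Better choice: the Perceptron criterion function
  $J_p(a) = \sum_{y \in Y_M} \left( -a^t y \right)$
- If $y$ is misclassified, $a^t y \le 0$
- Thus $J_p(a) \ge 0$
- $J_p(a)$ is $\lVert a \rVert$ times the sum of distances of the misclassified examples to the decision boundary (the distance of $y$ to the boundary is $\lvert a^t y \rvert / \lVert a \rVert$)
- $J_p(a)$ is piecewise linear and thus suitable for gradient descent
[Figure: plot of the piecewise linear $J_p(a)$.]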
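Both criterion functions can be evaluated directly. The sketch below uses hypothetical function names and the five augmented, normalized samples from the nonseparable example later in this lecture. It treats $a^t y \le 0$ as misclassified, which is an assumption taken from the slide's remark that a misclassified $y$ has $a^t y \le 0$.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def misclassified(a, samples):
    """Y_M: normalized samples that are not strictly on the positive side."""
    return [y for y in samples if dot(a, y) <= 0]


def count_criterion(a, samples):
    """J(a) = number of misclassified samples (piecewise constant in a)."""
    return len(misclassified(a, samples))


def perceptron_criterion(a, samples):
    """J_p(a) = sum over Y_M of (-a^t y) (piecewise linear in a)."""
    return sum(-dot(a, y) for y in misclassified(a, samples))


samples = [[1, 2, 1], [1, 4, 3], [1, 3, 5], [-1, -1, -3], [-1, -5, -6]]
a = [1, 1, 1]

print(count_criterion(a, samples))       # two samples have a^t y <= 0
print(perceptron_criterion(a, samples))  # 5 + 12
```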

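LDF: Perceptron Batch Rule
- $J_p(a) = \sum_{y \in Y_M} \left( -a^t y \right)$
- The gradient of $J_p(a)$ is $\nabla J_p(a) = \sum_{y \in Y_M} (-y)$
  - $Y_M$ are the samples misclassified by $a^{(k)}$
  - It is not possible to solve $\nabla J_p(a) = 0$ analytically because of $Y_M$
- Update rule for gradient descent: $x^{(k+1)} = x^{(k)} - \eta^{(k)} \nabla J(x^{(k)})$
- Thus the gradient descent batch update rule for $J_p(a)$ is:
  $a^{(k+1)} = a^{(k)} + \eta^{(k)} \sum_{y \in Y_M} y$
- It is called the batch rule because it is based on all misclassified examples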
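A minimal sketch of the batch rule follows. The function name `batch_perceptron`, the `max_iter` safeguard, and the choice to treat $a^t y \le 0$ as misclassified are assumptions. The data are the four augmented, normalized student samples (Jane, Steve, Mary, Peter) from the example later in this lecture, with the same equal initial weights of 0.25 and a fixed rate of 1.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def batch_perceptron(samples, a, eta=1.0, max_iter=1000):
    """Repeat a <- a + eta * (sum of all misclassified samples) until none remain.

    Returns the solution vector, or None if max_iter is reached first.
    """
    a = list(a)
    for _ in range(max_iter):
        wrong = [y for y in samples if dot(a, y) <= 0]
        if not wrong:
            return a
        total = [sum(column) for column in zip(*wrong)]  # sum of misclassified y
        a = [ai + eta * ti for ai, ti in zip(a, total)]
    return None


# Augmented, normalized samples: Jane, Steve, Mary, Peter.
samples = [
    [1, 1, 1, -1, -1],
    [-1, -1, -1, -1, -1],
    [-1, 1, 1, 1, -1],
    [1, 1, -1, -1, 1],
]

solution = batch_perceptron(samples, [0.25] * 5)
print(solution)
print([dot(solution, y) for y in samples])  # all strictly positive
```

The batch rule reaches a different solution vector than the single sample rule on the same data, which is expected because the solution region contains many vectors.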

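LDF: Perceptron Single Sample Rule
- Thus the gradient descent single sample rule for $J_p(a)$ is:
  $a^{(k+1)} = a^{(k)} + \eta^{(k)} y_M$
  - note that $y_M$ is one sample misclassified by $a^{(k)}$
  - must have a consistent way of visiting samples
- Geometric interpretation:
  - $y_M$ is misclassified by $a^{(k)}$: $\left( a^{(k)} \right)^t y_M \le 0$
  - $y_M$ is on the wrong side of the decision hyperplane
  - adding $\eta y_M$ to $a$ moves the new decision hyperplane in the right direction with respect to $y_M$
[Figure: vectors $a^{(k)}$, $\eta y_M$, and $a^{(k+1)}$ with the sample $y_M$.]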
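A short worked step, added here as a sketch, makes the geometric claim concrete. With $a$ the weight vector and $y_M$ the misclassified sample as above, the discriminant value at $y_M$ after one update is:

```latex
\begin{align*}
\left(a^{(k+1)}\right)^t y_M
  &= \left(a^{(k)} + \eta^{(k)} y_M\right)^t y_M \\
  &= \left(a^{(k)}\right)^t y_M + \eta^{(k)} \lVert y_M \rVert^2 .
\end{align*}
```

Since $\eta^{(k)} \lVert y_M \rVert^2 > 0$, every update on $y_M$ increases $a^t y_M$, although a single step with a small $\eta^{(k)}$ need not make it positive.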

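LDF: Perceptron Single Sample Rule
$a^{(k+1)} = a^{(k)} + \eta^{(k)} y_M$
- $\eta$ is too large: a previously correctly classified sample $y_k$ is now misclassified
- $\eta$ is too small: $y_M$ is still misclassified
[Figure: two panels showing $a^{(k)}$, $a^{(k+1)}$, $y_M$, and $y_k$ for a learning rate that is too large and one that is too small.]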

LDF: Perceptron Example

name    good attendance?  tall?     sleeps in class?  chews gum?  grade
Jane    yes (1)           yes (1)   no (-1)           no (-1)     A
Steve   yes (1)           yes (1)   yes (1)           yes (1)     F
Mary    no (-1)           no (-1)   no (-1)           yes (1)     F
Peter   yes (1)           no (-1)   no (-1)           yes (1)     A

- class 1: students who get grade A
- class 2: students who get grade F

LDF Example: Augment feature vector

name    extra  good attendance?  tall?     sleeps in class?  chews gum?  grade
Jane    1      yes (1)           yes (1)   no (-1)           no (-1)     A
Steve   1      yes (1)           yes (1)   yes (1)           yes (1)     F
Mary    1      no (-1)           no (-1)   no (-1)           yes (1)     F
Peter   1      yes (1)           no (-1)   no (-1)           yes (1)     A

- convert samples $x_1, \dots, x_n$ to augmented samples $y_1, \dots, y_n$ by adding a new dimension of value 1

LDF: Perform "Normalization"

name    extra  good attendance?  tall?      sleeps in class?  chews gum?  grade
Jane    1      yes (1)           yes (1)    no (-1)           no (-1)     A
Steve   -1     yes (-1)          yes (-1)   yes (-1)          yes (-1)    F
Mary    -1     no (1)            no (1)     no (1)            yes (-1)    F
Peter   1      yes (1)           no (-1)    no (-1)           yes (1)     A

- Replace all examples from class $c_2$ by their negative: $y_i \rightarrow -y_i \;\; \forall y_i \in c_2$
- Seek a weight vector $a$ s.t. $a^t y_i > 0 \;\; \forall y_i$

LDF: Use Single Sample Rule
- Samples: the four augmented and normalized feature vectors for Jane, Steve, Mary, and Peter (extra feature first, class $c_2$ rows negated)
- Sample $y_i$ is misclassified if $a^t y_i = \sum_{k=0}^{4} a_k y_i^{(k)} < 0$
- Gradient descent single sample rule: $a^{(k+1)} = a^{(k)} + \eta^{(k)} y_M$
- Set a fixed learning rate $\eta^{(k)} = 1$: $a^{(k+1)} = a^{(k)} + y_M$

LDF: Gradient Descent Example
- set equal initial weights $a^{(1)} = [0.25, 0.25, 0.25, 0.25, 0.25]$
- visit all samples sequentially, modifying the weights after finding a misclassified example

name    $a^t y$                                                        misclassified?
Jane    0.25*1 + 0.25*1 + 0.25*1 + 0.25*(-1) + 0.25*(-1) > 0           no
Steve   0.25*(-1) + 0.25*(-1) + 0.25*(-1) + 0.25*(-1) + 0.25*(-1) < 0  yes

- new weights:
  $a^{(2)} = a^{(1)} + y_M = [0.25, 0.25, 0.25, 0.25, 0.25] + [-1, -1, -1, -1, -1] = [-0.75, -0.75, -0.75, -0.75, -0.75]$

LDF: Gradient Descent Example
$a^{(2)} = [-0.75, -0.75, -0.75, -0.75, -0.75]$

name    $a^t y$                                                         misclassified?
Mary    -0.75*(-1) - 0.75*1 - 0.75*1 - 0.75*1 - 0.75*(-1) < 0           yes

- new weights:
  $a^{(3)} = a^{(2)} + y_M = [-0.75, -0.75, -0.75, -0.75, -0.75] + [-1, 1, 1, 1, -1] = [-1.75, 0.25, 0.25, 0.25, -1.75]$

LDF: Gradient Descent Example
$a^{(3)} = [-1.75, 0.25, 0.25, 0.25, -1.75]$

name    $a^t y$                                                         misclassified?
Peter   -1.75*1 + 0.25*1 + 0.25*(-1) + 0.25*(-1) - 1.75*1 < 0           yes

- new weights:
  $a^{(4)} = a^{(3)} + y_M = [-1.75, 0.25, 0.25, 0.25, -1.75] + [1, 1, -1, -1, 1] = [-0.75, 1.25, -0.75, -0.75, -0.75]$

LDF: Gradient Descent Example
$a^{(4)} = [-0.75, 1.25, -0.75, -0.75, -0.75]$

name    $a^t y$                                                              misclassified?
Jane    -0.75*1 + 1.25*1 - 0.75*1 - 0.75*(-1) - 0.75*(-1) > 0                no
Steve   -0.75*(-1) + 1.25*(-1) - 0.75*(-1) - 0.75*(-1) - 0.75*(-1) > 0       no
Mary    -0.75*(-1) + 1.25*1 - 0.75*1 - 0.75*1 - 0.75*(-1) > 0                no
Peter   -0.75*1 + 1.25*1 - 0.75*(-1) - 0.75*(-1) - 0.75*1 > 0                no

- Thus the discriminant function is
  $g(y) = -0.75\, y^{(0)} + 1.25\, y^{(1)} - 0.75\, y^{(2)} - 0.75\, y^{(3)} - 0.75\, y^{(4)}$
- Converting back to the original features $x$:
  $g(x) = 1.25\, x^{(1)} - 0.75\, x^{(2)} - 0.75\, x^{(3)} - 0.75\, x^{(4)} - 0.75$

LDF: Gradient Descent Example
- Converting back to the original features $x$ (with $x^{(1)}$ = good attendance, $x^{(2)}$ = tall, $x^{(3)}$ = sleeps in class, $x^{(4)}$ = chews gum):
  - $1.25\, x^{(1)} - 0.75\, x^{(2)} - 0.75\, x^{(3)} - 0.75\, x^{(4)} > 0.75 \Rightarrow$ grade A
  - $1.25\, x^{(1)} - 0.75\, x^{(2)} - 0.75\, x^{(3)} - 0.75\, x^{(4)} < 0.75 \Rightarrow$ grade F
- This is just one possible solution vector
- If we started with weights $a^{(1)} = [0, 0.5, 0.5, 0, 0]$, the solution would be $[-1, 1.5, -0.5, -1, -1]$:
  - $1.5\, x^{(1)} - 0.5\, x^{(2)} - x^{(3)} - x^{(4)} > 1 \Rightarrow$ grade A
  - $1.5\, x^{(1)} - 0.5\, x^{(2)} - x^{(3)} - x^{(4)} < 1 \Rightarrow$ grade F
- In this solution, being tall is the least important feature
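The worked example can be replayed in code. The sketch below is one possible implementation of the single sample rule with fixed rate 1 and sequential visiting; the function name `single_sample_perceptron` and the `max_epochs` safeguard are assumptions. It treats $a^t y \le 0$ as misclassified, which is the convention that reproduces both solution vectors quoted above.

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def single_sample_perceptron(samples, a, eta=1.0, max_epochs=100):
    """Visit samples in order; on each misclassified y apply a <- a + eta * y.

    Stops after a full pass with no update and returns the solution vector,
    or returns None if max_epochs passes are not enough.
    """
    a = list(a)
    for _ in range(max_epochs):
        updated = False
        for y in samples:
            if dot(a, y) <= 0:
                a = [ai + eta * yi for ai, yi in zip(a, y)]
                updated = True
        if not updated:
            return a
    return None


# Augmented, normalized samples: Jane, Steve, Mary, Peter.
samples = [
    [1, 1, 1, -1, -1],
    [-1, -1, -1, -1, -1],
    [-1, 1, 1, 1, -1],
    [1, 1, -1, -1, 1],
]

first = single_sample_perceptron(samples, [0.25] * 5)
second = single_sample_perceptron(samples, [0, 0.5, 0.5, 0, 0])
print(first)
print(second)
```

Different starting weights lead to different vectors inside the solution region, which is why the two runs disagree on how much each feature matters.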

LDF: Nonseparable Example
- Suppose we have 2 features and the samples are:
  - Class 1: [2,1], [4,3], [3,5]
  - Class 2: [1,3] and [5,6]
- These samples are not separable by a line
- Still would like to get an approximate separation by a line; a good choice is shown in green in the figure
  - some samples may be "noisy", and it is ok if they are on the wrong side of the line
- Get $y_1, \dots, y_5$ by adding the extra feature and "normalizing":
  $y_1 = [1, 2, 1]^t$, $y_2 = [1, 4, 3]^t$, $y_3 = [1, 3, 5]^t$, $y_4 = [-1, -1, -3]^t$, $y_5 = [-1, -5, -6]^t$

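LDF: Nonseparable Example
- Let's apply the Perceptron single sample algorithm
- initial equal weights $a^{(1)} = [1, 1, 1]$
  - this is the line $x^{(1)} + x^{(2)} + 1 = 0$
- fixed learning rate $\eta = 1$: $a^{(k+1)} = a^{(k)} + y_M$
- Samples: $y_1 = [1, 2, 1]^t$, $y_2 = [1, 4, 3]^t$, $y_3 = [1, 3, 5]^t$, $y_4 = [-1, -1, -3]^t$, $y_5 = [-1, -5, -6]^t$
- $y_1^t a^{(1)} = [1, 1, 1] \cdot [1, 2, 1]^t > 0$
- $y_2^t a^{(1)} = [1, 1, 1] \cdot [1, 4, 3]^t > 0$
- $y_3^t a^{(1)} = [1, 1, 1] \cdot [1, 3, 5]^t > 0$
[Figure: samples and the line for $a^{(1)}$.]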

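LDF: Nonseparable Example
- $a^{(1)} = [1, 1, 1]$, update rule $a^{(k+1)} = a^{(k)} + y_M$
- Samples: $y_1 = [1, 2, 1]^t$, $y_2 = [1, 4, 3]^t$, $y_3 = [1, 3, 5]^t$, $y_4 = [-1, -1, -3]^t$, $y_5 = [-1, -5, -6]^t$
- $y_4^t a^{(1)} = [1, 1, 1] \cdot [-1, -1, -3]^t = -5 < 0$
  $a^{(2)} = a^{(1)} + y_M = [1, 1, 1] + [-1, -1, -3] = [0, 0, -2]$
- $y_5^t a^{(2)} = [0, 0, -2] \cdot [-1, -5, -6]^t = 12 > 0$
- $y_1^t a^{(2)} = [0, 0, -2] \cdot [1, 2, 1]^t < 0$
  $a^{(3)} = a^{(2)} + y_M = [0, 0, -2] + [1, 2, 1] = [1, 2, -1]$
[Figure: samples with the lines for $a^{(1)}$ and $a^{(2)}$.]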

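LDF: Nonseparable Example
- $a^{(3)} = [1, 2, -1]$, update rule $a^{(k+1)} = a^{(k)} + y_M$
- Samples: $y_1 = [1, 2, 1]^t$, $y_2 = [1, 4, 3]^t$, $y_3 = [1, 3, 5]^t$, $y_4 = [-1, -1, -3]^t$, $y_5 = [-1, -5, -6]^t$
- $y_2^t a^{(3)} = [1, 4, 3] \cdot [1, 2, -1]^t = 6 > 0$
- $y_3^t a^{(3)} = [1, 3, 5] \cdot [1, 2, -1]^t > 0$
- $y_4^t a^{(3)} = [-1, -1, -3] \cdot [1, 2, -1]^t = 0$
  $a^{(4)} = a^{(3)} + y_M = [1, 2, -1] + [-1, -1, -3] = [0, 1, -4]$
[Figure: samples with the lines for $a^{(2)}$ and $a^{(3)}$.]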

LDF: Nonseparable Example
- The update from the previous slide, $a^{(4)} = a^{(3)} + y_M = [1, 2, -1] + [-1, -1, -3] = [0, 1, -4]$, gives the new weights $a^{(4)} = [0, 1, -4]$
[Figure: samples with the lines for $a^{(3)}$ and $a^{(4)}$.]

LDF: Nonseparable Example
- we can continue this forever
  - there is no solution vector $a$ satisfying, for all $y_i$, $i = 1, \dots, 5$: $a^t y_i = \sum_{k=0}^{2} a_k y_i^{(k)} > 0$
- need to stop, but at a good point
[Figure: solutions at iterations 900 through 915. Some are good, some are not.]
- How do we stop at a good solution?
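The first updates of this example, and the fact that the rule never stops, can be reproduced with a short sketch. The function name `perceptron_updates` is hypothetical; samples are visited cyclically in the order $y_1, \dots, y_5$, the rate is fixed at 1, and $a^t y \le 0$ counts as misclassified (the step from $a^{(3)}$ to $a^{(4)}$ above is triggered by $a^t y_4 = 0$).

```python
def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def perceptron_updates(samples, a, num_updates, eta=1):
    """Return [a(1), a(2), ...] for up to num_updates single sample updates.

    Stops early only if a full pass over the samples finds no misclassified one.
    """
    history = [list(a)]
    clean_streak = 0  # consecutive correctly classified samples
    i = 0
    while len(history) <= num_updates and clean_streak < len(samples):
        y = samples[i % len(samples)]
        if dot(a, y) <= 0:
            a = [ai + eta * yi for ai, yi in zip(a, y)]
            history.append(a)
            clean_streak = 0
        else:
            clean_streak += 1
        i += 1
    return history


samples = [[1, 2, 1], [1, 4, 3], [1, 3, 5], [-1, -1, -3], [-1, -5, -6]]

for weights in perceptron_updates(samples, [1, 1, 1], num_updates=3):
    print(weights)  # a(1), a(2), a(3), a(4)

# The classes are not linearly separable, so the update budget is always used up.
print(len(perceptron_updates(samples, [1, 1, 1], num_updates=1000)) - 1)
```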

LDF: Convergence of Perceptron rules
- If the classes are linearly separable and we use a fixed learning rate, that is $\eta^{(k)} = c$ for some constant $c$:
  - both the single sample and batch perceptron rules converge to a correct solution (could be any $a$ in the solution space)
- If the classes are not linearly separable:
  - the algorithm does not stop; it keeps looking for a solution which does not exist
  - by choosing an appropriate learning rate, can always ensure convergence: $\eta^{(k)} \rightarrow 0$ as $k \rightarrow \infty$
  - for example the inverse linear learning rate: $\eta^{(k)} = \dfrac{\eta^{(1)}}{k}$
  - for the inverse linear learning rate, convergence in the linearly separable case can also be proven
  - no guarantee that we stopped at a good point, but there are good reasons to choose the inverse linear learning rate
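The inverse linear learning rate can be sketched on the same nonseparable samples. In this illustration the function name `perceptron_inverse_rate` is hypothetical, and $k$ is taken to count updates (one possible reading of the slide's iteration index). The code only demonstrates that the size of each weight change shrinks as $k$ grows; it makes no claim about the quality of the final weights.

```python
import math


def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))


def perceptron_inverse_rate(samples, a, eta1=1.0, num_updates=200):
    """Single sample rule with eta(k) = eta1 / k, where k counts updates.

    Returns the final weights and the list of step lengths ||eta(k) * y_M||.
    """
    a = list(a)
    step_lengths = []
    clean_streak = 0
    i = 0
    k = 1
    while k <= num_updates and clean_streak < len(samples):
        y = samples[i % len(samples)]
        if dot(a, y) <= 0:
            eta = eta1 / k
            a = [ai + eta * yi for ai, yi in zip(a, y)]
            step_lengths.append(eta * math.sqrt(dot(y, y)))
            k += 1
            clean_streak = 0
        else:
            clean_streak += 1
        i += 1
    return a, step_lengths


samples = [[1, 2, 1], [1, 4, 3], [1, 3, 5], [-1, -1, -3], [-1, -5, -6]]
weights, steps = perceptron_inverse_rate(samples, [1, 1, 1])

print(steps[0], steps[-1])  # the last change is far smaller than the first
```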

LDF: Perceptron Rule and Gradient Descent
- Linearly separable data
  - the perceptron rule with gradient descent works well
- Linearly non-separable data
  - need to stop the perceptron rule algorithm at a good point; this may be tricky
- Batch Rule
  - Smoother gradient because all samples are used
- Single Sample Rule
  - easier to analyze
  - Concentrates more than necessary on any isolated "noisy" training examples