New Schedule. Dec. 8 same same same Oct. 21. ^2 weeks ^1 week ^1 week. Pattern Recognition for Vision


9.913 Pattern Recognition for Vision: Classification. Bernd Heisele, Fall 2004

Overview: Introduction; Linear Discriminant Analysis; Support Vector Machines; Literature & Homework.

Introduction. Classification: linear and non-linear separation; two-class and multi-class problems. Two approaches: (1) density estimation, then classify with the Bayes decision rule: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA); (2) without density estimation: Support Vector Machines (SVM).

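LDA: Bayes Rule. $p(x,\omega) = p(x \mid \omega)\,P(\omega) = P(\omega \mid x)\,p(x)$, hence
$$P(\omega \mid x) = \frac{p(x \mid \omega)\,P(\omega)}{p(x)}, \qquad \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}},$$
where $x$ is a random vector and $\omega$ is the class.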

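LDA: Bayes Decision Rule. Decide $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$; otherwise decide $\omega_2$. Likelihood ratio form: decide $\omega_1$ if $p(x \mid \omega_1)\,P(\omega_1) > p(x \mid \omega_2)\,P(\omega_2)$, i.e. $\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}$. Log likelihood ratio form: decide $\omega_1$ if
$$\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)} > 0.$$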
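The slide states the rule for arbitrary densities. The following minimal sketch assumes one-dimensional Gaussian class-conditional densities; the means, variances, and priors are illustrative values, not taken from the lecture. The sketch shows how the prior term shifts the decision boundary.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log of the 1-D Gaussian density N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def bayes_decide(x, mean1, var1, prior1, mean2, var2, prior2):
    """Decide class 1 if ln(p(x|w1)/p(x|w2)) + ln(P(w1)/P(w2)) > 0, else class 2."""
    log_likelihood_ratio = gaussian_log_pdf(x, mean1, var1) - gaussian_log_pdf(x, mean2, var2)
    log_prior_ratio = math.log(prior1 / prior2)
    return 1 if log_likelihood_ratio + log_prior_ratio > 0 else 2


# Equal priors and variances: the boundary lies halfway between the means, at x = 1.
print(bayes_decide(0.9, 0.0, 1.0, 0.5, 2.0, 1.0, 0.5))  # prints 1
print(bayes_decide(1.1, 0.0, 1.0, 0.5, 2.0, 1.0, 0.5))  # prints 2
# A larger prior for class 1 moves the boundary toward the mean of class 2.
print(bayes_decide(1.1, 0.0, 1.0, 0.9, 2.0, 1.0, 0.1))  # prints 1
```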

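LDA: Two Classes, Identical Covariance. Gaussian class densities:
$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}}\exp\!\left(-\tfrac12 (x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)\right).$$
Assume identical covariance matrices $\Sigma_1 = \Sigma_2 = \Sigma$. Then
$$\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)} = -\tfrac12(x-\mu_1)^T\Sigma^{-1}(x-\mu_1) + \tfrac12(x-\mu_2)^T\Sigma^{-1}(x-\mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
$$= x^T\Sigma^{-1}(\mu_1-\mu_2) - \tfrac12(\mu_1+\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)} = x^T w + b.$$
The quadratic terms $x^T\Sigma^{-1}x$ cancel, which leaves a linear decision function: decide $\omega_1$ if $x^T w + b > 0$. Here $w = \Sigma^{-1}(\mu_1-\mu_2)$ and $b = -\tfrac12(\mu_1+\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)}$.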
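The algebra can be checked numerically. The sketch below evaluates the log posterior ratio directly from two 2-D Gaussians that share one covariance matrix, then compares the result with $x^T w + b$. The means, covariance, priors, and test point are arbitrary illustrative values. The 2×2 matrix helpers are written by hand so that only the standard library is needed.

```python
import math


def inv2(S):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def log_gauss(x, mu, S):
    """Log density of the 2-D Gaussian N(mu, S) at x."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    maha = dot(diff, matvec(inv2(S), diff))
    log_det = math.log(S[0][0] * S[1][1] - S[0][1] * S[1][0])
    return -math.log(2 * math.pi) - 0.5 * log_det - 0.5 * maha


def lda_w_b(mu1, mu2, S, p1, p2):
    """w = S^-1 (mu1 - mu2), b = -1/2 (mu1 + mu2)^T S^-1 (mu1 - mu2) + ln(p1 / p2)."""
    w = matvec(inv2(S), [mu1[0] - mu2[0], mu1[1] - mu2[1]])
    b = -0.5 * dot([mu1[0] + mu2[0], mu1[1] + mu2[1]], w) + math.log(p1 / p2)
    return w, b


mu1, mu2 = [1.0, 2.0], [-1.0, 0.5]
S = [[2.0, 0.3], [0.3, 1.0]]  # shared covariance (illustrative)
p1, p2 = 0.3, 0.7
x = [0.4, -0.2]

direct = log_gauss(x, mu1, S) - log_gauss(x, mu2, S) + math.log(p1 / p2)
w, b = lda_w_b(mu1, mu2, S, p1, p2)
print(abs(direct - (dot(x, w) + b)) < 1e-9)  # prints True
```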

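LDA: Two Classes, Identical Covariance (whitening). Decompose $\Sigma = UDU^T$, so that $\Sigma^{-1} = UD^{-1}U^T = GG^T$ with $G = UD^{-1/2}$. Transform $x' = G^T x$ and $\mu_i' = G^T\mu_i$, i.e. rotate by $U^T$ and then scale by $D^{-1/2}$. This gives $\mu_1' - \mu_2' = G^T(\mu_1 - \mu_2)$ and
$$f(x) = x^T\Sigma^{-1}(\mu_1-\mu_2) - \tfrac12(\mu_1+\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)} = x'^T(\mu_1'-\mu_2') - \tfrac12(\mu_1'+\mu_2')^T(\mu_1'-\mu_2') + \ln\frac{P(\omega_1)}{P(\omega_2)}.$$
The decision boundary is $f(x) = 0$. In the transformed space with equal priors, LDA is a nearest-neighbor classifier with respect to the class means. The reason is that $f(x) = \tfrac12\|x'-\mu_2'\|^2 - \tfrac12\|x'-\mu_1'\|^2$, so $f(x) > 0$ exactly when $x'$ is closer to $\mu_1'$.
[Figure: rotation by $U^T$ and scaling by $D^{-1/2}$ map the data X to X'.]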
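The whitening step can be sketched for a 2×2 covariance. For a symmetric 2×2 matrix the eigenvectors come from a single closed-form rotation angle (a Jacobi rotation). This shortcut is a 2-D assumption; in higher dimensions a numerical eigensolver would be used. The sketch checks two things: first, that $GG^T = \Sigma^{-1}$; second, that with equal priors $f(x)$ equals the difference of half squared distances to the whitened means. All numbers are illustrative.

```python
import math


def whitening_matrix(S):
    """G = U D^(-1/2) for a symmetric positive definite 2x2 matrix S,
    so that G G^T = S^-1 (columns of U: eigenvectors, D: eigenvalues)."""
    (a, b), (_, c) = S
    theta = 0.5 * math.atan2(2 * b, a - c)  # rotation angle that diagonalizes S
    cs, sn = math.cos(theta), math.sin(theta)
    d1 = a * cs * cs + 2 * b * sn * cs + c * sn * sn  # eigenvalue for column (cs, sn)
    d2 = a * sn * sn - 2 * b * sn * cs + c * cs * cs  # eigenvalue for column (-sn, cs)
    U = [[cs, -sn], [sn, cs]]
    return [[U[i][0] / math.sqrt(d1), U[i][1] / math.sqrt(d2)] for i in range(2)]


def apply_Gt(G, v):
    """x' = G^T x."""
    return [G[0][0] * v[0] + G[1][0] * v[1], G[0][1] * v[0] + G[1][1] * v[1]]


def sqdist(u, v):
    return (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2


S = [[2.0, 0.3], [0.3, 1.0]]
mu1, mu2, x = [1.0, 2.0], [-1.0, 0.5], [0.4, -0.2]

G = whitening_matrix(S)
GGt = [[sum(G[i][k] * G[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

# Original space, equal priors: f(x) = (x - (mu1 + mu2)/2)^T S^-1 (mu1 - mu2).
dm = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
w = [S_inv[0][0] * dm[0] + S_inv[0][1] * dm[1], S_inv[1][0] * dm[0] + S_inv[1][1] * dm[1]]
f_original = sum((x[k] - 0.5 * (mu1[k] + mu2[k])) * w[k] for k in range(2))

# Whitened space: f(x) = 1/2 ||x' - mu2'||^2 - 1/2 ||x' - mu1'||^2,
# so f(x) > 0 exactly when x' is closer to mu1' (nearest class mean).
xw, m1w, m2w = (apply_Gt(G, v) for v in (x, mu1, mu2))
f_whitened = 0.5 * sqdist(xw, m2w) - 0.5 * sqdist(xw, m1w)
```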

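LDA: Computation. Approximate the Gaussians by density estimation from the training data, where class $\omega_i$ has $N_i$ samples $x_{i,j}$:
$$\hat\mu_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{i,j}, \qquad \hat\Sigma = \frac{1}{N_1+N_2}\sum_{i=1}^{2}\sum_{j=1}^{N_i}(x_{i,j}-\hat\mu_i)(x_{i,j}-\hat\mu_i)^T.$$
Classify with $f(x) = \mathrm{sign}(x^T\hat w + \hat b)$, where $\hat w = \hat\Sigma^{-1}(\hat\mu_1-\hat\mu_2)$ and $\hat b = -\tfrac12(\hat\mu_1+\hat\mu_2)^T\hat\Sigma^{-1}(\hat\mu_1-\hat\mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)}$.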
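The plug-in estimates above can be turned into a small 2-D classifier. The priors are estimated here as class frequencies $N_i/N$; the slide does not say how $P(\omega_i)$ is chosen, so this is an assumption. The toy points are made up for illustration.

```python
import math


def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def pooled_covariance(class1, class2):
    """Sigma_hat = 1/(N1 + N2) * sum over both classes of (x - mu_i)(x - mu_i)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts in (class1, class2):
        mu = mean(pts)
        for p in pts:
            d = [p[0] - mu[0], p[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    n = len(class1) + len(class2)
    return [[v / n for v in row] for row in s]


def fit_lda(class1, class2):
    """Plug-in LDA for 2-D data; priors P(w_i) estimated as N_i / N (an assumption)."""
    mu1, mu2 = mean(class1), mean(class2)
    (s00, s01), (s10, s11) = pooled_covariance(class1, class2)
    det = s00 * s11 - s01 * s10
    inv = [[s11 / det, -s01 / det], [-s10 / det, s00 / det]]
    dm = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1], inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    b = (-0.5 * ((mu1[0] + mu2[0]) * w[0] + (mu1[1] + mu2[1]) * w[1])
         + math.log(len(class1) / len(class2)))
    return w, b


def classify(x, w, b):
    """f(x) = sign(x^T w + b): +1 for class 1, -1 for class 2."""
    return 1 if x[0] * w[0] + x[1] * w[1] + b > 0 else -1


class1 = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.5], [3.5, 3.0]]
class2 = [[0.0, 0.0], [-0.5, 1.0], [1.0, -0.5], [0.5, 0.5]]
w, b = fit_lda(class1, class2)  # for this toy data: w close to (10, 10), b close to -30
```

For these points the boundary works out to $x_1 + x_2 = 3$, which separates the two toy classes.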

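QDA: Two Classes, Different Covariance Matrices. Decide $\omega_1$ if $f(x) > 0$, where
$$f(x) = \ln p(x \mid \omega_1) + \ln P(\omega_1) - \ln p(x \mid \omega_2) - \ln P(\omega_2),$$
$$\ln\big(p(x \mid \omega_i)\,P(\omega_i)\big) = -\tfrac12\ln|\Sigma_i| - \tfrac12(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) + \ln P(\omega_i) + \text{const}.$$
The constant is the same for both classes and cancels. This gives Quadratic Discriminant Analysis: $f(x) = x^T A x + w^T x + w_0$ is quadratic, with
$A = -\tfrac12(\Sigma_1^{-1} - \Sigma_2^{-1})$ (a matrix),
$w = \Sigma_1^{-1}\mu_1 - \Sigma_2^{-1}\mu_2$ (a vector),
$w_0 = -\tfrac12\mu_1^T\Sigma_1^{-1}\mu_1 + \tfrac12\mu_2^T\Sigma_2^{-1}\mu_2 - \tfrac12\ln|\Sigma_1| + \tfrac12\ln|\Sigma_2| + \ln\frac{P(\omega_1)}{P(\omega_2)}$ (a scalar).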
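The following is a 2-D sketch that builds $A$, $w$, and $w_0$ from the formulas above. It then confirms that $x^TAx + w^Tx + w_0$ equals the log posterior ratio evaluated directly. The class parameters are illustrative assumptions.

```python
import math


def inv2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def det2(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def qda_coefficients(mu1, S1, p1, mu2, S2, p2):
    """Return (A, w, w0) such that f(x) = x^T A x + w^T x + w0."""
    S1i, S2i = inv2(S1), inv2(S2)
    A = [[-0.5 * (S1i[i][j] - S2i[i][j]) for j in range(2)] for i in range(2)]
    a1, a2 = matvec(S1i, mu1), matvec(S2i, mu2)
    w = [a1[0] - a2[0], a1[1] - a2[1]]
    w0 = (-0.5 * dot(mu1, a1) + 0.5 * dot(mu2, a2)
          - 0.5 * math.log(det2(S1)) + 0.5 * math.log(det2(S2))
          + math.log(p1 / p2))
    return A, w, w0


def log_score(x, mu, S, p):
    """ln p(x|w_i) + ln P(w_i), without the constant -ln(2*pi) shared by both classes."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    return -0.5 * math.log(det2(S)) - 0.5 * dot(d, matvec(inv2(S), d)) + math.log(p)


mu1, S1, p1 = [1.0, 0.0], [[1.0, 0.2], [0.2, 0.5]], 0.4
mu2, S2, p2 = [-1.0, 1.0], [[2.0, -0.3], [-0.3, 1.5]], 0.6
A, w, w0 = qda_coefficients(mu1, S1, p1, mu2, S2, p2)

x = [0.3, 0.8]
f_quadratic = dot(x, matvec(A, x)) + dot(w, x) + w0
f_direct = log_score(x, mu1, S1, p1) - log_score(x, mu2, S2, p2)
```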

LDA: Multiclass, Identical Covariance. Find the linear decision boundaries for k classes. For two classes we have $f(x) = x^T w + b$: decide $\omega_1$ if $f(x) > 0$. In the multi-class case we have $k-1$ decision functions: $f_{1,2}(x) = x^T w_{1,2} + b_{1,2}$, $f_{1,3}(x) = x^T w_{1,3} + b_{1,3}$, $\dots$, $f_{1,k}(x) = x^T w_{1,k} + b_{1,k}$. We have to determine $(k-1)(p+1)$ parameters, where p is the dimension of x.

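LDA: Multiclass, Identical Covariance. Find the (k-1)-dimensional subspace that gives the best linear discrimination between the k classes: $y = (w_1\ w_2\ \dots\ w_{k-1})^T x$. This is also known as the Fisher Linear Discriminant.
[Figure: class means $\mu_1, \mu_2, \mu_3$ in the input space X and their projections $y(\mu_i)$ onto the directions $w_1$ and $w_2$.]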

LDA: Multiclass, Identical Covariance: Computation.
(1) Compute the $d \times k$ matrix $M = (\mu_1\ \mu_2\ \dots\ \mu_k)$ and the covariance matrix $\Sigma$.
(2) Compute the whitened means $M' = G^T M$, where $\Sigma^{-1} = GG^T$.
(3) Compute the covariance matrix $B'$ of the $\mu_i'$.
(4) Compute the eigenvectors $v_l'$ of $B'$, ranked by eigenvalue.
(5) Calculate y by projecting x to $x'$ and then onto the eigenvectors: $y_l = v_l'^T G^T x$, i.e. $w_l = G v_l'$.

LDA: Fisher's Approach. Find w such that the ratio of between-class to within-class variance is maximized when the data is projected onto w:
$$\max_w \frac{w^T B w}{w^T \Sigma w}, \qquad B = MM^T \ \text{(the covariance of the } \mu_i\text{'s, with the means centered)}.$$
This can be written as $\max_w w^T B w$ subject to $w^T\Sigma w = 1$. That is a generalized eigenvalue problem, $Bw = \lambda\Sigma w$. Its solutions are the ranked eigenvectors of $\Sigma^{-1}B$, the same as in the previous derivation.
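The following 2-D sketch solves the generalized eigenvalue problem. It takes the leading eigenvector of $\Sigma^{-1}B$ in closed form, which is possible because the matrices are 2×2. It then checks two properties: the Rayleigh quotient at that vector equals the eigenvalue, and no other direction does better. The three class means and the shared covariance are illustrative. $B$ uses a $1/k$ normalization, which only rescales the quotient.

```python
import math


def outer_mean(vectors, center):
    """(1/k) * sum_i (v_i - c)(v_i - c)^T for 2-D vectors."""
    k = len(vectors)
    B = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - center[0], v[1] - center[1]]
        for i in range(2):
            for j in range(2):
                B[i][j] += d[i] * d[j] / k
    return B


def rayleigh(w, B, S):
    """Between-class over within-class variance along w: (w^T B w) / (w^T S w)."""
    def quad(M):
        return sum(w[i] * M[i][j] * w[j] for i in range(2) for j in range(2))
    return quad(B) / quad(S)


def fisher_direction(B, S):
    """Leading eigenvector of S^-1 B (2x2 case) and its eigenvalue: the maximizer of
    w^T B w subject to w^T S w = 1, returned up to scale."""
    det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det_s, -S[0][1] / det_s], [-S[1][0] / det_s, S[0][0] / det_s]]
    C = [[sum(S_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    tr = C[0][0] + C[1][1]
    det_c = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det_c, 0.0))  # largest eigenvalue
    if abs(C[0][1]) > 1e-12:
        v = [C[0][1], lam - C[0][0]]
    elif abs(C[1][0]) > 1e-12:
        v = [lam - C[1][1], C[1][0]]
    else:  # C is already diagonal
        v = [1.0, 0.0] if C[0][0] >= C[1][1] else [0.0, 1.0]
    return v, lam


means = [[0.0, 0.0], [2.0, 1.0], [4.0, 0.0]]  # illustrative class means
S = [[1.0, 0.4], [0.4, 0.5]]                   # illustrative shared covariance
center = [sum(m[0] for m in means) / 3, sum(m[1] for m in means) / 3]
B = outer_mean(means, center)
w, lam = fisher_direction(B, S)
```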

The Coffee Problem: LDA vs. PCA. Image removed due to copyright considerations. See: R. Gutierrez-Osuna, http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf

LDA/QDA Summary. Advantages: LDA is the Bayes classifier for multivariate Gaussian distributions with common covariance. LDA creates linear boundaries, which are simple to compute. LDA can be used for representing multi-class data in low dimensions. QDA is the Bayes classifier for multivariate Gaussian distributions. QDA creates quadratic boundaries. Problems: LDA is based on a single prototype per class (the class center), which is often insufficient in practice.

Variants of LDA. Nonparametric LDA (Fukunaga) removes the unimodal assumption by computing the scatter matrix from local information; more than k-1 features can be extracted. Orthonormal LDA (Okada & Tomita) computes projections that maximize separability and are pairwise orthonormal. Generalized LDA (Lowe) incorporates a cost function similar to Bayes risk minimization. Many more variants exist (see The Elements of Statistical Learning, Hastie, Tibshirani, Friedman).

SVM: Linear, Separable (LS). Find the separating function $f(x) = \mathrm{sign}(x^T w + b)$ which maximizes the margin $M = 2d$ on the training data. This is the maximum margin classifier.

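SVM: Primal (LS). Training data consists of pairs $(x_1, y_1), \dots, (x_N, y_N)$, $y_i \in \{-1, 1\}$. The problem of maximizing the margin 2d can be formulated as:
$$\max_{w,b} d \quad \text{subject to } y_i(x_i^T w + b) \ge d,\ i = 1,\dots,N, \quad \|w\| = 1.$$
Alternatively, substitute $w' = w/d$ and $b' = b/d$, so that $d = 1/\|w'\|$, and drop the primes:
$$\min_{w,b} \tfrac12\|w\|^2 \quad \text{subject to } y_i(x_i^T w + b) \ge 1,\ i = 1,\dots,N.$$
This is a convex optimization problem with a quadratic objective function and linear constraints.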

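SVM: Dual (LS). Multiply the constraint equations by positive Lagrange multipliers $\alpha_i$ and subtract them from the objective function:
$$L_P = \tfrac12\|w\|^2 - \sum_{i=1}^N \alpha_i\left[y_i(x_i^T w + b) - 1\right].$$
Minimize $L_P$ w.r.t. w and b and maximize w.r.t. α, subject to $\alpha_i \ge 0$. Setting the derivatives $\partial L_P/\partial w$ and $\partial L_P/\partial b$ to zero gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$. Substituting into $L_P$ yields the so-called Wolfe dual:
$$\max_\alpha L_D = \sum_{i=1}^N \alpha_i - \tfrac12\sum_{i=1}^N\sum_{k=1}^N \alpha_i\alpha_k y_i y_k\, x_i^T x_k \quad \text{subject to } \alpha_i \ge 0,\ \sum_{i=1}^N \alpha_i y_i = 0.$$
Solve for α. Then compute $w = \sum_i \alpha_i y_i x_i$, and compute b from $\alpha_i\left[y_i(x_i^T w + b) - 1\right] = 0$.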
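The sketch below does not solve the dual QP; that step needs a QP solver. Instead it takes α as given and shows the post-processing step: recovering w and b, and checking that $L_D$ equals $\tfrac12\|w\|^2$ at the optimum. The four-point data set and its optimal α are a hand-derived toy assumption. Two points at $(\pm 1, 0)$ are the support vectors, and the other two lie outside the margin.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recover_w(alpha, X, y):
    """w = sum_i alpha_i y_i x_i."""
    return [sum(a * yi * xi[d] for a, xi, yi in zip(alpha, X, y)) for d in range(len(X[0]))]


def recover_b(alpha, X, y, w, tol=1e-9):
    """From alpha_i [y_i (x_i^T w + b) - 1] = 0: for every i with alpha_i > 0,
    b = y_i - x_i^T w (since y_i = +-1); averaged over those i for stability."""
    vals = [yi - dot(w, xi) for a, xi, yi in zip(alpha, X, y) if a > tol]
    return sum(vals) / len(vals)


def dual_objective(alpha, X, y):
    """L_D = sum_i alpha_i - 1/2 sum_i sum_k alpha_i alpha_k y_i y_k x_i^T x_k."""
    quad = sum(ai * ak * yi * yk * dot(xi, xk)
               for ai, xi, yi in zip(alpha, X, y)
               for ak, xk, yk in zip(alpha, X, y))
    return sum(alpha) - 0.5 * quad


X = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
y = [1, -1, 1, -1]
alpha = [0.5, 0.5, 0.0, 0.0]  # optimum derived by hand for this toy set (assumption)

w = recover_w(alpha, X, y)
b = recover_b(alpha, X, y, w)
primal = 0.5 * dot(w, w)  # equals L_D at the optimum (no duality gap)
```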

SVM: Primal vs. Dual (LS).
Primal: $\min_{w,b}\tfrac12\|w\|^2$ subject to $y_i(x_i^T w + b) \ge 1$, $i = 1,\dots,N$.
Dual: $\max_\alpha \sum_i \alpha_i - \tfrac12\sum_i\sum_k \alpha_i\alpha_k y_i y_k\, x_i^T x_k$ subject to $\alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$.
The primal has a dense inequality constraint for every point in the training set. The dual has a single dense equality constraint and a set of box constraints, which makes it easier to solve than the primal.

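SVM: Optimality Conditions (LS). Optimality conditions for linearly separable data:
$\sum_i \alpha_i y_i = 0$; $\alpha_i \ge 0\ \forall i$; $y_i(x_i^T w + b) - 1 \ge 0\ \forall i$;
$w = \sum_i \alpha_i y_i x_i$, so that $f(x) = \sum_i \alpha_i y_i\, x_i^T x + b$;
$\alpha_i\left(y_i(x_i^T w + b) - 1\right) = 0\ \forall i$.
Hence $\alpha_i = 0$ for points which are not on the boundary of the margin. Points with $\alpha_i > 0$ are support vectors.
[Figure: points away from the margin have $\alpha_i = 0$; points on the margin have $\alpha_i > 0$.]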
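The five conditions can be turned directly into a checker. This minimal sketch tests a candidate $(\alpha, w, b)$ against them on the same kind of four-point toy set, with values chosen by hand for illustration. It also lists the support vectors, i.e. the points with $\alpha_i > 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def check_kkt(alpha, X, y, w, b, tol=1e-9):
    """True if (alpha, w, b) satisfy the optimality conditions of the
    linearly separable SVM on the training set (X, y)."""
    margins = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
    w_alpha = [sum(a * yi * xi[d] for a, xi, yi in zip(alpha, X, y)) for d in range(len(w))]
    return (
        abs(sum(a * yi for a, yi in zip(alpha, y))) < tol               # sum_i alpha_i y_i = 0
        and all(a >= -tol for a in alpha)                                # alpha_i >= 0
        and all(m >= 1 - tol for m in margins)                           # y_i (x_i^T w + b) >= 1
        and all(abs(u - v) < tol for u, v in zip(w, w_alpha))            # w = sum_i alpha_i y_i x_i
        and all(abs(a * (m - 1)) < tol for a, m in zip(alpha, margins))  # complementary slackness
    )


def support_vectors(alpha, tol=1e-9):
    """Indices of the training points with alpha_i > 0."""
    return [i for i, a in enumerate(alpha) if a > tol]


X = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
y = [1, -1, 1, -1]
print(check_kkt([0.5, 0.5, 0.0, 0.0], X, y, [1.0, 0.0], 0.0))   # prints True
print(check_kkt([0.25, 0.25, 0.0, 0.0], X, y, [0.5, 0.0], 0.0))  # prints False (margin violated)
print(support_vectors([0.5, 0.5, 0.0, 0.0]))                     # prints [0, 1]
```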

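SVM: Linear, Non-Separable (LNS).
$$\min_{w,b,\xi}\tfrac12\|w\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } y_i(x_i^T w + b) \ge 1 - \xi_i,\ \xi_i \ge 0\ \forall i.$$
The $\xi_i$ are called slack variables. The constant C penalizes errors.
[Figure: the lines $f(x) = 1$, $f(x) = 0$, $f(x) = -1$, the margin $M = 2d$, and the slack $\xi$ of points on the wrong side of their margin.]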

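SVM: Dual (LNS). The procedure is the same as in the separable case:
$$\max_\alpha L_D = \sum_{i=1}^N \alpha_i - \tfrac12\sum_{i=1}^N\sum_{k=1}^N \alpha_i\alpha_k y_i y_k\, x_i^T x_k \quad \text{subject to } 0 \le \alpha_i \le C,\ \sum_{i=1}^N \alpha_i y_i = 0.$$
Solve for α, then compute $w = \sum_i \alpha_i y_i x_i$. Compute b from $y_i(x_i^T w + b) - 1 = 0$ for any sample $x_i$ with $0 < \alpha_i < C$.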

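SVM: Optimality Conditions (LNS). $f(x) = \sum_i \alpha_i y_i\, x_i^T x + b$, and:
$\alpha_i = 0$: $\xi_i = 0$, $y_i f(x_i) \ge 1$;
$0 < \alpha_i < C$: $\xi_i = 0$, $y_i f(x_i) = 1$ (unbounded support vectors);
$\alpha_i = C$: $\xi_i \ge 0$, $y_i f(x_i) = 1 - \xi_i$ (bounded support vectors).
[Figure: the lines $f(x) = 1$, $f(x) = 0$, $f(x) = -1$ and the margin $M = 2d$.]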
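The three cases above can be sketched in code. This minimal example sorts the training points by their multipliers, then computes b from the unbounded support vectors, averaging over them. The 1-D data set and its α for C = 3 are a hand-constructed toy assumption that satisfies all the conditions listed above. One point lies outside the margin, two lie on it, and two are misclassified with slack 1.5.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def categorize(alpha, C, tol=1e-9):
    """Split indices by multiplier: alpha_i = 0 (not a support vector),
    0 < alpha_i < C (unbounded SV, on the margin), alpha_i = C (bounded SV)."""
    non_sv, unbounded, bounded = [], [], []
    for i, a in enumerate(alpha):
        if a <= tol:
            non_sv.append(i)
        elif a >= C - tol:
            bounded.append(i)
        else:
            unbounded.append(i)
    return non_sv, unbounded, bounded


def w_and_b(alpha, X, y, C, tol=1e-9):
    """w = sum_i alpha_i y_i x_i; b from y_i (x_i^T w + b) = 1 on the unbounded SVs,
    averaged over them (assumes at least one unbounded SV exists)."""
    w = [sum(a * yi * xi[d] for a, xi, yi in zip(alpha, X, y)) for d in range(len(X[0]))]
    _, unbounded, _ = categorize(alpha, C, tol)
    b = sum(y[i] - dot(w, X[i]) for i in unbounded) / len(unbounded)
    return w, b


# 1-D toy problem with a hand-checked optimum for C = 3 (illustrative assumption).
X = [[1.0], [-1.0], [0.5], [-0.5], [3.0]]
y = [1, -1, -1, 1, 1]
alpha = [2.0, 2.0, 3.0, 3.0, 0.0]
C = 3.0
w, b = w_and_b(alpha, X, y, C)
slack = [max(0.0, 1 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
print(categorize(alpha, C))  # prints ([4], [0, 1], [2, 3])
print(w, b, slack)           # prints [1.0] 0.0 [0.0, 0.0, 1.5, 1.5, 0.0]
```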

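SVM: Non-Linear (NL). Use a non-linear mapping $x \mapsto \Phi(x)$ from the input space to a feature space. Project the data to the feature space, then apply the SVM procedure.
[Figure: data in the input space mapped by $\Phi$ into the feature space.]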

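SVM: Kernel Trick.
$$\max_\alpha \sum_{i=1}^N \alpha_i - \tfrac12\sum_{i=1}^N\sum_{k=1}^N \alpha_i\alpha_k y_i y_k\, \Phi(x_i)^T\Phi(x_k) \quad \text{subject to } 0 \le \alpha_i \le C,\ \sum_{i=1}^N \alpha_i y_i = 0.$$
Only the inner product of the samples appears in the objective function. If we can write $K(x_i, x_k) = \Phi(x_i)^T\Phi(x_k)$, we can avoid any computations in the feature space. Using $w = \sum_i \alpha_i y_i \Phi(x_i)$, the solution $f(x) = \Phi(x)^T w + b$ can be written as
$$f(x) = \sum_{i=1}^N \alpha_i y_i\, \Phi(x_i)^T\Phi(x) + b = \sum_{i=1}^N \alpha_i y_i\, K(x_i, x) + b.$$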

SVM: Kernels. When is a kernel $K(u,v)$ an inner product in a Hilbert space? When $K(u,v) = \sum_l \lambda_l \phi_l(u)\phi_l(v)$ with positive coefficients $\lambda_l$. This holds if, for any $g(u) \in L_2$, $\iint K(u,v)\,g(u)\,g(v)\,du\,dv \ge 0$ (Mercer's condition). Some examples of commonly used kernels:
Linear kernel: $u^T v$
Polynomial kernel: $(1 + u^T v)^d$
Gaussian kernel (RBF): $\exp(-\|u-v\|^2/(2\sigma^2))$, shift invariant
MLP: $\tanh(u^T v - \theta)$

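SVM: Polynomial Kernel. Polynomial second-degree kernel, $K(u,v) = (1 + u^T v)^2$ with $u, v \in \mathbb{R}^2$:
$$(1 + u^T v)^2 = 1 + 2u_1v_1 + 2u_2v_2 + 2u_1v_1u_2v_2 + u_1^2v_1^2 + u_2^2v_2^2 = \Phi(u)^T\Phi(v),$$
$$\Phi(x) = (1,\ \sqrt2 x_1,\ \sqrt2 x_2,\ \sqrt2 x_1x_2,\ x_1^2,\ x_2^2).$$
Shift-invariant kernel: a kernel $K(u,v) = K(u-v)$ defined on $L_2([0,1]^d)$ can be written as the Fourier series of K. By the shift theorem, $f(t) = \sum_k \lambda_k e^{j2\pi kt}$ implies $f(t - t_0) = \sum_k \lambda_k e^{-j2\pi kt_0} e^{j2\pi kt}$, so
$$K(u - v) = \sum_{k\in\mathbb{Z}^d} \lambda_k\, e^{j2\pi k^T u}\, e^{-j2\pi k^T v},$$
which is an inner product in feature space if $\lambda_k \ge 0\ \forall k \in \mathbb{Z}^d$.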
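The expansion above can be checked numerically. The sketch computes $(1 + u^Tv)^2$ directly and as the inner product of the explicit six-dimensional feature vectors. The two input vectors are arbitrary.

```python
import math


def poly2_kernel(u, v):
    """Second-degree polynomial kernel K(u, v) = (1 + u^T v)^2 for u, v in R^2."""
    return (1.0 + u[0] * v[0] + u[1] * v[1]) ** 2


def phi(x):
    """Explicit feature map with phi(u)^T phi(v) = (1 + u^T v)^2."""
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x[0], r2 * x[1], r2 * x[0] * x[1], x[0] ** 2, x[1] ** 2]


u, v = [0.5, -1.5], [2.0, 0.25]
explicit = sum(a * b for a, b in zip(phi(u), phi(v)))
# Both values equal (1 + 0.625)^2 = 2.640625 up to floating-point rounding.
print(poly2_kernel(u, v), explicit)
```

The kernel form needs one 2-D dot product. The explicit form builds two 6-D vectors, and the gap between the two grows quickly with the input dimension and the degree.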

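SVM: Uniqueness. Are the $\alpha_i$ in the solution unique? No. Take $f(x) = \sum_i \alpha_i y_i\, x_i^T x + b$ with four points: $x_1 = (1,1)$, $x_2 = (-1,1)$, $x_3 = (1,-1)$, $x_4 = (-1,-1)$, labeled $y_1 = y_3 = +1$ and $y_2 = y_4 = -1$. The separating hyperplane has $w = (1, 0)$ and $b = 0$. There are two solutions, $\alpha = (0.25, 0.25, 0.25, 0.25)$ and $\alpha = (0.5, 0.5, 0, 0)$. Both give $w = \sum_i \alpha_i y_i x_i = (1, 0)$, and both satisfy the constraints $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. More important: is the solution f(x) unique? Yes. The solution is unique and global if the objective function is strictly convex.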
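The example can be verified directly. Both multiplier vectors are feasible, both produce the same w, and both reach the same dual objective value. The point coordinates follow the four-point square described above.

```python
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [1, -1, 1, -1]


def w_from_alpha(alpha):
    """w = sum_i alpha_i y_i x_i."""
    return [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]


def dual_objective(alpha):
    """L_D = sum_i alpha_i - 1/2 sum_i sum_k alpha_i alpha_k y_i y_k x_i^T x_k."""
    quad = sum(ai * ak * yi * yk * (xi[0] * xk[0] + xi[1] * xk[1])
               for ai, yi, xi in zip(alpha, y, X)
               for ak, yk, xk in zip(alpha, y, X))
    return sum(alpha) - 0.5 * quad


def feasible(alpha, tol=1e-12):
    """alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    return all(a >= 0 for a in alpha) and abs(sum(a * yi for a, yi in zip(alpha, y))) < tol


alpha_a = [0.25, 0.25, 0.25, 0.25]
alpha_b = [0.5, 0.5, 0.0, 0.0]
for alpha in (alpha_a, alpha_b):
    print(w_from_alpha(alpha), dual_objective(alpha), feasible(alpha))
```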

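SVM: Multiclass.
Bottom-up: pairwise classifiers arranged as a tournament. The first level decides "A or B" and "C or D"; the winners meet at the next level, up to "A or B or C or D". Training: k(k-1)/2 classifiers. Classification: k-1 evaluations.
One vs. all: A vs. B,C,D; B vs. A,C,D; C vs. A,B,D; D vs. A,B,C. Training: k classifiers. Classification: k evaluations.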
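The following sketch of the bottom-up scheme counts classifier evaluations for one test sample. The callable `beats(a, b)` is a hypothetical stand-in for a trained pairwise SVM, and its toy rule is made up. Each match eliminates one class, so any bracket needs exactly k-1 evaluations.

```python
def pairwise_counts(k):
    """Bottom-up: k(k-1)/2 pairwise SVMs to train, k-1 evaluated per test sample."""
    return k * (k - 1) // 2, k - 1


def one_vs_all_counts(k):
    """One vs. all: k SVMs to train, all k evaluated per test sample."""
    return k, k


def tournament(classes, beats):
    """Bottom-up classification of one sample. `beats(a, b)` is a hypothetical
    stand-in for the trained pairwise SVM: True if it decides for class a."""
    remaining, evaluations = list(classes), 0
    while len(remaining) > 1:
        winners = []
        for i in range(0, len(remaining) - 1, 2):
            a, b = remaining[i], remaining[i + 1]
            winners.append(a if beats(a, b) else b)
            evaluations += 1
        if len(remaining) % 2 == 1:
            winners.append(remaining[-1])  # odd class out advances without a match
        remaining = winners
    return remaining[0], evaluations


# Toy pairwise decisions for one test sample (hypothetical): class "C" wins every
# match it plays; otherwise the alphabetically smaller label wins.
def beats(a, b):
    return a == "C" or (b != "C" and a < b)


print(tournament(list("ABCD"), beats))  # prints ('C', 3)
```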

SVM: Choosing the Kernel. How should the kernel be chosen? Linear SVMs are simple to compute and fast at run-time, but often not sufficient for complex tasks. SVMs with Gaussian kernels showed excellent performance in many applications (after some tuning of σ), but they are slow at run-time. Polynomial kernels with d ≤ 2 are commonly used in computer vision applications. They offer a good trade-off between classification performance and computational complexity.

SVM: Example. Face detection with linear and second-degree polynomial SVMs and LDA.

SVM: Choosing C. How should the C-value be chosen?
$$\min_{w,b,\xi}\tfrac12\|w\|^2 + C\sum_{i=1}^N \xi_i$$
The C-value penalizes points within the margin. A large C-value can lead to poor generalization performance (over-fitting). From our own experience in object detection tasks: find a kernel and a C-value which give zero errors on the training set.

SVM: Computation during Classification. In computer vision applications, fast classification is usually more important than fast training. There are two ways of computing the decision function f(x):
a) $w^T\Phi(x) + b$
b) $\sum_i \alpha_i y_i K(x_i, x) + b$
Which one is faster?
For a linear kernel: a).
For a second-degree polynomial kernel: multiplications for a): $G_{\Phi,\mathrm{poly}} \approx \binom{N+2}{2}$, where N is the dimension of x; multiplications for b): $G_{K,\mathrm{poly}} \approx (N+2)\,s$, where s is the number of support vectors.
For a Gaussian kernel: only b), since the dimension of $\Phi(x)$ is infinite.
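Both ways of computing f(x) give the same value. The sketch below shows this for the second-degree polynomial kernel, using the explicit feature map $\Phi$ from the polynomial-kernel slide. The support vectors, labels, multipliers, and bias are illustrative values, not a trained model. Way a) pays once to precompute w; way b) pays one kernel evaluation per support vector on every call.

```python
import math


def phi(x):
    """Explicit feature map of the kernel (1 + u^T v)^2 for x in R^2."""
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x[0], r2 * x[1], r2 * x[0] * x[1], x[0] ** 2, x[1] ** 2]


def kernel(u, v):
    return (1.0 + u[0] * v[0] + u[1] * v[1]) ** 2


# Illustrative support vectors, labels, multipliers, and bias (not a trained model).
sv = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]]
labels = [1, -1, 1]
alpha = [0.7, 1.2, 0.5]
b = -0.4

# a) Precompute w = sum_i alpha_i y_i Phi(x_i) once; each test point then needs
#    Phi(x) and one dot product of length dim(Phi) = 6.
sv_phi = [phi(s) for s in sv]
w = [sum(a * yl * p[d] for a, yl, p in zip(alpha, labels, sv_phi)) for d in range(6)]


def f_explicit(x):
    return sum(wd * pd for wd, pd in zip(w, phi(x))) + b


# b) Kernel expansion: one kernel evaluation per support vector. This is the only
#    option when the feature space is infinite-dimensional (e.g. Gaussian kernel).
def f_kernel(x):
    return sum(a * yl * kernel(s, x) for a, yl, s in zip(alpha, labels, sv)) + b
```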

Learning Theory: Problem Formulation. From a given set of training examples $\{x_i, y_i\}$, learn the mapping $x \to y$. The learning machine is defined by a set of possible mappings $x \to f(x, \alpha)$, where α is the adjustable parameter of f. The goal is to minimize the expected risk R:
$$R(\alpha) = \int V(f(x,\alpha), y)\, dP(x, y),$$
where V is the loss function and P is the probability distribution function. We can't compute $R(\alpha)$, since we don't know $P(x, y)$.

Learning Theory: Empirical Risk Minimization. To solve the problem, minimize the "empirical risk" $R_{emp}$ over the training set:
$$R_{emp}(\alpha) = \frac1N\sum_{i=1}^N V(f(x_i, \alpha), y_i),$$
where V is the loss function. Common loss functions:
$V(f(x), y) = (y - f(x))^2$ (least squares);
$V(f(x), y) = (1 - yf(x))_+$ (hinge loss), where $(x)_+ = \max(x, 0)$.
[Figure: hinge loss plotted against $yf(x)$.]
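The following is a minimal sketch of $R_{emp}$ with both loss functions. It assumes a hypothetical fixed 1-D classifier $f(x) = 2x - 1$ and four made-up labeled samples.

```python
def least_squares_loss(fx, y):
    """V(f(x), y) = (y - f(x))^2."""
    return (y - fx) ** 2


def hinge_loss(fx, y):
    """V(f(x), y) = (1 - y f(x))_+ with (t)_+ = max(t, 0)."""
    return max(1.0 - y * fx, 0.0)


def empirical_risk(loss, f, data):
    """R_emp = (1/N) * sum_i V(f(x_i), y_i)."""
    return sum(loss(f(x), y) for x, y in data) / len(data)


def f(x):
    """Hypothetical fixed classifier f(x, alpha) for illustration."""
    return 2.0 * x - 1.0


data = [(1.0, 1), (0.75, 1), (0.0, -1), (0.6, -1)]
# f values: 1.0, 0.5, -1.0, 0.2 -> hinge losses 0, 0.5, 0, 1.2; squared errors 0, 0.25, 0, 1.44.
print(empirical_risk(hinge_loss, f, data))          # about 0.425
print(empirical_risk(least_squares_loss, f, data))  # about 0.4225
```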

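Learning Theory & SVM. Bound on the expected risk: consider a loss function with $0 \le V(f(x), y) \le 1$. With probability $1 - \eta$, $0 \le \eta \le 1$, the following bound holds:
$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln(2N/h) + 1\right) - \ln(\eta/4)}{N}},$$
where $R_{emp}(\alpha)$ is the empirical risk, N is the number of training examples, and h is the Vapnik-Chervonenkis (VC) dimension. The bound is independent of the probability distribution $P(x, y)$. Keeping all parameters in the bound fixed except one: $(1-\eta)\uparrow \Rightarrow$ bound $\uparrow$; $N\uparrow \Rightarrow$ bound $\downarrow$; $h\uparrow \Rightarrow$ bound $\uparrow$.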
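The bound can be evaluated directly to see each trend listed above. It also shows how quickly the bound becomes loose. The values of h, N, and η below are arbitrary examples.

```python
import math


def vc_confidence(h, n, eta):
    """Second term of the bound: sqrt((h (ln(2N/h) + 1) - ln(eta/4)) / N)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def risk_bound(r_emp, h, n, eta):
    """Upper bound on R(alpha), holding with probability 1 - eta (for 0 <= V <= 1)."""
    return r_emp + vc_confidence(h, n, eta)


base = vc_confidence(h=10, n=1000, eta=0.05)
more_data = vc_confidence(h=10, n=10000, eta=0.05)       # N up   -> bound down
richer_model = vc_confidence(h=20, n=1000, eta=0.05)     # h up   -> bound up
more_confident = vc_confidence(h=10, n=1000, eta=0.01)   # 1-eta up -> bound up
vacuous = risk_bound(0.0, h=100, n=200, eta=0.05)        # exceeds 1: says nothing
```

For a loss bounded by 1, a bound above 1 carries no information. This is one reason the later slide on bounds recommends cross-validation or leave-one-out estimates instead.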

Learning Theory: VC Dimension. The VC dimension is a property of a set of functions $\{f(\alpha)\}$. Suppose that for a set of points labeled in all possible ways, one can always find an $f \in \{f(\alpha)\}$ which separates the points correctly. Then the set of points is said to be shattered by $\{f(\alpha)\}$. The VC dimension is the maximum number of points that can be shattered by $\{f(\alpha)\}$. The VC dimension of linear functions $f: w^T x + b = 0$ in d dimensions is $d + 1$.

Learning Theory: SVM. The expected risk $E[R]$ for the optimal hyperplanes satisfies
$$E[R] \le \frac{E\left[(D/M)^2\right]}{N},$$
where D and M are the data diameter and the margin shown in the figure, and the expectation is over all training sets of size N. "Algorithms that maximize the margin have better generalization performance."

Bounds. Most bounds on the expected risk are very loose to compute. Instead:
Cross-validation error: the error on a cross-validation set which is different from the training set.
Leave-one-out error: leave one training example out of the training set, train the classifier, and test on the example which was left out. Do this for all examples. For SVMs, the number of leave-one-out errors is upper bounded by the number of support vectors.
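The following is a minimal sketch of the leave-one-out procedure. To stay self-contained, it uses a nearest-class-mean classifier as a stand-in for the SVM, since SVM training is not implemented here. The support-vector bound above is SVM-specific and is not demonstrated. The toy data include one class +1 point placed near class -1, and it is the only leave-one-out error.

```python
def nearest_mean_fit(data):
    """Stand-in classifier (not an SVM): one mean vector per class label."""
    sums, counts = {}, {}
    for x, y in data:
        s = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def nearest_mean_predict(means, x):
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))


def leave_one_out_error(data, fit, predict):
    """Train on all but one example, test on the held-out one, repeat for every example."""
    errors = 0
    for i in range(len(data)):
        model = fit(data[:i] + data[i + 1:])
        x, y = data[i]
        errors += predict(model, x) != y
    return errors / len(data)


data = [([0.0, 0.0], 1), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([4.5, 4.5], 1),
        ([5.0, 5.0], -1), ([5.0, 6.0], -1), ([6.0, 5.0], -1)]
print(leave_one_out_error(data, nearest_mean_fit, nearest_mean_predict))  # 1 error out of 7
```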

Regularization Theory. Given N examples $(x_i, y_i)$, $x_i \in \mathbb{R}^d$, $y_i \in \{0, 1\}$, solve:
$$\min_{f\in H}\sum_{i=1}^N V(f(x_i), y_i) + \gamma\|f\|_K^2,$$
where $\|f\|_K$ is the norm in a Reproducing Kernel Hilbert Space (RKHS) H with reproducing kernel K, and γ is the regularization parameter. The term $\gamma\|f\|_K^2$ can be interpreted as a smoothness constraint. Under rather general conditions, the solution can be written as $f(x) = \sum_{i=1}^N c_i K(x, x_i)$.
[Figure: a smooth function fitted to the examples.]
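The representer form above can be illustrated with the least-squares loss $V = (y - f(x))^2$. This choice is an assumption; the slide leaves V general, and with this loss the method is usually called kernel ridge regression. Substituting $f = \sum_i c_i K(\cdot, x_i)$ and $\|f\|_K^2 = c^TKc$, the objective is minimized when $(K + \gamma I)c = y$. The sketch uses a Gaussian kernel with σ = 1 and made-up 1-D data. A tiny γ nearly interpolates the data, while a large γ gives a much flatter, smoother f.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def fit_regularized(X, y, gamma, kernel=gaussian_kernel):
    """Least-squares loss (an assumption): minimize sum_i (y_i - f(x_i))^2 + gamma ||f||_K^2
    over f(x) = sum_i c_i K(x, x_i). The gradient vanishes when (K + gamma I) c = y."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    A = [[K[i][j] + (gamma if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)


def predict(X, c, x, kernel=gaussian_kernel):
    """f(x) = sum_i c_i K(x, x_i)."""
    return sum(ci * kernel(x, xi) for ci, xi in zip(c, X))


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 0.0, -1.0]
c_small = fit_regularized(X, y, gamma=1e-8)  # nearly interpolates the data
c_large = fit_regularized(X, y, gamma=10.0)  # strongly regularized, shrunk toward 0
```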

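Regularization: Reproducing Kernel Hilbert Space (RKHS). In an RKHS H, $f(x) = \langle K(x,y), f(y)\rangle_H$ (the reproducing property). Let $\lambda_l$ be positive numbers and $\phi_l(x)$ an orthonormal set of functions, i.e. $\int \phi_l(x)\phi_m(x)\,dx = 0$ for $l \ne m$ and 1 otherwise. Then:
$K(x,y) = \sum_l \lambda_l \phi_l(x)\phi_l(y)$;
$f(x) = \sum_l a_l \phi_l(x)$, with $a_l = \int f(x)\phi_l(x)\,dx$;
$\langle f, g\rangle_H = \sum_l \frac{a_l b_l}{\lambda_l}$ for $g = \sum_l b_l \phi_l$;
$\|f\|_H^2 = \langle f, f\rangle_H = \sum_l \frac{a_l^2}{\lambda_l}$.
The $\lambda_l$ are the nonnegative eigenvalues of K.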

Regularization: Simple Example of an RKHS. The kernel is a one-dimensional Gaussian with σ = 1: $K(x,y) = \exp(-(x-y)^2/2)$. Treat K as periodic and write $K(x,y)$ as a Fourier expansion using the shift theorem: $K(x,y) = \sum_l \lambda_l \exp(j\omega_l x)\exp(-j\omega_l y)$. Here the $\lambda_l$ are the Fourier coefficients of $\exp(-x^2/2)$, $\lambda_l \approx A\exp(-\omega_l^2/2)$, which decrease with higher frequencies (increasing l). This is a property of most kernels. The regularization term is $\|f\|_H^2 = \sum_l a_l^2/\lambda_l$, where $a_l$ are the Fourier coefficients of f(x). It penalizes high frequencies more than low frequencies, which enforces smoothness.

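Regularization: SVM. For the hinge loss function $V(f(x), y) = (1 - yf(x))_+$, it can be shown that the regularization problem is equivalent to the SVM problem:
$$\min_{f\in H}\sum_{i=1}^N (1 - y_i f(x_i))_+ + \lambda\|f\|_K^2.$$
Introducing slack variables $\xi_i = (1 - y_i f(x_i))_+$, we can rewrite this as
$$\min_{f\in H}\sum_{i=1}^N \xi_i + \lambda\|f\|_K^2 \quad \text{subject to } y_i f(x_i) \ge 1 - \xi_i,\ \xi_i \ge 0\ \forall i.$$
It can be shown that this is equivalent to the SVM problem (up to b):
$$\text{SVM: } \min_{w,b,\xi}\tfrac12\|w\|^2 + C\sum_{i=1}^N \xi_i, \quad C = \frac{1}{2\lambda}, \quad \text{subject to } y_i(x_i^T w + b) \ge 1 - \xi_i,\ \xi_i \ge 0\ \forall i.$$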

SVM Summary. SVMs are maximum margin classifiers. Only training points close to the boundary (support vectors) occur in the SVM solution. The SVM problem is convex; the solution is global and unique. SVMs can handle non-separable data. Non-linear separation in the input space is possible by projecting the data to a feature space. All calculations can be done in the input space (kernel trick). SVMs are known to perform well in high-dimensional problems with few examples. Depending on the kernel, SVMs can be slow during classification. SVMs are binary classifiers and are not efficient for problems with a large number of classes.

Literature.
T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning, Springer, 2001. Topics: LDA, QDA, extensions to LDA, SVM & Regularization.
C. Burges: A Tutorial on SVM for Pattern Recognition, 1999. Topics: Learning Theory, SVM.
R. Rifkin: Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning, 2002. Topics: SVM training, SVM multiclass.
T. Evgeniou, M. Pontil, T. Poggio: Regularization Networks and SVMs, 1999. Topics: SVM & Regularization.
V. Vapnik: The Nature of Statistical Learning Theory, 1995. Topics: statistical learning theory, SVM.

Homework. Classification problem on the NIST handwritten digits data involving PCA, LDA and SVMs. PCA code will be posted today.