CS 536: Machine Learning
Nonparametric Density Estimation
Unsupervised Learning - Clustering
Fall 2005
Ahmed Elgammal
Dept. of Computer Science, Rutgers University
CS 536 Density Estimation - Clustering

Outlines
- Density estimation
- Nonparametric kernel density estimation
- Mixture Densities
- Unsupervised Learning - Clustering:
  - Hierarchical Clustering
  - K-means Clustering
  - Mean Shift Clustering
  - Spectral Clustering, Graph Cuts
  - Application to Image Segmentation
Density Estimation
- Parametric: Assume a single model for $p(x \mid C_i)$ (Chapter 4 and 5)
- Semiparametric: $p(x \mid C_i)$ is a mixture of densities
  Multiple possible explanations/prototypes: different handwriting styles, accents in speech
- Nonparametric: No model; data speaks for itself (Chapter 8)

Nonparametric Density Estimation
- Density estimation: Given a sample $S = \{x_i\}_{i=1..N}$ from a distribution, obtain an estimate of the density function $f(x)$ at any point.
- Parametric: Assume a parametric density family $f(\cdot \mid \theta)$ (e.g. $N(\mu, \sigma^2)$) and obtain the best estimator $\hat{\theta}$ of $\theta$.
  Advantages:
  - Efficient
  - Robust to noise: robust estimators can be used
  Problem with parametric methods:
  - An incorrectly specified parametric model has a bias that cannot be removed even by a large number of samples.
- Nonparametric: directly obtain a good estimate $\hat{f}(x)$ of the entire density $f(x)$ from the sample.
- Most famous example: Histogram
Kernel Density Estimation
- 1950s + (Fix & Hodges 51, Rosenblatt 56, Parzen 62, Cencov 62)
- Given a set of samples $S = \{x_i\}_{i=1..N}$ we can obtain an estimate for the density at $x$ as:

  $\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} K_h(x - x_i) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$

- where $K_h(t) = K(t/h)/h$ is called the kernel function (window function)
- $h$: scale or bandwidth
- $K$ satisfies certain conditions, e.g.:

  $\int K_h(x)\,dx = 1, \qquad K_h(x) \ge 0$
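The estimator above can be sketched in a few lines. This is a minimal one-dimensional illustration assuming a Gaussian kernel; the function names and the sample values are made up for the example and are not part of the slides.

```python
import math

def gaussian_kernel(t):
    """Standard normal density K(t): non-negative and integrates to 1."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kde(x, samples, h):
    """f_hat(x) = 1/(N h) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

# Illustrative data (hypothetical), evaluated at x = 0 with bandwidth h = 0.5.
samples = [-1.2, -0.8, -0.5, 0.1, 0.4, 0.9, 1.5]
density_at_zero = kde(0.0, samples, h=0.5)
```

Because each kernel integrates to 1 and the estimate is an average of kernels, the estimate itself integrates to 1.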
Kernel Estimation
- A variety of kernel shapes with different properties.
- The Gaussian kernel is typically used for its continuity and differentiability.
- Multivariate case: Kernel Product. Use the same kernel function with a different bandwidth $h_j$ for each dimension:

  $\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} K_{h_j}(x_j - x_{ij})$

- General form (avoids storing all the samples):

  $\hat{f}(x) = \sum_{i=1}^{N} \alpha_i K_h(x - x_i)$

Kernel Density Estimation
Advantages:
- Converges to any density shape with sufficient samples: asymptotically the estimate converges to any density.
- No need for model specification.
- Unlike histograms, density estimates are smooth, continuous and differentiable.
- Easily generalizes to higher dimensions.
- All other parametric/nonparametric density estimation methods, e.g., histograms, are asymptotically kernel methods.
- In many applications, the densities are multivariate and multimodal with irregular cluster shapes.
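The kernel-product form for the multivariate case can be sketched as follows. A Gaussian kernel is assumed, and `product_kde` is a hypothetical name chosen for this illustration.

```python
import math

def gaussian_kernel(t):
    """Standard normal density K(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def product_kde(x, samples, bandwidths):
    """f_hat(x) = 1/N * sum_i prod_j K_{h_j}(x_j - x_ij), with K_h(t) = K(t/h)/h."""
    total = 0.0
    for s in samples:
        prod = 1.0
        for xj, sj, hj in zip(x, s, bandwidths):
            prod *= gaussian_kernel((xj - sj) / hj) / hj  # one factor per dimension
        total += prod
    return total / len(samples)

# Illustrative 2-D data with a wider bandwidth in the second dimension.
points = [(0.0, 0.0), (1.0, 2.0), (0.5, 1.0)]
value = product_kde((0.5, 1.0), points, bandwidths=(0.5, 1.0))
```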
Example: color clusters
- Cluster shapes are irregular.
- Cluster boundaries are not well defined.
[Figure] From Comaniciu and Meer, "Mean shift: a robust approach toward feature space analysis"

Convergence - KDE
[Figures: estimation using a Gaussian kernel; estimation using a uniform kernel]
Convergence - KDE (continued)
[Figures: estimation using a Gaussian kernel; estimation using a uniform kernel]

Scale selection
- Important problem. Large literature.
- Small $h$ results in ragged densities.
- Large $h$ results in oversmoothing.
- The best choice for $h$ depends on the number of samples:
  - small $n$: wide kernels
  - large $n$: narrow kernels

  $\lim_{n \to \infty} h(n) = 0$
Optimal scale
- The optimal kernel and optimal scale can be obtained by minimizing the mean integrated square error, if we know the density!
- Normal reference rule:

  $h_{opt} = (4/3)^{1/5}\, \sigma\, n^{-1/5} \approx 1.06\, \hat{\sigma}\, n^{-1/5}$

Scale selection
[Figure]
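As a small worked example of the normal reference rule, the sketch below plugs the sample standard deviation in for $\hat{\sigma}$. Using the unbiased (n - 1) variance estimate is an assumption of this sketch; the slides do not say which estimator is meant.

```python
import math

def normal_reference_bandwidth(samples):
    """h = (4/3)^(1/5) * sigma_hat * n^(-1/5), roughly 1.06 * sigma_hat * n^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)  # assumed estimator
    sigma_hat = math.sqrt(variance)
    return (4.0 / 3.0) ** 0.2 * sigma_hat * n ** (-0.2)

h = normal_reference_bandwidth([0.0, 1.0, 2.0, 3.0, 4.0])
```

Since the bandwidth scales as $n^{-1/5}$, it shrinks slowly as more samples arrive, consistent with $h(n) \to 0$.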
[Figure] From R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2000.

Density Estimation (recap)
- Parametric: Assume a single model for $p(x \mid C_i)$ (Chapter 4 and 5)
- Semiparametric: $p(x \mid C_i)$ is a mixture of densities
  Multiple possible explanations/prototypes: different handwriting styles, accents in speech
- Nonparametric: No model; data speaks for itself (Chapter 8)
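Mixture Densities

  $p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$

- where $G_i$ are the components/groups/clusters, $P(G_i)$ the mixture proportions (priors), and $p(x \mid G_i)$ the component densities
- Gaussian mixture, where $p(x \mid G_i) \sim N(\mu_i, \Sigma_i)$, with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$
- unlabeled sample $X = \{x^t\}_t$ (unsupervised learning)

Classes vs. Clusters
Supervised: $X = \{x^t, r^t\}_t$
- Classes $C_i$, $i = 1, \dots, K$

  $p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)$

- where $p(x \mid C_i) \sim N(\mu_i, \Sigma_i)$
- $\Phi = \{P(C_i), \mu_i, \Sigma_i\}_{i=1}^{K}$

  $\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad m_i = \frac{\sum_t r_i^t x^t}{\sum_t r_i^t}, \qquad S_i = \frac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}$

Unsupervised: $X = \{x^t\}_t$
- Clusters $G_i$, $i = 1, \dots, k$

  $p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$

- where $p(x \mid G_i) \sim N(\mu_i, \Sigma_i)$
- $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$
- Labels $r_i^t$ ?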
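For the supervised case, the three estimates $\hat{P}(C_i)$, $m_i$ and $S_i$ are plain per-class averages. The sketch below assumes integer class labels 0..K-1 in place of the indicator vectors $r^t$, and assumes every class has at least one sample; `class_estimates` is a hypothetical name.

```python
def class_estimates(xs, labels, num_classes):
    """Return (priors, means, covariances), one entry per class.

    labels[t] == i plays the role of r_i^t == 1 (an assumption of this sketch).
    """
    n, d = len(xs), len(xs[0])
    priors, means, covs = [], [], []
    for i in range(num_classes):
        members = [x for x, r in zip(xs, labels) if r == i]
        count = len(members)                      # sum_t r_i^t
        priors.append(count / n)                  # P_hat(C_i)
        m = [sum(x[j] for x in members) / count for j in range(d)]
        s = [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in members) / count
              for b in range(d)] for a in range(d)]
        means.append(m)
        covs.append(s)
    return priors, means, covs

xs = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
priors, means, covs = class_estimates(xs, labels=[0, 0, 1], num_classes=2)
```

In the unsupervised case the labels are unknown, so these averages cannot be computed directly; that is the gap the clustering methods on the following slides fill.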
k-means Clustering
- Find $k$ reference vectors (prototypes/codebook vectors/codewords) which best represent the data
- Reference vectors $m_j$, $j = 1, \dots, k$
- Use the nearest (most similar) reference:

  $\|x^t - m_i\| = \min_j \|x^t - m_j\|$

- Reconstruction error:

  $E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \|x^t - m_i\|^2$

  $b_i^t = 1$ if $\|x^t - m_i\| = \min_j \|x^t - m_j\|$, and $0$ otherwise

Encoding/Decoding
[Figure]
  $b_i^t = 1$ if $\|x^t - m_i\| = \min_j \|x^t - m_j\|$, and $0$ otherwise
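A minimal sketch of the usual alternating procedure that reduces the reconstruction error: assign each point to its nearest reference vector, then move each reference vector to the mean of its points. The initialization (k distinct data points chosen at random), the stopping rule and all names are assumptions of this sketch, not taken from the slides.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Return (means, labels, reconstruction_error)."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]  # assumed initialization
    labels = None
    for _ in range(iterations):
        # Encoding step: b_i^t = 1 for the nearest reference vector.
        new_labels = [min(range(k), key=lambda i: squared_distance(p, means[i]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each reference vector becomes the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                means[i] = [sum(c) / len(members) for c in zip(*members)]
    error = sum(squared_distance(p, means[l]) for p, l in zip(points, labels))
    return means, labels, error

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels, error = kmeans(points, k=2)
```

The result depends on the initialization in general; the toy data here is separated widely enough that any starting pair leads to the same two groups.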
k-means Clustering
[Figure slides]
[Figure panels: image; clusters on intensity; clusters on color]
K-means clustering using intensity alone and color alone. K=5; the segmented image is labeled with cluster means.

[Figure panels: image; clusters on color]
K-means using color alone, 11 segments.
[Figure] K-means using color alone, 11 segments.

[Figure] K-means using color and position, 20 segments.
Hierarchical Clustering
- Cluster based on similarities/distances
- Distance measure between instances $x^r$ and $x^s$
- Minkowski ($L_p$) distance (Euclidean for $p = 2$):

  $d_m(x^r, x^s) = \left[ \sum_{j=1}^{d} |x_j^r - x_j^s|^p \right]^{1/p}$

- City-block distance:

  $d_{cb}(x^r, x^s) = \sum_{j=1}^{d} |x_j^r - x_j^s|$

Hierarchical Clustering:
- Agglomerative clustering: clustering by merging, bottom-up
  - Each data point is assumed to be a cluster
  - Recursively merge clusters
  - Algorithm:
    - Make each point a separate cluster
    - Until the clustering is satisfactory:
      - Merge the two clusters with the smallest inter-cluster distance
- Divisive clustering: clustering by splitting, top-down
  - The entire data set is regarded as a cluster
  - Recursively split clusters
  - Algorithm:
    - Construct a single cluster containing all points
    - Until the clustering is satisfactory:
      - Split the cluster that yields the two components with the largest inter-cluster distance
Hierarchical Clustering:
Two main issues:
- What is a good inter-cluster distance?
  - single-link clustering: distance between the closest elements -> extended clusters
  - complete-link clustering: the maximum distance between elements -> rounded clusters
  - group-average clustering: average distance between elements -> rounded clusters
- How many clusters are there (model selection)?
- Dendrograms yield a picture of the output as the clustering process continues

Agglomerative Clustering
- Start with $N$ groups, each with one instance, and merge the two closest groups at each iteration
- Distance between two groups $G_i$ and $G_j$:
  - Single-link:

    $d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$

  - Complete-link:

    $d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$

  - Average-link, centroid
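The agglomerative procedure and the three inter-cluster distances can be sketched as below. "Until the clustering is satisfactory" is replaced here by a target number of clusters, which is an assumption of this sketch; the quadratic search over group pairs is kept deliberately naive.

```python
def minkowski(a, b, p=2):
    """L_p distance; p=2 is Euclidean, p=1 is city-block."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

def linkage_distance(group_a, group_b, link="single"):
    pairwise = [minkowski(a, b) for a in group_a for b in group_b]
    if link == "single":
        return min(pairwise)              # closest elements
    if link == "complete":
        return max(pairwise)              # farthest elements
    return sum(pairwise) / len(pairwise)  # group average

def agglomerative(points, num_clusters, link="single"):
    groups = [[p] for p in points]        # each point starts as its own cluster
    while len(groups) > num_clusters:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = linkage_distance(groups[i], groups[j], link)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]  # merge the two closest groups
        del groups[j]
    return groups

groups = agglomerative([(0,), (1,), (2,), (10,), (11,)], num_clusters=2)
```

Recording the merge distance at each step would give the heights needed to draw a dendrogram.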
Example: Single-Link Clustering
[Figure: dendrogram]

Choosing k
- Defined by the application, e.g., image quantization
- Plot data (after PCA) and check for clusters
- Incremental (leader-cluster) algorithm: add one at a time until the "elbow" (reconstruction error/log likelihood/intergroup distances)
- Manual check for meaning
Mean Shift
- Given a sample $S = \{s_i : s_i \in R^n\}$ and a kernel $K$, the sample mean using $K$ at point $x$ is:

  $m(x) = \frac{\sum_i s_i K(s_i - x)}{\sum_i K(s_i - x)}$

- Iteration of the form $x \leftarrow m(x)$ will lead to the local mode of the density
- Let $x$ be the center of the window
- Iterate until convergence:
  - Compute the sample mean $m(x)$ from the samples inside the window.
  - Replace $x$ with $m(x)$
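The iteration above, with a flat kernel (every sample inside the window counts equally), can be sketched in one dimension as follows. The window half-width `h`, the tolerance and the function name are assumptions made for this illustration.

```python
def mean_shift_flat(x, samples, h, tol=1e-9, max_iter=100):
    """Move a window of half-width h centered at x to the mean of the samples inside it."""
    for _ in range(max_iter):
        inside = [s for s in samples if abs(s - x) <= h]
        if not inside:
            return x                      # empty window: nothing to shift toward
        m = sum(inside) / len(inside)     # sample mean m(x) under a flat kernel
        if abs(m - x) < tol:
            return m                      # stationary point: m(x) = x
        x = m                             # replace x with m(x)
    return x

samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mode_left = mean_shift_flat(0.5, samples, h=1.0)
mode_right = mean_shift_flat(5.6, samples, h=1.0)
```

Starting points that end at the same mode can be grouped together, which is how the iteration is used for clustering.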
Mean Shift
- Given a sample $S = \{s_i : s_i \in R^n\}$ and a kernel $K$, the sample mean using $K$ at point $x$ is:

  $m(x) = \frac{\sum_i K(s_i - x)\, s_i}{\sum_i K(s_i - x)}$

- Fukunaga and Hostetler 1975 introduced the mean shift as the difference $m(x) - x$ using a flat kernel.
- Iteration of the form $x \leftarrow m(x)$ will lead to the density mode
- Cheng 1995 generalized the definition using general kernels and weighted data:

  $m(x) = \frac{\sum_i K(s_i - x)\, w(s_i)\, s_i}{\sum_i K(s_i - x)\, w(s_i)}$

- Recently popularized by D. Comaniciu and P. Meer 99+
- Applications: clustering [Cheng, Fu 85], image filtering, segmentation [Meer 99] and tracking [Meer 00].

Mean Shift
- Iterations of the form $x \leftarrow m(x)$ are called the mean shift algorithm.
- If $K$ is a Gaussian (e.g.), the density estimate using $K$ is:

  $\hat{P}(x) = C \sum_i K(x - s_i)\, w(s_i)$

- Using the Gaussian kernel $K_\sigma(x)$, the derivative is:

  $\nabla K_\sigma(x) = -\frac{x}{\sigma^2} K_\sigma(x)$

- We can show that:

  $m(x) - x = \sigma^2\, \frac{\nabla \hat{P}(x)}{\hat{P}(x)}$

- The mean shift is in the gradient direction of the density estimate.
Mean Shift
- The mean shift is in the gradient direction of the density estimate.
- Successive iterations converge to a local maximum of the density, i.e., a stationary point: $m(x) = x$.
- Mean shift is a steepest-ascent-like procedure with variable-size steps that leads to fast convergence: "well-adjusted steepest ascent".
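The claim that the mean shift points along the gradient of the density estimate can be checked numerically. The sketch below assumes a one-dimensional Gaussian kernel with weighted samples and takes the constant $C$ as 1; it compares $m(x) - x$ against $\sigma^2 \nabla \hat{P}(x) / \hat{P}(x)$, with the gradient approximated by a central difference. The data values are made up.

```python
import math

def gaussian(t, sigma):
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def density(x, samples, weights, sigma):
    """P_hat(x) = sum_i K_sigma(x - s_i) w(s_i), with the constant C taken as 1."""
    return sum(w * gaussian(x - s, sigma) for s, w in zip(samples, weights))

def weighted_mean(x, samples, weights, sigma):
    """m(x) = sum_i K(s_i - x) w(s_i) s_i / sum_i K(s_i - x) w(s_i)."""
    k = [w * gaussian(s - x, sigma) for s, w in zip(samples, weights)]
    return sum(ki * s for ki, s in zip(k, samples)) / sum(k)

samples = [0.0, 0.5, 1.5, 4.0]
weights = [1.0, 2.0, 1.0, 0.5]
sigma, x, eps = 0.8, 1.0, 1e-5

grad = (density(x + eps, samples, weights, sigma)
        - density(x - eps, samples, weights, sigma)) / (2.0 * eps)
mean_shift = weighted_mean(x, samples, weights, sigma) - x
scaled_gradient = sigma ** 2 * grad / density(x, samples, weights, sigma)
```

The division by $\hat{P}(x)$ is what gives the variable step size: steps are large in low-density regions and shrink near a mode.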
Mean shift and Image Filtering
Discontinuity-preserving smoothing
- Recall: average or Gaussian filters blur images and do not preserve region boundaries.
Mean shift application:
- Represent each pixel $x$ as a spatial location $x^s$ and a range $x^r$ (color, intensity)
- Look for modes in the joint spatial-range space
- Use a product of two kernels: a spatial kernel with bandwidth $h_s$ and a range kernel with bandwidth $h_r$:

  $K_{h_s, h_r}(x) = k\!\left(\frac{x^s}{h_s}\right) k\!\left(\frac{x^r}{h_r}\right)$

- Algorithm:
  - For each pixel $x_i = (x_i^s, x_i^r)$ apply mean shift until convergence. Let the convergence point be $(y_i^s, y_i^r)$
  - Assign $z_i = (x_i^s, y_i^r)$ as the filter output
- Results: see the paper.
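A minimal sketch of the filtering algorithm on a one-dimensional "image" (a list of intensities), assuming a Gaussian profile for both the spatial and the range kernel. The names, the tolerance and the toy signal are assumptions of this illustration.

```python
import math

def profile(t):
    """Gaussian profile used for both kernels (an assumption of this sketch)."""
    return math.exp(-0.5 * t * t)

def mean_shift_filter(signal, hs, hr, max_iter=100, tol=1e-4):
    """For each pixel, run mean shift in the joint (position, intensity) space
    and output the converged intensity y^r at the original position x^s."""
    positions = range(len(signal))
    output = []
    for i, value in enumerate(signal):
        ys, yr = float(i), float(value)
        for _ in range(max_iter):
            w = [profile((j - ys) / hs) * profile((v - yr) / hr)
                 for j, v in zip(positions, signal)]
            total = sum(w)
            new_ys = sum(wi * j for wi, j in zip(w, positions)) / total
            new_yr = sum(wi * v for wi, v in zip(w, signal)) / total
            shift = abs(new_ys - ys) + abs(new_yr - yr)
            ys, yr = new_ys, new_yr
            if shift < tol:
                break
        output.append(yr)  # z_i = (x_i^s, y_i^r)
    return output

# Noisy step edge: low values followed by high values.
signal = [0.0, 0.1, -0.1, 0.05, 0.0, 10.0, 10.1, 9.9, 10.05, 10.0]
filtered = mean_shift_filter(signal, hs=2.0, hr=1.0)
```

Pixels on either side of the step are smoothed toward their own side only, because the range kernel gives almost no weight to intensities far from the current estimate.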
Graph Cut
What is a graph cut:
- We have an undirected, weighted graph $G = (V, E)$
- Remove a subset of edges to partition the graph into two disjoint sets of vertices $A$, $B$ (two subgraphs):

  $A \cup B = V, \qquad A \cap B = \emptyset$

Graph Cut
- Each cut corresponds to some cost (cut): the sum of the weights of the edges that have been removed.

  $cut(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$
Graph Cut
- In many applications it is desired to find the cut with minimum cost: the minimum cut
- Well-studied problem in graph theory, with many applications
- There exist efficient algorithms for finding minimum cuts

  $cut(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$

Graph theoretic clustering
- Represent tokens using a weighted graph
- Weights reflect similarity between tokens: the affinity matrix
- Cut up this graph to get subgraphs such that:
  - Similarity within sets is maximum.
  - Similarity between sets is minimum.
  - Minimum cut
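To make the cut cost concrete, the sketch below computes $cut(A, B)$ for a small weighted graph and finds the minimum cut by trying every two-way partition. The exhaustive search is only for illustration on tiny graphs and is not one of the efficient algorithms the slide refers to; the graph and names are made up.

```python
from itertools import combinations

def cut_cost(weights, A, B):
    """Sum of w(u, v) over edges with u in A and v in B (undirected edges)."""
    return sum(weights.get((u, v), weights.get((v, u), 0.0)) for u in A for v in B)

def brute_force_min_cut(vertices, weights):
    """Try every partition into two non-empty sets; exponential in |V|."""
    best = None
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            A = set(subset)
            B = set(vertices) - A
            cost = cut_cost(weights, A, B)
            if best is None or cost < best[0]:
                best = (cost, A, B)
    return best

# Two tightly linked pairs joined by one weak edge (hypothetical weights).
vertices = ["a", "b", "c", "d"]
weights = {("a", "b"): 5.0, ("c", "d"): 5.0, ("b", "c"): 1.0}
cost, A, B = brute_force_min_cut(vertices, weights)
```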
Use an exponential function for edge weights:

  $w(x) = e^{-(d(x)/\sigma)^2}$

where $d(x)$ is the feature distance.
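A short sketch of building an affinity matrix with this weight function, assuming Euclidean distance between feature vectors as the feature distance $d$. The points and the two values of $\sigma$ are illustrative only.

```python
import math

def affinity_matrix(points, sigma):
    """A[i][j] = exp(-(d(p_i, p_j) / sigma)^2), with d assumed Euclidean."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return [[math.exp(-(dist(p, q) / sigma) ** 2) for q in points] for p in points]

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0)]
A_small = affinity_matrix(points, sigma=0.2)  # only very close points are similar
A_large = affinity_matrix(points, sigma=1.0)  # distant points still have affinity
```

With a small $\sigma$ the far point is effectively disconnected from the other two, while a large $\sigma$ makes all three points look related; this is the scale effect the next slide illustrates.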
Scale affects affinity

  $w(x) = e^{-(d(x)/\sigma)^2}$

[Figure panels: σ=0.1, σ=0.2, σ=1]

Eigenvectors and clustering
- Simplest idea: we want a vector $w_n$ giving the association between each element and a cluster
- We want elements within this cluster to, on the whole, have strong affinity with one another
- We could maximize

  $w_n^T A w_n = \sum_{i,j} w_n(i)\, A(i,j)\, w_n(j)$

  a sum of terms of the form: (association of element $i$ with cluster $n$) × (affinity between $i$ and $j$) × (association of element $j$ with cluster $n$)
Eigenvectors and clustering
- We could maximize $w_n^T A w_n$
- But we need the constraint $w_n^T w_n = 1$
- Using a Lagrange multiplier $\lambda$:

  $w_n^T A w_n + \lambda (w_n^T w_n - 1)$

- Differentiation gives:

  $A w_n = \lambda w_n$

- This is an eigenvalue problem: choose the eigenvector of $A$ with the largest eigenvalue

Example eigenvector
[Figure panels: points; eigenvector; matrix]
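The eigenvector with the largest eigenvalue can be approximated by power iteration, sketched below in plain Python. Power iteration is a choice made for this illustration (the slides do not prescribe a solver), and the block-structured affinity matrix is hypothetical.

```python
import math

def leading_eigenvector(A, iterations=200):
    """Power iteration: repeatedly apply A and renormalize so that w^T w = 1."""
    n = len(A)
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    # Rayleigh quotient w^T A w is the quantity being maximized.
    eigenvalue = sum(w[i] * sum(A[i][j] * w[j] for j in range(n)) for i in range(n))
    return eigenvalue, w

# Elements 0 and 1 have strong mutual affinity; 2 and 3 a weaker one.
A = [[1.0, 0.9, 0.0, 0.0],
     [0.9, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
eigenvalue, w = leading_eigenvector(A)
```

The large components of `w` mark the elements of the most tightly connected group, which is the sense in which the eigenvector encodes cluster association.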
Example eigenvector
[Figure panels: points; matrix; first eigenvectors]
The three eigenvectors corresponding to the next three eigenvalues of the affinity matrix

[Figure panels: "Too many clusters!"; "More obvious clusters"]
Eigenvalues for three different scales for the affinity matrix
More than two segments
Two options:
- Recursively split each side to get a tree, continuing until the eigenvalues are too small
- Use the other eigenvectors

Algorithm:
- Construct an affinity matrix $A$
- Compute the eigenvalues and eigenvectors of $A$
- Until there are sufficient clusters:
  - Take the eigenvector corresponding to the largest unprocessed eigenvalue; zero all components for elements already clustered, and threshold the remaining components to determine which elements belong to this cluster (you can choose a threshold by clustering the components, or use a fixed threshold).
  - If all elements are accounted for, there are sufficient clusters

We can end up with eigenvectors that do not split clusters, because any linear combination of eigenvectors with the same eigenvalue is also an eigenvector.
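The algorithm above can be sketched without a linear algebra library by obtaining eigenvectors one at a time: power iteration finds the current largest eigenvalue, and deflation (subtracting $\lambda w w^T$) removes it so the next call finds the next one. Power iteration with deflation, the fixed threshold of 0.2, and all names are assumptions of this sketch rather than the method on the slide, which simply computes the full eigendecomposition first.

```python
import math
import random

def power_iteration(A, iterations=500, seed=0):
    rng = random.Random(seed)
    n = len(A)
    w = [rng.random() + 0.1 for _ in range(n)]  # random positive start vector
    for _ in range(iterations):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    lam = sum(w[i] * sum(A[i][j] * w[j] for j in range(n)) for i in range(n))
    return lam, w

def eigenvector_clusters(A, threshold=0.2, max_clusters=10):
    n = len(A)
    work = [row[:] for row in A]
    unassigned = set(range(n))
    clusters = []
    while unassigned and len(clusters) < max_clusters:
        lam, w = power_iteration(work)  # largest unprocessed eigenvalue
        # Ignore already-clustered elements, then threshold the rest.
        members = [i for i in sorted(unassigned) if abs(w[i]) > threshold]
        if not members:
            break
        clusters.append(members)
        unassigned -= set(members)
        # Deflate: remove this eigenpair so the next one becomes the largest.
        work = [[work[i][j] - lam * w[i] * w[j] for j in range(n)] for i in range(n)]
    return clusters

A = [[1.0, 0.9, 0.0, 0.0, 0.0],
     [0.9, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
clusters = eigenvector_clusters(A)
```

The toy matrix has distinct leading eigenvalues, so each eigenvector isolates one block. When eigenvalues coincide, the caveat on the slide applies and the recovered eigenvectors may mix clusters.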
Sources
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2000.
- Ethem Alpaydin. Introduction to Machine Learning. Chapter 7.
- Forsyth and Ponce. Computer Vision: A Modern Approach. Chapter 14: 14.1, 14.2, 14.4.
- Slides by D. Forsyth @ Berkeley
- Slides by Ethem Alpaydin