Unsupervised Learning and Other Neural Networks


CSE 53 Soft Computing (NOT PART OF THE FINAL)

Topics: Introduction; Mixture Densities and Identifiability; ML Estimates; Application to Normal Mixtures; Other Neural Networks.

Introduction

Previously, all our training samples were labeled: these samples were said to be "supervised".
We now investigate a number of "unsupervised" procedures that use unlabeled samples.
Collecting and labeling a large set of sample patterns can be costly.
We can train with large amounts of (less expensive) unlabeled data, and only then use supervision to label the groupings found. This is appropriate for large data-mining applications where the contents of a large database are not known beforehand.

This is also appropriate in many applications where the characteristics of the patterns can change slowly with time (such as automated food classification as the seasons change).
Improved performance can be achieved if classifiers running in an unsupervised mode are used.
We can use unsupervised methods to identify features (through clustering) that will then be useful for categorization (or classification).
We gain some insight into the nature (or structure) of the data.

Mixture Densities and Identifiability

We shall begin with the assumption that the functional forms of the underlying probability densities are known, and that the only thing that must be learned is the value of an unknown parameter vector.
We make the following assumptions:
The samples come from a known number c of classes.
The prior probabilities $P(\omega_j)$ for each class are known ($j = 1, \dots, c$).
The class-conditional densities $p(x \mid \omega_j, \theta_j)$ ($j = 1, \dots, c$) are known in form but might be different.
The values of the c parameter vectors $\theta_1, \theta_2, \dots, \theta_c$ are unknown.

The category labels are unknown.

$$p(x \mid \theta) = \sum_{j=1}^{c} \underbrace{p(x \mid \omega_j, \theta_j)}_{\text{component densities}}\; \underbrace{P(\omega_j)}_{\text{mixing parameters}}, \qquad \theta = (\theta_1, \theta_2, \dots, \theta_c)^t$$

This density function is called a mixture density.
Our goal will be to use samples drawn from this mixture density to estimate the unknown parameter vector $\theta$.
Once $\theta$ is known, we can decompose the mixture into its components and use a MAP classifier on the derived densities.

Definition
A density $p(x \mid \theta)$ is said to be identifiable (i.e., the mapping $\theta \mapsto p(x \mid \theta)$ is injective) if $\theta \neq \theta'$ implies that there exists an $x$ such that $p(x \mid \theta) \neq p(x \mid \theta')$.

As a simple example, consider the case where x is binary and $P(x \mid \theta)$ is the following mixture:

$$P(x \mid \theta) = \tfrac{1}{2}\,\theta_1^x (1-\theta_1)^{1-x} + \tfrac{1}{2}\,\theta_2^x (1-\theta_2)^{1-x}
= \begin{cases} \tfrac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 1 \\[2pt] 1 - \tfrac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 0 \end{cases}$$

Assume that $P(x=1 \mid \theta) = 0.6$ and $P(x=0 \mid \theta) = 0.4$; by substituting these probability values, we obtain $\theta_1 + \theta_2 = 1.2$.
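The constraint $\theta_1 + \theta_2 = 1.2$ leaves the individual parameters undetermined. A minimal numerical sketch of this (Python with NumPy assumed; the specific parameter pairs below are illustrative, not taken from the lecture):

```python
import numpy as np

def binary_mixture_pmf(x, theta1, theta2):
    """P(x | theta) = 1/2 * theta1^x (1-theta1)^(1-x) + 1/2 * theta2^x (1-theta2)^(1-x)."""
    return 0.5 * theta1**x * (1 - theta1)**(1 - x) + 0.5 * theta2**x * (1 - theta2)**(1 - x)

# Several distinct parameter vectors satisfying theta1 + theta2 = 1.2 ...
for theta1, theta2 in [(0.6, 0.6), (0.4, 0.8), (0.2, 1.0)]:
    # ... all produce exactly the same distribution over x, so no amount of
    # unlabeled data can distinguish them: the mixture is unidentifiable.
    print(theta1, theta2, binary_mixture_pmf(1, theta1, theta2), binary_mixture_pmf(0, theta1, theta2))
```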

Thus, we have a case in which the mixture distribution is completely unidentifiable, and therefore unsupervised learning is impossible.
For discrete distributions, if there are too many components in the mixture, there may be more unknowns than independent equations, and identifiability can become a serious problem!
While it can be shown that mixtures of normal densities are usually identifiable, the parameters in the simple mixture density

$$p(x \mid \theta) = \frac{P(\omega_1)}{\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(x - \theta_1)^2\right] + \frac{P(\omega_2)}{\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(x - \theta_2)^2\right]$$

cannot be uniquely identified if $P(\omega_1) = P(\omega_2)$ (we cannot recover a unique $\theta$ even from an infinite amount of data!).
$\theta = (\theta_1, \theta_2)$ and $\theta' = (\theta_2, \theta_1)$ are two possible vectors that can be interchanged without affecting $p(x \mid \theta)$.
Identifiability can be a problem; we always assume that the densities we are dealing with are identifiable!
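A two-line check of this label-swapping symmetry (a sketch assuming equal priors of 0.5, unit variances, and arbitrary illustrative values for the means and the query point):

```python
import numpy as np

def equal_prior_mixture(x, t1, t2):
    """Two-component normal mixture with equal priors P(w1) = P(w2) = 0.5 and unit variance."""
    return 0.5 / np.sqrt(2 * np.pi) * (np.exp(-0.5 * (x - t1)**2) + np.exp(-0.5 * (x - t2)**2))

x = 0.7
# Swapping (theta1, theta2) leaves the density unchanged, so theta cannot be recovered uniquely.
print(equal_prior_mixture(x, -1.0, 2.0), equal_prior_mixture(x, 2.0, -1.0))
```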

ML Estimates

Suppose that we have a set $D = \{x_1, \dots, x_n\}$ of n unlabeled samples drawn independently from the mixture density ($\theta$ is fixed but unknown!)

$$p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j).$$

The maximum-likelihood estimate is

$$\hat{\theta} = \arg\max_{\theta}\, p(D \mid \theta), \qquad p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta).$$

The gradient of the log-likelihood $l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$ is:

$$\nabla_{\theta_i}\, l = \sum_{k=1}^{n} P(\omega_i \mid x_k, \theta)\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \theta_i).$$

Since the gradient must vanish at the value of $\theta$ that maximizes $l$, the ML estimate $\hat{\theta}$ must satisfy the conditions

$$\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\theta})\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0 \qquad (i = 1, \dots, c).$$

By including the prior probabilities as unknown variables, we can finally show that:

$$\hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})
\qquad \text{and} \qquad
\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0,$$

where

$$\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \frac{p(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{P}(\omega_i)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{P}(\omega_j)}.$$

This equation enables clustering.
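The posterior $\hat{P}(\omega_i \mid x_k, \hat{\theta})$ above is what groups samples into clusters. A minimal sketch of computing it for a 1-D Gaussian mixture (NumPy assumed; the function name, means, priors, and data points are illustrative placeholders):

```python
import numpy as np

def posteriors(x, means, priors):
    """P_hat(w_i | x_k) for a 1-D Gaussian mixture with unit-variance components.
    x: (n,) samples; means: (c,) component means; priors: (c,) mixing parameters."""
    x = np.asarray(x)[:, None]                                    # shape (n, 1)
    lik = np.exp(-0.5 * (x - means) ** 2) / np.sqrt(2 * np.pi)    # (n, c) component densities
    joint = lik * priors                                          # p(x_k | w_i, theta_i) P(w_i)
    return joint / joint.sum(axis=1, keepdims=True)               # normalize over the components

# Illustrative call: two components, a handful of samples.
post = posteriors([-2.1, -1.8, 1.9, 2.3], means=np.array([-2.0, 2.0]), priors=np.array([0.5, 0.5]))
print(post)  # each row sums to 1; the largest entry indicates the most probable cluster
```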

Applications to Normal Mixtures

$$p(x \mid \omega_i, \theta_i) \sim N(\mu_i, \Sigma_i)$$

Three cases are distinguished, depending on which of $\mu_i$, $\Sigma_i$, $P(\omega_i)$, and c are unknown:
Case 1 (simplest case): only the mean vectors $\mu_i$ are unknown; $\Sigma_i$, $P(\omega_i)$, and c are known.
Case 2: $\mu_i$, $\Sigma_i$, and $P(\omega_i)$ are unknown; c is known.
Case 3: all parameters, including the number of classes c, are unknown.

Case 1: Unknown mean vectors

$\mu_i = \theta_i$, $i = 1, \dots, c$

$$\ln p(x \mid \omega_i, \mu_i) = -\ln\!\left[(2\pi)^{d/2} |\Sigma_i|^{1/2}\right] - \tfrac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i)$$

The ML estimate $\hat{\mu}_i$ of $\mu_i$ is:

$$\hat{\mu}_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})} \qquad (1)$$

This is a weighted average of the samples coming from the i-th class: $P(\omega_i \mid x_k, \hat{\mu})$ is the fraction of the sample $x_k$ that comes from the i-th class, and $\hat{\mu}_i$ is the average of the samples coming from the i-th class.

Unfortunately, equation (1) does not give $\hat{\mu}_i$ explicitly.
However, if we have some way of obtaining good initial estimates $\hat{\mu}_i(0)$ for the unknown means, equation (1) can be seen as an iterative process for improving the estimates:

$$\hat{\mu}_i(j+1) = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu}(j))\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu}(j))}$$

This is a gradient ascent for maximizing the log-likelihood function.

Example (in class): Consider the simple two-component one-dimensional normal mixture

$$p(x \mid \mu_1, \mu_2) = \frac{1}{3\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(x - \mu_1)^2\right] + \frac{2}{3\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(x - \mu_2)^2\right] \qquad (2 \text{ clusters!})$$

Let us draw 25 samples sequentially from this mixture (see the table of samples, p. 54, with the class $\omega$ of each point), with $\mu_1 = -2$ and $\mu_2 = 2$.
The log-likelihood function is:

$$l(\mu_1, \mu_2) = \sum_{k=1}^{25} \ln p(x_k \mid \mu_1, \mu_2)$$
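A minimal sketch of this fixed-point iteration on synthetic data (NumPy assumed; the samples are generated here rather than taken from the table referred to above, so the resulting estimates will not match the numbers quoted below exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 25 synthetic samples from the mixture (1/3) N(-2, 1) + (2/3) N(2, 1).
labels = rng.random(25) < 1.0 / 3.0
x = np.where(labels, rng.normal(-2.0, 1.0, 25), rng.normal(2.0, 1.0, 25))

mu = np.array([-0.1, 0.1])                   # crude initial estimates mu_hat(0)
priors = np.array([1.0 / 3.0, 2.0 / 3.0])    # known mixing parameters (Case 1)

for _ in range(50):
    # Posteriors P(w_i | x_k, mu_hat(j)) for unit-variance normal components.
    lik = np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    post = lik * priors
    post /= post.sum(axis=1, keepdims=True)
    # Equation (1), applied iteratively: weighted average of the samples.
    mu = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)

print(mu)  # should approach the true means (-2, 2), up to sampling noise
```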

The maximum value of l occurs at $\hat{\mu}_1 = -2.130$ and $\hat{\mu}_2 = 1.668$ (which are not far from the true values $\mu_1 = -2$ and $\mu_2 = +2$).
There is another peak at $\hat{\mu}_1 = 2.085$ and $\hat{\mu}_2 = -1.257$ which has almost the same height, as can be seen from the plot of the log-likelihood surface $l(\mu_1, \mu_2)$.

This mixture of normal densities is identifiable.
When the mixture density is not identifiable, the ML solution is not unique.

Case 2: All parameters unknown

No constraints are placed on the covariance matrix. Let $p(x \mid \mu, \sigma^2)$ be the two-component normal mixture:

$$p(x \mid \mu, \sigma^2) = \frac{1}{2\sqrt{2\pi}\,\sigma} \exp\!\left[-\tfrac{1}{2}\!\left(\frac{x - \mu}{\sigma}\right)^2\right] + \frac{1}{2\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2} x^2\right]$$

Suppose $\mu = x_1$; therefore:

$$p(x_1 \mid \mu, \sigma^2) = \frac{1}{2\sqrt{2\pi}\,\sigma} + \frac{1}{2\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2} x_1^2\right]$$

For the rest of the samples:

$$p(x_k \mid \mu, \sigma^2) \geq \frac{1}{2\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2} x_k^2\right], \qquad k = 2, \dots, n$$

Finally,

$$p(x_1, \dots, x_n \mid \mu, \sigma^2) \geq \left\{\frac{1}{\sigma} + \exp\!\left[-\tfrac{1}{2} x_1^2\right]\right\} \frac{1}{\left(2\sqrt{2\pi}\right)^{n}} \exp\!\left[-\tfrac{1}{2} \sum_{k=2}^{n} x_k^2\right],$$

and this lower bound $\to \infty$ as $\sigma \to 0$. The likelihood is therefore arbitrarily large, and the maximum-likelihood solution becomes singular.
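A small numerical sketch of this degeneracy (assuming NumPy and a few arbitrary illustrative samples): placing one component's mean on a data point and shrinking $\sigma$ makes the likelihood blow up.

```python
import numpy as np

def mixture_likelihood(data, mu, sigma):
    """Likelihood of the two-component mixture 0.5*N(mu, sigma^2) + 0.5*N(0, 1)."""
    comp1 = np.exp(-0.5 * ((data - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    comp2 = np.exp(-0.5 * data ** 2) / np.sqrt(2 * np.pi)
    return np.prod(0.5 * comp1 + 0.5 * comp2)

data = np.array([0.9, -0.3, 1.7, 0.2])   # arbitrary illustrative samples
for sigma in [1.0, 0.1, 0.01, 0.001]:
    # mu pinned to the first sample: the likelihood grows without bound as sigma -> 0.
    print(sigma, mixture_likelihood(data, mu=data[0], sigma=sigma))
```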

Adding an assumption

Consider the largest of the finite local maxima of the likelihood function and use ML estimation there. We obtain the following iterative scheme:

$$\hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})$$

$$\hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\, x_k}{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}$$

$$\hat{\Sigma}_i = \frac{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})\,(x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^t}{\sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})}$$

where

$$\hat{P}(\omega_i \mid x_k, \hat{\theta}) = \frac{|\hat{\Sigma}_i|^{-1/2} \exp\!\left[-\tfrac{1}{2}(x_k - \hat{\mu}_i)^t \hat{\Sigma}_i^{-1}(x_k - \hat{\mu}_i)\right] \hat{P}(\omega_i)}{\sum_{j=1}^{c} |\hat{\Sigma}_j|^{-1/2} \exp\!\left[-\tfrac{1}{2}(x_k - \hat{\mu}_j)^t \hat{\Sigma}_j^{-1}(x_k - \hat{\mu}_j)\right] \hat{P}(\omega_j)}$$

K-Means Clustering

Goal: find the c mean vectors $\mu_1, \mu_2, \dots, \mu_c$.
Replace the squared Mahalanobis distance $(x_k - \hat{\mu}_i)^t \hat{\Sigma}_i^{-1}(x_k - \hat{\mu}_i)$ by the squared Euclidean distance $\|x_k - \hat{\mu}_i\|^2$.
Find the mean $\hat{\mu}_m$ nearest to $x_k$ and approximate $\hat{P}(\omega_i \mid x_k, \hat{\theta})$ as:

$$\hat{P}(\omega_i \mid x_k, \hat{\theta}) \approx \begin{cases} 1 & \text{if } i = m \\ 0 & \text{otherwise} \end{cases}$$
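A minimal sketch of one pass of the iterative scheme above, i.e., the three re-estimation equations with the soft posterior (NumPy assumed; the function name and data layout are illustrative, and the $(2\pi)^{d/2}$ factor is omitted because it cancels in the posterior). Replacing the soft posterior with the hard 0/1 approximation just described yields the k-means algorithm that follows.

```python
import numpy as np

def ml_mixture_step(x, means, covs, priors):
    """One pass of the re-estimation equations for a Gaussian mixture.
    x: (n, d) data; means: (c, d); covs: (c, d, d); priors: (c,).
    Returns updated (means, covs, priors)."""
    n, d = x.shape
    c = len(priors)
    post = np.empty((n, c))
    for i in range(c):
        diff = x - means[i]
        inv = np.linalg.inv(covs[i])
        mahal = np.einsum('nd,de,ne->n', diff, inv, diff)      # squared Mahalanobis distances
        post[:, i] = priors[i] * np.exp(-0.5 * mahal) / np.sqrt(np.linalg.det(covs[i]))
    post /= post.sum(axis=1, keepdims=True)                    # P_hat(w_i | x_k, theta_hat)

    priors = post.mean(axis=0)                                 # P_hat(w_i)
    means = (post.T @ x) / post.sum(axis=0)[:, None]           # mu_hat_i: weighted averages
    covs = np.stack([
        ((x - means[i]).T * post[:, i]) @ (x - means[i]) / post[:, i].sum()
        for i in range(c)
    ])                                                         # Sigma_hat_i: weighted scatter
    return means, covs, priors
```

Iterating this step from reasonable initial estimates climbs the log-likelihood toward a local maximum.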

Use the iterative scheme to find $\hat{\mu}_1, \hat{\mu}_2, \dots, \hat{\mu}_c$.
If n is the known number of patterns and c the desired number of clusters, the k-means algorithm is (a runnable sketch follows below):

Begin
  initialize n, c, $\mu_1, \mu_2, \dots, \mu_c$ (randomly selected)
  do
    classify the n samples according to the nearest $\mu_i$
    recompute $\mu_i$
  until no change in $\mu_i$
  return $\mu_1, \mu_2, \dots, \mu_c$
End

Other Neural Networks: Competitive Learning Networks (winner-take-all)

(Figure: input units fully connected by weights w to the output units; the weights are updated only for the winner output unit.)

The output unit with the highest activation is selected: for input vector $\mathbf{x}$, the activation value of output unit j is $a_j = \sum_i w_{ij} x_i = \mathbf{w}_j^T \mathbf{x}$.
The weights are updated only for the winner output unit $j^*$:

$$\mathbf{w}_{j^*}(t+1) = \mathbf{w}_{j^*}(t) + \eta\,\big(\mathbf{x}(t) - \mathbf{w}_{j^*}(t)\big), \qquad \mathbf{w}_j(t+1) = \mathbf{w}_j(t) \ \ \text{for } j \neq j^*$$
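A runnable sketch of the k-means loop boxed above (NumPy assumed; initializing the means with c randomly selected samples is one common choice, not necessarily the one intended in the lecture):

```python
import numpy as np

def k_means(x, c, rng=np.random.default_rng(0)):
    """k-means: x is an (n, d) array of samples, c the desired number of clusters.
    Returns the c mean vectors as a (c, d) array."""
    # Initialize the means with c randomly selected samples.
    means = x[rng.choice(len(x), size=c, replace=False)].copy()
    while True:
        # Classify each sample according to the nearest mean (squared Euclidean distance).
        dists = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # (n, c)
        labels = dists.argmin(axis=1)
        # Recompute each mean; keep the old mean if a cluster lost all its samples.
        new_means = np.array([x[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                              for i in range(c)])
        if np.allclose(new_means, means):   # until no change in the means
            return new_means
        means = new_means
```

For instance, on the synthetic 1-D samples generated in the earlier sketch, k_means(x.reshape(-1, 1), 2) should return two centers near -2 and 2.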

The weight vectors move towards those areas where most of the input appears.
The weight vectors become the cluster centers.
The weight update finds the cluster centers (a minimal sketch of this update rule is given below).

The following topics can be considered by the students for their oral presentations:
Kohonen Self-Organizing Networks
Learning Vector Quantization
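To make the winner-take-all rule above concrete, a minimal sketch (NumPy assumed; the learning rate, number of output units, and input stream are illustrative choices, not values from the lecture):

```python
import numpy as np

def competitive_learning(inputs, n_outputs, eta=0.1, rng=np.random.default_rng(0)):
    """Winner-take-all competitive learning.
    inputs: (n, d) stream of input vectors; n_outputs: number of output units.
    Returns the (n_outputs, d) weight vectors, which drift toward cluster centers."""
    d = inputs.shape[1]
    w = rng.normal(size=(n_outputs, d))            # one weight vector per output unit
    for x in inputs:
        a = w @ x                                  # activations a_j = w_j^T x
        winner = np.argmax(a)                      # the output with the highest activation wins
        w[winner] += eta * (x - w[winner])         # update only the winner's weights
    return w
```

In practice the inputs and weight vectors are often normalized so that the largest activation corresponds to the nearest weight vector; the winning weight vectors then converge to the cluster centers, as described above.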