Topic 5: Unsupervised Learning: CLUSTERING. February-May 2005

1 Topic 5: Unsupervised Learning: CLUSTERING. February-May 2005

2 SUPERVISED METHODS: LABELED Data Base
Labeled data base, divided into training and test sets.
Choose an algorithm: MAP, ML, K-Nearest, LD, SVC, NN, Tree, ...
Train the algorithm or determine the function.
Evaluate the classifier.
Reduce the space dimension d with linear methods such as PCA, MDA, ICA.
Reduce the space dimension d by feature selection (Independent Algorithm, Machine Learning).

3 UNSUPERVISED METHODS: UNLABELED Data Base
Unlabeled data base.
Choose an algorithm: cluster initialization.
E-Step: classify the samples.
M-Step: update the parameters or evaluate the criterion functions.
Reduce the space dimension d with linear methods such as PCA, ICA.
Reduce the space dimension d by feature selection (Independent Algorithm, Machine Learning).

4 LOOKING FOR STRUCTURE INSIDE THE DATA
Parametric methods: they assume some p.d.f. for the clusters.
Non-parametric methods: formal clustering procedures.

5 INDEX (Parametric Methods)
1 MIXTURE DENSITIES AND IDENTIFIABILITY
2 MAXIMUM LIKELIHOOD ESTIMATES: EM
3 K-Means Clustering

6 1 MIXTURE DENSITIES AND IDENTIFIABILITY
Assumptions:
1. The samples come from a known number c of classes.
2. The prior probabilities for each class are known (mixing parameters): $\Pr\{\omega_i\},\ i = 1..c$.
3. The form of the class-conditional probability densities is known: $f_x(x \mid \omega_i, \theta_i)$.
4. The values of the parameters are unknown: $\theta_i,\ i = 1..c$.
5. The category labels are unknown: UNSUPERVISED.

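7 1 MIXTURE DENSITIES AND IDENTIFIABILITY
MIXTURE DENSITY: $f_x(x \mid \theta) = \sum_{i=1}^{c} f_x(x \mid \omega_i, \theta_i)\, \Pr\{\omega_i\}$, with $\theta = (\theta_1, \ldots, \theta_c)^T$.
1. For the moment it is assumed that only the parameter vector $\theta$ is unknown.
2. Necessary condition for identifiability: $f_x(x \mid \theta) \neq f_x(x \mid \theta')$ whenever $\theta \neq \theta'$.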

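8 1 MIXTURE DENSITIES AND IDENTIFIABILITY
Example of an identifiability problem: BINARY (SYMMETRIC) CHANNEL
Priors: $P_0 = \Pr\{\omega_0\}$ (bit = 0), $P_1 = \Pr\{\omega_1\}$ (bit = 1), $P_0 + P_1 = 1$.
Transition probabilities: $\theta_0 = \Pr\{x = 0 \mid \omega_0\}$, $1 - \theta_0 = \Pr\{x = 1 \mid \omega_0\}$, $\theta_1 = \Pr\{x = 1 \mid \omega_1\}$, $1 - \theta_1 = \Pr\{x = 0 \mid \omega_1\}$.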

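9 1 MIXTURE DENSITIES AND IDENTIFIABILITY
Example of an identifiability problem: BINARY (SYMMETRIC) CHANNEL
Parameter vector: $\theta = (\theta_0, \theta_1)^T$
MIXTURE DENSITY (PROBABILITY): $\Pr\{x \mid \theta\} = P_0\, \theta_0^{1-x} (1-\theta_0)^{x} + P_1\, \theta_1^{x} (1-\theta_1)^{1-x}$, $x \in \{0, 1\}$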
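A minimal sketch of why this example is not identifiable, assuming $\theta_0 = \Pr\{x = 0 \mid \omega_0\}$ and $\theta_1 = \Pr\{x = 1 \mid \omega_1\}$ as in the channel definition. The function name `mixture_pmf` and the numeric values are illustrative choices, not taken from the slides. Two different parameter vectors produce exactly the same observable distribution of x, so x alone cannot tell them apart.

```python
import math

def mixture_pmf(x, p0, theta0, theta1):
    """Pr{x | theta} for the binary channel.

    Assumed convention: theta0 = Pr{x=0 | w0}, theta1 = Pr{x=1 | w1}.
    """
    p1 = 1.0 - p0
    return (p0 * theta0 ** (1 - x) * (1 - theta0) ** x
            + p1 * theta1 ** x * (1 - theta1) ** (1 - x))

# Two different parameter vectors (illustrative values only).
theta_a = (0.8, 0.6)
theta_b = (0.7, 0.5)
p0 = 0.5  # known prior for bit = 0

pmf_a = [mixture_pmf(x, p0, *theta_a) for x in (0, 1)]
pmf_b = [mixture_pmf(x, p0, *theta_b) for x in (0, 1)]

# Both vectors give the same mixture, so theta cannot be recovered from x.
print(pmf_a, pmf_b)
```

Only the single number $\Pr\{x = 1 \mid \theta\} = P_0 (1 - \theta_0) + P_1 \theta_1$ is observable, which is one equation in two unknowns.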

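10 2 MAXIMUM LIKELIHOOD ESTIMATES
Likelihood of the n statistically independent observed samples $D = \{x_1, x_2, \ldots, x_n\}$:
$f(D \mid \theta) = \prod_{k=1}^{n} f_x(x_k \mid \theta)$
$l = \ln f(D \mid \theta) = \sum_{k=1}^{n} \ln f_x(x_k \mid \theta)$
$f_x(x_k \mid \theta) = \sum_{i=1}^{c} f_x(x_k \mid \omega_i, \theta_i)\, \Pr(\omega_i)$
Assuming statistical independence between $\theta_i$ and $\theta_j$ ($i \neq j$), the ML estimate $\hat{\theta}_i$ is one of the multiple solutions of:
$\sum_{k=1}^{n} \Pr\{\omega_i \mid x_k, \hat{\theta}\}\, \nabla_{\theta_i} \ln f_x(x_k \mid \omega_i, \hat{\theta}_i) = 0$, $i = 1..c$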

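11 2 MAXIMUM LIKELIHOOD ESTIMATES
Proof:
$\nabla_{\theta_i} l = \sum_{k=1}^{n} \frac{1}{f_x(x_k \mid \theta)}\, \nabla_{\theta_i} \left[ \sum_{j=1}^{c} f_x(x_k \mid \omega_j, \theta_j)\, \Pr(\omega_j) \right] = \sum_{k=1}^{n} \frac{\Pr(\omega_i)}{f_x(x_k \mid \theta)}\, \nabla_{\theta_i} f_x(x_k \mid \omega_i, \theta_i)$
Using $\Pr\{\omega_i \mid x_k, \theta\} = f_x(x_k \mid \omega_i, \theta_i)\, \Pr(\omega_i) / f_x(x_k \mid \theta)$:
$\nabla_{\theta_i} l = \sum_{k=1}^{n} \Pr\{\omega_i \mid x_k, \theta\}\, \nabla_{\theta_i} \ln f_x(x_k \mid \omega_i, \theta_i) = 0$, $i = 1..c$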

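12 2 MAXIMUM LIKELIHOOD ESTIMATES
Generalizing to the case of unknown prior probabilities (no proof is included here):
1. Compute the prior probability estimates: $\hat{\Pr}\{\omega_i\} = \frac{1}{n} \sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}$
2. Compute the parameter vector estimates: $\sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}\, \nabla_{\theta_i} \ln f_x(x_k \mid \omega_i, \hat{\theta}_i) = 0$, $i = 1..c$
3. Compute the conditional probabilities of the classes: $\hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} = \frac{f_x(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{\Pr}\{\omega_i\}}{\sum_{j=1}^{c} f_x(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{\Pr}\{\omega_j\}}$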

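13 2 MAXIMUM LIKELIHOOD ESTIMATES
For Gaussian distributions:
$\ln f_x(x_k \mid \omega_i, \theta_i) = -\ln\left( |\Sigma_i|^{1/2} (2\pi)^{d/2} \right) - \frac{1}{2} (x_k - \mu_i)^T \Sigma_i^{-1} (x_k - \mu_i)$
Parameters to estimate: $\theta = (\theta_1, \ldots, \theta_c)$, $\theta_i = (\mu_i, \Sigma_i)$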

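14 2 MAXIMUM LIKELIHOOD ESTIMATES
ML is solved by applying the SOFT Expectation-Maximization algorithm (soft assignment). Iterations stop when the p.d.f. no longer varies.
1. Expectation (E-Step):
$\hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} = \frac{|\hat{\Sigma}_i|^{-1/2} \exp\left( -\frac{1}{2} (x_k - \hat{\mu}_i)^T \hat{\Sigma}_i^{-1} (x_k - \hat{\mu}_i) \right) \hat{\Pr}\{\omega_i\}}{\sum_{j=1}^{c} |\hat{\Sigma}_j|^{-1/2} \exp\left( -\frac{1}{2} (x_k - \hat{\mu}_j)^T \hat{\Sigma}_j^{-1} (x_k - \hat{\mu}_j) \right) \hat{\Pr}\{\omega_j\}}$
2. Maximization (M-Step):
$\hat{\Pr}\{\omega_i\} = \frac{1}{n} \sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}$
$\hat{\mu}_i = \frac{\sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}\, x_k}{\sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}}$
$\hat{\Sigma}_i = \frac{\sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}\, (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^T}{\sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\}}$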
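The E-step and M-step above can be sketched for the one-dimensional case, where $\Sigma_i$ reduces to a scalar variance. This is a simplified illustration under stated assumptions, not the course implementation: the names `gauss_pdf` and `em_gmm_1d`, the synthetic data, the starting point, the variance floor, and the fixed iteration count (used instead of the "stop when the p.d.f. no longer varies" test) are all choices made here.

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mus, variances, priors, n_iter=50):
    """Soft EM for a 1-D Gaussian mixture; returns (means, variances, priors)."""
    mus, variances, priors = list(mus), list(variances), list(priors)
    c, n = len(mus), len(xs)
    for _ in range(n_iter):
        # E-step: posterior Pr{w_i | x_k, theta} for every sample.
        post = []
        for x in xs:
            w = [priors[i] * gauss_pdf(x, mus[i], variances[i]) for i in range(c)]
            total = sum(w)
            post.append([wi / total for wi in w])
        # M-step: posterior-weighted priors, means and variances.
        for i in range(c):
            n_i = sum(p[i] for p in post)
            priors[i] = n_i / n
            mus[i] = sum(p[i] * x for p, x in zip(post, xs)) / n_i
            var = sum(p[i] * (x - mus[i]) ** 2 for p, x in zip(post, xs)) / n_i
            variances[i] = max(var, 1e-6)  # floor avoids a collapsed component
    return mus, variances, priors

# Synthetic data: two well separated clusters (illustrative only).
random.seed(0)
xs = ([random.gauss(0.0, 1.0) for _ in range(200)]
      + [random.gauss(10.0, 1.0) for _ in range(200)])

mus, variances, priors = em_gmm_1d(xs, [1.0, 8.0], [4.0, 4.0], [0.5, 0.5])
print(mus, variances, priors)
```

Because EM only finds a local maximum of the likelihood, a different starting point can lead to a different solution.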

15 2 MAXIMUM LIKELIHOOD ESTIMATES
Problem: the starting point.
Brain images: full data base vs. labeled data base (Pr > 0.95).
Initial values: $\hat{\mu}_i[0]$, dim: d x 1; $\hat{\Sigma}_i[0]$, dim: d x d; $\hat{\Pr}\{\omega_i\}[0]$; $i = 1, 2, 3$

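16 2 MAXIMUM LIKELIHOOD ESTIMATES
E-STEP: for a given $x_k$, estimate
$\hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} = \frac{f_x(x_k \mid \omega_i, \hat{\theta}_i)\, \hat{\Pr}\{\omega_i\}}{\sum_{j=1}^{c} f_x(x_k \mid \omega_j, \hat{\theta}_j)\, \hat{\Pr}\{\omega_j\}}$
[Figure: a sample x and the current means $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\mu}_3$]
M-STEP: the parameters are updated (ML estimation).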

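17 3. K-Means Clustering
HARD classification: a simplification of the ML (EM) estimates for a multivariate normal (optimum for CASE 1 of the multivariate Gaussian variable seen with MAP).
$\theta_i = \mu_i$; $\Sigma_i = \sigma^2 I$
$\hat{\Pr}\{\omega_i \mid x_k, \hat{\mu}\} = 1$ if $d_e(x_k, \hat{\mu}_i) < d_e(x_k, \hat{\mu}_j)$ for all $j \neq i$; $= 0$ otherwise
$\hat{\Pr}\{\omega_i\} = \frac{1}{n} \sum_{k=1}^{n} \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} = \frac{n_i}{n}$
$\hat{\mu}_i = \frac{1}{n_i} \sum_{x_k \in D_i} x_k$ (centroid)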
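The hard-assignment rule and the centroid update above can be sketched in a few lines. This is a minimal illustration, assuming samples are tuples of floats; the function name `kmeans`, the sample points, and the starting centroids are hypothetical choices made here.

```python
def kmeans(points, centroids, n_iter=100):
    """Hard-assignment K-means; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
priors = [len(members) / len(points) for members in clusters]  # n_i / n
print(centroids, priors)
```

As with EM, the result depends on the starting centroids, which is what the brain-image slides with different starting points illustrate.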

18 3. K-Means Clustering

19 3. K-Means Clustering

20 3. K-Means Clustering

21 3. K-Means Clustering

22 Brain Images

23 Brain Images: K-Means, Different Starting Points

24 Brain Images: Expectation-Maximization, Different Starting Points

25 Brain Images: NN

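26 3. K-Means Clustering
APPLICATION: vector quantization of an n-dimensional real-valued vector. See Proakis, Digital Communications, Chapter 3: Source Coding.
FUZZY K-Means: soft classification. b is a free blending parameter.
$J_{Fuzzy} = \sum_{i=1}^{c} \sum_{k=1}^{n} \left( \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} \right)^b d_e^2(x_k, \hat{\mu}_i)$
$\hat{\mu}_i = \frac{\sum_{k=1}^{n} \left( \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} \right)^b x_k}{\sum_{k=1}^{n} \left( \hat{\Pr}\{\omega_i \mid x_k, \hat{\theta}\} \right)^b}$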
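A short sketch of the fuzzy criterion and the weighted centroid, assuming the distance in $J_{Fuzzy}$ is the squared Euclidean distance. The function names and the membership values are illustrative, and the rule that updates the memberships themselves is not covered here. With hard 0/1 memberships the two functions reduce to the K-means centroid and the sum of squared errors.

```python
def fuzzy_centroids(xs, memberships, b):
    """Centroids weighted by memberships raised to the power b.

    memberships[k][i] is the estimate of Pr{w_i | x_k}.
    """
    c, dim = len(memberships[0]), len(xs[0])
    centroids = []
    for i in range(c):
        w = [m[i] ** b for m in memberships]
        total = sum(w)
        centroids.append(tuple(
            sum(wk * x[j] for wk, x in zip(w, xs)) / total for j in range(dim)))
    return centroids

def fuzzy_cost(xs, memberships, centroids, b):
    """J_Fuzzy with squared Euclidean distances."""
    return sum(
        m[i] ** b * sum((a - u) ** 2 for a, u in zip(x, centroids[i]))
        for x, m in zip(xs, memberships)
        for i in range(len(centroids)))

xs = [(0.0,), (2.0,), (10.0,)]

# Hard memberships: the fuzzy formulas collapse to plain K-means.
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
hard_centroids = fuzzy_centroids(xs, hard, b=2)
hard_cost = fuzzy_cost(xs, hard, hard_centroids, b=2)

# Soft memberships (illustrative values): every sample pulls every centroid.
soft = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
soft_centroids = fuzzy_centroids(xs, soft, b=2)
print(hard_centroids, hard_cost, soft_centroids)
```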

27 INDEX: Formal Clustering Procedures
1 INTRODUCTION: FORMAL CLUSTERING PROCEDURES
2 SIMILARITY MEASURES
3 CRITERION FUNCTIONS
4 ITERATIVE OPTIMIZATION
5 CONCLUSIONS

28 1. INTRODUCTION
Clusters may form clouds of points in a d-dimensional space.
Normal distribution: the sample mean and the sample covariance matrix form a sufficient statistic.
Sample mean m: locates the center of gravity of the cloud. It best represents all of the data in the sense of minimizing the sum of squared distances from m to the samples.
Sample covariance matrix C: describes how much the data scatter along the various directions around m.

29 1. INTRODUCTION
The sample mean vector and the sample covariance matrix are not a sufficient statistic in the general case: different distributions can have identical mean and covariance.
$m = \frac{1}{N} \sum_{k=1}^{N} x_k$
$C = \frac{1}{N} \sum_{k=1}^{N} (x_k - m)(x_k - m)^T$
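A small sketch of the two formulas, together with a toy case of two different sample sets that share the same mean and covariance. The function names and both data sets are illustrative assumptions.

```python
import math

def sample_mean(xs):
    """m = (1/N) * sum of the samples, computed per component."""
    n = len(xs)
    return [sum(col) / n for col in zip(*xs)]

def sample_covariance(xs):
    """C = (1/N) * sum of (x - m)(x - m)^T, as a list of rows."""
    n = len(xs)
    m = sample_mean(xs)
    d = len(m)
    return [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in xs) / n
             for b in range(d)] for a in range(d)]

# Two clearly different 1-D sample sets (illustrative values).
set_a = [(-1.0,), (1.0,)]
set_b = [(-math.sqrt(2.0),), (0.0,), (0.0,), (math.sqrt(2.0),)]

mean_a, cov_a = sample_mean(set_a), sample_covariance(set_a)
mean_b, cov_b = sample_mean(set_b), sample_covariance(set_b)

# Same mean (0) and same variance (1), although the sets differ.
print(mean_a, cov_a, mean_b, cov_b)
```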

30 1. INTRODUCTION
Formal clustering procedures: two key steps.
Data are grouped into clusters, that is, groups of data points that possess strong internal similarities.
A criterion function is used to seek the grouping that extremizes it.
To evaluate the partitioning of a set of samples into clusters, the similarity between samples is measured.

31 2. SIMILARITY MEASURES
Similarity is measured using the distance between samples.
Example: Euclidean distance $d_e(x, x')$:
$d_e(x, x') = \| x - x' \| = \sqrt{ \sum_{i=1}^{d} \left( x[i] - x'[i] \right)^2 }$
Two samples belong to the same cluster if $d_e(x, x') < d_0$. The threshold $d_0$ is critical.

32 2. SIMILARITY MEASURES
The distance threshold affects the number and size of the clusters:
typical within-cluster distance < $d_0$ < typical between-cluster distance

33 2. SIMILARITY MEASURES
Euclidean distance $d_e$:
Clusters are invariant to rotation.
Clusters are invariant to translation.
Clusters are not invariant to linear transformations in general.

34 2. SIMILARITY MEASURES
Normalization prior to clustering:
Each feature is translated to have zero mean.
Each feature is scaled to have unit variance.
(These two actions are also recommended with neural nets.)
PCA, Principal Components Analysis (the axes coincide with the eigenvectors of the sample covariance matrix).
AFTER NORMALIZATION AND PCA, CLUSTERS ARE INVARIANT TO DISPLACEMENTS, SCALE CHANGES AND ROTATIONS.
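The first two normalization steps can be sketched as follows. This covers only the zero-mean, unit-variance scaling and not the PCA rotation; the function name `standardize` and the data are illustrative, and every feature is assumed to have non-zero variance.

```python
import math

def standardize(xs):
    """Shift each feature to zero mean and scale it to unit variance."""
    n = len(xs)
    cols = list(zip(*xs))
    means = [sum(col) / n for col in cols]
    # Population standard deviation (divide by n), matching the 1/N covariance.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(x, means, stds)) for x in xs]

# Two features on very different scales (illustrative values).
data = [(1.0, 100.0), (2.0, 300.0), (3.0, 200.0)]
z = standardize(data)
print(z)
```

Without this step the second feature would dominate any Euclidean distance simply because of its units.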

35 2. SIMILARITY MEASURES
Other metrics:
Minkowski distance: $d_q(x, x') = \left( \sum_{i=1}^{d} \left| x[i] - x'[i] \right|^q \right)^{1/q}$
Mahalanobis distance: $d_M^2(x, x') = (x - x')^T \Sigma^{-1} (x - x')$
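Both metrics are easy to check numerically. The sketch below restricts the Mahalanobis distance to two dimensions so that the 2 x 2 covariance matrix can be inverted by hand; the function names and test vectors are illustrative assumptions.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q (q=1 city block, q=2 Euclidean)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def mahalanobis_sq_2d(x, y, cov):
    """Squared Mahalanobis distance for 2-D vectors and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # assumed non-zero (non-singular covariance)
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - y[0], x[1] - y[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

x, y = (3.0, 4.0), (0.0, 0.0)
d1 = minkowski(x, y, 1)                                        # city block
d2 = minkowski(x, y, 2)                                        # Euclidean
m_identity = mahalanobis_sq_2d(x, y, [[1.0, 0.0], [0.0, 1.0]])
m_scaled = mahalanobis_sq_2d(x, y, [[4.0, 0.0], [0.0, 1.0]])
print(d1, d2, m_identity, m_scaled)
```

With the identity covariance the Mahalanobis distance equals the squared Euclidean distance; a larger variance along one axis shrinks that axis's contribution.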

36 2. SIMILARITY MEASURES
Similarity functions: they compare two vectors.
$s(x, x') = \frac{x^T x'}{\| x \| \, \| x' \|}$
It is invariant to rotation and dilation.
It is not invariant to translation or to general linear transformations.
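The invariance claims can be checked on a pair of vectors. This is an illustrative sketch; the function name `similarity` and the vectors are choices made here.

```python
import math

def similarity(x, y):
    """Normalized inner product (cosine of the angle between x and y)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x))
                  * math.sqrt(sum(b * b for b in y)))

x, y = (1.0, 2.0), (2.0, 1.0)
base = similarity(x, y)

# Dilation: scaling one vector leaves the similarity unchanged.
dilated = similarity((3.0, 6.0), y)

# Rotation by 90 degrees, (a, b) -> (-b, a), applied to both vectors.
rotated = similarity((-2.0, 1.0), (-1.0, 2.0))

# Translation by (1, 1) applied to both vectors changes the similarity.
translated = similarity((2.0, 3.0), (3.0, 2.0))
print(base, dilated, rotated, translated)
```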

37 2. SIMILARITY MEASURES
If the clusters found are used in a subsequent classification problem, either the metric (distance) or the similarity function is used as the classification criterion.

38 3. CRITERION FUNCTIONS
Criterion functions for clustering:
Initial set: $D = \{x_1, x_2, \ldots, x_n\}$
Partition into exactly c subsets: $D_1, D_2, \ldots, D_c$
Objective: to find the partition that extremizes the criterion function.

39 3. CRITERION FUNCTIONS
3.1 Criterion function: Sum of Squared Error criterion
$J_e = \sum_{i=1}^{c} \sum_{x \in D_i} \| x - m_i \|^2$
$m_i$ is the best representative of the samples in $D_i$.
It is appropriate when the clusters form compact clouds with a roughly uniform number of samples per cluster.

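40 3. CRITERION FUNCTIONS
Related minimum variance criteria:
$J_e = \frac{1}{2} \sum_{i=1}^{c} n_i \bar{s}_i$; $\bar{s}_i = \frac{1}{n_i^2} \sum_{x \in D_i} \sum_{x' \in D_i} \| x - x' \|^2$
Suggestions for obtaining other criterion functions:
$\bar{s}_i = \max_{x, x' \in D_i} d_e(x, x')$; $\bar{s}_i = \frac{1}{n_i^2} \sum_{x \in D_i} \sum_{x' \in D_i} d_e(x, x')$; $\bar{s}_i = \min_{x, x' \in D_i} d_e(x, x')$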

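41 3. CRITERION FUNCTIONS
3.2 Scatter criteria: mean vectors and scatter matrices used in clustering criteria
Mean vector for the i-th cluster: $m_i = \frac{1}{n_i} \sum_{x \in D_i} x$
Total mean vector: $m = \frac{1}{n} \sum_{x \in D} x = \frac{1}{n} \sum_{i=1}^{c} n_i m_i$
Scatter matrix for the i-th cluster: $S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$
Within-cluster scatter matrix: $S_W = \sum_{i=1}^{c} S_i$
Between-cluster scatter matrix: $S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$
Total scatter matrix: $S_T = \sum_{x \in D} (x - m)(x - m)^T = S_W + S_B$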
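The identity $S_T = S_W + S_B$ can be verified numerically on a toy partition. The sketch below uses plain lists of rows for matrices; the helper names and the two clusters are illustrative assumptions.

```python
def mean(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def scatter(points, center, weight=1.0):
    """weight * sum over points of (p - center)(p - center)^T."""
    d = len(center)
    return [[weight * sum((p[a] - center[a]) * (p[b] - center[b]) for p in points)
             for b in range(d)] for a in range(d)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scatter_matrices(clusters):
    """Return (S_W, S_B, S_T) for a partition given as a list of clusters."""
    all_points = [p for D in clusters for p in D]
    m = mean(all_points)
    d = len(m)
    S_W = [[0.0] * d for _ in range(d)]
    S_B = [[0.0] * d for _ in range(d)]
    for D in clusters:
        m_i = mean(D)
        S_W = mat_add(S_W, scatter(D, m_i))                   # adds S_i
        S_B = mat_add(S_B, scatter([m_i], m, weight=len(D)))  # n_i (m_i-m)(m_i-m)^T
    S_T = scatter(all_points, m)
    return S_W, S_B, S_T

clusters = [[(0.0, 0.0), (2.0, 0.0)],
            [(4.0, 4.0), (6.0, 4.0), (5.0, 7.0)]]
S_W, S_B, S_T = scatter_matrices(clusters)
trace_S_W = S_W[0][0] + S_W[1][1]  # equals the sum of squared errors J_e
print(S_W, S_B, S_T, trace_S_W)
```

Since $S_T$ does not depend on the partition, making $S_W$ "small" and making $S_B$ "large" are two views of the same goal.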

42 3. CRITERION FUNCTIONS
3.2 Scatter criteria: TRACE CRITERION
It measures the square of the scattering radius.
Minimize the trace of the within-cluster scatter matrix:
$\mathrm{Tr}[S_W] = \sum_{i=1}^{c} \mathrm{Tr}[S_i] = \sum_{i=1}^{c} \sum_{x \in D_i} \| x - m_i \|^2 = J_e$
The result is the function $J_e$.
It is equivalent to maximizing the trace of the between-cluster scatter matrix:
$\mathrm{Tr}[S_W] = \mathrm{Tr}[S_T] - \mathrm{Tr}[S_B]$; $\mathrm{Tr}[S_B] = \sum_{i=1}^{c} n_i \| m_i - m \|^2$

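43 3. CRITERION FUNCTIONS
3.2 Scatter criteria: DETERMINANT CRITERION
It measures the square of the scattering volume.
$S_B$ is singular if $c \leq d$; $\mathrm{rank}(S_B) \leq c - 1$.
$S_W$ is singular if $n - c < d$.
Assuming $n > d + c$: $J_d = |S_W| = \left| \sum_{i=1}^{c} S_i \right|$
The resulting partition does not change if the axes are scaled.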

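44 3. CRITERION FUNCTIONS
3.2 Scatter criteria: invariant criteria
The eigenvalues $\lambda_1, \ldots, \lambda_d$ of $S_W^{-1} S_B$ are invariant to nonsingular linear transformations of the data.
Proposed criteria:
max: $\mathrm{Tr}[S_W^{-1} S_B] = \sum_{i=1}^{d} \lambda_i$
min: $\frac{|S_W|}{|S_T|} = \prod_{i=1}^{d} \frac{1}{1 + \lambda_i}$
min: $J_f = \mathrm{Tr}[S_T^{-1} S_W] = \sum_{i=1}^{d} \frac{1}{1 + \lambda_i}$
They are equivalent for c = 2.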

45 3. CRITERION FUNCTIONS
3.2 Invariant criteria
Proof:
$\lambda_1, \ldots, \lambda_d$ = eigenvalues of $S_W^{-1} S_B$, so $S_B v = \lambda S_W v$.
$S_T v = S_W v + S_B v = S_W v + \lambda S_W v = (1 + \lambda) S_W v$
Hence $1 + \lambda_1, \ldots, 1 + \lambda_d$ = eigenvalues of $S_W^{-1} S_T$, and
$S_T^{-1} S_W v = \frac{1}{1 + \lambda} v$

46 3. CRITERION FUNCTIONS
3.2 Scatter criteria: invariant criteria
[Figure panels: Trace criterion. Determinant criterion. Invariant criterion.]

47 CLUSTERING PROCEDURES CONCLUSIONS
Underlying model: it assumes that the samples form c fairly well separated clouds of points.
$S_W$ measures the compactness of these clouds.
Problem: evaluating the criterion over every possible partition is computationally impracticable.

48 4 ITERATIVE OPTIMIZATION
Direct partitioning: approximately $c^n / c!$ possible partitions.
Practical solution: start with some reasonable partition and move samples from one group to another if such a move improves the value of the criterion function.
This guarantees local but not global optimization.
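To see how quickly exhaustive search becomes impracticable, the sketch below compares the approximation $c^n / c!$ with the exact number of ways to split n samples into c non-empty subsets (the Stirling number of the second kind, computed with the standard inclusion-exclusion formula). The function name `stirling2` and the values of n and c are illustrative choices.

```python
from math import comb, factorial

def stirling2(n, c):
    """Exact number of partitions of n items into c non-empty subsets."""
    total = sum((-1) ** j * comb(c, j) * (c - j) ** n for j in range(c + 1))
    return total // factorial(c)  # the sum is always divisible by c!

small_exact = stirling2(5, 2)            # 5 samples, 2 clusters
small_approx = 2 ** 5 / factorial(2)

big_exact = stirling2(100, 5)            # 100 samples, 5 clusters
big_approx = 5 ** 100 / factorial(5)
ratio = big_exact / big_approx

print(small_exact, small_approx)
print(len(str(big_exact)), ratio)        # number of digits, and exact/approx
```

Even for 100 samples and 5 clusters the count has dozens of digits, which is why the iterative procedure settles for a local optimum.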

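49 4 ITERATIVE OPTIMIZATION
Iterative improvement to minimize the sum of squared error criterion $J_e$.
Effective error per cluster $J_i$: $J_e = \sum_{i=1}^{c} J_i$; $J_i = \sum_{x \in D_i} \| x - m_i \|^2$
A sample $\hat{x}$ is moved from cluster $D_i$ to cluster $D_j$:
$m_j^* = m_j + \frac{\hat{x} - m_j}{n_j + 1}$; $m_i^* = m_i - \frac{\hat{x} - m_i}{n_i - 1}$
$n_j^* = n_j + 1$; $n_i^* = n_i - 1$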

50 4 ITERATIVE OPTIMIZATION
Increasing / decreasing effective error per cluster (PROVE AS AN EXERCISE)
Cluster $D_j$, which receives the sample:
$J_j^* = \sum_{x \in D_j} \| x - m_j^* \|^2 + \| \hat{x} - m_j^* \|^2 = J_j + \frac{n_j}{n_j + 1} \| \hat{x} - m_j \|^2$

51 4 ITERATIVE OPTIMIZATION
Increasing / decreasing effective error per cluster (PROVE AS AN EXERCISE)
Cluster $D_i$, which loses the sample:
$J_i^* = \sum_{x \in D_i} \| x - m_i^* \|^2 - \| \hat{x} - m_i^* \|^2 = J_i - \frac{n_i}{n_i - 1} \| \hat{x} - m_i \|^2$

52 4 ITERATIVE OPTIMIZATION
Moving the sample $\hat{x}$ from cluster $D_i$ to cluster $D_j$ is advantageous if
$\frac{n_i}{n_i - 1} \| \hat{x} - m_i \|^2 > \frac{n_j}{n_j + 1} \| \hat{x} - m_j \|^2$

53 4 ITERATIVE OPTIMIZATION
BASIC ITERATIVE MINIMUM SQUARED ERROR CLUSTERING
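A possible reading of the procedure, sketched under stated assumptions: samples are visited in a fixed order rather than selected at random, a sample that is alone in its cluster is never moved, and the loop stops after a full pass with no transfer. The function names, the tolerance, and the toy data are illustrative, not taken from the slides. A sample in cluster i is transferred to cluster j when $\frac{n_j}{n_j+1}\|\hat{x}-m_j\|^2$ is smaller than $\frac{n_i}{n_i-1}\|\hat{x}-m_i\|^2$.

```python
def sq_dist(x, m):
    return sum((a - b) ** 2 for a, b in zip(x, m))

def cluster_means(xs, labels, c):
    """Mean of each cluster; every cluster is assumed to be non-empty."""
    means = []
    for i in range(c):
        members = [x for x, lab in zip(xs, labels) if lab == i]
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means

def sum_squared_error(xs, labels, c):
    means = cluster_means(xs, labels, c)
    return sum(sq_dist(x, means[lab]) for x, lab in zip(xs, labels))

def iterative_mse(xs, labels, c):
    """Move one sample at a time while a move lowers J_e."""
    labels = list(labels)
    moved = True
    while moved:
        moved = False
        for k, x in enumerate(xs):
            i = labels[k]
            counts = [labels.count(j) for j in range(c)]
            if counts[i] == 1:
                continue  # never empty a cluster
            means = cluster_means(xs, labels, c)
            # Increase in J_e if x joined cluster j ...
            rho = [counts[j] / (counts[j] + 1) * sq_dist(x, means[j])
                   for j in range(c)]
            # ... and decrease in J_e if x left its current cluster i.
            rho[i] = counts[i] / (counts[i] - 1) * sq_dist(x, means[i])
            best = min(range(c), key=lambda j: rho[j])
            if rho[best] < rho[i] - 1e-12:
                labels[k] = best
                moved = True
    return labels

xs = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
start = [0, 1, 0, 1, 0, 1]  # deliberately poor initial partition
final = iterative_mse(xs, start, c=2)
print(start, sum_squared_error(xs, start, 2))
print(final, sum_squared_error(xs, final, 2))
```

Each accepted move strictly lowers $J_e$, so the loop terminates, but only at a local minimum that depends on the initial partition.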

54 4 ITERATIVE OPTIMIZATION

55 7 CONCLUSIONS
When the underlying distribution comes from a mixture of component densities described by a set of unknown parameters, these parameters can be estimated by Bayesian or ML (EM algorithm) methods.
Clustering is a more general approach.

56 7 CONCLUSIONS: OTHER TOPICS
Hierarchical methods to reveal clusters and sub-clusters: taxonomy.
Estimation of the number of clusters.
Self-Organizing Feature Maps (SOFM): they preserve neighborhoods to reduce dimensionality (Kohonen maps).

57 Laboratory Classes
Lab 0: Inspection of the Brain and Gauss data bases.
Lab 1: Application of MAP methods (ldc, qdc) to GAUSS.
Lab 2: Application of MAP methods (ldc, qdc) to PHONEME, SPAM.
Lab 3: Application of PCA and MDA to GAUSS.
Lab 4: ICA as blind separation of audio sources.
Lab 5: k-nearest neighbour on ZIP.
(Lab 6: Linear discriminant (LMS-MMSE and Perceptron) on GAUSS and ZIP.)
Lab 7: (NN, decision trees and K-means) MULTILAYER NEURAL NETWORKS, TREE CLASSIFIERS and UNSUPERVISED methods applied to PET and Magnetic Resonance BRAIN images.
