Clustering: K-Means. Machine Learning 10-601, Fall 2014. Bhavana Dalvi Mishra, PhD student, LTI, CMU


1 Clustering: K-Means. Machine Learning 10-601, Fall 2014. Bhavana Dalvi Mishra, PhD student, LTI, CMU. Slides are based on materials from Prof. Eric Xing, Prof. William Cohen and Prof. Andrew Ng.

2 Outline. What is clustering? How are similarity measures defined? Different clustering algorithms: K-Means, Gaussian Mixture Models, Expectation Maximization. Advanced topics: How to seed clustering? How to choose #clusters? Application: Gloss finding for a Knowledge Base.

3 Clustering

4 Classification vs. Clustering: supervision available vs. unsupervised. Learning from supervised data: example classifications are given. Unsupervised learning: learning from raw unlabeled data.

5 Clustering: the process of grouping a set of objects into clusters with high intra-cluster similarity and low inter-cluster similarity. How many clusters? How to identify them?

6 Applications of Clustering. Google News: clusters news stories from different sources about the same event. Computational biology: group genes that perform the same functions. Social media analysis: group individuals that have similar political views. Computer graphics: identify similar objects from pictures.

7 Examples: People, Images, Species

8 What is a natural grouping among these objects?

9 Similarity Measures

10 What is Similarity? Hard to define! But we know it when we see it. The real meaning of similarity is a philosophical question; it depends on representation and algorithm. For many representations/algorithms, it is easier to think in terms of a distance rather than a similarity between vectors.

11 Intuitions behind desirable distance measure properties. D(A,B) = D(B,A): Symmetry. Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex". D(A,A) = 0: Constancy of Self-Similarity. Otherwise you could claim "Alex looks more like Bob than Bob does". D(A,B) = 0 iff A = B: Identity of indiscernibles. Otherwise there are objects in your world that are different, but you cannot tell apart. D(A,B) ≤ D(A,C) + D(B,C): Triangular Inequality. Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl".

12 Intuitions behind desirable distance measure properties. D(A,B) = D(B,A): Symmetry. Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex". D(A,A) = 0: Constancy of Self-Similarity. Otherwise you could claim "Alex looks more like Bob than Bob does". D(A,B) = 0 iff A = B: Identity of indiscernibles. Otherwise there are objects in your world that are different, but you cannot tell apart. D(A,B) ≤ D(A,C) + D(B,C): Triangular Inequality. Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl".

13 Distance Measures: Minkowski Metric. Suppose two objects $x$ and $y$ both have $p$ features: $x = (x_1, x_2, \ldots, x_p)$ and $y = (y_1, y_2, \ldots, y_p)$. The Minkowski metric is defined by $d(x, y) = \left( \sum_{i=1}^{p} |x_i - y_i|^r \right)^{1/r}$. Most common Minkowski metrics: $r = 2$, Euclidean distance $d(x,y) = \sqrt{\sum_{i=1}^{p} |x_i - y_i|^2}$; $r = 1$, Manhattan distance $d(x,y) = \sum_{i=1}^{p} |x_i - y_i|$; $r = \infty$, "sup" distance $d(x,y) = \max_i |x_i - y_i|$.

14 An Example. Two points x and y whose coordinates differ by 4 and 3 in the two dimensions. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$. Manhattan distance: $4 + 3 = 7$. "sup" distance: $\max\{4, 3\} = 4$.
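To make the metrics above concrete, here is a minimal NumPy sketch (mine, not from the slides) of the Minkowski family; on two hypothetical points whose coordinates differ by 4 and 3 it reproduces the Euclidean, Manhattan and "sup" values 5, 7 and 4 from the example.

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance d(x, y) = (sum_i |x_i - y_i|^r)^(1/r)."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if np.isinf(r):                      # r -> infinity gives the "sup" (max) distance
        return diff.max()
    return (diff ** r).sum() ** (1.0 / r)

x, y = np.array([0.0, 0.0]), np.array([4.0, 3.0])   # coordinates differ by 4 and 3
print(minkowski(x, y, 2))        # Euclidean: sqrt(16 + 9) = 5.0
print(minkowski(x, y, 1))        # Manhattan: 4 + 3 = 7.0
print(minkowski(x, y, np.inf))   # "sup": max{4, 3} = 4.0
```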

15 Hamming distance. Manhattan distance is called Hamming distance when all features are binary. Example: gene expression levels under 17 conditions (1 = High, 0 = Low) for Gene A and Gene B. Hamming distance = #(0,1 mismatches) + #(1,0 mismatches), i.e., the number of positions at which the two binary vectors differ.

16 Similarity Measures: Correlation Coefficient. [Figure: expression levels of Gene A and Gene B over time, in three panels showing negatively correlated, uncorrelated, and positively correlated genes.]

17 Similarity Measures: Correlation Coefficient. Pearson correlation coefficient: $s(x, y) = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2 \sum_{i=1}^{p} (y_i - \bar{y})^2}}$, where $\bar{x} = \frac{1}{p}\sum_{i=1}^{p} x_i$ and $\bar{y} = \frac{1}{p}\sum_{i=1}^{p} y_i$. Special case: cosine distance, $s(x, y) = \frac{x^\top y}{\|x\| \cdot \|y\|}$.
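The correlation-based similarity on this slide is easy to compute directly; below is a small NumPy sketch (my own, with hypothetical gene vectors) showing that the Pearson coefficient is just the cosine similarity of the mean-centered vectors.

```python
import numpy as np

def cosine_similarity(x, y):
    """s(x, y) = x.y / (||x|| ||y||)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return cosine_similarity(x - x.mean(), y - y.mean())

gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]      # roughly 2 * gene_a: positively correlated
print(pearson(gene_a, gene_b))          # close to +1
```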

18 Clustering Algorithm: K-Means

19 K-means Clustering: Step 1

20 K-means Clustering: Step 2

21 K-means Clustering: Step 3

22 K-means Clustering: Step 4

23 K-means Clustering: Step 5

24 K-Means: Algorithm. 1. Decide on a value for K. 2. Initialize the K cluster centers (randomly, if necessary). 3. Repeat until no object changes its cluster assignment: decide the cluster memberships of the N objects by assigning them to the nearest cluster centroid, $\text{cluster}(x_i) = \arg\min_j d(x_i, \mu_j)$; then re-estimate the K cluster centers, assuming the memberships found above are correct.
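The three steps of the algorithm map directly onto a few lines of NumPy. The sketch below is my own minimal implementation (function and variable names are hypothetical, and empty clusters are simply re-seeded from a random point), not the code used in the lecture.

```python
import numpy as np

def kmeans(X, k, max_iters=100, rng=None):
    """Lloyd's K-Means: returns (centroids, labels)."""
    rng = np.random.default_rng(0) if rng is None else rng
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step 2
    labels = np.full(len(X), -1)
    for _ in range(max_iters):                                              # step 3
        # assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # stop when nothing changes
            break
        labels = new_labels
        # re-estimate each centroid as the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            centroids[j] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centers, assignment = kmeans(X, k=2)
```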

25 K-Means is widely used in practice. Extremely fast and scalable: used in a variety of applications. Can be easily parallelized; easy Map-Reduce implementation. Mapper: assigns each datapoint to the nearest cluster. Reducer: takes all points assigned to a cluster, and re-computes the centroids. Sensitive to starting points or random seed initialization (similar to Neural Networks). There are extensions like K-Means++ that try to solve this problem.
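To illustrate the Map-Reduce formulation mentioned above, here is a toy, single-process sketch of one K-Means iteration written in map/reduce style; it is my own illustration under the stated mapper/reducer split, not an actual Hadoop or Spark job.

```python
import numpy as np
from collections import defaultdict

def mapper(point, centroids):
    """Emit (index of the nearest centroid, point)."""
    j = int(np.argmin(np.linalg.norm(centroids - point, axis=1)))
    return j, point

def reducer(pairs, k):
    """Group points by cluster id and recompute each centroid as the mean."""
    groups = defaultdict(list)
    for j, p in pairs:
        groups[j].append(p)
    return np.array([np.mean(groups[j], axis=0) for j in range(k)])

# one iteration over a toy dataset
X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
centroids = reducer((mapper(x, centroids) for x in X), k=2)
```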

26 Outliers

27 Clustering Algorithm: Gaussian Mixture Model

28 Density estimation. Estimate the density function P(X) given unlabeled datapoints X_1 to X_N. Example: an aircraft testing facility measures Heat and Vibration parameters for every newly built aircraft.

29 Mixture of Gaussians

30 Mixture Models. A density model p(x) may be multi-modal. We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians). Each mode may correspond to a different sub-population (e.g., male and female).

31 Gaussian Mixture Models (GMMs). Consider a mixture of K Gaussian components: $p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)$, where $\pi_k$ is the mixture proportion and $N(x \mid \mu_k, \Sigma_k)$ is the mixture component. This model can be used for unsupervised clustering. This model (fit by AutoClass) has been used to discover new kinds of stars in astronomical data, etc.
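Evaluating the mixture density $p(x) = \sum_k \pi_k N(x \mid \mu_k, \Sigma_k)$ is straightforward; the sketch below (assuming SciPy is available, with hand-picked illustrative parameters rather than fitted ones) shows a two-component example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative two-component mixture in 2-D
pis    = np.array([0.4, 0.6])                        # mixture proportions, sum to 1
mus    = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]

def gmm_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sig)
               for pi, mu, sig in zip(pis, mus, sigmas))

print(gmm_density([0.0, 0.0]))   # dominated by the first component
print(gmm_density([4.0, 4.0]))   # dominated by the second component
```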

32 Learning mixture models. In fully observed i.i.d. settings, the log likelihood decomposes into a sum of local terms: $\ell_c(\theta; D) = \sum_i \log p(x_i, z_i \mid \theta) = \sum_i \left( \log p(z_i \mid \theta_z) + \log p(x_i \mid z_i, \theta_x) \right)$. With latent variables, all the parameters become coupled together via marginalization: $\ell(\theta; D) = \sum_i \log \sum_{z_i} p(x_i, z_i \mid \theta) = \sum_i \log \sum_{z_i} p(z_i \mid \theta_z)\, p(x_i \mid z_i, \theta_x)$.

33 MLE for GMM. If we are doing MLE for completely observed data (Gaussian Naïve Bayes), the data log-likelihood is $\ell(\theta; D) = \log \prod_n p(z_n, x_n) = \sum_n \log p(z_n \mid \pi) + \sum_n \log N(x_n \mid \mu_{z_n}, \sigma^2)$. The MLE estimates are $\hat{\pi}_{k,\mathrm{MLE}} = \arg\max_{\pi} \ell(\theta; D)$, $\hat{\mu}_{k,\mathrm{MLE}} = \arg\max_{\mu} \ell(\theta; D)$, $\hat{\sigma}_{k,\mathrm{MLE}} = \arg\max_{\sigma} \ell(\theta; D)$; in closed form, $\hat{\pi}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k}{\text{number of datapoints}}$ and $\hat{\mu}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k x_n}{\sum_n z_n^k}$.

34 Learning GMM (z's are unknown)

35 Expectation Maximization (EM)

36 Expectation-Maximization (EM). Start: "Guess" the mean and covariance of each of the K Gaussians. Loop.

37

38 Expectation-Maximization (EM). Start: "Guess" the centroid and covariance of each of the K clusters. Loop.

39 The Expectation-Maximization (EM) Algorithm. E Step: guess values of the Z's: $w_j^{(t)}(x_i) = P(Z_i = j \mid x_i, \theta^{(t)}) = \frac{p(x_i \mid Z_i = j, \theta^{(t)})\, P(Z_i = j \mid \theta^{(t)})}{\sum_l p(x_i \mid Z_i = l, \theta^{(t)})\, P(Z_i = l \mid \theta^{(t)})}$, where $p(x_i \mid Z_i = j, \theta^{(t)}) = N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})$.

40 The Expectation-Maximization (EM) Algorithm. M Step: update the parameter estimates: $\pi_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)}{\#\,\text{datapoints}}$, $\mu_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)\, x_i}{\sum_i w_j^{(t)}(x_i)}$, $\Sigma_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)\,(x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^\top}{\sum_i w_j^{(t)}(x_i)}$.

41 EM Algorithm for GMM. E Step: guess values of the Z's: $w_j^{(t)}(x_i) = P(Z_i = j \mid x_i, \theta^{(t)}) = \frac{\pi_j^{(t)}\, N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}{\sum_l \pi_l^{(t)}\, N(x_i \mid \mu_l^{(t)}, \Sigma_l^{(t)})}$. M Step: update the parameter estimates $\pi_j^{(t+1)}$, $\mu_j^{(t+1)}$, $\Sigma_j^{(t+1)}$ as on the previous slide.
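The E and M updates on the last three slides translate almost line-for-line into NumPy. The sketch below is my own minimal implementation (full covariances, a fixed number of iterations, a small ridge on the covariances, no convergence check), assuming SciPy for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, iters=50, rng=None):
    """Fit a k-component GMM with EM; returns (pis, mus, sigmas, responsibilities)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    pis = np.full(k, 1.0 / k)
    mus = X[rng.choice(n, size=k, replace=False)].astype(float)
    sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(iters):
        # E step: responsibilities w_j(x_i) proportional to pi_j * N(x_i | mu_j, Sigma_j)
        w = np.column_stack([pis[j] * multivariate_normal.pdf(X, mus[j], sigmas[j])
                             for j in range(k)])
        w /= w.sum(axis=1, keepdims=True)
        # M step: update pi, mu, Sigma from the weighted data
        nk = w.sum(axis=0)
        pis = nk / n
        mus = (w.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mus[j]
            sigmas[j] = (w[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return pis, mus, sigmas, w

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])
pis, mus, sigmas, resp = em_gmm(X, k=2)
```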

42 K-means is a hard version of EM. In the K-means E-step we do hard assignment: $z_i^{(t)} = \arg\min_k \,(x_i - \mu_k^{(t)})^\top \left(\Sigma_k^{(t)}\right)^{-1} (x_i - \mu_k^{(t)})$. In the K-means M-step we update the means as the weighted sum of the data, but now the weights are 0 or 1: $\mu_k^{(t+1)} = \frac{\sum_i \delta(z_i^{(t)}, k)\, x_i}{\sum_i \delta(z_i^{(t)}, k)}$.

43 Soft vs. Hard EM assignments: GMM vs. K-Means

44 Theory underlying EM. What are we doing? Recall that according to MLE, we intend to learn the model parameters that would maximize the likelihood of the data. But we do not observe z, so computing $\ell(\theta; D) = \sum_i \log \sum_{z_i} p(x_i, z_i \mid \theta) = \sum_i \log \sum_{z_i} p(z_i \mid \theta_z)\, p(x_i \mid z_i, \theta_x)$ is difficult! What shall we do?

45 Intuition behind the EM algorithm

46 Jensen's Inequality. For a convex function f: $f(E[x]) \le E[f(x)]$. Similarly, for a concave function f: $f(E[x]) \ge E[f(x)]$.

47 Jensen's Inequality: concave f: $f(E[x]) \ge E[f(x)]$.

48 EM and Jensen's Inequality: $f(E[x]) \ge E[f(x)]$, applied with f = log, which is concave.
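The role Jensen's inequality plays in EM can be written out in a couple of lines: for any distribution $q(z)$ over the latent variables, concavity of the log gives a lower bound on the intractable log-likelihood,
$$
\ell(\theta; D) = \sum_i \log \sum_{z_i} q(z_i)\,\frac{p(x_i, z_i \mid \theta)}{q(z_i)} \;\ge\; \sum_i \sum_{z_i} q(z_i)\,\log \frac{p(x_i, z_i \mid \theta)}{q(z_i)},
$$
with equality when $q(z_i) = p(z_i \mid x_i, \theta)$. The E step tightens the bound by setting $q$ to this posterior, and the M step maximizes the bound over $\theta$.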

49 Advanced Topics

50 How Many Clusters? Either the number of clusters K is given (e.g., partition documents into a predetermined number of topics), or solve an optimization problem that penalizes the number of clusters; information-theoretic approaches use the AIC or BIC criteria for model selection. Tradeoff between having clearly separable clusters and having too many clusters.
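One concrete version of the information-theoretic option is to fit a GMM for each candidate K and keep the K with the lowest BIC. A hedged sketch, assuming scikit-learn is available (its GaussianMixture estimator exposes a bic method):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])

bics = {}
for k in range(1, 7):                        # candidate numbers of clusters
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gmm.bic(X)                     # lower BIC = better fit/complexity tradeoff

best_k = min(bics, key=bics.get)
print(bics, "-> chosen K =", best_k)
```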

51 Seed Choice: K-Means++. K-Means results can vary based on random seed selection. K-Means++: 1. Choose one center uniformly at random among the given datapoints. 2. For each data point x, compute D(x) = distance(x, nearest chosen center). 3. Choose one new data point at random as a new center, with probability proportional to D(x)^2. 4. Repeat steps 2 and 3 until K centers have been chosen. 5. Run standard K-Means with this centroid initialization.
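The seeding procedure above is a few lines of NumPy; the sketch below is my own transcription of it, and the resulting centers would then be passed to standard K-Means (e.g., the kmeans sketch given earlier).

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """K-Means++ seeding: sample new centers with probability proportional to D(x)^2."""
    rng = np.random.default_rng(0) if rng is None else rng
    centers = [X[rng.integers(len(X))]]              # step 1: first center uniformly
    while len(centers) < k:                          # repeat steps 2 and 3
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                        # P(x) proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
seeds = kmeans_pp_init(X, k=2)
```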

52 Semi-supervised K-Means

53 Supervised Learning vs. Unsupervised Learning vs. Semi-supervised Learning

54 Automatic Gloss Finding for a Knowledge Base. Glosses: natural-language definitions of named entities, e.g., "Microsoft is an American multinational corporation headquartered in Redmond that develops, manufactures, licenses, supports and sells computer software, consumer electronics and personal computers and services..." Input: a Knowledge Base, i.e., a set of concepts (e.g., Company) and entities belonging to those concepts (e.g., Microsoft), and a set of potential glosses. Output: candidate glosses matched to the relevant entities in the KB, e.g., "Microsoft is an American multinational corporation headquartered in Redmond" is mapped to the entity Microsoft of type Company. [Automatic Gloss Finding for a Knowledge Base using Ontological Constraints, Bhavana Dalvi Mishra, Einat Minkov, Partha Pratim Talukdar, and William W. Cohen, 2014, under submission]

55 Example: Gloss finding

56 Example: Gloss finding

57 Example: Gloss finding

58 Example: Gloss finding

59 Training a clustering model (clusters: Fruit, Company). Train: unambiguous glosses. Test: ambiguous glosses.

60 GLOFIN: Clustering glosses

61 GLOFIN: Clustering glosses

62 GLOFIN: Clustering glosses

63 GLOFIN: Clustering glosses

64 GLOFIN: Clustering glosses

65 GLOFIN: Clustering glosses

66 GLOFIN on NELL Dataset. [Bar chart: Precision, Recall and F1 for SVM, Label Propagation and GLOFIN.] Dataset: 247K candidate glosses across multiple categories, #train = 20K, #test = 227K.

67 GLOFIN on Freebase Dataset. [Bar chart: Precision, Recall and F1 for SVM, Label Propagation and GLOFIN.] Dataset: 285K candidate glosses across multiple categories, #train = 25K, #test = 260K.

68 Summary. What is clustering? What are similarity measures? K-Means clustering algorithm. Mixture of Gaussians (GMM). Expectation Maximization. Advanced topics: how to seed clustering, how to decide #clusters. Application: Gloss finding for a Knowledge Base.

69 Thank You. Questions?
