Instance-Based Learning and Clustering
- Tyler McDonald
- 5 years ago
1 Instance-Based Learning and Clustering
R&N 04, a bit of 03
2 Different kinds of Inductive Learning
Supervised learning. Basic idea: learn an approximation for a function y = f(x) based on labelled examples { (x1,y1), (x2,y2), ..., (xn,yn) }. E.g. decision trees, Bayes classifiers, instance-based learning methods.
Unsupervised learning.
Instance-based learning. Idea: for every test data point, search the database of training data for similar points and predict according to those points.
3 Instance-based learning
Idea: for every test data point, search the database of training data for similar points and predict according to those points.
Four elements of an instance-based learner:
1. How do we define similarity?
2. How many similar data points (neighbors) do we use?
3. (Optional) What weights do we give these neighbors?
4. How do we predict using these neighbors?
One-nearest-neighbor (1-NN): the simplest instance-based learning method. The four elements of 1-NN:
1. How do we define similarity? Euclidean distance metric.
2. How many similar data points (neighbors) do we use? One.
3. (Optional) What weights do we give these neighbors? Unused.
4. How do we predict using these neighbors? Predict the same value as the nearest neighbor.
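The four choices above pin 1-NN down completely, so it fits in a few lines. A minimal sketch (the function and variable names here are my own, not from the slides):

```python
import math

def euclidean(a, b):
    # similarity measure: Euclidean distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def one_nn_predict(training_data, query):
    # training_data: list of (feature_vector, label) pairs.
    # Predict the same value as the single nearest training point.
    _, label = min(training_data, key=lambda pair: euclidean(pair[0], query))
    return label
```

For example, `one_nn_predict([((0, 0), "A"), ((5, 5), "B")], (4, 4))` returns "B", since (5, 5) is the closest stored point.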
4 1-NN Prediction
Classification (predicting discrete-valued labels). [Figure: points of class A and class B; the test point is given the label of its nearest neighbor.]
5 1-NN Prediction
Classification (predicting discrete-valued labels), three classes. [Figure: background color indicates the prediction in different areas; solid lines are the decision boundaries between classes; ignore the dashed purple line.]
6 1-NN Prediction
Regression (predicting real-valued labels).
K-nearest-neighbor (K-NN): a generalization of 1-NN to multiple neighbors. The four elements of K-NN:
1. How do we define similarity? Euclidean distance metric.
2. How many similar data points (neighbors) do we use? K.
3. (Optional) What weights do we give these neighbors? Unused.
4. How do we predict using these neighbors? Classification: predict the majority label among the neighbors. Regression: predict the average value among the neighbors.
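The two K-NN prediction rules just described can be sketched directly: a majority vote for classification and an average for regression. Names are my own, and ties in the vote are broken arbitrarily:

```python
import math
from collections import Counter

def k_nearest(training_data, query, k):
    # labels/values of the k training points closest to the query
    by_distance = sorted(training_data, key=lambda p: math.dist(p[0], query))
    return [y for _, y in by_distance[:k]]

def knn_classify(training_data, query, k):
    # classification: predict the majority label among the k neighbors
    return Counter(k_nearest(training_data, query, k)).most_common(1)[0][0]

def knn_regress(training_data, query, k):
    # regression: predict the average value among the k neighbors
    values = k_nearest(training_data, query, k)
    return sum(values) / len(values)
```

With k=1 both functions reduce to 1-NN, which is why K-NN is a strict generalization.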
7 K-NN Prediction
Classification (K=3). [Figure: points of class A and class B; the test point is given the majority label of its three nearest neighbors.]
8 K-NN Prediction
Classification (K=5), three classes. [Figure: background color indicates the prediction in different areas; solid lines are the decision boundaries between classes; ignore the dashed purple line. A second panel contrasts the decision boundaries for K=1 and K=5.]
9 K-NN Prediction
Regression (with K=9). [Figure: a second panel contrasts the fitted curves for K=1 and K=9.]
10 Example: Recognition of handwritten digits
A 30-pixel by 20-pixel image gives a 600-dimensional data point. N sets of handwritten digit samples give 10 x N 600-dimensional training points. A new handwritten sample is classified by K-NN in 600-dimensional space on the training data. [Figure: each color represents samples of a particular digit.]
11 K-NN vs other techniques
Most instance-based methods work only for real-valued inputs. Instance-based methods do not need a training phase, unlike decision trees and Bayes classifiers. However, the nearest-neighbors-search step can be expensive for large or high-dimensional datasets. Instance-based learning is non-parametric, i.e. it makes no prior model assumptions. There is no foolproof way to pre-select K; one must try different values and pick one that works well. Problems of discontinuities and edge effects in K-NN regression can be addressed by introducing weights for data points that are proportional to closeness.
Unsupervised Learning (aka Clustering)
Unsupervised learning. Basic idea: learn an approximation for a function y = f(x) based on unlabelled examples { x1, x2, ..., xn }. The goal is to uncover distinct classes of data points (clusters), which might then lead to a supervised learning scenario. E.g. K-means, hierarchical clustering. The following slides are adapted from Andrew Moore's slides.
12 K-means
Even if we have no labels for a data set, there might still be interesting structure in the data in the form of distinct clusters/clumps. K-means is an iterative algorithm to find such clusters, given the assumption that exactly K clusters exist.
13-14 K-means (the algorithm, built up one step per slide)
1. Ask the user how many clusters they'd like (e.g. k=5).
2. Randomly guess k cluster Center locations.
3. Each datapoint finds out which Center it is closest to. (Thus each Center owns a set of datapoints.)
4. Each Center finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated!
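The six steps above are Lloyd's iteration, and a minimal sketch is below. The function name, the fixed iteration cap, and the handling of empty Centers are my own choices; a real implementation would stop once assignments no longer change:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    # points: list of equal-length coordinate tuples; returns k cluster Centers
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: randomly guess k Center locations
    for _ in range(iterations):
        # step 3: each datapoint finds out which Center it is closest to
        owned = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((pi - ci) ** 2
                                            for pi, ci in zip(p, centers[j])))
            owned[nearest].append(p)
        # steps 4-5: each Center jumps to the centroid of the points it owns
        for j, pts in enumerate(owned):
            if pts:  # leave a Center in place if it owns no points
                centers[j] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return centers
```

With k=1 this simply returns the centroid of the whole data set, which is a quick sanity check on the centroid step.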
15 K-means Questions
What is it trying to optimize? Are we sure it will terminate? Are we sure it will find an optimal clustering? How should we start it?
Distortion. Given an encoder function ENCODE : R^m -> [k] and a decoder function DECODE : [k] -> R^m, define

    Distortion = sum_{i=1}^{R} ( x_i - DECODE[ ENCODE(x_i) ] )^2

16 Distortion
We may as well write DECODE[j] = c_j, so

    Distortion = sum_{i=1}^{R} ( x_i - c_ENCODE(x_i) )^2

The Minimal Distortion. What properties must the centers c_1, c_2, ..., c_k have when the distortion is minimized?
17 The Minimal Distortion (1)
Property (1): x_i must be encoded by its nearest center. Why? Otherwise the distortion could be reduced by replacing ENCODE(x_i) by the index of the nearest center. So at the minimal distortion,

    ENCODE(x_i) = argmin_{j in {1,...,k}} ( x_i - c_j )^2

18 The Minimal Distortion (2)
Property (2): the partial derivative of the Distortion with respect to each center location must be zero. Grouping the sum by which Center owns each point,

    Distortion = sum_{i=1}^{R} ( x_i - c_ENCODE(x_i) )^2
               = sum_{j=1}^{k} sum_{i in OwnedBy(c_j)} ( x_i - c_j )^2

    d Distortion / d c_j = -2 sum_{i in OwnedBy(c_j)} ( x_i - c_j ) = 0   (for a minimum)

where OwnedBy(c_j) is the set of records owned by Center c_j.
19 Thus, at a minimum:

    c_j = ( 1 / |OwnedBy(c_j)| ) sum_{i in OwnedBy(c_j)} x_i

At the minimum distortion: (1) x_i must be encoded by its nearest center, and (2) each Center must be at the centroid of the points it owns.
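The objective above can be computed directly. A sketch (assuming, as in property (1), that each point is encoded by its nearest center; the function name is my own):

```python
def distortion(points, centers):
    # Distortion = sum over all points of the squared distance to the
    # nearest center, i.e. nearest-center encoding is used for ENCODE
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                     for c in centers)
    return total
```

For instance, points (0,) and (2,) with the single center (1,) give distortion 1 + 1 = 2; the center already sits at the centroid, so no one-center configuration does better, matching property (2).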
20 Improving a suboptimal configuration
What can be changed about the centers c_1, ..., c_k when the distortion is not minimized?
(1) Change the encoding so that each x_i is encoded by its nearest center.
(2) Set each Center to the centroid of the points it owns.
There is no point applying either operation twice in succession. But it can be profitable to alternate. And that is K-means!
21 Will we find the optimal configuration?
Not necessarily. Can you invent a configuration that has converged, but does not have the minimum distortion?
22 Trying to find good optima
Idea 1: Be careful about where you start. Idea 2: Do many runs of k-means, each from a different random start configuration. Many other ideas are floating around.
Other distance metrics. Note that we could have used the Manhattan distance metric instead of the one above. If so,

    Distortion = sum_{i=1}^{R} | x_i - c_ENCODE(x_i) |

How would you find the distortion-minimizing centers in this case?
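One answer to the question on the slide: with Manhattan distance, the summed absolute deviation in each coordinate is minimized by the per-coordinate median rather than the mean. A sketch (function name my own):

```python
import statistics

def manhattan_center(points):
    # the coordinate-wise median minimizes sum_i |x_i - c| in each coordinate,
    # so it plays the role the centroid plays under squared Euclidean distance
    return tuple(statistics.median(coord) for coord in zip(*points))
```

For the 1-D points 0, 1, 100 the median 1 gives total absolute deviation 100, while the mean (about 33.7) gives about 132.7, illustrating why the median is the right "center" here and also why it is more robust to outliers.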
23 Example: Image Segmentation
Once K-means is performed, the resulting cluster centers can be thought of as K labelled data points for 1-NN on the entire training set, such that each data point is labelled with its nearest center. This is called Vector Quantization.
24 Example: Image Segmentation
[Figure: vector quantization on pixel intensities; vector quantization on pixel colors.]
Common uses of K-means: often used as an exploratory data analysis tool. In one dimension, a good way to quantize real-valued variables into k non-uniform buckets. Used on acoustic data in speech understanding to convert waveforms into one of k categories (i.e. Vector Quantization). Also used for choosing color palettes on old-fashioned graphical display devices!
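Vector quantization as just described is exactly 1-NN with the K centers as the labelled set: each data point is replaced by (the index of) its nearest center. A sketch (names my own):

```python
def vq_encode(points, centers):
    # map each point to the index of its nearest center (its "codeword");
    # this is 1-NN classification where the centers are the training set
    def sq_dist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]
```

For example, quantizing grayscale pixel intensities against the two centers 10 and 200 sends dark pixels to codeword 0 and bright pixels to codeword 1, which is the intensity-segmentation picture described above.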
25-27 Single Linkage Hierarchical Clustering (built up one step per slide)
1. Say "every point is its own cluster".
2. Find the most similar pair of clusters.
3. Merge it into a parent cluster.
4. Repeat until you've merged the whole dataset into one cluster.
How do we define similarity between clusters? Minimum distance between points in the clusters, maximum distance between points in the clusters, or average distance between points in the clusters. You're left with a nice dendrogram, or taxonomy, or hierarchy of datapoints.
28 Hierarchical Clustering Comments
It's nice that you get a hierarchy instead of an amorphous collection of groups. If you want k groups, just cut the (k-1) longest links.
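The agglomerative procedure above can be sketched directly, using the minimum inter-point distance (single linkage) as the similarity and stopping once the requested number of groups remains, which corresponds to cutting the longest links. This is an O(n^3) illustrative version, not an efficient implementation:

```python
import math

def single_linkage(points, num_clusters):
    # 1. every point starts as its own cluster
    clusters = [[p] for p in points]

    def linkage(a, b):
        # single-link similarity: minimum distance between points in the clusters
        return min(math.dist(p, q) for p in a for q in b)

    # 2-4. repeatedly merge the most similar pair of clusters
    while len(clusters) > num_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Running it all the way down to one cluster and recording each merge would yield the dendrogram mentioned on the slides.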
More informationPattern Classification: An Improvement Using Combination of VQ and PCA Based Techniques
Ameran Journal of Appled Senes (0): 445-455, 005 ISSN 546-939 005 Sene Publatons Pattern Classfaton: An Improvement Usng Combnaton of and PCA Based Tehnques Alok Sharma, Kuldp K. Palwal and Godfrey C.
More informationReview: Fit a line to N data points
Revew: Ft a lne to data ponts Correlated parameters: L y = a x + b Orthogonal parameters: J y = a (x ˆ x + b For ntercept b, set a=0 and fnd b by optmal average: ˆ b = y, Var[ b ˆ ] = For slope a, set
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationComplete subgraphs in multipartite graphs
Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G
More informationSpectral Clustering. Shannon Quinn
Spectral Clusterng Shannon Qunn (wth thanks to Wllam Cohen of Carnege Mellon Unverst, and J. Leskovec, A. Raaraman, and J. Ullman of Stanford Unverst) Graph Parttonng Undrected graph B- parttonng task:
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationOther NN Models. Reinforcement learning (RL) Probabilistic neural networks
Other NN Models Renforcement learnng (RL) Probablstc neural networks Support vector machne (SVM) Renforcement learnng g( (RL) Basc deas: Supervsed dlearnng: (delta rule, BP) Samples (x, f(x)) to learn
More informationCSE 252C: Computer Vision III
CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationLine Drawing and Clipping Week 1, Lecture 2
CS 43 Computer Graphcs I Lne Drawng and Clppng Week, Lecture 2 Davd Breen, Wllam Regl and Maxm Peysakhov Geometrc and Intellgent Computng Laboratory Department of Computer Scence Drexel Unversty http://gcl.mcs.drexel.edu
More informationNatural Language Processing and Information Retrieval
Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More information