Algebraic and topological perspectives on semi-supervised learning


1 Algebraic and topological perspectives on semi-supervised learning
Mikael Vejdemo-Johansson and Primoz Skraba, Jozef Stefan Institute

2 Learning
We first introduce the three types of learning problems: unsupervised (exploratory), semi-supervised, and supervised.

3 Supervised Learning
Given fully labelled data, learn a classifier, a mapping, etc.
Example: given document correspondences, learn a map between corpora.

4 Supervised Learning: Wikipedia
Treat documents as vectors; the corpora are then low-dimensional subspaces. Find a map between the spaces using least-squares or related optimization techniques. This assumes a model (a linear space representation) and reduces learning to parameter fitting.

5 Unsupervised Learning
Exploratory data analysis: discover the structure of the data itself. Canonical example: clustering.

6 Semi-Supervised Learning
Broadly, semi-supervised learning sits between supervised and unsupervised learning: use the underlying structure of the data together with a limited amount of label data.

7 Semi-Supervised Learning Example
Given two classes and only two labelled points, find a classifier.

8 Semi-Supervised Learning Example
Given only the labelled points, this is a reasonable choice of classifier (figure).

9 Semi-Supervised Learning Example
But what if the unlabelled data looks like this (figure)? Is that still a reasonable choice?

10 Semi-Supervised Learning Example
Maybe a classifier that follows the structure of the unlabelled data is better (figure).

11 Goals
What is the topological structure/interpretation of semi-supervised learning? How can we take advantage of it? This talk concentrates on the first question.

12 Main Idea
Separate the structure information from the label information: if both are correct, they should be coherent with each other.

13 Clustering and Topology
Clusters are classes in 0-dimensional homology: the connected components of an underlying topological structure. Points belonging to the same cluster are close to, and hence connected to, each other.
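To make the slogan concrete, here is a minimal sketch (our own Python, not from the talk) of clustering as connected components: fix a scale eps, join points closer than eps, and read off the components of the resulting graph. The persistence machinery in the following slides is precisely what removes the need to fix eps by hand.

```python
import math

def epsilon_clusters(points, eps):
    """Clusters as connected components: vertices are points, and an edge
    joins x and y whenever d(x, y) <= eps. The components of this graph
    are the 0-dimensional homology classes of the eps-neighbourhood graph.
    Returns a list of clusters, each a list of point indices.
    """
    n = len(points)
    adj = [[j for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= eps]
           for i in range(n)]
    seen, clusters = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:  # depth-first search over the neighbourhood graph
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(comp)
    return clusters
```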

14 Structure of the Data
We assume the data takes the shape of a collection of observations with some measure of similarity. We will build clustering schemes that use persistent homology.

15 Structure of the Labels
Given: a data set X, a set of admissible labels L, and a partial labeling function X → L. The labels will act as relations, gluing clusters to each other, and as a consistency measure, helping to find clustering parameters.

16 Combining Labels with Data
Quotient spaces are a natural model. A quotient space in topology is a space with some points forcibly identified; the quotient relation dictates which parts are fused together. Here, we glue together points that are tagged with the same label.
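A hedged sketch of the combinatorial shadow of this quotient: a union-find structure in which all points carrying the same label are identified up front, before any geometric gluing happens. The function name and representation are ours, not the talk's.

```python
def glue_labels(n_points, labels):
    """Union-find sketch of the label quotient: all points carrying the
    same label are identified.

    `labels` maps a point index to its label (partial: most points absent).
    Returns a parent array encoding the quotient's point classes.
    """
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for i, lab in labels.items():
        if lab in first_seen:
            parent[find(i)] = find(first_seen[lab])  # glue to earlier point
        else:
            first_seen[lab] = i
    return parent
```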

17 Presentations
The algebraic structure of quotient spaces is captured by a presentation, an exact sequence

0 → R → G → M → 0

Persistent homology produces a vector space with extra structure: a persistence module. Persistence modules can be presented with generators and relations, where the generators correspond to data points and the relations to clustering co-occurrences. The labels produce additional relations beyond those generated by persistent homology.
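Spelled out (a standard reading of exactness, not additional content from the talk): M is presented as the quotient of the free module G on the generators by the image of the relations R.

```latex
0 \longrightarrow R \longrightarrow G \longrightarrow M \longrightarrow 0
\qquad\Longrightarrow\qquad
M \;\cong\; G \,/\, \operatorname{im}(R \to G)
```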

18 Information in the Map
Injectivity is crucial. The labels give a map L → M into the presented module

0 → R → G → M → 0

and this map from labels to clusters must be injective: only one type of label may map into any given generator/cluster.
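As a sketch (our own helper, assuming clusters and partial labels are given as a plain array and dict), the injectivity condition is a one-pass check that no cluster receives two distinct labels:

```python
def labels_coherent(cluster_of, label_of):
    """Check that the label-to-cluster map is injective.

    `cluster_of[i]` is point i's cluster index; `label_of` is the partial
    labeling (point index -> label). Coherence fails exactly when two
    distinct labels land in the same cluster.
    """
    seen = {}  # cluster -> the one label allowed to map into it
    for i, lab in label_of.items():
        c = cluster_of[i]
        if seen.setdefault(c, lab) != lab:
            return False  # two distinct labels in one cluster
    return True
```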

19 Example (figure: a good case)

20 Example (figure: the clustering)

21 Example (figure: labels and clustering)

22 Example (figure: an injective map)

23 Example (figure: a non-injective map)

24 Example (figure: to get injectivity, we may need a finer clustering)

25 Example (figure: a bad case, where the structure of the data and the labels are very incoherent; resolving this requires a very fine clustering)

26 Two Clustering Schemes
Two schemes, briefly described in the following slides: distance-based and density-based. Both will use persistence.

27 Distance Filtration
All points are introduced at the beginning. The edge xy is introduced when the filtration parameter exceeds d(x, y), possibly fusing two clusters.
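A minimal sketch of the 0-dimensional persistence of this filtration (Kruskal-style union-find; the names and representation are ours). Every point is born at parameter 0, and a cluster dies at the length of the edge that fuses it into another cluster:

```python
import math

def single_linkage_persistence(points):
    """0-dimensional persistence of the distance filtration.

    All points are born at filtration value 0; when the edge xy enters
    at d(x, y) and fuses two clusters, one of them dies at that value.
    Returns (birth, death) pairs; the surviving component is (0, inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))

    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge fuses two clusters
            diagram.append((0.0, d))
            parent[ri] = rj
    diagram.append((0.0, math.inf))  # one component survives forever
    return diagram
```

Cutting the resulting hierarchy at a parameter value between two bar lengths recovers a single-linkage clustering with the corresponding number of clusters.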

28 Density Filtration
Points are introduced in decreasing order of estimated density, with a globally fixed k. Each point is, at introduction, connected to its k nearest neighbours, possibly fusing clusters.
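A corresponding sketch for the density filtration (a simplified, quadratic-time reading of the slide; the superlevel-set convention for births and deaths and the helper names are our assumptions, with the density estimate supplied by the caller):

```python
import math

def density_filtration_persistence(points, density, k):
    """0-dimensional persistence of the density filtration (sketch).

    Points enter in decreasing order of estimated density; each point is
    connected at introduction to its k nearest neighbours among the
    points already present. When two clusters fuse, the one whose peak
    density is lower dies at the current density level.
    """
    order = sorted(range(len(points)), key=lambda i: -density[i])
    parent, peak = {}, {}
    diagram = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i], peak[i] = i, density[i]
        earlier = [j for j in parent if j != i]
        nbrs = sorted(earlier,
                      key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nbrs:
            ri, rj = find(i), find(j)
            if ri != rj:
                lo, hi = (ri, rj) if peak[ri] <= peak[rj] else (rj, ri)
                diagram.append((peak[lo], density[i]))  # lower peak dies
                parent[lo] = hi
    diagram.extend((peak[i], -math.inf)
                   for i in parent if find(i) == i)  # surviving clusters
    return diagram
```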

29 Stability
Both approaches obey stability theorems: if the points perturb by at most ε, the persistence diagrams perturb by at most ε. This will be important for estimating the reliability of labels.
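This is an instance of the standard bottleneck stability theorem of Cohen-Steiner, Edelsbrunner and Harer: perturbing each point by at most ε moves the distance function (and, under suitable hypotheses, the density estimate) by at most ε in the sup norm, so the bottleneck distance between the diagrams is bounded by

```latex
d_B\bigl(\operatorname{Dgm}(f),\, \operatorname{Dgm}(g)\bigr)
  \;\le\; \lVert f - g \rVert_\infty .
```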

30 Example: Four Gaussians
First example: data drawn from four Gaussians (figure).

31 Single Linkage (figure: persistence diagram)

32 Single Linkage (figure: persistence diagram; cutting at 4 clusters only yields outliers)

33 Single Linkage (figure: persistence diagram)

34 Single Linkage (figure: error rate vs. number of clusters; the metric is incoherent with the labels)

35 Density-based Persistence (figure)

36 Density-based Persistence (figure: persistence diagram)

37 Density-based Persistence (figure: persistence diagram; a more separated diagram is a good sign)

38 Density-based Persistence (figure: a good clustering)

39 Density-based Persistence (figure: error rate vs. number of clusters; the error rate drops at 4 clusters, and beyond that the clusterings are not possible to distinguish)

40 Uncertainty Regions
Fix an expected number M of clusters. Write p_k for the length of the k-th longest persistence bar: the lifetime of that feature. The gap d = p_M - p_{M+1} is a measure of the stability of the M most salient clusters: any perturbation by less than d/2 will create exactly M clusters with persistence at least (p_M + p_{M+1})/2, and there is a canonical bijection of cluster basins across any such perturbation.
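In code, the gap and the persistence threshold from this slide are a short computation (a sketch over a diagram given as (birth, death) pairs; the names are ours):

```python
def persistence_gap(diagram, M):
    """Stability gap for the M most salient clusters.

    Bar lengths are |death - birth|; infinite bars sort as maximally
    persistent. Returns (d, threshold): any perturbation below d/2
    leaves exactly M clusters with persistence at least `threshold`.
    """
    lengths = sorted((abs(death - birth) for birth, death in diagram),
                     reverse=True)
    d = lengths[M - 1] - lengths[M]
    threshold = (lengths[M - 1] + lengths[M]) / 2
    return d, threshold
```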

41 Uncertainty Regions
If we fix the number of clusters, we can find the unstable points (figure).

42 Uncertainty Regions (figure)

43 Soft Error Rates
Given: a data set X, partial label assignments, and an expected number M of clusters. By repeated clustering under perturbation, we can estimate, for each data point, the probability of membership in each of the stable basins: a function p: X → [0,1]^M. We then seek a partition of X into M clusters, c: X → [M], maximizing ∏_{x ∈ X} p(x)_{c(x)}, the probability of this particular assignment, over all partitions that separate distinct label assignments. Underlying idea: we may have to guess that unstable points have been mis-labelled.
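A Monte Carlo sketch of the membership estimate (the clustering routine `cluster_fn` and the matching of basins across runs via the canonical bijection of slide 40 are assumptions supplied by the caller, not defined here):

```python
import random

def basin_membership(points, cluster_fn, M, trials=100, eps=0.01):
    """Estimate p: X -> [0,1]^M by repeated clustering under perturbation.

    `cluster_fn(points, M)` must return a basin index in 0..M-1 per
    point, consistently matched across runs (the canonical bijection of
    stable basins, valid for perturbations below half the gap d).
    Returns p[i][m]: the fraction of runs in which point i lands in m.
    """
    n = len(points)
    counts = [[0] * M for _ in range(n)]
    for _ in range(trials):
        jittered = [tuple(c + random.uniform(-eps, eps) for c in pt)
                    for pt in points]
        for i, m in enumerate(cluster_fn(jittered, M)):
            counts[i][m] += 1
    return [[c / trials for c in row] for row in counts]
```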

44 Three Circles
A second example: three circles (figure).

45 Single Linkage (figure: persistence diagram; in this case the metric is good)

46 Single Linkage (figure: error rate vs. number of clusters)

47 Density-based (figure: persistence diagram)

48 Density-based (figure: density estimates for three choices of bandwidth)

49 Density-based (figure: persistence diagrams for the three bandwidths)

50 Density-based Persistence (figure: error rate vs. number of clusters at different bandwidths)

51 Choice of Scale (figures: N = 5 and N = 7)

52 Behavior with 10% Labels (figures: error rate vs. number of clusters, density based, 10% labels)

53 Behavior with 5% Labels (figures: error rate vs. number of clusters, density based, 5% labels)

54 Next Steps
Parameter selection: automatically pick the best clustering that maintains injectivity. Testing metrics with limited labels. Penalizing unlabeled data.

55 Thank you for listening
