Random Subspace NMF for Unsupervised Transfer Learning



Ievgen Redko & Younès Bennani
Université Paris 13 - Institut Galilée - Sorbonne Paris Cité
Laboratoire d'informatique de Paris-Nord - CNRS (UMR 7030)


2 What is Transfer Learning?

- Transfer learning
  - Given a source domain $D_S$ and a learning task $T_S$, a target domain $D_T$ and a target task $T_T$, transfer learning aims to improve the learning performance in $D_T$ using knowledge gained from $D_S$ and $T_S$, where $D_S \neq D_T$ and $T_S \neq T_T$.
- Subspace paradigm
  - Simultaneously cluster the data into multiple subspaces and find a lower-dimensional subspace fitting each group of points.

[Figure: Source domain, Target domain, Transfer Learning]

3 Transfer Learning vs Traditional ML

4 What is Matrix Factorization?

Nonnegative Matrix Factorization

Theorem: K-means $\Leftrightarrow$ NMF

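5 Preliminary knowledge

- Standard NMF: $X \approx F G^T$, with $X \in \mathbb{R}^{m \times n}$, $F \in \mathbb{R}^{m \times k}$, $G^T \in \mathbb{R}^{k \times n}$
- Convex NMF (the column vectors of $F$ lie within the column space of $X$): $X \approx X W G^T$, with $X \in \mathbb{R}^{m \times n}$, $W \in \mathbb{R}^{n \times k}$, $G^T \in \mathbb{R}^{k \times n}$
- Multilayer NMF (a system built from $L$ layers, i.e. a cascade connection of $L$ mixing subsystems): $X \approx F_1 G_1$, with $X \in \mathbb{R}^{m \times n}$, $F_1 \in \mathbb{R}^{m \times k}$, $G_1 \in \mathbb{R}^{k \times n}$; $G_{i-1} \approx F_i G_i$ for $i = 1, \dots, L$ (taking $G_0 = X$), so that $X \approx F_1 F_2 \cdots F_L G_L$.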
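The standard NMF model can be fitted in several ways, and the slides do not say which solver the authors used. The following is a minimal sketch, assuming a Frobenius-norm objective and Lee-Seung multiplicative updates; the function name `nmf` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (m x n) as F @ G.T, with F (m x k) and G (n x k)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps
    G = rng.random((n, k)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative
        # (assumed solver, not taken from the slides).
        F *= (X @ G) / (F @ (G.T @ G) + eps)
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)
    return F, G

rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))   # nonnegative, rank 3
F, G = nmf(X, k=3)
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
labels = G.argmax(axis=1)   # one hard cluster label per column of X
```

Reading `G` as a soft partition and taking the row-wise arg-max is one common way to turn the factorization into cluster labels.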

6 Our approach: RS-NMF

- Find the initial partition and prototype matrix of the target task (clustering of the target task): $X_T \approx X_T W_T G_T^T$
- Build a sequence of partitions in different subspaces of the source task ("knowledge decomposition"): $\{X_{ss_i}\}_{i=1}^{M} \rightarrow \{G_i\}_{i=1}^{M}$
- Find the $k$ nearest neighbors among them with respect to the target partition: $\{G_i\}_{i=1}^{k} = N_k(G_T)$
- Find the link matrices between them: $\{W_i\}_{i=1}^{k}$, giving the set $\{P_T, \{W_i\}_{i=1}^{k}\}$
- Use these link matrices to perform the final factorization: $X_T \approx P_T W_1 \cdots W_k G_T^*$

7 Initialization

- Let us consider two tasks $T_S$ and $T_T$ defined by two matrices $X_S$ and $X_T$.
- We perform Convex NMF on $X_T$: $X_T \approx X_T W_T G_T^T$
  - $G_T$ is an initial partition.
  - $P_T = X_T W_T$ is a matrix of basis vectors that are linear combinations of the original data points.

8 Knowledge decomposition

- Randomly choose $m$ features of $X_S$, repeat this $M$ times, and perform any type of NMF on the resulting sequence of reduced matrices $\{X_{ss_i}\}_{i=1}^{M}$.
- Obtain a sequence of partition matrices $\{G_i\}_{i=1}^{M}$ computed on the subspaces of $X_S$.
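The knowledge decomposition step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: features are taken to be the rows of $X_S$, each subset is drawn without replacement, and a plain multiplicative-update NMF stands in for "any type of NMF". The names `nmf_partition`, `random_subspace_partitions` and `n_sub` are hypothetical.

```python
import numpy as np

def nmf_partition(X, k, n_iter=200, eps=1e-9, seed=0):
    """Return the partition matrix G (k x n) from X ~ F @ G."""
    rng = np.random.default_rng(seed)
    F = rng.random((X.shape[0], k)) + eps
    G = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        # Assumed solver: multiplicative updates for the Frobenius objective.
        F *= (X @ G.T) / (F @ (G @ G.T) + eps)
        G *= (F.T @ X) / ((F.T @ F) @ G + eps)
    return G

def random_subspace_partitions(X_S, k, n_sub, M, seed=0):
    """Factorize M random feature subsets of X_S (features x samples)."""
    rng = np.random.default_rng(seed)
    partitions = []
    for i in range(M):
        # Assumption: each subspace keeps n_sub distinct feature rows.
        rows = rng.choice(X_S.shape[0], size=n_sub, replace=False)
        partitions.append(nmf_partition(X_S[rows, :], k, seed=seed + i))
    return partitions

X_S = np.random.default_rng(2).random((40, 25))   # toy source data
Gs = random_subspace_partitions(X_S, k=3, n_sub=10, M=5)
```

Every partition in `Gs` has the same shape (clusters by samples) regardless of which features were drawn, which is what makes them comparable in the next step.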

9 Random Subspace NMF: purity values

10 Defining neighborhood

- Use any similarity measure (a divergence measure or a simple correlation function) to find the $k$ nearest neighbors of the target task's partition $G_T$: $\{G_i\}_{i=1}^{k} = N_k(G_T)$
- We use a simple correlation function given by the following expression: $\operatorname{corr}(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$
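A minimal sketch of the neighbor selection, assuming the correlation is computed between flattened partition matrices of identical shape (the slides do not say how matrices are compared, nor how differing sample counts or permuted cluster order are handled). The helper name `k_nearest_partitions` is hypothetical.

```python
import numpy as np

def k_nearest_partitions(G_T, candidates, k):
    """Indices and scores of the k candidates most correlated with G_T."""
    target = G_T.ravel()
    # Pearson correlation cov(x, y) / (sigma_x * sigma_y) on flattened matrices
    # (assumption: all candidates have the same shape as G_T).
    scores = np.array([np.corrcoef(target, G.ravel())[0, 1] for G in candidates])
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
G_T = rng.random((3, 25))
candidates = [
    G_T + 0.05 * rng.standard_normal(G_T.shape),  # noisy copy
    rng.random((3, 25)),                          # unrelated
    2.0 * G_T + 1.0,                              # perfectly correlated
    1.0 - G_T,                                    # perfectly anti-correlated
]
order, scores = k_nearest_partitions(G_T, candidates, k=2)
```

Because the rows of a partition matrix can come out in a different cluster order for each factorization, a practical version might first align rows before correlating; that alignment is left out here.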

11 Learning link matrices

- At this step we take each of the chosen matrices and perform an NMF of the following form: $G_i \approx W_i G_i^*$, with $G_i \in \mathbb{R}^{k \times n}$, $W_i \in \mathbb{R}^{k \times k}$, $G_i^* \in \mathbb{R}^{k \times n}$, $i = 1, \dots, k$.
- The idea behind constructing this sequence of link matrices $\{W_i\}_{i=1}^{k}$ is that they capture the relationships between clusters and thus reflect the structure of a data set.

12 Final decomposition

- Finally we have the set of matrices $\{P_T, \{W_i\}_{i=1}^{k}\}$.
- Performing Multilayer NMF of the following form, $X_T \approx P_T W_1 \cdots W_k G_T^*$, gives the final partition $G_T^*$.
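One way to read the final step is to hold $P_T$ and the link matrices fixed and solve only for a nonnegative $G_T^*$. That reading is an assumption, since the slides do not say which factors are updated; the sketch below uses a multiplicative update for the Frobenius objective, and `final_partition` is a hypothetical name.

```python
import numpy as np
from functools import reduce

def final_partition(X_T, P_T, links, n_iter=300, eps=1e-9, seed=0):
    """Solve X_T ~ P_T @ W_1 @ ... @ W_k @ G for nonnegative G, other factors fixed."""
    # Collapse the fixed layers into a single basis matrix B = P_T W_1 ... W_k.
    B = reduce(np.matmul, links, P_T)
    rng = np.random.default_rng(seed)
    G = rng.random((B.shape[1], X_T.shape[1])) + eps
    for _ in range(n_iter):
        # Assumed solver: multiplicative update, requires nonnegative X_T and B.
        G *= (B.T @ X_T) / ((B.T @ B) @ G + eps)
    return G

rng = np.random.default_rng(3)
X_T = rng.random((12, 20))                        # toy target data
P_T = rng.random((12, 3))                         # stand-in for X_T @ W_T
links = [rng.random((3, 3)) for _ in range(2)]    # stand-ins for learned W_i
G_star = final_partition(X_T, P_T, links)
labels = G_star.argmax(axis=0)   # hard cluster label per target sample
```

The toy `P_T` and `links` are random placeholders; in the pipeline described above they would come from the initialization and link-matrix steps.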

13 Evaluation criteria

- Dunn's index ($k$ denotes the number of clusters, $i$ and $j$ are cluster labels, $d(c_i, c_j)$ is the between-cluster distance between clusters $X_i$ and $X_j$, and $d(X_l)$ is the within-cluster distance of cluster $X_l$):
  $\mathrm{Dunn} = \min_{1 \le i \le k} \left\{ \min_{j \ne i} \left( \dfrac{d(c_i, c_j)}{\max_{1 \le l \le k} d(X_l)} \right) \right\}$
- Calinski-Harabasz index ($S_B$ is the between-cluster scatter matrix, $S_W$ is the internal scatter matrix, $n_p$ is the number of clustered samples and $k$ is the number of clusters):
  $\mathrm{CH} = \dfrac{\operatorname{trace}(S_B)}{\operatorname{trace}(S_W)} \cdot \dfrac{n_p - 1}{n_p - k}$
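Dunn's index depends on how the two distances are defined, which the slide leaves open. The sketch below assumes Euclidean distance, the smallest pairwise distance between two clusters as the between-cluster distance, and the cluster diameter as the within-cluster distance; `dunn_index` is a hypothetical helper.

```python
import numpy as np

def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance over largest cluster diameter."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    clusters = [points[labels == c] for c in np.unique(labels)]

    def pairwise(a, b):
        # Euclidean distances between every row of a and every row of b.
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # Assumed within-cluster distance: the diameter (largest pairwise distance).
    diameter = max(pairwise(c, c).max() for c in clusters)
    # Assumed between-cluster distance: the closest pair across two clusters.
    separation = min(
        pairwise(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return separation / diameter

pts = np.array([[0.0], [1.0], [10.0], [11.0]])
value = dunn_index(pts, [0, 0, 1, 1])   # separation 9, largest diameter 1
```

Higher values indicate compact, well-separated clusters. The function needs at least two clusters and at least one cluster with more than one distinct point, otherwise the ratio is undefined.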

14 Dunn's index for transfer between different data sets

15 Calinski-Harabasz index for transfer between different data sets

16 WHY DOES IT WORK?

- Sparse Matrix Factorization [B. Neyshabur and R. Panigrahy, 2013]: for a given binary matrix $Y$, minimizing the total sparsity of the decomposition $Y = \operatorname{sign}(X_1 \operatorname{sign}(X_2 \operatorname{sign}(\cdots X_n)))$ is equivalent to the computations in a deep neural network where each $X_i$ corresponds to the $i$-th layer.
- Learning link matrices can be seen as learning nonnegative encoders between the target and the chosen partitions (i.e. injecting auxiliary knowledge into the corresponding layer of a deep neural network).

17 WHY DOES IT WORK ON THESE DATA SETS?

- Common assumption: transfer learning is useful only for closely related data sets [Rosenstein et al., 2004]. Is it really so?
- Transfer learning using Kolmogorov complexity [M. M. Mahmud and S. R. Ray, 2007]:
  - Performing transfer learning between data sets with a very tenuous connection
  - Introducing an optimal transfer learning algorithm based on the universal distance to measure the relatedness between tasks

18 Future extensions of the algorithm

- Multitask transfer learning extension of the algorithm
- Finding a possible range of suboptimal values of $k$ beforehand, depending on the minimum correlation level
- Introducing additional constraints on link matrices (regularization terms, orthogonality constraints, etc.)

19 Thank you for your attention! Feel free to ask questions if you have any.
