Multi-Label Informed Latent Semantic Indexing

Size: px

Start display at page:

Download "Multi-Label Informed Latent Semantic Indexing"

Aubrey Walton
5 years ago
Views:

Multi-Label Informed Latent Semantic Indexing Shipeng Yu 12 Joint work with Kai Yu 1 and Volker Tresp 1 August 2005 1 Siemens Corporate

1 Multi-Label Informed Latent Semantic Indexing Shipeng Yu 12 Joint work with Kai Yu 1 and Volker Tresp 1 August Siemens Corporate Technology Department of Neural Computation 2 University of Munich Institute for Computer Science First Prev Page 1 Next Last Go Back Full Screen Close Quit

2 Outline Motivation Latent Semantic Indexing Multi-label Informed Latent Semantic Indexing (MLSI) Experimental Results Conclusion and Future works First Prev Page 2 Next Last Go Back Full Screen Close Quit

3 Motivation We are dealing with high-dimensional data in information retrieval. A typical text corpus has more than 10,000 features (words as features)! What are the problems? Noisy features: Effective features are small Learnability: curse of dimensionality Inefficiency: Computational cost is too high How to solve these problems? Dimensionality Reduction Feature selection: Select part of the features Latent semantic indexing (LSI): Learn a feature transformation from high-dimensional input space to a low-dimensional latent space First Prev Page 3 Next Last Go Back Full Screen Close Quit

4 Why MLSI LSI is unsupervised: Unable to use prior knowledge or label information The indexing is not necessarily related to classification tasks We want to have a feature transformation method that can Incorporate label information elegantly Derive both linear and non-linear mappings Explore the dependency between multiple categories This leads to Multi-label informed Latent Semantic Indexing (MLSI). First Prev Page 4 Next Last Go Back Full Screen Close Quit

5 Before We Start... Some notations: We have N documents Document i is denoted as x i X R M Output for the ith document is denoted as y i Y R L X R N M, Y R N L contain the input and output data as follows: X = x 11 x 1M... = xt 1., Y = y 11 y 1L... = yt 1. x N1 x NM x T N y N1 y NL y T N We aim to derive a mapping Ψ : X V such that V R K, K < M First Prev Page 5 Next Last Go Back Full Screen Close Quit

6 Outline Motivation Latent Semantic Indexing Multi-label Informed Latent Semantic Indexing Experimental Results Conclusion and Future works First Prev Page 6 Next Last Go Back Full Screen Close Quit

7 Latent Semantic Indexing LSI finds the best rank-k approximation to the data matrix X. This can be equivalently solved by singular value decomposition (SVD) of X: X = VΣU T We can sort diagonal entries of Σ in decreasing order U = [u 1,..., u K ] gives the K mapping directions Problem: How to incorporate label information into the mappings? First Prev Page 7 Next Last Go Back Full Screen Close Quit

8 Optimization Problem of LSI Alternatively, LSI minimizes the reconstruction error of input data: min A,V X VA 2 F s.t. V T V = I, with V R N K the latent factors, and A R K M the factor loadings. First Prev Page 8 Next Last Go Back Full Screen Close Quit

9 MLSI In MLSI we are minimizing the reconstruction errors of both X and Y: min A,B,V s.t. (1 β) X VA 2 F + β Y VB 2 F V T V = I, V = XW. MLSI is biased by the outputs Y MLSI minimizes the inter-correlation between X and Y MLSI minimizes the intra-correlation within Y (if multiple outputs) First Prev Page 9 Next Last Go Back Full Screen Close Quit

10 Outline Motivation Latent Semantic Indexing Multi-label Informed Latent Semantic Indexing Primal form: Linear mappings Dual form: Non-linear mappings Experimental Results Conclusion and Future works First Prev Page 10 Next Last Go Back Full Screen Close Quit

11 Solution of MLSI The optimization problem is min A,B,V s.t. (1 β) X VA 2 F + β Y VB 2 F V T V = I, V = XW. Following standard Lagrange formulism, we obtain, at the optimum, A and B solely depend on V: A = V T X, B = V T Y. Denote K := (1 β)xx T + βyy T, the minimum value is N i=k+1 λ i. We only need to optimize W since V = XW. First Prev Page 11 Next Last Go Back Full Screen Close Quit

12 MLSI: Primal Form Denote W = [w 1,..., w K ], we turn to an equivalent problem w.r.t. w: max w R M w T X T KXw s.t. w T X T Xw = 1. This leads to the primal form of the MLSI solution: Calculate K = (1 β)xx T + βyy T ; Solve a generalized eigenvalue problem X T KXw = λx T Xw, obtain eigenvectors w 1,..., w K with largest K eigenvalues λ 1... λ K ; Form mapping functions ψ j (x) = λ j w T j x, j = 1,..., K, and finally Ψ(x) = [ψ 1 (x),..., ψ K (x)] T defines the mapping Ψ. MLSI recovers LSI when β = 0. First Prev Page 12 Next Last Go Back Full Screen Close Quit

13 MLSI: Dual Form Dual form is obtained by applying representer theorem and define dual variable α as w = X T α. This leads to the equivalent dual form with respect to α: max α R N α T K x KK x α s.t. α T K 2 xα = 1. K x = XX T, K y = YY T, K = (1 β)k x + βk y. This is a simpler problem for N < M. First Prev Page 13 Next Last Go Back Full Screen Close Quit

14 Primal versus Dual Which form to choose in real world applications? Primal MLSI solves an M M generalized eigenvalue problem more efficient when M < N can only learn a linear mapping for X Dual MLSI solves an N N generalized eigenvalue problem more efficient when N < M (usually true for text data) can learn non-linear mappings using kernel trick First Prev Page 14 Next Last Go Back Full Screen Close Quit

15 Connection to Related Work MLSI is more general to other supervised projection methods. Fisher Discriminant Analysis (FDA) Only deal with binary classification problem Can only handle one output Canonical Correlation Analysis (CCA) Only minimize the correlation between X and Y Ignore intrinsic correlations of both X and Y Partial Least Square (PLS) A penalized CCA Can not generalize well to new data First Prev Page 15 Next Last Go Back Full Screen Close Quit

16 Outline Motivation Latent Semantic Indexing Multi-label Informed Latent Semantic Indexing Experimental Results Conclusion and Future works First Prev Page 16 Next Last Go Back Full Screen Close Quit

17 Experiment Setup The Goal: Evaluate indexing methods for multi-label classification. Data sets Reuters-21578: 1600 documents with 6076 words, 47 categories RCV1: 3588 documents with 5496 words, 79 categories Preprocessing Take categories with at least 50 documents Pick up words that occur at least 5 times in documents Use TFIDF features First Prev Page 17 Next Last Go Back Full Screen Close Quit

18 Methodology We compare three methods: Full Features: Use all features to do classification LSI: Classification with new unsupervised features MLSI: Classification with new supervised features We test two settings for each data set: Setting (I): We pick up 70% categories for classification and employ 5-fold cross-validation with one fold training and 4 folds testing Setting (II): Evaluate the classification performance on the rest 30% categories for previously unseen data with newly derived features First Prev Page 18 Next Last Go Back Full Screen Close Quit

19 Results for Reuters MLSI is significantly better than LSI. First Prev Page 19 Next Last Go Back Full Screen Close Quit

20 Results for RCV1 MLSI is significantly better than Full Features in setting (II). First Prev Page 20 Next Last Go Back Full Screen Close Quit

21 Sensitivity of β for MLSI setting (I) setting (II) First Prev Page 21 Next Last Go Back Full Screen Close Quit

22 Outline Motivation Latent Semantic Indexing Multi-label Informed Latent Semantic Indexing Experimental Results Conclusion and Future works First Prev Page 22 Next Last Go Back Full Screen Close Quit

23 Conclusion MLSI has the following advantages: It is supervised and incorporates label information It considers both the inter-correlation between X and Y, and the intracorrelation of Y Both linear and non-linear mappings are easy to derive It handles multiple outputs simultaneously It takes LSI as a special case (when β = 0) Experimental results are very encouraging. First Prev Page 23 Next Last Go Back Full Screen Close Quit

24 Future Works Compare with other supervised projection methods Automatically set parameter β Try larger data sets Apply the indexing to information retrieval tasks First Prev Page 24 Next Last Go Back Full Screen Close Quit

Multi-Label Informed Latent Semantic Indexing

Multi-Label Informed Latent Semantic Indexing Kai Yu, Shipeng Yu, Volker Tresp Siemens AG, Corporate Technology, Information and Communications, Munich, Germany Institute for Computer Science, University