Multi-Task Co-clustering via Nonnegative Matrix Factorization

Saining Xie, Hongtao Lu and Yangcheng He
Shanghai Jiao Tong University

Abstract

Recent results have empirically shown that, given several related tasks with different data distributions and an algorithm that can utilize both task-specific and cross-task knowledge, the clustering performance of each task can be significantly enhanced. This kind of unsupervised learning method is called multi-task clustering. We tackle the multi-task clustering problem via a 3-factor nonnegative matrix factorization. The objective of our approach consists of two parts: (1) within-task co-clustering: co-cluster the data in the input space of each task individually; (2) cross-task regularization: learn and refine the relations of the feature spaces among different tasks. We show that our approach has a sound information-theoretic background, and experimental evaluation shows that it outperforms many state-of-the-art single-task and multi-task clustering methods.

1. Introduction

Many real-world problems can be seen as a series of related, yet self-contained tasks. One good example, also illustrated in [8]: given web pages from several (e.g., four) different universities, we aim to classify (cluster) these pages into categories such as Student, Faculty, Project, and Course. Regarding the categorization (clustering) of each school's dataset as a single task, we then have four tasks. These tasks are closely related through the similar contents and common vocabulary they share, so it is natural to tackle them simultaneously rather than follow the more traditional approach of dealing with each task independently of the others.

The remaining question is how to handle the relations between different tasks. We cannot simply merge all tasks together and use traditional methods, because the data distributions differ. This situation leads to an important approach to machine learning, multi-task learning [2], in which how to characterize task relations becomes the most fundamental concern. Representative works include [12] and [6]. Based on the same philosophy, researchers have recently begun to focus on the unsupervised version of multi-task learning. In [9] and [8], a subspace learning method and a kernel method were proposed to explicitly tackle the multi-task clustering problem; both are derived from supervised multi-task learning methods. However, they use only the raw term features for text clustering tasks, which is sometimes insufficient. Intuitively, instead of using the raw terms and clustering only the instances (documents), we would like to cluster the features (words) at the same time, because the feature clusters, which represent concepts, are more stable among different tasks; this idea is called co-clustering. Following the information-theoretic co-clustering approach of [4], Self-taught Clustering (STC) [3] was proposed, whose objective is to minimize the loss in mutual information before and after clustering and to use the common feature clusters as a bridge for knowledge transfer. STC can be regarded as a domain adaptation method for clustering, a problem similar to multi-task clustering. However, STC handles only two tasks, i.e., the target data and the auxiliary data, and only the clustering performance on the target data is evaluated.
In this paper, based on the above observations, we follow the basic idea of information-theoretic co-clustering on both the instance space and the feature space, and propose a novel algorithm for multi-task clustering. Instead of using the original iterative algorithm for information-theoretic co-clustering, we show that the problem can also be solved in the Nonnegative Matrix Factorization (NMF) framework [10]. Because real-world data and their probability distributions are nonnegative, our model is interpretable compared to others. Furthermore, many applications exploit the (co-)clustering aspect of NMF for the nice characteristics shown in [15], [17]. In detail, we explore 3-factor NMF based on the KL-divergence [5]. A similar idea was also used in [11], [13].

2. The Proposed Method

2.1 Problem Formulation

Suppose we are given several clustering tasks X^{(1)}, X^{(2)}, ..., X^{(i)}, ...; each task can be regarded as a discrete random variable. The i-th task X^{(i)} takes values from the set {x^{(i)}_1, ..., x^{(i)}_{n_i}}, where n_i is the number of instances in the i-th task. Let Z^{(i)} be the discrete random variable, taking values from the set {z^{(i)}_1, ..., z^{(i)}_d}, that corresponds to the feature space of the i-th task with dimensionality d. We assume that the dimensionality of the feature vector is the same for all tasks; the bag-of-words model used in our experiments performs this augmentation automatically by padding zeros. Denote the clustering functions as C_x^{(i)}: X^{(i)} -> \tilde{X}^{(i)} and C_z^{(i)}: Z^{(i)} -> \tilde{Z}^{(i)}; \tilde{X}^{(i)} and \tilde{Z}^{(i)} are used to denote these two functions for brevity. The goal of multi-task clustering is to partition the data set X^{(i)} of each task into c clusters {\tilde{x}^{(i)}_j}_{j=1}^{c}. We assume that the number of clusters is the same in every task, as is also assumed in the existing multi-task literature.

2.2 Objective Function

We first review the preliminaries of information-theoretic co-clustering (ITCC). The mutual information I(X; Z) between two random variables X and Z is a fundamental measure of the information X contains about Z (and vice versa). ITCC judges the quality of a co-clustering by the resulting loss in mutual information,

    I(X; Z) - I(\tilde{X}; \tilde{Z}).    (1)

Definition 1. Let q(X, Z) denote the joint probability distribution of X and Z with respect to the co-clusterings C_x(X) and C_z(Z); formally,

    q(x, z) = p(\tilde{x}, \tilde{z}) \, p(x \mid \tilde{x}) \, p(z \mid \tilde{z}),    (2)

where p(\tilde{x}, \tilde{z}) is defined as

    p(\tilde{x}, \tilde{z}) = \sum_{x \in \tilde{x}} \sum_{z \in \tilde{z}} p(x, z).    (3)

Lemma 2. When the co-clustering functions C_x^{(i)}(X) and C_z^{(i)}(Z) are fixed, the objective function of ITCC in equation (1) can be reformulated as

    I(X; Z) - I(\tilde{X}; \tilde{Z}) = D(p(X, Z) \,\|\, q(X, Z)),    (4)

where D(\cdot \| \cdot) denotes the Kullback-Leibler (KL) divergence, also known as relative entropy, D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}. The proof can be found in [4].

Based on the above preliminaries, we model the objective function of our method as

    J = \sum_i \big[ I(X^{(i)}; Z^{(i)}) - I(\tilde{X}^{(i)}; \tilde{Z}^{(i)}) \big] + \lambda \sum_{i<j} \big[ I(Z^{(i)}; Z^{(j)}) - I(\tilde{Z}^{(i)}; \tilde{Z}^{(j)}) \big].    (5)

[Figure 1. A simple illustration of our model (document spaces X1-X4 with their word spaces Z1-Z4).]

By Lemma 2, we reformulate the function as

    J = \sum_i D\big(p(X^{(i)}, Z^{(i)}) \,\|\, q(X^{(i)}, Z^{(i)})\big) + \lambda \sum_{i<j} D\big(p(Z^{(i)}, Z^{(j)}) \,\|\, q(Z^{(i)}, Z^{(j)})\big).    (6)

Let p(X^{(i)}, Z^{(i)}) be the joint probability distribution with respect to the i-th task. We denote it as a matrix P^{(i)} of size d x n_i, whose rows are indexed by words and whose columns by documents. Let p(Z^{(i)}, Z^{(j)}) be the joint probability distribution with respect to the i-th and j-th tasks. It is a matrix of size d x d and describes the similarity of the words in the raw vocabulary. We use W^{(i,j)} to denote this matrix; its entry W^{(i,j)}_{w_1,w_2} is the joint probability of the co-occurrence of words w_1 and w_2 between tasks i and j. Both matrices can be estimated from data observations.

Note that our model contains two parts working simultaneously: first, task-specific co-clustering, in which we co-cluster the data of each task individually; second, cross-task regularization, in which we mine and refine the relations between the feature clusters of all the tasks. In information-theoretic terms, this objective minimizes the loss in mutual information (MI), before and after co-clustering, both between the instance space and the feature space of each task and between any two feature spaces from different tasks.
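To make Lemma 2 concrete, the short NumPy helper below computes the loss in mutual information for fixed hard co-clustering assignments by building q(x, z) from equations (2) and (3). It is an illustrative sketch of our own (the name itcc_loss and its interface are not from the paper):

import numpy as np

def itcc_loss(P, row_labels, col_labels, eps=1e-12):
    """Loss in mutual information, I(X;Z) - I(X~;Z~), computed via
    Lemma 2 as D(p(X,Z) || q(X,Z)) with q(x,z) = p(x~,z~) p(x|x~) p(z|z~)."""
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    P = P / P.sum()                            # ensure a joint distribution
    n, d = P.shape
    Kr, Kc = row_labels.max() + 1, col_labels.max() + 1
    R = np.zeros((n, Kr)); R[np.arange(n), row_labels] = 1.0   # row indicators
    C = np.zeros((d, Kc)); C[np.arange(d), col_labels] = 1.0   # column indicators
    Pc = R.T @ P @ C                           # p(x~, z~), equation (3)
    px, pz = P.sum(axis=1), P.sum(axis=0)      # marginals p(x), p(z)
    pxt = Pc.sum(axis=1)[row_labels]           # p(x~(x)) for every row x
    pzt = Pc.sum(axis=0)[col_labels]           # p(z~(z)) for every column z
    Q = Pc[np.ix_(row_labels, col_labels)] \
        * np.outer(px / (pxt + eps), pz / (pzt + eps))         # equation (2)
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / (Q[mask] + eps))).sum())

Minimizing equation (5) amounts to driving this quantity down for each task's P^{(i)} and, weighted by lambda, for each cross-task matrix W^{(i,j)}.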

However, the joint distributions in the above equations are difficult to optimize in a traditional way. Note that the key to this optimization task is to find a matrix approximation. Following the idea of [7], we use the multiplicative update rules of 3-factor NMF, which fit our problem naturally: the task-specific co-clustering and the cross-task regularization can both be formulated in an NMF framework, and hence can be jointly optimized. We use the KL divergence as the error criterion for NMF, so the formulation stays consistent with the information-theoretic background.

A 3-factor nonnegative matrix factorization is a decomposition of a nonnegative dyadic data matrix P \in \mathbb{R}^{M \times D}_+ that takes the form P \approx U S V^T:

    \min_{U \ge 0,\, S \ge 0,\, V \ge 0} D(P \,\|\, U S V^T),    (7)

where U \in \mathbb{R}^{M \times K_w}_+, S \in \mathbb{R}^{K_w \times K_d}_+ and V \in \mathbb{R}^{D \times K_d}_+. In our setting the M rows of the matrix index the d words and the D columns index the documents; K_w and K_d are the numbers of word and document clusters, respectively.

To adapt the NMF framework to our objective function, where the matrix to be factorized is a joint probability distribution, we require \sum_{ij} P_{ij} = 1 in equation (8). Defining the normalizing matrices D_U \triangleq \mathrm{diag}(\mathbf{1}^T U) and D_V \triangleq \mathrm{diag}(\mathbf{1}^T V), where \mathbf{1} = [1, ..., 1]^T, we obtain

    P \approx (U D_U^{-1})(D_U S D_V)(V D_V^{-1})^T.    (8)

Comparing (8) with (2), one can see that the distributions p(z \mid \tilde{z}) and p(x \mid \tilde{x}) are associated with (U D_U^{-1}) and (V D_V^{-1}) respectively, and the joint distribution p(\tilde{x}, \tilde{z}) is represented by the entries of (D_U S D_V). Now we can reformulate (6) as

    J = \sum_i D\big(P^{(i)} \,\|\, U^{(i)} S_W^{(i)} V^{(i)T}\big) + \lambda \sum_{i<j} D\big(W^{(i,j)} \,\|\, U^{(i)} S^{(i,j)} U^{(j)T}\big).    (9)

Expanding the KL divergence gives

    J = \sum_t \sum_{ij} \Big\{ P^{(t)}_{ij} \log \frac{P^{(t)}_{ij}}{[U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}} - P^{(t)}_{ij} + [U^{(t)} S_W^{(t)} V^{(t)T}]_{ij} \Big\}
        + \lambda \sum_t \sum_{l=t+1}^{N} \sum_{ij} \Big\{ W^{(t,l)}_{ij} \log \frac{W^{(t,l)}_{ij}}{[U^{(t)} S^{(t,l)} U^{(l)T}]_{ij}} - W^{(t,l)}_{ij} + [U^{(t)} S^{(t,l)} U^{(l)T}]_{ij} \Big\}    (10)

    s.t. \sum_i (U^{(t)})_{ij} = 1, \quad \sum_i (V^{(t)})_{ij} = 1.

2.3 Optimization Algorithm

The objective function in (10), though it seems complicated at first glance, can be optimized easily by a set of multiplicative updates, derived in a way similar to [14]. We use the iterative normalization technique [18] to handle the normalization constraints: the corresponding factor matrices are normalized during the iterative process.

Theorem 3 (The multiplicative update rules of NMFMTCC). Suppose we have N tasks; then 1 \le t < l \le N, and the updates are

    V^{(t)}_{jk} \leftarrow V^{(t)}_{jk} \, \frac{\sum_i \big(P^{(t)}_{ij} / [U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}\big) [U^{(t)} S_W^{(t)}]_{ik}}{\sum_i [U^{(t)} S_W^{(t)}]_{ik}},

    U^{(t)}_{ij} \leftarrow U^{(t)}_{ij} \, \frac{\sum_a \big(P^{(t)}_{ia} / [U^{(t)} S_W^{(t)} V^{(t)T}]_{ia}\big) [S_W^{(t)} V^{(t)T}]_{ja} + \lambda A_{ij}}{\sum_a [S_W^{(t)} V^{(t)T}]_{ja} + \lambda B_{ij}},

where

    A_{ij} = \sum_{l=1}^{t-1} \sum_a \frac{W^{(l,t)}_{ai} \, [U^{(l)} S^{(l,t)}]_{aj}}{[U^{(l)} S^{(l,t)} U^{(t)T}]_{ai}} + \sum_{l=t+1}^{N} \sum_a \frac{W^{(t,l)}_{ia} \, [S^{(t,l)} U^{(l)T}]_{ja}}{[U^{(t)} S^{(t,l)} U^{(l)T}]_{ia}},

    B_{ij} = \sum_{l=1}^{t-1} \sum_a [U^{(l)} S^{(l,t)}]_{aj} + \sum_{l=t+1}^{N} \sum_a [S^{(t,l)} U^{(l)T}]_{ja},

and

    (S_W^{(t)})_{ab} \leftarrow (S_W^{(t)})_{ab} \, \frac{\sum_{ij} U^{(t)}_{ia} \big(P^{(t)}_{ij} / [U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}\big) V^{(t)}_{jb}}{\sum_i U^{(t)}_{ia} \sum_j V^{(t)}_{jb}},

    S^{(t,l)}_{ab} \leftarrow S^{(t,l)}_{ab} \, \frac{\sum_{ij} U^{(t)}_{ia} \big(W^{(t,l)}_{ij} / [U^{(t)} S^{(t,l)} U^{(l)T}]_{ij}\big) U^{(l)}_{jb}}{\sum_i U^{(t)}_{ia} \sum_j U^{(l)}_{jb}}.

We summarize the optimization of (10) in Algorithm 1.

Algorithm 1: NMFMTCC
Input: the number of document clusters K_d, the number of word clusters K_w, the joint distribution matrices {P^{(1)}, P^{(2)}, ..., P^{(N)}}, and the trade-off parameter \lambda.
1. Compute the joint probability distribution matrix W^{(t,l)} for each task pair (P^{(t)}, P^{(l)}), t < l.
2. Obtain the initial U^{(i)} and V^{(i)} for each task i by simultaneous K-means clustering of rows and columns, then set U^{(i)} <- U^{(i)} + 0.2 and V^{(i)} <- V^{(i)} + 0.2.
3. Initialize S_W^{(t)} and S^{(t,l)} with row-normalized constants.
repeat
4. For each task t, update U^{(t)} and V^{(t)};
5. For each task t, normalize U^{(t)} and V^{(t)};
6. For each task t, update S_W^{(t)};
7. For each pair of tasks t and l, t < l, update S^{(t,l)};
until convergence
Output: the document clustering result V^{(t)} and the word clustering result U^{(t)} for each task t.
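For concreteness, the following NumPy sketch implements the updates of Theorem 3 and the loop of Algorithm 1 under stated simplifications. It is our own illustrative reading, not the authors' code: the K-means initialization of step 2 is replaced by a random one (keeping the +0.2 offset), the cross-task matrices W are assumed given, and all names (nmfmtcc, lam, and so on) are ours.

import numpy as np

EPS = 1e-12  # guard against division by zero

def nmfmtcc(P, W, Kw, Kd, lam=1.0, n_iter=50, seed=0):
    """Sketch of Algorithm 1. P[t]: word-by-document joint matrix (d x n_t),
    entries summing to 1. W[(t, l)] for t < l: d x d cross-task word matrix.
    Returns word factors U[t] (d x Kw) and document factors V[t] (n_t x Kd)."""
    rng = np.random.default_rng(seed)
    N, d = len(P), P[0].shape[0]
    # Random initialization plus the 0.2 offset (the paper uses K-means here).
    U = [rng.random((d, Kw)) + 0.2 for _ in range(N)]
    V = [rng.random((P[t].shape[1], Kd)) + 0.2 for t in range(N)]
    Sw = [np.full((Kw, Kd), 1.0 / Kd) for _ in range(N)]
    S = {(t, l): np.full((Kw, Kw), 1.0 / Kw)
         for t in range(N) for l in range(t + 1, N)}

    for _ in range(n_iter):
        for t in range(N):
            # Document factor V^(t): only the within-task term is involved.
            R = P[t] / (U[t] @ Sw[t] @ V[t].T + EPS)       # P / (U S_W V^T)
            A = U[t] @ Sw[t]                               # d x Kd
            V[t] *= (R.T @ A) / (A.sum(axis=0, keepdims=True) + EPS)
            # Word factor U^(t): within-task plus lambda-weighted cross terms.
            R = P[t] / (U[t] @ Sw[t] @ V[t].T + EPS)
            B = Sw[t] @ V[t].T                             # Kw x n_t
            num = R @ B.T
            den = np.tile(B.sum(axis=1), (d, 1))
            for l in range(N):
                if l == t:
                    continue
                # Orient W and S so that W_tl ~ U[t] M U[l]^T.
                if t < l:
                    Wtl, M = W[(t, l)], S[(t, l)]
                else:
                    Wtl, M = W[(l, t)].T, S[(l, t)].T
                Rw = Wtl / (U[t] @ M @ U[l].T + EPS)
                num += lam * (Rw @ (U[l] @ M.T))
                den += lam * np.tile((U[l] @ M.T).sum(axis=0), (d, 1))
            U[t] *= num / (den + EPS)
            # Iterative normalization: columns sum to one.
            U[t] /= U[t].sum(axis=0, keepdims=True) + EPS
            V[t] /= V[t].sum(axis=0, keepdims=True) + EPS
            # Within-task core matrix S_W^(t).
            R = P[t] / (U[t] @ Sw[t] @ V[t].T + EPS)
            Sw[t] *= (U[t].T @ R @ V[t]) / (np.outer(U[t].sum(0), V[t].sum(0)) + EPS)
        # Cross-task core matrices S^(t,l).
        for (t, l), Stl in S.items():
            Rw = W[(t, l)] / (U[t] @ Stl @ U[l].T + EPS)
            S[(t, l)] = Stl * (U[t].T @ Rw @ U[l]) / (np.outer(U[t].sum(0), U[l].sum(0)) + EPS)
    return U, V, Sw, S

Hard document assignments for task t can then be read off as V[t].argmax(axis=1), and word clusters as U[t].argmax(axis=1).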

2.4 Correctness of Convergence

We now show that the above update rules correctly converge to a local optimum. We show this for the update of V, with U, S and S_W fixed; the other update rules can be derived similarly. The objective function of V^{(t)} with the other variables fixed is

    J_{V^{(t)}} = \sum_{ij} \Big\{ P^{(t)}_{ij} \log \frac{P^{(t)}_{ij}}{[U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}} + [U^{(t)} S_W^{(t)} V^{(t)T}]_{ij} \Big\},

where [U^{(t)} S_W^{(t)} V^{(t)T}]_{ij} = \sum_{ab} U^{(t)}_{ia} (S_W^{(t)})_{ab} V^{(t)}_{jb}. The derivative with respect to V^{(t)} is thus

    \frac{\partial J_{V^{(t)}}}{\partial V^{(t)}_{jk}} = \sum_i [U^{(t)} S_W^{(t)}]_{ik} - \sum_i \frac{P^{(t)}_{ij} \, [U^{(t)} S_W^{(t)}]_{ik}}{[U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}}.

The KKT complementarity condition for the nonnegativity of V^{(t)} gives

    \Big( \sum_i [U^{(t)} S_W^{(t)}]_{ik} - \sum_i \frac{P^{(t)}_{ij} \, [U^{(t)} S_W^{(t)}]_{ik}}{[U^{(t)} S_W^{(t)} V^{(t)T}]_{ij}} \Big) V^{(t)}_{jk} = 0.    (11)

This is a fixed-point relation that any local minimum of V^{(t)} must satisfy, and it yields exactly the multiplicative update rule for V^{(t)} given above. The correctness of convergence is therefore guaranteed, since any solution the update rules converge to must satisfy (11).

3. Experimental Evaluations

In our experiments, we compare our proposed multi-task co-clustering method, NMFMTCC, with several widely used single-task clustering methods: K-means, Kernel K-means, Normalized Cut [16], and Graph-regularized Nonnegative Matrix Factorization (GNMF) [1]. We also report the results of these algorithms on the simply merged data (e.g., "All Kmeans"). Moreover, we compare our approach with three recently proposed multi-task clustering methods, LSSMTC, LNKMTC and LSKMTC [9], [8]. We also evaluate the clustering performance under different parameter settings.

The information-theory-based Normalized Mutual Information (NMI) is defined as

    \mathrm{NMI} = \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} n_{i,j} \log \frac{n \, n_{i,j}}{n_i \hat{n}_j}}{\sqrt{\big(\sum_{i=1}^{c} n_i \log \frac{n_i}{n}\big)\big(\sum_{j=1}^{c} \hat{n}_j \log \frac{\hat{n}_j}{n}\big)}},    (12)

where n_i denotes the number of data points in cluster C_i (1 \le i \le c), \hat{n}_j is the number of data points in class G_j (1 \le j \le c), and n_{i,j} is the number of data points in the intersection of cluster C_i and class G_j. NMI measures how similar two partitions are and is suitable in our context; the larger the NMI, the better the clustering result.
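For reference, equation (12) can be computed directly from a pair of label vectors. The sketch below is our own helper written from the formula (scikit-learn's normalized_mutual_info_score computes a closely related quantity, with normalization options that differ slightly):

import numpy as np

def nmi(labels_pred, labels_true):
    """Normalized mutual information as in equation (12)."""
    labels_pred = np.asarray(labels_pred)
    labels_true = np.asarray(labels_true)
    n = len(labels_true)
    clusters, classes = np.unique(labels_pred), np.unique(labels_true)
    # n_{i,j}: contingency counts between clusters C_i and classes G_j.
    cont = np.array([[np.sum((labels_pred == ci) & (labels_true == gj))
                      for gj in classes] for ci in clusters], dtype=float)
    ni, nj = cont.sum(axis=1), cont.sum(axis=0)
    mask = cont > 0
    mi = (cont[mask] * np.log(n * cont[mask] / np.outer(ni, nj)[mask])).sum()
    h1 = -(ni * np.log(ni / n)).sum()   # entropy of the clustering
    h2 = -(nj * np.log(nj / n)).sum()   # entropy of the classes
    return mi / np.sqrt(h1 * h2)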
We experiment on the WebKB4 dataset, a subset of the WebKB dataset, which consists of seven classes of web pages collected from the computer science departments of four different universities. Frequently only four classes are used (student, faculty, course, project), hence the name WebKB4. We use the Rainbow toolkit for data preprocessing. We set the number of document clusters to the true number of classes for all clustering algorithms, and use parameter settings similar to those in [8] for K-means, Kernel K-means and NCut. The results of LSSMTC, LNKMTC and LSKMTC are quoted directly from the literature. For our method, we set K_d = 4 and K_w = 8; the trade-off parameter \lambda is set to 1 after a search over the grid {0, 0.5, 1, 5, 10, 100}, and the maximum number of iterations of NMFMTCC is set to 50. For each algorithm and parameter setting, we repeat the clustering 10 times; the averaged results are shown in Table 1.

[Figure 2. The clustering NMI of the four tasks (task1-task4) under different parameter settings: the trade-off parameter lambda and the number of word clusters.]

[Figure 3. The convergence of NMFMTCC: reconstruction error on WebKB4.]

Table 1. Results on the WebKB4 dataset (NMI %, mean ± std over 10 runs; entries lost in the transcription are marked "...").

Method       | task1        | task2 | task3 | task4
Kmeans       | 23.53 ± ...  | ...   | ...   | ... ± 0.00
KKM          | 21.65 ± ...  | ...   | ...   | ... ± 3.43
NCut         | 25.99 ± ...  | ...   | ...   | ... ± 0.77
GNMF         | 24.76 ± ...  | ...   | ...   | ... ± 0.17
All Kmeans   | 22.58 ± ...  | ...   | ...   | ... ± 0.79
All KKM      | 22.05 ± ...  | ...   | ...   | ... ± 2.86
All NCut     | 24.12 ± ...  | ...   | ...   | ... ± 0.34
All GNMF     | 26.16 ± ...  | ...   | ...   | ... ± 5.59
LSSMTC       | 33.69 ± ...  | ...   | ...   | ... ± 0.96
LNKMTC       | 36.34 ± ...  | ...   | ...   | ... ± 1.21
LSKMTC       | 40.85 ± ...  | ...   | ...   | ... ± 1.16
NMFMTCC(0)   | 31.53 ± ...  | ...   | ...   | ... ± 4.34
NMFMTCC      | 41.49 ± ...  | ...   | ...   | ...

4. Discussions

From the experimental results we draw the following conclusions. 1) Simply clustering the data of all the tasks together (e.g., All Kmeans, All KKM and All NCut) does not necessarily improve the clustering result, because the data distributions of the different tasks are not the same, and combining the data directly violates the i.i.d. assumption of single-task clustering.

2) Our proposed NMFMTCC method performs better than the other methods, mainly for the following reasons. We use the idea of co-clustering to cluster columns and rows simultaneously, while the other methods consider clustering in the instance space only. We generalize our model to pairwise knowledge transfer between task-specific feature spaces, i.e., any two feature spaces are connected during the optimization procedure. In the optimization stage, we adopt the NMF framework, which leads to an interpretable solution as well as good performance. From another point of view, NMF for the co-clustering problem can be seen as a soft version of information-theoretic co-clustering, and thus shares similar or even better characteristics. 3) The clustering accuracy for different values of the trade-off parameter \lambda is shown in Figure 2. The detailed clustering performance for \lambda = 0 is reported in Table 1 as NMFMTCC(0); note that with \lambda = 0 our method degenerates to an independent co-clustering method on each task. The clear drop in NMI shows that the cross-task regularization in our method does indeed help in a multi-task clustering problem.

5. Conclusions and Future Work

We proposed a Multi-task Co-clustering via Nonnegative Matrix Factorization (NMFMTCC) method. NMFMTCC follows the idea of the well-known information-theoretic co-clustering, but in a matrix factorization framework. We optimize an objective function that consists of two parts, task-specific co-clustering and cross-task feature space regularization. Experimental results show significant improvements over related methods. Besides the text clustering scenario, in the future we will try to apply the multi-task clustering idea to more applications, such as collaborative filtering and clustering-based natural image classification.

Acknowledgements

The work was supported by NSFC, the National High Technology Research and Development Program of China (No. 2008AA02Z310) and the 973 Program (No. 2009CB320901).

References

[1] D. Cai, X. He, J. Han, and T. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548-1560, 2011.
[2] R. Caruana. Multitask learning. Machine Learning, 28(1):41-75, 1997.
[3] W. Dai, Q. Yang, G. Xue, and Y. Yu. Self-taught clustering. In Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[4] I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2003.
[5] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.
[6] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004.
[7] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005.
[8] Q. Gu, Z. Li, and J. Han. Learning a kernel for multi-task clustering. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
[9] Q. Gu and J. Zhou. Learning the shared subspace for multi-task clustering and transductive transfer classification. In Ninth IEEE International Conference on Data Mining (ICDM '09). IEEE, 2009.
[10] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.
[11] T. Li, V. Sindhwani, C. Ding, and Y. Zhang. Bridging domains with words: opinion analysis with matrix tri-factorizations. In Proceedings of the 10th SDM, 2010.
[12] A. Lindbeck and D. Snower. Multi-task learning and the reorganization of work. Journal of Labor Economics, 18(3), 2000.
[13] M. Long, W. Cheng, X. Jin, J. Wang, and D. Shen. Transfer learning via cluster correspondence inference. In 2010 IEEE 10th International Conference on Data Mining (ICDM). IEEE, 2010.
[14] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556-562, 2001.
[15] F. Shahnaz, M. Berry, V. Pauca, and R. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing & Management, 42(2):373-386, 2006.
[16] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
[17] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2003.
[18] F. Zhuang, P. Luo, H. Xiong, Q. He, Y. Xiong, and Z. Shi. Exploiting associations between word clusters and document classes for cross-domain text categorization. Statistical Analysis and Data Mining, 2010.
