NMF based Gene Selection Algorithm for Improving Performance of the Spectral Cancer Clustering

Andri Mirzal
Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia

Abstract: Analyzing cancers using microarray gene expression datasets is currently an active research area in the medical community. There are many tasks related to this research, e.g., clustering and classification, data compression, and sample characterization. In this paper, we discuss the task of cancer clustering. Spectral clustering is one of the most commonly used methods in cancer clustering. Because gene expression datasets are usually highly imbalanced, i.e., they contain only a few tissue samples (hundreds at most) but each sample is expressed by thousands of genes, filtering out some irrelevant and potentially misleading gene expressions is a necessary step to improve the performance of the method. In this paper, we propose an unsupervised gene selection algorithm based on the nonnegative matrix factorization (NMF). Our algorithm is designed to make use of the clustering capability of the NMF to select the most informative genes. The clustering performance of the spectral method is then evaluated by comparing the results on the original datasets with the results on the pruned datasets. Our results suggest that the proposed algorithm can be used to improve the clustering performance of the spectral method.

Keywords: cancer clustering, gene selection, nonnegative matrix factorization, spectral clustering

I. INTRODUCTION

Cancer clustering is the task of grouping samples from patients with cancers so that samples of the same type are clustered in the same group (usually each group refers to a specific cancer type) [1]. In some datasets, normal tissues are also included for control purposes [2]. In the literature, one can find two related terms, cancer clustering and cancer classification, which are sometimes used interchangeably. In this paper we explicitly differentiate these terms: cancer clustering refers to the unsupervised task of grouping the samples, and cancer classification refers to the supervised task where classifiers are trained first before being used to classify the samples.

In recent years, thousands of new gene expression datasets have been generated. These datasets usually consist of only a few samples (hundreds at most), but each sample is represented by thousands of gene expressions. This characteristic makes analyzing the datasets quite challenging because most clustering and classification techniques perform poorly when the number of samples is small. In addition, the high dimensionality of the data suggests that many of the gene expressions are actually irrelevant and possibly misleading, and thus a gene selection procedure should be employed to clean the data. For the classification problem, the small number of samples creates the additional problem of overfitting the classifiers [3].

The use of gene selection procedures to improve classification performance has been extensively studied [1], [3]-[17]. Most of the proposed methods are based on support vector machines (SVMs), and it has been shown that these methods can significantly improve the performance of the classifiers. In cancer clustering research, however, gene selection is not yet well studied. The common approach is to use all dimensions, which potentially reduces the performance of the clustering algorithms because the data can contain irrelevant and misleading gene expressions.
In this paper, we propose an unsupervised gene selection algorithm based on the nonnegative matrix factorization (NMF). NMF is a matrix factorization technique that decomposes a nonnegative matrix into a pair of other nonnegative matrices. It has been successfully applied in many problem domains including clustering [4]-[6], [18]-[32], image analysis [33]-[37], and feature extraction [24], [25], [38], [39]. The proposed algorithm is designed around the fact that the NMF can group similar genes in an unsupervised manner and that the membership degrees of each gene to the clusters are directly given by the entries in the corresponding column of the coefficient matrix. We then use the proposed algorithm to improve the performance of the spectral clustering.

II. THE SPECTRAL CLUSTERING

Spectral clustering is a family of multiway clustering techniques that make use of eigenvectors of the data matrix to perform the clustering. Depending on the choice of the matrix, the number of eigenvectors, and the algorithm used to infer clusters from the eigenvectors, many spectral clustering algorithms are available, e.g., [40]-[42] (a detailed discussion of spectral clustering can be found in ref. [43]). Here we use the spectral clustering algorithm proposed by Ng et al. [41]. We choose this algorithm because of its intuitiveness and clustering capability. Algorithm 1 outlines the method, where $\mathbb{R}_+^{M \times N}$ denotes an $M$-by-$N$ nonnegative matrix and $\mathbb{B}_+^{M \times K}$ denotes an $M$-by-$K$ binary matrix.
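For concreteness, here is a minimal Python sketch of this pipeline. This is not code from the paper: the Gaussian-kernel width sigma is an assumed free parameter (the paper does not state its value), and scipy's eigensolver and scikit-learn's KMeans stand in for the eigendecomposition and k-means steps.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, K, sigma=1.0):
    """Spectral clustering in the style of Ng et al. [41].

    A : (M, N) data matrix with M samples, K : number of clusters.
    Returns an array of M cluster labels.
    """
    # Affinity matrix from pairwise squared distances via the Gaussian kernel
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    W = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Eigenvectors of the K largest eigenvalues (eigh returns ascending order)
    M = W.shape[0]
    _, X = eigh(W, subset_by_index=[M - K, M - 1])
    # Row-normalize the embedding, then run k-means on the rows
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=K, n_init=10).fit_predict(X)
```

Algorithm 1 below gives the paper's own outline of the same steps.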

Algorithm 1: A spectral clustering algorithm by Ng et al. [41]
1) Input: rectangular data matrix $A \in \mathbb{R}_+^{M \times N}$ with $M$ data points, and the number of clusters $K$.
2) Construct a symmetric affinity matrix $\dot{A} \in \mathbb{R}_+^{M \times M}$ from $A$ by using the Gaussian kernel.
3) Normalize $\dot{A}$ by $\dot{A} \leftarrow D^{-1/2} \dot{A} D^{-1/2}$, where $D$ is a diagonal matrix with $D_{ii} = \sum_j \dot{A}_{ij}$.
4) Compute the $K$ eigenvectors that correspond to the $K$ largest eigenvalues of $\dot{A}$, and form $\hat{X} = [\hat{x}_1, \ldots, \hat{x}_K] \in \mathbb{R}^{M \times K}$, where $\hat{x}_k$ is the $k$-th eigenvector.
5) Normalize every row of $\hat{X}$, i.e., $\hat{x}_{ij} \leftarrow \hat{x}_{ij} / \big(\sum_j \hat{x}_{ij}^2\big)^{1/2}$.
6) Apply k-means clustering on the rows of $\hat{X}$ to obtain the clustering indicator matrix $X \in \mathbb{B}_+^{M \times K}$.

III. THE PROPOSED ALGORITHM

Given a nonnegative data matrix $A \in \mathbb{R}_+^{M \times N}$, the NMF decomposes the matrix into the basis matrix $B \in \mathbb{R}_+^{M \times R}$ and the coefficient matrix $C \in \mathbb{R}_+^{R \times N}$ such that $A \approx BC$. To compute $B$ and $C$, the following optimization problem is usually used:

$$\min_{B,C} J(B, C) = \frac{1}{2}\,\|A - BC\|_F^2 \quad \text{s.t.}\ B \ge 0,\ C \ge 0, \tag{1}$$

where $\|X\|_F$ denotes the Frobenius norm of $X$. Many algorithms have been proposed to solve the optimization problem in eq. (1). However, for clustering purposes there is not much performance difference between the standard NMF algorithm proposed by Lee and Seung [44] and the more advanced, application-specific algorithms [4]-[6], [18]-[32]. Accordingly, we use the standard NMF algorithm. Algorithm 2 outlines the method, where $b_{mr}^{(k)}$ denotes the $(m, r)$ entry of $B$ at the $k$-th iteration, $X^T$ denotes the transpose of $X$, and $\delta$ denotes a small positive number to avoid division by zero.

Algorithm 2: The standard NMF algorithm [44]
1) Initialization: $B^{(0)} > 0$ and $C^{(0)} > 0$.
2) for $k = 0, \ldots, \mathrm{maxiter}$ do

$$b_{mr}^{(k+1)} \leftarrow b_{mr}^{(k)} \frac{\big(A\, C^{(k)T}\big)_{mr}}{\big(B^{(k)} C^{(k)} C^{(k)T}\big)_{mr} + \delta} \quad \forall m, r$$

$$c_{rn}^{(k+1)} \leftarrow c_{rn}^{(k)} \frac{\big(B^{(k+1)T} A\big)_{rn}}{\big(B^{(k+1)T} B^{(k+1)} C^{(k)}\big)_{rn} + \delta} \quad \forall r, n$$

3) end for

Let $A$ denote the sample-by-gene matrix containing the gene expression data and $R$ denote the number of cancer classes. By using Algorithm 2 to factorize $A$ into $B$ and $C$, the $n$-th column of $C$ describes the clustering membership degrees of the $n$-th gene to each cluster: the more positive the entry, the more likely the gene belongs to the corresponding cluster. For the hard clustering case, the membership is determined by the most positive entry. Further, if we normalize each column of $C$, i.e., $c_{rn} \leftarrow c_{rn} / \sum_r c_{rn}$, the entries in each row become comparable, and consequently the $r$-th row of $C$ describes the membership strength of the genes to the $r$-th cluster. Thus, we can sort these rows to find the most informative genes for the corresponding clusters, and by choosing some top genes for each cluster, we can select the most informative genes and remove some irrelevant and misleading ones. This process is the core of our algorithm. But because the NMF does not have the uniqueness property, the process is repeated so that only genes that consistently come out at the top are selected. Because of this repetition, we introduce a scoring scheme that assigns predefined scores to the top genes at each trial; the genes with the largest cumulative scores are then selected as the most informative genes. Our scoring scheme is based on the MotoGP scoring system, but the scores are assigned only to the top 10 genes in each cluster (the scores for the top 10 genes are: 25, 20, 16, 13, 11, 10, 9, 8, 7, and 6).
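As a concrete illustration, a minimal numpy sketch of the multiplicative updates in Algorithm 2 follows. This is a sketch only: the function name nmf and the default parameter values are assumptions, with delta playing the role of δ.

```python
import numpy as np

def nmf(A, R, maxiter=100, delta=1e-9, seed=0):
    """Standard NMF by multiplicative updates (Lee and Seung style).

    A : (M, N) nonnegative data matrix, R : number of factors.
    Returns B (M, R) and C (R, N) with A approximately equal to B @ C.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    B = rng.random((M, R)) + 1e-4   # positive initialization B^(0) > 0
    C = rng.random((R, N)) + 1e-4   # positive initialization C^(0) > 0
    for _ in range(maxiter):
        # b_mr <- b_mr * (A C^T)_mr / ((B C C^T)_mr + delta)
        B *= (A @ C.T) / (B @ C @ C.T + delta)
        # c_rn <- c_rn * (B^T A)_rn / ((B^T B C)_rn + delta), using updated B
        C *= (B.T @ A) / (B.T @ B @ C + delta)
    return B, C
```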
Algorithm 3 outlines the complete gene selection procedure.

Algorithm 3: NMF based gene selection algorithm
1) Input: gene expression data matrix $A \in \mathbb{R}_+^{M \times N}$ (the rows correspond to the samples and the columns correspond to the genes) and the number of clusters $R$.
2) Normalize each column of $A$, i.e., $a_{mn} \leftarrow a_{mn} / \sum_m a_{mn}$.
3) for $l = 0, \ldots, L$ do
   a) Compute $C$ using Algorithm 2.
   b) Normalize each column of $C$, i.e., $c_{rn} \leftarrow c_{rn} / \sum_r c_{rn}$.
   c) Sort each row of $C$ in descending order.
   d) Assign scores to the top 10 genes in each row of $C$.
   e) Accumulate the scores by adding the current scores to the previous ones.
4) end for
5) Select the top $G$ genes according to the cumulative scores.
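To make the selection procedure concrete, here is a compact sketch of the scoring loop in Algorithm 3, reusing the nmf sketch above. The score vector follows the MotoGP-style values given in the text; function and variable names are hypothetical.

```python
import numpy as np

SCORES = np.array([25, 20, 16, 13, 11, 10, 9, 8, 7, 6])  # MotoGP-style scores

def select_genes(A, R, G, L=100):
    """NMF-based gene selection (sketch of Algorithm 3).

    A : (M, N) sample-by-gene matrix, R : number of clusters,
    G : number of genes to keep, L : number of NMF trials.
    Returns the column indices of the G top-scoring genes.
    """
    A = A / A.sum(axis=0, keepdims=True)          # step 2: column-normalize A
    cumulative = np.zeros(A.shape[1])
    for trial in range(L):
        _, C = nmf(A, R, seed=trial)              # step 3a: Algorithm 2
        C = C / C.sum(axis=0, keepdims=True)      # step 3b: column-normalize C
        for r in range(R):
            top10 = np.argsort(C[r])[::-1][:10]   # steps 3c-3d: top 10 of row r
            cumulative[top10] += SCORES           # step 3e: accumulate scores
    return np.argsort(cumulative)[::-1][:G]       # step 5: top G genes
```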

IV. EXPERIMENTAL RESULTS

To evaluate the capability of the proposed algorithm in improving the performance of the spectral clustering, we used six publicly available cancer datasets from the work of Souto et al. [45], who compiled a comprehensive collection of 35 datasets from many sources. Table I summarizes the datasets. As shown, the datasets are quite representative: the number of classes varies from 2 to 10, the number of samples varies from tens to hundreds, and one dataset, Su-2001, contains multiple types of cancers.

TABLE I. CANCER DATASETS.

Dataset name        Tissue     #Samples  #Genes  #Classes
Nutt-2003-v2        Brain      -         -       -
Armstrong-2002-v2   Blood      -         -       -
Tomlins-2006-v2     Prostate   -         -       -
Pomeroy-2002-v2     Brain      -         -       -
Yeoh-2002-v2        Bone       -         -       -
Su-2001             Multi      -         -       -

There are several parameters that need to be chosen. The first is maxiter in Algorithm 2. Here we set maxiter to 100, as the standard NMF algorithm is known to be fast in minimizing the error only during the first iterations [49]. The second is the number of trials $L$ in Algorithm 3. After several attempts, we found that there was not much performance gain between $L = 100$ and $L > 100$, so we set $L$ to 100. The third is the number of top genes $G$ in step 5 of Algorithm 3. After several attempts, $G$ was set to 20, 1600, 250, 300, 2000, and 200 for Nutt, Armstrong, Tomlins, Pomeroy, Yeoh, and Su, respectively. And $\delta$ in Algorithm 2 was set to a small positive number.

To evaluate clustering performance, two metrics were used: Accuracy and Adjusted Rand Index (ARI). Accuracy is the most commonly used metric for measuring the performance of clustering algorithms in the medical community. It measures the fraction of the dominant class in each cluster and is defined as [3]:

$$\mathrm{Accuracy} = \frac{1}{M} \sum_{r=1}^{R} \max_s\, c_{rs},$$

where $r$ and $s$ denote the $r$-th cluster and the $s$-th reference class respectively, $R$ denotes the number of clusters produced by the clustering algorithm, $M$ denotes the number of samples, and $c_{rs}$ denotes the number of samples in the $r$-th cluster that belong to the $s$-th class. Accuracy takes values between 0 and 1, where 1 indicates perfect agreement between the reference classes and the clustering results. In the machine learning community, this metric is also known as Purity [5].

The Adjusted Rand Index (ARI) takes values from -1 to 1, where 1 indicates perfect agreement and values near 0 or below correspond to clusters found by chance. ARI is defined as [46]-[48]:

$$\mathrm{ARI} = \frac{\sum_{rs} \binom{c_{rs}}{2} - \Big[\sum_r \binom{c_r}{2} \sum_s \binom{c_s}{2}\Big] \Big/ \binom{M}{2}}{\frac{1}{2}\Big[\sum_r \binom{c_r}{2} + \sum_s \binom{c_s}{2}\Big] - \Big[\sum_r \binom{c_r}{2} \sum_s \binom{c_s}{2}\Big] \Big/ \binom{M}{2}},$$

where $c_r$ denotes the number of samples in the $r$-th cluster, and $c_s$ denotes the number of samples in the $s$-th class.

The experimental procedure is as follows. First, Algorithm 3 was used to select the top genes from the original data matrix $A \in \mathbb{R}_+^{M \times N}$. Then a new pruned data matrix $\hat{A} \in \mathbb{R}_+^{M \times G}$ was formed from the top $G$ genes. This matrix was then fed to Algorithm 1 to obtain the clustering indicator matrix, and the clustering quality was measured using Accuracy and ARI. Because of the nonuniqueness of the NMF, this procedure was repeated 100 times to obtain statistically sounder results.

Fig. 1 shows the performance of the spectral clustering with and without the gene selection procedure. As shown, the spectral clustering performed quite well on three datasets (Armstrong, Pomeroy, and Su) and produced rather unsatisfactory results on the other three (Nutt, Tomlins, and Yeoh). The gene selection improved the clustering performance of the spectral clustering in all cases, with larger improvements observed in the cases where the clustering results were rather unsatisfactory. This implies that the first group of datasets does not contain many irrelevant and misleading genes, so the results with and without gene selection are comparable, whereas the second group contains some such genes, which were removed by the gene selection process. Tables II and III give the detailed experimental results over the 100 trials, where the values are displayed as average ± standard deviation.

Fig. 1. Performance of the spectral clustering with and without gene selection, measured by (a) Accuracy and (b) ARI (average values over 100 runs).

TABLE II. ACCURACY AND ARI WITHOUT GENE SELECTION.
Dataset name        Accuracy  ARI
Nutt-2003-v2        -         -
Armstrong-2002-v2   -         -
Tomlins-2006-v2     -         -
Pomeroy-2002-v2     -         -
Yeoh-2002-v2        -         -
Su-2001             -         -

TABLE III. ACCURACY AND ARI WITH GENE SELECTION.

Dataset name        Accuracy  ARI
Nutt-2003-v2        -         -
Armstrong-2002-v2   -         -
Tomlins-2006-v2     -         -
Pomeroy-2002-v2     -         -
Yeoh-2002-v2        -         -
Su-2001             -         -
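For reference, here is a small sketch of how the two metrics can be computed from label vectors. The example labels are hypothetical; scikit-learn's adjusted_rand_score implements the ARI of [46]-[48].

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def accuracy(labels_true, labels_pred):
    """Purity-style Accuracy: (1/M) * sum over clusters of the dominant class size."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    M = len(labels_true)
    total = 0
    for r in np.unique(labels_pred):
        members = labels_true[labels_pred == r]
        total += np.bincount(members).max()  # c_rs of the dominant class s
    return total / M

# hypothetical reference classes and clustering result
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2])
print(accuracy(y_true, y_pred))             # Accuracy (Purity)
print(adjusted_rand_score(y_true, y_pred))  # ARI
```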

V. CONCLUSION

We have presented a gene selection algorithm based on the NMF for selecting the most informative genes from a microarray gene expression dataset. The experimental results showed that the proposed algorithm improved the performance of the spectral clustering, with more visible improvements observed in the cases where the spectral clustering produced rather unsatisfactory results.

ACKNOWLEDGMENT

The author would like to thank the reviewers for their useful comments. This research was supported by the Ministry of Higher Education of Malaysia and Universiti Teknologi Malaysia under Exploratory Research Grant Scheme R.J L095.

REFERENCES

[1] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, Vol. 286(5439), pp. 531-537, 1999.
[2] S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, and T.R. Golub, "Prediction of central nervous system embryonal tumour outcome based on gene expression," Nature, Vol. 415(6870), pp. 436-442, 2002.
[3] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, Vol. 46(1-3), pp. 389-422, 2002.
[4] J.P. Brunet, P. Tamayo, T.R. Golub, and J.P. Mesirov, "Metagenes and molecular pattern discovery using matrix factorization," Proc. Natl Acad. Sci. USA, Vol. 101(12), pp. 4164-4169, 2004.
[5] C.H. Zheng, D.S. Huang, D. Zhang, and X.Z. Kong, "Tumor clustering using nonnegative matrix factorization with gene selection," IEEE Transactions on Information Technology in Biomedicine, Vol. 13(4), 2009.
[6] N. Yuvaraj and P. Vivekanandan, "An efficient SVM based tumor classification with symmetry non-negative matrix factorization using gene expression data," Proc. Int'l Conf. on Information Communication and Embedded Systems, 2013.
[7] M. Pirooznia, J.Y. Yang, M.Q. Yang, and Y. Deng, "A comparative study of different machine learning methods on microarray gene expression data," BMC Genomics, Vol. 9(Suppl 1), p. S13, 2008.
[8] X. Liu, A. Krishnan, and A. Mondry, "An entropy-based gene selection method for cancer classification using microarray data," BMC Bioinformatics, Vol. 6, p. 76, 2005.
[9] L. Wang, F. Chu, and W. Xie, "Accurate cancer classification using expressions of very few genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4(1), 2007.
[10] L.Y. Chuang, H.W. Chang, C.J. Tu, and C.H. Yang, "Improved binary PSO for feature selection using gene expression data," Computational Biology and Chemistry, Vol. 32(1), pp. 29-37, 2008.
[11] P. Mitra and D.D. Majumder, "Feature selection and gene clustering from gene expression data," Proc. 17th Int'l Conf. on Pattern Recognition, 2004.
[12] T.S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, and D. Haussler, "Support vector machine classification and validation of cancer tissue samples using microarray expression data," Bioinformatics, Vol. 16(10), pp. 906-914, 2000.
[13] S. Moon and H. Qi, "Hybrid dimensionality reduction method based on support vector machine and independent component analysis," IEEE Transactions on Neural Networks and Learning Systems, Vol. 23(5), 2012.
[14] Y. Lee and C.K. Lee, "Classification of multiple cancer types by multicategory support vector machines using gene expression data," Bioinformatics, Vol. 19(9), pp. 1132-1139, 2003.
[15] X. Zhang, X. Lu, Q. Shi, X. Xu, H.E. Leung, L.N. Harris, J.D. Iglehart, A. Miron, J.S. Liu, and W.H. Wong, "Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data," BMC Bioinformatics, Vol. 7(197), 2006.
[16] Y. Lu and J. Han, "Cancer classification using gene expression data," Information Systems, Vol. 28(4), 2003.
[17] H.H. Zhang, J. Ahn, X. Lin, and C. Park, "Gene selection using support vector machines with non-convex penalty," Bioinformatics, Vol. 22(1), 2006.
[18] F. Shahnaz, M.W. Berry, V.P. Pauca, and R.J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing & Management, Vol. 42(2), pp. 373-386, 2006.
[19] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," Proc. ACM SIGIR, pp. 267-273, 2003.
[20] M. Berry, M. Browne, A. Langville, V.P. Pauca, and R.J. Plemmons, "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics and Data Analysis, Vol. 52(1), pp. 155-173, 2007.
[21] J. Yoo and S. Choi, "Orthogonal nonnegative matrix factorization: Multiplicative updates on Stiefel manifolds," Proc. 9th Int'l Conf. on Intelligent Data Engineering and Automated Learning, 2008.
[22] J. Yoo and S. Choi, "Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel manifolds," Information Processing & Management, Vol. 46(5), 2010.
[23] Y. Gao and G. Church, "Improving molecular cancer class discovery through sparse non-negative matrix factorization," Bioinformatics, Vol. 21(21), pp. 3970-3975, 2005.
[24] D. Dueck, Q.D. Morris, and B.J. Frey, "Multi-way clustering of microarray data using probabilistic sparse matrix factorization," Bioinformatics, Vol. 21(Suppl 1), pp. i144-i151, 2005.
[25] H. Kim and H. Park, "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis," Bioinformatics, Vol. 23(12), pp. 1495-1502, 2007.
[26] K. Devarajan, "Nonnegative matrix factorization: an analytical and interpretive tool in computational biology," PLoS Computational Biology, Vol. 4(7), e1000029, 2008.
[27] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM J. Matrix Anal. Appl., Vol. 30(2), pp. 713-730, 2008.
[28] P. Carmona-Saez, R.D. Pascual-Marqui, F. Tirado, J.M. Carazo, and A. Pascual-Montano, "Biclustering of gene expression data by non-smooth non-negative matrix factorization," BMC Bioinformatics, Vol. 7(78), 2006.
[29] K. Inamura, T. Fujiwara, Y. Hoshida, T. Isagawa, M.H. Jones, C. Virtanen, M. Shimane, Y. Satoh, S. Okumura, K. Nakagawa, E. Tsuchiya, S. Ishikawa, H. Aburatani, H. Nomura, and Y. Ishikawa, "Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization," Oncogene, Vol. 24, 2005.
[30] P. Fogel, S.S. Young, D.M. Hawkins, and N. Ledirac, "Inferential, robust non-negative matrix factorization analysis of microarray data," Bioinformatics, Vol. 23(1), pp. 44-49, 2007.
[31] G. Wang, A.V. Kossenkov, and M.F. Ochs, "LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates," BMC Bioinformatics, Vol. 7(175), 2006.
[32] J.J.Y. Wang, X. Wang, and X. Gao, "Non-negative matrix factorization by maximizing correntropy for cancer clustering," BMC Bioinformatics, Vol. 14(107), 2013.
[33] P.O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, Vol. 5, pp. 1457-1469, 2004.
[34] S.Z. Li, X.W. Hou, H.J. Zhang, and Q.S. Cheng, "Learning spatially localized, parts-based representation," Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition, pp. 207-212, 2001.
[35] D. Wang and H. Lu, "On-line learning parts-based representation via incremental orthogonal projective non-negative matrix factorization," Signal Processing, Vol. 93(6), 2013.

[36] A. Pascual-Montano, J.M. Carazo, K. Kochi, D. Lehmann, and R.D. Pascual-Marqui, "Nonsmooth nonnegative matrix factorization," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28(3), pp. 403-415, 2006.
[37] N. Gillis and F. Glineur, "A multilevel approach for nonnegative matrix factorization," Journal of Computational and Applied Mathematics, Vol. 236(7), 2012.
[38] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM J. Matrix Anal. Appl., Vol. 30(2), pp. 713-730, 2008.
[39] W. Kim, B. Chen, J. Kim, Y. Pan, and H. Park, "Sparse nonnegative matrix factorization for protein sequence motif discovery," Expert Systems with Applications, Vol. 38(10), 2011.
[40] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 22(8), pp. 888-905, 2000.
[41] A. Ng, M.I. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," Proc. Advances in Neural Information Processing Systems, pp. 849-856, 2002.
[42] S.X. Yu and J. Shi, "Multiclass spectral clustering," Proc. IEEE Int'l Conf. on Computer Vision, 2003.
[43] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, Vol. 17, pp. 395-416, 2007.
[44] D. Lee and H. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, Vol. 401(6755), pp. 788-791, 1999.
[45] M.C.P. Souto, I.G. Costa, D.S.A. Araujo, T.B. Ludermir, and A. Schliep, "Clustering cancer gene expression data: a comparative study," BMC Bioinformatics, Vol. 9(497), 2008.
[46] W.M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, Vol. 66(336), pp. 846-850, 1971.
[47] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, Vol. 2(1), pp. 193-218, 1985.
[48] N.X. Vinh, J. Epps, and J. Bailey, "Information theoretic measures for clusterings comparison: is a correction for chance necessary?," Proc. 26th Annual Int'l Conf. on Machine Learning, pp. 1073-1080, 2009.
[49] C.J. Lin, "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," IEEE Transactions on Neural Networks, Vol. 18(6), pp. 1589-1596, 2007.
