NMF based Gene Selection Algorithm for Improving Performance of the Spectral Cancer Clustering


Andri Mirzal
Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
Email: andrimirzal@utm.my

Abstract: Analyzing cancers using microarray gene expression datasets is currently an active research topic in the medical community. There are many tasks related to this research, e.g., clustering and classification, data compression, and sample characterization. In this paper, we discuss the task of cancer clustering. Spectral clustering is one of the most commonly used methods in cancer clustering. As gene expression datasets are usually highly imbalanced, i.e., they contain only a few tissue samples (hundreds at most) but each sample is expressed by thousands of genes, filtering out some irrelevant and potentially misleading gene expressions is a necessary step to improve the performance of the method. In this paper, we propose an unsupervised gene selection algorithm based on the nonnegative matrix factorization (NMF). Our algorithm is designed by making use of the clustering capability of the NMF to select the most informative genes. Clustering performance of the spectral method is then evaluated by comparing the results obtained using the original datasets with the results obtained using the pruned datasets. Our results suggest that the proposed algorithm can be used to improve the clustering performance of the spectral method.

Keywords: cancer clustering, gene selection, nonnegative matrix factorization, spectral clustering

I. INTRODUCTION

Cancer clustering is the task of grouping samples from patients with cancers so that samples of the same type are clustered in the same group (usually each group refers to a specific cancer type) [1]. In some datasets, normal tissues are also included for control purposes [2]. In the literature, one can find two related terms, cancer clustering and cancer classification, which are sometimes used interchangeably.
In this paper we explicitly differentiate these terms: cancer clustering refers to the unsupervised task of grouping the samples, and cancer classification refers to the supervised task where classifiers are trained first before being used to classify the samples.

In recent years, thousands of new gene expression datasets have been generated. These datasets usually consist of only a few samples (hundreds at most), but each sample is represented by thousands of gene expressions. This characteristic makes analyzing the datasets quite challenging because most clustering and classification techniques perform poorly when the number of samples is small. In addition, the high dimensionality of the data suggests that many of the gene expressions are actually irrelevant and possibly misleading, and thus a gene selection procedure should be employed to clean the data. For the classification problem, the small number of samples creates the additional problem of overfitting the classifiers [3]. The use of gene selection procedures to improve classification performance has been extensively studied [1], [3]-[17]. Most of the proposed methods are based on support vector machines (SVMs), and it has been shown that these methods can significantly improve the performance of the classifiers. In cancer clustering research, however, gene selection is not yet well studied. The common approach is to use all dimensions, which potentially reduces the performance of the clustering algorithms because the data can contain irrelevant and misleading gene expressions.

In this paper, we propose an unsupervised gene selection algorithm based on the nonnegative matrix factorization (NMF). NMF is a matrix factorization technique that decomposes a nonnegative matrix into a pair of other nonnegative matrices. It has been successfully applied in many problem domains including clustering [4]-[6], [18]-[32], image analysis [33]-[37], and feature extraction [4], [5], [38], [39].
The proposed algorithm is designed using the fact that the NMF can group similar genes in an unsupervised manner, and that the membership degrees of each gene to the clusters are directly given by the entries in the corresponding column of the coefficient matrix. We then use the proposed algorithm to improve the performance of the spectral clustering.

II. THE SPECTRAL CLUSTERING

Spectral clustering is a family of multiway clustering techniques that make use of eigenvectors of the data matrix to perform the clustering. Depending on the choice of the matrix, the number of eigenvectors, and the algorithm used to infer clusters from the eigenvectors, many spectral clustering algorithms are available, e.g., [40]-[42] (a detailed discussion on spectral clustering can be found in ref. [43]). Here we use the spectral clustering algorithm proposed by Ng et al. [41]. We choose this algorithm because of its intuitiveness and clustering capability. Algorithm 1 outlines the algorithm, where R_+^(M×N) denotes an M-by-N nonnegative matrix and B_+^(M×K) denotes an M-by-K binary matrix.

III. THE PROPOSED ALGORITHM

Given a nonnegative data matrix A ∈ R_+^(M×N), the NMF decomposes the matrix into the basis matrix B ∈ R_+^(M×R) and the coefficient matrix C ∈ R_+^(R×N) such that A ≈ BC.

Algorithm 1: A spectral clustering algorithm by Ng et al. [41]
1. Input: rectangular data matrix A ∈ R_+^(M×N) with M data points, and the number of clusters K.
2. Construct a symmetric affinity matrix Ȧ ∈ R_+^(M×M) from A by using the Gaussian kernel.
3. Normalize Ȧ by Ȧ ← D^(-1/2) Ȧ D^(-1/2), where D is a diagonal matrix with D_ii = Σ_j Ȧ_ij.
4. Compute the K eigenvectors that correspond to the K largest eigenvalues of Ȧ, and form X̂ = [x̂_1, ..., x̂_K] ∈ R^(M×K), where x̂_k is the k-th eigenvector.
5. Normalize every row of X̂, i.e., X̂_ij ← X̂_ij / (Σ_j X̂_ij^2)^(1/2).
6. Apply k-means clustering on the rows of X̂ to obtain the clustering indicator matrix X̄ ∈ B_+^(M×K).

To compute B and C, usually the following optimization problem is used:

min_{B,C} J(B, C) = (1/2) ||A − BC||_F^2   s.t.   B ≥ 0, C ≥ 0,   (1)

where ||X||_F denotes the Frobenius norm of X. Many algorithms have been proposed to solve the optimization problem in eq. (1). However, for clustering purposes, there is not much performance difference between the standard NMF algorithm proposed by Lee and Seung [44] and the more advanced, application-specific algorithms [4]-[6], [18]-[32]. Accordingly, we use the standard NMF algorithm. Algorithm 2 outlines the algorithm, where b_mr^(k) denotes the (m, r) entry of B at the k-th iteration, X^T denotes the transpose of X, and δ denotes a small positive number to avoid division by zero.

Algorithm 2: The standard NMF algorithm [44]
1. Initialization: B^(0) > 0 and C^(0) > 0.
2. for k = 0, ..., maxiter do
      b_mr^(k+1) ← b_mr^(k) (A C^(k)T)_mr / ((B^(k) C^(k) C^(k)T)_mr + δ)   ∀ m, r
      c_rn^(k+1) ← c_rn^(k) (B^(k+1)T A)_rn / ((B^(k+1)T B^(k+1) C^(k))_rn + δ)   ∀ r, n
   end for

Let A denote the sample-by-gene matrix containing the gene expression data and R denote the number of cancer classes. By using Algorithm 2 to factorize A into B and C, the n-th column of C describes the clustering membership degrees of the n-th gene to each cluster: the more positive the entry, the more likely the gene belongs to the corresponding cluster. For the hard clustering case, the membership is determined by the most positive entry.
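The multiplicative updates of Algorithm 2 translate almost line-for-line into NumPy. The following is a minimal sketch, not the paper's own code; the scale of the positive random initialization is our assumption:

```python
import numpy as np

def nmf(A, R, maxiter=100, delta=1e-8, seed=0):
    """Standard Lee-Seung multiplicative updates (Algorithm 2 sketch).

    A: (M, N) nonnegative matrix. Returns B (M, R) and C (R, N) with A ~= BC.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    # Strictly positive initialization, as Algorithm 2 requires.
    B = rng.random((M, R)) + 0.1
    C = rng.random((R, N)) + 0.1
    for _ in range(maxiter):
        # b_mr <- b_mr (A C^T)_mr / ((B C C^T)_mr + delta)
        B *= (A @ C.T) / (B @ (C @ C.T) + delta)
        # c_rn <- c_rn (B^T A)_rn / ((B^T B C)_rn + delta)
        C *= (B.T @ A) / ((B.T @ B) @ C + delta)
    return B, C
```

Each update multiplies the current factor entry-wise by a nonnegative ratio, so B and C stay nonnegative throughout, and δ guards against division by zero.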
Further, if we normalize each column of C, i.e., c_rn ← c_rn / Σ_r c_rn, the entries in each row become comparable, and consequently the r-th row of C describes the membership strengths of the genes with respect to the r-th cluster. Thus, we can sort these rows to find the most informative genes for the corresponding clusters, and by choosing some top genes for each cluster, we can select the most informative genes and remove some irrelevant and misleading ones. This process is the core of our algorithm. However, because the NMF does not have the uniqueness property, the process is repeated so that only genes that consistently appear at the top are selected. Because of this repetition, we introduce a scoring scheme that assigns predefined scores to the top genes at each trial; the genes with the largest cumulative scores are then selected as the most informative genes. Our scoring scheme is based on the MotoGP scoring system, but scores are assigned only to the top 10 genes in each cluster (the scores for the top 10 genes are: 25, 20, 16, 13, 11, 10, 9, 8, 7, and 6). Algorithm 3 outlines the complete gene selection procedure.

Algorithm 3: NMF based gene selection algorithm
1. Input: gene expression data matrix A ∈ R_+^(M×N) (the rows correspond to the samples and the columns correspond to the genes) and the number of clusters R.
2. Normalize each column of A, i.e., a_mn ← a_mn / Σ_m a_mn.
3. for l = 0, ..., L do
   a. Compute C using Algorithm 2.
   b. Normalize each column of C, i.e., c_rn ← c_rn / Σ_r c_rn.
   c. Sort each row of C in descending order.
   d. Assign scores to the top 10 genes in each row of C.
   e. Accumulate the scores by adding the current scores to the previous ones.
4. end for
5. Select the top G genes according to the cumulative scores.

IV. EXPERIMENTAL RESULTS

To evaluate the capability of the proposed algorithm in improving the performance of the spectral clustering, we used six publicly available cancer datasets from the work of Souto et al.
[45], in which they compiled the first comprehensive collection of such datasets from many sources (there are 35 datasets in total). Table I summarizes the datasets. As shown, the datasets are quite representative: the number of classes varies from 2 to 10, the number of samples varies from tens to hundreds, and one dataset, Su-2001, contains multiple types of cancers.

TABLE I. CANCER DATASETS.

Dataset name      | Tissue   | #Samples | #Genes | #Classes
Nutt-2003-v2      | Brain    | 28       | 1070   | 2
Armstrong-2002-v2 | Blood    | 72       | 2194   | 3
Tomlins-2006-v2   | Prostate | 92       | 1288   | 4
Pomeroy-2002-v2   | Brain    | 42       | 1379   | 5
Yeoh-2002-v2      | Bone     | 248      | 2526   | 6
Su-2001           | Multi    | 174      | 1571   | 10

There are several parameters that need to be chosen. The first is maxiter in Algorithm 2. Here we set maxiter to 100, as the standard NMF algorithm is known to be fast in minimizing the

error only during the first iterations [49]. The second is the number of trials L in Algorithm 3. After several attempts, we found that there was not much performance gain between L = 100 and L > 100; thus we set L to 100. The third is the number of top genes G in step 5 of Algorithm 3. After several attempts, G was set to 0, 1600, 50, 300, 000, and 00 for Nutt, Armstrong, Tomlins, Pomeroy, Yeoh, and Su, respectively. And δ in Algorithm 2 was set to 10^-8.

To evaluate clustering performance, two metrics were used: Accuracy and Adjusted Rand Index (ARI). Accuracy is the most commonly used metric for measuring the performance of clustering algorithms in the medical community. It measures the fraction of samples covered by the dominant class in each cluster. Accuracy is defined as [3]:

Accuracy = (1/M) Σ_{r=1}^{R} max_s c_rs,

where r and s denote the r-th cluster and the s-th reference class respectively, R denotes the number of clusters produced by the clustering algorithm, M denotes the number of samples, and c_rs denotes the number of samples in the r-th cluster that belong to the s-th class. The values of Accuracy are between 0 and 1, with 1 indicating a perfect agreement between the reference classes and the clustering results. In the machine learning community, this metric is also known as Purity [5].

The Adjusted Rand Index (ARI) has a value that ranges from -1 to 1, with 1 indicating perfect agreement and values near 0 or negative corresponding to clusters found by chance. ARI is defined as [46]-[48]:

ARI = [Σ_{r,s} C(c_rs, 2) − Σ_r C(c_r, 2) Σ_s C(c_s, 2) / C(M, 2)] / [(1/2)(Σ_r C(c_r, 2) + Σ_s C(c_s, 2)) − Σ_r C(c_r, 2) Σ_s C(c_s, 2) / C(M, 2)],

where C(x, 2) = x(x − 1)/2 denotes the binomial coefficient, c_r denotes the number of samples in the r-th cluster, c_s denotes the number of samples in the s-th class, and M denotes the total number of samples.

The experiment procedure is as follows. First, Algorithm 3 was used to select the top genes from the original data matrix A ∈ R_+^(M×N). Then a new pruned data matrix Ā ∈ R_+^(M×G) was formed from the top G genes. This matrix was then input to Algorithm 1 to obtain the clustering indicator matrix. The clustering quality was then measured by using Accuracy and ARI.
Because of the nonuniqueness of the NMF, this procedure was repeated 100 times to obtain more statistically sound results.

Fig. 1 shows the performance of the spectral clustering with and without the gene selection procedure. As shown, the spectral clustering performed quite well on three datasets (Armstrong, Pomeroy, and Su) and produced rather unsatisfactory results on the other three (Nutt, Tomlins, and Yeoh). The gene selection improved the clustering performance of the spectral clustering in all cases, with larger improvements observed in the cases where the clustering results were rather unsatisfactory. This implies that the first group of datasets does not contain many irrelevant and misleading genes, so the results with and without gene selection are comparable; the second group, on the other hand, contains some such genes, which were removed by the gene selection process. Tables II and III give the detailed experimental results over the 100 trials, with the values displayed in the format average ± standard deviation.

Fig. 1. Performance of the spectral clustering with and without gene selection measured by Accuracy and ARI (average values over 100 runs): (a) Accuracy; (b) ARI.

TABLE II. ACCURACY AND ARI WITHOUT GENE SELECTION.

Dataset name      | Accuracy      | ARI
Nutt-2003-v2      | 0.571 ± 0.000 | 0.00 ± 0.000
Armstrong-2002-v2 | 0.861 ± 0.000 | 0.64 ± 0.000
Tomlins-2006-v2   | 0.587 ± 0.04  | 0.7 ± 0.019
Pomeroy-2002-v2   | 0.759 ± 0.03  | 0.503 ± 0.035
Yeoh-2002-v2      | 0.675 ± 0.030 | 0.347 ± 0.048
Su-2001           | 0.744 ± 0.030 | 0.53 ± 0.050

TABLE III. ACCURACY AND ARI WITH GENE SELECTION.
Dataset name      | Accuracy      | ARI
Nutt-2003-v2      | 0.664 ± 0.036 | 0.095 ± 0.033
Armstrong-2002-v2 | 0.878 ± 0.014 | 0.667 ± 0.037
Tomlins-2006-v2   | 0.679 ± 0.031 | 0.343 ± 0.045
Pomeroy-2002-v2   | 0.767 ± 0.018 | 0.540 ± 0.03
Yeoh-2002-v2      | 0.730 ± 0.07  | 0.408 ± 0.030
Su-2001           | 0.746 ± 0.033 | 0.569 ± 0.044
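Accuracy (purity) and ARI as defined in Section IV can be computed directly from the contingency table of cluster labels versus reference classes. A small sketch, with helper names of our own choosing:

```python
import numpy as np
from math import comb

def contingency(labels, classes):
    """c[r, s] = number of samples placed in cluster r that belong to class s."""
    rs, ss = sorted(set(labels)), sorted(set(classes))
    return np.array([[sum(1 for l, t in zip(labels, classes) if l == r and t == s)
                      for s in ss] for r in rs])

def accuracy(labels, classes):
    """Purity: fraction of samples covered by each cluster's dominant class."""
    c = contingency(labels, classes)
    return c.max(axis=1).sum() / len(labels)

def ari(labels, classes):
    """Adjusted Rand Index computed from the contingency table."""
    c = contingency(labels, classes)
    M = c.sum()                                            # total samples
    sum_rs = sum(comb(int(x), 2) for x in c.ravel())       # pairs agreeing on both
    sum_r = sum(comb(int(x), 2) for x in c.sum(axis=1))    # pairs per cluster
    sum_s = sum(comb(int(x), 2) for x in c.sum(axis=0))    # pairs per class
    expected = sum_r * sum_s / comb(int(M), 2)             # chance agreement
    return (sum_rs - expected) / (0.5 * (sum_r + sum_s) - expected)
```

For a perfect match both metrics equal 1; for the maximally mixed labeling [0, 1, 0, 1] against classes [0, 0, 1, 1], Accuracy is 0.5 while ARI is -0.5, illustrating how ARI penalizes agreement that is no better than chance.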

V. CONCLUSION

We have presented a gene selection algorithm based on the NMF to select the most informative genes from a microarray gene expression dataset. The experimental results showed that the proposed algorithm improved the performance of the spectral clustering, with more visible improvements observed in the cases where the spectral clustering produced rather unsatisfactory results.

ACKNOWLEDGMENT

The author would like to thank the reviewers for useful comments. This research was supported by the Ministry of Higher Education of Malaysia and Universiti Teknologi Malaysia under Exploratory Research Grant Scheme R.J130000.788.4L095.

REFERENCES

[1] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, Vol. 286(5439), pp. 531-537, 1999.
[2] S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, and T.R. Golub, "Prediction of central nervous system embryonal tumour outcome based on gene expression," Nature, Vol. 415(6870), pp. 436-442, 2002.
[3] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, Vol. 46(1-3), pp. 389-422, 2002.
[4] J.P. Brunet, P. Tamayo, T.R. Golub, and J.P. Mesirov, "Metagenes and molecular pattern discovery using matrix factorization," Proc. Natl. Acad. Sci. USA, Vol. 101(12), pp. 4164-4169, 2004.
[5] C.H. Zheng, D.S. Huang, D. Zhang, and X.Z. Kong, "Tumor clustering using nonnegative matrix factorization with gene selection," IEEE Transactions on Information Technology in Biomedicine, Vol. 13(4), pp. 599-607, 2009.
[6] N. Yuvaraj and P. Vivekanandan, "An efficient SVM based tumor classification with symmetry non-negative matrix factorization using gene expression data," Proc. Int'l Conf. on Information Communication and Embedded Systems, pp. 761-768, 2013.
[7] M. Pirooznia, J.Y. Yang, M.Q. Yang, and Y. Deng, "A comparative study of different machine learning methods on microarray gene expression data," BMC Genomics, Vol. 9(Suppl 1), S13, 2008.
[8] X. Liu, A. Krishnan, and A. Mondry, "An entropy-based gene selection method for cancer classification using microarray data," BMC Bioinformatics, Vol. 6, pp. 76, 2005.
[9] L. Wang, F. Chu, and W. Xie, "Accurate cancer classification using expressions of very few genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4(1), pp. 40-53, 2007.
[10] L.Y. Chuang, H.W. Chang, C.J. Tu, and C.H. Yang, "Improved binary PSO for feature selection using gene expression data," Computational Biology and Chemistry, Vol. 32(1), pp. 29-37, 2008.
[11] P. Mitra and D.D. Majumder, "Feature selection and gene clustering from gene expression data," Proc. 17th Int'l Conf. on Pattern Recognition, pp. 343-346, 2004.
[12] T.S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, and D. Haussler, "Support vector machine classification and validation of cancer tissue samples using microarray expression data," Bioinformatics, Vol. 16(10), pp. 906-914, 2000.
[13] S. Moon and H. Qi, "Hybrid dimensionality reduction method based on support vector machine and independent component analysis," IEEE Transactions on Neural Networks and Learning Systems, Vol. 23(5), pp. 749-761, 2012.
[14] Y. Lee and C.K. Lee, "Classification of multiple cancer types by multicategory support vector machines using gene expression data," Bioinformatics, Vol. 19(9), pp. 1132-1139, 2003.
[15] X. Zhang, X. Lu, Q. Shi, X. Xu, H.E. Leung, L.N. Harris, J.D. Iglehart, A. Miron, J.S. Liu, and W.H. Wong, "Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data," BMC Bioinformatics, Vol. 7(197), 2006.
[16] Y. Lu and J. Han, "Cancer classification using gene expression data," Information Systems, Vol. 28(4), pp. 243-268, 2003.
[17] H.H. Zhang, J. Ahn, X. Lin, and C. Park, "Gene selection using support vector machines with non-convex penalty," Bioinformatics, Vol. 22(1), pp. 88-95, 2006.
[18] F. Shahnaz, M.W. Berry, V. Pauca, and R.J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing & Management, Vol. 42(2), pp. 373-386, 2006.
[19] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," Proc. ACM SIGIR, pp. 267-273, 2003.
[20] M. Berry, M. Brown, A. Langville, P. Pauca, and R.J. Plemmons, "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics and Data Analysis, Vol. 52(1), pp. 155-173, 2007.
[21] J. Yoo and S. Choi, "Orthogonal nonnegative matrix factorization: multiplicative updates on Stiefel manifolds," Proc. 9th Int'l Conf. Intelligent Data Engineering and Automated Learning, pp. 140-147, 2008.
[22] J. Yoo and S. Choi, "Orthogonal nonnegative matrix tri-factorization for co-clustering: multiplicative updates on Stiefel manifolds," Information Processing & Management, Vol. 46(5), pp. 559-570, 2010.
[23] Y. Gao and G. Church, "Improving molecular cancer class discovery through sparse non-negative matrix factorization," Bioinformatics, Vol. 21(21), pp. 3970-3975, 2005.
[24] D. Dueck, Q.D. Morris, and B.J. Frey, "Multi-way clustering of microarray data using probabilistic sparse matrix factorization," Bioinformatics, Vol. 21(1), pp. 145-151, 2005.
[25] H. Kim and H. Park, "Sparse non-negative matrix factorizations via alternating non-negativity constrained least squares for microarray data analysis," Bioinformatics, Vol. 23(12), pp. 1495-1502, 2007.
[26] K. Devarajan, "Nonnegative matrix factorization: an analytical and interpretive tool in computational biology," PLoS Computational Biology, Vol. 4(7), e1000029, 2008.
[27] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM J. Matrix Anal. Appl., Vol. 30(2), pp. 713-730, 2008.
[28] P. Carmona-Saez, R.D. Pascual-Marqui, F. Tirado, J.M. Carazo, and A. Pascual-Montano, "Biclustering of gene expression data by non-smooth non-negative matrix factorization," BMC Bioinformatics, Vol. 7(78), 2006.
[29] K. Inamura, T. Fujiwara, Y. Hoshida, T. Isagawa, M.H. Jones, C. Virtanen, M. Shimane, Y. Satoh, S. Okumura, K. Nakagawa, E. Tsuchiya, S. Ishikawa, H. Aburatani, H. Nomura, and Y. Ishikawa, "Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization," Oncogene, Vol. 24, pp. 7105-7113, 2005.
[30] P. Fogel, S.S. Young, D.M. Hawkins, and N. Ledirac, "Inferential, robust non-negative matrix factorization analysis of microarray data," Bioinformatics, Vol. 23(1), pp. 44-49, 2007.
[31] G. Wang, A.V. Kossenkov, and M.F. Ochs, "LS-NMF: a modified nonnegative matrix factorization algorithm utilizing uncertainty estimates," BMC Bioinformatics, Vol. 7(175), 2006.
[32] J.J.Y. Wang, X. Wang, and X. Gao, "Non-negative matrix factorization by maximizing correntropy for cancer clustering," BMC Bioinformatics, Vol. 14(107), 2013.
[33] P.O. Hoyer, "Non-negative matrix factorization with sparseness constraints," The Journal of Machine Learning Research, Vol. 5, pp. 1457-1469, 2004.
[34] S.Z. Li, X.W. Hou, H.J. Zhang, and Q.S. Cheng, "Learning spatially localized, parts-based representation," Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition, pp. 207-212, 2001.
[35] D. Wang and H. Lu, "On-line learning parts-based representation via incremental orthogonal projective non-negative matrix factorization," Signal Processing, Vol. 93(6), pp. 1608-1623, 2013.
[36] A. Pascual-Montano, J.M. Carazo, K. Kochi, D. Lehman, and R.D. Pascual-Marqui, "Nonsmooth nonnegative matrix factorization," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28(3), pp. 403-415, 2006.
[37] N. Gillis and F. Glineur, "A multilevel approach for nonnegative matrix factorization," Journal of Computational and Applied Mathematics, Vol. 236(7), pp. 1708-1723, 2012.
[38] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM J. Matrix Anal. Appl., Vol. 30(2), pp. 713-730, 2008.
[39] W. Kim, B. Chen, J. Kim, Y. Pan, and H. Park, "Sparse nonnegative matrix factorization for protein sequence motif discovery," Expert Systems with Applications, Vol. 38(10), pp. 13198-13207, 2011.
[40] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 22(8), pp. 888-905, 2000.
[41] A. Ng, M.I. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," Proc. Advances in Neural Information Processing Systems, pp. 849-856, 2002.
[42] S.X. Yu and J. Shi, "Multiclass spectral clustering," Proc. IEEE Int'l Conf. on Computer Vision, pp. 313-319, 2003.
[43] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, Vol. 17, pp. 395-416, 2007.
[44] D. Lee and H. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, Vol. 401(6755), pp. 788-791, 1999.
[45] M.C.P. Souto, I.G. Costa, D.S.A. Araujo, T.B. Ludermir, and A. Schliep, "Clustering cancer gene expression data: a comparative study," BMC Bioinformatics, Vol. 9(497), 2008.
[46] W.M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, Vol. 66(336), pp. 846-850, 1971.
[47] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, Vol. 2(1), pp. 193-218, 1985.
[48] N.X. Vinh, J. Epps, and J. Bailey, "Information theoretic measures for clustering comparison: is a correction for chance necessary?," Proc. 26th Annual Int'l Conf. on Machine Learning, pp. 1073-1080, 2009.
[49] C.J. Lin, "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," IEEE Transactions on Neural Networks, Vol. 18(6), pp. 1589-1596, 2007.