NONNEGATIVE MATRIX FACTORIZATION FOR PATTERN RECOGNITION

Oleg Okun, Machine Vision Group, Department of Electrical and Information Engineering, University of Oulu, FINLAND
Helen Priisalu, Tallinn University of Technology, ESTONIA

ABSTRACT
In this paper, linear and unsupervised dimensionality reduction via matrix factorization with nonnegativity constraints is studied when applied for feature extraction, followed by pattern recognition. Since matrix factorization is typically done iteratively, convergence can be slow. To alleviate this problem, a significantly (more than 11 times) faster algorithm is proposed, which does not cause severe degradation in classification accuracy when dimensionality reduction is followed by classification. These results are due to two modifications of the previous algorithms: feature scaling (normalization) prior to the beginning of the iterations, and an initialization of the iterations that combines two techniques for mapping unseen data.

KEY WORDS
pattern recognition, nonnegative matrix factorization

1 Introduction

Dimensionality reduction is typically used to cure, or at least alleviate, the effect known as the curse of dimensionality, which negatively affects classification accuracy. The simplest way to reduce dimensionality is to linearly transform the original data. Given the original, high-dimensional data gathered in an n × m matrix V, a transformed or reduced matrix H, composed of m r-dimensional vectors (r < n and often r ≪ n), is obtained from V according to the linear transformation defined by W: V ≈ WH, where W is an n × r (basis) matrix.

Nonnegative matrix factorization (NMF) is an example of such a linear transformation, but unlike other transformations it is based on nonnegativity constraints on all matrices involved. Thanks to this fact, it can generate a part-based representation, since no subtractions are allowed. Lee and Seung [1] proposed a simple iterative algorithm for NMF and proved its convergence. The factorized matrices are initialized with positive random numbers before the matrix updates start. It is well known that initialization matters for any iterative algorithm: properly initialized, an algorithm converges faster. However, this issue had not yet been investigated in the case of NMF. In this paper, our contribution is two modifications accelerating the convergence of the algorithm: 1) feature scaling prior to NMF, and 2) a combination of two techniques for mapping unseen data, together with a theoretical proof of why this combination leads to faster convergence.

Because of its straightforward implementation, NMF has already been applied to object and pattern classification [2, 3, 4, 5, 6, 7, 8, 9]. Here we extend the application of NMF to bioinformatics: NMF coupled with a classifier is applied to protein fold recognition. Statistical analysis (by means of one-way analysis of variance (ANOVA) and multiple comparison tests) of the obtained results (error rates) testifies to the absence of significant accuracy degradation when our modifications are introduced, compared to the standard NMF algorithm.

2 Nonnegative Matrix Factorization

Given the nonnegative matrices V, W and H, whose sizes are n × m, n × r and r × m, respectively, we aim at a factorization such that V ≈ WH. The value of r is selected according to the rule r < nm/(n+m) in order to obtain dimensionality reduction: storing W and H then requires r(n+m) numbers, fewer than the nm numbers in V. Each column of W is a basis vector, while each column of H is the reduced representation of the corresponding column of V. In other words, W can be seen as a basis that is optimized for the linear approximation of the data in V.
NMF provides the following simple learning rules, guaranteeing monotonic convergence to a local maximum without the need for setting any adjustable parameters [1]:

$W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu}$,   (1)

$W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}}$,   (2)

$H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}}$.   (3)

The matrices W and H are initialized with positive random values. Eqs. (1)-(3) are iterated until convergence to a local maximum of the following objective function:

$F = \sum_{i=1}^{n} \sum_{\mu=1}^{m} \left( V_{i\mu} \log(WH)_{i\mu} - (WH)_{i\mu} \right)$.   (4)
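For concreteness, a minimal NumPy sketch of these updates is given below. The paper's own implementation was in MATLAB; the function name, the small guard constants, and the tolerance-based stopping test (described in the next paragraphs) are our own choices, so this is only an illustrative sketch of Eqs. (1)-(4).

```python
import numpy as np

def nmf_divergence(V, r, tol=1e-4, max_iter=5000, seed=None):
    """Sketch of the multiplicative updates (1)-(3) for the factorization V ~ W H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-4            # positive random initialization
    H = rng.random((r, m)) + 1e-4
    eps = 1e-12                              # guards the divisions and the logarithm

    def objective(W, H):
        WH = W @ H
        return np.sum(V * np.log(WH + eps) - WH)   # Eq. (4)

    F_old = objective(W, H)
    for _ in range(max_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T                  # Eq. (1)
        W /= W.sum(axis=0, keepdims=True)    # Eq. (2): normalize the columns of W
        WH = W @ H + eps
        H *= W.T @ (V / WH)                  # Eq. (3)
        F_new = objective(W, H)
        if F_new - F_old < tol:              # tolerance-based stopping rule (see below)
            break
        F_old = F_new
    return W, H
```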

In its original form, NMF can be slow to converge to a local maximum for large matrices and/or high data dimensionality. On the other hand, stopping after a predefined number of iterations, as is sometimes done, can be premature and yield a poor approximation. Introducing a parameter tol (0 < tol ≤ 1) to decide when to stop the iterations significantly speeds up convergence without negatively affecting the mean square error (MSE) measuring the approximation quality (it was observed in numerous experiments that the MSE decreases quickly during the first iterations, after which its rate of decrease slows down dramatically). That is, the iterations stop when F_new − F_old < tol.

After learning the NMF basis functions, i.e. the matrix W, new (previously unseen) data gathered in a matrix V_new are mapped to the r-dimensional space by fixing W and using one of the following techniques:

1. randomly initializing H as described above and iterating Eq. (3) until convergence [3], or
2. taking the least squares solution of V_new = W H_new, i.e. H_new = (W^T W)^{-1} W^T V_new, e.g. as in [5].

We will call the first technique iterative and the second direct, because the latter provides a straightforward noniterative solution.

3 Our Contribution

We propose two modifications in order to accelerate the convergence of the iterative NMF algorithm. The first modification concerns feature scaling (normalization) linked to the initialization of the factorized matrices. Typically, these matrices are initialized with positive random numbers, say uniformly distributed between 0 and 1, in order to satisfy the nonnegativity constraints. Hence, the elements of V (the matrix of the original data) should also lie in the same range. Given that V_j is an n-dimensional feature vector, where j = 1, ..., m, its components are normalized as V_ij ← V_ij / V_kj, where k = argmax_l V_lj. In other words, the components of each feature vector are divided by the maximal value among them. As a result, feature vectors are composed of components whose nonnegative values do not exceed 1. Since all three matrices (V, W, H) now have entries between 0 and 1, it takes much less time to perform the matrix factorization V ≈ WH than if V had its original (unnormalized) values, because the entries of the factorized matrices do not have to grow much in order to satisfy the stopping criterion for the objective function F in Eq. (4). As an additional benefit, the MSE becomes much smaller, too, because the difference between the original (V_ij) and approximated ((WH)_ij) values becomes smaller, given that mn is fixed. Though this modification is simple, it brings a significant speed-up in convergence, as will be shown below, and to the best of our knowledge it has never been explicitly mentioned in the literature.

It is generally acknowledged that proper initialization significantly speeds up the convergence of an iterative algorithm. The second modification therefore concerns the initialization of the NMF iterations for mapping unseen data (known as generalization), i.e. after the basis matrix W has been learned. Since such a mapping in NMF involves only the matrix H (W is kept fixed), only its initialization has to be specified. We propose to initially set H to (W^T W)^{-1} W^T V_new, i.e. to the solution provided by the direct mapping technique of Section 2, because 1) it provides a better initial approximation for H_new than a random guess, and 2) it moves the start of the iterations closer to the final point, since the objective function F in Eq. (4) is increasing [1] and the inequality F_direct > F_iter always holds at initialization (the theorem below proves this fact), where F_direct and F_iter stand for the values of F when using the direct and iterative techniques, respectively.
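The two mapping techniques and the proposed initialization can be sketched as follows. This is again our own NumPy rendering with hypothetical helper names (scale_features, map_direct, map_iterative); it is meant only to make the description above concrete.

```python
import numpy as np

def scale_features(V):
    """Section 3 scaling: each column (feature vector) of V is divided by its
    largest component, so that all entries fall into [0, 1]."""
    col_max = V.max(axis=0, keepdims=True)
    return V / np.where(col_max > 0, col_max, 1.0)

def map_direct(W, V_new):
    """Direct technique: least squares solution H_new = (W^T W)^{-1} W^T V_new."""
    H_new, *_ = np.linalg.lstsq(W, V_new, rcond=None)
    return H_new

def map_iterative(W, V_new, H0, tol=1e-4, max_iter=5000):
    """Iterative technique: W stays fixed and only Eq. (3) is iterated.
    With a random nonnegative H0 this is the standard 'iterative' mapping;
    with H0 = np.clip(map_direct(W, V_new), 0, None) it becomes the proposed
    'iterative2' mapping (negative entries of the direct solution set to zero)."""
    eps = 1e-12
    H = H0 + eps                             # work on a copy of the initialization
    F_old = np.sum(V_new * np.log(W @ H + eps) - W @ H)
    for _ in range(max_iter):
        H *= W.T @ (V_new / (W @ H + eps))   # Eq. (3) with W fixed
        F_new = np.sum(V_new * np.log(W @ H + eps) - W @ H)
        if F_new - F_old < tol:
            break
        F_old = F_new
    return H
```
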
Theorem 1. Let F_direct and F_iter be the values of the objective function in Eq. (4) when mapping unseen data with the direct and iterative techniques, respectively. Then F_direct − F_iter > 0 always holds at the start of the iterations of Eq. (3).

Proof. By definition,

$F_{iter} = \sum_{ij} \left( V_{ij} \log(WH)_{ij} - (WH)_{ij} \right)$,   $F_{direct} = \sum_{ij} \left( V_{ij} \log V_{ij} - V_{ij} \right)$.

The difference F_direct − F_iter is equal to

$\sum_{ij} \left( V_{ij} \log V_{ij} - V_{ij} - V_{ij} \log(WH)_{ij} + (WH)_{ij} \right) = \sum_{ij} \left( V_{ij} (\log V_{ij} - \log(WH)_{ij} - 1) + (WH)_{ij} \right) = \sum_{ij} \left( V_{ij} \log \frac{V_{ij}}{10 (WH)_{ij}} + (WH)_{ij} \right)$.

In the last expression we assumed the logarithm to the base 10, though the natural logarithm or the logarithm to the base 2 could be used as well without affecting our proof. Given that all three matrices involved are nonnegative, the last expression is positive if either of the following conditions is satisfied:

1. $V_{ij} \log \frac{V_{ij}}{10 (WH)_{ij}} > 0$,
2. $(WH)_{ij} > V_{ij} \log \frac{V_{ij}}{10 (WH)_{ij}}$.

Let us introduce a new variable $t = V_{ij} / (WH)_{ij}$. Then the above conditions can be written as

1. $\log \frac{t}{10} > 0$, i.e. $\log t > 1$,
2. $1 > t \log \frac{t}{10}$, i.e. $\log t < \frac{t+1}{t}$.

The first condition is satisfied if t > 10, whereas the second is satisfied if t < t_0 (t_0 ≈ 12). Therefore either t > 10 or t < 12 should hold for F_direct > F_iter. Since the union of the two conditions covers the whole interval [0, +∞[, F_direct > F_iter holds independently of t, i.e. of whether V_ij > (WH)_ij or not. The proof is also valid for the base-2 and natural logarithms, because 2 or 2.71, replacing 10 above, is still smaller than 12, so the whole interval [0, +∞[ is again covered. Q.E.D.
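Theorem 1 can also be probed numerically. The small sketch below (ours; it reuses scale_features from the previous sketch, takes a random column-normalized basis W purely for illustration, and evaluates Eq. (4) with natural logarithms) compares F at the two initializations.

```python
import numpy as np

def check_theorem1(n=50, m=40, r=10, seed=0):
    """Compare Eq. (4) at the start of generalization for a random initialization
    (iterative) and for the direct solution (iterative2 initialization).
    Theorem 1 predicts that F_direct exceeds F_iter."""
    rng = np.random.default_rng(seed)
    V_new = scale_features(rng.random((n, m)))    # nonnegative data scaled to [0, 1]
    W = rng.random((n, r))
    W /= W.sum(axis=0, keepdims=True)             # column-normalized basis, cf. Eq. (2)
    eps = 1e-12

    def F(H):
        WH = W @ H + eps
        return np.sum(V_new * np.log(WH) - WH)

    H_iter = rng.random((r, m))                                            # random start
    H_direct = np.clip(np.linalg.lstsq(W, V_new, rcond=None)[0], 0, None)  # direct start
    print("F_iter   =", F(H_iter))
    print("F_direct =", F(H_direct))
```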

Because our approach combines the direct and iterative techniques for mapping unseen data, we call it iterative2: the whole procedure remains essentially iterative, but only the initialization is modified.

4 Summary of Our Algorithm

Suppose that the whole data set is divided into a training and a test (unseen) set. Our algorithm is summarized as follows:

1. Scale both the training and the test data and randomly initialize the factorized matrices as described in Section 3. Set the parameters tol and r.
2. Iterate Eqs. (1)-(3) until convergence to obtain the NMF basis matrix W and to map the training data to the NMF (reduced) space.
3. Given W, map the test data with the direct technique. Set negative values in the resulting matrix H_new^direct to zero.
4. Fix the basis matrix and iterate Eq. (3) until convergence, using H_new^direct as the initialization of the iterations. The resulting matrix H_new^iterative2 provides the reduced representations of the test data in the NMF space.

Thus, the main distinction of our NMF algorithm from others is that it combines both techniques for mapping unseen data instead of applying only one of them (an end-to-end sketch of these steps is given below).

5 Practical Application

As a challenging task, we selected protein fold recognition from bioinformatics. A protein is an amino acid sequence. In bioinformatics, one of the current trends is to understand evolutionary relationships in terms of protein function. Two common approaches to identifying protein function are sequence analysis and structure analysis. Sequence analysis is based on a comparison between unknown sequences and those whose function is already known; it is argued to work well at high levels of sequence identity, but below 50% identity it becomes less reliable. Protein fold recognition is structure analysis that does not rely on sequence similarity. Proteins are said to have a common fold if they have the same major secondary structure (regions of local regularity within a fold) in the same arrangement and with the same topology, whether or not they have a common evolutionary origin. The structural similarities of proteins in the same fold arise from the physical and chemical properties favoring certain arrangements and topologies, meaning that various physicochemical features, such as fold compactness or hydrophobicity, are utilized for recognition. As the gap widens between the number of known sequences and the number of experimentally determined protein structures (the ratio is more than 100 to 1, and sequence databases are doubling in size every year), the demand for automated fold recognition techniques grows rapidly.

6 Experiments

The experiments with NMF involve estimation of the error rate when combining NMF and a classifier. Three techniques for generalization (mapping of the test data) are used: direct, iterative, and iterative2. The training data are mapped into the reduced space according to Eqs. (1)-(3), with simultaneous learning of the matrix W. We experimented with the real-world data set created by Ding and Dubchak [10], briefly described below.
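As a usage illustration, the four steps of Section 4 applied to matrices of the sizes used in these experiments (125 features, 313 training and 385 test proteins) might look as follows; the data here are random placeholders and all helper names come from the earlier sketches, so this is only a schematic pipeline, not the code used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: scale the training and test data; set tol and r.
V_train = scale_features(rng.random((125, 313)))   # placeholder for the real training matrix
V_test = scale_features(rng.random((125, 385)))    # placeholder for the real test matrix
r, tol = 50, 1e-4

# Step 2: learn the basis W and the reduced training representation.
W, H_train = nmf_divergence(V_train, r, tol=tol)

# Step 3: direct mapping of the test data, negative entries set to zero.
H_direct = np.clip(map_direct(W, V_test), 0.0, None)

# Step 4: iterate Eq. (3) with W fixed, starting from the direct solution (iterative2).
H_test = map_iterative(W, V_test, H_direct, tol=tol)
```
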
Tests were repeated 10 times to collect the statistics necessary for comparing the three generalization techniques. Each time, a different random initialization was used for learning the basis matrix W, but within a run the same learned basis matrix was utilized for all generalization techniques, in order to make the comparison as fair as possible. The values of r (the dimensionality of the reduced space) were set to 25, 50, 75, and 88 (the largest value of r that still results in dimensionality reduction with NMF for the 125×313 training matrix used in the experiments), which constitutes 20%, 40%, 60%, and 71.2% of the original dimensionality, respectively, while tol was fixed at the same value for all runs. In all reported statistical tests α = 0.05. All algorithms were implemented in MATLAB running on a Pentium 4 (3 GHz CPU, 1 GB RAM).

6.1 Data

A challenging data set derived from the SCOP (Structural Classification of Proteins) database was used in the experiments. It is available online (cding/protein/) and its detailed description can be found in [10]. The data set contains the 27 most populated folds, each represented by seven or more proteins. Ding and Dubchak already split it into training and test sets, which we use as other authors did. The feature vectors characterizing the protein folds are 125-dimensional. The training set consists of 313 proteins having (for each pair of proteins) no more than 35% sequence identity for aligned subsequences longer than 80 residues. The test set of 385 proteins is composed of sequences having less than 40% identity with each other and less than 35% identity with the proteins of the first set.

In fact, 90% of the proteins in the test set have less than 25% sequence identity with the proteins of the training set. This, together with the large number of classes, many of which are sparsely represented, renders the task extremely difficult.

6.2 Classification Results

The K-Local Hyperplane Distance Nearest Neighbor (HKNN) classifier [11] was selected because it demonstrated performance competitive with the support vector machine (SVM) when both methods were applied to classify the abovementioned protein data set in the original space, i.e. without dimensionality reduction [12]. In addition, when applied to other data sets, the combination of NMF and HKNN showed very good results [13], making HKNN a natural choice to pair with NMF. HKNN computes the distance of each test point x to L local hyperplanes, where L is the number of different classes. The lth hyperplane is spanned by the K nearest neighbors of x in the training set that belong to the lth class. A test point x is assigned to the class whose hyperplane is closest to x. HKNN requires two predefined parameters, K and λ (regularization). Normalized features were used, since feature normalization to zero mean and unit variance prior to HKNN dramatically increases the classification accuracy (on average by 6 percent). The optimal values of the two HKNN parameters, K and λ, determined via cross-validation, are 7 and 8, respectively.

Table 1 shows the error rates obtained with HKNN, SVM, and various neural networks when classifying protein folds in the original, 125-dimensional space. A brief survey of the methods used for classification of this data set can be found in [12]. From Table 1 one can see that HKNN outperformed all but one of its competitors, even though some of them employed ensembles of classifiers. (In [14], an ensemble of four-layer Discretized Interpretable Multi-Layer Perceptrons (DIMLP) was used, together with bagging or arcing for combining the individual classifiers; with a single DIMLP, the error rate was 61.8%, as reported in [14].)

Table 1. Error rates when classifying protein folds in the original space with different methods.

Source   Classifier   Error rate (%)
[14]     DIMLP-B      38.8
[14]     DIMLP-A      40.9
[15]     MLP          51.2
[15]     GRNN         55.8
[15]     RBFN         50.6
[15]     SVM          48.6
[16]     RBFN         48.8
[10]     SVM          46.1
[12]     HKNN         42.7

Let us now turn to the error rates in the reduced space. As mentioned above, we tried four different values of r and three different generalization techniques, with and without scaling prior to NMF. For each value of r, NMF followed by HKNN was repeated 10 times, and the results were assembled into a single column vector $[e_1^{88}, \ldots, e_{10}^{88}, e_1^{75}, \ldots, e_{10}^{75}, e_1^{50}, \ldots, e_{10}^{50}, e_1^{25}, \ldots, e_{10}^{25}]^T$, where the first ten error rates correspond to r = 88, the next ten to r = 75, etc. As a result, a 40×6 matrix of error rates was generated (10 runs times 4 values of r gives the 40 rows; 3 generalization techniques times 2 scaling options for NMF gives the 6 columns). This matrix was then subjected to the one-way analysis of variance (ANOVA) and multiple comparison tests in order to draw statistically grounded conclusions about all generalization techniques.
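For reference, here is a compact sketch of the HKNN decision rule described above (our own NumPy rendering of the method of [11], not the code used in the paper; the features are assumed to be already normalized to zero mean and unit variance, and K = 7, λ = 8 are the values tuned above).

```python
import numpy as np

def hknn_predict(X_train, y_train, X_test, K=7, lam=8.0):
    """K-Local Hyperplane Distance Nearest Neighbor (HKNN) sketch.

    For each test point x and each class, the K nearest training points of that
    class span a local (affine) hyperplane; a lambda-regularized least-squares
    problem gives the closest point on that hyperplane, and x receives the label
    of the class whose hyperplane is nearest."""
    classes = np.unique(y_train)
    y_pred = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        best_dist, best_cls = np.inf, classes[0]
        for cls in classes:
            P = X_train[y_train == cls]                          # training points of this class
            k = min(K, len(P))
            nn = P[np.argsort(np.linalg.norm(P - x, axis=1))[:k]]
            centroid = nn.mean(axis=0)
            D = (nn - centroid).T                                # directions spanning the hyperplane
            # Regularized projection: (D^T D + lam * I) a = D^T (x - centroid)
            a = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ (x - centroid))
            dist = np.linalg.norm(x - centroid - D @ a)
            if dist < best_dist:
                best_dist, best_cls = dist, cls
        y_pred[i] = best_cls
    return y_pred
```
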
Table 2 shows the identifiers used below for the generalization techniques.

Table 2. Generalization techniques and their identifiers.

Identifier   Technique    Scaling prior to NMF
1            Direct       No
2            Iterative    No
3            Iterative2   No
4            Direct       Yes
5            Iterative    Yes
6            Iterative2   Yes

The one-way ANOVA test is utilized first in order to find out whether the mean error rates of all six techniques are the same (null hypothesis H_0: µ_1 = µ_2 = ... = µ_6) or not. If the returned p-value is smaller than α = 0.05 (which happened in our case), the null hypothesis is rejected, which implies that the mean error rates are not all the same. The next step is to determine, by means of the multiple comparison test, which pairs of means are significantly different and which are not. Such a sequence of tests constitutes the standard statistical protocol when two or more means are to be compared.

Table 3 contains the results of the multiple comparison test, and these results confirm that the direct technique stands apart from both iterative techniques. The main conclusions from Table 3 are that µ_1 = µ_4 and µ_2 = µ_3 = µ_5 = µ_6, i.e. there are two groups of techniques, and the mean of the first group is larger than that of the second group, based on Table 4. This implies that the direct technique tends to lead to a higher error than either iterative technique when classifying protein folds. On the other hand, the multiple comparison test did not detect any significant difference between the mean error rates of the two iterative techniques, though it appears that the standard iterative technique, while slower than the iterative2 technique (see the next subsection), is slightly more accurate on the data set used. The last column in Table 4 points to a very interesting result: independently of feature normalization prior to NMF, the standard deviation of the error rate for our technique is the lowest.
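The ANOVA and multiple comparison tests were run in MATLAB. A rough Python stand-in for the same protocol (one-way ANOVA followed by pairwise tests, here with a Bonferroni correction rather than MATLAB's multiple comparison procedure) is sketched below; `errors` stands for the 40×6 matrix of error rates described above, and the function name is ours.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def compare_techniques(errors, alpha=0.05):
    """errors: (40, 6) array with one column of error rates per technique id 1..6."""
    groups = [errors[:, j] for j in range(errors.shape[1])]
    # One-way ANOVA: H0 is that all six mean error rates are equal.
    _, p_anova = stats.f_oneway(*groups)
    print(f"ANOVA p-value: {p_anova:.4g} (H0 rejected at alpha={alpha}: {p_anova < alpha})")
    # Pairwise follow-up tests; Bonferroni keeps the family-wise error level near alpha.
    pairs = list(combinations(range(len(groups)), 2))
    for i, j in pairs:
        _, p = stats.ttest_ind(groups[i], groups[j])
        verdict = "Reject" if p < alpha / len(pairs) else "Accept"
        print(f"{verdict} H0: mu_{i + 1} = mu_{j + 1} (p = {p:.4g})")
```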

Table 3. Results of the multiple comparison test. Numbers in the first two columns stand for technique identifiers. For each pair, the test returns a confidence interval for the estimated difference in means; if that interval contains 0 (zero), the difference in means is not significant, i.e. H_0 is not rejected.

1   2   Reject H_0: µ_1 = µ_2
1   3   Reject H_0: µ_1 = µ_3
1   4   Accept H_0: µ_1 = µ_4
1   5   Reject H_0: µ_1 = µ_5
1   6   Reject H_0: µ_1 = µ_6
2   3   Accept H_0: µ_2 = µ_3
2   4   Reject H_0: µ_2 = µ_4
2   5   Accept H_0: µ_2 = µ_5
2   6   Accept H_0: µ_2 = µ_6
3   4   Reject H_0: µ_3 = µ_4
3   5   Accept H_0: µ_3 = µ_5
3   6   Accept H_0: µ_3 = µ_6
4   5   Reject H_0: µ_4 = µ_5
4   6   Reject H_0: µ_4 = µ_6
5   6   Reject H_0: µ_5 = µ_6

Table 4. Mean error rates for all six generalization techniques (the standard error for all techniques is 0.4). Standard deviations are reported only for the four techniques associated with the smaller error.

Identifier   Mean error rate   Std. deviation

This implies that our modifications of NMF led to a visible reduction in the deviation of the classification error! The reduction is caused by shrinking the search space of possible factorizations: our initialization eliminates some potentially erroneous solutions before the iterations even start and thus leads to a more stable classification error. The open issue, however, is when to stop the iterations in order to achieve the highest accuracy.

One can see that the error rates in the reduced space are larger than the error rate (42.7%) achieved in the original space. Nevertheless, we observed that the error in the reduced space can sometimes be lower than 42.7: for example, the minimal error when applying the iterative technique with no scaling before NMF and r = 50 is 39.22, while the minimal error when using the iterative2 technique under the same conditions is. The varying errors can be attributed to the fact that the NMF factorization is not unique. The overall conclusion about the three generalization techniques is that, for the given data set, the direct technique, despite being the fastest of the three, lagged far behind either iterative technique in terms of classification accuracy when NMF is followed by HKNN. Among the iterative techniques, the conventional technique provides slightly better accuracy (though the multiple comparison test did not detect any statistically significant difference) than the iterative2 technique, but at slower convergence, as will be demonstrated in the next subsection.

6.3 Time

Since the iterative techniques showed better performance than the direct technique, we compare them in terms of time. Table 5 accumulates the gains resulting from our modifications for different dimensionalities of the reduced space. R_1 stands for the ratio of the average times spent on learning the NMF basis and mapping the training data to the NMF space without and with scaling prior to NMF. R_2 is the ratio of the average times spent on generalization by means of the iterative technique without and with scaling prior to NMF. R_3 is the ratio of the average times spent on generalization by means of the iterative2 technique without and with scaling prior to NMF. R_4 is the ratio of the average times spent on learning and generalization together by means of the iterative technique without and with scaling prior to NMF. R_5 is the ratio of the average times spent on learning and generalization together by means of the iterative2 technique without and with scaling prior to NMF.
R_6 is the gain in time obtained by applying the iterative2 technique for generalization instead of the iterative one when no scaling before NMF is done, whereas R_7 is the gain in time obtained by applying the iterative2 technique for generalization instead of the iterative one when scaling before NMF is done. As a result, the average gain in time obtained with our modifications is more than 11 times.

Table 5. Gains in time resulting from our modifications of the conventional NMF algorithm.

r        R_1   R_2   R_3   R_4   R_5   R_6   R_7
Average

7 Conclusion

The main contribution of this work is two modifications of the basic NMF algorithm and its practical application to the challenging real-world task of protein fold recognition. The first modification concerns feature scaling before NMF, while the second combines two known generalization techniques, which we called direct and iterative: the former is used as the starting point for the updates of the latter, leading to a new generalization technique called iterative2. We proved (both theoretically and experimentally) that at the start of generalization the objective function in Eq. (4) of the new technique is always larger than that of the conventional iterative technique. Since this function is increasing, the generalization procedure of the iterative2 technique converges faster than that of the ordinary iterative technique. When modifying NMF, we were also interested in the error rate when combining NMF with a classifier (HKNN in this work). To this end, we tested all three generalization techniques. Statistical analysis of the obtained results indicates that the mean error rate associated with the direct technique is higher than that of either iterative technique, while the two iterative techniques lead to statistically similar error rates. Since the iterative2 technique provides a faster mapping of unseen data, it is advantageous to apply it instead of the ordinary iterative one, especially in combination with feature scaling before NMF, because of the significant gain in time resulting from both modifications.

References

[1] D.D. Lee and H.S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791, 1999.

[2] I. Buciu and I. Pitas. Application of non-negative and local non-negative matrix factorization to facial expression recognition. In Proc. 17th Int. Conf. on Pattern Recognition, Cambridge, UK.

[3] D. Guillamet and J. Vitrià. Discriminant basis for object classification. In Proc. 11th Int. Conf. on Image Analysis and Processing, Palermo, Italy.

[4] X. Chen, L. Gu, S.Z. Li, and H.-J. Zhang. Learning representative local features for face detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, HI.

[5] T. Feng, S.Z. Li, H.-Y. Shum, and H.-J. Zhang. Local non-negative matrix factorization as a visual representation. In Proc. 2nd Int. Conf. on Development and Learning, Cambridge, MA.

[6] D. Guillamet and J. Vitrià. Evaluation of distance metrics for recognition based on non-negative matrix factorization. Pattern Recognition Letters, 24(9-10).

[7] D. Guillamet, J. Vitrià, and B. Schiele. Introducing a weighted non-negative matrix factorization for image classification. Pattern Recognition Letters, 24(14).

[8] Y. Wang, Y. Jia, C. Hu, and M. Turk. Fisher non-negative matrix factorization for learning local features. In Proc. 6th Asian Conf. on Computer Vision, Jeju Island, Korea.

[9] M. Rajapakse and L. Wyse. NMF vs ICA for face recognition. In Proc. 3rd Int. Symp. on Image and Signal Processing and Analysis, Rome, Italy.

[10] C.H.Q. Ding and I. Dubchak. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17(4).

[11] P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.

[12] O. Okun. Protein fold recognition with k-local hyperplane distance nearest neighbor algorithm. In Proc. 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, Pisa, Italy, pages 47-53.

[13] O. Okun. Non-negative matrix factorization and classifiers: experimental study. In Proc. 4th IASTED Int. Conf. on Visualization, Imaging, and Image Processing, Marbella, Spain.

[14] G. Bologna and R.D. Appel. A comparison study on protein fold recognition. In Proc. 9th Int. Conf. on Neural Information Processing, Singapore.

[15] I.-F. Chung, C.-D. Huang, Y.-H. Shen, and C.-T. Lin. Recognition of structure classification of protein folding by NN and SVM hierarchical learning architecture. In O. Kaynak, E. Alpaydin, E. Oja, and L. Xu, editors, Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, Istanbul, Turkey, Lecture Notes in Computer Science, volume 2714. Springer-Verlag, Berlin.

[16] N.R. Pal and D. Chakraborty. Some new features for protein fold recognition. In O. Kaynak, E. Alpaydin, E. Oja, and L. Xu, editors, Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, Istanbul, Turkey, Lecture Notes in Computer Science, volume 2714. Springer-Verlag, Berlin.


More information

COMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37

COMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37 COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation

More information

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH Ashutosh Kumar Singh 1, S S Sahu 2, Ankita Mishra 3 1,2,3 Birla Institute of Technology, Mesra, Ranchi Email: 1 ashutosh.4kumar.4singh@gmail.com,

More information

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

More information

A lower bound for scheduling of unit jobs with immediate decision on parallel machines

A lower bound for scheduling of unit jobs with immediate decision on parallel machines A lower bound for scheduling of unit jobs with immediate decision on parallel machines Tomáš Ebenlendr Jiří Sgall Abstract Consider scheduling of unit jobs with release times and deadlines on m identical

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 5, Number 3-4, Pages 351 358 c 2009 Institute for Scientific Computing and Information STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

More information

MIRA, SVM, k-nn. Lirong Xia

MIRA, SVM, k-nn. Lirong Xia MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output

More information

Nonnegative Matrix Factorization with Toeplitz Penalty

Nonnegative Matrix Factorization with Toeplitz Penalty Journal of Informatics and Mathematical Sciences Vol. 10, Nos. 1 & 2, pp. 201 215, 2018 ISSN 0975-5748 (online); 0974-875X (print) Published by RGN Publications http://dx.doi.org/10.26713/jims.v10i1-2.851

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Unit 8: Introduction to neural networks. Perceptrons

Unit 8: Introduction to neural networks. Perceptrons Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad

More information