
Information Theory Related Learning

T. Villmann 1, A. Cichocki 2, and J. Principe 3

1 - University of Applied Sciences Mittweida, Dep. of Mathematics/Natural & Computer Sciences, Computational Intelligence Group, Technikumplatz 17, Mittweida - GERMANY
2 - Riken Brain Science Institute, Lab. for Advanced Brain Signal Processing, 2-1 Hirosawa, Wako City, Saitama - JAPAN
3 - University of Florida, Computational NeuroEngineering Laboratory, Gainesville, FL - USA
(corresponding author: thomas.villmann@hs-mittweida.de)

Abstract. This is the introduction paper to a special session held at the ESANN conference. It reviews and highlights recent developments and new directions in information related learning, which is a rapidly developing research area. The algorithms in this field are based on the fundamental principles of information theory and relate them implicitly or explicitly to learning algorithms and strategies.

1 Introduction

The amount of data available to be analyzed and processed is continuously increasing. However, one is normally not interested in the data themselves but in the information they contain. Therefore, the need for efficient and reliable algorithms and methods is pressing and increasingly important. Different strategies are possible, ranging from unsupervised compression, random selection, and projection to supervised feature selection or shape detection, to name just a few. All these approaches have in common that the information contained in the data should be extracted, emphasizing different aspects depending on the task, i.e. they realize an information processing system. Hence, these methods are based implicitly or explicitly on information theory and its consequences. In this paper, we draw attention to recent developments in this field. Obviously, this is not a complete overview, but it picks out some interesting new aspects of the ongoing research and also recalls well-known facts in that area.

2 Information theoretic learning via statistics and probability

Probability theory and statistics are essentially related to information theory. Statistical distributions like Gaussians and others are directly involved in fundamental theorems of information theory: the second Gibbs theorem about the maximum Shannon entropy property of normal distributions is the most prominent fact [18]. Similarity measures were related to probability theory early on [6], leading to divergence measures as generalized distances like the Kullback-Leibler divergence [35]. Over time, a large variety of divergences has been developed and several classes of divergences identified [14],[64]. This research has found new focus [39] and involves modern differential-geometrical methods in statistics and probability theory [3] and in statistical learning [21]. Models in machine learning directly incorporate these results into learning algorithms and strategies. One of the earliest approaches connecting statistics, information theory and biologically motivated learning is the perceptron model of neurons [50]. Multilayer networks can be optimized avoiding overtraining using mutual information [17]. Boltzmann networks are derived from information principles of statistical mechanics [1],[42]. Source separation of data channels is based on statistical deconvolution. Different aspects can be investigated, such as independent component analysis (ICA) and blind source separation (BSS) maximizing conditional probabilities [30], while least-dependent-component analysis is also in the focus [55]. New approaches incorporate information theoretic principles directly: Pham investigated BSS based on mutual information [47], whereas Minami applied β-divergences [44]. A method for learning overcomplete data representations and performing overcomplete noisy blind source separation is the sparse coding neural gas (SCNG) [36]. Related to statistical independence, the more difficult problem of estimating statistical dependence is becoming increasingly important, and novel algorithms are becoming available [52],[54],[53]. Their applicability is enormous, ranging from variable selection to BSS to statistical causality. A comprehensive overview of non-negative matrix and tensor factorization is the book by Cichocki & Amari [16]. Recent results including modern divergences (generalized α-β-divergences) were just published [15],[14]. Further, it should be noticed that information theoretic divergence measures like Rényi divergences (belonging to the family of α-divergences) capture directly the statistical information contained in the data as expressed by the probability density function. Conversely, information theoretic quantities like the mutual information can be explicitly estimated from data [34]. Here, standard approaches of machine learning such as topographic maps or kernels are applied to achieve accurate estimators [45],[61],[60]. Jenssen et al. have established equivalences between kernel methods and information theoretic methods [33]. Another example in information theoretic learning uses Rényi entropy as a cost function instead of the mean squared error; it can be determined either by Parzen estimation [48] or on the basis of the nearest neighbor entropy estimation model [40]. Generally, the question in this context can be translated as: how to deal with the uncertainty contained in data. One way in this direction is to interpret the data as fuzzy values or to generate information about data equipped with uncertainty. This could be done in probabilistic terms, as for example by multivariate class labeling [51], or by more general fuzzy approaches.
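As a concrete illustration of the Parzen-based Rényi entropy estimation mentioned above, the following minimal Python sketch computes the plug-in estimate of the quadratic Rényi entropy H_2(X) = -log ∫ p(x)^2 dx for one-dimensional samples, i.e. the negative logarithm of the information potential used as a cost in information theoretic learning [48]. The kernel width and the toy data are our own choices; the snippet is an illustrative sketch, not the implementation of any of the cited methods.

import numpy as np

def quadratic_renyi_entropy(x, sigma=1.0):
    # Parzen plug-in estimate of H_2(X) = -log V(X) for 1-D samples x, where
    # V(X) = (1/N^2) * sum_ij G_{sigma*sqrt(2)}(x_i - x_j) is the information
    # potential (convolution of two Gaussian kernels of width sigma).
    x = np.asarray(x, dtype=float)
    n = len(x)
    s2 = 2.0 * sigma ** 2                      # variance of the pairwise kernel
    diff = x[:, None] - x[None, :]             # all pairwise differences
    kern = np.exp(-diff ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return -np.log(kern.sum() / n ** 2)        # H_2 = -log(information potential)

rng = np.random.default_rng(0)
print(quadratic_renyi_entropy(rng.normal(0.0, 1.0, 500)))  # narrower sample
print(quadratic_renyi_entropy(rng.normal(0.0, 3.0, 500)))  # broader sample, larger H_2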

3 Information theoretic feature extraction and selection

Feature extraction or selection can be seen as a kind of dimension reduction of complex data by constructing combinations of a few variables. These variables should lead to a simplification of the data in a given context while still describing it with sufficient accuracy. By removing the most irrelevant and redundant features, feature selection and extraction helps to improve the performance of learning models. It is clear that the complexity of finding an optimal solution grows exponentially with the number of features. Yet, frequently only a sufficiently good solution is required. Most feature selection approaches are supervised schemes, such that class information or expected regression values can be used for constructing such suboptimal feature subsets or respective ranking lists. The strategies to achieve this goal can be classical Bayesian inference schemes [42], or statistical approaches like correlation or covariance investigation [56],[46]. An alternative to these approaches is feature extraction using the nonparametric mutual information. This can be realized by explicit maximization of the respective mutual information [57], or by learning appropriate feature transformations optimizing the mutual information based on Rényi entropies [58]. Andonie & Cataron suggested the utilization of a kind of information energy for relevance learning, which is structurally similar to the Rényi entropy but different in detail [4]. Thereby, relevance learning is a more general approach of input variable weighting in learning vector quantization [27]. Introducing sparseness constraints in this scheme according to Occam's razor principle, the sparsity can be expressed in terms of entropy and, therefore, used for respective optimization [66]. Information theoretic feature selection for functional data classification is investigated in [25] based on mutual information optimization, while a forward-backward search strategy for regression problems based on mutual information is studied in [24].
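Such mutual-information-based selection can be prototyped in a few lines. The sketch below ranks features by their estimated mutual information with the class labels and keeps the top k, i.e. a simple nonparametric filter. It uses scikit-learn's k-nearest-neighbour MI estimator (in the spirit of [34]) purely for illustration and is not the procedure of [24] or [25]; the dataset, k and all parameter values are our own choices.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)  # MI estimate per feature
k = 5
top = np.argsort(mi)[::-1][:k]                  # indices of the k most informative features
print("selected features:", top)
print("MI estimates:", np.round(mi[top], 3))
X_reduced = X[:, top]                           # reduced representation for a subsequent learner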

4 Information theoretic approaches for vector quantization

Vector quantization by a set of prototypes w is one of the most prominent methods for clustering and data compression, based on the optimization of the γ-reconstruction error E_VQ(γ). One of the key results concerning information theoretic principles for vector quantization is the magnification law discovered by Zador [23],[68]: if the data are given as vectors v in q-dimensional Euclidean space according to a probability density P and ρ denotes the resulting prototype density, then the magnification law ρ ∼ P^α holds with the magnification factor α = q/(q+γ), related to the reconstruction error

E_{\mathrm{VQ}}(\gamma) \;=\; \int \lVert v - w(v) \rVert_E^{\gamma} \, P(v)\, dv

where ‖v − w(v)‖_E is the Euclidean distance between the data vector and the prototype w(v) representing it. This is the basic principle of vector quantization based on Euclidean distances. For schemes like self-organizing maps and neural gas variants, slightly different magnification factors are obtained due to the neighborhood cooperativeness during prototype adaptation [62],[26],[43]. Optimum magnification is obtained for α = 1, which is equivalent to maximum mutual information [68]. Yet, it is possible to control the magnification of most of these algorithms by different strategies such as local learning, winner relaxing or frequency sensitive competitive learning [2],[19]. For an overview we refer to [62]. If divergences are used instead of the Euclidean norm, optimum magnification α = 1 can also be achieved by maximum entropy learning [65]. Recently, an approximation to α = 1 was also obtained when the mean square error in self-organizing map training is substituted by correntropy [13]. Divergence measures capture directly the statistical information contained in the data as expressed by the probability density function and can thus produce non-convex cluster boundaries. Generally, vector quantization using different types of divergences as similarity measure is currently a hot topic [5],[31]. A comprehensive overview is given in [64]. Other information theoretic vector quantizers directly optimize the mutual information or the strongly related Kullback-Leibler divergence. These approaches do not try to minimize the reconstruction error but reduce the divergence between the data and prototype density distributions [28] or maximize the unconditional entropy [59]. Vector quantization algorithms directly derived from information theoretic principles based on Rényi entropies are intensively studied in [38],[49],[22], also highlighting their connection to graph clustering and Mercer kernel based learning [32].
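To make the magnification-control idea tangible, the following Python sketch implements a plain frequency-sensitive competitive learning vector quantizer in the spirit of [2]: the squared distance to each prototype is weighted by its win count, so that rarely winning prototypes become more competitive and the win frequencies equalize, which changes the effective magnification. All names, learning rates and the toy data are our own choices; this is an illustrative sketch, not the algorithm of any particular reference.

import numpy as np

def fscl_vq(data, n_prototypes=10, lr=0.05, n_epochs=20, seed=0):
    # Frequency-sensitive competitive learning: the winner is chosen by the
    # win-count-weighted squared Euclidean distance and moved toward the sample.
    rng = np.random.default_rng(seed)
    protos = data[rng.choice(len(data), n_prototypes, replace=False)].copy()
    wins = np.ones(n_prototypes)                  # win counters
    for _ in range(n_epochs):
        for v in data[rng.permutation(len(data))]:
            d2 = np.sum((protos - v) ** 2, axis=1)
            s = int(np.argmin(wins * d2))         # frequency-weighted winner
            protos[s] += lr * (v - protos[s])     # move winner toward the sample
            wins[s] += 1.0
    return protos, wins

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 2))                 # toy two-dimensional data
protos, wins = fscl_vq(data)
print("win counts per prototype:", wins.astype(int))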

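The density-matching vector quantizers mentioned above replace the reconstruction error E_VQ(γ) by a divergence between densities. In our notation (following the idea of [28],[64], not quoting their exact formulas), the objective is the Kullback-Leibler divergence between the data density P and the density ρ induced by the prototypes,

D_{\mathrm{KL}}(P \,\|\, \rho) \;=\; \int P(v)\, \log \frac{P(v)}{\rho(v)} \, dv ,

which vanishes exactly when the prototype density reproduces the data density.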
5 Information theory based data visualization and shape recognition

Data visualization is a challenging task aiming to explore data and extract their information content. Mapping complex data structures or high-dimensional data onto the two-dimensional plane or into three-dimensional space while preserving the relevant information is of particular interest. A common way is standard principal component analysis or its nonlinear counterparts. Another option are topographic maps like the above mentioned self-organizing map (SOM) or the generative topographic mapping (GTM) [8]. As for the SOM, the magnification properties of the GTM mapping are known, relating it to information preserving mapping [7]. A structural visualization based on SOMs was recently published and denoted as Exploration Machine (XOM) [67], which can be seen as a variant of multi-dimensional scaling (MDS) [20]. Both approaches originally use the discrepancy between the pairwise data distances in the data space and in the embedding space based on Euclidean distances. Yet, divergences as dissimilarity measure in MDS have also been proposed [37]. Stochastic neighbor embedding (SNE) provides a principled alternative to MDS: here, the neighborhood distributions in the embedding space are under control such that the mutual information is maximized, which is equivalent to minimizing the Kullback-Leibler divergence between the data-space and embedding-space distributions [29]. A more robust variant is t-SNE, which uses a t-distribution instead of the Gaussians of the original SNE [41]. A mathematical foundation of generalizations of t-SNE and SNE for arbitrary divergences is given in [63]. Yet, the Kullback-Leibler divergence can also be plugged into XOM [10],[11]. A generalization of these ideas leads to self-organized neighbor embedding (SONE) [12],[9].
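For concreteness, the SNE objective described above can be written (in our notation, following [29],[41]) as a sum of Kullback-Leibler divergences between the neighborhood distributions P_i defined by the data and the corresponding distributions Q_i in the embedding space:

C_{\mathrm{SNE}} \;=\; \sum_i D_{\mathrm{KL}}(P_i \,\|\, Q_i) \;=\; \sum_i \sum_{j \neq i} p_{j|i} \, \log \frac{p_{j|i}}{q_{j|i}} ,

where the p_{j|i} are fixed Gaussian neighborhood probabilities computed from the data and the q_{j|i} depend on the embedding coordinates, which are optimized by gradient descent. t-SNE replaces the Gaussian kernel in the embedding space by a Student-t kernel [41].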

6 Conclusion

This introduction paper reviews some recent developments in information related learning. Obviously, this paper cannot be complete in any sense. However, it highlights some interesting new details and ideas in a rapidly developing field. Information related learning is thereby a very general concept, which makes fewer assumptions about the data than many other machine learning approaches and provides principled strategies based on fundamental knowledge about nature. As we have seen, it comprises different methodologies implicitly or explicitly making use of the concepts of information and entropy.

References

[1] D. Ackley, G. Hinton, and T. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9.

[2] S. C. Ahalt, A. K. Krishnamurty, P. Chen, and D. E. Melton. Competitive learning algorithms for vector quantization. Neural Networks, 3(3).

[3] S.-I. Amari. Differential-Geometrical Methods in Statistics. Springer.

[4] R. Andonie and A. Cataron. An information energy LVQ approach for feature ranking. In M. Verleysen, editor, European Symposium on Artificial Neural Networks 2004. d-side publications.

[5] A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6.

[6] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99-110.

[7] C. M. Bishop, M. Svensén, and C. K. I. Williams. Magnification factors for the SOM and GTM algorithms. In Proceedings of WSOM'97, Workshop on Self-Organizing Maps, Espoo, Finland, June 4-6. Helsinki University of Technology, Neural Networks Research Centre, Espoo, Finland.

[8] C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: The generative topographic mapping. Neural Computation, 10.

[9] K. Bunte, S. Haase, M. Biehl, and T. Villmann. Mathematical foundations of self-organized neighbor embedding (SONE) for dimension reduction and visualization. Machine Learning Reports, 4:1-21.

[10] K. Bunte, B. Hammer, T. Villmann, M. Biehl, and A. Wismüller. Exploratory observation machine (XOM) with Kullback-Leibler divergence for dimensionality reduction and visualization. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN 2010), pages 87-92, Evere, Belgium. d-side publications.

[11] K. Bunte, B. Hammer, T. Villmann, M. Biehl, and A. Wismüller. Neighbor embedding XOM for dimension reduction and visualization. Neurocomputing, in press.

[12] K. Bunte, F.-M. Schleif, S. Haase, and T. Villmann. Mathematical foundations of the self organized neighbor embedding (SONE) for dimension reduction and visualization. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN 2011), in this volume, Evere, Belgium. d-side publications.

[13] R. Chalasani and J. Principe. Self organizing maps with the correntropy induced metric. In Proc. of the International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain.

[14] A. Cichocki, S. Cruces, and S.-I. Amari. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13.

[15] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters.

[16] A. Cichocki, R. Zdunek, A. Phan, and S.-I. Amari. Nonnegative Matrix and Tensor Factorizations. Wiley, Chichester.

[17] G. Deco, W. Finnoff, and H. Zimmermann. Unsupervised mutual information criterion for elimination of overtraining in supervised multilayer networks. Neural Computation, 7:86-107.

[18] G. Deco and D. Obradovic. An Information-Theoretic Approach to Neural Computing. Springer, Heidelberg, New York, Berlin.

[19] D. DeSieno. Adding a conscience to competitive learning. In Proc. ICNN'88, International Conference on Neural Networks, Piscataway, NJ. IEEE Service Center.

[20] R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, New York.

[21] S. Eguchi. Information divergence geometry and the application to statistical machine learning. In F. Emmert-Streib and M. Dehmer, editors, Information Theory and Statistical Learning. Springer, New York.

[22] E. Gokcay and J. Principe. Information theoretic clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2).

[23] S. Graf and H. Luschgy. Foundations of quantization for random vectors. LNM, Springer, Berlin.

[24] A. Guillén, A. Sorjamaa, G. Rubio, A. Lendasse, and I. Rojas. Mutual information based initialization of forward-backward search for feature selection in regression problems. In C. Alippi, M. Polycarpou, C. Panayiotou, and G. Ellinas, editors, Proc. of the International Conference on Artificial Neural Networks - ICANN 2009, volume Part I of LNCS, pages 1-9, Berlin/Heidelberg. Springer.

[25] V. Gómez-Verdejo, M. Verleysen, and J. Fleury. Information-theoretic feature selection for functional data classification. Neurocomputing, 72(16-18).

[26] B. Hammer, A. Hasenfuss, and T. Villmann. Magnification control for batch neural gas. Neurocomputing, 70(7-9).

[27] B. Hammer and T. Villmann. Generalized relevance learning vector quantization. Neural Networks, 15(8-9).

[28] A. Hegde, D. Erdogmus, T. Lehn-Schioler, Y. Rao, and J. Principe. Vector quantization by density matching in the minimum Kullback-Leibler divergence sense. In Proc. of the International Joint Conference on Neural Networks (IJCNN), Budapest. IEEE Press.

[29] G. Hinton and S. Roweis. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems, volume 15. The MIT Press, Cambridge, MA, USA.

[30] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. J. Wiley & Sons.

[31] E. Jang, C. Fyfe, and H. Ko. Bregman divergences and the self organising map. In C. Fyfe, D. Kim, S.-Y. Lee, and H. Yin, editors, Intelligent Data Engineering and Automated Learning - IDEAL 2008, LNCS 5326. Springer.

[32] R. Jenssen, D. Erdogmus, J. Principe, and T. Eltoft. The Laplacian PDF distance: A cost function for clustering in a kernel feature space. In Advances in Neural Information Processing Systems, volume 17, Cambridge, MA, USA.

[33] R. Jenssen, D. Erdogmus, J. Principe, and T. Eltoft. Some equivalences between kernel methods and information theoretic methods. Journal of VLSI Signal Processing, 45:49-65.

[34] A. Kraskov, H. Stögbauer, and P. Grassberger. Estimating mutual information. Physical Review E, 69(6):066138.

[35] S. Kullback and R. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79-86.

[36] K. Labusch, E. Barth, and T. Martinetz. Sparse coding neural gas: Learning of overcomplete data representations. Neurocomputing, 72(7-9).

[37] P. Lai and C. Fyfe. Bregman divergences and multi-dimensional scaling. In M. Köppen, N. Kasabov, and G. Coghill, editors, Proceedings of the International Conference on Information Processing 2008 (ICONIP), volume Part II of LNCS 5507. Springer.

[38] T. Lehn-Schiøler, A. Hegde, D. Erdogmus, and J. Principe. Vector quantization using information theoretic concepts. Natural Computing, 4(1):39-51.

[39] F. Liese and I. Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10).

[40] E. Liitiäinen, A. Lendasse, and F. Corona. On the statistical estimation of Rényi entropies. In Proceedings of IEEE/MLSP 2009 International Workshop on Machine Learning for Signal Processing, Grenoble, France.

[41] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9.

[42] D. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press.

[43] E. Merényi, A. Jain, and T. Villmann. Explicit magnification control of self-organizing maps for "forbidden" data. IEEE Transactions on Neural Networks, 18(3).

[44] M. Minami and S. Eguchi. Robust blind source separation by beta divergence. Neural Computation, 14.

[45] Y.-I. Moon, B. Rajagopalan, and U. Lall. Estimating mutual information by kernel density estimators. Physical Review E, 52.

[46] M. Strickert, B. Labitzke, A. Kolb, and T. Villmann. Multispectral image characterization by partial generalized covariance. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN 2011), in this volume, Evere, Belgium. d-side publications.

[47] D. Pham. Mutual information approach to blind separation of stationary sources. IEEE Transactions on Information Theory, 48.

[48] J. Principe. Information Theoretic Learning. Springer, Heidelberg.

[49] J. C. Principe, J. Fisher III, and D. Xu. Information theoretic learning. In S. Haykin, editor, Unsupervised Adaptive Filtering. Wiley, New York, NY.

[50] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev., 65.

[51] P. Schneider, T. Geweniger, F.-M. Schleif, M. Biehl, and T. Villmann. Multivariate class labeling in Robust Soft LVQ. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN 2011), in this volume, Evere, Belgium. d-side publications.

[52] S. Seth, I. Park, and J. Principe. Variable selection: A statistical dependence perspective. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[53] S. Seth and J. Principe. A test of Granger non-causality based on nonparametric conditional independence. In Proc. International Conference on Pattern Recognition (ICPR).

[54] S. Seth and J. Principe. Variable selection: A statistical dependence perspective. In Proc. International Conference on Machine Learning Applications (ICMLA).

[55] H. Stögbauer, A. Kraskov, S. Astakhov, and P. Grassberger. Least-dependent-component analysis based on mutual information. Physical Review E, 70(6):066123.

[56] M. Strickert, F.-M. Schleif, U. Seiffert, and T. Villmann. Derivatives of Pearson correlation for gradient-based analysis of biomedical data. Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial, (37):37-44.

[57] K. Torkkola. Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research, 3.

[58] K. Torkkola and W. Campbell. Mutual information in learning feature transformations. In P. Langley, editor, Proc. of International Conference on Machine Learning ICML 2000, Stanford, CA. Morgan Kaufmann.

[59] M. M. van Hulle. Topographic map formation by maximizing unconditional entropy: a plausible strategy for on-line unsupervised competitive learning and nonparametric density estimation. IEEE Transactions on Neural Networks, 7(5).

[60] M. M. van Hulle. Density-based clustering with topographic maps. IEEE Transactions on Neural Networks, 10(1).

[61] M. M. van Hulle. Faithful Representations and Topographic Maps: From Distortion- to Information-based Self-organization. J. Wiley & Sons, Inc.

[62] T. Villmann and J.-C. Claussen. Magnification control in self-organizing maps and neural gas. Neural Computation, 18(2).

[63] T. Villmann and S. Haase. Mathematical foundations of the generalization of t-SNE and SNE for arbitrary divergences. Machine Learning Reports, 4:1-16.

[64] T. Villmann and S. Haase. Divergence based vector quantization. Neural Computation, in press.

[65] T. Villmann and S. Haase. Magnification in divergence based neural maps. In R. Miikkulainen, editor, Proceedings of the International Joint Conference on Neural Networks (IJCNN 2011), in press, San Jose, California. IEEE Computer Society Press, Los Alamitos.

[66] T. Villmann and M. Kästner. Sparse functional relevance learning in generalized learning vector quantization. In T. Honkela and E. Oja, editors, Proc. of Workshop on Self-Organizing Maps (WSOM 2011), in press, Helsinki, Finland. Springer.

[67] A. Wismüller. The exploration machine - a novel method for data visualization. In J. Principe and R. Miikkulainen, editors, Advances in Self-Organizing Maps - Proceedings of the 7th International Workshop WSOM 2009, St. Augustine, FL, USA, LNCS 5629, Berlin. Springer.

[68] P. L. Zador. Asymptotic quantization error of continuous signals and the quantization dimension. IEEE Transactions on Information Theory, 28.
