A Coupled Helmholtz Machine for PCA

Seungjin Choi
Department of Computer Science, Pohang University of Science and Technology, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea
seungjin@postech.ac.kr

August 3, 2006

Abstract

In this letter we present a coupled Helmholtz machine for principal component analysis (PCA), where sub-machines are related through sharing some latent variables and associated weights. We then present a wake-sleep PCA algorithm for training the coupled Helmholtz machine, showing that the algorithm iteratively determines the principal eigenvectors of a data covariance matrix without any rotational ambiguity, in contrast to some existing methods that perform factor analysis or principal subspace analysis. The coupled Helmholtz machine provides a unified view of principal component analysis, including various existing algorithms as its special cases. The validity of the wake-sleep PCA algorithm is confirmed by numerical experiments.

Indexing terms: Dimensionality reduction, Helmholtz machines, PCA.

To appear in Electronics Letters. Please address correspondence to Prof. Seungjin Choi, Department of Computer Science, POSTECH, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea, Tel: +82-54-279-2259, Fax: +82-54-279-2299, Email: seungjin@postech.ac.kr

1 Introduction

Spectral decomposition of a symmetric matrix involves determining the eigenvectors of the matrix, which plays an important role in various methods of machine learning and signal processing. For instance, principal component analysis (PCA) or kernel PCA requires the calculation of the first few principal eigenvectors of a data covariance matrix or a kernel matrix, respectively. A variety of methods have been developed for PCA (see [4] and references therein). A common derivation of PCA is given in terms of a linear (orthogonal) projection $W \in \mathbb{R}^{m \times n}$ such that, given a centered data matrix $X \in \mathbb{R}^{m \times N}$, the reconstruction error $\|X - W W^{\top} X\|_F^2$ is minimized, where $\|\cdot\|_F$ denotes the Frobenius norm. It is well known that the reconstruction error is blind to an arbitrary rotation. Thus, minimization of the reconstruction error leads to $W = U Q$, where $Q \in \mathbb{R}^{n \times n}$ is an arbitrary orthogonal matrix and $U \in \mathbb{R}^{m \times n}$ contains the $n$ principal eigenvectors.

The Helmholtz machine [3] is a statistical inference engine in which a recognition model is used to infer a probability distribution over the underlying causes from the sensory input, and a generative model is used to train the recognition model. Wake-sleep learning is a way of training the Helmholtz machine, and delta-rule wake-sleep learning was used for factor analysis [5].

In this letter, we present a coupled Helmholtz machine for PCA, where sub-Helmholtz machines are related through sharing some latent variables as well as associated weights. We develop a wake-sleep PCA algorithm for training the coupled Helmholtz machine, showing that the algorithm iteratively determines the principal eigenvectors of a data covariance matrix without any rotational ambiguity, in contrast to some existing methods that perform factor analysis or principal subspace analysis. In addition, we show that the coupled Helmholtz machine includes [1, 2] as its special cases.
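The rotational ambiguity mentioned above can be checked directly. The following NumPy sketch is an illustrative addition (not part of the original letter; all names and dimensions are ours): it builds a random centered data matrix, takes the $n$ principal eigenvectors $U$ from an SVD, and verifies that any rotated basis $W = UQ$ attains the same reconstruction error.

import numpy as np

rng = np.random.default_rng(0)
m, N, n = 10, 500, 3                       # ambient dimension, samples, subspace size (our choice)

X = rng.standard_normal((m, m)) @ rng.standard_normal((m, N))
X = X - X.mean(axis=1, keepdims=True)      # centered data matrix

U = np.linalg.svd(X, full_matrices=False)[0][:, :n]   # n principal eigenvectors of X X^T
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]      # an arbitrary orthogonal matrix

recon_err = lambda W: np.linalg.norm(X - W @ W.T @ X, 'fro') ** 2
print(recon_err(U), recon_err(U @ Q))      # identical up to round-off: W = UQ is equally good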

2 Proposed Method

2.1 Coupled Helmholtz Machine

We denote a centered data matrix by $X = [X_{it}] \in \mathbb{R}^{m \times N}$ and the latent variable matrix by $Y \in \mathbb{R}^{n \times N}$ ($n \leq m$ represents the intrinsic dimension). The coupled Helmholtz machine proposed in this letter is described by a set of generative models and recognition models, where the set of $n$ generative models has the form

$$X = A E_i Y, \qquad i = 1, \ldots, n, \tag{1}$$

where $A \in \mathbb{R}^{m \times n}$ is the generative weight matrix and $E_i \in \mathbb{R}^{n \times n}$ is a diagonal matrix defined by $[E_i]_{jj} = 1$ for $j = 1, \ldots, i$ and $[E_i]_{jj} = 0$ for $j = i+1, \ldots, n$. The recognition model infers latent variables by

$$Y = W^{\top} X, \tag{2}$$

where $W \in \mathbb{R}^{m \times n}$ is the recognition weight matrix. The set of generative models shares some latent variables $Y_{it}$ as well as associated generative weights $A_{ij}$, in such a way that the 2nd sub-model shares $Y_{1t}$ with the 1st sub-model, the 3rd sub-model shares $Y_{1t}$ and $Y_{2t}$ with the 2nd sub-model as well as $Y_{1t}$ with the 1st sub-model, and so on.

2.2 Wake-Sleep PCA Algorithm

The objective function that we consider here is the integrated squared error

$$\mathcal{J} = \sum_{i=1}^{n} \alpha_i \, \|X - A E_i W^{\top} X\|_F^2, \tag{3}$$

where $\alpha_i > 0$ are positive coefficients. We apply alternating minimization to derive updating rules for $A$ and $W$ that iteratively minimize (3). In the sleep phase, we fix $A$ and solve $\partial \mathcal{J} / \partial W = 0$ for $W$, leading to

$$W = A \left[ \mathcal{U}(A^{\top} A) \right]^{-1}, \tag{4}$$

where $\mathcal{U}(Z)$ is an element-wise operator whose arguments $Z_{ij}$ are transformed by

$$\mathcal{U}(Z_{ij}) = \begin{cases} \dfrac{\sum_{l=i}^{n} \alpha_l}{\sum_{l=j}^{n} \alpha_l} \, Z_{ij} & \text{if } i > j, \\[2mm] Z_{ij} & \text{if } i \leq j. \end{cases} \tag{5}$$
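Before turning to the wake phase, the operator in (5) and the sleep-phase update (4) can be written compactly in NumPy. The sketch below is a hedged illustration under our own naming (U_op, sleep_update, and alpha are not from the letter); alpha holds the positive coefficients $\alpha_1, \ldots, \alpha_n$.

import numpy as np

def U_op(Z, alpha):
    # Element-wise operator in (5): scale each strictly lower-triangular entry
    # Z_ij (i > j) by (sum_{l=i}^n alpha_l) / (sum_{l=j}^n alpha_l); keep i <= j as is.
    alpha = np.asarray(alpha, dtype=float)
    tail = np.cumsum(alpha[::-1])[::-1]          # tail[i] = alpha_i + ... + alpha_n
    scale = tail[:, None] / tail[None, :]
    lower = np.arange(alpha.size)[:, None] > np.arange(alpha.size)[None, :]
    return np.where(lower, scale * Z, Z)

def sleep_update(A, alpha):
    # Sleep phase, eq. (4): W = A [U(A^T A)]^{-1}.
    return A @ np.linalg.inv(U_op(A.T @ A, alpha))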

Next, in the wake phase, we fix $W$ and solve $\partial \mathcal{J} / \partial A = 0$ for $A$, leading to

$$A = X Y^{\top} \left[ \mathcal{U}(Y Y^{\top}) \right]^{-1}. \tag{6}$$

The updating algorithms in (4) and (6) are referred to as wake-sleep PCA (WS-PCA), which is summarized below. As in [1], we can also consider the limiting case where $\alpha_{i+1}/\alpha_i \rightarrow 0$ for $i = 1, \ldots, n-1$, that is, the weightings $\alpha_i$ diminish rapidly as $i$ increases. In such a case, the operator $\mathcal{U}(\cdot)$ becomes the conventional upper-triangularization operator $\mathcal{U}_T(\cdot)$, where $\mathcal{U}_T(Z_{ij}) = 0$ for $i > j$ and $\mathcal{U}_T(Z_{ij}) = Z_{ij}$ for $i \leq j$. The resulting algorithm is referred to as WS-PCA (limiting case).

Algorithm Outline: WS-PCA

Sleep phase:
$$W = A \left[ \mathcal{U}(A^{\top} A) \right]^{-1}, \qquad Y = W^{\top} X.$$

Wake phase:
$$A = X Y^{\top} \left[ \mathcal{U}(Y Y^{\top}) \right]^{-1}.$$

3 Numerical Experiments

We provide a numerical example with a data matrix $X$ (intrinsic dimension $n = 5$), in order to demonstrate that WS-PCA indeed finds the exact eigenvectors of $X X^{\top}$ without rotational ambiguity. Fig. 1 shows the convergence behavior of the WS-PCA algorithm (and its limiting case) for different choices of $\alpha_i$. Regardless of the values of $\alpha_i$, the generative weights (or recognition weights) converge to the true eigenvectors. However, the convergence behavior of the WS-PCA algorithm differs slightly according to the ratio $\alpha_{i+1}/\alpha_i$ for $i = 1, \ldots, n-1$ (see Fig. 1). WS-PCA converges faster as the ratio $\alpha_{i+1}/\alpha_i$ decreases. In fact, the limiting case of WS-PCA (where $\mathcal{U}_T$ is used instead of $\mathcal{U}$) shows the fastest convergence (see Fig. 1).
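To mirror the experiment described above, the following sketch (again an illustrative addition, reusing U_op and sleep_update from the previous snippet) alternates the sleep update (4) and the wake update (6) on synthetic data and compares the normalized columns of $A$ with the principal eigenvectors obtained from an SVD. The dimensions and the choice $\alpha_{i+1}/\alpha_i = 0.1$ are ours, not the letter's.

import numpy as np

def ws_pca(X, n, alpha, n_iter=200, seed=0):
    # Alternate the sleep update (4) and the wake update (6);
    # U_op and sleep_update are taken from the previous sketch.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[0], n))                 # random initial generative weights
    for _ in range(n_iter):
        W = sleep_update(A, alpha)                           # sleep: W = A [U(A^T A)]^{-1}
        Y = W.T @ X                                          # recognition: Y = W^T X
        A = X @ Y.T @ np.linalg.inv(U_op(Y @ Y.T, alpha))    # wake: A = X Y^T [U(Y Y^T)]^{-1}
    return A, W

# Synthetic check in the spirit of Section 3 (dimensions and alpha are our choices).
rng = np.random.default_rng(1)
m, N, n = 10, 1000, 5
X = rng.standard_normal((m, m)) @ rng.standard_normal((m, N))
X = X - X.mean(axis=1, keepdims=True)

A, W = ws_pca(X, n, alpha=0.1 ** np.arange(n))               # alpha_{i+1}/alpha_i = 0.1
U = np.linalg.svd(X, full_matrices=False)[0][:, :n]          # true principal eigenvectors
A = A / np.linalg.norm(A, axis=0)                            # compare directions only
print(np.abs(np.sum(A * U, axis=0)))                         # |inner products|; near 1 once converged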

4 Conclusion

We have introduced a coupled Helmholtz machine in which latent variables as well as associated weights are shared by a set of Helmholtz machines. We have presented a wake-sleep-like algorithm in the framework of the coupled Helmholtz machine, showing that the algorithm indeed determines the exact principal eigenvectors of a data covariance matrix without rotational ambiguity. The WS-PCA algorithm includes [1, 2] as its special cases, which are its generative-only and recognition-only counterparts, respectively. The WS-PCA algorithm is useful in applications where only the first few principal eigenvectors need to be computed from a high-dimensional data covariance matrix.

Acknowledgments: This work was supported by the ITEP Brain Neuroinformatics Program, the KISTEP International Cooperative Research Program, and the Korea MIC under the ITRC support program supervised by the IITA (IITA-25-C9-5-8).

References

[1] J.-H. Ahn, S. Choi, and J.-H. Oh. A new way of PCA: Integrated-squared-error and EM algorithms. In Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, Montreal, Canada, 2004.

[2] S. Choi. On variations of power iteration. In Proc. Int'l Conf. Artificial Neural Networks, volume 2, pages 145-150, Warsaw, Poland, 2005. Springer.

[3] P. Dayan, G. E. Hinton, R. M. Neal, and R. S. Zemel. The Helmholtz machine. Neural Computation, 7(5):889-904, 1995.

[4] K. I. Diamantaras and S. Y. Kung. Principal Component Neural Networks: Theory and Applications. John Wiley & Sons, Inc., 1996.

[5] R. M. Neal and P. Dayan. Factor analysis using delta-rule wake-sleep learning. Neural Computation, 9(8):1781-1803, 1997.

[Figure 1: Evolution of the generative weight vectors, shown in terms of the absolute value of the inner product between each weight vector and the corresponding true eigenvector (computed by SVD): (a)-(c) WS-PCA with different choices of the ratio $\alpha_{i+1}/\alpha_i$; (d) WS-PCA (limiting case).]