A Robust PCA by LMSER Learning with Iterative Error Reinforcement. Bai-ling Zhang, Irwin King, Lei Xu.


A Robust PCA by LMSER Learning with Iterative Error Reinforcement

Bai-ling Zhang, Irwin King, Lei Xu
blzhang@cs.cuhk.hk, king@cs.cuhk.hk, lxu@cs.cuhk.hk
Department of Computer Science, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

Abstract

We propose an approach for performing adaptive principal component extraction. In this approach, the Least Mean Squared Error Reconstruction (LMSER) principle is implemented in a successive way such that the reconstruction error is fed back as input for training the network's weights. Simulation results show that this type of LMSER implementation can perform Robust Principal Component Analysis (PCA), which is capable of resisting strong outliers.

1 Introduction

Linear neurons learning under an unsupervised Hebbian rule can perform a linear statistical analysis of the input data, as was first shown by Oja [3, 2], who proposed a learning rule for a single neuron that finds the first principal component of the covariance matrix of the input statistics. Later on, a number of researchers devised neural networks that find the first k > 1 principal components of this matrix, or a subspace spanned by the first k > 1 principal components; these are called k-PCA or Principal Subspace Analysis (PSA) networks. Detailed references can be found, e.g., in [6].

In 1991, Xu [5, 6] proposed to use the Least Mean Squared Error Reconstruction (LMSER) principle for neural network self-organization. In particular, the special case of one-layer linear networks has been investigated in detail. It was shown in [5, 6] that for one-layer networks the LMSER rule performs a PSA similar to that given in [2], which actually descends downhill on the mean squared reconstruction error in a direction with a positive projection onto the evolution direction of the LMSER rule. Furthermore, the LMSER rule and Oja's subspace rule can also perform the true k-PCA by introducing different scaling factors on the output of each neuron. Moreover, it was discovered in [5, 6] that for one-layer networks with a nonlinear sigmoid activation, the LMSER rule can break the symmetry of the PSA so that the weight vector of each neuron approximates one of the true k > 1 principal components. Recently, Karhunen and Joutsensalo [1] used this nonlinear LMSER for the problem of signal separation and showed that it gives better separation than other nonlinear PCA approaches.

This paper considers the case of applying LMSER or PCA to input data with strong outliers. This problem, called Robust PCA, was studied in the statistics literature by [4] using block approaches. Based on a statistical physics approach, Xu and Yuille [9, 8, 10] developed a set of adaptive Robust PCA learning rules that can resist strong outliers in the data. In this paper, we propose a new approach to Robust PCA, based on a successive implementation of the above one-layer nonlinear LMSER.

This work was supported in part by the Hong Kong Research Grant Council, No. 22572. For correspondence, please contact I. King, e-mail: king@cs.cuhk.hk.
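As a concrete illustration of the single-neuron Hebbian rule mentioned above, the following is a minimal NumPy sketch (ours, not from the paper) of Oja's rule for extracting the first principal component; the learning rate, epoch count, and toy data are illustrative assumptions.

```python
import numpy as np

def oja_first_pc(X, eta=0.01, epochs=50):
    """Estimate the first principal component of the rows of X with Oja's rule:
    w <- w + eta * y * (x - y * w), where y = w^T x."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                   # linear neuron output
            w += eta * y * (x - y * w)  # Hebbian term with built-in normalization
    return w / np.linalg.norm(w)

# Example: zero-mean Gaussian data with one dominant direction (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 0.5, 0.2])
w = oja_first_pc(X)
print(w)  # should be close to +/- [1, 0, 0, 0]
```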

2 A New Approach for Robust PCA

For a symmetrically connected single-layer feedforward network, let the L x M weight matrix W_t = [w_t(1), ..., w_t(M)] hold the weight vectors of the M neurons after t iterations as its columns. The linear output of the i-th neuron is y_i = x_t^T w_t(i), and z_i = s(y_i) is the corresponding nonlinear output through a nonlinear function s(.). Let u = W z denote the reconstruction vector obtained by linearly transforming the output vector z through the weight matrix W, with u = [u_1, ..., u_L]^T and z = [z_1, ..., z_M]^T. The one-layer nonlinear LMSER rule was proposed in [5, 6] with the criterion

\[ \min_W \; J(W) = E\{\, \| x - W\, s(W^T x) \|^2 \,\} \tag{1} \]

which is minimized by the stochastic gradient update

\[ W_{t+1} = W_t - \eta_t \, \nabla J(W_t)\big|_{x = x_t} \tag{2} \]

\[ \nabla J(W_t)\big|_{x = x_t} = \frac{\partial J(W)}{\partial W}\bigg|_{x = x_t} = -\big[\, x_t \,( e_t^T W_t \odot z_t'^{\,T} ) + e_t\, z_t^T \,\big] \tag{3} \]

where z_t = s(W_t^T x_t), z_t' = s'(W_t^T x_t) (applied element-wise), and e_t = x_t - u_t = x_t - W_t s(W_t^T x_t) is the reconstruction error vector. Equation (3) is the same as the equation first given in [5], up to a slight difference in notation, and was later used by Karhunen and Joutsensalo [1] as their Eq. (6) for the problem of signal separation.

Here we propose an approach for Robust PCA based on successively applying the LMSER learning rule. For each data sample, we compute a new data sequence from the reconstruction error, and this new sequence is used to train the network. Initially, the input vector x̃_1 is set equal to the original vector x, and the weight vectors w_k, k = 1, ..., M, are all updated via LMSER. From the reconstructed vector u_1 = W s(W^T x̃_1), a new input x̃_2 = x̃_1 - u_1 is obtained. In this way, for each data sample x, we form a new augmented input data sequence

\[ \tilde{x}_1 = x, \quad \tilde{x}_2 = \tilde{x}_1 - u_1, \quad \tilde{x}_3 = \tilde{x}_2 - u_2, \quad \ldots, \quad \tilde{x}_M = \tilde{x}_{M-1} - u_{M-1}, \]

where u_i is the reconstruction of x̃_i. In other words, the procedure for computing a new input x̃_i = x̃_{i-1} - u_{i-1} is repeated M times. These are the iterative error reinforcement steps.
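To make the above concrete, here is a minimal NumPy sketch (ours, not the authors' code) of the one-layer nonlinear LMSER gradient step of Eqs. (2)-(3) together with the iterative error reinforcement sequence; a logistic sigmoid for s(.) and the learning rate are assumptions on our part.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lmser_gradient(W, x):
    """Stochastic gradient of J(W) = ||x - W s(W^T x)||^2 at one sample x (Eq. 3)."""
    y = W.T @ x                 # linear outputs
    z = sigmoid(y)              # nonlinear outputs z = s(y)
    dz = z * (1.0 - z)          # s'(y) for the logistic sigmoid
    e = x - W @ z               # reconstruction error e = x - W s(W^T x)
    # grad = -( x (e^T W (.) s'(y))  +  e z^T )
    return -(np.outer(x, (e @ W) * dz) + np.outer(e, z))

def lmser_step_with_error_reinforcement(W, x, eta=0.05):
    """For one sample, apply the LMSER update M times on the error-reinforced
    sequence x~_1 = x, x~_{i+1} = x~_i - u_i (iterative error reinforcement)."""
    M = W.shape[1]
    xt = x.copy()
    for _ in range(M):
        W = W - eta * lmser_gradient(W, xt)   # Eq. (2)
        u = W @ sigmoid(W.T @ xt)             # reconstruction of the current input
        xt = xt - u                           # error reinforcement step
    return W
```

In the full algorithm given below, each of these M inner updates is additionally weighted by the per-neuron scaling factors introduced in Eqs. (5)-(7).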

Principal component analysis can be formulated as a mean squared error minimization problem. In the linear network case, the criterion for the k-th principal component is

\[ \min_{w(k)} \; J(w(k)) = E\{\, \| x - w(k)\, w(k)^T \hat{x} \|^2 \,\} \tag{4} \]

where \hat{x} = (I - W(k-1)^T W(k-1)) x and W(k-1) = [c_1, ..., c_{k-1}]^T is the matrix composed of the previous k-1 eigenvectors of the data covariance matrix. From this, the principal eigenvectors can be adaptively computed one by one by successively applying the above minimization procedure. Such a scheme is purely linear, and nothing beyond PCA can be expected from it; moreover, parallel updating of the weights is not possible. To keep parallel updating in a linear PCA network, a set of different scaling factors is introduced on the output of each neuron [6, 7]. One of the algorithms proposed in [6] is

\[ W_{t+1} = W_t + \eta_t \left[ x_t y_t^T D - u_t y_t^T \right] \tag{5} \]

where D = diag[alpha_1, ..., alpha_M] with alpha_1 > alpha_2 > ... > alpha_M > 0. This rule performs the true PCA.

We combine this idea of scaling factors with the above idea of a successive implementation of the one-layer nonlinear LMSER. Let the squared reconstruction error through the weights of the k-th output be

\[ J_t = \beta_t \, \| \tilde{x}_t - u_t \|^2 = \beta_t \sum_{i=1}^{L} \big( \tilde{x}_i - w_i(k)\, s(w(k)^T \tilde{x}) \big)^2 \tag{6} \]

where beta_t is a scaling factor with the same function as the factors in Eq. (5). From our experience, we take

\[ \beta_t = \exp(-2 |i - t|), \qquad i, t = 1, \ldots, M, \; i \neq t. \tag{7} \]

Taking the gradient of Eq. (6), we have

\[ \frac{\partial J_t}{\partial w(k)} = -2 \beta_t \, e_t \, s(w(k)^T \tilde{x}) - 2 \beta_t \left[ e_t^T w(k) \right] s'(w(k)^T \tilde{x}) \, \tilde{x} \tag{8} \]

where e_t = x̃ - w(k) s(w(k)^T x̃) is the reconstruction error vector. The gradient descent algorithm in matrix form for the overall system is

\[ W_{t+1} = W_t + \eta_t \left[ e_t\, s(W_t^T x_t)^T + x_t \left( e_t^T W_t \odot s'(W_t^T x_t)^T \right) \right] D \tag{9} \]

where D = diag(beta_1, ..., beta_M). For a given sample x, we let the signal propagate bidirectionally M times, and at each propagation t the network input is updated by the reconstruction vector u_{t-1} as x̃_t = x̃_{t-1} - u_{t-1}.
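The following sketch (again ours, not the authors' code) combines the scaled update of Eq. (9) with the error-reinforcement loop. The logistic sigmoid for s(.), the learning rate, the toy data, and the reading of Eq. (7) as a per-step diagonal D with entries exp(-2|i - t|) are assumptions on our part.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def robust_pca_update(W, x, eta=0.05):
    """One sample of the scaled LMSER update of Eq. (9), applied M times on the
    error-reinforced inputs x~_t of Section 2. Illustrative sketch only."""
    L, M = W.shape
    xt = x.copy()
    for t in range(1, M + 1):
        # One reading of Eq. (7): at step t, neuron i is scaled by exp(-2|i - t|).
        D = np.diag(np.exp(-2.0 * np.abs(np.arange(1, M + 1) - t)))
        z = sigmoid(W.T @ xt)
        dz = z * (1.0 - z)
        u = W @ z
        e = xt - u                                          # reconstruction error
        grad_term = np.outer(e, z) + np.outer(xt, (e @ W) * dz)
        W = W + eta * grad_term @ D                         # Eq. (9)
        xt = xt - u                                         # x~_{t+1} = x~_t - u_t
    return W

# Usage on a stream of samples (hypothetical data):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
W = rng.normal(scale=0.1, size=(4, 4))
for x in X:
    W = robust_pca_update(W, x)
```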

3 Simulations

We generated a set of 4-dimensional random Gaussian data points distributed in an ellipsoidal region centered at the origin of R^4, as shown in Fig. 1. Among these sample points, a number of outlier points were selected and replaced by strongly amplified versions of the original values, as illustrated in Fig. 1. Two experiments were performed with the proposed iterative error reinforcement algorithm.

First, we compared the learning results with the PCA learning scheme of Eq. (5) in a semi-linear network with amplifier factors A = diag(4, 3, 2, 1). When there were no outliers in the data samples, the proposed heuristic converged to a solution similar to that of the semi-linear network, as shown in Fig. 2. This illustrates that, for data samples without outliers, the proposed algorithm is functionally equivalent to the semi-linear PCA network of Eq. (5).

Second, we compared the iterative error reinforcement algorithm with the variant without iterative error reinforcement, which directly sets e_t = x - w(k) s(w(k)^T x) in Eq. (9). The result is shown in Fig. 2. With outliers in the data samples, both algorithms took longer to converge. Although both methods converged to an adequate solution, demonstrating their robustness against outliers, the iterative error reinforcement method appears to converge more quickly than the method without reinforcement, as shown in Fig. 2.

4 Conclusions

We have proposed a new approach for performing adaptive principal component extraction. In this approach, a new input sequence is formed for each data sample and LMSER learning is applied successively until convergence. This heuristic algorithm is capable of resisting outliers while obtaining the PCA basis vectors. Comparative experiments have shown that it is robust and converges more rapidly than previous methods without the iterative reinforcement.

Figure 1: 4-dimensional Gaussian data samples used in the experiment, distributed in an ellipsoidal region; their projections onto the x_1-x_2, x_2-x_3, x_3-x_4, and x_4-x_1 planes are displayed. Also shown are the data samples contaminated with outliers, viewed in the (x_1, x_2, x_3), (x_1, x_2, x_4), (x_2, x_3, x_4), and (x_1, x_3, x_4) coordinates.

Figure 2: Learning curves from the semi-linear PCA algorithm of Eq. (5) (dashed lines) and the proposed approach (solid lines) when the data samples contain no outliers. The curves are the inner products between the four weight vectors and the corresponding eigenvectors of the data covariance matrix (computed beforehand); convergence to 1 signifies a solution. Also shown are the learning results of the proposed iterative error reinforcement algorithm when the data samples are contaminated by outliers, and of the variant without iterative error reinforcement, i.e., with e_t = x - w(k) s(w(k)^T x) in Eq. (9).

References

[1] J. Karhunen and J. Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113-127, 1994.
[2] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1:61-68, 1989.
[3] E. Oja. A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273, 1982.
[4] F. H. Ruymgaart. A robust principal component analysis. Journal of Multivariate Analysis, 11:485-497, 1981.
[5] L. Xu. Least MSE reconstruction for self-organization: (I) multi-layer neural nets. In Proceedings of the International Joint Conference on Neural Networks, volume II, pages 2362-2367, Singapore, November 1991.
[6] L. Xu. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 6:627-648, 1993.
[7] L. Xu. Beyond PCA learnings: From linear to nonlinear and from global representation to local representation. In M.-W. Kim and S.-Y. Lee, editors, Proceedings of the International Conference on Neural Information Processing, volume II, pages 943-949, Seoul, Korea, October 1994.
[8] L. Xu and A. L. Yuille. Self-organizing rules for robust principal component analysis. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 467-474, San Mateo, CA, 1993. Morgan Kaufmann.
[9] L. Xu and A. L. Yuille. Robust PCA learning rules based on statistical physics approach. In Proceedings of the International Joint Conference on Neural Networks 1992 (Baltimore), volume I, pages 812-817, 1992.
[10] L. Xu and A. L. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Transactions on Neural Networks, 6(1):131-143, 1995.