A Robust PCA by LMSER Learning with Iterative Error Reinforcement†

Bai-ling Zhang, Irwin King, Lei Xu
blzhang@cs.cuhk.hk, king@cs.cuhk.hk, lxu@cs.cuhk.hk
Department of Computer Science
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong

Abstract

We propose an approach for performing adaptive principal component extraction. In this approach, the Least Mean Squared Error Reconstruction (LMSER) principle is implemented in a successive way such that the reconstruction error is fed back as input for training the network's weights. Simulation results show that this implementation of LMSER performs Robust Principal Component Analysis (PCA), which is capable of resisting strong outliers.

1 Introduction

Linear neurons learning under an unsupervised Hebbian rule can perform a linear statistical analysis of the input data, as first shown by Oja [3, 2], who proposed a learning rule for a single neuron that finds the first principal component of the covariance matrix of the input statistics. Later, a number of researchers devised neural networks that find the first k principal components of this matrix, or a subspace spanned by the first k principal components; these are called k-PCA or Principal Subspace Analysis (PSA) networks. Detailed references can be found, e.g., in [6]. In 1991, Xu [5, 6] proposed the Least Mean Squared Error Reconstruction (LMSER) principle for neural network self-organization. In particular, the special case of one-layer linear networks has been investigated in detail. It was shown in [5, 6] that for one-layer networks the LMSER rule performs PSA similarly to the rule given in [2], which in fact descends downhill on the mean squared reconstruction error in a direction with a positive projection on the evolution direction of the LMSER rule.
Furthermore, the LMSER rule and Oja's subspace rule can also perform the true k-PCA by introducing different scaling factors for the output of each neuron. Moreover, it was discovered in [5, 6] that for one-layer networks with nonlinear sigmoid activations, the LMSER rule can break the symmetry of the PSA so that the weight vector of each neuron approximates one of the true k principal components. Recently, Karhunen and Joutsensalo [1] applied this nonlinear LMSER to the problem of signal separation and showed that it gives a better separation property than other nonlinear PCA approaches. This paper considers implementing LMSER or PCA on input data with strong outliers. This problem, called Robust PCA, was studied in the statistics literature [4] using block approaches. Based on a statistical physics approach, Xu and Yuille [9, 8, 10] developed a set of adaptive Robust PCA learning rules that can resist strong outliers in the data. In this paper, we propose a new approach for Robust PCA based on a successive implementation of the above one-layer nonlinear LMSER.

This work was supported in part by the Hong Kong Research Grant Council, No. 22572.
† For correspondence, please contact I. King, e-mail: king@cs.cuhk.hk
2 New Approach for Robust PCA

Consider a symmetrically circuited single-layer feedforward network. Let the L x M weight matrix W_t = [w_t(1), ..., w_t(M)] have the weight vectors of the M neurons after t iterations as its columns. Then y_i = x_t^T w_t(i) is the linear output of the i-th neuron, and z_i = phi(y_i) is the corresponding nonlinear output through a nonlinear activation function phi(.). Let u = W z denote the reconstruction vector obtained by a linear transformation of the output vector z via the weight matrix W, with u = [u_1, ..., u_L]^T and z = [z_1, ..., z_M]^T. The one-layer nonlinear LMSER rule was proposed in [5, 6] with the criterion

    minimize J(W) = E{ || x - W phi(W^T x) ||^2 }                        (1)

which is minimized by gradient descent:

    W_{t+1} = W_t - eta_t grad J(W_t) |_{x = x_t}                        (2)

    grad J(W_t) |_{x = x_t} = dJ(W)/dW |_{x = x_t}
                            = -( x_t [ (e_t^T W_t) . z'_t ] + e_t z_t^T )    (3)

where z_t = phi(W_t^T x_t), z'_t = phi'(W_t^T x_t) (with "." denoting the elementwise product), and e_t = x_t - u_t = x_t - W_t phi(W_t^T x_t) is the reconstruction error vector. Equation (3) is the same as the equation first given in [5], with slightly different notation, and was later used by Karhunen and Joutsensalo [1] as their Eq. (6) for the problem of signal separation.

Here we propose an approach for Robust PCA based on successively applying the LMSER learning rule. For each data sample, we compute a new data sequence from the reconstruction error, and this new sequence is used to train the network. Initially, the input vector x~_1 is set equal to the original vector x, and the weight vectors w(k), k = 1, ..., M, are all updated via LMSER. From the reconstructed vector u_1 = W phi(W^T x~_1), a new input x~_2 = x~_1 - u_1 is obtained. In this way, for each data sample x, we form an augmented input data sequence

    x~_1 = x,  x~_2 = x~_1 - u_1,  x~_3 = x~_2 - u_2,  ...,  x~_M = x~_{M-1} - u_{M-1},

with u_i being the reconstruction of x~_i. In other words, the procedure x~_i = x~_{i-1} - u_{i-1} for calculating a new input is repeated M times. These are the iterative error reinforcement steps. Principal component analysis can be formulated as a mean squared error minimization problem.
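As an illustration, the stochastic-gradient update (2)-(3) and the construction of the error-reinforced input sequence can be sketched in NumPy. This is a minimal sketch rather than the authors' code: the choice phi = tanh, the learning rate, and the function names are our own assumptions (the paper only requires a sigmoid-type nonlinearity).

```python
import numpy as np

def lmser_step(W, x, eta=0.01, phi=np.tanh, dphi=lambda y: 1.0 - np.tanh(y) ** 2):
    """One stochastic-gradient LMSER update, Eqs. (2)-(3).

    W is the L x M weight matrix and x a single data sample.
    """
    y = W.T @ x                      # linear outputs y_i, shape (M,)
    z = phi(y)                       # nonlinear outputs z_i
    e = x - W @ z                    # reconstruction error e_t
    # gradient of (1/2)||x - W phi(W^T x)||^2:  -(e z^T + x [(W^T e) . phi'(y)]^T)
    grad = -(np.outer(e, z) + np.outer(x, (W.T @ e) * dphi(y)))
    return W - eta * grad, e

def error_reinforced_inputs(W, x, phi=np.tanh):
    """The augmented sequence x~_1, ..., x~_M with x~_i = x~_{i-1} - u_{i-1}."""
    xs, xt = [], x.astype(float).copy()
    for _ in range(W.shape[1]):      # M iterative error reinforcement steps
        xs.append(xt)
        u = W @ phi(W.T @ xt)        # reconstruction u_i of the current input
        xt = xt - u                  # next input is the reconstruction error
    return xs
```

The sign convention follows Eq. (3): `grad` is half of dJ/dW, so the factor of 2 is absorbed into the learning rate eta_t.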
In the linear network case, the criterion for the k-th principal component is

    minimize J_t(w(k)) = E{ || x - w(k) w(k)^T x^ ||^2 }                 (4)

where x^ = (I - W(k-1)^T W(k-1)) x, and W(k-1) = [c_1, ..., c_{k-1}]^T is the matrix composed of the previous k-1 eigenvectors of the data covariance. From this, the principal eigenvectors can be calculated adaptively one by one by successively applying the above minimization. Such a scheme is purely linear, and nothing beyond PCA can be expected from it. Moreover, parallel updating of the weights is not possible. To retain parallel updating in a linear PCA network, a set of different scaling factors is introduced for the output of each neuron [6, 7]. One of the algorithms proposed in [6] is

    W_{t+1} = W_t + eta_t [ x_t y_t^T D - u_t y_t^T ]                    (5)

where D = diag[alpha_1, ..., alpha_M] with alpha_1 > alpha_2 > ... > alpha_M > 0. This rule performs the true PCA. We combine this idea of scaling factors with the above idea of a successive implementation of the one-layer nonlinear LMSER. Let the squared reconstruction error through the weights of the k-th output be

    J_t = alpha_t || x~_t - u_t ||^2
        = alpha_t sum_{i=1}^{L} ( x~_i - w_i(k) phi(w(k)^T x~) )^2       (6)

where alpha_t is a scaling factor with the same function as those in Eq. (5). From our experience, we take
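For concreteness, one step of the linear scaled rule (5) can be written as follows; this is our own sketch, with the function name and learning rate chosen for illustration only:

```python
import numpy as np

def scaled_pca_step(W, x, alphas, eta=0.01):
    """One update of the linear rule (5): W <- W + eta [x y^T D - u y^T],
    where D = diag(alpha_1, ..., alpha_M) holds distinct decreasing factors."""
    y = W.T @ x                      # linear outputs
    u = W @ y                        # linear reconstruction
    return W + eta * (np.outer(x, y) @ np.diag(alphas) - np.outer(u, y))
```

With alpha_1 > ... > alpha_M > 0 the symmetry among the neurons is broken, which is what turns a subspace rule into true PCA.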
    alpha_t = exp(-2 |i - t|),   i = 1, ..., M,  i != t.                 (7)

Taking the gradient of Eq. (6), we have

    dJ_t/dw(k) = -2 alpha_t e_t phi(w(k)^T x~)
                 - 2 alpha_t [ e_t^T w(k) ] phi'(w(k)^T x~) x~           (8)

where e_t = x~ - w(k) phi(w(k)^T x~) is the reconstruction error vector. The gradient descent algorithm in matrix form for the overall system is

    W_{t+1} = W_t + eta_t [ e_t z_t^T + x~_t ( (e_t^T W_t) . z'_t ) ] D  (9)

where D = diag(alpha_1, ..., alpha_M), z_t = phi(W_t^T x~_t), and z'_t = phi'(W_t^T x~_t). For a given sample x, we let the signal propagate bidirectionally M times, and at each time t the network input is updated by the reconstruction vector as x~_t = x~_{t-1} - u_{t-1}.

3 Simulations

We generated a set of 4-dimensional random Gaussian data points distributed in an ellipsoidal region centered at the origin of R^4, as shown in Fig. 1. Among these sample points, outlier points were selected and replaced by amplified versions of the original values, as illustrated in Fig. 1. Two experiments were performed with the proposed iterative error reinforcement algorithm. First, we compared the learning results with the PCA learning scheme of Eq. (5) in a semi-linear network with amplifier factors A = diag(4, 3, 2, 1). When there were no outliers in the data samples, the proposed heuristic converged to a solution similar to that of the semi-linear network, as shown in Fig. 2. This illustrates that the proposed algorithm is functionally equivalent to the semi-linear PCA network of Eq. (5) for data samples without outliers. Second, we compared the iterative error reinforcement algorithm with the variant without iterative error reinforcement, which directly sets e_t = x - w(k) phi(w(k)^T x) in Eq. (9). The result is shown in Fig. 2. With outliers in the data samples, both algorithms took longer to converge. Although both methods converged to an adequate solution, demonstrating their robustness against outliers, the iterative error reinforcement method converged more quickly than the previous nonlinear method, as shown in Fig. 2.
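The simulated algorithm, i.e. the matrix update (9) combined with the M error reinforcement steps per sample, can be sketched as follows. This is a minimal sketch under our own reading of Eq. (7) (the index i taken as the neuron index at inner step t, with alpha = 1 when i = t), with phi = tanh, the function name, and the learning rate all being our assumptions:

```python
import numpy as np

def robust_lmser_epoch(W, X, eta=0.005, phi=np.tanh,
                       dphi=lambda y: 1.0 - np.tanh(y) ** 2):
    """One pass of the iterative error reinforcement rule (9) over data X.

    For every sample the signal propagates M times; at inner step t the
    update is scaled by D = diag(alpha_1, ..., alpha_M) with
    alpha_i = exp(-2|i - t|), and the next input is the current
    reconstruction error (x~_t = x~_{t-1} - u_{t-1}).
    """
    L, M = W.shape
    for x in X:
        xt = x.astype(float).copy()
        for t in range(M):
            y = W.T @ xt
            z = phi(y)
            u = W @ z
            e = xt - u                              # reconstruction error e_t
            alphas = np.exp(-2.0 * np.abs(np.arange(M) - t))
            grad = np.outer(e, z) + np.outer(xt, (W.T @ e) * dphi(y))
            W = W + eta * grad @ np.diag(alphas)    # Eq. (9)
            xt = e                                  # error reinforcement step
    return W
```

Because the nonlinear outputs z are bounded by the sigmoid and the inputs shrink toward the reconstruction error, the per-sample updates stay moderate even when a sample is a strong outlier, which is the intuition behind the robustness observed in the simulations.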
4 Conclusions

We have proposed a new approach for performing adaptive principal component extraction. In this approach, a new input sequence is formed for each data sample and LMSER learning is applied successively until convergence. This heuristic algorithm is capable of resisting outliers while obtaining the PCA basis vectors. Comparative experiments have shown that it is robust and converges more rapidly than previous methods without the iterative reinforcement.

References

[1] Juha Karhunen and Jyrki Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113-127, 1994.

[2] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1:61-68, 1989.

[3] Erkki Oja. A simplified neuron model as a principal component analyzer. J. Math. Biology, 15:267-273, 1982.
Figure 1: 4-dimensional Gaussian data samples used in the experiment, which were distributed in an ellipsoidal region; their projections onto the x_1-x_2, x_2-x_3, x_3-x_4, and x_4-x_1 planes are displayed. Also shown are the data samples contaminated with outliers, viewed from the x_1 x_2 x_3, x_1 x_2 x_4, x_2 x_3 x_4, and x_1 x_3 x_4 coordinates. (Scatter plots not reproduced.)

[4] F. H. Ruymgaart. A robust principal component analysis. J. Multivar. Anal., pages 485-497, 1981.

[5] Lei Xu. Least MSE reconstruction for self-organization: (I) multi-layer neural nets. In Proc. International Joint Conference on Neural Networks, volume II, pages 2362-2367, Singapore, November 1991.

[6] Lei Xu. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 6:627-648, 1993.

[7] Lei Xu. Beyond PCA learnings: From linear to nonlinear and from global representation to local representation. In Myung-Won Kim and Soo-Young Lee, editors, Proceedings of the International Conference on Neural Information Processing, volume II, pages 943-949, Seoul, Korea, October 1994.

[8] Lei Xu and A. L. Yuille. Self-organizing rules for robust principal component analysis. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 467-474, San Mateo, CA, 1993. Morgan Kaufmann.

[9] Lei Xu and A. L. Yuille. Robust PCA learning rules based on statistical physics approach. In Proceedings of the International Joint Conference on Neural Networks 1992 Baltimore, volume I, pages 82-87, 1992.

[10] Lei Xu and A. L. Yuille. Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans. on Neural Networks, 6(1):131-143, 1995.
Figure 2: Learning results from the semi-linear PCA algorithm of Eq. (5) (dashed line) and the proposed approach (solid line) when the data samples contained no outliers. The curves are the inner products between the four weight vectors and the corresponding eigenvectors of the data covariance matrix, computed beforehand; they signify a solution when they converge to 1. Also shown are the learning results from the proposed iterative error reinforcement algorithm when the data samples were contaminated by outliers, and from the variant without iterative error reinforcement, i.e., with e_t = x - w(k) phi(w(k)^T x) in Eq. (9). (Curves not reproduced.)