A New Class of APEX-Like PCA Algorithms
Reprinted from Proceedings of ISCAS-98, IEEE Int. Symposium on Circuits and Systems, Monterey (USA), June 1998.

Simone Fiori, Aurelio Uncini, Francesco Piazza
Dipartimento di Elettronica e Automatica - Università di Ancona
Via Brecce Bianche, Ancona - Italy. Fax: +39 (071)
E-mail: aurel@ieee.org
A NEW CLASS OF APEX-LIKE PCA ALGORITHMS

Simone Fiori, Aurelio Uncini and Francesco Piazza
Dept. of Electronics and Automatics, University of Ancona (Italy)
E-mail: simone@eealab.unian.it

ABSTRACT

One of the most commonly known algorithms to perform neural Principal Component Analysis of real-valued random signals is the Kung-Diamantaras Adaptive Principal component EXtractor (APEX) for a laterally-connected neural architecture. In this paper we present a new approach to obtain an APEX-like PCA procedure as a special case of a more general class of learning rules, by means of an optimization theory specialized for the laterally-connected topology. Through simulations we show that the new algorithms can be faster than the original one.

1. INTRODUCTION

Principal Component Analysis (PCA) of multivariate random signals is a well-known statistical data analysis technique [1, 7]. It is possible to show that a linear transformation z = W^t x of a given multiple random signal x into a new random signal z with fewer components than x, such that:

- the transformed signal power is maximized under suitable constraints [4];
- the transformed scalar signals are statistically decorrelated [4];
- the signal x is optimally represented by z (in the mean squared reconstruction error sense) [4];
- a proper measure of the uncertainty [9] of z is maximized;

can be obtained by assuming W = F, where (F, D) is a PCA of x. (The formal definition of PCA in terms of matrix pairs (F, D) can be found in [3].) Matrix F contains the eigenvectors (normalized to unit norm) of the covariance matrix of the analyzed signal, while D contains the powers of the Principal Components arranged in descending order.

In the literature several algorithms are known that allow the extraction of the (unique) PCA of a signal from itself. The most commonly used are those by Sanger (Generalized Hebbian Algorithm, GHA, [8]) and by Kung-Diamantaras (APEX, [5, 6, 7]). All of these methods are characterized by different architectural complexity, convergence speed properties and numerical precision at the equilibrium. In this paper we deal with one of these: the Adaptive Principal component EXtractor (APEX, [6, 7]), based on a laterally-connected neural architecture. It has a wide relevance in the field of analog implementations because it is characterized by a very low complexity. Here we derive a new class of PCA algorithms based on the laterally-connected neural architecture, arising from a simple optimization theory specialized for this topology. Such a class contains, as a special case, an APEX-like algorithm, but it also contains a subclass of algorithms that show a smaller architectural complexity and interesting convergence features when compared with the original one.

This research was supported by the Italian MURST.

NOTATION. In the following, E[.] returns the mathematical expectation of its argument; the operator SUT[A] returns the strictly upper triangular part of the square matrix A; the i-th entry of a generic vector v is denoted by v_i.

2. THE LATERALLY-CONNECTED NEURAL ARCHITECTURE

Kung and Diamantaras realized a Principal Component analyzer using a linear neural network described by the following neural scheme:

    y = W^t x + L^t y ,    (1)

with a proper unsupervised learning rule. The input vector x ∈ R^p, the output vector y ∈ R^m (with m < p, arbitrarily fixed), the direct-connection p x m weight-matrix W and the lateral-connection m x m weight-matrix L are intended to be evaluated at the same temporal instant. The columns of W and L are named in the following way: W = [w_1, w_2, ..., w_m], L = [l_1, l_2, ..., l_m]. Notice that, L^t being a strictly lower-triangular square matrix (i.e. (L^t)_ik = 0 if i ≤ k), this neural network is hierarchical, not recurrent. The original learning rule for the weight-matrix W was:

    ΔW = η [x y^t - W Y²] ,    (2)
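As an illustration (ours, not from the paper), the forward pass (1) and rule (2) can be sketched in a few lines of numpy, reading (2) as ΔW = η[x y^t - W Y²] with Y = diag(y_1, ..., y_m); the function names and toy numbers below are our own:

```python
import numpy as np

def forward(W, L, x):
    """Hierarchical forward pass of eq. (1): y = W^t x + L^t y.

    L^t is strictly lower triangular, so y_i depends only on the
    already-computed outputs y_1, ..., y_{i-1}: the network is
    hierarchical, not recurrent, and y can be filled in sequentially.
    """
    z = W.T @ x                    # direct contributions z_i = w_i^t x
    y = np.zeros(z.shape[0])
    for i in range(y.size):
        y[i] = z[i] + L[:, i] @ y  # l_i^t y touches only earlier entries
    return y

def apex_w_step(W, y, x, eta):
    """One application of the original APEX rule (2):
    Delta W = eta * (x y^t - W Y^2), with Y = diag(y)."""
    return W + eta * (np.outer(x, y) - W @ np.diag(y ** 2))
```

For example, with p = m = 2, W = I, a single lateral weight L[0, 1] = 0.5 and x = (1, 2)^t, the forward pass yields y = (1, 2.5)^t; with L = 0 it reduces to y = W^t x.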
and the learning rule for the lateral weight-matrix L was:

    ΔL = -η SUT[y y^t] - η L Y² ,    (3)

where η is a positive learning rate and Y = diag(y_1, y_2, ..., y_m). Kung and Diamantaras were able to prove the convergence of the above algorithm under some conditions. In particular, we can restate their result as follows:

(Theorem.) Let x be a p-component real random signal, zero-mean, with a finite covariance matrix endowed with non-null distinct eigenvalues, and let (F, D) be the unique PCA of x. Let DKN be the neural net described by (1), trained by means of the learning pair (2)-(3). If the rate η is chosen so small that the behavior of the algorithm is asymptotically stable, the initial entries of W are small random numbers and L(0) = 0, then in the mean it holds true that:

    lim_{t→∞} L(t) = 0 ,   lim_{t→∞} W(t) = F ,   lim_{t→∞} E[y(t) y^t(t)] = D .

In other words, under the above conditions the DKN asymptotically becomes, in the mean, a Principal Component analyzer.

Strictly speaking, they proved that if η is sufficiently small and suitable initial conditions are assumed, then in the mean (W, E[y y^t]) → (F, D), where (F, D) is a PCA of the signal x. We call the above result the Kung-Diamantaras Result (KDR).

3. THE Φ-APEX CLASS

In the following Subsection a new class of APEX-like algorithms is presented. Later, differences and similarities between the new algorithms and existing ones are discussed.

3.1. APEX-like algorithms based on an optimization formulation

A PCA transformation is such that the transformed signals (with the above symbology, z = W^t x) are characterized by maximum variance. Furthermore, from the formal definition of PCA it is known that, at the equilibrium, the unique PCA vectors w_i must be mutually orthogonal and of unit norm. These targets can be thought of as separate objectives to be attained by means of the laterally-connected neural topology. More formally, we can state the following:

(Proposition.)
It is possible to define a pair (J, C) of objective functions whose extremization process yields a class of PCA algorithms containing, as a special case, an APEX-like one.

Functions J and C can be properly fixed by examining the structure of a generic output signal y_i from (1), squared. Direct calculations show:

    y_i² = (w_i^t x)² + (l_i^t y)² + 2 (w_i^t x)(l_i^t y) .    (4)

The first term at the right-hand side contains in the mean the power of the transformed signal z_i = w_i^t x, while the second term contains in the mean a linear combination of the cross-correlations of the output signals; in fact it holds true that E[(l_i^t y)²] = l_i^t E[y y^t] l_i. By definition of PCA, the first one has to be maximized under the constraint w_i^t w_i = 1 ([4]), while the second one must be zeroed. Here we propose to use the direct-connection adaptation to maximize the powers of the transformed signals, by maximizing the following objective function:

    J(W, L) := Σ_{i=1}^m E[y_i²] + (1/2) Σ_{i=1}^m (w_i^t w_i - 1) μ_i ,    (5)

with respect to W only. In the above equation the μ_i are so-called Lagrange multipliers, to be determined by imposing the constraints w_i^t w_i = 1 in the equilibrium conditions ∂J/∂w_i = 0. It is important to notice that, by definition of L, a scalar product (l_i^t y) does not depend on w_i but only on the w_j with j < i; then from equations (4) and (5) we obtain:

    ∂J/∂w_i = 2 E[(w_i^t x) x] + 2 E[(l_i^t y) x] + μ_i w_i = 2 E[y_i x] + μ_i w_i ,

therefore the optimum w_i satisfies:

    w_i^t (∂J/∂w_i) = 2 E[y_i (w_i^t x)] + μ_i = 0 ,

and the optimum μ_i is μ_i = -2 E[y_i z_i]. If the Gradient Steepest Ascent (GSA) method is used to adapt each w_i, that means Δw_i = +(1/2) η ∂J/∂w_i, the stochastic learning rule for W reads:

    ΔW = η [x y^t - W Y Z] ,    (6)

where Z = diag(z_1, ..., z_m) and the true gradient of J has been replaced by its stochastic instantaneous approximation. Finally, we choose to adapt the lateral-connection weight-matrix L only, in order to minimize a cost function defined as:

    C(W, L) := Σ_{i=1}^m E[y_i²] + Σ_{i=1}^m (l_i^t l_i) φ_i ,    (7)
where a set of Lagrange multipliers φ_i has been introduced for the constraints ||l_i||² = 0 (that have to be reached at the equilibrium to preserve the KDR) and to add to the system a number of degrees of freedom. Besides, it is interesting to recognize that those constraints also embed a regularization property in the global criterion [2]. As can be directly proved by using standard Kuhn-Tucker theory [2], under these constraints there are no theoretical reasons to force the functions φ_i to assume any particular shape. This second objective function C can be minimized, with respect to the variable matrix L, by means of a Gradient Steepest Descent (GSD) method: Δl_i = -(1/2) η ∂C/∂l_i. From equations (4) and (7) it follows:

    ∂C/∂l_i = 2 E[y_i y_[i-1]] + 2 φ_i l_i ,

where y_[i-1] = [y_1 y_2 ... y_{i-1} 0 ... 0]^t for 2 ≤ i ≤ m, and y_[0] = [0 0 ... 0]^t. By rewriting the GSD equations in matrix notation, again ignoring the expectation operator, the new stochastic learning rule for L reads:

    ΔL = -η SUT[y y^t] - η L Φ ,    (8)

with Φ defined as Φ = diag(φ_1, ..., φ_m). Rule (8) provides minimization of the cross-correlation between the network's output signals. Now we have all the elements to propose the following definition, relative to the class of algorithms represented by the above new neural learning rules:

(Definition.) The family of learning rules described by equations (6) and (8) is called the Φ-APEX Principal Component analyzer class. The special element in this family with Φ = Y² is called Y²-APEX.

Notice that Y²-APEX is not the same algorithm as the original APEX, but as L → 0 also Z → Y; thus these algorithms asymptotically behave in the same way, and we call it APEX-like. It is also important to remark that, apart from further stability considerations, the choice of the multiplying functions φ_i(t) is free. In fact, we can adopt any suitable, arbitrarily chosen function that guarantees the asymptotic stability of the global learning process and good performances of the Principal Component analyzing algorithm.

3.2. Discussion

In practice, in our experiments we have examined the following three cases:

1. all the φ_i are chosen null;
2. the φ_i(t) are arbitrarily chosen non-null constant values;
3. the φ_i(t) are assumed to be particular non-constant functions of the sole variables y_i(t).

Roughly speaking, we can identify the special PC extractor obtained by vanishing the free functions φ_i(t) as the 0-APEX algorithm, whose descriptive equations are:

    ΔW = η [x y^t - W Y Z] ,    (9)
    ΔL = -η SUT[y y^t] .    (10)

From a computational-complexity point of view this algorithm is the most interesting one, since it requires a smaller amount of operations than the original APEX, as shown in Table 1. The above rule recalls the linearized Rubner-Tavan model, which the 0-APEX asymptotically behaves like. (For details about the Rubner-Tavan approach, readers may refer to [7].)

We observed that the term y_i² in each of the (3) is too large and can also lead the algorithm very far from the right solution. Thus, when non-constant, non-null functions φ_i are used, we found it useful that they satisfy the following constraint: each φ_i(t) should be a positive function that grows less than t², at least for large |t|. For instance, we found good results with φ_i = |y_i|. Other suitable choices are of course possible.

    Algorithm | Complexity (operations)
    GHA       | 2pm + (m/2)(m + 1)(p + 1)
    APEX      | 3pm + m² + m
    0-APEX    | 3pm + m²

Table 1: Complexity comparison.

Table 1 provides estimates of the architectural complexity of the neural networks in terms of the number of elementary operations required by the corresponding learning rules with respect to the network dimensions. We define an operation as a product possibly followed by a sum.

4. EXPERIMENTAL RESULTS

To assess our theoretical analysis and compare the algorithms' performances, we performed simulations using Sanger's GHA, the standard APEX and our new algorithms belonging to the Φ-APEX class. Such PCA algorithms have been run with a network input signal x = Qs, where Q is a p x p orthonormal matrix (Q^t Q = I) randomly generated, and s contains p mutually uncorrelated zero-mean random signals s_i with different powers σ_i² = E[s_i²]. The signals s_i are placed in s so that their powers are decreasingly ordered, i.e. σ_i² > σ_j² if i < j. This implies that the first m Principal Components of x (with m < p) are the first m column-vectors of Q.
Figure 1: Convergence speed comparison.

Figure 2: Comparison of the APEX algorithms.

Each algorithm starts from the same initial conditions, that are random for W and null for L. In order to compare the convergence speed of the new algorithms with those of the GHA and APEX, a suitable measure of convergence δ is used. This measure is defined as δ(W) = ||W - Q_m||_F, where Q_m is the matrix whose columns are the first m columns of Q, and ||.||_F denotes the Frobenius norm. Note that the quantity δ may converge to different values since recovering the columns of Q is sign-blind.

The simulation presented in Figure 1 concerns the GHA, APEX and |y|-APEX (that is, φ_i = |y_i|) algorithms. The above results were obtained with a learning stepsize η = 0.01 and network dimensions p = 10 and m = 5. The powers σ_i² were assumed from the exponential law σ_i² = 2^(2-i) (where i ranges from 1 to p) in order to keep a good eigenvalue spread. The new |y|-APEX performs well: its convergence toward the KDR looks faster, and its precision seems to be fully comparable with that of the other algorithms.

Figure 2 shows typical courses of the |y|-APEX and Y²-APEX compared together for σ_i² = 0.1 (p - i + 1). Here the input signals have small powers, one close to another. In this case all the algorithms behave almost identically after a few steps; therefore the 0-APEX is the most convenient one.

5. CONCLUSION

In [7] a wide generalization of the standard APEX has been presented but, to our knowledge, special attention has not been paid to its particularizations, nor have tests been performed in order to discover their features; hence we believe this paper points out some new issues and contains new contributions. Extension of the new method to the complex-valued case is currently under investigation.

6. REFERENCES

[1] P.F. Baldi and K. Hornik, "Learning in linear neural networks: A survey", IEEE Trans. on Neural Networks, Vol. 6, No. 4, July 1995
[2] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing, J. Wiley Ltd., 1993
[3] P. Comon, "Independent Component Analysis, a new concept?", Signal Processing, Vol. 36, 1994
[4] J. Karhunen, "Optimization criteria and nonlinear PCA neural networks", Proc. of the International Joint Conference on Neural Networks (IJCNN), 1994
[5] S.Y. Kung, "Constrained Principal Component Analysis via an orthogonal learning network", Proc. of the International Symposium on Circuits and Systems (ISCAS), 1990
[6] S.Y. Kung and K.I. Diamantaras, "A network learning algorithm for adaptive principal component extraction", Proc. of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1990
[7] S.Y. Kung and K.I. Diamantaras, Principal Component Neural Networks: Theory and Applications, J. Wiley, 1996
[8] T.D. Sanger, "Optimal unsupervised learning in a single-layer neural network", Neural Networks, Vol. 2, 1989
[9] L. Xu, "Theories for unsupervised learning: PCA and its nonlinear extension", Proc. of the International Joint Conference on Neural Networks (IJCNN), 1994
More informationUsing EM To Estimate A Probablity Density With A Mixture Of Gaussians
Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points
More informationMSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE
Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationThe Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Parameters
journal of ultivariate analysis 58, 96106 (1996) article no. 0041 The Distribution of the Covariance Matrix for a Subset of Elliptical Distributions with Extension to Two Kurtosis Paraeters H. S. Steyn
More informationDesign of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding
IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas
More informationFigure 1: Equivalent electric (RC) circuit of a neurons membrane
Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of
More informationExtension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels
Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationDERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS
DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS N. van Erp and P. van Gelder Structural Hydraulic and Probabilistic Design, TU Delft Delft, The Netherlands Abstract. In probles of odel coparison
More informationLeonardo R. Bachega*, Student Member, IEEE, Srikanth Hariharan, Student Member, IEEE Charles A. Bouman, Fellow, IEEE, and Ness Shroff, Fellow, IEEE
Distributed Signal Decorrelation and Detection in Sensor Networks Using the Vector Sparse Matrix Transfor Leonardo R Bachega*, Student Meber, I, Srikanth Hariharan, Student Meber, I Charles A Bouan, Fellow,
More informationOPTIMIZATION in multi-agent networks has attracted
Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract
More informationMultivariate Methods. Matlab Example. Principal Components Analysis -- PCA
Multivariate Methos Xiaoun Qi Principal Coponents Analysis -- PCA he PCA etho generates a new set of variables, calle principal coponents Each principal coponent is a linear cobination of the original
More informationFairness via priority scheduling
Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation
More informationLecture 21. Interior Point Methods Setup and Algorithm
Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and
More informationAn Improved Particle Filter with Applications in Ballistic Target Tracking
Sensors & ransducers Vol. 72 Issue 6 June 204 pp. 96-20 Sensors & ransducers 204 by IFSA Publishing S. L. http://www.sensorsportal.co An Iproved Particle Filter with Applications in Ballistic arget racing
More informationRESTARTED FULL ORTHOGONALIZATION METHOD FOR SHIFTED LINEAR SYSTEMS
BIT Nuerical Matheatics 43: 459 466, 2003. 2003 Kluwer Acadeic Publishers. Printed in The Netherlands 459 RESTARTED FULL ORTHOGONALIZATION METHOD FOR SHIFTED LINEAR SYSTEMS V. SIMONCINI Dipartiento di
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationWeighted- 1 minimization with multiple weighting sets
Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University
More informationProbability Distributions
Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples
More informationProc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES
Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co
More informationREDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION
ISSN 139 14X INFORMATION TECHNOLOGY AND CONTROL, 008, Vol.37, No.3 REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION Riantas Barauskas, Vidantas Riavičius Departent of Syste Analysis, Kaunas
More informationKeywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution
Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality
More informationUpper and Lower Bounds on the Capacity of Wireless Optical Intensity Channels
ISIT7, Nice, France, June 4 June 9, 7 Upper and Lower Bounds on the Capacity of Wireless Optical Intensity Channels Ahed A. Farid and Steve Hranilovic Dept. Electrical and Coputer Engineering McMaster
More informationNonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy
Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo
More informationCompression and Predictive Distributions for Large Alphabet i.i.d and Markov models
2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511
More informationVulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links
Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp
More informationOBJECTIVES INTRODUCTION
M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and
More informationReed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.
Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.
More informationarxiv: v1 [stat.ot] 7 Jul 2010
Hotelling s test for highly correlated data P. Bubeliny e-ail: bubeliny@karlin.ff.cuni.cz Charles University, Faculty of Matheatics and Physics, KPMS, Sokolovska 83, Prague, Czech Republic, 8675. arxiv:007.094v
More informationCONTROL SYSTEMS, ROBOTICS, AND AUTOMATION Vol. IX Uncertainty Models For Robustness Analysis - A. Garulli, A. Tesi and A. Vicino
UNCERTAINTY MODELS FOR ROBUSTNESS ANALYSIS A. Garulli Dipartiento di Ingegneria dell Inforazione, Università di Siena, Italy A. Tesi Dipartiento di Sistei e Inforatica, Università di Firenze, Italy A.
More informationVariations on Backpropagation
2 Variations on Backpropagation 2 Variations Heuristic Modifications Moentu Variable Learning Rate Standard Nuerical Optiization Conjugate Gradient Newton s Method (Levenberg-Marquardt) 2 2 Perforance
More informationAn Adaptive UKF Algorithm for the State and Parameter Estimations of a Mobile Robot
Vol. 34, No. 1 ACTA AUTOMATICA SINICA January, 2008 An Adaptive UKF Algorith for the State and Paraeter Estiations of a Mobile Robot SONG Qi 1, 2 HAN Jian-Da 1 Abstract For iproving the estiation accuracy
More informationarxiv: v1 [math.na] 10 Oct 2016
GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble
More informationMAXIMUM LIKELIHOOD BASED TECHNIQUES FOR BLIND SOURCE SEPARATION AND APPROXIMATE JOINT DIAGONALIZATION
BEN-GURION UNIVERSITY OF TE NEGEV FACULTY OF ENGINEERING SCIENCE DEPARTENT OF ELECTRICAL AND COPUTER ENGINEERING AXIU LIKELIOOD BASED TECNIQUES FOR BLIND SOURCE SEPARATION AND APPROXIATE JOINT DIAGONALIZATION
More information