Finite State Automata that Recurrent Cascade-Correlation Cannot Represent

Stefan C. Kremer
Department of Computing Science
University of Alberta
Edmonton, Alberta, CANADA T6H 5B5

Abstract

This paper relates the computational power of Fahlman's Recurrent Cascade Correlation (RCC) architecture to that of finite state automata (FSA). While some recurrent networks are FSA equivalent, RCC is not. The paper presents a theoretical analysis of the RCC architecture in the form of a proof describing a large class of FSA which cannot be realized by RCC.

1 INTRODUCTION

Recurrent networks can be considered to be defined by two components: a network architecture, and a learning rule. The former describes how a network with a given set of weights and topology computes its output values, while the latter describes how the weights (and possibly topology) of the network are updated to fit a specific problem. It is possible to evaluate the computational power of a network architecture by analyzing the types of computations a network could perform assuming appropriate connection weights (and topology). This type of analysis provides an upper bound on what a network can be expected to learn, since no system can learn what it cannot represent.

Many recurrent network architectures have been proven to be finite state automaton or even Turing machine equivalent (see for example [Alon, 1991], [Goudreau, 1994], [Kremer, 1995], and [Siegelmann, 1992]). The existence of such equivalence proofs naturally gives confidence in the use of the given architectures.

This paper relates the computational power of Fahlman's Recurrent Cascade Correlation architecture [Fahlman, 1991] to that of finite state automata. It is organized as follows: Section 2 reviews the RCC architecture as proposed by Fahlman. Section 3 describes finite state automata in general and presents some specific automata which will play an important role in the discussions which follow. Section 4 describes previous work by other authors evaluating RCC's computational power. Section 5 expands upon the previous work, and presents a new class of automata which cannot be represented by RCC. Section 6 further expands the result of the previous section to identify an infinite number of other unrealizable classes of automata. Section 7 contains some concluding remarks.

2 THE RCC ARCHITECTURE

The RCC architecture consists of three types of units: input units, hidden units and output units. After training, an RCC network performs the following computation: First, the activation values of the hidden units are initialized to zero. Second, the input unit activation values are initialized based upon the input signal to the network. Third, each hidden unit computes its new activation value. Fourth, the output units compute their new activations. Then, steps two through four are repeated for each new input signal.

The third step of the computation, computing the activation value of a hidden unit, is accomplished according to the formula:

$$a_j(t+1) = \sigma\left( \sum_{i=1}^{j-1} W_{ij}\, a_i(t+1) + W_{jj}\, a_j(t) \right).$$

Here, $a_i(t)$ represents the activation value of unit $i$ at time $t$, $\sigma(\cdot)$ represents a sigmoid squashing function with finite range (usually from 0 to 1), and $W_{ij}$ represents the weight of the connection from unit $i$ to unit $j$. That is, each unit computes its activation value by multiplying the new activations of all lower numbered units and its own previous activation by a set of weights, summing these products, and passing the sum through a logistic activation function. The recurrent weight $W_{jj}$ from a unit to itself functions as a sort of memory by transmitting a modulated version of the unit's old activation value.

The output units of the RCC architecture can be viewed as special cases of hidden units which have weights of value zero for all connections originating from other output units. This interpretation implies that any restrictions on the computational powers of general hidden units will also apply to the output units.
For this reason, we shall concern ourselves exclusively with hidden units in the discussions which follow.

Finally, it should be noted that since this paper is about the representational power of the RCC architecture, its associated learning rule will not be discussed here. The reader wishing to know more about the learning rule, or requiring a more detailed description of the operation of the RCC architecture, is referred to [Fahlman, 1991].

3 FINITE STATE AUTOMATA

A Finite State Automaton (FSA) [Hopcroft, 1979] is a formal computing machine defined by a 5-tuple $M=(Q,\Sigma,\delta,q_0,F)$, where $Q$ represents a finite set of states, $\Sigma$ a finite input alphabet, $\delta$ a state transition function mapping $Q\times\Sigma$ to $Q$, $q_0\in Q$ the initial state, and $F\subseteq Q$ a set of final or accepting states. FSA accept or reject strings of input symbols according to the following computation: First, the FSA's current state is initialized to $q_0$. Second, the next input symbol of the string, selected from $\Sigma$, is presented to the automaton by the outside world. Third, the transition function, $\delta$, is used to compute the FSA's new state based upon the input symbol, and the FSA's previous state. Fourth, the acceptability of the string is computed by comparing the current FSA state to the set of valid final states, $F$. If the current state is a member of $F$ then the automaton is said to accept the string of input symbols presented so far. Steps two through four are repeated for each input symbol presented by the outside world. Note that the steps of this computation mirror the steps of an RCC network's computation as described above.

It is often useful to describe specific automata by means of a transition diagram [Hopcroft, 1979]. Figure 1 depicts the transition diagrams of five FSA. In each case, the states, $Q$, are depicted by circles, while the transitions defined by $\delta$ are represented as arrows from the old state to the new state labelled with the appropriate input symbol. The arrow labelled "Start" indicates the initial state, $q_0$; and final accepting states are indicated by double circles.

We now define some terms describing particular FSA which we will require for the following proof. The first concerns input signals which oscillate. Intuitively, the input signal to an FSA oscillates if every $p$th symbol is repeated for $p>1$. More formally, a sequence of input symbols, $s(t), s(t+1), s(t+2), \ldots$, oscillates with a period of $p$ if and only if $p$ is the minimum value such that:

$$\forall t \quad s(t) = s(t+p).$$

Our second definition concerns oscillations of an FSA's internal state, when the machine is presented a certain sequence of input signals. Intuitively, an FSA's internal state can oscillate in response to a given input sequence if there is some starting state for which every subsequent $\omega$th state is repeated. Formally, an FSA's state can oscillate with a period of $\omega$ in response to a sequence of input symbols, $s(t), s(t+1), s(t+2), \ldots$, if and only if $\omega$ is the minimum value for which:

$$\exists q_0 \ \text{s.t.}\ \forall t \quad \delta(q_0, s(t)) = \delta(\ldots\delta(\delta(\delta(q_0, s(t)), s(t+1)), s(t+2)), \ldots, s(t+\omega)).$$

The recursive nature of this formulation is based on the fact that an FSA's state depends on its previous state, which in turn depends on the state before, etc.

We can now apply these two definitions to the FSA displayed in Figure 1. The automaton labelled "a)" has a state which oscillates with a period of $\omega=2$ in response to any sequence consisting of 0s and 1s (e.g. "00000...", "11111...", "010101...", etc.). Thus, we can say that it has a state cycle of period $\omega=2$ (i.e. $q_0q_1q_0q_1\ldots$), when its input cycles with a period of $p=1$ (i.e. "0000..."). Similarly, when automaton b)'s input cycles with period $p=1$ (i.e. "000000..."), its state will cycle with period $\omega=3$ (i.e. $q_0q_1q_2q_0q_1q_2\ldots$). For automaton c), things are somewhat more complicated.
When the input is the sequence "0000...", the state sequence will either be $q_0q_0q_0q_0\ldots$ or $q_1q_1q_1q_1\ldots$ depending on the initial state. On the other hand, when the input is the sequence "1111...", the state sequence will alternate between $q_0$ and $q_1$. Thus, we say that automaton c) has a state cycle of $\omega=2$ when its input cycles with period $p=1$. But, this automaton can also have larger state cycles. For example, when the input oscillates with a period $p=2$ (i.e. "01010101..."), then the state of the automaton will oscillate with a period $\omega=4$ (i.e. $q_0q_0q_1q_1q_0q_0q_1q_1\ldots$). Thus, we can also say that automaton c) has a state cycle of $\omega=4$ when its input cycles with period $p=2$. The remaining automata also have state cycles for various input cycles, but will not be discussed in detail. The importance of the relationship between the input period ($p$) and the state period ($\omega$) will become clear shortly.

4 PREVIOUS RESULTS CONCERNING THE COMPUTATIONAL POWER OF RCC

The first investigation into the computational powers of RCC was performed by Giles et al. [Giles, 1995]. These authors proved that the RCC architecture, regardless of connection weights and number of hidden units, is incapable of representing any FSA which "for the same input has an output period greater than 2" (p. 7). Using our oscillation definitions above, we can re-express this result as: if an FSA's input oscillates with a period of $p=1$ (i.e. input is constant), then its state can oscillate with a period of at most $\omega=2$. As already noted, Figure 1b) represents an FSA whose state oscillates with a period of $\omega=3$ in response to an input which oscillates with a period of $p=1$. Thus, Giles et al.'s theorem proves that the automaton in Figure 1b) cannot be implemented (and hence learned) by an RCC network.
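The oscillation definitions of Section 3 can be checked mechanically on small automata. The following is a minimal sketch, not part of the original paper: the function name `state_period` and the three transition tables are illustrative assumptions. Because Figure 1 is not reproduced here, the tables are inferred only from the behaviour the text describes (a) toggles on every symbol, b) advances through three states on every symbol, c) keeps its state on "0" and toggles on "1"). The function also measures the period the state sequence eventually settles into from one given start state, which is a simplification of the existential definition above.

```python
def state_period(delta, start, input_cycle):
    """Return the eventual state period (omega) of an automaton.

    delta       -- dict mapping (state, symbol) -> next state
    start       -- starting state
    input_cycle -- one period of the input, e.g. "01" for "010101..."
    """
    p = len(input_cycle)
    seen = {}      # (state, input phase) -> time at which the pair first occurred
    states = []    # state held at each time step
    state, t = start, 0
    # The pair (state, phase) lives in a finite set, so it must repeat.
    while (state, t % p) not in seen:
        seen[(state, t % p)] = t
        states.append(state)
        state = delta[(state, input_cycle[t % p])]
        t += 1
    loop = states[seen[(state, t % p)]:]   # the part that repeats forever
    n = len(loop)
    # The minimal period of a repeated block of length n divides n.
    for omega in range(1, n + 1):
        if n % omega == 0 and all(loop[i] == loop[(i + omega) % n] for i in range(n)):
            return omega


# Assumed transition tables (inferred from the text, not copied from Figure 1).
DELTA_A = {(0, "0"): 1, (0, "1"): 1, (1, "0"): 0, (1, "1"): 0}
DELTA_B = {(q, s): (q + 1) % 3 for q in range(3) for s in "01"}
DELTA_C = {(0, "0"): 0, (1, "0"): 1, (0, "1"): 1, (1, "1"): 0}

print(state_period(DELTA_A, 0, "0"))    # constant input, automaton a)
print(state_period(DELTA_B, 0, "0"))    # constant input, automaton b)
print(state_period(DELTA_C, 0, "01"))   # alternating input, automaton c)
```

Under these assumed tables, the reported periods can be compared directly with the values of $\omega$ quoted in the text for each input period $p$.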

[Figure 1: Finite State Automata. Panels a) through e) show the transition diagrams of five automata, each with an arrow labelled "Start" marking the initial state.]

Giles et al. also examined the automata depicted in Figures 1a) and 1c). However, unlike the formal result concerning FSA b), the authors' conclusions about these two automata were of an empirical nature. In particular, the authors noted that while automata which oscillated with a period of 2 under constant input (i.e. Figure 1a)) were realizable, the automaton of 1c) appeared not to be realizable by RCC. Giles et al. could not account for this last observation by a formal proof.
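The constant-input result can be illustrated numerically. The sketch below simulates the hidden-unit update rule of Section 2 for a small cascade of two hidden units driven by a constant input, then measures the period each activation settles into. It is an illustration under stated assumptions, not the authors' experiment: the weight values, the helper names (`rcc_step`, `run`, `period`), the burn-in length and the numerical tolerance are all hypothetical choices, and a simulation of one weight setting demonstrates the bound without proving it.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rcc_step(weights, inputs, hidden_prev):
    """One time step of the hidden units of an RCC-style cascade.

    weights[j] = (w_in, w_lower, w_self): weights from the input units,
    from the lower numbered hidden units, and the unit's recurrent weight.
    """
    hidden_new = []
    for j, (w_in, w_lower, w_self) in enumerate(weights):
        total = sum(w * x for w, x in zip(w_in, inputs))
        # Lower numbered units contribute their NEW activations (time t+1).
        total += sum(w * a for w, a in zip(w_lower, hidden_new))
        # The unit's own OLD activation (time t) enters via the recurrent weight.
        total += w_self * hidden_prev[j]
        hidden_new.append(sigmoid(total))
    return hidden_new


def run(weights, input_seq):
    hidden = [0.0] * len(weights)   # hidden activations start at zero
    trace = []
    for x in input_seq:
        hidden = rcc_step(weights, x, hidden)
        trace.append(hidden)
    return trace


def period(values, max_period=8, tol=1e-9):
    """Smallest shift under which the sequence repeats (within tol), or None."""
    for w in range(1, max_period + 1):
        if all(abs(values[t] - values[t + w]) < tol for t in range(len(values) - w)):
            return w
    return None


# Hypothetical weights: both units have negative recurrent weights.
WEIGHTS = [([5.0], [], -10.0), ([1.0], [4.0], -6.0)]

trace = run(WEIGHTS, [[1.0]] * 400)   # constant input, i.e. p = 1
tail = trace[200:]                    # discard the transient
periods = [period([h[j] for h in tail]) for j in range(len(WEIGHTS))]
print(periods)
```

With these particular weights neither unit settles into a cycle longer than 2, which is consistent with the limit stated by Giles et al. for constant input.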

5 AUTOMATA WITH CYCLES UNDER ALTERNATING INPUT

We now turn our attention to the question: why is an RCC network unable to learn the automaton of 1c)? We answer this question by considering what would happen if 1c) were realizable. In particular, suppose that the input units of an RCC network which implements automaton 1c) are replaced by the hidden units of an RCC network implementing 1a). In this situation, the hidden units of 1a) will oscillate with a period of 2 under constant input. But if the inputs to 1c) oscillate with a period of 2, then the state of 1c) will oscillate with a period of 4. Thus, the combined network's state would oscillate with a period of four under constant input. Furthermore, the cascaded connectivity scheme of the RCC architecture implies that a network constructed by treating one network's hidden units as the input units of another would not violate any of the connectivity constraints of RCC. In other words, if RCC could implement the automaton of 1c), then it would also be able to implement a network which oscillates with a period of 4 under constant input. Since Giles et al. proved that the latter cannot be the case, it must also be the case that RCC cannot implement the automaton of 1c).

The line of reasoning used here to prove that the FSA of Figure 1c) is unrealizable can also be applied to many other automata. In fact, any automaton whose state oscillates with a period of more than 2 under input which oscillates with a period of 2 could be used to construct one of the automata proven unrealizable by Giles et al. This implies that RCC cannot implement any automaton whose state oscillates with a period greater than $\omega=2$ when its input oscillates with a period of $p=2$.

6 AUTOMATA WITH CYCLES UNDER OSCILLATING INPUT

Giles et al.'s theorem can be viewed as defining a class of automata which cannot be implemented by the RCC architecture. The proof in Section 5 adds another class of automata which also cannot be realized. More precisely, the two proofs concern inputs which oscillate with periods of one and two respectively.
It is natural to ask whether further proofs for state cycles can be developed when the input oscillates with a period of greater than two. We now present the central theorem of this paper, a unified definition of unrealizable automata:

Theorem: If the input signal to an RCC network oscillates with a period, $p$, then the network can represent only those FSA whose outputs form cycles of length $\omega$, where $p \bmod \omega = 0$ if $p$ is even and $2p \bmod \omega = 0$ if $p$ is odd.

To prove this theorem we will first need to prove a simpler one relating the rate of oscillation of the input signal to one node in an RCC network to the rate of oscillation of that node's output signal. By "the input signal to one node" we mean the weighted sum of all activations of all connected nodes (i.e. all input nodes, and all lower numbered hidden nodes), but not the recurrent signal. I.e.:

$$A_j(t+1) \equiv \sum_{i=1}^{j-1} W_{ij}\, a_i(t+1).$$

Using this definition, it is possible to rewrite the equation to compute the activation of node $j$ (given in Section 2) as:

$$a_j(t+1) \equiv \sigma\left( A_j(t+1) + W_{jj}\, a_j(t) \right).$$

But if we assume that the input signal oscillates with a period of $p$, then every value of $A_j(t+1)$ can be replaced by one of a finite number of input signals ($A_0, A_1, A_2, \ldots, A_{p-1}$). In other words, $A_j(t+1) = A_{t \bmod p}$. Using this substitution, it is possible to repeatedly expand the addend of the previous equation to derive the formula:

$$a_j(t+1) = \sigma( A_{t \bmod p} + W_{jj}\, \sigma( A_{(t-1) \bmod p} + W_{jj}\, \sigma( A_{(t-2) \bmod p} + W_{jj} \cdots \sigma( A_{(t-p+1) \bmod p} + W_{jj}\, a_j(t-p+1) ) \cdots ) ) ).$$

The unravelling of the recursive equation now allows us to examine the relationship between $a_j(t+1)$ and $a_j(t-p+1)$. Specifically, we note that if $W_{jj} > 0$ or if $p$ is even then $a_j(t+1) = f(a_j(t-p+1))$ implies that $f$ is a monotonically increasing function. Furthermore, since $\sigma$ is a function with finite range, $f$ must also have finite range. It is well known that for any monotonically increasing function with finite range, $f$, the sequence $f(x), f(f(x)), f(f(f(x))), \ldots$ is guaranteed to monotonically approach a fixed point (where $f(x)=x$). This implies that the sequence $a_j(t+1), a_j(t+p+1), a_j(t+2p+1), \ldots$ must also monotonically approach a fixed point (where $a_j(t+1) = a_j(t-p+1)$). In other words, the sequence does not oscillate. Since every $p$th value of $a_j(t)$ approaches a fixed point, the sequence $a_j(t), a_j(t+1), a_j(t+2), \ldots$ can have a period of at most $p$, and must have a period which divides $p$ evenly. We state this as our first lemma:

Lemma 1: If $A_j(t)$ oscillates with even period, $p$, or if $W_{jj} > 0$, then state unit $j$'s activation value must oscillate with a period $\omega$, where $p \bmod \omega = 0$.

We must now consider the case where $W_{jj} < 0$ and $p$ is odd. In this case, $a_j(t+1) = f(a_j(t-p+1))$ implies that $f$ is a monotonically decreasing function. But, in this situation the function $f^2(x)=f(f(x))$ must be monotonically increasing with finite range. This implies that the sequence $a_j(t+1), a_j(t+2p+1), a_j(t+4p+1), \ldots$ must monotonically approach a fixed point (where $a_j(t+1)=a_j(t-2p+1)$). This in turn implies that the sequence $a_j(t), a_j(t+1), a_j(t+2), \ldots$ can have a period of at most $2p$, and must have a period which divides $2p$ evenly. Once again, we state this result in a lemma:

Lemma 2: If $A_j(t)$ oscillates with odd period $p$, and if $W_{jj} < 0$, then state unit $j$ must oscillate with a period $\omega$, where $2p \bmod \omega = 0$.

Lemmas 1 and 2 relate the rate of oscillation of the weighted sum of input signals and lower numbered unit activations, $A_j(t)$, to that of unit $j$.
However, the theorem which we wish to prove relates the rate of oscillation of only the RCC network's input signal to that of the entire set of hidden unit activations. To prove the theorem, we use a proof by induction on the unit number, $i$:

Basis: Node $i=1$ is connected only to the network inputs. Therefore, if the input signal oscillates with period $p$, then node $i$ can only oscillate with period $\omega$, where $p \bmod \omega = 0$ if $p$ is even and $2p \bmod \omega = 0$ if $p$ is odd. (This follows from Lemmas 1 and 2.)

Assumption: If the input signal to the network oscillates with period $p$, then node $i$ can only oscillate with period $\omega$, where $p \bmod \omega = 0$ if $p$ is even and $2p \bmod \omega = 0$ if $p$ is odd.

Proof: If the Assumption holds for all nodes $i$, then Lemmas 1 and 2 imply that it must also hold for node $i+1$. □

This proves the theorem:

Theorem: If the input signal to an RCC network oscillates with a period, $p$, then the network can represent only those FSA whose outputs form cycles of length $\omega$, where $p \bmod \omega = 0$ if $p$ is even and $2p \bmod \omega = 0$ if $p$ is odd.

7 CONCLUSIONS

It is interesting to note that both Giles et al.'s original proof and the constructive proof by contradiction described in Section 5 are special cases of the theorem. Specifically, Giles et al.'s original proof concerns input cycles of length $p=1$. Applying the theorem of Section 6 proves that an RCC network can only represent those FSA whose state transitions form cycles of length $\omega$, where $2(1) \bmod \omega = 0$, implying that the state cannot oscillate with a period of greater than 2. This is exactly what Giles et al. concluded, and proves that (among others) the automaton of Figure 1b) cannot be implemented by RCC.
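The divisibility condition of the theorem is easy to tabulate. The following minimal sketch lists the state cycle lengths the theorem permits for a given input period and checks the input/state period pairs quoted in this paper for automata b), c) and d) of Figure 1. The function names are illustrative assumptions, not notation from the paper.

```python
def allowed_state_periods(p):
    """State cycle lengths omega permitted by the theorem for input period p.

    omega must divide p when p is even, and must divide 2p when p is odd.
    """
    bound = p if p % 2 == 0 else 2 * p
    return [omega for omega in range(1, bound + 1) if bound % omega == 0]


def violates_theorem(p, omega):
    """True if a state cycle of length omega under input period p rules out RCC."""
    return omega not in allowed_state_periods(p)


for p in range(1, 6):
    print(p, allowed_state_periods(p))

# Input/state period pairs quoted in the text.
print(violates_theorem(1, 3))   # automaton b): omega = 3 under p = 1
print(violates_theorem(2, 4))   # automaton c): omega = 4 under p = 2
print(violates_theorem(3, 9))   # automaton d): omega = 9 under p = 3
```

A pair that does not violate the condition shows only that the theorem is silent about that automaton, not that RCC can represent it.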

Similarly, the proof of Section 5 concerns input cycles of length $p=2$. Applying our theorem proves that an RCC network can only represent those machines whose state transitions form cycles of length $\omega$, where $(2) \bmod \omega = 0$. This again implies that the state cannot oscillate with a period greater than 2, which is exactly what was proven in Section 5. This proves that the automaton of Figure 1c) (among others) cannot be implemented by RCC.

In addition to unifying both the results of Giles et al. and Section 5, the theorem of Section 6 also accounts for many other FSA which are not representable by RCC. In fact, the theorem identifies an infinite number of other classes of non-representable FSA (for $p=3$, $p=4$, $p=5$, ...). Each class itself of course contains an infinite number of machines. Careful examination of the automaton illustrated in Figure 1d) reveals that it contains a state cycle of length 9 ($q_0q_1q_2q_1q_2q_3q_2q_3q_4q_0q_1q_2q_1q_2q_3q_2q_3q_4\ldots$) in response to an input cycle of length 3 ("001001..."). Since this is not one of the allowable input/state cycle relationships defined by the theorem, it can be concluded that the automaton of Figure 1d) (among others) cannot be represented by RCC.

Finally, it should be noted that it remains unknown if the classes identified by this paper's theorem represent the complete extent of RCC's computational limitations. Consider for example the automaton of Figure 1e). This device has no input/state cycles which violate the theorem, thus we cannot conclude that it is unrepresentable by RCC. Of course, the issue of whether or not this particular automaton is representable is of little interest. However, the class of automata to which the theorem does not apply, which includes automaton 1e), requires further investigation. Perhaps all automata in this class are representable; perhaps there are other subclasses (not identified by the theorem) which RCC cannot represent. This issue will be addressed in future work.

References

N. Alon, A. Dewdney, and T. Ott, Efficient simulation of finite automata by neural nets, Journal of the Association for Computing Machinery, 38 (2) (1991) 495-514.

S. Fahlman, The recurrent cascade-correlation architecture, in: R. Lippmann, J. Moody and D. Touretzky, Eds., Advances in Neural Information Processing Systems 3 (Morgan Kaufmann, San Mateo, CA, 1991) 190-196.

C.L. Giles, D. Chen, G.Z. Sun, H.H. Chen, Y.C. Lee, and M.W. Goudreau, Constructive Learning of Recurrent Neural Networks: Limitations of Recurrent Cascade Correlation and a Simple Solution, IEEE Transactions on Neural Networks, 6 (4) (1995) 829-836.

M. Goudreau, C. Giles, S. Chakradhar, and D. Chen, First-order v.s. second-order single layer recurrent neural networks, IEEE Transactions on Neural Networks, 5 (3) (1994) 511-513.

J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages and Computation (Addison-Wesley, Reading, MA, 1979).

S.C. Kremer, On the Computational Power of Elman-style Recurrent Networks, IEEE Transactions on Neural Networks, 6 (4) (1995) 1000-1004.

H.T. Siegelmann and E.D. Sontag, On the Computational Power of Neural Nets, in: Proceedings of the Fifth ACM Workshop on Computational Learning Theory, (ACM, New York, NY, 1992) 440-449.