Turing Machines with Finite Memory

Wilson Rosa de Oliveira
Departamento de Física e Matemática, UFRPE, Rua Dom Manoel de Medeiros, s/n, CEP 52171-030, Recife - PE, Brazil
wrdo@cin.ufpe.br

Marcílio C. P. de Souto & Teresa B. Ludermir
Centro de Informática - UFPE, Caixa Postal 7051, CEP 50732-970, Recife - PE, Brazil
{mcps,tbl}@cin.ufpe.br

Abstract

Motivated by our earlier work on Turing computability via neural networks [4, 3] and by the results of Maass et al. [14, 15] on the limits of what can actually be computed by neural networks when noise (or limited precision of the weights) is considered, we introduce the notion of a Definite Turing Machine (DTM) and investigate some of its properties. We associate to every Turing Machine (TM) a Finite State Automaton (FSA) responsible for the next state and action (output and head movement). A DTM is a TM whose FSA is definite. An FSA is definite (of order $k$) if its present state can be uniquely determined by the last $k$ inputs. We show that DTMs are strictly less powerful than TMs but are capable of computing all simple functions ([1]). The corresponding notion of a finite-memory Turing machine is shown to be computationally equivalent to the ordinary Turing machine.

1. Introduction

Faced with the problem of determining the computing power of weighted neural networks and of relating them to conventional models of computing, researchers have come up with two main approaches: an infinite number of neurons [7, 10, 21], or a finite number of neurons but with unbounded (rational) weights [19, 17]. Both have in common the assumption that the TM tape, an infinite resource, should be represented inside the NN, and that accounts for the need for an infinite number of neurons or for the unbounded weights mentioned above. While the first approach is not biologically plausible, the second is, due to the limits on measurements, not physically plausible. When some form of imprecision or noise due to physical, biological or technological limitations is considered, the computing power of analog NNs decreases considerably [14, 15].

We have on previous occasions [4, 3] argued against these approaches by claiming that the infinite tape of a Turing machine is an external, non-intrinsic feature, which should not be internalized. In Turing's own analysis [22], the tape size is merely a mathematical convenience. In fact, the tape is meant as an auxiliary memory for calculations by the machine, a counterpart to a sheet of paper for humans. The idea is that

Turing machine = finite state control + infinite tape

is a mathematical abstraction of

human calculator = mind + sheets of paper.

We proceed then to simulate a TM with a Discrete-Time Recurrent Neural Network (DTRNN), leaving the TM tape outside the network and regarding it as the environment with which the network interacts. We can take two views. In one view, our neural model can be seen as composed of four main modules: recognition, control, writing and movement modules. The first module recognizes the content of a tape cell. The result of this recognition process is then sent to the control module, which simulates the TM control (the "mind"). The output of the control module is passed on to: (1) the writing module, which writes the symbol on the tape; and (2) the movement module, which moves the TM's head left or right. In another, more abstract, view we regard our system as a sort of Neural Turing Machine, i.e. a Turing Machine in which the state control is governed by a neural network.
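To make the agent-environment view concrete, here is a minimal sketch (ours, not the paper's; a plain Python function stands in for the DTRNN control, and a dictionary stands in for the tape) of the interaction loop among the four modules:

```python
def run_neural_tm(control_step, tape_symbols, blank='B', start_state=0,
                  halt_states=frozenset()):
    """Simulate a TM whose finite control is an external black box.

    `control_step(state, symbol) -> (new_state, write_symbol, move)`
    plays the role of the recognition + control modules; in the paper
    this box would be the DTRNN.  The tape is kept outside, as the
    environment with which the control interacts.
    """
    tape = dict(enumerate(tape_symbols))      # sparse, unbounded tape
    head, state = 0, start_state
    while state not in halt_states:
        symbol = tape.get(head, blank)        # recognition: read a cell
        state, write, move = control_step(state, symbol)   # control ("mind")
        tape[head] = write                    # writing module
        head += {'L': -1, 'R': 1, 'N': 0}[move]            # movement module
    return tape, head, state
```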
No matter the view, the simulation amounts to actually implementing the Finite State Automaton (FSA) (more precisely, a sequential Mealy machine [12]) responsible for the next state and action (output and movement) of the TM. After the results in [15], which indicate that in the presence of noise a NN can at most implement definite automata, the natural question to ask is: what is the computing power of

a TM with definite control? So we define and investigate the properties of these Definite Turing Machines and the related notion of Finite-Memory Turing Machines. For a more detailed critique of current implementations of TMs in neural networks, we refer the reader to [3].

2. Main Definitions

Below we start by formalizing the intuitive notion of a Turing machine depicted above. Then we define the notion of a Mealy machine, which is a finite state automaton with an output function. Finally, we define the kind of neural network we use: the first-order recurrent neural network whose units have the logistic function as activation function [2]. Next we show how our simulation works, and then we present the notion of Definite Turing Machines.

Definition 1 A Turing Machine over a finite alphabet $\Sigma$ is a quintuple $M = (Q, \Sigma, D, \delta, q_0)$ (the subscript $M$ is used when necessary) where: (1) $Q$ is a finite set of states with a distinguished (initial) state $q_0$ (usually $0$); (2) $\delta : Q \times \Sigma \to Q \times \Sigma \times D$ is a function, where $D = \{L, R, N\}$ is the set of possible head movements. $\delta$ is to be thought of as a finite set of instructions: $\delta(q, s) = (q', s', d)$ means that the machine, being in state $q$ and reading the symbol $s$ from the current cell of the tape, will take the following actions: erase $s$ from the cell and write $s'$ in its place; change the internal state from $q$ to $q'$; and move the head position one cell to the left ($d = L$), one to the right ($d = R$), or not move it at all ($d = N$).

Definition 2 A Mealy machine is a six-tuple $M = (Q, \Sigma, \Delta, \delta, \lambda, q_0)$ where: (1) $Q$ is a finite set of states; (2) $\Sigma$ is a finite input alphabet; (3) $\Delta$ is a finite output alphabet; (4) $\delta : Q \times \Sigma \to Q$ is the next state function; $\delta(q, s) = q'$ should be interpreted as: the machine, being in state $q$ and reading symbol $s$, goes to state $q'$; (5) $\lambda : Q \times \Sigma \to \Delta$ is the output function; $\lambda(q, s) = o$ should be interpreted as: the machine, being in state $q$ and reading symbol $s$, writes symbol $o$; (6) $q_0 \in Q$ is the initial state, where the machine is before the first symbol is read.

Remark 1 A pair $(Q, \delta)$ is usually called a transition system (over $\Sigma$). We sometimes refer to the transition system of a Mealy machine.

Throughout, let $g$ be the logistic map $g(x) = 1/(1 + e^{-x})$. In what follows, the superscripts in the weights indicate the computation involved: for example, the $xu$ in $W^{xu}$ indicates that the weight is used to compute a state ($x$) from an input ($u$); the $x$ in $b^x$ (a bias) indicates that it is used to compute a state. So $u$, $x$, $y$ and $z$ in the sub- and superscripts mean, respectively, input, state, output and hidden output layer.

Definition 3 A discrete-time recurrent neural network, or simply a neural network, is a six-tuple $N = ([0,1]^{n_x}, U, Y, f, h, x[0])$ where: (1) $[0,1]^{n_x}$, the $n_x$-dimensional unit cube, is the state space of the network and $n_x$ is the number of units, or neurons; (2) $U \subseteq \mathbb{R}^{n_u}$ is the set of possible input vectors, with $\mathbb{R}$ the set of real numbers and $n_u$ the number of input lines; (3) $Y \subseteq \mathbb{R}^{n_y}$ is the set of outputs of the network, with $n_y$ the number of output units; (4) $f : [0,1]^{n_x} \times U \to [0,1]^{n_x}$ is the next state function, which computes a new state $x[t]$ from the previous state $x[t-1]$ and the input $u[t]$ just read; the $i$-th coordinate of $f$ is given by

$x_i[t] = g\big(\textstyle\sum_j W^{xx}_{ij} x_j[t-1] + \sum_j W^{xu}_{ij} u_j[t] + b^x_i\big);$

(5) $h : [0,1]^{n_x} \times U \to Y$ is the output function, which computes a new output $y[t]$ from the previous state and the input just read; the $i$-th coordinate of $h$ is given by¹

$y_i[t] = g\big(\textstyle\sum_j W^{yz}_{ij} z_j[t] + b^y_i\big)$, where $z_j[t] = g\big(\textstyle\sum_k W^{zx}_{jk} x_k[t-1] + \sum_k W^{zu}_{jk} u_k[t] + b^z_j\big);$

(6) finally, $x[0]$ is the initial state of the network, which is simply the value used in the first application of $f$.

¹As is known (Goudreau et al. [9]), networks as above cannot represent the output function of every Mealy machine unless a two-layer scheme is used. Hence we assume $z$ to be the hidden output layer, where $n_z$ is the number of such units.
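Under our reconstruction of the notation above (the matrix shapes and the convention that both layers read the previous state $x[t-1]$ are assumptions, in the style of [2]), one network step of Definition 3 can be sketched as:

```python
import numpy as np

def logistic(v):
    """The activation g of Definition 3."""
    return 1.0 / (1.0 + np.exp(-v))

def dtrnn_step(x_prev, u, W, b):
    """One step of the first-order sigmoid DTRNN.

    W['xx'], W['xu'], b['x'] feed the state layer x; W['zx'], W['zu'],
    b['z'] feed the hidden output layer z required by the two-layer
    output scheme of Goudreau et al. [9]; W['yz'], b['y'] feed y.
    """
    x = logistic(W['xx'] @ x_prev + W['xu'] @ u + b['x'])   # next state f
    z = logistic(W['zx'] @ x_prev + W['zu'] @ u + b['z'])   # hidden output layer
    y = logistic(W['yz'] @ z + b['y'])                      # output h
    return x, y
```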

3. The Implementation

The simulation can easily be grasped by first observing that computations in a Turing machine are actually controlled by a Mealy machine: the transition function $\delta : Q \times \Sigma \to Q \times \Sigma \times D$ can be seen as a next-state function $\delta' : Q \times \Sigma \to Q$ paired with an output function $\lambda : Q \times \Sigma \to \Sigma \times D$; i.e. given a TM $M$ we look at the heart of $M$, which essentially is the Mealy machine $A(M) = (Q, \Sigma, \Sigma \times D, \delta', \lambda, q_0)$, where $\delta'$ and $\lambda$ are defined as above.

Remark 2 We can now speak of the transition system of a Turing machine, meaning the transition system of the corresponding Mealy machine.

We now look at Carrasco et al.'s [2] stable implementation of a Mealy machine in a DTRNN. From the original Mealy machine $M$ they construct a split-state, split-output Mealy machine $M'$, whose states are pairs of a state of $M$ and the input symbol just read (and whose outputs are split analogously). The general construction of $M'$ usually gives rise to inaccessible states and outputs, which can be removed (see Hopcroft and Ullman [11]). Next, following Carrasco et al. [2], we construct a sigmoid DTRNN simulating $M'$: the numbers of state-space units, input lines and output units are fixed by the sizes of $M'$, and the weights and biases of the next-state function, of the hidden output layer and of the output layer are given in terms of a single scaling parameter $H > 0$, as in [2]. To find an appropriate value of $H$, one iterates the logistic map: a few iterations starting with an intermediate value such as $1/2$ suffice for any $H$, and the stable values so obtained are used to find the minimum $H$ for which the encoding is stable.

In Figure 1(a) we give an example of a Turing machine with 6 states and 2 input symbols which multiplies a given input in unary base by two. The corresponding Mealy machine implementing the control of the TM is in Figure 1(b). The split-state, split-output machine obtained from it, shown in Figure 2(a), has states which are unreachable from the initial state 0 and which, when removed, give the machine in Figure 2(b). Using the results in the present paper we find the minimum $H$ and the corresponding number of units for this machine. By repeating these steps, starting now with a (minimal [18]) Universal Turing Machine with 24 states and 2 input symbols (UTM(24,2)), we obtain the analogous values; the resulting Mealy machine is given in [3]. The number of units depends on the amount of unreachable states being removed and obviously cannot be predicted in general, but an upper bound is easily found: the number of units for a Turing machine with $m$ states and $n$ tape symbols is, for fixed $n$, linear in $m$, which in our case (two-symbol TMs) becomes a fixed linear function of $m$. We have thus sketchily proved the following:

Theorem 1 Any Turing machine over $\Sigma$ with $m$ states can be simulated by a sigmoid Turing neural network with $O(m)$ units.
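A sketch of the $\delta$-splitting at the heart of the construction (the dictionary encoding and the example transition table are ours, not the paper's):

```python
def tm_to_mealy(delta):
    """Split the TM table delta: (state, symbol) -> (state', symbol', move)
    into the next-state function δ' and the output function λ of the
    associated Mealy machine A(M)."""
    next_state = {qs: q2 for qs, (q2, s2, d) in delta.items()}
    output = {qs: (s2, d) for qs, (q2, s2, d) in delta.items()}
    return next_state, output

# Example: a two-state machine that skips over 1s and halts on blank.
delta = {(0, '1'): (0, '1', 'R'),
         (0, 'B'): (1, 'B', 'N')}
mealy_delta, mealy_lambda = tm_to_mealy(delta)
```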

Figure 1. (a) A Turing machine which multiplies a given number in unary base by two. (b) The control of the TM as a Mealy machine; the output pairs (symbol, movement) are coded as integers.

Figure 2. (a) The split-state, split-output Mealy machine obtained from the machine of Figure 1(b), with pairs encoded as integers. (b) The Mealy machine obtained from (a) by removing unreachable states.

4. Definite Turing Machines

Definition 4 ([16, 20]) A transition system is $k$-definite if, and only if, for every word $w$ of length at least $k$, $\delta(q, w) = \delta(q', w)$ for any pair of states $q, q'$.

Definition 5 A Mealy machine is $k$-definite if, and only if, its transition system is $k$-definite.

Definition 6 A Turing machine is $k$-definite if, and only if, its transition system is $k$-definite.

There are various algorithms for deciding whether a given transition system is definite or not. We use one based on the testing table and testing graph presented in [13]. The testing table has one column for each symbol of the input alphabet. Its rows are divided into two parts: the rows of the upper part correspond to the states of the machine, and the table entries are the state transitions. The row headings of the lower part are all unordered pairs of distinct states, and the table entries are the corresponding pairs of state transitions. The testing graph $G$ is a directed graph whose vertices are the row headings of the lower part of the testing table; there is an edge labelled $s$ from $(p, q)$ to $(p', q')$, $p' \neq q'$, if, and only if, $(p', q')$ is the entry in row $(p, q)$ and column $s$; when the two components of an entry coincide, no edge is drawn. The transition system is $k$-definite if, and only if, its testing graph is loop-free and the length of the longest path in $G$ is $k - 1$. A non-definite TM is shown in Figure 3, while Figure 4 shows a definite one.
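The testing-graph test can be programmed directly (a sketch under the dictionary encoding of the earlier examples; the pair-graph representation is ours). The function returns the order $k$ of definiteness, or None when the testing graph has a loop:

```python
from itertools import combinations

def definiteness_order(states, alphabet, delta):
    """Testing-graph check of [13]: vertices are unordered pairs of
    distinct states; an edge labelled s joins {p, q} to {p', q'} when
    the s-successors p', q' are still distinct.  The system is definite
    iff this graph is loop-free, and then k = longest path length + 1."""
    pairs = [frozenset(c) for c in combinations(states, 2)]
    succ = {p: {frozenset((delta[(s, a)], delta[(t, a)]))
                for a in alphabet
                for s, t in [tuple(p)]
                if delta[(s, a)] != delta[(t, a)]}
            for p in pairs}
    colour, longest = {p: 0 for p in pairs}, {}   # 0 white, 1 grey, 2 black
    def dfs(p):                        # longest path in a DAG; detect loops
        if colour[p] == 1:
            raise ValueError("loop in testing graph: not definite")
        if colour[p] == 2:
            return longest[p]
        colour[p] = 1
        longest[p] = max((1 + dfs(q) for q in succ[p]), default=0)
        colour[p] = 2
        return longest[p]
    try:
        return max((dfs(p) for p in pairs), default=-1) + 1
    except ValueError:
        return None

# A reset (1-definite) system: the last input alone fixes the state.
delta = {(0, 'a'): 0, (1, 'a'): 0, (0, 'b'): 1, (1, 'b'): 1}
assert definiteness_order([0, 1], ['a', 'b'], delta) == 1
```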

5. Simple Functions

In computability theory, on the class of primitive recursive functions we can define a hierarchy of classes of functions of increasing complexity (see, e.g., the textbook [1]). These classes are related to programs with a bounded depth of nesting of FOR statements in a toy programming language: $L_n$ is the class of functions computed by programs having a maximum depth $n$ of nested FOR statements. One should note that the whole of the primitive recursive functions can be computed by programs with no GOTO statements (see Theorem 3.3, page 53, in [1]). Two classes in this hierarchy are particularly interesting: $L_1$, the class of simple functions, and $L_2$, the elementary functions. For $L_1$ the equivalence of programs is decidable, while for all $n \geq 2$ it is undecidable. $L_2$ is claimed to contain all real problems, since a program for a non-elementary function must, for any $k$, use more than a $k$-fold iterated exponential number of steps on infinitely many inputs. For this reason the elementary functions are also called practically computable functions.

In order to show that definite TMs compute the class of simple functions, we need the following characterization of the class (see [1]).

Theorem 2 The class of simple functions is the smallest class which contains the basic simple functions (among them, for each constant $k$, the quotient and remainder of division by $k$) and which is closed under composition and combination.

The Euclidean algorithm shows that gcd is an elementary function, and our Figure 3 shows a non-definite TM for it. It remains to be shown that it is impossible to find a definite TM for the Euclidean algorithm in particular, and for the elementary functions in general. In contrast, for the simple functions the situation is straightforward.

Figure 3. A non-definite Turing machine which finds the gcd of two numbers in unary notation using Euclid's algorithm.

Figure 4. A definite Turing machine which divides a number in unary notation by 3.

Theorem 3 The class of simple functions is computable by definite Turing machines.

Proof: We use the characterization in Theorem 2. The less trivial cases in the list of basic simple functions are those involving division. But already in Figure 4 we show a definite TM for division by 3, which can obviously be extended to any fixed divisor $k$. The TM for the remainder is a submachine (in the graph-theoretical sense) of the TM for the quotient, and a definite transition system does not have a non-definite subsystem. The operations of composition and combination are nothing but the serial and parallel operations, respectively, of [20], where it is proved that they preserve definiteness.

6. Finite Memory Machines

We have shown how to get a sequential machine of the Mealy type from a Turing machine. For Mealy machines there is another notion of finite memory, one related not only to the inputs, as in the definite case, but also to the outputs. Recall that in a definite automaton the present state is completely determined by the last $k$ inputs.

Definition 7 A Mealy machine is a finite-memory machine of order $k$ if $k$ is the least integer such that the present state of the machine can be determined uniquely from the knowledge of the last $k$ inputs and the corresponding $k$ outputs.

Definition 8 A Turing machine is finite-memory if, and only if, its associated Mealy machine is finite-memory.

In contrast to the definite machines, which do not recognize all regular languages, finite-memory machines are capable of recognizing all regular languages.

Theorem 4 For any regular language $L$ there exists a finite-memory machine which recognizes $L$.

Proof: Let $A = (Q, \Sigma, \delta, q_0, F)$ be a finite state automaton recognizing $L$. Define the Mealy machine $M = (Q, \Sigma, Q, \delta, \lambda, q_0)$, where $\lambda(q, s) = \delta(q, s)$. $M$ is obviously finite-memory of order 1, and the language represented by the subset $F$ of output letters is equal to $L$.

Corollary 1 Turing machines and finite-memory Turing machines are computationally equivalent.
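The proof of Theorem 4 is constructive, and a sketch is immediate (the run convention for the empty word below is our assumption, not the paper's):

```python
def finite_memory_from_fsa(states, alphabet, delta):
    """Theorem 4's construction: a Mealy machine that, at each step,
    outputs the state it enters.  The present state is then determined
    by the last output alone, so the machine is finite-memory of order 1."""
    lam = {(q, a): delta[(q, a)] for q in states for a in alphabet}
    return lam                      # δ and q0 are those of the FSA itself

def recognizes(delta, lam, q0, finals, word):
    """A word belongs to L iff its last output letter lies in F (for
    the empty word we test q0 itself -- an assumed convention)."""
    q, last = q0, q0
    for a in word:
        last = lam[(q, a)]
        q = delta[(q, a)]
    return last in finals
```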

7. Final Remarks

Motivated by our research on the relationships between classical and neural computation [4, 3] and by the results of Maass et al. [14, 15] limiting the capabilities of analog neural computation, we have proposed a novel class of Turing machines obtained by restricting the finite control of the machine. In this work we gave the first steps towards the characterization of the class of functions computable by definite Turing machines. We also proved the equivalence between finite-memory Turing machines and classical Turing machines. We are extending these results to Eilenberg's X-machines [6], showing that definite X-machines compute the elementary functions [5].

References

[1] W. Brainerd and L. Landweber. Theory of Computation. John Wiley & Sons, 1974.
[2] R. C. Carrasco, M. L. Forcada, M. A. Valdés-Muñoz, and R. P. Ñeco. Stable encoding of finite-state machines in discrete-time recurrent neural nets with sigmoid units. Neural Computation, 12(9):2129-2174, 2000.
[3] W. R. de Oliveira, M. C. P. de Souto, and T. B. Ludermir. Agent-environment approach to the simulation of Turing machines by neural networks. In Proceedings of IJCNN'01. INNS, 2001.
[4] W. R. de Oliveira and T. B. Ludermir. Turing machine simulation by logical neural networks. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks II, pages 663-668. North-Holland, 1992.
[5] W. R. de Oliveira, T. B. Ludermir, and M. de Souto. On definite machines. Work in progress.
[6] S. Eilenberg. Automata, Languages, and Machines, Volume A. Pure and Applied Mathematics. Academic Press, New York, 1974.
[7] S. Franklin and M. Garzon. Neural computability. Progress in Neural Networks, 1:128-144, 1990. Ablex, Norwood.
[8] C. Frougny. Simple deterministic NTS languages. Information Processing Letters, 12(4):174-178, 1981.
[9] M. W. Goudreau, C. L. Giles, S. T. Chakradhar, and D. Chen. First-order vs. second-order single-layer recurrent neural networks. IEEE Transactions on Neural Networks, 5(3):511-513, 1994.
[10] R. Hartley and H. Szu. A comparison of the computational power of neural network models. In Proc. IEEE Conf. Neural Networks III, pages 17-22, 1987.
[11] J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[12] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA, 1979.
[13] Z. Kohavi. Switching and Finite Automata Theory. McGraw-Hill, 2nd edition, 1978.
[14] W. Maass and P. Orponen. On the effect of analog noise on discrete-time analog computations. Neural Computation, 10(5):1071-1095, 1998.
[15] W. Maass and E. D. Sontag. Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Computation, 11:771-782, 1999.
[16] M. Perles, M. O. Rabin, and E. Shamir. The theory of definite automata. IEEE Trans. Electronic Computers, EC-12:233-243, 1963.
[17] J. B. Pollack. On Connectionist Models of Natural Language Processing. PhD thesis, Computer Science Department, University of Illinois, Urbana, 1987. Available as TR MCCS-87-100, Computing Research Laboratory, New Mexico State University, Las Cruces, NM.
[18] Y. Rogozhin. Small universal Turing machines. Theoretical Computer Science, 168:215-240, 1996.
[19] H. T. Siegelmann and E. D. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132-150, 1995.
[20] M. Steinby. On definite automata and related systems. Ann. Acad. Sci. Fenn., Series A I, 1969.
[21] G. Sun, H. Chen, Y. Lee, and C. Giles. Turing equivalence of neural networks with second order connection weights. In Proc. Int. Joint Conf. Neural Networks, 1991.
[22] A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. (2), 42:230-265, 1936; a correction, 43:544-546.