Sample Project: Simulation of Turing Machines by Machines with only Two Tape Symbols

The purpose of this document is to illustrate what a completed project should look like. I have chosen a problem that is not among the posted topics, but is closely related. In fact, you could use the program attached to this project as part of a solution to several of the other problems, such as the construction of a universal Turing machine, or the simulation of Turing machines by counter programs.

There is a software component to this project: the simulation described in the written report is implemented by a program that takes a Turing machine specification and converts it to a specification of an equivalent machine that has only two tape symbols. Strictly speaking, the project could be completed without this part, stopping at the by-hand example given below. However, the blow-up in the number of states makes this rather tedious to carry out on all but the smallest examples, so it is very useful to have the simulation computed automatically.

1 The problem

We will show that for every TM M, there is a TM M′ with the following properties:

• The tape alphabet of M′ is {0, 1}.

• There is an encoding enc : Γ* → {0, 1}*, where Γ is the tape alphabet of M, such that M accepts w if and only if M′ accepts enc(w), and M rejects w if and only if M′ rejects enc(w).

There is a slight technical difficulty with this claim: it is easy enough to encode tape symbols of M by strings of bits, but the definition of Turing machine requires
that the blank symbol not be an element of the input alphabet. We will relax this requirement. In our simulation, we will treat 0 as the blank symbol, but also allow the input of M′ to contain 0s. This will not create a problem: the encoding of w will be easy to compute, and the property stated above will still tell us whether M accepts w.

2 How the simulation works

We encode each tape symbol of M with k bits. In order that each symbol have a unique encoding, we have to choose k large enough that the number of tape symbols of M does not exceed 2^k. We will make the convention that the blank symbol of M is encoded by k zeros, but since we are using the machine M′ only for decision problems, this will not really matter. Apart from that, the encoding is arbitrary.

For example, suppose that the tape alphabet of M is {a, b, X, Y, ⊔}, where ⊔ is the blank. Then we need to choose k ≥ 3. We can then use the following encoding:

    a    001
    b    010
    X    011
    Y    100
    ⊔    000

We then need to convert transitions in the specification of M into transitions of M′. We can illustrate the general procedure with two examples, one for a left transition, and the other for a right transition.

Suppose M contains a transition δ(q, b) = (q′, a, R). The machine M′ will also be in a state called q when it has to execute this transition, but it will not know that its input is b until it has read additional symbols to the right. For this purpose, we will need to introduce new states and transitions that move the reading head of M′ three steps to the right:

    δ′(q, 0) = ((q, 0), 0, R),
    δ′((q, 0), 1) = ((q, 01), 1, R),
    δ′((q, 01), 0) = ((q, 010), 0, R).
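The encoding and this first reading phase can be sketched in a few lines of Python. This is only an illustration, not the attached program: the function names make_encoding and reading_path, the use of "_" for the blank, and the representation of states as tuples are all assumptions made for the example.

```python
def make_encoding(tape_symbols, blank):
    """Return (k, enc): a k-bit code for each tape symbol, blank -> k zeros."""
    # Smallest k with 2**k >= number of tape symbols (and at least one bit).
    k = max(1, (len(tape_symbols) - 1).bit_length())
    enc = {blank: "0" * k}
    others = [s for s in tape_symbols if s != blank]
    # Number the non-blank symbols 1, 2, 3, ... and write each number in k bits.
    for i, s in enumerate(others, start=1):
        enc[s] = format(i, "0{}b".format(k))
    return k, enc


def reading_path(q, code):
    """Transitions M' follows from state q while reading the bits of `code`.

    Each entry is ((state, bit), (new_state, bit_written, 'R')).
    The state reached after reading the prefix v is the tuple (q, v).
    """
    path = []
    for i, bit in enumerate(code):
        src = q if i == 0 else (q, code[:i])
        path.append(((src, bit), ((q, code[:i + 1]), bit, "R")))
    return path


if __name__ == "__main__":
    k, enc = make_encoding(["a", "b", "X", "Y", "_"], "_")
    print(k, enc)
    for step in reading_path("q", enc["b"]):
        print(step)
```

With the five-symbol alphabet above, this numbering happens to reproduce the table in the text, and reading_path("q", "010") lists the three transitions just given.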
The result of this first phase of transitions is that the reading head of M′ is now positioned just to the right of the encoding of b. (We do not know what symbol it is presently scanning.) Its state now contains both the original state q of M, and the input symbol 010 = enc(b) it was scanning. It now must move to the left four spaces and write enc(a) = 001 on the tape. (This could have been carried out more economically by making the last transition above a left transition that writes 1 on the tape, but that makes the description of the algorithm more complicated.) We then have

    δ′((q, 010), 0) = ((q, 010, 1), 0, L).
    δ′((q, 010), 1) = ((q, 010, 1), 1, L).
    δ′((q, 010, 1), 0) = ((q, 010, 2), 1, L).
    δ′((q, 010, 2), 1) = ((q, 010, 3), 0, L).
    δ′((q, 010, 3), 0) = ((q, 010, 4), 0, L).
    δ′((q, 010, 4), 0) = ((q, 010, 5), 0, R).
    δ′((q, 010, 4), 1) = ((q, 010, 5), 1, R).

Observe that of the ten transitions we wrote above, the first five and the last two do not depend at all on the transition function of M. It is only the middle three, where we write the new tape symbol enc(a), that use the information about M.

The machine is now in state (q, enc(b), 5), positioned exactly where it was when we started the transition, but with 001 on the tape where 010 was present before. We now must simulate the move to the right. This of course is accomplished by moving the head of M′ three symbols to the right:

    δ′((q, 010, 5), 0) = ((q, 010, 6), 0, R).
    δ′((q, 010, 5), 1) = ((q, 010, 6), 1, R).
    δ′((q, 010, 6), 0) = ((q, 010, 7), 0, R).
    δ′((q, 010, 6), 1) = ((q, 010, 7), 1, R).
    δ′((q, 010, 7), 0) = (q′, 0, R).
    δ′((q, 010, 7), 1) = (q′, 1, R).
Figure 1: A single transition of the original machine, in this case the rightward transition pictured at left, is simulated by a long sequence of transitions of the two-tape-symbol machine. Intermediate states are introduced to keep track of where we are in the process.
These six transitions depend on the transition function of M, but only for the last little piece: the rightward move of the head. If the original transition in M had been a leftward move, all of these transitions would also be moves to the left. The simulation is illustrated in the accompanying diagram.

Here, in summary, is the algorithm for constructing the transition table of the new machine:

1. For each state q of M other than the accept and reject states, and each bit string v with 0 ≤ |v| ≤ k, we introduce a new state (q, v), and if |v| < k, new transitions

       δ′((q, v), b) = ((q, vb), b, R), for b ∈ {0, 1}.

   Observe that we are including the state (q, ɛ), which we identify with q.

2. For each state q of M and each tape symbol γ, we look to see if there is a transition of the form δ(q, γ) = (q′, γ′, D), where D ∈ {L, R}. If there is, we set w = enc(γ), w′ = enc(γ′), and introduce the following new states:

       (q, w, j), for 1 ≤ j ≤ 2k + 1,

   and the following new transitions:

   (a) δ′((q, w), b) = ((q, w, 1), b, L), for b ∈ {0, 1}.

   (b) If 1 ≤ i ≤ k, δ′((q, w, i), b) = ((q, w, i + 1), b′, L), where b is the (k − i + 1)th bit of w, and b′ is the (k − i + 1)th bit of w′.

   (c) δ′((q, w, k + 1), b) = ((q, w, k + 2), b, R), for b ∈ {0, 1}.

   (d) For k + 2 ≤ j ≤ 2k, δ′((q, w, j), b) = ((q, w, j + 1), b, D), for b ∈ {0, 1}.

   (e) δ′((q, w, 2k + 1), b) = ((q′, ɛ), b, D), for b ∈ {0, 1}.
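The construction summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program attached to the project: the function name two_symbol_machine, the dictionary representation of the transition table, and the use of tuples (q, v) and (q, w, j) as state names are all choices made for this example. The state (q, "") stands for q itself.

```python
from itertools import product


def two_symbol_machine(delta, enc, k, halting):
    """Build the transition table of M' from the table `delta` of M.

    delta   : dict mapping (q, symbol) -> (q2, symbol2, 'L' or 'R')
    enc     : dict mapping each tape symbol of M to a k-bit string
    halting : set containing the accept and reject states of M
    """
    new = {}
    states = {q for (q, _) in delta} | {t[0] for t in delta.values()}

    # Step 1: reading phase. From (q, v) read bit b and move right to (q, vb).
    for q in states - set(halting):
        for n in range(k):
            for bits in product("01", repeat=n):
                v = "".join(bits)
                for b in "01":
                    new[((q, v), b)] = ((q, v + b), b, "R")

    # Step 2: one block of states (q, w, 1..2k+1) per transition of M.
    for (q, sym), (q2, sym2, d) in delta.items():
        w, w2 = enc[sym], enc[sym2]
        for b in "01":
            new[((q, w), b)] = ((q, w, 1), b, "L")             # (a) step back
            new[((q, w, k + 1), b)] = ((q, w, k + 2), b, "R")  # (c) realign
            for j in range(k + 2, 2 * k + 1):                  # (d) move k-1 cells
                new[((q, w, j), b)] = ((q, w, j + 1), b, d)
            new[((q, w, 2 * k + 1), b)] = ((q2, ""), b, d)     # (e) last cell
        for i in range(1, k + 1):                              # (b) rewrite
            pos = k - i  # 0-based index of the (k - i + 1)th bit
            new[((q, w, i), w[pos])] = ((q, w, i + 1), w2[pos], "L")
    return new


if __name__ == "__main__":
    # The single transition delta(q, b) = (p, a, R) with k = 3.
    table = two_symbol_machine({("q", "b"): ("p", "a", "R")},
                               {"_": "000", "a": "001", "b": "010"}, 3, {"p"})
    for key in sorted(table, key=repr):
        print(key, "->", table[key])
```

Running the sketch on a single transition with k = 3 produces, among others, the transitions worked out by hand earlier in this section.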
3 Example

Because of the large increase in the number of states, we need to use a very small machine to illustrate the algorithm by hand. Below is the specification of a Turing machine that reads a string of a's and b's, accepting if and only if the last letter is a.

    0 a 0 a R
    0 b 0 b R
    0 B 1 B L
    1 a -1 a R

The tape alphabet is {⊔, a, b}, where ⊔ is the blank (written B in the specification). We can thus choose k = 2, and encode these three symbols by 00, 01, 10, respectively. (These are the binary encodings of 0, 1 and 2.) There are two states apart from the accepting state, and the algorithm requires, for each of these states q and for each w ∈ {ɛ, 0, 1, 00, 01, 10, 11}, a new state (q, w). Now in fact we will never need the value w = 11, so this will result in twelve states of the given form. Additionally, each of the four transitions results in 2k + 1 = 5 new states, so there will be 32 states in all. Since there are only two tape symbols, that is a total of 64 transitions.

We will adopt a scheme for numbering the states of the new machine: this will be a 3-digit number, where the leading digit (0 or 1) represents the state q of the original machine, the six possible values of w are represented in the second digit as 0, 1, 2, 3, 4, 5 (for ɛ, 0, 1, 00, 01, 10 respectively), and the third digit is either 0, 1, 2, 3, 4 or 5. With this scheme, the initial state is still 0.

Let's work first on the transitions that don't depend on the transition table of M. Associated with state 0 we have

    0 0 10 0 R
    0 1 20 1 R
    10 0 30 0 R
    10 1 40 1 R
    20 0 50 0 R

Similarly, with state 1 we have

    100 0 110 0 R
    100 1 120 1 R
    110 0 130 0 R
    110 1 140 1 R
    120 0 150 0 R
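The numbering scheme and the ten reading transitions just listed can be generated mechanically. The following Python sketch shows one way to do it; the names W_DIGIT, state_number and reading_transitions are hypothetical and chosen only for this example.

```python
# Second digit of the state number: code for the bits read so far.
# The string 11 is omitted because it encodes no tape symbol.
W_DIGIT = {"": 0, "0": 1, "1": 2, "00": 3, "01": 4, "10": 5}


def state_number(q, w, j=0):
    """Number the state (q, w, j) using the digits q, W_DIGIT[w], j."""
    return 100 * q + 10 * W_DIGIT[w] + j


def reading_transitions(q):
    """Specification lines 'state symbol newstate newsymbol direction'
    for the reading phase that starts in state q of the original machine."""
    lines = []
    for v in ["", "0", "1"]:
        for b in "01":
            if v + b in W_DIGIT:  # skip the unused string 11
                lines.append("{} {} {} {} R".format(
                    state_number(q, v), b, state_number(q, v + b), b))
    return lines


if __name__ == "__main__":
    for q in (0, 1):
        print("\n".join(reading_transitions(q)))
```

Because leading zeros are dropped when the number is printed, the state (0, ɛ) appears as 0 and the state (0, 01) as 40, as in the text.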
We now introduce the new transitions derived from the transition

    0 a 0 a R

The pair (0, a) gives rise to the state (0, 01), which we encode as 40. So we get

    40 0 41 0 L
    40 1 41 1 L

The process of rewriting the symbol 1, then the 0, on the leftward scan is

    41 1 42 1 L
    42 0 43 0 L

and then of advancing right is

    43 0 44 0 R
    43 1 44 1 R
    44 0 45 0 R
    44 1 45 1 R
    45 0 0 0 R
    45 1 0 1 R

We will repeat these patterns with the states 30-35, 50-55, and 140-145. We can assemble these all into a single file and execute it with the Turing machine simulator. Before we do that, however, we need to deal with the blurring of the distinction between the input symbol 0 and the blank. To have this work properly with the simulator we would either have to tweak the simulator itself, or change all the zeros in the new specification to B.

If you were completing this project and not including computer code, you would work through the rest of this small example and demonstrate it with the Turing machine simulator. Here, rather than continue with the example, we will demonstrate a program that carries this out automatically.

4 Efficiency of the simulation

If the original machine M has m states and t tape symbols, then the length of the encoding of a tape symbol is k bits, where k = ⌈log₂ t⌉. There will be roughly 2t strings of length less than or equal to k, and thus the new machine will have roughly 2mt states of the form (q, w). In addition, there are not more than mt transitions in
the specification of M, and each transition gives rise to 2k + 1 states of the form (q, w, j). So the total number of states of the new machine is not more than (2k + 3)mt. Another way to think of it is this: the total number of transitions of the original machine is mt, and the total number of transitions of the new machine is twice the number of states, so at worst, the size of the specification is multiplied by a factor of 4k + 6. Even if the number of tape symbols of the original machine is relatively large, say 100, so that k = 7, the blowup in the size of the specification is only 34. Bad news if you are doing it by hand, but not very onerous for an automatic solution.

Similarly, each transition of the original machine requires that we scan rightward k cells, leftward k + 1 cells, rightward 1 cell, and finally right- or leftward k cells, so we need 3k + 2 steps to simulate a step of the original machine. Again, this is not a particularly large blowup in time, and unlike the simulation of a two-tape machine by a one-tape machine, it is independent of the input size.

5 Implementation

Attached to this document is a Python program that implements the algorithm described in the preceding sections. It contains several functions. The first, called create_encoding, takes the name of a Turing machine specification file as an argument. This function returns a triple (d,s,u). The first component is a representation of the state-transition function of the Turing machine; this is simply lifted from the code for the Turing machine simulator. The second component gives the encoding of the tape symbols of the machine as bit strings. The third is the set of states of the original machine. The second function, called write_new_machine, takes these same three components as arguments, and writes the specification file, called newoutfile.tm, that results from application of the algorithm above.
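As a quick check of the estimates in Section 4, the bounds can be tabulated for given m and t. The function below is a hypothetical helper written for this illustration; it is not part of the attached program, and it only evaluates the formulas stated in the text.

```python
def blowup(m, t):
    """Evaluate the Section 4 estimates for a machine with m states
    and t tape symbols."""
    k = max(1, (t - 1).bit_length())  # k = ceil(log2 t), at least one bit
    return {
        "k": k,
        "state_bound": (2 * k + 3) * m * t,  # upper bound on states of M'
        "size_factor": 4 * k + 6,            # growth of the specification
        "steps_per_step": 3 * k + 2,         # steps of M' per step of M
    }


if __name__ == "__main__":
    print(blowup(10, 100))
    print(blowup(2, 3))
```

For t = 100 this gives k = 7 and a size factor of 34, in agreement with the figure quoted above; for the three-symbol example of Section 3 it gives k = 2 and 8 steps of the new machine per step of the original.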
The method of encoding the new states as integers is slightly different from the one described in the example above, and is described in detail in the comments to the program. The tape alphabet of this new machine is represented in both the specification file and the printed output as {1, B}. However, if you use the simulator to test it, you need to specify the internal blank symbols as space characters.

Here is an example. We will apply our construction to the file equalasbs.tm, which decides whether the input string contains an equal number of a's and b's. Here is a run of the original machine on a short rejected input:
>>> tm.runtm('equalasbs', 'bab')
1. state: 0 b a b
2. state: 1 c a b
3. state: 3 c c b
4. state: 3 B c c b
5. state: 0 c c b
6. state: 0 c c b
7. state: 0 c c b
8. state: 1 c c c B
9. state: -2 c c c B
reject 9 steps c c c

Here is the start of the same computation, performed with the machine that has two tape symbols:

>>> (d,s,u)=create_encoding('equalasbs')
>>> print s
{'a': '10', ' ': '00', 'c': '01', 'b': '11'}
>>> write_new_machine(d,s,u)
The specification files equalasbs.tm and newoutfile.tm are attached to this project. If we now want to run the new machine on the input bab, we need to compute the encoding of the input. This is 111011, but it has to be fed to the machine as 111 11, with the blank written as a space.

>>> tm.runtm('newoutfile', '111 11')
1. state: 0 1 1 1 B 1 1
2. state: 16 1 1 1 B 1 1
3. state: 48 1 1 1 B 1 1

The computation continues for 64 steps, and ends with the rejection of the input.

To turn this into a more useful tool, the program was revised so that rather than writing the new specification file, it creates the specification of the new machine internally. The code from the original simulator was tweaked so that in this new program, you can specify the input in the original alphabet, and see the run of the two-symbol machine, with the bits displayed as 0s and 1s, and gathered in groups of k bits, where k is the number of bits used to encode each tape symbol of M. To do this, I added a new function create_tm to create the internal representation, and revised versions of runtm and display_configuration from the original Turing machine simulator. Here is the same computation in this new setting:

>>> run2symtm('equalasbs', 'bab')
1 0
1. state: 0 1 1 1 0 1 1 1 16
2. state: 16 1 1 1 0 1 1 1 48
3. state: 48 1 1 1 0 1 1 1 49
4. state: 49 1 1 1 0 1 1 1 50
5. state: 50 1 1 1 0 1 1 51
6. state: 51 0 0 1 1 0 1 1 52
7. state: 52 0 1 1 0 1 1 1 53
8. state: 53 0 1 1 0 1 1 1 64
9. state: 64 0 1 1 0 1 1 80
10. state: 80 0 1 1 0 1 1 1 104
11. state: 104 0 1 1 0 1 1 105
12. state: 105 0 1 1 0 1 1 1 106
13. state: 106 0 1 1 1 1 1 1 107
14. state: 107 108
15. state: 108 1 109
16. state: 109 192
17. state: 192 1 200
18. state: 200 224
19. state: 224 1 225
20. state: 225 226
21. state: 226 227
22. state: 227 0 228
23. state: 228 229
24. state: 229 0 192
25. state: 192 0 0 200
26. state: 200 0 216
27. state: 216 217
28. state: 217 0 218
29. state: 218 0 0 219
30. state: 219 0 0 0 220
31. state: 220 0 0 221
32. state: 221 0 0
33. state: 0 1 8
34. state: 8 32
35. state: 32 1 33
36. state: 33 34
37. state: 34 35
38. state: 35 0 36
39. state: 36 1 37
40. state: 37 0
41. state: 0 1 8
42. state: 8 1 32
43. state: 32 1 33
44. state: 33 34
45. state: 34 1 35
46. state: 35 36
47. state: 36 1 37
48. state: 37 1 0
49. state: 0 1 16
50. state: 16 48
51. state: 48 0 1 49
52. state: 49 1 50
53. state: 50 1 51
54. state: 51 0 1 0 1 0 1 52
55. state: 52 0 1 0 1 0 1 1 53
56. state: 53 0 1 0 1 0 1 64
57. state: 64 0 1 0 1 0 1 0 72
58. state: 72 0 1 0 1 0 1 0 0 88
59. state: 88 0 1 0 1 0 1 0 0 0 120
60. state: 120 0 1 0 1 0 1 0 0 0 0 184
61. state: 184 0 1 0 1 0 1 0 0 0 0 0 248
62. state: 248 0 1 0 1 0 1 0 0 0 0 0 0 312
63. state: 312 0 1 0 1 0 1 0 0 0 0 0 0 0 -2
64. state: -2 0 1 0 1 0 1 0 0 0 0 0 0 0
reject 64 steps