Testing Emptiness of a CFL. Testing Finiteness of a CFL. Testing Membership in a CFL. CYK Algorithm

Testing Emptiness of a CFL
As for regular languages, we really take a representation of some language and ask whether it represents $\emptyset$. We can use either a CFG or a PDA; the choice is ours, since there are algorithms to convert one to the other. The test: use a CFG and check whether the start symbol is useless.

Testing Finiteness of a CFL
Let L be a CFL. Then there is some pumping-lemma constant n for L. Test all strings of length between n and 2n - 1 for membership (using the membership test described next). If there is any such string in L, it can be pumped, and the language is infinite. If there is no such string, then n - 1 is an upper limit on the length of strings in L, so the language is finite.
Trick: if there were a string z = uvwxy in L of length n or longer, we could find a shorter string uwy in L, but it is at most n shorter, because $|vwx| \le n$. Thus, if L has any strings of length n or more, we can repeatedly cut out vx to get, eventually, a string in L whose length is in the range n to 2n - 1.

COSC 00

Testing Membership in a CFL
Simulating a PDA for L on string w doesn't quite work, because the PDA can grow its stack indefinitely on ε input, and we never finish, even if the PDA is deterministic. There is an $O(n^3)$ algorithm (n = length of w) that uses a "dynamic programming" technique, called the Cocke-Younger-Kasami (CYK) algorithm.

CYK Algorithm
Start with a CNF grammar for L, and let $w = a_1 a_2 \cdots a_n$. Build a two-dimensional table:
Row = length of substring of w
Column = start of substring
The entry in row i and column j is the set of variables that generate the substring of w beginning at position j and including i positions. Indexing instead by start and end position, we write $X_{ij}$ for the set of variables that generate $a_i a_{i+1} \cdots a_j$. Then w is in L if S is in $X_{1n}$. For n = 5 the table looks like this:

5   X15
4   X14  X25
3   X13  X24  X35
2   X12  X23  X34  X45
1   X11  X22  X33  X44  X55
    a1   a2   a3   a4   a5

Basis (row 1): $X_{ii}$ = the set of variables A such that $A \to a$ is a production and a is the symbol at position i of w.
Induction: Assume the rows for substrings of length up to m - 1 have been computed, and compute the row for substrings of length m.
We can derive $a_i a_{i+1} \cdots a_j$ from A if there is a production $A \to BC$ such that B derives some prefix $a_i \cdots a_k$ of $a_i a_{i+1} \cdots a_j$ and C derives the rest, $a_{k+1} \cdots a_j$. Thus, A is in $X_{ij}$ if there is any value of k such that $i \le k < j$, B is in $X_{ik}$, and C is in $X_{k+1,j}$.
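The basis and induction steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name `cyk`, the dict-based grammar encoding (each variable maps to a list of bodies, where a body is a single terminal character or a pair of variables), and the name `G` are assumptions made here. The table uses the start/end indexing $X_{ij}$, stored as `X[(i, j)]`.

```python
def cyk(grammar, start, w):
    """CYK membership test for a grammar in Chomsky Normal Form.

    grammar maps each variable to a list of bodies; a body is either a
    single terminal character (A -> a) or a pair of variables (A -> BC).
    Returns (is w in L(G)?, table), where table[(i, j)] is the set X_ij
    of variables deriving a_i ... a_j (1-indexed, inclusive).
    """
    n = len(w)
    if n == 0:
        # Assumption: the CNF grammar has no S -> epsilon production.
        return False, {}
    X = {}
    # Basis (row 1): X_ii = variables A with a production A -> a_i.
    for i in range(1, n + 1):
        X[(i, i)] = {A for A, bodies in grammar.items() if w[i - 1] in bodies}
    # Induction: fill the row for length m using rows 1 .. m-1.
    for m in range(2, n + 1):
        for i in range(1, n - m + 2):
            j = i + m - 1
            X[(i, j)] = set()
            for k in range(i, j):  # split point, i <= k < j
                for A, bodies in grammar.items():
                    for body in bodies:
                        if isinstance(body, tuple):
                            B, C = body
                            if B in X[(i, k)] and C in X[(k + 1, j)]:
                                X[(i, j)].add(A)
    return start in X[(1, n)], X


# Grammar of the worked example: S -> AB, A -> BB | a, B -> AB | b.
G = {
    "S": [("A", "B")],
    "A": [("B", "B"), "a"],
    "B": [("A", "B"), "b"],
}

if __name__ == "__main__":
    member, table = cyk(G, "S", "aabbb")
    print(member)                   # True
    print(sorted(table[(1, 5)]))    # ['B', 'S']
```

The three nested loops over m, i, and k give the $O(n^3)$ bound for a fixed grammar; the innermost scan over productions contributes only a factor depending on the size of G.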

Example
Determine whether w = aabbb is in the language generated by G:
S → AB
A → BB | a
B → AB | b
$a_1 = a$, so $X_{11}$ is the set of all variables that immediately derive a, that is, $X_{11} = \{A\}$. Likewise $X_{22} = \{A\}$ and $X_{33} = X_{44} = X_{55} = \{B\}$.
Compute $X_{12}$: since $X_{11} = \{A\}$ and $X_{22} = \{A\}$, look for productions with right side AA. $X_{12}$ consists of all variables on the left side of any such production; there are none, so $X_{12} = \emptyset$.
Compute $X_{23}$: look for AB; the productions B → AB and S → AB fit, so $X_{23} = \{S, B\}$.
The rest is easy. The completed table:

5   {S,B}
4   {A}     {S,B}
3   {S,B}   {A}     {S,B}
2   ∅       {S,B}   {A}     {A}
1   {A}     {A}     {B}     {B}     {B}
    a       a       b       b       b

Since S is in $X_{15}$, w ∈ L(G).

Outline of Turing Machines and Complexity
1. Turing machine (TM) = formal model of a computer running a particular program. We must argue that the TM can do exactly what a computer can do, albeit slower.
2. We use the simplicity of the TM model to prove formally that there are specific problems (languages) that the TM cannot solve.
Outline:
Undecidability: unsolvable problems
Turing Machines: formalism for computers generally

Undecidability: Intuitive Argument
Are there problems a programme cannot solve? A simple one is the hello, world problem: decide whether a programme prints hello, world. This is hard, even impossible, to determine, even for humans: cf. the International Obfuscated C Code Contest (IOCCC), whose goal is to write the most confusing programme you can. Even humans don't understand these programmes, so why should programmes do much better? Spoiler: they can't!
Simulation won't work. Simple approach: run the programme or simulate it, wait for the output, and print yes/no depending on the output. Obvious problem: what if the programme is
maybe_wait_for_00_years(); print("hello, world");
Is there no other way? Some oracle? No: use proof by contradiction.

Undecidability: Informal Proof (I)
Suppose H is a programme that solves the hello, world problem. H takes two inputs: P (a programme) and I (input for P).
H prints yes if P prints hello, world after reading I.
H prints no otherwise.
NB: even if P never stops, H must print yes or no.

Undecidability: Informal Proof (II)
From H, create H1: it acts just like H, but prints hello, world instead of no.
If P prints hello, world on input I, H1 prints yes.
Otherwise, H1 prints hello, world.

Undecidability: Informal Proof (III)
Now from H1, create H2: it acts just like H1, but P doubles as input to itself, so H2(P) = H1(P, P).
If P prints hello, world on input P, H2 prints yes.
Otherwise, H2 prints hello, world.

Undecidability: Informal Proof (IV)
Now feed H2 to itself, i.e., run H2(H2). What happens? If H2 prints hello, world, then by construction it should print yes; but by printing yes, H2 has not printed hello, world, so it is forced to print hello, world, and so on, ad infinitum. Either answer contradicts itself. Hence H, H1, and H2 cannot exist.

Turing Machines and Complexity
Turing Machine (TM): formal model of computer + program.
A TM can do exactly what a computer can do, just slower.
There are specific problems that a TM cannot solve:
1. Recursively enumerable: a TM can accept, but not always reject.
2. Non-RE: a TM cannot even recognize them.
Problems with TMs that accept and always halt, i.e., accept + reject.
P vs. NP-complete problems: specific NP-complete problem(s), e.g., satisfiability.

The TM
Finite-state control, like a PDA.
One read-write tape serves as both input and unbounded storage device.
The tape is divided into cells; each cell holds one symbol from the tape alphabet.
The tape is "semi-infinite": it ends only at the left.
The tape head marks the current cell, the only cell that can influence the move of the TM.
Initially, the tape holds $a_1 a_2 \cdots a_n B B \cdots$, where $a_1 a_2 \cdots a_n$ is the input, chosen from the input alphabet (a subset of the tape alphabet), and B is the blank.
[Figure: finite-state control with a transition p → q labelled a/b, R; the tape head is about to read the current cell of the tape a1 a2 ... an B B ...]
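The semi-infinite tape just described can be modelled with a list that stores only the cells visited so far and supplies blanks on demand to the right. This is a minimal sketch under stated assumptions: the class name `SemiInfiniteTape`, single-character tape symbols, and the string "B" for the blank are choices made here for illustration. Moving left from cell 0 raises an error, since the tape ends at the left.

```python
BLANK = "B"  # assumption: the blank is the one-character symbol "B"


class SemiInfiniteTape:
    """Tape that ends at the left (cell 0) and is blank forever to the right."""

    def __init__(self, input_symbols):
        self.cells = list(input_symbols)  # only the finite visited prefix is stored
        self.head = 0                     # head starts on the leftmost cell

    def read(self):
        # Cells beyond the stored prefix are blank.
        if self.head >= len(self.cells):
            return BLANK
        return self.cells[self.head]

    def write(self, symbol):
        # Materialize blank cells up to the head before writing.
        while self.head >= len(self.cells):
            self.cells.append(BLANK)
        self.cells[self.head] = symbol

    def move(self, direction):
        if direction == "R":
            self.head += 1
        elif direction == "L":
            if self.head == 0:
                raise IndexError("moved off the left end of the tape")
            self.head -= 1
        else:
            raise ValueError("direction must be 'L' or 'R'")

    def contents(self):
        # Nonblank contents, with trailing blanks dropped.
        return "".join(self.cells).rstrip(BLANK)
```

Storing only a finite prefix matches the observation that, at any finite time, a TM has visited only finitely many cells.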

Formal TM
$M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$, where:
Q = finite set of states
Σ = input alphabet
Γ = tape alphabet, $\Sigma \subseteq \Gamma$
B = blank, $B \in \Gamma - \Sigma$
$q_0$ = start state, $q_0 \in Q$
F = accepting states, $F \subseteq Q$
δ = transition function: $\delta(q, X) = (q', X', d)$ maps a state (q) and a tape symbol (X) to a new state (q'), a replacement symbol (X') (either might be unchanged), and a direction (d = L or R) for head motion.

Example XX0
M accepts if its third input symbol is 0; if the third symbol is 1, it runs forever.
M = ({p, q, r, s, t}, {0, 1}, {0, 1, B}, δ, p, B, {s}), where:
1. δ(p, X) = (q, X, R) for X = 0, 1, i.e., in state p on reading 0 or 1, rewrite the same symbol, move the tape head right, and go to state q.
2. δ(q, X) = (r, X, R) for X = 0, 1.
3. δ(r, 0) = (s, 0, L).
4. δ(r, 1) = (t, 1, R).
5. δ(t, X) = (t, X, R) for X = 0, 1, B.

Example XX0 (II)
We can draw M as a transition diagram:
[Diagram: p → q and q → r on 0/0,R and 1/1,R; r → s on 0/0,L; r → t on 1/1,R; t loops on 0/0,R, 1/1,R, and B/B,R.]

IDs of a Turing Machine
An ID (instantaneous description) captures what is going on at any moment: the current state, the contents of the tape, and the position of the tape head. Keep things finite by dropping all blanks to the right of both the head and the rightmost nonblank.
Subtle point: although there is no limit on how far right the head may move and write nonblanks, at any finite time the TM has visited only a finite prefix of the infinite tape.

Notation
αqβ says: α is the tape contents to the left of the head; the state is q; β is the nonblank tape contents at or to the right of the tape head.
One move is indicated by ⊢; zero or more moves are represented by ⊢*. Check 8.. for the detailed definition.

IDs for Example XX0
With input 010, the sequence of IDs of M is:
p010 ⊢ 0q10 ⊢ 01r0 ⊢ 0s10
At that point it halts, since state s has no move when the head is scanning 1.
With input 0111, the sequence is:
p0111 ⊢ 0q111 ⊢ 01r11 ⊢ 011t1 ⊢ 0111t ⊢ 0111Bt ⊢ ···
The TM never halts, but continues to move right.
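A small simulator makes the ID sequences above concrete. This is a sketch, not a definitive implementation: δ is encoded as a Python dict from (state, symbol) to (new state, written symbol, direction), and the names `DELTA`, `format_id`, and `run` are introduced here for illustration. IDs are printed in the αqβ form, dropping blanks to the right of both the head and the rightmost nonblank. A step bound stands in for "runs forever", which a simulator cannot detect in general.

```python
BLANK = "B"

# Example XX0: delta[(state, symbol)] = (new_state, written_symbol, direction)
DELTA = {}
for X in "01":
    DELTA[("p", X)] = ("q", X, "R")
    DELTA[("q", X)] = ("r", X, "R")
DELTA[("r", "0")] = ("s", "0", "L")
DELTA[("r", "1")] = ("t", "1", "R")
for X in "01B":
    DELTA[("t", X)] = ("t", X, "R")


def format_id(tape, head, state):
    """Render alpha q beta, dropping blanks right of both head and last nonblank."""
    alpha = "".join(tape[:head])
    beta = "".join(tape[head:]).rstrip(BLANK)
    return alpha + state + beta


def run(delta, start, w, max_steps):
    """Simulate at most max_steps moves; return (list of IDs, status)."""
    tape, head, state = list(w), 0, start
    ids = [format_id(tape, head, state)]
    for _ in range(max_steps):
        if head == len(tape):
            tape.append(BLANK)            # extend the semi-infinite tape on demand
        move = delta.get((state, tape[head]))
        if move is None:
            return ids, "halted"          # no next move is defined
        state, tape[head], direction = move
        head += 1 if direction == "R" else -1
        if head < 0:
            return ids, "fell off left end"
        ids.append(format_id(tape, head, state))
    return ids, "still running"


if __name__ == "__main__":
    print(" ⊢ ".join(run(DELTA, "p", "010", 10)[0]))  # p010 ⊢ 0q10 ⊢ 01r0 ⊢ 0s10
```

Whether a halted run accepts depends on whether the final state is in F = {s}; on input 01 the machine also halts, but in state r.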

Acceptance by Final State / by Halting
There are two ways to define the language of a TM:
1. By the set of input strings that cause it to reach an accepting state:
$L(M) = \{\, w \mid q_0 w \vdash^* \alpha p \beta \text{ for some } p \in F \text{ and any } \alpha, \beta \in \Gamma^* \,\}$.
2. By the set of strings that cause the TM to halt, i.e., to reach an ID with no next move:
$H(M) = \{\, w \mid q_0 w \vdash^* \alpha p X \beta \text{ and } \delta(p, X) \text{ is undefined} \,\}$.

Language of Example XX0
We can describe L(M) with the RE $(0+1)(0+1)0(0+1)^*$.
We can describe H(M) with the RE $\varepsilon + (0+1) + (0+1)(0+1) + (0+1)(0+1)0(0+1)^*$.
Why the difference? No move on B is defined from p, q, or r, so M halts (without accepting) on inputs of length less than 3. We could fix this with δ(p, B) = δ(q, B) = δ(r, B) = (t, B, R); for this new machine M', L(M') = H(M') = L(M).

Final State = Halting
We need to show that L = L(M1) (final state) for some TM M1 iff L = H(M2) (halting) for some TM M2.
If: build M1 from M2: add a final state r to M2 and transitions to r from any state where M2 might otherwise halt.
Only-if: we can also do the reverse and build M2 from M1: wherever M1 has a final state, M2 has no move (so it halts); wherever M1 has no move on some symbol in a nonfinal state, add a transition to a new state r that loops forever on any symbol.

Falling Off the Left End of the Tape
A funny situation arises when the TM tries to move left from the leftmost cell and so cannot continue. This situation is not halting. Nor does a TM accept if it tries to enter an accepting state as it falls off the left end. We can prevent falling off the left end by marking the leftmost cell, as in the book. But it appears we do not need to do so in order to prove the equivalence of halting and accepting, since neither occurs when the TM falls off the left end.

Stupid Turing Machine Tricks
We can create structured state names and tape symbols:
a state named [q, X], where X is in Γ;
a tape symbol [P, X], where P is * or blank and X is the real symbol.

Structured State Names
Use them, for example, for swapping cells on the tape:
[Figure: swapping two adjacent cells on a tape with Γ = {a, b}. From r: a/a,R → [q,a]; b/b,R → [q,b]. From [q,a]: b/a,L → [p,b]; a/a,L → [p,a]. From [q,b]: b/b,L → [p,b]; a/b,L → [p,a]. From [p,b]: b/b,R and a/b,R → s. From [p,a]: b/a,R and a/a,R → s.]
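The swapping machine can be written down directly by letting Python tuples stand in for structured state names: `("q", x)` plays the role of [q, x]. This is a sketch based on our reading of the swap diagram; the names `SWAP` and `run_until_halt` are hypothetical, and the simulator starts with the head on the leftmost cell and extends the tape with the blank "B" if needed. The state remembers one symbol at a time, which is all a finite control can do.

```python
SIGMA = "ab"

# Structured state names: the tuple ("q", x) plays the role of [q,x].
SWAP = {}
for x in SIGMA:
    # In r: leave the first cell's symbol x and remember it in the state.
    SWAP[("r", x)] = (("q", x), x, "R")
for held in SIGMA:
    for seen in SIGMA:
        # In [q,held]: overwrite the second cell with held, remember what was there.
        SWAP[(("q", held), seen)] = (("p", seen), held, "L")
        # In [p,held]: overwrite the first cell with held and finish in s.
        SWAP[(("p", held), seen)] = ("s", held, "R")


def run_until_halt(delta, start, tape, max_steps=100):
    """Run from the leftmost cell until no move is defined; return (tape, state, head)."""
    cells, head, state = list(tape), 0, start
    for _ in range(max_steps):
        if head == len(cells):
            cells.append("B")  # blank beyond the input
        move = delta.get((state, cells[head]))
        if move is None:
            break
        state, cells[head], direction = move
        head += 1 if direction == "R" else -1
        if head < 0:
            raise IndexError("fell off the left end of the tape")
    return "".join(cells), state, head


if __name__ == "__main__":
    print(run_until_halt(SWAP, "r", "ab"))  # ('ba', 's', 1)
```

Only the first two cells are touched, so on abab the result is baab.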

Structured Tape Symbols
Use them to simulate multi-tape TMs; * marks the cell to be read next.
[Figure: single-tape TM using structured tape symbols: the tape a b c is encoded as [ ,a] [*,b] [ ,c].]
[Figure: two-tape TM using structured tape symbols: the tapes a b c and d e f are encoded on one tape as [ ,a, ,d] [*,b, ,e] [ ,c,*,f].]

Example: Multiple Tracks
A common use for multiple tracks is to use one track for data and the other for a single "mark." Symbols of Γ are pairs [A, X], where X is the "real" symbol and A is either B (blank) or *. Input symbol a is identified with [B, a]. The blank is [B, B].
Here's a program to find the *, assuming it is somewhere to the left of the present position:
1. δ(q, [B, X]) = (q, [B, X], L).
2. δ(q, [*, X]) = (p, [B, X], R).

Other TM Models
While regular and CF languages are classes of languages that we defined by convenient notations (REs, CFGs, etc.), no one supposed that they represented "everything we can compute." The purpose of the TM was to define "everything we can compute." For convenience, we use recognition of languages as the space of possibly computable things; other spaces, e.g., computing arithmetic functions, yield the same conclusions.

Everything we can compute?
Are TMs powerful enough to represent everything we can compute? Can we make a more powerful machine than the TM? Add more tapes? Stacks? Memory? Nondeterminism? Real computers? Seen another way: does adding these facilities make a more powerful machine?

Multitape TMs
Allow the TM to have some finite number of tapes k, with a head for each tape. A move is a function of the state and the symbol scanned by each tape head. Action = new state, new symbol for each tape, and a head motion for each tape (L, R, or S, for "stationary"). The first tape holds the input; the other tapes are initially blank.

Many Tapes to One Tape Simulation
To simulate k tapes, use one tape with 2k tracks. One track holds the contents of each tape.
Another track holds a mark (*) representing the head position of that tape.
[Figure: a pair of tracks, one holding tape contents W X Y Z and the other holding a * above the cell that tape's head is scanning.]
To simulate one move of the multitape TM, the one-tape TM must remember how many *'s are to its left.
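The track layout can be sketched by packing k tapes into one tape whose cells are 2k-tuples, alternating a mark track and a contents track for each simulated tape, as in the two-tape figure. This is an illustrative sketch only: the names `encode` and `scanned_symbols` are hypothetical, and "B" again stands for the blank. A single left-to-right sweep recovers the symbol under every simulated head.

```python
BLANK = "B"
MARK = "*"


def encode(tapes, heads):
    """Pack k tapes into one tape of 2k-track cells:
    (mark_1, symbol_1, mark_2, symbol_2, ..., mark_k, symbol_k)."""
    width = max(max(len(t) for t in tapes), max(heads) + 1)
    cells = []
    for pos in range(width):
        cell = []
        for tape, head in zip(tapes, heads):
            cell.append(MARK if pos == head else BLANK)           # mark track
            cell.append(tape[pos] if pos < len(tape) else BLANK)  # contents track
        cells.append(tuple(cell))
    return cells


def scanned_symbols(cells, k):
    """One sweep over the single tape: return the symbol under each simulated head."""
    found = [None] * k
    for cell in cells:
        for i in range(k):
            if cell[2 * i] == MARK:
                found[i] = cell[2 * i + 1]
    return found


if __name__ == "__main__":
    cells = encode(["abc", "def"], [1, 2])
    print(cells)                      # matches [ ,a, ,d] [*,b, ,e] [ ,c,*,f]
    print(scanned_symbols(cells, 2))  # ['b', 'f']
```

A real one-tape TM cannot collect the scanned symbols in a list; it keeps them (and the count of *'s seen so far) in its finite state, which is possible because k is fixed.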

Moves
1. Move left, then right, visiting all the *'s to see what each tape head is scanning.
2. Decide on the multitape TM's move, based on the scanned symbols and its state (remembered in the state of the one-tape TM).
3. Visit each * again, making the necessary adjustments: change symbols and move *'s one cell left or right, as needed.
Important observation for when we study polynomial-time TMs: if the multitape TM makes T(n) moves when the input is of length n, then the one-tape TM makes $O(T(n)^2)$ moves. Thus, if the multitape TM takes polynomial time, so does the one-tape TM.
Key point in the proof: the *'s can't get more than T(n) cells apart, so one move is simulated in O(T(n)) + k moves of the one-tape TM (k = a constant to account for reversing direction to write a symbol). This happens at most T(n) times, so we get $O(T(n)^2)$.
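The counting in the key point can be written out as a short derivation. Here c is a constant introduced only for this sketch: we assume each sweep over the marked region of at most T(n) + 1 cells, including the return trip and the adjustment pass, costs at most c·T(n) one-tape moves, and we assume T(n) ≥ n so that the initial conversion of the input into track format is absorbed into the bound.

```latex
\text{one-tape moves}
  \;\le\; \sum_{t=1}^{T(n)} \bigl( c\,T(n) + k \bigr)
  \;=\; c\,T(n)^2 + k\,T(n)
  \;=\; O\bigl(T(n)^2\bigr).
% If T(n) = O(n^d), this is O(n^{2d}): still polynomial.
```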