Open Problems in Automata Theory: An Idiosyncratic View


Open Problems in Automata Theory: An Idiosyncratic View Jeffrey Shallit School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1 Canada shallit@cs.uwaterloo.ca http://www.cs.uwaterloo.ca/~shallit 1/50

Outline 1. The separating words problem 2. Bucher's problem on separating context-free languages 3. Nonzero Hankel determinants 4. The Endrullis-Hendriks transducer problem 5. The Oldenburger-Kolakoski problem 6. Primes and automata 7. Divisibility and automata 8. A universality problem 2/50

The Simplest Computational Problem? Imagine a stupid computing device with very limited powers... What is the simplest computational problem you could ask it to solve? 3/50

The Simplest Computational Problem? - not the addition of two numbers - not sorting - it's telling two inputs apart - distinguishing them Thanks to Gavin Rymill for letting me use his Dalek image. 4/50

Our Computational Model: the Finite Automaton Our main computational model is the deterministic finite automaton, or DFA. We also consider nondeterministic finite automata, or NFA. 5/50

Motivation We want to know how many states suffice to tell one length-n input from another. On average, it's easy, but how about in the worst case? Motivation: a classical problem from the early days of automata theory: Given two automata, how big a word do we need to distinguish them? 6/50

Motivation More precisely, given two DFAs M_1 and M_2, with m and n states, respectively, with L(M_1) ≠ L(M_2), what is a good bound on the length of the shortest word accepted by one but not the other? The cross-product construction gives an upper bound of mn − 1 (make a DFA for the symmetric difference of L(M_1) and L(M_2)). But an upper bound of m + n − 2 follows from the usual algorithm for minimizing automata. Furthermore, this bound is best possible. For NFAs the bound is exponential in m and n. 7/50

Separating Words with Automata Our problem is the inverse problem: given two distinct words, how big an automaton do we need to separate them? That is, given two words w and x of length n, what is the smallest number of states in any DFA that accepts one word, but not the other? Call this number sep(w,x). 8/50

Separation A machine M separates the word w from the word x if M accepts w and rejects x, or vice versa. For example, the machine below separates 0010 from 1000. [diagram: a 3-state DFA over {0, 1}] However, no 2-state DFA can separate these two words. So sep(1000, 0010) = 3. 9/50
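For words this short, the value sep(1000, 0010) = 3 can be verified mechanically. The brute-force sketch below (the helper names and structure are mine, not from the talk) enumerates every DFA with a given number of states, start state 0, and every possible set of accepting states, growing the state count until some machine separates the two words.

```python
from itertools import product

def accepts(delta, finals, w):
    """Run the DFA (transition table delta, accepting set finals) on w from state 0."""
    q = 0
    for c in w:
        q = delta[(q, c)]
    return q in finals

def sep(w, x):
    """Smallest number of DFA states separating w from x, by exhaustive search."""
    alphabet = sorted(set(w + x))
    n = 1
    while True:
        keys = [(q, c) for q in range(n) for c in alphabet]
        # try every transition table over n states...
        for trans in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, trans))
            # ...and every choice of accepting states
            for bits in range(2 ** n):
                finals = {q for q in range(n) if bits >> q & 1}
                if accepts(delta, finals, w) != accepts(delta, finals, x):
                    return n
        n += 1
```

This confirms that no 2-state DFA separates 1000 from 0010, since the search only succeeds at n = 3.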

Separating Words of Different Length Easy case: if the two words are of different lengths, both ≤ n, we can separate them with a DFA of size O(log n). For by the prime number theorem, if k ≠ m and k, m ≤ n, then there is a prime p = O(log n) such that k ≢ m (mod p). So we can accept one word and reject the other by using a cycle mod p, and the appropriate residue class. 10/50

Separating Words of Different Length Example: suppose |w| = 22 and |x| = 52. Then |w| ≡ 1 (mod 7) and |x| ≡ 3 (mod 7). So we can accept w and reject x with a DFA that uses a cycle of size 7. [diagram: a 7-state cycle DFA] 11/50
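The mod-p trick is easy to make concrete. A minimal sketch (function names are my own): find the smallest prime at which the two lengths differ, then simulate the p-state cycle DFA, which accepts exactly the words whose length lies in the chosen residue class.

```python
def smallest_separating_prime(m, n):
    """Smallest prime p with m and n in different residue classes mod p."""
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    p = 2
    while True:
        if is_prime(p) and m % p != n % p:
            return p
        p += 1

def cycle_dfa_accepts(p, residue, word):
    """The p-state cycle DFA: accept iff the word's length is ≡ residue (mod p)."""
    return len(word) % p == residue
```

For lengths 22 and 52 the difference is 30 = 2 · 3 · 5, so the first usable prime is 7, matching the slide.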

Separating Words with Automata A similar idea works if the strings have a different number of 1's, or if the 1's are in different positions, or if the number of occurrences of a short subword is different, etc. 12/50

Separating Words With Automata Let S(n) := max sep(w, x) over all pairs with |w| = |x| = n and w ≠ x: the smallest number of states required to separate any two strings of length n. The separation problem was first studied by Goralcik and Koubek in 1986, who proved S(n) = o(n). In 1989 Robson obtained the best known bound: S(n) = O(n^(2/5) (log n)^(3/5)). 13/50

Separating Words with Automata For equal-length strings, S(n) doesn't depend on alphabet size (provided it is at least 2). Suppose x, y are distinct strings of length n over an alphabet Σ of size > 2. Then they must differ in some position, say x = x' a x'' and y = y' b y'' with a ≠ b. Map a to 0, b to 1, and assign all other letters arbitrarily to either 0 or 1. This gives two new distinct strings X and Y of the same length. If X and Y can be separated by an m-state DFA, then so can x and y, by renaming transitions to be over Σ instead of 0 and 1. 14/50

Separating Words With Automata Not the best upper bound: Theorem (Robson, 1996). We can separate words by computing the parity of the number of 1's occurring in positions congruent to i (mod j), for i, j = O(√n). This gives the bound S(n) = O(n^(1/2)). Open Problem 1 ($100): Improve Robson's upper bound of O(n^(2/5) (log n)^(3/5)) on S(n). 15/50

Separating Words With Automata: Lower Bound Claim: S(n) = Ω(log n). To see this, consider the two strings 0^(t−1+lcm(1,2,...,t)) 1^(t−1) and 0^(t−1) 1^(t−1+lcm(1,2,...,t)). Proof in pictures: [diagram: a DFA with a 0-tail leading into a 0-cycle, followed by a 1-tail leading into a 1-cycle] 16/50

Separating Words With Automata: Lower Bound So no t-state machine can distinguish these strings. Since lcm(1,2,...,t) = e^(t+o(t)) by the prime number theorem, the lower bound S(n) = Ω(log n) follows. 17/50

Variations on Separating Words Separation by context-free grammars; count number of productions. Problem: right-hand sides can be arbitrarily complicated. Solution: Use CFGs in Chomsky normal form (CNF), where all productions are of the form A → BC or A → a. 18/50

Variations on Separating Words In 1999 Currie, Petersen, Robson and JOS proved: If |w| ≠ |x| then there is a CFG in CNF with O(log log n) productions separating w from x. Furthermore, this bound is optimal. If |w| = |x| there is a CFG in CNF with O(log n) productions separating w from x. There is a lower bound of Ω(log n / log log n). Open Problem 2 ($10): Find matching upper and lower bounds for CFGs in the case |w| = |x|. 19/50

More Variations on Separating Words Separation by NFA. Do NFAs give more power? Yes: for example, sep(0001, 0111) = 3 but nsep(0001, 0111) = 2. 20/50

More Variations on Separating Words Is sep(x, w)/nsep(x, w) unbounded? Yes. Consider once again the strings w = 0^(t−1+lcm(1,2,...,t)) 1^(t−1) and x = 0^(t−1) 1^(t−1+lcm(1,2,...,t)) where t = n^2 − 3n + 2, n ≥ 4. 21/50

We know from before that any DFA separating these strings must have at least t + 1 = n^2 − 3n + 3 states. Now consider the following NFA M: [diagram: from the start state, a 0-cycle of n states and a 0-cycle of n − 1 states, each with a transition on 1 to an accepting state] The language accepted by this NFA is {0^a : a ∈ A} 1, where A is the set of all integers representable as a non-negative integer linear combination of n and n − 1. 22/50

But t − 1 = n^2 − 3n + 1 ∉ A. On the other hand, every integer ≥ t is in A. Hence w = 0^(t−1+lcm(1,2,...,t)) 1^(t−1) is accepted by M but x = 0^(t−1) 1^(t−1+lcm(1,2,...,t)) is not. M has 2n = Θ(√t) states, so sep(x, w)/nsep(x, w) = Ω(√t) = Ω(√(log |x|)), which is unbounded. 23/50
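The two facts about A used here are the Frobenius theorem for the coprime pair n, n − 1 (largest non-representable integer n(n−1) − n − (n−1) = n^2 − 3n + 1), and they are easy to confirm numerically. A quick sketch of mine, checked for n = 5, where t = 12:

```python
def representable(a, n):
    """Is a = i*n + j*(n-1) for some non-negative integers i, j?"""
    # for each feasible i, check whether the remainder is a multiple of n-1
    return any((a - i * n) % (n - 1) == 0 for i in range(a // n + 1))

n = 5
t = n * n - 3 * n + 2                  # t = 12
print(representable(t - 1, n))         # 11 = n^2 - 3n + 1 is NOT in A
print(all(representable(a, n) for a in range(t, 100)))  # every integer >= t is
```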

More Variations on Separating Words Open Problem 3 ($50): Find good bounds on nsep(w, x) for |w| = |x| = n, as a function of n. Open Problem 4 ($50): Find good bounds on sep(w, x)/nsep(w, x). 24/50

More Variations on Separating Words Must sep(w^R, x^R) = sep(w, x)? No: for w = 1000, x = 0010, we have sep(w, x) = 3 but sep(w^R, x^R) = 2. Open Problem 5 ($10): Is sep(x, w) − sep(x^R, w^R) unbounded? 25/50

More Variations on Separating Words Two words are conjugates if one is a cyclic shift of the other. Is the separating words problem any easier if restricted to pairs of conjugates? 26/50

Another Kind of Separation Suppose you have regular languages R_1, R_2 with R_1 ⊆ R_2 and R_2 − R_1 infinite. Then it is easy to see that there is a regular language R_3 with R_1 ⊆ R_3 ⊆ R_2 such that R_2 − R_3 and R_3 − R_1 are both infinite. This is a kind of topological separation property. 27/50

Another Kind of Separation In 1980, Bucher asked: Open Problem 6 ($100): Is the same true for context-free languages? That is, given context-free languages L_1, L_2 with L_1 ⊆ L_2 and L_2 − L_1 infinite, need there be a context-free language L_3 with L_1 ⊆ L_3 ⊆ L_2 such that L_2 − L_3 and L_3 − L_1 are both infinite? Not even known in the case where L_2 = Σ*. 28/50

The primitive words problem Speaking of context-free languages, it's still not known if the language of primitive words over {0, 1} is context-free. A word is primitive if it is a non-power. This is Open Problem 7 ($200). A forthcoming 500-page book by Dömösi, Horváth, and Ito called Context-Free Languages and Primitive Words will appear in June 2014. 29/50

The Thue-Morse sequence The Thue-Morse sequence t = t_0 t_1 t_2 ··· = 011010011001011010010110 ··· can be described in many ways: by the recurrence t_(2n) = t_n and t_(2n+1) = 1 − t_n; as the fixed point of the map 0 → 01 and 1 → 10; as the sequence generated by the following DFA, where the input is n expressed in base 2. [diagram: a 2-state automaton that stays put on 0 and toggles on 1] 30/50
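The descriptions above translate directly into code. A small sketch of mine using the map-iteration view (each doubling step appends the bit-complement of the prefix built so far, which is exactly iterating 0 → 01, 1 → 10):

```python
def thue_morse(n):
    """First n terms of Thue-Morse via t_(2n) = t_n, t_(2n+1) = 1 - t_n."""
    t = [0]
    while len(t) < n:
        t += [1 - b for b in t]   # one application of 0 -> 01, 1 -> 10
    return t[:n]
```

The base-2 DFA description amounts to saying t_i is the parity of the number of 1 bits of i, which the generated prefix confirms.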

Avoidability One of the beautiful properties of the Thue-Morse word is that it avoids overlaps. An overlap is a subword (factor) of the form axaxa, where a is a single letter and x is a word. We'd like to generalize this. One possible generalization involves Hankel determinants. 31/50

Avoidability An n × n Hankel determinant of a sequence (a_i)_(i≥0) is the determinant of the matrix whose rows are a_k a_(k+1) ... a_(k+n−1); a_(k+1) a_(k+2) ... a_(k+n); ...; a_(k+n−1) a_(k+n) ... a_(k+2n−2) (the (i, j) entry is a_(k+i+j)). If a sequence of real numbers has an overlap a_k a_(k+1) ··· a_(k+2n−2) = axaxa, then the corresponding n × n Hankel matrix has its first and last rows both equal to axa, and so the determinant is 0. 32/50

Avoidability Is there a sequence on two real numbers for which all the Hankel determinants (of all orders) are nonzero? No: a simple backtracking argument proves that the longest such sequence is of length 14. For example, 1,1,2,1,1,2,2,1,2,1,1,2,1,2. How about sequences over three numbers? Open Problem 8 ($100): Is there a sequence on three real numbers for which all the Hankel determinants are nonzero? 33/50

Avoidability Indeed, a backtracking algorithm easily finds such a sequence on {1,2,3} with 200 terms. What is interesting is that this algorithm never had to backtrack! The sequence begins 1,1,2,1,1,2,2,1,2,1,1,2,1,2,3,1,1,2,1,... 34/50

Avoidability Furthermore, the following morphism seems to generate such a sequence on 4 symbols: 1 → 12, 2 → 23, 3 → 14, 4 → 32. This generates the sequence 1223231423141232... Open Problem 9 ($50): Does this sequence have all nonzero Hankel determinants? We have checked up to length 800. 35/50
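Open Problem 9 invites experiment. The sketch below (the helper names are mine) iterates the morphism and checks every Hankel determinant that fits inside a prefix, using exact rational arithmetic so there is no floating-point doubt about a determinant being zero.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:                      # row swap flips the sign
            A[i], A[pivot] = A[pivot], A[i]
            result = -result
        result *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
    return result

def all_hankel_nonzero(seq):
    """Check every n x n Hankel determinant that fits inside seq."""
    N = len(seq)
    for n in range(1, (N + 1) // 2 + 1):
        for k in range(N - 2 * n + 2):
            H = [[seq[k + i + j] for j in range(n)] for i in range(n)]
            if det(H) == 0:
                return False
    return True

def morphic(n):
    """Length-n prefix of the fixed point of 1->12, 2->23, 3->14, 4->32."""
    m = {"1": "12", "2": "23", "3": "14", "4": "32"}
    s = "1"
    while len(s) < n:
        s = "".join(m[c] for c in s)
    return s[:n]
```

As sanity checks, an overlap such as 1,2,1,2,1 forces a zero determinant, while the length-14 example from the previous slide and short prefixes of the morphic sequence pass.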

The Endrullis-Hendriks Problem on Transducers For two infinite words w and x, we write w ≤ x if we can transform x into w by a finite-state transducer. If w ≤ x and x ≤ w then we write w ≈ x. We call an infinite word w prime if whenever x ≤ w, either x ≈ w or x is ultimately periodic. 36/50

The Endrullis-Hendriks Problem on Transducers Example: the infinite word 0 1 0^2 1 0^3 1 0^4 1 ··· is prime. Open Problem 10 ($20): are there any other (inequivalent) prime words over two letters? Open Problem 11 ($20): is the Thue-Morse word prime? 37/50

The Oldenburger-Kolakoski problem Speaking of transducers, consider the following transducer: [diagram: a 2-state transducer with transitions 1/1 and 2/11 from one state, 1/2 and 2/22 from the other] The fixed point of this transducer is k = 12211212212211211221211..., the Oldenburger-Kolakoski word. Open Problem 12 ($100): Do the frequencies of letters exist? Are they 1/2? 38/50
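The word k is self-describing: it equals its own sequence of run lengths. That yields a simple generator (a sketch of mine, not the transducer itself) with which one can tabulate the letter frequencies empirically:

```python
def kolakoski(n):
    """First n terms of the Oldenburger-Kolakoski word: the sequence of
    1s and 2s that equals its own run-length encoding."""
    seq = [1, 2, 2]
    i = 2                                    # seq[i] = length of the next run
    while len(seq) < n:
        seq.extend([3 - seq[-1]] * seq[i])   # runs alternate between 1 and 2
        i += 1
    return seq[:n]

prefix = kolakoski(10 ** 5)
print(prefix.count(1) / len(prefix))   # empirically very close to 1/2
```

The empirical frequency hovers near 1/2, but as the slide says, even the existence of the limit is open.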

Primes and automata Open Problem 13 ($50): Is the following question recursively solvable? Given a DFA M over the alphabet Σ_k = {0, 1, ..., k − 1}, does M accept the base-k representation of at least one prime number? If it were, we could decide if there are any more Fermat primes after the largest known one (2^16 + 1). 39/50

Divisibility and automata Open Problem 14 ($50): Is the following question recursively solvable? Given a DFA M over the alphabet Σ_k × Σ_k, does it accept the base-k representation of a pair of integers (x, y) with x | y? 40/50

A universality problem Classical problem: given M, a machine of some type (e.g., DFA, NFA, PDA), decide if L(M) = Σ*. Unsolvable if M is a PDA; PSPACE-complete if M is an NFA; solvable in polynomial time if M is a DFA. 41/50

A universality problem A simple variation: Given M, does there exist an integer n ≥ 0 such that Σ^n ⊆ L(M)? Still unsolvable for PDAs. NP-complete for DFAs. PSPACE-hard for NFAs, but is it in PSPACE? This is Open Problem 15 ($25). 42/50

A universality problem It would follow that this problem is in PSPACE if we knew that whenever Σ^r ⊆ L(M) for some r, there always exists a small such r (e.g., r ≤ exp(p(n)) for some polynomial p, where n is the size of M). To see this, on input the machine M, we just examine every length l up to the bound r and nondeterministically guess a string of length l (symbol by symbol) that fails to be accepted. All this can be done in NPSPACE and hence PSPACE by Savitch's theorem. 43/50

A related problem Here is a related problem. Open Problem 16 ($25): what is the complexity of the following problem: given a finite language L, is L* co-finite? If L is represented by a regular expression, this problem is NP-hard. 44/50

Another related problem Open Problem 17 ($25): What is the complexity of the following problem: given a finite list of words L over an alphabet Σ, is Fact(L*) = Σ*? Here by Fact we mean the set of all factors (contiguous subwords). Similarly, we'd like to find good bounds on the length of the shortest word in Σ* − Fact(L*), given that Fact(L*) ≠ Σ*. This is Open Problem 18 ($200). Currently examples of quadratic length are known, but the best upper bound is doubly exponential in the length of the longest word in L. 45/50

One More for Dessert: Pierce Expansions Let a > b > 0 be integers. Define b_0 = b and b_(i+1) = a mod b_i for i ≥ 0. Let P(a, b) be the least index n such that b_n = 0. 46/50

One More for Dessert: Pierce Expansions An example with a = 35, b = 22: b_0 = 22; b_1 = 35 mod 22 = 13; b_2 = 35 mod 13 = 9; b_3 = 35 mod 9 = 8; b_4 = 35 mod 8 = 3; b_5 = 35 mod 3 = 2; b_6 = 35 mod 2 = 1; b_7 = 35 mod 1 = 0. So P(35, 22) = 7. 47/50
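The recursion is a one-liner to implement; the sketch below (names are mine) reproduces this example and also evaluates the construction a = lcm(1, 2, ..., n) − 1, b = n, where each step decreases b_i by exactly 1.

```python
from math import gcd
from functools import reduce

def pierce_length(a, b):
    """P(a, b): number of iterations of b -> a mod b needed to reach 0."""
    n = 0
    while b:
        b = a % b
        n += 1
    return n

def lcm_up_to(n):
    """lcm(1, 2, ..., n)."""
    return reduce(lambda x, y: x * y // gcd(x, y), range(1, n + 1), 1)

print(pierce_length(35, 22))               # the worked example: 7
print(pierce_length(lcm_up_to(6) - 1, 6))  # a = 59, b = 6 gives P = 6
```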

One More for Dessert: Pierce Expansions Open Problem 19 ($200): Find good estimates for how big P(a, b) can be, as a function of a. The problem is interesting because it is related to the so-called Pierce expansion of b/a: 22/35 = (1/1)(1 − (1/2)(1 − (1/3)(1 − (1/4)(1 − (1/11)(1 − (1/17)(1 − 1/35)))))). 48/50

One More for Dessert: Pierce Expansions It is known that P(a, b) = O(a^(1/3)). However, the true behavior is probably O((log a)^2). There is a lower bound of Ω(log a), which can be obtained by choosing a = lcm(1, 2, ..., n) − 1 and b = n. I offer $200 for a significant improvement to either the known upper or lower bound. 49/50

For Further Reading J. M. Robson, Separating words with machines and groups, RAIRO Inform. Théor. Appl. 30 (1996), 81–86. J. Currie, H. Petersen, J. M. Robson, and J. Shallit, Separating words with small grammars, J. Autom. Lang. Combin. 4 (1999), 101–110. N. Rampersad, J. Shallit, and Z. Xu, The computational complexity of universality problems for prefixes, suffixes, factors, and subwords of regular languages, Fundam. Inform. 116 (2012), 223–236. 50/50