Words with the Smallest Number of Closed Factors

Similar documents
ON THE LEAST NUMBER OF PALINDROMES IN AN INFINITE WORD

Special Factors and Suffix and Factor Automata

Open and closed factors of Arnoux-Rauzy words arxiv: v1 [math.co] 12 Oct 2018 Olga Parshina 1,2 and Luca Zamboni 1

arxiv: v1 [math.co] 22 Jan 2013

The free group F 2, the braid group B 3, and palindromes

arxiv: v2 [math.co] 24 Oct 2012

Generalized Thue-Morse words and palindromic richness

A connection between palindromic and factor complexity using return words

Discrete Mathematics

Bourget-du-lac cedex, France. Extended Abstract

Palindromic complexity of codings of rotations

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

A characterization of balanced episturmian sequences

Combinatorics on Finite Words and Data Structures

arxiv: v1 [cs.dm] 16 Jan 2018

On different generalizations of episturmian words

Aperiodic languages and generalizations

FIXED POINTS OF MORPHISMS AMONG BINARY GENERALIZED PSEUDOSTANDARD WORDS

Unbordered Factors and Lyndon Words

Complexité palindromique des codages de rotations et conjectures

Two-dimensional Total Palindrome Complexity

On Strong Alt-Induced Codes

Generalized Thue-Morse words and palindromic richness extended abstract

Pascal Ochem 1 and Elise Vaslet Introduction REPETITION THRESHOLDS FOR SUBDIVIDED GRAPHS AND TREES

About Duval Extensions

Closed, palindromic, rich, privileged, trapezoidal, and balanced words in automatic sequences

One-relation languages and ω-code generators

arxiv: v1 [math.co] 27 May 2012

Online Computation of Abelian Runs

PALINDROMIC RICHNESS

Combinatorics On Sturmian Words. Amy Glen. Major Review Seminar. February 27, 2004 DISCIPLINE OF PURE MATHEMATICS

Properties of Fibonacci languages

arxiv: v1 [cs.dm] 13 Feb 2010

An Optimal Lower Bound for Nonregular Languages

#A36 INTEGERS 12 (2012) FACTOR FREQUENCIES IN LANGUAGES INVARIANT UNDER SYMMETRIES PRESERVING FACTOR FREQUENCIES

Theoretical Computer Science. Completing a combinatorial proof of the rigidity of Sturmian words generated by morphisms

Binary words containing infinitely many overlaps

arxiv: v1 [math.co] 25 Apr 2018

Hierarchy among Automata on Linear Orderings

Sturmian Words, Sturmian Trees and Sturmian Graphs

On Properties and State Complexity of Deterministic State-Partition Automata

3515ICT: Theory of Computation. Regular languages

Quasi-Linear Time Computation of the Abelian Periods of a Word

On Stateless Multicounter Machines

CS375: Logic and Theory of Computing

Jérémie Chalopin 1 and Pascal Ochem 2. Introduction DEJEAN S CONJECTURE AND LETTER FREQUENCY

ON HIGHLY PALINDROMIC WORDS

The Fixed String of Elementary Cellular Automata

Theoretical Computer Science

BOUNDS ON ZIMIN WORD AVOIDANCE

Product of Finite Maximal p-codes

A PASCAL-LIKE BOUND FOR THE NUMBER OF NECKLACES WITH FIXED DENSITY

Automata on linear orderings

Simple equations on binary factorial languages

Context Free Language Properties

DENSITY OF CRITICAL FACTORIZATIONS

Factorizations of the Fibonacci Infinite Word

IAENG International Journal of Applied Mathematics, 46:2, IJAM_46_2_08. A Solution to Yamakami s Problem on Non-uniform Context-free Languages

Involution Palindrome DNA Languages

Random Reals à la Chaitin with or without prefix-freeness

arxiv: v1 [math.co] 13 Sep 2017

Independence of certain quantities indicating subword occurrences

Square-free words with square-free self-shuffles

Finite repetition threshold for large alphabets

Bichain graphs: geometric model and universal graphs

Bordered Conjugates of Words over Large Alphabets

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Counting Peaks and Valleys in a Partition of a Set

CS 275 Automata and Formal Language Theory

An algebraic characterization of unary two-way transducers

arxiv: v1 [cs.dm] 14 Oct 2016

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Theory of Computation

The commutation with ternary sets of words

Automata Theory and Formal Grammars: Lecture 1

On the Sequence A and Its Combinatorial Interpretations

Ogden s Lemma. and Formal Languages. Automata Theory CS 573. The proof is similar but more fussy. than the proof of the PL4CFL.

Solution to CS375 Homework Assignment 11 (40 points) Due date: 4/26/2017

GENERALIZED PALINDROMIC CONTINUED FRACTIONS

On the Complete Join of Permutative Combinatorial Rees-Sushkevich Varieties

Introduction to Metalogic 1

STAR-FREE GEODESIC LANGUAGES FOR GROUPS

Results on Transforming NFA into DFCA

arxiv:math/ v2 [math.gr] 19 Oct 2007

Languages and monoids with disjunctive identity

On the Average Complexity of Brzozowski s Algorithm for Deterministic Automata with a Small Number of Final States

Arturo Carpi 1 and Cristiano Maggi 2

On Shyr-Yu Theorem 1

Word-representability of line graphs

Palindromic complexity of infinite words associated with simple Parry numbers

INITIAL POWERS OF STURMIAN SEQUENCES

The asymptotic number of prefix normal words

European Journal of Combinatorics

Transducers for bidirectional decoding of prefix codes

Power of controlled insertion and deletion

A ternary square-free sequence avoiding factors equivalent to abcacba

On the coinductive nature of centralizers

POLYNOMIAL ALGORITHM FOR FIXED POINTS OF NONTRIVIAL MORPHISMS

arxiv:cs/ v1 [cs.cc] 9 Feb 2007

Transcription:

Words with the Smallest Number of Closed Factors Gabriele Fici Zsuzsanna Lipták Abstract A word is closed if it contains a factor that occurs both as a prefix and as a suffix but does not have internal occurrences. We show that any word of length n contains at least n + 1 closed factors (i.e., factors that are closed words). We investigate the language L of words over the alphabet {a, b} containing exactly n + 1 closed factors. We show that a word belongs to L if and only if its closed factors and its palindromic factors coincide (and therefore the words in L are rich words). We also show that L coincides with the language of conjugates of words in a b. 1 Introduction A word is a finite sequence of elements from a finite set Σ. We refer to the elements of Σ as letters and to Σ as the alphabet. The i-th letter of a word w is denoted by w i. Given a word w = w 1 w 2 w n, with w i Σ for 1 i n, the nonnegative integer n is the length of w, denoted by w. The empty word has length zero and is denoted by ε. The set of all words over Σ is denoted by Σ. Any subset of Σ is called a language. A prefix (resp. a suffix) of a word w is any word u such that w = uz (resp. w = zu) for some word z. A factor of w is a prefix of a suffix (or, equivalently, a suffix of a prefix) of w. The set of prefixes, suffixes and factors of the word w are denoted by Pref(w), Suff(w) and Fact(w) respectively. A border of a word w is any word in Pref(w) Suff(w) different from w. From the definitions, we have that ε occurs as a prefix, suffix and factor in any word. An occurrence of a factor u in a word w is a pair of positions (i, j) such that w i... w j = u. An occurrence is internal if i > 1 and j < w. The word w = w n w n 1 w 1 is called the reversal (or mirror image) of w. A palindrome is a word w such that w = w. In particular, the empty word is a palindrome. A conjugate of a word w is any word of the form vu such that uv = w, for some u, v Σ. A conjugate of a word w is also called a rotation of w. Let w be a word. We denote by PAL(w) the set of factors of w that are palindromes. Droubay, Justin and Pirillo showed [4] that for any word w of length n, one has PAL(w) n + 1. Consequently, w is called rich [6] (or full [1]) if PAL(w) = n + 1, that is, if it contains the largest number of palindromes a word of length n can contain. A language L is called factorial if L = Fact(L), i.e., if L contains all the factors of its words. A language L is extendible if for every word w L, there exist letters a, b Σ such that awb L. The language of rich words over a fixed alphabet Σ is an example of factorial and extendible language. We now recall the definition of closed word [5]: I3S, CNRS & Université Nice Sophia Antipolis, France. Email: gabriele.fici@unice.fr. Università di Verona, Italy. Email: zsuzsanna.liptak@univr.it.

Definition 1. A word w is closed if it is empty or has a factor occurring exactly twice in w, as a prefix and as a suffix of w. The word aba is closed, since its factor a appears only as a prefix and as a suffix. The word abaa, instead, is not closed. Note that for any letter a Σ and for any n > 0, the word a n is closed, a n 1 being a factor occurring only as a prefix and as a suffix in it. More generally, any word w that is a power of a shorter word, i.e., w = v n for a non-empty v and n > 1, is closed. There exist closed words that are not palindromes, for example the word abab. Conversely, there exist palindromes that are not closed, but it is worth noticing that a shortest palindrome over a two-letter alphabet that is not closed has length 14. An example is aabbabaababbaa. Remark 1. The notion of closed word is closely related to the concept of complete return to a factor, as considered in [6]. A complete return to the factor u in a word w is any factor of w having exactly two occurrences of u, one as a prefix and one as a suffix. Hence w is closed if and only if it is a complete return to one of its factors; such a factor is clearly both the longest repeated prefix and the longest repeated suffix of w (that is, the longest border of w). The notion of closed word is also equivalent to that of periodic-like word [3]. A word w is periodic-like if its longest repeated prefix does not have two occurrences in w followed by different letters. Observation 1. Let w be a non-empty word over Σ. The following characterizations of closed words follow easily from the definition: 1. w has a factor occurring exactly twice in w, as a prefix and as a suffix of w; 2. the longest repeated prefix of w does not have internal occurrences in w, that is, occurs in w only as a prefix and as a suffix; 3. the longest repeated suffix of w does not have internal occurrences in w, that is, occurs in w only as a suffix and as a prefix; 4. the longest repeated prefix of w does not have two occurrences in w followed by different letters; 5. the longest repeated suffix of w does not have two occurrences in w preceded by different letters; 6. w has a border that does not have internal occurrences in w; 7. the longest border of w does not have internal occurrences in w; 8. w is the complete return to its longest prefix; 9. w is the complete return to its longest border. For more details on closed words and related results see [3, 2, 5]. 2 Closed factors Let w be a word. A factor of w that is a closed word is called a closed factor of w. The set of closed factors of the word w is denoted by C(w). Lemma 2. For any non-empty word w of length n, one has C(w) n + 1.

Proof. We show that every position of w is the ending position of an occurrence of a distinct closed factor of w. Thus w contains at least n non-empty closed factors, and the claim follows. Indeed, let v be the longest non-empty closed factor ending in position i, so that w i v +1 w i = v. Since a is closed for every a Σ, such a factor always exists. If v did not occur before in w, then we are done. Otherwise, let j be the largest position smaller than i such that w j v +1 w j = v. Set v = w j v +1 w i and observe that v is a closed factor ending in i, with longest border v. But v > v, in contradiction to the choice of v. Lemma 3. Let u, v be non-empty words. Then C(u) + C(v) C(uv) + 1. Proof. We have C(u) C(uv) and C(v) C(uv). If C(u) C(v) = {ε}, then the claim is proved. Otherwise, let z be a non-empty word in C(u) C(v). Then z is counted twice on the left side of the equation, but only once on the right. We will find a closed factor y(z) of uv which is neither a factor of u nor of v, i.e., y(z) C(uv) \ (C(u) C(v)). Since by construction, the function y is injective, the claim follows. Let uv = w = w 1 w n, and let j be the smallest integer greater than u such that z = w j w j+ z 1. Thus, j is the starting position of the first occurrence of z in v. Since z occurs in u, z has at least one occurrence in w before j. Let i be the greatest integer smaller than j such that z = w i w i+ z 1. So i is the starting position in w of the last occurrence of z before j. In particular, i u, but the occurrence can be completely within u, or overlapping between u and v. Now define z = w i w j+ z 1. By construction, z is a closed factor of w having z as longest border. If z C(u), then set y(z) = z, and we are done. Otherwise, repeat the construction for z. Eventually, we will find a closed factor y(z) = z (k) of w = uv that is not in C(u). Then y(z) C(w) \ (C(u) C(v)), as claimed. Notice that, by construction, j + z 1 is the ending position of the first occurrence of y(z) in w. Thus, the function y is injective, and the proof is complete. Proposition 4. Let w be a non-empty word of length n. If C(w) PAL(w), then C(w) = PAL(w) and C(w) = PAL(w) = n + 1. In particular, w is a rich word. Proof. On the one hand, from Corollary 2, one has C(w) n + 1. On the other hand, one has PAL(w) n + 1. Hence, if C(w) PAL(w), then it must be C(w) = PAL(w) and C(w) = PAL(w) = n + 1, and so w is a rich word. Bucci et al. showed [2, Proposition 4.3] that a word w is rich if and only if every closed factor v of w has the property that the longest palindromic prefix (or suffix) of v is unrepeated in v. Moreover, if w is a palindrome, then it is rich if and only if PAL(w) C(w) [2, Corollary 5.2]. In Section 4, we shall prove that the condition PAL(w) = C(w) characterizes the words having the smallest number of closed factors over a binary alphabet. 3 Words with the smallest number of closed factors By Corollary 2, we have that n + 1 is a lower bound on the number of closed factors of a word of length n > 0. We introduce the following definition: Definition 5. A word w Σ is C-poor if C(w) = w + 1. We also set L Σ = {w Σ : C(w) = w + 1} the language of C-poor words over the alphabet Σ. Remark 2. If Σ = 1, then L Σ = Σ. So in what follows we will suppose Σ 2.

Lemma 6. The language L Σ is closed under reversal. Proof. It follows from the definition that a word w Σ is closed if and only if its reversal w is closed. Therefore, we have that w L Σ if and only if w L Σ. Lemma 7. Let w be a C-poor word over the alphabet Σ and x Σ. The word wx (resp. xw) is C-poor if and only if it has a unique suffix (resp. prefix) that is closed and is not a factor of w. Proof. We prove the statement for wx, the one for xw will follow by symmetry. The if part is straightforward. For the only if part, recall from Lemma 2 that there is at least one new closed factor ending in every position, so in particular wx has at least one suffix that is closed and is not a factor of w. Suppose that there is another closed suffix of wx that is not a factor of w. Then C(wx) > C(w) + 1 = w + 2 = wx + 1 and therefore wx does not belong to L Σ. Proposition 8. A word w Σ belongs to L Σ if and only if every factor of w belongs to L Σ. That is, L Σ is a factorial language. Proof. If all factors of a word w are C-poor words, then, in particular, w itself is a C-poor word. Conversely, suppose by contradiction that there exists a C-poor word w containing a factor v that is not a C-poor word, i.e., w L Σ, w = uvz and C(v) > v + 1. By Proposition 3, C(w) C(u) + C(v) + C(z) 2 > u + z + v + 1 = w + 1 and therefore w cannot be a C-poor word. 4 Binary words In this section we fix the alphabet Σ = {a, b}. For simplicity of exposition, we will denote the language of C-poor words over {a, b} by L rather than by L {a,b}. We first recall the definition of bitonic word. Definition 9. A word w {a, b} is bitonic if it is a conjugate of a word in a b, i.e., if it is of the form a i b j a k or b i a j b k for integers i, j, k 0. The following lemma relates bitonic words to closed factors. Lemma 10. If a word w {a, b} does not contain any complete return to ab or ba as a factor, then it is bitonic. Proof. Straightforward. Lemma 11. Let w be a bitonic word. Then C(w) PAL(w). Proof. Since w is bitonic, a closed factor of w can only be the complete return to a power of a single letter. So a closed factor u of w is of the form u = a n, u = b n, u = a n b m a n or u = b n a m b n, for some n, m > 0, and these words are all palindromes. Thus, by Proposition 4, any bitonic word w of length n > 0 contains exactly n + 1 closed factors and so is a C-poor word. In the rest of the section we shall prove the converse, that is, we shall prove that if w is a C-poor word over {a, b}, then w is bitonic. Consider the word w = abab. The word w does not belong to L, since it has 6 closed factors, namely ε, a, b, aba, bab and abab. In fact, it has two suffixes (bab and abab) that are closed and do not appear before in w, and hence by Proposition 8 it cannot belong to L. More generally, any word u such that u is the complete return to ab or ba does not belong to L for the same reason. So, using Proposition 8, we get:

Lemma 12. If w L, then w does not contain any complete return to ab or ba as a factor. We summarize the characterizations of L in the following theorem: Theorem 13. Let w {a, b}. The following are equivalent: 1. w L; 2. C(w) = PAL(w); 3. C(w) PAL(w); 4. w is a bitonic word; 5. w does not contain any complete return to ab or ba. Proof. 1) 5) by Lemma 12; 5) 4) by Lemma 10; 4) 3) by Lemma 11; finally, 3) 2) and 2) 1) by Proposition 4. So, by Theorem 13 and Proposition 4, every word in L is rich. Notice that there exist rich words that are not in L, for example the word w = abab, which has 6 closed factors, namely ε, a, b, aba, bab and abab. Another consequence of Theorem 13 is that L is extendible, since the language of bitonic words is clearly extendible. Thus, the language L is a factorial and extendible subset of the language of (binary) rich words. It further follows from Theorem 13 that L is a regular language, since the language of the conjugates of words of a regular language is regular [7]. In the following proposition we exhibit a closed enumerative formula for the language L. Proposition 14. For every n > 0, there are exactly n 2 n + 2 distinct words in L. Proof. Each of the n 1 words of length n > 0 in a + b + has n distinct rotations, while for the words a n and b n all the rotations coincide. Thus, there are n(n 1) + 2 bitonic words of length n, and the statement follows from Theorem 13. 5 Conclusion and open problems In this paper we studied words with the smallest number of closed factors, which we referred to as C-poor words. We gave some interesting characterizations in the case of a binary alphabet. In particular, we showed that the language of binary C-poor words coincides with the language of bitonic words. A natural direction of further investigation is finding a characterization for C-poor words over alphabets larger than 2. An enumerative formula for rich words is not known, not even in the binary case. A possible approach to this problem is to separate rich words in subclasses to be enumerated separately. Our enumerative formula for C-poor words given in Proposition 14 constitutes a step in this direction.

References [1] S. Brlek, S. Hamel, M. Nivat, and C. Reutenauer. On the palindromic complexity of infinite words. Internat. J. Found. Comput. Sci., 15:293 306, 2004. [2] M. Bucci, A. de Luca, and A. De Luca. Rich and Periodic-Like Words. In DLT 2009, 13th International Conference on Developments in Language Theory, volume 5583 of Lecture Notes in Comput. Sci., pages 145 155. Springer, 2009. [3] A. Carpi and A. de Luca. Periodic-like words, periodicity and boxes. Acta Inform., 37:597 618, 2001. [4] X. Droubay, J. Justin, and G. Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theoret. Comput. Sci., 255(1-2):539 553, 2001. [5] G. Fici. A Classification of Trapezoidal Words. In WORDS 2011, 8th International Conference on Words, number 63 in Electronic Proceedings in Theoretical Computer Science, pages 129 137, 2011. [6] A. Glen, J. Justin, S. Widmer, and L. Q. Zamboni. Palindromic richness. European J. Combin., 30:510 531, 2009. [7] J.E. Hopcroft, R. Motwani, and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2001.