An introduction to finite automata and their connection to logic


Howard Straubing, Pascal Weil

To cite this version: Howard Straubing, Pascal Weil. An introduction to finite automata and their connection to logic. Deepak D'Souza, Priti Shankar. Modern applications of automata theory, World Scientific, pp.3-43, 2012, IISc Research Monographs. <hal-00541028v2>

HAL Id: hal-00541028
https://hal.archives-ouvertes.fr/hal-00541028v2
Submitted on 21 Sep 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Chapter 1

An Introduction to Finite Automata and their Connection to Logic

Howard Straubing (*)
Computer Science Department, Boston College, Chestnut Hill, Massachusetts, USA

Pascal Weil (†)
LaBRI, Université de Bordeaux and CNRS, Bordeaux, France

(*) Work partially supported by NSF Grant CCF-0915065
(†) Work partially supported by ANR 2010 BLAN 0202 01 FREC

This introductory chapter is a tutorial on finite automata. We present the standard material on determinization and minimization, as well as an account of the equivalence of finite automata and monadic second-order logic. We conclude with an introduction to the syntactic monoid, and as an application give a proof of the equivalence of first-order definability and aperiodicity.

1.1. Introduction

1.1.1. Motivation

The word automaton (plural: automata) was originally used to refer to devices like clocks and watches, as well as mechanical marvels built to resemble moving humans and animals, whose internal mechanisms are hidden and which thus appear to operate spontaneously. In theoretical computer science, the finite automaton is among the simplest models of computation: a device that can be in one of finitely many states, and that receives a discrete sequence of inputs from the outside world, changing its state accordingly. This is in marked contrast to more general and powerful models of computation, such as Turing machines, in which the set of global states of the device (the so-called instantaneous descriptions) is infinite. A finite automaton is more akin to the control unit of the Turing machine (or, for that matter, the control unit of a modern computer processor), in which the present state of the unit and the input symbol under the reading head determine the next state of the unit, as well as signals to move the reading head left or right and to write a symbol on the machine's tape. The crucial distinction is that while the Turing machine can record and consult its entire computation history, all the information that a finite automaton can use about the sequence of inputs it has seen is represented in its current state.

But as rudimentary as this computational model may appear, it has a rich theory, and many applications. In this introductory chapter, we will present the core theory: that of a finite automaton reading a finite word, that is, a finite string of inputs, and using the resulting state to decide whether to accept or reject the word. The central question motivating our presentation is to determine what properties of words can be decided by finite automata. Subsequent chapters will present both generalizations of the basic model (to devices that read infinite words, labeled trees, etc.) and applications. An important theme in this chapter, as well as throughout the volume, is the close connection between automata and formal logic.

1.1.2. Plan of the chapter

In Section 1.2, we introduce finite automata as devices for recognizing formal languages, and show the equivalence of several variants of the basic model, most notably the equivalence of deterministic and nondeterministic automata. Section 1.3 describes Büchi's sequential calculus, the framework in predicate logic for describing properties of words that are recognizable by finite automata. In Section 1.4 we prove what might well be described as the two fundamental theorems of finite automata: that the languages recognized by finite automata are exactly those definable by sentences of the sequential calculus, and also exactly those definable by rational expressions (also called regular expressions). Section 1.5 presents methods that can be used to show certain languages cannot be recognized by finite automata. The last sections, 1.6 and 1.7, have a more algebraic flavor: we introduce both the minimal automaton and the syntactic monoid of a language, and prove the important McNaughton-Schützenberger theorem describing the languages definable in the first-order fragment of the sequential calculus.

1.1.3. Notation

Throughout this chapter, A denotes a finite alphabet, that is, a finite non-empty set. Elements of A are called letters, and a finite sequence of letters is called a word. We denote words simply by concatenating the letters, so, for example, if A = {a,b,c}, then a string such as abac is a word over A. The empty sequence is considered a word, and we use ε to denote this sequence. The set of all words over A is denoted A*, and the set of all nonempty words is denoted A+. The length of the word w, that is, the number of letters in w, is denoted |w|. If u,v ∈ A* then we can form a new word uv by concatenating the two sequences. Concatenation of words is obviously an associative and (unless A has a single element) noncommutative operation on A*. We have |uv| = |u| + |v|, and uε = εu = u.

(Other texts frequently use Λ or 1 to denote the empty word. The latter choice is justified by the second equation above.) A subset of A* is called a language over A.

1.1.4. Historical note and references

This chapter contains a modern presentation of material that goes back more than fifty years. The reader can find other accounts in classic papers and texts: The equivalence of finite automata and rational expressions given in Section 1.4 was first described by Kleene in [9]. The connection with monadic second-order logic was found independently by Trakhtenbrot [24] and Büchi [1]. Nondeterministic automata were introduced by Rabin and Scott [17], who showed their equivalence to deterministic automata. Minimization of finite-state devices (framed in the language of switching circuits built from relays) is due to Huffman [8]. The simple congruential account of minimization that we give originates with Myhill [13] and Nerode [14]. The equivalence of aperiodicity of the syntactic monoid with star-freeness is due to Schützenberger [19], and the connection with first-order logic is from McNaughton and Papert [11]. Our account of these results relies heavily on an argument given in Wilke [23].

Rational expressions, determinization and minimization have become part of the basic course of study in theoretical computer science, and as such are described in a number of undergraduate textbooks. Hopcroft and Ullman [7], Lewis and Papadimitriou [10] and the more recent Sipser [20] are notable examples. A more technical and algebraically-oriented account is given in the monograph by Eilenberg [4, 5]. An algebraic view of automata is developed by Sakarovitch [18]. Detailed accounts of the connection between automata, logic and algebra can be found in Straubing [21] and Thomas [22]. The state of the art, especially concerning the algebraic classification of automata, will appear in the forthcoming handbook [16].

1.2. Automata and rational expressions

1.2.1. Operations on languages

We describe here a collection of basic operations on languages, which will be building blocks in the characterization of the expressive power of automata.

Since languages over A are subsets of A*, we may of course consider the boolean operations: union, intersection and complement. The product operation on words can be naturally extended to languages: if K and L are languages over A, we define their concatenation product KL to be the set of all products of a word in K followed by a word in L:

KL = {uv | u ∈ K and v ∈ L}.

We also use the power notation for languages: if n > 0, L^n is the product LL⋯L of n copies of L. We let L^0 = {ε}. Note that if n > 1, L^n differs from the set of n-th powers of the elements of L. The iteration (or Kleene star) of a language L is the language

L* = ⋃_{n≥0} L^n.

Finally, we introduce a simple rewriting operation, based on the use of morphisms. If A and B are alphabets, a morphism from A* to B* is a mapping ϕ: A* → B* such that
(1) ϕ(ε) = ε,
(2) for all u,v ∈ A*, ϕ(uv) = ϕ(u)ϕ(v).
To specify such a morphism, it suffices to give the images of the letters of A. Then the image of a word u ∈ A*, say u = a_1⋯a_n, is obtained by taking the concatenation of the images of the letters, ϕ(u) = ϕ(a_1)⋯ϕ(a_n). That is, ϕ(a_1⋯a_n) is obtained from a_1⋯a_n by substituting for each letter a_i the word ϕ(a_i). This operation naturally extends from words to languages: if L ⊆ A*, then ϕ(L) = {ϕ(u) | u ∈ L}.

The consideration of these operations leads to the classical definition of rational languages (also called regular languages). The operations of union, concatenation and iteration are called the rational operations. A language over alphabet A is called rational if it can be obtained from the letters of A by applying (a finite number of) rational operations. More formally, the class of rational languages over the alphabet A, denoted Rat A*, is the least class of languages such that
(1) the languages ∅ and {a} are rational for each letter a ∈ A;
(2) if K and L are rational languages, then K ∪ L, KL and L* are also rational.

Example 1.1. A language such as ((ab)*A* ∪ A*(ba)*)^2 is rational. (Note that in order to lighten the notation, we write a, b, etc., instead of {a}, {b}.) The language {ε}, containing just the empty word, is rational. Indeed, it is equal to ∅*. Any finite language (that is, containing only finitely many words) is rational. Let a, b ∈ A be distinct letters. It is instructive to show that the following languages are rational: (a) the set of all words which do not contain two consecutive a; (b) the set of all words which contain the factor ab but not the factor ba.
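The operations of this section can be tried out on finite languages. The following is a minimal sketch, assuming that words are represented as Python strings (with the empty string for ε) and finite languages as Python sets; the function names are illustrative and do not come from the text. Since L* is infinite as soon as L contains a nonempty word, the sketch only enumerates the words of L* up to a given length.

```python
def concat(K, L):
    """Concatenation product KL = {uv | u in K and v in L}."""
    return {u + v for u in K for v in L}

def power(L, n):
    """L^n, with L^0 = {ε}; the empty word is the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, max_len):
    """Finite approximation of L*: the words of L* of length at most max_len."""
    words, frontier = {""}, {""}
    while frontier:
        # Extend the newest words by one more factor from L, keep only new short words.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - words
        words |= frontier
    return words

def morphic_image(phi, L):
    """Image of L under the morphism given by the images phi[a] of the letters."""
    return {"".join(phi[a] for a in u) for u in L}

L = {"a", "ab"}
print(sorted(power(L, 2)))            # the product LL
print(sorted(u + u for u in L))       # the squares of the elements of L
print(sorted(star_up_to({"ab"}, 4)))
print(sorted(morphic_image({"a": "01", "b": ""}, {"ab", "ba", "aa"})))
```

With L = {a, ab}, the first two printed sets differ, which illustrates the remark that L^n is in general not the set of n-th powers of the elements of L.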
We also consider the extended rational operations: these are the rational operations, and the operations of intersection, complement and morphic image. A language is said to be extended rational if it can be obtained from the letters of A by applying (a finite number of) extended rational operations. The class of extended rational languages over A is written X-Rat A*. Of course, all rational languages are extended rational. The definition of extended rational languages offers more expressive possibilities but as we will see, they are not properly more expressive than rational languages.

1.2.2. Automata

Let us start with a couple of examples.

Example 1.2. A coffee machine delivers a cup of coffee for €0.25. It accepts only coins of €0.20, €0.10 and €0.05. While determining whether it has received a sufficient sum, the machine is in one of six states, q_0, q_0.05, q_0.1, q_0.15, q_0.2 and q_0.25. The names of the states correspond to the sum already received. The machine changes state after a new coin is inserted, and the new state it assumes is a function of the value of the new coin inserted and of the sum already received. The latter information is encoded in the current state of the machine. Here, the input word is the sequence of coins inserted, and the alphabet consists of three letters, w, t and f, standing respectively for twenty cents, ten cents and five cents. The machine is represented in Figure 1.1. The incoming arrow indicates the initial state of the machine (q_0), and the outgoing arrow indicates the only accepting state (q_0.25), that is, the state in which the machine will indeed prepare a cup of coffee for you. Notice that the machine does not return change, but that it will accept sums up to €0.40.

[Fig. 1.1. The automaton of a (simplified) coffee machine]

Example 1.3. Our second example (Figure 1.2) reads an integer, given by its binary expansion and read from right to left, that is, starting with the bit of least weight. Upon reading this word on alphabet {0, 1}, the automaton decides whether the given integer is divisible by 3 or not. For instance, consider the integer 19, in binary expansion 10011: our input word is 11001. It is read letter by letter, starting from the initial state (the state indicated by an incoming arrow, state r_0). After each new letter is read, we follow the corresponding edge starting at the current state. Thus, starting in state r_0, we visit successively the states r'_1, r_0, r'_0, r_0 again, and finally r'_1. This state is not accepting (it is not marked with an outgoing edge), so the word 11001 is not accepted by the automaton. And indeed, 19 is not divisible by 3. In contrast, 93 is divisible by 3, which is confirmed by running its binary expansion, namely 1011101, read from right to left, through the automaton: starting in state r_0, we end in state r'_0. The reader will quickly see that this automaton is constructed in such a way that, if n is an integer and w_n is the binary expansion of n, then the state reached when reading w_n from right to left, starting in state r_0, is r_k (resp. r'_k) if n is congruent to k (mod 3) and w_n has even (resp. odd) length.

[Fig. 1.2. An automaton to compute mod 3 remainders; its states are r_0, r_1, r_2, r'_0, r'_1, r'_2]

We now turn to a formal definition. A (finite state) automaton on alphabet A is a 4-tuple 𝒜 = (Q,T,I,F) where Q is a finite set, called the set of states, T is a subset of Q × A × Q, called the set of transitions, and I and F are subsets of Q, called respectively the sets of initial states and final states. Final states are also called accepting states.

For instance, the automaton of Example 1.2 uses a 3-letter alphabet, A = {f,t,w}. Formally, it is the automaton 𝒜 = (Q,T,I,F) given by Q = {q_0, q_0.05, q_0.1, q_0.15, q_0.2, q_0.25}, I = {q_0}, F = {q_0.25} and T is a 15-element subset of Q × A × Q containing such triples as (q_0,f,q_0.05), (q_0.1,t,q_0.2) or (q_0.2,w,q_0.25).

As in our first examples, it is often convenient to represent an automaton 𝒜 = (Q,T,I,F) by a labeled graph, whose vertices are the elements of Q (the states) and whose edges are of the form q -a→ q' if (q,a,q') is a transition, that is, if (q,a,q') ∈ T. The initial states are specified by an incoming arrow, and the final states are specified by an outgoing edge. From now on, we will most often specify our automata by their graphical representations.

Example 1.4. Here, the alphabet is A = {a,b}. Figure 1.3 represents the automaton 𝒜 = (Q,T,I,F) where Q = {1,2,3}, I = {1}, F = {3} and T = {(1,a,1),(1,b,1),(1,a,2),(2,b,3),(3,a,3),(3,b,3)}.
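The 4-tuple of the coffee machine can be written out explicitly. The sketch below is one possible encoding, not the authors': states are represented by the sum received in cents, and the transition rule (any coin that brings the sum to 25 cents or more leads to the accepting state, from which no transition leaves) is an assumption inferred from the description of Example 1.2, since the figure itself is not available here.

```python
# Letters of the alphabet and the value of the corresponding coin, in cents.
COINS = {"f": 5, "t": 10, "w": 20}

# State q_s is encoded by the integer s (in cents): q_0.05 is 5, q_0.25 is 25, etc.
Q = {0, 5, 10, 15, 20, 25}
I = {0}
F = {25}

# Assumed rule: inserting a coin adds its value, capped at 25 cents (no change
# is returned); no transition leaves the accepting state.
T = {(q, a, min(q + value, 25)) for q in Q - F for a, value in COINS.items()}

print(len(T))
print((0, "f", 5) in T, (10, "t", 20) in T, (20, "w", 25) in T)
```

Under this encoding T has 15 elements and contains the three triples quoted in the text, which is at least consistent with the formal description given there.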

[Fig. 1.3. An automaton accepting A*abA*]

[Fig. 1.4. Another automaton accepting A*abA*]

1.2.2.1. The language accepted by an automaton

A path in automaton 𝒜 is a sequence of consecutive edges,

p = (q_0,a_1,q_1)(q_1,a_2,q_2)⋯(q_{n−1},a_n,q_n),

also drawn as

p = q_0 -a_1→ q_1 -a_2→ q_2 ⋯ -a_n→ q_n.

Then we say that p is a path of length n from q_0 to q_n, labeled by the word u = a_1a_2⋯a_n. By convention, for each state q, there exists an empty path from q to q labeled by the empty word. For instance, in the automaton of Figure 1.3, the word ab^3a labels exactly four paths: from 1 to 1, from 1 to 2, from 1 to 3 and from 3 to 3.

A path p is successful if its initial state is in I and its final state is in F. A word w is accepted (or recognized) by 𝒜 if there exists a successful path in the automaton with label w. And the language accepted (or recognized) by 𝒜 is the set of labels of successful paths in 𝒜. It is denoted by L(𝒜). We say that 𝒜 accepts (or recognizes) L(𝒜).

For instance, the language of the automaton of Figure 1.1 is finite, with exactly 27 words. The automaton of Figure 1.3 accepts the set of words in which at least one occurrence of a is followed immediately by b, namely A*abA*, where A = {a,b}.

Different automata may recognize the same language: if 𝒜 and ℬ are automata such that L(𝒜) = L(ℬ), we say that 𝒜 and ℬ are equivalent.

Example 1.5. The language A*abA*, accepted by the automaton in Figure 1.3, is also recognized by the automaton in Figure 1.4.

A language L is said to be recognizable if it is recognized by an automaton.
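The definitions of path and acceptance translate directly into a small program. The sketch below counts the paths labeled by a word between two states and declares a word accepted when at least one such path is successful. The automaton of Example 1.4 is copied from the text; the coffee machine uses an assumed encoding (states as sums in cents, sums capped at 25), and the function names are illustrative only.

```python
from itertools import product

def count_paths(T, word, start, end):
    """Number of paths labeled `word` from `start` to `end`, given the transitions T."""
    counts = {start: 1}                       # state -> number of paths reaching it
    for a in word:
        nxt = {}
        for (p, b, q) in T:
            if b == a and p in counts:
                nxt[q] = nxt.get(q, 0) + counts[p]
        counts = nxt
    return counts.get(end, 0)

def accepts(automaton, word):
    """A word is accepted if it labels at least one successful path."""
    Q, T, I, F = automaton
    return any(count_paths(T, word, i, f) > 0 for i in I for f in F)

# Automaton of Example 1.4 (Figure 1.3).
A14 = ({1, 2, 3},
       {(1, "a", 1), (1, "b", 1), (1, "a", 2), (2, "b", 3), (3, "a", 3), (3, "b", 3)},
       {1}, {3})
Q, T, I, F = A14

# Pairs (origin, end) joined by a path labeled abbba, with the number of such paths.
paths = {(p, q): count_paths(T, "abbba", p, q) for p in Q for q in Q}
print({pair: n for pair, n in paths.items() if n > 0})

# Acceptance compared with "contains the factor ab" on all words of length < 7.
print(all(accepts(A14, "".join(w)) == ("ab" in "".join(w))
          for n in range(7) for w in product("ab", repeat=n)))

# Coffee machine of Example 1.2, under the assumed encoding described above.
COINS = {"f": 5, "t": 10, "w": 20}
coffee = (set(range(0, 30, 5)),
          {(q, a, min(q + v, 25)) for q in range(0, 25, 5) for a, v in COINS.items()},
          {0}, {25})
accepted = ["".join(w) for n in range(7) for w in product("ftw", repeat=n)
            if accepts(coffee, "".join(w))]
print(len(accepted))
```

Words longer than five letters cannot be accepted by the coffee machine, because every coin adds at least five cents and no transition leaves the accepting state, so enumerating words up to length 6 covers the whole language.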

[Fig. 1.5. Two automata accepting the same language: ℬ and ℬ_comp]

1.2.2.2. Complete automata

An automaton 𝒜 = (Q,T,I,F) on alphabet A is said to be complete if, for each state q ∈ Q and each letter a ∈ A, there exists at least one transition of the form (q,a,q'): in a graphical representation, this means that, for each letter of the alphabet, there is an edge labeled by that letter starting from each state. Naturally, this easily implies that, for each state q and each word w ∈ A*, there exists at least one path labeled w starting at q.

Every automaton can easily be turned into an equivalent complete automaton. If 𝒜 = (Q,T,I,F) is not complete, the completion of 𝒜 is the automaton 𝒜_comp = (Q',T',I,F) given by Q' = Q ∪ {z}, where z is a new state not in Q, and T' is obtained by adding to T all triples (z,a,z) (a ∈ A) and all triples (q,a,z) (q ∈ Q, a ∈ A) such that there is no element of the form (q,a,q') in T. If 𝒜 is complete, we let 𝒜_comp = 𝒜. It is immediate that, in every case, 𝒜_comp is complete and L(𝒜_comp) = L(𝒜).

Example 1.6. Let A = {a,b}. The automaton ℬ in Figure 1.5 is evidently not complete. The automaton ℬ_comp, which accepts the same language, is represented next to it.

1.2.2.3. Trim automata

A complete automaton reads its entire input before deciding to accept or reject it: whatever input it receives, there is a transition that can be followed. However, we have seen that in the completion 𝒜_comp of a non-complete automaton 𝒜, state z does not participate in any successful path: it is in a way a useless state. Trimming an automaton removes such useless states; it is, in a sense, the opposite of completing an automaton, and aims at producing a more concise device.

A state q of an automaton 𝒜 is said to be accessible if there exists a path in 𝒜 starting from some initial state and ending at q. State q is co-accessible if there exists a path in 𝒜 starting from q and ending at some final state. Observe that a state is both accessible and co-accessible if and only if it is visited by at least one successful path.

The automaton 𝒜 itself is trim if all its states are both accessible and co-accessible: in a trim automaton, each state is useful, in the sense that it is used in accepting some word of the language L(𝒜).
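Completion and trimming can be sketched in a few lines. The code below follows the definitions above, computing accessible and co-accessible states by a breadth-first search over the transition graph; the small two-state automaton used at the end is a hypothetical example (it accepts only the word a) and is not the automaton ℬ of Figure 1.5.

```python
def complete(Q, T, I, F, alphabet, sink="z"):
    """Completion: send every missing (state, letter) pair to a new sink state."""
    assert sink not in Q
    defined = {(p, a) for (p, a, _) in T}
    missing = {(q, a, sink) for q in Q for a in alphabet if (q, a) not in defined}
    if not missing:                      # already complete: return it unchanged
        return Q, T, I, F
    loops = {(sink, a, sink) for a in alphabet}
    return Q | {sink}, T | missing | loops, I, F

def reachable(start, edges):
    """States reachable from `start` along `edges` (pairs (p, q)), breadth first."""
    seen, frontier = set(start), set(start)
    while frontier:
        frontier = {q for (p, q) in edges if p in frontier} - seen
        seen |= frontier
    return seen

def trim(Q, T, I, F):
    """Restrict the automaton to its accessible and co-accessible states."""
    forward = {(p, q) for (p, _, q) in T}
    backward = {(q, p) for (p, q) in forward}
    useful = reachable(I, forward) & reachable(F, backward)
    return (useful,
            {(p, a, q) for (p, a, q) in T if p in useful and q in useful},
            I & useful, F & useful)

# Hypothetical automaton over {a, b} accepting only the word "a".
B = ({1, 2}, {(1, "a", 2)}, {1}, {2})
B_comp = complete(*B, alphabet="ab")
print(sorted(B_comp[1], key=str))        # transitions of the completion
print(trim(*B_comp) == B)                # trimming removes the sink again
```

On this example trimming the completion gives back the original automaton, in line with the remark that the sink state z takes part in no successful path.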

Of course, every automaton 𝒜 is equivalent to a trim one, written 𝒜_trim, obtained by restricting 𝒜 to its accessible and co-accessible states and to the transitions between them.

Interestingly, 𝒜_trim can be constructed efficiently, using breadth-first search. One first computes the accessible states of 𝒜, by letting Q_0 = I (the initial states are certainly accessible) and by computing iteratively

Q_{n+1} = Q_n ∪ ⋃_{q ∈ Q_n, a ∈ A} {q' ∈ Q | (q,a,q') ∈ T}.

One verifies that the elements of Q_n are the states that can be reached from an initial state, reading a word of length at most n; and that if two consecutive sets Q_n and Q_{n+1} are equal, then Q_n = Q_m for all m ≥ n, and Q_n is the set of accessible states of 𝒜. In particular, the set of accessible states is computed in at most |Q| steps. A similar procedure, starting from the final states instead of the initial states, and working in reverse, produces in at most |Q| steps the set of co-accessible states of 𝒜. The automaton 𝒜_trim is then immediately constructed.

Remark 1.1. The construction of 𝒜_trim, or indeed, just of the set of accessible states of 𝒜, provides an efficient solution of the emptiness problem: given an automaton 𝒜, is the language L(𝒜) empty, or does 𝒜 accept at least one word? Indeed, 𝒜 recognizes the empty set if and only if no final state is accessible: in order to decide the emptiness problem for automaton 𝒜, it suffices to construct the set of accessible states of 𝒜 and verify whether it contains a final state. This yields an O(|Q|^2 |A|) algorithm.

1.2.2.4. Epsilon-automata

It is sometimes convenient to extend the notion of automata to the so-called ε-automata: the difference from ordinary automata is that we also allow ε-labeled transitions, of the form (p,ε,q) with p,q ∈ Q.

Proposition 1.1. Every ε-automaton is equivalent to an ordinary automaton.

Sketch of proof. Let 𝒜 = (Q,T,I,F) be an ε-automaton, and let R be the relation on Q given by p R q if there exists a path from p to q consisting only of ε-labeled transitions (that is: R is the reflexive transitive closure of the relation defined by the ε-labeled transitions of 𝒜). Let 𝒜' be the (ordinary) automaton given by the tuple (Q,T',I',F) with

T' = {(p,a,q) | a ∈ A, (p,a,q') ∈ T and q' R q for some q' ∈ Q},
I' = {q | p R q for some p ∈ I}.

Then 𝒜' is equivalent to 𝒜.
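The construction in this sketch of proof can be made concrete as follows. This is only an illustration under stated assumptions: ε-labels are represented by the empty string, the relation R is computed as a set of states reachable by ε-transitions, and the two-state ε-automaton used at the end (which accepts the words made of a's followed by b's) is a hypothetical example.

```python
EPS = ""   # assumption of this sketch: the empty string stands for the label ε

def eps_closure(states, T):
    """All q such that p R q for some p in `states` (reflexive transitive closure)."""
    closure, frontier = set(states), set(states)
    while frontier:
        frontier = {q for (p, a, q) in T if a == EPS and p in frontier} - closure
        closure |= frontier
    return closure

def remove_epsilon(Q, T, I, F):
    """The ordinary automaton (Q, T', I', F) built in the proof of Proposition 1.1."""
    T_prime = {(p, a, q)
               for (p, a, q1) in T if a != EPS
               for q in eps_closure({q1}, T)}
    I_prime = eps_closure(I, T)
    return Q, T_prime, I_prime, F

# Hypothetical ε-automaton: loop on a in state 1, ε-move to state 2, loop on b.
Q, T, I, F = {1, 2}, {(1, "a", 1), (1, EPS, 2), (2, "b", 2)}, {1}, {2}
Q2, T2, I2, F2 = remove_epsilon(Q, T, I, F)
print(sorted(T2))
print(sorted(I2))
```

Note that the set of final states is left unchanged: the ε-moves that may precede a final state are absorbed into T' and I'.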

1.2.3. Deterministic automata

Example 1.7. Consider the automaton of Figure 1.3, say 𝒜, and the automaton ℬ of Figure 1.4. Both recognize the language L = A*abA*, but there is an important, qualitative difference between them. We have defined automata as nondeterministic computing devices: given a state and an input letter, there may be several possible choices for the next state. Thus an input word might be associated with many different computation paths, and the word is accepted if one of these paths ends at an accepting state. In contrast, ℬ has the convenient property that each input word labels at most one computation path.

These remarks are formalized in the following definition. An automaton 𝒜 = (Q,T,I,F) is said to be deterministic if it has exactly one initial state, and if, for each letter a and for all states q,q',q'',

(q,a,q'), (q,a,q'') ∈ T ⟹ q' = q''.

Thus, of the automata in Figures 1.3 and 1.4, the second one is deterministic, and the first is non-deterministic. This definition imposes a certain condition of uniqueness on transitions, that is, on paths of length 1. This property is then extended to longer paths by a simple induction.

Proposition 1.2. Let 𝒜 be a deterministic automaton and let w be a word.
(1) For each state q of 𝒜, there exists at most one path labeled w starting at q.
(2) If w ∈ L(𝒜), then w labels exactly one successful path.

In particular, we can represent the set of transitions of a deterministic automaton 𝒜 = (Q,T,I,F) by a transition function: the (possibly partial) function δ: Q × A → Q which maps each pair (q,a) ∈ Q × A to the state q' such that (q,a,q') ∈ T (if it exists). This function is then naturally extended to the set Q × A*: if q ∈ Q and w ∈ A*, δ(q,w) is the state q' such that there exists a path from q to q' labeled by w in 𝒜 (if such a state exists). In the sequel, deterministic automata will be specified as 4-tuples (Q,δ,i,F) instead of the corresponding (Q,T,{i},F). We note the following elementary characterization of δ.

Proposition 1.3. Let 𝒜 = (Q,δ,i,F) be a deterministic automaton. Then we have
(1) δ(q,ε) = q;
(2) δ(q,ua) = δ(δ(q,u),a) if both δ(q,u) and δ(δ(q,u),a) exist, and δ(q,ua) is undefined otherwise;
(3) u ∈ L(𝒜) if and only if δ(i,u) ∈ F,
for each state q, each word u ∈ A* and each letter a ∈ A.
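Proposition 1.3 is in effect a recipe for running a deterministic automaton on a word. The sketch below stores a possibly partial transition function as a Python dictionary and applies it to the mod 3 automaton of Example 1.3. The figure of that automaton is not reproduced here, so the transition table is an assumed reconstruction from the property stated in the example: the pair (k, 0) stands for r_k, the pair (k, 1) for r'_k, and the accepting states are taken to be r_0 and r'_0.

```python
def extend(delta, q, word):
    """δ(q, w) computed letter by letter; None stands for 'undefined'."""
    for a in word:
        if q is None:
            return None
        q = delta.get((q, a))
    return q

# Assumed encoding: state (k, parity) means "value read so far is k mod 3 and the
# number of bits read so far has this parity".
delta = {}
for k in range(3):
    for parity in (0, 1):
        weight = 1 if parity == 0 else 2   # 2**j mod 3 is 1 for even j and 2 for odd j
        for bit in "01":
            delta[((k, parity), bit)] = ((k + int(bit) * weight) % 3, 1 - parity)

initial = (0, 0)
final = {(0, 0), (0, 1)}

def accepts(word):
    """u is accepted if and only if δ(i, u) is a final state."""
    return extend(delta, initial, word) in final

print(extend(delta, initial, "11001"), accepts("11001"))       # 19, least bit first
print(extend(delta, initial, "1011101"), accepts("1011101"))   # 93, least bit first
```

A quick sanity check of the reconstruction is to compare `accepts(bin(n)[2:][::-1])` with `n % 3 == 0` over a range of integers n.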

[Fig. 1.6. The subset automaton of the automaton in Figure 1.3, with states {1}, {1,2}, {1,3}, {1,2,3} in the first row and {2}, {3}, {2,3} in the second row]

Again, it turns out that every automaton is equivalent to a deterministic automaton. This deterministic automaton can be effectively constructed, although the algorithm (the so-called subset construction) is more complicated than those used to construct complete or trim automata.

Let 𝒜 = (Q,T,I,F) be an automaton. The subset transition function of 𝒜 is the function δ: P(Q) × A → P(Q) defined, for each P ⊆ Q and each a ∈ A, by

δ(P,a) = {q ∈ Q | ∃p ∈ P, (p,a,q) ∈ T}.

Thus, δ(P,a) is the set of states of 𝒜 which can be reached by an a-labeled transition, starting from an element of P. The subset automaton of 𝒜 is 𝒜_sub = (P(Q),δ,I,F_sub) where F_sub = {P ⊆ Q | P ∩ F ≠ ∅}. The automaton 𝒜_sub is deterministic and complete by construction, and the subset transition function of 𝒜 is the transition function of 𝒜_sub. Moreover, if 𝒜 has n states, then 𝒜_sub has 2^n states.

Example 1.8. The subset automaton of the non-deterministic automaton of Figure 1.3 is given in Figure 1.6. Notice that the states of the second row are not accessible.

Proposition 1.4. The automata 𝒜 and 𝒜_sub are equivalent.

Sketch of proof. Let 𝒜 = (Q,T,I,F). One shows by induction on |w| that for all P ⊆ Q and w ∈ A*, δ(P,w) is the set of all states q ∈ Q such that w labels a path in 𝒜 starting at some state in P and ending at q. Therefore, a word w is accepted by 𝒜 if and only if at least one final state lies in the set δ(I,w), if and only if δ(I,w) ∈ F_sub, if and only if w is accepted by 𝒜_sub. This concludes the proof.

In general, the subset automaton is not trim (see Example 1.8) and we can find a deterministic automaton smaller than 𝒜_sub, which still recognizes the same language as 𝒜, namely by trimming 𝒜_sub. Observe that in the proof of Proposition 1.4, the only useful states of 𝒜_sub are those of the form δ(I,w), that is, the accessible states of 𝒜_sub. We define the determinized automaton of 𝒜 to be 𝒜_det = (𝒜_sub)_trim. This automaton is equivalent to 𝒜.

Example 1.9. The determinized automaton of the non-deterministic automaton of Figure 1.3 consists of the first row of states in Figure 1.6 (see Example 1.8).

An obstacle in the computation of 𝒜_det is the explosion in the number of states: if 𝒜 has n states, then 𝒜_sub has 2^n states. The determinized automaton 𝒜_det may well have exponentially many states as well, but it sometimes has fewer. Therefore, it makes sense to try and compute 𝒜_det directly, in time proportional to its actual number of states, rather than first constructing the exponentially large automaton 𝒜_sub and then trimming it.

This can be done using the same ideas as in the construction of 𝒜_trim in Section 1.2.2.3. One first constructs ℬ, the accessible part of 𝒜_sub, starting with the initial state of 𝒜_sub, namely I. Then for each constructed state P and each letter a, we construct δ(P,a) and the transition (P,a,δ(P,a)). And we stop when no new state arises this way. The second step consists in finding the co-accessible part of ℬ, using the method in Section 1.2.2.3.

Example 1.10. Let A = {a,b}, let n ≥ 2, and let L = A*aA^{n−2}. Then L is accepted by a non-deterministic automaton 𝒜 with n states. However, any deterministic automaton accepting L must have at least 2^{n−1} states. To see this, suppose that (Q,δ,i,F) is such a deterministic automaton. Let u,v be distinct words of length n−1. Then one of the words (let us say u) contains an a in a position in which v contains the letter b. Thus u = u'ax, v = v'by, where |x| = |y|. Let w be any word of length n−2−|x|. Then uw ∈ L, vw ∉ L. It follows that δ(i,u) ≠ δ(i,v) and thus there are at least as many states as there are words of length n−1. This shows that the exponential blowup in the number of states in the subset construction cannot in general be reduced.

1.3. Logic: Büchi's sequential calculus

Let us start with an example.

Example 1.11. Recall that ∧ is the logical conjunction, which reads AND. And ∨ is the logical disjunction, which reads OR. We will consider formulas such as

∃x ∃y (x < y) ∧ R_a x ∧ R_b y.

This formula has the following interpretation on a word u: there exist two natural numbers x < y such that, in u, the letter in position x is an a and the letter in position y is a b. Thus this formula specifies a language: the set of all words u in which this formula holds, namely A*aA*bA*.
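The formula of Example 1.11 can be checked by brute force, and the result compared with an automaton, which gives a first concrete glimpse of the link between logic and automata. The sketch below evaluates the formula by trying all pairs of positions, and then determinizes a nondeterministic automaton for A*aA*bA* by building only the accessible part of the subset automaton, as described in Section 1.2.3. The three-state automaton is a hypothetical one chosen for this illustration (it does not appear in the text), and the second step of the construction, removing states that are not co-accessible, is omitted.

```python
from itertools import product

def formula_holds(u):
    """∃x ∃y (x < y) ∧ R_a x ∧ R_b y, by brute force over the positions of u."""
    n = len(u)
    return any(x < y and u[x] == "a" and u[y] == "b"
               for x in range(n) for y in range(n))

def determinize(T, I, F, alphabet):
    """Accessible part of the subset automaton, built one state at a time."""
    start = frozenset(I)
    delta, states, todo = {}, {start}, [start]
    while todo:
        P = todo.pop()
        for a in alphabet:
            # Subset transition function: states reachable from P by an a-labeled edge.
            P2 = frozenset(q for (p, b, q) in T if b == a and p in P)
            delta[(P, a)] = P2
            if P2 not in states:
                states.add(P2)
                todo.append(P2)
    final = {P for P in states if P & set(F)}
    return states, delta, start, final

# Hypothetical nondeterministic automaton for A*aA*bA* (an assumption of this sketch).
T = {(1, "a", 1), (1, "b", 1), (1, "a", 2), (2, "a", 2), (2, "b", 2),
     (2, "b", 3), (3, "a", 3), (3, "b", 3)}
states, delta, start, final = determinize(T, {1}, {3}, "ab")

def dfa_accepts(u):
    P = start
    for a in u:
        P = delta[(P, a)]
    return P in final

print(len(states))
print(all(formula_holds("".join(w)) == dfa_accepts("".join(w))
          for n in range(8) for w in product("ab", repeat=n)))
```

Only the subsets actually reached from the initial state are ever created, so the cost depends on the size of the accessible part rather than on the 2^n subsets of the full subset automaton.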

An Introduction to Finite Automata 15

1.3.1. First-order formulas

Let us now formalize this point of view on languages.

1.3.1.1. Syntax

The formulas of Büchi's sequential calculus use the usual logical symbols (∧, ∨, and ¬ for the negation), the equality symbol =, the constant symbol true, the quantifiers ∃ and ∀, variable symbols (x,y,z,...) and parentheses. They also use specific, non-logical symbols: binary relation symbols < and S, and unary relation symbols R_a (one for each letter a ∈ A). For convenience, we may assume that the variables are drawn from a fixed, countable set of variables.

The atomic formulas are the formulas of the form true, x = y, x < y, S(x,y), and R_a x, where x and y are variables and a ∈ A. The first-order formulas are defined as follows:
- Atomic formulas are first-order formulas,
- If ϕ and ψ are first-order formulas, then (¬ϕ), (ϕ ∧ ψ) and (ϕ ∨ ψ) are first-order formulas,
- If ϕ is a first-order formula and if x is a variable, then (∃x ϕ) and (∀x ϕ) are first-order formulas.

Remark 1.2. As is usual in logic, we will limit the usage of parentheses in our notation of formulas to what is necessary for their proper parsing, writing for instance ∀x R_a x instead of (∀x (R_a x)).

Certain variables appear after a quantifier (existential or universal): occurrences of these variables within the scope of the quantifier are said to be bound. Other occurrences are said to be free. A precise, recursive definition of the set FV(ϕ) of the free variables of a formula ϕ is as follows:
- If ϕ is atomic, then FV(ϕ) is the set of all variables occurring in ϕ,
- FV(¬ϕ) = FV(ϕ),
- FV(ϕ ∧ ψ) = FV(ϕ ∨ ψ) = FV(ϕ) ∪ FV(ψ),
- FV(∃x ϕ) = FV(∀x ϕ) = FV(ϕ) \ {x}.
A formula without free variables is called a sentence.

1.3.1.2. Interpretation of formulas

In Büchi's sequential calculus, formulas are interpreted in words: each word u of length n ≥ 0 determines a structure (which we abusively denote by u) with domain Dom(u) = {0,...,n−1} (Dom(u) = ∅ if u = ε). Dom(u) is viewed as the set of positions in the word u (numbered from 0).

The symbol < is interpreted in Dom(u) as the usual order (as in (2 < 4) and ¬(3 < 2)). The symbol S is interpreted as the successor symbol: if x,y ∈ Dom(u), then S(x,y) if and only if y = x + 1. Finally, for each letter a ∈ A, the unary relation symbol R_a is interpreted as the set of positions in u that carry an a (a subset of Dom(u)).

Example 1.12. If u = abbaab, then Dom(u) = {0,1,...,5}, R_a = {0,3,4} and R_b = {1,2,5}.

A valuation on u is a mapping ν from a set of variables into the domain Dom(u). It will be useful to have a notation for small modifications of a valuation: if ν is a valuation and d is an element of Dom(u), we let ν[x ↦ d] be the valuation ν′ defined by extending the domain of ν to include the variable x and setting

ν′(y) = ν(y) if y ≠ x, and ν′(y) = d if y = x.

If ϕ is a formula, u ∈ A* and ν is a valuation on u whose domain includes the free variables of ϕ, then we define u,ν ⊨ ϕ (and say that the valuation ν satisfies ϕ in u, or equivalently that u,ν satisfies ϕ) as follows:
- u,ν ⊨ (x = y) (resp. (x < y), S(x,y), R_a x) if and only if ν(x) = ν(y) (resp. ν(x) < ν(y), S(ν(x),ν(y)), R_a ν(x)) in Dom(u);
- u,ν ⊨ ¬ϕ if and only if it is not true that u,ν ⊨ ϕ;
- u,ν ⊨ (ϕ ∨ ψ) (resp. (ϕ ∧ ψ)) if and only if at least one (resp. both) of u,ν ⊨ ϕ and u,ν ⊨ ψ holds (resp. hold);
- u,ν ⊨ (∃x ϕ) if and only if there exists d ∈ Dom(u) such that u,ν[x ↦ d] ⊨ ϕ;
- u,ν ⊨ (∀x ϕ) if and only if, for each d ∈ Dom(u), u,ν[x ↦ d] ⊨ ϕ.

Note that the truth value of u,ν ⊨ ϕ depends only on the values assigned by ν to the free variables of ϕ. In particular, if ϕ is a sentence, we may use the valuation µ with an empty domain. We say that ϕ is satisfied by u (or u satisfies ϕ), and we write u ⊨ ϕ for u,µ ⊨ ϕ. Thus each sentence ϕ defines a language: the set L(ϕ) of all words u such that u ⊨ ϕ. Note that this interpretation makes sense even if u is the empty word, for then the valuation µ is still defined: every sentence beginning with a universal quantifier is satisfied by ε, and no sentence beginning with an existential quantifier is satisfied by ε. An early example was given in Example 1.11.

Remark 1.3.
Two sentences ϕ and ψ are said to be logically equivalent if they are satisfied by the same structures. We will use freely the classical logical equivalence results, such as the logical equivalence of ϕ ∧ ψ and ¬(¬ϕ ∨ ¬ψ), or the logical equivalence of ∀x ϕ and ¬(∃x ¬ϕ). We will also use the implication and bi-implication notation: ϕ → ψ stands for ¬ϕ ∨ ψ and ϕ ↔ ψ stands for (ϕ → ψ) ∧ (ψ → ϕ).
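The recursive definitions of FV(ϕ) and of u,ν ⊨ ϕ translate almost line by line into a small evaluator. The sketch below is only an illustration: the encoding of formulas as nested tuples and the names `free_vars`, `holds` and `implies` are assumptions made here, not notation from the text. The two sample sentences say, respectively, that some position without a strict predecessor carries an a, and that every such position does.

```python
def free_vars(phi):
    """FV(phi), following the recursive definition of Section 1.3.1.1."""
    op = phi[0]
    if op == "true":
        return set()
    if op in ("eq", "lt", "succ"):       # ("lt", x, y) stands for x < y
        return {phi[1], phi[2]}
    if op == "letter":                   # ("letter", a, x) stands for R_a x
        return {phi[2]}
    if op == "not":
        return free_vars(phi[1])
    if op in ("and", "or"):
        return free_vars(phi[1]) | free_vars(phi[2])
    if op in ("exists", "forall"):       # ("exists", x, body)
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(f"unknown connective: {op!r}")

def holds(u, phi, nu=None):
    """Decide u, nu |= phi; nu maps variable names to positions of u."""
    nu = {} if nu is None else nu
    op = phi[0]
    if op == "true":
        return True
    if op == "eq":
        return nu[phi[1]] == nu[phi[2]]
    if op == "lt":
        return nu[phi[1]] < nu[phi[2]]
    if op == "succ":
        return nu[phi[2]] == nu[phi[1]] + 1
    if op == "letter":
        return u[nu[phi[2]]] == phi[1]
    if op == "not":
        return not holds(u, phi[1], nu)
    if op == "and":
        return holds(u, phi[1], nu) and holds(u, phi[2], nu)
    if op == "or":
        return holds(u, phi[1], nu) or holds(u, phi[2], nu)
    if op == "exists":                   # nu[x -> d] for some d in Dom(u)
        return any(holds(u, phi[2], {**nu, phi[1]: d}) for d in range(len(u)))
    if op == "forall":                   # nu[x -> d] for every d in Dom(u)
        return all(holds(u, phi[2], {**nu, phi[1]: d}) for d in range(len(u)))
    raise ValueError(f"unknown connective: {op!r}")

def implies(phi, psi):
    """phi -> psi is shorthand for (not phi) or psi."""
    return ("or", ("not", phi), psi)

# "x has no strict predecessor", with x free.
no_predecessor = ("not", ("exists", "y", ("lt", "y", "x")))
phi = ("exists", "x", ("and", no_predecessor, ("letter", "a", "x")))
psi = ("forall", "x", implies(no_predecessor, ("letter", "a", "x")))
```

On the empty word, `holds("", phi)` is false and `holds("", psi)` is true, in line with the remark that existential sentences fail on ε while universal sentences hold vacuously.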

Example 1.13. Let ϕ and ψ be the following formulas.

ϕ = ∃x ((¬∃y (y < x)) ∧ R_a x)
ψ = ∀x ((¬∃y (y < x)) → R_a x).

The sentence ϕ states that there exists a position with no strict predecessor, containing an a, while ψ states that every such position contains an a. The latter sentence, like all universally quantified first-order sentences, is vacuously satisfied by the empty string. Thus L(ϕ) = aA* and L(ψ) = aA* ∪ {ε}.

The first-order logic of the linear order (resp. of the successor), written FO(<) (resp. FO(S)), is the fragment of the first-order logic described so far, where formulas do not use the symbol S (resp. <).

1.3.2. Monadic second-order formulas

In monadic second-order logic, we add a new type of variable to first-order logic, called set variables and usually denoted by upper case letters, e.g. X,Y,... The atomic formulas of monadic second-order logic are the atomic formulas of first-order logic, and the formulas of the form (Xy), where X is a set variable and y is an ordinary variable.

The recursive definition of monadic second-order formulas, starting from the atomic formulas, closely resembles that of first-order formulas: it uses the same rules given in Section 1.3.1, and the additional rule:
- If ϕ is a monadic second-order formula and X is a set variable, then (∃Xϕ) and (∀Xϕ) are monadic second-order formulas.
The notion of free variables is extended in the same fashion.

The interpretation of monadic second-order formulas also requires an extension of the definition of a valuation on a word u: a monadic second-order valuation is a mapping ν which associates with each first-order variable an element of the domain Dom(u), and with each set variable a subset of Dom(u). If ν is a valuation, X is a set variable, and R is a subset of Dom(u), we denote by ν[X ↦ R] the valuation obtained from ν by mapping X to R (see Section 1.3.1.2).

With these definitions, we can recursively give meaning to the notion that a valuation ν satisfies a formula ϕ in a word u (u,ν ⊨ ϕ): we use again the rules given in Section 1.3.1.2, to which we add the following:
- u,ν ⊨ (Xy) if and only if ν(y) ∈ ν(X);
- u,ν ⊨ (∃Xϕ) (resp. (∀Xϕ)) if and only if there exists R ⊆ Dom(u) such that (resp. for each R ⊆ Dom(u)) u,ν[X ↦ R] ⊨ ϕ.

Note that the empty set is a valid assignment for a set variable: the empty word may satisfy monadic second-order sentences even if they start with an existential set quantifier.

Büchi's sequential calculus (see Section 1.3.1.2) is thus extended to include monadic second-order formulas. We denote by MSO(<) (resp. MSO(S)) the fragment of monadic second-order logic, where formulas do not use the symbol S (resp. <). Of course, FO(<) and FO(S) are subsets of MSO(<) and MSO(S), respectively.

Example 1.14. Inspecting the following MSO(<) sentence,

ϕ = ∃X [ ∀x (Xx ↔ ((¬∃y (x < y)) ∨ (¬∃y (y < x)))) ∧ ∀x (Xx → R_a x) ∧ ∃x Xx ],

one can see that the elements of X must be the first and last positions of the word in which we interpret ϕ, so L(ϕ) = aA* ∩ A*a. This language can also be described by a first-order sentence (see Example 1.13); that is, this formula is equivalent to a first-order formula.

Example 1.15. We now consider the more complex formula

ϕ = ∃X ( (∀x ∀y ((x < y) ∧ (¬∃z ((x < z) ∧ (z < y)))) → (Xx ↔ ¬Xy))
  ∧ (∀x (¬∃y (y < x)) → Xx)
  ∧ (∀x (¬∃y (x < y)) → ¬Xx) ).

The formula ϕ states that there exists a set X of positions in the word, such that a position is in X if and only if the next position is not in X (so X contains every other position), the first position is in X, and the last position is not in X. Thus L(ϕ) is the set of words of even length. It is an easy consequence of the results of Section 1.7 that this language cannot be described by a first-order formula.

The successor relation can be expressed in FO(<): S(x,y) is logically equivalent to the following formula:

(x < y) ∧ ∀z ((x < z) → ((y = z) ∨ (y < z))).

In a weak converse, the order relation < can be expressed in MSO(S): the formula x < y is equivalent to:

∃X ( Xy ∧ ¬Xx ∧ [∀z ∀t ((Xz ∧ S(z,t)) → Xt)] ).

It follows that MSO(<) and MSO(S) have the same expressive power.

Proposition 1.5. A language can be defined by a sentence in MSO(S) if and only if it can be defined by a sentence in MSO(<).

However, the order relation < cannot be expressed in FO(S). This is a non-trivial result; for a proof, see [21].

Proposition 1.6. If a language can be defined by a sentence in FO(S), then it can be defined by a sentence in FO(<). The converse does not hold.
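Set quantifiers can be evaluated by brute force on short words, by trying every subset of the positions. The sketch below does this for the sentence of Example 1.15 and for the two translations between S and <. The function names are illustrative assumptions, and the exhaustive search over subsets is exponential, so it is only meant for small lengths.

```python
from itertools import combinations

def subsets(n):
    """All subsets of the positions {0, ..., n-1}, as frozensets."""
    for k in range(n + 1):
        for chosen in combinations(range(n), k):
            yield frozenset(chosen)

def even_length_sentence(n: int) -> bool:
    """Example 1.15 on a word of length n (its letters play no role)."""
    dom = range(n)
    for X in subsets(n):
        # consecutive positions x < y (no z strictly between) disagree on X
        alternates = all((x in X) != (y in X)
                         for x in dom for y in dom
                         if x < y and not any(x < z < y for z in dom))
        # a position without strict predecessor is in X
        first_in = all(x in X for x in dom if not any(y < x for y in dom))
        # a position without strict successor is not in X
        last_out = all(x not in X for x in dom if not any(x < y for y in dom))
        if alternates and first_in and last_out:
            return True
    return False

def succ_via_order(n: int, x: int, y: int) -> bool:
    """S(x,y) written in FO(<): x < y and every z > x satisfies y = z or y < z."""
    return x < y and all(y == z or y < z for z in range(n) if x < z)

def less_via_succ(n: int, x: int, y: int) -> bool:
    """x < y written in MSO(S): a successor-closed set contains y but not x."""
    return any(y in X and x not in X
               and all(t in X for z in X for t in range(n) if t == z + 1)
               for X in subsets(n))
```

For lengths up to 8, `even_length_sentence(n)` agrees with `n % 2 == 0`, including the empty word, where the empty set is the witness for X.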

1.4. The Kleene-Büchi theorem

In this section, we prove the following theorem, a combination of the classical Kleene and Büchi theorems.

Theorem 1.1. Let L be a language in A*. The following conditions are equivalent:
(1) L is defined by a sentence in MSO(<);
(2) L is accepted by an automaton;
(3) L is extended rational;
(4) L is rational.

1.4.1. From automata to monadic second-order formulas

Let 𝒜 = (Q,δ,i,F) be a deterministic automaton. The idea is to associate with each state q ∈ Q a second-order variable X_q, to encode the set of positions in which a given path visits state q. What we need to express about the sets X_q is the following:
- the sets X_q form a partition of the set of all positions (at each point in time, the automaton must be in one and exactly one state);
- if a path visits state q at time x, state q′ at time x + 1 and if the letter in position x + 1 is an a, then δ(q,a) = q′.

This analysis leads to the following formula. For convenience, let Q be the set {q_0,q_1,...,q_n}, with initial state i = q_0. We also use the shorthand min and max to designate the first and last positions: this is acceptable as these positions can be expressed by FO(S)-formulas. For instance, R_a min stands for ∀x (¬∃y S(y,x) → R_a x), and X max stands for ∀x (¬∃y S(x,y) → Xx).

∃X_{q_0} ∃X_{q_1} ⋯ ∃X_{q_n} (
    ⋀_{q≠q′} ¬∃x (X_q x ∧ X_{q′} x)
  ∧ ∀x ⋁_{q∈Q} X_q x
  ∧ ∀x ∀y [ S(x,y) → ⋀_{q∈Q, a∈A} ((X_q x ∧ R_a y) → X_{δ(q,a)} y) ]
  ∧ ⋀_{a∈A} (R_a min → X_{δ(q_0,a)} min)
  ∧ ⋁_{q∈F} X_q max ).

This sentence is actually verified by the empty word, so the language it defines coincides with L(𝒜) on A^+. If q_0 ∈ F, it accurately defines L(𝒜). But if q_0 ∉ F, we must consider the conjunction of this sentence with ∃x true. This is a sentence in MSO(S,<) but, as we know, it is logically equivalent to one in MSO(<). Note that it is in fact an existential monadic second-order sentence, that is, the second-order quantifications are all existential.
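To see the sentence of Section 1.4.1 at work, one can note that a partition (X_q) of the positions is the same thing as a labelling of each position by a state. The sketch below searches over all such labellings and compares the outcome with a direct run of the automaton. The example automaton (parity of the number of a's) and all function names are assumptions chosen for illustration.

```python
from itertools import product

def dfa_accepts(delta, q0, finals, word):
    """Run the deterministic complete automaton on the word."""
    q = q0
    for letter in word:
        q = delta[q, letter]
    return q in finals

def sentence_holds(states, delta, q0, finals, word):
    """Brute-force reading of the existential MSO sentence of Section 1.4.1."""
    n = len(word)
    if n == 0:
        # every universally quantified conjunct holds vacuously; the final
        # disjunction over q in F is true as soon as F is non-empty
        return bool(finals)
    for label in product(sorted(states), repeat=n):   # label[x] = q means x is in X_q
        starts_right = label[0] == delta[q0, word[0]]
        steps_right = all(label[x + 1] == delta[label[x], word[x + 1]]
                          for x in range(n - 1))
        ends_right = label[-1] in finals
        if starts_right and steps_right and ends_right:
            return True
    return False

def words_up_to(max_len, alphabet="ab"):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# Hypothetical example: state 0 = even number of a's read so far, state 1 = odd.
STATES = {0, 1}
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
```

With F = {0} the initial state is final and the two functions agree on every word. With F = {1} they agree on non-empty words only: the sentence still holds on ε, which is why the conjunction with ∃x true is needed in that case.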

1.4.2. From formulas to extended rational expressions

The proof that an MSO(<)-definable language can be described by an extended rational expression is more complex. The reasoning is by induction on the recursive definition of formulas. Instead of associating a language only with sentences (formulas without free variables), we will associate languages with all formulas, but these languages will be over larger alphabets, which allow us to encode valuations.

1.4.2.1. The auxiliary alphabets B_{p,q}

Let p,q ≥ 0 and let B_{p,q} = A × {0,1}^p × {0,1}^q. A word over the alphabet B_{p,q} can be identified with a sequence (u_0,u_1,...,u_p,u_{p+1},...,u_{p+q}) where u_0 ∈ A*, u_1,...,u_p,u_{p+1},...,u_{p+q} ∈ {0,1}* and all the u_i have the same length. Let K_{p,q} consist of the empty word and the words in B_{p,q}^+ such that each of the components u_1,...,u_p contains exactly one occurrence of 1. Thus each of these components really designates one position in the word u_0, and each of the components u_{p+1},...,u_{p+q} designates a set of positions in u_0.

Example 1.16. If A = {a,b}, the following is a word in K_{2,1}:

u_0   (a word of length 7 over A)
u_1   0 0 0 0 1 0 0
u_2   0 0 1 0 0 0 0
u_3   0 1 1 0 0 1 1

Its components u_1 and u_2 designate positions 4 and 2, respectively, and its component u_3 designates the set {1,2,5,6}.

The languages K_{p,q} are extended rational. Indeed, for 1 ≤ i ≤ p, let C_i be the set of elements (b_0,b_1,...,b_{p+q}) ∈ B_{p,q} such that b_i = 1. Then K_{p,q} consists of the empty word and of the words in B_{p,q}* which contain exactly one letter in each C_i:

K_{p,q} = {ε} ∪ ⋂_{1≤i≤p} (B_{p,q} \ C_i)* C_i (B_{p,q} \ C_i)*,

where (B_{p,q} \ C_i)* = B_{p,q}* \ B_{p,q}* C_i B_{p,q}*.

1.4.2.2. The language associated with a formula

Let now ϕ(x_1,...,x_r,X_1,...,X_s) be a formula in which the free first-order (resp. set) variables are x_1,...,x_r (resp. X_1,...,X_s), with r ≤ p and s ≤ q. We interpret
- R_a as R_a = {i ∈ Dom(u) | u_0(i) = a};
- x_i as the unique position of 1 in u_i (if u_i ≠ ε);
- X_j as the set of positions of 1 in u_{p+j}.
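The encoding of a word together with a valuation as a single word over B_{p,q} can be sketched as follows. The helper names `encode`, `in_K` and `decode` are illustrative, and the word u_0 used in the sample call is an arbitrary placeholder of length 7, chosen here only so that the bit rows reproduce those of Example 1.16.

```python
def encode(u0, positions, sets):
    """Word over B_{p,q}: one letter (a, bit_1, ..., bit_{p+q}) per position of u0."""
    n = len(u0)
    rows = [[1 if k == pos else 0 for k in range(n)] for pos in positions]
    rows += [[1 if k in s else 0 for k in range(n)] for s in sets]
    return [(u0[k], *(row[k] for row in rows)) for k in range(n)]

def in_K(word, p):
    """Membership in K_{p,q}: the empty word, or exactly one 1 in each of components 1..p."""
    if not word:
        return True
    return all(sum(letter[i] for letter in word) == 1 for i in range(1, p + 1))

def decode(word, p, q):
    """Recover (u0, positions, sets) from a non-empty word of K_{p,q}."""
    u0 = "".join(letter[0] for letter in word)
    positions = [next(k for k, letter in enumerate(word) if letter[i] == 1)
                 for i in range(1, p + 1)]
    sets = [{k for k, letter in enumerate(word) if letter[i] == 1}
            for i in range(p + 1, p + q + 1)]
    return u0, positions, sets

# Placeholder first component; the bit rows are those of Example 1.16.
sample = encode("abbabab", positions=[4, 2], sets=[{1, 2, 5, 6}])
```

Decoding `sample` with p = 2 and q = 1 returns the positions 4 and 2 and the set {1,2,5,6}, matching the reading of u_1, u_2 and u_3 given in the example.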

Note that if p = q = 0, then ϕ is a sentence and this is the usual notion of interpretation. More formally, let (u_0,u_1,...,u_{p+q}) be a non-empty word in K_{p,q}. Let n_i be the position of the unique 1 in the word u_i and let N_j be the set of the positions of the 1's in the word u_{p+j}. We say that u = (u_0,u_1,...,u_{p+q}) ∈ K_{p,q} satisfies ϕ if u_0,ν ⊨ ϕ, where ν is the valuation defined by ν(x_i) = n_i for 1 ≤ i ≤ r and ν(X_j) = N_j for 1 ≤ j ≤ s. We also say that the empty word (in K_{p,q}) satisfies ϕ if ε ⊨ ϕ. We let

L_{p,q}(ϕ) = {u ∈ K_{p,q} | u satisfies ϕ}.

Thus each formula ϕ defines a subset of K_{p,q}, and hence a language in B_{p,q}*.

Example 1.17. Let ϕ = ∃x (x < y ∧ R_a y). Then FV(ϕ) = {y}, and L_{1,0}(ϕ) is the set of pairs of words (u_0,u_1) such that u_0 ∈ A*, u_1 ∈ {0,1}*, u_0 and u_1 have the same length, u_1 has a single 1, which is not in the first position, and u_0 has an a in that position.

Let ϕ = ∀x ((Xx ∧ x < y ∧ R_b y) → R_a x). Then L_{1,1}(ϕ) is the set of triples of words (u_0,u_1,u_2) with u_0 ∈ A*, u_1,u_2 ∈ {0,1}*, such that all three words have the same length, and either this length is zero, or u_1 has a single 1 such that the following holds. Let n be the position in u_1 which has a 1. If u_0 has a b in position n, then u_0 has an a in each position before n in which u_2 has a 1. If u_0 does not have a b in position n, then there is no constraint.

1.4.2.3. The MSO(<)-definable languages are extended rational

We first consider the languages associated with an atomic formula. Let 1 ≤ i,j ≤ p+q and let a ∈ A. Let

C_{j,a} = {b ∈ B_{p,q} | b_j = 1 and b_0 = a},
C_{i,j} = {b ∈ B_{p,q} | b_i = b_j = 1}, and
C_i = {b ∈ B_{p,q} | b_i = 1}.

Then we have

L_{p,q}(R_a x_i) = K_{p,q} ∩ B_{p,q}* C_{i,a} B_{p,q}*
L_{p,q}(x_i = x_j) = K_{p,q} ∩ B_{p,q}* C_{i,j} B_{p,q}*
L_{p,q}(x_i < x_j) = K_{p,q} ∩ B_{p,q}* C_i B_{p,q}* C_j B_{p,q}*
L_{p,q}(X_i x_j) = K_{p,q} ∩ B_{p,q}* C_{i+p,j} B_{p,q}*.

Thus, the languages defined by the atomic formulas, namely L_{p,q}(R_a x), L_{p,q}(x = y), L_{p,q}(x < y) and L_{p,q}(Xy), are extended rational.

Now let ϕ and ψ be formulas and let us assume that L_{p,q}(ϕ) and L_{p,q}(ψ) are extended rational. Then we have

L_{p,q}(ϕ ∧ ψ) = L_{p,q}(ϕ) ∩ L_{p,q}(ψ)
L_{p,q}(ϕ ∨ ψ) = L_{p,q}(ϕ) ∪ L_{p,q}(ψ)
L_{p,q}(¬ϕ) = K_{p,q} \ L_{p,q}(ϕ),

and hence these three languages are extended rational as well.

We still need to handle existential quantification. Let π_i be the morphism which deletes the i-th component in a word of B_{p,q}*; that is: if 1 ≤ i ≤ p, then π_i: B_{p,q}* → B_{p−1,q}*, and if p < i ≤ p+q, then π_i: B_{p,q}* → B_{p,q−1}*. In either case, we have

π_i(b_0,b_1,...,b_{p+q}) = (b_0,b_1,...,b_{i−1},b_{i+1},...,b_{p+q}).

Now, observe that, for any formula ϕ(x_1,...,x_r,X_1,...,X_s), and for p ≥ r, q ≥ s, 1 ≤ i ≤ p and 1 ≤ j ≤ q we have

L_{p−1,q}(∃x_i ϕ) = π_i(L_{p,q}(ϕ)) and L_{p,q−1}(∃X_j ϕ) = π_{p+j}(L_{p,q}(ϕ)).

This concludes the proof that L_{p,q}(ϕ) is extended rational for any p ≥ r, q ≥ s. In particular, if ϕ is a sentence in MSO(<) (that is, ϕ has no free variables), we may take p = q = 0. Then L_{0,0}(ϕ) is extended rational and we already noted that L(ϕ) = L_{0,0}(ϕ).

1.4.3. From extended rational expressions to automata

It is immediately verified that the languages ∅, {ε}, {a} (a ∈ A) are accepted by finite automata. We now need to show that if K,L ⊆ A* are recognizable and if π: A* → B* is a morphism, then L^c = A* \ L, K ∪ L, K ∩ L, KL, K* and π(L) are recognizable.

Proposition 1.7. If L ⊆ A* is recognizable, then the complement L^c of L is recognizable as well.

Proof. Let 𝒜 = (Q,δ,i,F) be a deterministic complete automaton recognizing L. Then 𝒜^c = (Q,δ,i,Q \ F) recognizes L^c by Proposition 1.3.

Example 1.18. The deterministic automata in Examples 1.5 and 1.6 confirm that, if A = {a,b}, the language recognized by one is the complement of the language recognized by the other.

Note that the resulting procedure yields a deterministic automaton for L^c. It is very efficient if L is given by a deterministic automaton, but may lead to an exponential growth in the number of states if L is given by a non-deterministic automaton.

Proposition 1.8. If K,L ⊆ A* are recognizable, then K ∪ L and K ∩ L are recognizable as well.
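The complementation step of Proposition 1.7 amounts to swapping final and non-final states of a complete deterministic automaton. A minimal sketch follows, assuming an automaton is stored as a tuple (states, delta, initial, finals) with delta a dictionary. This representation and the sample automaton (words over {a,b} containing the factor ab) are choices made here for illustration.

```python
def is_complete(dfa, alphabet):
    """Check that delta is defined on every (state, letter) pair."""
    states, delta, _initial, _finals = dfa
    return all((q, a) in delta for q in states for a in alphabet)

def complement(dfa):
    """Same automaton with final states Q \\ F; correct only for complete DFAs."""
    states, delta, initial, finals = dfa
    return (states, delta, initial, states - finals)

def accepts(dfa, word):
    """Run the automaton from its initial state and test the state reached."""
    _states, delta, initial, finals = dfa
    q = initial
    for letter in word:
        q = delta[q, letter]
    return q in finals

# Hypothetical example: state 1 means the last letter read is an a,
# state 2 means the factor ab has been seen.
contains_ab = (
    {0, 1, 2},
    {(0, "a"): 1, (0, "b"): 0,
     (1, "a"): 1, (1, "b"): 2,
     (2, "a"): 2, (2, "b"): 2},
    0,
    {2},
)
avoids_ab = complement(contains_ab)
```

Completeness matters: on an incomplete automaton, a word whose run gets stuck is rejected both before and after swapping the final states, so the swapped automaton would not recognize the complement.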

Proof. Let 𝒜 = (Q,T,I,F) and 𝒜′ = (Q′,T′,I′,F′) be automata recognizing K and L, respectively. We assume that the state sets Q and Q′ are disjoint. Then it is readily verified that the automaton 𝒜 ∪ 𝒜′ = (Q ∪ Q′, T ∪ T′, I ∪ I′, F ∪ F′) accepts K ∪ L. Thus K ∪ L is recognizable, and hence so is K ∩ L = (K^c ∪ L^c)^c, by Proposition 1.7.

The construction in the above proof always yields a non-deterministic automaton for K ∪ L, even if we start from deterministic automata for K and L. The product of automata provides an alternative construction which preserves determinism, avoids any exponentiation of the number of states, and works for both the union and the intersection.

Let 𝒜 = (Q,T,I,F) and 𝒜′ = (Q′,T′,I′,F′) be automata recognizing the languages K and L. Their cartesian product is the automaton 𝒜″ = (Q × Q′, T″, I × I′, F × F′) where

T″ = {((p,p′),a,(q,q′)) | (p,a,q) ∈ T and (p′,a,q′) ∈ T′}.

Note that if 𝒜 and 𝒜′ are deterministic, then 𝒜″ is deterministic as well. The main property of 𝒜″ is the following: there exists a path (p,p′) --u--> (q,q′) in 𝒜″ if and only if there exist paths p --u--> q and p′ --u--> q′, in 𝒜 and 𝒜′ respectively. Therefore 𝒜″ recognizes K ∩ L. If we take (F × Q′) ∪ (Q × F′) as the set of final states, instead of F × F′, and if the automata 𝒜 and 𝒜′ are complete, then the product automaton recognizes K ∪ L.

In practice, the cartesian product of 𝒜 and 𝒜′ may not be trim, and one may want to use the procedure in Section 1.2.2.3 to produce more concise automata for K ∩ L and K ∪ L.

Remark 1.4. Let us record here an algorithmic consequence of Propositions 1.7 and 1.8: given two automata 𝒜 and ℬ, it is decidable whether L(𝒜) ⊆ L(ℬ) and whether L(𝒜) = L(ℬ). Indeed, we can compute automata accepting L(𝒜) \ L(ℬ) = L(𝒜) ∩ L(ℬ)^c and L(ℬ) \ L(𝒜), and decide whether these languages are empty (see Remark 1.1).

Proposition 1.9. If L,L′ ⊆ A* are recognizable, then LL′ and L* are recognizable as well.

Sketch of proof. Let 𝒜 = (Q,T,I,F) and let 𝒜′ = (Q′,T′,I′,F′) be automata accepting L and L′, respectively, and let us assume that their state sets are disjoint. It is easily verified that the ε-automaton

(Q ∪ Q′, T ∪ T′ ∪ (F × {ε} × I′), I, F′)

accepts LL′ (see Section 1.2.2.4). Similarly, if j is a state not in Q, the ε-automaton

(Q ∪ {j}, T ∪ (F × {ε} × I), I ∪ {j}, F ∪ {j})

accepts L*.

Proposition 1.10. If L ⊆ A* is recognizable and ϕ: A* → B* is a morphism, then ϕ(L) is recognizable as well.

Sketch of proof. Let 𝒜 = (Q,T,I,F) be an automaton recognizing L. We let 𝒜′ be the ε-automaton 𝒜′ = (Q ∪ Q′,T′,I,F), where the set T′ consists of
- the transitions of the form (p,ε,q) such that (p,a,q) ∈ T for some letter a with ϕ(a) = ε,
- the transitions occurring in the paths of the form p --b_1--> q_1 --b_2--> ⋯ q_{k−1} --b_k--> q such that (p,a,q) ∈ T, ϕ(a) = b_1⋯b_k ≠ ε and q_1,...,q_{k−1} are new states that we adjoin for each such triple (p,a,q).
The set Q′ contains all the new states that occur in the latter paths. It is elementary to verify that 𝒜′ recognizes ϕ(L).

So far, we have shown that a language is recognizable, if and only if it is defined by a sentence in MSO(<), if and only if it is extended rational.

Remark 1.5. Note that the proofs of this logical equivalence are constructive, in the sense that given a sentence ϕ in MSO(<), we can construct an automaton 𝒜 such that L(ϕ) = L(𝒜). It follows that MSO(<) is decidable: given an MSO sentence ϕ, we can decide whether ϕ always holds. Indeed, this is the case if and only if L(¬ϕ) = ∅, which can be tested as discussed in Remark 1.1.

1.4.4. From automata to rational expressions

To complete the proof of the Kleene-Büchi theorem, it suffices to prove that every recognizable language is rational. For this, we use the McNaughton-Yamada construction. Let 𝒜 = (Q,T,I,F) be an automaton. For each pair of states p,q ∈ Q and for each subset P ⊆ Q, let L_{p,q}(P) be the set of all words u ∈ A* which label a path from state p to state q, such that the states visited internally by that path are all in P:

L_{p,q}(P) = {a_1a_2⋯a_n ∈ A* | there exists a path p --a_1--> q_1 --a_2--> ⋯ q_{n−1} --a_n--> q in 𝒜 with q_1,...,q_{n−1} ∈ P}.

Recall that, by convention, there always exists an empty path, labeled by the empty word, from any state q to itself. So ε ∈ L_{p,q}(P) if and only if p = q. We show by induction on the cardinality of P that each language L_{p,q}(P) is rational. This will prove that L(𝒜) is rational, since L(𝒜) = ⋃_{i∈I, f∈F} L_{i,f}(Q).

If P = ∅, then L_{p,q}(∅) = {a ∈ A | (p,a,q) ∈ T} if p ≠ q, and L_{q,q}(∅) = {a ∈ A | (q,a,q) ∈ T} ∪ {ε}. Thus L_{p,q}(∅) is always finite, and hence rational. Now let n > 0 and let us assume that, for any p,q ∈ Q and P ⊆ Q containing at most n−1 states, the language L_{p,q}(P) is rational. Let now P ⊆ Q be a subset with n elements and let r ∈ P. Considering the first and the last visit to state r of a path from p to q, we find that

L_{p,q}(P) = L_{p,q}(P \ {r}) ∪ L_{p,r}(P \ {r}) L_{r,r}(P \ {r})* L_{r,q}(P \ {r}).

Since P \ {r} has cardinality n−1, it follows from the induction hypothesis that L_{p,q}(P) is rational. This concludes the proof of the Kleene-Büchi theorem.

1.4.5. Closure properties

Rational languages enjoy many additional closure properties.

Proposition 1.11. Let ϕ: A* → B* be a morphism and let L ⊆ B*. If L is rational, then ϕ^{−1}(L) is rational as well.

Sketch of proof. Let 𝒜 = (Q,T,I,F) be an automaton over B, recognizing L, and let 𝒜′ = (Q,T′,I,F) be the automaton over A where

T′ = {(p,a,q) | p --ϕ(a)--> q is a path in 𝒜}.

It is readily verified that 𝒜′ recognizes ϕ^{−1}(L).

Let u ∈ A* and L ⊆ A*. The left and right quotients of L by u are defined as follows:

u^{−1}L = {v ∈ A* | uv ∈ L};   Lu^{−1} = {v ∈ A* | vu ∈ L}.

These notions are generalized to languages: if K and L are languages, the left and right quotients of L by K are defined as follows:

K^{−1}L = {v ∈ A* | ∃u ∈ K such that uv ∈ L} = ⋃_{u∈K} u^{−1}L,
LK^{−1} = {v ∈ A* | ∃u ∈ K such that vu ∈ L} = ⋃_{u∈K} Lu^{−1}.

Proposition 1.12. If L ⊆ A* is rational and K ⊆ A* is any language (possibly not rational), then K^{−1}L and LK^{−1} are rational as well.
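Returning to the McNaughton-Yamada induction of Section 1.4.4, the recurrence for L_{p,q}(P) can be turned into a short recursive procedure that produces a rational expression. The sketch below is one possible rendering, not the authors' algorithm verbatim: it fixes an arbitrary ordering of the states, lets P range over the prefixes of that ordering, and writes the result in the syntax of Python's `re` module so that it can be tested. The function name and the sample automaton (even number of a's) are assumptions, and the expressions it produces are far from minimal.

```python
import re
from functools import lru_cache
from itertools import product

def automaton_to_regex(states, transitions, initials, finals):
    """Return a regex for L(A), or None if L(A) is empty."""
    order = sorted(states)               # P ranges over the prefixes order[:k]

    def union(x, y):                     # None encodes the empty language
        if x is None:
            return y
        if y is None:
            return x
        if x == y:
            return x
        if x == "":                      # "" encodes the language {epsilon}
            return f"(?:{y})?"
        if y == "":
            return f"(?:{x})?"
        return f"(?:{x}|{y})"

    def concat(*parts):
        return None if any(part is None for part in parts) else "".join(parts)

    def star(x):
        return "" if x in (None, "") else f"(?:{x})*"

    @lru_cache(maxsize=None)
    def lang(p, q, k):
        """Regex for L_{p,q}(P) with P = order[:k]."""
        if k == 0:                       # base case: single letters, plus epsilon if p = q
            result = "" if p == q else None
            for (source, letter, target) in transitions:
                if source == p and target == q:
                    result = union(result, re.escape(letter))
            return result
        r = order[k - 1]                 # the state removed from P in the recurrence
        through_r = concat(lang(p, r, k - 1), star(lang(r, r, k - 1)), lang(r, q, k - 1))
        return union(lang(p, q, k - 1), through_r)

    result = None
    for i in initials:
        for f in finals:
            result = union(result, lang(i, f, len(order)))
    return result

# Hypothetical example: words over {a,b} with an even number of a's.
TRANSITIONS = [(0, "a", 1), (1, "a", 0), (0, "b", 0), (1, "b", 1)]
even_as = automaton_to_regex({0, 1}, TRANSITIONS, initials={0}, finals={0})

def words_up_to(max_len, alphabet="ab"):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)
```

Memoizing on (p, q, k) keeps the number of recursive calls polynomial in the number of states, although the size of the resulting expression can still grow exponentially.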

Sketch of proof. Let 𝒜 = (Q,T,I,F) be an automaton recognizing L. Let I′ be the set of states of 𝒜 which are accessible from an initial state of 𝒜 following a path labeled by a word of K,

I′ = {q ∈ Q | ∃i ∈ I, ∃u ∈ K such that i --u--> q}.

Then one shows that 𝒜′ = (Q,T,I′,F) recognizes K^{−1}L. The proof for LK^{−1} is similar.

Remark 1.6. The proof of Proposition 1.12 is not effective: we may not be able to construct the set of states I′ associated with K. However, if K is rational too, then I′ is effectively constructible.

Recall that a word u is a prefix of the word v if there exists a word v′ ∈ A* such that v = uv′ (that is: v starts with u). Similarly, u is a suffix of v if there exists a word v′ ∈ A* such that v = v′u. Finally u is a factor of v if there exist words v′,v″ ∈ A* such that v = v′uv″. If L is a language, we let Pref(L) (resp. Suff(L), Fact(L)) be the set of all prefixes (resp. suffixes, factors) of the words in L.

Proposition 1.13. If L ⊆ A* is rational, then Pref(L), Suff(L) and Fact(L) are rational as well.

Proof. The result follows from Proposition 1.12, since Pref(L) = L(A*)^{−1}, Suff(L) = (A*)^{−1}L and Fact(L) = (A*)^{−1}L(A*)^{−1}.

We leave it to the reader to verify that the following operations also preserve rationality. The mirror image of a word u = a_1⋯a_n ∈ A* is the word ũ = a_n⋯a_1. The corresponding language operation is given by L̃ = {ũ | u ∈ L} for each L ⊆ A*.

A word u = a_1⋯a_n ∈ A* is a subword of a word v ∈ A* if there exist words u_0,...,u_n ∈ A* such that v = u_0a_1u_1⋯a_nu_n. If L ⊆ A*, we let SW(L) be the set of all subwords of the words of L.

The shuffle of the words u and v is the set

u ⧢ v = {w ∈ A* | ∃u_1,v_1,...,u_n,v_n ∈ A* such that u = u_1⋯u_n, v = v_1⋯v_n and w = u_1v_1⋯u_nv_n}.

If K and L are languages, we let K ⧢ L = ⋃_{u∈K, v∈L} u ⧢ v.

Proposition 1.14. Let K, L ⊆ A* be rational languages. Then L̃, SW(L) and K ⧢ L are rational as well.

1.5. Pumping lemmas

The characterizations summarized in the Kleene-Büchi theorem are sufficient most of the time to show that a language is rational. Showing that a language is not rational is a trickier problem. This short section presents the main tool for that purpose, namely the pumping lemma. We actually first present a rather abstract version of this statement, and then its more classical corollaries.

Theorem 1.2. Let L be a rational language. There exists an integer N > 0 with the following property. For each word w ∈ L and for each sequence of integers 0 ≤ i_0 < i_1 < ... < i_N ≤ |w|, there exist 0 ≤ j < k ≤ N such that, if w = u_1u_2u_3 with |u_1| = i_j and |u_1u_2| = i_k, then u_1u_2*u_3 ⊆ L.

Proof. Let 𝒜 be an automaton recognizing L, and let N be the number of states of 𝒜. Let w = a_1a_2⋯a_n ∈ L and let p_0 --a_1--> p_1 --a_2--> p_2 ⋯ --a_n--> p_n be a successful path in 𝒜 labeled w. Let 0 ≤ i_0 < i_1 < ⋯ < i_N ≤ n be a sequence of integers. Then two of the states p_{i_0},p_{i_1},...,p_{i_N} are equal, that is, there exist 0 ≤ j < k ≤ N such that p_{i_j} = p_{i_k}. Let u_1 = a_1⋯a_{i_j}, u_2 = a_{1+i_j}⋯a_{i_k} and u_3 = a_{1+i_k}⋯a_n. Of course, w = u_1u_2u_3, |u_1| = i_j and |u_1u_2| = i_k. The situation is summarized by Figure 1.7: we may iterate or skip the loop labeled u_2 and still retain a successful path, so u_1u_2*u_3 ⊆ L.

Fig. 1.7. Proof of the pumping lemma (a path from p_0 to p_n labeled u_1, then a loop labeled u_2 at the state p_{i_j} = p_{i_k}, then u_3)

Corollary 1.1. Let L be a rational language. There exists an integer N > 0 such that, for each word w ∈ L with length |w| ≥ N, we can factor w in three parts, w = u_1u_2u_3, with u_2 ≠ ε and u_1u_2*u_3 ⊆ L.

Corollary 1.2. Let L be a rational language. There exists an integer N > 0 such that, for each word w ∈ L with length |w| ≥ N, we can factor w in three parts, w = u_1u_2u_3, with u_2 ≠ ε, |u_1u_2| ≤ N (resp. |u_2u_3| ≤ N) and u_1u_2*u_3 ⊆ L.

Sketch of proof. To prove Corollary 1.2, we apply Theorem 1.2 with i_j = j (resp. i_j = n − N + j) for 0 ≤ j ≤ N. To prove Corollary 1.1, we may take any sequence.

Example 1.19. It is a classical application of Corollary 1.1 that {a^n b^n | n ≥ 0} is not rational: for each N > 0, the word a^N b^N cannot be factored as w = u_1u_2u_3 with u_2 ≠ ε and u_1u_2*u_3 ⊆ {a^n b^n | n ≥ 0}.
Corollary 1.2 can be used to show that {u ∈ {a,b}* | |u|_a = |u|_b} is not rational (take again a^N b^N); however, this language satisfies the necessary condition for rationality in Corollary 1.1, with N = 2.
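The two pumping arguments above can be explored experimentally on small instances. The sketch below enumerates all factorizations w = u_1u_2u_3 with u_2 ≠ ε and tests a few powers of u_2. The function names and the choice of tested powers (0, 2, 3) are assumptions. A failing power refutes u_1u_2*u_3 ⊆ L outright, so a negative answer is conclusive, whereas a positive answer only means that no counterexample was found among the tested powers.

```python
from itertools import product

def factorizations(w, max_prefix=None):
    """All (u1, u2, u3) with w = u1 u2 u3, u2 non-empty and, optionally, |u1 u2| <= max_prefix."""
    n = len(w)
    limit = n if max_prefix is None else min(n, max_prefix)
    for i in range(limit):
        for k in range(i + 1, limit + 1):
            yield w[:i], w[i:k], w[k:]

def can_pump(w, member, max_prefix=None, powers=(0, 2, 3)):
    """True if some factorization keeps u1 u2^m u3 in the language for every tested m."""
    return any(all(member(u1 + u2 * m + u3) for m in powers)
               for u1, u2, u3 in factorizations(w, max_prefix))

def is_anbn(u):
    """Membership in {a^n b^n | n >= 0}."""
    n = len(u) // 2
    return u == "a" * n + "b" * n

def balanced(u):
    """Membership in the language of words with as many a's as b's."""
    return u.count("a") == u.count("b")

def words_of_length(n, alphabet="ab"):
    return ("".join(letters) for letters in product(alphabet, repeat=n))
```

For instance, `can_pump("aaabbb", is_anbn)` is false, and so is `can_pump("aaabbb", balanced, max_prefix=3)`, while every balanced word of length between 2 and 8 passes `can_pump(w, balanced)`, since such a word contains a factor ab or ba that can be repeated or removed.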