CONCATENATION AND KLEENE STAR ON DETERMINISTIC FINITE AUTOMATA


GUO-QIANG ZHANG, XIANGNAN ZHOU, ROBERT FRASER, LICONG CUI

Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio 44106, USA
E-mail: {gq,lxc48}@case.edu

College of Mathematics and Econometrics, Hunan University, Changsha 41001, China
Email: xnzhou8106@163.com

Department of Mathematics, Case Western Reserve University, Cleveland, Ohio 44106, USA
Email: rgf11@case.edu

This paper presents direct, explicit algebraic constructions of concatenation and Kleene star on deterministic finite automata (DFA), using the Boolean-matrix method of Zhang in Ref. 1 and ideas of Kozen in Ref. 2. The consequence is threefold: (1) it provides an alternative proof of the classical Kleene's Theorem on the equivalence of regular expressions and DFAs without using nondeterministic finite automata (NFA); (2) it demonstrates how the language constructions of concatenation and Kleene star can be captured elegantly as algebraic laws in the form of binomial theorems; (3) it demonstrates the (tight) upper bounds on the state complexity of concatenation and Kleene star, and also offers a way to study the state complexity of NFA.

Keywords: Automata; Concatenation; Kleene Star; Boolean matrices.

1. Matrix-Approach to Automata Theory

We begin with a brief account of the matrix approach to automata theory as introduced by Zhang (Ref. 1). A Boolean matrix is a matrix (of size $m \times n$) whose elements are either 0 or 1, where the internal operations are carried out over the Boolean algebra. We write $\mathbb{B}^{m \times n}$ for the set of all Boolean matrices of size $m \times n$. A Boolean (row) vector of dimension $n$ is an $n$-tuple $(b_1, b_2, \ldots, b_n)$ of 0s and 1s. We write $\mathbb{B}^n$ for the set of all Boolean vectors of dimension $n$. A column vector is the transpose $(\cdot)^t$ of a row vector. The characteristic vector of a subset $A$ of $\{1, 2, \ldots, n\}$ is the row vector $I^n_A \in \mathbb{B}^n$ such that the $p$-th component of $I^n_A$ is 1 if and only if $p \in A$. The characteristic vector of a singleton set $\{p\}$ is written as $I^n_p$, or simply $I_p$. $O_{m \times n}$ stands for the $(m \times n)$-matrix all of whose elements are 0. When the dimension is fixed by context, we abuse notation and write $O_{n \times n}$ as 0.

A deterministic finite automaton (DFA) is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is the finite set of states, $\Sigma$ is the alphabet, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0$ is the start state, and $F$ is the set of final states. For notational convenience, we use initial segments of the natural numbers $\{1, 2, \ldots, n\}$ to denote the set of states, and fix 1 to be the start state, for base/background DFAs. When there is no confusion, we omit the indication of the start state (which is assumed to be state 1 by default).

Each $n$-state DFA determines an associated matrix system $\{\Delta_a \mid a \in \Sigma\}$, where $\Delta_a$ is the $(n \times n)$ adjacency matrix of the $a$-labeled subgraph associated with the DFA. In other words, the $(i, j)$ entry of $\Delta_a$ is 1 if and only if $\delta(i, a) = j$. Since $M$ is a DFA, each $\Delta_a$ is row-stochastic (i.e., every row contains precisely a single 1). The (Boolean) sum of all members $\Delta_a$ of the matrix system is the adjacency matrix of the DFA. For a string $w = a_1 a_2 \cdots a_n$ over $\Sigma$, we write $\Delta_w$ for the matrix product $\Delta_{a_1} \Delta_{a_2} \cdots \Delta_{a_n}$. The language accepted by $M$, denoted $L(M)$, is the set $\{w \mid I_{q_0} \Delta_w I^t_F = 1\}$. See Ref. 1 for more details on the utility of this approach.

Example 1.1. The matrix system of the following DFA is
\[
\left\{ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right\}.
\]
[Figure: state diagram of a two-state DFA with start state 1, an a-labeled edge in each direction between states 1 and 2, and a b-labeled loop on each state.]

With the use of Boolean matrices, it is straightforward to describe a wide spectrum of constructions on DFA in a simple, algebraic manner, with their correctness established by induction and algebraic manipulation (Ref. 1). Here we briefly treat Brzozowski's derivative in Ref. 3 as an example. Given a string $u$ and a language $L$, the Brzozowski derivative $u^{-1}L$ is the language $\{w \mid uw \in L\}$. Suppose $L$ is accepted by an $n$-state DFA $M = (Q, \Sigma, \delta, F)$, with $\{\Delta_a \mid a \in \Sigma\}$ its matrix system. Then a DFA accepting $u^{-1}L$ can be given as $M' = (Q', \Sigma, \delta', q_0', F')$, where
\[
Q' = \{A \mid A \in \mathbb{B}^{n \times n}\}, \quad q_0' = \Delta_u, \quad \delta'(A, a) = A \Delta_a, \quad F' = \{A \mid I_1 A I^t_F = 1\}.
\]
One can see that $w$ is accepted by $M'$ if and only if $\delta'(\Delta_u, w) = \Delta_{uw} \in F'$, i.e., $uw$ is accepted by $M$.

In the remainder of this paper, we present the constructions of concatenation and Kleene star on DFA, and analyze the state complexity of these constructions. It turns out that, without additional effort, these algebraic constructions are already optimal in the number of states used after projecting to the first row.

2. Concatenation

This section presents the concatenation construction.

Theorem 2.1. Suppose matrix systems $\{\Delta^1_a \mid a \in \Sigma\}$ and $\{\Delta^2_a \mid a \in \Sigma\}$ are associated with $m$- and $n$-state DFAs $M_1 = (Q_1, \Sigma, \delta_1, F_1)$ and $M_2 = (Q_2, \Sigma, \delta_2, F_2)$, respectively. The DFA $M = (Q, \Sigma, \delta, q_0, F)$ defined as
\[
\begin{aligned}
Q &= \{(A, B) \mid A \in \mathbb{B}^{m \times m},\ B \in \mathbb{B}^{m \times n}\},\\
q_0 &= (T^0, T),\\
\delta((A, B), a) &= (A, B)\,\Delta_a \quad \bigl(= (A \Delta^1_a,\ A \Delta^1_a T + B \Delta^2_a)\bigr),\\
F &= \{(A, B) \mid I^m_1 B I^t_{F_2} = 1\},
\end{aligned}
\]
where
\[
\Delta_a = \begin{pmatrix} \Delta^1_a & \Delta^1_a T \\ 0 & \Delta^2_a \end{pmatrix} \quad \text{for } a \in \Sigma, \qquad T = I^t_{F_1} I^n_1,
\]
and $T^0$ is the $(m \times m)$ identity matrix, has the property that $L(M) = L(M_1) L(M_2)$.

To understand how this construction works, suppose $\delta(q_0, w) = (A, B)$ for some $w \in \Sigma^*$. By the definition of $\delta$, and since $A = \Delta^1_w$ (as shown in the proof of Lemma 2.1), we have, for $a \in \Sigma$,
\[
\delta(q_0, wa) = (\Delta^1_{wa},\ \Delta^1_{wa} T + B \Delta^2_a).
\]
Therefore, $\delta(q_0, wa) \in F$ if and only if $I^m_1 (\Delta^1_{wa} T + B \Delta^2_a) I^t_{F_2} = 1$, or
\[
(I^m_1 \Delta^1_{wa} I^t_{F_1}\, I^n_1 I^t_{F_2}) + (I^m_1 B \Delta^2_a I^t_{F_2}) = 1.
\]
Hence, $\delta(q_0, wa) \in F$ if and only if either $wa \in L(M_1)$ (i.e., $I^m_1 \Delta^1_{wa} I^t_{F_1} = 1$) and $1 \in F_2$ (i.e., $I^n_1 I^t_{F_2} = 1$), or else $I^m_1 B \Delta^2_a I^t_{F_2} = 1$. In general, $I^m_1 A$, the first row of $A$, keeps track of the ending state through $w$ in $M_1$, and $I^m_1 B$ keeps track of all possible states (in $M_1$ and $M_2$) resulting from a decomposition $w = w_1 w_2$, with $w_1$ going through $M_1$ and $w_2$ going through $M_2$. This analysis is captured more precisely in the next lemma.

Lemma 2.1. Suppose $\delta(q_0, w) = (A, B)$ in $M$, and suppose $w = a_1 a_2 \cdots a_l$, where $a_i \in \Sigma$ for $1 \le i \le l$. We have
\[
B = \sum_{i=0}^{l} \Delta^1_{a_1 a_2 \cdots a_i}\, T\, \Delta^2_{a_{i+1} a_{i+2} \cdots a_l}.
\]

Proof. Suppose $\delta(q_0, w) = (A, B)$ in the DFA $M$ given in Theorem 2.1, and suppose $w = a_1 a_2 \cdots a_l$, where $a_i \in \Sigma$ for $1 \le i \le l$. By induction on the length of $w$, we show that
\[
A = \Delta^1_w, \qquad B = \sum_{i=0}^{l} \Delta^1_{a_1 a_2 \cdots a_i}\, T\, \Delta^2_{a_{i+1} a_{i+2} \cdots a_l}.
\]
Note that for $i = 0$ and $i = l$ the summand is $T \Delta^2_{a_1 a_2 \cdots a_l}$ and $\Delta^1_{a_1 a_2 \cdots a_l} T$, respectively.

(1) Suppose that $l = 1$ and $w = a_1$. Then
\[
\delta(q_0, a_1) = (T^0, T) \begin{pmatrix} \Delta^1_{a_1} & \Delta^1_{a_1} T \\ 0 & \Delta^2_{a_1} \end{pmatrix} = (\Delta^1_{a_1},\ \Delta^1_{a_1} T + T \Delta^2_{a_1}).
\]
The conclusion holds.

(2) Suppose that the conclusion holds when $l = k - 1$ and $\delta(q_0, a_1 a_2 \cdots a_{k-1}) = (A_{k-1}, B_{k-1})$, where
\[
A_{k-1} = \Delta^1_{a_1 a_2 \cdots a_{k-1}}, \qquad B_{k-1} = \sum_{i=0}^{k-1} \Delta^1_{a_1 a_2 \cdots a_i}\, T\, \Delta^2_{a_{i+1} a_{i+2} \cdots a_{k-1}}.
\]
Then when $l = k$ and $w = a_1 a_2 \cdots a_k$, we have
\[
\begin{aligned}
\delta(\delta(q_0, a_1 a_2 \cdots a_{k-1}), a_k)
&= (A_{k-1}, B_{k-1}) \begin{pmatrix} \Delta^1_{a_k} & \Delta^1_{a_k} T \\ 0 & \Delta^2_{a_k} \end{pmatrix}\\
&= (A_{k-1} \Delta^1_{a_k},\ A_{k-1} \Delta^1_{a_k} T + B_{k-1} \Delta^2_{a_k})\\
&= \Bigl(\Delta^1_w,\ \sum_{i=0}^{k} \Delta^1_{a_1 a_2 \cdots a_i}\, T\, \Delta^2_{a_{i+1} a_{i+2} \cdots a_k}\Bigr).
\end{aligned}
\]
By induction, the conclusion holds for any $l \in \mathbb{N}$.

This lemma captures the key technical content for the proof of Theorem 2.1. It is interesting to observe that this lemma assumes the general flavor of a binomial theorem. With Lemma 2.1 in hand, the proof of Theorem 2.1 is short.
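
The concatenation construction and Lemma 2.1 can be checked mechanically on small inputs. The sketch below is only an illustration under stated assumptions: Boolean matrices are represented as tuples of 0/1 rows, all helper names (`bmul`, `badd`, `concat_run`, `check_concatenation`, and so on) are hypothetical, and the two example DFAs are chosen here rather than taken from the paper. `SYS1` uses the matrices of Example 1.1 with the assumed final set {2} (odd number of a's), and `SYS2` is an assumed 2-state DFA accepting strings that end in b.

```python
from itertools import product

def bmul(X, Y):
    """Boolean product of matrices given as tuples of 0/1 row tuples."""
    return tuple(tuple(int(any(x and y for x, y in zip(row, col)))
                       for col in zip(*Y)) for row in X)

def badd(X, Y):
    """Entrywise Boolean sum of two matrices of the same shape."""
    return tuple(tuple(x | y for x, y in zip(rx, ry)) for rx, ry in zip(X, Y))

def identity(n):
    return tuple(tuple(int(i == j) for j in range(n)) for i in range(n))

def char_row(S, n):
    """Characteristic row vector I^n_S of a subset S of {1, ..., n}."""
    return (tuple(int(p in S) for p in range(1, n + 1)),)

def transpose(X):
    return tuple(zip(*X))

def delta_w(system, w, n):
    """Delta_w: product of the letter matrices along the string w."""
    M = identity(n)
    for a in w:
        M = bmul(M, system[a])
    return M

def accepts(system, finals, n, w):
    """w is accepted iff I_1 Delta_w I_F^t = 1."""
    v = bmul(char_row({1}, n), delta_w(system, w, n))
    return bmul(v, transpose(char_row(finals, n)))[0][0] == 1

def concat_run(sys1, F1, m, sys2, n, w):
    """Run the DFA of Theorem 2.1 on w; return its state (A, B) and T."""
    T = bmul(transpose(char_row(F1, m)), char_row({1}, n))  # T = I_{F1}^t I^n_1
    A, B = identity(m), T                                   # q0 = (T^0, T)
    for a in w:
        A = bmul(A, sys1[a])                                # A Delta^1_a
        B = badd(bmul(A, T), bmul(B, sys2[a]))              # A Delta^1_a T + B Delta^2_a
    return A, B, T

def concat_accepts(sys1, F1, m, sys2, F2, n, w):
    """Final-state test of Theorem 2.1: I^m_1 B I_{F2}^t = 1."""
    _, B, _ = concat_run(sys1, F1, m, sys2, n, w)
    return bmul(bmul(char_row({1}, m), B), transpose(char_row(F2, n)))[0][0] == 1

# Assumed example DFAs over {a, b}.
SYS1, F1 = {"a": ((0, 1), (1, 0)), "b": ((1, 0), (0, 1))}, {2}  # odd number of a's
SYS2, F2 = {"a": ((1, 0), (1, 0)), "b": ((0, 1), (0, 1))}, {2}  # ends in b

def check_concatenation(max_len):
    """Brute-force check of Lemma 2.1 and Theorem 2.1 up to length max_len."""
    for length in range(max_len + 1):
        for w in map("".join, product("ab", repeat=length)):
            A, B, T = concat_run(SYS1, F1, 2, SYS2, 2, w)
            total = None
            for i in range(len(w) + 1):   # sum over all split points i
                term = bmul(bmul(delta_w(SYS1, w[:i], 2), T),
                            delta_w(SYS2, w[i:], 2))
                total = term if total is None else badd(total, term)
            assert A == delta_w(SYS1, w, 2) and B == total
            by_split = any(accepts(SYS1, F1, 2, w[:i]) and
                           accepts(SYS2, F2, 2, w[i:])
                           for i in range(len(w) + 1))
            assert concat_accepts(SYS1, F1, 2, SYS2, F2, 2, w) == by_split
    return True
```

Calling `check_concatenation(5)` compares the matrix state against the sum in Lemma 2.1 and the acceptance test against every split $w = uv$ for all strings of length at most 5. Tuples are used so that states are hashable and comparable with `==`; this is a representation choice made for the sketch only.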

Proof of Theorem 2.1. Suppose that $\delta(q_0, w) = q$; then $w \in L(M)$ iff $q \in F$. If $w = \epsilon$, then $q = q_0$. Thus, $\epsilon \in L(M)$ iff $q_0 \in F$, iff $I^m_1 T I^t_{F_2} = 1$, iff $\epsilon \in L(M_1) L(M_2)$. Since $q \in F$ iff $I^m_1 B I^t_{F_2} = 1$, by Lemma 2.1 we have $w = a_1 a_2 \cdots a_l \in L(M)$ iff
\[
\sum_{i=0}^{l} I^m_1\, \Delta^1_{a_1 a_2 \cdots a_i}\, T\, \Delta^2_{a_{i+1} a_{i+2} \cdots a_l}\, I^t_{F_2} = 1,
\]
which means $w = a_1 a_2 \cdots a_l \in L(M_1)$ and $\epsilon \in L(M_2)$, or there exists $1 \le i \le l - 1$ such that $u = a_1 a_2 \cdots a_i \in L(M_1)$, $v = a_{i+1} a_{i+2} \cdots a_l \in L(M_2)$ and $w = uv$, or $\epsilon \in L(M_1)$ and $w = a_1 a_2 \cdots a_l \in L(M_2)$. Therefore, $w \in L(M)$ iff $w \in L(M_1) L(M_2)$, that is, $L(M) = L(M_1) L(M_2)$.

3. Kleene Star

This section presents the Kleene star construction.

Theorem 3.1. Suppose the matrix system $\{\Delta^1_a \mid a \in \Sigma\}$ is associated with an $n$-state DFA $M_1 = (Q_1, \Sigma, \delta_1, F_1)$. The DFA $M = (Q, \Sigma, \delta, q_0, F)$ with $H = I^t_{F_1} I_1$ and
\[
\begin{aligned}
Q &= \{A \mid A \in \mathbb{B}^{n \times n}\} \cup \{s\},\\
q_0 &= s,\\
\delta(q, a) &= \begin{cases} \Delta^1_a (H^0 + H^1), & \text{if } q = s,\\ A \Delta^1_a (H^0 + H^1), & \text{if } q = A, \end{cases}\\
F &= \{A \mid I_1 A I^t_{F_1} = 1\} \cup \{s\},
\end{aligned}
\]
has the property that $L(M) = (L(M_1))^*$. Here, $H^1 = H$ and $H^0$ is the identity matrix.

The role of $H$ is to mark possible positions for string partition. Even though it has no effect by itself on the acceptance of strings (and represents a redundant term), it accounts for the restart of $M_1$ and prepares the way for the next chunk of the string to be scanned from the initial state of $M_1$. Therefore, upon reading a symbol $a$, $M$ appends $a$ to the end of the current chunk, but branches into two threads: extending the current chunk (the $\Delta^1_a$ term) for one, and starting a new chunk (the $\Delta^1_a H$ term) for the other.

Lemma 3.1. Suppose $w = a_1 \cdots a_l$ with $a_i \in \Sigma$ for $1 \le i \le l$. We have, for the DFA $M$ given in Theorem 3.1,
\[
\delta(s, w) = \sum_{\substack{w = w_1 \cdots w_k,\ 1 \le k \le l \\ w_j \ne \epsilon,\ 1 \le j \le k}}\ \sum_{i = 0, 1} \Delta^1_{w_1} H \Delta^1_{w_2} H \cdots \Delta^1_{w_k} H^i.
\]

Proof. We show that the conclusion holds by induction on the length of $w$.

(1) Suppose that $l = 1$ and $w = a_1$. Then by the definition of the DFA $M$ given in Theorem 3.1, we have
\[
\delta(s, a_1) = \Delta^1_{a_1} (H^0 + H^1) = \sum_{i = 0, 1} \Delta^1_{a_1} H^i.
\]
The conclusion holds.

(2) Suppose that the conclusion holds when $l = k - 1$, i.e.,
\[
\delta(s, a_1 a_2 \cdots a_{k-1}) = \sum_{\substack{a_1 \cdots a_{k-1} = w_1 \cdots w_h,\ 1 \le h \le k-1 \\ w_j \ne \epsilon,\ 1 \le j \le h}}\ \sum_{i = 0, 1} \Delta^1_{w_1} H \cdots \Delta^1_{w_h} H^i.
\]
Then when $l = k$ and $w = a_1 a_2 \cdots a_k$, we have
\[
\delta(s, a_1 a_2 \cdots a_k) = \delta(\delta(s, a_1 a_2 \cdots a_{k-1}), a_k) = \delta(s, a_1 a_2 \cdots a_{k-1})\, \Delta^1_{a_k} (H^0 + H^1).
\]
Next, we show that
\[
\delta(s, a_1 \cdots a_{k-1})\, \Delta^1_{a_k} (H^0 + H^1) = \sum_{\substack{w = w_1 \cdots w_h,\ 1 \le h \le k \\ w_j \ne \epsilon,\ 1 \le j \le h}}\ \sum_{i = 0, 1} \Delta^1_{w_1} H \cdots \Delta^1_{w_h} H^i.
\]
Let $L$ denote the left-hand side, $\delta(s, a_1 \cdots a_{k-1})\, \Delta^1_{a_k} (H^0 + H^1)$, and let $R$ denote the sum on the right-hand side.

Let $e$ be a term in $L$. Then $e = \Delta^1_{w_1} H \cdots \Delta^1_{w_h} H^i \Delta^1_{a_k} H^j$, where $i, j \in \{0, 1\}$ and $w_1 \cdots w_h = a_1 \cdots a_{k-1}$. If $i = 0$, then $e = \Delta^1_{w_1} H \Delta^1_{w_2} H \cdots \Delta^1_{w_h a_k} H^j$; take $w_h' = w_h a_k$, so that $w_1 \cdots w_{h-1} w_h' = a_1 \cdots a_{k-1} a_k$, which means $e$ is a term in $R$. If $i = 1$, then $e = \Delta^1_{w_1} H \Delta^1_{w_2} H \cdots \Delta^1_{w_h} H \Delta^1_{a_k} H^j$; take $w_{h+1} = a_k$, so that $w_1 \cdots w_h w_{h+1} = a_1 \cdots a_{k-1} a_k$, which means $e$ is a term in $R$. Hence, every term in $L$ is a term in $R$.

Let $e$ be a term in $R$. Then $e = \Delta^1_{w_1} H \cdots \Delta^1_{w_h} H^i$, where $w_1 \cdots w_h = w$. If $w_h = a_k$, then $e = \Delta^1_{w_1} H \cdots \Delta^1_{w_{h-1}} H \Delta^1_{a_k} H^i$ and $w_1 \cdots w_{h-1} = a_1 \cdots a_{k-1}$. By the induction hypothesis, $\Delta^1_{w_1} H \cdots \Delta^1_{w_{h-1}} H$ is a term in $\delta(s, a_1 \cdots a_{k-1})$. Thus, $e$ is a term in $L$. Otherwise, $w_h = w_h' a_k$ with $w_h' \ne \epsilon$. In this case $e = \Delta^1_{w_1} H \cdots H \Delta^1_{w_h'} \Delta^1_{a_k} H^i$ and $w_1 \cdots w_{h-1} w_h' = a_1 \cdots a_{k-1}$, so $\Delta^1_{w_1} H \cdots H \Delta^1_{w_h'}$ is a term in $\delta(s, a_1 \cdots a_{k-1})$. Thus, $e$ is a term in $L$. Therefore, every term in $R$ is a term in $L$.

Thus, when $l = k$, the conclusion holds. By induction, the conclusion holds for any $l \in \mathbb{N}$.

Proof of Theorem 3.1. First, $s \in F$ implies $\epsilon \in L(M)$. Suppose $w = a_1 a_2 \cdots a_l$. Then by Lemma 3.1, $w \in L(M)$ iff there exist $w_1, w_2, \ldots, w_k$ such that $w = w_1 w_2 \cdots w_k$ and
\[
I_1\, \Delta^1_{w_1} H \Delta^1_{w_2} H \cdots \Delta^1_{w_k} (H^0 + H^1)\, I^t_{F_1} = 1,
\]
i.e., $w_1, w_2, \ldots, w_k \in L(M_1)$. Therefore, $L(M) = (L(M_1))^*$.

Remark. The essential language operators associated with regular languages are union, concatenation, and Kleene star. Having addressed the matrix constructions for concatenation and Kleene star, we only need to note that the union (and intersection) construction is straightforward and is left as an exercise.

4. State Complexity

State complexity studies the minimal number of states needed for a given language operation as a function of the sizes of the underlying automata (Ref. 4). One general observation on the constructions given in Sections 2 and 3 is that we only need to keep track of the first rows of the respective matrices used for states, since their status of being a final state is determined by prefixing $I_1$ in a matrix multiplication.

Theorem 4.1.
Projecting to the first row by replacing $(A, B)$ systematically with $(I_1 A, I_1 B)$ for concatenation and replacing $A$ systematically with $I_1 A$ for Kleene star, we have:

(1) The number of reachable states for the concatenation construction given in Section 2 is $m 2^n - k 2^{n-1}$, where the first underlying DFA has $m$ states, the second has $n$ states, and $k$ is the number of final states of the first DFA.

(2) The number of reachable states for the Kleene star construction given in Section 3 is $2^{n-1} + 2^{n-k-1}$, where $n$ is the number of states of the underlying DFA and $k$ is the number of its non-initial final states.

We remark that these numbers are the lowest possible upper bounds, since they agree with the results in Ref. 4.

Proof. By replacing $(A, B)$ systematically with $(I_1 A, I_1 B)$ for concatenation and replacing $A$ systematically with $I_1 A$ for Kleene star, the construction $M$ of concatenation in Section 2 can be reduced to $M' = (Q', \Sigma, \delta', q_0', F')$ with
\[
\begin{aligned}
Q' &= \{(A, B) \mid A \in \mathbb{B}^m,\ B \in \mathbb{B}^n\},\\
q_0' &= (I^m_1 T^0,\ I^m_1 T) = I^m_1 q_0,\\
\delta'((A, B), a) &= (A, B)\,\Delta_a,\\
F' &= \{(A, B) \mid B I^t_{F_2} = 1\},
\end{aligned}
\]
and the construction $M$ of Kleene star in Section 3 can be reduced to $M'' = (Q'', \Sigma, \delta'', s, F'')$ with
\[
\begin{aligned}
Q'' &= \{A \mid A \in \mathbb{B}^n\} \cup \{s\},\\
\delta''(q, a) &= \begin{cases} I_1 \Delta^1_a (H^0 + H^1), & \text{if } q = s,\\ A \Delta^1_a (H^0 + H^1), & \text{if } q = A, \end{cases}\\
F'' &= \{A \mid A I^t_{F_1} = 1\} \cup \{s\}.
\end{aligned}
\]
In what follows, the state complexity of concatenation and Kleene star is obtained by using the equivalent constructions $M'$ and $M''$.

Concatenation. Let $k$ be the number of final states of $M_1$. Note that
\[
\delta'((A, B), a) = (A, B)\,\Delta_a = (A \Delta^1_a,\ A \Delta^1_a T + B \Delta^2_a),
\]
where $(A, B) = \delta'(q_0', w)$, $w \in \Sigma^*$. From the proof of Lemma 2.1, we know that $A = I^m_1 \Delta^1_w$, which means $A$ has exactly one entry equal to 1 among its $m$ bits, since $\Delta^1_w$ is row-stochastic (and so is $\Delta^1_{wa}$). This means that there are at most $m 2^n$ possible bit vectors of the form $(A \Delta^1_a,\ A \Delta^1_a T + B \Delta^2_a)$, where $m$ accounts for the variability of $A \Delta^1_a$ and $2^n$ for the variability of $A \Delta^1_a T + B \Delta^2_a$. However, not all $2^n$ combinations can be realized by $A \Delta^1_a T + B \Delta^2_a$: $A \Delta^1_a T$ is equal to $I^n_1$ if and only if $wa \in L(M_1)$. So the first entry of $B$ is always 1 whenever the position of $A$ corresponding to some state in $F_1$ is 1. In particular, we can never reach a state for which the entry of $A$ corresponding to a final state of $M_1$ is equal to 1 and the entry of $B$ corresponding to the start state of $M_2$ is equal to 0. There are $k 2^{n-1}$ states of this form. So the total number of reachable states in $M'$ is at most $m 2^n - k 2^{n-1}$.

Kleene star. Let $k$ be the number of non-initial final states of $M_1$. For nonempty $w \in \Sigma^*$ and $a \in \Sigma$, we have $\delta''(A, a) = A \Delta^1_a (H^0 + H^1)$, where $A = \delta''(s, w)$. Note that $A \Delta^1_a H = I_1$ if and only if $A \Delta^1_a I^t_{F_1} = 1$. This, in turn, happens if and only if $A \Delta^1_a$ has a 1 in some entry corresponding to a final state of $M_1$. But $\delta''(A, a)$ is the sum of $A \Delta^1_a$ and $A \Delta^1_a H$. In particular, this means that if any entry of $A \Delta^1_a$ corresponding to a final state of $M_1$ is equal to 1, then $A \Delta^1_a H = I_1$, and so the first entry of $A \Delta^1_a (H^0 + H^1)$ must be equal to 1 as well. Finally, because $A \Delta^1_a H$ is always equal to either 0 or $I_1$, we know that if any position except the first one in $A \Delta^1_a (H^0 + H^1)$ is nonzero, then the corresponding position in $A \Delta^1_a$ must also be nonzero. Putting these facts together, we conclude that the first entry of $\delta''(A, a)$ is always equal to 1 if any position corresponding to a final state is equal to 1. There are $2^{n-1}$ possibly reachable states in which there is a 1 in the first position, and $2^{n-k-1}$ possibly reachable states in which the first entry is 0 and the entry in the position corresponding to every element of $F_1$ is 0. Furthermore, we need to include the start state $s$ in the total number of states of the DFA. So the maximum number of reachable states in the DFA $M''$ is
\[
2^{n-1} + 2^{n-k-1} + 1 - 1 = 2^{n-1} + 2^{n-k-1},
\]
where the $+1$ accounts for $s$ and the $-1$ presumably discounts the all-zero vector, which is never reached because each $\Delta^1_a$ is row-stochastic.

5. Conclusion

With the constructions given, we see that operations on regular expressions can be directly translated to constructions on DFA. We obtained along the way a proof of the classical Kleene's Theorem avoiding the use of NFA (using Arden's Lemma in the other direction). Our Lemmas (2.1, 3.1) illustrated how laws of Boolean matrices capture language operations inductively and algebraically.
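
The first-row Kleene star construction of Section 4 and its state count can be explored with a short sketch. The code below is an illustration under assumptions, not the authors' implementation: a Boolean row vector is represented by the frozenset of positions holding a 1, the extra start state is the string `"s"`, all function names are hypothetical, and the 3-state example DFA (`DELTA`, `FINALS`) is chosen here for illustration.

```python
from itertools import product

def star_step(delta, finals, v, a):
    """One move v -> v Delta_a (H^0 + H^1) of the projected star DFA.

    v is the set of positions of the row vector that hold a 1.
    """
    u = {delta[p, a] for p in v}   # v Delta_a
    if u & finals:                 # v Delta_a H equals I_1 exactly in this case
        u.add(1)
    return frozenset(u)

def star_accepts(delta, finals, w):
    """Acceptance in the projected DFA; the start state s is itself final."""
    if not w:
        return True
    v = frozenset({1})             # on its first move, s acts like I_1
    for a in w:
        v = star_step(delta, finals, v, a)
    return bool(v & finals)        # A I_F^t = 1

def reachable_states(delta, finals, alphabet):
    """States reachable from s in the projected DFA, including s."""
    seen, todo = {"s"}, ["s"]
    while todo:
        q = todo.pop()
        v = frozenset({1}) if q == "s" else q
        for a in alphabet:
            r = star_step(delta, finals, v, a)
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def in_star(delta, finals, w):
    """Reference test for w in L(M1)*: dynamic programming over split points."""
    def m1_accepts(x):
        p = 1
        for a in x:
            p = delta[p, a]
        return p in finals
    ok = [True] + [False] * len(w)
    for j in range(1, len(w) + 1):
        ok[j] = any(ok[i] and m1_accepts(w[i:j]) for i in range(j))
    return ok[len(w)]

# Assumed example: n = 3 states, final set {3}, hence k = 1.
DELTA = {(1, "a"): 2, (2, "a"): 3, (3, "a"): 1,
         (1, "b"): 1, (2, "b"): 3, (3, "b"): 1}
FINALS = frozenset({3})

def check_star(max_len):
    """Compare the projected DFA with the reference test up to length max_len."""
    return all(star_accepts(DELTA, FINALS, "".join(w)) ==
               in_star(DELTA, FINALS, "".join(w))
               for length in range(max_len + 1)
               for w in product("ab", repeat=length))
```

For this assumed example, `reachable_states(DELTA, FINALS, "ab")` can be compared with the bound $2^{n-1} + 2^{n-k-1} = 2^2 + 2^1 = 6$ of Theorem 4.1. The sketch counts reachable states only and does not test whether they are pairwise distinguishable.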
The natural constructions using matrix systems are also optimal in the usage of states. Our approach does not depend on the deterministic nature of the underlying automata until the topic of state complexity. Barring the use of ε-edges, our constructions work for NFA, possibly also informing the study of state complexity for NFA in Ref. 5.

References

1. G.-Q. Zhang, Inform. Comput. 152(1), 138 (1999).
2. D. Kozen, Inform. Comput. 110, 366 (1994).
3. J. A. Brzozowski, J. Assoc. Comput. Mach. 11, 481 (1964).
4. S. Yu, Q. Zhuang, K. Salomaa, Theor. Comput. Sci. 125, 315 (1994).
5. S. Yu, Fundam. Inform. 64, 471 (2005).