Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,


Automata & languages. A primer on the Theory of Computation. Laurent Vanbever (www.vanbever.eu), ETH Zürich (D-ITET), September 24, 2015.

Last week was all about Deterministic Finite Automata (DFA).

We saw three main concepts: regular languages, the formal definition of a finite automaton, and closure.

Regular Language: A language L is regular if some finite automaton recognizes it.

Formal definition: A finite automaton is a 5-tuple (Q, Σ, δ, q0, F), where Q is the set of states, Σ is the alphabet, δ is the transition function, q0 is the start state and F is the set of accept states.

Closure: If L1 and L2 are regular, then so are L1 ∪ L2, L1 ∩ L2, the complement of L1, L1 − L2 and L1 ⊕ L2.

We still need to prove that regular languages are closed under concatenation and Kleene*

Finite Automata, Thu Sept 24. Part 1: Closure (the end). Part 2: Equivalence of DFA, NFA and Regular Expression.

Back to concatenation. Theorem: The class of regular languages is closed under concatenation. If L1 and L2 are regular languages, then so is L1 ∘ L2. Question: Can we apply the same trick as with the "Unioner" M, which simultaneously runs M1 and M2 and accepts if either M1 or M2 accepts? The difficulty is that M would have to accept if the input can be split into two pieces, the first accepted by M1 and the following one by M2, and M does not know in advance where that split is.

Let's play a bit. First, find the DFAs for L1 = { x ∈ {0,1}* | x is any string } and L2 = { x ∈ {0,1}* | x is composed of one or more 0s }. Then, find the DFA for L1 ∘ L2 = { x ∈ {0,1}* | x is any string that ends with at least one 0 }.
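The description of L1 ∘ L2 given here can be sanity-checked by brute force. The sketch below is only an illustration: the helper names in_L1, in_L2 and in_concat are hypothetical, and the check covers strings up to length 6 rather than proving anything. It tries every split point, which is exactly the step a machine reading left to right cannot obviously perform.

```python
from itertools import product

def in_L1(x):
    """L1: any bit-string (including the empty string)."""
    return set(x) <= {"0", "1"}

def in_L2(x):
    """L2: one or more 0s and nothing else."""
    return len(x) >= 1 and set(x) == {"0"}

def in_concat(x):
    """x is in the concatenation iff SOME split x = yz has y in L1 and z in L2."""
    return any(in_L1(x[:i]) and in_L2(x[i:]) for i in range(len(x) + 1))

# Compare with the closed form "ends with at least one 0" on all short strings.
words = ["".join(bits) for n in range(7) for bits in product("01", repeat=n)]
mismatches = [w for w in words if in_concat(w) != w.endswith("0")]
print(mismatches)  # an empty list means both descriptions agree up to length 6
```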

Introduction to Nondeterministic Finite Automata. So far we have mostly played with Deterministic Finite Automata (DFA): every step of a computation follows from the preceding one in a unique way. Nondeterministic Finite Automata (NFA) generalize DFAs: several choices may exist for the next state at any point. We have already seen one example of an NFA: one in which not all transitions were defined. Now, let's dive deeper.

Introduction to Nondeterministic Finite Automata. The static picture of an NFA is a graph whose edges are labeled by symbols of Σ and by ε (together called Σε), with start vertex q0 and accept states F. Example: [Figure: NFA state diagram with edge labels 0, 1, ε, "0,1" and 1]

NFA: Formal Definition. Definition: A nondeterministic finite automaton (NFA) is encapsulated by M = (Q, Σ, δ, q0, F) in the same way as an FA, except that Σε is now the alphabet instead of Σ, and δ has a different domain and co-domain: δ: Q × Σε → P(Q). Here P(Q) is the power set of Q, so that δ(q,a) is the set of all endpoints of edges from q which are labeled by a. Any state can have 0, 1 or more transitions for each symbol of the alphabet, including ε-transitions! Think of ε-transitions as "free": they do not require reading an input symbol.

Formal Definition of an NFA: Dynamic. Just as with FAs, there is an implicit auxiliary tape containing the input string which is operated on by the NFA. As opposed to FAs, NFAs are parallel machines: they are able to be in several states at any given instant. The NFA reads the tape from left to right, with each new character causing the NFA to go into another set of states. When the string is completely read, it is accepted if and only if the NFA's final configuration contains an accept state.

Let's play. Let's illustrate the computation performed by this NFA when the string 010110 is given as input. Bonus question: what is the language recognized by the automaton?

Formal Definition of an NFA: Dynamic. A string u is accepted by an NFA M iff there exists a path starting at q0 which is labeled by u and ends in an accept state. The language accepted by M is the set of all strings which are accepted by M and is denoted by L(M). As following an edge labeled ε is free (no input symbol is read), you should delete all ε's when computing the label of a path.
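The "parallel machine" view of acceptance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary encoding of δ, the use of the empty string as the ε label, and the three-edge example NFA (for the language 0*1*) are all choices made for this illustration, not taken from the lecture's diagrams.

```python
EPS = ""  # label used for ε-edges (an assumption of this sketch)

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(delta, start, accept, word):
    """Track the set of active states while reading `word` left to right."""
    active = eps_closure({start}, delta)
    for symbol in word:
        moved = set()
        for q in active:
            moved |= delta.get((q, symbol), set())  # missing entry = no edge
        active = eps_closure(moved, delta)
    return bool(active & accept)

# Hypothetical NFA for 0*1*: loop on 0 in state "a", free move to "b", loop on 1.
delta = {
    ("a", "0"): {"a"},
    ("a", EPS): {"b"},
    ("b", "1"): {"b"},
}
print(accepts(delta, "a", {"b"}, "0011"))  # True
print(accepts(delta, "a", {"b"}, "10"))    # False
```

An accepting path exists exactly when the final set of active states contains an accept state, so tracking sets of states gives the same answer as searching over all paths.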

Example M4: [Figure: NFA with states q0, q1, q2, q3 and edge labels 0, "0,1", ε, 1 and 1] Question: Which of the following strings are accepted? 1. ε 2. 0 3. 1 4. 0111 5. 0011. Bonus question: what is the language recognized by the automaton?

NFAs and Regular Operations. We will study how NFAs interact with regular operations. We will use the following schematic drawing for a general NFA. The red circle stands for the start state q0, the green portion represents the accept states F, and the other states are gray.

NFA: Union. The union A ∪ B is formed by putting the automata A and B in parallel. Create a new start state and connect it to the former start states using ε-edges.

Remember our union example? Compare the DFA obtained using the Cartesian Product Construction: [Figure: product DFA with edges labeled 0 and 1]

with this NFA, which recognizes the same language L = {x has even length} ∪ {x ends with 11}: [Figure: NFA with states a, b, c, d, e, f and edge labels ε, 0, 1 and "0,1"]

Now, let's deal with concatenation! The concatenation A ∘ B is formed by putting the automata in series. The start state comes from A while the accept states come from B. A's accept states are turned off and connected via ε-edges to B's start state.

Concatenation Example: L = {x has even length} ∘ {x ends with 11}. [Figure: NFA with states b, c, d, e, f and edge labels 0, 1, ε and "0,1"] This example is somewhat questionable. Why?

Now let's go back to our first example. Given the DFAs for L1 = { x ∈ {0,1}* | x is any string } and L2 = { x ∈ {0,1}* | x is composed of one or more 0s }, draw the NFA for L1 ∘ L2 = { x ∈ {0,1}* | x is any string that ends with at least one 0 }.

NFAs: Kleene-+. The Kleene-+ A+ is formed by creating a feedback loop. The accept states connect to the start state via ε-edges.

Kleene-+ Example: L = { x is a streak of one or more 1s followed by a streak of two or more 0s }+

NFAs: Kleene-*. The construction follows from the Kleene-+ construction, using the fact that A* is the union of A+ with the empty string. Just create the Kleene-+ automaton and add a new start state, which is also an accept state, connected to the old start state with an ε-edge.

Closure of NFA under Regular Operations. The constructions above all show that NFAs are constructively closed under the regular operations. Theorem: If L1 and L2 are accepted by NFAs, then so are L1 ∪ L2, L1 ∘ L2, L1+ and L1*. The accepting NFAs can be constructed in linear time. If we can show that all NFAs can be converted into FAs, this will show that FAs, and hence regular languages, are closed under the regular operations.
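The four constructions can be sketched directly in Python. This is an illustrative sketch only: the tuple encoding (states, delta, start, accept), the state-renaming helper tag, the fresh state name "start" and the one-symbol machine symbol are all assumptions of this example. Each construction only adds a constant number of states and ε-edges per accept state.

```python
EPS = ""  # label for ε-edges (assumption of this sketch)

def symbol(c):
    """Two-state NFA accepting exactly the one-letter string c."""
    return {0, 1}, {(0, c): {1}}, 0, {1}

def tag(nfa, i):
    """Rename every state q to (i, q) so that two machines cannot clash."""
    states, delta, start, accept = nfa
    new_delta = {((i, q), a): {(i, r) for r in targets}
                 for (q, a), targets in delta.items()}
    return {(i, q) for q in states}, new_delta, (i, start), {(i, q) for q in accept}

def add_eps(delta, src, dst):
    delta.setdefault((src, EPS), set()).add(dst)

def union(A, B):
    (QA, dA, sA, FA), (QB, dB, sB, FB) = tag(A, 0), tag(B, 1)
    delta = {**dA, **dB}
    add_eps(delta, "start", sA)          # new start state, run A and B in parallel
    add_eps(delta, "start", sB)
    return QA | QB | {"start"}, delta, "start", FA | FB

def concat(A, B):
    (QA, dA, sA, FA), (QB, dB, sB, FB) = tag(A, 0), tag(B, 1)
    delta = {**dA, **dB}
    for f in FA:                         # A's accept states are turned off
        add_eps(delta, f, sB)            # and wired to B's start state
    return QA | QB, delta, sA, FB

def plus(A):
    Q, d, s, F = tag(A, 0)
    for f in F:                          # feedback loop
        add_eps(d, f, s)
    return Q, d, s, F

def star(A):
    Q, d, s, F = plus(A)
    add_eps(d, "start", s)               # new start state that also accepts ε
    return Q | {"start"}, d, "start", F | {"start"}

def accepts(nfa, word):
    _, delta, start, accept = nfa
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, EPS), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    active = closure({start})
    for c in word:
        moved = set()
        for q in active:
            moved |= delta.get((q, c), set())
        active = closure(moved)
    return bool(active & accept)

# One or more 1s followed by two or more 0s, then Kleene-+ of that block.
one, zero = symbol("1"), symbol("0")
block = concat(plus(one), concat(zero, plus(zero)))
L = plus(block)
print(accepts(L, "1100011000"))    # True
print(accepts(L, "10"))            # False
print(accepts(star(block), ""))    # True
```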

NFA → FA?!? The regular languages were defined to be the languages accepted by FAs, which are, by default, deterministic. It would be nice if NFAs could be "determinized" and converted to FAs. If so, we could prove that regular languages are closed under the regular operations. Let's try this next.

NFAs have 3 types of nondeterminism:

Nondeterminism type   Machine analog   δ-function         Easy to fix?      Formally
Under-determined      Crash            No output          yes, fail-state   |δ(q,a)| = 0
Over-determined       Random choice    Multivalued        no                |δ(q,a)| > 1
ε                     Pause reading    Redefine alphabet  no                |δ(q,ε)| > 0

Determinizing NFAs: Example. Idea: convert the NFA into an equivalent DFA that simulates it. In the DFA, we keep track of all possible active states as the input is being read. If, at the end, one of the active states is an accept state, the input is accepted.

One-Slide-Recipe to Derandomize.
Instead of the states in the NFA, we consider the power-states in the FA. (If the NFA has n states, the FA has 2^n states.)
First we figure out which power-states will reach which power-states in the FA, using the rules of the NFA.
Then we must add all epsilon-edges: we redirect pointers that are initially pointing to power-state {a,b,c} to power-state {a,b,c,d,e,f} iff there is an epsilon-edge-only path pointing from any of the states a,b,c to states d,e,f (transitive closure).
We do the same for the starting state: starting state of DFA = {starting state of NFA + all NFA states that can be reached from there by following only epsilon-edges}.
The FA's accepting states are all power-states that include an accepting NFA state.
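The recipe can be sketched as follows. This is a hedged illustration, not the lecture's formal construction: the dictionary encoding, the function names and the small three-state NFA are assumptions made for the example, and only the power-states reachable from the start are built (so fewer than 2^n states may appear).

```python
from collections import deque

EPS = ""  # label for ε-edges (assumption of this sketch)

def eps_closure(states, delta):
    """Transitive closure over ε-edges, returned as a hashable power-state."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def determinize(delta, start, accept, alphabet):
    """Subset construction restricted to reachable power-states."""
    d_start = eps_closure({start}, delta)
    d_delta, todo, seen = {}, deque([d_start]), {d_start}
    while todo:
        S = todo.popleft()
        for a in alphabet:
            moved = set()
            for q in S:
                moved |= set(delta.get((q, a), ()))
            T = eps_closure(moved, delta)       # add all epsilon-edges
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    d_accept = {S for S in seen if S & accept}  # contains an accepting NFA state
    return seen, d_delta, d_start, d_accept

def dfa_accepts(dfa, word):
    _, d_delta, state, d_accept = dfa
    for a in word:
        state = d_delta[(state, a)]
    return state in d_accept

# Hypothetical NFA over {a, b}: states 1, 2, 3; start 1; accept {1}.
nfa_delta = {
    (1, "b"): {2}, (1, EPS): {3},
    (2, "a"): {2, 3}, (2, "b"): {3},
    (3, "a"): {1},
}
dfa = determinize(nfa_delta, 1, {1}, "ab")
print(len(dfa[0]))              # 6 reachable power-states (out of 2^3 = 8)
print(sorted(dfa[2]))           # [1, 3]
print(dfa_accepts(dfa, "baa"))  # True
```

The empty power-state plays the role of the fail state: once no NFA state is active, the DFA stays there forever.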

Determinizing NFAs: Example. Let's derandomize the following NFA: [Figure: NFA with states 1, 2, 3 and edge labels b, ε, a, a and "a,b"]

Remarks. The previous recipe can be made totally formal. More details can be found in the reading material. Just following the recipe will often produce an FA that is more complicated than necessary. Sometimes obvious simplifications can be made. In general, however, simplifying is not an easy task.

NFA → FA. Summary: Starting from any NFA, we can use the subset construction and the epsilon-transitive-closure to find an equivalent FA accepting the same language. Thus, Theorem: If L is any language accepted by an NFA, then there exists a constructible (deterministic) FA which also accepts L. Corollary: The class of regular languages is closed under the regular operations. Proof: Since NFAs are closed under regular operations, and FAs are by default also NFAs, we can apply the regular operations to any FAs and determinize at the end to obtain an FA accepting the language defined by the regular operations.

Back to Regular Expressions (REX). We just saw that DFAs and NFAs actually have the same expressive power: they are equivalent. What about regular expressions?

Back to Regular Expressions (REX). Regular expressions make it possible to write down a sequence of regular operations symbolically, and so provide a way of generating new languages from old ones. For example, to generate the regular language {banana,nab}* from the atomic languages {a}, {b} and {n} we could do the following: (({b} ∘ {a} ∘ {n} ∘ {a} ∘ {n} ∘ {a}) ∪ ({n} ∘ {a} ∘ {b}))*. Regular expressions specify the same in a more compact form: (banana ∪ nab)*

Regular Expressions (REX). Definition: The set of regular expressions over an alphabet Σ and the languages in Σ* which they generate are defined recursively.
Base cases: Each symbol a ∈ Σ as well as the symbols ε and Ø are regular expressions:
a generates the atomic language L(a) = {a}
ε generates the language L(ε) = {ε}
Ø generates the empty language L(Ø) = { } = Ø
Inductive cases: if r1 and r2 are regular expressions, so are r1 ∪ r2, (r1)(r2), (r1)* and (r1)+:
L(r1 ∪ r2) = L(r1) ∪ L(r2), so r1 ∪ r2 generates the union
L((r1)(r2)) = L(r1) ∘ L(r2), so (r1)(r2) is the concatenation
L((r1)*) = L(r1)*, so (r1)* represents the Kleene-*
L((r1)+) = L(r1)+, so (r1)+ represents the Kleene-+

Regular Expressions: Table of Operations including UNIX

Operation        Notation     Language          UNIX
Union            r1 ∪ r2      L(r1) ∪ L(r2)     r1|r2
Concatenation    (r1)(r2)     L(r1) ∘ L(r2)     (r1)(r2)
Kleene-*         (r)*         L(r)*             (r)*
Kleene-+         (r)+         L(r)+             (r)+
Exponentiation   (r)^n        L(r)^n            (r){n}

Regular Expressions: Simplifications. Just as algebraic formulas can be simplified by using fewer parentheses when the order of operations is clear, regular expressions can be simplified. Using the pure definition of regular expressions to express the language {banana,nab}* we would be forced to write something nasty like ((((b)(a))(n))(((a)(n))(a)) ∪ (((n)(a))(b)))*. Using the operator precedence ordering *, ∘, ∪ and the associativity of ∘ allows us to obtain the simpler: (banana ∪ nab)*. This is done in the same way as one would simplify the algebraic expression with re-ordering disallowed: ((((b)(a))(n))(((a)(n))(a))+(((n)(a))(b)))^4 = (banana+nab)^4

Regular Expressions: Example. Question: Find a regular expression that generates the language consisting of all bit-strings which contain a streak of seven 0s or contain two disjoint streaks of three 1s.
Legal: 010000000011010, 01110111001, 111111
Illegal: 11011010101, 10011111001010, 00000100000
Answer: (0 ∪ 1)*(0^7 ∪ 1^3(0 ∪ 1)*1^3)(0 ∪ 1)*
An even briefer valid answer is: Σ*(0^7 ∪ 1^3 Σ* 1^3)Σ*
The official answer using only the standard regular operations is: (0 ∪ 1)*(0000000 ∪ 111(0 ∪ 1)*111)(0 ∪ 1)*
A brief UNIX answer is: (0|1)*(0{7}|1{3}(0|1)*1{3})(0|1)*
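The UNIX-style answer can be tried out with Python's re module, whose syntax agrees with the UNIX notation for the operators used here (|, *, {n}). This is only a spot check on the six strings listed in the example; the helper name in_language is hypothetical.

```python
import re

# The UNIX-style answer, copied from the example.
PATTERN = re.compile(r"(0|1)*(0{7}|1{3}(0|1)*1{3})(0|1)*")

def in_language(word):
    """fullmatch: the whole string must be generated by the expression."""
    return PATTERN.fullmatch(word) is not None

legal = ["010000000011010", "01110111001", "111111"]
illegal = ["11011010101", "10011111001010", "00000100000"]

print([in_language(w) for w in legal])    # [True, True, True]
print([in_language(w) for w in illegal])  # [False, False, False]
```

Using fullmatch rather than search matters: the expression is meant to generate the entire string, not just a substring of it.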

Regular Expressions: Examples
1) 0*10*
2) (ΣΣ)*
3) 1*Ø
4) Σ = {0,1}, {w | w has at least one 1}
5) Σ = {0,1}, {w | w starts and ends with the same symbol}
6) {w | w is a numerical constant with sign and/or fractional part}, e.g. 3.1415, -.001, +2000

REX → NFA. Since NFAs are closed under the regular operations, we immediately get: Theorem: Given any regular expression r, there is an NFA N which simulates r. That is, the language accepted by N is precisely the language generated by r, so that L(N) = L(r). The NFA is constructible in linear time.

REX → NFA. Proof: The proof works by induction, using the recursive definition of regular expressions. First we need to show how to accept the base case regular expressions a ∈ Σ, ε and Ø. These are respectively accepted by the NFAs: a start state q0 with a single a-edge to an accept state q1; a single start state q0 that is also an accept state; a single start state q0 that is not an accept state. Finally, we need to show how to inductively accept regular expressions formed by using the regular operations. These are just the constructions that we saw before (union, concatenation, Kleene-+ and Kleene-*).

REX → NFA exercise: Find an NFA for ab ∪ a

REX → NFA → FA → REX. We are one step away from showing that FAs ≈ NFAs ≈ REXs; i.e., all three representations are equivalent. We will be done when we can complete the circle of transformations: REX → NFA → FA → REX.

NFA → REX is simple?!? Then FA → REX is even simpler! Please solve this simple example: [Figure: FA with edges labeled 0 and 1]

REX → NFA → FA → REX. In converting NFAs to REXs, we'll introduce the most general notion of an automaton, the so-called "Generalized NFA" or "GNFA". In converting into REXs, we'll first go through a GNFA: NFA (or FA) → GNFA → REX.

GNFAs. Definition: A generalized nondeterministic finite automaton (GNFA) is a graph whose edges are labeled by regular expressions, with a unique start state with in-degree 0 but arrows to every other state, and a unique accept state with out-degree 0 but arrows from every other state (note that accept state ≠ start state), and an arrow from any state to any other state (including itself). A GNFA accepts a string s if there exists a path p from the start state to the accept state such that s is an element of the language generated by the regular expression obtained by concatenating all labels of the edges in p. The language accepted by a GNFA consists of all the strings accepted by the GNFA.

GNFA Example: [Figure: GNFA with states a, b, c whose edge labels include 000, 0, ε and (0110 ∪ 1001)*] This is a GNFA because edges are labeled by REXs, the start state has no in-edges, and the unique accept state has no out-edges. Convince yourself that 000000100101100110 is accepted.

NFA → REX conversion process
1. Construct a GNFA from the NFA.
   A. If there is more than one arrow from one state to another, unify them using ∪.
   B. Create a unique start state with in-degree 0.
   C. Create a unique accept state with out-degree 0.
   D. If there is no arrow from one state to another, insert one with label Ø.
2. Loop: as long as the GNFA has strictly more than 2 states, rip out an arbitrary interior state and modify the edge labels. The loop ends with a single edge: start --r--> accept.
3. The answer is the unique label r.

NFA → REX: Ripping Out. Ripping out is done as follows. Suppose you want to rip out the middle state v, where (for every pair of neighbors u, w) the edge (u,v) is labeled r1, the loop (v,v) is labeled r2, the edge (v,w) is labeled r3 and the edge (u,w) is labeled r4. Then you need to recreate all the lost possibilities from u to w. I.e., to the current REX label r4 of the edge (u,w) you should add the concatenation of the (u,v) label r1, followed by the (v,v)-loop label r2 repeated arbitrarily often, followed by the (v,w) label r3. The new (u,w) label is therefore: r4 ∪ r1(r2)*r3
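The ripping-out rule r4 ∪ r1(r2)*r3 can be turned into a short state-elimination sketch. Everything about the encoding is an assumption of this illustration: labels are Python re strings, None stands for Ø, the empty string stands for ε, the names "start*" and "accept*" are made-up fresh states, and the two-state example automaton (even number of 1s) is hypothetical. The resulting expression is checked against a direct count on all strings up to length 6.

```python
import re
from itertools import product

def r_union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"

def r_concat(r, s):
    return None if r is None or s is None else r + s

def r_star(r):
    return "" if r in (None, "") else f"(?:{r})*"   # Ø* = ε* = ε

def to_regex(states, edges, start, accepts):
    """edges maps (u, v) to a regex label; missing pairs mean Ø."""
    g = dict(edges)
    S, A = "start*", "accept*"          # unique new start and accept states
    g[(S, start)] = ""                  # ε-edge into the old start state
    for f in accepts:
        g[(f, A)] = ""                  # ε-edge from each old accept state
    nodes = list(states)
    for v in list(states):              # rip out interior states one by one
        nodes.remove(v)
        loop = r_star(g.get((v, v)))
        for u in nodes + [S]:
            for w in nodes + [A]:
                through = r_concat(r_concat(g.get((u, v)), loop), g.get((v, w)))
                new = r_union(g.get((u, w)), through)   # r4 ∪ r1(r2)*r3
                if new is not None:
                    g[(u, w)] = new
        g = {e: r for e, r in g.items() if v not in e}
    return g.get((S, A))

# Hypothetical DFA: state "e" = even number of 1s so far, "o" = odd.
edges = {("e", "e"): "0", ("e", "o"): "1", ("o", "o"): "0", ("o", "e"): "1"}
rex = to_regex(["e", "o"], edges, "e", {"e"})

words = ["".join(b) for n in range(7) for b in product("01", repeat=n)]
ok = all((re.fullmatch(rex, w) is not None) == (w.count("1") % 2 == 0)
         for w in words)
print(ok)  # True
```

The expression produced this way is correct but rarely the shortest one; the order in which states are ripped out changes its shape, not its language.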

FA → REX: Example

FA → REX: Exercise

Summary: FA ≈ NFA ≈ REX. This completes the demonstration that the three methods of describing regular languages are equivalent: 1. Deterministic FAs 2. NFAs 3. Regular Expressions.

Remark about Automaton Size. Creating an automaton of small size is often advantageous: it allows for simpler/cheaper hardware, or better exam grades. Designing/minimizing automata is therefore a fun sport. Example: [Figure: automaton with states a, b, c, d, e, f, g and edges labeled 0, 1 and "0,1"]

Minimization. Definition: An automaton is irreducible if it contains no useless states, and no two distinct states are equivalent. By following these two rules (remove useless states, merge equivalent states), you can arrive at an irreducible FA. Generally, such a local minimum does not have to be a global minimum. It can be shown, however, that these minimization rules actually produce the global minimum automaton. The idea is that two prefixes u, v are indistinguishable iff for all suffixes x, ux ∈ L iff vx ∈ L. If u and v are distinguishable, they cannot end up in the same state. Therefore the number of states must be at least as large as the number of pairwise distinguishable prefixes.
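The two minimization rules can be sketched as a partition refinement. This is one common way to do it (Moore-style refinement), not necessarily the procedure used in the lecture; reading "useless" as "unreachable from the start state" is an assumption, and the four-state DFA for "ends with 0" is a hypothetical example.

```python
def minimize(alphabet, delta, start, accept):
    """Return a map from each reachable state to its equivalence class label."""
    # Rule 1: drop useless states (assumed here to mean unreachable ones).
    reach, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reach:
                reach.add(r)
                todo.append(r)
    # Rule 2: merge equivalent states. Start from accept / non-accept and
    # split classes until no symbol distinguishes two states of one class.
    block = {q: (q in accept) for q in reach}
    while True:
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in reach}
        if len(set(sig.values())) == len(set(block.values())):
            return block            # no class was split: partition is stable
        block = sig

# Hypothetical DFA for "ends with 0": A and C behave identically, D is unreachable.
delta = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "B", ("B", "1"): "C",
    ("C", "0"): "B", ("C", "1"): "C",
    ("D", "0"): "A", ("D", "1"): "D",
}
classes = minimize("01", delta, "A", {"B"})
print(len(set(classes.values())))      # 2 states suffice
print(classes["A"] == classes["C"])    # True: A and C are merged
```

The two surviving classes correspond to the two pairwise distinguishable kinds of prefixes: those ending in 0 and all others.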

Three tough languages
1) L1 = {0^n 1^n | n ≥ 0}
2) L2 = {w | w has an equal number of 0s and 1s}
3) L3 = {w | w has an equal number of occurrences of 01 and 10 as substrings}
In order to fully understand regular languages, we also must understand their limitations!
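Not all three languages are equally tough. A small brute-force experiment (evidence, not a proof) suggests that membership in L3 depends only on the first and last symbol, which a finite automaton can remember. The function names below are hypothetical and the check only covers strings up to length 10.

```python
from itertools import product

def in_L3(w):
    """Equal number of occurrences of 01 and 10 as substrings."""
    return w.count("01") == w.count("10")

def same_ends(w):
    """Empty string, or first and last symbol coincide."""
    return w == "" or w[0] == w[-1]

words = ["".join(b) for n in range(11) for b in product("01", repeat=n)]
disagreements = [w for w in words if in_L3(w) != same_ends(w)]
print(disagreements)  # an empty list: both tests agree on every string tried
```

The intuition is that every 01 must be followed by a 10 before the next 01 can occur, so the two counts differ by at most one. No comparably simple test is available for L1 and L2, which require counting without bound.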