Java II Finite Automata I

Similar documents
Java II Finite Automata I

Lecture 3: Nondeterministic Finite Automata

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Automata and Languages

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Chapter 2: Finite Automata

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

Nondeterministic Finite Automata

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Lecture 1: Finite State Automaton

Nondeterministic Finite Automata

Finite Automata. BİL405 - Automata Theory and Formal Languages 1

Nondeterministic Finite Automata. Nondeterminism Subset Construction

Classes and conversions

Let us first give some intuitive idea about a state of a system and state transitions before describing finite automata.

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Theory of Computation (I) Yijia Chen Fudan University

Finite Automata. Dr. Neil T. Dantam. Fall CSCI-561, Colorado School of Mines. Dantam (Mines CSCI-561) Finite Automata Fall / 35

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Non-deterministic Finite Automata (NFAs)

Finite Automata. Seungjin Choi

September 7, Formal Definition of a Nondeterministic Finite Automaton

Deterministic Finite Automata (DFAs)

Theory of Computation

Chapter 5. Finite Automata

Deterministic Finite Automata (DFAs)

CS 455/555: Finite automata

Inf2A: Converting from NFAs to DFAs and Closure Properties

Regular Language Equivalence and DFA Minimization. Equivalence of Two Regular Languages DFA Minimization

Sri vidya college of engineering and technology

Notes on State Minimization

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

NFA and regex. the Boolean algebra of languages. regular expressions. Informatics 1 School of Informatics, University of Edinburgh

Nondeterministic Finite Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Constructions on Finite Automata

Chapter 6: NFA Applications

Finite-State Transducers

Chapter Five: Nondeterministic Finite Automata

Deterministic Finite Automaton (DFA)

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Chapter 2: Finite Automata

CS 154, Lecture 3: DFA NFA, Regular Expressions

Finite Automata. Dr. Neil T. Dantam. Fall CSCI-561, Colorado School of Mines. Dantam (Mines CSCI-561) Finite Automata Fall / 43

Finite Automata. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

CS21 Decidability and Tractability

Deterministic Finite Automata (DFAs)

CS243, Logic and Computation Nondeterministic finite automata

Nondeterministic finite automata

Finite-State Machines (Automata) lecture 12

Finite Automata. Finite Automata

Extended transition function of a DFA

Equivalence of DFAs and NFAs

Automata and Formal Languages - CM0081 Finite Automata and Regular Expressions

3515ICT: Theory of Computation. Regular languages

CSE 105 Theory of Computation Professor Jeanne Ferrante

Theory of Computation

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata (cont )

Automata and Formal Languages - CM0081 Non-Deterministic Finite Automata

Constructions on Finite Automata

Decision, Computation and Language

Automata Theory for Presburger Arithmetic Logic

Before we show how languages can be proven not regular, first, how would we show a language is regular?

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

컴파일러입문 제 3 장 정규언어

CSE 135: Introduction to Theory of Computation Optimal DFA

Author: Vivek Kulkarni ( )

September 11, Second Part of Regular Expressions Equivalence with Finite Aut

Chap. 1.2 NonDeterministic Finite Automata (NFA)

Finite State Automata

Lecture 4 Nondeterministic Finite Accepters

Homomorphisms and Efficient State Minimization

Page 1. Compiler Lecture Note, (Regular Language) 컴파일러입문 제 3 장 정규언어. Database Lab. KHU

Lecture Notes On THEORY OF COMPUTATION MODULE -1 UNIT - 2

Uses of finite automata

Nondeterministic Finite Automata

Nondeterminism and Epsilon Transitions

Finite Automata and Regular Languages

Nondeterminism. September 7, Nondeterminism

Closure under the Regular Operations

Theory of Languages and Automata

Non-Deterministic Finite Automata

UNIT-III REGULAR LANGUAGES

Theory of Computation (II) Yijia Chen Fudan University

CS21 Decidability and Tractability

Finite Automata and Regular Languages (part III)

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

Intro to Theory of Computation

Finite Universes. L is a fixed-length language if it has length n for some

CS 154 Formal Languages and Computability Assignment #2 Solutions

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Finite Automata. Mahesh Viswanathan

Recitation 2 - Non Deterministic Finite Automata (NFA) and Regular OctoberExpressions

CS 581: Introduction to the Theory of Computation! Lecture 1!

Transcription:

Java II Finite Automata I Bernd Kiefer Bernd.Kiefer@dfki.de Deutsches Forschungszentrum für künstliche Intelligenz November, 23

Processing Regular Expressions We already learned about Java s regular expression functionality Now we get to know the machinery behind Pattern and Matcher classes Compiling a regular expression into a Pattern object produces a Finite Automaton This automaton is then used to perform the matching tasks We will see how to construct a finite automaton that recognizes an input string, i.e., tries to find a full match

Definition: Finite Automaton A finite automaton (FA) is a tuple A =< Q, Σ, δ, q, F > Q a finite non-empty set of states Σ a finite alphabet of input letters δ a (total) transition function Q Σ Q q Q the initial state F Q the set of final (accepting) states Transition graphs (diagrams): initial state states transition final state d o g q q

Finite Automata: Matching A finite automaton accepts a given input string s if there is a sequence of states p, p 2,..., p s Q such that. p = q, the start state 2. δ(p i, s i ) = p i+, where s i is the i-th character in s 3. p s F, i.e., a final state A string is successfully matched if we have found the appropriate sequence of states Imagine the string on an input tape with a pointer that is advanced when using a δ transition The set of strings accepted by an automaton is the accepted language, analogous to regular expressions

(Non)deterministic Automata in the definition of automata, δ was a total function given an input string, the path through the automaton is uniquely determined those automata are therefore called deterministic for nondeterministic FA, δ is a transition relation δ : Q Σ {} P(Q), where P(Q) is the powerset of Q allows transitions from one state into several states with the same input symbol need not be total can have transitions labeled (not in Σ), which represents the empty string

RegExps Automata Construct nondeterminstic automata from regular expressions (αβ) q α... q f α q β... q f β (α β) q q α... q f α q β... q f β q f (α) q q α... q f α q f

NFA vs. DFA Traversing a DFA is easy given the input string: the path is uniquely determined In contrast, traversing an NFA requires keeping track of a set of (current) states, starting with the set {q o } Processing the next input symbol means taking all possible outgoing transitions from this set and collecting the new set From every NFA, an equivalent DFA (one which does accept the same language), can be computed Basic Idea: track the subsets that can be reached for every possible input

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

Traversing an NFA 2 a 3 4 b 5 6 7 a 8 b 9 abab

NFA DFA: Subset Construction Simulate in parallel all possible moves the automaton can make The states of the resulting DFA will represent sets of states of the NFA, i.e., elements of P(Q) We use two operations on states/state-sets of the NFA -closure(t ) Set of states reachable from any state s in T on - transitions move(t, a) Set of states to which there is a transition from one state in T on input symbol a The final states of the DFA are those where the corresponding NFA subset contains a final state

Algorithm: Subset Construction proc SubsetConstruction(s ) DFAStates = -closure({s }) while there is an unmarked state T in DFAStates do mark T for each input symbol a do U := -closure(move(t, a)) DFADelta[T, a] := U if U DFAStates then add U as unmarked to DFAStates proc -closure(t ) -closure := T ; to check := T while to check not empty do get some state t from to check for each state u with edge labeled from t to u if u -closure then add u to -closure and to check

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9,, 2,4,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9,, 2,4,7 a,2,3, 4,6,7,8

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9,, 2,4,7 a,2,3, 4,6,7,8 b,2,4, 5,6,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9,, 2,4,7,2,3, a 4,6,7,8 a b,2,4, 5,6,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9,, 2,4,7,2,3, a 4,6,7,8 b a b,2,4, 5,6,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9 a,, 2,4,7,2,3, a 4,6,7,8 b a b,2,4, 5,6,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9 a,, 2,4,7 a b,2,3, 4,6,7,8 a b,2,4, 6,5,7,9 b,2,4, 5,6,7

Example: Subset construction 2 a 3 4 b 5 6 7 a 8 b 9 a,, 2,4,7 a b,2,3, 4,6,7,8 a a b,2,4, 6,5,7,9 b,2,4, 5,6,7 b

Time/Space Considerations DFA traversal is linear to the length of input string x NFA needs O(n) space (states+transitions), where n is the length of the regular expression NFA traversal may need time n x, so why use NFAs?

Time/Space Considerations DFA traversal is linear to the length of input string x NFA needs O(n) space (states+transitions), where n is the length of the regular expression NFA traversal may need time n x, so why use NFAs? There are DFA that have at least 2 n states!

Time/Space Considerations DFA traversal is linear to the length of input string x NFA needs O(n) space (states+transitions), where n is the length of the regular expression NFA traversal may need time n x, so why use NFAs? There are DFA that have at least 2 n states! Solution : Lazy construction of the DFA: construct DFA states on the fly up to a certain amount and cache them Solution 2: Try to minimize the DFA: There is a unique (modulo state names) minimal automaton for a regular language!

Minimal Automata Take any state q of the deterministic automaton to minimize and assume it to be the (single) start state We call the language that this automaton accepts the right language of q The language of each state consists of suffixes of the overall accepted language If two states accept the same language, they are equivalent and can be merged To minimize the automaton, merge all equivalent nodes This is implemented by first partitioning the original set of states into equivalence classes, i.e., sets of equivalent states Finally, each equivalence class is replaced by a single state, merging transitions accordingly

DFA Minization Every DFA defines a unique language In general, there may be many DFAs for a given language The following DFAs accept the same language

Indistinguishable States Two states p and q are called indistinguishable, if for all w Σ, F being the set of final states δ (p, w) F δ (q, w) F, and δ (p, w) F δ (q, w) F L p = {w Σ δ (p, w) F } is also called suffix language of the automaton. The states p and q behave in the same way for all possible strings: L p = L q When is a state p distinguishable from q?

Indistinguishable States Two states p and q are called indistinguishable, if for all w Σ, F being the set of final states δ (p, w) F δ (q, w) F, and δ (p, w) F δ (q, w) F L p = {w Σ δ (p, w) F } is also called suffix language of the automaton. The states p and q behave in the same way for all possible strings: L p = L q When is a state p distinguishable from q? There is a string w where δ (p, w) F and δ (q, w) F or vice versa Algorithm: start with strings of length and work yourself backwards through the automaton

Indistinguishable States Indistinguishable states behave in an identical way As a consequence, a set of indistinguishable states can be merged into a single state Indistinguishability is an equivalence relation: Reflexive: Each state is indistinguishable from itself Symmetric: If p is indistinguishable from q, then q is indistinguishable from p Transitive: If p is indistinguishable from q, and q is indistinguishable from r, then p is indistinguishable from r.

Indistinguishability And Partitions Indistinguishability is an equivalence relation: Reflexive: Each state is indistinguishable from itself Symmetric: If p is indistinguishable from q, then q is indistinguishable from p Transitive: If p is indistinguishable from q, and q is indistinguishable from r, then p is indistinguishable from r. (Total) equivalence relations on nodes V of a graph induce a (total) partitioning into subsets of nodes N = n, n 2,..., n k, such that For all i and j, n i n j = i n i = V 5 n 3 9 V 2 4 n 3 6 8 n 2

Detecting Indistinguishable States Basis: Every nonaccepting state is distinguishable from any accepting state (w = ) Induction: States p and q are distinguishable if there is some input symbol a such that δ(p, a) is distinguishable from δ(q, a) Testing all possible state pairs until no more distinguishable states can be found leaves the rest indistinguishable The sets of indistinguishable states can be merged into one state per set

Detecting Indistinguishable States p q r Basis: p distinguishable from q and r : q and r go to p, does not separate them : q goes to r, and r to q, which are indistinguishable, again, no separation q and r can be merged into a single state p, qr

The Algorithm Create an n n boolean table distinct, n the number of states, and set all cells to false proc DFAMinimization() for every pair (p, q), p, q Q do if p F and q F or vice versa then distinct(p, q) = true while there is a change in distinct do for every pair (p, q), p, q Q and each a Σ do if distinct(p, q) distinct(δ(p, a), δ(q, a)) distinct(p, q) = true for p Q do // assume an order on the states for q > p Q do if distinct(p, q) then merge q into p, mark q deleted

Minimization in action q q 5 q q q 5, q q, q q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration, q, q 5,

Minimization in action q q 5 q q q q q 5 Iteration 2, q, q 5,

Minimization in action q q q 5 q / q 5 /3, q, q 5/6 q,