Constructive Formalization of Regular Languages

Similar documents
Constructive Formalization of Regular Languages Jan-Oliver Kaiser

3515ICT: Theory of Computation. Regular languages

CS 455/555: Finite automata

Finite State Automata

NOTES ON AUTOMATA. Date: April 29,

Finite Automata and Regular languages

Sri vidya college of engineering and technology

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Regular Language Representations in the Constructive Type Theory of Coq

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 9 : Myhill-Nerode Theorem and applications

Theory of Computation (I) Yijia Chen Fudan University

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Definition of Büchi Automata

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

COMP4141 Theory of Computation

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

We define the multi-step transition function T : S Σ S as follows. 1. For any s S, T (s,λ) = s. 2. For any s S, x Σ and a Σ,

Notes on State Minimization

Obtaining the syntactic monoid via duality

CS 154, Lecture 3: DFA NFA, Regular Expressions

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

What we have done so far

CS 154 Introduction to Automata and Complexity Theory

Equivalence of DFAs and NFAs

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Warshall s algorithm

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 5 : DFA minimization

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

1 More finite deterministic automata

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

1 Two-Way Deterministic Finite Automata

Intro to Theory of Computation

Inf2A: Converting from NFAs to DFAs and Closure Properties

Büchi Automata and their closure properties. - Ajith S and Ankit Kumar

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

More on Finite Automata and Regular Languages. (NTU EE) Regular Languages Fall / 41

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Incorrect reasoning about RL. Equivalence of NFA, DFA. Epsilon Closure. Proving equivalence. One direction is easy:

DD2371 Automata Theory

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Computational Models - Lecture 3

Decision, Computation and Language

Finite Automata and Regular Languages

Le rning Regular. A Languages via Alt rn ting. A E Autom ta

Theory of computation: initial remarks (Chapter 11)

CS21 Decidability and Tractability

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

CS243, Logic and Computation Nondeterministic finite automata

On closures of lexicographic star-free languages. E. Ochmański and K. Stawikowska

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Recap DFA,NFA, DTM. Slides by Prof. Debasis Mitra, FIT.

Constructions on Finite Automata

Automata extended to nominal sets

Finite Universes. L is a fixed-length language if it has length n for some

Extended transition function of a DFA

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

CSC236 Week 11. Larry Zhang

Algebraic Approach to Automata Theory

Automata and Formal Languages - CM0081 Non-Deterministic Finite Automata

Computational Models: Class 3

Computational Models - Lecture 1 1

2. Syntactic Congruences and Monoids

Finite Automata Part Two

Computational Models - Lecture 4

Finite Automata. Mahesh Viswanathan

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Introduction to Kleene Algebra Lecture 9 CS786 Spring 2004 February 23, 2004

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

2. Elements of the Theory of Computation, Lewis and Papadimitrou,

Simplification of finite automata

Computational Models #1

Chapter 5. Finite Automata

Nondeterministic Finite Automata

SEPARATING REGULAR LANGUAGES WITH FIRST-ORDER LOGIC

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CS 275 Automata and Formal Language Theory

Lecture 17: Language Recognition

DM17. Beregnelighed. Jacob Aae Mikkelsen

jflap demo Regular expressions Pumping lemma Turing Machines Sections 12.4 and 12.5 in the text

Classes and conversions

CONCATENATION AND KLEENE STAR ON DETERMINISTIC FINITE AUTOMATA

Lecture 3: Nondeterministic Finite Automata

CS 121, Section 2. Week of September 16, 2013

CSE 105 THEORY OF COMPUTATION

CS256/Spring 2008 Lecture #11 Zohar Manna. Beyond Temporal Logics

Regular expressions and Kleene s theorem

Recitation 2 - Non Deterministic Finite Automata (NFA) and Regular OctoberExpressions

Computational Models - Lecture 3 1

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Nondeterministic Finite Automata

The theory of regular cost functions.

Kleene Algebras and Algebraic Path Problems

Algebras with finite descriptions

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

Non-Deterministic Finite Automata

Transcription:

Constructive Formalization of Regular Languages Jan-Oliver Kaiser Advisors: Christian Doczkal, Gert Smolka Supervisor: Gert Smolka UdS November 7, 2012 Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 1 / 21

Table of Contents 1 Motivation 2 Recap Regular Expressions Finite Automata Finite Automata Regular Expression to Finite Automaton 3 Previous Talk Finite Automaton to Regular Expression 4 Myhill-Nerode Theorem Formalization of Equivalence Classes Myhill Relations and Nerode Relations Minimizing Equivalence Classes 5 Summary Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 2 / 21

Motivation Goal: Build an extensive, elegant, constructive formalization of regular languages that includes 1 regular expressions, 2 the decidability of equivalence of regular expressions, 3 finite automata, 4 and the Myhill-Nerode theorem. How: Coq with SSReflect. No sacrifices for performance. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 3 / 21

Regular Expressions Definition We use extended Regular Expressions (RE) over an alphabet Σ: r; s ::= ε a rs r + s r & s r r L( ) = L(ε) = {ε} L(.) = Σ L(a) = {a} L( r) = Σ \L(r) L(r ) = L(r) L(r + s) = L(r) L(s) L(r&s) = L(r) L(s) L(rs) = L(r) L(s) Implementation taken from Coquand and Siles 1. This saved us quite a bit of work. 200 lines of code including an implementation of regular languages and lots of useful lemmas. 1 Thierry Coquand and Vincent Siles. A Decision Procedure for Regular Expression Equivalence in Type Theory. In: CPP. 2011, pp. 119 134. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 4 / 21

Finite Automata Definition Our Finite Automata (FA) over an alphabet Σ consist of 1 a set of states Q, 2 a starting state s Q, 3 a set of final states F Q, 4 a transition relation δ Q Σ Q. Two types: one for non-deterministic FA (δ may be non-functional), one for deterministic FA (δ is functional). For our deterministic FA, δ is also total and, thus, a function. This helped, but maybe not a lot. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 5 / 21

Finite Automata Formalization Our definition is very close to the textbook definition. 120 lines of code including some general lemmas for later proofs. Record nfa : Type := { nfa state :> fintype; nfa s : nfa state ; nfa fin : pred nfa state ; nfa step : nfa state > char > pred nfa state }. Record dfa : Type := { dfa state :> fintype; dfa s : dfa state ; dfa fin : pred dfa state ; dfa step : dfa state > char > dfa state }. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 6 / 21

RE = FA Structure of proof given by inductive definition of RE. Plan: Construct FA for every RE constructor. Sounds simple enough.... 800 lines of code. 100 lines of that needed for equivalence of DFA and NFA. Another 100 lines of code for extended regular expressions. This is a candidate for improvement. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 7 / 21

FA = RE We use the Transitive Closure method, Kleene s original proof 2. This method recursively builds a regular expression R X x,y that recognizes words whose runs starting in x only pass through states in X and end in y. The previous version constructed R k x,y which translates to R {z #(z)<k} x,y where # is an ordering on Q. Instead of nat, we now recurse on the size of a finite subset of Q. 3 This also avoids cumbersome conversions from nat to SSReflect s ordinals and, finally, to states. 2 S. C. Kleene. Representation of Events in Nerve Nets and Finite Automata. In: Automata Studies (1965). 3 Dexter Kozen. Automata and computability. Undergraduate texts in computer science. Springer, 1997, pp. I XIII, 1 400. isbn: 978-0-387-94907-9. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 8 / 21

FA = RE Proof Outline We introduce a decidable language L X x,y that directly encodes the idea behind R X x,y as a boolean predicate. The proof of FA = RE consists of three steps: 1 We show that L X x,y respects the defining recursive equation of R X x,y. 2 We show that for all w Σ, w L X x,y w R X x,y. 3 We show that f F LQ s,f = L(A) and thus f F RQ s,f = L(A). Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 9 / 21

FA = RE After some restructuring: 550 lines of code, 150 of which are general infrastructure. Previous version: 800 lines of code, much harder to read. Lemma L split k i j a w: let k := k ord k in (a :: w) \in Lˆk.+1 i j > (a :: w) \in Lˆk i j \/ exists w1, exists w2, a :: w = w1 ++ w2 /\ w1!= [::] /\ w1 \in Lˆk i (enum val k) /\ w2 \in Lˆk.+1 (enum val k) j. Lemma L split X x y z w: w \in Lˆ(z : X) x y > w \in LˆX x y \/ exists w1, exists w2, w = w1 ++ w2 /\ size w2 < size w /\ w1 \in LˆX x z /\ w2 \in Lˆ(z : X) z y Figure: Previous and current version of the same lemma Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 10 / 21

Myhill-Nerode Theorem It turns out that there are two different concepts: Myhill relations and the Nerode relation. We also consider a related characterization: weak Nerode relations. All these are equivalence relations which we require to be of finite index, i.e. to have a finite number of equivalence classes. However, Coq has no notion of quotient types. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 11 / 21

Myhill-Nerode Theorem Equivalence Relations of Finite Index We use functions of finite domain to represent equivalence relations of finite index. Think of the domain as the set of equivalence classes. We also need to have a representative of every equivalence class. Thus, we require surjectivity. Record Fin Eq Cls := { fin type : fintype; fin func :> word > fin type; fin surjective : surjective fin func }. With constructive choice, we can then give a canonical representative of every every equivalence class. Definition cr (f : Fin Eq Cls) x := xchoose ( fin surjective f x). Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 12 / 21

Myhill-Nerode Theorem Myhill relations, weak Nerode relations, Nerode relation An equivalence relation of finite index is a Myhill relation 4 for L iff 1 is right-congruent, i.e. u, v Σ. a Σ. u v = ua va, 2 and refines L, i.e. u, v Σ. u v = (u L v L). An equivalence relation of finite index is a weak Nerode relation for L iff u, v Σ. u v = w Σ. (uw L vw L). The Nerode relation 5. =L for L is defined such that u, v Σ. u. = L v w Σ. (uw L vw L). 4 John R. Myhill. Finite Automata and the Representation of Events. Tech. rep. WADC TR-57-624. Wright-Paterson Air Force Base, 1957. 5 Anil Nerode. Linear Automaton Transformations. In: Proceedings of the American Mathematical Society 9.4 (1958), pp. 541 544. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 13 / 21

Myhill-Nerode Theorem Formalization of Myhill, weak Nerode and Nerode relation For all three relations, we build a record that consists of an equivalence relation of finite index and proofs of the properties of the respective relation. Example: Record Myhill Rel := { myhill func :> Fin Eq Cls; myhill congruent : right congruent myhill func ; myhill refines : refines myhill func }. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 14 / 21

Myhill-Nerode Theorem Our version of the Myhill-Nerode theorem states that the following are equivalent 1 language L is accepted by a DFA, 2 we can construct a Myhill relation for L, 3 we can construct a weak Nerode relation for L, 4 the Nerode relation for L is of finite index. (1) = (2) is easy. (Map word w to the last state of its run on the automaton.) (2) = (3) is a trivial inductive proof. (4) = (1) is also straight-forward. (Use equivalence classes as states.) Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 15 / 21

Myhill-Nerode Theorem (3) = (4) A weak Nerode relation (given as a function f ) has at least as many equivalence classes as the Nerode relation. Some of them are redundant w.r.t. to L, i.e. we may have equivalence classes s.t. uv Σ. fu fu w Σ. (uw L vw L). Our goal is to merge these equivalence classes. In fact, we construct an equivalence relation on these equivalence classes. Equivalence classes are contained in that relation iff they are equivalent w.r.t. to L. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 16 / 21

Myhill-Nerode Theorem Finding equivalent equivalence classes We use the table-filling algorithm 6 for minimizing DFA. Our fixed-point algorithm finds all equivalence classes whose representatives are distinguishable w.r.t. to L. The remaining equivalence classes are then equivalent w.r.t. to L. Start: {(x, y) cr(x) / L cr(y) L}. Step: if the previous result is d, the new result is d {(x, y) a. (f (cr(x)a), f (cr(y)a)) d}. Due to finiteness of the domain and monotonicity of the algorithm, it has a fixed point which we can compute. 6 D.A. Huffman. The synthesis of sequential switching circuits. In: Journal of the Franklin Institute 257.3 (1954), pp. 161 190. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 17 / 21

Myhill-Nerode Theorem Proof Outline We construct a function f min that maps every equivalence class to the set of equivalence classes equivalent to it w.r.t. L. We then show that f min is surjective and encodes an equivalence relation of finite index. Finally, we show that f min is equivalent to the Nerode relation. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 18 / 21

Myhill-Nerode Theorem The lemmas of this chapter are rather abstract, which makes for nice and short statements. The proofs also received more refinement than the other chapters. They are very concise. The entire chapter consists of 430 lines of code. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 19 / 21

Summary All in all we have 2100 lines of code. Mostly very close to the mathematical definitions. Code produced in the beginning of the project might be improved quite a bit. Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 20 / 21

Thank you for your attention! Jan-Oliver Kaiser (UdS) Constr. Formalization of Reg. Languages November 7, 2012 21 / 21