Chapter 3. Regular grammars

Similar documents
Introduction to computability

Theory of Computation (II) Yijia Chen Fudan University

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

Sri vidya college of engineering and technology

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

CS481F01 Prelim 2 Solutions

Introduction to Formal Languages, Automata and Computability p.1/42

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

CPS 220 Theory of Computation

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION

MA/CSSE 474 Theory of Computation

CS 455/555: Finite automata

Advanced Automata Theory 7 Automatic Functions

CS 121, Section 2. Week of September 16, 2013

CISC4090: Theory of Computation

Pushdown Automata. We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata.

Automata Theory for Presburger Arithmetic Logic

Theory of Computation

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Automata: a short introduction

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

Foundations of Informatics: a Bridging Course

Context Free Language Properties

Further discussion of Turing machines

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Lecture 1: Finite State Automaton

Computational Theory

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Finite Automata and Regular languages

Chapter 6. Properties of Regular Languages

Representing Arithmetic Constraints with Finite Automata: An Overview

Finite Automata and Regular Languages

CPSC 421: Tutorial #1

Theory of Computation

Pushdown Automata. Reading: Chapter 6

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Theory of Computation (I) Yijia Chen Fudan University

Formal Definition of a Finite Automaton. August 26, 2013

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

What we have done so far

Discrete Mathematics. CS204: Spring, Jong C. Park Computer Science Department KAIST

Introduction to Formal Languages, Automata and Computability p.1/51

Computational Models - Lecture 4

Automata Theory and Formal Grammars: Lecture 1

Part I: Definitions and Properties

Büchi Automata and their closure properties. - Ajith S and Ankit Kumar

Finite Automata and Languages

ECS 120: Theory of Computation UC Davis Phillip Rogaway February 16, Midterm Exam

A Universal Turing Machine

CS 275 Automata and Formal Language Theory

Deterministic Finite Automata (DFAs)

FLAC Context-Free Grammars

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Computer Sciences Department

Deterministic Finite Automata (DFAs)

Closure under the Regular Operations

6.045J/18.400J: Automata, Computability and Complexity. Quiz 2. March 30, Please write your name in the upper corner of each page.

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Nondeterministic Finite Automata

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

DM17. Beregnelighed. Jacob Aae Mikkelsen

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Fundamentele Informatica II

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

C2.1 Regular Grammars

Parikh s theorem. Håkan Lindqvist

Automata and Languages

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

Inf2A: Converting from NFAs to DFAs and Closure Properties

C2.1 Regular Grammars

CS481F01 Solutions 6 PDAS

Context Free Grammars

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Computability and Complexity

Chapter Five: Nondeterministic Finite Automata

Computational Models - Lecture 3

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Lecture 23 : Nondeterministic Finite Automata DRAFT Connection between Regular Expressions and Finite Automata

1 More finite deterministic automata

Deterministic Finite Automata (DFAs)

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Introduction to Automata

CPS 220 Theory of Computation Pushdown Automata (PDA)

3515ICT: Theory of Computation. Regular languages

Lecture 3: Nondeterministic Finite Automata

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

Finite State Transducers

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Equivalence of DFAs and NFAs

Pushdown Automata (2015/11/23)

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Theory of Computer Science

ACS2: Decidability Decidability

Transcription:

Chapter 3 Regular grammars 59

3.1 Introduction Other view of the concept of language: not the formalization of the notion of effective procedure, but set of words satisfying a given set of rules Origin : formalization of natural language. 60

Example a phrase is of the form subject verb a subject is a pronoun a pronoun is he or she a verb is sleeps or listens Possible phrases: 1. he listens 2. he sleeps 3. she sleeps 4. she listens 61

Grammars Grammar: generative description of a language Automaton: analytical description Example: programming languages are defined by a grammar (BNF), but recognized with an analytical description (the parser of a compiler), Language theory establishes links between analytical and generative language descriptions. 62

3.2 Grammars A grammar is a 4-tuple G = (V, Σ, R, S), where V is an alphabet, Σ V is the set terminal symbols (V Σ is the set of nonterminal symbols), R (V + V ) is a finite set of production rules (also called simply rules or productions), S V Σ is the start symbol. 63

Notation: Elements of V Σ : A, B,... Elements of Σ : a, b,.... Rules (α, β) R : α β or α G β. The start symbol is usually written as S. Empty word: ε. 64

Example : V = {S, A, B, a, b}, Σ = {a, b}, R = {S A, S B, B bb, A aa, A ε, B ε}, S is the start symbol. 65

Words generated by a grammar: example aaaa is in the language generated by the grammar we have just described: S A rule S A aa A aa aaa A aa aaaa A aa aaaaa A aa aaaa A ε 66

Generated words: definition Let G = (V, Σ, R, S) be a grammar and u V +, v V be words. The word v can be derived in one step from u by G (notation u G v) if and only if: u = xu y (u can be decomposed in three parts x, u and y ; the parts x and y being allowed to be empty), v = xv y (v can be decomposed in three parts x, v and y), u G v (the rule (u, v ) is in R). 67

Let G = (V, Σ, R, S) be a grammar and u V +, v V be words. The word v can be derived in several steps from u (notation u G v) if and only if k 0 and v 0... v k V + such that u = v 0, v = v k, v i G v i+1 for 0 i < k. 68

Words generated by a grammar G: words v Σ (containing only terminal symbols) such that S G v. The language generated by a grammar G (written L(G)) is the set L(G) = {v Σ S G v}. Example : The language generated by the grammar shown in the example above is the set of all words containing either only a s or only b s. 69

Types of grammars Type 0: no restrictions on the rules. Type 1: Context sensitive grammars. The rules satisfy the condition Exception: the rule α β α β. S ε is allowed as long as the start symbol S does not appear in the right hand side of a rule. 70

Type 2: context-free grammars. Productions of the form A β where A V Σ and there is no restriction on β. Type 3: regular grammars. Productions rules of the form where A, B V Σ and w Σ. A wb A w 71

3.3 Regular grammars Theorem: A language is regular if and only if it can be generated by a regular grammar. A. If a language is regular, it can be generated by a regular grammar. If L is regular, there exists M = (Q, Σ,, s, F ) such that L = L(M). From M, one can easily construct a regular grammar G = (V G, Σ G, S G, R G ) generating L. 72

G is defined by: Σ G = Σ, V G = Q Σ, S G = s, R G = { A wb, for all(a, w, B) A ε for alla F } 73

B. If a language is generated by a regular grammar, it is regular. Let G = (V G, Σ G, S G, R G ) be the grammar generating L. A nondeterministic finite automaton accepting L can be defined as follows: Q = V G Σ G {f} (the states of M are the nonterminal symbols of G to which a new state f is added), Σ = Σ G, s = S G, F = {f}, = { (A, w, B), for alla wb RG (A, w, f), for alla w R G }. 74

3.4 The regular languages We have seen four characterizations of the regular languages: 1. regular expressions, 2. deterministic finite automata, 3. nondeterministic finite automata, 4. regular grammars. 75

Properties of regular languages Let L 1 and L 2 be two regular languages. L 1 L 2 is regular. L 1 L 2 is regular. L 1 is regular. L R 1 is regular. L 1 = Σ L 1 is regular. L 1 L 2 is regular. 76

L 1 L 2 regular? L 1 L 2 = L 1 L 2 Alternatively, if M 1 = (Q 1, Σ, δ 1, s 1, F 1 ) accepts L 1 and M 2 = (Q 2, Σ, δ 2, s 2, F 2 ) accepts L 2, the following automaton, accepts L 1 L 2 : Q = Q 1 Q 2, δ((q 1, q 2 ), σ) = (p 1, p 2 ) if and only if δ 1 (q 1, σ) = p 1 and δ 2 (q 2, σ) = p 2, s = (s 1, s 2 ), F = F 1 F 2. 77

Let Σ be the alphabet on which L 1 is defined, and let π : Σ Σ be a function from Σ to another alphabet Σ. This fonction, called a projection function can be extended to words by applying it to every symbol in the word, i.e. for w = w 1... w k Σ, π(w) = π(w 1 )... π(w k ). If L 1 is regular, the language π(l 1 ) is also regular. 78

Algorithms Les following problems can be solved by algorithms for regular languages: w L? L =? L = Σ? (L = ) L 1 L 2? (L 2 L 1 = ) L 1 = L 2? (L 1 L 2 and L 2 L 1 ) 79

3.5 Beyond regular languages Many languages are regular, But, all languages cannot be regular for cardinality reasons. We will now prove, using another techniques that some specific languages are not regular. 80

Basic Observations 1. All finite languages (including only a finite number of words) are regular. 2. A non regular language must thus include an infinite number of words. 3. If a language includes an infinite number of words, there is no bound on the size of the words in the language. 4. Any regular language is accepted by a finite automaton that has a given number number m of states. 81

5. Consider an infinite regular language and an automaton with m states accepting this language. For any word whose length is greater than m, the execution of the automaton on this word must go through an identical state s k at least twice, a nonempty part of the word being read between these two visits to s k. s x u y s s k s f k 6. Consequently, all words of the form xu y are also accepted by the automaton and thus are in the language. 82

The pumping lemmas (theorems) First version Let L be an infinite regular language. Then there exists words x, u, y Σ, with u ε such that xu n y L n 0. Second version : Let L be a regular language and let w L be such that w Q where Q is the set of states of a determnistic automaton accepting L. Then x, u, y, with u ε and xu Q such that xuy = w and, n, xu n y L. 83

Applications of the pumping lemmas The langage a n b n is not regular. Indeed, it is not possible to find words x, u, y such that xu k y a n b n k and thus the pumping lemma cannot be true for this language. u a : impossible. u b : impossible. u (a b) (a b ) : impossible. 84

The language L = a n2 is not regular. Indeed, the pumping lemma (second version) is contradicted. Let m = Q be the number of states of an automaton accepting L. Consider a m2. Since m 2 m, there must exist x, u and y such that xu m and xu n y L n. Explicitly, we have x = a p 0 p m 1, u = a q 0 < q m, y = a r r 0. Consequently xu 2 y L since p + 2q + r is not a perfect square. Indeed, m 2 < p + 2q + r m 2 + m < (m + 1) 2 = m 2 + 2m + 1. 85

The language L = {a n n is prime} is not regular. The first pumping lemma implies that there exists constants p, q and r such that k xu k y = a p+kq+r L, in other words, such that p + kq + r is prime for all k. This is impossible since for k = p + 2q + r + 2, we have p + kq + r = (q + 1) }{{} >1 (p + 2q + r), } {{ } >1 86

Applications of regular languages Problem : To find in a (long) character string w, all ocurrences of words in the language defined by a regular expression α. 1. Consider the regular expression β = Σ α. 2. Build a nondeterministic automaton accepting the language defined by β 3. From this automaton, build a deterministic automaton A β. 4. Simulate the execution of the automaton A β on the word w. Whenever this automaton is in an accepting state, one is at the end of an occurrence in w of a word in the language defined by α. 87

Applications of regular languages II: handling arithmetic A number written in base r is a word over the alphabet {0,..., r 1} ({0,..., 9} in decimal, {0, 1} en binary). The number represented by a word w = w 0... w l is nb(w) = l i=0 r l i nb(w i ) Adding leading 0 s to the representation of a number does not modify the represented value. A number thus has a infinite number of possible representations. Number encodings are read most significant digit first, and all possible encodings will be taken into account. Exemple: The set of binary representations of 5 is the language 0 101. 88

Which sets of numbers can be represented by regular languages? Finite sets. The set of multiples of 2 is represented by the language (0 1) 0. The set of powers of 2 is represented by the language 0 10, but is not representable in base 3. The set of multiples of 3 is represented by the following automaton. 0 1 > 1 1 0 0 89

Set of numbers represented by regular languages (continued) The set of numbers x 5 is represented by the automaton > 0 1 0 0 1 1 0, 1 0, 1 0, 1 More generally, one can represent sets of the form {ax x N} or {x a x N} for any given value of a. 90

Set of numbers represented by regular languages (continued II) Combining the two types of sets: sets of the form {ax + b x N}, for any given a and b. Union of such sets: the ultimately periodic sets. Intersection and complementation add nothing more. The only sets that can be represented in all bases are the ultimately periodic sets. 91

Representing vectors of numbers Each number is represented by a word, and bits in identical positions are read together. Example: the vector (5, 9) is encoded by the word (0, 1)(1, 0)(0, 0)(1, 1) defined over the alphabet {0, 1} {0, 1}. The set of binary encodings of the vector (5, 9) is (0, 0) (0, 1)(1, 0) (0, 0)(1, 1). 92

Which sets of number vectors can be represented by regular languages? The set of binary encodings of the vectors (x, y) such that x = y is accepted by the automaton (0, 0) > (1, 1) 93

Vectors (x, y) such that x < y (0, 0) (0, 0), (0, 1), (1, 0), (1, 1) > (0, 1) (1, 1) Three-dimentional vectors (x, y, z) such that z = x + y (0, 0, 0), (0, 1, 1), (1, 0, 1) (1, 0, 0), (0, 1, 0), (1, 1, 1) (0, 0, 1) > (1, 1, 0) 94

Definable sets of number vectors (continued) Intersection, union, complement of representable sets (closure properties of regular languages). Modifying the number of dimensions: projection and the inverse operation. Remark: projection does not always preserve the determinism of the automaton. Example: {(x, z) y x + y = z} (x z). (0, 0), (0, 1), (1, 1) (1, 0), (0, 0), (1, 1) (0, 1) > (1, 0) 95

Adding a dimension to the previous automaton yields (0, 1, 0), (0, 1, 1), (1, 1, 1) (1, 1, 0), (0, 1, 0), (1, 1, 1) (0, 0, 0), (0, 0, 1), (1, 0, 1) (1, 0, 0), (0, 0, 0), (1, 0, 1) (0, 1, 1) (0, 0, 1) > (1, 1, 0) (1, 0, 0) which is not equivalent to the automaton to which projection was applied. 96

Representable sets of vectors: conclusions Linear equality and inequality constraints Example: an automaton for x + 2y = 5 can be obtained by combing the automata for the following constraints: z 1 = y z 2 = y + z 1 z 3 = x + z 2 z 3 = 5. There exists also a more direct construction. 97

Representable vector sets: conclusions (continued) Boolean combinations of linear constraints Existential quantification can be handled with projection ( x). For universal quantification, one uses xf f Example: It is possible to build an automaton accepting the representations of the vectors (x, y) satisfying the arithmetic constraint u t[(2x + 3y + t 4u = 5) (x + 5y 3t + 2u = 8)] This is Presburger arithmetic, which corresponds exactly to the sets representable by automata in all bases. 98