REGular and Context-Free Grammars


Nicholas Mainardi, Dipartimento di Elettronica e Informazione, Politecnico di Milano (nicholas.mainardi@polimi.it). March 26, 2018. Partly based on Alessandro Barenghi's material, largely enriched with some additional exercises.

Grammars

What are grammars? Another formalism to define a language. Grammars follow a generative approach: the grammar specifies how a sentence (i.e. an element of the language) is generated. For some grammar classes, automated algorithms to derive the recognizer automaton are available (and pretty widely used!). It is also possible to define grammar classes corresponding to a precise computing model (FSA, deterministic or nondeterministic PDA, TM).

Formalization

A formal definition: a grammar is defined by a 4-tuple (Vn, Vt, P, S) where:
- Vn is the non-terminal symbol alphabet
- Vt is the terminal symbol alphabet
- P ⊆ Vn⁺ × V*, with V = Vt ∪ Vn, is the set of syntactic productions
- S ∈ Vn is the starting symbol, known as the axiom

A production rule maps one or more symbols from Vn into zero or more symbols from V. More formally, a derivation α ⇒ β, with α = α1 α2 α3 (α2 ∈ Vn⁺) and β = α1 β2 α3, exists if and only if there is a p ∈ P such that p = α2 → β2. ⇒* indicates the reflexive and transitive closure of ⇒. A grammar generates the language L(G) = {x | x ∈ Vt*, S ⇒* x}.
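As a concrete reading of this definition, here is a minimal Python sketch (function and variable names are illustrative, not from the slides) of a production set and of the one-step derivation relation ⇒, applied to the (aa)* grammar from the first-examples slide:

```python
def derive_step(sentential, productions):
    """All sentential forms reachable from `sentential` in one derivation step.

    `productions` maps a left-hand side (a single non-terminal, per the
    slides' conventions) to the list of its alternative right-hand sides.
    """
    results = set()
    for lhs, alternatives in productions.items():
        start = sentential.find(lhs)
        while start != -1:
            for rhs in alternatives:
                results.add(sentential[:start] + rhs + sentential[start + len(lhs):])
            start = sentential.find(lhs, start + 1)
    return results

# The (aa)* grammar: S -> ε | aA, A -> aS | a
P = {"S": ["", "aA"], "A": ["aS", "a"]}
print(sorted(derive_step("S", P)))   # ['', 'aA']
print(sorted(derive_step("aA", P)))  # ['aa', 'aaS']
```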

Notation

Common conventions:
- Non-terminal symbols are UPPERCASE, terminal symbols are lowercase
- S is the axiom of the grammar
- Single-character symbols are used (no tokenization needed)
- The concatenation mark (.) is omitted
- Regular expressions in the RHS of a rule are not used, except for the | symbol, employed to shorten the notation

Recognizer automaton

Equivalences between grammar types, language classes, rule forms, and the corresponding generic recognizer automata:

Type | Language class | Rule form | Automaton
3 | Regular | A → aB, A → ε | FSA
2 | Context-free | A → α, α ∈ V* | NPDA
1 | Context-sensitive | α → β, |α| ≤ |β| | linear-bounded TM
0 | Recursively enumerable | any α → β | TM

First examples

Warming up. Regular: L = (aa)*. A sample grammar generating the language:
S → ε (zero is even)
S → aA (when the first a is produced...)
A → aS (make a pair and continue, or...)
A → a (make a pair and stop)

Context-free: L = a^n b^n c^m a^m, with n ≥ 0, m ≥ 1. A sample grammar generating the language:
S → S1 S2 (concatenation is easy)
S1 → aS1b | ε (grow the pairs from within)
S2 → cS2a | ca (ensure at least one ca pair is generated)
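A quick way to sanity-check such a grammar is to pair it with an independent membership test for the target language. The following sketch (a hypothetical helper, not part of the slides) tests L = a^n b^n c^m a^m:

```python
import re

def in_language(s):
    """Membership test for L = { a^n b^n c^m a^m : n >= 0, m >= 1 }."""
    match = re.fullmatch(r"(a*)(b*)(c+)(a+)", s)
    if not match:
        return False
    a1, b, c, a2 = match.groups()
    return len(a1) == len(b) and len(c) == len(a2)

# Strings derivable from the sample grammar are accepted:
assert in_language("ca")        # n = 0, m = 1:  S => S1S2 => S2 => ca
assert in_language("abca")      # S => S1S2 => aS1bS2 => abS2 => abca
assert in_language("aabbccaa")
assert not in_language("aabca")  # a, b pairs unbalanced
assert not in_language("")       # m >= 1 is required
```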

Union With Grammars

Union of languages is straightforward too! Big/little endian encodings: consider the language L, defined on the alphabet {0, 1, a} as the union of 2 sublanguages Ll and Lb:
Ll = {(n a^n)⁺ | 0 ≤ n ≤ 3}, where n is written in little endian binary representation (the first digit is the least significant one)
Lb = {(n a^n)⁺ | 0 ≤ n ≤ 3}, where n is written in big endian binary representation (the first digit is the most significant one)
For instance, for a sequence of 2 a's:
Little endian: 01aa
Big endian: 10aa

Union With Grammars

Grammar for Ll:
Sl → 0Z | 1U
Z → 0N0 | 1N2
U → 0N1 | 1N3
N0 → ε | Sl
N1 → aN0
N2 → aN1
N3 → aN2

Grammar for Lb:
Sb → 0Z | 1U
Z → 0N0 | 1N1
U → 0N2 | 1N3
N0 → ε | Sb
N1 → aN0
N2 → aN1
N3 → aN2

Grammar for L = Ll ∪ Lb:
S → Sl | Sb

Union With Grammars

Problem: there are conflicting non-terminal symbols in the sub-grammars! Hence, the merged grammar generates strings such as 01a10a:
S ⇒ Sb ⇒ 0Z ⇒ 01N1 ⇒ 01aN0 ⇒ 01aSb ⇒ 01a1U ⇒ 01a10N1 ⇒ 01a10aN0 ⇒ 01a10a
This derivation is possible since the merging operation transforms the U rule into:
U → 0N1 | 0N2 | 1N3
Before performing the union, the sets of non-terminal symbols of the sub-grammars must therefore be made disjoint.
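The disjoint-renaming step can be mechanized. A small sketch under the slides' single-character-symbol convention (the suffixing scheme and the toy grammar below are illustrative choices; the suffix must not clash with any terminal):

```python
def rename_nonterminals(productions, suffix):
    """Append `suffix` to every non-terminal of a grammar, so that two
    sub-grammars can be merged without conflicting symbols."""
    mapping = {nt: nt + suffix for nt in productions}

    def rewrite(rhs):
        # single-character symbols, per the notation conventions above
        return "".join(mapping.get(ch, ch) for ch in rhs)

    return {mapping[nt]: [rewrite(r) for r in alts]
            for nt, alts in productions.items()}

# Toy sub-grammar sharing Z and U with its sibling: rename before the union
G_little = {"S": ["0Z", "1U"], "Z": ["0", "1Z"], "U": ["0U", "1"]}
print(rename_nonterminals(G_little, "l"))
# {'Sl': ['0Zl', '1Ul'], 'Zl': ['0', '1Zl'], 'Ul': ['0Ul', '1']}
```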

Union With Grammars

L is a regular language defined on the alphabet Σ = {a, b, 0, 1} as the union L = L1 ∪ L2 ∪ L3, where:
L1 = {x = a.y | y ∈ Σ*, |y|_0 = 2k + 1, |y|_1 = 2h, h, k ≥ 0}
L2 = {x = b.y | y ∈ Σ*, |y|_0 = 2k, |y|_1 = 2h + 1, h, k ≥ 0}
L3 = {x = (0|1).y | y ∈ Σ*}
Idea: we can easily handle the 3 sub-languages with 3 sub-grammars, with axioms S1, S2, and S3, and then choose among these sub-languages depending on the first character.
Grammar for L:
S → aS1 | bS2 | 0S3 | 1S3
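The first-character dispatch can be mirrored by an independent membership test, handy for checking the sub-grammars on the next slide (a hypothetical helper, not part of the original material):

```python
def in_L(x):
    """Membership in L = L1 ∪ L2 ∪ L3, dispatching on the first character
    exactly as the rule S -> aS1 | bS2 | 0S3 | 1S3 does."""
    if not x:
        return False
    head, y = x[0], x[1:]
    zeros, ones = y.count("0"), y.count("1")
    if head == "a":                      # L1: odd number of 0s, even number of 1s
        return zeros % 2 == 1 and ones % 2 == 0
    if head == "b":                      # L2: even number of 0s, odd number of 1s
        return zeros % 2 == 0 and ones % 2 == 1
    return head in "01"                  # L3: any suffix over Σ

assert in_L("a0") and in_L("b1") and in_L("1abba00")
assert not in_L("a") and not in_L("b0") and not in_L("")
```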

Union With Grammars

Grammar for L1 (each non-terminal records the parities of 0s and 1s generated so far; e.g. Oe = odd number of 0s, even number of 1s, which is the accepting condition for L1):
S1 → aS1 | bS1 | 0Oe | 1Eo
Oe → 0S1 | 1Oo | aOe | bOe | ε
Eo → 0Oo | 1S1 | aEo | bEo
Oo → 0Eo | 1Oe | aOo | bOo

Grammar for L2 (the superscript 2 marks the copies of the symbols for L2):
S2 → aS2 | bS2 | 0Oe² | 1Eo²
Oe² → 0S2 | 1Oo² | aOe² | bOe²
Eo² → 0Oo² | 1S2 | aEo² | bEo² | ε
Oo² → 0Eo² | 1Oe² | aOo² | bOo²

The grammar for L3 is easier:
S3 → aS3 | bS3 | 0S3 | 1S3 | ε

A Left-Linear Version

The grammar for the language L can be written in a more compact form using left-linear productions. With a left-linear grammar, we generate a string from the end to the beginning! (Therefore, right-linear grammars are often easier to conceive.) Idea for this language: generate the string down to its second character, then replace the non-terminal symbol with the first character, chosen according to the parities of 0s and 1s in the suffix generated so far:
1. S → Sa | Sb | Oe0 | Eo1 | 0 | 1 (the first character must be 0 or 1)
2. Oe → Oea | Oeb | S0 | Oo1 | a | 0 | 1 (the first character cannot be b)
3. Eo → Eoa | Eob | Oo0 | S1 | b | 0 | 1 (the first character cannot be a)
4. Oo → Ooa | Oob | Eo0 | Oe1 | 0 | 1 (the first character must be 0 or 1)

Simplifying a Grammar

Consider the language L = {s = c^m.y | y ∈ LY, m > 0}, where:
LY = {a^m.x | x ∈ LX, m > 0} ∪ {ε}
LX = {b^m.y | y ∈ LY, m > 0} ∪ {ε}

A grammar for L:
1. S → cS | cY
2. Y → aA | aX | ε
3. X → bB | bY | ε
4. A → aA | aX
5. B → bB | bY

Simplifying a Grammar: Transforming to FSA

We can generate a FSA from a right-linear grammar:
Q = Vn ∪ {qf}
I = Vt
δ(q, i) ∋ q' for each q → iq' ∈ P
δ(q, i) ∋ qf for each q → i ∈ P
F = {q | q → ε ∈ P} ∪ {qf}

ND-FSA for L: [figure] the start state is S, with δ(S, c) = {S, Y}, δ(Y, a) = {A, X}, δ(A, a) = {A, X}, δ(X, b) = {B, Y}, δ(B, b) = {B, Y}; Y and X are final.
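This construction is mechanical. A small sketch (state and function names are illustrative), applied to the grammar for L from the previous slide:

```python
def grammar_to_nfa(productions):
    """Build the NFA of a right-linear grammar: one state per non-terminal,
    plus a distinguished final state 'qf'."""
    delta = {}                # (state, symbol) -> set of successor states
    final = {"qf"}
    for nt, alternatives in productions.items():
        for rhs in alternatives:
            if rhs == "":                         # q -> ε   =>  q is accepting
                final.add(nt)
            elif len(rhs) == 1:                   # q -> i   =>  δ(q, i) ∋ qf
                delta.setdefault((nt, rhs), set()).add("qf")
            else:                                 # q -> iq' =>  δ(q, i) ∋ q'
                delta.setdefault((nt, rhs[0]), set()).add(rhs[1])
    return delta, final

# Right-linear grammar for L from the previous slide
G = {"S": ["cS", "cY"],
     "Y": ["aA", "aX", ""],
     "X": ["bB", "bY", ""],
     "A": ["aA", "aX"],
     "B": ["bB", "bY"]}
delta, final = grammar_to_nfa(G)
print(sorted(delta[("S", "c")]))   # ['S', 'Y']
print(sorted(final))               # ['X', 'Y', 'qf']
```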

Simplifying a Grammar: Determinization

Generally, the automaton derived from a grammar is likely to be non-deterministic. We can make it deterministic with the known subset-construction algorithm.

Deterministic FSA for L: [figure] states {S}, {S, Y}, {A, X}, {B, Y}, with start state {S} and transitions δ({S}, c) = {S, Y}, δ({S, Y}, c) = {S, Y}, δ({S, Y}, a) = {A, X}, δ({A, X}, a) = {A, X}, δ({A, X}, b) = {B, Y}, δ({B, Y}, b) = {B, Y}, δ({B, Y}, a) = {A, X}; {S, Y}, {A, X} and {B, Y} are final.
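The subset construction can be sketched as follows (a minimal version without ε-moves, since right-linear grammars do not produce them; names are illustrative):

```python
from collections import deque

def determinize(delta, start, final, alphabet):
    """Textbook subset construction: DFA states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta, seen, todo = {}, {start_set}, deque([start_set])
    while todo:
        current = todo.popleft()
        for symbol in alphabet:
            nxt = frozenset(q for state in current
                            for q in delta.get((state, symbol), set()))
            if not nxt:
                continue      # undefined moves go to an implicit trap state
            dfa_delta[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfa_final = {s for s in seen if s & final}
    return dfa_delta, dfa_final

# ND-FSA derived from the grammar for L on the previous slide
nfa = {("S", "c"): {"S", "Y"}, ("Y", "a"): {"A", "X"}, ("X", "b"): {"B", "Y"},
       ("A", "a"): {"A", "X"}, ("B", "b"): {"B", "Y"}}
dfa_delta, dfa_final = determinize(nfa, "S", {"X", "Y"}, "abc")
assert dfa_delta[(frozenset({"S"}), "c")] == frozenset({"S", "Y"})
assert dfa_delta[(frozenset({"B", "Y"}), "a")] == frozenset({"A", "X"})
```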

Simplifying a Grammar: FSA Minimization

Given a FSA, it is always possible to obtain the FSA recognizing the same language with the minimum number of states. There is an algorithm for this, based on the concept of indistinguishable states.

Indistinguishable states: given a FSA, 2 states q, q' ∈ Q are indistinguishable if:
1. q ∈ F ⟺ q' ∈ F
2. ∀i ∈ I, δ(q, i) = δ(q', i)

Idea of the algorithm:
1. Search for indistinguishable states
2. If there are indistinguishable states, merge them into a unique state and go back to 1
3. Otherwise, the minimum automaton has been reached

NB: indistinguishability is transitive!

Simplifying a Grammar: FSA Minimization

Are there any indistinguishable states in our FSA? Yes: AX and BY! Indeed:
1. AX ∈ F, BY ∈ F
2. δ(AX, a) = δ(BY, a) = AX
3. δ(BY, b) = δ(AX, b) = BY

Minimum deterministic FSA for L: [figure] states S, SY and F (the merged AX/BY state), with δ(S, c) = SY, δ(SY, c) = SY, δ(SY, a) = F, δ(F, a) = F, δ(F, b) = F; SY and F are final.
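A literal implementation of the slide's merge-and-repeat idea (this naive pairwise criterion suffices for this example; a general minimizer would use partition refinement, e.g. Hopcroft's algorithm):

```python
def find_pair(states, delta, final, alphabet):
    """Find one pair of indistinguishable states, if any."""
    for q in sorted(states):
        for r in sorted(states):
            if q < r and (q in final) == (r in final) and \
               all(delta.get((q, i)) == delta.get((r, i)) for i in alphabet):
                return q, r
    return None

def minimize(states, delta, final, alphabet):
    """Repeatedly merge indistinguishable pairs until none is left."""
    states, delta, final = set(states), dict(delta), set(final)
    pair = find_pair(states, delta, final, alphabet)
    while pair:
        q, r = pair                                   # merge r into q
        delta = {(s if s != r else q, i): (t if t != r else q)
                 for (s, i), t in delta.items()}
        states.discard(r)
        final.discard(r)
        pair = find_pair(states, delta, final, alphabet)
    return states, delta, final

# The DFA from the previous slide, where AX and BY are indistinguishable
dfa = {("S", "c"): "SY", ("SY", "c"): "SY", ("SY", "a"): "AX",
       ("AX", "a"): "AX", ("AX", "b"): "BY",
       ("BY", "a"): "AX", ("BY", "b"): "BY"}
states, delta, final = minimize({"S", "SY", "AX", "BY"}, dfa, {"SY", "AX", "BY"}, "abc")
assert states == {"S", "SY", "AX"}      # BY was merged into AX
assert delta[("AX", "b")] == "AX"       # the merged state loops on a and b
```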

Simplifying a Grammar: Getting Back to the Grammar

Once we have our minimum deterministic FSA, we can transform it back into a grammar:
Vn = Q
Vt = I
q → iq' ∈ P for each δ(q, i) = q'
q → ε ∈ P for each q ∈ F

Simplified grammar for L:
1. S → cSy
2. Sy → cSy | aF | ε
3. F → aF | bF | ε

Counting with grammars

Position-independent counting. Target language: L = {x ∈ (a|b)⁺ : |x|_a = |x|_b}. This is a context-free language; we only need to count one kind of symbol against the other:
S → aAbG | bBaG
G → aAbG | bBaG | ε
A → aAb | bBa | ε | G
B → aAb | bBa | ε | G
or, in a more compact form:
S → aGbG | bGaG
G → aGbG | bGaG | ε
The arbitrary choice of the production rule allows the grammar to generate every combination.

Counting With Grammars

Example of a derivation for the string bbaabaaababb:
S ⇒ bBaG (it starts with a sequence of b)
bBaG ⇒ bbBaaG (dealing with the first sequence of b)
bbBaaG ⇒ bbaaG (the sequence is finished, thus get rid of B)
bbaaG ⇒ bbaabBaG (another pair starting with b)
bbaabBaG ⇒ bbaabaG (the sequence is finished, thus get rid of B)
bbaabaG ⇒ bbaabaaAbG (this time we have a sequence of a)
bbaabaaAbG ⇒ bbaabaaaAbbG (dealing with the sequence of a)
bbaabaaaAbbG ⇒ bbaabaaaGbbG (there are other sequences between the second a and the matching b, thus we need a new non-terminal G before the b)
bbaabaaaGbbG ⇒ bbaabaaabBaGbbG (we need to generate a ba pair)
bbaabaaabBaGbbG ⇒ bbaabaaabaGbbG (the sequence is finished, thus get rid of B)
bbaabaaabaGbbG ⇒ bbaabaaababbG ⇒ bbaabaaababb (we have generated all the necessary As and Bs, thus we get rid of both Gs with G → ε)
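The claim that the arbitrary rule choice generates every balanced combination can be spot-checked by exhaustively deriving all short strings from the compact grammar (a brute-force sketch, not part of the slides):

```python
from collections import deque

# Compact counting grammar: S -> aGbG | bGaG, G -> aGbG | bGaG | ε
RULES = {"S": ["aGbG", "bGaG"], "G": ["aGbG", "bGaG", ""]}

def derivable_words(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    words, seen, todo = set(), {"S"}, deque(["S"])
    while todo:
        form = todo.popleft()
        nt = next((c for c in form if c in RULES), None)
        if nt is None:
            words.add(form)
            continue
        i = form.index(nt)                  # expand the leftmost non-terminal
        for rhs in RULES[nt]:
            new = form[:i] + rhs + form[i + 1:]
            # prune sentential forms that already exceed the terminal budget
            if sum(c not in RULES for c in new) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words

words = derivable_words(6)
assert all(w and w.count("a") == w.count("b") for w in words)  # only balanced, non-empty
assert {"ab", "ba", "abba", "abab", "aabb"} <= words
```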

The Generative Approach of Grammars

Consider the language L = {a^m b^n | m ≠ n; m, n ≥ 0}. We want to write a grammar for this language.

A possible solution:
1. S → aSb | aA | Bb (generate balanced a, b pairs until an unbalanced character is generated)
2. A → aA | aAb | ε (generate balanced and unbalanced a)
3. B → Bb | aBb | ε (generate balanced and unbalanced b)

Basic idea of the design: the non-terminal symbols A and B allow us to generate a string with m > n and with m < n, respectively. In order to generate an A or B symbol, we must generate an unbalanced a or b, thus ensuring that the generated strings have either m > n or m < n.

The Generative Approach of Grammars

Grammars are a generative model! We do not need to let the grammar generate both balanced and unbalanced characters with the same non-terminals: we can decide to split the generated string into 2 parts, the first where a and b are balanced, the second where we add either only a or only b.

A simplified solution:
1. S → aSb | aA | Bb (generate balanced a, b pairs until an unbalanced character is generated)
2. A → aA | ε (add unbalanced a after the balanced a)
3. B → bB | ε (add unbalanced b before the balanced b)

This makes it easier to design the grammar, since we have two simpler, separate generation phases.
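The two phases can be made explicit by constructing, for any target a^m b^n with m ≠ n, the full derivation in the simplified grammar (an illustrative helper, not from the slides):

```python
def derivation(m, n):
    """Sentential forms of a derivation of a^m b^n (m != n) in the grammar
    S -> aSb | aA | Bb,  A -> aA | ε,  B -> bB | ε."""
    assert m != n
    k = min(m, n)
    steps = ["S"]
    for i in range(1, k + 1):                   # phase 1: k applications of S -> aSb
        steps.append("a" * i + "S" + "b" * i)
    if m > n:                                   # phase 2: S -> aA, then A -> aA
        steps.append("a" * (k + 1) + "A" + "b" * k)
        for j in range(1, m - n):
            steps.append("a" * (k + 1 + j) + "A" + "b" * k)
        steps.append("a" * m + "b" * n)         # A -> ε
    else:                                       # phase 2: S -> Bb, then B -> bB
        steps.append("a" * k + "B" + "b" * (k + 1))
        for j in range(1, n - m):
            steps.append("a" * k + "b" * j + "B" + "b" * (k + 1))
        steps.append("a" * m + "b" * n)         # B -> ε
    return steps

assert derivation(3, 1) == ["S", "aSb", "aaAb", "aaaAb", "aaab"]
assert derivation(0, 2) == ["S", "Bb", "bBb", "bb"]
```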

The Generative Approach of Grammars

Consider the language:
L = {ab^(n1) ab^(n2) ... ab^(nk) | ∀i, ni > 0; k ≥ 2; ∃j, h (1 ≤ j < h ≤ k, nj = nh)}
That is, the strings of the form (ab⁺)(ab⁺)⁺ where at least two substrings ab⁺ have the same number of b.

A grammar for L:
S → GaXG (this defines the structure of a string)
X → bXb | bGab (generate the two substrings with the same number of b)
G → aH | ε (generate a possibly empty sequence of substrings ab⁺)
H → bH | bG (generate b⁺)

The strategy is again to force the grammar to generate a substring with the required property (2 substrings with the same number of b) somewhere in the string, but without specifying when this substring should be generated. Sooner or later the substring will be generated, and this is enough! Again, grammars have more free will than automata.

Proofs on grammars

Mathematical induction on generation. Goal: prove that S → 1S1 | 0S0 | 1 | 0 | ε generates all, and only, the palindromes over Σ = {0, 1}.

The theorem we want to prove is a double implication:
1. x = w(0|1|ε)w^R, w ∈ Σ* ⟹ S ⇒* x
2. S ⇒* x ⟹ x = w(0|1|ε)w^R, w ∈ Σ*

We will thus use two different induction schemes:
1. Since the hypothesis is defined on words, induction on the word length
2. Since the hypothesis is defined on the grammar, induction on the number of productions applied

Proofs on grammars - Part 1

"x is a palindrome ⟹ x is generated by the grammar."
Base case: ε (length 0) is a palindrome, and ε is generated by the grammar (S → ε).
Induction step: assume the theorem holds for every |x| ≤ k − 1, k ∈ N; prove that it holds for |x| = k. Split into two cases:
1. k is odd: x = u(0|1)u^R
2. k is even: x = ww^R

Proofs on grammars - Part 1

1. x = u(0|1)u^R. By the induction hypothesis, S ⇒* uu^R. We argue that S ⇒* uSu^R. Indeed, the string uu^R has an even number of characters, and since the productions S → 1S1 and S → 0S0 each generate 2 characters, the derivation of uu^R cannot have ended with S → 1 or S → 0, as we would get a string with an odd number of characters. Therefore S → ε must have been employed to get uu^R, which in turn implies that the grammar derives the sentential form uSu^R. We can get x from this form by applying one of the productions S → 1 or S → 0.
2. x = ww^R = uaau^R, a ∈ {0, 1}. By the induction hypothesis, S ⇒* uau^R. From the previous case, we also know that S ⇒* uSu^R, since uau^R is necessarily obtained by applying one of the S → a productions. Then, by applying one of the S → aSa productions followed by S → ε, we can generate x.

Proofs on grammars - Part 2

"x is generated by the grammar ⟹ x is a palindrome."
Base case: the productions S → 0 | 1 | ε; ε, 0 and 1 are palindromes.
Induction step: assume the theorem holds for every derivation S ⇒^x w with x < k; prove that it holds for S ⇒^k w. We need to check that all the grammar productions preserve the palindrome property. For a ∈ {0, 1}, by the inductive hypothesis:
S ⇒^(k−1) x with x ∈ L implies S ⇒^(k−2) wSw^R, and hence S ⇒^(k−1) waSaw^R
S ⇒^(k−1) waSaw^R ⟹ S ⇒^k waaw^R ∈ L (using S → ε)
S ⇒^(k−1) waSaw^R ⟹ S ⇒^k wa(0|1)aw^R ∈ L (using S → 0 or S → 1)
All the possible productions leading to a valid word in k steps preserve the palindrome property, thus the theorem holds for k.
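The double implication can also be spot-checked by brute force for short strings: enumerate everything the grammar derives up to a length bound and compare it with the set of palindromes (a verification sketch, not part of the original proof):

```python
from collections import deque
from itertools import product

RULES = ["1S1", "0S0", "1", "0", ""]       # the alternatives for S

def generated(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    words, seen, todo = set(), {"S"}, deque(["S"])
    while todo:
        form = todo.popleft()
        if "S" not in form:
            words.add(form)
            continue
        i = form.index("S")
        for rhs in RULES:
            new = form[:i] + rhs + form[i + 1:]
            # prune forms whose terminal part already exceeds the bound
            if len(new.replace("S", "")) <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words

palindromes = {"".join(p) for n in range(7)
               for p in product("01", repeat=n) if p == p[::-1]}
assert generated(6) == palindromes      # both directions hold up to length 6
```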