Context Free Grammars

Automata and Formal Languages: Context Free Grammars
Sipser pages 101-111
Lecture 11, Tim Sheard

Formal Languages
1. Context-free grammars provide a convenient notation for the recursive description of languages.
2. The original goal of formalizing the structure of natural languages is still elusive, but CFGs are now the universally accepted formalism for defining the syntax of programming languages.
3. Writing parsers has become an almost fully automated process thanks to this theory.

A Simple Grammar for English
Example taken from Floyd & Beigel.
<Sentence> → <Subject> <Predicate>
<Subject> → <Pronoun1> | <Pronoun2>
<Pronoun1> → I | we | you | he | she | it | they
<Noun Phrase> → <Simple Noun Phrase> | <Article> <Noun Phrase>
<Article> → a | an | the
<Simple Noun Phrase> → <Noun> | <Adjective> <Simple Noun Phrase>
<Predicate> → <Verb> | <Verb> <Object>
<Object> → <Pronoun2> | <Noun Phrase>
<Pronoun2> → me | us | you | him | her | it | them
<Noun> → ...
<Verb> → ...
1. Each rule is called a production.
2. A production has the form lhs → rhs: lhs is a symbol, rhs is a string of symbols.
3. Variables (or non-terminals) are written in angle brackets < >.
4. Symbols (or terminals) are set in bold on the slides.

Example
Derive the sentence "She drives a shiny black car" from these rules.
[Figure: parse tree rooted at <sentence>, with children <subject> and <predicate>; interior nodes <pronoun1>, <verb>, <object>, <noun phrase>, <article>, <simple noun phrase>, <adjective>, <noun>; leaves: She drives a shiny black car]

Derivation rules
1. Write down the start symbol (the lhs of the first rule, unless otherwise stated).
2. Find a variable v and a corresponding rule v → rhs.
3. Replace the variable with the string rhs.
4. Repeat from step 2 until no more variables remain.

Derivation
<sentence>
⇒ <subject> <predicate>
⇒ <pronoun1> <predicate>
⇒ She <predicate>
⇒ She <verb> <object>
⇒ She drives <object>
⇒ She drives <noun phrase>
⇒ She drives <article> <noun phrase>
⇒ She drives a <noun phrase>
⇒ She drives a <simple noun phrase>
⇒ She drives a <adjective> <simple noun phrase>
⇒ She drives a shiny <simple noun phrase>
⇒ She drives a shiny <adjective> <simple noun phrase>
⇒ She drives a shiny black <simple noun phrase>
⇒ She drives a shiny black <noun>
⇒ She drives a shiny black car
(On the slides, the variable replaced in the next step is underlined.)

A Grammar for Expressions
<Expression> → <Term> | <Expression> + <Term>
<Term> → <Factor> | <Term> * <Factor>
<Factor> → <Identifier> | ( <Expression> ) | 3
<Identifier> → x | y | z
In-class exercise: derive
  x + ( y * 3 )
  x + z * w + q

Definition of Context-Free Grammars
A CFG is a quadruple G = (V, Σ, R, S), where
- V is a finite set of variables (nonterminals, syntactic categories)
- Σ is a finite set of terminals
- R is a finite set of productions, that is, rules of the form X → α, where X ∈ V and α ∈ (V ∪ Σ)*
- S, the start symbol, is an element of V
The vertical bar (|), as used in the examples on the previous slide, denotes a set of several productions with the same lhs.

Example
<Expression> → <Term> | <Expression> + <Term>
<Term> → <Factor> | <Term> * <Factor>
<Factor> → <Identifier> | ( <Expression> ) | 3
<Identifier> → x | y | z
V = {<Expression>, <Term>, <Factor>, <Identifier>}
Σ = {+, *, (, ), x, y, z, 3, ...}
R = {
  <Expression> → <Term>
  <Expression> → <Expression> + <Term>
  <Term> → <Factor>
  <Term> → <Term> * <Factor>
  <Factor> → <Identifier>
  <Factor> → ( <Expression> )
  <Factor> → 3
  <Identifier> → x
  <Identifier> → y
  <Identifier> → z
  <Identifier> → ...
}
S = <Expression>
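The quadruple can be written down directly as data. The following is a minimal Python sketch, not part of the lecture: the names V, SIGMA, R, S and is_cfg are hypothetical, the open-ended "..." entries are left out, and the disjointness check on V and Σ is a common extra assumption rather than something stated in the definition above.

```python
# Hypothetical encoding of the expression grammar as a 4-tuple (V, Sigma, R, S).
V = {"<Expression>", "<Term>", "<Factor>", "<Identifier>"}
SIGMA = {"+", "*", "(", ")", "x", "y", "z", "3"}
R = [
    ("<Expression>", ["<Term>"]),
    ("<Expression>", ["<Expression>", "+", "<Term>"]),
    ("<Term>", ["<Factor>"]),
    ("<Term>", ["<Term>", "*", "<Factor>"]),
    ("<Factor>", ["<Identifier>"]),
    ("<Factor>", ["(", "<Expression>", ")"]),
    ("<Factor>", ["3"]),
    ("<Identifier>", ["x"]),
    ("<Identifier>", ["y"]),
    ("<Identifier>", ["z"]),
]
S = "<Expression>"


def is_cfg(variables, terminals, rules, start):
    """Check the conditions from the definition of a CFG."""
    if variables & terminals:      # assumption: V and Sigma are disjoint
        return False
    if start not in variables:     # S must be an element of V
        return False
    symbols = variables | terminals
    # every rule is X -> alpha with X in V and alpha in (V u Sigma)*
    return all(
        lhs in variables and all(sym in symbols for sym in rhs)
        for lhs, rhs in rules
    )


print(is_cfg(V, SIGMA, R, S))
```

A rule whose left-hand side is a terminal, such as ("x", ["y"]), makes the check fail, which is exactly the restriction that makes the grammar context-free.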

Notational Conventions
- a, b, c, ... (lower case, beginning of alphabet) are concrete terminals
- u, v, w, x, y, z (lower case, end of alphabet) are strings of terminals
- α, β, γ, ... (Greek letters) are strings over (T ∪ V), that is, sentential forms
- A, B, C, ... (capitals, beginning of alphabet) are variables (non-terminals)
- X, Y, Z stand for symbols that may be either terminals or variables

Short-hand
Note. We often abbreviate a context free grammar, such as
G_2 = (V={S}, Σ={(,)}, R={S → ε, S → SS, S → (S)}, S=S)
by giving just its productions
S → ε | SS | (S)
and by using the following conventions:
1) The start symbol is the lhs of the first production.
2) Multiple productions for the same lhs non-terminal can be grouped together using the vertical bar (|).
3) Non-terminals are capitalized.
4) Terminal symbols are lower case or non-alphabetic.

Definitions
The single-step derivation relation ⇒ on (V ∪ T)* is defined by:
1. α ⇒ β iff β is obtained from α by replacing an occurrence of the lhs of a production with its rhs. That is, α'Aα'' ⇒ α'γα'' is true iff A → γ is a production. We say α'Aα'' yields α'γα''.
2. We write α ⇒* β when β can be obtained from α through a sequence of several (possibly zero) derivation steps.
3. The language of the CFG G is the set L(G) = {w ∈ T* | S ⇒* w}, where S is the start symbol of G.
4. Context-free languages are languages of the form L(G) for some CFG G.
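These definitions translate almost literally into code. Below is a minimal Python sketch, assuming single-character symbols and the balanced-parentheses grammar S → ε | SS | (S); the names step and language_up_to are hypothetical. The search is bounded: sentential forms longer than max_len + 1 are discarded, which happens to be enough for this grammar but is an assumption that does not hold for every CFG.

```python
from collections import deque

# S -> epsilon | SS | (S); the keys of RULES are the variables.
RULES = {"S": ["", "SS", "(S)"]}


def step(alpha, rules):
    """All beta with alpha => beta: replace one occurrence of a lhs by a rhs."""
    out = set()
    for i, sym in enumerate(alpha):
        for rhs in rules.get(sym, ()):
            out.add(alpha[:i] + rhs + alpha[i + 1:])
    return out


def language_up_to(start, rules, max_len):
    """Terminal strings w with start =>* w and len(w) <= max_len.

    Bounded breadth-first search; forms longer than max_len + 1 are pruned
    (an assumption that is sufficient for the grammar above).
    """
    cap = max_len + 1
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        alpha = queue.popleft()
        if not any(sym in rules for sym in alpha):
            words.add(alpha)          # no variables left: alpha is in T*
            continue
        for beta in step(alpha, rules):
            if len(beta) <= cap and beta not in seen:
                seen.add(beta)
                queue.append(beta)
    return {w for w in words if len(w) <= max_len}


print(sorted(language_up_to("S", RULES, 4)))
```

The reflexive-transitive closure ⇒* shows up as the breadth-first search: a string is reported exactly when some chain of single steps reaches it from the start symbol.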

Example 1
The familiar non-regular language L = {a^k b^k | k ≥ 0} is context-free. The grammar G_1 for it is given by T={a,b}, V={S}, and productions:
1. S → ε
2. S → aSb
Here is a derivation showing a^3 b^3 ∈ L(G_1):
S ⇒_2 aSb ⇒_2 aaSbb ⇒_2 aaaSbbb ⇒_1 aaabbb
(Note: we sometimes label the arrow with a subscript that tells which production was used in that step.)

Example 1 continued
Note, however, that the fact L = L(G_1) is not totally obvious. We need to prove set inclusion both ways.
To prove L ⊆ L(G_1) we must show that there exists a derivation for every string a^k b^k; this is done by induction on k.
For the converse, L(G_1) ⊆ L, we need to show that if S ⇒* w and w ∈ T*, then w ∈ L. This is done by induction on the length of the derivation of w.
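The labelled derivation of a^3 b^3, and the pattern behind the induction on k, can be replayed mechanically. This is a small Python sketch under the assumption that productions are numbered as in Example 1 (1: S → ε, 2: S → aSb); the function names are hypothetical.

```python
# Productions of G_1, numbered as in Example 1.
PRODUCTIONS = {1: ("S", ""), 2: ("S", "aSb")}


def apply_leftmost(form, lhs, rhs):
    """Replace the leftmost occurrence of the variable lhs in form by rhs."""
    i = form.index(lhs)            # raises ValueError if lhs does not occur
    return form[:i] + rhs + form[i + 1:]


def derive(start, labels):
    """Return every sentential form obtained by applying the labelled productions."""
    forms = [start]
    for n in labels:
        lhs, rhs = PRODUCTIONS[n]
        forms.append(apply_leftmost(forms[-1], lhs, rhs))
    return forms


print(" => ".join(derive("S", [2, 2, 2, 1])))
# S => aSb => aaSbb => aaaSbbb => aaabbb

# The shape of the induction for L being a subset of L(G_1):
# k uses of production 2 followed by production 1 yield a^k b^k.
for k in range(5):
    assert derive("S", [2] * k + [1])[-1] == "a" * k + "b" * k
```

Running the loop for a few values of k is only a sanity check, not a proof; the induction is still needed to cover every k.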

Example 2
The language of balanced parentheses is context-free. It is generated by the following grammar:
G_2 = (V={S}, Σ={(,)}, R={S → ε | SS | (S)}, S=S)

Example 3
Consider the grammar G_3:
S → AS | ε
A → 0A1 | A1 | 01
The derivation
S ⇒ AS ⇒ A1S ⇒ 011S ⇒ 011AS ⇒ 0110A1S ⇒ 0110011S ⇒ 0110011
shows that 0110011 ∈ L(G_3).

Example 3 notes
The language L(G_3) consists of strings w ∈ {0,1}* such that
P(w): either w = ε, or w begins with 0 and every block of 0's in w is followed by at least as many 1's.
Again, the proof that G_3 generates all and only the strings that satisfy P(w) is not obvious. It requires a two-part inductive proof.

Leftmost and Rightmost Derivations
The same string w usually has many possible derivations
S = α_0 ⇒ α_1 ⇒ α_2 ⇒ ... ⇒ α_n = w
We call a derivation leftmost if in every step α_i ⇒ α_{i+1} it is the first (leftmost) variable in α_i that is replaced with the rhs of a production. Similarly, in a rightmost derivation, it is always the last variable that gets replaced.
The above derivation of the string 0110011 in the grammar G_3 is leftmost. Here is a rightmost derivation of the same string:
S ⇒ AS ⇒ AAS ⇒ AA ⇒ A0A1 ⇒ A0011 ⇒ A10011 ⇒ 0110011
Grammar G_3:
S → AS | ε
A → 0A1 | A1 | 01

Facts
Every regular language is also a context-free language. How might we prove this?
- Choose one of the many specifications for regular languages.
- Show that every instance of that kind of specification has a total mapping into a context-free grammar.
What is an appropriate choice?

Designing CFGs
- Break the language into simpler (disjoint) parts with grammars A, B, C. Then put them together: S → Start_A | Start_B | Start_C
- If a language fragment is regular, construct a DFA or RE and use it to guide you.
- Infinite languages use rules like R → a R and R → R b
- Languages with linked parts use rules like R → B x B and R → x R x

In Class Exercise
Map the Haskell regular expression datatype into a context-free grammar.
data RegExp a
  = Lambda                        -- the empty string ""
  | Empty                         -- the empty set
  | One a                         -- a singleton set {a}
  | Union (RegExp a) (RegExp a)   -- union of two RegExp
  | Cat (RegExp a) (RegExp a)     -- concatenation
  | Star (RegExp a)               -- Kleene closure
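One possible shape of such a mapping is sketched here in Python rather than Haskell, purely as an illustration and not as the intended solution to the exercise. Regular expressions are assumed to be tagged tuples that mirror the datatype's constructors, the variable names N0, N1, ... are hypothetical, and they are assumed not to clash with any terminal. The construction is total: every constructor gets one fresh variable and at most two productions.

```python
import itertools


def regex_to_cfg(regex):
    """Map a regular expression (tagged tuples) to (start variable, rules).

    rules maps each variable to a list of right-hand sides, each rhs a list
    of symbols; the empty list [] is the epsilon right-hand side.
    """
    rules = {}
    counter = itertools.count()

    def build(r):
        var = f"N{next(counter)}"      # fresh variable for this sub-expression
        tag = r[0]
        if tag == "Lambda":            # the empty string: N -> epsilon
            rules[var] = [[]]
        elif tag == "Empty":           # the empty set: no productions at all
            rules[var] = []
        elif tag == "One":             # a single symbol: N -> a
            rules[var] = [[r[1]]]
        elif tag == "Union":           # N -> A | B
            a, b = build(r[1]), build(r[2])
            rules[var] = [[a], [b]]
        elif tag == "Cat":             # N -> A B
            a, b = build(r[1]), build(r[2])
            rules[var] = [[a, b]]
        elif tag == "Star":            # N -> epsilon | A N
            a = build(r[1])
            rules[var] = [[], [a, var]]
        else:
            raise ValueError(f"unknown constructor {tag!r}")
        return var

    start = build(regex)
    return start, rules


# (a | b)* as nested tagged tuples
start, rules = regex_to_cfg(("Star", ("Union", ("One", "a"), ("One", "b"))))
print(start, rules)
```

Because each case only uses rules of the form allowed in a CFG, structural induction on the regular expression gives the claim that every regular language is context-free.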

Find CFGs for these languages
- {a^n b a^n | n ∈ Nat}
- {w | w ∈ {a,b}*, and w is a palindrome of even length}
- {a^n b^k | n,k ∈ Nat, n ≤ k}
- {a^n b^k | n,k ∈ Nat, n ≥ k}
- {w | w ∈ {a,b}*, w has an equal number of a's and b's}

Ambiguity
A grammar is ambiguous if one of its strings has 2 or more leftmost (or rightmost) derivations. Consider the grammar
1. E -> E + E
2. E -> E * E
3. E -> x | y
and the string x + x * y. It has two leftmost derivations:
E ⇒_1 E + E ⇒_3 x + E ⇒_2 x + E * E ⇒_3 x + x * E ⇒_3 x + x * y
E ⇒_2 E * E ⇒_1 E + E * E ⇒_3 x + E * E ⇒_3 x + x * E ⇒_3 x + x * y
Note that the productions are used in a different order.
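Ambiguity of a particular string can be checked by counting its parse trees, since parse trees correspond one-to-one with leftmost derivations. The sketch below is a hypothetical Python helper written specifically for the grammar E -> E + E | E * E | x | y; it is not a general CFG algorithm.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees of tokens under E -> E + E | E * E | x | y."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Parse trees deriving tokens[i:j] from E."""
        total = 0
        if j - i == 1 and tokens[i] in ("x", "y"):
            total += 1                         # E -> x | y
        for k in range(i + 1, j - 1):          # operator at k, both sides non-empty
            if tokens[k] in ("+", "*"):        # E -> E + E | E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("x + x * y".split()))  # 2, so the grammar is ambiguous
```

A count of 2 or more for any single string is enough to show the grammar is ambiguous; a count of 1 for one string proves nothing about the grammar as a whole.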

Common Grammars with ambiguity
- Expression grammars with infix operators with different precedence levels.
- Nested if-then-else statements:
  st -> if exp then st else st | if exp then st | id := exp
The statement
  if x=2 then if x=3 then y:=2 else y := 4
can be read in two ways:
  if x=2 then (if x=3 then y:=2 ) else y := 4
  if x=2 then (if x=3 then y:=2 else y := 4)

Removing ambiguity
Adding levels to a grammar:
E -> E + E | E * E | id | ( E )
Transform to an equivalent grammar:
E -> E + T | T
T -> T * F | F
F -> id | ( E )
Levels make formal the notion of precedence. Operators that bind tightly are on the lowest levels.
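The effect of the levels can be seen by parsing with the layered grammar and printing the grouping it imposes. This is a minimal recursive-descent sketch in Python, with one function per level. Two things are assumptions that are not in the slides: the left-recursive rules E -> E + T and T -> T * F are implemented as loops (a standard rewrite, since naive recursive descent cannot handle left recursion), and any token other than an operator or parenthesis is treated as an id.

```python
def parenthesize(tokens):
    """Parse with E -> E + T | T, T -> T * F | F, F -> id | ( E ); show grouping."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        return tok

    def expr():                    # E -> E + T | T, written as T (+ T)*
        left = term()
        while peek() == "+":
            eat("+")
            right = term()
            left = f"({left} + {right})"
        return left

    def term():                    # T -> T * F | F, written as F (* F)*
        left = factor()
        while peek() == "*":
            eat("*")
            right = factor()
            left = f"({left} * {right})"
        return left

    def factor():                  # F -> id | ( E )
        if peek() == "(":
            eat("(")
            inner = expr()
            eat(")")
            return inner
        tok = eat()
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return result


print(parenthesize("x + x * y".split()))   # (x + (x * y))
print(parenthesize("x * x + y".split()))   # ((x * x) + y)
```

Because * lives one level below +, it always groups first, and the loops make both operators left-associative, so each input gets exactly one grouping.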

Chomsky Normal Form defined
There are many CFGs for any given CFL. When reasoning about CFLs, it often helps to assume that a grammar for the language has some particularly simple form.
A grammar is in Chomsky normal form (CNF) if every rule is of the form
1. A → BC, where B and C are variables (B and C can't be S)
2. A → a, where a is a terminal
3. S → ε is allowed (only S can be nullable)
To reach this form:
- Remove ε-rules A → ε
- Remove unit rules A → B
- Transform rules with too many symbols on the rhs, such as A → B B x C

Finding a Chomsky Normal Form
Every context-free grammar has an equivalent grammar (one that generates the same language) in Chomsky normal form. This grammar can be constructed using a few simple, language-preserving grammar transformations.

ε-productions
A variable A is nullable if A ⇒* ε. We can modify a given grammar G and obtain a grammar G' in which there are no nullable variables and which satisfies L(G') = L(G) - {ε}.
Find nullable symbols iteratively, using these facts:
1. If A → ε is a production, then A is nullable.
2. If A → B_1 B_2 ... B_k is a production and B_1, B_2, ..., B_k are all nullable, then A is nullable.
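The two facts describe a fixed-point iteration: keep adding variables until nothing changes. A minimal Python sketch follows, assuming rules are (lhs, rhs) pairs with rhs a tuple of symbols and () standing for ε; the function name and the sample grammar are illustrative only.

```python
def nullable_variables(rules):
    """Return the set of variables A with A =>* epsilon."""
    nullable = set()
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in rules:
            # Fact 1 is the case rhs == (): all() over an empty rhs is True.
            # Fact 2: every symbol on the rhs is already known to be nullable.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


# Illustrative grammar: S -> ASA | aB, A -> B | S, B -> b | epsilon
RULES = [
    ("S", ("A", "S", "A")),
    ("S", ("a", "B")),
    ("A", ("B",)),
    ("A", ("S",)),
    ("B", ("b",)),
    ("B", ()),
]
print(sorted(nullable_variables(RULES)))   # ['A', 'B']
```

Terminals never enter the nullable set, so any right-hand side containing a terminal is automatically ruled out. Here S is not nullable because S → ASA needs S itself and S → aB contains the terminal a.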

Once nullable symbols are known, we get G' as follows:
1. For every production A → α, add new productions A → α', where α' is obtained by deleting some (or all) nullable symbols from α.
2. Remove all productions A → ε.
Example. If G contains a production A → BC and both B and C are nullable, then we add A → B | C to G'.
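The construction can be sketched in Python as follows, assuming the set of nullable variables has already been computed and is passed in. Rules are (lhs, rhs) pairs with tuple right-hand sides; the function name is hypothetical.

```python
from itertools import product


def remove_epsilon(rules, nullable):
    """Build the productions of G' from those of G and its nullable variables."""
    new_rules = set()
    for lhs, rhs in rules:
        # Each nullable symbol may be kept or deleted; other symbols are kept.
        options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs:                  # step 2: never emit A -> epsilon
                new_rules.add((lhs, new_rhs))
    return new_rules


# The example from the text: A -> BC with B and C nullable.
RULES = [
    ("A", ("B", "C")),
    ("B", ("b",)),
    ("B", ()),
    ("C", ("c",)),
    ("C", ()),
]
for lhs, rhs in sorted(remove_epsilon(RULES, {"A", "B", "C"})):
    print(lhs, "->", " ".join(rhs))
```

For A → BC the four keep/delete choices give BC, B, C and the empty string; the empty one is dropped, which leaves exactly the added productions A → B | C from the example.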

Unit Productions
These are of the form A → B, where A and B are variables. Assuming the grammar has no ε-productions, we can eliminate unit productions as follows.
1. Find all pairs of variables such that A ⇒* B. (This happens iff B can be obtained from A by a chain of unit productions.)
2. Add a new production A → α whenever A ⇒* B and B → α is a non-unit production.
3. Remove all unit productions.
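A minimal Python sketch of these three steps, assuming an ε-free grammar given as (lhs, rhs) pairs with tuple right-hand sides. The function name is hypothetical, and the layered expression grammar is used only as a convenient test case.

```python
def remove_unit(rules, variables):
    """Eliminate unit productions A -> B from an epsilon-free grammar."""

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    # Step 1: all pairs (A, B) with A =>* B via unit productions (A =>* A included).
    pairs = {(a, a) for a in variables}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for lhs, rhs in rules:
                if lhs == b and is_unit(rhs) and (a, rhs[0]) not in pairs:
                    pairs.add((a, rhs[0]))
                    changed = True

    # Steps 2 and 3: copy every non-unit production of B up to A, drop unit ones.
    return {
        (a, rhs)
        for a, b in pairs
        for lhs, rhs in rules
        if lhs == b and not is_unit(rhs)
    }


# E -> E + T | T,  T -> T * F | F,  F -> id | ( E )
RULES = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]
for lhs, rhs in sorted(remove_unit(RULES, {"E", "T", "F"})):
    print(lhs, "->", " ".join(rhs))
```

Here E reaches T and F through unit chains, so E inherits T * F, id and ( E ) in addition to its own E + T.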

Theorem. For every CFG G, there exists a CFG G' in CNF such that L(G') = L(G) - {ε}.
The first three steps of getting G' are elimination of ε-productions, elimination of unit productions, and elimination of useless symbols (in that order). There remain two steps:
1. Arrange that all productions are of the form A → α, where α is a terminal or contains only variables.
2. Break up every production A → α with |α| > 2 into productions whose rhs has length two.

For the first part, introduce a new variable C for each terminal c that occurs in the rhs of some production, add the production C → c (unless such a production already exists), and replace c with C in all other productions. For example, the production A → 0B1 would be replaced with A_0 → 0, A_1 → 1, A → A_0 B A_1.
An example explains the second part. The production A → BCDE is replaced by three others,
1. A → B A_1
2. A_1 → C A_2
3. A_2 → DE
using two new variables A_1, A_2.
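Both remaining steps are mechanical and can be sketched in Python. The sketch makes several assumptions that are not in the text: the fresh variable names T_c and X1, X2, ... are hypothetical and assumed not to clash with existing symbols, a new variable is always introduced for a terminal (the "unless such a production already exists" shortcut is skipped), and the input is assumed to be free of ε-productions and unit productions.

```python
def to_cnf_shape(rules, variables):
    """Lift terminals out of long right-hand sides, then binarize."""
    lifted = []
    term_var = {}
    # Step 1: in every rhs of length >= 2, replace terminal c by a variable T_c.
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            new_rhs = []
            for sym in rhs:
                if sym not in variables:
                    if sym not in term_var:
                        term_var[sym] = f"T_{sym}"            # hypothetical name
                        lifted.append((term_var[sym], (sym,)))
                    sym = term_var[sym]
                new_rhs.append(sym)
            rhs = tuple(new_rhs)
        lifted.append((lhs, rhs))

    # Step 2: break A -> B C D E into A -> B X1, X1 -> C X2, X2 -> D E.
    result = []
    fresh = 0
    for lhs, rhs in lifted:
        while len(rhs) > 2:
            fresh += 1
            new_var = f"X{fresh}"                             # hypothetical name
            result.append((lhs, (rhs[0], new_var)))
            lhs, rhs = new_var, rhs[1:]
        result.append((lhs, rhs))
    return result


RULES = [("A", ("0", "B", "1")), ("A", ("B", "C", "D", "E"))]
for lhs, rhs in to_cnf_shape(RULES, {"A", "B", "C", "D", "E"}):
    print(lhs, "->", " ".join(rhs))
```

On the two productions from the text, A → 0B1 and A → BCDE, every resulting rule has either a single terminal or exactly two variables on its right-hand side.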

Example (Sipser, page 110)
S -> ASA | aB
A -> B | S
B -> b | ε
Rules
1. Add a new start symbol
2. Remove ε-rules
3. Remove unit rules
4. Make all rules have 1 terminal or 2 variables