Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Similar documents
Einführung in die Computerlinguistik

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Chapter 4: Context-Free Grammars

Context-Free Grammars: Normal Forms

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

CS 373: Theory of Computation. Fall 2010

Context Free Grammars

CPS 220 Theory of Computation

CFG Simplification. (simplify) 1. Eliminate useless symbols 2. Eliminate -productions 3. Eliminate unit productions

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Theory of Computation - Module 3

Plan for 2 nd half. Just when you thought it was safe. Just when you thought it was safe. Theory Hall of Fame. Chomsky Normal Form

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

Context-Free Grammar

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Context-Free Grammars and Languages. Reading: Chapter 5

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Context-Free Grammars and Languages

Context-free Grammars and Languages

MTH401A Theory of Computation. Lecture 17

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Context Free Languages and Grammars

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

CS375: Logic and Theory of Computing

This lecture covers Chapter 7 of HMU: Properties of CFLs

Foundations of Informatics: a Bridging Course

Chomsky and Greibach Normal Forms

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES

Pushdown Automata. Reading: Chapter 6

Theory Of Computation UNIT-II

Lecture 11 Context-Free Languages

Mildly Context-Sensitive Grammar Formalisms: Embedded Push-Down Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

Parsing. Unger s Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf 1 / 21

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

Introduction to Formal Languages, Automata and Computability p.1/42

Properties of Context-free Languages. Reading: Chapter 7

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

Computational Models - Lecture 4 1

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Context Free Grammars: Introduction

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer.

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Computational Models - Lecture 4 1

Chapter 5: Context-Free Languages

5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Functions on languages:

Introduction to Theory of Computing

MA/CSSE 474 Theory of Computation

Automata Theory CS F-08 Context-Free Grammars

Section 1 (closed-book) Total points 30

Properties of context-free Languages

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Computational Models - Lecture 5 1

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

Concordia University Department of Computer Science & Software Engineering

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Properties of Context-Free Languages

Pushdown Automata: Introduction (2)

The View Over The Horizon

CISC 4090 Theory of Computation

CPSC 313 Introduction to Computability

This lecture covers Chapter 5 of HMU: Context-free Grammars

Simplification and Normalization of Context-Free Grammars

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Grammars and Context-free Languages; Chomsky Hierarchy

Context-Free Grammars (and Languages) Lecture 7

INSTITUTE OF AERONAUTICAL ENGINEERING

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars

Computational Models - Lecture 5 1

Solution. S ABc Ab c Bc Ac b A ABa Ba Aa a B Bbc bc.

Theory of Computer Science

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Solutions to Problem Set 3

Chap. 7 Properties of Context-free Languages

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Notes for Comp 497 (454) Week 10

Computational Models - Lecture 4

FLAC Context-Free Grammars

5 Context-Free Languages

Parsing. Left-Corner Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 17

Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars

CISC 4090 Theory of Computation

The Pumping Lemma for Context Free Grammars

Parsing. Weighted Deductive Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Non-context-Free Languages. CS215, Lecture 5 c

Tree Adjoining Grammars

An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State Machine (FSM).

CA320 - Computability & Complexity

Computational Models - Lecture 3

Transcription:

Parsing Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 26

Table of contents 1 Context-Free Grammars 2 Simplifying CFGs Removing useless symbols Eliminating ε-rules Removing unary rules 3 CFG normal forms Chomsky Normal Form Greibach Normal Form 2 / 26

Grammar Formalisms (again) Type 1/2/3 grammars A type 0 grammar is called context-sensitive (or of type 1) if for all productions α β, α β holds. The only exception is S ε which is allowed if S does not appear in any righthand side. context-free (or of type 2) if for all productions α β, α N. regular (or of type 3) if for all productions α β, α N and β T or β = β X with β T, X N. The type 1/2/3 languages are the languages generated by the corresponding grammars. The hierarchy of the type 0, 1, 2 and 3 languages is called the Chomsky Hierarchy. 3 / 26

CFG CFG A context-free grammar (CFG) is a tuple G = N, T, P, S such that: N and T are disjoint alphabets, the nonterminals and terminals S N is the start symbol P is a set of productions of the form A β with A N, β (N T) Any β (N T) with S β is called a sentential form. 4 / 26

CFG CFG parse tree A tree t is a parse tree for a CFG G = N, T, P, S iff each node in t is labeled with an α N T {ε} the root label is S if there is a node with label A that has n daughters labeled (from left to right) α 1,..., α n, then A α 1... α n P if a node has label ε, it is a leaf and the unique daughter of its mother node S β in G iff there is a parse tree for G with yield β. 5 / 26

CFG Languages generated by a CFG Let G = N, T, P, S be a CFG The tree language is the set of all parse trees with root label S and all leaves labelled with a T {ε}. The string language L(G) of G is the set {w T S w} where 1 for w, w (N T) : w w iff there is a A β P and there are v, u (N T) such that w = vau and w = vβu. 2 is the reflexive transitive closure of. A derivation of a word w T is a sequence S α 1 α 2 w of derivation steps leading to w. 6 / 26

CFG For a single parse tree, there might be more than one corresponding derivation. Derivations CFG G a,b = {S, A, B}, {a, b}, P, S with productions S AB BA A a as baa B b bs abb (This CFG generates the language {w {a, b} + w a = w b }.) Input w = ab. The two derivations for w are S AB ab ab and S AB Ab ab ( w a gives the number of occurrences of a in w.) 7 / 26

CFG Leftmost and rightmost derivations A derivation is called a leftmost derivation iff, in each derivation step, a production is applied to the leftmost non-terminal of the already derived sentential form rightmost derivation iff, in each derivation step, a production is applied to the rightmost non-terminal of the already derived sentential form In the preceding example, the first derivation was a leftmost derivation and the second a rightmost derivation. 8 / 26

CFG For a single word w, there might even be more than one parse tree. Ambiguous grammars A CFG giving more than one parse tree for some word w is called ambiguous. Ambiguous grammar Consider again G a,b (S AB BA, A a as baa, B b bs abb) The two parse trees for aabb are A S B A S B a a B B a S b b b A B a b 9 / 26

CFG Some languages are such that their structure is necessarily ambiguous. Natural languages are probably such cases. Inherently ambiguous A CFL L is called inherently ambiguous if each CFG G with L = L(G) is ambiguous. Inherently ambiguous language L = {a n b n c m d m n 1, m 1} {a n b m c m d n n 1, m 1} For words of the form a k b k c k d k one cannot tell wich of the two patterns is the right structure. Both are possible. 10 / 26

Removing useless symbols An important grammar clean-up one has to do sometimes is the removal of symbols (non-terminals or terminals) that cannot occur in any derivation of a word in the string language. Useful/useless symbols Let G = N, T, P, S be a CFG. An X N T is called useful if there is a derivation S αxβ w with w T useless otherwise 11 / 26

Removing useless symbols Non-terminals X can be useless because they do not allow to derive a terminal string, i.e., there is no derivation X w with w T. Non-terminals and terminals X can be useless because they cannot be reached from the start symbol, i.e., there is no derivation S αx. Useless symbols 1 G = {S, A, B, C}, {a, b, c}, P, S with P = {S AB as, A aaac c, B acc} Useless: B and C since from them one cannot derive any terminal string. 2 G = {S, A, B, C}, {a, b, c}, P, S with P = {S AB as, A aaac c, B ac, C b} Useless: C and b since they are not reachable from S. 12 / 26

Removing useless symbols To eliminate all useless symbols, two things need to be done. 1 All X N need to be eliminated that cannot lead to a terminal sequence. This can be done recursively: Starting from the terminals and following the productions from right to left, the set of all symbols leading to terminals can be computed recursively. Productions containing symbols that are not in this set are eliminated. 2 In the resulting CFG, the unreachable symbols need to be eliminated. This is done starting from S and applying productions. Each time, the symbols from the right-hand sides are added. Again, productions containing non-terminals or terminals that are not in the set are eliminated. 13 / 26

Eliminating ε-rules Let G = N, T, P, S. A production of the form A ε is called an ε-production. The following holds: For each CFG G, there is a CFG G without ε-productions such that L(G ) = L(G) \ {ε}. Removing ε-productions G = {S, T}, {a, b, c, d}, P, S with P = {S asb atb, T ctd ε} Equivalent ε-free CFG: G = {S, T}, {a, b, c, d}, P, S with P = {S asb atb ab, T ctd cd} 14 / 26

Eliminating ε-rules In order to eliminate ε-productions, we Compute the set N ε = {A A ε} recursively 1 N ε := {A N A ε} 2 For all A with A α, α N ε : add A to N ε 3 Repeat 2. until N ε does not change any more Delete the ε-productions and for each A X 1... X n : add all productions one can obtain by deleting some X j N ε from the right-hand side as long as one does not delete all X 1,..., X n. 15 / 26

Removing unary rules Let G = N, T, P, S. For A, B N, a production of the form A B is called a unary production For each CFL that does not contain ε-rules, a CFG without unary productions can be found. Removing unary rules G = {S, T}, {a, b, c, d}, P, S with P = {S asb T, T ctd cd} Equivalent CFG without unary rules: G = {S, T}, {a, b, c, d}, P, S with P = {S asb ctd cd, T ctd cd} Elimination of unary productions for a CFG without ε-productions: 1 For all A B and all B β, β / N: add A β to the set of productions 2 Delete all unary productions 16 / 26

Chomsky Normal Form A normal form of a grammar formalism F is a further restriction on the grammars in F that does not affect the set of generated string languages. There are two important normal forms for CFGs. CFG normal forms A CFG G = N, T, P, S for a language without ε-rules is 1 in Chomsky normal form iff all productions have either the form A BC or A a with A, B, C N, a T 2 in Greibach normal form iff all productions have the form A aα with a T, α N 17 / 26

Chomsky Normal Form For each CFL L without ε, there is a CFG in Chomsky normal form with L = L(G). Construction of an equivalent CFG in CNF for a given CFG G = N, T, P, S 1 Eliminate useless symbols 2 Eliminate ε-productions 3 Eliminate unary productions 4 For each a T: introduce new non-terminal T a, replace a by T a in all right-hand sides of length > 1 and add production T a a to P 5 For each production A B 0... B n introduce new non-terminals X 1,..., X n 1 and replace production A B 0... B n with A B 0 X 1 X 1 B 1 X 2... X n 1 B n 1 B n 18 / 26

Chomsky Normal Form A B 0 X 1 A B 0... B n B 1 X 2 X n 1 B n 1 B n 19 / 26

Chomsky Normal Form Transformation to CNF G = {S, C}, {a, b, c, }, {S asbc ab, C c cc}, S. 1 Introducing new preterminals: G = {S, C}, {a, b, c, }, P, S with productions S T a ST b C T a T b, C c T c C, T a a, T b b, T c c 2 Binarization: G = {S, C}, {a, b, c, }, P, S with productions S T a X 1 T a T b, X 1 SX 2, X 2 T b C, C c T c C, T a a, T b b, T c c 20 / 26

Greibach Normal Form For each CFL L without ε, there is a CFG in Greibach normal form with L = L(G). Construction of the CFG in GNF for a given CFG G = N, T, P, S 1 Eliminate useless symbols 2 Eliminate unary productions 3 Eliminate ε-productions 4 Left-recursion is eliminated, i.e., a grammar is constructed that does not allow derivations of the form A + Aα 21 / 26

Greibach Normal Form Elimination of left-recursion Assume the set of non-terminals to be ordered, i.e. N = {A 1,..., A m } Construct a CFG with j > i if A i A j γ: For all A i, 1 i m, steps (I) and (II) are done. (I) Transformation such that j i if A i A j γ: Replace all productions A i A j γ with j < i with new productions obtained from replacing A j with all right-hand sides of A j -productions. Do this until the condition holds for all A i - productions. 22 / 26

Greibach Normal Form (II) Elimination of left-recursive productions A i A i α: Add a new non-terminal B and replace A i A i α 1,..., A i A i α r, A i β 1,..., A i β s with and A i β i, A i β i B B α i, B α i B i, 1 i s i, 1 i r (Left recursion is turned into right recursion) In the resulting grammar, for all A i + Aj α, i < j holds. 23 / 26

Greibach Normal Form Transformation to GNF G = N, T, P, S, N = {A 1, A 2, A 3, B}, T = {a, b} A 1 A 2 A 3 A 2 A 3 A 1 b A 3 A 1 A 2 a A 1 A 2 A 3 A 2 A 3 A 1 b A 3 A 2 A 3 A 2 a A 1 A 2 A 3 A 2 A 3 A 1 b A 3 A 3 A 1 A 3 A 2 ba 3 A 2 a Replace A 3 A 3 A 1 A 3 A 2 ba 3 A 2 a by A 3 ba 3 A 2 A 3 a A 3 ba 3 A 2 B A 3 ab B A 1 A 3 A 2 B A 1 A 3 A 2 B The resulting grammar does not allow left-recursive derivations. 24 / 26

Greibach Normal Form Lexicalization Now, two more steps are necessary to obtain the grammar in GNF: 5 For i = m 1 to i = 1: Replace all A i A j β with all productions obtained by replacing A j with the right-hand side of a A j -production Then, do the same for all B-productions where B is one of the symbols introduced in step (II) 6 For all productions A αaβ, α ε: replace a with a new non-terminal T a and add a production T a a to P. 25 / 26

Greibach Normal Form Transformation to GNF (cont d) A 1 A 2 A 3 A 2 A 3 A 1 b A 3 ba 3 A 2 a ba 3 A 2 B ab B A 1 A 3 A 2 A 1 A 3 A 2 B Replace 1 A 2 A 3 A 1 by A 2 ba 3 A 2 A 1 aa 1 ba 3 A 2 BA 1 aba 1 2 A 1 A 2 A 3 by A 1 ba 3 A 2 A 1 A 3... aba 1 A 3 ba 3 3 B A 1 A 3 A 2 by B ba 3 A 2 A 1 A 3 A 3 A 2... ba 3 A 3 A 2 4 B A 1 A 3 A 2 B by B ba 3 A 2 A 1 A 3 A 3 A 2 B... ba 3 A 3 A 2 B 26 / 26