Einführung in die Computerlinguistik

Einführung in die Computerlinguistik (Introduction to Computational Linguistics)
Context-Free Grammars: formal properties
Laura Kallmeyer, Heinrich-Heine-Universität Düsseldorf, Summer 2018

Normal Forms (1)

(Following Hopcroft and Ullman (1979).)

A normal form of a grammar formalism F is a further restriction on the grammars in F that does not affect the set of generated string languages.

Let $G = \langle N, T, P, S \rangle$ be a CFG. A symbol $X \in N \cup T$ is called
- useful if there is a derivation $S \Rightarrow^{*} \alpha X \beta \Rightarrow^{*} w$ with $w \in T^{*}$,
- useless otherwise.

For each CFG, there exists an equivalent CFG (i.e., a CFG generating the same string language) without useless symbols.

Normal Forms (2)

To eliminate all useless symbols, two things need to be done, in this order:

1. All $X \in N$ that cannot lead to a terminal sequence need to be eliminated. The set of all symbols leading to terminals can be computed recursively, starting from the terminals and following the productions from right to left. Productions containing symbols that are not in this set are eliminated.
2. In the resulting CFG, the unreachable symbols need to be eliminated. This is done starting from S and applying productions; each time, the symbols from the right-hand sides are added to the set of reachable symbols. Again, productions containing non-terminals or terminals that are not in this set are eliminated.
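The two steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the encoding of productions as `(lhs, rhs_tuple)` pairs and all function names are assumptions made for this example.

```python
def generating_symbols(terminals, productions):
    """Step 1: symbols that can derive a terminal string (computed bottom-up)."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in gen and all(s in gen for s in rhs):
                gen.add(lhs)
                changed = True
    return gen


def reachable_symbols(start, productions):
    """Step 2: symbols reachable from the start symbol (computed top-down)."""
    reach = {start}
    agenda = [start]
    while agenda:
        sym = agenda.pop()
        for lhs, rhs in productions:
            if lhs == sym:
                for s in rhs:
                    if s not in reach:
                        reach.add(s)
                        agenda.append(s)
    return reach


def remove_useless(terminals, productions, start):
    """Apply step 1, then step 2, and return the remaining productions."""
    gen = generating_symbols(terminals, productions)
    prods = [(l, r) for l, r in productions
             if l in gen and all(s in gen for s in r)]
    reach = reachable_symbols(start, prods)
    return [(l, r) for l, r in prods if l in reach]


# Hypothetical example grammar: S -> A B | a, A -> b (B derives nothing).
example = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
cleaned = remove_useless({"a", "b"}, example, "S")
```

In this example, step 1 removes S -> A B because B never leads to terminals, and only then does A become unreachable in step 2. Running the steps in the opposite order would leave A -> b in the grammar.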

Normal Forms (3)

A production of the form $A \to \varepsilon$ is called an $\varepsilon$-production.

The following holds: for each CFG $G$, there is a CFG $G'$ without $\varepsilon$-productions such that $L(G') = L(G) \setminus \{\varepsilon\}$.

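Normal Forms (4)

In order to eliminate $\varepsilon$-productions, we first compute the set $N_\varepsilon = \{A \mid A \Rightarrow^{*} \varepsilon\}$ recursively:

1. $N_\varepsilon := \{A \in N \mid A \to \varepsilon \in P\}$.
2. For all $A$ with $A \to \alpha$, $\alpha \in N_\varepsilon^{+}$: add $A$ to $N_\varepsilon$.
3. Repeat 2. until $N_\varepsilon$ does not change any more.

Then we delete the $\varepsilon$-productions and, for each $A \to X_1 \dots X_n$, add all productions one can obtain by deleting some $X_i \in N_\varepsilon$ from the right-hand side, as long as one does not delete all of $X_1, \dots, X_n$.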
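A minimal Python sketch of this procedure, under the assumption that productions are encoded as `(lhs, rhs_tuple)` pairs with the empty tuple standing for $\varepsilon$ (the encoding and the function names are choices made for this illustration, not part of the slides):

```python
from itertools import product


def nullable(productions):
    """Compute N_eps, the set of non-terminals that derive the empty word."""
    n_eps = {lhs for lhs, rhs in productions if len(rhs) == 0}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in n_eps and rhs and all(s in n_eps for s in rhs):
                n_eps.add(lhs)
                changed = True
    return n_eps


def remove_epsilon(productions):
    """Drop epsilon-productions and add the variants with nullable symbols deleted."""
    n_eps = nullable(productions)
    result = set()
    for lhs, rhs in productions:
        if not rhs:
            continue  # the epsilon-production itself is deleted
        # Each nullable symbol may be kept or dropped; other symbols are always kept.
        options = [((s,), ()) if s in n_eps else ((s,),) for s in rhs]
        for choice in product(*options):
            new_rhs = tuple(s for part in choice for s in part)
            if new_rhs:  # never delete the whole right-hand side
                result.add((lhs, new_rhs))
    return result


# Hypothetical example grammar: S -> A b B, A -> a | eps, B -> A.
example = [("S", ("A", "b", "B")), ("A", ("a",)), ("A", ()), ("B", ("A",))]
without_eps = remove_epsilon(example)
```

Here both A and B end up in $N_\varepsilon$, so S -> A b B gives rise to the four variants A b B, A b, b B and b.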

Normal Forms (5)

A production of the form $A \to B$ with $A, B \in N$ is called a unary production.

For each CFL that does not contain $\varepsilon$, a CFG without unary productions can be found.

Elimination of unary productions for a CFG without $\varepsilon$-productions:

1. For all $A \Rightarrow^{+} B$ (using only unary productions) and all $B \to \beta$, $\beta \notin N$: add $A \to \beta$.
2. Delete all unary productions.
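The elimination of unary productions can be sketched as follows. The sketch assumes the same `(lhs, rhs_tuple)` encoding of productions as an illustration device; the function name is hypothetical.

```python
def remove_unary(nonterminals, productions):
    """Eliminate unary productions A -> B from a CFG without epsilon-productions."""
    def is_unary(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # unit[A] collects every B with A =>+ B via unary productions only.
    unit = {a: set() for a in nonterminals}
    for lhs, rhs in productions:
        if is_unary(rhs):
            unit[lhs].add(rhs[0])
    changed = True
    while changed:  # transitive closure
        changed = False
        for a in nonterminals:
            for b in list(unit[a]):
                new = unit[b] - unit[a]
                if new:
                    unit[a] |= new
                    changed = True

    # Keep the non-unary productions and copy them upwards along unary chains.
    result = {(l, r) for l, r in productions if not is_unary(r)}
    for a in nonterminals:
        for b in unit[a]:
            for lhs, rhs in productions:
                if lhs == b and not is_unary(rhs):
                    result.add((a, rhs))
    return result


# Hypothetical example grammar: S -> A, A -> B | a, B -> b.
example = [("S", ("A",)), ("A", ("B",)), ("A", ("a",)), ("B", ("b",))]
without_unary = remove_unary({"S", "A", "B"}, example)
```

The closure step matters: S reaches B only through the chain S -> A -> B, and the production S -> b is found only because that chain is followed.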

Normal Forms (6)

There are two important normal forms for CFGs. A CFG for a language without $\varepsilon$ is
- in Chomsky normal form iff all productions have either the form $A \to BC$ or $A \to a$ with $A, B, C \in N$, $a \in T$;
- in Greibach normal form iff all productions have the form $A \to a\alpha$ with $a \in T$, $\alpha \in N^{*}$.

Normal Forms (7)

For each CFL $L$ without $\varepsilon$, there is a CFG $G$ in Chomsky normal form with $L = L(G)$.

Construction of an equivalent CFG in CNF for a given CFG (after elimination of useless symbols, $\varepsilon$-productions and unary productions):

1. For each terminal $a$: introduce a new non-terminal $C_a$, replace $a$ with $C_a$ in all right-hand sides of length > 1, and add the production $C_a \to a$.

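Normal Forms (8)

2. For each production $A \to B_0 \dots B_n$ with $n \ge 2$: introduce new non-terminals $D_1, \dots, D_{n-1}$ and replace the production with the productions $A \to B_0 D_1$, $D_1 \to B_1 D_2$, $D_2 \to B_2 D_3$, ..., $D_{n-1} \to B_{n-1} B_n$.

[Figure: the flat tree with root $A$ and daughters $B_0 \dots B_n$ is replaced by a right-branching binary tree in which $A$ dominates $B_0$ and $D_1$, $D_1$ dominates $B_1$ and $D_2$, ..., and $D_{n-1}$ dominates $B_{n-1}$ and $B_n$.]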
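Steps 1 and 2 of the CNF construction can be sketched together in Python. This is an illustrative sketch only: it assumes that the input grammar already has no useless symbols, $\varepsilon$-productions or unary productions, that the generated names `C_a` and `D_1, D_2, ...` do not clash with existing symbols, and it introduces `C_a` only for terminals that actually occur in a long right-hand side.

```python
def to_cnf(terminals, productions):
    """Convert productions (lhs, rhs_tuple) to Chomsky normal form."""
    # Step 1: replace terminals in right-hand sides of length > 1 by C_a.
    term_nt = {}
    step1 = []
    for lhs, rhs in productions:
        if len(rhs) > 1:
            new_rhs = []
            for s in rhs:
                if s in terminals:
                    term_nt.setdefault(s, f"C_{s}")
                    new_rhs.append(term_nt[s])
                else:
                    new_rhs.append(s)
            rhs = tuple(new_rhs)
        step1.append((lhs, rhs))
    for a, c in term_nt.items():
        step1.append((c, (a,)))

    # Step 2: binarize right-hand sides of length > 2 with fresh D_i.
    result = []
    counter = 0
    for lhs, rhs in step1:
        while len(rhs) > 2:
            counter += 1
            d = f"D_{counter}"
            result.append((lhs, (rhs[0], d)))
            lhs, rhs = d, rhs[1:]
        result.append((lhs, rhs))
    return result


# Hypothetical example grammar: S -> a S b | a b.
cnf = to_cnf({"a", "b"}, [("S", ("a", "S", "b")), ("S", ("a", "b"))])
```

For the example, S -> a S b first becomes S -> C_a S C_b and is then split into S -> C_a D_1 and D_1 -> S C_b.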

Normal Forms (9)

For each CFL $L$ without $\varepsilon$, there is a CFG $G$ in Greibach normal form with $L = L(G)$.

For the construction see Hopcroft and Ullman (1994).

Closure Properties (1)

CFLs are closed
- under union (construction: add $S \to S_1 \mid S_2$, where $S_1, S_2$ are the start symbols of the two CFGs; condition: the non-terminals of the two CFGs are pairwise disjoint),
- under concatenation and Kleene closure (construction as with regular languages),
- under homomorphisms [1] (construction: replace the terminals in productions with their images under the homomorphism),
- under substitution (construction: replace the terminals in the first CFG with the start symbols of the corresponding CFGs; make sure the non-terminals of the involved CFGs are pairwise disjoint).

[1] A homomorphism in this context is a function $f : \Sigma_1^{*} \to \Sigma_2^{*}$ such that $f(w_1 w_2) = f(w_1) f(w_2)$.
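As a small illustration of the union construction, the following Python sketch renames the non-terminals of both grammars to make them disjoint and then adds the new start productions. The grammar encoding `(nonterminals, productions, start)`, the suffix-based renaming, and the fresh start symbol name `"S"` are assumptions of this sketch; it also assumes that the renamed symbols do not clash with any terminal.

```python
def rename(productions, nonterminals, suffix):
    """Rename every non-terminal A to A_suffix; terminals are left untouched."""
    mapping = {a: f"{a}_{suffix}" for a in nonterminals}
    renamed = [(mapping[lhs], tuple(mapping.get(s, s) for s in rhs))
               for lhs, rhs in productions]
    return renamed, mapping


def union(g1, g2):
    """Return productions and start symbol of a CFG for L(g1) union L(g2)."""
    (n1, p1, s1), (n2, p2, s2) = g1, g2
    p1r, m1 = rename(p1, n1, 1)
    p2r, m2 = rename(p2, n2, 2)
    # New start symbol "S" with S -> S_1 | S_2.
    return [("S", (m1[s1],)), ("S", (m2[s2],))] + p1r + p2r, "S"


# Hypothetical example grammars: S -> a S b | eps and S -> c S | c.
g1 = ({"S"}, [("S", ("a", "S", "b")), ("S", ())], "S")
g2 = ({"S"}, [("S", ("c", "S")), ("S", ("c",))], "S")
union_productions, union_start = union(g1, g2)
```

Concatenation would differ only in the start production, which would be the single production S -> S_1 S_2.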

Closure Properties (2)

CFLs are closed under intersection with regular languages.

Construction: take the CFG and the DFA of the regular language; then build a new CFG by replacing non-terminals $A$ with triples $\langle q_1, A, q_2 \rangle$, where the triple stands for the derivation of the yield of $A$ while traversing a DFA path from $q_1$ to $q_2$.

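Closure Properties (3)

Example for intersection with regular languages:

CFG: $S \to aSb \mid \varepsilon$; regular language: $a^{+}b^{+}$.

DFA: start state $q_0$, only final state $q_2$, with transitions $\delta(q_0, a) = q_1$, $\delta(q_1, a) = q_1$, $\delta(q_1, b) = q_2$, $\delta(q_2, b) = q_2$.

Intersection grammar, start symbol $S'$:

$S' \to \langle q_0, S, q_2 \rangle$ ($q_2$ is the only final state)
$\langle q_0, S, q_2 \rangle \to a\,\langle q_1, S, q_1 \rangle\,b$ (with $\delta(q_0, a) = q_1$, $\delta(q_1, b) = q_2$)
$\langle q_0, S, q_2 \rangle \to a\,\langle q_1, S, q_2 \rangle\,b$ (with $\delta(q_0, a) = q_1$, $\delta(q_2, b) = q_2$)
$\langle q_1, S, q_2 \rangle \to a\,\langle q_1, S, q_2 \rangle\,b$ (with $\delta(q_1, a) = q_1$, $\delta(q_2, b) = q_2$)
$\langle q_1, S, q_2 \rangle \to a\,\langle q_1, S, q_1 \rangle\,b$ (with $\delta(q_1, a) = q_1$, $\delta(q_1, b) = q_2$)
$\langle q_1, S, q_1 \rangle \to \varepsilon$ (since $q_1 = q_1$)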
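The triple construction can be sketched in Python as follows. This is one possible way to spell it out, under assumptions made for this illustration: productions are `(lhs, rhs_tuple)` pairs, triples are Python tuples `(q1, A, q2)`, the DFA transition function is a partial dictionary `delta`, and the new start symbol is the string `"S'"`. The sketch enumerates all state sequences, so it also produces triple productions that are unreachable from the start symbol.

```python
from itertools import product


def intersect(productions, start, terminals, states, delta, q0, finals):
    """Build the productions of a CFG for L(CFG) intersected with L(DFA)."""
    new = set()
    # One start production per final state of the DFA.
    for f in finals:
        new.add(("S'", ((q0, start, f),)))
    for lhs, rhs in productions:
        # Guess a DFA state before and after every right-hand side symbol.
        for path in product(states, repeat=len(rhs) + 1):
            body = []
            ok = True
            for i, sym in enumerate(rhs):
                p, q = path[i], path[i + 1]
                if sym in terminals:
                    # A terminal must be licensed by a DFA transition.
                    if delta.get((p, sym)) != q:
                        ok = False
                        break
                    body.append(sym)
                else:
                    body.append((p, sym, q))
            if ok:
                new.add(((path[0], lhs, path[-1]), tuple(body)))
    return new


# The example from the slide: S -> a S b | eps, DFA for a+ b+.
cfg = [("S", ("a", "S", "b")), ("S", ())]
delta = {("q0", "a"): "q1", ("q1", "a"): "q1",
         ("q1", "b"): "q2", ("q2", "b"): "q2"}
grammar = intersect(cfg, "S", {"a", "b"}, ["q0", "q1", "q2"],
                    delta, "q0", {"q2"})
```

For the example, the result contains the six productions listed above plus the two unreachable productions for the triples (q0, S, q0) and (q2, S, q2), which a subsequent removal of useless symbols would discard.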

Pumping Lemma (1)

In a context-free derivation, the expansion of a non-terminal $A$ does not depend on the context $A$ occurs in.

Consequently, if we have a derivation

$S \Rightarrow^{+} xAz \Rightarrow^{+} x v_1 A v_2 z \Rightarrow^{+} x v_1 y v_2 z$

then the part $A \Rightarrow^{+} v_1 A v_2$ of the derivation can be iterated, i.e., we can also have $S \Rightarrow^{+} x v_1^{i} y v_2^{i} z$ for any $i \ge 1$.

Pumping Lemma (2)

Looking at derivation trees makes this even clearer. Assume that a derivation tree has at least a certain minimal height (the height being the maximal length of a path from the root to a leaf). Then there is a path from the root (labelled S) to a leaf such that on this path a non-terminal $A$ occurs twice, and below the higher of these two $A$ nodes there is only a single further $A$ and no other non-terminal is repeated on any path.

Since the number of non-terminals is finite, from a certain string length on every derivation tree of a word in the language is necessarily of this form.

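Pumping Lemma (3)

The part of the derivation tree in between the two nodes with the same non-terminal can be iterated. This means that the strings yielded by this part are pumped.

[Figure: derivation tree with root $S$ and two nested nodes labelled $A$; the yield is $x\,v_1\,y\,v_2\,z$, where the lower $A$ yields $y$ and the higher $A$ yields $v_1 y v_2$.]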

Pumping Lemma (4)

Pumping lemma for context-free languages: Let $L$ be a context-free language. Then there is a constant $k$ such that for all $w \in L$ with $|w| \ge k$: $w = x v_1 y v_2 z$ with
- $|v_1 v_2| \ge 1$,
- $|v_1 y v_2| \le k$, and
- for all $i \ge 0$: $x v_1^{i} y v_2^{i} z \in L$.

Pumping Lemma (5)

With the pumping lemma and the closure properties, we can show for many languages that they are not context-free.

$L_1 = \{a^n b^n c^n \mid n \ge 1\}$ is not context-free.

Proof: Assume that $L_1$ is context-free. Then it must satisfy the pumping lemma with some constant $k$. Consequently, for every $w \in L_1$ with $|w| \ge k$, and in particular for $a^k b^k c^k$, we must find substrings $v_1, v_2$ that can be iterated. Either each of them contains occurrences of only a single terminal; then the iteration yields words that no longer have the same number of a's, b's and c's, since at most two of the three terminals are pumped. Or at least one of them contains at least two different terminals; then the iteration necessarily leads to words in which the a's, b's and c's get mixed. Hence $L_1$ does not satisfy the pumping lemma, contrary to the assumption, and therefore $L_1$ cannot be context-free.
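The case distinction in the proof can be checked by brute force for small values of $k$. The following Python sketch enumerates every decomposition $w = x v_1 y v_2 z$ of $a^k b^k c^k$ that respects the length conditions of the pumping lemma and tests whether pumping keeps the word in $L_1$. It is only a finite sanity check, not a proof for all $k$; the function names are hypothetical, and testing just the exponents 0 and 2 is a simplification that suffices to refute a decomposition.

```python
import re


def in_l1(s):
    """Membership test for L1 = { a^n b^n c^n | n >= 1 }."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def pumpable_decompositions(w, k, in_lang, exponents=(0, 2)):
    """Return all (x, v1, y, v2, z) with w = x v1 y v2 z, |v1 v2| >= 1 and
    |v1 y v2| <= k for which every tested exponent stays in the language."""
    found = []
    n = len(w)
    for a in range(n + 1):                # x  = w[:a]
        for b in range(a, n + 1):         # v1 = w[a:b]
            for c in range(b, n + 1):     # y  = w[b:c]
                # v2 = w[c:d]; the upper bound enforces |v1 y v2| <= k
                for d in range(c, min(n, a + k) + 1):
                    x, v1, y, v2, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if len(v1) + len(v2) == 0:
                        continue
                    if all(in_lang(x + v1 * i + y + v2 * i + z)
                           for i in exponents):
                        found.append((x, v1, y, v2, z))
    return found


# For small k, no admissible decomposition of a^k b^k c^k can be pumped.
survivors = {k: pumpable_decompositions("a" * k + "b" * k + "c" * k, k, in_l1)
             for k in range(1, 5)}
```

For comparison, applying the same function to a word of the context-free language $\{a^n b^n \mid n \ge 1\}$ does find pumpable decompositions, for instance $x = a$, $v_1 = a$, $y = \varepsilon$, $v_2 = b$, $z = b$ for the word aabb with $k = 2$.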

Pumping Lemma (6)

$L_2 = \{w \mid w \in \{a, b, c\}^{*}, |w|_a = |w|_b = |w|_c\}$ is not context-free.

Proof: Assume that $L_2$ is context-free. Then, since CFLs are closed under intersection with regular languages, its intersection with the regular language $a^{+}b^{+}c^{+}$ must also be context-free. However, this intersection is $L_1 = \{a^n b^n c^n \mid n \ge 1\}$, for which we have just shown that it is not context-free. Since $L_1$ is not context-free, our assumption is false and $L_2$ is not context-free either.

References

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison Wesley.

Hopcroft, J. E. and Ullman, J. D. (1994). Einführung in die Automatentheorie, Formale Sprachen und Komplexitätstheorie. Addison Wesley, 3rd edition.