Einführung in die Computerlinguistik

Similar documents
Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Einführung in die Computerlinguistik

Pushdown Automata. Reading: Chapter 6

Introduction to Formal Languages, Automata and Computability p.1/42

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Mildly Context-Sensitive Grammar Formalisms: Embedded Push-Down Automata

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

5 Context-Free Languages

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer.

Foundations of Informatics: a Bridging Course

Pushdown Automata: Introduction (2)

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften

Pushdown Automata. We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata.

Computational Models - Lecture 4 1

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen

Theory of Computation - Module 3

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

CPS 220 Theory of Computation Pushdown Automata (PDA)

Push-down Automata = FA + Stack

Context-Free Grammars and Languages. Reading: Chapter 5

Computational Models - Lecture 4

Section 1 (closed-book) Total points 30

Chapter 1. Formal Definition and View. Lecture Formal Pushdown Automata on the 28th April 2009

Parsing. Unger s Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf 1 / 21

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Theory of Computation (IV) Yijia Chen Fudan University

Pushdown Automata (2015/11/23)

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Pushdown Automata (Pre Lecture)

Theory of Computation

Parsing. Left-Corner Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 17

CS Pushdown Automata

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES

CISC 4090 Theory of Computation

UNIT-VI PUSHDOWN AUTOMATA

Fundamentele Informatica II

Computational Models - Lecture 5 1

SCHEME FOR INTERNAL ASSESSMENT TEST 3

MA/CSSE 474 Theory of Computation

Lecture 11 Context-Free Languages

Lecture III CONTEXT FREE LANGUAGES

CFGs and PDAs are Equivalent. We provide algorithms to convert a CFG to a PDA and vice versa.

Introduction to Theory of Computing

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

Parsing Regular Expressions and Regular Grammars

Accept or reject. Stack

Computational Models - Lecture 4 1

AC68 FINITE AUTOMATA & FORMULA LANGUAGES DEC 2013

CISC 4090 Theory of Computation

C6.2 Push-Down Automata

Recitation 4: Converting Grammars to Chomsky Normal Form, Simulation of Context Free Languages with Push-Down Automata, Semirings

PUSHDOWN AUTOMATA (PDA)

CISC4090: Theory of Computation

CS5371 Theory of Computation. Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA)

Computability and Complexity

Context-Free Grammars and Languages

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

Context-Free and Noncontext-Free Languages

Pushdown Automata. Chapter 12

5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:

3.13. PUSHDOWN AUTOMATA Pushdown Automata

Theory of Computation (II) Yijia Chen Fudan University

CPS 220 Theory of Computation

Context Free Languages and Grammars

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Pushdown Automata. Pushdown Automata. Pushdown Automata. Pushdown Automata. Pushdown Automata. Pushdown Automata. The stack

Theory of Computation

TAFL 1 (ECS-403) Unit- IV. 4.1 Push Down Automata. 4.2 The Formal Definition of Pushdown Automata. EXAMPLES for PDA. 4.3 The languages of PDA

Context-free Grammars and Languages

Outline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA.

Functions on languages:

Context Free Languages: Decidability of a CFL

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

The Idea of a Pushdown Automaton

Homework. Context Free Languages. Announcements. Before We Start. Languages. Plan for today. Final Exam Dates have been announced.

CS311 Computational Structures More about PDAs & Context-Free Languages. Lecture 9. Andrew P. Black Andrew Tolmach

Theory of Computation Turing Machine and Pushdown Automata

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

MTH401A Theory of Computation. Lecture 17

What we have done so far

Parsing. Unger s Parser. Introduction (1) Unger s parser [Grune and Jacobs, 2008] is a CFG parser that is

INSTITUTE OF AERONAUTICAL ENGINEERING

Automata and Computability. Solutions to Exercises

CS Rewriting System - grammars, fa, and PDA

Pushdown Automata (PDA) The structure and the content of the lecture is based on

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

Peter Wood. Department of Computer Science and Information Systems Birkbeck, University of London Automata and Formal Languages

Everything You Always Wanted to Know About Parsing

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

(pp ) PDAs and CFGs (Sec. 2.2)

NODIA AND COMPANY. GATE SOLVED PAPER Computer Science Engineering Theory of Computation. Copyright By NODIA & COMPANY

Properties of Context-Free Languages

CA320 - Computability & Complexity

Transcription:

Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22

CFG (1) Example: Grammar G telescope : Productions: S NP VP NP D N N N PP VP VP PP VP V NP PP P NP N man N girl N telescope D the NP John NP Mary P with V saw 2 / 22

CFG (2) Sentences one can generate with this grammar: (1) John saw Mary (2) John saw the girl (3) the man with the telescope saw John (4) John saw the girl with the telescope... 3 / 22

CFG (3) A context-free grammar (CFG) is a tuple G = N, T, P, S such that N and T are disjoint alphabets, the nonterminals and terminals, S N is the start symbol, and P is a set of productions of the form A β with A N, β (N T). Any β (N T) with S β is called a sentential form. 4 / 22

CFG (4) Example: CFG G a,b with productions S ab S ba A a A as A baa B b B bs B abb generates the languge {w w {a, b} +, w a = w b } 5 / 22

CFG (5) A tree t is a parse tree for a CFG G = N, T, P, S iff each node in t is labeled with a x N T {ɛ}; the root label is S; if there is a node with label A that has n daughters labeled (from left to right) x 1,..., x n, then A x 1... x n P. if a node has label ɛ, it is a leaf and the unique daughter of its mother node. S α in G iff there is a parse tree for G with yield α. 6 / 22

CFG (6) G a,b : S ab S ba A a A as A baa B b B bs B abb S B S b a S A A a A A a A a b b b S B B S A a b b B b a a 7 / 22

CFG (7) Let G = N, T, P, S be a CFG. The string language L(G) of G is the set {w T S w} where for w, w (N T) : w w iff there is a A β P and there are v, u (N T) such that w = vau and w = vβu. is the reflexive transitive closure of. A derivation of a word w T is a sequence S α 1 w of derivation steps leading to w. The tree language is the set of all parse trees with root label S and all leaves labelled with a T {ε}. 8 / 22

CFG (8) For a single parse tree, there might be more than one corresponding derivation. A derivation is called a leftmost derivation iff, in each derivation step, a production is applied to the leftmost non-terminal of the already derived sentential form. rightmost derivation iff, in each derivation step, a production is applied to the rightmost non-terminal of the already derived sentential form. 9 / 22

CFG (9) For a single word w, there might be more than one parse tree: A CFG giving more than one parse tree for some word w is called ambiguous. Example: G telescope with w = John saw the man with the telescope G a,b with w = aabbab A CFL L is called inherently ambiguous if each CFG G with L = L(G) is ambiguous. Example: {a n b n c m d m n 1, m 1} {a n b m c m d n n 1, m 1} 10 / 22

PDA (1) A push-down automaton is a FSA with an additional stack. The moves of the automaton depend on the current state, the next input symbol, and the topmost stack symbol. Each move consists of changing state, popping the topmost symbol from the stack, and pushing a new sequence of symbols on the stack. 11 / 22

PDA (2) Example: Automaton that starts with q 1 and stack #, in q 1 : pushes A on the stack for an input symbol a, in q 1 : leaves stack unchanged and goes to q 2 for an input symbol c, in q 2 : pops an A from the stack for input symbol b, in q 2 : moves to q 3 if the top of the stack is # The automaton accepts all words that allow to end up in q 3. Language {a n cb n n 0} 12 / 22

PDA (3) In general, PDAs are non-deterministic, since a given state, input symbol and topmost stack symbol can allow for more than one move. In contrast to FSA, the deterministic version of the automaton is not equivalent to the non-deterministic one: There are languages that are accepted by a non-deterministic PDA but not by any deterministic PDA. CFLs are the languages accepted by (non-deterministic) PDAs. 13 / 22

PDA (4) A push-down automaton (PDA) M is a tuple Q, Σ, Γ, δ, q 0, Z 0, F with Q is a finite set of states. Σ is a finite set, the input alphabet. Γ is a finite set, the stack alphabet. q 0 Q is the initial state. Z 0 Γ is the initial stack symbol. F Q is the set of final states. δ : Q (Σ {ɛ}) Γ P fin (Q Γ ) is the transition function. (P fin (X) is the set of finite subsets of X). 14 / 22

PDA (5) An instantaneous description of a PDA is a triple (q, w, γ) with q Q is the current state of the automaton, w Σ is the remaining part of the input string, and γ Γ is the current stack. (q, aw, Zα) (q, w, βα) iff q, β δ(q, a, Z) for all q, q Q, a Σ {ɛ}, w Σ, Z Γ, α, β Γ. is the reflexive transitive closure of. 15 / 22

PDA (6) There are two alternatives for the definition of the language accepted by a PDA M = Q, Σ, Γ, δ, q 0, Z 0, F : The language accepted by M with a final state is L(M) := {w (q 0, w, Z 0 ) (q f, ɛ, γ) for a q f F and a γ Γ } The language accepted by M with an empty stack is N(M) := {w (q 0, w, Z 0 ) (q, ɛ, ɛ) for a q Q} The two modes of acceptance are equivalent, i.e., for each language L: there is a PDA M 1 with L = L(M 1 ) iff there is a PDA M 2 with L = N(M 2 ). 16 / 22

PDA (7) Example: PDA M for L(M) = {wcw R w {a, b} }. Q = {q 1, q 2, q 3 }, Σ = {a, b, c}, Γ = {#, A, B}. q 0 = q 1, Z 0 = #, F = {q 3 }. δ(q 1, a, #) = { q 1, A# } δ(q 1, a, A) = { q 1, AA } δ(q 1, a, B) = { q 1, AB } δ(q 1, c, #) = { q 2, # } δ(q 1, c, A) = { q 2, A } δ(q 1, c, B) = { q 2, B } δ(q 2, a, A) = { q 2, ɛ } δ(q 2, ɛ, #) = { q 3, # } δ(q 1, b, #) = { q 1, B# } δ(q 1, b, A) = { q 1, BA } δ(q 1, b, B) = { q 1, BB } δ(q 2, b, B) = { q 2, ɛ } 17 / 22

PDA (8) Example: PDA M for N(M) = {wcw R w {a, b} }. Q = {q 1, q 2 }, Σ = {a, b, c}, Γ = {#, A, B}. q 0 = q 1, Z 0 = #, F =. δ(q 1, a, #) = { q 1, A# } δ(q 1, a, A) = { q 1, AA } δ(q 1, a, B) = { q 1, AB } δ(q 1, c, #) = { q 2, # } δ(q 1, c, A) = { q 2, A } δ(q 1, c, B) = { q 2, B } δ(q 2, a, A) = { q 2, ɛ } δ(q 2, ɛ, #) = { q 2, ɛ } δ(q 1, b, #) = { q 1, B# } δ(q 1, b, A) = { q 1, BA } δ(q 1, b, B) = { q 1, BB } δ(q 2, b, B) = { q 2, ɛ } 18 / 22

PDA (9) A PDA M = Q, Σ, Γ, δ, q 0, Z 0, F is a deterministic PDA (DPDA) iff for all q Q, Z Γ, a Σ {ɛ}: δ(q, a, Z) 1, and for all q Q, Z Γ: if δ(q, ɛ, Z), then δ(q, a, Z) = for all a Σ. The preceding example was a DPDA. The class of languages accepted by DPDAs is smaller than the class accepted by (non-deterministic) PDAs. Example of a language that requires a non-determinstic PDA: {ww R w {a, b} }. 19 / 22

PDA and CFG (1) For each CFL L, there is a PDA M with L = N(M). Construction: Assume that ɛ / L. L = L(G) for a CFG G = N, T, P, S in Greibach-Normal Form (GNF). This means that all productions have the form A aγ with A N, a T, γ N. Every CFG can be transformed into an equivalent CFG in GNF. Equivalent PDA: M = {q}, T, N, δ, q, S, with q, γ δ(q, a, A) iff A aγ P. The automaton simulates leftmost derivations in G. 20 / 22

PDA and CFG (2) For each PDA M with L = N(M): L is a context-free language. Construction of equivalent CFG for given PDA M = Q, Σ, Γ, δ, q 0, Z 0, F : nonterminals: S and all [q 1, Z, q 2 ] with q 1, q 2 Q, Z Γ. productions: S [q 0, Z 0, q] for every q Q, and [q, A, q m+1 ] a[q 1, B 1, q 2 ],..., [q m, B m, q m+1 ] for q, q 1,..., q m+1 Q, a Σ {ɛ}, A, B 1,..., B m Γ such that q 1, B 1... B m δ(q, a, A) [q, A, q 1 ] a if q 1, ɛ δ(q, a, A). [q 1, A, q 2 ] w iff (q 1, w, A) (q 2, ɛ, ɛ). 21 / 22

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison Wesley. 22 / 22