Parsing. Unger's Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf. 1 / 21


Table of contents

1 Introduction
2 The Parser
3 An Example
4 Optimizations
5 Conclusion

2 / 21

Introduction (1)

Unger's parser (Grune and Jacobs, 2008) is a CFG parser that is

a top-down parser: we start with S and subsequently replace lefthand sides of productions with righthand sides;

a non-directional parser: the expansion of non-terminals (with appropriate righthand sides) is not ordered; therefore we need to guess the yields of all non-terminals in a righthand side at once.

3 / 21

Introduction (2)

G = ⟨N, T, P, S⟩, N = {S, NP, VP, PP, V, ...}, T = {Mary, man, telescope, ...}, productions: S → NP VP, VP → VP PP, VP → V NP, NP → Mary, ...

Input: Mary sees the man with the telescope

1. S ↦ Mary sees the man with the telescope
2. NP ↦ Mary (S → NP VP, from 1.)
3. VP ↦ sees the man with the telescope
4. NP ↦ Mary sees (S → NP VP, from 1.)
5. VP ↦ the man with the telescope
...
14. Mary ↦ Mary (NP → Mary, from 2.)
15. VP ↦ sees (VP → VP PP, from 3.)
16. PP ↦ the man with the telescope
17. VP ↦ sees the (VP → VP PP, from 3.)
18. PP ↦ man with the telescope
...

4 / 21
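The guesses above amount to enumerating all ways of cutting a string into contiguous parts, one part per righthand-side symbol. As a sketch (not part of the original slides; names are my own), such partitions can be enumerated by choosing cut positions:

```python
from itertools import combinations

def partitions(w, k):
    """Yield all ways to split the sequence w into k non-empty
    contiguous parts (one part per righthand-side symbol)."""
    n = len(w)
    # choose k-1 cut positions among the n-1 gaps between symbols
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield [w[bounds[i]:bounds[i + 1]] for i in range(k)]
```

For instance, `list(partitions("abc", 2))` gives the two splits `["a", "bc"]` and `["ab", "c"]`; for a 34-word sentence and a two-symbol righthand side such as NP VP, there are exactly 33 splits.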

Introduction (3)

Parsing strategy: The parser takes an X ∈ N ∪ T and a substring w of the input. Initially, this is S and the entire input.

If X and the remaining substring are equal, we can stop (success for X and w).

Otherwise, X must be a non-terminal that can be further expanded. We then choose an X-production and partition w into further substrings that are paired with the righthand side elements of the production. The parser continues recursively.

5 / 21

The parser (1)

Assume a CFG without ε-productions and without loops A ⇒⁺ A.

function unger(w, X):
    out := false
    if w = X then out := true
    else
        for all X → X₁ … Xₖ:
            for all x₁, …, xₖ ∈ T⁺ with w = x₁ … xₖ:
                if unger(x₁, X₁) ∧ … ∧ unger(xₖ, Xₖ) then out := true
    return out

The following holds: unger(w, X) = true iff X ⇒* w (for X ∈ N ∪ T, w ∈ T⁺).

6 / 21
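A direct transcription of this pseudocode into Python might look as follows (a sketch under assumptions: the grammar is a dict mapping each non-terminal to its list of righthand sides, with terminals represented as single-symbol strings):

```python
from itertools import combinations

def unger(w, X, grammar):
    """Unger recognizer for an epsilon-free CFG without loops A =>+ A.
    Returns True iff X derives w."""
    if w == X:                       # X is a terminal matching w directly
        return True
    out = False
    for rhs in grammar.get(X, []):   # X -> X_1 ... X_k
        k, n = len(rhs), len(w)
        # guess all splits w = x_1 ... x_k into non-empty parts
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0,) + cuts + (n,)
            parts = [w[bounds[i]:bounds[i + 1]] for i in range(k)]
            if all(unger(x, Y, grammar) for x, Y in zip(parts, rhs)):
                out = True
    return out
```

For example, with the toy grammar S → aSB | c, B → bb encoded as `{"S": [("a", "S", "B"), ("c",)], "B": [("b", "b")]}`, the call `unger("acbb", "S", ...)` succeeds.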

The parser (2)

Extension to deal with ε-productions and loops:

Add a list of preceding calls.

Pass this list when calling the parser again.

If the new call is already on the list, stop and return false.

Initial call: unger(w, S, ∅)

7 / 21

The parser (3)

function unger(w, X, L):
    out := false
    if ⟨X, w⟩ ∈ L then return out
    else if w = X or (w = ε and X → ε ∈ P) then out := true
    else
        for all X → X₁ … Xₖ ∈ P:
            for all x₁, …, xₖ ∈ T* with w = x₁ … xₖ:
                if unger(x₁, X₁, L ∪ {⟨X, w⟩}) ∧ … ∧ unger(xₖ, Xₖ, L ∪ {⟨X, w⟩}) then out := true
    return out

8 / 21
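In Python, the extended recognizer could be sketched as follows (again an assumption-laden sketch: ε-productions are encoded as empty righthand-side tuples, parts may now be empty, and L is passed as a frozen set of ⟨X, w⟩ pairs):

```python
from itertools import combinations_with_replacement

def splits(w, k):
    """All ways to write w as a concatenation of k possibly empty parts."""
    n = len(w)
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0,) + cuts + (n,)
        yield [w[bounds[i]:bounds[i + 1]] for i in range(k)]

def unger(w, X, grammar, L=frozenset()):
    """Unger recognizer handling epsilon-productions and loops:
    a call <X, w> already on the current path L is cut off with False."""
    if (X, w) in L:
        return False
    if w == X or (w == "" and () in grammar.get(X, [])):
        return True
    out = False
    L = L | {(X, w)}
    for rhs in grammar.get(X, []):
        if not rhs:                  # epsilon-production, handled above
            continue
        for parts in splits(w, len(rhs)):
            if all(unger(x, Y, grammar, L) for x, Y in zip(parts, rhs)):
                out = True
    return out
```

The cutoff via L is what guarantees termination, e.g. for a looping unit production S → S.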

The parser (4)

So far, we have a recognizer, not a parser. To turn this into a parser, every call unger(...) must return a (set of) parse trees. These can be obtained from

1 the successful productions X → X₁ … Xₖ, and
2 the parse trees returned by the calls unger(xᵢ, Xᵢ).

Note, however, that there might be a large number of parse trees, since in each call there might be more than one successful production. We will come back to the compact representation of several analyses in a parse forest.

9 / 21
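As a sketch of this step (function name and tree encoding are my own, not from the slides), the boolean recognizer can return lists of trees instead, one tree per combination of subtree choices for each successful partition:

```python
from itertools import combinations, product

def partitions(w, k):
    """All splits of w into k non-empty contiguous parts."""
    n = len(w)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield [w[bounds[i]:bounds[i + 1]] for i in range(k)]

def unger_trees(w, X, grammar):
    """Return all parse trees for X =>* w as nested tuples (X, child, ...);
    terminals become leaf strings. Epsilon-free grammar assumed."""
    if w == X:
        return [X]                          # terminal leaf
    trees = []
    for rhs in grammar.get(X, []):
        for parts in partitions(w, len(rhs)):
            choices = [unger_trees(x, Y, grammar) for x, Y in zip(parts, rhs)]
            if all(choices):
                # one tree per combination of subtree choices
                trees.extend((X,) + children for children in product(*choices))
    return trees
```

The `product` over subtree choices is exactly where the number of trees can blow up for ambiguous grammars.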

An example (1)

Assume a CFG without ε-productions and a production S → NP VP.

Input sentence w with |w| = 34:

Mr. Sarkozy's pension reform, which only affects about 500,000 public sector employees, is the opening salvo in a series of measures aimed more broadly at rolling back France's system of labor protections. (New York Times)

10 / 21

An example (2)

Partitions according to Unger's parser for S → NP VP:

     NP                      | VP
 1.  Mr.                     | Sarkozy's ... protections
 2.  Mr. Sarkozy's           | pension ... protections
 3.  Mr. Sarkozy's pension   | reform ... protections
 ...
33.  Mr. ... labor           | protections

|w| = 34, consequently we have 33 different partitions.

11 / 21

An example (3)

Consider the following partition for S → NP VP:

NP: Mr. Sarkozy's pension reform, which ... employees,
VP: is ... protections

For NP → NP S, there are 12 partitions of the NP part.

The partition above is just one partition for one production!

In the worst case, parsing is exponential in the length n of the input string!

12 / 21

A note about time complexity

We say that an algorithm is of

polynomial time complexity if there are constants c and k such that the parsing of a string of length n takes an amount of time ≤ c · nᵏ. Notation: O(nᵏ)

exponential time complexity if there are constants c and k such that the parsing of a string of length n takes an amount of time ≤ c · kⁿ. Notation: O(kⁿ)

13 / 21

Optimizations (1)

As an additional filter, we can constrain the set of partitions that we investigate:

Check on occurrences of terminals in righthand sides.

Check on the minimal length of the terminal string derived by a non-terminal.

Check on obligatory terminals (pre-terminals) in strings derived by non-terminals, e.g., each NP contains an N, each VP contains a V, ...

Check on the first terminals derivable from a non-terminal.

14 / 21
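The second of these checks needs the minimal yield length of every symbol. A fixed-point sketch (function name and representation are assumptions, not from the slides):

```python
def min_yield_lengths(grammar, terminals):
    """Minimal length of a terminal string derivable from each symbol,
    computed as a least fixed point (epsilon-free grammar assumed, so
    every symbol derives at least one terminal)."""
    mlen = {t: 1 for t in terminals}
    for A in grammar:
        mlen[A] = float("inf")       # optimistic start for non-terminals
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                total = sum(mlen[Y] for Y in rhs)
                if total < mlen[A]:
                    mlen[A] = total
                    changed = True
    return mlen
```

During partitioning, any candidate span for a symbol Y that is shorter than `mlen[Y]` can then be discarded immediately.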

Optimizations (2)

Furthermore, we can use tabulation (dynamic programming) in order to avoid computing the same thing several times:

1 Whenever unger(w, X, L) yields a result res, we store ⟨X, w, res⟩ in our table of partial parsing results.
2 In every call unger(w, X, L), we first check whether we have already computed a result ⟨X, w, res⟩ and, if so, we stop immediately and return res.

15 / 21
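This tabulated variant can be sketched in Python with a dictionary keyed by ⟨X, w⟩ pairs (a sketch assuming an ε-free grammar without unit-production cycles, so the list of preceding calls can be omitted):

```python
from itertools import combinations

def unger_tab(w, X, grammar, table=None):
    """Unger recognizer with tabulation: the result for each <X, w>
    pair is computed at most once and then reused from the table."""
    if table is None:
        table = {}
    key = (X, w)
    if key in table:
        return table[key]
    out = (w == X)
    if not out:
        for rhs in grammar.get(X, []):
            k, n = len(rhs), len(w)
            for cuts in combinations(range(1, n), k - 1):
                bounds = (0,) + cuts + (n,)
                parts = [w[bounds[i]:bounds[i + 1]] for i in range(k)]
                if all(unger_tab(x, Y, grammar, table)
                       for x, Y in zip(parts, rhs)):
                    out = True
    table[key] = out
    return out
```

For a fixed grammar there are only O(n²) distinct substrings per symbol, and each is processed once, which is what brings the overall time down to polynomial in the input length.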

Optimizations (3)

Results ⟨X, w, res⟩ can be stored in a three-dimensional table (chart) C:

Let k = |N| + |T| and assume each non-terminal in N and terminal in T has a unique index ≤ k. Furthermore, assume |w| = n with w = w₁ … wₙ. Then you can use a k × n × n table, the chart!

1 Whenever unger(wᵢ … wⱼ, X, L) yields a result res and m is the index of X, then C(m, i, j) = res.
2 In every call unger(wᵢ … wⱼ, X, L), we first check whether we already have a value in C(m, i, j) and, if so, we stop and return C(m, i, j).

Advantage: access to C(m, i, j) in constant time. Disadvantage: storing the chart needs more memory.

Assumption: the grammar is ε-free; otherwise we need a k × (n + 1) × (n + 1) chart.

16 / 21

Optimizations (4)

Example: G = ⟨N, T, P, S⟩ with N = {S, B}, T = {a, b, c} and productions S → aSB | c, B → bb.

Input word w = acbb. We assume that, when guessing the span of a righthand side element, we take into account that

1 each terminal spans only a corresponding single terminal;
2 the span of an S has to start with an a or a c;
3 the span of a B has to start with a b;
4 the span of each X ∈ N ∪ T contains at least one symbol (no ε-productions).

17 / 21

Optimizations (5)

Example continued. Chart obtained for w = acbb (entry at row j, column i records the result for the span wᵢ … wⱼ):

 j
 4 | ⟨S, t⟩ |                | ⟨B, t⟩ | ⟨b, t⟩, ⟨B, f⟩
 3 |        | ⟨S, f⟩         | ⟨b, t⟩ |
 2 |        | ⟨S, t⟩, ⟨c, t⟩ |        |
 1 | ⟨a, t⟩ |                |        |
   | i = 1  | i = 2          | i = 3  | i = 4

Calls performed:
S ⇒* acbb? t    a ⇒* a? t    S ⇒* c? t    c ⇒* c? t    B ⇒* bb? t
b ⇒* b? t       b ⇒* b? t    S ⇒* cb? f   B ⇒* b? f

(Productions: S → aSB | c, B → bb)

18 / 21

Optimizations (6)

In addition, we can tabulate entire productions with the spans of their different symbols. This gives us a compact representation of the parse forest!

In every call unger(wᵢ … wⱼ, X), we first check whether we already have a value in C(m, i, j) and, if so, we stop and return C(m, i, j). Otherwise, we compute all possible first steps of derivations X ⇒* w: for every production X → X₁ … Xₖ and all w₁, …, wₖ such that the recursive Unger calls yield true, we add ⟨X, w⟩ → ⟨X₁, w₁⟩ … ⟨Xₖ, wₖ⟩ with the indices of the spans to the list of productions. If at least one such production has been found, we return true, otherwise false.

Example on handout.

19 / 21

Conclusion

Unger's parser

is a non-directional top-down parser;

is highly non-deterministic, because during parsing the yields of all non-terminals in righthand sides must be guessed;

is in general of exponential (time) complexity;

is of polynomial time complexity if tabulation is applied.

20 / 21

Grune, D. and Jacobs, C. (2008). Parsing Techniques: A Practical Guide. Monographs in Computer Science. Springer, second edition. 21 / 21