A* Search

Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by 3 grid with nine locations, so that one location is left empty. We can move by sliding a tile adjacent to the empty location into the empty location. The goal is to take a given configuration and, through a series of moves, arrange the tiles in order from left to right and top to bottom. The set of configurations of the tiles forms a graph where two nodes (configurations) are connected if one node can be converted to the other by a single move. We are interested in computing the shortest path from a given start node to a given goal node.

1 Dijkstra Shortest Path

We are given a graph G where each edge is assigned a non-negative weight, and we are also given a start node s and a goal node g. We want the path of least weight from s to g.

Procedure Dijkstra Shortest Path

1. Initialize Q to be the set containing the single pair ⟨s, 0⟩.
2. S ← ∅
3. while Q is not empty and S does not contain an assignment to g:
4.    Remove a pair ⟨n, w⟩ from Q with minimum weight w (over all elements of Q).
5.    if S does not already contain a weight for n, i.e., if S does not contain any pair of the form ⟨n, w′⟩, then:
6.       add ⟨n, w⟩ to S.
7.       For each edge ⟨n, m⟩ with weight w′ in the graph G, add the pair ⟨m, w + w′⟩ to Q.
8. If S contains an assignment to g then terminate with success (a path has been found), else terminate with failure (there is no path).

Note that if removing ⟨n, w⟩ from Q results in adding ⟨m, w′⟩ then, because edge weights are non-negative, we have w′ ≥ w. This implies that the weights removed from Q monotonically increase, which in turn implies that when a pair ⟨n, w⟩ is added to S, w is the weight of the shortest path to n.
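
As a concrete illustration, here is a minimal Python sketch of the procedure above. The graph representation (a dictionary mapping each node to a list of (neighbor, weight) pairs) and the name dijkstra are assumptions made for the example, not part of the notes; Q is kept as a binary heap and S as a dictionary of settled weights.

    import heapq

    def dijkstra(graph, s, g):
        """Least-weight path cost from s to g; graph[n] = list of (m, w) pairs."""
        Q = [(0, s)]          # the agenda: pairs <n, w> ordered by weight w
        S = {}                # settled nodes: n -> weight of shortest path to n
        while Q and g not in S:
            w, n = heapq.heappop(Q)      # minimum-weight pair in Q (step 4)
            if n not in S:               # first removal settles n (steps 5-6)
                S[n] = w
                for m, w2 in graph.get(n, []):
                    heapq.heappush(Q, (w + w2, m))   # step 7
        return S.get(g)       # None means there is no path

Duplicate entries for a node may sit in the heap; the "if n not in S" test discards all but the first (lightest) one, mirroring step 5.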

2 Admissible Heuristics

Let h be a function from the nodes of G to non-negative real numbers. The function h is admissible if, for any node n, the distance from n to g is at least h(n). The function h is called monotone if for any edge ⟨n, m⟩ with weight w we have

h(m) ≥ h(n) − w.

Suppose h is monotone and h(g) = 0. Consider a path from g to n. As we cross an edge moving from g to n along this path, the increase in h can be no larger than the weight of the edge. So h(n) cannot be larger than the weight of the path. This gives us that if h(g) = 0 and h is monotone then h is admissible. Manhattan distance in the eight puzzle (the sum over tiles of the number of rows plus the number of columns that the tile must be moved) is both monotone and admissible.

3 A*

The following differs from Dijkstra shortest path only in the priority used in step 4.

Procedure A*

1. Initialize Q to be the set containing the single pair ⟨s, 0⟩.
2. S ← ∅
3. while Q is not empty and S does not contain an assignment to g:
4.    Remove a pair ⟨n, w⟩ from Q minimizing w + h(n) (over all elements of Q).
5.    if S does not already contain a weight for n, i.e., if S does not contain any pair of the form ⟨n, w′⟩, then:
6.       add ⟨n, w⟩ to S.
7.       For each edge ⟨n, m⟩ with weight w′ in the graph G, add the pair ⟨m, w + w′⟩ to Q.
8. If S contains an assignment to g then terminate with success (a path has been found), else terminate with failure (there is no path).

If h is monotone then the sum w + h(n) of the items removed from the queue monotonically increases. This implies that when we remove ⟨n, w⟩ from the queue, the weight w is the least possible weight for n. Note that monotonicity alone gives correctness of the algorithm; we do not need that h(g) = 0.

By taking the distance to goal into account, A* can be vastly more efficient than Dijkstra shortest path.
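
Relative to the Dijkstra sketch above, only the heap key changes: the heap is ordered by w + h(n) while S still stores the plain weight w. A minimal sketch, again under the assumed adjacency-map representation, with h a monotone heuristic function:

    import heapq

    def a_star(graph, s, g, h):
        """Least-weight path cost from s to g, guided by heuristic h."""
        Q = [(h(s), 0, s)]    # triples (priority w + h(n), weight w, node n)
        S = {}
        while Q and g not in S:
            _, w, n = heapq.heappop(Q)   # minimize w + h(n) over Q (step 4)
            if n not in S:
                S[n] = w
                for m, w2 in graph.get(n, []):
                    heapq.heappush(Q, (w + w2 + h(m), w + w2, m))
        return S.get(g)

For the eight puzzle, h would map a configuration to its Manhattan distance from the goal configuration; with h identically zero this is exactly the Dijkstra sketch.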

4 A* Parsing

A CFG for English might generate the phrase "that grand old house on the hill" using the following grammar.

NP → Det N
Det → that
Det → the
N → AP N
AP → grand
AP → old
N → N PP
N → house
PP → P NP
P → on
N → hill

But the parsing accuracy of PCFGs can be improved by lexicalizing the nonterminals. This means that a nonterminal is specified by a pair of a syntactic category and a particular word. For example, NP_house is a nonterminal which can generate a noun phrase whose head word is "house". The concept of a head word is not formally defined, but intuitively the head word is the most important word in a phrase. In a lexicalized PCFG the phrase "that grand old house on the hill" might be generated using the following productions.

NP_house → Det_that N_house
Det_that → that
N_house → AP_grand N_house
AP_grand → grand
N_house → AP_old N_house
AP_old → old
N_house → N_house PP_on
N_house → house

Parsing a lexicalized PCFG is challenging. Even if we restrict the head words to the words that occur in the sentence, the average sentence length in the New York Times is 25 words. Combining that with 20 syntactic categories gives 500 nonterminals. The Viterbi algorithm considers all possible pairs of adjacent phrases; this is approximately 25³ times 500², about 4 billion phrase pairs. Many parsers use even more information in the nonterminals, making complete Viterbi parsing impossible.

5 Probabilities to Weights

Consider a probabilistic context-free grammar (PCFG) in Chomsky normal form. We have productions of the form X → Y Z and X → a, and for each nonterminal X we have

∑_β P(X → β) = 1.

We define the weight of a production as follows.

w(X → β) = log₂ (1 / P(X → β))

We can think of w(X → β) as a number of bits. The weight of a parse tree is the sum of the weights of the productions used in that parse tree. The weight of a tree can be thought of as the number of bits it takes to name (or code for) that tree. We are interested in finding the Viterbi parse tree: the most probable tree or, equivalently, the lightest-weight tree.
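
Since w(X → β) = −log₂ P(X → β), converting a probability table to a weight table is one line per rule. A minimal sketch, assuming (this representation is an assumption for the example) that rules are stored in a dictionary mapping each production to its probability:

    import math

    def to_weights(prob):
        """Map each production to its weight in bits: w = -log2 P."""
        return {rule: -math.log2(p) for rule, p in prob.items()}

    # Example: two ways to expand NP; probabilities for a nonterminal sum to 1.
    prob = {("NP", ("Det", "N")): 0.75, ("NP", ("N",)): 0.25}
    weights = to_weights(prob)   # weights of about 0.415 and 2.0 bits

Because the logarithm is monotone and turns products into sums, the most probable tree under the probabilities is exactly the lightest tree under these weights.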

6 Knuth Lightest Derivation

The efficiency of parsing can be improved by using algorithms similar to Dijkstra shortest path and A*. Instead of nodes we work with phrases, where a phrase is defined to be a triple ⟨X, i, j⟩ representing the generation of the string a_i ... a_{j−1} from the nonterminal X. We consider an input string a_1 ... a_n. We are seeking a derivation of the goal phrase ⟨S, 1, n+1⟩. In the following algorithm the set S is often called the chart, and algorithms of this form are often called chart parsers. The essence of a chart parser is a table (the set S) which stores dynamic programming values. Please compare the following algorithm to Dijkstra Shortest Path.

Procedure Knuth Lightest Derivation for Parsing

1. Initialize Q to be the set containing all pairs ⟨⟨X, i, i+1⟩, w⟩ where the grammar contains X → a with weight w and the ith word in the input string is a.
2. S ← ∅
3. while Q is not empty and S does not contain an assignment to ⟨S, 1, n+1⟩:
4.    Remove a pair ⟨⟨Y, i, j⟩, w⟩ from Q minimizing w (over all elements of Q).
5.    if S does not already contain a weight for ⟨Y, i, j⟩ then:
6.       add ⟨⟨Y, i, j⟩, w⟩ to S.
7.       For each pair of the form ⟨⟨Z, j, k⟩, w′⟩ in S where the grammar contains X → Y Z with weight w″, add the pair ⟨⟨X, i, k⟩, w + w′ + w″⟩ to Q.
8.       For each pair of the form ⟨⟨Z, k, i⟩, w′⟩ in S where the grammar contains X → Z Y with weight w″, add the pair ⟨⟨X, k, j⟩, w + w′ + w″⟩ to Q.
9. If S contains an assignment to ⟨S, 1, n+1⟩ then terminate with success (a parse has been found), else terminate with failure (there is no parse).

Note that if removing ⟨⟨Y, i, j⟩, w⟩ from Q results in adding ⟨⟨X, u, v⟩, w′⟩ then, because the weights of grammar rules are non-negative, we have w′ ≥ w. This implies that the weights removed from Q monotonically increase, which in turn implies that when a pair ⟨⟨X, u, v⟩, w⟩ is added to S, w is the weight of the lightest derivation of ⟨X, u, v⟩. Note that if the given word string has a light-weight parse tree then heavy-weight phrases are never considered.
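
Here is a minimal Python sketch of the procedure. The grammar representation (a lexical table mapping each word to its unary rules, and a list of weighted binary rules) and the name knuth_parse are assumptions for the example. For clarity the search for adjacent settled phrases scans all endpoints; a real implementation would index the chart S by endpoint.

    import heapq
    from collections import defaultdict

    def knuth_parse(words, lexical, binary, start="S"):
        """Weight of the lightest parse, or None if there is no parse.
        lexical: word -> list of (X, w) pairs, one per rule X -> word
        binary:  list of (X, Y, Z, w) tuples, one per rule X -> Y Z
        A phrase <X, i, j> is the tuple (X, i, j), 1-based as in the text."""
        n = len(words)
        goal = (start, 1, n + 1)
        by_left, by_right = defaultdict(list), defaultdict(list)
        for X, Y, Z, w in binary:
            by_left[Y].append((X, Z, w))    # Y is the left child of X
            by_right[Z].append((X, Y, w))   # Z is the right child of X
        Q = [(w, (X, i, i + 1))             # step 1: lexical phrases
             for i, a in enumerate(words, start=1) for X, w in lexical.get(a, [])]
        heapq.heapify(Q)
        S = {}                              # the chart: phrase -> settled weight
        while Q and goal not in S:
            w, (Y, i, j) = heapq.heappop(Q)     # step 4: minimum weight in Q
            if (Y, i, j) in S:
                continue
            S[(Y, i, j)] = w                    # steps 5-6
            for X, Z, w2 in by_left[Y]:         # step 7: rules X -> Y Z
                for k in range(j + 1, n + 2):
                    if (Z, j, k) in S:
                        heapq.heappush(Q, (w + S[(Z, j, k)] + w2, (X, i, k)))
            for X, Z, w2 in by_right[Y]:        # step 8: rules X -> Z Y
                for k in range(1, i):
                    if (Z, k, i) in S:
                        heapq.heappush(Q, (w + S[(Z, k, i)] + w2, (X, k, j)))
        return S.get(goal)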

7 A* Parsing

A context for a phrase ⟨X, i, j⟩ (relative to the input string a_1 ... a_n) is a parse tree whose leaf nodes are labeled with a_1 ... a_{i−1} X a_j ... a_n. The weight of a context is the sum of the weights of the productions appearing in the parse tree. A function h assigning a real number h(⟨X, i, j⟩) to each phrase will be called an admissible heuristic for parsing if h(⟨X, i, j⟩) is a lower bound on the weight of any context for ⟨X, i, j⟩. For a heuristic function h we define the following A* algorithm, which differs from the preceding algorithm only in step 4.

Procedure A* Parsing

1. Initialize Q to be the set containing all pairs ⟨⟨X, i, i+1⟩, w⟩ where the grammar contains X → a with weight w and the ith word in the input string is a.
2. S ← ∅
3. while Q is not empty and S does not contain an assignment to ⟨S, 1, n+1⟩:
4.    Remove a pair ⟨⟨Y, i, j⟩, w⟩ from Q minimizing w + h(⟨Y, i, j⟩).
5.    if S does not already contain a weight for ⟨Y, i, j⟩ then:
6.       add ⟨⟨Y, i, j⟩, w⟩ to S.
7.       For each pair of the form ⟨⟨Z, j, k⟩, w′⟩ in S where the grammar contains X → Y Z with weight w″, add the pair ⟨⟨X, i, k⟩, w + w′ + w″⟩ to Q.
8.       For each pair of the form ⟨⟨Z, k, i⟩, w′⟩ in S where the grammar contains X → Z Y with weight w″, add the pair ⟨⟨X, k, j⟩, w + w′ + w″⟩ to Q.
9. If S contains an assignment to ⟨S, 1, n+1⟩ then terminate with success (a parse has been found), else terminate with failure (there is no parse).
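
As in the shortest-path case, only the priority in step 4 changes relative to the knuth_parse sketch above: the heap is keyed by w + h(phrase) while the chart S still records the plain weight w. A minimal sketch under the same assumed grammar representation; with h returning 0 everywhere it reduces exactly to Knuth lightest derivation.

    import heapq
    from collections import defaultdict

    def astar_parse(words, lexical, binary, h, start="S"):
        """Like knuth_parse, but Q is ordered by w + h(phrase) (step 4)."""
        n = len(words)
        goal = (start, 1, n + 1)
        by_left, by_right = defaultdict(list), defaultdict(list)
        for X, Y, Z, w in binary:
            by_left[Y].append((X, Z, w))
            by_right[Z].append((X, Y, w))
        Q = [(w + h((X, i, i + 1)), w, (X, i, i + 1))
             for i, a in enumerate(words, start=1) for X, w in lexical.get(a, [])]
        heapq.heapify(Q)
        S = {}
        while Q and goal not in S:
            _, w, (Y, i, j) = heapq.heappop(Q)   # minimize w + h(<Y, i, j>)
            if (Y, i, j) in S:
                continue
            S[(Y, i, j)] = w
            for X, Z, w2 in by_left[Y]:          # rules X -> Y Z
                for k in range(j + 1, n + 2):
                    if (Z, j, k) in S:
                        w3 = w + S[(Z, j, k)] + w2
                        heapq.heappush(Q, (w3 + h((X, i, k)), w3, (X, i, k)))
            for X, Z, w2 in by_right[Y]:         # rules X -> Z Y
                for k in range(1, i):
                    if (Z, k, i) in S:
                        w3 = w + S[(Z, k, i)] + w2
                        heapq.heappush(Q, (w3 + h((X, k, j)), w3, (X, k, j)))
        return S.get(goal)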

8 Problem

Suppose you are given a curve model consisting of a sequence of positions x_1, x_2, ..., x_n, each of which is a position in the plane (a pair of real numbers). We are interested in finding curves of this shape in a given image. Consider a sequence of points in the image y_1, ..., y_n. The score of placing the model points at these image positions is given as follows.

cost = ∑_{i=1}^{n−2} Shape-Cost(x_i, x_{i+1}, x_{i+2}, y_i, y_{i+1}, y_{i+2}) + ∑_{i=1}^{n−1} Data-Cost(y_i, y_{i+1})

Give an algorithm for computing the cost of the least-cost sequence y_1, ..., y_n. You should first give inference rules for computing, for a given i and image positions y_i and y_{i+1}, the cost cost(i, y_i, y_{i+1}) of the least-cost sequence y_1, ..., y_i, y_{i+1}. You should be able to derive cost(i + 1, y_{i+1}, y_{i+2}) given all values of cost(i, y_i, y_{i+1}). Give the Dijkstra lightest derivation algorithm for this problem.
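
As a sanity check on the inference rules the problem asks for, here is one possible dynamic-programming sketch, not the only acceptable answer. The finite candidate set points and the names least_cost, shape_cost, and data_cost are assumptions for the illustration; shape_cost and data_cost stand for Shape-Cost and Data-Cost above.

    from itertools import product

    def least_cost(xs, points, shape_cost, data_cost):
        """Min over y_1..y_n drawn from `points` of the placement cost.
        After processing step i, cost[(y2, y3)] holds cost(i, y_i, y_{i+1})."""
        n = len(xs)
        # Base case: cost(1, y1, y2) = Data-Cost(y1, y2).
        cost = {(y1, y2): data_cost(y1, y2)
                for y1, y2 in product(points, repeat=2)}
        # Recurrence: cost(i+1, y2, y3) = min over y1 of
        #   cost(i, y1, y2)
        #   + Shape-Cost(x_i, x_{i+1}, x_{i+2}, y1, y2, y3)
        #   + Data-Cost(y2, y3)
        for i in range(n - 2):
            new = {}
            for y2, y3 in product(points, repeat=2):
                new[(y2, y3)] = min(
                    cost[(y1, y2)]
                    + shape_cost(xs[i], xs[i + 1], xs[i + 2], y1, y2, y3)
                    + data_cost(y2, y3)
                    for y1 in points)
            cost = new
        return min(cost.values())

The table is indexed by pairs of consecutive positions because the shape term couples three consecutive points, so the run time is O(n · |points|³); replacing the min with a priority queue over table entries gives the requested Dijkstra-style lightest derivation variant.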