Introduction to Semantic Parsing with CCG

Kilian Evang, Heinrich-Heine-Universität Düsseldorf, 2018-04-24

Table of contents

1 Introduction to CCG
    Categorial Grammar (CG)
    Combinatory Categorial Grammar (CCG)
2 Zettlemoyer and Collins (2005)
    Candidate lexicon generation
    Features, model, learning
    Results
3 References

Plain old context-free grammars

Three example derivations:

    [S [NP [N Mary]] [VP [V sings]]]
    [S [NP [N Mary]] [VP [V kicks] [NP [N John]]]]
    [S [NP [N Mary]] [VP [VP [V kicks] [NP [N John]]] [AdvP [Adv happily]]]]

Rules used:

    S → NP VP    NP → N       AdvP → Adv
    VP → V       VP → V NP    VP → VP AdvP

Categorial grammars

The same sentences, with lexical categories instead of grammar rules:

    Mary := NP      sings := S\NP      kicks := (S\NP)/NP
    John := NP      happily := (S\NP)\(S\NP)

    Mary(NP) sings(S\NP) ⇒ S
    Mary(NP) [kicks((S\NP)/NP) John(NP) ⇒ S\NP] ⇒ S
    Mary(NP) [[kicks John ⇒ S\NP] happily((S\NP)\(S\NP)) ⇒ S\NP] ⇒ S

Two general combination schemas replace the rules:

    X/Y  Y  ⇒  X
    Y  X\Y  ⇒  X

Notation

    Mary  sings
    NP    S\NP
    ----------- <0
         S

    Mary  kicks      John
    NP    (S\NP)/NP  NP
          -------------- >0
          S\NP
    -------------------- <0
    S

    Mary  kicks      John  happily
    NP    (S\NP)/NP  NP    (S\NP)\(S\NP)
          -------------- >0
          S\NP
          --------------------------- <0
          S\NP
    --------------------------------- <0
    S

    X/Y  Y  ⇒  X    (>0, forward application)
    Y  X\Y  ⇒  X    (<0, backward application)

Semantics

Lexicon entries pair a category with a lambda term:

    Mary := NP : mary
    kicks := (S\NP)/NP : λx.λy.kicks(y, x)
    John := NP : john

    Mary       kicks                    John
    NP : mary  (S\NP)/NP :              NP : john
               λx.λy.kicks(y, x)
               --------------------------------- >0
               S\NP : λy.kicks(y, john)
    --------------------------------------------- <0
    S : kicks(mary, john)

Application passes the semantics along:

    X/Y : f   Y : g   ⇒   X : f(g)    (>0, forward application)
    Y : g   X\Y : f   ⇒   X : f(g)    (<0, backward application)
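To make the application combinators concrete, here is a minimal Python sketch (an illustration, not ZC05's implementation); the Cat encoding and the string-building semantics are assumptions for this example:

    # A minimal sketch of categories with lambda-term semantics; Python
    # closures stand in for the lambda terms, strings for logical constants.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Cat:
        label: str = None      # atomic categories, e.g. NP, S
        result: "Cat" = None   # functional categories: result/arg or result\arg
        arg: "Cat" = None
        slash: str = None      # "/" or "\\"

    NP = Cat(label="NP")
    S = Cat(label="S")
    S_NP = Cat(result=S, arg=NP, slash="\\")   # S\NP
    TV = Cat(result=S_NP, arg=NP, slash="/")   # (S\NP)/NP

    # Lexicon: word -> (category, semantics)
    mary = (NP, "mary")
    john = (NP, "john")
    kicks = (TV, lambda x: lambda y: f"kicks({y},{x})")

    def forward_apply(left, right):
        """X/Y : f  +  Y : g  =>  X : f(g)   (>0)"""
        (cl, f), (cr, g) = left, right
        assert cl.slash == "/" and cl.arg == cr
        return (cl.result, f(g))

    def backward_apply(left, right):
        """Y : g  +  X\\Y : f  =>  X : f(g)   (<0)"""
        (cl, g), (cr, f) = left, right
        assert cr.slash == "\\" and cr.arg == cl
        return (cr.result, f(g))

    vp = forward_apply(kicks, john)   # S\NP : λy.kicks(y,john)
    s = backward_apply(mary, vp)      # S : kicks(mary,john)
    print(s[1])                       # kicks(mary,john)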

Coordination of Functors: "Mary sings and dances"

    Mary := NP : mary
    sings := S\NP : λx.sings(x)
    and := ((S\NP)\(S\NP))/(S\NP) : λf.λg.λx.g(x) ∧ f(x)
    dances := S\NP : λx.dances(x)

    and dances               ⇒ (>0)  (S\NP)\(S\NP) : λg.λx.g(x) ∧ dances(x)
    sings [and dances]       ⇒ (<0)  S\NP : λx.sings(x) ∧ dances(x)
    Mary [sings and dances]  ⇒ (<0)  S : sings(mary) ∧ dances(mary)

Coordination of Arguments: "Mary and John sing"

    Mary := NP : mary
    and := ((S/(S\NP))\(S/(S\NP)))/(S/(S\NP)) : λf.λg.λh.g(h) ∧ f(h)
    John := NP : john
    sing := S\NP : λx.sing(x)

    Mary                  ⇒ (>T)  S/(S\NP) : λf.f(mary)
    John                  ⇒ (>T)  S/(S\NP) : λf.f(john)
    and [John]            ⇒ (>0)  (S/(S\NP))\(S/(S\NP)) : λg.λh.g(h) ∧ h(john)
    [Mary] [and John]     ⇒ (<0)  S/(S\NP) : λh.h(mary) ∧ h(john)
    [Mary and John] sing  ⇒ (>0)  S : sing(mary) ∧ sing(john)

    Y : g  ⇒  T/(T\Y) : λf.f(g)    (>T, forward type raising)
    Y : g  ⇒  T\(T/Y) : λf.f(g)    (<T, backward type raising)

How CGs differ from CFGs
1. informative labels (categories)
2. general rules (combinators)
3. transparent syntax-semantics interface (syntactic dependency = function application)

Combinatory Categorial Grammar
- a type of CG developed by Mark Steedman [1]
- additional combinators
- elegant treatment of incomplete constituents:
    - object relative clauses
    - non-canonical coordination
    - ...

Object relative clauses: "the car that Sue bought"

    the := NP/N    car := N    that := (N\N)/(S/NP)
    Sue := NP      bought := (S\NP)/NP

    Sue                        ⇒ (>T)  S/(S\NP)
    [Sue] bought               ⇒ (>1)  S/NP
    that [Sue bought]          ⇒ (>0)  N\N
    car [that Sue bought]      ⇒ (<0)  N
    the [car that Sue bought]  ⇒ (>0)  NP

    X/Y  Y/Z  ⇒  X/Z    (>1, forward harmonic composition)
    Y\Z  X\Y  ⇒  X\Z    (<1, backward harmonic composition)
    X/Y  Y\Z  ⇒  X\Z    (>1×, forward crossing composition)
    Y/Z  X\Y  ⇒  X/Z    (<1×, backward crossing composition)
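Continuing the Python sketch from above (same assumed Cat encoding), forward type raising and forward harmonic composition are enough to derive the incomplete constituent "Sue bought" of category S/NP:

    def type_raise(item, t):
        """Y : g  =>  T/(T\\Y) : λf.f(g)   (>T)"""
        cat, g = item
        raised = Cat(result=t, arg=Cat(result=t, arg=cat, slash="\\"), slash="/")
        return (raised, lambda f: f(g))

    def forward_compose(left, right):
        """X/Y : f  +  Y/Z : g  =>  X/Z : λx.f(g(x))   (>1)"""
        (cl, f), (cr, g) = left, right
        assert cl.slash == "/" and cr.slash == "/" and cl.arg == cr.result
        return (Cat(result=cl.result, arg=cr.arg, slash="/"), lambda x: f(g(x)))

    sue = (NP, "sue")
    bought = (TV, lambda x: lambda y: f"bought({y},{x})")

    sue_r = type_raise(sue, S)                   # S/(S\NP) : λf.f(sue)
    sue_bought = forward_compose(sue_r, bought)  # S/NP : λx.bought(sue,x)
    print(sue_bought[1]("the_car"))              # bought(sue,the_car)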

Non-canonical coordination: "John made and Mary ate the marmalade"

    John := NP    Mary := NP    made := (S\NP)/NP    ate := (S\NP)/NP
    and := ((S/NP)\(S/NP))/(S/NP)    the := NP/N    marmalade := N

    John                                      ⇒ (>T)  S/(S\NP)
    [John] made                               ⇒ (>1)  S/NP
    Mary                                      ⇒ (>T)  S/(S\NP)
    [Mary] ate                                ⇒ (>1)  S/NP
    and [Mary ate]                            ⇒ (>0)  (S/NP)\(S/NP)
    [John made] [and Mary ate]                ⇒ (<0)  S/NP
    the marmalade                             ⇒ (>0)  NP
    [John made and Mary ate] [the marmalade]  ⇒ (>0)  S

What makes CCG suitable for semantic parsing?
- few labels
- few rules
- flexible constituency: syntax adapts to semantics
- clear syntax-semantics interface
- there is not much to learn about syntax, so learning can focus on semantics

Zettlemoyer and Collins (2005)

Luke S. Zettlemoyer and Michael Collins (2005): "Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars" [3]

The first paper to apply CCG to semantic parser learning; many followed.

Characteristics of ZC05's parser
- NLU representation: list of words
- MRL: GeoQuery and Jobs (in lambda format)
- data: sentences (NLUs) + logical forms (MRs)
- evaluation metric: exact match, modulo renaming of variables (see the sketch below)
- lexicon representation: CCG categories with lambda terms
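"Exact match modulo renaming of variables" amounts to alpha-equivalence of logical forms. A small sketch, assuming logical forms are represented as nested tuples (an illustrative representation, not ZC05's):

    # Exact match up to renaming of lambda-bound variables.
    def alpha_equal(a, b, env=None):
        env = env or {}
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            if a[:1] == b[:1] == ("lambda",):
                # bind a's variable name to b's, then compare the bodies
                return alpha_equal(a[2], b[2], {**env, a[1]: b[1]})
            return all(alpha_equal(x, y, env) for x, y in zip(a, b))
        return isinstance(a, str) and isinstance(b, str) and env.get(a, a) == b

    lf_gold = ("lambda", "x", ("borders", "x", "texas"))
    lf_pred = ("lambda", "y", ("borders", "y", "texas"))
    print(alpha_equal(lf_gold, lf_pred))  # True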

Characteristics of ZC05's parser (cont.)
- candidate lexicon generation: using templates (GENLEX)
- parsing algorithm: CKY-style (chart)
- features: only lexical features
- model: log-linear
- training algorithm: alternates between GENLEX, pruning, and parameter estimation
- experimental setup: Geo880 and Jobs640 datasets, standard splits

Example parse: "What states border Texas"

    What := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)
    states := N : λx.state(x)
    border := (S\NP)/NP : λx.λy.borders(y, x)
    Texas := NP : texas

    What states                   ⇒ (>0)  S/(S\NP) : λg.λx.state(x) ∧ g(x)
    border Texas                  ⇒ (>0)  S\NP : λy.borders(y, texas)
    [What states] [border Texas]  ⇒ (>0)  S : λx.state(x) ∧ borders(x, texas)

Learning lexicon entries

Input (a training example):

    (1) a. What states border Texas
        b. λx.state(x) ∧ borders(x, texas)

Desired output:

    What := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)
    states := N : λx.state(x)
    border := (S\NP)/NP : λx.λy.borders(y, x)
    Texas := NP : texas

Hard-coded cross-domain lexicon entries

    What := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)

Lexical entries for names in the domain database

    Texas := NP : texas
    Michigan := NP : michigan
    Seattle := NP : seattle
    ...

Candidate lexicon generation with GENLEX

Input (a training example):

    (2) a. Utah borders Idaho
        b. borders(utah, idaho)

GENLEX output (excerpt):

    Utah := NP : utah
    Idaho := NP : idaho
    borders := (S\NP)/NP : λx.λy.borders(y, x)
    borders := NP : idaho                              (spurious)
    borders Utah := (S\NP)/NP : λx.λy.borders(y, x)    (spurious)

Spurious items need to be pruned later.

GENLEX

    GENLEX(S, L) = {x := y | x ∈ W(S), y ∈ C(L)}

where
- S is a sentence
- L is its logical form
- W(S) is the set of all subsequences of words in S
- C maps L to a set of categories through rules (see next slide)
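A toy sketch of this cross product, under simplifying assumptions: W(S) is restricted to contiguous subsequences, and C(L) implements only two illustrative trigger rules (constants yield NP entries, a binary predicate yields a transitive-verb entry); the paper's full rule inventory is on the next slide. On example (2) it produces 6 × 3 = 18 candidate entries, including the spurious borders := NP : idaho:

    def W(sentence):
        """All contiguous word subsequences of the sentence."""
        w = sentence.split()
        return {" ".join(w[i:j])
                for i in range(len(w)) for j in range(i + 1, len(w) + 1)}

    def C(lf):
        """Candidate (category, semantics) pairs triggered by the logical form.
        Assumed representation: ("borders", "utah", "idaho")."""
        pred, *args = lf
        cats = {("NP", a) for a in args}      # constants -> NP entries
        if len(args) == 2:                    # binary predicate -> (S\NP)/NP
            cats.add(("(S\\NP)/NP", f"λx.λy.{pred}(y, x)"))
        return cats

    def genlex(sentence, lf):
        return {(x, cat, sem) for x in W(sentence) for (cat, sem) in C(lf)}

    for word, cat, sem in sorted(genlex("Utah borders Idaho",
                                        ("borders", "utah", "idaho"))):
        print(f"{word} := {cat} : {sem}")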

GENLEX (cont.)

[Table of GENLEX trigger rules mapping logical-form fragments to candidate categories; not transcribed.]

Probabilistic CCG parsing
- task: map a sentence S to (T, L), where T is a parse tree (derivation) and L is a logical form
- even with a perfect lexicon, one S can have multiple (T, L) pairs (lexical ambiguity, attachment ambiguity, spurious ambiguity, ...)
- solution: define a probability distribution P(L, T | S)

Probabilistic CCG parsing (cont.)
- here: features = lexicon entries used in a parse
- the feature function f maps parses (L, T, S) to a feature vector indicating, for each lexicon entry, how often it is used in T
- assume a parameter vector θ that scores individual lexical features, yielding the log-linear model

      P(L, T | S; θ) = exp(f(L, T, S) · θ) / Σ_(L',T') exp(f(L', T', S) · θ)

- given S, return the parse (L, T) with the highest P(L, T | S; θ)
- problem: how to find a good lexicon and a good θ?
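A sketch of this model in Python, assuming each candidate parse has already been reduced to the list of lexicon entries it uses (the only features in ZC05's model); the θ values and parse representations here are toy assumptions:

    import math
    from collections import Counter

    def score(entries, theta):
        """f(L,T,S)·θ, where f counts how often each lexicon entry is used."""
        return sum(n * theta.get(e, 0.0) for e, n in Counter(entries).items())

    def probs(candidate_parses, theta):
        """P(L,T|S;θ) = exp(f·θ) / Σ exp(f'·θ) over all candidate parses of S."""
        z = sum(math.exp(score(p, theta)) for p in candidate_parses)
        return [math.exp(score(p, theta)) / z for p in candidate_parses]

    theta = {"borders := (S\\NP)/NP : λx.λy.borders(y,x)": 0.1,
             "borders := NP : idaho": 0.01}
    parses = [["borders := (S\\NP)/NP : λx.λy.borders(y,x)"],
              ["borders := NP : idaho"]]
    print([round(p, 3) for p in probs(parses, theta)])  # [0.522, 0.478]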

Bird's-eye view of the learning algorithm

    Input: initial lexicon Λ0, training examples E = {(S_i, L_i) : i = 1...n}
    Initialization: θ0 := vector with 0.1 for lexicon entries in Λ0,
                    0.01 for all other lexicon entries
    for t = 1...T do                                   (epochs)
        for i = 1...n do                               (lexical generation)
            λ := Λ0 ∪ GENLEX(S_i, L_i)
            π := PARSE(S_i, L_i, λ, θ_{t-1})
            λ_i := the set of lexicon entries in π
        end for
        Λ_t := Λ0 ∪ ⋃_{i=1}^n λ_i                      (update the lexicon)
        θ_t := ESTIMATE(Λ_t, E, θ_{t-1})               (parameter estimation)
    end for
    Output: lexicon Λ_T, parameters θ_T
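A Python sketch of this outer loop; genlex, parse (returning the best parse of S_i that yields L_i, or None), and estimate (gradient-based update of θ) are placeholders standing in for ZC05's actual components:

    def train(lexicon0, examples, genlex, parse, estimate, T=10):
        # Initial weights: 0.1 for entries in Λ0; PARSE and ESTIMATE are
        # assumed to read missing entries as 0.01 (ZC05's initialization).
        theta = {e: 0.1 for e in lexicon0}
        lexicon = set(lexicon0)
        for t in range(T):                              # epochs
            used = set()
            for sentence, lf in examples:               # lexical generation
                candidates = set(lexicon0) | genlex(sentence, lf)
                pi = parse(sentence, lf, candidates, theta)
                if pi is not None:                      # a correct parse was found
                    used |= set(pi.lexicon_entries)     # assumed parse attribute
            lexicon = set(lexicon0) | used              # update the lexicon
            theta = estimate(lexicon, examples, theta)  # parameter estimation
        return lexicon, theta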

Results

                                  Geo880              Jobs640
                             Precision  Recall   Precision  Recall
    Zettlemoyer & Collins [3]    96.25   79.29       97.36   79.29
    Tang & Mooney [2]            89.92   79.40       93.25   79.84

Some learned lexicon entries

    states := N : λx.state(x)
    major := N/N : λf.λx.major(x) ∧ f(x)
    population := N : λx.population(x)
    cities := N : λx.city(x)
    rivers := N : λx.river(x)
    run through := (S\NP)/NP : λx.λy.traverse(y, x)
    the largest := NP/N : λf.arg max(f, λx.size(x))
    river := N : λx.river(x)
    the highest := NP/N : λf.arg max(f, λx.elev(x))
    the longest := NP/N : λf.arg max(f, λx.len(x))

References

[1] Steedman, Mark. 2001. The Syntactic Process. The MIT Press.
[2] Tang, Lappoon R. & Raymond J. Mooney. 2001. Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th European Conference on Machine Learning, 466–477.
[3] Zettlemoyer, Luke & Michael Collins. 2005. Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), 658–666. AUAI Press.