Introduction to Semantic Parsing with CCG
Kilian Evang
Heinrich-Heine-Universität Düsseldorf
2018-04-24
Table of contents

1 Introduction to CCG
  Categorial Grammar (CG)
  Combinatory Categorial Grammar (CCG)
2 Zettlemoyer and Collins (2005)
  Candidate lexicon generation
  Features, model, learning
  Results
3 References
Plain old context-free grammars

  [S [NP [N Mary]] [VP [V sings]]]
  [S [NP [N Mary]] [VP [V kicks] [NP [N John]]]]
  [S [NP [N Mary]] [VP [VP [V kicks] [NP [N John]]] [AdvP [Adv happily]]]]

  Rules: S → NP VP,  NP → N,  AdvP → Adv,  VP → V,  VP → V NP,  VP → VP AdvP
Categorial grammars

  [S [NP Mary] [S\NP sings]]
  [S [NP Mary] [S\NP [(S\NP)/NP kicks] [NP John]]]
  [S [NP Mary] [S\NP [S\NP [(S\NP)/NP kicks] [NP John]] [(S\NP)\(S\NP) happily]]]

  Rule schemas:  X → X/Y Y      X → Y X\Y
Notation

  Mary sings:
    Mary := NP    sings := S\NP
    Mary sings ⇒ (<0) S

  Mary kicks John:
    kicks := (S\NP)/NP    John := NP
    kicks John ⇒ (>0) S\NP
    Mary kicks John ⇒ (<0) S

  Mary kicks John happily:
    happily := (S\NP)\(S\NP)
    kicks John ⇒ (>0) S\NP
    kicks John happily ⇒ (<0) S\NP
    Mary kicks John happily ⇒ (<0) S

  X/Y Y ⇒ X  (>0)  (forward application)
  Y X\Y ⇒ X  (<0)  (backward application)
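The two application combinators can be sketched in code. The following is a minimal illustration, not from the slides; the class names (`Cat`, `Slash`) and layout are assumptions:

```python
class Cat:
    """Atomic category such as NP or S."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Cat) and self.name == other.name
    def __repr__(self):
        return self.name

class Slash:
    """Functor category: X/Y (fwd=True) or X\\Y (fwd=False)."""
    def __init__(self, result, arg, fwd):
        self.result, self.arg, self.fwd = result, arg, fwd
    def __eq__(self, other):
        return (isinstance(other, Slash) and self.fwd == other.fwd
                and self.result == other.result and self.arg == other.arg)
    def __repr__(self):
        slash = "/" if self.fwd else "\\"
        return f"({self.result}{slash}{self.arg})"

def apply_fwd(left, right):
    """X/Y  Y  =>  X   (>0, forward application); None if inapplicable."""
    if isinstance(left, Slash) and left.fwd and left.arg == right:
        return left.result
    return None

def apply_bwd(left, right):
    """Y  X\\Y  =>  X   (<0, backward application); None if inapplicable."""
    if isinstance(right, Slash) and not right.fwd and right.arg == left:
        return right.result
    return None

NP, S = Cat("NP"), Cat("S")
kicks = Slash(Slash(S, NP, fwd=False), NP, fwd=True)  # (S\NP)/NP
vp = apply_fwd(kicks, NP)        # kicks John  =>  S\NP
```

Note that each combinator only checks the slash direction and the argument category; this is all the "syntax" a categorial grammar needs.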
Semantics

  Mary  := NP : mary
  kicks := (S\NP)/NP : λx.λy.kicks(y, x)
  John  := NP : john

  kicks John      ⇒ (>0)  S\NP : λy.kicks(y, john)
  Mary kicks John ⇒ (<0)  S : kicks(mary, john)

  X/Y : f   Y : g  ⇒  X : f(g)  (>0)  (forward application)
  Y : g   X\Y : f  ⇒  X : f(g)  (<0)  (backward application)
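The semantic side of application is literally function application. A minimal sketch, representing lambda terms as Python closures that build up a logical-form string (an illustration, not the slides' implementation):

```python
# Lexical semantics: entities as strings, the verb as a curried function.
mary, john = "mary", "john"
kicks = lambda x: lambda y: f"kicks({y}, {x})"   # (S\NP)/NP : λx.λy.kicks(y, x)

vp = kicks(john)   # >0 : S\NP : λy.kicks(y, john)
s = vp(mary)       # <0 : S    : kicks(mary, john)
```

Each combinator's semantic action is the same `f(g)` regardless of the categories involved; this is the transparent syntax-semantics interface.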
Coordination of Functors ("Mary sings and dances")

  Mary   := NP : mary
  sings  := S\NP : λx.sings(x)
  and    := ((S\NP)\(S\NP))/(S\NP) : λf.λg.λx.g(x) ∧ f(x)
  dances := S\NP : λx.dances(x)

  and dances            ⇒ (>0) (S\NP)\(S\NP) : λg.λx.g(x) ∧ dances(x)
  sings and dances      ⇒ (<0) S\NP : λx.sings(x) ∧ dances(x)
  Mary sings and dances ⇒ (<0) S : sings(mary) ∧ dances(mary)
Coordination of Arguments ("Mary and John sing")

  Mary := NP : mary
  and  := ((S/(S\NP))\(S/(S\NP)))/(S/(S\NP)) : λf.λg.λh.g(h) ∧ f(h)
  John := NP : john
  sing := S\NP : λx.sings(x)

  Mary               ⇒ (>T) S/(S\NP) : λf.f(mary)
  John               ⇒ (>T) S/(S\NP) : λf.f(john)
  and John           ⇒ (>0) (S/(S\NP))\(S/(S\NP)) : λg.λh.g(h) ∧ h(john)
  Mary and John      ⇒ (<0) S/(S\NP) : λh.h(mary) ∧ h(john)
  Mary and John sing ⇒ (>0) S : sings(mary) ∧ sings(john)

  Y : g ⇒ T/(T\Y) : λf.f(g)  (>T)  (forward type raising)
  Y : g ⇒ T\(T/Y) : λf.f(g)  (<T)  (backward type raising)
How CGs differ from CFGs

  1 informative labels (categories)
  2 general rules (combinators)
  3 transparent syntax-semantics interface (syntactic dependency ≈ function application)
Combinatory Categorial Grammar

  - a type of CG developed by Mark Steedman [1]
  - additional combinators
  - elegant treatment of incomplete constituents:
    object relative clauses, non-canonical coordination, ...
Object relative clauses ("the car that Sue bought")

  the    := NP/N
  car    := N
  that   := (N\N)/(S/NP)
  Sue    := NP
  bought := (S\NP)/NP

  Sue                     ⇒ (>T) S/(S\NP)
  Sue bought              ⇒ (>1) S/NP
  that Sue bought         ⇒ (>0) N\N
  car that Sue bought     ⇒ (<0) N
  the car that Sue bought ⇒ (>0) NP

  X/Y Y/Z ⇒ X/Z  (>1)   (forward harmonic composition)
  Y\Z X\Y ⇒ X\Z  (<1)   (backward harmonic composition)
  X/Y Y\Z ⇒ X\Z  (>1×)  (forward crossing composition)
  Y/Z X\Y ⇒ X/Z  (<1×)  (backward crossing composition)
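Semantically, harmonic composition is ordinary function composition, λx.f(g(x)). A small sketch (an illustration under assumed names, continuing the closure representation of lambda terms):

```python
def compose_fwd(f, g):
    """Semantics of >1: X/Y : f  combined with  Y/Z : g  gives  X/Z : λx.f(g(x))."""
    return lambda x: f(g(x))

# "Sue bought": type-raised Sue (S/(S\NP)) composed with bought ((S\NP)/NP)
sue = lambda p: p("sue")                           # λf.f(sue)
bought = lambda x: lambda y: f"bought({y}, {x})"   # λx.λy.bought(y, x)
sue_bought = compose_fwd(sue, bought)              # S/NP : λx.bought(sue, x)
```

The composed constituent `S/NP` is exactly the "sentence missing its object" that the relative pronoun `that := (N\N)/(S/NP)` selects for.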
Non-canonical coordination ("John made and Mary ate the marmalade")

  John := NP        made := (S\NP)/NP
  and  := ((S/NP)\(S/NP))/(S/NP)
  Mary := NP        ate  := (S\NP)/NP
  the  := NP/N      marmalade := N

  John                   ⇒ (>T) S/(S\NP);   John made ⇒ (>1) S/NP
  Mary                   ⇒ (>T) S/(S\NP);   Mary ate  ⇒ (>1) S/NP
  and Mary ate           ⇒ (>0) (S/NP)\(S/NP)
  John made and Mary ate ⇒ (<0) S/NP
  the marmalade          ⇒ (>0) NP
  John made and Mary ate the marmalade ⇒ (>0) S
What makes CCG suitable for semantic parsing?

  - few labels
  - few rules
  - flexible constituency: syntax adapts to semantics
  - clear syntax-semantics interface
  - there is not much to learn about syntax, so we can focus on semantics
Zettlemoyer and Collins (2005)

Luke S. Zettlemoyer and Michael Collins (2005): Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars [3]

First paper to apply CCG to semantic parser learning; many followed.
Characteristics of ZC05's parser

  NLU representation      list of words
  MRL                     GeoQuery and Jobs (in lambda format)
  data                    sentences (NLUs) + logical forms (MRs)
  evaluation metric       exact match (modulo renaming of variables)
  lexicon representation  CCG categories with lambda terms
Characteristics of ZC05's parser (cont.)

  candidate lexicon generation  using templates (GENLEX)
  parsing algorithm             CKY-style (chart)
  features                      only lexical features
  model                         log-linear
  training algorithm            alternates between GENLEX, pruning, and parameter estimation
Characteristics of ZC05's parser (cont.)

  experimental setup  Geo880 and Jobs640 datasets, standard splits
Example parse ("What states border Texas")

  What   := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)
  states := N : λx.state(x)
  border := (S\NP)/NP : λx.λy.borders(y, x)
  Texas  := NP : texas

  What states              ⇒ (>0) S/(S\NP) : λg.λx.state(x) ∧ g(x)
  border Texas             ⇒ (>0) S\NP : λy.borders(y, texas)
  What states border Texas ⇒ (>0) S : λx.state(x) ∧ borders(x, texas)
Learning lexicon entries

Input (a training example)
(1) a. What states border Texas
    b. λx.state(x) ∧ borders(x, texas)

Desired output
  What   := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)
  states := N : λx.state(x)
  border := (S\NP)/NP : λx.λy.borders(y, x)
  Texas  := NP : texas
Hard-coded cross-domain lexicon entries

  What := (S/(S\NP))/N : λf.λg.λx.f(x) ∧ g(x)
Lexical entries for names in the domain database

  Texas    := NP : texas
  Michigan := NP : michigan
  Seattle  := NP : seattle
  ...
Candidate lexicon generation with GENLEX

Input (a training example)
(2) a. Utah borders Idaho
    b. borders(utah, idaho)

GENLEX output
  Utah         := NP : utah
  Idaho        := NP : idaho
  borders      := (S\NP)/NP : λx.λy.borders(y, x)
  borders      := NP : idaho
  borders Utah := (S\NP)/NP : λx.λy.borders(y, x)
  ...

Spurious entries need to be pruned later.
GENLEX

  GENLEX(S, L) = {x := y | x ∈ W(S), y ∈ C(L)}

where
  - S is a sentence
  - L is its logical form
  - W(S) is the set of all subsequences of words in S
  - C maps L to a set of categories through rules (see next slide)
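The cross product in the definition can be sketched directly; here C(L) is stubbed with a hand-written candidate list (the real category-generation rules are template-based), and W(S) is taken to be contiguous word subsequences up to a length bound. All names are illustrative:

```python
def subsequences(words, max_len=3):
    """W(S): contiguous word subsequences up to max_len words."""
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + 1 + max_len, len(words) + 1))]

def genlex(sentence, categories_for_lf):
    """GENLEX(S, L) = {x := y | x in W(S), y in C(L)} as (phrase, category) pairs."""
    return {(phrase, cat)
            for phrase in subsequences(sentence.split())
            for cat in categories_for_lf}

# hypothetical C(L) output for borders(utah, idaho)
cats = ["NP : utah", "NP : idaho", "(S\\NP)/NP : λx.λy.borders(y, x)"]
entries = genlex("Utah borders Idaho", cats)
```

Because every phrase is paired with every candidate category, the output contains correct entries like `Utah := NP : utah` alongside spurious ones like `borders := NP : idaho`; the learning loop prunes the latter.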
GENLEX (cont.)

  [table of category-generation rules for C(L); shown as a figure on the slide]
Probabilistic CCG parsing

  - mapping a sentence S to (T, L), where T is a parse tree (derivation) and L is a logical form
  - even with a perfect lexicon, one S can have multiple (T, L) pairs
    (lexical ambiguity, attachment ambiguity, spurious ambiguity, ...)
  - solution: define a probability distribution P(L, T | S)
Probabilistic CCG parsing (cont.)

  - here: features = lexicon entries used in a parse
  - the feature function f maps parses (L, T, S) to a feature vector that indicates,
    for each lexicon entry, how often it is used in T
  - assume a parameter vector θ that scores individual lexical features, such that

      P(L, T | S; θ) = e^{f(L,T,S)·θ} / Σ_{(L′,T′)} e^{f(L′,T′,S)·θ}   (log-linear model)

  - given S, return the parse (L, T) with the highest P(L, T | S; θ)
  - problem: how to find a good lexicon and a good θ?
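The log-linear model is a softmax over the parses' feature scores. A minimal sketch with feature vectors represented as sparse dicts mapping lexicon entries to counts (entry strings and weights below are made up for illustration):

```python
import math

def parse_probs(parses, theta):
    """P(L,T|S;theta) for each candidate parse.

    parses: list of feature dicts {lexicon_entry: count} (one per parse)
    theta:  dict of weights per lexicon entry (missing entries score 0)
    """
    scores = [sum(theta.get(entry, 0.0) * count for entry, count in f.items())
              for f in parses]
    z = sum(math.exp(s) for s in scores)          # partition function
    return [math.exp(s) / z for s in scores]

# two candidate parses of "Texas", differing in one lexicon entry
theta = {"Texas := NP : texas": 0.5, "Texas := N : λx.state(x)": -0.2}
parses = [{"Texas := NP : texas": 1}, {"Texas := N : λx.state(x)": 1}]
probs = parse_probs(parses, theta)
```

In practice the normalization runs over all parses licensed by the lexicon for S, which a chart parser enumerates (or sums over) efficiently.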
Bird's-eye view of the learning algorithm

  Input: initial lexicon Λ0, training examples E = {(Si, Li) : i = 1…n}
  Initialization: θ0 := vector with 0.1 for lexicon entries in Λ0,
                  0.01 for all other lexicon entries
  for t = 1…T do                                ▷ epochs
      for i = 1…n do                            ▷ lexical generation
          λ := Λ0 ∪ GENLEX(Si, Li)
          π := PARSE(Si, Li, λ, θ(t−1))
          λi := the set of lexicon entries in π
      end for
      Λt := Λ0 ∪ λ1 ∪ … ∪ λn                    ▷ update the lexicon
      θt := ESTIMATE(Λt, E, θ(t−1))             ▷ parameter estimation
  end for
  Output: lexicon ΛT, parameters θT
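The loop above can be sketched in Python, with GENLEX, PARSE, and ESTIMATE passed in as stand-ins for the components described earlier (all names and the dict-based θ are illustrative, not the authors' code):

```python
def train(lex0, examples, epochs, genlex, parse, estimate):
    """ZC05-style alternation between lexical generation and estimation.

    lex0:     initial lexicon (a set of entries)
    examples: list of (sentence, logical_form) pairs
    genlex(s, lf)                  -> set of candidate entries
    parse(s, lf, entries, theta)   -> best parse as a list of entries used
    estimate(lex, examples, theta) -> updated weights for entries in lex
    """
    theta = {e: 0.1 for e in lex0}   # 0.1 for initial entries; others
    lex = set(lex0)                  # default to 0.01 inside estimate
    for t in range(epochs):
        used = set()
        for sent, lf in examples:                 # lexical generation
            candidates = lex0 | genlex(sent, lf)
            best = parse(sent, lf, candidates, theta)
            used |= set(best)                     # entries used in best parse
        lex = lex0 | used                         # update the lexicon
        theta = estimate(lex, examples, theta)    # parameter estimation
    return lex, theta
```

Only entries that actually appear in a highest-scoring correct parse survive into the next epoch's lexicon, which is how the spurious GENLEX output gets pruned.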
Results

                              Geo880               Jobs640
                         Precision  Recall    Precision  Recall
  Zettlemoyer & Collins [3]  96.25   79.29       97.36    79.29
  Tang & Mooney [2]          89.92   79.40       93.25    79.84
Some learned lexicon entries

  states      := N : λx.state(x)
  major       := N/N : λf.λx.major(x) ∧ f(x)
  population  := N : λx.population(x)
  cities      := N : λx.city(x)
  rivers      := N : λx.river(x)
  run through := (S\NP)/NP : λx.λy.traverse(y, x)
  the largest := NP/N : λf.arg max(f, λx.size(x))
  river       := N : λx.river(x)
  the highest := NP/N : λf.arg max(f, λx.elev(x))
  the longest := NP/N : λf.arg max(f, λx.len(x))
References

[1] Steedman, Mark. 2001. The Syntactic Process. The MIT Press.
[2] Tang, Lappoon R. & Raymond J. Mooney. 2001. Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th European Conference on Machine Learning (ECML), 466-477.
[3] Zettlemoyer, Luke & Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05), 658-666. AUAI Press.