Introduction to Semantic Parsing with CCG

Size: px

Start display at page:

Download "Introduction to Semantic Parsing with CCG"

William Barker
5 years ago
Views:

1 Introduction to Semantic Parsing with CCG Kilian Evang Heinrich-Heine-Universität Düsseldorf

2 Table of contents 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References 2 2 / 42

3 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References 3 3 / 42

4 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References 4 4 / 42

5 Plain old context-free grammars S NP VP N V Mary sings NP N Mary S VP V NP kicks N John NP N Mary S VP VP AdvP V NP Adv kicks N happily John 5 5 / 42

6 Plain old context-free grammars S NP VP N V Mary sings NP N Mary S VP V NP kicks N John NP N Mary S VP VP AdvP V NP Adv kicks N happily John S NP VP NP N AdvP Adv VP V VP V NP VP VP AdvP 6 5 / 42

7 Categorial grammars S NP S \ NP Mary sings NP Mary S S \ NP (S \ NP)/NP NP kicks John NP Mary S S \ NP S \ NP (S \ NP)\(S \ NP) (S \ NP)/NP NP happily kicks John 7 6 / 42

8 Categorial grammars S NP S \ NP Mary sings NP Mary S S \ NP (S \ NP)/NP NP kicks John NP Mary S S \ NP S \ NP (S \ NP)\(S \ NP) (S \ NP)/NP NP happily kicks John X X/Y Y X Y X\Y 8 6 / 42

9 Notation Mary sings NP S \ NP S < 0 Mary NP kicks John (S \ NP)/NP NP > 0 S \ NP S < 0 Mary NP kicks John (S \ NP)/NP NP > 0 S \ NP happily (S \ NP)\(S \ NP) < 0 S \ NP < 0 S 9 7 / 42

10 Notation Mary sings NP S \ NP S < 0 Mary NP kicks John (S \ NP)/NP NP > 0 S \ NP S < 0 Mary NP kicks John (S \ NP)/NP NP > 0 S \ NP happily (S \ NP)\(S \ NP) < 0 S \ NP < 0 S X/Y Y X (> 0 ) (forward application) Y X\Y X (< 0 ) (backward application) 10 7 / 42

11 Semantics Mary NP kicks (S \ NP)/NP John NP 11 8 / 42

12 Semantics Mary NP : mary kicks (S \ NP)/NP John NP 12 9 / 42

13 Semantics Mary NP : mary kicks (S \ NP)/NP : λx.λy.kicks(y, x) John NP / 42

14 Semantics Mary NP : mary kicks (S \ NP)/NP : λx.λy.kicks(y, x) John NP : john / 42

15 Semantics Mary NP : mary kicks (S \ NP)/NP : λx.λy.kicks(y, x) John NP : john X/Y : f Y : g X : f (g) (> 0 ) (forward application) Y : g X\Y : f X : f (g) (< 0 ) (backward application)

16 Semantics Mary NP : mary kicks John (S \ NP)/NP : λx.λy.kicks(y, x) NP : john S \ NP : λy.kicks(y, john) > 0 X/Y : f Y : g X : f (g) (> 0 ) (forward application) Y : g X\Y : f X : f (g) (< 0 ) (backward application)

17 Semantics Mary NP : mary kicks (S \ NP)/NP : λx.λy.kicks(y, x) John NP : john S \ NP : λy.kicks(y, john) > 0 S : kicks(mary, john) < 0 X/Y : f Y : g X : f (g) (> 0 ) (forward application) Y : g X\Y : f X : f (g) (< 0 ) (backward application) / 42

18 Coordination of Functors Mary NP mary sings S \ NP λx.sings(x) and ((S \ NP)\(S \ NP))/(S \ NP) λf.λg.λx.g(x) f (x) (S \ NP)\(S \ NP) λg.λx.g(x) dances(x) S \ NP λx.sings(x) dances(x) S sings(mary) dances(mary) dances S \ NP λx.dances(x) > 0 < 0 < / 42

19 Coordination of Arguments Mary NP mary S/(S \ NP) λf.f (mary) and John sing ((S/(S \ NP))\(S/(S \ NP)))/(S/(S \ NP)) λf.λg.λh.g(h) f (h) NP john S \ NP λx.sings(x) T > T > S/(S \ NP) λf.f (john) > 0 (S/(S \ NP))\(S/(S \ NP)) λg.λh.g(h) h(john) S/(S \ NP) λh.h(mary) h(john) S sing(mary) sing(john) < 0 > 0 Y : g T/(T\Y) : λf.f (g) (T > ) (forward type raising) Y : g T\(T/Y) : λf.f (g) (T < ) (backward type raising) / 42

20 How CGs differ from CFGs 1 informative labels (categories) 2 general rules (combinators) 3 transparent syntax-semantics interface (syntactic dependency function application) / 42

21 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

22 Combinatory Categorial Grammar type of CG developed by Mark Steedman [1] additional combinators elegant treatment of incomplete constituents object relative clauses non-canonical coordination / 42

23 Object relative clauses the NP/N car N that Sue bought (N \ N)/(S/NP) NP (S \ NP)/NP T > S/(S \ NP) > 1 S/NP > 0 N \ N < 0 N > 0 NP X/Y Y/Z X/Z (> 1 ) (forward harmonic composition) Y\Z X\Y X\Z (< 1 ) (backward harmonic composition) X/Y Y\Z X\Z (> 1 ) (forward crossing composition) Y/Z X\Y X/Z (< 1 ) (backward crossing composition) / 42

24 Non-canonical coordination John NP made (S \ NP)/NP and ((S/NP)\(S/NP))/(S/NP) Mary NP ate (S \ NP)/NP T > S/(S \ NP) T > S/(S \ NP) S/NP > 1 S/NP > 1 > 0 (S/NP)\(S/NP) < 0 S/NP S the NP/N marmalade N > 0 NP > / 42

25 What makes CCG suitable for semantic parsing? few labels few rules flexible constituency: syntax adapts to semantics clear syntax-semantics interface there is not much to learn about syntax, can focus on semantics / 42

26 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

27 Zettlemoyer and Collins (2005) Luke S. Zettlemoyer and Michael Collins (2005): Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars [3] First paper to apply CCG to semantic parser learning, many followed / 42

28 Characteristics of ZC05 s parser NLU representation list of words MRL GeoQuery and Jobs (in lambda format) data sentences (NLUs) + logical forms (MRs) evaluation metric exact match (modulo renaming variables) lexicon representation CCG categories with lambda terms / 42

29 Characteristics of ZC05 s parser (cont.) candidate lexicon generation using templates (GENLEX) parsing algorithm CKY-style (chart) features only lexical features model log-linear training algorithm alternates between GENLEX, pruning, and parameter estimation / 42

30 Characteristics of ZC05 s parser (cont.) experimental setup Geo880, Jobs640 datasets, standard splits / 42

31 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

32 Example parse What (S/(S \ NP))/N λf.λg.λx.f (x) g(x) S/(S \ NP) λg.λx.state(x) g(x) states N λx.state(x) > 0 border (S \ NP)/NP λx.λy.borders(y, x) S λx.state(x) borders(x, texas) S \ NP λy.borders(y, texas) Texas NP texas > 0 > / 42

33 Learning lexicon entries Input (a training example) (1) a. What states border Texas b. λx.state(x) borders(x, texas) Desired output What := (S/(S \ NP))/N : λf.λg.λx.f (x) g(x) states := N : λx.state(x) border := (S \ NP)/NP : λx.λy.borders(y, x) Texas := NP : texas / 42

34 Hard-coded cross-domain lexicon entries What := (S/(S \ NP))/N : λf.λg.λx.f (x) g(x) / 42

35 Lexical entries for names in the domain database Texas := NP : texas Michigan := NP : michigan Seattle := NP : seattle / 42

36 Candidate lexicon generation with GENLEX Input (a training example) (2) a. Utah borders Idaho b. borders(utah, idaho) GENLEX output Utah := NP : utah Idaho := NP : idaho borders := (S \ NP)/NP : λx.λy.borders(y, x) borders := NP : idaho borders Utah := (S \ NP)/NP : λx.λy.borders(y, x) spurious items need to be pruned later / 42

37 GENLEX GENLEX(S, L) = {x := y x W(S), y C(L)} where S is a sentence L is its logical form W(S) is the set of all subsequences of words in S C maps L to a set of categories through rules (see next slide) / 42

38 GENLEX (cont.) / 42

39 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

40 Probabilistic CCG parsing mapping a sentence S to (T, L) where T is a parse tree (derivation) and L is a logical form even with a perfect lexicon, one S can have multiple (T, L) pairs (lexical ambiguity, attachment ambiguity, spurious ambiguity...) solution: define probability distribution P(L, T S) / 42

41 Probabilistic CCG parsing (cont.) here: features = lexicon entries used in a parse feature function f maps parses (L, T, S) to a feature vector that indicates for each lexicon entry how often it is used in T assume a parameter vector θ that scores individual lexical features such that P(L, T S; θ) = f (L,T,S) θ (log-linear model) (L,T) ef (L,T,S) θ e Given S, return the parse (L, T) with the highest P(L, T S; θ) problem: how to find a good lexicon and a good θ? / 42

42 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} / 42

43 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries / 42

44 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs / 42

45 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation / 42

46 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) / 42

47 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) / 42

48 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π / 42

49 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π end for / 42

50 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π end for Λ t := Λ 0 n i=1 λ i update the lexicon / 42

51 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π end for Λ t := Λ 0 n i=1 λ i update the lexicon θ t := ESTIMATE(Λ t, E, θ t 1 ) parameter estimation / 42

52 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π end for Λ t := Λ 0 n i=1 λ i update the lexicon θ t := ESTIMATE(Λ t, E, θ t 1 ) parameter estimation end for / 42

53 Bird s-eye view of the learning algorithm Input: initial lexicon Λ 0, training examples E = {(S i, L i ) : i = 1... n} Initialization: θ 0 := vector with 0.1 for lexicon entries in Λ 0, 0.01 for all other lexicon entries for t = 1... T do epochs for i = 1... n do lexical generation λ := Λ 0 GENLEX(S i, L i ) π := PARSE(S i, L i, λ, θ t 1 ) λ i := the set of lexicon entries in π end for Λ t := Λ 0 n i=1 λ i update the lexicon θ t := ESTIMATE(Λ t, E, θ t 1 ) end for Output: lexicon Λ T, parameters θ T parameter estimation / 42

54 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

55 Results Geo880 Jobs640 Precision Recall Precision Recall Zettlemoyer & Collins [3] Tang & Mooney [2] / 42

56 Some learned lexicon entries states := N : λx.state(x) major := N/N : λf.λx.major(x) f (x) population := N : λx.population(x) cities := N : λx.river(x) rivers := N : λx.river(x) run through := (S \ NP)/NP : λx.λy.traverse(y, x) the largest := NP/N : λf. arg max(f, λx.size(x)) river := N : λx.river(x) the highest := NP/N : λf. arg max(f, λx.elev(x)) the longest := NP/N : λf. arg max(f, λx.len(x)) / 42

57 Outline 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG) 2 Zettlemoyer and Collins (2005) Candidate lexicon generation Features, model, learning Results 3 References / 42

58 [1] Steedman, Mark The syntactic process. The MIT Press. [2] Tang, Lappoon R. & Raymond J. Mooney Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th european conference on machine learning, [3] Zettlemoyer, Luke & Michael Collins Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In Proceedings of the twenty-first conference annual conference on uncertainty in artificial intelligence (UAI-05), AUAI Press.

Lecture 5: Semantic Parsing, CCGs, and Structured Classification

Lecture 5: Semantic Parsing, CCGs, and Structured Classification Kyle Richardson kyle@ims.uni-stuttgart.de May 12, 2016 Lecture Plan paper: Zettlemoyer and Collins (2012) general topics: (P)CCGs, compositional