Learning: Overview, Details, Example, Lexicon learning, Supervision signals


Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Supervision in syntactic parsing
Input: sentences annotated with their parse trees (the slide shows a tree for "ESSLLI 2016, the known summer school, is located in Bolzano").
Output: a parser that maps a new sentence such as "They play football" to its parse tree.

Supervision in semantic parsing [Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al., 2010; Liang et al., 2011]
Input with heavy supervision (utterances paired with logical forms):
  How tall is Lebron James? -> HeightOf.LebronJames
  What is Steph Curry's daughter called? -> ChildrenOf.StephCurry ⊓ Gender.Female
  Youngest player of the Cavaliers -> argmin(PlayerOf.Cavaliers, BirthDateOf)
  ...
Input with light supervision (utterances paired with denotations):
  How tall is Lebron James? -> 203cm
  What is Steph Curry's daughter called? -> Riley Curry
  Youngest player of the Cavaliers -> Kyrie Irving
  ...
Output: a semantic parser, e.g. one that maps "Clay Thompson's weight" (also "ClayThompson's Weight", "Clay Thompson weight") to the logical form WeightOf.ClayThompson / Weight.ClayThompson and its denotation, 205 lbs.

Learning in a nutshell (pipeline: utterance -> parsing -> label -> update model)
0. Define a model over derivations
1. Generate candidate derivations by parsing (details later)
2. Label derivations as correct or incorrect
3. Update the model to favor correct trees

Training intuition
Where did Mozart tupress?  (answer: Vienna)
  PlaceOfBirth.WolfgangMozart -> Salzburg
  PlaceOfDeath.WolfgangMozart -> Vienna
  PlaceOfMarriage.WolfgangMozart -> Vienna
Where did Hogarth tupress?  (answer: London)
  PlaceOfBirth.WilliamHogarth -> London
  PlaceOfDeath.WilliamHogarth -> London
  PlaceOfMarriage.WilliamHogarth -> Paddington
Across both examples, only PlaceOfDeath yields the correct answer every time, so the denotations alone push the model toward interpreting the unknown word "tupress" as place of death.
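The following minimal sketch (plain Python, with the knowledge base hard-coded from the slide's facts) illustrates this intuition: keep only the relations whose execution matches the observed answer in every training example.

# Hedged sketch of the training intuition above; the KB facts and examples are
# taken directly from the slide, everything else is illustrative.
KB = {
    ("PlaceOfBirth", "WolfgangMozart"): "Salzburg",
    ("PlaceOfDeath", "WolfgangMozart"): "Vienna",
    ("PlaceOfMarriage", "WolfgangMozart"): "Vienna",
    ("PlaceOfBirth", "WilliamHogarth"): "London",
    ("PlaceOfDeath", "WilliamHogarth"): "London",
    ("PlaceOfMarriage", "WilliamHogarth"): "Paddington",
}
examples = [("WolfgangMozart", "Vienna"), ("WilliamHogarth", "London")]
relations = ["PlaceOfBirth", "PlaceOfDeath", "PlaceOfMarriage"]

# Keep only the relations consistent with the denotation of every example.
consistent = [r for r in relations
              if all(KB[(r, entity)] == answer for entity, answer in examples)]
print(consistent)   # ['PlaceOfDeath']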

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Constructing derivations
  Type.Person PlaceLived.Chicago        (intersect)
    Type.Person                         (lexicon: "people")
    ["who" is skipped]
    PlaceLived.Chicago                  (join)
      PlaceLived                        (lexicon: "lived in")
      Chicago                           (lexicon: "Chicago")

Many possible derivations!
x = "people who have lived in Chicago?"  ->  set of candidate derivations D(x)
Derivation 1:
  Type.Person PlaceLived.Chicago        (intersect)
    Type.Person                         (lexicon: "people")
    PlaceLived.Chicago                  (join)
      PlaceLived                        (lexicon: "lived in")
      Chicago                           (lexicon: "Chicago")
Derivation 2:
  Type.Org PresentIn.ChicagoMusical     (intersect)
    Type.Org                            (lexicon: "people")
    PresentIn.ChicagoMusical            (join)
      PresentIn                         (lexicon: "lived in")
      ChicagoMusical                    (lexicon: "Chicago")

Scoring derivations (for the Type.Person PlaceLived.Chicago derivation above)
x: utterance, d: derivation
Feature vector and parameters in $\mathbb{R}^{F}$: $\phi(x, d)$ and $\theta$ (learned)
Example features: apply join; apply intersect; apply lexicon; "lived" maps to PlaceLived; "lived" maps to PlaceOfBirth; "born" maps to PlaceOfBirth
$\mathrm{score}_\theta(x, d) = \phi(x, d) \cdot \theta$
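As a concrete illustration, here is a minimal sketch of a sparse feature vector and the dot-product score above; the feature names follow the slide, while the weights and values are invented for the example.

# Hedged sketch: sparse phi(x, d) and score_theta(x, d) = phi(x, d) . theta.
phi = {
    "apply-join": 1.0,
    "apply-intersect": 1.0,
    "apply-lexicon": 3.0,              # fired once per lexicon entry used
    "lived->PlaceLived": 1.0,
}
theta = {                              # illustrative learned weights
    "apply-join": 0.2,
    "apply-intersect": 0.1,
    "apply-lexicon": 0.05,
    "lived->PlaceLived": 1.5,
    "lived->PlaceOfBirth": -0.7,
}

def score(phi, theta):
    # sparse dot product
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

print(score(phi, theta))   # 1.95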

Deep learning alert!
The feature vector φ(x, d) is constructed by hand
Constructing good features is hard
Algorithms are likely to do it better
Perhaps we can train φ(x, d): φ(x, d) = F_ψ(x, d), where ψ are the parameters

Log-linear model
Candidate derivations: D(x)
Model: a distribution over derivations d given utterance x:
$p_\theta(d \mid x) = \frac{\exp(\mathrm{score}_\theta(x, d))}{\sum_{d' \in D(x)} \exp(\mathrm{score}_\theta(x, d'))}$
Example: $\mathrm{score}_\theta(x, d) = [1, 2, 3, 4]$ gives
$p_\theta(d \mid x) = \left[\tfrac{e}{e + e^2 + e^3 + e^4},\ \tfrac{e^2}{e + e^2 + e^3 + e^4},\ \tfrac{e^3}{e + e^2 + e^3 + e^4},\ \tfrac{e^4}{e + e^2 + e^3 + e^4}\right]$
Parsing: find the top-k derivation trees $D_\theta(x)$
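A minimal sketch of this softmax over candidate scores (assuming the scores are already computed, e.g. with the score function sketched above); the example reproduces the slide's numbers.

import math

def derivation_distribution(scores):
    # p_theta(d | x) = exp(score(x, d)) / sum_d' exp(score(x, d'))
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

print(derivation_distribution([1, 2, 3, 4]))
# approximately [0.032, 0.087, 0.237, 0.644], i.e. proportional to e : e^2 : e^3 : e^4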

Features
Dense features: intersection=0.67, ent-popularity:high, denotation-size:1
Sparse features: bridge-binary:study, born:PlaceOfBirth, city:Type.Location
Syntactic features: ent-pos:NNP NNP, join-pos:V NN, skip-pos:IN
Grammar features: Binary->Verb

Learning θ: maximum likelihood
Training data (denotations or logical forms):
  What's Bulgaria's capital? -> Sofia
  What movies has Tom Cruise been in? -> TopGun, VanillaSky, ...
  What's Bulgaria's capital? -> CapitalOf.Bulgaria
  What movies has Tom Cruise been in? -> Type.Movie ⊓ HasPlayed.TomCruise
  ...
Objective:
$\arg\max_\theta \sum_{i=1}^{n} \log p_\theta(y^{(i)} \mid x^{(i)}) = \arg\max_\theta \sum_{i=1}^{n} \log \sum_{d^{(i)}} p_\theta(d^{(i)} \mid x^{(i)})\, R(d^{(i)})$
Possible reward functions:
$R(d) = \begin{cases} 1 & d.z = z^{(i)} \\ 0 & \text{otherwise} \end{cases}$
$R(d) = \begin{cases} 1 & [\![d.z]\!]_K = y^{(i)} \\ 0 & \text{otherwise} \end{cases}$
$R(d) = F_1([\![d.z]\!]_K,\ y^{(i)})$
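A hedged sketch of these three reward functions; execute_kb (mapping a logical form to an answer set) is a placeholder, not an actual SEMPRE call.

def reward_logical_form(derivation_z, gold_z):
    # R(d) = 1 iff the derivation's logical form d.z equals the annotated form z(i)
    return 1.0 if derivation_z == gold_z else 0.0

def reward_denotation(derivation_z, gold_y, execute_kb):
    # R(d) = 1 iff executing d.z against the KB yields exactly the gold answer y(i)
    return 1.0 if set(execute_kb(derivation_z)) == set(gold_y) else 0.0

def reward_f1(derivation_z, gold_y, execute_kb):
    # Partial credit: F1 between the predicted and gold answer sets
    pred, gold = set(execute_kb(derivation_z)), set(gold_y)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)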

Optimization: stochastic gradient descent
For every example:
$O(\theta) = \log \sum_d p_\theta(d \mid x)\, R(d)$
$\nabla O(\theta) = \mathbb{E}_{q_\theta(d \mid x)}[\phi(x, d)] - \mathbb{E}_{p_\theta(d \mid x)}[\phi(x, d)]$
where
$p_\theta(d \mid x) \propto \exp(\phi(x, d) \cdot \theta)$
$q_\theta(d \mid x) \propto \exp(\phi(x, d) \cdot \theta)\, R(d)$, i.e. $q_\theta$ is $p_\theta$ reweighted by $R$ and renormalized
Example: $p_\theta(D(x)) = [0.2, 0.1, 0.1, 0.6]$ and $R(D(x)) = [1, 0, 0, 1]$ give $q_\theta(D(x)) = [0.25, 0, 0, 0.75]$
Gradient: $0.05\,\phi(x, d_1) - 0.1\,\phi(x, d_2) - 0.1\,\phi(x, d_3) + 0.15\,\phi(x, d_4)$
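A small sketch of this per-example gradient, reproducing the slide's numbers; feature vectors are sparse dicts as in the earlier score sketch.

def renormalize(p, rewards):
    # q_theta(d | x) is proportional to p_theta(d | x) * R(d)
    weighted = [pi * ri for pi, ri in zip(p, rewards)]
    z = sum(weighted)
    return [w / z for w in weighted]

def expected_features(dist, feature_vectors):
    # E_dist[phi(x, d)] as a sparse dict
    out = {}
    for prob, phi in zip(dist, feature_vectors):
        for f, v in phi.items():
            out[f] = out.get(f, 0.0) + prob * v
    return out

def gradient(p, rewards, feature_vectors):
    # grad O(theta) = E_q[phi] - E_p[phi]
    q = renormalize(p, rewards)
    e_q = expected_features(q, feature_vectors)
    e_p = expected_features(p, feature_vectors)
    return {f: e_q.get(f, 0.0) - e_p.get(f, 0.0) for f in set(e_q) | set(e_p)}

# Indicator features per derivation recover the coefficients on the slide:
phi_indicator = [{"d%d" % i: 1.0} for i in range(1, 5)]
print(gradient([0.2, 0.1, 0.1, 0.6], [1, 0, 0, 1], phi_indicator))
# {'d1': 0.05, 'd2': -0.1, 'd3': -0.1, 'd4': 0.15} (up to floating-point noise)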

Training
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  $D(x_i) \leftarrow$ top-$K$ derivations under $p_\theta(d \mid x_i)$
  $\theta \leftarrow \theta + \eta_{\tau,i}\,\big(\mathbb{E}_{q_\theta(d \mid x_i)}[\phi(x_i, d)] - \mathbb{E}_{p_\theta(d \mid x_i)}[\phi(x_i, d)]\big)$
$\eta_{\tau,i}$: learning rate
Regularization often added (L2, L1, ...)
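A sketch of this loop, reusing score, derivation_distribution, reward_denotation and gradient from the earlier sketches. parse_topk is an assumed beam parser returning (feature_dict, logical_form) pairs for D(x_i), and the learning rate is simplified to a constant.

def train(examples, parse_topk, execute_kb, epochs=5, learning_rate=0.1):
    theta = {}                                        # theta <- 0
    for _ in range(epochs):                           # iterations tau
        for x_i, y_i in examples:                     # examples i
            candidates = parse_topk(x_i, theta)       # D(x_i): top-K derivations
            if not candidates:
                continue
            feats = [phi for phi, _ in candidates]
            p = derivation_distribution([score(phi, theta) for phi in feats])
            r = [reward_denotation(z, y_i, execute_kb) for _, z in candidates]
            if sum(r) == 0:
                continue                              # no correct derivation on the beam
            for f, v in gradient(p, r, feats).items():
                theta[f] = theta.get(f, 0.0) + learning_rate * v   # gradient ascent step
    return theta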

Training (structured perceptron)
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  $\hat{d} \leftarrow \arg\max_d\, p_\theta(d \mid x_i)$
  $d^{*} \leftarrow \arg\max_d\, q_\theta(d \mid x_i)$
  if $[\![d^{*}]\!]_K \neq [\![\hat{d}]\!]_K$: $\theta \leftarrow \theta + \phi(x_i, d^{*}) - \phi(x_i, \hat{d})$
Regularization often added with weight averaging
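A minimal structured-perceptron sketch of this update: move theta toward the best correct derivation d* and away from the model's current best d-hat when their denotations differ. parse_topk, score and execute_kb are the same assumed helpers as above.

def perceptron_epoch(examples, theta, parse_topk, execute_kb):
    for x_i, y_i in examples:
        candidates = parse_topk(x_i, theta)          # (feature_dict, logical_form) pairs
        if not candidates:
            continue
        scored = [(score(phi, theta), phi, z) for phi, z in candidates]
        d_hat = max(scored, key=lambda t: t[0])      # model's best derivation
        correct = [t for t in scored if set(execute_kb(t[2])) == set(y_i)]
        if not correct:
            continue
        d_star = max(correct, key=lambda t: t[0])    # best derivation with a correct denotation
        if set(execute_kb(d_star[2])) != set(execute_kb(d_hat[2])):
            for f, v in d_star[1].items():
                theta[f] = theta.get(f, 0.0) + v     # + phi(x_i, d*)
            for f, v in d_hat[1].items():
                theta[f] = theta.get(f, 0.0) - v     # - phi(x_i, d-hat)
    return theta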

Training
Other simple variants exist, e.g. cost-sensitive max-margin training: find pairs of good and bad derivations that look different but have similar scores, and update on those.

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Example (sempre demo)
  -Grammar.inPaths esslli 2016/class3 demo.grammar \
  -SimpleLexicon.inPaths esslli 2016/class3 demo.lexicon
  (loadgraph geo880/geo880.kg)
Utterances to try:
  size of california
  size capital california
  size of capital of california
  california size

Exercise
Find a pair of natural language utterances that cannot be distinguished using the current feature representation.
The utterances need not be fully grammatical English.
You can ignore the denotation feature if that helps.
Verify this in sempre (ask me how to disable features).
Design a feature that will solve this problem.

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

The lexicon problem
How is the lexicon generated? Annotation, exhaustive search, string matching, supervised alignment, unsupervised alignment, learning.

Training (same loop as before, now with a lexicon-expansion step)
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  Add lexicon entries
  $D(x_i) \leftarrow$ top-$K$ derivations under $p_\theta(d \mid x_i)$
  $\theta \leftarrow \theta + \eta_{\tau,i}\,\big(\mathbb{E}_{q_\theta(d \mid x_i)}[\phi(x_i, d)] - \mathbb{E}_{p_\theta(d \mid x_i)}[\phi(x_i, d)]\big)$
$\eta_{\tau,i}$: learning rate
Regularization often added (L2, L1, ...)

Adding lexicon entries [adapted from the semantic parsing tutorial, Artzi et al.]
Input: training example $(x_i, y_i)$, current lexicon $\Lambda$, model $\theta$
$\Lambda_{\mathrm{temp}} \leftarrow \Lambda \cup \mathrm{GENLEX}(x_i, y_i)$: create an expanded temporary lexicon
$D(x_i) \leftarrow$ top-$K$ derivations under $p_{\theta,\Lambda_{\mathrm{temp}}}(d \mid x_i)$: parse with the temporary lexicon
$\Lambda \leftarrow \Lambda \cup \{l : l \in \hat{d},\ \hat{d} \in D(x_i),\ R(\hat{d}) = 1\}$: add entries from correct trees
Overgenerate lexical entries and add the promising ones
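A schematic sketch of this procedure. GENLEX, parse_topk and the reward R are left abstract (their real definitions depend on the grammar formalism); lexicon entries are (phrase, logical_form) pairs and the parser is assumed to return (entries_used, derivation) pairs for its top-K trees.

def update_lexicon(x_i, y_i, lexicon, theta, genlex, parse_topk, reward):
    temp_lexicon = lexicon | genlex(x_i, y_i)              # Lambda_temp: overgenerate entries
    for entries_used, derivation in parse_topk(x_i, theta, temp_lexicon):
        if reward(derivation, y_i) == 1:                   # derivation is correct
            lexicon |= set(entries_used)                   # keep its lexical entries
    return lexicon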

Lexicon generation [Zettlemoyer and Collins, 2005]
Logical form supervision:
  Largest state bordering California -> argmax(Type(State) ∧ Border(California), Area)
Enumerate spans: "Largest state bordering California", "Largest state", ...
Use rules to extract sub-formulas: California; Border(California); λf.f(California); Area; λx.argmax(x, Area); ...
Add the cross product of spans and sub-formulas to the lexicon
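A hypothetical GENLEX sketch following this recipe: pair every span of the utterance with every sub-formula extracted from the annotated logical form. extract_subformulas stands in for the rule-based extraction and is assumed, not defined here.

def genlex(utterance, logical_form, extract_subformulas):
    tokens = utterance.split()
    spans = [" ".join(tokens[i:j])
             for i in range(len(tokens))
             for j in range(i + 1, len(tokens) + 1)]        # all contiguous spans
    subformulas = extract_subformulas(logical_form)         # rule-based sub-formula extraction
    return {(span, sub) for span in spans for sub in subformulas}   # cross product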

Lexicon generation
Denotation supervision:
  Largest state bordering California -> Arizona
Enumerate spans: "Largest state bordering California", "Largest state", ...
Generate sub-formulas from the KB: California; Border(California); Traverse; Type.Mountain; λx.argmax(x, Elevation); ...
Restrict candidates with alignment, string matching, ...
Fancier methods exist (coarse-to-fine)

Unification [Kwiatkowski et al., 2010]
Logical form supervision. Initialize the lexicon with $(x_i, z_i)$:
  States bordering California -> Type(State) ∧ Border(California)
Split lexical entries in all possible ways:
  Enumerate spans: (states, bordering california), (states bordering, california)
  Generate sub-formulas: (Type(State), λx.x ∧ Border(California)); (λx.Type(State) ∧ x, Border(California)); (λf.Type(State) ∧ f(California), California); ...

Unification [Kwiatkowski et al., 2010]
For each example $(x_i, z_i)$:
  Find the highest-scoring correct parse $d$
  Split all lexical entries in $d$ in all possible ways
  Add to the lexicon the lexical entry that improves the parse score the most

Do we need a lexicon?
(Figure: derivations of Type(State) ∧ Border(California) over the utterance "California neighbors", in which the predicates Type, State and Border float, i.e. are not anchored to particular words.)
Floating parse tree: a generalization of bridging
Perhaps with better learning and search a lexicon is not necessary?

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Supervision signals
We discussed training from logical forms and denotations.
Other forms of supervision have been proposed: demonstrations, distant supervision, conversations, unsupervised learning, paraphrasing.

Training from demonstrations [Artzi and Zettlemoyer, 2013]
Input: $(x_i, s_i, t_i)$, where $x_i$ is an utterance, $s_i$ a start state, and $t_i$ an end state
  "move forward until you reach the intersection" -> λa.move(a) ∧ dir(a, forward) ∧ ...
An instance of learning from denotations

Distant supervision [Reddy et al., 2014]
Data generation: decompose declarative text into questions and answers
  "James Cameron is the director of Titanic" -> Q: "X is the director of Titanic", A: James Cameron
Declarative text is cheap!

Distant supervision [Reddy et al., 2014]
Training: use existing non-executable semantic parsers
  "X is the director of Titanic" -> λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e, Titanic)
Grounded candidates and their denotations:
  λx.director(x) ∧ FilmDirectedBy(e, x) ∧ FilmDirected(e, Titanic) -> James Cameron -> true
  λx.producer(x) ∧ FilmProducedBy(e, x) ∧ FilmProduced(e, Titanic) -> James Cameron, Jon Landau -> false
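A tiny sketch of the check this implies: a grounded candidate is labeled correct if executing it returns exactly the entities mentioned in the declarative sentence. execute_kb is a placeholder for a Freebase-style executor, not a real API.

def distant_label(grounded_logical_form, mentioned_entities, execute_kb):
    # True iff the candidate's denotation matches the entities in the sentence
    return set(execute_kb(grounded_logical_form)) == set(mentioned_entities)

# e.g. the FilmDirectedBy candidate yields {"James Cameron"} -> True,
#      the FilmProducedBy candidate yields {"James Cameron", "Jon Landau"} -> False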

Training from conversations
(The slide shows an example conversation.)
Candidate logical forms:
  z_1: From(Atlanta) ∧ To(London)
  z_2: From(Atlanta) ∧ From(London)
  z_3: To(Atlanta) ∧ To(London)
  z_4: To(Atlanta) ∧ From(London)
Define a loss: Does z align with the conversation? Does z obey domain constraints?

Unsupervised learning [Goldwasser et al., 2011]
Intuition: assume repeating patterns are correct
Input: $\{x_i\}_{i=1}^{n}$; Output: $\theta$
$\theta$ initialized manually, $S = \emptyset$
Until a stopping criterion is met:
  for each example $x_i$: $S \leftarrow S \cup \{(x_i, \arg\max_d p_\theta(d \mid x_i))\}$
  Compute statistics of $S$
  $S_{\mathrm{conf}} \leftarrow$ find a confident subset
  Train on $S_{\mathrm{conf}}$
Substantially lower performance
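A self-training sketch of this loop. parse_best, confidence and retrain are placeholders for the model's best parse, the confidence statistics and the training step described above.

def unsupervised_train(utterances, theta, parse_best, confidence, retrain, rounds=10):
    for _ in range(rounds):                                          # until a stopping criterion is met
        S = [(x_i, parse_best(x_i, theta)) for x_i in utterances]    # self-labeled (x, d) pairs
        S_conf = [pair for pair in S if confidence(pair, S)]         # keep the confident subset
        theta = retrain(S_conf, theta)                               # train on S_conf as if it were gold
    return theta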

Paraphrasing [B. and Liang, 2014]
Input utterance x: What languages do people in Brazil use?
Candidate logical forms z: Type.HumanLanguage ⊓ LanguagesSpoken.Brazil; ...; CapitalOf.Brazil
Canonical utterances c generated from the candidates: "What language is the language of Brazil?"; ...; "What city is the capital of Brazil?"
A paraphrase model scores each canonical utterance against x; executing the winning logical form yields the answer (Portuguese, ...).
Model: $p_\theta(c, z \mid x) = p(z \mid x)\, p_\theta(c \mid x)$
Idea: train a large paraphrase model $p_\theta(c \mid x)$
More later
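A sketch of this pipeline: generate (canonical utterance, logical form) candidates, rank them with a paraphrase score against the input, and execute the best logical form. generate_candidates, paraphrase_score and execute_kb are placeholders, not the actual components of the paper.

def answer_by_paraphrasing(x, generate_candidates, paraphrase_score, execute_kb):
    candidates = generate_candidates(x)        # list of (canonical_utterance, logical_form) pairs
    best_c, best_z = max(candidates, key=lambda cz: paraphrase_score(x, cz[0]))
    return execute_kb(best_z)                  # e.g. {"Portuguese", ...}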

Summary
We saw how to train from denotations and logical forms
We saw methods for inducing lexicons during training
We reviewed work on using even weaker forms of supervision
Still an open problem
