Learning: Overview, Details, Example, Lexicon learning, Supervision signals


Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Supervision in syntactic parsing
Input: sentences annotated with their parse trees (the slide shows a tree for "ESSLLI 2016, the known summer school, is located in Bolzano").
Output: a parser that maps a new sentence such as "They play football" to its parse tree.

Supervision in semantic parsing [Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al., 2010; Liang et al., 2011]
Input with heavy supervision (utterances paired with logical forms):
  How tall is Lebron James? -> HeightOf.LebronJames
  What is Steph Curry's daughter called? -> ChildrenOf.StephCurry ⊓ Gender.Female
  Youngest player of the Cavaliers -> argmin(PlayerOf.Cavaliers, BirthDateOf)
  ...
Input with light supervision (utterances paired with denotations):
  How tall is Lebron James? -> 203cm
  What is Steph Curry's daughter called? -> Riley Curry
  Youngest player of the Cavaliers -> Kyrie Irving
  ...
Output: a semantic parser, e.g. one that maps "Clay Thompson's weight" (also "ClayThompson's Weight", "Clay Thompson weight") to the logical form WeightOf.ClayThompson / Weight.ClayThompson and its denotation, 205 lbs.

Learning in a nutshell (pipeline: utterance -> parsing -> label -> update model)
0. Define a model over derivations
1. Generate candidate derivations by parsing (details later)
2. Label derivations as correct or incorrect
3. Update the model to favor correct trees

Training intuition
Where did Mozart tupress?  (answer: Vienna)
  PlaceOfBirth.WolfgangMozart -> Salzburg
  PlaceOfDeath.WolfgangMozart -> Vienna
  PlaceOfMarriage.WolfgangMozart -> Vienna
Where did Hogarth tupress?  (answer: London)
  PlaceOfBirth.WilliamHogarth -> London
  PlaceOfDeath.WilliamHogarth -> London
  PlaceOfMarriage.WilliamHogarth -> Paddington
Across both examples, only PlaceOfDeath yields the correct answer every time, so the denotations alone push the model toward interpreting the unknown word "tupress" as place of death.
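The following minimal sketch (plain Python, with the knowledge base hard-coded from the slide's facts) illustrates this intuition: keep only the relations whose execution matches the observed answer in every training example.

# Hedged sketch of the training intuition above; the KB facts and examples are
# taken directly from the slide, everything else is illustrative.
KB = {
    ("PlaceOfBirth", "WolfgangMozart"): "Salzburg",
    ("PlaceOfDeath", "WolfgangMozart"): "Vienna",
    ("PlaceOfMarriage", "WolfgangMozart"): "Vienna",
    ("PlaceOfBirth", "WilliamHogarth"): "London",
    ("PlaceOfDeath", "WilliamHogarth"): "London",
    ("PlaceOfMarriage", "WilliamHogarth"): "Paddington",
}
examples = [("WolfgangMozart", "Vienna"), ("WilliamHogarth", "London")]
relations = ["PlaceOfBirth", "PlaceOfDeath", "PlaceOfMarriage"]

# Keep only the relations consistent with the denotation of every example.
consistent = [r for r in relations
              if all(KB[(r, entity)] == answer for entity, answer in examples)]
print(consistent)   # ['PlaceOfDeath']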

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Constructing derivations
  Type.Person PlaceLived.Chicago        (intersect)
    Type.Person                         (lexicon: "people")
    ["who" is skipped]
    PlaceLived.Chicago                  (join)
      PlaceLived                        (lexicon: "lived in")
      Chicago                           (lexicon: "Chicago")

Many possible derivations!
x = "people who have lived in Chicago?"  ->  set of candidate derivations D(x)
Derivation 1:
  Type.Person PlaceLived.Chicago        (intersect)
    Type.Person                         (lexicon: "people")
    PlaceLived.Chicago                  (join)
      PlaceLived                        (lexicon: "lived in")
      Chicago                           (lexicon: "Chicago")
Derivation 2:
  Type.Org PresentIn.ChicagoMusical     (intersect)
    Type.Org                            (lexicon: "people")
    PresentIn.ChicagoMusical            (join)
      PresentIn                         (lexicon: "lived in")
      ChicagoMusical                    (lexicon: "Chicago")

Scoring derivations (for the Type.Person PlaceLived.Chicago derivation above)
x: utterance, d: derivation
Feature vector and parameters in $\mathbb{R}^{F}$: $\phi(x, d)$ and $\theta$ (learned)
Example features: apply join; apply intersect; apply lexicon; "lived" maps to PlaceLived; "lived" maps to PlaceOfBirth; "born" maps to PlaceOfBirth
$\mathrm{score}_\theta(x, d) = \phi(x, d) \cdot \theta$
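As a concrete illustration, here is a minimal sketch of a sparse feature vector and the dot-product score above; the feature names follow the slide, while the weights and values are invented for the example.

# Hedged sketch: sparse phi(x, d) and score_theta(x, d) = phi(x, d) . theta.
phi = {
    "apply-join": 1.0,
    "apply-intersect": 1.0,
    "apply-lexicon": 3.0,              # fired once per lexicon entry used
    "lived->PlaceLived": 1.0,
}
theta = {                              # illustrative learned weights
    "apply-join": 0.2,
    "apply-intersect": 0.1,
    "apply-lexicon": 0.05,
    "lived->PlaceLived": 1.5,
    "lived->PlaceOfBirth": -0.7,
}

def score(phi, theta):
    # sparse dot product
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

print(score(phi, theta))   # 1.95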

Deep learning alert!
The feature vector φ(x, d) is constructed by hand
Constructing good features is hard
Algorithms are likely to do it better
Perhaps we can train φ(x, d): φ(x, d) = F_ψ(x, d), where ψ are the parameters

Log-linear model
Candidate derivations: D(x)
Model: a distribution over derivations d given utterance x:
$p_\theta(d \mid x) = \frac{\exp(\mathrm{score}_\theta(x, d))}{\sum_{d' \in D(x)} \exp(\mathrm{score}_\theta(x, d'))}$
Example: $\mathrm{score}_\theta(x, d) = [1, 2, 3, 4]$ gives
$p_\theta(d \mid x) = \left[\tfrac{e}{e + e^2 + e^3 + e^4},\ \tfrac{e^2}{e + e^2 + e^3 + e^4},\ \tfrac{e^3}{e + e^2 + e^3 + e^4},\ \tfrac{e^4}{e + e^2 + e^3 + e^4}\right]$
Parsing: find the top-k derivation trees $D_\theta(x)$
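A minimal sketch of this softmax over candidate scores (assuming the scores are already computed, e.g. with the score function sketched above); the example reproduces the slide's numbers.

import math

def derivation_distribution(scores):
    # p_theta(d | x) = exp(score(x, d)) / sum_d' exp(score(x, d'))
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

print(derivation_distribution([1, 2, 3, 4]))
# approximately [0.032, 0.087, 0.237, 0.644], i.e. proportional to e : e^2 : e^3 : e^4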

Features
Dense features: intersection=0.67, ent-popularity:high, denotation-size:1
Sparse features: bridge-binary:study, born:PlaceOfBirth, city:Type.Location
Syntactic features: ent-pos:NNP NNP, join-pos:V NN, skip-pos:IN
Grammar features: Binary->Verb

Learning θ: maximum likelihood
Training data (denotations or logical forms):
  What's Bulgaria's capital? -> Sofia
  What movies has Tom Cruise been in? -> TopGun, VanillaSky, ...
  What's Bulgaria's capital? -> CapitalOf.Bulgaria
  What movies has Tom Cruise been in? -> Type.Movie ⊓ HasPlayed.TomCruise
  ...
Objective:
$\arg\max_\theta \sum_{i=1}^{n} \log p_\theta(y^{(i)} \mid x^{(i)}) = \arg\max_\theta \sum_{i=1}^{n} \log \sum_{d^{(i)}} p_\theta(d^{(i)} \mid x^{(i)})\, R(d^{(i)})$
Possible reward functions:
$R(d) = \begin{cases} 1 & d.z = z^{(i)} \\ 0 & \text{otherwise} \end{cases}$
$R(d) = \begin{cases} 1 & [\![d.z]\!]_K = y^{(i)} \\ 0 & \text{otherwise} \end{cases}$
$R(d) = F_1([\![d.z]\!]_K,\ y^{(i)})$
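A hedged sketch of these three reward functions; execute_kb (mapping a logical form to an answer set) is a placeholder, not an actual SEMPRE call.

def reward_logical_form(derivation_z, gold_z):
    # R(d) = 1 iff the derivation's logical form d.z equals the annotated form z(i)
    return 1.0 if derivation_z == gold_z else 0.0

def reward_denotation(derivation_z, gold_y, execute_kb):
    # R(d) = 1 iff executing d.z against the KB yields exactly the gold answer y(i)
    return 1.0 if set(execute_kb(derivation_z)) == set(gold_y) else 0.0

def reward_f1(derivation_z, gold_y, execute_kb):
    # Partial credit: F1 between the predicted and gold answer sets
    pred, gold = set(execute_kb(derivation_z)), set(gold_y)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)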

Optimization: stochastic gradient descent
For every example:
$O(\theta) = \log \sum_d p_\theta(d \mid x)\, R(d)$
$\nabla O(\theta) = \mathbb{E}_{q_\theta(d \mid x)}[\phi(x, d)] - \mathbb{E}_{p_\theta(d \mid x)}[\phi(x, d)]$
where
$p_\theta(d \mid x) \propto \exp(\phi(x, d) \cdot \theta)$
$q_\theta(d \mid x) \propto \exp(\phi(x, d) \cdot \theta)\, R(d)$, i.e. $q_\theta$ is $p_\theta$ reweighted by $R$ and renormalized
Example: $p_\theta(D(x)) = [0.2, 0.1, 0.1, 0.6]$ and $R(D(x)) = [1, 0, 0, 1]$ give $q_\theta(D(x)) = [0.25, 0, 0, 0.75]$
Gradient: $0.05\,\phi(x, d_1) - 0.1\,\phi(x, d_2) - 0.1\,\phi(x, d_3) + 0.15\,\phi(x, d_4)$
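A small sketch of this per-example gradient, reproducing the slide's numbers; feature vectors are sparse dicts as in the earlier score sketch.

def renormalize(p, rewards):
    # q_theta(d | x) is proportional to p_theta(d | x) * R(d)
    weighted = [pi * ri for pi, ri in zip(p, rewards)]
    z = sum(weighted)
    return [w / z for w in weighted]

def expected_features(dist, feature_vectors):
    # E_dist[phi(x, d)] as a sparse dict
    out = {}
    for prob, phi in zip(dist, feature_vectors):
        for f, v in phi.items():
            out[f] = out.get(f, 0.0) + prob * v
    return out

def gradient(p, rewards, feature_vectors):
    # grad O(theta) = E_q[phi] - E_p[phi]
    q = renormalize(p, rewards)
    e_q = expected_features(q, feature_vectors)
    e_p = expected_features(p, feature_vectors)
    return {f: e_q.get(f, 0.0) - e_p.get(f, 0.0) for f in set(e_q) | set(e_p)}

# Indicator features per derivation recover the coefficients on the slide:
phi_indicator = [{"d%d" % i: 1.0} for i in range(1, 5)]
print(gradient([0.2, 0.1, 0.1, 0.6], [1, 0, 0, 1], phi_indicator))
# {'d1': 0.05, 'd2': -0.1, 'd3': -0.1, 'd4': 0.15} (up to floating-point noise)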

Training
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  $D(x_i) \leftarrow$ top-$K$ derivations under $p_\theta(d \mid x_i)$
  $\theta \leftarrow \theta + \eta_{\tau,i}\,\big(\mathbb{E}_{q_\theta(d \mid x_i)}[\phi(x_i, d)] - \mathbb{E}_{p_\theta(d \mid x_i)}[\phi(x_i, d)]\big)$
$\eta_{\tau,i}$: learning rate
Regularization often added (L2, L1, ...)
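A sketch of this loop, reusing score, derivation_distribution, reward_denotation and gradient from the earlier sketches. parse_topk is an assumed beam parser returning (feature_dict, logical_form) pairs for D(x_i), and the learning rate is simplified to a constant.

def train(examples, parse_topk, execute_kb, epochs=5, learning_rate=0.1):
    theta = {}                                        # theta <- 0
    for _ in range(epochs):                           # iterations tau
        for x_i, y_i in examples:                     # examples i
            candidates = parse_topk(x_i, theta)       # D(x_i): top-K derivations
            if not candidates:
                continue
            feats = [phi for phi, _ in candidates]
            p = derivation_distribution([score(phi, theta) for phi in feats])
            r = [reward_denotation(z, y_i, execute_kb) for _, z in candidates]
            if sum(r) == 0:
                continue                              # no correct derivation on the beam
            for f, v in gradient(p, r, feats).items():
                theta[f] = theta.get(f, 0.0) + learning_rate * v   # gradient ascent step
    return theta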

Training (structured perceptron)
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  $\hat{d} \leftarrow \arg\max_d\, p_\theta(d \mid x_i)$
  $d^{*} \leftarrow \arg\max_d\, q_\theta(d \mid x_i)$
  if $[\![d^{*}]\!]_K \neq [\![\hat{d}]\!]_K$: $\theta \leftarrow \theta + \phi(x_i, d^{*}) - \phi(x_i, \hat{d})$
Regularization often added with weight averaging
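A minimal structured-perceptron sketch of this update: move theta toward the best correct derivation d* and away from the model's current best d-hat when their denotations differ. parse_topk, score and execute_kb are the same assumed helpers as above.

def perceptron_epoch(examples, theta, parse_topk, execute_kb):
    for x_i, y_i in examples:
        candidates = parse_topk(x_i, theta)          # (feature_dict, logical_form) pairs
        if not candidates:
            continue
        scored = [(score(phi, theta), phi, z) for phi, z in candidates]
        d_hat = max(scored, key=lambda t: t[0])      # model's best derivation
        correct = [t for t in scored if set(execute_kb(t[2])) == set(y_i)]
        if not correct:
            continue
        d_star = max(correct, key=lambda t: t[0])    # best derivation with a correct denotation
        if set(execute_kb(d_star[2])) != set(execute_kb(d_hat[2])):
            for f, v in d_star[1].items():
                theta[f] = theta.get(f, 0.0) + v     # + phi(x_i, d*)
            for f, v in d_hat[1].items():
                theta[f] = theta.get(f, 0.0) - v     # - phi(x_i, d-hat)
    return theta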

Training
Other simple variants exist, e.g. cost-sensitive max-margin training: find pairs of good and bad derivations that look different but have similar scores, and update on those.

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Example (sempre demo)
  -Grammar.inPaths esslli 2016/class3 demo.grammar \
  -SimpleLexicon.inPaths esslli 2016/class3 demo.lexicon
  (loadgraph geo880/geo880.kg)
Utterances to try:
  size of california
  size capital california
  size of capital of california
  california size

Exercise
Find a pair of natural language utterances that cannot be distinguished using the current feature representation.
The utterances need not be fully grammatical English.
You can ignore the denotation feature if that helps.
Verify this in sempre (ask me how to disable features).
Design a feature that will solve this problem.

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

The lexicon problem
How is the lexicon generated? Annotation, exhaustive search, string matching, supervised alignment, unsupervised alignment, learning.

Training (same loop as before, now with a lexicon-expansion step)
Input: $\{x_i, y_i\}_{i=1}^{n}$; Output: $\theta$
$\theta \leftarrow 0$
for iteration $\tau$ and example $i$:
  Add lexicon entries
  $D(x_i) \leftarrow$ top-$K$ derivations under $p_\theta(d \mid x_i)$
  $\theta \leftarrow \theta + \eta_{\tau,i}\,\big(\mathbb{E}_{q_\theta(d \mid x_i)}[\phi(x_i, d)] - \mathbb{E}_{p_\theta(d \mid x_i)}[\phi(x_i, d)]\big)$
$\eta_{\tau,i}$: learning rate
Regularization often added (L2, L1, ...)

Adding lexicon entries [adapted from the semantic parsing tutorial, Artzi et al.]
Input: training example $(x_i, y_i)$, current lexicon $\Lambda$, model $\theta$
$\Lambda_{\mathrm{temp}} \leftarrow \Lambda \cup \mathrm{GENLEX}(x_i, y_i)$: create an expanded temporary lexicon
$D(x_i) \leftarrow$ top-$K$ derivations under $p_{\theta,\Lambda_{\mathrm{temp}}}(d \mid x_i)$: parse with the temporary lexicon
$\Lambda \leftarrow \Lambda \cup \{l : l \in \hat{d},\ \hat{d} \in D(x_i),\ R(\hat{d}) = 1\}$: add entries from correct trees
Overgenerate lexical entries and add the promising ones
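A schematic sketch of this procedure. GENLEX, parse_topk and the reward R are left abstract (their real definitions depend on the grammar formalism); lexicon entries are (phrase, logical_form) pairs and the parser is assumed to return (entries_used, derivation) pairs for its top-K trees.

def update_lexicon(x_i, y_i, lexicon, theta, genlex, parse_topk, reward):
    temp_lexicon = lexicon | genlex(x_i, y_i)              # Lambda_temp: overgenerate entries
    for entries_used, derivation in parse_topk(x_i, theta, temp_lexicon):
        if reward(derivation, y_i) == 1:                   # derivation is correct
            lexicon |= set(entries_used)                   # keep its lexical entries
    return lexicon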

Lexicon generation [Zettlemoyer and Collins, 2005]
Logical form supervision:
  Largest state bordering California -> argmax(Type(State) ∧ Border(California), Area)
Enumerate spans: "Largest state bordering California", "Largest state", ...
Use rules to extract sub-formulas: California; Border(California); λf.f(California); Area; λx.argmax(x, Area); ...
Add the cross product of spans and sub-formulas to the lexicon
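A hypothetical GENLEX sketch following this recipe: pair every span of the utterance with every sub-formula extracted from the annotated logical form. extract_subformulas stands in for the rule-based extraction and is assumed, not defined here.

def genlex(utterance, logical_form, extract_subformulas):
    tokens = utterance.split()
    spans = [" ".join(tokens[i:j])
             for i in range(len(tokens))
             for j in range(i + 1, len(tokens) + 1)]        # all contiguous spans
    subformulas = extract_subformulas(logical_form)         # rule-based sub-formula extraction
    return {(span, sub) for span in spans for sub in subformulas}   # cross product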

Lexicon generation
Denotation supervision:
  Largest state bordering California -> Arizona
Enumerate spans: "Largest state bordering California", "Largest state", ...
Generate sub-formulas from the KB: California; Border(California); Traverse; Type.Mountain; λx.argmax(x, Elevation); ...
Restrict candidates with alignment, string matching, ...
Fancier methods exist (coarse-to-fine)

Unification [Kwiatkowski et al., 2010]
Logical form supervision. Initialize the lexicon with $(x_i, z_i)$:
  States bordering California -> Type(State) ∧ Border(California)
Split lexical entries in all possible ways:
  Enumerate spans: (states, bordering california), (states bordering, california)
  Generate sub-formulas: (Type(State), λx.x ∧ Border(California)); (λx.Type(State) ∧ x, Border(California)); (λf.Type(State) ∧ f(California), California); ...

Unification [Kwiatkowski et al., 2010]
For each example $(x_i, z_i)$:
  Find the highest-scoring correct parse $d$
  Split all lexical entries in $d$ in all possible ways
  Add to the lexicon the lexical entry that improves the parse score the most

Do we need a lexicon?
(Figure: derivations of Type(State) ∧ Border(California) over the utterance "California neighbors", in which the predicates Type, State and Border float, i.e. are not anchored to particular words.)
Floating parse tree: a generalization of bridging
Perhaps with better learning and search a lexicon is not necessary?

Outline
Learning: Overview, Details, Example, Lexicon learning, Supervision signals

Supervision signals
We discussed training from logical forms and denotations.
Other forms of supervision have been proposed: demonstrations, distant supervision, conversations, unsupervised learning, paraphrasing.

Training from demonstrations [Artzi and Zettlemoyer, 2013]
Input: $(x_i, s_i, t_i)$, where $x_i$ is an utterance, $s_i$ a start state, and $t_i$ an end state
  "move forward until you reach the intersection" -> λa.move(a) ∧ dir(a, forward) ∧ ...
An instance of learning from denotations

Distant supervision [Reddy et al., 2014]
Data generation: decompose declarative text into questions and answers
  "James Cameron is the director of Titanic" -> Q: "X is the director of Titanic", A: James Cameron
Declarative text is cheap!

Distant supervision [Reddy et al., 2014]
Training: use existing non-executable semantic parsers
  "X is the director of Titanic" -> λx.director(x) ∧ director.of.arg1(e, x) ∧ director.of.arg2(e, Titanic)
Grounded candidates and their denotations:
  λx.director(x) ∧ FilmDirectedBy(e, x) ∧ FilmDirected(e, Titanic) -> James Cameron -> true
  λx.producer(x) ∧ FilmProducedBy(e, x) ∧ FilmProduced(e, Titanic) -> James Cameron, Jon Landau -> false
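A tiny sketch of the check this implies: a grounded candidate is labeled correct if executing it returns exactly the entities mentioned in the declarative sentence. execute_kb is a placeholder for a Freebase-style executor, not a real API.

def distant_label(grounded_logical_form, mentioned_entities, execute_kb):
    # True iff the candidate's denotation matches the entities in the sentence
    return set(execute_kb(grounded_logical_form)) == set(mentioned_entities)

# e.g. the FilmDirectedBy candidate yields {"James Cameron"} -> True,
#      the FilmProducedBy candidate yields {"James Cameron", "Jon Landau"} -> False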

Training from conversations
(The slide shows an example conversation.)
Candidate logical forms:
  z_1: From(Atlanta) ∧ To(London)
  z_2: From(Atlanta) ∧ From(London)
  z_3: To(Atlanta) ∧ To(London)
  z_4: To(Atlanta) ∧ From(London)
Define a loss: Does z align with the conversation? Does z obey domain constraints?

Unsupervised learning [Goldwasser et al., 2011]
Intuition: assume repeating patterns are correct
Input: $\{x_i\}_{i=1}^{n}$; Output: $\theta$
$\theta$ initialized manually, $S = \emptyset$
Until a stopping criterion is met:
  for each example $x_i$: $S \leftarrow S \cup \{(x_i, \arg\max_d p_\theta(d \mid x_i))\}$
  Compute statistics of $S$
  $S_{\mathrm{conf}} \leftarrow$ find a confident subset
  Train on $S_{\mathrm{conf}}$
Substantially lower performance
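A self-training sketch of this loop. parse_best, confidence and retrain are placeholders for the model's best parse, the confidence statistics and the training step described above.

def unsupervised_train(utterances, theta, parse_best, confidence, retrain, rounds=10):
    for _ in range(rounds):                                          # until a stopping criterion is met
        S = [(x_i, parse_best(x_i, theta)) for x_i in utterances]    # self-labeled (x, d) pairs
        S_conf = [pair for pair in S if confidence(pair, S)]         # keep the confident subset
        theta = retrain(S_conf, theta)                               # train on S_conf as if it were gold
    return theta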

Paraphrasing [B. and Liang, 2014]
Input utterance x: What languages do people in Brazil use?
Candidate logical forms z: Type.HumanLanguage ⊓ LanguagesSpoken.Brazil; ...; CapitalOf.Brazil
Canonical utterances c generated from the candidates: "What language is the language of Brazil?"; ...; "What city is the capital of Brazil?"
A paraphrase model scores each canonical utterance against x; executing the winning logical form yields the answer (Portuguese, ...).
Model: $p_\theta(c, z \mid x) = p(z \mid x)\, p_\theta(c \mid x)$
Idea: train a large paraphrase model $p_\theta(c \mid x)$
More later
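A sketch of this pipeline: generate (canonical utterance, logical form) candidates, rank them with a paraphrase score against the input, and execute the best logical form. generate_candidates, paraphrase_score and execute_kb are placeholders, not the actual components of the paper.

def answer_by_paraphrasing(x, generate_candidates, paraphrase_score, execute_kb):
    candidates = generate_candidates(x)        # list of (canonical_utterance, logical_form) pairs
    best_c, best_z = max(candidates, key=lambda cz: paraphrase_score(x, cz[0]))
    return execute_kb(best_z)                  # e.g. {"Portuguese", ...}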

Summary
We saw how to train from denotations and logical forms
We saw methods for inducing lexicons during training
We reviewed work on using even weaker forms of supervision
Still an open problem
