Outline. Parsing. CKY. Approximations. Some tricks. Learning agenda-based parsers


1 Outline: Parsing (CKY), Approximations, Some tricks, Learning agenda-based parsers

3 Parsing We need to compute $\arg\max_{d \in D(x)} p_\theta(d \mid x)$. Inference: find the best tree given the model. Learning: compute the gradient $\nabla O(\theta) = E_{q_\theta(d \mid x)}[\phi(x, d)] - E_{p_\theta(d \mid x)}[\phi(x, d)]$.

4 Outline: Parsing (CKY), Approximations, Some tricks, Learning agenda-based parsers

5 Finding best tree Can we use CKY? Only if the score decomposes over rule productions: $f(x, d) = \sum_{r \in d} g(x, r)^\top \theta$, where $r = (A, B, C, i, j, k)$ is a rule production. Why not? Is $g$ sufficient?
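For reference, a minimal sketch of the CKY recurrence this decomposition enables, in the chart notation $\pi(i, j, A)$ used later in the deck (max-product form; additivity over rules is what lets the max distribute over subtrees):

$$\pi(i, j, A) = \max_{A \to B\,C,\; i < k < j} \Big[ g(x, (A, B, C, i, j, k))^\top \theta + \pi(i, k, B) + \pi(k, j, C) \Big]$$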

7 Finding best tree Maybe we can only have type features? [Figure: two derivations of "won country": Root: WonElection(Type(Country)) and Root: WonAward(Type(Country)), each combining a Binary over "won" with Set: Type(Country) over "country".] Type features: binary-unary type match. We can get exact decoding if we augment the state: Binary-Person-Country, Binary-Person-Award, Set-Country, Set-Award. Explosion in grammar size! If the number of types is T, then the number of nonterminals N grows by at least a factor of $T^2$.

8 Finding best tree In practice, many features depend on the logical form itself, e.g., bridging features. If we could hold the logical form as part of the state we could use dynamic programming, but usually that is not possible.

10 Logical form features
$$E_{q_\theta(d \mid x)}[\phi(x, d)] = \sum_{d \in D(x)} \frac{\exp(\phi(x, d)^\top \theta)\, R(d)}{\sum_{d'} \exp(\phi(x, d')^\top \theta)\, R(d')}\, \phi(x, d)$$
$R(d)$ depends on the entire logical form, so the sum cannot be decomposed - too large! Do approximate inference!

11 Outline: Parsing (CKY), Approximations, Some tricks, Learning agenda-based parsers

34 Beam parsing Keep top-k derivations in every chart cell. Combine chart cells in all possible ways. [Figure: chart over "abraham lincoln born in", filled span by span; cells may hold hundreds of candidate derivations (e.g., 20, 391, 508) before being pruned to the top K.] Building a chart cell is now $O(K^2 + L)$ and not $O(1)$. Assume the root beam $\hat{D}(x) \subseteq D(x)$; then
$$E_{p_\theta}[\phi(x, d)] \approx \sum_{d \in \hat{D}(x)} p_\theta(d)\, \phi(x, d), \qquad E_{q_\theta}[\phi(x, d)] \approx \sum_{d \in \hat{D}(x)} q_\theta(d)\, \phi(x, d)$$
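A minimal Python sketch of this beam-CKY loop, under stated assumptions: the helpers lexicon_derivations (derivations built directly from a phrase), combine (binary rule applications to a pair of derivations), and score (the model score $\phi(x, d)^\top \theta$) are hypothetical stand-ins for the grammar and model, not from the source.

import itertools

def beam_cky(tokens, K, lexicon_derivations, combine, score):
    """Chart parsing that keeps only the top-K derivations per span."""
    n = len(tokens)
    chart = {}
    # Base case: every span starts with derivations built from the lexicon.
    for i in range(n):
        for j in range(i + 1, n + 1):
            chart[(i, j)] = sorted(lexicon_derivations(tokens[i:j]),
                                   key=score, reverse=True)[:K]
    # Combine adjacent spans bottom-up, shorter spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                for left, right in itertools.product(chart[(i, k)], chart[(k, j)]):
                    cell.extend(combine(left, right))  # up to K^2 pairs per split point
            # Prune the cell back to the K highest-scoring derivations.
            cell.sort(key=score, reverse=True)
            chart[(i, j)] = cell[:K]
    return chart[(0, n)]  # the root beam, an approximation of D(x)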

39 Beam parsing variants Keep top-k for every span (i, j) and category A, or keep top-k for every span (i, j). Alternatively, prune a cell by probability ratio: drop derivations more than a factor $\alpha$ less probable than the best; for $\alpha = 10$, [0.6, 0.2, 0.1, 0.05, 0.03, 0.02] is pruned to [0.6, 0.2, 0.1]. Or learn a classifier that predicts K for every chart cell [Bodenstab et al., 2011].
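The ratio pruning is a one-liner; a sketch, with the threshold taken relative to the best-scoring item in the cell:

def prune_by_ratio(probs, alpha):
    """Keep items within a factor alpha of the best probability."""
    best = max(probs)
    return [p for p in probs if p * alpha >= best]

# The slide's example: alpha = 10 keeps [0.6, 0.2, 0.1]
print(prune_by_ratio([0.6, 0.2, 0.1, 0.05, 0.03, 0.02], 10))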

40 Outline: Parsing (CKY), Approximations, Some tricks, Learning agenda-based parsers

41 Coarse pruning Most of parsing time goes to building logical forms and extracting features: extracting lexical items, executing logical forms, and suggesting binary predicates for bridging. Do 2-pass parsing: first parse only to recognize chart cells (without feature extraction), then parse again, pruning unnecessary chart cells.

46 Coarse pruning
1. Build derivations without logical forms and features (one derivation per rule application) - fast!
2. Record cell reachability for every rule application: (A, i, j) → (B, i, k), (C, k, j)
3. Collect all chart cells that are reachable from (S, 1, n)
4. Parse again, ignoring unreachable cells
We know unreachable cells do not contribute to root derivations over the sentence.
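A sketch of the reachability pass, assuming the first (featureless) pass has recorded each rule application as a tuple ((A, i, j), (B, i, k), (C, k, j)); the root argument would be the full-sentence cell, e.g. (S, 1, n) in the slide's indexing:

from collections import defaultdict

def reachable_cells(applications, root):
    """Backward reachability from the root cell over recorded rule applications."""
    children = defaultdict(list)
    for parent, left, right in applications:
        children[parent].append((left, right))
    reachable, stack = {root}, [root]
    while stack:
        cell = stack.pop()
        for left, right in children[cell]:
            for child in (left, right):
                if child not in reachable:
                    reachable.add(child)
                    stack.append(child)
    return reachable  # the second pass only builds derivations inside these cells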

48 Coarse parsing [Figure: pruned chart over "won country": the span "won" yields Binary (0.9) and Set (0.7); the span "country" yields Binary (0.1), Set (0.3), Mod (1.0), Conj (1.0); Root cells cover the full span.] Reachability: $\pi(0, 2, \text{Root})$ is built from $(\pi(0, 1, \text{Binary}), \pi(1, 2, \text{Set}))$, $(\pi(0, 1, \text{Set}), \pi(1, 2, \text{Binary}))$, or $(\pi(0, 1, \text{Set}), \pi(1, 2, \text{Mod}))$; so $\pi(0, 1, \text{Binary})$, $\pi(0, 1, \text{Set})$, $\pi(1, 2, \text{Binary})$, $\pi(1, 2, \text{Set})$, and $\pi(1, 2, \text{Mod})$ are reachable, while $\pi(1, 2, \text{Conj})$ is not.

50 Unary rules in CKY Unary rules (e.g., Set → Entity, Set → Unary) are convenient when designing grammars. Assume rules A → B and B → C, and a derivation d that puts A above C over span i:j. Why does computing $\pi(i, j, A)$ before $\pi(i, j, B)$ fail?

53 Unary rules Solution: construct an acyclic (!) directed graph G = (V, E), where V = N and the edges E are the unary rules. [Figure: graph over categories A, B, C, D.] Topologically sort the edges and apply the rules from end to start, after the binary rules. This guarantees B → C is applied before A → B. Need to verify the grammar has no unary cycles.
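A sketch of that ordering, assuming unary rules are given as (parent, child) pairs; processing rules whose parents sit lower in the graph first guarantees children are finished before their parents:

from graphlib import TopologicalSorter

def unary_rule_order(unary_rules):
    """Order unary rules so lower rules (e.g. B -> C) apply before higher ones (A -> B)."""
    graph = {}
    for parent, child in unary_rules:
        graph.setdefault(parent, set()).add(child)
    # static_order() raises CycleError if the grammar has a unary cycle
    order = list(TopologicalSorter(graph).static_order())
    position = {cat: i for i, cat in enumerate(order)}
    # graphlib emits children before parents here, so sort by parent position ascending
    return sorted(unary_rules, key=lambda rule: position[rule[0]])

# Example: rules A -> B and B -> C come out as [("B", "C"), ("A", "B")]
print(unary_rule_order([("A", "B"), ("B", "C")]))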

54 Outline: Parsing (CKY), Approximations, Some tricks, Learning agenda-based parsers

55 Learning to search [Daume et al., 2009; Ross et al., 2011; Jiang et al., 2012; Goldberg & Nivre, 2013] We have a globally-normalized model, but our search procedure is inexact and slow. Perhaps we can make it faster with transition-style methods? Let's change the learning problem so that we explicitly learn to search.

77 Fixed-order beam parsing [Figure: beam chart over "abraham lincoln born in", filled cell by cell in a fixed order, each cell pruned to K.] Example rule applications:
(lexicon) Set → Phrase: Abraham, AbeLincoln, AbrahamFilm, ...
(lexicon) Set → Phrase: AbeLincoln, LincolnTown, LincolnFilm, ...
(lexicon) Binary → Phrase: BirthPlaceOf, BirthDateOf, ReleaseDateOf, ...
(lexicon) Set → Phrase: BornBelgium, BornThisWay, BornAlbum, ...
(inters) Set → Set Set: AbeLincoln ∩ AbeLincoln, ...
(join) Set → Set Binary: BirthPlaceOf.AbeLincoln, ReleaseDateOf.LincolnFilm, ...
Many improbable derivations are generated.

90 Goal: control parsing order [Figure: derivations over "abraham lincoln born in" generated one at a time, in a chosen order, until K root derivations are built.] Example actions, in order:
(lexicon) Set → Phrase: abraham lincoln → AbeLincoln
(lexicon) Set → Phrase: abraham lincoln → LincolnMonument
(lexicon) Set → Phrase: lincoln → AbeLincoln
(lexicon) Binary → Phrase: born → BirthPlaceOf
(lexicon) Binary → Phrase: born → BirthDateOf
(lexicon) Binary → Phrase: born → OrganizationFounded
(skip) Binary → Binary Token: born in → BirthPlaceOf
(skip) Binary → Binary Token: born in → BirthDateOf
(skip) Binary → Binary Token: born in → OrganizationFounded
(join) Set → Set Binary: BirthPlaceOf.AbeLincoln
(join) Set → Set Binary: BirthDateOf.AbeLincoln


96 Agenda-based parsing [Kay, 1986; Klein and Manning, 2003; Pauls and Klein, 2009] Control the order of derivation generation! Maintain an agenda and a chart. Parsing action: choose a derivation from the agenda (based on $p_\theta(d \mid x)$). Execute the action: add the derivation to the chart, combine it with chart entries, and add the results to the agenda. Stop when there are K root derivations, rerank, and return the best.
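A minimal sketch of the agenda loop in Python (heap-based; score, combine_with_chart, and is_root are assumed helpers, and scores are negated because heapq is a min-heap):

import heapq

def agenda_parse(initial_derivations, K, score, combine_with_chart, is_root):
    """Pop the best-scoring derivation, move it to the chart, push its combinations."""
    agenda = [(-score(d), i, d) for i, d in enumerate(initial_derivations)]
    heapq.heapify(agenda)
    counter = len(agenda)  # tie-breaker so heapq never compares derivations directly
    chart, roots = [], []
    while agenda and len(roots) < K:
        _, _, d = heapq.heappop(agenda)  # parsing action: choose a derivation
        chart.append(d)                  # execute: add it to the chart
        if is_root(d):
            roots.append(d)
        for new_d in combine_with_chart(d, chart):  # combine, add results to agenda
            heapq.heappush(agenda, (-score(new_d), counter, new_d))
            counter += 1
    # Stand-in for the reranking step: return the best-scoring root derivation.
    return max(roots, key=score) if roots else None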

99 Learning a scoring function Scoring function: a distribution over full derivations $p_\theta(d \mid x)$. [Figure: complete derivations of "abraham lincoln born in", e.g., Type.City ∩ PlaceOfBirth.AbeLincoln and Type.Loc ∩ ContainedBy.LincolnTown, built bottom-up with lexicon, join, and intersect rules.] But search is over partial derivations: [Figure: partial derivations such as LincolnTown, USSLincoln, AbeLincoln, BirthplaceOf, and BirthplaceOf.Lincoln over fragments of the sentence.] Idea: explicitly learn to make good search choices.

103 Markov Decision Process [Sutton et al., 1999] State s: the derivations on the chart and agenda. Action a: choosing an agenda derivation, with probability $p_\theta(a \mid s)$. History $h = (s_1, a_1, \ldots, a_T, s_{T+1})$, with $p_\theta(h) = \prod_{t=1}^{T} p_\theta(a_t \mid s_t)$. Objective: $O(\theta) = E_{p_\theta}[\mathrm{Acc}(h)] = \sum_h p_\theta(h)\, \mathrm{Acc}(h)$. Explicitly model all parsing actions in all states.

106 Learning is hard!
$$\nabla O(\theta) = E_{p_\theta(h)}\Big[\mathrm{Acc}(h) \sum_{t=1}^{T} \nabla \log p_\theta(a_t \mid s_t)\Big]$$
Challenge: sampling from $p_\theta$ leads to slow learning. Imitation learning: follow a proposal distribution instead.
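A sketch of the corresponding Monte Carlo estimate of that gradient from sampled histories; the h.steps interface and the per-action gradient grad_log_p are assumptions, not from the source:

import numpy as np

def policy_gradient_estimate(histories, acc, grad_log_p):
    """Estimate grad O(theta) = E[Acc(h) * sum_t grad log p(a_t | s_t)] from samples."""
    grads = []
    for h in histories:
        # Sum the per-action score-function gradients over the history.
        g = sum(grad_log_p(a, s) for s, a in h.steps)
        grads.append(acc(h) * g)
    return np.mean(grads, axis=0)  # average over sampled histories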

108 Proposal distribution - local reweighting Assume an oracle derivation $d^*$:
$$q_\theta(a \mid s) \propto p_\theta(a \mid s)\, \exp\big[\beta \cdot \mathbb{I}[a \text{ in } d^*]\big]$$
Prefer sampling oracle sub-derivations.
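A sketch of the reweighted proposal over agenda actions, assuming p_actions maps each action to $p_\theta(a \mid s)$ and in_oracle is a hypothetical indicator for membership in $d^*$:

import math

def proposal(p_actions, in_oracle, beta):
    """q(a|s) proportional to p(a|s) * exp(beta * 1[a in oracle derivation])."""
    weights = [p * math.exp(beta * in_oracle(a)) for a, p in p_actions.items()]
    z = sum(weights)  # renormalize after reweighting
    return {a: w / z for a, w in zip(p_actions, weights)}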

111 Proposal distribution - compression [Figure: a history $h = (s_1, a_1, s_2, a_2, s_3, a_3, s_4, a_4, s_5)$ in which $a_1$, $a_3$, and $a_4 = d^*$ are oracle actions.] Compressed history: $c(h) = (s_1, a_1, s_3, a_3, s_4, a_4)$, and $q_\theta(\tilde{h}) = \sum_{h' : c(h') = \tilde{h}} p_\theta(h')$. Ignore non-oracle actions.

115 Proposal distribution [Figure: chart over "city lincoln born"; the answer is Hodgenville, KY.] Oracle $d^*$: Type.CityTown ∩ BirthPlaceOf.AbeLincoln, built from city → Type.CityTown, lincoln → AbeLincoln, and born → BirthPlaceOf. Prefer sampling subtrees of $d^*$.

122 Learning
Input: $\{(x_i, y_i)\}_{i=1}^{n}$. Output: $\theta$.
θ ← 0
for each iteration τ and example i:
    h₀ ← parse x_i, sampling from the model distribution
    d* ← choose an oracle from h₀
    h ← parse x_i, sampling from the proposal distribution
    update θ using the proposed history h
Test time: standard agenda-based parsing.
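A sketch of that training loop, assuming hypothetical helpers parse_sample (agenda parsing driven by a given action distribution), choose_oracle, and an update rule in the policy-gradient style above:

def train(examples, num_iters, model, choose_oracle, parse_sample, update):
    """Imitation-learning loop: sample from the model, pick an oracle, follow the proposal."""
    theta = model.init_params()  # theta <- 0
    for tau in range(num_iters):
        for x, y in examples:
            h0 = parse_sample(x, theta, dist="model")              # explore with p_theta
            d_star = choose_oracle(h0, y)                          # best derivation found
            h = parse_sample(x, theta, dist="proposal", oracle=d_star)  # guided sampling
            theta = update(theta, x, h)                            # gradient step on h
    return theta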

123 Summary Agenda-based parsing; learning to search; an oracle (imitation learning) guides learning.

124 Evaluation: WebQuestions dataset [Berant et al. 2013]
What character did Natalie Portman play in Star Wars? Padmé Amidala
What kind of money to take to Bahamas? Bahamian dollar
What currency do you use in Costa Rica? Costa Rican colón
What did Obama study in school? political science
What do Michelle Obama do for a living? writer, lawyer
What killed Sammy Davis Jr? throat cancer

125 Comparison with beam parsing (test) AgendaIL (+learn +parse) vs. FixedOrder (-learn -parse), compared on train actions and parsing time (ms). Comparable accuracy; 13 times fewer actions; 6 times faster.

126 Learning analysis (dev) AgendaIL (+learn +parse) vs. FixedOrder (-learn -parse) vs. Agenda (-learn +parse), compared on train actions and parsing time (ms). AgendaIL is faster and better.

127 Beam size analysis (dev) [Figure: dev results as a function of beam size.]

128 WebQuestions SOTA [Figure: results on WebQuestions for Yao, 2015; Yih et al., 2015; BH, 2015; this work; Reddy et al., 2016; Yih et al., 2015.]


More information

Lecture 5: Semantic Parsing, CCGs, and Structured Classification

Lecture 5: Semantic Parsing, CCGs, and Structured Classification Lecture 5: Semantic Parsing, CCGs, and Structured Classification Kyle Richardson kyle@ims.uni-stuttgart.de May 12, 2016 Lecture Plan paper: Zettlemoyer and Collins (2012) general topics: (P)CCGs, compositional

More information

Parsing. A Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 27

Parsing. A Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 27 Parsing A Parsing Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 27 Table of contents 1 Introduction 2 A parsing 3 Computation of the SX estimates 4 Parsing with the SX estimate

More information

In this chapter, we explore the parsing problem, which encompasses several questions, including:

In this chapter, we explore the parsing problem, which encompasses several questions, including: Chapter 12 Parsing Algorithms 12.1 Introduction In this chapter, we explore the parsing problem, which encompasses several questions, including: Does L(G) contain w? What is the highest-weight derivation

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

arxiv: v2 [cs.cl] 20 Aug 2016

arxiv: v2 [cs.cl] 20 Aug 2016 Solving General Arithmetic Word Problems Subhro Roy and Dan Roth University of Illinois, Urbana Champaign {sroy9, danr}@illinois.edu arxiv:1608.01413v2 [cs.cl] 20 Aug 2016 Abstract This paper presents

More information

Dependency Parsing. Statistical NLP Fall (Non-)Projectivity. CoNLL Format. Lecture 9: Dependency Parsing

Dependency Parsing. Statistical NLP Fall (Non-)Projectivity. CoNLL Format. Lecture 9: Dependency Parsing Dependency Parsing Statistical NLP Fall 2016 Lecture 9: Dependency Parsing Slav Petrov Google prep dobj ROOT nsubj pobj det PRON VERB DET NOUN ADP NOUN They solved the problem with statistics CoNLL Format

More information

Computational Linguistics

Computational Linguistics Computational Linguistics Dependency-based Parsing Clayton Greenberg Stefan Thater FR 4.7 Allgemeine Linguistik (Computerlinguistik) Universität des Saarlandes Summer 2016 Acknowledgements These slides

More information

CSC321 Lecture 5 Learning in a Single Neuron

CSC321 Lecture 5 Learning in a Single Neuron CSC321 Lecture 5 Learning in a Single Neuron Roger Grosse and Nitish Srivastava January 21, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 5 Learning in a Single Neuron January 21, 2015 1 / 14

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University October 9, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

CPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017

CPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017 CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code

More information

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22 Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm

6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm 6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm Overview The EM algorithm in general form The EM algorithm for hidden markov models (brute force) The EM algorithm for hidden markov models (dynamic

More information