Parsing: CKY, Approximations, Some Tricks, and Learning Agenda-Based Parsers
1 Outline: Parsing, CKY, Approximations, Some tricks, Learning agenda-based parsers
2-3 Parsing: We need to compute $\arg\max_{d \in D(x)} p_\theta(d \mid x)$. Inference: find the best tree given the model. Learning: computing the gradient $\nabla O(\theta) = E_{q_\theta(d \mid x)}[\phi(x, d)] - E_{p_\theta(d \mid x)}[\phi(x, d)]$.
4 Outline: Parsing, CKY, Approximations, Some tricks, Learning agenda-based parsers
5 Finding best tree: Can we use CKY? Only if the score factors over rule productions, $f(x, d) = \sum_{r \in d} g(x, r) \cdot \theta$, where $r = (A, B, C, i, j, k)$ is a rule production. Why not? Is $g$ sufficient?
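To make the factorization concrete, here is a minimal sketch of max-derivation CKY under this assumption; `grammar` (a map from child-category pairs to parent categories), `lex_score`, and `rule_score` are hypothetical hooks standing in for $g(x, r) \cdot \theta$:

```python
def cky_best(words, grammar, lex_score, rule_score):
    """Max-derivation CKY. This works only because the score is a sum
    of per-rule terms g(x, r) . theta -- no whole-logical-form features."""
    n = len(words)
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for i, w in enumerate(words):                      # lexical rules A -> w
        for A, s in lex_score(w):
            cell = chart[(i, i + 1)]
            cell[A] = max(cell.get(A, float("-inf")), s)
    for span in range(2, n + 1):                       # bottom-up over span lengths
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                  # split point
                for B, sb in chart[(i, k)].items():
                    for C, sc in chart[(k, j)].items():
                        for A in grammar.get((B, C), []):
                            s = sb + sc + rule_score(A, B, C, i, k, j)
                            cell = chart[(i, j)]
                            cell[A] = max(cell.get(A, float("-inf")), s)
    return chart[(0, n)]                               # best score per root category
```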
6-7 Finding best tree: Maybe we can only have type features? [Figure: two derivations of "won country", Root: WonElection(Type(Country)) vs. Root: WonAward(Type(Country)), each built from a Binary (WonElection / WonAward) and Set: Type(Country).] Type features: binary-unary type match. We can get exact decoding if we augment the state: Binary-Person-Country, Binary-Person-Award, Set-Country, Set-Award. But this causes an explosion in grammar size: if the number of types is $T$, then $N$ grows by at least a factor of $T^2$.
8 Finding best tree: In practice many features depend on the logical form itself, e.g., bridging features. If we could hold the logical form as part of the state we could use dynamic programming, but usually that is not possible.
9-10 Logical form features: $E_{q_\theta(d \mid x)}[\phi(x, d)] = \sum_{d \in D(x)} \frac{\exp(\phi(x, d) \cdot \theta)\, R(d)}{\sum_{d'} \exp(\phi(x, d') \cdot \theta)\, R(d')}\, \phi(x, d)$. $R(d)$ depends on the entire logical form - too large! Do approximate inference!
11 Outline: Parsing, CKY, Approximations, Some tricks, Learning agenda-based parsers
12-34 Beam parsing: Keep the top-$K$ derivations in every chart cell, and combine chart cells in all possible ways. [Figure, built up over slides 12-34: the chart over "abraham lincoln born in", filled bottom-up; candidate sets (e.g. 20, 391, 508 derivations) are truncated to $K$, with $K^2$ combinations considered per cell pair.] Building a chart cell is now $O(K^2 + L)$ and not $O(1)$. Assume a root beam $\hat{D}(x) \subseteq D(x)$: $E_{p_\theta}[\phi(x, d)] \approx \sum_{d \in \hat{D}(x)} p_\theta(d)\, \phi(x, d)$ and $E_{q_\theta}[\phi(x, d)] \approx \sum_{d \in \hat{D}(x)} q_\theta(d)\, \phi(x, d)$.
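A sketch of the two pieces above, under assumed hooks: `combine(dl, dr)` yields the derivations buildable from a pair of child derivations, `score(d)` returns $\phi(x, d) \cdot \theta$, and `phi(d)` returns the feature vector:

```python
import heapq
from math import exp

def beam_cell(left, right, combine, score, K):
    """Build one chart cell: cross the derivations of two child cells
    (O(K^2) pairs) and keep the top-K results."""
    candidates = []
    for dl in left:
        for dr in right:
            for d in combine(dl, dr):        # every applicable rule
                candidates.append((score(d), d))
    top = heapq.nlargest(K, candidates, key=lambda pair: pair[0])
    return [d for _, d in top]

def beam_expectation(root_beam, phi, score):
    """E_p[phi(x, d)] approximated over the root beam D_hat(x)."""
    Z = sum(exp(score(d)) for d in root_beam)
    expectation = [0.0] * len(phi(root_beam[0]))
    for d in root_beam:
        p = exp(score(d)) / Z                # renormalized over the beam
        for f, v in enumerate(phi(d)):
            expectation[f] += p * v
    return expectation
```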
35-39 Beam parsing variants: Keep the top-$k$ for every span $(i, j)$ and category $A$, or keep the top-$k$ for every span $(i, j)$. Alternatively, prune a cell with a probability ratio, keeping only derivations within a factor $\alpha$ of the best: for $\alpha = 10$, $[0.6, 0.2, 0.1, 0.05, 0.03, 0.02] \to [0.6, 0.2, 0.1]$. Or learn a classifier that predicts $K$ for every chart cell [Bodenstab et al., 2011].
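One plausible reading of the ratio rule, as a sketch (assuming each derivation carries a `prob` field):

```python
def prune_by_ratio(cell, alpha):
    """Prune a chart cell by probability ratio: keep a derivation only
    if the best probability in the cell is at most alpha times larger.
    With alpha = 10, [0.6, 0.2, 0.1, 0.05, 0.03, 0.02] -> [0.6, 0.2, 0.1]."""
    if not cell:
        return cell
    best = max(d.prob for d in cell)
    return [d for d in cell if d.prob * alpha >= best]
```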
40 Outline: Parsing, CKY, Approximations, Some tricks, Learning agenda-based parsers
41 Coarse pruning: Most of the parsing time goes into building logical forms and extracting features: extracting lexical items, executing logical forms, and suggesting binary predicates for bridging. Do 2-pass parsing: first only recognize chart cells (without feature extraction), then parse again, pruning unnecessary chart cells.
42-46 Coarse pruning (see the sketch after this list):
1. Build derivations without logical forms and features (one derivation per rule application) - fast!
2. Record cell reachability for every rule application: $(A, i, j) \to (B, i, k), (C, k, j)$.
3. Collect all chart cells that are reachable from $(S, 1, n)$.
4. Parse again, ignoring unreachable states.
We know unreachable states do not contribute to root derivations over the sentence.
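A sketch of steps 2-3, assuming the coarse pass recorded each rule application as a (parent cell, child cells) pair:

```python
def reachable_cells(rule_apps, root):
    """Collect every chart cell reachable top-down from the root cell
    (S, 1, n); the second, fine parsing pass skips all other cells."""
    children = {}
    for parent, kids in rule_apps:           # (A,i,j) -> [(B,i,k), (C,k,j)]
        children.setdefault(parent, []).extend(kids)
    reachable, stack = set(), [root]
    while stack:
        cell = stack.pop()
        if cell in reachable:
            continue
        reachable.add(cell)
        stack.extend(children.get(cell, []))
    return reachable
```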
47-48 Coarse parsing example: [Figure: coarse chart over "won country", with weighted cells Binary 0.9 / Set 0.3 over "won", Set 0.7 / Binary 0.1 / Mod 1.0 / Conj 1.0 over "country", and three Root derivations.] Rule applications recorded for the root cell: $\pi(0, 2, \text{Root}) \to (\pi(0, 1, \text{Binary}), \pi(1, 2, \text{Set}))$, $(\pi(0, 1, \text{Set}), \pi(1, 2, \text{Binary}))$, $(\pi(0, 1, \text{Set}), \pi(1, 2, \text{Mod}))$. Reachable cells: $\pi(0, 1, \text{Binary})$, $\pi(0, 1, \text{Set})$, $\pi(1, 2, \text{Binary})$, $\pi(1, 2, \text{Set})$, $\pi(1, 2, \text{Mod})$; $\pi(1, 2, \text{Conj})$ is unreachable and pruned.
49-50 Unary rules in CKY: Unary rules are convenient when designing grammars, e.g. Set -> Entity, Set -> Unary. Assume rules $A \to B$ and $B \to C$, and a derivation $d = C_{i:j}$. Why does computing $\pi(i, j, A)$ before $\pi(i, j, B)$ fail? (The derivation $A \to B \to C_{i:j}$ is missed, because $B$'s cell is not yet complete when $A$'s is built.)
51-53 Unary rules: Solution: construct an acyclic (!) directed graph $G = (V, E)$, where $V = N$ (the categories) and $E$ contains an edge for each unary rule. [Figure: a small unary-rule graph over categories A, B, C, D.] Topologically sort and apply the rules from end to start, after the binary rules. This guarantees $B \to C$ is applied before $A \to B$. Need to verify the grammar has no unary cycles (see the sketch below).
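A sketch of the ordering step: build the unary-rule graph, sort it with Kahn's algorithm, and fail loudly on a cycle. `unary_rules` is an assumed list of (parent, child) pairs:

```python
from collections import defaultdict

def unary_application_order(unary_rules):
    """Return categories in the order their cells should be finalized:
    for every rule A -> B, B comes before A, so B -> C fires before
    A -> B. Raises if the grammar has a unary cycle."""
    out, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for A, B in unary_rules:                 # one edge A -> B per rule
        out[A].append(B)
        indeg[B] += 1
        nodes |= {A, B}
    stack = [v for v in nodes if indeg[v] == 0]
    topo = []
    while stack:                             # Kahn's algorithm
        v = stack.pop()
        topo.append(v)
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    if len(topo) != len(nodes):
        raise ValueError("grammar has a unary cycle")
    return list(reversed(topo))              # apply "from end to start"
```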
54 Outline: Parsing, CKY, Approximations, Some tricks, Learning agenda-based parsers
55 Learning to search [Daume et al., 2009; Ross et al., 2011; Jiang et al., 2012; Goldberg & Nivre, 2013]: We have a globally-normalized model, but our search procedure is inexact and slow. Perhaps we can make it faster with transition-style methods? Let's change the learning problem so that we explicitly learn to search.
56-77 Fixed-order beam parsing: [Figure, built up over slides 56-77: the beam chart over "abraham lincoln born in", filled cell by cell in a fixed order; candidate sets (e.g. 20, 391, 508 derivations) are truncated to $K$, with $K^2$ combinations per cell pair.] Example derivations generated along the way:
(lexicon) Set -> Phrase: Abraham, AbeLincoln, AbrahamFilm, ...
(lexicon) Set -> Phrase: AbeLincoln, LincolnTown, LincolnFilm, ...
(lexicon) Binary -> Phrase: BirthPlaceOf, BirthDateOf, ReleaseDateOf, ...
(lexicon) Set -> Phrase: BornBelgium, BornThisWay, BornAlbum, ...
(inters) Set -> Set Set: AbeLincoln ⊓ AbeLincoln, ...
(join) Set -> Set Binary: BirthPlaceOf.AbeLincoln, ReleaseDateOf.LincolnFilm, ...
Many improbable derivations are generated.
78-90 Goal: control parsing order. [Figure, built up over slides 78-90: derivations over "abraham lincoln born in" generated one at a time in a chosen order, rather than cell by cell.]
(lexicon) Set -> Phrase: abraham lincoln -> AbeLincoln
(lexicon) Set -> Phrase: abraham lincoln -> LincolnMonument
(lexicon) Set -> Phrase: lincoln -> AbeLincoln
(lexicon) Binary -> Phrase: born -> BirthPlaceOf
(lexicon) Binary -> Phrase: born -> BirthDateOf
(lexicon) Binary -> Phrase: born -> OrganizationFounded
(skip) Binary -> Binary Token: born in -> BirthPlaceOf
(skip) Binary -> Binary Token: born in -> BirthDateOf
(skip) Binary -> Binary Token: born in -> OrganizationFounded
(join) Set -> Set Binary: BirthPlaceOf.AbeLincoln
(join) Set -> Set Binary: BirthDateOf.AbeLincoln
92-96 Agenda-based parsing [Kay, 1986; Klein and Manning, 2003; Pauls and Klein, 2009]: Control the order of derivation generation! The parser maintains an agenda and a chart. Parsing action: choose a derivation from the agenda (based on $p_\theta(d \mid x)$). Execute the action: add the derivation to the chart, combine it with chart entries, and add the results to the agenda. Stop when there are $K$ root derivations, rerank, and return the best.
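A minimal sketch of this loop (not the exact system): `score`, `combine_with_chart`, and `is_root` are assumed hooks, and `-score(d)` makes the min-heap pop the best derivation first:

```python
import heapq

def agenda_parse(initial, score, combine_with_chart, is_root, K):
    """Agenda-based parsing loop: pop the best agenda derivation,
    add it to the chart, combine, push results, stop at K roots."""
    counter = 0                              # tie-breaker: heapq never compares derivations
    agenda = []
    for d in initial:
        heapq.heappush(agenda, (-score(d), counter, d))
        counter += 1
    chart, roots = [], []
    while agenda and len(roots) < K:
        _, _, d = heapq.heappop(agenda)      # parsing action: choose a derivation
        chart.append(d)                      # execute: add it to the chart
        if is_root(d):
            roots.append(d)
        for new in combine_with_chart(d, chart):
            heapq.heappush(agenda, (-score(new), counter, new))
            counter += 1
    return roots                             # rerank these and return the best
```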
97-99 Learning a scoring function: The scoring function is a distribution over full derivations $p_\theta(d \mid x)$. [Figure: full derivation trees over "what was abraham lincoln born in", e.g. Type.City ⊓ PlaceOfBirth.AbeLincoln and Type.Loc ⊓ ContainedBy.LincolnTown, built via lexicon, join, and intersect rules.] But the search is over partial derivations: LincolnTown, USSLincoln, BirthplaceOf.Lincoln, AbeLincoln, BirthplaceOf, ... Idea: explicitly learn to make good search choices.
100-103 Markov Decision Process [Sutton et al., 1999]: [Figure: a chain $s_1 \xrightarrow{a_1} s_2 \xrightarrow{a_2} s_3 \cdots \xrightarrow{a_T} s_{T+1}$.] A state $s$ is the set of derivations on the chart and agenda; an action $a$ is choosing an agenda derivation, with probability $p_\theta(a \mid s)$. A history is $h = (s_1, a_1, \ldots, a_T, s_{T+1})$ with $p_\theta(h) = \prod_{t=1}^{T} p(a_t \mid s_t)$. Objective: $O(\theta) = E_{p_\theta}[\text{Acc}(h)] = \sum_h p_\theta(h)\, \text{Acc}(h)$. Explicitly model all parsing actions in all states.
104-106 Learning is hard! $\nabla O(\theta) = E_{p_\theta(h)}\big[\text{Acc}(h) \sum_{t=1}^{T} \nabla \log p_\theta(a_t \mid s_t)\big]$. Challenge: sampling from $p_\theta$ leads to slow learning. Imitation learning: follow a proposal distribution instead.
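For reference, a sketch of the Monte-Carlo likelihood-ratio estimate of this gradient (the slow, sample-from-$p_\theta$ baseline the slides argue against); `grad_log_p` and `h.steps` are assumed interfaces:

```python
def policy_gradient(histories, acc, grad_log_p, dim):
    """grad O(theta) estimated as the mean over sampled histories h of
    Acc(h) * sum_t grad log p_theta(a_t | s_t)."""
    grad = [0.0] * dim
    for h in histories:
        g = [0.0] * dim
        for s_t, a_t in h.steps:             # (state, action) pairs
            for f, v in enumerate(grad_log_p(a_t, s_t)):
                g[f] += v
        for f in range(dim):
            grad[f] += acc(h) * g[f]         # reward-weighted score function
    return [v / len(histories) for v in grad]
```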
107-108 Proposal distribution - local reweighting: Assume an oracle derivation $d^*$: $q_\theta(a \mid s) \propto p_\theta(a \mid s) \exp\big(\beta \cdot \mathbb{I}[a \text{ in } d^*]\big)$. [Figure: agenda actions, with oracle actions upweighted by $e^\beta$.] Prefer sampling oracle sub-derivations.
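A sketch of sampling from this reweighted proposal; `p` (the model's action probability) and `in_oracle` (a 0/1 indicator for membership in $d^*$) are assumed hooks:

```python
import random
from math import exp

def sample_action(agenda, p, in_oracle, beta):
    """Sample an agenda action from q(a|s) proportional to
    p(a|s) * exp(beta * I[a in d*]), biasing toward the oracle."""
    weights = [p(a) * exp(beta * in_oracle(a)) for a in agenda]
    return random.choices(agenda, weights=weights, k=1)[0]
```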
109-111 Proposal distribution - compression: [Figure: a history $h: s_1 \xrightarrow{a_1} s_2 \xrightarrow{a_2} s_3 \xrightarrow{a_3} s_4 \xrightarrow{a_4} s_5$, where $a_1$, $a_3$, and $a_4 = d^*$ are oracle actions.] The compressed history is $c(h) = (s_1, a_1, s_3, a_3, s_4, a_4)$, and $q_\theta(\bar{h}) = \sum_{h': c(h') = \bar{h}} p_\theta(h')$. Ignore non-oracle actions.
112-115 Proposal distribution example: [Figure: the chart for "city lincoln born", whose target derivation executes to Hodgenville, KY.] Oracle $d^* = $ Type.CityTown ⊓ BirthPlaceOf.AbeLincoln, built from city -> Type.CityTown, lincoln -> AbeLincoln, and born -> BirthPlaceOf. Prefer sampling subtrees of $d^*$.
116-122 Learning: Input: $\{(x_i, y_i)\}_{i=1}^{n}$; output: $\theta$. Initialize $\theta \leftarrow 0$. For each iteration $\tau$ and example $i$:
$h_0 \leftarrow$ parse $x_i$, sampling from the model distribution
$d^* \leftarrow$ choose an oracle from $h_0$
$h \leftarrow$ parse $x_i$, sampling from the proposal distribution
update $\theta$ using the proposed history $h$
Test time: standard agenda-based parsing. (A sketch of this loop follows.)
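The learning loop as a sketch; `parse`, `choose_oracle`, and `update` are assumed hooks, and passing the oracle switches `parse` from the model distribution to the proposal distribution:

```python
def train(examples, dim, parse, choose_oracle, update, iterations):
    """Imitation-learning loop: parse with the model, pick an oracle,
    re-parse with the proposal, update theta on the proposed history."""
    theta = [0.0] * dim                               # theta <- 0
    for _ in range(iterations):
        for x_i, y_i in examples:
            h0 = parse(x_i, theta, oracle=None)       # sample from p_theta
            d_star = choose_oracle(h0, y_i)           # oracle from h0
            if d_star is None:
                continue                              # assumed: skip if no correct derivation
            h = parse(x_i, theta, oracle=d_star)      # sample from q_theta
            theta = update(theta, h)                  # gradient step on h
    return theta
```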
123 Summary: Agenda-based parsing; learning to search; an oracle (imitation learning) guides learning.
124 Evaluation: WebQuestions dataset [Berant et al. 2013]. Example question-answer pairs (with the dataset's original wording):
What character did Natalie Portman play in Star Wars? - Padmé Amidala
What kind of money to take to Bahamas? - Bahamian dollar
What currency do you use in Costa Rica? - Costa Rican colón
What did Obama study in school? - political science
What do Michelle Obama do for a living? - writer, lawyer
What killed Sammy Davis Jr? - throat cancer
125 Comparison with beam parsing (test): [Table: train actions and time (ms) for AgendaIL (+learn +parse) and FixedOrder (-learn -parse); the AgendaIL figures are lost in this transcription, FixedOrder shows ",127" actions and 1,782 ms.] Comparable accuracy, 13 times fewer actions, 6 times faster.
126 Learning analysis (dev): [Table: train actions and time (ms) for AgendaIL (+learn +parse), FixedOrder (-learn -parse, ",259" actions, 1,972 ms), and Agenda (-learn +parse); the remaining figures are lost in this transcription.] AgendaIL is faster and better.
127 Beam size analysis (dev): [Figure: accuracy as a function of beam size; plot not recoverable from the transcription.]
128 WebQuestions SOTA: [Figure: accuracy bars for [Yao, 2015], [Yih et al., 2015], [BH, 2015], [this work], [Reddy et al., 2016], and [Yih et al., 2015].]