Quasi-Second-Order Parsing for 1-Endpoint-Crossing, Pagenumber-2 Graphs Junjie Cao, Sheng Huang, Weiwei Sun, Xiaojun Wan Institute of Computer Science and Technology Peking University September 5, 2017 1 of 41
Overview
- The Problem
- First-order Algorithm
- Second-order Algorithm
- Experiments
Semantic dependency parsing
Example: "The company that Mark wants to buy" [Figure: dependency graph over the sentence with five arg1 arcs]
- Predicate-argument analysis, bi-lexical relations
- Long-distance dependencies
- Graph-structured representations, many crossing arcs
- Not a tree: single-headed ( ), cycle-free ( )
Maximum Subgraph
Input: a directed graph G = (V, A)
Output: a subgraph G' = (V, A' \subseteq A) with maximum total weight such that G' belongs to the graph class \mathcal{G}
\[ G'(s) = \arg\max_{H \in \mathcal{G}(s, G)} \sum_{p \in H} \mathrm{ScorePart}(s, p) \]
Example: when \mathcal{G} is the class of trees, Maximum Subgraph = Maximum Spanning Tree.
Complexity: \mathcal{G} and the order of ScorePart determine the complexity of inference.
Complexity
| Graph class G | Order | Complexity | Algorithm |
| Arbitrary | 1 | O(n^2) | |
| Arbitrary | 2 | NP-hard | (Du et al., 2015) |
| Acyclic | 1 | NP-hard | (Kuhlmann and Jonsson, 2015) |
| Noncrossing | 1 | O(n^3) | (Kuhlmann and Jonsson, 2015) |
| Noncrossing | 2 | O(n^4) | (Sun et al., 2017) |
| 1-endpoint-crossing | 1 | O(n^5) | Ongoing work |
| 1-endpoint-crossing, pagenumber-2 | 1 | O(n^5) | (Cao et al., 2017) |
| 1-endpoint-crossing, pagenumber-2, C-free | 1 | O(n^4) | (Cao et al., 2017) |
| 1-endpoint-crossing, pagenumber-2, C-free | 2 | O(n^4) | This paper |
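The first row of the table (arbitrary graphs, first-order scores) can be illustrated with a minimal sketch. It assumes arc weights are given as a dictionary from (head, dependent) pairs to floats; the function name and the data layout are hypothetical and not taken from the slides. With no structural constraint, every arc can be decided independently, so keeping exactly the positive arcs maximizes the total weight, and a sentence of n words has O(n^2) candidate arcs.

```python
def max_subgraph_first_order(weights):
    """Return the arcs of a maximum-weight subgraph of an unrestricted graph.

    weights: dict mapping (head, dependent) -> float (hypothetical layout).
    With arc-factored scores and no constraint on the graph class, each arc
    is kept if and only if its weight is positive.
    """
    return {arc for arc, w in weights.items() if w > 0}


def total_weight(arcs, weights):
    """Sum the first-order weights of the selected arcs."""
    return sum(weights[arc] for arc in arcs)


example_weights = {(1, 2): 0.5, (2, 1): -1.0, (3, 1): 2.0}
selected = max_subgraph_first_order(example_weights)
```

Once the graph class is restricted (noncrossing, 1-endpoint-crossing, and so on), arcs can no longer be chosen independently, which is where the dynamic programs in the remaining rows come in.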
1-Endpoint-Crossing Graphs
Definition: A dependency graph is 1-endpoint-crossing (1EC) if, for any edge e, all edges that cross e share an endpoint p, called the pencil point.
Example: "The company that Mark wants to buy" [Figure: dependency graph over the sentence with five arg1 arcs]
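The definition can be checked directly with a short sketch. It assumes edges are given as pairs of word positions and that arc direction is irrelevant for crossing; the function names are illustrative only. The check is a plain quadratic scan, not the parsing algorithm of the talk.

```python
def crosses(e, f):
    """True if the two edges cross when drawn above the sentence."""
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def is_one_endpoint_crossing(edges):
    """Check that, for every edge, all edges crossing it share an endpoint."""
    spans = {tuple(sorted(e)) for e in edges}
    for e in spans:
        crossing = [f for f in spans if crosses(e, f)]
        if not crossing:
            continue  # an uncrossed edge satisfies the condition trivially
        common = set(crossing[0])
        for f in crossing[1:]:
            common &= set(f)
        if not common:
            return False  # no shared pencil point for the edges crossing e
    return True
```

For instance, the edges (1, 4), (2, 5), (3, 5) form a 1EC graph because both edges crossing (1, 4) share the endpoint 5, whereas (1, 4), (2, 5), (3, 6) do not.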
Pagenumber-k Graphs
Definition: A dependency graph G is a pagenumber-k graph if G consists of at most k subgraphs called pages. Each page contains all vertices, but only a subset of the arcs, none of which crosses another arc on the same page.
Pagenumber-k Graph: Example
[Figure: arg1 arcs drawn over "The company that Mark wants to buy". Caption: A Pagenumber-2 Graph]
Pagenumber-k Graph: Example
[Figure: arg1 arcs drawn over "The company that Mark wants to buy". Caption: A Pagenumber-3 Graph]
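Whether a graph fits on two pages can be tested with a small sketch, assuming the vertex order is fixed by the word positions. Under that assumption, a 2-page assignment exists exactly when the conflict graph (one node per edge, adjacency meaning "these two edges cross") is 2-colorable. The function names are hypothetical, and the breadth-first coloring is a standard technique rather than anything specific to the talk.

```python
from collections import deque


def crosses(e, f):
    """True if the two edges cross when drawn on the same page."""
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def two_page_assignment(edges):
    """Return {edge: page} with pages 0 and 1, or None if two pages do not suffice."""
    spans = sorted({tuple(sorted(e)) for e in edges})
    page = {}
    for start in spans:
        if start in page:
            continue
        page[start] = 0
        queue = deque([start])
        while queue:
            e = queue.popleft()
            for f in spans:
                if not crosses(e, f):
                    continue
                if f not in page:
                    page[f] = 1 - page[e]  # crossing edges go on opposite pages
                    queue.append(f)
                elif page[f] == page[e]:
                    return None  # odd cycle of crossings: needs a third page
    return page
```

Three pairwise-crossing edges such as (1, 4), (2, 5), (3, 6) are the smallest case that needs a third page.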
Coverage
| PN 2 | 1EC | EnjuBank | DeepBank | PCEDT | CCGBank |
| Yes | Both | 99.53% | 99.69% | 98.39% | 98.09% |
| Both | Yes | 97.28% | 97.67% | 97.53% | 95.73% |
| Yes | Yes | 97.28% | 97.67% | 97.53% | 95.68% |
| No | Yes | 0.0% | 0.0% | 0.0% | 0.05% |
| Yes | No | 2.25% | 2.02% | 0.86% | 2.41% |
| Sentences | | 100% | 100% | 100% | 100% |
Most semantic dependency graphs are 1EC/P2 graphs.
Theorem: The pagenumber of a 1EC graph is at most 3.
Previous Work (1)
| Graph class G | Order | Complexity | Algorithm |
| Arbitrary | 1 | O(n^2) | |
| Arbitrary | 2 | NP-hard | (Du et al., 2015) |
| Acyclic | 1 | NP-hard | (Kuhlmann and Jonsson, 2015) |
| Noncrossing | 1 | O(n^3) | (Kuhlmann and Jonsson, 2015) |
| Noncrossing | 2 | O(n^4) | (Sun et al., 2017) |
| 1-endpoint-crossing | 1 | O(n^5) | Ongoing work |
| 1-endpoint-crossing, pagenumber-2 | 1 | O(n^5) | (Cao et al., 2017) |
| 1-endpoint-crossing, pagenumber-2, C-free | 1 | O(n^4) | (Cao et al., 2017) |
| 1-endpoint-crossing, pagenumber-2, C-free | 2 | O(n^4) | This paper |
Previous Work (2)
Key observation: Every subgraph of a 1EC/P2 graph is still a 1EC/P2 graph.
A dynamic programming algorithm: gchsw
- In each construction step, more than one arc is usually allowed to be constructed.
- Whether or not such arcs are created depends on their arc weights.
- The algorithm can derive a maximal 1EC/P2 graph, but it selects only the subgraph consisting of the positive-weight arcs.
Challenge of High-order Factorization (1)
A single step in gchsw: [Figure: a span over the vertices i, l, k, j] the arcs e_(i,k), e_(l,j) and e_(i,j) can all be created at the same time.
Eisner's algorithm: in a single step, which arc is created is deterministic!
Challenge of High-order Factorization (2)
It is very difficult to enumerate all high-order features for crossing arcs.
[Figure: a crossing configuration over vertices labeled x, r_x, i, r_i, k, l_j and j]
It is hard to cover sibling features between e_(x,k) and e_(x,r_x).
Challenge of High-order Factorization (3)
Pitler (2014): It is still possible to build accurate tree parsers by considering only higher-order features of noncrossing arcs.
Example: "The company that Mark wants to buy" [Figure: dependency graph over the sentence with five arg1 arcs]
Good news: most arcs are noncrossing, even in graphs that contain crossings.
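The "good news" can be measured on any graph with a few lines of code. The sketch below assumes edges are pairs of word positions and counts the share of arcs that no other arc crosses; the function names are hypothetical, and the toy input is made up for illustration rather than taken from the treebanks in the talk.

```python
def crosses(e, f):
    """True if the two edges cross when drawn above the sentence."""
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def noncrossing_fraction(edges):
    """Fraction of arcs that are not crossed by any other arc."""
    spans = [tuple(sorted(e)) for e in edges]
    if not spans:
        return 1.0
    free = sum(1 for e in spans if not any(crosses(e, f) for f in spans))
    return free / len(spans)


# Toy graph: one crossing pair plus two uncrossed arcs.
toy_edges = [(1, 3), (2, 4), (4, 5), (5, 6)]
toy_fraction = noncrossing_fraction(toy_edges)
```

Only sibling pairs in which both arcs fall in the uncrossed group would receive second-order features under the restriction quoted from Pitler (2014).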
Previous Work (3)
[Figure: sub-problem types O[s, e] and C[s, e, l] drawn over the span from s to e, with the decompositions [s, e] = [s + 1, e] and [s, e] = [s, k] + [k, e]]
Sub-problems of C-free 1EC/P2
- Int_O[i, j], LR[i, j, x], N_O[i, j, x], L_O[i, j, x], R_O[i, j, x]
- Int_C[i, j], N_C[i, j, x], L_C[i, j, x], R_C[i, j, x]
[Figure: each sub-problem drawn over the span from i to j, with the external vertex x where applicable]
An open structure can be transformed into a closed structure if the red arc exists.
Decomposition of Int_C
Decompose Int_C by considering the farthest arc from i:
1. No arc
2. Noncrossing edge
3. Crossing edge with outer pencil point
4. Crossing edge with inner pencil point
Decomposition of Int_C (a)
[i, j] = [i + 1, j], if there is no arc from i to (i, j).
Decomposition of Int_C (b)
[i, j] = [i, k] + [k, j], if there is a noncrossing arc from i to (i, j).
Decomposition of Int_C (c)
For a crossing arc e_(i,k) with outer pencil point pt(i,k) = x (vertex order i, k, x, j). The case depends on whether the dashed edge exists.
(c.1) [i, k, x, j] = [i, k, x] + [k, x] + [k, x, j]
(c.2) [i, k, x, j] = [i, k, x] + [k, x] + [x, j]
Decomposition of Int_C (d)
For a crossing arc e_(i,k) with inner pencil point pt(i,k) = x (vertex order i, x, k, j). The case depends on whether the dashed edge exists.
(d.1) [i, x, k, j] = [i, x] + [i, x, k] + [x, k, j]
(d.2) [i, x, k, j] = [i, x, k] + [x, k] + [x, k, j]
C-free LR Decomposition
[Figure: span from i to j with external vertex x]
(x; i, j) = (x; i, k) + (x; k, j), if there exists a k dividing [i, j] into two independent spans.
C-free LR Decomposition
For each vertex k, there must be edges from [i, k) to (k, j].
- Vertices x, i, b_1, a_1, b_2, a_2, j with b_3 = j: there exists only e_(x,b_1) or e_(x,a_2).
- Vertices x, i, b_1, a_1, b_2, a_2, b_3, j with a_3 = j: there exist both e_(x,b_1) and e_(x,b_3).
Example: "The company that Mark wants to buy" (word positions 1 2 3 4 5 6 7)
Derivation, starting from Int_O[1, 7]:
1. Int_O[1, 7] = Int_C[1, 2] + Int_O[2, 7]
2. Int_O[2, 7] = Int_C[2, 7]
3. Int_C[2, 7] = Int_C[2, 3] + Int_O[3, 7]
4. Int_O[3, 7] = R_O[3, 4; 5] + Int_O[4, 5] + L_O[5, 7; 4]
5. R_O[3, 4; 5] = Int_O[3, 4]
6. Int_O[4, 5] = Int_C[4, 5]
7. L_O[5, 7; 4] = L_C[5, 7; 4]
8. L_C[5, 7; 4] = Int_O[5, 6] + L_O[6, 7; 5]
9. L_O[6, 7; 5] = L_C[6, 7; 5]
10. L_C[6, 7; 5] = Int_O[6, 7] = Int_C[6, 7]
11. Get all arcs.
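The derivation for the example sentence can be replayed as a small rewrite table. This is only a bookkeeping sketch: the string labels and the `leaves` helper are hypothetical, and the table records how spans are split, not which arcs each step creates (the slides show those graphically).

```python
# Each entry maps a sub-problem to the sub-problems it is rewritten into,
# copied from the derivation steps of the example.
DERIVATION = {
    "Int_O[1,7]": ["Int_C[1,2]", "Int_O[2,7]"],
    "Int_O[2,7]": ["Int_C[2,7]"],
    "Int_C[2,7]": ["Int_C[2,3]", "Int_O[3,7]"],
    "Int_O[3,7]": ["R_O[3,4;5]", "Int_O[4,5]", "L_O[5,7;4]"],
    "R_O[3,4;5]": ["Int_O[3,4]"],
    "Int_O[4,5]": ["Int_C[4,5]"],
    "L_O[5,7;4]": ["L_C[5,7;4]"],
    "L_C[5,7;4]": ["Int_O[5,6]", "L_O[6,7;5]"],
    "L_O[6,7;5]": ["L_C[6,7;5]"],
    "L_C[6,7;5]": ["Int_O[6,7]"],
    "Int_O[6,7]": ["Int_C[6,7]"],
}


def leaves(item):
    """Expand a sub-problem top-down and return the terminal sub-problems in order."""
    children = DERIVATION.get(item)
    if children is None:
        return [item]  # no rewrite recorded: a terminal sub-problem
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


terminal_items = leaves("Int_O[1,7]")
```

The terminal sub-problems are unit spans that tile positions 1 to 7 from left to right, which is a quick sanity check that the derivation covers the whole sentence.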
Spurious Ambiguity
A cross-type sub-problem allows crossing arcs to be built, but does not necessarily create any. For vertices a, b, c, d, e, the same graph can therefore be reached by more than one derivation:
- Int_C[a, e] ⇒ Int_C[a, c] + Int_O[c, e].
- Int_C[a, e] ⇒ LR[a, c; d] + Int_O[c, d] + L_O[d, e; c]; LR[a, c; d] ⇒ L_O[a, b; d] + R_O[b, c; d] ⇒ Int_O[a, b] + Int_O[b, c].
Crossing-sensitive Single-side Second-order Algorithm
\[ G'(s) = \arg\max_{G^{*}} \Big( \sum_{e \in \mathrm{Edge}(G^{*})} \mathrm{Score}_1(e) + \sum_{s \in \mathrm{Sib}(G^{*})} \max(\mathrm{Score}_2(s), 0) \Big) \]
Sib(G^{*}): both sibling arcs are noncrossing.
Second-order Factorization
[Figure: decompositions of the span from s to e: [s, e] = [s + 1, e - 1]; [s, e] = [s, r_s] + [r_s, e - 1]; [s, e] = [s + 1, l_e] + [l_e, e] (shown for two arc configurations)]
Second-order Factorization
Noncrossing sibling features can only be captured by decomposing Int_C.
[Figure: nine decomposition cases of the span from i to j]
(a.1) [i, j] = [i + 1, j - 1]
(a.2) [i, j] = [i + 1, j]
(a.3) [i, j] = [i + 1, l_j] + [l_j, j]
(b.1) [i, j] = [i, j - 1]
(b.2) [i, j] = [i, r_j] + [r_j, j]
(b.3) [i, j] = [i, r_j] + [r_j, j]
(c.1) [i, j] = [i, r_i] + [r_i, j - 1]
(c.2) [i, j] = [i, r_i] + [r_i, j]
(c.3) [i, j] = [i, r_i] + [r_i, l_j] + [l_j, j]
Example: "The company that Mark wants to buy" (word positions 1 2 3 4 5 6 7)
Int_C[2, 7] = Int_C[2, 3] + Int_O[3, 7] + sib(e_(2,7), e_(2,3))
Spurious Ambiguity (1)
This model is somewhat inadequate: the second-order score function cannot penalize a bad factor. When a negative score is assigned to a second-order factor, our algorithm treats it as 0.
For vertices a, b, c, d, e, compare two derivations of the same span:
- Int_C[a, e] ⇒ Int_C[a, c] + Int_O[c, e] + S_sib(e_(a,e), e_(a,c)).
- Int_C[a, e] ⇒ LR[a, c; d] + Int_O[c, d] + L_O[d, e; c]; LR[a, c; d] ⇒ L_O[a, b; d] + R_O[b, c; d] ⇒ Int_O[a, b] + Int_O[b, c].
Spurious Ambiguity (2)
\[ G'(s) = \arg\max_{G^{*}} \Big( \sum_{e \in \mathrm{Edge}(G^{*})} \mathrm{Score}_1(e) + \sum_{s \in \mathrm{Sib}(G^{*})} \max(\mathrm{Score}_2(s), 0) \Big) \]
- Score_2(s) ≥ 0: Our algorithm selects the derivation that takes s into account, since it increases the total score.
- Score_2(s) < 0: Our algorithm avoids including s by selecting other paths. In other words, our algorithm treats this score as 0.
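The effect of the max(·, 0) term can be seen by scoring one fixed graph. The sketch below assumes the arcs and the noncrossing sibling pairs are already given, with scores stored in dictionaries; the function name, the data layout, and the numbers are hypothetical. It evaluates the objective for a single candidate graph and does not perform the arg max search.

```python
def quasi_second_order_score(edges, siblings, score1, score2):
    """Objective value of one graph under the quasi-second-order model.

    edges:    iterable of arcs, e.g. (head, dependent) pairs
    siblings: iterable of sibling factors, e.g. pairs of arcs
    score1:   dict arc -> first-order score
    score2:   dict sibling factor -> second-order score
    """
    first = sum(score1[e] for e in edges)
    # A negative sibling score is clipped to 0, so it can never lower the total.
    second = sum(max(score2[s], 0.0) for s in siblings)
    return first + second


arcs = [(2, 7), (2, 3)]
sib = ((2, 7), (2, 3))
first_order = {(2, 7): 1.0, (2, 3): 0.5}

rewarded = quasi_second_order_score(arcs, [sib], first_order, {sib: 0.25})
clipped = quasi_second_order_score(arcs, [sib], first_order, {sib: -2.0})
```

With a sibling score of 0.25 the factor adds to the total, while a score of -2.0 leaves the total at the first-order sum, which is the sense in which the model cannot penalize a bad factor.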
Results: Without Tree
[Figure: F-Score (y-axis, ticks 84 to 92) on DM, PAS, CCG and PCEDT, comparing the First and Second models]
Results: Syntax Tree
[Figure: F-Score (y-axis, ticks 90 to 94) on DM, PAS, CCG and PCEDT, comparing the First and Second models]
Conclusion
Our contributions:
- A new dynamic programming algorithm for first-order parsing to 1-endpoint-crossing, pagenumber-2, C-free graphs.
- A new quasi-second-order extension.
Lesson learned:
- Crossing-sensitive second-order features are helpful.
Game Over QUESTIONS? COMMENTS? 39 of 41
References (1)
Junjie Cao, Sheng Huang, Weiwei Sun, and Xiaojun Wan. 2017. Parsing to 1-endpoint-crossing, pagenumber-2 graphs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2110–2120. Association for Computational Linguistics, Vancouver, Canada. URL http://aclweb.org/anthology/p17-1193.
Yantao Du, Weiwei Sun, and Xiaojun Wan. 2015. A data-driven, factorization parser for CCG dependency structures. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1545–1555. Association for Computational Linguistics, Beijing, China. URL http://www.aclweb.org/anthology/p15-1149.
Marco Kuhlmann and Peter Jonsson. 2015. Parsing to noncrossing dependency graphs. Transactions of the Association for Computational Linguistics, 3:559–570.
Emily Pitler. 2014. A crossing-sensitive third-order factorization for dependency parsing. TACL, 2:41–54. URL http://www.transacl.org/wp-content/uploads/2014/02/39.pdf.
References (2)
Weiwei Sun, Junjie Cao, and Xiaojun Wan. 2017. Semantic dependency parsing via book embedding. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 828–838. Association for Computational Linguistics, Vancouver, Canada. URL http://aclweb.org/anthology/p17-1077.