
Dependency grammar · Morphology · Word order · Transition-based neural parsing · Word representations · Recurrent neural networks

Dependency Grammar
- Modern theories of dependency grammar originate with Lucien Tesnière.
- Reference: Lucien Tesnière (1959). Éléments de syntaxe structurale, Klincksieck, Paris. ISBN 2-252-01861-5.
- Underlying ideas date back to Panini and his system of karakas.
- There are different contemporary frameworks of dependency grammar, including the Prague School's Functional Generative Description, Mel'čuk's Meaning-Text Theory, and Hudson's Word Grammar.

Dependency Grammar
"The sentence is an organized whole, whose constituent elements are words. [1.2] Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. [1.3] The structural connections establish dependency relations between the words. Each connection in principle unites a superior term and an inferior term. [2.1] The superior term receives the name governor. The inferior term receives the name subordinate. Thus, in the sentence Alfred parle [...], parle is the governor and Alfred the subordinate."
from: Tesnière (1959)

Advantages of Dependency Grammars
- a completely word-based framework (no phrasal projections)
- most dependency grammar frameworks are non-derivational and mono-stratal
- allows for a surface-level syntactic account of languages with flexible word order and of syntactic constructions with discontinuous elements
However, these syntactic phenomena also raise challenging questions about the dependency grammar formalism and the notion of projectivity of dependency structures.

Parsing with Dependency Grammars
- Parsing a sentence is not a goal in itself, but ultimately needs to help provide an adequate answer to the question: Who did what to whom, when, where, and why? In other words: syntactic structure needs to be linked in a systematic fashion to semantic representation/interpretation.
- Dependency grammar offers a direct interface between syntax and semantics: dependency relations between a governor (lexical head) and its lexical dependents can link lexical representations of the main participants of an event or state of affairs with lexical representations of the circumstances under which they occurred or hold.
- Parsing with dependency grammars benefits from the lexicalist character of dependency relations. This is beneficial, inter alia, for parsing long-distance dependencies and coordinations (see Kübler and Prokic 2006).

UD English treebank: treatment of nominal arguments
Example: "you should get a cocker spaniel."
Dependency relations used: nsubj, aux, root, obj, det, compound, punct
UPOS: PRON AUX VERB DET NOUN NOUN PUNCT
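The same analysis can be written down as a plain table of (form, UPOS, head, relation) tuples. A minimal Python sketch follows; the head indices are my addition and follow the standard UD analysis of this sentence (the original slide shows them as drawn arcs):

```python
# The UD analysis of "you should get a cocker spaniel." as (FORM, UPOS, HEAD, DEPREL)
# tuples, with HEAD indexing into the sentence (0 = the artificial root).
# Head indices are assumed here, following the standard UD guidelines for this example.
sentence = [
    ("you",     "PRON",  3, "nsubj"),
    ("should",  "AUX",   3, "aux"),
    ("get",     "VERB",  0, "root"),
    ("a",       "DET",   6, "det"),
    ("cocker",  "NOUN",  6, "compound"),
    ("spaniel", "NOUN",  3, "obj"),
    (".",       "PUNCT", 3, "punct"),
]

for i, (form, upos, head, rel) in enumerate(sentence, start=1):
    governor = "root" if head == 0 else sentence[head - 1][0]
    print(f"{i:2d} {form:8s} {upos:5s} --{rel}--> {governor}")
```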

UD English treebank: treatment of PP adjuncts
Example: "He announced this in January :" (lemmas: he announce this in January :)
Dependency relations used: root, nsubj, obj, obl, case, punct
UPOS: PRON VERB PRON ADP PROPN PUNCT

UD English treebank: treatment of clausal subjects
Example: "Great to have you on board!" (lemmas: great to have you on board !)
Dependency relations used: root, csubj, mark, obj, obl, case, punct
UPOS: ADJ PART VERB PRON ADP NOUN PUNCT

UD English treebank: treatment of relative clauses
Example: "Every move Google makes brings this particular future closer." (lemmas: every move Google make bring this particular future closer .)
Dependency relations used: root, nsubj, obj, acl:relcl, nsubj, det, det, amod, advmod, punct
UPOS: DET NOUN PROPN VERB VERB DET ADJ NOUN ADV PUNCT

UD English treebank: treatment of relative clauses
Example: "Malach, What you say makes sense." (lemmas: Malach , what you say make sense .)
Dependency relations used: vocative, punct, nsubj, acl:relcl, nsubj, root, obj, punct
UPOS: PROPN PUNCT PRON PRON VERB VERB NOUN PUNCT

UD English treebank: treatment of direct questions
Example: "Why were they suddenly acted on Saturday?" (lemmas: why be they suddenly act on Saturday ?)
Dependency relations used: advmod, auxpass, nsubjpass, advmod, root, nmod, nmod:tmod, punct
UPOS: ADV AUX PRON ADV VERB ADP PROPN PUNCT

Heads and Dependents
Tests for identifying a head H and a dependent D in a syntactic construction C:
1. H determines the syntactic category of C and can often replace C.
2. H determines the semantic category of C; D gives semantic specification.
3. H is obligatory; D may be optional.
4. H selects D and determines whether D is obligatory or optional.
5. The form of D depends on H (agreement or government).
6. The linear position of D is specified with reference to H.
from: Kübler et al. (2009), p. 3f.

Heads and Dependents: Some Unclear Cases
- auxiliary-main-verb constructions
- determiner-adjective-noun constructions
- prepositional phrases
- coordination structures
The answer often depends on the different purposes that the dependency structure is put to.

Case Study: Strong and Weak Adjectives in Dutch
(1) a. de bruine beer
       the brown[weak] bear[masc]
       'the brown bear'
    b. een bruine beer
       a brown[strong] bear[masc]
       'a brown bear'
    c. de bruine beest
       the brown[weak] animal[neut]
       'the brown animal'
    d. een bruin beest
       a brown[strong] animal[neut]
       'a brown animal'

Universal Dependency Initiative
- Objective: develop cross-linguistically consistent treebank annotation for many languages.
- Goal: facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective.
- Strategy: provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.

UD annotations across languages: some examples

Universal Dependency Relations

Universal Tagset
Open class words: ADJ, ADV, INTJ, NOUN, PROPN, VERB
Closed class words: ADP, AUX, CCONJ, DET, NUM, PART, PRON, SCONJ
Other: PUNCT, SYM, X

Some Definitions: Sentence and Arc Labels
Definition 2.1. A sentence is a sequence of tokens denoted by S = w_0 w_1 ... w_n, where w_0 = root.
Definition 2.2. Let R = {r_1, ..., r_m} be a finite set of possible dependency relation types that can hold between any two words in a sentence. A relation type r ∈ R is additionally called an arc label.
Acknowledgement: Definitions 2.1-2.4 and 2.16-2.18 and Notations 2.6-2.9 are all taken from Kübler, McDonald, and Nivre (2009), chapter 2.

Dependency Structures and Dependency Trees
Definition 2.3. A dependency graph G = (V, A) is a labeled directed graph (digraph) in the standard graph-theoretic sense and consists of nodes V and arcs A, such that for a sentence S = w_0 w_1 ... w_n and a label set R the following holds:
1. V ⊆ {w_0, w_1, ..., w_n}
2. A ⊆ V × R × V
3. if (w_i, r, w_j) ∈ A then (w_i, r', w_j) ∉ A for all r' ≠ r
The spanning node set V_S = {w_0, w_1, ..., w_n} contains all and only the words of a sentence, including w_0 = root.

Dependency Trees
Definition 2.4. A well-formed dependency graph G = (V, A) for an input sentence S and dependency relation set R is any dependency graph that is a directed tree originating out of node w_0 and has the spanning node set V = V_S. We call such dependency graphs dependency trees.
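Definition 2.4 is easy to operationalize. The following is a minimal sketch (not from the slides) that checks whether a head assignment is a well-formed dependency tree, using a head-vector representation in which heads[i] gives the head of word w_{i+1} and 0 stands for the artificial root w_0:

```python
# Check Definition 2.4 for a sentence given as a head vector `heads`, where
# heads[i] is the head of word w_{i+1} (0 = the artificial root w_0).
def is_dependency_tree(heads):
    n = len(heads)  # number of real words w_1 ... w_n
    # every word must have exactly one head in {0, ..., n} (spanning, single head)
    if any(h < 0 or h > n for h in heads):
        return False
    # the graph must be a tree rooted in w_0: every word reaches w_0 without cycles
    for i in range(1, n + 1):
        seen = set()
        j = i
        while j != 0:
            if j in seen:        # cycle detected
                return False
            seen.add(j)
            j = heads[j - 1]     # move on to the head of w_j
    return True

# "you should get a cocker spaniel ." with "get" as root (cf. the UD example earlier)
print(is_dependency_tree([3, 3, 0, 6, 6, 3, 3]))   # True
print(is_dependency_tree([3, 3, 2, 6, 6, 3, 3]))   # False: w_2 -> w_3 -> w_2 is a cycle
```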

Unique Head Property
Remark: Dependency trees rule out configurations in which a single dependent receives two arcs (arc 1 and arc 2) from two different heads.
A putative counterexample: in cases of VP coordination, as in Sandy listened and smiled, it appears at least plausible to establish a dependency relation from each verbal head to the shared nominal dependent.

Some Notation
Notation 2.6. The notation w_i → w_j indicates the unlabeled dependency relation (or dependency relation for short) in a tree G = (V, A). That is, w_i → w_j if and only if (w_i, r, w_j) ∈ A for some r ∈ R.
Notation 2.7. The notation w_i →* w_j indicates the reflexive transitive closure of the dependency relation in a tree G = (V, A). That is, w_i →* w_j if and only if i = j (reflexive) or both w_i →* w_i' and w_i' → w_j hold (for some w_i' ∈ V).
Notation 2.8. The notation w_i ↔ w_j indicates the undirected dependency relation in a tree G = (V, A). That is, w_i ↔ w_j if and only if either w_i → w_j or w_j → w_i.
Notation 2.9. The notation w_i ↔* w_j indicates the reflexive transitive closure of the undirected dependency relation in a tree G = (V, A). That is, w_i ↔* w_j if and only if i = j (reflexive) or both w_i ↔* w_i' and w_i' ↔ w_j hold (for some w_i' ∈ V).

Connectedness
A dependency tree G = (V, A) satisfies the connectedness property, which states that for all w_i, w_j ∈ V it is the case that w_i ↔* w_j. That is, there is a path connecting every two words in a dependency tree when the direction of the arcs (dependency relations) is ignored.

(Non-)Projective Dependency Trees
Definition 2.16. An arc (w_i, r, w_j) ∈ A in a dependency tree G = (V, A) is projective if and only if w_i →* w_k for all i < k < j when i < j, or j < k < i when j < i.
Definition 2.17. A dependency tree G = (V, A) is a projective dependency tree if (1) it is a dependency tree (Definition 2.4), and (2) all (w_i, r, w_j) ∈ A are projective.
Definition 2.18. A dependency tree G = (V, A) is a non-projective dependency tree if (1) it is a dependency tree (Definition 2.4), and (2) it is not projective.
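Definition 2.16 translates directly into a check over head vectors. A minimal sketch (my illustration, not from the slides), reusing the head-vector representation introduced above:

```python
# Check projectivity per Definition 2.16. heads[i] is the head of word w_{i+1};
# 0 denotes the artificial root w_0. Assumes a well-formed tree (Definition 2.4).
def is_projective(heads):
    n = len(heads)

    def dominates(i, k):
        # does w_i ->* w_k hold, i.e. is w_i an ancestor of w_k (or equal to it)?
        while k != i:
            if k == 0:
                return False
            k = heads[k - 1]
        return True

    # for every arc (w_i, r, w_j), w_i must dominate every word strictly between them
    for j in range(1, n + 1):
        i = heads[j - 1]
        for k in range(min(i, j) + 1, max(i, j)):
            if not dominates(i, k):
                return False
    return True

print(is_projective([3, 3, 0, 6, 6, 3, 3]))   # True  (the UD example above)
print(is_projective([2, 0, 2, 2, 1]))         # False (w_5 attaches across w_1's own head)
```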

Converting Non-projective to Projective Dependency Trees
Example sentence: "A hearing is scheduled on the issue today."
- Non-projective tree (labels: root, SBJ, VC, ATT, TMP, PC, DET, PU): "on the issue" is attached to "hearing" and "today" to "scheduled", so these arcs span material that their heads do not dominate.
- Projectivized tree: the offending dependents are lifted to a higher head, and the lifted arcs receive the composite labels SBJ:ATT and VC:TMP, which record the original attachment site so that the transformation can later be undone.
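The lifting operation behind such conversions can be sketched in a few lines. The following is a rough simplification of the pseudo-projective idea (Nivre and Nilsson 2005), not the slides' algorithm: while some arc is non-projective, reattach its dependent to its head's head and record the original head's relation in the label, which yields composite labels such as SBJ:ATT and VC:TMP as in the example above:

```python
# A rough sketch (my simplification) of pseudo-projective lifting.
# heads[i] / labels[i] describe the arc pointing at word w_{i+1}; 0 = root w_0.
def lift_to_projective(heads, labels):
    heads, labels = list(heads), list(labels)

    def dominates(i, k):
        while k != i:
            if k == 0:
                return False
            k = heads[k - 1]
        return True

    def nonprojective_dependents():
        bad = []
        for j in range(1, len(heads) + 1):
            i = heads[j - 1]
            if any(not dominates(i, k) for k in range(min(i, j) + 1, max(i, j))):
                bad.append(j)
        return bad

    while True:
        bad = nonprojective_dependents()
        if not bad:
            return heads, labels
        j = bad[0]                                    # lift one offending dependent
        old_head = heads[j - 1]
        heads[j - 1] = heads[old_head - 1]            # reattach to the head's head
        labels[j - 1] = f"{labels[old_head - 1]}:{labels[j - 1]}"   # e.g. SBJ:ATT

# "A hearing is scheduled on the issue today ." with "on" under "hearing":
heads = [2, 3, 0, 3, 2, 7, 5, 4, 3]
labels = ["ATT", "SBJ", "root", "VC", "ATT", "DET", "PC", "TMP", "PU"]
print(lift_to_projective(heads, labels))   # yields SBJ:ATT and VC:TMP labels
```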

A dependency grammar treebank
A dependency grammar treebank consists of pairs of sentences S and their corresponding dependency trees G:
T = {(S_d, G_d)}, d = 0, ..., |T|
The dependency trees G_d can be obtained by
- manual annotation by one or more human annotators,
- automatic annotation by a parser, or
- automatic conversion from a constituent grammar treebank by a conversion algorithm.
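Dependency treebanks such as UD are commonly distributed in the tab-separated CoNLL-U format. A minimal sketch (not from the slides) for reading such a treebank into (sentence, tree) pairs; the file name in the usage line is only an example:

```python
# Read a CoNLL-U treebank into (words, tree) pairs. Each non-comment line has
# 10 tab-separated fields: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC;
# sentences are separated by blank lines. Multiword-token ranges ("3-4") and
# empty nodes ("5.1") are skipped in this sketch.
def read_conllu(path):
    treebank, words, tree = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#"):
                continue
            if not line:
                if words:
                    treebank.append((words, tree))
                    words, tree = [], []
                continue
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue
            words.append(cols[1])
            tree.append((int(cols[6]), cols[7]))   # (HEAD index, DEPREL)
    if words:
        treebank.append((words, tree))
    return treebank

# e.g. treebank = read_conllu("en_ewt-ud-train.conllu")   # file name is an example
```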

Tübingen Treebank of Written German (TüBa-D/Z)
- developed by my research group at the Seminar für Sprachwissenschaft at the University of Tübingen since 1999
- language data taken from the German newspaper die tageszeitung (taz)
- largest manually annotated treebank for German
- total of 104,787 sentences
- average sentence length: 18.7 words
- total number of tokens: 1,959,474

Tübingen Treebank of Written German (TüBa-D/Z)
- originally annotated for constituent structure
- now also available in dependency structure format
- The annotation guidelines are published in the Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z): http://www.sfs.uni-tuebingen.de/fileadmin/user_upload/ascl/tuebadz-stylebook-1508.pdf
- Information on how to obtain the data can be found at: http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Transition-based dependency parsing: configurations
For an input sentence S = w_0 w_1 ... w_n (with w_0 = root), a parser configuration is a triple c = <σ, β, A>, where
- σ is a stack of words w_i ∈ S,
- β is a buffer of words w_i ∈ S, and
- A is a set of dependency arcs (w_i, r, w_j) with w_i, w_j ∈ S and r ∈ R.

Initial and terminal configurations
- Initial configuration: c_0(S) = ([w_0], [w_1, ..., w_n], ∅): the stack contains only root, the buffer contains all remaining words, and the arc set is empty.
- Terminal configurations: ([w_0], [], A): the buffer is empty and only root remains on the stack; the arc set A of the terminal configuration defines the predicted dependency tree.

Notation
- [σ|w_i] denotes a stack whose top element is w_i, with remainder σ.
- [w_j|β] denotes a buffer whose first element is w_j, with remainder β.

Arc-standard transitions
- SHIFT: (σ, [w_i|β], A) ⇒ ([σ|w_i], β, A)
- LEFT-ARC_r: ([σ|w_i|w_j], β, A) ⇒ ([σ|w_j], β, A ∪ {(w_j, r, w_i)}), provided w_i ≠ w_0
- RIGHT-ARC_r: ([σ|w_i|w_j], β, A) ⇒ ([σ|w_i], β, A ∪ {(w_i, r, w_j)})
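A minimal Python sketch of these transitions (my illustration, not the slides' code), with words represented by their positions and 0 standing for the artificial root:

```python
# Arc-standard transitions on a configuration (stack, buffer, arcs).
# Arcs are (head, label, dependent) triples; 0 is the artificial root w_0.
def shift(stack, buffer, arcs):
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs, label):
    # attach the second-topmost stack word to the topmost one and pop it
    wi, wj = stack[-2], stack[-1]
    assert wi != 0, "the artificial root must not become a dependent"
    return stack[:-2] + [wj], buffer, arcs | {(wj, label, wi)}

def right_arc(stack, buffer, arcs, label):
    # attach the topmost stack word to the second-topmost one and pop it
    wi, wj = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(wi, label, wj)}

# initial configuration c_0(S) for a 3-word sentence: ([w_0], [w_1, w_2, w_3], {})
config = ([0], [1, 2, 3], set())
config = shift(*config)                  # ([0, 1], [2, 3], {})
config = shift(*config)                  # ([0, 1, 2], [3], {})
config = left_arc(*config, "nsubj")      # w_1 becomes a dependent of w_2
config = shift(*config)
config = right_arc(*config, "obj")       # w_3 becomes a dependent of w_2
config = right_arc(*config, "root")      # w_2 attaches to the root w_0
print(config)                            # ([0], [], {...}) -- a terminal configuration
```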

Transition sequences
For a sentence S, a transition sequence C_{0,m} = (c_0, c_1, ..., c_m) is a sequence of configurations such that c_0 = c_0(S), c_m is a terminal configuration, and for every i (1 ≤ i ≤ m) there is a transition t_i with c_i = t_i(c_{i-1}). The dependency tree derived by the sequence is G = <V_S, A_{c_m}>, i.e. the tree defined by the arc set of the terminal configuration.
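Given a gold tree, a transition sequence that derives it can be computed with a static oracle. The following sketch (my illustration, not from the slides; relation labels are collapsed to a single dummy label "dep") does this for the arc-standard system and assumes a projective gold tree:

```python
# Static oracle for the arc-standard system: given the gold head of every word
# (0 = root), produce a transition sequence whose arc set reproduces the gold tree.
def oracle_sequence(gold_heads):
    n = len(gold_heads)
    stack, buffer, arcs = [0], list(range(1, n + 1)), set()
    sequence = []

    def pending_deps(h):
        # gold dependents of h that have not been attached yet
        return [d for d in range(1, n + 1)
                if gold_heads[d - 1] == h and not any(a[2] == d for a in arcs)]

    while not (len(stack) == 1 and not buffer):
        if len(stack) >= 2 and stack[-2] != 0 and gold_heads[stack[-2] - 1] == stack[-1]:
            sequence.append("LEFT-ARC")                # second-top <- top
            arcs.add((stack[-1], "dep", stack[-2]))
            stack = stack[:-2] + [stack[-1]]
        elif (len(stack) >= 2 and gold_heads[stack[-1] - 1] == stack[-2]
              and not pending_deps(stack[-1])):
            sequence.append("RIGHT-ARC")               # second-top -> top
            arcs.add((stack[-2], "dep", stack[-1]))
            stack = stack[:-1]
        else:
            sequence.append("SHIFT")
            stack, buffer = stack + [buffer[0]], buffer[1:]
    return sequence

# heads for "you should get a cocker spaniel ." with "get" as root:
print(oracle_sequence([3, 3, 0, 6, 6, 3, 3]))
```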

Worked example: an arc-standard transition sequence for an example sentence, showing the stack σ, the buffer β, and the growing arc set after each transition (A_1, A_2 = A_1 ∪ {...}, A_3 = A_2 ∪ {...}, A_4 = A_3 ∪ {...}).

Deterministic (greedy) transition-based parsing
Parse(S):
1. c ← c_0(S)
2. while c is not a terminal configuration:
3.   t ← o(c)   (an oracle o, approximated in practice by a trained classifier, predicts the next transition)
4.   c ← t(c)
5. return the dependency tree G = <V_S, A_c> defined by the arc set of c
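A sketch of this greedy loop in Python, reusing the arc-standard transition functions shift, left_arc, and right_arc defined in the earlier sketch. Here predict stands in for the oracle o(c); in a neural transition-based parser it is a classifier over features of the configuration, while the stub below is deliberately naive:

```python
# Greedy transition-based parsing with the arc-standard transitions defined above.
def parse(n_words, predict):
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while not (len(stack) == 1 and not buffer):        # terminal: ([w_0], [], A)
        action, label = predict(stack, buffer, arcs)
        if action == "shift":
            stack, buffer, arcs = shift(stack, buffer, arcs)
        elif action == "left_arc":
            stack, buffer, arcs = left_arc(stack, buffer, arcs, label)
        else:
            stack, buffer, arcs = right_arc(stack, buffer, arcs, label)
    return arcs

def naive_predict(stack, buffer, arcs):
    # a linguistically naive stand-in: shift one word at a time, then attach it
    if buffer and len(stack) < 2:
        return "shift", None
    return "right_arc", "dep"

print(parse(3, naive_predict))   # every word ends up attached to the root w_0
```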

Further worked examples: the transition inventory is shown again and applied step by step to additional example sentences, tracking the stack σ, the buffer β, and the arc sets built up by successive transitions (A_1 and A_2 in the first derivation; A_1 through A_5 in the second).

Arc-eager transitions
- SHIFT: (σ, [w_i|β], A) ⇒ ([σ|w_i], β, A)
- RIGHT-ARC_r: ([σ|w_i], [w_j|β], A) ⇒ ([σ|w_i|w_j], β, A ∪ {(w_i, r, w_j)})
- LEFT-ARC_r: ([σ|w_i], [w_j|β], A) ⇒ (σ, [w_j|β], A ∪ {(w_j, r, w_i)}), provided w_i ≠ w_0 and w_i does not already have a head in A
- REDUCE: ([σ|w_i], β, A) ⇒ (σ, β, A), provided w_i already has a head in A, i.e. (w_k, r, w_i) ∈ A for some w_k and r
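A minimal Python sketch of the arc-eager transitions (my illustration, not the slides' code), in the same representation as before; with this system, parsing can stop as soon as the buffer is empty:

```python
# Arc-eager transitions on a configuration (stack, buffer, arcs).
# Arcs are (head, label, dependent) triples; 0 is the artificial root w_0.
def has_head(w, arcs):
    return any(dep == w for _, _, dep in arcs)

def ae_shift(stack, buffer, arcs):
    return stack + [buffer[0]], buffer[1:], arcs

def ae_right_arc(stack, buffer, arcs, label):
    # attach the first buffer word to the stack top and push it onto the stack
    wi, wj = stack[-1], buffer[0]
    return stack + [wj], buffer[1:], arcs | {(wi, label, wj)}

def ae_left_arc(stack, buffer, arcs, label):
    # attach the stack top to the first buffer word and pop it
    wi, wj = stack[-1], buffer[0]
    assert wi != 0 and not has_head(wi, arcs)
    return stack[:-1], buffer, arcs | {(wj, label, wi)}

def ae_reduce(stack, buffer, arcs):
    # pop the stack top once it has received its head
    assert has_head(stack[-1], arcs)
    return stack[:-1], buffer, arcs

# a 3-word sentence with w_2 as root, w_1 its left dependent, w_3 its right dependent:
config = ([0], [1, 2, 3], set())
config = ae_shift(*config)               # ([0, 1], [2, 3], {})
config = ae_left_arc(*config, "nsubj")   # w_1 <- w_2
config = ae_right_arc(*config, "root")   # w_0 -> w_2 (the arc is added eagerly)
config = ae_right_arc(*config, "obj")    # w_2 -> w_3
print(config)                            # buffer is empty: parsing can stop
```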

Worked example: an arc-eager transition sequence, again showing the stack σ, the buffer β, and the arc sets A_1 through A_5 added by successive transitions.

Complexity of transition-based parsing
- Each transition can be applied in O(1) time (assuming constant-time transition prediction).
- A sentence of n words is parsed in at most 2n transitions, so greedy transition-based parsing runs in O(n) time.