Alessandro Mazzei, Dipartimento di Informatica, Università di Torino. MASTER DI SCIENZE COGNITIVE, GENOVA 2005, 04-11-05. Natural Language Grammars and Parsing
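Natural Language Syntax. Syntactic parsing: deriving a syntactic structure from the word sequence. Example: "Paolo ama Francesca" (Paolo loves Francesca). Constituency: [S [NP [N Paolo]] [VP [V ama] [N Francesca]]]. Dependency: ama --sub--> Paolo, ama --obj--> Francesca.
Syntax and Semantics. "Paolo ama Francesca" and "Francesca ama Paolo" contain the same words, but syntactic parsing assigns them different structures: [S [NP [N Paolo]] [VP [V ama] [N Francesca]]] vs. [S [NP [N Francesca]] [VP [V ama] [N Paolo]]].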
Dependency and PCFG. Summary: dependency relations; dependency grammars and parsers; lexicalized PCFG.
Anatomy of a Parser. (1) Grammar: context-free, ... (2) Algorithm: I. Search strategy: top-down, bottom-up, left-to-right, ... II. Memory organization: back-tracking, dynamic programming, ... (3) Oracle: probabilistic, rule-based, ...
Generative Grammars and Natural Languages. Generative grammars can model a natural language as a formal language. The derivation tree can model the syntactic structure of sentences.
Generative grammar: G = (Σ, V, S, P), where Σ = alphabet, V = {A, B, ...}, S ∈ V, P = {Ψ → θ, ...}
Grammar 3: G4 = (Σ4, {S, NP, VP, V1, V2}, S, P4)
Σ4 = {I, Anna, John, Harry, saw, see, swimming}
P4 = {S → NP VP, VP → V1 S, VP → V2, NP → I | John | Harry | Anna, V1 → saw | see, V2 → swimming}
Grammar 3: Derivation
Rules: S → NP VP; VP → V1 S; VP → V2; NP → I | John | Harry | Anna; V1 → saw | see; V2 → swimming
Derivation: S ⇒ NP VP ⇒ I VP ⇒ I V1 S ⇒ I saw S ⇒ I saw NP VP ⇒ I saw Harry VP ⇒ I saw Harry V2 ⇒ I saw Harry swimming
Derivation tree: [S [NP I] [VP [V1 saw] [S [NP Harry] [VP [V2 swimming]]]]]
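The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the grammar is stored as a Python dictionary and that the names `RULES`, `CHOICES` and `leftmost_derivation` are illustrative choices (they do not come from the slides). It rewrites the leftmost nonterminal at each step and rejects any expansion that is not a rule of the grammar.

```python
# Grammar G4 from the slides; "V1"/"V2" stand for the subscripted V symbols.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V1", "S"], ["V2"]],
    "NP": [["I"], ["John"], ["Harry"], ["Anna"]],
    "V1": [["saw"], ["see"]],
    "V2": [["swimming"]],
}


def leftmost_derivation(choices, start="S"):
    """Apply each chosen right-hand side to the leftmost nonterminal, in order."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        # The leftmost nonterminal is the first symbol that has rules of its own.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        if rhs not in RULES[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a rule of G4")
        form[i:i + 1] = rhs
        steps.append(" ".join(form))
    return steps


# The sequence of expansions used in the slide derivation.
CHOICES = [
    ["NP", "VP"], ["I"], ["V1", "S"], ["saw"],
    ["NP", "VP"], ["Harry"], ["V2"], ["swimming"],
]
steps = leftmost_derivation(CHOICES)
print(" => ".join(steps))
```

Each entry of `steps` is one sentential form, so the list reads exactly like the chain of rewriting steps on the slide.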
A different syntactic structure: Dependency. Constituent structure represents the grouping relations among the words. Dependency structure represents the dependency relations among the words. Constituency: [S [NP [N Paolo]] [VP [V ama] [N Francesca]]]. Dependency: ama --sub--> Paolo, ama --obj--> Francesca.
Dependency relation: a relation between two words. Head: the dominant word. Dependent: the dominated word. The head selects its dependents and determines their properties. Example: the verb determines the number of its arguments.
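Dependency relation. Head: the dominant word. In "Paolo ama Francesca", the head is "ama".
Dependency relation. Dependent: the dominated word. In "Paolo ama Francesca", the dependents are "Paolo" and "Francesca".
Dependency relation. A dependent can be an argument: ama --arg--> Paolo, ama --arg--> Francesca.
Dependency relation. A dependent can be an argument or a modifier. In "Paolo corre velocemente" (Paolo runs fast): corre --arg--> Paolo, corre --mod--> velocemente.
Dependency relation. A dependent can be an argument or a modifier. In "il cane giallo" (the yellow dog): cane --mod--> il, cane --mod--> giallo.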
Constituency and Dependency. In X-bar theory, the constituency relation captures the dependency relation. Schema: [X'' arg [X' [X' head arg] mod]]. Example: [S [NP Paolo] [VP [VP [V ama] [N Francesca]] [ADV dolcemente]]] ("Paolo ama Francesca dolcemente", Paolo loves Francesca tenderly). Problem: free-word-order languages.
Constituency and Dependency. Dependency structure with word positions: ama:2 --sub--> Paolo:1, ama:2 --obj--> Francesca:3, ama:2 --mod--> dolcemente:4. Constituency structure: [S [NP Paolo] [VP [VP [V ama] [N Francesca]] [ADV dolcemente]]].
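A dependency structure with word positions maps naturally onto a flat table in which each word records the position of its head. The sketch below assumes a tuple layout `(position, word, head position, label)` and uses head position 0 with the label `root` for the top word. These conventions and the helper names are illustrative, not taken from the slides.

```python
# Each token: (position, word, head position, label); head 0 marks the root.
# This encoding is an assumption made for the example.
TOKENS = [
    (1, "Paolo", 2, "sub"),
    (2, "ama", 0, "root"),
    (3, "Francesca", 2, "obj"),
    (4, "dolcemente", 2, "mod"),
]


def root(tokens):
    """Return the word that has no head."""
    return next(word for _, word, head, _ in tokens if head == 0)


def dependents(tokens, head_pos):
    """Return (word, label) pairs for every word governed by head_pos."""
    return [(word, label) for _, word, head, label in tokens if head == head_pos]


print(root(TOKENS))
print(dependents(TOKENS, 2))
```

Because every word stores only the position of its head, the same table can describe a sentence whatever the linear order of its words, which is convenient for languages with freer word order.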
Turin University Treebank (TUT). Dependency treebank: 1800 sentences, ~40000 words. Various genres: newspaper, civil law, Albanian, miscellaneous. Augmented Relational Structure (ARS): morpho-syntactic, syntactic-functional, semantic.
Turin University Treebank. Annotation of the sentence "Il Governo di Berisha appare in difficolta'" (Berisha's government appears to be in difficulty):
************** FRASE ALB-4 **************
1 Il (IL ART DEF M SING) [5;VERB-SUBJ]
2 Governo (GOVERNO NOUN COMMON M SING) [1;DET+DEF-ARG]
3 di (DI PREP MONO) [2;PREP-RMOD]
4 Berisha ( Berisha NOUN PROPER) [3;PREP-ARG]
5 appare (APPARIRE VERB MAIN IND PRES INTRANS 3 SING) [0;TOP-VERB]
6 in (IN PREP MONO) [5;VERB-PREDCOMPL+SUBJ]
7 difficolta' ( difficolta` NOUN COMMON F ALLVAL) [6;PREP-ARG]
8 . (#\. PUNCT) [5;END]
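The annotation can be read programmatically once each token sits on its own line. The sketch below assumes the layout `index word (morphology) [head;label]`, with labels spelled out in full, and uses a hand-written regular expression. The names `SAMPLE`, `LINE` and `parse` are hypothetical, and this is not the treebank's official reader.

```python
import re

# Sample sentence ALB-4, one token per line (raw string: the data contains a backslash).
SAMPLE = r"""
1 Il (IL ART DEF M SING) [5;VERB-SUBJ]
2 Governo (GOVERNO NOUN COMMON M SING) [1;DET+DEF-ARG]
3 di (DI PREP MONO) [2;PREP-RMOD]
4 Berisha ( Berisha NOUN PROPER) [3;PREP-ARG]
5 appare (APPARIRE VERB MAIN IND PRES INTRANS 3 SING) [0;TOP-VERB]
6 in (IN PREP MONO) [5;VERB-PREDCOMPL+SUBJ]
7 difficolta' ( difficolta` NOUN COMMON F ALLVAL) [6;PREP-ARG]
8 . (#\. PUNCT) [5;END]
"""

# index, word form, parenthesized morphology, then [head index;arc label]
LINE = re.compile(r"^(\d+)\s+(\S+)\s+\((.*)\)\s+\[(\d+);([^\]]+)\]$")


def parse(text):
    """Turn the annotation lines into a list of token dictionaries."""
    tokens = []
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m is None:
            raise ValueError(f"unrecognized line: {line!r}")
        idx, word, morph, head, label = m.groups()
        tokens.append({
            "id": int(idx),
            "word": word,
            "morph": morph.split(),
            "head": int(head),
            "label": label,
        })
    return tokens


tokens = parse(SAMPLE)
# The root is the token whose head index is 0.
print([t["word"] for t in tokens if t["head"] == 0])
```

Note that in this annotation the determiner "Il" is the dependent of the verb and governs the noun "Governo", so the head indices follow the treebank's own conventions rather than a noun-headed analysis.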
Generative Grammars and Natural Languages. Generative grammars model the generation of sentences. The derivation tree can model the constituency structure of sentences. Representation vs. generation.
Dependency grammars and parsers. How can we generate a dependency structure? With a dependency grammar. How can we build the dependency structure of a sentence? With a dependency parser.
Dependency grammars. In the constituency paradigm: generative grammars, based on rewriting rules. In the dependency paradigm: constraint grammars, based on constraints.
Dependency parsers: Turin University Parser (TUP). A rule-based dependency parser that uses subcategorization frames. Chunk parser (~bottom-up). ARS annotation: morpho-syntactic, syntactic-functional, semantic.
Turin University Parser. 1) Non-verbal rules: (ADJ-QUALIF BEFORE (ADV (TYPE MANNER)) ADVMOD-MANNER). If an adverb of subcategory (TYPE) MANNER immediately precedes a qualificative adjective, then it can depend on it via an arc labelled ADVMOD-MANNER. Example: "... davvero veloce ..." (really fast) gives veloce --ADVMOD-MANNER--> davvero.
Turin University Parser. 2) Verbal rules, based on a taxonomy of subcategorization classes: VERB, TRANS, ..., INTRANS, ..., INTRANS-INDOBJ-PRED (e.g. "La casa gli sembra bella", The house seems beautiful to him), ...
Turin University Parser. Example: "Paolo è davvero veloce" (Paolo is really fast). 1) Non-verbal rules (NVR): veloce --ADVMOD-MANNER--> davvero. 2) Verbal rules (VR): è --VERB-SUBJ--> Paolo, è --VERB-PREDCOMPL--> veloce.
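The two-stage analysis of "Paolo è davvero veloce" can be imitated with a toy program. This is only a sketch under strong assumptions: the part-of-speech tags, the single non-verbal rule, and the copular frame in `apply_verbal_rules` are simplifications invented for the example, and none of this reflects how the Turin University Parser is actually implemented.

```python
# Toy tokens for "Paolo è davvero veloce"; the POS tags are assumed for illustration.
SENTENCE = [
    {"word": "Paolo", "pos": "NOUN"},
    {"word": "è", "pos": "VERB"},
    {"word": "davvero", "pos": "ADV-MANNER"},
    {"word": "veloce", "pos": "ADJ-QUALIF"},
]


def apply_nonverbal_rules(tokens, arcs):
    """A manner adverb immediately before a qualificative adjective depends on it."""
    for i in range(len(tokens) - 1):
        if tokens[i]["pos"] == "ADV-MANNER" and tokens[i + 1]["pos"] == "ADJ-QUALIF":
            arcs[i] = (i + 1, "ADVMOD-MANNER")


def apply_verbal_rules(tokens, arcs):
    """Hypothetical copular frame: an unattached noun on the left is the subject,
    an unattached adjective on the right is the predicative complement."""
    for v, tok in enumerate(tokens):
        if tok["pos"] != "VERB":
            continue
        for i, other in enumerate(tokens):
            if i == v or i in arcs:
                continue  # skip the verb itself and words that already have a head
            if i < v and other["pos"] == "NOUN":
                arcs[i] = (v, "VERB-SUBJ")
            elif i > v and other["pos"] == "ADJ-QUALIF":
                arcs[i] = (v, "VERB-PREDCOMPL")


def parse(tokens):
    arcs = {}  # dependent index -> (head index, label)
    apply_nonverbal_rules(tokens, arcs)  # stage 1: build the chunks
    apply_verbal_rules(tokens, arcs)     # stage 2: attach the chunks to the verb
    return arcs


for dep, (head, label) in sorted(parse(SENTENCE).items()):
    print(f"{SENTENCE[head]['word']} --{label}--> {SENTENCE[dep]['word']}")
```

Running the non-verbal rules first means that "davvero" is already attached when the verbal rules look for free words, which mirrors the roughly bottom-up, chunk-then-verb order of the example.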
Anatomy of the TUP. (1) Grammar: dependency grammar (constraint), ... (2) Algorithm: I. Search strategy: top-down, ~bottom-up, left-to-right, ... II. Memory organization: depth-first, back-tracking, dynamic programming, ... (3) Oracle: probabilistic, rule-based, ...
Probabilistic CFG: G = (Σ, V, S, P), where each rule has the form A → β [p], with p ∈ (0,1)
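PCFG: the probability of a tree is the product of the probabilities of the rules used in it.
P(T_a) = .15 × .4 × .05 × .05 × .35 × .75 × .4 × .4 × .4 × .3 × .4 × .5 ≈ 1.5 × 10^-7
P(T_b) = .15 × .4 × .4 × .05 × .05 × .75 × .4 × .4 × .4 × .3 × .4 × .5 ≈ 1.7 × 10^-7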
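The comparison between the two trees amounts to multiplying the rule probabilities of each tree and keeping the larger product. A minimal sketch follows, assuming the factors are simply copied into two lists; the names `T_A`, `T_B` and `tree_probability` are illustrative.

```python
import math

# Rule probabilities used in the two candidate trees, copied from the slide.
T_A = [.15, .4, .05, .05, .35, .75, .4, .4, .4, .3, .4, .5]
T_B = [.15, .4, .4, .05, .05, .75, .4, .4, .4, .3, .4, .5]


def tree_probability(rule_probs):
    """Probability of a tree under a PCFG: the product of its rule probabilities."""
    return math.prod(rule_probs)


p_a = tree_probability(T_A)
p_b = tree_probability(T_B)
best = "T_a" if p_a > p_b else "T_b"
print(f"P(T_a) = {p_a:.2e}, P(T_b) = {p_b:.2e}, preferred: {best}")
```

The two lists differ in a single factor (.35 versus .4), so the preference for one tree over the other rests entirely on one rule choice.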
Problem with PCFG. Independence assumption: each rule is applied independently of its context, so a PCFG cannot express structural or lexical preferences.
Lexicalized PCFG. Each CF rule is augmented with information about the heads of the constituents involved: A → B C becomes A(head_A) → B(head_B) C(head_C). A middle point between the dependency and constituency paradigms.
Lexicalized PCFG. The rule VP → VBD NP PP is lexicalized as:
VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10^-10]
VP(dumped) → VBD(dumped) NP(cats) PP(into) [8 × 10^-11]
VP(dumped) → VBD(dumped) NP(hats) PP(into) [4 × 10^-10]
VP(dumped) → VBD(dumped) NP(sacks) PP(above) [1 × 10^-12]
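Because each lexicalized rule carries its own probability, lexical preferences become a simple table lookup. The sketch below assumes the four VP(dumped) expansions are stored in a dictionary keyed by their right-hand sides; `LEX_RULES` and `preferred` are hypothetical names chosen for the example.

```python
# Right-hand sides of VP(dumped) -> ..., with the probabilities listed on the slide.
LEX_RULES = {
    ("VBD(dumped)", "NP(sacks)", "PP(into)"): 3e-10,
    ("VBD(dumped)", "NP(cats)", "PP(into)"): 8e-11,
    ("VBD(dumped)", "NP(hats)", "PP(into)"): 4e-10,
    ("VBD(dumped)", "NP(sacks)", "PP(above)"): 1e-12,
}


def preferred(candidates):
    """Return the candidate expansion with the highest rule probability."""
    return max(candidates, key=LEX_RULES.__getitem__)


into = ("VBD(dumped)", "NP(sacks)", "PP(into)")
above = ("VBD(dumped)", "NP(sacks)", "PP(above)")
best = preferred([into, above])
print(best, LEX_RULES[into] / LEX_RULES[above])
```

With these figures the verb "dumped" combines far more readily with a PP headed by "into" than with one headed by "above", a lexical preference that a plain PCFG rule VP → VBD NP PP cannot express.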
Conclusions. Syntactic structure: constituency and dependency relations. Parsing: generative and constraint paradigms. Lexicalized probabilistic CFGs. Treebank.
References. Speech and Language Processing, D. Jurafsky and J.H. Martin, Prentice Hall, 2000. An Introduction to Syntax, R.D. Van Valin, Cambridge, 2001. TUT and TUP: http://www.di.unito.it/~gull