

CS 6120/CS4120: Natural Language Processing
Instructor: Prof. Lu Wang
College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang

The grammar: binary, no epsilons

S -> NP VP 0.9          N -> people 0.5
S -> VP 0.1             N -> fish 0.2
VP -> V NP 0.5          N -> tanks 0.2
VP -> V 0.1             N -> rods 0.1
VP -> V @VP_V 0.3       V -> people 0.1
VP -> V PP 0.1          V -> fish 0.6
@VP_V -> NP PP 1.0      V -> tanks 0.3
NP -> NP NP 0.1         P -> with 1.0
NP -> NP PP 0.2
NP -> N 0.7
PP -> P NP 1.0

Example sentence: fish people fish tanks

[CKY chart: a triangular table of cells score[begin][end], one cell per span of the sentence; the slides fill it in step by step with the best probability found for each nonterminal over each span.]

CKY, lexical step: fill the width-1 cells from the lexicon, then handle unary rules.

for i = 0; i < #(words); i++
  for A in nonterms
    if A -> words[i] in grammar
      score[i][i+1][A] = P(A -> words[i])
  // handle unaries
  boolean added = true
  while added
    added = false
    for A, B in nonterms
      if score[i][i+1][B] > 0 && A -> B in grammar
        prob = P(A -> B) * score[i][i+1][B]
        if prob > score[i][i+1][A]
          score[i][i+1][A] = prob
          back[i][i+1][A] = B
          added = true

Binary rules are then applied to wider spans by combining two adjacent cells:

prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
if prob > score[begin][end][A]
  score[begin][end][A] = prob
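The pseudocode above is close to runnable already. Below is a minimal Python sketch of the same algorithm, using the toy grammar from this slide; the data-structure choices (dictionaries keyed by (begin, end, nonterminal)) and the function names are mine, not part of the lecture. The best tree itself is recovered from the backpointers by the buildtree routine sketched after the next block.

from collections import defaultdict

# Toy PCFG from the slide: binary rules (A, B, C) -> prob,
# unary rules (A, B) -> prob, lexical rules (tag, word) -> prob.
BINARY = {
    ("S", "NP", "VP"): 0.9,
    ("VP", "V", "NP"): 0.5,
    ("VP", "V", "@VP_V"): 0.3,
    ("VP", "V", "PP"): 0.1,
    ("@VP_V", "NP", "PP"): 1.0,
    ("NP", "NP", "NP"): 0.1,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
UNARY = {("S", "VP"): 0.1, ("VP", "V"): 0.1, ("NP", "N"): 0.7}
LEXICON = {
    ("N", "people"): 0.5, ("N", "fish"): 0.2, ("N", "tanks"): 0.2, ("N", "rods"): 0.1,
    ("V", "people"): 0.1, ("V", "fish"): 0.6, ("V", "tanks"): 0.3,
    ("P", "with"): 1.0,
}

def cky(words):
    """Probabilistic CKY: return the score and backpointer tables."""
    n = len(words)
    score = defaultdict(float)  # (begin, end, A) -> best probability of A over words[begin:end]
    back = {}                   # (begin, end, A) -> (word,), (B,), or (split, B, C)

    def handle_unaries(begin, end):
        # Keep applying unary rules until no entry in this cell improves.
        added = True
        while added:
            added = False
            for (A, B), p in UNARY.items():
                prob = p * score[begin, end, B]
                if prob > score[begin, end, A]:
                    score[begin, end, A] = prob
                    back[begin, end, A] = (B,)
                    added = True

    # Lexical step: fill the width-1 cells, then handle unaries.
    for i, word in enumerate(words):
        for (tag, w), p in LEXICON.items():
            if w == word:
                score[i, i + 1, tag] = p
                back[i, i + 1, tag] = (word,)
        handle_unaries(i, i + 1)

    # Main loop: wider and wider spans, every split point, every binary rule.
    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, B, C), p in BINARY.items():
                    prob = score[begin, split, B] * score[split, end, C] * p
                    if prob > score[begin, end, A]:
                        score[begin, end, A] = prob
                        back[begin, end, A] = (split, B, C)
            handle_unaries(begin, end)

    return score, back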

Filling in the rest of the chart: for each wider span, every split point and every binary rule is tried, and then unaries are handled again for that cell.

for split = begin+1 to end-1
  prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
  if prob > score[begin][end][A]
    score[begin][end][A] = prob
    back[begin][end][A] = (split, B, C)
// handle unaries
boolean added = true
while added
  added = false
  for A, B in nonterms
    prob = P(A -> B) * score[begin][end][B]
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = B
      added = true

[Worked example: the slides step through the chart for "fish people fish tanks", filling the width-2 and width-3 cells and finally the full-sentence cell with the best probability and backpointer for each of NP, VP, and S.]

Call buildtree(score, back) to get the best parse.
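Continuing the sketch above, here is a buildtree-style routine (the name comes from the slide; the implementation and the nested-tuple tree format are mine) that follows the recorded backpointers and returns the highest-scoring tree.

def build_tree(words, back, begin, end, label):
    """Reconstruct the best tree rooted in `label` over words[begin:end]."""
    bp = back[begin, end, label]
    if len(bp) == 3:                                   # binary rule A -> B C
        split, B, C = bp
        return (label,
                build_tree(words, back, begin, split, B),
                build_tree(words, back, split, end, C))
    if end - begin == 1 and bp[0] == words[begin]:     # lexical rule: a leaf
        return (label, bp[0])                          # (words never clash with nonterminal names here)
    return (label, build_tree(words, back, begin, end, bp[0]))   # unary rule A -> B

words = "fish people fish tanks".split()
score, back = cky(words)
print(score[0, len(words), "S"])                       # probability of the best S parse
print(build_tree(words, back, 0, len(words), "S"))     # the best tree itself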

Evaluating constituency parsing

Gold standard brackets:
S-(0:11), NP-(0:2), VP-(2:9), VP-(3:9), NP-(4:6), PP-(6:9), NP-(7:9), NP-(9:10)

Candidate brackets:
S-(0:11), NP-(0:2), VP-(2:10), VP-(3:10), NP-(4:6), PP-(6:10), NP-(7:10)

Labeled Precision: 3/7 = 42.9%
Labeled Recall: 3/8 = 37.5%
LP/LR F1: 40.0%
POS Tagging Accuracy: 11/11 = 100.0%

How good are PCFGs?
Penn WSJ parsing accuracy: about 73% LP/LR F1.
Robust: they usually admit everything, but with low probability.
Partial solution for grammar ambiguity: a PCFG gives some idea of the plausibility of a parse, but not a very good one, because the independence assumptions are too strong.
They give a probabilistic language model, but in the simple case it performs worse than a trigram model.
The problem seems to be that PCFGs lack the lexicalization of a trigram model [Magerman 1995; Collins 1997; Charniak 1997].

Word-to-word affinities are useful for certain ambiguities. PP attachment is now (partly) captured in a local PCFG rule.
[Figure: two attachment trees contrasting "announce RATES FOR January" (PP attached inside the object NP) with "ANNOUNCE rates IN January" (PP attached to the VP).]
Think about: what useful information isn't captured?
Also useful for: coordination scope, verb complement patterns.
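The labeled precision/recall/F1 numbers above follow directly from set intersection over (label, start, end) brackets. A small sketch (function name and data layout are mine) that reproduces them:

def parseval(gold, candidate):
    """Labeled PARSEVAL scores over sets of (label, begin, end) brackets."""
    matched = len(gold & candidate)
    precision = matched / len(candidate)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 9), ("VP", 3, 9),
        ("NP", 4, 6), ("PP", 6, 9), ("NP", 7, 9), ("NP", 9, 10)}
candidate = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 10), ("VP", 3, 10),
             ("NP", 4, 6), ("PP", 6, 10), ("NP", 7, 10)}

lp, lr, f1 = parseval(gold, candidate)
print(f"LP = {lp:.1%}, LR = {lr:.1%}, F1 = {f1:.1%}")   # LP = 42.9%, LR = 37.5%, F1 = 40.0%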

Lexicalized parsing was seen as the parsing breakthrough of the late 1990s.

Eugene Charniak, JHU workshop: "To do better, it is necessary to condition probabilities on the actual words of the sentence. This makes the probabilities much tighter:
p(VP -> V NP NP)        = 0.00151
p(VP -> V NP NP | said) = 0.00001
p(VP -> V NP NP | gave) = 0.01980"

Michael Collins, COLT tutorial: "Lexicalized Probabilistic Context-Free Grammars perform vastly better than PCFGs (88% vs. 73% accuracy)."

Lexicalization of PCFGs: Charniak (1997)
A very straightforward model of a lexicalized PCFG. Probabilistic conditioning is top-down, like a regular PCFG, but actual parsing is bottom-up, somewhat like the CKY algorithm we saw.

Charniak (1997) example
[Figure: a parse tree in which each nonterminal is annotated with its lexical head word.]

Lexicalization models argument selection by sharpening rule expansion probabilities. The probability of different verbal complement frames (i.e., "subcategorizations") depends on the verb (monolexical probabilities):

Local Tree        come    take    think   want
VP -> V           9.5%    2.6%    4.6%    5.7%
VP -> V NP        1.1%    32.1%   0.2%    13.9%
VP -> V PP        34.5%   3.1%    7.1%    0.3%
VP -> V SBAR      6.6%    0.3%    73.0%   0.2%
VP -> V S         2.2%    1.3%    4.8%    70.8%
VP -> V NP S      0.1%    5.7%    0.0%    0.3%
VP -> V PRT NP    0.3%    5.8%    0.0%    0.0%
VP -> V PRT PP    6.1%    1.5%    0.2%    0.0%

Lexicalization sharpens probabilities: predicting heads. Bilexical probabilities are estimated with Charniak (1997) linear interpolation/shrinkage:
P(prices | n-plural)                      = .013
P(prices | n-plural, NP)                  = .013
P(prices | n-plural, NP, S)               = .025
P(prices | n-plural, NP, S, v-past)       = .052
P(prices | n-plural, NP, S, v-past, fell) = .146
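The shrinkage idea above is just a weighted average of relative-frequency estimates taken under increasingly specific conditioning contexts. A minimal sketch, with made-up interpolation weights and a hypothetical count-table interface (neither is from the lecture or from Charniak's paper):

def interpolated_head_prob(head, contexts, counts, weights):
    """Linear interpolation of P(head | context) over contexts ordered from
    least to most specific, e.g. ("n-plural",), ("n-plural", "NP"), ...
    `counts` maps context tuples and (head,) + context tuples to corpus counts;
    `weights` holds one non-negative weight per context, summing to 1."""
    estimate = 0.0
    for w, ctx in zip(weights, contexts):
        context_count = counts.get(ctx, 0)
        if context_count > 0:
            estimate += w * counts.get((head,) + ctx, 0) / context_count
    return estimate

# Hypothetical counts, for illustration only.
counts = {("n-plural",): 1000, ("prices", "n-plural"): 13,
          ("n-plural", "NP"): 800, ("prices", "n-plural", "NP"): 12}
print(interpolated_head_prob("prices", [("n-plural",), ("n-plural", "NP")],
                             counts, [0.4, 0.6]))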

Charniak (1997) shrinkage example

Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of lexical items linked by binary asymmetric relations ("arrows") called dependencies. The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.). The arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate). Usually, dependencies form a tree (connected, acyclic, single-head).
[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas", with typed arcs such as nsubjpass, auxpass, nn, appos, cc, and conj.]

Relation between phrase structure and dependency structure
A dependency grammar has a notion of a head. Officially, CFGs don't. But modern linguistic theory and all modern statistical parsers (Charniak, Collins, Stanford, ...) do, via hand-written phrasal "head rules":
The head of a Noun Phrase is a noun/number/adj/...
The head of a Verb Phrase is a verb/modal/...
The head rules can be used to extract a dependency parse from a CFG parse (a small sketch follows at the end of this section).

Methods of Dependency Parsing
1. Dynamic programming (like in the CKY algorithm). You can do it similarly to lexicalized PCFG parsing: an O(n^5) algorithm. Eisner (1996) gives a clever algorithm that reduces the complexity to O(n^3) by producing parse items with heads at the ends rather than in the middle.
2. Graph algorithms. You create a Maximum Spanning Tree for a sentence. McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it could be MaxEnt).
3. Constraint satisfaction. Edges that don't satisfy hard constraints are eliminated. Karlsson (1990), etc.
4. "Deterministic parsing". Greedy choice of attachments guided by machine learning classifiers. MaltParser (Nivre et al. 2008), discussed in the next segment.

Dependency Conditioning Preferences
What are the sources of information for dependency parsing?
1. Bilexical affinities: [issues -> the] is plausible.
2. Dependency distance: mostly with nearby words.
3. Intervening material: dependencies rarely span intervening verbs or punctuation.
4. Valency of heads: how many dependents on which side are usual for a head?
[Example: "ROOT Discussion of the outstanding issues was completed", shown with its dependency arcs.]
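As a small illustration of the head-rules idea mentioned above (extracting a dependency parse from a CFG parse), here is a sketch with a toy head-rule table; the rule priorities, the arc format, and the tree format are my own simplifications, not the Collins/Charniak/Stanford head tables. Trees use the same nested-tuple format as the CKY sketch earlier.

# Toy head rules: for each phrasal category, the preferred categories of its
# head child, in priority order (illustrative only).
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}

def find_head_word(tree, arcs):
    """Return the head word of `tree`, appending (head, dependent) arcs to `arcs`.
    A real implementation would track token positions; here duplicate words
    would be conflated, which is fine for a toy example."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                          # preterminal: the word is the head
    child_heads = [(child[0], find_head_word(child, arcs)) for child in children]
    head = child_heads[0][1]                        # default: leftmost child's head
    for preferred in HEAD_RULES.get(label, []):
        match = next((h for lab, h in child_heads if lab == preferred), None)
        if match is not None:
            head = match
            break
    for _, h in child_heads:                        # every non-head child depends on the head
        if h != head:
            arcs.append((head, h))
    return head

tree = ("S", ("NP", ("N", "people")),
             ("VP", ("V", "fish"), ("NP", ("N", "tanks"))))
arcs = []
root = find_head_word(tree, arcs)
print(root, arcs)   # fish [('fish', 'tanks'), ('fish', 'people')]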