Natural Language Processing, CS 6840, Lecture 06. Razvan C. Bunescu, School of Electrical Engineering and Computer Science, bunescu@ohio.edu

Statistical Parsing
Define a probabilistic model of syntax P(T|S):
- Probabilistic Context-Free Grammars (PCFG).
- Lexicalized PCFGs: Collins' parser, Charniak's parser, ...
Use the probabilistic model for:
- Statistical parsing: choose the most probable parse:
    T̂(S) = argmax_{T: yield(T)=S} P(T|S)
- Language modeling: compute the probability of a sentence:
    P(S) = Σ_{T: yield(T)=S} P(T, S)

Probabilistic CFG (PCFG)
Augment each rule in a CFG with a conditional probability:
    A → β  [p]   where p = P(A → β) = P(β | A)
The probabilities of all rules with the same left-hand side sum to one:
    Σ_β P(A → β) = 1
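
For instance (a minimal sketch, with illustrative toy probabilities rather than the lecture's grammar), a PCFG can be stored as a mapping from rules to probabilities, and the normalization constraint checked directly:

    from collections import defaultdict

    # Toy PCFG fragment: (LHS, RHS) -> probability (illustrative numbers only).
    rules = {
        ("S", ("NP", "VP")): 0.9,
        ("S", ("VP",)): 0.1,
        ("NP", ("Det", "Nominal")): 0.6,
        ("NP", ("Pronoun",)): 0.4,
    }

    # Probabilities of rules sharing a left-hand side must sum to 1.
    totals = defaultdict(float)
    for (lhs, _), p in rules.items():
        totals[lhs] += p
    for lhs, total in totals.items():
        assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"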

Probability of Parse Trees
Assume rewriting rules are chosen independently: the probability of a derivation is the product of the probabilities of all its productions LHS_i → RHS_i:
    P(T, S) = Π_{i=1}^{n} P(LHS_i → RHS_i)
Statistical parsing: choose the most probable parse:
    T̂(S) = argmax_{T: yield(T)=S} P(T|S) = argmax_{T: yield(T)=S} P(T)
Language modeling: compute the probability of a sentence:
    P(S) = Σ_{T: yield(T)=S} P(T, S) = Σ_{T: yield(T)=S} P(T)
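
As an illustration (a minimal sketch, assuming the same hypothetical rule encoding as above and trees represented as (label, [children]) with plain strings as leaves), the probability of a tree is just the product of the probabilities of the rules used at its internal nodes:

    def tree_probability(node, rules):
        """Probability of a parse tree under a PCFG given as (LHS, RHS) -> prob."""
        label, children = node
        # The right-hand side consists of the child labels (or words, for leaves).
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = rules[(label, rhs)]
        for child in children:
            if not isinstance(child, str):        # recurse into non-terminal children
                p *= tree_probability(child, rules)
        return p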

Probabilities of two parses T1 and T2 of the same sentence (rule probabilities from the example grammar):
    P(T1) = .05 × .20 × .20 × .20 × .75 × .30 × .60 × .10 × .40 = 2.2 × 10^-6
    P(T2) = .05 × .10 × .20 × .15 × .75 × .75 × .30 × .60 × .10 × .40 = 6.1 × 10^-7

PCFGs
    P(T1) = 2.2 × 10^-6        P(T2) = 6.1 × 10^-7
Statistical parsing: choose the most probable parse:
    T̂(S) = argmax_{T: yield(T)=S} P(T) = T1
Language modeling: the probability of the sentence:
    P(S) = Σ_{T: yield(T)=S} P(T) = 2.2 × 10^-6 + 6.1 × 10^-7

HMMs: Inference and Training
Three fundamental questions:
1) Given a model µ = (A, B, Π), compute the probability of a given observation sequence O, i.e. P(O|µ)  (Forward/Backward).
2) Given a model µ and an observation sequence O, compute the most likely hidden state sequence (Viterbi):
    X̂ = argmax_X P(X | O, µ)
3) Given an observation sequence O, find the model µ = (A, B, Π) that best explains the observed data (EM).
   Given both the observation sequence and the state sequence (O, X), find µ by maximum likelihood (ML).

PCFGs: Inference and Training
Three fundamental questions:
1) Given a model µ, compute the probability of a given sentence S, i.e. P(S|µ)  (Inside/Outside):
    P(S) = Σ_{T: yield(T)=S} P(T, S)
2) Given a model µ and a sentence S, compute the most likely parse tree (pCKY):
    T̂(S) = argmax_{T: yield(T)=S} P(T|S) = argmax_{T: yield(T)=S} P(T)
3) Given a set of sentences {S}, find the model µ that best explains the observed data (EM).
   Given sentences and parses {S, T}, find µ by maximum likelihood (ML).

2) Probabilistic CKY (pCKY) Parsing
Given a model µ and a sentence S, compute the most likely parse tree:
    T̂(S) = argmax_{T: yield(T)=S} P(T|S) = argmax_{T: yield(T)=S} P(T)
CKY can be modified for PCFG parsing by including in each cell a probability for each non-terminal:
- T[i,j] must retain the most probable derivation of each constituent (non-terminal) covering words i+1 through j, together with its associated probability.
- When transforming the grammar to Chomsky Normal Form (CNF), production probabilities must be set so as to preserve the probability of derivations (see the sketch below).
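
For example (a minimal sketch under a hypothetical rule encoding; it only handles right-hand sides longer than two symbols, and removing unit productions or introducing pre-terminals would need extra steps), a rule A → B C D with probability p can be split into X → B C [1.0] and A → X D [p], so every derivation keeps its original probability:

    def binarize(rules):
        """Binarize long rules so that derivation probabilities are preserved,
        grouping the leftmost symbols first (S -> Aux NP VP yields X -> Aux NP)."""
        cnf = {}
        for (lhs, rhs), p in rules.items():
            while len(rhs) > 2:
                first_two = rhs[:2]
                new_sym = "_".join(first_two)     # hypothetical name for the new non-terminal
                cnf[(new_sym, first_two)] = 1.0   # helper rule carries probability 1.0
                rhs = (new_sym,) + rhs[2:]        # replace the pair with the new symbol
            cnf[(lhs, rhs)] = p                   # the original probability stays with lhs
        return cnf

With this scheme, S → Aux NP VP [0.1] becomes Aux_NP → Aux NP [1.0] and S → Aux_NP VP [0.1], corresponding to the X1 rules in the CNF grammar below.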

Original Grammar
    S → NP VP                  0.8
    S → Aux NP VP              0.1
    S → VP                     0.1
    NP → Pronoun               0.2
    NP → Proper-Noun           0.2
    NP → Det Nominal           0.6
    Nominal → Noun             0.3
    Nominal → Nominal Noun     0.2
    Nominal → Nominal PP       0.5
    VP → Verb                  0.2
    VP → Verb NP               0.5
    VP → VP PP                 0.3
    PP → Prep NP               1.0

Chomsky Normal Form
    S → NP VP                  0.8
    S → X1 VP                  0.1
    X1 → Aux NP                1.0
    S → book | include | prefer              0.01 | 0.004 | 0.006
    S → Verb NP                0.05
    S → VP PP                  0.03
    NP → I | he | she | me     0.1 | 0.02 | 0.02 | 0.06
    NP → Houston | NWA         0.16 | 0.04
    NP → Det Nominal           0.6
    Nominal → book | flight | meal | money   0.03 | 0.15 | 0.06 | 0.06
    Nominal → Nominal Noun     0.2
    Nominal → Nominal PP       0.5
    VP → book | include | prefer             0.1 | 0.04 | 0.06
    VP → Verb NP               0.5
    VP → VP PP                 0.3
    PP → Prep NP               1.0

Probabilistic CKY (pCKY) Parsing: "Book the flight through Houston"  [Animation by Ray Mooney]
The chart is filled bottom-up; each cell keeps, for every non-terminal, the probability of its most probable derivation:
    Book:     S: .01, VP: .1, Verb: .5, Nominal: .03, Noun: .1
    the:      Det: .6
    flight:   Nominal: .15, Noun: .5
    through:  Prep: .2
    Houston:  NP: .16, PropNoun: .8
    [the flight]                  NP: .6 × .6 × .15 = .054
    [Book the flight]             VP: .5 × .5 × .054 = .0135;  S: .05 × .5 × .054 = .00135
    [through Houston]             PP: 1.0 × .2 × .16 = .032
    [flight through Houston]      Nominal: .5 × .15 × .032 = .0024
    [the flight through Houston]  NP: .6 × .6 × .0024 = .000864
    [Book ... Houston]            S (Verb NP): .05 × .5 × .000864 = .0000216
                                  S (VP PP):   .03 × .0135 × .032 = .00001296
The top cell retains the higher-probability analysis, S: .0000216, the most probable parse.

Probabilistic CKY Algorithm
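
A minimal Python sketch of Viterbi-style probabilistic CKY (a sketch under assumed data structures, not the lecture's own pseudocode): the grammar is assumed to be in CNF, with lexical_rules mapping (A, word) to a probability and binary_rules mapping (A, B, C) to a probability for rules A → B C.

    from collections import defaultdict

    def pcky(words, lexical_rules, binary_rules):
        """Return chart and backpointers for the most probable parse.

        table[(i, j)][A] = probability of the best derivation of A over words[i:j].
        """
        n = len(words)
        table = defaultdict(dict)
        back = defaultdict(dict)

        # Fill the diagonal with lexical rules A -> word.
        for i, word in enumerate(words):
            for (A, w), p in lexical_rules.items():
                if w == word:
                    table[(i, i + 1)][A] = p
                    back[(i, i + 1)][A] = word

        # Fill longer spans bottom-up using binary rules A -> B C.
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):                      # split point
                    for (A, B, C), p in binary_rules.items():
                        if B in table[(i, k)] and C in table[(k, j)]:
                            prob = p * table[(i, k)][B] * table[(k, j)][C]
                            if prob > table[(i, j)].get(A, 0.0):
                                table[(i, j)][A] = prob        # keep the max (Viterbi)
                                back[(i, j)][A] = (k, B, C)
        return table, back

Following the backpointers from the start symbol in cell (0, n) recovers the most probable parse T̂(S).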

1) Observation Probability using pCKY
Given a model µ, compute the probability of a given sentence S, i.e. P(S|µ)  (Inside/Outside):
Use Inside probabilities, the analogue of Backward probabilities in HMMs:
    β_j(p, q) = P(w_pq | N^j_pq, G)
Compute Inside probabilities by replacing max with sum inside the pCKY algorithm (see below).
Or use Outside probabilities, the analogue of Forward probabilities in HMMs.
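
Concretely (a sketch assuming the same hypothetical grammar encoding as the pCKY sketch above), only the cell update changes: the probabilities of alternative derivations are summed instead of maximized:

    def inside(words, lexical_rules, binary_rules, start="S"):
        """Inside algorithm: P(S | mu) as the sum over all parses of the sentence."""
        n = len(words)
        table = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
        for i, word in enumerate(words):
            for (A, w), p in lexical_rules.items():
                if w == word:
                    table[(i, i + 1)][A] = table[(i, i + 1)].get(A, 0.0) + p
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for (A, B, C), p in binary_rules.items():
                        if B in table[(i, k)] and C in table[(k, j)]:
                            # Sum, not max: accumulate over all derivations of A.
                            table[(i, j)][A] = table[(i, j)].get(A, 0.0) + \
                                p * table[(i, k)][B] * table[(k, j)][C]
        return table[(0, n)].get(start, 0.0)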

Probabilistic CKY (pCKY) Parsing: Sum — "Book the flight through Houston"
With the sum version, the top cell accumulates the probabilities of both derivations of S instead of keeping the maximum:
    S: .00001296 + .0000216 = .00003456
Sum the probabilities of each derivation to obtain the probability of the sentence.

PCFG Training: Unsupervised
3) Given a set of sentences {S}, find the model µ that best explains the observed data (EM).
Use Inside-Outside, a generalization of Forward-Backward:
1. Begin with a grammar with equal / random rule probabilities.
2. For each sentence, compute the probability of each parse.
3. Re-estimate the rule probabilities, using the parse probabilities as weights for the counts.
4. Repeat from 2 until the probabilities converge.
Problems: each iteration is slow, O(m³n³); sensitive to initialization, many local maxima; no guarantee that the learned non-terminals correspond to linguistic intuitions / constituents.
Exact algorithm in Manning & Schütze, pages 398-402. A schematic sketch of the loop is given below.
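
A schematic Python skeleton of this loop (not the exact Inside-Outside updates from Manning & Schütze; expected_rule_counts stands in for the inside/outside computation and is assumed, not shown):

    from collections import defaultdict

    def inside_outside_em(sentences, rules, expected_rule_counts, iterations=20):
        """EM skeleton: re-estimate rule probabilities from expected counts."""
        for _ in range(iterations):
            counts = defaultdict(float)
            for sentence in sentences:
                # E-step: expected count of every rule in this sentence's parses,
                # weighted by parse probabilities under the current model.
                for rule, c in expected_rule_counts(sentence, rules).items():
                    counts[rule] += c
            # M-step: normalize the expected counts per left-hand side.
            totals = defaultdict(float)
            for (lhs, _), c in counts.items():
                totals[lhs] += c
            rules = {rule: c / totals[rule[0]] for rule, c in counts.items()}
        return rules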

PCFG Training: Supervised
3) Given sentences and parses {S, T}, find µ (ML): estimate the parameters directly from counts in the treebank:
    P(α → β | α) = count(α → β) / Σ_γ count(α → γ) = count(α → β) / count(α)
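
For example (a minimal sketch using the same hypothetical (label, [children]) tree representation as before), the maximum-likelihood estimates are just normalized rule counts:

    from collections import defaultdict

    def mle_rules(treebank):
        """Estimate P(alpha -> beta | alpha) from a list of parse trees."""
        counts = defaultdict(float)
        def collect(node):
            label, children = node
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            counts[(label, rhs)] += 1              # count this production
            for child in children:
                if not isinstance(child, str):
                    collect(child)
        for tree in treebank:
            collect(tree)
        totals = defaultdict(float)
        for (lhs, _), c in counts.items():
            totals[lhs] += c                       # count(alpha)
        return {rule: c / totals[rule[0]] for rule, c in counts.items()}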

Limitations of Vanilla PCFGs
Poor independence assumptions: cannot model the facts that:
- NPs that are syntactic subjects are far more likely to be pronouns;
- NPs that are syntactic objects are far more likely to be non-pronominal.
Lack of lexical conditioning: can only model a general preference for PP attachment to NPs vs. VPs,
but PPs sometimes attach to NPs and sometimes to VPs, depending on the actual verb, preposition, and noun.

Splitting Non-terminals
Split the NP non-terminal into two versions: one for subjects, one for objects.
Parent annotation: annotate each node with its parent in the parse tree:
- a subject NP is annotated as NP^S;
- an object NP is annotated as NP^VP.
(A sketch of the transform is given below.)
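
As an illustration (a minimal sketch over the hypothetical (label, [children]) tree representation used earlier), parent annotation is a simple top-down relabeling:

    def parent_annotate(node, parent=None):
        """Return a copy of the tree with each non-terminal label annotated
        with the label of its parent, e.g. an NP under S becomes NP^S."""
        label, children = node
        new_label = f"{label}^{parent}" if parent is not None else label
        new_children = [
            c if isinstance(c, str) else parent_annotate(c, label)
            for c in children
        ]
        return (new_label, new_children)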

Splitting Non-terminals
Split pre-terminals as well, e.g. to allow "if" to prefer sentential complements:

Split and Merge
Node splitting increases the size of the grammar, so we need to find the right level of granularity: automatically search for the optimal splits [Petrov et al., 2006]:
- start with a simple X-bar grammar;
- alternate between splitting and merging non-terminals;
- stop when the likelihood of the training treebank is maximized.
Alternatively, use hand-written rules to find an optimal number of non-terminals [Klein and Manning, 2003].

The Argument for Lexicalization
Compare the two attachment analyses:
    VP → VBD NP PP                 (verb attachment)
    VP → VBD NP,  NP → NP PP       (noun attachment)
If preference is given to verb attachment, then the PCFG gets the wrong parse for "fishermen caught tons of herring".

Lexicalized PCFGs
Lexicalized grammar: in every rule, associate each non-terminal symbol with its lexical head and head tag:
    VP(dumped, VBD) → VBD(dumped, VBD) NP(sacks, NNS) PP(into, IN)

Lexicalized PCFGs
Lexicalized grammar: in every rule, associate each non-terminal symbol with its lexical head and head tag; it is important to have rules for head identification:
    VP(dumped, VBD) → VBD(dumped, VBD) NP(sacks, NNS) PP(into, IN)
Estimating the corresponding probabilities directly is not feasible, due to sparse counts.
Need to make further independence assumptions → the Collins parser.

Collins Parser
All rules are expressed as:
    P(h) → L_{n+1} L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m) R_{m+1}
    where L_{n+1} = STOP, R_{m+1} = STOP
Generative story:
1. Generate the head label of the phrase: P_H(H | P, h).
2. Generate the modifiers to the left of the head, independently given the head information: P_L(L_i(l_i) | P, h, H); stop when STOP is generated.
3. Generate the modifiers to the right of the head, independently given the head information: P_R(R_i(r_i) | P, h, H); stop when STOP is generated.

Workers [VP dumped sacks into bins].
    VP(dumped, VBD) → STOP VBD(dumped, VBD) NP(sacks, NNS) PP(into, IN) STOP
    P(h) → L_{n+1} L_n(l_n) ... L_1(l_1) H(h) R_1(r_1) ... R_m(r_m) R_{m+1}
    n = 0, m = 2
    P = VP, H = VBD, L_1 = STOP, R_1 = NP, R_2 = PP, R_3 = STOP
    h = (dumped, VBD), r_1 = (sacks, NNS), r_2 = (into, IN)
The probability of the rule decomposes as:
    P_H(VBD | VP, dumped)
    × P_L(STOP | VP, VBD, dumped)
    × P_R(NP(sacks, NNS) | VP, VBD, dumped)
    × P_R(PP(into, IN) | VP, VBD, dumped)
    × P_R(STOP | VP, VBD, dumped)

Collins Parser: Training
Estimate P_H, P_L and P_R from treebank data:
    P_R(PP(into, IN) | VP(dumped, VBD)) =
        Count(PP(into, IN) to the right of the head in a VP(dumped, VBD) production)
        / Count(any symbol to the right of the head in a VP(dumped, VBD) production)
Smooth the estimates by linearly interpolating with simpler models conditioned on just the POS tag, or on no lexical information at all:
    P_R(PP(into, IN) | VP(dumped, VBD)) =
        λ_1 P_R(PP(into, IN) | VP(dumped, VBD))
        + (1 - λ_1) (λ_2 P_R(PP(into, IN) | VP(VBD)) + (1 - λ_2) P_R(PP(into, IN) | VP))
The interpolation weights are set with Witten-Bell discounting.
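
As a small illustration (a sketch with hypothetical arguments p_full, p_tag and p_backoff standing in for the three levels of conditioning; the weights would come from Witten-Bell discounting in the actual parser), the interpolation is just:

    def smoothed_p_right(p_full, p_tag, p_backoff, lam1, lam2):
        """Linearly interpolate three estimates of P_R, from most to least specific."""
        return lam1 * p_full + (1.0 - lam1) * (lam2 * p_tag + (1.0 - lam2) * p_backoff)

    # Example call with made-up numbers:
    # smoothed_p_right(p_full=0.20, p_tag=0.12, p_backoff=0.05, lam1=0.6, lam2=0.7)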

Collins Parser
Model 1 also conditions on a distance feature, a function of the words between the modifier and the head:
- is the distance 0?
- do the intervening words contain a verb?
Model 2 adds more sophisticated features:
- condition on the subcategorization frames of each verb;
- distinguish arguments from adjuncts (e.g. in "IBM bought Lotus yesterday", "Lotus" is an argument, "yesterday" an adjunct).
The parsing algorithm is an extension of pCKY.

Dependency Parsing
Non-projective trees: Maximum Spanning Trees, Chu-Liu-Edmonds algorithm: O(n²).
Projective trees: Dynamic Programming, Eisner algorithm: O(n³).
McDonald, Pereira, Ribarov, Hajič, "Non-projective Dependency Parsing using Spanning Tree Algorithms", HLT-EMNLP 2005.