CS460/626 : Natural Language Processing/Speech, NLP and the Web


CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 23, 24: Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 8th and 10th March, 2011 (Lectures 21 and 22 were on Sentiment Analysis by Aditya Joshi)

A note on Language Modeling
Example sentence: ^ The tortoise beat the hare in the race.

N-gram (n=3), guided by frequency:
  ^ the tortoise      5*10^-3
  the tortoise beat   3*10^-2
  tortoise beat the   7*10^-5
  beat the hare       5*10^-6

CFG, guided by language knowledge:
  S → NP VP; NP → DT N; VP → V NP PP; PP → P NP

Probabilistic CFG:
  S → NP VP (1.0); NP → DT N (0.5); VP → V NP PP (0.4); PP → P NP (1.0)

Dependency Grammar, guided by world knowledge:
  Semantic Roles (agt, obj, sen, etc.); Semantic Rules are always between Heads.
  Probabilistic DG: Semantic Roles with probabilities.

Parse Tree
(S (NP (DT The) (N tortoise))
   (VP (V beat)
       (NP (DT the) (N hare))
       (PP (P in) (NP (DT the) (N race)))))

UNL Expression
beat@past
  agt → tortoise@def
  obj → hare@def
  scn (scene) → race@def

Purpose of LM
- Prediction of the next word (Speech Processing)
- Language identification (for the same script)
- Belongingness check (parsing)
P(NP → DT N) means: what is the probability that the YIELD of the non-terminal NP is DT N?

Need for Deep Parsing
Sentences are linear structures. But there is a hierarchy (a tree) hidden behind the linear structure: there are constituents and branches.

PPs are at the same level: flat with respect to the head word "book". No distinction in terms of dominance or c-command.
NP → The AP(big) book PP(of poems) PP(with the blue cover)
[The big book of poems with the blue cover] is on the table.

Constituency test of Replacement runs into problems.
One-replacement: I bought the big [book of poems with the blue cover], not the small [one]. One-replacement targets "book of poems with the blue cover".
Another one-replacement: I bought the big [book of poems] with the blue cover, not the small [one] with the red cover. One-replacement targets "book of poems".

More deeply embedded structure
(NP The
    (N1 (AP big)
        (N2 (N3 (N book)
                (PP of poems))
            (PP with the blue cover))))

Grammar and Parsing Algorithms

A simplified grammar
S → NP VP
NP → DT N | N
VP → V ADV | V

Example Sentence
People laugh
1      2     3   (these are positions)
Lexicon: People - N, V; laugh - N, V
This indicates that both Noun and Verb are possible for the word "People" (and likewise for "laugh").

Top-Down Parsing
Step  State          Backup State  Action
1.    ((S) 1)        -             -                  (1 is the position of the input pointer)
2.    ((NP VP) 1)    -             -
3a.   ((DT N VP) 1)  ((N VP) 1)    -
3b.   ((N VP) 1)     -             -
4.    ((VP) 2)       -             Consume "People"
5a.   ((V ADV) 2)    ((V) 2)       -
6.    ((ADV) 3)      ((V) 2)       Consume "laugh"
5b.   ((V) 2)        -             -
6.    ((.) 3)        -             Consume "laugh"
Termination condition: all inputs over, no symbols remaining. Note: input symbols can be pushed back.

Discussion for Top-Down Parsing
This kind of searching is goal driven. It gives importance to textual precedence (rule precedence). There is no regard for the data, a priori (useless expansions are made).
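The trace above amounts to a depth-first, backtracking top-down parser. Below is a minimal sketch (my own illustration, not the lecture's code), assuming the simplified grammar and the ambiguous lexicon from the previous slides:

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
# Both words are lexically ambiguous between Noun and Verb.
LEXICON = {"People": {"N", "V"}, "laugh": {"N", "V"}}

def expand(symbols, pos, words):
    """Top-down, depth-first: expand the leftmost symbol, backtrack on failure."""
    if not symbols:
        return pos == len(words)          # success iff all input is consumed
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                   # non-terminal: try each rule in textual order
        return any(expand(rule + rest, pos, words) for rule in GRAMMAR[head])
    # pre-terminal: consume the next word if its lexicon entry matches
    if pos < len(words) and head in LEXICON.get(words[pos], set()):
        return expand(rest, pos + 1, words)
    return False

print(expand(["S"], 0, ["People", "laugh"]))  # True
```

Note that such a parser loops forever on a left-recursive rule like A → A B, which is exactly the left-recursion problem discussed in a later slide.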

Bottom-Up Parsing
Some conventions:
N_12: the subscripts represent positions (here, N spans positions 1 to 2).
S_1? → NP_12 VP_2? (end position unknown): work on the LHS is done for the NP part, while work on the VP part of the RHS remains.

Bottom-Up Parsing (pictorial representation)
People   laugh
1        2        3
N_12, V_12        N_23, V_23
NP_12 → N_12      NP_23 → N_23
VP_12 → V_12      VP_23 → V_23
S_1? → NP_12 VP_2?
S → NP_12 VP_23

Problem with Top-Down Parsing: Left Recursion
Suppose you have the rule A → A B. Then we will have the expansion as follows:
((A) K) → ((A B) K) → ((A B B) K) → ...

Combining top-down and bottom-up strategies

Top-Down Bottom-Up Chart Parsing
Combines advantages of top-down & bottom-up parsing. Does not work in case of left recursion.
e.g. People laugh
People → noun, verb; laugh → noun, verb
Grammar:
S → NP VP
NP → DT N | N
VP → V ADV | V

Transitive Closure
People laugh
1       2      3
At position 1: S → • NP VP, NP → • N, NP → • DT N
After "People" (position 2): NP → N •, S → NP • VP, VP → • V, VP → • V ADV
After "laugh" (position 3): VP → V •, S → NP VP •  (success)

Arcs in Parsing
Each arc represents a dotted rule in the chart, which records:
- Completed work (left of the dot •)
- Expected work (right of the dot •)

Example
People laugh loudly
1       2      3       4
At position 1: S → • NP VP, NP → • N, NP → • DT N
After "People": NP → N •, S → NP • VP, VP → • V, VP → • V ADV
After "laugh": VP → V •, VP → V • ADV, S → NP VP •
After "loudly": VP → V ADV •, S → NP VP •
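The dotted-rule charts above can be filled mechanically. Below is a compact Earley-style sketch (the item layout and names are my own, not from the lecture): each item (lhs, rhs, dot, origin) records completed work left of the dot and expected work right of it.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "N"), ("N",)],
    "VP": [("V", "ADV"), ("V",)],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}, "loudly": {"ADV"}}

def earley(words):
    """Chart parsing with dotted rules: items are (lhs, rhs, dot, origin)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    add(0, ("GAMMA", ("S",), 0, 0))       # dummy start item: GAMMA -> . S
    for i in range(n + 1):
        for item in chart[i]:             # chart[i] may grow as we scan it
            lhs, rhs, dot, origin = item
            if dot < len(rhs) and rhs[dot] in GRAMMAR:        # predictor (top-down)
                for prod in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], prod, 0, i))
            elif dot < len(rhs):                              # scanner (bottom-up)
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:                                             # completer
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
    return ("GAMMA", ("S",), 1, 0) in chart[n]

print(earley("people laugh loudly".split()))  # True
```

The predictor expands non-terminals top-down, the scanner consumes words from the lexicon bottom-up, and the completer advances dots over finished constituents, combining the two strategies as the slide describes.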

Dealing With Structural Ambiguity
Multiple parses for a sentence:
The man saw the boy with a telescope.
The man saw the mountain with a telescope.
The man saw the boy with the ponytail.
At the level of syntax, all these sentences are ambiguous. But semantics can disambiguate the 2nd & 3rd sentences.

Prepositional Phrase (PP) Attachment Problem
V NP1 P NP2 (here P means preposition)
Does NP2 attach to NP1, or does NP2 attach to V?

Parse Trees for a Structurally Ambiguous Sentence
Let the grammar be:
S → NP VP
NP → DT N | DT N PP
PP → P NP
VP → V NP PP | V NP
For the sentence: I saw a boy with a telescope

Parse Tree 1
(S (NP (N I))
   (VP (V saw)
       (NP (Det a) (N boy)
           (PP (P with)
               (NP (Det a) (N telescope))))))

Parse Tree 2
(S (NP (N I))
   (VP (V saw)
       (NP (Det a) (N boy))
       (PP (P with)
           (NP (Det a) (N telescope)))))

Parsing Structural Ambiguity

Parsing for Structurally Ambiguous Sentences
Sentence: I saw a boy with a telescope
Grammar:
S → NP VP
NP → ART N | ART N PP | PRON
VP → V NP PP | V NP
ART → a | an | the
N → boy | telescope
PRON → I
V → saw

Ambiguous Parses
Two possible parses:
PP attached with Verb (i.e., I used a telescope to see):
(S (NP (PRON I)) (VP (V saw) (NP (ART a) (N boy)) (PP (P with) (NP (ART a) (N telescope)))))
PP attached with Noun (i.e., the boy had a telescope):
(S (NP (PRON I)) (VP (V saw) (NP (ART a) (N boy) (PP (P with) (NP (ART a) (N telescope))))))

Top Down Parse
Step  State              Backup State             Action / Comments
1     ((S) 1)            -                        Use S → NP VP
2     ((NP VP) 1)        -                        Use NP → ART N | ART N PP | PRON
3     ((ART N VP) 1)     (a) ((ART N PP VP) 1)    ART does not match "I";
                         (b) ((PRON VP) 1)        backup state (b) used
3B    ((PRON VP) 1)      -                        -
4     ((VP) 2)           -                        Consumed "I"
5     ((V NP PP) 2)      ((V NP) 2)               Verb Attachment Rule used
6     ((NP PP) 3)        -                        Consumed "saw"
7     ((ART N PP) 3)     (a) ((ART N PP PP) 3)    -
                         (b) ((PRON PP) 3)
8     ((N PP) 4)         -                        Consumed "a"
9     ((PP) 5)           -                        Consumed "boy"
10    ((P NP) 5)         -                        -
11    ((NP) 6)           -                        Consumed "with"
12    ((ART N) 6)        (a) ((ART N PP) 6)       -
                         (b) ((PRON) 6)
13    ((N) 7)            -                        Consumed "a"
14    (( ) 8)            -                        Consumed "telescope"; Finish Parsing

Top Down Parsing - Observations
Top-down parsing gave us the Verb Attachment parse tree (i.e., I used a telescope).
To obtain the alternate parse tree, the backup state in step 5 will have to be invoked.
Is there an efficient way to obtain all parses?

Bottom Up Parse
I  saw  a  boy  with  a  telescope
1  2    3  4    5     6  7          8
Colour scheme (of the original slides): Blue for Normal Parse, Green for Verb Attachment Parse, Purple for Noun Attachment Parse, Red for Invalid Parse.

Bottom Up Parse (cumulative chart)
I  saw  a  boy  with  a  telescope
1  2    3  4    5     6  7          8
NP_12 → PRON_12
S_1? → NP_12 VP_2?
VP_2? → V_23 NP_3?
VP_2? → V_23 NP_3? PP_??
NP_35 → ART_34 N_45
NP_3? → ART_34 N_45 PP_5?
VP_25 → V_23 NP_35
S_15 → NP_12 VP_25
VP_2? → V_23 NP_35 PP_5?
PP_5? → P_56 NP_6?
NP_68 → ART_67 N_78
NP_6? → ART_67 N_78 PP_8?
PP_58 → P_56 NP_68
NP_38 → ART_34 N_45 PP_58
VP_28 → V_23 NP_35 PP_58   (verb attachment)
VP_28 → V_23 NP_38         (noun attachment)
S_18 → NP_12 VP_28

Bottom Up Parsing - Observations
Both the Noun Attachment and Verb Attachment parses are obtained by simply and systematically applying the rules.
The numbers in subscript help in verifying the parse and in getting chunks from the parse.
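This systematic rule application over spans can be sketched as a small bottom-up enumerator (a hypothetical illustration of mine, using the ART/PRON grammar from the earlier slides); it recovers exactly the verb-attachment and noun-attachment parses:

```python
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("ART", "N"), ("ART", "N", "PP"), ("PRON",)],
    "VP": [("V", "NP", "PP"), ("V", "NP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "ART": {"a", "an", "the"}, "N": {"boy", "telescope"},
    "PRON": {"I"}, "V": {"saw"}, "P": {"with"},
}

def parses(symbol, words, i, j):
    """All trees rooted in `symbol` spanning words[i:j] (0-based, j exclusive)."""
    trees = []
    if j - i == 1 and words[i] in LEXICON.get(symbol, set()):
        trees.append((symbol, words[i]))
    for rhs in GRAMMAR.get(symbol, ()):
        for kids in splits(rhs, words, i, j):
            trees.append((symbol,) + kids)
    return trees

def splits(rhs, words, i, j):
    """All ways to realize the RHS symbols over the span [i, j)."""
    if not rhs:
        if i == j:
            yield ()
        return
    for k in range(i + 1, j + 1):
        for first in parses(rhs[0], words, i, k):
            for rest in splits(rhs[1:], words, k, j):
                yield (first,) + rest

sent = "I saw a boy with a telescope".split()
trees = parses("S", sent, 0, len(sent))
print(len(trees))  # 2: verb attachment and noun attachment
```

Memoizing `parses` on (symbol, i, j) would turn this exhaustive enumeration into a packed chart, which is the efficient answer to the question raised in the top-down observations.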

Exercise For the sentence, The man saw the boy with a telescope & the grammar given previously, compare the performance of top-down, bottom-up & top-down chart parsing.

Start of Probabilistic Parsing

Example of Sentence Labeling: Parsing
[S1 [S [S [VP [VB Come] [NP [NNP July]]]]
    [, ,] [CC and]
    [S [NP [DT the] [JJ IIT] [NN campus]]
       [VP [AUX is]
           [ADJP [JJ abuzz]
                 [PP [IN with]
                     [NP [ADJP [JJ new] [CC and] [VBG returning]]
                         [NNS students]]]]]]
    [. .]]]

Noisy Channel Modeling
Source sentence → Noisy Channel → Target parse
T* = argmax_T [P(T | S)]
   = argmax_T [P(T) . P(S | T)]
   = argmax_T [P(T)], since given the parse the sentence is completely determined, and P(S | T) = 1

Corpus
A collection of text, called a corpus, is used for collecting various language data.
With annotation: more information, but manual-labor intensive.
Practice: label automatically; correct manually.
The famous Brown Corpus contains 1 million tagged words.
Switchboard: a very famous corpus of 2400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics.

Discriminative vs. Generative Model
W* = argmax_W P(W | SS)
Discriminative Model: compute P(W | SS) directly.
Generative Model: compute it from P(W) . P(SS | W).

Language Models
N-grams: sequences of n consecutive words/characters.
Probabilistic / Stochastic Context Free Grammars: simple probabilistic models capable of handling recursion; a CFG with probabilities attached to rules.
Rule probabilities: how likely is it that a particular rewrite rule is used?

PCFGs: Why PCFGs?
- Intuitive probabilistic models for tree-structured languages
- Algorithms are extensions of HMM algorithms
- Better than the n-gram model for language modeling

Formal Definition of PCFG
A PCFG consists of:
- A set of terminals {w_k}, k = 1, ..., V, e.g. {w_k} = {child, teddy, bear, played}
- A set of non-terminals {N^i}, i = 1, ..., n, e.g. {N^i} = {NP, VP, DT}
- A designated start symbol N^1
- A set of rules {N^i → ζ_j}, where ζ_j is a sequence of terminals & non-terminals, e.g. NP → DT NN
- A corresponding set of rule probabilities

Rule Probabilities
Rule probabilities are such that, for every i: Σ_j P(N^i → ζ_j) = 1
E.g., P(NP → DT NN) = 0.2
      P(NP → NN) = 0.5
      P(NP → NP PP) = 0.3
P(NP → DT NN) = 0.2 means that 20% of the NP expansions in the training data parses use the rule NP → DT NN.
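The "20% of the expansions" reading corresponds to a maximum-likelihood estimate from treebank counts: P(N → ζ) = Count(N → ζ) / Count(N). A sketch over a one-tree mini-treebank (the tree encoding and helper names are hypothetical, not from the lecture):

```python
from collections import Counter

# Hypothetical mini-treebank: each tree is a (label, children...) nested tuple.
trees = [
    ("S", ("NP", ("DT", "the"), ("NN", "gunman")),
          ("VP", ("VBD", "sprayed"), ("NP", ("DT", "the"), ("NN", "building")))),
]

rule_counts = Counter()
lhs_counts = Counter()

def collect(node):
    """Count every rule N -> zeta used in the tree, plus LHS occurrences."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)               # lexical rule, e.g. DT -> "the"
    else:
        rhs = tuple(c[0] for c in children)
        for c in children:
            collect(c)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

for t in trees:
    collect(t)

# MLE: probabilities for each LHS sum to 1 by construction.
probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```

Here both NP nodes expand as DT NN, so P(NP → DT NN) comes out as 1.0, while NN splits evenly between "gunman" and "building".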

Probabilistic Context Free Grammars
S → NP VP     1.0        DT → the       1.0
NP → DT NN    0.5        NN → gunman    0.5
NP → NNS      0.3        NN → building  0.5
NP → NP PP    0.2        VBD → sprayed  1.0
PP → P NP     1.0        NNS → bullets  1.0
VP → VP PP    0.6
VP → VBD NP   0.4

Example Parse t1
The gunman sprayed the building with bullets.
(S_1.0 (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
       (VP_0.6 (VP_0.4 (VBD_1.0 sprayed)
                       (NP_0.5 (DT_1.0 the) (NN_0.5 building)))
               (PP_1.0 (P_1.0 with)
                       (NP_0.3 (NNS_1.0 bullets)))))
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2
The gunman sprayed the building with bullets.
(S_1.0 (NP_0.5 (DT_1.0 The) (NN_0.5 gunman))
       (VP_0.4 (VBD_1.0 sprayed)
               (NP_0.2 (NP_0.5 (DT_1.0 the) (NN_0.5 building))
                       (PP_1.0 (P_1.0 with)
                               (NP_0.3 (NNS_1.0 bullets))))))
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
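The probability of a parse is just the product of the probabilities of all rules used in it. A sketch scoring t2 (the tuple encoding and helper are my own; the P → with probability 1.0 is an assumption, since the slide's grammar lists no rule for P):

```python
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,   # assumed: the slide's grammar lists no rule for P
}

def tree_prob(node):
    """P(tree) = product of the probabilities of every rule used in it."""
    label, *children = node
    if not children:
        return 1.0
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PCFG[(label, rhs)]
    for c in children:
        if not isinstance(c, str):       # recurse into subtrees, skip leaf words
            p *= tree_prob(c)
    return p

# Noun-attachment parse t2 from the slide above.
t2 = ("S",
      ("NP", ("DT", "the"), ("NN", "gunman")),
      ("VP", ("VBD", "sprayed"),
             ("NP", ("NP", ("DT", "the"), ("NN", "building")),
                    ("PP", ("P", "with"), ("NP", ("NNS", "bullets"))))))
p2 = tree_prob(t2)   # matches the slide's hand computation (approx. 0.0015)
```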

Probability of a sentence
Notation: w_ab = the subsequence w_a ... w_b; N^j dominates w_a ... w_b, i.e. yield(N^j) = w_a ... w_b (e.g. "the ... sweet ... teddy ... bear").
Probability of a sentence:
P(w_1m) = Σ_t P(w_1m, t)
        = Σ_t P(t) P(w_1m | t)
        = Σ_{t : yield(t) = w_1m} P(t)
since P(w_1m | t) = 1 if t is a parse tree for the sentence w_1m.

Assumptions of the PCFG model
- Place invariance: P(NP → DT NN) is the same in locations 1 and 2.
- Context-free: P(NP → DT NN | anything outside "The child") = P(NP → DT NN)
- Ancestor-free: at 2, P(NP → DT NN | its ancestor is VP) = P(NP → DT NN)
(Location 1: [NP The child] directly under S; location 2: [NP The toy] under VP.)

Probability of a parse tree
Domination: we say N^j dominates the words from k to l, symbolized as N^j_{k,l}, if w_{k,l} is derived from N^j.
P(tree | sentence) = P(tree | S_{1,l}), where S_{1,l} means that the start symbol S dominates the word sequence w_{1,l}.
P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide).

Probability of a parse tree (cont.)
Tree: S_{1,l} → NP_{1,2} VP_{3,l}; NP_{1,2} → DT_1 N_2; VP_{3,l} → V_{3,3} PP_{4,l}; PP_{4,l} → P_{4,4} NP_{5,l}; words w_1 w_2 w_3 w_4 w_5 ... w_l.
P(t | s) = P(t | S_{1,l})
= P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5...l} | S_{1,l})
= P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2}) * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5...l} | NP_{5,l})
(using the chain rule, context-freeness and ancestor-freeness)

HMM vs. PCFG
O: observed sequence   corresponds to   w_1m: sentence
X: state sequence      corresponds to   t: parse tree
μ: model               corresponds to   G: grammar
Three fundamental questions:
1. How likely is a certain observation given the model, P(O | μ)? For PCFGs: how likely is a sentence given the grammar, P(w_1m | G)?
2. How to choose a state sequence which best explains the observations, argmax_X P(X | O, μ)? For PCFGs: how to choose a parse which best supports the sentence, argmax_t P(t | w_1m, G)?
3. How to choose the model parameters that best explain the observed data, argmax_μ P(O | μ)? For PCFGs: how to choose rule probabilities which maximize the probabilities of the observed sentences, argmax_G P(w_1m | G)?

Interesting Probabilities
The gunman sprayed the building with bullets
1    2      3       4   5        6    7
Inside probability: what is the probability of having an NP at this position such that it will derive "the building"? That is β_NP(4,5).
Outside probability: what is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"? That is α_NP(4,5).

Interesting Probabilities (cont.)
Random variables to be considered:
- The non-terminal being expanded, e.g. NP
- The word-span covered by the non-terminal, e.g. (4,5) refers to the words "the building"
While calculating probabilities, consider:
- The rule to be used for expansion, e.g. NP → DT NN
- The probabilities associated with the RHS non-terminals, e.g. the DT subtree's inside/outside probabilities & the NN subtree's inside/outside probabilities

Outside Probability
α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all words outside w_p ... w_q:
α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)
(Picture: N^1 at the root, N^j spanning w_p ... w_q, with w_1 ... w_{p-1} and w_{q+1} ... w_m outside it.)

Inside Probabilities
β_j(p,q): the probability of generating the words w_p ... w_q starting with the non-terminal N^j_{pq}:
β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
(Picture: α covers the region outside N^j, β the region inside it.)

Outside & Inside Probabilities: example
The gunman sprayed the building with bullets
1    2      3       4   5        6    7
α_NP(4,5) = P(The gunman sprayed, NP_{4,5}, with bullets | G), for "the building"
β_NP(4,5) = P(the building | NP_{4,5}, G)
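The inside probability β_j(p,q) can be computed recursively over spans, summing over rules and split points. A sketch for the toy PCFG of the earlier slides (memoized recursion; naming is mine, and the P → with probability 1.0 is an assumption since the slide's grammar lists no rule for P):

```python
from functools import lru_cache

RULES = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.5), (("NNS",), 0.3), (("NP", "PP"), 0.2)],
    "PP": [(("P", "NP"), 1.0)],
    "VP": [(("VP", "PP"), 0.6), (("VBD", "NP"), 0.4)],
}
LEX = {
    ("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
    ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0,
    ("P", "with"): 1.0,  # assumed: the slide's grammar lists no rule for P
}
WORDS = "the gunman sprayed the building with bullets".split()

@lru_cache(maxsize=None)
def inside(sym, p, q):
    """beta_sym(p,q) = P(w_p ... w_{q-1} | sym spans [p, q)), 0-based indices."""
    total = 0.0
    if q - p == 1:
        total += LEX.get((sym, WORDS[p]), 0.0)    # lexical expansion
    for rhs, prob in RULES.get(sym, []):
        if len(rhs) == 1:                         # unary rule, e.g. NP -> NNS
            total += prob * inside(rhs[0], p, q)
        else:                                     # binary rule: sum over splits d
            left, right = rhs
            total += prob * sum(inside(left, p, d) * inside(right, d, q)
                                for d in range(p + 1, q))
    return total

sentence_prob = inside("S", 0, len(WORDS))
```

inside("S", 0, 7) is P(w_1m | G), the total probability of the sentence, summed over both its parses (verb and noun attachment).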