
LECTURER: BURCU CAN 2017-2018 Spring

The Chomsky hierarchy pairs language classes with probabilistic models: Hidden Markov Models (HMMs) correspond to regular languages, and Probabilistic Context Free Grammars (PCFGs) to context-free languages, which sit below context-sensitive and unrestricted languages. PCFGs can therefore model a more powerful class of languages than HMMs. Can we take advantage of this property?

Production Rule: <Left-Hand Side> → <Right-Hand Side> (Probability)

Example Grammar:
S → N V (1.0)
N → Bob (0.3) | Jane (0.7)
V → V N (0.4) | loves (0.6)

Example Parse: S expands to [S [N Jane] [V [V loves] [N Bob]]], i.e. "Jane loves Bob".
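As a sketch of the generative process these weighted rules define, the following Python snippet samples sentences from the toy grammar above (the dictionary encoding and the generate helper are this transcription's own illustration, not from the lecture):

    import random

    # Toy PCFG from the slide: each non-terminal maps to (right-hand side, probability) pairs.
    GRAMMAR = {
        "S": [(("N", "V"), 1.0)],
        "N": [(("Bob",), 0.3), (("Jane",), 0.7)],
        "V": [(("V", "N"), 0.4), (("loves",), 0.6)],
    }

    def generate(symbol="S"):
        """Expand a symbol top-down, choosing rules according to their probabilities."""
        if symbol not in GRAMMAR:          # terminal word
            return [symbol]
        rhss, probs = zip(*GRAMMAR[symbol])
        rhs = random.choices(rhss, weights=probs)[0]
        return [word for part in rhs for word in generate(part)]

    print(" ".join(generate()))            # e.g. "Jane loves Bob"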

Natural Language Processing: parsing written sentences
BioInformatics: RNA sequences
Stock Markets: modeling the rise/fall of the Dow Jones
Computer Vision: parsing architectural scenes

Statistical parsing uses a probabilistic model of syntax in order to assign a probability to each parse tree. It provides a principled approach to resolving syntactic ambiguity, and allows supervised learning of parsers from treebanks of parse trees provided by human linguists. It also allows unsupervised learning of parsers from unannotated text, but the accuracy of such parsers has been limited.

A PCFG is a probabilistic version of a CFG where each production has a probability. Probabilities of all productions rewriting a given non-terminal must add to 1, defining a distribution for each non-terminal.
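A minimal sketch of that well-formedness check, assuming the grammar is stored as (LHS, RHS, probability) triples (a representation chosen here only for illustration):

    from collections import defaultdict

    # (LHS, RHS, probability) triples for the toy grammar above.
    RULES = [
        ("S", ("N", "V"), 1.0),
        ("N", ("Bob",), 0.3), ("N", ("Jane",), 0.7),
        ("V", ("V", "N"), 0.4), ("V", ("loves",), 0.6),
    ]

    def is_proper(rules, tol=1e-9):
        """Return True if the probabilities of rules rewriting each LHS sum to 1."""
        totals = defaultdict(float)
        for lhs, _, p in rules:
            totals[lhs] += p
        return all(abs(total - 1.0) < tol for total in totals.values())

    print(is_proper(RULES))  # True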

S → NP VP
VP → V NP
VP → V PP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
PP → P NP

N → people | fish | tanks | rods
V → people | fish | tanks
P → with

Example sentences: "people fish tanks", "people fish with rods"

S → NP VP 1.0
VP → V NP 0.6
VP → V NP PP 0.4
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3
P → with 1.0

P(t): the probability of a tree t is the product of the probabilities of the rules used to generate it. P(s): the probability of a string s is the sum of the probabilities of the trees which have that string as their yield: P(s) = Σ_j P(s, t_j) = Σ_j P(t_j), where t_j ranges over the parses of s.

EXAMPLE - 1

s = "people fish tanks with rods"
P(t₁) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.0008232
P(t₂) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.00024696
P(s) = P(t₁) + P(t₂) = 0.0008232 + 0.00024696 = 0.00107016
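The arithmetic above can be checked directly; a small Python sketch, listing only the rule probabilities of each parse (the tree structure itself is omitted):

    from math import prod

    # Rule probabilities used in each parse of "people fish tanks with rods".
    t1 = [1.0, 0.7, 0.4, 0.5, 0.6, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]
    t2 = [1.0, 0.7, 0.6, 0.5, 0.6, 0.2, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]

    p_t1, p_t2 = prod(t1), prod(t2)   # ≈ 0.0008232 and 0.00024696
    p_s = p_t1 + p_t2                 # ≈ 0.00107016: sum over all parses of s
    print(p_t1, p_t2, p_s)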

EXAMPLE - 2

All rules are of the form X → Y Z or X → w, where X, Y, Z ∈ N (non-terminals) and w ∈ T (terminals). A transformation to this form doesn't change the generative capacity of a CFG: it recognizes the same language, but maybe with different trees. Empties and unaries are removed recursively, and n-ary rules (n > 2) are divided by introducing new non-terminals, as sketched below.
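A sketch of the n-ary-to-binary step, using intermediate symbols in the spirit of the @VP_V / @S_V names introduced below (the binarize helper is illustrative, not the lecture's exact procedure, and does not handle name collisions):

    def binarize(rules):
        """Split rules with more than two RHS symbols into chains of binary rules.

        Each input rule is (lhs, rhs_tuple, prob); the new intermediate symbols
        carry probability 1.0 so the probability of every derivation is preserved.
        """
        out = []
        for lhs, rhs, prob in rules:
            while len(rhs) > 2:
                new_sym = f"@{lhs}_{rhs[0]}"           # e.g. VP, (V, NP, PP) -> @VP_V
                out.append((lhs, (rhs[0], new_sym), prob))
                lhs, rhs, prob = new_sym, rhs[1:], 1.0
            out.append((lhs, rhs, prob))
        return out

    print(binarize([("VP", ("V", "NP", "PP"), 0.4)]))
    # [('VP', ('V', '@VP_V'), 0.4), ('@VP_V', ('NP', 'PP'), 1.0)]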

Observation likelihood: to classify and order sentences.
Most likely derivation: to determine the most likely parse tree for a sentence.
Maximum likelihood training: to train a PCFG to fit empirical training data.

There is an analog to the Viterbi algorithm to efficiently determine the most probable derivation (parse tree) for a sentence.

Example English PCFG:
S → NP VP 0.9
S → VP 0.1
NP → Det A N 0.5
NP → NP PP 0.3
NP → PropN 0.2
A → ε 0.6
A → Adj A 0.4
PP → Prep NP 1.0
VP → V NP 0.7
VP → VP PP 0.3

Input: "John liked the dog in the pen." The PCFG parser must choose between the parse in which the PP "in the pen" attaches to the VP and the parse in which it attaches to the NP "the dog", and returns the more probable derivation.

CKY can be modified for PCFG parsing by including in each cell a probability for each non-terminal. Cell[i,j] must retain the most probable derivation of each constituent (non-terminal) covering words i+1 through j, together with its associated probability. When transforming the grammar to CNF, production probabilities must be set so as to preserve the probability of derivations.

Original Grammar:
S → NP VP 0.8
S → Aux NP VP 0.1
S → VP 0.1
NP → Pronoun 0.2
NP → Proper-Noun 0.2
NP → Det Nominal 0.6
Nominal → Noun 0.3
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → Verb 0.2
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0

Chomsky Normal Form:
S → NP VP 0.8
S → X1 VP 0.1
X1 → Aux NP 1.0
S → book 0.01 | include 0.004 | prefer 0.006
S → Verb NP 0.05
S → VP PP 0.03
NP → I 0.1 | he 0.02 | she 0.02 | me 0.06
NP → Houston 0.16 | NWA 0.04
NP → Det Nominal 0.6
Nominal → book 0.03 | flight 0.15 | meal 0.06 | money 0.06
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → book 0.1 | include 0.04 | prefer 0.06
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0

CKY chart for "Book the flight through Houston" (the original slides fill the chart cell by cell; the final contents are shown here):

Single words:
[Book]: S .01, VP .1, Verb .5, Nominal .03, Noun .1
[the]: Det .6
[flight]: Nominal .15, Noun .5
[through]: Prep .2
[Houston]: NP .16, PropNoun .8

Larger spans:
[the flight]: NP = .6 × .6 × .15 = .054
[Book the flight]: S = .05 × .5 × .054 = .00135, VP = .5 × .5 × .054 = .0135
[through Houston]: PP = 1.0 × .2 × .16 = .032
[flight through Houston]: Nominal = .5 × .15 × .032 = .0024
[the flight through Houston]: NP = .6 × .6 × .0024 = .000864
[Book the flight through Houston]: S = .05 × .5 × .000864 = .0000216 (via S → Verb NP) and S = .03 × .0135 × .032 = .00001296 (via S → VP PP)

Pick the most probable parse, i.e. take the max to combine probabilities of multiple derivations of each constituent in each cell: the top S cell keeps .0000216.

S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → ε
PP → P NP
N → people | fish | tanks | rods
V → people | fish | tanks
P → with

Epsilon removal.

S → NP VP
S → VP
VP → V NP
VP → V
VP → V NP PP
VP → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
P → with

Remove unary rules (remove S → VP).

S → NP VP
VP → V NP
S → V NP
VP → V
S → V
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
P → with

Remove unary rules (remove S → V).

S → NP VP
VP → V NP
S → V NP
VP → V
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
S → people | fish | tanks
P → with

Remove unary rules (remove VP → V).

S → NP VP
VP → V NP
S → V NP
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
S → people | fish | tanks
VP → people | fish | tanks
P → with

Remove unary rules (remove NP → NP, NP → N, PP → P).

S → NP VP
VP → V NP
S → V NP
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP PP
NP → P NP
PP → P NP
NP → people | fish | tanks | rods
V → people | fish | tanks
S → people | fish | tanks
VP → people | fish | tanks
P → with
PP → with

Binarize now.

S → NP VP
VP → V NP
S → V NP
VP → V @VP_V
@VP_V → NP PP
S → V @S_V
@S_V → NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP PP
NP → P NP
PP → P NP
NP → people | fish | tanks | rods
V → people | fish | tanks
S → people | fish | tanks
VP → people | fish | tanks
P → with
PP → with

Chomsky Normal Form!

Initial grammar:
S → NP VP; VP → V NP; VP → V NP PP; NP → NP NP; NP → NP PP; NP → N; NP → ε; PP → P NP
N → people | fish | tanks | rods; V → people | fish | tanks; P → with

Final CNF grammar:
S → NP VP; VP → V NP; S → V NP; VP → V @VP_V; @VP_V → NP PP; S → V @S_V; @S_V → NP PP; VP → V PP; S → V PP; NP → NP NP; NP → NP PP; NP → P NP; PP → P NP
NP → people | fish | tanks | rods; V → people | fish | tanks; S → people | fish | tanks; VP → people | fish | tanks; P → with; PP → with

You should think of this as a transformation for efficient parsing. Binarization is crucial for cubic-time CFG parsing; the rest isn't necessary, it just makes the algorithms cleaner and a bit quicker.

S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3
P → with 1.0

The extended CKY chart for "fish people fish tanks" (word positions 0-4). Each cell score[i][j] holds, for every non-terminal, the probability of its best derivation over words i+1 through j: score[0][1], score[0][2], score[0][3], score[0][4], score[1][2], score[1][3], score[1][4], score[2][3], score[2][4], score[3][4].

The grammar above is then applied to "fish people fish tanks". The original slides fill the chart step by step, interleaving fragments of pseudocode; assembled in order, the algorithm is:

// lexical rules, then unaries, for each single word
for i = 0; i < #(words); i++
  for A in nonterms
    if A → words[i] in grammar
      score[i][i+1][A] = P(A → words[i])
  // handle unaries
  boolean added = true
  while added
    added = false
    for A, B in nonterms
      if score[i][i+1][B] > 0 && A → B in grammar
        prob = P(A → B) * score[i][i+1][B]
        if prob > score[i][i+1][A]
          score[i][i+1][A] = prob
          back[i][i+1][A] = B
          added = true

// binary rules, then unaries, for each larger span (begin, end)
for split = begin+1 to end-1
  for A, B, C in nonterms
    prob = score[begin][split][B] * score[split][end][C] * P(A → B C)
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = new Triple(split, B, C)
// handle unaries
boolean added = true
while added
  added = false
  for A, B in nonterms
    prob = P(A → B) * score[begin][end][B]
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = B
      added = true

Call buildTree(score, back) to get the best parse.

Final chart (best score for each non-terminal, with the rule that produced it):
[fish]: N 0.2, V 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
[people]: N 0.5, V 0.1, NP → N 0.35, VP → V 0.01, S → VP 0.001
[fish]: N 0.2, V 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
[tanks]: N 0.2, V 0.3, NP → N 0.14, VP → V 0.03, S → VP 0.003
[fish people]: NP → NP NP 0.0049, VP → V NP 0.105, S → VP 0.0105
[people fish]: NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189
[fish tanks]: NP → NP NP 0.00196, VP → V NP 0.042, S → VP 0.0042
[fish people fish]: NP → NP NP 0.0000686, VP → V NP 0.00147, S → NP VP 0.000882
[people fish tanks]: NP → NP NP 0.0000686, VP → V NP 0.000098, S → NP VP 0.01323
[fish people fish tanks]: NP → NP NP 0.0000009604, VP → V NP 0.00002058, S → NP VP 0.00018522
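A runnable Python version of the assembled pseudocode, using the CNF PCFG above (the names pcky, handle_unaries and build_tree are this sketch's own; the values it computes match the final chart listed above):

    # Probabilistic CKY with unary handling, following the pseudocode above.
    BINARY = {  # A -> B C : prob
        ("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5, ("VP", "V", "@VP_V"): 0.3,
        ("VP", "V", "PP"): 0.1, ("@VP_V", "NP", "PP"): 1.0,
        ("NP", "NP", "NP"): 0.1, ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0,
    }
    UNARY = {("S", "VP"): 0.1, ("VP", "V"): 0.1, ("NP", "N"): 0.7}  # A -> B : prob
    LEXICON = {("N", "people"): 0.5, ("N", "fish"): 0.2, ("N", "tanks"): 0.2,
               ("N", "rods"): 0.1, ("V", "people"): 0.1, ("V", "fish"): 0.6,
               ("V", "tanks"): 0.3, ("P", "with"): 1.0}

    def handle_unaries(cell_score, cell_back):
        """Repeatedly apply unary rules A -> B while some score improves."""
        added = True
        while added:
            added = False
            for (a, b), p in UNARY.items():
                prob = p * cell_score.get(b, 0.0)
                if prob > cell_score.get(a, 0.0):
                    cell_score[a], cell_back[a] = prob, b
                    added = True

    def pcky(words):
        n = len(words)
        score = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):                       # lexical rules
            for (a, word), p in LEXICON.items():
                if word == w:
                    score[i][i + 1][a] = p
            handle_unaries(score[i][i + 1], back[i][i + 1])
        for span in range(2, n + 1):                        # binary rules
            for begin in range(0, n - span + 1):
                end = begin + span
                for split in range(begin + 1, end):
                    for (a, b, c), p in BINARY.items():
                        prob = score[begin][split].get(b, 0.0) * score[split][end].get(c, 0.0) * p
                        if prob > score[begin][end].get(a, 0.0):
                            score[begin][end][a] = prob
                            back[begin][end][a] = (split, b, c)
                handle_unaries(score[begin][end], back[begin][end])
        return score, back

    def build_tree(back, words, begin, end, a):
        """Follow back-pointers to recover the best parse as a nested tuple."""
        bp = back[begin][end].get(a)
        if bp is None:                                      # lexical entry
            return (a, words[begin])
        if isinstance(bp, str):                             # unary back-pointer
            return (a, build_tree(back, words, begin, end, bp))
        split, b, c = bp
        return (a, build_tree(back, words, begin, split, b),
                   build_tree(back, words, split, end, c))

    words = "fish people fish tanks".split()
    score, back = pcky(words)
    print(score[0][len(words)]["S"])                        # ≈ 0.00018522
    print(build_tree(back, words, 0, len(words), "S"))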

There is an analog to the Forward algorithm for HMMs, called the Inside algorithm, for efficiently determining how likely a string is to be produced by a PCFG. A PCFG can thus be used as a language model to choose between alternative sentences for speech recognition or machine translation. For example, with the English PCFG above: O₁ = "The dog big barked.", O₂ = "The big dog barked." Is P(O₂ | English) > P(O₁ | English)?

Use the CKY probabilistic parsing algorithm, but combine probabilities of multiple derivations of any constituent using addition instead of max.

The chart for "Book the flight through Houston" is filled exactly as before, but the probabilities of multiple derivations of each constituent in each cell are summed. In the top cell, the two derivations of S combine to S = .00001296 + .0000216 = .00003456.
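The only change from the Viterbi-style CKY sketch above is how multiple derivations in a cell are combined: add them instead of taking the max. Using the two S derivations from the chart:

    # Sentence probability for "Book the flight through Houston": instead of keeping
    # only the best S derivation in the top cell, the Inside algorithm adds them up.
    viterbi_top_cell_S = max(0.00001296, 0.0000216)   # most probable parse: 0.0000216
    inside_top_cell_S = 0.00001296 + 0.0000216        # P(sentence) = 0.00003456
    print(viterbi_top_cell_S, inside_top_cell_S)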

If parse trees are provided for training sentences, a grammar and its parameters can all be estimated directly from counts accumulated from the treebank (with appropriate smoothing). (Figure: treebank trees for "John put the dog in the pen." feed Supervised PCFG Training, which outputs the English PCFG shown earlier.)

The set of production rules can be taken directly from the set of rewrites in the treebank, and parameters can be directly estimated from frequency counts in the treebank:

P(α → β | α) = count(α → β) / Σ_γ count(α → γ) = count(α → β) / count(α)
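A minimal sketch of this relative-frequency estimate, assuming the treebank has already been flattened into a list of (LHS, RHS) rewrites (the toy counts below are made up for illustration):

    from collections import Counter

    def estimate_pcfg(rewrites):
        """Relative-frequency estimates: P(alpha -> beta | alpha) = count(alpha -> beta) / count(alpha)."""
        rule_counts = Counter(rewrites)                       # count(alpha -> beta)
        lhs_counts = Counter(lhs for lhs, _ in rewrites)      # count(alpha)
        return {(lhs, rhs): c / lhs_counts[lhs] for (lhs, rhs), c in rule_counts.items()}

    # Rewrites read off a tiny hypothetical treebank.
    rewrites = [("NP", ("Det", "N"))] * 3 + [("NP", ("PropN",))]
    print(estimate_pcfg(rewrites))   # {('NP', ('Det', 'N')): 0.75, ('NP', ('PropN',)): 0.25}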

Given a set of sentences, induce a grammar that maximizes the probability that this data was generated from this grammar. Assume the number of non-terminals in the grammar is specified. Only an unannotated set of sequences generated from the model is needed; correct parse trees for these sentences are not required. In this sense, it is unsupervised.

Training sentences: "John ate the apple", "A dog bit Mary", "Mary hit the dog", "John gave Mary the cat", ... These feed PCFG Training, which outputs the English PCFG shown earlier.

The Inside-Outside algorithm is a version of EM for unsupervised learning of a PCFG, analogous to Baum-Welch (forward-backward) for HMMs. Given the number of non-terminals, construct all possible CNF productions with these non-terminals and the observed terminal symbols, then use EM to iteratively train the probabilities of these productions to locally maximize the likelihood of the data (see the Manning and Schütze text for details). Experimental results are not impressive, but recent work imposes additional constraints to improve unsupervised grammar learning.

Specialized productions can be generated by including the head word and its POS as part of each non-terminal's symbol. (Figure: lexicalized parse tree for "John liked the dog in the pen", with head-annotated nodes such as VP_liked-VBD, Nominal_dog-NN, PP_in-IN and Nominal_pen-NN.)

(Figure: lexicalized parse tree for "John put the dog in the pen", with head-annotated nodes such as NP_John-NNP, VP_put-VBD, NP_dog-NN, PP_in-IN and NP_pen-NN.)

Accurately estimating parameters on such a large number of very specialized productions could require enormous amounts of treebank data. Need some way of estimating parameters for lexicalized productions that makes reasonable independence assumptions so that accurate probabilities for very specific rules can be learned.

English Penn Treebank: the standard corpus for testing syntactic parsing; it consists of 1.2M words of text from the Wall Street Journal (WSJ). It is typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences. Chinese Penn Treebank: 100K words from the Xinhua news service.

( (S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP (MD will)
    (VP (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .)))
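One way to obtain the rewrites used for estimation is to read such bracketed trees directly. A small self-contained sketch (a hand-rolled reader written for this transcription, not the standard Treebank tooling) applied to a simplified fragment of the tree above; its output could feed the estimate_pcfg sketch earlier:

    import re

    def read_tree(text):
        """Parse a Penn-Treebank-style bracketing into nested [label, children] lists."""
        tokens = re.findall(r"\(|\)|[^\s()]+", text)
        pos = 0

        def parse():
            nonlocal pos
            pos += 1                                  # consume "("
            label = ""
            if tokens[pos] not in "()":               # the outermost bracket may be unlabeled
                label = tokens[pos]
                pos += 1
            children = []
            while tokens[pos] != ")":
                if tokens[pos] == "(":
                    children.append(parse())
                else:
                    children.append(tokens[pos])      # terminal word
                    pos += 1
            pos += 1                                  # consume ")"
            return [label, children]

        return parse()

    def rewrites(node, out):
        """Collect (LHS, RHS) rewrites from every labeled node."""
        label, children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        if label:
            out.append((label, rhs))
        for c in children:
            if not isinstance(c, str):
                rewrites(c, out)
        return out

    # Simplified fragment of the Pierre Vinken tree above.
    tree = read_tree("(S (NP-SBJ (NNP Pierre) (NNP Vinken)) "
                     "(VP (MD will) (VP (VB join) (NP (DT the) (NN board)))) (. .))")
    print(rewrites(tree, []))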

Eryiğit et al., "Multiword Expressions in Statistical Dependency Parsing", SPMRL 2011: 5,635 sentences; uses the CoNLL format.

PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees. If P is the system's parse tree and T is the human parse tree (the "gold standard"):

Recall = (# correct constituents in P) / (# constituents in T)
Precision = (# correct constituents in P) / (# constituents in P)

F₁ is the harmonic mean of precision and recall.

Correct tree T and computed tree P for "Book the flight through Houston": in T the PP "through Houston" attaches inside the Nominal ("flight through Houston"), while in P it attaches to the VP.
# Constituents in T: 12; # Constituents in P: 12; # Correct constituents: 10.
Recall = 10/12 = 83.3%, Precision = 10/12 = 83.3%, F₁ = 83.3%.
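A sketch of the PARSEVAL computation over labeled constituent spans (label, start, end); the span sets below are illustrative rather than a full transcription of the two trees above:

    def parseval(computed, gold):
        """Labeled precision/recall/F1 over constituent spans (label, start, end)."""
        correct = len(computed & gold)
        precision = correct / len(computed)
        recall = correct / len(gold)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Tiny illustrative span sets: the gold tree attaches the PP inside the Nominal,
    # the computed tree mis-attaches it to the VP, so some constituents do not match.
    gold     = {("S", 0, 5), ("VP", 0, 5), ("NP", 1, 5), ("Nominal", 2, 5),
                ("Nominal", 2, 3), ("PP", 3, 5), ("NP", 4, 5)}
    computed = {("S", 0, 5), ("VP", 0, 5), ("VP", 0, 3), ("NP", 1, 3),
                ("Nominal", 2, 3), ("PP", 3, 5), ("NP", 4, 5)}
    print(parseval(computed, gold))   # (0.714..., 0.714..., 0.714...)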

Statistical models such as PCFGs allow for probabilistic resolution of ambiguities. PCFGs can be easily learned from treebanks. Lexicalization is required to effectively resolve many ambiguities. Current statistical parsers are quite accurate, but not yet at the level of human-expert agreement.

Credits: Raymond Mooney, Statistical Parsing (University of Texas); Grant Schindler, PCFGs; Julia Hockenmaier, More on PCFG Parsing.