CS 6120/CS4120: Natural Language Processing


CS 6120/CS4120: Natural Language Processing
Instructor: Prof. Lu Wang
College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang

Assignment/report submission
Assignment submission problems (e.g., format): contact Tirthraj, Manthan
Project report submission problems: contact Liwen
Academic integrity: declaration on submission (code from GitHub, Stack Overflow, etc.; discussion with XX, etc.)

Project Progress Report
1. What changes have you made to the task compared to the proposal, including the problem/task, models, datasets, or evaluation methods? If there is any change, please explain why.
2. Describe the data preprocessing process: data cleaning, selection, feature generation, or other representations you have used.
3. What methods or models have you tried towards the project goal? Why did you choose them (you can include related work on similar or relevant tasks)?
4. What results have you achieved so far, based on your proposed evaluation methods? What is working, and what is wrong with the model?
5. How can you improve your models? What are the next steps?
Grading: items 2-5 are each worth about 25 points.
Length: 2 pages (or more if necessary)

Two views of linguistic structure: 1. Constituency (phrase structure) Phrase structure organizes words into nested constituents. Fed raises interest rates

Two views of linguistic structure: 1. Constituency (phrase structure) Phrase structure organizes words into nested constituents.

Two views of linguistic structure: 1. Constituency (phrase structure)
Phrase structure organizes words into nested constituents. How do we know what is a constituent? (Not that linguists don't argue about some cases.)
Distribution: a constituent behaves as a unit that can appear in different places:
John talked [to the children] [about drugs].
John talked [about drugs] [to the children].
*John talked drugs to the children about
Substitution/expansion/pro-forms: I sat [on the box / right on top of the box / there].

Headed phrase structure
Context-free grammar:
VP → VB*
NP → NN*
ADJP → JJ*
ADVP → RB*
SBAR(Q) → S | SINV | SQ
S → NP VP
Plus minor phrase types: QP (quantifier phrase in NP), CONJP (multiword constructions: as well as), INTJ (interjections), etc.

Two views of linguistic structure: 2. Dependency structure Dependency structure shows which words depend on (modify or are arguments of) which other words. The boy put the tortoise on the rug

Two views of linguistic structure: 2. Dependency structure
Dependency structure shows which words depend on (modify or are arguments of) which other words.
The boy put the tortoise on the rug: the head word is "put", and "The boy", "the tortoise", and "on the rug" are its dependents.

Phrase Chunking Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence. [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs]. [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only 1.8 billion ] [PP in ] [NP September ]

Phrase Chunking as Sequence Labeling
Tag individual words with one of 3 tags:
B (Begin): word starts a new target phrase
I (Inside): word is part of the target phrase but not the first word
O (Other): word is not part of a target phrase
Sample for NP chunking: He reckons the current account deficit will narrow to only 1.8 billion in September.
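As a concrete illustration, here is a minimal Python sketch (our own, not code from the lecture; the tag sequence is an assumed gold NP tagging for the sample sentence) showing how B/I/O tags are decoded back into chunk spans:

    # Decode B/I/O tags into (start, end) chunk spans (end exclusive).
    def bio_to_chunks(tags):
        chunks, start = [], None
        for i, tag in enumerate(tags):
            if tag == "B":                 # a new chunk starts here
                if start is not None:
                    chunks.append((start, i))
                start = i
            elif tag == "O":               # outside any chunk
                if start is not None:
                    chunks.append((start, i))
                start = None
            # tag == "I": continue the current chunk
        if start is not None:
            chunks.append((start, len(tags)))
        return chunks

    tokens = "He reckons the current account deficit will narrow to only 1.8 billion in September .".split()
    tags = ["B", "O", "B", "I", "I", "I", "O", "O", "O", "B", "I", "I", "O", "B", "O"]
    print([" ".join(tokens[s:e]) for s, e in bio_to_chunks(tags)])
    # ['He', 'the current account deficit', 'only 1.8 billion', 'September']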

Evaluating Chunking
Per-token accuracy does not evaluate whether correct full chunks were found. Instead use:
Precision = (number of correct chunks found) / (total number of chunks found)
Recall = (number of correct chunks found) / (total number of actual chunks)
Take the harmonic mean to produce a single evaluation metric, the F measure:
F1 = 2PR / (P + R), and more generally F_β = (β² + 1)PR / (β²P + R).
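The chunk-level metrics above can be computed over sets of gold and predicted spans; the following is a small sketch of our own (the function name and example spans are illustrative, not from the lecture):

    def chunk_prf(gold_chunks, predicted_chunks, beta=1.0):
        gold, pred = set(gold_chunks), set(predicted_chunks)
        correct = len(gold & pred)                      # chunks whose spans match exactly
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0:
            return precision, recall, 0.0
        f = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
        return precision, recall, f

    gold = {(0, 1), (2, 6), (9, 12), (13, 14)}          # gold NP spans from the example above
    pred = {(0, 1), (2, 6), (9, 11), (13, 14)}          # one boundary error
    print(chunk_prf(gold, pred))                        # (0.75, 0.75, 0.75)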

Current Chunking Results
Best system for NP chunking: F1 = 96%
Typical results for finding the full range of chunk types (CoNLL 2000 shared task: NP, VP, PP, ADV, SBAR, ADJP) are F1 = 92-94%.


A Brief Parsing History

Pre-1990 ("Classical") NLP Parsing
Wrote a symbolic grammar (CFG or often richer) and lexicon:
S → NP VP
NP → (DT) NN
NP → NN NNS
NP → NNP
VP → V NP
NN → interest
NNS → rates
NNS → raises
VBP → interest
VBZ → rates
Used grammar systems to prove parses from words.
This scaled very badly and didn't give coverage. For the sentence "Fed raises interest rates 0.5% in effort to control inflation":
Minimal grammar: 36 parses
Simple 10-rule grammar: 592 parses
Real-size broad-coverage grammar: millions of parses

Syntactic Parsing Produce the correct syntactic parse tree for a sentence.

An exponential number of attachments

Attachment ambiguities A key parsing decision is how we attach various constituents PPs, adverbial or participial phrases, infinitives, coordinations, etc. The board approved [its acquisition] [by Royal Trustco Ltd.] [of Toronto] [for $27 a share] [at its monthly meeting].


Attachment ambiguities A key parsing decision is how we attach various constituents PPs, adverbial or participial phrases, infinitives, coordinations, etc.

Classical NLP Parsing: The problem and its solution
Categorical constraints can be added to grammars to limit unlikely/weird parses for sentences, but such attempts make the grammars not robust: in traditional systems, commonly 30% of sentences in even an edited text would have no parse.
A less constrained grammar can parse more sentences, but simple sentences end up with ever more parses, with no way to choose between them.
We need mechanisms that allow us to find the most likely parse(s) for a sentence. Statistical parsing lets us work with very loose grammars that admit millions of parses for sentences but still quickly find the best parse(s).

The rise of annotated data: The Penn Treebank
( (S (NP-SBJ (DT The) (NN move)) (VP (VBD followed) (NP (NP (DT a) (NN round)) (PP (IN of) (NP (NP (JJ similar) (NNS increases)) (PP (IN by) (NP (JJ other) (NNS lenders))) (PP (IN against) (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans)))))) (, ,) (S-ADV (NP-SBJ (-NONE- *)) (VP (VBG reflecting) (NP (NP (DT a) (VBG continuing) (NN decline)) (PP-LOC (IN in) (NP (DT that) (NN market))))))) (. .)))
[Marcus et al. 1993, Computational Linguistics]
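This bracketed notation can be read into a simple nested-list tree structure; here is a minimal sketch of our own (parse_brackets is an illustrative name, not a tool used in the lecture):

    def parse_brackets(s):
        # Tokenize parentheses and symbols, then build nested [label, child, ...] lists.
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()
        pos = 0
        def parse():
            nonlocal pos
            assert tokens[pos] == "("
            pos += 1
            label, children = None, []
            while tokens[pos] != ")":
                if tokens[pos] == "(":
                    children.append(parse())
                elif label is None:
                    label = tokens[pos]; pos += 1
                else:
                    children.append(tokens[pos]); pos += 1
            pos += 1                                  # consume the closing ")"
            return [label] + children
        return parse()

    tree = parse_brackets("(S (NP (DT The) (NN move)) (VP (VBD followed) (NP (DT a) (NN round))))")
    print(tree[0], tree[1])    # S ['NP', ['DT', 'The'], ['NN', 'move']]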

The rise of annotated data
Starting off, building a treebank seems a lot slower and less useful than building a grammar. But a treebank gives us many things:
Reusability of the labor (many parsers, POS taggers, etc.)
Valuable resource for linguistics
Broad coverage
Frequencies and distributional information
A way to evaluate systems

Statistical parsing applications
Statistical parsers are now robust and widely used in larger NLP applications:
High-precision question answering [Pasca and Harabagiu SIGIR 2001]
Improving biological named entity finding [Finkel et al. JNLPBA 2004]
Syntactically based sentence compression [Lin and Wilbur 2007]
Extracting opinions about products [Bloom et al. NAACL 2007]
Improved interaction in computer games [Gorniak and Roy 2005]
Helping linguists find data [Resnik et al. BLS 2005]
Source sentence analysis for machine translation [Xu et al. 2009]
Relation extraction systems [Fundel et al. Bioinformatics 2006]

Two problems to solve: 1. Repeated work


Two problems to solve: 2. Choosing the correct parse
How do we work out the correct attachment: She saw the man with a telescope
Words are good predictors of attachment, even absent full understanding:
Moscow sent more than 100,000 soldiers into Afghanistan
Sydney Water breached an agreement with NSW Health
Our statistical parsers will try to exploit such statistics.

(Probabilistic) Context-Free Grammars CFG PCFG

A phrase structure grammar
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → ε
PP → P NP
N → people, N → fish, N → tanks, N → rods
V → people, V → fish, V → tanks
P → with
Example sentences: people fish tanks; people fish with rods

Phrase structure grammars = context-free grammars (CFGs)
G = (T, N, S, R)
T is a set of terminal symbols
N is a set of nonterminal symbols
S is the start symbol (S ∈ N)
R is a set of rules/productions of the form X → γ, with X ∈ N and γ ∈ (N ∪ T)*
A grammar G generates a language L.

Sentence Generation
Sentences are generated by recursively rewriting the start symbol using the productions until only terminal symbols remain. Example parse tree for "book the flight through Houston": S → VP, VP → Verb NP (Verb → book), NP → Det Nominal (Det → the), Nominal → Nominal PP, Nominal → Noun (Noun → flight), PP → Prep NP (Prep → through), NP → Proper-Noun (Proper-Noun → Houston).
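To make the rewriting process concrete, here is a minimal sketch of our own (not code from the lecture) that encodes a toy grammar for the example above and replays one leftmost derivation; the rule indices in choices are simply the choices that reproduce that tree:

    # Toy CFG for the "book the flight through Houston" example; keys are nonterminals.
    GRAMMAR = {
        "S": [["VP"]],
        "VP": [["Verb", "NP"]],
        "NP": [["Det", "Nominal"], ["Proper-Noun"]],
        "Nominal": [["Noun"], ["Nominal", "PP"]],
        "PP": [["Prep", "NP"]],
        "Verb": [["book"]], "Det": [["the"]], "Noun": [["flight"]],
        "Prep": [["through"]], "Proper-Noun": [["Houston"]],
    }

    def leftmost_derivation(grammar, rule_choices, start="S"):
        # Rewrite the leftmost nonterminal at each step using the given rule indices.
        form = [start]
        steps = [" ".join(form)]
        for choice in rule_choices:
            i = next(k for k, sym in enumerate(form) if sym in grammar)   # leftmost nonterminal
            form = form[:i] + grammar[form[i]][choice] + form[i + 1:]
            steps.append(" ".join(form))
        return steps

    choices = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    for step in leftmost_derivation(GRAMMAR, choices):
        print(step)        # ends with: book the flight through Houston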

Phrase structure grammars in NLP
G = (T, C, N, S, L, R)
T is a set of terminal symbols
C is a set of preterminal symbols
N is a set of nonterminal symbols
S is the start symbol (S ∈ N)
L is the lexicon, a set of items of the form X → x, with X ∈ C and x ∈ T
R is the grammar, a set of items of the form X → γ, with X ∈ N and γ ∈ (N ∪ C)*
By usual convention, S is the start symbol, but in statistical NLP we usually have an extra node at the top (ROOT, TOP).
We usually write e (ε) for an empty sequence, rather than nothing.


Probabilistic or stochastic context-free grammars (PCFGs)
G = (T, N, S, R, P)
T is a set of terminal symbols
N is a set of nonterminal symbols
S is the start symbol (S ∈ N)
R is a set of rules/productions of the form X → γ
P is a probability function P: R → [0, 1] such that for every X ∈ N, Σ_{X→γ ∈ R} P(X → γ) = 1
A grammar G generates a language model L.

A PCFG
S → NP VP 1.0
VP → V NP 0.6
VP → V NP PP 0.4
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5, N → fish 0.2, N → tanks 0.2, N → rods 0.1
V → people 0.1, V → fish 0.6, V → tanks 0.3
P → with 1.0
[With the empty NP removed, so it is less ambiguous]
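Encoded as data, the grammar above can be checked for the PCFG constraint that rule probabilities sum to 1 for every left-hand side; a small sketch of our own:

    from collections import defaultdict

    PCFG = [
        ("S", ("NP", "VP"), 1.0),
        ("VP", ("V", "NP"), 0.6), ("VP", ("V", "NP", "PP"), 0.4),
        ("NP", ("NP", "NP"), 0.1), ("NP", ("NP", "PP"), 0.2), ("NP", ("N",), 0.7),
        ("PP", ("P", "NP"), 1.0),
        ("N", ("people",), 0.5), ("N", ("fish",), 0.2), ("N", ("tanks",), 0.2), ("N", ("rods",), 0.1),
        ("V", ("people",), 0.1), ("V", ("fish",), 0.6), ("V", ("tanks",), 0.3),
        ("P", ("with",), 1.0),
    ]

    totals = defaultdict(float)
    for lhs, rhs, prob in PCFG:
        totals[lhs] += prob
    for lhs, total in totals.items():
        assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
    print("Every left-hand side sums to 1:", dict(totals))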

The probability of trees and strings
P(t): the probability of a tree t is the product of the probabilities of the rules used to generate it.
P(s): the probability of a string s is the sum of the probabilities of the trees which have that string as their yield:
P(s) = Σ_t P(s, t) = Σ_t P(t), where t ranges over the parses of s.

Tree and String Probabilities
s = people fish tanks with rods
P(t1) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.0008232 (verb attachment)
P(t2) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.00024696 (noun attachment)
P(s) = P(t1) + P(t2) = 0.0008232 + 0.00024696 = 0.00107016
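These numbers can be reproduced directly as products of the rule probabilities from the PCFG above (a small sketch of our own; the factor lists copy the slide's numbers in order):

    from math import prod

    t1_factors = [1.0, 0.7, 0.4, 0.5, 0.6, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]      # verb attachment
    t2_factors = [1.0, 0.7, 0.6, 0.5, 0.6, 0.2, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]  # noun attachment

    p_t1, p_t2 = prod(t1_factors), prod(t2_factors)
    print(p_t1, p_t2, p_t1 + p_t2)   # ~0.0008232, ~0.00024696, ~0.00107016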

Chomsky Normal Form
All rules are of the form X → Y Z or X → w, with X, Y, Z ∈ N and w ∈ T.
A transformation to this form doesn't change the generative capacity of a CFG: it recognizes the same language, but maybe with different trees.
Empties and unaries are removed recursively.
N-ary rules (n > 2) are divided by introducing new nonterminals.

A phrase structure grammar
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → ε
PP → P NP
N → people, N → fish, N → tanks, N → rods
V → people, V → fish, V → tanks
P → with

Chomsky Normal Form steps (after removing the empty rule NP → ε):
S → NP VP, S → VP
VP → V NP, VP → V, VP → V NP PP, VP → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people, N → fish, N → tanks, N → rods
V → people, V → fish, V → tanks
P → with

Chomsky Normal Form steps (after removing the unary S → VP):
S → NP VP, S → V NP, S → V, S → V NP PP, S → V PP
VP → V NP, VP → V, VP → V NP PP, VP → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people, N → fish, N → tanks, N → rods
V → people, V → fish, V → tanks
P → with

Chomsky Normal Form steps (S → people, S → fish, S → tanks added, from the unary S → V):
S → NP VP, S → V NP, S → V, S → V NP PP, S → V PP
VP → V NP, VP → V, VP → V NP PP, VP → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people, N → fish, N → tanks, N → rods
V → people, S → people, V → fish, S → fish, V → tanks, S → tanks
P → with

Chomsky Normal Form steps (unaries S → V and VP → V removed; VP → people, VP → fish, VP → tanks added):
S → NP VP, S → V NP, S → V NP PP, S → V PP
VP → V NP, VP → V NP PP, VP → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people, N → fish, N → tanks, N → rods
V → people, S → people, VP → people, V → fish, S → fish, VP → fish, V → tanks, S → tanks, VP → tanks
P → with

Chomsky Normal Form steps (remaining unaries NP → NP, NP → PP, NP → N, PP → P removed):
S → NP VP, S → V NP, S → V NP PP, S → V PP
VP → V NP, VP → V NP PP, VP → V PP
NP → NP NP, NP → NP PP, NP → P NP
PP → P NP
NP → people, NP → fish, NP → tanks, NP → rods
V → people, S → people, VP → people, V → fish, S → fish, VP → fish, V → tanks, S → tanks, VP → tanks
P → with, PP → with

Chomsky Normal Form steps (ternary rules binarized with new symbols @VP_V and @S_V):
S → NP VP, S → V NP, S → V @S_V, S → V PP
@S_V → NP PP
VP → V NP, VP → V @VP_V, VP → V PP
@VP_V → NP PP
NP → NP NP, NP → NP PP, NP → P NP
PP → P NP
NP → people, NP → fish, NP → tanks, NP → rods
V → people, S → people, VP → people, V → fish, S → fish, VP → fish, V → tanks, S → tanks, VP → tanks
P → with, PP → with

Chomsky Normal Form
You should think of this as a transformation for efficient parsing. Binarization is crucial for cubic-time CFG parsing. The rest isn't necessary; it just makes the algorithms cleaner and a bit quicker.
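A minimal sketch of our own (not the lecture's code) of the binarization step, introducing intermediate symbols in the spirit of @VP_V; unary and empty removal are not shown:

    def binarize(rules):
        # rules: list of (lhs, rhs_tuple). Returns an equivalent list with |rhs| <= 2.
        out = []
        for lhs, rhs in rules:
            while len(rhs) > 2:
                new_sym = f"@{lhs}_{rhs[0]}"     # split off the first child under a new symbol
                out.append((lhs, (rhs[0], new_sym)))
                lhs, rhs = new_sym, rhs[1:]
            out.append((lhs, rhs))
        return out

    print(binarize([("VP", ("V", "NP", "PP"))]))
    # [('VP', ('V', '@VP_V')), ('@VP_V', ('NP', 'PP'))]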

An example: before binarization
(ROOT (S (NP (N people)) (VP (V fish) (NP (N tanks)) (PP (P with) (NP (N rods))))))

After binarization on VP
(ROOT (S (NP (N people)) (VP (V fish) (@VP_V (NP (N tanks)) (PP (P with) (NP (N rods)))))))

Parsing
Given a string of terminals and a CFG, determine if the string can be generated by the CFG.
Also return a parse tree for the string, or all possible parse trees.
Must search the space of derivations for one that derives the given string.
Top-down parsing: start searching the space of derivations from the start symbol.
Bottom-up parsing: start searching the space of reverse derivations from the terminal symbols in the string.
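A minimal sketch of our own (not the lecture's code) of a naive top-down recognizer: it expands the leftmost symbol in every possible way and backtracks, which is exponential in the worst case and assumes the grammar has no left recursion:

    def recognize(grammar, symbols, words):
        # Can this sequence of grammar symbols derive exactly this list of words?
        if not symbols:
            return not words
        first, rest = symbols[0], symbols[1:]
        if first not in grammar:                  # terminal: must match the next word
            return bool(words) and words[0] == first and recognize(grammar, rest, words[1:])
        return any(recognize(grammar, list(rhs) + rest, words) for rhs in grammar[first])

    TOY = {"S": [["VP"]], "VP": [["Verb", "NP"]], "NP": [["Det", "Nominal"]],
           "Nominal": [["Noun"]], "Verb": [["book"]], "Det": [["that"]], "Noun": [["flight"]]}
    print(recognize(TOY, ["S"], ["book", "that", "flight"]))    # True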

Parsing Example
"book that flight" parses as (S (VP (Verb book) (NP (Det that) (Nominal (Noun flight))))).

Top Down Parsing
Starting from S, the parser tries S → NP VP, expanding NP as Pronoun, ProperNoun, and Det Nominal; each fails to match the first word "book". It then tries S → Aux NP VP, which also fails on "book". With S → VP, it first tries VP → Verb, which matches "book" but leaves "that flight" unaccounted for, and then VP → Verb NP. Expanding NP as Pronoun and ProperNoun fails on "that"; NP → Det Nominal matches Det → that, and Nominal → Noun → flight completes the parse:
(S (VP (Verb book) (NP (Det that) (Nominal (Noun flight)))))

Bottom Up Parsing
Starting from the words "book that flight", the parser first tries "book" as a Noun, building Nominal → Noun, and then tries to extend it with Nominal → Nominal Noun and Nominal → Nominal PP, both of which fail ("that" is a Det, not a Noun or a preposition). It builds Det → that, Noun → flight, Nominal → Noun, and NP → Det Nominal for "that flight", but no analysis with "book" as a noun can be completed into an S. Reanalyzing "book" as a Verb, it tries VP → Verb followed by S → VP (leaving "that flight" uncovered) and VP → VP PP (no preposition), before finally building VP → Verb NP and S → VP, yielding the same parse tree as above.

Top Down vs. Bottom Up Top down never explores options that will not lead to a full parse, but can explore many options that never connect to the actual sentence. Bottom up never explores options that do not connect to the actual sentence but can explore options that can never lead to a full parse. Relative amounts of wasted search depend on how much the grammar branches in each direction.

Dynamic Programming Parsing
To avoid extensive repeated work, we must cache intermediate results, i.e. completed phrases. Caching (memoizing) is critical to obtaining a polynomial-time parsing (recognition) algorithm for CFGs. Dynamic programming algorithms based on both top-down and bottom-up search can achieve O(n^3) time, where n is the length of the input string.
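As a concrete illustration of the caching idea (a sketch of our own, assuming a binarized grammar encoded as in the earlier sketches), memoizing the question "can symbol A derive words[i:j]?" turns the exponential top-down search into a polynomial-time recognizer:

    from functools import lru_cache

    def dp_recognize(grammar, words, start="S"):
        words = tuple(words)

        @lru_cache(maxsize=None)
        def derives(sym, i, j):
            if sym not in grammar:                         # terminal
                return j == i + 1 and words[i] == sym
            for rhs in grammar[sym]:
                if len(rhs) == 1 and derives(rhs[0], i, j):
                    return True
                if len(rhs) == 2 and any(derives(rhs[0], i, k) and derives(rhs[1], k, j)
                                         for k in range(i + 1, j)):
                    return True
            return False

        return derives(start, 0, len(words))

    TOY = {"S": [["VP"]], "VP": [["Verb", "NP"]], "NP": [["Det", "Nominal"]],
           "Nominal": [["Noun"]], "Verb": [["book"]], "Det": [["that"]], "Noun": [["flight"]]}
    print(dp_recognize(TOY, ["book", "that", "flight"]))    # True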

(Probabilistic) CKY Parsing

Constituency Parsing
A PCFG attaches a probability θ_i to each rule:
Rule            Prob
S → NP VP       θ_0
NP → NP NP      θ_1
...
N → fish        θ_42
N → people      θ_43
V → fish        θ_44
Example sentence: fish people fish tanks

Cocke-Kasami-Younger (CKY) Constituency Parsing fish people fish tanks

Viterbi (Max) Scores
Chart cell for "people": N 0.5, V 0.1, NP 0.35
Chart cell for "fish": N 0.2, V 0.6, NP 0.14, VP 0.06
Grammar:
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0

Extended CKY parsing
Unaries can be incorporated into the algorithm: messy, but doesn't increase algorithmic complexity.
Empties can be incorporated: doesn't increase complexity; essentially like unaries.
Binarization is vital: without binarization, you don't get parsing cubic in the length of the sentence and in the number of nonterminals in the grammar.

The CKY algorithm (1960/1965) extended to unaries

function CKY(words, grammar) returns [most_probable_parse, prob]
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true

The CKY algorithm (1960/1965) extended to unaries (continued)

  for span = 2 to #(words)
    for begin = 0 to #(words) - span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A -> B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)
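For concreteness, here is a runnable Python sketch of the same probabilistic CKY with unary handling (our own rendering, not the lecture's reference code); the grammar is assumed to be given as dictionaries of lexical, unary, and binary rule probabilities, and backpointers/tree reconstruction are omitted:

    from collections import defaultdict

    def cky(words, lexical, unary, binary):
        # lexical: (A, word) -> P(A -> word); unary: (A, B) -> P(A -> B); binary: (A, B, C) -> P(A -> B C)
        # Returns score[(begin, end)][A] = probability of the best parse of words[begin:end] rooted in A.
        n = len(words)
        score = defaultdict(lambda: defaultdict(float))

        def apply_unaries(cell):
            added = True
            while added:                    # keep applying unary rules while any entry improves
                added = False
                for (A, B), p in unary.items():
                    if cell[B] > 0 and p * cell[B] > cell[A]:
                        cell[A] = p * cell[B]
                        added = True

        for i, w in enumerate(words):
            cell = score[(i, i + 1)]
            for (A, word), p in lexical.items():
                if word == w:
                    cell[A] = max(cell[A], p)
            apply_unaries(cell)

        for span in range(2, n + 1):
            for begin in range(0, n - span + 1):
                end = begin + span
                cell = score[(begin, end)]
                for split in range(begin + 1, end):
                    left, right = score[(begin, split)], score[(split, end)]
                    for (A, B, C), p in binary.items():
                        prob = p * left[B] * right[C]
                        if prob > cell[A]:
                            cell[A] = prob
                apply_unaries(cell)
        return score

    # A tiny fragment of the binarized PCFG from the Viterbi-scores slide:
    lexical = {("N", "people"): 0.5, ("V", "people"): 0.1, ("N", "fish"): 0.2, ("V", "fish"): 0.6}
    unary = {("NP", "N"): 0.7, ("VP", "V"): 0.1, ("S", "VP"): 0.1}
    binary = {("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5, ("NP", "NP", "NP"): 0.1}
    chart = cky(["people", "fish"], lexical, unary, binary)
    print(chart[(0, 2)]["S"])   # ~0.0189 (= 0.9 * 0.35 * 0.06)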