Introduction to Natural Language Processing

PARSING: Earley, Bottom-Up Chart Parsing
Jean-Cédric Chappelier
Jean-Cedric.Chappelier@epfl.ch
Artificial Intelligence Laboratory

Objectives of this lecture
After the CYK algorithm, present two other algorithms used for syntactic parsing.

Earley Parsing
Top-down (predictive) algorithm. Bottom-up = inference; top-down = search.
3 advantages:
- best known worst-case complexity (same as CYK)
- adaptive complexity for less complex languages (e.g. regular languages)
- no need for a special form of the CF grammar
2 drawbacks:
- no way to correct/reconstruct non-parsable sentences ("early error detection")
- not very intuitive

Earley Parsing (2)
Idea: on-line (i.e. during parsing) binarization of the grammar, using dotted rules and Earley items.
- Dotted rule: X → X_1 ... X_k • X_{k+1} ... X_m, where X → X_1 ... X_m is a rule of the grammar.
- Earley item: one dotted rule together with one integer i (0 ≤ i ≤ n, where n is the size of the input string).
- The part before the dot (•) represents the subpart of the rule that derives a substring of the input string starting at position i+1.
Example: (VP → V • NP, 2) is an Earley item for the input string
  the(1) cat(2) ate(3) a(4) mouse(5)
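One possible way to represent an Earley item in code is sketched below. The class name `Item` and its field names are illustrative choices, not part of the lecture.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """An Earley item: a dotted rule plus the integer i (hypothetical layout)."""
    lhs: str                # left-hand side X of the rule
    rhs: Tuple[str, ...]    # right-hand side X_1 ... X_m
    dot: int                # number of right-hand-side symbols before the dot
    start: int              # the integer i: the part before the dot starts at word i+1

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None when the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        symbols = list(self.rhs[:self.dot]) + ["•"] + list(self.rhs[self.dot:])
        return f"({self.lhs} → {' '.join(symbols)}, {self.start})"


# The example item of the slide, for "the cat ate a mouse".
words = "the cat ate a mouse".split()
item = Item("VP", ("V", "NP"), dot=1, start=2)
print(item)                 # (VP → V • NP, 2)
print(words[item.start])    # word at position i+1 = 3, where the derived part starts
```

Storing the dot as an index keeps items hashable, so they can be placed directly in sets.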

Earley Parsing (3)
Principle: starting from all possible (S → • X_1 ... X_m, 0), build in parallel all the dotted rules deriving larger and larger substrings of the input string, up to the point where the whole input sentence is derived.
This amounts to constructing sets of items E_j such that:
  (X → α • β, i) ∈ E_j  ⟺  ∃ γ, δ : S ⇒* γ X δ and γ ⇒* w_1 ... w_i and α ⇒* w_{i+1} ... w_j
Example: in the previous example, (VP → V • NP, 2) ∈ E_3.
The input string (of length n) is syntactically correct (accepted) iff at least one (S → X_1 ... X_m •, 0) is in E_n.

Earley Parsing (4)
➊ Initialization: construction of E_0
1. For each rule S → X_1 ... X_m in the grammar: add (S → • X_1 ... X_m, 0) to E_0
2. For each (X → • Y β, 0) in E_0 and every rule Y → γ, add (Y → • γ, 0) to E_0
3. Iterate (2) until convergence of E_0
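The three initialization steps can be sketched as a small fixed-point computation. The names `initial_set` and `GRAMMAR`, the dictionary encoding of the grammar, and the tuple encoding of items as (lhs, rhs, dot, i) are assumptions made for illustration. The sketch also assumes that no rule has an empty right-hand side. The grammar used is the one from the lecture's "I think" example.

```python
# Grammar as a dict: non-terminal -> list of right-hand sides (tuples of symbols).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pron",), ("Det", "N")],
    "VP": [("V",), ("V", "S"), ("V", "NP")],
    "Pron": [("I",)],
    "V": [("think",)],
}


def initial_set(grammar, start_symbol="S"):
    """Build E_0. Items are (lhs, rhs, dot, i) tuples with dot = 0 and i = 0."""
    # Step 1: one item per rule of the start symbol.
    e0 = {(start_symbol, rhs, 0, 0) for rhs in grammar.get(start_symbol, [])}
    agenda = list(e0)
    # Steps 2 and 3: keep adding the rules of the symbol after the dot
    # until no new item appears.
    while agenda:
        _lhs, rhs, dot, _i = agenda.pop()
        symbol = rhs[dot]
        for gamma in grammar.get(symbol, []):   # empty for terminals such as "I"
            new_item = (symbol, gamma, 0, 0)
            if new_item not in e0:
                e0.add(new_item)
                agenda.append(new_item)
    return e0


e0 = initial_set(GRAMMAR)
print(len(e0))   # 4
```

The agenda plays the role of "iterate until convergence": each item is expanded once, and the loop ends when nothing new is added.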

Earley Parsing: Interpretation
➋ Iterations: building the derivations of w_1 ... w_j (E_j)
1. Linking with words: introduce word w_j whenever a derivation of w_1 ... w_{j−1} can "eat" w_j (i.e. there is a • just before w_j)
2. Stepping in the derivation: whenever a non-terminal X can derive a subsequence starting at w_{i+1}, and there exists a subderivation ending at w_i which can "eat" X, do it!
3. Prediction (of useful items): if at some place Y could be "eaten" by some rule, then introduce all the rules that might (later on) produce Y

Earley Parsing (continued)
➋ Iterations: construction of the E_j sets (1 ≤ j ≤ n)
1. For all (X → α • w_j β, i) in E_{j−1}, add (X → α w_j • β, i) to E_j
2. For all (X → γ •, i) in E_j and all (Y → α • X β, k) in E_i, add (Y → α X • β, k) to E_j
3. For all (Y → α • X β, i) in E_j and each rule X → γ, add (X → • γ, j) to E_j
4. Go back to (2) while E_j keeps changing
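Putting initialization and the iteration steps together, a minimal recognizer might look as follows. This is a sketch under stated assumptions, not a reference implementation: items are encoded as (lhs, rhs, dot, i) tuples, the grammar is a dictionary from non-terminals to right-hand sides, there are no empty right-hand sides, and the names `earley_sets`, `accepts` and `GRAMMAR` are hypothetical. The grammar is the one from the lecture's "I think" example.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pron",), ("Det", "N")],
    "VP": [("V",), ("V", "S"), ("V", "NP")],
    "Pron": [("I",)],
    "V": [("think",)],
}


def earley_sets(words, grammar, start_symbol="S"):
    """Return the list of item sets E_0 ... E_n for the given words."""
    n = len(words)
    sets = [set() for _ in range(n + 1)]

    def close(j):
        """Apply stepping (2) and prediction (3) until E_j stops changing."""
        agenda = list(sets[j])
        while agenda:
            lhs, rhs, dot, i = agenda.pop()
            new_items = []
            if dot == len(rhs):
                # Step 2: (X → γ •, i) advances every (Y → α • X β, k) of E_i.
                for (lhs2, rhs2, dot2, k) in list(sets[i]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        new_items.append((lhs2, rhs2, dot2 + 1, k))
            else:
                # Step 3: predict all rules of the symbol after the dot.
                for gamma in grammar.get(rhs[dot], []):
                    new_items.append((rhs[dot], gamma, 0, j))
            for item in new_items:
                if item not in sets[j]:
                    sets[j].add(item)
                    agenda.append(item)

    # Initialization: start items, then prediction until convergence.
    sets[0] = {(start_symbol, rhs, 0, 0) for rhs in grammar[start_symbol]}
    close(0)
    for j in range(1, n + 1):
        # Step 1: linking with word w_j (words[j - 1] in 0-based Python).
        for (lhs, rhs, dot, i) in sets[j - 1]:
            if dot < len(rhs) and rhs[dot] == words[j - 1]:
                sets[j].add((lhs, rhs, dot + 1, i))
        close(j)
    return sets


def accepts(words, grammar, start_symbol="S"):
    """Accepted iff some (S → X_1 ... X_m •, 0) is in E_n."""
    last = earley_sets(words, grammar, start_symbol)[-1]
    return any(lhs == start_symbol and dot == len(rhs) and i == 0
               for (lhs, rhs, dot, i) in last)


sets = earley_sets(["I", "think"], GRAMMAR)
print([len(e) for e in sets])   # [4, 7, 9]
```

Because the sketch excludes empty right-hand sides, every completed item in E_j has i < j, so E_i is already final when step 2 consults it.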

Earley Parsing: Full Example
Example for the input "I think" and the grammar:
  S → NP VP
  NP → Pron
  NP → Det N
  VP → V
  VP → V S
  VP → V NP
  Pron → I
  V → think

E_0: (S → • NP VP, 0) (NP → • Pron, 0) (NP → • Det N, 0) (Pron → • I, 0)
E_1: (Pron → I •, 0) (NP → Pron •, 0) (S → NP • VP, 0) (VP → • V, 1) (VP → • V S, 1) (VP → • V NP, 1) (V → • think, 1)
E_2: (V → think •, 1) (VP → V •, 1) (VP → V • S, 1) (VP → V • NP, 1) (S → NP VP •, 0) (S → • NP VP, 2) (NP → • Pron, 2) (NP → • Det N, 2) (Pron → • I, 2)

Link between CYK and Earley
(X → α • β, i) ∈ E_j  ⟺  (X → α • β) ∈ cell (j−i, i+1), i.e. row j−i, column i+1
Chart for "I think":
  row 2, column 1: (S → NP VP •, 0)
  row 1, column 1 (I): (Pron → I •, 0) (NP → Pron •, 0) (S → NP • VP, 0)
  row 1, column 2 (think): (V → think •, 1) (VP → V •, 1) (VP → V • S, 1) (VP → V • NP, 1)
  row 0, column 1: (S → • NP VP, 0) (NP → • Pron, 0) (NP → • Det N, 0) (Pron → • I, 0)
  row 0, column 2: (VP → • V, 1) (VP → • V S, 1) (VP → • V NP, 1) (V → • think, 1)
  row 0, column 3: (S → • NP VP, 2) (NP → • Pron, 2) (NP → • Det N, 2) (Pron → • I, 2)
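The correspondence is a simple change of coordinates, which can be written down directly. The function names `chart_cell` and `earley_indices` are illustrative, not from the lecture.

```python
def chart_cell(i, j):
    """Cell (row, column) holding an item (X → α • β, i) that belongs to E_j."""
    return (j - i, i + 1)


def earley_indices(row, column):
    """Inverse mapping: the item's integer i and the index j of its set E_j."""
    i = column - 1
    return (i, i + row)


# (S → NP VP •, 0) ∈ E_2 covers both words of "I think": row 2, column 1.
print(chart_cell(0, 2))   # (2, 1)
# (S → • NP VP, 2) ∈ E_2 is a pure prediction: row 0, column 3.
print(chart_cell(2, 2))   # (0, 3)
```

Row 0 has no counterpart in plain CYK: it holds the predicted items, whose part before the dot is empty.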

Bottom-up Chart Parsing
Idea: keep the best of both CYK and Earley: on-line binarization "à la Earley" (and even better) within a bottom-up, CYK-like algorithm.
Mainly:
- no need for indices in items: the cell position is enough
- factorize (with respect to α) all the X → α • β into a single α • ... (this is possible when processing bottom-up)
- replace all the X → α • simply by X
- suppress the X → • α items (this is possible when processing bottom-up and without lookahead)

Bottom-up Chart Parsing: Example
[Figure: chart for "The crocodile ate the cat", whose cells contain complete constituents (Det, N, V, NP, VP, S) and partial items (Det..., NP..., V...).]

Bottom-up Chart Parsing (3)
More formally, a CYK algorithm in which cell contents are denoted by [α • ..., i, j] and [X, i, j] respectively.
Initialization:
  w_ij  ⇒  [X, i, j]   for X → w_ij ∈ R
Completion phase:
- association of two cells:
    [α • ..., i, j] ⊕ [X, k, j+i]  ⇒  [α X • ..., i+k, j]   if Y → α X β ∈ R
                                      [Y, i+k, j]           if Y → α X ∈ R
- "self-filling":
    [X, i, j]  ⇒  [X • ..., i, j]   if Y → X β ∈ R
                  [Y, i, j]         if Y → X ∈ R
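The rules above can be sketched in code as follows. This is an illustration under several assumptions: cells are keyed by (length, start) with 1-based start positions, lexical entries are single words, the grammar and lexicon are toy examples, and the names `chart_parse`, `add_sequence`, `RULES` and `LEXICON` are hypothetical.

```python
from collections import defaultdict

RULES = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
         ("VP", ("V", "NP")), ("VP", ("V",))]
LEXICON = {"the": {"Det"}, "crocodile": {"N"}, "cat": {"N"}, "ate": {"V"}}


def chart_parse(words, rules, lexicon):
    """Fill the chart bottom-up; return (complete, partial) cell contents."""
    n = len(words)
    complete = defaultdict(set)   # (i, j) -> symbols X such that [X, i, j] holds
    partial = defaultdict(set)    # (i, j) -> prefixes α such that [α • ..., i, j] holds

    # Proper prefixes of right-hand sides are the possible "α • ..." items.
    prefixes = {rhs[:k] for _lhs, rhs in rules for k in range(1, len(rhs))}
    lhs_of = defaultdict(set)     # full right-hand side -> left-hand sides
    for lhs, rhs in rules:
        lhs_of[rhs].add(lhs)

    def add_sequence(seq, length, start):
        """Record that the symbol sequence seq spans `length` words from `start`."""
        if seq in prefixes:                        # Y → seq β ∈ R
            partial[(length, start)].add(seq)
        for y in lhs_of.get(seq, ()):              # Y → seq ∈ R
            if y not in complete[(length, start)]:
                complete[(length, start)].add(y)
                add_sequence((y,), length, start)  # self-filling

    # Initialization from the lexicon, followed by self-filling.
    for j, word in enumerate(words, start=1):
        for x in lexicon.get(word, ()):
            if x not in complete[(1, j)]:
                complete[(1, j)].add(x)
                add_sequence((x,), 1, j)

    # Completion: associate [α • ..., i, j] with [X, k, j+i].
    for length in range(2, n + 1):
        for j in range(1, n - length + 2):
            for i in range(1, length):
                k = length - i
                for alpha in partial[(i, j)]:
                    for x in complete[(k, j + i)]:
                        add_sequence(alpha + (x,), length, j)
    return complete, partial


words = "the crocodile ate the cat".split()
complete, partial = chart_parse(words, RULES, LEXICON)
print("S" in complete[(len(words), 1)])   # True
```

In this sketch a single helper covers both cases of each rule: a sequence that is a proper prefix of some right-hand side becomes a partial item, and a sequence that is a full right-hand side yields the corresponding non-terminal, which is then self-filled in turn.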

Bottom-up Chart Parsing: Example
Initialization, for "The dog hate the cat": the bottom row receives Det, N, V, Det, N; it then also holds Det..., V..., VP and Det...
Completion: [Figure: cells of the chart being associated during the completion phase.]

[Figure: chart for "The crocodile ate the cat" during completion; its cells contain Det, N, V, NP, VP, S and the partial items Det..., V..., NP...]

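Dealing with compounds
Example of how to deal with compounds during the initialization phase:
[Figure: chart for "credit card": "credit" gives N and V, "card" gives N, and the compound "credit card" gives N in the cell spanning both words.]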
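A minimal sketch of such an initialization, assuming a lexicon keyed by word sequences. The toy lexicon, the name `initialize_chart` and the (length, start) cell keys with 1-based start positions are assumptions for illustration.

```python
# Hypothetical lexicon: keys are tuples of words, so compounds are ordinary entries.
LEXICON = {
    ("credit",): {"N", "V"},
    ("card",): {"N"},
    ("credit", "card"): {"N"},
}


def initialize_chart(words, lexicon):
    """Fill cell (length, start) for every lexical entry matching a word sequence."""
    chart = {}
    n = len(words)
    for start in range(1, n + 1):                 # 1-based start position
        for length in range(1, n - start + 2):    # number of words in the span
            entry = tuple(words[start - 1:start - 1 + length])
            if entry in lexicon:
                chart[(length, start)] = set(lexicon[entry])
    return chart


chart = initialize_chart(["credit", "card"], LEXICON)
print(sorted(chart))   # [(1, 1), (1, 2), (2, 1)]
```

The compound lands directly in the length-2 cell, alongside whatever the completion phase later derives for the same span.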

Complexity
Still in O(n^3).
What coefficient for n^3 (with respect to grammar parameters)?
  m(R) · |NT| · n^3
where
- m(R) is the number of internal nodes of the trie of the right-hand sides of the non-lexical grammar rules
- NT is the set of non-terminals
- R is the set of non-lexical grammar rules
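To make m(R) concrete, the sketch below counts the internal nodes of the trie of right-hand sides for a toy rule set. It assumes that "internal node" means a non-root node with at least one child, i.e. a non-empty proper prefix of some right-hand side. Whether the lecture also counts the root is not stated, so treat the exact value as dependent on that convention. The name `count_internal_trie_nodes` and the rule set are illustrative.

```python
RULES = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
         ("VP", ("V", "NP")), ("VP", ("V",))]


def count_internal_trie_nodes(rules):
    """Count the non-root trie nodes that have at least one child."""
    internal = set()
    for _lhs, rhs in rules:
        # Every non-empty proper prefix of a right-hand side is a node with a child.
        for k in range(1, len(rhs)):
            internal.add(tuple(rhs[:k]))
    return len(internal)


print(count_internal_trie_nodes(RULES))   # 3: the prefixes (NP), (Det) and (V)
```

Right-hand sides that share a prefix share trie nodes, so the count depends on the number of distinct prefixes rather than on the total length of the rules.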

Keypoints
- The way the algorithms work (Earley items, linking, stepping, prediction, link between CYK and Earley)
- Worst-case complexity: O(n^3)
- Advantages and drawbacks of the algorithms
