LECTURER: BURCU CAN 2017-2018 Spring
The Chomsky hierarchy and the probabilistic models that correspond to its classes:
Regular Language: Hidden Markov Model (HMM)
Context-Free Language: Probabilistic Context-Free Grammar (PCFG)
Context-Sensitive Language
Unrestricted Language
PCFGs can model a more powerful class of languages than HMMs. Can we take advantage of this property?
Production Rule: <Left-Hand Side> → <Right-Hand Side> (Probability)
Example Grammar:
S → N V (1.0)
N → Bob (0.3)
N → Jane (0.7)
V → V N (0.4)
V → loves (0.6)
Example Parse of "Jane loves Bob": (S (N Jane) (V (V loves) (N Bob)))
Natural Language Processing: parsing written sentences
Bioinformatics: RNA sequences
Stock Markets: modeling the rise/fall of the Dow Jones
Computer Vision: parsing architectural scenes
Statistical parsing uses a probabilistic model of syntax in order to assign a probability to each parse tree. It provides a principled approach to resolving syntactic ambiguity, and it allows supervised learning of parsers from treebanks of parse trees provided by human linguists. It also allows unsupervised learning of parsers from unannotated text, but the accuracy of such parsers has been limited.
A PCFG is a probabilistic version of a CFG where each production has a probability. Probabilities of all productions rewriting a given non-terminal must add to 1, defining a distribution for each non-terminal.
Grammar:
S → NP VP
VP → V NP
VP → V PP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
PP → P NP
Lexicon:
N → people | fish | tanks | rods
V → people | fish | tanks
P → with
Example strings: "people fish tanks", "people fish with rods"
S → NP VP 1.0
VP → V NP 0.6
VP → V NP PP 0.4
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5, N → fish 0.2, N → tanks 0.2, N → rods 0.1
V → people 0.1, V → fish 0.6, V → tanks 0.3
P → with 1.0
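As a quick sanity check, the grammar above can be encoded directly, verifying the PCFG property that the productions rewriting each non-terminal sum to 1. A minimal sketch in Python (the encoding as `(lhs, rhs)` pairs is my own choice):

```python
from collections import defaultdict

# The toy PCFG above as (lhs, rhs) -> probability; rhs is a tuple of
# symbols, with lowercase entries being terminals.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "NP")): 0.1,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("N",)): 0.7,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("people",)): 0.5, ("N", ("fish",)): 0.2,
    ("N", ("tanks",)): 0.2, ("N", ("rods",)): 0.1,
    ("V", ("people",)): 0.1, ("V", ("fish",)): 0.6,
    ("V", ("tanks",)): 0.3,
    ("P", ("with",)): 1.0,
}

# Probabilities of all productions rewriting a given non-terminal
# must add to 1, defining a distribution for each non-terminal.
totals = defaultdict(float)
for (lhs, _), p in rules.items():
    totals[lhs] += p
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, (lhs, total)
```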
P(t): the probability of a tree t is the product of the probabilities of the rules used to generate it.
P(s): the probability of a string s is the sum of the probabilities of the trees which have that string as their yield:
P(s) = Σ_j P(s, t_j) = Σ_j P(t_j), where the t_j are the parses of s.
EXAMPLE - 1
s = "people fish tanks with rods"
P(t1) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.0008232
P(t2) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.00024696
P(s) = P(t1) + P(t2) = 0.0008232 + 0.00024696 = 0.00107016
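The arithmetic above is just a product over the rule probabilities of each tree, summed over trees. In Python:

```python
from math import prod  # Python 3.8+

# Rule probabilities read off the two parse trees of
# "people fish tanks with rods" shown above.
t1_rules = [1.0, 0.7, 0.4, 0.5, 0.6, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]
t2_rules = [1.0, 0.7, 0.6, 0.5, 0.6, 0.2, 0.7, 1.0, 0.2, 1.0, 0.7, 0.1]

p_t1 = prod(t1_rules)  # ≈ 0.0008232
p_t2 = prod(t2_rules)  # ≈ 0.00024696
p_s = p_t1 + p_t2      # P(s) ≈ 0.00107016
```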
EXAMPLE - 2
All rules are of the form X → Y Z or X → w, where X, Y, Z ∈ N (non-terminals) and w ∈ T (terminals). A transformation to this form doesn't change the generative capacity of a CFG: it recognizes the same language, though possibly with different trees. Empties (ε-rules) and unaries are removed recursively, and n-ary rules (n > 2) are divided by introducing new non-terminals.
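The n-ary division step can be sketched as a small function (Python; the naming convention for introduced non-terminals follows the @VP_V style used later in these slides):

```python
def binarize(rules):
    """Split each n-ary rule (n > 2) into binary rules by introducing
    fresh non-terminals; the original probability stays on the first
    rule and the introduced rules get probability 1.0."""
    out = []
    for lhs, rhs, p in rules:
        cur_lhs, cur_p = lhs, p
        while len(rhs) > 2:
            # name the new symbol after the original lhs and the
            # child just consumed, e.g. @VP_V
            new = "@{}_{}".format(lhs, rhs[0])
            out.append((cur_lhs, (rhs[0], new), cur_p))
            cur_lhs, cur_p, rhs = new, 1.0, rhs[1:]
        out.append((cur_lhs, tuple(rhs), cur_p))
    return out

# VP -> V NP PP (0.4) becomes VP -> V @VP_V (0.4), @VP_V -> NP PP (1.0)
binarized = binarize([("VP", ("V", "NP", "PP"), 0.4)])
```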
Observation likelihood: to classify and order sentences.
Most likely derivation: to determine the most likely parse tree for a sentence.
Maximum likelihood training: to train a PCFG to fit empirical training data.
There is an analog to the Viterbi algorithm to efficiently determine the most probable derivation (parse tree) for a sentence.
English PCFG:
S → NP VP 0.9
S → VP 0.1
NP → Det A N 0.5
NP → NP PP 0.3
NP → PropN 0.2
A → ε 0.6
A → Adj A 0.4
PP → Prep NP 1.0
VP → V NP 0.7
VP → VP PP 0.3
Input: "John liked the dog in the pen."
(Figure: a candidate parse in which the PP "in the pen" attaches to the VP, marked as rejected by the PCFG parser.)
(Figure: the PCFG parser returns the most probable parse of "John liked the dog in the pen.", in which the PP "in the pen" attaches inside the NP "the dog".)
CKY can be modified for PCFG parsing by including in each cell a probability for each non-terminal. Cell[i,j] must retain the most probable derivation of each constituent (non-terminal) covering words i+1 through j, together with its associated probability. When transforming the grammar to CNF, production probabilities must be set so as to preserve the probability of derivations.
Original Grammar:
S → NP VP 0.8
S → Aux NP VP 0.1
S → VP 0.1
NP → Pronoun 0.2
NP → Proper-Noun 0.2
NP → Det Nominal 0.6
Nominal → Noun 0.3
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → Verb 0.2
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0
Chomsky Normal Form:
S → NP VP 0.8
S → X1 VP 0.1
X1 → Aux NP 1.0
S → book 0.01 | include 0.004 | prefer 0.006
S → Verb NP 0.05
S → VP PP 0.03
NP → I 0.1 | he 0.02 | she 0.02 | me 0.06
NP → Houston 0.16 | NWA 0.04
NP → Det Nominal 0.6
Nominal → book 0.03 | flight 0.15 | meal 0.06 | money 0.06
Nominal → Nominal Noun 0.2
Nominal → Nominal PP 0.5
VP → book 0.1 | include 0.04 | prefer 0.06
VP → Verb NP 0.5
VP → VP PP 0.3
PP → Prep NP 1.0
Probabilistic CKY chart for "Book the flight through Houston" (each cell keeps the most probable derivation of each non-terminal):
Word cells:
[Book]: S .01, VP .1, Verb .5, Nominal .03, Noun .1
[the]: Det .6
[flight]: Nominal .15, Noun .5
[through]: Prep .2
[Houston]: NP .16, PropNoun .8
Larger spans:
[the flight]: NP = .6 × .6 × .15 = .054
[Book the flight]: VP = .5 × .5 × .054 = .0135; S = .05 × .5 × .054 = .00135
[through Houston]: PP = 1.0 × .2 × .16 = .032
[flight through Houston]: Nominal = .5 × .15 × .032 = .0024
[the flight through Houston]: NP = .6 × .6 × .0024 = .000864
[Book the flight through Houston]: S = .05 × .5 × .000864 = .0000216 (via S → Verb NP); S = .03 × .0135 × .032 = .00001296 (via S → VP PP)
Pick the most probable parse, i.e. take the max to combine probabilities of multiple derivations of each constituent in each cell: here the S → Verb NP derivation (.0000216) wins.
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → ε
PP → P NP
N → people | fish | tanks | rods
V → people | fish | tanks
P → with
Next step: epsilon removal.
After epsilon removal:
S → NP VP
S → VP
VP → V NP
VP → V
VP → V NP PP
VP → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
P → with
Next: remove unary rules (remove S → VP).
After removing S → VP:
S → NP VP
VP → V NP, S → V NP
VP → V, S → V
VP → V NP PP, S → V NP PP
VP → V PP, S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people | fish | tanks | rods
V → people | fish | tanks
P → with
Next: remove unary rules (remove S → V).
After removing S → V:
S → NP VP
VP → V NP, S → V NP
VP → V
VP → V NP PP, S → V NP PP
VP → V PP, S → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people | fish | tanks | rods
V → people, S → people
V → fish, S → fish
V → tanks, S → tanks
P → with
Next: remove unary rules (remove VP → V).
After removing VP → V:
S → NP VP
VP → V NP, S → V NP
VP → V NP PP, S → V NP PP
VP → V PP, S → V PP
NP → NP NP, NP → NP, NP → NP PP, NP → PP, NP → N
PP → P NP, PP → P
N → people | fish | tanks | rods
V → people, S → people, VP → people
V → fish, S → fish, VP → fish
V → tanks, S → tanks, VP → tanks
P → with
Next: remove unary rules (remove NP → NP, NP → N, PP → P).
After removing NP → NP, NP → N, PP → P:
S → NP VP
VP → V NP, S → V NP
VP → V NP PP, S → V NP PP
VP → V PP, S → V PP
NP → NP NP, NP → NP PP, NP → P NP
PP → P NP
NP → people | fish | tanks | rods
V → people, S → people, VP → people
V → fish, S → fish, VP → fish
V → tanks, S → tanks, VP → tanks
P → with, PP → with
Next: binarize.
After binarization:
S → NP VP
VP → V NP, S → V NP
VP → V @VP_V, @VP_V → NP PP
S → V @S_V, @S_V → NP PP
VP → V PP, S → V PP
NP → NP NP, NP → NP PP, NP → P NP
PP → P NP
NP → people | fish | tanks | rods
V → people, S → people, VP → people
V → fish, S → fish, VP → fish
V → tanks, S → tanks, VP → tanks
P → with, PP → with
Chomsky Normal Form!
Initial grammar:
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → ε
PP → P NP
N → people | fish | tanks | rods
V → people | fish | tanks
P → with
Final CNF grammar:
S → NP VP
VP → V NP, S → V NP
VP → V @VP_V, @VP_V → NP PP
S → V @S_V, @S_V → NP PP
VP → V PP, S → V PP
NP → NP NP, NP → NP PP, NP → P NP
PP → P NP
NP → people | fish | tanks | rods
V → people, S → people, VP → people
V → fish, S → fish, VP → fish
V → tanks, S → tanks, VP → tanks
P → with, PP → with
You should think of this as a transformation for efficient parsing. Binarization is crucial for cubic-time CFG parsing. The rest isn't strictly necessary; it just makes the algorithms cleaner and a bit quicker.
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5, N → fish 0.2, N → tanks 0.2, N → rods 0.1
V → people 0.1, V → fish 0.6, V → tanks 0.3
P → with 1.0
Sentence: fish people fish tanks (word boundaries 0 through 4). The chart is an upper-triangular table of cells score[begin][end], one per span: score[0][1] … score[0][4], score[1][2] … score[1][4], score[2][3], score[2][4], score[3][4].
Insert the lexical (preterminal) rules into the length-1 cells:

for i = 0; i < #(words); i++
  for A in nonterms
    if A → words[i] in grammar
      score[i][i+1][A] = P(A → words[i])
After the lexical step the length-1 cells contain: [fish]: N 0.2, V 0.6; [people]: N 0.5, V 0.1; [fish]: N 0.2, V 0.6; [tanks]: N 0.2, V 0.3. Then handle unaries within each cell:

// handle unaries
boolean added = true
while added
  added = false
  for A, B in nonterms
    if score[i][i+1][B] > 0 && A → B in grammar
      prob = P(A → B) * score[i][i+1][B]
      if prob > score[i][i+1][A]
        score[i][i+1][A] = prob
        back[i][i+1][A] = B
        added = true
After the unary closure, each length-1 cell also contains NP, VP and S: [fish]: N 0.2, V 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006; [people]: N 0.5, V 0.1, NP 0.35, VP 0.01, S 0.001; [tanks]: N 0.2, V 0.3, NP 0.14, VP 0.03, S 0.003. Binary rules then combine adjacent cells:

// handle binaries
prob = score[begin][split][B] * score[split][end][C] * P(A → B C)
if prob > score[begin][end][A]
  score[begin][end][A] = prob
  back[begin][end][A] = new Triple(split, B, C)
Length-2 cells after the binary step: [fish people]: NP → NP NP 0.0049, VP → V NP 0.105, S → NP VP 0.00126; [people fish]: NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189; [fish tanks]: NP → NP NP 0.00196, VP → V NP 0.042, S → NP VP 0.00378. Unaries are handled again at every span:

// handle unaries
boolean added = true
while added
  added = false
  for A, B in nonterms
    prob = P(A → B) * score[begin][end][B]
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = B
      added = true
The unary closure improves S in two of the length-2 cells: [fish people]: S → VP = 0.1 × 0.105 = 0.0105 (beating 0.00126); [fish tanks]: S → VP = 0.0042. The binary step then extends to longer spans, trying every split point:

for split = begin+1 to end-1
  for A, B, C in nonterms
    prob = score[begin][split][B] * score[split][end][C] * P(A → B C)
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = new Triple(split, B, C)
The first length-3 cell, [fish people fish]: NP → NP NP 0.0000686, VP → V NP 0.00147, S → NP VP 0.000882.
The other length-3 cell, [people fish tanks]: NP → NP NP 0.0000686, VP → V NP 0.000098, S → NP VP 0.01323.
Finally, the full-sentence cell [fish people fish tanks]: NP → NP NP 0.0000009604, VP → V NP 0.00002058, S → NP VP 0.00018522. Call buildtree(score, back) to recover the best parse from the backpointers.
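The pseudocode fragments above assemble into a short Viterbi-CKY parser. A sketch in Python on the slides' grammar (data structures are my own; the backpointer table `back` for recovering the tree is omitted for brevity):

```python
from collections import defaultdict

# The extended-CNF grammar from the slides (binary rules plus unaries).
unary = {("S", "VP"): 0.1, ("VP", "V"): 0.1, ("NP", "N"): 0.7}
binary = {
    ("S", "NP", "VP"): 0.9,
    ("VP", "V", "NP"): 0.5,
    ("VP", "V", "@VP_V"): 0.3,
    ("VP", "V", "PP"): 0.1,
    ("@VP_V", "NP", "PP"): 1.0,
    ("NP", "NP", "NP"): 0.1,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
lexical = {
    ("N", "people"): 0.5, ("N", "fish"): 0.2, ("N", "tanks"): 0.2,
    ("N", "rods"): 0.1, ("V", "people"): 0.1, ("V", "fish"): 0.6,
    ("V", "tanks"): 0.3, ("P", "with"): 1.0,
}

def apply_unaries(score, begin, end):
    # Closure over unary rules: keep improving until nothing changes.
    added = True
    while added:
        added = False
        for (A, B), p in unary.items():
            prob = p * score[begin, end, B]
            if prob > score[begin, end, A]:
                score[begin, end, A] = prob
                added = True

def cky(words):
    n = len(words)
    score = defaultdict(float)  # (begin, end, A) -> best probability
    for i, w in enumerate(words):          # lexical (preterminal) rules
        for (A, word), p in lexical.items():
            if word == w:
                score[i, i + 1, A] = p
        apply_unaries(score, i, i + 1)
    for span in range(2, n + 1):           # binary rules over all splits
        for begin in range(n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, B, C), p in binary.items():
                    prob = score[begin, split, B] * score[split, end, C] * p
                    if prob > score[begin, end, A]:
                        score[begin, end, A] = prob
            apply_unaries(score, begin, end)
    return score

score = cky("fish people fish tanks".split())
# score[0, 4, "S"] matches the final chart cell: ≈ 0.00018522
```

Running it reproduces the chart values from the preceding slides, e.g. S over the whole sentence ≈ 0.00018522.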
There is an analog to the Forward algorithm for HMMs, called the Inside algorithm, for efficiently determining how likely a string is to be produced by a PCFG. This lets a PCFG be used as a language model to choose between alternative sentences for speech recognition or machine translation. Example: for O1 = "The dog big barked." and O2 = "The big dog barked.", we expect P(O2 | English) > P(O1 | English).
Use the CKY probabilistic parsing algorithm, but combine the probabilities of multiple derivations of any constituent using addition instead of max.
For "Book the flight through Houston", the two derivations of S over the whole sentence (S = .00001296 via S → VP PP, and S = .0000216 via S → Verb NP) are now summed rather than maxed: S = .00001296 + .0000216 = .00003456. Sum the probabilities of multiple derivations of each constituent in each cell.
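The difference between the two algorithms on this example is one line, max versus sum:

```python
# The two derivations of S over "Book the flight through Houston",
# using the cell values from the chart above:
p_verb_np = 0.05 * 0.5 * 0.000864   # via S -> Verb NP
p_vp_pp   = 0.03 * 0.0135 * 0.032   # via S -> VP PP

viterbi = max(p_verb_np, p_vp_pp)   # most probable parse: ≈ 0.0000216
inside  = p_verb_np + p_vp_pp       # string probability:  ≈ 0.00003456
```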
If parse trees are provided for training sentences, a grammar and its parameters can all be estimated directly from counts accumulated from the treebank (with appropriate smoothing). (Figure: treebank parse trees for sentences such as "John put the dog in the pen." are fed to supervised PCFG training, which outputs the English grammar together with its rule probabilities.)
The set of production rules can be taken directly from the set of rewrites in the treebank, and parameters can be estimated from frequency counts:

P(α → β | α) = count(α → β) / Σ_γ count(α → γ) = count(α → β) / count(α)
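The relative-frequency estimator is a few lines of Python (the toy counts here are hypothetical, standing in for counts tallied from a real treebank):

```python
from collections import Counter

# Hypothetical rule occurrence counts read off treebank parse trees.
rule_counts = Counter({
    ("VP", ("Verb", "NP")): 30,
    ("VP", ("Verb",)): 10,
    ("VP", ("VP", "PP")): 10,
    ("PP", ("Prep", "NP")): 25,
})

# count(alpha) = total occurrences of alpha on a left-hand side
lhs_counts = Counter()
for (lhs, _), c in rule_counts.items():
    lhs_counts[lhs] += c

# P(alpha -> beta | alpha) = count(alpha -> beta) / count(alpha)
probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
# probs[("VP", ("Verb", "NP"))] == 0.6
```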
Given a set of sentences, induce a grammar that maximizes the probability that this data was generated from this grammar. Assume the number of non-terminals in the grammar is specified. Only an unannotated set of sequences generated from the model is needed; correct parse trees for these sentences are not required. In this sense, it is unsupervised.
Training sentences (unannotated):
John ate the apple
A dog bit Mary
Mary hit the dog
John gave Mary the cat
...
PCFG training induces the English grammar and its rule probabilities from these sentences alone.
The Inside-Outside algorithm is a version of EM for unsupervised learning of a PCFG. Analogous to Baum-Welch (forward-backward) for HMMs Given the number of non-terminals, construct all possible CNF productions with these non-terminals and observed terminal symbols. Use EM to iteratively train the probabilities of these productions to locally maximize the likelihood of the data. See Manning and Schütze text for details Experimental results are not impressive, but recent work imposes additional constraints to improve unsupervised grammar learning.
Specialized productions can be generated by including the head word (and its POS tag) of each non-terminal as part of that non-terminal's symbol. (Figure: lexicalized parse of "John liked the dog in the pen", with nodes such as S[liked-VBD], VP[liked-VBD], NP[dog-NN], Nominal[dog-NN] and PP[in-IN]; here the PP attaches inside the Nominal headed by "dog".)
(Figure: lexicalized parse of "John put the dog in the pen", with VP[put-VBD] → VP[put-VBD] PP[in-IN]; here the PP attaches to the VP headed by "put".)
Accurately estimating parameters on such a large number of very specialized productions could require enormous amounts of treebank data. Need some way of estimating parameters for lexicalized productions that makes reasonable independence assumptions so that accurate probabilities for very specific rules can be learned.
English Penn Treebank: the standard corpus for testing syntactic parsing, consisting of 1.2M words of text from the Wall Street Journal (WSJ). It is typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences. Chinese Penn Treebank: 100K words from the Xinhua news service.
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) )
             (, ,)
             (ADJP (NP (CD 61) (NNS years) ) (JJ old) )
             (, ,) )
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board) )
             (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))
             (NP-TMP (NNP Nov.) (CD 29) )))
     (. .) ))
Eryiğit et al., "Multiword Expressions in Statistical Dependency Parsing", SPMRL, 2011: 5,635 sentences; uses the CoNLL format.
PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees. If P is the system's parse tree and T is the human parse tree (the "gold standard"):
Recall = (# correct constituents in P) / (# constituents in T)
Precision = (# correct constituents in P) / (# constituents in P)
F1 is the harmonic mean of precision and recall.
Correct tree T: (S (VP (Verb book) (NP (Det the) (Nominal (Nominal (Noun flight)) (PP (Prep through) (NP (Proper-Noun Houston)))))))
Computed tree P: (S (VP (VP (Verb book) (NP (Det the) (Nominal (Noun flight)))) (PP (Prep through) (NP (Proper-Noun Houston)))))
# Constituents in T: 12; # Constituents in P: 12; # Correct Constituents: 10
Recall = 10/12 = 83.3%, Precision = 10/12 = 83.3%, F1 = 83.3%
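The scoring itself is mechanical. A small sketch in Python (representing each constituent as a `(label, start, end)` triple is my own choice):

```python
from collections import Counter

def parseval(computed, gold):
    """Labeled PARSEVAL scores. Each tree is given as a list of
    constituents (label, start, end); shared constituents are counted
    via multiset intersection."""
    correct = sum((Counter(computed) & Counter(gold)).values())
    precision = correct / len(computed)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy check mirroring the example above: 12 constituents on each side,
# 10 of them shared, so P = R = F1 = 10/12 ≈ 0.833.
shared = [("X", i, i + 1) for i in range(10)]
gold = shared + [("Nominal", 2, 5), ("NP", 1, 5)]
computed = shared + [("VP", 0, 3), ("S", 0, 5)]
p, r, f1 = parseval(computed, gold)
```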
Statistical models such as PCFGs allow for probabilistic resolution of ambiguities. PCFGs can be easily learned from treebanks. Lexicalization is required to effectively resolve many ambiguities. Current statistical parsers are quite accurate but not yet at the level of human-expert agreement.
Raymond Mooney, "Statistical Parsing", University of Texas.
Grant Schindler, "PCFGs".
Julia Hockenmaier, "More on PCFG Parsing".