Statistical Parsing


1 Statistical Parsing

2 A Context-Free Grammar

S → NP VP
VP → Vi
VP → Vt NP
VP → VP PP
NP → DT NN
NP → NP PP
PP → P NP

Vi → sleeps
Vt → saw
NN → man
NN → dog
NN → telescope
DT → the
IN → with
IN → in

3 Ambiguity

A sentence of reasonable length can easily have 10s, 100s, or 1000s of possible analyses, most of which are very implausible. Examples of sources of ambiguity:

Part-of-speech ambiguity: NNS → walks vs. Vi → walks
Prepositional phrase attachment: I drove down the street in the car
Pre-nominal modifiers: the angry car mechanic

4 [Tree diagrams, flattened in transcription] Two parse trees for "I drove down the street in the car": one analysis attaches the PP "in the car" to the verb phrase (the driving happened in the car); the other attaches it to the noun phrase "the street" (the street is in the car).

5 [Tree diagrams, flattened in transcription] Two parse trees for "the angry car mechanic": in one analysis "angry" modifies "car" ([the [[angry car] mechanic]]); in the other it modifies "car mechanic" ([the [angry [car mechanic]]]).

6 "a program to promote safety in trucks and vans" — there are at least 14 analyses for this noun phrase...

7 A Probabilistic Context-Free Grammar

S → NP VP 1.0
VP → Vi 0.4
VP → Vt NP 0.4
VP → VP PP 0.2
NP → DT NN 0.3
NP → NP PP 0.7
PP → P NP 1.0

Vi → sleeps 1.0
Vt → saw 1.0
NN → man 0.7
NN → dog 0.2
NN → telescope 0.1
DT → the 1.0
IN → with 0.5
IN → in 0.5

The probability of a tree with rules α_i → β_i is ∏_i P(α_i → β_i | α_i)

8 DERIVATION        RULES USED      PROBABILITY
  S                 S → NP VP       1.0
  NP VP             NP → DT N       0.3
  DT N VP           DT → the        1.0
  the N VP          N → dog         0.1
  the dog VP        VP → VB         0.4
  the dog VB        VB → laughs     0.5
  the dog laughs

Resulting tree: [S [NP [DT the] [N dog]] [VP [VB laughs]]]

PROBABILITY = 1.0 × 0.3 × 1.0 × 0.1 × 0.4 × 0.5
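The slides do not include code, but the product above can be checked with a few lines of Python (a minimal sketch; the rule list simply restates the probabilities from the derivation table):

```python
import math

# Rules used in the derivation above, with their probabilities.
rules = [
    ("S -> NP VP", 1.0),
    ("NP -> DT N", 0.3),
    ("DT -> the", 1.0),
    ("N -> dog", 0.1),
    ("VP -> VB", 0.4),
    ("VB -> laughs", 0.5),
]

# The probability of a derivation is the product of its rule probabilities.
prob = math.prod(p for _, p in rules)
print(prob)  # 0.006 (up to floating-point rounding)
```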

9 Properties of PCFGs

Say we have a sentence S; the set of derivations for that sentence is T(S). Then a PCFG assigns a probability to each member of T(S), i.e., we now have a ranking in order of probability.

Given a PCFG and a sentence S, we can find

    arg max_{T ∈ T(S)} P(T, S)

using dynamic programming (e.g., a variant of the CKY algorithm)
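The CKY variant itself is not spelled out on the slides; the sketch below (assuming a PCFG already converted to Chomsky normal form, and hypothetical rule encodings) shows the dynamic program that finds the highest-probability analysis:

```python
from collections import defaultdict

def cky_parse(words, lexical_rules, binary_rules, start_symbol="S"):
    """Probabilistic (Viterbi) CKY sketch.  Assumes a CNF grammar:
      lexical_rules[word] -> list of (A, P(A -> word | A))
      binary_rules        -> list of (A, B, C, P(A -> B C | A))
    These encodings are assumptions for illustration, not a fixed API."""
    n = len(words)
    chart = defaultdict(float)   # chart[(i, j, A)] = best prob. of A over words[i:j]
    back = {}                    # backpointers for recovering the best tree

    # Spans of length 1: lexical rules.
    for i, w in enumerate(words):
        for A, p in lexical_rules.get(w, []):
            chart[(i, i + 1, A)] = p

    # Longer spans, bottom-up.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # split point
                for A, B, C, p in binary_rules:
                    score = p * chart[(i, k, B)] * chart[(k, j, C)]
                    if score > chart[(i, j, A)]:
                        chart[(i, j, A)] = score
                        back[(i, j, A)] = (k, B, C)

    return chart[(0, n, start_symbol)], back
```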

10 Overview

Weaknesses of PCFGs
Heads in context-free rules
Dependency representations of parse trees
Two models making use of dependencies

11 Weaknesses of PCFGs

Lack of sensitivity to lexical information
Lack of sensitivity to structural frequencies

12 [Tree, flattened in transcription] "IBM bought Lotus":
[S [NP [N IBM]] [VP [Vt bought] [NP [N Lotus]]]]

PROB = P(S → NP VP | S) × P(NP → N | NP) × P(N → IBM | N) × P(VP → Vt NP | VP) × P(Vt → bought | Vt) × P(NP → N | NP) × P(N → Lotus | N)

13 A Case of PP Attachment Ambiguity

(a) [Tree, flattened in transcription] "workers dumped sacks into a bin", with the PP "into a bin" attached to the VP:
[S [NP [NNS workers]] [VP [VP [VBD dumped] [NP [NNS sacks]]] [PP [IN into] [NP [DT a] [NN bin]]]]]

14 (b) [Tree, flattened in transcription] "workers dumped sacks into a bin", with the PP "into a bin" attached to the NP "sacks":
[S [NP [NNS workers]] [VP [VBD dumped] [NP [NP [NNS sacks]] [PP [IN into] [NP [DT a] [NN bin]]]]]]

15 (a) Rules:
S → NP VP
NP → NNS
VP → VP PP
VP → VBD NP
NP → NNS
PP → IN NP
NP → DT NN
NNS → workers
VBD → dumped
NNS → sacks
IN → into
DT → a
NN → bin

(b) Rules:
S → NP VP
NP → NNS
NP → NP PP
VP → VBD NP
NP → NNS
PP → IN NP
NP → DT NN
NNS → workers
VBD → dumped
NNS → sacks
IN → into
DT → a
NN → bin

If P(NP → NP PP | NP) > P(VP → VP PP | VP) then (b) is more probable, else (a) is more probable. The attachment decision is completely independent of the words.

16 A Case of Coordination Ambiguity

(a) [Tree, flattened in transcription] "dogs in houses and cats", with "cats" coordinated with "dogs in houses":
[NP [NP [NP [NNS dogs]] [PP [IN in] [NP [NNS houses]]]] [CC and] [NP [NNS cats]]]

17 (b) [Tree, flattened in transcription] "dogs in houses and cats", with "cats" coordinated with "houses":
[NP [NP [NNS dogs]] [PP [IN in] [NP [NP [NNS houses]] [CC and] [NP [NNS cats]]]]]

18 (a) Rules:
NP → NP CC NP
NP → NP PP
NP → NNS
PP → IN NP
NP → NNS
NP → NNS
NNS → dogs
IN → in
NNS → houses
CC → and
NNS → cats

(b) Rules:
NP → NP CC NP
NP → NP PP
NP → NNS
PP → IN NP
NP → NNS
NP → NNS
NNS → dogs
IN → in
NNS → houses
CC → and
NNS → cats

Here the two parses have identical rules, and therefore have identical probability under any assignment of PCFG rule probabilities.

19 Structural Preferences: Close Attachment

(a) (b) [Tree diagrams, flattened in transcription: in (a) the second PP attaches to the lower NP (close attachment); in (b) it attaches to the higher NP]

Example: president of a company in Africa

Both parses have the same rules, and therefore receive the same probability under a PCFG. Close attachment (structure (a)) is twice as likely in Wall Street Journal text.

20 Structural Preferences: Close Attachment

John was believed to have been shot by Bill

Here the low-attachment analysis (Bill does the shooting) contains the same rules as the high-attachment analysis (Bill does the believing), so the two analyses receive the same probability.

21 Heads in Context-Free Rules

Add annotations specifying the head of each rule:

S → NP VP
VP → Vi
VP → Vt NP
VP → VP PP
NP → DT NN
NP → NP PP
PP → IN NP

Vi → sleeps
Vt → saw
NN → man
NN → woman
NN → telescope
DT → the
IN → with
IN → in

Note: S=sentence, VP=verb phrase, NP=noun phrase, PP=prepositional phrase, DT=determiner, Vi=intransitive verb, Vt=transitive verb, NN=noun, IN=preposition

22 Rules which Recover Heads: An Example of Rules for NPs

If the rule contains NN, NNS, or NNP: choose the rightmost NN, NNS, or NNP
Else if the rule contains an NP: choose the leftmost NP
Else if the rule contains a JJ: choose the rightmost JJ
Else if the rule contains a CD: choose the rightmost CD
Else choose the rightmost child

e.g.,
NP → DT NNP NN (head is the NN)
NP → DT NN NNP (head is the NNP)
NP → NP PP (head is the leftmost NP)
NP → DT JJ (head is the JJ)
NP → DT (head is the DT)
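These head rules translate almost directly into code. A minimal sketch (the function name and the list encoding of the rule's right-hand side are assumptions, not part of the original):

```python
def np_head_index(children):
    """Return the index of the head child for an NP rule,
    following the head rules above (children is a list of labels)."""
    # If the rule contains NN, NNS, or NNP: choose the rightmost one.
    for i in range(len(children) - 1, -1, -1):
        if children[i] in ("NN", "NNS", "NNP"):
            return i
    # Else if the rule contains an NP: choose the leftmost NP.
    for i, label in enumerate(children):
        if label == "NP":
            return i
    # Else if the rule contains a JJ: choose the rightmost JJ.
    for i in range(len(children) - 1, -1, -1):
        if children[i] == "JJ":
            return i
    # Else if the rule contains a CD: choose the rightmost CD.
    for i in range(len(children) - 1, -1, -1):
        if children[i] == "CD":
            return i
    # Else choose the rightmost child.
    return len(children) - 1

print(np_head_index(["DT", "NNP", "NN"]))  # 2: NP -> DT NNP NN, head is NN
print(np_head_index(["NP", "PP"]))         # 0: NP -> NP PP, head is the leftmost NP
print(np_head_index(["DT", "JJ"]))         # 1: NP -> DT JJ, head is JJ
```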

23 Adding Headwords to Trees

[S [NP [DT the] [NN lawyer]] [VP [Vt questioned] [NP [DT the] [NN witness]]]]

becomes

[S(questioned) [NP(lawyer) [DT(the) the] [NN(lawyer) lawyer]] [VP(questioned) [Vt(questioned) questioned] [NP(witness) [DT(the) the] [NN(witness) witness]]]]

24 Adding Headwords to Trees

[S(questioned) [NP(lawyer) [DT(the) the] [NN(lawyer) lawyer]] [VP(questioned) [Vt(questioned) questioned] [NP(witness) [DT(the) the] [NN(witness) witness]]]]

A constituent receives its headword from its head child:
S → NP VP (S receives its headword from the VP)
VP → Vt NP (VP receives its headword from the Vt)
NP → DT NN (NP receives its headword from the NN)
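To make the propagation concrete, here is a small recursive sketch (the tree encoding and helper names are assumptions); it passes each constituent the headword of its head child:

```python
def lexicalize(tree, head_child_index):
    """tree: (pos_tag, word) for a pre-terminal, or (label, [subtrees]) otherwise.
    head_child_index(label, child_labels) picks the head child (e.g. head rules above).
    Returns (headword, annotated_tree) where every label carries its headword."""
    label, rest = tree
    if isinstance(rest, str):                       # pre-terminal: POS tag over a word
        return rest, (f"{label}({rest})", rest)
    results = [lexicalize(child, head_child_index) for child in rest]
    child_labels = [child[0] for child in rest]
    h = head_child_index(label, child_labels)       # which child is the head?
    headword = results[h][0]                        # inherit the head child's word
    return headword, (f"{label}({headword})", [r[1] for r in results])

# e.g. for "the witness": NP -> DT NN, the head child is the NN, so NP(witness).
tree = ("NP", [("DT", "the"), ("NN", "witness")])
_, annotated = lexicalize(tree, lambda label, kids: len(kids) - 1)
print(annotated)  # ('NP(witness)', [('DT(the)', 'the'), ('NN(witness)', 'witness')])
```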

25 Adding Headtags to Trees

[S(questioned, Vt) [NP(lawyer, NN) [DT the] [NN lawyer]] [VP(questioned, Vt) [Vt questioned] [NP(witness, NN) [DT the] [NN witness]]]]

Also propagate part-of-speech tags up the trees. (We'll see soon why this is useful!)

26 Lexicalized PCFGs

[S(questioned, Vt) [NP(lawyer, NN) [DT the] [NN lawyer]] [VP(questioned, Vt) [Vt questioned] [NP(witness, NN) [DT the] [NN witness]]]]

In PCFGs we had rules such as S → NP VP, with probabilities such as P(NP VP | S).

In lexicalized PCFGs we have rules such as
S(questioned,Vt) → NP(lawyer,NN) VP(questioned,Vt)
with probabilities such as
P(NP(lawyer,NN) VP(questioned,Vt) | S(questioned,Vt))

27 A Model from Charniak (1997)

Starting from S(questioned,Vt):

with probability P(NP(,NN) VP | S(questioned,Vt)), generate
S(questioned,Vt) → NP(,NN) VP(questioned,Vt)

with probability P(lawyer | S, VP, NP, NN, questioned, Vt), generate
S(questioned,Vt) → NP(lawyer,NN) VP(questioned,Vt)

28 Smoothed Estimation

P(NP(,NN) VP | S(questioned,Vt)) =
    λ1 × Count(S(questioned,Vt) → NP(,NN) VP) / Count(S(questioned,Vt))
  + λ2 × Count(S(,Vt) → NP(,NN) VP) / Count(S(,Vt))

where 0 ≤ λ1, λ2 ≤ 1 and λ1 + λ2 = 1

29 Smoothed Estimation

P(lawyer | S, VP, NP, NN, questioned, Vt) =
    λ1 × Count(lawyer, S, VP, NP, NN, questioned, Vt) / Count(S, VP, NP, NN, questioned, Vt)
  + λ2 × Count(lawyer, S, VP, NP, NN, Vt) / Count(S, VP, NP, NN, Vt)
  + λ3 × Count(lawyer, NN) / Count(NN)

where 0 ≤ λ1, λ2, λ3 ≤ 1 and λ1 + λ2 + λ3 = 1

30 P(NP(lawyer,NN) VP | S(questioned,Vt)) =

  (   λ1 × Count(S(questioned,Vt) → NP(,NN) VP) / Count(S(questioned,Vt))
    + λ2 × Count(S(,Vt) → NP(,NN) VP) / Count(S(,Vt)) )

× (   λ1 × Count(lawyer, S, VP, NP, NN, questioned, Vt) / Count(S, VP, NP, NN, questioned, Vt)
    + λ2 × Count(lawyer, S, VP, NP, NN, Vt) / Count(S, VP, NP, NN, Vt)
    + λ3 × Count(lawyer, NN) / Count(NN) )
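The λ-weighted back-off above is ordinary linear interpolation. A minimal sketch with made-up counts and fixed λ values (in practice the λs would be tuned, e.g. on held-out data):

```python
def interpolated_estimate(counts, lambdas):
    """counts: list of (numerator, denominator) count pairs, ordered from the
    most specific estimate to the most general back-off estimate.
    lambdas: weights with 0 <= lambda_i <= 1 and sum(lambdas) == 1."""
    estimate = 0.0
    for (num, den), lam in zip(counts, lambdas):
        if den > 0:                     # skip back-off levels with no data
            estimate += lam * num / den
    return estimate

# e.g. P(lawyer | S, VP, NP, NN, questioned, Vt) with three back-off levels.
# All counts below are invented purely for illustration:
counts = [
    (2, 5),        # Count(lawyer, S,VP,NP,NN,questioned,Vt) / Count(S,VP,NP,NN,questioned,Vt)
    (40, 200),     # Count(lawyer, S,VP,NP,NN,Vt)            / Count(S,VP,NP,NN,Vt)
    (300, 60000),  # Count(lawyer, NN)                       / Count(NN)
]
print(interpolated_estimate(counts, [0.6, 0.3, 0.1]))
```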

31 Motivation for Breaking Down Rules

First step of the decomposition of (Charniak 1997):
with probability P(NP(,NN) VP | S(questioned,Vt)), generate
S(questioned,Vt) → NP(,NN) VP(questioned,Vt)

This relies on counts of entire rules. These counts are sparse: 40,000 sentences from the Penn treebank have 12,409 rules; 15% of all test-data sentences contain a rule never seen in training.

32 Motivation for Breaking Down Rules

[Table, numeric data lost in transcription: number and percentage of rules, by type and by token, broken down by how many times each rule occurs]

Statistics for rules taken from sections 2-21 of the treebank. (Table taken from my PhD thesis.)

33 Modeling Rule Productions as Markov Processes

Step 1: generate the category of the head child

S(told,V[6])
becomes
S(told,V[6]) → VP(told,V[6])

P_h(VP | S, told, V[6])

34 Modeling Rule Productions as Markov Processes

Step 2: generate left modifiers in a Markov chain

S(told,V[6]) → ?? VP(told,V[6])
becomes
S(told,V[6]) → NP(Hillary,NNP) VP(told,V[6])

P_h(VP | S, told, V[6]) × P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT)

35 Modeling Rule Productions as Markov Processes

Step 2 (continued): generate left modifiers in a Markov chain

S(told,V[6]) → ?? NP(Hillary,NNP) VP(told,V[6])
becomes
S(told,V[6]) → NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6])

P_h(VP | S, told, V[6])
× P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT)
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT)

36 Modeling Rule Productions as Markov Processes

Step 2 (continued): generate left modifiers in a Markov chain

S(told,V[6]) → ?? NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6])
becomes
S(told,V[6]) → STOP NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6])

P_h(VP | S, told, V[6])
× P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT)
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT)
× P_d(STOP | S, VP, told, V[6], LEFT)

37 Modeling Rule Productions as Markov Processes

Step 3: generate right modifiers in a Markov chain

S(told,V[6]) → STOP NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6]) ??
becomes
S(told,V[6]) → STOP NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6]) STOP

P_h(VP | S, told, V[6])
× P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT)
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT)
× P_d(STOP | S, VP, told, V[6], LEFT)
× P_d(STOP | S, VP, told, V[6], RIGHT)
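So the probability of a whole lexicalized rule is one P_h term, one P_d term per modifier, and a STOP term on each side. A small sketch of that bookkeeping, with hypothetical lookup functions p_h and p_d standing in for the estimated distributions:

```python
def rule_probability(parent, head_label, left_mods, right_mods, p_h, p_d):
    """parent: (label, headword, headtag), e.g. ("S", "told", "V[6]").
    left_mods / right_mods: modifier labels, ordered outward from the head.
    p_h and p_d are hypothetical lookup functions for P_h and P_d."""
    label, word, tag = parent
    prob = p_h(head_label, label, word, tag)
    for side, mods in (("LEFT", left_mods), ("RIGHT", right_mods)):
        for mod in mods:
            prob *= p_d(mod, label, head_label, word, tag, side)
        prob *= p_d("STOP", label, head_label, word, tag, side)  # end the chain
    return prob

# e.g. S(told,V[6]) -> NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6]) gives
#   P_h(VP | S, told, V[6])
# * P_d(NP(Hillary,NNP)   | S, VP, told, V[6], LEFT)
# * P_d(NP(yesterday,NN)  | S, VP, told, V[6], LEFT)
# * P_d(STOP              | S, VP, told, V[6], LEFT)
# * P_d(STOP              | S, VP, told, V[6], RIGHT)
```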

38 A Refinement: Adding a Distance Variable

∆ = 1 if the position is adjacent to the head.

S(told,V[6]) → ?? VP(told,V[6])
becomes
S(told,V[6]) → NP(Hillary,NNP) VP(told,V[6])

P_h(VP | S, told, V[6]) × P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT, ∆ = 1)

39 A Refinement: Adding a Distance Variable

∆ = 1 if the position is adjacent to the head.

S(told,V[6]) → ?? NP(Hillary,NNP) VP(told,V[6])
becomes
S(told,V[6]) → NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6])

P_h(VP | S, told, V[6])
× P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT, ∆ = 1)
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT, ∆ = 0)

40 The Final Probabilities

S(told,V[6]) → STOP NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6]) STOP

P_h(VP | S, told, V[6])
× P_d(NP(Hillary,NNP) | S, VP, told, V[6], LEFT, ∆ = 1)
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT, ∆ = 0)
× P_d(STOP | S, VP, told, V[6], LEFT, ∆ = 0)
× P_d(STOP | S, VP, told, V[6], RIGHT, ∆ = 1)

41 Adding the Complement/Adjunct Distinction

S → NP VP (the NP is a subject); VP → V ... (V is the verb)

[Tree, flattened in transcription]
S(told,V[6]) → NP(yesterday,NN) NP(Hillary,NNP) VP(told,V[6])
with NP → NN yesterday, NP → NNP Hillary, VP → V[6] told ...

Hillary is the subject; yesterday is a temporal modifier. But nothing distinguishes them.

42 Adding the Complement/Adjunct Distinction

VP → V NP (the NP is an object; V is the verb)

[Tree, flattened in transcription]
VP(told,V[6]) → V[6] NP(Bill,NNP) NP(yesterday,NN) SBAR(that,COMP)
with V[6] → told, NP → NNP Bill, NP → NN yesterday, ...

Bill is the object; yesterday is a temporal modifier. But nothing distinguishes them.

43 Complements vs. Adjuncts

Complements are closely related to the head they modify; adjuncts are more indirectly related.

Complements are usually arguments of the thing they modify: in "yesterday Hillary told...", Hillary is doing the telling.

Adjuncts add modifying information: time, place, manner, etc. In "yesterday Hillary told...", yesterday is a temporal modifier.

Complements are usually required; adjuncts are optional: "yesterday Hillary told..." (grammatical) vs. "Hillary told..." (grammatical) vs. "yesterday told..." (ungrammatical).

44 Adding Tags Making the Complement/Adjunct Distinction

S → NP-C VP (the NP-C is a subject); S → NP VP (the NP is a modifier)

S(told,V[6]) → NP(yesterday,NN) NP-C(Hillary,NNP) VP(told,V[6])
with NP → NN yesterday, NP-C → NNP Hillary, VP → V[6] told ...

45 Adding Tags Making the Complement/Adjunct Distinction

VP → V NP-C (the NP-C is an object); VP → V NP (the NP is a modifier)

VP(told,V[6]) → V[6] NP-C(Bill,NNP) NP(yesterday,NN) SBAR-C(that,COMP)
with V[6] → told, NP-C → NNP Bill, NP → NN yesterday, ...

46 Adding Subcategorization Probabilities

Step 1: generate the category of the head child

S(told,V[6])
becomes
S(told,V[6]) → VP(told,V[6])

P_h(VP | S, told, V[6])

47 Adding Subcategorization Probabilities

Step 2: choose the left subcategorization frame

S(told,V[6]) → VP(told,V[6])
becomes
S(told,V[6]) → VP(told,V[6])   {NP-C}

P_h(VP | S, told, V[6]) × P_lc({NP-C} | S, VP, told, V[6])

48 Step 3: generate left modifiers in a Markov chain

S(told,V[6]) → ?? VP(told,V[6])   {NP-C}
becomes
S(told,V[6]) → NP-C(Hillary,NNP) VP(told,V[6])   {}

P_h(VP | S, told, V[6])
× P_lc({NP-C} | S, VP, told, V[6])
× P_d(NP-C(Hillary,NNP) | S, VP, told, V[6], LEFT, {NP-C})

49 S(told,V[6]) → ?? NP-C(Hillary,NNP) VP(told,V[6])   {}
becomes
S(told,V[6]) → NP(yesterday,NN) NP-C(Hillary,NNP) VP(told,V[6])   {}

P_h(VP | S, told, V[6])
× P_lc({NP-C} | S, VP, told, V[6])
× P_d(NP-C(Hillary,NNP) | S, VP, told, V[6], LEFT, {NP-C})
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT, {})

50 S(told,V[6]) → ?? NP(yesterday,NN) NP-C(Hillary,NNP) VP(told,V[6])   {}
becomes
S(told,V[6]) → STOP NP(yesterday,NN) NP-C(Hillary,NNP) VP(told,V[6])   {}

P_h(VP | S, told, V[6])
× P_lc({NP-C} | S, VP, told, V[6])
× P_d(NP-C(Hillary,NNP) | S, VP, told, V[6], LEFT, {NP-C})
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT, {})
× P_d(STOP | S, VP, told, V[6], LEFT, {})

51 The Final Probabilities

S(told,V[6]) → STOP NP(yesterday,NN) NP-C(Hillary,NNP) VP(told,V[6]) STOP

P_h(VP | S, told, V[6])
× P_lc({NP-C} | S, VP, told, V[6])
× P_d(NP-C(Hillary,NNP) | S, VP, told, V[6], LEFT, ∆ = 1, {NP-C})
× P_d(NP(yesterday,NN) | S, VP, told, V[6], LEFT, ∆ = 0, {})
× P_d(STOP | S, VP, told, V[6], LEFT, ∆ = 0, {})
× P_rc({} | S, VP, told, V[6])
× P_d(STOP | S, VP, told, V[6], RIGHT, ∆ = 1, {})

52 Another Example

VP(told,V[6]) → V[6](told,V[6]) NP-C(Bill,NNP) NP(yesterday,NN) SBAR-C(that,COMP)

P_h(V[6] | VP, told, V[6])
× P_lc({} | VP, V[6], told, V[6])
× P_d(STOP | VP, V[6], told, V[6], LEFT, ∆ = 1, {})
× P_rc({NP-C, SBAR-C} | VP, V[6], told, V[6])
× P_d(NP-C(Bill,NNP) | VP, V[6], told, V[6], RIGHT, ∆ = 1, {NP-C, SBAR-C})
× P_d(NP(yesterday,NN) | VP, V[6], told, V[6], RIGHT, ∆ = 0, {SBAR-C})
× P_d(SBAR-C(that,COMP) | VP, V[6], told, V[6], RIGHT, ∆ = 0, {SBAR-C})
× P_d(STOP | VP, V[6], told, V[6], RIGHT, ∆ = 0, {})
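One way to read the subcategorization terms is that each side of the head carries a multiset of required complements, which shrinks as complements are generated and must be empty before STOP is generated. A minimal sketch of that bookkeeping for one side (helper names and encodings are assumptions):

```python
from collections import Counter

def generate_side_score(mods, subcat, p_d, context, side):
    """mods: modifiers on one side, ordered outward from the head,
    e.g. ["NP-C(Bill,NNP)", "NP(yesterday,NN)", "SBAR-C(that,COMP)"].
    subcat: required complements for this side, e.g. {"NP-C", "SBAR-C"}.
    p_d: hypothetical lookup for P_d(mod | context, side, remaining_subcat)."""
    remaining = Counter(subcat)
    prob = 1.0
    for mod in mods:
        prob *= p_d(mod, context, side, frozenset(remaining.elements()))
        label = mod.split("(")[0]
        if label in remaining:             # a generated complement satisfies one
            remaining[label] -= 1          # requirement in the subcat frame
            if remaining[label] == 0:
                del remaining[label]
    # STOP is only generated once every required complement has been produced.
    assert not remaining, "subcat frame not yet empty"
    prob *= p_d("STOP", context, side, frozenset())
    return prob
```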

53 Summary

Identify heads of rules ⇒ dependency representations.

Presented two variants of PCFG methods applied to lexicalized grammars: break the generation of a rule down into small (Markov process) steps, then build dependencies back up (distance, subcategorization).

54 Evaluation: Representing Trees as Constituents

[S [NP [DT the] [NN lawyer]] [VP [Vt questioned] [NP [DT the] [NN witness]]]]

Label   Start Point   End Point
NP      1             2
NP      4             5
VP      3             5
S       1             5

55 Precision and Recall

[Two constituent tables (gold standard and parser output), partially lost in transcription; both include NP constituents, a PP, a VP spanning positions 3-8, and an S spanning positions 1-8]

G = number of constituents in the gold standard = 7
P = number in the parse output = 6
C = number correct = 6

Recall = 100% × C/G = 100% × 6/7
Precision = 100% × C/P = 100% × 6/6
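Computing these scores amounts to comparing labelled spans as multisets; a minimal sketch (the span encoding is an assumption):

```python
from collections import Counter

def labelled_precision_recall(gold_spans, predicted_spans):
    """Each span is a (label, start, end) triple, e.g. ("NP", 1, 2).
    Returns (precision, recall) over labelled constituents."""
    gold = Counter(gold_spans)
    pred = Counter(predicted_spans)
    correct = sum((gold & pred).values())        # multiset intersection
    precision = 100.0 * correct / sum(pred.values()) if pred else 0.0
    recall = 100.0 * correct / sum(gold.values()) if gold else 0.0
    return precision, recall

# With 7 gold constituents, 6 predicted, and 6 of them correct, this gives
# precision = 100% and recall = 6/7 ≈ 85.7%, as in the example above.
```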

56 Results

Method                                               Recall   Precision
PCFGs (Charniak 97)                                  70.6%    74.8%
Conditional Models - Decision Trees (Magerman 95)    84.0%    84.3%
Lexical Dependencies (Collins 96)                    85.3%    85.7%
Conditional Models - Logistic (Ratnaparkhi 97)       86.3%    87.5%
Generative Lexicalized Model (Charniak 97)           86.7%    86.6%
Model 1 (no subcategorization)                       87.5%    87.7%
Model 2 (subcategorization)                          88.1%    88.3%

57 Effect of the Different Features

MODEL     A     V     Recall   Precision
Model 1   NO    NO    75.0%    76.5%
Model 1   YES   NO    86.6%    86.7%
Model 1   YES   YES   87.8%    88.2%
Model 2   NO    NO    85.1%    86.8%
Model 2   YES   NO    87.7%    87.8%
Model 2   YES   YES   88.7%    89.0%

Results on Section 0 of the WSJ Treebank. Model 1 has no subcategorization; Model 2 has subcategorization. A = YES and V = YES mean that the adjacency and verb conditions respectively were used in the distance measure. R/P = recall/precision.

58 Weaknesses of Precision and Recall

[Two constituent tables, partially lost in transcription; both include a PP, a VP spanning positions 3-8, and an S spanning positions 1-8]

NP attachment: (S (NP The men) (VP dumped (NP (NP sacks) (PP of (NP the substance)))))
VP attachment: (S (NP The men) (VP dumped (NP sacks) (PP of (NP the substance))))

59 [Lexicalized tree for "Hillary told Clinton that she was president", flattened in transcription; shown here as its rules]

S(told,V[6]) → NP-C(Hillary,NNP) VP(told,V[6])
NP-C(Hillary,NNP) → NNP Hillary
VP(told,V[6]) → V[6] NP-C(Clinton,NNP) SBAR-C(that,COMP)
V[6] → told
NP-C(Clinton,NNP) → NNP Clinton
SBAR-C(that,COMP) → COMP S-C(was,Vt)
COMP → that
S-C(was,Vt) → NP-C(she,PRP) VP(was,Vt)
NP-C(she,PRP) → PRP she
VP(was,Vt) → Vt NP-C(president,NN)
Vt → was
NP-C(president,NN) → NN president

The dependencies extracted from this tree:

(told V[6] TOP S SPECIAL)
(told V[6] Hillary NNP S VP NP-C LEFT)
(told V[6] Clinton NNP VP V[6] NP-C RIGHT)
(told V[6] that COMP VP V[6] SBAR-C RIGHT)
(that COMP was Vt SBAR-C COMP S-C RIGHT)
(was Vt she PRP S-C VP NP-C LEFT)
(was Vt president NN VP Vt NP-C RIGHT)

60 Dependency Accuracies

All parses for a sentence with n words have n dependencies, so we can report a single figure, dependency accuracy.

Model 2 with all features scores 88.3% dependency accuracy (91% if you ignore non-terminal labels on dependencies).

We can also calculate precision/recall on particular dependency types, e.g., look at all subject/verb dependencies, i.e., all dependencies with label (S, VP, NP-C, LEFT):

Recall = number of subject/verb dependencies correct / number of subject/verb dependencies in the gold standard
Precision = number of subject/verb dependencies correct / number of subject/verb dependencies in the parser's output
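The same bookkeeping works per dependency type; a short sketch, assuming dependencies are encoded as (head word, modifier word, relation) tuples:

```python
def dependency_precision_recall(gold_deps, predicted_deps, relation):
    """gold_deps / predicted_deps: sets of (head_word, modifier_word, relation)
    tuples, where relation is e.g. ("S", "VP", "NP-C", "LEFT") for subjects."""
    gold = {d for d in gold_deps if d[2] == relation}
    pred = {d for d in predicted_deps if d[2] == relation}
    correct = len(gold & pred)
    recall = 100.0 * correct / len(gold) if gold else 0.0
    precision = 100.0 * correct / len(pred) if pred else 0.0
    return recall, precision
```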

61 [Table, numeric data lost in transcription] Accuracy of the 17 most frequent dependency types in section 0 of the treebank, as recovered by Model 2. The relations include, e.g., (NPB, TAG, TAG, L), (PP, TAG, NP-C, R), (S, VP, NP-C, L), (NP, NPB, PP, R), (VP, TAG, NP-C, R), (VP, TAG, VP-C, R), (VP, TAG, PP, R), and (TOP, TOP, S, R). R = rank; CP = cumulative percentage; P = percentage; Rec = recall; Prec = precision.

62 [Slide content lost in transcription]

63 [Table, numeric data lost in transcription] Breakdown of dependency accuracy by type: complements to a verb (subject and object relations such as (S, VP, NP-C, L) and (VP, TAG, NP-C, R); 16.3% of all cases) and other complements (e.g., (PP, TAG, NP-C, R), (VP, TAG, VP-C, R), (SBAR, TAG, S-C, R); 18.8% of all cases), with counts, recall, and precision for each sub-type.

64 [Table, numeric data lost in transcription] Breakdown of dependency accuracy by type: PP modification (e.g., (NP, NPB, PP, R), (VP, TAG, PP, R); 11.2% of all cases) and coordination (e.g., (NP, NP, NP, R), (VP, VP, VP, R), (S, S, S, R); 1.9% of all cases), with counts, recall, and precision for each sub-type.

65 [Table, numeric data lost in transcription] Breakdown of dependency accuracy by type: modification within base NPs (e.g., (NPB, TAG, TAG, L); 29.6% of all cases) and modification to NPs (appositives, relative clauses, reduced relatives, e.g., (NP, NPB, SBAR, R), (NP, NPB, VP, R); 3.6% of all cases), with counts, recall, and precision for each sub-type.

66 [Table, numeric data lost in transcription] Breakdown of dependency accuracy by type: sentential heads (e.g., (TOP, TOP, S, R), (TOP, TOP, SINV, R); 4.8% of all cases) and adjuncts to a verb (e.g., (VP, TAG, ADVP, R), (S, VP, ADVP, L), (VP, TAG, SBAR, R); 5.6% of all cases), with counts, recall, and precision for each sub-type.

67 Some Conclusions about Errors in Parsing

Core sentential structure (complements, chunks) is recovered with over 90% accuracy.

Attachment ambiguities involving adjuncts are resolved with much lower accuracy (around 80% for PP attachment, 50-60% for coordination).

More information