Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters


1 Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters Klinton Bicknell & Roger Levy LSA 2015 Summer Institute University of Chicago 10 July 2015

9 We'll start with some puzzles The women discussed the dogs on the beach. What does on the beach modify? Ford et al., 1982: dogs (90%); discussed (10%) The women kept the dogs on the beach. What does on the beach modify? Ford et al., 1982: kept (95%); dogs (5%) The complex houses married children and their families. The warehouse fires a dozen employees each year.

10 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for:

11 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input

12 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation

13 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation Inference on basis of incomplete input (Tanenhaus et al 1995, Altmann and Kamide 1999, Kaiser and Trueswell 2004)

15 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation Inference on basis of incomplete input (Tanenhaus et al 1995, Altmann and Kamide 1999, Kaiser and Trueswell 2004) the boy will eat

18 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation Inference on basis of incomplete input (Tanenhaus et al 1995, Altmann and Kamide 1999, Kaiser and Trueswell 2004) the boy will eat Processing difficulty is differential and localized

20 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation Inference on basis of incomplete input (Tanenhaus et al 1995, Altmann and Kamide 1999, Kaiser and Trueswell 2004) how to get from here the boy will eat Processing difficulty is differential and localized

21 Comprehension: Theoretical Desiderata Realistic models of human sentence comprehension must account for: Robustness to arbitrary input Accurate disambiguation Inference on basis of incomplete input (Tanenhaus et al 1995, Altmann and Kamide 1999, Kaiser and Trueswell 2004) how to get from here to here? the boy will eat Processing difficulty is differential and localized

22 Crash course in grammars and parsing

23 Crash course in grammars and parsing A grammar is a structured set of production rules

30 Crash course in grammars and parsing A grammar is a structured set of production rules Most commonly used for syntactic description, but also useful for (semantics, phonology, ...) E.g., context-free grammars: S → NP VP, NP → Det N, VP → V NP, Det → the, N → dog, N → cat, V → chased A grammar is said to license a derivation if all the derivation's rules are present in the grammar

31 Context-free Grammars A context-free grammar (CFG) consists of a tuple (N, V, S, R) such that: N is a finite set of non-terminal symbols; V is a finite set of terminal symbols; S is the start symbol; R is a finite set of rules of the form X → α where X ∈ N and α is a sequence of symbols drawn from N ∪ V. A CFG derivation is the recursive expansion of non-terminal symbols in a string by rules in R, starting with S, and a derivation tree T is the history of those rule applications.

32 Context-free Grammars: an example Let our grammar (the rule-set R) be S → NP VP, NP → Det N, NP → NP PP, PP → P NP, VP → V, Det → the, N → dog, N → cat, P → near, V → growled. The nonterminal set N is {S, NP, PP, VP, Det, N, P, V}, the terminal set V is {the, dog, cat, near, growled}, and our start symbol S is S.
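
As a concrete illustration (a minimal Python sketch with an assumed encoding, not from the original slides), this grammar can be represented as a set of rules, and a derivation checked for licensing by testing whether every rule it uses is in the set:

# Minimal sketch (assumed encoding): the example CFG as a rule set, plus a licensing check.
RULES = {
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("NP", "PP")),
    ("PP", ("P", "NP")), ("VP", ("V",)),
    ("Det", ("the",)), ("N", ("dog",)), ("N", ("cat",)),
    ("P", ("near",)), ("V", ("growled",)),
}

def licenses(grammar, derivation):
    # A grammar licenses a derivation iff all of the derivation's rules are present in the grammar.
    return all(rule in grammar for rule in derivation)

# Rule applications in a derivation of "the dog growled":
derivation = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",)),
              ("N", ("dog",)), ("VP", ("V",)), ("V", ("growled",))]
print(licenses(RULES, derivation))  # True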

45 Context-free Grammars: an example II S → NP VP, NP → Det N, NP → NP PP, PP → P NP, VP → V, Det → the, N → dog, N → cat, P → near, V → growled Here is a derivation and the resulting derivation tree: S ⇒ NP VP ⇒ NP PP VP ⇒ Det N PP VP ⇒ the N PP VP ⇒ the dog PP VP ⇒ the dog P NP VP ⇒ ... ⇒ the dog near the cat growled, with derivation tree [S [NP [NP [Det the] [N dog]] [PP [P near] [NP [Det the] [N cat]]]] [VP [V growled]]]

46 Probabilistic Context-Free Grammars A probabilistic context-free grammar (PCFG) consists of a tuple (N, V, S, R, P) such that: N is a finite set of non-terminal symbols; V is a finite set of terminal symbols; S is the start symbol; R is a finite set of rules of the form X → α where X ∈ N and α is a sequence of symbols drawn from N ∪ V; P is a mapping from R into probabilities, such that for each X ∈ N, Σ_{[X → α] ∈ R} P(X → α) = 1 PCFG derivations and derivation trees are just like for CFGs. The probability P(T) of a derivation tree is simply the product of the probabilities of each rule application.

47 Example PCFG 1 S → NP VP, 0.8 NP → Det N, 0.2 NP → NP PP, 1 PP → P NP, 1 VP → V, 1 Det → the, 0.5 N → dog, 0.5 N → cat, 1 P → near, 1 V → growled Derivation tree for the dog near the cat growled: [S [NP [NP [Det the] [N dog]] [PP [P near] [NP [Det the] [N cat]]]] [VP [V growled]]] P(T) = 1 × 0.2 × 0.8 × 1 × 0.5 × 1 × 1 × 0.8 × 1 × 0.5 × 1 × 1 = 0.032
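
To make the calculation explicit, here is a minimal Python sketch (the tree encoding is an assumption, not from the slides) that multiplies the rule probabilities of the example tree:

# Minimal sketch: P(T) as the product of rule probabilities in the derivation tree.
PROBS = {
    ("S", ("NP", "VP")): 1.0, ("NP", ("Det", "N")): 0.8, ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0, ("VP", ("V",)): 1.0, ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5, ("N", ("cat",)): 0.5, ("P", ("near",)): 1.0, ("V", ("growled",)): 1.0,
}

def tree_prob(tree):
    # tree = (label, [children]); children are sub-trees or bare word strings.
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PROBS[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S", [("NP", [("NP", [("Det", ["the"]), ("N", ["dog"])]),
                   ("PP", [("P", ["near"]), ("NP", [("Det", ["the"]), ("N", ["cat"])])])]),
           ("VP", [("V", ["growled"])])])
print(tree_prob(t))  # ~0.032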

52 Top-down parsing Fundamental operation: expand a goal category using one of its rules (e.g. S → NP VP, then NP → Det N, then Det → The) Permits structure building inconsistent with perceived input, or corresponding to as-yet-unseen input

57 Bottom-up parsing Fundamental operation: check whether a sequence of categories matches a rule's right-hand side (e.g. V NP against VP → V NP, P NP against PP → P NP, NP VP against S → NP VP) Permits structure building inconsistent with global context
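
As a rough illustration of this matching step (a minimal sketch under assumed data structures, not the slides' own implementation), a shift-reduce-style check against the rule set might look like this:

# Minimal sketch: check whether the top of a category stack matches some rule's right-hand side.
RULES = [("NP", ("Det", "N")), ("NP", ("NP", "PP")), ("PP", ("P", "NP")),
         ("VP", ("V", "NP")), ("S", ("NP", "VP"))]

def reductions(stack):
    # Return every rule whose right-hand side matches the end of the category stack.
    found = []
    for lhs, rhs in RULES:
        if tuple(stack[-len(rhs):]) == rhs:
            found.append((lhs, rhs))
    return found

print(reductions(["Det", "N"]))       # [('NP', ('Det', 'N'))] -> build an NP
print(reductions(["NP", "P", "NP"]))  # [('PP', ('P', 'NP'))]  -> build a PP, ignoring global context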

62 Ambiguity There is usually more than one structural analysis for a (partial) sentence: The girl saw the boy with... Corresponds to choices (non-determinism) in parsing: VP can expand to V NP PP, or VP can expand to V NP and then NP can expand to NP PP Ambiguity can be local (eventually resolved): ...with a puppy on his lap. or it can be global (unresolved): ...with binoculars.

63 Serial vs. Parallel processing

69 Serial vs. Parallel processing A serial processing model is one that, when faced with a choice, chooses one alternative and discards the rest A parallel model is one where at least two alternatives are chosen and maintained A full parallel model is one where all alternatives are maintained A limited parallel model is one where some but not necessarily all alternatives are maintained A joke about the man with an umbrella that I heard *ambiguity goes as the Catalan numbers (Church and Patil 1982)

70 Dynamic programming There is an exponential number of parse trees for a given sentence (Church & Patil 1982) So sentence comprehension can't entail an exhaustive enumeration of possible structural representations But parsing can be made tractable by dynamic programming

72 Dynamic programming (2) Dynamic programming = storage of partial results There can be two ways to make an NP out of the same substring, but the resulting NP can be stored just once in the parsing process Result: parsing time polynomial (cubic for CFGs) in sentence length Still problematic for modeling human sentence processing
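
A minimal CKY-style sketch (assumed code, not from the slides) makes the storage idea concrete: each (span, category) pair is entered in the chart once, however many derivations produce it, and the three nested loops over span length, start position, and split point are what give the cubic bound mentioned above:

# Minimal sketch: a CKY-style chart; each category is stored once per span.
from collections import defaultdict

LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "near": {"P"}}
BINARY = [("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]

def parse_chart(words):
    n = len(words)
    chart = defaultdict(set)                    # (i, j) -> categories spanning words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL.get(w, set())
    for length in range(2, n + 1):              # build wider spans from smaller ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):           # split point
                for lhs, left, right in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        chart[(i, j)].add(lhs)  # stored once, even if found via several splits
    return chart

chart = parse_chart("the dog near the cat".split())
print(chart[(0, 5)])  # {'NP'}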

73 Psycholinguistic methodology The workhorses of psycholinguistic experimentation involve behavioral measures: What choices do people make in various types of language-producing and language-comprehending situations? And how long do they take to make these choices? Offline measures: rating sentences, completing sentences, ... Online measures: tracking people's eye movements, having people read words aloud, reading under (implicit) time pressure

74 Psycholinguistic methodology (2)

75 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading

76 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press

78 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press While

79 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press the

80 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press clouds

81 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press crackled,

82 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press above

83 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press the

84 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press glider

85 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press soared

87 Psycholinguistic methodology (2) Example online-processing methodology: word-by-word self-paced reading Reveal each consecutive word with a button press soared Readers aren't allowed to backtrack We measure time between button presses and use it as a proxy for incremental processing difficulty

88 Psycholinguistic methodology (3) Caveat: neurolinguistic experimentation is more and more widely used to study language comprehension; methods vary in temporal and spatial resolution; people are more passive in these experiments: sit back and listen to/read a sentence, word by word; strictly speaking these are not behavioral measures; the question of what is difficult becomes a little less straightforward

89 Estimating grammar probabilities To test our models, we need to approximate the probabilistic linguistic knowledge of the native speaker By principle of rational analysis: native speaker's probabilities should match those of the environment We can use syntactically annotated datasets (Treebanks) to estimate frequencies in the environment e.g., via relative frequency estimation: from a toy treebank containing [NP [Det the] [N dog]], [NP [N cats]], and [NP [Det the] [N cats]], we get P(NP → Det N) = 2/3, P(NP → N) = 1/3, P(Det → the) = 1, P(N → dog) = 1/3, P(N → cats) = 2/3
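
A minimal sketch of relative frequency estimation (the treebank encoding is an assumption, not the slides' own): count each rule's uses and normalize by the total count of its left-hand side:

# Minimal sketch: relative frequency estimation of PCFG rule probabilities.
from collections import Counter, defaultdict

# Rule uses from the toy treebank above: "the dog", "cats", "the cats" (all NPs).
treebank_rules = [
    ("NP", ("Det", "N")), ("Det", ("the",)), ("N", ("dog",)),
    ("NP", ("N",)), ("N", ("cats",)),
    ("NP", ("Det", "N")), ("Det", ("the",)), ("N", ("cats",)),
]

counts = Counter(treebank_rules)
lhs_totals = defaultdict(int)
for (lhs, rhs), c in counts.items():
    lhs_totals[lhs] += c

probs = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
print(probs[("NP", ("Det", "N"))])  # 0.666... = 2/3
print(probs[("N", ("cats",))])      # 0.666... = 2/3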

90 Pruning approaches Jurafsky 1996: a probabilistic approach to lexical access and syntactic disambiguation Main argument: sentence comprehension is probabilistic, construction-based, and parallel A probabilistic parsing model explains human disambiguation preferences and garden-path sentences The probabilistic parsing model has two components: constituent probabilities (a probabilistic CFG model) and valence probabilities

91 Jurafsky 1996 Every word is immediately completely integrated into the parse of the sentence (i.e., full incrementality) Alternative parses are ranked in a probabilistic model Parsing is limited-parallel: when an alternative parse has unacceptably low probability, it is pruned Unacceptably low is determined by beam search (described a few slides later)

100 Jurafsky 1996: valency model Whereas the constituency model makes use of only phrasal, not lexical, information, the valency model tracks lexical subcategorization, e.g.: P(<NP PP> | discuss) = 0.24 P(<NP> | discuss) = 0.76 (in today's NLP, these are called monolexical probabilities) In some cases, Jurafsky bins across categories:* P(<NP XP[+pred]> | keep) = 0.81 P(<NP> | keep) = 0.19 where XP[+pred] can vary across AdjP, VP, PP, Particle *valence probs are RFEs from Connine et al. (1984) and Penn Treebank

101 Jurafsky 1996: syntactic model The syntactic component of Jurafsky's model is just probabilistic context-free grammars (PCFGs) Total probability of a parse is the product of its rule probabilities: 0.7 × 0.5 × 0.15 × ... × 0.02 × 0.4 × 0.07

107 Modeling offline preferences Ford et al. found an effect of lexical selection in PP attachment preferences (offline, forced-choice): The women discussed the dogs on the beach NP-attachment (the dogs that were on the beach) -- 90% VP-attachment (discussed while on the beach) -- 10% The women kept the dogs on the beach NP-attachment -- 5% VP-attachment -- 95% These are uncertainties about what has already been said Overall results broadly confirmed in online attachment study by Taraban and McClelland 1988

108 Modeling offline preferences (2) Jurafsky ranks parses as the product of constituent and valence probabilities:

109 Modeling offline preferences (3)

110 Result Ranking with respect to parse probability matches offline preferences Note that only monotonicity, not degree of preference, is matched

111 Modeling online parsing Frazier and Rayner 1987

112 Modeling online parsing Does this sentence make sense? Frazier and Rayner 1987

113 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. Frazier and Rayner 1987

114 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? Frazier and Rayner 1987

115 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? The warehouse fires a dozen employees each year. Frazier and Rayner 1987

116 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? The warehouse fires a dozen employees each year. And this one? Frazier and Rayner 1987

117 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? The warehouse fires a dozen employees each year. And this one? The warehouse fires destroyed all the buildings. Frazier and Rayner 1987

118 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? The warehouse fires a dozen employees each year. And this one? The warehouse fires destroyed all the buildings. fires can be either a noun or a verb. So can houses: Frazier and Rayner 1987

121 Modeling online parsing Does this sentence make sense? The complex houses married and single students and their families. How about this one? The warehouse fires a dozen employees each year. And this one? The warehouse fires destroyed all the buildings. fires can be either a noun or a verb. So can houses: [NP The complex] [VP houses married and single students]. These are garden path sentences Originally taken as some of the strongest evidence for serial processing by the human parser Frazier and Rayner 1987

122 Limited parallel parsing Full-serial: keep only one incremental interpretation Full-parallel: keep all incremental interpretations Limited parallel: keep some but not all interpretations In a limited parallel model, garden-path effects can arise from the discarding of a needed interpretation: [S [NP The complex] [VP houses ...]] discarded; [S [NP The complex houses ...]] kept

123 Modeling online parsing: garden paths Pruning strategy for limited ranked-parallel processing Each incremental analysis is ranked Analyses falling below a threshold are discarded In this framework, a model must characterize the incremental analyses and the threshold for pruning Jurafsky 1996: partial context-free parses as analyses; probability ratio as pruning threshold, with the ratio defined as P(I) : P(I_best) (Gibson 1991: complexity ratio for pruning threshold)
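
A minimal sketch of this pruning step (the threshold values here are only illustrative, not Jurafsky's actual beam): discard any analysis whose probability ratio to the best analysis exceeds the beam width:

# Minimal sketch: beam pruning by probability ratio to the best analysis.
def prune(analyses, beam_ratio):
    # analyses: dict mapping analysis name -> probability.
    best = max(analyses.values())
    return {name: p for name, p in analyses.items() if best / p <= beam_ratio}

# Probabilities standing in roughly an 82:1 ratio (cf. the reduced-relative example below).
analyses = {"main-clause": 0.92, "reduced-relative": 0.14 * 0.08}
print(prune(analyses, beam_ratio=5))    # only 'main-clause' survives -> garden path predicted
print(prune(analyses, beam_ratio=100))  # both survive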

124 Garden path models 1: N/V ambiguity

125 Garden path models 1: N/V ambiguity Each analysis is a partial PCFG tree

126 Garden path models 1: N/V ambiguity Each analysis is a partial PCFG tree Tree prefix probability used for ranking of analysis

128 Garden path models 1: N/V ambiguity Each analysis is a partial PCFG tree Tree prefix probability used for ranking of analysis these nodes are actually still undergoing expansion

129 Garden path models 1: N/V ambiguity Each analysis is a partial PCFG tree Tree prefix probability used for ranking of analysis these nodes are actually still undergoing expansion Partial rule probs marginalize over rule completions

131 N/V ambiguity (2) Partial CF tree analysis of the complex houses Analysis of houses as noun has much lower probability than analysis as verb (> 250:1) Hypothesis: the low-ranking alternative is discarded

132 N/V ambiguity (3) Note that top-down vs. bottom-up questions are immediately implicated, in theory Jurafsky includes the cost of generating the initial NP under the S of course, it's a small cost as P(S → NP ...) = 0.92 If parsing were bottom-up, that cost would not have been explicitly calculated yet

143 Garden path models 2 The most famous garden-paths: reduced relative clauses (RRCs) versus main clauses (MCs): The horse (that was) raced past the barn fell. From the valence + simple-constituency perspective, the MC and RRC analyses differ in two places: the RRC analysis involves a low-probability construction (p = 0.14) and transitive valence (p = 0.08), while the MC analysis involves a construction probability p ≈ 1 and the best intransitive valence (p = 0.92)

144 Garden path models 2, cont.

145 Garden path models 2, cont. 82 : 1 probability ratio means that lower-probability analysis is discarded

146 Garden path models 2, cont. 82 : 1 probability ratio means that lower-probability analysis is discarded In contrast, some RRCs do not induce garden paths:

147 Garden path models 2, cont. 82 : 1 probability ratio means that lower-probability analysis is discarded In contrast, some RRCs do not induce garden paths: The bird found in the room died.

150 Garden path models 2, cont. 82 : 1 probability ratio means that the lower-probability analysis is discarded In contrast, some RRCs do not induce garden paths: The bird found in the room died. Here, the probability ratio turns out to be much closer (about 4 : 1) because found is preferentially transitive Conclusion within pruning theory: the beam threshold is between about 4 : 1 and 82 : 1 (granularity issue: when exactly does the probability cost of valence get paid???)

151 Notes on the probabilistic model Jurafsky 1996 is a product-of-experts (PoE) model Expert 1: the constituency model Expert 2: the valence model PoEs are flexible and easy to define, but hard to learn The Jurafsky 1996 model is actually deficient (loses probability mass), due to relative frequency estimation

152 Notes on the probabilistic model (2) Jurafsky 1996 predated most work on lexicalized parsers (Collins 1999, Charniak 1997) In a generative lexicalized parser, valence and constituency are often combined through decomposition & Markov assumptions The use of decomposition makes it easy to learn non-deficient models

153 Jurafsky 1996 & pruning: main points Syntactic comprehension is probabilistic Offline preferences explained by syntactic + valence probabilities Online garden-path results explained by same model, when beam search/pruning is assumed

154 General issues What is the granularity of incremental analysis? In [NP the complex houses], complex could be an adjective (=the houses are complex) complex could also be a noun (=the houses of the complex) Should these be distinguished, or combined? When does valence probability cost get paid? What is the criterion for abandoning an analysis? Should the number of maintained analyses affect processing difficulty as well?

155 Generalizing incremental disambiguation

156 Generalizing incremental disambiguation Another type of uncertainty:

157 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the

158 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue?

159 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? dog?

160 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog?

161 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman?

162 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman? The squirrel stored some nuts in the

163 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman? The squirrel stored some nuts in the tree

164 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman? The squirrel stored some nuts in the tree This is uncertainty about what has not yet been said

165 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman? The squirrel stored some nuts in the tree This is uncertainty about what has not yet been said Reading-time (Ehrlich & Rayner, 1981) and EEG (Kutas & Hillyard, 1980, 1984) evidence shows this affects processing rapidly

166 Generalizing incremental disambiguation Another type of uncertainty: The old man stopped and stared at the statue? view? dog? woman? The squirrel stored some nuts in the tree This is uncertainty about what has not yet been said Reading-time (Ehrlich & Rayner, 1981) and EEG (Kutas & Hillyard, 1980, 1984) evidence shows this affects processing rapidly A good model should account for expectations about how this uncertainty will be resolved

168 Quantifying probabilistic online processing difficulty

173 Quantifying probabilistic online processing difficulty Let a word's difficulty be its surprisal given its context: surprisal(w_i) = -log P(w_i | w_1..i-1) Captures the expectation intuition: the more we expect an event, the easier it is to process Brains are prediction engines! Predictable words are read faster (Ehrlich & Rayner, 1981) and have distinctive EEG responses (Kutas & Hillyard 1980) Combine with probabilistic grammars to give grammatical expectations (Hale, 2001, NAACL; Levy, 2008, Cognition)

174 The surprisal graph: surprisal (-log P) plotted as a function of probability
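
The curve is just a negative logarithm; a minimal sketch (illustrative values only):

# Minimal sketch: surprisal as a function of probability.
import math

def surprisal(p, base=2):
    # Surprisal in bits (base 2): the less probable the word, the higher its surprisal.
    return -math.log(p, base)

for p in [0.5, 0.25, 0.1, 0.01]:
    print(p, round(surprisal(p), 2))  # 1.0, 2.0, 3.32, 6.64 bits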

183 Syntactic complexity--non-probabilistic On the resource limitation view, memory demands are a processing bottleneck Gibson 1998, 2000 (DLT): multiple and/or more distant dependencies are harder to process Processing: the reporter who attacked the senator -- Easy; the reporter who the senator attacked -- Hard

184 Expectations versus memory Suppose you know that some event class X has to happen in the future, but you don't know: 1. When X is going to occur 2. Which member of X it's going to be The things W you see before X can give you hints about (1) and (2) If expectations facilitate processing, then seeing W should generally speed processing of X But you also have to keep W in memory and retrieve it at X This could slow processing at X

186 What happens in German final-verb processing? Variation in pre-verbal dependency structure is also found in verb-final clauses, such as in German: Die Einsicht, dass der Freund dem Kunden das Auto aus Plastik verkaufte, erheiterte die Anderen. (The insight, that the.NOM friend the.DAT client the.ACC car of plastic sold, amused the others.)

195 What happens in German final-verb processing?...daß der Freund DEM Kunden das Auto verkaufte...that the friend the client the car sold...that the friend sold the client a car......daß der Freund DES Kunden das Auto verkaufte...that the friend the client the car sold...that the friend of the client sold a car... Locality prediction: final verb read faster in DES condition Observed: final verb read faster in DEM condition (Konieczny & Döring 2003)

[Slides 196-208: figures showing step-by-step incremental parses of the two clauses (daß der Freund DEM/DES Kunden das Auto verkaufte) under SBAR → COMP S, with DEM Kunden attached as a dative object in the VP and DES Kunden as a genitive inside the subject NP, and with the continuations still expected at each step (nom, acc, dat, PP, ADVP, Verb) tracked alongside]

209 Model results [Figure: final-verb reading times (ms), word probability P(w_i), and locality-based predictions for dem Kunden (dative) vs. des Kunden (genitive)] ~30% greater expectation in the dative condition; the locality-based predictions once again have the wrong monotonicity

210 Garden-pathing and surprisal When the dog scratched the vet and his new assistant removed the muzzle. (Frazier & Rayner, 1982)

220 Garden-pathing and surprisal Here's another type of local syntactic ambiguity When the dog scratched the vet and his new assistant removed the muzzle. difficulty at the disambiguating region (68ms/char) Compare with: When the dog scratched, the vet and his new assistant removed the muzzle. When the dog scratched its owner the vet and his new assistant removed the muzzle. easier (50ms/char) (Frazier & Rayner, 1982)

221 How is surprisal efficiently computed? To understand this, we'll look at a different set of slides

222 PCFG review (2) We just learned how to calculate the probability of a tree The probability of a string w_1..n is the sum of the probabilities of all trees whose yield is w_1..n The probability of a string prefix w_1..i is the sum of the probabilities of all trees whose yield begins with w_1..i If we had the probabilities of two string prefixes w_1..i-1 and w_1..i, we could calculate the conditional probability P(w_i | w_1..i-1) as their ratio: P(w_i | w_1..i-1) = P(w_1..i) / P(w_1..i-1)
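
A minimal sketch of that ratio (the prefix probabilities are stipulated by hand here; a real incremental parser computes them, as the following slides show):

# Minimal sketch: conditional word probability and surprisal from prefix probabilities.
import math

prefix_prob = {(): 1.0, ("the",): 1.0, ("the", "dog"): 2/3}  # toy values from the NP grammar below

def next_word_prob(prefix, word):
    # P(w_i | w_1..i-1) = P(w_1..i) / P(w_1..i-1)
    return prefix_prob[tuple(prefix) + (word,)] / prefix_prob[tuple(prefix)]

p = next_word_prob(["the"], "dog")
print(p, -math.log2(p))  # 0.666..., a surprisal of about 0.58 bits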

226 Inference over infinite tree sets Consider the following noun-phrase grammar: 2/3 NP → Det N, 1/3 NP → NP PP, 1 PP → P NP, 1 Det → the, 2/3 N → dog, 1/3 N → cat, 1 P → near Question: given a sentence starting with the..., what is the probability that the next word is dog? Intuitively, the answer to this question should be P(dog | the) = 2/3, because the second word HAS to be either dog or cat.

230 Inference over infinite tree sets (2) 2/3 NP → Det N, 1/3 NP → NP PP, 1 PP → P NP, 1 Det → the, 2/3 N → dog, 1/3 N → cat, 1 P → near We should just enumerate the trees that cover the dog..., and divide their total probability by that of the... but there are infinitely many trees: [NP [Det the] [N dog]], [NP [NP [Det the] [N dog]] PP], [NP [NP [NP [Det the] [N dog]] PP] PP], ...

233 2/3 NP → Det N, 1/3 NP → NP PP, 1 PP → P NP, 1 Det → the, 2/3 N → dog, 1/3 N → cat, 1 P → near Shortcut 1: you can think of a partial tree as marginalizing over all completions of the partial tree. It has a corresponding marginal probability in the PCFG. For example, the complete tree [NP [Det the] [N dog]] has probability 2/3 × 1 × 2/3 = 4/9, and a partial tree such as [NP [NP [Det the] [N dog]] [PP [P near] NP]], with its final NP left unexpanded, carries the marginal probability of all its completions.

234 Problem 2: there are still an infinite number of incomplete trees covering a partial input. (Figure: the partial trees for the dog with 0, 1, 2, ... pending PPs, with marginal probabilities 4/9, 4/27, 4/81, ...)

235 Problem 2: there are still an infinite number of incomplete trees covering a partial input. BUT! These tree probabilities form a geometric series: P(the dog...) = 4/9 + 4/27 + 4/81 + ...

236 Problem 2: there are still an infinite number of incomplete trees covering a partial input. BUT! These tree probabilities form a geometric series: P(the dog...) = 4/9 + 4/27 + 4/81 + ... = (4/9) Σ_{i=0}^∞ (1/3)^i

237 Problem 2: there are still an infinite number of incomplete trees covering a partial input. BUT! These tree probabilities form a geometric series: P(the dog...) = 4/9 + 4/27 + 4/81 + ... = (4/9) Σ_{i=0}^∞ (1/3)^i = 2/3

238 Problem 2: there are still an infinite number of incomplete trees covering a partial input. BUT! These tree probabilities form a geometric series: P(the dog...) = 4/9 + 4/27 + 4/81 + ... = (4/9) Σ_{i=0}^∞ (1/3)^i = 2/3... which matches the original rule probability 2/3 for N → dog.
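
To make the arithmetic concrete, here is a small illustrative check, not part of the original slides, that the geometric series of partial-tree probabilities converges to the N → dog rule probability; the grammar probabilities are the reconstructed values used above.

```python
# Sum the geometric series of partial-tree probabilities for "the dog ...":
# each extra level of NP -> NP PP recursion multiplies the probability by 1/3.
terms = [(4 / 9) * (1 / 3) ** i for i in range(60)]  # 60 terms is plenty for convergence
print(sum(terms))  # ~0.6667, i.e. 2/3, matching P(N -> dog)
```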

239 Generalizing the geometric series induced by rule recursion In general, these infinite tree sets arise due to left recursion in a probabilistic grammar: A → B α, B → A β (Stolcke, 1995)

240 Generalizing the geometric series induced by rule recursion In general, these infinite tree sets arise due to left recursion in a probabilistic grammar: A → B α, B → A β We can formulate a stochastic left-corner matrix P_L of transitions between categories, with rows and columns indexed by the categories A, B, ..., K. (Stolcke, 1995)

241 Generalizing the geometric series induced by rule recursion In general, these infinite tree sets arise due to left recursion in a probabilistic grammar: A → B α, B → A β We can formulate a stochastic left-corner matrix P_L of transitions between categories, with rows and columns indexed by the categories A, B, ..., K, and solve for its closure R_L = (I − P_L)^−1. (Stolcke, 1995)

242 Generalizing the geometric series ROOT → NP [1], NP → Det N [2/3], NP → NP PP [1/3], PP → P NP [1], Det → the [1], N → dog [2/3], N → cat [1/3], P → near [1] The closure of our left-corner matrix is R_L, a matrix whose rows and columns are indexed by ROOT, NP, PP, Det, N, P (numeric entries shown on the slide).

243 Generalizing the geometric series [same grammar] The closure of our left-corner matrix is R_L, indexed by ROOT, NP, PP, Det, N, P. Refer to an entry (X, Y) in this matrix as R(X →_L Y)

244 Generalizing the geometric series [same grammar] The closure of our left-corner matrix is R_L, indexed by ROOT, NP, PP, Det, N, P. Refer to an entry (X, Y) in this matrix as R(X →_L Y) Note that the 3/2 bonus accrued for left-recursion of NPs appears in the (ROOT, NP) and (NP, NP) cells of the matrix

245 Generalizing the geometric series [same grammar] The closure of our left-corner matrix is R_L, indexed by ROOT, NP, PP, Det, N, P. Refer to an entry (X, Y) in this matrix as R(X →_L Y) Note that the 3/2 bonus accrued for left-recursion of NPs appears in the (ROOT, NP) and (NP, NP) cells of the matrix We need to do the same with unary chains, constructing a unary-closure matrix R_U.
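
As a sanity check on the closure construction, here is a short numpy sketch, an illustration rather than material from the slides, that builds the left-corner matrix P_L for the reconstructed toy grammar and inverts (I − P_L); under these assumed rule probabilities the (ROOT, NP) and (NP, NP) entries come out to 3/2, the bonus mentioned above.

```python
import numpy as np

# Left-corner matrix for the toy grammar (symbols and probabilities as reconstructed above).
cats = ["ROOT", "NP", "PP", "Det", "N", "P"]
idx = {c: i for i, c in enumerate(cats)}

P_L = np.zeros((len(cats), len(cats)))
P_L[idx["ROOT"], idx["NP"]] = 1.0   # ROOT -> NP
P_L[idx["NP"], idx["Det"]] = 2 / 3  # NP -> Det N
P_L[idx["NP"], idx["NP"]] = 1 / 3   # NP -> NP PP (left recursion)
P_L[idx["PP"], idx["P"]] = 1.0      # PP -> P NP

R_L = np.linalg.inv(np.eye(len(cats)) - P_L)
print(R_L[idx["ROOT"], idx["NP"]], R_L[idx["NP"], idx["NP"]])  # both 1.5: the 3/2 "bonus"
```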

246 Efficient incremental parsing: the probabilistic Earley algorithm We can use the Earley algorithm (Earley, 1970) in a probabilistic incarnation (Stolcke, 1995) to deal with these infinite tree sets. The (slightly oversimplified) probabilistic Earley algorithm has two fundamental types of operations: Prediction: if Y is a possible goal, and Y can lead to Z through a left corner, choose a rule Z → α and set up α as a new sequence of possible goals. Completion: if Y is a possible goal, Y can lead to Z through unary rewrites, and we encounter a completed Z, absorb it and move on to the next sub-goal in the sequence.

247 Efficient incremental parsing: the probabilistic Earley algorithm Parsing consists of constructing a chart of states (items). A state has the following structure: X → α • β [p, q], where X is the goal symbol, α the completed subgoals, β the remaining subgoals, p the forward probability, and q the inside probability. The forward probability is the total probability of getting from the root at the start of the sentence through to this state The inside probability is the bottom-up probability of the state

248 Efficient incremental parsing: the probabilistic Earley algorithm Inference rules for probabilistic Earley: Prediction: from a state X → β • Y γ [p, q], with a = R(Y →_L Z) and a rule Z → α of probability b, add the state Z → • α [abp, b]

249 Efficient incremental parsing: the probabilistic Earley algorithm Inference rules for probabilistic Earley: Prediction: from a state X → β • Y γ [p, q], with a = R(Y →_L Z) and a rule Z → α of probability b, add the state Z → • α [abp, b] Completion: from a state X → β • Y γ [p, q], with a = R(Y →_U Z) and a completed state Z → α • [b, c], add the state X → β Y • γ [acp, acq]
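
A minimal sketch of how these two inference rules update forward and inside probabilities, assuming a simple state record; the names State, predict, and complete are illustrative placeholders, not taken from Stolcke's paper.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class State:
    lhs: str               # goal symbol X
    done: Tuple[str, ...]  # completed subgoals (alpha)
    todo: Tuple[str, ...]  # remaining subgoals (beta)
    forward: float         # probability of reaching this state from the root
    inside: float          # bottom-up probability of the completed part

def predict(state: State, z: str, rhs: Tuple[str, ...], a: float, b: float) -> State:
    """state is X -> alpha . Y gamma [p, q]; a = R(Y ->L Z); b = P(Z -> rhs).
    Returns the predicted state Z -> . rhs [a*b*p, b]."""
    return State(z, (), rhs, a * b * state.forward, b)

def complete(state: State, completed: State, a: float) -> State:
    """state is X -> beta . Y gamma [p, q]; completed is Z -> alpha . [b, c];
    a = R(Y ->U Z). Returns X -> beta Y . gamma [a*c*p, a*c*q]."""
    y, rest = state.todo[0], state.todo[1:]
    c = completed.inside
    return State(state.lhs, state.done + (y,), rest,
                 a * c * state.forward, a * c * state.inside)
```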

250 Efficient incremental parsing: probabilistic Earley the dog near the (Slides 250–268 step through the probabilistic Earley chart for this input one operation at a time: the parse starts from ROOT → • NP [1, 1]; prediction adds states for the NP, Det, N, PP, and P rules; scanning the, dog, near, the and completing subgoals builds up NP and PP states, each carrying its forward and inside probabilities. The chart contents after each step are shown in the slide figures; summing the forward probabilities of the scanned states yields the prefix probabilities listed on the next slide.)

269 Prefix probabilities from probabilistic Earley If you have just processed word w_i, then the prefix probability of w_1...i can be obtained by summing all forward probabilities of items that have the form X → α w_i • β

270 Prefix probabilities from probabilistic Earley If you have just processed word w_i, then the prefix probability of w_1...i can be obtained by summing all forward probabilities of items that have the form X → α w_i • β In our example, we see: P(the) = 1 P(the dog) = 2/3 P(the dog near) = 2/9 P(the dog near the) = 2/9

271 Prefix probabilities from probabilistic Earley If you have just processed word w_i, then the prefix probability of w_1...i can be obtained by summing all forward probabilities of items that have the form X → α w_i • β In our example, we see: P(the) = 1 P(the dog) = 2/3 P(the dog near) = 2/9 P(the dog near the) = 2/9 Taking the ratios of these prefix probabilities can give us conditional word probabilities
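
The ratio step can be spelled out with the prefix probabilities just listed; a tiny illustrative calculation, using the reconstructed values above, also converts each ratio into surprisal in bits.

```python
import math

# Prefix probabilities read off the chart above (as reconstructed: 1, 2/3, 2/9, 2/9).
prefix = [("the", 1.0), ("the dog", 2 / 3), ("the dog near", 2 / 9), ("the dog near the", 2 / 9)]

for (prev_str, p_prev), (curr_str, p_curr) in zip(prefix, prefix[1:]):
    cond = p_curr / p_prev                          # P(w_i | w_1..i-1)
    print(curr_str, round(cond, 3), round(-math.log2(cond), 3))  # conditional prob, surprisal (bits)
```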

272 Probabilistic Earley as an eager algorithm From the inside probabilities of the states on the chart, the posterior distribution on (incremental) trees can be directly calculated This posterior distribution is precisely the correct result of the application of Bayes' rule: P(T_incremental | w_1...i) = P(w_1...i, T_incremental) / P(w_1...i) Hence, probabilistic Earley is also performing rational disambiguation Hale (2001) called this the eager property of an incremental parsing algorithm.

273 Probabilistic Earley algorithm: key ideas We want to use probabilistic grammars for both disambiguation and calculating probability distributions over upcoming events Infinitely many trees can be constructed in polynomial time and space The prefix probability of the string is calculated in the process By taking the log-ratio of two prefix probabilities, the surprisal of a word in its context can be calculated

274 Probabilistic Earley algorithm: key ideas We want to use probabilistic grammars for both disambiguation and calculating probability distributions over upcoming events Infinitely many trees can be constructed in polynomial time (O(n^3)) and space The prefix probability of the string is calculated in the process By taking the log-ratio of two prefix probabilities, the surprisal of a word in its context can be calculated

275 Probabilistic Earley algorithm: key ideas We want to use probabilistic grammars for both disambiguation and calculating probability distributions over upcoming events Infinitely many trees can be constructed in polynomial time (O(n^3)) and space (O(n^2)) The prefix probability of the string is calculated in the process By taking the log-ratio of two prefix probabilities, the surprisal of a word in its context can be calculated
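
Written out, the log-ratio relation between prefix probabilities and surprisal is the following standard identity, added here only for reference:

```latex
\mathrm{surprisal}(w_i) \;=\; -\log P(w_i \mid w_{1\ldots i-1})
                        \;=\; \log P(w_{1\ldots i-1}) \;-\; \log P(w_{1\ldots i})
```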

276 Other introductions You can read about the (non-probabilistic) Earley algorithm in (Jurafsky and Martin, 2000, Chapter 1) Prefix probabilities can also be calculated with an extension of the CKY algorithm due to Jelinek and Lafferty (1991)

277 A small PCFG for this sentence type S → SBAR S [0.3], S → NP VP [0.7], SBAR → COMPL S [0.3], SBAR → COMPL S COMMA [0.7], NP → Det N [0.6], NP → Det Adj N [0.2], NP → NP Conj NP [0.2], VP → V NP [0.5], VP → V [0.5], Det → the [0.8], Det → its [0.1], Det → his [0.1], N → dog [0.2], N → vet [0.2], N → assistant [0.2], N → muzzle [0.2], N → owner [0.2], V → scratched [0.25], V → removed [0.25], V → arrived [0.5], COMPL → When [1], COMMA → , [1], Conj → and [1], Adj → new [1] (analysis in Levy, 2011)

278 A small PCFG for this sentence type (same grammar as above) (analysis in Levy, 2011)

279 Two incremental trees (Slides 279–288 build up the two incremental analyses of When the dog scratched the vet and his new assistant..., shown as tree figures.) Garden-path analysis: the vet and his new assistant is parsed as the object NP of scratched inside the SBAR; P(T | w) = 0.826. Ultimately-correct analysis: scratched is parsed as intransitive and the vet and his new assistant is the subject NP of the main clause; P(T | w) = 0.174. The disambiguating word probability marginalizes over incremental trees: P(removed | w) = Σ_T P(removed | T) P(T | w)

289 Preceding context can disambiguate its owner takes up the object slot of scratched (tree figure: When the dog scratched its owner the vet and his new assistant removed the muzzle) Condition / Surprisal at resolution: its owner absent 4.2; its owner present 2

290 Sensitivity to verb argument structure A superficially similar example: When the dog arrived the vet and his new assistant removed the muzzle. (Staub, 2007)

291 Sensitivity to verb argument structure A superficially similar example: When the dog arrived the vet and his new assistant removed the muzzle. Easier here (Staub, 2007)

292 Sensitivity to verb argument structure A superficially similar example: When the dog arrived the vet and his new assistant removed the muzzle. But harder here! Easier here (Staub, 2007)

293 Sensitivity to verb argument structure A superficially similar example: When the dog arrived the vet and his new assistant removed the muzzle. But harder here! Easier here (cf. When the dog scratched the vet and his new assistant removed the muzzle.) (Staub, 2007)

294 Modeling argument-structure sensitivity (same small PCFG as above)

295 Modeling argument-structure sensitivity (same small PCFG as above) The context-free assumption doesn't preclude relaxing probabilistic locality: (Johnson, 1999; Klein & Manning, 2003)

296 Modeling argument-structure sensitivity (same small PCFG as above) The context-free assumption doesn't preclude relaxing probabilistic locality: (Johnson, 1999; Klein & Manning, 2003)

297 Modeling argument-structure sensitivity (same small PCFG as above) The context-free assumption doesn't preclude relaxing probabilistic locality: the rules VP → V NP [0.5], VP → V [0.5], V → scratched [0.25], V → removed [0.25], V → arrived [0.5] are replaced by VP → Vtrans NP [0.45], VP → Vtrans [0.05], VP → Vintrans [0.45], VP → Vintrans NP [0.05], Vtrans → scratched [0.5], Vtrans → removed [0.5], Vintrans → arrived [1] (Johnson, 1999; Klein & Manning, 2003)
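
To see why the split matters, compare the probability of an upcoming object NP given the verb class; the numbers below come from the reconstructed replacement rules above, so they are assumptions rather than values read directly off the slide.

```python
# Probability of an object NP given the verb class, under the reconstructed
# transitivity-distinguishing rules (VP -> Vtrans NP 0.45, VP -> Vtrans 0.05,
# VP -> Vintrans NP 0.05, VP -> Vintrans 0.45). Values are illustrative assumptions.
p_obj_after_scratched = 0.45 / (0.45 + 0.05)  # 0.9: transitive verb expects an object
p_obj_after_arrived = 0.05 / (0.05 + 0.45)    # 0.1: intransitive verb rarely takes one
p_obj_vanilla = 0.5                           # original grammar: VP -> V NP [0.5] for any verb
print(p_obj_after_scratched, p_obj_after_arrived, p_obj_vanilla)
```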

298 Result When the dog arrived the vet and his new assistant removed the muzzle. (ambiguity onset; ambiguity resolution) When the dog scratched the vet and his new assistant removed the muzzle. Transitivity-distinguishing PCFG: surprisal at the ambiguity onset and at the resolution for the Intransitive (arrived) and Transitive (scratched) conditions (values shown in the slide table)

299 Move to broad coverage Instead of the pedagogical grammar, a broad-coverage grammar from the parsed Brown corpus (11,984 rules) Relative-frequency estimation of rule probabilities ( vanilla PCFG) (Figure: first-pass reading times and surprisal (bits) for the transitive and intransitive conditions across the region the vet and his new assistant removed the muzzle.)
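
Relative-frequency (vanilla PCFG) estimation is just counting; a generic sketch, with a hypothetical rule_occurrences input standing in for rules read off the parsed Brown corpus:

```python
from collections import Counter

def relative_frequency_pcfg(rule_occurrences):
    """Vanilla PCFG estimation: P(A -> beta) = count(A -> beta) / count(A).
    `rule_occurrences` is a list of (lhs, rhs) pairs read off treebank trees."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    return {(lhs, rhs): n / lhs_counts[lhs] for (lhs, rhs), n in rule_counts.items()}

# Toy usage with hypothetical rule occurrences.
rules = [("S", ("NP", "VP"))] * 3 + [("S", ("VP",))]
print(relative_frequency_pcfg(rules))  # {('S', ('NP', 'VP')): 0.75, ('S', ('VP',)): 0.25}
```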

300 Surprisal vs. predictability in general But why would surprisal be the specific function relating probability to processing difficulty? And is there empirical evidence that surprisal is the right function? (Smith & Levy, 2013)

301 Surprisal vs. predictability in general But why would surprisal be the specific function relating probability to processing difficulty? And is there empirical evidence that surprisal is the right function? (Smith & Levy, 2013)

302 Theoretical interpretations of surprisal

303 Theoretical interpretations of surprisal Three proposals:

304 Theoretical interpretations of surprisal Three proposals: Relative entropy from prior to posterior distribution in the probabilistic grammar (Levy, 2008)

305 Theoretical interpretations of surprisal Three proposals: Relative entropy from prior to posterior distribution in the probabilistic grammar (Levy, 2008) Optimal sensory discrimination: Bayesian sequential probability ratio test on what is this word? (Stone, 1960; Laming, 1968; Norris, 2006) (Figure: evidence accumulation to a decision threshold)

306 Theoretical interpretations of surprisal Three proposals: Relative entropy from prior to posterior distribution in the probabilistic grammar (Levy, 2008) Optimal sensory discrimination: Bayesian sequential probability ratio test on what is this word? (Stone, 1960; Laming, 1968; Norris, 2006) Any kind of general probability sensitivity plus highly incremental processing (Smith & Levy, 2013): Time(together | come) = Time(to- | come) + Time(-ge- | come, to-) + Time(-ther | come, toge-)

307 Theoretical interpretations of surprisal Three proposals: Relative entropy from prior to posterior distribution in the probabilistic grammar (Levy, 2008) Optimal sensory discrimination: Bayesian sequential probability ratio test on what is this word? (Stone, 1960; Laming, 1968; Norris, 2006) Any kind of general probability sensitivity plus highly incremental processing (Smith & Levy, 2013): Time(together | come) = Time(to- | come) + Time(-ge- | come, to-) + Time(-ther | come, toge-)

308 Estimating probability/time curve shape

309 Estimating probability/time curve shape As a proxy for processing difficulty, reading time in two different methods: self-paced reading & eye-tracking

310 Estimating probability/time curve shape As a proxy for processing difficulty, reading time in two different methods: self-paced reading & eye-tracking Challenge: we need big data to estimate curve shape, but probability correlated with confounding variables

311 Estimating probability/time curve shape As a proxy for processing difficulty, reading time in two different methods: self-paced reading & eye-tracking Challenge: we need big data to estimate curve shape, but probability correlated with confounding variables (5K words) (50K words)

312 Estimating probability/time curve shape GAM regression: total contribution of word (trigram) probability to RT near-linear over 6 orders of magnitude! (Figure: total amount of slowdown (ms) as a function of P(word | context), for reading times in self-paced reading and for gaze durations in eye-tracking) (Smith & Levy, 2013)

313 Surprisal theory: summary Surprisal: a simple theory of how linguistic knowledge guides expectation formation in online processing Unifies ambiguity resolution and syntactic complexity Covers a range of findings in both domains Accounts for anti-locality effects problematic for memory-based theories of syntactic complexity

314 Levels of analysis Most of what we've covered today is at Marr's computational level of analysis Focus on the goals of computation and solutions based on all available sources of knowledge Jurafsky's beam search introduced algorithmic-level considerations Use beam search to keep # candidate parses manageable We'll now see an appealing alternative to beam search

315 Memory constraints: a theoretical puzzle Levy, Reali, & Griffiths, 2009, NIPS

316 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Levy, Reali, & Griffiths, 2009, NIPS

317 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3) Levy, Reali, & Griffiths, 2009, NIPS

318 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3) But... Requires probabilistic locality, limiting conditioning context Human parsing is linear, that is, O(n) anyway Levy, Reali, & Griffiths, 2009, NIPS

319 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3) But... Requires probabilistic locality, limiting conditioning context Human parsing is linear, that is, O(n) anyway So we must be restricting attention to some subset of analyses Levy, Reali, & Griffiths, 2009, NIPS

320 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3) But... Requires probabilistic locality, limiting conditioning context Human parsing is linear, that is, O(n) anyway So we must be restricting attention to some subset of analyses Puzzle: how to choose and manage this subset? Previous efforts: k-best beam search Levy, Reali, & Griffiths, 2009, NIPS

321 Memory constraints: a theoretical puzzle # Logically possible analyses grows at best exponentially in sentence length Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3) But... Requires probabilistic locality, limiting conditioning context Human parsing is linear, that is, O(n) anyway So we must be restricting attention to some subset of analyses Puzzle: how to choose and manage this subset? Previous efforts: k-best beam search Here, we'll explore the particle filter as a model of limited-parallel approximate inference Levy, Reali, & Griffiths, 2009, NIPS
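
For concreteness, the k-best beam baseline mentioned above amounts to a pruning step like the following sketch; the function name, k, and the data structure are illustrative assumptions, not the implementation used in any of the cited work.

```python
def prune_to_beam(partial_parses, k):
    """k-best beam: keep only the k highest-probability incremental analyses.
    `partial_parses` is a list of (analysis, probability) pairs."""
    return sorted(partial_parses, key=lambda pair: pair[1], reverse=True)[:k]
```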

322 The particle filter: general picture Levy, Reali, & Griffiths, 2009, NIPS

323 The particle filter: general picture Sequential Monte Carlo for incremental observations Levy, Reali, & Griffiths, 2009, NIPS

324 The particle filter: general picture Sequential Monte Carlo for incremental observations Let x_i be observed data, z_i be unobserved states For parsing: x_i are words, z_i are incremental structures Levy, Reali, & Griffiths, 2009, NIPS

325 The particle filter: general picture Sequential Monte Carlo for incremental observations Let x_i be observed data, z_i be unobserved states For parsing: x_i are words, z_i are incremental structures Suppose that after n-1 observations we have the distribution over interpretations P(z_n-1 | x_1...n-1) Levy, Reali, & Griffiths, 2009, NIPS

326 The particle filter: general picture Sequential Monte Carlo for incremental observations Let x_i be observed data, z_i be unobserved states For parsing: x_i are words, z_i are incremental structures Suppose that after n-1 observations we have the distribution over interpretations P(z_n-1 | x_1...n-1) After next observation x_n, represent P(z_n | x_1...n) inductively: P(z_n | x_1...n) ∝ P(x_n | z_n) P(z_n | z_n-1) P(z_n-1 | x_1...n-1) (posterior ∝ likelihood of the observation × prior over the hypothesized structure × previous posterior) Levy, Reali, & Griffiths, 2009, NIPS

327 The particle filter: general picture Sequential Monte Carlo for incremental observations Let x_i be observed data, z_i be unobserved states For parsing: x_i are words, z_i are incremental structures Suppose that after n-1 observations we have the distribution over interpretations P(z_n-1 | x_1...n-1) After next observation x_n, represent P(z_n | x_1...n) inductively: P(z_n | x_1...n) ∝ P(x_n | z_n) P(z_n | z_n-1) P(z_n-1 | x_1...n-1) Approximate P(z_i | x_1...i) by samples Levy, Reali, & Griffiths, 2009, NIPS

328 The particle filter: general picture Sequential Monte Carlo for incremental observations Let x_i be observed data, z_i be unobserved states For parsing: x_i are words, z_i are incremental structures Suppose that after n-1 observations we have the distribution over interpretations P(z_n-1 | x_1...n-1) After next observation x_n, represent P(z_n | x_1...n) inductively: P(z_n | x_1...n) ∝ P(x_n | z_n) P(z_n | z_n-1) P(z_n-1 | x_1...n-1) Approximate P(z_i | x_1...i) by samples Sample z_n from P(z_n | z_n-1), and reweight by P(x_n | z_n) Levy, Reali, & Griffiths, 2009, NIPS
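
A minimal bootstrap particle-filter sketch of the sample-and-reweight loop just described (no resampling yet; that step is introduced below). The function names and arguments are illustrative placeholders, not the Levy, Reali, & Griffiths implementation.

```python
def particle_filter(particles, observations, transition_sample, likelihood):
    """Minimal bootstrap particle filter (sketch). `particles` is a list of initial
    states z_0; `transition_sample(z)` draws z_n ~ P(z_n | z_{n-1});
    `likelihood(x, z)` returns P(x_n | z_n). Returns (particle, weight) pairs."""
    weights = [1.0 / len(particles)] * len(particles)
    for x in observations:
        # Extend each particle by sampling from the transition distribution.
        particles = [transition_sample(z) for z in particles]
        # Reweight by how well each extension explains the observed word.
        weights = [w * likelihood(x, z) for w, z in zip(weights, particles)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return list(zip(particles, weights))
```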

329 Particle filter with probabilistic grammars S → NP VP [1.0], NP → N [0.8], NP → N RRC [0.2], RRC → Part NP [1.0], VP → V NP [1.0], N → women [0.7], N → sandwiches [0.3], V → brought [0.4], V → broke [0.3], V → tripped [0.3], Part → brought [0.1], Part → broken [0.7], Part → tripped [0.2], Adv → quickly [1.0] (Slides 329–366 step through particles incrementally parsing women brought sandwiches tripped under this grammar: each particle samples its next structural step from P(z_n | z_n-1) and is reweighted by the probability of the observed word. One particle pursues the main-verb analysis, in which brought is the main verb of S → NP VP; another pursues the reduced-relative analysis, in which brought is a participle inside NP → N RRC, the analysis that can absorb the final word tripped. The tree figures on these slides show the particles and their weights after each word.)

367 Simple garden-path sentences

368 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped

369 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

370 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

371 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

372 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

373 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

374 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

375 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Comprehender is initially misled away from ultimately correct interpretation

376 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Comprehender is initially misled away from ultimately correct interpretation With finitely many hypotheses, recovery is not always successful

377 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length input

378 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length input

379 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length input

380 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length input

381 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length input

382 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length We handle this by resampling at each input word input

383 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length We handle this by resampling at each input word input

384 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length We handle this by resampling at each input word input

385 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length We handle this by resampling at each input word input

386 Resampling in the particle filter With the naïve particle filter, inferences are highly dependent on initial choices Most particles wind up with small weights Region of dense posterior poorly explored Especially bad for parsing Space of possible parses grows (at best) exponentially with input length We handle this by resampling at each input word input
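
The resampling step itself can be as simple as drawing particles in proportion to their weights; a sketch of multinomial resampling, offered as an illustration rather than the authors' code:

```python
import random

def resample(particles, weights):
    """Resample particles in proportion to their weights (multinomial resampling),
    then reset the weights to uniform, as in the per-word resampling step above."""
    n = len(particles)
    new_particles = random.choices(particles, weights=weights, k=n)
    return new_particles, [1.0 / n] * n
```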

387 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped

388 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

389 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

390 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich)

391 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Posterior initially misled away from ultimately correct interpretation

392 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Posterior initially misled away from ultimately correct interpretation

393 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Posterior initially misled away from ultimately correct interpretation

394 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Posterior initially misled away from ultimately correct interpretation

395 Simple garden-path sentences The woman brought the sandwich from the kitchen tripped MAIN VERB (it was the woman who brought the sandwich) REDUCED RELATIVE (the woman was brought the sandwich) Posterior initially misled away from ultimately correct interpretation With finite # of particles, recovery is not always successful

396 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Frazier & Rayner, 1982; Tabor & Hutchins, 2004

397 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty Frazier & Rayner, 1982; Tabor & Hutchins, 2004

398 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty Frazier & Rayner, 1982; Tabor & Hutchins, 2004

399 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty but so does the length of the ambiguous region Frazier & Rayner, 1982; Tabor & Hutchins, 2004

400 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty but so does the length of the ambiguous region Frazier & Rayner, 1982; Tabor & Hutchins, 2004

401 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty but so does the length of the ambiguous region Our linking hypothesis: Proportion of parse failures at the disambiguating region should increase with sentence difficulty Frazier & Rayner, 1982; Tabor & Hutchins, 2004

402 Returning to the puzzle (NP/S) A-S Tom heard the gossip wasn't true. A-L Tom heard the gossip about the neighbors wasn't true. U-S Tom heard that the gossip wasn't true. U-L Tom heard that the gossip about the neighbors wasn't true. Previous empirical finding: ambiguity induces difficulty but so does the length of the ambiguous region Our linking hypothesis: Proportion of parse failures at the disambiguating region should increase with sentence difficulty Frazier & Rayner, 1982; Tabor & Hutchins, 2004

403 Another example (Tabor & Hutchins 2004) Trans/Short As the author wrote the essay the book grew. Intr/Short As the author wrote the book grew. Trans/Long As the author wrote the essay the book describing Babylon grew. Intr/Long As the author wrote the book describing Babylon grew. (NP/Z)

404 Another example (Tabor & Hutchins 2004) Trans/Short As the author wrote the essay the book grew. Intr/Short As the author wrote the book grew. Trans/Long As the author wrote the essay the book describing Babylon grew. Intr/Long As the author wrote the book describing Babylon grew. (NP/Z)

405 Resampling-induced drift

406 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i)

407 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i) But due to resampling, P(z_i | x_1...i) will drift

408 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i) But due to resampling, P(z_i | x_1...i) will drift One of the interpretations may be lost

409 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i) But due to resampling, P(z_i | x_1...i) will drift One of the interpretations may be lost

410 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i) But due to resampling, P(z_i | x_1...i) will drift One of the interpretations may be lost

411 Resampling-induced drift In ambiguous region, observed words aren't strongly informative (P(x_i | z_i) similar across different z_i) But due to resampling, P(z_i | x_1...i) will drift One of the interpretations may be lost The longer the ambiguous region, the more likely this is
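
The drift argument can be illustrated with a tiny simulation: when observations are uninformative, resampling behaves like sampling drift on the particle counts, and the chance that one of two interpretations is lost grows with the number of resampling steps, i.e. with the length of the ambiguous region. All parameters below are assumptions chosen for illustration, not values from the slides.

```python
import random

def p_lost(n_particles, n_steps, p_interpretation=0.5, trials=2000):
    """Monte Carlo estimate of how often one of two equally likely interpretations
    is lost after `n_steps` of resampling with uninformative observations."""
    lost = 0
    for _ in range(trials):
        k = sum(random.random() < p_interpretation for _ in range(n_particles))
        for _ in range(n_steps):
            if k in (0, n_particles):
                break
            # Resampling with equal likelihoods: binomial drift on the particle counts.
            k = sum(random.random() < k / n_particles for _ in range(n_particles))
        lost += k in (0, n_particles)
    return lost / trials

print(p_lost(20, 3), p_lost(20, 10))  # longer ambiguous region -> more parse failures
```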

412 Model results


More information

Sharpening the empirical claims of generative syntax through formalization

Sharpening the empirical claims of generative syntax through formalization Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities NASSLLI, June 2014 Part 1: Grammars and cognitive hypotheses What is a grammar?

More information

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing

Unit 2: Tree Models. CS 562: Empirical Methods in Natural Language Processing. Lectures 19-23: Context-Free Grammars and Parsing CS 562: Empirical Methods in Natural Language Processing Unit 2: Tree Models Lectures 19-23: Context-Free Grammars and Parsing Oct-Nov 2009 Liang Huang (lhuang@isi.edu) Big Picture we have already covered...

More information

Processing/Speech, NLP and the Web

Processing/Speech, NLP and the Web CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[

More information

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2017 Northeastern University David Smith with some slides from Jason Eisner & Andrew

More information

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016 CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:

More information

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012 CS626: NLP, Speech and the Web Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012 Parsing Problem Semantics Part of Speech Tagging NLP Trinity Morph Analysis

More information

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):

More information

Natural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation

Natural Language Processing 1. lecture 7: constituent parsing. Ivan Titov. Institute for Logic, Language and Computation atural Language Processing 1 lecture 7: constituent parsing Ivan Titov Institute for Logic, Language and Computation Outline Syntax: intro, CFGs, PCFGs PCFGs: Estimation CFGs: Parsing PCFGs: Parsing Parsing

More information

Marrying Dynamic Programming with Recurrent Neural Networks

Marrying Dynamic Programming with Recurrent Neural Networks Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying

More information

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09

Natural Language Processing : Probabilistic Context Free Grammars. Updated 5/09 Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require

More information

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing! CS 6120 Spring 2014! Northeastern University!! David Smith! with some slides from Jason Eisner & Andrew

More information

10/17/04. Today s Main Points

10/17/04. Today s Main Points Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points

More information

Sharpening the empirical claims of generative syntax through formalization

Sharpening the empirical claims of generative syntax through formalization Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities ESSLLI, August 2015 Part 1: Grammars and cognitive hypotheses What is a grammar?

More information

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

CS 6120/CS4120: Natural Language Processing

CS 6120/CS4120: Natural Language Processing CS 6120/CS4120: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Assignment/report submission

More information

Lecture 5: UDOP, Dependency Grammars

Lecture 5: UDOP, Dependency Grammars Lecture 5: UDOP, Dependency Grammars Jelle Zuidema ILLC, Universiteit van Amsterdam Unsupervised Language Learning, 2014 Generative Model objective PCFG PTSG CCM DMV heuristic Wolff (1984) UDOP ML IO K&M

More information

Propositional Logic: Methods of Proof (Part II)

Propositional Logic: Methods of Proof (Part II) Propositional Logic: Methods of Proof (Part II) You will be expected to know Basic definitions Inference, derive, sound, complete Conjunctive Normal Form (CNF) Convert a Boolean formula to CNF Do a short

More information

(7) a. [ PP to John], Mary gave the book t [PP]. b. [ VP fix the car], I wonder whether she will t [VP].

(7) a. [ PP to John], Mary gave the book t [PP]. b. [ VP fix the car], I wonder whether she will t [VP]. CAS LX 522 Syntax I Fall 2000 September 18, 2000 Paul Hagstrom Week 2: Movement Movement Last time, we talked about subcategorization. (1) a. I can solve this problem. b. This problem, I can solve. (2)

More information

Constituency Parsing

Constituency Parsing CS5740: Natural Language Processing Spring 2017 Constituency Parsing Instructor: Yoav Artzi Slides adapted from Dan Klein, Dan Jurafsky, Chris Manning, Michael Collins, Luke Zettlemoyer, Yejin Choi, and

More information

CS 188 Introduction to AI Fall 2005 Stuart Russell Final

CS 188 Introduction to AI Fall 2005 Stuart Russell Final NAME: SID#: Section: 1 CS 188 Introduction to AI all 2005 Stuart Russell inal You have 2 hours and 50 minutes. he exam is open-book, open-notes. 100 points total. Panic not. Mark your answers ON HE EXAM

More information

Probabilistic Context Free Grammars

Probabilistic Context Free Grammars 1 Defining PCFGs A PCFG G consists of Probabilistic Context Free Grammars 1. A set of terminals: {w k }, k = 1..., V 2. A set of non terminals: { i }, i = 1..., n 3. A designated Start symbol: 1 4. A set

More information

Alessandro Mazzei MASTER DI SCIENZE COGNITIVE GENOVA 2005

Alessandro Mazzei MASTER DI SCIENZE COGNITIVE GENOVA 2005 Alessandro Mazzei Dipartimento di Informatica Università di Torino MATER DI CIENZE COGNITIVE GENOVA 2005 04-11-05 Natural Language Grammars and Parsing Natural Language yntax Paolo ama Francesca yntactic

More information

c(a) = X c(a! Ø) (13.1) c(a! Ø) ˆP(A! Ø A) = c(a)

c(a) = X c(a! Ø) (13.1) c(a! Ø) ˆP(A! Ø A) = c(a) Chapter 13 Statistical Parsg Given a corpus of trees, it is easy to extract a CFG and estimate its parameters. Every tree can be thought of as a CFG derivation, and we just perform relative frequency estimation

More information

X-bar theory. X-bar :

X-bar theory. X-bar : is one of the greatest contributions of generative school in the filed of knowledge system. Besides linguistics, computer science is greatly indebted to Chomsky to have propounded the theory of x-bar.

More information

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences P3-26 The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences Chie Nakamura, Manabu Arai Keio University, University of Tokyo, JSPS

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

Ling 240 Lecture #15. Syntax 4

Ling 240 Lecture #15. Syntax 4 Ling 240 Lecture #15 Syntax 4 agenda for today Give me homework 3! Language presentation! return Quiz 2 brief review of friday More on transformation Homework 4 A set of Phrase Structure Rules S -> (aux)

More information

Models of Adjunction in Minimalist Grammars

Models of Adjunction in Minimalist Grammars Models of Adjunction in Minimalist Grammars Thomas Graf mail@thomasgraf.net http://thomasgraf.net Stony Brook University FG 2014 August 17, 2014 The Theory-Neutral CliffsNotes Insights Several properties

More information

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Hidden Markov Models Murhaf Fares & Stephan Oepen Language Technology Group (LTG) October 27, 2016 Recap: Probabilistic Language

More information

Lecture 12: Algorithms for HMMs

Lecture 12: Algorithms for HMMs Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 26 February 2018 Recap: tagging POS tagging is a sequence labelling task.

More information

Chapter 14 (Partially) Unsupervised Parsing

Chapter 14 (Partially) Unsupervised Parsing Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding

More information

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing

Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2016 Northeastern University David Smith with some slides from Jason Eisner & Andrew

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Probabilistic Context-Free Grammar

Probabilistic Context-Free Grammar Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612

More information

Parsing. Unger s Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf 1 / 21

Parsing. Unger s Parser. Laura Kallmeyer. Winter 2016/17. Heinrich-Heine-Universität Düsseldorf 1 / 21 Parsing Unger s Parser Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2016/17 1 / 21 Table of contents 1 Introduction 2 The Parser 3 An Example 4 Optimizations 5 Conclusion 2 / 21 Introduction

More information

Unification. Two Routes to Deep Structure. Unification. Unification Grammar. Martin Kay. Stanford University University of the Saarland

Unification. Two Routes to Deep Structure. Unification. Unification Grammar. Martin Kay. Stanford University University of the Saarland Two Routes to Deep Structure Derivational! Transformational! Procedural Deep Unidirectional Transformations Surface Stanford University University of the Saarland Constraint-based! Declarative Deep Surface

More information

Multilevel Coarse-to-Fine PCFG Parsing

Multilevel Coarse-to-Fine PCFG Parsing Multilevel Coarse-to-Fine PCFG Parsing Eugene Charniak, Mark Johnson, Micha Elsner, Joseph Austerweil, David Ellis, Isaac Haxton, Catherine Hill, Shrivaths Iyengar, Jeremy Moore, Michael Pozar, and Theresa

More information

Ch. 2: Phrase Structure Syntactic Structure (basic concepts) A tree diagram marks constituents hierarchically

Ch. 2: Phrase Structure Syntactic Structure (basic concepts) A tree diagram marks constituents hierarchically Ch. 2: Phrase Structure Syntactic Structure (basic concepts) A tree diagram marks constituents hierarchically NP S AUX VP Ali will V NP help D N the man A node is any point in the tree diagram and it can

More information

Extensions to the Logic of All x are y: Verbs, Relative Clauses, and Only

Extensions to the Logic of All x are y: Verbs, Relative Clauses, and Only 1/53 Extensions to the Logic of All x are y: Verbs, Relative Clauses, and Only Larry Moss Indiana University Nordic Logic School August 7-11, 2017 2/53 An example that we ll see a few times Consider the

More information

Introduction to Probablistic Natural Language Processing

Introduction to Probablistic Natural Language Processing Introduction to Probablistic Natural Language Processing Alexis Nasr Laboratoire d Informatique Fondamentale de Marseille Natural Language Processing Use computers to process human languages Machine Translation

More information

The SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania

The SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania The SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania 1 PICTURE OF ANALYSIS PIPELINE Tokenize Maximum Entropy POS tagger MXPOST Ratnaparkhi Core Parser Collins

More information

Probabilistic Context-Free Grammars and beyond

Probabilistic Context-Free Grammars and beyond Probabilistic Context-Free Grammars and beyond Mark Johnson Microsoft Research / Brown University July 2007 1 / 87 Outline Introduction Formal languages and Grammars Probabilistic context-free grammars

More information

Lecture 12: Algorithms for HMMs

Lecture 12: Algorithms for HMMs Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a

More information

Artificial Intelligence

Artificial Intelligence CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 20-21 Natural Language Parsing Parsing of Sentences Are sentences flat linear structures? Why tree? Is

More information

CS 545 Lecture XVI: Parsing

CS 545 Lecture XVI: Parsing CS 545 Lecture XVI: Parsing brownies_choco81@yahoo.com brownies_choco81@yahoo.com Benjamin Snyder Parsing Given a grammar G and a sentence x = (x1, x2,..., xn), find the best parse tree. We re not going

More information

Semantics and Generative Grammar. The Semantics of Adjectival Modification 1. (1) Our Current Assumptions Regarding Adjectives and Common Ns

Semantics and Generative Grammar. The Semantics of Adjectival Modification 1. (1) Our Current Assumptions Regarding Adjectives and Common Ns The Semantics of Adjectival Modification 1 (1) Our Current Assumptions Regarding Adjectives and Common Ns a. Both adjectives and common nouns denote functions of type (i) [[ male ]] = [ λx : x D

More information

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia An Efficient Context-Free Parsing Algorithm Speakers: Morad Ankri Yaniv Elia Yaniv: Introduction Terminology Informal Explanation The Recognizer Morad: Example Time and Space Bounds Empirical results Practical

More information

Syntax-Based Decoding

Syntax-Based Decoding Syntax-Based Decoding Philipp Koehn 9 November 2017 1 syntax-based models Synchronous Context Free Grammar Rules 2 Nonterminal rules NP DET 1 2 JJ 3 DET 1 JJ 3 2 Terminal rules N maison house NP la maison

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 3/8, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

Transformational Priors Over Grammars

Transformational Priors Over Grammars Transformational Priors Over Grammars Jason Eisner Johns Hopkins University July 6, 2002 EMNLP This talk is called Transformational Priors Over Grammars. It should become clear what I mean by a prior over

More information

Chiastic Lambda-Calculi

Chiastic Lambda-Calculi Chiastic Lambda-Calculi wren ng thornton Cognitive Science & Computational Linguistics Indiana University, Bloomington NLCS, 28 June 2013 wren ng thornton (Indiana University) Chiastic Lambda-Calculi NLCS,

More information

A Support Vector Method for Multivariate Performance Measures

A Support Vector Method for Multivariate Performance Measures A Support Vector Method for Multivariate Performance Measures Thorsten Joachims Cornell University Department of Computer Science Thanks to Rich Caruana, Alexandru Niculescu-Mizil, Pierre Dupont, Jérôme

More information

Continuations in Type Logical Grammar. Grafting Trees: (with Chris Barker, UCSD) NJPLS, 27 February Harvard University

Continuations in Type Logical Grammar. Grafting Trees: (with Chris Barker, UCSD) NJPLS, 27 February Harvard University Grafting Trees: Continuations in Type Logical Grammar Chung-chieh Shan Harvard University ccshan@post.harvard.edu (with Chris Barker, UCSD) NJPLS, 27 February 2004 Computational Linguistics 1 What a linguist

More information

Semantics 2 Part 1: Relative Clauses and Variables

Semantics 2 Part 1: Relative Clauses and Variables Semantics 2 Part 1: Relative Clauses and Variables Sam Alxatib EVELIN 2012 January 17, 2012 Reviewing Adjectives Adjectives are treated as predicates of individuals, i.e. as functions from individuals

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

Natural Language Processing. Lecture 13: More on CFG Parsing

Natural Language Processing. Lecture 13: More on CFG Parsing Natural Language Processing Lecture 13: More on CFG Parsing Probabilistc/Weighted Parsing Example: ambiguous parse Probabilistc CFG Ambiguous parse w/probabilites 0.05 0.05 0.20 0.10 0.30 0.20 0.60 0.75

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

Semantics and Generative Grammar. A Little Bit on Adverbs and Events

Semantics and Generative Grammar. A Little Bit on Adverbs and Events A Little Bit on Adverbs and Events 1. From Adjectives to Adverbs to Events We ve just developed a theory of the semantics of adjectives, under which they denote either functions of type (intersective

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

Dependency grammar. Recurrent neural networks. Transition-based neural parsing. Word representations. Informs Models

Dependency grammar. Recurrent neural networks. Transition-based neural parsing. Word representations. Informs Models Dependency grammar Morphology Word order Transition-based neural parsing Word representations Recurrent neural networks Informs Models Dependency grammar Morphology Word order Transition-based neural parsing

More information

Recurrent neural network grammars

Recurrent neural network grammars Widespread phenomenon: Polarity items can only appear in certain contexts Recurrent neural network grammars lide credits: Chris Dyer, Adhiguna Kuncoro Example: anybody is a polarity item that tends to

More information

Introduction to Semantic Parsing with CCG

Introduction to Semantic Parsing with CCG Introduction to Semantic Parsing with CCG Kilian Evang Heinrich-Heine-Universität Düsseldorf 2018-04-24 Table of contents 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG)

More information

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Language Models & Hidden Markov Models

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Language Models & Hidden Markov Models 1 University of Oslo : Department of Informatics INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Language Models & Hidden Markov Models Stephan Oepen & Erik Velldal Language

More information

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Hidden Markov Models Murhaf Fares & Stephan Oepen Language Technology Group (LTG) October 18, 2017 Recap: Probabilistic Language

More information

IBM Model 1 for Machine Translation

IBM Model 1 for Machine Translation IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of

More information