frequencies that matter in sentence processing

1 frequencies that matter in sentence processing Marten van Schijndel 1 September 7, Department of Linguistics, The Ohio State University van Schijndel Frequencies that matter September 7, / 69

2 frequencies are important van Schijndel Frequencies that matter September 7, / 69

3 frequencies are important Occurrence frequencies describe languages well Zipf's law Statistical NLP (esp. vector spaces) van Schijndel Frequencies that matter September 7, / 69

4 frequencies are important Occurrence frequencies describe languages well Zipf's law Statistical NLP (esp. vector spaces) Occurrence frequencies have major influence on sentence processing Behavioral measures (e.g., reading times) Processing measures (e.g., ERPs) Uniform Information Density Saarland SFB-1102 van Schijndel Frequencies that matter September 7, / 69

5 null hypothesis demands control van Schijndel Frequencies that matter September 7, / 69

6 null hypothesis demands control Linguists must control for frequencies to do research van Schijndel Frequencies that matter September 7, / 69

7 null hypothesis demands control Linguists must control for frequencies to do research How do people try to account for frequencies? van Schijndel Frequencies that matter September 7, / 69

8 Case Study 1: Cloze Probabilities van Schijndel, Culicover, & Schuler (2014) Pertains to: Pickering & Traxler (2003), inter alia van Schijndel Frequencies that matter September 7, / 69

9 generation/cloze probabilities Ask subjects to generate distribution van Schijndel Frequencies that matter September 7, / 69

10 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, van Schijndel Frequencies that matter September 7, / 69

11 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, Cloze norming: Complete this sentence The pilot landed van Schijndel Frequencies that matter September 7, / 69

12 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, Cloze norming: Complete this sentence The pilot landed the plane van Schijndel Frequencies that matter September 7, / 69

13 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, Cloze norming: Complete this sentence The pilot landed the plane The pilot landed in the field van Schijndel Frequencies that matter September 7, / 69

14 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, Cloze norming: Complete this sentence NP: The pilot landed the plane PP: The pilot landed in the field Pickering & Traxler (2003) used 6 cloze tasks to determine frequencies van Schijndel Frequencies that matter September 7, / 69

15 generation/cloze probabilities Ask subjects to generate distribution Sentence generation norming: Write sentences with these words landed, sneezed, laughed, Cloze norming: Complete this sentence NP: The pilot landed the plane (25%) PP: The pilot landed in the field (40%) Pickering & Traxler (2003) used 6 cloze tasks to determine frequencies van Schijndel Frequencies that matter September 7, / 69
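Scoring such a norm amounts to taking the relative frequency of each completion type. A minimal sketch of that computation; the completions and their structural labels below are hypothetical, not Pickering & Traxler's actual norming data:

```python
from collections import Counter

def subcat_bias(completions):
    """Relative frequency of each completion type in a cloze norm.

    `completions` is a list of (continuation, label) pairs, where the label
    marks the structure of the continuation (e.g. "NP" or "PP").
    """
    counts = Counter(label for _, label in completions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical completions of "The pilot landed ..."
completions = [
    ("the plane", "NP"),
    ("in the field", "PP"),
    ("on the runway", "PP"),
    ("safely", "OTHER"),
]
print(subcat_bias(completions))  # {'NP': 0.25, 'PP': 0.5, 'OTHER': 0.25}
```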

16 pickering & traxler (2003) Stimuli (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog Readers slow down at 'landed' in (2) van Schijndel Frequencies that matter September 7, / 69

17 pickering & traxler (2003) Stimuli (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog Readers slow down at 'landed' in (2) Suggests they try to link 'truck' as the object of 'landed' despite 'landed' being biased toward a PP complement: 40% PP complement vs. 25% NP complement van Schijndel Frequencies that matter September 7, / 69

18 pickering & traxler (2003) Readers initially adopt a transitive interpretation despite subcat bias van Schijndel Frequencies that matter September 7, / 69

19 pickering & traxler (2003) Readers initially adopt a transitive interpretation despite subcat bias Early-attachment processing heuristic van Schijndel Frequencies that matter September 7, / 69

20 pickering & traxler (2003) Readers initially adopt a transitive interpretation despite subcat bias Early-attachment processing heuristic But what about syntactic frequencies? van Schijndel Frequencies that matter September 7, / 69

21 generalized categorial grammar (gcg) Nguyen et al. (2012) [GCG derivation tree for "the apple that the girl ate", built from categories D, N, N-aD, A-aN, N-rN, V-gN, V-aN-gN, V-aN-bN] van Schijndel Frequencies that matter September 7, / 69

22 generalized categorial grammar (gcg) Nguyen et al. (2012) [Same GCG derivation, with the gap-passing categories V-gN and V-aN-gN highlighted] van Schijndel Frequencies that matter September 7, / 69

23 what about syntactic frequencies? Pickering & Traxler (2003) (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog [Trees: (a) Transitive: VP-gNP → VP-gNP PP, with TV 'landed' + object trace t_i and P 'behind' + NP; (b) Intransitive: VP-gNP → VP PP-gNP, with IV 'landed' and P 'behind' + trace t_i] van Schijndel Frequencies that matter September 7, / 69

26 what about syntactic frequencies? Pickering & Traxler (2003) (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog van Schijndel et al. (2014) Using syntactic probabilities with cloze data: van Schijndel Frequencies that matter September 7, / 69

27 what about syntactic frequencies? Pickering & Traxler (2003) (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog van Schijndel et al. (2014) Using syntactic probabilities with cloze data: P(Transitive | landed) ≈ 0.016 P(Intransitive | landed) ≈ 0.004 van Schijndel Frequencies that matter September 7, / 69

28 what about syntactic frequencies? Pickering & Traxler (2003) (1) That's the plane that the pilot landed behind in the fog (2) That's the truck that the pilot landed behind in the fog van Schijndel et al. (2014) Using syntactic probabilities with cloze data: P(Transitive | landed) ≈ 0.016 P(Intransitive | landed) ≈ 0.004 Transitive interpretation is 300% more likely! van Schijndel Frequencies that matter September 7, / 69
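Where those two numbers come from is worked through in the appendix slides at the end of this deck: a WSJ rule probability is combined with the verb's cloze subcategorization bias, rescaled by the preterminal prior via Bayes' rule. A reconstruction of that calculation (decimal points restored from the appendix figures):

```latex
P(\mathrm{Trans} \mid \mathit{landed}) \approx
  P(\text{VP-gNP} \rightarrow \text{VP-gNP PP}) \cdot \frac{P(\mathrm{TV} \mid \mathit{landed})}{P(\mathrm{TV})}
  = 0.17 \cdot \frac{0.25}{2.6} \approx 0.016

P(\mathrm{Intrans} \mid \mathit{landed}) \approx
  P(\text{VP-gNP} \rightarrow \text{VP PP-gNP}) \cdot \frac{P(\mathrm{IV} \mid \mathit{landed})}{P(\mathrm{IV})}
  = 0.01 \cdot \frac{0.40}{1.0} = 0.004
```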

29 results: case study 1 Subcat processing accounted for by hierarchic syntactic frequencies Early attachment heuristic unnecessary van Schijndel Frequencies that matter September 7, / 69

30 results: case study 1 Subcat processing accounted for by hierarchic syntactic frequencies Early attachment heuristic unnecessary Also applies to heavy-NP shift heuristics (Staub, 2006), unaccusative processing (Staub et al., 2007), etc. van Schijndel Frequencies that matter September 7, / 69

31 results: case study 1 Subcat processing accounted for by hierarchic syntactic frequencies Early attachment heuristic unnecessary Also applies to heavy-NP shift heuristics (Staub, 2006), unaccusative processing (Staub et al., 2007), etc. Suggests cloze probabilities are insufficient as a frequency control van Schijndel Frequencies that matter September 7, / 69

32 results: case study 1 Subcat processing accounted for by hierarchic syntactic frequencies Early attachment heuristic unnecessary Also applies to heavy-NP shift heuristics (Staub, 2006), unaccusative processing (Staub et al., 2007), etc. Suggests cloze probabilities are insufficient as a frequency control But do people use hierarchic syntactic probabilities? van Schijndel Frequencies that matter September 7, / 69

33 Case Study 2: N-grams and Syntactic Probabilities van Schijndel & Schuler (2015) Pertains to: Frank & Bod (2011), inter alia van Schijndel Frequencies that matter September 7, / 69

34 case study 2: overview Previous studies have debated whether humans use hierarchic syntax van Schijndel Frequencies that matter September 7, / 69

35 case study 2: overview Previous studies have debated whether humans use hierarchic syntax [Phrase-structure tree for "the apple that the girl ate": NP → NP RC; NP → D N; RC → IN VP; VP → NP V] van Schijndel Frequencies that matter September 7, / 69

36 case study 2: overview Previous studies have debated whether humans use hierarchic syntax [Phrase-structure tree for "the apple that the girl ate": NP → NP RC; NP → D N; RC → IN VP; VP → NP V] But how robust were their models? van Schijndel Frequencies that matter September 7, / 69

37 case study 2: overview This work shows that: van Schijndel Frequencies that matter September 7, / 69

38 case study 2: overview This work shows that: N-gram models can be greatly improved (accumulation) van Schijndel Frequencies that matter September 7, / 69

39 case study 2: overview This work shows that: N-gram models can be greatly improved (accumulation) Hierarchic syntax is still predictive over stronger baseline (Long distance dependencies independently improve model) van Schijndel Frequencies that matter September 7, / 69

40 hierarchic syntax in reading? The red apple that the girl ate Frank & Bod (2011) van Schijndel Frequencies that matter September 7, / 69

41 hierarchic syntax in reading? The red apple that the girl ate Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) van Schijndel Frequencies that matter September 7, / 69

42 hierarchic syntax in reading? The(w1) red(w2) apple(w3) that(w4) the(w5) girl(w6) ate Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) van Schijndel Frequencies that matter September 7, / 69

43 hierarchic syntax in reading? The red apple that the girl (4 chars) ate Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) van Schijndel Frequencies that matter September 7, / 69

45 hierarchic syntax in reading? The red apple that the girl ate Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) van Schijndel Frequencies that matter September 7, / 69

46 hierarchic syntax in reading? DT ADJ NN IN DT NN VBD Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) van Schijndel Frequencies that matter September 7, / 69

47 hierarchic syntax in reading? [Phrase-structure tree over the POS sequence DT JJ NN IN DT NN VBD] Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) van Schijndel Frequencies that matter September 7, / 69

48 hierarchic syntax in reading? Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) van Schijndel Frequencies that matter September 7, / 69

49 hierarchic syntax in reading? Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) Outcome: PSG < ESN + PSG ESN = ESN + PSG van Schijndel Frequencies that matter September 7, / 69
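The "<" and "=" notation here can be read as nested model comparisons: a predictor "helps" when adding it to the other model significantly improves the fit to reading times, typically assessed with a likelihood-ratio test. A minimal sketch of such a comparison; the log-likelihood values below are made up for illustration:

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_small, loglik_big, df_diff):
    """Chi-square likelihood-ratio test between nested regression models."""
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, chi2.sf(stat, df_diff)

# Hypothetical fits: PSG-only model vs. model with both ESN and PSG predictors
stat, p = likelihood_ratio_test(loglik_small=-10240.0, loglik_big=-10231.5, df_diff=1)
print(f"chi2(1) = {stat:.1f}, p = {p:.2g}")  # small p => the added predictor helps
```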

50 hierarchic syntax in reading? Frank & Bod (2011) Baseline: Sentence Position Word length Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) N-grams (Unigram, bigram) Outcome: PSG < ESN + PSG ESN = ESN + PSG Sequential helps over hierarchic van Schijndel Frequencies that matter September 7, / 69

51 hierarchic syntax in reading? Frank & Bod (2011) Baseline: Sentence Position Word length N-grams (Unigram, bigram) Test POS Predictors: Echo State Network (ESN) Phrase Structure Grammar (PSG) Outcome: PSG < ESN + PSG ESN = ESN + PSG Hierarchic doesn't help over sequential van Schijndel Frequencies that matter September 7, / 69

52 hierarchic syntax in reading? Fossum & Levy (2012) Replicated Frank & Bod (2011): PSG < ESN + PSG ESN = ESN + PSG van Schijndel Frequencies that matter September 7, / 69

53 hierarchic syntax in reading? Fossum & Levy (2012) Replicated Frank & Bod (2011): PSG < ESN + PSG ESN = ESN + PSG Better n-gram baseline (more data) changes result: PSG = ESN + PSG ESN = ESN + PSG van Schijndel Frequencies that matter September 7, / 69

54 hierarchic syntax in reading? Fossum & Levy (2012) Replicated Frank & Bod (2011): PSG < ESN + PSG ESN = ESN + PSG Better n-gram baseline (more data) changes result: PSG = ESN + PSG Sequential doesn't help over hierarchic ESN = ESN + PSG van Schijndel Frequencies that matter September 7, / 69

55 hierarchic syntax in reading? Fossum & Levy (2012) Replicated Frank & Bod (2011): PSG < ESN + PSG ESN = ESN + PSG Better n-gram baseline (more data) changes result: PSG = ESN + PSG Sequential doesn't help over hierarchic ESN = ESN + PSG Also: lexicalized syntax improves PSG fit van Schijndel Frequencies that matter September 7, / 69

56 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC (or less) van Schijndel Frequencies that matter September 7, / 69

57 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC (or less) This study: N-grams trained on Gigaword 4.0 van Schijndel Frequencies that matter September 7, / 69

58 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC (or less) Unigrams/Bigrams This study: N-grams trained on Gigaword 4.0 van Schijndel Frequencies that matter September 7, / 69

59 improved n-gram baseline [Figure: perplexity of the n-gram model as a function of n] van Schijndel Frequencies that matter September 7, / 69

60 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC Unigrams/Bigrams This study: N-grams trained on Gigaword 4.0 5-grams van Schijndel Frequencies that matter September 7, / 69

61 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC Unigrams/Bigrams Only from region boundaries This study: N-grams trained on Gigaword 4.0 5-grams van Schijndel Frequencies that matter September 7, / 69

62 improved n-gram baseline Bigram Example Reading time of 'girl' after 'red': The red(1) apple that the(2) girl ate (the words between the two fixations form the region); bigram target: 'girl'; bigram condition: 'the' van Schijndel Frequencies that matter September 7, / 69

63 improved n-gram baseline Bigram Example Reading time of 'girl' after 'red': The red(1) apple that the(2) girl ate (the words between the two fixations form the region); bigram target: 'girl'; bigram condition: 'the' Fails to capture entire sequence; van Schijndel Frequencies that matter September 7, / 69

64 improved n-gram baseline Bigram Example Reading time of 'girl' after 'red': The red(1) apple that the(2) girl ate (the words between the two fixations form the region); bigram target: 'girl'; bigram condition: 'the' Fails to capture entire sequence; Conditions never generated; van Schijndel Frequencies that matter September 7, / 69

65 improved n-gram baseline Bigram Example Reading time of 'girl' after 'red': The red(1) apple that the(2) girl ate (the words between the two fixations form the region); bigram target: 'girl'; bigram condition: 'the' Fails to capture entire sequence; Conditions never generated; Probability of sequence is deficient van Schijndel Frequencies that matter September 7, / 69

66 improved n-gram baseline Cumulative Bigram Example Reading time of 'girl' after 'red': The(1) red apple that the(2) girl ate; bigram targets: every word read up through 'girl'; bigram conditions: each target's preceding word van Schijndel Frequencies that matter September 7, / 69

67 improved n-gram baseline Cumulative Bigram Example Reading time of 'girl' after 'red': The(1) red apple that the(2) girl ate; bigram targets: every word read up through 'girl'; bigram conditions: each target's preceding word Captures entire sequence; Well-formed sequence probability; Reflects processing that must be done by humans van Schijndel Frequencies that matter September 7, / 69
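In other words, the cumulative measure sums the n-gram surprisal of every word generated since the previous fixation rather than scoring only the word at the region boundary. A minimal sketch, assuming some `logprob(word, prev)` function returning a log (base 2) bigram probability; the uniform placeholder model below stands in for the Gigaword model used in the study:

```python
import math

def bigram_surprisal(word, prev, logprob):
    """Surprisal (in bits) of `word` given only the immediately preceding word."""
    return -logprob(word, prev)

def cumulative_bigram_surprisal(region_words, prev, logprob):
    """Sum of bigram surprisals over every word read since the last fixation.

    `region_words` runs from just after the previously fixated word up to and
    including the currently fixated word (e.g. "apple that the girl").
    """
    total = 0.0
    for word in region_words:
        total += -logprob(word, prev)
        prev = word
    return total

# Placeholder: a uniform bigram model over a 10,000-word vocabulary
logprob = lambda word, prev: math.log2(1.0 / 10_000)

region = "apple that the girl".split()
print(bigram_surprisal("girl", "the", logprob))              # boundary-only measure
print(cumulative_bigram_surprisal(region, "red", logprob))   # cumulative measure
```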

68 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC Unigrams/Bigrams Only from region boundaries This study: N-grams trained on Gigaword 4.0 5-grams van Schijndel Frequencies that matter September 7, / 69

69 improved n-gram baseline Most previous reading time studies: N-grams trained on WSJ, Dundee, BNC Unigrams/Bigrams Only from region boundaries This study: N-grams trained on Gigaword 4.0 5-grams Cumulative and Non-cumulative van Schijndel Frequencies that matter September 7, / 69

70 evaluation Dundee Corpus (Kennedy et al., 2003) 10 subjects 2,388 sentences First pass durations (≈200,000) Go-past durations (≈200,000) Exclusions: Unknown words (<5 tokens) First and last words of each line Regions larger than 4 words (track loss) van Schijndel Frequencies that matter September 7, / 69

71 cumu-n-grams predict reading times Baseline: Fixed Effects Sentence Position Word length Region Length Preceding word fixated? Random Effects Item/Subject Intercepts By Subject Slopes: All Fixed Effects N-grams (5-grams) N-grams (Cumu-5-grams) van Schijndel Frequencies that matter September 7, / 69
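This kind of baseline-plus-predictor regression can be sketched as follows. The sketch is simplified relative to the analysis on this slide: it uses subject grouping only (no crossed item intercepts), one random slope per predictor of interest, and hypothetical column names and file path:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per first-pass region, with columns
# fp_dur, sentpos, wlen, rlen, prevfix, ngram, cumu_ngram, subject
df = pd.read_csv("dundee_first_pass.csv")

baseline = smf.mixedlm(
    "fp_dur ~ sentpos + wlen + rlen + prevfix + ngram",
    df, groups=df["subject"], re_formula="~ngram",
).fit(reml=False)

with_cumu = smf.mixedlm(
    "fp_dur ~ sentpos + wlen + rlen + prevfix + ngram + cumu_ngram",
    df, groups=df["subject"], re_formula="~ngram + cumu_ngram",
).fit(reml=False)

# Compare log-likelihoods: does the cumulative n-gram predictor improve fit?
print(baseline.llf, with_cumu.llf)
```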

73 cumu-n-grams predict reading times First Pass and Go-Past van Schijndel Frequencies that matter September 7, / 69

74 follow-up questions Is hierarchic surprisal useful over the better baseline? van Schijndel Frequencies that matter September 7, / 69

75 follow-up questions Is hierarchic surprisal useful over the better baseline? If so, can it be similarly improved through accumulation? van Schijndel Frequencies that matter September 7, / 69

76 follow-up questions Is hierarchic surprisal useful over the better baseline? If so, can it be similarly improved through accumulation? van Schijndel & Schuler (2013) found that it was over weaker baselines Grammar: Berkeley parser, WSJ, 5 split-merge cycles (Petrov & Klein 2007) van Schijndel Frequencies that matter September 7, / 69
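For reference, surprisal is the standard negative log conditional probability of a word given the words before it; the hierarchic version obtains that probability from the grammar's prefix probabilities, summing over the partial parses T compatible with each prefix, and the cumulative variant (as with the n-grams above) sums surprisal over all words since the previous fixation:

```latex
\mathrm{surprisal}(w_t)
  = -\log P(w_t \mid w_1 \ldots w_{t-1})
  = -\log \frac{\sum_{T} P(T,\, w_1 \ldots w_t)}{\sum_{T'} P(T',\, w_1 \ldots w_{t-1})}
```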

77 hierarchic surprisal predicts reading times Baseline: Fixed Effects Same as before N-grams (5-grams) N-grams (Cumu-5-grams) van Schijndel Frequencies that matter September 7, / 69

78 hierarchic surprisal predicts reading times Baseline: Fixed Effects Same as before N-grams (5-grams) N-grams (Cumu-5-grams) Random Effects Same as before By Subject Slopes: Hierarchic surprisal Cumu-Hierarchic surprisal van Schijndel Frequencies that matter September 7, / 69

80 hierarchic surprisal predicts reading times First Pass and Go-Past van Schijndel Frequencies that matter September 7, / 69

81 cumulative surprisal doesn't help?! Suggests previous findings were due to weaker n-gram baseline van Schijndel Frequencies that matter September 7, / 69

82 cumulative surprisal doesn't help?! Suggests previous findings were due to weaker n-gram baseline Suggests only local PCFG surprisal affects reading times van Schijndel Frequencies that matter September 7, / 69

83 cumulative surprisal doesn't help?! Suggests previous findings were due to weaker n-gram baseline Suggests only local PCFG surprisal affects reading times Follow-up work shows long distance dependencies independently influence reading times van Schijndel Frequencies that matter September 7, / 69

84 results: case study 2 Hierarchic syntax predicts reading times over strong linear baseline van Schijndel Frequencies that matter September 7, / 69

85 results: case study 2 Hierarchic syntax predicts reading times over strong linear baseline Long-distance dependencies help over hierarchic syntax van Schijndel Frequencies that matter September 7, / 69

86 results: case study 2 Hierarchic syntax predicts reading times over strong linear baseline Long-distance dependencies help over hierarchic syntax Studies should use cumu-n-grams in their baselines van Schijndel Frequencies that matter September 7, / 69

87 what does this mean for our models? We need to carefully control for: Cloze probabilities (Smith 2011) van Schijndel Frequencies that matter September 7, / 69

88 what does this mean for our models? We need to carefully control for: Cloze probabilities (Smith 2011) N-gram frequencies (local and cumulative) van Schijndel Frequencies that matter September 7, / 69

89 what does this mean for our models? We need to carefully control for: Cloze probabilities (Smith 2011) N-gram frequencies (local and cumulative) Hierarchic syntactic frequencies van Schijndel Frequencies that matter September 7, / 69

90 what does this mean for our models? We need to carefully control for: Cloze probabilities (Smith 2011) N-gram frequencies (local and cumulative) Hierarchic syntactic frequencies Long distance dependency frequencies van Schijndel Frequencies that matter September 7, / 69

91 what does this mean for our models? We need to carefully control for: Cloze probabilities (Smith 2011) N-gram frequencies (local and cumulative) Hierarchic syntactic frequencies Long distance dependency frequencies (discourse, etc) Then we can try to interpret experimental results What do we do about convergence? Is there a way to avoid this explosion of control predictors? van Schijndel Frequencies that matter September 7, / 69

92 Case Study 3: Evading Frequency Confounds van Schijndel, Murphy, & Schuler (2015) van Schijndel Frequencies that matter September 7, / 69

93 motivation Can we measure memory load with fewer controls? van Schijndel Frequencies that matter September 7, / 69

94 motivation Can we measure memory load with fewer controls? Why do so many factors influence results? van Schijndel Frequencies that matter September 7, / 69

95 motivation Can we measure memory load with fewer controls? Why do so many factors influence results? Low dimensionality measures van Schijndel Frequencies that matter September 7, / 69

96 motivation Can we measure memory load with fewer controls? Why do so many factors influence results? Low dimensionality measures Do the factors become separable in another space? van Schijndel Frequencies that matter September 7, / 69

97 motivation Can we measure memory load with fewer controls? Why do so many factors influence results? Low dimensionality measures Do the factors become separable in another space? Let's try using MEG van Schijndel Frequencies that matter September 7, / 69

98 what is meg? van Schijndel Frequencies that matter September 7, / 69

99 what is meg? 102 locations van Schijndel Frequencies that matter September 7, / 69

100 how might meg reflect load? Jensen et al. (2012) van Schijndel Frequencies that matter September 7, / 69

102 where to look? Memory is a function of distributed processing van Schijndel Frequencies that matter September 7, / 69

103 where to look? Memory is a function of distributed processing Look for synchronized firing between sensors (brain regions) van Schijndel Frequencies that matter September 7, / 69

104 where to look? van Schijndel Frequencies that matter September 7, / 69

105 where to look? Memory is a function of distributed processing Look for synchronized firing between sensors (brain regions) This study uses spectral coherence measurements van Schijndel Frequencies that matter September 7, / 69

106 spectral coherence coherence(x, y) = |E[S_xy]| / √(E[S_xx] E[S_yy]) van Schijndel Frequencies that matter September 7, / 69

107 spectral coherence coherence(x, y) = |E[S_xy]| / √(E[S_xx] E[S_yy]) numerator: cross-correlation (cross-spectral density S_xy); denominator: autocorrelations (auto-spectral densities S_xx, S_yy) van Schijndel Frequencies that matter September 7, / 69

108 spectral coherence coherence(x, y) = |E[S_xy]| / √(E[S_xx] E[S_yy]) cross-correlation over autocorrelations Amount of connectivity (synchronization) not caused by chance van Schijndel Frequencies that matter September 7, / 69
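This quantity can be estimated from two sensor time series with Welch cross- and auto-spectral density estimates; scipy's helper returns the magnitude-squared version directly. A minimal sketch on simulated data (not the MEG recordings), using the corpus's 125 Hz sampling rate:

```python
import numpy as np
from scipy.signal import coherence

fs = 125.0                       # Hz, matching the downsampled MEG data
rng = np.random.default_rng(0)
t = np.arange(0, 4.0, 1.0 / fs)

# Two simulated "sensors" sharing a 10 Hz component plus independent noise
shared = np.sin(2 * np.pi * 10 * t)
x = shared + rng.normal(scale=0.5, size=t.size)
y = shared + rng.normal(scale=0.5, size=t.size)

# Magnitude-squared coherence |E[S_xy]|^2 / (E[S_xx] E[S_yy]) per frequency bin
freqs, cxy = coherence(x, y, fs=fs, nperseg=128)
print(freqs[np.argmax(cxy)])     # should land near 10 Hz
```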

109 spectral coherence: phase synchrony Fell & Axmacher (2011) van Schijndel Frequencies that matter September 7, / 69

110 audiobook meg corpus Collected 2 years ago at CMU van Schijndel Frequencies that matter September 7, / 69

111 audiobook meg corpus Collected 2 years ago at CMU 3 subjects van Schijndel Frequencies that matter September 7, / 69

112 audiobook meg corpus Collected 2 years ago at CMU 3 subjects Heart of Darkness, ch 2 12,342 words 80 (8 x 10) minutes Synched with parallel audio recording and forced alignment van Schijndel Frequencies that matter September 7, / 69

113 audiobook meg corpus Collected 2 years ago at CMU 3 subjects Heart of Darkness, ch 2 12,342 words 80 (8 x 10) minutes Synched with parallel audio recording and forced alignment 306-channel Elekta Neuromag, CMU Movement/noise correction: SSP, SSS, tSSS Band-pass filtered Hz Downsampled to 125 Hz Visually scanned for muscle artifacts; none found van Schijndel Frequencies that matter September 7, / 69
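These preprocessing steps (noise correction, band-pass filtering, downsampling to 125 Hz) can be approximated in MNE-Python. The file path and the filter edges below are placeholders, since the original band limits are not given on the slide, and the original SSS/tSSS step would have been run with Elekta's MaxFilter rather than the roughly equivalent call shown here:

```python
import mne

raw = mne.io.read_raw_fif("subject1_audiobook_raw.fif", preload=True)  # hypothetical path

# Noise/movement correction roughly analogous to SSS/tSSS (Elekta MaxFilter)
raw = mne.preprocessing.maxwell_filter(raw, st_duration=10.0)

# Band-pass filter; 0.1-40 Hz are placeholder cutoffs, not the original settings
raw.filter(l_freq=0.1, h_freq=40.0)

# Downsample to 125 Hz, matching the corpus description
raw.resample(125.0)
```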

114 memory load via center embedding d1: The cart … broke d2: that the man bought van Schijndel Frequencies that matter September 7, / 69

115 memory load via center embedding d1: The cart … broke d2: that the man bought Depth annotations: van Schijndel et al. (2013) parser Nguyen et al. (2012) Generalized Categorial Grammar (GCG) van Schijndel Frequencies that matter September 7, / 69

116 data filtering Remove words: in short or long sentences (<4 or >50 words) that follow a word at another depth that fail to parse Partition data: Dev set: One third of corpus Test set: Two thirds of corpus van Schijndel Frequencies that matter September 7, / 69
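Read concretely, the filtering and partitioning might look like the following sketch over a word-level table; the column names and file are hypothetical, and the dev set is taken here simply as the first third of the filtered corpus:

```python
import pandas as pd

words = pd.read_csv("audiobook_words.csv")  # hypothetical: one row per word token

# Keep words in 4-50 word sentences that parsed successfully
ok_len = words["sent_len"].between(4, 50)
parsed = words["parse_ok"]
# Drop words whose preceding word sits at a different embedding depth
same_depth = words["depth"] == words["depth"].shift(1)

filtered = words[ok_len & parsed & same_depth].reset_index(drop=True)

# Dev set: one third of the corpus; test set: the remaining two thirds
cut = len(filtered) // 3
dev, test = filtered.iloc[:cut], filtered.iloc[cut:]
```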

117 compute coherence Group by factor Compute coherence over subsets of 4 epochs van Schijndel Frequencies that matter September 7, / 69

118 dev coherence [Figure: coherence (depth 2 vs. depth 1) by frequency (Hz) and time (s)] van Schijndel Frequencies that matter September 7, / 69

119 dev coherence +variance [Figure: coherence (depth 2 vs. depth 1) by frequency (Hz) and time (s), with variance] van Schijndel Frequencies that matter September 7, / 69

120 possible confounds? Sentence position Unigram, Bigram, Trigram: COCA logprobs PCFG surprisal: parser output van Schijndel Frequencies that matter September 7, / 69

121 dev results Factor p-values: Unigram 0.941, Bigram 0.257, Trigram 0.073, PCFG Surprisal 0.482, Sentence Position 0.031, Depth 0.005 Depth 1 (40 items) Depth 2 (1118 items) van Schijndel Frequencies that matter September 7, / 69

122 test results Factor p-value Unigram Bigram Trigram PCFG Surprisal Sentence Position Depth Depth 1 (86 items) Depth 2 (2142 items) van Schijndel Frequencies that matter September 7, / 69

123 test results Factor p-value Unigram Bigram Trigram PCFG Surprisal Sentence Position Depth Bonferroni correction removes trigrams, but van Schijndel Frequencies that matter September 7, / 69

124 compute coherence: increased resolution Group by factor Compute coherence over subsets of 6 epochs van Schijndel Frequencies that matter September 7, / 69

125 test results: increased resolution Factor p-value Trigram Depth Depth 1 (57 items) Depth 2 (1428 items) van Schijndel Frequencies that matter September 7, / 69

126 results: case study 3 Memory load is reflected in MEG connectivity Common confounds do not pose problems for oscillatory measures van Schijndel Frequencies that matter September 7, / 69

127 conclusions Cloze probabilities are insufficient as frequency control Hierarchic syntactic frequencies strongly influence processing Reading time studies need to use local and cumulative n-grams Oscillatory analyses could avoid control predictor explosion van Schijndel Frequencies that matter September 7, / 69

128 acknowledgements Stefan Frank, Matthew Traxler, Shari Speer, Roberto Zamparelli Attendees of CogSci 2014, CUNY 2015, NAACL 2015, CMCL 2015 OSU Linguistics Targeted Investment for Excellence ( ) National Science Foundation (DGE ) University of Pittsburgh Medical Center MEG Seed Fund National Institutes of Health CRCNS (5R01HD ) van Schijndel Frequencies that matter September 7, / 69

129 thanks! questions? Cloze probabilities are insufficient as frequency control Hierarchic syntactic frequencies strongly influence processing Reading time studies need to use local and cumulative n-grams Oscillatory analyses could avoid control predictor explosion van Schijndel Frequencies that matter September 7, / 69

130 improved n-gram baseline [Figure: perplexity of the n-gram model as a function of n] van Schijndel Frequencies that matter September 7, / 69

131 further improved n-gram baseline [Figure: perplexity of the n-gram model as a function of n] van Schijndel Frequencies that matter September 7, / 69

132 further improved n-gram baseline [Figure: perplexity of pruned vs. unpruned n-gram models as a function of n] van Schijndel Frequencies that matter September 7, / 69

133 model How probable is each subtree? Wall Street Journal (WSJ) section of the Penn Treebank: [Trees: (a) Transitive: VP-gNP → VP-gNP PP, with TV 'landed' + object trace t_i, Adv 'carefully', and P 'behind' + NP; (b) Intransitive: VP-gNP → VP PP-gNP, with IV 'landed', Adv 'carefully', and P 'behind' + trace t_i] van Schijndel Frequencies that matter September 7, / 69

134 model How probable is each subtree? Wall Street Journal (WSJ) section of the Penn Treebank: [Same transitive and intransitive trees as above] van Schijndel Frequencies that matter September 7, / 69

135 model What is the probability of each interpretation? P(syntactic configuration) × P(generating the verb from that tree) P(Transitive) = P(VP-gNP → VP-gNP PP) × P(verb | TV) (1) P(Intransitive) = P(VP-gNP → VP PP-gNP) × P(verb | IV) (2) van Schijndel Frequencies that matter September 7, / 69

136 model What is the probability of each interpretation? P(syntactic configuration) × P(generating the verb from that tree), i.e. P(subcat bias)/P(preterminal prior) P(Transitive) = P(VP-gNP → VP-gNP PP) × P(verb | TV) (1) P(Intransitive) = P(VP-gNP → VP PP-gNP) × P(verb | IV) (2) van Schijndel Frequencies that matter September 7, / 69

137 model What is the probability of each interpretation? P(syntactic configuration) × P(subcat bias)/P(preterminal prior) P(Transitive) = P(VP-gNP → VP-gNP PP) × P(verb | TV) ≈ P(VP-gNP → VP-gNP PP) × P(TV | verb)/P(TV) (1) P(Intransitive) = P(VP-gNP → VP PP-gNP) × P(verb | IV) ≈ P(VP-gNP → VP PP-gNP) × P(IV | verb)/P(IV) (2) van Schijndel Frequencies that matter September 7, / 69

138 model What are the preterminal priors? Relative prior probability from the WSJ: P(TV) : P(IV) = 2.6 : 1 van Schijndel Frequencies that matter September 7, / 69

139 model What is the probability of each interpretation? P(syntactic configuration) × P(subcat bias)/P(preterminal prior) P(Transitive) ≈ P(VP-gNP → VP-gNP PP) × P(TV | verb)/P(TV) = 0.17 × P(TV | verb)/2.6 (1) P(Intransitive) ≈ P(VP-gNP → VP PP-gNP) × P(IV | verb)/P(IV) = 0.01 × P(IV | verb)/1.0 (2) van Schijndel Frequencies that matter September 7, / 69

140 model What is the probability of each interpretation? P(syntactic configuration) × P(subcat bias)/P(preterminal prior) P(Transitive) ≈ 0.17 × P(TV | verb)/2.6 = 0.065 × P(TV | verb) (1) P(Intransitive) ≈ 0.01 × P(IV | verb)/1.0 = 0.01 × P(IV | verb) (2) van Schijndel Frequencies that matter September 7, / 69

141 model What is the probability of each interpretation? P(syntactic configuration) × P(subcat bias)/P(preterminal prior) P(Transitive) ≈ 0.17 × P(TV | verb)/2.6 = 0.065 × P(TV | verb) (1) P(Intransitive) ≈ 0.01 × P(IV | verb)/1.0 = 0.01 × P(IV | verb) (2) Pickering & Traxler (2003) experimentally determined subcat biases for a wide variety of verbs van Schijndel Frequencies that matter September 7, / 69

142 evaluation Pickering & Traxler (2003) (1) That's the plane that the pilot landed carefully behind in the fog at the airport (2) That's the truck that the pilot landed carefully behind in the fog at the airport Using Pickering & Traxler's (2003) subcat bias data: van Schijndel Frequencies that matter September 7, / 69

143 evaluation Pickering & Traxler (2003) (1) That's the plane that the pilot landed carefully behind in the fog at the airport (2) That's the truck that the pilot landed carefully behind in the fog at the airport Using Pickering & Traxler's (2003) subcat bias data: P(Transitive | landed) = 0.016 P(Intransitive | landed) = 0.004 van Schijndel Frequencies that matter September 7, / 69
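Putting the appendix figures together, a small sketch that reproduces the interpretation probabilities quoted above (rule probabilities and the TV:IV prior from the WSJ numbers on these slides, subcat biases from the cloze norms):

```python
def interpretation_prob(rule_prob, subcat_given_verb, preterminal_prior):
    """Rule probability times the verb's subcat bias rescaled by the
    preterminal prior (Bayes' rule), as on the preceding slides."""
    return rule_prob * subcat_given_verb / preterminal_prior

p_trans = interpretation_prob(rule_prob=0.17, subcat_given_verb=0.25, preterminal_prior=2.6)
p_intrans = interpretation_prob(rule_prob=0.01, subcat_given_verb=0.40, preterminal_prior=1.0)

print(round(p_trans, 3), round(p_intrans, 3))  # ~0.016 vs 0.004
print(p_trans / p_intrans)                     # transitive about 4x (300% more) likely
```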


More information

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Hidden Markov Models INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Hidden Markov Models Murhaf Fares & Stephan Oepen Language Technology Group (LTG) October 27, 2016 Recap: Probabilistic Language

More information

N-gram Language Modeling

N-gram Language Modeling N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical

More information

Better! Faster! Stronger*!

Better! Faster! Stronger*! Jason Eisner Jiarong Jiang He He Better! Faster! Stronger*! Learning to balance accuracy and efficiency when predicting linguistic structures (*theorems) Hal Daumé III UMD CS, UMIACS, Linguistics me@hal3.name

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Info 159/259 Lecture 12: Features and hypothesis tests (Oct 3, 2017) David Bamman, UC Berkeley Announcements No office hours for DB this Friday (email if you d like to chat)

More information

CS 662 Sample Midterm

CS 662 Sample Midterm Name: 1 True/False, plus corrections CS 662 Sample Midterm 35 pts, 5 pts each Each of the following statements is either true or false. If it is true, mark it true. If it is false, correct the statement

More information

The Language Modeling Problem (Fall 2007) Smoothed Estimation, and Language Modeling. The Language Modeling Problem (Continued) Overview

The Language Modeling Problem (Fall 2007) Smoothed Estimation, and Language Modeling. The Language Modeling Problem (Continued) Overview The Language Modeling Problem We have some (finite) vocabulary, say V = {the, a, man, telescope, Beckham, two, } 6.864 (Fall 2007) Smoothed Estimation, and Language Modeling We have an (infinite) set of

More information

Prenominal Modifier Ordering via MSA. Alignment

Prenominal Modifier Ordering via MSA. Alignment Introduction Prenominal Modifier Ordering via Multiple Sequence Alignment Aaron Dunlop Margaret Mitchell 2 Brian Roark Oregon Health & Science University Portland, OR 2 University of Aberdeen Aberdeen,

More information

10/17/04. Today s Main Points

10/17/04. Today s Main Points Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points

More information

A* Search. 1 Dijkstra Shortest Path

A* Search. 1 Dijkstra Shortest Path A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the

More information

Dependency Parsing. Statistical NLP Fall (Non-)Projectivity. CoNLL Format. Lecture 9: Dependency Parsing

Dependency Parsing. Statistical NLP Fall (Non-)Projectivity. CoNLL Format. Lecture 9: Dependency Parsing Dependency Parsing Statistical NLP Fall 2016 Lecture 9: Dependency Parsing Slav Petrov Google prep dobj ROOT nsubj pobj det PRON VERB DET NOUN ADP NOUN They solved the problem with statistics CoNLL Format

More information

National Centre for Language Technology School of Computing Dublin City University

National Centre for Language Technology School of Computing Dublin City University with with National Centre for Language Technology School of Computing Dublin City University Parallel treebanks A parallel treebank comprises: sentence pairs parsed word-aligned tree-aligned (Volk & Samuelsson,

More information

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21

More information

Transformational Priors Over Grammars

Transformational Priors Over Grammars Transformational Priors Over Grammars Jason Eisner Johns Hopkins University July 6, 2002 EMNLP This talk is called Transformational Priors Over Grammars. It should become clear what I mean by a prior over

More information

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology

More information

Bringing machine learning & compositional semantics together: central concepts

Bringing machine learning & compositional semantics together: central concepts Bringing machine learning & compositional semantics together: central concepts https://githubcom/cgpotts/annualreview-complearning Chris Potts Stanford Linguistics CS 244U: Natural language understanding

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Info 159/259 Lecture 15: Review (Oct 11, 2018) David Bamman, UC Berkeley Big ideas Classification Naive Bayes, Logistic regression, feedforward neural networks, CNN. Where does

More information

Semantics 2 Part 1: Relative Clauses and Variables

Semantics 2 Part 1: Relative Clauses and Variables Semantics 2 Part 1: Relative Clauses and Variables Sam Alxatib EVELIN 2012 January 17, 2012 Reviewing Adjectives Adjectives are treated as predicates of individuals, i.e. as functions from individuals

More information

SYNTHER A NEW M-GRAM POS TAGGER

SYNTHER A NEW M-GRAM POS TAGGER SYNTHER A NEW M-GRAM POS TAGGER David Sündermann and Hermann Ney RWTH Aachen University of Technology, Computer Science Department Ahornstr. 55, 52056 Aachen, Germany {suendermann,ney}@cs.rwth-aachen.de

More information

Lecture 9: Hidden Markov Model

Lecture 9: Hidden Markov Model Lecture 9: Hidden Markov Model Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501 Natural Language Processing 1 This lecture v Hidden Markov

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 28, 2003 These supplementary notes review the notion of an inductive definition and give

More information

Recap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP

Recap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP Recap: Language models Foundations of atural Language Processing Lecture 4 Language Models: Evaluation and Smoothing Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipp

More information

Maximum Entropy Models for Natural Language Processing

Maximum Entropy Models for Natural Language Processing Maximum Entropy Models for Natural Language Processing James Curran The University of Sydney james@it.usyd.edu.au 6th December, 2004 Overview a brief probability and statistics refresher statistical modelling

More information

NLP Homework: Dependency Parsing with Feed-Forward Neural Network

NLP Homework: Dependency Parsing with Feed-Forward Neural Network NLP Homework: Dependency Parsing with Feed-Forward Neural Network Submission Deadline: Monday Dec. 11th, 5 pm 1 Background on Dependency Parsing Dependency trees are one of the main representations used

More information