Sharpening the empirical claims of generative syntax through formalization


1 Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities NASSLLI, June 2014

2 Part 1: Grammars and cognitive hypotheses What is a grammar? What can grammars do? Concrete illustration of a target: Surprisal Parts 2–4: Assembling the pieces Minimalist Grammars (MGs) MGs and MCFGs Probabilities on MGs Part 5: Learning and wrap-up Something slightly different: Learning model Recap and open questions

3 Sharpening the empirical claims of generative syntax through formalization Tim Hunter NASSLLI, June 2014 Part 4 Probabilities on MG Derivations

4 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 129 / 196

5 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 130 / 196

6 Probabilistic CFGs What are the probabilities of the derivations? = What are the values of λ 1, λ 2, etc.? λ 1 S → NP VP, λ 2 NP → John, λ 3 NP → Mary, λ 4 VP → ran, λ 5 VP → V NP, λ 6 VP → V S, λ 7 V → believed, λ 8 V → knew 131 / 196

7 Probabilistic CFGs What are the probabilities of the derivations? = What are the values of λ 1, λ 2, etc.? λ 1 S → NP VP, λ 2 NP → John, λ 3 NP → Mary, λ 4 VP → ran, λ 5 VP → V NP, λ 6 VP → V S, λ 7 V → believed, λ 8 V → knew Training corpus + training algorithm: 1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew λ 5 = count(VP → V NP) / count(VP) 131 / 196
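As a small aside (not on the slides), the training step used here is just relative-frequency estimation; the toy treebank below and its encoding are invented for illustration.

```python
from collections import Counter

# Hypothetical treebank: (category, [children]) for nonterminal nodes,
# (category, word) for preterminal nodes. The trees themselves are made up.
trees = [
    ("S", [("NP", "Mary"),
           ("VP", [("V", "knew"),
                   ("S", [("NP", "John"), ("VP", "ran")])])]),
    ("S", [("NP", "John"), ("VP", "ran")]),
]

rule_counts = Counter()   # counts of (lhs, rhs) pairs
lhs_counts = Counter()    # counts of lhs occurrences

def count_rules(node):
    cat, children = node
    if isinstance(children, str):      # preterminal rule such as V -> knew
        rule_counts[(cat, (children,))] += 1
        lhs_counts[cat] += 1
        return
    rule_counts[(cat, tuple(child[0] for child in children))] += 1
    lhs_counts[cat] += 1
    for child in children:
        count_rules(child)

for t in trees:
    count_rules(t)

# lambda for A -> alpha is count(A -> alpha) / count(A), as in the slide's formula
for (lhs, rhs), c in sorted(rule_counts.items()):
    print(f"{c / lhs_counts[lhs]:.2f}  {lhs} -> {' '.join(rhs)}")
```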

8 MCFG for an entire Minimalist Grammar
Lexical items: ɛ :: =t +wh c 1, ɛ :: =t c 1, will :: =v =d t 1, often :: =v v 1, praise :: =d v 1, marie :: d 1, pierre :: d 1, who :: d -wh 1
Production rules:
st, u :: +wh c, -wh 0 → s :: =t +wh c 1   t, u :: t, -wh 0
st :: =d t 0 → s :: =v =d t 1   t :: v 0
st, u :: =d t, -wh 0 → s :: =v =d t 1   t, u :: v, -wh 0
ts :: c 0 → s, t :: +wh c, -wh 0
st :: c 0 → s :: =t c 1   t :: t 0
ts :: t 0 → s :: =d t 0   t :: d 1
ts, u :: t, -wh 0 → s, u :: =d t, -wh 0   t :: d 1
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
132 / 196

9 Probabilities on MCFGs
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
The context-free backbone for MG derivations identifies a parametrization for probability distributions over them.
λ 2 = count( c 0 → =t c 1 t 0 ) / count( c 0 )
Plus: It turns out that the intersect-with-an-FSA trick we used for CFGs also works for MCFGs! 133 / 196

10 Grammar intersection example (simple)
Original grammar: 1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
Intersected with the prefix Mary believed *:
1.0 S 0,2 → NP 0,1 VP 1,2
0.7 NP 0,1 → Mary
0.5 VP 1,2 → V 1,2 NP 2,2
0.3 VP 1,2 → V 1,2 S 2,2
0.4 V 1,2 → believed
1.0 S 2,2 → NP 2,2 VP 2,2
0.3 NP 2,2 → John
0.7 NP 2,2 → Mary
0.2 VP 2,2 → ran
0.5 VP 2,2 → V 2,2 NP 2,2
0.3 VP 2,2 → V 2,2 S 2,2
0.4 V 2,2 → believed
0.6 V 2,2 → knew
[two partial derivation trees rooted in S 0,2, both beginning NP 0,1 = Mary and VP 1,2 = believed ...]
NB: Total weight in this grammar is not one! (What is it? Start symbol is S 0,2.) Each derivation has the weight it had in the original grammar. 134 / 196
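To make the slide's parenthetical question concrete, here is a small sketch (using my own encoding of the intersected rules above) that sums the weights of all derivations from S 0,2, by the fixed-point method that slide 15 introduces; it should print approximately 0.224, the probability that a sentence of the original grammar begins with "Mary believed".

```python
from math import prod

# The intersected grammar above, with span-indexed nonterminals spelled as plain names.
RULES = {
    "S02":  [(1.0, ["NP01", "VP12"])],
    "NP01": [(0.7, ["Mary"])],
    "VP12": [(0.5, ["V12", "NP22"]), (0.3, ["V12", "S22"])],
    "V12":  [(0.4, ["believed"])],
    "S22":  [(1.0, ["NP22", "VP22"])],
    "NP22": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP22": [(0.2, ["ran"]), (0.5, ["V22", "NP22"]), (0.3, ["V22", "S22"])],
    "V22":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

# Fixed-point iteration: Z(A) = sum of p(rule) * product of Z over nonterminals in the rhs.
Z = {A: 0.0 for A in RULES}
for _ in range(200):
    Z = {A: sum(p * prod(Z.get(X, 1.0) for X in rhs) for p, rhs in prods)
         for A, prods in RULES.items()}

print(Z["S02"])   # approximately 0.224 = 0.7 * 0.4 * (0.5 + 0.3)
```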

11 Beyond context-free
t 1 t 2 :: S → t 1, t 2 :: P
t 1 u 1, t 2 u 2 :: P → t 1, t 2 :: P   u 1, u 2 :: E
ɛ, ɛ :: P
a, a :: E
b, b :: E
This grammar generates { ww | w ∈ {a, b}* }.
[derivation tree for aabaaaba :: S, built from aaba,aaba :: P, which is built from aab,aab :: P and a,a :: E, then aa,aa :: P and b,b :: E, then a,a :: P and a,a :: E, then ɛ,ɛ :: P and a,a :: E]
Unlike in a CFG, we can ensure that the two halves are extended in the same ways without concatenating them together. 135 / 196
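The division of labour between P and E is easy to see in code; the sketch below is my own encoding of the grammar above, enumerating the string pairs it derives up to a bound.

```python
# A P-item is a pair of strings built in parallel; each E step adds the same
# symbol to both halves; S concatenates the two halves of a P-item.

def p_items(max_len):
    """Enumerate P-derivations up to a length bound: P -> eps,eps and P -> P E."""
    items = [("", "")]
    frontier = [("", "")]
    for _ in range(max_len):
        frontier = [(t1 + c, t2 + c) for (t1, t2) in frontier for c in "ab"]
        items.extend(frontier)
    return items

def s_strings(max_len):
    """S -> concatenate the two halves of a P-item."""
    return {t1 + t2 for (t1, t2) in p_items(max_len)}

# Every generated string has the form ww, e.g. "aabaaaba" = "aaba" + "aaba".
print("aabaaaba" in s_strings(4))   # True
print("aabaaabb" in s_strings(4))   # False: the two halves differ
```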

12 Intersection with an MCFG [the MCFG for { ww | w ∈ {a, b}* } intersected with the prefix aa: production rules over span-indexed nonterminals such as S 0,2, P 0,1;1,2, P 0,2;2,2, P e;2,2, E 0,1;2,2, E 2,2;2,2, A 0,1, A 1,2, A 2,2, B 2,2, together with two derivation trees of the intersected grammar] 136 / 196

13 Intersection grammars
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew   ∩   Mary believed *   =   G 2
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew   ∩   Mary believed John *   =   G 3
surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2)
137 / 196
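For readers who want to check the arithmetic without building the intersection grammars, the brute-force sketch below (my own, with an arbitrary depth bound and log base 2) approximates the two total weights by enumerating derivations of the toy PCFG directly.

```python
import math

# The toy PCFG from the slides (sentences represented as tuples of words).
RULES = {
    "S":  [(1.0, ("NP", "VP"))],
    "NP": [(0.3, ("John",)), (0.7, ("Mary",))],
    "VP": [(0.2, ("ran",)), (0.5, ("V", "NP")), (0.3, ("V", "S"))],
    "V":  [(0.4, ("believed",)), (0.6, ("knew",))],
}

def expand(symbol, depth):
    """All (sentence, probability) pairs derivable from symbol within the depth bound."""
    if symbol not in RULES:                       # terminal word
        return [((symbol,), 1.0)]
    if depth == 0:
        return []
    results = []
    for p, rhs in RULES[symbol]:
        partials = [((), p)]
        for sym in rhs:
            sub = expand(sym, depth - 1)
            partials = [(s + s2, q * q2) for s, q in partials for s2, q2 in sub]
        results.extend(partials)
    return results

def prefix_weight(prefix, depth=14):
    return sum(q for s, q in expand("S", depth) if s[:len(prefix)] == tuple(prefix))

g2 = prefix_weight(["Mary", "believed"])            # approximates total weight in G 2
g3 = prefix_weight(["Mary", "believed", "John"])    # approximates total weight in G 3
print(-math.log2(g3 / g2))   # about 1.74 bits, i.e. -log2(0.3); the log base is a choice
```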

14 Surprisal and entropy reduction surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2) entropy reduction at John = (entropy of G 2) − (entropy of G 3) 138 / 196

15 Computing sum of weights in a grammar ("partition function")
Z(A) = Σ over rules A → α of p(A → α) · Z(α)
Z(ɛ) = 1; Z(aβ) = Z(β) for a terminal a; Z(Bβ) = Z(B) · Z(β) for a nonterminal B
(Nederhof and Satta 2008)
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.4 V → believed, 0.6 V → knew
Z(V) = 0.4 + 0.6 = 1.0
Z(NP) = 0.3 + 0.7 = 1.0
Z(VP) = 0.2 + (0.5 · Z(V) · Z(NP)) = 0.2 + 0.5 = 0.7
Z(S) = 1.0 · Z(NP) · Z(VP) = 0.7
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
Z(V) = 0.4 + 0.6 = 1.0
Z(NP) = 0.3 + 0.7 = 1.0
Z(VP) = 0.2 + (0.5 · Z(V) · Z(NP)) + (0.3 · Z(V) · Z(S))
Z(S) = 1.0 · Z(NP) · Z(VP)
139 / 196
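A minimal sketch of this computation, using plain fixed-point iteration over the equations above; the grammar encoding is my own.

```python
from math import prod

# Repeatedly re-evaluate Z(A) = sum over rules A -> alpha of p(A -> alpha) times
# the product of Z over the nonterminals in alpha (terminals contribute a factor of 1).
RULES = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP": [(0.2, ["ran"]), (0.5, ["V", "NP"]), (0.3, ["V", "S"])],
    "V":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

def partition_function(rules, iterations=200):
    Z = {A: 0.0 for A in rules}
    for _ in range(iterations):
        Z = {A: sum(p * prod(Z.get(X, 1.0) for X in rhs) for p, rhs in prods)
             for A, prods in rules.items()}
    return Z

print(partition_function(RULES)["S"])
# Converges to 1.0 here; dropping the rule "0.3 VP -> V S" (the slide's first
# grammar) gives Z(VP) = Z(S) = 0.7, matching the hand calculation above.
```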

16 Computing entropy of a grammar
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
h(S) = 0
h(NP) = entropy of (0.3, 0.7)
h(VP) = entropy of (0.2, 0.5, 0.3)
h(V) = entropy of (0.4, 0.6)
H(S) = h(S) + 1.0 (H(NP) + H(VP))
H(NP) = h(NP)
H(VP) = h(VP) + 0.2 (0) + 0.5 (H(V) + H(NP)) + 0.3 (H(V) + H(S))
H(V) = h(V)
(Hale 2006) 140 / 196
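The same fixed-point idea solves the entropy equations; the sketch below (my own encoding, entropies in bits) does this for the toy grammar above.

```python
from math import log2

# H(A) = h(A) + sum over rules A -> alpha of p(A -> alpha) * sum of H(B) for each
# nonterminal B in alpha, where h(A) is the entropy of the rule choice at A.
RULES = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP": [(0.2, ["ran"]), (0.5, ["V", "NP"]), (0.3, ["V", "S"])],
    "V":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

def rule_choice_entropy(prods):
    return -sum(p * log2(p) for p, _ in prods if p > 0)

def grammar_entropy(rules, iterations=200):
    H = {A: 0.0 for A in rules}
    for _ in range(iterations):
        H = {A: rule_choice_entropy(prods)
                + sum(p * sum(H[X] for X in rhs if X in rules) for p, rhs in prods)
             for A, prods in rules.items()}
    return H

print(grammar_entropy(RULES)["S"])   # entropy (in bits) of the distribution over derivations
```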

17 Surprisal and entropy reduction surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2) entropy reduction at John = (entropy of G 2) − (entropy of G 3) 141 / 196

18 Putting it all together (Hale 2006) We can now put entropy reduction/surprisal together with a minimalist grammar to produce predictions about sentence comprehension difficulty! complexity metric + grammar → prediction (1) Write an MG that generates sentence types of interest (2) Convert MG to an MCFG (3) Add probabilities to MCFG based on corpus frequencies (or whatever else) (4) Compute intersection grammars for each point in a sentence (5) Calculate reduction in entropy across the course of the sentence (i.e. workload) 142 / 196

19 Demo 143 / 196

20 Hale (2006) 144 / 196

21 Hale (2006)
they have -ed forget -en that the boy who tell -ed the story be -s so young
the fact that the girl who pay -ed for the ticket be -s very poor doesnt matter
I know that the girl who get -ed the right answer be -s clever
he remember -ed that the man who sell -ed the house leave -ed the town
they have -ed forget -en that the letter which Dick write -ed yesterday be -s long
the fact that the cat which David show -ed to the man like -s eggs be -s strange
I know that the dog which Penny buy -ed today be -s very gentle
he remember -ed that the sweet which David give -ed Sally be -ed a treat
they have -ed forget -en that the man who Ann give -ed the present to be -ed old
the fact that the boy who Paul sell -ed the book to hate -s reading be -s strange
I know that the man who Stephen explain -ed the accident to be -s kind
he remember -ed that the dog which Mary teach -ed the trick to be -s clever
they have -ed forget -en that the box which Pat bring -ed the apple in be -ed lost
the fact that the girl who Sue write -ed the story with be -s proud doesnt matter
I know that the ship which my uncle take -ed Joe on be -ed interesting
he remember -ed that the food which Chris pay -ed the bill for be -ed cheap
they have -ed forget -en that the girl whose friend buy -ed the cake be -ed wait -ing
the fact that the boy whose brother tell -s lies be -s always honest surprise -ed us
I know that the boy whose father sell -ed the dog be -ed very sad
he remember -ed that the girl whose mother send -ed the clothe come -ed too late
they have -ed forget -en that the man whose house Patrick buy -ed be -ed so ill
the fact that the sailor whose ship Jim take -ed have -ed one leg be -s important
I know that the woman whose car Jenny sell -ed be -ed very angry
he remember -ed that the girl whose picture Clare show -ed us be -ed pretty
144 / 196

22 Hale (2006) 144 / 196

23 Hale (2006) 144 / 196

24 Hale (2006) Hale actually wrote two different MGs: classical adjunction analysis of relative clauses Kaynian/promotion analysis 145 / 196

25 Hale (2006) Hale actually wrote two different MGs: classical adjunction analysis of relative clauses Kaynian/promotion analysis The branching structure of the two MCFGs was different enough to produce distinct Entropy Reduction predictions. (Same corpus counts!) The Kaynian/promotion analysis produced a better fit for the Accessibility Hierarchy facts. (i.e. holding the complexity metric fixed to argue for a grammar) But there are some ways in which this method is insensitive to fine details of the MG formalism. 145 / 196

26 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 146 / 196

27 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge, ... 147 / 196

28 How to deal with adjuncts? A normal application of merge? [tree diagrams: often :: =v v selecting eat cake (category v)] Or a new kind of feature and distinct operation adjoin? [tree diagrams: often :: *v adjoining to eat cake (category v)] 148 / 196

29 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] 149 / 196

30 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] Or some combination of normal phrasal movements? (Koopman and Szabolcsi 2000) [tree diagram: a T headed by -s, with the DP cake and an XP containing the VP with eat and a trace moved into higher positions] 149 / 196

31 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] Or some combination of normal phrasal movements? (Koopman and Szabolcsi 2000) [tree diagram: a T headed by -s, with the DP cake and an XP containing the VP with eat and a trace moved into higher positions] 149 / 196

32 Successive cyclic movement? [two derivation trees for a wh-question with an embedded clause, built from ɛ :: =t +wh c, will :: =v +k t, Mary :: subj -k, think :: =t =subj v, John :: subj -k, eat :: =obj =subj v, what :: obj -wh: in one, the wh-phrase's single -wh is checked only at the matrix +wh position; in the other, an extra +wh/-wh pair is checked at the embedded clause edge (will :: =v +k +wh t, what :: obj -wh -wh), i.e. movement is successive-cyclic] 150 / 196

33 Successive cyclic movement? [two derivation trees for a wh-question with an embedded clause, built from ɛ :: =t +wh c, will :: =v +k t, Mary :: subj -k, think :: =t =subj v, John :: subj -k, eat :: =obj =subj v, what :: obj -wh: in one, the wh-phrase's single -wh is checked only at the matrix +wh position; in the other, an extra +wh/-wh pair is checked at the embedded clause edge (will :: =v +k +wh t, what :: obj -wh -wh), i.e. movement is successive-cyclic] 150 / 196

34 Unifying feature-checking (one way) [two derivations of John will seem to eat cake, side by side: on the left, the standard MG derivation built with merge and move (passing through categories =v +k t 1, +k t, -k 0, t 0); on the right, the same sentence derived with the unified operations insert and mrg (insert will, then mrg checks +v and +k, passing through +v +k -t 1, +k -t, -k 0, -t 0)] (Stabler 2006, Hunter 2011) 151 / 196

35 Unifying feature-checking (one way) [two derivations of John will seem to eat cake, side by side: on the left, the standard MG derivation built with merge and move (passing through categories =v +k t 1, +k t, -k 0, t 0); on the right, the same sentence derived with the unified operations insert and mrg (insert will, then mrg checks +v and +k, passing through +v +k -t 1, +k -t, -k 0, -t 0)] (Stabler 2006, Hunter 2011) 151 / 196

36 Three schemas for merge rules:
st, t 1,..., t k :: γ, α 1,..., α k 0 → s :: =fγ 1   t, t 1,..., t k :: f, α 1,..., α k n
ts, s 1,..., s j, t 1,..., t k :: γ, α 1,..., α j, β 1,..., β k 0 → s, s 1,..., s j :: =fγ, α 1,..., α j 0   t, t 1,..., t k :: f, β 1,..., β k n
s, s 1,..., s j, t, t 1,..., t k :: γ, α 1,..., α j, δ, β 1,..., β k 0 → s, s 1,..., s j :: =fγ, α 1,..., α j n   t, t 1,..., t k :: fδ, β 1,..., β k n
Two schemas for move rules:
s i s, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 0
s, s 1,..., s i,..., s k :: γ, α 1,..., α i−1, δ, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -fδ, α i+1,..., α k 0
152 / 196

37 One schema for insert rules:
s, s 1,..., s j, t, t 1,..., t k :: +fγ, α 1,..., α j, -fγ, β 1,..., β k n → s, s 1,..., s j :: +fγ, α 1,..., α j n   t, t 1,..., t k :: -fγ, β 1,..., β k n
Three schemas for mrg rules:
ss i, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 1
s i s, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 0
s, s 1,..., s i,..., s k :: γ, α 1,..., α i−1, δ, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -fδ, α i+1,..., α k 0
153 / 196

38 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge, ... 154 / 196

39 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge... Each variant of the formalism expresses a different hypothesis about the set of primitive grammatical operations. (We are looking for ways to tell these apart!) The shapes of the derivation trees are generally very similar from one variant to the next. But variants will make different classifications of the derivational steps involved, according to which operation is being applied. 154 / 196

40 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 155 / 196

41 Probabilities on MCFGs
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ 1, λ 2, etc. make the training corpus most likely? 156 / 196

42 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise 157 / 196

43 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
157 / 196

44 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 158 / 196

45 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 [derived tree and derivation tree: often :: =v v merging with praise plus pending who (category v, -wh)] st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0 158 / 196

46 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
159 / 196

47 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
count( v 0 → =d v 1 d 1 ) / count( v 0 ) = 95/100
count( v,-wh 0 → =d v 1 d -wh 1 ) / count( v,-wh 0 ) = 2/3
This training setup doesn't know which minimalist-grammar operations are being implemented by the various MCFG rules. 159 / 196

48 Naive parametrization [diagram: grammar G A and the training corpus feeding the naive parametrization] 160 / 196

49 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 161 / 196

50 A (slightly) more complicated grammar
ɛ :: =t c   ɛ :: =t +wh c   will :: =v =subj t   shave :: v   shave :: =obj v   boys :: subj   who :: subj -wh   boys :: =x =det subj   ɛ :: x   foo :: det   themselves :: =ant obj   ɛ :: =subj ant -subj   will :: =v +subj t
Generated sentences: boys will shave; boys will shave themselves; who will shave; who will shave themselves; foo boys will shave; foo boys will shave themselves
Some details:
Subject is base-generated in SpecTP; no movement for Case
Transitive and intransitive versions of shave
foo is a determiner that optionally combines with boys to make a subject
Dummy feature x to fill complement of boys so that foo goes on the left
themselves can appear in object position, via a movement theory of reflexives: a subj can be turned into an ant -subj, and themselves combines with an ant to make an obj
will can attract its subject by move as well as merge
162 / 196

51 [derived trees and derivation trees: foo boys built as a subj (boys :: =x =det subj merges with ɛ :: x and then with foo :: det); and themselves built as an obj with a pending -subj mover (ɛ :: =subj ant -subj merges with boys :: subj, and themselves :: =ant obj merges with the result)] 163 / 196

52 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0
c 0 → +wh c, -wh 0
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0
ant -subj 0 → =subj ant -subj 1   subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0
t 0 → =subj t 0   subj 1
t 0 → +subj t, -subj 0
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1
t, -wh 0 → +subj t, -subj, -wh 0
164 / 196

53 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1
-c 0 → +wh -c, -wh 0
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0
+subj -t, -subj 0 → +subj -t 0   -subj 1
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0
-t, -wh 0 → +subj -t, -subj, -wh 0
165 / 196

54 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave 166 / 196

55 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave With merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves With merge and move as unified operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves 166 / 196

56 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave With merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves With merge and move as unified operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves This treatment of probabilities doesn't know which minimalist-grammar operations are being implemented by the various MCFG rules. So the probabilities are unaffected by changes in the set of primitive operations. 166 / 196

57 Naive parametrization [diagram: grammar G A with the naive parametrization and the training corpus, grammar G B1 with the naive parametrization and the training corpus, and grammar G B2 with the naive parametrization] 167 / 196

58 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 168 / 196

59 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection). (Hunter and Dyer 2013) 169 / 196

60 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
(Hunter and Dyer 2013) 169 / 196

61 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
(Hunter and Dyer 2013) 169 / 196

62 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
(Hunter and Dyer 2013) 169 / 196

63 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
s(r 3) = exp(λ merge + λ d)
(Hunter and Dyer 2013) 169 / 196

64 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
s(r 3) = exp(λ merge + λ d)
s(r 5) = exp(λ merge + λ d)
(Hunter and Dyer 2013) 169 / 196

65 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 170 / 196

66 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 [derived tree and derivation tree: often :: =v v merging with praise plus pending who (category v, -wh)] st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0 170 / 196

67 Comparison
The old way:
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ 1, λ 2, etc. make the training corpus most likely?
The new way:
exp(λ move + λ wh): ts :: c 0 → s, t :: +wh c, -wh 0
exp(λ merge + λ t): st :: c 0 → s :: =t c 1   t :: t 0
exp(λ merge + λ d): st :: v 0 → s :: =d v 1   t :: d 1
exp(λ merge + λ v): st :: v 0 → s :: =v v 1   t :: v 0
exp(λ merge + λ d): s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
exp(λ merge + λ v): st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ merge, λ move, λ d, etc. make the training corpus most likely?
171 / 196
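As an illustration of the new way (not from the slides), the sketch below scores each rule as exp(λ · φ(r)) and normalizes scores among rules that expand the same category; the λ values are invented and the rule encoding is my own. With the features tied this way, the =d-vs-=v choice necessarily gets the same probability in the plain v rules and in the v, -wh rules, which is exactly the generalization the naive parametrization missed.

```python
from math import exp

# Hypothetical lambda values, one per operation and one per feature checked.
weights = {"merge": 0.0, "move": -1.0, "d": 0.5, "v": 0.2, "t": 0.0, "wh": -0.5}

# (left-hand category, features that fire) for the six rules above
rules = [
    ("c",     {"merge", "t"}),    # st :: c     <- s :: =t c, t :: t
    ("c",     {"move", "wh"}),    # ts :: c     <- s,t :: +wh c, -wh
    ("v",     {"merge", "d"}),    # st :: v     <- s :: =d v, t :: d
    ("v",     {"merge", "v"}),    # st :: v     <- s :: =v v, t :: v
    ("v,-wh", {"merge", "d"}),    # s,t :: v,-wh  <- s :: =d v, t :: d -wh
    ("v,-wh", {"merge", "v"}),    # st,u :: v,-wh <- s :: =v v, t,u :: v,-wh
]

def score(features):
    return exp(sum(weights[f] for f in features))

def rule_probs(rules):
    totals = {}
    for lhs, feats in rules:                      # normalize within each category
        totals[lhs] = totals.get(lhs, 0.0) + score(feats)
    return [(lhs, feats, score(feats) / totals[lhs]) for lhs, feats in rules]

for lhs, feats, p in rule_probs(rules):
    print(f"{lhs:6s} {sorted(feats)} -> {p:.3f}")
```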

68 Solution #1 with the smarter parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise Maximise likelihood via stochastic gradient ascent: P λ (N → δ) = exp(λ · φ(N → δ)) / Σ δ′ exp(λ · φ(N → δ′)) 172 / 196
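Here is a rough sketch of that training step under simplifying assumptions of my own: the derivations of the training corpus are taken as fully observed, so each observed rule use contributes its feature vector minus the expected feature vector of the rules competing for the same category, and plain batch gradient ascent stands in for the stochastic version. The rule-use counts are the ones read off the Problem #1 corpus above (95 vs. 5 for plain v, 2 vs. 1 for v with -wh, 95 declaratives vs. 2 questions).

```python
from math import exp

FEATURES = ["merge", "move", "d", "v", "t", "wh"]

rules = {                       # lhs category -> list of (rule id, features that fire)
    "c":     [("c->t", {"merge", "t"}), ("c->wh", {"move", "wh"})],
    "v":     [("v->d", {"merge", "d"}), ("v->v", {"merge", "v"})],
    "v,-wh": [("vwh->d", {"merge", "d"}), ("vwh->v", {"merge", "v"})],
}
observed = {"c->t": 95, "c->wh": 2, "v->d": 95, "v->v": 5, "vwh->d": 2, "vwh->v": 1}

def local_probs(lam, options):
    scores = [exp(sum(lam[f] for f in feats)) for _, feats in options]
    z = sum(scores)
    return [s / z for s in scores]

def gradient(lam):
    # d(log-likelihood)/d(lambda_f) = observed feature counts minus expected counts
    grad = {f: 0.0 for f in FEATURES}
    for lhs, options in rules.items():
        probs = local_probs(lam, options)
        total = sum(observed[r] for r, _ in options)
        for (r, feats), p in zip(options, probs):
            for f in FEATURES:
                grad[f] += observed[r] * (f in feats) - total * p * (f in feats)
    return grad

lam = {f: 0.0 for f in FEATURES}
for step in range(500):                     # plain batch gradient ascent
    g = gradient(lam)
    lam = {f: lam[f] + 0.01 * g[f] for f in FEATURES}

# The tied parametrization pools the d-vs-v choice across plain v and v,-wh:
print(local_probs(lam, rules["v"]))      # roughly [0.94, 0.06] (i.e. 97/103, 6/103)
print(local_probs(lam, rules["v,-wh"]))  # the same, unlike the naive 2/3 vs 1/3
```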

69 Solution #1 with the smarter parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise Maximise likelihood via stochastic gradient ascent: P λ (N → δ) = exp(λ · φ(N → δ)) / Σ δ′ exp(λ · φ(N → δ′))
Resulting probabilities (naive vs. smarter):
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
172 / 196

70 Naive parametrization / Smarter parametrization [diagram: grammar G A with the naive parametrization and the training corpus, grammar G A with the smarter parametrization and the training corpus, grammar G B1 with the naive parametrization and the training corpus, and grammar G B2 with the naive parametrization] / 196

71 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave 174 / 196

72 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves 174 / 196

73 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves 174 / 196

74 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves [Entropy and Entropy Reduction, word by word, for who will shave themselves: 2.09] Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves [Entropy and Entropy Reduction, word by word, for who will shave themselves: 2.13] 174 / 196

75 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves 174 / 196

76 Naive parametrization / Smarter parametrization [diagram: grammars G A, G B1 and G B2, each paired with the training corpus under both the naive and the smarter parametrization] / 196

77 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0
c 0 → +wh c, -wh 0
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0
ant -subj 0 → =subj ant -subj 1   subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0
t 0 → =subj t 0   subj 1
t 0 → +subj t, -subj 0
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1
t, -wh 0 → +subj t, -subj, -wh 0
/ 196

78 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0   exp(λ merge + λ t)
c 0 → +wh c, -wh 0   exp(λ move + λ wh)
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0   exp(λ merge + λ subj)
ant -subj 0 → =subj ant -subj 1   subj 1   exp(λ merge + λ subj)
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0   exp(λ merge + λ subj)
t 0 → =subj t 0   subj 1   exp(λ merge + λ subj)
t 0 → +subj t, -subj 0   exp(λ move + λ subj)
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1   exp(λ merge + λ subj)
t, -wh 0 → +subj t, -subj, -wh 0   exp(λ move + λ subj)
176 / 196

79 Learned weights on the MG λ t = exp(λ t) = λ subj = exp(λ v) = λ wh = exp(λ wh) = λ merge = exp(λ merge) = λ move = exp(λ move) = / 196

80 Learned weights on the MG
λ t =   exp(λ t) =   λ subj =   exp(λ v) =   λ wh =   exp(λ wh) =   λ merge =   exp(λ merge) =   λ move =   exp(λ move) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = exp(λ move) / (exp(λ merge) + exp(λ move)) =
P(wh-phrase non-reflexivized) = exp(λ merge) / (exp(λ merge) + exp(λ move)) =
P(question) = exp(λ move + λ wh) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-question) = exp(λ merge + λ t) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-wh subject merged and complex) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject merged and lexical) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject moved) = exp(λ move) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
177 / 196

81 Learned weights on the MG
λ t =   exp(λ t) =   λ subj =   exp(λ v) =   λ wh =   exp(λ wh) =   λ merge =   exp(λ merge) =   λ move =   exp(λ move) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = exp(λ move) / (exp(λ merge) + exp(λ move)) =
P(wh-phrase non-reflexivized) = exp(λ merge) / (exp(λ merge) + exp(λ move)) =
P(question) = exp(λ move + λ wh) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-question) = exp(λ merge + λ t) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-wh subject merged and complex) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject merged and lexical) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject moved) = exp(λ move) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(who will shave) =   P(boys will shave themselves) =
177 / 196

82 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1
-c 0 → +wh -c, -wh 0
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0
+subj -t, -subj 0 → +subj -t 0   -subj 1
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0
-t, -wh 0 → +subj -t, -subj, -wh 0
/ 196

83 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1   exp(λ mrg + λ t)
-c 0 → +wh -c, -wh 0   exp(λ mrg + λ wh)
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0   exp(λ insert)
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1   exp(λ insert)
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0   exp(λ insert)
+subj -t, -subj 0 → +subj -t 0   -subj 1   exp(λ insert)
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1   exp(λ mrg + λ v)
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0   exp(λ mrg + λ subj)
-t, -wh 0 → +subj -t, -subj, -wh 0   exp(λ mrg + λ subj)
178 / 196

84 Learned weights on the IMG λ t = exp(λ t) = λ v = exp(λ v) = λ wh = exp(λ wh) = λ insert = exp(λ insert) = λ mrg = exp(λ mrg) = / 196

85 Learned weights on the IMG
λ t =   exp(λ t) =   λ v =   exp(λ v) =   λ wh =   exp(λ wh) =   λ insert =   exp(λ insert) =   λ mrg =   exp(λ mrg) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = 0.5
P(wh-phrase non-reflexivized) = 0.5
P(question) = exp(λ mrg + λ wh) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ wh) / (exp(λ t) + exp(λ wh)) =
P(non-question) = exp(λ mrg + λ t) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ t) / (exp(λ t) + exp(λ wh)) =
P(non-wh subject merged and lexical) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject merged and complex) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject moved) = exp(λ mrg + λ v) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
179 / 196

86 Learned weights on the IMG
λ t =   exp(λ t) =   λ v =   exp(λ v) =   λ wh =   exp(λ wh) =   λ insert =   exp(λ insert) =   λ mrg =   exp(λ mrg) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = 0.5
P(wh-phrase non-reflexivized) = 0.5
P(question) = exp(λ mrg + λ wh) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ wh) / (exp(λ t) + exp(λ wh)) =
P(non-question) = exp(λ mrg + λ t) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ t) / (exp(λ t) + exp(λ wh)) =
P(non-wh subject merged and lexical) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject merged and complex) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject moved) = exp(λ mrg + λ v) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(who will shave) =   P(boys will shave themselves) =
180 / 196

87 References I Billot, S. and Lang, B. (1989). The structure of shared forests in ambiguous parsing. In Proceedings of the 1989 Meeting of the Association of Computational Linguistics. Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press, Cambridge, MA. Chomsky, N. (1980). Rules and Representations. Columbia University Press, New York. Ferreira, F. (2005). Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review, 22: Gärtner, H.-M. and Michaelis, J. (2010). On the Treatment of Multiple-Wh Interrogatives in Minimalist Grammars. In Hanneforth, T. and Fanselow, G., editors, Language and Logos, pages Akademie Verlag, Berlin. Gibson, E. and Wexler, K. (1994). Triggers. Linguistic Inquiry, 25: Hale, J. (2006). Uncertainty about the rest of the sentence. Cognitive Science, 30:643–672. Hale, J. T. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics. Hunter, T. (2011). Insertion Minimalist Grammars: Eliminating redundancies between merge and move. In Kanazawa, M., Kornai, A., Kracht, M., and Seki, H., editors, The Mathematics of Language (MOL 12 Proceedings), volume 6878 of LNCS, pages , Berlin Heidelberg. Springer. Hunter, T. and Dyer, C. (2013). Distributions on minimalist grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language. Koopman, H. and Szabolcsi, A. (2000). Verbal Complexes. MIT Press, Cambridge, MA.

88 References II Lang, B. (1988). Parsing incomplete sentences. In Proceedings of the 12th International Conference on Computational Linguistics, pages Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3): Michaelis, J. (2001). Derivational minimalism is mildly context-sensitive. In Moortgat, M., editor, Logical Aspects of Computational Linguistics, volume 2014 of LNCS, pages Springer, Berlin Heidelberg. Miller, G. A. and Chomsky, N. (1963). Finitary models of language users. In Luce, R. D., Bush, R. R., and Galanter, E., editors, Handbook of Mathematical Psychology, volume 2. Wiley and Sons, New York. Morrill, G. (1994). Type Logical Grammar: Categorial Logic of Signs. Kluwer, Dordrecht. Nederhof, M. J. and Satta, G. (2008). Computing partition functions of pcfgs. Research on Language and Computation, 6(2): Seki, H., Matsumara, T., Fujii, M., and Kasami, T. (1991). On multiple context-free grammars. Theoretical Computer Science, 88: Stabler, E. P. (2006). Sidewards without copying. In Wintner, S., editor, Proceedings of The 11th Conference on Formal Grammar, pages , Stanford, CA. CSLI Publications. Stabler, E. P. (2011). Computational perspectives on minimalism. In Boeckx, C., editor, The Oxford Handbook of Linguistic Minimalism. Oxford University Press, Oxford. Stabler, E. P. and Keenan, E. L. (2003). Structural similarity within and among languages. Theoretical Computer Science, 293:

89 References III Vijay-Shanker, K., Weir, D. J., and Joshi, A. K. (1987). Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th Meeting of the Association for Computational Linguistics, pages Weir, D. (1988). Characterizing mildly context-sensitive grammar formalisms. PhD thesis, University of Pennsylvania. Yngve, V. H. (1960). A model and an hypothesis for language structure. In Proceedings of the American Philosophical Society, volume 104, pages


More information

Probabilistic Context-Free Grammar

Probabilistic Context-Free Grammar Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612

More information

2013 ISSN: JATLaC Journal 8: t 1. t t Chomsky 1993 I Radford (2009) R I t t R I 2. t R t (1) (= R's (15), p. 86) He could have helped

2013 ISSN: JATLaC Journal 8: t 1. t t Chomsky 1993 I Radford (2009) R I t t R I 2. t R t (1) (= R's (15), p. 86) He could have helped t 1. tt Chomsky 1993 IRadford (2009) R Itt R I 2. t R t (1) (= R's (15), p. 86) He could have helped her, or [she have helped him]. 2 have I has had I I could ellipsis I gapping R (2) (= R (18), p.88)

More information

A Context-Free Grammar

A Context-Free Grammar Statistical Parsing A Context-Free Grammar S VP VP Vi VP Vt VP VP PP DT NN PP PP P Vi sleeps Vt saw NN man NN dog NN telescope DT the IN with IN in Ambiguity A sentence of reasonable length can easily

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

PENGUIN READERS. Five Famous Fairy Tales

PENGUIN READERS. Five Famous Fairy Tales PENGUIN READERS Five Famous Fairy Tales Introduction Jacob and Wilhelm Grimm the Brothers Grimm were good friends. Jacob was a quiet man and sometimes sad. Wilhelm was often very ill but he was a happier

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP NP PP 1.0. N people 0.

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP  NP PP 1.0. N people 0. /6/7 CS 6/CS: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang The grammar: Binary, no epsilons,.9..5

More information

Latent Variable Models in NLP

Latent Variable Models in NLP Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable

More information

Ling 240 Lecture #15. Syntax 4

Ling 240 Lecture #15. Syntax 4 Ling 240 Lecture #15 Syntax 4 agenda for today Give me homework 3! Language presentation! return Quiz 2 brief review of friday More on transformation Homework 4 A set of Phrase Structure Rules S -> (aux)

More information

Strong connectivity hypothesis and generative power in TAG

Strong connectivity hypothesis and generative power in TAG 15 Strong connectivity hypothesis and generative power in TAG Alessandro Mazzei, Vincenzo ombardo and Patrick Sturt Abstract Dynamic grammars are relevant for psycholinguistic modelling and speech processing.

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences P3-26 The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences Chie Nakamura, Manabu Arai Keio University, University of Tokyo, JSPS

More information

Introduction to Semantic Parsing with CCG

Introduction to Semantic Parsing with CCG Introduction to Semantic Parsing with CCG Kilian Evang Heinrich-Heine-Universität Düsseldorf 2018-04-24 Table of contents 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG)

More information

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22 Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Where linguistic meaning meets non-linguistic cognition

Where linguistic meaning meets non-linguistic cognition Where linguistic meaning meets non-linguistic cognition Tim Hunter and Paul Pietroski NASSLLI 2016 Friday: Putting things together (perhaps) Outline 5 Experiments with kids on most 6 So: How should we

More information

Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters

Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters Klinton Bicknell & Roger Levy LSA 2015 Summer Institute University

More information

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis 1 Introduction Parenthesis Matching Problem Describe the set of arithmetic expressions with correctly matched parenthesis. Arithmetic expressions with correctly matched parenthesis cannot be described

More information

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0 Massachusetts Institute of Technology 6.863J/9.611J, Natural Language Processing, Spring, 2001 Department of Electrical Engineering and Computer Science Department of Brain and Cognitive Sciences Handout

More information

Probabilistic Context-free Grammars

Probabilistic Context-free Grammars Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John

More information

Ling 98a: The Meaning of Negation (Week 5)

Ling 98a: The Meaning of Negation (Week 5) Yimei Xiang yxiang@fas.harvard.edu 15 October 2013 1 Review Negation in propositional logic, oppositions, term logic of Aristotle Presuppositions Projection and accommodation Three-valued logic External/internal

More information

The Formal Architecture of. Lexical-Functional Grammar. Ronald M. Kaplan and Mary Dalrymple

The Formal Architecture of. Lexical-Functional Grammar. Ronald M. Kaplan and Mary Dalrymple The Formal Architecture of Lexical-Functional Grammar Ronald M. Kaplan and Mary Dalrymple Xerox Palo Alto Research Center 1. Kaplan and Dalrymple, ESSLLI 1995, Barcelona Architectural Issues Representation:

More information

Features of Statistical Parsers

Features of Statistical Parsers Features of tatistical Parsers Preliminary results Mark Johnson Brown University TTI, October 2003 Joint work with Michael Collins (MIT) upported by NF grants LI 9720368 and II0095940 1 Talk outline tatistical

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

An introduction to mildly context sensitive grammar formalisms. Combinatory Categorial Grammar

An introduction to mildly context sensitive grammar formalisms. Combinatory Categorial Grammar An introduction to mildly context sensitive grammar formalisms Combinatory Categorial Grammar Gerhard Jäger & Jens Michaelis University of Potsdam {jaeger,michael}@ling.uni-potsdam.de p.1 Basic Categorial

More information

MoL 13. The 13th Meeting on the Mathematics of Language. Proceedings

MoL 13. The 13th Meeting on the Mathematics of Language. Proceedings MoL 13 The 13th Meeting on the Mathematics of Language Proceedings August 9, 2013 Sofia, Bulgaria Production and Manufacturing by Omnipress, Inc. 2600 Anderson Street Madison, WI 53704 USA c 2013 The Association

More information

The Lambek-Grishin calculus for unary connectives

The Lambek-Grishin calculus for unary connectives The Lambek-Grishin calculus for unary connectives Anna Chernilovskaya Utrecht Institute of Linguistics OTS, Utrecht University, the Netherlands anna.chernilovskaya@let.uu.nl Introduction In traditional

More information

Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095

Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095 Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095 1543 kracht@humnet.ucla.edu 1. The Questions ➀ Why does

More information

CAS LX 522 Syntax I November 4, 2002 Week 9: Wh-movement, supplement

CAS LX 522 Syntax I November 4, 2002 Week 9: Wh-movement, supplement CAS LX 522 Syntax I November 4, 2002 Fall 2002 Week 9: Wh-movement, supplement () Italian Tuo fratello ( your brother ), [ CP a cui i [ TP mi domando [ CP che storie i [ TP abbiano raccontato t i t j...

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

The relation of surprisal and human processing

The relation of surprisal and human processing The relation of surprisal and human processing difficulty Information Theory Lecture Vera Demberg and Matt Crocker Information Theory Lecture, Universität des Saarlandes April 19th, 2015 Information theory

More information

One hint from secondary predication (from Baker 1997 (8) A secondary predicate cannot take the goal argument as subject of predication, wheth

One hint from secondary predication (from Baker 1997 (8) A secondary predicate cannot take the goal argument as subject of predication, wheth MIT, Fall 2003 1 The Double Object Construction (Larson 1988, Aoun & Li 1989) MIT, 24.951, Fr 14 Nov 2003 A familiar puzzle The Dative Alternation (1) a. I gave the candy to the children b. I gave the

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):

More information

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Rebecca Nesson School of Engineering and Applied Sciences Harvard University Giorgio Satta Department of Information

More information

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning Probabilistic Context Free Grammars Many slides from Michael Collins and Chris Manning Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic

More information

Remembering subresults (Part I): Well-formed substring tables

Remembering subresults (Part I): Well-formed substring tables Remembering subresults (Part I): Well-formed substring tables Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01, 1. February 2005 Problem: Inefficiency of recomputing subresults Two

More information

Reference-Set Computation = Minimalism + Transformational Rules?

Reference-Set Computation = Minimalism + Transformational Rules? Reference-Set Computation = inimalism + Transformational Rules? Thomas Graf tgraf@ucla.edu tgraf.bol.ucla.edu University of California, Los Angeles Universität Bielefeld, Bielefeld, Germany December 8,

More information

Context- Free Parsing with CKY. October 16, 2014

Context- Free Parsing with CKY. October 16, 2014 Context- Free Parsing with CKY October 16, 2014 Lecture Plan Parsing as Logical DeducBon Defining the CFG recognibon problem BoHom up vs. top down Quick review of Chomsky normal form The CKY algorithm

More information

Improved TBL algorithm for learning context-free grammar

Improved TBL algorithm for learning context-free grammar Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski

More information

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument

More information

Moreno Mitrović. The Saarland Lectures on Formal Semantics

Moreno Mitrović. The Saarland Lectures on Formal Semantics ,, 3 Moreno Mitrović The Saarland Lectures on Formal Semantics λ- λ- λ- ( λ- ) Before we move onto this, let's recall our f -notation for intransitive verbs 1/33 λ- ( λ- ) Before we move onto this, let's

More information

Derivational order and ACGs

Derivational order and ACGs Derivational order and ACGs Gregory M. Kobele 1 Jens Michaelis 2 1 University of Chicago, Chicago, USA 2 Bielefeld University, Bielefeld, Germany ACG@10, Bordeaux, December 7 9, 2011 Greg & Jens (Chicago

More information

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars in der emantik cope emantics in Lexicalized Tree Adjoining Grammars Laura Heinrich-Heine-Universität Düsseldorf Wintersemester 2011/2012 LTAG: The Formalism (1) Tree Adjoining Grammars (TAG): Tree-rewriting

More information

A Computational Approach to Minimalism

A Computational Approach to Minimalism A Computational Approach to Minimalism Alain LECOMTE CLIPS-IMAG BP-53 38041 Grenoble, cedex 9, France Email: Alain.Lecomte@upmf-grenoble.fr Abstract The aim of this paper is to recast minimalist principles

More information

Finite Presentations of Pregroups and the Identity Problem

Finite Presentations of Pregroups and the Identity Problem 6 Finite Presentations of Pregroups and the Identity Problem Alexa H. Mater and James D. Fix Abstract We consider finitely generated pregroups, and describe how an appropriately defined rewrite relation

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 14 Ana Bove May 14th 2018 Recap: Context-free Grammars Simplification of grammars: Elimination of ǫ-productions; Elimination of

More information