Sharpening the empirical claims of generative syntax through formalization


1 Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities NASSLLI, June 2014

2 Part 1: Grammars and cognitive hypotheses What is a grammar? What can grammars do? Concrete illustration of a target: Surprisal Parts 2–4: Assembling the pieces Minimalist Grammars (MGs) MGs and MCFGs Probabilities on MGs Part 5: Learning and wrap-up Something slightly different: Learning model Recap and open questions

3 Sharpening the empirical claims of generative syntax through formalization Tim Hunter NASSLLI, June 2014 Part 4 Probabilities on MG Derivations

4 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 129 / 196

5 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 130 / 196

6 Probabilistic CFGs What are the probabilities of the derivations? = What are the values of λ 1, λ 2, etc.? λ 1 S → NP VP, λ 2 NP → John, λ 3 NP → Mary, λ 4 VP → ran, λ 5 VP → V NP, λ 6 VP → V S, λ 7 V → believed, λ 8 V → knew 131 / 196

7 Probabilistic CFGs What are the probabilities of the derivations? = What are the values of λ 1, λ 2, etc.? λ 1 S → NP VP, λ 2 NP → John, λ 3 NP → Mary, λ 4 VP → ran, λ 5 VP → V NP, λ 6 VP → V S, λ 7 V → believed, λ 8 V → knew Training corpus + training algorithm: 1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew λ 5 = count(VP → V NP) / count(VP) 131 / 196
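As a small aside (not on the slides), the training step used here is just relative-frequency estimation; the toy treebank below and its encoding are invented for illustration.

```python
from collections import Counter

# Hypothetical treebank: (category, [children]) for nonterminal nodes,
# (category, word) for preterminal nodes. The trees themselves are made up.
trees = [
    ("S", [("NP", "Mary"),
           ("VP", [("V", "knew"),
                   ("S", [("NP", "John"), ("VP", "ran")])])]),
    ("S", [("NP", "John"), ("VP", "ran")]),
]

rule_counts = Counter()   # counts of (lhs, rhs) pairs
lhs_counts = Counter()    # counts of lhs occurrences

def count_rules(node):
    cat, children = node
    if isinstance(children, str):      # preterminal rule such as V -> knew
        rule_counts[(cat, (children,))] += 1
        lhs_counts[cat] += 1
        return
    rule_counts[(cat, tuple(child[0] for child in children))] += 1
    lhs_counts[cat] += 1
    for child in children:
        count_rules(child)

for t in trees:
    count_rules(t)

# lambda for A -> alpha is count(A -> alpha) / count(A), as in the slide's formula
for (lhs, rhs), c in sorted(rule_counts.items()):
    print(f"{c / lhs_counts[lhs]:.2f}  {lhs} -> {' '.join(rhs)}")
```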

8 MCFG for an entire Minimalist Grammar
Lexical items: ɛ :: =t +wh c 1, ɛ :: =t c 1, will :: =v =d t 1, often :: =v v 1, praise :: =d v 1, marie :: d 1, pierre :: d 1, who :: d -wh 1
Production rules:
st, u :: +wh c, -wh 0 → s :: =t +wh c 1   t, u :: t, -wh 0
st :: =d t 0 → s :: =v =d t 1   t :: v 0
st, u :: =d t, -wh 0 → s :: =v =d t 1   t, u :: v, -wh 0
ts :: c 0 → s, t :: +wh c, -wh 0
st :: c 0 → s :: =t c 1   t :: t 0
ts :: t 0 → s :: =d t 0   t :: d 1
ts, u :: t, -wh 0 → s, u :: =d t, -wh 0   t :: d 1
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
132 / 196

9 Probabilities on MCFGs
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
The context-free backbone for MG derivations identifies a parametrization for probability distributions over them.
λ 2 = count( c 0 → =t c 1 t 0 ) / count( c 0 )
Plus: It turns out that the intersect-with-an-FSA trick we used for CFGs also works for MCFGs! 133 / 196

10 Grammar intersection example (simple)
Original grammar: 1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
Intersected with the prefix Mary believed *:
1.0 S 0,2 → NP 0,1 VP 1,2
0.7 NP 0,1 → Mary
0.5 VP 1,2 → V 1,2 NP 2,2
0.3 VP 1,2 → V 1,2 S 2,2
0.4 V 1,2 → believed
1.0 S 2,2 → NP 2,2 VP 2,2
0.3 NP 2,2 → John
0.7 NP 2,2 → Mary
0.2 VP 2,2 → ran
0.5 VP 2,2 → V 2,2 NP 2,2
0.3 VP 2,2 → V 2,2 S 2,2
0.4 V 2,2 → believed
0.6 V 2,2 → knew
[two partial derivation trees rooted in S 0,2, both beginning NP 0,1 = Mary and VP 1,2 = believed ...]
NB: Total weight in this grammar is not one! (What is it? Start symbol is S 0,2.) Each derivation has the weight it had in the original grammar. 134 / 196
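To make the slide's parenthetical question concrete, here is a small sketch (using my own encoding of the intersected rules above) that sums the weights of all derivations from S 0,2, by the fixed-point method that slide 15 introduces; it should print approximately 0.224, the probability that a sentence of the original grammar begins with "Mary believed".

```python
from math import prod

# The intersected grammar above, with span-indexed nonterminals spelled as plain names.
RULES = {
    "S02":  [(1.0, ["NP01", "VP12"])],
    "NP01": [(0.7, ["Mary"])],
    "VP12": [(0.5, ["V12", "NP22"]), (0.3, ["V12", "S22"])],
    "V12":  [(0.4, ["believed"])],
    "S22":  [(1.0, ["NP22", "VP22"])],
    "NP22": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP22": [(0.2, ["ran"]), (0.5, ["V22", "NP22"]), (0.3, ["V22", "S22"])],
    "V22":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

# Fixed-point iteration: Z(A) = sum of p(rule) * product of Z over nonterminals in the rhs.
Z = {A: 0.0 for A in RULES}
for _ in range(200):
    Z = {A: sum(p * prod(Z.get(X, 1.0) for X in rhs) for p, rhs in prods)
         for A, prods in RULES.items()}

print(Z["S02"])   # approximately 0.224 = 0.7 * 0.4 * (0.5 + 0.3)
```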

11 Beyond context-free
t 1 t 2 :: S → t 1, t 2 :: P
t 1 u 1, t 2 u 2 :: P → t 1, t 2 :: P   u 1, u 2 :: E
ɛ, ɛ :: P
a, a :: E
b, b :: E
This grammar generates { ww | w ∈ {a, b}* }.
[derivation tree for aabaaaba :: S, built from aaba,aaba :: P, which is built from aab,aab :: P and a,a :: E, then aa,aa :: P and b,b :: E, then a,a :: P and a,a :: E, then ɛ,ɛ :: P and a,a :: E]
Unlike in a CFG, we can ensure that the two halves are extended in the same ways without concatenating them together. 135 / 196
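The division of labour between P and E is easy to see in code; the sketch below is my own encoding of the grammar above, enumerating the string pairs it derives up to a bound.

```python
# A P-item is a pair of strings built in parallel; each E step adds the same
# symbol to both halves; S concatenates the two halves of a P-item.

def p_items(max_len):
    """Enumerate P-derivations up to a length bound: P -> eps,eps and P -> P E."""
    items = [("", "")]
    frontier = [("", "")]
    for _ in range(max_len):
        frontier = [(t1 + c, t2 + c) for (t1, t2) in frontier for c in "ab"]
        items.extend(frontier)
    return items

def s_strings(max_len):
    """S -> concatenate the two halves of a P-item."""
    return {t1 + t2 for (t1, t2) in p_items(max_len)}

# Every generated string has the form ww, e.g. "aabaaaba" = "aaba" + "aaba".
print("aabaaaba" in s_strings(4))   # True
print("aabaaabb" in s_strings(4))   # False: the two halves differ
```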

12 Intersection with an MCFG [the MCFG for { ww | w ∈ {a, b}* } intersected with the prefix aa: production rules over span-indexed nonterminals such as S 0,2, P 0,1;1,2, P 0,2;2,2, P e;2,2, E 0,1;2,2, E 2,2;2,2, A 0,1, A 1,2, A 2,2, B 2,2, together with two derivation trees of the intersected grammar] 136 / 196

13 Intersection grammars
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew   ∩   Mary believed *   =   G 2
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew   ∩   Mary believed John *   =   G 3
surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2)
137 / 196
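For readers who want to check the arithmetic without building the intersection grammars, the brute-force sketch below (my own, with an arbitrary depth bound and log base 2) approximates the two total weights by enumerating derivations of the toy PCFG directly.

```python
import math

# The toy PCFG from the slides (sentences represented as tuples of words).
RULES = {
    "S":  [(1.0, ("NP", "VP"))],
    "NP": [(0.3, ("John",)), (0.7, ("Mary",))],
    "VP": [(0.2, ("ran",)), (0.5, ("V", "NP")), (0.3, ("V", "S"))],
    "V":  [(0.4, ("believed",)), (0.6, ("knew",))],
}

def expand(symbol, depth):
    """All (sentence, probability) pairs derivable from symbol within the depth bound."""
    if symbol not in RULES:                       # terminal word
        return [((symbol,), 1.0)]
    if depth == 0:
        return []
    results = []
    for p, rhs in RULES[symbol]:
        partials = [((), p)]
        for sym in rhs:
            sub = expand(sym, depth - 1)
            partials = [(s + s2, q * q2) for s, q in partials for s2, q2 in sub]
        results.extend(partials)
    return results

def prefix_weight(prefix, depth=14):
    return sum(q for s, q in expand("S", depth) if s[:len(prefix)] == tuple(prefix))

g2 = prefix_weight(["Mary", "believed"])            # approximates total weight in G 2
g3 = prefix_weight(["Mary", "believed", "John"])    # approximates total weight in G 3
print(-math.log2(g3 / g2))   # about 1.74 bits, i.e. -log2(0.3); the log base is a choice
```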

14 Surprisal and entropy reduction surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2) entropy reduction at John = (entropy of G 2) − (entropy of G 3) 138 / 196

15 Computing sum of weights in a grammar ("partition function")
Z(A) = Σ over rules A → α of p(A → α) · Z(α)
Z(ɛ) = 1; Z(aβ) = Z(β) for a terminal a; Z(Bβ) = Z(B) · Z(β) for a nonterminal B
(Nederhof and Satta 2008)
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.4 V → believed, 0.6 V → knew
Z(V) = 0.4 + 0.6 = 1.0
Z(NP) = 0.3 + 0.7 = 1.0
Z(VP) = 0.2 + (0.5 · Z(V) · Z(NP)) = 0.2 + 0.5 = 0.7
Z(S) = 1.0 · Z(NP) · Z(VP) = 0.7
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
Z(V) = 0.4 + 0.6 = 1.0
Z(NP) = 0.3 + 0.7 = 1.0
Z(VP) = 0.2 + (0.5 · Z(V) · Z(NP)) + (0.3 · Z(V) · Z(S))
Z(S) = 1.0 · Z(NP) · Z(VP)
139 / 196
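A minimal sketch of this computation, using plain fixed-point iteration over the equations above; the grammar encoding is my own.

```python
from math import prod

# Repeatedly re-evaluate Z(A) = sum over rules A -> alpha of p(A -> alpha) times
# the product of Z over the nonterminals in alpha (terminals contribute a factor of 1).
RULES = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP": [(0.2, ["ran"]), (0.5, ["V", "NP"]), (0.3, ["V", "S"])],
    "V":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

def partition_function(rules, iterations=200):
    Z = {A: 0.0 for A in rules}
    for _ in range(iterations):
        Z = {A: sum(p * prod(Z.get(X, 1.0) for X in rhs) for p, rhs in prods)
             for A, prods in rules.items()}
    return Z

print(partition_function(RULES)["S"])
# Converges to 1.0 here; dropping the rule "0.3 VP -> V S" (the slide's first
# grammar) gives Z(VP) = Z(S) = 0.7, matching the hand calculation above.
```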

16 Computing entropy of a grammar
1.0 S → NP VP, 0.3 NP → John, 0.7 NP → Mary, 0.2 VP → ran, 0.5 VP → V NP, 0.3 VP → V S, 0.4 V → believed, 0.6 V → knew
h(S) = 0
h(NP) = entropy of (0.3, 0.7)
h(VP) = entropy of (0.2, 0.5, 0.3)
h(V) = entropy of (0.4, 0.6)
H(S) = h(S) + 1.0 (H(NP) + H(VP))
H(NP) = h(NP)
H(VP) = h(VP) + 0.2 (0) + 0.5 (H(V) + H(NP)) + 0.3 (H(V) + H(S))
H(V) = h(V)
(Hale 2006) 140 / 196
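The same fixed-point idea solves the entropy equations; the sketch below (my own encoding, entropies in bits) does this for the toy grammar above.

```python
from math import log2

# H(A) = h(A) + sum over rules A -> alpha of p(A -> alpha) * sum of H(B) for each
# nonterminal B in alpha, where h(A) is the entropy of the rule choice at A.
RULES = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.3, ["John"]), (0.7, ["Mary"])],
    "VP": [(0.2, ["ran"]), (0.5, ["V", "NP"]), (0.3, ["V", "S"])],
    "V":  [(0.4, ["believed"]), (0.6, ["knew"])],
}

def rule_choice_entropy(prods):
    return -sum(p * log2(p) for p, _ in prods if p > 0)

def grammar_entropy(rules, iterations=200):
    H = {A: 0.0 for A in rules}
    for _ in range(iterations):
        H = {A: rule_choice_entropy(prods)
                + sum(p * sum(H[X] for X in rhs if X in rules) for p, rhs in prods)
             for A, prods in rules.items()}
    return H

print(grammar_entropy(RULES)["S"])   # entropy (in bits) of the distribution over derivations
```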

17 Surprisal and entropy reduction surprisal at John = −log P(W 3 = John | W 1 = Mary, W 2 = believed) = −log (total weight in G 3 / total weight in G 2) entropy reduction at John = (entropy of G 2) − (entropy of G 3) 141 / 196

18 Putting it all together (Hale 2006) We can now put entropy reduction/surprisal together with a minimalist grammar to produce predictions about sentence comprehension difficulty! complexity metric + grammar → prediction (1) Write an MG that generates sentence types of interest (2) Convert MG to an MCFG (3) Add probabilities to MCFG based on corpus frequencies (or whatever else) (4) Compute intersection grammars for each point in a sentence (5) Calculate reduction in entropy across the course of the sentence (i.e. workload) 142 / 196

19 Demo 143 / 196

20 Hale (2006) 144 / 196

21 Hale (2006)
they have -ed forget -en that the boy who tell -ed the story be -s so young
the fact that the girl who pay -ed for the ticket be -s very poor doesnt matter
I know that the girl who get -ed the right answer be -s clever
he remember -ed that the man who sell -ed the house leave -ed the town
they have -ed forget -en that the letter which Dick write -ed yesterday be -s long
the fact that the cat which David show -ed to the man like -s eggs be -s strange
I know that the dog which Penny buy -ed today be -s very gentle
he remember -ed that the sweet which David give -ed Sally be -ed a treat
they have -ed forget -en that the man who Ann give -ed the present to be -ed old
the fact that the boy who Paul sell -ed the book to hate -s reading be -s strange
I know that the man who Stephen explain -ed the accident to be -s kind
he remember -ed that the dog which Mary teach -ed the trick to be -s clever
they have -ed forget -en that the box which Pat bring -ed the apple in be -ed lost
the fact that the girl who Sue write -ed the story with be -s proud doesnt matter
I know that the ship which my uncle take -ed Joe on be -ed interesting
he remember -ed that the food which Chris pay -ed the bill for be -ed cheap
they have -ed forget -en that the girl whose friend buy -ed the cake be -ed wait -ing
the fact that the boy whose brother tell -s lies be -s always honest surprise -ed us
I know that the boy whose father sell -ed the dog be -ed very sad
he remember -ed that the girl whose mother send -ed the clothe come -ed too late
they have -ed forget -en that the man whose house Patrick buy -ed be -ed so ill
the fact that the sailor whose ship Jim take -ed have -ed one leg be -s important
I know that the woman whose car Jenny sell -ed be -ed very angry
he remember -ed that the girl whose picture Clare show -ed us be -ed pretty
144 / 196

22 Hale (2006) 144 / 196

23 Hale (2006) 144 / 196

24 Hale (2006) Hale actually wrote two different MGs: classical adjunction analysis of relative clauses Kaynian/promotion analysis 145 / 196

25 Hale (2006) Hale actually wrote two different MGs: classical adjunction analysis of relative clauses Kaynian/promotion analysis The branching structure of the two MCFGs was different enough to produce distinct Entropy Reduction predictions. (Same corpus counts!) The Kaynian/promotion analysis produced a better fit for the Accessibility Hierarchy facts. (i.e. holding the complexity metric fixed to argue for a grammar) But there are some ways in which this method is insensitive to fine details of the MG formalism. 145 / 196

26 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 146 / 196

27 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge, ... 147 / 196

28 How to deal with adjuncts? A normal application of merge? [tree diagrams: often :: =v v selecting eat cake (category v)] Or a new kind of feature and distinct operation adjoin? [tree diagrams: often :: *v adjoining to eat cake (category v)] 148 / 196

29 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] 149 / 196

30 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] Or some combination of normal phrasal movements? (Koopman and Szabolcsi 2000) [tree diagram: a T headed by -s, with the DP cake and an XP containing the VP with eat and a trace moved into higher positions] 149 / 196

31 How to implement head movement? Modify merge to allow some additional string-shuffling in head-complement relationships? [tree diagrams: -s :: =v +k t merging with eat cake (category v), with the derived head pronounced as eats] Or some combination of normal phrasal movements? (Koopman and Szabolcsi 2000) [tree diagram: a T headed by -s, with the DP cake and an XP containing the VP with eat and a trace moved into higher positions] 149 / 196

32 Successive cyclic movement? [two derivation trees for a wh-question with an embedded clause, built from ɛ :: =t +wh c, will :: =v +k t, Mary :: subj -k, think :: =t =subj v, John :: subj -k, eat :: =obj =subj v, what :: obj -wh: in one, the wh-phrase's single -wh is checked only at the matrix +wh position; in the other, an extra +wh/-wh pair is checked at the embedded clause edge (will :: =v +k +wh t, what :: obj -wh -wh), i.e. movement is successive-cyclic] 150 / 196

33 Successive cyclic movement? [two derivation trees for a wh-question with an embedded clause, built from ɛ :: =t +wh c, will :: =v +k t, Mary :: subj -k, think :: =t =subj v, John :: subj -k, eat :: =obj =subj v, what :: obj -wh: in one, the wh-phrase's single -wh is checked only at the matrix +wh position; in the other, an extra +wh/-wh pair is checked at the embedded clause edge (will :: =v +k +wh t, what :: obj -wh -wh), i.e. movement is successive-cyclic] 150 / 196

34 Unifying feature-checking (one way) [two derivations of John will seem to eat cake, side by side: on the left, the standard MG derivation built with merge and move (passing through categories =v +k t 1, +k t, -k 0, t 0); on the right, the same sentence derived with the unified operations insert and mrg (insert will, then mrg checks +v and +k, passing through +v +k -t 1, +k -t, -k 0, -t 0)] (Stabler 2006, Hunter 2011) 151 / 196

35 Unifying feature-checking (one way) [two derivations of John will seem to eat cake, side by side: on the left, the standard MG derivation built with merge and move (passing through categories =v +k t 1, +k t, -k 0, t 0); on the right, the same sentence derived with the unified operations insert and mrg (insert will, then mrg checks +v and +k, passing through +v +k -t 1, +k -t, -k 0, -t 0)] (Stabler 2006, Hunter 2011) 151 / 196

36 Three schemas for merge rules:
st, t 1,..., t k :: γ, α 1,..., α k 0 → s :: =fγ 1   t, t 1,..., t k :: f, α 1,..., α k n
ts, s 1,..., s j, t 1,..., t k :: γ, α 1,..., α j, β 1,..., β k 0 → s, s 1,..., s j :: =fγ, α 1,..., α j 0   t, t 1,..., t k :: f, β 1,..., β k n
s, s 1,..., s j, t, t 1,..., t k :: γ, α 1,..., α j, δ, β 1,..., β k 0 → s, s 1,..., s j :: =fγ, α 1,..., α j n   t, t 1,..., t k :: fδ, β 1,..., β k n
Two schemas for move rules:
s i s, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 0
s, s 1,..., s i,..., s k :: γ, α 1,..., α i−1, δ, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -fδ, α i+1,..., α k 0
152 / 196

37 One schema for insert rules:
s, s 1,..., s j, t, t 1,..., t k :: +fγ, α 1,..., α j, -fγ, β 1,..., β k n → s, s 1,..., s j :: +fγ, α 1,..., α j n   t, t 1,..., t k :: -fγ, β 1,..., β k n
Three schemas for mrg rules:
ss i, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 1
s i s, s 1,..., s i−1, s i+1,..., s k :: γ, α 1,..., α i−1, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -f, α i+1,..., α k 0
s, s 1,..., s i,..., s k :: γ, α 1,..., α i−1, δ, α i+1,..., α k 0 → s, s 1,..., s i,..., s k :: +fγ, α 1,..., α i−1, -fδ, α i+1,..., α k 0
153 / 196

38 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge, ... 154 / 196

39 Subtly different minimalist frameworks Minimalist grammars with many choices of different bells and whistles can all be expressed with context-free derivational structure. Must keep an eye on finiteness of number of types (SMC or equivalent)! See Stabler (2011) Some points of variation: adjunction, head movement, phases, move as re-merge... Each variant of the formalism expresses a different hypothesis about the set of primitive grammatical operations. (We are looking for ways to tell these apart!) The shapes of the derivation trees are generally very similar from one variant to the next. But variants will make different classifications of the derivational steps involved, according to which operation is being applied. 154 / 196

40 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 155 / 196

41 Probabilities on MCFGs
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ 1, λ 2, etc. make the training corpus most likely? 156 / 196

42 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise 157 / 196

43 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
157 / 196

44 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 158 / 196

45 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 [derived tree and derivation tree: often :: =v v merging with praise plus pending who (category v, -wh)] st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0 158 / 196

46 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
159 / 196

47 Problem #1 with the naive parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
count( v 0 → =d v 1 d 1 ) / count( v 0 ) = 95/100
count( v,-wh 0 → =d v 1 d -wh 1 ) / count( v,-wh 0 ) = 2/3
This training setup doesn't know which minimalist-grammar operations are being implemented by the various MCFG rules. 159 / 196

48 Naive parametrization [diagram: grammar G A and the training corpus feeding the naive parametrization] 160 / 196

49 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 161 / 196

50 A (slightly) more complicated grammar
ɛ :: =t c   ɛ :: =t +wh c   will :: =v =subj t   shave :: v   shave :: =obj v   boys :: subj   who :: subj -wh   boys :: =x =det subj   ɛ :: x   foo :: det   themselves :: =ant obj   ɛ :: =subj ant -subj   will :: =v +subj t
Generated sentences: boys will shave; boys will shave themselves; who will shave; who will shave themselves; foo boys will shave; foo boys will shave themselves
Some details:
Subject is base-generated in SpecTP; no movement for Case
Transitive and intransitive versions of shave
foo is a determiner that optionally combines with boys to make a subject
Dummy feature x to fill complement of boys so that foo goes on the left
themselves can appear in object position, via a movement theory of reflexives: a subj can be turned into an ant -subj, and themselves combines with an ant to make an obj
will can attract its subject by move as well as merge
162 / 196

51 [derived trees and derivation trees: foo boys built as a subj (boys :: =x =det subj merges with ɛ :: x and then with foo :: det); and themselves built as an obj with a pending -subj mover (ɛ :: =subj ant -subj merges with boys :: subj, and themselves :: =ant obj merges with the result)] 163 / 196

52 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0
c 0 → +wh c, -wh 0
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0
ant -subj 0 → =subj ant -subj 1   subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0
t 0 → =subj t 0   subj 1
t 0 → +subj t, -subj 0
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1
t, -wh 0 → +subj t, -subj, -wh 0
164 / 196

53 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1
-c 0 → +wh -c, -wh 0
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0
+subj -t, -subj 0 → +subj -t 0   -subj 1
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0
-t, -wh 0 → +subj -t, -subj, -wh 0
165 / 196

54 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave 166 / 196

55 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave With merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves With merge and move as unified operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves 166 / 196

56 Problem #2 with the naive parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave With merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves With merge and move as unified operations: boys will shave foo boys will shave who will shave boys will shave themselves who will shave themselves This treatment of probabilities doesn't know which minimalist-grammar operations are being implemented by the various MCFG rules. So the probabilities are unaffected by changes in the set of primitive operations. 166 / 196

57 Naive parametrization [diagram: grammar G A with the naive parametrization and the training corpus, grammar G B1 with the naive parametrization and the training corpus, and grammar G B2 with the naive parametrization] 167 / 196

58 Outline 13 Easy probabilities with context-free structure 14 Different frameworks 15 Problem #1 with the naive parametrization 16 Problem #2 with the naive parametrization 17 Solution: Faithfulness to MG operations 168 / 196

59 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection). (Hunter and Dyer 2013) 169 / 196

60 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
(Hunter and Dyer 2013) 169 / 196

61 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
(Hunter and Dyer 2013) 169 / 196

62 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
(Hunter and Dyer 2013) 169 / 196

63 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
s(r 3) = exp(λ merge + λ d)
(Hunter and Dyer 2013) 169 / 196

64 The smarter parametrization Solution: Have a rule's probability be a function of (only) what it does: merge or move, and what feature is being checked (either movement or selection).
MCFG rules and the features that fire for each (out of φ merge, φ d, φ v, φ t, φ move, φ wh):
r 1: st :: c 0 → s :: =t c 1   t :: t 0   (φ merge, φ t)
r 2: ts :: c 0 → s, t :: +wh c, -wh 0   (φ move, φ wh)
r 3: st :: v 0 → s :: =d v 1   t :: d 1   (φ merge, φ d)
r 4: st :: v 0 → s :: =v v 1   t :: v 0   (φ merge, φ v)
r 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1   (φ merge, φ d)
r 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0   (φ merge, φ v)
Each rule r is assigned a score as a function of the vector φ(r): s(r) = exp(λ · φ(r)) = exp(λ merge φ merge(r) + λ d φ d(r) + λ v φ v(r) +... )
s(r 1) = exp(λ merge + λ t)
s(r 2) = exp(λ move + λ wh)
s(r 3) = exp(λ merge + λ d)
s(r 5) = exp(λ merge + λ d)
(Hunter and Dyer 2013) 169 / 196

65 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 170 / 196

66 Generalizations missed by the naive parametrization [derived tree and derivation tree: often :: =v v merging with praise marie (category v)] st :: v 0 → s :: =v v 1   t :: v 0 [derived tree and derivation tree: often :: =v v merging with praise plus pending who (category v, -wh)] st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0 170 / 196

67 Comparison
The old way:
λ 1: ts :: c 0 → s, t :: +wh c, -wh 0
λ 2: st :: c 0 → s :: =t c 1   t :: t 0
λ 3: st :: v 0 → s :: =d v 1   t :: d 1
λ 4: st :: v 0 → s :: =v v 1   t :: v 0
λ 5: s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
λ 6: st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ 1, λ 2, etc. make the training corpus most likely?
The new way:
exp(λ move + λ wh): ts :: c 0 → s, t :: +wh c, -wh 0
exp(λ merge + λ t): st :: c 0 → s :: =t c 1   t :: t 0
exp(λ merge + λ d): st :: v 0 → s :: =d v 1   t :: d 1
exp(λ merge + λ v): st :: v 0 → s :: =v v 1   t :: v 0
exp(λ merge + λ d): s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
exp(λ merge + λ v): st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
Training question: What values of λ merge, λ move, λ d, etc. make the training corpus most likely?
171 / 196
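As an illustration of the new way (not from the slides), the sketch below scores each rule as exp(λ · φ(r)) and normalizes scores among rules that expand the same category; the λ values are invented and the rule encoding is my own. With the features tied this way, the =d-vs-=v choice necessarily gets the same probability in the plain v rules and in the v, -wh rules, which is exactly the generalization the naive parametrization missed.

```python
from math import exp

# Hypothetical lambda values, one per operation and one per feature checked.
weights = {"merge": 0.0, "move": -1.0, "d": 0.5, "v": 0.2, "t": 0.0, "wh": -0.5}

# (left-hand category, features that fire) for the six rules above
rules = [
    ("c",     {"merge", "t"}),    # st :: c     <- s :: =t c, t :: t
    ("c",     {"move", "wh"}),    # ts :: c     <- s,t :: +wh c, -wh
    ("v",     {"merge", "d"}),    # st :: v     <- s :: =d v, t :: d
    ("v",     {"merge", "v"}),    # st :: v     <- s :: =v v, t :: v
    ("v,-wh", {"merge", "d"}),    # s,t :: v,-wh  <- s :: =d v, t :: d -wh
    ("v,-wh", {"merge", "v"}),    # st,u :: v,-wh <- s :: =v v, t,u :: v,-wh
]

def score(features):
    return exp(sum(weights[f] for f in features))

def rule_probs(rules):
    totals = {}
    for lhs, feats in rules:                      # normalize within each category
        totals[lhs] = totals.get(lhs, 0.0) + score(feats)
    return [(lhs, feats, score(feats) / totals[lhs]) for lhs, feats in rules]

for lhs, feats, p in rule_probs(rules):
    print(f"{lhs:6s} {sorted(feats)} -> {p:.3f}")
```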

68 Solution #1 with the smarter parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise Maximise likelihood via stochastic gradient ascent: P λ (N → δ) = exp(λ · φ(N → δ)) / Σ δ′ exp(λ · φ(N → δ′)) 172 / 196
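Here is a rough sketch of that training step under simplifying assumptions of my own: the derivations of the training corpus are taken as fully observed, so each observed rule use contributes its feature vector minus the expected feature vector of the rules competing for the same category, and plain batch gradient ascent stands in for the stochastic version. The rule-use counts are the ones read off the Problem #1 corpus above (95 vs. 5 for plain v, 2 vs. 1 for v with -wh, 95 declaratives vs. 2 questions).

```python
from math import exp

FEATURES = ["merge", "move", "d", "v", "t", "wh"]

rules = {                       # lhs category -> list of (rule id, features that fire)
    "c":     [("c->t", {"merge", "t"}), ("c->wh", {"move", "wh"})],
    "v":     [("v->d", {"merge", "d"}), ("v->v", {"merge", "v"})],
    "v,-wh": [("vwh->d", {"merge", "d"}), ("vwh->v", {"merge", "v"})],
}
observed = {"c->t": 95, "c->wh": 2, "v->d": 95, "v->v": 5, "vwh->d": 2, "vwh->v": 1}

def local_probs(lam, options):
    scores = [exp(sum(lam[f] for f in feats)) for _, feats in options]
    z = sum(scores)
    return [s / z for s in scores]

def gradient(lam):
    # d(log-likelihood)/d(lambda_f) = observed feature counts minus expected counts
    grad = {f: 0.0 for f in FEATURES}
    for lhs, options in rules.items():
        probs = local_probs(lam, options)
        total = sum(observed[r] for r, _ in options)
        for (r, feats), p in zip(options, probs):
            for f in FEATURES:
                grad[f] += observed[r] * (f in feats) - total * p * (f in feats)
    return grad

lam = {f: 0.0 for f in FEATURES}
for step in range(500):                     # plain batch gradient ascent
    g = gradient(lam)
    lam = {f: lam[f] + 0.01 * g[f] for f in FEATURES}

# The tied parametrization pools the d-vs-v choice across plain v and v,-wh:
print(local_probs(lam, rules["v"]))      # roughly [0.94, 0.06] (i.e. 97/103, 6/103)
print(local_probs(lam, rules["v,-wh"]))  # the same, unlike the naive 2/3 vs 1/3
```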

69 Solution #1 with the smarter parametrization Grammar pierre :: d marie :: d praise :: =d v often :: =v v who :: d -wh will :: =v =d t ɛ :: =t c ɛ :: =t +wh c Training data 90 pierre will praise marie 5 pierre will often praise marie 1 who pierre will praise 1 who pierre will often praise Maximise likelihood via stochastic gradient ascent: P λ (N → δ) = exp(λ · φ(N → δ)) / Σ δ′ exp(λ · φ(N → δ′))
Resulting probabilities (naive vs. smarter):
st :: v 0 → s :: =d v 1   t :: d 1
st :: v 0 → s :: =v v 1   t :: v 0
s, t :: v, -wh 0 → s :: =d v 1   t :: d -wh 1
st, u :: v, -wh 0 → s :: =v v 1   t, u :: v, -wh 0
172 / 196

70 Naive parametrization / Smarter parametrization [diagram: grammar G A with the naive parametrization and the training corpus, grammar G A with the smarter parametrization and the training corpus, grammar G B1 with the naive parametrization and the training corpus, and grammar G B2 with the naive parametrization] / 196

71 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave 174 / 196

72 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves 174 / 196

73 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves 174 / 196

74 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves [Entropy and Entropy Reduction, word by word, for who will shave themselves: 2.09] Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves [Entropy and Entropy Reduction, word by word, for who will shave themselves: 2.13] 174 / 196

75 Solution #2 with the smarter parametrization normal MG re-merge MG MCFG MCFG Language of both grammars boys will shave boys will shave themselves who will shave who will shave themselves foo boys will shave foo boys will shave themselves Training data 10 boys will shave 2 boys will shave themselves 3 who will shave 1 who will shave themselves 5 foo boys will shave Merge and move distinct operations: boys will shave foo boys will shave who will shave boys will shave themselves foo boys will shave themselves who will shave themselves Merge and move unified: boys will shave foo boys will shave who will shave who will shave themselves boys will shave themselves foo boys will shave themselves 174 / 196

76 Naive parametrization / Smarter parametrization [diagram: grammars G A, G B1 and G B2, each paired with the training corpus under both the naive and the smarter parametrization] / 196

77 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0
c 0 → +wh c, -wh 0
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0
ant -subj 0 → =subj ant -subj 1   subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0
t 0 → =subj t 0   subj 1
t 0 → +subj t, -subj 0
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1
t, -wh 0 → +subj t, -subj, -wh 0
/ 196

78 Choice points in the MG-derived MCFG
Question or not?
c 0 → =t c 0   t 0   exp(λ merge + λ t)
c 0 → +wh c, -wh 0   exp(λ move + λ wh)
Antecedent lexical or complex?
ant -subj 0 → =subj ant -subj 1   subj 0   exp(λ merge + λ subj)
ant -subj 0 → =subj ant -subj 1   subj 1   exp(λ merge + λ subj)
Non-wh subject merged and complex, merged and lexical, or moved?
t 0 → =subj t 0   subj 0   exp(λ merge + λ subj)
t 0 → =subj t 0   subj 1   exp(λ merge + λ subj)
t 0 → +subj t, -subj 0   exp(λ move + λ subj)
Wh-phrase same as moving subject or separated because of doubling?
t, -wh 0 → =subj t 0   subj -wh 1   exp(λ merge + λ subj)
t, -wh 0 → +subj t, -subj, -wh 0   exp(λ move + λ subj)
176 / 196

79 Learned weights on the MG λ t = exp(λ t) = λ subj = exp(λ v) = λ wh = exp(λ wh) = λ merge = exp(λ merge) = λ move = exp(λ move) = / 196

80 Learned weights on the MG
λ t =   exp(λ t) =   λ subj =   exp(λ v) =   λ wh =   exp(λ wh) =   λ merge =   exp(λ merge) =   λ move =   exp(λ move) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = exp(λ move) / (exp(λ merge) + exp(λ move)) =
P(wh-phrase non-reflexivized) = exp(λ merge) / (exp(λ merge) + exp(λ move)) =
P(question) = exp(λ move + λ wh) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-question) = exp(λ merge + λ t) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-wh subject merged and complex) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject merged and lexical) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject moved) = exp(λ move) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
177 / 196

81 Learned weights on the MG
λ t =   exp(λ t) =   λ subj =   exp(λ v) =   λ wh =   exp(λ wh) =   λ merge =   exp(λ merge) =   λ move =   exp(λ move) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = exp(λ move) / (exp(λ merge) + exp(λ move)) =
P(wh-phrase non-reflexivized) = exp(λ merge) / (exp(λ merge) + exp(λ move)) =
P(question) = exp(λ move + λ wh) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-question) = exp(λ merge + λ t) / (exp(λ merge + λ t) + exp(λ move + λ wh)) =
P(non-wh subject merged and complex) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject merged and lexical) = exp(λ merge) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(non-wh subject moved) = exp(λ move) / (exp(λ merge) + exp(λ merge) + exp(λ move)) =
P(who will shave) =   P(boys will shave themselves) =
177 / 196

82 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1
-c 0 → +wh -c, -wh 0
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0
+subj -t, -subj 0 → +subj -t 0   -subj 1
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0
-t, -wh 0 → +subj -t, -subj, -wh 0
/ 196

83 Choice points in the IMG-derived MCFG
Question or not?
-c 0 → +t -c, -t 1   exp(λ mrg + λ t)
-c 0 → +wh -c, -wh 0   exp(λ mrg + λ wh)
Antecedent lexical or complex?
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 0   exp(λ insert)
+subj -ant -subj, -subj 0 → +subj -ant -subj 0   -subj 1   exp(λ insert)
Non-wh subject merged and complex, merged and lexical, or moved?
+subj -t, -subj 0 → +subj -t 0   -subj 0   exp(λ insert)
+subj -t, -subj 0 → +subj -t 0   -subj 1   exp(λ insert)
+subj -t, -subj 0 → +v +subj -t, -v, -subj 1   exp(λ mrg + λ v)
Wh-phrase same as moving subject or separated because of doubling?
-t, -wh 0 → +subj -t, -subj -wh 0   exp(λ mrg + λ subj)
-t, -wh 0 → +subj -t, -subj, -wh 0   exp(λ mrg + λ subj)
178 / 196

84 Learned weights on the IMG λ t = exp(λ t) = λ v = exp(λ v) = λ wh = exp(λ wh) = λ insert = exp(λ insert) = λ mrg = exp(λ mrg) = / 196

85 Learned weights on the IMG
λ t =   exp(λ t) =   λ v =   exp(λ v) =   λ wh =   exp(λ wh) =   λ insert =   exp(λ insert) =   λ mrg =   exp(λ mrg) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = 0.5
P(wh-phrase non-reflexivized) = 0.5
P(question) = exp(λ mrg + λ wh) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ wh) / (exp(λ t) + exp(λ wh)) =
P(non-question) = exp(λ mrg + λ t) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ t) / (exp(λ t) + exp(λ wh)) =
P(non-wh subject merged and lexical) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject merged and complex) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject moved) = exp(λ mrg + λ v) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
179 / 196

86 Learned weights on the IMG
λ t =   exp(λ t) =   λ v =   exp(λ v) =   λ wh =   exp(λ wh) =   λ insert =   exp(λ insert) =   λ mrg =   exp(λ mrg) =
P(antecedent is lexical) = 0.5
P(antecedent is non-lexical) = 0.5
P(wh-phrase reflexivized) = 0.5
P(wh-phrase non-reflexivized) = 0.5
P(question) = exp(λ mrg + λ wh) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ wh) / (exp(λ t) + exp(λ wh)) =
P(non-question) = exp(λ mrg + λ t) / (exp(λ mrg + λ t) + exp(λ mrg + λ wh)) = exp(λ t) / (exp(λ t) + exp(λ wh)) =
P(non-wh subject merged and lexical) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject merged and complex) = exp(λ insert) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(non-wh subject moved) = exp(λ mrg + λ v) / (exp(λ insert) + exp(λ insert) + exp(λ mrg + λ v)) =
P(who will shave) =   P(boys will shave themselves) =
180 / 196

87 References I Billot, S. and Lang, B. (1989). The structure of shared forests in ambiguous parsing. In Proceedings of the 1989 Meeting of the Association of Computational Linguistics. Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press, Cambridge, MA. Chomsky, N. (1980). Rules and Representations. Columbia University Press, New York. Ferreira, F. (2005). Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review, 22: Gärtner, H.-M. and Michaelis, J. (2010). On the Treatment of Multiple-Wh Interrogatives in Minimalist Grammars. In Hanneforth, T. and Fanselow, G., editors, Language and Logos, pages Akademie Verlag, Berlin. Gibson, E. and Wexler, K. (1994). Triggers. Linguistic Inquiry, 25: Hale, J. (2006). Uncertainty about the rest of the sentence. Cognitive Science, 30:643–672. Hale, J. T. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics. Hunter, T. (2011). Insertion Minimalist Grammars: Eliminating redundancies between merge and move. In Kanazawa, M., Kornai, A., Kracht, M., and Seki, H., editors, The Mathematics of Language (MOL 12 Proceedings), volume 6878 of LNCS, pages , Berlin Heidelberg. Springer. Hunter, T. and Dyer, C. (2013). Distributions on minimalist grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language. Koopman, H. and Szabolcsi, A. (2000). Verbal Complexes. MIT Press, Cambridge, MA.

88 References II Lang, B. (1988). Parsing incomplete sentences. In Proceedings of the 12th International Conference on Computational Linguistics, pages Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3): Michaelis, J. (2001). Derivational minimalism is mildly context-sensitive. In Moortgat, M., editor, Logical Aspects of Computational Linguistics, volume 2014 of LNCS, pages Springer, Berlin Heidelberg. Miller, G. A. and Chomsky, N. (1963). Finitary models of language users. In Luce, R. D., Bush, R. R., and Galanter, E., editors, Handbook of Mathematical Psychology, volume 2. Wiley and Sons, New York. Morrill, G. (1994). Type Logical Grammar: Categorial Logic of Signs. Kluwer, Dordrecht. Nederhof, M. J. and Satta, G. (2008). Computing partition functions of pcfgs. Research on Language and Computation, 6(2): Seki, H., Matsumara, T., Fujii, M., and Kasami, T. (1991). On multiple context-free grammars. Theoretical Computer Science, 88: Stabler, E. P. (2006). Sidewards without copying. In Wintner, S., editor, Proceedings of The 11th Conference on Formal Grammar, pages , Stanford, CA. CSLI Publications. Stabler, E. P. (2011). Computational perspectives on minimalism. In Boeckx, C., editor, The Oxford Handbook of Linguistic Minimalism. Oxford University Press, Oxford. Stabler, E. P. and Keenan, E. L. (2003). Structural similarity within and among languages. Theoretical Computer Science, 293:

89 References III Vijay-Shanker, K., Weir, D. J., and Joshi, A. K. (1987). Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th Meeting of the Association for Computational Linguistics, pages Weir, D. (1988). Characterizing mildly context-sensitive grammar formalisms. PhD thesis, University of Pennsylvania. Yngve, V. H. (1960). A model and an hypothesis for language structure. In Proceedings of the American Philosophical Society, volume 104, pages


More information

Probabilistic Context-Free Grammar

Probabilistic Context-Free Grammar Probabilistic Context-Free Grammar Petr Horáček, Eva Zámečníková and Ivana Burgetová Department of Information Systems Faculty of Information Technology Brno University of Technology Božetěchova 2, 612

More information

2013 ISSN: JATLaC Journal 8: t 1. t t Chomsky 1993 I Radford (2009) R I t t R I 2. t R t (1) (= R's (15), p. 86) He could have helped

2013 ISSN: JATLaC Journal 8: t 1. t t Chomsky 1993 I Radford (2009) R I t t R I 2. t R t (1) (= R's (15), p. 86) He could have helped t 1. tt Chomsky 1993 IRadford (2009) R Itt R I 2. t R t (1) (= R's (15), p. 86) He could have helped her, or [she have helped him]. 2 have I has had I I could ellipsis I gapping R (2) (= R (18), p.88)

More information

A Context-Free Grammar

A Context-Free Grammar Statistical Parsing A Context-Free Grammar S VP VP Vi VP Vt VP VP PP DT NN PP PP P Vi sleeps Vt saw NN man NN dog NN telescope DT the IN with IN in Ambiguity A sentence of reasonable length can easily

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

PENGUIN READERS. Five Famous Fairy Tales

PENGUIN READERS. Five Famous Fairy Tales PENGUIN READERS Five Famous Fairy Tales Introduction Jacob and Wilhelm Grimm the Brothers Grimm were good friends. Jacob was a quiet man and sometimes sad. Wilhelm was often very ill but he was a happier

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP NP PP 1.0. N people 0.

S NP VP 0.9 S VP 0.1 VP V NP 0.5 VP V 0.1 VP V PP 0.1 NP NP NP 0.1 NP NP PP 0.2 NP N 0.7 PP P NP 1.0 VP  NP PP 1.0. N people 0. /6/7 CS 6/CS: Natural Language Processing Instructor: Prof. Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang The grammar: Binary, no epsilons,.9..5

More information

Latent Variable Models in NLP

Latent Variable Models in NLP Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable

More information

Ling 240 Lecture #15. Syntax 4

Ling 240 Lecture #15. Syntax 4 Ling 240 Lecture #15 Syntax 4 agenda for today Give me homework 3! Language presentation! return Quiz 2 brief review of friday More on transformation Homework 4 A set of Phrase Structure Rules S -> (aux)

More information

Strong connectivity hypothesis and generative power in TAG

Strong connectivity hypothesis and generative power in TAG 15 Strong connectivity hypothesis and generative power in TAG Alessandro Mazzei, Vincenzo ombardo and Patrick Sturt Abstract Dynamic grammars are relevant for psycholinguistic modelling and speech processing.

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences

The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences P3-26 The Degree of Commitment to the Initial Analysis Predicts the Cost of Reanalysis: Evidence from Japanese Garden-Path Sentences Chie Nakamura, Manabu Arai Keio University, University of Tokyo, JSPS

More information

Introduction to Semantic Parsing with CCG

Introduction to Semantic Parsing with CCG Introduction to Semantic Parsing with CCG Kilian Evang Heinrich-Heine-Universität Düsseldorf 2018-04-24 Table of contents 1 Introduction to CCG Categorial Grammar (CG) Combinatory Categorial Grammar (CCG)

More information

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22 Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Where linguistic meaning meets non-linguistic cognition

Where linguistic meaning meets non-linguistic cognition Where linguistic meaning meets non-linguistic cognition Tim Hunter and Paul Pietroski NASSLLI 2016 Friday: Putting things together (perhaps) Outline 5 Experiments with kids on most 6 So: How should we

More information

Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters

Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters Computational Psycholinguistics Lecture 2: human syntactic parsing, garden pathing, grammatical prediction, surprisal, particle filters Klinton Bicknell & Roger Levy LSA 2015 Summer Institute University

More information

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis 1 Introduction Parenthesis Matching Problem Describe the set of arithmetic expressions with correctly matched parenthesis. Arithmetic expressions with correctly matched parenthesis cannot be described

More information

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0 Massachusetts Institute of Technology 6.863J/9.611J, Natural Language Processing, Spring, 2001 Department of Electrical Engineering and Computer Science Department of Brain and Cognitive Sciences Handout

More information

Probabilistic Context-free Grammars

Probabilistic Context-free Grammars Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John

More information

Ling 98a: The Meaning of Negation (Week 5)

Ling 98a: The Meaning of Negation (Week 5) Yimei Xiang yxiang@fas.harvard.edu 15 October 2013 1 Review Negation in propositional logic, oppositions, term logic of Aristotle Presuppositions Projection and accommodation Three-valued logic External/internal

More information

The Formal Architecture of. Lexical-Functional Grammar. Ronald M. Kaplan and Mary Dalrymple

The Formal Architecture of. Lexical-Functional Grammar. Ronald M. Kaplan and Mary Dalrymple The Formal Architecture of Lexical-Functional Grammar Ronald M. Kaplan and Mary Dalrymple Xerox Palo Alto Research Center 1. Kaplan and Dalrymple, ESSLLI 1995, Barcelona Architectural Issues Representation:

More information

Features of Statistical Parsers

Features of Statistical Parsers Features of tatistical Parsers Preliminary results Mark Johnson Brown University TTI, October 2003 Joint work with Michael Collins (MIT) upported by NF grants LI 9720368 and II0095940 1 Talk outline tatistical

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

An introduction to mildly context sensitive grammar formalisms. Combinatory Categorial Grammar

An introduction to mildly context sensitive grammar formalisms. Combinatory Categorial Grammar An introduction to mildly context sensitive grammar formalisms Combinatory Categorial Grammar Gerhard Jäger & Jens Michaelis University of Potsdam {jaeger,michael}@ling.uni-potsdam.de p.1 Basic Categorial

More information

MoL 13. The 13th Meeting on the Mathematics of Language. Proceedings

MoL 13. The 13th Meeting on the Mathematics of Language. Proceedings MoL 13 The 13th Meeting on the Mathematics of Language Proceedings August 9, 2013 Sofia, Bulgaria Production and Manufacturing by Omnipress, Inc. 2600 Anderson Street Madison, WI 53704 USA c 2013 The Association

More information

The Lambek-Grishin calculus for unary connectives

The Lambek-Grishin calculus for unary connectives The Lambek-Grishin calculus for unary connectives Anna Chernilovskaya Utrecht Institute of Linguistics OTS, Utrecht University, the Netherlands anna.chernilovskaya@let.uu.nl Introduction In traditional

More information

Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095

Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095 Compositionality and Syntactic Structure Marcus Kracht Department of Linguistics UCLA 3125 Campbell Hall 405 Hilgard Avenue Los Angeles, CA 90095 1543 kracht@humnet.ucla.edu 1. The Questions ➀ Why does

More information

CAS LX 522 Syntax I November 4, 2002 Week 9: Wh-movement, supplement

CAS LX 522 Syntax I November 4, 2002 Week 9: Wh-movement, supplement CAS LX 522 Syntax I November 4, 2002 Fall 2002 Week 9: Wh-movement, supplement () Italian Tuo fratello ( your brother ), [ CP a cui i [ TP mi domando [ CP che storie i [ TP abbiano raccontato t i t j...

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

The relation of surprisal and human processing

The relation of surprisal and human processing The relation of surprisal and human processing difficulty Information Theory Lecture Vera Demberg and Matt Crocker Information Theory Lecture, Universität des Saarlandes April 19th, 2015 Information theory

More information

One hint from secondary predication (from Baker 1997 (8) A secondary predicate cannot take the goal argument as subject of predication, wheth

One hint from secondary predication (from Baker 1997 (8) A secondary predicate cannot take the goal argument as subject of predication, wheth MIT, Fall 2003 1 The Double Object Construction (Larson 1988, Aoun & Li 1989) MIT, 24.951, Fr 14 Nov 2003 A familiar puzzle The Dative Alternation (1) a. I gave the candy to the children b. I gave the

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005

A DOP Model for LFG. Rens Bod and Ronald Kaplan. Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 A DOP Model for LFG Rens Bod and Ronald Kaplan Kathrin Spreyer Data-Oriented Parsing, 14 June 2005 Lexical-Functional Grammar (LFG) Levels of linguistic knowledge represented formally differently (non-monostratal):

More information

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Rebecca Nesson School of Engineering and Applied Sciences Harvard University Giorgio Satta Department of Information

More information

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning

Probabilistic Context Free Grammars. Many slides from Michael Collins and Chris Manning Probabilistic Context Free Grammars Many slides from Michael Collins and Chris Manning Overview I Probabilistic Context-Free Grammars (PCFGs) I The CKY Algorithm for parsing with PCFGs A Probabilistic

More information

Remembering subresults (Part I): Well-formed substring tables

Remembering subresults (Part I): Well-formed substring tables Remembering subresults (Part I): Well-formed substring tables Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01, 1. February 2005 Problem: Inefficiency of recomputing subresults Two

More information

Reference-Set Computation = Minimalism + Transformational Rules?

Reference-Set Computation = Minimalism + Transformational Rules? Reference-Set Computation = inimalism + Transformational Rules? Thomas Graf tgraf@ucla.edu tgraf.bol.ucla.edu University of California, Los Angeles Universität Bielefeld, Bielefeld, Germany December 8,

More information

Context- Free Parsing with CKY. October 16, 2014

Context- Free Parsing with CKY. October 16, 2014 Context- Free Parsing with CKY October 16, 2014 Lecture Plan Parsing as Logical DeducBon Defining the CFG recognibon problem BoHom up vs. top down Quick review of Chomsky normal form The CKY algorithm

More information

Improved TBL algorithm for learning context-free grammar

Improved TBL algorithm for learning context-free grammar Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski

More information

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument

More information

Moreno Mitrović. The Saarland Lectures on Formal Semantics

Moreno Mitrović. The Saarland Lectures on Formal Semantics ,, 3 Moreno Mitrović The Saarland Lectures on Formal Semantics λ- λ- λ- ( λ- ) Before we move onto this, let's recall our f -notation for intransitive verbs 1/33 λ- ( λ- ) Before we move onto this, let's

More information

Derivational order and ACGs

Derivational order and ACGs Derivational order and ACGs Gregory M. Kobele 1 Jens Michaelis 2 1 University of Chicago, Chicago, USA 2 Bielefeld University, Bielefeld, Germany ACG@10, Bordeaux, December 7 9, 2011 Greg & Jens (Chicago

More information

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars in der emantik cope emantics in Lexicalized Tree Adjoining Grammars Laura Heinrich-Heine-Universität Düsseldorf Wintersemester 2011/2012 LTAG: The Formalism (1) Tree Adjoining Grammars (TAG): Tree-rewriting

More information

A Computational Approach to Minimalism

A Computational Approach to Minimalism A Computational Approach to Minimalism Alain LECOMTE CLIPS-IMAG BP-53 38041 Grenoble, cedex 9, France Email: Alain.Lecomte@upmf-grenoble.fr Abstract The aim of this paper is to recast minimalist principles

More information

Finite Presentations of Pregroups and the Identity Problem

Finite Presentations of Pregroups and the Identity Problem 6 Finite Presentations of Pregroups and the Identity Problem Alexa H. Mater and James D. Fix Abstract We consider finitely generated pregroups, and describe how an appropriately defined rewrite relation

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 14 Ana Bove May 14th 2018 Recap: Context-free Grammars Simplification of grammars: Elimination of ǫ-productions; Elimination of

More information