Logical and Computational Structures for Linguistic Modeling Part 3 Mildly Context-Sensitive Formalisms

Size: px
Start display at page:

Download "Logical and Computational Structures for Linguistic Modeling Part 3 Mildly Context-Sensitive Formalisms"

Transcription

1 Logical and Computational Structures for Linguistic Modeling Part 3 Mildly Context-Sensitive Formalisms Éric de la Clergerie <Eric.De_La_Clergerie@inria.fr> 30 Septembre 2014 Éric. de la Clergerie TAL 30/09/ / 91

2 Part I Tree Adjoining Grammars Éric. de la Clergerie TAL 30/09/ / 91

3 Extending CFG with structures CFG NP Éric. de la Clergerie TAL 30/09/ / 91

4 Extending CFG with structures CFG NP Datalog V(sing) DCG LFG HPSG S(gap(np)) λ-prolog Vocabulary Complexity unification Éric. de la Clergerie TAL 30/09/ / 91

5 Extending CFG with structures Derivation complexity combining structures RCG LCFRS TAG LIG N A N CFG NP Datalog V(sing) DCG LFG HPSG S(gap(np)) λ-prolog Vocabulary Complexity unification Éric. de la Clergerie TAL 30/09/ / 91

6 Extending CFG with structures Derivation complexity combining structures RCG LCFRS MCS TAG LIG N A N CFG NP Datalog V(sing) DCG LFG HPSG S(gap(np)) λ-prolog Vocabulary Complexity unification Éric. de la Clergerie TAL 30/09/ / 91

7 Extending CFG with structures Derivation complexity combining structures RCG LCFRS MCS TAG LIG N A N Feature TAG CFG NP Datalog V(sing) DCG LFG HPSG S(gap(np)) λ-prolog Vocabulary Complexity unification Éric. de la Clergerie TAL 30/09/ / 91

8 Outline 1 Some background about TAGs 2 Deductive chart-based TAG parsing 3 Automata-based tabular TAG parsing Éric. de la Clergerie TAL 30/09/ / 91

9 From CFG to Tree Substitution Grammars s > np vp np > pn np > det n np > np pp vp > v np vp > vp pp pp > prep np CFG productions: are too local = need decorations for info propagation are generally not lexicalized but info often propagated from words also more efficient parsing algo for lexicalized grammars Éric. de la Clergerie TAL 30/09/ / 91

10 From CFG to Tree Substitution Grammars s > np vp np > pn np > det n np > np pp vp > v np vp > vp pp pp > prep np CFG productions: are too local = need decorations for info propagation are generally not lexicalized but info often propagated from words also more efficient parsing algo for lexicalized grammars CFG productions can be grouped into trees = we get Tree Substitution Grammars (TSG) S For instance, dealing with ditransitive verb donner NP v VP NP PP manger prep NP à Éric. de la Clergerie TAL 30/09/ / 91

11 Derivation Tree vs Parse Tree TSG are strongly equivalent to CFG However, for TSG, parse trees and derivation trees are not equivalent S τ 1 S NP NP0 v NP τ 2 donne Jean VP NP NP NP1 prep a NP τ 3 une pomme PP Jean NP NP2 NP τ 4 Marie NP0 VP v NP PP donne une pomme prep à Marie τ 1 NP1 NP2 τ 2 τ 3 τ 4 Furthermore, several derivations may lead to a same parse tree Éric. de la Clergerie TAL 30/09/ / 91

12 One step farther: adjoining How to deal with: Jean donne souvent une pomme à Marie Need a way to insert the adverb somewhere in the verbal tree = adjoining operation Tree Adjoining Grammars Éric. de la Clergerie TAL 30/09/ / 91

13 TAG: a small example Tree Adjoining Grammars [TAGs] [Joshi] build parse trees from initial and auxiliary trees by using 2 tree operations: substitution and adjoining NP S subst = P John NP VP NP VP V John V sleeps sleeps Éric. de la Clergerie TAL 30/09/ / 91

14 TAG: a small example Tree Adjoining Grammars [TAGs] [Joshi] build parse trees from initial and auxiliary trees by using 2 tree operations: substitution and adjoining NP S subst = P V adj = S John NP VP NP VP V Adv NP VP V John V a lot John V sleeps sleeps V Adv sleeps a lot Éric. de la Clergerie TAL 30/09/ / 91

15 A more complex example: French comparative Paul est plus grand que lui Éric. de la Clergerie TAL 30/09/ / 91

16 A more complex example: French comparative adjp supermod supermod supermod CS adv plus adjp adjp adj grand csu que NP S NP VP pn Paul v est adjp supermod supermod CS adv adjp csu NP plus adj que pro grand lui Éric. de la Clergerie TAL 30/09/ / 91

17 TAGs: (First) Formal definition A TAG G is a tuple (N,Σ, S,I,A) where Σ a finite set of terminal symbols N a finite set of non-terminal symbols S N the axiom I and A are two finite sets of elementary trees over N Σ {ǫ} only leaf nodes ν may have a label l(ν) Σ {ǫ} the trees α in I are initial trees the trees β in A are auxiliary trees and have a unique leaf node marked ( ) as a foot f β with same label than the root node r β, i.e. l(f β ) = l(r β ) Two operations may be used to combine the elementary trees substitution of a leaf node ν of γ by some initial tree α I, (l(ν) = l(t α )) adjoining of an (internal) node ν of γ by some auxiliary tree β A G generates a tree language and a string language T(G) = {γ α = γ yield(γ) Σ α I r α = S} L(G) = {yield(γ) γ T(G)} Éric. de la Clergerie TAL 30/09/ / 91

18 Adjoining Assuming γ = (V, E) with ν V and β = (V β, E β ) with r β, f β V β, such that l(ν) = l(r β ) N with γ[adj(ν,β)] = (V, E ) V = V V β \{ν} {(x, y) E x ν y ν} E = E β {(x, r β ) (x,ν) E} {(f β, y) (ν, y) E} Note: The node sets are assumed to be renamed to avoid clashes, i.e. E E = Éric. de la Clergerie TAL 30/09/ / 91

19 Adjoining contraints A full definition of TAGs should include constraints on adjoining nodes: A TAG G is a tuple (N,Σ, S,I,A, f OA, f SA ) where, assuming V set of nodes in I A, f OA : V {0, 1} specify if adjoining on ν is obligatory (1) or not (0) f SA : V 2 A specify which auxiliary trees may be adjoined on ν note: ν becomes non-adjoinable with f SA (ν) = Adjoining constraints necessary for getting the full expressive power of TAGs but they are often implicit: no adjoining on leaf nodes (including foot nodes) explicit mandatory adjoining (MA, +) marks on some nodes explicit non adjoining (NA, ) marks on some nodes Éric. de la Clergerie TAL 30/09/ / 91

20 Derivation tree S NP NP VP S NP VP Tarzan Jane VP Adv NP VP V NP passionately Tarzan VP Adv loves V NP passionately α(loves) loves Jane subst(np 1 ) subst(np 2 ) adj(vp 1 ) α 1 (Tarzan) α 2 (Jane) β 1 (passionately) For TAGs, derivation tree not isomorphic to parse tree but close from semantic level. Éric. de la Clergerie TAL 30/09/ / 91

21 Regular Tree Languages For a TAG G, its set of derivation trees D(G) forms a regular tree language i.e., D(G) may be generated by a finite tree automaton (top-down) term rewrite rules of the form q 0 a(q 1,...,q n ), q i Q, a F may also be seen as the parse trees for some CFG G Éric. de la Clergerie TAL 30/09/ / 91

22 TAG complexity: Adjoining Root Adjunction node Spine Adjunction Foot Auxiliary Tree adjoining Input String discontinuity (hole in aux tree) crossing (both sides of the hole) Éric. de la Clergerie TAL 30/09/ / 91

23 TAG complexity: Adjoining Root Adjunction node Spine Adjunction Foot Auxiliary Tree Input String adjoining discontinuity (hole in aux tree) crossing (both sides of the hole) unbounded synchronization (both sides of spine) nested adjoining Éric. de la Clergerie TAL 30/09/ / 91

24 Expressive power of TAGs The adjoining operation extends the expressive power of TAGs w.r.t. CFGs. long distance dependencies (wh-pronoun extraction for instance) crossed dependencies as given by copy language ww or by language a n b n c n (1) omdat ik Cecillia de nijlpaarden zag voeren because I Cecilia the hippopotamuses saw feed because I saw Cecilia feed the hippopotamuses Éric. de la Clergerie TAL 30/09/ / 91

25 Expressive power (limits) TAGs can t handle the following languages: a n b m c n d m e n f m multiple copy languages w n with n > 2. Éric. de la Clergerie TAL 30/09/ / 91

26 Pumping lemma Tree Adjoining Languages satisfy a pumping lemma If L is a TAL, then there exists N, such for all w L and w > N, there exist x, y, z, v 1, v 2, w 1, w 2, w 3, w 4 Σ, such that { v1 v 2 w 1 w 2 w 3 w 4 N w 1 w 2 w 3 w 4 1 and one of the following case holds 1 w = xw 1 v 1 w 2 yw 3 v 2 w 4 z and k 0, xw k 1 v 1w k 2 ywk 3 v 2w k 4 z L 2 w = xw 1 v 1 w 2 v 2 w 3 yw 4 z and k 0, xw k+1 1 v 1 w 2 v 2 w 3 (w 2 w 4 w 3 ) k yw 4 z L 3 w = xw 1 yw 2 v 1 w 3 v 2 w 4 z and forallk 0, xw 1 y(w 2 w 1 w 3 ) k w 2 v 1 w 3 v 2 w k+1 4 L Éric. de la Clergerie TAL 30/09/ / 91

27 Closure properties of TALs As CFLs, TALs form an Abstract Family of Languages (AFL): 1 closed by intersection with regular languages 2 closed by union, concatenation, and Kleene-iteration 3 closed by homomorphism and inverse homomorphism In particular, (1) = notion of Shared Derivation Forest Éric. de la Clergerie TAL 30/09/ / 91

28 Shared Derivation Forests Formal definition in Vijay-Shanker & Weir Tarzan 1 loves 2 Jane 3 very 4 passionately 5 α 1 (Tarzan) α(loves) subst(np subst(np 1 ) 2 adj(vp ) 1 ) α 2 (Jane) β 1 (passionately) adj(adv 1 ) β 2 (very) α 1 (0, 5) α 1 (0, 1) loves(1, 2) α 2 (2, 3) β 1 (1, 5, 1, 3) β 1 (1, 5, 1, 3) β 2 (3, 5, 4, 5) passionately(4, 5) β 2 (3, 5, 4, 5) very(3, 4) α 1 (0, 1) Tarzan(0, 1) α 1 (2, 3) Janes(0, 1) More formally, use tree nodes rather than trees Space complexity in O(n 6 ) by binarization (adj on spine node ν) ν (i, j, r, s) r β (i, j, p, q) ν (p, q, r, s) Éric. de la Clergerie TAL 30/09/ / 91

29 Well formed trees Many possible ways to define elementary trees In practive, elementary trees follow some linguistic principles: lexical anchoring: at least, one non-empty lexical (frontier) node the head (or anchor) sub-categorization: a frontier node for each argument sub-categorized by the head domain of locality semantic consistency: a tree correspond to the scope of a semantic predicate with its arguments non-composition: a tree stands for a single semantic unit A few bad trees: S V S S S NP VP dort NP VP C S NP VP V que V NP donne regarde Det N le chien Éric. de la Clergerie TAL 30/09/ / 91

30 Subcategorization S NP VP S NP VP V NP PP V S donner Prep NP penser C S à que Éric. de la Clergerie TAL 30/09/ / 91

31 Feature TAGs The nodes may be decorated with a pair (top,bot) of decorations S NP VP S NP VP V S b:mode=inf V S vouloir vouloir C S b:mode=subj que When adj on ν, unification of ν.top with r β.top and ν.bot with f β.bot alternate way to express adjoining constraints Note: for flat decorations, same expressive power and complexity Éric. de la Clergerie TAL 30/09/ / 91

32 TAG families Trees derived from a canonical ones grouped into families e.g. family of transitive verbs NP0 tn0vn1 S V v mange VP NP1 twn1n0v S t.wh=+ NP1 NP0 S VP V v trn1n0v NP NP S + all other extractions (on NP 0 ) + passive + extractions on passive + ordering + multiple realizations +... que NP0 S VP V v XTAG architecture a set of trees (with anchor nodes) grouped into families a lexicon L specifying for each word w the set of families it may anchor + additional constraints Éric. de la Clergerie TAL 30/09/ / 91

33 Meta-grammars Large coverage TAG = many trees to write and maintain! Alternative: generate the trees from a higher description level: meta-grammars Abeillé, Candito hierarchy of classes, containing constraints A precedes B, A dominates B,... a class deals with a linguistic facet e.g. verb argument, refined into subject or object a class may require or provide functionalities the classes may be combined to form neutral classes the constraints of the neutral classes used to generate elementary trees = used for FRMG, a large-coverage French TAG (plus mechanisms for factorizing elementary trees) Éric. de la Clergerie TAL 30/09/ / 91

34 Long-distance dependencies (Recursive) Adjoining may replace LFG s functional uncertainty for long-distance dependencies Jean demande [quel homme Paul pense [que Marie regarde ǫ]] S NP S ( Wh) = c + = ( Focus) = ( Wh)=+ ( Focus)= (Comp) Obj Éric. de la Clergerie TAL 30/09/ / 91

35 Long-distance dependencies (TAGs) Handled through repeated adjoining S NP S quel homme NP VP S Marie v NP NP VP regarde ǫ Paul v CS pense csu que *S Éric. de la Clergerie TAL 30/09/ / 91

36 Outline 1 Some background about TAGs 2 Deductive chart-based TAG parsing 3 Automata-based tabular TAG parsing Éric. de la Clergerie TAL 30/09/ / 91

37 Deductive parsing Formalization of chart parsing Use of universe of tabulable items, representing (set of) partial parses items often build upon dotted rules A 0 A 1...A i A i+1...a n chart edges labeled by dotted rules ( items i, j, A α β ) a deductive system specifying how to derive items Éric. de la Clergerie TAL 30/09/ / 91

38 CKY as a deductive system (for CFGs) A α i, i, A α A α i (Seed) i, j, A α aβ i, j + 1, A αa β A αa β A α aβ a = a j+1 i j j + 1 (Scan) i, j, A α Bβ j, k, B γ i, k, A αb β A αb β i A α Bβ j B γ k (Complete) Éric. de la Clergerie TAL 30/09/ / 91

39 CKY for TAGs CKY algorithm for TAGs [Vijay-Shanker & Joshi 85] Presentation: Dotted trees N and N where N is a node of an elementary tree Items N, i, p, q, j and N, i, p, q, j with p, q possibly covering a foot node. N N M i N j u N v p M, i, p, q, j q p q Without adjoining: N, p,,, q With adjoining: N, u,,, v Éric. de la Clergerie TAL 30/09/ / 91

40 Rule (Adjoin) Gluing a sub-tree at a foot node. N, p, r, s, q R t, i, p, q, j N, i, r, s, j label(n) = label(r t ) (Adjoin) R t N i pq N j p rs q i rs j Éric. de la Clergerie TAL 30/09/ / 91

41 Rule (NoAdjoin) When no adjoining on a node N, p, r, s, q N, p, r, s, q (NoAdjoin) Éric. de la Clergerie TAL 30/09/ / 91

42 Rule (Complete) Gluing all node s children N i, l i, p i, q i, r i i=1,...,v N, l 1, p i, q i, r v N N 1... N v and i, l i+1 = r i (Complete) Note: At most one child (k) covers a foot node with ( p i, q i ) = (p k, q k ) N N N 1 l r 1 l k N k p k q k r k l v N v - - r v l 1 p k q k r v Éric. de la Clergerie TAL 30/09/ / 91

43 Complexity Other deductive rules needed to handle 1 substitution 2 terminal scanning + axioms Time complexity O(n max(6,1+v+2) ) with v : maximal number of children per node 2 : number of indexes to cover a possible unique foot node Normalization using binary-branching trees (v = 2) = complexity O(n 6 ) 4 indexes per item = Space complexity in O(n 4 ) for a recognizer O(n 6 ) for a parser, keeping backpointers to parents Optimal worst-case complexities but practically, even less efficient than CKY for CFGs Éric. de la Clergerie TAL 30/09/ / 91

44 Prediction, dotted trees and dotted producctions To mark prediction, new dotted trees [Shabes]: N and N Alternative: equivalence with dotted productions N N 1... N v N N 1...N v dotted tree dotted production N k, N k+1 N N 1...N k N k+1...n v R (root) R R (root) R N N N 1...N v N N N 1...N n i p j q N M N α Mβ, i, p, q, j Éric. de la Clergerie TAL 30/09/ / 91

45 Non prefix valid Earley algorithm Glue a sub-tree at foot node F t (maybe useless!) M γ, p, r, s, q R t, i, p, q, j M γ, i, r, s, j label(m) = label(r t ) (Adjoin) Advance in recognition of N s children N α Mβ, i, u, v, j M γ, j, r, s, k N αm β, i, u r, v s, k (Complete) (Adjoin) and (Complete) similar to CKY (binary form) Éric. de la Clergerie TAL 30/09/ / 91

46 Adjoining Prediction Predict adjoining at M N α Mβ, i, p, q, j label(m) = label(r t ) (CallAdj) M i pq j Éric. de la Clergerie TAL 30/09/ / 91

47 Adjoining Prediction Predict adjoining at M N α Mβ, i, p, q, j R t, j,,, j label(m) = label(r t ) (CallAdj) M i pq j t Éric. de la Clergerie TAL 30/09/ / 91

48 Foot Prediction Predict a sub-tree root at M to recognize below foot node F t F t, i,,, i label(f t ) = label(m) (CallFoot) F t Éric. de la Clergerie TAL 30/09/ / 91

49 Foot Prediction Predict a sub-tree root at M to recognize below foot node F t F t, i,,, i M γ, i,,, i label(f t ) = label(m) (CallFoot) F t M i Éric. de la Clergerie TAL 30/09/ / 91

50 Foot Prediction Predict a sub-tree root at M to recognize below foot node F t F t, i,,, i M γ, i,,, i label(f t ) = label(m) (CallFoot) F t M i The prediction of M not related to the node M having triggered the adjoining of t = Non prefix valid parsing strategy Éric. de la Clergerie TAL 30/09/ / 91

51 Complexity Space complexity remains O(n 4 ) Dotted productions = implicit binarization = time in O(n 6 ) Non prefix valid: impact difficult to evaluate in practice Note: Dotted productions also applicable to improve CKY Éric. de la Clergerie TAL 30/09/ / 91

52 Prefix valid Early [Shabes] Complexities time in O(n 9 ) and space in O(n 6 ) due to 6-index items l tl bl j fl fr Actually, tl and bl may be avoided using dotted productions Éric. de la Clergerie TAL 30/09/ / 91

53 Prefix valid Earley [Nederhof] Item with only an extra index h: h, N α β, i, p, q, j h states starting (leftmost) position of current tree A B u A B h i A j p q h, N αb β, i, p, q, j A h i j p q u, A γ, p,,, q Éric. de la Clergerie TAL 30/09/ / 91

54 Foot prediction h, N α Mβ, i, p, q, j label(f t ) = label(m) (CallFootPf) h i Éric. de la Clergerie TAL 30/09/ / 91

55 Foot prediction h, N α Mβ, i, p, q, j j, F t, k,,, k label(f t ) = label(m) (CallFootPf) h i j F t Éric. de la Clergerie TAL 30/09/ / 91

56 Foot prediction h, N α Mβ, i, p, q, j j, F t, k,,, k h, M γ, k,,, k label(f t ) = label(m) (CallFootPf) M h i j F t M k Éric. de la Clergerie TAL 30/09/ / 91

57 Adjoining return h, N α Mβ, i, u, v, j h, M γ, p, r, s, q label(m) = label(r t ) (AdjoinPf) M h i M p q Éric. de la Clergerie TAL 30/09/ / 91

58 Adjoining return h, N α Mβ, i, u, v, j j, R t, j, p, q, k h, M γ, p, r, s, q label(m) = label(r t ) (AdjoinPf) M h i R t j k M p q Éric. de la Clergerie TAL 30/09/ / 91

59 Adjoining return h, N α Mβ, i, u, v, j j, R t, j, p, q, k h, M γ, p, r, s, q h, N αm β, i, u r, v s, k label(m) = label(r t ) (AdjoinPf) M M h i R t h i k j k M p q Éric. de la Clergerie TAL 30/09/ / 91

60 Raw complexity Maximal time complexity provided by (AdjoinPf) : O(n 10 ) because of 10 indexes h, N α Mβ, i, u, v, j j, R t, j, p, q, k h, M γ, p, r, s, q h, N αm β, i, u r, v s, k label(m) = label(r t ) (AdjoinPf) But (u, v) or (r, s) equals (, ) = (Case analysis) splitting rule into 2 sub-rules = O(n 8 ) = not sufficient! Éric. de la Clergerie TAL 30/09/ / 91

61 Splitting and intermediary structures Split (AdjoinPf) into 2 successive steps with an intermediary structure [M γ, j, r, s, k] This intermediary structure combines the aux. tree with the subtree rooted at M j, R t, j, p, q, k h, M γ, p, r, s, q [M γ, j, r, s, k] (AdjoinPf-1) h, N α Mβ, i, u, v, j [M γ, j, r, s, k] h, M γ, p, r, s, q h, N αm β, i, u r, v s, k (AdjoinPf-2) Éric. de la Clergerie TAL 30/09/ / 91

62 Projection j, R t, j, p, q, k h, M γ, p, r, s, q [M γ, j, r, s, k] (AdjoinPf-1) Involves 7 indexes {j, p, q, k, h, r, s} but h not consulted h, M γ, p, r, s, q, M γ, p, r, s, q (Proj) j, R t, j, p, q, k, M γ, p, r, s, q [M γ, j, r, s, k] (AdjoinPf-1) Finally, O(n 6 ) time complexity Éric. de la Clergerie TAL 30/09/ / 91

63 Case of (AdjoinPf-2) h, N α Mβ, i, u, v, j [M γ, j, r, s, k] h, M γ, p, r, s, q h, N αm β, i, u r, v s, k (AdjoinPf-2) 10 indexes = Raw complexity in O(n 10 ) At least one pair in (u, v) or (r, s) equals (, ); Case splitting = O(n 8 ) Pair (p,q) not consulted; projection = O(n 6 ) Éric. de la Clergerie TAL 30/09/ / 91

64 Preliminary conclusion Rule splitting, intermediary structures, and projections decrease complexities but increase the number of steps To be practically validated! Designing a tabular algorithm for TAGs is complex! Designing items Understanding the invariants Formulating the deductive rules (simultaneously handling tabulation and strategy) Optimizing rules (splitting and projections) How to adapt for close formalisms such as Linear Indexed Grammars [LIG]? A 0 ([ x]) A 1 ([])...A k ([ y])...a n ([]) Éric. de la Clergerie TAL 30/09/ / 91

65 LIG Indexed Grammars: Context-Free grammars with non terminals decorated with stacks Linear Indexed Grammars: a single stack propagated per production A LIG G = (N,Σ,I, S,P) where I is a finite set of indices P is a finite set of productions of the form A[ α] A 1 []...A i [ β]...a n [] or A[] γ with γ Σ and α,β I Relationship with (linear monadic) Context-Free Tree Languages Éric. de la Clergerie TAL 30/09/ / 91

66 LIGs and TAGs LIGs and TAGs are weakly equivalent, and almost strongly equivalent TAGs may be easily encoded by LIGs, using tree nodes as non-terminals adjoining node ν in γ using aux. tree β ν[ ] r β [ ν] discharging a node ν with children ν 1,...,ν n at a foot node f β f β [ ν] ν 1 [α 1 ]...ν n [α n ] where α i = [ ] if ν i on spine, and α i = [] otherwise traversing a node ν without adjoining ν[ ] ν 1 [α 1 ]...ν n [α n ] with same conditions on α i than above Reverse way more difficult: no locality constraint between push and pop points (same aux. tree β for TAGs) Suggest using LPDAs to parse LIGs and TAGs but non efficient and non termination Éric. de la Clergerie TAL 30/09/ / 91

67 Outline 1 Some background about TAGs 2 Deductive chart-based TAG parsing 3 Automata-based tabular TAG parsing Éric. de la Clergerie TAL 30/09/ / 91

68 From formalisms to automata Methodology: Automata are operational devices used to describe the steps of Parsing Strategies Dynamic Programming interpretations of automata used to identify context-free subderivations that may be tabulated. Formalisms Automata Tabulation Notes RegExp FSA - CFG PDA O(n 3 ) Lang TAG / LIG 2-Stack Automata O(n 6 ) Becker, Clergerie & Pardo Embedded PDA O(n 6 ) Nederhof Problem: 2-stack automata (or EPDA) have the power of Turing Machine (intuition) moving left- or rightward pushing on first or second stack & popping the other one = need restrictions Éric. de la Clergerie TAL 30/09/ / 91

69 EPDA Embedded Push-Down Automata Becker are natural candidates for LIGs (and TAGs) by handling stack of stacks. Two flavors: Top-Down and Bottom-Up EPDAs WRAP(TD) UNWRAP (BU) Éric. de la Clergerie TAL 30/09/ / 91

70 2-stack automata for TAGs Solution: stack asymmetry Master Stack: to keep trace of uncompleted tree traversals Auxiliary Stack: only to keep trace of uncompleted adjunctions Adjunction info: (top-down) ν n = ν and (bottom-up) ν n = T, T, B, B : prediction and propagation info about top and bottom node decorations (Feature TAGs) Calls (top-down prediction) Returns (bottom-up propagation) ν r f ν Éric. de la Clergerie TAL 30/09/ / 91

71 2-stack automata for TAGs Solution: stack asymmetry Master Stack: to keep trace of uncompleted tree traversals Auxiliary Stack: only to keep trace of uncompleted adjunctions Adjunction info: (top-down) ν n = ν and (bottom-up) ν n = T, T, B, B : prediction and propagation info about top and bottom node decorations (Feature TAGs) Calls (top-down prediction) Returns (bottom-up propagation) ν T ν ν n transition ACALL Call Adj. ν r f ν Éric. de la Clergerie TAL 30/09/ / 91

72 2-stack automata for TAGs Solution: stack asymmetry Master Stack: to keep trace of uncompleted tree traversals Auxiliary Stack: only to keep trace of uncompleted adjunctions Adjunction info: (top-down) ν n = ν and (bottom-up) ν n = T, T, B, B : prediction and propagation info about top and bottom node decorations (Feature TAGs) Calls (top-down prediction) Returns (bottom-up propagation) ν T ν ν n transition ACALL Call Adj. ν r f ν n B+ν n f transition FCALL Call Foot f ν Éric. de la Clergerie TAL 30/09/ / 91

73 2-stack automata for TAGs Solution: stack asymmetry Master Stack: to keep trace of uncompleted tree traversals Auxiliary Stack: only to keep trace of uncompleted adjunctions Adjunction info: (top-down) ν n = ν and (bottom-up) ν n = T, T, B, B : prediction and propagation info about top and bottom node decorations (Feature TAGs) Calls (top-down prediction) Returns (bottom-up propagation) ν T ν ν n transition ACALL Call Adj. ν r f ν n B+ν n f transition FCALL Call Foot f ν Ret Foot B +ν n f f ν n transition FRET Éric. de la Clergerie TAL 30/09/ / 91

74 2-stack automata for TAGs Solution: stack asymmetry Master Stack: to keep trace of uncompleted tree traversals Auxiliary Stack: only to keep trace of uncompleted adjunctions Adjunction info: (top-down) ν n = ν and (bottom-up) ν n = T, T, B, B : prediction and propagation info about top and bottom node decorations (Feature TAGs) Calls (top-down prediction) Returns (bottom-up propagation) ν T ν ν n transition ACALL Call Adj. ν r Ret Adj. T ν ν n ν transition ARET f ν n B+ν n f transition FCALL Call Foot f ν Ret Foot B +ν n f f ν n transition FRET Éric. de la Clergerie TAL 30/09/ / 91

75 Transitions Retracing in erase mode concerns only the size of AS (not its content). Retracing possible because : WRITE transitions leave marks (PUSH, POP, NOP, NEW) in the Master Stack that can only be removed by a dual ERASE transition. Auxiliary stack PUSH POP NOP NEW WRITE transitions SWAP Transition ERASE transitions Master stack Éric. de la Clergerie TAL 30/09/ / 91

76 Context-Free derivation for PDAs Dynamic Programming : Recursive decomposition of problems into elementary subproblems that may be combined, tabulated, and reused eg the knapsack problem Éric. de la Clergerie TAL 30/09/ / 91

77 Context-Free derivation for PDAs Dynamic Programming : Recursive decomposition of problems into elementary subproblems that may be combined, tabulated, and reused eg the knapsack problem For PDAs, derivations broken into elementary Context-Free sub-derivations: A PUSH B Aσ Init Éric. de la Clergerie TAL 30/09/ / 91

78 Context-Free derivation for PDAs Dynamic Programming : Recursive decomposition of problems into elementary subproblems that may be combined, tabulated, and reused eg the knapsack problem For PDAs, derivations broken into elementary Context-Free sub-derivations: Init A PUSH B Aσ ǫ-item B, Aσ Éric. de la Clergerie TAL 30/09/ / 91

79 Context-Free derivation for PDAs Dynamic Programming : Recursive decomposition of problems into elementary subproblems that may be combined, tabulated, and reused eg the knapsack problem For PDAs, derivations broken into elementary Context-Free sub-derivations: Init A PUSH B Aσ ǫ-item B, Aσ Éric. de la Clergerie TAL 30/09/ / 91

80 Context-Free derivation for PDAs Dynamic Programming : Recursive decomposition of problems into elementary subproblems that may be combined, tabulated, and reused eg the knapsack problem For PDAs, derivations broken into elementary Context-Free sub-derivations: Init A PUSH B Aσ ǫ-item B, Aσ A is the fraction ǫ of information consulted to trigger the subderivation and not propagated to B. Éric. de la Clergerie TAL 30/09/ / 91

81 (Escaped) CF derivations for 2SA Escaped derivation A B D E C Éric. de la Clergerie TAL 30/09/ / 91

82 (Escaped) CF derivations for 2SA Escaped derivation A B D E C = 5-point xcf items AB[DE]C = ǫa ǫb, b [ ǫd, d E ] C, c [TAG] ǫa ǫb [ ǫd E ] C When no escaped part = 3-point CF items ABC = ǫa ǫb, b C (new generalization) escaped part [DE] may take place between A and B Éric. de la Clergerie TAL 30/09/ / 91

83 xcfs and TAGs Escaped derivation A B D E C A root of elementary tree B start of adjoining C current position in the tree D and E left and right borders of the foot Éric. de la Clergerie TAL 30/09/ / 91

84 Item shapes At most 5 indexes per items = Space complexity in O(n 5 ) SD-2SA restrictions & transition kinds = 6 possible item shapes A B C A B C A B C B A B C D E A B C D E A C D E Éric. de la Clergerie TAL 30/09/ / 91

85 Combining items and transitions By graphically playing with items and transitions, we find 10 composition rules with O(n 8 ) time complexity may be split into 11 rules with O(n 6 ) time complexity (Easy:) Write a POP mark: I 1 + I 2 +τ = I 3 Éric. de la Clergerie TAL 30/09/ / 91

86 Combining items and transitions By graphically playing with items and transitions, we find 10 composition rules with O(n 8 ) time complexity may be split into 11 rules with O(n 6 ) time complexity (Easy:) Write a POP mark: I 1 + I 2 +τ = I 3 I 2 τ Éric. de la Clergerie TAL 30/09/ / 91

87 Combining items and transitions By graphically playing with items and transitions, we find 10 composition rules with O(n 8 ) time complexity may be split into 11 rules with O(n 6 ) time complexity (Easy:) Write a POP mark: I 1 + I 2 +τ = I 3 I 1 I 2 τ Éric. de la Clergerie TAL 30/09/ / 91

88 Combining items and transitions By graphically playing with items and transitions, we find 10 composition rules with O(n 8 ) time complexity may be split into 11 rules with O(n 6 ) time complexity (Easy:) Write a POP mark: I 1 + I 2 +τ = I 3 I 1 I 2 τ I 3 Éric. de la Clergerie TAL 30/09/ / 91

89 Combining items and transitions By graphically playing with items and transitions, we find 10 composition rules with O(n 8 ) time complexity may be split into 11 rules with O(n 6 ) time complexity (Easy:) Write a POP mark: I 1 + I 2 +τ = I 3 I 1 I 2 τ I 3 Consultation of 3 indexes = Complexity O(n 3 ) Éric. de la Clergerie TAL 30/09/ / 91

90 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) Éric. de la Clergerie TAL 30/09/ / 91

91 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) τ I 2 Éric. de la Clergerie TAL 30/09/ / 91

92 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) I 1 τ I 2 Éric. de la Clergerie TAL 30/09/ / 91

93 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) τ I 2 I 1 I 3 Éric. de la Clergerie TAL 30/09/ / 91

94 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) I 4 I 2 τ I I 3 1 Éric. de la Clergerie TAL 30/09/ / 91

95 Combining items and transitions (2) (complex:) Erasing a PUSH mark: I 1 + I 2 + I 3 +τ = I 4 e.g. when returning from auxiliary tree (ending adjoining) I 4 I 2 τ I I 3 1 Consultation of 8 indexes = Complexity O(n 8 ) need to decompose, project and use intermediary steps (as seen before) Éric. de la Clergerie TAL 30/09/ / 91

96 Cascade of partial evaluations ν f Éric. de la Clergerie TAL 30/09/ / 91

97 Cascade of partial evaluations ν CAI RAI > f I a CFI > Éric. de la Clergerie TAL 30/09/ / 91

98 Cascade of partial evaluations ν CAI RAI > CFI f RFI > I a CFI > Éric. de la Clergerie TAL 30/09/ / 91

99 Cascade of partial evaluations ν CAI RAI > CFI f RFI > I a CFI > RFI a RFI b RFI c Éric. de la Clergerie TAL 30/09/ / 91

100 Cascade of partial evaluations ν CAI RAI > CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

101 Cascade of partial evaluations ν CAI RAI RAI > CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

102 Cascade of partial evaluations ν CAI RAI RAI > 1 RAI > CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

103 Cascade of partial evaluations ν CAI RAI RAI > 1 RAI > CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

104 Cascade of partial evaluations ν RAI > CAI RAI RAI > 1 L CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

105 Cascade of partial evaluations ν RAI > CAI RAI RAI > 1 L CFI f RFI > I a CFI > RFI a RFI b RFI > 1 RFI c Éric. de la Clergerie TAL 30/09/ / 91

106 Simplified Cascade of partial evaluations ν R f Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

107 Simplified Cascade of partial evaluations ν R CAI f CFI > Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

108 Simplified Cascade of partial evaluations ν R CAI f CFI RFI > CFI > Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

109 Simplified Cascade of partial evaluations ν R CAI CFI f RFI > CFI > RFI RAI > Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

110 Simplified Cascade of partial evaluations ν R CAI RAI CFI f RFI > CFI > RFI RAI > Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

111 Simplified Cascade of partial evaluations ν R CAI RAI CFI f RFI > CFI > RFI RAI > Not the optimal worst case complexity (because yellow subtree traversed in the context of larger yellow subtree, keeping trace of unfinished adjoinings) But more efficient in practice! And suggesting extensions, based on the idea of continuation Éric. de la Clergerie TAL 30/09/ / 91

112 Part II MCS in general Éric. de la Clergerie TAL 30/09/ / 91

113 Outline 4 Thread Automata and MCS formalisms 5 A Dynamic Programming interpretation for TAs Éric. de la Clergerie TAL 30/09/ / 91

114 Mildly Context Sensitivity An informal notion covering formalisms such that: they are powerful enough to model crossing, such as a n b n c n they are parsable with polynomial complexity i.e. Given L, there exists k, membership w checked in O( w k ) they generate string languages satisfying the constant growth property G, G finite, n 0, w L, w > n 0 = g G, w L, w = w +g (intuition) the languages are generated by finite sets of generators Éric. de la Clergerie TAL 30/09/ / 91

115 Mildly Context Sensitivity An informal notion covering formalisms such that: they are powerful enough to model crossing, such as a n b n c n they are parsable with polynomial complexity i.e. Given L, there exists k, membership w checked in O( w k ) they generate string languages satisfying the constant growth property G, G finite, n 0, w L, w > n 0 = g G, w L, w = w +g (intuition) the languages are generated by finite sets of generators Some MCS languages: TAGs and LIGs Local Multi Component TAGs (MC-TAGs Weir) Linear Context-Free Rewriting Systems (LCFRS Weir) Simple Range Concatenation Grammars (srcg Boullier) Éric. de la Clergerie TAL 30/09/ / 91

116 Semi-linearity The Constant Growth property subsumed by stronger semi-linearity under Parikh image The Parikh image of w {a 1,...,a n } defined as p(w) = ( w a1,..., w an ) The Parikh image of L defined as p(l) = {p(w) w L} A set V of vectors over N Σ is linear is generated by a base v 0, v 1,...,v n N Σ by V = {v 0 +Σ n i=1k i v i k i N} V is semilinear if V = k i=1 V i is a finite union of linear sets V i A language L is semilinear if p(l) is semilinear (intuition) A MCS language is generated, modulo some permutations, by a finite set of generators Éric. de la Clergerie TAL 30/09/ / 91

117 MCS: discontinuity and interleaving Discontinuous interleaved constituents present in linguistic phenomena Nesting, Crossing, Topicalization, Deep extraction, Complex Word-Order... Éric. de la Clergerie TAL 30/09/ / 91

118 MCS: discontinuity and interleaving Discontinuous interleaved constituents present in linguistic phenomena Nesting, Crossing, Topicalization, Deep extraction, Complex Word-Order... A B C X1 X2 X3 X4 H X5 X6 Éric. de la Clergerie TAL 30/09/ / 91

119 MCS: discontinuity and interleaving Discontinuous interleaved constituents present in linguistic phenomena Nesting, Crossing, Topicalization, Deep extraction, Complex Word-Order... A B C X1 X2 X3 X4 H X5 X6 LFCRS: A f(b, C), f linear non erasing function on string tuples. f( x 1, x 3, x 5, x 2, x 4, x 6 ) = x 1 x 2 x 3 x 4, x 5 x 6 srcg A(x 1.x 2.x 3.x 4, x 5.x 6 ) B(x 1, x 3, x 5 ), C(x 2, x 4, x 6 ) range variables x i ; concatenation. ; holes, Éric. de la Clergerie TAL 30/09/ / 91

120 LCFRS Linear Context-Free Rewriting Systems (LCFRS), a restricted form of generalized CFGs A LCFRS is a tuple G = (N,Σ, S,P,F) where P is a finite set of productions as follows, with f F A f(a 1,...,A n ) F is a set of linear regular operations over tuples of strings in Σ f( x 1,1,...,x 1,k1,..., x n,1,...,x 1,kn ) = t 1,...t k where V = {x i,j } are variables (over Σ ) and t i (Σ V) and (regular or non-erasing) xi,j, t u, x i,j t u (linear) xi,j, x i,j t u x i,j t v = u = v Assuming arity(s) = 1, L(G) = {w S = w } where A = f() if A f() P A = f(t 1,...,t n ) if A f(a 1,...,A n ) P i, A i = t i Éric. de la Clergerie TAL 30/09/ / 91

121 RCG Range Concatenation Grammars (RCG) [Boullier] : Constraints on intervals on the input string. For language a n b n c n Z) > A(X,Y, Z). ( " a X, " b Y, " c Z) > A(X,Y, Z). ( " ", " ", " " ) >. aabbcc S(]0, 6]) aabbcc A(]0, 2],]2, 4],]4, 6]) aabbcc A(]1, 2],]3, 4],]5, 6]) aabbcc A(]2, 2],]4, 4],]6, 6]) RCG is an operational formalism for encoding linguistic formalisms where discontinuous constituents are used. RCG allow modular grammar writing concatenation Y) > G1(X),G2(Y). union G(X) > G1(X) G2(X). intersection G(X) > G1(X),G2(X). Linear non-erasing positive RCGs equivalent to LCFRS Full RCGs are PTIME (equivalent to Datalog) Éric. de la Clergerie TAL 30/09/ / 91

122 Parsing MCS MCS have theoretical polynomial complexity O(n u ) depending upon degree of discontinuity, (also fanout, arity) degree of interleaving, (also rank) But no uniform framework to express parsing strategies and tabular algorithms operational device: Deterministic Tree Walking Transducer (Weir), but no tabular algorithm operational formalism srcg with tabular algorithm (Boullier) but not for prefix-valid strategies Notion of Thread Automata to model discontinuity and interleaving through the suspension/resume of threads. Éric. de la Clergerie TAL 30/09/ / 91

123 Tree Walking Automata TWA may be used to check properties of (binary) trees (by accepting or rejecting them) A (non-deterministic) TWA is a tuple A = (Q, Σ, I, F, R, δ) where Q is a finite set of states Σ a finite set of node labels I, F, R Q the initial, accepting, rejecting states δ the finite set of transitions in Q Σ Pred Dirs Q where Pred {root, left, right,leaf} is a set of predicates for testing nodes Dirs {stay, up, left, right} a set of directions Deterministic TWA: δ : Q Σ Pred Dirs Q Given a Σ-tree τ = (V, E), a configuration is given by (ν, q) V Q Extensions: Pebble Automata (Engelfriet) Éric. de la Clergerie TAL 30/09/ / 91

124 Tree Walking Transducers Similar to TWA, but emits strings when walking over a tree (in some tree set) A deterministic TWD (Weir) is a tuple T = (Q, G,Σ O, q I, F,δ) where G = (N,Σ I, S,P) is a CFG Σ O a finite set of output symbols δ : Q (N Σ I {ǫ}) Dirs Q Σ O with Dirs = {stay, up, down 1,...,down n } A transition step given by (q,γ,ν, w) (q,γ,ν, w.v) if { (q, dir) = δ(q, label(ν)) ν = dir(ν) The language generated by T defined as L(T) = {w (q I,γ, r γ,ǫ) (q f,γ,, w)} with q f F, γ a derivation tree for G with root r γ and a virtual node parent of r γ Weir s result: L(DTWD) = LCFRL Éric. de la Clergerie TAL 30/09/ / 91

125 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Éric. de la Clergerie TAL 30/09/ / 91

126 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ 1 Éric. de la Clergerie TAL 30/09/ / 91

127 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a Éric. de la Clergerie TAL 30/09/ / 91

128 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

129 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

130 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a ǫ a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

131 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a ǫ a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

132 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a ǫ a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

133 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a ǫ a 1.1 a 1 Éric. de la Clergerie TAL 30/09/ / 91

134 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n a ǫ a 1.1 a b 1 ǫ b b ǫ Éric. de la Clergerie TAL 30/09/ / 91

135 Thread Automata Idea: Associate a thread p per constituent and create a subthread p.u for a sub-constituent [PUSH] suspend thread at constituent discontinuity, and (resume) either the parent thread [SPOP] or some direct subthread [SPUSH] scan terminal [SWAP] delete thread after full recognition of a constituent [POP] Recognize aaabbbccc a n b n c n ǫ a ǫ a 1.1 a 1 b b b ǫ c c c ǫ Éric. de la Clergerie TAL 30/09/ / 91

136 Formal presentation of TA Configuration position l, active thread path p, thread store S = {p i :A i } S closed by prefix: p.u dom(s) = p dom(s) Note: stateless automata (but no problem for variants with states) Triggering function a = Φ(A) amount of information needed to trigger transitions. = useful to get linear compexity O( G ) w.r.t. grammar size G Default: Φ = Identity Driver function u δ(a) Drive thread creations and suspensions = reduce number of transitions (TA variants without δ should be possible) Éric. de la Clergerie TAL 30/09/ / 91

137 Formal presentation of TA (cont d) SWAP B α C : Changes the content of the active thread, possibly scanning a terminal. l, p,s p:b τ l + α, p,s p:c a l = α if α ǫ PUSH b [b]c : Creates a new subthread (unless present) l, p,s p:b τ l, pu,s p:b pu:c (b, u) Φδ(B) pu dom POP [B]C D : Terminates thread pu (if no existing subthreads). l, pu,s p:b pu:c τ l, p,s p:d pu dom(s) SPUSH b[c] [b]d : Resumes the subthread pu (if already created) l, p,s p:b pu:c l, pu,s p:b pu:d (b, u s ) Φδ(B) τ SPOP [B]c D[c] : Resumes the parent thread p of pu l, pu,s p:b pu:c l, p,s p:d pu:c (c, ) Φδ(C) τ Éric. de la Clergerie TAL 30/09/ / 91

138 Characterizing Thread Automata Key parameters: h maximal number of suspensions to the parent thread h finite ensures termination (of tabular parsing) d maximal number of simultaneously alive subthreads l maximal number of subthreads s maximal number of suspensions (parent + alive subthreads) s h+dh h+lh Éric. de la Clergerie TAL 30/09/ / 91

139 Characterizing Thread Automata Key parameters: h maximal number of suspensions to the parent thread h finite ensures termination (of tabular parsing) d maximal number of simultaneously alive subthreads l maximal number of subthreads s maximal number of suspensions (parent + alive subthreads) s h+dh h+lh Worst-case Complexity: } { space O(n u ) u = 2+s + x time O(n 1+u where ) x = min(s,(l d)(h + 1)) { space between O(n = 2+2s ) and [when l = d] O(n 2+s ) time between O(n 3+2s) ) and [when l = d] O(n 3+s ) Éric. de la Clergerie TAL 30/09/ / 91

140 Characterizing Thread Automata Key parameters: h maximal number of suspensions to the parent thread h finite ensures termination (of tabular parsing) d maximal number of simultaneously alive subthreads l maximal number of subthreads s maximal number of suspensions (parent + alive subthreads) s h+dh h+lh Worst-case Complexity: } { space O(n u ) u = 2+s + x time O(n 1+u where ) x = min(s,(l d)(h + 1)) { space between O(n = 2+2s ) and [when l = d] O(n 2+s ) time between O(n 3+2s) ) and [when l = d] O(n 3+s ) Push-Down Automata (PDA) for CFG TA(h=0,d=1,s=0) = space O(n 2 ) and time O(n 3 ) Éric. de la Clergerie TAL 30/09/ / 91

141 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node Éric. de la Clergerie TAL 30/09/ / 91

142 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν β r β α r α ν ν f ν Éric. de la Clergerie TAL 30/09/ / 91

143 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν β r β f ν f ν α r α ν ν Éric. de la Clergerie TAL 30/09/ / 91

144 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν β r β f f ν f ν α r α ν ν ν Éric. de la Clergerie TAL 30/09/ / 91

145 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν β r β f f r β ν f ν α r α ν ν ν ν Éric. de la Clergerie TAL 30/09/ / 91

146 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν β r β f f r β ν f ν α r α ν ν ν ν r α Éric. de la Clergerie TAL 30/09/ / 91

147 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν ν f ν α β r α β r β ν f ν f r β ν ν r α Éric. de la Clergerie TAL 30/09/ / 91

148 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν ν f ν α β r α β r β ν f ν f r β ν ν r α Éric. de la Clergerie TAL 30/09/ / 91

149 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν ν f ν α β r α β r β ν f ν f r β ν ν r α Éric. de la Clergerie TAL 30/09/ / 91

150 Parsing TAGs Idea: Assign a thread per elementary tree traversal (substitution or adjunction) Suspend and return to parent thread to handle a foot node ν r ν ν ν f ν α β r α β r β ν f ν f r β ν ν r α One thread per tree h = 1, d = max(depth(trees)) = [s = 1+d] space O(n 4+2d ) and time O(n 5+2d ) Éric. de la Clergerie TAL 30/09/ / 91

151 Parsing TAG: an alternate parsing strategy Using more than one thread per elementary tree: 1 thread per subtree ( LIG) = implicit extraction of subtrees = implicit normal form (using a third kind of tree operation) = usual n 6 time complexity ν r ν ν ν f ν r ν ν f f ν ν Note: Similar to a TAG encoding in RCG proposed by Boullier Éric. de la Clergerie TAL 30/09/ / 91

152 Using less threads Always possible to reduce the number of live subthreads (down to 2). if a thread p has d + 1 subthreads, add a new subthread p.v that inherits d subthreads of p generally increases the number of parent suspensions h but may also exploit good topological properties, such as well-nesting (TAGs). Éric. de la Clergerie TAL 30/09/ / 91

Mildly Context-Sensitive Grammar Formalisms: Thread Automata

Mildly Context-Sensitive Grammar Formalisms: Thread Automata Idea of Thread Automata (1) Mildly Context-Sensitive Grammar Formalisms: Thread Automata Laura Kallmeyer Sommersemester 2011 Thread automata (TA) have been proposed in [Villemonte de La Clergerie, 2002].

More information

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing Laura Kallmeyer, Timm Lichte, Wolfgang Maier Universität Tübingen Part I Formal Properties of TAG 16.05.2007 und 21.05.2007 TAG Parsing

More information

Tree Adjoining Grammars

Tree Adjoining Grammars Tree Adjoining Grammars TAG: Parsing and formal properties Laura Kallmeyer & Benjamin Burkhardt HHU Düsseldorf WS 2017/2018 1 / 36 Outline 1 Parsing as deduction 2 CYK for TAG 3 Closure properties of TALs

More information

Mildly Context-Sensitive Grammar Formalisms: Embedded Push-Down Automata

Mildly Context-Sensitive Grammar Formalisms: Embedded Push-Down Automata Mildly Context-Sensitive Grammar Formalisms: Embedded Push-Down Automata Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Sommersemester 2011 Intuition (1) For a language L, there is a TAG G with

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars

Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars Laura Kallmeyer & Tatiana Bladier Heinrich-Heine-Universität Düsseldorf Sommersemester 2018 Kallmeyer, Bladier SS 2018 Parsing Beyond CFG:

More information

Non-context-Free Languages. CS215, Lecture 5 c

Non-context-Free Languages. CS215, Lecture 5 c Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..

More information

Lecture 17: Language Recognition

Lecture 17: Language Recognition Lecture 17: Language Recognition Finite State Automata Deterministic and Non-Deterministic Finite Automata Regular Expressions Push-Down Automata Turing Machines Modeling Computation When attempting to

More information

Foundations of Informatics: a Bridging Course

Foundations of Informatics: a Bridging Course Foundations of Informatics: a Bridging Course Week 3: Formal Languages and Semantics Thomas Noll Lehrstuhl für Informatik 2 RWTH Aachen University noll@cs.rwth-aachen.de http://www.b-it-center.de/wob/en/view/class211_id948.html

More information

CMPT-825 Natural Language Processing. Why are parsing algorithms important?

CMPT-825 Natural Language Processing. Why are parsing algorithms important? CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop October 26, 2010 1/34 Why are parsing algorithms important? A linguistic theory is implemented in a formal system to generate

More information

Properties of Context-Free Languages

Properties of Context-Free Languages Properties of Context-Free Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar

Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Complexity, Parsing, and Factorization of Tree-Local Multi-Component Tree-Adjoining Grammar Rebecca Nesson School of Engineering and Applied Sciences Harvard University Giorgio Satta Department of Information

More information

Parsing Linear Context-Free Rewriting Systems

Parsing Linear Context-Free Rewriting Systems Parsing Linear Context-Free Rewriting Systems Håkan Burden Dept. of Linguistics Göteborg University cl1hburd@cling.gu.se Peter Ljunglöf Dept. of Computing Science Göteborg University peb@cs.chalmers.se

More information

Context-Free Grammars and Languages. Reading: Chapter 5

Context-Free Grammars and Languages. Reading: Chapter 5 Context-Free Grammars and Languages Reading: Chapter 5 1 Context-Free Languages The class of context-free languages generalizes the class of regular languages, i.e., every regular language is a context-free

More information

NPDA, CFG equivalence

NPDA, CFG equivalence NPDA, CFG equivalence Theorem A language L is recognized by a NPDA iff L is described by a CFG. Must prove two directions: ( ) L is recognized by a NPDA implies L is described by a CFG. ( ) L is described

More information

Everything You Always Wanted to Know About Parsing

Everything You Always Wanted to Know About Parsing Everything You Always Wanted to Know About Parsing Part IV : Parsing University of Padua, Italy ESSLLI, August 2013 Introduction First published in 1968 by Jay in his PhD dissertation, Carnegie Mellon

More information

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the

More information

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46. : Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching

More information

Final exam study sheet for CS3719 Turing machines and decidability.

Final exam study sheet for CS3719 Turing machines and decidability. Final exam study sheet for CS3719 Turing machines and decidability. A Turing machine is a finite automaton with an infinite memory (tape). Formally, a Turing machine is a 6-tuple M = (Q, Σ, Γ, δ, q 0,

More information

Exam: Synchronous Grammars

Exam: Synchronous Grammars Exam: ynchronous Grammars Duration: 3 hours Written documents are allowed. The numbers in front of questions are indicative of hardness or duration. ynchronous grammars consist of pairs of grammars whose

More information

Introduction to Theory of Computing

Introduction to Theory of Computing CSCI 2670, Fall 2012 Introduction to Theory of Computing Department of Computer Science University of Georgia Athens, GA 30602 Instructor: Liming Cai www.cs.uga.edu/ cai 0 Lecture Note 3 Context-Free Languages

More information

This lecture covers Chapter 7 of HMU: Properties of CFLs

This lecture covers Chapter 7 of HMU: Properties of CFLs This lecture covers Chapter 7 of HMU: Properties of CFLs Chomsky Normal Form Pumping Lemma for CFs Closure Properties of CFLs Decision Properties of CFLs Additional Reading: Chapter 7 of HMU. Chomsky Normal

More information

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars

Unterspezifikation in der Semantik Scope Semantics in Lexicalized Tree Adjoining Grammars in der emantik cope emantics in Lexicalized Tree Adjoining Grammars Laura Heinrich-Heine-Universität Düsseldorf Wintersemester 2011/2012 LTAG: The Formalism (1) Tree Adjoining Grammars (TAG): Tree-rewriting

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

Tabular Algorithms for TAG Parsing

Tabular Algorithms for TAG Parsing Tabular Algorithms for TAG Parsing Miguel A. Alonso Departamento de Computación Univesidad de La Coruña Campus de lviña s/n 15071 La Coruña Spain alonso@dc.fi.udc.es David Cabrero Departamento de Computación

More information

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016 CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY REVIEW for MIDTERM 1 THURSDAY Feb 6 Midterm 1 will cover everything we have seen so far The PROBLEMS will be from Sipser, Chapters 1, 2, 3 It will be

More information

Theory of Computation (IV) Yijia Chen Fudan University

Theory of Computation (IV) Yijia Chen Fudan University Theory of Computation (IV) Yijia Chen Fudan University Review language regular context-free machine DFA/ NFA PDA syntax regular expression context-free grammar Pushdown automata Definition A pushdown automaton

More information

PARSING TREE ADJOINING GRAMMARS AND TREE INSERTION GRAMMARS WITH SIMULTANEOUS ADJUNCTIONS

PARSING TREE ADJOINING GRAMMARS AND TREE INSERTION GRAMMARS WITH SIMULTANEOUS ADJUNCTIONS PARSING TREE ADJOINING GRAMMARS AND TREE INSERTION GRAMMARS WITH SIMULTANEOUS ADJUNCTIONS Miguel A. Alonso Departamento de Computación Universidade da Coruña Campus de Elviña s/n 15071 La Coruña (Spain)

More information

Turing Machines. 22c:135 Theory of Computation. Tape of a Turing Machine (TM) TM versus FA, PDA

Turing Machines. 22c:135 Theory of Computation. Tape of a Turing Machine (TM) TM versus FA, PDA Turing Machines A Turing machine is similar to a finite automaton with supply of unlimited memory. A Turing machine can do everything that any computing device can do. There exist problems that even a

More information

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation MA/CSSE 474 Theory of Computation CFL Hierarchy CFL Decision Problems Your Questions? Previous class days' material Reading Assignments HW 12 or 13 problems Anything else I have included some slides online

More information

Everything You Always Wanted to Know About Parsing

Everything You Always Wanted to Know About Parsing Everything You Always Wanted to Know About Parsing Part V : LR Parsing University of Padua, Italy ESSLLI, August 2013 Introduction Parsing strategies classified by the time the associated PDA commits to

More information

Ogden s Lemma for CFLs

Ogden s Lemma for CFLs Ogden s Lemma for CFLs Theorem If L is a context-free language, then there exists an integer l such that for any u L with at least l positions marked, u can be written as u = vwxyz such that 1 x and at

More information

Theory of Computation (Classroom Practice Booklet Solutions)

Theory of Computation (Classroom Practice Booklet Solutions) Theory of Computation (Classroom Practice Booklet Solutions) 1. Finite Automata & Regular Sets 01. Ans: (a) & (c) Sol: (a) The reversal of a regular set is regular as the reversal of a regular expression

More information

Pushdown Automata. We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata.

Pushdown Automata. We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata. Pushdown Automata We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata. Next we consider a more powerful computation model, called a

More information

Complexity of Natural Languages Mildly-context sensitivity T1 languages

Complexity of Natural Languages Mildly-context sensitivity T1 languages Complexity of Natural Languages Mildly-context sensitivity T1 languages Introduction to Formal Language Theory day 4 Wiebke Petersen & Kata Balogh Heinrich-Heine-Universität NASSLLI 2014 Petersen & Balogh

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Parsing with Context-Free Grammars

Parsing with Context-Free Grammars Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars

More information

CISC4090: Theory of Computation

CISC4090: Theory of Computation CISC4090: Theory of Computation Chapter 2 Context-Free Languages Courtesy of Prof. Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Spring, 2014 Overview In Chapter

More information

Strong connectivity hypothesis and generative power in TAG

Strong connectivity hypothesis and generative power in TAG 15 Strong connectivity hypothesis and generative power in TAG Alessandro Mazzei, Vincenzo ombardo and Patrick Sturt Abstract Dynamic grammars are relevant for psycholinguistic modelling and speech processing.

More information

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harry Lewis October 8, 2013 Reading: Sipser, pp. 119-128. Pushdown Automata (review) Pushdown Automata = Finite automaton

More information

CS20a: summary (Oct 24, 2002)

CS20a: summary (Oct 24, 2002) CS20a: summary (Oct 24, 2002) Context-free languages Grammars G = (V, T, P, S) Pushdown automata N-PDA = CFG D-PDA < CFG Today What languages are context-free? Pumping lemma (similar to pumping lemma for

More information

A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus

A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus Timothy A. D. Fowler Department of Computer Science University of Toronto 10 King s College Rd., Toronto, ON, M5S 3G4, Canada

More information

Model-Theory of Property Grammars with Features

Model-Theory of Property Grammars with Features Model-Theory of Property Grammars with Features Denys Duchier Thi-Bich-Hanh Dao firstname.lastname@univ-orleans.fr Yannick Parmentier Abstract In this paper, we present a model-theoretic description of

More information

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0 Massachusetts Institute of Technology 6.863J/9.611J, Natural Language Processing, Spring, 2001 Department of Electrical Engineering and Computer Science Department of Brain and Cognitive Sciences Handout

More information

Multiple Context Free Grammars

Multiple Context Free Grammars Multiple Context Free Grammars Siddharth Krishna September 14, 2013 Abstract Multiple context-free grammars (MCFGs) are a generalization of context-free grammars that deals with tuples of strings. This

More information

Part 4 out of 5 DFA NFA REX. Automata & languages. A primer on the Theory of Computation. Last week, we showed the equivalence of DFA, NFA and REX

Part 4 out of 5 DFA NFA REX. Automata & languages. A primer on the Theory of Computation. Last week, we showed the equivalence of DFA, NFA and REX Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu Part 4 out of 5 ETH Zürich (D-ITET) October, 12 2017 Last week, we showed the equivalence of DFA, NFA and REX

More information

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars COMP-330 Theory of Computation Fall 2017 -- Prof. Claude Crépeau Lec. 10 : Context-Free Grammars COMP 330 Fall 2017: Lectures Schedule 1-2. Introduction 1.5. Some basic mathematics 2-3. Deterministic finite

More information

Parsing. Left-Corner Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 17

Parsing. Left-Corner Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 17 Parsing Left-Corner Parsing Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 17 Table of contents 1 Motivation 2 Algorithm 3 Look-ahead 4 Chart Parsing 2 / 17 Motivation Problems

More information

Review. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering

Review. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering Review Earley Algorithm Chapter 13.4 Lecture #9 October 2009 Top-Down vs. Bottom-Up Parsers Both generate too many useless trees Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up

More information

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES Contents i SYLLABUS UNIT - I CHAPTER - 1 : AUT UTOMA OMATA Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 2 : FINITE AUT UTOMA OMATA An Informal Picture of Finite Automata,

More information

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) October,

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever.   ETH Zürich (D-ITET) October, Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu ETH Zürich (D-ITET) October, 5 2017 Part 3 out of 5 Last week, we learned about closure and equivalence of regular

More information

Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages

Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu Part 3 out of 5 ETH Zürich (D-ITET) October, 5 2017 Last week, we learned about closure and equivalence of regular

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars formal properties Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2018 1 / 20 Normal forms (1) Hopcroft and Ullman (1979) A normal

More information

How to Pop a Deep PDA Matters

How to Pop a Deep PDA Matters How to Pop a Deep PDA Matters Peter Leupold Department of Mathematics, Faculty of Science Kyoto Sangyo University Kyoto 603-8555, Japan email:leupold@cc.kyoto-su.ac.jp Abstract Deep PDA are push-down automata

More information

NEW TABULAR ALGORITHMS FOR LIG PARSING

NEW TABULAR ALGORITHMS FOR LIG PARSING NEW TABULAR ALGORITHMS FOR LIG PARSING Miguel A. Alonso Jorge Graña Manuel Vilares Departamento de Computación Universidad de La Coruña Campus de Elviña s/n 15071 La Coruña (Spain) {alonso,grana,vilares}@dc.fi.udc.es

More information

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation MA/CSSE 474 Theory of Computation Bottom-up parsing Pumping Theorem for CFLs Recap: Going One Way Lemma: Each context-free language is accepted by some PDA. Proof (by construction): The idea: Let the stack

More information

Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication Shay B. Cohen University of Edinburgh Daniel Gildea University of Rochester We describe a recognition algorithm for a subset

More information

CP405 Theory of Computation

CP405 Theory of Computation CP405 Theory of Computation BB(3) q 0 q 1 q 2 0 q 1 1R q 2 0R q 2 1L 1 H1R q 1 1R q 0 1L Growing Fast BB(3) = 6 BB(4) = 13 BB(5) = 4098 BB(6) = 3.515 x 10 18267 (known) (known) (possible) (possible) Language:

More information

Finite-state Machines: Theory and Applications

Finite-state Machines: Theory and Applications Finite-state Machines: Theory and Applications Unweighted Finite-state Automata Thomas Hanneforth Institut für Linguistik Universität Potsdam December 10, 2008 Thomas Hanneforth (Universität Potsdam) Finite-state

More information

Notes for Comp 497 (454) Week 10

Notes for Comp 497 (454) Week 10 Notes for Comp 497 (454) Week 10 Today we look at the last two chapters in Part II. Cohen presents some results concerning the two categories of language we have seen so far: Regular languages (RL). Context-free

More information

Properties of Context-Free Languages. Closure Properties Decision Properties

Properties of Context-Free Languages. Closure Properties Decision Properties Properties of Context-Free Languages Closure Properties Decision Properties 1 Closure Properties of CFL s CFL s are closed under union, concatenation, and Kleene closure. Also, under reversal, homomorphisms

More information

Epsilon-Free Grammars and Lexicalized Grammars that Generate the Class of the Mildly Context-Sensitive Languages

Epsilon-Free Grammars and Lexicalized Grammars that Generate the Class of the Mildly Context-Sensitive Languages psilon-free rammars and Leicalized rammars that enerate the Class of the Mildly Contet-Sensitive Languages Akio FUJIYOSHI epartment of Computer and Information Sciences, Ibaraki University 4-12-1 Nakanarusawa,

More information

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF) CS5371 Theory of Computation Lecture 7: Automata Theory V (CFG, CFL, CNF) Announcement Homework 2 will be given soon (before Tue) Due date: Oct 31 (Tue), before class Midterm: Nov 3, (Fri), first hour

More information

Global Index Languages

Global Index Languages Global Index Languages A Dissertation Presented to The Faculty of the Graduate School of Arts and Sciences Brandeis University Department of Computer Science James Pustejovsky, Advisor In Partial Fulfillment

More information

Even More on Dynamic Programming

Even More on Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 Even More on Dynamic Programming Lecture 15 Thursday, October 19, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 26 Part I Longest Common Subsequence

More information

Parser Combinators, (Simply) Indexed Grammars, Natural Language Parsing

Parser Combinators, (Simply) Indexed Grammars, Natural Language Parsing Parser Combinators, (Simply) Indexed Grammars, Natural Language Parsing Jan van Eijck jve@cwi.nl Essex, April 23, 2004 Abstract Parser combinators [4, 6] are higher order functions that transform parsers

More information

Theory Bridge Exam Example Questions

Theory Bridge Exam Example Questions Theory Bridge Exam Example Questions Annotated version with some (sometimes rather sketchy) answers and notes. This is a collection of sample theory bridge exam questions. This is just to get some idea

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

SCHEME FOR INTERNAL ASSESSMENT TEST 3

SCHEME FOR INTERNAL ASSESSMENT TEST 3 SCHEME FOR INTERNAL ASSESSMENT TEST 3 Max Marks: 40 Subject& Code: Automata Theory & Computability (15CS54) Sem: V ISE (A & B) Note: Answer any FIVE full questions, choosing one full question from each

More information

Computability and Complexity

Computability and Complexity Computability and Complexity Push-Down Automata CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard Janicki Computability

More information

Functions on languages:

Functions on languages: MA/CSSE 474 Final Exam Notation and Formulas page Name (turn this in with your exam) Unless specified otherwise, r,s,t,u,v,w,x,y,z are strings over alphabet Σ; while a, b, c, d are individual alphabet

More information

Sharpening the empirical claims of generative syntax through formalization

Sharpening the empirical claims of generative syntax through formalization Sharpening the empirical claims of generative syntax through formalization Tim Hunter University of Minnesota, Twin Cities NASSLLI, June 2014 Part 1: Grammars and cognitive hypotheses What is a grammar?

More information

Pushdown Automata (2015/11/23)

Pushdown Automata (2015/11/23) Chapter 6 Pushdown Automata (2015/11/23) Sagrada Familia, Barcelona, Spain Outline 6.0 Introduction 6.1 Definition of PDA 6.2 The Language of a PDA 6.3 Euivalence of PDA s and CFG s 6.4 Deterministic PDA

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Spring 27 Alexis Maciel Department of Computer Science Clarkson University Copyright c 27 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

Automata Theory (2A) Young Won Lim 5/31/18

Automata Theory (2A) Young Won Lim 5/31/18 Automata Theory (2A) Copyright (c) 2018 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

CS5371 Theory of Computation. Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA)

CS5371 Theory of Computation. Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA) CS5371 Theory of Computation Lecture 9: Automata Theory VII (Pumping Lemma, Non-CFL, DPDA PDA) Objectives Introduce the Pumping Lemma for CFL Show that some languages are non- CFL Discuss the DPDA, which

More information

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

October 6, Equivalence of Pushdown Automata with Context-Free Gramm Equivalence of Pushdown Automata with Context-Free Grammar October 6, 2013 Motivation Motivation CFG and PDA are equivalent in power: a CFG generates a context-free language and a PDA recognizes a context-free

More information

Computational Models - Lecture 5 1

Computational Models - Lecture 5 1 Computational Models - Lecture 5 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 10/22, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften Normal forms (1) Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften Laura Heinrich-Heine-Universität Düsseldorf Sommersemester 2013 normal form of a grammar formalism

More information

Properties of context-free Languages

Properties of context-free Languages Properties of context-free Languages We simplify CFL s. Greibach Normal Form Chomsky Normal Form We prove pumping lemma for CFL s. We study closure properties and decision properties. Some of them remain,

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 14 Ana Bove May 14th 2018 Recap: Context-free Grammars Simplification of grammars: Elimination of ǫ-productions; Elimination of

More information

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Notes for Comp 497 (Comp 454) Week 10 4/5/05 Notes for Comp 497 (Comp 454) Week 10 4/5/05 Today look at the last two chapters in Part II. Cohen presents some results concerning context-free languages (CFL) and regular languages (RL) also some decidability

More information

arxiv: v5 [cs.fl] 21 Feb 2012

arxiv: v5 [cs.fl] 21 Feb 2012 Streaming Tree Transducers Rajeev Alur and Loris D Antoni University of Pennsylvania February 23, 2012 arxiv:1104.2599v5 [cs.fl] 21 Feb 2012 Abstract Theory of tree transducers provides a foundation for

More information

An Alternative Construction in Symbolic Reachability Analysis of Second Order Pushdown Systems

An Alternative Construction in Symbolic Reachability Analysis of Second Order Pushdown Systems An Alternative Construction in Symbolic Reachability Analysis of Second Order Pushdown Systems Anil Seth CSE Department, I.I.T. Kanpur, Kanpur 208016, INDIA. seth@cse.iitk.ac.in Abstract. Recently, it

More information

Decoding and Inference with Syntactic Translation Models

Decoding and Inference with Syntactic Translation Models Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 3/8, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

Pushdown Automata: Introduction (2)

Pushdown Automata: Introduction (2) Pushdown Automata: Introduction Pushdown automaton (PDA) M = (K, Σ, Γ,, s, A) where K is a set of states Σ is an input alphabet Γ is a set of stack symbols s K is the start state A K is a set of accepting

More information

Context-Free Languages (Pre Lecture)

Context-Free Languages (Pre Lecture) Context-Free Languages (Pre Lecture) Dr. Neil T. Dantam CSCI-561, Colorado School of Mines Fall 2017 Dantam (Mines CSCI-561) Context-Free Languages (Pre Lecture) Fall 2017 1 / 34 Outline Pumping Lemma

More information

Properties of Context-free Languages. Reading: Chapter 7

Properties of Context-free Languages. Reading: Chapter 7 Properties of Context-free Languages Reading: Chapter 7 1 Topics 1) Simplifying CFGs, Normal forms 2) Pumping lemma for CFLs 3) Closure and decision properties of CFLs 2 How to simplify CFGs? 3 Three ways

More information

CS481F01 Prelim 2 Solutions

CS481F01 Prelim 2 Solutions CS481F01 Prelim 2 Solutions A. Demers 7 Nov 2001 1 (30 pts = 4 pts each part + 2 free points). For this question we use the following notation: x y means x is a prefix of y m k n means m n k For each of

More information

CPS 220 Theory of Computation

CPS 220 Theory of Computation CPS 22 Theory of Computation Review - Regular Languages RL - a simple class of languages that can be represented in two ways: 1 Machine description: Finite Automata are machines with a finite number of

More information

Before We Start. The Pumping Lemma. Languages. Context Free Languages. Plan for today. Now our picture looks like. Any questions?

Before We Start. The Pumping Lemma. Languages. Context Free Languages. Plan for today. Now our picture looks like. Any questions? Before We Start The Pumping Lemma Any questions? The Lemma & Decision/ Languages Future Exam Question What is a language? What is a class of languages? Context Free Languages Context Free Languages(CFL)

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Fall 28 Alexis Maciel Department of Computer Science Clarkson University Copyright c 28 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

Miscellaneous. Closure Properties Decision Properties

Miscellaneous. Closure Properties Decision Properties Miscellaneous Closure Properties Decision Properties 1 Closure Properties of CFL s CFL s are closed under union, concatenation, and Kleene closure. Also, under reversal, homomorphisms and inverse homomorphisms.

More information

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION CSE 105 THEORY OF COMPUTATION Spring 2016 http://cseweb.ucsd.edu/classes/sp16/cse105-ab/ Today's learning goals Sipser Ch 2 Define push down automata Trace the computation of a push down automaton Design

More information

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) October,

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever.   ETH Zürich (D-ITET) October, Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu ETH Zürich (D-ITET) October, 19 2017 Part 5 out of 5 Last week was all about Context-Free Languages Context-Free

More information

Computational Models: Class 5

Computational Models: Class 5 Computational Models: Class 5 Benny Chor School of Computer Science Tel Aviv University March 27, 2019 Based on slides by Maurice Herlihy, Brown University, and modifications by Iftach Haitner and Yishay

More information

Introduction to Formal Languages, Automata and Computability p.1/42

Introduction to Formal Languages, Automata and Computability p.1/42 Introduction to Formal Languages, Automata and Computability Pushdown Automata K. Krithivasan and R. Rama Introduction to Formal Languages, Automata and Computability p.1/42 Introduction We have considered

More information