Mildly Context-Sensitive Grammar Formalisms: Introduction
Laura Kallmeyer
Heinrich-Heine-Universität Düsseldorf
Summer semester 2011

Overview
1. CFG and natural languages
2. Polynomial extensions of CFG
3. Basic definitions

CFG and natural languages (1)
A context-free grammar (CFG) is a set of rewriting rules that tell us how to replace a non-terminal by a sequence of non-terminal and terminal symbols.
Example:
$S \to aSb$
$S \to ab$
The string language generated by this grammar is $\{a^n b^n \mid n \geq 1\}$.

CFG and natural languages (2)
Sample CFG $G_{telescope}$:
$S \to NP \; VP$
$NP \to D \; N$
$VP \to VP \; PP \mid V \; NP$
$N \to N \; PP$
$PP \to P \; NP$
$N \to man \mid girl \mid telescope$
$D \to the$
$N \to John$
$P \to with$
$V \to saw$
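The sample grammar $G_{telescope}$ is ambiguous, and a small chart parser makes that visible. The following is a minimal sketch, not part of the original slides: it assumes the rules as listed above, uses a standard CYK-style chart (possible here because every rule is either binary or lexical), and counts parse trees for a test sentence chosen for illustration. The names `BINARY`, `LEXICAL` and `count_parses` are hypothetical.

```python
from collections import defaultdict

# Binary rules A -> B C of G_telescope, written as (A, B, C).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "D", "N"),
    ("VP", "VP", "PP"),
    ("VP", "V", "NP"),
    ("N", "N", "PP"),
    ("PP", "P", "NP"),
]
# Lexical rules A -> word, indexed by word.
LEXICAL = {
    "man": ["N"], "girl": ["N"], "telescope": ["N"], "John": ["N"],
    "the": ["D"], "with": ["P"], "saw": ["V"],
}

def count_parses(words, start="S"):
    """Count the parse trees rooted in `start` that span all of `words`."""
    n = len(words)
    # chart[i][j][A] = number of trees with root A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICAL.get(word, []):
            chart[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point between the two children
                for lhs, left, right in BINARY:
                    chart[i][j][lhs] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

sentence = "the man saw the girl with the telescope".split()
print(count_parses(sentence))
```

For this sentence the count is 2: the PP "with the telescope" can attach to the VP (via $VP \to VP \; PP$) or to the noun "girl" (via $N \to N \; PP$).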
CFG and natural languages (3)
Context-free languages (CFLs)
- can be recognized in polynomial time ($O(n^3)$);
- are accepted by push-down automata;
- have nice closure properties (e.g., closure under homomorphisms, intersection with regular languages, ...);
- satisfy a pumping lemma;
- can describe nested dependencies ($\{ww^R \mid w \in T^*\}$).
(Hopcroft and Ullman, 1979)

CFG and natural languages (4)
Question: Is CFG powerful enough to describe all natural language phenomena?
Answer: No. There are constructions in natural languages that cannot be adequately described with a context-free grammar.
Example: cross-serial dependencies in Dutch and in Swiss German.
Dutch:
(1) ... dat Wim Jan Marie de kinderen zag helpen leren zwemmen
    ... that Wim Jan Marie the children saw help teach swim
    '... that Wim saw Jan help Marie teach the children to swim'

CFG and natural languages (5)
Swiss German:
(2) ... das mer em Hans es huus hälfed aastriiche
    ... that we Hans_Dat house_Acc helped paint
    '... that we helped Hans paint the house'
(3) ... das mer d'chind em Hans es huus lönd hälfe aastriiche
    ... that we the children_Acc Hans_Dat house_Acc let help paint
    '... that we let the children help Hans paint the house'
Swiss German uses case marking and displays cross-serial dependencies.
(Shieber, 1985) shows that Swiss German is not context-free.

CFG and natural languages (6)
In general, because of the closure properties, the following holds:
A formalism that can generate cross-serial dependencies can also generate the copy language $\{ww \mid w \in \{a, b\}^*\}$.
The copy language is not context-free. Therefore we are interested in extensions of CFG in order to describe all natural language phenomena.
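The difference between nested and cross-serial dependencies can be made concrete with two access disciplines. The sketch below is an illustration only, not part of the original slides: it pairs the nouns and verbs of the Dutch example (1) in first-in-first-out order, which matches the gloss, and contrasts this with the last-in-first-out order that a stack would produce. All function names are hypothetical.

```python
from collections import deque

NOUNS = ["Wim", "Jan", "Marie", "de kinderen"]
VERBS = ["zag", "helpen", "leren", "zwemmen"]

def cross_serial_pairs(nouns, verbs):
    """Pair the i-th noun with the i-th verb (first in, first out)."""
    queue = deque(nouns)
    return [(queue.popleft(), verb) for verb in verbs]

def nested_pairs(nouns, verbs):
    """Pair the last noun with the first verb (last in, first out)."""
    stack = list(nouns)
    return [(stack.pop(), verb) for verb in verbs]

def is_mirror(word):
    """Membership in {w w^R}: the second half is checked against a stack."""
    if len(word) % 2:
        return False
    half = len(word) // 2
    stack = list(word[:half])          # push the first half
    for symbol in word[half:]:         # pop while reading the second half
        if stack.pop() != symbol:
            return False
    return True

def is_copy(word):
    """Membership in the copy language {w w}: both halves are identical."""
    half = len(word) // 2
    return len(word) % 2 == 0 and word[:half] == word[half:]

print(cross_serial_pairs(NOUNS, VERBS))   # the pairing expressed by the gloss of (1)
print(nested_pairs(NOUNS, VERBS))         # the pairing a stack would give instead
```

The stack-based check mirrors how a push-down automaton handles $\{ww^R\}$; the copy language needs the symbols back in the order they were read, which is the pattern of the Dutch and Swiss German examples.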
CFG and natural languages (7)
Idea (Joshi, 1985): characterize the amount of context-sensitivity necessary for natural languages.
Mildly context-sensitive formalisms have the following properties:
1. They generate (at least) all CFLs.
2. They can describe a limited amount of cross-serial dependencies. In other words, there is an $n \geq 2$ up to which the formalism can generate all string languages $\{w^n \mid w \in T^*\}$.
3. They are polynomially parsable.
4. Their string languages are of constant growth. In other words, the length of the words generated by the grammar grows in a linear way, e.g., $\{a^{2^n} \mid n \geq 0\}$ does not have that property.

Polynomial extensions of CFG (1)
Tree Adjoining Grammars (TAG), (Joshi, Levy, and Takahashi, 1975; Joshi and Schabes, 1997):
Tree-rewriting grammar. Extension of CFG that allows to replace not only leaves but also internal nodes with new trees.
Can generate the copy language.
Example: TAG for the copy language
[Figure: TAG for the copy language]

Polynomial extensions of CFG (2)
Example: TAG derivation
[Figure: TAG derivation]

Polynomial extensions of CFG (3)
Linear Context-Free Rewriting Systems (LCFRS) and the equivalent Multiple Context-Free Grammars (MCFG), (Vijay-Shanker, Weir, and Joshi, 1987; Weir, 1988; Seki et al., 1991)
Idea: extension of CFG where non-terminals can span tuples of non-adjacent strings.
Example: $yield(A) = \langle a^n b^n, c^n d^n \rangle$, with $n \geq 1$.
The rewriting rules tell us how to compute the span of the left-hand side non-terminal from the spans of the right-hand side non-terminals.
$A(ab, cd) \to \varepsilon$
$A(aXb, cYd) \to A(X, Y)$
$S(XY) \to A(X, Y)$
Generated string language: $\{a^n b^n c^n d^n \mid n \geq 1\}$.
LCFRS is more powerful than TAG but still mildly context-sensitive.
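The three LCFRS rules can be read bottom-up as instructions for building yields. The following minimal sketch (an illustration under that reading, with hypothetical function names) computes the first few yields of $A$ as pairs of strings and concatenates each pair to obtain the yields of $S$.

```python
def yield_A(max_n):
    """Yields of A: the pairs (a^n b^n, c^n d^n) for 1 <= n <= max_n."""
    pairs = [("ab", "cd")]                              # A(ab, cd) -> eps
    while len(pairs) < max_n:
        x, y = pairs[-1]
        pairs.append(("a" + x + "b", "c" + y + "d"))    # A(aXb, cYd) -> A(X, Y)
    return pairs[:max_n]

def yield_S(max_n):
    """Yields of S: concatenate the two components of each yield of A."""
    return [x + y for x, y in yield_A(max_n)]           # S(XY) -> A(X, Y)

print(yield_S(3))
```

The two components of $A$ grow in parallel but end up non-adjacent pieces of the final string only through the $S$ rule, which is what a single CFG non-terminal cannot express.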
Polynomial extensions of CFG (4)
Range Concatenation Grammar (RCG) (Boullier, 2000)
RCG contains clauses of the form $A(\ldots) \to A_1(\ldots) \ldots A_k(\ldots)$ where $A, A_1, \ldots, A_k$ are predicates. Their arguments are words over the terminal alphabet and a set of variables.
Intuition: The predicates characterize properties of strings. A derivation starts with $S(w)$ where $S$ is the start predicate. If this can be reduced to the empty word (i.e., property $S$ is true for $w$), then $w$ is in the language.
Example: RCG for $\{a^{2^n} \mid n \geq 0\}$.
$S(a) \to \varepsilon$
$S(XY) \to E(X, Y) S(X)$
$E(a, a) \to \varepsilon$
$E(aX, aY) \to E(X, Y)$

Polynomial extensions of CFG (5)
RCGs are simple if
- the arguments in the right-hand sides of the clauses are single variables,
- no variable appears more than once in the left-hand side of a clause or more than once in the right-hand side of a clause,
- each variable occurring in the left-hand side of a clause occurs also in its right-hand side and vice versa.
Simple RCG are equivalent to LCFRS and MCFG.
RCG in general are more powerful; they generate exactly the class PTIME of polynomially parsable languages. (They properly include the class of MCS formalisms.)

Polynomial extensions of CFG (6)
Summary:
CFG $\subset$ TAG $\subset$ LCFRS, MCFG, simple RCG $\subset$ RCG (= PTIME)
(mildly context-sensitive: TAG, LCFRS, MCFG, simple RCG)
In this course, we are interested in mildly context-sensitive formalisms.

Basic Definitions: Languages (1)
Definition 1 (Alphabet, word, language)
1. An alphabet is a nonempty finite set $X$.
2. A string $x_1 \ldots x_n$ with $n \geq 1$ and $x_i \in X$ for $1 \leq i \leq n$ is called a nonempty word on the alphabet $X$. $X^+$ is defined as the set of all nonempty words on $X$.
3. A new element $\varepsilon \notin X^+$ is added: $X^* := X^+ \cup \{\varepsilon\}$. For each $w \in X^*$, the concatenation of $w$ and $\varepsilon$ is defined as follows: $w\varepsilon = \varepsilon w = w$. $\varepsilon$ is called the empty word, and each $w \in X^*$ is called a word on $X$.
4. A set $L$ is called a language iff there is an alphabet $X$ such that $L \subseteq X^*$.
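Returning to the RCG example for $\{a^{2^n} \mid n \geq 0\}$ from "Polynomial extensions of CFG (4)": the clauses can be turned almost literally into a recursive recognizer. This is a naive sketch for illustration only, not an efficient RCG parser, and the function names are hypothetical. Each function returns True when the corresponding predicate can be reduced to the empty word for the given arguments.

```python
def holds_E(x, y):
    """E(X, Y): X and Y are non-empty strings of a's of equal length."""
    if x == "a" and y == "a":                   # E(a, a) -> eps
        return True
    if len(x) > 1 and len(y) > 1 and x[0] == "a" and y[0] == "a":
        return holds_E(x[1:], y[1:])            # E(aX, aY) -> E(X, Y)
    return False

def holds_S(w):
    """S(w): w is a string of a's whose length is a power of two."""
    if w == "a":                                # S(a) -> eps
        return True
    # S(XY) -> E(X, Y) S(X): try every split of w into two non-empty parts
    return any(
        holds_E(w[:k], w[k:]) and holds_S(w[:k])
        for k in range(1, len(w))
    )

print([n for n in range(17) if holds_S("a" * n)])
```

Note that the variable $X$ is used twice on the right-hand side of the second clause, so this grammar is not a simple RCG, in line with the fact that the language is not of constant growth.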
Basic Definitions: Languages (2)
Definition 2 (Homomorphism)
For two alphabets $X$ and $Y$, a function $f : X^* \to Y^*$ is a homomorphism iff for all $v, w \in X^*$: $f(vw) = f(v)f(w)$.
Definition 3 (Length of a word)
Let $X$ be an alphabet, $w \in X^*$.
1. The length of $w$, $|w|$, is defined as follows: if $w = \varepsilon$, then $|w| = 0$. If $w = xw'$ for some $x \in X$, then $|w| = 1 + |w'|$.
2. For every $a \in X$, we define $|w|_a$ as the number of $a$s occurring in $w$: If $w = \varepsilon$, then $|w|_a = 0$, if $w = aw'$ then $|w|_a = |w'|_a + 1$ and if $w = bw'$ for some $b \in X \setminus \{a\}$, then $|w|_a = |w'|_a$.

Basic Definitions: CFG (1)
Definition 4 (Context-free grammar)
A context-free grammar (CFG) is a tuple $G = \langle N, T, P, S \rangle$ such that
1. $N$ and $T$ are disjoint alphabets, the nonterminals and terminals of $G$,
2. $P \subseteq N \times (N \cup T)^*$ is a finite set of productions (also called rewriting rules). A production $\langle A, \alpha \rangle$ is usually written $A \to \alpha$.
3. $S \in N$ is the start symbol.

Basic Definitions: CFG (2)
Definition 5 (Language of a CFG)
Let $G = \langle N, T, P, S \rangle$ be a CFG. The (string) language $L(G)$ of $G$ is the set $\{w \in T^* \mid S \Rightarrow^* w\}$ where
- for $w, w' \in (N \cup T)^*$: $w \Rightarrow w'$ iff there is an $A \to \alpha \in P$ and there are $v, u \in (N \cup T)^*$ such that $w = vAu$ and $w' = v\alpha u$.
- $\Rightarrow^*$ is the reflexive transitive closure of $\Rightarrow$:
  - $w \Rightarrow^0 w$ for all $w \in (N \cup T)^*$, and
  - for all $w, w' \in (N \cup T)^*$: $w \Rightarrow^n w'$ iff there is a $v$ such that $w \Rightarrow v$ and $v \Rightarrow^{n-1} w'$.
  - for all $w, w' \in (N \cup T)^*$: $w \Rightarrow^* w'$ iff there is an $i \in \mathbb{N}$ such that $w \Rightarrow^i w'$.
A language $L$ is called context-free iff there is a CFG $G$ such that $L = L(G)$.

Basic Definitions: CFG (3)
Proposition 1 (Pumping lemma for context-free languages)
Let $L$ be a context-free language. Then there is a constant $c$ such that for all $w \in L$ with $|w| \geq c$: $w = x v_1 y v_2 z$ with
- $|v_1 v_2| \geq 1$,
- $|v_1 y v_2| \leq c$, and
- for all $i \geq 0$: $x v_1^i y v_2^i z \in L$.
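The conditions of the pumping lemma can be explored by brute force on small words. The sketch below is an illustration only, not a proof: it fixes one candidate constant `c`, enumerates every decomposition $w = x v_1 y v_2 z$ that satisfies the two length conditions, and tests the pumped words only for $i = 0, \ldots, 3$ (an arbitrary cut-off). It finds a pumpable decomposition for a word of $\{a^n b^n \mid n \geq 1\}$ and none for a word of the copy language. All names are hypothetical.

```python
def in_anbn(w):
    """Membership in {a^n b^n | n >= 1}."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def in_copy(w):
    """Membership in the copy language {ww}."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]

def decompositions(w, c):
    """All splits w = x v1 y v2 z with |v1 v2| >= 1 and |v1 y v2| <= c."""
    n = len(w)
    for i in range(n + 1):                              # x  = w[:i]
        end = min(n, i + c)                             # v1 y v2 ends at or before here
        for j in range(i, end + 1):                     # v1 = w[i:j]
            for k in range(j, end + 1):                 # y  = w[j:k]
                for m in range(k, end + 1):             # v2 = w[k:m]
                    if (j - i) + (m - k) >= 1:
                        yield w[:i], w[i:j], w[j:k], w[k:m], w[m:]

def pumpable(w, c, member, max_i=3):
    """Return a split whose pumped variants are members for i = 0..max_i, else None."""
    for x, v1, y, v2, z in decompositions(w, c):
        if all(member(x + v1 * i + y + v2 * i + z) for i in range(max_i + 1)):
            return x, v1, y, v2, z
    return None

print(pumpable("aaabbb", 3, in_anbn))          # some split with v1 in the a's, v2 in the b's
print(pumpable("aaabbbaaabbb", 3, in_copy))    # no split survives the check
```

For the copy-language word $a^c b^c a^c b^c$ the window $v_1 y v_2$ of length at most $c$ touches at most two neighbouring blocks, so pumping always destroys the match between the two halves; the search merely confirms this for $c = 3$.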
Basic Definitions: CFG (4)
Proposition 2
Context-free languages are closed under homomorphisms, i.e., for alphabets $T_1, T_2$ and for every context-free language $L_1 \subseteq T_1^*$ and every homomorphism $h : T_1^* \to T_2^*$, $h(L_1) = \{h(w) \mid w \in L_1\}$ is a context-free language.
Proposition 3
Context-free languages are closed under intersection with regular languages, i.e., for every context-free language $L$ and every regular language $L_r$, $L \cap L_r$ is a context-free language.
Proposition 4
The copy language $\{ww \mid w \in \{a, b\}^*\}$ is not context-free.

Basic Definitions: Trees (1)
Definition 6 (Directed Graph)
1. A directed graph is a pair $\langle V, E \rangle$ where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges.
2. For every $v \in V$, we define the in-degree of $v$ as $|\{v' \in V \mid \langle v', v \rangle \in E\}|$ and the out-degree of $v$ as $|\{v' \in V \mid \langle v, v' \rangle \in E\}|$.
$E^+$ is the transitive closure of $E$ and $E^*$ is the reflexive transitive closure of $E$.

Basic Definitions: Trees (2)
Definition 7 (Tree)
A tree is a triple $\gamma = \langle V, E, r \rangle$ such that
- $\langle V, E \rangle$ is a directed graph and $r \in V$ is a special node, the root node,
- $\gamma$ contains no cycles, i.e., there is no $v \in V$ such that $\langle v, v \rangle \in E^+$,
- only the root $r \in V$ has in-degree 0,
- every vertex $v \in V$ is accessible from $r$, i.e., $\langle r, v \rangle \in E^*$, and
- all nodes $v \in V \setminus \{r\}$ have in-degree 1.
A vertex with out-degree 0 is called a leaf. The vertices in a tree are also called nodes.

Basic Definitions: Trees (3)
Definition 8 (Ordered Tree)
A tree is ordered if it has an additional linear precedence relation $\prec \subseteq V \times V$ such that
- $\prec$ is irreflexive, antisymmetric and transitive,
- for all $v_1, v_2$ with $\{\langle v_1, v_2 \rangle, \langle v_2, v_1 \rangle\} \cap E^* = \emptyset$: either $v_1 \prec v_2$ or $v_2 \prec v_1$, and if there is either a $\langle v_3, v_1 \rangle \in E$ with $v_3 \prec v_2$ or a $\langle v_4, v_2 \rangle \in E$ with $v_1 \prec v_4$, then $v_1 \prec v_2$, and
- nothing else is in $\prec$.
We use Gorn addresses for nodes in ordered trees: The root address is $\varepsilon$, and the $j$th child of a node with address $p$ has address $pj$.
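Gorn addresses are easy to compute once an ordered tree is written down as nested data. The following minimal sketch assumes a hypothetical encoding of a labeled ordered tree as `(label, [children])` and assumes that children are numbered from 1. Addresses are kept as tuples of integers so that a node with ten or more children would not produce ambiguous digit strings.

```python
def gorn_addresses(tree, address=()):
    """Map each Gorn address (a tuple of child indices) to the label of its node."""
    label, children = tree
    table = {address: label}                    # the root of this subtree
    for j, child in enumerate(children, start=1):
        # the j-th child of the node with address p gets address pj
        table.update(gorn_addresses(child, address + (j,)))
    return table

# The tree for "the man" built with NP -> D N, as (label, [children]).
tree = ("NP", [("D", [("the", [])]), ("N", [("man", [])])])

for address, label in gorn_addresses(tree).items():
    # print the empty address of the root as ε
    print("".join(map(str, address)) or "ε", label)
```

In this encoding the edge relation $E$ and the precedence relation $\prec$ are implicit: a node's children are its list entries, and list order gives the left-to-right order of sisters.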
Basic Definitions: Trees (4)
Definition 9 (Labeling)
A labeling of a graph $\gamma = \langle V, E \rangle$ over a signature $\langle A_1, A_2 \rangle$ is a pair of functions $l : V \to A_1$ and $g : E \to A_2$ with $A_1, A_2$ possibly distinct.
Definition 10 (Syntactic tree)
Let $N$ and $T$ be disjoint alphabets of non-terminal and terminal symbols. A syntactic tree (over $N$ and $T$) is an ordered finite labeled tree such that $l(v) \in N$ for each vertex $v$ with out-degree at least 1 and $l(v) \in (N \cup T \cup \{\varepsilon\})$ for each leaf $v$.

Basic Definitions: Trees (5)
Definition 11 (Tree Language of a CFG)
Let $G = \langle N, T, P, S \rangle$ be a CFG.
1. A syntactic tree $\langle V, E, r \rangle$ over $N$ and $T$ is a parse tree in $G$ iff
   - $l(v) \in (T \cup \{\varepsilon\})$ for each leaf $v$,
   - for every $v_0, v_1, \ldots, v_n \in V$, $n \geq 1$, such that $\langle v_0, v_i \rangle \in E$ for $1 \leq i \leq n$ and $\langle v_i, v_{i+1} \rangle \in \prec$ for $1 \leq i < n$: $l(v_0) \to l(v_1) \ldots l(v_n) \in P$.
2. A parse tree $\langle V, E, r \rangle$ is a derivation tree in $G$ iff $l(r) = S$.
3. The tree language of $G$ is $L_T(G) = \{\gamma \mid \gamma \text{ is a derivation tree in } G\}$.

Basic Definitions: Trees (6)
Definition 12 (Weak and Strong Equivalence)
Let $F_1, F_2$ be two grammar formalisms.
- $F_1$ and $F_2$ are weakly equivalent iff for each instance $G_1$ of $F_1$ there is an instance $G_2$ of $F_2$ that generates the same string language and vice versa.
- $F_1$ and $F_2$ are strongly equivalent iff for both formalisms the notion of a tree language is defined and, furthermore, for each instance $G_1$ of $F_1$ there is an instance $G_2$ of $F_2$ that generates the same tree language and vice versa.

References
Boullier, Pierre. 2000. Range Concatenation Grammars. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT2000), pages 53-64, Trento, Italy, February.
Hopcroft, John E. and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison Wesley.
Joshi, Arvind K. 1985. Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing. Cambridge University Press, pages 206-250.
Joshi, Arvind K., Leon S. Levy, and Masako Takahashi. 1975. Tree Adjunct Grammars.
Journal of Computer and System Sciences, 10:136-163.
Joshi, Arvind K. and Yves Schabes. 1997. Tree-Adjoining Grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages. Springer, Berlin, pages 69-123.
Savitch, Walter J., Emmon Bach, William Marsh, and Gila Safran-Naveh, editors. 1987. The Formal Complexity of Natural Language. Studies in Linguistics and Philosophy. Reidel, Dordrecht, Holland.
Seki, Hiroyuki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88(2):191-229.
Shieber, Stuart M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:333-343. Reprinted in (Savitch et al., 1987).
Vijay-Shanker, K., David J. Weir, and Arvind K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of ACL, Stanford.
Weir, David J. 1988. Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. thesis, University of Pennsylvania.