Around Context-Free Grammars - a Normal Form, a Representation Theorem, and a Regular Approximation

arXiv:1512.09207v1 [cs.FL] 31 Dec 2015

Liliana Cojocaru
School of Information Sciences, Computer Science
University of Tampere
Liliana.Cojocaru@uta.fi

Abstract

We introduce a normal form for context-free grammars, called Dyck normal form. This is a syntactical restriction of the Chomsky normal form, in which the two nonterminals occurring on the right-hand side of a rule are paired nonterminals. This pairwise property allows us to define a homomorphism from Dyck words to words generated by a grammar in Dyck normal form. We prove that for each context-free language $L$, there exist an integer $K$ and a homomorphism $\varphi$ such that $L = \varphi(D'_K)$, where $D'_K \subseteq D_K$, and $D_K$ is the one-sided Dyck language over $K$ letters. Through a transition-like diagram for a context-free grammar in Dyck normal form, we effectively build a regular language $R$ such that $D'_K = R \cap D_K$, which leads to the Chomsky-Schützenberger theorem. Using graphical approaches we refine $R$ such that the Chomsky-Schützenberger theorem still holds. Based on this readjustment we sketch a transition diagram for a regular grammar that generates a regular superset approximation for the initial context-free language.

Keywords: linear languages, context-free languages, Dyck languages, Chomsky normal form, Dyck normal form, Chomsky-Schützenberger theorem, regular approximation

Introduction

A normal form for context-free grammars consists of restrictions imposed on the structure of the grammar's productions. These restrictions concern the number of terminals and nonterminals allowed on the right-hand sides of the rules, or the manner in which terminals and nonterminals are arranged in the rules. Normal forms have turned out to be useful tools in studying syntactical properties of context-free grammars, in parsing theory, structural and descriptional complexity, inference and learning theory. Various normal forms for context-free grammars have been studied so far, but the most important remain the Chomsky normal form [17], Greibach normal form [12], and operator normal form [17].
For definitions, results, and surveys on normal forms the reader is referred to [5], [17], and [20]. A normal form is correct if it preserves the language generated by the original grammar. This condition is called the weak equivalence, i.e., a normal form preserves the language but may lose important syntactical or semantical properties of the original grammar. The more syntactical, semantical, or ambiguity properties a normal form preserves, the stronger it is. It is well known that the Chomsky normal form is a strong normal form.
This paper is partly devoted to a new normal form for context-free grammars, called Dyck normal form. The Dyck normal form is a syntactical restriction of the Chomsky normal form, in which the two nonterminals occurring on the right-hand side of a rule are paired nonterminals, in the sense that each left (right) nonterminal of a pair has a unique right (left) pairwise. This pairwise property imposed on the structure of the right-hand side of each rule induces a nested structure on the derivation tree of each word generated by a grammar in Dyck normal form. More precisely, each derivation tree of a word generated by a grammar in Dyck normal form, read in the depth-first search order, is a Dyck word, hence the name of the normal form. Furthermore, there always exists a homomorphism between the derivation tree of a word generated by a grammar in Chomsky normal form and its equivalent in Dyck normal form. In other words, the Chomsky and Dyck normal forms are strongly equivalent. This property, along with several other terminal rewriting conditions imposed on a grammar in Dyck normal form, enables us to define a homomorphism from Dyck words to words generated by a grammar in Dyck normal form. We have been inspired to develop this normal form by the general theory of Dyck words and Dyck languages, which turned out to play a crucial role in the description and characterization of context-free languages [9], [10], and [19].

The definition and several properties of grammars in Dyck normal form are presented in Section 1. For each context-free grammar $G$ in Dyck normal form we define, in Section 2, the trace language associated with derivations in $G$, which is the set of all derivation trees of $G$ read in the depth-first search order, starting from the grammar axiom. By exploiting the Dyck normal form, and several characterizations of Dyck languages presented in [19], we give a new characterization of context-free languages in terms of Dyck languages.
We prove (also in Section 2) that for each context-free language $L$, generated by a grammar $G$ in Dyck normal form, there exist an integer $K$ and a homomorphism $\varphi$ such that $L = \varphi(D'_K)$, where $D'_K$ (a subset of the Dyck language over $K$ letters) equals, with very few exceptions, the trace language associated with $G$. In Section 3 we show that the representation theorem in Section 2 leads, through a transition-like diagram for context-free grammars in Dyck normal form, to the Chomsky-Schützenberger theorem. By improving this transition diagram, in Section 4 we refine the regular language provided by the Chomsky-Schützenberger theorem, while in Section 5 we show that the refined graphical representation of derivations in a context-free grammar in Dyck normal form, used in the previous sections, provides a framework for a regular grammar that generates a regular superset approximation for the initial context-free language.

The method used throughout this paper is graph-constructive, in the sense that it supplies a graphical interpretation of the Chomsky-Schützenberger theorem, and consequently it shows how to graphically build a regular language (as minimal as possible) that satisfies this theorem. Even if we reach the same famous Chomsky-Schützenberger theorem, the method used to approach it is different from the other methods known in the literature. In brief, the method in [17] is based on pushdown approaches, while that in [11] uses some kind of imaginary brackets that simulate the work of a pushdown store when deriving a context-free language. The method presented in [1] uses equations on languages and algebraical approaches to derive several types of Dyck language generators for context-free languages. In all these works, the Dyck language is somehow hidden behind the derivative structure of the context-free language (supplementary brackets are needed to derive a Dyck language generator for a context-free language). The Dyck language provided in this paper is merely found through a pairwise-renaming procedure of the nonterminals in the original context-free grammar. Hence, it lies inside the context-free grammar it describes.

Each method used in the literature to prove the Chomsky-Schützenberger theorem provides its own regular language. Our aim is to find a thinner regular language that satisfies the Chomsky-Schützenberger theorem (with respect to the method used here) and to use this language to build a regular superset approximation for context-free languages (likely to be as thin as possible). Note that the concept of a thinner (minimal) regular language, for the Chomsky-Schützenberger theorem and for the regular superset approximation, is relative, in the sense that it depends on the structure of the grammar in Dyck normal form used to generate the original context-free language. In [2], [14], [15], and [16] it is proved that there is no algorithm that builds, for an arbitrary context-free language $L$, the minimal context-free grammar that generates $L$, where the minimality of a context-free grammar is considered, in principle, with respect to descriptional measures such as number of nonterminals, rules, and loops (i.e., grammatical levels [14]) encountered during derivations in a context-free grammar. Consequently, there is no algorithm to build a minimal regular superset approximation for an arbitrary context-free language. Attempts to find optimal regular superset (subset) approximations for context-free languages can be found in [4], [6], [21], and [23]. In Sections 3, 4, and 5 we also illustrate, through several examples, the manner in which the regular languages provided by the Chomsky-Schützenberger theorem and by the regular approximation can be built with the method proposed in this paper.

1 Dyck Normal Form

We assume the reader to be familiar with the basic notions of formal language theory [17]. For an alphabet $X$, $X^*$ denotes the free monoid generated by $X$.
By $|x|_a$ we denote the number of occurrences of the letter $a$ in the string $x \in X^*$, while $|x|$ is the length of $x$. We denote by $\lambda$ the empty string. If $X$ is a finite set, then $|X|$ is the cardinality of $X$.

Definition 1.1. A context-free grammar¹ $G = (N, T, P, S)$ is said to be in Dyck normal form if it satisfies the following conditions:
1. $G$ is in Chomsky normal form,
2. if $A \to a \in P$, $A \in N$, $A \neq S$, $a \in T$, then no other rule in $P$ rewrites $A$,
3. for each $A \in N$ such that $X \to AB \in P$ ($X \to BA \in P$) there is no other rule in $P$ of the form $X' \to B'A$ ($X' \to AB'$),
4. for each rules $X \to AB$, $X' \to A'B$ ($X \to AB$, $X' \to AB'$), we have $A = A'$ ($B = B'$).

¹ A context-free grammar is denoted by $G = (N, T, P, S)$, where $N$ and $T$ are finite sets of variables and terminals, respectively, $N \cap T = \emptyset$, $S \in N - T$ is the grammar axiom, and $P \subseteq N \times (N \cup T)^*$ is the finite set of productions.
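The four conditions of Definition 1.1 can be checked mechanically. The following Python sketch is only an illustration and is not part of the original construction: the function name `is_dyck_normal_form` and the encoding of productions as `(lhs, rhs)` pairs with `rhs` a tuple are assumptions made here. Condition 3 is read as "no nonterminal occurs both as a left and as a right symbol of a right-hand side", and condition 4 as "each left symbol has a unique right partner and vice versa". Rules of the form $S \to \lambda$ are not handled.

```python
def is_dyck_normal_form(nonterminals, terminals, productions, start):
    """Return True if the grammar satisfies conditions 1-4 of Definition 1.1.

    productions: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    """
    rules = set(productions)          # ignore accidental duplicates
    pairs = set()                     # (left, right) pairs seen on right-hand sides
    rewritten_by_terminal = set()     # nonterminals A with a rule A -> a
    rule_count = {}                   # number of rules per left-hand side

    for lhs, rhs in rules:
        rule_count[lhs] = rule_count.get(lhs, 0) + 1
        if len(rhs) == 2 and rhs[0] in nonterminals and rhs[1] in nonterminals:
            pairs.add(rhs)
        elif len(rhs) == 1 and rhs[0] in terminals:
            rewritten_by_terminal.add(lhs)
        else:
            return False              # condition 1: not in Chomsky normal form

    # Condition 2: a non-axiom nonterminal with a terminal rule has no other rule.
    for a in rewritten_by_terminal:
        if a != start and rule_count[a] > 1:
            return False

    # Condition 3: left occurrences and right occurrences are disjoint.
    lefts = {left for left, _ in pairs}
    rights = {right for _, right in pairs}
    if lefts & rights:
        return False

    # Condition 4: every left symbol has exactly one right partner and vice versa.
    right_of, left_of = {}, {}
    for left, right in pairs:
        if right_of.setdefault(left, right) != right:
            return False
        if left_of.setdefault(right, left) != left:
            return False
    return True
```

For instance, the grammar with rules $S \to AB$, $A \to a$, $B \to b$ passes the check, while adding $S \to BA$ violates condition 3 under the reading above.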
Note that the reasons for which we introduce the restrictions at items 2-4 are the following. The condition at item 2 allows us to make a partition between those nonterminals rewritten by nonterminals, and those nonterminals rewritten by terminals (with the exception of the axiom). This enables us, in Section 2, to define a homomorphism from Dyck words to words generated by a grammar in Dyck normal form. Conditions at items 3 and 4 allow us to split the set of nonterminals into pairwise nonterminals, and thus to introduce bracketed pairs. The next theorem proves that the Dyck normal form is correct.

Theorem 1.2. For each context-free grammar $G = (N, T, P, S)$ there exists a grammar $G' = (N', T, P', S)$ such that $L(G) = L(G')$, where $G'$ is in Dyck normal form.

Proof. Suppose that $G$ is a context-free grammar in Chomsky normal form. Otherwise, using the algorithm described in [20] we can convert $G$ into Chomsky normal form. To convert $G$ from Chomsky normal form into Dyck normal form we proceed as follows.

Step 1. We check whether $P$ contains two (or more) rules of the form $A \to a$, $A \to b$, $a \neq b$. If it does, then for each rule $A \to b$, $a \neq b$, a new variable $A_b$ is introduced. We add the new rule $A_b \to b$, and remove $A \to b$. For each rule $X \to AB$ ($X \to BA$) we add the new rule $X \to A_b B$ ($X \to B A_b$), while for a rule of the form $X \to AA$ we add three new rules $X \to A_b A$, $X \to A A_b$, $X \to A_b A_b$, without removing the initial rules. We call this procedure an $A_b$-terminal substitution of $A$. For each rule $A \to a$, $a \in T$, we check whether a rule of the form $A \to B_1 B_2$, $B_1, B_2 \in N$, exists in $P$. If it does, then a new nonterminal $A_a$ is introduced and an $A_a$-terminal substitution of $A$ for the rule $A \to a$ is performed.

Step 2. Suppose there exist two (or more) rules of the form $X \to AB$ and $X' \to B'A$. If we have agreed on preserving only the left occurrences of $A$ on the right-hand sides, then according to condition 3 of Definition 1.1, we have to remove all right occurrences of $A$. To do so we introduce a new nonterminal ${}_Z A$, and all right occurrences of $A$ that are preceded at the left side by $Z$ in the right-hand side of a rule are substituted by ${}_Z A$.
For each rule that rewrites $A$, $A \to Y$, $Y \in N^2 \cup T$, we add a new rule of the form ${}_Z A \to Y$, preserving the rule $A \to Y$. We call this procedure a ${}_Z A$-nonterminal substitution of $A$. According to this procedure, for the rule $X' \to B'A$, we introduce a new nonterminal ${}_{B'}A$, we add the rule $X' \to B'\,{}_{B'}A$, and remove the rule $X' \to B'A$. For each rule that rewrites $A$, of the form² $A \to Y$, $Y \in N^2 \cup T$, we add a new rule of the form ${}_{B'}A \to Y$, preserving the rule $A \to Y$.

Step 3. Finally, for each two rules $X \to AB$, $X' \to A'B$ ($X \to BA$, $X' \to BA'$) with $A \neq A'$, a new nonterminal ${}_{A'}B$ ($B_{A'}$) is introduced to replace $B$ in the second rule, and we perform an ${}_{A'}B$ ($B_{A'}$)-nonterminal substitution of $B$, i.e., we add $X' \to A'\,{}_{A'}B$, and remove $X' \to A'B$. For each rule that rewrites $B$, of the form $B \to Y$, $Y \in N^2 \cup T$, we add a new rule ${}_{A'}B \to Y$, preserving $B \to Y$. In the case that $A'$ occurs on the right-hand side of another rule, such that $A'$ matches at the right side with another nonterminal different from ${}_{A'}B$, then the procedure described above is repeated for $A'$, too.

Note that, if one of the conditions 2, 3, and 4 in Definition 1.1 has been settled, we do not have to resolve it once again in further steps of the procedure. The new grammar $G'$ built as described at steps 1, 2, and 3 has the set of nonterminals $N'$ and the set of productions $P'$ composed of all nonterminals from $N$ and productions from $P$, plus/minus all nonterminals and productions, respectively, introduced/removed according to the substitutions performed during the above steps.

² This case deals with the possibility of having $Y = B'\,{}_{B'}A$, too.

Next we prove that grammars $G = (N, T, P, S)$ in Chomsky normal form, and $G' = (N', T, P', S)$ in Dyck normal form, generate the same language. Consider the homomorphism $h_d : N' \cup T \to N \cup T$ defined by $h_d(x) = x$, $x \in T$, $h_d(X) = X$, for $X \in N$, and $h_d(X') = X$ for $X' \in N' - N$, $X \in N$ such that $X'$ is a (transitive³) $X'$-substitution of $X$, terminal or not, in the above construction of the grammar $G'$.

To prove that $L(G') \subseteq L(G)$ we extend $h_d$ to a homomorphism from $(N' \cup T)^*$ to $(N \cup T)^*$ defined on the classical concatenation operation. It is straightforward to prove by induction that for each $\alpha \Rightarrow^*_{G'} \delta$ we have $h_d(\alpha) \Rightarrow^*_{G} h_d(\delta)$. This implies that for any derivation of a word $w \in L(G')$, i.e., $S \Rightarrow^*_{G'} w$, we have $h_d(S) \Rightarrow^*_{G} h_d(w)$, i.e., $S \Rightarrow^*_{G} w$, or equivalently, $L(G') \subseteq L(G)$.

To prove that $L(G) \subseteq L(G')$ we make use of the CYK (Cocke-Younger-Kasami) algorithm as described in [20]. Let $w = a_1 a_2 \ldots a_n$ be an arbitrary word in $L(G)$, and $V_{ij}$, $i \le j$, $i, j \in \{1, \ldots, n\}$, be the triangular matrix of size $n \times n$ built with the CYK algorithm. Since $w \in L(G)$, we have $S \in V_{1n}$. We prove that $w \in L(G')$, i.e., $S \in V'_{1n}$, where $V'_{ij}$, $i \le j$, $i, j \in \{1, \ldots, n\}$, forms the triangular matrix obtained by applying the CYK algorithm to $w$ according to the productions of $G'$.

We consider two relations $\hat{h}_t \subseteq (N \cup T) \times (N' \cup T)$ and $\hat{h}_{\neg t} \subseteq N \times N'$. The first relation is defined by $\hat{h}_t(x) = x$, $x \in T$, $\hat{h}_t(S) = S$, if $S \to t$, $t \in T$, is a rule in $G$, and $\hat{h}_t(X) = X'$, if $X'$ is a (transitive) $X'$-terminal substitution⁴ of $X$, and $X \to t$ is a rule in $G$. Finally, $\hat{h}_t(X) = X$ if $X \to t \in P'$, $t \in T$. The second relation is defined as $\hat{h}_{\neg t}(S) = S$, $\hat{h}_{\neg t}(X) = \{X\} \cup \{X' \mid X'$ is a (transitive) $X'$-nonterminal substitution of $X\}$, and $\hat{h}_{\neg t}(X) = X$, if there is no substitution of $X$ and no rule of the form $X \to t$, $t \in T$, in $G$. Note that $\hat{h}_x(X_1 X_2) = \hat{h}_x(X_1)\hat{h}_x(X_2)$, for $X_i \in N$, $i \in \{1, 2\}$, $x \in \{t, \neg t\}$. Using $\hat{h}_t$, each rule $X \to t$ in $P$ has a corresponding set of rules $\{X' \to t \mid X' \in \hat{h}_t(X),\ X \to t \in P\}$ in $P'$.
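Each rule $A \to BC$ in $P$ has a corresponding set of rules $\{A' \to B'C' \mid A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ B'$ and $C'$ are pairwise nonterminals, $A \to BC \in P\}$ in $P'$. Consider $V'_{ii} = \hat{h}_t(V_{ii})$ and $V'_{ij} = \hat{h}_{\neg t}(V_{ij})$, $i < j$, $i, j \in \{1, \ldots, n\}$. We claim that $V'_{ij}$, $i, j \in \{1, \ldots, n\}$, $i \le j$, defines the triangular matrix obtained by applying the CYK algorithm to rules that derive $w$ in $G'$.

First, observe that for $i = j$, we have $V'_{ii} = \hat{h}_t(V_{ii}) = \{A \mid A \to a_i \in P'\}$, $i \in \{1, \ldots, n\}$, due to the definition of $\hat{h}_t$. Now let us consider $k = j - i$, $k \in \{1, \ldots, n-1\}$. We want to compute $V'_{ij}$, $i < j$. By definition, we have

$$V_{ij} = \bigcup_{l=i}^{j-1} \{A \mid A \to BC,\ B \in V_{il},\ C \in V_{l+1\,j}\},$$

so that

$$V'_{ij} = \hat{h}_{\neg t}(V_{ij}) = \hat{h}_{\neg t}\Big(\bigcup_{l=i}^{j-1} \{A \mid A \to BC,\ B \in V_{il},\ C \in V_{l+1\,j}\}\Big) = \bigcup_{l=i}^{j-1} \hat{h}_{\neg t}\big(\{A \mid A \to BC,\ B \in V_{il},\ C \in V_{l+1\,j}\}\big)$$

$$= \bigcup_{l=i}^{j-1} \{A' \mid A' \to B'C',\ A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ B \in V_{il},\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ C \in V_{l+1\,j},\ B' \text{ and } C' \text{ are pairwise nonterminals},\ A \to BC \in P\}.$$

Let us explicitly develop the last union.

³ There exist $X_k \in N'$, such that $X'$ is an $X'$-substitution of $X_k$, $X_k$ is an $X_k$-substitution of $X_{k-1}$, ..., and $X_1$ is an $X_1$-substitution of $X$. All of them substitute $X$.

⁴ There may exist several terminal/nonterminal substitutions for the same nonterminal $X$. This makes $\hat{h}_t$/$\hat{h}_{\neg t}$ a relation.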
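If $k = 1$, then $l \in \{i\}$. For each $i \in \{1, \ldots, n-1\}$ we have $V'_{i\,i+1} = \{A' \mid A' \to B'C',\ A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ B \in V_{ii},\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ C \in V_{i+1\,i+1},\ B'$ and $C'$ are pairwise nonterminals, $A \to BC \in P\}$. Due to the fact that $B \in V_{ii}$ and $C \in V_{i+1\,i+1}$, $B'$ is a terminal substitution of $B$, while $C'$ is a terminal substitution of $C$. Therefore, we have $B' \notin \hat{h}_{\neg t}(B)$, $C' \notin \hat{h}_{\neg t}(C)$, so that $B' \in \hat{h}_t(B)$, for all $B \in V_{ii}$, and $C' \in \hat{h}_t(C)$, for all $C \in V_{i+1\,i+1}$, i.e., $B' \in \hat{h}_t(V_{ii}) = V'_{ii}$ and $C' \in \hat{h}_t(V_{i+1\,i+1}) = V'_{i+1\,i+1}$. Therefore, $V'_{i\,i+1} = \{A' \mid A' \to B'C',\ B' \in V'_{ii},\ C' \in V'_{i+1\,i+1}\}$.

If $k \ge 2$, then $l \in \{i, i+1, \ldots, j-1\}$, and $V'_{ij} = \bigcup_{l=i}^{j-1} \{A' \mid A' \to B'C',\ A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ B \in V_{il},\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ C \in V_{l+1\,j},\ B'$ and $C'$ are pairwise nonterminals, $A \to BC \in P\}$. We now compute the first set of the above union, i.e., $V'_i = \{A' \mid A' \to B'C',\ A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ B \in V_{ii},\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ C \in V_{i+1\,j},\ B'$ and $C'$ are pairwise nonterminals, $A \to BC \in P\}$. By the same reasoning as before, the condition $B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B)$, $B \in V_{ii}$, is equivalent to $B' \in \hat{h}_t(V_{ii}) = V'_{ii}$. Because $i + 1 \neq j$, $C'$ is a nonterminal substitution of $C$. Therefore, $C' \notin \hat{h}_t(C)$, and the condition $C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C)$, $C \in V_{i+1\,j}$, is equivalent to $C' \in \hat{h}_{\neg t}(V_{i+1\,j}) = V'_{i+1\,j}$. So that $V'_i = \{A' \mid A' \to B'C',\ B' \in V'_{ii},\ C' \in V'_{i+1\,j}\}$. Using the same method for each $l \in \{i+1, \ldots, j-1\}$ we have $V'_l = \{A' \mid A' \to B'C',\ A' \in \hat{h}_{\neg t}(A),\ B' \in \hat{h}_{\neg t}(B) \cup \hat{h}_t(B),\ B \in V_{il},\ C' \in \hat{h}_{\neg t}(C) \cup \hat{h}_t(C),\ C \in V_{l+1\,j},\ B'$ and $C'$ are pairwise nonterminals, $A \to BC \in P\} = \{A' \mid A' \to B'C',\ B' \in V'_{il},\ C' \in V'_{l+1\,j}\}$.

In conclusion, $V'_{ij} = \bigcup_{l=i}^{j-1} \{A' \mid A' \to B'C',\ B' \in V'_{il},\ C' \in V'_{l+1\,j}\}$, for each $i, j \in \{1, \ldots, n\}$, i.e., $V'_{ij}$, $i \le j$, contains the nonterminals of the $n \times n$ triangular matrix computed by applying the CYK algorithm to rules that derive $w$ in $G'$. Because $w \in L(G)$, we have $S \in V_{1n}$. That is equivalent to $S \in V'_{1n} = \hat{h}_t(V_{1n})$, if $n = 1$, and $S \in V'_{1n} = \hat{h}_{\neg t}(V_{1n})$, if $n > 1$, i.e., $w \in L(G')$.

Corollary 1.3. Let $G$ be a context-free grammar in Dyck normal form. Any terminal derivation in $G$ producing a word of length $n$, $n \ge 1$, takes $2n - 1$ steps.

Proof. If $G$ is a context-free grammar in Dyck normal form, then it is also in Chomsky normal form, and all properties of the latter hold.

Corollary 1.4.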
If $G = (N, T, P, S)$ is a grammar in Chomsky normal form, and $G' = (N', T, P', S)$ its equivalent in Dyck normal form, then there exists a homomorphism $h_d : N' \cup T \to N \cup T$, such that any derivation tree of $w \in L(G)$ is the homomorphic image of a derivation tree of the same word in $G'$.

Proof. Consider the homomorphism $h_d : N' \cup T \to N \cup T$ defined as $h_d(A_t) = h_d({}_Z A) = h_d(A_Z) = A$, for each $A_t$-terminal or ${}_Z A$ ($A_Z$)-nonterminal substitution of $A$, and $h_d(t) = t$, $t \in T$. The claim is a direct consequence of the way in which the new nonterminals $A_t$, ${}_Z A$, and $A_Z$ have been chosen.

Note that, due to the pairwise-renaming procedure used to reach the Dyck normal form, it may appear that a context-free grammar in Dyck normal form is more ambiguous than the original grammar in Chomsky normal form. However, this is relative. The derivation trees of a certain word have the same structure in both grammars, in Chomsky normal form and in Dyck normal form (only some labels of the nodes in these trees differ). The apparent ambiguity can be resolved through the homomorphism $h_d$ considered in Corollary 1.4.

Let $G$ be a grammar in Dyck normal form. To emphasize the pairwise brackets occurring on the right-hand side of a rule, and also to make the connection with the Dyck language, each pair $(A, B)$, such that there exists a rule of the form $X \to AB$, is replaced by an indexed pair of brackets $[_i, ]_i$. In each rule that rewrites $A$ and $B$, we replace $A$ by $[_i$, and $B$ by $]_i$, respectively. Next we present an example of the conversion procedure described in the proof of Theorem 1.2 along with the homomorphism considered in Corollary 1.4.

Example 1.5. Consider the context-free grammar in Chomsky normal form $G = (\{E_0, E, E_1, E_2, T, T_1, T_2, R\}, \{+, *, a\}, E_0, P)$, where $P = \{E_0 \to a \,/\, T T_1 \,/\, E E_1,\ E \to a \,/\, T T_1 \,/\, E E_1,\ T \to a \,/\, T T_1,\ T_1 \to T_2 R,\ E_1 \to E_2 T,\ T_2 \to *,\ E_2 \to +,\ R \to a\}$.

To convert $G$ into Dyck normal form, with respect to Definition 1.1, item 2, we first remove $E \to a$ and $T \to a$. Then, according to item 3, we remove the right occurrence of $T$ from the rule $E_1 \to E_2 T$, along with other transformations that may be required after completing these procedures. Let $E_3$ and $T_3$ be two new nonterminals. We remove $E \to a$ and $T \to a$, and add the rules $E_3 \to a$, $T_3 \to a$, $E_0 \to E_3 E_1$, $E_0 \to T_3 T_1$, $E \to E_3 E_1$, $E \to T_3 T_1$, $E_1 \to E_2 T_3$, $T \to T_3 T_1$. Let $T'$ be the new nonterminal that replaces the right occurrence of $T$. We add the rules $E_1 \to E_2 T'$, $T' \to T T_1$, $T' \to T_3 T_1$, and remove $E_1 \to E_2 T$. We repeat the procedure with $T_3$ (added in the previous step), i.e., we introduce a new nonterminal $T_4$, remove $E_1 \to E_2 T_3$, and add $E_1 \to E_2 T_4$ and $T_4 \to a$. Due to the new nonterminals $E_3$, $T_3$, $T_4$, item 4 does not hold. To satisfy this condition, we introduce three new nonterminals: $E_4$ to replace $E_2$ in $E_1 \to E_2 T_4$, $E_5$ to replace $E_1$ in $E_0 \to E_3 E_1$ and $E \to E_3 E_1$, and $T_5$ to replace $T_1$ in $E_0 \to T_3 T_1$ and $E \to T_3 T_1$. We remove all the above rules and add the new rules $E_1 \to E_4 T_4$, $E_4 \to +$, $E_0 \to E_3 E_5$, $E \to E_3 E_5$, $E_5 \to E_2 T'$, $E_5 \to E_4 T_4$, $E_0 \to T_3 T_5$, $E \to T_3 T_5$, and $T_5 \to T_2 R$.
The Dyck normal form of $G$, in bracketed notation, is $G' = (\{E_0, [_1, [_2, \ldots, [_7, ]_1, ]_2, \ldots, ]_7\}, \{+, *, a\}, E_0, P')$, where

$P' = \{E_0 \to a \,/\, [_1 ]_1 \,/\, [_2 ]_2 \,/\, [_3 ]_3 \,/\, [_4 ]_4,\ [_1 \to [_1 ]_1 \,/\, [_4 ]_4,\ [_2 \to [_1 ]_1 \,/\, [_2 ]_2 \,/\, [_3 ]_3 \,/\, [_4 ]_4,\ ]_1 \to [_7 ]_7,\ ]_2 \to [_5 ]_5 \,/\, [_6 ]_6,\ ]_3 \to [_5 ]_5 \,/\, [_6 ]_6,\ ]_4 \to [_7 ]_7,\ ]_5 \to [_1 ]_1 \,/\, [_4 ]_4,\ [_3 \to a,\ [_4 \to a,\ [_5 \to +,\ [_6 \to +,\ ]_6 \to a,\ [_7 \to *,\ ]_7 \to a\}$,

and $([_T, ]_{T_1}) = ([_1, ]_1)$, $([_E, ]_{E_1}) = ([_2, ]_2)$, $([_{E_3}, ]_{E_5}) = ([_3, ]_3)$, $([_{T_3}, ]_{T_5}) = ([_4, ]_4)$, $([_{E_2}, ]_{T'}) = ([_5, ]_5)$, $([_{E_4}, ]_{T_4}) = ([_6, ]_6)$, $([_{T_2}, ]_R) = ([_7, ]_7)$.

The homomorphism $h_d$ is defined as $h_d : N' \cup T \to N \cup T$, $h_d(E_0) = E_0$, $h_d([_2) = h_d([_3) = E$, $h_d(]_2) = h_d(]_3) = E_1$, $h_d([_5) = h_d([_6) = E_2$, $h_d([_1) = h_d(]_5) = h_d([_4) = h_d(]_6) = T$, $h_d(]_1) = h_d(]_4) = T_1$, $h_d([_7) = T_2$, $h_d(]_7) = R$, $h_d(t) = t$, for each $t \in T$.

The string $w = a * a * a + a$ is a word in $L(G') = L(G)$ generated, for instance, by a leftmost derivation $D'$ in $G'$ as follows.

$D' : E_0 \Rightarrow [_2 ]_2 \Rightarrow [_1 ]_1 ]_2 \Rightarrow [_4 ]_4 ]_1 ]_2 \Rightarrow a\, ]_4 ]_1 ]_2 \Rightarrow a\, [_7 ]_7 ]_1 ]_2 \Rightarrow a * ]_7 ]_1 ]_2 \Rightarrow a * a\, ]_1 ]_2 \Rightarrow a * a\, [_7 ]_7 ]_2 \Rightarrow a * a * ]_7 ]_2 \Rightarrow a * a * a\, ]_2 \Rightarrow a * a * a\, [_6 ]_6 \Rightarrow a * a * a + ]_6 \Rightarrow a * a * a + a.$

Applying $h_d$ to $D'$, in $G'$, we obtain a derivation $D$ of $w$ in $G$. If we consider $\mathcal{T}$ the derivation tree of $w$ in $G$, and $\mathcal{T}'$ the derivation tree of $w$ in $G'$, then $\mathcal{T}$ is the homomorphic image of $\mathcal{T}'$ through $h_d$.
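The leftmost derivation of Example 1.5 can be replayed mechanically to see the nested bracket structure it induces. The Python sketch below is an illustration only: the encoding of brackets as strings such as `"[2"` and `"]2"`, the dictionary `P_PRIME`, and the helper names `leftmost_trace` and `is_dyck_word` are assumptions introduced here. It applies the thirteen rules of the derivation in order, records the nonterminals rewritten after the axiom, and checks with a stack that the recorded sequence is well bracketed.

```python
# Productions P' of Example 1.5; "[i" / "]i" encode the i-th bracket pair.
P_PRIME = {
    "E0": [("a",), ("[1", "]1"), ("[2", "]2"), ("[3", "]3"), ("[4", "]4")],
    "[1": [("[1", "]1"), ("[4", "]4")],
    "[2": [("[1", "]1"), ("[2", "]2"), ("[3", "]3"), ("[4", "]4")],
    "]1": [("[7", "]7")],
    "]2": [("[5", "]5"), ("[6", "]6")],
    "]3": [("[5", "]5"), ("[6", "]6")],
    "]4": [("[7", "]7")],
    "]5": [("[1", "]1"), ("[4", "]4")],
    "[3": [("a",)], "[4": [("a",)], "[5": [("+",)], "[6": [("+",)],
    "]6": [("a",)], "[7": [("*",)], "]7": [("a",)],
}

# Right-hand sides applied, in order, by the leftmost derivation of a*a*a+a.
STEPS = [("[2", "]2"), ("[1", "]1"), ("[4", "]4"), ("a",), ("[7", "]7"),
         ("*",), ("a",), ("[7", "]7"), ("*",), ("a",), ("[6", "]6"),
         ("+",), ("a",)]


def leftmost_trace(productions, start, steps):
    """Replay a leftmost derivation; return (rewritten nonterminals, final form)."""
    form, trace = [start], []
    for rhs in steps:
        # Position of the leftmost nonterminal in the current sentential form.
        pos = next(k for k, sym in enumerate(form) if sym in productions)
        lhs = form[pos]
        if rhs not in productions[lhs]:
            raise ValueError(f"{lhs} -> {rhs} is not a rule")
        if lhs != start:              # the axiom is excluded from the record
            trace.append(lhs)
        form[pos:pos + 1] = list(rhs)
    return trace, form


def is_dyck_word(word):
    """Stack-based check that every "[i" is closed by a matching "]i"."""
    stack = []
    for sym in word:
        if sym[0] == "[":
            stack.append(sym[1:])
        elif not stack or stack.pop() != sym[1:]:
            return False
    return not stack


TRACE, WORD = leftmost_trace(P_PRIME, "E0", STEPS)
```

Under these assumptions the derivation uses $2n - 1 = 13$ steps for the word of length $n = 7$, in line with Corollary 1.3, and the recorded sequence of brackets is balanced.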
2 Characterizations of Context-Free Languages by Dyck Languages

Definition 2.1. Let $G_k = (N_k, T, P_k, S)$ be a context-free grammar in Dyck normal form with $|N_k - \{S\}| = 2k$. Let $D : S \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \ldots \Rightarrow u_{2n-1} = w$, $n \ge 2$, be a leftmost derivation of $w \in L(G_k)$. The trace-word of $w$ associated with the derivation $D$, denoted as $t_{w,D}$, is defined as the concatenation of nonterminals consecutively rewritten in $D$, excluding the axiom. The trace-language associated with $G_k$, denoted by $\mathcal{L}(G_k)$, is $\mathcal{L}(G_k) = \{t_{w,D} \mid$ for any $w \in L(G_k)$, and any leftmost derivation $D$ of $w\}$.

Note that $t_{w,D}$, $w \in L(G_k)$, can also be read from the derivation tree in the depth-first search order starting with the root, but ignoring the root and the leaves. The trace-word associated with $w$ and the leftmost derivation $D'$ in Example 1.5 is $t_{a*a*a+a,D'} = [_E\, [_T\, [_{T_3}\, ]_{T_5}\, [_{T_2}\, ]_R\, ]_{T_1}\, [_{T_2}\, ]_R\, ]_{E_1}\, [_{E_4}\, ]_{T_4}$.

Definition 2.2. A one-sided Dyck language over $k$ letters, $k \ge 1$, is a context-free language defined by the grammar $\Gamma_k = (\{S\}, T_k, P, S)$, where $T_k = \{[_1, [_2, \ldots, [_k, ]_1, ]_2, \ldots, ]_k\}$ and $P = \{S \to [_i\, S\, ]_i,\ S \to SS,\ S \to [_i\, ]_i \mid 1 \le i \le k\}$.

Let $G_k = (N_k, T, P_k, S)$ be a context-free grammar in Dyck normal form. To emphasize possible relations between the structure of trace-words in $\mathcal{L}(G_k)$ and the structure of words in the Dyck language, and also to keep control of each bracketed pair occurring on the right-hand side of each rule in $G_k$, we fix $N_k = \{S, [_1, [_2, \ldots, [_k, ]_1, ]_2, \ldots, ]_k\}$, and $P_k$ to be composed of rules of the forms $X \to [_i\, ]_i$, $1 \le i \le k$, and $Y \to t$, $X, Y \in N_k$, $t \in T$.

From [19] we have adopted the next characterizations of $D_k$, $k \ge 1$ (Definition 2.3, and Lemmas 2.4 and 2.5).

Definition 2.3. For a string $w$, let $w_{i:j}$ be its substring starting at the $i$-th position and ending at the $j$-th position. Let $h$ be a homomorphism defined as follows: $h([_1) = h([_2) = \ldots = h([_k) = [_1$, $h(]_1) = h(]_2) = \ldots = h(]_k) = ]_1$. Let $w \in D_k$, $1 \le i \le j \le |w|$, where $|w|$ is the length of $w$.
We say that $(i, j)$ is a matched pair of $w$, if $h(w_{i:j})$ is balanced, i.e., $h(w_{i:j})$ has an equal number of $[_1$'s and $]_1$'s and, in any prefix of $h(w_{i:j})$, the number of $[_1$'s is greater than or equal to the number of $]_1$'s.

Lemma 2.4. A string $w \in \{[_1, ]_1\}^*$ is in $D_1$ if and only if it is balanced.

Consider the homomorphisms defined as follows (where $\lambda$ is the empty string):

$h_1([_1) = [_1$, $h_1(]_1) = ]_1$, $h_1([_2) = h_1(]_2) = \ldots = h_1([_k) = h_1(]_k) = \lambda$,
$h_2([_2) = [_1$, $h_2(]_2) = ]_1$, $h_2([_1) = h_2(]_1) = \ldots = h_2([_k) = h_2(]_k) = \lambda$,
$\ldots$
$h_k([_k) = [_1$, $h_k(]_k) = ]_1$, $h_k([_1) = h_k(]_1) = \ldots = h_k([_{k-1}) = h_k(]_{k-1}) = \lambda$.

Lemma 2.5. We have $w \in D_k$, $k \ge 2$, if and only if the following conditions hold: i) $(1, |w|)$ is a matched pair, and ii) for all matched pairs $(i, j)$, $h_{k'}(w_{i:j})$ are in $D_1$, where $1 \le k' \le k$.

Definition 2.6. Let $w \in D_k$. $(i, j)$ is a nested pair of $w$ if $(i, j)$ is a matched pair, and either $j = i + 1$, or $(i + 1, j - 1)$ is a matched pair.
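The notions of matched pair, nested pair, and the two conditions of Lemma 2.5 can be tried out on small bracket strings. The following Python sketch is a brute-force illustration under assumptions made here: brackets are encoded as strings such as `"[1"` and `"]2"`, positions are 1-based as in the text, and the function names are hypothetical. It is meant for experimenting with the definitions, not as an efficient membership test.

```python
def balanced(seq):
    """True if seq (a list of "[" and "]") is balanced in the sense of Definition 2.3."""
    depth = 0
    for c in seq:
        depth += 1 if c == "[" else -1
        if depth < 0:                 # some prefix has more "]" than "["
            return False
    return depth == 0


def h(word):
    """Forget the indices: every "[i" becomes "[", every "]i" becomes "]"."""
    return [sym[0] for sym in word]


def h_k(word, k):
    """Keep only the brackets with index k (all others map to the empty string)."""
    return [sym[0] for sym in word if sym[1:] == str(k)]


def is_matched(word, i, j):
    """(i, j) is a matched pair if h(w_{i:j}) is balanced (1-based, inclusive)."""
    return 1 <= i <= j <= len(word) and balanced(h(word[i - 1:j]))


def matched_pairs(word):
    n = len(word)
    return [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)
            if is_matched(word, i, j)]


def is_nested(word, i, j):
    """Definition 2.6: matched, and either adjacent or the interior is matched."""
    return is_matched(word, i, j) and (j == i + 1 or is_matched(word, i + 1, j - 1))


def satisfies_lemma_2_5(word, k):
    """Check conditions i) and ii) of Lemma 2.5 for a word over k bracket pairs."""
    if not is_matched(word, 1, len(word)):
        return False
    return all(balanced(h_k(word[i - 1:j], kp))
               for i, j in matched_pairs(word)
               for kp in range(1, k + 1))
```

For example, `["[1", "[2", "]2", "]1"]` satisfies both conditions, while `["[1", "[2", "]1", "]2"]` fails condition ii) on the matched pair $(2, 3)$, since the projection $h_1$ of the substring $[_2\, ]_1$ is a single closing bracket.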
Definition 2.7. Let $w \in D_k$ and $(i, j)$ be a matched pair of $w$. We say that $(i, j)$ is reducible if there exists an integer $j'$, $i < j' < j$, such that $(i, j')$ and $(j' + 1, j)$ are matched pairs.

Let $w \in D_k$. If $(i, j)$ is a nested pair of $w$ then $(i, j)$ is an irreducible pair. If $(i, j)$ is a nested pair of $w$ then $(i + 1, j - 1)$ may be a reducible pair.

Theorem 2.8. The trace-language associated with a context-free grammar $G = (N_k, T, P_k, S)$ in Dyck normal form, with $|N_k| = 2k + 1$, is a subset of $D_k$.

Proof. Let $N_k = \{S, [_1, \ldots, [_k, ]_1, \ldots, ]_k\}$ be the set of nonterminals, $w \in L(G)$, and $D$ a leftmost derivation of $w$. We show that any subtree of the derivation tree, read in the depth-first search order, by ignoring the root and the terminal nodes, corresponds to a matched pair in $t_{w,D}$. In particular, $(1, |t_{w,D}|)$ will be a matched pair. Denote by $t_{w,D_{i:j}}$ the substring of $t_{w,D}$ starting at the $i$-th position and ending at the $j$-th position of $t_{w,D}$. We show that for all matched pairs $(i, j)$, $h_{k'}(t_{w,D_{i:j}})$ belong to $D_1$, $1 \le k' \le k$. We prove these claims by induction on the height of subtrees.

Basis. Certainly, any subtree of height $n = 1$, read in the depth-first search order, looks like $[_i\, ]_i$, $1 \le i \le k$. Therefore, it satisfies the above conditions.

Induction step. Assume that the claim is true for all subtrees of height $\ell$, $\ell < n$, and we prove it for $\ell = n$. Each subtree of height $n$ can have one of the following structures. The level 0 of the subtree is marked by a left or right bracket. This bracket will not be considered when we read the subtree. Denote by $[_m$ the left son of the root. Then the right son is labeled by $]_m$. They are the roots of a left and right subtree, for which at least one has the height $n - 1$. Suppose that both subtrees have the height $1 \le \ell \le n - 1$. By the induction hypothesis, let us further suppose that the left subtree corresponds to the matched pair $(i_l, j_l)$, and the right subtree corresponds to the matched pair $(i_r, j_r)$, $i_r = j_l + 2$, because the position $j_l + 1$ is taken by $]_m$. As $h$ is a homomorphism, we have

$$h(t_{w,D_{i_l-1:j_r}}) = h([_m\, t_{w,D_{i_l:j_l}}\, ]_m\, t_{w,D_{j_l+2:j_r}}) = h([_m)\, h(t_{w,D_{i_l:j_l}})\, h(]_m)\, h(t_{w,D_{j_l+2:j_r}}).$$
Therefore, $h(t_{w,D_{i_l-1:j_r}})$ satisfies all conditions in Definition 2.3, and thus $(i_l - 1, j_r)$, which corresponds to the considered subtree of height $n$, is a matched pair. By the induction hypothesis, $h_{k'}(t_{w,D_{i_l:j_l}})$ and $h_{k'}(t_{w,D_{i_r:j_r}})$ are in $D_1$, $1 \le k' \le k$. Hence,

$$h_{k'}(t_{w,D_{i_l-1:j_r}}) = h_{k'}([_m)\, h_{k'}(t_{w,D_{i_l:j_l}})\, h_{k'}(]_m)\, h_{k'}(t_{w,D_{j_l+2:j_r}}) \in \{h_{k'}(t_{w,D_{i_l:j_l}})\, h_{k'}(t_{w,D_{j_l+2:j_r}}),\ [_1\, h_{k'}(t_{w,D_{i_l:j_l}})\, ]_1\, h_{k'}(t_{w,D_{j_l+2:j_r}})\}$$

belong to $D_1$, $1 \le k' \le k$. Note that in this case the matched pair $(i_l - 1, j_r)$ is reducible into $(i_l - 1, j_l + 1)$ and $(j_l + 2, j_r)$, where $(i_l - 1, j_l + 1)$ corresponds to the substring $t_{w,D_{i_l-1:j_l+1}} = [_m\, t_{w,D_{i_l:j_l}}\, ]_m$. We refer to this structure as the left embedded subtree, i.e., $(i_l - 1, j_l + 1)$ is a nested pair. A similar reasoning is applied for the case when one of the subtrees has the height 0. Analogously, it can be shown that the initial tree corresponds to the matched pair $(1, |t_{w,D}|)$, i.e., the first condition of Lemma 2.5 holds.

So far, we have proved that each subtree of the derivation tree, and also each left embedded subtree, corresponds to a matched pair $(i, j)$ and $(i_l, j_l)$, such that $h_{k'}(t_{w,D_{i:j}})$ and $h_{k'}([_m\, t_{w,D_{i_l:j_l}}\, ]_m)$, $1 \le k' \le k$, are in $D_1$. Next we show that all matched pairs from $t_{w,D}$ correspond only to subtrees, or left embedded subtrees, from the derivation tree.

To derive a contradiction, let us suppose that there exists a matched pair $(i, j)$ in $t_{w,D}$ that does not correspond to any subtree, or left embedded subtree, of the derivation tree read in the depth-first search order. Since $(i, j)$ does not correspond to any subtree, or left embedded subtree, there exist two adjacent subtrees $\theta_1$ (a left embedded subtree) and $\theta_2$ (a right subtree) such that $(i, j)$ is composed of two adjacent subparts of $\theta_1$ and $\theta_2$. In terms of matched pairs, if $\theta_1$ corresponds to the matched pair $(i_1, j_1)$ and $\theta_2$ corresponds to the matched pair $(i_2, j_2)$, such that $i_2 = j_1 + 2$, then there exists a suffix $s_{i_1-1:j_1+1}$ of $t_{w,D_{i_1-1:j_1+1}}$, and a prefix $p_{i_2:j_2}$ of $t_{w,D_{i_2:j_2}}$, such that $t_{w,D_{i:j}} = s_{i_1-1:j_1+1}\, p_{i_2:j_2}$. Furthermore, without loss of generality, we assume that $(i_1, j_1)$ and $(i_2, j_2)$ are nested pairs. Otherwise, the matched pair $(i, j)$ can be narrowed until $\theta_1$ and $\theta_2$ are characterized by two nested pairs. If $(i_1, j_1)$ is a nested pair, then so is $(i_1 - 1, j_1 + 1)$.

As $s_{i_1-1:j_1+1}$ is a suffix of $t_{w,D_{i_1-1:j_1+1}}$ and $(i_1 - 1, j_1 + 1)$ is a matched pair, with respect to Definition 2.3, the number of $]_1$'s in $h(s_{i_1-1:j_1+1})$ is greater than or equal to the number of $[_1$'s in $h(s_{i_1-1:j_1+1})$. On the other hand, $s_{i_1-1:j_1+1}$ is also a prefix of $t_{w,D_{i:j}}$, because $(i, j)$ is a matched pair, by the induction hypothesis. Therefore, the number of $[_1$'s in $h(s_{i_1-1:j_1+1})$ is greater than or equal to the number of $]_1$'s in $h(s_{i_1-1:j_1+1})$. Hence, the only possibility for $s_{i_1-1:j_1+1}$ to be, at the same time, a suffix of $t_{w,D_{i_1-1:j_1+1}}$ and a prefix of $t_{w,D_{i:j}}$, is the equality between the number of $[_1$'s and $]_1$'s in $h(s_{i_1-1:j_1+1})$. This property holds if and only if $s_{i_1-1:j_1+1}$ corresponds to a matched pair in $t_{w,D_{i_1-1:j_1+1}}$, i.e., if $i_s$ and $j_s$ are the start and the end positions of $s_{i_1-1:j_1+1}$ in $t_{w,D_{i_1-1:j_1+1}}$, then $(i_s, j_s)$ is a matched pair. Thus, $(i_1 - 1, j_1 + 1)$ is a reducible pair into $(i_1 - 1, i_s - 1)$ and $(i_s, j_s)$, where $j_s = j_1 + 1$. We have reached a contradiction, i.e., $(i_1 - 1, j_1 + 1)$ is reducible.

Therefore, the matched pairs in $t_{w,D}$ correspond to subtrees, or left embedded subtrees, in the derivation tree. For these matched pairs we have already proved that they satisfy Lemma 2.5.
Accordingly, $t_{w,D} \in D_k$, and consequently the trace-language associated with $G$ is a subset of $D_k$.

Theorem 2.9. Given a context-free grammar $G$ there exist an integer $K$, a homomorphism $\varphi$, and a subset $D'_K$ of the Dyck language $D_K$, such that $L(G) = \varphi(D'_K)$.

Proof. Let $G$ be a context-free grammar and $G_k = (N_k, T, P_k, S)$ be the Dyck normal form of $G$, such that $N_k = \{S, [_1, \ldots, [_k, ]_1, \ldots, ]_k\}$. Let $\mathcal{L}(G_k)$ be the trace-language associated with $G_k$. Consider $\{t_{k+1}, \ldots, t_{k+p}\}$ the ordered subset of $T$, such that $S \to t_{k+i} \in P$, $1 \le i \le p$. We define $N_{k+p} = N_k \cup \{[_{t_{k+1}}, \ldots, [_{t_{k+p}}, ]_{t_{k+1}}, \ldots, ]_{t_{k+p}}\}$, and $P_{k+p} = P_k \cup \{S \to [_{t_{k+i}}\, ]_{t_{k+i}},\ [_{t_{k+i}} \to t_{k+i},\ ]_{t_{k+i}} \to \lambda \mid S \to t_{k+i} \in P,\ 1 \le i \le p\}$. The new grammar $G_{k+p} = (N_{k+p}, T, P_{k+p}, S)$ generates the same language as $G_k$.

Let $\varphi : (N_{k+p} - \{S\})^* \to T^*$ be the homomorphism defined by $\varphi(N) = \lambda$, for each rule of the form $N \to XY$, $N, X, Y \in N_k - \{S\}$, and $\varphi(N) = t$, for each rule of the form $N \to t$, $N \in N_k - \{S\}$, and $t \in T$, $\varphi([_{k+i}) = t_{k+i}$, and $\varphi(]_{k+i}) = \lambda$, for each $1 \le i \le p$. Obviously, $L = \varphi(D'_K)$, where $K = k + p$, $D'_K = \mathcal{L}(G_k) \cup L_p$, and $L_p = \{[_{t_{k+1}}\, ]_{t_{k+1}}, \ldots, [_{t_{k+p}}\, ]_{t_{k+p}}\}$.

In the sequel, grammar $G_{k+p}$ is called the extended grammar of $G_k$. $G_k$ has an extended grammar if and only if $G_k$ (or $G$) has rules of the form $S \to t$, $t \in T \cup \{\lambda\}$. If $G_k$ does not have an extended grammar then $D'_K = D'_k = \mathcal{L}(G_k)$.
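To make the homomorphism $\varphi$ of Theorem 2.9 concrete, the following Python sketch builds it for the grammar of Example 1.5 and applies it to the trace-word of $a * a * a + a$. The encoding of brackets as strings, the names `build_phi` and `apply_phi`, and the label `"[8"`/`"]8"` chosen for the extra pair that the extended grammar adds for the rule $E_0 \to a$ are all assumptions made for this illustration.

```python
TERMINALS = {"a", "+", "*"}

# Productions P' of Example 1.5; "[i" / "]i" encode the i-th bracket pair.
P_PRIME = {
    "E0": [("a",), ("[1", "]1"), ("[2", "]2"), ("[3", "]3"), ("[4", "]4")],
    "[1": [("[1", "]1"), ("[4", "]4")],
    "[2": [("[1", "]1"), ("[2", "]2"), ("[3", "]3"), ("[4", "]4")],
    "]1": [("[7", "]7")],
    "]2": [("[5", "]5"), ("[6", "]6")],
    "]3": [("[5", "]5"), ("[6", "]6")],
    "]4": [("[7", "]7")],
    "]5": [("[1", "]1"), ("[4", "]4")],
    "[3": [("a",)], "[4": [("a",)], "[5": [("+",)], "[6": [("+",)],
    "]6": [("a",)], "[7": [("*",)], "]7": [("a",)],
}


def build_phi(productions, start, terminals):
    """phi(N) = t if N -> t is a rule, and the empty string if N -> XY is a rule.

    Condition 2 of Definition 1.1 guarantees that a non-axiom nonterminal
    is rewritten either by one terminal or only by bracket pairs.
    """
    phi = {}
    for lhs, alternatives in productions.items():
        if lhs == start:
            continue
        terminal_images = [rhs[0] for rhs in alternatives
                           if len(rhs) == 1 and rhs[0] in terminals]
        phi[lhs] = terminal_images[0] if terminal_images else ""
    return phi


def apply_phi(phi, trace):
    """Extend phi to words by concatenating the images of the symbols."""
    return "".join(phi[sym] for sym in trace)


PHI = build_phi(P_PRIME, "E0", TERMINALS)

# Trace-word of a*a*a+a from Example 1.5, in the numbered bracket notation.
TRACE = ["[2", "[1", "[4", "]4", "[7", "]7", "]1", "[7", "]7", "]2", "[6", "]6"]
IMAGE = apply_phi(PHI, TRACE)

# Extended grammar: the axiom rule E0 -> a contributes one extra pair
# (hypothetical label 8 = k + 1 with k = 7), with phi("[8") = a and phi("]8") = lambda.
PHI_EXT = dict(PHI)
PHI_EXT["[8"] = "a"
PHI_EXT["]8"] = ""
```

Under these assumptions `IMAGE` is the original word, and the one-pair word `["[8", "]8"]` from $L_p$ is mapped to the word $a$ generated directly by the axiom.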
3 On the Chomsky-Schützenberger Theorem

Let $G_k = (N_k, T, P_k, S)$ be an arbitrary context-free grammar in Dyck normal form, with $N_k = \{S, [_1, \ldots, [_k, ]_1, \ldots, ]_k\}$, and $\varphi : (N_k - \{S\})^* \to T^*$ the restriction of the homomorphism $\varphi$ in the proof of Theorem 2.9. We divide $N_k$ into three main sets $N^{(1)}$, $N^{(2)}$, $N^{(3)}$ as follows:
1. $[_i$ and $]_i$ belong to $N^{(1)}$ if and only if $\varphi([_i) = t$ and $\varphi(]_i) = t'$, $t, t' \in T$,
2. $[_i$ and $]_i$ belong to $N^{(2)}$ if and only if $\varphi([_i) = t$ and $\varphi(]_i) = \lambda$, or vice versa $\varphi([_i) = \lambda$ and $\varphi(]_i) = t$, $t \in T$,
3. $[_i, ]_i \in N^{(3)}$ if and only if $\varphi([_i) = \lambda$ and $\varphi(]_i) = \lambda$.

Certainly, $N_k - \{S\} = N^{(1)} \cup N^{(2)} \cup N^{(3)}$ and $N^{(1)} \cap N^{(2)} \cap N^{(3)} = \emptyset$. $N^{(2)}$ is further divided into $N^{(2)}_l$ and $N^{(2)}_r$, where $N^{(2)}_l$ contains those pairs $[_i, ]_i \in N^{(2)}$ such that $\varphi([_i) \neq \lambda$, while $N^{(2)}_r$ contains those pairs $[_i, ]_i \in N^{(2)}$ such that $\varphi(]_i) \neq \lambda$.

Definition 3.1. A grammar $G_k$ is in linear-Dyck normal form if $G_k$ is in Dyck normal form and $N^{(3)} = \emptyset$.

Theorem 3.2. For each linear grammar $G$, there exists a grammar $G_k$ in linear-Dyck normal form such that $L(G) = L(G_k)$, and vice versa.

Proof. Each linear grammar $G$, in standard form, is composed of rules of the forms $X \to \lambda$, $X \to t$, $X \to t_1 Y$, $X \to Y t_2$, $X \to t_1 Y t_2$, $t, t_1, t_2 \in T$, $X, Y \in N$. Transforming $G$ into Chomsky normal form, and then into the Dyck normal form, we obtain a grammar $G_k$ in linear-Dyck normal form. Since the standard form for linear languages, Chomsky normal form, and Dyck normal form are weakly equivalent, we obtain $L(G) = L(G_k)$. The converse statement is trivial.

Next we consider more closely the structures of the derivation trees associated with words generated by linear and context-free grammars in linear-Dyck normal form and Dyck normal form, respectively. We are interested in the structure of the trace-words associated with words generated by these grammars. Let $G_k = (N_k, T, P_k, S)$ be an arbitrary (linear) context-free grammar in (linear-)Dyck normal form, and $L(G_k)$ the language generated by this grammar. Let $w \in L(G_k)$, $D$ a leftmost derivation of $w$, and $t_{w,D}$ the trace-word of $w$ associated with $D$.
From the structure of the derivation tree, read in the depth-first search order, it is easy to observe that each bracket $[_i$, such that $[_i, ]_i \in N^{(1)}$, is immediately followed, in $t_{w,D}$, by its pairwise $]_i$. The same property holds for those pairs $[_i, ]_i \in N^{(2)}_l$. If $[_i, ]_i \in N^{(2)}_r \cup N^{(3)}$ then the pair $[_i, ]_i$ should embed a left subtree, i.e., the case of the left embedded subtree in the proof of Theorem 2.8. In this case the bracket $[_i$ may have a left, long distance, placement from its pairwise $]_i$, in $t_{w,D}$.

Suppose that $G_k$ is a linear grammar in linear-Dyck normal form, i.e., $N^{(3)} = \emptyset$, such that $N^{(2)}_l \neq \emptyset$ and $N^{(2)}_r \neq \emptyset$. Each word $w = a_1 a_2 \ldots a_n \in L(G_k)$, of an arbitrary length $n$, has the property that there exists an index $n_t$, $1 \le n_t \le n - 1$, and a unique pair⁵ $[^t_j, ]^t_j \in N^{(1)}$, such that $[^t_j \to a_{n_t}$ and $]^t_j \to a_{n_t+1}$. Using the homomorphism $\varphi$ in Theorem 2.9, we have $\varphi([^t_j) = a_{n_t}$ and $\varphi(]^t_j) = a_{n_t+1}$. For the position $n_t$ already marked, there is no other position in $w$ with the above property. We call $[^t_j\, ]^t_j$ the core segment of the trace-word $t_{w,D}$. Trace-words of words generated by context-free grammars in Dyck normal form have more than one core segment. Each core segment induces in a trace-word (both for linear and context-free languages) a symmetrical distribution of right brackets in $N^{(2)}_r \cup N^{(3)}$ (always placed at the right side of the core segment) according to left brackets in $N^{(2)}_r \cup N^{(3)}$ (always placed at the left side of the respective core).

⁵ To emphasize which of the brackets in the pair $([_i, ]_i)$ produces a terminal, we also use the notation $[_i, ]^t_i$ if and only if $[_i, ]_i \in N^{(2)}_r$, $[^t_i, ]_i$ if and only if $[_i, ]_i \in N^{(2)}_l$, and $[^t_i, ]^t_i$ if and only if $[_i, ]_i \in N^{(1)}$.

The structure of the trace-word of a word $w \in L(G_k)$, for a grammar $G_k$ in linear-Dyck normal form, is depicted in (1), where each bracket occurring in $t_{w,D}$ (upper row) is aligned with its image through the homomorphism $\varphi$ (lower row).

$$
\begin{array}{l}
t_{w,D} = \\[4pt]
\begin{array}{ccccccccccccccc}
[_1 & \cdots & [_{k_1} & [^t_{j_1} & ]_{j_1} & [_{k_1+1} & \cdots & [_{k_2} & [^t_{j_2} & ]_{j_2} & \cdots & [^t_{j_{n_t-1}} & ]_{j_{n_t-1}} & [_{k_{n_t-1}+1} \cdots & [_{k_{n_t}} \\
\lambda & \cdots & \lambda & a_1 & \lambda & \lambda & \cdots & \lambda & a_2 & \lambda & \cdots & a_{n_t-1} & \lambda & \lambda \ \cdots & \lambda
\end{array} \\[12pt]
\begin{array}{ccccccccc}
[^t_{j_{n_t}} & ]^t_{j_{n_t}} & ]^t_{k_{n_t}} & \cdots & ]^t_{k_{n_t-1}} \cdots & ]^t_{k_2} & \cdots\ ]^t_{k_1} & \cdots & ]^t_1 \\
a_{n_t} & a_{n_t+1} & a_{n_t+2} & \cdots & a_{n-k_{n_t-1}-\cdots-k_1+1} \cdots & a_{n-k_2-k_1+1} & \cdots\ a_{n-k_1+1} & \cdots & a_n
\end{array}
\end{array}
\tag{1}
$$

Next our aim is to find a connection between Theorem 2.9 and the Chomsky-Schützenberger theorem. More precisely, we want to compute, from the structure of trace-words, the regular and the Dyck languages yielded by the Chomsky-Schützenberger theorem. Therefore, we build a transition-like diagram for context-free grammars in Dyck normal form. First we build some directed graphs as follows.

Construction 3.3. Let $G_k = (N_k, T, P_k, S)$ be an arbitrary context-free grammar in Dyck normal form. A dependency graph of $G_k$ is a directed graph $\mathcal{G}^X = (V_X, E_X)$, $X \in \{]_j \mid [_j, ]_j \in N^{(3)}\} \cup \{S\}$, in which vertices are labeled with variables in $\tilde{N}_k \cup \{X\}$, $\tilde{N}_k = \{[_i \mid [_i, ]_i \in N^{(1)} \cup N^{(2)}_r \cup N^{(3)}\} \cup \{]_j \mid [_j, ]_j \in N^{(2)}_l\}$, and the set of edges is built as follows.
For each rule $X \to [_i ]_i \in P_k$, $[_i, ]_i \in N_l^{(2)}$, $G^X$ contains a directed edge from $X$ to $]_i$; for each rule $X \to [_i ]_i \in P_k$, $[_i, ]_i \in N^{(1)} \cup N_r^{(2)} \cup N^{(3)}$, $G^X$ contains a directed edge from $X$ to $[_i$. There exists an edge in $G^X$ from a vertex labeled by $[_i$, $[_i, ]_i \in N_r^{(2)} \cup N^{(3)}$, to a vertex labeled by $]_j$ / $[_k$, $[_j, ]_j \in N_l^{(2)}$, $[_k, ]_k \in N^{(1)} \cup N_r^{(2)} \cup N^{(3)}$, if there exists a rule in $P_k$ of the form $[_i \to [_j ]_j$ / $[_i \to [_k ]_k$. There exists an edge in $G^X$ from a vertex labeled by $]_i$, $[_i, ]_i \in N_l^{(2)}$, to a vertex labeled by $]_j$ / $[_k$, $[_j, ]_j \in N_l^{(2)}$, $[_k, ]_k \in N^{(1)} \cup N_r^{(2)} \cup N^{(3)}$, if there exists a rule in $P_k$ of the form $]_i \to [_j ]_j$ / $]_i \to [_k ]_k$. The vertex labeled by $X$ is called the initial vertex of $G^X$. Any vertex labeled by a left bracket in $N^{(1)}$ is a final vertex.

Let $G^X$ be a dependency graph of $G_k$. Consider the set of all possible paths in $G^X$ starting from the initial vertex to a final vertex. Such a path is called a terminal path. A loop or cycle in a graph is a path from $v$ to $v$ composed of distinct vertices. If from $v$ to $v$ there is no other vertex, then the loop is a self-loop. The cycle rank of a graph is a measure
of the loop complexity formally defined$^{6}$ and studied in [3] and [7]. In [7] it is proved that for each two vertices $u$ and $v$ belonging to a digraph of cycle rank $k$, there exists a regular expression of star-height$^{7}$ at most $k$ that describes the set of paths from $u$ to $v$. On the other hand, the cycle rank of a digraph with $n$ vertices is upper bounded by $n \log n$ [13]. Hence any regular expression obtained from a digraph with $n$ vertices has star-height at most $n \log n$. Consequently, the (infinite) set of paths from an initial vertex to a final vertex in $G^X$ can be divided into a finite number of classes of terminal paths. Paths belonging to the same class are characterized by the same regular expression, in terms of the $*$ and $+$ Kleene operations, of star-height at most $|V_X| \log |V_X|$ (which is finite with respect to the lengths of strings in $L(G_k)$).

Denote by $R^X_{[^t_i}$ the set of all regular expressions over $\tilde N_k \cup \{X\}$ that can be read in $G^X$, starting from the initial vertex $X$ and ending in the final vertex $[^t_i$. The cardinality of $R^X_{[^t_i}$ is finite. Define the homomorphism $h_G: \tilde N_k \cup \{X\} \to \{]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\} \cup \{\lambda\}$ such that $h_G([_i) = ]_i$ for any $[_i, ]_i \in N_r^{(2)} \cup N^{(3)}$, and $h_G(X) = h_G([^t_i) = h_G(]_j) = \lambda$ for any $[^t_i, ]^t_i \in N^{(1)}$ and $[_j, ]_j \in N_l^{(2)}$. For any element $r.e^{(l,X)}_{[^t_i} \in R^X_{[^t_i}$ we build a new regular expression$^{8}$ $r.e^{(r,X)}_{[^t_i} = h^r_G(r.e^{(l,X)}_{[^t_i})$, where $h^r_G$ is the mirror image of $h_G$. Consider $r.e^X_{[^t_i} = r.e^{(l,X)}_{[^t_i}\, r.e^{(r,X)}_{[^t_i}$. For a certain $X$ and $[^t_i$, denote by $R.e^X_{[^t_i}$ the set of all regular expressions $r.e^X_{[^t_i}$ obtained as above. Furthermore, $R.e^X = \bigcup_{[^t_i, ]^t_i \in N^{(1)}} R.e^X_{[^t_i}$ and $R.e = R.e^S \cup \big(\bigcup_{[_j, ]_j \in N^{(3)}} R.e^{]_j}\big)$.

Construction 3.4. Let $G_k = (N_k, T, P_k, S)$ be a context-free grammar in Dyck normal form and $\{G^X \mid X \in \{]_j \mid [_j, ]_j \in N^{(3)}\} \cup \{S\}\}$ the set of dependency graphs of $G_k$. The extended dependency graph of $G_k$, denoted by $G_e = (V_e, E_e)$, is a directed graph for which $V_e = \tilde N_k \cup \{S\} \cup \{]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\}$, $S$ is the initial vertex of $G_e$, and $E_e$ is built as follows:

1.
$S[_i$ ($S]_j$): there exists an edge in $G_e$ from the vertex labeled by $S$ to a vertex labeled by $[_i$ (from $S$ to $]_j$), $[_i, ]_i \in N^{(1)} \cup N_r^{(2)} \cup N^{(3)}$ ($[_j, ]_j \in N_l^{(2)}$), if there exists a regular expression in $R.e^S$ with a prefix of the form $S[_i$ ($S]_j$, respectively).

2. $]_i ]_j$: there exists an edge in $G_e$ from a vertex labeled by $]_i$ to a vertex labeled by $]_j$, $[_i, ]_i, [_j, ]_j \in N_l^{(2)}$, if there exists a regular expression in $R.e$ having a substring of the form $]_i ]_j$ (if $i = j$ then $]_i ]_i$ forms a self-loop in $G_e$).

3. $]_i [_j$ (or $[_j ]_i$): there exists an edge in $G_e$ from a vertex labeled by $]_i$ to a vertex labeled by $[_j$ (or vice versa from $[_j$ to $]_i$) such that $[_i, ]_i \in N_l^{(2)}$ and $[_j, ]_j \in N_r^{(2)} \cup N^{(3)}$, if there exists a regular expression in $R.e$ having a substring of the form $]_i [_j$ ($[_j ]_i$, respectively).

$^{6}$ In brief, the rank of a cycle $C$ is 1 if there exists $v \in C$ such that $C - v$ is not a cycle. Recursively, the rank of a cycle $C$ is $k$ if there exists $v \in C$ such that $C - v$ contains a cycle of rank $k - 1$ and all the other cycles in $C - v$ have rank at most $k - 1$.

$^{7}$ Informally, this is the (maximal) power of a nested $*$-loop occurring in the description of a regular expression. For the formal definition the reader is referred to [7] and [18] (see also Definition 4.1, Section 4).

$^{8}$ Since regular languages are closed under homomorphism and the reverse operation, $r.e^{(r,X)}_{[^t_i}$ is a regular expression.

4. $[_i [_j$: there exists an edge in $G_e$ from a vertex labeled by $[_i$ to a vertex labeled by $[_j$, $[_i, ]_i, [_j, ]_j \in N_r^{(2)} \cup N^{(3)}$, if there exists a regular expression in $R.e$ having a substring of the
form $[_i [_j$ (if $i = j$ then $[_i [_i$ forms a self-loop in $G_e$).

5. $]_i [^t_j$ (or $[_i [^t_j$): there exists an edge in $G_e$ from a vertex labeled by $]_i$ (or by $[_i$) to a vertex labeled by $[^t_j$, $[_i, ]_i \in N_l^{(2)}$ (or $[_i, ]_i \in N_r^{(2)}$, respectively), $[^t_j, ]^t_j \in N^{(1)}$, if there exists a regular expression in $R.e$ with a substring of the form $]_i [^t_j$ ($[_i [^t_j$, respectively).

6. $]_i [^t_j$: there exists an edge in $G_e$ from a vertex labeled by $]_i$ to a vertex labeled by $[^t_j$, $[_i, ]_i \in N^{(3)}$, $[^t_j, ]^t_j \in N^{(1)}$, if there exists a regular expression in $R.e^{]_i}$ of the form $]_i [^t_j$.

7. $]_i [_j$ (or $]_i ]_j$): there exists an edge in $G_e$ from a vertex labeled by $]_i$ to a vertex labeled by $[_j$ (or $]_j$), $[_i, ]_i \in N^{(3)}$, $[_j, ]_j \in N_r^{(2)}$ ($[_j, ]_j \in N_l^{(2)}$, respectively), if there exists a regular expression in $R.e^{]_i}$ with a prefix of the form $]_i [_j$ ($]_i ]_j$, respectively).

8. $]_i ]_j$: there exists an edge in $G_e$ from a vertex labeled by $]_i$ to a vertex labeled by $]_j$, with $i$ and $j$ not necessarily distinct, such that $[_i, ]_i \in N_r^{(2)}$, $[_j, ]_j \in N_r^{(2)} \cup N^{(3)}$, if either i., ii., or iii. holds:
i. there exists at least one regular expression in $R.e$ having a substring of the form $]_i ]_j$ (if $i = j$ then $]_i ]_i$ forms a self-loop in $G_e$),
ii. there exists $[_k, ]_k \in N^{(3)}$ such that there exist a regular expression in $R.e$ with a substring of the form $]_k ]_j$, and a regular expression in $R.e^{]_k}$ that ends in $]_i$ (if $i = j$ then $]_i ]_i$ is a self-loop),
iii. there exist $[_k, ]_k, [_{k_1}, ]_{k_1}, \dots, [_{k_m}, ]_{k_m} \in N^{(3)}$ such that there exist a regular expression in $R.e$ with a substring of the form $]_k ]_j$, a regular expression in $R.e^{]_k}$ that ends in $]_{k_1}$, a regular expression in $R.e^{]_{k_1}}$ that ends in $]_{k_2}$, and so on, until a regular expression in $R.e^{]_{k_{m-1}}}$ ending in $]_{k_m}$ and a regular expression in $R.e^{]_{k_m}}$ ending in $]_i$ are reached.

9. $[^t_i ]_j$: there exists an edge in $G_e$ from a vertex labeled by $[^t_i$, $[^t_i, ]^t_i \in N^{(1)}$, to a vertex labeled by $]_j$, $[_j, ]_j \in N_r^{(2)} \cup N^{(3)}$, if either i., ii., or iii. holds:
i. there exists a regular expression in $R.e$ having a substring of the form $[^t_i ]_j$,
ii. there exists $[_k, ]_k \in N^{(3)}$ such that there exist a regular expression in $R.e$ having a substring of the form $]_k ]_j$, and a regular expression in $R.e^{]_k}$ that ends in $[^t_i$,
iii.
there exist $[_k, ]_k, [_{k_1}, ]_{k_1}, \dots, [_{k_m}, ]_{k_m} \in N^{(3)}$ such that there exist a regular expression in $R.e$ with a substring of the form $]_k ]_j$, a regular expression in $R.e^{]_k}$ that ends in $]_{k_1}$, a regular expression in $R.e^{]_{k_1}}$ that ends in $]_{k_2}$, and so on, until a regular expression in $R.e^{]_{k_{m-1}}}$ ending in $]_{k_m}$ and a regular expression in $R.e^{]_{k_m}}$ ending in $[^t_i$ are reached.

10. A vertex labeled by $[^t_i$, $[^t_i, ]^t_i \in N^{(1)}$, is a final vertex in $G_e$ if either i., ii., or iii. holds:
i. there exists a regular expression in $R.e^S$ that ends in $[^t_i$,
ii. there exists $[_k, ]_k \in N^{(3)}$ such that there exist a regular expression in $R.e^S$ that ends in $]_k$, and a regular expression in $R.e^{]_k}$ that ends in $[^t_i$,
iii. there exists $[_k, ]_k \in N^{(3)}$ such that there exists a regular expression in $R.e^S$ that ends in $]_k$, and there exist $[_{k_1}, ]_{k_1}, \dots, [_{k_m}, ]_{k_m} \in N^{(3)}$ such that there is a regular expression in $R.e^{]_k}$ that ends in $]_{k_1}$, a regular expression in $R.e^{]_{k_1}}$ that ends in $]_{k_2}$, and so on, until a regular expression in $R.e^{]_{k_{m-1}}}$ ending in $]_{k_m}$ and a regular expression in $R.e^{]_{k_m}}$ ending in $[^t_i$ are reached.

11. A vertex labeled by $]_i$, $[_i, ]_i \in N_r^{(2)}$, is a final vertex in $G_e$ if either i., ii., or iii. holds:
i. there exists a regular expression in $R.e^S$ that ends in $]_i$,
ii. there exists $[_k, ]_k \in N^{(3)}$ such that there exist a regular expression in $R.e^S$ that ends in $]_k$, and a regular expression in $R.e^{]_k}$ that ends in $]_i$,
iii. there exists $[_k, ]_k \in N^{(3)}$ such that there exists a regular expression in $R.e^S$ that ends in $]_k$, and there exist $[_{k_1}, ]_{k_1}, \dots, [_{k_m}, ]_{k_m} \in N^{(3)}$ such that there exist a regular expression in $R.e^{]_k}$ ending in $]_{k_1}$, a regular expression in $R.e^{]_{k_1}}$ ending in $]_{k_2}$, and so on, until a regular expression in $R.e^{]_{k_{m-1}}}$ ending in $]_{k_m}$ and a regular expression in $R.e^{]_{k_m}}$ ending in $]_i$ are reached.

Denote by $R_e$ the set of all regular expressions obtained by reading all paths in $G_e$ from the initial vertex $S$ to all final vertices (i.e., all terminal paths). We have

Theorem 3.5. (Chomsky-Schützenberger theorem) For each context-free language $L$ there exist an integer $K$, a regular set $R$, and a homomorphism $h$, such that $L = h(D_K \cap R)$. Furthermore, if $G$ is the context-free grammar that generates $L$, $G_k$ the Dyck normal form of $G$, and $G_k$ has no extended grammar, then $K = k$ and $D_K \cap R = L(G_k)$. Otherwise, there exists $p > 0$ such that $K = k + p$, and $D_K \cap R = D'_K$, where $D'_K$ is the subset of $D_K$ computed as in Theorem 2.9.

Proof. Let $G_k = (N_k, T, P_k, S)$ be the Dyck normal form of $G$ such that $L = L(G)$. Suppose that $G_k$ does not have an extended grammar. Let $h_k: \tilde N_k \cup \{]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\} \cup \{S\} \to \{[_i, ]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\} \cup \{[_i ]_i \mid [_i, ]_i \in N_l^{(2)} \cup N^{(1)}\} \cup \{\lambda\}$ be the homomorphism defined by $h_k(S) = \lambda$, $h_k([_i) = [_i$ and $h_k(]_i) = ]_i$ for $[_i, ]_i \in N_r^{(2)} \cup N^{(3)}$, $h_k(]_i) = [_i ]_i$ for $[_i, ]_i \in N_l^{(2)}$, and $h_k([^t_i) = [^t_i ]^t_i$ for $[^t_i, ]^t_i \in N^{(1)}$. Then $R = h_k(R_e)$ is a regular language such that $D_k \cap R = L(G_k)$. To prove the last equality, notice that each terminal path in a dependency graph $G^X$ (Construction 3.3) provides a string equal to a substring (or a prefix if $X = S$) of a trace-word in $L(G_k)$ (in which left brackets in $N_l^{(2)}$ are omitted) generated (in the leftmost derivation order) from the derivation time when $X$ is rewritten, up to the moment when the very first left bracket of a pair in $N^{(1)}$ is rewritten.
This string corresponds to a regular expression $r.e^{(l,X)}_{[^t_i} \in R^X_{[^t_i}$, which is extended with another regular expression $r.e^{(r,X)}_{[^t_i}$ that is the mirror image of the left brackets in $N_r^{(2)} \cup N^{(3)}$ occurring in $r.e^{(l,X)}_{[^t_i}$. If left brackets in $N_r^{(2)} \cup N^{(3)}$ are enclosed in a starred loop, then their homomorphic image (through $h^r_G$) in $r.e^{(r,X)}_{[^t_i}$ is enclosed in another starred loop. The mirror image of consecutive left brackets in $N_r^{(2)}$ (with respect to their relative core) is a segment composed of consecutive right brackets in $N_r^{(2)}$. The mirror image of consecutive left brackets in $N^{(3)}$ is broken by the interpolation of a regular expression $r.e^{]_j}$ in $R.e^{]_j}$, $[_j, ]_j \in N^{(3)}$. The number of $r.e^{]_j}$ insertions matches the number of left brackets $[_j$ placed at the left side of the relative core (this is assured by the intersection with $D_k$). In fact, the extended dependency graph of $G_k$ has been conceived such that it reproduces, on regular expressions in $R_e$, the structure of trace-words in $L(G_k)$. The main problem is the star-height synchronization for brackets in $N_r^{(2)} \cup N^{(3)}$, i.e., the number of left brackets occurring in a loop placed at the left side of a core segment $[^t_i ]^t_i$ must be equal to the number of their pairwise right brackets occurring in the corresponding mirror loop placed at the right side of its relative core $[^t_i ]^t_i$, with $[^t_i, ]^t_i \in N^{(1)}$. This is controlled
by the intersection of $h_k(R_e)$ with $D_k$, leading to $L(G_k)$. In short, the proof is by the construction described in Construction 3.4. Another problem that occurs is that the construction of $G_e$ allows one to concatenate $r.e^{(l,X)}_{[^t_i} \in R^X_{[^t_i}$ to its right pairwise $r.e^{(r,X)}_{[^t_i}$ as well as to another regular expression $r.e^{(r,X')}_{[^t_j}$ (which by construction is also concatenated to its left pairwise $r.e^{(l,X')}_{[^t_j}$), where $X$ and $X'$ are not necessarily distinct. This does not change the intersection with the Dyck language, but enlarges the regular language $R = h_k(R_e)$ with useless$^{9}$ words.

If $G_k$ has an extended grammar $G_{k+p} = (N_{k+p}, T, P_{k+p}, S)$, built as in the proof of Theorem 2.9, then $R_e$ is augmented with $\ell_e = \{S[_{t_{k+1}}, \dots, S[_{t_{k+p}}\}$ and $h_k$ is extended to $h_K: \tilde N_k \cup \{S\} \cup \{]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\} \cup \{[_{t_{k+1}}, \dots, [_{t_{k+p}}\} \to \{[_i, ]_i \mid [_i, ]_i \in N_r^{(2)} \cup N^{(3)}\} \cup \{[_i ]_i \mid [_i, ]_i \in N_l^{(2)} \cup N^{(1)}\} \cup \{[_{t_{k+1}} ]_{t_{k+1}}, \dots, [_{t_{k+p}} ]_{t_{k+p}}\} \cup \{\lambda\}$, where $h_K(x) = h_k(x)$ for $x \notin \{[_{t_{k+1}}, \dots, [_{t_{k+p}}\}$, $h_K([_{t_{k+i}}) = [_{t_{k+i}} ]_{t_{k+i}}$, $1 \leq i \leq p$, and $K = k + p$. $L(G_k)$ is augmented with $L_p = \{[_{t_{k+1}} ]_{t_{k+1}}, \dots, [_{t_{k+p}} ]_{t_{k+p}}\}$, and $D'_K = h_K(R_e \cup \ell_e) \cap D_K = L(G_k) \cup L_p$. The homomorphism $h$ is equal to $\varphi$ in Theorem 2.9, i.e., $\varphi: (N_{k+p} \setminus \{S\})^* \to T^*$, $\varphi(N) = \lambda$ for each rule of the form $N \to XY$, $N, X, Y \in N_k$, $\varphi(N) = t$ for each rule of the form $N \to t$, $N \in N_k \setminus \{S\}$, $t \in T$, and $\varphi([_{k+i}) = t_{k+i}$, $\varphi(]_{k+i}) = \lambda$ for each $1 \leq i \leq p$.

Note that, for the case of linear languages, there is only one dependency graph, $G^S$. The regular language in the Chomsky-Schützenberger theorem can then be built without the use of the extended dependency graph. It suffices to consider only the regular expressions in $R.e^S = \bigcup_{[^t_i, ]^t_i \in N^{(1)}} R.e^S_{[^t_i}$. If $G_k$ has an extended grammar $G_K$, then $L(G_k) = \varphi(D_K \cap h_K(R.e^S \cup \ell_e))$, where $K = k + p$, and $G_K$, $\ell_e$, and $\varphi$ are defined as in Theorems 2.9 and 3.5. If $G_k$ has no extended grammar then $L(G_k) = \varphi(D_k \cap h_k(R.e^S))$. However, a graphical representation may be considered an interesting common framework for both linear and context-free languages.
Below we illustrate the manner in which the regular language in the Chomsky-Schützenberger theorem can be computed for linear (Example 3.6) and context-free (Example 3.7) languages.

Example 3.6. Consider the linear context-free grammar $G = (\{S, [_1, \dots, [_7, ]_1, \dots, ]_7\}, \{a, b, c, d\}, S, P)$ in linear-Dyck normal form, with $P = \{S \to [^t_1 ]_1,\ ]_1 \to [_2 ]^t_2,\ [_2 \to [_3 ]^t_3,\ [_3 \to [_2 ]^t_2 / [^t_4 ]_4,\ ]_4 \to [_5 ]^t_5,\ [_5 \to [^t_6 ]_6,\ ]_6 \to [^t_1 ]_1 / [^t_7 ]^t_7,\ [^t_1 \to a,\ ]^t_2 \to b,\ ]^t_3 \to c,\ [^t_4 \to b,\ ]^t_5 \to d,\ [^t_6 \to b,\ [^t_7 \to a,\ ]^t_7 \to a\}$.

The dependency graph $G^S$ and the extended dependency graph $G_e$ of $G$ are depicted in Figure 1.a and 1.b, respectively. There exists only one regular expression readable from $G^S$, i.e., $r.e^{(l,S)}_{[^t_7} = S(]_1 ([_2 [_3)^+ ]_4 [_5 ]_6)^+ [^t_7$. Hence, $r.e^S_{[^t_7} = r.e^{(l,S)}_{[^t_7}\, r.e^{(r,S)}_{[^t_7} = S(]_1 ([_2 [_3)^+ ]_4 [_5 ]_6)^+ [^t_7 (]_5 (]_3 ]_2)^+)^+$. The regular language provided by the Chomsky-Schützenberger theorem is $R = ([_1 ]_1 ([_2 [_3)^+ [_4 ]_4 [_5 [_6 ]_6)^+ [^t_7 ]^t_7 (]_5 (]_3 ]_2)^+)^+$. Therefore, $D'_7 = D_7 \cap R = \{([_1 ]_1 ([_2 [_3)^n [_4 ]_4 [_5 [_6 ]_6)^m [^t_7 ]^t_7 (]_5 (]_3 ]_2)^n)^m \mid n, m \geq 1\} = L(G_k)$, and $L(G) = \varphi(D'_7) = \{(abb)^m aa (d(cb)^n)^m \mid n, m \geq 1\}$ ($G$ contains no rule of the form $S \to t$, $t \in T$).

$^{9}$ In Section 4 we show how these unnecessary concatenations can be avoided, through a refinement procedure of the regular language in the Chomsky-Schützenberger theorem.
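The computation in Example 3.6 can be checked mechanically on individual words. The following is a minimal Python sketch, under stated assumptions, that builds a bracket word of the displayed shape, tests membership in the Dyck language with a stack, tests membership in $R$ with a regular expression, and applies $\varphi$. The token encoding (`"[1"`, `"]1"`, with the superscript $t$ dropped) and the names `trace_word`, `is_dyck`, `in_R`, and `phi` are illustrative choices, not notation from the paper.

```python
import re
from typing import Dict, List

# Images under phi for Example 3.6; brackets not listed map to lambda.
PHI: Dict[str, str] = {"[1": "a", "]2": "b", "]3": "c", "[4": "b",
                       "]5": "d", "[6": "b", "[7": "a", "]7": "a"}

def trace_word(n: int, m: int) -> List[str]:
    """Build ([1 ]1 ([2 [3)^n [4 ]4 [5 [6 ]6)^m [7 ]7 (]5 (]3 ]2)^n)^m as tokens."""
    block = ["[1", "]1"] + ["[2", "[3"] * n + ["[4", "]4", "[5", "[6", "]6"]
    closing = ["]5"] + ["]3", "]2"] * n
    return block * m + ["[7", "]7"] + closing * m

def is_dyck(word: List[str]) -> bool:
    """Stack-based test for the one-sided Dyck language over indexed brackets."""
    stack: List[str] = []
    for sym in word:
        if sym[0] == "[":
            stack.append(sym[1:])              # remember the index of the open bracket
        elif not stack or stack.pop() != sym[1:]:
            return False                       # unmatched or mismatched right bracket
    return not stack

# The regular set R of Example 3.6, written over concatenated tokens.
R = re.compile(r"(\[1\]1(\[2\[3)+\[4\]4\[5\[6\]6)+\[7\]7(\]5(\]3\]2)+)+")

def in_R(word: List[str]) -> bool:
    return R.fullmatch("".join(word)) is not None

def phi(word: List[str]) -> str:
    """Apply the homomorphism phi symbol by symbol."""
    return "".join(PHI.get(sym, "") for sym in word)

w = trace_word(n=2, m=1)
image = phi(w)  # "abbaadcbcb", i.e. (abb)^1 aa (d(cb)^2)^1

# A word of R whose loops are not synchronized: it is rejected by the Dyck test.
bad = (["[1", "]1"] + ["[2", "[3"] * 2 + ["[4", "]4", "[5", "[6", "]6"]
       + ["[7", "]7"] + ["]5", "]3", "]2"])
```

The word `bad` illustrates the role of the intersection: it belongs to $R$, but the numbers of left and right brackets in the mirrored loops differ, so it is filtered out by the Dyck language.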
Figure 1: a. The dependency graph $G^S$ of grammar $G$ in Example 3.6. b. The extended dependency graph of $G$. Edges colored in orange extend $G^S$ to $G_e$. c. The transition diagram $A_e$ (see Example 5.1 a.) built from $G_e$. Each bracket $[_i$ ($S$, $]_i$) in $A_e$ corresponds to state $s_{[_i}$ ($s_S$, $s_{]_i}$). In all graphs $S$ is the initial vertex. In a. - b. the vertex colored in blue is the final vertex.

Example 3.7. Consider the context-free grammar $G = (\{S, [_1, \dots, [_7, ]_1, \dots, ]_7\}, \{a, b, c\}, S, P)$ in Dyck normal form with $P = \{S \to [_1 ]_1,\ [_1 \to [_5 ]^t_5 / [_1 ]_1,\ ]_1 \to [_6 ]_6,\ [_2 \to [_6 ]_6 / [^t_7 ]_7,\ [_3 \to [^t_7 ]_7,\ [_5 \to [^t_4 ]^t_4,\ [_6 \to [_3 ]^t_3,\ ]_6 \to [_2 ]^t_2,\ ]_7 \to [_3 ]^t_3 / [^t_4 ]^t_4,\ ]^t_2 \to b,\ ]^t_3 \to a,\ [^t_4 \to c,\ ]^t_4 \to c,\ ]^t_5 \to b,\ [^t_7 \to a\}$.

The sets of regular expressions and extended regular expressions obtained by reading $G^S$ (Figure 2.a) are $R^S_{[^t_4} = \{S[_1^+ [_5 [^t_4\}$ and $R.e^S = R.e^S_{[^t_4} = \{S[_1^+ [_5 [^t_4 ]^t_5 ]_1^+\}$, respectively. The regular expressions and extended regular expressions readable from $G^{]_1}$ (Figure 2.b) are $R^{]_1}_{[^t_4} = \{]_1 [_6 ([_3 ]_7)^+ [^t_4\}$ and $R.e^{]_1}_{[^t_4} = \{]_1 [_6 ([_3 ]_7)^+ [^t_4 (]^t_3)^+ ]_6\}$, respectively. The regular expressions and extended regular expressions obtained by reading $G^{]_6}$ (Figure 2.c) are $R^{]_6}_{[^t_4} = \{]_6 [_2 [_6 ([_3 ]_7)^+ [^t_4,\ ]_6 [_2 (]_7 [_3)^* ]_7 [^t_4\}$ and $R.e^{]_6} = R.e^{]_6}_{[^t_4} = \{]_6 [_2 [_6 ([_3 ]_7)^+ [^t_4 (]^t_3)^+ ]_6 ]^t_2,\ ]_6 [_2 (]_7 [_3)^* ]_7 [^t_4 (]^t_3)^* ]^t_2\}$, respectively.

The extended dependency graph of $G$ is sketched in Figure 2.d. Edges in black are built from the regular expressions in $R^X_{[^t_4}$, $X \in \{S, ]_1, ]_6\}$. Orange edges emphasize symmetrical structures, built with respect to the structure of trace-words in $L(G)$. Some of them (e.g., $]^t_2 ]_1$ and $]^t_2 ]^t_2$) connect regular expressions in $R_e$ between them with respect to the structure of trace-words in $L(G)$ (see Construction 3.4, item 8). The edge $]^t_2 ]_1$ is added because there exist at least one regular expression in $R_e$ that contains $]_1 ]_1$ (e.g. $S[_1^+ [_5 [^t_4 ]^t_5 ]_1^+$), a regular expression in $R.e^{]_1}_{[^t_4}$ that ends in $]_6$ (e.g. $]_1 [_6 ([_3 ]_7)^+ [^t_4 (]^t_3)^+ ]_6$), and a regular expression in $R.e^{]_6}_{[^t_4}$ that ends in $]^t_2$ (see Construction 3.4, item 8.iii.). The self-loop $]^t_2 ]^t_2$ is due to the existence of a regular expression that contains $]_6 ]^t_2$ (e.g. $]_6 [_2 [_6 ([_3 ]_7)^+ [^t_4 (]^t_3)^+ ]_6 ]^t_2$) and a regular expression in $R.e^{]_6}_{[^t_4}$ that ends in $]^t_2$ (e.g. $]_6 [_2 [_6 ([_3 ]_7)^+ [^t_4 (]^t_3)^+ ]_6 ]^t_2$ or $]_6 [_2 (]_7 [_3)^* ]_7 [^t_4 (]^t_3)^* ]^t_2$). The regular language provided by the Chomsky-Schützenberger theorem is the homomorphic image, through $h_k$ (defined in Theorem 3.5), of all regular expressions associated with all paths in the extended dependency graph in Figure 2.d that lead from the initial vertex $S$ to the final vertex labeled by $]^t_2$, i.e., terminal paths.

The interpretation that emerges from the graphical method described in this paper is that the regular language in the Chomsky-Schützenberger theorem intersected with a (certain) Dyck language lists all derivation trees (read in the depth-first search order) associated with words in a context-free grammar, in Dyck normal form or in Chomsky normal form