A Graph Based Parsing Algorithm for Context-free Languages

A Graph Based Parsing Algorithm for Context-free Languages Giinter Hot> Technical Report A 01/99 June 1999 e-mail: hotzocs.uni-sb.de VVVVVV: http://vwv-hotz.cs.uni-sb. de Abstract We present a simple algorithm deciding the word problem of c. f. languages in O(n 3 ). It decides this problem in time O(n') for unambiguous grammars and in time O(n) in the case of LR(k) grammars. Fachbereich 14 Informatik Universitat des Saarlandes P08tfach 15 11 50 66041 Saarbriicken Germany

1 Introduction There are several algorithms known deciding the word problem of general context-free languages in time O(n 3 ). The algorithm of Younger [You67] is very simple and it solves the problem in time O(n 3 ), but it takes no advantage of special cases. Kasami in [KT69] describes an algorithm, which decides this problem for unambiguous context-free grammars in time O(n 2 log n). Early [Ear70] developed an algorithm which decides the general word problem in time O(n 3 ) but does it for unambiguous grammars in time O(n 2 ) and for a wide class of grammars as LR(k) grammars [Knu65] in time O(n). His algorithm takes no advantage of grammars in a normal form. The proofs are hard to read. We present here a simple algorithm with the same runtime eciency as Early's algorithm. 2 Notations and Denitions Let V T be nite alphabets, V \ T = S 2 V and P (V V 2 ) [ (V T ) a c. f. production system in Chomsky normal form (Ch-NF). We assume that the grammar G := (V T P S) does not contain superuous variables. That means for each x 2 V we nd u1 u2 u 2 T such that x ;! u and s ;! u1xu2 holds. We dene linear forms with variables from V and coecients from the boolean algebra B. These are mappings : V ;! B and we write B hv i := f j : V ;! B g: We use the equivalent notation :=X (v) v We dene the sum and a product in B hv i : As usual we put v2v ( + )(v) :=(v) +(v) for v 2 V: The product x y for x y 2 V gives all possible reductions of xy relative to P. More formally we dene Now we put x y :=X z2v (z) z () ((z) =1() (z xy) 2 P: := X x y2v (x) (y) (x y) 1

we use in this denition for 2 B and 2 B hv i the operation ()(v) = (v) for v 2 V: The product is not associative. (B hv i + ) is distributive. We use furthermore the notation P ;1 (t) =X t z z t z =1() (z t) 2 P: z2v If the operation is associative then for u = t1 ::: t n and (u) := P ;1 (t1) ::: P ;1 (t n ) we have u 2 L(G) () (u)(s) =1: In this case (B hv i ) is a nite monoid and P ;1 : T homomorphism and therefore L(G) is regular. ;! (B (V ) ) is a 3 The Graph ;(G u) We assign to the grammar G and u 2 T an oriented graph ;=(K E) K is the set of vertices and E the set of edges and n := juj the length of u. K [ f(v i 0) j v 2 V 1 <i ng [ f(v i 1) j v 2 V 1 i<n+1g E [ f((v i 1) (v j 0)) j V ;! t i ::: t j;1 1 i<j n +1g Obviously it holds u 2 L(G) () ((s 1 1) (s n 0)) 2 E: The graph ; is closed under the following operation: Let be i<j<m edges of ; and If (z) =1 then the edge (x i 1) s 1 ;! (x j 0) (y j 1) s 2 ;! (y m 0) := x y: (z i 1) s 3 ;! (z m 0) 2

is in ;. We write in this case s3 := s1 s2 in general there may be several edges s 0 3 in the relation s 0 3 := s 1 s2. This closure property corresponds to and x ;! t i ::: t j;1 y ;! t j ::: t m;1 z ;! xy: Therefore we have z ;! t1 ::: t m;1 and from this follows by denition of ;, that s3 is in E. Lemma 1. If there are two dierent operations producing the same edge s3, then G is ambiguous. Proof 1. Let s1 s2 and s 0 1 s0 2 two pairs of edges from ; producing under the explained operation the edge s3, then we have the two dierent derivations z ;! xy x ;! u1 y ;! u2 u3 = u1 u2 z ;! x 0 y 0 x 0 ;! u 0 1 y0 ;! u 0 2 u 3 = u 0 1 u 0 2 : Now we assume G not containing superuous variables. Therefore exist the derivations s ;! ~uzu ;! ~uu1 u2u = ~uu 0 1 u 0 2 u 2 T : So we have more than one derivation of ~uu3u from S, i.e. G is ambiguous. 4 The algorithm We now construct a sequence ;1 ;2 ::: ; n of subgraphs of ; such that ;1 depends only on t1 and with ; n =;: We give an operation which constructs ; i+1 from ; i and estimate the complexity ofthisoperation. Let ; i := (K i E i ) for i =1 ::: n and K i := (v l ") 2 K j 1 l i " 2f0 1gg [ f(x i +1 0) j x 2 V g E i := fs 2 E j source(s), sink(s) 2 K i g: The construction of ;1 can be done in time O(1). 3

We assume ; i, i<nhas been constructed. We add t i+1 and f(v i +1 1) j v 2 V [fv i +2 0) j v 2 V g to K i. We in the rst step add the following edges of E to E i : (v i +1 1) ;! (v i +2 0) for v ;! t i+1: Let ; 0 i the result of this construction. Now we apply the closure operations s1 s2 ;! s3 to edges s1 s2 from ; 0 i : ; i being closed under these operations we have to begin with the new edges in ; 0 i. We have the following situation (x j 1) s 1 ;! (x i +1 0) (y i +1 1) s 2 ;! (y i +2 0): We built from s1 s2 (z j 1) s 3 ;! (z i +2 0) if (z xy) 2 P. Iterating this construction in the worst case we need O(n 2 ) elementary operations to construct ; i+1 from ; i, because each edge of ; 0 i we have to consider only once. To construct ; n by this procedure therefore needs in the worst case O(n 3 ) -operations. If the grammar is unambiguous we construct each edge only one time. Operations s1 s2 which do not produce a new edge we are able to exclude by only once inspecting the pairs of vertices (x l 0) (y l 1). If x y =0, then none of the pairs s 1 ;! (x l 0) (y l 1) s 2 ;! has to be considered. Therefore in this case we need only O(n 2 ) steps because this is the bound for the number of edges in ;. So we proved the Theorem 1. The algorithm dened here solves the word problem for c. f. languages in time O(n 3 ). In the case of unambiguous grammars the running time of the algorithm is O(n 2 ). 4

Corollar 1. The algorithm solves the word problem in the case of grammars with m-bound ambiguity in time O(n 2 m). Proof 2. From the m-bound ambiguity it follows that the algorithm draws each new edge maximal m times. Now we study the case G is a LR(k) grammar. LR(k) grammars are characterized by the following property: For uvu 0 2 L(G) and jvj = k let w1 ::: w l be the reduced words of u v relative to G. Then the set of this words has a common prex u, where u is a reduced word of u, such that we can write w1 + :::+ w l = u (v1 + :::+ v l ) jv i jk for i =1 ::: l: This property enables us to compute an upper bound for the number j; i j of edges in ; i. Obviously we have j;1j m for m := #V: We assume ; i being constructed. We then get ; i+1 by the following steps: 1. We compute P ;1 (t i+1), which produces not more than m new edges. 2. We match the new edges with the existing edges. This leads to new edges connecting vertices belonging to (v1 + :::+ v l ) P ;1 (t i+1)g and edges connecting vertices belonging to vt i+1 with edges belonging to u. The number of edges belonging to the rst class is bound by a constant c depending on m = #V and k. The number of the edges belonging to the second class is 0 if u i is prex of u i+1. It is 1 if ju i+1j = ju i j and it is ju i j;ju i+1j if reductions of the reduced word u i take place. So we have From this we get From this follows j; i+1j j; i j + C + ju i j;ju i+1j +1: j; n j = O(n): Theorem 2. The given graph algorithm solves the word problem for LR(k) grammars G := (V T P S) and words w 2 T with = O(n) -operations. It is obvious that the -operations can be performed on a computer in time only depending on G. This means that it can be done in constant time relative tojwj. 5

Literatur [Ear70] J. Early. An ecient context-free parsing algorithm. Com. ACM, 13, 1970. [Knu65] D. E. Knuth. On the translation of languages from left to right. Information and Control, 8, 1965. [KT69] T. Kasami and K. Tori. A syntax-analysis procedure for unambiguous context-free grammars. ACM, 16, 1969. [You67] D. H. Younger. Recognition and parsing of context-free languages in time n 3. Information and Control, 10, 1967. 6