A Graph Based Parsing Algorithm for Context-free Languages

Similar documents
straight segment and the symbol b representing a corner, the strings ababaab, babaaba and abaabab represent the same shape. In order to learn a model,

Applications of Regular Algebra to Language Theory Problems. Roland Backhouse February 2001

Computability and Complexity

TECHNISCHE UNIVERSITÄT DRESDEN. Fakultät Informatik. Technische Berichte Technical Reports. Daniel Kirsten. TUD / FI 98 / 07 - Mai 1998

Foundations of Informatics: a Bridging Course

Note that neither ; nor are syntactic constituents of content models. It is not hard to see that the languages denoted by content models are exactly t

CYK Algorithm for Parsing General Context-Free Grammars

Laboratoire d Informatique Fondamentale de Lille

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

Computability and Complexity

Fall 1999 Formal Language Theory Dr. R. Boyer. Theorem. For any context free grammar G; if there is a derivation of w 2 from the

Abstract It is shown that formulas in monadic second order logic (mso) with one free variable can be mimicked by attribute grammars with a designated

Geraud Senizergues. LaBRI. 351, Cours de la Liberation Talence, France?? is shown to be decidable. We reduce this problem to the -

On some computational problems in nite abelian groups. Universitat des Saarlandes. Fachbereich 14 - Informatik. Postfach

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Context-Free Grammars (and Languages) Lecture 7

Computability and Complexity

Lean clause-sets: Generalizations of minimally. unsatisable clause-sets. Oliver Kullmann. University of Toronto. Toronto, Ontario M5S 3G4

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

Theory of Computation 8 Deterministic Membership Testing

Fundamenta Informaticae 30 (1997) 23{41 1. Petri Nets, Commutative Context-Free Grammars,

Properties of Context-Free Languages

Properties of context-free Languages

Structural Grobner Basis. Bernd Sturmfels and Markus Wiegelmann TR May Department of Mathematics, UC Berkeley.

AUGMENTED REGULAR EXPRESSIONS: A FORMALISM TO DESCRIBE, RECOGNIZE AND LEARN A CLASS OF CONTEXT-SENSITIVE LANGUAGES.

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Solvability of Word Equations Modulo Finite Special And. Conuent String-Rewriting Systems Is Undecidable In General.

Almost Optimal Sublinear Time Parallel. Lawrence L. Larmore. Wojciech Rytter y. Abstract

Abstract This work is a survey on decidable and undecidable problems in matrix theory. The problems studied are simply formulated, however most of the

Decision Problems Concerning. Prime Words and Languages of the

FLAC Context-Free Grammars

Finite Presentations of Pregroups and the Identity Problem

to t in the graph with capacities given by u e + i(e) for edges e is maximized. We further distinguish the problem MaxFlowImp according to the valid v

Grammars and Context Free Languages

For the property of having nite derivation type this result has been strengthened considerably by Otto and Sattler-Klein by showing that this property

Integer Circuit Evaluation is PSPACE-complete. Ke Yang. Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave.

Non-elementary Lower Bound for Propositional Duration. Calculus. A. Rabinovich. Department of Computer Science. Tel Aviv University

1. Affine Grassmannian for G a. Gr Ga = lim A n. Intuition. First some intuition. We always have to rst approximation

Analog Neural Nets with Gaussian or other Common. Noise Distributions cannot Recognize Arbitrary. Regular Languages.

Institute for Advanced Computer Studies. Department of Computer Science. On the Convergence of. Multipoint Iterations. G. W. Stewart y.

Uniquely 2-list colorable graphs

Formal Languages and Automata

Quasigroups and Related Systems 21 (2013), Introduction

Feature Constraint Logics. for Unication Grammars. Gert Smolka. German Research Center for Articial Intelligence and. Universitat des Saarlandes

A shrinking lemma for random forbidding context languages

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

Computation of Substring Probabilities in Stochastic Grammars Ana L. N. Fred Instituto de Telecomunicac~oes Instituto Superior Tecnico IST-Torre Norte

Approximate Map Labeling. is in (n log n) Frank Wagner* B 93{18. December Abstract

Knowledge Discovery. Zbigniew W. Ras. Polish Academy of Sciences, Dept. of Comp. Science, Warsaw, Poland

CS Lecture 29 P, NP, and NP-Completeness. k ) for all k. Fall The class P. The class NP

Weeks 6 and 7. November 24, 2013

Flow Network. The following figure shows an example of a flow network:

example like a b, which is obviously not true. Using our method, examples like this will quickly halt and say there is no solution. A related deciency

Grammars and Context Free Languages

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

CSCI Compiler Construction

Computability and Complexity

UNIT II REGULAR LANGUAGES

A parsing technique for TRG languages

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers

Language-Processing Problems. Roland Backhouse DIMACS, 8th July, 2003

To make a grammar probabilistic, we need to assign a probability to each context-free rewrite

2 JOHAN JONASSON we address in this paper is if this pattern holds for any nvertex graph G, i.e. if the product graph G k is covered faster the larger

2 EBERHARD BECKER ET AL. has a real root. Thus our problem can be reduced to the problem of deciding whether or not a polynomial in one more variable

Computability and Complexity

Hamiltonian problem on claw-free and almost distance-hereditary graphs

1 Introduction It will be convenient to use the inx operators a b and a b to stand for maximum (least upper bound) and minimum (greatest lower bound)

A theorem on summable families in normed groups. Dedicated to the professors of mathematics. L. Berg, W. Engel, G. Pazderski, and H.- W. Stolle.

615, rue du Jardin Botanique. Abstract. In contrast to the case of words, in context-free tree grammars projection rules cannot always be

The following can also be obtained from this WWW address: the papers [8, 9], more examples, comments on the implementation and a short description of

Note On Parikh slender context-free languages

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Decidability of Existence and Construction of a Complement of a given function

Minimal enumerations of subsets of a nite set and the middle level problem

Nine Test Tubes Generate any RE Language. C. Ferretti, G. Mauri, C. Zandron. Universita di Milano - ITALY. Abstract

output H = 2*H+P H=2*(H-P)

Music as a Formal Language

Results on Equivalence, Boundedness, Liveness, and Covering Problems of BPP-Petri Nets

Einführung in die Computerlinguistik

(1.) For any subset P S we denote by L(P ) the abelian group of integral relations between elements of P, i.e. L(P ) := ker Z P! span Z P S S : For ea

Linear Algebra, 4th day, Thursday 7/1/04 REU Info:

[3] R.M. Colomb and C.Y.C. Chung. Very fast decision table execution of propositional

STGs may contain redundant states, i.e. states whose. State minimization is the transformation of a given

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

Fall 1999 Formal Language Theory Dr. R. Boyer. 1. There are other methods of nding a regular expression equivalent to a nite automaton in

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

The rest of the paper is organized as follows: in Section 2 we prove undecidability of the existential-universal ( 2 ) part of the theory of an AC ide

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable

October 23, concept of bottom-up tree series transducer and top-down tree series transducer, respectively,

Grammar formalisms Tree Adjoining Grammar: Formal Properties, Parsing. Part I. Formal Properties of TAG. Outline: Formal Properties of TAG

Chap. 7 Properties of Context-free Languages

Implementing -Reduction by. Hypergraph Rewriting. Sabine Kuske 1. Fachbereich Mathematik und Informatik. Universitat Bremen. D{28334 Bremen, Germany

One Quantier Will Do in Existential Monadic. Second-Order Logic over Pictures. Oliver Matz. Institut fur Informatik und Praktische Mathematik

Introduction to Finite-State Devices in Natural Language Processing

Analysis on Graphs. Alexander Grigoryan Lecture Notes. University of Bielefeld, WS 2011/12

Grammars and Context-free Languages; Chomsky Hierarchy

2. THE MAIN RESULTS In the following, we denote by the underlying order of the lattice and by the associated Mobius function. Further let z := fx 2 :

Note Watson Crick D0L systems with regular triggers

of acceptance conditions (nite, looping and repeating) for the automata. It turns out,

ON THE COMPLEXITY OF SOLVING THE GENERALIZED SET PACKING PROBLEM APPROXIMATELY. Nimrod Megiddoy

Transcription:

A Graph Based Parsing Algorithm for Context-free Languages Giinter Hot> Technical Report A 01/99 June 1999 e-mail: hotzocs.uni-sb.de VVVVVV: http://vwv-hotz.cs.uni-sb. de Abstract We present a simple algorithm deciding the word problem of c. f. languages in O(n 3 ). It decides this problem in time O(n') for unambiguous grammars and in time O(n) in the case of LR(k) grammars. Fachbereich 14 Informatik Universitat des Saarlandes P08tfach 15 11 50 66041 Saarbriicken Germany

1 Introduction There are several algorithms known deciding the word problem of general context-free languages in time O(n 3 ). The algorithm of Younger [You67] is very simple and it solves the problem in time O(n 3 ), but it takes no advantage of special cases. Kasami in [KT69] describes an algorithm, which decides this problem for unambiguous context-free grammars in time O(n 2 log n). Early [Ear70] developed an algorithm which decides the general word problem in time O(n 3 ) but does it for unambiguous grammars in time O(n 2 ) and for a wide class of grammars as LR(k) grammars [Knu65] in time O(n). His algorithm takes no advantage of grammars in a normal form. The proofs are hard to read. We present here a simple algorithm with the same runtime eciency as Early's algorithm. 2 Notations and Denitions Let V T be nite alphabets, V \ T = S 2 V and P (V V 2 ) [ (V T ) a c. f. production system in Chomsky normal form (Ch-NF). We assume that the grammar G := (V T P S) does not contain superuous variables. That means for each x 2 V we nd u1 u2 u 2 T such that x ;! u and s ;! u1xu2 holds. We dene linear forms with variables from V and coecients from the boolean algebra B. These are mappings : V ;! B and we write B hv i := f j : V ;! B g: We use the equivalent notation :=X (v) v We dene the sum and a product in B hv i : As usual we put v2v ( + )(v) :=(v) +(v) for v 2 V: The product x y for x y 2 V gives all possible reductions of xy relative to P. More formally we dene Now we put x y :=X z2v (z) z () ((z) =1() (z xy) 2 P: := X x y2v (x) (y) (x y) 1

we use in this denition for 2 B and 2 B hv i the operation ()(v) = (v) for v 2 V: The product is not associative. (B hv i + ) is distributive. We use furthermore the notation P ;1 (t) =X t z z t z =1() (z t) 2 P: z2v If the operation is associative then for u = t1 ::: t n and (u) := P ;1 (t1) ::: P ;1 (t n ) we have u 2 L(G) () (u)(s) =1: In this case (B hv i ) is a nite monoid and P ;1 : T homomorphism and therefore L(G) is regular. ;! (B (V ) ) is a 3 The Graph ;(G u) We assign to the grammar G and u 2 T an oriented graph ;=(K E) K is the set of vertices and E the set of edges and n := juj the length of u. K [ f(v i 0) j v 2 V 1 <i ng [ f(v i 1) j v 2 V 1 i<n+1g E [ f((v i 1) (v j 0)) j V ;! t i ::: t j;1 1 i<j n +1g Obviously it holds u 2 L(G) () ((s 1 1) (s n 0)) 2 E: The graph ; is closed under the following operation: Let be i<j<m edges of ; and If (z) =1 then the edge (x i 1) s 1 ;! (x j 0) (y j 1) s 2 ;! (y m 0) := x y: (z i 1) s 3 ;! (z m 0) 2

is in ;. We write in this case s3 := s1 s2 in general there may be several edges s 0 3 in the relation s 0 3 := s 1 s2. This closure property corresponds to and x ;! t i ::: t j;1 y ;! t j ::: t m;1 z ;! xy: Therefore we have z ;! t1 ::: t m;1 and from this follows by denition of ;, that s3 is in E. Lemma 1. If there are two dierent operations producing the same edge s3, then G is ambiguous. Proof 1. Let s1 s2 and s 0 1 s0 2 two pairs of edges from ; producing under the explained operation the edge s3, then we have the two dierent derivations z ;! xy x ;! u1 y ;! u2 u3 = u1 u2 z ;! x 0 y 0 x 0 ;! u 0 1 y0 ;! u 0 2 u 3 = u 0 1 u 0 2 : Now we assume G not containing superuous variables. Therefore exist the derivations s ;! ~uzu ;! ~uu1 u2u = ~uu 0 1 u 0 2 u 2 T : So we have more than one derivation of ~uu3u from S, i.e. G is ambiguous. 4 The algorithm We now construct a sequence ;1 ;2 ::: ; n of subgraphs of ; such that ;1 depends only on t1 and with ; n =;: We give an operation which constructs ; i+1 from ; i and estimate the complexity ofthisoperation. Let ; i := (K i E i ) for i =1 ::: n and K i := (v l ") 2 K j 1 l i " 2f0 1gg [ f(x i +1 0) j x 2 V g E i := fs 2 E j source(s), sink(s) 2 K i g: The construction of ;1 can be done in time O(1). 3

We assume ; i, i<nhas been constructed. We add t i+1 and f(v i +1 1) j v 2 V [fv i +2 0) j v 2 V g to K i. We in the rst step add the following edges of E to E i : (v i +1 1) ;! (v i +2 0) for v ;! t i+1: Let ; 0 i the result of this construction. Now we apply the closure operations s1 s2 ;! s3 to edges s1 s2 from ; 0 i : ; i being closed under these operations we have to begin with the new edges in ; 0 i. We have the following situation (x j 1) s 1 ;! (x i +1 0) (y i +1 1) s 2 ;! (y i +2 0): We built from s1 s2 (z j 1) s 3 ;! (z i +2 0) if (z xy) 2 P. Iterating this construction in the worst case we need O(n 2 ) elementary operations to construct ; i+1 from ; i, because each edge of ; 0 i we have to consider only once. To construct ; n by this procedure therefore needs in the worst case O(n 3 ) -operations. If the grammar is unambiguous we construct each edge only one time. Operations s1 s2 which do not produce a new edge we are able to exclude by only once inspecting the pairs of vertices (x l 0) (y l 1). If x y =0, then none of the pairs s 1 ;! (x l 0) (y l 1) s 2 ;! has to be considered. Therefore in this case we need only O(n 2 ) steps because this is the bound for the number of edges in ;. So we proved the Theorem 1. The algorithm dened here solves the word problem for c. f. languages in time O(n 3 ). In the case of unambiguous grammars the running time of the algorithm is O(n 2 ). 4

Corollar 1. The algorithm solves the word problem in the case of grammars with m-bound ambiguity in time O(n 2 m). Proof 2. From the m-bound ambiguity it follows that the algorithm draws each new edge maximal m times. Now we study the case G is a LR(k) grammar. LR(k) grammars are characterized by the following property: For uvu 0 2 L(G) and jvj = k let w1 ::: w l be the reduced words of u v relative to G. Then the set of this words has a common prex u, where u is a reduced word of u, such that we can write w1 + :::+ w l = u (v1 + :::+ v l ) jv i jk for i =1 ::: l: This property enables us to compute an upper bound for the number j; i j of edges in ; i. Obviously we have j;1j m for m := #V: We assume ; i being constructed. We then get ; i+1 by the following steps: 1. We compute P ;1 (t i+1), which produces not more than m new edges. 2. We match the new edges with the existing edges. This leads to new edges connecting vertices belonging to (v1 + :::+ v l ) P ;1 (t i+1)g and edges connecting vertices belonging to vt i+1 with edges belonging to u. The number of edges belonging to the rst class is bound by a constant c depending on m = #V and k. The number of the edges belonging to the second class is 0 if u i is prex of u i+1. It is 1 if ju i+1j = ju i j and it is ju i j;ju i+1j if reductions of the reduced word u i take place. So we have From this we get From this follows j; i+1j j; i j + C + ju i j;ju i+1j +1: j; n j = O(n): Theorem 2. The given graph algorithm solves the word problem for LR(k) grammars G := (V T P S) and words w 2 T with = O(n) -operations. It is obvious that the -operations can be performed on a computer in time only depending on G. This means that it can be done in constant time relative tojwj. 5

Literatur [Ear70] J. Early. An ecient context-free parsing algorithm. Com. ACM, 13, 1970. [Knu65] D. E. Knuth. On the translation of languages from left to right. Information and Control, 8, 1965. [KT69] T. Kasami and K. Tori. A syntax-analysis procedure for unambiguous context-free grammars. ACM, 16, 1969. [You67] D. H. Younger. Recognition and parsing of context-free languages in time n 3. Information and Control, 10, 1967. 6