Linear conjunctive languages are closed under complement

Similar documents
60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Theory of Computer Science

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Chapter 6. Properties of Regular Languages

CPSC 313 Introduction to Computability

Solutions to Problem Set 3

Concordia University Department of Computer Science & Software Engineering

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

Theory of Computation

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

Set Theory. CSE 215, Foundations of Computer Science Stony Brook University

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

Automata Theory CS F-08 Context-Free Grammars

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

EXAMPLE CFG. L = {a 2n : n 1 } L = {a 2n : n 0 } S asa aa. L = {a n b : n 0 } L = {a n b : n 1 } S asb ab S 1S00 S 1S00 100

Chapter 1. Introduction

CS375 Midterm Exam Solution Set (Fall 2017)

Grammars (part II) Prof. Dan A. Simovici UMB

CS481F01 Prelim 2 Solutions

Information and Computation 194 (2004) Boolean grammars. Alexander Okhotin

Automata Theory and Formal Grammars: Lecture 1

Chapter 3. Regular grammars

Computability and Complexity

Introduction to Formal Languages, Automata and Computability p.1/42

CFG Simplification. (simplify) 1. Eliminate useless symbols 2. Eliminate -productions 3. Eliminate unit productions

CS 121, Section 2. Week of September 16, 2013

Non-context-Free Languages. CS215, Lecture 5 c

VTU QUESTION BANK. Unit 1. Introduction to Finite Automata. 1. Obtain DFAs to accept strings of a s and b s having exactly one a.

Pushdown Automata. Reading: Chapter 6

In English, there are at least three different types of entities: letters, words, sentences.

1 Alphabets and Languages

Left-Forbidding Cooperating Distributed Grammar Systems

Context-Free Grammars: Normal Forms

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

CS 275 Automata and Formal Language Theory

CS 373: Theory of Computation. Fall 2010

CS 133 : Automata Theory and Computability

CS 275 Automata and Formal Language Theory. Proof of Lemma II Lemma (II )

Conjunctive Grammars and Alternating Pushdown Automata Draft Paper

Comment: The induction is always on some parameter, and the basis case is always an integer or set of integers.

Semigroup presentations via boundaries in Cayley graphs 1

Regular Expressions [1] Regular Expressions. Regular expressions can be seen as a system of notations for denoting ɛ-nfa

Theory of Computation

Homework 4. Chapter 7. CS A Term 2009: Foundations of Computer Science. By Li Feng, Shweta Srivastava, and Carolina Ruiz

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

CS Automata, Computability and Formal Languages

CSEP 590 Data Compression Autumn Arithmetic Coding

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

Properties of Context-Free Languages. Closure Properties Decision Properties

Theory of Computation - Module 3

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Section 1 (closed-book) Total points 30

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

CS 275 Automata and Formal Language Theory. Proof of Lemma II Lemma (II )

Einführung in die Computerlinguistik

REGular and Context-Free Grammars

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Author: Vivek Kulkarni ( )

Context-Free and Noncontext-Free Languages

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

HW 3 Solutions. Tommy November 27, 2012

Properties of Context-Free Languages

BASIC MATHEMATICAL TECHNIQUES

UNIT II REGULAR LANGUAGES

Example. Addition, subtraction, and multiplication in Z are binary operations; division in Z is not ( Z).

Theory of Computation (Classroom Practice Booklet Solutions)

Power of controlled insertion and deletion

Fundamentele Informatica II

PRIME RADICAL IN TERNARY HEMIRINGS. R.D. Giri 1, B.R. Chide 2. Shri Ramdeobaba College of Engineering and Management Nagpur, , INDIA

CPS 220 Theory of Computation

Problem Session 5 (CFGs) Talk about the building blocks of CFGs: S 0S 1S ε - everything. S 0S0 1S1 A - waw R. S 0S0 0S1 1S0 1S1 A - xay, where x = y.

HW6 Solutions. Micha l Dereziński. March 20, 2015

ÖVNINGSUPPGIFTER I SAMMANHANGSFRIA SPRÅK. 17 april Classrum Edition

Recitation 4: Converting Grammars to Chomsky Normal Form, Simulation of Context Free Languages with Push-Down Automata, Semirings

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

Linear Conjunctive Grammars and One-turn Synchronized Alternating Pushdown Automata

How to write proofs - II

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

CS500 Homework #2 Solutions

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer.

Sheet 1-8 Dr. Mostafa Aref Format By : Mostafa Sayed

Ogden s Lemma for CFLs

Question Bank UNIT I

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Notes for Comp 497 (454) Week 10

Chapter 4. Regular Expressions. 4.1 Some Definitions

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

NPDA, CFG equivalence

Valence automata over E-unitary inverse semigroups

This lecture covers Chapter 7 of HMU: Properties of CFLs

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

6.8 The Post Correspondence Problem

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

page 1 Total ( )

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Transcription:

Linear conjunctive languages are closed under complement Alexander Okhotin okhotin@cs.queensu.ca Technical report 2002-455 Department of Computing and Information Science, Queen s University, Kingston, Ontario, Canada K7L 3N6 January 2002 Abstract Linear conjunctive grammars are conjunctive grammars in which the body of each conjunct contains no more than a single nonterminal symbol. They can at the same time be thought of as a special case of conjunctive grammars and as a generalization of linear context-free grammars. While the problem of whether the complement of any conjunctive language can be denoted with a conjuctive grammar is still open and conjectured to have a negative answer, it turns out that the subfamily of linear conjunctive languages is in fact closed under complement and therefore under all set-theoretic operations. 1 Introduction Conjunctive grammars, introduced in [1, 2], are context-free grammars augmented with an explicit set-theoretic intersection operation. A grammar is defined as a quadruple G = (Σ, N, P, S), where Σ and N are disjoint finite nonempty sets of terminal and nonterminal symbols respectively; P is a finite set of grammar rules, each of the form A α 1 &... &α n (A N, n 1, i α i (Σ N) ), (1) where the strings α i are distinct, and their order is considered insignificant in the sense that there are no two rules in P that differ only in the order of these strings; S N is a nonterminal designated as the start symbol. For each rule of the form (1) and for each i (1 i n), A α i is called a conjunct. Let conjuncts(g) denote the set of all conjuncts. 1

Terminal strings are derived by a conjunctive grammar by transforming strings over the alphabet Σ N{ (, &, ) } in the following way: nonterminals can be replaced with rule bodies enclosed in parentheses (e.g., by the rule (1) A can be replaced with (α 1 &... &α n ) ), and conjunction of several terminal strings in parentheses can be replaced with one such string (e.g., a substring (u&u) can be replaced with u ). The language generated by a conjunctive grammar is the set of all strings over Σ that can be derived from S. Similarly to the linear context-free grammars, linear conjunctive grammars are those where the body of each conjunct contains no more than one nonterminal symbol, i.e. each string α i in (1) is either in Σ or in Σ N Σ. There is no loss of generality in the stronger assumption that every rule is either of the form or of the form A u 1 B 1 v 1 &... &u m B m v m (u i, v i Σ, B i N) (2a) A w (w Σ ) (2b) It has been proved in [2] that every linear conjunctive grammar can be effectively transformed to an equivalent grammar in the so-called linear normal form, where each rule is of the form A bb 1 &... &bb m &C 1 c&... &C n c (m + n 1; A, B i, C j N; b, c Σ), (3a) A a (A N, a Σ), (3b) S ɛ, only if S does not appear in right parts of rules (3c) Both the family of linear conjunctive languages and the whole family of conjunctive languages are obviously closed under union and intersection, since for any two given conjunctive (linear conjunctive) grammars G i = (Σ, N i, P i, S i ) (i = 1, 2) it is possible to construct the conjunctive (linear conjunctive) grammar G = (Σ, N 1 N 2 {S}, P 1 P 2 P 3, S), where P 3 = {S S 1, S S 2 } for the case of union and P 3 = {S S 1 &S 2 } for the case of intersection. Concerning the closure under complement, it has been conjectured [2] that the whole family of conjunctive languages is not closed under this operation and that the co-context-free language {ww w Σ } is not conjunctive for any alphabet of cardinality 2 or more. This conjecture has neither been proved nor disproved, and it remains unknown whether the family of conjunctive languages is closed under complement. In this paper we prove that for each linear conjunctive grammar G there exists a linear conjunctive grammar that generates the complement of L(G), which means the closure of this language family under complement. The informal reasoning behind the construction is given in Section 2, and Section 3 describes an algorithmic grammar transformation that yields the grammar for the complement of the given language. 2

In Section 4 we prove that the constructed grammar indeed generates the complement of the language generated by the source grammar. Finally, in Section 5 we apply this formal transformation to construct a linear conjunctive grammar for the complement of the language {wcw w {a, b} }. 2 General idea A string w is generated by some nonterminal A if and only if there is a rule A s 11... s 1n1 &... &s m1... s mnm (s ij Σ N) (4) in the grammar, such that for every i (1 i m) there is a factorization w = u 1... u ni, where u j L(s ij ) for all j (1 j n i ). By writing out a formal negation of this statement, we get that a string w is not generated by some nonterminal A if and only if for every rule of the form (4) there exists some i (1 i m), such that for every factorization w = u 1... u ni there is j (1 j n i ), for which u j / L(s ij ). While those universal and existential quantifiers in the latter statement that refer to rules and conjuncts could be implemented by the means of set-theoretic intersection and union (represented with conjunction of several strings in one rule and multiple rules for one nonterminal respectively), there is no obvious way to express within the formalism of conjunctive languages that some condition must hold for every factorization. That is why no method to construct a direct negation of a given conjunctive grammar is known, and it is conjectured [2] that the family is not closed under complement. However, the situation changes if we restrict ourselves to linear conjunctive grammars. If a conjunct is of the form A ubv, where u, v Σ, then there is no more than one meaningful factorization of each string, because w L(uBv) if and only if w = uxv for some x L(B); the same is true in respect to the conjuncts of the form A u. In light of this singularity the difference between the existential and the universal quantifier on factorizations vanishes, and thus the aforementioned condition of a string s not being derivable from a nonterminal can be reformulated as follows: If a conjunctive grammar is linear, then a string w is not generated by some nonterminal A if and only if for every rule of the form (4) there exists a number i (1 i m), a factorization w = u 1... u ni, and a number j (1 j n i ), such that u j / L(s ij ). This condition turns out to be expressible in the terms of linear conjunctive grammars. Let us develop an effective method to construct a linear conjunctive grammar for the complement of the language generated by a given linear conjunctive grammar. 3 Construction Let G = (Σ, N, P, S) be an arbitrary linear conjunctive grammar in the linear normal form. 3

We construct the following linear conjunctive grammar: G = (Σ, N X N Y N Z N U N V N W, P, S ) (5a) where N X = {X A A N}, N Y = {Y A α1 &...&α m A α 1 &... &α m P }, N Z = {Z a a Σ} {Z ɛ }, N U = {U aσ + a Σ}, N V = {V Σ + a a Σ}, N W = {W }, S = X S (5b) (5c) (5d) (5e) (5f) (5g) (5h) The nonterminals of the new grammar are meant to generate the following languages: For each nonterminal symbol A N of the original grammar, the nonterminal X A generates the complement of L G (A), i.e. L G (X A ) = Σ \ L G (A) (6) For each rule A α 1 &... &α m P, the nonterminal Y A α1&...&α m generates those and only those strings that are not generated by this rule in the original grammar: L G (Y A α1&...&α m ) = Σ \ L G ((α 1 &... &α m )) (7) For each terminal symbol a Σ, the nonterminal Z a generates all strings but the string a. Z ɛ generates Σ +. For each a Σ, the nonterminal U aσ + generates all strings except those in a Σ +, i.e. ɛ, all one-symbol strings and and all strings of the form bw (w Σ +, b Σ, b a). Similarly, V Σ + a generates all strings except those that are end with a and are at least two symbols long. Finally, the nonterminal W generates Σ. Now let us construct P, the set of rules of the new grammar: For each nonterminal A N of the original grammar, if there are no rules for A in P, then P contains the following single rule for X A : X A W (8a) Otherwise, if P contains some rules for A, let {A A 1,... A A m } denote all these rules. Then the new grammar contains the following rule for X A : X A Y A A1 &... &Y A Am (8b) 4

For each rule A bb 1 &... &bb m &C 1 c&... &C n c P (m + n 1) of the original grammar, the new grammar contains the following rules: Y A bb1&...&bb m&c 1c&...&C nc U bσ + (if m > 0) (9a) Y A bb1 &...&bb m &C 1 c&...&c n c V Σ + c (if n > 0) (9b) Y A bb1 &...&bb m &C 1 c&...&c n c bx Bi (for all i {1,..., m}) (9c) Y A bb1 &...&bb m &C 1 c&...&c n c X Cj c (for all j {1,..., n}) (9d) For each rule A a P of the original grammar, the new grammar has the rule Y A a Z a (10) If the original grammar contains the rule S ɛ, then there is a rule Y S ɛ Z ɛ (11) The languages generated by the nonterminals from N Z N U N V N W are all regular and thus linear conjunctive; writing the rules for them does not pose any difficulty. 4 Proof Let us prove the correctness of our construction. Lemma 1. Let G = (Σ, N, P, S) be a linear conjunctive grammar in the linear normal form. Let G be a grammar constructed from G using the method given in Section 3. Then for every n 0 and for every nonterminal A N, L G (A) = Σ \ L G (X A ) (mod Σ n ). Proof. Induction on n. Basis n = 1 Let w {ɛ} Σ and let A N. Due to G s being in the linear normal form, w L G (A) if and only if A w P (which is either S ɛ or A a). Let A w P. Then Y A w does not generate w, because the only rule for Y A w is Y A w Z w. Therefore, the only rule for X A contains a conjunct (X A Y S ɛ conjuncts(p )) that does not generate the string w, and hence ɛ / L G (X A ). Now let A w / P. This means that all rules for A are either of the form A bb 1 &... &bb m &C 1 c&... &C n c P (m + n 1) (12a) or of the form A a (a Σ, a w) (12b) 5

For each rule of the form (12a) the corresponding nonterminal Y A bb1 &...&bb m &C 1 c&...&c n c has at least one of the rules (9a) and (9b) and hence generates ɛ. The same holds in respect to each rule of the form (12b), because the nonterminal Y A a has the rule (10), which can generate any string except a, and hence the string w. Since every conjunct of the only rule for nonterminal X A generates w, so does the nonterminal X A. Induction step Assume L G (A) = Σ \L G (X A ) (mod Σ n 1 ) for some n 2 and consider an arbitrary nonterminal A N and an arbitrary string w Σ n. Denote w = bxc, where b, c Σ and x Σ n 2. w L G (A) if and only if there exists some rule A bb 1 &... &bb m &C 1 c&... &C n c P (m + n 1) (13) such that there exist derivations B i C j G G =... = xc (for all i: 1 i m) (14a) G G =... = bx (for all j: 1 j n) (14b) By induction hypothesis, (14) holds if and only if for some rule (13) xc / L G (X Bi ) (for all i: 1 i m) (15a) bx / L G (X Cj ) (for all j: 1 j n) (15b) Now let us see that Y A bb1 &...&bb m &C 1 c&...&c n c does not generate w if and only if (15) is the case. Indeed, if we assume (15), then none of the rules (9) derive w: (9a) cannot derive bxc because it starts with b and is at least two symbols long, (9b) does not derive it because it starts with c, and the rules of the form (9c) and (9d) are of no use due to (15). If (15) is untrue, then xc is in L G (X Bi ) for some i (or bx is in L G (X Cj ) for some i), and hence one of the rules Y A bb1&...&bb m&c 1c&...&C nc bx Bi Y A bb1&...&bb m&c 1c&...&C nc X Cj c (16a) (16b) derives w. Putting together the results we have established so far, w L G (A) if and only if there exists a rule (13) of the original grammar, such that w / L G (Y A bb1&...&bb m&c 1c&...&C nc) (17) But the existence of such a rule implies the existence of a conjunct of the rule (8b) that does not derive w. Since (8b) is the only rule for X A, we conclude that w L G (A) if and only if w / L G (X A ), which, due to the arbitrariness of the choice of w, proves the induction step. 6

This completes the proof. Corollary 1. The languages L G (A) and Σ \ L G (X A ) coincide. Corollary 2. L(G) = Σ \ L(G ). Theorem 1. The family of linear conjunctive languages is closed under complement. 5 Example Consider the following linear conjunctive grammar for the language L = {wcw w {a, b} } [1, 2]: S C&K C aca acb bca bcb c K aa&ak bb&bk ce A aaa aab baa bab cra B aba abb bba bbb crb R ar br ɛ After converting it to linear normal form using the method given in [2] we obtain the following equivalent grammar: S Da&aA&aK Db&aA&aK Ea&bB&bK Eb&bB&bK c C Da Db Ea Eb c D ac E bc K aa&ak bb&bk cr c A ai aj bi bj F a I Aa J Ab B am an bm bn F b M Ba N Bb R ar br a b F cr c The resulting grammar for the complement of L contains the following rules: X S Y S Da&aA&aK &Y S Db&aA&aK &Y S Ea&bB&bK & Y S Eb&bB&bK &Y S c Y S Da&aA&aK U aσ + V Σ + a X D a ax A ax K Y S Db&aA&aK U aσ + V Σ + b X D b ax A ax K Y S Ea&bB&bK U bσ + V Σ + a X E a bx B bx K Y S Eb&bB&bK U bσ + V Σ + b X E b bx B bx K Y S c Z c 7

X C Y C Da &Y C Db &Y C Ea &Y C Eb &Y C c Y C Da V Σ + a X D a Y C Db V Σ + b X D b Y C Ea V Σ + a X E a Y C Eb V Σ + b X E b Y C c Z c X D Y D ac Y D ac U aσ + ax C X E Y E bc Y E bc U bσ + bx C X K Y K aa&ak &Y K bb&bk &Y K cr &Y K c Y K aa&ak U aσ + ax A ax K Y K ba&bk U bσ + bx A bx K Y K cr U cσ + cx R Y K c Z c X A Y A ai &Y A aj &Y A bi &Y A bj &Y A F a Y A ai U aσ + ax I Y A aj U aσ + ax J Y A bi U bσ + bx I Y A bj U bσ + bx J Y A F a V Σ + a X F a X I Y I Aa Y I Aa V Σ+ a X A a X J Y J Ab Y J Ab V Σ + b X A b X B Y B am &Y B an &Y B bm &Y B bn &Y B F b Y B am U aσ + ax M Y B an U aσ + ax N Y B bm U bσ + bx M Y B bn U bσ + bx N Y B F b V Σ + b X F b X M Y M Ba Y M Ba V Σ + a X B a X N Y N Bb Y N Bb V Σ + b X B b X R Y R ar &Y R br &Y R a &Y R b Y R ar U aσ + ax R Y R br U bσ + bx R Y R a Z a Y R b Z b X F Y F cr &Y F c Y F cr U cσ + cx R Y F c Z c 8

Z a ɛ bw cw aaw abw acw Z b ɛ aw cw baw bbw bcw Z c ɛ aw bw caw cbw ccw Z ɛ aw bw cw U aσ + U bσ + U cσ + ɛ a bw cw ɛ aw b cw ɛ aw bw c V Σ + a ɛ a W b W c V Σ + b ɛ W a b W c V Σ + c ɛ W a W b c W aw bw cw ɛ 6 Conclusion We have proved that the family of linear conjunctive languages is closed under complement and hence, due to its obvious closure under union and intersection, is closed under all set-theoretic operations. In all cases the construction is effective in the sense it can be done by an algorithm. The given method essentially uses the linearity of the grammar, and it seems very unlikely that it could be applied to the conjunctive grammars of general form. Thus the conjecture that the whole family of conjunctive languages is not closed under complement [2] remains in effect. References [1] A. Okhotin, Conjunctive grammars, in Pre-proceedings of DCAGRS 2000, Dept. of Computer Science, University of Western Ontario, London, Ontario, Canada. Report No. 555 (2000). [2] A. Okhotin, Conjunctive grammars, Journal of Automata, Languages and Combinatorics, 6:4 (2001), to appear. 9