Mildly Context-Sensitive Grammar Formalisms: Introduction


Laura Kallmeyer
Heinrich-Heine-Universität Düsseldorf
Sommersemester 2011
Grammar Formalisms

Overview

1. CFG and natural languages
2. Polynomial extensions of CFG
3. Basic definitions

CFG and natural languages (1)

A context-free grammar (CFG) is a set of rewriting rules that tell us how to replace a non-terminal by a sequence of non-terminal and terminal symbols.

Example:

$S \to a S b$
$S \to a b$

The string language generated by this grammar is $\{a^n b^n \mid n \ge 1\}$.

CFG and natural languages (2)

Sample CFG $G_{\mathit{telescope}}$:

$S \to NP\ VP$
$NP \to D\ N$
$VP \to VP\ PP \mid V\ NP$
$N \to N\ PP$
$PP \to P\ NP$
$N \to \text{man} \mid \text{girl} \mid \text{telescope}$
$D \to \text{the}$
$N \to \text{John}$
$P \to \text{with}$
$V \to \text{saw}$
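The rewriting process behind the language $\{a^n b^n \mid n \ge 1\}$ can be sketched in a few lines of Python. This is only an illustration: it assumes the two rules $S \to a S b$ and $S \to a b$, and the names `RULES` and `derive` are hypothetical.

```python
# Assumed rules for {a^n b^n | n >= 1}: S -> a S b | a b
RULES = {"S": [["a", "S", "b"], ["a", "b"]]}


def derive(n):
    """Return the sentential forms of a derivation of a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = ["S"]
    steps = [form]
    for k in range(n):
        # Apply S -> a S b for the first n-1 steps, then finish with S -> a b.
        rhs = RULES["S"][0] if k < n - 1 else RULES["S"][1]
        i = form.index("S")                      # position of the non-terminal
        form = form[:i] + rhs + form[i + 1:]     # replace it by the right-hand side
        steps.append(form)
    return ["".join(f) for f in steps]
```

Calling `derive(3)` walks through the forms S, aSb, aaSbb, aaabbb: each step replaces the single non-terminal by one right-hand side.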

CFG and natural languages (3)

Context-free languages (CFLs)

- can be recognized in polynomial time ($O(n^3)$);
- are accepted by push-down automata;
- have nice closure properties (e.g., closure under homomorphisms, intersection with regular languages, ...);
- satisfy a pumping lemma;
- can describe nested dependencies ($\{w w^R \mid w \in T^*\}$).

(Hopcroft and Ullman, 1979)

CFG and natural languages (4)

Question: Is CFG powerful enough to describe all natural language phenomena?

Answer: No. There are constructions in natural languages that cannot be adequately described with a context-free grammar.

Example: cross-serial dependencies in Dutch and in Swiss German.

Dutch:

(1) ... dat Wim Jan Marie de kinderen zag helpen leren zwemmen
    ... that Wim Jan Marie the children saw help teach swim
    '... that Wim saw Jan help Marie teach the children to swim'

CFG and natural languages (5)

Swiss German:

(2) ... das mer em Hans es huus hälfed aastriiche
    ... that we Hans-Dat house-Acc helped paint
    '... that we helped Hans paint the house'

(3) ... das mer d'chind em Hans es huus lönd hälfe aastriiche
    ... that we the children-Acc Hans-Dat house-Acc let help paint
    '... that we let the children help Hans paint the house'

Swiss German uses case marking and displays cross-serial dependencies. (Shieber, 1985) shows that Swiss German is not context-free.

CFG and natural languages (6)

In general, because of the closure properties, the following holds: A formalism that can generate cross-serial dependencies can also generate the copy language $\{ww \mid w \in \{a, b\}^*\}$.

The copy language is not context-free. Therefore we are interested in extensions of CFG in order to describe all natural language phenomena.
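The difference between nested dependencies ($w w^R$) and cross-serial dependencies ($ww$) can be made concrete with a small Python sketch. The function names below are hypothetical and the code only tests membership and lists which positions depend on each other; it says nothing about parsing complexity.

```python
def is_mirror(s):
    """True iff s = w w^R for some w (nested dependencies)."""
    return len(s) % 2 == 0 and s == s[::-1]


def is_copy(s):
    """True iff s = w w for some w (cross-serial dependencies)."""
    half, rest = divmod(len(s), 2)
    return rest == 0 and s[:half] == s[half:]


def dependency_pairs(n, cross_serial):
    """Index pairs linking position i of the first half (length n)
    to its partner in the second half."""
    if cross_serial:
        return [(i, n + i) for i in range(n)]          # 1st with 1st, 2nd with 2nd, ...
    return [(i, 2 * n - 1 - i) for i in range(n)]      # 1st with last, 2nd with 2nd-last, ...
```

For `n = 2`, the nested pattern links positions (0, 3) and (1, 2), while the cross-serial pattern links (0, 2) and (1, 3), which mirrors the order of noun phrases and verbs in the Dutch and Swiss German examples.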

CFG and natural languages (7)

Idea (Joshi, 1985): characterize the amount of context-sensitivity necessary for natural languages.

Mildly context-sensitive formalisms have the following properties:

1. They generate (at least) all CFLs.
2. They can describe a limited amount of cross-serial dependencies. In other words, there is an $n \ge 2$ up to which the formalism can generate all string languages $\{w^n \mid w \in T^*\}$.
3. They are polynomially parsable.
4. Their string languages are of constant growth. In other words, the length of the words generated by the grammar grows in a linear way, e.g., $\{a^{2^n} \mid n \ge 0\}$ does not have that property.

Polynomial extensions of CFG (1)

Tree Adjoining Grammars (TAG), (Joshi, Levy, and Takahashi, 1975; Joshi and Schabes, 1997):

Tree-rewriting grammar. Extension of CFG that allows to replace not only leaves but also internal nodes with new trees.

Can generate the copy language.

Example: TAG for the copy language

[Figure: TAG for the copy language]

Polynomial extensions of CFG (2)

Example: TAG derivation

[Figure: TAG derivation]

Polynomial extensions of CFG (3)

Linear Context-Free Rewriting Systems (LCFRS) and the equivalent Multiple Context-Free Grammars (MCFG), (Vijay-Shanker, Weir, and Joshi, 1987; Weir, 1988; Seki et al., 1991)

Idea: extension of CFG where non-terminals can span tuples of non-adjacent strings.

Example: $\mathit{yield}(A) = \langle a^n b^n, c^n d^n \rangle$, with $n \ge 1$.

The rewriting rules tell us how to compute the span of the left-hand side non-terminal from the spans of the right-hand side non-terminals.

$A(ab, cd) \to \varepsilon$
$A(aXb, cYd) \to A(X, Y)$
$S(XY) \to A(X, Y)$

Generated string language: $\{a^n b^n c^n d^n \mid n \ge 1\}$.

LCFRS is more powerful than TAG but still mildly context-sensitive.
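The LCFRS example, read bottom-up, computes a pair of strings for $A$ and concatenates the two components for $S$. The following Python sketch follows the three rules $A(ab, cd) \to \varepsilon$, $A(aXb, cYd) \to A(X, Y)$ and $S(XY) \to A(X, Y)$; the function names `yield_A` and `yield_S` are hypothetical.

```python
def yield_A(n):
    """Pair spanned by A after n rule applications: (a^n b^n, c^n d^n), n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    x, y = "ab", "cd"                            # A(ab, cd) -> eps
    for _ in range(n - 1):
        x, y = "a" + x + "b", "c" + y + "d"      # A(aXb, cYd) -> A(X, Y)
    return x, y


def yield_S(n):
    """String spanned by S: the two components of A's pair, concatenated."""
    x, y = yield_A(n)                            # S(XY) -> A(X, Y)
    return x + y
```

The two components grow in lockstep even though they are not adjacent in the final string, which is what a single CFG non-terminal cannot express.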

Polynomial extensions of CFG (4)

Range Concatenation Grammar (RCG) (Boullier, 2000)

RCG contains clauses of the form $A(\ldots) \to A_1(\ldots) \ldots A_k(\ldots)$ where $A, A_1, \ldots, A_k$ are predicates. Their arguments are words over the terminal and nonterminal alphabets.

Intuition: The predicates characterize properties of strings. A derivation starts with $S(w)$ where $S$ is the start predicate. If this can be reduced to the empty word (i.e., property $S$ is true for $w$), then $w$ is in the language.

Example: RCG for $\{a^{2^n} \mid n \ge 0\}$.

$S(a) \to \varepsilon$
$S(XY) \to E(X, Y)\, S(X)$
$E(a, a) \to \varepsilon$
$E(aX, aY) \to E(X, Y)$

Polynomial extensions of CFG (5)

RCGs are simple if

- the arguments in the right-hand sides of the clauses are single variables,
- no variable appears more than once in the left-hand side of a clause or more than once in the right-hand side of a clause,
- each variable occurring in the left-hand side of a clause occurs also in its right-hand side and vice versa.

Simple RCG are equivalent to LCFRS and MCFG.

RCG in general are more powerful; they generate exactly the class PTIME of polynomially parsable languages. (They properly include the class of MCS formalisms.)

Polynomial extensions of CFG (6)

Summary:

$\text{CFG} \subset \text{TAG} \subset \text{LCFRS, MCFG, simple RCG} \subset \text{RCG (= PTIME)}$

TAG and LCFRS, MCFG, simple RCG are mildly context-sensitive.

In this course, we are interested in mildly context-sensitive formalisms.

Basic Definitions: Languages (1)

Definition 1 (Alphabet, word, language)

1. An alphabet is a nonempty finite set $X$.
2. A string $x_1 \ldots x_n$ with $n \ge 1$ and $x_i \in X$ for $1 \le i \le n$ is called a nonempty word on the alphabet $X$. $X^+$ is defined as the set of all nonempty words on $X$.
3. A new element $\varepsilon \notin X^+$ is added: $X^* := X^+ \cup \{\varepsilon\}$. For each $w \in X^*$, the concatenation of $w$ and $\varepsilon$ is defined as follows: $w\varepsilon = \varepsilon w = w$. $\varepsilon$ is called the empty word, and each $w \in X^*$ is called a word on $X$.
4. A set $L$ is called a language iff there is an alphabet $X$ such that $L \subseteq X^*$.
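The RCG example for $\{a^{2^n} \mid n \ge 0\}$ can be read as a pair of mutually dependent recursive checks. The Python sketch below is a naive recognizer under that reading, assuming the clauses $S(a) \to \varepsilon$, $S(XY) \to E(X, Y)\, S(X)$, $E(a, a) \to \varepsilon$ and $E(aX, aY) \to E(X, Y)$. It tries every split of the input and is not an efficient RCG parser.

```python
def E(x, y):
    """E(a, a) -> eps ; E(aX, aY) -> E(X, Y).
    Holds iff x and y are equal-length, nonempty strings of a's."""
    if x == "a" and y == "a":
        return True
    if len(x) > 1 and len(y) > 1 and x[0] == "a" and y[0] == "a":
        return E(x[1:], y[1:])
    return False


def S(w):
    """S(a) -> eps ; S(XY) -> E(X, Y) S(X).
    Holds iff w = a^(2^n) for some n >= 0."""
    if w == "a":
        return True
    # Try every way of splitting w into two nonempty parts X and Y.
    return any(E(w[:i], w[i:]) and S(w[:i]) for i in range(1, len(w)))
```

The clause for $S(XY)$ uses the variable $X$ twice on its right-hand side, so this grammar is not a simple RCG, which fits the fact that the language is not of constant growth.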

Basic Definitions: Languages (2)

Definition 2 (Homomorphism) For two alphabets $X$ and $Y$, a function $f : X^* \to Y^*$ is a homomorphism iff for all $v, w \in X^*$: $f(vw) = f(v)f(w)$.

Definition 3 (Length of a word) Let $X$ be an alphabet, $w \in X^*$.

1. The length of $w$, $|w|$, is defined as follows: if $w = \varepsilon$, then $|w| = 0$. If $w = xw'$ for some $x \in X$, then $|w| = 1 + |w'|$.
2. For every $a \in X$, we define $|w|_a$ as the number of $a$'s occurring in $w$: If $w = \varepsilon$, then $|w|_a = 0$, if $w = aw'$ then $|w|_a = |w'|_a + 1$ and if $w = bw'$ for some $b \in X \setminus \{a\}$, then $|w|_a = |w'|_a$.

Basic Definitions: CFG (1)

Definition 4 (Context-free grammar) A context-free grammar (CFG) is a tuple $G = \langle N, T, P, S \rangle$ such that

1. $N$ and $T$ are disjoint alphabets, the nonterminals and terminals of $G$,
2. $P \subseteq N \times (N \cup T)^*$ is a finite set of productions (also called rewriting rules). A production $\langle A, \alpha \rangle$ is usually written $A \to \alpha$.
3. $S \in N$ is the start symbol.

Basic Definitions: CFG (2)

Definition 5 (Language of a CFG) Let $G = \langle N, T, P, S \rangle$ be a CFG. The (string) language $L(G)$ of $G$ is the set $\{w \in T^* \mid S \Rightarrow^* w\}$ where

- for $w, w' \in (N \cup T)^*$: $w \Rightarrow w'$ iff there is an $A \to \alpha \in P$ and there are $v, u \in (N \cup T)^*$ such that $w = vAu$ and $w' = v\alpha u$.
- $\Rightarrow^*$ is the reflexive transitive closure of $\Rightarrow$:
  - $w \Rightarrow^0 w$ for all $w \in (N \cup T)^*$, and
  - for all $w, w' \in (N \cup T)^*$: $w \Rightarrow^n w'$ iff there is a $v$ such that $w \Rightarrow v$ and $v \Rightarrow^{n-1} w'$.
  - for all $w, w' \in (N \cup T)^*$: $w \Rightarrow^* w'$ iff there is an $i \in \mathbb{N}$ such that $w \Rightarrow^i w'$.

A language $L$ is called context-free iff there is a CFG $G$ such that $L = L(G)$.

Basic Definitions: CFG (3)

Proposition 1 (Pumping lemma for context-free languages) Let $L$ be a context-free language. Then there is a constant $c$ such that for all $w \in L$ with $|w| \ge c$: $w = x v_1 y v_2 z$ with $|v_1 v_2| \ge 1$, $|v_1 y v_2| \le c$, and for all $i \ge 0$: $x v_1^i y v_2^i z \in L$.
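Definitions 4 and 5 translate fairly directly into code: a one-step rewriting function for $\Rightarrow$ and a search over sentential forms for $\Rightarrow^*$. The Python sketch below is an illustration with hypothetical names (`step`, `language`). It represents sentential forms as tuples of symbols and assumes that no production shortens a form (no empty right-hand sides), which is what makes the length cut-off safe.

```python
from collections import deque


def step(form, productions):
    """All w' with form => w': one occurrence of A is replaced by alpha
    for some production (A, alpha)."""
    results = []
    for i, symbol in enumerate(form):
        for lhs, rhs in productions:
            if symbol == lhs:
                results.append(form[:i] + rhs + form[i + 1:])
    return results


def language(nonterminals, productions, start, max_len):
    """Terminal words w with start =>* w and |w| <= max_len."""
    seen = {(start,)}
    queue = deque([(start,)])
    words = set()
    while queue:
        form = queue.popleft()
        if not any(s in nonterminals for s in form):
            words.add("".join(form))       # only terminals left: a word of L(G)
            continue
        for nxt in step(form, productions):
            # Assumption: forms never shrink, so longer forms can be discarded.
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words
```

With the productions `[("S", ("a", "S", "b")), ("S", ("a", "b"))]`, nonterminals `{"S"}` and `max_len = 6`, the search returns the words ab, aabb and aaabbb.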

Basic Definitions: CFG (4)

Proposition 2 Context-free languages are closed under homomorphisms, i.e., for alphabets $T_1, T_2$ and for every context-free language $L_1 \subseteq T_1^*$ and every homomorphism $h : T_1^* \to T_2^*$, $h(L_1) = \{h(w) \mid w \in L_1\}$ is a context-free language.

Proposition 3 Context-free languages are closed under intersection with regular languages, i.e., for every context-free language $L$ and every regular language $L_r$, $L \cap L_r$ is a context-free language.

Proposition 4 The copy language $\{ww \mid w \in \{a, b\}^*\}$ is not context-free.

Basic Definitions: Trees (1)

Definition 6 (Directed Graph)

1. A directed graph is a pair $\langle V, E \rangle$ where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges.
2. For every $v \in V$, we define the in-degree of $v$ as $|\{v' \in V \mid \langle v', v \rangle \in E\}|$ and the out-degree of $v$ as $|\{v' \in V \mid \langle v, v' \rangle \in E\}|$.

$E^+$ is the transitive closure of $E$ and $E^*$ is the reflexive transitive closure of $E$.

Basic Definitions: Trees (2)

Definition 7 (Tree) A tree is a triple $\gamma = \langle V, E, r \rangle$ such that

- $\langle V, E \rangle$ is a directed graph and $r \in V$ is a special node, the root node.
- $\gamma$ contains no cycles, i.e., there is no $v \in V$ such that $\langle v, v \rangle \in E^+$,
- only the root $r \in V$ has in-degree 0,
- every vertex $v \in V$ is accessible from $r$, i.e., $\langle r, v \rangle \in E^*$, and
- all nodes $v \in V - \{r\}$ have in-degree 1.

A vertex with out-degree 0 is called a leaf. The vertices in a tree are also called nodes.

Basic Definitions: Trees (3)

Definition 8 (Ordered Tree) A tree is ordered if it has an additional linear precedence relation $\prec\ \subseteq V \times V$ such that

- $\prec$ is irreflexive, antisymmetric and transitive,
- for all $v_1, v_2$ with $\{\langle v_1, v_2 \rangle, \langle v_2, v_1 \rangle\} \cap E^* = \emptyset$: either $v_1 \prec v_2$ or $v_2 \prec v_1$ and if there is either a $\langle v_3, v_1 \rangle \in E$ with $v_3 \prec v_2$ or a $\langle v_4, v_2 \rangle \in E$ with $v_1 \prec v_4$, then $v_1 \prec v_2$, and
- nothing else is in $\prec$.

We use Gorn addresses for nodes in ordered trees: The root address is $\varepsilon$, and the $j$th child of a node with address $p$ has address $pj$.
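The tree conditions of Definition 7 and the Gorn addressing scheme can be checked mechanically. The Python sketch below uses hypothetical names (`is_tree`, `gorn_addresses`) and assumes that every edge connects vertices from the given vertex set. Addresses are built by concatenating child indices as decimal digits, so the representation is only unambiguous for nodes with at most nine children.

```python
def is_tree(vertices, edges, root):
    """Check Definition 7 for a finite directed graph with (parent, child) edges."""
    indeg = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for parent, child in edges:
        indeg[child] += 1
        children[parent].append(child)
    # Only the root has in-degree 0; every other vertex has in-degree 1.
    if indeg[root] != 0 or any(indeg[v] != 1 for v in vertices if v != root):
        return False
    # Every vertex must be accessible from the root. Together with the
    # in-degree conditions this also rules out cycles.
    reached, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in reached:
                reached.add(child)
                stack.append(child)
    return reached == set(vertices)


def gorn_addresses(children, root):
    """Map each node to its Gorn address; children[v] lists v's children
    from left to right. The root gets the empty string (epsilon)."""
    addresses = {root: ""}
    stack = [root]
    while stack:
        node = stack.pop()
        for j, child in enumerate(children.get(node, []), start=1):
            addresses[child] = addresses[node] + str(j)
            stack.append(child)
    return addresses
```

For a tree with root S, children NP and VP, and VP having children V and a second NP, the addresses are the empty string for S, 1 for NP, 2 for VP, 21 for V and 22 for the second NP.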

Basic Definitions: Trees (4)

Definition 9 (Labeling) A labeling of a graph $\gamma = \langle V, E \rangle$ over a signature $\langle A_1, A_2 \rangle$ is a pair of functions $l : V \to A_1$ and $g : E \to A_2$ with $A_1, A_2$ possibly distinct.

Definition 10 (Syntactic tree) Let $N$ and $T$ be disjoint alphabets of non-terminal and terminal symbols. A syntactic tree (over $N$ and $T$) is an ordered finite labeled tree such that $l(v) \in N$ for each vertex $v$ with out-degree at least 1 and $l(v) \in (N \cup T \cup \{\varepsilon\})$ for each leaf $v$.

Basic Definitions: Trees (5)

Definition 11 (Tree Language of a CFG) Let $G = \langle N, T, P, S \rangle$ be a CFG.

1. A syntactic tree $\langle V, E, r \rangle$ over $N$ and $T$ is a parse tree in $G$ iff
   - $l(v) \in (T \cup \{\varepsilon\})$ for each leaf $v$,
   - for every $v_0, v_1, \ldots, v_n \in V$, $n \ge 1$ such that $\langle v_0, v_i \rangle \in E$ for $1 \le i \le n$ and $\langle v_i, v_{i+1} \rangle \in\ \prec$ for $1 \le i < n$, $l(v_0) \to l(v_1) \ldots l(v_n) \in P$.
2. A parse tree $\langle V, E, r \rangle$ is a derivation tree in $G$ iff $l(r) = S$.
3. The tree language of $G$ is $L_T(G) = \{\gamma \mid \gamma \text{ is a derivation tree in } G\}$.

Basic Definitions: Trees (6)

Definition 12 (Weak and Strong Equivalence) Let $F_1, F_2$ be two grammar formalisms.

- $F_1$ and $F_2$ are weakly equivalent iff for each instance $G_1$ of $F_1$ there is an instance $G_2$ of $F_2$ that generates the same string language and vice versa.
- $F_1$ and $F_2$ are strongly equivalent iff for both formalisms the notion of a tree language is defined and, furthermore, for each instance $G_1$ of $F_1$ there is an instance $G_2$ of $F_2$ that generates the same tree language and vice versa.

References

Boullier, Pierre. 2000. Range Concatenation Grammars. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT2000), pages 53-64, Trento, Italy, February.

Hopcroft, John E. and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison Wesley.

Joshi, Arvind K. 1985. Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing. Cambridge University Press, pages 206-250.

Joshi, Arvind K., Leon S. Levy, and Masako Takahashi. 1975. Tree Adjunct Grammars. Journal of Computer and System Sciences, 10:136-163.

Joshi, Arvind K. and Yves Schabes. 1997. Tree-Adjoining Grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages. Springer, Berlin, pages 69-123.

Savitch, Walter J., Emmon Bach, William Marsh, and Gila Safran-Naveh, editors. 1987. The Formal Complexity of Natural Language. Studies in Linguistics and Philosophy. Reidel, Dordrecht, Holland.

Seki, Hiroyuki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88(2):191-229.

Shieber, Stuart M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:333-343. Reprinted in (Savitch et al., 1987).

Vijay-Shanker, K., David J. Weir, and Arvind K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of ACL, Stanford.

Weir, David J. 1988. Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. thesis, University of Pennsylvania.