Formal Languages, Grammars and Automata Lecture 5

Similar documents
Grammars and Context Free Languages

Grammars and Context Free Languages

Grammars and Context-free Languages; Chomsky Hierarchy

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

Lecture 11 Context-Free Languages

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Foundations of Informatics: a Bridging Course

CSCI Compiler Construction

Automata Theory CS F-08 Context-Free Grammars

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Chapter 5: Context-Free Languages

Introduction and Motivation. Introduction and Motivation. Introduction to Computability. Introduction and Motivation. Theory. Lecture5: Context Free

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Context-Free Grammars (and Languages) Lecture 7

Solutions to Problem Set 3

Computational Models - Lecture 4 1

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Context Free Grammars

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

AC68 FINITE AUTOMATA & FORMULA LANGUAGES DEC 2013

Computational Models - Lecture 3

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Context-free Grammars and Languages

Computational Models - Lecture 4

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Properties of Context-Free Languages

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

Introduction to Theory of Computing

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

5 Context-Free Languages

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer.

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

CS20a: summary (Oct 24, 2002)

CISC4090: Theory of Computation

FLAC Context-Free Grammars

AC68 FINITE AUTOMATA & FORMULA LANGUAGES JUNE 2014

CPS 220 Theory of Computation

CYK Algorithm for Parsing General Context-Free Grammars

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

Computational Models - Lecture 4 1

CISC 4090 Theory of Computation

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

Chap. 7 Properties of Context-free Languages

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

This lecture covers Chapter 5 of HMU: Context-free Grammars

Follow sets. LL(1) Parsing Table

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Context-Free Grammars and Languages

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CS Lecture 29 P, NP, and NP-Completeness. k ) for all k. Fall The class P. The class NP

Theory of Computation (II) Yijia Chen Fudan University

Theory of Languages and Automata

Einführung in die Computerlinguistik

CISC 4090 Theory of Computation

Chapter 6. Properties of Regular Languages

Concordia University Department of Computer Science & Software Engineering

Non-context-Free Languages. CS215, Lecture 5 c

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

The View Over The Horizon

Fundamentele Informatica II

Part 4 out of 5 DFA NFA REX. Automata & languages. A primer on the Theory of Computation. Last week, we showed the equivalence of DFA, NFA and REX

Theory of Computation - Module 3

Recitation 4: Converting Grammars to Chomsky Normal Form, Simulation of Context Free Languages with Push-Down Automata, Semirings

5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) October,

Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages

Theory of Computation (IV) Yijia Chen Fudan University

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

CONTEXT FREE GRAMMAR AND

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

Context Free Languages and Grammars

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen

ECS 120: Theory of Computation UC Davis Phillip Rogaway February 16, Midterm Exam

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Decidable and undecidable languages

Intro to Theory of Computation

MA/CSSE 474 Theory of Computation

CS 275 Automata and Formal Language Theory

CS 373: Theory of Computation. Fall 2010

C6.2 Push-Down Automata

Context-Free Grammar

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Pushdown Automata: Introduction (2)

The Pumping Lemma. for all n 0, u 1 v n u 2 L (i.e. u 1 u 2 L, u 1 vu 2 L [but we knew that anyway], u 1 vvu 2 L, u 1 vvvu 2 L, etc.

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

CA320 - Computability & Complexity

Theory of Computation (Classroom Practice Booklet Solutions)

CPS 220 Theory of Computation Pushdown Automata (PDA)

Transcription:

Formal Languages, Grammars and Automata Lecture 5 Helle Hvid Hansen helle@cs.ru.nl http://www.cs.ru.nl/~helle/ Foundations Group Intelligent Systems Section Institute for Computing and Information Sciences 6 June 2014 Helle Hvid Hansen 6 June 2014 FLGA 1 / 19

Midterm Enquete Results lecture level count (very slow) 0 0 1 4 (good) 2 7 3 5 (very fast) 4 3 average: 2.4 Comments: exercise level count (very easy) 0 0 1 3 (appropriate) 2 9 3 6 (very hard) 4 0 average: 2.2 Sometimes too fast, sometimes too slow (3 students). Solutions online (2 students). Helle Hvid Hansen 6 June 2014 FLGA 2 / 19

Overview Introduction Applications of finite automata: text search, natural language processing, lexical analysis (parsing), biology, video games (PacMan), internet protocols (TCP),... (see notes on webpage). Programming languages (like Java etc.) and natural languages are generally not regular. Helle Hvid Hansen 6 June 2014 FLGA 3 / 19

Overview Introduction Applications of finite automata: text search, natural language processing, lexical analysis (parsing), biology, video games (PacMan), internet protocols (TCP),... (see notes on webpage). Programming languages (like Java etc.) and natural languages are generally not regular. Formal Languages Grammars (generators) Automata (acceptors) Helle Hvid Hansen 6 June 2014 FLGA 3 / 19

Today Introduction Topics: Context-free grammars and context-free languages Regular grammars. Motivation/Application: Compilation of programming languages. Helle Hvid Hansen 6 June 2014 FLGA 4 / 19

Introduction Compilation and Parsing source code (ASCII) token string lexical analysis (DFA) (context-free language) parsing parse tree (data structure) code generation executable code Helle Hvid Hansen 6 June 2014 FLGA 5 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Productions, always start with S S E aea abeba abba Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Productions, always start with S S E aea abeba abba S E beb baeab babebab babaeabab babaabab Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Productions, always start with S S E aea abeba abba S E beb baeab babebab babaeabab babaabab S O bob bab Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Productions, always start with S S E aea abeba abba S E beb baeab babebab babaeabab babaabab S O bob bab S O bob baoab babab Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Generating Strings with Production Rules Example: S O E E λ aea beb O a b aoa bob Productions, always start with S S E aea abeba abba S E beb baeab babebab babaeabab babaabab S O bob bab S O bob baoab babab We can generate exactly the set of words w with w = w R (w is a palindrome) Helle Hvid Hansen 6 June 2014 FLGA 6 / 19

Context-Free Grammar Def. A context-free grammar (CFG) G = (V, Σ, S, P) consists of V Σ S P a set of non-terminal symbols a set of terminal symbols a start symbol, S V a set of production rules of the form X w where X V, w (V Σ) Helle Hvid Hansen 6 June 2014 FLGA 7 / 19

Context-Free Grammar Def. A context-free grammar (CFG) G = (V, Σ, S, P) consists of V Σ S P a set of non-terminal symbols a set of terminal symbols a start symbol, S V a set of production rules of the form X w where X V, w (V Σ) Notation (Backus-Naur Form or BNF): Group together rules for the same non-terminal: is shorthand for three rules: E λ aea beb E λ, E aea, E beb Helle Hvid Hansen 6 June 2014 FLGA 7 / 19

Derivations and Language Let G = (V, Σ, S, P) be a context-free grammar, and let u, v, w (V Σ) be arbitrary. Given a string uxv (V Σ), we can apply a rule X w: uxv uwv A derivation of u from v is a sequence of rule applications: v v v u We say that u can be derived from v in G if there is a derivation of u from v in G, and write v u. Helle Hvid Hansen 6 June 2014 FLGA 8 / 19

Derivations and Language Let G = (V, Σ, S, P) be a context-free grammar, and let u, v, w (V Σ) be arbitrary. Given a string uxv (V Σ), we can apply a rule X w: uxv uwv A derivation of u from v is a sequence of rule applications: v v v u We say that u can be derived from v in G if there is a derivation of u from v in G, and write v u. Why context-free? A rule X w can be applied in any context to replace X with w. (There are also context-sensitive grammars) Helle Hvid Hansen 6 June 2014 FLGA 8 / 19

Language and Parsing The language generated by G is L(G) = {w Σ S w} A parser is an algorithm that determines for given G and w whether w L(G)? Note: in general there can be many derivations of a w in G. Helle Hvid Hansen 6 June 2014 FLGA 9 / 19

Language and Parsing The language generated by G is L(G) = {w Σ S w} A parser is an algorithm that determines for given G and w whether w L(G)? Note: in general there can be many derivations of a w in G. Def. A derivation is leftmost if in each step a rule is applied to the leftmost non-terminal. (Think of reading from left to right). Rightmost derivations are defined analogously. Lemma: For a CFG G and word w, w L(G) iff there is a leftmost derivation of w in G. (So we can restrict to searching for a leftmost derivation.) Helle Hvid Hansen 6 June 2014 FLGA 9 / 19

Another Example Introduction G : S asb λ, B b Derivations of aabb: d 1 : d 2 : d 3 : S asb aasbb aabb aabb aabb S asb asb aasbb aasbb aabb S asb asb aasbb aabb aabb (Beware of typo in derivations in [Silva], Example 5.2.4) Derivation d 1 is leftmost, d 2 is rightmost, d 3 is neither. Derivation Trees (on blackboard). Helle Hvid Hansen 6 June 2014 FLGA 10 / 19

Ambiguity Introduction Def. A CFG G is unambiguous if for each w L(G) there exists a unique leftmost derivation of w in G. Otherwise, G is ambiguous. Helle Hvid Hansen 6 June 2014 FLGA 11 / 19

Ambiguity Introduction Def. A CFG G is unambiguous if for each w L(G) there exists a unique leftmost derivation of w in G. Otherwise, G is ambiguous. Two syntactically correct strings can have different meanings. Examples: Time flies like an arrow; fruit flies like a banana The peasants are revolting ( revolting = disgusting/in rebellion) Helle Hvid Hansen 6 June 2014 FLGA 11 / 19

Ambiguity Introduction Def. A CFG G is unambiguous if for each w L(G) there exists a unique leftmost derivation of w in G. Otherwise, G is ambiguous. Two syntactically correct strings can have different meanings. Examples: Time flies like an arrow; fruit flies like a banana The peasants are revolting ( revolting = disgusting/in rebellion) Programming language should be given by unambiguous grammar. Helle Hvid Hansen 6 June 2014 FLGA 11 / 19

Regular and Context-Free Languages Def. A language L is context-free if there is a CFG G such that L = L(G). Theorem: Every regular language is context-free. Helle Hvid Hansen 6 June 2014 FLGA 12 / 19

Regular and Context-Free Languages Def. A language L is context-free if there is a CFG G such that L = L(G). Theorem: Every regular language is context-free. Proof: Let M = (Q, q 0, δ, F ) be a DFA that accepts L Σ. Define a context-free grammar G M as follows: V = Q(non-terminals are states) Σ (terminal symbols are letters of alphabet) S = q 0 P = {q aq δ(q)(a) = q } {q λ q F } Helle Hvid Hansen 6 June 2014 FLGA 12 / 19

Regular and Context-Free Languages Def. A language L is context-free if there is a CFG G such that L = L(G). Theorem: Every regular language is context-free. Proof: Let M = (Q, q 0, δ, F ) be a DFA that accepts L Σ. Define a context-free grammar G M as follows: V = Q(non-terminals are states) Σ (terminal symbols are letters of alphabet) S = q 0 P = {q aq δ(q)(a) = q } {q λ q F } L(G M ) = L(M) since computations correspond to derivations: a M : q 1 a 0 2 a q1 n qn F G M : q 0 a 1 q 1 a 1 a 2 q 2 a 1 a n q n a 1 a n λ Helle Hvid Hansen 6 June 2014 FLGA 12 / 19

Introduction Def. A regular grammar is a CFG G = (V, Σ, S, P) in which all rules have the form X uy or X u where X, Y V and u Σ (u consists only of terminals). Helle Hvid Hansen 6 June 2014 FLGA 13 / 19

Introduction Def. A regular grammar is a CFG G = (V, Σ, S, P) in which all rules have the form X uy or X u where X, Y V and u Σ (u consists only of terminals). Example: S abax Y X bx ay λ Y X ax bb Helle Hvid Hansen 6 June 2014 FLGA 13 / 19

Introduction Def. A regular grammar is a CFG G = (V, Σ, S, P) in which all rules have the form X uy or X u where X, Y V and u Σ (u consists only of terminals). Example: S abax Y X bx ay λ Y X ax bb Theorem: Every regular language is generated by a regular grammar (by the previous construction). Note: In some texts, a regular grammar is defined as having rules only of the form X ay or X λ. This corresponds to the difference between DFA and NFA-λ. Helle Hvid Hansen 6 June 2014 FLGA 13 / 19

and Regular Languages Theorem: If G is a regular grammar, then L(G) is a regular. Helle Hvid Hansen 6 June 2014 FLGA 14 / 19

and Regular Languages Theorem: If G is a regular grammar, then L(G) is a regular. Proof: Build an NFA-λ M G = (Q, q 0, δ, F ) as follows: State set Q = V plus some extra states (see below), q 0 = S, F = {X V X λ} plus some extra states (see below) Transitions: For each rule of the form X a 1 a 2 a n Y, add new states q 1, q 2,..., q n 1 to Q and transitions X a1 a q 2 a n 1 a 1 n qn 1 Y For each rule of the form X a 1 a 2 a n, add new states q 1, q 2,..., q n 1 to Q and transitions X a1 a q 2 a n 1 a 1 n qn 1 qn and add q n to F. For each rule of the form X Y add λ-transition X λ Y. Helle Hvid Hansen 6 June 2014 FLGA 14 / 19

and Regular Languages II...Proof continued: Again, derivations correspond to computations, e.g. in we have: G : G : S abax Y X bx ay λ Y X ax bb S Y ax aay aabb M G : S λ Y a X Hence we have: L(M G ) = L(G). a Y b q 1 b q n F Helle Hvid Hansen 6 June 2014 FLGA 15 / 19

Parsing Algorithm Introduction Given a CFG G and a word w, how do we determine w L(G)? The Cocke-Younger-Kasami (CYK) ALgorithm. Requires G to be in Chomsky normal form. Helle Hvid Hansen 6 June 2014 FLGA 16 / 19

Parsing Algorithm Introduction Given a CFG G and a word w, how do we determine w L(G)? The Cocke-Younger-Kasami (CYK) ALgorithm. Requires G to be in Chomsky normal form. Def. A CFG G is in Chomsky normal form if all its productions are of the form: X YZ or X a where X, Y, Z V and a Σ. (Note: λ / L(G).) Lemma: Every CFG G can be transformed into an equivalent CFG in Chomsky normal form. (See e.g. Sudkamp.) Helle Hvid Hansen 6 June 2014 FLGA 16 / 19

Introduction Idea: For each substring u of input word w, compute the set of non-terminals that can produce u. Example: G : S AB BA SS AC BD, A a, C SB, B b, D SA. Run algorithm on w = aabbab. (continue on blackboard, see notes on webpage) Helle Hvid Hansen 6 June 2014 FLGA 17 / 19

The CYK-Algorithm Introduction Helle Hvid Hansen 6 June 2014 FLGA 18 / 19

Context-Free Art: http://www.contextfreeart.org Helle Hvid Hansen 6 June 2014 FLGA 19 / 19