Lecture 11 Context-Free Languages


Lecture 11 Context-Free Languages COT 4420 Theory of Computation Chapter 5

Context-Free Languages (e.g. { a^n b^n : n ≥ 0 }, { ww^R }) properly contain the Regular Languages (e.g. a*b*, (a + b)*)

Example 1 G = ({S}, {a, b}, S, P) Productions: S → aSb, S → λ Derivations: S => aSb => aaSbb => aabb S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => aaaabbbb

Notation We write S =>* aaabbb for zero or more derivation steps, instead of S => aSb => aaSbb => aaaSbbb => aaabbb

Example Grammar: S → aSb | λ Possible derivations: S =>* λ, S =>* ab, S =>* aaaSbbb =>* aaaaabbbbb

Language of a Grammar For a grammar G with start variable S: L(G) = { w : S =>* w, w ∈ T* }

Example Grammar: S → aSb | λ Language of the grammar: L = { a^n b^n : n ≥ 0 }
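This language can be explored mechanically: the sketch below (Python; generate is an illustrative helper name, not part of the lecture) expands S → aSb | λ exhaustively and collects every terminal string up to a length bound.

```python
# Grammar: S -> aSb | λ.  Each sentential form contains at most one S,
# so exhaustive expansion with a length cutoff terminates.
def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    results = set()
    stack = ["S"]                     # sentential forms still to expand
    while stack:
        form = stack.pop()
        if "S" not in form:
            results.add(form)         # terminal string: some a^n b^n
            continue
        if len(form) - 1 > max_len:   # even S -> λ leaves it too long: prune
            continue
        stack.append(form.replace("S", "aSb", 1))   # apply S -> aSb
        stack.append(form.replace("S", "", 1))      # apply S -> λ
    return results
```

Running generate(4) yields exactly the strings λ, ab, aabb, matching L = { a^n b^n : n ≥ 0 }.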

Context-Free Grammar A grammar G = (V, T, S, P) is context-free if all productions in P have the form A → x, where A ∈ V is a single variable and x ∈ (V ∪ T)* is a sequence of terminals and variables. A language L is a context-free language iff there is a context-free grammar G such that L = L(G).

Context-Free Language L = { a^n b^n : n ≥ 0 } is a context-free language, since the context-free grammar S → aSb | λ generates L(G) = L

Another Example Context-free grammar G: S → aSa | bSb | λ A derivation: S => aSa => abSba => abba L(G) = { ww^R : w ∈ {a, b}* }

Another Example Context-free grammar G: S → (S) | SS | λ A derivation: S => SS => (S)S => ((S))S => (())S => (())(S) => (())() L(G): balanced parentheses
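Membership in this language has a well-known direct characterization: a string is balanced iff a left-to-right counter of open parentheses never goes negative and ends at zero. A small sketch of that equivalent test (Python; balanced is an illustrative name, and this checks the language rather than simulating the grammar):

```python
def balanced(s):
    """Membership test for L(G): balanced parentheses.
    Equivalent to derivability from S -> (S) | SS | λ."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ')' with no matching '(' to its left
            return False
    return depth == 0          # every '(' was eventually closed
```

For example, balanced("(())()") is True while balanced(")(") is False, even though both have equally many symbols of each kind.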

Example 2 L = { a^n b^m : n ≠ m } Case n > m (e.g. aaaaaaaabbbbb): S → AS1, S1 → aS1b | λ, A → aA | a Case n < m (e.g. aaaaabbbbbbbbb): S → S1B, S1 → aS1b | λ, B → bB | b Combined grammar: S → AS1 | S1B, S1 → aS1b | λ, A → aA | a, B → bB | b

Example 3 Suppose I have this grammar: S → aB | bA, A → aS | bAA | a, B → bS | aBB | b Claim: L(G) is all words over {a, b} that have an equal number of a's and b's (excluding λ). Proof: by induction on the size of w ∈ L(G).

Proof by induction Induction hypothesis: 1. S =>* w iff w has an equal number of a's and b's 2. A =>* w iff w has one more a than b's 3. B =>* w iff w has one more b than a's Basis: true for |w| = 1 Inductive step: assume it is true for |w| ≤ k-1

1. S =>* w iff w has an equal number of a's and b's (⇒) If S =>* w then w has an equal number of a's and b's. Suppose S =>* w with |w| = k. The first derivation step must be S → aB or S → bA. Suppose it is S → aB. Then w = aw1 where B =>* w1. Since |w1| = k-1, by induction hypothesis (3) w1 has one more b than a's. Therefore w has an equal number of a's and b's. The case where the first step uses S → bA is similar.

(⇐) If w has an equal number of a's and b's then S =>* w. Assume |w| = k and w has an equal number of a's and b's. Suppose w = aw1. Then w1 must have one more b than a's. By the induction hypothesis, since |w1| = k-1, B =>* w1. Thus S => aB =>* aw1 = w, so S =>* w. The case w = bw1 is similar.

2. A =>* w iff w has one more a than b's (⇒) If A =>* w then w has one more a than b's. Suppose A =>* w and |w| = k > 1. Then the first derivation step must be A → aS or A → bAA. In the first case, w = aw1 where S =>* w1, and by the induction hypothesis w1 has an equal number of a's and b's. In the second case, w = bw1w2 where the first A derives w1 and the second A derives w2, each having one more a than b's. Thus w = bw1w2 has one more a than b's overall.

(⇐) If w has one more a than b's then A =>* w. Assume w has one more a than b's and |w| = k. If w = aw1: then w1 has an equal number of a's and b's, so by induction S =>* w1, and therefore A => aS =>* aw1 = w. If w = bw2: now w2 has two more a's than b's, and can be written as w2 = w3w4 with w3 having one more a than b's and w4 having one more a than b's (why is this true?). By induction A =>* w3 and A =>* w4, therefore A => bAA =>* bw3w4 = w.
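The claim can also be checked mechanically for small strings. The sketch below (Python; names are illustrative) enumerates all leftmost derivations of the grammar up to a length bound and collects the terminal strings it reaches:

```python
from collections import deque

# S -> aB | bA,  A -> aS | bAA | a,  B -> bS | aBB | b
GRAMMAR = {"S": ["aB", "bA"], "A": ["aS", "bAA", "a"], "B": ["bS", "aBB", "b"]}

def language(max_len):
    """Terminal strings of length <= max_len derivable from S.
    No rule shrinks a sentential form, so pruning at max_len is sound."""
    seen, words = set(), set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        variables = [i for i, c in enumerate(form) if c in GRAMMAR]
        if not variables:
            words.add(form)                 # fully terminal string
            continue
        i = variables[0]                    # expand the leftmost variable
        for rhs in GRAMMAR[form[i]]:
            queue.append(form[:i] + rhs + form[i + 1:])
    return words
```

Every string produced does have equal numbers of a's and b's, and for length 2 the enumeration yields exactly {ab, ba}, in line with the claim.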

Derivations

Derivations When a sentential form contains several variables, we can replace any one of them at each step. As a result, there are many different derivations of the same string of terminals.

Derivations Example: 1. S → aAS 2. S → a 3. A → SbA 4. A → SS 5. A → ba Using rules 1, 2, 3, 4, 2, 2, 2: S => aAS => aAa => aSbAa => aSbSSa => aSbSaa => aSbaaa => aabaaa Using rules 1, 3, 2, 4, 2, 2, 2: S => aAS => aSbAS => aSbAa => aSbSSa => aabSSa => aabSaa => aabaaa

Leftmost Derivation A derivation is said to be leftmost if in each step the leftmost variable in the sentential form is replaced. Example: S → aAS | a, A → SbA | SS | ba S => aAS => aSbAS => aabAS => aabSSS => aabaSS => aabaaS => aabaaa (Leftmost)

Rightmost Derivation A derivation is said to be rightmost if in each step the rightmost variable is replaced. Example: 1. S → aAS 2. S → a 3. A → SbA 4. A → SS 5. A → ba Using rules 1, 2, 3, 4, 2, 2, 2: S => aAS => aAa => aSbAa => aSbSSa => aSbSaa => aSbaaa => aabaaa (Rightmost)

Leftmost and Rightmost Derivation Example: 1. S → aAS 2. S → a 3. A → SbA 4. A → SS 5. A → ba S => aAS => aSbAS => aSbAa => aSbSSa => aabSSa => aabSaa => aabaaa (Neither leftmost nor rightmost)
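The distinction can be made concrete with a small simulator (Python; the rule numbering follows the slides, while derive and its signature are illustrative) that applies a numbered rule sequence at the leftmost or the rightmost variable:

```python
# 1. S -> aAS   2. S -> a   3. A -> SbA   4. A -> SS   5. A -> ba
RULES = {1: ("S", "aAS"), 2: ("S", "a"), 3: ("A", "SbA"),
         4: ("A", "SS"), 5: ("A", "ba")}

def derive(rule_sequence, leftmost=True):
    """Apply numbered productions at the leftmost (or rightmost) variable,
    returning the list of sentential forms."""
    form = "S"
    steps = [form]
    for n in rule_sequence:
        lhs, rhs = RULES[n]
        positions = [i for i, c in enumerate(form) if c in "SA"]
        i = positions[0] if leftmost else positions[-1]
        assert form[i] == lhs, f"rule {n} does not apply at position {i}"
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps
```

Applying 1, 3, 2, 4, 2, 2, 2 leftmost and 1, 2, 3, 4, 2, 2, 2 rightmost both end at aabaaa, reproducing the two derivations above.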

Derivation Trees

Grammar: S → AB, A → aaA | λ, B → Bb | λ (Prof. Busch - LSU)

Building a derivation tree step by step for the derivation S => AB => aaAB => aaABb => aaBb => aab: the root S gets children A and B (S → AB); expanding A → aaA adds leaves a, a and an interior node A; expanding B → Bb adds an interior node B and a leaf b; finally A → λ and B → λ add λ leaves.

Derivation Tree (parse tree): reading the leaves from left to right gives the yield aaλλb = aab.

Derivation Trees Derivation trees are trees whose nodes are labeled by symbols of a CFG. The root is labeled by S (the start symbol). Leaves are labeled by terminals, T ∪ {λ}. Interior nodes are labeled by variables, V. If a node has label A ∈ V and there is a production rule A → a1 a2 … an, then its children are labeled from left to right a1, a2, …, an. The string of symbols obtained by reading the leaves from left to right is called the yield.
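The yield can be computed directly from a tree representation. A minimal sketch (Python; trees as nested (label, children) tuples is an assumed encoding, not the lecture's notation):

```python
# A derivation tree as nested tuples: (label, children); leaves have [].
def tree_yield(node):
    """Concatenate leaf labels left to right, treating 'λ' as the empty string."""
    label, children = node
    if not children:
        return "" if label == "λ" else label
    return "".join(tree_yield(child) for child in children)

# Tree for S => AB => aaAB => aaABb => aaBb => aab with
# S -> AB, A -> aaA | λ, B -> Bb | λ
tree = ("S", [
    ("A", [("a", []), ("a", []), ("A", [("λ", [])])]),
    ("B", [("B", [("λ", [])]), ("b", [])]),
])
```

Reading the leaves of this tree left to right gives aaλλb, i.e. the yield aab.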

Partial Derivation Tree A partial derivation tree is a derivation tree in which some variables have not yet been expanded (so the leaves can be non-terminals or terminals). Example: S → AB, A → aaA | λ, B → Bb | λ Partial derivation tree: root S with children A and B

Partial Derivation Tree S => AB => aaAB Partial derivation tree: root S with children A and B, where A has children a, a, A. The yield aaAB is a sentential form.

Theorem: 1) If there is a derivation tree with root labeled A that yields w, then A =>*lm w (a leftmost derivation). 2) If A =>*lm w, then there is a derivation tree with root A that yields w.

Proof - part 1 Proof: by induction on the height of the tree. Basis: if the height is 1, then A → a1 a2 … an must be a production rule, and the tree has root A with children a1, …, an. Therefore A =>*lm a1 a2 … an. Inductive step: Assume the claim is true for trees of height < h, and prove it for height h. Since h > 1, the production used at the root has at least one variable on its right side.

Proof - part 1 Assume the root A has children X1, X2, …, Xn, and each Xi yields wi via a subtree of height at most h-1. Note that Xi might be a terminal; in that case Xi = wi and nothing needs to be done. If Xi is a non-terminal, by the induction hypothesis there is a leftmost derivation Xi =>*lm wi. Thus A =>lm X1 … Xn =>*lm w1 X2 … Xn =>*lm w1 w2 X3 … Xn =>*lm … =>*lm w1 … wn = w.

Proof - part 2 Proof: by induction on the length of the derivation. Basis: if A =>lm a1 a2 … an by a one-step derivation, then there is a derivation tree with root A and children a1, …, an. Inductive step: Assume the claim is true for derivations of < k steps, and let A =>*lm w be a derivation of k steps. Since k > 1, the first step must be A =>lm X1 X2 … Xn.

Proof - part 2 If Xi is a terminal, then Xi = wi and nothing needs to be done. If Xi is a non-terminal, then Xi =>*lm wi in at most k-1 steps, and by the induction hypothesis there is a derivation tree with root Xi and yield wi. So we create the derivation tree with root A, children X1, …, Xn, and those subtrees (yielding w1, …, wn) below them.

Ambiguity

Ambiguous Grammars Example: E → E + E | E * E | a | b The string a * b + b + a has two derivation trees: one with E → E + E at the root, grouping as (a * b) + (b + a), and one with E → E * E at the root, grouping as a * (b + b + a).

Example E → E + E | E * E | a | b Leftmost derivation 1: E => E + E => E * E + E => a * E + E => a * b + E => a * b + E + E => a * b + b + E => a * b + b + a Leftmost derivation 2: E => E * E => a * E => a * E + E => a * E + E + E => a * b + E + E => a * b + b + E => a * b + b + a Two distinct leftmost derivations of the same string.

Ambiguous Grammars A context-free grammar G is ambiguous if there exists some w ∈ L(G) that has at least two distinct derivation trees, or equivalently, two or more distinct leftmost (or rightmost) derivations.

Why do we care about ambiguity? Grammar for mathematical expressions: E → E + E | E * E | a The string a + a * a has two derivation trees: one grouping as a + (a * a) (E → E + E at the root), the other as (a + a) * a (E → E * E at the root).

Why do we care about ambiguity? Compute the expression's result using the tree: 3 + 3 * 3 = ? The tree with + at the root evaluates 3 + (3 * 3) = 3 + 9 = 12. The tree with * at the root evaluates (3 + 3) * 3 = 6 * 3 = 18.
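The two evaluations can be reproduced by representing each derivation tree explicitly and evaluating it (Python; the tuple encoding of trees is an assumption of this sketch):

```python
# A tree is either an integer leaf or a tuple (op, left_subtree, right_subtree).
def evaluate(tree):
    """Evaluate an expression tree bottom-up, as a compiler would."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r

# The two derivation trees for 3 + 3 * 3 under E -> E + E | E * E | a:
tree_plus_on_top = ("+", 3, ("*", 3, 3))   # 3 + (3 * 3)
tree_star_on_top = ("*", ("+", 3, 3), 3)   # (3 + 3) * 3
```

The same token string yields 12 from one tree and 18 from the other, which is exactly why ambiguity is unacceptable in a programming-language grammar.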

Why do we care about ambiguity? John saw the boy with a telescope.

Ambiguity In general, ambiguity is bad for programming languages and we want to remove it. Sometimes it is possible to find a non-ambiguous grammar for the same language, but in general this is difficult to achieve.

Non-ambiguous Grammar Example Can we rewrite the previous grammar so that it is not ambiguous anymore? Equivalent non-ambiguous grammar (generates the same language): E → E + T | T, T → T * R | R, R → a Every string w in L(G) has a unique derivation tree. For a + a * a, the unique tree has E → E + T at the root: the left subtree derives a, and the right subtree T → T * R derives a * a.
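Because this grammar is unambiguous, a deterministic parser is easy to sketch. The following recursive-descent parser (Python; names are illustrative, and the left-recursive rules E → E + T and T → T * R are handled by iteration, a standard transformation) builds the unique tree for a token list:

```python
def parse(tokens):
    """Parse with E -> E+T | T,  T -> T*R | R,  R -> a.
    Returns a nested tuple; left recursion is realized as a loop,
    which keeps + and * left-associative."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok!r} at position {pos}"
        pos += 1

    def parse_R():                     # R -> a
        expect("a")
        return "a"

    def parse_T():                     # T -> T * R | R
        node = parse_R()
        while peek() == "*":
            expect("*")
            node = ("*", node, parse_R())
        return node

    def parse_E():                     # E -> E + T | T
        node = parse_T()
        while peek() == "+":
            expect("+")
            node = ("+", node, parse_T())
        return node

    tree = parse_E()
    assert pos == len(tokens), "trailing input"
    return tree
```

For a + a * a the parser returns ("+", "a", ("*", "a", "a")): multiplication binds tighter, and there is only one possible tree.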

Ambiguous Grammars If L is a context-free language for which there exists an unambiguous grammar, then L is said to be unambiguous. If every grammar that generates L is ambiguous, then the language is called inherently ambiguous. In general it is very difficult to show whether or not a language is inherently ambiguous.

Parsing

Compiler: Program file (input string) → Lexical analyzer → Parser → machine code (output)

Lexical Analyzer Recognizes the lexemes of the input program file: keywords (if, then, else, while, …), integers, identifiers (variables), etc. Removes white space and comments.

Lexical Analyzer Examples: letter → A | B | … | Z | a | b | … | z digit → 0 | 1 | … | 9 digit: [0-9] letter: [a-zA-Z] num: digit+ (. digit+)? (E (+|-)? digit+)? identifier: letter (letter | digit)*
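These token definitions map naturally onto regular-expression matching. A minimal tokenizer sketch (Python re; the token names mirror the slide, while the operator set is an assumption, and keywords are not separated from identifiers here):

```python
import re

# num: digit+ (. digit+)? (E(+|-)? digit+)?   identifier: letter (letter|digit)*
TOKEN_RE = re.compile(r"""
    (?P<num>\d+(?:\.\d+)?(?:E[+-]?\d+)?)     # numeric literal
  | (?P<identifier>[a-zA-Z][a-zA-Z0-9]*)     # identifier
  | (?P<op>[+\-*/:=()])                      # a few operators/punctuation
  | (?P<ws>\s+)                              # white space (discarded)
""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"bad character at position {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For instance, tokenize("count := 10") produces an identifier, two operator tokens, and a num token, with the space removed.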

Design of a Lexical Analyzer Generator Translate the regular expressions to an NFA, then either simulate the NFA to recognize tokens, or (optionally) translate the NFA to an efficient DFA and simulate the DFA to recognize tokens.

Parser Parsing = the process of determining whether a string of tokens can be generated by a grammar. The parser knows the grammar of the programming language to be compiled, constructs the derivation (and derivation tree) for the input program file (input string), and converts the derivation to machine code.

Example Parser stmt → id := expr | if expr then stmt | if expr then stmt else stmt | while expr do stmt | begin opt_stmts end opt_stmts → stmt ; opt_stmts | ε

Parser Finds the derivation of a particular input. Input: 10 * 5 + 2 Grammar: E → E + T | T, T → T * R | R, R → INT Derivation: E => E + T => T + T => T * R + T => R * R + T => 10 * R + T => 10 * 5 + T => 10 * 5 + R => 10 * 5 + 2

Derivation: E => E + T => T + T => T * R + T => R * R + T => 10 * R + T => 10 * 5 + T => 10 * 5 + R => 10 * 5 + 2 The derivation tree has E → E + T at the root: the left subtree (T * R) derives 10 * 5 and the right subtree derives 2. Derivation trees are used to build machine code, e.g.: mult a, 10, 5 add b, a, 2

Parsing Parsing a string w ∈ L(G) means finding a sequence of productions by which w is derived, or determining that w ∉ L(G). Example: find a derivation of the string aabb in the grammar S → SS | aSb | bSa | λ. aabb → Parser (S → SS, S → aSb, S → bSa, S → λ) → derivation?

Exhaustive search / Brute force S → SS | aSb | bSa | λ, w = aabb All possible derivations of length 1: S => SS, S => aSb, S => bSa, S => λ (the last two cannot possibly produce aabb)

Exhaustive search / Brute force S → SS | aSb | bSa | λ, w = aabb Round 1: S => SS, S => aSb, S => bSa, S => λ Round 2: S => SS => SSS, S => SS => aSbS, S => SS => bSaS, S => SS => S, S => aSb => aSSb, S => aSb => aaSbb, S => aSb => abSab, S => aSb => ab

Exhaustive search / Brute force S → SS | aSb | bSa | λ, w = aabb Round 3: explore all possible derivations further. A possible derivation is found: S => aSb => aaSbb => aabb

Exhaustive search / Brute force This approach is called exhaustive search parsing or brute force parsing, and is a form of top-down parsing. Can we use this approach as an algorithm for determining whether or not w ∈ L(G)?

Theorem: Suppose a CFG has no rules of the form A → λ or A → B. Then the exhaustive search parsing method can be made into an algorithm that parses any w ∈ Σ*. Proof: In each derivation step, either the length of the sentential form or the number of terminals in it increases, and neither can exceed |w|. Therefore the maximum length of a derivation to consider is 2|w|. If some derivation of length at most 2|w| produces w, you have the parse; if not, w ∉ L(G).
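The theorem's bound makes the search finite, so exhaustive search parsing can be written down directly. A sketch (Python; names are illustrative, and to satisfy the theorem's conditions the λ-rule of the earlier example grammar is replaced here by S → ab | ba, an assumed λ-free variant):

```python
from collections import deque

def exhaustive_parse(w, rules, start="S", variables=frozenset("S")):
    """Breadth-first exhaustive search parsing.  Assumes no A -> λ and no
    A -> B rules, so a derivation of w has at most 2|w| steps and sentential
    forms never need to exceed len(w).  Returns the derivation or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, steps = queue.popleft()
        if form == w:
            return steps
        if len(steps) - 1 >= 2 * len(w):
            continue                       # derivation already too long
        i = next((k for k, c in enumerate(form) if c in variables), None)
        if i is None:
            continue                       # terminal string different from w
        for lhs, rhs in rules:             # expand the leftmost variable
            if form[i] != lhs:
                continue
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= len(w) and new not in seen:
                seen.add(new)
                queue.append((new, steps + [new]))
    return None

RULES = [("S", "SS"), ("S", "aSb"), ("S", "bSa"), ("S", "ab"), ("S", "ba")]
```

On w = aabb this finds the derivation S => aSb => aabb, and it correctly rejects strings with unequal numbers of a's and b's.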

Parsing algorithm The exhaustive search algorithm is not very efficient: the number of derivations to examine may grow exponentially with the length of the string. For general context-free grammars there exist parsing algorithms (such as CYK) that parse a string w in time O(|w|^3).

Faster Parsers There exist faster parsing algorithms for specialized grammars. A context-free grammar is said to be a simple grammar (s-grammar) if all its productions are of the form A → ax, with A ∈ V, a ∈ T, x ∈ V*, and any pair (A, a) occurs at most once in P.

Faster Parsers S-grammar example: S → aS | bSS | c Running exhaustive search on this grammar, at each step there is only one choice to follow. w = abcc: S => aS => abSS => abcS => abcc Total steps for parsing a string w: |w|
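The single-choice property makes an s-grammar parser a simple loop: each input symbol, together with the leftmost variable, selects exactly one production. A sketch (Python; function and table names are illustrative):

```python
def s_grammar_parse(w, rules, start="S"):
    """Deterministic parsing for an s-grammar.  Every production is A -> ax
    with (A, a) unique, so each input symbol forces one production.
    Returns the number of derivation steps, or None if w is rejected."""
    stack = [start]            # variables still to expand, leftmost on top
    steps = 0
    for ch in w:
        if not stack:
            return None        # input left over but nothing to derive
        key = (stack.pop(), ch)
        if key not in rules:
            return None        # no production for this (variable, symbol) pair
        stack.extend(reversed(rules[key]))   # push remaining variables
        steps += 1
    return steps if not stack else None      # all variables must be consumed

# S -> aS | bSS | c, as a table keyed by (variable, first terminal):
RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
```

For w = abcc the loop takes exactly 4 = |w| steps, mirroring the derivation S => aS => abSS => abcS => abcc.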