

Languages

A language is a set (usually infinite) of strings, also known as sentences.
Each string consists of a sequence of symbols taken from some alphabet.
An alphabet, $V$, is a finite set of symbols, e.g. $V = \{a, b, c, \ldots, z\}$.
A string, $\omega$, in a language is a finite sequence of symbols from an alphabet: $\omega = v_1 v_2 \ldots v_k$, $v_i \in V$.
The length of a string $\omega$, written $|\omega|$, is the number of symbols in it.
The empty string is represented as $\varepsilon$, and $|\varepsilon| = 0$.
The set of all strings $\omega$ over an alphabet $V$ may be expressed as:
$\omega^k = \{\omega \mid |\omega| = k\}$
$\omega^0 = \{\varepsilon\}$
$\omega^* = \bigcup_{k \ge 0} \omega^k$

Suppose we have an alphabet $V$. Then we can write:
$V^0 = \{\varepsilon\}$
$V^1 = V^0 \cdot V$
$V^{k+1} = V^k \cdot V$
$V^* = \bigcup_{k \ge 0} V^k$
$V^+ = \bigcup_{k \ge 1} V^k$
$V^*$ is known as the closure of $V$ or the Kleene closure of $V$.

1992-2004 Kevin J. Maciunas, Charles Lakos

Grammars

If a language is an infinite set of strings, how can we define it? We can't just list all the strings.
One possibility is to have a finite set of rules for generating the strings. This is called a grammar.
Grammars have a number of syntactic categories:
Nonterminals: a set of symbols, $V_N$.
Terminals: the "words" of the language (the alphabet), which cannot be broken down any further, $V_T$.
Productions: the rules for the formation of sentences, $P$.
Start symbol: the starting point, $S$, with $S \in V_N$.
Together these four things represent a grammar. A grammar, $G$, is a quadruple $(V_N, V_T, P, S)$ where $V_N \cap V_T = \emptyset$, i.e. non-terminals and terminals are disjoint.
Note that the vocabulary is $V_N \cup V_T = V$ (where $V_T$ is the alphabet).

An Example Grammar

A simple example grammar is:
$G = (\{S, B\}, \{a, b, c\}, \{S \to aB, S \to bB, S \to cB, B \to a, B \to b, B \to c\}, S)$
In order to derive strings in $G$, we start with the goal symbol:
$S \Rightarrow bB \Rightarrow bc$
Here $bB$ is a sentential form and $bc$ is a sentence in the language.
The sequence of steps is called a derivation. The result is that we have a derived string in the language.
"$\Rightarrow$" is pronounced "directly derives" and represents one step in the derivation.
As long as there are nonterminals in the working string, the process does not halt.
Sometimes it is useful to abbreviate the derivation by using the Kleene star: $S \Rightarrow^* bc$
The language generated by $G$ is defined formally as:
$L(G) = \{\omega \mid \omega \in V_T^* \text{ and } S \Rightarrow_G^* \omega\}$

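The definition of L(G) for the example grammar can be checked mechanically. The following is a minimal sketch, assuming the productions read S → aB, S → bB, S → cB, B → a, B → b, B → c and that uppercase letters stand for nonterminals; the names `direct_derivations` and `language` are illustrative and not part of the slides.

```python
from collections import deque

# Example grammar G; uppercase letters are assumed to be nonterminals.
PRODUCTIONS = [("S", "aB"), ("S", "bB"), ("S", "cB"),
               ("B", "a"), ("B", "b"), ("B", "c")]
NONTERMINALS = {"S", "B"}


def direct_derivations(form):
    """Yield every string that `form` directly derives (one step)."""
    for lhs, rhs in PRODUCTIONS:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)


def language(start="S", max_steps=10):
    """Collect the sentences derivable from `start` in at most max_steps steps."""
    sentences, seen, queue = set(), {start}, deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        if not any(sym in NONTERMINALS for sym in form):
            sentences.add(form)  # only terminals remain, so this is a sentence
            continue
        if steps == max_steps:
            continue
        for nxt in direct_derivations(form):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return sentences


print(sorted(language()))
```

Under these assumptions the search stops after two steps, because every sentential form of G contains at most one nonterminal and each B production removes it.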
Derivations

Recall that $\alpha_1 \Rightarrow_G^+ \alpha_m$ if and only if $\alpha_1 \Rightarrow_G \alpha_2 \Rightarrow_G \ldots \Rightarrow_G \alpha_m$, and
$L(G) = \{\omega \mid \omega \in V_T^* \text{ and } S \Rightarrow_G^* \omega\}$
Any $\omega \in V^*$ such that $S \Rightarrow_G^* \omega$ is known as a sentential form. It is a sentence when it contains only terminals.
For $\omega \in L(G)$, the sequence of sentential forms from $S$ to $\omega$ is known as the derivation of $\omega$.
Note that we've seen this already: these derivations are paths in our Parse Tree or Abstract Syntax Tree.

An Example

Grammar G2.1+:
<E> ::= <T> | <E> + <T>
<T> ::= <F> | <T> <F>
<F> ::= <X> | ( <E> )
<X> ::= a | b | c
Now $G = (V_N, V_T, P, E)$ with $V_N = \{E, T, F, X\}$, V_T = {a, b, c, +, .., (, )} and P = the productions above (there are 9 of them).
Derive a+b from E.
Leftmost derivation:
$E \Rightarrow E + T \Rightarrow T + T \Rightarrow F + T \Rightarrow X + T \Rightarrow a + T \Rightarrow a + F \Rightarrow a + X \Rightarrow a + b$
Rightmost derivation:
$E \Rightarrow E + T \Rightarrow E + F \Rightarrow E + X \Rightarrow E + b \Rightarrow T + b \Rightarrow F + b \Rightarrow X + b \Rightarrow a + b$

Drawing Derivation Trees

(1) Each node in the tree is labelled with a symbol in $V$.
(2) The root node is always labelled with the start symbol of the language.
(3) Leaf nodes have a label $\in V_T$ if the derivation tree is complete.
(4) If a node is not a leaf node, then its label $\in V_N$.
(5) If a node $n$ labelled $A$ has descendants $n_1, n_2, \ldots, n_m$ labelled $A_1, A_2, \ldots, A_m$, then there must be a production $A \to A_1 A_2 \ldots A_m$ in the production rules.

Example

[Figure: derivation tree for a + b. The root is the start symbol E, interior nodes carry labels from $V_N$ (E, T, F, X), and the leaves a, +, b carry labels from $V_T$.]
We can only draw derivation trees for type 2 and type 3 Chomsky grammars. Types 0 and 1 require a directed graph to express them. See handout.
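The two derivations of a + b can be replayed step by step. This is a sketch under stated assumptions: the function `derive` and the constants `GRAMMAR`, `LEFTMOST` and `RIGHTMOST` are illustrative names, and the grammar table holds only the productions needed for a + b (the second <T> alternative is left out).

```python
NONTERMINALS = {"E", "T", "F", "X"}

# Productions of G2.1+ needed for a + b (second <T> alternative omitted).
GRAMMAR = {
    "E": [["T"], ["E", "+", "T"]],
    "T": [["F"]],
    "F": [["X"], ["(", "E", ")"]],
    "X": [["a"], ["b"], ["c"]],
}


def derive(start, choices, leftmost=True):
    """Replay a derivation.

    Each entry of `choices` is the right-hand side used to expand the
    leftmost (or rightmost) nonterminal of the current sentential form.
    """
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in GRAMMAR[form[i]]:
            raise ValueError(f"no production {form[i]} -> {' '.join(rhs)}")
        form[i:i + 1] = rhs  # replace the nonterminal by the right-hand side
        steps.append(" ".join(form))
    return steps


LEFTMOST = [["E", "+", "T"], ["T"], ["F"], ["X"], ["a"], ["F"], ["X"], ["b"]]
RIGHTMOST = [["E", "+", "T"], ["F"], ["X"], ["b"], ["T"], ["F"], ["X"], ["a"]]

print(" => ".join(derive("E", LEFTMOST)))
print(" => ".join(derive("E", RIGHTMOST, leftmost=False)))
```

Both runs end in the same sentence and use the same eight productions; only the order in which the nonterminals are expanded differs.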

Derivations

Two systematic ways of ordering a derivation are leftmost derivation and rightmost derivation.
Leftmost Derivation: we always expand the leftmost nonterminal in the sentential form. Leftmost derivation is often written as $\alpha \Rightarrow_{lm} \beta$.
Rightmost Derivation: we always expand the rightmost nonterminal in the sentential form. Rightmost derivation is often written as $\alpha \Rightarrow_{rm} \beta$.

Ambiguity

A context free (type 2) grammar is said to be ambiguous if there are one or more sentences $\omega \in L(G)$ with two or more distinct leftmost (rightmost) derivations.
An unambiguous grammar has only one parse tree for each sentence in the language.
Ambiguity is bad. Consider the Pascal dangling else problem:
$S \to \text{if } B \text{ then } S$
$S \to \text{if } B \text{ then } S \text{ else } S$
[Figure: a nested if statement pretty-printed two ways, with the else aligned under either if.]

The Two Trees that result

[Figure: the two parse trees. In the first, the root uses $S \to \text{if } B \text{ then } S \text{ else } S$ and the inner statement uses $S \to \text{if } B \text{ then } S$, so the else belongs to the outer if. In the second, the root uses $S \to \text{if } B \text{ then } S$ and the inner statement uses $S \to \text{if } B \text{ then } S \text{ else } S$, so the else belongs to the inner if.]

Ambiguity

The Pascal dangling else problem was resolved by defining the else to bind to the most recent if.
Note how the two different pretty-print formats both look plausible.
Note how the alternatives have different semantics (assuming that semantics is based on syntax). Just what is wanted when you write code to control your nuclear power plant!
Ambiguous grammars cannot be parsed using a recursive descent analyser.
We need to remove any ambiguity from the grammar before we start...
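One way to see the ambiguity concretely is to count parse trees. The sketch below is an assumption-laden illustration, not a parsing method from the slides: the terminals `s` (a basic statement) and `b` (a condition) are hypothetical additions so that derivations can terminate, and `count_parse_trees` is an illustrative name. It assumes a grammar with no empty right-hand sides and no unit cycles.

```python
from functools import lru_cache

# Dangling-else grammar. "s" and "b" are hypothetical terminals added
# here so that S and B can derive something.
GRAMMAR = {
    "S": [("if", "B", "then", "S"),
          ("if", "B", "then", "S", "else", "S"),
          ("s",)],
    "B": [("b",)],
}


def count_parse_trees(tokens, start="S"):
    """Count the distinct parse trees of `tokens` rooted at `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        # Number of ways `sym` derives exactly tokens[i:j].
        if sym not in GRAMMAR:  # terminal symbol
            return 1 if j == i + 1 and tokens[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        # Every symbol derives at least one token, so leave room for `rest`.
        return sum(count_symbol(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count_symbol(start, 0, len(tokens))


print(count_parse_trees("if b then if b then s else s".split()))
```

A count above one for a sentence means the grammar is ambiguous for that sentence, which is the case for the nested if with a single else.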

Different kinds of Grammars

Grammars have a set of productions. The most general form is $\alpha \to \beta$, where $\alpha$ is a string including a non-terminal and $\beta$ is any string (of terminals and non-terminals).
Different kinds of grammars constrain the productions:
TYPE 1: $|\alpha| \le |\beta|$
TYPE 2: $\alpha$ consists only of a non-terminal
TYPE 3: $\alpha$ consists only of a non-terminal and $\beta$ consists only of a terminal or a terminal followed by a non-terminal

TYPE3 Grammars

TYPE3 grammars have productions $\alpha \to \beta$ where $\alpha$ consists only of a non-terminal and $\beta$ consists only of a terminal or a terminal followed by a non-terminal.
TYPE3 grammars are used to define terminal symbols of a programming language, e.g. numbers, identifiers.
TYPE3 grammars cannot define strings consisting of matching substrings of arbitrary length, e.g. $a^n b^n$. Each value of $n$ needs its own non-terminals:
$S \to aB_1$, $B_1 \to b$
$S \to aA_1$, $A_1 \to aB_2$, $B_2 \to bB_1$
$S \to aA_3$, $A_3 \to aA_2$, $A_2 \to aB_3$, $B_3 \to bB_2$
Approximately $2n$ non-terminals are needed for strings $a^n b^n$, so no finite TYPE3 grammar covers every $n$.

TYPE2 Grammars

TYPE2 grammars have productions $\alpha \to \beta$ where $\alpha$ consists only of a non-terminal and $\beta$ consists of an arbitrary string (of terminals and non-terminals).
TYPE2 grammars are used to define the syntax of a programming language, e.g. blocks, statements, expressions.
TYPE2 grammars can define strings consisting of matching substrings of arbitrary length, e.g. $a^n b^n$:
$S \to ab$
$S \to aXb$
$X \to ab$
$X \to aXb$
They cannot define strings $a^n b^n c^n$.
TYPE2 grammars cannot handle context constraints, such as definition and use of identifiers.

TYPE1 Grammars

TYPE1 grammars have productions $\alpha \to \beta$ where $\alpha$ includes a non-terminal and $|\alpha| \le |\beta|$.
TYPE1 grammars can define strings consisting of several matching substrings, e.g. $a^n b^n c^n$:
$S \to aBC$
$S \to aXBC$
$X \to aBC$
$X \to aXBC$
$CB \to BC$ (reordering)
$aB \to ab$, $bC \to bc$ (context sensitive)
$bB \to bb$, $cC \to cc$ (context sensitive)
TYPE1 grammars can handle context constraints, such as definition and use of identifiers.
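The restrictions that separate the grammar types can be written as a small classifier. This is a minimal sketch assuming single-character symbols, with uppercase letters as non-terminals and lowercase letters as terminals; `production_type` and `grammar_type` are illustrative names, and the rules follow the slide definitions only.

```python
def production_type(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) admitting lhs -> rhs.

    Assumes single-character symbols: uppercase letters are non-terminals,
    lowercase letters are terminals.
    """
    if not any(c.isupper() for c in lhs):
        raise ValueError("left-hand side must include a non-terminal")
    if len(lhs) == 1:  # the left-hand side is a single non-terminal
        if len(rhs) == 1 and rhs.islower():
            return 3  # a terminal
        if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
            return 3  # a terminal followed by a non-terminal
        return 2
    # Longer left-hand side: type 1 only if the production does not shrink.
    return 1 if len(lhs) <= len(rhs) else 0


def grammar_type(productions):
    """A grammar is only as restricted as its least restricted production."""
    return min(production_type(lhs, rhs) for lhs, rhs in productions)


TYPE2_EXAMPLE = [("S", "ab"), ("S", "aXb"), ("X", "ab"), ("X", "aXb")]
TYPE1_EXAMPLE = [("S", "aBC"), ("S", "aXBC"), ("X", "aBC"), ("X", "aXBC"),
                 ("CB", "BC"), ("aB", "ab"), ("bC", "bc"),
                 ("bB", "bb"), ("cC", "cc")]

print(grammar_type(TYPE2_EXAMPLE), grammar_type(TYPE1_EXAMPLE))
```

Taking the minimum over all productions reflects that a single less restricted production, such as CB → BC, is enough to push the whole grammar down to a less restricted type.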

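TYPE1 Grammars

$S \to aBC$
$S \to aXBC$
$X \to aBC$
$X \to aXBC$
$CB \to BC$ (reordering)
$aB \to ab$, $bC \to bc$ (context sensitive)
$bB \to bb$, $cC \to cc$ (context sensitive)

A derivation that reaches a dead end, so no string is generated:
$S \Rightarrow aXBC \Rightarrow aaBCBC \Rightarrow aabCBC \Rightarrow aabcBC$

A derivation that produces a string of the language:
$S \Rightarrow aXBC \Rightarrow aaBCBC \Rightarrow aaBBCC \Rightarrow aabBCC \Rightarrow aabbCC \Rightarrow aabbcC \Rightarrow aabbcc$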
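Both outcomes, dead ends and sentences, show up when every derivation is explored. The following sketch assumes the productions read S → aBC, S → aXBC, X → aBC, X → aXBC, CB → BC, aB → ab, bC → bc, bB → bb, cC → cc, with uppercase letters as non-terminals; `successors` and `explore` are illustrative names. Because no production shortens the string, bounding the length keeps the search finite.

```python
from collections import deque

# Assumed reading of the TYPE1 grammar for a^n b^n c^n.
PRODUCTIONS = [("S", "aBC"), ("S", "aXBC"), ("X", "aBC"), ("X", "aXBC"),
               ("CB", "BC"), ("aB", "ab"), ("bC", "bc"),
               ("bB", "bb"), ("cC", "cc")]


def successors(form):
    """Yield every string that `form` directly derives (one step)."""
    for lhs, rhs in PRODUCTIONS:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)


def explore(max_len, start="S"):
    """Return (sentences, dead_ends) over forms of length <= max_len."""
    sentences, dead_ends = set(), set()
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form.islower():  # only terminals remain
            sentences.add(form)
            continue
        nexts = set(successors(form))
        if not nexts:  # non-terminals remain but no production applies
            dead_ends.add(form)
        for nxt in nexts:
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sentences, dead_ends


sentences, dead_ends = explore(6)
print(sorted(sentences), sorted(dead_ends))
```

A form such as aabcBC counts as a dead end because it still contains non-terminals but no left-hand side matches any part of it; the search simply abandons that branch and continues with the others.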