(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.


(NB. Pages 22-40 are intended for those who need repeated study in formal languages)

Formal languages
Basic concepts for symbols, strings and languages:

Alphabet
A finite set of symbols.
  Σ_b = { 0, 1 }                      binary alphabet
  Σ_s = { A, B, C, ..., Z, Å, Ä, Ö }  Swedish characters
  Σ_r = { WHILE, IF, BEGIN, ... }     reserved words

String
A finite sequence of symbols from an alphabet.
  10011           from Σ_b
  KALLE           from Σ_s
  WHILE DO BEGIN  from Σ_r

Length of a string
Number of symbols in the string.
  x arbitrary string, |x| length of the string x
  |10011| = 5  according to Σ_b
  |WHILE| = 5  according to Σ_s
  |WHILE| = 1  according to Σ_r

Empty string
The empty string is denoted ε, |ε| = 0

Concatenation
Two strings x and y are joined together: x • y = xy
  x = AB, y = CDE produce x • y = ABCDE
  |xy| = |x| + |y|
  xy ≠ yx (not commutative)
  εx = xε = x

String exponentiation
  x^0 = ε
  x^1 = x
  x^2 = xx
  x^n = x • x^(n-1), n >= 1

Revision material: Formal languages Page 22

Substrings: Prefix, suffix.
  x = abc
  Prefix: Substring at the beginning.
    Prefix of x: abc (improper as the prefix = x), ab, a, ε
  Suffix: Substring at the end.
    Suffix of x: abc (improper as the suffix = x), bc, c, ε

Languages
A finite or infinite set of strings which can be constructed from a special alphabet. Alternatively: a subset of all the strings which can be constructed from an alphabet.
  Σ* denotes the set of all strings which can be constructed from the alphabet.
  * = closure, Kleene closure.
  + = positive closure.
  e.g. Σ = {0,1}
    Σ* = {ε, 0, 1, 00, 01, ..., 111, 101, ...}
    Σ+ = Σ* − {ε} = {0, 1, 00, 01, ...}

  Σ = {0,1}
    L_1 = {00, 01, 10, 11}                all strings of length 2
    L_2 = {1, 01, 11, 001, ..., 111, ...} all strings which finish on 1
    L_3 = ∅                               all strings of length 1 which finish on 01

Operations on languages

Concatenation
L, M are languages. ∅ = the empty language. NB! ∅ ≠ {ε}.
  L • M = LM = {xy | x ∈ L and y ∈ M}
  L{ε} = {ε}L = L
  L∅ = ∅L = ∅
  L = {ab, cd}, M = {uv, yz} gives us LM = {abuv, abyz, cduv, cdyz}
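The string and language operations above can be tried out directly on Python sets. This is only a minimal sketch: the function names `prefixes`, `suffixes`, `concat` and `strings_up_to` are made up for illustration, and Σ* is infinite, so it is only enumerated up to a chosen length.

```python
from itertools import product


def prefixes(x):
    """All prefixes of x, from the improper prefix x down to the empty string."""
    return [x[:i] for i in range(len(x), -1, -1)]


def suffixes(x):
    """All suffixes of x, from the improper suffix x down to the empty string."""
    return [x[i:] for i in range(len(x) + 1)]


def concat(L, M):
    """Language concatenation LM = {xy | x in L and y in M}."""
    return {x + y for x in L for y in M}


def strings_up_to(alphabet, max_len):
    """The finite part of Sigma* containing all strings of length <= max_len."""
    return {
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(sorted(alphabet), repeat=n)
    }


L = {"ab", "cd"}
M = {"uv", "yz"}
print(sorted(concat(L, M)))        # the four strings abuv, abyz, cduv, cdyz
print(concat(L, {""}) == L)        # {ε} is the identity for concatenation
print(concat(L, set()) == set())   # the empty language absorbs everything
print(prefixes("abc"), suffixes("abc"))
```

Here the empty string ε is represented by `""` and the empty language ∅ by `set()`, which makes the difference between ∅ and {ε} visible: `set()` has no elements, while `{""}` has one.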

Exponents of languages
  L^0 = {ε}
  L^1 = L
  L^2 = L • L
  L^n = L • L^(n-1), n >= 1

Union of languages
L, M are languages.
  L ∪ M = {x | x ∈ L or x ∈ M}
  L = {ab, cd}, M = {uv, yz} gives us L ∪ M = {ab, cd, uv, yz}

Closure
  L* = L^0 ∪ L^1 ∪ ... ∪ L^∞

Positive closure
  L+ = L^1 ∪ L^2 ∪ ... ∪ L^∞
  LL* = L* − {ε}, if ε ∉ L
  L* = {ε} ∪ L+

  A = {a,b}
  A* = {ε, a, b, aa, ab, ba, bb, ...} = All possible sequences of a and b.
  A language over A is always a subset of A*.

Regular expressions
Are used to describe simple languages, e.g. basic symbols, tokens.
  identifier = letter (letter | digit)*
Regular expressions over an alphabet Σ denote a language (regular set).

Rules for constructing regular expressions
Σ is an alphabet, the regular expression r describes the language L_r, the regular expression s corresponds to the language L_s, etc.

  Regular expression r        Language L_r
  ε                           {ε}
  a, a ∈ Σ                    {a}
  union: (s) | (t)            L_s ∪ L_t
  concatenation: (s).(t)      L_s.L_t
  repetition: (s)*            L_s*
  repetition: (s)+            L_s+

Each symbol a in the alphabet Σ is a regular expression which denotes {a}.
  * = repetition, zero or more times.
  + = repetition, one or more times.

Priorities
  Highest   *, +
            . (concatenation)
  Lowest    |

Σ = {a,b}
  1. r = a         L_r = {a}
  2. r = a*        L_r = {ε, a, aa, aaa, ...} = {a}*
  3. r = a | b     L_r = {a, b} = {a} ∪ {b}
  4. r = (a | b)*  L_r = {a,b}* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}
  5. r = (a*b*)*   L_r = {a,b}* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}
  6. r = a | ba*   L_r = {a, b, ba, baa, baaa, ...} = {a or ba^i | i >= 0}

NB! {a^n b^n | n >= 0} can not be described with regular expressions.
  r = a*b* gives us L_r = {a^i b^j | i,j >= 0}, which does not work.
  r = (ab)* gives us L_r = {(ab)^i | i >= 0} = {ε, ab, abab, ...}, which does not work.
Regular expressions can not count (have no memory).

Context-free grammars
an English sentence
  Sentence:     Old Harry jogs
  Constituent:  subject predicate
  Word class:   adjective noun verb

  <sentence>
    <subject>
      <adjective>  Old
      <noun>       Harry
    <predicate>
      <verb>       jogs

A grammar is used to describe the syntax.

BNF (Backus-Naur form) 1960 (metalanguage to describe languages):
  <sentence>  → <subject> <predicate>
  <subject>   → <adjective> <noun>
  <predicate> → <verb>
  <adjective> → old | big | strong | ...
  <noun>      → Harry | brother | ...
  <verb>      → jogs | snores | sleeps | ...
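To make the idea of a grammar as a description of syntax concrete, the BNF rules above can be stored as plain data and expanded mechanically. The sketch below is an illustration only: the dictionary `GRAMMAR` and the function `expand` are hypothetical names, the "..." alternatives are left out (only the words actually listed are used), and the exhaustive expansion works only because this grammar is not recursive.

```python
from itertools import product

# Each nonterminal maps to a list of alternatives; each alternative is a
# list of symbols. Anything that is not a key is treated as a terminal.
GRAMMAR = {
    "<sentence>": [["<subject>", "<predicate>"]],
    "<subject>": [["<adjective>", "<noun>"]],
    "<predicate>": [["<verb>"]],
    "<adjective>": [["old"], ["big"], ["strong"]],
    "<noun>": [["Harry"], ["brother"]],
    "<verb>": [["jogs"], ["snores"], ["sleeps"]],
}


def expand(symbol):
    """Return every terminal word sequence derivable from symbol.

    Only safe for non-recursive grammars; a recursive rule would make
    this function call itself forever.
    """
    if symbol not in GRAMMAR:  # terminal symbol
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right side.
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("<sentence>")]
print(len(sentences))   # 3 adjectives * 2 nouns * 3 verbs
print(sentences[0])
```

With the listed words this yields 18 sentences, one for each combination of adjective, noun and verb.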

<sentence> is a start symbol.
Symbols to the left of → are called nonterminals.
Symbols not surrounded by < > are terminals.
Each line is a production.

  Symbol    Meaning
  <...>     syntactic classes
  →         "consists of", "is" (also ::= )
  |         "or"

Definition: A CFG (Context-free grammar) is a quadruple (4 parts):
  G = < N, Σ, P, S > where
  N : nonterminals.
  Σ : terminal symbols.
  P : rules, productions of the form A → α where A ∈ N and α ∈ (N ∪ Σ)*
  S : the start symbol, a nonterminal, S ∈ N.
(Sometimes V = N ∪ Σ is used, called the vocabulary.)

The grammar can be used to produce or derive sentences.
  <sentence> ⇒* Old Harry jogs
where <sentence> is the start symbol and ⇒* means derivation in zero or more steps.

e.g. Derivation
  <sentence> ⇒ <subject> <predicate>
             ⇒ <adjective> <noun> <predicate>
             ⇒ Old <noun> <predicate>
             ⇒ Old Harry <predicate>
             ⇒ Old Harry <verb>
             ⇒ Old Harry jogs

  1. <number> → <no>
  2. <no>     → <no> <digit>
  3.          | <digit>
  4. <digit>  → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  N = { <number>, <no>, <digit> }
  Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
  S = <number>

A grammar G denotes the language L(G); L(G) is generated by the grammar G.
  L(<number>) = { w | w ∈ digit+ }

Conventions
  α, β, γ ∈ V*             string of terminals and nonterminals
  A, B, C ∈ N              nonterminals
  a, b, c ∈ Σ              terminal symbols
  u, v, w, x, y, z ∈ Σ*    string of terminals

Derivation
  α ⇒ β (pronounced "α derives β")
  Formally: γAθ ⇒ γδθ if we have a production A → δ

Sentential form
A string α is a sentential form in G if S ⇒* α and α ∈ V* (string of terminals and nonterminals).
  <no> <digit> is a sentential form in G(<number>).

Sentence
w is a sentence in G if S ⇒+ w and w ∈ Σ*.
  12 is a sentence in G(<number>).
  <number> ⇒_rm <no> ⇒_rm <no> <digit> ⇒_rm <no> 2 ⇒_rm <digit> 2 ⇒_rm 12

  <number> ⇒ <no>    direct derivation.
  <number> ⇒* 12     several derivations (zero or more).
  <number> ⇒+ 12     several derivations (one or more).

Given G = < N, Σ, P, S > the language generated by G can be defined as L(G):
  L(G) = { w | S ⇒+ w and w ∈ Σ* }

Left derivation
⇒_lm means that we replace the leftmost nonterminal by some appropriate right side.

Left sentential form
A sentential form which is part of a leftmost derivation.

Right derivation (canonical derivation)
⇒_rm means that we replace the rightmost nonterminal by some appropriate right side.

Right sentential form
A sentential form which is part of a rightmost derivation.
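The difference between leftmost and rightmost derivation can be seen by replaying both on the same parse tree of the number grammar. The following is a rough sketch under stated assumptions: the tuple encoding of tree nodes and the names `number_tree`, `render` and `derivation` are invented for this example and are not part of the course material.

```python
def number_tree(digits):
    """Build the parse tree of a digit string under
    <number> -> <no>, <no> -> <no> <digit> | <digit>.
    A node is (nonterminal, [children]); a leaf is a plain string."""
    no = ("<no>", [("<digit>", [digits[0]])])
    for d in digits[1:]:
        no = ("<no>", [no, ("<digit>", [d])])
    return ("<number>", [no])


def render(form):
    """Print a sentential form; unexpanded nodes show their nonterminal."""
    return " ".join(item[0] if isinstance(item, tuple) else item for item in form)


def derivation(tree, rightmost=False):
    """List the sentential forms of the leftmost (default) or rightmost
    derivation that corresponds to the given parse tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, item in enumerate(form) if isinstance(item, tuple)]
        if not pending:
            return steps
        i = pending[-1] if rightmost else pending[0]
        _, children = form[i]
        form[i:i + 1] = children  # apply one production at position i
        steps.append(render(form))


tree = number_tree("12")
print(" => ".join(derivation(tree, rightmost=True)))
print(" => ".join(derivation(tree)))
```

The rightmost run reproduces the sequence given in the text (with the final sentence printed as `1 2` because symbols are separated by spaces), while the leftmost run passes through `<digit> <digit>` and `1 <digit>` instead. Both come from one and the same parse tree.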

Reverse rightmost derivation
  12 ⇐_rm <digit> 2 ⇐_rm <no> 2 ⇐_rm <no> <digit> ⇐_rm <no> ⇐_rm <number>

Handles
Consist of two parts:
  1. A production A → β
  2. A position
If S ⇒_rm* αAw ⇒_rm αβw, then the production A → β together with the position after α is a handle of αβw.
The handle of <no> 2 is the production <digit> → 2 and the position after <no> because:
  <number> ⇒_rm <no> ⇒_rm <no> <digit> ⇒_rm <no> 2 ⇒_rm <digit> 2 ⇒_rm 12
Informally: a handle is what we reduce to what and where to get the previous sentential form in a rightmost derivation.

Reduction
In reverse right derivation, find a right side in some rule according to the grammar in the given right sentential form and replace it with the corresponding left side, i.e. nonterminal.

Parse trees (derivation trees)
Parse tree for 12:
  <number>
    <no>
      <no>
        <digit>
          1
      <digit>
        2
A parse tree can correspond to several different derivations.

Ambiguous grammars
A grammar G is ambiguous if a sentence in G has several different parse trees.
e.g.
  E → E + E
    | E * E
    | E ↑ E
    | id
id + id * id has two different parse trees:

  +(*): + at the root        *(+): * at the root
  E                          E
    E                          E
      id                         E
    +                              id
    E                            +
      E                          E
        id                         id
      *                        *
      E                        E
        id                       id

Rewrite the grammar to make it unambiguous: +, * are to have the right priority and +, * are to be left associative while ↑ is to be right associative. a+b+c+d is interpreted as ((a+b)+c)+d.
  E → E + T | T    (left associative)
  T → T * F | F    (left associative)
  F → P ↑ F | P    (right associative)
  P → id

Home assignment: Draw the parse trees for:
  1. id * id * id
  2.
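Both halves of the ambiguity discussion can be checked with a few lines of code: counting the parse trees of the ambiguous grammar, and seeing which grouping the rewritten grammar forces. This is a sketch under assumptions, not the course's own method: the ASCII character `^` stands in for the right-associative operator, the function names are invented, and the left-recursive rules E → E + T and T → T * F are implemented as loops, which is the usual way to get left associativity in a hand-written recursive-descent parser.

```python
def count_trees(tokens):
    """Number of parse trees under the ambiguous grammar E -> E op E | id.
    tokens alternate operand, operator, operand, ..."""
    if len(tokens) == 1:
        return 1
    # Any operator may be the root; the two sides are parsed independently.
    return sum(
        count_trees(tokens[:i]) * count_trees(tokens[i + 1:])
        for i in range(1, len(tokens), 2)
    )


def group(tokens):
    """Parse with the unambiguous grammar and return a fully
    parenthesised string that shows the resulting tree shape."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():  # F -> P ^ F | P   (right recursive: right associative)
        left = advance()  # P -> id (any operand token is accepted here)
        if peek() == "^":
            advance()
            return f"({left} ^ {factor()})"
        return left

    def term():  # T -> T * F | F   (loop: left associative)
        left = factor()
        while peek() == "*":
            advance()
            left = f"({left} * {factor()})"
        return left

    def expr():  # E -> E + T | T   (loop: left associative)
        left = term()
        while peek() == "+":
            advance()
            left = f"({left} + {term()})"
        return left

    return expr()


print(count_trees("id + id * id".split()))
print(group("a + b + c + d".split()))
print(group("a + b * c".split()))
print(group("a ^ b ^ c".split()))
```

The count for `id + id * id` is 2, matching the two trees in the text, while `group` always returns exactly one grouping per expression: nesting to the left for + and *, nesting to the right for `^`, and * binding tighter than +.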

The following grammar generates { a^n b^n | n >= 1 }.
  S → a S b | a b

e.g.
  S → 0 S 0 | 1 S 1 | 0 | 1
describes binary palindromes of odd length.

Home assignment: Write a CFG for binary palindromes of all lengths >= 1, i.e. ε is not included.

Excerpt from a Pascal grammar:
  <goal>         → <progdecl> .
  <progdecl>     → <prog_hedr> ; <block>
  <prog_hedr>    → program <name> ( <name_list> )
                 | program <name>
  <block>        → <decls> begin <stat_list> end
  <decls>        → <labels> <consts> <types> <vars> <procs>
  <labels>       → label <label_decl> ;
  <label_decl>   → <label_decl> , <label>
                 | <label>
  <label>        → <int>
                 | <id>
  <consts>       → const <const_decls>
  <const_decls>  → <const_decls> <const_decl_c>
                 | <const_decl_c>
  <const_decl_c> → <const_decl> ;
  <const_decl>   → <name> = <const>
  <types>        → type <type_decls>
  <type_decls>   → <type_decls> <type_decl_c>
                 | <type_decl_c>
  <type_decl_c>  → <type_decl> ;
  <type_decl>    → <name> = <type>
  <vars>         → var <var_decls>
  <var_decls>    → <var_decls> <var_decl_c>
                 | <var_decl_c>
  <var_decl_c>   → <var_decl> ;
  <var_decl>     → <id_list> : <type>
  <procs>        → <proc_decls>
  <proc_decls>   → <proc_decls> <proc>
                 | <proc>
  <proc>         → procedure <phead_c> forward ;
                 | procedure <phead_c> <block> ;
                 | function <fhead_c> forward ;
                 | function <fhead_c> <block> ;
  <fhead_c>      → <fhead> ;
  <fhead>        → <name> <params> : <type_id>
  <phead_c>      → <phead> ;
  <phead>        → <name> <params>
  <params>       → ( <param_list> )
  <param>        → var <par_decl>
                 | <par_decl>
  <par_decl>     → <id_list> : <type_id>
  <param_list>   → <param_list> ; <param>
                 | <param>
  <id_list>      → <id_list> , <id>
                 | <id>
  ...
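The two small grammars at the start of this part, for a^n b^n and for binary palindromes of odd length, are short enough to mirror directly in code. The sketch below is illustrative only and the function names are made up: `anbn` carries out the derivation S ⇒ aSb ⇒ ... by string replacement, and the two recognisers undo one production per recursive call, which is exactly the "memory" that regular expressions lack.

```python
def anbn(n):
    """Derive a^n b^n (n >= 1) with S -> a S b | a b."""
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "aSb")  # apply S -> a S b
    return form.replace("S", "ab")       # finish with S -> a b


def is_anbn(w):
    """True if w can be derived from S -> a S b | a b."""
    if w == "ab":
        return True
    return len(w) > 2 and w[0] == "a" and w[-1] == "b" and is_anbn(w[1:-1])


def is_odd_palindrome(w):
    """True if w can be derived from S -> 0 S 0 | 1 S 1 | 0 | 1."""
    if w in ("0", "1"):
        return True
    return (
        len(w) >= 3
        and w[0] == w[-1]
        and w[0] in "01"
        and is_odd_palindrome(w[1:-1])
    )


print(anbn(3))
print(is_anbn("aabb"), is_anbn("aab"))
print(is_odd_palindrome("01010"), is_odd_palindrome("0110"))
```

Note that `is_odd_palindrome` rejects `0110`: it is a palindrome, but of even length, so the grammar shown cannot derive it.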