What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Similar documents
The Three Hour Tour Through Automata Theory

Lecture 1 09/08/2017

1 Alphabets and Languages

CS 455/555: Mathematical preliminaries

Text Search and Closure Properties

Text Search and Closure Properties

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

How do regular expressions work? CMSC 330: Organization of Programming Languages

CS 341 Homework 16 Languages that Are and Are Not Context-Free

Sri vidya college of engineering and technology

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Concatenation. The concatenation of two languages L 1 and L 2

Formal Languages. We ll use the English language as a running example.

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

CMSC 330: Organization of Programming Languages

MA/CSSE 474 Theory of Computation

Regular expressions and Kleene s theorem

UNIT II REGULAR LANGUAGES

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

6.8 The Post Correspondence Problem

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

Automata Theory and Formal Grammars: Lecture 1

Theory of Computation

Regular expressions and Kleene s theorem

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

In English, there are at least three different types of entities: letters, words, sentences.

Mathematical Preliminaries. Sipser pages 1-28

Computational Theory

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

Context Free Language Properties

CS 154, Lecture 3: DFA NFA, Regular Expressions

Notes for Comp 497 (454) Week 10

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

3515ICT: Theory of Computation. Regular languages

Lecture Notes on Inductive Definitions

Chapter 3. Regular grammars

Regular Expressions and Language Properties

CMSC 330: Organization of Programming Languages

Theory of computation: initial remarks (Chapter 11)

MA/CSSE 474 Theory of Computation. Your Questions? Previous class days' material Reading Assignments

Automata: a short introduction

Theory of Computation (Classroom Practice Booklet Solutions)

MA/CSSE 474 Theory of Computation

Recap from Last Time

Closure under the Regular Operations

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

Lecture 17: Language Recognition

Context-Free Grammars (and Languages) Lecture 7

Lecture Notes on Inductive Definitions

Properties of Context-Free Languages. Closure Properties Decision Properties

Formal Languages. We ll use the English language as a running example.

MA/CSSE 474 Theory of Computation

Before We Start. The Pumping Lemma. Languages. Context Free Languages. Plan for today. Now our picture looks like. Any questions?

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

Context-Free and Noncontext-Free Languages

Finite Automata Part Two

CPS 220 Theory of Computation REGULAR LANGUAGES

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Supplementary Notes on Inductive Definitions

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

A Universal Turing Machine

Languages, regular languages, finite automata

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Theory of Computation 8 Deterministic Membership Testing

Theory of Computation 1 Sets and Regular Expressions

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CSE 105 THEORY OF COMPUTATION

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

CS243, Logic and Computation Nondeterministic finite automata

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012

Functions on languages:

On Parsing Expression Grammars A recognition-based system for deterministic languages

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

Closure Properties of Regular Languages

The Pumping Lemma for Context Free Grammars

CSE 105 THEORY OF COMPUTATION

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Closure Properties of Context-Free Languages. Foundations of Computer Science Theory

1. Induction on Strings

Equivalence of Regular Expressions and FSMs

Chapter-6 Regular Expressions

Computability Theory

Fall 1999 Formal Language Theory Dr. R. Boyer. Theorem. For any context free grammar G; if there is a derivation of w 2 from the

CS 121, Section 2. Week of September 16, 2013

CVO103: Programming Languages. Lecture 2 Inductive Definitions (2)

NPDA, CFG equivalence

CPSC 421: Tutorial #1

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

What is this course about?

CSE 105 THEORY OF COMPUTATION

Context-Free Languages (Pre Lecture)

Assignment #2 COMP 3200 Spring 2012 Prof. Stucki

Section 1 (closed-book) Total points 30

CSC173 Workshop: 13 Sept. Notes

Transcription:

Do Homework 2. What Is a Language? Grammars, Languages, and Machines L Language Grammar Accepts Machine Strings: the Building Blocks of Languages An alphabet is a finite set of symbols: English alphabet: {A, B, C,, Z} Binary alphabet: {0, 1} A string over an alphabet is a finite sequence of symbols drawn from the alphabet. English string: happynewyear binary string: 1001101 We will generally omit from strings unless doing so would lead to confusion. The set of all possible strings over an alphabet Σ is written Σ*. binary string: 1001101 {0,1}* The shortest string contains no characters. It is called the empty string and is written or ε (epsilon). The set of all possible strings over an alphabet Σ is written Σ*. The length of a string is the number of symbols in it. ε = 0 1001101 = 7 More on Strings A string a is a substring of a string b if a occurs contiguously as part of b. aaa is a substring of aaabbbaaa aaaaaa is not a substring of aaabbbaaa Every string is a substring (although not a proper substring) of itself. ε is a substring of every string. Alternatively, we can match ε anywhere. Notice the analogy with sets here. Lecture Notes 2 What is a Language? 1

Operations on Strings Concatenation: The concatenation of two strings x and y is written x y, x y, or xy and is the string formed by appending the string y to the string x. xy = x + y If x = ε and y = food, then xy = If x = good and y = bye, then xy = Note: x ε = ε x = x for all strings x. Replication: For each string w and each natural number i, the string w i is defined recursively as w 0 = ε w i = w i-1 w for each i 1 Like exponentiation, the replication operator has a high precedence. Examples: a 3 = (bye) 2 = a 0 b 3 = An inductive definition: String Reversal (1) If w = 0 then w R = w = ε (2) If w 1 then a Σ: w = u a (a is the last character of w) and w R = a u R Example: (abc) R = More on String Reversal Theorem: If w, x are strings, then (w x) R = x R w R Example: (dogcat) R = (cat) R (dog) R = tacgod Proof (by induction on x ): Basis: x = 0. Then x = ε, and (w x) R = (w ε) R = (w) R = ε w R = ε R w R = x R w R Induction Hypothesis: If x n, then (w x) R = x R w R Induction Step: Let x = n + 1. Then x = u a for some character a and u = n (w x) R = (w (u a)) R = ((w u) a) R associativity = a (w u) R definition of reversal = a u R w R induction hypothesis = (u a) R w R definition of reversal = x R w R d o g c a t w x u a Lecture Notes 2 What is a Language? 2

Defining a Language A language is a (finite or infinite) set of finite length strings over a finite alphabet Σ. Example: Let Σ = {a, b} Some languages over Σ:, {ε}, {a, b}, {ε, a, aa, aaa, aaaa, aaaaa} The language Σ* contains an infinite number of strings, including: ε, a, b, ab, ababaaa L = {x {a, b}* : all a's precede all b's} Example Language Definitions L = {x : y {a, b}* : x = ya} L = {a n, n 0 } L = a n (If we say nothing about the range of n, we will assume that it is drawn from N, i.e., n 0.) L = {x#y: x,y {0-9}* and square(x) = y} L = {} = (the empty language not to be confused with {ε}, the language of the empty string) Techniques for Defining Languages Languages are sets. Recall that, for sets, it makes sense to talk about enumerations and decision procedures. So, if we want to provide a computationally effective definition of a language we could specify either a Language generator, which enumerates (lists) the elements of the language, or a Language recognizer, which decides whether or not a candidate string is in the language and returns True if it is and False if it isn't. Example: The logical definition: language recognizer. L = {x : y {a, b}* : x = ya} can be turned into either a language generator or a How Large are Languages? The smallest language over any alphabet is. = 0 The largest language over any alphabet is Σ*. Σ* =? - If Σ = then Σ* = {ε} and Σ* = 1 - If Σ then Σ* is countably infinite because its elements can be enumerated in 1 to 1 correspondence with the integers as follows: 1. Enumerate all strings of length 0, then length 1, then length 2, and so forth. 2. Within the strings of a given length, enumerate them lexicographically. E.g., aa, ab, ba, bb So all languages are either finite or countably infinite. Alternatively, all languages are countable. Operations on Languages 1 Normal set operations: union, intersection, difference, complement Examples: Σ = {a, b} L 1 = strings with an even number of a's L 2 = strings with no b's L 1 L 2 = L 1 L 2 = L 2 - L 1 = ( L 2 - L 1 ) = Lecture Notes 2 What is a Language? 3

Operations on Languages 2 Concatenation: (based on the definition of concatenation of strings) If L 1 and L 2 are languages over Σ, their concatenation L = L 1 L 2, sometimes L 1 L 2, is {w Σ*: w = x y for some x L 1 and y L 2 } Examples: L 1 = {cat, dog} L 2 = {apple, pear} L 1 L 2 = {catapple, catpear, dogapple, dogpear} L 1 = {a n : n 1} L 2 = {a n : n 3} L 1 L 2 = Identities: L = L = L (analogous to multiplication by 0) L{ε}= {ε}l = L L (analogous to multiplication by 1) Replicated concatenation: L n = L L L L (n times) L 1 = L L 0 = {ε} Example: L = {dog, cat, fish} L 0 = {ε} L 1 = {dog, cat, fish} L 2 = {dogdog, dogcat, dogfish, catdog, catcat, catfish, fishdog, fishcat, fishfish} Concatenating Languages Defined Using Variables L 1 = a n = {a n : n 0} L 2 = b n = {b n : n 0} L 1 L 2 = {a n : n 0}{b n : n 0} = { a n b m : n,m 0} (common mistake: ) a n b n = { a n b n : n 0} Note: The scope of any variable used in an expression that invokes replication will be taken to be the entire expression. L = 1 n 2 m L = a n b m a n Operations on Languages 3 Kleene Star (or Kleene closure): L* = {w Σ* : w = w 1 w 2 w k for some k 0 and some w 1, w 2, w k L} Alternative definition: L* = L 0 L 1 L 2 L 3 Note: L, ε L* Example: L = {dog, cat, fish} L* = {ε, dog, cat, fish, dogdog, dogcat, fishcatfish, fishdogdogfishcat, } Another useful definition: L + = L L* (L + is the closure of L under concatenation) Alternatively, L + = L 1 L 2 L 3 L + = L*-{ε} L + = L* if ε L if ε L Lecture Notes 2 What is a Language? 4

Regular Languages Read Supplementary Materials: Regular Languages and Finite State Machines: Regular Languages Do Homework 3. Regular Grammars, Languages, and Machines L Regular Language Regular Expression or Regular Grammar Accepts Finite State Machine Pure Regular Expressions The regular expressions over an alphabet Σ are all strings over the alphabet Σ { (, ),,, *} that can be obtained as follows: 1. and each member of Σ is a regular expression. 2. If α, β are regular expressions, then so is αβ 3. If α, β are regular expressions, then so is α β. 4. If α is a regular expression, then so is α*. 5. If α is a regular expression, then so is (α). 6. Nothing else is a regular expression. If Σ = {a,b} then these are regular expressions:, a, bab, a b, (a b)*a*b* So far, regular expressions are just (finite) strings over some alphabet, Σ { (, ),,, *}. Regular Expressions Define Languages Regular expressions define languages via a semantic interpretation function we'll call L: 1. L( ) = and L(a) = {a} for each a Σ 2. If α, β are regular expressions, then L(αβ) = L(α) L(β) = all strings that can be formed by concatenating to some string from L(α) some string from L(β). Note that if either α or β is, then its language is, so there is nothing to concatenate and the result is. 3. If α, β are regular expressions, then L(α β) = L(α) L(β) 4. If α is a regular expression, then L(α*) = L(α)* 5. L( (α) ) = L(α) A language is regular if and only if it can be described by a regular expression. A regular expression is always finite, but it may describe a (countably) infinite language. Lecture Notes 3 Regular Languages 1

Regular Languages An equivalent definition of the class of regular languages over an alphabet Σ: The closure of the languages {a} a Σ and [1] with respect to the functions: concatenation, [2] union, and [3] Kleene star. [4] In other words, the class of regular languages is the smallest set that includes all elements of [1] and that is closed under [2], [3], and [4]. Closure and Closed Informally, a set can be defined in terms of a (usually small) starting set and a group of functions over elements from the set. The functions are applied to members of the set, and if anything new arises, it s added to the set. The resulting set is called the closure over the initial set and the functions. Note that the functions(s) may only be applied a finite number of times. Examples: The set of natural numbers N can be defined as the closure over {0} and the successor (succ(n) = n+1) function. Regular languages can be defined as the closure of {a} a Σ and and the functions of concatenation, union, and Kleene star. We say a set is closed over a function if applying the function to arbitrary elements in the set does not yield any new elements. Examples: The set of natural numbers N is closed under multiplication. Regular languages are closed under intersection. See Supplementary Materials Review of Mathematical Concepts for more formal definitions of these terms. Examples of Regular Languages L( a*b* ) = L( (a b) ) = L( (a b)* ) = L( (a b)*a*b*) = L = {w {a,b}* : w is even} L = {w {a,b}* : w contains an odd number of a's} Augmenting Our Notation It would be really useful to be able to write ε in a regular expression. Example: (a ε) b (Optional a followed by b) But we'd also like a minimal definition of what constitutes a regular expression. Why? Observe that 0 = {ε} (since 0 occurrences of the elements of any set generates the empty string), so * = {ε} So, without changing the set of languages that can be defined, we can add ε to our notation for regular expressions if we specify that L(ε) = {ε} We're essentially treating ε the same way that we treat the characters in the alphabet. Having done this, you'll probably find that you rarely need in any regular expression. Lecture Notes 3 Regular Languages 2

L( (aa*) ε ) = L( (a ε)* ) = L = { w {a,b}* : there is no more than one b} L = { w {a,b}* : no two consecutive letters are the same} More Regular Expression Examples Further Notational Extensions of Regular Expressions A fixed number of concatenations: α n means αααα α (n times). At Least 1: α + means 1 or more occurrences of α concatenated together. Shorthands for denoting sets, such as ranges, e.g., (A-Z) or (letter-letter) Example: L = (A-Z) + ((A-Z) (0-9))* A replicated regular expression α n, where n is a constant. Example: L = (0 1) 20 Intersection: α β (we ll prove later that regular languages are closed under intersection) Example: L = (a 3 )* (a 5 )* Operator Precedence in Regular Expressions Regular expressions are strings in the language of regular expressions. Thus to interpret them we need to: 1. Parse the string 2. Assign a meaning to the parse tree Parsing regular expressions is a lot like parsing arithmetic expressions. To do it, we must assign precedence to the operators: Regular Arithmetic Expressions Expressions Highest Kleene star exponentiation concatenation intersection multiplication Lowest union addition a b* c d* x y 2 + i j 2 Regular Expressions and Grammars Recall that grammars are language generators. A grammar is a recipe for creating strings in a language. Regular expressions are analogous to grammars, but with two special properties: 1. The have limited power. They can be used to define only regular languages. 2. They don't look much like other kinds of grammars, which generally are composed of sets of production rules. But we can write more "standard" grammars to define exactly the same languages that regular expressions can define. Specifically, any such grammar must be composed of rules that: have a left hand side that is a single nonterminal have a right hand side that is ε, or a single terminal, or a single terminal followed by a single nonterminal. Lecture Notes 3 Regular Languages 3

L={w {a, b}* : w is even} ((aa) (ab) (ba) (bb))* Regular Grammar Example Notice how these rules correspond naturally to a FSM: S ε S at S bt T a T b T as T bs S a, b a, b T Generators and Recognizers Generator Recognizer Language Regular Languages Regular Expressions Regular Grammars? Lecture Notes 3 Regular Languages 4