Formal Languages. We ll use the English language as a running example.

Similar documents
Formal Languages. We ll use the English language as a running example.

Closure under the Regular Operations

Sri vidya college of engineering and technology

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

1 Alphabets and Languages

Regular Expression Unit 1 chapter 3. Unit 1: Chapter 3

CS 455/555: Finite automata

3515ICT: Theory of Computation. Regular languages

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Non-deterministic Finite Automata (NFAs)

CS 121, Section 2. Week of September 16, 2013

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

How do regular expressions work? CMSC 330: Organization of Programming Languages

Finite Automata and Formal Languages TMV026/DIT321 LP4 2012

Concatenation. The concatenation of two languages L 1 and L 2

Finite Automata and Regular languages

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Regular Expressions. Definitions Equivalence to Finite Automata

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1)

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Deterministic Finite Automata (DFAs)

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

UNIT-III REGULAR LANGUAGES

Deterministic Finite Automata (DFAs)

Kleene Algebras and Algebraic Path Problems

Finite Automata and Formal Languages

Outline. Nondetermistic Finite Automata. Transition diagrams. A finite automaton is a 5-tuple (Q, Σ,δ,q 0,F)

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

Closure under the Regular Operations

Before we show how languages can be proven not regular, first, how would we show a language is regular?

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

CSE 105 THEORY OF COMPUTATION

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

Nondeterministic Finite Automata

Finite Automata and Regular Languages

CS243, Logic and Computation Nondeterministic finite automata

Computational Theory

CS 154. Finite Automata, Nondeterminism, Regular Expressions

In English, there are at least three different types of entities: letters, words, sentences.

CSE 105 THEORY OF COMPUTATION

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Deterministic Finite Automata (DFAs)

4.1, 4.2: Analysis of Algorithms

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

CSE 105 THEORY OF COMPUTATION

Automata: a short introduction

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

Regular Expressions [1] Regular Expressions. Regular expressions can be seen as a system of notations for denoting ɛ-nfa

Inf2A: Converting from NFAs to DFAs and Closure Properties

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Nondeterministic finite automata

September 11, Second Part of Regular Expressions Equivalence with Finite Aut

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Finite Automata and Regular Languages (part III)

Uses of finite automata

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Finite Automata Part Two

Theory of Computation (I) Yijia Chen Fudan University

Properties of Context-Free Languages. Closure Properties Decision Properties

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Introduction to Finite-State Automata

COSE312: Compilers. Lecture 2 Lexical Analysis (1)

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Introduction to Formal Languages, Automata and Computability p.1/51

CS 154, Lecture 3: DFA NFA, Regular Expressions

CSC173 Workshop: 13 Sept. Notes

More on Finite Automata and Regular Languages. (NTU EE) Regular Languages Fall / 41

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

This Lecture will Cover...

Automata Theory and Formal Grammars: Lecture 1

Computational Models - Lecture 1 1

Computer Sciences Department

EDUCATIONAL PEARL. Proof-directed debugging revisited for a first-order version

Lecture 3: Nondeterministic Finite Automata

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Ogden s Lemma for CFLs

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata (cont )

Computational Models - Lecture 4

Regular Expressions and Language Properties

Non-Deterministic Finite Automata

Course 4 Finite Automata/Finite State Machines

UNIT II REGULAR LANGUAGES

Computational Models Lecture 2 1

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Extended transition function of a DFA

The Pumping Lemma. for all n 0, u 1 v n u 2 L (i.e. u 1 u 2 L, u 1 vu 2 L [but we knew that anyway], u 1 vvu 2 L, u 1 vvvu 2 L, etc.

Computational Models Lecture 2 1

Computational Models #1

Equivalence of DFAs and NFAs

Transcription:

Formal Languages We ll use the English language as a running example. Definitions. A string is a finite set of symbols, where each symbol belongs to an alphabet denoted by Σ. Examples. The set of all strings that can be constructed from an alphabet Σ is Σ. If x, y are two strings of lengths x and y, then: xy or x y is the concatenation of x and y, so the length, xy = x + y (x) R is the reversal of x the k th -power of x is x k = { ɛ if k =0 x k 1 x, if k>0 equal, substring, prefix, suffix are defined in the expected ways. Note that the language is not the same language as ɛ. 73

Operations on Languages Suppose that L E is the English language and that L F is the French language over an alphabet Σ. Complementation: L = Σ L L E is the set of all words that do NOT belong in the english dictionary. Union: L 1 L 2 = {x : x L 1 or x L 2 } L E L F is the set of all english and french words. Intersection: L 1 L 2 = {x : x L 1 and x L 2 } L E L F is the set of all words that belong to both english and french...eg., journal Concatenation: L 1 L 2 is the set of all strings xy such that x L 1 and y L 2 Q: What is an example of a string in L E L F? goodnuit Q: What if L E or L F is? What is L E L F? 74

Kleene star: L. Also called the Kleene Closure of L and is the concatenation of zero or more strings in L. Recursive Definition Base Case: ɛ L Induction Step: If x L and y L then xy L Language Exponentiation Repeated concatenation of a language L. L k = { {ɛ} if k =0 L k 1 L, if k>0 Reversal The language Rev(L) is the language that results from reversing all strings in L. Q: How do we define the strings that belong to a language such as English, French, Java, arithmetic, etc. Example: For the language of arithmetic, LA: Define Σ = {N} {+,, =, (, )} then but )((2(+4(= Σ )((2(+4(= LA. 75

Regular Expressions A regular expression over an alphabet Σ consists of 1. Symbols in the alphabet 2. The symbols {+, (, ), } where + means OR and means zero or more times. Recursive Definition. Let the set RE of ALL regular expressions, be the smallest set such that: Basis:, ɛ,a RE, a Σ Inductive Step: if R and S are regular expressions RE, then so are: (R + S), (RS),R Examples: Let Σ = {0, 1}: Regular Expression Corresponding Language (0 + 1) Σ ((0 + 1)(0 + 1) ) all non-empty strings in Σ ((0 + 1)(0 + 1)) all even length strings ɛ + 0 + 0(0 + 1) 0 all strings that don t begin/end with 1 11(0 + 11) all strings with 1 s in pairs 76

Relating Regular Expressions to Languages Let L(R) represent the language constructed by the regular expression R. We define L(R) inductively as follows: Base Case: L( ) = L(ɛ) ={ɛ} For any a Σ, L(a) = {a} Induction Step: If R is a regular expression, then by definition of R, R = ST, or R = S + T, or R = S where S and T are regular expressions and by induction, L(S) and L(T ) have been defined. 77

We can define the language denoted by R, ie., L(R) as follows: L((S + T )) = L(S) union L(T ) L((ST )) = L(S) cat L(T ) L(S ) = (L(S)) Q: Why is this definition important? We can construct the language defined by a regular expression by building the set from smaller regular expressions. Example Q: What is a regular expression R A to denote the language of strings consisting of only an even number of a s? e.g., aa, aaaa, aaaaaaaa etc. (aa) Q: What is a regular expression R B for the language of strings consisting of 1 or more triples of b s? e.g., bbb, bbbbbb, bbbbbbbbb. bbb(bbb) Q: What is a regular expression, R AB, for the language of strings consisting of an even number of a s sandwiched between 1 or more triples of b? eg., bbbaabbb, or bbbaaaaaabbb R B R A R B = bbb(bbb) (aa) bbb(bbb) 78

Equivalence. We say that two regular expressions R and S are equivalent if they describe the same language. In other words, if L(R) =L(S) for two regular expressions R and S then R = S. Examples. Are R and S equivalent? no. Q: Why? R = a (ba ba ) and S = a (ba b) a bbaabb is in R but not in S. Are R =(a(a + b) ) and S =(a(a + b)) equivalent? NO. R denotes strings all nonempty strings starting with a and S denotes all strings that can be split into pairs of symbols such that the first symbol is always an a

Regular Expression Equivalences There exist equivalence axioms for regular expressions that are very similar to those for predicate/propositional logic. Equivalences for Regular Expressions Commutativity of union: (R+S) = (S+R) Associativity of union: (R+S) + T = R+(S+T) Associativity of concatenation: (RS)T = R(ST) Left distributivity: R(S+T) = RS + RT Right distributivity: (S+T)R = SR + TR Identity of Union: R+ = R Identity of Concatenation: Rɛ Annihilator for concatenation: R = = R Idempotence of Kleene star: R = R Theorem (Substitution) If two substrings R and R are equivalent then if R is a substring of S then replacing R by R constructs a new regular expression equivalent to S. 79

Equivalent Regular Expressions Q: How can we determine whether two regular expressions denote the same language? To show equivalency, one method is to use the previous axioms to construct a proof. To show that two regular expressions are NOT equivalent we only need to find a string that belongs to the language denoted by one expression but not the other. Examples. Prove that (0110 + 01)(10) 01(10) Proof. (0110 + 01)(10) (0110 + 01ɛ)(10) substitution, 10 by 10 ɛ. (01(10 + ɛ))(10) by distributivity 01((10 + ɛ)(10) ) assoc. of concat. 01((ɛ + 10)(10) ) commutativity of union 01(ɛ10 + 10(10) ) right distributive 01(10 + 10(10) ) substitution, 10ɛ by 10 01(10) since L(10 ) includes every string L(10(10) ) 80

Another Example. Prove that R denotes the language L of all strings that contain an even number of 0 s. Equivalently, R =1 (01 01 ) x L x L(R) Proof. ( ) Let x L(R). Then x L(1 (01 01 ) )=L(1 )L(01 )L(01 ) Let x = y(zw) then y L(1 ),z L(01 ),w L(01 ) Therefore, y has zero 0s Therefore, w has 1 zero Therefore, z has 1 zero So, x = y(zw) has zero 0s plus a multiple of 2 zeros. 81

( ) Suppose that x is an arbitrary string in L. x has an even number of 0 s. Denote by 2k for some k N. How can we rewrite x consisting of 0 s and 1s? x =1...101...10 for 2k 0 s. Let x = y 0, y 1, y 2,...,y k, so y 0 =1 n 1 L(1 ) y i =01...101...1 = 01 n i 01m i L(01 01 ) ( from the 2i 1st 0 to just before the (2i + 1)st 0 (if it exists)) y i L(01 01 ), 1 i k So x = y 0 y 1...y k L(1 )(L(01 01 )) = L(1(01 01 ) ). Q: Can every possible type of string be represented by a regular expression? To answer this, we turn to Finite State Machines. 82

String Matching and Finite State Machines Given source code (say in Java) Find the comments may need to remove comments for software transformations public class QuickSort { private static long comparisons = 0; private static long exchanges = 0; /*********************************************************************** * Quicksort code from Sedgewick 7.1, 7.2. ***********************************************************************/ public static void quicksort(double[] a) { shuffle(a); // to guard against worst-case quicksort(a, 0, a.length - 1); } public static void quicksort(double[] a, int left, int right) { if (right <= left) return; int i = partition(a, left, right); quicksort(a, left, i-1); quicksort(a, i+1, right); } private static int partition(double[] a, int left, int right) { int i = left - 1; int j = right; while (true) { while (less(a[++i], a[right])) // find item on left to swap ; // a[right] acts as sentinel while (less(a[right], a[--j])) // find item on right to swap if (j == left) break; // don't go out-of-bounds if (i >= j) break; // check if pointers cross exch(a, i, j); // swap two elements into place } exch(a, i, right); // swap with partition element return i; } 83

Q. What patterns are we looking for? // text \nl or / text / Q. What do we know if we see a / followed by a we are in a comment / we are in a comment text not in comment Q. What do we know if we see / followed by a might be at the end of a comment if next char is / / not end of comment text in the comment Let s represent these ideas with a diagram. 84

Deterministic Finite State Automata (DFSA or DFA) A DFA consists of: Q. a set of states (this set is finite) Σ. an alphabet that strings are composed from s Q. a start state where you feed in the string F Q. a set of accepting/final states δ. Q Σ Q this is the transition function, means that you pass it the current state and the input and it tells you which state to go to. Comment Example. Q = {start, /, //, /*, *, accept } Σ = {text, /, \nl, *} s = start F = accept δ: Q Σ Q 85

Example cont... δ(state, input) / * text \nl start / // /* accept Q: What if we want to know which state the input **// ends at if we begin at start? Two Options. 1. Compute: δ(δ(δ(δ(start, *),*), /),/). 2. Define δ. δ takes a string and returns the final state after processing the entire string. Then, δ(δ(δ(δ(start, *),*), /),/)= δ (start,**//) = // Formal definition of δ (q, x) (reading left to right): { δ q if x = ɛ (q, x) = δ(δ (q, z),a) if x = za, a Σ,z Σ 86

Regular Expressions and DFA The set of strings accepted by an automaton defines a langauge. For automaton M the language M accepts is L(M ). Given regular expression R, find M such that Examples. L(R) =L(M). Let regular expression R 1 = (1 + 00). Q. Which strings belong to L(R 1 )? L(R 1 )={x {0, 1} all 0 s are in pairs, i.e., 00} Q: What is a DFA M 1 such that L(M 1 )=L(R 1 )? 87

DFSA Conventions Strings ending at a final state are accepted (if we want to accept/reject). Drop dead states. Group elements that go from and to the same states. Examples cont. Let regular expression R 2 = 1(1 + (01)). Q. Which strings belong to L(R 2 )? L(R 2 )={x {0, 1} every 0 is sandwiched between 1s.} Q: What is a DFA M 2 such that L(M 2 )=L(R 2 )? 88

δ : δ(q 0, 0) = d 1 δ(q 0, 1) = q 1 δ(q 1, 0) = q 2 δ(q 1, 1) = q 1 δ(q 2, 0) = d 1 δ(q 2, 1) = q 1 δ(d 1, 0 or 1) = d 1 Q: How do we know that our machine M is correct? We can show this by proving that δ (q 0,x) only accepts those strings in L(R 2 ). Q: What might be a good way to do this? INDUCTION! Proving a DFA is Correct Q: What should we do induction on? either the length of the string, or on the structure of the string...same thing Q: What should our S(x) include? it should say something about δ and the types of strings x that are accepted. 89