Languages and Computation (G52LAC) Lecture notes Spring 2018


Thorsten Altenkirch, Venanzio Capretta, and Henrik Nilsson

February 2018

Contents

1 Introduction
1.1 Example: Valid Java programs
1.2 Example: The halting problem
1.3 Example: The λ-calculus
1.4 P versus NP
2 Formal Languages
2.1 Exercises
3 Finite Automata
3.1 Deterministic finite automata
3.1.1 What is a DFA?
3.1.2 The language of a DFA
3.2 Nondeterministic finite automata
3.2.1 What is an NFA?
3.2.2 The language accepted by an NFA
3.2.3 The subset construction
3.2.4 Correctness of the subset construction
3.3 Exercises
4 Regular Expressions
4.1 What are regular expressions?
4.2 The meaning of regular expressions
4.3 Algebraic laws
4.4 Translating regular expressions into NFAs
4.5 Summing up
4.6 Exercises
5 Minimization of Finite Automata
5.1 The table-filling algorithm
5.2 Example of DFA minimization using the table-filling algorithm
6 Disproving Regularity
6.1 The pumping lemma
6.2 Applying the pumping lemma
6.3 Exercises
7 Context-Free Grammars
7.1 What are context-free grammars?
7.2 The meaning of context-free grammars
7.3 The relation between regular and context-free languages
7.4 Derivation trees
7.5 Ambiguity
7.6 Applications of context-free grammars
7.7 Exercises
8 Transformations of context-free grammars
8.1 Equivalence of context-free grammars
8.2 Elimination of useless productions
8.3 Substitution
8.4 Left factoring
8.5 Disambiguating context-free grammars
8.6 Elimination of left recursion
8.7 Exercises
9 Pushdown Automata
9.1 What is a pushdown automaton?
9.2 How does a PDA work?
9.3 The language of a PDA
9.4 Deterministic PDAs
9.5 Context-free grammars and push-down automata
10 Recursive-Descent Parsing
10.1 What is parsing?
10.2 Parsing strategies
10.3 Basics of recursive-descent parsing
10.4 Handling choice
10.5 Recursive-descent parsing and left-recursion
10.6 Predictive parsing
10.6.1 First and follow sets
10.6.2 LL(1) grammars
10.6.3 Nullable nonterminals
10.6.4 Computing first sets
10.6.5 Computing follow sets
10.6.6 Implementing a predictive parser
10.6.7 LL(1), left-recursion, and ambiguity
10.6.8 Satisfying the LL(1) conditions
10.7 Beyond hand-written parsers: use parser generators
10.8 Exercises
11 Turing Machines
11.1 What is a Turing machine?
11.2 Grammars and context-sensitivity
11.3 The halting problem
11.4 Recursive and recursively enumerable sets
11.5 Back to Chomsky
11.6 Exercises
12 λ-Calculus
12.1 Syntax of λ-calculus
12.2 Church numerals
12.3 Other data structures
12.4 Confluence
12.5 Recursion
12.6 The universality of λ-calculus
12.7 Exercises
13 Algorithmic Complexity
13.1 The Satisfiability Problem
13.2 Time Complexity
13.3 NP-completeness
13.4 Exercises
A Model Answers to Exercises
1 Introduction

This module is about two fundamental notions in computer science, languages and computation, and how they are related. Specific topics include:

- Automata Theory
- Formal Languages
- Models of Computation
- Complexity Theory

The module starts with an investigation of classes of formal languages and related abstract machines, considers practical uses of this theory such as parsing, and finishes with a discussion of what computation is, what can and cannot be computed at all, and what can be computed efficiently, including famous results such as the Halting Problem and open problems such as P versus NP.

G52LAC builds on G51MCS Mathematics for Computer Scientists and parts of G52ACE Algorithms Correctness and Efficiency. It feeds into modules such as G53CMP Compilers, G53COM Computability, and G54FOP Mathematical Foundations of Programming.

To give you a more concrete idea about what you will encounter in this module, as well as the broader significance of the module and a bit of historical context, we will illustrate with some examples.

1.1 Example: Valid Java programs

Consider the following Java fragment:

    class Foo {
        int n;
        void printnsqrd() {
            System.out.println(n * n);
        }
    }

As written using a text editor or as stored in a file, it is just a string of characters. But not any string is a valid Java program. For example, Java uses specific keywords, has rules for what identifiers must look like, and requires proper nesting, such as a definition of a method inside a definition of a class. This raises a number of questions:

- How to describe the set of strings that are valid Java programs?
- Given a string, how to determine if it is a valid Java program or not?
- How to recover the structure of a Java program from a flat string?

To answer such questions, we will study regular expressions and grammars to give precise descriptions of languages, and various kinds of automata to decide if a string belongs to a language or not. We will also consider how to systematically derive programs that efficiently answer this type of question, drawing directly from the theory. Such programs are key parts of compilers, web browsers and web servers, and in fact of any program that uses structured text in one way or another.

A little bit of history. Context-free grammars were invented by American linguist, philosopher, and cognitive scientist Noam Chomsky (1928– ) in an attempt to describe natural languages formally. He also introduced the Chomsky Hierarchy which classifies grammars and languages and their descriptive power:

[Figure: the Chomsky hierarchy. Labels: All languages; Type 0 or recursively enumerable languages; Decidable languages; Turing machines; Type 1 or context sensitive languages; Type 2 or context free languages, pushdown automata; Type 3 or regular languages, finite automata.]

1.2 Example: The halting problem

Consider the following program. Does it terminate for all values of n?

    while (n > 1) {
        if even(n) {
            n = n / 2;
        } else {
            n = n * 3 + 1;
        }
    }

This is not as easy to answer as it might first seem. Say we start with n = 7, for example:

7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

Note how the numbers both increase and decrease in a way that is very hard to describe, which is exactly why it is so hard to analyse this program. The sequence involved is known as the hailstone sequence, and Collatz conjecture says that the number 1 will always be reached. And, in fact, for all numbers that have been tried (all numbers up to $2^{60}$!), the sequence does indeed terminate. But so far, no one has been able to prove that it always will! The famous mathematician Paul Erdős even said: "Mathematics may not be ready for such problems." (See Collatz conjecture, Wikipedia.)

The following important decidability result should then perhaps not come as a total surprise: It is impossible to write a program that decides if another, arbitrary, program terminates (halts) or not. This is known as the Halting Problem and it is thus one example of an undecidable problem [1]: the answer cannot be determined mechanically in general.

The undecidability of the Halting Problem was first proved by British mathematician, logician, and computer scientist Alan Turing (1912–1954). Turing proved this result using Turing Machines, an abstract model of computation that he introduced in 1936 to give a precise definition of what problems are effectively calculable (can be solved mechanically). Turing was further instrumental in the success of British code breaking efforts during World War II and is also famous for the Turing test to decide if a machine exhibits intelligent behaviour equivalent to, or indistinguishable from, that of a human. Andrew Hodges has written a very good biography of Turing: Alan Turing: the Enigma (http://www.turing.org.uk/turing/).

1.3 Example: The λ-calculus

The λ-calculus is a theory of pure functions. It is very simple, having only two constructs: definition and application of functions. The following is an example of a λ-calculus term:

(λx.x)(λy.y)

[1] This does not mean that it is impossible to compute the answer for every instance of such a problem. On the contrary, in specific cases, the answer can often be computed very easily, and programs that attempt to solve undecidable problems can be very useful. But if we do write such a program, we must necessarily be prepared to give up one way or another with a "don't know" answer.
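Footnote 1 can be made concrete with the loop from section 1.2. The following Python sketch runs the hailstone loop under a step budget and gives up with a "don't know" answer (`None`) when the budget is exhausted. The function name `hailstone_steps` and the parameter `max_steps` are our own choices for illustration, not part of the notes.

```python
from typing import Optional


def hailstone_steps(n: int, max_steps: int = 10_000) -> Optional[int]:
    """Count loop iterations until n reaches 1.

    Returns None ("don't know") if the step budget runs out first.
    """
    steps = 0
    while n > 1:
        if steps >= max_steps:
            return None  # give up: we cannot tell whether the loop halts
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    print(hailstone_steps(7))                 # the sequence listed in section 1.2
    print(hailstone_steps(27, max_steps=50))  # budget too small: "don't know"
```

A `None` result says nothing about termination; it only records that the budget was too small for this particular input.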

Like Turing machines, the λ-calculus is a universal model of computation. It was introduced by American mathematician and logician Alonzo Church (1903–1995), also in 1936, a little earlier than Turing's machine.

Alan Turing subsequently became a PhD student of Alonzo Church. They proved that Turing machines and the λ-calculus are equivalent in terms of computational expressiveness. In fact, all proposed universal models of computation to date have proved to be equivalent in that sense. This is captured by the Church-Turing thesis: What is effectively calculable is exactly what can be computed by a Turing machine.

Functional programming languages, like Haskell, and many proof assistants implement (variations of) the λ-calculus. This is thus an example of a theory with very direct and practical applications.

1.4 P versus NP

Here is a seemingly innocuous question: Can every problem whose solution can be checked quickly by a computer also be solved quickly by a computer? "Quickly" here means in time proportional to a polynomial in the size of the problem. Whether or not this is the case is known as the P versus NP problem and it is likely the most famous open problem in computer science, dating back to the 1950s. Here, P refers to the class of problems that can be solved in polynomial time, while NP refers to problems that can be solved in nondeterministic polynomial time, and the question is thus whether these two classes of problems actually are the same, or P = NP.

There is an abundance of important problems where solutions can be checked quickly, but where the best known algorithm for finding a solution is exponential in the size of the problem. As an example, here is one, the Subset Sum Problem: Does some nonempty subset of a given set of integers sum to zero? For example, given $\{3, -2, 8, -5, 4, 9\}$, the nonempty subset $\{-5, -2, 3, 4\}$ sums to 0. It is easy to check a proposed solution: just add all the numbers. If the initial set contains $n$ integers, any proposed solution (being a subset) contains at most $n$ integers, so we can sum all the elements with at most $n$ additions, meaning the total time taken is proportional to $n$ (assuming addition is a constant time operation). However, for finding a solution, no better way is known than essentially checking each possible subset one after another. As there are $2^n$ subsets of a set with $n$ elements, this means finding a solution takes exponential time.

Whether or not there is a better way to solve the Subset Sum Problem might not seem particularly important, but if it were the case that P = NP, then this would have monumental practical implications. For example, public key cryptography, on which pretty much all secure Internet communication, such as HTTPS, hinges, would no longer provide adequate security, and the entire Internet security infrastructure would have to be redesigned and reimplemented. The question here is if it is possible to quickly find the prime factors of (very) large numbers. As long as that is not the case, public key cryptography is considered secure.
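The gap between checking and finding a Subset Sum solution can be illustrated with a small Python sketch. It is a brute-force illustration only, assuming the example set {3, −2, 8, −5, 4, 9}; the function names `check` and `find_zero_subset` are ours.

```python
from itertools import combinations
from typing import Iterable, Optional, Tuple


def check(candidate: Iterable[int]) -> bool:
    """Verify a proposed solution: nonempty and summing to zero (linear time)."""
    items = list(candidate)
    return len(items) > 0 and sum(items) == 0


def find_zero_subset(numbers: Iterable[int]) -> Optional[Tuple[int, ...]]:
    """Try every nonempty subset in turn (up to 2^n - 1 candidates)."""
    pool = list(numbers)
    for size in range(1, len(pool) + 1):
        for subset in combinations(pool, size):
            if check(subset):
                return subset
    return None


if __name__ == "__main__":
    print(check([-5, -2, 3, 4]))                  # checking is cheap
    print(find_zero_subset([3, -2, 8, -5, 4, 9])) # finding enumerates subsets
```

The checker touches each element once, whereas the search may have to visit every subset before it can answer "no".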

2 Formal Languages

In this course we will use the terms language and word in a different way than in everyday language:

- A language is a set of words.
- A word is a sequence, or string, of symbols.

We will write $\epsilon$ for the empty word; i.e., a sequence of length 0.

This leaves us with the question: what is a symbol? The answer is: anything, but it has to come from an alphabet $\Sigma$ that is a finite set. A common (and important) instance is $\Sigma = \{0, 1\}$. Note that $\epsilon$ will never be a symbol, to avoid confusion.

Mathematically we say: Given an alphabet $\Sigma$ we define the set $\Sigma^*$ as the set of words (or sequences) over $\Sigma$: the empty word $\epsilon \in \Sigma^*$, and given a symbol $x \in \Sigma$ and a word $w \in \Sigma^*$ we can form a new word $xw \in \Sigma^*$. These are all the ways elements of $\Sigma^*$ can be constructed (this is called an inductive definition). This unary $*$-operator is known as the Kleene star (or Kleene operator or Kleene closure).

With $\Sigma = \{0, 1\}$, typical elements of $\Sigma^*$ are 0010, 00000000, $\epsilon$. Note that we only write $\epsilon$ if it appears on its own; instead of $00\epsilon$ we just write 00. Note further that $\Sigma^*$ by definition is always nonempty, as the empty word $\epsilon$ belongs to $\Sigma^*$ for any alphabet $\Sigma$, including $\Sigma = \emptyset$. Moreover, for any nonempty alphabet $\Sigma$, $\Sigma^*$ is an infinite set.

It is important to realise that although there are infinitely many words over a nonempty alphabet $\Sigma$, each word has finite length. At first this may seem strange: how can it be that all elements of a set with infinitely many elements can be finite? A good way to think of an infinite set is as a process that can generate a new element whenever we need one, as many times as we like [2]. But each such element can obviously be of finite size, as we at any point in time will only have asked for finitely many elements. Conversely, if we make a set containing a single (notionally) infinite element, such as "a number larger than any number except itself", or an infinitely long string, that does not make the set itself infinite: it would still contain exactly one element.

We can now define the notion of a language $L$ over an alphabet $\Sigma$ precisely: $L \subseteq \Sigma^*$ or equivalently $L \in \mathcal{P}(\Sigma^*)$ [3].

Here are some informal examples of languages:

- The set $\{0010, 00000000, \epsilon\}$ is a language over $\Sigma = \{0, 1\}$. This is an example of a finite language.
- The set of words with odd length over $\Sigma = \{1\}$.
- The set of words that contain the same number of 0s and 1s is a language over $\Sigma = \{0, 1\}$.
- The set of words that contain the same number of 0s and 1s modulo 2 (i.e., both are even or odd) is a language over $\Sigma = \{0, 1\}$.
- The set of palindromes using the English alphabet, e.g. words that read the same forwards and backwards like abba. This is a language over $\{a, b, \ldots, z\}$.
- The set of correct Java programs. This is a language over the set of UNICODE characters (which correspond to numbers between 0 and $17 \cdot 2^{16}$, less some invalid subranges, roughly 1.1 million valid encodings in all).
- The set of programs that, if executed on a Windows machine, print the text "Hello World!" in a window. This is a language over $\Sigma = \{0, 1\}$.

Note the distinction between $\epsilon$, $\emptyset$, and $\{\epsilon\}$!

- $\epsilon$ denotes the empty word, a sequence of symbols of length 0.
- $\emptyset$ denotes the empty set, a set with no elements.
- $\{\epsilon\}$ is a set with exactly one element: the empty word.

In particular, note that $\epsilon$ is of a different type (a sequence) from $\emptyset$ and $\{\epsilon\}$ (that are both sets).

An important operation on $\Sigma^*$ is concatenation. This is denoted by juxtapositioning (or, if you prefer, by an "invisible operator"): given $u, v \in \Sigma^*$ we can construct a new word $uv \in \Sigma^*$ simply by concatenating the two words. We can define this operation by primitive recursion:

$$\epsilon v = v$$
$$(xu)v = x(uv)$$

Concatenation is associative and has unit $\epsilon$:

$$u(vw) = (uv)w \qquad \epsilon u = u = u\epsilon$$

where $u$, $v$, $w$ are words. We use exponent notation to denote concatenation of a word with itself. For example, $u^2 = uu$, $u^3 = uuu$, and so on. By definition, $u^1 = u$ and $u^0 = \epsilon$, the unit of concatenation. Thus we can simplify repeated concatenation using familiar-looking laws. For example: $u^0 u^1 u^2 = u^3$.

[2] Indeed, this is exactly how infinite data structures, such as infinite lists, are realised in lazy languages like Haskell.

[3] Given a set $A$, $\mathcal{P}(A)$ is the powerset of $A$; that is, the set of all possible subsets of $A$. For example, if $A = \{a, b\}$, then $\mathcal{P}(A) = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}$. The number of elements $|A|$ of a set $A$, its cardinality, and the number of elements in its power set are related by $|\mathcal{P}(A)| = 2^{|A|}$. Hence powerset.
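The view of an infinite set as a process that produces elements on demand can be sketched in Python with a generator. Words are modelled as Python strings, with the empty string standing for the empty word; the names `sigma_star` and `power` are hypothetical helpers introduced here.

```python
from itertools import count, islice, product
from typing import Iterator, Sequence


def sigma_star(alphabet: Sequence[str]) -> Iterator[str]:
    """Yield every word over the alphabet, shortest first ('' is the empty word)."""
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)
        if not alphabet:
            return  # over the empty alphabet the empty word is the only word


def power(u: str, n: int) -> str:
    """Exponent notation for words: u concatenated with itself n times."""
    return u * n


if __name__ == "__main__":
    # Ask the "process" for finitely many words; each one has finite length.
    print(list(islice(sigma_star("01"), 7)))
    u = "ab"
    print(power(u, 0) + power(u, 1) + power(u, 2) == power(u, 3))
```

Only the words actually requested are ever built, which is why the generator can represent an infinite set in finite memory.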
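Concatenation of words is extended to concatenation of languages by:

$$MN = \{uv \mid u \in M \land v \in N\}$$

For example:

$$M = \{\epsilon, a, aa\}$$
$$N = \{b, c\}$$
$$MN = \{uv \mid u \in \{\epsilon, a, aa\} \land v \in \{b, c\}\} = \{\epsilon b, \epsilon c, ab, ac, aab, aac\} = \{b, c, ab, ac, aab, aac\}$$

Some important properties of language concatenation are: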

- Concatenation of languages is associative: $L(MN) = (LM)N$
- Concatenation of languages has zero $\emptyset$: $L\emptyset = \emptyset = \emptyset L$
- Concatenation of languages has unit $\{\epsilon\}$: $L\{\epsilon\} = L = \{\epsilon\}L$
- Concatenation distributes through set union: $L(M \cup N) = LM \cup LN$ and $(L \cup M)N = LN \cup MN$

Note that concatenation does not distribute through intersection! Counterexample. Let $L = \{\epsilon, a\}$, $M = \{\epsilon\}$, $N = \{a\}$. Then:

$$L(M \cap N) = L\emptyset = \emptyset$$
$$LM \cap LN = \{\epsilon, a\} \cap \{a, aa\} = \{a\}$$

Exponent notation is used to denote iterated language concatenation: $L^1 = L$, $L^2 = LL$, $L^3 = LLL$, and so on. By definition, $L^0 = \{\epsilon\}$ (for any language, including $\emptyset$), which is the unit for language concatenation (just as $u^0 = \epsilon$ is the unit for concatenation of words).

The Kleene star can also be applied to languages. This intuitively means language concatenation iterated 0 or more times:

$$L^* = \bigcup_{n=0}^{\infty} L^n$$

Note that $\epsilon \in L^*$ for any language $L$, including $L = \emptyset$, the empty language. As an example, if $L = \{a, ab\}$, then $L^* = \{\epsilon, a, ab, aa, aab, aba, abab, aaa, \ldots\}$. Alternatively (and more abstractly), $L^*$ can be described as the least language (with respect to $\subseteq$) that contains $L$ and the empty word, $\epsilon$, and is closed under concatenation:

$$u \in L^* \land v \in L^* \implies uv \in L^*$$

Note the subtle difference between using the Kleene star on an alphabet $\Sigma$, a set of symbols, as in $\Sigma^*$, and using the Kleene star on a language $L \subseteq \Sigma^*$, a set of words. While the result in both cases is a set of words, the types of the arguments to the two variants of the Kleene star operation differ.

2.1 Exercises

Exercise 2.1 Let the alphabet $\Sigma = \{3, 5, 7, 9\}$, and let the language $L = \{w \mid w \in \Sigma^*, 1 \le |w| \le 2\}$. (If $w$ is a word, $|w|$ denotes the length of that word. If $X$ is a finite set, like an alphabet or a finite language, $|X|$ denotes the number of elements in that set, its cardinality.) Answer the following questions:

1. Describe $L$ in plain English.
2. Enumerate all the words in $L$.
3. In general, for an arbitrary alphabet $\Sigma$ and $m \le n$, how many words are there in the language $L = \{w \mid w \in \Sigma^*, m \le |w| \le n\}$? That is, write down an expression for $|L|$.
4. How many words would there be in $L$ with the alphabet $\Sigma$ given above, $m = 3$, and $n = 7$?

Exercise 2.2 Let the alphabet $\Sigma = \{a, b, c\}$ and let $L_1 = \{\epsilon, \ldots\}$ and $L_2 = \{\ldots\}$ be two languages over $\Sigma$. Enumerate the words in the following languages, showing your calculations in some detail:

1. $L_3 = L_1 \ldots L_2$
2. $L_4 = L_1 \{\epsilon\} (L_2 \ldots L_1)$
3. $L_5 = L_3 \ldots L_4$
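The language operations of this section can be tried out on finite languages with a short Python sketch. Languages are modelled as Python sets of strings, with the empty string standing for the empty word. The names `concat`, `lang_power`, and `star_up_to` are our own; since the Kleene star of most languages is infinite, `star_up_to` only computes the union of the powers up to a chosen bound.

```python
from typing import Set

Language = Set[str]  # a finite language; "" plays the role of the empty word


def concat(m: Language, n: Language) -> Language:
    """MN = { uv | u in M and v in N }."""
    return {u + v for u in m for v in n}


def lang_power(l: Language, n: int) -> Language:
    """L^n, with L^0 = {empty word} for every language, including the empty one."""
    result: Language = {""}
    for _ in range(n):
        result = concat(result, l)
    return result


def star_up_to(l: Language, max_power: int) -> Language:
    """Finite approximation of L*: the union of L^0 .. L^max_power."""
    words: Language = set()
    for n in range(max_power + 1):
        words |= lang_power(l, n)
    return words


if __name__ == "__main__":
    print(sorted(concat({"", "a", "aa"}, {"b", "c"})))
    # Concatenation does not distribute through intersection:
    L, M, N = {"", "a"}, {""}, {"a"}
    print(concat(L, M & N), concat(L, M) & concat(L, N))
    print(sorted(star_up_to({"a", "ab"}, 2), key=lambda w: (len(w), w)))
```

The second printed line shows the two sides of the counterexample differing: the empty language on the left, a one-word language on the right.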

3 Finite Automata

Finite automata correspond to a computer with a fixed finite amount of memory [4]. We will introduce deterministic finite automata (DFA) first and then move to nondeterministic finite automata (NFA). An automaton will accept certain words (sequences of symbols of a given alphabet $\Sigma$) and reject others. The set of accepted words is called the language of the automaton. We will show that the class of languages that are accepted by DFAs and NFAs is the same.

3.1 Deterministic finite automata

3.1.1 What is a DFA?

A deterministic finite automaton (DFA) $A = (Q, \Sigma, \delta, q_0, F)$ is given by:

1. A finite set of states $Q$
2. A finite set of input symbols, the alphabet, $\Sigma$
3. A transition function $\delta \in Q \times \Sigma \to Q$
4. An initial state $q_0 \in Q$
5. A set of final states $F \subseteq Q$

The initial state is sometimes called the start state, and the final states are sometimes called accepting states.

As an example consider the following automaton

$$D = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta_D, q_0, \{q_2\})$$

where

$$\delta_D = \{((q_0, 0), q_1), ((q_0, 1), q_0), ((q_1, 0), q_1), ((q_1, 1), q_2), ((q_2, 0), q_2), ((q_2, 1), q_2)\}$$

if we view a function as a set of argument-result pairs. Alternatively, we can define it case by case:

$$\delta_D(q_0, 0) = q_1 \qquad \delta_D(q_0, 1) = q_0$$
$$\delta_D(q_1, 0) = q_1 \qquad \delta_D(q_1, 1) = q_2$$
$$\delta_D(q_2, 0) = q_2 \qquad \delta_D(q_2, 1) = q_2$$

A DFA may be more conveniently represented by a transition table. The transition table for the DFA $D$ is:

    $\delta_D$     0        1
    → $q_0$     $q_1$    $q_0$
      $q_1$     $q_1$    $q_2$
    * $q_2$     $q_2$    $q_2$

A transition table represents the transition function $\delta$ of a DFA; i.e., the value of $\delta(q, x)$ is given by the row labelled $q$ in the column labelled $x$. In addition, the initial state is identified by putting an arrow → to the left of it, and all final states are similarly identified by a star *. The inclusion of this additional information makes a transition table a self-contained representation of a DFA. Note that the initial state also can be final (accepting).

[4] However, that does not mean that finite automata are a good model of general purpose computers. A computer with $n$ bits of memory has $2^n$ possible states. That is an absolutely enormous number even for very modest memory sizes, say 1024 bits or more, meaning that describing a computer using finite automata quickly becomes infeasible. We will encounter a better model of computers later, the Turing Machines.
For example, for a variation $D'$ of the DFA $D$ where $q_0$ also is final:

    $\delta_{D'}$    0        1
    →* $q_0$     $q_1$    $q_0$
       $q_1$     $q_1$    $q_2$
     * $q_2$     $q_2$    $q_2$

Another way to represent a DFA is through a transition diagram. The transition diagram for the DFA $D$ is:

[Transition diagram of $D$.]

The initial state is identified by an incoming arrow. Final states are drawn with a double outline. If $\delta(q, x) = q'$ then there is an arrow from state $q$ to $q'$ that is labelled $x$. For another example, here is the transition diagram for the DFA $D'$:

[Transition diagram of $D'$.]

An alternative to the double outline for a final state is to use an outgoing arrow. Using that convention, the transition diagram for the DFA $D$ is:

[Transition diagram of $D$, with an outgoing arrow marking the final state.]

Here is an example of a larger DFA over the alphabet $\Sigma = \{a, b, c\}$ represented by a transition diagram:

[Transition diagram of a DFA with states A, B, C, D, E, F, G.]

The states are named by capital letters this time for a bit of variation: $Q = \{A, B, C, D, E, F, G\}$. While it is common to use symbols $q_i$, $i \in \mathbb{N}$, to name states, we can pick any names we like. Another common choice is to use natural numbers; i.e., $Q \subset \mathbb{N}$ where $Q$ is finite. The representation of the above DFA as a transition table is:

    $\delta$   a   b   c
    A          B   C   A
    B          B   D   A
    C          E   C   A
    D          E   C   F
    E          B   D   F
    F          B   C   G
    G          B   C   F

3.1.2 The language of a DFA

We will now discuss how a DFA accepts or rejects words over its alphabet of input symbols. The set of words accepted by a DFA $A$ is called the language $L(A)$ of the DFA. Thus, for a DFA $A$ with alphabet $\Sigma$, $L(A) \subseteq \Sigma^*$.

To determine whether a word $w \in L(A)$, the machine starts in its initial state. Taking the DFA $D$ above as an example, it would start in state $q_0$. We indicate the state of a DFA by underlining the state name:

[Transition diagram of $D$ with $q_0$ marked.]

Then, whenever an input symbol is read from $w$, the machine transitions to a new state by following the edge labelled with this symbol. Once all symbols in the input word $w$ have been read, the word is accepted if the state is final, meaning $w \in L(A)$; otherwise the word is rejected, meaning $w \notin L(A)$.

To continue with the example, suppose $w = 101$. The machine $D$ would thus first read 1 and transition to a new state by following the edge labelled 1. As that edge in this case forms a loop back to state $q_0$, the machine $D$ would transition back into state $q_0$:

[Transition diagram of $D$ with $q_0$ marked.]

The machine would then read 0 and transition to state $q_1$ by following the edge labelled 0. We indicate this by moving the mark along that edge to $q_1$:

[Transition diagram of $D$ with $q_1$ marked.]

Finally, the machine would read the last 1 in the input word, moving to $q_2$:

[Transition diagram of $D$ with $q_2$ marked.]

As $q_2$ is a final state, the DFA $D$ accepts the word $w = 101$, meaning $101 \in L(D)$. In the same way, we can determine that $0 \notin L(D)$, $110 \notin L(D)$, but $011 \in L(D)$. Verify this. Indeed, a little bit of thought reveals that

$$L(D) = \{w \mid w \text{ contains the substring } 01\}$$

To make the notion of the language of a DFA precise, we now give a formal definition of $L(A)$. First we define the extended transition function $\hat{\delta} \in Q \times \Sigma^* \to Q$. Intuitively, $\hat{\delta}(q, w) = q'$ if the machine starting from state $q$ ends up in state $q'$ when reading the word $w$. Formally, $\hat{\delta}$ is defined by primitive recursion:

$$\hat{\delta}(q, \epsilon) = q \qquad (1)$$
$$\hat{\delta}(q, xw) = \hat{\delta}(\delta(q, x), w) \qquad (2)$$

where $x \in \Sigma$ and $w \in \Sigma^*$.
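The recursive definition of the extended transition function translates almost line for line into code. The following Python sketch assumes the DFA D of section 3.1.1 over the alphabet {0, 1}, with the transitions spelled out in `DELTA_D`; the names `DELTA_D`, `delta_hat`, and `accepts` are ours, chosen for illustration.

```python
from typing import Dict, Set, Tuple

State = str
Symbol = str

# Assumed transition function of D, viewed as a set of argument-result pairs.
DELTA_D: Dict[Tuple[State, Symbol], State] = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}


def delta_hat(delta: Dict[Tuple[State, Symbol], State], q: State, w: str) -> State:
    """Extended transition function, following equations (1) and (2)."""
    if w == "":                      # (1): reading the empty word changes nothing
        return q
    x, rest = w[0], w[1:]            # split a nonempty word into xw
    return delta_hat(delta, delta[(q, x)], rest)  # (2)


def accepts(delta: Dict[Tuple[State, Symbol], State], q0: State,
            finals: Set[State], w: str) -> bool:
    """w is in L(A) exactly when delta_hat(q0, w) is a final state."""
    return delta_hat(delta, q0, w) in finals


if __name__ == "__main__":
    for word in ["101", "011", "0", "110"]:
        print(word, accepts(DELTA_D, "q0", {"q2"}, word))
```

A dictionary lookup fails with a `KeyError` on a symbol outside the alphabet, which matches the fact that the transition function is only defined on $Q \times \Sigma$.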
Thus, $xw$ stands for a nonempty word the first symbol of which is $x$ and the rest of which is $w$. For example, if $xw = 010$, then $x = 0$ and $w = 10$. Note that $w$ may be empty; e.g., if $xw = 0$, then $x = 0$ and $w = \epsilon$.

As an example, we calculate $\hat{\delta}_D(q_0, 101) = q_2$:

    $\hat{\delta}_D(q_0, 101)$
    $= \hat{\delta}_D(\delta_D(q_0, 1), 01)$          by (2)
    $= \hat{\delta}_D(q_0, 01)$                       because $\delta_D(q_0, 1) = q_0$
    $= \hat{\delta}_D(\delta_D(q_0, 0), 1)$           by (2)
    $= \hat{\delta}_D(q_1, 1)$                        because $\delta_D(q_0, 0) = q_1$
    $= \hat{\delta}_D(\delta_D(q_1, 1), \epsilon)$    by (2)
    $= \hat{\delta}_D(q_2, \epsilon)$                 because $\delta_D(q_1, 1) = q_2$
    $= q_2$                                           by (1)

Using the extended transition function $\hat{\delta}$, we define the language $L(A)$ of a DFA $A$ formally:

$$L(A) = \{w \mid \hat{\delta}(q_0, w) \in F\}$$

Returning to our example, we thus have that $101 \in L(D)$ because $\hat{\delta}_D(q_0, 101) = q_2$ and $q_2 \in F_D$.

3.2 Nondeterministic finite automata

3.2.1 What is an NFA?

Nondeterministic finite automata (NFA) have transition functions that map a given state and an input symbol to zero or more successor states. We can think of this as the machine having a choice whenever there are two or more possible transitions from a state on an input symbol. In this presentation, we will further allow an NFA to have more than one initial state [5]. Again, we can think of this as the machine having a choice of where to start. An NFA accepts a word $w$ if there is at least one possible way to get from one of the initial states to one of the final states along edges labelled with the symbols of $w$ in order.

[5] Note that we diverge slightly from the definition in the book [HMU], which uses a single initial state instead of a set of initial states. Permitting more than one initial state allows us to avoid introducing $\epsilon$-NFAs (see [HMU], section 2.5).

It is important to note that although an NFA has a nondeterministic transition function, it can always be determined whether or not a word belongs to its language. Indeed, we shall see that every NFA can be translated into a DFA that accepts the same language.

Here is an example of an NFA $C$ that accepts all words over $\Sigma = \{0, 1\}$ such that the symbol before the last is 1:

[Transition diagram of $C$.]

A nondeterministic finite automaton (NFA) $A = (Q, \Sigma, \delta, S, F)$ is given by:

1. A finite set of states $Q$,
2. A finite set of input symbols, the alphabet, $\Sigma$,
3. A transition function $\delta \in Q \times \Sigma \to \mathcal{P}(Q)$,
4. A set of initial states $S \subseteq Q$,
5. A set of final (or accepting) states $F \subseteq Q$.

Thus, in contrast to a DFA, an NFA may have many initial states, not just one, and its transition function maps a state and an input symbol to a set of possible successor states, not just a single state. As an example we have that

$$C = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta_C, \{q_0\}, \{q_2\})$$

where $\delta_C$ is given by

    $\delta_C$       0              1
    → $q_0$     $\{q_0\}$     $\{q_0, q_1\}$
      $q_1$     $\{q_2\}$     $\{q_2\}$
    * $q_2$     $\emptyset$   $\emptyset$

Note that the entries in the table are sets of states, and that these sets may be empty ($\emptyset$), here exemplified by the entries for state $q_2$. Again, the (in this case only) initial state has been marked with → and the (in this case only) final state marked with * to make this a self-contained representation of the NFA.

Here is another example of an NFA, this time over the alphabet $\Sigma = \{a, b, c\}$ and with states $Q = \{0, 1, 2, 3, 4, 5\} \subset \mathbb{N}$:
[Transition diagram of the NFA with states 0 to 5.]

The transition table for this NFA is: δ {} {2} {4} {3, 4} 2 {3, 4} {4} 3 {} {2} {3} 4 {5} 5 {4}

Note that this NFA has multiple initial states, multiple final states, one initial state that also is final, and that in some cases there are no possible successor states and in other cases more than one.

3.2.2 The language accepted by an NFA

To see whether a word $w \in \Sigma^*$ is accepted by an NFA $A$, we have to consider all possible states the machine could be in after having read a sequence of input symbols. Initially, an NFA can be in any of its initial states. Each time an input symbol is read, all successor states on the read symbol for each current possible state become the new possible states. After having read a complete word $w$, if at least one of the possible states is final (accepting), then that word is accepted, meaning $w \in L(A)$; otherwise it is rejected, meaning $w \notin L(A)$.

We will illustrate by showing how the NFA $C$ rejects the word 100. We will again mark the current states of the NFA by underlining the state names, but this time there may be more than one marked state at once. Initially, as $q_0$ is the only initial state, we would have:

[Transition diagram of $C$ with $q_0$ marked.]

Each time we read a symbol we look at all the marked states. We remove the old markers and put markers at all the states that are reachable via an arrow marked with the current input symbol. This may include one or more states that were marked previously. It may also be the case that no states are reachable, in which case all marks are removed and the word rejected (as it is no longer possible to reach any final state). In our example, after reading 1, there would be two marked states as there are two arrows from $q_0$ labelled 1:

[Transition diagram of $C$ with $q_0$ and $q_1$ marked.]

After reading 0, the next symbol in the word, there would still be two marked states, as the machine on input 0 can reach $q_0$ from $q_0$ and $q_2$ from $q_1$:

[Transition diagram of $C$ with $q_0$ and $q_2$ marked.]

Note that one of the marked states is a final (accepting) state, meaning the word read so far (10) is accepted by the NFA. However, there is one symbol left in our example word, and after having read the final 0, the final state would no longer be marked because it cannot be reached from any of the marked states:

[Transition diagram of $C$ with only $q_0$ marked.]

The NFA $C$ thus rejects the word 100.

For another example, consider the NFA at the end of section 3.2.1. Convince yourself that you understand which words this NFA accepts (the empty word $\epsilon$, for instance) and which it rejects. We illustrate by tracing its operation on a word that it rejects. We start by marking all initial states. Then it is just a matter of systematically exploring all possibilities:

[Sequence of transition diagrams of the NFA showing the marked states initially and after reading each symbol of the word.]

The machine thus rejects the word as no final state is marked. In fact, as there are no marked states left at all, this shows that this NFA will reject all words that start with this word. Can you find other such prefixes?

To define the extended transition function $\hat\delta$ for NFAs we use a generalisation of the union operation on sets over a (finite) set of sets:

\[ \bigcup \{A_1, A_2, \ldots, A_n\} = A_1 \cup A_2 \cup \cdots \cup A_n \]

In the special cases of the empty set of sets and a one-element set of sets:

\[ \bigcup \emptyset = \emptyset \qquad \bigcup \{A\} = A \]

As an example:

\[ \bigcup \{\{1\},\{2,3\},\{1,3\}\} = \{1\} \cup \{2,3\} \cup \{1,3\} = \{1,2,3\} \]

Alternatively, we can define $\bigcup$ by comprehension, which also extends the operation to infinite sets of sets (although we don't need this here):

\[ \bigcup B = \{ x \mid \exists A \in B.\ x \in A \} \]
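The marking procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the notes: the names `DELTA_C`, `step`, `run` and `accepts` are made up for the example, the NFA is the automaton $C$ from section 3.2.1, and table entries that are the empty set are simply left out of the dictionary.

```python
# Transition table of the NFA C (section 3.2.1).
# Assumption of this sketch: a missing entry stands for the empty set.
DELTA_C = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
INITIAL_C = {"q0"}
FINAL_C = {"q2"}


def step(delta, marked, symbol):
    """Move the markers: collect all successors of the marked states."""
    successors = set()
    for q in marked:
        successors |= delta.get((q, symbol), set())
    return successors


def run(delta, initial, word):
    """Return the set of possible states after reading the whole word."""
    marked = set(initial)
    for symbol in word:
        marked = step(delta, marked, symbol)
    return marked


def accepts(delta, initial, final, word):
    """A word is accepted if at least one possible state is final."""
    return bool(run(delta, initial, word) & final)


print(run(DELTA_C, INITIAL_C, "100"))               # possible states after 100
print(accepts(DELTA_C, INITIAL_C, FINAL_C, "10"))   # the prefix 10
print(accepts(DELTA_C, INITIAL_C, FINAL_C, "100"))  # the full word 100
```

Representing the marked states as a Python `set` mirrors the idea that the machine is in a set of possible states at each point, rather than in a single state.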

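We define $\hat\delta \in \mathcal{P}(Q) \times \Sigma^* \to \mathcal{P}(Q)$ such that $\hat\delta(P,w)$ is the set of states that are reachable from one of the states in $P$ on the word $w$:

\[ \hat\delta(P,\epsilon) = P \tag{3} \]
\[ \hat\delta(P,xw) = \hat\delta\left(\bigcup\{\delta(q,x) \mid q \in P\},\ w\right) \tag{4} \]

where $x \in \Sigma$ and $w \in \Sigma^*$. Intuitively, if $P$ are the possible states, then $\hat\delta(P,w)$ are the possible states after having read a word $w$.

To illustrate, we calculate $\hat\delta_C(\{q_0\},100)$:

\[
\begin{array}{rll}
\hat\delta_C(\{q_0\},100) & = \hat\delta_C(\bigcup\{\delta_C(q,1) \mid q \in \{q_0\}\},\ 00) & \text{by (4)} \\
 & = \hat\delta_C(\delta_C(q_0,1),\ 00) & \\
 & = \hat\delta_C(\{q_0,q_1\},\ 00) & \\
 & = \hat\delta_C(\bigcup\{\delta_C(q,0) \mid q \in \{q_0,q_1\}\},\ 0) & \text{by (4)} \\
 & = \hat\delta_C(\delta_C(q_0,0) \cup \delta_C(q_1,0),\ 0) & \\
 & = \hat\delta_C(\{q_0\} \cup \{q_2\},\ 0) & \\
 & = \hat\delta_C(\{q_0,q_2\},\ 0) & \\
 & = \hat\delta_C(\bigcup\{\delta_C(q,0) \mid q \in \{q_0,q_2\}\},\ \epsilon) & \text{by (4)} \\
 & = \hat\delta_C(\delta_C(q_0,0) \cup \delta_C(q_2,0),\ \epsilon) & \\
 & = \hat\delta_C(\{q_0\} \cup \emptyset,\ \epsilon) & \\
 & = \hat\delta_C(\{q_0\},\ \epsilon) & \\
 & = \{q_0\} & \text{by (3)}
\end{array}
\]

Of course, we already knew this from the worked example above illustrating how the NFA $C$ rejects 100. Make sure you see how the marked states after each step coincide with the set of possible states in the calculation.

The language of an NFA can now be defined using $\hat\delta$:

\[ L(A) = \{ w \mid \hat\delta(S,w) \cap F \neq \emptyset \} \]

Thus, $100 \notin L(C)$ because

\[ \hat\delta_C(S_C,100) \cap F_C = \hat\delta_C(\{q_0\},100) \cap \{q_2\} = \{q_0\} \cap \{q_2\} = \emptyset \]

3.2.3 The subset construction

DFAs can be viewed as a special case of NFAs; i.e., those for which there is precisely one start state ($S = \{q_0\}$) and for which the transition function always returns singleton (one-element) sets ($\delta(q,x) = \{q'\}$ for all $q \in Q$ and $x \in \Sigma$).

The opposite is also true, however: NFAs are really just DFAs in disguise. We show this by systematically constructing, for a given NFA, an equivalent DFA; i.e., a DFA that accepts the same language as the given NFA. NFAs are thus no more powerful than DFAs; i.e., NFAs cannot describe more languages than DFAs. However, in some cases, NFAs need a lot fewer states than the corresponding DFA, and they may be easier to construct in the first place.

The subset construction: Given an NFA $A = (Q,\Sigma,\delta,S,F)$ we construct the equivalent DFA:

\[ D(A) = (\mathcal{P}(Q), \Sigma, \delta_{D(A)}, S, F_{D(A)}) \]

where

\[ \delta_{D(A)}(P,x) = \bigcup\{\delta(q,x) \mid q \in P\} \tag{5} \]
\[ F_{D(A)} = \{ P \in \mathcal{P}(Q) \mid P \cap F \neq \emptyset \} \tag{6} \]

The basic idea of this construction is to define a DFA whose states are sets of NFA states. A set of possible NFA states thus becomes a single DFA state. The DFA transition function is given by considering all reachable NFA states for each of the current possible NFA states for each input symbol. The resulting set of possible NFA states is again just a single DFA state. A DFA state is final if that set contains at least one final NFA state.

As an example, let us construct a DFA $D(C)$ equivalent to $C$ above:

\[ D(C) = (\mathcal{P}(\{q_0,q_1,q_2\}), \{0,1\}, \delta_{D(C)}, \{q_0\}, F_{D(C)}) \]

where $\delta_{D(C)}$ is given by:

\[
\begin{array}{r|ll}
\delta_{D(C)} & 0 & 1 \\ \hline
\emptyset & \emptyset & \emptyset \\
\to \{q_0\} & \{q_0\} & \{q_0,q_1\} \\
\{q_1\} & \{q_2\} & \{q_2\} \\
{*}\, \{q_2\} & \emptyset & \emptyset \\
\{q_0,q_1\} & \{q_0,q_2\} & \{q_0,q_1,q_2\} \\
{*}\, \{q_0,q_2\} & \{q_0\} & \{q_0,q_1\} \\
{*}\, \{q_1,q_2\} & \{q_2\} & \{q_2\} \\
{*}\, \{q_0,q_1,q_2\} & \{q_0,q_2\} & \{q_0,q_1,q_2\}
\end{array}
\]

and $F_{D(C)}$ (all the states marked with $*$ above) by:

\[ F_{D(C)} = \{\{q_2\}, \{q_0,q_2\}, \{q_1,q_2\}, \{q_0,q_1,q_2\}\} \]

The transition diagram is:

[Transition diagram of $D(C)$ with all eight states.]

Accepting states have been marked by outgoing arrows. Note that some of the states ($\emptyset$, $\{q_1\}$, $\{q_2\}$, $\{q_1,q_2\}$) cannot be reached from the initial state. This means that they can be omitted without changing the language. We thus obtain the following automaton:

[Transition diagram of $D(C)$ restricted to the reachable states $\{q_0\}$, $\{q_0,q_1\}$, $\{q_0,q_2\}$, $\{q_0,q_1,q_2\}$.]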
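Equations (5) and (6) translate almost directly into code. The following Python sketch applies them to the NFA $C$; it is an illustration under assumptions rather than a reference implementation: the function names `powerset` and `subset_construction` are invented for this example, DFA states are represented as `frozenset`s of NFA state names, and empty table entries are omitted from the dictionary.

```python
from itertools import chain, combinations

# The NFA C from section 3.2.1 (missing entries stand for the empty set).
NFA_STATES = ["q0", "q1", "q2"]
ALPHABET = ["0", "1"]
DELTA_C = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
INITIAL_C = {"q0"}
FINAL_C = {"q2"}


def powerset(states):
    """All subsets of `states` as frozensets (hashable, so usable as keys)."""
    subsets = chain.from_iterable(
        combinations(states, r) for r in range(len(states) + 1)
    )
    return [frozenset(s) for s in subsets]


def subset_construction(states, alphabet, delta, initial, final):
    """Build D(A) with all 2^|Q| states, reachable or not."""
    dfa_states = powerset(states)
    dfa_delta = {}
    for P in dfa_states:
        for x in alphabet:
            successors = set()
            for q in P:                      # equation (5): union over q in P
                successors |= delta.get((q, x), set())
            dfa_delta[(P, x)] = frozenset(successors)
    # equation (6): P is final if it contains at least one final NFA state
    dfa_final = {P for P in dfa_states if P & final}
    return dfa_states, dfa_delta, frozenset(initial), dfa_final


D_STATES, D_DELTA, D_START, D_FINAL = subset_construction(
    NFA_STATES, ALPHABET, DELTA_C, INITIAL_C, FINAL_C
)
print(len(D_STATES))                                  # number of DFA states
print(sorted(D_DELTA[(frozenset({"q0", "q1"}), "0")]))
```

The resulting `D_DELTA` can be compared row by row with the transition table for $D(C)$ above.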

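It is possible to avoid having to perform calculations for states that cannot be reached by carrying out the subset construction in a demand-driven way. The idea is to start from the initial DFA state, which is just the set of initial NFA states $S$, and then only consider the DFA states (subsets of NFA states) that appear during the course of the calculations.

We illustrate this approach by an example. Consider the following NFA $N$:

\[ N = (Q_N = \{q_0,q_1,q_2,q_3,q_4\},\ \Sigma_N = \{0,1,2\},\ \delta_N,\ S_N = \{q_0\},\ F_N = \{q_4\}) \]

where $\delta_N$ is given by the transition diagram:

[Transition diagram of $N$.]

Note that $N$ has 5 states, which means that the DFA $D(N)$ has $|\mathcal{P}(Q_N)| = 2^5 = 32$ states. However, as we will see, only a handful of those 32 states can actually be reached from the initial state $S_N$ of $D(N)$. We would thus waste quite a bit of effort if we were to tabulate all of them.

We start from $S_N = \{q_0\}$, the set of start states of $N$, and we compute $\bigcup\{\delta(q,x) \mid q \in S_N\}$ for each $x \in \Sigma_N$ (equation (5)). In this case we get:

\[
\begin{array}{r|lll}
\delta_{D(N)} & 0 & 1 & 2 \\ \hline
\to \{q_0\} & \{q_2\} & \{q_1,q_3\} & \emptyset
\end{array}
\]

Whenever we encounter a state $P \subseteq Q$ of $D(N)$ that has not been considered before, we add $P$ to the table, marking any final states as such. In this case, three new DFA states emerge ($\{q_2\}$, $\{q_1,q_3\}$, and $\emptyset$), none of which is final; each is added as a new, still empty, row of the table.

We then proceed to tabulate $\delta_{D(N)}$ for each of the new states for each $x \in \Sigma$, adding any further new states to the table:

\[
\begin{array}{r|lll}
\delta_{D(N)} & 0 & 1 & 2 \\ \hline
\to \{q_0\} & \{q_2\} & \{q_1,q_3\} & \emptyset \\
\{q_2\} & \multicolumn{3}{l}{\{q_0\},\ \{q_0\},\ \emptyset \text{ (column order unclear)}} \\
\{q_1,q_3\} & \emptyset \cup \{q_4\} = \{q_4\} & \{q_0\} \cup \emptyset = \{q_0\} & \{q_0\} \cup \{q_4\} = \{q_0,q_4\} \\
\emptyset & \emptyset & \emptyset & \emptyset
\end{array}
\]

Here, two new states emerge ($\{q_4\}$ and $\{q_0,q_4\}$), both final (because $\{q_4\} \cap F_N \neq \emptyset$ and $\{q_0,q_4\} \cap F_N \neq \emptyset$); they are added as new rows in turn. This process is repeated until no new states emerge. Tabulating for the last two new states reveals that no further states emerge in this case and we are thus done, having only had to tabulate for 6 reachable out of the 32 DFA states:

\[
\begin{array}{r|lll}
\delta_{D(N)} & 0 & 1 & 2 \\ \hline
\to \{q_0\} & \{q_2\} & \{q_1,q_3\} & \emptyset \\
\{q_2\} & \multicolumn{3}{l}{\{q_0\},\ \{q_0\},\ \emptyset \text{ (column order unclear)}} \\
\{q_1,q_3\} & \emptyset \cup \{q_4\} = \{q_4\} & \{q_0\} \cup \emptyset = \{q_0\} & \{q_0\} \cup \{q_4\} = \{q_0,q_4\} \\
\emptyset & \emptyset & \emptyset & \emptyset \\
{*}\, \{q_4\} & \emptyset & \emptyset & \emptyset \\
{*}\, \{q_0,q_4\} & \{q_2\} \cup \emptyset = \{q_2\} & \{q_1,q_3\} \cup \emptyset = \{q_1,q_3\} & \emptyset \cup \emptyset = \emptyset
\end{array}
\]

After double checking that we have not forgotten to mark any final states, we can draw the transition diagram for $D(N)$:

[Transition diagram of $D(N)$ with the six reachable states $\{q_0\}$, $\{q_2\}$, $\{q_1,q_3\}$, $\emptyset$, $\{q_4\}$, $\{q_0,q_4\}$.]

Accepting states have been marked by outgoing arrows.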
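The demand-driven variant amounts to a worklist algorithm: start from the initial DFA state and only tabulate rows for states that actually turn up. Below is a minimal Python sketch of that idea. It is run on the smaller NFA $C$ from section 3.2.1 rather than on $N$, and the names `reachable_subset_construction` and `dfa_accepts` are hypothetical helpers introduced only for this illustration.

```python
from collections import deque

# The NFA C from section 3.2.1 (missing entries stand for the empty set).
ALPHABET = ["0", "1"]
DELTA_C = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
INITIAL_C = {"q0"}
FINAL_C = {"q2"}


def reachable_subset_construction(alphabet, delta, initial, final):
    """Tabulate only those DFA states reachable from the initial one."""
    start = frozenset(initial)
    table = {}                      # DFA state -> {symbol: DFA state}
    worklist = deque([start])
    while worklist:
        P = worklist.popleft()
        if P in table:              # row already tabulated
            continue
        table[P] = {}
        for x in alphabet:
            # equation (5): union of the successor sets of all q in P
            target = frozenset().union(*(delta.get((q, x), set()) for q in P))
            table[P][x] = target
            if target not in table:
                worklist.append(target)   # a new state has emerged
    accepting = {P for P in table if P & final}   # equation (6)
    return start, table, accepting


def dfa_accepts(start, table, accepting, word):
    """Run the constructed DFA on a word."""
    state = start
    for x in word:
        state = table[state][x]
    return state in accepting


START, TABLE, ACCEPTING = reachable_subset_construction(
    ALPHABET, DELTA_C, INITIAL_C, FINAL_C
)
print(len(TABLE))                                   # reachable DFA states
print(dfa_accepts(START, TABLE, ACCEPTING, "10"))
print(dfa_accepts(START, TABLE, ACCEPTING, "100"))
```

For $C$ this tabulates only the four reachable states identified earlier instead of all eight subsets.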

3.2.4 Correctness of the subset construction

We still have to convince ourselves that the subset construction actually works; i.e., that for a given NFA $A$ it really is the case that $L(A) = L(D(A))$. We start by proving the following lemma, which says that the extended transition functions coincide:

Lemma 3.1

\[ \hat\delta_{D(A)}(P,w) = \hat\delta_A(P,w) \]

The result of both functions is a set of states of the NFA $A$: for the left-hand side because the states of $D(A)$ are sets of states of $A$, and for the right-hand side because the extended transition function on NFAs returns a set of states.

Proof. We show this by induction over the length of the word $w$, $|w|$.

$|w| = 0$: Then $w = \epsilon$ and we have

\[
\begin{array}{rll}
\hat\delta_{D(A)}(P,\epsilon) & = P & \text{by (1)} \\
 & = \hat\delta_A(P,\epsilon) & \text{by (3)}
\end{array}
\]

$|w| = n+1$: Then $w = xv$ with $|v| = n$.

\[
\begin{array}{rll}
\hat\delta_{D(A)}(P,xv) & = \hat\delta_{D(A)}(\delta_{D(A)}(P,x),\ v) & \text{by (2)} \\
 & = \hat\delta_A(\delta_{D(A)}(P,x),\ v) & \text{ind. hyp.} \\
 & = \hat\delta_A(\bigcup\{\delta_A(q,x) \mid q \in P\},\ v) & \text{by (5)} \\
 & = \hat\delta_A(P,xv) & \text{by (4)}
\end{array}
\]

We can now use the lemma to show

Theorem 3.2

\[ L(A) = L(D(A)) \]

Proof.

\[
\begin{array}{ll}
 & w \in L(A) \\
\Leftrightarrow & \{\ \text{Definition of } L(A) \text{ for NFAs}\ \} \\
 & \hat\delta_A(S,w) \cap F \neq \emptyset \\
\Leftrightarrow & \{\ \text{Lemma 3.1}\ \} \\
 & \hat\delta_{D(A)}(S,w) \cap F \neq \emptyset \\
\Leftrightarrow & \{\ \text{Definition of } F_{D(A)}\ \} \\
 & \hat\delta_{D(A)}(S,w) \in F_{D(A)} \\
\Leftrightarrow & \{\ \text{Definition of } L(A) \text{ for DFAs}\ \} \\
 & w \in L(D(A))
\end{array}
\]

Corollary 3.3 NFAs and DFAs recognise the same class of languages.

Proof. We have noticed that DFAs are just a special case of NFAs. On the other hand, the subset construction introduced above shows that for every NFA we can find a DFA that recognises the same language.

3.3 Exercises

Exercise 3.1
Let the alphabet Σ = {,} and consider the following DFA A:

A = (Q_A = {0,1,2,3}, Σ, δ_A, q = , F_A = {,2})
δ_A = {((,),), ((,),2), ((,),), ((,),3), ((2,),3), ((2,),), ((3,),2), ((3,),)}

(Here tuple notation is used to define the mapping of the transition function δ_A; thus δ_A(,) = , δ_A(,) = 2, etc.) For the DFA A:

1. Draw its transition diagram.
2. Determine which of the following words belong to L(A): 1. ε 2. 3. 4.
3. Explicitly calculate ˆδ_A(,).
4. Describe the language that the automaton recognises in English.

Exercise 3.2
Construct a DFA $B$ over $\Sigma = \{a,b,c,d\}$ accepting all words where the number of s is a multiple of 3. E.g. d ∈ L(B) (3 s), but dd ∉ L(B) (4 s; 4 is not a multiple of 3). Explain your construction. In particular, explain why you chose to have the number of states you did, and explain the purpose (or "meaning") of each state.

Exercise 3.3
For the alphabet $\Sigma = \{0,1,2,3\}$, construct a DFA $C$ that precisely recognizes the words for which the arithmetic sum of the constituent symbols is divisible by 5. For example, $\epsilon \in L(C)$ (there are no symbols in the empty string, the sum is thus 0, which is divisible by 5), a word consisting only of 0s is in $L(C)$ (the sum is again 0), and $23131 \in L(C)$ ($2+3+1+3+1 = 10$, which is divisible by 5), but $133 \notin L(C)$ ($1+3+3 = 7$, which is not divisible by 5). Explain your construction.

Exercise 3.4
Consider the following NFA $A$ over $\Sigma_A = \{a,b,c\}$:

[Transition diagram of $A$, with states $q_0$ to $q_4$.]

1. Which of the following words are accepted by A and which are not? (a) ε (b) (c) (d) (e)
2. Construct a DFA $D(A)$ equivalent to $A$ using the subset construction. Clearly show each step of your calculations in a transition table. Hint: Some of the 32 states (i.e., the $2^{|Q_A|} = 2^5 = 32$ possible subsets of $Q_A$) that would arise by applying the subset construction blindly to $A$ may be unreachable. You can therefore adopt a strategy where you only consider states reachable from the initial states, $S_A$.
3. Draw the transition diagram for $D(A)$, ignoring unreachable states.

Exercise 3.5
Consider the following NFA B over Σ_B = {,}:

[Transition diagram of $B$, with states $q_0$ to $q_4$.]

1. Construct a DFA $D(B)$ equivalent to $B$ using the subset construction and draw the transition diagram for $D(B)$, ignoring unreachable states. Clearly show each step of your calculations, e.g. in a transition table.
2. Carry out a sanity check on your resulting DFA $D(B)$ as follows.
   (a) Give two words over $\Sigma_B$ that are accepted by the NFA $B$ and two that are not. At least two of those should be four symbols long or longer.
   (b) Check that the DFA $D(B)$ accepts the first two words and rejects the other two, exactly like the NFA $B$. Justify your answer by listing the sequence of states the DFA $D(B)$ goes through for each word, and stating whether or not the last state of that sequence is accepting.

4 Regular Expressions

To recapitulate, given an alphabet $\Sigma$, a language is a set of words $L \subseteq \Sigma^*$. So far, we have described languages either using set theory (explicit enumeration or set comprehensions) or through finite automata. The key benefits of using automata are that they can describe infinite languages (unlike enumeration) and that they directly give a mechanical way to determine language membership (unlike comprehensions). However, from an automaton, it is not usually immediately obvious what the language of that automaton is, and conversely, given a high-level description of a language, it is often not obvious if it is possible to describe the language using a finite automaton.

This section introduces regular expressions: a concise and much more direct way to describe languages. Moreover, a regular expression can mechanically be translated into a finite automaton that accepts precisely the language described. This opens up many practical applications, as languages can both be described and recognised with ease. In fact, the opposite is also true: given a finite automaton, it is possible to translate that into an equivalent regular expression. Finite automata and regular expressions are thus interconvertible, meaning that they describe the exact same class of languages: the regular languages or, according to the Chomsky hierarchy, type 3 languages (section 1.1).

One application of regular expressions is to define patterns in programs such as grep. Given a regular expression and a sequence of text lines as input, grep returns those lines that match the regular expression, where matching means that the line contains a substring that is in the language denoted by the regular expression. The syntax used by grep for regular expressions is slightly different from the one used here, and grep further supports some convenient abbreviations. However, the underlying theory is exactly the same.

Other applications for regular expressions include defining the lexical syntax of programming languages; i.e., what basic symbols, or tokens, such as identifiers, keywords, and numeric literals look like, as well as other lexical aspects such as white space and comments. The context-free syntax (see section 7) of a programming language is then defined in terms of the tokens; i.e., the tokens effectively constitute the alphabet of the language.

In fact, regular expression matching has so many applications that many programming languages provide support for this capability, either built-in or via libraries. Examples include Perl, PHP, Python, and Java. In the past, some of those implementations were a bit naive as the regular expressions were not compiled into finite automata. As a result, matching could be very slow, as explained in the paper "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)" [Cox07]. This paper is a very good read, and once you have read these lecture notes up to and including the present section, you will be able to appreciate it fully.

4.1 What are regular expressions?

Given an alphabet $\Sigma$ (e.g., $\Sigma = \{a,b,c,\ldots,z\}$), the syntax (i.e., form) of regular expressions over $\Sigma$ is defined inductively as follows:

1. $\emptyset$ is a regular expression.
2. $\epsilon$ is a regular expression.
3. For each $x \in \Sigma$, $\mathbf{x}$ is a regular expression^6.
4. If $E$ and $F$ are regular expressions then $E + F$ is a regular expression.
5. If $E$ and $F$ are regular expressions then $EF$ (juxtapositioning; just one after the other) is a regular expression.
6. If $E$ is a regular expression then $E^*$ is a regular expression.
7. If $E$ is a regular expression then $(E)$ is a regular expression^7.

These are all regular expressions. To illustrate, here are some examples of regular expressions:

- $\epsilon$
- hallo
- hallo + hello
- h(a + e)llo
- $a^*b^*$
- $(\epsilon + b)(ab)^*(\epsilon + a)$

As in arithmetic, there are conventions for reading regular expressions:

- $*$ binds stronger than juxtapositioning and $+$. For example, $ab^*$ is read as $a(b^*)$. Parentheses must be used to enforce the reading $(ab)^*$.
- Juxtapositioning binds stronger than $+$. For example, $ab + cd$ is read as $(ab) + (cd)$. Parentheses must be used to enforce the reading $a(b + c)d$.

4.2 The meaning of regular expressions

In the previous section, we defined the syntax of regular expressions, their form. We now proceed to define the semantics of regular expressions; i.e., what they mean, what language a regular expression denotes.
To answer this question, first recall the definition of concatenation of languages from section 2:

\[ L_1 L_2 = \{ uv \mid u \in L_1 \wedge v \in L_2 \} \]

We further recall the Kleene star operation from the same section (2):

\[ L^* = \bigcup_{n=0}^{\infty} L^n \]

To each regular expression $E$ over $\Sigma$ we assign a language $L(E) \subseteq \Sigma^*$ as its meaning or semantics. We do this by induction over the definition of the syntax:

1. $L(\emptyset) = \emptyset$
2. $L(\epsilon) = \{\epsilon\}$
3. $L(\mathbf{x}) = \{x\}$ where $x \in \Sigma$.
4. $L(E + F) = L(E) \cup L(F)$
5. $L(EF) = L(E)L(F)$
6. $L(E^*) = L(E)^*$
7. $L((E)) = L(E)$

^6 Note that the regular expression here is typeset in boldface to distinguish it from the corresponding symbol, which is typeset in typewriter font in this and the next section (and on occasion later on as well). Underlining is sometimes used as an alternative to boldface.

^7 The parentheses have been typeset in boldface to emphasise that they are part of the syntax of the regular expression.

Subtle points:

- In (1), the symbol $\emptyset$ is used both as a regular expression and as the empty set (empty language).
- Similarly, $\epsilon$ in (2) is used in two ways: as a regular expression and as the empty word.
- In (3), the regular expression is typeset in boldface to distinguish it from the corresponding symbol.
- In (6), the $*$-operator is used both to construct a regular expression (part of the syntax) and as an operation on languages.
- In (7), the inner parentheses on the left-hand side, typeset in boldface, are part of the syntax of regular expressions.

Let us now calculate the meaning of each of the regular expression examples from the previous section; i.e., the language denoted in each case:

$\epsilon$: By (2):

\[ L(\epsilon) = \{\epsilon\} \]

hallo: Consider $L(ha)$. By (3):

\[ L(h) = \{h\} \qquad L(a) = \{a\} \]

Hence, by (5) and language concatenation (section 2):

\[ L(ha) = L(h)L(a) = \{uv \mid u \in L(h) \wedge v \in L(a)\} = \{uv \mid u \in \{h\} \wedge v \in \{a\}\} = \{ha\} \]

Continuing the same reasoning we obtain:

\[ L(hallo) = \{hallo\} \]

hallo + hello: From above we know $L(hallo) = \{hallo\}$ and $L(hello) = \{hello\}$. By (4) we then get:

\[ L(hallo + hello) = \{hallo\} \cup \{hello\} = \{hallo, hello\} \]

h(a + e)llo: By (3) and (4) we know $L(a + e) = \{a, e\}$. Thus, using (5) and language concatenation, we obtain:

\[
\begin{array}{rl}
L(h(a+e)llo) & = L(h)L(a+e)L(llo) \\
 & = \{uvw \mid u \in L(h) \wedge v \in L(a+e) \wedge w \in L(llo)\} \\
 & = \{uvw \mid u \in \{h\} \wedge v \in \{a,e\} \wedge w \in \{llo\}\} \\
 & = \{hallo, hello\}
\end{array}
\]

$a^*b^*$: By (6):

\[
\begin{array}{rl}
L(a^*) & = L(a)^* \\
 & = \{a\}^* \\
 & = \bigcup_{n=0}^{\infty} \{a\}^n \\
 & = \bigcup_{n=0}^{\infty} \{w_1 \ldots w_n \mid 1 \le i \le n,\ w_i \in \{a\}\} \\
 & = \bigcup_{n=0}^{\infty} \{a^n\} \\
 & = \{a^n \mid n \in \mathbb{N}\}
\end{array}
\]

Using (5) and language concatenation, this allows us to conclude:

\[
\begin{array}{rl}
L(a^*b^*) & = L(a^*)L(b^*) \\
 & = \{uv \mid u \in L(a^*) \wedge v \in L(b^*)\} \\
 & = \{uv \mid u \in \{a^m \mid m \in \mathbb{N}\} \wedge v \in \{b^n \mid n \in \mathbb{N}\}\} \\
 & = \{a^m b^n \mid m, n \in \mathbb{N}\}
\end{array}
\]

That is, $L(a^*b^*)$ is the set of all words that start with a (possibly empty) sequence of a's, followed by a (possibly empty) sequence of b's.

$(\epsilon + b)(ab)^*(\epsilon + a)$: Let us analyse the parts:

\[ L(\epsilon + b) = \{\epsilon, b\} \qquad L((ab)^*) = \{(ab)^n \mid n \in \mathbb{N}\} \qquad L(\epsilon + a) = \{\epsilon, a\} \]

Thus, we have:

\[ L((\epsilon + b)(ab)^*(\epsilon + a)) = \{u(ab)^n v \mid u \in \{\epsilon, b\} \wedge n \in \mathbb{N} \wedge v \in \{\epsilon, a\}\} \]

In English: $L((\epsilon + b)(ab)^*(\epsilon + a))$ is the set of (possibly empty) sequences of alternating a's and b's.
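The seven semantic equations can be read as a recursive function from syntax to languages. Since $L(E)$ may be infinite, the Python sketch below only computes the words of $L(E)$ up to a given length. It is an illustration under assumptions, not part of the notes: regular expressions are encoded as nested tuples, and the constructor names (`sym`, `alt`, `cat`, `star`, `word`) and the function `lang` are invented for this example. Parentheses need no case of their own, because grouping is already explicit in the tuple structure.

```python
from functools import reduce

# Hypothetical encoding of the syntax of regular expressions as tuples.
EMPTY = ("empty",)                      # the regular expression for the empty set
EPS = ("eps",)                          # the regular expression epsilon


def sym(x):
    return ("sym", x)                   # a single symbol x


def alt(e, f):
    return ("alt", e, f)                # E + F


def cat(e, f):
    return ("cat", e, f)                # EF


def star(e):
    return ("star", e)                  # E*


def word(s):
    """Juxtaposition of the symbols of s, e.g. word('llo')."""
    return reduce(cat, (sym(x) for x in s), EPS)


def lang(e, max_len):
    """All words of L(e) of length at most max_len."""
    kind = e[0]
    if kind == "empty":                                  # rule (1)
        return set()
    if kind == "eps":                                    # rule (2)
        return {""}
    if kind == "sym":                                    # rule (3)
        return {e[1]} if max_len >= 1 else set()
    if kind == "alt":                                    # rule (4)
        return lang(e[1], max_len) | lang(e[2], max_len)
    if kind == "cat":                                    # rule (5)
        left, right = lang(e[1], max_len), lang(e[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                                   # rule (6)
        base = lang(e[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                 # keep appending words until nothing new
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression: {e!r}")


H_AE_LLO = cat(cat(sym("h"), alt(sym("a"), sym("e"))), word("llo"))
A_STAR_B_STAR = cat(star(sym("a")), star(sym("b")))

print(sorted(lang(H_AE_LLO, 5)))
print(sorted(lang(A_STAR_B_STAR, 2)))
```

The same function can be used to spot-check algebraic identities on small examples, by comparing `lang` of both sides for a few length bounds; this is evidence, not a proof.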

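4.3 Algebraic laws

The semantics of regular expressions not only allows us to find out the meaning of specific regular expressions, but also allows us to prove useful laws about regular expressions in general. Let us illustrate by proving the following distributive law for regular expressions:

\[ E(F + G) = EF + EG \]

Note that $E$, $F$, $G$ are variables standing for some specific but arbitrary regular expressions, and that $=$ here is semantic (as opposed to syntactic) equality. That is, what we need to prove is that a regular expression of the form $E(F+G)$ and one of the form $EF + EG$ always have the same meaning, i.e., denote the same language. We thus start from $L(E(F+G))$ and show step by step that this is equal to $L(EF + EG)$ without making any assumptions about the constituent regular expressions $E$, $F$, and $G$, other than that their semantics is given by $L(E)$ etc.

\[
\begin{array}{ll}
 & L(E(F+G)) \\
= & \{\ \text{Semantics of r.e. (conc.)}\ \} \\
 & L(E)L(F+G) \\
= & \{\ \text{Semantics of r.e. (+)}\ \} \\
 & L(E)(L(F) \cup L(G)) \\
= & \{\ \text{Def. conc. of languages}\ \} \\
 & \{w_1 w_2 \mid w_1 \in L(E) \wedge w_2 \in (L(F) \cup L(G))\} \\
= & \{\ \text{Def. set union}\ \} \\
 & \{w_1 w_2 \mid w_1 \in L(E) \wedge w_2 \in \{w \mid w \in L(F) \vee w \in L(G)\}\} \\
= & \{\ \text{Duality between sets and predicates}\ \} \\
 & \{w_1 w_2 \mid w_1 \in L(E) \wedge (w_2 \in L(F) \vee w_2 \in L(G))\} \\
= & \{\ \text{Conjunction } (\wedge) \text{ distributes over disjunction } (\vee)\ \} \\
 & \{w_1 w_2 \mid (w_1 \in L(E) \wedge w_2 \in L(F)) \vee (w_1 \in L(E) \wedge w_2 \in L(G))\} \\
= & \{\ \text{Def. set union}\ \} \\
 & \{w_1 w_2 \mid w_1 \in L(E) \wedge w_2 \in L(F)\} \cup \{w_1 w_2 \mid w_1 \in L(E) \wedge w_2 \in L(G)\} \\
= & \{\ \text{Def. conc. languages (twice)}\ \} \\
 & L(E)L(F) \cup L(E)L(G) \\
= & \{\ \text{Semantics of r.e. (conc., twice)}\ \} \\
 & L(EF) \cup L(EG) \\
= & \{\ \text{Semantics of r.e. (+)}\ \} \\
 & L(EF + EG)
\end{array}
\]

Other laws for regular expressions can be proved similarly, although induction is sometimes needed. As an exercise, prove (some of) the following:

\[
\begin{array}{rcl}
\epsilon E & = & E \\
E \epsilon & = & E \\
\emptyset E & = & \emptyset \\
E \emptyset & = & \emptyset \\
E + (F + G) & = & (E + F) + G \\
E(FG) & = & (EF)G \\
(E^*)^* & = & E^* \\
\epsilon + EE^* & = & E^*
\end{array}
\]

4.4 Translating regular expressions into NFAs

Theorem 4.1 A regular expression $E$ can be translated into an equivalent NFA $N(E)$ such that $L(N(E)) = L(E)$.

We refer to this translation as the Graphical Construction. It is a variation of the standard way of translating regular expressions into NFAs known as Thompson's construction^8.

Proof. The proof is by induction on the syntax of regular expressions:

1. $N(\emptyset)$:

[Transition diagram of $N(\emptyset)$.]

which will reject everything (there are no final states). Thus:

\[ L(N(\emptyset)) = \emptyset = L(\emptyset) \]

2. $N(\epsilon)$:

[Transition diagram of $N(\epsilon)$.]

This automaton accepts the empty word but rejects everything else. Thus:

\[ L(N(\epsilon)) = \{\epsilon\} = L(\epsilon) \]

3. $N(\mathbf{x})$:

[Transition diagram of $N(\mathbf{x})$.]

This automaton only accepts the word $x$. Thus:

\[ L(N(\mathbf{x})) = \{x\} = L(\mathbf{x}) \]

4. $N(E + F)$: We merge the diagrams for $N(E)$ and $N(F)$ into one:

[Diagram: $N(E)$ and $N(F)$ side by side.]

^8 https://en.wikipedia.org/wiki/Thompson%27s_construction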

Intuitively, the NFAs for the subexpressions $E$ and $F$ are placed side by side. Thus if either of the NFAs accepts a word, the combined NFA accepts this word.

However, we have to ensure that the states of the constituent NFAs do not get confused with each other. We therefore have to use the disjoint union, defined as follows:

\[ A \uplus B = \{(0,a) \mid a \in A\} \cup \{(1,b) \mid b \in B\} \]

That is, each element of the disjoint union is tagged with an index that shows from which of the two sets it originated. Thus the elements of the constituent sets will remain distinct. The transition function of the combined NFA also has to be defined to work on tagged states. Thus, given the NFAs for the subexpressions $E$ and $F$:

\[ N(E) = (Q_E, \Sigma, \delta_E, S_E, F_E) \qquad N(F) = (Q_F, \Sigma, \delta_F, S_F, F_F) \]

we construct the combined NFA for the regular expression $E + F$:

\[ N(E+F) = (Q_{E+F}, \Sigma, \delta_{E+F}, S_{E+F}, F_{E+F}) \]

where

\[
\begin{array}{rcl}
Q_{E+F} & = & Q_E \uplus Q_F \\
\delta_{E+F}((0,q),x) & = & \{(0,q') \mid q' \in \delta_E(q,x)\} \\
\delta_{E+F}((1,q),x) & = & \{(1,q') \mid q' \in \delta_F(q,x)\} \\
S_{E+F} & = & S_E \uplus S_F \\
F_{E+F} & = & F_E \uplus F_F
\end{array}
\]

It remains to prove $L(N(E+F)) = L(E+F)$. We first observe that

\[ L(N(E+F)) = L(N(E)) \cup L(N(F)) \]

because the initial states $S_{E+F}$ of the combined NFA are the (disjoint) union of the initial states of the constituent NFAs, and because the combined NFA accepts a word whenever one of the constituent NFAs does. The proof then proceeds by induction; that is, we assume that the translation is correct for the subexpressions:

\[ L(N(E)) = L(E) \qquad L(N(F)) = L(F) \]

Thus:

\[
\begin{array}{rll}
L(N(E+F)) & = L(N(E)) \cup L(N(F)) & \text{Above} \\
 & = L(E) \cup L(F) & \text{Induction hypothesis} \\
 & = L(E+F) & \text{By (4)}
\end{array}
\]

This is what is meant by induction over the syntax of regular expressions.

5. $N(EF)$: Recall that $L(EF) = L(E)L(F)$. Thus, a word $w \in L(EF)$ iff $w$ can be divided into two words $u$ and $v$, $w = uv$, such that $u \in L(E)$ and $v \in L(F)$. Consequently, if we have an NFA $N(E)$ recognising $L(E)$ and another NFA $N(F)$ recognising $L(F)$, we can construct an NFA recognising words $w = uv \in L(EF)$ if we join the NFAs $N(E)$ and $N(F)$ in sequence in such a way that the machine moves from $N(E)$ to an initial state of $N(F)$ on the last symbol of a word $u \in L(E)$.

Of course, it could be that $\epsilon \in L(E)$, meaning that one or more of the initial states of $N(E)$ are accepting. In this case, for a word $w = uv$ with $u = \epsilon$, the machine needs to start in an initial state of $N(F)$ directly, as there is no last symbol of $u = \epsilon$ to move on.

Therefore, we consider two cases: $\epsilon \notin L(E)$, meaning no initial state of $N(E)$ is accepting, and $\epsilon \in L(E)$, meaning at least one initial state of $N(E)$ is accepting. We start with the former:

[Diagram: $N(E)$ joined in sequence with $N(F)$, case $\epsilon \notin L(E)$.]

The dashed lines here suggest "one or more". So there could be one or more initial states and one or more final states in both $N(E)$ and $N(F)$. This will be made precise shortly; the figure just conveys the general idea.

We identify every state of $N(E)$ that immediately precedes an accepting state; i.e., every state from which an accepting state can be reached on a single input symbol. We then join $N(E)$ and $N(F)$ into a combined NFA $N(EF)$ by adding an edge from each of the identified states to all of the initial states of $N(F)$ for each symbol on which an accepting state of $N(E)$ can be reached. The initial states of $N(EF)$ are the initial states of $N(E)$ and the final states of $N(EF)$ are the final states of $N(F)$.

This ensures that the NFA $N(EF)$, once it has read a word $u$ accepted by $N(E)$, is ready to try to accept the remainder $v$ of a word $w = uv$ by effectively passing $v$ to $N(F)$, allowing the latter to try to accept the remaining part $v$ of $w$ from any of its initial states.

We now formalise this construction. The set of states of the combined NFA $N(EF)$ is again given by the disjoint union of the states of $N(E)$ and $N(F)$ to avoid confusion, and the transition function $\delta_{EF}$ as well as the initial states $S_{EF}$ and final states $F_{EF}$ are defined accordingly. Thus, given the NFAs for the subexpressions $E$ and $F$:

\[ N(E) = (Q_E, \Sigma, \delta_E, S_E, F_E) \qquad N(F) = (Q_F, \Sigma, \delta_F, S_F, F_F) \]

we construct the combined NFA for the regular expression $EF$:

\[ N(EF) = (Q_{EF}, \Sigma, \delta_{EF}, S_{EF}, F_{EF}) \]

where

\[
\begin{array}{rcl}
Q_{EF} & = & Q_E \uplus Q_F \\
\delta_{EF}((0,q),x) & = & \{(0,q') \mid q' \in \delta_E(q,x)\} \cup \{(1,q'') \mid \delta_E(q,x) \cap F_E \neq \emptyset \wedge q'' \in S_F\} \\
\delta_{EF}((1,q),x) & = & \{(1,q') \mid q' \in \delta_F(q,x)\} \\
S_{EF} & = & \{(0,q) \mid q \in S_E\} \\
F_{EF} & = & \{(1,q) \mid q \in F_F\}
\end{array}
\]

Now let us consider the second case: at least one of the initial states of $N(E)$ is accepting:

[Diagram: $N(E)$ joined in sequence with $N(F)$, case $\epsilon \in L(E)$.]

As was discussed above, this simply means that we have to arrange that the initial states of $N(F)$ also be initial states of the combined NFA $N(EF)$:

[Diagram: as above, with the initial states of $N(F)$ also initial in $N(EF)$.]

We thus refine the formal definition of the initial states of $N(EF)$ to account for this, yielding a definition that covers both cases:

\[ S_{EF} = \{(0,q) \mid q \in S_E\} \cup \{(1,q) \mid S_E \cap F_E \neq \emptyset \wedge q \in S_F\} \]

It remains to prove $L(N(EF)) = L(EF)$. From the construction above, it is clear that

\[ L(N(EF)) = \{uv \mid u \in L(N(E)) \wedge v \in L(N(F))\} \]

The proof again proceeds by induction; that is, we assume that the translation is correct for the subexpressions:

\[ L(N(E)) = L(E) \qquad L(N(F)) = L(F) \]

Thus:

\[
\begin{array}{rll}
L(N(EF)) & = \{uv \mid u \in L(N(E)) \wedge v \in L(N(F))\} & \text{Above} \\
 & = \{uv \mid u \in L(E) \wedge v \in L(F)\} & \text{Ind. hyp.} \\
 & = L(E)L(F) & \text{Lang. conc.} \\
 & = L(EF) & \text{By (5)}
\end{array}
\]

6. $N(E^*)$: Recall that $L(E^*) = L(E)^*$. Thus, a word $w \in L(E^*)$ iff $w$ can be divided into a sequence of $n \in \mathbb{N}$ words $u_i$, $w = u_1 u_2 \ldots u_n$, such that $\forall i \in [1,n].\ u_i \in L(E)$. Consequently, if we have an NFA $N(E)$ recognising $L(E)$, we can construct an NFA recognising words $w \in L(E^*)$ by connecting 0 or more NFAs $N(E)$ in sequence in a similar way to what we did for the case $N(EF)$ above:

[Diagram: 0 or more copies of $N(E)$ composed in sequence.]

Here we use the $*$ to informally suggest sequential composition of 0 or more NFAs. However, we need to construct a single NFA, and there is no upper bound on the number of NFAs $N(E)$ that we need to connect in sequence. We resolve this by taking a single NFA $N(E)$ and constructing an NFA for $N(E^*)$ by making it loop back to all of its own initial states from each state that immediately precedes an accepting state. As we also need to allow for iteration 0 times, we further have to add one extra state that is both initial and final, thus accepting $\epsilon$: