CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

Similar documents
CMSC 330: Organization of Programming Languages

How do regular expressions work? CMSC 330: Organization of Programming Languages

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

CMSC 330: Organization of Programming Languages

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

Recap from Last Time

CS 133 : Automata Theory and Computability

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

CS 154, Lecture 3: DFA NFA, Regular Expressions

UNIT II REGULAR LANGUAGES

Automata: a short introduction

Theory of Computation

Regular languages, regular expressions, & finite automata (intro) CS 350 Fall 2018 gilray.org/classes/fall2018/cs350/

Section 1.3 Ordered Structures

CS21 Decidability and Tractability

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Sri vidya college of engineering and technology

COSE312: Compilers. Lecture 2 Lexical Analysis (1)

Deterministic Finite Automaton (DFA)

Finite Automata and Regular Languages

Author: Vivek Kulkarni ( )

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

CS Automata, Computability and Formal Languages

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Regular expressions and Kleene s theorem

CS21 Decidability and Tractability

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

1 Alphabets and Languages

SFWR ENG 2FA3. Solution to the Assignment #4

Theory of computation: initial remarks (Chapter 11)

Computational Theory

Theory of Computer Science

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

Regular expressions and Kleene s theorem

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

acs-04: Regular Languages Regular Languages Andreas Karwath & Malte Helmert Informatik Theorie II (A) WS2009/10

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Automata Theory and Formal Grammars: Lecture 1

Regular Expressions. Definitions Equivalence to Finite Automata

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Regular Expressions and Finite-State Automata. L545 Spring 2008

Foundations of

Uses of finite automata

Text Search and Closure Properties

Chapter 4. Regular Expressions. 4.1 Some Definitions

Nondeterministic Finite Automata

Lecture 17: Language Recognition

REGular and Context-Free Grammars

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

Theory of Computation (Classroom Practice Booklet Solutions)

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

Formal Definition of a Finite Automaton. August 26, 2013

Chapter Five: Nondeterministic Finite Automata

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Concatenation. The concatenation of two languages L 1 and L 2

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

CSE 105 THEORY OF COMPUTATION

CS 455/555: Finite automata

Computer Sciences Department

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

CSE 105 THEORY OF COMPUTATION

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Computational Models Lecture 2 1

Closure under the Regular Operations

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

Automata Theory CS F-08 Context-Free Grammars

Computational Models Lecture 2 1

Automata and Formal Languages - CM0081 Finite Automata and Regular Expressions

CS 121, Section 2. Week of September 16, 2013

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

COMP4141 Theory of Computation

Context-Free Grammars (and Languages) Lecture 7

Theory of Computation

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) October,

Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages

CS375 Midterm Exam Solution Set (Fall 2017)

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

HW 3 Solutions. Tommy November 27, 2012

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

Transcription:

CMSC 330: Organization of Programming Languages Regular Expressions and Finite Automata CMSC330 Spring 2018 1

How do regular expressions work? What we ve learned What regular expressions are What they can express, and cannot Programming with them What s next: how they work A great computer science result 2

A Few Questions About REs How are REs implemented? Implementing a one-off RE is not so hard Ø How to do it in general? What are the basic components of REs? Can implement some features in terms of others Ø E.g., e+ is the same as ee* What does a regular expression represent? Just a set of strings Ø This observation provides insight on how we go about our implementation next comes the math! 4

Definition: Alphabet An alphabet is a finite set of symbols Usually denoted Σ Example alphabets: Σ = {0,1} Binary: Decimal: Σ = {0,1,2,3,4,5,6,7,8,9} Alphanumeric: Σ = {0-9,a-z,A-Z} 5

Definition: String A string is a finite sequence of symbols from Σ ε is the empty string ("" in Ruby) s is the length of string s Ø Hello = 5, ε = 0 Note Ø Ø is the empty set (with 0 elements) Ø Ø { ε } ε Example strings over alphabet Σ = {0,1} (binary): 0101 0101110 ε 6

Definition: String concatenation String concatenation is indicated by juxtaposition s 1 = super s 2 = hero Sometimes also written s 1 s 2 s 1 s 2 = superhero For any string s, we have sε = εs = s You can concatenate strings from different alphabets; then the new alphabet is the union of the originals: Ø If s 1 = super from Σ 1 = {s,u,p,e,r} and s 2 = hero from Σ 2 = {h,e,r,o}, then s 1 s 2 = superhero from Σ 3 = {e,h,o,p,r,s,u} 7

Definition: Language A language L is a set of strings over an alphabet Example: All strings of length 1 or 2 over alphabet Σ = {a, b, c} that begin with a L = { a, aa, ab, ac } Example: All strings over Σ = {a, b} L = { ε, a, b, aa, bb, ab, ba, aaa, bba, aba, baa, } Language of all strings written Σ* Example: All strings of length 0 over alphabet Σ L = { s s Σ* and s = 0 } the set of strings s such that s is from Σ* and has length 0 = {ε} Ø 8

Definition: Language (cont.) Example: The set of phone numbers over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 9, (, ), -} Give an example element of this language (123)456-7890 Are all strings over the alphabet in the language? No Is there a Ruby regular expression for this language? /\(\d{3,3}\)\d{3,3}-\d{4,4}/ Example: The set of all valid Ruby programs Later we ll see how we can specify this language (Regular expressions are useful, but not sufficient) 9

Operations on Languages Let Σ be an alphabet and let L, L 1, L 2 be languages over Σ Concatenation L 1 L 2 is defined as L 1 L 2 = { xy x L 1 and y L 2 } Union is defined as L 1 L 2 = { x x L 1 or x L 2 } Kleene closure is defined as L* = { x x = ε or x L or x LL or x LLL or } 10

Quiz 1: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.ε D.d 11

Quiz 1: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.ε D.d 12

Quiz 2: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} L 3 = L 1 L 2 * where Σ = {a,b,c,d} A. a B. abd C.adad D.abdd 13

Quiz 2: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} L 3 = L 1 L 2 * where Σ = {a,b,c,d} A. a B. abd C.adad D.abdd 14

Regular Expressions: Grammar Similarly to how we expressed Micro-OCaml we can define a grammar for regular expressions R R ::= Ø The empty language ε The empty string σ A symbol from alphabet Σ R 1 R 2 The concatenation of two regexps R 1 R 2 The union of two regexps R* The Kleene closure of a regexp 15

Regular Languages Regular expressions denote languages. These are the regular languages aka regular sets Not all languages are regular Examples (without proof): Ø The set of palindromes over Σ Ø {a n b n n > 0 } (a n = sequence of n a s) Almost all programming languages are not regular But aspects of them sometimes are (e.g., identifiers) Regular expressions are commonly used in parsing tools 16

Semantics: Regular Expressions (1) Given an alphabet Σ, the regular expressions over Σ are defined inductively as follows regular expression Ø ε each symbol σ Σ denotes language Ø {ε} {σ} Constants 17

Semantics: Regular Expressions (2) Let A and B be regular expressions denoting languages L A and L B, respectively. Then: regular expression AB A B denotes language L A L B L A L B A* L A * Operations There are no other regular expressions over Σ 18

Terminology etc. Regexps apply operations to symbols Generates a set of strings (i.e., a language) Ø (Formal definition shortly) Examples Ø a {a} Ø a b {a} {b} = {a, b} Ø a* {ε} {a} {aa} = {ε, a, aa, } If s language L generated by a RE r, we say that r accepts, describes, or recognizes string s 19

Precedence Order in which operators are applied is: Kleene closure * > concatenation > union ab c = ( a b ) c {ab, c} ab* = a ( b* ) {a, ab, abb } a b* = a ( b* ) {a, ε, b, bb, bbb } We use parentheses ( ) to clarify E.g., a(b c), (ab)*, (a b)* Using escaped \( if parens are in the alphabet 20

Ruby Regular Expressions Almost all of the features we ve seen for Ruby REs can be reduced to this formal definition /Ruby/ concatenation of single-symbol REs /(Ruby Regular)/ union /(Ruby)*/ Kleene closure /(Ruby)+/ same as (Ruby)(Ruby)* /(Ruby)?/ same as (ε (Ruby)) (// is ε) /[a-z]/ same as (a b c... z) / [^0-9]/ same as (a b c...) for a,b,c,... Σ - {0..9} ^, $ correspond to extra symbols in alphabet 21

Implementing Regular Expressions We can implement a regular expression by turning it into a finite automaton A machine for recognizing a regular language String String String String String String Yes No 22

Finite Automaton Elements States S (start, final) Alphabet Σ Transition edges δ 23

Finite Automaton Start state Transition on 1 Final state States Machine starts in start or initial state Repeat until the end of the string s is reached Scan the next symbol σ Σ of the string s Take transition edge labeled with σ String s is accepted if automaton is in final state when end of string s is reached Elements States S (start, final) Alphabet Σ Transition edges δ 24

Finite Automaton: States Start state State with incoming transition from no other state Can have only one start state Final states States with double circle Can have zero or more final states S1 S2 Any state, including the start state, can be final 25

Finite Automaton: Example 1 0 0 1 0 1 1 Accepted? Yes 26

Finite Automaton: Example 2 0 0 1 0 1 0 Accepted? No 27

Quiz 3: What Language is This? A. All strings over {0, 1} B. All strings over {1} C. All strings over {0, 1} of length 1 D. All strings over {0, 1} that end in 1 28

Quiz 3: What Language is This? A. All strings over {0, 1} B. All strings over {1} C. All strings over {0, 1} of length 1 D. All strings over {0, 1} that end in 1 regular expression for this language is (0 1)*1 29

Finite Automaton: Example 3 string aabcc state at end accepts? (a,b,c notation shorthand for three self loops) 30

Finite Automaton: Example 3 string aabcc state at end S2 accepts? Y (a,b,c notation shorthand for three self loops) 31

Finite Automaton: Example 3 string acca state at end accepts? (a,b,c notation shorthand for three self loops) 32

Finite Automaton: Example 3 string acca state at end S3 accepts? N (a,b,c notation shorthand for three self loops) 33

Quiz 4: Which string is not accepted? A. bcca B. abbbc C. ccc D. ε (a,b,c notation shorthand for three self loops) 40

Quiz 4: Which string is not accepted? A. bcca B. abbbc C. ccc D. ε (a,b,c notation shorthand for three self loops) 41

Finite Automaton: Example 3 What language does this FA accept? a*b*c* S3 is a dead state a nonfinal state with no transition to another state 42

Finite Automaton: Example 4 Language? a*b*c* again, so FAs are not unique 43

Dead State: Shorthand Notation If a transition is omitted, assume it goes to a dead state that is not shown is short for Language? Strings over {0,1,2,3} with alternating even and odd digits, beginning with odd digit 44

Finite Automaton: Example 5 Description for each state S0 = Haven't seen anything yet OR Last symbol seen was a b S1 = Last symbol seen was an a S2 = Last two symbols seen were ab S3 = Last three symbols seen were abb 45

Finite Automaton: Example 5 Language as a regular expression? (a b)*abb 46

Quiz 5 0 a 1 0 Over Σ={a,b}, this FA accepts only: b a A. A string that contains a single a. B. Any string in {a,b}. C. A string that starts with b followed by a s. D. Zero or more b s, followed by one or more a s. 47

Quiz 5 0 a 1 0 Over Σ={a,b}, this FA accepts only: b a A. A string that contains a single a. B. Any string in {a,b}. C. A string that starts with b followed by a s. D. Zero or more b s, followed by one or more a s. 48

Exercises: Define an FA over Σ = {0,1} That accepts strings containing two consecutive 0s followed by two consecutive 1s That accepts strings with an odd number of 1s That accepts strings containing an even number of 0s and any number of 1s That accepts strings containing an odd number of 0s and odd number of 1s That accepts strings that DO NOT contain odd number of 0s and an odd number of 1s 49

Exercises: Define an FA over Σ = {0,1} That accepts strings with an odd number of 1s 50

Exercises: Define an FA over Σ = {0,1} That accepts strings with an odd number of 1s 51

Exercises: Define an FA over Σ = {0,1} That accepts strings containing an even number of 0s and any number of 1s 52

Exercises: Define an FA over Σ = {0,1} That accepts strings containing an even number of 0s and any number of 1s 53

Exercises: Define an FA over Σ = {0,1} That accepts strings containing two consecutive 0s followed by two consecutive 1s 54

Exercises: Define an FA over Σ = {0,1} That accepts strings containing two consecutive 0s followed by two consecutive 1s 55

Exercises: Define an FA over Σ = {0,1} That accepts strings end with two consecutive 0s followed by two consecutive 1s 56

Exercises: Define an FA over Σ = {0,1} That accepts strings end with two consecutive 0s followed by two consecutive 1s 57

Exercises: Define an FA over Σ = {0,1} That accepts strings containing an odd number of 0s and odd number of 1s 58

Exercises: Define an FA over Σ = {0,1} That accepts strings containing an odd number of 0s and odd number of 1s 4 states: 0s 1s e e o e e o o o 59

Exercises: Define an FA over Σ = {0,1} That accepts strings that DO NOT contain odd number of 0s and an odd number of 1s 60

Exercises: Define an FA over Σ = {0,1} That accepts strings that DO NOT contain odd number of 0s and an odd number of 1s Flip each state 61