CMSC 330: Organization of Programming Languages

Similar documents
CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

How do regular expressions work? CMSC 330: Organization of Programming Languages

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

CMSC 330: Organization of Programming Languages

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

Recap from Last Time

CS 133 : Automata Theory and Computability

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

UNIT II REGULAR LANGUAGES

Automata: a short introduction

Theory of Computation

Regular languages, regular expressions, & finite automata (intro) CS 350 Fall 2018 gilray.org/classes/fall2018/cs350/

Section 1.3 Ordered Structures

CS21 Decidability and Tractability

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Sri vidya college of engineering and technology

COSE312: Compilers. Lecture 2 Lexical Analysis (1)

Deterministic Finite Automaton (DFA)

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Finite Automata and Regular Languages

CS 154, Lecture 3: DFA NFA, Regular Expressions

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Author: Vivek Kulkarni ( )

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

SFWR ENG 2FA3. Solution to the Assignment #4

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

CS Automata, Computability and Formal Languages

CS21 Decidability and Tractability

Regular expressions and Kleene s theorem

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

1 Alphabets and Languages

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Theory of computation: initial remarks (Chapter 11)

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

Computational Theory

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Theory of Computer Science

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Regular expressions and Kleene s theorem

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Automata Theory and Formal Grammars: Lecture 1

Foundations of

acs-04: Regular Languages Regular Languages Andreas Karwath & Malte Helmert Informatik Theorie II (A) WS2009/10

Regular Expressions and Finite-State Automata. L545 Spring 2008

Regular Expressions. Definitions Equivalence to Finite Automata

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

Uses of finite automata

Text Search and Closure Properties

Chapter 4. Regular Expressions. 4.1 Some Definitions

Nondeterministic Finite Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

Lecture 17: Language Recognition

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

REGular and Context-Free Grammars

Theory of Computation (Classroom Practice Booklet Solutions)

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

CSE 105 THEORY OF COMPUTATION

Formal Definition of a Finite Automaton. August 26, 2013

Concatenation. The concatenation of two languages L 1 and L 2

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Chapter Five: Nondeterministic Finite Automata

CS 455/555: Finite automata

Computer Sciences Department

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

CSE 105 THEORY OF COMPUTATION

Computational Models Lecture 2 1

Closure under the Regular Operations

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Automata Theory CS F-08 Context-Free Grammars

Computational Models Lecture 2 1

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Theory of Computation

Automata and Formal Languages - CM0081 Finite Automata and Regular Expressions

CS 121, Section 2. Week of September 16, 2013

COMP4141 Theory of Computation

Context-Free Grammars (and Languages) Lecture 7

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

CS375 Midterm Exam Solution Set (Fall 2017)

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) October,

Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

HW 3 Solutions. Tommy November 27, 2012

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Chapter-6 Regular Expressions

Transcription:

CMSC 330: Organization of Programming Languages Regular Expressions and Finite Automata CMSC 330 Spring 2017 1

How do regular expressions work? What we ve learned What regular expressions are What they can express, and cannot Programming with them What s next: how they work A great computer science result CMSC 330 Spring 2017 2

Languages and Machines CMSC 330 Spring 2017 3

A Few Questions About REs How are REs implemented? Implementing a one-off RE is not so hard Ø How to do it in general? What are the basic components of REs? Can implement some features in terms of others Ø E.g., e+ is the same as ee* What does a regular expression represent? Just a set of strings Ø This observation provides insight on how we go about our implementation next comes the math! CMSC 330 Spring 2017 4

Definition: Alphabet An alphabet is a finite set of symbols Usually denoted Σ Example alphabets: Σ = {0,1} Binary: Decimal: Σ = {0,1,2,3,4,5,6,7,8,9} Alphanumeric: Σ = {0-9,a-z,A-Z} CMSC 330 Spring 2017 5

Definition: String A string is a finite sequence of symbols from Σ ε is the empty string ("" in Ruby) s is the length of string s Ø Hello = 5, ε = 0 Note Ø Ø is the empty set (with 0 elements) Ø Ø { ε } ε Example strings over alphabet Σ = {0,1} (binary): 0101 0101110 ε CMSC 330 Spring 2017 6

Definition: String concatenation String concatenation is indicated by juxtaposition s 1 = super s 2 = hero Sometimes also written s 1 s 2 s 1 s 2 = superhero For any string s, we have sε = εs = s You can concatenate strings from different alphabets; then the new alphabet is the union of the originals: Ø If s 1 = super from Σ 1 = {s,u,p,e,r} and s 2 = hero from Σ 2 = {h,e,r,o}, then s 1 s 2 = superhero from Σ 3 = {e,h,o,p,r,s,u} CMSC 330 Spring 2017 7

Definition: Language A language L is a set of strings over an alphabet Example: All strings of length 1 or 2 over alphabet Σ = {a, b, c} that begin with a L = { a, aa, ab, ac } Example: All strings over Σ = {a, b} L = { ε, a, b, aa, bb, ab, ba, aaa, bba, aba, baa, } Language of all strings written Σ* Example: All strings of length 0 over alphabet Σ L = { s s Σ* and s = 0 } the set of strings s such that s is from Σ* and has length 0 = {ε} Ø CMSC 330 Spring 2017 8

Definition: Language (cont.) Example: The set of phone numbers over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 9, (, ), -} Give an example element of this language (123)456-7890 Are all strings over the alphabet in the language? No Is there a Ruby regular expression for this language? /\(\d{3,3}\)\d{3,3}-\d{4,4}/ Example: The set of all valid Ruby programs Later we ll see how we can specify this language (Regular expressions are useful, but not sufficient) CMSC 330 Spring 2017 9

Operations on Languages Let Σ be an alphabet and let L, L 1, L 2 be languages over Σ Concatenation L 1 L 2 is defined as L 1 L 2 = { xy x L 1 and y L 2 } Union is defined as L 1 L 2 = { x x L 1 or x L 2 } Kleene closure is defined as L* = { x x = ε or x L or x LL or x LLL or } CMSC 330 Spring 2017 10

Quiz 1: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.cd D.d CMSC 330 Spring 2017 11

Quiz 1: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.cd D.d CMSC 330 Spring 2017 12

Quiz 2: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.ε D.d CMSC 330 Spring 2017 13

Quiz 2: Which string is not in L 3 L 1 = {a, ab, c, d, ε} L 2 = {d} where Σ = {a,b,c,d} L 3 = L 1 L 2 A. a B. abd C.ε D.d CMSC 330 Spring 2017 14

Regular Expressions: Grammar Similarly to how we expressed Micro-OCaml we can define a grammar for regular expressions R R ::= Ø The empty language ε The empty string σ A symbol from alphabet Σ R 1 R 2 The concatenation of two regexps R 1 R 2 The union of two regexps R* The Kleene closure of a regexp CMSC 330 Spring 2017 15

Regular Languages Regular expressions denote languages. These are the regular languages aka regular sets Not all languages are regular Examples (without proof): Ø The set of palindromes over Σ Ø {a n b n n > 0 } (a n = sequence of n a s) Almost all programming languages are not regular But aspects of them sometimes are (e.g., identifiers) Regular expressions are commonly used in parsing tools CMSC 330 Spring 2017 16

Semantics: Regular Expressions (1) Given an alphabet Σ, the regular expressions over Σ are defined inductively as follows regular expression Ø ε each symbol σ Σ denotes language Ø {ε} {σ} Constants CMSC 330 Spring 2017 17

Semantics: Regular Expressions (2) Let A and B be regular expressions denoting languages L A and L B, respectively. Then: regular expression AB A B denotes language L A L B L A L B A* L A * Operations There are no other regular expressions over Σ CMSC 330 Spring 2017 18

Terminology etc. Regexps apply operations to symbols Generates a set of strings (i.e., a language) Ø (Formal definition shortly) Examples Ø a {a} Ø a b {a} {b} = {a, b} Ø a* {ε} {a} {aa} = {ε, a, aa, } If s language generated by a RE r, we say that r accepts, describes, or recognizes string s CMSC 330 Spring 2017 19

Precedence Order in which operators are applied is: Kleene closure * > concatenation > union ab c = ( a b ) c {ab, c} ab* = a ( b* ) {a, ab, abb } a b* = a ( b* ) {a, ε, b, bb, bbb } We use parentheses ( ) to clarify E.g., a(b c), (ab)*, (a b)* Using escaped \( if parens are in the alphabet CMSC 330 Spring 2017 20

Ruby Regular Expressions Almost all of the features we ve seen for Ruby REs can be reduced to this formal definition /Ruby/ concatenation of single-symbol REs /(Ruby Regular)/ union /(Ruby)*/ Kleene closure /(Ruby)+/ same as (Ruby)(Ruby)* /(Ruby)?/ same as (ε (Ruby)) (// is ε) /[a-z]/ same as (a b c... z) / [^0-9]/ same as (a b c...) for a,b,c,... Σ - {0..9} ^, $ correspond to extra symbols in alphabet CMSC 330 Spring 2017 21

Implementing Regular Expressions We can implement a regular expression by turning it into a finite automaton A machine for recognizing a regular language String String String String String String Yes No CMSC 330 Spring 2017 22

Finite Automaton Start state Transition on 1 Final state States Machine starts in start or initial state Repeat until the end of the string s is reached Scan the next symbol σ Σ of the string s Take transition edge labeled with σ String s is accepted if automaton is in final state when end of string s is reached Elements States S (start, final) Alphabet Σ Transition edges δ CMSC 330 Spring 2017 23

Finite Automaton: States Start state State with incoming transition from no other state Can have only one start state Final states States with double circle Can have zero or more final states S1 S2 Any state, including the start state, can be final CMSC 330 Spring 2017 24

Finite Automaton: Example 1 0 0 1 0 1 1 Accepted? Yes CMSC 330 Spring 2017 25

Finite Automaton: Example 2 0 0 1 0 1 0 Accepted? No CMSC 330 Spring 2017 26

Quiz 3: What Language is This? A. All strings over {0, 1} B. All strings over {1} C. All strings over {0, 1} of length 1 D. All strings over {0, 1} that end in 1 CMSC 330 Spring 2017 27

Quiz 3: What Language is This? A. All strings over {0, 1} B. All strings over {1} C. All strings over {0, 1} of length 1 D. All strings over {0, 1} that end in 1 regular expression for this language is (0 1)*1 CMSC 330 Spring 2017 28

Finite Automaton: Example 3 string aabcc state at end accepts? (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 29

Finite Automaton: Example 3 string aabcc state at end S2 accepts? Y (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 30

Finite Automaton: Example 3 string acca state at end accepts? (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 31

Finite Automaton: Example 3 string acca state at end S3 accepts? N (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 32

Finite Automaton: Example 3 string aacbbb state at end accepts? (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 33

Finite Automaton: Example 3 string aacbbb state at end S3 accepts? N (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 34

Finite Automaton: Example 3 string ε state at end accepts? (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 35

Finite Automaton: Example 3 string ε state at end S0 accepts? Y (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 36

Finite Automaton: Example 3 string acba state at end accepts? (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 37

Finite Automaton: Example 3 string acba state at end S3 accepts? N (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 38

Quiz 4: Which string is not accepted? A. abbbc B. ccc C. ε D. bcca (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 39

Quiz 4: Which string is not accepted? A. abbbc B. ccc C. ε D. bcca (a,b,c notation shorthand for three self loops) CMSC 330 Spring 2017 40

Finite Automaton: Example 3 What language does this FA accept? a*b*c* S3 is a dead state a nonfinal state with no transition to another state CMSC 330 Spring 2017 41

Finite Automaton: Example 4 Language? a*b*c* again, so FAs are not unique CMSC 330 Spring 2017 42

Dead State: Shorthand Notation If a transition is omitted, assume it goes to a dead state that is not shown is short for Language? Strings over {0,1,2,3} with alternating even and odd digits, beginning with odd digit CMSC 330 Spring 2017 43

Finite Automaton: Example 5 Description for each state S0 = Haven't seen anything yet OR Last symbol seen was a b S1 = Last symbol seen was an a S2 = Last two symbols seen were ab S3 = Last three symbols seen were abb CMSC 330 Spring 2017 44

Finite Automaton: Example 5 Language as a regular expression? (a b)*abb CMSC 330 Spring 2017 45

Quiz 5 0 b 1 0 Over Σ={a,b}, this FA accepts: a a A. A string that contains a single b. B. Zero or more a s, followed by a single b, followed by zero or more a s. C. Any string in {a,b}. D. A string that starts with b followed by a s. CMSC 330 Spring 2017 46

Quiz 5 0 b 1 0 Over Σ={a,b}, this FA accepts: a a A. A string that contains a single b. B. Zero or more a s, followed by a single b, followed by zero or more a s. C. Any string in {a,b}. D. A string that starts with b followed by a s. CMSC 330 Spring 2017 47

Exercises: Define an FA over Σ = {0,1} That accepts strings containing two consecutive 0s followed by two consecutive 1s That accepts strings with an odd number of 1s That accepts strings containing an even number of 0s and any number of 1s That accepts strings containing an odd number of 0s and odd number of 1s That accepts strings that DO NOT contain odd number of 0s and an odd number of 1s CMSC 330 Spring 2017 48