Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

Similar documents
Foundations of

Sri vidya college of engineering and technology

CS 121, Section 2. Week of September 16, 2013

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Nondeterministic Finite Automata

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Outline. Nondetermistic Finite Automata. Transition diagrams. A finite automaton is a 5-tuple (Q, Σ,δ,q 0,F)

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

Concatenation. The concatenation of two languages L 1 and L 2

UNIT-III REGULAR LANGUAGES

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application

How do regular expressions work? CMSC 330: Organization of Programming Languages

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

Deterministic Finite Automaton (DFA)

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata (cont )

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

CSE 311 Lecture 25: Relating NFAs, DFAs, and Regular Expressions. Emina Torlak and Kevin Zatloukal

Fooling Sets and. Lecture 5

Automata and Formal Languages - CM0081 Non-Deterministic Finite Automata

UNIT II REGULAR LANGUAGES

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Finite Automata and Regular Languages

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Author: Vivek Kulkarni ( )

Equivalence of DFAs and NFAs

CS 581: Introduction to the Theory of Computation! Lecture 1!

Chapter 5. Finite Automata

CMSC 330: Organization of Programming Languages

Introduction to Theoretical Computer Science. Motivation. Automata = abstract computing devices

What we have done so far

Theory of Computation (I) Yijia Chen Fudan University

Lecture 1: Finite State Automaton

Finite Automata and Formal Languages TMV026/DIT321 LP4 2012

Deterministic Finite Automata (DFAs)

Incorrect reasoning about RL. Equivalence of NFA, DFA. Epsilon Closure. Proving equivalence. One direction is easy:

Lecture 3: Nondeterministic Finite Automata

Deterministic Finite Automata (DFAs)

Regular Expression Unit 1 chapter 3. Unit 1: Chapter 3

COMP4141 Theory of Computation

Finite Automata and Formal Languages

Text Search and Closure Properties

CS 154, Lecture 3: DFA NFA, Regular Expressions

Classes and conversions

Nondeterministic finite automata

CSC173 Workshop: 13 Sept. Notes

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

CS243, Logic and Computation Nondeterministic finite automata

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Text Search and Closure Properties

Extended transition function of a DFA

CSC236 Week 11. Larry Zhang

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

September 11, Second Part of Regular Expressions Equivalence with Finite Aut

Regular expressions and Kleene s theorem

Finite-State Machines (Automata) lecture 12

Lecture 2: Regular Expression

September 7, Formal Definition of a Nondeterministic Finite Automaton

highlights CSE 311: Foundations of Computing highlights 1 in third position from end Fall 2013 Lecture 25: Non-regularity and limits of FSMs

Deterministic Finite Automata (DFAs)

CISC 4090 Theory of Computation

Automata and Languages

Recap from Last Time

Lecture 3: Finite Automata

Announcements. Problem Set Four due Thursday at 7:00PM (right before the midterm).

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

Recitation 2 - Non Deterministic Finite Automata (NFA) and Regular OctoberExpressions

COSE312: Compilers. Lecture 2 Lexical Analysis (1)

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Non-deterministic Finite Automata (NFAs)

Regular expressions and Kleene s theorem

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

Uses of finite automata

Closure under the Regular Operations

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

CMSC 330: Organization of Programming Languages

Lecture 4: Finite Automata

Lecture 3: Finite Automata

Theory Bridge Exam Example Questions

Finite Automata. Finite Automata

Regular Expressions and Finite-State Automata. L545 Spring 2008

CSC236 Week 10. Larry Zhang

Introduction to Theory of Computing

Regular expressions and Kleene s theorem

Chapter Five: Nondeterministic Finite Automata

Nondeterministic Finite Automata

Theory of computation: initial remarks (Chapter 11)

The Pumping Lemma. for all n 0, u 1 v n u 2 L (i.e. u 1 u 2 L, u 1 vu 2 L [but we knew that anyway], u 1 vvu 2 L, u 1 vvvu 2 L, etc.

jflap demo Regular expressions Pumping lemma Turing Machines Sections 12.4 and 12.5 in the text

Decision, Computation and Language

CSE 311: Foundations of Computing. Lecture 25: Pattern Matching, DFA NFA Regex Languages vs Representations

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Course 4 Finite Automata/Finite State Machines

Computational Theory

CSE 311 Lecture 23: Finite State Machines. Emina Torlak and Kevin Zatloukal

Transcription:

Introduction to Language Theory and Compilation: Exercises Session 2: Regular expressions

Regular expressions (RE) Finite automata are an equivalent formalism to regular languages (for each regular language, there exists at least one FA that recognizes it, and each FA recognizes a regular language). Regular expressions are another formalism defined inductively just as regular languages. It can be proven that regular expressions and regular languages are equivalent (see lecture notes).

Regular expressions (RE) (ctd.) Base cases: RE language φ ε {ε} a ( a Σ) {a} If p and q are regular expressions representing the languages P and Q respectively, then: RE p + q pq (or p q) p language P Q P Q P Extended regular expression example: p + pp

RE examples 0 + 1 denotes the language {0, 1} a(b + c) denotes the language {a} {b, c} = {ab, ac}... which could also be denoted by ab + ac x denotes {x}... which could also be denoted by ε + x + xxx A regular expression is equivalent to one and one only regular language, but a regular language can have more than one corresponding regular expression.

Exercise 1 For each of the following languages (defined on the alphabet Σ = {0, 1}), design a RE that recognizes it: 1 The set of strings ending with 00. 2 The set of strings whose 10 th symbol, counted from the end of the string, is a 1. 3 The set of strings where each pair of zeroes is followed by a pair of ones. 4 The set of strings not containing 101. 5 The set of binary numbers divisible by 4.

Solution for exercise 1 1 (0 + 1) 00 2 (0 + 1) 1(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1) 3 (1 + 01 + 0011) (0 + ε) 4 0 (1 + 00 + ) 0 5 (0 + 1) 00 + 0

State elimination method Given a DFA M, we can craft a corresponding regular expression using the state elimination method. The general idea is to label transitions in the automaton using regular expressions, pick a final state, then remove all other states step by step to finally reach a simple automaton which can then be used to easily determine a regular expression. There are two possible cases: 1 The start state of M is not final (q 0 / F ) 2 The start state of M is final (q 0 F )

State elimination method (ctd.) For each q F we ll be left with an A q that Inlooks the case like where q 0 / F, we reach a two state automaton: R U S Start T The that corresponding corresponds regular toexpression the regex is: E q = (R+SU T ) SU (R + SU T ) SU or with A q looking like R Start

State elimination method (ctd.) that corresponds to the regex E q = (R+SU T ) SU Inor thewith case where A q looking q 0 F, we likereach a single state automaton: R Start T The corresponding regular to the expression regex is: E q = R R The final expression is E q q F 69

State elimination method (ctd.) For each final state q F F, one has to build such a simple automaton to derive a regular expression RE(q F ) that expresses all possible inputs that are accepted when M stops in q F. The actual regular expression that describes the language L(M) of the automaton M then simply becomes: RE(q F 1 ) + RE(q F 2 ) +... + RE(q F k ) where { q F 1,..., q F k } = F

Algorithm First, preprocess by labeling all transitions by a RE. Then, for each state S x to be eliminated, consider each transition (S a, S x ), (S x, S b ) or (S x, S x ) with respective labels A, B and X. The transition (S a, S b ) labeled by E becomes the absorbing transition E + (AX B) and remove A, B, X and E. Note: some transitions can be null. In that case, does not consider the transition. For instance, if E = (S a, S b ) cannot be generated by δ (the transition function, see definition), then the absorption transition will be AX B.

Exercise 2.1 Design a RE accepting the same language as: 1 0 0 0 q1 q2 q3 1 1

Solution for exercise 2.1 1 0 0 0 start Q 1 Q 2 Q 3 1 Transition (Q a, Q x ) : A (Q x, Q b ) : B (Q x, Q x ) : X (Q 1, Q 1 ) (Q 1, Q 2 ) : 0 (Q 2, Q 1 ) : 1 (Q 2, Q 2 ) : (Q 1, Q 3 ) (Q 1, Q 2 ) : 0 (Q 2, Q 3 ) : 0 (Q 2, Q 2 ) : (Q 3, Q 3 ) (Q 3, Q 2 ) : 1 (Q 2, Q 3 ) : 0 (Q 2, Q 2 ) : (Q 3, Q 1 ) (Q 3, Q 2 ) : 1 (Q 2, Q 1 ) : 1 (Q 2, Q 2 ) : We search absorbing transitions (Q a, Q a ), (Q a, Q b ), (Q b, Q b ) and (Q b, Q a ). 1

Solution for exercise 2.1 Transition (Q a, Q x ) : A (Q x, Q b ) : B (Q x, Q x ) : X (Q 1, Q 1 ) (Q 1, Q 2 ) : 0 (Q 2, Q 1 ) : 1 (Q 2, Q 2 ) : (Q 1, Q 3 ) (Q 1, Q 2 ) : 0 (Q 2, Q 3 ) : 0 (Q 2, Q 2 ) : (Q 3, Q 3 ) (Q 3, Q 2 ) : 1 (Q 2, Q 3 ) : 0 (Q 2, Q 2 ) : (Q 3, Q 1 ) (Q 3, Q 2 ) : 1 (Q 2, Q 1 ) : 1 (Q 2, Q 2 ) : Applying the rule: E + (AX B): Absorbing transitions (Q, Q ) : E AX B Result (Q 1, Q 1 ) : R 1 01 1 + (01) (Q 1, Q 3 ) : S 00 00 (Q 3, Q 3 ) : U 0 10 0 + (10) (Q 3, Q 1 ) : T 11 11

Solution for exercise 2.1 Absorbing transitions (Q, Q ) : E AX B Result (Q 1, Q 1 ) : R 1 01 1 + (01) (Q 1, Q 3 ) : S 00 00 (Q 3, Q 3 ) : U 0 10 0 + (10) (Q 3, Q 1 ) : T 11 11 Give the automaton: R = 1 + (01) U = 0 + (10) S = 00 start Q 1 Q 3 T = 11 by applying the rule: (R + SU T ) SU, the RE is: ((1 + (01)) + (00)(0 + (10)) (11)) (00)(0 + (10))

Exercise 2.2 Design a RE accepting the same language as: 1 1 0 q3 q1 q2 1 0 0

Solution for exercise 2.2 Absorbing transitions (Q, Q ) : E AX B Result (Q 1, Q 1 ) : R 00 00 (Q 1, Q 3 ) : S 1 01 1 + (01) (Q 3, Q 3 ) : U 01 01 (Q 3, Q 1 ) : T 1 00 1 + (00) Give the automaton: R = 00 U = 01 S = 1 + (01) start Q 1 Q 3 T = 1 + (00) by applying the rule: (R + SU T ) SU, the RE is: ((00) + (1 + (01))(01) (1 + (00))) (1 + (01))(01)

Exercise 3 Convert the following REs into ε-nfas: 1 01 2 (0 + 1)01 3 00(0 + 1) Recall: AB gives (Q 1 ) A (Q 2 ) B (Q 3 ) A + B gives (Q 4 ) ɛ (Q 3 ) A (Q 2 ) ɛ (Q 1 ) ɛ (Q 2 ) B (Q 3 ) ɛ (Q 4 ) X (Q 2 ) X (Q 1 ) ɛ (Q 2 )

Solution for exercise 3 ε 0 ε 1 ε

Solution for exercise 3 (0 + 1)01 0 ε ε 0 1 ε ε 1

Solution for exercise 3 00(0 + 1) ε ε 0 ε 0 0 ε 1 ε ε

Extended regular expressions (ERE) Very popular on UNIX-like tools (grep, find, etc.) Grant more flexibility than traditional regular expressions Typically used by scanner generators such as lex

ERE syntax Expression Accepted language r* 0 or more rs r+ 1 or more rs r? 0 or 1 r [abc] a or b or c [a-z] Any character in the interval a... z. Any character except \n [ˆs] Any character but those in s r{m,n} Between m and n occurrences of r r1 r2 The concatenation of r1 and r2

ERE syntax (ctd.) Expression Accepted language r1 r2 r1 or r2 (r) r ˆr r if it starts a line r$ r if it ends a line "s" The string s \c The character c r1(?=r2) r1 when it s followed by r2

Examples Expression Accepted language [a-za-z] Any letter (upper or lower case) [0-9] Any digit a[ˆa-za-z]b An a followed by a non-alphabetical character and a b ˆSilly Silly if it starts a line [a-za-z]([a-za-z] [0-9])* An identifier in the Pascal language

Exercise 4 1 Give an extended regular expression (ERE) that targets any sequence of 5 characters, including the newline character \n 2 Give an ERE that targets any string starting with an arbitrary number of \ followed by any number of * 3 UNIX-like shells (such as bash) allow the user to write batch files in which comments can be added. A line is defined to be a comment if it starts with a # sign. What ERE accepts such comments? 4 Design an ERE that accepts numbers in scientific notation. Such a number must contain at least one digit and has two optional parts: A "decimal" part : a dot followed by a sequence of digits An "exponential" part: an E followed by an integer that may be prefixed by + or - Examples : 42, 66.4E-5, 8E17,...

Exercise 4 5 Design an ERE that accepts "correct" phrases that fulfill the following criteria: No prepending/appending spaces The first word must start with a capital letter The phrase must end with a full stop. The phrase must be made of one or more words (made of the characters a...z and A...Z) separated by a single space There must be one sentence per line Punctuation signs other than a full stop are not allowed. 6 Craft an ERE that accepts old school DOS-style filenames (8 characters in a...z, A...Z and _) whose extension is.ext and that begin with the string abcde. We ask that the ERE only accept the filename without capturing the extension! Example: on abcdelol.ext, the ERE must accept abcdelol

Solution for exercise 4 1 (. \n){5} 2 \\*\** 3 ˆ#.*$ 4 [0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)? 5 ˆ [A-Z][A-Za-z]*(\ [A-Za-z]+)*\.$ 6 abcde[a-za-z_]{3}(?=\.ext)