COSE312: Compilers. Lecture 2 Lexical Analysis (1)

Similar documents
Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

1 Alphabets and Languages

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

Deterministic Finite Automaton (DFA)

CMSC 330: Organization of Programming Languages

UNIT II REGULAR LANGUAGES

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

CVO103: Programming Languages. Lecture 2 Inductive Definitions (2)

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

Automata: a short introduction

Plan for Today and Beginning Next week (Lexical Analysis)

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lectures 13 16

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

UNIT-III REGULAR LANGUAGES

Context-Free Grammars (and Languages) Lecture 7

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Formal Languages. We ll use the English language as a running example.

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Chapter 5. Finite Automata

What is this course about?

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application

Uses of finite automata

Computational Theory

Finite Automata and Languages

Computational Models #1

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

Closure under the Regular Operations

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

Finite Automata and Regular languages

Computational Models - Lecture 1 1

Theoretical Computer Science

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

CS21 Decidability and Tractability

Theory of Computation 1 Sets and Regular Expressions

CS 121, Section 2. Week of September 16, 2013

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017

Nondeterministic Finite Automata and Regular Expressions

How do regular expressions work? CMSC 330: Organization of Programming Languages

Non-deterministic Finite Automata (NFAs)

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

TDDD65 Introduction to the Theory of Computation

Formal Languages, Automata and Models of Computation

CS 154, Lecture 3: DFA NFA, Regular Expressions

Finite Automata and Regular Languages

Theory of Computation (I) Yijia Chen Fudan University

Equivalence of Regular Expressions and FSMs

THEORY OF COMPILATION

Formal Languages. We ll use the English language as a running example.

Lecture 3: Nondeterministic Finite Automata

Nondeterministic Finite Automata

Embedded systems specification and design

The Pumping Lemma. for all n 0, u 1 v n u 2 L (i.e. u 1 u 2 L, u 1 vu 2 L [but we knew that anyway], u 1 vvu 2 L, u 1 vvvu 2 L, etc.

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

COSE312: Compilers. Lecture 17 Intermediate Representation (2)

CS Automata, Computability and Formal Languages

Theory of Computer Science

COSE312: Compilers. Lecture 14 Semantic Analysis (4)

In English, there are at least three different types of entities: letters, words, sentences.

COMP4141 Theory of Computation

CS5371 Theory of Computation. Lecture 5: Automata Theory III (Non-regular Language, Pumping Lemma, Regular Expression)

Compiling Techniques

Theory of computation: initial remarks (Chapter 11)

Outline. Nondetermistic Finite Automata. Transition diagrams. A finite automaton is a 5-tuple (Q, Σ,δ,q 0,F)

Operational Semantics

MA/CSSE 474 Theory of Computation

Sri vidya college of engineering and technology

Regular expressions and Kleene s theorem

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CSE 105 THEORY OF COMPUTATION

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1)

CS 133 : Automata Theory and Computability

Extended transition function of a DFA

Theory of Computation

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

REGular and Context-Free Grammars

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Type 3 languages. Type 2 languages. Regular grammars Finite automata. Regular expressions. Type 2 grammars. Deterministic Nondeterministic.

Lecture 3: Finite Automata. Finite Automata. Deterministic Finite Automata. Summary. Dr Kieran T. Herley

CSC173 Workshop: 13 Sept. Notes

Regular Expressions and Finite-State Automata. L545 Spring 2008

Theory of Computation

Introduction to Automata

Transcription:

COSE312: Compilers Lecture 2 Lexical Analysis (1) Hakjoo Oh 2017 Spring Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 1 / 15

Lexical Analysis ex) Given a C program float match0 (char *s) /* find a zero */ {if (!strncmp(s, "0.0", 3)) return 0.0; } the lexical analyzer returns the stream of tokens: FLOAT ID(match0) LPAREN CHAR STAR ID(s) RPAREN LBRACE IF LPAREN BANG ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN RETURN REAL(0.0) SEMI RBRACE EOF Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 2 / 15

Specification, Recognition, and Automation 1 Specification: how to specify lexical patterns? In C, identifiers are strings like x, xy, match0, and _abc. Numbers are strings like 3, 12, 0.012, and 3.5E4. regular expressions 2 Recognition: how to recognize the lexical patterns? Recognize match0 as an identifier. Recognize 512 as a number. deterministic finite automata. 3 Automation: how to automatically generate string recognizers from specifications? Thompson s construction and subset construction Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 3 / 15

cf) Lexical Analyzer Generator lex: a lexical analyzer generator for C jlex: a lexical analyzer generator for Java ocamllex: a lexical analyzer generator for OCaml Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 4 / 15

Part 1: Specification Preliminaries: alphabets, strings, languages Syntax and semantics of regular expressions Extensions of regular expressions Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 5 / 15

Alphabet An alphabet Σ is a finite, non-empty set of symbols. E.g, Σ = {0, 1} Σ = {a, b,..., z} Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 6 / 15

Strings A string is a finite sequence of symbols chosen from an alphabet, e.g., 1, 01, 10110 are strings over Σ = {0, 1}. Notations: ɛ: the empty string. wv: the concatenation of w and v. w R : the reverse of w. w : the length of string w: ɛ = 0 va = v + 1 If w = vu, then v is a prefix of w, and u is a suffix of w. Σ k : the set of strings over Σ of length k Σ : the set of all strings over alphabet Σ: Σ = Σ 0 Σ 1 Σ 2 = Σ i Σ + = Σ 1 Σ 2 = Σ \ {ɛ} Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 7 / 15 i N

Languages A language L is a subset of Σ : L Σ. L 1 L 2, L 1 L 2, L 1 L 2 L R = {w R w L} L = Σ L L 1 L 2 = {xy x L 1 y L 2 } The power of a language, L n : L 0 = {ɛ} L n = L n 1 L The star-closure (or Kleene closure) of a language, L : L = L 0 L 1 L 2 = L i i 0 The positive closure of a language, L + : L + = L 1 L 2 L 3 = i 1 L i Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 8 / 15

Regular Expressions A regular expression is a notation to denote a language. Syntax R ɛ a Σ R 1 R 2 R 1 R 2 R1 (R) Semantics L( ) = L(ɛ) = {ɛ} L(a) = {a} L(R 1 R 2 ) = L(R 1 ) L(R 2 ) L(R 1 R 2 ) = L(R 1 )L(R 2 ) L(R ) = (L(R)) L((R)) = L(R) Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 9 / 15

Example L(a (a b)) = L(a )L(a b) = (L(a)) (L(a) L(b)) = ({a}) ({a} {b}) = {ɛ, a, aa, aaa,...}({a, b}) = {a, aa, aaa,..., b, ab, aab,...} Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 10 / 15

Exercises Write regular expressions for the following languages: The set of all strings over Σ = {a, b}. The set of strings of a s and b s, terminated by ab. The set of strings with an even number of a s followed by an odd number of b s. The set of C identifiers. Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 11 / 15

Regular Definitions Give names to regular expressions and use the names in subsequent expressions, e.g., the set of C identifiers: letter A B Z a b z _ digit 0 1 9 id letter(letter digit) Formally, a regular definition is a sequence of definitions of the form: d 1 r 1 d 2 r 2 d n r n 1 Each d i is a new name such that d i Σ. 2 Each r i is a regular expression over Σ {d 1, d 2,..., d i 1 }. Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 12 / 15

Example Unsigned numbers (integers or floating point), e.g., 5280, 0.01234, 6.336E4, or 1.89E-4: digit 0 1 9 digits digit digit optionalfraction. digits ɛ optionalexponent (E (+ - ɛ) digits) ɛ number digits optionalfraction optionalexponent Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 13 / 15

Extensions of Regular Expressions 1 R + : the positive closure of R, i.e., L(R + ) = L(R) +. 2 R?: zero or one instance of R, i.e., L(R?) = L(R) {ɛ}. 3 [a 1 a 2 a n ]: the shorthand for a 1 a 2 a n. 4 [a 1 -a n ]: the shorthand for [a 1 a 2 a n ], where a 1,..., a n are consecutive symbols. [abc] = a b c [a-z] = a b z. Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 14 / 15

Examples C identifiers: Unsigned numbers: letter [A-Za-z_] digit [0-9] id letter (letter digit) digit [0-9] digits digit + number digits (. digits)? (E [+-]? digits)? Hakjoo Oh COSE312 2017 Spring, Lecture 2 March 12, 2017 15 / 15