Some useful tasks involving language. Finite-State Machines and Regular Languages. More useful tasks involving language. Regular expressions

Similar documents
Regular Expressions and Finite-State Automata. L545 Spring 2008

Regular expressions and automata

Overview. Regular Expressions and Finite-State. Motivation. Regular expressions. RE syntax Additional functions. Regular languages Properties

Finite State Automata Design

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Chapter 5. Finite Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CS 455/555: Finite automata

Inf2A: Converting from NFAs to DFAs and Closure Properties

CS 154. Finite Automata, Nondeterminism, Regular Expressions

UNIT-III REGULAR LANGUAGES

Sri vidya college of engineering and technology

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

COM364 Automata Theory Lecture Note 2 - Nondeterminism

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

Formal Models in NLP

Theory of Computation (I) Yijia Chen Fudan University

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Closure under the Regular Operations

CSE 105 THEORY OF COMPUTATION

Closure under the Regular Operations

Finite-state machines (FSMs)

CS 154, Lecture 3: DFA NFA, Regular Expressions

Regular languages, regular expressions, & finite automata (intro) CS 350 Fall 2018 gilray.org/classes/fall2018/cs350/

Lecture 3: Nondeterministic Finite Automata

Foundations of

Deterministic Finite Automata (DFAs)

Theory of computation: initial remarks (Chapter 11)

Finite Automata. BİL405 - Automata Theory and Formal Languages 1

Finite State Transducers

Uses of finite automata

Deterministic Finite Automata (DFAs)

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Equivalence of DFAs and NFAs

Finite Automata. Finite Automata

CS 121, Section 2. Week of September 16, 2013

Finite Automata Part Two

Theory of computation: initial remarks (Chapter 11)

FINITE STATE AUTOMATA

Part I: Definitions and Properties

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Finite-state Machines: Theory and Applications

UNIT-I. Strings, Alphabets, Language and Operations

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1)

CHAPTER 1 Regular Languages. Contents

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Finite Automata and Formal Languages TMV026/DIT321 LP4 2012

How do regular expressions work? CMSC 330: Organization of Programming Languages

Automata Theory for Presburger Arithmetic Logic

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Computer Sciences Department

CSC173 Workshop: 13 Sept. Notes

Author: Vivek Kulkarni ( )

Context-Free and Noncontext-Free Languages

Formal Languages. We ll use the English language as a running example.

MA/CSSE 474 Theory of Computation

CISC 4090 Theory of Computation

CSE 105 THEORY OF COMPUTATION

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

Equivalence of Regular Expressions and FSMs

Deterministic Finite Automata (DFAs)

Regular expressions and Kleene s theorem

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Büchi Automata and their closure properties. - Ajith S and Ankit Kumar

Non-deterministic Finite Automata (NFAs)

CPS 220 Theory of Computation REGULAR LANGUAGES

Computational Models Lecture 2 1

Regular Expressions. Definitions Equivalence to Finite Automata

Computational Models Lecture 2 1

Intro to Theory of Computation

Finite Automata and Regular languages

Finite-State Transducers

CS21 Decidability and Tractability

CSE 105 THEORY OF COMPUTATION

3515ICT: Theory of Computation. Regular languages

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

Let us first give some intuitive idea about a state of a system and state transitions before describing finite automata.

Nondeterministic Finite Automata

Finite Automata and Formal Languages

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Concatenation. The concatenation of two languages L 1 and L 2

Deterministic Finite Automaton (DFA)

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

jflap demo Regular expressions Pumping lemma Turing Machines Sections 12.4 and 12.5 in the text

Computational Theory


Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Nondeterminism. September 7, Nondeterminism

Methods for the specification and verification of business processes MPB (6 cfu, 295AA)

Deterministic Finite Automata (DFAs)

Finite State Automata

CSE 105 THEORY OF COMPUTATION

Transcription:

Some useful tasks involving language Finite-State Machines and Regular Languages Find all phone numbers in a text, e.g., occurrences such as When you call (614) 292-8833, you reach the fax machine. Find multiple adjacent occurrences of the same word in a text, as in Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01 I read the the book. Determine the language of the following utterance: French or Polish? Czy pasazer jadacy do Warszawy moze jechac przez Londyn? 2 More useful tasks involving language Regular expressions Look up the following words in a dictionary: laughs, became, unidentifiable, Thatcherization Determine the part-of-speech of words like the following, even if you can t find them in the dictionary: conurbation, cadence, disproportionality, lyricism, parlance Such tasks can be addressed using so-called finite-state machines. A regular expression is a description of a set of strings, i.e., a language. They can be used to search for occurrences of these strings A variety of unix tools (grep, sed), editors (emacs), and programming languages (perl, python) incorporate regular expressions. Just like any other formalism, regular expressions as such have no linguistic contents, but they can be used to refer to linguistic units. How can such machines be specified? 3 4

The syntax of regular expressions (1) The syntax of regular expressions (2) Regular expressions consist of strings of characters: c, A100, natural language, 30 years! disjunction: ordinary disjunction: devoured ate, famil(y ies) character classes: [Tt]he, bec[oa]me ranges: [A-Z] (a capital letter) negation:[ˆa] (any symbol but a) [ˆA-Z0-9] (not an uppercase letter or number) counters optionality:? colou?r any number of occurrences: * (Kleene star) [0-9]* years at least one occurrence: + [0-9]+ dollars wildcard for any character:. beg.n for any character in between beg and n 5 6 The syntax of regular expressions (3) Regular languages Operator precedence, from highest to lowest: parentheses () counters * +? character sequences disjunction Note: The various unix tools and languages differ w.r.t. the exact syntax of the regular expressions they allow. 7 How can the class of regular languages which is specified by regular expressions be characterized? Let Σ be the set of all symbols of the language, the alphabet, then: 1. {} is a regular language 2. a Σ: {a} is a regular language 3. If L 1 and L 2 are regular languages, so are: (a) the concatenation of L 1 and L 2 : L 1 L 2 = {xy x L 1, y L 2 } (b) the union of L 1 and L 2 : L 1 L 2 (c) the Kleene closure of L: L = L 0 L 1 L 2... where L i is the language of all strings of length i. 8

Properties of regular languages The regular languages are closed under (L 1 and L 2 regular languages): concatenation: L 1 L 2 set of strings with beginning in L 1 and continuation in L 2 intersection: L 1 L 2 set of strings in both L 1 and L 2 reversal: L R 1 set of the reversal of all strings in L 1 Kleene closure: L 1 set of repeated concatenation of a string in L 1 union: L 1 L 2 set of strings in L 1 or in L 2 complementation: Σ L 1 set of all possible strings that are not in L 1 difference: L 1 L 2 set of strings which are in L 1 but not in L 2 9 10 Finite state machines Defining finite state automata Finite state machines (or automata) (FSM, FSA) recognize or generate regular languages, exactly those specified by regular expressions. Example: Regular expression: colou?r Finite state machine: 0 c 6 o 5 l 4 o 2 r u r 1 3 A finite state automaton is a quintuple (Q, Σ,E, S, F) with Q a finite set of states Σ a finite set of symbols, the alphabet S Q the set of start states F Q the set of final states E a set of edges Q (Σ {ǫ}) Q The transition function d can be defined as d(q,a) = {q Q (q,a,q ) E} 11 12

Language accepted by an FSA Finite state transition networks (FSTN) The extended set of edges Ê Q Σ Q is the smallest set such that (q,σ,q ) E : (q, σ,q ) Ê (q 0,σ 1, q 1 ), (q 1, σ 2, q 2 ) Ê : (q 0,σ 1 σ 2,q 2 ) Ê The language L(A) of a finite state automaton A is defined as Finite state transition networks are graphical descriptions of finite state machines: nodes represent the states start states are marked with a short arrow final states are indicated by a double circle arcs represent the transitions L(A) = {w q s S, q f F,(q s, w,q f ) Ê} 13 14 Example for a finite state transition network Finite state transition tables S0 a c S1 S2 b Regular expression specifying the language generated or accepted by the corresponding FSM: ab cb+ b b S3 Finite state transition tables are an alternative, textual way of describing finite state machines: the rows represent the states start states are marked with a dot after their name final states with a colon the columns represent the alphabet the fields in the table encode the transitions 15 16

The example specified as finite state transition table Some properties of finite state machines a b c d S0. S1 S2 S1 S3: S2 S2,S3: S3: Recognition problem can be solved in linear time (independent of the size of the automaton). There is an algorithm to transform each automaton into a unique equivalent automaton with the least number of states. 17 18 Deterministic Finite State Automata Example: Determinization of FSA A finite state automaton is deterministic iff it has no ǫ transitions and for each state and each symbol there is at most one applicable transition. Every non-deterministic automaton can be transformed into a deterministic one: Define new states representing a disjunction of old states for each non-determinacy which arises. Define arcs for these states corresponding to each transition which is defined in the non-deterministic automaton for one of the disjuncts in the new state names. 19 a 1 b c 2 3 d c a e 4 5 c a 7 e 6 a 1 b 2 c 3 d {3,5} e a a 4 {5,6} 5 e a {4,5} c a 7 c, a 3 6 20

From Automata to Transducers Transducers and determinization Needed: mechanism to keep track of path taken A finite state transducer is a 6-tuple (Q, Σ 1, Σ 2,E, S,F) with Q a finite set of states Σ 1 a finite set of symbols, the input alphabet Σ 2 a finite set of symbols, the output alphabet S Q the set of start states F Q the set of final states E a set of edges Q (Σ 1 {ǫ}) Q (Σ 2 {ǫ}) 21 A finite state transducer understood as consuming an input and producing an output cannot generally be determinized. Example: a:b a:b 3 a:c a:c b:b c:c 22 Summary Reading assignment 2 Notations for characterizing regular languages: Regular expressions Finite state transition networks Finite state transition tables Finite state machines and regular languages: Definitions and some properties Finite state transducers Chapter 1 Finite State Techniques of course notes Chapter 2 Regular expressions and automata of Jurafsky and Martin (2000) 23 24