Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research


Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers
Mehryar Mohri, Courant Institute and Google Research, mohri@cims.nyu.com

Preliminaries
- Finite alphabet $\Sigma$, empty string $\epsilon$.
- Set of all strings over $\Sigma$: $\Sigma^*$ (free monoid).
- Length of a string $x \in \Sigma^*$: $|x|$.
- Mirror image or reverse of a string $x = x_1 \cdots x_n$: $x^R = x_n \cdots x_1$.
- A language $L$: subset of $\Sigma^*$.
Mehryar Mohri - Speech Recognition, page 2, Courant Institute, NYU

Rational Operations
Rational operations over languages:
- union, also denoted $L_1 + L_2$: $L_1 \cup L_2 = \{x \in \Sigma^* : x \in L_1 \vee x \in L_2\}$.
- concatenation: $L_1 L_2 = \{x = uv \in \Sigma^* : u \in L_1 \wedge v \in L_2\}$.
- closure: $L^* = \bigcup_{n=0}^{\infty} L^n$, where $L^n = \underbrace{L \cdots L}_{n}$.
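The three operations can be tried out on finite languages represented as Python sets of strings. This is a minimal sketch: the function names are illustrative, and since $L^*$ is infinite in general, `closure_up_to` only computes the finite union $L^0 \cup \dots \cup L^{n}$ for a chosen bound.

```python
from itertools import product

def union(l1, l2):
    """L1 + L2: strings that belong to L1 or to L2."""
    return l1 | l2

def concat(l1, l2):
    """L1 L2: every string uv with u in L1 and v in L2."""
    return {u + v for u, v in product(l1, l2)}

def power(l, n):
    """L^n: L concatenated with itself n times (L^0 = {empty string})."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result

def closure_up_to(l, max_n):
    """Finite approximation of L*: the union of L^0, ..., L^max_n."""
    result = set()
    for n in range(max_n + 1):
        result |= power(l, n)
    return result

L1 = {"a", "ab"}
L2 = {"b", ""}
print(sorted(union(L1, L2)))
print(sorted(concat(L1, L2)))
print(sorted(closure_up_to({"a"}, 2)))
```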

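Regular or Rational Languages
Definition: closure under rational operations of $\Sigma^*$. Thus, $\mathrm{Rat}(\Sigma^*)$ is the smallest subset $\mathcal{L}$ of $2^{\Sigma^*}$ verifying
- $\emptyset \in \mathcal{L}$;
- $\forall x \in \Sigma^*, \{x\} \in \mathcal{L}$;
- $\forall L_1, L_2 \in \mathcal{L}$: $L_1 \cup L_2 \in \mathcal{L}$, $L_1 L_2 \in \mathcal{L}$, $L_1^* \in \mathcal{L}$.
Examples of regular languages over $\Sigma = \{a, b, c\}$: $\Sigma^*$, $(a + b)^* c$, $a b^n c$, $(a + (b + c)^*)^* c$.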

Finite Automata
Definition: a finite automaton $A$ over the alphabet $\Sigma$ is a 4-tuple $(Q, I, F, E)$ where $Q$ is a finite set of states, $I \subseteq Q$ a set of initial states, $F \subseteq Q$ a set of final states, and $E$ a multiset of transitions, which are elements of $Q \times (\Sigma \cup \{\epsilon\}) \times Q$.
- A path $\pi$ in an automaton $A = (Q, I, F, E)$ is an element of $E^*$.
- A path from a state in $I$ to a state in $F$ is called an accepting path.
- Language $L(A)$ accepted by $A$: set of strings labeling accepting paths.
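The definition translates almost directly into code. The sketch below stores an automaton as a Python tuple `(Q, I, F, E)` with `E` a list of `(source, label, destination)` triples, and decides membership in $L(A)$ by tracking the set of states reachable on the input read so far. The choice of the empty string as the $\epsilon$ label and the three-state example automaton are assumptions made for illustration.

```python
EPS = ""  # label used for epsilon-transitions (a convention of this sketch)

def eps_closure(states, E):
    """All states reachable from `states` by paths labeled with epsilon only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (src, label, dst) in E:
            if src == p and label == EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(A, x):
    """True iff some accepting path of A is labeled with the string x."""
    Q, I, F, E = A
    current = eps_closure(I, E)
    for symbol in x:
        step = {dst for (src, label, dst) in E
                if src in current and label == symbol}
        current = eps_closure(step, E)
    return bool(current & F)

# Hypothetical automaton accepting a b*: 0 -a-> 1, 1 -b-> 1, 1 -eps-> 2 (final).
A = ({0, 1, 2}, {0}, {2}, [(0, "a", 1), (1, "b", 1), (1, EPS, 2)])
print(accepts(A, "abb"))  # True
print(accepts(A, "ba"))   # False
```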

Finite Automata - Example
[Figure: an automaton with states 0, 1, and 2.]

Finite Automata - Some Properties
- Trim: any state lies on some accepting path.
- Unambiguous: no two accepting paths have the same label.
- Deterministic: unique initial state, two transitions leaving the same state have different labels.
- Complete: at least one outgoing transition labeled with any alphabet element at any state.
- Acyclic: no path with a cycle.
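Three of these properties are easy to check mechanically. The following sketch assumes the `(Q, I, F, E)` tuple layout with `E` a list of `(source, label, destination)` triples; the helper names and the example automaton are hypothetical. `trim` keeps exactly the states that are reachable from an initial state and from which a final state is reachable.

```python
def is_deterministic(A):
    """Unique initial state and no two transitions leaving a state share a label."""
    Q, I, F, E = A
    if len(I) != 1:
        return False
    seen = set()
    for (src, label, dst) in E:
        if (src, label) in seen:
            return False
        seen.add((src, label))
    return True

def is_complete(A, alphabet):
    """Every state has an outgoing transition for every alphabet element."""
    Q, I, F, E = A
    out = {(src, label) for (src, label, dst) in E}
    return all((q, a) in out for q in Q for a in alphabet)

def reachable(start, edges):
    """States reachable from `start` following (source, destination) pairs."""
    seen, stack = set(start), list(start)
    while stack:
        p = stack.pop()
        for (src, dst) in edges:
            if src == p and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def trim(A):
    """Keep only the states that lie on some accepting path."""
    Q, I, F, E = A
    forward = reachable(I, [(s, d) for (s, _, d) in E])
    backward = reachable(F, [(d, s) for (s, _, d) in E])
    useful = forward & backward
    kept = [(s, a, d) for (s, a, d) in E if s in useful and d in useful]
    return useful, set(I) & useful, set(F) & useful, kept

# Hypothetical example: state 3 is reachable but leads to no final state.
A = ({0, 1, 2, 3}, {0}, {2}, [(0, "a", 1), (1, "b", 2), (0, "b", 3)])
print(is_deterministic(A), is_complete(A, "ab"))  # True False
print(trim(A)[0])
```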

Normalized Automata
Definition: a finite automaton is normalized if
- it has a unique initial state with no incoming transition.
- it has a unique final state with no outgoing transition.
[Figure: an automaton $A$ with initial state $i$ and final state $f$.]

Elementary Normalized Automaton
Definition: normalized automaton accepting an element $a \in \Sigma \cup \{\epsilon\}$ constructed as follows.
[Figure: a single transition labeled $a$ from state 0 to state 1.]

Normalized Automata: Union
Construction: the union of two normalized automata is a normalized automaton constructed as follows.
[Figure: automata $A_1$ (from $i_1$ to $f_1$) and $A_2$ (from $i_2$ to $f_2$) placed between a new initial state $i$ and a new final state $f$.]

Normalized Automata: Concatenation
Construction: the concatenation of two normalized automata is a normalized automaton constructed as follows.
[Figure: automaton $A_1$ (from $i_1$ to $f_1$) followed by automaton $A_2$ (from $i_2$ to $f_2$).]

Normalized Automata: Closure
Construction: the closure of a normalized automaton is a normalized automaton constructed as follows.
[Figure: automaton $A$ (from $i$ to $f$) placed between a new initial state $i_0$ and a new final state $f_0$.]

Normalized Automata - Properties
Construction properties:
- each rational operation requires creating at most two states.
- each state has at most two outgoing transitions.
- the complexity of each operation is linear.

Thompson's Construction
Proposition: let $r$ be a regular expression over the alphabet $\Sigma$. Then, there exists a normalized automaton $A$ with at most $2|r|$ states representing $r$.
Proof: (Thompson, 1968)
- linear-time context-free parser to parse the regular expression.
- construction of a normalized automaton starting from elementary expressions and following the operations of the tree.

Thompson's Construction - Example
[Figure: automaton with states 0 through 9, transitions labeled $a$, $b$, $c$, and nine $\epsilon$-transitions.]
Normalized automaton for the regular expression $a b^* + c$.
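The construction can be sketched with one combinator per rational operation. In the code below, a normalized automaton is a triple `(initial, final, transitions)`; the wiring of the $\epsilon$-transitions follows the usual textbook presentation of Thompson's construction and is an assumption here, since the figures of the slides are not available. The parsing step is skipped: the expression $a b^* + c$ is assembled by hand from the combinators.

```python
from itertools import count

EPS = None            # label of epsilon-transitions (convention of this sketch)
_fresh = count()      # generator of new state numbers

def symbol(a):
    """Elementary normalized automaton for a single label a."""
    i, f = next(_fresh), next(_fresh)
    return (i, f, [(i, a, f)])

def union(A1, A2):
    """Two new states: a new initial state and a new final state."""
    (i1, f1, E1), (i2, f2, E2) = A1, A2
    i, f = next(_fresh), next(_fresh)
    extra = [(i, EPS, i1), (i, EPS, i2), (f1, EPS, f), (f2, EPS, f)]
    return (i, f, E1 + E2 + extra)

def concat(A1, A2):
    """No new state: the final state of A1 is linked to the initial state of A2."""
    (i1, f1, E1), (i2, f2, E2) = A1, A2
    return (i1, f2, E1 + E2 + [(f1, EPS, i2)])

def star(A):
    """Two new states, plus a loop back and a bypass for the empty string."""
    i, f, E = A
    i0, f0 = next(_fresh), next(_fresh)
    extra = [(i0, EPS, i), (f, EPS, f0), (f, EPS, i), (i0, EPS, f0)]
    return (i0, f0, E + extra)

def states(A):
    i, f, E = A
    return {i, f} | {s for (s, _, _) in E} | {d for (_, _, d) in E}

def accepts(A, x):
    """Simulate A on x by tracking the set of reachable states."""
    i, f, E = A
    def close(S):
        S, stack = set(S), list(S)
        while stack:
            p = stack.pop()
            for (s, a, d) in E:
                if s == p and a is EPS and d not in S:
                    S.add(d)
                    stack.append(d)
        return S
    cur = close({i})
    for ch in x:
        cur = close({d for (s, a, d) in E if s in cur and a == ch})
    return f in cur

# a b* + c, built bottom-up as the parse tree of the expression would dictate.
A = union(concat(symbol("a"), star(symbol("b"))), symbol("c"))
print(len(states(A)))        # 10
print(accepts(A, "abbb"))    # True
```

The state count matches the ten states (0 through 9) of the example, and no state ends up with more than two outgoing transitions.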

Regular Languages and Finite Automata
Theorem: a language is regular iff it can be accepted by a finite automaton.
Proof: (Kleene, 1956) let $A = (Q, I, F, E)$ be a finite automaton.
- for $(i, j, k) \in [1, |Q|] \times [1, |Q|] \times [0, |Q|]$ define $X_{ij}^{k} = \{i \to q_1 \to q_2 \to \cdots \to q_n \to j : n \geq 0, q_i \leq k\}$.
- $X_{ij}^{0}$ is regular for all $(i, j)$ since $E$ is finite.
- by recurrence, $X_{ij}^{k}$ is regular for all $(i, j, k)$ since $X_{ij}^{k+1} = X_{ij}^{k} + X_{i,k+1}^{k} (X_{k+1,k+1}^{k})^* X_{k+1,j}^{k}$.
- $L(A) = \bigcup_{i \in I, f \in F} X_{if}^{|Q|}$ is thus regular.

Regular Languages and Finite Automata
Proof: the converse holds by Thompson's construction.
Notes:
- a more general theorem (Schützenberger, 1961) holds for weighted automata.
- not all languages are regular, e.g., $L = \{a^n b^n : n \in \mathbb{N}\}$ is not regular. Let $A$ be an automaton. If $L \subseteq L(A)$, then for large enough $n$, $a^n b^n$ corresponds to a path with a cycle: $a^n b^n = p u q$, $p u^* q \subseteq L(A)$, which implies $L(A) \neq L$.

Left Syntactic Congruence
Definition: for any language $L \subseteq \Sigma^*$, the left syntactic congruence is the equivalence relation defined by
$u \equiv_L v \Leftrightarrow u^{-1}L = v^{-1}L$,
where for any $u \in \Sigma^*$, $u^{-1}L$ is defined by $u^{-1}L = \{w : uw \in L\}$.
$u^{-1}L$ is sometimes called the partial derivative of $L$ with respect to $u$ and denoted $\frac{\partial L}{\partial u}$.
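For a finite language stored as a set of strings, $u^{-1}L$ can be computed by stripping the prefix $u$ from the strings that start with it. The sketch below is only an illustration on a small hypothetical language; `left_quotient` and `congruent` are names chosen here.

```python
def left_quotient(u, L):
    """u^{-1}L = {w : uw in L} for a finite language L given as a set of strings."""
    return {x[len(u):] for x in L if x.startswith(u)}

def congruent(u, v, L):
    """u and v are left-congruent iff they have the same set of suffixes in L."""
    return left_quotient(u, L) == left_quotient(v, L)

L = {"ab", "aab", "b"}
print(sorted(left_quotient("a", L)))   # ['ab', 'b']
print(congruent("ab", "b", L))         # True: both quotients are {""}
```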

Regular Languages - Characterization
Theorem: a language $L$ is regular iff the set of $u^{-1}L$ is finite ($\equiv_L$ has finite index).
Proof: let $A = (Q, I, F, E)$ be a trim deterministic automaton accepting $L$ (existence seen later).
- let $\delta$ be the partial transition function. Then, $u \mathrel{R} v \Leftrightarrow \delta(i, u) = \delta(i, v)$ also defines an equivalence relation with index $|Q|$.
- since $\delta(i, u) = \delta(i, v) \Rightarrow u^{-1}L = v^{-1}L$, the index of $\equiv_L$ is at most $|Q|$, thus finite.

Regular Languages - Characterization
Proof: conversely, if the set of $u^{-1}L$ is finite, then the automaton $A = (Q, I, F, E)$ defined by
- $Q = \{u^{-1}L : u \in \Sigma^*\}$;
- $i = \epsilon^{-1}L = L$, $I = \{i\}$;
- $F = \{u^{-1}L : u \in L\}$;
- $E = \{(u^{-1}L, a, (ua)^{-1}L) : u \in \Sigma^*\}$
is well defined since $u^{-1}L = v^{-1}L \Rightarrow (ua)^{-1}L = (va)^{-1}L$, and accepts exactly $L$.
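When $L$ is finite, the quotients $u^{-1}L$ are finite sets, so the automaton of this proof can be built explicitly. The sketch below explores the quotients starting from $L$ itself and uses the identity $(ua)^{-1}L = a^{-1}(u^{-1}L)$; restricting to a finite language and the function names are assumptions of the example, not part of the proof.

```python
def left_quotient(u, L):
    """u^{-1}L as a frozenset, so that quotients can serve as automaton states."""
    return frozenset(x[len(u):] for x in L if x.startswith(u))

def residual_automaton(L, alphabet):
    """Automaton whose states are the distinct quotients u^{-1}L of a finite L."""
    initial = frozenset(L)                 # epsilon^{-1}L = L
    Q, E, todo = {initial}, set(), [initial]
    while todo:
        q = todo.pop()
        for a in alphabet:
            r = left_quotient(a, q)        # (ua)^{-1}L = a^{-1}(u^{-1}L)
            E.add((q, a, r))
            if r not in Q:
                Q.add(r)
                todo.append(r)
    F = {q for q in Q if "" in q}          # u^{-1}L contains epsilon iff u is in L
    return Q, {initial}, F, E

def accepts(A, x):
    """Run the deterministic, complete automaton A on x."""
    Q, I, F, E = A
    delta = {(p, a): r for (p, a, r) in E}
    (q,) = I
    for ch in x:
        q = delta[(q, ch)]
    return q in F

L = {"ab", "aab", "b"}
A = residual_automaton(L, "ab")
print(len(A[0]))           # 5 distinct quotients, including the empty set
print(accepts(A, "aab"))   # True
```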

Illustration
Minimal deterministic automaton for $(a + b)^* ab$:
[Figure: automaton with states $L$, $a^{-1}L$, and $(ab)^{-1}L$.]

ε-removal
Theorem: any finite automaton $A = (Q, I, F, E)$ admits an equivalent automaton with no ε-transition.
Proof: for any state $q \in Q$, let $\epsilon[q]$ denote the set of states reached from $q$ by paths labeled with $\epsilon$. Define $A' = (Q', I', F', E')$ by
- $Q' = \{\epsilon[q] : q \in Q\}$, $I' = \bigcup_{q \in I} \{\epsilon[q]\}$, $F' = \{\epsilon[q] : \epsilon[q] \cap F \neq \emptyset\}$.
- $E' = \{(\epsilon[p], a, \epsilon[q]) : (p', a, q') \in E, p' \in \epsilon[p], q' \in \epsilon[q]\}$.
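A closure-based ε-removal can be sketched in a few lines. Note that the code below uses a common state-preserving variant, which is an assumption of this example rather than the exact construction of the proof: the states of $A$ are kept as they are, each state $p$ inherits the non-ε transitions leaving any state of $\epsilon[p]$, and $p$ becomes final when $\epsilon[p]$ contains a final state.

```python
EPS = None  # label of epsilon-transitions (convention of this sketch)

def eps_closure(q, E):
    """States reached from q by paths labeled with epsilon (q itself included)."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for (s, a, d) in E:
            if s == p and a is EPS and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def remove_epsilons(A):
    """Equivalent automaton without epsilon-transitions, on the same state set."""
    Q, I, F, E = A
    new_E, new_F = set(), set()
    for p in Q:
        closure = eps_closure(p, E)
        if closure & set(F):
            new_F.add(p)
        for (s, a, d) in E:
            if s in closure and a is not EPS:
                new_E.add((p, a, d))   # a set: duplicate transitions are merged
    return Q, set(I), new_F, new_E

# Hypothetical automaton for a b*: 0 -a-> 1, 1 -eps-> 2, 2 -b-> 2, final state 2.
A = ({0, 1, 2}, {0}, {2}, [(0, "a", 1), (1, EPS, 2), (2, "b", 2)])
Q2, I2, F2, E2 = remove_epsilons(A)
print(sorted(F2))
print(sorted(E2))
```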

ε-removal - Illustration
[Figure: an automaton with states 0, 1, 2, 3 and the resulting automaton with states {0, 1}, {0, 2}, {0, 1, 3}, {0}.]

Determinization
Theorem: any automaton $A = (Q, I, F, E)$ without ε-transitions admits an equivalent deterministic automaton.
Proof: subset construction: $A' = (Q', I', F', E')$ with
- $Q' = 2^Q$.
- $I' = \{s \in Q' : s \cap I \neq \emptyset\}$.
- $F' = \{s \in Q' : s \cap F \neq \emptyset\}$.
- $E' = \{(s, a, s') : (q, a, q') \in E, q \in s, q' \in s'\}$.
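In practice the subset construction is usually run on demand, creating only the subsets reachable from the set of initial states instead of all of $2^Q$. The sketch below follows that standard on-the-fly variant, which is an assumption of this example: the start state is the subset $I$, and the destination of a subset $s$ on label $a$ is the set of all states reachable from $s$ by an $a$-transition. The example automaton is hypothetical.

```python
def determinize(A, alphabet):
    """On-the-fly subset construction for an automaton without epsilon-transitions."""
    Q, I, F, E = A
    start = frozenset(I)
    states, transitions, todo = {start}, {}, [start]
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = frozenset(d for (p, label, d) in E if p in s and label == a)
            if not t:
                continue                  # no a-successor: transition left undefined
            transitions[(s, a)] = t
            if t not in states:
                states.add(t)
                todo.append(t)
    finals = {s for s in states if s & set(F)}
    return states, start, finals, transitions

# Hypothetical non-deterministic automaton for (a + b)* a b.
A = ({0, 1, 2}, {0}, {2}, [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)])
states, start, finals, transitions = determinize(A, "ab")
print(len(states))   # 3 reachable subsets: {0}, {0, 1}, {0, 2}
```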

Determinization - Illustration
[Figure: an automaton with states 0, 1, 2 and the resulting deterministic automaton with states {0}, {1}, {1, 2}, {2}, {0, 1}.]

Completion
Theorem: any deterministic automaton admits an equivalent complete deterministic automaton.
Proof: constructive, see example.
[Figure: a deterministic automaton with states 0 to 3 and its completion, which has the additional state 4.]
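The usual way to make the proof constructive is to add one non-final sink state that absorbs every missing transition. The sketch below assumes that reading of the example; the name `sink` and the two-state automaton are hypothetical, and the sink label is assumed not to clash with an existing state.

```python
def complete(A, alphabet, sink="sink"):
    """Add a non-final sink state receiving every missing transition."""
    Q, I, F, E = A
    defined = {(s, a) for (s, a, _) in E}
    missing = [(q, a) for q in Q for a in alphabet if (q, a) not in defined]
    if not missing:
        return A                                   # already complete
    new_Q = set(Q) | {sink}
    new_E = list(E) + [(q, a, sink) for (q, a) in missing]
    new_E += [(sink, a, sink) for a in alphabet]   # the sink loops on every label
    return new_Q, I, F, new_E

# Hypothetical deterministic automaton accepting only the string "a".
A = ({0, 1}, {0}, {1}, [(0, "a", 1)])
Qc, Ic, Fc, Ec = complete(A, "ab")
print(sorted(Ec, key=str))
```

Because the sink is not final and cannot be left, no new string is accepted and the language is unchanged.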

Complementation
Theorem: let $A = (Q, I, F, E)$ be a deterministic automaton, then there exists a deterministic automaton accepting $\overline{L(A)}$.
Proof: by the previous theorem, we can assume $A$ complete. The automaton $B = (Q, I, Q \setminus F, E)$ obtained from $A$ by making non-final states final and final states non-final accepts exactly $\overline{L(A)}$.
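On a complete deterministic automaton, complementation is a one-line operation. The sketch below swaps final and non-final states and runs both automata on a few strings; the example automaton (strings over $\{a, b\}$ ending in $a$) is hypothetical.

```python
def complement(A):
    """Swap final and non-final states; A must be deterministic and complete."""
    Q, I, F, E = A
    return Q, I, set(Q) - set(F), E

def accepts(A, x):
    """Run a complete deterministic automaton on x."""
    Q, I, F, E = A
    delta = {(s, a): d for (s, a, d) in E}
    (q,) = I
    for ch in x:
        q = delta[(q, ch)]
    return q in F

# Hypothetical complete deterministic automaton: strings over {a, b} ending in a.
A = ({0, 1}, {0}, {1},
     [(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 0)])
B = complement(A)
print(accepts(A, "ba"), accepts(B, "ba"))   # True False
print(accepts(A, ""), accepts(B, ""))       # False True
```

Completeness matters: on an incomplete automaton, strings that fall off the automaton would be rejected both before and after the swap.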

Complementation - Illustration
[Figure: a complete deterministic automaton with states 0 to 4 and its complement on the same states.]

Regular Languages - Properties
Theorem: regular languages are closed under rational operations, intersection, complementation, reversal, morphism, inverse morphism, and quotient with any set.
Proof:
- closure under rational operations holds by definition.
- intersection: use De Morgan's law.
- complementation: use algorithm.
- others: algorithms and equivalence relation.
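Reversal is one of the cases handled by a direct algorithm: exchange initial and final states and flip every transition. The sketch below illustrates this on a hypothetical automaton without ε-transitions; it is one possible algorithm, not necessarily the one intended in the lecture.

```python
def reverse(A):
    """Automaton accepting the mirror images of the strings accepted by A."""
    Q, I, F, E = A
    return Q, set(F), set(I), [(d, a, s) for (s, a, d) in E]

def accepts(A, x):
    """Membership test for an automaton without epsilon-transitions."""
    Q, I, F, E = A
    current = set(I)
    for ch in x:
        current = {d for (s, a, d) in E if s in current and a == ch}
    return bool(current & set(F))

# Hypothetical automaton for a b*; its reversal accepts b* a.
A = ({0, 1}, {0}, {1}, [(0, "a", 1), (1, "b", 1)])
R = reverse(A)
print(accepts(A, "abb"), accepts(R, "bba"))   # True True
print(accepts(R, "abb"))                      # False
```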

Rational Relations
Definition: closure under rational operations of the monoid $\Sigma^* \times \Delta^*$, where $\Sigma$ and $\Delta$ are finite alphabets, denoted by $\mathrm{Rat}(\Sigma^* \times \Delta^*)$.
Examples: (, ), (, ) (, )+(, ).

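Rational Relations - Characterization
Theorem (Nivat, 1968): $R \in \mathrm{Rat}(\Sigma^* \times \Delta^*)$, that is, $R$ is a rational relation, iff there exists a regular language $L \subseteq (\Sigma \cup \Delta)^*$ such that
$R = \{(\pi_\Sigma(x), \pi_\Delta(x)) : x \in L\}$,
where $\pi_\Sigma$ is the projection of $(\Sigma \cup \Delta)^*$ over $\Sigma^*$ and $\pi_\Delta$ the projection over $\Delta^*$.
Proof: use the surjective morphism $\pi : (\Sigma \cup \Delta)^* \to \Sigma^* \times \Delta^*$, $x \mapsto (\pi_\Sigma(x), \pi_\Delta(x))$.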
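The two projections are simple to compute when the alphabets are disjoint: each one erases the symbols of the other alphabet. The sketch below makes that disjointness assumption, uses a small finite language in place of a regular one, and picks hypothetical alphabets $\Sigma = \{a, b\}$ and $\Delta = \{x, y\}$.

```python
def project(x, alphabet):
    """Erase from x every symbol that is not in `alphabet`."""
    return "".join(ch for ch in x if ch in alphabet)

def relation_from_language(L, sigma, delta):
    """R = {(pi_Sigma(x), pi_Delta(x)) : x in L}, assuming disjoint alphabets."""
    return {(project(x, sigma), project(x, delta)) for x in L}

SIGMA, DELTA = set("ab"), set("xy")   # hypothetical disjoint alphabets
L = {"axby", "abxy", "a"}
R = relation_from_language(L, SIGMA, DELTA)
print(sorted(R))   # [('a', ''), ('ab', 'xy')]
```

Two different strings of $L$ can project to the same pair, which is why the morphism $\pi$ is surjective but not injective.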

Transductions
Definition: a function $\tau : \Sigma^* \to 2^{\Delta^*}$ is called a transduction from $\Sigma^*$ to $\Delta^*$.
- relation associated to $\tau$: $R(\tau) = \{(x, y) \in \Sigma^* \times \Delta^* : y \in \tau(x)\}$.
- transduction associated to a relation $R$: $\forall x \in \Sigma^*, \tau(x) = \{y : (x, y) \in R\}$.
- rational transductions: transductions with rational relations.
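The correspondence between a transduction and its relation can be checked on a finite example. In the sketch below, `relation_of` is restricted to a finite set of inputs, an assumption needed to keep the relation finite, and the sample relation is hypothetical.

```python
def transduction_of(R):
    """Transduction associated to a relation: tau(x) = {y : (x, y) in R}."""
    def tau(x):
        return {y for (u, y) in R if u == x}
    return tau

def relation_of(tau, inputs):
    """Relation associated to tau, restricted to a finite set of inputs."""
    return {(x, y) for x in inputs for y in tau(x)}

R = {("a", "x"), ("a", "xx"), ("b", "")}   # hypothetical finite relation
tau = transduction_of(R)
print(sorted(tau("a")))                    # ['x', 'xx']
print(relation_of(tau, {"a", "b"}) == R)   # True: the round trip gives R back
```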

Finite-State Transducers
Definition: a finite-state transducer $T$ over the alphabets $\Sigma$ and $\Delta$ is a 4-tuple $(Q, I, F, E)$ where $Q$ is a finite set of states, $I \subseteq Q$ a set of initial states, $F \subseteq Q$ a set of final states, and $E$ a multiset of transitions, which are elements of $Q \times (\Sigma \cup \{\epsilon\}) \times (\Delta \cup \{\epsilon\}) \times Q$.
$T$ defines a relation via the pair of input and output labels of its accepting paths,
$R(T) = \{(x, y) \in \Sigma^* \times \Delta^* : I \xrightarrow{x:y} F\}$.
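Applying a transducer to an input string amounts to following all paths whose input labels spell the string and collecting their output labels. The sketch below explores configurations `(state, input position, output so far)`; it assumes that $T$ has no cycle made only of transitions with $\epsilon$ input, since otherwise the set of outputs can be infinite. The one-state example transducer is hypothetical.

```python
EPS = ""  # epsilon as the empty string, for both input and output labels

def transduce(T, x):
    """Return {y : (x, y) in R(T)}; assumes T has no cycle of epsilon-input transitions."""
    Q, I, F, E = T
    results = set()
    stack = [(q, 0, "") for q in I]
    seen = set(stack)
    while stack:
        q, pos, out = stack.pop()
        if pos == len(x) and q in F:
            results.add(out)                    # an accepting path has been completed
        for (s, a, b, d) in E:
            if s != q:
                continue
            if a == EPS:
                nxt = (d, pos, out + b)         # consume no input symbol
            elif pos < len(x) and x[pos] == a:
                nxt = (d, pos + 1, out + b)     # consume one input symbol
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return results

# Hypothetical transducer: a is rewritten as x, b is rewritten as y or deleted.
T = ({0}, {0}, {0}, [(0, "a", "x", 0), (0, "b", "y", 0), (0, "b", EPS, 0)])
print(sorted(transduce(T, "ab")))   # ['x', 'xy']
```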

Rational Relations and Transducers
Theorem: a transduction is rational iff it can be realized by a finite-state transducer.
Proof: Nivat's theorem combined with Kleene's theorem, and construction of a normalized transducer from a finite-state transducer.

References
- Kleene, S. C. 1956. Representation of events in nerve nets and finite automata. Automata Studies.
- Nivat, Maurice. 1968. Transductions des langages de Chomsky. Annales 18, Institut Fourier.
- Schützenberger, Marcel-Paul. 1961. On the definition of a family of automata. Information and Control, 4.
- Thompson, K. 1968. Regular expression search algorithm. Comm. ACM, 11.