FS Properties and FSTs

FS Properties and FSTs Chris Dyer Algorithms for NLP 11-711

Announcements
- HW1 has been posted; due in class in 2 weeks.

Goals of Today's Lecture
- Understand properties of regular languages.
- Understand Brzozowski derivatives and how to use them to prove languages are regular or not regular.
- Understand relations between formal languages and the definition of finite state transducers.
- Understand the operations on finite state transducers (FSTs): union, concatenation, Kleene*, composition, and (sometimes) intersection.

Properties of FSAs
- Any NFA-ε can be made into an NFA.
- Any NFA can be made into a DFA.
- Thus, DFAs, NFAs, and NFA-ε's all define the same set of languages.
- REs can be converted into NFA-ε's (and thus to DFAs).
- Can we convert any DFA into an RE?
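The NFA-ε to DFA conversions can be illustrated with the standard subset (powerset) construction. This is a minimal sketch, not the lecture's own code: the encoding is an assumption, with `delta` mapping `(state, symbol)` to a set of states and the empty string `""` standing for ε. The function names and the example NFA are hypothetical.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using only ε-moves (label "")."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def determinize(alphabet, delta, start, finals):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = eps_closure({start}, delta)
    d_delta, seen, todo = {}, {d_start}, [d_start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            moved = {r for q in subset for r in delta.get((q, a), ())}
            target = eps_closure(moved, delta)
            d_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final if it contains at least one final NFA state.
    d_finals = {s for s in seen if s & set(finals)}
    return seen, d_delta, d_start, d_finals


def dfa_accepts(d_delta, d_start, d_finals, word):
    """Run the determinized automaton on `word`."""
    state = d_start
    for a in word:
        state = d_delta[(state, a)]
    return state in d_finals


# Hypothetical NFA for strings over {a, b} that end in "ab".
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
states, d_delta, d_start, d_finals = determinize("ab", nfa_delta, 0, {2})
```

Only the subsets reachable from the start state are built, so the result can be much smaller than the full powerset.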

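DFAs into REs
- Yes! Algorithm due to Kleene (1956).
- Given a DFA $M = \langle Q, \Sigma, \delta, q_0, F \rangle$ with states numbered $0, 1, 2, \ldots, n$.
- We will incrementally construct $R^k_{ij}$, which gives the expression for the language containing all strings that take $M$ from state $i$ to state $j$ without going through a state higher than $k$, for each pair of states $(i, j)$.
- The expression for the whole language is
$$R = \bigcup_{j \in F} R^n_{0j}$$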


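RE construction
- Base:
$$R^{-1}_{ii} = (\sigma_1 \mid \sigma_2 \mid \ldots \mid \sigma_m \mid \varepsilon) \iff \delta(q_i, \sigma_1) = \ldots = \delta(q_i, \sigma_m) = \delta(q_i, \varepsilon) = q_i$$
$$R^{-1}_{ij} = (\sigma_1 \mid \sigma_2 \mid \ldots \mid \sigma_m) \iff \delta(q_i, \sigma_1) = \ldots = \delta(q_i, \sigma_m) = q_j$$
- Recursion:
$$R^k_{ij} = R^{k-1}_{ik} \, (R^{k-1}_{kk})^* \, R^{k-1}_{kj} \mid R^{k-1}_{ij}$$
- Thus, REs, DFAs, NFAs, and NFA-ε's all represent the full set of regular languages.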
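The base case and recursion can be sketched directly in code. This is an illustrative sketch under stated assumptions, not the lecture's implementation: states are numbered 0 to n-1 with 0 as the start state, `delta` maps `(state, symbol)` to a state, symbols are single characters with no special meaning in regexes, and the output uses Python `re` syntax (with `(?:)` for ε and `None` for the empty language). All function names are hypothetical.

```python
import re

EMPTY = None    # regex for the empty language
EPS = "(?:)"    # regex matching only the empty string


def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY or r == s:
        return r
    return f"(?:{r}|{s})"


def concat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return r + s


def star(r):
    if r is EMPTY or r == EPS:
        return EPS
    return f"(?:{r})*"


def dfa_to_regex(n, alphabet, delta, finals):
    """Kleene's construction for a DFA with states 0..n-1 and start state 0."""
    # Base case: direct transitions, plus ε on the diagonal.
    R = [[EMPTY] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = EPS
        for a in alphabet:
            j = delta[(i, a)]
            R[i][j] = union(R[i][j], a)
    # Recursion: allow state k as an intermediate state, one k at a time.
    for k in range(n):
        R = [[union(concat(concat(R[i][k], star(R[k][k])), R[k][j]), R[i][j])
              for j in range(n)] for i in range(n)]
    # Union over all final states j of the paths from the start state to j.
    result = EMPTY
    for j in finals:
        result = union(result, R[0][j])
    return result


# Hypothetical DFA: strings over {a, b} with an even number of a's.
even_a = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
pattern = dfa_to_regex(2, "ab", even_a, {0})
matches_abab = re.fullmatch(pattern, "abab") is not None
```

The expressions produced this way are correct but far from minimal; only the most basic simplifications are applied.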


Closure Properties
Regular languages are closed under
- Intersection (cf. dual control proof from recitation)
- Finite union (how would you prove this?)
- Kleene*
- Concatenation
- Complementation (construct a DFA, then swap final and nonfinal states)
- Difference: $A - B = A \cap \overline{B}$
Other definitions of complementation and difference are possible. (Exam Q?)
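Three of these closure properties can be sketched on DFAs. This is a minimal sketch, assuming complete DFAs encoded as hypothetical tuples `(states, delta, start, finals)`, with `delta` mapping `(state, symbol)` to a state. Intersection uses the standard product construction, which may or may not be the proof given in recitation.

```python
def complement(states, delta, start, finals):
    """Same DFA with final and nonfinal states swapped (DFA must be complete)."""
    return states, delta, start, set(states) - set(finals)


def intersect(alphabet, dfa1, dfa2):
    """Product construction: run both DFAs in lockstep on the same input."""
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states if p in finals1 and q in finals2}
    return states, delta, (start1, start2), finals


def difference(alphabet, dfa1, dfa2):
    """A - B computed as A intersected with the complement of B."""
    return intersect(alphabet, dfa1, complement(*dfa2))


def accepts(dfa, word):
    _, delta, state, finals = dfa
    for a in word:
        state = delta[(state, a)]
    return state in finals


# Hypothetical example DFAs over {a, b}.
EVEN_A = ({0, 1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
both = intersect("ab", EVEN_A, ENDS_B)
even_not_b = difference("ab", EVEN_A, ENDS_B)
```

Union can be obtained from the same product by marking a pair final when either component is final.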

Minimization
- For a regular language L there exists a unique minimal DFA A (up to a renaming of states) that accepts L.
- The proof is constructive and provides an algorithm for DFA minimization.
- See the book for details. You will not be required to understand the details of the proof, but you should be aware that it exists.
- Intuition: lexicographic sort.
- DFA minimization is important for practical applications.

Some other Qs
- Is a language empty? Determinize, minimize, compare to empty.
- Are two regular languages equivalent? Determinize, minimize, compare.
- Is a language finite? Determinize, minimize, look for loops.
- Is a language regular?
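The first three questions can also be answered directly on complete DFAs. The sketch below deliberately swaps in a different technique from the slide's recipe: instead of minimizing and comparing, it uses graph search (reachability for emptiness, a search over state pairs for equivalence, and cycle detection among useful states for finiteness). The encoding, with `delta` mapping `(state, symbol)` to a state, and all names are assumptions.

```python
from collections import deque


def reachable(alphabet, delta, start):
    """Set of states reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def is_empty(alphabet, delta, start, finals):
    """Empty iff no final state is reachable from the start state."""
    return not (reachable(alphabet, delta, start) & set(finals))


def is_finite(alphabet, delta, start, finals):
    """Finite iff no cycle passes through a state that is both reachable
    from the start and able to reach a final state."""
    finals = set(finals)
    live = {q for q in reachable(alphabet, delta, start)
            if reachable(alphabet, delta, q) & finals}
    for q in live:
        for a in alphabet:
            if q in reachable(alphabet, delta, delta[(q, a)]):
                return False
    return True


def equivalent(alphabet, dfa1, dfa2):
    """Search pairs of states; the DFAs differ iff some reachable pair
    disagrees on being final."""
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False
        for a in alphabet:
            pair = (d1[(p, a)], d2[(q, a)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# Hypothetical DFAs over {a}: a* and the single string "a".
A_STAR = ({(0, "a"): 0}, 0, {0})
ONLY_A = ({(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}, 0, {1})
```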

Is this language regular?
- If finite, then yes.
- If infinite, then we must prove things.
- The classic intro solution is the Pumping Lemma. It is not a great bit of theory, IMHO:
  - Pumpability is necessary but not sufficient for regularity (i.e., a nonregular language may be pumpable), so the lemma cannot always prove nonregularity.
  - It is needlessly complex.
- Many thanks to Adam Lopez (Edinburgh) for pointing this out to me: https://bosker.wordpress.com/2013/08/18/i-hate-the-pumping-lemma/

Brzozowski Derivatives
- Recall differentiation from calculus:
$$\frac{d}{dx}\, xyz = yz \qquad \frac{d}{dx}(xyz + xz) = (yz + z)$$
- Brzozowski's idea is that concatenation is a kind of product, and union is a kind of sum, motivating the following definition:
$$\frac{d}{dw} L = \{ v \in \Sigma^* \mid wv \in L \}$$
- Examples:
$$\frac{d}{da} \{ab, aab, ba, b\} = \{b, ab\}$$
$$\frac{d}{da} \{a^n : n \ge 0\} = \{a^n : n \ge 0\}$$
$$\frac{d}{da} \{a, ab\} = \{\varepsilon, b\}$$
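For a finite language the definition can be evaluated literally. A minimal sketch, assuming the language is given as a Python set of strings; the function name `derivative` is hypothetical.

```python
def derivative(w, language):
    """d/dw L: every v such that w followed by v is in the language."""
    return {s[len(w):] for s in language if s.startswith(w)}


first = derivative("a", {"ab", "aab", "ba", "b"})
second = derivative("a", {"a", "ab"})
```

Here `first` is `{"b", "ab"}` and `second` is `{"", "b"}`, with the empty string playing the role of ε, matching the first and third examples. The infinite language in the second example cannot be represented this way.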

Proving Regularity
- Theorem. A language is regular iff it has a finite number of Brzozowski derivatives.
- Lemma. Every derivative of a regular language is regular.
- Proof. Stripping off a prefix corresponds to changing the start state of a DFA.
- Proof of theorem ($\Rightarrow$). By definition, a DFA has a finite number of states, and by the lemma each derivative corresponds to a state, so there must be a finite number of derivatives.

Proving Regularity
- Proof of theorem ($\Leftarrow$). By construction. Given a language $L$ on alphabet $\Sigma$, build a DFA whose states are languages (the derivatives of $L$):
$$q_0 = L \qquad \delta(q, \sigma) = \frac{d}{d\sigma}(q) \qquad F = \{ q \in Q \mid \varepsilon \in q \}$$
- Proof that the construction is correct. By induction on the length of $w$ (try at home).
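When the language is given as a regular expression, this construction can be carried out symbolically. The following is a rough sketch rather than the lecture's method: regular expressions are encoded as nested tuples (a hypothetical encoding), derivatives are taken syntactically, and unions are normalized as sets so that equal languages are more likely to be recognized as the same state. The simplifications are only heuristic, so two states may still denote the same language.

```python
EMPTY, EPS = ("empty",), ("eps",)


def sym(a):
    return ("sym", a)


def alt(r, s):
    """Union, flattened into a set so order and duplicates do not matter."""
    parts = set()
    for t in (r, s):
        if t[0] == "alt":
            parts |= t[1]
        elif t != EMPTY:
            parts.add(t)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("alt", frozenset(parts))


def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    return r if s == EPS else ("cat", r, s)


def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)


def nullable(r):
    """True iff ε is in the language of r (the test for final states)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(t) for t in r[1])


def deriv(a, r):
    """Syntactic derivative of r with respect to the single symbol a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "star":
        return cat(deriv(a, r[1]), r)
    if tag == "alt":
        out = EMPTY
        for t in r[1]:
            out = alt(out, deriv(a, t))
        return out
    first = cat(deriv(a, r[1]), r[2])
    return alt(first, deriv(a, r[2])) if nullable(r[1]) else first


def build_dfa(r, alphabet):
    """States are derivatives of r; q0 = r; final states contain ε."""
    states, delta, todo = {r}, {}, [r]
    while todo:
        q = todo.pop()
        for a in alphabet:
            d = deriv(a, q)
            delta[(q, a)] = d
            if d not in states:
                states.add(d)
                todo.append(d)
    return states, delta, r, {q for q in states if nullable(q)}


def accepts(dfa, word):
    _, delta, state, finals = dfa
    for a in word:
        state = delta[(state, a)]
    return state in finals


# a*b* has exactly three distinct derivatives: a*b*, b*, and the empty language.
dfa = build_dfa(cat(star(sym("a")), star(sym("b"))), "ab")
```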

Examples
- Is the following language regular?
$$L = \{ a^i b^i \mid i \ge 0 \}$$
$$\frac{d}{d(a^n)} L = \{ a^i b^{i+n} \mid i \ge 0 \} \quad \forall n \ge 1$$
No. These derivatives are pairwise distinct, so $L$ has infinitely many derivatives.
- Is the following language regular?
$$L = \{ a^i b^j \mid i \ge 0 \wedge j \ge 0 \}$$
$$\frac{d}{d(a^n)} L = \{ a^i b^j \mid i \ge 0 \wedge j \ge 0 \} \quad \forall n \ge 1$$
Yes. Every derivative by $a^n$ is $L$ itself, and the only other derivatives are $\{ b^j \mid j \ge 0 \}$ and $\emptyset$, so there are finitely many.

Transducers
- In NLP we often want to transduce between different representations: tokenization, POS tagging, grapheme to phoneme conversion, morphological analysis, spelling correction, translation, ...
- The formal concept we will rely on is that of a relation (a generalization of a function).
- We will refer to automata that define relations or perform transduction as transducers.

Relation
[Figure: input strings a, b, ab, aba, aaa, bbb, ... mapped to output strings 0, 1, 111, ...]
In general, a relation is a many-to-many mapping.

Finite State Transducers
[Figure: a transducer with start state $q_0$ and arcs labeled a:0, b:ε, b:1, a:1, b:0, a:2]

Notational Equivalence
[Figure: an arc labeled a:a is equivalent to an arc labeled a]

Finite State Transducers
A finite state transducer is a 6-tuple $T = \langle Q, \Sigma, \Delta, \delta, q_0, F \rangle$ where
- $Q$ is a finite set of states
- $\Sigma$ is the finite input alphabet
- $\Delta$ is the finite output alphabet
- $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \times (\Delta \cup \{\varepsilon\}) \to 2^Q$ is the transition relation
- $q_0 \in Q$ is the start (initial) state
- $F \subseteq Q$ is the set of final (accept) states

Generalized Transitions
As with FSAs, we can provide a generalized definition of the transition function:
$$\hat{\delta}(q, \varepsilon, \varepsilon) = q$$
$$\hat{\delta}(q, a, b) = \delta(q, a, b)$$
$$\hat{\delta}(q, xa, yb) = s \iff \hat{\delta}(q, x, y) = r \wedge \delta(r, a, b) = s$$
Given $x \in \Sigma^*$ and $y \in \Delta^*$, we say that $T$ transduces $x$ to $y$, and we write $x[T]y$, iff there is a path from $q_0$ to some final state producing the string pair $x, y$, i.e. $\hat{\delta}(q_0, x, y) \cap F \neq \emptyset$.
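The test "is there a path from the start state to a final state producing the pair x, y" can be sketched as a search over configurations. This is an illustrative sketch under assumptions: `delta` is a dictionary from `(state, input_label, output_label)` to a set of states, with the empty string `""` standing for ε, and the example transducer is hypothetical.

```python
from collections import deque


def transduces(delta, start, finals, x, y):
    """True iff some path from `start` to a final state reads x and writes y.

    A configuration (q, i, j) means: in state q, having consumed x[:i]
    and produced y[:j]. There are finitely many configurations, so the
    search terminates even when the transducer has ε:ε loops.
    """
    first = (start, 0, 0)
    seen, queue = {first}, deque([first])
    while queue:
        q, i, j = queue.popleft()
        if q in finals and i == len(x) and j == len(y):
            return True
        for (p, a, b), targets in delta.items():
            # The arc must leave q, and its labels must match the next
            # input and output symbols (ε labels always match).
            if p != q or not x.startswith(a, i) or not y.startswith(b, j):
                continue
            for r in targets:
                cfg = (r, i + len(a), j + len(b))
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False


# Hypothetical transducer relating a^n to b^n c^m for any n, m >= 0.
T_DELTA = {
    ("q0", "a", "b"): {"q0"},
    ("q0", "", "c"): {"q1"},
    ("q1", "", "c"): {"q1"},
}
T_FINALS = {"q0", "q1"}
```

For example, `transduces(T_DELTA, "q0", T_FINALS, "aa", "bbc")` holds, while the pair `("aa", "bc")` is rejected because two a's must produce two b's.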

Regular Relations
$$R(T) = \{ (x, y) : x \in \Sigma^*,\ y \in \Delta^*,\ x[T]y \}$$
where $T$ is a finite state transducer.

Operations on FSTs
Given FSTs $T$ and $S$ and $w, x \in \Sigma^*$ and $y, z \in \Delta^*$, the following FSTs exist:
- (union) $x[T \cup S]y \iff x[T]y \vee x[S]y$
- (concatenation) $wx[T \cdot S]yz \iff w[T]y \wedge x[S]z$
- (Kleene*) $\varepsilon[T^*]\varepsilon$ and $w[T^*]y \wedge x[T]z \implies wx[T^*]yz$
Given FSTs $T$ and $S$ with alphabets $(\Sigma, \Delta)$ and $(\Delta, \Gamma)$, the following FST exists:
- (composition) $x[T \circ S]y \iff \exists z \in \Delta^* \text{ s.t. } x[T]z \wedge z[S]y$
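The meaning of these operations is easiest to see at the level of the relations themselves. The sketch below does not construct FSTs: it assumes finite relations stored as Python sets of `(input, output)` string pairs, and approximates Kleene* by a bounded number of repetitions. All names are hypothetical.

```python
def union(T, S):
    """x[T ∪ S]y iff x[T]y or x[S]y."""
    return T | S


def concat(T, S):
    """wx[T.S]yz iff w[T]y and x[S]z."""
    return {(w + x, y + z) for (w, y) in T for (x, z) in S}


def star_up_to(T, k):
    """Pairs obtained by concatenating at most k pairs of T (bounded Kleene*)."""
    result = {("", "")}
    power = {("", "")}
    for _ in range(k):
        power = concat(power, T)
        result |= power
    return result


def compose(T, S):
    """x[T ∘ S]y iff there is some z with x[T]z and z[S]y."""
    return {(x, y) for (x, z) in T for (z2, y) in S if z == z2}


# Hypothetical finite relations.
T = {("a", "0"), ("b", "1")}
S = {("0", "x")}
composed = compose(T, S)
```

Here `composed` is `{("a", "x")}`: only the output `"0"` of `T` is accepted as an input by `S`.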

Other Operations
In contrast to regular languages, regular relations are not closed under
- Intersection
- Complementation
- Difference

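Intersection
- Theorem. Regular relations aren't closed under intersection.
- Proof. By counterexample.
[Figure: transducers T and S, each with start state $q_0$; arc labels a:b, ε:c, ε:b, a:c, ε:c, a:c]
$$T \cap S = \{ (a^n, b^n c^n) : n \ge 0 \}$$
The output side of this relation, $\{ b^n c^n : n \ge 0 \}$, is not a regular language, so the intersection is not a regular relation.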

Intersection
- However, certain restricted classes of FSTs are closed under intersection.
- And FST intersection is extremely useful, as you will see in later lectures.

Operations on FSTs
- FSTs can be inverted by swapping the input and output labels.
- FSTs can be determinized such that each state has at most one outgoing transition per input label (outputs may not be deterministic); the FST must be a functional relation. [Construction similar to the powerset construction.]
- Deterministic FSTs can be minimized.
- FSTs can be projected to FSAs to yield the input or output languages.
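Inversion and projection are simple relabelings of the arcs. A minimal sketch, assuming the transition relation is a dictionary from `(state, input_label, output_label)` to a set of states, with `""` standing for ε; the function names and the example arcs are hypothetical.

```python
def invert(delta):
    """Swap the input and output label on every arc."""
    inverted = {}
    for (q, a, b), targets in delta.items():
        inverted.setdefault((q, b, a), set()).update(targets)
    return inverted


def project(delta, side="input"):
    """Keep only one label per arc, yielding the arcs of an NFA-ε."""
    nfa = {}
    for (q, a, b), targets in delta.items():
        label = a if side == "input" else b
        nfa.setdefault((q, label), set()).update(targets)
    return nfa


# Hypothetical transducer arcs relating a^n to b^n c^m.
FST_DELTA = {
    ("q0", "a", "b"): {"q0"},
    ("q0", "", "c"): {"q1"},
    ("q1", "", "c"): {"q1"},
}
input_nfa = project(FST_DELTA, "input")
output_nfa = project(FST_DELTA, "output")
```

The start and final states are unchanged by both operations, so the projected automaton accepts exactly the input (or output) strings of the original transducer.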

Building FSTs
- FSTs are often constructed modularly to deal with certain phenomena and then composed.
- This enables a divide-and-conquer approach to design.