Formal Models in NLP


Formal Models in NLP Finite-State Automata Nina Seemann Universität Stuttgart Institut für Maschinelle Sprachverarbeitung Pfaffenwaldring 5b 70569 Stuttgart May 15, 2012 Nina Seemann (IMS) Formal Models in NLP: Finite-State Automata May 15, 2012 1

Outline
1 Finite-State Automata: Characterization
    Finite-State Acceptors
    Finite-State Transducers
2 Closure Properties of Finite-State Acceptors
3 Closure Properties of Finite-State Transducers
4 Equivalence Transformations on Finite-State Acceptors

Finite-State Acceptors
Example (NFA A_lex accepting some animal names)
[Figure: transition diagram of A_lex]

Finite-State Acceptors: Non-Deterministic Finite-State Acceptor
Definition (Non-deterministic finite-state acceptor (NFA))
A non-deterministic finite-state acceptor A is a 5-tuple (Q, Σ, q_0, F, δ) where
  Q is a finite set of states
  Σ is the alphabet
  q_0 ∈ Q is the start state
  F ⊆ Q is a set of final states
  δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function
Nondeterminism refers to the fact that an NFA can be in several states at once.
A transition may be labeled with ε.

Finite-State Acceptors: Deterministic Finite-State Acceptor
Definition (Deterministic finite-state acceptor (DFA))
A deterministic finite-state acceptor D is a 5-tuple (Q, Σ, q_0, F, δ) where
  Q is a finite set of states
  Σ is a finite set called the alphabet
  q_0 ∈ Q is the initial state
  F ⊆ Q is a set of final states
  δ : Q × Σ → Q is the transition function
Determinism means that, for each state and input symbol, a DFA moves to exactly one state.
DFAs are ε-free by definition.
DFAs and NFAs recognize the same class of languages, i.e. they are equivalent.

Finite-State Acceptors
Example (DFA D_lex accepting some animal names)
[Figure: transition diagram of D_lex]

Finite-State Acceptors: Extended Transition Function & Language
Definition (Extended transition function δ̂)
δ̂ describes what happens when we start in any state and follow any sequence of inputs.
  δ̂(q, ε) = q
  δ̂(q, w) = δ(δ̂(q, x), a) with w = xa
Definition (Language of a DFA A)
  L(A) = {w ∈ Σ* | δ̂(q_0, w) ∈ F}
We also say that L(A) is recognized by A.
Definition (Regular language)
A language is called regular if there exists some DFA which recognizes it.

Finite-State Acceptors: Extended Transition Function for DFA
Example (frog in DFA D_lex)
Assumption: δ̂(0, frog) ∈ {26, 24, 22, 13, 11, 9, 8}
  δ̂(0, ε) = 0
  δ̂(0, f) = δ(δ̂(0, ε), f) = δ(0, f) = 3
  δ̂(0, fr) = δ(δ̂(0, f), r) = δ(3, r) = 6
  δ̂(0, fro) = δ(δ̂(0, fr), o) = δ(6, o) = 7
  δ̂(0, frog) = δ(δ̂(0, fro), g) = δ(7, g) = 8
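
The recursion δ̂(q, xa) = δ(δ̂(q, x), a) unrolls into a simple left-to-right loop. The following is a minimal sketch, not part of the original slides. It assumes the DFA is stored as a Python dictionary from (state, symbol) pairs to states, and it only contains the four D_lex transitions quoted in the frog example; a missing entry is treated as "no transition", which is an assumption of this sketch (the slide's δ is a total function).

```python
def delta_hat(delta, q, w):
    """Extended transition function of a DFA, computed left to right.

    Returns None when a transition is missing (partial transition table).
    """
    for a in w:
        q = delta.get((q, a))
        if q is None:
            return None
    return q


def accepts(delta, q0, finals, w):
    """w is in L(A) iff delta_hat(q0, w) is a final state."""
    return delta_hat(delta, q0, w) in finals


# Only the four D_lex transitions quoted on the slide; all others are omitted.
delta = {(0, "f"): 3, (3, "r"): 6, (6, "o"): 7, (7, "g"): 8}
finals = {26, 24, 22, 13, 11, 9, 8}

print(delta_hat(delta, 0, "frog"))        # 8
print(accepts(delta, 0, finals, "frog"))  # True
print(accepts(delta, 0, finals, "fro"))   # False
```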

Finite-State Acceptors: Extended Transition Function for NFA
Example (frog in NFA A_lex)
Assumption: δ̂(31, frog) ∩ {2, 6, 9, 13, 18, 21, 30} ≠ ∅
  δ̂(31, ε) = {31}
  δ̂(31, f) = δ(δ̂(31, ε), f) = δ(31, f) = {3, 7, 10}
  δ̂(31, fr) = δ(δ̂(31, f), r) = δ(3, r) ∪ δ(7, r) ∪ δ(10, r) = {4} ∪ ∅ ∪ ∅ = {4}
  δ̂(31, fro) = δ(δ̂(31, fr), o) = δ(4, o) = {5}
  δ̂(31, frog) = δ(δ̂(31, fro), g) = δ(5, g) = {6}
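
For an NFA the same computation tracks a set of current states and takes the union of the successor sets at every step. The sketch below is an illustration only. It assumes an ε-free NFA stored as a dictionary from (state, symbol) pairs to sets of states, with missing entries standing for the empty set, and it contains just the A_lex transitions that appear in the frog example.

```python
def delta_hat(delta, q, w):
    """Extended transition function of an epsilon-free NFA (set of reachable states)."""
    current = {q}
    for a in w:
        # Union of the successor sets of all current states; missing entries are empty.
        current = set().union(*(delta.get((p, a), set()) for p in current))
    return current


def accepts(delta, q0, finals, w):
    """w is accepted iff at least one reachable state is final."""
    return bool(delta_hat(delta, q0, w) & finals)


# Only the A_lex transitions quoted on the slide; all others are omitted.
delta = {
    (31, "f"): {3, 7, 10},
    (3, "r"): {4},
    (4, "o"): {5},
    (5, "g"): {6},
}
finals = {2, 6, 9, 13, 18, 21, 30}

print(delta_hat(delta, 31, "fr"))          # {4}
print(accepts(delta, 31, finals, "frog"))  # True
```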

Finite-State Transducers: Definition
Definition ((Non-deterministic) finite-state transducer (NFST))
A (non-deterministic) finite-state transducer T is a 7-tuple (Q, Σ, Δ, q_0, F, δ, σ) where
  Q is a set of states
  Σ is the input alphabet of T
  Δ is the output alphabet of T
  q_0 ∈ Q is the start state
  F ⊆ Q is a set of final states
  δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function
  σ : Q × (Σ ∪ {ε}) × Q → Δ* is the output function

Finite-State Transducers: Alternative Definition
Definition (Normalized finite-state transducer)
A normalized finite-state transducer T is a 6-tuple (Q, Σ, Δ, q_0, F, E) where
  Q is a set of states
  Σ is a set called the input alphabet of T
  Δ is a set called the output alphabet of T
  q_0 ∈ Q is the start state
  F ⊆ Q is a set of final states
  E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × Q is the set of transitions
Every transducer can be transformed into a normalized transducer.

Finite-State Transducers
Example (NFST T_lex mapping surface forms to morphological features)
[Figure: transition diagram of T_lex]

Finite-State Transducers: Deterministic Finite-State Transducer
Definition (Deterministic finite-state transducer (DFST))
A deterministic finite-state transducer T is a 7-tuple (Q, Σ, Δ, q_0, F, δ, σ) where
  Q is a set of states
  Σ is a set called the input alphabet of T
  Δ is a set called the output alphabet of T
  q_0 ∈ Q is the start state
  F ⊆ Q is a set of final states
  δ : Q × Σ → Q is the (deterministic) transition function
  σ : Q × Σ × Q → Δ* is the (deterministic) output function
Note: Not every NFST can be determinized.


Closure Properties of Finite-State Acceptors
Finite-state acceptors are closed under:
  Union
  Concatenation
  Closure (Kleene Star)
  Reversal
  Intersection
  Complementation
  Difference
  Homomorphism / Inverse homomorphism

Closure Properties of Finite-State Acceptors: Union
Example (Union of two acceptors A_1 and A_2)
[Figure: A_1, A_2, and their union A_1 ∪ A_2]

Closure Properties of Finite-State Acceptors: Concatenation
Example (Concatenation of two acceptors A_1 and A_2)
[Figure: A_1, A_2, and their concatenation A_1 A_2]

Closure Properties of Finite-State Acceptors: Closure (Kleene Star)
Example (Closure of acceptor A_1)
[Figure: A_1 and its closure A_1*]

Closure Properties of Finite-State Acceptors: Reversal
Example (Reversal of acceptor A_2)
[Figure: A_2 and its reversal A_2^R]

Closure Properties of Finite-State Acceptors: Intersection
Let L and M be the languages of the deterministic automata A_L = (Q_L, Σ, δ_L, q_L, F_L) and A_M = (Q_M, Σ, δ_M, q_M, F_M).
For L ∩ M we construct an automaton A = (Q_L × Q_M, Σ, δ, (q_L, q_M), F_L × F_M) where
  δ((p, q), σ) = (δ_L(p, σ), δ_M(q, σ))   [p ∈ Q_L, q ∈ Q_M, and σ ∈ Σ].
The set F of final states consists of all pairs (p, q) such that p ∈ F_L and q ∈ F_M.
  The states of A are pairs of states, one from A_L and one from A_M.
  Suppose A is in state (p, q) and reads input symbol a:
    A_L moves from p to some state s on input a
    A_M moves from q to some state t on input a
    the new state of A is the pair (s, t)
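
The product construction described above can be sketched in a few lines. This is an illustrative sketch rather than code from the slides: a DFA is represented as a tuple (Q, Σ, δ, q_0, F) with a total transition dictionary, and the two example automata (even number of a's, strings ending in b) are hypothetical.

```python
from itertools import product


def intersect(A, B):
    """Product automaton recognizing L(A) ∩ L(B); both DFAs share one alphabet."""
    QA, sigma, dA, qA, FA = A
    QB, _, dB, qB, FB = B
    Q = set(product(QA, QB))
    # Both components step in lockstep on the same input symbol.
    delta = {((p, q), a): (dA[p, a], dB[q, a]) for (p, q) in Q for a in sigma}
    F = {(p, q) for (p, q) in Q if p in FA and q in FB}
    return Q, sigma, delta, (qA, qB), F


def accepts(A, w):
    """Run a total DFA on w and test whether it ends in a final state."""
    _, _, delta, q, F = A
    for a in w:
        q = delta[q, a]
    return q in F


# Hypothetical example automata over {a, b}.
even_a = ({0, 1}, {"a", "b"},
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({"n", "y"}, {"a", "b"},
          {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
          "n", {"y"})

both = intersect(even_a, ends_b)
print(accepts(both, "aab"))  # True: two a's and ends in b
print(accepts(both, "ab"))   # False: odd number of a's
```

The sketch builds all of Q_L × Q_M, as in the definition; an implementation that only generates pairs reachable from (q_L, q_M) would usually produce a smaller automaton.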

Closure Properties of Finite-State Acceptors: Intersection
Example (Intersection of two acceptors A_1 and A_3)
[Figure: A_1, A_3, and their intersection A_1 ∩ A_3]

Closure Properties of Finite-State Acceptors: Complementation
Example (Complementation of acceptor A_3)
[Figure: A_3 and its complement Ā_3]
Complementation requires a deterministic acceptor.
If the acceptor is not total, a sink state has to be added.
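
The two requirements on this slide (determinism and totality) translate directly into code: first complete the transition function with a sink state, then swap final and non-final states. The sketch below is an illustration under assumptions: the DFA is a tuple (Q, Σ, δ, q_0, F) with a possibly partial transition dictionary, the sink name "sink" is assumed not to clash with an existing state, and the example automaton is hypothetical.

```python
def complement(states, alphabet, delta, q0, finals, sink="sink"):
    """Complement a DFA: make it total with a sink state, then flip the final states."""
    states, delta = set(states), dict(delta)
    if any((q, a) not in delta for q in states for a in alphabet):
        states.add(sink)
        for q in states:
            for a in alphabet:
                # Missing transitions (including those of the sink itself) go to the sink.
                delta.setdefault((q, a), sink)
    return states, alphabet, delta, q0, states - set(finals)


def accepts(dfa, w):
    """Run a total DFA on w and test whether it ends in a final state."""
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[q, a]
    return q in finals


# Hypothetical partial DFA over {a, b} that accepts only the string "ab".
A = ({0, 1, 2}, {"a", "b"}, {(0, "a"): 1, (1, "b"): 2}, 0, {2})
not_A = complement(*A)

print(accepts(not_A, "ab"))   # False
print(accepts(not_A, "abb"))  # True
```

Flipping the final states without adding the sink would be wrong here: strings such as "abb" would still be rejected because the run gets stuck.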

Closure Properties of Finite-State Acceptors: Difference
Example (Difference of two acceptors A_1 and A_2)
[Figure: A_1, A_2, and their difference A_1 - A_2 = A_1 ∩ Ā_2]


Closure Properties of Finite-State Transducers
Finite-state transducers are closed under:
  Union
  Concatenation
  Closure (Kleene Star)
  Reversal
  Projection (leads to FSAs)
  Composition
  Inversion
Finite-state transducers are not closed under:
  Complementation
  Intersection (but acyclic and ε-free transducers are)
  Difference

Closure Properties of Finite-State Transducers: Projection
Example (Projection of transducer T)
[Figure: transducer T and its projections π_1(T) and π_2(T)]
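
On a normalized transducer, projection simply drops one of the two labels from every transition, which yields the transitions of an acceptor. The following sketch is illustrative only: transitions are assumed to be 4-tuples (p, a, b, q) as in the set E of the normalized definition, the empty string stands for ε, and the example transducer is hypothetical.

```python
EPS = ""  # assumption of this sketch: the empty string stands for ε


def project(transitions, side):
    """side=1 keeps the input labels (π_1), side=2 keeps the output labels (π_2)."""
    if side not in (1, 2):
        raise ValueError("side must be 1 or 2")
    return {(p, (a, b)[side - 1], q) for (p, a, b, q) in transitions}


# Hypothetical transducer mapping "ox" to "ox+N"; "+N" is treated as one output symbol.
E = {(0, "o", "o", 1), (1, "x", "x", 2), (2, EPS, "+N", 3)}

print(sorted(project(E, 1)))  # acceptor transitions for the input language
print(sorted(project(E, 2)))  # acceptor transitions for the output language
```

States, start state, and final states are unchanged by projection, so only the transition set is shown.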

Composition
Definition (ε-free composition)
Let T_1 = (Q_1, Σ_1, Δ_1, q_1, F_1, E_1) and T_2 = (Q_2, Σ_2, Δ_2, q_2, F_2, E_2) be two normalized, ε-free FSTs.
T_1 ∘ T_2 is the transducer T = (Q_1 × Q_2, Σ_1, Δ_2, (q_1, q_2), F_1 × F_2, E) where
  E = {((p, q), a, b, (p', q')) | ∃c ∈ Δ_1 ∩ Σ_2 : (p, a, c, p') ∈ E_1 ∧ (q, c, b, q') ∈ E_2}
How does composition work?
  Whenever T_1 contains a transition (p, a, c, p')
  and T_2 contains a transition (q, c, b, q'),
  T will contain a transition ((p, q), a, b, (p', q')).
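
The definition of E can be read as a join of the two transition sets on the shared middle symbol c. The sketch below illustrates this under simplifying assumptions: a transducer is reduced to a tuple (Q, q_0, F, E) with E a set of 4-tuples, the alphabets are left implicit, both transducers are ε-free, and the two example transducers are hypothetical.

```python
def compose(T1, T2):
    """Epsilon-free composition of two normalized transducers given as (Q, q0, F, E)."""
    Q1, q1, F1, E1 = T1
    Q2, q2, F2, E2 = T2
    # Pair every T1 transition (p, a, c, p2) with every T2 transition (q, c, b, r2)
    # that reads the symbol c which T1 writes.
    E = {((p, q), a, b, (p2, r2))
         for (p, a, c, p2) in E1
         for (q, c2, b, r2) in E2
         if c == c2}
    Q = {(p, q) for p in Q1 for q in Q2}
    F = {(p, q) for p in F1 for q in F2}
    return Q, (q1, q2), F, E


# Hypothetical one-state transducers: T1 rewrites a to b (and x to y), T2 rewrites b to c.
T1 = ({0}, 0, {0}, {(0, "a", "b", 0), (0, "x", "y", 0)})
T2 = ({0}, 0, {0}, {(0, "b", "c", 0)})

Q, start, F, E = compose(T1, T2)
print(E)  # {((0, 0), 'a', 'c', (0, 0))}
```

The transition x:y of T1 has no partner in T2 that reads y, so it does not survive in the composition.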

Closure Properties of Finite-State Transducers: Composition
Example (Composition)
[Figure: two transducers and the result of their composition]

Closure Properties of Finite-State Transducers: Inversion
Example (Inversion)
  FST T_Morph mapping words to morphological categories
  FST T_Morph^-1 mapping morphological categories to words
[Figure: transition diagrams of T_Morph and T_Morph^-1]
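
On a normalized transducer, inversion amounts to swapping the input and output label of every transition while leaving the states untouched. A minimal sketch, assuming transitions are 4-tuples (p, a, b, q) and the empty string stands for ε; the example transitions are hypothetical and are not taken from T_Morph.

```python
def invert(transitions):
    """Swap input and output labels of every transition (p, a, b, q)."""
    return {(p, b, a, q) for (p, a, b, q) in transitions}


# Hypothetical analysis transducer: reads "ox", writes "ox+N" ("" stands for ε).
E = {(0, "o", "o", 1), (1, "x", "x", 2), (2, "", "+N", 3)}

E_inv = invert(E)
print(sorted(E_inv))
```

Applying the function twice gives back the original transition set, which matches the intuition that (T^-1)^-1 = T.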


Equivalence Transformations on Finite-State Acceptors
Equivalence transformations are operations on automata which change the topology of an automaton but not its language.
They usually serve optimization purposes, i.e. they create smaller and/or faster automata.
Sometimes they are even necessary (e.g. determinization is crucial for complementation).
Finite-state acceptors admit the following transformations:
  ε-removal
  Determinization
  Minimization

Determinization: Subset Construction (Example)
A DFA can be constructed from an NFA by the subset construction.
In the worst case, the smallest DFA can have 2^n states, where n is the number of NFA states. ...
  Q_D is the power set of Q_N
  F_D is the set of subsets S of Q_N such that S ∩ F_N ≠ ∅
  For each set S ⊆ Q_N and each input symbol a ∈ Σ: δ_D(S, a) = ⋃_{p ∈ S} δ_N(p, a)

Determinization: Subset Construction
[Figure: transition diagram of the NFA]
Transition function δ:
  δ(p_0, 0) = {p_0, p_1}
  δ(p_0, 1) = {p_0}
  δ(p_1, 1) = {p_2}
Transition table of the subset construction:
  State              0             1
  ∅                  ∅             ∅             not accessible!
  {p_0}              {p_0, p_1}    {p_0}
  {p_1}              ∅             {p_2}         not accessible!
  {p_2}              ∅             ∅             not accessible!
  {p_0, p_1}         {p_0, p_1}    {p_0, p_2}
  {p_0, p_2}         {p_0, p_1}    {p_0}
  {p_1, p_2}         ∅             {p_2}         not accessible!
  {p_0, p_1, p_2}    {p_0, p_1}    {p_0, p_2}    not accessible!

Determinization: Subset Construction, Lazy Evaluation
Lazy Evaluation
  Basis: The set containing only the start state of NFA N is accessible.
  Induction: Suppose the set S of states is accessible. Then for each input symbol a, compute the set of states δ_D(S, a); these sets are accessible as well.
Example
  δ_D({p_0}, 0) = {p_0, p_1} (new accessible state)
  δ_D({p_0}, 1) = {p_0} (old state)
  δ_D({p_0, p_1}, 0) = δ_N(p_0, 0) ∪ δ_N(p_1, 0) = {p_0, p_1} ∪ ∅ = {p_0, p_1} (old)
  δ_D({p_0, p_1}, 1) = δ_N(p_0, 1) ∪ δ_N(p_1, 1) = {p_0} ∪ {p_2} = {p_0, p_2} (new accessible state)
  δ_D({p_0, p_2}, 0) = δ_N(p_0, 0) ∪ δ_N(p_2, 0) = {p_0, p_1} ∪ ∅ = {p_0, p_1} (old)
  δ_D({p_0, p_2}, 1) = δ_N(p_0, 1) ∪ δ_N(p_2, 1) = {p_0} ∪ ∅ = {p_0} (old)
No new states appear, so the construction has converged.
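
The basis and induction steps correspond to a worklist algorithm that only ever creates accessible subsets. The sketch below is one possible way to write it, not the author's implementation. It assumes an ε-free NFA stored as a dictionary from (state, symbol) pairs to sets of states, uses frozensets as DFA states, and takes p_2 as the only final NFA state; the final states are not stated in the transcribed slide, so that choice is an assumption.

```python
from collections import deque


def determinize(delta_n, q0, finals_n, alphabet):
    """Lazy subset construction: explore only subsets reachable from {q0}."""
    start = frozenset({q0})
    delta_d = {}
    seen = {start}            # basis: the start subset is accessible
    queue = deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # delta_D(S, a) is the union of delta_N(p, a) over all p in S.
            T = frozenset().union(*(delta_n.get((p, a), ()) for p in S))
            delta_d[S, a] = T
            if T not in seen:  # induction: successors of accessible sets are accessible
                seen.add(T)
                queue.append(T)
    finals_d = {S for S in seen if S & finals_n}
    return seen, delta_d, start, finals_d


# NFA transitions from the slide; F_N = {p2} is an assumption of this sketch.
delta_n = {("p0", "0"): {"p0", "p1"}, ("p0", "1"): {"p0"}, ("p1", "1"): {"p2"}}
states, delta_d, start, finals_d = determinize(delta_n, "p0", {"p2"}, ["0", "1"])

print(len(states))  # 3 accessible subsets instead of 2**3 = 8
```

The three subsets found are exactly the rows of the full table that are not marked as inaccessible.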

Determinization
Example (Determinized version of A_2)
[Figure: transition diagram of the determinized A_2]
