
Formal Models in NLP: Finite-State Automata
Nina Seemann
Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung (IMS)
Pfaffenwaldring 5b, 70569 Stuttgart
May 15, 2012


Outline
1. Finite-State Automata: Characterization
   - Finite-State Acceptors
   - Finite-State Transducers
2. Closure Properties of Finite-State Acceptors
3. Closure Properties of Finite-State Transducers
4. Equivalence Transformations on Finite-State Acceptors

Finite-State Acceptors

Example (NFA $A_{lex}$ accepting some animal names)
[Figure: transition diagram of $A_{lex}$]

Finite-State Acceptors: Non-Deterministic Finite-State Acceptor

Definition (Non-deterministic finite-state acceptor (NFA))
A non-deterministic finite-state acceptor $A$ is a 5-tuple $(Q, \Sigma, q_0, F, \delta)$ where
- $Q$ is a finite set of states
- $\Sigma$ is the alphabet
- $q_0 \in Q$ is the start state
- $F \subseteq Q$ is a set of final states
- $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is the transition function

Nondeterminism refers to the fact that an NFA can be in several states at once.
A transition may be labeled with $\varepsilon$.

Finite-State Acceptors: Deterministic Finite-State Acceptor

Definition (Deterministic finite-state acceptor (DFA))
A deterministic finite-state acceptor $D$ is a 5-tuple $(Q, \Sigma, q_0, F, \delta)$ where
- $Q$ is a finite set of states
- $\Sigma$ is a finite set called the alphabet
- $q_0 \in Q$ is the initial state
- $F \subseteq Q$ is a set of final states
- $\delta : Q \times \Sigma \to Q$ is the transition function

Determinism refers to the fact that, from a given state and on a given input symbol, a DFA can move to exactly one state.
DFAs are $\varepsilon$-free by definition.
DFAs and NFAs recognize the same class of languages, i.e. they are equivalent in expressive power.

Finite-State Acceptors

Example (DFA $D_{lex}$ accepting some animal names)
[Figure: transition diagram of $D_{lex}$]

Finite-State Acceptors: Extended Transition Function & Language

Definition (Extended transition function $\hat\delta$)
$\hat\delta$ describes what happens when we start in any state and follow any sequence of inputs.
- $\hat\delta(q, \varepsilon) = q$
- $\hat\delta(q, w) = \delta(\hat\delta(q, x), a)$ with $w = xa$, where $x \in \Sigma^*$ and $a \in \Sigma$

Definition (Language of a DFA $A$)
$L(A) = \{ w \in \Sigma^* \mid \hat\delta(q_0, w) \in F \}$
We also say that $L(A)$ is recognized by $A$.

Definition (Regular language)
A language is called regular if there exists some DFA which recognizes it.

Finite-State Acceptors: Extended Transition Function for DFA

Example (frog in DFA $D_{lex}$)
Assumption: $\hat\delta(0, \text{frog}) \in \{26, 24, 22, 13, 11, 9, 8\}$

$\hat\delta(0, \varepsilon) = 0$
$\hat\delta(0, \text{f}) = \delta(\hat\delta(0, \varepsilon), \text{f}) = \delta(0, \text{f}) = 3$
$\hat\delta(0, \text{fr}) = \delta(\hat\delta(0, \text{f}), \text{r}) = \delta(3, \text{r}) = 6$
$\hat\delta(0, \text{fro}) = \delta(\hat\delta(0, \text{fr}), \text{o}) = \delta(6, \text{o}) = 7$
$\hat\delta(0, \text{frog}) = \delta(\hat\delta(0, \text{fro}), \text{g}) = \delta(7, \text{g}) = 8$
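The recursion for $\hat\delta$ can be traced in a few lines of Python. This is a minimal sketch, not part of the original slides: the table `DELTA` contains only the four transitions visible in the worked example (the rest of $D_{lex}$ is not known here), `FINAL` assumes that the set in the "Assumption" line is the set of final states, and treating a missing transition as rejection is a convenience of this sketch.

```python
from typing import Dict, Set, Tuple

# Only the transitions that appear in the worked example; the full
# transition table of D_lex is not available in the text.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "f"): 3,
    (3, "r"): 6,
    (6, "o"): 7,
    (7, "g"): 8,
}

# Assumption: the set from the "Assumption" line is the set F of final states.
FINAL: Set[int] = {26, 24, 22, 13, 11, 9, 8}


def delta_hat(q: int, w: str) -> int:
    """Extended transition function, following the recursive definition."""
    if w == "":
        return q                      # delta_hat(q, epsilon) = q
    x, a = w[:-1], w[-1]              # split w = xa
    return DELTA[(delta_hat(q, x), a)]


def accepts(w: str, start: int = 0) -> bool:
    """w is in L(A) iff delta_hat(q0, w) is a final state."""
    try:
        return delta_hat(start, w) in FINAL
    except KeyError:
        # Sketch-only convention: an undefined transition means rejection.
        return False
```

Calling `delta_hat(0, "frog")` walks through the states 0, 3, 6, 7, 8 exactly as in the derivation above.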

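Finite-State Acceptors: Extended Transition Function for NFA

Example (frog in NFA $A_{lex}$)
Assumption: $\hat\delta(31, \text{frog}) \cap \{2, 6, 9, 13, 18, 21, 30\} \neq \emptyset$

$\hat\delta(31, \varepsilon) = \{31\}$
$\hat\delta(31, \text{f}) = \delta(\hat\delta(31, \varepsilon), \text{f}) = \delta(31, \text{f}) = \{3, 7, 10\}$
$\hat\delta(31, \text{fr}) = \delta(\hat\delta(31, \text{f}), \text{r}) = \delta(3, \text{r}) \cup \delta(7, \text{r}) \cup \delta(10, \text{r}) = \{4\} \cup \emptyset \cup \emptyset = \{4\}$
$\hat\delta(31, \text{fro}) = \delta(\hat\delta(31, \text{fr}), \text{o}) = \delta(4, \text{o}) = \{5\}$
$\hat\delta(31, \text{frog}) = \delta(\hat\delta(31, \text{fro}), \text{g}) = \delta(5, \text{g}) = \{6\}$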
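For an NFA the same computation works on sets of states: each step takes the union of $\delta(p, a)$ over all current states $p$. The sketch below is illustrative only. `NFA_DELTA` lists just the transitions visible in the example, `NFA_FINAL` assumes the set in the "Assumption" line is the set of final states, and $\varepsilon$-transitions are deliberately not handled because the example does not use any.

```python
from typing import Dict, Set, Tuple

# Only the transitions visible in the worked example; delta(7, r) and
# delta(10, r) are empty, so they are simply absent from the table.
NFA_DELTA: Dict[Tuple[int, str], Set[int]] = {
    (31, "f"): {3, 7, 10},
    (3, "r"): {4},
    (4, "o"): {5},
    (5, "g"): {6},
}

# Assumption: the set from the "Assumption" line is the set F of final states.
NFA_FINAL: Set[int] = {2, 6, 9, 13, 18, 21, 30}


def step(states: Set[int], a: str) -> Set[int]:
    """Union of delta(p, a) over all p in the current state set."""
    result: Set[int] = set()
    for p in states:
        result |= NFA_DELTA.get((p, a), set())
    return result


def delta_hat_nfa(q: int, w: str) -> Set[int]:
    """Set of states reachable from q by reading w (no epsilon handling)."""
    states = {q}
    for a in w:
        states = step(states, a)
    return states


def nfa_accepts(w: str, start: int = 31) -> bool:
    """Accept iff at least one reachable state is final."""
    return bool(delta_hat_nfa(start, w) & NFA_FINAL)
```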

Finite-State Transducers: Definition

Definition ((Non-deterministic) finite-state transducer (NFST))
A (non-deterministic) finite-state transducer $T$ is a 7-tuple $(Q, \Sigma, \Delta, q_0, F, \delta, \sigma)$ where
- $Q$ is a set of states
- $\Sigma$ is the input alphabet of $T$
- $\Delta$ is the output alphabet of $T$
- $q_0 \in Q$ is the start state
- $F \subseteq Q$ is a set of final states
- $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is the transition function
- $\sigma : Q \times (\Sigma \cup \{\varepsilon\}) \times Q \to \Delta^*$ is the output function

Finite-State Transducers: Alternative Definition

Definition (Normalized finite-state transducer)
A normalized finite-state transducer $T$ is a 6-tuple $(Q, \Sigma, \Delta, q_0, F, E)$ where
- $Q$ is a set of states
- $\Sigma$ is a set called the input alphabet of $T$
- $\Delta$ is a set called the output alphabet of $T$
- $q_0 \in Q$ is the start state
- $F \subseteq Q$ is a set of final states
- $E \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Delta \cup \{\varepsilon\}) \times Q$ is the set of transitions

Every transducer can be transformed into a normalized transducer.

Finite-State Transducers

Example (NFST $T_{lex}$ mapping surface forms to morphological features)
[Figure: transition diagram of $T_{lex}$]

Finite-State Transducers: Deterministic Finite-State Transducer

Definition (Deterministic finite-state transducer (DFST))
A deterministic finite-state transducer $T$ is a 7-tuple $(Q, \Sigma, \Delta, q_0, F, \delta, \sigma)$ where
- $Q$ is a set of states
- $\Sigma$ is a set called the input alphabet of $T$
- $\Delta$ is a set called the output alphabet of $T$
- $q_0 \in Q$ is the start state
- $F \subseteq Q$ is a set of final states
- $\delta : Q \times \Sigma \to Q$ is the (deterministic) transition function
- $\sigma : Q \times \Sigma \times Q \to \Delta^*$ is the (deterministic) output function

Note: Not every NFST can be determinized.

Closure Properties of Finite-State Acceptors

Finite-state acceptors are closed under:
- Union
- Concatenation
- Closure (Kleene Star)
- Reversal
- Intersection
- Complementation
- Difference
- Homomorphism / Inverse homomorphism

Closure Properties of Finite-State Acceptors: Union

Example (Union of two acceptors $A_1$ and $A_2$)
[Figures: $A_1$, $A_2$, and $A_1 \cup A_2$]

Closure Properties of Finite-State Acceptors: Concatenation

Example (Concatenation of two acceptors $A_1$ and $A_2$)
[Figures: $A_1$, $A_2$, and $A_1 \cdot A_2$]

Closure Properties of Finite-State Acceptors: Closure (Kleene Star)

Example (Closure of acceptor $A_1$)
[Figures: $A_1$ and $A_1^*$]

Closure Properties of Finite-State Acceptors: Reversal

Example (Reversal of acceptor $A_2$)
[Figures: $A_2$ and $A_2^R$]

Closure Properties of Finite-State Acceptors: Intersection

Let $L$ and $M$ be the languages of the deterministic automata $A_L = (Q_L, \Sigma, \delta_L, q_L, F_L)$ and $A_M = (Q_M, \Sigma, \delta_M, q_M, F_M)$.
For $L \cap M$ we construct an automaton $A = (Q_L \times Q_M, \Sigma, \delta, (q_L, q_M), F_L \times F_M)$ where
$\delta((p, q), \sigma) = (\delta_L(p, \sigma), \delta_M(q, \sigma))$ for $p \in Q_L$, $q \in Q_M$, and $\sigma \in \Sigma$.
The set of final states consists of all pairs $(p, q)$ such that $p \in F_L$ and $q \in F_M$.

- The states of $A$ are pairs of states, one from $A_L$ and one from $A_M$.
- Suppose $A$ is in state $(p, q)$ and reads input symbol $a$:
  - if $A_L$ moves from $p$ to $s$ on input $a$, and
  - $A_M$ moves from $q$ to $t$ on input $a$,
  - then $A$ moves to the new state pair $(s, t)$.
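The product construction can be sketched directly from the definition. The code below is a minimal illustration under stated assumptions: a DFA is a tuple `(states, alphabet, delta, start, finals)` with a total `delta` dictionary, and the two example automata `EVEN_A` (even number of `a`) and `ENDS_B` (ends in `b`) are hypothetical, not taken from the slides.

```python
from itertools import product


def intersect(dfa_l, dfa_m):
    """Product automaton for L(dfa_l) ∩ L(dfa_m); both deltas must be total."""
    q_l, sigma, d_l, s_l, f_l = dfa_l
    q_m, _, d_m, s_m, f_m = dfa_m
    states = set(product(q_l, q_m))
    # delta((p, q), a) = (delta_L(p, a), delta_M(q, a))
    delta = {((p, q), a): (d_l[(p, a)], d_m[(q, a)])
             for (p, q) in states for a in sigma}
    # A pair is final iff both components are final.
    finals = {(p, q) for (p, q) in states if p in f_l and q in f_m}
    return states, sigma, delta, (s_l, s_m), finals


def run(dfa, w):
    """Run a DFA on the string w and report acceptance."""
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical example automata over the alphabet {a, b}.
SIGMA = {"a", "b"}
EVEN_A = ({0, 1}, SIGMA,
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({"n", "y"}, SIGMA,
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})

BOTH = intersect(EVEN_A, ENDS_B)
```

The sketch builds all of $Q_L \times Q_M$ for clarity; a practical implementation would usually generate only the pairs reachable from $(q_L, q_M)$.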

Closure Properties of Finite-State Acceptors: Intersection

Example (Intersection of two acceptors $A_1$ and $A_3$)
[Figures: $A_1$, $A_3$, and $A_1 \cap A_3$]

Closure Properties of Finite-State Acceptors: Complementation

Example (Complementation of acceptor $A_3$)
[Figures: $A_3$ and $\overline{A_3}$]

Complementation requires a deterministic acceptor.
If the acceptor is not total, a sink state has to be added.
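The two requirements above translate into a short procedure: make the transition function total by routing every undefined transition to a sink state, then swap final and non-final states. The following sketch assumes the same hypothetical tuple representation `(states, alphabet, delta, start, finals)`, and assumes the name `"sink"` is not already used as a state.

```python
def complement(dfa, sink="sink"):
    """Complement a DFA; a sink state is added if delta is not total."""
    states, alphabet, delta, start, finals = dfa
    total = dict(delta)
    all_states = set(states)
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if missing:
        all_states.add(sink)
        for key in missing:
            total[key] = sink          # undefined transitions go to the sink
        for a in alphabet:
            total[(sink, a)] = sink    # the sink loops on every symbol
    # Swap final and non-final states (the sink becomes final).
    return all_states, alphabet, total, start, all_states - set(finals)


def run(dfa, w):
    """Run a total DFA on the string w and report acceptance."""
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical partial DFA that accepts only the string "ab".
ONLY_AB = ({0, 1, 2}, {"a", "b"}, {(0, "a"): 1, (1, "b"): 2}, 0, {2})
NOT_AB = complement(ONLY_AB)
```

Skipping the totalization step would silently lose every string that leaves the original automaton, which is why the sink state matters.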

Closure Properties of Finite-State Acceptors: Difference

Example (Difference of two acceptors $A_1$ and $A_2$)
[Figures: $A_1$, $A_2$, and $A_1 - A_2 = A_1 \cap \overline{A_2}$]

Closure Properties of Finite-State Transducers

Finite-state transducers are closed under:
- Union
- Concatenation
- Closure (Kleene Star)
- Reversal
- Projection (leads to FSAs)
- Composition
- Inversion

Finite-state transducers are not closed under:
- Complementation
- Intersection (but acyclic and $\varepsilon$-free transducers are closed under intersection)
- Difference

Closure Properties of Finite-State Transducers: Projection

Example (Projection of transducer $T$)
[Figures: transducer $T$, its first projection $\pi_1(T)$, and its second projection $\pi_2(T)$]

Closure Properties of Finite-State Transducers: Composition

Definition ($\varepsilon$-free composition)
Let $T_1 = (Q_1, \Sigma_1, \Delta_1, q_1, F_1, E_1)$ and $T_2 = (Q_2, \Sigma_2, \Delta_2, q_2, F_2, E_2)$ be two normalized, $\varepsilon$-free FSTs.
$T_1 \circ T_2$ is the transducer $T = (Q_1 \times Q_2, \Sigma_1, \Delta_2, (q_1, q_2), F_1 \times F_2, E)$ where
$E = \{ ((p, q), a, b, (p', q')) \mid \exists c \in \Delta_1 \cap \Sigma_2 : (p, a, c, p') \in E_1 \wedge (q, c, b, q') \in E_2 \}$

How does composition work?
Whenever $T_1$ contains a transition from $p$ to $p'$ labeled $a:c$, and $T_2$ contains a transition from $q$ to $q'$ labeled $c:b$, $T$ will contain a transition from $(p, q)$ to $(p', q')$ labeled $a:b$.
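The transition set $E$ of the composed transducer can be computed by matching the output symbol of each $T_1$ transition against the input symbol of each $T_2$ transition. The sketch below covers only the $\varepsilon$-free case from the definition; transitions are plain 4-tuples `(source, input, output, target)`, and the example transition sets `E1` and `E2` are hypothetical.

```python
def compose(e1, e2):
    """Transition set of T1 ∘ T2 for normalized, epsilon-free transducers.

    e1, e2: sets of transitions (source, input, output, target).
    """
    return {((p, q), a, b, (p2, q2))
            for (p, a, c, p2) in e1
            for (q, c2, b, q2) in e2
            if c == c2}              # output of T1 must equal input of T2


# Hypothetical example: T1 rewrites a->x and b->y, T2 rewrites x->1 and y->2.
E1 = {(0, "a", "x", 1), (1, "b", "y", 0)}
E2 = {("s", "x", "1", "s"), ("s", "y", "2", "t")}

E = compose(E1, E2)
```

Here `E` pairs every matching transition regardless of whether the source pair is reachable from $(q_1, q_2)$, which mirrors the definition; pruning unreachable pairs is a separate step.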

Closure Properties of Finite-State Transducers: Composition

Example (Composition)
[Figures: two transducers and the result of their composition]

Closure Properties of Finite-State Transducers: Inversion

Example (Inversion)
[Figure: FST $T_{Morph}$ mapping words to morphological categories]
[Figure: FST $T_{Morph}^{-1}$ mapping morphological categories to words]
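On a normalized transducer, inversion and projection are simple operations on the transition set: inversion swaps the input and output label of every transition, and the two projections keep only one of the labels, which yields an acceptor. The following is a minimal sketch with transitions as 4-tuples `(source, input, output, target)`; the example transitions and the feature symbols `+N` and `+PL` are hypothetical and not taken from $T_{Morph}$.

```python
def invert(edges):
    """Swap input and output labels: T maps a to b, T^-1 maps b to a."""
    return {(p, b, a, q) for (p, a, b, q) in edges}


def project_input(edges):
    """First projection: acceptor transitions over the input labels."""
    return {(p, a, q) for (p, a, b, q) in edges}


def project_output(edges):
    """Second projection: acceptor transitions over the output labels."""
    return {(p, b, q) for (p, a, b, q) in edges}


# Hypothetical transitions mapping surface symbols to feature symbols.
EDGES = {(0, "c", "+N", 1), (1, "s", "+PL", 2)}
```

States, start state, and final states are unchanged by all three operations, so only the transition set needs to be rewritten.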

Equivalence Transformations on Finite-State Acceptors

Equivalence transformations are operations on automata which change the topology of an automaton but not its language.
They usually serve optimization purposes, i.e. they create smaller and/or faster automata.
Sometimes they are even necessary (e.g. determinization is crucial for complementation).

Finite-state acceptors admit the following transformations:
- $\varepsilon$-removal
- Determinization
- Minimization

Determinization: Subset Construction

A DFA can be constructed from an NFA by the subset construction.
In the worst case, the smallest DFA can have $2^n$ states, where $n$ is the number of states of the NFA.
...
- $Q_D$ is the power set of $Q_N$
- $F_D$ is the set of subsets $S$ of $Q_N$ such that $S \cap F_N \neq \emptyset$
- For each set $S \subseteq Q_N$ and for each input symbol $a \in \Sigma$: $\delta_D(S, a) = \bigcup_{p \in S} \delta_N(p, a)$

Determinization: Subset Construction

[Figure: transition diagram of the NFA]

Transition function $\delta$ of the NFA:
$\delta(p_0, 0) = \{p_0, p_1\}$
$\delta(p_0, 1) = \{p_0\}$
$\delta(p_1, 1) = \{p_2\}$

Subset construction table:

State                    0                  1
$\emptyset$              $\emptyset$        $\emptyset$        not accessible!
$\{p_0\}$                $\{p_0, p_1\}$     $\{p_0\}$
$\{p_1\}$                $\emptyset$        $\{p_2\}$          not accessible!
$\{p_2\}$                $\emptyset$        $\emptyset$        not accessible!
$\{p_0, p_1\}$           $\{p_0, p_1\}$     $\{p_0, p_2\}$
$\{p_0, p_2\}$           $\{p_0, p_1\}$     $\{p_0\}$
$\{p_1, p_2\}$           $\emptyset$        $\{p_2\}$          not accessible!
$\{p_0, p_1, p_2\}$      $\{p_0, p_1\}$     $\{p_0, p_2\}$     not accessible!

Determinization: Subset Construction with Lazy Evaluation

Lazy Evaluation
Basis: The singleton set containing only the start state of the NFA $N$ is accessible.
Induction: Suppose the set $S$ of states is accessible. Then for each input symbol $a$, compute the set of states $\delta_D(S, a)$; these sets are also accessible.

Example
$\delta_D(\{p_0\}, 0) = \{p_0, p_1\}$ (new accessible state)
$\delta_D(\{p_0\}, 1) = \{p_0\}$ (old state)
$\delta_D(\{p_0, p_1\}, 0) = \delta_N(p_0, 0) \cup \delta_N(p_1, 0) = \{p_0, p_1\} \cup \emptyset = \{p_0, p_1\}$ (old state)
$\delta_D(\{p_0, p_1\}, 1) = \delta_N(p_0, 1) \cup \delta_N(p_1, 1) = \{p_0\} \cup \{p_2\} = \{p_0, p_2\}$ (new accessible state)
$\delta_D(\{p_0, p_2\}, 0) = \delta_N(p_0, 0) \cup \delta_N(p_2, 0) = \{p_0, p_1\} \cup \emptyset = \{p_0, p_1\}$ (old state)
$\delta_D(\{p_0, p_2\}, 1) = \delta_N(p_0, 1) \cup \delta_N(p_2, 1) = \{p_0\} \cup \emptyset = \{p_0\}$ (old state)

No new states arise, so the construction has converged.
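The basis and induction steps amount to a worklist algorithm that only ever creates accessible subsets. The sketch below is one possible way to write it, using the NFA transition function of the running example; the string names for states and symbols, the use of a queue, and the function name `determinize` are choices of this sketch. Final states are left out because the slides do not list $F_N$ for this example, and $\varepsilon$-transitions are not handled.

```python
from collections import deque

# NFA transition function of the running example (missing entries are empty).
NFA_DELTA = {
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}


def determinize(nfa_delta, start, alphabet):
    """Lazy subset construction: build only the accessible DFA states."""
    start_set = frozenset({start})          # basis: {q0} is accessible
    seen = {start_set}
    dfa_delta = {}
    queue = deque([start_set])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            # delta_D(S, a) = union of delta_N(p, a) for all p in S
            target = frozenset().union(
                *(nfa_delta.get((p, a), set()) for p in s))
            dfa_delta[(s, a)] = target
            if target not in seen:          # induction: new accessible state
                seen.add(target)
                queue.append(target)
    return seen, dfa_delta


STATES, DFA_DELTA = determinize(NFA_DELTA, "p0", ("0", "1"))
```

For this NFA the loop stops after three subsets, instead of the eight rows of the full power-set table.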

Determinization

Example (Determinized version of $A_2$)
[Figure: determinized version of $A_2$]

Bibliography

J. E. Hopcroft, R. Motwani & J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2007.

T. Hanneforth: Finite-state Machines: Theory and Applications. Unweighted Finite-state Automata. Universität Potsdam, 2008. Slides: tagh.de/tom/wp-content/uploads/fsm unweigtedautomata.pdf