OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language Part I. Theory and Algorithms
Overview. Preliminaries Semirings Weighted Automata and Transducers. Algorithms Rational Operations Elementary Unary Operations Fundamental Binary Operations Optimization Algorithms Search Operations Fundamental String Algorithms OpenFst Part I. Algorithms Overview
Weight Sets: Semirings A semiring (K,,,, ) = a ring that may lack negation. Sum: to compute the weight of a sequence (sum of the weights of the paths labeled with that sequence). Product: to compute the weight of a path (product of the weights of constituent transitions). Semiring Set Boolean {, } Probability R + + Log R {, + } log + + Tropical R {, + } min + + String Σ { } ǫ log is defined by: x log y = log(e x +e y ) and is longest common prefix. The string semiring is a left semiring. OpenFst Part I. Algorithms Preliminaries
Weighted Automaton/Acceptor a/ b/4 / b/ c/3 a/ b/3 3/ b/ c/5 Probability semiring (R +, +,,, ) Tropical semiring (R + { }, min, +,, ) [A](ab) = 4 [A](ab) = 4 ( + 3 = 4) (min( + +,3 + + ) = 4) OpenFst Part I. Algorithms Preliminaries 3
Weighted Transducer a:ε/ a:r/3 / b:r/ b:ε/ c:s/ 3/ Probability semiring (R +, +,,, ) Tropical semiring (R + { }, min, +,, ) [T ](ab, r) = 6 [T ](ab, r) = 5 ( + 3 = 6) (min( + +,3 + + ) = 5) OpenFst Part I. Algorithms Preliminaries 4
Transducers as Weighted Automata A transducer T is functional iff for each x there exists at most one y such that [T ](x, y) An unweighted functional transducer can be seen as as: a weighted automata over the string semiring (Σ { },,,, ǫ) A weighted functional transducer over the semiring K can be seen as: a weighted automata over the cartesian product of the string semiring and K a:ε/ a:r/3 / b:r/ b:ε/ c:s/ 3/ a/(ε,) a/(r,3) /(ε,) b/(r,) b/(ε,) c/(s,) 3/(ε,) [T ](ab, r) = 5 [A](ab) = (r,5) [Tropical semiring (R + { },min,+,, )] OpenFst Part I. Algorithms Preliminaries 5
Definitions and Notation Paths Path π Origin or previous state: p[π]. Destination or next state: n[π]. Input label: i[π]. Output label: o[π]. p[π] i[π]:o[π] n[π] Sets of paths P(R,R ): set of all paths from R Q to R Q. P(R,x, R ): paths in P(R, R ) with input label x. P(R,x, y,r ): paths in P(R, x, R ) with output label y. OpenFst Part I. Algorithms Preliminaries 6
Definitions and Notation Automata and Transducers. General Definitions Alphabets: input Σ, output. States: Q, initial states I, final states F. Transitions: E Q (Σ {ǫ}) ( {ǫ}) K Q. Weight functions: initial weight function λ : I K. Machines final weight function ρ : F K. Automaton A = (Σ, Q, I,F, E, λ, ρ) with for all x Σ : M [A](x) = λ(p[π]) w[π] ρ(n[π]) π P(I,x,F) Transducer T = (Σ,,Q, I,F, E, λ, ρ) with for all x Σ, y : M [T ](x, y) = λ(p[π]) w[π] ρ(n[π]) π P(I,x,y,F) OpenFst Part I. Algorithms Preliminaries 7
Rational Operations Algorithms Definitions Operation Definition and Notation Sum [T T ](x, y) = [T ](x, y) [T ](x, y) M Product [T T ](x, y) = [T ](x,y ) [T ](x,y ) x=x x,y=y y M Closure [T ](x, y) = [T n ](x, y) n= Conditions on the closure operation: condition on T: e.g. weight of ǫ-cycles = (regulated transducers), or semiring condition: e.g. x = as with the tropical semiring (locally closed semirings). Complexity and implementation Complexity (linear): O(( E + Q ) + ( E + Q )) or O( Q + E ). Lazy implementation. OpenFst Part I. Algorithms Rational Operations 8
Sum (Union) Illustration Definition: [T T ](x, y) = [T ](x, y) [T ](x, y) Example: red/.5 green/.3 blue/ yellow/.6 /.8 green/.4 blue/. / /.3 red/.5 6 eps/ eps/ green/.3 blue/ /.8 yellow/.6 3 green/.4 4 / blue/. 5 /.3 OpenFst Part I. Algorithms Rational Operations 9
Product (Concatenation) Illustration M Definition: [T T ](x, y) = [T ](x, y ) [T ](x, y ) x=x x,y=y y Example: green/.4 / red/.5 blue/. green/.3 blue/ yellow/.6 /.8 /.3 red/.5 green/.4 4 / green/.3 blue/ yellow/.6 eps/.8 3 blue/. 5 /.3 OpenFst Part I. Algorithms Rational Operations
Definition: [T ](x, y) = Example: Closure Illustration M [T n ](x, y) n= green/.4 / green/.4 eps/ / eps/ blue/. 3 / blue/. /.3 eps/.3 /.3 OpenFst Part I. Algorithms Rational Operations
Some Elementary Unary Operations Algorithms Definitions Operation Definition and Notation Lazy Implementation Reversal [ T e ](x, y) = [T ](ex, ey) No Inversion [T ](x, y) = [T ](y, x) Yes Projection [Π (T)](x) = M y [T ](x, y) Yes [Π (T)](x) = M y [T ](y, x) Complexity and implementation Complexity (linear): O( Q + E ). Lazy implementation (see table). OpenFst Part I. Algorithms Elementary Unary Operations
Definition: [ e T ](x, y) = [T ](ex, ey) Example: Reversal Illustration red/.5 green/. 3 / green/.3 blue/ yellow/.6 blue/ 4 /.3 red/.5 eps/ eps/.3 4 5 green/. blue/ 3 blue/ yellow/.6 green/.3 / OpenFst Part I. Algorithms Elementary Unary Operations 3
Definition: [T ](x, y) = [T ](y, x) Inversion Illustration Example: red:bird/.5 green:pig/.3 blue:cat/ yellow:dog/.6 /.8 bird:red/.5 pig:green/.3 cat:blue/ dog:yellow/.6 /.8 OpenFst Part I. Algorithms Elementary Unary Operations 4
Projection Illustration Definition: [Π (T)](x) = M y [T ](x, y) Example: red:bird/.5 green:pig/.3 blue:cat/ yellow:dog/.6 /.8 red/.5 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Elementary Unary Operations 5
Definitions Some Fundamental Binary Operations Algorithms Operation Definition and Notation Condition Composition [T T ](x, y) = M z [T ](x, z) [T ](z, y) K commutative Intersection [A A ](x) = [A ](x) [A ](x) K commutative Difference [A A ](x) = [A A ](x) A unweighted & deterministic Complexity and implementation Complexity (quadratic): O(( E + Q ) ( E + Q )). Path multiplicity in presence of ǫ-transitions: ǫ-filter. Lazy implementation. OpenFst Part I. Algorithms Fundamental Binary Operations 6
Composition Illustration Definition: [T T ](x, y) = M z [T ](x, z) [T ](z, y) Example: c:a/.3 a:b/.6 a:b/. b:a/. a:a/.4 b:b/.5 3/.6 b:c/.3 a:b/.4 /.7 c:b/.9 c:b/.7 (, ) a:b/ a:c/.4 (, ) (, ) a:b/.8 (3, )/.3 OpenFst Part I. Algorithms Fundamental Binary Operations 7
Composition Pseudocode Composition(T, T ) S Q I I while S do 3 (q, q ) Head(S) 4 Dequeue(S) 5 if (q, q ) I I then 6 I I I 7 λ(q, q ) λ (q ) λ (q ) 8 if (q, q ) F F then 9 F F {(q, q )} ρ(q, q ) ρ (q ) ρ (q ) for each(e, e ) such that o[e ] = i[e ] do if (n[e ], n[e ]) Q then 3 Q Q {(n[e ], n[e ])} 4 Enqueue(S,(n[e ], n[e ])) 5 E E {((q, q ), i[e ], o[e ], w[e ] w[e ],(n[e ], n[e ]))} 6 return T = (Σ,, Q, I, F, E, λ, ρ) OpenFst Part I. Algorithms Fundamental Binary Operations 8
Multiplicity & ǫ-transitions Problem T : a:a b:ε c:ε 3 d:d 4 ε:ε ˆT : a:a ε:ε b:ε ε:ε c:ε ε:ε 3 d:d ε:ε 4 T : a:d ε:e d:a 3 ε:ε ˆT : a:d ε:ε ε: e ε:ε d:a ε:ε 3 ˆT ˆT : a:d ε:e,,, (x:x) b:ε (ε:ε) c:ε (ε:ε) (ε:ε) ε:e,, (ε:ε) b:e (ε:ε) b:ε (ε:ε) c:ε (ε:ε) Redundant ǫ-paths 3, ε:e 3, (ε:ε) d:a (x:x) 4,3 OpenFst Part I. Algorithms Fundamental Binary Operations 9
Solution Filter F for Composition ε:ε x:x Filter F ε:ε x:x ε:ε ε:ε ε:ε a:d ε:e,,, (x:x) b:ε (ε:ε) c:ε (ε:ε) (ε:ε) ε:e,, (ε:ε) b:e (ε:ε) b:ε (ε:ε) c:ε (ε:ε) x:x 3, ε:e 3, (ε:ε) d:a (x:x) 4,3 Replace T T by ˆT F ˆT. OpenFst Part I. Algorithms Fundamental Binary Operations
Intersection Illustration Definition: [A A ](x) = [A ](x) [A ](x) Example: red/.5 green/.4 green/.3 blue/ yellow/.6 /.8 / red/. blue/.6 yellow/.3 /.5 blue/.6 3 /.8 red/.7 green/.7 yellow/.9 4 /.3 OpenFst Part I. Algorithms Fundamental Binary Operations
Difference Illustration Definition: [A A ](x) = [A A ](x) Example: red/.5 green green/.3 blue/ yellow/.6 /.8 red blue yellow red/.5 red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 OpenFst Part I. Algorithms Fundamental Binary Operations
Optimization Algorithms Overview Definitions Operation Connection ǫ-removal Determinization Pushing Minimization Description Removes non-accessible/non-coaccessible states Removes ǫ-transitions Creates equivalent deterministic machine Creates equivalent pushed/stochastic machine Creates equivalent minimal deterministic machine Conditions: There are specific semiring conditions for the use of these algorithms. Not all weighted automata or transducers can be determinized using that algorithm. OpenFst Part I. Algorithms Optimization Algorithms 3
Connection Algorithm Definition Input: weighted transducer T. Output: weighted transducer T T with all states connected. Description. Depth-first search of T from I.. Mark accessible and coaccessible states. 3. Keep marked states and corresponding transitions. Complexity and implementation Complexity (linear): O( Q + E ). No natural lazy implementation. OpenFst Part I. Algorithms Optimization Algorithms 4
Connection Illustration Definition: Removes non-accessible/non-coaccessible states Example: green/. 3 4 /. red/.5 blue/ /.8 green/.3 yellow/.6 red/ 5 red/.5 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Optimization Algorithms 5
Definition ǫ-removal Algorithm Input: weighted transducer T with ǫ-transitions. Output: weighted transducer T T with no ǫ-transition. Description (two stages):. Computation of ǫ-closures: for any state p, states q that can be reached from p via ǫ-paths and the total weight of the ǫ-paths from p to q. C[p] = (q,w) : q ǫ[p], d[p, q] = w with: d[p,q] = M π P(p,ǫ,q) w[π]. Removal of ǫ s: actual removal of ǫ-transitions and addition of new transitions. = All-pair K-shortest-distance problem in T ǫ (T reduced to its ǫ- transitions). OpenFst Part I. Algorithms Optimization Algorithms 6
Complexity and implementation All-pair shortest-distance algorithm in T ǫ. k-closed semirings (for T ǫ ) or approximation: generic sparse shortestdistance algorithm [See references]. Closed semirings: Floyd-Warshall or Gauss-Jordan elimination algorithm with decomposition of T ǫ into strongly connected components [See references], space complexity (quadratic): O( Q + E ). time complexity (cubic): O( Q 3 (T + T + T )). Complexity: Acyclic T ǫ : O( Q + Q E (T + T )). General case (tropical semiring): O( Q E + Q log Q ). Lazy implementation: integration with on-the-fly weighted determinization. OpenFst Part I. Algorithms Optimization Algorithms 7
Definition: Removes ǫ-transitions ǫ-removal Illustration Example: a:b/. ε:ε/. b:a/.4 ε:ε/.3 ε:ε/.6 b:a/.9 b:b/.5 3 ε:ε/ 4/ b:a/.7 a:a/.8 4/..6.6.8 3.3 T a:a/.6 ǫ-distances b:a/ b:a/.7 a:b/. b:b/.7 b:a/.5 b:b/.5 b:a/.3 a:a/.4 b:a/.7 4/ b:a/.4 b:a/.9 a:a/.8 3 rmeps(t) b:a/.7 OpenFst Part I. Algorithms Optimization Algorithms 8
Definition Determinization Algorithm Input: determinizable weighted automaton or transducer M. Output: M M subsequential or deterministic: M has a unique initial state and no two transitions leaving the same state share the same input label. Description. Generalization of subset construction: weighted subsets {(q, w ),...,(q n, w n )}, w i remainder weight at state q i.. Weight of a transition in the result: -sum of the original transitions pre- -multiplied by remainders. Conditions Semiring: weakly left divisible semirings. M is determinizable the determinization algorithm applies to M. All unweighted automata are determinizable. All acyclic machines are determinizable. OpenFst Part I. Algorithms Optimization Algorithms 9
Not all weighted automata or transducers are determinizable. Characterization based on the twins property. Complexity and Implementation Complexity: exponential. Lazy implementation. OpenFst Part I. Algorithms Optimization Algorithms 3
Determinization of Weighted Automata Illustration Definition: Creates an equivalent deterministic weighted automaton Example: a/ b/4 b/ b/3 a/3 b/3 3/ b/ / b/5 a/ {(,),(,)}/ b/ {(,)} b/ b/3 {(3,)}/ {(,),(,3)}/ OpenFst Part I. Algorithms Optimization Algorithms 3
Determinization of Weighted Transducers Illustration Definition: Creates an equivalent deterministic weighted transducer Example: a:a/. a:b/. a:c/.5 b:b/.3 b:eps/.4 3 c:eps/.7 a:eps/.5 a:b/.6 4/.7 a:a/. {(, b, )} a:b/. {(, eps, )} a:eps/. {(, c,.4), (, a, )} b:a/.3 {(3, b, ), (4, eps,.)} /(eps,.8) a:b/.5 a:b/.6 c:c/. {(4, eps, )} /(eps,.7) OpenFst Part I. Algorithms Optimization Algorithms 3
Determinization of Weighted Automata Pseudocode Determinization(A) i {(i, λ(i)) : i I} λ (i ) 3 S i 4 while S do 5 p Head(S) 6 Dequeue(S) 7 for each x i[e[q[p ]]] do 8 w L v w : (p, v) p,(p, x, w, q) E 9 q {(q, L w (v w) : (p, v) p,(p, x, w, q) E ) : q = n[e], i[e] = x, e E[Q[p ]]} E E (p, x, w, q ) if q Q then Q Q q 3 if Q[q ] F then 4 F F q 5 ρ (q ) L v ρ(q) : (q, v) q, q F 6 Enqueue(S, q ) 7 return A OpenFst Part I. Algorithms Optimization Algorithms 33
Definition Pushing Algorithm Input: weighted automaton or transducer M. Output: M M such that: the longest common prefix of all outgoing paths is minimal, or the -sum of the weights of all outgoing transitions = modulo the string/weight at the initial state. Description (two stages):. Single-source shortest distance computation: for each state q, d[q] = M w[π] π P(q,F). Reweighting: for each transition e such that d[p[e]], w[e] (d[p[e]]) (w[e] d[n[e]]) OpenFst Part I. Algorithms Optimization Algorithms 34
Conditions (automata case) Weakly divisible semiring. Zero-sum free semiring or zero-sum free machine. Complexity Automata case Acyclic case (linear): O( Q + E (T + T )). General case (tropical semiring): O( Q log Q + E ). Transducer case: O(( P max + ) E ). OpenFst Part I. Algorithms Optimization Algorithms 35
Weight Pushing Illustration Definition: Creates an equivalent pushed/stochastic machine Example: Tropical semiring a/ b/ c/4 d/ e/ f/ e/ 3/ a/ b/ c/4 d/ e/ f/ e/ 3/ e/ f/ e/ f/ OpenFst Part I. Algorithms Optimization Algorithms 36
Log semiring a/ b/ c/4 d/ e/ f/ e/ 3/ a/.366 b/.366 c/4.366 d/.36 e/.33 f/.33 e/.33 3/-.639 e/ f/ e/.36 f/.33 OpenFst Part I. Algorithms Optimization Algorithms 37
Label Pushing Illustration Definition: Minimizes at each state the length of the common prefix of all outgoing paths at that state. Example: a:a c: ε b:d a: ε b:ε 3 d:ε c:d a:d 4 a: ε 5 f:d 6 a:a c:d b:ε a:d b:ε 3 d:d c: ε a:d 4 a: ε 5 f: ε 6 OpenFst Part I. Algorithms Optimization Algorithms 38
Definition Minimization Algorithm Input: deterministic weighted automaton or transducer M. Output: deterministic M M with minimal number of states and transitions. Description: two stages. Canonical representation: use pushing or other algorithm to standardize input automata.. Automata minimization: encode pairs (label, weight) as labels and use classical unweighted minimization algorithm. Complexity Automata case Acyclic case (linear): O( Q + E (T + T )). General case (tropical semiring): O( E log Q ). Transducer case Acyclic case: O(S + Q + E ( P max + )). General case: O(S + Q + E (log Q + P max )). OpenFst Part I. Algorithms Optimization Algorithms 39
Minimization Illustration Definition: Computes a minimal equivalent deterministic machine Example: b/ c/ d/ a/ b/ a/ d/3 3 e/ d/4 e/3 5 c/3 c/ 6 e/ 7/ 4 b/ c/ d/ a/ b/ a/3 d/6 3 e/ d/6 e/ 5 c/ c/ 6 e/ 7/6 d/ a/ b/ b/ a/3 4 c/ c/ e/ d/6 3 6 e/ 7/6 OpenFst Part I. Algorithms Optimization Algorithms 4
Definition Equivalence Algorithm Input: deterministic weighted automata A and A. Output: true if A A, false otherwise. Description: two stages. Canonical representation: use pushing or other algorithm to standardize input automata.. Test: encode pairs (label, weight) as labels and use classical algorithm for testing the equivalence of unweighted automata. Complexity First stage: O(( E + E ) + ( Q + Q )log( Q + Q )) if using pushing in the tropical semiring. Second stage (quasi-linear): O(m α(m, n)) where m = E + E and n = Q + Q, and α is the inverse of Ackermann s function. OpenFst Part I. Algorithms Optimization Algorithms 4
Equivalence Illustration Definition: A A iff [A ](x) = [A ](x) for all x Graphical Representation: blue/.7 /.3 red/.3 yellow/.9 3 /.4 red/? blue/ yellow/.3 /.3 OpenFst Part I. Algorithms Optimization Algorithms 4
Single-Source Shortest-Distance Algorithms Algorithm Generic single-source shortest-distance algorithm Definition: for each state q, d[q] = M π P(q,F) w[π] Works with any queue discipline and any semiring k-closed for the graph. Coincides with classical algorithms in the specific case of the tropical semiring and the specific queue disciplines: shortest-first (Dijkstra), FIFO (Bellman-Ford), or topological sort order (Lawler). N-shortest paths algorithm General N-shortest paths algorithm augmented with the computation of the potentials. On-the-fly weighted determinization for n-shortest strings. OpenFst Part I. Algorithms Search Operations 43
N-Shortest Paths Illustration Definition: Computes the N-shortest paths in the input machine Condition: Semiring needs to have the path property: a b {a, b} (e.g. tropical semiring) Example: red/.5 red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 green/.3 blue/ /.8 OpenFst Part I. Algorithms Search Operations 44
Pruning Illustration Definition: Removes any paths which weight is more than the shortestdistance -multiply by a specified threshold Condition: Semiring needs to be commutative and have the path property: a b {a, b} (e.g. tropical semiring) Example: red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Search Operations 45
String Algorithms Overview How to implement some fundamental string algorithms using the operations previously described: Counting patterns (e.g. n-grams) in automata Pattern matching using automata Compiling regular expression into automata Benefits: generality, efficiency and flexibility OpenFst Part I. Algorithms String Algorithms 46
Expected count of x in A: Counting from weighted automata Let A be a weighted automaton over the probability semiring, c(x) = u Σ u x [A](u) where: u x : number of occurrences of x in u [A](u): weight associated to u by A Pr(u) if A is pushed Condition: The weight of any cycle in A should be less than. This is the case if A represents a probability distribution. OpenFst Part I. Algorithms String Algorithms 47
Counting by composition with a transducer Counting transducer T for set of sequences X with Σ = {a, b}: b:ε/ a:ε/ X:X/ b:ε/ a:ε/ / u x v 3 u:ε x:x v:ε 3 in A in A T To each successful path π in A and each occurence of x along π corresponds a successful path with output x in A T c(x) is the sum of the weight of all the successful path with output x in A T Theorem: Let Π denote projection onto output labels. For all x X, c(x) = [Π (A T)](x) OpenFst Part I. Algorithms String Algorithms 48
Counting with transducers Example a c b 3 a a 4 b 5 c:ε b:ε a:ε a:a b:b c:ε b:ε a:ε c:ε a:ε a:a 5 Automaton A 8 a:a b:ε a:a b:b 6 a:ε 3 7 b:b b:ε 4 Counting transducer T 8 a ε ε ε a 3 b a ε 5 b 6 ε 7 4 A T Π (A T) [Π (A T)](ab) = 3 = c(ab) OpenFst Part I. Algorithms String Algorithms 49
Local Grammar Algorithm [Mohri, 94] Definition Input: a deterministic finite automaton A Output: a compact representation of det(σ A) Description A generalization of [Aho-Corasick, 75] Failure transitions: labeled by φ, non-consuming, traversed when no transition with required label is present Default transitions: labeled by ρ, consuming, traversed when no transition with required label is present, only present at the initial state OpenFst Part I. Algorithms String Algorithms 5
Local Grammar Illustration Pattern matching: find all occurences of pattern A in text T T = abbaabaab, A = ab + a b a b a 3 ρ a φ φ b φ b a 3 A det(σ A), a, b, b 3, a 4,3 a 5, b 6, a 7,3 a 8, b 9, Complexity: search time linear in T T det(σ A) OpenFst Part I. Algorithms String Algorithms 5
Definition Regular Expression Compilation Algorithms Input: a (weighted) regular expression α Output: a (weighted) automata representing α Description: Thompson construction. Build a sparse tree for α. Walk the tree bottom-up and apply the relevant rational operation at each node Complexity and implementation Linear in the length of α Admits lazy implementation Other constructions (Glushkov, Antimirov, Follow) can be obtained from Thompson using epsilon-removal and minimization OpenFst Part I. Algorithms String Algorithms 5
Regular Expression Compilation Thompson Regular expression: α = (ab + 3b(4ab) ) Thompson automaton: ε/ ε/ ε/ ε/ ε/3 3 a/ ε/ b/ 3 ε/ b/ ε/ ε/ ε/4 4 a/ 4 ε/ 5 b/ ε/ 5 ε/ ε/ ε/ ε/ ε/ ε/ */ A T (α) OpenFst Part I. Algorithms String Algorithms 53
Regular Expression Compilation Glushkov Regular expression: α = (ab + 3b(4ab) ) Glushkov automaton: a/ / a/ b/ a/ / b/3 a/ b/3 b/3 a/4 4 3/ b/3 a/4 b/ 5/ A G (α) = rmeps(a T (α)) OpenFst Part I. Algorithms String Algorithms 54