OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language. Part I. Theory and Algorithms

Size: px
Start display at page:

Download "OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language. Part I. Theory and Algorithms"

Transcription

1 OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language Part I. Theory and Algorithms

2 Overview. Preliminaries Semirings Weighted Automata and Transducers. Algorithms Rational Operations Elementary Unary Operations Fundamental Binary Operations Optimization Algorithms Search Operations Fundamental String Algorithms OpenFst Part I. Algorithms Overview

3 Weight Sets: Semirings A semiring (K,,,, ) = a ring that may lack negation. Sum: to compute the weight of a sequence (sum of the weights of the paths labeled with that sequence). Product: to compute the weight of a path (product of the weights of constituent transitions). Semiring Set Boolean {, } Probability R + + Log R {, + } log + + Tropical R {, + } min + + String Σ { } ǫ log is defined by: x log y = log(e x +e y ) and is longest common prefix. The string semiring is a left semiring. OpenFst Part I. Algorithms Preliminaries

4 Weighted Automaton/Acceptor a/ b/4 / b/ c/3 a/ b/3 3/ b/ c/5 Probability semiring (R +, +,,, ) Tropical semiring (R + { }, min, +,, ) [A](ab) = 4 [A](ab) = 4 ( + 3 = 4) (min( + +,3 + + ) = 4) OpenFst Part I. Algorithms Preliminaries 3

5 Weighted Transducer a:ε/ a:r/3 / b:r/ b:ε/ c:s/ 3/ Probability semiring (R +, +,,, ) Tropical semiring (R + { }, min, +,, ) [T ](ab, r) = 6 [T ](ab, r) = 5 ( + 3 = 6) (min( + +,3 + + ) = 5) OpenFst Part I. Algorithms Preliminaries 4

6 Transducers as Weighted Automata A transducer T is functional iff for each x there exists at most one y such that [T ](x, y) An unweighted functional transducer can be seen as as: a weighted automata over the string semiring (Σ { },,,, ǫ) A weighted functional transducer over the semiring K can be seen as: a weighted automata over the cartesian product of the string semiring and K a:ε/ a:r/3 / b:r/ b:ε/ c:s/ 3/ a/(ε,) a/(r,3) /(ε,) b/(r,) b/(ε,) c/(s,) 3/(ε,) [T ](ab, r) = 5 [A](ab) = (r,5) [Tropical semiring (R + { },min,+,, )] OpenFst Part I. Algorithms Preliminaries 5

7 Definitions and Notation Paths Path π Origin or previous state: p[π]. Destination or next state: n[π]. Input label: i[π]. Output label: o[π]. p[π] i[π]:o[π] n[π] Sets of paths P(R,R ): set of all paths from R Q to R Q. P(R,x, R ): paths in P(R, R ) with input label x. P(R,x, y,r ): paths in P(R, x, R ) with output label y. OpenFst Part I. Algorithms Preliminaries 6

8 Definitions and Notation Automata and Transducers. General Definitions Alphabets: input Σ, output. States: Q, initial states I, final states F. Transitions: E Q (Σ {ǫ}) ( {ǫ}) K Q. Weight functions: initial weight function λ : I K. Machines final weight function ρ : F K. Automaton A = (Σ, Q, I,F, E, λ, ρ) with for all x Σ : M [A](x) = λ(p[π]) w[π] ρ(n[π]) π P(I,x,F) Transducer T = (Σ,,Q, I,F, E, λ, ρ) with for all x Σ, y : M [T ](x, y) = λ(p[π]) w[π] ρ(n[π]) π P(I,x,y,F) OpenFst Part I. Algorithms Preliminaries 7

9 Rational Operations Algorithms Definitions Operation Definition and Notation Sum [T T ](x, y) = [T ](x, y) [T ](x, y) M Product [T T ](x, y) = [T ](x,y ) [T ](x,y ) x=x x,y=y y M Closure [T ](x, y) = [T n ](x, y) n= Conditions on the closure operation: condition on T: e.g. weight of ǫ-cycles = (regulated transducers), or semiring condition: e.g. x = as with the tropical semiring (locally closed semirings). Complexity and implementation Complexity (linear): O(( E + Q ) + ( E + Q )) or O( Q + E ). Lazy implementation. OpenFst Part I. Algorithms Rational Operations 8

10 Sum (Union) Illustration Definition: [T T ](x, y) = [T ](x, y) [T ](x, y) Example: red/.5 green/.3 blue/ yellow/.6 /.8 green/.4 blue/. / /.3 red/.5 6 eps/ eps/ green/.3 blue/ /.8 yellow/.6 3 green/.4 4 / blue/. 5 /.3 OpenFst Part I. Algorithms Rational Operations 9

11 Product (Concatenation) Illustration M Definition: [T T ](x, y) = [T ](x, y ) [T ](x, y ) x=x x,y=y y Example: green/.4 / red/.5 blue/. green/.3 blue/ yellow/.6 /.8 /.3 red/.5 green/.4 4 / green/.3 blue/ yellow/.6 eps/.8 3 blue/. 5 /.3 OpenFst Part I. Algorithms Rational Operations

12 Definition: [T ](x, y) = Example: Closure Illustration M [T n ](x, y) n= green/.4 / green/.4 eps/ / eps/ blue/. 3 / blue/. /.3 eps/.3 /.3 OpenFst Part I. Algorithms Rational Operations

13 Some Elementary Unary Operations Algorithms Definitions Operation Definition and Notation Lazy Implementation Reversal [ T e ](x, y) = [T ](ex, ey) No Inversion [T ](x, y) = [T ](y, x) Yes Projection [Π (T)](x) = M y [T ](x, y) Yes [Π (T)](x) = M y [T ](y, x) Complexity and implementation Complexity (linear): O( Q + E ). Lazy implementation (see table). OpenFst Part I. Algorithms Elementary Unary Operations

14 Definition: [ e T ](x, y) = [T ](ex, ey) Example: Reversal Illustration red/.5 green/. 3 / green/.3 blue/ yellow/.6 blue/ 4 /.3 red/.5 eps/ eps/ green/. blue/ 3 blue/ yellow/.6 green/.3 / OpenFst Part I. Algorithms Elementary Unary Operations 3

15 Definition: [T ](x, y) = [T ](y, x) Inversion Illustration Example: red:bird/.5 green:pig/.3 blue:cat/ yellow:dog/.6 /.8 bird:red/.5 pig:green/.3 cat:blue/ dog:yellow/.6 /.8 OpenFst Part I. Algorithms Elementary Unary Operations 4

16 Projection Illustration Definition: [Π (T)](x) = M y [T ](x, y) Example: red:bird/.5 green:pig/.3 blue:cat/ yellow:dog/.6 /.8 red/.5 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Elementary Unary Operations 5

17 Definitions Some Fundamental Binary Operations Algorithms Operation Definition and Notation Condition Composition [T T ](x, y) = M z [T ](x, z) [T ](z, y) K commutative Intersection [A A ](x) = [A ](x) [A ](x) K commutative Difference [A A ](x) = [A A ](x) A unweighted & deterministic Complexity and implementation Complexity (quadratic): O(( E + Q ) ( E + Q )). Path multiplicity in presence of ǫ-transitions: ǫ-filter. Lazy implementation. OpenFst Part I. Algorithms Fundamental Binary Operations 6

18 Composition Illustration Definition: [T T ](x, y) = M z [T ](x, z) [T ](z, y) Example: c:a/.3 a:b/.6 a:b/. b:a/. a:a/.4 b:b/.5 3/.6 b:c/.3 a:b/.4 /.7 c:b/.9 c:b/.7 (, ) a:b/ a:c/.4 (, ) (, ) a:b/.8 (3, )/.3 OpenFst Part I. Algorithms Fundamental Binary Operations 7

19 Composition Pseudocode Composition(T, T ) S Q I I while S do 3 (q, q ) Head(S) 4 Dequeue(S) 5 if (q, q ) I I then 6 I I I 7 λ(q, q ) λ (q ) λ (q ) 8 if (q, q ) F F then 9 F F {(q, q )} ρ(q, q ) ρ (q ) ρ (q ) for each(e, e ) such that o[e ] = i[e ] do if (n[e ], n[e ]) Q then 3 Q Q {(n[e ], n[e ])} 4 Enqueue(S,(n[e ], n[e ])) 5 E E {((q, q ), i[e ], o[e ], w[e ] w[e ],(n[e ], n[e ]))} 6 return T = (Σ,, Q, I, F, E, λ, ρ) OpenFst Part I. Algorithms Fundamental Binary Operations 8

20 Multiplicity & ǫ-transitions Problem T : a:a b:ε c:ε 3 d:d 4 ε:ε ˆT : a:a ε:ε b:ε ε:ε c:ε ε:ε 3 d:d ε:ε 4 T : a:d ε:e d:a 3 ε:ε ˆT : a:d ε:ε ε: e ε:ε d:a ε:ε 3 ˆT ˆT : a:d ε:e,,, (x:x) b:ε (ε:ε) c:ε (ε:ε) (ε:ε) ε:e,, (ε:ε) b:e (ε:ε) b:ε (ε:ε) c:ε (ε:ε) Redundant ǫ-paths 3, ε:e 3, (ε:ε) d:a (x:x) 4,3 OpenFst Part I. Algorithms Fundamental Binary Operations 9

21 Solution Filter F for Composition ε:ε x:x Filter F ε:ε x:x ε:ε ε:ε ε:ε a:d ε:e,,, (x:x) b:ε (ε:ε) c:ε (ε:ε) (ε:ε) ε:e,, (ε:ε) b:e (ε:ε) b:ε (ε:ε) c:ε (ε:ε) x:x 3, ε:e 3, (ε:ε) d:a (x:x) 4,3 Replace T T by ˆT F ˆT. OpenFst Part I. Algorithms Fundamental Binary Operations

22 Intersection Illustration Definition: [A A ](x) = [A ](x) [A ](x) Example: red/.5 green/.4 green/.3 blue/ yellow/.6 /.8 / red/. blue/.6 yellow/.3 /.5 blue/.6 3 /.8 red/.7 green/.7 yellow/.9 4 /.3 OpenFst Part I. Algorithms Fundamental Binary Operations

23 Difference Illustration Definition: [A A ](x) = [A A ](x) Example: red/.5 green green/.3 blue/ yellow/.6 /.8 red blue yellow red/.5 red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 OpenFst Part I. Algorithms Fundamental Binary Operations

24 Optimization Algorithms Overview Definitions Operation Connection ǫ-removal Determinization Pushing Minimization Description Removes non-accessible/non-coaccessible states Removes ǫ-transitions Creates equivalent deterministic machine Creates equivalent pushed/stochastic machine Creates equivalent minimal deterministic machine Conditions: There are specific semiring conditions for the use of these algorithms. Not all weighted automata or transducers can be determinized using that algorithm. OpenFst Part I. Algorithms Optimization Algorithms 3

25 Connection Algorithm Definition Input: weighted transducer T. Output: weighted transducer T T with all states connected. Description. Depth-first search of T from I.. Mark accessible and coaccessible states. 3. Keep marked states and corresponding transitions. Complexity and implementation Complexity (linear): O( Q + E ). No natural lazy implementation. OpenFst Part I. Algorithms Optimization Algorithms 4

26 Connection Illustration Definition: Removes non-accessible/non-coaccessible states Example: green/. 3 4 /. red/.5 blue/ /.8 green/.3 yellow/.6 red/ 5 red/.5 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Optimization Algorithms 5

27 Definition ǫ-removal Algorithm Input: weighted transducer T with ǫ-transitions. Output: weighted transducer T T with no ǫ-transition. Description (two stages):. Computation of ǫ-closures: for any state p, states q that can be reached from p via ǫ-paths and the total weight of the ǫ-paths from p to q. C[p] = (q,w) : q ǫ[p], d[p, q] = w with: d[p,q] = M π P(p,ǫ,q) w[π]. Removal of ǫ s: actual removal of ǫ-transitions and addition of new transitions. = All-pair K-shortest-distance problem in T ǫ (T reduced to its ǫ- transitions). OpenFst Part I. Algorithms Optimization Algorithms 6

28 Complexity and implementation All-pair shortest-distance algorithm in T ǫ. k-closed semirings (for T ǫ ) or approximation: generic sparse shortestdistance algorithm [See references]. Closed semirings: Floyd-Warshall or Gauss-Jordan elimination algorithm with decomposition of T ǫ into strongly connected components [See references], space complexity (quadratic): O( Q + E ). time complexity (cubic): O( Q 3 (T + T + T )). Complexity: Acyclic T ǫ : O( Q + Q E (T + T )). General case (tropical semiring): O( Q E + Q log Q ). Lazy implementation: integration with on-the-fly weighted determinization. OpenFst Part I. Algorithms Optimization Algorithms 7

29 Definition: Removes ǫ-transitions ǫ-removal Illustration Example: a:b/. ε:ε/. b:a/.4 ε:ε/.3 ε:ε/.6 b:a/.9 b:b/.5 3 ε:ε/ 4/ b:a/.7 a:a/.8 4/ T a:a/.6 ǫ-distances b:a/ b:a/.7 a:b/. b:b/.7 b:a/.5 b:b/.5 b:a/.3 a:a/.4 b:a/.7 4/ b:a/.4 b:a/.9 a:a/.8 3 rmeps(t) b:a/.7 OpenFst Part I. Algorithms Optimization Algorithms 8

30 Definition Determinization Algorithm Input: determinizable weighted automaton or transducer M. Output: M M subsequential or deterministic: M has a unique initial state and no two transitions leaving the same state share the same input label. Description. Generalization of subset construction: weighted subsets {(q, w ),...,(q n, w n )}, w i remainder weight at state q i.. Weight of a transition in the result: -sum of the original transitions pre- -multiplied by remainders. Conditions Semiring: weakly left divisible semirings. M is determinizable the determinization algorithm applies to M. All unweighted automata are determinizable. All acyclic machines are determinizable. OpenFst Part I. Algorithms Optimization Algorithms 9

31 Not all weighted automata or transducers are determinizable. Characterization based on the twins property. Complexity and Implementation Complexity: exponential. Lazy implementation. OpenFst Part I. Algorithms Optimization Algorithms 3

32 Determinization of Weighted Automata Illustration Definition: Creates an equivalent deterministic weighted automaton Example: a/ b/4 b/ b/3 a/3 b/3 3/ b/ / b/5 a/ {(,),(,)}/ b/ {(,)} b/ b/3 {(3,)}/ {(,),(,3)}/ OpenFst Part I. Algorithms Optimization Algorithms 3

33 Determinization of Weighted Transducers Illustration Definition: Creates an equivalent deterministic weighted transducer Example: a:a/. a:b/. a:c/.5 b:b/.3 b:eps/.4 3 c:eps/.7 a:eps/.5 a:b/.6 4/.7 a:a/. {(, b, )} a:b/. {(, eps, )} a:eps/. {(, c,.4), (, a, )} b:a/.3 {(3, b, ), (4, eps,.)} /(eps,.8) a:b/.5 a:b/.6 c:c/. {(4, eps, )} /(eps,.7) OpenFst Part I. Algorithms Optimization Algorithms 3

34 Determinization of Weighted Automata Pseudocode Determinization(A) i {(i, λ(i)) : i I} λ (i ) 3 S i 4 while S do 5 p Head(S) 6 Dequeue(S) 7 for each x i[e[q[p ]]] do 8 w L v w : (p, v) p,(p, x, w, q) E 9 q {(q, L w (v w) : (p, v) p,(p, x, w, q) E ) : q = n[e], i[e] = x, e E[Q[p ]]} E E (p, x, w, q ) if q Q then Q Q q 3 if Q[q ] F then 4 F F q 5 ρ (q ) L v ρ(q) : (q, v) q, q F 6 Enqueue(S, q ) 7 return A OpenFst Part I. Algorithms Optimization Algorithms 33

35 Definition Pushing Algorithm Input: weighted automaton or transducer M. Output: M M such that: the longest common prefix of all outgoing paths is minimal, or the -sum of the weights of all outgoing transitions = modulo the string/weight at the initial state. Description (two stages):. Single-source shortest distance computation: for each state q, d[q] = M w[π] π P(q,F). Reweighting: for each transition e such that d[p[e]], w[e] (d[p[e]]) (w[e] d[n[e]]) OpenFst Part I. Algorithms Optimization Algorithms 34

36 Conditions (automata case) Weakly divisible semiring. Zero-sum free semiring or zero-sum free machine. Complexity Automata case Acyclic case (linear): O( Q + E (T + T )). General case (tropical semiring): O( Q log Q + E ). Transducer case: O(( P max + ) E ). OpenFst Part I. Algorithms Optimization Algorithms 35

37 Weight Pushing Illustration Definition: Creates an equivalent pushed/stochastic machine Example: Tropical semiring a/ b/ c/4 d/ e/ f/ e/ 3/ a/ b/ c/4 d/ e/ f/ e/ 3/ e/ f/ e/ f/ OpenFst Part I. Algorithms Optimization Algorithms 36

38 Log semiring a/ b/ c/4 d/ e/ f/ e/ 3/ a/.366 b/.366 c/4.366 d/.36 e/.33 f/.33 e/.33 3/-.639 e/ f/ e/.36 f/.33 OpenFst Part I. Algorithms Optimization Algorithms 37

39 Label Pushing Illustration Definition: Minimizes at each state the length of the common prefix of all outgoing paths at that state. Example: a:a c: ε b:d a: ε b:ε 3 d:ε c:d a:d 4 a: ε 5 f:d 6 a:a c:d b:ε a:d b:ε 3 d:d c: ε a:d 4 a: ε 5 f: ε 6 OpenFst Part I. Algorithms Optimization Algorithms 38

40 Definition Minimization Algorithm Input: deterministic weighted automaton or transducer M. Output: deterministic M M with minimal number of states and transitions. Description: two stages. Canonical representation: use pushing or other algorithm to standardize input automata.. Automata minimization: encode pairs (label, weight) as labels and use classical unweighted minimization algorithm. Complexity Automata case Acyclic case (linear): O( Q + E (T + T )). General case (tropical semiring): O( E log Q ). Transducer case Acyclic case: O(S + Q + E ( P max + )). General case: O(S + Q + E (log Q + P max )). OpenFst Part I. Algorithms Optimization Algorithms 39

41 Minimization Illustration Definition: Computes a minimal equivalent deterministic machine Example: b/ c/ d/ a/ b/ a/ d/3 3 e/ d/4 e/3 5 c/3 c/ 6 e/ 7/ 4 b/ c/ d/ a/ b/ a/3 d/6 3 e/ d/6 e/ 5 c/ c/ 6 e/ 7/6 d/ a/ b/ b/ a/3 4 c/ c/ e/ d/6 3 6 e/ 7/6 OpenFst Part I. Algorithms Optimization Algorithms 4

42 Definition Equivalence Algorithm Input: deterministic weighted automata A and A. Output: true if A A, false otherwise. Description: two stages. Canonical representation: use pushing or other algorithm to standardize input automata.. Test: encode pairs (label, weight) as labels and use classical algorithm for testing the equivalence of unweighted automata. Complexity First stage: O(( E + E ) + ( Q + Q )log( Q + Q )) if using pushing in the tropical semiring. Second stage (quasi-linear): O(m α(m, n)) where m = E + E and n = Q + Q, and α is the inverse of Ackermann s function. OpenFst Part I. Algorithms Optimization Algorithms 4

43 Equivalence Illustration Definition: A A iff [A ](x) = [A ](x) for all x Graphical Representation: blue/.7 /.3 red/.3 yellow/.9 3 /.4 red/? blue/ yellow/.3 /.3 OpenFst Part I. Algorithms Optimization Algorithms 4

44 Single-Source Shortest-Distance Algorithms Algorithm Generic single-source shortest-distance algorithm Definition: for each state q, d[q] = M π P(q,F) w[π] Works with any queue discipline and any semiring k-closed for the graph. Coincides with classical algorithms in the specific case of the tropical semiring and the specific queue disciplines: shortest-first (Dijkstra), FIFO (Bellman-Ford), or topological sort order (Lawler). N-shortest paths algorithm General N-shortest paths algorithm augmented with the computation of the potentials. On-the-fly weighted determinization for n-shortest strings. OpenFst Part I. Algorithms Search Operations 43

45 N-Shortest Paths Illustration Definition: Computes the N-shortest paths in the input machine Condition: Semiring needs to have the path property: a b {a, b} (e.g. tropical semiring) Example: red/.5 red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 green/.3 blue/ /.8 OpenFst Part I. Algorithms Search Operations 44

46 Pruning Illustration Definition: Removes any paths which weight is more than the shortestdistance -multiply by a specified threshold Condition: Semiring needs to be commutative and have the path property: a b {a, b} (e.g. tropical semiring) Example: red/.5 red/.5 green/.3 green/.3 3 blue/ yellow/.6 4 /.8 green/.3 blue/ yellow/.6 /.8 OpenFst Part I. Algorithms Search Operations 45

47 String Algorithms Overview How to implement some fundamental string algorithms using the operations previously described: Counting patterns (e.g. n-grams) in automata Pattern matching using automata Compiling regular expression into automata Benefits: generality, efficiency and flexibility OpenFst Part I. Algorithms String Algorithms 46

48 Expected count of x in A: Counting from weighted automata Let A be a weighted automaton over the probability semiring, c(x) = u Σ u x [A](u) where: u x : number of occurrences of x in u [A](u): weight associated to u by A Pr(u) if A is pushed Condition: The weight of any cycle in A should be less than. This is the case if A represents a probability distribution. OpenFst Part I. Algorithms String Algorithms 47

49 Counting by composition with a transducer Counting transducer T for set of sequences X with Σ = {a, b}: b:ε/ a:ε/ X:X/ b:ε/ a:ε/ / u x v 3 u:ε x:x v:ε 3 in A in A T To each successful path π in A and each occurence of x along π corresponds a successful path with output x in A T c(x) is the sum of the weight of all the successful path with output x in A T Theorem: Let Π denote projection onto output labels. For all x X, c(x) = [Π (A T)](x) OpenFst Part I. Algorithms String Algorithms 48

50 Counting with transducers Example a c b 3 a a 4 b 5 c:ε b:ε a:ε a:a b:b c:ε b:ε a:ε c:ε a:ε a:a 5 Automaton A 8 a:a b:ε a:a b:b 6 a:ε 3 7 b:b b:ε 4 Counting transducer T 8 a ε ε ε a 3 b a ε 5 b 6 ε 7 4 A T Π (A T) [Π (A T)](ab) = 3 = c(ab) OpenFst Part I. Algorithms String Algorithms 49

51 Local Grammar Algorithm [Mohri, 94] Definition Input: a deterministic finite automaton A Output: a compact representation of det(σ A) Description A generalization of [Aho-Corasick, 75] Failure transitions: labeled by φ, non-consuming, traversed when no transition with required label is present Default transitions: labeled by ρ, consuming, traversed when no transition with required label is present, only present at the initial state OpenFst Part I. Algorithms String Algorithms 5

52 Local Grammar Illustration Pattern matching: find all occurences of pattern A in text T T = abbaabaab, A = ab + a b a b a 3 ρ a φ φ b φ b a 3 A det(σ A), a, b, b 3, a 4,3 a 5, b 6, a 7,3 a 8, b 9, Complexity: search time linear in T T det(σ A) OpenFst Part I. Algorithms String Algorithms 5

53 Definition Regular Expression Compilation Algorithms Input: a (weighted) regular expression α Output: a (weighted) automata representing α Description: Thompson construction. Build a sparse tree for α. Walk the tree bottom-up and apply the relevant rational operation at each node Complexity and implementation Linear in the length of α Admits lazy implementation Other constructions (Glushkov, Antimirov, Follow) can be obtained from Thompson using epsilon-removal and minimization OpenFst Part I. Algorithms String Algorithms 5

54 Regular Expression Compilation Thompson Regular expression: α = (ab + 3b(4ab) ) Thompson automaton: ε/ ε/ ε/ ε/ ε/3 3 a/ ε/ b/ 3 ε/ b/ ε/ ε/ ε/4 4 a/ 4 ε/ 5 b/ ε/ 5 ε/ ε/ ε/ ε/ ε/ ε/ */ A T (α) OpenFst Part I. Algorithms String Algorithms 53

55 Regular Expression Compilation Glushkov Regular expression: α = (ab + 3b(4ab) ) Glushkov automaton: a/ / a/ b/ a/ / b/3 a/ b/3 b/3 a/4 4 3/ b/3 a/4 b/ 5/ A G (α) = rmeps(a T (α)) OpenFst Part I. Algorithms String Algorithms 54

Speech Recognition Lecture 4: Weighted Transducer Software Library. Mehryar Mohri Courant Institute of Mathematical Sciences

Speech Recognition Lecture 4: Weighted Transducer Software Library. Mehryar Mohri Courant Institute of Mathematical Sciences Speech Recognition Lecture 4: Weighted Transducer Software Library Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Software Libraries FSM Library: Finite-State Machine Library.

More information

Generic ǫ-removal and Input ǫ-normalization Algorithms for Weighted Transducers

Generic ǫ-removal and Input ǫ-normalization Algorithms for Weighted Transducers International Journal of Foundations of Computer Science c World Scientific Publishing Company Generic ǫ-removal and Input ǫ-normalization Algorithms for Weighted Transducers Mehryar Mohri mohri@research.att.com

More information

Weighted Finite-State Transducer Algorithms An Overview

Weighted Finite-State Transducer Algorithms An Overview Weighted Finite-State Transducer Algorithms An Overview Mehryar Mohri AT&T Labs Research Shannon Laboratory 80 Park Avenue, Florham Park, NJ 0793, USA mohri@research.att.com May 4, 004 Abstract Weighted

More information

Statistical Natural Language Processing

Statistical Natural Language Processing 199 CHAPTER 4 Statistical Natural Language Processing 4.0 Introduction............................. 199 4.1 Preliminaries............................. 200 4.2 Algorithms.............................. 201

More information

EFFICIENT ALGORITHMS FOR TESTING THE TWINS PROPERTY

EFFICIENT ALGORITHMS FOR TESTING THE TWINS PROPERTY Journal of Automata, Languages and Combinatorics u (v) w, x y c Otto-von-Guericke-Universität Magdeburg EFFICIENT ALGORITHMS FOR TESTING THE TWINS PROPERTY Cyril Allauzen AT&T Labs Research 180 Park Avenue

More information

N-Way Composition of Weighted Finite-State Transducers

N-Way Composition of Weighted Finite-State Transducers International Journal of Foundations of Computer Science c World Scientific Publishing Company N-Way Composition of Weighted Finite-State Transducers CYRIL ALLAUZEN Google Research, 76 Ninth Avenue, New

More information

A Disambiguation Algorithm for Weighted Automata

A Disambiguation Algorithm for Weighted Automata A Disambiguation Algorithm for Weighted Automata Mehryar Mohri a,b and Michael D. Riley b a Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012. b Google Research, 76 Ninth

More information

Introduction to Machine Learning Lecture 9. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 9. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 9 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Kernel Methods Motivation Non-linear decision boundary. Efficient computation of inner

More information

Introduction to Finite Automaton

Introduction to Finite Automaton Lecture 1 Introduction to Finite Automaton IIP-TL@NTU Lim Zhi Hao 2015 Lecture 1 Introduction to Finite Automata (FA) Intuition of FA Informally, it is a collection of a finite set of states and state

More information

Pre-Initialized Composition For Large-Vocabulary Speech Recognition

Pre-Initialized Composition For Large-Vocabulary Speech Recognition Pre-Initialized Composition For Large-Vocabulary Speech Recognition Cyril Allauzen, Michael Riley Google Research, 76 Ninth Avenue, New York, NY, USA allauzen@google.com, riley@google.com Abstract This

More information

Finite-State Transducers

Finite-State Transducers Finite-State Transducers - Seminar on Natural Language Processing - Michael Pradel July 6, 2007 Finite-state transducers play an important role in natural language processing. They provide a model for

More information

On-Line Learning with Path Experts and Non-Additive Losses

On-Line Learning with Path Experts and Non-Additive Losses On-Line Learning with Path Experts and Non-Additive Losses Joint work with Corinna Cortes (Google Research) Vitaly Kuznetsov (Courant Institute) Manfred Warmuth (UC Santa Cruz) MEHRYAR MOHRI MOHRI@ COURANT

More information

Kernel Methods for Learning Languages

Kernel Methods for Learning Languages Kernel Methods for Learning Languages Leonid (Aryeh) Kontorovich a and Corinna Cortes b and Mehryar Mohri c,b a Department of Mathematics Weizmann Institute of Science, Rehovot, Israel 76100 b Google Research,

More information

Learning Languages with Rational Kernels

Learning Languages with Rational Kernels Learning Languages with Rational Kernels Corinna Cortes 1, Leonid Kontorovich 2, and Mehryar Mohri 3,1 1 Google Research, 76 Ninth Avenue, New York, NY 111. 2 Carnegie Mellon University, 5 Forbes Avenue,

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY REVIEW for MIDTERM 1 THURSDAY Feb 6 Midterm 1 will cover everything we have seen so far The PROBLEMS will be from Sipser, Chapters 1, 2, 3 It will be

More information

Introduction to Finite-State Automata

Introduction to Finite-State Automata Introduction to Finite-State Automata John McDonough Language Technologies Institute, Machine Learning for Signal Processing Group, Carnegie Mellon University March 26, 2012 Introduction In this lecture,

More information

Introduction to Kleene Algebras

Introduction to Kleene Algebras Introduction to Kleene Algebras Riccardo Pucella Basic Notions Seminar December 1, 2005 Introduction to Kleene Algebras p.1 Idempotent Semirings An idempotent semiring is a structure S = (S, +,, 1, 0)

More information

Formal Languages, Automata and Models of Computation

Formal Languages, Automata and Models of Computation CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 5 School of Innovation, Design and Engineering Mälardalen University 2011 1 Content - More Properties of Regular Languages (RL)

More information

Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 10.04.2013 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri and M. Riley

More information

An Abstraction for the Analysis of Secure Policy Interoperability

An Abstraction for the Analysis of Secure Policy Interoperability An Abstraction for the Analysis of Secure Policy Interoperability Javier Baliosian November 4, 2016 Seminario de Probabilidad y Estadística 2016, CMAT Instituto de Computación, Facultad de Ingeniería,

More information

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin Chapter 0 Introduction Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin October 2014 Automata Theory 2 of 22 Automata theory deals

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.

More information

Path Queries under Distortions: Answering and Containment

Path Queries under Distortions: Answering and Containment Path Queries under Distortions: Answering and Containment Gosta Grahne Concordia University Alex Thomo Suffolk University Foundations of Information and Knowledge Systems (FoIKS 04) Postulate 1 The world

More information

Advanced Automata Theory 10 Transducers and Rational Relations

Advanced Automata Theory 10 Transducers and Rational Relations Advanced Automata Theory 10 Transducers and Rational Relations Frank Stephan Department of Computer Science Department of Mathematics National University of Singapore fstephan@comp.nus.edu.sg Advanced

More information

Introduction to Automata

Introduction to Automata Introduction to Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

Finite-state Machines: Theory and Applications

Finite-state Machines: Theory and Applications Finite-state Machines: Theory and Applications Unweighted Finite-state Automata Thomas Hanneforth Institut für Linguistik Universität Potsdam December 10, 2008 Thomas Hanneforth (Universität Potsdam) Finite-state

More information

A Universal Turing Machine

A Universal Turing Machine A Universal Turing Machine A limitation of Turing Machines: Turing Machines are hardwired they execute only one program Real Computers are re-programmable Solution: Universal Turing Machine Attributes:

More information

Regular Expressions [1] Regular Expressions. Regular expressions can be seen as a system of notations for denoting ɛ-nfa

Regular Expressions [1] Regular Expressions. Regular expressions can be seen as a system of notations for denoting ɛ-nfa Regular Expressions [1] Regular Expressions Regular expressions can be seen as a system of notations for denoting ɛ-nfa They form an algebraic representation of ɛ-nfa algebraic : expressions with equations

More information

Author: Vivek Kulkarni ( )

Author: Vivek Kulkarni ( ) Author: Vivek Kulkarni ( vivek_kulkarni@yahoo.com ) Chapter-2: Finite State Machines Solutions for Review Questions @ Oxford University Press 2013. All rights reserved. 1 Q.1 Construct Mealy and Moore

More information

Chapter 5. Finite Automata

Chapter 5. Finite Automata Chapter 5 Finite Automata 5.1 Finite State Automata Capable of recognizing numerous symbol patterns, the class of regular languages Suitable for pattern-recognition type applications, such as the lexical

More information

Finite State Transducers

Finite State Transducers Finite State Transducers Eric Gribkoff May 29, 2013 Original Slides by Thomas Hanneforth (Universitat Potsdam) Outline 1 Definition of Finite State Transducer 2 Examples of FSTs 3 Definition of Regular

More information

Hierarchical Overlap Graph

Hierarchical Overlap Graph Hierarchical Overlap Graph B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arxiv:1802.04632 2018 B. Cazaux & E. Rivals 1 / 29 Overlap Graph for a set of words Consider the set P := {abaa,

More information

Finite Automata. Finite Automata

Finite Automata. Finite Automata Finite Automata Finite Automata Formal Specification of Languages Generators Grammars Context-free Regular Regular Expressions Recognizers Parsers, Push-down Automata Context Free Grammar Finite State

More information

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compiler Design Spring 2011 Lexical Analysis Sample Exercises and Solutions Prof. Pedro C. Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292 pedro@isi.edu

More information

BASIC MATHEMATICAL TECHNIQUES

BASIC MATHEMATICAL TECHNIQUES CHAPTER 1 ASIC MATHEMATICAL TECHNIQUES 1.1 Introduction To understand automata theory, one must have a strong foundation about discrete mathematics. Discrete mathematics is a branch of mathematics dealing

More information

Regular Expressions and Finite-State Automata. L545 Spring 2008

Regular Expressions and Finite-State Automata. L545 Spring 2008 Regular Expressions and Finite-State Automata L545 Spring 2008 Overview Finite-state technology is: Fast and efficient Useful for a variety of language tasks Three main topics we ll discuss: Regular Expressions

More information

Learning from Uncertain Data

Learning from Uncertain Data Learning from Uncertain Data Mehryar Mohri AT&T Labs Research 180 Park Avenue, Florham Park, NJ 07932, USA mohri@research.att.com Abstract. The application of statistical methods to natural language processing

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Lecture 4 Ana Bove March 23rd 2018 Recap: Formal Proofs How formal should a proof be? Depends on its purpose...... but should be convincing......

More information

Probabilistic Aspects of Computer Science: Probabilistic Automata

Probabilistic Aspects of Computer Science: Probabilistic Automata Probabilistic Aspects of Computer Science: Probabilistic Automata Serge Haddad LSV, ENS Paris-Saclay & CNRS & Inria M Jacques Herbrand Presentation 2 Properties of Stochastic Languages 3 Decidability Results

More information

Max-plus automata and Tropical identities

Max-plus automata and Tropical identities Max-plus automata and Tropical identities Laure Daviaud University of Warwick Oxford, 01-02-2018 A natural and fundamental question: u v A A [[A](u) = [[A](v)? 2/18 A natural and fundamental question:

More information

CMPS 6610 Fall 2018 Shortest Paths Carola Wenk

CMPS 6610 Fall 2018 Shortest Paths Carola Wenk CMPS 6610 Fall 018 Shortest Paths Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk Paths in graphs Consider a digraph G = (V, E) with an edge-weight function w

More information

Language Technology. Unit 1: Sequence Models. CUNY Graduate Center. Lecture 4a: Probabilities and Estimations

Language Technology. Unit 1: Sequence Models. CUNY Graduate Center. Lecture 4a: Probabilities and Estimations Language Technology CUNY Graduate Center Unit 1: Sequence Models Lecture 4a: Probabilities and Estimations Lecture 4b: Weighted Finite-State Machines required hard optional Liang Huang Probabilities experiment

More information

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4 Chomsky Normal Form and TURING MACHINES TUESDAY Feb 4 CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form: A BC A a S ε B and C aren t start variables a is

More information

Models of Computation. by Costas Busch, LSU

Models of Computation. by Costas Busch, LSU Models of Computation by Costas Busch, LSU 1 Computation CPU memory 2 temporary memory input memory CPU output memory Program memory 3 Example: f ( x) x 3 temporary memory input memory Program memory compute

More information

Axioms of Kleene Algebra

Axioms of Kleene Algebra Introduction to Kleene Algebra Lecture 2 CS786 Spring 2004 January 28, 2004 Axioms of Kleene Algebra In this lecture we give the formal definition of a Kleene algebra and derive some basic consequences.

More information

A Weak Bisimulation for Weighted Automata

A Weak Bisimulation for Weighted Automata Weak Bisimulation for Weighted utomata Peter Kemper College of William and Mary Weighted utomata and Semirings here focus on commutative & idempotent semirings Weak Bisimulation Composition operators Congruence

More information

Speech Recognition Lecture 5: N-gram Language Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri

Speech Recognition Lecture 5: N-gram Language Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri Speech Recognition Lecture 5: N-gram Language Models Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Components Acoustic and pronunciation model: Pr(o w) =

More information

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism Closure Properties of Regular Languages Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism Closure Properties Recall a closure property is a statement

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017 Lecture 4 Ana Bove March 24th 2017 Structural induction; Concepts of automata theory. Overview of today s lecture: Recap: Formal Proofs

More information

Finite Automata - Deterministic Finite Automata. Deterministic Finite Automaton (DFA) (or Finite State Machine)

Finite Automata - Deterministic Finite Automata. Deterministic Finite Automaton (DFA) (or Finite State Machine) Finite Automata - Deterministic Finite Automata Deterministic Finite Automaton (DFA) (or Finite State Machine) M = (K, Σ, δ, s, A), where K is a finite set of states Σ is an input alphabet s K is a distinguished

More information

REVIEW QUESTIONS. Chapter 1: Foundations: Sets, Logic, and Algorithms

REVIEW QUESTIONS. Chapter 1: Foundations: Sets, Logic, and Algorithms REVIEW QUESTIONS Chapter 1: Foundations: Sets, Logic, and Algorithms 1. Why can t a Venn diagram be used to prove a statement about sets? 2. Suppose S is a set with n elements. Explain why the power set

More information

Large-Scale Training of SVMs with Automata Kernels

Large-Scale Training of SVMs with Automata Kernels Large-Scale Training of SVMs with Automata Kernels Cyril Allauzen, Corinna Cortes, and Mehryar Mohri, Google Research, 76 Ninth Avenue, New York, NY Courant Institute of Mathematical Sciences, 5 Mercer

More information

컴파일러입문 제 3 장 정규언어

컴파일러입문 제 3 장 정규언어 컴파일러입문 제 3 장 정규언어 목차 3.1 정규문법과정규언어 3.2 정규표현 3.3 유한오토마타 3.4 정규언어의속성 Regular Language Page 2 정규문법과정규언어 A study of the theory of regular languages is often justified by the fact that they model the lexical

More information

Automata Theory for Presburger Arithmetic Logic

Automata Theory for Presburger Arithmetic Logic Automata Theory for Presburger Arithmetic Logic References from Introduction to Automata Theory, Languages & Computation and Constraints in Computational Logic Theory & Application Presented by Masood

More information

CS 133 : Automata Theory and Computability

CS 133 : Automata Theory and Computability CS 133 : Automata Theory and Computability Lecture Slides 1 Regular Languages and Finite Automata Nestine Hope S. Hernandez Algorithms and Complexity Laboratory Department of Computer Science University

More information

Some useful tasks involving language. Finite-State Machines and Regular Languages. More useful tasks involving language. Regular expressions

Some useful tasks involving language. Finite-State Machines and Regular Languages. More useful tasks involving language. Regular expressions Some useful tasks involving language Finite-State Machines and Regular Languages Find all phone numbers in a text, e.g., occurrences such as When you call (614) 292-8833, you reach the fax machine. Find

More information

Single Source Shortest Paths

Single Source Shortest Paths CMPS 00 Fall 017 Single Source Shortest Paths Carola Wenk Slides courtesy of Charles Leiserson with changes and additions by Carola Wenk Paths in graphs Consider a digraph G = (V, E) with an edge-weight

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science Zdeněk Sawa Department of Computer Science, FEI, Technical University of Ostrava 17. listopadu 15, Ostrava-Poruba 708 33 Czech republic September 22, 2017 Z. Sawa (TU Ostrava)

More information

L p Distance and Equivalence of Probabilistic Automata

L p Distance and Equivalence of Probabilistic Automata International Journal of Foundations of Computer Science c World Scientific Publishing Company L p Distance and Equivalence of Probabilistic Automata Corinna Cortes Google Research, 76 Ninth Avenue, New

More information

Automata Theory and Formal Grammars: Lecture 1

Automata Theory and Formal Grammars: Lecture 1 Automata Theory and Formal Grammars: Lecture 1 Sets, Languages, Logic Automata Theory and Formal Grammars: Lecture 1 p.1/72 Sets, Languages, Logic Today Course Overview Administrivia Sets Theory (Review?)

More information

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs

Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harry Lewis October 8, 2013 Reading: Sipser, pp. 119-128. Pushdown Automata (review) Pushdown Automata = Finite automaton

More information

Deterministic Finite Automaton (DFA)

Deterministic Finite Automaton (DFA) 1 Lecture Overview Deterministic Finite Automata (DFA) o accepting a string o defining a language Nondeterministic Finite Automata (NFA) o converting to DFA (subset construction) o constructed from a regular

More information

Context-free Languages and Pushdown Automata

Context-free Languages and Pushdown Automata Context-free Languages and Pushdown Automata Finite Automata vs CFLs E.g., {a n b n } CFLs Regular From earlier results: Languages every regular language is a CFL but there are CFLs that are not regular

More information

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules). Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules). 1a) G = ({R, S, T}, {0,1}, P, S) where P is: S R0R R R0R1R R1R0R T T 0T ε (S generates the first 0. R generates

More information

Mildly Context-Sensitive Grammar Formalisms: Thread Automata

Mildly Context-Sensitive Grammar Formalisms: Thread Automata Idea of Thread Automata (1) Mildly Context-Sensitive Grammar Formalisms: Thread Automata Laura Kallmeyer Sommersemester 2011 Thread automata (TA) have been proposed in [Villemonte de La Clergerie, 2002].

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

UNIT-I. Strings, Alphabets, Language and Operations

UNIT-I. Strings, Alphabets, Language and Operations UNIT-I Strings, Alphabets, Language and Operations Strings of characters are fundamental building blocks in computer science. Alphabet is defined as a non empty finite set or nonempty set of symbols. The

More information

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata. CMSC 330: Organization of Programming Languages Last Lecture Languages Sets of strings Operations on languages Finite Automata Regular expressions Constants Operators Precedence CMSC 330 2 Clarifications

More information

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET Regular Languages and FA A language is a set of strings over a finite alphabet Σ. All languages are finite or countably infinite. The set of all languages

More information

Mathematical Preliminaries. Sipser pages 1-28

Mathematical Preliminaries. Sipser pages 1-28 Mathematical Preliminaries Sipser pages 1-28 Mathematical Preliminaries This course is about the fundamental capabilities and limitations of computers. It has 3 parts 1. Automata Models of computation

More information

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r Syllabus R9 Regulation UNIT-II NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: In the automata theory, a nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite

More information

Page 1. Compiler Lecture Note, (Regular Language) 컴파일러입문 제 3 장 정규언어. Database Lab. KHU

Page 1. Compiler Lecture Note, (Regular Language) 컴파일러입문 제 3 장 정규언어. Database Lab. KHU Page 1 컴파일러입문 제 3 장 정규언어 Page 2 목차 I. 정규문법과정규언어 II. 정규표현 III. 유한오토마타 VI. 정규언어의속성 Page 3 정규문법과정규언어 A study of the theory of regular languages is often justified by the fact that they model the lexical analysis

More information

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation MA/CSSE 474 Theory of Computation CFL Hierarchy CFL Decision Problems Your Questions? Previous class days' material Reading Assignments HW 12 or 13 problems Anything else I have included some slides online

More information

Theory of Computation

Theory of Computation Theory of Computation (Feodor F. Dragan) Department of Computer Science Kent State University Spring, 2018 Theory of Computation, Feodor F. Dragan, Kent State University 1 Before we go into details, what

More information

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism, CS 54, Lecture 2: Finite Automata, Closure Properties Nondeterminism, Why so Many Models? Streaming Algorithms 0 42 Deterministic Finite Automata Anatomy of Deterministic Finite Automata transition: for

More information

Theory of Computation

Theory of Computation Fall 2002 (YEN) Theory of Computation Midterm Exam. Name:... I.D.#:... 1. (30 pts) True or false (mark O for true ; X for false ). (Score=Max{0, Right- 1 2 Wrong}.) (1) X... If L 1 is regular and L 2 L

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Spring 2017 Unit 1: Sequence Models Lecture 4a: Probabilities and Estimations Lecture 4b: Weighted Finite-State Machines required optional Liang Huang Probabilities experiment

More information

Lecture 17: Language Recognition

Lecture 17: Language Recognition Lecture 17: Language Recognition Finite State Automata Deterministic and Non-Deterministic Finite Automata Regular Expressions Push-Down Automata Turing Machines Modeling Computation When attempting to

More information

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz Compiler Design Spring 2010 Lexical Analysis Sample Exercises and Solutions Prof. Pedro Diniz USC / Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, California 90292 pedro@isi.edu

More information

Warshall s algorithm

Warshall s algorithm Regular Expressions [1] Warshall s algorithm See Floyd-Warshall algorithm on Wikipedia The Floyd-Warshall algorithm is a graph analysis algorithm for finding shortest paths in a weigthed, directed graph

More information

Language-Processing Problems. Roland Backhouse DIMACS, 8th July, 2003

Language-Processing Problems. Roland Backhouse DIMACS, 8th July, 2003 1 Language-Processing Problems Roland Backhouse DIMACS, 8th July, 2003 Introduction 2 Factors and the factor matrix were introduced by Conway (1971). He used them very effectively in, for example, constructing

More information

CS 410/584, Algorithm Design & Analysis, Lecture Notes 4

CS 410/584, Algorithm Design & Analysis, Lecture Notes 4 CS 0/58,, Biconnectivity Let G = (N,E) be a connected A node a N is an articulation point if there are v and w different from a such that every path from 0 9 8 3 5 7 6 David Maier Biconnected Component

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY 15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY Chomsky Normal Form and TURING MACHINES TUESDAY Feb 4 CHOMSKY NORMAL FORM A context-free grammar is in Chomsky normal form if every rule is of the form:

More information

5 Context-Free Languages

5 Context-Free Languages CA320: COMPUTABILITY AND COMPLEXITY 1 5 Context-Free Languages 5.1 Context-Free Grammars Context-Free Grammars Context-free languages are specified with a context-free grammar (CFG). Formally, a CFG G

More information

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3}, Code No: 07A50501 R07 Set No. 2 III B.Tech I Semester Examinations,MAY 2011 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 80 Answer any FIVE Questions All

More information

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa CS:4330 Theory of Computation Spring 2018 Regular Languages Finite Automata and Regular Expressions Haniel Barbosa Readings for this lecture Chapter 1 of [Sipser 1996], 3rd edition. Sections 1.1 and 1.3.

More information

CS 21 Decidability and Tractability Winter Solution Set 3

CS 21 Decidability and Tractability Winter Solution Set 3 CS 21 Decidability and Tractability Winter 2018 Posted: January 31 Solution Set 3 If you have not yet turned in the Problem Set, you should not consult these solutions. 1. (a) A 2-NPDA is a 7-tuple (Q,,

More information

Dynamic Programming: Shortest Paths and DFA to Reg Exps

Dynamic Programming: Shortest Paths and DFA to Reg Exps CS 374: Algorithms & Models of Computation, Spring 207 Dynamic Programming: Shortest Paths and DFA to Reg Exps Lecture 8 March 28, 207 Chandra Chekuri (UIUC) CS374 Spring 207 / 56 Part I Shortest Paths

More information

Uses of finite automata

Uses of finite automata Chapter 2 :Finite Automata 2.1 Finite Automata Automata are computational devices to solve language recognition problems. Language recognition problem is to determine whether a word belongs to a language.

More information

Finite Automata and Formal Languages

Finite Automata and Formal Languages Finite Automata and Formal Languages TMV26/DIT32 LP4 2 Lecture 6 April 5th 2 Regular expressions (RE) are an algebraic way to denote languages. Given a RE R, it defines the language L(R). Actually, they

More information

Building Finite-State Machines Intro to NLP - J. Eisner 1

Building Finite-State Machines Intro to NLP - J. Eisner 1 Building Finite-State Machines 600.465 - Intro to NLP - J. Eisner 1 todo Why is it called closure? What is a closed set? Examples: closure of {1} or {2} or {2,5} under addition E* = eps E EE EEE Show construction

More information

CS 322 D: Formal languages and automata theory

CS 322 D: Formal languages and automata theory CS 322 D: Formal languages and automata theory Tutorial NFA DFA Regular Expression T. Najla Arfawi 2 nd Term - 26 Finite Automata Finite Automata. Q - States 2. S - Alphabets 3. d - Transitions 4. q -

More information

Advanced Automata Theory 9 Automatic Structures in General

Advanced Automata Theory 9 Automatic Structures in General Advanced Automata Theory 9 Automatic Structures in General Frank Stephan Department of Computer Science Department of Mathematics National University of Singapore fstephan@comp.nus.edu.sg Advanced Automata

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministic Finite Automata Not A DFA Does not have exactly one transition from every state on every symbol: Two transitions from q 0 on a No transition from q 1 (on either a or b) Though not a DFA,

More information

CS 455/555: Finite automata

CS 455/555: Finite automata CS 455/555: Finite automata Stefan D. Bruda Winter 2019 AUTOMATA (FINITE OR NOT) Generally any automaton Has a finite-state control Scans the input one symbol at a time Takes an action based on the currently

More information

Monoidal Categories, Bialgebras, and Automata

Monoidal Categories, Bialgebras, and Automata Monoidal Categories, Bialgebras, and Automata James Worthington Mathematics Department Cornell University Binghamton University Geometry/Topology Seminar October 29, 2009 Background: Automata Finite automata

More information

Automata and Languages

Automata and Languages Automata and Languages Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Mathematical Background Mathematical Background Sets Relations Functions Graphs Proof techniques Sets

More information

CP405 Theory of Computation

CP405 Theory of Computation CP405 Theory of Computation BB(3) q 0 q 1 q 2 0 q 1 1R q 2 0R q 2 1L 1 H1R q 1 1R q 0 1L Growing Fast BB(3) = 6 BB(4) = 13 BB(5) = 4098 BB(6) = 3.515 x 10 18267 (known) (known) (possible) (possible) Language:

More information

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages CSci 311, Models of Computation Chapter 4 Properties of Regular Languages H. Conrad Cunningham 29 December 2015 Contents Introduction................................. 1 4.1 Closure Properties of Regular

More information

Non-context-Free Languages. CS215, Lecture 5 c

Non-context-Free Languages. CS215, Lecture 5 c Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..

More information