A Disambiguation Algorithm for Finite Automata and Functional Transducers

Mehryar Mohri
Courant Institute of Mathematical Sciences and Google Research
251 Mercer Street, New York, NY 10012, USA

Abstract. We present a new disambiguation algorithm for finite automata and functional finite-state transducers. We give a full description of the algorithm, including detailed pseudocode and analysis, and several illustrating examples. Our algorithm is often more efficient, and its result dramatically smaller, than the one obtained using determinization for finite automata or an existing disambiguation algorithm for transducers based on a construction of Schützenberger. In a variety of cases, the size of the unambiguous transducer returned by our algorithm is only linear in that of the input transducer, while the transducer given by the construction of Schützenberger is exponentially larger. Our algorithm can be used effectively in many applications to make automata and transducers more efficient to use.

1 Introduction

Finite automata and transducers are used in a variety of applications in text and speech processing [10, 13], bioinformatics [8], image processing [1], optical character recognition [6], and many others. In these applications, automata and transducers are often the result of various complex operations and in general are not efficient to use. Some optimization algorithms such as determinization can make their use more time-efficient. However, the result of determinization is sometimes prohibitively large, and not all finite-state transducers are determinizable [7, 11]. This paper presents and analyzes an alternative optimization algorithm, disambiguation, which in practice can have efficiency benefits similar to those of determinization.

Our disambiguation algorithm is novel and applies to finite automata, including automata with ɛ-transitions, and to functional finite-state transducers, that is those representing a partial function. Disambiguation returns an automaton or transducer equivalent to the input that is unambiguous, that is one that admits no two accepting paths labeled with the same (input) string.
In many instances, the absence of ambiguity can make search more efficient by reducing the number of paths to explore, for example in very large automata or transducers with several hundred thousand or millions of transitions used in text and speech processing or in bioinformatics. There are also many other critical needs for the disambiguation of automata and transducers.

For finite automata, one way to obtain an unambiguous and equivalent automaton is simply to apply the standard determinization algorithm. But, as we shall see, for some input automata our algorithm can take exponentially less time than determinization and return an equivalent unambiguous automaton exponentially smaller than the one obtained by using determinization. For finite-state transducers, disambiguation applies to a broader set of transducers than those that can be determinized using the algorithm described in [11]: it applies to any functional transducer. In contrast, it was shown by [3] that a functional transducer is determinizable if and only if it additionally verifies the twins property [7, 11, 2]. Our disambiguation algorithm is also often dramatically more efficient, and results in substantially smaller transducers, than a disambiguation algorithm based on a construction of Schützenberger [16, 15], also described by E. Roche and Y. Schabes in the introductory chapter of [14]. In particular, when the input transducer is unambiguous, our algorithm simply returns the same transducer, while the result of the algorithm presented in [14] can be exponentially larger.

The remainder of this paper is organized as follows. In Section 2, we introduce the notation and basic concepts needed for the presentation and analysis of our algorithm. In Section 3, we present our disambiguation algorithm for finite automata in detail, including the proof of its correctness and a brief description of its extension to finite automata with ɛ-transitions. In Section 4, we show how the algorithm can be used to disambiguate functional transducers and illustrate it with several examples.

2 Preliminaries

We will denote by ɛ the empty string. A finite automaton A with ɛ-transitions is a system (Σ, Q, I, F, E) where Σ is a finite alphabet, Q a finite set of states, I ⊆ Q the set of initial states, F ⊆ Q the set of final states, and E a finite multiset of transitions, which are elements of Q × (Σ ∪ {ɛ}) × Q. We denote by |A| = |Q| + |E| the size of an automaton A, that is the sum of the number of states and transitions defining A. A path π of an automaton is an element of E* with consecutive transitions.
The label of a path is the string obtained by concatenation of the labels of its constituent transitions. We denote by P(p, x, q) the set of paths from p to q labeled with x or, more generally, by P(R, x, R') the set of paths labeled with x from some set of states R to some set of states R'. We also denote by P(R, R') the set of all paths from R to R'. An accepting path is an element of P(I, F). The language accepted by an automaton A is the set of strings labeling its accepting paths and is denoted by L(A). Two automata A and B are said to be equivalent when L(A) = L(B).

We will say that a state p can be reached by a string x when there exists a path from an initial state to p labeled with x. When two states can be reached by the same string, we say that they are co-reachable. We will also say that two states p and q share a common future when they admit a common string x to reach a final state, that is when there exists a string x such that P(p, x, F) ≠ ∅ and P(q, x, F) ≠ ∅. For any subset s ⊆ Q and x ∈ Σ*, we will denote by δ(s, x) the set of states that can be reached from the states in s by a path labeled with x.

A finite-state transducer is a finite automaton in which each transition is augmented with an output label, which is an element of (Δ ∪ {ɛ}), where Δ is a finite alphabet. For any transducer T, we denote by T⁻¹ its inverse, that is the transducer obtained from T by swapping the input and output label of each transition.

We will use the standard algorithm to compute the intersection A ∩ A' of two automata A and A' [12], whose states are pairs formed by a state of A and a state of A', and whose transitions are of the form ((p, p'), a, (q, q')), where (p, a, q) is a transition in A and (p', a, q') a transition in A'. An automaton A is said to be trim if all of its states lie on some accepting path. It is said to be unambiguous if no string x ∈ Σ* labels two distinct accepting paths, finitely ambiguous if there exists k ∈ ℕ such that no string labels more than k accepting paths, and polynomially ambiguous if there exists a polynomial P with coefficients in ℕ such that no string x labels more than P(|x|) accepting paths. The finite, polynomial, and exponential ambiguity of an automaton with ɛ-transitions can be tested in polynomial time [4].

3 Disambiguation algorithm for finite automata

In this section, we describe in detail our disambiguation algorithm for finite automata. The algorithm is first described for automata without ɛ-transitions. The extension to the case of automata with ɛ-transitions is discussed later. Our algorithm in general does not require a full determinization. In fact, in some cases where determinization creates 2^n states, where n is the number of states of the input automaton, the cost of our new algorithm and the size of its output are only in O(n).

3.1 Description

Figure 1 gives the pseudocode of the algorithm. The first step of the algorithm consists of computing the automaton A ∩ A and of trimming it by removing non-coaccessible states (line 1). The cost of this computation is in O(|A|^2), since the complexity of intersection is quadratic and trimming can be done in linear time.
The automaton B thereby constructed can be used to determine in constant time whether two states q and r of A that can be reached from I via the same string share a common future, simply by checking whether (q, r) is a state of B. Indeed, by definition of intersection, this property holds iff (q, r) is a state of B. As shown by the following proposition, the automaton B is in fact directly related to the ambiguity of A.

Proposition 1 ([4]). Let A be a trim finite automaton with no ɛ-transition. A is unambiguous iff no coaccessible state in A ∩ A is of the form (p, q) with p ≠ q.
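The construction of B and the test of Proposition 1 can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not code from the paper: an ɛ-free automaton is encoded as a set of initial states, a set of final states, and a list of (p, a, q) triples with no repeated transition, and the function names self_product_trim and is_unambiguous are hypothetical. Membership of (q, r) in the returned set is the common-future test described above (constant expected time with a hash set).

```python
def self_product_trim(init, final, trans):
    """Return the state set of B = Trim(A ∩ A) for an epsilon-free automaton A."""
    out = {}
    for p, a, q in trans:
        out.setdefault(p, []).append((a, q))
    # Build A ∩ A from I x I: every pair created is accessible by construction.
    pairs = {(i, j) for i in init for j in init}
    stack, preds = list(pairs), {}
    while stack:
        p, q = stack.pop()
        for a, p2 in out.get(p, []):
            for b, q2 in out.get(q, []):
                if a == b:
                    preds.setdefault((p2, q2), set()).add((p, q))
                    if (p2, q2) not in pairs:
                        pairs.add((p2, q2))
                        stack.append((p2, q2))
    # Trimming: keep the pairs from which a final pair (both states final) is reachable.
    live = {(p, q) for p, q in pairs if p in final and q in final}
    stack = list(live)
    while stack:
        for pair in preds.get(stack.pop(), ()):
            if pair not in live:
                live.add(pair)
                stack.append(pair)
    return live


def is_unambiguous(init, final, trans):
    """Proposition 1: a trim A is unambiguous iff B has no pair (p, q) with p != q."""
    return all(p == q for p, q in self_product_trim(init, final, trans))
```

For instance, with transitions (0, a, 1), (0, a, 2), (1, b, 3), (2, b, 3), the pair (1, 2) survives trimming, so the automaton is reported as ambiguous.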

    Disambiguation(A)
     1  B ← Trim(A ∩ A)
     2  for each i ∈ I do
     3      s ← {i' : i' ∈ I ∧ (i, i') ∈ B}
     4      I' ← Q' ← Q' ∪ {(i, s)}
     5      Enqueue(𝒬, (i, s))
     6  for each (u, u') ∈ I' × I' do
     7      R ← R ∪ {(u, u')}
     8  while 𝒬 ≠ ∅ do
     9      (p, s) ← Head(𝒬)
    10      Dequeue(𝒬)
    11      if (p ∈ F) and (∄ (p', s') ∈ F' with (p', s') R (p, s)) then
    12          F' ← F' ∪ {(p, s)}
    13      for each (p, a, q) ∈ E do
    14          t ← {r ∈ δ(s, a) : (q, r) ∈ B}
    15          if ∄ ((p', s'), a, (q, t)) ∈ E' with (p', s') R (p, s) then
    16              if (q, t) ∉ Q' then
    17                  Q' ← Q' ∪ {(q, t)}
    18                  Enqueue(𝒬, (q, t))
    19              E' ← E' ∪ {((p, s), a, (q, t))}
    20              for each (p', s') such that (p', s') R (p, s) and ((p', s'), a, (q', t')) ∈ E' do
    21                  R ← R ∪ {((q, t), (q', t'))}
    22  return A'

Fig. 1. New disambiguation algorithm for finite automata.

Proof. Since A is trim, the states of A ∩ A are all accessible by construction. Thus, a state (p, q) in A ∩ A is coaccessible iff it lies on an accepting path, that is, by definition of intersection, iff there are two paths π = π₁π₂ ∈ P(I, F) and π' = π'₁π'₂ ∈ P(I, F) with π₁ ∈ P(I, p) and π'₁ ∈ P(I, q), with π₁ and π'₁ sharing the same label and π₂ and π'₂ also sharing the same label. Thus, A is unambiguous iff every coaccessible state (p, q) of A ∩ A satisfies p = q.

The algorithm constructs an unambiguous automaton A' = (Q', E', I', F'). The states in Q' are of the form (p, s), where p is a state of A and s a subset of the states of A. Lines 2-5 define the initial states, which are of the form (i, s) with i ∈ I and s the subset of the states in I sharing a common future with i. The algorithm maintains a relation R such that two states of A' are in relation via R iff they can be reached by the same string from the initial states. In particular, since all initial states are reachable by ɛ, any two initial states are in relation via R (lines 6-7).

The algorithm also maintains a queue 𝒬 containing the set of states (p, s) of Q' left to examine and whose outgoing transitions are yet to be determined. The queue discipline, that is the order in which states are added to or extracted from 𝒬, is arbitrary and does not affect the correctness of the algorithm. However, different orderings can result in different but equivalent resulting automata. At each execution of the loop of lines 8-21, a new state (p, s) is extracted from 𝒬 (lines 9-10).
To avoid an ambiguity due to finality, a state (p, s) is made final only if there is no final state (p', s') ∈ F' in relation with (p, s) (lines 11-12).
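The pseudocode of Figure 1 can be transcribed into the following Python sketch for ɛ-free automata. It is an illustrative reading of the figure rather than the author's implementation, and it makes three assumptions: a hypothetical encoding of A as (set of initial states, set of final states, list of (p, a, q) triples); a FIFO queue as the queue discipline, which the text leaves arbitrary; and the relation R stored as a reflexive and symmetric adjacency map, consistent with its characterization as co-reachability in Lemma 1. Comments give the corresponding lines of Figure 1.

```python
from collections import deque


def disambiguate(init, final, trans):
    """Sketch of algorithm Disambiguation (Figure 1) for an epsilon-free automaton.

    Assumed encoding: init and final are sets of states of A, trans is a list of
    (p, a, q) triples. Returns (init2, final2, trans2), whose states are pairs
    (p, frozenset(s)).
    """
    out = {}
    for p, a, q in trans:
        out.setdefault(p, []).append((a, q))

    # Line 1: B = Trim(A ∩ A). Pairs are built from I x I, so all are accessible;
    # trimming keeps the co-accessible ones, i.e. the pairs sharing a common future.
    pairs = {(i, j) for i in init for j in init}
    stack, preds = list(pairs), {}
    while stack:
        p, q = stack.pop()
        for a, p2 in out.get(p, []):
            for b, q2 in out.get(q, []):
                if a == b:
                    preds.setdefault((p2, q2), set()).add((p, q))
                    if (p2, q2) not in pairs:
                        pairs.add((p2, q2))
                        stack.append((p2, q2))
    B = {(p, q) for p, q in pairs if p in final and q in final}
    stack = list(B)
    while stack:
        for pair in preds.get(stack.pop(), ()):
            if pair not in B:
                B.add(pair)
                stack.append(pair)

    def delta(s, a):
        # States reachable from a state of s by one transition labeled a.
        return {q for x in s for b, q in out.get(x, []) if b == a}

    # Lines 2-7: initial states, all pairwise related by R.
    queue, states, init2 = deque(), set(), set()
    for i in init:
        u = (i, frozenset(j for j in init if (i, j) in B))
        init2.add(u)
        states.add(u)
        queue.append(u)
    R = {u: set(init2) for u in init2}
    final2, trans2, arcs = set(), [], {}  # arcs[u]: set of (a, v) leaving u

    while queue:                                          # lines 8-10
        u = queue.popleft()
        p, s = u
        if p in final and not (R[u] & final2):            # lines 11-12
            final2.add(u)
        for a, q in out.get(p, []):                       # line 13
            v = (q, frozenset(r for r in delta(s, a) if (q, r) in B))  # line 14
            if any((a, v) in arcs.get(w, ()) for w in R[u]):  # line 15
                continue
            if v not in states:                           # lines 16-18
                states.add(v)
                queue.append(v)
                R[v] = set()
            trans2.append((u, a, v))                      # line 19
            arcs.setdefault(u, set()).add((a, v))
            # Lines 20-21: relate v to every a-successor of a state related to u
            # (copies avoid mutating a set while iterating over it).
            for w in list(R[u]):
                for b, v2 in list(arcs.get(w, ())):
                    if b == a:
                        R[v].add(v2)
                        R[v2].add(v)
    return init2, final2, trans2
```

On the ambiguous automaton with transitions (0, a, 1), (0, a, 2), (1, b, 3), (2, b, 3), this sketch creates the states (0, {0}), (1, {1, 2}), (2, {1, 2}) and (3, {3}); the b-transition leaving (2, {1, 2}) is rejected by the test of line 15, so that state is non-coaccessible and can be trimmed afterwards.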

[Fig. 2. Illustration of the disambiguation algorithm. (a) Automaton A. (b) Result of the disambiguation algorithm applied to A. One of the two dashed transitions is disallowed by the algorithm. (c) Result of determinization applied to A.]

Each outgoing transition (p, a, q) of p is then examined. Line 14 defines t to be the subset of the states of A that can be reached from a state of s by reading a, but it excludes the states q' that do not share a common future with q. This is because the subsets are used to detect ambiguities. If q and q' do not share a common future, even though there are paths with the same label x reaching them, these paths cannot be completed to reach a final state with the same label. Thus, if X is the set of strings leading to the state (p, s) of Q', the subset s contains exactly the set of states r of A that can be reached via X from I and that share a common future with p.

To avoid creating two paths from I' to (q, t) with the same label, the transition from (p, s) to (q, t) with label a is not created if there already exists one from (p', s') to (q, t) for a state (p', s') that can be reached by a string also reaching (p, s) (condition of line 15). Note that if (p, s) is extracted from 𝒬 before a state (p', s') with (p', s') R (p, s), then the transition from (p, s) to (q, t) is created first and the one from (p', s') to (q, t) is not created. This is how the queue discipline directs the choice of the transitions created. Lines 16-18 add (q, t) to Q' and to the queue 𝒬 when it is not already in Q', and line 19 adds the newly defined transition to E'. After creation of this transition, the destination state (q, t) is put in relation with all states (q', t') reached by a transition labeled with a ∈ Σ from a state (p', s') that is in relation with (p, s) (lines 20-21).

Figure 2 illustrates the application of the algorithm in a simple case. Observe that states 1 and 2 are not included in the subset of (0, {0}) in the automaton of Figure 2(b), since 0 does not share a common future with 1 or 2. Figure 2 also shows the result of the application of determinization to the same example.
As can be seen from this example, in some instances determinization creates more transitions than disambiguation. Some states created by the disambiguation algorithm may be non-coaccessible, that is, they may admit no path to a final state, because their outgoing transitions were not constructed in order to avoid generating ambiguity. These states and the transitions leading to them can be removed in linear time using a standard trimming algorithm. In the case of the automaton of Figure 2(b), the state whose dashed transition is not constructed can be trimmed.

[Fig. 3. Examples of automata A for which determinization returns an exponentially larger automaton while our algorithm returns A (for (a)) or an automaton whose size is linear in |A| (for (b)). (a) Automaton representing the regular expression (a + b)*a(a + b)^n, whose minimal deterministic equivalent has size Ω(2^n). (b) An ambiguous automaton similar to (a), whose determinization results in an automaton with Ω(2^n) states.]

More generally, note that when the input automaton is unambiguous, the subsets created by our algorithm are reduced to singletons: by Proposition 1, a subset cannot contain two distinct states in that case. In such cases, our algorithm simply returns the same automaton A, and the work done after the computation of B is linear in |A|. In contrast, the determinization of A may lead to a blow-up, even when the automaton is unambiguous. In particular, for the standard case of the non-deterministic automaton of Figure 3(a) representing the regular expression (a + b)*a(a + b)^n, it is known that determinization creates exponentially many states, Ω(2^n). However, this automaton is unambiguous, and our algorithm returns the same automaton unchanged. The automaton of Figure 3(b) is similar but is ambiguous. Nevertheless, it is not hard to see that again the size of the automaton returned by determinization is exponential while that of the automaton output by our algorithm is only linear.

3.2 Analysis

The termination of the algorithm is guaranteed by the fact that the number of states and transitions created must be finite. This is because the number of possible subsets s of states of A is finite, and thereby so is the number of pairs (p, s) created by the algorithm, where p is a state of A and s a subset. Also, the number of transitions created at a state (p, s) is at most equal to the number of transitions leaving p in A. In the worst case, the algorithm may create exponentially many subsets, and thus the computational complexity of the algorithm is exponential. In many practical cases, however, this worst-case behavior is not observed.
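The exponential gap discussed for Figure 3(a) is easy to observe numerically. The Python sketch below counts the subsets reached by the classical subset construction on an NFA for (a + b)*a(a + b)^n. The state layout (states 0 to n + 1, with self-loops on state 0) is our assumption and may not match the drawing exactly; the helper names are hypothetical. For this layout the count is 2^(n+1), while the input has only |Q| + |E| = 3n + 5 and, being unambiguous, is returned unchanged by the disambiguation algorithm.

```python
def fig3a_transitions(n):
    """Hypothetical layout of an NFA for (a + b)*a(a + b)^n, states 0..n+1."""
    trans = [(0, "a", 0), (0, "b", 0), (0, "a", 1)]
    for i in range(1, n + 1):
        trans += [(i, "a", i + 1), (i, "b", i + 1)]
    return trans


def subset_construction_size(init, trans, alphabet="ab"):
    """Number of subsets reachable by the classical subset construction."""
    start = frozenset(init)
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = frozenset(q for p, b, q in trans if b == a and p in s)
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen)


for n in range(1, 9):
    trans = fig3a_transitions(n)
    # Prints n, the input size |Q| + |E|, and the number of determinized states.
    print(n, (n + 2) + len(trans), subset_construction_size({0}, trans))
```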
In particular, the automaton returned by our disambiguation algorithm is then substantially smaller than the one obtained by application of determinization.

We will now show that the automaton returned by the algorithm is unambiguous using the following lemma.

Lemma 1. Let (q, t) and (q', t') be two states constructed by algorithm Disambiguation run on input automaton A. Then, (q, t) R (q', t') iff (q, t) and (q', t') are co-reachable.

Proof. We will show by induction on the length of strings x that if two states (q, t) and (q', t') are both reachable by x, then (q, t) R (q', t'). The steps of lines 6-7 ensure that (q, t) R (q', t') when both states are initial, that is, when they are reachable by ɛ. Assume that the property holds for all strings x of length less than or equal to n. Let x = x'a be a string of length n + 1 with x' ∈ Σ^n and a ∈ Σ, and assume that (q, t) and (q', t') are both reachable by x. Then, there exists a state (p, s) reachable by x' and admitting a transition labeled with a leading to (q, t), and similarly a state (p', s') reachable by x' and admitting a transition labeled with a leading to (q', t'). By the induction hypothesis, we have (p, s) R (p', s'), thus (q, t) R (q', t') is guaranteed by the execution of the steps of lines 20-21. This proves one direction of the equivalence. The converse holds straightforwardly by construction (lines 6-7 and 20-21).

Proposition 2. The automaton A' returned by algorithm Disambiguation run on input automaton A is unambiguous.

Proof. Let π₁ and π₂ be two paths in A' from I' to F' with the same label x ∈ Σ*. If x = ɛ, π₁ is a path from some initial state (i₁, s₁) to (i₁, s₁), and similarly π₂ is a path from some initial state (i₂, s₂) to (i₂, s₂). All initial states are in relation (lines 6-7), therefore at most one of them can be made final (lines 11-12). This implies that (i₁, s₁) = (i₂, s₂) and π₁ = π₂. In general, let (q₁, t₁) be the destination state of π₁ and (q₂, t₂) the destination state of π₂. Since (q₁, t₁) and (q₂, t₂) are both reachable by x, by Lemma 1, we have (q₁, t₁) R (q₂, t₂). Since no two distinct related states can be made final (lines 11-12), we must have (q₁, t₁) = (q₂, t₂). If x ≠ ɛ, x can be written as x = x'a with x' ∈ Σ* and a ∈ Σ, and π₁ and π₂ can be decomposed as π₁ = π'₁e₁ and π₂ = π'₂e₂, with e₁ and e₂ transitions labeled with a leading to (q₁, t₁). Let (p₁, s₁) be the destination state of π'₁ and (p₂, s₂) the destination state of π'₂. Since π'₁ and π'₂ are both labeled with x', by Lemma 1, we have (p₁, s₁) R (p₂, s₂).
By the condition of line 15, if (p₁, s₁) ≠ (p₂, s₂), then (p₁, s₁) and (p₂, s₂) cannot both admit a transition labeled with a leading to the same state (q₁, t₁). Thus, we must have (p₁, s₁) = (p₂, s₂). Proceeding in the same way with π'₁ and π'₂, and so on, shows that the paths π₁ and π₂ coincide, which concludes the proof.

The following lemmas will be used to show the equivalence between the automaton returned by the algorithm and the input automaton.

Lemma 2. Let (p, s) be a state constructed by algorithm Disambiguation run on input automaton A. If (p, s) is reachable by the strings u and v in A', then the set of states reachable by u in A and sharing a common future with p coincides with the set of states reachable by v in A and sharing a common future with p.

Proof. We show by induction on the length of u that if a state (p, s) is reachable by u in A', then s is the set of states reachable by u and sharing a common future with p. This property holds straightforwardly for u = ɛ by the construction of lines 2-5. Assume now that it holds for all u of length less than or equal to n.

Let u = u'a with u' ∈ Σ* of length n and a ∈ Σ. If (p, s) is reachable by u, there must exist some state (p', s') reachable by u' and admitting a transition labeled with a leading to (p, s). By the induction hypothesis, s' is the set of states reachable by u' and sharing a common future with p'. By definition of s (line 14), s = {q ∈ δ(s', a): (q, p) ∈ B}, thus the states in s are all reachable by u and share a common future with p. Conversely, let q be a state reachable by u and sharing a common future with p. There is a transition labeled with a to q from some state q' reachable by u'. Since q' admits a transition labeled with a to q, p' admits a transition labeled with a to p, and p and q share a common future, p' and q' must also share a common future. By the induction hypothesis, s' is the set of states reachable by u' and sharing a common future with p', therefore q' is in s'. Since q ∈ δ(q', a) and q shares a common future with p, this implies that q is in s. This shows that the states in s are exactly those reachable by u and sharing a common future with p.

Lemma 3. Let A' be the automaton returned by algorithm Disambiguation run on input automaton A. Let q be a state reachable in A by a string x. Then, there exists a state (q, t) in A', for some subset t, such that (q, t) is reachable by x in A'.

Proof. We will prove the property by induction on the length of x. The property straightforwardly holds for x = ɛ by the construction steps of lines 2-5. Assume now that it holds for all strings of length less than or equal to n, and let x = ua with u a string of length n and a ∈ Σ. If q is reachable by the string x in A, then there exists a state p₀ in A reachable by u and admitting a transition labeled with a leading to q. By the induction hypothesis, there exists a state (p₀, s₀) in A' reachable by u. Now, the property clearly holds for (q, t₀) if the transition labeled with a leaving (p₀, s₀) is constructed at lines 15-19, with t₀ defined at line 14. Otherwise, by the test of line 15, there must exist in A' a distinct state (p₁, s'₀) admitting a transition labeled with a leading to (q, t₀), with (p₁, s'₀) R (p₀, s₀). Note that we cannot have p₁ = p₀, since, by Lemma 2, the same string cannot reach two distinct states (p₀, s₀) and (p₀, s'₀).
Now, since (p₁, s'₀) admits a transition labeled with a leading to (q, t₀), p₁ must admit a transition labeled with a leading to q. Thus, p₁ and p₀ share a common future in A. Since (p₁, s'₀) R (p₀, s₀), by Lemma 1, they are reachable by a common string v. Thus, both u and v reach (p₀, s₀). By Lemma 2, this implies that the sets of states in A reachable by u and by v and sharing a common future with p₀ are the same. Since p₁ and p₀ share a common future in A and v reaches both p₀ and p₁, u must also reach p₁ in A. If u reaches (p₁, s'₀), then (q, t₀) can be reached by x, since (p₁, s'₀) admits a transition labeled with a leading to (q, t₀). Otherwise, by the induction hypothesis, there must exist a distinct state (p₁, s₁) in A' reachable by u, with p₁ admitting a transition labeled with a to q. Reapplying the argument already presented for (p₀, s₀) to (p₁, s₁), either we find a path in A' labeled with x to a state (q, t₁), or there exists a state (p₂, s₂) in A' with the same property as (p₀, s₀), with p₂ distinct from p₁ and p₀. Since the number of distinct such states is finite, reiterating this process guarantees finding a path in A' labeled with x to a state (q, tₖ) after some finite number k of iterations. Thus, the property holds in all cases.

Lemma 4. Let A' be the automaton returned by algorithm Disambiguation run on input automaton A. Then, L(A) ⊆ L(A').

Proof. The proof argument is similar to that of Lemma 3. Let x be a string reaching a final state q₀ ∈ F in A. By Lemma 3, there exists a state (q₀, t₀) in A' reachable by x. If the state (q₀, t₀) is made final (lines 11-12), this shows that x is accepted by A'. Otherwise, there must exist a final state (q₁, t'₀) with (q₁, t'₀) R (q₀, t₀). Note that this implies that q₁ is final. Note also that we have q₁ ≠ q₀, since two states (q₀, t₀) and (q₀, t'₀) cannot be co-reachable with t₀ ≠ t'₀. Since (q₁, t'₀) R (q₀, t₀), there exists a string x₁ reaching both states. Since (q₀, t₀) is reachable by both x and x₁, by Lemma 2, the set of states in A reachable by x and sharing a common future with q₀ and the set of those reachable by x₁ and sharing a common future with q₀ are the same. The state q₁ shares a common future with q₀, since both states are final, and q₁ is reachable by x₁; therefore, q₁ is reachable by x. Now, if x reaches (q₁, t'₀), this shows that x is accepted by A'. Otherwise, by Lemma 3, there exists a state (q₁, t₁) in A' reachable by x. We can reapply to (q₁, t₁) the same argument as for (q₀, t₀), since q₁ is a final state. Doing so, we either find a final state in A' reachable by x or a state (q₂, t₂) in A' with the same properties as (q₀, t₀), with q₀, q₁, and q₂ all distinct. Since the number of states of A is finite, reiterating this process guarantees finding a final state reachable by x. This concludes the proof.

Proposition 3. The automaton A' returned by algorithm Disambiguation run on input automaton A is equivalent to A.

Proof. By construction, a path ((p₁, s₁), a₁, (p₂, s₂)) ⋯ ((pₖ, sₖ), aₖ, (pₖ₊₁, sₖ₊₁)) is created in A' only if the path (p₁, a₁, p₂) ⋯ (pₖ, aₖ, pₖ₊₁) exists in A, and a state (p, s) is made final in A' only if p is final in A. Thus, if a string x = a₁ ⋯ aₖ is accepted by A', it is also accepted by A, which shows that L(A') ⊆ L(A). The reverse inclusion holds by Lemma 4.
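Propositions 2 and 3 can be sanity-checked on small inputs by brute force: count the accepting paths of every string up to a bounded length before and after disambiguation, and compare. The Python sketch below is only a testing aid, since bounded lengths prove nothing in general. It assumes a hypothetical encoding (set of initial states, set of final states, list of (p, a, q) triples). The output automaton in the example is the one obtained by running Figure 1 by hand on the ambiguous automaton with transitions (0, a, 1), (0, a, 2), (1, b, 3), (2, b, 3), with its states (0, {0}), (1, {1, 2}), (2, {1, 2}), (3, {3}) renamed 0 to 3.

```python
from itertools import product


def path_counts(init, final, trans, alphabet, max_len):
    """Number of accepting paths for every string of length <= max_len."""
    counts = {}
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            # paths[q]: number of paths from an initial state to q labeled by the prefix read.
            paths = {i: 1 for i in init}
            for a in letters:
                nxt = {}
                for p, b, q in trans:
                    if b == a and p in paths:
                        nxt[q] = nxt.get(q, 0) + paths[p]
                paths = nxt
            counts["".join(letters)] = sum(c for q, c in paths.items() if q in final)
    return counts


def check_theorem_1(a_in, a_out, alphabet, max_len=6):
    """Return (unambiguous, equivalent) for a_out, tested on strings up to max_len."""
    before = path_counts(*a_in, alphabet, max_len)
    after = path_counts(*a_out, alphabet, max_len)
    unambiguous = all(c <= 1 for c in after.values())
    equivalent = all((before[x] > 0) == (after[x] > 0) for x in before)
    return unambiguous, equivalent


A = ({0}, {3}, [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)])
A_out = ({0}, {3}, [(0, "a", 1), (0, "a", 2), (1, "b", 3)])  # states renamed
print(path_counts(*A, "ab", 2)["ab"], path_counts(*A_out, "ab", 2)["ab"])  # 2 1
print(check_theorem_1(A, A_out, "ab"))  # (True, True)
```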
The following theorem follows directly from Propositions 2 and 3.

Theorem 1. The automaton A' returned by algorithm Disambiguation run on input automaton A is an unambiguous automaton equivalent to A.

Note that the states disallowed via the condition of line 14 of our algorithm are the minimal ones that can be safely removed from the subsets used to check for the presence of ambiguities.

3.3 Disambiguation of automata with ɛ-transitions

Our algorithm can also be extended to the case of automata with ɛ-transitions. We briefly describe that extension. Let A be an input automaton with ɛ-transitions.

[Fig. 4. (a) Automaton A with ɛ-transitions. (b) Unambiguous automaton equivalent to A returned by our disambiguation algorithm. The dashed transition is disallowed by the algorithm.]

Here, the automaton B used to determine pairs of states sharing the same future is obtained similarly, by computing the intersection A ∩ A using an ɛ-filter [12] and by trimming the result, removing non-coaccessible states and transitions. For any set R of states of A, let ɛ[R] denote the ɛ-closure of R, that is the set of states reachable from states of R via paths labeled with ɛ. To extend the algorithm to the case of automata with ɛ-transitions, it suffices to proceed as follows. The initial states are defined as the set of pairs (i, s) with i ∈ I and s = {q ∈ ɛ[I]: (i, q) ∈ B}. At line 14, δ(s, a) is defined as the set of states reachable from s by reading a, including via ɛ-transitions. Finally, the relation R is extended to ɛ-transitions as follows: for each (p', s') such that (p', s') R (p, s) and ((p, s), ɛ, (q', t')) ∈ E', (p', s') is put in relation with (q', t'). Figure 4 illustrates the application of our algorithm in that case.

4 Disambiguation of finite-state transducers

In this section, we consider the problem of determining an unambiguous transducer equivalent to a given functional finite-state transducer, that is a finite-state transducer representing a (partial) rational function, or equivalently one associating at most one output string with any input string. The functionality of a finite-state transducer T can be tested efficiently from the transducer T ∘ T⁻¹, as shown by [2].

Theorem 2 ([2]). There exists an algorithm for testing the functionality of a finite-state transducer T with output alphabet Δ in time O(|E|^2 + |Q|^2).

One possible algorithm for finding an unambiguous transducer equivalent to a functional transducer is determinization [11]. However, as discussed earlier, not all functional transducers admit an equivalent deterministic transducer. Figure 5(a) shows an example of such a functional transducer, which in fact is unambiguous. A trim functional transducer is determinizable iff it admits the twins property [3].
We will describe instead a disambiguation algorithm that does not require the twins property. It is known that any functional transducer can be represented by an unambiguous transducer [9, 5]. For a functional transducer, by definition, two accepting paths with the same input label have the same output labels. Thus, for disambiguating a functional transducer, only input labels matter, and our automata disambiguation algorithm can be readily applied to create an unambiguous transducer equivalent to an input functional transducer.

[Fig. 5. (a) Unambiguous finite-state transducer admitting no sequential or deterministic equivalent. (b) Functional transducer T. (c) Disambiguated transducer equivalent to T returned by our algorithm. One of the two dashed transitions is disallowed by the algorithm.]

Our disambiguation algorithm gives a constructive proof of the existence of an equivalent unambiguous transducer for a rational function. The different possible cross-sections of the construction of [9] correspond to different orders in which transitions are visited and disallowed by our algorithm. Figure 5(b)-(c) illustrates the application of the algorithm in the case of a simple functional transducer.

As already pointed out, our algorithm compares favorably with the existing disambiguation algorithm for finite-state transducers of Schützenberger [16, 15]. That construction can be concisely described as follows. Let D be the deterministic automaton obtained by determinization of the input automaton A of the functional transducer T, that is the automaton obtained by removing the output labels of T. Then, the algorithm consists of composing D with T using the standard composition algorithm for finite-state transducers, while disallowing the finality of two composition states (p, s) and (q, s) with the same determinization subset s and distinct states p and q of T, and similarly disallowing all but one transition labeled with a from two states (p, s) and (q, s) to the same state, to avoid generating ambiguities. As can be seen from this description, the algorithm requires the determinization of A. This is implicit in the description of this construction in [14]. In contrast, our disambiguation algorithm does not require the determinization of A and, as seen in the previous sections, can return exponentially smaller automata than those returned by determinization in some cases. Consider, for example, the finite-state transducers defined as the automata of Figure 3 with each transition augmented with an output label identical to its input label.
For such transducers derived from the automata of Figure 3, the construction of Schützenberger requires the determinization of the input automata; thus its cost, as well as the size of its result, is exponential in the size of the input, as already discussed in Section 3. Unlike that construction, and as in the automata case, our algorithm returns either the same transducer or one whose size is only linear in that of the input. The subsets defined by our disambiguation algorithm are never larger than those defined in the subset construction of determinization. This is because, for a state (p, s) constructed by the algorithm, only states sharing a common future with p are kept in the subset s. In addition to making the subsets smaller, this also reduces the number of states created: two possible states (p, s′) and (p, s″) in the construction of Schützenberger are reduced to the same state (p, s) after removal from s′ and s″ of the states not sharing a common future with p. In many cases, this leads to transducers exponentially smaller than those generated by the construction of Schützenberger, and to similar improvements in time efficiency. The observation just emphasized can be illustrated by the simple example of Figure 6. The transducer T of Figure 6(a) is functional but not unambiguous. Figure 6(b) shows the result of our disambiguation algorithm, which is an unambiguous transducer equivalent to T with the same number of states. In contrast, the transducer created by the construction of Schützenberger (Figure 6(c)) has several more states and transitions, and some larger subsets.

Fig. 6. Disambiguation of functional transducers. (a) Functional transducer T. (b) Unambiguous transducer equivalent to T returned by our algorithm. The dashed transitions are disallowed by the algorithm. (c) Unambiguous transducer returned by the disambiguation construction of Schützenberger [16, 15].

5 Conclusion

We presented a new and often more efficient algorithm for the disambiguation of finite automata and functional transducers. This algorithm is of great practical importance in a variety of applications, including text and speech processing, bioinformatics, and many other applications where it can be used to increase search efficiency. We have also designed a natural extension of these algorithms to some broad families of weighted automata and transducers defined over different semirings. We will present these extensions, as well as their theoretical analysis, in a longer version of this paper.

Acknowledgments

I thank Cyril Allauzen and Michael Riley for discussions about this work. This research was supported by a Google Research Award.

References

1. J. Albert and J. Kari. Digital image compression. In Handbook of Weighted Automata. Springer, 2009.
2. C. Allauzen and M. Mohri. Efficient algorithms for testing the twins property. Journal of Automata, Languages and Combinatorics, 8(2):117–144, 2003.
3. C. Allauzen and M. Mohri. Finitely subsequential transducers. International Journal of Foundations of Computer Science, 14(6):983–994, 2003.
4. C. Allauzen, M. Mohri, and A. Rastogi. General algorithms for testing the ambiguity of finite automata and the double-tape ambiguity of finite-state transducers. Int. J. Found. Comput. Sci., 22(4):883–904, 2011.
5. J. Berstel. Transductions and Context-Free Languages. Teubner Studienbücher, 1979.
6. T. M. Breuel. The OCRopus open source OCR system. In Proceedings of IS&T/SPIE 20th Annual Symposium, 2008.
7. C. Choffrut. Contributions à l'étude de quelques familles remarquables de fonctions rationnelles. PhD thesis, Université Paris 7, LITP: Paris, France, 1978.
8. R. Durbin, S. R. Eddy, A. Krogh, and G. J. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
9. S. Eilenberg. Automata, Languages and Machines, volume A. Academic Press, 1974.
10. R. M. Kaplan and M. Kay. Regular models of phonological rule systems. Computational Linguistics, 20(3), 1994.
11. M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–311, 1997.
12. M. Mohri. Weighted automata algorithms. In Handbook of Weighted Automata, pages 213–254. Springer, 2009.
13. M. Mohri, F. C. N. Pereira, and M. Riley. Speech recognition with weighted finite-state transducers. In Handbook on Speech Processing and Speech Communication. Springer, 2008.
14. E. Roche and Y. Schabes, editors. Finite-State Language Processing. MIT Press, 1997.
15. J. Sakarovitch. A construction on finite automata that has remained hidden. Theor. Comput. Sci., 204(1-2):205–231, 1998.
16. M. P. Schützenberger. Sur les relations rationnelles entre monoïdes libres. Theor. Comput. Sci., 3(2):243–259, 1976.