ON THE DETERMINIZATION OF WEIGHTED FINITE AUTOMATA

To appear in SIAM Journal on Computing. © SIAM 2000

ADAM L. BUCHSBAUM, RAFFAELE GIANCARLO, AND JEFFERY R. WESTBROOK

Abstract. We study the problem of constructing the deterministic equivalent of a nondeterministic weighted finite-state automaton (WFA). Determinization of WFAs has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also give upper bounds on the size of the deterministic equivalent; the bound is tight in the case of acyclic WFAs. Previously, Mohri presented a super-polynomial time algorithm to test for the twins property, and he also gave an algorithm to determinize WFAs. He showed that the latter runs in time linear in the size of the output when a deterministic equivalent exists; otherwise, it does not terminate. Our bounds imply an upper bound on the running time of this algorithm. Given that WFAs can expand exponentially in size when determinized, we explore why those that occur in ASR tend to shrink when determinized. According to ASR folklore, this phenomenon is attributable solely to the fact that ASR WFAs have simple topology, in particular, that they are acyclic and layered. We introduce a very simple class of WFAs with this structure, but we show that the expansion under determinization depends on the transition weights: some weightings cause them to shrink, while others, including random weightings, cause them to expand exponentially. We provide experimental evidence that ASR WFAs exhibit this weight dependence. That they shrink when determinized, therefore, is a result of favorable weightings in addition to special topology. These analyses and observations have been used to design a new, approximate WFA determinization algorithm, reported in a separate paper along with experimental results showing that it achieves significant WFA size reduction with negligible impact on ASR performance.

Key words. algorithms, rational functions and power series, speech recognition, weighted automata

AMS subject classifications. 20M35, 68Q25, 68Q45, 68T10

PII.

An extended abstract of this paper appears in Proc. 25th Int'l. Conf. on Automata, Languages, and Programming, 1998.
AT&T Labs, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932, USA, alb@research.att.com.
Dipartimento di Matematica ed Applicazioni, Università di Palermo, Via Archirafi 34, 90123 Palermo, Italy, raffaele@altair.math.unipa.it. Work supported by AT&T Labs.
20th Century Television, Los Angeles, CA 9005, USA, jwestbrook@acm.org. Work completed while a member of AT&T Labs.

1. Introduction. Finite-state machines and their relation to rational functions and power series have been extensively studied [,,, 9] and widely applied in fields ranging from image compression [0,,, 7] to natural language processing [, 0,, 8, 0]. A subclass of finite-state machines, the weighted finite-state automata (WFAs), has recently assumed new importance, because WFAs provide a powerful method for representing and manipulating models of human language in automatic speech recognition (ASR) systems [, 4]. This new research direction also raises a number of challenging algorithmic questions [5].

A weighted finite-state automaton (WFA) is a nondeterministic finite automaton (NFA), A, that has both an alphabet symbol and a weight, from some set K, on each transition. Let $R = (K, \oplus, \otimes, \bar{0}, \bar{1})$ be a semiring. Then A together with R generates a partial function from strings to K: the value of an accepted string is the semiring sum over accepting paths of the semiring product of the weights along each accepting path. A partial function that can be generated this way is a rational power series [9]. An example important to ASR is the set of WFAs with the min-sum semiring, $(\mathbb{R}^+ \cup \{0, \infty\}, \min, +, \infty, 0)$, which compute for each accepted string the minimum cost accepting path.

In this paper, we study problems related to the determinization of WFAs. A deterministic, or sequential, WFA has at most one transition with a given input symbol out of each state. Not all rational power series can be generated by deterministic WFAs. A determinization algorithm takes as input a WFA and produces a deterministic WFA that generates the same rational power series, if such a deterministic WFA exists. The importance of determinization to ASR is well established [0,, 4].

To the best of our knowledge, Mohri [0] presented the first determinization procedure for WFAs, extending the seminal ideas of Choffrut [8, 9] and Weber and Klemm [] regarding string-to-string transducers. Mohri gives a determinization procedure with three phases. First, A is converted to an equivalent unambiguous, trim WFA $A_t$, using an algorithm analogous to one for NFAs []. (We define unambiguous and trim below.) Mohri then gives an algorithm, TT, that determines if $A_t$ has the twins property (also defined below). If $A_t$ does not have the twins property, then there is no deterministic equivalent of A. If $A_t$ has the twins property, a second algorithm of Mohri's, DTA, can be applied to $A_t$ to yield $A'$, a deterministic equivalent of A. Algorithm TT runs in $O(m^{4n^2})$ time, where m is the number of transitions and n the number of states in $A_t$. Algorithm DTA runs in time linear in the size of $A'$. In practice, DTA is run directly on A, which is assumed to admit a deterministic equivalent; conversion to $A_t$ and testing for twins are theoretical steps needed to make the procedure well defined.

Mohri observes that $A'$ can be exponentially larger than A, because WFAs include classical NFAs. He gives no upper bound on the worst-case state-space expansion, however, and because of the weights on transitions, the classical NFA upper bound does not apply. Finally, Mohri gives an algorithm that takes a deterministic WFA and outputs the minimum-size equivalent, deterministic WFA.
We present several results related to the determinization of WFAs. In Section 3 we give the first polynomial-time algorithm to test whether an unambiguous, trim WFA satisfies the twins property. It runs in $O(m^2 n^6)$ time. We then provide a worst-case time complexity analysis of DTA. The number of states in the output deterministic WFA is at most $2^{n(2 \log n + n^2 \log |\Sigma| + 1)}$, where Σ is the input alphabet. If the weights are rational, this bound becomes $2^{n(2 \log n + 1 + \min(n^2 \log |\Sigma|, \rho))}$, where ρ is the maximum bit-size of a weight. When the input WFA is acyclic, the bound becomes $2^{n \log |\Sigma|}$. The acyclic bound holds for real weights, and it is tight (up to constant factors) for any alphabet size.

It remains open whether there exists a polynomial-time procedure to determine whether an arbitrary WFA admits a deterministic equivalent, because the determinization process above requires converting a WFA to an unambiguous equivalent prior to testing for twins.

In Sections 4 to 6 we study questions motivated by the use of WFA determinization in ASR [, 4]. Although determinization causes exponential state-space expansion in the worst case, in ASR systems the determinized WFAs are often smaller than the input WFAs [0], and they are seldom very large. This is fortunate, because the performance of ASR systems depends directly on WFA size [, 4]. Folklore within the ASR community credits this phenomenon entirely to the special topology of ASR WFAs. (The topology of a WFA is its underlying directed graph and labeling by input symbols, ignoring weights.) ASR WFAs tend to be acyclic and layered. Such a WFA always admits a deterministic equivalent. The role that the transition weights might play in controlling expansion under determinization has not been considered.

In Section 4 we study the role of topology in expansion under determinization.

We exhibit a class of layered, acyclic WFAs whose minimum equivalent deterministic WFAs are exponentially larger regardless of weighting. The languages accepted by these WFAs are quite unnatural, however.

In Section 5 we study the role of transition weights in expansion under determinization. We introduce a class of nondeterministic WFAs, RG. Each WFA in this class has an extremely simple multi-partite, acyclic topology, accepts a very trivial language, and in the absence of weights (i.e., with all weights set to zero) has a smaller deterministic equivalent. We show, however, that for any $A \in RG$ and any $i \le n$, there exists an assignment of weights to the transitions of A such that the minimal equivalent deterministic WFA has $\Theta(2^{i \log |\Sigma|})$ states. This gives a lower bound to match the upper bounds of Section 3. Using ideas from universal hashing, we also show that similar results hold when the weights are random i-bit numbers. This motivates us to examine experimentally the effect of varying weights on actual WFAs from ASR applications.

In Section 6 we give the results of these experiments. We call a WFA weight-dependent if its expansion under determinization is strongly determined by its weights. Most of the examples from ASR were weight-dependent. These experimental results together with the theory we develop show that the folklore explanation is insufficient: ASR WFAs shrink under determinization because both the topology and weighting tend to be favorable.

Some of our results help explain the nature of WFAs from the algorithmic point of view, i.e., how weights assigned to the transitions of a WFA can affect the performance of algorithms manipulating it. Others relate directly to the theory of weighted automata.

We have used our results to design an approximate variant of Mohri's determinization algorithm. We describe this algorithm separately [6], along with experimental results showing that it achieves size reductions in ASR language models that significantly exceed those of previous methods, with negligible effects on ASR performance (time and accuracy).
2. Definitions and Terminology. Given a semiring $(K, \oplus, \otimes, \bar{0}, \bar{1})$, a weighted finite automaton (WFA) is a tuple $G = (Q, \bar{q}, \Sigma, \delta, Q_f)$. Q is the set of states, $\bar{q} \in Q$ is the initial state, Σ is the set of symbols, $\delta \subseteq Q \times \Sigma \times K \times Q$ is the set of transitions, and $Q_f \subseteq Q$ is the set of final states. We assume that $|\Sigma| > 1$. A deterministic, or sequential, WFA has at most one transition $t = (q, \sigma, \nu, q')$ for any pair $(q, \sigma)$; a nondeterministic WFA can have multiple transitions on a pair $(q, \sigma)$, differing in target state $q'$.

The problems examined in this paper are motivated primarily by ASR applications, which work with the min-sum semiring, $(\mathbb{R}^+ \cup \{0, \infty\}, \min, +, \infty, 0)$, and we therefore limit further discussion to the min-sum semiring. (Some of the algorithms considered use subtraction. To be well-defined, therefore, they require a skew field. The min-sum semiring is indeed embedded in a skew field [6].)

Let $\vec{t} = (t_1, \ldots, t_\ell)$ be some sequence of transitions, such that $t_i = (q_{i-1}, \sigma_i, \nu_i, q_i)$; $\vec{t}$ induces string $w = \sigma_1 \cdots \sigma_\ell$. String w is accepted by $\vec{t}$ if $q_0 = \bar{q}$ and $q_\ell \in Q_f$; w is accepted by G if some $\vec{t}$ accepts w. Let $c(t_i) = \nu_i$ be the weight of $t_i$. Then the weight of $\vec{t}$ is

$$c(\vec{t}) = \sum_{i=1}^{\ell} c(t_i).$$

Let T(w) be the set of all sequences of transitions that accept string w. Then the weight of w is

$$c(w) = \min_{\vec{t} \in T(w)} c(\vec{t}).$$
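As a concrete reading of the definition of c(w), the following minimal Python sketch computes the min-sum weight of a string by dynamic programming over its prefixes. The function name `string_weight`, the encoding of transitions as `(source, symbol, weight, target)` tuples, and the small example automaton are assumptions made for illustration; none of them comes from the paper.

```python
import math

def string_weight(transitions, start, finals, word):
    """Return c(word): the minimum total weight over accepting paths,
    or math.inf if the word is not accepted (min-sum semiring)."""
    # best[q] = optimal cost of reaching q from the start state on the prefix read so far
    best = {start: 0}
    for symbol in word:
        nxt = {}
        for (q, a, nu, q2) in transitions:
            if a == symbol and q in best:
                cand = best[q] + nu
                if cand < nxt.get(q2, math.inf):
                    nxt[q2] = cand
        best = nxt
    return min((c for q, c in best.items() if q in finals), default=math.inf)

# Hypothetical four-state automaton, not taken from the paper's figures.
EXAMPLE = [(0, "a", 1, 1), (0, "a", 3, 2), (1, "b", 3, 3), (2, "b", 0, 3)]
```

With this example, the string `"ab"` has two accepting paths, of weights 1 + 3 and 3 + 0, so `string_weight(EXAMPLE, 0, {3}, "ab")` evaluates to the smaller of the two.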

The weighted language of G is the set $L(G) = \{(w, c(w)) \mid w \text{ is accepted by } G\}$; i.e., the weighted strings accepted by G. Intuitively, the weight on a transition of G can be seen as the confidence one has in taking that transition. The weights need not, however, satisfy stochastic constraints, as do the probabilistic automata introduced by Rabin [6].

Fix two states q and $q'$ and a string $v \in \Sigma^*$. Let $c(q, v, q')$ be the minimum of $c(\vec{t})$, taken over all transition sequences $\vec{t}$ from q to $q'$ inducing v. We refer to $c(q, v, q')$ as the optimal cost of inducing v from q to $q'$. We generally abuse notation so that $\delta(q, w)$ can represent the set of states reachable from state $q \in Q$ on string $w \in \Sigma^*$. We extend the function δ to strings in the usual way: $q' \in \delta(q, v)$, $v \in \Sigma^+$, means that there is a sequence of transitions from q to $q'$ inducing v.

The topology of G is the projection $\pi_{Q \times \Sigma \times Q}(\delta)$: i.e., the transitions of G without respect to the weights. We also refer to the topology of G as the graph underlying G.

A WFA is trim if every state appears in an accepting path for some string and no transition is weighted $\bar{0}$ ($\infty$ in the min-sum semiring). A WFA is unambiguous if there is exactly one accepting path for each accepted string.

Determinization of G is the problem of computing a deterministic WFA $G'$ such that $L(G') = L(G)$, if such a $G'$ exists. We denote the output of algorithm DTA by dta(G). We denote the minimal deterministic WFA accepting L(G) by min(G), if one exists. We say that G expands if dta(G) has more states and/or transitions than G.

Let $n = |Q|$ and $m = |\delta|$, and let the size of G be $|G| = n + m$. We also use #G to mean $|Q|$, the number of states of G. We assume that each transition is labeled with exactly one symbol, so $|\Sigma| \le m$. Recall that the weights of G are non-negative real numbers. Let C be the maximum weight. In the general case, weights are incommensurable real numbers, requiring infinite precision. In the integer case, weights can be represented with $\rho = \lg C$ bits. We denote the integral range $[a, b]$ by $[a, b]_{\mathbb{Z}}$.
The integer case extends to the case in which the weights are rationals requiring ρ bits. We assume that in the integer and rational cases, weights are normalized to remove excess least-significant zero bits.

For our analyses, we use the RAM model of computation as follows. In the general case, we charge constant time for each arithmetic-logic operation involving weights (which are real numbers). We refer to this model as the R-RAM [5]. The relevant parameters for our analyses are n, m, and $|\Sigma|$. In the integer case, we also use a RAM, except that each arithmetic-logic operation now takes O(ρ) time. We refer to this model as the CO-RAM []. The relevant parameters for the analyses are n, m, $|\Sigma|$, and ρ.

3. Determinization of WFAs.

3.1. An Algorithm for Testing the Twins Property.

Definition 3.1. Two states, q and $q'$, of a WFA G are twins if $\forall u, v \in \Sigma^*$ such that $q \in \delta(\bar{q}, u)$, $q' \in \delta(\bar{q}, u)$, $q \in \delta(q, v)$, and $q' \in \delta(q', v)$, the following holds: $c(q, v, q) = c(q', v, q')$. G has the twins property if all pairs $q, q' \in Q$ are twins.

That is, if states q and $q'$ are reachable from $\bar{q}$ by a common string, then q and $q'$ are twins only if any string that induces a cycle at each induces cycles of equal optimal cost. Note that two states having no cycle on a common string are twins.

Mohri [0, Theorems and ] proves that any WFA G over the min-sum semiring is determinizable if it has the twins property; furthermore, if G is trim and unambiguous, the twins property becomes a necessary and sufficient condition. For an example of a non-determinizable WFA, see Figure 3.1.
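To make Definition 3.1 concrete, here is a small brute-force Python sketch that enumerates strings u and v up to a length bound and reports pairs of states that violate the twins condition. It is exponential in the bound and is only an illustration of the definition, not the polynomial-time test developed in this section. The names `optimal_costs` and `twins_violations`, the tuple encoding of transitions, and the example automaton (a hypothetical one with two cycles of unequal cost) are all assumptions.

```python
import itertools

def optimal_costs(transitions, source, word):
    """Map each state q2 reachable from `source` on `word` to c(source, word, q2)."""
    costs = {source: 0}
    for symbol in word:
        nxt = {}
        for (q, a, nu, q2) in transitions:
            if a == symbol and q in costs:
                cand = costs[q] + nu
                if q2 not in nxt or cand < nxt[q2]:
                    nxt[q2] = cand
        costs = nxt
    return costs

def twins_violations(transitions, start, alphabet, max_len):
    """List (q, q2, u, v) with |uv| <= max_len such that q and q2 are both
    reached on u, v induces a cycle at each, and the optimal cycle costs differ."""
    words = [w for k in range(max_len + 1)
             for w in itertools.product(alphabet, repeat=k)]
    violations = []
    for u in words:
        reached = sorted(optimal_costs(transitions, start, u))
        for q, q2 in itertools.combinations(reached, 2):
            for v in words:
                if len(u) + len(v) > max_len:
                    continue
                cycle_q = optimal_costs(transitions, q, v).get(q)
                cycle_q2 = optimal_costs(transitions, q2, v).get(q2)
                if cycle_q is not None and cycle_q2 is not None and cycle_q != cycle_q2:
                    violations.append((q, q2, u, v))
    return violations

# Hypothetical automaton: states 1 and 2 are both reached on "a",
# but the "b" cycle costs 0 at state 1 and 1 at state 2.
NOT_TWINS = [(0, "a", 0, 1), (0, "a", 0, 2), (1, "b", 0, 1),
             (2, "b", 1, 2), (1, "c", 0, 3), (2, "d", 0, 3)]
```

Calling `twins_violations(NOT_TWINS, 0, "abcd", 2)` reports the pair (1, 2) with u = a and v = b; setting the weight of the cycle at state 2 to 0 removes the violation.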

Fig. 3.1. A nondeterministic, trim, unambiguous WFA G. Arcs labeled σ/w correspond to transitions labeled σ with weight w. For this and succeeding figures, the start state is the unique source, and final states are denoted by double circles. G accepts the language $\{(ab^n c, 0), (ab^n d, n) \mid n \ge 0\}$. States 1 and 2 do not have the twins property: each is reachable from state 0 via string a, yet the costs of the cycles labeled b at each differ. It is easily shown that no deterministic WFA can accept L(G).

The twins property for WFAs is analogous to that defined by Choffrut [8, 9] and (in different terms) by Weber and Klemm [] to identify necessary and sufficient conditions for a string-to-string transducer to admit a sequential transducer realizing the same rational transduction. In spite of the strong analogy, the proof techniques used for WFAs differ from those used to obtain analogous results for string-to-string transducers. In particular, the efficient algorithm we derive here to test a WFA for twins is not related to the polynomial-time algorithm of Weber and Klemm [] for testing twins in string-to-string transducers.

We reduce the problem of testing the twins property to that of computing shortest paths on some suitably defined graphs, which we introduce next. Let $T_{\bar{q},\bar{q}}$ be the multi-partite acyclic, labeled, weighted graph having $2n^2$ layers and inductively defined as follows. The root vertex $\hat{r}$ is at layer zero and corresponds to $(\bar{q}, \bar{q})$. The vertices at layer one correspond to a subset of $Q \times Q$ obtained as follows: $\hat{r}$ is connected to vertex u, corresponding to $(q_1, q_2)$, if and only if there are two distinct transitions $t_1 = (\bar{q}, a, c_1, q_1)$ and $t_2 = (\bar{q}, a, c_2, q_2)$ in G. The arc connecting $\hat{r}$ to u is labeled with $a \in \Sigma$ and has cost $c = c_1 - c_2$. Assume that we have the vertices at layer $i - 1$. The vertices at layer i are obtained as follows. Let u be the vertex at layer $i - 1$ corresponding to $(q_1, q_2) \in Q \times Q$; u is connected to $u'$, corresponding to $(q_1', q_2')$, at layer i if and only if there are two distinct transitions $t_1 = (q_1, a, c_1, q_1')$ and $t_2 = (q_2, a, c_2, q_2')$ in G.
The arc connecting u to $u'$ is labeled with $a \in \Sigma$ and has cost $c = c_1 - c_2$. This graph has $O(n^4)$ vertices and $O(m^2 n^4)$ arcs.

Let $(q, q')_i$ denote the vertex corresponding to $(q, q') \in Q \times Q$ at layer i of $T_{\bar{q},\bar{q}}$, if any. Let $RT \subseteq \{(q, q') \mid q \ne q'\}$ be the set of pairs of distinct states of G that are reachable from $(\bar{q}, \bar{q})_0$ in $T_{\bar{q},\bar{q}}$. For each $(q, q') \in RT$, define $T_{q,q'}$ analogously to $T_{\bar{q},\bar{q}}$. Notice that $T_{q,q'}$ has $O(n^4)$ vertices and $O(m^2 n^4)$ arcs. We need the following.

Lemma 3.2. Fix two distinct states q and $q'$ of G. They can be reached from the initial state $\bar{q}$ of G by the same string $z \in \Sigma^+$ if and only if there exists some string $z' \in \Sigma^i$, for some $i \le n^2$, such that q and $q'$ are both reached from $\bar{q}$ using $z'$. In that case, there is at least one path in $T_{\bar{q},\bar{q}}$ that goes from $(\bar{q}, \bar{q})_0$ to $(q, q')_i$.

Proof. Fix a string $z \in \Sigma^+$, and assume that q and $q'$ can be reached from $\bar{q}$ by z. Assume that $|z| > n^2$, or else we are done. Since there are only $n^2$ distinct pairs of states of G and $|z| > n^2$, there must exist two states $q_1$ and $q_2$ and a string $v \in \Sigma^+$ such that (a) $z = xvu$; (b) $q_1$ (resp., $q_2$) is on a path from $\bar{q}$ to q (resp., $q'$) inducing z; and (c) $q_1 \in \delta(q_1, v)$ (resp., $q_2 \in \delta(q_2, v)$). But then, $z' = xu$ also reaches both q and $q'$ from $\bar{q}$. If $|z'| \le n^2$, we are done; otherwise we iterate the argument. The second part of the lemma follows by construction of $T_{\bar{q},\bar{q}}$.

Lemma 3.3. Let G be trim and unambiguous. Fix a string $y \in \Sigma^i$, $i \le 2n^2 - 1$, and two distinct states q and $q'$ of G. Then $q \in \delta(q, y)$ and $q' \in \delta(q', y)$ if and only if there is exactly one path p in $T_{q,q'}$ that starts at $(q, q')_0$, ends at $(q, q')_{|y|}$, and induces y. Moreover, the cost of p is $c(q, y, q) - c(q', y, q')$.

Proof. We prove the sufficient case; the necessary case should be clear from the construction of $T_{q,q'}$. First observe that, since G is trim and unambiguous, the following holds: for each string $y \in \Sigma^+$ such that $q \in \delta(q, y)$, there is exactly one cycle starting and ending at q and inducing y. Let $(q = q_0, q_1, q_2, \ldots, q_{|y|} = q)$ be the unique sequence of states of G originating in q and inducing y in G. Therefore, $c(q, y, q)$ is the sum of the weights on the transitions in that sequence. Similarly define $(q' = q_0', q_1', q_2', \ldots, q_{|y|}' = q')$. By the above construction, there exists a path p in $T_{q,q'}$, consisting of the vertices $((q, q')_0, (q_1, q_1')_1, \ldots, (q, q')_{|y|})$ and inducing y. This path must be unique, and its cost is $c(q, y, q) - c(q', y, q')$.

Lemma 3.4 ([0, Lemma ]). Let G be a trim, unambiguous WFA. G has the twins property if and only if $\forall u, v \in \Sigma^*$ such that $|uv| \le 2n^2 - 1$, the following holds: when there exist two states q and $q'$ such that (i) $\{q, q'\} \subseteq \delta(\bar{q}, u)$, and (ii) $q \in \delta(q, v)$ and $q' \in \delta(q', v)$, then (iii) $c(q, v, q) = c(q', v, q')$ must follow.

Fix two distinct states q and $q'$ of G.
Let $(q, q')_{i_1}, (q, q')_{i_2}, \ldots, (q, q')_{i_s}$, $0 < i_1 < i_2 < \cdots < i_s$, be all the occurrences of $(q, q')$ in $T_{q,q'}$, excluding $(q, q')_0$. This sequence may be empty. A symmetric sequence can be extracted from $T_{q',q}$. We refer to these sequences as the common cycles sequences of $(q, q')$. We say that q and $q'$ satisfy the local twins property if and only if (a) their common cycles sequences are empty, or (b) zero is the cost of any shortest path from $(q, q')_0$ to $(q, q')_{i_j}$ in $T_{q,q'}$ and from $(q', q)_0$ to $(q', q)_{i_j}$ in $T_{q',q}$, for all $1 \le j \le s$.

Lemma 3.5. Let G be a trim, unambiguous WFA. G satisfies the twins property if and only if (i) RT is empty or (ii) all $(q, q') \in RT$ satisfy the local twins property.

Proof. ($\Rightarrow$) Assume that G satisfies the twins property. If RT is empty, we are done. Assume then that $RT \ne \emptyset$. The proof is by contradiction. Assume that some $(q, q') \in RT$ does not satisfy the local twins property. The common cycles sequences of $(q, q')$ cannot be empty, or else they would satisfy the local twins property. By assumption, there exists some j for which the cost of some shortest path from $(q, q')_0$ to $(q, q')_{i_j}$ in $T_{q,q'}$ is not zero, while the cost of a shortest path from $(q', q)_0$ to $(q', q)_{i_j}$ in $T_{q',q}$ may be any value, including zero (or vice versa). Fix any such shortest path p in $T_{q,q'}$. According to Lemma 3.3, p corresponds to cycles around q and $q'$ that each induce the same string y, for some $y \in \Sigma^{i_j}$. Moreover, we must have $c(q, y, q) - c(q', y, q') \ne 0$. By definition of RT, q and $q'$ are each reachable by some string u from the initial state of G. Therefore, G does not satisfy the twins property, which is a contradiction.

($\Leftarrow$) Assume that RT is empty. Then, by Lemma 3.2, no two distinct states $q, q'$ of G can both be reached by some string $z \in \Sigma^+$ from the initial state $\bar{q}$. Therefore, G satisfies the twins property. Assume now that RT is not empty. We have two subcases.

Subcase A) Assume that all states in RT satisfy the local twins property because their common cycles sequences are empty. This implies that all pairs of distinct states reachable from the initial state of G through the same string $z \in \Sigma^+$ do not have any cycles in common inducing identical strings. Thus, G satisfies the twins property.

Subcase B) Assume that some states in RT satisfy the local twins property and their common cycles sequences are not empty. Let $RT'$ be such a set. Assume that G does not satisfy the twins property. We derive a contradiction. Since $RT'$ is not empty, we have that the set of pairs of states for which (i) and (ii) are satisfied in Lemma 3.4 is not empty. But since G does not satisfy the twins property, there must exist two distinct states q and $q'$ and a string $uv \in \Sigma^*$, $|uv| \le 2n^2 - 1$, such that (i) both q and $q'$ can be reached from the initial state of G through string u; (ii) $q \in \delta(q, v)$ and $q' \in \delta(q', v)$; and (iii) $c(q, v, q) \ne c(q', v, q')$. We now argue that $(q, q')$ must be in $RT'$. Because $q \ne q'$ and G has only one initial state, we have that $|u| \ge 1$. Thus, $|u| \le 2n^2 - 1$, implying that $(q, q') \in RT$. v cannot be the empty string ε because $c(q, \epsilon, q) = c(q', \epsilon, q') = 0$. Since $|uv| \le 2n^2 - 1$, we have that $|v| \le 2n^2 - 1$. But then, by Lemma 3.3 and (ii) above, we have that $(q, q')_{|v|}$ can be reached from $(q, q')_0$ in $T_{q,q'}$ through the nonempty string v. Therefore, the common cycles sequences of $(q, q')$ cannot be empty, implying that $(q, q') \in RT'$. Without loss of generality, assume that $c(q, v, q) - c(q', v, q') < 0$. Since $|v| \le 2n^2 - 1$, we have by Lemma 3.3 that there is exactly one path p in $T_{q,q'}$ starting at $(q, q')_0$, ending in $(q, q')_{|v|}$, inducing v, and with cost $c(q, v, q) - c(q', v, q') < 0$.
Since p has negative cost, the cost of the shortest path from $(q, q')_0$ to $(q, q')_{|v|}$ in $T_{q,q'}$ cannot be zero, which contradicts that q and $q'$ satisfy the local twins property and have non-empty common cycles sequences.

Our algorithm for testing whether a trim, unambiguous WFA has the twins property works as follows. First, compute $T_{\bar{q},\bar{q}}$ and the set RT. Then, for each pair of states $(q, q') \in RT$ that have not been processed yet: compute $T_{q,q'}$ and $T_{q',q}$, extract the common cycles sequences, and compute the single source (from the root) shortest paths to vertices in $T_{q,q'}$ and $T_{q',q}$.

Theorem 3.6. Let G be a trim unambiguous WFA. In the general case, whether G satisfies the twins property can be checked in $O(m^2 n^6)$ time using the R-RAM model of computation. In the integer case, the bound becomes $O(\rho m^2 n^6)$ using the CO-RAM model of computation.

Proof. Lemma 3.5 implies correctness. We now analyze the algorithm, starting with the general case. Recall that each arithmetic-logic operation can be done in constant time. $T_{\bar{q},\bar{q}}$ can be easily obtained in $O(m^2 n^4)$ time by visiting the automaton G. Now, visiting $T_{\bar{q},\bar{q}}$, we can obtain the set RT in the same amount of time. Fix a pair of distinct states q and $q'$ of G. It is sufficient to discuss how to compute shortest paths from the root vertex of $T_{q,q'}$ to the other vertices in the graph. Notice that the edges of $T_{q,q'}$ may have negative cost. However, $T_{q,q'}$ is a multi-partite acyclic graph. In that case, it is a simple exercise to show how to perform the required computation in time linear in the size of $T_{q,q'}$, i.e., $O(m^2 n^4)$ time. Since $|RT| = O(n^2)$, the total time of the algorithm is $O(m^2 n^6)$. For the integer case, we multiply the above bound by ρ.

We also mention, omitting the details, that the exponential-time algorithm for testing the twins property originally devised by Mohri [0] can be simplified and implemented to run in pseudo-polynomial time in the integer case. The algorithm we devise here is weakly polynomial in the integer case.

Fig. 3.2. (a) A nondeterministic weighted automaton, A. Arcs labeled σ/w correspond to transitions labeled σ with weight w. (b) The result of applying DTA to A. This is derived from Figures and of Mohri [0].

3.2. The DTA Algorithm. Mohri [0] describes a determinization algorithm for a finite-state automaton with weights drawn from a general semiring. What we refer to as DTA is that algorithm restricted to the min-sum semiring. DTA is a generalization of the classic power-set construction for finite automata. We describe the algorithm, starting with an example.

Consider the weighted automaton, A, in Figure 3.2(a). While A is not unambiguous, it has the twins property, and so we can apply DTA directly to it, proceeding as follows. From the initial state $q_0$, we can reach states $q_1$ and $q_2$ using the input symbol a. Analogously to the determinization of finite-state automata, we establish a new state $\{q_1, q_2\}$ in $A'$, reachable from $q_0$ with input symbol a. The transitions to $q_1$ and $q_2$, however, have different weights in A. DTA selects the smaller weight to be the weight of the transition to $\{q_1, q_2\}$ and records the difference between the two weights in the new state. In the example, the weight of the $q_0 \to q_1$ transition is , and that of the $q_0 \to q_2$ transition is . Therefore, the new transition to $\{q_1, q_2\}$ gets weight , and the difference of = is assigned as a remainder to component $q_1$. For completeness, a remainder of 0 = is assigned to $q_2$. The new state is thus encoded as $\{(q_1, ), (q_2, 0)\}$ in $A'$. Similarly, from state $q_0$ in A, we can reach states $q_1$ and $q_2$ via symbol b. Again the minimum weight among these transitions is , so we assign this weight to the new arc and encode the remainder weights (0 and , respectively) in the new state $\{(q_1, 0), (q_2, )\}$ in $A'$.
In generl, the sttes in A re of the form ˆq = {(q i,r i ),...,(q il,r il )}. The q i s re sttes from A, nd the r i s re clled reminders. Ech such ˆq is interpreted s follows. Consider ny string w Σ such tht there is (single) pth inducing w from the strt stte, q 0, to ˆq. As in clssicl utomt determiniztion, there is t lest one pth inducing w from q 0 to ech q ij in the nondeterministic input, A. Let c j e the weight of the minimum weight pth inducing w from q 0 to q ij in A. Let c e the weight of the pth from q 0 to ˆq in A. The reminders re constructed so tht r ij = c j c. In this wy, ll necessry pth length informtion is encoded into the

transition weights and remainders in $A'$.

Returning to the example, consider state $\{(q_1,2),(q_2,0)\}$ in $A'$ and the input symbol $b$. In $A$, we can reach state $q_3$ from both $q_1$ and $q_2$. Recalling the above discussion of remainders, we consider the sum of the weight of the transition in $A$ (3 for the $q_1 \to q_3$ transition and 1 for the $q_2 \to q_3$ transition) plus the remainder associated with the original source state encoded in state $\{(q_1,2),(q_2,0)\}$ in $A'$. That is, we consider the sums $3 + 2 = 5$ and $1 + 0 = 1$. We take the minimum among those values, i.e., 1, as the weight of the transition from $\{(q_1,2),(q_2,0)\}$ to $\{(q_3,r)\}$ ($r$ to be determined) in $A'$. Since there is only one destination state ($q_3$) in $A$, the remainder $r$ is 0, so we encode the new destination state as $\{(q_3,0)\}$. Similarly, we construct an arc with weight 3 on symbol $b$ from $\{(q_1,0),(q_2,3)\}$ to $\{(q_3,0)\}$. ($3 + 0 = 3$, $1 + 3 = 4$, and we take the minimum, which is 3.) The end result is shown in Figure 3.1(b).

Generalizing to an arbitrary WFA $G = (Q, \bar{q}, \Sigma, \delta, Q_f)$, the deterministic WFA $G'$ is obtained as follows. The start state of $G'$ is $\{(\bar{q},0)\}$, which forms an initial set $P$. While $P \neq \emptyset$, we remove any state $q = \{(q_1,r_1),\ldots,(q_n,r_n)\}$ from $P$, where $q_i \in Q$ and $r_i \in \mathbb{R}^+ \cup \{0,\infty\}$. The remainders encode path length information, as described above. For each $\sigma \in \Sigma$, let $\{q'_1,\ldots,q'_m\}$ be the set of states reachable by $\sigma$-transitions out of all the $q_i$. For $1 \le j \le m$, let
$$\rho_j = \min_{1 \le i \le n;\ (q_i,\sigma,\nu,q'_j) \in \delta} \{r_i + \nu\}$$
be the minimum of the weights of $\sigma$-transitions into $q'_j$ from the $q_i$ plus the respective $r_i$. Let $\rho = \min_{1 \le j \le m} \{\rho_j\}$. Let $q' = \{(q'_1,s_1),\ldots,(q'_m,s_m)\}$, where $s_j = \rho_j - \rho$, for $1 \le j \le m$. If $q'$ is a new state, we add it to $P$. We add transition $(q,\sigma,\rho,q')$ to $G'$. This is the only $\sigma$-transition out of state $q$, so $G'$ is deterministic.

Let $T_G(w)$ be the set of sequences of transitions in $G$ that accept string $w \in \Sigma^*$; let $t_{G'}(w)$ be the (one) sequence of transitions in $G'$ that accepts the same string. Mohri [0] shows that $c(t_{G'}(w)) = \min_{t \in T_G(w)} \{c(t)\}$, and thus $L(G') = L(G)$.
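The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `determinize`, the arc encoding `(source, symbol, weight, target)`, and the toy arc list `TOY_ARCS` are all hypothetical, the weights in `TOY_ARCS` are assumed for illustration, and final states are ignored. The loop terminates only when the set of reachable remainder tuples is finite, which the twins property guarantees.

```python
from collections import deque

def determinize(start, arcs, alphabet):
    """Min-sum weighted subset construction (a sketch of DTA).

    arcs: iterable of (source, symbol, weight, target) tuples.
    Returns (start_state, transitions); a deterministic state is a frozenset
    of (original_state, remainder) pairs, and transitions maps
    (state, symbol) -> (weight, next_state).
    """
    out = {}
    for src, sym, weight, dst in arcs:
        out.setdefault((src, sym), []).append((weight, dst))

    start_state = frozenset({(start, 0)})
    transitions = {}
    seen = {start_state}
    pending = deque([start_state])            # plays the role of the set P
    while pending:
        state = pending.popleft()
        for sym in alphabet:
            best = {}                         # rho_j for each reachable target
            for q, r in state:
                for weight, dst in out.get((q, sym), []):
                    cost = r + weight
                    if dst not in best or cost < best[dst]:
                        best[dst] = cost
            if not best:
                continue                      # no transition on this symbol
            rho = min(best.values())          # weight of the new transition
            nxt = frozenset((dst, cost - rho) for dst, cost in best.items())
            transitions[(state, sym)] = (rho, nxt)
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return start_state, transitions

# Hypothetical four-state input; the weights are assumptions for illustration.
TOY_ARCS = [
    ("q0", "a", 3, "q1"), ("q0", "a", 1, "q2"),
    ("q0", "b", 1, "q1"), ("q0", "b", 4, "q2"),
    ("q1", "b", 3, "q3"), ("q2", "b", 1, "q3"),
]
start_state, transitions = determinize("q0", TOY_ARCS, "ab")
```

Every deterministic state produced this way contains at least one pair with remainder zero, because the minimum `rho` is subtracted from every candidate cost.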
Moreover, let $T_G(w,q)$ be the set of sequences of transitions in $G$ from state $\bar{q}$ to state $q$ that induce string $w$. Again, let $t_{G'}(w)$ be the (one) sequence of transitions in $G'$ that induces the same string; $t_{G'}(w)$ ends at some state $\{(q_1,r_1),\ldots,(q_n,r_n)\}$ in $G'$ such that some $q_i = q$. Mohri [0] shows that $c(t_{G'}(w)) + r_i = \min_{t \in T_G(w,q)} \{c(t)\}$. Thus each remainder $r_i$ encodes the difference between the weight of the shortest path to some state that induces $w$ in $A$ and the weight of the path inducing $w$ in $A'$, as described above. Hence at least one remainder in each state is zero.

3.3. An Analysis. We first bound $\#dta(G)$, the number of states in $dta(G)$. The results of Section 5 show that our upper bound is tight to within polynomial factors.

Lemma 3.7. Assume that $G$ satisfies the twins property. Let $R$ be the set of remainders generated by DTA when computing $dta(G)$. Let $\tilde{R}$ be the set of remainders $r$ for which the following holds: $\exists w \in \Sigma^*$, $|w| \le n^2$, and two states $q_1$ and $q_2$, such that $r = c(\bar{q},w,q_1) - c(\bar{q},w,q_2)$. Then $R \subseteq \tilde{R}$.

Proof. Let $R'$ be the set of remainders $r$ such that: $\exists w \in \Sigma^*$ and two states $q_1$ and $q_2$ such that $r = c(\bar{q},w,q_1) - c(\bar{q},w,q_2)$. Consider a state-remainder tuple in $dta(G)$ reached by $w$ from the initial state, and assume that $q_2$ is the optimal state in that tuple, i.e., the one with zero remainder. Then the remainder associated to $q_1$ is $r$. Thus, $R \subseteq R'$. We next show that $R' = \tilde{R}$. Clearly $\tilde{R} \subseteq R'$. To prove the other inclusion we only need to show that the remainder $r$ generated by any string $u$ of length at least $n^2$ is generated by a string of length at most $n^2$. Let $p_1$ and $p_2$ be the paths of minimum cost in $G$, starting at $\bar{q}$, ending at $q_1$ and $q_2$ respectively, and each inducing $u$. Because $|u| \ge n^2$ and there are only $n^2$ distinct pairs of states in $G$, there exist two (not necessarily distinct) states, $q'_1$ and $q'_2$, in $p_1$ and $p_2$ respectively, and a partition of $u = xvz$, $v \in \Sigma^+$, such that $\{q'_1,q'_2\} \subseteq \delta(\bar{q},x)$, $q'_1 \in \delta(q'_1,v)$ and $q'_2 \in \delta(q'_2,v)$ (there are cycles at $q'_1$ and $q'_2$ inducing $v$), and, finally, $q_1 \in \delta(q'_1,z)$ and $q_2 \in \delta(q'_2,z)$. Since $q'_1$ and $q'_2$ are twins, we have that $c(\bar{q},u,q_1) - c(\bar{q},u,q_2) = c(\bar{q},\bar{u},q_1) - c(\bar{q},\bar{u},q_2)$, where $\bar{u} = xz$ is in $\Sigma^+$ and $|\bar{u}| < |u|$. If $\bar{u}$ is of the required length, we are finished; otherwise, we iterate the argument.

Theorem 3.8. Let $G$ be a WFA satisfying the twins property. In the general case, $\#dta(G) < 2^{n(2\log n + n^2 \log|\Sigma| + 1)}$; in the integer (or rational) case, $\#dta(G) < 2^{n(2\log n + 1 + \min(n^2 \log|\Sigma|,\,\rho))}$; and if $G$ is acyclic, $\#dta(G) < 2^{n\log|\Sigma|}$ independent of any assumptions on weights. The acyclic bound is tight (up to constant factors) for any alphabet.

Proof. Let $R$ be the set of remainders in $dta(G)$. Each state in $dta(G)$ is an $i$-tuple of states from $G$ with a corresponding $i$-tuple of remainders. In the worst case, each $i$-state tuple from $G$ will appear in $dta(G)$, and there are $|R|^i$ distinct $i$-tuples of remainders it can assume. (This overcounts by including tuples without any zero remainders.) Therefore,
$$\#dta(G) \le \sum_{i=1}^{n} \binom{n}{i} |R|^i \le (2|R|)^n.$$
Let $\tilde{R}$ be the set of remainders $r$ for which the following holds: $\exists w \in \Sigma^*$, $|w| \le n^2$, and two states $q_1$ and $q_2$, such that $r = c(\bar{q},w,q_1) - c(\bar{q},w,q_2)$.
By Lemma 3.7, $R \subseteq \tilde{R}$, so we can bound $|R|$ in different settings by bounding $|\tilde{R}|$.

General Case: The weights on the transitions of $G$ are incommensurable real numbers, i.e., they require infinite precision as binary numbers. Since each string induced by $G$ corresponds to at least one path in $G$, we have by definition of $\tilde{R}$ that the cardinality of this set is bounded by the number of distinct pairs of paths of length at most $n^2$. There are at most $\sum_{i=1}^{n^2} m^i < m^{n^2+1}$ such paths, where $m$ is the number of edges in $G$. Therefore $|\tilde{R}| < m^{2(n^2+1)}$. On the other hand, the number of strings of length at most $n^2$ is bounded by $|\Sigma|^{n^2}$. Since each of those strings can reach a pair of (not necessarily distinct) states in $G$, we have that $|\tilde{R}| < n^2 |\Sigma|^{n^2}$. But $|\Sigma| \le m$, so $n^2 |\Sigma|^{n^2}$ is a tighter bound on $|\tilde{R}|$. Our first estimate follows.

Integer Case: The weights are non-negative integers. Fix a state $q$ and a string $w$ that reaches $q$ from the initial state. Then $c(\bar{q},w,q)$ is in $[0, n^2 C] \cap \mathbb{Z}$. Therefore, the remainders in $\tilde{R}$ must also be in that range. It follows that $(2|R|)^n < (2n^2 C)^n = 2^{n(2\log n + \rho + 1)}$. Since the topological bound on $|\tilde{R}|$ we derived for the general case does not depend on the magnitude of weights and it holds also for the case we are considering, we have that $(2|R|)^n < 2^{n(2\log n + 1 + \min(n^2 \log|\Sigma|,\,\rho))}$. Our second estimate follows. Notice that this result also holds for the case in which the weights are

rational numbers represented by $\rho$ bits.

Acyclic Case: The graph underlying $G$ is acyclic. Thus, each string induced by $G$ is of length at most $n$. There are at most $|\Sigma|^n = 2^{n\log|\Sigma|}$ such strings. Each of the strings induced by $G$ will reach exactly one state in $dta(G)$ (which is a deterministic automaton). Therefore, the number of states of $dta(G)$ is bounded by $2^{n\log|\Sigma|}$. Tightness follows from Theorem 5.10.

Processing each tuple of state-remainders generated by DTA takes $O(|\Sigma|(n+m))$ time, excluding the cost of arithmetic and min operations involving two weights and/or remainders, yielding the following.

Theorem 3.9. Let $G$ be a WFA satisfying the twins property. DTA takes $O(|\Sigma|(n+m)\,\#dta(G))$ time using the R-RAM and $O(\rho|\Sigma|(n+m)\,\#dta(G))$ time using the CO-RAM. For the general case, using the R-RAM, the time is $O(|\Sigma|(n+m)\,2^{n(2\log n + n^2 \log|\Sigma| + 1)})$. For the (rational or) integer case, using the CO-RAM, the time is $O(\rho|\Sigma|(n+m)\,2^{n(2\log n + 1 + \min(n^2 \log|\Sigma|,\,\rho))})$. For the acyclic case, the time is $O(|\Sigma|(n+m)\,2^{n\log|\Sigma|})$ using the R-RAM and $O(\rho|\Sigma|(n+m)\,2^{n\log|\Sigma|})$ using the CO-RAM.

Theorems 3.8 and 3.9 do not require $G$ to be unambiguous. DTA terminates within the stated resource bounds on any WFA that has the twins property.

Consider in the integer case the interplay between the growth of $G$ when determinized, the time complexity of the algorithm, and the magnitude of the weights. In the acyclic case first, we have that $\#G' \le S \le 2^{n\log|\Sigma|}$, where $S$ is the number of distinct strings accepted by $G$. In some sense, $S$ gives the expressive power of $G$, i.e., how much information is compactly stored in $G$ with the aid of nondeterminism. For small weights, i.e., $\rho \le n\log|\Sigma|$, the worst-case time complexity of the algorithm is dominated by the number of strings accepted by $G$. Therefore, we can actually uncompact some or all of the information contained in $G$ by eliminating nondeterminism.
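The interplay between the weight size $\rho$ and the topological term can be tabulated numerically. The sketch below returns the base-2 logarithm of the state-count upper bounds, assuming the exponents read $n(2\log n + n^2\log|\Sigma| + 1)$ in the general case, $n(2\log n + 1 + \min(n^2\log|\Sigma|, \rho))$ in the integer case, and $n\log|\Sigma|$ in the acyclic case; the function name `log2_state_bound` and its parameters are hypothetical.

```python
import math

def log2_state_bound(n, sigma, rho=None, acyclic=False):
    """log2 of the upper bound on the number of determinized states.

    n: number of states of the input WFA; sigma: alphabet size;
    rho: bits per weight (None selects the general, real-weighted case).
    """
    if acyclic:
        return n * math.log2(sigma)
    topological = n * n * math.log2(sigma)    # the n^2 log|Sigma| term
    if rho is None:
        return n * (2 * math.log2(n) + topological + 1)
    return n * (2 * math.log2(n) + 1 + min(topological, rho))

# Once rho exceeds n^2 log|Sigma|, larger weights no longer change the bound.
for rho in (3, 16, 100):
    print(rho, log2_state_bound(4, 2, rho=rho))
```

For `n = 4` and a binary alphabet the topological term is 16, so the printed exponent stops growing once `rho` reaches 16.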
On the other hand, when $\rho > n\log|\Sigma|$, the bigger weights add no information and actually slow down the algorithm to the point that, for very large weights, the arithmetic and logic operations dominate the cost of the entire algorithm. For the cyclic case, the situation is analogous, with weights playing an even more prominent role. Let $\rho_{\max} = n^2 \log|\Sigma|$. For $\rho < \rho_{\max}$, the estimate of $\#dta(G)$ depends on $\rho$, although we do not know how tight that estimate is. For $\rho \ge \rho_{\max}$, the expansion of $G$ depends only on its topology, but the large weights slow down the algorithm.

3.4. Computing Worst-Case Weighting. The results of Section 3.3 can be used to generate hard instances for any determinization algorithm. Let $G$ be a WFA. A reweighting function (or simply reweighting) $f$ is such that, when applied to $G$, it preserves the topology and labeling of $G$, but possibly changes the weights on its transitions. We want to determine a reweighting $f$ such that $\min(f(G))$ exists and $\#\min(f(G))$ is maximized among reweightings for which $\min(f(G))$ exists. We restrict attention to the integer case and, without loss of generality, we assume that $G$ is trim and unambiguous. Theorem 3.8 shows that for weights to have an effect on the growth of $dta(G)$, it must be that $\rho \le n^2 \log|\Sigma|$. Set $\rho_{\max} = n^2 \log|\Sigma|$. To find the required reweighting, we simply consider all possible weight assignments to $G$ satisfying the twins property and requiring at most $\rho_{\max}$ bits, choosing the one that leads to the minimum deterministic equivalent of maximum size. There are $(2^{\rho_{\max}})^m = 2^{m\rho_{\max}}$ possible reweightings, and it takes $2^{O(n(2\log n + n^2\log|\Sigma|))}$ time to compute the size expansion or decide that the resulting machine cannot be determinized. The total time is thus

bounded by $2^{O(n(2\log n + n^2\log|\Sigma|) + m\rho_{\max})}$.

Fig. 4.1. A nondeterministic finite-state automaton accepting language $L = \bigcup_{i=1}^{n} (\Sigma - \{a_i\})^n$. Arcs labeled $\tilde{a}_i$ denote transitions on all symbols in $\Sigma - \{a_i\}$.

4. Hot Automata. This section provides a family of acyclic, multi-partite WFAs that are hot: when determinized, they expand independently of the weights on their transitions. Given some alphabet $\Sigma = \{a_1,\ldots,a_n\}$, consider the language
$$L = \bigcup_{i=1}^{n} (\Sigma - \{a_i\})^n;$$
i.e., the set of all $n$-length strings that do not include all symbols from $\Sigma$. It is simple to obtain an acyclic, multi-partite NFA $H$ of poly($n$) size that accepts $L$. (See Figure 4.1.) One can also show that the minimal DFA accepting $L$ has $\Theta(2^{n+\log n})$ states. Furthermore, we can construct $H$ so that these bounds hold for a binary alphabet: encode the symbols in $\Sigma$ as binary strings of length $\log n$, and replace arcs in the above NFA with $n$-vertex, $(\log n)$-depth trees appropriately. $H$ corresponds to a WFA with all arcs weighted identically. Since acyclic WFAs satisfy the twins property, they can always be determinized. Altering the weights can only increase the expansion.

Continuing, Kintala and Wotschke [8] provide a set of NFAs that produces a hierarchy of expansion factors when determinized. Consider the set of languages $L_{h,k} = \{xy \mid x,y \in \{0,1\}^*;\ |x| \le k;\ |y| = k;\ x \text{ has at most } h \text{ 1s in it}\}$ for $h < k$. They show that for each $L_{h,k}$, there is an $O(k^2)$-state acyclic (but not multi-partite) NFA that accepts $L_{h,k}$, yet any DFA accepting $L_{h,k}$ must have at least $\sum_{i=0}^{\log h} \binom{k}{i}$ states. These provide additional examples of hot WFAs.

5. Weight-Dependent Automata. In this section we address the effect of weights on the size of the deterministic equivalent of an input WFA. We study a simple family of WFAs with multi-partite, acyclic topology. When the arcs are all weighted zero, all WFAs in this family shrink when determinized.
We show, however, that even though the topology is by itself very benign, certain weightings can cause exponential increases in size when the WFA is determinized. This study is related in spirit to works that measure amounts of nondeterminism and ambiguity in finite automata [4, 5,

8]. We first discuss the case of a binary alphabet and then generalize to arbitrary alphabets. In this section, we use the terms automaton and graph interchangeably.

Fig. 5.1. Topology of the $k$-layer rail graph.

Fig. 5.2. The result of determinizing $RG(k)$ when all arc weights are 0. In the result, all arcs are again weighted 0, and the remainders in the vertices are all 0; these values are omitted from the figure.

5.1. The Rail Graph. We denote by $RG(k)$ the $k$-layer rail graph. See Figure 5.1. $RG(k)$ has $2k + 1$ vertices, which we denote by $\{0,T_1,B_1,\ldots,T_k,B_k\}$. There are arcs $(0,T_1,a)$, $(0,T_1,b)$, $(0,B_1,a)$, $(0,B_1,b)$, and then, for $1 \le i < k$, arcs $(T_i,T_{i+1},a)$, $(T_i,T_{i+1},b)$, $(B_i,B_{i+1},a)$, and $(B_i,B_{i+1},b)$. (It should be clear from Figure 5.1 that $T$ stands for "top" and $B$ for "bottom.") Note that $RG(k)$ is $(k+1)$-partite and also has fixed in- and out-degrees. (All vertices have in- and out-degrees 2, except the root, which has in-degree 0 and out-degree 4, and the vertices $T_k$ and $B_k$, which have out-degree 0.)

If we consider the strings induced by paths from 0 to either $T_k$ or $B_k$, then the language of $RG(k)$ is the set of strings $L_{RG}(k) = \{a,b\}^k$. The only nondeterministic choice is at the state 0, where either the top or bottom rail may be selected. Hence a string $w$ can be accepted by one of two paths, one following the top rail and the other the bottom rail. Technically, the rail graph is ambiguous. We can easily disambiguate $RG(k)$ by adding transitions from $T_k$ and $B_k$, each on a distinct symbol, to a new final state. Our results extend to this case. For clarity of presentation, however, we discuss the ambiguous rail graph.

The rail graph is weight-dependent. In Section 5.2 we provide weightings such that DTA produces the trivial $(k+1)$-vertex series-parallel graph. (See Figure 5.2 for an example.) On the other hand, in Section 5.3 we exhibit weightings for the rail graph such that, when input to DTA, we get an exponential increase in the number of states. (See Figures 5.3 and 5.4 for an example.)
Notice that we cannot get more than $2^k$ vertices, one per string in $L_{RG}(k)$, in the last layer of the determinized graph, and thus the weighting in Figure 5.3 is in some sense worst case. In that section we also explore the relationship between the magnitude of the weights and the amount of expansion that is possible. In Section 5.4, we show that random weightings induce the behavior of worst-case weightings. We discuss variants of the rail graph in Section 5.5, and finally, in Section 5.6 we generalize the rail graph to arbitrary alphabets.
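The topology of the rail graph is simple enough to build and check mechanically. The following sketch constructs the arcs of $RG(k)$ and counts, for a given string, the paths from the root that induce it; the function names `rail_graph` and `count_paths` and the vertex encoding (`0` for the root, `("T", i)` and `("B", i)` for the rails) are hypothetical choices made for illustration.

```python
from itertools import product

def rail_graph(k, symbols="ab"):
    """Return the unweighted arcs (source, target, symbol) of RG(k)."""
    arcs = []
    for rail in ("T", "B"):
        for s in symbols:
            arcs.append((0, (rail, 1), s))            # root into layer 1
        for i in range(1, k):
            for s in symbols:
                arcs.append(((rail, i), (rail, i + 1), s))
    return arcs

def count_paths(arcs, word, start=0):
    """Number of distinct paths from start that induce word."""
    frontier = {start: 1}
    for s in word:
        nxt = {}
        for src, dst, sym in arcs:
            if sym == s and src in frontier:
                nxt[dst] = nxt.get(dst, 0) + frontier[src]
        frontier = nxt
    return sum(frontier.values())

k = 3
arcs = rail_graph(k)
vertices = {0} | {v for _, v, _ in arcs}
paths_per_word = {"".join(w): count_paths(arcs, w) for w in product("ab", repeat=k)}
```

With `k = 3` the construction gives 7 vertices and 12 arcs, and every string in $\{a,b\}^3$ is induced by exactly two paths, one per rail.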

Fig. 5.3. Worst-case weighting of $RG(k)$. Arc label $\sigma/w$ means the arc is labeled with symbol $\sigma$ and has weight $w$.

Fig. 5.4. Result of determinizing $RG(5)$, weighted as in Figure 5.3. States have been renamed. All arcs are weighted 0. The remainders are not shown.

5.2. A Framework for Examining Weightings of $RG(k)$. Consider determinizing $RG(k)$ with DTA. The set of states reachable on any string $w = \sigma_1 \cdots \sigma_j$ of length $j \le k$ is $\{T_j,B_j\}$. For a given weighting function $c$, let $c_T(w)$ denote the cost of accepting string $w$ if the top path is taken; i.e.,
$$c_T(w) = c(0,\sigma_1,T_1) + \sum_{i=1}^{j-1} c(T_i,\sigma_{i+1},T_{i+1}).$$
Analogously define $c_B(w)$ to be the corresponding cost along the bottom path. Let $R(w)$ be the remainder vector for $w$, which is a pair of the form $(0,\,c_B(w) - c_T(w))$ or $(c_T(w) - c_B(w),\,0)$. A state at layer $0 < i \le k$ in the determinized WFA is labeled $(\{T_i,B_i\}/R(w))$ for any string $w$ leading to that state; i.e., all strings leading to a particular state induce the same remainder vector. Two strings $w_1$ and $w_2$ of identical length lead to distinct states in the determinized version of the rail graph if and only if $R(w_1) \ne R(w_2)$. It is convenient simply to write $R(w) = c_T(w) - c_B(w)$. The sign of $R(w)$ then determines which of the two forms $(0,x)$ or $(x,0)$ of the remainder vector occurs.

Suppose that $w$ has length $j$ and can be written $w'\sigma$, where $\sigma \in \{a,b\}$. Let $r_i^T(\sigma)$ denote the weight on the (top) arc labeled $\sigma$ into vertex $T_i$ and $r_i^B(\sigma)$ denote the weight on the (bottom) arc labeled $\sigma$ into vertex $B_i$. Then we can write $R(w) = R(w') + r_j^T(\sigma) - r_j^B(\sigma)$. Abbreviating $r_i^T(\sigma) - r_i^B(\sigma)$ by $\delta_i(\sigma)$, we have
$$R(w) = \sum_{i=1}^{j} \delta_i(\sigma_i).$$
The rail graph with a specific weighting can also be regarded as a function that hashes a $k$-bit string $w$ into a number $R(w)$. Define symbol $a$ to be 0 and symbol $b$ to be

1, so that string $w$ can be viewed as a sequence of bits $b_1,\ldots,b_k$. We can write $R(b_1,\ldots,b_k) = R(b_1,\ldots,b_{k-1}) + \delta_k(b_k)$. Also, we can write $\delta_k(b_k) = b_k\delta_k(1) + (1 - b_k)\delta_k(0)$. Rearranging gives $\delta_k(b_k) = \delta_k(0) + b_k(\delta_k(1) - \delta_k(0))$. Summing over all $i$ gives
$$R(w) = \sum_{i=1}^{k} \bigl(\delta_i(0) + b_i(\delta_i(1) - \delta_i(0))\bigr).$$
Alternatively,
$$R(w) = \bar{R} + \sum_{i=1}^{k} b_i(\delta_i(1) - \delta_i(0)), \qquad (5.1)$$
where $\bar{R} = \sum_{i=1}^{k} \delta_i(0)$ is fixed for a given weighting function on $RG(k)$.

Theorem 5.1. There is a reweighting $f$ such that both $dta(f(RG(k)))$ and $\min(f(RG(k)))$ realize the topology of the $(k+1)$-vertex trivial series-parallel graph (exemplified in Figure 5.2).

Proof. Any weighting in which $\delta_i(a) = \delta_i(b)$ for $i = 1$ to $k$ suffices, since in this case $R(w_1) = R(w_2)$ for all pairs of strings $\{w_1,w_2\}$ of identical length. In particular, giving zero weights suffices.

5.3. Worst-Case Weightings of $RG(k)$. See Figures 5.3 and 5.4.

Theorem 5.2. For any $j \in [0,k] \cap \mathbb{Z}$ there exists a reweighting $f$ such that $dta(f(RG(k)))$ has the following form: Layers 0 through $j$ form the complete binary tree on $2^{j+1} - 1$ vertices, and the remaining layers $j + 1$ through $k$ consist of trivial series-parallel graphs, each rooted at a leaf of the tree.

Proof. Choose any weighting such that $\delta_i(a) = 2^i$ and $\delta_i(b) = 0$ for $1 \le i \le j$, and let $\delta_i(a) = \delta_i(b) = 0$ for $j < i \le k$. Consider a pair of strings $w_1, w_2$ of identical length that differ in some position $i \le j$. Let $\bar{\sigma}^1_i = 1$ if the $i$th symbol of $w_1$ is $a$ and $\bar{\sigma}^1_i = 0$ otherwise; similarly define $\bar{\sigma}^2_i$ with respect to $w_2$. Then we can write $R(w_1) = \sum_{i=1}^{j} \bar{\sigma}^1_i 2^i$ and $R(w_2) = \sum_{i=1}^{j} \bar{\sigma}^2_i 2^i$. Since $\bar{\sigma}^1_i \ne \bar{\sigma}^2_i$, $R(w_1)$ must differ from $R(w_2)$. Hence the two strings must lead to different states. If on the other hand they differ only in positions $i > j$ they will lead to the same state. There are $2^i$ strings that differ in positions 1 through $i$; thus for $1 \le i \le j$, there are $2^i$ distinct vertices in the $i$th layer of the graph. Since each vertex has out-degree 2, one arc for each symbol, the graph must have the desired form.
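The counting argument in the proof can be replayed numerically: since two strings of equal length reach the same determinized state exactly when their remainders $R(w) = \sum_i \delta_i(\sigma_i)$ coincide, the number of distinct remainders over all strings of length $i$ is the number of states in layer $i$. The sketch below is illustrative only; the names `remainder`, `layer_sizes`, and `tree_weighting` are hypothetical, and assigning the weight $2^i$ to symbol `a` rather than `b` is an assumption (the graph is symmetric in the two symbols).

```python
from itertools import product

def remainder(word, delta):
    """R(w): sum of delta[i][symbol] over positions i = 1..len(word)."""
    return sum(delta[i][s] for i, s in enumerate(word, start=1))

def layer_sizes(k, delta, symbols="ab"):
    """Distinct remainders (determinized states) in layers 1..k."""
    return [
        len({remainder(w, delta) for w in product(symbols, repeat=i)})
        for i in range(1, k + 1)
    ]

def tree_weighting(k, j):
    """delta_i(a) = 2**i and delta_i(b) = 0 for i <= j; zero afterwards."""
    return {i: {"a": 2 ** i if i <= j else 0, "b": 0} for i in range(1, k + 1)}

zero = {i: {"a": 0, "b": 0} for i in range(1, 5)}
print(layer_sizes(4, zero))                  # one state per layer
print(layer_sizes(5, tree_weighting(5, 3)))  # doubling up to layer j = 3
print(layer_sizes(5, tree_weighting(5, 5)))  # full binary tree
```

With `j = 3` and `k = 5` the layer sizes double through layer 3 and then stay at 8, matching the picture of a binary tree whose leaves each start a trivial series-parallel chain.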
Note that if we set all weights on the bottom rail to zero, and the weights $r_i^T(a) = 2^i$ and $r_i^T(b) = 0$ for all $1 \le i \le k$, we get a weighting that yields a complete binary tree of depth $k$ when DTA is applied. It is easy to show that the minimum deterministic graph preserving shortest paths, however, consists of a trivial series-parallel graph in which all edges have weight zero, corresponding to the lower rail. We can remedy this by choosing weights more judiciously.

Theorem 5.3. For any $j \in [0,k] \cap \mathbb{Z}$ there is a reweighting $f$ such that both $dta(f(RG(k)))$ and $\min(f(RG(k)))$ have the following form: Layers 0 through $j - 1$ form the complete binary tree on $2^j - 1$ vertices, and the remaining layers $j$ through $k$ form a trivial series-parallel graph with incoming arcs to the layer-$j$ vertex from each vertex at layer $j - 1$.

Fig. 5.5. Result of minimizing $dta(RG(6))$, weighting $RG(6)$ as follows: $r_i^T(a) = r_i^B(b) = 2^i$ for $1 \le i \le 4$; all other weights were 0. States have been renamed, and remainders are not shown.

See Figure 5.5. Theorem 5.3 is generalized by Theorem 5.10, and therefore we omit its proof here. Theorems 5.2 and 5.3 show that the bound obtained in Theorem 3.8 for the acyclic case is tight for binary alphabets.

We now address the sensitivity of the size expansion to the magnitude of the weights. Note that $RG(k)$ has $2k + 1$ vertices and $4k$ arcs, but we use $\Theta(k)$ bits to encode the weight of each arc in the proofs of Theorems 5.2 and 5.3; the input size of the WFA is thus $n = \Theta(k^2)$ bits. The determinized WFA has $2^{k+1} - 1$ states and $2^{k+1} - 2$ transitions. Again we need $\Theta(k)$ bits to encode the weight of each transition, so the bit size of the determinized WFA is $\Theta(k 2^k)$, or $\Theta(\sqrt{n}\,2^{\sqrt{n}})$ bits. So while the determinized WFA has exponentially more states than the original WFA, the size expansion in bits, while superpolynomial, is not exponential. We argue that exponential state-space expansion requires exponentially big weights for the rail graph.

Theorem 5.4. Let $f$ be a reweighting. If $\#dta(f(RG(k))) = \Omega(2^k)$, then $\Omega(k^2)$ bits are required to represent $f(RG(k))$.

Proof. Consider the $\Omega(2^k)$ vertices at depth $k$ in the determinized graph. Each such state is labeled by a distinct $R(w)$ for some string $w = \sigma_1 \cdots \sigma_k$. Hence if there are $\Omega(2^k)$ states in $dta(f(RG(k)))$, there must be $\Omega(2^k)$ distinct values of $R(w)$. In addition, there must be $\Omega(2^k)$ distinct values of the absolute value of $R(w)$. Recalling the formulation of $R(w)$ from Equation (5.1), there can be at most $2^{k'}$ distinct values of $R(w)$, where $k'$ is the number of distinct values of $(\delta_i(1) - \delta_i(0))$. Each value may be included in the sum or not, and at best a choice of inclusions and exclusions will lead to a unique sum.
Therefore, the assumption of $\Omega(2^k)$ distinct remainders implies there must be $\Omega(k)$ distinct values of the $(\delta_i(1) - \delta_i(0))$. Now ignore the $k/2$ low-order bits of the absolute values of the remainders, and consider only the remaining high-order bits. There must be $\Omega(2^{k/2})$ distinct values induced by the high-order bits, or else there cannot be $\Omega(2^k)$ distinct values overall. By the same argument as above, there must be $\Omega(k/2)$ distinct values of the $(\delta_i(1) - \delta_i(0))$ that affect the high-order bits of a large remainder, i.e., one with one of its high-order

$k/2$ bits set to 1. For a particular $(\delta_i(1) - \delta_i(0))$ to affect a high-order bit of a large remainder, $(\delta_i(1) - \delta_i(0))$ must have a non-zero bit at least as high as position $k/2 - \log k$. This is only true if one of the four arc weights for layer $i$ has a non-zero bit at least that high. Therefore, $\Omega(k)$ arc weights require some non-zero bit at least as high as position $k/2 - \log k$. Hence $\Omega(k^2)$ bits are required to represent all the arc weights.

Corollary 5.5. Let $f$ be a reweighting. If $\#\min(f(RG(k))) = \Omega(2^k)$, then $\Omega(k^2)$ bits are required to represent $f(RG(k))$.

Proof. Theorem 5.4 applies, because $\#\min(f(RG(k))) = \Omega(2^k)$ implies that $\#dta(f(RG(k))) = \Omega(2^k)$.

Finally, consider the following analogy between the hot graphs in Section 4 and the rail graph. Observe that the hot graphs in Section 4 contain some nondeterministic choices that cannot be resolved until the end of the input. This causes the respective deterministic expansions. In those graphs, these choices are part of the strings being accepted. The rail graph manifests this same phenomenon, but in terms of weights rather than strings. The weighted variants of the rail graph that expand when determinized do so because it is not clear until the end of the expansion which rail will provide the shorter path: at any point, the choice of top or bottom rail depends on the symbols that follow. Therefore, the determinization must maintain enough state information to provide for all possible outcomes. Furthermore, in the non-minimizable cases, whereas the language $L_{RG}(k)$ itself could be accepted by a $(k+1)$-state DFA, the weights on $RG(k)$ necessitate an exponential number of states and arcs in any deterministic WFA that induces all the appropriate path lengths.

5.4. Random Weightings of $RG(k)$. An $i$-bit reweighting function (or simply $i$-bit reweighting) is a reweighting function $f$ such that the weights on the arcs of $f(G)$ are constrained to be in $[0, 2^i - 1] \cap \mathbb{Z}$.
A function $f_R$ is a random reweighting function (or simply random reweighting) if and only if it chooses the weights to assign to the transitions of $G$ uniformly and independently at random from $\mathbb{R}^+ \cup \{0,\infty\}$. Finally, let $x \in_R Y$ denote that $x$ is selected uniformly and independently at random from set $Y$, and let $E[X]$ denote the expected value of some random variable $X$. We need the following technical claim.

Claim 5.6. Let $X,Y,U,V \in_R [0, 2^k - 1]$. Then
$$\max_{-2^{k+1}+1 < i < 2^{k+1}-1} \Pr(X - Y - (U - V) = i) \le \frac{2}{3 \cdot 2^k} + O\!\left(1/4^k\right).$$

Proof. See Appendix A.

Theorem 5.7. Let $f_R$ be a random $k$-bit reweighting. $E[\#dta(f_R(RG(k)))] = \Theta(2^k)$.

Proof. As before, let $R(w)$ be the remainder induced by string $w$ of length $k$; i.e., the difference between the cost of the upper path and the cost of the lower path that respectively induce $w$. Let $\delta_i(\sigma)$ be the cost of the (top) arc labeled $\sigma$ into vertex $T_i$ minus the cost of the (bottom) arc labeled $\sigma$ into vertex $B_i$. Recall from Equation (5.1) that the rail graph with a specific weighting can be regarded as a function that hashes a $k$-bit string $w = \sigma_1 \cdots \sigma_k$ into a number $R(w)$. Suppose $w_1 \ne w_2$. We can adapt a standard analysis from the theory of universal hash functions [7] to calculate the probability that $R(w_1) = R(w_2)$. Let $w_1 = \alpha_1 \cdots \alpha_k$ and $w_2 = \beta_1 \cdots \beta_k$. Without loss of generality, assume $\alpha_k \ne \beta_k$. (The strings must