ON THE DETERMINIZATION OF WEIGHTED FINITE AUTOMATA

To appear in SIAM Journal on Computing. © SIAM

ADAM L. BUCHSBAUM, RAFFAELE GIANCARLO, AND JEFFERY R. WESTBROOK

Abstract. We study the problem of constructing the deterministic equivalent of a nondeterministic weighted finite-state automaton (WFA). Determinization of WFAs has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also give upper bounds on the size of the deterministic equivalent; the bound is tight in the case of acyclic WFAs. Previously, Mohri presented a super-polynomial time algorithm to test for the twins property, and he also gave an algorithm to determinize WFAs. He showed that the latter runs in time linear in the size of the output when a deterministic equivalent exists; otherwise, it does not terminate. Our bounds imply an upper bound on the running time of this algorithm. Given that WFAs can expand exponentially in size when determinized, we explore why those that occur in ASR tend to shrink when determinized. According to ASR folklore, this phenomenon is attributable solely to the fact that ASR WFAs have simple topology, in particular, that they are acyclic and layered. We introduce a very simple class of WFAs with this structure, but we show that the expansion under determinization depends on the transition weights: some weightings cause them to shrink, while others, including random weightings, cause them to expand exponentially. We provide experimental evidence that ASR WFAs exhibit this weight dependence. That they shrink when determinized, therefore, is a result of favorable weightings in addition to special topology. These analyses and observations have been used to design a new, approximate WFA determinization algorithm, reported in a separate paper along with experimental results showing that it achieves significant WFA size reduction with negligible impact on ASR performance.

Key words. algorithms, rational functions and power series, speech recognition, weighted automata

AMS subject classifications. 0M5, 68Q5, 68Q45, 68T0

An extended abstract of this paper appears in Proc. 25th Int'l. Conf. on Automata, Languages, and Programming, 1998.
Adam L. Buchsbaum: AT&T Labs, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932, USA, alb@research.att.com.
Raffaele Giancarlo: Dipartimento di Matematica ed Applicazioni, Università di Palermo, Via Archirafi 34, 90123 Palermo, Italy, raffaele@altair.math.unipa.it. Work supported by AT&T Labs.
Jeffery R. Westbrook: 20th Century Television, Los Angeles, CA 90035, USA, jwestbrook@acm.org. Work completed while a member of AT&T Labs.

1. Introduction. Finite-state machines and their relation to rational functions and power series have been extensively studied [,,, 9] and widely applied in fields ranging from image compression [0,,, 7] to natural language processing [, 0,, 8, 0]. A subclass of finite-state machines, the weighted finite-state automata (WFAs), has recently assumed new importance, because WFAs provide a powerful method for representing and manipulating models of human language in automatic speech recognition (ASR) systems [, 4]. This new research direction also raises a number of challenging algorithmic questions [5].

A weighted finite-state automaton (WFA) is a nondeterministic finite automaton (NFA), $A$, that has both an alphabet symbol and a weight, from some set $K$, on each transition. Let $R = (K, \oplus, \otimes, \bar{0}, \bar{1})$ be a semiring. Then $A$ together with $R$ generates a partial function from strings to $K$: the value of an accepted string is the semiring sum over accepting paths of the semiring product of the weights along each accepting path. A partial function that can be generated this way is a rational power series [9]. An example important to ASR is the set of WFAs with the min-sum semiring, $(\mathbb{R}_+ \cup \{0, \infty\}, \min, +, \infty, 0)$, which compute for each accepted string the minimum cost accepting path.

In this paper, we study problems related to the determinization of WFAs. A deterministic, or sequential, WFA has at most one transition with a given input symbol out of each state. Not all rational power series can be generated by deterministic WFAs. A determinization algorithm takes as input a WFA and produces a deterministic WFA that generates the same rational power series, if such a deterministic WFA exists. The importance of determinization to ASR is well established [0,, 4].

To the best of our knowledge, Mohri [0] presented the first determinization procedure for WFAs, extending the seminal ideas of Choffrut [8, 9] and Weber and Klemm [] regarding string-to-string transducers. Mohri gives a determinization procedure with three phases. First, $A$ is converted to an equivalent unambiguous, trim WFA $A_t$, using an algorithm analogous to one for NFAs []. (We define unambiguous and trim below.) Mohri then gives an algorithm, TT, that determines if $A_t$ has the twins property (also defined below). If $A_t$ does not have the twins property, then there is no deterministic equivalent of $A$. If $A_t$ has the twins property, a second algorithm of Mohri's, DTA, can be applied to $A_t$ to yield $A'$, a deterministic equivalent of $A$. Algorithm TT runs in $O(m^{4n^2})$ time, where $m$ is the number of transitions and $n$ the number of states in $A_t$. Algorithm DTA runs in time linear in the size of $A'$. In practice, DTA is run directly on $A$, which is assumed to admit a deterministic equivalent; conversion to $A_t$ and testing for twins are theoretical steps needed to make the procedure well defined.

Mohri observes that $A'$ can be exponentially larger than $A$, because WFAs include classical NFAs. He gives no upper bound on the worst-case state-space expansion, however, and because of the weights on transitions, the classical NFA upper bound does not apply. Finally, Mohri gives an algorithm that takes a deterministic WFA and outputs the minimum-size equivalent, deterministic WFA.
We present several results related to the determinization of WFAs. In Section 3 we give the first polynomial-time algorithm to test whether an unambiguous, trim WFA satisfies the twins property. It runs in $O(m^2 n^6)$ time. We then provide a worst-case time complexity analysis of DTA. The number of states in the output deterministic WFA is at most $2^{n(2\log n + n^2 \log|\Sigma| + 1)}$, where $\Sigma$ is the input alphabet. If the weights are rational, this bound becomes $2^{n(2\log n + 1 + \min(n^2 \log|\Sigma|,\, \rho))}$, where $\rho$ is the maximum bit-size of a weight. When the input WFA is acyclic, the bound becomes $2^{n \log|\Sigma|}$. The acyclic bound holds for real weights, and it is tight (up to constant factors) for any alphabet size.

It remains open whether there exists a polynomial-time procedure to determine whether an arbitrary WFA admits a deterministic equivalent, because the determinization process above requires converting a WFA to an unambiguous equivalent prior to testing for twins.

In Sections 4-6 we study questions motivated by the use of WFA determinization in ASR [, 4]. Although determinization causes exponential state-space expansion in the worst case, in ASR systems the determinized WFAs are often smaller than the input WFAs [0], and they are seldom very large. This is fortunate, because the performance of ASR systems depends directly on WFA size [, 4]. Folklore within the ASR community credits this phenomenon entirely to the special topology of ASR WFAs. (The topology of a WFA is its underlying directed graph and labeling by input symbols, ignoring weights.) ASR WFAs tend to be acyclic and layered. Such a WFA always admits a deterministic equivalent. The role that the transition weights might play in controlling expansion under determinization has not been considered.

In Section 4 we study the role of topology in expansion under determinization.

We exhibit a class of layered, acyclic WFAs whose minimum equivalent deterministic WFAs are exponentially larger regardless of weighting. The languages accepted by these WFAs are quite unnatural, however.

In Section 5 we study the role of transition weights in expansion under determinization. We introduce a class of nondeterministic WFAs, RG. Each WFA in this class has an extremely simple multi-partite, acyclic topology, accepts a very trivial language, and in the absence of weights (i.e., with all weights set to zero) has a smaller deterministic equivalent. We show, however, that for any $A \in RG$ and any $i \le n$, there exists an assignment of weights to the transitions of $A$ such that the minimal equivalent deterministic WFA has $\Theta(2^{i \log|\Sigma|})$ states. This gives a lower bound to match the upper bounds of Section 3. Using ideas from universal hashing, we also show that similar results hold when the weights are random $i$-bit numbers. This motivates us to examine experimentally the effect of varying weights on actual WFAs from ASR applications.

In Section 6 we give the results of these experiments. We call a WFA weight-dependent if its expansion under determinization is strongly determined by its weights. Most of the examples from ASR were weight-dependent. These experimental results together with the theory we develop show that the folklore explanation is insufficient: ASR WFAs shrink under determinization because both the topology and weighting tend to be favorable.

Some of our results help explain the nature of WFAs from the algorithmic point of view, i.e., how weights assigned to the transitions of a WFA can affect the performance of algorithms manipulating it. Others relate directly to the theory of weighted automata.

We have used our results to design an approximate variant of Mohri's determinization algorithm. We describe this algorithm separately [6], along with experimental results showing that it achieves size reductions in ASR language models that significantly exceed those of previous methods, with negligible effects on ASR performance (time and accuracy).
2. Definitions and Terminology. Given a semiring $(K, \oplus, \otimes, \bar{0}, \bar{1})$, a weighted finite automaton (WFA) is a tuple $G = (Q, \bar{q}, \Sigma, \delta, Q_f)$. $Q$ is the set of states, $\bar{q} \in Q$ is the initial state, $\Sigma$ is the set of symbols, $\delta \subseteq Q \times \Sigma \times K \times Q$ is the set of transitions, and $Q_f \subseteq Q$ is the set of final states. We assume that $|\Sigma| > 1$. A deterministic, or sequential, WFA has at most one transition $t = (q, \sigma, \nu, q')$ for any pair $(q, \sigma)$; a nondeterministic WFA can have multiple transitions on a pair $(q, \sigma)$, differing in target state $q'$.

The problems examined in this paper are motivated primarily by ASR applications, which work with the min-sum semiring, $(\mathbb{R}_+ \cup \{0, \infty\}, \min, +, \infty, 0)$, and we therefore limit further discussion to the min-sum semiring. (Some of the algorithms considered use subtraction. To be well-defined, therefore, they require a skew field. The min-sum semiring is indeed embedded in a skew field [6].)

Let $\vec{t} = (t_1, \ldots, t_\ell)$ be some sequence of transitions, such that $t_i = (q_{i-1}, \sigma_i, \nu_i, q_i)$; $\vec{t}$ induces string $w = \sigma_1 \cdots \sigma_\ell$. String $w$ is accepted by $\vec{t}$ if $q_0 = \bar{q}$ and $q_\ell \in Q_f$; $w$ is accepted by $G$ if some $\vec{t}$ accepts $w$. Let $c(t_i) = \nu_i$ be the weight of $t_i$. Then the weight of $\vec{t}$ is

$$c(\vec{t}) = \sum_{i=1}^{\ell} c(t_i).$$

Let $T(w)$ be the set of all sequences of transitions that accept string $w$. Then the weight of $w$ is

$$c(w) = \min_{\vec{t} \in T(w)} c(\vec{t}).$$
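The definition of $c(w)$ can be illustrated with a minimal sketch, assuming transitions are given as tuples (source, symbol, weight, target) mirroring $(q, \sigma, \nu, q')$. The function name `string_weight` and the automaton `EXAMPLE` are hypothetical and are not taken from the paper. Instead of enumerating $T(w)$ explicitly, the sketch keeps, for every state, the cheapest cost of reaching it on the prefix read so far, which yields the same minimum.

```python
import math
from collections import defaultdict


def string_weight(transitions, start, finals, word):
    """Return c(word) in the min-sum semiring, or math.inf if word is not accepted."""
    # Index transitions by (source state, symbol).
    out = defaultdict(list)
    for q, sym, w, q2 in transitions:
        out[(q, sym)].append((w, q2))

    # best[q] = minimum weight of a transition sequence from start to q
    # inducing the prefix of word consumed so far.
    best = {start: 0}
    for sym in word:
        nxt = {}
        for q, cost in best.items():
            for w, q2 in out[(q, sym)]:
                if cost + w < nxt.get(q2, math.inf):
                    nxt[q2] = cost + w
        best = nxt

    # Minimize over the final states that were reached.
    return min((c for q, c in best.items() if q in finals), default=math.inf)


# Hypothetical four-state automaton: two accepting paths for "ab",
# with weights 1 + 3 = 4 and 2 + 1 = 3.
EXAMPLE = [(0, "a", 1, 1), (0, "a", 2, 2), (1, "b", 3, 3), (2, "b", 1, 3)]
print(string_weight(EXAMPLE, 0, {3}, "ab"))
```

Under these assumptions the call evaluates the cheaper of the two accepting paths; a string with no accepting path yields `math.inf`, the zero element of the min-sum semiring.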

The weighted language of $G$ is the set $L(G) = \{(w, c(w)) \mid w \text{ is accepted by } G\}$; i.e., the weighted strings accepted by $G$. Intuitively, the weight on a transition of $G$ can be seen as the confidence one has in taking that transition. The weights need not, however, satisfy stochastic constraints, as do the probabilistic automata introduced by Rabin [6].

Fix two states $q$ and $q'$ and a string $v \in \Sigma^*$. Let $c(q, v, q')$ be the minimum of $c(\vec{t})$, taken over all transition sequences $\vec{t}$ from $q$ to $q'$ inducing $v$. We refer to $c(q, v, q')$ as the optimal cost of inducing $v$ from $q$ to $q'$. We generally abuse notation so that $\delta(q, w)$ can represent the set of states reachable from state $q \in Q$ on string $w \in \Sigma^*$. We extend the function $\delta$ to strings in the usual way: $q' \in \delta(q, v)$, $v \in \Sigma^+$, means that there is a sequence of transitions from $q$ to $q'$ inducing $v$.

The topology of $G$ is the projection $\pi_{Q \times \Sigma \times Q}(\delta)$: i.e., the transitions of $G$ without respect to the weights. We also refer to the topology of $G$ as the graph underlying $G$. A WFA is trim if every state appears in an accepting path for some string and no transition is weighted $\bar{0}$ ($\infty$ in the min-sum semiring). A WFA is unambiguous if there is exactly one accepting path for each accepted string.

Determinization of $G$ is the problem of computing a deterministic WFA $G'$ such that $L(G') = L(G)$, if such a $G'$ exists. We denote the output of algorithm DTA by $dta(G)$. We denote the minimal deterministic WFA accepting $L(G)$ by $min(G)$, if one exists. We say that $G$ expands if $dta(G)$ has more states and/or transitions than $G$.

Let $n = |Q|$ and $m = |\delta|$, and let the size of $G$ be $|G| = n + m$. We also use $\#G$ to mean $|Q|$, the number of states of $G$. We assume that each transition is labeled with exactly one symbol, so $|\Sigma| \le m$. Recall that the weights of $G$ are non-negative real numbers. Let $C$ be the maximum weight. In the general case, weights are incommensurable real numbers, requiring infinite precision. In the integer case, weights can be represented with $\rho = \lg C$ bits. We denote the integral range $[a, b]$ by $[a, b]_Z$.
The integer case extends to the case in which the weights are rationals requiring $\rho$ bits. We assume that in the integer and rational cases, weights are normalized to remove excess least-significant zero bits.

For our analyses, we use the RAM model of computation as follows. In the general case, we charge constant time for each arithmetic-logic operation involving weights (which are real numbers). We refer to this model as the R-RAM [5]. The relevant parameters for our analyses are $n$, $m$, and $|\Sigma|$. In the integer case, we also use a RAM, except that each arithmetic-logic operation now takes $O(\rho)$ time. We refer to this model as the CO-RAM []. The relevant parameters for the analyses are $n$, $m$, $|\Sigma|$, and $\rho$.

3. Determinization of WFAs.

3.1. An Algorithm for Testing the Twins Property.

Definition 3.1. Two states, $q$ and $q'$, of a WFA $G$ are twins if $\forall u, v \in \Sigma^*$ such that $q \in \delta(\bar{q}, u)$, $q' \in \delta(\bar{q}, u)$, $q \in \delta(q, v)$, and $q' \in \delta(q', v)$, the following holds: $c(q, v, q) = c(q', v, q')$. $G$ has the twins property if all pairs $q, q' \in Q$ are twins.

That is, if states $q$ and $q'$ are reachable from $\bar{q}$ by a common string, then $q$ and $q'$ are twins only if any string that induces a cycle at each induces cycles of equal optimal cost. Note that two states having no cycle on a common string are twins.

Mohri [0, Theorems  and ] proves that any WFA $G$ over the min-sum semiring is determinizable if it has the twins property; furthermore, if $G$ is trim and unambiguous, the twins property becomes a necessary and sufficient condition. For an example of a non-determinizable WFA, see Figure 3.1.
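To make Definition 3.1 concrete, the following is a brute-force sketch that searches for a witness violating the twins property among strings $v$ of bounded length. It is only an illustration under stated assumptions: it takes time exponential in the length bound and is not the polynomial-time algorithm developed in this section. The names `violates_twins`, `co_reachable_pairs`, and `optimal_cost`, the tuple encoding of transitions, and the two example automata are all hypothetical.

```python
import math
from collections import deque
from itertools import product


def index(transitions):
    """Map (source, symbol) to a list of (weight, target)."""
    out = {}
    for q, sym, w, q2 in transitions:
        out.setdefault((q, sym), []).append((w, q2))
    return out


def optimal_cost(out, src, word, dst):
    """c(src, word, dst): minimum weight over paths from src to dst inducing word."""
    best = {src: 0}
    for sym in word:
        nxt = {}
        for q, cost in best.items():
            for w, q2 in out.get((q, sym), ()):
                if cost + w < nxt.get(q2, math.inf):
                    nxt[q2] = cost + w
        best = nxt
    return best.get(dst, math.inf)


def co_reachable_pairs(out, start, alphabet):
    """Pairs (q, q') reachable from the start state by a common string u."""
    seen = {(start, start)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        for sym in alphabet:
            for _, p2 in out.get((p, sym), ()):
                for _, q2 in out.get((q, sym), ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return seen


def violates_twins(transitions, start, alphabet, max_len):
    """Return (q, q', v) with unequal optimal cycle costs on v, or None if no
    witness with 1 <= |v| <= max_len exists."""
    out = index(transitions)
    for p, q in co_reachable_pairs(out, start, alphabet):
        if p == q:
            continue
        for n in range(1, max_len + 1):
            for word in product(alphabet, repeat=n):
                cp = optimal_cost(out, p, word, p)
                cq = optimal_cost(out, q, word, q)
                # Both states must have a cycle inducing the same string.
                if cp < math.inf and cq < math.inf and cp != cq:
                    return p, q, "".join(word)
    return None


# Hypothetical automata: states 1 and 2 are both reached on "a" and both loop on "a".
NON_TWINS = [(0, "a", 0, 1), (0, "a", 0, 2), (1, "a", 0, 1),
             (2, "a", 1, 2), (1, "c", 0, 3), (2, "d", 0, 3)]
TWINS = [(0, "a", 0, 1), (0, "a", 0, 2), (1, "a", 0, 1),
         (2, "a", 0, 2), (1, "c", 0, 3), (2, "d", 0, 3)]
print(violates_twins(NON_TWINS, 0, "acd", 3))
print(violates_twins(TWINS, 0, "acd", 3))
```

In the first automaton the loops at states 1 and 2 carry different weights, so a witness on the string `a` is reported; in the second the loop weights agree and no witness is found within the length bound.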

[Figure 3.1: state diagram of G.]

Fig. 3.1. A nondeterministic, trim, unambiguous WFA $G$. Arcs labeled $\sigma/w$ correspond to transitions labeled $\sigma$ with weight $w$. For this and succeeding figures, the start state is the unique source, and final states are denoted by double circles. $G$ accepts the language $\{(a^n c, 0), (a^n d, n-1) \mid n > 0\}$. States 1 and 2 do not have the twins property: each is reachable from state 0 via string $a$, yet the costs of the cycles labeled $a$ at each differ. It is easily shown that no deterministic WFA can accept $L(G)$.

The twins property for WFAs is analogous to that defined by Choffrut [8, 9] and (in different terms) by Weber and Klemm [] to identify necessary and sufficient conditions for a string-to-string transducer to admit a sequential transducer realizing the same rational transduction. In spite of the strong analogy, the proof techniques used for WFAs differ from those used to obtain analogous results for string-to-string transducers. In particular, the efficient algorithm we derive here to test a WFA for twins is not related to the polynomial-time algorithm of Weber and Klemm [] for testing twins in string-to-string transducers.

We reduce the problem of testing the twins property to that of computing shortest paths on some suitably defined graphs, which we introduce next. Let $T_{\bar{q},\bar{q}}$ be the multi-partite acyclic, labeled, weighted graph having $2n^2$ layers and inductively defined as follows. The root vertex $\hat{r}$ is at layer zero and corresponds to $(\bar{q}, \bar{q})$. The vertices at layer one correspond to a subset of $Q \times Q$ obtained as follows: $\hat{r}$ is connected to a vertex $u$, corresponding to $(q_1, q_2)$, if and only if there are two distinct transitions $t_1 = (\bar{q}, a, c_1, q_1)$ and $t_2 = (\bar{q}, a, c_2, q_2)$ in $G$. The arc connecting $\hat{r}$ to $u$ is labeled with $a \in \Sigma$ and has cost $c = c_1 - c_2$. Assume that we have the vertices at layer $i-1$. The vertices at layer $i$ are obtained as follows. Let $u$ be the vertex at layer $i-1$ corresponding to $(q_1, q_2) \in Q \times Q$; $u$ is connected to $u'$, corresponding to $(q_1', q_2')$, at layer $i$ if and only if there are two distinct transitions $t_1 = (q_1, a, c_1, q_1')$ and $t_2 = (q_2, a, c_2, q_2')$ in $G$.
The arc connecting $u$ to $u'$ is labeled with $a \in \Sigma$ and has cost $c = c_1 - c_2$. This graph has $O(n^4)$ vertices and $O(m^2 n^4)$ arcs.

Let $(q, q')_i$ denote the vertex corresponding to $(q, q') \in Q \times Q$ at layer $i$ of $T_{\bar{q},\bar{q}}$, if any. Let $RT \subseteq \{(q, q') \mid q \ne q'\}$ be the set of pairs of distinct states of $G$ that are reachable from $(\bar{q}, \bar{q})_0$ in $T_{\bar{q},\bar{q}}$. For each $(q, q') \in RT$, define $T_{q,q'}$ analogously to $T_{\bar{q},\bar{q}}$. Notice that $T_{q,q'}$ has $O(n^4)$ vertices and $O(m^2 n^4)$ arcs. We need the following.

Lemma 3.2. Fix two distinct states $q$ and $q'$ of $G$. They can be reached from the initial state $\bar{q}$ of $G$ by the same string $z \in \Sigma^+$ if and only if there exists some string $z' \in \Sigma^i$, for some $i \le n^2$, such that $q$ and $q'$ are both reached from $\bar{q}$ using $z'$. In that case, there is at least one path in $T_{\bar{q},\bar{q}}$ that goes from $(\bar{q}, \bar{q})_0$ to $(q, q')_i$.

Proof. Fix a string $z \in \Sigma^+$, and assume that $q$ and $q'$ can be reached from $\bar{q}$ by $z$. Assume that $|z| > n^2$, or else we are done. Since there are only $n^2$ distinct pairs of states of $G$ and $|z| > n^2$, there must exist two states $q_1$ and $q_2$ and a string $v \in \Sigma^+$ such that (a) $z = xvu$; (b) $q_1$ (resp., $q_2$) is on a path from $\bar{q}$ to $q$ (resp., $q'$) inducing $z$; and (c) $q_1 \in \delta(q_1, v)$ (resp., $q_2 \in \delta(q_2, v)$). But then, $z' = xu$ also reaches both $q$ and $q'$ from $\bar{q}$. If $|z'| \le n^2$, we are done; otherwise we iterate the argument. The second part of the lemma follows by construction of $T_{\bar{q},\bar{q}}$.

Lemma 3.3. Let $G$ be trim and unambiguous. Fix a string $y \in \Sigma^i$, $i \le 2n^2 - 1$, and two distinct states $q$ and $q'$ of $G$. Then $q \in \delta(q, y)$ and $q' \in \delta(q', y)$ if and only if there is exactly one path $p$ in $T_{q,q'}$ that starts at $(q, q')_0$, ends at $(q, q')_{|y|}$, and induces $y$. Moreover, the cost of $p$ is $c(q, y, q) - c(q', y, q')$.

Proof. We prove the sufficient case; the necessary case should be clear from the construction of $T_{q,q'}$. First observe that, since $G$ is trim and unambiguous, the following holds: for each string $y \in \Sigma^+$ such that $q \in \delta(q, y)$, there is exactly one cycle starting and ending at $q$ and inducing $y$. Let $(q = q_0, q_1, q_2, \ldots, q_{|y|} = q)$ be the unique sequence of states of $G$ originating in $q$ and inducing $y$ in $G$. Therefore, $c(q, y, q)$ is the sum of the weights on the transitions in that sequence. Similarly define $(q' = q'_0, q'_1, q'_2, \ldots, q'_{|y|} = q')$. By the above construction, there exists a path $p$ in $T_{q,q'}$, consisting of the vertices $((q, q')_0, (q_1, q'_1)_1, \ldots, (q, q')_{|y|})$ and inducing $y$. This path must be unique, and its cost is $c(q, y, q) - c(q', y, q')$.

Lemma 3.4 ([0, Lemma ]). Let $G$ be a trim, unambiguous WFA. $G$ has the twins property if and only if $\forall u, v \in \Sigma^*$ such that $|uv| \le 2n^2 - 1$, the following holds: when there exist two states $q$ and $q'$ such that (i) $\{q, q'\} \subseteq \delta(\bar{q}, u)$, and (ii) $q \in \delta(q, v)$ and $q' \in \delta(q', v)$, then (iii) $c(q, v, q) = c(q', v, q')$ must follow.

Fix two distinct states $q$ and $q'$ of $G$.
Let $(q, q')_{i_1}, (q, q')_{i_2}, \ldots, (q, q')_{i_s}$, $0 < i_1 < i_2 < \cdots < i_s$, be all the occurrences of $(q, q')$ in $T_{q,q'}$, excluding $(q, q')_0$. This sequence may be empty. A symmetric sequence can be extracted from $T_{q',q}$. We refer to these sequences as the common cycles sequences of $(q, q')$. We say that $q$ and $q'$ satisfy the local twins property if and only if (a) their common cycles sequences are empty, or (b) zero is the cost of any shortest path from $(q, q')_0$ to $(q, q')_{i_j}$ in $T_{q,q'}$ and from $(q', q)_0$ to $(q', q)_{i_j}$ in $T_{q',q}$, for all $1 \le j \le s$.

Lemma 3.5. Let $G$ be a trim, unambiguous WFA. $G$ satisfies the twins property if and only if (i) $RT$ is empty or (ii) all $(q, q') \in RT$ satisfy the local twins property.

Proof. ($\Rightarrow$) Assume that $G$ satisfies the twins property. If $RT$ is empty, we are done. Assume then that $RT \ne \emptyset$. The proof is by contradiction. Assume that some $(q, q') \in RT$ does not satisfy the local twins property. The common cycles sequences of $(q, q')$ cannot be empty, or else they would satisfy the local twins property. By assumption, there exists some $j$ for which the cost of some shortest path from $(q, q')_0$ to $(q, q')_{i_j}$ in $T_{q,q'}$ is not zero, while the cost of a shortest path from $(q', q)_0$ to $(q', q)_{i_j}$ in $T_{q',q}$ may be any value, including zero (or vice versa). Fix any such shortest path $p$ in $T_{q,q'}$. According to Lemma 3.3, $p$ corresponds to cycles around $q$ and $q'$ that each induce the same string $y$, for some $y \in \Sigma^{i_j}$. Moreover, we must have $c(q, y, q) - c(q', y, q') \ne 0$. By definition of $RT$, $q$ and $q'$ are each reachable by some string $u$ from the initial state of $G$. Therefore, $G$ does not satisfy the twins property, which is a contradiction.

($\Leftarrow$) Assume that $RT$ is empty. Then, by Lemma 3.2, no two distinct states $q, q'$ of $G$ can both be reached by some string $z \in \Sigma^+$ from the initial state $\bar{q}$. Therefore, $G$ satisfies the twins property. Assume now that $RT$ is not empty. We have two subcases.

Subcase A) Assume that all states in $RT$ satisfy the local twins property because their common cycles sequences are empty. This implies that all pairs of distinct states reachable from the initial state of $G$ through the same string $z \in \Sigma^+$ do not have any cycles in common inducing identical strings. Thus, $G$ satisfies the twins property.

Subcase B) Assume that some states in $RT$ satisfy the local twins property and their common cycles sequences are not empty. Let $RT'$ be such set. Assume that $G$ does not satisfy the twins property. We derive a contradiction. Since $RT'$ is not empty, we have that the set of pairs of states for which (i) and (ii) are satisfied in Lemma 3.4 is not empty. But since $G$ does not satisfy the twins property, there must exist two distinct states $q$ and $q'$ and a string $uv \in \Sigma^*$, $|uv| \le 2n^2 - 1$, such that (i) both $q$ and $q'$ can be reached from the initial state of $G$ through string $u$; (ii) $q \in \delta(q, v)$ and $q' \in \delta(q', v)$; and (iii) $c(q, v, q) \ne c(q', v, q')$. We now argue that $(q, q')$ must be in $RT'$. Because $q \ne q'$ and $G$ has only one initial state, we have that $|u| \ge 1$. Thus, $|u| \le 2n^2 - 1$, implying that $(q, q') \in RT$. $v$ cannot be the empty string $\epsilon$ because $c(q, \epsilon, q) = c(q', \epsilon, q') = 0$. Since $|uv| \le 2n^2 - 1$, we have that $|v| \le 2n^2 - 1$. But then, by Lemma 3.3 and (ii) above, we have that $(q, q')_{|v|}$ can be reached from $(q, q')_0$ in $T_{q,q'}$ through the nonempty string $v$. Therefore, the common cycles sequences of $(q, q')$ cannot be empty, implying that $(q, q') \in RT'$. Without loss of generality, assume that $c(q, v, q) - c(q', v, q') < 0$. Since $|v| \le 2n^2 - 1$, we have by Lemma 3.3 that there is exactly one path $p$ in $T_{q,q'}$ starting at $(q, q')_0$, ending in $(q, q')_{|v|}$, inducing $v$, and with cost $c(q, v, q) - c(q', v, q') < 0$.
Since $p$ has negative cost, the cost of the shortest path from $(q, q')_0$ to $(q, q')_{|v|}$ in $T_{q,q'}$ cannot be zero, which contradicts that $q$ and $q'$ satisfy the local twins property and have non-empty common cycles sequences.

Our algorithm for testing whether a trim, unambiguous WFA has the twins property works as follows. First, compute $T_{\bar{q},\bar{q}}$ and the set $RT$. Then, for each pair of states $(q, q') \in RT$ that have not been processed yet: compute $T_{q,q'}$ and $T_{q',q}$, extract the common cycles sequences, and compute the single source (from the root) shortest paths to vertices in $T_{q,q'}$ and $T_{q',q}$.

Theorem 3.6. Let $G$ be a trim unambiguous WFA. In the general case, whether $G$ satisfies the twins property can be checked in $O(m^2 n^6)$ time using the R-RAM model of computation. In the integer case, the bound becomes $O(\rho m^2 n^6)$ using the CO-RAM model of computation.

Proof. Lemma 3.5 implies correctness. We now analyze the algorithm, starting with the general case. Recall that each arithmetic-logic operation can be done in constant time. $T_{\bar{q},\bar{q}}$ can be easily obtained in $O(m^2 n^4)$ time by visiting the automaton $G$. Now, visiting $T_{\bar{q},\bar{q}}$, we can obtain the set $RT$ in the same amount of time. Fix a pair of distinct states $q$ and $q'$ of $G$. It is sufficient to discuss how to compute shortest paths from the root vertex of $T_{q,q'}$ to the other vertices in the graph. Notice that the edges of $T_{q,q'}$ may have negative cost. However, $T_{q,q'}$ is a multi-partite acyclic graph. In that case, it is a simple exercise to show how to perform the required computation in time linear in the size of $T_{q,q'}$, i.e., $O(m^2 n^4)$ time. Since $|RT| = O(n^2)$, the total time of the algorithm is $O(m^2 n^6)$. For the integer case, we multiply the above bound by $\rho$.

We also mention, omitting the details, that the exponential-time algorithm for testing the twins property originally devised by Mohri [0] can be simplified and implemented to run in pseudo-polynomial time in the integer case. The algorithm we devise here is weakly polynomial in the integer case.

[Figure 3.2: (a) state diagram of A; (b) state diagram of the result of DTA.]

Fig. 3.2. (a) A nondeterministic weighted automaton, $A$. Arcs labeled $\sigma/w$ correspond to transitions labeled $\sigma$ with weight $w$. (b) The result of applying DTA to $A$. This is derived from Figures  and  of Mohri [0].

3.2. The DTA Algorithm. Mohri [0] describes a determinization algorithm for finite-state automata with weights drawn from a general semiring. What we refer to as DTA is that algorithm restricted to the min-sum semiring. DTA is a generalization of the classic power-set construction for finite automata. We describe the algorithm, starting with an example.

Consider the weighted automaton, $A$, in Figure 3.2(a). While $A$ is not unambiguous, it has the twins property, and so we can apply DTA directly to it, proceeding as follows. From the initial state $q_0$, we can reach states $q_1$ and $q_2$ using the input symbol . Analogously to the determinization of finite-state automata, we establish a new state $\{q_1, q_2\}$ in $A'$, reachable from $q_0$ with input symbol . The transitions to $q_1$ and $q_2$, however, have different weights in $A$. DTA selects the smaller weight to be the weight of the transition to $\{q_1, q_2\}$ and records the difference between the two weights in the new state. In the example, the weight of the $q_0 \to q_1$ transition is , and that of the $q_0 \to q_2$ transition is . Therefore, the new transition to $\{q_1, q_2\}$ gets weight , and the difference of  $-$  =  is assigned as remainder to component $q_1$. For completeness, a remainder of 0 =  $-$  is assigned to $q_2$. The new state is thus encoded as $\{(q_1, ), (q_2, 0)\}$ in $A'$. Similarly, from state $q_0$ in $A$, we can reach states $q_1$ and $q_2$ via symbol . Again the minimum weight among these transitions is , so we assign this weight to the new arc and encode the remainder weights (0 and , respectively) in the new state $\{(q_1, 0), (q_2, )\}$ in $A'$.
In general, the states in $A'$ are of the form $\hat{q} = \{(q_{i_1}, r_{i_1}), \ldots, (q_{i_l}, r_{i_l})\}$. The $q_{i_j}$s are states from $A$, and the $r_{i_j}$s are called remainders. Each such $\hat{q}$ is interpreted as follows. Consider any string $w \in \Sigma^*$ such that there is a (single) path inducing $w$ from the start state, $q_0$, to $\hat{q}$. As in classical automata determinization, there is at least one path inducing $w$ from $q_0$ to each $q_{i_j}$ in the nondeterministic input, $A$. Let $c_j$ be the weight of the minimum weight path inducing $w$ from $q_0$ to $q_{i_j}$ in $A$. Let $c$ be the weight of the path from $q_0$ to $\hat{q}$ in $A'$. The remainders are constructed so that $r_{i_j} = c_j - c$. In this way, all necessary path length information is encoded into the transition weights and remainders in $A'$.

Returning to the example, consider state $\{(q_1, ), (q_2, 0)\}$ in $A'$ and the input symbol . In $A$, we can reach state $q_3$ from both $q_1$ and $q_2$. Recalling the above discussion of remainders, we consider the sum of the weight of the transition in $A$ ( for the $q_1 \to q_3$ transition and  for the $q_2 \to q_3$ transition) plus the remainder associated with the original source state encoded in state $\{(q_1, ), (q_2, 0)\}$ in $A'$. That is, we consider the sums  +  = 5 and  + 0 = . We take the minimum among those values, i.e., , as the weight of the transition from $\{(q_1, ), (q_2, 0)\}$ to $\{(q_3, r)\}$ ($r$ to be determined) in $A'$. Since there is only one destination state ($q_3$) in $A$, the remainder $r$ is 0, so we encode the new destination state as $\{(q_3, 0)\}$. Similarly, we construct an arc with weight  on symbol  from $\{(q_1, 0), (q_2, )\}$ to $\{(q_3, 0)\}$. ( + 0 = ,  +  = 4, and we take the minimum, which is .) The end result is shown in Figure 3.2(b).

Generalizing to an arbitrary WFA $G = (Q, \bar{q}, \Sigma, \delta, Q_f)$, the deterministic WFA $G'$ is obtained as follows. The start state of $G'$ is $\{(\bar{q}, 0)\}$, which forms an initial set $P$. While $P \ne \emptyset$, we remove any state $q = \{(q_1, r_1), \ldots, (q_n, r_n)\}$ from $P$, where $q_i \in Q$ and $r_i \in \mathbb{R}_+ \cup \{0, \infty\}$. The remainders encode path length information, as described above. For each $\sigma \in \Sigma$, let $\{q'_1, \ldots, q'_m\}$ be the set of states reachable by $\sigma$-transitions out of all the $q_i$. For $1 \le j \le m$, let

$$\rho_j = \min_{1 \le i \le n;\ (q_i, \sigma, \nu, q'_j) \in \delta} \{r_i + \nu\}$$

be the minimum of the weights of $\sigma$-transitions into $q'_j$ from the $q_i$ plus the respective $r_i$. Let $\rho = \min_{1 \le j \le m} \{\rho_j\}$. Let $q' = \{(q'_1, s_1), \ldots, (q'_m, s_m)\}$, where $s_j = \rho_j - \rho$, for $1 \le j \le m$. If $q'$ is a new state, we add it to $P$. We add transition $(q, \sigma, \rho, q')$ to $G'$. This is the only $\sigma$-transition out of state $q$, so $G'$ is deterministic.

Let $T_G(w)$ be the set of sequences of transitions in $G$ that accept string $w \in \Sigma^*$; let $\vec{t}_{G'}(w)$ be the (one) sequence of transitions in $G'$ that accepts the same string. Mohri [0] shows that $c(\vec{t}_{G'}(w)) = \min_{\vec{t} \in T_G(w)} \{c(\vec{t})\}$, and thus $L(G') = L(G)$.
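The general construction just described can be sketched in a few lines, assuming transitions are tuples (source, symbol, weight, target) and encoding each state of $G'$ as a frozenset of (state, remainder) pairs. The function name `determinize`, the `max_states` guard, and the automaton `EXAMPLE` are hypothetical; the guard exists only because the procedure need not terminate when the input lacks the twins property. The final-weight map (minimum remainder over the final components of a state) follows Mohri's general construction and is an assumption here, since the description above does not spell it out.

```python
import math
from collections import deque


def determinize(transitions, start, finals, alphabet, max_states=10_000):
    """Weighted subset construction over the min-sum semiring (a sketch of DTA)."""
    out = {}
    for q, sym, w, q2 in transitions:
        out.setdefault((q, sym), []).append((w, q2))

    initial = frozenset({(start, 0)})
    seen = {initial}
    pending = deque([initial])          # the set P of unprocessed states
    det_transitions = []

    while pending:
        state = pending.popleft()
        for sym in alphabet:
            # rho_j: cheapest remainder-plus-weight into each reachable target q'_j.
            rho_j = {}
            for q, r in state:
                for w, q2 in out.get((q, sym), ()):
                    rho_j[q2] = min(rho_j.get(q2, math.inf), r + w)
            if not rho_j:
                continue
            rho = min(rho_j.values())   # weight of the single sym-transition
            target = frozenset((q2, c - rho) for q2, c in rho_j.items())
            det_transitions.append((state, sym, rho, target))
            if target not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state limit reached; input may lack the twins property")
                seen.add(target)
                pending.append(target)

    # Assumed final weights: smallest remainder among final components.
    final_weight = {}
    for s in seen:
        rems = [r for q, r in s if q in finals]
        if rems:
            final_weight[s] = min(rems)
    return initial, det_transitions, final_weight


# Hypothetical automaton (not the one in Figure 3.2).
EXAMPLE = [(0, "a", 1, 1), (0, "a", 3, 2), (1, "b", 3, 3), (2, "b", 3, 3)]
initial, det_transitions, final_weight = determinize(EXAMPLE, 0, {3}, "ab")
for t in det_transitions:
    print(t)
```

On this hypothetical input the start state moves on `a` with weight 1 to a state pairing state 1 with remainder 0 and state 2 with remainder 2, and that state moves on `b` with weight 3 to a state holding only state 3 with remainder 0, so the string `ab` keeps its minimum cost of 4.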
Moreover, let $T_G(w, q)$ be the set of sequences of transitions in $G$ from state $\bar{q}$ to state $q$ that induce string $w$. Again, let $\vec{t}_{G'}(w)$ be the (one) sequence of transitions in $G'$ that induces the same string; $\vec{t}_{G'}(w)$ ends at some state $\{(q_1, r_1), \ldots, (q_n, r_n)\}$ in $G'$ such that some $q_i = q$. Mohri [0] shows that $c(\vec{t}_{G'}(w)) + r_i = \min_{\vec{t} \in T_G(w,q)} \{c(\vec{t})\}$. Thus each remainder $r_i$ encodes the difference between the weight of the shortest path to some state that induces $w$ in $A$ and the weight of the path inducing $w$ in $A'$, as described above. Hence at least one remainder in each state is zero.

3.3. An Analysis. We first bound $\#dta(G)$, the number of states in $dta(G)$. The results of Section 5 show that our upper bound is tight to within polynomial factors.

Lemma 3.7. Assume that $G$ satisfies the twins property. Let $R$ be the set of remainders generated by DTA when computing $dta(G)$. Let $R'$ be the set of remainders $r$ for which the following holds: $\exists w \in \Sigma^*$, $|w| \le n^2 - 1$, and two states $q_1$ and $q_2$, such that $r = c(\bar{q}, w, q_1) - c(\bar{q}, w, q_2)$. Then $R \subseteq R'$.

Proof. Let $R''$ be the set of remainders $r$ such that: $\exists w \in \Sigma^*$ and two states $q_1$ and $q_2$ such that $r = c(\bar{q}, w, q_1) - c(\bar{q}, w, q_2)$. Consider a state-remainder tuple in $dta(G)$ reached by $w$ from the initial state, and assume that $q_2$ is the optimal state in that tuple, i.e., the one with zero remainder. Then the remainder associated to $q_1$ is $r$. Thus, $R \subseteq R''$. We next show that $R'' = R'$. Clearly $R' \subseteq R''$. To prove the other inclusion we only need to show that the remainder $r$ generated by any string of length at least $n^2$ is generated by a string of length at most $n^2 - 1$. Let $p_1$ and $p_2$ be the paths of minimum cost in $G$, starting at $\bar{q}$, ending at $q_1$ and $q_2$ respectively, and each inducing $u$. Because $|u| \ge n^2$ and there are only $n^2$ distinct pairs of states in $G$, there exist two (not necessarily distinct) states, $q$ and $q'$, in $p_1$ and $p_2$ respectively, and a partition of $u = xvz$, $v \in \Sigma^+$, such that $\{q, q'\} \subseteq \delta(\bar{q}, x)$, $q \in \delta(q, v)$ and $q' \in \delta(q', v)$ (there are cycles at $q$ and $q'$ inducing $v$), and, finally, $q_1 \in \delta(q, z)$ and $q_2 \in \delta(q', z)$. Since $q$ and $q'$ are twins, we have that $c(\bar{q}, u, q_1) - c(\bar{q}, u, q_2) = c(\bar{q}, \bar{u}, q_1) - c(\bar{q}, \bar{u}, q_2)$, where $\bar{u} = xz$ is in $\Sigma^+$ and $|\bar{u}| < |u|$. If $\bar{u}$ is of the required length, we are finished; otherwise, we iterate the argument.

Theorem 3.8. Let $G$ be a WFA satisfying the twins property. In the general case, $\#dta(G) < 2^{n(2\log n + n^2 \log|\Sigma| + 1)}$; in the integer (or rational) case, $\#dta(G) < 2^{n(2\log n + 1 + \min(n^2 \log|\Sigma|,\, \rho))}$; and if $G$ is acyclic, $\#dta(G) < 2^{n \log|\Sigma|}$ independent of any assumptions on weights. The acyclic bound is tight (up to constant factors) for any alphabet.

Proof. Let $R$ be the set of remainders in $dta(G)$. Each state in $dta(G)$ is an $i$-tuple of states from $G$ with a corresponding $i$-tuple of remainders. In the worst case, each $i$-state tuple from $G$ will appear in $dta(G)$, and there are $|R|^i$ distinct $i$-tuples of remainders it can assume. (This over counts by including tuples without any zero remainders.) Therefore,

$$\#dta(G) \le \sum_{i=1}^{n} \binom{n}{i} |R|^i < (2|R|)^n.$$

Let $R'$ be the set of remainders $r$ for which the following holds: $\exists w \in \Sigma^*$, $|w| \le n^2 - 1$, and two states $q_1$ and $q_2$, such that $r = c(\bar{q}, w, q_1) - c(\bar{q}, w, q_2)$.
By Lemm.7, R R, so we cn ound R in different settings y ounding R. Generl Cse: The weights on the trnsitions of G re incommensurle rel numers, i.e., they require infinite precision s inry numers. Since ech string induced y G corresponds to t lest one pth in G, we hve y definition of R tht the crdinlity of this set is ounded y the numer of distinct pirs of pths of length t most n. There re t most n i= m i < m n such pths, where m is the numer of edges in G. Therefore R < m n. On the other hnd, the numer of strings of length t most n is ounded y Σ n. Since ech of those strings cn rech pir of (not necessrily distinct) sttes in G, we hve tht R < n Σ n. But Σ m, so n Σ n is tighter ound on R. Our first estimte follows. Integer Cse: The weights re non-negtive integers. Fix stte q nd string w tht reches q from the initil stte. Then c( q,w,q) is in [0,(n )C] Z. Therefore, the reminders in R must lso e in tht rnge. It follows tht ( R ) n < (n C) n = n( log n+ρ+). Since the topologicl ound on R we derived for the generl cse does not depend on the mgnitude of weights nd it holds lso for the cse we re considering, we hve tht ( R ) n < n( log n++min(n log Σ,ρ)). Our second estimte follows. Notice tht this results lso holds for the cse in which the weights re

rational numbers represented by $\rho$ bits.

Acyclic Case: The graph underlying $G$ is acyclic. Thus, each string induced by $G$ is of length at most $n - 1$. There are fewer than $|\Sigma|^n = 2^{n\log|\Sigma|}$ such strings. Each of the strings induced by $G$ will reach exactly one state in $dt(G)$ (which is a deterministic automaton). Therefore, the number of states of $dt(G)$ is bounded by $2^{n\log|\Sigma|}$. Tightness follows from Theorem 5.10.

Processing each tuple of state-remainders generated by DTA takes $O(|\Sigma|(n+m))$ time, excluding the cost of arithmetic and min operations involving two weights and/or remainders, yielding the following.

Theorem 3.9. Let $G$ be a WFA satisfying the twins property. DTA takes $O(|\Sigma|(n+m)\,\#dt(G))$ time using the R-RAM and $O(\rho|\Sigma|(n+m)\,\#dt(G))$ time using the CO-RAM. For the general case, using the R-RAM, the time is $O(|\Sigma|(n+m)\,2^{n(2\log n+n^2\log|\Sigma|+1)})$. For the (rational or) integer case, using the CO-RAM, the time is $O(\rho|\Sigma|(n+m)\,2^{n(2\log n+1+\min(n^2\log|\Sigma|,\,\rho))})$. For the acyclic case, the time is $O(|\Sigma|(n+m)\,2^{n\log|\Sigma|})$ using the R-RAM and $O(\rho|\Sigma|(n+m)\,2^{n\log|\Sigma|})$ using the CO-RAM.

Theorems 3.8 and 3.9 do not require $G$ to be unambiguous. DTA terminates within the stated resource bounds on any WFA that has the twins property.

Consider in the integer case the interplay between the growth of $G$ when determinized, the time complexity of the algorithm, and the magnitude of the weights. In the acyclic case first, we have that $\#G \le S \le 2^{n\log|\Sigma|}$, where $S$ is the number of distinct strings accepted by $G$. In some sense, $S$ gives the expressive power of $G$, i.e., how much information is compactly stored in $G$ with the aid of nondeterminism. For small weights, i.e., $\rho \le n\log|\Sigma|$, the worst-case time complexity of the algorithm is dominated by the number of strings accepted by $G$. Therefore, we can actually uncompact some or all of the information contained in $G$ by eliminating nondeterminism.
On the other hand, when $\rho > n\log|\Sigma|$, the bigger weights add no information and actually slow down the algorithm to the point that, for very large weights, the arithmetic and logic operations dominate the cost of the entire algorithm. For the cyclic case, the situation is analogous, with weights playing an even more prominent role. Let $\rho_{max} = n^2\log|\Sigma|$. For $\rho < \rho_{max}$, the estimate of $\#dt(G)$ depends on $\rho$, although we do not know how tight that estimate is. For $\rho \ge \rho_{max}$, the expansion of $G$ depends only on its topology, but the large weights slow down the algorithm.

3.4. Computing Worst-Case Weighting. The results of Section 3.3 can be used to generate hard instances for any determinization algorithm. Let $G$ be a WFA. A reweighting function (or simply reweighting) $f$ is such that, when applied to $G$, it preserves the topology and labeling of $G$, but possibly changes the weights on its transitions. We want to determine a reweighting $f$ such that $min(f(G))$ exists and $\#min(f(G))$ is maximized among reweightings for which $min(f(G))$ exists. We restrict attention to the integer case and, without loss of generality, we assume that $G$ is trim and unambiguous. Theorem 3.8 shows that for weights to have an effect on the growth of $dt(G)$, it must be that $\rho \le n^2\log|\Sigma|$. Set $\rho_{max} = n^2\log|\Sigma|$. To find the required reweighting, we simply consider all possible weight assignments to $G$ satisfying the twins property and requiring at most $\rho_{max}$ bits, choosing the one that leads to the minimum deterministic equivalent of maximum size. There are $(2^{\rho_{max}})^m = 2^{m\rho_{max}}$ possible reweightings, and it takes $2^{O(n(2\log n+n^2\log|\Sigma|))}$ time to compute the size expansion or decide that the resulting machine cannot be determinized. The total time is thus

bounded by $2^{O(n(2\log n+n^2\log|\Sigma|)+m\rho_{max})}$.

Fig. 4.1. A nondeterministic finite-state automaton accepting the language $L = \bigcup_{i=1}^{n} (\Sigma \setminus \{a_i\})^n$. Arcs labeled $\tilde{a}_i$ denote transitions on all symbols in $\Sigma \setminus \{a_i\}$.

4. Hot Automata. This section provides a family of acyclic, multi-partite WFAs that are hot: when determinized, they expand independently of the weights on their transitions. Given some alphabet $\Sigma = \{a_1,\ldots,a_n\}$, consider the language $L = \bigcup_{i=1}^{n} (\Sigma \setminus \{a_i\})^n$; i.e., the set of all $n$-length strings that do not include all symbols from $\Sigma$. It is simple to obtain an acyclic, multi-partite NFA $H$ of poly($n$) size that accepts $L$. (See Figure 4.1.) One can also show that the minimal DFA accepting $L$ has $\Theta(2^{n+\log n})$ states. Furthermore, we can construct $H$ so that these bounds hold for a binary alphabet: encode the symbols in $\Sigma$ as binary strings of length $\log n$, and replace arcs in the above NFA with $n$-vertex, $(\log n)$-depth trees appropriately. $H$ corresponds to a WFA with all arcs weighted identically. Since acyclic WFAs satisfy the twins property, they can always be determinized. Altering the weights can only increase the expansion.

Continuing, Kintala and Wotschke [8] provide a set of NFAs that produces a hierarchy of expansion factors when determinized. Consider the set of languages $L_{h,k} = \{xy : x,y \in \{0,1\}^*;\ |x| \le k;\ |y| = k;\ x \text{ has at most } h \text{ 1s in it}\}$ for $h < k$. They show that for each $L_{h,k}$, there is a poly($k$)-state acyclic (but not multi-partite) NFA that accepts $L_{h,k}$, yet any DFA accepting $L_{h,k}$ must have at least $\sum_{i=0}^{\log h} \binom{k}{i}$ states. These provide additional examples of hot WFAs.

5. Weight-Dependent Automata. In this section we address the effect of weights on the size of the deterministic equivalent of an input WFA. We study a simple family of WFAs with multi-partite, acyclic topology. When the arcs are all weighted zero, all WFAs in this family shrink when determinized.
We show, however, that even though the topology is by itself very benign, certain weightings can cause exponential increases in size when the WFA is determinized. This study is related in spirit to works that measure amounts of nondeterminism and ambiguity in finite automata [4, 5,

8]. We first discuss the case of a binary alphabet and then generalize to arbitrary alphabets. In this section, we use the terms automaton and graph interchangeably.

Fig. 5.1. Topology of the $k$-layer rail graph.

Fig. 5.2. The result of determinizing $RG(k)$ when all arc weights are 0. In the result, all arcs are again weighted 0, and the remainders in the vertices are all 0; these values are omitted from the figure.

5.1. The Rail Graph. We denote by $RG(k)$ the $k$-layer rail graph. See Figure 5.1. $RG(k)$ has $2k+1$ vertices, which we denote by $\{0,T_1,B_1,\ldots,T_k,B_k\}$. There are arcs $(0,T_1,a)$, $(0,T_1,b)$, $(0,B_1,a)$, $(0,B_1,b)$, and then, for $1 \le i < k$, arcs $(T_i,T_{i+1},a)$, $(T_i,T_{i+1},b)$, $(B_i,B_{i+1},a)$, and $(B_i,B_{i+1},b)$. (It should be clear from Figure 5.1 that T stands for "top" and B for "bottom.") Note that $RG(k)$ is $(k+1)$-partite and also has fixed in- and out-degrees. (All vertices have in- and out-degrees 2, except the root, which has in-degree 0 and out-degree 4, and the vertices $T_k$ and $B_k$, which have out-degree 0.)

If we consider the strings induced by paths from 0 to either $T_k$ or $B_k$, then the language of $RG(k)$ is the set of strings $L_{RG}(k) = \{a,b\}^k$. The only nondeterministic choice is at the state 0, where either the top or bottom rail may be selected. Hence a string $w$ can be accepted by one of two paths, one following the top rail and the other the bottom rail. Technically, the rail graph is ambiguous. We can easily disambiguate $RG(k)$ by adding transitions from $T_k$ and $B_k$, each on a distinct symbol, to a new final state. Our results extend to this case. For clarity of presentation, however, we discuss the ambiguous rail graph.

The rail graph is weight-dependent. In Section 5.2 we provide weightings such that DTA produces the trivial $(k+1)$-vertex series-parallel graph. (See Figure 5.2 for an example.) On the other hand, in Section 5.3 we exhibit weightings for the rail graph such that, when input to DTA, we get an exponential increase in the number of states. (See Figures 5.3
and 5.4 for an example.) Notice that we cannot get more than $2^k$ vertices, one per string in $L_{RG}(k)$, in the last layer of the determinized graph, and thus the weighting in Figure 5.3 is in some sense worst case. In that section we also explore the relationship between the magnitude of the weights and the amount of expansion that is possible. In Section 5.4, we show that random weightings induce the behavior of worst-case weightings. We discuss variants of the rail graph in Section 5.5, and finally, in Section 5.6 we generalize the rail graph to arbitrary alphabets.
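The weight dependence of the rail graph can be checked on small instances. The following is a minimal sketch, assuming that DTA behaves like the standard weighted subset construction over the (min, +) semiring, in which each determinized state is a set of (state, remainder) pairs. The names `rail_graph`, `determinize`, `count_states`, `zero_weighting`, and `worst_case_weighting` are hypothetical, and the worst-case weighting used here (weight $2^{i-1}$ on the top arc labeled a into $T_i$, zero elsewhere) is one assumed choice of expanding weights.

```python
def rail_graph(k, weight):
    """Arcs of RG(k) as {(source, symbol): [(target, weight), ...]}.

    weight(rail, i, symbol) is the weight of the arc labeled `symbol`
    entering T_i (rail == "T") or B_i (rail == "B").
    """
    arcs = {}
    for i in range(1, k + 1):
        for rail in "TB":
            src = 0 if i == 1 else (rail, i - 1)
            for sym in "ab":
                arcs.setdefault((src, sym), []).append(((rail, i), weight(rail, i, sym)))
    return arcs


def determinize(arcs, start, alphabet="ab"):
    """Weighted subset construction over (min, +).

    A determinized state is a frozenset of (state, remainder) pairs.
    Returns the set of reachable determinized states.
    """
    init = frozenset({(start, 0)})
    seen = {init}
    stack = [init]
    while stack:
        current = stack.pop()
        for sym in alphabet:
            best = {}  # cheapest cost of reaching each target on `sym`
            for q, r in current:
                for dst, w in arcs.get((q, sym), []):
                    cost = r + w
                    if dst not in best or cost < best[dst]:
                        best[dst] = cost
            if not best:
                continue
            m = min(best.values())  # weight emitted on the new arc
            nxt = frozenset((q, c - m) for q, c in best.items())
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def zero_weighting(rail, i, sym):
    return 0


def worst_case_weighting(rail, i, sym):
    # Assumed weighting: 2^(i-1) on the top arc labeled "a", 0 elsewhere.
    return 2 ** (i - 1) if (rail == "T" and sym == "a") else 0


def count_states(k, weight):
    return len(determinize(rail_graph(k, weight), 0))
```

Under these assumptions, `count_states(4, zero_weighting)` gives 5 states (one per layer), while `count_states(4, worst_case_weighting)` gives 31, the size of a complete binary tree of depth 4.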

Fig. 5.3. Worst-case weighting of $RG(k)$. Arc label $\sigma/w$ means the arc is labeled with symbol $\sigma$ and has weight $w$.

Fig. 5.4. Result of determinizing $RG(5)$, weighted as in Figure 5.3. States have been renamed. All arcs are weighted 0. The remainders are not shown.

5.2. A Framework for Examining Weightings of RG(k). Consider determinizing $RG(k)$ with DTA. The set of states reachable on any string $w = \sigma_1 \cdots \sigma_j$ of length $j \le k$ is $\{T_j,B_j\}$. For a given weighting function $c$, let $c_T(w)$ denote the cost of accepting string $w$ if the top path is taken; i.e., $c_T(w) = c(0,\sigma_1,T_1) + \sum_{i=1}^{j-1} c(T_i,\sigma_{i+1},T_{i+1})$. Analogously define $c_B(w)$ to be the corresponding cost along the bottom path. Let $R(w)$ be the remainder vector for $w$, which is a pair of the form $(0,\,c_B(w) - c_T(w))$ or $(c_T(w) - c_B(w),\,0)$. A state at layer $0 < i \le k$ in the determinized WFA is labeled $(\{T_i,B_i\}/R(w))$ for any string $w$ leading to that state; i.e., all strings leading to a particular state induce the same remainder vector. Two strings $w_1$ and $w_2$ of identical length lead to distinct states in the determinized version of the rail graph if and only if $R(w_1) \ne R(w_2)$.

It is convenient simply to write $R(w) = c_T(w) - c_B(w)$. The sign of $R(w)$ then determines which of the two forms $(0,x)$ or $(x,0)$ of the remainder vector occurs. Suppose that $w$ has length $j$ and can be written $w'\sigma$, where $\sigma \in \{a,b\}$. Let $r_i^T(\sigma)$ denote the weight on the (top) arc labeled $\sigma$ into vertex $T_i$ and $r_i^B(\sigma)$ denote the weight on the (bottom) arc labeled $\sigma$ into vertex $B_i$. Then we can write $R(w) = R(w') + r_j^T(\sigma) - r_j^B(\sigma)$. Abbreviating $r_i^T(\sigma) - r_i^B(\sigma)$ by $\delta_i(\sigma)$, we have
$$R(w) = \sum_{i=1}^{j} \delta_i(\sigma_i).$$
The rail graph with a specific weighting can also be regarded as a function that hashes a $k$-bit string $w$ into a number $R(w)$. Define symbol a to be 0 and symbol b to

be 1, so that a string $w$ can be viewed as a sequence of bits $(b_1,\ldots,b_k)$. We can write $R(b_1,\ldots,b_k) = R(b_1,\ldots,b_{k-1}) + \delta_k(b_k)$. Also, we can write $\delta_k(b_k) = b_k\delta_k(1) + (1 - b_k)\delta_k(0)$. Rearranging gives $\delta_k(b_k) = \delta_k(0) + b_k(\delta_k(1) - \delta_k(0))$. Summing over all $i$ gives
$$R(w) = \sum_{i=1}^{k} \bigl(\delta_i(0) + b_i(\delta_i(1) - \delta_i(0))\bigr).$$
Alternatively,
$$R(w) = \bar{R} + \sum_{i=1}^{k} b_i(\delta_i(1) - \delta_i(0)), \qquad (5.1)$$
where $\bar{R} = \sum_{i=1}^{k} \delta_i(0)$ is fixed for a given weighting function on $RG(k)$.

Theorem 5.1. There is a reweighting $f$ such that both $dt(f(RG(k)))$ and $min(f(RG(k)))$ realize the topology of the $(k+1)$-vertex trivial series-parallel graph (exemplified in Figure 5.2).

Proof. Any weighting in which $\delta_i(a) = \delta_i(b)$ for $i = 1$ to $k$ suffices, since in this case $R(w_1) = R(w_2)$ for all pairs of strings $\{w_1,w_2\}$ of identical length. In particular, giving zero weights suffices.

5.3. Worst-Case Weightings of RG(k). See Figures 5.3 and 5.4.

Theorem 5.2. For any $j \in [0,k] \cap \mathbb{Z}$ there exists a reweighting $f$ such that $dt(f(RG(k)))$ has the following form: Layers 0 through $j$ form the complete binary tree on $2^{j+1} - 1$ vertices, and the remaining layers $j+1$ through $k$ consist of trivial series-parallel graphs, each rooted at a leaf of the tree.

Proof. Choose any weighting such that $\delta_i(a) = 2^{i-1}$ and $\delta_i(b) = 0$ for $1 \le i \le j$, and let $\delta_i(a) = \delta_i(b) = 0$ for $j < i \le k$. Consider a pair of strings $w_1, w_2$ of identical length that differ in some position $i \le j$. Let $\sigma_i^1 = 1$ if the $i$th symbol of $w_1$ is a and $\sigma_i^1 = 0$ otherwise; similarly define $\sigma_i^2$ with respect to $w_2$. Then we can write $R(w_1) = \sum_{i=1}^{j} \sigma_i^1 2^{i-1}$ and $R(w_2) = \sum_{i=1}^{j} \sigma_i^2 2^{i-1}$. Since $\sigma_i^1 \ne \sigma_i^2$ for some $i \le j$, and binary representations are unique, $R(w_1)$ must differ from $R(w_2)$. Hence the two strings must lead to different states. If on the other hand they differ only in positions $i > j$, they will lead to the same state. There are $2^i$ strings that differ in positions 1 through $i$; thus for $i \le j$, there are $2^i$ distinct vertices in the $i$th layer of the graph. Since each vertex has out-degree 2, one arc for each symbol, the graph must have the desired form.
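The hashing view of Equation (5.1) can be explored directly, without running a determinization. The sketch below assumes the bit convention used above (a as 0, b as 1) and assumes, as one choice consistent with the proof of Theorem 5.2, that the symbol encoded as 0 carries the difference $2^{i-1}$ in layers $i \le j$. The function names are hypothetical. It counts the distinct remainders $R(w)$ at each layer, which is the number of determinized states at that layer.

```python
from itertools import product


def remainder(bits, delta):
    """R(w) = sum of delta_i(b_i) over the positions of w.

    delta[i - 1] is the pair (delta_i(0), delta_i(1)).
    """
    return sum(delta[i][b] for i, b in enumerate(bits))


def layer_sizes(k, delta):
    """Number of distinct remainders among all strings of each length 0..k."""
    return [
        len({remainder(w, delta) for w in product((0, 1), repeat=length)})
        for length in range(k + 1)
    ]


def theorem_5_2_delta(k, j):
    """Assumed weighting: delta_i(0) = 2^(i-1), delta_i(1) = 0 for i <= j; all zero after."""
    return [(2 ** (i - 1), 0) if i <= j else (0, 0) for i in range(1, k + 1)]
```

With these assumptions, `layer_sizes(5, theorem_5_2_delta(5, 3))` returns `[1, 2, 4, 8, 8, 8]`: the layers double up to layer 3 and then stay constant, matching the tree-then-series-parallel shape described in Theorem 5.2. An all-zero `delta` gives one state per layer.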
Note that if we set all weights on the bottom rail to zero, and the weights $r_i^T(a) = 2^{i-1}$ and $r_i^T(b) = 0$ for all $1 \le i \le k$, we get a weighting that yields a complete binary tree of depth $k$ when DTA is applied. It is easy to show that the minimum deterministic graph preserving shortest paths, however, consists of a trivial series-parallel graph in which all edges have weight zero, corresponding to the lower rail. We can remedy this by choosing weights more judiciously.

Theorem 5.3. For any $j \in [0,k] \cap \mathbb{Z}$ there is a reweighting $f$ such that both $dt(f(RG(k)))$ and $min(f(RG(k)))$ have the following form: Layers 0 through $j-1$ form the complete binary tree on $2^j - 1$ vertices, and the remaining layers $j$ through $k$ form a trivial series-parallel graph with incoming arcs to the layer-$j$ vertex from each vertex at layer $j-1$.

Fig. 5.5. Result of minimizing $dt(RG(6))$, weighting $RG(6)$ as follows: $r_i^T(a) = r_i^B(b) = 2^{i-1}$ for $1 \le i \le 4$; all other weights were 0. States have been renamed, and remainders are not shown.

See Figure 5.5. Theorem 5.3 is generalized by Theorem 5.10, and therefore we omit its proof here. Theorems 5.2 and 5.3 show that the bound obtained in Theorem 3.8 for the acyclic case is tight for binary alphabets.

We now address the sensitivity of the size expansion to the magnitude of the weights. Note that $RG(k)$ has $2k+1$ vertices and $4k$ arcs, but we use $\Theta(k)$ bits to encode the weight of each arc in the proofs of Theorems 5.2 and 5.3; the input size of the WFA is thus $n = \Theta(k^2)$ bits. The determinized WFA has $2^{k+1} - 1$ states and $2^{k+1} - 2$ transitions. Again we need $\Theta(k)$ bits to encode the weight of each transition, so the bit size of the determinized WFA is $\Theta(k2^k)$, or $2^{\Theta(\sqrt{n})}$ bits. So while the determinized WFA has exponentially more states than the original WFA, the size expansion in bits, while superpolynomial, is not exponential.

We argue that exponential state-space expansion requires exponentially big weights for the rail graph.

Theorem 5.4. Let $f$ be a reweighting. If $\#dt(f(RG(k))) = \Omega(2^k)$, then $\Omega(k^2)$ bits are required to represent $f(RG(k))$.

Proof. Consider the $\Omega(2^k)$ vertices at depth $k$ in the determinized graph. Each such state is labeled by a distinct $R(w)$ for some string $w = \sigma_1 \cdots \sigma_k$. Hence if there are $\Omega(2^k)$ states in $dt(f(RG(k)))$, there must be $\Omega(2^k)$ distinct values of $R(w)$. In addition, there must be $\Omega(2^k)$ distinct values of the absolute value of $R(w)$. Recalling the formulation of $R(w)$ from Equation (5.1), there can be at most $2^{k'}$ distinct values of $R(w)$, where $k'$ is the number of distinct values of $(\delta_i(1) - \delta_i(0))$. Each value may be included in the sum or not, and at best each choice of inclusions and exclusions will lead to a unique sum.
Therefore, the assumption of $\Omega(2^k)$ distinct remainders implies there must be $\Omega(k)$ distinct values of the $(\delta_i(1) - \delta_i(0))$. Now ignore the $k/2$ low-order bits of the absolute values of the remainders, and consider only the remaining high-order bits. There must be $\Omega(2^{k/2})$ distinct values induced by the high-order bits, or else there cannot be $\Omega(2^k)$ distinct values overall. By the same argument as above, there must be $\Omega(k/2)$ distinct values of the $(\delta_i(1) - \delta_i(0))$ that affect the high-order bits of a large remainder, i.e., one with one of its high-order

$k/2$ bits set to 1. For a particular $(\delta_i(1) - \delta_i(0))$ to affect a high-order bit of a large remainder, $(\delta_i(1) - \delta_i(0))$ must have a non-zero bit at least as high as position $k/2 - \log k$. This is only true if one of the four arc weights for layer $i$ has a non-zero bit at least that high. Therefore, $\Omega(k)$ arc weights require some non-zero bit at least as high as position $k/2 - \log k$. Hence $\Omega(k^2)$ bits are required to represent all the arc weights.

Corollary 5.5. Let $f$ be a reweighting. If $\#min(f(RG(k))) = \Omega(2^k)$, then $\Omega(k^2)$ bits are required to represent $f(RG(k))$.

Proof. Theorem 5.4 applies, because $\#min(f(RG(k))) = \Omega(2^k)$ implies that $\#dt(f(RG(k))) = \Omega(2^k)$.

Finally, consider the following analogy between the hot graphs in Section 4 and the rail graph. Observe that the hot graphs in Section 4 contain some nondeterministic choices that cannot be resolved until the end of the input. This causes the respective deterministic expansions. In those graphs, these choices are part of the strings being accepted. The rail graph manifests this same phenomenon, but in terms of weights rather than strings. The weighted variants of the rail graph that expand when determinized do so because it is not clear until the end of the expansion which rail will provide the shorter path: at any point, the choice of top or bottom rail depends on the symbols that follow. Therefore, the determinization must maintain enough state information to provide for all possible outcomes. Furthermore, in the non-minimizable cases, whereas the language $L_{RG}(k)$ itself could be accepted by a $(k+1)$-state DFA, the weights on $RG(k)$ necessitate an exponential number of states and arcs in any deterministic WFA that induces all the appropriate path lengths.

5.4. Random Weightings of RG(k). An $i$-bit reweighting function (or simply $i$-bit reweighting) is a reweighting function $f$ such that the weights on the arcs of $f(G)$ are constrained to be in $[0, 2^i - 1] \cap \mathbb{Z}$.
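The effect of a $k$-bit reweighting chosen at random can be sampled empirically. The sketch below is an illustration only, not the analysis of this section: it assumes each of the four arc weights per layer is drawn uniformly and independently from $[0, 2^k - 1] \cap \mathbb{Z}$, encodes the two symbols as bits 0 and 1, and counts the distinct remainders $R(w)$ over all strings of length $k$, i.e., the number of states in the last layer of the determinized rail graph. All function names and the seed are hypothetical.

```python
import random
from itertools import product


def random_k_bit_deltas(k, rng):
    """Draw four k-bit arc weights per layer and return (delta_i(0), delta_i(1)) pairs."""
    deltas = []
    for _ in range(k):
        top0, top1, bot0, bot1 = (rng.randrange(2 ** k) for _ in range(4))
        deltas.append((top0 - bot0, top1 - bot1))
    return deltas


def last_layer_states(k, deltas):
    """Number of distinct remainders R(w) over all 2^k strings of length k."""
    return len({
        sum(d[b] for d, b in zip(deltas, w))
        for w in product((0, 1), repeat=k)
    })


def average_last_layer(k, trials=50, seed=0):
    """Average last-layer size over `trials` random k-bit reweightings."""
    rng = random.Random(seed)
    total = sum(last_layer_states(k, random_k_bit_deltas(k, rng)) for _ in range(trials))
    return total / trials
```

Comparing `average_last_layer(k)` with $2^k$ for a few small $k$ gives a rough feel for how close random weightings come to the worst case; the enumeration is exponential in $k$, so it is only practical for small instances.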
A function $f_R$ is a random reweighting function (or simply random reweighting) if and only if it chooses the weights to assign to the transitions of $G$ uniformly and independently at random from $\mathbb{R}^+ \cup \{0, \infty\}$. Finally, let $x \in_R Y$ denote that $x$ is selected uniformly and independently at random from set $Y$, and let $E[X]$ denote the expected value of some random variable $X$. We need the following technical claim.

Claim 5.6. Let $X,Y,U,V \in_R [0, 2^k - 1]$. Then
$$\max_{-2^{k+1}+1 < i < 2^{k+1}-1} \Pr(X - Y - (U - V) = i) \le \frac{2}{3 \cdot 2^k} + O\left(1/4^k\right).$$

Proof. See Appendix A.

Theorem 5.7. Let $f_R$ be a random $k$-bit reweighting. $E[\#dt(f_R(RG(k)))] = \Theta(2^k)$.

Proof. As before, let $R(w)$ be the remainder induced by string $w$ of length $k$; i.e., the difference between the cost of the upper path and the cost of the lower path that respectively induce $w$. Let $\delta_i(\sigma)$ be the cost of the (top) arc labeled $\sigma$ into vertex $T_i$ minus the cost of the (bottom) arc labeled $\sigma$ into vertex $B_i$. Recall from Equation (5.1) that the rail graph with a specific weighting can be regarded as a function that hashes a $k$-bit string $w = \sigma_1 \cdots \sigma_k$ into a number $R(w)$. Suppose $w_1 \ne w_2$. We can adapt a standard analysis from the theory of universal hash functions [7] to calculate the probability that $R(w_1) = R(w_2)$. Let $w_1 = \alpha_1 \cdots \alpha_k$ and $w_2 = \beta_1 \cdots \beta_k$. Without loss of generality, assume $\alpha_k \ne \beta_k$. (The strings must

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Lecture 08: Feb. 08, 2019

Lecture 08: Feb. 08, 2019 4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Eugene Weinstein Google, NYU Cournt Institute eugenew@cs.nyu.edu Slide Credit: Mehryr Mohri Preliminries Finite lphet, empty string.

More information

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38 Theory of Computtion Regulr Lnguges (NTU EE) Regulr Lnguges Fll 2017 1 / 38 Schemtic of Finite Automt control 0 0 1 0 1 1 1 0 Figure: Schemtic of Finite Automt A finite utomton hs finite set of control

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

CM10196 Topic 4: Functions and Relations

CM10196 Topic 4: Functions and Relations CM096 Topic 4: Functions nd Reltions Guy McCusker W. Functions nd reltions Perhps the most widely used notion in ll of mthemtics is tht of function. Informlly, function is n opertion which tkes n input

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Mehryr Mohri Cournt Institute nd Google Reserch mohri@cims.nyu.com Preliminries Finite lphet Σ, empty string. Set of ll strings over

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Theory of Computation Regular Languages

Theory of Computation Regular Languages Theory of Computtion Regulr Lnguges Bow-Yw Wng Acdemi Sinic Spring 2012 Bow-Yw Wng (Acdemi Sinic) Regulr Lnguges Spring 2012 1 / 38 Schemtic of Finite Automt control 0 0 1 0 1 1 1 0 Figure: Schemtic of

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

CHAPTER 1 Regular Languages. Contents

CHAPTER 1 Regular Languages. Contents Finite Automt (FA or DFA) CHAPTE 1 egulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, euivlence of NFAs nd DFAs, closure under regulr

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages 5//6 Grmmr Automt nd Lnguges Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Prof. Mohmed Hmd Softwre Engineering L. The University of Aizu Jpn Regulr Lnguges Context Free Lnguges Context Sensitive

More information

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 April 7, 2014 More on utomt Michel George Mrch 24 April 7, 2014 1 Automt constructions Now tht we hve forml model of mchine, it is useful to mke some generl constructions. 1.1 DFA Union / Product construction Suppose

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

State Minimization for DFAs
