Simpler & More General Minimization for Weighted Finite-State Automata
Jason Eisner, Johns Hopkins University
May 28, 2003, HLT-NAACL

First half of talk is setup: reviews past work. Second half gives an outline of the new results.

The Minimization Problem
Input: A DFA (deterministic finite-state automaton)
Output: An equivalent DFA with as few states as possible
Complexity: O(|arcs| log |states|) (Hopcroft 1971)
[Figure: example DFA representing a language of four strings]
The Minimization Problem
Here's what you should worry about: can't always work backward from the final state like this. A bit more complicated because of cycles. Don't worry about it for this talk.
[Figure: pairs of states marked "mergeable because they have the same suffix language"]
An equivalence relation on states: merge the equivalence classes.

The Minimization Problem
Q: Why minimize # states, rather than # arcs?
A: Minimizing # states also minimizes # arcs!
Q: What if the input is an NDFA (nondeterministic)?
A: Determinize it first. (Could yield exponential blowup.)
Q: How about minimizing an NDFA to an NDFA?
A: Yes, could be exponentially smaller, but the problem is PSPACE-complete, so we don't try.

Real-World NLP: Automata With Weights or Outputs
Finite-state computation of functions:
Concatenate strings [Figure: arcs carry output strings; the outputs along a path are concatenated]
Add scores [Figure: arcs a:2, b:3, c:7, d:0; abd → 5, acd → 9]
Multiply probabilities [Figure: arcs a:0.2, b:0.3, c:0.7, d:1; abd → 0.06, acd → 0.14]

Real-World NLP: Automata With Weights or Outputs
Want to compute functions on strings: Σ* → K
After all, we're doing language and speech!
Finite-state machines can often do the job
Easy to build, easy to combine, run fast
Build them with weighted regular expressions
To clean up the resulting DFA, minimize it to merge redundant portions
This smaller machine is faster to intersect/compose
More likely to fit on a hand-held device
More likely to fit into cache memory

How do we minimize such DFAs?
Didn't Mohri already answer this question?
Only for special cases of the output set K!
Is there a general recipe?
What new algorithms can we cook with it?
Weight Algebras
Specify a weight algebra (K, ⊗)
Define DFAs over (K, ⊗)
Arcs have weights in set K
A path's weight is also in K: multiply its arc weights with ⊗
Examples:
(strings, concatenation)
(scores, addition)
(probabilities, multiplication)
(score vectors, addition): OT phonology
(real weights, multiplication): conditional random fields, rational kernels
(objective func & gradient, product-rule multiplication): training the parameters of a model
(bit vectors, conjunction): membership in multiple languages at once

Q: A semiring is (K, ⊕, ⊗). Why aren't you talking about ⊕ too?
A: Minimization is about DFAs. At most one path per input. So no need to ⊕ the weights of multiple accepting paths.
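To make the weight-algebra idea concrete, here is a minimal sketch of a DFA evaluator that is parameterized by ⊗. The function name `run_dfa`, the dictionary encoding, and the four-state topology are assumptions made for illustration; the score and probability weights follow the deck's example figures, while the string outputs are purely hypothetical.

```python
import operator

def run_dfa(arcs, final, start, times, one, inp):
    """Weight assigned to `inp`, or None if the string is rejected.

    arcs:  {(state, symbol): (weight, next_state)}
    final: {state: stopping weight}
    times: the binary operation of the weight algebra (K, times)
    one:   its identity element
    """
    state, weight = start, one
    for sym in inp:
        if (state, sym) not in arcs:
            return None
        k, state = arcs[(state, sym)]
        weight = times(weight, k)
    if state not in final:
        return None
    return times(weight, final[state])

def make_machine(a, b, c, d, stop):
    # Assumed topology: 0 -a-> 1, then 1 -b-> 2 or 1 -c-> 2, then 2 -d-> 3 (final).
    arcs = {(0, "a"): (a, 1), (1, "b"): (b, 2), (1, "c"): (c, 2), (2, "d"): (d, 3)}
    return arcs, {3: stop}

scores = make_machine(2, 3, 7, 0, stop=0)            # (scores, addition)
probs = make_machine(0.2, 0.3, 0.7, 1.0, stop=1.0)   # (probabilities, multiplication)
strings = make_machine("x", "y", "", "z", stop="")   # hypothetical outputs

print(run_dfa(*scores, 0, operator.add, 0, "abd"))     # 5
print(run_dfa(*scores, 0, operator.add, 0, "acd"))     # 9
print(run_dfa(*probs, 0, operator.mul, 1.0, "abd"))    # approximately 0.06
print(run_dfa(*strings, 0, operator.add, "", "acd"))   # xz
```

Only `times` and `one` change between the three machines, which is the sense in which the deck treats them uniformly.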
Shifting Outputs Along Paths
Doesn't change the function computed:
[Figure: string transducer; an output letter is moved from one arc to a neighboring arc on the same path, and every input string still gets the same output]
[Figure: additive example with arcs a:2, b:3, c:7, d:0, so abd → 5 and acd → 9]
[Figure: the same machine after shifting 1 unit: a:2+1, b:3-1, c:7-1, d:0; still abd → 5, acd → 9]
[Figure: after shifting 2 units: a:2+2, b:3-2, c:7-2, d:0; still abd → 5, acd → 9]
[Figure: after shifting 3 units: a:2+3, b:3-3, c:7-3, d:0; still abd → 5, acd → 9]
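The additive shifting example can be checked with a few lines of code. This is a small sketch, assuming the reconstruction in which a is the in-arc of the middle state and b and c are its out-arcs; the names `shift` and `path_weight` are illustrative only.

```python
def path_weight(arcs, path):
    """Sum the arc weights along a path, given as a string of arc symbols."""
    return sum(arcs[sym] for sym in path)

def shift(arcs, in_syms, out_syms, amount):
    """Move `amount` from a state's out-arcs onto its in-arcs (additive weights)."""
    shifted = dict(arcs)
    for sym in in_syms:
        shifted[sym] += amount
    for sym in out_syms:
        shifted[sym] -= amount
    return shifted

# Each symbol labels exactly one arc in this tiny machine, so arcs are keyed by symbol.
original = {"a": 2, "b": 3, "c": 7, "d": 0}

for amount in (1, 2, 3):
    shifted = shift(original, in_syms="a", out_syms="bc", amount=amount)
    print(amount, shifted, path_weight(shifted, "abd"), path_weight(shifted, "acd"))
```

Every path through the middle state uses one in-arc and one out-arc, so the added and subtracted amounts cancel and each path weight is unchanged.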
Shifting Outputs Along Paths
A state sucks back a prefix from its out-arcs and deposits it at the end of its in-arcs.
[Figure: string transducer with the common output prefix moved step by step from out-arcs to in-arcs; the input/output pairs it computes are unchanged]
Shifting Outputs Along Paths (Mohri)
Here, not all the out-arcs start with the same output letter.
But all the out-paths start with it.
Do pushback at later states first: now we're ok!
[Figure: transducer with some ε outputs on out-arcs; pushing back at the later states first exposes the common prefix]
Shifting Outputs Along Paths (Mohri)
Actually, push back at all states at once:
At every state q, compute some λ(q)
Add λ(q) to the end of q's in-arcs
Remove λ(q) from the start of q's out-arcs
An arc q --a:k--> r becomes q --a: λ(q)^-1 ⊗ k ⊗ λ(r)--> r

[Figure: two states marked "mergeable because they accept the same suffix language"]
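The all-states-at-once pushing rule can be sketched for (strings, concatenation) as follows. This is an illustration under stated assumptions, not Mohri's implementation: the λ values are supplied by hand, "removing λ(q) from the start" is done by stripping a prefix, the machine and its outputs are hypothetical, and emitting λ(start) up front is an assumption about how the start state is handled.

```python
def transduce(arcs, final, start, prefix, inp):
    """Output string for `inp`, or None if rejected."""
    state, out = start, prefix
    for sym in inp:
        if (state, sym) not in arcs:
            return None
        k, state = arcs[(state, sym)]
        out += k
    return out + final[state] if state in final else None

def push_strings(arcs, final, start, lam):
    """Replace each arc output k on q -> r by lam[q]^-1 * k * lam[r]."""
    def strip(prefix, s):
        # lam[q] must be a left factor of everything leaving q.
        assert s.startswith(prefix), (prefix, s)
        return s[len(prefix):]

    new_arcs = {(q, a): (strip(lam[q], k + lam[r]), r)
                for (q, a), (k, r) in arcs.items()}
    new_final = {q: strip(lam[q], w) for q, w in final.items()}
    return new_arcs, new_final, lam[start]   # lam[start] is emitted up front

# Hypothetical machine: both out-arcs of state 1 begin with "x".
arcs = {(0, "a"): ("", 1), (1, "b"): ("xy", 2), (1, "c"): ("xz", 2)}
final = {2: ""}
lam = {0: "", 1: "x", 2: ""}

new_arcs, new_final, prefix = push_strings(arcs, final, 0, lam)
print(new_arcs)
print(transduce(arcs, final, 0, "", "ab"), transduce(new_arcs, new_final, 0, prefix, "ab"))
```

After pushing, the "x" sits on the in-arc of state 1 and has been removed from both out-arcs, while every input still maps to the same output.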
These states still accept the same suffix language, but produce different outputs on it.
Not mergeable: they compute different suffix functions (one input string maps to outputs ending in yz, and cd maps to outputs ending in zzz, but the two states disagree on the leading output).
[Figure: two states whose out-arcs carry outputs such as y, zz, zzz, z, and ε in different positions]

Fix by shifting outputs leftward.
If we do this at all states as before...
Now mergeable: they have the same suffix function (outputs yz and zzz).
But still no easy way to detect mergeability.
If we do this at all states as before...
Now mergeable: they have the same suffix function (outputs yz and zzz).
[Figure: after pushing, the outputs yz and zzz sit on the first arcs out of each state, and the following arcs output ε]
Now these later states have the same suffix function too: their remaining output is ε.

Now we can discover & perform the merges:
Treat each arc label (an input:output pair, such as one with output yz) as a single atomic symbol.
Now these states have the same arc labels, and so do these, because we arranged for canonical placement of outputs along paths.
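Once outputs are placed canonically, merging reduces to ordinary DFA minimization over atomic (input, output) labels. The sketch below uses simple Moore-style partition refinement rather than Hopcroft's O(|arcs| log |states|) algorithm cited earlier; the function name, encoding, and eight-state machine are hypothetical.

```python
def merge_classes(states, arcs, final):
    """Return {state: class id}; states in the same class can be merged.

    arcs:  {(state, label): next_state}, where each label is an
           (input, output) pair treated as one atomic symbol
    final: {state: stopping output}
    """
    def renumber(signature):
        ids = {}
        return {q: ids.setdefault(signature[q], len(ids)) for q in states}

    # Start by separating final from non-final states (and by stopping output).
    block = renumber({q: (q in final, final.get(q)) for q in states})
    while True:
        signature = {
            q: (block[q],
                tuple(sorted((label, block[r])
                             for (p, label), r in arcs.items() if p == q)))
            for q in states
        }
        refined = renumber(signature)
        if len(set(refined.values())) == len(set(block.values())):
            return block          # no class was split: partition is stable
        block = refined

# Hypothetical pushed machine: states 1 and 2 have identical labeled futures.
states = range(8)
arcs = {
    (0, ("a", "")): 1, (0, ("b", "")): 2,
    (1, ("a", "yz")): 3, (1, ("c", "zzz")): 4,
    (2, ("a", "yz")): 6, (2, ("c", "zzz")): 7,
    (3, ("b", "")): 5, (6, ("b", "")): 5,
    (4, ("d", "")): 5, (7, ("d", "")): 5,
}
final = {5: ""}

classes = merge_classes(states, arcs, final)
print(classes)
```

Here states 1 and 2 land in one class, as do 3 and 6, and 4 and 7, so the eight states collapse to five.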
Treat each arc label (input:output pair) as a single atomic symbol.
Use the unweighted minimization algorithm!
Now mergeable: they have the same suffix language over these atomic labels, consisting of the path labeled (:yz)(:ε) and the path labeled (:zzz)(:ε).
[Figure: the merged machine]

Summary of weighted minimization algorithm:
1. Compute λ(q) at each state q
2. Push each λ(q) back through state q; this changes arc weights
3. Merge states via unweighted minimization
Step 3 merges states.
Step 2 allows more states to merge at step 3.
Step 1 controls what step 2 does, preferably to give states the same suffix function whenever possible.
So define λ(q) carefully at step 1!

Mohri's Algorithms (1997, 2000)
Mohri treated two versions of (K, ⊗):
(K, ⊗) = (strings, concatenation)
λ(q) = longest common prefix of all paths from q
Rather tricky to find
(K, ⊗) = (nonnegative reals, addition)
λ(q) = minimum weight of any path from q
Find it by Dijkstra's shortest-path algorithm
[Figure: weighted example in which the start state has λ = 8]
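For (nonnegative reals, addition), computing λ(q) as the minimum weight of any path from q to a final state can be sketched with Dijkstra's algorithm run backward from the final states. The function name, the arc-tuple encoding, and the example weights below are assumptions for illustration, not taken from Mohri's papers or from the figure.

```python
import heapq

def min_suffix_weights(arcs, final):
    """lam[q] = minimum total weight of any accepting path starting at q.

    arcs:  iterable of (source, symbol, weight, target), weights >= 0
    final: {state: stopping weight}
    States that cannot reach a final state are absent from the result.
    """
    reverse = {}
    for q, _symbol, w, r in arcs:
        reverse.setdefault(r, []).append((q, w))

    lam = {}
    heap = [(w, q) for q, w in final.items()]
    heapq.heapify(heap)
    while heap:
        d, r = heapq.heappop(heap)
        if r in lam:
            continue              # already settled with a smaller weight
        lam[r] = d
        for q, w in reverse.get(r, []):
            if q not in lam:
                heapq.heappush(heap, (d + w, q))
    return lam

# Hypothetical machine; the self-loop on state 2 does not affect the minima.
arcs = [(0, "a", 2, 1), (1, "b", 7, 2), (1, "c", 1, 3),
        (3, "d", 2, 2), (2, "e", 3, 2)]
final = {2: 0}

print(min_suffix_weights(arcs, final))
```

The symbols are ignored, as the deck notes for Mohri's definitions: only the weights determine λ. The nonnegativity assumption is what makes Dijkstra's algorithm valid here.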
Mohri's Algorithms (1997, 2000)
[Figure: the same example after pushing the λ values back through the states; λ = 8 is carried at the start]

In both cases:
λ(q) = a "sum" over an infinite set of path weights
Must define this "sum" and an algorithm to compute it
Doesn't generalize automatically to other (K, ⊗)...
(real weights, multiplication)?
(score vectors, addition)?
(objective func & gradient, product-rule multiplication)?
E.g., what if we allowed negative reals? Then the minimum might not exist!
[Figure: example with arc weights 2 and -3]

Generalizing the Strategy
End of background material. Now we can sketch the new results!
Want to minimize DFAs in any (K, ⊗)
Given (K, ⊗), just need a definition of λ... then use the general algorithm
λ should extract an appropriate left factor from state q's suffix function F_q : Σ* → K
Remember, F_q is the function that the automaton would compute if state q were the start state
What properties must λ have to guarantee that we get the minimum equivalent machine?
Generalizing the Strategy
What properties must the λ function have?
For all F: Σ* → K, k ∈ K, a ∈ Σ:
Shifting: λ(k ⊗ F) = k ⊗ λ(F)
Quotient: λ(F) is a left factor of λ(a^-1 F)
Final-quotient: λ(F) is a left factor of F(ε)
Then pushing + merging is guaranteed to minimize the machine.

Shifting: λ(k ⊗ F) = k ⊗ λ(F)
Suffix functions can be written as k ⊗ F and yy ⊗ F for a common F.
The shifting property says: when we remove the prefixes λ(k ⊗ F) and λ(yy ⊗ F), we will remove k and yy respectively, leaving behind a common residue.
Actually, we remove k ⊗ λ(F) and yy ⊗ λ(F).
[Figure: two states whose out-arcs differ only by the leading output, before and after removing it]

Quotient: λ(F) is a left factor of λ(a^-1 F)
An arc q --a:k--> r becomes q --a:k'--> r, where
k' = λ(F_q)^-1 ⊗ k ⊗ λ(F_r)
   = λ(F_q)^-1 ⊗ λ(k ⊗ F_r)
   = λ(F_q)^-1 ⊗ λ(a^-1 F_q)
The quotient property says that this quotient exists even if λ(F_q) doesn't have a multiplicative inverse.

Final-quotient: λ(F) is a left factor of F(ε)
Guarantees we can find final-state stopping weights.
If we didn't have this base case, we couldn't prove: λ(F) is a left factor of every output in range(F).
Then pushing + merging is guaranteed to minimize.
A New Specific Algorithm
Mohri's algorithms instantiate this strategy. They use particular definitions of λ:
λ(q) = longest common string prefix of all paths from q
λ(q) = minimum numeric weight of all paths from q
Interpreted as infinite sums over path weights; ignore input symbols
Dividing by λ makes the suffix function canonical: path weights sum to 1
Now for a new definition of λ!
λ(q) = weight of the shortest path from q, breaking ties lexicographically by input string
Choose just one path, based only on its input symbols
Computation is simple, well-defined, independent of (K, ⊗)
Dividing by λ makes the suffix function canonical: the shortest path has weight 1

A New Specific Algorithm
New definition of λ: λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
Computation is simple, well-defined, independent of (K, ⊗)
Breadth-first search back from final states:
[Figure: final states, then states at distance 1, then states at distance 2]
If the chosen path from q starts with the arc q --a:k--> r, then λ(q) = k ⊗ λ(r)
Compute λ(q) in O(1) time as soon as we visit q. The whole algorithm is linear.
Faster than finding the min-weight path à la Mohri.

Requires Multiplicative Inverses
Does this definition of λ have the necessary properties?
λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
If we regard λ as applying to suffix functions: λ(F) = F(min domain(F)), with an appropriate definition of "min"
Shifting: λ(k ⊗ F) = k ⊗ λ(F). Trivially true.
Quotient: λ(F) is a left factor of λ(a^-1 F)
Final-quotient: λ(F) is a left factor of F(ε)
These are true provided that (K, ⊗) contains multiplicative inverses,
i.e., okay if (K, ⊗) is a semigroup; (K, ⊕, ⊗) is a division semiring.
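The breadth-first computation of the new λ can be sketched as below. It is a sketch under assumptions: the encoding and names are hypothetical, a final state's λ is taken to be its stopping weight (its shortest accepting path is the empty one), and for clarity it scans each state's out-arcs rather than achieving the O(1)-per-state bookkeeping claimed on the slide.

```python
import operator
from collections import deque

def shortest_path_lambda(arcs, final, times):
    """lam[q] = weight of the shortest accepting path from q, ties broken
    by the alphabetically smallest input string.

    arcs:  {(state, symbol): (weight, next_state)}  (deterministic)
    final: {state: stopping weight}
    times: the operation of the weight algebra; nothing else about K is used
    """
    out_arcs, reverse = {}, {}
    for (q, a), (k, r) in arcs.items():
        out_arcs.setdefault(q, []).append((a, k, r))
        reverse.setdefault(r, []).append(q)

    # Breadth-first search backward from the final states gives distances.
    dist = {q: 0 for q in final}
    queue, order = deque(final), []
    while queue:
        r = queue.popleft()
        order.append(r)
        for q in reverse.get(r, []):
            if q not in dist:
                dist[q] = dist[r] + 1
                queue.append(q)

    lam = dict(final)
    for q in order:                      # nondecreasing distance
        if q in final:
            continue
        # Smallest symbol among arcs that step one unit closer to a final state.
        a, k, r = min((arc for arc in out_arcs[q] if dist.get(arc[2]) == dist[q] - 1),
                      key=lambda arc: arc[0])
        lam[q] = times(k, lam[r])
    return lam

# (reals, addition), including a negative weight, which Dijkstra could not handle.
arcs = {(0, "a"): (1, 1), (1, "a"): (5, 2), (1, "b"): (2, 3), (3, "a"): (-4, 2)}
final = {2: 0}
print(shortest_path_lambda(arcs, final, operator.add))
```

Note that λ(1) is 5, the weight of its one-arc path, even though the two-arc path through state 3 is lighter: the choice depends only on input symbols and path length, never on the weights.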
So (K, ⊗) must contain multiplicative inverses (under ⊗).
Consider (K, ⊗) = (nonnegative reals, addition):
[Figure: a state with λ = 5, an in-arc of weight 1, and out-arcs of weight 5 and 2]
Requires Multiplicative Inverses
[Figure: the same state after pushing λ = 5: the in-arc weight becomes 6 and the out-arc weights become 0 and -3]
Oops! -3 isn't a legal weight.
Need to say (K, ⊗) = (reals, addition).
Then subtraction always gives an answer.
Unlike Mohri, we might get negative weights in the output DFA...
But unlike Mohri, we can handle negative weights in the input DFA (including negative-weight cycles!).

Requires Multiplicative Inverses
How about transducers? (K, ⊗) = (strings, concatenation)
Must add multiplicative inverses, via inverse letters.
[Figure: dividing an arc's output by a λ that is not its prefix leaves an inverse letter, as in y^-1 z]
Can actually make this work, though no longer O(1)
Still arguably simpler than Mohri
But this time we're a bit slower in the worst case, not faster as before
Can eliminate inverse letters after we minimize

Real Benefit: Other Semirings!
Other (K, ⊗) of current interest do have multiplicative inverses...
So we now have an easy minimization algorithm for them.
No algorithm existed before.
(real weights, multiplication): conditional random fields, rational kernels (Lafferty/McCallum/Pereira; Cortes/Haffner/Mohri)
(score vectors, addition): OT phonology (Ellison)
(objective func & gradient, product-rule multiplication): training the parameters of a model (Eisner expectation semirings)
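The inverse-letters trick for transducers mentioned above can be illustrated with strings over letters and their formal inverses, where adjacent inverse pairs cancel. This is only a sketch of the data representation (a free-group word as a tuple of (letter, ±1) pairs), with hypothetical names and a hypothetical example; it is not the talk's actual procedure for eliminating inverse letters afterward.

```python
def word(s):
    """Ordinary string -> word made of positive letters."""
    return tuple((c, 1) for c in s)

def reduce_word(w):
    """Cancel adjacent pairs such as x x^-1 or x^-1 x."""
    out = []
    for letter, exp in w:
        if out and out[-1] == (letter, -exp):
            out.pop()
        else:
            out.append((letter, exp))
    return tuple(out)

def concat(u, v):
    """The operation of the algebra: concatenate, then cancel."""
    return reduce_word(u + v)

def inverse(u):
    """Multiplicative inverse: reverse the word and invert each letter."""
    return tuple((letter, -exp) for letter, exp in reversed(u))

lam = word("xy")   # hypothetical λ(q)
k = word("xz")     # hypothetical output on an out-arc of q; "xy" is not its prefix

quotient = concat(inverse(lam), k)
print(quotient)                 # (('y', -1), ('z', 1)), i.e. y^-1 z
print(concat(lam, quotient))    # recovers the word for "xz"
```

With inverses available, λ(q)^-1 ⊗ k always exists, so pushing never fails even when λ(q) is not a prefix of an out-arc's output.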
Back to the General Strategy
What properties must the λ function have?
For all F: Σ* → K, k ∈ K, a ∈ Σ:
Shifting: λ(k ⊗ F) = k ⊗ λ(F)
Quotient: λ(F) is a left factor of λ(a^-1 F)
Final-quotient: λ(F) is a left factor of F(ε)
The new algorithm and Mohri's algorithms are special cases.
What if we don't have multiplicative inverses?
Does this strategy work in every (K, ⊗)?
Does an appropriate λ always exist?
No! No strategy always works.
Minimization isn't always well-defined!

Minimization Not Unique
In previously studied cases, all minimum-state machines equivalent to a given DFA were essentially the same.
But the paper gives several (K, ⊗) where this is not true!
Mergeability may not be an equivalence relation on states.
"Having a common residue" may not be an equivalence relation on suffix functions.
Has to do with the uniqueness of prime factorization in (K, ⊗).
(But had to generalize the notion so that it didn't assume ⊗ was commutative.)
Paper gives necessary and sufficient conditions...

Non-Unique Minimization Is Hard
The minimum-state automaton isn't always unique.
But can we find one that has the min # of states?
No: unfortunately NP-complete. (Reduction from Minimum Clique Partition.)
Can we get close to the minimum?
No: Min Clique Partition is inapproximable in polytime to within any constant factor (unless P=NP).
So we can't even be sure of getting within a factor of 100 of the smallest possible.

Summary of Results
Some weight semirings are bad:
Don't let us minimize uniquely, efficiently, or approximately [even in (bit vectors, conjunction)]
Characterization of good weight semirings
General minimization strategy for good semirings
Find λ...
Mohri's algorithms are special cases
Easy minimization algorithm for division semirings
For additive weights, simpler & faster than Mohri's
Can apply to transducers, with the inverse-letters trick
Applies in the other semirings of present interest: fancy machine learning; parameter training; optimality theory
FIN

New definition of λ: λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
Ranking of accepting paths by input string: ε < a < b < aa < ab < ba
"Genealogical order" on strings: we pick the minimum string accepted from state q
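The genealogical order (shorter strings first, then alphabetical among strings of equal length) is easy to express as a sort key. A minimal sketch, with hypothetical function names:

```python
def genealogical_key(s):
    """Sort key for genealogical order: length first, then alphabetical."""
    return (len(s), s)

def min_accepted(strings):
    """The string that the new definition of λ would pick among accepted strings."""
    return min(strings, key=genealogical_key)

print(sorted(["ba", "a", "", "ab", "b", "aa"], key=genealogical_key))
print(min_accepted(["ba", "ab", "aab"]))   # ab
```

Plain alphabetical order would rank "aab" before "ab"; the length component is what makes the ranking correspond to shortest paths.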