The Minimization Problem

Simpler & More General Minimization for Weighted Finite-State Automata
Jason Eisner, Johns Hopkins University
May 28, 2003, HLT-NAACL

First half of talk is setup: reviews past work. Second half gives outline of the new results.

The Minimization Problem
Input: A DFA (deterministic finite-state automaton)
Output: An equiv. DFA with as few states as possible
Complexity: O(|arcs| log |states|) (Hopcroft 1971)
[Figure: example DFA and the finite language it represents]
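The problem statement above can be made concrete with a small sketch. This is not Hopcroft's O(|arcs| log |states|) algorithm; it is the simpler Moore-style partition refinement, which is worst-case quadratic. The function name `minimize_dfa`, the dictionary encoding of the transition function, and the assumption that the DFA is complete (every state has an arc for every symbol) are choices made for this illustration only.

```python
def minimize_dfa(states, alphabet, delta, start, finals):
    """Moore-style partition refinement for a complete DFA.

    delta[(q, a)] is the successor of state q on symbol a.
    Returns (new_states, new_delta, new_start, new_finals).
    """
    # Keep only states reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)

    # Initial partition: final states versus non-final states.
    block = {q: int(q in finals) for q in reachable}
    n_blocks = len(set(block.values()))
    while True:
        # A state's signature is its own block plus the blocks it reaches.
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in reachable}
        ids = {}
        block = {q: ids.setdefault(sig[q], len(ids)) for q in sorted(reachable)}
        if len(ids) == n_blocks:      # no block was split: partition is stable
            break
        n_blocks = len(ids)

    new_delta = {(block[q], a): block[delta[(q, a)]]
                 for q in reachable for a in alphabet}
    new_finals = {block[q] for q in reachable & set(finals)}
    return set(block.values()), new_delta, block[start], new_finals
```

States that end up in the same block have the same suffix language, so each block becomes one state of the result.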

Here's what you should worry about: can't always work backward from the final state like this. A bit more complicated because of cycles. Don't worry about it for this talk.

The Minimization Problem
[Figure: pairs of states that are mergeable because they have the same suffix language]
An equivalence relation on states: merge the equivalence classes.

The Minimization Problem
Q: Why minimize # states, rather than # arcs?
A: Minimizing # states also minimizes # arcs!
Q: What if the input is an NDFA (nondeterministic)?
A: Determinize it first. (could yield exponential blowup)
Q: How about minimizing an NDFA to an NDFA?
A: Yes, could be exponentially smaller, but problem is PSPACE-complete so we don't try.

Real-World NLP: Automata With Weights or Outputs
Finite-state computation of functions:
  Concatenate strings
  Add scores
  Multiply probabilities
[Figure: the same automaton with string outputs, additive scores, and probabilities on its arcs]

Real-World NLP: Automata With Weights or Outputs
Want to compute functions on strings: Σ* → K
After all, we're doing language and speech!
Finite-state machines can often do the job
  Easy to build, easy to combine, run fast
  Build them with weighted regular expressions
To clean up the resulting DFA, minimize it to merge redundant portions
  This smaller machine is faster to intersect/compose
  More likely to fit on a hand-held device
  More likely to fit into cache memory

How do we minimize such DFAs?
Didn't Mohri already answer this question?
Only for special cases of the output set K!
Is there a general recipe?
What new algorithms can we cook with it?

Weight Algebras
Specify a weight algebra (K, ⊗)
Define DFAs over (K, ⊗)
  Arcs have weights in set K
  A path's weight is also in K: multiply its arc weights with ⊗
Examples:
  (strings, concatenation)
  (scores, addition)
  (probabilities, multiplication)
  (score vectors, addition): OT phonology
  (real weights, multiplication): conditional random fields, rational kernels
  (objective func & gradient, product-rule multiplication): training the parameters of a model
  (bit vectors, conjunction): membership in multiple languages at once

Q: Semiring is (K, ⊕, ⊗). Why aren't you talking about ⊕ too?
A: Minimization is about DFAs. At most one path per input. So no need to ⊕ the weights of multiple accepting paths.
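As a rough illustration of a weight algebra (K, ⊗), the sketch below runs one deterministic machine under three of the algebras named in the talk. The class `WeightAlgebra`, the function `run`, and the dictionary encoding of arcs and stopping weights are hypothetical names introduced here, not part of the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class WeightAlgebra:
    times: Callable[[Any, Any], Any]  # the ⊗ operation
    one: Any                          # identity element for ⊗


STRINGS = WeightAlgebra(lambda x, y: x + y, "")   # (strings, concatenation)
SCORES = WeightAlgebra(lambda x, y: x + y, 0.0)   # (scores, addition)
PROBS = WeightAlgebra(lambda x, y: x * y, 1.0)    # (probabilities, multiplication)


def run(algebra, arcs, start, stop, text):
    """Weight of the unique path for `text`, or None if it is rejected.

    arcs[(state, symbol)] = (next_state, weight)
    stop[state] = stopping weight of a final state
    """
    q, total = start, algebra.one
    for sym in text:
        if (q, sym) not in arcs:
            return None
        q, k = arcs[(q, sym)]
        total = algebra.times(total, k)
    if q not in stop:
        return None
    return algebra.times(total, stop[q])
```

Because the machine is deterministic, only ⊗ is needed: there is never more than one accepting path whose weights would have to be combined with ⊕.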

Shifting Outputs Along Paths
Doesn't change the function computed:
[Figure: string-output example; output symbols are moved along the paths while each input keeps the same output string]
[Figure: additive example with arc weights 3, 2 and 0 and stopping weight 7; weight is moved along the paths in steps of 1 while the accepted strings keep their totals of 5 and 9]

Shifting Outputs Along Paths
State sucks back a prefix from its out-arcs and deposits it at end of its in-arcs.
[Figure: string-output example; the function computed is the same before and after the shift]

Shifting Outputs Along Paths (Mohri)
Here, not all the out-arcs start with the same output symbol
But all the out-paths start with it
Do pushback at later states first: now we're ok!
[Figure: string-output example with ε outputs on some arcs, before and after pushback at the later states]

Shifting Outputs Along Paths (Mohri)
Actually, push back at all states at once
At every state q, compute some λ(q)
Add λ(q) to end of q's in-arcs
Remove λ(q) from start of q's out-arcs

  q --a:k--> r   becomes   q --a: λ(q)^-1 ⊗ k ⊗ λ(r)--> r
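The push-back rule above can be sketched for the additive case (reals, addition), where λ(q)^-1 ⊗ k ⊗ λ(r) is simply -λ(q) + k + λ(r). The names `push` and `path_weight` and the dictionary encoding are assumptions of this sketch. It also assumes an explicit initial weight on the start state, which absorbs λ(start), and stopping weights on final states, which lose λ(q); the slides do not show these details.

```python
def push(arcs, start, start_weight, stop, lam):
    """Push lam[q] back through every state q of a DFA over (reals, addition).

    arcs[(q, a)] = (r, k); stop[q] = stopping weight; lam[q] = λ(q).
    """
    # Each arc loses λ at its source state and gains λ at its target state.
    new_arcs = {(q, a): (r, -lam[q] + k + lam[r])
                for (q, a), (r, k) in arcs.items()}
    # Stopping weights behave like out-arcs: they lose λ(q).
    new_stop = {q: -lam[q] + w for q, w in stop.items()}
    # The initial weight behaves like an in-arc of the start state.
    return new_arcs, start_weight + lam[start], new_stop


def path_weight(arcs, start, start_weight, stop, text):
    """Total weight of the path that reads `text` (assumes it is accepted)."""
    q, total = start, start_weight
    for sym in text:
        q, k = arcs[(q, sym)]
        total += k
    return total + stop[q]
```

Along any accepting path the added and removed λ values telescope, so every accepted string keeps its total weight whatever values `lam` holds.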

[Figure: two states that would be mergeable in the unweighted case because they accept the same suffix language]
Still accept the same suffix language, but produce different outputs on it
Not mergeable: they compute different suffix functions
Fix by shifting outputs leftward
If we do this at all states as before ...
Now mergeable: they have the same suffix function
But still no easy way to detect mergeability.

If we do this at all states as before ...
Now mergeable: they have the same suffix function
Now these have the same suffix function too

Now we can discover & perform the merges:
Treat each input:output label (such as one with output yz) as a single atomic symbol
  now these have the same arc labels
  so do these
  because we arranged for canonical placement of outputs along paths
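Treating each arc label as one atomic symbol can be sketched as a partition refinement over (input symbol, weight) pairs. The function name `merge_states` and the dictionary encoding are hypothetical, and the sketch assumes the weights have already been pushed into a canonical position and can be compared for exact equality.

```python
def merge_states(arcs, stop):
    """Group the states of a pushed weighted DFA into mergeable blocks.

    arcs[(q, a)] = (r, k); stop[q] = stopping weight of final state q.
    Returns a dict mapping each state to its block id.
    """
    states = set(stop) | {q for q, _ in arcs} | {r for r, _ in arcs.values()}
    out = {q: [] for q in states}
    for (q, a), (r, k) in arcs.items():
        out[q].append((a, k, r))

    block = {q: 0 for q in states}
    n_blocks = 1
    while True:
        # The pair (a, k) plays the role of a single atomic symbol.
        sig = {q: (stop.get(q),
                   tuple(sorted((a, k, block[r]) for a, k, r in out[q])))
               for q in states}
        ids = {}
        block = {q: ids.setdefault(sig[q], len(ids)) for q in sorted(states)}
        if len(ids) == n_blocks:      # stable: no block was split
            return block
        n_blocks = len(ids)
```

Two states land in the same block exactly when they agree on stopping weight and on every labeled arc, recursively, which is the unweighted notion of equivalence applied to the atomic labels.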

Treat each input:output label as a single atomic symbol
Use unweighted minimization algorithm!
Now mergeable: they have the same suffix language over these atomic labels

Summary of weighted minimization algorithm:
1. Compute λ(q) at each state q
2. Push each λ(q) back through state q; this changes arc weights
3. Merge states via unweighted minimization
Step 3 merges states
Step 2 allows more states to merge at step 3
Step 1 controls what step 2 does, preferably to give states the same suffix function whenever possible
So define λ(q) carefully at step 1!

Mohri's Algorithms (1997, 2000)
Mohri treated two versions of (K, ⊗)
(K, ⊗) = (strings, concatenation)
  λ(q) = longest common prefix of all paths from q
  Rather tricky to find
(K, ⊗) = (nonnegative reals, addition)
  λ(q) = minimum weight of any path from q
  Find it by Dijkstra's shortest-path algorithm
[Figure: additive example in which a state has λ = 8, shown before pushing]
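For the additive case just described, λ(q) can be computed with Dijkstra's algorithm run backward from the final states. This is a minimal sketch under assumed conventions: `min_path_weights` is a hypothetical name, all arc and stopping weights are nonnegative, and states are comparable values such as integers so that heap ties can be broken.

```python
import heapq


def min_path_weights(arcs, stop):
    """lam[q] = minimum total weight of any accepting path from q.

    arcs[(q, a)] = (r, k) with k >= 0; stop[q] = stopping weight >= 0.
    States that cannot reach a final state are left out of the result.
    """
    # Reverse adjacency: for each target r, the (source, weight) pairs.
    incoming = {}
    for (q, _a), (r, k) in arcs.items():
        incoming.setdefault(r, []).append((q, k))

    lam = {}
    heap = [(w, q) for q, w in stop.items()]   # stopping is a zero-length path
    heapq.heapify(heap)
    while heap:
        d, r = heapq.heappop(heap)
        if r in lam:
            continue                            # already settled with a smaller d
        lam[r] = d
        for q, k in incoming.get(r, []):
            if q not in lam:
                heapq.heappush(heap, (k + d, q))
    return lam
```

The input symbols are ignored entirely; only the weights decide which path defines λ(q).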

[Figure: the same additive example after pushing λ = 8 back through the state]

Mohri's Algorithms (1997, 2000)
In both cases:
  λ(q) = a sum over an infinite set of path weights
  must define this sum and an algorithm to compute it
  doesn't generalize automatically to other (K, ⊗) ...
    (real weights, multiplication)?
    (score vectors, addition)?
    (objective func & gradient, product-rule multiplication)?
e.g., what if we allowed negative reals? Then the minimum might not exist!
[Figure: example with arc weights 2 and -3]

Generalizing the Strategy
End of background material. Now we can sketch the new results!
Want to minimize DFAs in any (K, ⊗)
Given (K, ⊗), just need a definition of λ ... then use general alg.
λ should extract an appropriate left factor from state q's suffix function F_q: Σ* → K
Remember, F_q is the function that the automaton would compute if state q were the start state
What properties must λ have to guarantee that we get the minimum equivalent machine?

Generalizing the Strategy
What properties must the λ function have?
For all F: Σ* → K, k ∈ K, a ∈ Σ:
  Shifting: λ(k ⊗ F) = k ⊗ λ(F)
  Quotient: λ(F) is a left factor of λ(a^-1 F)
  Final-quotient: λ(F) is a left factor of F(ε)
Then pushing + merging is guaranteed to minimize the machine.

Shifting: λ(k ⊗ F) = k ⊗ λ(F)
Suffix functions can be written as xx ⊗ F and yy ⊗ F
Shifting property says: when we remove the prefixes λ(xx ⊗ F) and λ(yy ⊗ F), we will remove xx and yy respectively, leaving behind a common residue.
Actually, remove xx ⊗ λ(F) and yy ⊗ λ(F).

Quotient: λ(F) is a left factor of λ(a^-1 F)
  q --a:k--> r   becomes   q --a: λ(F_q)^-1 ⊗ k ⊗ λ(F_r)--> r
  λ(F_q)^-1 ⊗ k ⊗ λ(F_r) = λ(F_q)^-1 ⊗ λ(k ⊗ F_r) = λ(F_q)^-1 ⊗ λ(a^-1 F_q)
Quotient property says that this quotient exists even if λ(F_q) doesn't have a multiplicative inverse.

Final-quotient: λ(F) is a left factor of F(ε)
Guarantees we can find final-state stopping weights.
If we didn't have this base case, we couldn't prove: λ(F) is a left factor of every output in range(F).
Then pushing + merging is guaranteed to minimize.
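The "common residue" claim for the Shifting property can be checked in one line when multiplicative inverses happen to exist. That is an extra assumption made only for this worked equation; the talk's Quotient properties are what cover the case without inverses.

```latex
\begin{align*}
\lambda(k \otimes F)^{-1} \otimes (k \otimes F)
  &= \bigl(k \otimes \lambda(F)\bigr)^{-1} \otimes k \otimes F
     && \text{(Shifting)} \\
  &= \lambda(F)^{-1} \otimes k^{-1} \otimes k \otimes F \\
  &= \lambda(F)^{-1} \otimes F .
\end{align*}
```

The residue does not depend on k, so two suffix functions that differ only by a left factor k leave the same residue after pushing.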

A New Specific Algorithm
Mohri's algorithms instantiate this strategy. They use particular definitions of λ:
  λ(q) = longest common string prefix of all paths from q
  λ(q) = minimum numeric weight of all paths from q
  interpreted as infinite sums over path weights; ignore input symbols
  dividing by λ makes suffix func canonical: path weights sum to 1
Now for a new definition of λ!
  λ(q) = weight of the shortest path from q, breaking ties lexicographically by input string
  choose just one path, based only on its input symbols
  computation is simple, well-defined, independent of (K, ⊗)
  dividing by λ makes suffix func canonical: shortest path has weight 1

A New Specific Algorithm
New definition of λ:
  λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
Computation is simple, well-defined, independent of (K, ⊗)
Breadth-first search back from final states:
[Figure: states grouped into final states, distance 1, distance 2]
  q --a:k--> r:   λ(q) = k ⊗ λ(r)
Compute λ(q) in O(1) time as soon as we visit q. Whole alg. is linear.
Faster than finding min-weight path à la Mohri.

Requires Multiplicative Inverses
Does this definition of λ have the necessary properties?
  λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
If we regard λ as applying to suffix functions:
  λ(F) = F(min domain(F)), with appropriate defn of min
Shifting: λ(k ⊗ F) = k ⊗ λ(F)
  Trivially true
Quotient: λ(F) is a left factor of λ(a^-1 F)
Final-quotient: λ(F) is a left factor of F(ε)
  These are true provided that (K, ⊗) contains multiplicative inverses,
  i.e., okay if (K, ⊗) is a semigroup; (K, ⊕, ⊗) is a division semiring.
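The breadth-first computation of λ described above can be sketched as follows. This is a simplified illustration, not the linear-time procedure from the talk: it stores the chosen input string at every state in order to break ties, which costs more than O(1) per state. The name `shortest_path_lambda`, the `times` argument standing in for ⊗, and the single-character input symbols are assumptions of this sketch.

```python
import operator


def shortest_path_lambda(arcs, stop, times):
    """lam[q] = weight of the shortest accepting path from q,
    with ties between equally short paths broken by input string.

    arcs[(q, a)] = (r, k) with single-character symbols a;
    stop[q] = stopping weight; times(x, y) is the ⊗ operation.
    """
    outgoing, incoming = {}, {}
    for (q, a), (r, k) in arcs.items():
        outgoing.setdefault(q, []).append((a, r, k))
        incoming.setdefault(r, set()).add(q)

    # best[q] = (chosen input string, weight of that path)
    best = {q: ("", w) for q, w in stop.items()}
    layer = set(stop)                       # states at distance 0
    while layer:
        # Unvisited states with an arc into the previous layer.
        next_layer = {q for r in layer for q in incoming.get(r, ())
                      if q not in best}
        found = {}
        for q in next_layer:
            options = [(a + best[r][0], times(k, best[r][1]))
                       for a, r, k in outgoing[q] if r in layer]
            # All options have the same length, so string order decides.
            found[q] = min(options, key=lambda option: option[0])
        best.update(found)
        layer = next_layer
    return {q: w for q, (_, w) in best.items()}
```

With `times=operator.add` this handles the additive case; because the path is chosen from the input symbols alone, negative weights cause no difficulty.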
Requires Multiplicative Inverses
So (K, ⊗) must contain multiplicative inverses (under ⊗).
Consider (K, ⊗) = (nonnegative reals, addition):
[Figure: a state with λ = 5; pushing turns the weight 1 before it into 6 and the weights 5 and 2 after it into 0 and -3]
Oops! -3 isn't a legal weight.
Need to say (K, ⊗) = (reals, addition).
Then subtraction always gives an answer.
Unlike Mohri, we might get negative weights in the output DFA ...
But unlike Mohri, we can handle negative weights in the input DFA (including negative weight cycles!).

Requires Multiplicative Inverses
How about transducers? (K, ⊗) = (strings, concatenation)
Must add multiplicative inverses, via inverse letters.
[Figure: string example with λ = y; pushing introduces the inverse letter y^-1 on an arc]
Can actually make this work, though no longer O(1)
Still arguably simpler than Mohri
But this time we're a bit slower in worst case, not faster as before
Can eliminate inverse letters after we minimize

Real Benefit: Other Semirings!
Other (K, ⊗) of current interest do have mult inverses ...
So we now have an easy minimization algorithm for them. No algorithm existed before.
  (real weights, multiplication): conditional random fields, rational kernels (Lafferty/McCallum/Pereira; Cortes/Haffner/Mohri)
  (score vectors, addition): OT phonology (Ellison)
  (objective func & gradient, product-rule multiplication): training the parameters of a model (Eisner expectation semirings)

Back to the General Strategy
What properties must the λ function have?
For all F: Σ* → K, k ∈ K, a ∈ Σ:
  Shifting: λ(k ⊗ F) = k ⊗ λ(F)
  Quotient: λ(F) is a left factor of λ(a^-1 F)
  Final-quotient: λ(F) is a left factor of F(ε)
New algorithm and Mohri's algs are special cases
What if we don't have mult. inverses?
Does this strategy work in every (K, ⊗)?
Does an appropriate λ always exist?
No! No strategy always works. Minimization isn't always well-defined!

Minimization Not Unique
In previously studied cases, all minimum-state machines equivalent to a given DFA were essentially the same.
But the paper gives several (K, ⊗) where this is not true!
Mergeability may not be an equivalence relation on states.
Having a common residue may not be an equivalence relation on suffix functions.
Has to do with the uniqueness of prime factorization in (K, ⊗).
(But had to generalize the notion so it didn't assume ⊗ was commutative.)
Paper gives necessary and sufficient conditions ...

Non-Unique Minimization Is Hard
Minimum-state automaton isn't always unique.
But can we find one that has min # of states?
No: unfortunately NP-complete. (reduction from Minimum Clique Partition)
Can we get close to the minimum?
No: Min Clique Partition is inapproximable in polytime to within any constant factor (unless P=NP).
So we can't even be sure of getting within a factor of 100 of the smallest possible.

Summary of Results
Some weight semirings are bad:
  Don't let us minimize uniquely, efficiently, or approximately [even in (bit vectors, conjunction)]
Characterization of good weight semirings
General minimization strategy for good semirings
  Find λ ...
  Mohri's algorithms are special cases
Easy minimization algorithm for division semirings
  For additive weights, simpler & faster than Mohri's
  Can apply to transducers, with inverse letters trick
  Applies in the other semirings of present interest: fancy machine learning; parameter training; optimality theory

FIN

New definition of λ:
  λ(q) = weight of the shortest path from q, breaking ties alphabetically on input symbols
Ranking of accepting paths by input string:
  ε < ... (genealogical order on strings)
We pick the minimum string accepted from state q
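The genealogical order used for this ranking (shorter strings first, equal lengths compared alphabetically) can be expressed as a sort key. The function name and the sample strings below are illustrative only.

```python
def genealogical_key(s):
    """Sort key for genealogical (length-then-alphabetical) order."""
    return (len(s), s)


words = ["ba", "", "b", "ab", "a", "aa"]
ranked = sorted(words, key=genealogical_key)
# ranked is ["", "a", "b", "aa", "ab", "ba"]
```

Taking `min(accepted, key=genealogical_key)` over the strings accepted from a state picks the path whose weight defines λ for that state.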