Learning probabilistic finite automata

Learning probabilistic finite automata. Colin de la Higuera, University of Nantes. Zadar, August 2010.

Acknowledgements. Laurent Miclet, Jose Oncina, Tim Oates, Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini, ... The list is necessarily incomplete; apologies to those who have been forgotten. http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ (Chapters 5 and 16.)

Outline. 1. PFA. 2. Distances between distributions. 3. FFA. 4. Basic elements for learning PFA. 5. ALERGIA. 6. MDI and DSAI. 7. Open questions.

1. PFA: Probabilistic finite (state) automata.

Practical motivations (computational biology, speech recognition, web services, automatic translation, image processing...): a lot of positive data, not necessarily any negative data, no ideal target, noise.

The grammar induction problem, revisited. The data consists of positive strings, «generated» following an unknown distribution. The goal is now to find (learn) this distribution, or the grammar/automaton that is used to generate the strings.

Success of the probabilistic models: n-grams, Hidden Markov Models, probabilistic grammars.

[Figure: a DPFA, a Deterministic Probabilistic Finite Automaton, over {a, b}; the fraction labels on its transitions were lost in transcription.]

[Figure: the same DPFA, with a worked computation of the probability of one string: it is the product of the transition probabilities along the unique path reading it, times the halting probability of the last state reached. The string and the fraction labels were lost in transcription.]

[Figure: a DPFA over {a, b} with decimal probabilities, among them a: 0.7, b: 0.9, a: 0.35, b: 0.65 and a halting probability 0.3; it is reused in the parsing examples below.]

[Figure: a PFA, a (non-deterministic) Probabilistic Finite (state) Automaton; the fraction labels were lost in transcription.]

[Figure: an ε-PFA, a Probabilistic Finite (state) Automaton with ε-transitions; the fraction labels were lost in transcription.]

How useful are these automata? They can define a distribution over Σ*. They do not tell us if a string belongs to a language. They are good candidates for grammar induction. There is (was?) not that much written theory.

Basic references: the HMM literature; Azaria Paz 1971, Introduction to Probabilistic Automata; Chapter 5 of my book; Probabilistic Finite-State Machines (Vidal, Thollard, cdlh, Casacuberta & Carrasco); Grammatical Inference papers.

Automata: definitions. Let D be a distribution over Σ*: 0 ≤ Pr_D(w) ≤ 1 and Σ_{w∈Σ*} Pr_D(w) = 1.

A Probabilistic Finite (state) Automaton is a tuple ⟨Q, Σ, I_P, F_P, δ_P⟩: Q a set of states; I_P: Q → [0;1] the initial probabilities; F_P: Q → [0;1] the halting probabilities; δ_P: Q × Σ × Q → [0;1] the transition probabilities.
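
To make the definition concrete, here is a minimal sketch of this tuple as a Python data structure (the field names are mine, not the slides'):

```python
from dataclasses import dataclass

@dataclass
class PFA:
    """A probabilistic finite (state) automaton <Q, Sigma, I_P, F_P, delta_P>."""
    states: set        # Q
    alphabet: set      # Sigma
    init: dict         # I_P: state -> [0,1], initial probabilities
    final: dict        # F_P: state -> [0,1], halting probabilities
    trans: dict        # delta_P: (state, symbol, state) -> [0,1]
```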

What does a PFA do? It defines the probability of each string w as the sum, over all paths reading w, of the products of the probabilities: Pr_A(w) = Σ_{π∈paths(w)} Pr(π), where π = q_{i0} a_{i1} q_{i1} a_{i2} ... a_{in} q_{in} and Pr(π) = I_P(q_{i0}) · F_P(q_{in}) · Π_j δ_P(q_{i(j−1)}, a_{ij}, q_{ij}). Note that if λ-transitions are allowed, the sum may be infinite.

[Figure: a non-deterministic PFA in which two paths read the string ab.] Pr(ab) = 0.7·0.4·0.1·1 + 0.7·0.4·0.45·0.2 = 0.028 + 0.0252 = 0.0532.

Terminology: a non-deterministic PFA may have many initial states or only one initial state; a λ-PFA is a PFA with λ-transitions and perhaps many initial states; a DPFA is a deterministic PFA.

Consistency. A PFA A is consistent if Pr_A(Σ*) = 1 and, for every x ∈ Σ*, 0 ≤ Pr_A(x).

Consistency theorem. A is consistent if every state is useful (accessible and co-accessible) and, for every q ∈ Q, F_P(q) + Σ_{q′∈Q, a∈Σ} δ_P(q, a, q′) = 1.
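
Assuming the PFA sketch above, the arithmetic part of this theorem can be checked mechanically; a sketch (usefulness of states is not tested here):

```python
def locally_consistent(A, eps=1e-9):
    """Check that the initial weights sum to 1 and that, for every state q,
    F_P(q) plus all transition mass leaving q equals 1."""
    if abs(sum(A.init.values()) - 1.0) > eps:
        return False
    for q in A.states:
        out = sum(p for (src, _, _), p in A.trans.items() if src == q)
        if abs(A.final.get(q, 0.0) + out - 1.0) > eps:
            return False
    return True
```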

Equivalence between models. There is an equivalence between PFA and HMM, but the HMMs usually define distributions over each Σ^n.

[Figure: a football HMM whose states emit win, draw or lose, with transition and emission probabilities given as simple fractions (lost in transcription).]

Equivalence between PFA with λ-transitions and PFA without λ-transitions (cdlh 2003, Hanneforth & cdlh 2009). Many initial states can be transformed into one initial state with λ-transitions; λ-transitions can be removed in polynomial time. Strategy: number the states, eliminate first the λ-loops, then the transitions with the highest-ranking arrival state.

PFA are strictly more powerful than DPFA (folk theorem). And you can't even tell in advance whether you are in a good case or not (see Denis & Esposito 2004).

Example: [figure: a small PFA whose fraction labels were lost in transcription]. This distribution cannot be modelled by a DPFA.

What does a DPFA over Σ = {a} look like? [Figure.] With this architecture you cannot generate the previous distribution.

Parsing issues: computation of the probability of a string or of a set of strings. Deterministic case: simple, apply the definitions. Technically, rather sum up logs: this is easier, safer and cheaper.

[On the DPFA shown earlier:] Pr(aba) = 0.7 · 0.9 · 0.35 · 0 = 0 (the state reached has halting probability 0); Pr(abb) = 0.7 · 0.9 · 0.65 · 0.3 = 0.12285.
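
A log-domain parser for the deterministic case takes a few lines. A sketch, assuming (my convention, not the slides') that `dtrans` maps a (state, symbol) pair to a (next_state, probability) pair:

```python
import math

def log_prob_dpfa(q0, dtrans, final, w):
    """Log-probability of w in a DPFA: sum the logs along the unique path."""
    logp, q = 0.0, q0
    for a in w:
        if (q, a) not in dtrans:
            return -math.inf            # no transition: probability 0
        q, p = dtrans[(q, a)]
        if p == 0.0:
            return -math.inf
        logp += math.log(p)
    fp = final.get(q, 0.0)              # halting probability (0 in Pr(aba) above)
    return logp + math.log(fp) if fp > 0.0 else -math.inf
```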

Non-deterministic case: [the two-path PFA shown earlier] Pr(ab) = 0.7·0.4·0.1·1 + 0.7·0.4·0.45·0.2 = 0.028 + 0.0252 = 0.0532.

In the literature, the computation of the probability of a string is done by dynamic programming, with O(n m) algorithms: Backward and Forward. If we want the most probable derivation to define the probability of a string, then we can use the Viterbi algorithm.

Forward algorithm. A[i,j] = Pr(q_i, a_1..a_j), the probability of being in state q_i after having read a_1..a_j. Initialisation: A[i,0] = I_P(q_i). Induction: A[i,j+1] = Σ_{k∈Q} A[k,j] · δ_P(q_k, a_{j+1}, q_i). Termination: Pr(a_1..a_n) = Σ_{k∈Q} A[k,n] · F_P(q_k).
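
These recurrences translate directly into code; a sketch using the PFA fields introduced earlier:

```python
def forward_prob(A, w):
    """Forward algorithm: Pr_A(w) for a PFA without lambda-transitions,
    in time O(|w| * |Q|^2). f[q] = Pr(being in q after the current prefix)."""
    f = {q: A.init.get(q, 0.0) for q in A.states}        # A[i,0] = I_P(q_i)
    for a in w:                                          # A[i,j+1] = sum_k A[k,j] * delta
        f = {q: sum(f[p] * A.trans.get((p, a, q), 0.0) for p in A.states)
             for q in A.states}
    return sum(f[q] * A.final.get(q, 0.0) for q in A.states)
```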

2. Distances. What for? To estimate the quality of a language model, to have an indicator of the convergence of learning algorithms, to construct kernels.

2.1 Entropy. How many bits do we need to correct our model? Take two distributions over Σ*, D₁ and D₂. The Kullback-Leibler divergence (or relative entropy) between D₁ and D₂ is Σ_{w∈Σ*} Pr_{D₁}(w) · (log Pr_{D₁}(w) − log Pr_{D₂}(w)).
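
When both distributions are available on a common finite support (dictionaries mapping strings to probabilities), the formula is a one-liner; a sketch, assuming Pr_{D₂} is never null on that support:

```python
import math

def kl_divergence(d1, d2):
    """Kullback-Leibler divergence between two distributions given as dicts
    mapping strings to probabilities (base-2 logarithms)."""
    return sum(p * (math.log2(p) - math.log2(d2[w]))
               for w, p in d1.items() if p > 0.0)
```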

2.2 Perplexity. The idea is to allow the computation of the divergence, but relative to a test set S. An approximation (sic) is the perplexity: the inverse of the geometric mean of the probabilities of the elements of the test set.

Perplexity(S) = (Π_{w∈S} Pr_D(w))^(−1/|S|), i.e. the inverse of the |S|-th root of Π_{w∈S} Pr_D(w). Problem if some probability is null...
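
Computed through logs, and making the null-probability problem explicit (a sketch):

```python
import math

def perplexity(probs):
    """probs: the values Pr_D(w) for the strings w of the test set S.
    Returns (prod probs)^(-1/|S|); infinite if some probability is null."""
    if any(p == 0.0 for p in probs):
        return math.inf                 # the 'null probability' problem
    mean_log = sum(math.log(p) for p in probs) / len(probs)
    return math.exp(-mean_log)
```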

Why multiply? (1) We are trying to compute the probability of independently drawing the different strings in the set S.

Why multiply? (2) Suppose we have two predictors for a coin toss. Predictor 1: heads 60%, tails 40%. Predictor 2: heads 100%. The tests are H: 6, T: 4. By arithmetic mean, P1 scores 36% + 16% = 0.52 and P2 scores 0.6: Predictor 2 is the better predictor ;-) Multiplying instead (the geometric mean), Predictor 2 assigns probability 0 to every tails outcome and gets a total score of 0, which is the sensible verdict.

2.3 Distance d₂: d₂(D₁, D₂) = √( Σ_{w∈Σ*} (Pr_{D₁}(w) − Pr_{D₂}(w))² ). It can be computed in polynomial time if D₁ and D₂ are given by PFA (Carrasco & cdlh). This also means that the equivalence of PFA is in P.

3. FFA: Frequency Finite (state) Automata.

A learning sample is a multiset: strings appear with a frequency (or multiplicity), e.g. S = {λ(3), a(4), ab(2), abb(1), bab(3), bb(1)}.

DFFA. A deterministic frequency finite automaton is a DFA with a frequency function returning a positive integer for every state, for every transition, and for entering the initial state, such that at each state the sum of what enters is equal to the sum of what exits, and the sum of what halts is equal to what starts.

Example: [figure: a DFFA; its frequency labels (of the form a: ..., b: ...) were lost in transcription].

From DFFA to DPFA: frequencies become relative frequencies by dividing each one by the sum of the frequencies exiting its state. [Figure: the previous DFFA with every count replaced by the corresponding relative frequency.]
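
The conversion is a per-state normalisation. A sketch, with counts stored in dictionaries whose names are mine:

```python
def dffa_to_dpfa(halt, trans):
    """halt: state -> halting count; trans: (state, symbol) -> (count, next_state).
    Divide each count by the total mass leaving its state."""
    total = dict(halt)
    for (q, _), (c, _) in trans.items():
        total[q] = total.get(q, 0) + c
    final = {q: h / total[q] for q, h in halt.items()}
    probs = {(q, a): (c / total[q], nxt) for (q, a), (c, nxt) in trans.items()}
    return final, probs
```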

From DFA and sample to DFFA: parse the sample with the DFA, counting how often each state and each transition is used. [Figure: a six-string sample S and the resulting DFFA; the sample's strings were lost in transcription.]

Note: another sample may lead to the same DFFA. Doing the same with an NFA is a much harder problem; it is typically what the Baum-Welch (EM) algorithm has been invented for.

The frequency prefix tree acceptor. The data is a multiset. The FTA is the smallest tree-like FFA consistent with the data; it can be transformed into a PFA if needed.

From the sample to the FTA: FTA(S) for S = {λ(3), a(4), ab(2), abb(1), bab(3), bb(1)}. [Figure: the frequency prefix tree; at the root, entering frequency 14, halting frequency 3, a: 7, b: 4, and so on down the tree.]
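
Building the FTA is a single pass over the multiset; a sketch in which the states are the prefixes themselves:

```python
from collections import Counter

def build_fta(sample):
    """sample: dict mapping each string to its multiplicity.
    Returns (freq, halt, trans): state, halting and transition frequencies."""
    freq, halt, trans = Counter(), Counter(), Counter()
    for w, m in sample.items():
        for j in range(len(w)):
            freq[w[:j]] += m            # pass through the prefix state
            trans[(w[:j], w[j])] += m   # take the edge labelled w[j]
        freq[w] += m
        halt[w] += m                    # the string halts here
    return freq, halt, trans
```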

Red, Blue and White states: Red states are confirmed states; Blue states are the (non-Red) successors of the Red states; White states are the others. [Figure.] Same as with DFA, and what RPNI does.

Merge and fold: suppose we decide to merge state a with state λ. [Figure: a frequency automaton; the subtree rooted at a is about to be folded into λ.]

Merge and fold: first disconnect state a and reconnect its incoming transition to λ. [Figure.]

Merge and fold: then fold the subtree rooted at a into λ, adding frequencies as you go. [Figure.]

Merge and fold: after folding. [Figure.]

State merging algorithm:
A = FTA(S); Blue = {δ(q_I, a) : a ∈ Σ}; Red = {q_I}
while Blue ≠ ∅ do
  choose q from Blue such that Freq(q) ≥ t₀
  if ∃p ∈ Red such that d(A_p, A_q) is small
  then A = merge_and_fold(A, p, q)
  else Red = Red ∪ {q}
  Blue = {δ(q, a) : q ∈ Red, a ∈ Σ} \ Red
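
In runnable form, the red-blue loop looks as follows; a sketch in which `A.initial`, `A.step`, `A.freq`, `compatible` and `merge_and_fold` are assumed interfaces standing for the pieces discussed around this slide:

```python
def state_merging(A, t0, compatible, merge_and_fold):
    """Generic red-blue state-merging loop over a frequency automaton A."""
    red = {A.initial}
    blue = {A.step(A.initial, a) for a in A.alphabet} - {None}
    while blue:
        q = max(blue, key=A.freq)              # pick a heaviest Blue state
        if A.freq(q) < t0:
            break                              # low-frequency states: special treatment
        for p in sorted(red, key=str):         # deterministic order over Red
            if compatible(A, p, q):
                A = merge_and_fold(A, p, q)    # q disappears into p
                break
        else:
            red.add(q)                         # no compatible Red state: promotion
        blue = {A.step(p, a) for p in red for a in A.alphabet} - red - {None}
    return A
```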

The real question: how do we decide if d(A_p, A_q) is small? We need to use a distance, to be able to compute it, if possible to update the computation easily, and to have properties related to this distance.

Deciding if two distributions are similar: if the two distributions are known, equality can be tested, and the distance (for the L₂ norm) between them can be exactly computed. But what if the two distributions are unknown?

Taking decisions: suppose we want to merge state a with state λ. [Figure: the frequency automaton before the merge.]

Taking decisions: yes, if the two distributions induced (by the automaton at λ and by the subtree rooted at a) are similar. [Figure.]

5. ALERGIA.

ALERGIA's test: D₁ ≈ D₂ if ∀x, Pr_{D₁}(x) ≈ Pr_{D₂}(x). Easier to test: Pr_{D₁}(λ) = Pr_{D₂}(λ) and, ∀a ∈ Σ, Pr_{D₁}(aΣ*) = Pr_{D₂}(aΣ*). And do this recursively! Of course, do it on frequencies.

Hoeffding bounds: the relative frequencies f₁/n₁ and f₂/n₂ are declared sufficiently close when | f₁/n₁ − f₂/n₂ | < (√(1/n₁) + √(1/n₂)) · √(½ · ln(2/α)).
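
This bound is exactly ALERGIA's compatibility test; a sketch:

```python
import math

def alergia_test(f1, n1, f2, n2, alpha):
    """True if the relative frequencies f1/n1 and f2/n2 are sufficiently
    close according to the Hoeffding bound with parameter alpha."""
    if n1 == 0 or n2 == 0:
        return True                     # nothing observed: do not reject
    gamma = abs(f1 / n1 - f2 / n2)
    bound = (math.sqrt(1.0 / n1) + math.sqrt(1.0 / n2)) \
            * math.sqrt(0.5 * math.log(2.0 / alpha))
    return gamma < bound
```

On the first comparison of the run below, for instance, alergia_test(490, 1000, 128, 257, 0.05) measures a difference of about 0.008 against a bound of about 0.128, hence True.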

A run of ALERGIA. [The learning multisample S: 1000 strings over {a, b}, among them λ with multiplicity 490; the remaining 37 entries and their multiplicities were lost in transcription.]

Parameter α is arbitrarily set to 0.05. We choose 30 as the value for the threshold t₀. Note that for the Blue states which have a frequency less than the threshold, a special merging operation takes place.

[Figure: FTA(S), the frequency prefix tree of the 1000-string sample; at the root, entering frequency 1000, halting frequency 490, a: 257, b: 253.]

Cn we merge λ nd? Compre λ nd, Σ* nd Σ*, bσ* nd bσ* 490/000 with 8/57, 57/000 with 64/57, 53/000 with 65/57,.... All tests return true Zdr, August 00 65

Merge a with λ: [figure].

And fold: [figure: after folding, state λ has frequency 1342, halting frequency 660, a: 342 (now a loop) and b: 340].

Next merge? λ with b? [Figure.]

Can we merge λ and b? Compare λ and b, aΣ* and baΣ*, bΣ* and bbΣ*: 660/1342 and 125/340 are different (giving γ = 0.126); on the other hand, (√(1/n₁) + √(1/n₂)) · √(½ · ln(2/α)) = 0.111.

Promotion: the merge is rejected, so b becomes Red. [Figure.]

Merge: [figure: the next Blue state is merged].

And fold: [figure].

Merge: [figure].

And fold: [figure: after the final fold, the resulting DFFA; normalising its frequencies yields the learned DPFA].

Conclusion and logic. ALERGIA builds a DFFA in polynomial time. ALERGIA can identify DPFA in the limit with probability 1. There is no good definition of ALERGIA's properties.

6. DSAI and MDI: why not change the criterion?

Criterion for DSAI: using a distinguishable string, i.e. the L∞ norm. Two distributions are different if there is a string with a very different probability; such a string is called μ-distinguishable. The question becomes: is there a string x such that |Pr_{A,q₁}(x) − Pr_{A,q₂}(x)| > μ?

(There is much more to DSAI.) D. Ron, Y. Singer, and N. Tishby. On the learnability and usage of acyclic probabilistic finite automata. In Proceedings of COLT 1995, pages 31-40, 1995. PAC learnability results, in the case where the targets are acyclic graphs.

Criterion for MDI: an MDL-inspired heuristic. The criterion is: does the reduction of the size of the automaton compensate for the increase in perplexity? F. Thollard, P. Dupont, and C. de la Higuera. Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In Proceedings of the 17th International Conference on Machine Learning, pages 975-982. Morgan Kaufmann, San Francisco, CA, 2000.

7. Conclusion and open questions.

A good candidate to learn NFA is DEES. There has never been a challenge, so the state of the art is still unclear. There is lots of room for improvement towards probabilistic transducers and probabilistic context-free grammars.

Appendix: Stern-Brocot trees and the identification of probabilities. If we were able to discover the structure, how do we identify the probabilities?

By estimation: the edge is used 50 times out of 3000 passages through the state, so its probability is estimated as 50/3000.

Stern-Brocot trees (Stern 1858, Brocot 1860) can be constructed from two simple adjacent fractions by the «mean» (mediant) operation: m(a/b, c/d) = (a+c)/(b+d).

[Figure: the first levels of the Stern-Brocot tree between 0/1 and 1/1: 1/2; 1/3, 2/3; 1/4, 2/5, 3/5, 3/4; ...]

Idea: instead of returning c(x)/n, search the Stern-Brocot tree to find a good simple approximation of this value.
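
A sketch of that search: descend the tree by mediants until the current fraction is close enough to the empirical value (the stopping rule here is illustrative):

```python
def stern_brocot_approx(x, tol):
    """Search the Stern-Brocot tree for a simple fraction near x, 0 <= x <= 1."""
    (a, b), (c, d) = (0, 1), (1, 1)     # enclosing fractions 0/1 and 1/1
    while True:
        num, den = a + c, b + d         # the mediant (a+c)/(b+d)
        if abs(num / den - x) <= tol:
            return num, den
        if x < num / den:
            c, d = num, den             # go left: mediant becomes the right bound
        else:
            a, b = num, den             # go right: mediant becomes the left bound
```

For the estimate of the previous slide, stern_brocot_approx(50/3000, 1e-6) returns (1, 60), the simple fraction 1/60.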

Iterated logarithm: with probability 1, for a co-finite number of values of n we have |c(x)/n − p| < √(λ · (log log n) / n), for λ > 1, where p is the probability being estimated.