Learning Moore Machines from Input-Output Traces

Georgios Giantamidis (1) and Stavros Tripakis (1, 2)
(1) Aalto University, Finland
(2) UC Berkeley, USA

Motivation: learning models from black boxes
[Figure: inputs enter a black box, which produces outputs; a learner observes both and builds a formal model.]
Many applications:
  Verify that a black-box component is safe to use
  Dynamic malware analysis
  ...
Georgios Giantamidis (Aalto University) Learning Moore Machines from Input-Output Traces December 8, 2016

Learning FSMs from input-output traces
[Figure: a set of IO-traces (output words 020, 0122, 0122, 02220, 02220) is fed to the learner, which produces a learned FSM with states q_0 to q_3.]

Outline
1 Background
2 Formal problem definition
3 Related work
4 Identification in the limit
5 Our learning algorithms
6 Results
7 Summary & future work

Moore machines
[Figure: example Moore machine with states q_0, q_1, q_2, q_3, inputs a and b, and outputs 0, 1, 2.]
A Moore machine is a tuple (I, O, Q, q_0, δ, λ):
  input alphabet, I = {a, b}
  output alphabet, O = {0, 1, 2}
  set of states, Q = {q_0, q_1, q_2, q_3}
  initial state, q_0
  transition function, δ : Q × I → Q
  output function, λ : Q → O
By definition, our machines are deterministic and complete.
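As a minimal sketch of the tuple (I, O, Q, q_0, δ, λ), a Moore machine can be held in a small Python class. The class name MooreMachine, its field names, and the two-state example machine (states p0 and p1) are illustrative assumptions; the four-state machine from the slide is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MooreMachine:
    inputs: frozenset    # I
    outputs: frozenset   # O
    states: frozenset    # Q
    initial: str         # q_0
    delta: dict          # transition function: (state, input symbol) -> state
    lam: dict            # output function: state -> output symbol

    def is_complete(self):
        # A dict maps each (state, symbol) to one successor, so the machine
        # is deterministic by construction; completeness must be checked.
        return all((q, a) in self.delta
                   for q in self.states for a in self.inputs)

    def run(self, word):
        # The output word has one more symbol than the input word,
        # because the initial state already emits an output.
        q = self.initial
        out = [self.lam[q]]
        for a in word:
            q = self.delta[(q, a)]
            out.append(self.lam[q])
        return "".join(out)


# Hypothetical two-state machine: 'a' toggles the state, 'b' keeps it.
toggle = MooreMachine(
    inputs=frozenset("ab"),
    outputs=frozenset("01"),
    states=frozenset({"p0", "p1"}),
    initial="p0",
    delta={("p0", "a"): "p1", ("p0", "b"): "p0",
           ("p1", "a"): "p0", ("p1", "b"): "p1"},
    lam={"p0": "0", "p1": "1"},
)

print(toggle.is_complete())  # True
print(toggle.run("aab"))     # 0100
```

Storing δ as a dictionary keyed by (state, symbol) pairs makes determinism automatic and reduces the completeness check to a lookup over Q × I.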

Input-output traces
[Figure: the example Moore machine (states q_0 to q_3) next to some IO-traces generated by the machine, with output words 020, 0122, 0122, 02220, 02220.]

Consistency
[Figure: the example Moore machine (states q_0 to q_3) next to the same set of IO-traces.]
This machine is consistent with this set of traces.

Consistency
[Figure: a different Moore machine (states r_0 to r_3) next to the same set of IO-traces.]
This machine is inconsistent with this set of traces.
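Consistency can be checked mechanically: a machine is consistent with a set of IO-traces if, for every trace, running the machine on the input word yields exactly the recorded output word. The sketch below assumes a dictionary representation of δ and λ; the function names and the two-state example machine are hypothetical, not the machines drawn on the slides.

```python
def run(delta, lam, initial, word):
    """Return the output word the machine produces on the input word."""
    q = initial
    out = [lam[q]]
    for a in word:
        q = delta[(q, a)]
        out.append(lam[q])
    return "".join(out)


def is_consistent(delta, lam, initial, traces):
    """True if the machine reproduces every (input, output) trace."""
    return all(run(delta, lam, initial, word) == out for word, out in traces)


# Hypothetical machine: 'a' toggles between p0 (output 0) and p1 (output 1),
# 'b' leaves the state unchanged.
delta = {("p0", "a"): "p1", ("p0", "b"): "p0",
         ("p1", "a"): "p0", ("p1", "b"): "p1"}
lam = {"p0": "0", "p1": "1"}

good_traces = [("aa", "010"), ("ab", "011")]
bad_traces = [("aa", "010"), ("b", "01")]   # the machine outputs 00 on 'b'

print(is_consistent(delta, lam, "p0", good_traces))  # True
print(is_consistent(delta, lam, "p0", bad_traces))   # False
```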

A first attempt at problem definition
Given...
  Input alphabet, I
  Output alphabet, O
  Set of IO-traces, S (the training set)
... find a Moore machine M such that:
  M is deterministic
  M is complete
  M is consistent with S

A trivial solution
Training set (input / output): b / 01, aa / 020, ab / 022
[Figure: a machine with one state per input prefix: q_ε (output 0), q_a (output 2), q_b (output 1), q_aa (output 0), q_ab (output 2).]
This is called the prefix-tree machine.
Not quite a solution: the machine is incomplete...
... but it is easily completed with self-loops.
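The prefix-tree construction and its completion with self-loops can be sketched as follows. The sketch assumes the three traces on the slide read b / 01, aa / 020, ab / 022, and it names each state after the input prefix that reaches it; the function names are hypothetical.

```python
def prefix_tree_machine(traces):
    """Build the prefix-tree Moore machine: one state per input prefix.

    Returns (delta, lam); the state reached by a prefix is the prefix itself,
    and the initial state is the empty string.
    """
    delta, lam = {}, {}
    for word, out in traces:
        if len(out) != len(word) + 1:
            raise ValueError("output must be one symbol longer than input")
        for i in range(len(word) + 1):
            prefix = word[:i]
            # Two traces sharing a prefix must agree on its output.
            if lam.setdefault(prefix, out[i]) != out[i]:
                raise ValueError(f"conflicting outputs for prefix {prefix!r}")
            if i > 0:
                delta[(word[:i - 1], word[i - 1])] = prefix
    return delta, lam


def complete_with_self_loops(delta, lam, alphabet):
    """Add a self-loop for every missing (state, symbol) pair."""
    completed = dict(delta)
    for q in lam:
        for a in alphabet:
            completed.setdefault((q, a), q)
    return completed


traces = [("b", "01"), ("aa", "020"), ("ab", "022")]
delta, lam = prefix_tree_machine(traces)
full_delta = complete_with_self_loops(delta, lam, "ab")

print(len(lam))                 # 5 states
print(len(delta))               # 4 transitions before completion
print(len(full_delta))          # 10 transitions after completion
print(full_delta[("b", "a")])   # b (a self-loop added by completion)
```

The completed machine is consistent with the training set by construction, but every input that leaves the tree is answered by staying in place, which is the source of the poor generalization.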

Problems with the trivial solution
(1) Poor generalization, due to the trivial completion with self-loops
  The machine may be consistent with the training set...
  ... but how accurate is it on a test set?
(2) Large number of states in the learned machine
  The prefix-tree machine does not merge states at all.

Revised problem definition
The LMoMIO problem (Learning Moore Machines from Input-Output Traces):
Given...
  Input alphabet, I
  Output alphabet, O
  Set of IO-traces, S (the training set)
... find a Moore machine M such that:
  M is deterministic
  M is complete
  M is consistent with S
... and also:
  M generalizes well (good accuracy on a-priori unknown test sets)
  M is small (few states)
  M is found quickly (good learning algorithm complexity)

How to measure accuracy?
We define three metrics: Strong, Medium, Weak

test trace (input / output)   machine output   strong acc.   medium acc.   weak acc.
abc / 1234                    1234             1             1             1
abc / 1234                    4321             0             0             0
abc / 1234                    1212             0             1/2           1/2
abc / 1234                    3434             0             0             1/2
abc / 1234                    1324             0             1/4           1/2
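The slide gives the three metrics only by example. One reading that matches every row of the table is: strong accuracy is 1 only for an exact match, medium accuracy is the length of the longest matching prefix divided by the output length, and weak accuracy is the fraction of positions that match. The sketch below implements that inferred reading for a single test trace; the function names are hypothetical, and both output words are assumed to have the same length.

```python
from fractions import Fraction


def strong_acc(expected, actual):
    """1 if the whole output word matches, else 0."""
    return Fraction(int(expected == actual))


def medium_acc(expected, actual):
    """Length of the longest common prefix over the output length."""
    prefix = 0
    for e, a in zip(expected, actual):
        if e != a:
            break
        prefix += 1
    return Fraction(prefix, len(expected))


def weak_acc(expected, actual):
    """Fraction of positions where the two output words agree."""
    matches = sum(e == a for e, a in zip(expected, actual))
    return Fraction(matches, len(expected))


expected = "1234"   # output word of the test trace abc / 1234
for actual in ["1234", "4321", "1212", "3434", "1324"]:
    print(actual,
          strong_acc(expected, actual),
          medium_acc(expected, actual),
          weak_acc(expected, actual))
```

Under this reading the three metrics are ordered: strong is never larger than medium, and medium is never larger than weak.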

Related work
Active learning:
  L* [Angluin, 1987]
Passive learning:
  Exact: NP-hard [Gold, 1978]
  Heuristic:
    K-tails [Biermann & Feldman, 1972]
    Gold's algorithm [Gold, 1978]
    RPNI [Oncina & Garcia, 1992]
    Genetic algorithms
    Ant colony optimization
    Our work

Identification in the limit
Concept introduced in [Gold, 1967], in the context of formal language learning
Learning is seen as an infinite process
Training set keeps growing: S_0 ⊆ S_1 ⊆ S_2 ⊆ ···
Every input word is guaranteed to eventually appear in the training set
For each S_i, the learner outputs a machine M_i
Identification in the limit := the learner outputs the right machine after some i
A good passive learning algorithm must identify in the limit.

Characteristic samples
To prove identification in the limit, we use the notion of the Characteristic Sample [C. de la Higuera, 2010]:
  The concept exists for DFAs (deterministic finite automata); we adapt it to Moore machines
  Intuition: a set of IO-traces that covers the machine (covers all states, all transitions)
  For a minimal Moore machine M = (I, O, Q, q_0, δ, λ), there exists a CS of total length O(|Q|^4 · |I|)
Characteristic Sample Requirement (CSR): A learning algorithm satisfies CSR if the following holds:
  If the training set S is a characteristic sample of a minimal machine M, then the algorithm learns from S a machine isomorphic to M.
CSR can be shown to imply identification in the limit

Three learning algorithms
PTAP - Prefix Tree Acceptor Product
PRPNI - Product RPNI
MooreMI - Moore Machine Inference

PTAP - Prefix Tree Acceptor Product
This is the trivial solution we discussed earlier:
[Figure: the prefix-tree machine for the traces b / 01, aa / 020, ab / 022 (states q_ε, q_a, q_b, q_aa, q_ab), completed with self-loops.]
Drawbacks:
  Large number of states in the learned machine
  Poor generalization / accuracy

PRPNI - Product RPNI
Observations:
  A DFA is a special case of a Moore machine with binary output (accept/reject)
  A Moore machine can be encoded as a product of ⌈log_2 |O|⌉ DFAs
Based on these observations, PRPNI works as follows:
  Uses the RPNI algorithm [J. Oncina and P. Garcia, 1992], which learns DFAs
  Learns several DFAs that encode the learned Moore machine
  Computes the product of the learned DFAs and completes it
Drawbacks:
  DFAs are learned separately, therefore they do not have the same state-transition structure ⇒ state explosion during product computation
  Invalid output codes

Invalid output codes
Output alphabet: O = {0, 1, 2}
Binary encoding of O: f = {0 ↦ 00, 1 ↦ 01, 2 ↦ 10}
[Figure: two separately learned DFAs (states q_0 to q_2 and r_0, r_1) and their product (states s_0 to s_5), whose states carry the output codes 00, 11, 00, 01, 10, 01.]
Invalid output code: 11 does not correspond to any output symbol
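The encoding step can be illustrated with a short sketch. It assumes that output symbols are numbered in sorted order, and that the DFA for bit k accepts an input prefix exactly when bit k of the output code reached by that prefix is 1. That projection is one plausible reading of "several DFAs that encode the Moore machine", not a detail stated on the slides, and all function names are hypothetical.

```python
def binary_codes(outputs):
    """Assign each output symbol a fixed-width binary code."""
    symbols = sorted(outputs)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    width = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


def project_trace(word, out, codes, bit):
    """Split one IO-trace into positive/negative examples for one DFA.

    A prefix is a positive example if the chosen bit of its output code is 1.
    """
    positive, negative = set(), set()
    for i, symbol in enumerate(out):
        target = positive if codes[symbol][bit] == "1" else negative
        target.add(word[:i])
    return positive, negative


def decode(bits, codes):
    """Map a bit string back to an output symbol, or None if it is invalid."""
    inverse = {code: symbol for symbol, code in codes.items()}
    return inverse.get(bits)


codes = binary_codes({"0", "1", "2"})
print(codes)                                  # {'0': '00', '1': '01', '2': '10'}
print(project_trace("ab", "022", codes, 0))   # positives: 'a', 'ab'; negatives: ''
print(decode("10", codes))                    # 2
print(decode("11", codes))                    # None: invalid output code
```

With three output symbols, two bits leave the code 11 unused, so a product of independently learned DFAs can reach a state whose bits decode to no symbol at all.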

MooreMI - Moore Machine Inference
Modified RPNI, tailored to Moore machine learning
Like PRPNI, learns several DFAs that encode the learned Moore machine
Unlike PRPNI, the learned DFAs maintain the same state-transition structure
Therefore, no state explosion during product computation
No invalid output codes either

Results
Theorem 1: All three algorithms return Moore machines consistent with the IO-traces received as input.
Theorem 2: The MooreMI algorithm satisfies the characteristic sample requirement and identifies in the limit.
Experimental evaluation result: MooreMI is better not just in theory, but also in practice

Summary
Learning deterministic, complete Moore machines from input-output traces
Characteristic sample for Moore machines
Three algorithms to solve the problem
The MooreMI algorithm identifies in the limit

Future work
Extend to Mealy machines
Learning symbolic machines
Learning from traces and formal requirements (e.g. LTL formulas)
Industrial case studies
Thank you! Questions?

References
E. M. Gold. Language identification in the limit. Information and Control, 10(5):447-474, 1967.
A. W. Biermann and J. A. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE Trans. Comput., 21(6):592-597, June 1972.
E. M. Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302-320, 1978.
D. Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87-106, 1987.
J. Oncina and P. Garcia. Identifying regular languages in polynomial time. In Advances in Structural and Syntactic Pattern Recognition, pages 99-108, 1992.
C. de la Higuera. Grammatical Inference: Learning Automata and Grammars. CUP, 2010.