A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data


Sicco Verwer¹, Mathijs de Weerdt², and Cees Witteveen²
¹ Eindhoven University of Technology
² Delft University of Technology
s.verwer@tue.nl, {m.m.deweerdt,c.witteveen}@tudelft.nl

Abstract. We adapt an algorithm (RTI) for identifying (learning) a deterministic real-time automaton (DRTA) to the setting of positive timed strings (or time-stamped event sequences). A DRTA can be seen as a deterministic finite state automaton (DFA) with time constraints. Because DRTAs model time using numbers, they can be exponentially more compact than equivalent DFA models that model time using states. We use a new likelihood-ratio statistical test for checking consistency in the RTI algorithm. The result is the RTI+ algorithm, which stands for real-time identification from positive data. RTI+ is an efficient algorithm for identifying DRTAs from positive data. We show using artificial data that RTI+ is capable of identifying DRTAs that are large enough to model real-world real-time systems.

1 Introduction

In previous work [11], we described the RTI algorithm for identifying (learning) deterministic real-time automata (DRTAs) from labeled data, i.e., from an input sample $S = (S_+, S_-)$. The RTI algorithm is based on the currently best-performing algorithm for the identification of deterministic finite state automata (DFAs), called evidence-driven state-merging (EDSM) [9]. The only difference between DFAs and DRTAs is that DRTAs contain time constraints. In addition to using the standard state-merging techniques, RTI identifies these time constraints by splitting transitions into two; see [11] for details. The RTI algorithm is efficient in both run-time and convergence because it is a special case of an efficient algorithm for identifying one-clock timed automata [12]. In practice, however, it can be difficult to apply RTI, because data can often only be obtained from actual observations of the process to be modeled.
From such observations we only obtain timed strings that have actually been generated by the system. In other words, we only have access to the positive data $S_+$. In this paper, we adapt the RTI algorithm to this setting. A straightforward way to do this is to make the model probabilistic and to check for consistency using statistics. This has been done many times, and in different ways, for the problem of identifying (probabilistic) DFAs, see, e.g., [2, 8, 3]. As far as we know, this is the first time such an approach is applied to the problem of identifying DRTAs.

We start this paper by defining DRTAs and probabilistic DRTAs (PDRTAs, Section 2). In addition to a DRTA structure, a PDRTA contains parameters that model the probabilities of events in the DRTA structure. In order to identify a PDRTA, we thus need to solve two different identification problems: the first is to identify the correct DRTA structure, and the second is to set the probabilistic parameters of this model correctly. However, because a PDRTA is a deterministic model, we can simply set these parameters to the normalized frequency counts of events in the input sample $S_+$.³ This is very easy to compute, and it is the unique maximum-likelihood setting of the parameters given the data. We therefore focus on identifying the DRTA structure of a PDRTA.

We introduce a new likelihood-ratio test that can be used to solve this identification problem (Section 3). Intuitively, this test tests the null-hypothesis that the suffixes of strings that can occur after two different states have been reached follow the same PDRTA distribution, i.e., whether these two states can be modeled using a single state in a PDRTA. If this null-hypothesis is rejected with sufficient confidence, then this is considered to be evidence that these two states should not be merged. Equivalently, if these two states result from a split of a transition, then this is evidence that this transition should be split. In this way, the statistical evidence resulting from these tests replaces the evidence value in the original RTI algorithm. The result is the RTI+ algorithm (Section 3.3), which stands for real-time identification from positive data. RTI+ is an efficient algorithm for identifying DRTAs from positive data, and its likelihood-ratio test is designed specifically for identifying a PDRTA from positive data.
Although many algorithms like RTI+ exist for the problem of identifying (probabilistic) DFAs, none of these algorithms uses the non-timed version of the likelihood-ratio test of RTI+. Hence, since this test can easily be modified in order to identify (probabilistic) DFAs, it also contributes to the current state-of-the-art in DFA identification. In order to evaluate the performance of the RTI+ algorithm, we show a typical result of RTI+ when run on data generated from a random PDRTA (Section 4). This result shows that our algorithm is capable of identifying real-time systems that are complex enough to be useful in practice. We end this paper with some conclusions and pointers for future work (Section 5).

2 Probabilistic Deterministic Real-Time Automata

The following exposition uses basic notation from language, automata, and complexity theory. For an introduction the reader is referred to [10]. In the following, we first describe non-probabilistic real-time automata, then we show how to add probability distributions to these models.

³ In the case of a non-deterministic model, setting the model parameters is a lot harder. In fact, it can be as difficult as identifying the model itself.

Fig. 1: An example of a DRTA. The leftmost state is the start state, indicated by the sourceless arrow. The topmost state is an end state, indicated by the double circle. Every state transition contains both a label a or b and a delay guard $[n, n']$. Missing transitions lead to a rejecting garbage state.

2.1 Real-Time Automata

In a real-time system, each occurrence of a symbol (event) is associated with a time value, i.e., its time of occurrence. We model these time values using the natural numbers $\mathbb{N}$. This is sufficient because in practice we always deal with a finite precision of time, e.g., milliseconds. Timed automata [1] can be used to accept or generate a sequence $\tau = (a_1, t_1)(a_2, t_2)(a_3, t_3)\ldots(a_n, t_n)$ of symbols $a_i \in \Sigma$ paired with time values $t_i \in \mathbb{N}$, called a timed string. Every time value $t_i$ in a timed string represents the time (delay) until the occurrence of symbol $a_i$ since the occurrence of the previous symbol $a_{i-1}$. In timed automata, timing conditions are added using a finite number of clocks and a clock guard for each transition. In this paper, we use a class of timed automata known as real-time automata (RTAs) [5]. An RTA has only one clock, which represents the time delay between two consecutive events. The clock guards for the transitions are then constraints on this time delay. When trying to identify an RTA from data, one can always determine an upper bound on the possible time delays by taking the maximum observed delay in this data. Therefore, we represent a delay guard $[n, n']$ by a closed interval in $\mathbb{N}$.

Definition 1. (RTA) A real-time automaton (RTA) is a 5-tuple $A = \langle Q, \Sigma, \Delta, q_0, F\rangle$, where $Q$ is a finite set of states, $\Sigma$ is a finite set of symbols, $\Delta$ is a finite set of transitions, $q_0$ is the start state, and $F \subseteq Q$ is a set of accepting states. A transition $\delta \in \Delta$ in an RTA is a tuple $\langle q, q', a, [n, n']\rangle$, where $q, q' \in Q$ are the source and target states, $a \in \Sigma$ is a symbol, and $[n, n']$ is a delay guard.

Due to the complexity of identifying non-deterministic automata (see [4]), we only consider deterministic RTAs (DRTAs).
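Definition 1 can be made concrete with a short Python sketch. It is a minimal illustration, not code from the paper: the names `Transition`, `RTA`, `is_deterministic`, and `accepts` are hypothetical, delays are non-negative integers, and the sketch also encodes the determinism condition and firing rule stated in the next paragraph (no two transitions with the same source state, the same symbol, and overlapping guards; a transition fires only if its guard contains the delay). As in Figure 1, a missing transition is treated as a jump to a rejecting garbage state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass(frozen=True)
class Transition:
    """A transition <q, q', a, [n, n']> of a real-time automaton."""
    source: int
    target: int
    symbol: str
    lo: int  # n: smallest delay accepted by the guard (inclusive)
    hi: int  # n': largest delay accepted by the guard (inclusive)

@dataclass
class RTA:
    """A real-time automaton <Q, Sigma, Delta, q0, F> with a single clock."""
    states: Set[int]
    alphabet: Set[str]
    transitions: List[Transition]
    start: int
    accepting: Set[int] = field(default_factory=set)

    def is_deterministic(self) -> bool:
        # Deterministic: no two transitions share source and symbol
        # while their delay guards overlap.
        ts = self.transitions
        for i, d in enumerate(ts):
            for e in ts[i + 1:]:
                if (d.source == e.source and d.symbol == e.symbol
                        and d.lo <= e.hi and e.lo <= d.hi):
                    return False
        return True

    def step(self, state: int, symbol: str, delay: int) -> Optional[int]:
        # A transition can fire only if its guard contains the delay,
        # i.e., the time spent in the current state.
        for d in self.transitions:
            if d.source == state and d.symbol == symbol and d.lo <= delay <= d.hi:
                return d.target
        return None  # missing transition: implicit rejecting garbage state

    def accepts(self, timed_string: List[Tuple[str, int]]) -> bool:
        state: Optional[int] = self.start
        for symbol, delay in timed_string:
            state = self.step(state, symbol, delay)
            if state is None:
                return False
        return state in self.accepting

# Hypothetical three-state DRTA (not the one in Figure 1).
drta = RTA({0, 1, 2}, {"a", "b"},
           [Transition(0, 1, "a", 0, 5), Transition(0, 2, "a", 6, 10),
            Transition(1, 2, "b", 0, 10)],
           start=0, accepting={2})
print(drta.is_deterministic())             # True
print(drta.accepts([("a", 3), ("b", 7)]))  # True: 0 -> 1 -> 2
print(drta.accepts([("a", 11)]))           # False: no guard contains 11
```

The pairwise determinism check is quadratic in the number of transitions, which is fine for illustration; grouping transitions by source and symbol and sorting their guards would scale better.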
An RTA $A$ is called deterministic if $A$ does not contain two transitions with the same symbol, the same source state, and overlapping delay guards. Like timed automata, DRTAs can make time transitions in addition to the normal state transitions used in DFAs. In other words, during its execution a DRTA can remain in the same state for a while before it generates the next symbol. The time it spends in every state is represented by the time values of a timed string. In a DRTA, a state transition is possible (can fire) only if its delay guard contains the time spent in the previous state. A transition $\langle q, q', a, [n, n']\rangle$ of a DRTA is thus interpreted as follows: whenever the automaton is in state $q$ and reads a timed symbol $(a, t)$ such that $t \in [n, n']$, the DRTA moves to the next state $q'$.

Example 1. Figure 1 shows an example of a DRTA. This DRTA accepts and rejects timed strings not only based on their event symbols, but also based on their time values. For instance, it accepts ( , 4)( , 3) (state sequence: left, bottom, top) and ( , 6)( , 5)( , 6) (left, top, left, top), and rejects ( , 6)( , 2) (left, top, reject) and ( , 5)( , 5)( , 6) (left, bottom, top, left).

2.2 Adding Probability Distributions

In order to identify a DRTA from positive data $S_+$, we need to model a probability distribution for timed strings using a DRTA structure. Identifying a DRTA then consists of fitting this distribution and the model structure to the data available in $S_+$. We want to adapt RTI [11] to identify such probabilistic DRTAs (PDRTAs). Since they have the same structure as DRTAs, we only need to decide how to represent the probability of observing a certain timed event $(a, t)$ given the current state $q$ of the PDRTA, i.e., $\Pr(O = (a, t) \mid q)$. In order to determine the probability distribution of this random variable $O$, we require two distributions for every state $q$ of the PDRTA: one for the possible symbols, $\Pr(S = a \mid q)$, and one for the possible time values, $\Pr(T = t \mid q)$. The probability of the next state, $\Pr(X = q' \mid q)$, is determined by these two distributions because the PDRTA model is deterministic.

The distribution over events $\Pr(S = a \mid q)$ that we use is the standard generalization of the Bernoulli distribution, i.e., every symbol $a$ has some probability $\Pr(S = a \mid q)$ given the current state $q$, and it holds that $\sum_{a \in \Sigma} \Pr(S = a \mid q) = 1$ (also known as the multinomial distribution). This is the most straightforward choice for a distribution function, and it is used in many probabilistic models, such as Markov chains, hidden Markov models, and probabilistic automata.

A flexible way to model a distribution over time $\Pr(T = t \mid q)$ is by using histograms. A histogram divides the domain of the distribution (in our case time) into a fixed number of bins $H$. Every bin $[v, v'] \in H$ is an interval in $\mathbb{N}$.
The distributions inside the bins are modeled uniformly, i.e., for all $[v, v'] \in H$ and all $t, t' \in [v, v']$, $\Pr(T = t \mid q) = \Pr(T = t' \mid q)$. Naturally, all these probabilities have to sum to one: $\sum_{t \in \mathbb{N}} \Pr(T = t \mid q) = 1$. Using histograms to model the time distribution might look simple, but it is very effective. In fact, it is a common way to model time in hidden semi-Markov models, see, e.g., [6]. The price of using a histogram to model time is that we need to specify the number and the sizes (division points) $[v, v']$ of the histogram bins. Choosing these values boils down to making a tradeoff between the model complexity and the amount of data required to identify the model. More bins lead to a more complex model that is capable of modeling the time distribution more accurately, but it requires more data in order to do so. To simplify matters, we assume that these bins are specified beforehand, for example by a domain expert, or by performing data analysis.
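A minimal sketch of this histogram model, assuming the bins are closed integer intervals that partition the observed time range; the helper names are hypothetical. `fit_bin_probs` estimates the bin probabilities by normalized counts, and `time_probability` spreads each bin's probability evenly over its $v' - v + 1$ time points, which is exactly the uniformity requirement above. Exact fractions are used so that the normalization can be checked without rounding.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Bin = Tuple[int, int]  # closed interval [v, v'] of time values

def fit_bin_probs(times: Sequence[int], bins: List[Bin]) -> List[Fraction]:
    # Estimate Pr(T in h | q) as the normalized frequency of each bin.
    counts = [sum(1 for t in times if v <= t <= v2) for v, v2 in bins]
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

def time_probability(t: int, bins: List[Bin], bin_probs: Sequence[Fraction]) -> Fraction:
    # Pr(T = t | q): the mass of the bin containing t, spread uniformly
    # over the v' - v + 1 time points of that bin.
    for (v, v2), p in zip(bins, bin_probs):
        if v <= t <= v2:
            return Fraction(p) / (v2 - v + 1)
    return Fraction(0)  # t lies outside every bin

H = [(0, 2), (3, 4), (5, 6), (7, 10)]   # the histogram used in Example 2
probs = fit_bin_probs([1, 3, 3, 9], H)  # hypothetical observed delays
print(time_probability(4, H, probs))    # 1/4
```

Wider bins need fewer parameters but blur the time distribution inside them, which is the tradeoff described above.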

Fig. 2: A probabilistic DRTA. Every state is associated with a probability distribution over events and over time. The distribution over time is modeled using histograms. The bin sizes of the histograms are predetermined but left out for clarity.

In addition to choosing how to model the time and symbol distributions, we need to decide whether to make these two distributions dependent or independent. It is common practice to make these distributions independent, see, e.g., [6]. In this case, the time distribution represents a distribution over the waiting (or sojourn) time of every state. In some cases, however, it makes sense to let the time spent in a state depend on the generated symbol. By modeling this dependence, the model can deal with cases where some events are generated more quickly than others. Unfortunately, this dependence comes with a cost: the size of the model is increased by a polynomial factor (the product of the sizes of the distributions). Due to this blowup, we require a lot more data in order to identify a similarly sized PDRTA. This is our main reason for modeling these two distributions independently. This results in the following PDRTA model:

Definition 2. (PDRTA) A probabilistic DRTA (PDRTA) $A$ is a quadruple $\langle A', H, S, T\rangle$, where $A' = \langle Q, \Sigma, \Delta, q_0\rangle$ is a DRTA without final states, $H$ is a finite set of bins (time intervals) $[v, v']$, $v, v' \in \mathbb{N}$, known as the histogram, $S$ is a finite set of symbol probability distributions $S_q = \{\Pr(S = a \mid q) \mid a \in \Sigma, q \in Q\}$, and $T$ is a finite set of time-bin probability distributions $T_q = \{\Pr(T \in h \mid q) \mid h \in H, q \in Q\}$.

The DRTA without final states specifies the structure of the PDRTA. The symbol- and time-probabilities $S$ and $T$ specify the probabilistic properties of a PDRTA. The probabilities in these sets are called the parameters of $A$. However, in every set $S_q$ and $T_q$, the value of one of these parameters follows from the others because their values have to sum to 1. Hence, there are $(|S_q| - 1) + (|T_q| - 1)$ parameters per state $q$ of our PDRTA model.
The probability that the next time value equals $t$ given that the current state is $q$ is defined as
$$\Pr(T = t \mid q) = \frac{\Pr(T \in h \mid q)}{v' - v + 1},$$
where $h = [v, v'] \in H$ is such that $t \in [v, v']$. Thus, in every time-bin the probabilities of the individual time points are modeled uniformly. The probability of an observation $(a, t)$ given that the current state is $q$ is defined as
$$\Pr(O = (a, t) \mid q) = \Pr(S = a \mid q) \cdot \Pr(T = t \mid q).$$

Thus, the distributions over events and time are modeled to be independent.⁴ The probability of the next state $q'$ given the current state $q$ is defined as
$$\Pr(X = q' \mid q) = \sum_{\langle q, q', a, [n, n']\rangle \in \Delta} \; \sum_{t \in [n, n']} \Pr(O = (a, t) \mid q).$$
Because the model is deterministic, every observation $(a, t)$ fires at most one transition leaving $q$, so this next-state distribution is fully determined by the distribution over observations. A PDRTA models a distribution over timed strings $\Pr(O = \tau)$, defined using the computation of a PDRTA:

Definition 3. (PDRTA computation) A finite computation of a PDRTA $A = \langle Q, \Sigma, \Delta, q_0, H, S, T\rangle$ over a timed string $\tau = (a_1, t_1)\ldots(a_n, t_n)$ is a finite sequence
$$q_0 \xrightarrow{(a_1, t_1)} q_1 \cdots q_{n-1} \xrightarrow{(a_n, t_n)} q_n$$
such that for all $1 \le i \le n$, $\langle q_{i-1}, q_i, a_i, [n_i, n_i']\rangle \in \Delta$ and $t_i \in [n_i, n_i']$. The probability of $\tau$ given $A$ is defined as
$$\Pr(O = \tau \mid A) = \prod_{1 \le i \le n} \Pr(O = (a_i, t_i) \mid q_{i-1}, H, S, T).$$

Example 2. Figure 2 shows a PDRTA $A$. Let $H = \{[0, 2], [3, 4], [5, 6], [7, 10]\}$ be the histogram. In every bin the distribution over time values is uniform. We can use $A$ as a predictor of timed events. For example, the probability of ( , 3)( , 1)( , 9)( , 5), whose delays fall into the bins $[3, 4]$, $[0, 2]$, $[7, 10]$, and $[5, 6]$ of widths 2, 3, 4, and 2, is
$$0.5 \cdot \frac{0.2}{2} \cdot 0.5 \cdot \frac{0.3}{3} \cdot 0.8 \cdot \frac{0.25}{4} \cdot 0.5 \cdot \frac{0.4}{2} = 1.25 \cdot 10^{-5}.$$

A PDRTA essentially models a certain type of distribution over timed strings. An input sample $S_+$ can be seen as a sample drawn from such a distribution. The problem of identifying a PDRTA then consists of finding the distribution that generated this sample. We now describe how we adapt RTI in order to identify a PDRTA from such a sample.

3 Identifying PDRTAs from positive data

In this section, we adapt the RTI algorithm for the identification of DRTAs from labeled data (see [11]) to the setting of positive data. The result is the RTI+ algorithm, which stands for real-time identification from positive data. Given a set of observed timed strings $S_+$, the goal of RTI+ is to find a PDRTA that describes the real-time process that generated $S_+$. Note that, because RTI+ uses statistics (occurrence counts) to find this PDRTA, $S_+$ is a multi-set, i.e., $S_+$ can contain the same timed string multiple times. Like RTI (see [11] for details), RTI+ starts with an augmented prefix tree acceptor (APTA).
However, since we only have positive data available, the APTA will not contain rejecting states. Moreover, since the points in time where the observations are stopped are arbitrary, it also does not contain accepting states. Thus, the initial PDRTA simply is the prefix tree of $S_+$, see Figure 3.

⁴ Modeling dependencies between events and time values is possible, but this comes with a cost: the number of parameters of the model is increased by a polynomial factor. This blowup also increases the amount of data required for identification.
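The prefix tree can be built in one pass over $S_+$. The following sketch assumes $S_+$ is given as (multiplicity, timed string) pairs and that the histogram is fixed in advance; all names are hypothetical. Because every delay guard of a fresh prefix tree spans the full observed delay range (Figure 3), a node is identified by its untimed prefix alone. For every node the sketch records symbol and time-bin frequencies; normalizing them gives the frequency-count parameter estimates mentioned in the introduction.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def bin_index(delay, bins):
    # Index of the closed histogram bin [v, v'] that contains the delay.
    for i, (v, v2) in enumerate(bins):
        if v <= delay <= v2:
            return i
    raise ValueError(f"delay {delay} is not covered by any bin")

def prefix_tree(sample, bins):
    """Prefix tree of a multi-set S+ given as (multiplicity, timed_string) pairs.

    Returns per-node symbol counts, per-node time-bin counts, and the initial
    delay guard [min delay, max delay] shared by every edge. A node is keyed
    by its untimed prefix, i.e., the tuple of symbols read so far.
    """
    symbol_counts = defaultdict(Counter)  # node -> symbol -> frequency
    bin_counts = defaultdict(Counter)     # node -> bin index -> frequency
    delays = []
    for multiplicity, timed_string in sample:
        node = ()
        for symbol, delay in timed_string:
            symbol_counts[node][symbol] += multiplicity
            bin_counts[node][bin_index(delay, bins)] += multiplicity
            delays.append(delay)
            node += (symbol,)
    return symbol_counts, bin_counts, (min(delays), max(delays))

def normalized(counter):
    # Maximum-likelihood parameters of a deterministic model: relative frequencies.
    total = sum(counter.values())
    return {key: Fraction(count, total) for key, count in counter.items()}

# Hypothetical sample: "2 x (a,3)(b,12)" means the timed string occurs twice.
S_plus = [(2, [("a", 3), ("b", 12)]), (1, [("a", 40)]), (1, [("b", 7)])]
symbols, bins_seen, guard = prefix_tree(S_plus, [(0, 10), (11, 100)])
print(guard)                    # (3, 40)
print(normalized(symbols[()]))  # {'a': Fraction(3, 4), 'b': Fraction(1, 4)}
```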

Fig. 3: A prefix tree. It is identical to an augmented prefix tree acceptor, but without accepting and rejecting states. The bounds of the delay guards are initialized to the minimum and maximum observed time value.

Starting from a prefix tree, our original algorithm tries to merge states and split transitions using a red-blue framework. A merge is the standard state-merging operation used in DFA identification algorithms such as EDSM [9]. A split can be seen as the opposite of a merge. A split of a transition $\delta$ requires a time value $t$ and uses this to divide $\delta$, its delay guard $[n, n']$, and the part of the PDRTA reached afterwards into two parts. The first part is reached by the timed strings that fire $\delta$ with a delay value less than or equal to $t$, creating a new delay guard $[n, t]$. The second part is reached by timed strings for which this value is greater than $t$, creating a delay guard $[t + 1, n']$. The parts of the PDRTA reached after firing $\delta$ are reconstructed as new prefix trees, using the suffixes of the timed strings that reach these parts as input sample. See [11] for more information on the split operation.

RTI+ uses exactly the same operations and framework as RTI. The only difference is the evidence value we use. Originally, the evidence was based on the number of positive and negative examples that end in the same state. For RTI+, we require an evidence value that uses only positive examples and that disregards which states these examples end in. We use a likelihood-ratio test for this purpose. We now describe this test and explain how we use it both as an evidence value and as a consistency check.

3.1 A likelihood-ratio test for state-merging

The likelihood-ratio test (see, e.g., [7]) is a common way to test nested hypotheses. A hypothesis $H$ is called nested within another hypothesis $H'$ if the possible distributions under $H$ form a strict subset of the possible distributions under $H'$. Less formally, this means that $H$ can be created by constraining $H'$. Thus, by definition $H'$ has more unconstrained parameters (or degrees of freedom) than $H$.
Given two hypotheses $H$ and $H'$ such that $H$ is nested in $H'$, and a data set $S_+$, the likelihood-ratio test statistic is computed by
$$LR = \frac{\mathit{likelihood}(S_+, H)}{\mathit{likelihood}(S_+, H')},$$
where $\mathit{likelihood}$ is a function that returns the maximized likelihood of a data set under a hypothesis, i.e., $\mathit{likelihood}(S_+, H)$ is the maximum probability (with optimized parameter settings) of observing $S_+$ under the assumption that $H$ was used to generate the data. Let $H$ and $H'$ have $n$ and $n'$ parameters respectively. Since $H$ is nested in $H'$, the maximized likelihood of $S_+$ under $H'$ is always at least as large as the maximized likelihood under $H$. Hence, the likelihood-ratio $LR$ is a value between 0 and 1. When the difference between $n'$ and $n$ grows, the likelihood under $H'$ can be optimized more, and hence $LR$ will be closer to 0. Thus, we can increase the likelihood of the data $S_+$ by using a different model (hypothesis) $H'$, but at the cost of using $n' - n$ more parameters. The likelihood-ratio test can be used to test whether this increase in likelihood is statistically significant. The test compares the value $-2\ln(LR)$ to a $\chi^2$ distribution with $n' - n$ degrees of freedom. The result of this comparison is a p-value. A high p-value indicates that $H$ is a better model, since then the observed increase in likelihood is likely to be explained by the $n' - n$ extra parameters alone. A low p-value indicates that $H'$ is a better model.

Fig. 4: The likelihood-ratio test. We test whether using the left model (two prefix trees) instead of the right model (a single prefix tree) results in a significant increase in the likelihood of the data with respect to the number of additional parameters (used to model the state distributions).

Applying the likelihood-ratio test to state-merging and transition-splitting is remarkably straightforward. Suppose that we want to test whether we should perform a merge of two states. Thus, we have to make a choice between two PDRTAs (models): the PDRTA $A$ resulting from the merge of these states, and the PDRTA $A'$ before merging these states. Clearly, $A$ is nested in $A'$. Thus all we need to do is compute the maximized likelihood of $S_+$ under $A$ and $A'$, and apply the likelihood-ratio test. Since PDRTAs are deterministic, the maximized likelihood can be computed simply by setting all the probabilities in the PDRTAs to their normalized counts of occurrence in $S_+$.
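The test for a merge can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it ignores time bins (as Example 3 below does), represents each state by its symbol counts, counts $|\Sigma| - 1$ free parameters for every state that emits at least one event (an assumed counting rule that reproduces the 5 versus 3 parameters of Example 3), and uses the closed form of the $\chi^2$ tail for integer degrees of freedom instead of a statistics library. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable X with a positive integer df.

    Closed form of the regularized upper incomplete gamma function Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{k < m} y^k / k!
        total, term = 0.0, math.exp(-y)
        for k in range(df // 2):
            total += term
            term *= y / (k + 1)
        return total
    # Odd df: Q(1/2, y) = erfc(sqrt(y)), then add y^(k+1/2) e^(-y) / Gamma(k+3/2).
    total = math.erfc(math.sqrt(y))
    for k in range((df - 1) // 2):
        total += math.exp((k + 0.5) * math.log(y) - y - math.lgamma(k + 1.5))
    return total

def log_likelihood(states: List[Sequence[int]]) -> float:
    # Maximized log-likelihood of a deterministic model: every parameter is set
    # to its relative frequency within its state.
    ll = 0.0
    for counts in states:
        n = sum(counts)
        ll += sum(c * math.log(c / n) for c in counts if c > 0)
    return ll

def lr_test(unmerged, merged, alphabet_size: int) -> Tuple[float, int, float]:
    """Test the merged model (nested) against the unmerged one.

    Assumption: every state that emits at least one event has
    alphabet_size - 1 free parameters. Returns (statistic, df, p-value).
    """
    def n_params(states):
        return sum(alphabet_size - 1 for counts in states if sum(counts) > 0)

    statistic = -2.0 * (log_likelihood(merged) - log_likelihood(unmerged))
    df = n_params(unmerged) - n_params(merged)
    return statistic, df, chi2_sf(statistic, df)

# Symbol counts per state from Example 3, as [count of a, count of b].
top, bottom = [[40, 10], [10, 20]], [[20, 20], [20, 0], [20, 0]]
merged = [[60, 30], [30, 20], [20, 0]]
stat, df, p = lr_test(top + bottom, merged, alphabet_size=2)
print(round(stat, 2), df, f"{p:.3e}")  # 38.19 2 5.093e-09
```

With two degrees of freedom the $\chi^2$ tail is simply $e^{-x/2}$, so the p-value equals the likelihood ratio itself; this is why Example 3 reports the same number, $5.093 \cdot 10^{-9}$, for both.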
We now show how to use this test in order to determine whether to perform a merge using an example.

Example 3. For simplicity, we disregard the time values of timed strings and the timed properties of PDRTAs. Suppose we want to test whether to merge the two root states of the prefix trees of Figure 4. These two prefix trees are parts of the PDRTA we are currently trying to identify. Hence only some strings from $S_+$ reach the top tree, and some reach the bottom tree.

Let $S = \{10 \times a, 10 \times aa, 20 \times ab, 10 \times b\}$ and $S' = \{20 \times aa, 20 \times ba\}$ be the suffixes of these strings starting from the point where they reach the root state of the top and bottom tree respectively, where $n \times \tau$ means that the (timed) string $\tau$ occurs $n$ times. We first set all the parameters of the top tree in such a way that the likelihood of $S$ is maximized: $p_{a,q_0} = \frac{4}{5}$, $p_{b,q_0} = \frac{1}{5}$, $p_{a,q_1} = \frac{1}{3}$, $p_{b,q_1} = \frac{2}{3}$ (this is easy because the model is deterministic). We do the same for the bottom tree and $S'$: $p_{a,q_0'} = \frac{1}{2}$, $p_{b,q_0'} = \frac{1}{2}$, $p_{a,q_1'} = 1$, $p_{a,q_2'} = 1$. We can now compute the probability of $S$ under the top tree,
$$p_1 = \left(\tfrac{4}{5}\right)^{40} \left(\tfrac{1}{5}\right)^{10} \left(\tfrac{1}{3}\right)^{10} \left(\tfrac{2}{3}\right)^{20} \approx 6.932 \cdot 10^{-20},$$
and the probability of $S'$ under the bottom tree,
$$p_2 = \left(\tfrac{1}{2}\right)^{20} \left(\tfrac{1}{2}\right)^{20} \approx 9.095 \cdot 10^{-13}.$$
Next, we set the parameters of the right tree to maximize the likelihood of $S \cup S'$: $p_{a,q_0} = \frac{2}{3}$, $p_{b,q_0} = \frac{1}{3}$, $p_{a,q_1} = \frac{3}{5}$, $p_{b,q_1} = \frac{2}{5}$, $p_{a,q_2} = 1$, and compute the likelihood of the data under the right (merged) tree:
$$p_3 = \left(\tfrac{2}{3}\right)^{60} \left(\tfrac{1}{3}\right)^{30} \left(\tfrac{3}{5}\right)^{30} \left(\tfrac{2}{5}\right)^{20} \approx 3.211 \cdot 10^{-40}.$$
We multiply the top and bottom tree probabilities in order to get the likelihood of the data under the left (un-merged) model, and use this to compute the likelihood-ratio:
$$LR = \frac{p_3}{p_1 \cdot p_2} \approx 5.093 \cdot 10^{-9}.$$
The value that we need to compare to the $\chi^2$ distribution then becomes $-2\ln(LR) \approx 38.19$. Per state, $|\Sigma| - 1$ parameters are used. In the un-merged model, the number of (untimed) parameters is 5; in the merged model it is 3. A likelihood-ratio test with these values, i.e., with $5 - 3 = 2$ degrees of freedom, results in a p-value of $5.093 \cdot 10^{-9}$. This is far less than 0.05, and hence the merge results in a significantly worse model.

Testing whether to perform a split of a transition can be done in a similar way. When we want to decide whether to perform a split, we also have to make a choice between two PDRTAs: the PDRTA $A$ before splitting, and the PDRTA $A'$ after splitting. $A$ is again nested in $A'$, and hence we can perform the likelihood-ratio test in the same way.

3.2 Dealing with small frequencies

The likelihood-ratio test does not perform well when the tested models contain many unused parameters.
The test tests whether an increase in the number of parameters leads to a significantly higher likelihood. Thus, if there are many unused parameters, this increase will usually not be significant. Hence, there will be a tendency to accept null-hypotheses, i.e., to merge states. This causes problems especially in the leaves of the prefix tree. We deal with the issue of small frequencies by pooling the bins of the histogram and symbol distributions if the frequency of these bins in both states is less than 10. Pooling is the process of combining the frequencies of two bins into a single bin. In other words, we treat two bins as though they were a single one. For example, suppose we have three bins, and their frequencies are 7, 14, and 5, respectively. Then we treat them as two bins with frequencies 12 and 14. In the likelihood-ratio test, this effectively reduces the number of parameters of the tested models. Theoretically, it can be objected that this changes the model using the data. However, if we do not pool data, we will obtain too many parameters for the states in which some bin occurrences are very unlikely. For instance, suppose we have a state in which 1000 symbols could occur, but only 10 of them actually occur. Then according to theory, we should count this state as having 999 parameters. We count it as having only 9.

3.3 The algorithm

We have just described the test we use to determine whether two states are similar. The null-hypothesis of this test is that two states are the same. When we obtain a p-value less than 0.05, we can reject this hypothesis at the 5% significance level. When we obtain a p-value greater than 0.05, we cannot reject the possibility that the two states are the same. Instead of testing whether two states are the same, however, we want to test whether to perform a merge or a split, and if so, which one. When we test a merge, a high p-value indicates that the merge is good. When we test a split, a low p-value indicates that the split is good. We implemented this statistical evidence in RTI+ in a very straightforward way: if there is a split that results in a p-value less than 0.05, perform the split with the lowest p-value; otherwise, if there is a merge that results in a p-value greater than 0.05, perform the merge with the highest p-value; otherwise, perform a color operation. Thus, we merge two states unless we are very certain that the two states are different. In addition, we always perform the merge or split that leads to the most certain conclusions. In every iteration, RTI+ selects the most visited transition from a red state to a blue state and determines whether to merge the blue state, split the transition, or color the blue state red. The main reason for trying out only the most visited transition is that it reduces the run-time of the algorithm. Trying every possible merge and split would take much longer. Additionally, the tests performed using the most visited transition will be based on the largest amount of data. Hence, we are more confident that these conclusions are correct. An overview of the RTI+ algorithm is shown in Algorithm 1. We claim that RTI+ is efficient, i.e., that it runs in polynomial time:

Proposition 1.
RTI+ is a polynomial-time algorithm.

Proof. This follows from the fact that ID_1DTA is efficient [12] and the fact that every statistic can be computed (up to sufficient accuracy) in polynomial time for every state. Since, at any time during a run of the algorithm, the number of states does not exceed the size of the input, the proposition follows.

In addition to being time-efficient, we believe that RTI+ is also data-efficient. More specifically, we conjecture that it returns a PDRTA that is equal to the correct PDRTA $A_t$ in the limit. By equal we mean that these PDRTAs model exactly the same probability distributions over timed strings.
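The pooling rule of Section 3.2 can be sketched as follows, under the assumption that the two tested states are described by frequency lists over the same bins (or symbols). The function name and the placement of the pooled bin in front are hypothetical choices. A bin is pooled only if its frequency is below 10 in both states, and pooling is skipped when only one bin qualifies, because a single bin cannot be combined with anything.

```python
from typing import List, Sequence, Tuple

def pool_small_bins(counts_1: Sequence[int], counts_2: Sequence[int],
                    threshold: int = 10) -> Tuple[List[int], List[int]]:
    """Pool the bins whose frequency is below the threshold in both states.

    counts_1, counts_2: frequencies of the same bins (or symbols) in the two
    tested states. Returns the two pooled lists; the pooled bin comes first.
    """
    small = [i for i in range(len(counts_1))
             if counts_1[i] < threshold and counts_2[i] < threshold]
    if len(small) < 2:
        return list(counts_1), list(counts_2)  # nothing to combine
    keep = [i for i in range(len(counts_1)) if i not in small]
    pooled_1 = [sum(counts_1[i] for i in small)] + [counts_1[i] for i in keep]
    pooled_2 = [sum(counts_2[i] for i in small)] + [counts_2[i] for i in keep]
    return pooled_1, pooled_2

# Frequencies 7, 14, 5 from Section 3.2; the second state's counts are hypothetical.
print(pool_small_bins([7, 14, 5], [3, 20, 4]))  # ([12, 14], [7, 20])
```

Each pooled group removes parameters from both tested models, which is how pooling lowers the degrees of freedom of the likelihood-ratio test.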

Algorithm 1 Real-time identification from positive data: RTI+
Require: A multi-set of timed strings S+ generated by a PDRTA A_t
Ensure: The result is a small DRTA A; in the limit, A = A_t
  Construct a timed prefix tree A from S+; color the start state q_0 of A red
  while A contains non-red states do
    Color blue all non-red target states of transitions with red source states
    Let δ = ⟨q_r, q_b, a, g⟩ be the most visited transition from a red to a blue state
    Evaluate all possible merges of q_b with red states
    Evaluate all possible splits of δ
    if the lowest p-value of a split is less than 0.05 then
      perform this split
    else if the highest merge p-value is greater than 0.05 then
      perform this merge
    else
      color q_b red
    end if
  end while

Conjecture 1. The result A of RTI+ converges efficiently in the limit to the correct PDRTA A_t with probability 1.

Completeness of the algorithm follows from the fact that it is a special case of the ID_1DTA algorithm from [12]. The conjecture therefore holds if all correct merges and splits are performed given an input sample of size polynomial in the size of A_t. The main reason for our conjecture is that, with increasing amounts of data, the p-value resulting from the likelihood-ratio test converges to 0 if the two states are different. Thus, in the limit, RTI+ will perform all the necessary splits, and perhaps some more, and it will never perform an incorrect merge. However, when the two states tested in the likelihood-ratio test are the same, there is always a probability of 0.05 that the p-value is less than 0.05. Thus, at times RTI+ will not perform a merge when it should. Fortunately, not performing a merge or performing an extra split does not influence the language of the DRTA, or the distribution of the PDRTA. It only adds additional (unnecessary) states to the resulting PDRTA A. Thus, in the limit, the algorithm should return a PDRTA A that is language equivalent to the target PDRTA A_t. Unfortunately, since we use multiple statistical tests that can become dependent, proving this conjecture is complex and is left as future work.
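The two statistical claims behind the conjecture (the p-value shrinks towards 0 as the data grows when two states differ, and it falls below 0.05 about 5% of the time when they are the same) can be checked with a small simulation. This is a sketch under stated assumptions: it uses a standard two-sample likelihood-ratio (G) test on the symbol counts of a single pair of states as a stand-in for the test used by RTI+, counts degrees of freedom from observed symbols only (as described in Section 3), and computes the chi-squared tail from its closed form for integer degrees of freedom so that only the standard library is needed. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def chi2_sf(x: float, k: int) -> float:
    """P(X >= x) for a chi-squared variable X with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(k-1)/2} sqrt(x)^(2i-1) / (1*3*...*(2i-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for i in range(1, (k - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def g_test_pvalue(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Likelihood-ratio (G) test of 'both states share one symbol distribution'."""
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]  # observed symbols only
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    n = n_a + n_b
    df = len(cols) - 1
    if df <= 0 or n_a == 0 or n_b == 0:
        return 1.0  # nothing to test
    g = 0.0
    for a, b in cols:
        for observed, row_total in ((a, n_a), (b, n_b)):
            if observed > 0:
                expected = row_total * (a + b) / n
                g += observed * math.log(observed / expected)
    return chi2_sf(2.0 * g, df)  # G = 2 * sum O ln(O/E)


def sample_counts(probs: Sequence[float], n: int, rng: random.Random) -> List[int]:
    draws = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [draws[i] for i in range(len(probs))]


def false_split_rate(probs: Sequence[float], n: int, trials: int, seed: int = 1) -> float:
    """Fraction of tests between two identical states that yield p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hits += g_test_pvalue(sample_counts(probs, n, rng), sample_counts(probs, n, rng)) < 0.05
    return hits / trials


rate = false_split_rate([0.4, 0.3, 0.2, 0.1], n=200, trials=2000)
pvals = [g_test_pvalue([40 * m, 30 * m, 20 * m, 10 * m], [25 * m] * 4) for m in (1, 4, 16)]
print(f"false split rate ~ {rate:.3f}; p-values for different states: {pvals}")
```

With identical states the rejection rate comes out close to 0.05, which corresponds to the occasional missed merge discussed above; for two clearly different count vectors, scaling the data up by a factor m drives the p-value rapidly towards 0.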
4 Tests on artificial data

In order to evaluate the RTI+ algorithm, we test it on artificially generated data. First we generate a random PDRTA (without final states), and then we generate data using the distributions of this PDRTA. Unfortunately, it is difficult to measure the quality of models that are identified from such data. Commonly used measures include the predictive quality or a model selection criterion. However, such measures are meaningless on their own; they are only useful for comparing the performance of different methods against each other. Since we know of no other method for identifying PDRTAs, we cannot make use of these measures.
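The data generation mentioned in this section can be sketched as follows. This is a minimal sketch assuming a heavily simplified PDRTA: it uses the settings reported in this section (8 states, a size-4 alphabet, 100 time values grouped into 10 histogram bins, uniform weights normalized to sum to 1, and 2000 strings with exponentially distributed lengths of mean 10), but it omits the random transition splits and clock guards. All names are hypothetical, and drawing the delay uniformly within its bin and rounding the exponential length to an integer are our assumptions, not details from the text.

```python
import random

ALPHABET = "abcd"   # size-4 alphabet
N_STATES = 8
N_TIMES = 100       # time values 0..99
N_BINS = 10         # bins [0,9], [10,19], ..., [90,99]
BIN_WIDTH = N_TIMES // N_BINS


def random_distribution(size, rng):
    """Draw `size` uniform weights in [0, 1) and normalize them to sum to 1."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def random_pdrta(rng):
    """Hypothetical simplified PDRTA: per-state symbol and time-bin distributions
    plus one random target state per (state, symbol); no clock-guard splits."""
    return {
        q: {
            "symbols": random_distribution(len(ALPHABET), rng),
            "bins": random_distribution(N_BINS, rng),
            "delta": {a: rng.randrange(N_STATES) for a in ALPHABET},
        }
        for q in range(N_STATES)
    }


def generate_timed_string(model, rng, mean_length=10.0):
    """Sample one timed string [(symbol, delay), ...] starting in state 0."""
    length = round(rng.expovariate(1.0 / mean_length))  # assumption: rounded exponential length
    state, word = 0, []
    for _ in range(length):
        dist = model[state]
        symbol = rng.choices(ALPHABET, weights=dist["symbols"])[0]
        b = rng.choices(range(N_BINS), weights=dist["bins"])[0]
        delay = rng.randrange(b * BIN_WIDTH, (b + 1) * BIN_WIDTH)  # assumption: uniform within bin
        word.append((symbol, delay))
        state = dist["delta"][symbol]
    return word


rng = random.Random(0)
model = random_pdrta(rng)
sample = [generate_timed_string(model, rng) for _ in range(2000)]
print(sample[0])
```

Adding splits would make the target state depend on the delay as well as the symbol; that step is left out here to keep the sketch short.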

Therefore, in order to provide some insight into the capabilities of RTI+, we only show a typical result of RTI+ when run on this data. We generate a random PDRTA with 8 states and a size 4 alphabet. Of the transitions of the PDRTA, 4 are split and assigned different target states at random. The number of possible time values for the timed strings is fixed at 100. The number of histogram bins used in the PDRTA is set to 10. Thus, there are individual probabilities for [0, 9], [10, 19], etc. The probabilities of these bins and of the symbols are generated by first assigning to each bin a value between 0 and 1, drawn from a uniform distribution. These values are then normalized such that both the histogram values and the symbol values sum to 1. We generated 2000 timed strings from this PDRTA, which all have an exponentially distributed length with an average of 10. Figure 5 shows the resulting original and identified PDRTA (no probability distributions are drawn). From this figure, it is clear that the most common mistake is the incorrect identification (or absence) of a clock guard. These are usually only minor errors, involving only infrequently visited transitions. The resulting PDRTA is thus very similar to the original used to generate the data. We performed such tests multiple times, using random PDRTAs of different sizes. The results of these tests are encouraging for up to 8 states, a size 4 alphabet, and 4 splits. When any of these values is increased, the algorithm needs more than 2000 examples to come up with a similar PDRTA. These results are encouraging because PDRTAs of this size are complex enough to model interesting real-time systems.

5 Future work

In previous work, we described the RTI algorithm for identifying deterministic real-time automata (DRTAs) from labeled data. In this paper, we showed how to adapt it to the setting of positive data. The result is the RTI+ algorithm. RTI+ runs in polynomial time, and we conjecture that it converges efficiently to the correct probabilistic DRTA (PDRTA). In future work, we would like to prove this conjecture.
This should be possible, because none of the statistics we use requires a large amount of data. Moreover, the existence of polynomial characteristic sets for DRTAs (see [12]) should somehow extend to identifying PDRTAs. RTI+ uses a likelihood-ratio test to determine which states to merge and which transitions to split. Although this test is designed for identifying PDRTAs from positive data, it can easily be modified to identify probabilistic DFAs. It would be interesting to test such an approach. The tests in Section 4 show that the performance of RTI+ is sufficient to identify complex real-time systems, and we believe it is sufficient to be useful for identifying real-world real-time systems. We invite everyone with timed data to try RTI+ for identifying behavioral models and network protocols. The source code of RTI+ is available online from the first author's homepage.

[Figure 5: top panel "Originally generated random DRTA"; bottom panel "Identified using RTI+ with the likelihood ratio test"; legend: correct / (partially) incorrect.]
Fig. 5: A randomly generated DRTA (top) and the DRTA identified by our algorithm (bottom). The dashed lines are (partially) incorrectly identified transitions. The solid states are correctly identified, including all outgoing transitions.

References
1. R. Alur and D. L. Dill. A theory of timed automata. Theoretical Computer Science, 126:183-235, 1994.
2. R. Carrasco and J. Oncina. Learning stochastic regular grammars by means of a state merging method. In ICGI, volume 862 of LNCS, pages 139-150. Springer, 1994.
3. A. Clark and F. Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, pages 473-497, 2004.
4. C. de la Higuera. A bibliographical study of grammatical inference. Pattern Recognition, 38(9):1332-1348, 2005.
5. C. Dima. Real-time automata. Journal of Automata, Languages and Combinatorics, 6(1):2-23, 2001.
6. Y. Guédon. Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics, 12(3):604-639, 2003.
7. W. L. Hays. Statistics. Wadsworth Pub Co, fifth edition, 1994.
8. C. Kermorvant and P. Dupont. Stochastic grammatical inference with multinomial tests. In ICGI, volume 2484 of LNAI, pages 149-160. Springer, 2002.
9. K. J. Lang, B. A. Pearlmutter, and R. A. Price. Results of the Abbadingo One DFA learning competition and a new evidence-driven state merging algorithm. In ICGI, volume 1433 of LNCS. Springer, 1998.
10. M. Sipser. Introduction to the Theory of Computation. PWS Publishing, 1997.
11. S. Verwer, M. de Weerdt, and C. Witteveen. An algorithm for learning real-time automata. In Benelearn, pages 128-135, 2007.
12. S. Verwer, M. de Weerdt, and C. Witteveen. One-clock deterministic timed automata are efficiently identifiable in the limit. In LATA, volume 5457 of LNCS, pages 740-751. Springer, 2009.