Automata-based Pattern Mining from Imperfect Traces
Giles Reger, University of Manchester, Oxford Road, M13 9PL Manchester, UK
Howard Barringer, University of Manchester, Oxford Road, M13 9PL Manchester, UK
David Rydeheard, University of Manchester, Oxford Road, M13 9PL Manchester, UK

ABSTRACT
This paper considers automata-based pattern mining techniques for extracting specifications from runtime traces and suggests a novel extension that allows these techniques to work with so-called imperfect traces, i.e. traces that do not exactly satisfy the intended specification of the system that produced them. We show that by taking a so-called edit-distance between an input trace and the language of a pattern we can extract specifications from imperfect traces and identify the parts of an input trace that do not satisfy the mined specification, thus aiding the identification and location of errors in programs.

Keywords: Pattern Mining, Specification Mining

1. INTRODUCTION
Formal program specifications are useful for a number of activities but they are often missing or incomplete. The field of specification mining [17] aims to automatically construct formal program specifications from program artifacts. In this work we consider techniques that operate on program traces, i.e. finite sequences of events that occur whilst a program is running. One such approach [19, 5, 6] uses a set of template patterns to detect predefined behaviours and then combines these together to form a specification. These regular patterns are described via automata, which allows for efficient checking. For example, consider the following pattern over the metasymbols a and b (shaded states are accepting states).

[Figure: the alternating pattern (ab)* as a two-state automaton]

We can apply this pattern to the following trace by considering six instantiations, with each ordered pair of distinct symbols in the trace instantiating the pattern.

connect.open.close

We detect three patterns: (1) [a ↦ connect, b ↦ open], (2) [a ↦ connect, b ↦ close] and (3) [a ↦ open, b ↦ close]. These can then be combined to form a larger specification:

[Figure: the combined specification, an automaton over connect, open and close]

This general approach can be used with different patterns and methods of pattern combination.
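The instantiation step of this example can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the alternating pattern (ab)* is a two-state automaton whose initial state is its only accepting state, tries every instantiation [a ↦ x, b ↦ y] with distinct symbols x and y taken from the trace, projects the trace onto {x, y} and keeps the instantiations whose projection is accepted. The class and method names (AlternatingPatternCheck, holds, detect) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class AlternatingPatternCheck {
    // Alternating pattern (ab)*: state 0 is initial and accepting,
    // a: 0 -> 1, b: 1 -> 0; any other move is a failure (returns -1).
    static int step(int state, boolean isA) {
        if (state == 0 && isA) return 1;
        if (state == 1 && !isA) return 0;
        return -1;
    }

    // Does the projection of the trace onto {a, b} lie in L((ab)*)?
    static boolean holds(List<String> trace, String a, String b) {
        int state = 0;
        for (String event : trace) {
            if (!event.equals(a) && !event.equals(b)) continue; // projection
            state = step(state, event.equals(a));
            if (state < 0) return false;
        }
        return state == 0;
    }

    // Try every instantiation [a -> x, b -> y] with x != y drawn from the trace.
    static List<String> detect(List<String> trace) {
        List<String> symbols = new ArrayList<>(new LinkedHashSet<>(trace));
        List<String> detected = new ArrayList<>();
        for (String x : symbols)
            for (String y : symbols)
                if (!x.equals(y) && holds(trace, x, y))
                    detected.add("[a -> " + x + ", b -> " + y + "]");
        return detected;
    }

    public static void main(String[] args) {
        System.out.println(detect(List.of("connect", "open", "close")));
    }
}
```

Running main prints the three instantiations listed above: [a -> connect, b -> open], [a -> connect, b -> close] and [a -> open, b -> close]. The other three fail because their b symbol occurs before their a symbol.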
However, it has a major drawback. Imagine we had a trace with the above sequence repeated a thousand times, followed by the two events connect and open, i.e. missing the final close. We would fail to detect the two patterns involving close and therefore not extract the above specification. The problem is that this approach assumes perfect traces, i.e. that the correct behaviour is contained within the given traces. This assumption is not very realistic: we would like to be able to deal with cases where there are small errors in traces. The notion is that a programming pattern may hold for the majority of a program but the program may contain one or two bugs. One approach [5] to dealing with this issue is to reset a pattern being checked to its initial state when an error occurs, but this technique would not detect the required patterns in our above example. Instead we want to be able to measure how badly a trace matches a pattern. This paper presents an approach that extends the automata-based pattern mining approach to imperfect traces by considering so-called edit distances between a trace and a pattern viewed as a language.

Structure. Section 2 formally introduces the concepts of pattern checking and composition. Section 3 discusses methods for dealing with the imperfect traces problem and Sections 4, 5 and 6 present our proposed solutions. Section 7 presents two experiments and Section 8 discusses related work. Finally, we conclude in Section 9.

2. PATTERN CHECKING
In this section, we introduce a pattern checking framework by first describing how patterns are extracted from traces, then considering how this can be done efficiently and finally discussing how extracted patterns are combined.

2.1 Checking patterns
In this account, a pattern is a regular language over symbols, i.e. a set of traces (finite sequences) of symbols. We consider patterns as automata:

DEFINITION 1 (PATTERN). A pattern p = ⟨Q, Σ, δ, q0, F⟩ is an automaton where Q is a finite set of states, Σ is a finite alphabet of symbols, δ ∈ Q × Σ → Q is a transition function, q0 ∈ Q is an initial state and F ⊆ Q is a set of accepting states. The language of a pattern, L(p), is the set of traces it accepts, i.e.
τ ∈ L(p) iff there exists a path $q_0 \xrightarrow{\tau} q$ with q ∈ F, where → is δ lifted to traces. The process of checking a pattern against a trace considers all possible combinations of symbols in the trace as replacements for the pattern's own (meta)symbols. To replace a pattern's symbols we instantiate it.

DEFINITION 2 (INSTANTIATION). Given a pattern p and a map ϕ from p.Σ to Σ, the instantiated pattern ϕ(p) has alphabet Σ and is the result of applying ϕ to every symbol in p.

The checking process then checks whether each particular instantiation of the pattern holds on the trace. We say an instantiated pattern holds on a trace
if the trace appears in the instantiated pattern's language after we remove irrelevant symbols. To remove irrelevant symbols we project the trace.

DEFINITION 3 (PROJECTION). The projection πΣ(τ) of a trace τ over an alphabet Σ is defined as τ with all elements not in Σ removed.

Therefore, the detected instantiated patterns are given as follows.

DEFINITION 4 (EXTRACTED PATTERNS). Given a pattern p and a trace τ, the extracted patterns detect(p, τ) are

$$\{\phi(p) \mid \mathrm{dom}(\phi) = p.\Sigma \,\wedge\, (\forall s \in \mathrm{ran}(\phi) : s \in \tau) \,\wedge\, \pi_{\phi(p).\Sigma}(\tau) \in L(\phi(p))\}$$

2.2 Checking patterns efficiently
We discuss two approaches that allow us to check patterns efficiently.

Checking many instantiations. For each pattern we need to check all possible instantiations. Typically we restrict this technique to patterns over 2 or 3 symbols. We can then compute the extracted instantiated patterns for a pattern using a 2- or 3-dimensional grid of reached states; this approach was first used in [19]. For the introductory example, the following matrix would represent the states reached in the pattern after checking the trace.

[Matrix: the states reached by each instantiation of the alternating pattern on connect.open.close]

The restriction of patterns to 2 or 3 symbols is for efficiency reasons, as this approach is O(n^m) given an alphabet of size n and a pattern with m symbols. A more efficient symbolic approach using binary decision diagrams is explored in [6].

Checking many patterns. If we want to check multiple patterns we would currently need to repeat the above process multiple times, i.e. once for each pattern. However, given a set of patterns with the same set of symbols, we can construct a pattern checker that checks all these patterns simultaneously by combining them into a single product automaton and labelling its states with the patterns that are accepting at that state. This approach was previously presented in [16].

DEFINITION 5 (PATTERN CHECKER). Given an alphabet of symbols Σ and a set of patterns p1, ..., pn over Σ, let the pattern checker for these patterns be C(p1, ..., pn) = ⟨Q, Σ, ⇒, Γ⟩, with initial state q0 = (p1.q0, ..., pn.q0), where

$$Q = p_1.Q \times \cdots \times p_n.Q$$
$$(q_1, \ldots, q_n) \overset{a}{\Rightarrow} (p_1.\delta(q_1, a), \ldots, p_n.\delta(q_n, a))$$
$$\Gamma((q_1, \ldots, q_n)) = \{p_i \mid q_i \in p_i.F\}$$

The patterns detected by a pattern checker C in a trace τ are therefore

$$C(\tau) = \{p \mid q_0 \overset{\tau}{\Rightarrow} q \,\wedge\, p \in \Gamma(q)\}$$

For example, if we call the pattern in the introductory example p1 and call the following pattern p2, then the pattern checker for p1 and p2 would be as shown below, where states are labelled using the output function Γ.

[Figure: pattern p2]
[Figure: the pattern checker for p1 and p2, with states labelled {p1, p2}, {}, {p2} and {}]

We can extend the notion of instantiation to pattern checkers and define extracted patterns for a pattern checker as follows.

DEFINITION 6 (PATTERN CHECKER EXTRACTED PATTERNS). Given a pattern checker C for patterns p1, ..., pn and a trace τ, the extracted patterns detect(C, τ) are

$$\{\phi(p) \mid p \in \{p_1, \ldots, p_n\} \,\wedge\, \mathrm{dom}(\phi) = p.\Sigma \,\wedge\, (\forall s \in \mathrm{ran}(\phi) : s \in \tau) \,\wedge\, \phi(p) \in \phi(C)(\pi_{\phi(p).\Sigma}(\tau))\}$$

2.3 Combining patterns
The following is based on the technique introduced by Gabel and Su in [5]. Once we have extracted a set of patterns we can combine them together using standard automata intersection. However, this operation is only defined when two automata have the same alphabet. To give two automata the same alphabet we can expand them by placing self-looping transitions on each state for the missing symbols. For example, the three detected patterns from the introductory example become:

[Figure: the three detected patterns, each expanded with self-loops to the alphabet {connect, open, close}]

The intersection of these three patterns is the specification given in the introduction. Formally, combination is defined as follows.

DEFINITION 7 (COMBINATION). Given a set of instantiated patterns p1, ..., pn with combined alphabet Σ, define their combination as

$$\mathit{combine}(p_1, \ldots, p_n) = \mathit{expand}_{\Sigma \setminus p_1.\Sigma}(p_1) \cap \cdots \cap \mathit{expand}_{\Sigma \setminus p_n.\Sigma}(p_n)$$

where ∩ is automata intersection and expand_Σ is a function that adds self-looping transitions to a pattern for the symbols in Σ.

We can either apply this combination operator directly or use it to define specific combination rules. To use combination directly we can saturate the set by repeated application, or extract a specification for each alphabet of events in the trace by combining together patterns with the same alphabet. However, this might be costly and not all extracted patterns necessarily contain useful information.
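Definition 7 can be illustrated without building the intersection automaton explicitly. The sketch below is an illustration under stated assumptions rather than the construction of [5]: each instantiated pattern is a deterministic automaton whose missing transitions mean failure, expand is realised by letting symbols outside a pattern's alphabet leave its state unchanged, and all patterns run in lockstep, so a trace is accepted by the intersection exactly when every expanded pattern accepts it. The names Combination, Dfa, alternating and combinedAccepts are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Combination {
    // A pattern as a DFA: transitions.get(state).get(symbol) -> next state.
    static final class Dfa {
        final Set<String> alphabet;
        final Map<Integer, Map<String, Integer>> transitions = new HashMap<>();
        final Set<Integer> accepting;

        Dfa(Set<String> alphabet, Set<Integer> accepting) {
            this.alphabet = alphabet;
            this.accepting = accepting;
        }

        Dfa edge(int from, String symbol, int to) {
            transitions.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
            return this;
        }

        // expand: symbols outside the alphabet self-loop; other missing edges fail (-1).
        int step(int state, String symbol) {
            if (!alphabet.contains(symbol)) return state;
            return transitions.getOrDefault(state, Map.of()).getOrDefault(symbol, -1);
        }
    }

    // Instantiated alternating pattern (xy)*: state 0 is initial and accepting.
    static Dfa alternating(String x, String y) {
        return new Dfa(Set.of(x, y), Set.of(0)).edge(0, x, 1).edge(1, y, 0);
    }

    // Run the expanded patterns in lockstep (an on-the-fly product automaton).
    static boolean combinedAccepts(List<Dfa> patterns, List<String> trace) {
        int[] states = new int[patterns.size()]; // every pattern starts in state 0
        for (String event : trace) {
            for (int i = 0; i < states.length; i++) {
                states[i] = patterns.get(i).step(states[i], event);
                if (states[i] < 0) return false;
            }
        }
        for (int i = 0; i < states.length; i++)
            if (!patterns.get(i).accepting.contains(states[i])) return false;
        return true;
    }
}
```

For the three patterns detected in the introduction, combinedAccepts accepts connect.open.close.connect.open.close and rejects connect.open, the trace that is missing its final close.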
An alternative approach (which we do not consider further here) is to develop specific combination rules for given patterns, as is done in [5]. For example, they introduce a sequencing rule for the pattern in our introductory example, which we will represent by the regular expression (ab)*.

[Sequencing rule over L1, L2 and c: formula lost in transcription]

We applied this by taking L1 = … and L2 = …. The use of L1 and L2 allows for repeated application of the rule. The closure of the set of extracted patterns with respect to the set of combination rules can then be computed.

3. DEALING WITH IMPERFECT TRACES
The previous framework will only detect a pattern if it matches an input trace exactly. In this section we consider how it can be extended so that patterns are extracted if they match almost all of the input trace.
3.1 What are imperfect traces?
To say that a trace is imperfect we assume that there is an implicit specification that the program that produced the trace follows, and that there is some bug in the program that deviates from this specification. The process of specification mining is therefore to extract this implicit specification. Alternatively, the program might be correct but the trace recording process may be faulty; either way, identifying a specification and the trace imperfections can aid debugging efforts. We could view these imperfections as uniform noise; however, in the case of programming bugs, it is likely that these imperfections are introduced by common mistakes such as forgetting to close a resource or check a condition, or accidentally calling the wrong method. We can therefore think of imperfections as small edits that involve the removal, addition or substitution of events in a perfect trace.

3.2 The restart approach
Previous approaches deal with imperfect traces by restarting the pattern and counting the number of such restarts. With small patterns such as the simple alternation pattern this can be effective. Let us consider the following common 3-symbol resource usage pattern.

[Figure: the resource usage pattern over the metasymbols a, b and c]

Consider checking the following (imperfect) trace for the instantiation [a ↦ open, b ↦ use, c ↦ close].

open.use.use.close.use.close.open.use.close

The checking would fail at the fifth event, as an open event is omitted. If we restart here then we immediately fail again on the following close. Instead, we would like to detect that the open event is missing and flag this as a potential bug.

3.3 Edit distance
As an alternative to the restart approach we consider replacing our previous condition that a trace must exactly match a pattern with the requirement that the edit-distance between the trace and some trace in the language of the pattern must be below some limit. The edit-distance we consider uses the following edit operations: inserting a new symbol; deleting an existing symbol; replacing an existing symbol by a new one. The edit-distance between two traces is then given by the minimum number of edits that transform one trace into the other.
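The edit-distance just described has the standard dynamic-programming formulation sketched below (a textbook algorithm, not code from the paper), with traces represented as lists of event names. Note that the definitions that follow compare a trace with some trace in the language of a pattern, which is in general an infinite set, so this function alone does not compute that minimum; Sections 4 and 5 address that problem. The class name Levenshtein is hypothetical.

```java
import java.util.List;

class Levenshtein {
    // Minimum number of insertions, deletions and substitutions turning s into t.
    // d[i][j] is the distance between the first i events of s and the first j of t;
    // working over prefixes instead of suffixes gives the same result.
    static int distance(List<String> s, List<String> t) {
        int[][] d = new int[s.size() + 1][t.size() + 1];
        for (int i = 0; i <= s.size(); i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= t.size(); j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= s.size(); i++) {
            for (int j = 1; j <= t.size(); j++) {
                if (s.get(i - 1).equals(t.get(j - 1))) {
                    d[i][j] = d[i - 1][j - 1];               // equal heads cost nothing
                } else {
                    d[i][j] = 1 + Math.min(d[i - 1][j - 1],  // substitute
                                  Math.min(d[i - 1][j],      // delete from s
                                           d[i][j - 1]));    // insert into s
                }
            }
        }
        return d[s.size()][t.size()];
    }
}
```

For instance, open.use.use.close.use.close is at distance 1 from open.use.use.close.open.use.close: a single inserted open repairs it.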
This edit-distance is sometimes called the Levenshtein distance [10]. Formally, it is given as follows.

DEFINITION 8 (LEVENSHTEIN DISTANCE). The Levenshtein distance between traces τ1 and τ2 is distance(τ1, τ2), defined as

$$
\begin{aligned}
\mathit{distance}(\tau_1, \varepsilon) &= |\tau_1| \\
\mathit{distance}(\varepsilon, \tau_2) &= |\tau_2| \\
\mathit{distance}(a.\tau_1, b.\tau_2) &=
\begin{cases}
\mathit{distance}(\tau_1, \tau_2) & \text{if } a = b \\
\min\{\mathit{distance}(\tau_1, b.\tau_2) + 1,\ \mathit{distance}(a.\tau_1, \tau_2) + 1,\ \mathit{distance}(\tau_1, \tau_2) + 1\} & \text{if } a \neq b
\end{cases}
\end{aligned}
$$

We define an updated notion of extracted patterns using this metric.

DEFINITION 9 (IMPERFECT EXTRACTED PATTERNS). Given a pattern p, a trace τ and an integer γ > 0, which we call the tolerance, the imperfect extracted patterns imperfect_detect(p, τ, γ) are

$$\{\phi(p) \mid \mathrm{dom}(\phi) = p.\Sigma \,\wedge\, (\forall s \in \mathrm{ran}(\phi) : s \in \tau) \,\wedge\, \exists \tau' \in L(\phi(p)) : \mathit{distance}(\tau', \pi_{\phi(p).\Sigma}(\tau)) < \gamma\}$$

We extend this definition to pattern checkers as we did before (Sec. 2.2).

3.4 Detecting bugs
So far our approach has been abstract, considering traces of symbols generated by a program. But our motivation has been to extract specifications that allow us to detect potential bugs. To do so we need to be able to access information about the part of a program that generates a trace; we assume this is contained in a so-called program trace.

DEFINITION 10 (PROGRAM TRACE). A program trace is a finite sequence of pairs of the form (code_point, event) where code_point identifies the point in the program that generates the event.

It is easy to extend our previous constructions to work on these program traces by ignoring the code point information. Our goal is to identify points in the program trace that should be edited for a mined specification to hold. These edits follow those described above, i.e. the removal of an event, the addition of an event between two existing events, or the replacement of one event with another. The solutions we describe in the following two sections produce so-called rewrites.

DEFINITION 11 (REWRITE). A rewrite ρ is a finite sequence of indexes and rewrite operations that can be applied to a program trace to produce an edited version.

A rewrite can then be used to identify the code points that may contain bugs, and to suggest potential solutions, i.e. edits.

4. EDITING ON FAILURE
We first consider an approach that does not use the true edit-distance, but introduces a new restart operation inspired by the metric. The idea is to introduce edit operations only when a trace fails to match a pattern.

4.1 Failing edit-distance
We introduce an alternative formulation of the edit-distance that only applies edits when a trace fails. We say a pattern fails for a trace if no extension of the trace can satisfy the pattern. We define this metric as follows. For a pattern p and a trace τ, let τ = good(τ).a.rest(τ), where good(τ) is the longest prefix of τ such that there exists a trace τ' with good(τ).τ' ∈ L(p), but for all traces τ'' we have good(τ).a.τ'' ∉ L(p). Let edit be a function on symbols that non-deterministically replaces the symbol a by the empty trace, by a trace consisting of another symbol from the trace followed by a, or by another symbol in the trace, i.e. it can pick one of the three edit operations discussed above. An edited trace is defined recursively as

$$\mathit{edited}(\tau) = \begin{cases} \mathit{edited}(\mathit{good}(\tau).\mathit{edit}(a).\mathit{rest}(\tau)) & \text{if } p \text{ fails for } \tau \\ \tau & \text{otherwise} \end{cases}$$

i.e. the repeated application of the edit function to the event causing failure. As edit is non-deterministic, the failing edit-distance is given as the minimum number of times the edit function must be applied to a trace. The failing edit-distance is still an edit-distance, but not necessarily a minimal one.

4.2 Computing the failing edit-distance
To compute the failing edit-distance we explore the non-deterministic edit operations by maintaining a number of possible configurations of
the instantiated pattern. A configuration is a pair consisting of a rewrite (Def. 11) and a state. We say that a trace τ reaches a configuration ⟨ρ, q⟩ for a pattern p iff $q_0 \xrightarrow{\rho(\tau)} q$, where q0 and → are the initial state and transition relation of p. Algorithm 1 gives an algorithm for computing the failing edit-distance by computing the set of configurations reached by a trace. The algorithm uses a tolerance γ to restrict the size of rewrites and therefore will only find the edit-distance if it is below this tolerance. It uses a function failing that returns true if no final state is reachable from the given state. The use of γ helps restrict the exponential blowup introduced by the nondeterminism of the edit function. Other optimisations that can reduce this blowup include restricting the number of edits allowed in a row and combining similar rewrites together.

Algorithm 1: Computing the failing edit-distance with tolerance γ for a pattern p = ⟨Q, Σ, δ, q0, F⟩ and a trace τ.

    C ← {⟨[], q0⟩}
    for i in 1 to |τ| do
        a ← τ(i)
        C' ← {}
        for ⟨ρ, q⟩ in C do
            q' ← δ(q, a)
            if failing(q') then
                if |ρ| < γ then
                    C' ← C' ∪ {⟨(i, −).ρ, q⟩}
                    C' ← C' ∪ {⟨(i, +b).ρ, δ(δ(q, b), a)⟩ | b ∈ Σ}
                    C' ← C' ∪ {⟨(i, %b).ρ, δ(q, b)⟩ | b ∈ Σ}
            else
                C' ← C' ∪ {⟨ρ, q'⟩}
        C ← {⟨ρ, q⟩ ∈ C' | ¬failing(q)}
    return min({|ρ| | ⟨ρ, q⟩ ∈ C ∧ q ∈ F})

4.3 Example of computing the failing edit-distance
Let us take the resource usage pattern introduced in Sec. 3.2 and consider the trace open.open.use.close.use for the instantiation [a ↦ open, b ↦ use, c ↦ close]. Checking this pattern will fail on the second event, as there is no open (a) transition from the second state. Two edit operations can be applied here: removal of the second event, or addition of a close event immediately before the second open. This leads to two alternative configurations:

{⟨[(1, −)], 2⟩, ⟨[(1, +close)], 2⟩}

We continue checking and fail again on the fifth event, the final use. Here there are three edit operations that can be applied: removal of the event, addition of an open event, or substitution of the use event with an open event.
This leaves us with six final configurations:

⟨[(1, −), (5, −)], 1⟩, ⟨[(1, −), (5, +open)], 2⟩, ⟨[(1, −), (5, %open)], 2⟩,
⟨[(1, +close), (5, −)], 1⟩, ⟨[(1, +close), (5, +open)], 2⟩, ⟨[(1, +close), (5, %open)], 2⟩

Therefore, the instantiated pattern matches with a failing edit-distance of 2.

5. USING THE TRUE EDIT DISTANCE
We now consider an approach that uses the true edit distance between the trace and a language. We consider a technique that uses weighted transducers to compute the edit-distance between a trace and a finite automaton [2]. The general idea is that we model the trace and pattern as weighted transducers T and P and model the edit operations as a transducer X. The composition T ∘ X ∘ P captures the different ways that the trace can be rewritten to match the pattern, and the minimal edit-distance is the weight of a shortest path to an accepting state.

5.1 Weighted transducers
A weighted transducer has transitions labelled with an input symbol, an output symbol and a weight; for this application we take weights as being 0 or 1. We allow ε input and output transitions that can be taken without consuming or producing a symbol.

DEFINITION 12 (WEIGHTED TRANSDUCER). A weighted transducer is a 5-tuple T = ⟨Q, Σ, ∆, δ, F⟩ where Q is a finite set of states, Σ is a finite input alphabet of symbols, ∆ is a finite output alphabet of symbols, δ ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × {0, 1} × Q is a finite set of transitions and F ⊆ Q is a set of final states.

We translate traces into weighted transducers by creating a transition to a new state per event, adding self-looping ε transitions and only making the last state final. For example, the trace a.a.b.c.b would become the following weighted transducer, where transitions are written input/output : weight. Note that we use a weight of 0 as there is no cost associated with following the trace.

[Figure: states 0 to 5 in a chain, 0 --a/a : 0--> 1 --a/a : 0--> 2 --b/b : 0--> 3 --c/c : 0--> 4 --b/b : 0--> 5, with an ε/ε : 0 self-loop on every state; state 5 is the only final state]
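The translation of a trace into a weighted transducer is mechanical. A minimal sketch is shown below, assuming states are numbered 0 to n for a trace of n events and using the hypothetical string "eps" for ε; the class TraceTransducer and its methods are illustrative names, not part of the paper.

```java
import java.util.ArrayList;
import java.util.List;

class TraceTransducer {
    static final String EPS = "eps"; // stands for the empty symbol

    // One weighted transition: from --input/output : weight--> to.
    static final class Transition {
        final int from, to, weight;
        final String input, output;

        Transition(int from, String input, String output, int weight, int to) {
            this.from = from; this.input = input; this.output = output;
            this.weight = weight; this.to = to;
        }

        @Override public String toString() {
            return from + " --" + input + "/" + output + " : " + weight + "--> " + to;
        }
    }

    // States 0..n for a trace of n events; state n is the only final state.
    static List<Transition> fromTrace(List<String> trace) {
        List<Transition> ts = new ArrayList<>();
        for (int i = 0; i <= trace.size(); i++) {
            ts.add(new Transition(i, EPS, EPS, 0, i));      // eps/eps : 0 self-loop
            if (i < trace.size()) {
                String e = trace.get(i);
                ts.add(new Transition(i, e, e, 0, i + 1));  // follow the trace at no cost
            }
        }
        return ts;
    }

    static int finalState(List<String> trace) { return trace.size(); }
}
```

For a.a.b.c.b this yields 11 transitions (five event transitions and six ε/ε self-loops), with state 5 as the only final state.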
Patterns are translated by keeping their structure, labelling transitions with the same input and output symbols using a weight of 0, and adding self-looping ε transitions. The edit transducer consists of a single state and looping transitions for each of the edit operations it can perform; for an alphabet of {a, b, c} this would be as follows. Note how ε is used to model deletions and additions, and that all edit operations have a weight of 1.

[Figure: edit transducer X, a single state with self-loops a/a : 0, b/b : 0, c/c : 0, a/ε : 1, b/ε : 1, c/ε : 1, ε/a : 1, ε/b : 1, ε/c : 1, a/b : 1, a/c : 1, b/a : 1, b/c : 1, c/a : 1, c/b : 1]

5.2 Composition
The composition T ∘ X of two transducers T and X considers all possible ways of feeding the output of strings of T into strings of X, i.e. if a/b.b/c is a string of T and b/d.c/a is a string of X then a/d.b/a is a string of T ∘ X. Here we consider a three-way composition, i.e. T ∘ X ∘ P. We compute this as a single operation for efficiency reasons: if we computed T ∘ X and then (T ∘ X) ∘ P, it is likely that T ∘ X would contain many superfluous transitions. An approach for doing this is presented in [1] and Algorithm 2 gives an algorithm for three-way composition.

5.3 An example of computing edit-distance
Let us take the same example we used for the failing edit-distance, i.e. the trace open.open.use.close.use and the resource usage pattern introduced in Sec. 3.2. For ease of presentation we translate the trace using a for open, b for use and c for close. This gives us the trace used as an example in Sec. 5.1 above, so we already have our weighted transducer T. We then compute the weighted transducer P for the resource usage pattern as follows.
[Figure: weighted transducer P for the resource usage pattern, with transitions a/a : 0, b/b : 0 and c/c : 0 and an ε/ε : 0 self-loop on each of its two states]

Algorithm 2: Computing the three-way composition of transducers T, X and P with the same input and output alphabets Σ and ∆.

    Enqueue(S, (T.q0, X.q0, P.q0))
    Q ← {(T.q0, X.q0, P.q0)}; δ ← ∅; F ← ∅
    while ¬isEmpty(S) do
        (q1, q2, q3) ← Dequeue(S)
        if (q1, q2, q3) ∈ T.F × X.F × P.F then
            F ← F ∪ {(q1, q2, q3)}
        for (q1, i1, o1, w1, q1') ∈ T.δ and (q3, i3, o3, w3, q3') ∈ P.δ do
            for (q2, i2, o2, w2, q2') ∈ X.δ where i2 = o1 ∧ o2 = i3 do
                if (q1', q2', q3') ∉ Q then
                    Q ← Q ∪ {(q1', q2', q3')}
                    Enqueue(S, (q1', q2', q3'))
                δ ← δ ∪ {((q1, q2, q3), i1, o3, w1 + w2 + w3, (q1', q2', q3'))}
    return ⟨Q, Σ, ∆, δ, F⟩

We now compute T ∘ X ∘ P, using the edit transducer X presented in Sec. 5.1 above. This gives us the weighted transducer in Figure 1. We then use Dijkstra's shortest path algorithm to find a shortest path between the initial state and an accepting state. We indicate one such shortest path with a dashed line; it corresponds to the string a/a.a/b.b/b.c/c.b/a with a weight of 2. This gives two edits to our trace: replacing the second open event with a use event and the last use event with an open event. Note that there are multiple paths with a weight of 2 here, and therefore multiple ways we can rewrite our trace. A shortest path through the composition will always be at least as long as the trace and gives a rewrite by relating the projected trace back to the original trace. If a pattern checker is used then, instead of computing the shortest distance to an accepting state, for each pattern we compute the shortest distance to an accepting state labelled with that pattern.

6. COMBINING IMPERFECT PATTERNS
The previous two sections presented two different techniques for extracting imperfect patterns from imperfect traces. Each pattern is given a set of rewrites that tell us how to edit the input trace to make it match the pattern. When combining patterns we now need to consider these rewrites. In this section we present an approach for combining a set of imperfect patterns that are compatible, i.e. have a set of rewrites that do not clash.
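The two techniques of Sections 4 and 5 can be compared on small inputs with the sketch below. It is not the paper's implementation. failingDistance follows Algorithm 1 but tracks only edit counts and merges configurations that reach the same state, keeping the cheaper one (this preserves the minimum). trueDistance replaces the explicit composition T ∘ X ∘ P of Algorithm 2 by a shortest-path search over pairs (trace position, pattern state) using the same moves as X (keep, delete, substitute, insert). Symbols are encoded as integers, the resource usage pattern is our reading of the lost figure (a: 1 → 2, b: 2 → 2, c: 2 → 1, both states accepting, as the accepted rewrites in Sections 4.3 and 5.3 suggest), and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ImperfectDistances {
    // A pattern over symbols 0..k-1; state 0 is initial and -1 means "no transition".
    static final class Pattern {
        final int[][] delta;        // delta[state][symbol]
        final boolean[] accepting;
        Pattern(int[][] delta, boolean[] accepting) { this.delta = delta; this.accepting = accepting; }
        int symbols() { return delta[0].length; }
        int step(int q, int symbol) { return q < 0 ? -1 : delta[q][symbol]; }
        // failing(q): no accepting state is reachable from q.
        boolean failing(int q) {
            if (q < 0) return true;
            boolean[] seen = new boolean[delta.length];
            Deque<Integer> todo = new ArrayDeque<>(List.of(q));
            while (!todo.isEmpty()) {
                int s = todo.pop();
                if (accepting[s]) return false;
                if (seen[s]) continue;
                seen[s] = true;
                for (int t : delta[s]) if (t >= 0 && !seen[t]) todo.push(t);
            }
            return true;
        }
    }

    // Alternating pattern (ab)*: a = 0, b = 1; only the initial state accepts.
    static final Pattern ALTERNATING =
        new Pattern(new int[][] {{1, -1}, {-1, 0}}, new boolean[] {true, false});
    // Resource usage pattern (assumed reading): a = 0, b = 1, c = 2, both states accepting.
    static final Pattern RESOURCE =
        new Pattern(new int[][] {{1, -1, -1}, {-1, 1, 0}}, new boolean[] {true, true});

    // Keep a non-failing configuration, merging configurations that share a state.
    private static void keep(Pattern p, Map<Integer, Integer> configs, int q, int edits) {
        if (!p.failing(q)) configs.merge(q, edits, Integer::min);
    }

    // Failing edit-distance in the style of Algorithm 1 (edit counts only).
    // Returns -1 if no accepting configuration survives within tolerance gamma.
    static int failingDistance(Pattern p, int[] trace, int gamma) {
        Map<Integer, Integer> configs = new HashMap<>(Map.of(0, 0)); // state -> edits
        for (int a : trace) {
            Map<Integer, Integer> next = new HashMap<>();
            for (Map.Entry<Integer, Integer> c : configs.entrySet()) {
                int q = c.getKey(), edits = c.getValue(), q2 = p.step(q, a);
                if (!p.failing(q2)) { keep(p, next, q2, edits); continue; }
                if (edits >= gamma) continue;                          // tolerance used up
                keep(p, next, q, edits + 1);                           // delete a
                for (int b = 0; b < p.symbols(); b++) {
                    keep(p, next, p.step(p.step(q, b), a), edits + 1); // insert b before a
                    keep(p, next, p.step(q, b), edits + 1);            // substitute b for a
                }
            }
            configs = next;
        }
        int best = -1;
        for (Map.Entry<Integer, Integer> c : configs.entrySet())
            if (p.accepting[c.getKey()] && (best < 0 || c.getValue() < best)) best = c.getValue();
        return best;
    }

    // True edit-distance: 0-1 shortest paths over (trace position, pattern state).
    // Popped pairs use their current best distance, so stale queue entries are harmless.
    static int trueDistance(Pattern p, int[] trace) {
        int n = trace.length;
        int[][] dist = new int[n + 1][p.delta.length];
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        Deque<int[]> queue = new ArrayDeque<>();
        dist[0][0] = 0;
        queue.add(new int[] {0, 0});
        while (!queue.isEmpty()) {
            int[] cur = queue.pollFirst();
            int i = cur[0], q = cur[1], d = dist[i][q];
            if (i < n) {
                relax(dist, queue, i + 1, p.step(q, trace[i]), d, 0); // keep the event
                relax(dist, queue, i + 1, q, d, 1);                   // delete it
            }
            for (int b = 0; b < p.symbols(); b++) {
                if (i < n && b != trace[i]) relax(dist, queue, i + 1, p.step(q, b), d, 1); // substitute
                relax(dist, queue, i, p.step(q, b), d, 1);                                  // insert
            }
        }
        int best = Integer.MAX_VALUE;
        for (int q = 0; q < p.delta.length; q++)
            if (p.accepting[q]) best = Math.min(best, dist[n][q]);
        return best == Integer.MAX_VALUE ? -1 : best;
    }

    private static void relax(int[][] dist, Deque<int[]> queue, int i, int q, int d, int w) {
        if (q < 0 || d + w >= dist[i][q]) return;
        dist[i][q] = d + w;
        if (w == 0) queue.addFirst(new int[] {i, q}); else queue.addLast(new int[] {i, q});
    }
}
```

With a = open, b = use and c = close, the running trace open.open.use.close.use becomes {0, 0, 1, 2, 1}, and both methods return 2, as in Sections 4.3 and 5.3. For the alternating pattern instantiated as [a ↦ close, b ↦ connect], the projection of the trace in Section 7.1 is connect.close.connect.close.close.connect.close, i.e. {1, 0, 1, 0, 0, 1, 0}, and the sketch returns 4 (failing, γ = 5) and 3 (true), matching the 4/3 entry reported there. For [a ↦ open, b ↦ connect] the failing search with γ = 5 returns −1 while the true distance is 2; this outcome of the failing search depends on γ, since a larger tolerance admits a costlier edit sequence.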
Section 6.3 then describes a saturation approach to producing a set of pattern combinations.

6.1 The approach
We first define what we mean by an imperfect pattern. If we took an imperfect pattern to be a pair of a pattern and its shortest rewrite, then when combining two patterns we might find that these shortest rewrites are incompatible, whereas if we had chosen, say, the second shortest rewrite we would be able to combine the two patterns. Therefore, we consider all rewrites up to a certain size for a pattern. An imperfect pattern is a pair ⟨p, R⟩ where p is a pattern and R is a set of rewrites. In the case of the failing edit-distance approach R is given by the reached configurations. In the case of the true edit-distance approach R is given by the language of the composition; it is therefore in theory infinite, but in practice we use breadth-first search to select the k shortest paths.

[Figure 1: An example of the composition T ∘ X ∘ P]

A set of imperfect patterns {..., ⟨pi, Ri⟩, ...} is compatible if there exists a set of rewrites {..., ρi, ... | ρi ∈ Ri} such that every pair of rewrites is compatible. Two rewrites are compatible if they do not attempt to make different rewrites at the same point in a trace. The edit-distance of ⟨p1, R1⟩, ..., ⟨pn, Rn⟩ is |ρ1 ∪ ... ∪ ρn|, i.e. the number of edits when all rewrites are combined. Therefore, given a set of compatible patterns we want to find the set of rewrites that minimizes this distance. However, rewrites only contain information about the parts of the trace they update. If one rewrite updates an element in the trace and the other rewrite does not, then this should also appear as an incompatibility. We therefore incorporate this information when computing compatibility.
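The pairwise notion of compatibility described above can be sketched as follows. This is our reading of the prose, not Algorithm 3. A rewrite is a hypothetical map from 1-based trace positions to edit codes ("-" for deletion, "+x" for inserting x before the event, "%x" for substituting x). Two rewrites clash if they make different edits at the same position and, following the last paragraph, an edit by one rewrite also clashes with the other rewrite if it touches a symbol of the other pattern's alphabet at a position the other leaves alone. Which symbols an edit "touches" is an assumption of this sketch, and positions beyond the end of the trace are ignored.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RewriteCompatibility {
    // Symbols an edit touches: "-" and "%x" change the original event; "+x" and "%x" add x.
    static Set<String> touched(String edit, String event) {
        Set<String> symbols = new HashSet<>();
        if (!edit.startsWith("+")) symbols.add(event);
        if (edit.length() > 1) symbols.add(edit.substring(1));
        return symbols;
    }

    static boolean compatible(List<String> trace,
                              Map<Integer, String> r1, Set<String> sigma1,
                              Map<Integer, String> r2, Set<String> sigma2) {
        for (int i = 1; i <= trace.size(); i++) {
            String e1 = r1.get(i), e2 = r2.get(i), event = trace.get(i - 1);
            if (e1 != null && e2 != null) {
                if (!e1.equals(e2)) return false; // different edits at the same point
            } else if (e1 != null) {
                // r1 edits here but r2 relies on the unedited event
                if (!Collections.disjoint(touched(e1, event), sigma2)) return false;
            } else if (e2 != null) {
                if (!Collections.disjoint(touched(e2, event), sigma1)) return false;
            }
        }
        return true;
    }

    // Number of edits when compatible rewrites are combined: |r1 ∪ r2|.
    static int combinedEdits(Map<Integer, String> r1, Map<Integer, String> r2) {
        Set<String> edits = new HashSet<>();
        r1.forEach((i, e) -> edits.add(i + ":" + e));
        r2.forEach((i, e) -> edits.add(i + ":" + e));
        return edits.size();
    }
}
```

On the trace of Section 7.1, removing the ninth event (a close) for the instantiation [a ↦ connect, b ↦ send, c ↦ close] of the resource usage pattern clashes with removing the first event (a connect) for [a ↦ open, b ↦ send, c ↦ connect], because connect is in the first pattern's alphabet; this mirrors the incompatibility reported there.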
6.2 Computing compatibility
We compute the compatibility between two sets of rewrites R1 and R2 by taking the set R1 ∪ R2, repeatedly splitting it based on conflicts between rewrites, and then checking that there is a set of rewrites containing a rewrite from R1 and a rewrite from R2. An algorithm for computing compatibility is given in Algorithm 3; it can be extended to a set of sets of rewrites. The algorithm returns "incompatible" if the two sets of rewrites are incompatible, and otherwise the smallest number of edits that makes them compatible. Let min be the function that returns this minimum distance and is undefined otherwise.

Algorithm 3: Computing the minimum compatibility between sets of rewrites R1 and R2 from imperfect patterns extracted from a trace τ, where Rj is related to a pattern with alphabet Σj.

    G ← {R1 ∪ R2}
    for i from 1 to |τ| do
        G' ← ∅
        for g ∈ G do
            D ← {ρ ∈ g | ρ(i) is defined}
            M ← [e ↦ {ρ ∈ D | ρ(i) = e}]
            M ← M ∪ [τ(i) ↦ {ρ ∈ g \ D | τ(i) ∈ Σj ∧ ρ ∈ Rj}]
            if D = ∅ then G' ← G' ∪ {g}
            else G' ← G' ∪ {(g \ D) ∪ d | (e ↦ d) ∈ M}
        G ← G'
    Gokay ← {g ∈ G | ∃ρ1 ∈ R1, ρ2 ∈ R2 : ρ1, ρ2 ∈ g}
    if Gokay = ∅ then return "incompatible"
    else return min({|g| | g ∈ Gokay})

6.3 Saturating the set of patterns
Given a set of imperfect patterns P0 extracted from a trace, we compute the (i+1)-th saturation of P0 as follows, recalling that min(R1, R2) is only defined if R1 and R2 are compatible.

$$P_{i+1} = \{\langle \mathit{combine}(p_1, p_2), \min(R_1, R_2)\rangle \mid \langle p_1, R_1\rangle, \langle p_2, R_2\rangle \in P_i\}$$

In general, |P_i| can reach |P_{i−1}|(|P_{i−1}| − 1). However, it is highly likely that many combinations in P_i are empty (i.e. accept no traces) and can therefore be removed. Even though many patterns can be removed, the saturation can grow exponentially. Let P* be the fixed point of P_i, i.e. the set P_i such that P_{i+1} = P_i. We can either compute P* or place an upper bound on the number of saturations we want to perform. Once we have generated a set of patterns we can rank them by two properties: firstly, the size of the alphabet, and secondly the edit-distance. We are interested in patterns with large alphabets and small edit-distance.

7. EXPERIMENTS
In this section we explore our new technique by first applying it to a hypothetical code snippet and then carrying out an experiment to evaluate accuracy, where we attempt to recreate a known specification from imperfect traces.

7.1 Application to example code
Consider the Java code in Figure 2. This gives a hypothetical method for sending an array of lines to an address by first connecting to that address, opening a stream, sending the lines and then closing the connection.

    send("serverA", new String[] {"start", "45"});
    send("serverB", null);
    send("serverC", new String[] {"end", "23"});

    void send(String address, String[] lines) {
        Connection C = connect(address);
        Stream S = C.open();
        try {
            for (String line : lines) S.send(line);
        } catch (NullPointerException e) {
            S.send("empty");
            C.close();
        }
        C.close();
    }

Figure 2: A hypothetical piece of Java code.
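To see where the trace below comes from, Figure 2 can be run against recording stubs. In this sketch Connection, Stream and connect are hypothetical stand-ins that only append an event name to a list; the catch block calls S.send("empty"), which is what the recorded trace (a send event in the second call) implies. The class name Figure2Trace is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Figure2Trace {
    // Events recorded by the hypothetical stubs below.
    static final List<String> trace = new ArrayList<>();

    static final class Stream {
        void send(String line) { trace.add("send"); }
    }

    static final class Connection {
        Stream open() { trace.add("open"); return new Stream(); }
        void close() { trace.add("close"); }
    }

    static Connection connect(String address) { trace.add("connect"); return new Connection(); }

    // The method of Figure 2, kept as written (including its bug).
    static void send(String address, String[] lines) {
        Connection C = connect(address);
        Stream S = C.open();
        try {
            for (String line : lines) S.send(line); // throws NullPointerException if lines is null
        } catch (NullPointerException e) {
            S.send("empty");
            C.close();
        }
        C.close(); // also runs after the catch block: the double close
    }

    static List<String> run() {
        trace.clear();
        send("serverA", new String[] {"start", "45"});
        send("serverB", null);
        send("serverC", new String[] {"end", "23"});
        return new ArrayList<>(trace);
    }
}
```

run() returns the fifteen events connect, open, send, send, close, connect, open, send, close, close, connect, open, send, send, close; the second call, with null lines, produces the double close.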
This example contains a bug: in the case where a null array of lines is given, the connection is closed twice. Let us assume we execute the above code, which calls the method three times with different inputs, recording the occurrences of the connect, open, send and close events. The resulting trace would be as follows.

connect.open.send.send.close.connect.open.send.close.close.connect.open.send.send.close

We now consider mining this trace with two patterns: the alternating pattern given in the introduction and the resource usage pattern given in Section 3.2. We take the alternating pattern first. The following table gives the failing and true edit-distances (failing/true) for the above trace and the different instantiations of the alternating pattern, with rows giving the symbol instantiating b and columns the symbol instantiating a; '-' indicates that no distance should be given (we do not consider the case where a = b) and 'x' indicates that no distance is returned.

    b \ a      connect   open   send   close
    connect    -         x/2    3/3    4/3
    open       0/0       -      3/3    4/3
    send       2/2       2/2    -      4/3
    close      1/1       1/1    3/3    -

The instantiation [a ↦ open, b ↦ connect] does not have a failing edit-distance, as it finishes in a non-final state that can be extended to a final state; this is one drawback of the failing edit-distance approach. For [a ↦ close, b ↦ connect] and [a ↦ close, b ↦ open] there is a shorter true edit-distance, as this approach is allowed to make edits without failure; here this involves removing the last event to bring the pattern into an accepting state. Note that all other distances are the same, which shows that the failing edit-distance can be a good approximation of the true edit-distance.

For one case, [a ↦ connect, b ↦ open], there is a distance of 0, because this instantiated pattern matches the trace exactly. If we consider the two cases where there is an edit-distance of 1 and look at the rewrites generated, we see that both produce the same rewrite: the removal of the ninth event (the second close). Combining the three instantiated patterns with an edit distance of 0 or 1 we get the following pattern.

[Figure: the combined pattern over connect, open and close]

Now let us consider the resource usage pattern.
The following table gives the failing and true edit distances as before, with rows giving b, columns giving a, and each entry a 4-tuple over the c dimension (c = connect, open, send, close). Here, again, the computed distances are the same, but the true edit-distance approach generates some distances where the failing edit-distance approach does not.

    b \ a     connect          open             send             close
    connect   (-,-,-,-)        (-,-,5/5,4/4)    (-,3/3,-,5/5)    (-,2/2,4/4,-)
    open      (-,-,2/2,1/1)    (-,-,-,-)        (5/5,-,-,5/5)    (5/5,-,4/4,-)
    send      (-,4/4,-,1/1)    (1/1,-,-,1/1)    (-,-,-,-)        (x/6,x/6,-,-)
    close     (-,4/4,5/5,-)    (1/1,-,5/5,-)    (3/3,3/3,-,-)    (-,-,-,-)

There are five instantiations with an edit-distance of 1, but they represent different rewritings of the trace. One set removes only the ninth event (as before) and the other removes only the first connect event; therefore they are incompatible. When combined, these give the following two patterns respectively.
[Figure: the two combined patterns; the first over connect, open, send and close, the second over send, open and connect]

The rewrite for the first pattern is compatible with the rewrite for the pattern extracted using the alternation pattern, and we can combine these patterns to form a final specification, which is the same as the one on the left above, but with only the initial state accepting.

7.2 An accuracy experiment
In the following we attempt to evaluate the accuracy of our approach by generating traces from the following specification for the Lucene tool described in [5].

[Figure: specification automaton for Lucene, with transitions labelled document.Document.<init>, document.Field.<init>(String, String, Store, Index), document.Field.<init>(String, Reader) and index.IndexWriter.addDocument(Document)]

We generate imperfect traces by first generating perfect traces and then randomly editing events according to some noise level (probability). We then pass these traces to our techniques and test the resulting patterns for accuracy using a set of perfect traces generated from the specification. Table 1 gives the results: for each approach it reports the average accuracy of all produced patterns, the minimum edit required to produce a pattern with maximum accuracy, the time (in milliseconds) taken for checking and then saturation, the number of extracted patterns and the size of the 3-saturated set. Experiments were carried out with a range of trace lengths and noise levels, and we record the min, mean and max edits made at each noise level. Every experiment produces at least one pattern with perfect accuracy. Although the average accuracy is low, this is over a very large saturated set containing many (over 50%) patterns with zero accuracy. As expected, with zero noise we detect a pattern with total accuracy that requires no edits, and as the level of noise (and therefore the level of errors) increases, accuracy decreases. Interestingly, for a trace length of 100 and a noise level of 0.05, giving an average of 6 errors, we achieve 0.57% average accuracy. This is due to a small saturated set being produced. Methods for pruning this set should be explored. We can see that the saturated sets are very large and the majority of time is spent computing this set.
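The noise model is only described informally above ("randomly editing events according to some noise level"). One possible reading, assumed here and not taken from the paper, is that each event is independently affected with probability equal to the noise level, and an affected event is deleted, substituted by a random alphabet symbol, or preceded by an inserted random symbol, each with equal probability. The class NoiseInjector and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class NoiseInjector {
    // With probability `noise`, each event is deleted, substituted by a random
    // alphabet symbol, or preceded by an inserted random symbol (one choice each).
    // A substitution may pick the same symbol, which leaves the event unchanged.
    static List<String> perturb(List<String> trace, List<String> alphabet,
                                double noise, Random rng) {
        List<String> out = new ArrayList<>();
        for (String event : trace) {
            if (rng.nextDouble() >= noise) { out.add(event); continue; }
            String other = alphabet.get(rng.nextInt(alphabet.size()));
            switch (rng.nextInt(3)) {
                case 0: break;                                   // delete the event
                case 1: out.add(other); break;                   // substitute it
                default: out.add(other); out.add(event); break;  // insert before it
            }
        }
        return out;
    }
}
```

Passing a fixed seed to Random makes a generated imperfect trace reproducible; with a noise level of 0 the trace is returned unchanged.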
Future work should look at methods for reducing the size of these sets by using a more guided approach.

8. RELATED WORK
We consider alternative techniques that mine specifications from runtime traces. A recent survey paper [17] gives a good overview of the field. Here we focus on how techniques deal with imperfect traces; in particular, we are interested in automata-based pattern mining approaches.

Ammons et al. [3] developed an early approach that used a probabilistic finite automata learner from the field of grammar inference. This technique requires the alphabet of the inferred specification to be known beforehand. Imperfect traces require human experts to check violations of the inferred specification in a scoring phase. Lo et al. [13] extend this approach; one extension that is relevant to mining with imperfect traces is the introduction of a stage that attempts to filter out erroneous traces before learning. In contrast, we attempt to use this information to extract a specification and identify the error.

Techniques that use frequent-itemset mining (e.g. [12, 18]) and frequent sequential pattern mining (e.g. [14]) rely on computing support and confidence values, where support reflects the level of imperfection. Mined specifications can then be checked against the original program to detect bugs.

The automata-based pattern-mining technique was first used by Engler et al. [4]. They focus on the alternating pattern (ab)* and, to deal with imperfect traces, they count the number of times that a and b appear together in order and the number of times a appears without b, and compute the likelihood that a and b form a specification. Le Goues and Weimer [8] extend this approach by considering techniques for pruning false positives by examining the source code.

Yang et al. [19] introduced a template-based technique focusing on extracting specifications from imperfect traces. They use the alternating pattern and deal with imperfect traces by partitioning a trace into sequences of one event followed by another, i.e. a+b+, performing mining on each subtrace and then counting the number of subtraces the pattern holds for.
This is similar to restarting the pattern on failure but allows for a larger range of failures. They also introduce a chaining heuristic for combining their alternating patterns. Gabel and Su [6, 5] extend this approach in two ways. First, they introduce a symbolic method for specification mining using binary decision diagrams (BDDs). Second, they present the Javert tool, which uses two patterns, (ab)* and (ab*c)*, together with composition rules based on the notion of automata combination to extract larger patterns. They deal with imperfect traces by restarting a pattern at the initial state on failure. In [7] they extend this approach to infer and enforce temporal properties at runtime over a finite window, thus detecting potential bugs at runtime. Li et al. [11] extend this approach to mine specifications with timing bounds and more complex pattern composition rules, but their technique cannot handle imperfect traces. Instead, their focus is on mining specifications from perfect traces and using these to detect bugs in imperfect ones.

Finally, recent techniques [16, 9, 15] have considered so-called parametric traces, where traces contain data, e.g. open(123).open(456).(123).(456). Whilst some approaches use ad-hoc methods to deal with context, these approaches focus on slicing the trace based on this data and extracting specifications from the resulting data-free traces. The work in [9] extends the approach taken by [3] and therefore uses the same coring technique to deal with imperfect traces. The work in [15] uses the notions of support and confidence from data mining.

9. CONCLUSION

In this paper we have introduced a new approach for mining specifications from imperfect traces. We introduce two techniques that use the notion of edit-distance to compute the number of changes that would have to be made to a trace for a pattern to hold. We then formalise when it is safe to combine two imperfect patterns. The process is explored in two stages. We first apply it to a small code snippet to demonstrate how it works, and then attempt to measure the accuracy of the approach using traces generated from a known specification.
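The key quantity in this summary is the number of edits needed for a trace to satisfy a pattern. A minimal sketch can illustrate this idea. It is not the authors' implementation, and several details are assumptions:

- The pattern is given as a partial DFA, encoded as a dict from (state, event) to the next state, where a missing entry means failure.
- Insertions, deletions and substitutions each cost 1, as in the Levenshtein distance [10].
- The cheapest correction is found with Dijkstra's algorithm over pairs of (trace position, pattern state).

The name `edit_distance_to_dfa` and the dict encoding are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical encoding: a partial DFA over integer states, given as a map
# (state, event) -> next state. A missing entry means the pattern fails.
Delta = Dict[Tuple[int, str], int]


def edit_distance_to_dfa(trace: Sequence[str], delta: Delta,
                         initial: int, accepting: Set[int]) -> int:
    """Return the minimum number of insertions, deletions and substitutions
    (each costing 1) that turn `trace` into a word accepted by the DFA,
    or -1 if no accepting state can be reached."""
    alphabet = sorted({event for (_, event) in delta})
    n = len(trace)
    # A node (i, q) means: the first i trace events have been processed
    # and the pattern automaton is in state q.
    dist: Dict[Tuple[int, int], int] = {(0, initial): 0}
    heap: List[Tuple[int, int, int]] = [(0, 0, initial)]
    while heap:
        cost, i, q = heapq.heappop(heap)
        if cost > dist[(i, q)]:
            continue  # stale heap entry
        if i == n and q in accepting:
            return cost  # first accepting end node popped is optimal
        moves: List[Tuple[int, int, int]] = []
        if i < n:
            moves.append((i + 1, q, 1))  # delete trace[i]
        for event in alphabet:
            nxt = delta.get((q, event))
            if nxt is None:
                continue  # the pattern has no transition on this event
            moves.append((i, nxt, 1))  # insert `event` into the trace
            if i < n:
                # keep trace[i] if it matches, otherwise substitute it
                moves.append((i + 1, nxt, 0 if trace[i] == event else 1))
        for j, r, w in moves:
            if cost + w < dist.get((j, r), float("inf")):
                dist[(j, r)] = cost + w
                heapq.heappush(heap, (cost + w, j, r))
    return -1


# The alternating pattern (ab)*: state 0 is both initial and accepting.
ALTERNATING: Delta = {(0, "a"): 1, (1, "b"): 0}

for t in (["a", "b", "a", "b"], ["a", "a", "b"], ["b", "a"]):
    print(t, edit_distance_to_dfa(t, ALTERNATING, 0, {0}))
```

For the alternating pattern, the trace a.b.a.b needs no edits. The trace a.a.b needs one edit (deleting the second a), while b.a needs two.

Dijkstra's algorithm is used because an insertion advances the pattern automaton without consuming a trace event. A single left-to-right pass over the trace is therefore not enough on its own. Since all costs are non-negative, a shortest-path search handles this case directly.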
Edits              Failing: time (ms)     Perfect: time (ms)
(min,mean,max)     (check, saturation)    (check, saturation)
(0,0,0)            (318, 9181)            (171, 364)
(1,1,2)            (94, 21908)            (69, 29)
(1,1,1)            (77, 20113)            (55, 18)
(1,2,4)            (78, 30180)            (53, 24)
(0,0,0)            (138, 4606)            (491, 549)
(1,1,2)            (130, 26967)           (419, 339)
(4,6,8)            (273, 31782)           (428, 132)
(6,10,16)          (325, 1584)            (427, 75)

Table 1: Results from accuracy experiment

This technique produces not only specifications but also a description of how a program should be updated to make the specification hold. This would be useful in bug detection and location, but a case study is required to establish its applicability. Further work is required to improve the efficiency and applicability of the approach. This should involve combining this approach with existing techniques, for example the symbolic mining technique of [6] and the composition rules of [5] and [11]. We also plan on combining this approach with the authors' pattern-mining approach in [16], which targets a specific alphabet of events to extract a parametric specification. That approach uses so-called open automata, which means that all extracted patterns can be soundly combined to form a specification. Therefore, we would be able to use pattern combination directly, rather than introducing pattern composition rules.

10. REFERENCES

[1] C. Allauzen and M. Mohri. 3-way composition of weighted finite-state transducers. In Proceedings of the 13th International Conference on Implementation and Applications of Automata, CIAA '08, Berlin, Heidelberg, Springer-Verlag.
[2] C. Allauzen and M. Mohri. Linear-space computation of the edit-distance between a string and a finite automaton. CoRR, abs/
[3] G. Ammons, R. Bodík, and J. R. Larus. Mining specifications. SIGPLAN Not., 37(1):4-16, Jan.
[4] D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as deviant behavior: a general approach to inferring errors in systems code. SIGOPS Oper. Syst. Rev., 35(5):57-72, Oct.
[5] M. Gabel and Z. Su. Javert: fully automatic mining of general temporal properties from dynamic traces. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16, New York, NY, USA, ACM.
[6] M. Gabel and Z. Su. Symbolic mining of temporal specifications. In ICSE '08: Proceedings of the 30th International Conference on Software Engineering, pages 51-60, New York, NY, USA, ACM.
[7] M. Gabel and Z. Su. Online inference and enforcement of temporal properties. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE '10, pages 15-24, New York, NY, USA, ACM.
[8] C. Goues and W. Weimer. Specification mining with few false positives. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, TACAS '09, Berlin, Heidelberg, Springer-Verlag.
[9] C. Lee, F. Chen, and G. Roşu. Mining parametric specifications. In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11), ACM.
[10] V. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707.
[11] W. Li, A. Forin, and S. A. Seshia. Scalable specification mining for verification and diagnosis. In DAC '10: Proceedings of the 47th Design Automation Conference, New York, NY, USA, ACM.
[12] Z. Li and Y. Zhou. PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code. SIGSOFT Softw. Eng. Notes, 30(5), Sept.
[13] D. Lo and S.-C. Khoo. SMArTIC: towards building an accurate, robust and scalable specification miner. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '06/FSE-14, New York, NY, USA, ACM.
[14] D. Lo, S.-C. Khoo, and C. Liu. Mining temporal rules for software maintenance. J. Softw. Maint. Evol., 20(4), July.
[15] D. Lo, G. Ramalingam, V. P. Ranganath, and K. Vaswani. Mining quantified temporal rules: Formalism, algorithms, and evaluation. Sci. Comput. Program., 77(6).
[16] G. Reger, H. Barringer, and D. Rydeheard. A pattern-based approach to parametric specification mining. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, November. To appear.
[17] M. P. Robillard, E. Bodden, D. Kawrykow, M. Mezini, and T. Ratchford. Automated API property inference techniques. IEEE Transactions on Software Engineering, 39(5).
[18] A. Wasylkowski and A. Zeller. Mining temporal specifications from object usage. Automated Software Engg., 18(3-4), Dec.
[19] J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06: Proceedings of the 28th International Conference on Software Engineering, New York, NY, USA, ACM.
More informationThe size of subsequence automaton
Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,
More informationThoery of Automata CS402
Thoery of Automt C402 Theory of Automt Tle of contents: Lecture N0. 1... 4 ummry... 4 Wht does utomt men?... 4 Introduction to lnguges... 4 Alphets... 4 trings... 4 Defining Lnguges... 5 Lecture N0. 2...
More information