Symbolic Automata for Static Specification Mining


Hila Peleg, Sharon Shoham, Eran Yahav, and Hongseok Yang
Tel Aviv University, Israel; Tel Aviv-Yaffo Academic College, Israel; University of Oxford, UK; Technion, Israel

Abstract. We present a formal framework for static specification mining. The main idea is to represent partial temporal specifications as symbolic automata: automata where transitions may be labeled by variables, and a variable can be substituted by a letter, a word, or a regular language. Using symbolic automata, we construct an abstract domain for static specification mining, capturing both the partialness of a specification and the precision of a specification. We show interesting relationships between lattice operations of this domain and common operators for manipulating partial temporal specifications, such as building a more informative specification by consolidating two partial specifications.

1 Introduction

Programmers make extensive use of frameworks and libraries. To perform standard tasks such as parsing an XML file or communicating with a database, programmers use standard frameworks rather than writing code from scratch. Unfortunately, a typical framework API can involve hundreds of classes with dozens of methods each, and often requires sequences of operations to be invoked on specific objects to perform a single task (e.g., [1, 6, 1, …, 1]). Even experienced programmers might spend hours trying to understand how to use a seemingly simple API [6].

Static specification mining techniques (e.g., [10, 7, …, 1]) have emerged as a way to obtain a succinct description of usage scenarios when working with a library. However, although they demonstrate great practical value, these techniques do not address many interesting and challenging technical questions.

In this paper, we present a formal framework for static specification mining. The main idea is to represent partial temporal specifications as symbolic automata, where transitions may be labeled by variables representing unknown information. Using symbolic automata, we present an abstract domain for static specification mining, and show interesting relationships between the partialness and the precision of a specification.

Representing Partial Specifications using Symbolic Automata. We focus on generalized typestate specifications [11, 7].
Such specifications capture legal sequences of method invocations on a given API, and are usually expressed as finite-state automata where a state represents an internal state of the underlying API, and transitions correspond to API method invocations.

To make specification mining more widely applicable, it is critical to allow mining from code snippets, i.e., code fragments with unknown parts. A natural approach for mining from code snippets is to capture gaps in the snippets using gaps in the specification. For example, when the code contains an invocation of an unknown method, this approach reflects this fact in the mined specification as well (we elaborate on this point later). Our symbolic automaton is conceived in order to represent such partial information in specifications. It is a finite-state machine where transitions may be labeled by variables, and a variable can be substituted by a letter, a word, or a regular language in a context-sensitive manner: when a variable appears in multiple strings accepted by the state machine, it can be replaced by different words in all these strings.

An Abstract Domain for Mining Partial Specifications. One challenge for forming an abstract domain with symbolic automata is to find appropriate operations that capture the subtle interplay between the partialness and the precision of a specification. Let us explain this challenge using a preorder over symbolic automata. When considering non-symbolic automata, we typically use the notion of language inclusion to model precision: we can say that an automaton A₁ overapproximates an automaton A₂ when its language includes that of A₂. However, this standard approach is not sufficient for symbolic automata, because the use of variables introduces partialness as another dimension for relating the (symbolic) automata. Intuitively, in a preorder over symbolic automata, we would like to capture the notion of a symbolic automaton A₁ being more complete than a symbolic automaton A₂ when A₁ has fewer variables that represent unknown information. In Section 4, we describe an interesting interplay between precision and partialness, and define a preorder between symbolic automata that we later use as a basis for an abstract domain of symbolic automata.

Consolidating Partial Specifications. After mining a large number of partial specifications from code snippets, it is desirable to combine consistent partial information to yield consolidated temporal specifications. This leads to the question of combining consistent symbolic automata. In Section 7, we show how the join operation of our abstract domain leads to an operator for consolidating partial specifications.
Completion of Partial Specifications. Having constructed consolidated specifications, we can use symbolic automata as queries for code completion. Treating one symbolic automaton as a query being matched against a database of consolidated specifications, we show how to use simulation over symbolic automata to find automata that match the query (Section 5), and how to use unknown elimination to find completions of the query automaton (Section 6).

Main Contributions. The contributions of this paper are as follows:
- We formally define the notion of a partial typestate specification based on a new notion of symbolic automata.
- We explore relationships between partial specifications along two dimensions: (i) precision of symbolic automata, a notion that roughly corresponds to containment of non-symbolic automata; and (ii) partialness of symbolic automata, a notion that roughly corresponds to an automaton having fewer variables, which represent unknown information.

- We present an abstract domain of symbolic automata where operations of the domain correspond to key operators for manipulating partial temporal specifications.
- We define the operations required for algorithms for consolidating two partial specifications expressed in terms of our symbolic automata, and for completing certain partial parts of such specifications.

Related Work. Mishne et al. [7] present a practical framework for static specification mining and query matching based on automata. Their framework imposes restrictions on the structure of automata and could be viewed as a restricted special case of the formal framework introduced in this paper. In contrast to their informal treatment, this paper presents the notion of symbolic automata with an appropriate abstract domain.

Weimer and Necula [1] use a lightweight static analysis to infer simple specifications from a given codebase. Their insight is to use exceptional program paths as negative examples for correct API usage. They learn specifications consisting of pairs of events ⟨a, b⟩, where a and b are method calls, and do not consider larger automata.

Monperrus et al. [8] attempt to identify missing method calls when using an API by mining a codebase. They only compare objects with identical type and same containing method signature, which only works for inheritance-based APIs. Their approach deals with identical histories minus k method calls. Unlike our approach, it cannot handle incomplete programs, non-linear method call sequences, and general code queries.

Wasylkowski et al. [1] use an intraprocedural static analysis to automatically mine object usage patterns and identify usage anomalies. Their approach is based on identifying usage patterns, in the restricted form of pairs of events, reflecting the order in which events should be used. Gruska et al. […] consider limited specifications that are only pairs of events. [1] also mines pairs of events in an attempt to mine a partial order between events. [1] mines specifications (operational preconditions) of method parameters to detect problems in code. The mined specifications are CTL formulas that fit into several pre-defined templates of formulas. Thus, the user has to know what kind of specifications she is looking for.

Shoham et al. [10] use whole-program analysis to statically analyze clients using a library. Their approach is not applicable in the setting of partial programs and partial specifications, since it relies on the ability to analyze the complete program for complete alias analysis and for type information.

Plandowski [9] uses the field of word equations to identify assignments to variables within conditions on strings with variable portions and regular expressions. Ganesh et al. […] expand this work with quantifiers and limits on the assignment size. In both cases, the regular language that the assignments consist of does not allow variables, disallowing the concept of symbolic assignments of variables within the branching of the automata for the regular language. In addition, while word equations allow all predicate arguments to have symbolic components, the equation is solved by a completely concrete assignment, disallowing the concept of assigning a symbolic language.

2 Overview

We start with an informal overview of our approach by using a simple File example.

Fig. 1. (a) A simple code snippet using File: void process(File f) { f.open(); doSomething(f); f.close(); }. The methods open and close are API methods, and the method doSomething is unknown. (b) Symbolic automaton mined from this snippet, with transitions open, X, close. The transition corresponding to doSomething is represented using the variable X. Transitions corresponding to API methods are labeled with the method name.

Fig. 2. Automata mined from programs using File to (a) read after a canRead check (open, canRead, read, close); (b) write (open, write, close).

2.1 Illustrative Example

Consider the example snippet of Fig. 1(a). We would like to extract a temporal specification that describes how this snippet uses the File component. The snippet invokes open and then an unknown method doSomething(f), the code of which is not available as part of the snippet. Finally, it calls close on the component. Analyzing this snippet using our approach yields the partial specification of Fig. 1(b). Technically, this is a symbolic automaton, where transitions corresponding to API methods are labeled with the method name, and the transition corresponding to the unknown method doSomething is labeled with a variable X. The use of a variable indicates that some operations may have been invoked on the File component between open and close, but that this operation or sequence of operations is unknown.

Now consider the specifications of Fig. 2, obtained as the result of analyzing similar fragments using the File API. Both of these specifications are more complete than the specification of Fig. 1(b). In fact, neither of these automata contains variables, and they represent non-partial temporal specifications. These three separate specifications come from three pieces of code, but all contribute to our knowledge of how the File API is used. As such, we would like to be able to compare them to each other and to combine them, and in the process to eliminate as many of the unknowns as possible using other, more complete examples.

Our first step is to consolidate the specifications into a more comprehensive specification, describing as much of the API as possible, while losing no behavior represented by the original specifications.
Next, we would like to eliminate unknown operations based on the extra information that the new temporal specification now contains with regard to the full API. For instance, where in Fig. 1 we had no knowledge of what might happen between open and close, the specification in Fig. 3(a) suggests it might be either canRead and read, or write. Thus, the symbolic placeholder for the unknown operation is no longer needed, and the path leading through X becomes redundant (as shown in Fig. 3(b)).

Fig. 3. (a) Automaton resulting from combining all known specifications of the File API, and (b) the File API specifications after partial paths have been subsumed by more concrete ones.

Fig. 4. (a) Symbolic automaton representing the query for the behavior around the method read (X, read, Y), and (b) the assignment to its symbolic transitions which answers the query (X ↦ open, Y ↦ close).

We may now note that all three original specifications are still included in the specification in Fig. 3(b), even after the unknown operation was removed: the concrete paths are fully there, whereas the path with the unknown operation is represented by both the remaining paths.

The ability to find the inclusion of one specification with unknowns within another is useful for performing queries. A user may wish to use the File object in order to read, but be unfamiliar with it. He can query the specification, marking any portion he does not know as an unknown operation, as in Fig. 4(a). As this very partial specification is included in the API's specification, there will be a match. Furthermore, we can deduce what should replace the symbolic portions of the query. This means the user can get the reply to his query that X should be replaced by open and Y by close.

Fig. 5 shows a more complex query and its assignment. The assignment to the variable X is made up of two different assignments for the different contexts surrounding X: when followed by write, X is assigned open, and when followed by read, X is assigned the word open, canRead. Even though the branching point in the API specification is not identical to the one in the query, the query can still return a correct result using contexts.

2.2 An Abstract Domain of Symbolic Automata

To provide a formal background for the operations we demonstrated here informally, we define an abstract domain based on symbolic automata. Operations in the domain correspond to natural operators required for effective specification mining and answering code search queries.
Our abstract domain serves a dual goal: (i) it is used to represent partial temporal specifications during the analysis of each individual code snippet; (ii) it is used for consolidation and answering code search queries across multiple snippets. In its first role, used in the analysis of a single snippet, the abstract domain can further employ quotient abstraction to guarantee that symbolic automata do not grow without bound due to loops or recursion [10]. In Section 4.2, we show how to obtain a lattice based on symbolic automata. In its second role, used for consolidation and answering code-search queries, query matching can be understood in terms of unknown elimination in symbolic automata (explained in Section 6), and consolidation can be understood in terms of join in the abstract domain, followed by minimization (explained in Section 7).

Fig. 5. (a) Symbolic automaton representing the query for the behavior around the read and write methods, and (b) the assignment with contexts to its symbolic transitions which answers the query: (*, X, read) ↦ open, canRead; (*, X, write) ↦ open; Y ↦ close; Z ↦ close.

3 Symbolic Automata

We represent partial typestate specifications using symbolic automata:

Definition 1. A deterministic symbolic automaton (DSA) is a tuple ⟨Σ, Q, δ, ι, F, Vars⟩ where:
- Σ is a finite alphabet a, b, c, ...;
- Q is a finite set of states q, q′, ...;
- δ is a partial function from Q × (Σ ∪ Vars) to Q, representing a transition relation;
- ι ∈ Q is an initial state;
- F ⊆ Q is a set of final states;
- Vars is a finite set of variables x, y, z, ....

Our definition mostly follows the standard notion of deterministic finite automata. There are two differences: transitions can be labeled not just by letters but also by variables, and they form a partial function instead of a total one. Hence, an automaton might get stuck at a letter in a state, because the transition for the letter at that state is not defined.

We write (q, l, q′) ∈ δ for a transition δ(q, l) = q′ where q, q′ ∈ Q and l ∈ Σ ∪ Vars. If l ∈ Vars, the transition is called symbolic. We extend δ to words over Σ ∪ Vars in the usual way. Note that this extension of δ over words is a partial function, because of the partiality of the original δ. When we write δ(q, sw) ∈ Q₀ for such words sw and a state set Q₀ in the rest of the paper, we mean that δ(q, sw) is defined and belongs to Q₀. From now on, we fix Σ and Vars and omit them from the notation of DSA.

3.1 Semantics

For a DSA A, we define its symbolic language, denoted SL(A), to be the set of all words over Σ ∪ Vars accepted by A, i.e.,

SL(A) = {sw ∈ (Σ ∪ Vars)* | δ(ι, sw) ∈ F}.
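A minimal Java sketch of Definition 1 and of SL(A) may help fix the notation. It is our illustration, not code from the paper. The class Dsa and its methods are hypothetical names. States are integers, letters and variables are both plain strings, and a label counts as a variable exactly when it belongs to vars. The run method plays the role of δ extended to words and returns null when the automaton gets stuck. The sample automaton has the shape of Fig. 1(b); the state numbering and the choice of final state are ours.

```java
import java.util.*;

// Sketch of a deterministic symbolic automaton (Definition 1).
// Assumptions (ours): states are ints, labels are Strings, and a label is a
// variable iff it is contained in `vars`.
class Dsa {
    final Set<String> vars = new HashSet<>();
    final Map<Integer, Map<String, Integer>> delta = new HashMap<>(); // partial δ
    final Set<Integer> finals = new HashSet<>();
    final int init;

    Dsa(int init) { this.init = init; }

    void addTransition(int from, String label, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    // A transition (q, l, q') is symbolic iff l ∈ Vars.
    boolean isSymbolic(String label) { return vars.contains(label); }

    // δ extended to words; returns null when the run gets stuck (δ is partial).
    Integer run(int q, List<String> word) {
        Integer cur = q;
        for (String l : word) {
            Map<String, Integer> out = delta.get(cur);
            if (out == null || !out.containsKey(l)) return null;
            cur = out.get(l);
        }
        return cur;
    }

    // sw ∈ SL(A) iff δ(ι, sw) is defined and belongs to F.
    boolean inSymbolicLanguage(List<String> sw) {
        Integer end = run(init, sw);
        return end != null && finals.contains(end);
    }

    // Shape of Fig. 1(b): open, then the unknown X, then close (numbering is ours).
    static Dsa fig1b() {
        Dsa a = new Dsa(0);
        a.vars.add("X");
        a.addTransition(0, "open", 1);
        a.addTransition(1, "X", 2);
        a.addTransition(2, "close", 3);
        a.finals.add(3); // assumption: the state reached after close is final
        return a;
    }
}
```

With this encoding, the word open, close is rejected because the transition from state 1 on close is undefined. This is exactly the partiality that Definition 1 allows.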

Fig. 6. DSAs (a) and (b).

Words over Σ ∪ Vars are called symbolic words, whereas words over Σ are called concrete words. Similarly, languages over Σ ∪ Vars are symbolic, whereas languages over Σ are concrete.

The symbolic language of a DSA can be interpreted in different ways, depending on the semantics of variables: (i) a variable represents a sequence of letters from Σ; (ii) a variable represents a regular language over Σ; (iii) a variable represents different sequences of letters from Σ under different contexts. All of the above interpretations of variables, except for the last, assign some value to a variable while ignoring the context in which the variable lies. This is not always desirable. For example, consider the DSA in Fig. 6(a). We want to be able to interpret x as b when it is followed by c, and to interpret it as e when it is followed by d (Fig. 6(b)). Motivated by this example, we focus here on the last possibility of interpreting variables, which also considers their context. Formally, we consider the following definitions.

Definition 2. A context-sensitive assignment, or in short an assignment, σ is a function from (Σ ∪ Vars)* × Vars × (Σ ∪ Vars)* to NonEmptyRegLangOn(Σ ∪ Vars).

When σ maps (sw₁, x, sw₂) to SL, we refer to (sw₁, sw₂) as the context of x. The meaning is that an occurrence of x in the context (sw₁, sw₂) is to be replaced by SL (i.e., by any word from SL). Thus, it is possible to assign multiple words to the same variable in different contexts. The context used in an assignment is the full context preceding and following x. In particular, it is not restricted in length and it can be symbolic, i.e., it can contain variables. Note that these assignments consider the linear context of a variable. A more general definition would consider the branching context of a variable (or of a symbolic transition).

Formally, applying σ to a symbolic word behaves as follows. For a symbolic word sw = l₁ l₂ ... lₙ, where lᵢ ∈ Σ ∪ Vars for every 1 ≤ i ≤ n,

σ(sw) = SL₁ SL₂ ... SLₙ

where (i) SLᵢ = {lᵢ} if lᵢ ∈ Σ; and (ii) SLᵢ = SL if lᵢ ∈ Vars is a variable x and σ(l₁...lᵢ₋₁, x, lᵢ₊₁...lₙ) = SL. Accordingly, for a symbolic language SL, σ(SL) = ∪{σ(sw) | sw ∈ SL}.

Definition 3. An assignment σ is concrete if its image consists of concrete languages only. Otherwise, it is symbolic. If σ is concrete then σ(SL) is a concrete language, whereas if σ is symbolic then σ(SL) can still be symbolic.
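The next sketch applies a context-sensitive assignment to a symbolic word and to a symbolic language, following σ(sw) = SL₁ SL₂ ... SLₙ above. It simplifies Definition 2 in one respect, and this simplification is our assumption: σ returns a finite set of words rather than an arbitrary non-empty regular language. The names Assignment, Assign and Example1 are hypothetical. The instance encodes the concrete assignment (a, x, c) ↦ b, (a, x, d) ↦ e of Example 1.

```java
import java.util.*;

// Context-sensitive assignment (Definition 2), restricted by assumption to
// finite languages: image(pre, x, post) is the language replacing x in (pre, post).
interface Assignment {
    Set<List<String>> image(List<String> pre, String x, List<String> post);
}

class Assign {
    // σ(sw) = SL_1 ... SL_n: letters stay, and each variable occurrence is replaced
    // by the language σ assigns to it in its full context (prefix, suffix).
    static Set<List<String>> applyToWord(Assignment sigma, Set<String> vars, List<String> sw) {
        Set<List<String>> result = new HashSet<>();
        result.add(new ArrayList<>()); // start from {ε}
        for (int i = 0; i < sw.size(); i++) {
            String l = sw.get(i);
            Set<List<String>> slice = vars.contains(l)
                    ? sigma.image(sw.subList(0, i), l, sw.subList(i + 1, sw.size()))
                    : Set.of(List.of(l));
            Set<List<String>> next = new HashSet<>(); // concatenate result · slice
            for (List<String> prefix : result) {
                for (List<String> w : slice) {
                    List<String> joined = new ArrayList<>(prefix);
                    joined.addAll(w);
                    next.add(joined);
                }
            }
            result = next;
        }
        return result;
    }

    // σ(SL) = ∪ { σ(sw) | sw ∈ SL }.
    static Set<List<String>> applyToLanguage(Assignment sigma, Set<String> vars, Set<List<String>> sl) {
        Set<List<String>> out = new HashSet<>();
        for (List<String> sw : sl) out.addAll(applyToWord(sigma, vars, sw));
        return out;
    }
}

// Example 1's concrete assignment: x ↦ b in context (a, c), x ↦ e in context (a, d).
class Example1 {
    static final Set<String> VARS = Set.of("x");
    static final Assignment SIGMA = (pre, x, post) -> {
        if (pre.equals(List.of("a")) && post.equals(List.of("c"))) return Set.of(List.of("b"));
        if (pre.equals(List.of("a")) && post.equals(List.of("d"))) return Set.of(List.of("e"));
        return Set.of(List.of()); // other contexts do not occur in {axc, axd}; {ε} keeps σ concrete
    };
}
```

Contexts are passed as the full prefix and suffix of each occurrence. This matches the remark that the context used in an assignment is not restricted in length.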

Fig. 7. DSA (a) before and (b) after assignment.

In the sequel, when σ maps some x to the same language in several contexts, we sometimes write σ(C₁, x, C₂) = SL as an abbreviation for σ(sw₁, x, sw₂) = SL for every (sw₁, sw₂) ∈ C₁ × C₂. We also write * as an abbreviation for (Σ ∪ Vars)*.

Example 1. Consider the DSA A from Fig. 6(a). Its symbolic language is {axc, axd}. Now consider the concrete assignment σ: (a, x, c) ↦ b, (a, x, d) ↦ e. Then σ(axc) = {abc} and σ(axd) = {aed}, which means that σ(SL(A)) = {abc, aed}. If we consider σ: (a, x, c) ↦ b*, (a, x, d) ↦ (e|b)*, then σ(axc) = ab*c and σ(axd) = a(e|b)*d, which means that σ(SL(A)) = (ab*c)|(a(e|b)*d).

Example 2. Consider the DSA A depicted in Fig. 7(a) and consider the symbolic assignment σ which maps the variable to g in one particular incoming context, and maps it to itself in any other context. The assignment is symbolic since in any other incoming context the variable is assigned itself. Then Fig. 7(b) presents a DSA for σ(SL(A)).

3.2 Completions of a DSA

Each concrete assignment σ to a DSA A results in some completion of SL(A) into a language over Σ (cf. Example 1). We define the semantics of a DSA A, denoted ⟦A⟧, as the set of all languages over Σ obtained by concrete assignments:

⟦A⟧ = {σ(SL(A)) | σ is a concrete assignment}.

We call ⟦A⟧ the set of completions of A. For example, for the DSA A from Fig. 6(a), {abc, aed} ∈ ⟦A⟧ (see Example 1). Note that if a DSA A has no symbolic transition, i.e., SL(A) ⊆ Σ*, then ⟦A⟧ = {SL(A)}.

4 An Abstract Domain for Specification Mining

In this section we lay the ground for defining common operations on DSAs by defining a preorder on DSAs. In later sections, we use this preorder to define an algorithm for query matching (Section 5), completion of partial specifications (Section 6), and consolidation of multiple partial specifications (Section 7). The definition of a preorder over DSAs is motivated by two concepts. The first is precision. We are interested in capturing that one DSA is an overapproximation of another, in the sense of describing more behaviors (sequences) of an API. When DFAs are considered, language inclusion is suitable for capturing a precision (abstraction) relation between automata. The second is partialness. We would like to capture that a DSA is more complete than another in the sense of having fewer variables that stand for unknown information.

Fig. 8. Dimensions of the preorder on DSAs.

4.1 Preorder on DSAs

Our preorder combines precision and partialness. Since the notion of partialness is less standard, we first explain how it is captured for symbolic words. Considering symbolic words rather than DSAs allows us to ignore the dimension of precision and focus on partialness, before we combine the two.

Preorder on Symbolic Words.

Definition 4. Let sw₁, sw₂ be symbolic words. sw₁ ⪯ sw₂ if for every concrete assignment σ₂ to sw₂, there is a concrete assignment σ₁ to sw₁ such that σ₁(sw₁) = σ₂(sw₂).

This definition captures the notion of a symbolic word being more concrete or more complete than another. Intuitively, consider the property that no matter how we fill in the unknown information in sw₂ (using a concrete assignment), the same completion can also be obtained by filling in the unknowns of sw₁. This property ensures that every unknown of sw₂ is also unknown in sw₁ (and can be filled in the same way), while sw₁ can have additional unknowns. Thus, sw₂ has no more unknowns than sw₁. In particular, {σ(sw₁) | σ is a concrete assignment} ⊇ {σ(sw₂) | σ is a concrete assignment}. Note that when considering two concrete words w₁, w₂ ∈ Σ* (i.e., without any variable), w₁ ⪯ w₂ iff w₁ = w₂. In this sense, the definition of ⪯ over symbolic words is a relaxation of equality over words. For example, …y ⪯ … according to our definition. Intuitively, this relationship holds because … is more complete (carries more information) than …y.

Symbolic Inclusion of DSAs. We now define the preorder over DSAs that combines precision with partialness. On the one hand, we say that a DSA A₂ is bigger than A₁ if A₂ describes more possible behaviors of the API, as captured by standard automata inclusion. For example, see the DSAs (a) and (b) in Fig. 8. On the other hand, we say that a DSA A₂ is bigger than A₁ if A₂ describes more complete behaviors, in terms of having fewer unknowns. For example, see the DSAs (c) and (d) in Fig. 8. However, these examples are simple in the sense of separating the precision and the partialness dimensions: each of them demonstrates one dimension only.
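Before turning to DSAs that mix both dimensions, the word-level preorder of Definition 4 can be tested mechanically. The sketch below relies on a structural reading that we assume by analogy with Definition 6 and Theorem 2: sw₁ ⪯ sw₂ when sw₂ arises from sw₁ by replacing each variable occurrence of sw₁ with some symbolic word, while the letters of sw₁ stay in place. We make two further assumptions. First, each occurrence can be replaced independently, because each occurrence has its own full context. Second, a replacement may be the empty word, since {ε} is a non-empty language. The class name WordPreorder is hypothetical, and the example words come from the File scenario of Section 2.

```java
import java.util.*;

// Hypothetical helper: tests the structural reading of Definition 4 for single words.
class WordPreorder {
    // True iff sw2 can be obtained from sw1 by replacing each variable occurrence of sw1
    // with some (possibly empty) symbolic word, while letters of sw1 stay in place.
    static boolean below(List<String> sw1, List<String> sw2, Set<String> vars) {
        int n1 = sw1.size(), n2 = sw2.size();
        // can[i][j]: the suffix sw1[i..] can be instantiated to the suffix sw2[j..]
        boolean[][] can = new boolean[n1 + 1][n2 + 1];
        can[n1][n2] = true;
        for (int i = n1 - 1; i >= 0; i--) {
            String l = sw1.get(i);
            for (int j = n2; j >= 0; j--) {
                if (vars.contains(l)) {
                    // this occurrence absorbs sw2[j..k) for some k >= j
                    for (int k = j; k <= n2 && !can[i][j]; k++) can[i][j] = can[i + 1][k];
                } else {
                    // a letter of sw1 must appear verbatim in sw2
                    can[i][j] = j < n2 && sw2.get(j).equals(l) && can[i + 1][j + 1];
                }
            }
        }
        return can[0][0];
    }
}
```

For instance, X read Y is below open canRead read close (X absorbs open canRead and Y absorbs close). The concrete word open close is not below open X close, because the latter has completions that the former cannot produce.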

We are also interested in handling cases that combine the two. In such cases A₁ and A₂ may represent more than one word, so the notion of completeness of symbolic words alone is not applicable. In addition, the language of A₁ may not be included in the language of A₂ per se, e.g., because some of the words in A₁ are less complete than those of A₂. This leads us to the following definition.

Definition 5 (symbolic-inclusion). A DSA A₁ is symbolically-included in a DSA A₂, denoted by A₁ ⪯ A₂, if for every concrete assignment σ₂ of A₂ there exists a concrete assignment σ₁ of A₁ such that σ₁(SL(A₁)) ⊆ σ₂(SL(A₂)).

The above definition ensures that for each concrete language L₂ that is a completion of A₂, A₁ can be assigned in a way that will result in its language being included in L₂. This means that the concrete parts of A₁ and A₂ admit the inclusion relation, and A₂ is more concrete than A₁. Equivalently, A₁ is symbolically-included in A₂ iff for every L₂ ∈ ⟦A₂⟧ there exists L₁ ∈ ⟦A₁⟧ such that L₁ ⊆ L₂.

Example 3. The DSA depicted in Fig. 6(a) is symbolically-included in the one depicted in Fig. 6(b). For any assignment σ₂ to (b), the assignment σ₁ to (a) that yields a language included in the language of (b) is σ₁: (a, x, c) ↦ b, (a, x, d) ↦ e. Note that if we had considered assignments to a variable without a context, the same would not hold. If we assign to x the sequence b, the word abd from the assigned (a) will remain unmatched. If we assign e to x, the word aec will remain unmatched. If we assign to x the language b|e, then both of the above words will remain unmatched. Therefore, when considering context-free assignments, there is no suitable assignment σ₁.

Theorem 1. ⪯ is reflexive and transitive.

Structural Inclusion. As a basis for an algorithm for checking whether symbolic-inclusion holds between two DSAs, we note the following: provided that any alphabet Σ can be used in assignments, the following definition is equivalent to Definition 5.

Definition 6. A₁ is structurally-included in A₂ if there exists a symbolic assignment σ to A₁ such that σ(SL(A₁)) ⊆ SL(A₂). We say that σ witnesses the structural inclusion of A₁ in A₂.

Theorem 2. Let A₁, A₂ be DSAs. Then A₁ ⪯ A₂ iff A₁ is structurally-included in A₂.
The following corollary provides another sufficient condition for symbolic-inclusion:

Corollary 1. If SL(A₁) ⊆ SL(A₂), then A₁ ⪯ A₂.

Example 4. The DSA depicted in Fig. 9(a) is not symbolically-included in the one depicted in Fig. 9(b), since no symbolic assignment to (a) will substitute the symbolic word …g by a (symbolic) word (or set of words) in (b). This is because assignments cannot drop any of the contexts of a variable (e.g., the outgoing g context of x). Such assignments are undesirable, since removal of contexts amounts to removal of observed behaviors.

Fig. 9. Example for a case where there is no assignment to either (a) or (b) to show (a) ⪯ (b) or (b) ⪯ (a), and where there is such an assignment for (c) so that (c) ⪯ (b).

On the other hand, the DSA depicted in Fig. 9(c) is symbolically-included in the one depicted in Fig. 9(b). There is a witnessing assignment σ that maintains all the contexts of x, including those in which x is followed by f and by g. Assigning σ to (c) results in a DSA whose symbolic language is strictly included in the symbolic language of (b). Note that symbolic-inclusion holds in spite of the fact that in (b) there is no longer a state with an incoming … event and both an outgoing f and an outgoing g event while being reachable from state 1. This example demonstrates our interest in linear behaviors, rather than in branching behavior. Note also that in this example, symbolic-inclusion would not hold if we did not allow referring to contexts of any length (and in particular of length > 1).

4.2 A Lattice for Specification Mining

As stated in Theorem 1, ⪯ is reflexive and transitive, and therefore a preorder. However, it is not antisymmetric. This is not surprising, since for DFAs ⪯ collapses into standard automata inclusion, which is also not antisymmetric (due to the existence of different DFAs with the same language). In the case of DSAs, symbolic transitions are an additional source of the problem, as demonstrated by the following example.

Example 5. The DSAs in Fig. 10 satisfy ⪯ in both directions even though their symbolic languages are different. One of them is trivially symbolically-included in the other, since its symbolic language is a subset of the other's symbolic language (see Corollary 1). Examining the example closely shows why symbolic-inclusion also holds in the other direction. The symbolic language of the larger DSA contains a symbolic word as well as a concrete word that is a completion of it. In this sense, the symbolic word is subsumed by the rest of the DSA, which amounts to the smaller DSA.

Fig. 10. Equivalent DSAs w.r.t. symbolic-inclusion.

In order to obtain a partial order, we follow the standard construction that turns a preordered set into a partially ordered set. We first define the following equivalence relation based on ⪯.

Definition 7. DSAs A₁ and A₂ are symbolically-equivalent, denoted by A₁ ≃ A₂, iff A₁ ⪯ A₂ and A₂ ⪯ A₁.

Theorem 3. ≃ is an equivalence relation over the set DSA of all DSAs.

We now lift the discussion to the quotient set DSA/≃, which consists of the equivalence classes of DSA w.r.t. the equivalence relation ≃.

Definition 8. Let [A₁], [A₂] ∈ DSA/≃. Then [A₁] ⊑ [A₂] if A₁ ⪯ A₂.

Theorem 4. ⊑ is a partial order over DSA/≃.

Definition 9. For DSAs A₁ and A₂, we use union(A₁, A₂) to denote a union DSA for A₁ and A₂, defined similarly to the union of DFAs. That is, union(A₁, A₂) is a DSA such that SL(union(A₁, A₂)) = SL(A₁) ∪ SL(A₂).

Theorem 5. Let [A₁], [A₂] ∈ DSA/≃ and let union(A₁, A₂) be a union DSA for A₁ and A₂. Then [union(A₁, A₂)] is the least upper bound of [A₁] and [A₂] w.r.t. ⊑.

Corollary 2. (DSA/≃, ⊑) is a join semi-lattice. The ⊥ element in the lattice is the equivalence class of a DSA for ∅. The ⊤ element is the equivalence class of a DSA for Σ*.

5 Query Matching using Symbolic Simulation

Given a query in the form of a DSA, and a database of other DSAs, query matching attempts to find DSAs in the database that symbolically include the query DSA. In this section, we describe a notion of simulation for DSAs, which precisely captures the preorder on DSAs and serves as a basis of core algorithms for manipulating symbolic automata. In particular, in Section 5.2, we provide an algorithm for computing symbolic simulation that can be directly used to determine when symbolic inclusion holds.
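As a cheap first filter for query matching, Corollary 1 offers a sufficient test on symbolic languages. Definition 9 builds the join from the usual union of automata. The Java sketch below implements both with the standard product construction over partial automata, treating variables as ordinary labels. It is our illustration (the class names Auto and Ops are hypothetical), not an algorithm taken from the paper. The example automata in the test are shaped like the two File specifications of Fig. 2, with state numbering of our own.

```java
import java.util.*;

// Plain partial automaton over String labels; variables are just labels here.
class Auto {
    final Map<Integer, Map<String, Integer>> delta = new HashMap<>();
    final Set<Integer> finals = new HashSet<>();
    final int init;

    Auto(int init) { this.init = init; }

    void add(int from, String label, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    // Outgoing transitions of q; a null q stands for "stuck" and has none.
    Map<String, Integer> out(Integer q) {
        return q == null ? Map.of() : delta.getOrDefault(q, Map.of());
    }

    boolean accepts(List<String> w) {
        Integer q = init;
        for (String l : w) {
            q = out(q).get(l);
            if (q == null) return false;
        }
        return finals.contains(q);
    }
}

class Ops {
    // SL(a1) ⊆ SL(a2), explored on the reachable part of the product. By Corollary 1
    // a positive answer implies a1 ⪯ a2; a negative answer proves nothing about ⪯.
    static boolean slIncluded(Auto a1, Auto a2) {
        Deque<List<Integer>> todo = new ArrayDeque<>();
        Set<List<Integer>> seen = new HashSet<>();
        List<Integer> start = Arrays.asList(a1.init, a2.init);
        todo.push(start);
        seen.add(start);
        while (!todo.isEmpty()) {
            List<Integer> p = todo.pop();
            Integer q1 = p.get(0), q2 = p.get(1); // q2 == null: a2 is stuck
            if (a1.finals.contains(q1) && (q2 == null || !a2.finals.contains(q2))) return false;
            for (Map.Entry<String, Integer> e : a1.out(q1).entrySet()) {
                List<Integer> next = Arrays.asList(e.getValue(), a2.out(q2).get(e.getKey()));
                if (seen.add(next)) todo.push(next);
            }
        }
        return true;
    }

    // union(a1, a2) in the sense of Definition 9: product construction where a null
    // component means that side is stuck; product states are renumbered from 0.
    static Auto union(Auto a1, Auto a2) {
        Map<List<Integer>, Integer> id = new HashMap<>();
        Deque<List<Integer>> todo = new ArrayDeque<>();
        List<Integer> start = Arrays.asList(a1.init, a2.init);
        id.put(start, 0);
        todo.push(start);
        Auto u = new Auto(0);
        while (!todo.isEmpty()) {
            List<Integer> p = todo.pop();
            int me = id.get(p);
            if (a1.finals.contains(p.get(0)) || a2.finals.contains(p.get(1))) u.finals.add(me);
            Set<String> labels = new HashSet<>(a1.out(p.get(0)).keySet());
            labels.addAll(a2.out(p.get(1)).keySet());
            for (String l : labels) {
                List<Integer> next = Arrays.asList(a1.out(p.get(0)).get(l), a2.out(p.get(1)).get(l));
                Integer target = id.get(next);
                if (target == null) {
                    target = id.size();
                    id.put(next, target);
                    todo.push(next);
                }
                u.add(me, l, target);
            }
        }
        return u;
    }
}
```

An automaton shaped like Fig. 1(b) (open, X, close) is not SL-included in the union of the two Fig. 2 automata. It is nevertheless symbolically included, by assigning canRead read or write to X. This gap is what the symbolic simulation below addresses.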

5.1 Symbolic Simulation

Let $A_1$ and $A_2$ be DSAs $\langle Q_1, \delta_1, \iota_1, F_1 \rangle$ and $\langle Q_2, \delta_2, \iota_2, F_2 \rangle$, respectively.

Definition 10. A relation $H \subseteq Q_1 \times (2^{Q_2} \setminus \{\emptyset\})$ is a symbolic simulation from $A_1$ to $A_2$ if it satisfies the following conditions:
(a) $(\iota_1, \{\iota_2\}) \in H$;
(b) for every $(q, B) \in H$, if $q$ is a final state, some state in $B$ is final;
(c) for every $(q, B) \in H$ and $q' \in Q_1$, if $q' = \delta_1(q, a)$ for some $a \in \Sigma$, then $\exists B'$ s.t. $(q', B') \in H \wedge B' \subseteq \{q_2' \mid \exists q_2 \in B \text{ s.t. } q_2' = \delta_2(q_2, a)\}$;
(d) for every $(q, B) \in H$ and $q' \in Q_1$, if $q' = \delta_1(q, x)$ for some $x \in \mathit{Vars}$, then $\exists B'$ s.t. $(q', B') \in H \wedge B' \subseteq \{q_2' \mid \exists q_2 \in B \text{ s.t. } q_2' \text{ is reachable from } q_2\}$.

We say that $(q', B')$ in the third or fourth item above is a witness for $((q, B), l)$, or an $l$-witness for $(q, B)$, for $l \in \Sigma \cup \mathit{Vars}$. Finally, $A_1$ is symbolically simulated by $A_2$ if there exists a symbolic simulation $H$ from $A_1$ to $A_2$.

In this definition, a state $q$ of $A_1$ is simulated by a nonempty set $B$ of states from $A_2$, with the meaning that their union overapproximates all of its outgoing behaviors. In other words, the role of $q$ in $A_1$ is split among the states of $B$ in $A_2$. A split arises from symbolic transitions, but the split of the target of a symbolic transition can be propagated forward for any number of steps, thus allowing states to be simulated by sets of states even if they are not the target of a symbolic transition. This accounts for splitting that is performed by an assignment with a context longer than one. Note that since we consider deterministic symbolic automata, the sizes of the sets used in the simulation are monotonically non-increasing, except when the target of a symbolic transition is considered, in which case the set can increase in size. Note also that a state $q_1$ of $A_1$ can participate in more than one simulation pair in the computed simulation, as demonstrated by the following example.

Example 6. Consider the DSAs in Fig. 9(?) and (?). In this case, the simulation will be H = { (0, {0}), (1, {1}), (, {, 6, 9}), (, {}), (, {, 10}), (, {7}), (6, {1}), (7, {11}), (8, {8}), (9, {1}), (10, {1}), (1, {16}), (, {17}), (, {18}), (7, {0}), (8, {19}), (, {18}), (, {0}), (6, {19}) }. One can see that the state of (?) that is the target of the transition labeled $x$ is split between three states of (?), among them states 6 and 9.
In the next step, after seeing one event, the target state reached is simulated by a singleton set. On the other hand, after seeing another event, the target state reached is still split, however this time into only two states of (?), one of which is state 10. In the next step, no more splitting occurs. Note that state 1 in (?) is simulated both by {1} and by {16}. Intuitively, each of these sets simulates state 1 in a different incoming context.

Theorem 6 (Soundness). For all DSAs $A_1$ and $A_2$, if there is a symbolic simulation $H$ from $A_1$ to $A_2$, then $A_1 \preceq A_2$.
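Definition 10 can be checked mechanically for a relation supplied by the caller. The Python sketch below is a minimal illustration, not the authors' implementation. It rests on these assumptions:

- A DSA is a hypothetical triple `(initial, finals, delta)`, where `delta` is a dict from `(state, label)` to a state.
- `VARS` is a hypothetical set of variable names.
- Each simulation pair is `(q, B)`, with `B` a `frozenset`.

The function only tests conditions (a) to (d).

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

VARS = {"x", "y"}  # hypothetical variable names; every other label is an event in Sigma

Delta = Dict[Tuple[Hashable, str], Hashable]
Automaton = Tuple[Hashable, Set[Hashable], Delta]  # (initial, finals, delta)
Pair = Tuple[Hashable, FrozenSet[Hashable]]


def reachable(delta: Delta, sources) -> Set[Hashable]:
    """States reachable from `sources`, including the sources themselves."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for (p, _label), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def is_symbolic_simulation(H: Set[Pair], a1: Automaton, a2: Automaton) -> bool:
    (init1, fin1, d1), (init2, fin2, d2) = a1, a2
    if (init1, frozenset({init2})) not in H:          # condition (a)
        return False
    for q, B in H:
        if not B:                                      # H only relates states to nonempty sets
            return False
        if q in fin1 and not (B & fin2):               # condition (b)
            return False
        for (p, label), q_next in d1.items():
            if p != q:
                continue
            if label in VARS:                          # condition (d): any state reachable from B
                allowed = reachable(d2, B)
            else:                                      # condition (c): the label-successors of B
                allowed = {d2[(s, label)] for s in B if (s, label) in d2}
            if not any(q2 == q_next and B2 <= allowed for q2, B2 in H):
                return False
    return True
```

Take a toy pair where `a1` is `a x (f | g)` and `a2` is `a b f | a e g`. There, state 2 of `a1` must be simulated by the set {2, 4}: its f-behavior is covered by state 2 and its g-behavior by state 4. Replacing that pair by `(2, {2})` breaks condition (c) for the g-transition.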

Our proof of Theorem 6 uses Theorem 2 and constructs the desired symbolic assignment $\sigma$ that witnesses structural inclusion of $A_1$ in $A_2$ explicitly from $H$. For any symbolic word in $SL(A_1)$, this construction yields the assignment (completion) of all the variables in it (in the corresponding contexts). Taken together with our next completeness theorem (Theorem 7), this construction supports the view that symbolic simulation serves as a finite representation of a symbolic assignment in the preorder. We develop this further in Section 6.

Theorem 7 (Completeness). For all DSAs $A_1$ and $A_2$, if $A_1 \preceq A_2$, then there is a symbolic simulation $H$ from $A_1$ to $A_2$.

5.2 Algorithm for Checking Simulation

A maximal symbolic simulation relation can be computed using a greatest fixpoint algorithm (similarly to standard simulation). A naive implementation would consider all sets in $2^{Q_2}$, making it exponential. More efficiently, we obtain a symbolic simulation relation $H$ by an algorithm that traverses both DSAs simultaneously, starting from $(\iota_1, \{\iota_2\})$, similarly to the computation of a product automaton. For each pair $(q_1, B_2)$ that we explore, we make sure that if $q_1 \in F_1$, then $B_2 \cap F_2 \neq \emptyset$. If this is not the case, the pair is removed. Otherwise, we traverse all the outgoing transitions of $q_1$, and for each one, we look for a witness in the form of another simulation pair, as required by Definition 10 (see below). If a witness is found, it is added to the list of simulation pairs that need to be explored. If no witness is found, the pair $(q_1, B_2)$ is removed. When a simulation pair is removed, any simulation pair for which it is a witness and no other witness exists is also removed (for efficiency, we also remove all its witnesses that are not witnesses for any other pairs). If at some point $(\iota_1, \{\iota_2\})$ is removed, then the algorithm concludes that $A_1$ is not symbolically simulated by $A_2$. If no more pairs are to be explored, the algorithm concludes that there is a symbolic simulation, and it is returned.

Consider a candidate simulation pair $(q_1, B_2)$. When looking for a witness for some transition of $q_1$, a crucial observation is that if some set $B \subseteq Q_2$ simulates a state $q_1 \in Q_1$, then any superset of $B$ also simulates $q_1$.
Therefore, as a witness we add the maximal set that fulfills the requirement: if we fail to prove that $q_1'$ is simulated by the maximal candidate for $B_2'$, then we will also fail with any other candidate, making it unnecessary to check. Specifically, for an $a$-transition, where $a \in \Sigma$, from $q_1$ to $q_1'$, the witness is $(q_1', B_2')$ where $B_2' = \{q' \mid \exists q \in B_2 \text{ s.t. } q' = \delta_2(q, a)\}$. If $B_2' = \emptyset$ then no witness exists. For a symbolic transition from $q_1$ to some $q_1'$, the witness is $(q_1', B_2')$ where $B_2'$ is the set of all states reachable from the states in $B_2$ (note that $B_2' \neq \emptyset$, as it contains at least the states of $B_2$). In both cases, if $q_1'$ is a final state, we make sure that $B_2'$ contains at least one final state as well. Otherwise, no witness exists.

In order to prevent checking the same simulation pair, or related pairs, over and over again, we keep all removed pairs. When a witness $(q_1', B_2')$ is to be added as a simulation pair, we make sure that no simulation pair $(q_1', B_2'')$ with $B_2' \subseteq B_2''$ was already removed. If such a pair was removed, then clearly $(q_1', B_2')$ will also be removed. Moreover, since $B_2'$ was chosen as the maximal set that fulfills the requirement, any other possible witness consists of a subset of it and will therefore also be removed. Thus, in this case, no witness is obtained.

As an optimization, when for some simulation pair $(q_1, B_2)$ we identify that all the witnesses reachable from it have been verified and remained as simulation pairs, we mark $(q_1, B_2)$ as verified. If a simulation pair $(q_1, B_2')$ with $B_2 \subseteq B_2'$ is to be added as a witness for some pair, we can automatically conclude that $(q_1, B_2')$ will also be verified. We therefore mark it immediately as verified, and consider the witnesses of $(q_1, B_2)$ as its witnesses as well. Note that in this case, the obtained witnesses are not maximal. Alternatively, it is possible to simply use $(q_1, B_2)$ instead of $(q_1, B_2')$. Since this optimization damages the maximality of the witnesses, it is not used when maximal witnesses are desired (e.g., when looking for all possible unknown elimination results).

Example 7. Consider the DSAs depicted in Fig. 9(?) and (?). A simulation between these DSAs was presented in Example 6. We now present the simulation computed by the above algorithm, where maximal sets are used as the sets simulating a given state. H = {(0, {0}), (1, {1}), (, {1,..., 1, 1}), (, {}), (, {, 10, 1}), (, {7}), (6, {1}), (7, {11}), (8, {8}), (9, {1}), (10, {1,..., 0}), (1, {16}), (, {16,..., 0}), (, {18}), (, {18}), (, {0}), (6, {19}), (7, {0}), (8, {19})}. For example, the $x$-witness computed for (1, {1}) uses the set of all states reachable from state 1, even though the smaller set used in Example 6 suffices to simulate the target of the $x$-transition.

6 Completion using Unknown Elimination

Let $A_1$ be a DSA that is symbolically-included in $A_2$. This means that the concrete parts of $A_1$ exist in $A_2$ as well, and the partial parts of $A_1$ have some completion in $A_2$. Our goal is to be able to eliminate (some of) the unknowns in $A_1$ based on $A_2$. This amounts to finding a (possibly symbolic) assignment $\sigma$ to $A_1$ such that $\sigma(SL(A_1)) \subseteq SL(A_2)$ (whose existence is guaranteed by Theorem 2). We are interested in providing some finite representation of an assignment $\sigma$ derived from a simulation $H$. Namely, for each variable $x \in \mathit{Vars}$, we would like to represent in some finite way the assignments to $x$ in every possible context in $A_1$.
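Unknown elimination starts from a simulation $H$. The traversal of Section 5.2, which produces one with maximal witnesses, can be sketched in Python as follows. The encoding is hypothetical:

- A DSA is a triple `(initial, finals, delta)`, where `delta` maps `(state, label)` to a state.
- `VARS` lists the variable names.
- Pairs are `(q, frozenset)`.

As a simplification of our own, the sketch first builds every candidate pair reachable through maximal witnesses. It then discards failing pairs until a greatest fixpoint is reached. It omits the bookkeeping of removed supersets and the verified-pair optimization.

```python
from collections import deque

VARS = {"x", "y"}  # hypothetical variable names


def reachable(delta, sources):
    """States reachable from `sources`, including the sources themselves."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for (p, _label), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def compute_simulation(a1, a2):
    """Return a symbolic simulation built from maximal witnesses, or None."""
    (init1, fin1, d1), (init2, fin2, d2) = a1, a2
    root = (init1, frozenset({init2}))
    witnesses = {}  # pair -> one maximal witness per outgoing transition (None: no witness)
    seen, work = {root}, deque([root])
    while work:
        pair = work.popleft()
        q, B = pair
        witnesses[pair] = []
        for (p, label), q_next in d1.items():
            if p != q:
                continue
            if label in VARS:  # symbolic transition: everything reachable from B
                B_next = frozenset(reachable(d2, B))
            else:              # event: the label-successors of B
                B_next = frozenset(d2[(s, label)] for s in B if (s, label) in d2)
            if not B_next:
                witnesses[pair].append(None)
                continue
            w = (q_next, B_next)
            witnesses[pair].append(w)
            if w not in seen:
                seen.add(w)
                work.append(w)
    # Greatest fixpoint: drop pairs that violate finality or whose witness was dropped.
    alive = set(seen)
    changed = True
    while changed:
        changed = False
        for pair in list(alive):
            q, B = pair
            bad_final = q in fin1 and not (B & fin2)
            bad_witness = any(w is None or w not in alive for w in witnesses[pair])
            if bad_final or bad_witness:
                alive.discard(pair)
                changed = True
    return alive if root in alive else None
```

Take `a1` = `a x (f | g)` and `a2` = `a b f | a e g`. The witness for the x-transition is then the set of all states reachable from state 1, which mirrors the maximal sets of Example 7 rather than the smaller sets of Example 6.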
When the set of contexts in $A_1$ is finite, such a representation can be obtained for every symbolic word (context) separately, as described in the proof of Theorem 6. However, in this section we also wish to handle cases where the set of possible contexts in $A_1$ is infinite. We choose a unique witness for every simulation pair $(q_1, B_2)$ in $H$ and every transition $l \in \Sigma \cup \mathit{Vars}$ from $q_1$. Whenever we refer to an $l$-witness of $(q_1, B_2)$ in the rest of this section, we mean this chosen witness. The reason for making this choice will become clear later on.

Let $x \in \mathit{Vars}$ be a variable. To identify the possible completions of $x$, we identify all the symbolic transitions labeled by $x$ in $A_1$, and for each such transition we identify all the states of $A_2$ that participate in simulating its source and target states, $q_1$ and $q_1'$ respectively. The states simulating $q_1$ and $q_1'$ are given by the states in simulation pairs $(q_1, B_2) \in H$ and $(q_1', B_2') \in H$, respectively. The paths in $A_2$ between states in $B_2$ and $B_2'$ provide the completions (assignments) of $x$. The corresponding contexts are obtained by tracking the paths in $A_1$ that lead to (and from) the corresponding simulation pairs, where we make sure that the sets of contexts are pairwise disjoint. Formally, for all $q_1, q_1'$ with $\delta_1(q_1, x) = q_1'$, we do the following:

(a) For every simulation pair $(q_1, B_2) \in H$ we compute a set of incoming contexts, denoted $in(q_1, B_2)$ (see the computation of incoming contexts below). These contexts represent the incoming contexts of $q_1$ under which it is simulated by $B_2$. The sets $in(q_1, B_2)$ are computed such that the sets of different $B_2$ sets are pairwise-disjoint and form a partition of the set of incoming contexts of $q_1$ in $A_1$.

(b) For every $(q_1', B_2') \in H$ which is an $x$-witness of some $(q_1, B_2) \in H$, and for every $q_2' \in B_2'$, we compute a set of outgoing contexts, denoted $out(q_1', B_2', q_2')$ (see the computation of outgoing contexts below). These contexts represent the outgoing contexts of $q_1'$ under which it is simulated by the state $q_2'$ of $B_2'$. The sets $out(q_1', B_2', q_2')$ are computed such that the sets of different states $q_2' \in B_2'$ are pairwise-disjoint and form a partition of the set of outgoing contexts of $q_1'$ in $A_1$.

(c) For every pair of simulation pairs $(q_1, B_2), (q_1', B_2') \in H$ where $(q_1', B_2')$ is an $x$-witness of $(q_1, B_2)$, and for every pair of states $q_2 \in B_2$ and $q_2' \in B_2'$ such that $q_2$ contributes $q_2'$ to the witness (see the computation of outgoing contexts), we compute the set of words leading from $q_2$ to $q_2'$ in $A_2$. We denote this set by $lang(q_2, q_2')$. The contribution relation ensures that for every state $q_2' \in B_2'$ there is at most one state $q_2 \in B_2$ such that $lang(q_2, q_2') \neq \emptyset$.

(d) Finally, for every pair of simulation pairs $(q_1, B_2), (q_1', B_2') \in H$ where $(q_1', B_2')$ is an $x$-witness of $(q_1, B_2)$, and for every pair of states $q_2 \in B_2$ and $q_2' \in B_2'$, if $in(q_1, B_2) \neq \emptyset$, $out(q_1', B_2', q_2') \neq \emptyset$ and $lang(q_2, q_2') \neq \emptyset$, then we define $\sigma(in(q_1, B_2), x, out(q_1', B_2', q_2')) = lang(q_2, q_2')$.

For all other contexts, $\sigma$ is defined arbitrarily. Note that in step (d), the same set of incoming contexts, $in(q_1, B_2)$, is used for all the states $q_2' \in B_2'$. A separate set of outgoing contexts, $out(q_1', B_2', q_2')$, is used for every $q_2' \in B_2'$.
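Step (d) defines $\sigma$ on sets of contexts rather than on single words. The following Python sketch shows how such an assignment acts on a symbolic word. For illustration, it assumes that context sets and languages are finite and given explicitly as sets of token tuples. The names `lookup` and `apply_assignment` and the toy data are hypothetical. A variable occurrence at position `i` is replaced using the entry whose incoming set contains the prefix before `i` and whose outgoing set contains the suffix after `i`.

```python
from itertools import product

VARS = {"x", "y"}  # hypothetical variable names

# Each sigma entry mirrors step (d): (incoming contexts, variable, outgoing contexts, language).


def lookup(sigma, left, var, right):
    """Language assigned to `var` in the context (left, right), or None if unassigned."""
    for ins, v, outs, lang in sigma:
        if v == var and left in ins and right in outs:
            return lang
    return None


def apply_assignment(sigma, sw):
    """All words obtained by replacing each variable occurrence of `sw` according to sigma."""
    choices = []
    for i, tok in enumerate(sw):
        if tok in VARS:
            # Contexts are symbolic: the prefix and suffix are taken from sw itself.
            lang = lookup(sigma, sw[:i], tok, sw[i + 1:])
            if lang is None:
                raise KeyError(f"no assignment for context {sw[:i]} {tok} {sw[i + 1:]}")
            choices.append(sorted(lang))
        else:
            choices.append([(tok,)])
    # Pick one alternative per position and concatenate the token tuples.
    return {sum(parts, ()) for parts in product(*choices)}
```

In the usage below, both entries share the incoming set {a} and differ only in their outgoing sets. This is the situation in which the pairwise-disjointness of the out sets makes each variable occurrence unambiguous:

```python
sigma = [({("a",)}, "x", {("f",)}, {("b",)}), ({("a",)}, "x", {("g",)}, {("e",)})]
apply_assignment(sigma, ("a", "x", "g"))  # {("a", "e", "g")}
```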
As a consequence of the choices in step (d), assignments to $x$ that result from states in the same $B_2'$ do not differ in their incoming context, but they differ in their outgoing contexts. This is ensured by the property that the sets $out(q_1', B_2', q_2')$ of different states $q_2' \in B_2'$ are pairwise-disjoint. Assignments to $x$ that result from states in different $B_2$ sets differ in their incoming context, as ensured by the property that the sets $in(q_1, B_2)$ of different $B_2$ sets are pairwise-disjoint. Assignments to $x$ that result from different transitions labeled by $x$ also differ in their incoming contexts. This is ensured by the property that $A_1$ is deterministic, and hence the sets of incoming contexts of the states in $A_1$ are pairwise disjoint. Altogether, there is a unique combination of incoming and outgoing contexts for each assignment of $x$.

Computation of Incoming Contexts: To compute the set $in(q_1, B_2)$ of incoming contexts of $q_1$ under which it is simulated by $B_2$, we define the witness graph $G_W = (Q_W, \delta_W)$. This is a labeled graph whose states $Q_W$ are all simulation pairs, and whose transitions $\delta_W$ are given by the witness relation: $((q_1, B_2), l, (q_1', B_2')) \in \delta_W$ iff $(q_1', B_2')$ is an $l$-witness of $(q_1, B_2)$. To compute $in(q_1, B_2)$, we derive from $G_W$ a DSA, denoted $A_W(q_1, B_2)$, by setting the initial state to $(\iota_1, \{\iota_2\})$ and the final state to $(q_1, B_2)$. We then define $in(q_1, B_2)$ to be $SL(A_W(q_1, B_2))$, describing all the symbolic words leading from $(\iota_1, \{\iota_2\})$ to $(q_1, B_2)$ along the witness relation. These are the contexts in $A_1$ for which this witness is relevant. By our particular choice of witnesses for $H$, the witness graph is deterministic, and hence each incoming context in it leads to at most one simulation pair. Thus, the sets $in(q_1, B_2)$ partition the incoming contexts of $q_1$, making the incoming contexts $in(q_1, B_2)$ of different sets $B_2$ pairwise-disjoint.

Computation of Outgoing Contexts: To compute the set $out(q_1, B_2, q_2)$ of outgoing contexts of $q_1$ under which it is simulated by the state $q_2$ of $B_2$, we define a contribution relation based on the witness relation, and accordingly a contribution graph $G_C$. Namely, for $(q_1, B_2), (q_1', B_2') \in H$ such that $(q_1', B_2')$ is an $l$-witness of $(q_1, B_2)$, we say that $q_2 \in B_2$ contributes $q_2' \in B_2'$ to the witness if $q_2$ has a corresponding $l$-transition (if $l \in \Sigma$) or a corresponding path (if $l \in \mathit{Vars}$) to $q_2'$. If two states $q_2 \neq q_2''$ in $B_2$ contribute the same state $q_2' \in B_2'$ to the witness, then we keep only one of them in the contribution relation. The contribution graph is a labeled graph $G_C = (Q_C, \delta_C)$ whose states $Q_C$ are triples $(q_1, B_2, q_2)$ where $(q_1, B_2) \in H$ and $q_2 \in B_2$. In this graph, a transition $((q_1, B_2, q_2), l, (q_1', B_2', q_2')) \in \delta_C$ exists iff $(q_1', B_2')$ is an $l$-witness of $(q_1, B_2)$ and $q_2$ contributes $q_2'$ to the witness. Note that $G_C$ refines $G_W$ in the sense that its states are substates of $G_W$, and so are its transitions. However, unlike $G_W$, $G_C$ is nondeterministic, since multiple states $q_2 \in B_2$ can have outgoing $l$-transitions. To compute $out(q_1, B_2, q_2)$, we derive from $G_C$ a nondeterministic version of our symbolic automaton, denoted $A_C(q_1, B_2, q_2)$. Its initial state is $(q_1, B_2, q_2)$, and its final states are the triples $(q_1', B_2', q_2')$ where $q_1'$ is a final state of $A_1$ and $q_2'$ is a final state of $A_2$. Then $out(q_1, B_2, q_2) = SL(A_C(q_1, B_2, q_2))$. This is the set of outgoing contexts of $q_1$ in $A_1$ for which the state $q_2$ of the simulation pair $(q_1, B_2)$ is relevant. That is, $q_2$ is used to simulate some outgoing path of $q_1$ leading to a final state.
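The incoming and outgoing contexts are languages of possibly infinite automata derived from $G_W$ and $G_C$. The Python sketch below builds both graphs from a simulation. As a simplification of our own, it enumerates only the contexts up to a length bound instead of constructing $A_W$ and $A_C$ symbolically. A few assumptions apply:

- The encoding `(initial, finals, delta)`, `VARS`, and all function names are hypothetical.
- The unique witness is picked by sorting the candidates. This is one arbitrary way to make $G_W$ deterministic.
- Only the first contributor of each target state is kept, as the text requires.

```python
VARS = {"x", "y"}  # hypothetical variable names


def reachable(delta, sources):
    """States reachable from `sources`, including the sources themselves."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for (p, _label), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def successors(d2, q2, label):
    """States of A2 reached from q2 by an event, or by any path for a variable."""
    if label in VARS:
        return reachable(d2, {q2})
    return {d2[(q2, label)]} if (q2, label) in d2 else set()


def witness_graph(H, a1, a2):
    """G_W: one chosen l-witness per simulation pair and outgoing transition."""
    d1, d2 = a1[2], a2[2]
    edges = {}
    for q, B in H:
        for (p, label), q_next in d1.items():
            if p != q:
                continue
            allowed = set().union(*(successors(d2, s, label) for s in B))
            cands = sorted((w for w in H if w[0] == q_next and w[1] <= allowed), key=repr)
            if cands:
                edges.setdefault((q, B), []).append((label, cands[0]))
    return edges


def contribution_graph(gw, a2):
    """G_C over triples (q1, B2, q2); each target state keeps a single contributor."""
    d2 = a2[2]
    edges = {}
    for (q, B), outs in gw.items():
        for label, (q_next, B_next) in outs:
            claimed = set()
            for q2 in sorted(B, key=repr):
                for t in sorted(B_next & successors(d2, q2, label), key=repr):
                    if t not in claimed:
                        claimed.add(t)
                        edges.setdefault((q, B, q2), []).append((label, (q_next, B_next, t)))
    return edges


def words_upto(edges, start, finals, max_len):
    """Words of length <= max_len labelling a path from start to a final node."""
    result, frontier = set(), {(start, ())}
    for _ in range(max_len + 1):
        nxt = set()
        for node, word in frontier:
            if node in finals:
                result.add(word)
            for label, succ in edges.get(node, []):
                nxt.add((succ, word + (label,)))
        frontier = nxt
    return result
```

Take `a1` = `a x (f | g)`, `a2` = `a b f | a e g`, and the simulation with the pair (2, {2, 4}). Then `in(1, {1})` contains the single context `a`, while the two states of {2, 4} receive the disjoint outgoing contexts `f` and `g`.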
However, the sets $SL(A_C(q_1, B_2, q_2))$ of different $q_2 \in B_2$ are not necessarily disjoint. In order to ensure disjoint sets of outgoing contexts $out(q_1, B_2, q_2)$ for different states $q_2$ within the same $B_2$, we need to associate each context in the intersection of the outgoing contexts of several triples with one of them. Importantly, in order to ensure consistency in the outgoing contexts associated with different but related triples, we require the following consistency property: if $\delta_W((q_1, B_2), sw) = (q_1', B_2')$ then for every $q_2' \in B_2'$,

$$\{sw\} \cdot out(q_1', B_2', q_2') \subseteq \bigcup \{\, out(q_1, B_2, q_2) \mid q_2 \in B_2 \wedge (q_1', B_2', q_2') \in \delta_C((q_1, B_2, q_2), sw) \,\}.$$

This means that the outgoing contexts associated with some triple $(q_1', B_2', q_2')$ are a subset of the outgoing contexts of the triples that lead to it in $G_C$, truncated by the corresponding word that leads to $(q_1', B_2', q_2')$. Note that this property holds trivially if $out(q_1, B_2, q_2) = SL(A_C(q_1, B_2, q_2))$. This is the case if these sets are already pairwise-disjoint and no additional manipulation is needed. The following lemma ensures that correctness is guaranteed if the intersections of the out sets of different $q_2$ states in the same set $B_2$ are eliminated in a way that satisfies the consistency property. In many cases (including the case where $A_1$ contains no loops, and the case where no two symbolic transitions are reachable from each other) this can be achieved by simple heuristics. In addition, in many cases the simulation $H$ can be manipulated such that the sets $SL(A_C(q_1, B_2, q_2))$ themselves become pairwise disjoint.

Lemma 1. Suppose that:
- for every $(q_1, B_2, q_2) \in Q_C$, $out(q_1, B_2, q_2) \subseteq SL(A_C(q_1, B_2, q_2))$;
- for every $(q_1, B_2) \in Q_W$, $\bigcup_{q_2 \in B_2} out(q_1, B_2, q_2) = \bigcup_{q_2 \in B_2} SL(A_C(q_1, B_2, q_2))$;
- the consistency property holds.

Then the assignment $\sigma$ defined as above satisfies $\sigma(SL(A_1)) \subseteq SL(A_2)$.

Example 8. Consider the simulation $H$ from Example 6, computed for the DSAs from Fig. 9(?) and (?). Unknown elimination based on $H$ will yield the following assignment: σ(,, (f g)) =, σ(,, g) = eh e, σ(,, f) = h, σ(y,, ( )(f g)) =, σ(, y, ( )(f g)) = z. All other contexts are irrelevant and are assigned arbitrarily. The assignments to $x$ are based on the following:
- the symbolic transition labeled $x$ from state 1 in (?);
- the simulation pairs (1, {1}) and (1, {16});
- their respective $x$-witnesses, (, {, 6, 9}) and (, {17}).

Namely, consider the simulation pair (1, {1}) and its $x$-witness (, {, 6, 9}). Then $B_2 = \{1\}$ contributes the incoming context in(1, {1}), and each of the three states of the witness contributes its own outgoing contexts, namely (f g), g and f, respectively. In this example the out sets are pairwise-disjoint, thus no further manipulation is needed. Note that had we considered the simulation computed in Example 7, where the $x$-witness for (1, {1}) uses the set of all states reachable from state 1, we would still get the same assignment. This is because the out sets of all the other states in that set are empty. Similarly, (1, {16}) contributes its incoming context in(1, {16}), and the (only) state 17 ∈ {17} contributes the outgoing contexts out(, {17}, 17) = ( )(f g). The assignment to $y$ is based on the symbolic transition (9, y, 10), on the corresponding simulation pair (9, {1}), and on its $y$-witness (10, {1}).

7 Consolidation using Join and Minimization

Consolidation consists of two steps:
(1) union, which corresponds to join in the lattice over equivalence classes;
(2) choosing a most complete representative from an equivalence class, where "most complete" is captured by having a minimal set of completions.

Note that DSAs $A, A'$ in the same equivalence class do not necessarily have the same set of completions. Therefore, it is possible that $[\![A']\!] \subsetneq [\![A]\!]$ (as is the case in Example 5).
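The gap between equivalence and equality of completion sets can be reproduced on a toy instance. The Python sketch below is illustrative only, and rests on three assumptions:

- Real completion sets range over all nonempty languages. Here a small hypothetical family `CANDIDATES` of nonempty languages serves as the possible values of each variable occurrence.
- $A_1 \preceq A_2$ is read as "every completion of $A_2$ contains some completion of $A_1$". This is our reading, consistent with Corollary 1.
- The two DSAs are given directly by finite symbolic languages. They are hypothetical and are not the DSAs of Fig. 10.

```python
from itertools import product

VARS = {"x"}  # hypothetical variable name
# Hypothetical finite stand-in for the nonempty languages a variable may be assigned.
CANDIDATES = [frozenset({("b",)}), frozenset({("e",)}), frozenset({("b",), ("e",)})]


def completions(sl):
    """Completions of a finite symbolic language, one candidate per variable occurrence."""
    occurrences = [(sw, i) for sw in sorted(sl) for i, tok in enumerate(sw) if tok in VARS]
    result = set()
    for choice in product(CANDIDATES, repeat=len(occurrences)):
        assign = dict(zip(occurrences, choice))  # every occurrence is its own context
        lang = set()
        for sw in sl:
            parts = [assign[(sw, i)] if tok in VARS else {(tok,)} for i, tok in enumerate(sw)]
            lang |= {sum(p, ()) for p in product(*parts)}
        result.add(frozenset(lang))
    return result


def sym_included(c1, c2):
    """A1 ⪯ A2 on completion sets: each completion of A2 contains one of A1."""
    return all(any(l1 <= l2 for l1 in c1) for l2 in c2)
```

With `A = {a x c, a b c}` and `A' = {a b c}`, the two DSAs are equivalent in both directions. Yet `completions(A')` is a strict subset of `completions(A)`, which is the situation that the notion of a most complete representative addresses.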
A DSA $A$ is most complete in its equivalence class if there is no equivalent DSA $A'$ such that $[\![A']\!] \subsetneq [\![A]\!]$. Thus, $A$ is most complete if its set of completions is minimal. Let $A$ be a DSA for which we look for an equivalent DSA $A'$ that is most complete. If $A$ itself is not minimal, there exists $A'$ such that $A'$ is equivalent to $A$ but $[\![A']\!] \subsetneq [\![A]\!]$. Equivalence means two things:
(1) for every $L' \in [\![A']\!]$ there exists $L \in [\![A]\!]$ such that $L \subseteq L'$;
(2) conversely, for every $L \in [\![A]\!]$ there exists $L' \in [\![A']\!]$ such that $L' \subseteq L$.

Requirement (1) holds trivially since $[\![A']\!] \subseteq [\![A]\!]$. Requirement (2) holds trivially for $L \in [\![A']\!] \cap [\![A]\!]$. It is therefore satisfied iff for every $L \in [\![A]\!] \setminus [\![A']\!]$ (a completion that does not exist in the minimal DSA), there exists $L' \in [\![A']\!]$ such that $L' \subseteq L$. Namely, our goal is to find a DSA $A'$ such that $[\![A']\!] \subsetneq [\![A]\!]$ and for every $L \in [\![A]\!] \setminus [\![A']\!]$ there exists $L' \in [\![A']\!]$ such that $L' \subseteq L$. Clearly, if there is no $L' \in [\![A]\!]$ such that $L' \subsetneq L$, then the requirement will not be satisfied. This means that the only completions $L$ that can be removed from $[\![A]\!]$ are themselves non-minimal, i.e., are supersets of other completions in $[\![A]\!]$.

Note that it is in general impossible to remove all non-minimal languages from $[\![A]\!]$. As long as $SL(A)$ contains at least one symbolic word $sw \in (\Sigma \cup \mathit{Vars})^* \setminus \Sigma^*$, there are always comparable completions in $[\![A]\!]$. Namely, let assignments $\sigma$ and $\sigma'$ differ only on their assignment to some variable in $sw$ (with the corresponding context), where the language $\sigma$ assigns to it is a subset of the language $\sigma'$ assigns to it. Then $L = \sigma(SL(A)) = \sigma(SL(A) \setminus \{sw\}) \cup \sigma(sw) \subseteq \sigma'(SL(A) \setminus \{sw\}) \cup \sigma'(sw) = \sigma'(SL(A)) = L'$. Therefore $L \subseteq L'$, where both $L, L' \in [\![A]\!]$. On the other hand, not every DSA has an equivalent concrete DSA, i.e., one whose language contains no symbolic word. For example, consider a DSA $A'$ such that $SL(A') = \{x\}$, i.e. $[\![A']\!] = 2^{\Sigma^*} \setminus \{\emptyset\}$. Then no concrete DSA $A$, for which $[\![A]\!] = \{SL(A)\}$, is equivalent to $A'$:
- if $SL(A) \neq \emptyset$, there is $L \in [\![A']\!]$ with $SL(A) \not\subseteq L$, in which case $A \not\preceq A'$;
- if $SL(A) = \emptyset$, no $L \in [\![A']\!]$ satisfies $L \subseteq SL(A)$, in which case $A' \not\preceq A$.

Therefore, symbolic words are a possible source of non-minimality, but they cannot always be avoided.

Below we provide a condition which ensures that we remove from $[\![A]\!]$ only non-minimal completions. The intuition is that non-minimality of a completion can arise from a variable in $A$ whose context matches the context of some known behavior. In this case, the minimal completion will be obtained by assigning the matching known behavior to the variable, whereas other assignments will result in supersets of the minimal completion. In other words, to keep only the minimal completion, one needs to remove the variable in this particular context.

Example 9. This intuition is demonstrated by Example 5. There, the set of completions of the DSA from Fig. 10(a) contains non-minimal completions, due to a symbolic word that co-exists with one of its completions in the symbolic language of the DSA. Assigning to $x$ the word that turns this symbolic word into the co-existing concrete word yields a completion that is a strict subset of every completion assigning to $x$ a different language, making the latter non-minimal. The DSA from Fig. 10(b) omits the symbolic word. This keeps it equivalent to (a) while making its set of completions smaller (due to the removal of the non-minimal completions resulting from assignments that assign to $x$ a different language).

Definition 11. Let $A$ be a DSA.
An accepting path $\pi$ in $A$ is redundant if there exists another accepting path $\pi'$ in $A$ such that $\pi \preceq \pi'$. A symbolic word $sw \in SL(A)$ is redundant if its (unique) accepting path is redundant. This means that a symbolic word is redundant if it is less complete than another symbolic word in $SL(A)$. In particular, symbolic words where one can be obtained from the other via renaming are redundant. Such symbolic words are called equivalent, since their corresponding accepting paths $\pi$ and $\pi'$ are symbolically-equivalent. In Example 9, the path 0, 1, 6, 7 of the DSA in Fig. 10(a) is redundant due to another accepting path that also starts with states 0, 1. Accordingly, the symbolic word labeling this path is also redundant. An equivalent characterization of redundant paths is the following:

Definition 12. For a DSA $A$ and a path $\pi$ in $A$, we use $A \setminus \pi$ to denote a DSA such that $SL(A \setminus \pi) = SL(A) \setminus SL(\pi)$.

Lemma 2. Let $A$ be a DSA. An accepting path $\pi$ in $A$ is redundant iff $\pi \preceq A \setminus \pi$.
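For a DSA whose symbolic language is finite and given as a list of words, Definition 11 can be tested directly. A word is redundant if another word of the language is an instance of it, that is, if that word is obtained by replacing each variable occurrence with some word. The Python sketch below rests on a few assumptions:

- The symbolic language is finite, as described above.
- `VARS` is a hypothetical set of variable names.
- Each occurrence is replaced by a nonempty word, possibly containing variables. This only ever yields valid witnessing assignments.

Removing one redundant word at a time and re-checking follows the iterative semi-algorithm justified by Theorem 8. It is not the authors' heuristic.

```python
from typing import List, Sequence, Tuple

VARS = {"x", "y"}  # hypothetical variable names


def is_instance(sw: Sequence[str], other: Sequence[str]) -> bool:
    """True if `other` results from `sw` by replacing every variable occurrence
    with a nonempty word. Each occurrence has its own context, so the
    replacements are chosen independently."""
    if not sw:
        return not other
    head, rest = sw[0], sw[1:]
    if head in VARS:
        return any(is_instance(rest, other[k:]) for k in range(1, len(other) + 1))
    return bool(other) and other[0] == head and is_instance(rest, other[1:])


def is_redundant(sw: Tuple[str, ...], words: List[Tuple[str, ...]]) -> bool:
    """Definition 11 for a finite symbolic language: another word is at least as complete."""
    return any(o != sw and is_instance(sw, o) for o in words)


def minimize(words: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Remove redundant words one at a time, re-checking after every removal."""
    words = list(words)
    while True:
        victim = next((sw for sw in words if is_redundant(sw, words)), None)
        if victim is None:
            return words
        words.remove(victim)
```

For example, `minimize([("a", "x", "c"), ("a", "b", "c")])` drops the symbolic word that coexists with one of its completions, which is the pattern described in Example 9. Of two words that are equal up to renaming, exactly one survives, because redundancy is re-checked after each removal.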

Fig. 11. Inputs (a) and (b), union (u) and minimized DSA (m).

Theorem 8. If $\pi$ is a redundant path, then $(A \setminus \pi) \simeq A$ and $[\![A \setminus \pi]\!] \subseteq [\![A]\!]$, i.e. $A \setminus \pi$ is at least as complete as $A$.

Theorem 8 leads to a natural semi-algorithm for minimization: iteratively identify and remove redundant paths. Several heuristics can be employed to identify such redundant paths. In fact, when considering minimization of $A$ into some $A'$ such that $SL(A') \subseteq SL(A)$, it turns out that a DSA without redundant paths cannot be minimized further:

Theorem 9. If $A \simeq (A \setminus \pi)$ for some accepting path $\pi$ in $A$, then $\pi$ is redundant in $A$.

The theorem implies that for a DSA $A$ without redundant paths there exists no DSA $A'$ such that $SL(A') \subsetneq SL(A)$ and $A' \simeq A$. Thus, $A$ cannot be minimized further by removal of paths (or words). Fig. 11 provides an example of consolidation via union (which corresponds to join in the lattice), followed by minimization.

8 Putting It All Together

Now that we have completed the description of symbolic automata, we describe how they can be used in static analysis for specification mining. We return to the example in Section 2, and emulate an analysis using the new abstract domain. This analysis would combine a set of program snippets into a typestate for a given API or class, which can then be used for verification or for answering queries about API usage.

Firstly, the DSAs in Fig. 1 and Fig. ? would be mined from user code using the analysis defined by Mishne et al. [7]. In this process, code that may modify the object but is not available to the analysis becomes a variable transition. Secondly, we generate a typestate specification from these individual DSAs. As shown in Section 2, this is done using the join operation, which consolidates the DSAs and generates the one in Fig. ?(?). This new typestate specification is now stored in our specification database. We may be uncertain that all the examples used to create the typestate are correct. In that case, we can add weights to DSA transitions and later prune low-weight paths, as suggested by Mishne et al. Finally, a user can query against the specification database, asking for the correct sequence of operations between open and close. This translates to querying the symbolic word $\mathit{open}\ x\ \mathit{close}$. Unknown elimination will find an assignment such that $\sigma(x) = \mathit{canRead}\ \mathit{read}$, as well as the second possible assignment, $\sigma(x) = \mathit{write}$.

The precision/partialness ordering of the lattice captures the essence of query matching. A query will always have a relationship with its results, in two ways. First, the query will always be more partial than its result, allowing the result to contain the query's assignments. Second, the query will always be more precise than its result, which means that a DSA describing a great number of behaviors can contain the completions for a very narrow query.

Acknowledgements

The research was partially supported by The Israeli Science Foundation (grant no. 96/10). Yang was partially supported by EPSRC. Peleg was partially supported by EU's FP7 Program / ERC agreement no. [117-VSSC].

References

1. ACHARYA, M., XIE, T., PEI, J., AND XU, J. Mining API patterns as partial orders from source code: from usage scenarios to specifications. In ESEC-FSE '07.
2. ALUR, R., CERNY, P., MADHUSUDAN, P., AND NAM, W. Synthesis of interface specifications for Java classes. In POPL (2005).
3. BECKMAN, N., KIM, D., AND ALDRICH, J. An empirical study of object protocols in the wild. In ECOOP '11.
4. GANESH, V., MINNES, M., SOLAR-LEZAMA, A., AND RINARD, M. Word equations with length constraints: what's decidable? In Haifa Verification Conference (2012).
5. GRUSKA, N., WASYLKOWSKI, A., AND ZELLER, A. Learning from 6,000 projects: Lightweight cross-project anomaly detection. In ISSTA '10.
6. MANDELIN, D., XU, L., BODIK, R., AND KIMELMAN, D. Jungloid mining: helping to navigate the API jungle. In PLDI '05, pp. 48-61.
7. MISHNE, A., SHOHAM, S., AND YAHAV, E. Typestate-based semantic code search over partial programs. In OOPSLA '12: Proceedings of the 27th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (2012).
8. MONPERRUS, M., BRUCH, M., AND MEZINI, M. Detecting missing method calls in object-oriented software. In ECOOP '10 (2010), vol. 6183 of LNCS.
9. PLANDOWSKI, W. An efficient algorithm for solving word equations. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing (2006), STOC '06.
10. SHOHAM, S., YAHAV, E., FINK, S., AND PISTOIA, M. Static specification mining using automata-based abstractions. In ISSTA '07.
11. STROM, R. E., AND YEMINI, S. Typestate: A programming language concept for enhancing software reliability. IEEE Trans. Software Eng. 12, 1 (1986), 157-171.
12. WASYLKOWSKI, A., AND ZELLER, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18.
13. WASYLKOWSKI, A., ZELLER, A., AND LINDIG, C. Detecting object usage anomalies. In FSE '07.
14. WEIMER, W., AND NECULA, G. Mining temporal specifications for error detection. In TACAS (2005).
15. WHALEY, J., MARTIN, M. C., AND LAM, M. S. Automatic extraction of object-oriented component interfaces. In ISSTA '02.