Finding all minimum-size DFA consistent with given examples: SAT-based approach


Ilya Zakirzyanov (1,2), Anatoly Shalyto (1), and Vladimir Ulyantsev (1)
(1) ITMO University, Saint Petersburg, Russia
(2) JetBrains Research, Saint Petersburg, Russia
zakirzyanov@rain.ifmo.ru, shalyto@mail.ifmo.ru, ulyantsev@rain.ifmo.ru

Abstract. The deterministic finite automaton (DFA) is a fundamental concept in the theory of computation. The NP-hard DFA identification problem can be efficiently solved by translation to the Boolean satisfiability problem (SAT). Previously we developed a technique to reduce the problem search space by enforcing DFA states to be enumerated in breadth-first search (BFS) order. We proposed symmetry breaking predicates, which can be added to Boolean formulae representing various automata identification problems. In this paper we continue the study of SAT-based approaches. First, we propose new predicates based on depth-first search order. Second, we present three methods to identify all non-isomorphic automata of the minimum size instead of just one, a #P-complete problem which has not been solved before. Third, we revisited our implementation of the BFS-based approach and conducted new evaluation experiments. It turns out that the BFS-based approach outperforms all other exact algorithms for DFA identification and can be effectively applied for finding all solutions of the problem.

Keywords: Grammatical inference, automata identification, symmetry breaking, Boolean satisfiability

1 Introduction

A variety of models exists in automata theory, but the deterministic finite automaton (DFA) is the basic one and among the most important ones. A DFA is a model that recognizes regular languages [1]. The essence of the DFA identification (induction, learning, synthesis) problem is to find a minimum-size DFA (a DFA with the minimum number of states) that is consistent with a given set of labeled examples: positive-labeled strings that must be accepted by the built DFA and negative-labeled strings that must be rejected. A smaller DFA is simpler and, because of the well-known Occam's razor principle, it is a model which better explains the observed examples.
Thus the DFA learning problem is to find the regular language that most likely was used to generate a set of labeled examples. This problem is among the best-explored ones in grammatical inference []. This problem was shown to be NP-hard in []. Nevertheless, several efficient DFA learning approaches were developed, see, e.g., [].

DFA identification using evolutionary computation methods is historically one of the first effective approaches, see, e.g., [4, 5]. Subsequent research resulted in the development of a method for evolving DFA using a multi-start random hill climber, see, e.g., [6]. Later approaches are based on heuristic algorithms. The evidence driven state-merging algorithm (EDSM) is the most commonly used and the only one which can handle large-sized target DFA [7]. This algorithm is greedy and works in polynomial time. Despite its efficiency in terms of solving time this approach usually finds only a local optimum but not a global one. The performance of EDSM was improved several times by using specialized search procedures, see, e.g., [8, 9]. In [6] Lucas and Reynolds compared the EDSM algorithm and the evolutionary algorithm (EA) mentioned above. They found that the EDSM-based approach outperforms the EA in terms of solving time on almost all instances.

The methods mentioned above are not exact: they cannot guarantee that the found DFA is one of the minimum-sized ones. Heule and Verwer proposed a so-called translation-to-SAT approach which can be applied to DFA identification [10]. This approach, as its name suggests, is based on translating the original problem to the well-studied Boolean satisfiability problem (SAT). The performance of SAT solvers has significantly improved over the last decade. This computational strength can be used in other problems by translating these problems into SAT instances, and subsequently running a modern SAT solver on them. This approach was shown to be very competitive for some problems, see, e.g., [11-14]. The authors have shown that translation-to-SAT is effective for solving DFA identification as well. The SAT-based method is exact as opposed to the EA and EDSM algorithms, which is important because of the mentioned Occam's razor principle. The authors also proposed a combined approach, which used a few EDSM steps as a preprocessing step, and won the first prize at the StaMInA competition [15]. We do not consider this step in our paper because EDSM is not an exact algorithm.

There are symmetries in many combinatorial problems. Symmetry breaking predicates can be added as constraints to a SAT formula with the purpose of eliminating some or all symmetries and thus reducing the search space, see, e.g., [16]. When we talk about DFA the most obvious symmetries are groups of isomorphic automata. Heule and Verwer in [10] proposed a simple but effective greedy maximal clique (max-clique) algorithm. It allows reducing the number of isomorphic automata in each group from $n!$ to $(n-k)!$, where $n$ is the size of the DFA and $k$ is the size of the found clique. In [17] we proposed symmetry breaking predicates which enforce DFA states to be enumerated in the breadth-first search (BFS) order. These predicates can be added to a Boolean formula before passing it to a SAT solver. This approach reduces the number of isomorphic automata in each group from $n!$ to only one representative, the BFS-enumerated one. The results for the exact case still were not very good in our previous paper. However, the BFS-based approach is more flexible than max-clique: we demonstrated its flexibility by developing a modification of the noiseless translation-to-SAT technique for the noisy case (some examples are wrong-labeled).

In this paper we propose new symmetry breaking predicates based on depth-first search (DFS) order. This is a modification of our previous BFS-based approach. BFS-based predicates were not good enough to compete with the max-clique algorithm in DFA identification in our previous paper. Therefore we revisited our implementation of this technique. It turns out that both BFS-based and DFS-based approaches clearly outperform the current state-of-the-art DFASAT from [10]. We also propose a method based on these techniques for solving the problem of finding all automata (find-all) with the minimum number of states which are consistent with a given set of examples. This problem has not been solved efficiently before. Moreover, none of the existing approaches for DFA learning are applicable, even with slight modifications, to the find-all problem due to their nature. We use two ways of launching SAT solvers: relaunching a non-incremental solver and using an incremental solver. We also developed a heuristic backtracking method (almost similar to the one presented in the paper [18]) as a baseline for comparing it with SAT-based ones.

2 Preliminaries and Previous Work

2.1 Encoding DFA Identification into SAT

We assume the reader to be familiar with the theory of languages and automata. The purpose of the DFA identification problem is to find the minimum DFA which is consistent with two given sets of strings: a set of positive examples ($S_+$) and a set of negative examples ($S_-$). In other words, the desired DFA must accept all strings from $S_+$ and reject all strings from $S_-$. In this paper it is assumed that DFA states are numbered from 1 to $C$ and the start state has number 1. The example of the minimum DFA for $S_+$ = {,, } and $S_-$ = {, } is shown in Fig. 1a.

We briefly describe the current state-of-the-art approach for solving the considered problem. The first step of the technique proposed by Heule and Verwer in [10] is to build an augmented prefix tree acceptor (APTA) from the given sets $S_+$ and $S_-$. An APTA is a tree-shaped automaton based on a prefix tree for the sets $S_+$ and $S_-$ but with labeled states.
It is called augmented because it may contain states which are neither accepting nor rejecting. The APTA for $S_+$ and $S_-$ mentioned above is shown in Fig. 1b.

The second step is to construct the consistency graph (CG) for the built APTA. The set of the CG vertices is the same as the APTA vertices set. Two vertices in the CG are adjacent if their merging in the APTA and the subsequent determinization process will cause an inconsistency: a situation when an accepting state is merged with a rejecting one. The CG for the APTA from Fig. 1b is shown in Fig. 1c.

The third step of the method is to divide the CG vertices set into $C$ disjoint sets. Each set has to contain all vertices equivalent to the corresponding APTA states which will be merged into one state in the resulting DFA. If such a separation can be made, then an automaton with $C$ states consistent with the given sets of strings exists and it can be easily built.

Fig. 1. An example of an APTA and its consistency graph: (a) an example of a DFA; (b) an example of an APTA for $S_+$ and $S_-$; (c) the consistency graph for the APTA from Fig. 1b.

$C$ can be iterated from 1 until such a partition is found. Thus it is guaranteed that the found $C$-sized DFA is the minimum DFA consistent with the given behavior examples. This can be viewed as a graph coloring problem: we need to color the CG vertices with the minimum number of colors in such a way that adjacent vertices have different colors.

The next step in the considered algorithm is to translate the graph coloring problem into SAT. The authors proposed a so-called compact encoding where they use three kinds of Boolean variables to formulate all constraints in CNF:
- color variables $x_{v,i}$ which indicate whether the vertex $v$ in the CG is $i$-colored;
- parent relation variables $y_{a,i,j}$ which indicate whether there is an $a$-labeled transition from the $i$-colored state to the $j$-colored state in the target DFA;
- accepting color variables $z_i$ which indicate whether the $i$-colored state in the target DFA is accepting.

There are four mandatory and four redundant types of clauses in the proposed compact encoding. The reader can read about them in detail in [10]. The final step of the translation-to-SAT approach is to run an external SAT solver with the built CNF formula. If the formula is satisfiable, then the target DFA can be easily constructed from the found satisfying assignment. Otherwise, the number of colors $C$ is increased.

2.2 Symmetry Breaking Predicates

Large clique predicates. Heule and Verwer used symmetry breaking predicates in their algorithm [10]. In the case when the CG cannot be colored with $C$ colors the SAT solver tries to solve the same problem $C!$ times, one time for each permutation of colors. In other words the solver considers $C!$ isomorphic automata. The authors suggested finding some large clique in the CG and fixing the colors of its vertices. It helps to reduce the number of unnecessary considerations because in any valid graph coloring all vertices in a clique obviously have different colors. Thus, assuming that the size of the found clique is $k$, the solver considers only $(C-k)!$ isomorphic automata. Moreover, the process of iterating over $C$ can be started from $k$ instead of 1.
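The first steps of Section 2.1 and the clique idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [10]: the function names (`build_apta`, `merge_is_inconsistent`, `consistency_edges`, `greedy_clique`), the 0-based state numbering, the sample strings and the particular greedy rule (always take the candidate of highest degree) are all hypothetical choices made for this sketch.

```python
from itertools import combinations


def build_apta(positive, negative):
    """Build the APTA. children[v] maps a symbol to a child state;
    label[v] is True (accepting), False (rejecting) or None (unlabeled)."""
    children, label = [{}], [None]  # state 0 is the root (empty prefix)
    for words, mark in ((positive, True), (negative, False)):
        for word in words:
            v = 0
            for symbol in word:
                if symbol not in children[v]:
                    children[v][symbol] = len(children)
                    children.append({})
                    label.append(None)
                v = children[v][symbol]
            if label[v] is not None and label[v] != mark:
                raise ValueError(f"{word!r} is both positive and negative")
            label[v] = mark
    return children, label


def merge_is_inconsistent(children, label, u, v):
    """Merge u and v, keep folding until the result is deterministic, and
    report whether an accepting state ever meets a rejecting one."""
    parent = list(range(len(children)))  # union-find forest
    trans = [dict(c) for c in children]  # outgoing edges of each class root
    mark = list(label)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    stack = [(u, v)]
    while stack:
        a, b = stack.pop()
        a, b = find(a), find(b)
        if a == b:
            continue
        if mark[a] is not None and mark[b] is not None and mark[a] != mark[b]:
            return True
        parent[b] = a
        if mark[a] is None:
            mark[a] = mark[b]
        for symbol, target in trans[b].items():
            if symbol in trans[a]:
                # two edges with the same symbol: their targets must merge too
                stack.append((trans[a][symbol], target))
            else:
                trans[a][symbol] = target
    return False


def consistency_edges(children, label):
    """Edges of the consistency graph: pairs of states that cannot be merged."""
    n = len(children)
    return [(u, v) for u, v in combinations(range(n), 2)
            if merge_is_inconsistent(children, label, u, v)]


def greedy_clique(n, edges):
    """Grow a clique by repeatedly taking the candidate of highest degree."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    clique, candidates = [], set(range(n))
    while candidates:
        best = max(candidates, key=lambda x: (len(adj[x]), -x))
        clique.append(best)
        candidates &= adj[best]  # keep only vertices adjacent to the whole clique
    return clique


# Hypothetical sample, not the one from Fig. 1.
children, label = build_apta(["a", "bb"], ["b", "ab"])
edges = consistency_edges(children, label)
print(len(children), edges, greedy_clique(len(children), edges))
```

The size of the returned clique is a lower bound on the number of colors, which is why the iteration over $C$ may start from it.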

BFS-based predicates. We proposed a new approach to symmetry breaking in our previous research [17]. Its main idea is to enforce DFA states to be enumerated in the breadth-first search (BFS) order. If some order (say, lexicographical) on the transition symbols is fixed then only one representative of each equivalence class with respect to the isomorphic relation is BFS-enumerated due to the uniqueness of such a BFS traversal. We call a DFA BFS-enumerated if its enumeration corresponds to the order of states processing during the BFS traversal. In other words, if we consider the BFS tree built for some DFA and arrange the children of each state from left to right according to the chosen order on the transition symbols, then the numbers of states should increase from left to right on the same depth (layer-order) and from top to bottom (depth-order). In [17] we used a definition based on the BFS queue which is equivalent to the one described above but less apprehensible. An example of a BFS-enumerated DFA is shown in Fig. 2a, and its BFS tree is shown in Fig. 2b.

Fig. 2. A BFS-enumerated DFA and its BFS tree: (a) an example of a BFS-enumerated DFA; (b) a BFS tree of the DFA from Fig. 2a.

If such predicates are used, then while the SAT solver searches for a DFA consistent with the given samples, it is restricted to BFS-enumerated ones only. To implement this we proposed three additional kinds of Boolean variables:
1. parent variables $p_{j,i}$ which are true if and only if state $i$ is the parent of state $j$ in the BFS tree;
2. transition variables $t_{i,j}$ which are true if and only if there is a transition from state $i$ to state $j$;
3. minimum symbol variables $m_{l,i,j}$ which are true if and only if there is an $l$-labeled transition from state $i$ to state $j$ and there are no such transitions labeled with a smaller symbol (according to the chosen order on symbols). These variables are used only in the case of a non-binary alphabet.

BFS-enumeration is enforced with the following seven clauses:

1. $\bigwedge_{1 \le i < j \le C} (t_{i,j} \leftrightarrow y_{l_1,i,j} \vee \dots \vee y_{l_L,i,j})$: definition of transition variables using variables $y_{l,i,j}$;
2. $\bigwedge_{1 \le i < j \le C} (p_{j,i} \leftrightarrow t_{i,j} \wedge \neg t_{i-1,j} \wedge \dots \wedge \neg t_{1,j})$: definition of parent variables using variables $t_{i,j}$;
3. $\bigwedge_{2 \le j \le C} (p_{j,1} \vee p_{j,2} \vee \dots \vee p_{j,j-1})$: each state except the start one has a parent with a smaller number (depth-order);
4. $\bigwedge_{1 \le k < i < j < C} (p_{j,i} \to \neg p_{j+1,k})$: the ordering of children must be the same as the ordering of parents (layer-order for children of different parents);
5. $\bigwedge_{1 \le i < j < C} (p_{j,i} \wedge p_{j+1,i} \to y_{a,i,j})$: in the case of a binary alphabet this constraint is sufficient to order two children $j$ and $j+1$ of state $i$ (layer-order for children of one parent);
6. $\bigwedge_{1 \le i < j \le C} \bigwedge_{1 \le n \le L} (m_{l_n,i,j} \leftrightarrow y_{l_n,i,j} \wedge \neg y_{l_{n-1},i,j} \wedge \dots \wedge \neg y_{l_1,i,j})$: definition of minimum symbol variables using variables $y_{l,i,j}$, which are used in the case of a non-binary alphabet;
7. $\bigwedge_{1 \le i < j < C} \bigwedge_{1 \le k < n \le L} (p_{j,i} \wedge p_{j+1,i} \wedge m_{l_n,i,j} \to \neg m_{l_k,i,j+1})$: in the case of a non-binary alphabet this constraint forces the children of a state to be ordered according to the chosen order on symbols (layer-order for children of one parent).

Using the variables and clauses described above one can force an automaton to be BFS-enumerated. Unfortunately the implementation of these methods was not perfect when we prepared our previous paper [17], so the results did not show the real improvement caused by the proposed method. We revisited it and performed new evaluation experiments. The results are shown in Section 5 and they show a clear improvement.

3 DFS-based Symmetry Breaking Predicates

In this section we propose a new way to fix the enumeration of automata states to avoid consideration of isomorphic automata during SAT solving. This approach is a modification of our BFS-based predicates. It enforces automata states to be enumerated in the depth-first search (DFS) order. We describe the method briefly, paying attention only to the differences between the DFS- and BFS-based approaches. Detailed information about BFS-based predicates can be found in Section 2.2.

During DFS processing it is necessary to find all adjacent unvisited states for each unvisited state of the DFA. Firstly, the DFS algorithm handles the initial DFA state. Then the algorithm processes the children of this state and recursively executes for each of them.
We process child states in some particular (e.g., alphabetical) order of symbols $l$ on transitions $i \xrightarrow{l} j$. Thus again only one representative of each equivalence class with respect to the isomorphic relation will be processed. We call a DFA DFS-enumerated if its states are numbered in the order of handling them by the DFS traversal with the chosen symbol order. Although no traversal is actually performed during SAT solving, we refer to it for the definition and explanation. The set of developed constraints enforces the DFS enumeration. An example of a DFS-enumerated DFA is shown in Fig. 3a. A DFS tree for this DFA is shown in Fig. 3b.

Fig. 3. A DFS-enumerated DFA and its DFS tree: (a) an example of a DFS-enumerated DFA; (b) a DFS tree of the DFA from Fig. 3a.

All variables which were used for the BFS enumeration are also used for the DFS enumeration, but some of the constraints must be changed. In the DFS enumeration the $p_{j,i}$ variables ($p_{j,i}$ is true if and only if state $i$ is the parent of $j$ in the DFS tree) are defined differently. Due to the greediness of the DFS algorithm, state $i$ is the parent of state $j$ if it has the maximum number among the states numbered below $j$ that have a transition to $j$:

$$\bigwedge_{1 \le i < j \le C} (p_{j,i} \leftrightarrow t_{i,j} \wedge \neg t_{i+1,j} \wedge \dots \wedge \neg t_{j-1,j}),$$

where $t_{i,j} \equiv 1$ if and only if there is a transition between $i$ and $j$ (these variables in their turn are defined by using $y_{l,i,j}$ variables). Moreover, in the DFS enumeration instead of the children ordering constraint we use the following one. If $i$ is the parent of state $j$ and $k$ is a state between $i$ and $j$ ($i < k < j$) then there is no transition from state $k$ to state $q$, where $q$ is bigger than $j$:

$$\bigwedge_{1 \le i < k < j < q \le C} (p_{j,i} \to \neg t_{k,q}).$$

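Indeed, since $i < k < j$, state $k$ has to be considered by the DFS algorithm before state $j$. Hence if such a transition existed then state $q$ would have to have a lower number than state $j$.

Now, to enforce the DFA to be DFS-enumerated we have to order children according to symbols on transitions (e.g., alphabetically). We consider two cases: the alphabet $\Sigma$ consists of two symbols $\{a, b\}$ or of more than two symbols $\{l_1, \dots, l_L\}$. In the case of two symbols state $i$ can have only two transitions: to state $j$ and to state $k$ (where without loss of generality $j < k$). If the transition from state $i$ to state $j$ is used during the DFS traversal then it must be labeled with the smaller symbol:

$$\bigwedge_{1 \le i < j < k \le C} (p_{j,i} \wedge t_{i,k} \to y_{a,i,j}),$$

because otherwise state $k$ would have been processed earlier.

In the second case we have to use $m_{l,i,j}$ variables: $m_{l,i,j}$ is true if and only if there is an $l$-labeled transition from state $i$ to state $j$ and there is no transition from state $i$ to state $j$ with an alphabetically smaller symbol. The idea is similar to the previous case. For state $i$ it remains to arrange its children in the chosen order. For any two transitions from state $i$ to state $j$ and from state $i$ to state $k$ (where without loss of generality $j < k$), if the transition to state $j$ is used during the DFS traversal then it must be labeled with a smaller symbol:

$$\bigwedge_{1 \le i < j < k \le C} \; \bigwedge_{1 \le m < n \le L} (p_{j,i} \wedge t_{i,k} \wedge m_{l_n,i,j} \to \neg m_{l_m,i,k}).$$

Thus we proposed a new set of constraints which enforce a DFA to be DFS-enumerated. The predicates (for the case of three or more symbols) are translated into $O(C^4 + C^3 L^2)$ CNF clauses (where $C$ is the number of colors and $L$ is the alphabet size), which are listed in Table 1 together with the BFS-based predicates, which are translated into $O(C^3 + C^2 L^2)$ clauses.

Table 1. DFS-based and BFS-based symmetry breaking clauses

Group | Clauses | Range
Both | $t_{i,j} \to (y_{l_1,i,j} \vee \dots \vee y_{l_L,i,j})$ | $1 \le i < j \le C$
Both | $y_{l,i,j} \to t_{i,j}$ | $1 \le i < j \le C$; $l \in \Sigma$
Both | $p_{j,i} \to t_{i,j}$ | $1 \le i < j \le C$
Both | $p_{j,1} \vee p_{j,2} \vee \dots \vee p_{j,j-1}$ | $2 \le j \le C$
Both | $m_{l,i,j} \to y_{l,i,j}$ | $1 \le i < j \le C$; $l \in \Sigma$
Both | $m_{l_n,i,j} \to \neg y_{l_k,i,j}$ | $1 \le i < j \le C$; $1 \le k < n \le L$
Both | $(y_{l_n,i,j} \wedge \neg y_{l_{n-1},i,j} \wedge \dots \wedge \neg y_{l_1,i,j}) \to m_{l_n,i,j}$ | $1 \le i < j \le C$; $1 \le n \le L$
DFS | $p_{j,i} \to \neg t_{k,j}$ | $1 \le i < k < j \le C$
DFS | $(t_{i,j} \wedge \neg t_{i+1,j} \wedge \dots \wedge \neg t_{j-1,j}) \to p_{j,i}$ | $1 \le i < j \le C$
DFS | $p_{j,i} \to \neg t_{k,q}$ | $1 \le i < k < j < q \le C$
DFS | $(p_{j,i} \wedge p_{k,i} \wedge m_{l_n,i,j}) \to \neg m_{l_m,i,k}$ | $1 \le i < j < k \le C$; $1 \le m < n \le L$
BFS | $p_{j,i} \to \neg t_{k,j}$ | $1 \le k < i < j \le C$
BFS | $(t_{i,j} \wedge \neg t_{i-1,j} \wedge \dots \wedge \neg t_{1,j}) \to p_{j,i}$ | $1 \le i < j \le C$
BFS | $p_{j,i} \to \neg p_{j+1,k}$ | $1 \le k < i < j < C$
BFS | $(p_{j,i} \wedge p_{j+1,i} \wedge m_{l_n,i,j}) \to \neg m_{l_m,i,j+1}$ | $1 \le i < j < C$; $1 \le m < n \le L$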
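The effect of both enumerations can be illustrated outside of SAT with explicit traversals. The sketch below is an assumption-laden illustration rather than part of the encoding: it takes a complete DFA in which every state is reachable, numbers states from 0 (the paper numbers them from 1), and the names `bfs_order`, `dfs_order` and `renumber` are hypothetical. It shows that two isomorphic automata collapse to the same BFS-enumerated (or DFS-enumerated) representative.

```python
from collections import deque


def bfs_order(delta, alphabet, start=0):
    """States in the order a BFS visits them, trying symbols in a fixed order."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        state = queue.popleft()
        order.append(state)
        for symbol in alphabet:
            nxt = delta[state][symbol]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_order(delta, alphabet, start=0):
    """States in the order a recursive DFS first handles them (preorder)."""
    order, seen = [], set()

    def visit(state):
        seen.add(state)
        order.append(state)
        for symbol in alphabet:
            nxt = delta[state][symbol]
            if nxt not in seen:
                visit(nxt)

    visit(start)
    return order


def renumber(delta, accepting, order):
    """Rename states so that the i-th visited state gets number i."""
    new_id = {old: new for new, old in enumerate(order)}
    new_delta = [None] * len(order)
    for old in order:
        new_delta[new_id[old]] = {s: new_id[t] for s, t in delta[old].items()}
    return new_delta, sorted(new_id[q] for q in accepting)


# Two isomorphic DFA over {a, b}: states 1 and 2 are swapped.
dfa_a = [{"a": 2, "b": 1}, {"a": 0, "b": 1}, {"a": 2, "b": 0}]
dfa_b = [{"a": 1, "b": 2}, {"a": 1, "b": 0}, {"a": 0, "b": 2}]
print(renumber(dfa_a, {1}, bfs_order(dfa_a, "ab")) ==
      renumber(dfa_b, {2}, bfs_order(dfa_b, "ab")))

# A DFA on which the two orders differ.
dfa_c = [{"a": 1, "b": 2}, {"a": 3, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}]
print(bfs_order(dfa_c, "ab"), dfs_order(dfa_c, "ab"))
```

The predicates of Table 1 play the role of these traversals inside the formula: instead of renumbering a found automaton, they only admit assignments that are already numbered this way.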

4 The find-all problem

In this section we consider the problem of finding all non-isomorphic DFA (the find-all problem) with the minimum number of states which are consistent with a given set of strings. We propose a way to modify the SAT-based method of solving the regular DFA identification problem in order to apply it to the find-all problem. We consider two ways of using SAT solvers: restarting a non-incremental solver after finding each automaton, and using an incremental solver, which retains its state after finding a solution and is ready to accept new clauses. The most common interface and technique for incremental SAT solving was proposed in [19]. We also propose a heuristic backtracking method as a baseline for comparing it with SAT-based ones.

4.1 SAT-based methods

The main idea of SAT-based methods of solving the find-all problem is to ban satisfying interpretations (variable values) which have already been found. It is obvious that if the proposed symmetry breaking predicates are not used then this approach finds many isomorphic automata: exactly $C!$ for each equivalence class, where $C$ is the DFA size. Since max-clique predicates fix $k$ colors only (where $k$ is the clique size), the algorithm of Heule and Verwer finds $(C-k)!$ isomorphic automata, which is still bad. The BFS-based and DFS-based symmetry breaking predicates allow us to ban all isomorphic DFA from one equivalence class by banning the accordingly enumerated representative. It must be noted that although the idea to discard satisfying interpretations is classic for such methods, it cannot be used in practice without effective symmetry breaking techniques. There were no known techniques to deal with a factorial number of isomorphic automata earlier, and thus the considered problem could not be solved effectively. The proposed symmetry breaking predicates change the situation and bring the solution.

It is easy to implement this by adding a blocking clause into the Boolean formula. Since we know that the $y_{l,i,j}$ variables define the structure of the target DFA entirely, it is enough to forbid only the values of these variables from the found interpretation: $\neg y_1 \vee \neg y_2 \vee \dots \vee \neg y_{n|\Sigma|}$, where each $y_k$ is a variable $y_{l,i,j}$ that is true in the found interpretation, for $1 \le k \le n|\Sigma|$.
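The blocking-clause loop can be sketched as follows. This is a toy illustration under explicit assumptions: a brute-force `solve` function stands in for a real SAT solver, clauses are lists of signed integers (DIMACS-style literals), and the names `solve`, `find_all` and `structure_vars` are hypothetical. The loop re-solves the growing formula from scratch after each solution, which mimics the restart strategy; an incremental solver would keep its internal state between calls instead. The blocking clause here negates every structure variable's value, which is the generic form; in the DFA encoding it suffices to negate the true $y_{l,i,j}$ variables as described above.

```python
from itertools import product


def solve(clauses, num_vars):
    """Tiny brute-force stand-in for a SAT solver: returns a model or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def find_all(clauses, num_vars, structure_vars):
    """Enumerate all distinct assignments of structure_vars that can be
    extended to a model, banning each one with a blocking clause."""
    clauses = list(clauses)
    found = []
    while (model := solve(clauses, num_vars)) is not None:
        found.append(tuple(model[v - 1] for v in structure_vars))
        # Blocking clause: at least one structure variable must change.
        clauses.append([-v if model[v - 1] else v for v in structure_vars])
    return found


# (x1 or x2) and (not x1 or not x2); x3 is an unconstrained auxiliary variable.
formula = [[1, 2], [-1, -2]]
print(find_all(formula, 3, structure_vars=[1, 2]))
print(len(find_all(formula, 3, structure_vars=[1, 2, 3])))
```

Blocking only the structure variables keeps auxiliary variables from multiplying the number of reported solutions, which is the same reason the text restricts the clause to the $y_{l,i,j}$ variables.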
There are two different ways of using SAT solvers, as stated above. First, we can restart a non-incremental SAT solver with the new Boolean formula with the blocking clause after finding each automaton. The second approach is based on incremental SAT solvers: after each found automaton we add the blocking clause to the solver and continue its execution.

It is necessary to mention the case when some transitions of the found DFA are not covered by the APTA. It means that there are some free transitions which are not used during processing of any given word, and each such transition can end in any state, since this does not influence the consistency of the DFA with the given set of strings. But in the case of the find-all problem we basically do not wish to find all the automata distinguished only by such transitions. Thus we propose a way to force all free transitions to be self-loops, that is, to end in the same state as they start. To achieve that we add auxiliary "used" variables: $u_{l,i}$ is true if and only if there is an $l$-labeled APTA edge from the $i$-colored state:

$$\bigwedge_{l \in \Sigma} \; \bigwedge_{1 \le i \le C} \Big(u_{l,i} \leftrightarrow \bigvee_{v \in V_l} x_{v,i}\Big),$$

where $V_l$ is the set of all the APTA states which have an outgoing edge labeled with $l$. To force unused transitions to be self-loops we add the following constraints:

$$\bigwedge_{l \in \Sigma} \; \bigwedge_{1 \le i \le C} (\neg u_{l,i} \to y_{l,i,i}).$$

These additional constraints are translated into O(C L) clauses. See Fig. 4 for an example of an APTA for $S_+$ = {,,, } and $S_-$ = {} and its consistent DFA with an unused transition. If we add the proposed constraints, then this transition will be forced to be a loop as shown by the dashed line in Fig. 4b.

Fig. 4. An example of an APTA and its consistent DFA: (a) an example of an APTA for $S_+$ and $S_-$; (b) the DFA built from the APTA from Fig. 4a, with an unused transition.

4.2 Backtracking algorithm

The solution based on backtracking does not use any external tools like SAT solvers. This algorithm works as follows. Initially there is an empty DFA with $n$ states. Also there is a frontier: the set of edges from the APTA which are not yet represented in the DFA. Initially the frontier contains all outgoing edges of the APTA root. The recursive function Backtracking maintains the frontier in the proper state. If the frontier is not empty, then the function tries to augment the DFA with one of its edges. Each found DFA is checked to be consistent with the APTA, and if the DFA complies with it then an updated frontier is found. If the frontier is empty then the DFA is checked for completeness (a DFA is complete if there are transitions from each state labeled with all alphabet symbols). If it is not complete and there are nodes which have fewer outgoing edges than the alphabet size, then we add the missing edges as self-loops with the function MakeComplete. Algorithm 1 illustrates the solution. The function FindNewFrontier returns the new frontier for the augmented DFA or null if the DFA is inconsistent with the APTA. This algorithm is an exact search algorithm based on the one from [18].

    Data: augmented prefix tree acceptor APTA, current DFA (initially empty),
          frontier (initially contains all APTA root outgoing edges)
    DFAset ← new Set<DFA>
    edge ← any edge from frontier
    foreach destination ∈ 1..|S| do
        source ← the state of DFA from which edge should be added
        DFA′ ← DFA ∪ transition(source, destination, edge.label)
        frontier′ ← FindNewFrontier(APTA, DFA′, frontier)
        if frontier′ ≠ null then
            if frontier′ = ∅ then
                DFAset.add(MakeComplete(DFA′))
            else
                DFAset.add(Backtracking(APTA, DFA′, frontier′))
            end
        end
    end
    return DFAset

Algorithm 1: Backtracking solution

5 Experiments

All experiments were performed using a machine with an AMD Opteron 678.4 GHz processor running Ubuntu 14.04. All algorithms were implemented in Java; the lingeling SAT solver was used [0]. As far as we know, all common benchmarks are too hard for solving by exact algorithms without some heuristic non-exact steps. Thus our own algorithm was used for generating problem instances. This algorithm builds a set of strings with the following parameters: size $N$ of the DFA to be generated, alphabet size $A$, and the number $S$ of strings to be generated. The algorithm is arranged as follows. First of all $N$ states are generated and uniquely numbered from 1 to $N$. Each state is equiprobably set to be accepting or rejecting. Next, on step $i$ the algorithm picks state $i$, uniformly chooses another state from $[i+1; N]$ and adds a random-labeled transition from the first state to the second. After $N-1$ such steps we have a partially built automaton where all states are reachable from the initial one (1-numbered). In the end the algorithm picks each state one by one and adds all missing (in terms of automaton completeness) transitions with destinations randomly chosen among all states. Finally $S$ strings are generated by processing the automaton. The distribution of the word lengths is shifted to longer words. These strings with the accepting or rejecting labels form the instance of the DFA identification problem.

For DFA identification we used the following parameters: $N \in [10; 30]$ with step 2, A = , S = 50N. We compared the SAT-based approach with three types of symmetry breaking predicates: the max-clique algorithm from [10] (the current state-of-the-art) and the proposed DFS-based and BFS-based methods. Each experiment was repeated 100 times. The time limit was set to 600 seconds. The results are listed in Table 2. It can be seen from the table that both the DFS-based and BFS-based strategies clearly outperform the max-clique approach. The BFS-based strategy in its turn notably outperforms the DFS-based one when the target automaton size is larger than 14. These results for the BFS-based approach were not obtained in our previous research due to a weaker technical implementation.

Table 2. Median execution times of exact solving DFA identification in seconds (TL: time limit exceeded)

N | DFS | BFS | max-clique
10 | 0.9 | 0.5 | .
12 | 40.4 | 7.6 | 40.
14 | 8. | 6.4 | TL
16 | 05.1 | 114.1 | TL
18 | 601.7 | 181.9 | TL
20 | 501.6 | 9.7 | TL
22 | TL | 45. | TL
24 | TL | 65.1 | TL
26 | TL | 95.8 | TL
28 | TL | 114.4 | TL
30 | TL | 165.5 | TL

The second experiment concerned the find-all problem. A random dataset was also used here. We used the following parameters: $N \in [5; 15]$, A = , S ∈ {5N, 10N, 5N}. We compared the BFS-SAT-based method with the restarting strategy (REST column in the table), the BFS-SAT-based method with the incremental strategy (INC) and the backtracking method (BTR). Each experiment was repeated 100 times as well. The time limit was set to 600 seconds. The results are given in Table 3. The first column in each subtable contains the number of instances which have more than one DFA in the solution (> 1). If fewer than 50 instances were solved then TL is shown instead of a value.
It can be seen from the table that the SAT-based methods work significantly faster than the backtracking one when the size of the automaton is greater than 8. This happens because the SAT-based methods with BFS-based predicates consider only one DFA for each equivalence class with respect to the isomorphism relation instead of N!. As we see, the incremental strategy in its turn clearly outperforms the restart strategy. This can be explained by the fact that an incremental SAT solver saves its state, whereas a non-incremental solver repeats the same actions on each execution.

Table 3. Median execution times in seconds of the SAT-based restart method, the SAT-based incremental method and the backtracking method
S = 5N S = 10N S = 25N
N >1 REST INC BTR >1 REST INC BTR >1 REST INC BTR
5 5..0 0.8 40.6. 1. 17 4.1.4 1.5 6 56.8.4.1 1 4.7.9 1.7 7 5.4 4. 1.7 7 87.9.5 4.1 7.7.0.1 1 7.4 6.7.5 8 80 4.6.7 87. 4 7.0 6.5 41.7 16 10.1 8.9 11.6 9 91 7.6.9 475.1 50 7.7 6.4 11.6 10 1.8 1.0 61.4 10 89 15.7 5. 756. 47 8.6 7.0 974.7 11 18.8 16.1 76.8 11 94 19.9 7. TL 6 18.5 1.8 108.0 9 4.5 1.9 1158.4 1 90 8.0 9.9 TL 49. 16.7 TL 8.5 7. 89.1 1 9 185.5 18.1 TL 57 6.9.6 TL 1 6.0 51.4 TL 14 87 408.5 49.0 TL 71 85.1 41.8 TL 4 67.0 56. TL 15 95 571.1 174.1 TL 69 19. 95.7 TL 6 9. 6. TL

Our implementation of the proposed predicates and algorithms is available in our laboratory GitHub repository (https://github.com/ctlab/dfa-inductor).

6 Conclusions

We have proposed DFS-based symmetry breaking predicates. They can be added to the Boolean formula before passing it to a SAT solver while solving various DFA identification problems with SAT-based algorithms. Using these predicates allows reducing the problem search space by enforcing DFA states to be enumerated in depth-first search order. We have revisited our implementation of the proposed symmetry breaking predicates and compared the translation-to-SAT method from [10] to the same one with the proposed symmetry breaking predicates instead of the original max-clique predicates. The proposed approach clearly improved the translation-to-SAT technique, which was demonstrated by the experiments on randomly generated input data. The BFS-based approach has shown better results than the DFS-based one if the target DFA size is large. Then, we have proposed a solution for the find-all DFA problem. The proposed approach can efficiently solve a problem that the previously developed methods cannot be applied to. We performed experiments which have shown that our approach with the incremental SAT solver clearly outperforms the backtracking algorithm.
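The search-space reduction attributed to BFS-based enumeration can be illustrated with a small sketch. It is not the SAT encoding from the paper: it only shows, on a hypothetical 4-state DFA, that renumbering states in breadth-first order maps every relabeling of the non-initial states to the same representative. The helper names `bfs_canonical` and `relabel`, the tuple-based transition table, and the assumptions that state 0 is initial and all states are reachable are choices made for this example.

```python
from collections import deque
from itertools import permutations


def bfs_canonical(delta, accepting):
    """Renumber a complete DFA (initial state 0) in BFS order.

    delta[q][a] is the successor of q on symbol a; symbols are 0..k-1
    and are always visited in increasing order.
    """
    index = {0: 0}
    order = [0]
    queue = deque([0])
    while queue:
        q = queue.popleft()
        for t in delta[q]:
            if t not in index:
                index[t] = len(order)
                order.append(t)
                queue.append(t)
    new_delta = tuple(tuple(index[t] for t in delta[q]) for q in order)
    new_accepting = tuple(accepting[q] for q in order)
    return new_delta, new_accepting


def relabel(delta, accepting, perm):
    """Apply the state renaming q -> perm[q]."""
    n = len(delta)
    new_delta = [None] * n
    new_accepting = [None] * n
    for q in range(n):
        new_delta[perm[q]] = tuple(perm[t] for t in delta[q])
        new_accepting[perm[q]] = accepting[q]
    return tuple(new_delta), tuple(new_accepting)


# Hypothetical example DFA over a two-symbol alphabet.
delta = ((1, 2), (3, 0), (2, 1), (3, 2))
accepting = (False, True, False, True)

raw_variants = set()
canonical_forms = set()
for rest in permutations([1, 2, 3]):
    perm = (0,) + rest  # the initial state keeps its number
    variant = relabel(delta, accepting, perm)
    raw_variants.add(variant)
    canonical_forms.add(bfs_canonical(*variant))
```

In this example `raw_variants` collects the differently numbered copies of the same automaton, while `canonical_forms` collapses them into a single BFS-enumerated representative, which is the kind of redundancy a symmetry breaking predicate removes from the search.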

Acknowledgements

The authors would like to thank Igor Buzhinsky, Daniil Chivilikhin, Maxim Buzdalov for useful comments. This work was financially supported by the Government of Russian Federation, Grant 074-U01.

References

1. Hopcroft, J., Motwani, R., Ullman, J.: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley (2006)
2. De La Higuera, C.: A bibliographical study of grammatical inference. Pattern Recognition 38(9) (2005) 1332–1348
3. Gold, E.M.: Complexity of automaton identification from given data. Information and Control 37(3) (1978) 302–320
4. Dupont, P.: Regular Grammatical Inference from Positive and Negative Samples by Genetic Search: the GIG Method. In: Grammatical Inference and Applications. Springer (1994) 236–245
5. Luke, S., Hamahashi, S., Kitano, H.: Genetic Programming. In: Proceedings of the Genetic and Evolutionary Computation Conference. Volume 2. (1999) 1098–1105
6. Lucas, S.M., Reynolds, T.J.: Learning DFA: Evolution Versus Evidence Driven State Merging. In: Evolutionary Computation, 2003. CEC'03. The 2003 Congress on. Volume 1., IEEE (2003) 351–358
7. Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the Abbadingo One DFA Learning Competition and a New Evidence-driven State Merging Algorithm. In: Grammatical Inference. Springer (1998) 1–12
8. Lang, K.J.: Faster Algorithms for Finding Minimal Consistent DFAs. Technical report (1999)
9. Bugalho, M., Oliveira, A.L.: Inference of regular languages using state merging algorithms with search. Pattern Recognition 38(9) (2005) 1457–1467
10. Heule, M.J., Verwer, S.: Exact DFA Identification Using SAT Solvers. In: Grammatical Inference: Theoretical Results and Applications. Springer (2010) 66–79
11. Lohfert, R., Lu, J., Zhao, D.: Solving SQL Constraints by Incremental Translation to SAT. In Nguyen, N., Borzemski, L., Grzech, A., Ali, M., eds.: New Frontiers in Applied Artificial Intelligence. Volume 5027 of LNCS. (2008) 669–676
12. Galeotti, J.P., Rosner, N., Lopez Pombo, C.G., Frias, M.F.: TACO: Efficient SAT-Based Bounded Verification Using Symmetry Breaking and Tight Bounds. IEEE Transactions on Software Engineering 39(9) (2013) 1283–1307
13. Ulyantsev, V., Tsarev, F.: Extended Finite-state Machine Induction Using SAT-solver. In: Proc. of ICMLA 2011. Volume 2., IEEE (2011) 346–349
14. Zbrzezny, A.: A new translation from ECTL* to SAT. Fundamenta Informaticae 120(3-4) (2012) 375–395
15. Walkinshaw, N., Lambeau, B., Damas, C., Bogdanov, K., Dupont, P.: STAMINA: a Competition to Encourage the Development and Assessment of Software Model Inference Techniques. Empirical Software Engineering 18(4) (2013) 791–824
16. Crawford, J., Ginsberg, M., Luks, E., Roy, A.: Symmetry-breaking predicates for search problems. KR 96 (1996) 148–159
17. Ulyantsev, V., Zakirzyanov, I., Shalyto, A.: BFS-Based Symmetry Breaking Predicates for DFA Identification. In Dediu, A.H., Formenti, E., Martín-Vide, C., Truthe, B., eds.: Language and Automata Theory and Applications. Volume 8977 of Lecture Notes in Computer Science., Springer International Publishing (2015) 611–622
18. Ulyantsev, V., Buzhinsky, I., Shalyto, A.: Exact finite-state machine identification from scenarios and temporal properties. International Journal on Software Tools for Technology Transfer (2016)
19. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Theory and applications of satisfiability testing, Springer (2004) 502–518
20. Biere, A.: Splatz, Lingeling, Plingeling, Treengeling, YalSAT Entering the SAT Competition 2016. Proceedings of SAT Competition (2016) 44–45