Foreword. Grammatical inference. Examples of sequences. Sources. Examples of problems expressed by sequences: switching the light.


Foreword
Vincent Claveau, IRISA - CNRS, Rennes, France
In this course: a supervised symbolic machine learning technique, concept learning (i.e. 2 classes). INSA 4.
Sources: slides and concepts from L. Miclet, F. Coste...

Examples of sequences
- sequences of symbols carry sequential information: genomic data (DNA, RNA, protein), language, music, logs, electrocardiograms...
- how do we handle this sequential aspect in machine learning?
- can we learn automatically to recognize the DNA sequences encoding a certain physiological property?

Example of problems expressed by sequences: back to the starting point
- the sequence aababaaabb is a positive example
- the sequence aababaaaba is a negative example
- can we learn automatically to distinguish the sequences leading back to the starting point from the others?

Example of problems expressed by sequences: switching the light
- consider 2 switches I1 and I2 for one light bulb; 4 states are possible:
  state 1: I1 is low and I2 is low (light is off)
  state 2: I1 is high and I2 is low (light is on)
  state 3: I1 is low and I2 is high (light is on)
  state 4: I1 is high and I2 is high (light is on)
- action a modifies the state of I1, action b modifies I2
- only state 1 is wanted (light switched off)
- the sequences aa, baba and abbbba are accepted
- the sequences a, ab, baa or bbbbbbbbb are not
- can we learn automatically to recognize the sequences of actions leading to state 1? (see the sketch below)
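
The switching-light problem is exactly membership in a regular language. As a minimal sketch (the function name accepts and the encoding of states as pairs of switch positions are mine, not from the course), the four states and two actions translate directly into a DFA:

    # Hypothetical encoding of the two-switch problem as a DFA.
    # States are (I1, I2) pairs; 'a' toggles I1, 'b' toggles I2.
    # Per the slides, the light is off only in state 1, i.e. (0, 0),
    # which is both the initial and the single accepting state.

    def accepts(word: str) -> bool:
        state = (0, 0)                      # state 1: both switches low
        for action in word:
            i1, i2 = state
            if action == "a":
                state = (1 - i1, i2)        # action a modifies I1
            elif action == "b":
                state = (i1, 1 - i2)        # action b modifies I2
            else:
                return False                # symbol outside the alphabet
        return state == (0, 0)              # accept iff light is off again

    assert all(accepts(w) for w in ["aa", "baba", "abbbba"])
    assert not any(accepts(w) for w in ["a", "ab", "baa", "bbbbbbbbb"])

Grammatical inference asks the converse question: given only accepted and rejected sequences, can such an automaton be recovered automatically?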

Other sequences: switching the light. Finite state automaton of the problem (figure on the slide).

Basics 1
Vocabulary:
- word: sequence of symbols (from an alphabet Σ)
- language: set (possibly infinite) of words
- grammar: set of rules producing the words of a language
Some tools for handling sequences:
- grammars (rules)
- finite state machines (automata, transducers...)
- trees (prefix tree...)
- expressions (regular expressions)
- HMM...

Basics 2
Chomsky hierarchy: one possible classification (among many others!) of languages, by increasing expressiveness:
- regular grammars (type 3; A → a and A → aB)
- context-free grammars (type 2; A → γ, e.g. γ = abccbcca)
- context-sensitive grammars (type 1; αAβ → γ)
- unrestricted grammars (type 0; α → β)
Regular grammars are well mastered; in particular, we know how to infer them. We know less about context-free grammars, and almost nothing about context-sensitive and unrestricted grammars.

Basics 3
In this course we focus only on regular languages, and we use automata to represent and handle them.

The 4 methodological questions (cf. class 1)
1 - Describing the examples as sequences of symbols
    positive examples: b, aab, aaaab
    negative examples: aaab, a, aaaaa, bb
2 - Choosing the hypothesis space
    hypothesis: any automaton (deterministic, DFA, or non-deterministic, NFA)
3 - Exploring the hypothesis space
    exploration of a discrete space (state merging, see below)
4 - Evaluation
    classically, by testing the final automaton on a test set (see the sketch below)
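
Question 4 is classically answered by measuring the learned automaton against held-out sequences. A minimal sketch of such an evaluation, assuming any learned hypothesis exposed as a boolean accepts function (the helper name evaluate and the whole snippet are illustrative, not from the course):

    from typing import Callable, Iterable

    def evaluate(accepts: Callable[[str], bool],
                 test_pos: Iterable[str],
                 test_neg: Iterable[str]) -> float:
        """Fraction of held-out sequences classified correctly."""
        test_pos, test_neg = list(test_pos), list(test_neg)
        hits = sum(1 for w in test_pos if accepts(w))
        hits += sum(1 for w in test_neg if not accepts(w))
        return hits / (len(test_pos) + len(test_neg))

    # e.g. with the switching-light automaton from the previous sketch:
    # evaluate(accepts, ["aa", "baba"], ["a", "baa"])  ->  1.0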

A closer look at the hypothesis space 1
What is in our hypothesis space: example of an automaton (on the slide). In this course, we decide that non-deterministic automata are rejected.
Properties of the hypothesis space:
- for a finite set of examples, the hypothesis space is finite
- the hypothesis space can be hierarchically organized

A closer look at the hypothesis space 2
Cover relation in grammatical inference: the hypothesis covers the example abaa.

A closer look at the hypothesis space 3
Subsumption in grammatical inference: this hypothesis also covers abaa.

A closer look at the hypothesis space 4
Bounds of the hypothesis space:
- most specific (canonical) automaton of the training set
- most specific (canonical) automaton of the positive examples
- most general (canonical) automaton: the universal automaton (UA)

Exploring the hypothesis space
Principles:
- learning by exploring the discrete space of automata
- searching for an automaton with an empirical risk equal to zero
- bottom-up search: start from the most specific automaton and generalize
- generalization operator: state merging
About merging:
- choose 2 states and merge them to generalize
- cascade of forced mergings to make the automaton deterministic
- control (stop) the merging with the negative examples

Exploring the hypothesis space: example of merging
NB: merging may produce non-deterministic automata (example to be done in course).
Avoiding over-generalization:
- a criterion to stop the merging is needed
- examples of such criteria:
  - limitation to a certain sub-family of automata
  - statistical criterion (the remaining states are considered too different to be merged)
  - use of negative examples: stop when a negative example is accepted by the automaton

Theoretical and practical problems. Open issues:
- why start from the canonical automaton and explore by merging?
- when is the training set sufficient to be sure of finding the right hypothesis?
- how to choose the states to be merged?
- can we accept an empirical risk greater than 0?
- can we generalize to more complex concepts: stochastic automata, transducers, context-free grammars?

Finite state automata
A finite state automaton is a quintuple (Q, Σ, δ, Q0, F):
- Q: finite set of states
- Σ: finite alphabet
- δ: transition function Q × Σ → 2^Q
- Q0 ⊆ Q: set of initial states
- F ⊆ Q: set of final (or accepting) states
Deterministic automaton, complete automaton: if for all q ∈ Q and a ∈ Σ, δ(q, a) contains at most one element (resp. exactly one element), and if |Q0| = 1, the automaton is said to be deterministic (DFA) (resp. complete). A sketch of this structure in Python follows.
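
The quintuple maps directly onto a small data structure. A minimal sketch with names of my own choosing (Automaton, accepts, is_deterministic), where δ is stored as a dict from (state, symbol) to a set of successor states:

    from dataclasses import dataclass

    @dataclass
    class Automaton:
        states: set      # Q: finite set of states
        alphabet: set    # Sigma: finite alphabet
        delta: dict      # (state, symbol) -> set of states, i.e. Q x Sigma -> 2^Q
        initial: set     # Q0: set of initial states (a set, so NFAs are covered)
        final: set       # F: set of final (accepting) states

        def accepts(self, word) -> bool:
            # Track the set of states reachable on the prefix read so far;
            # this handles non-determinism without backtracking.
            current = set(self.initial)
            for a in word:
                current = {q2 for q in current
                           for q2 in self.delta.get((q, a), set())}
            return bool(current & self.final)

        def is_deterministic(self) -> bool:
            return (len(self.initial) == 1 and
                    all(len(t) <= 1 for t in self.delta.values()))

    # The two-switch automaton of the introduction, written as a quintuple:
    light = Automaton(
        states={1, 2, 3, 4}, alphabet={"a", "b"},
        delta={(1, "a"): {2}, (2, "a"): {1}, (3, "a"): {4}, (4, "a"): {3},
               (1, "b"): {3}, (3, "b"): {1}, (2, "b"): {4}, (4, "b"): {2}},
        initial={1}, final={1})
    assert light.accepts("baba") and not light.accepts("ab")

The same structure serves below for quotient automata and for the prefix tree acceptor.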

What can be said about this automaton? (exercise on the slide)

Cover relation
An automaton (deterministic or not) covers (accepts) a word u = a1...aj if there exists a sequence (unique or not) of j + 1 states (q0, ..., qj) such that q0 ∈ Q0, qj ∈ F, and for all 0 ≤ i ≤ j − 1, q(i+1) ∈ δ(qi, a(i+1)). The j + 1 states are said to be reached by this acceptance, qj is the accepting state, and the j transitions are said to be used by this acceptance.

Accepted language
The language L(A) accepted by an automaton A is the set of all sequences accepted by A.

Partitions
- a partition π of a set S is a set of non-empty, pairwise disjoint subsets of S whose union is S
- if s ∈ S, the unique element (block) of π containing s is written B(s, π)
- a partition πi refines (is finer than) a partition πj iff every block of πj is a block of πi or is the union of several blocks of πi

Examples of partitions
Consider an automaton with 5 states: 0, 1, 2, 3, 4.
- π2 = {{0, 1}, {2}, {3, 4}} is a possible partition
- π3 = {{0, 1, 2}, {3, 4}} is coarser than π2 (π2 refines π3)
- π4 = {{0}, {1, 3}, {2, 4}} is neither finer nor coarser than π2 (not comparable)
- B(0, π2) = {0, 1} (block containing state 0 in π2)
- B(0, π3) = {0, 1, 2} (block containing state 0 in π3)
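
These definitions are easy to make executable. A short sketch (the helper names block_of and refines are mine), using frozensets as blocks and checking the examples above:

    def block_of(s, partition):
        """B(s, pi): the unique block of the partition containing s."""
        for block in partition:
            if s in block:
                return block
        raise ValueError(f"{s} is in no block")

    def refines(pi_i, pi_j):
        """True iff pi_i refines pi_j, i.e. every block of pi_j is a
        block of pi_i or a union of blocks of pi_i."""
        return all(block_of(s, pi_j) >= block_of(s, pi_i)
                   for block in pi_i for s in block)

    pi2 = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4})]
    pi3 = [frozenset({0, 1, 2}), frozenset({3, 4})]
    pi4 = [frozenset({0}), frozenset({1, 3}), frozenset({2, 4})]

    assert refines(pi2, pi3)                                # pi2 is finer than pi3
    assert not refines(pi4, pi2) and not refines(pi2, pi4)  # not comparable
    assert block_of(0, pi2) == frozenset({0, 1})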

Derived (quotient) automaton
Let A = (Q, Σ, δ, Q0, F) be an automaton. The automaton derived from A w.r.t. a partition π of Q, written A/π = (Q', Σ, δ', Q0', F'), is defined by:
- Q' = Q/π = {B(q, π) | q ∈ Q}
- Q0' = {B(q, π) | q ∈ Q0}
- F' = {B ∈ Q' | B ∩ F ≠ ∅}
- δ': Q' × Σ → 2^(Q'): for all B, B' ∈ Q' and a ∈ Σ, B' ∈ δ'(B, a) iff there exist q, q' ∈ Q with q ∈ B, q' ∈ B' and q' ∈ δ(q, a)
The states of Q belonging to the same block B of the partition π are said to be merged.

Derived automata: consider the automaton A1 (on the slide); compute A2 = A1/π2 with π2 = {{0, 1}, {2}, {3, 4}}.

Major property of merging
- if an automaton A/πj derives from an automaton A/πi, then the language accepted by A/πi is included in the language accepted by A/πj
- thus A/πj recognizes all the words accepted by A/πi, plus other words: A/πj is more general than A/πi
- more formally, the merging operation induces a partial order on the derived automata

Practical consequence
- starting from an automaton A, it is possible to build every automaton derived from A from the partitions of A's states
- there exists a partial order relation on this set, consistent with the inclusion of the languages recognized by these automata

Major property of merging - examples
Back to example A1: we've seen that choosing the partition π2 = {{0, 1}, {2}, {3, 4}} makes it possible to derive the quotient automaton A2 = A1/π2; thus we know that L(A1) ⊆ L(A2).

Exercise: compute A3 = A1/π3 (π3 = {{0, 1, 2}, {3, 4}}); what can you say about it?
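
The derivation A/π is mechanical, so it is worth writing down. A sketch (the function name quotient is mine), reusing the Automaton class and the block_of helper from the sketches above:

    def quotient(A: Automaton, partition) -> Automaton:
        """Build A/pi: one state per block, transitions lifted blockwise."""
        blocks = set(partition)
        delta = {}
        for (q, a), targets in A.delta.items():
            src = block_of(q, partition)
            for q2 in targets:
                delta.setdefault((src, a), set()).add(block_of(q2, partition))
        return Automaton(
            states=blocks,
            alphabet=A.alphabet,
            delta=delta,
            initial={block_of(q, partition) for q in A.initial},
            final={b for b in blocks if b & A.final},
        )

By the major property above, L(A) ⊆ L(quotient(A, pi)) for any partition pi: every accepting path of A maps to an accepting path of the quotient.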

Compute A3 = A1/π3 (π3 = {{0, 1, 2}, {3, 4}}); what can you say about it?
- partition π3 is coarser than π2, since its blocks are built as unions of blocks of π2
- thus we know that L(A2) ⊆ L(A3)
Exercise: compute A4 = A1/π4 (π4 = {{0}, {1, 3}, {2, 4}}); what can you say about it?

Hypothesis space
The space EH and merging:
- the set of automata derived from an automaton A is partially ordered by the subsumption relation given by the derivation; thus EH is a lattice
- automaton A is the most specific element (bottom)
- the universal automaton UA is the most general element (top)
- there are as many elements in EH as there are partitions of the states of A
- the more we merge states, the larger the accepted language grows

Structural completeness
Language samples:
- positive sample E+: finite subset of a language L
- negative sample E-: finite subset of the complement language Σ* \ L
Structural completeness: E+ is structurally complete w.r.t. a deterministic automaton A accepting L if
- every transition of A has been used
- every element of F (the final states of A) is used as an accepting state
This implements an inductive bias.
Exercise: give several DFAs such that E+ = {aab, ab, abbbbb} is structurally complete for them (a mechanical check is sketched below).
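
Structural completeness can be tested mechanically. A sketch (the name is_structurally_complete is mine; it assumes a deterministic automaton in the Automaton representation above) that runs each positive word through A and records which transitions and accepting states get used:

    def is_structurally_complete(E_plus, A: Automaton) -> bool:
        used_transitions, used_finals = set(), set()
        for word in E_plus:
            q = next(iter(A.initial))          # deterministic: one start state
            for a in word:
                targets = A.delta.get((q, a), set())
                if not targets:
                    return False               # word is not even accepted
                q2 = next(iter(targets))
                used_transitions.add((q, a, q2))
                q = q2
            if q not in A.final:
                return False                   # word is not accepted
            used_finals.add(q)
        all_transitions = {(q, a, q2) for (q, a), ts in A.delta.items()
                           for q2 in ts}
        return used_transitions == all_transitions and used_finals == set(A.final)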

Canonical automata 1
Maximal canonical automaton of E+ (MCA):
- the largest automaton (in number of states) such that E+ is structurally complete for it
- written MCA(E+) = (Q, Σ, δ, Q0, F); generally non-deterministic (because |Q0| > 1)
- example: MCA({a, ab, bab})

Canonical automata 2
Prefix tree acceptor of E+ (PTA):
- the quotient automaton MCA(E+)/πE+, written PTA(E+), defined by: B(q, πE+) = B(q', πE+) iff Pr(q) = Pr(q'), where Pr(q) is the prefix read to reach q
- PTA(E+) is obtained by merging the states of MCA(E+) that share the same prefix; by construction, it is deterministic
- example on the previous sample

Maximal generalization 1
Goal of the exploration: find the minimal automaton that does not cover any negative example. The border set (dashed on the slide) marks the limit of negative example acceptance.

Maximal generalization 2
Border set:
- the border set BS_MCA(E+, E-) is an antichain in which each element is at maximal depth in EH (the space built from MCA(E+))
- antichain (fr: antichaîne): a subset such that no two elements are comparable (no pair is in the order relation)
- BS_PTA(E+, E-) contains the canonical automaton A(L) of any regular language L for which E+ is a positive sample and E- a negative one

Maximal generalization 3 and 4
Consequences:
- the border set of the lattice built from MCA(E+) is the set of the most general automata compatible with E+ and E-
- the problem of finding the smallest DFA compatible with E+ and E- is thus equivalent to finding the smallest DFA in the border set built from PTA(E+)
Example: let E+ = {b, ab} and E- = {bb}; the maximal canonical automaton of E+ is shown on the slide.
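
The PTA is the usual starting point of state-merging algorithms, and it can be built directly from E+ (one state per distinct prefix) without materializing the MCA first. A sketch (the name build_pta is mine, reusing the Automaton class above):

    def build_pta(E_plus) -> Automaton:
        """One state per distinct prefix of E+; deterministic by construction."""
        prefixes = {""}                       # the root: the empty prefix
        for word in E_plus:
            for i in range(1, len(word) + 1):
                prefixes.add(word[:i])
        delta = {}
        for p in prefixes:
            if p:                             # link each prefix to its parent
                delta[(p[:-1], p[-1])] = {p}
        alphabet = {a for w in E_plus for a in w}
        return Automaton(states=prefixes, alphabet=alphabet, delta=delta,
                         initial={""}, final=set(E_plus))

For the sample {a, ab, bab} used above, this yields 6 states (ε, a, ab, b, ba, bab), with ε initial and a, ab, bab accepting.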

Maximal generalization 5 automata in the border set BS PTA (E +, E ) with E + = {b, ab} and E = {bb} Back on the hypothesis space 1 Fundamental properties - general case let E + be a sample of a regular language L, and A any automaton recognizing exactly L if E + is structurally complete wrt A, then A E H (space built from MCA(E + )) conversely, if A E H (built from MCA(E + )), then E + is structurally complete wrt A Fundamental properties - deterministic case let E + be a sample of a regular language L, and A a cannonical automaton recognizing L if E + is structurally complete wrt A(L), then A(L) E H (built from PTA(E + )) Back on the hypothesis space 2 Size of E H let E + be a sample of an unknown language L structurally complete for an automaton A accepting exactly L A can be derived from a partition π of the states of MCA(E + ), i.e. regular inference = finding partition π thus, the size of E H is the number of partitions P(N) with N the number of states of MCA(E + ) or of PTA(E + ) for example P(10) = 10 5, P(20) = 5 10 13, P(100) = 8.5 10 23... this number grows exponentially, thus we need a clever exploration, guided by heuristics RPNI algorithm Principles RPNI implements a depht-first search in E H built upon PTA(E + ) and find a local optimum to the problem of the smallest DFA by construction, every state of PTA(E + ) corresponds to a unique prefix and these prefixes can be sorted by length and lexicographic order (ɛ, a, b, aa, ab, ba, bb, aaa, aab...) RPNI process with N 1 steps where N is the number of states in PTA(E + ) the partition in step i is obtained by merging the two first blocks (wrt the length and lexicographic order above), of the partition of step i 1, which results in a compatible quotient automaton RPNI algorithm Input: E +, E ; Output: a partition of PTA(E+) corresponding to a DFA compatible with E + and E π {{0}, {1},..., {N 1}} ; N = number of states in PTA(E + ) A PTA(E + ) ; for i = 1 to N 1 do for j = 0 à i 1 do π π \ {B j, B i } {B i B j } ; merging of blocks/states Bi and B j if A/π do not accept elements of E then π determ fusion(a/π ) ; π π endif end for end for Return A A/π ;

RPNI algorithm: convergence
- RPNI outputs a DFA belonging to BS_PTA(E+, E-)
- it is the canonical automaton of the language it accepts
- it is the smallest compatible DFA only if the training data satisfy an additional condition: they must contain a characteristic sample
- i.e., when the training data are representative enough of the language, the discovery of the canonical automaton of this language is guaranteed; moreover, this automaton is the smallest compatible DFA in this particular case

Convergence (continued)
- the size of the characteristic sample for this particular algorithm is O(n²), where n is the number of states of the resulting automaton
- the complexity of RPNI, in the latest published version, is O((|E+| + |E-|) · |E+|²)
- if the training sample contains every word of length < 2n − 1, then identification is guaranteed
- yet this property is brittle: if the training set contains every word except a small part of the characteristic sample, identification is no longer guaranteed

RPNI step by step
Initial data: let E+ = {ε, ab, aaa, aabaa, aaaba} and E- = {aa, baa, aaab}; apply the RPNI algorithm. First build the PTA(E+) (shown on the slide).
Start: RPNI begins by merging 2 states; without any other information, states 0 and 1 are chosen.
End: back to the starting point, merging 0 and 3; the resulting DFA is compatible with E+ and E-; it is in the border set; we keep this solution.
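
For what it's worth, the sketch above can be run on this very example (the exact sequence of merges depends on implementation details, so the slide's trace remains authoritative):

    E_plus = ["", "ab", "aaa", "aabaa", "aaaba"]
    E_neg  = ["aa", "baa", "aaab"]
    par, finals = rpni(E_plus, E_neg)

    # Recover the blocks of the final partition from the union-find map.
    def root(q):
        while par[q] != q:
            q = par[q]
        return q

    blocks = {}
    for q in par:
        blocks.setdefault(root(q), set()).add(q)
    print(len(blocks), "states in the learned DFA")
    print("accepting blocks:", {root(f) for f in finals})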

Conclusion: a real example
Genomic example: searching for the grammar defining a promoter of B. subtilis
- |E+| = 131, |E-| = 55 062
- bottom of the lattice = 1 248 616 states (PTA or MCA?)
- solution found: 95 states, 347 transitions, several hours of computing
- compactness? readability?