Gold's algorithm

Acknowledgements
Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.
Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini,...
The list is necessarily incomplete. Apologies to those who have been forgotten.
http://eurise.univ-st-etienne.fr/~/slides

Gold's Algorithm
Gold 1978
The algorithm tries to find the minimum DFA compatible with the sample.
If we have such an algorithm, we can identify the regular languages.

Why would this be true?
It can be proved that if given all the strings of length up to n, there is only one automaton consistent with the data.

Key ideas
Represent the states of an automaton as strings, prefixes of the strings in the learning set.
Find some incompatibilities between these prefixes due to separating suffixes.
Invent the others.

Strings as states
[Figure: an automaton whose states are labelled by strings.]

Incompatible prefixes
X+ = {}  X- = {}
Then clearly there are at least 2 states, one for each of the two incompatible prefixes.

2 Observation table

The information is organised in a table OT(S,E) where:
S ⊆ Σ* is the set of strings/states. Some of them are RED, and some are BLUE.
The RED states/strings are those such that S = RED ∪ [RED·Σ \ RED].
E ⊆ Σ* is the experiment set.
OT: (RED ∪ BLUE) × E → {0, 1, *} such that:
    OT[u][e] = 1 if ue ∈ L
               0 if ue ∉ L
               * otherwise

An observation table
[Figure: an observation table with its parts labelled: the experiments (E), the states (RED), the transitions (BLUE = RED·Σ \ RED). Each entry records whether the corresponding string is in L.]

Holes
A hole in the table is an entry OT[u][e] = *.
We say that the table is complete if it has no holes, i.e. if ∀u ∈ RED ∪ BLUE, ∀e ∈ E: OT[u][e] ∈ {0, 1}.

Rows
Let u ∈ RED ∪ BLUE; OT[u] denotes the row indexed by u.
Given u and v, u and v are consistent for OT if ∄e ∈ E such that (OT[u][e] = 0 ∧ OT[v][e] = 1) ∨ (OT[u][e] = 1 ∧ OT[v][e] = 0).
Given u and v, u and v are obviously different (OD) for OT if ∃e ∈ E such that (OT[u][e] = 0 ∧ OT[v][e] = 1) ∨ (OT[u][e] = 1 ∧ OT[v][e] = 0).

Example: consistent rows
[Figure: an observation table in which two rows are consistent, and a third row is consistent with both.]

Example: obviously different rows
[Figure: an observation table in which two rows are OD, and one row is OD from all RED rows.]

3 About complete tables

Closed table without holes
We say that the table is closed if ∀t ∈ BLUE, ∃s ∈ RED: OT[s] = OT[t].
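
The definitions of holes, consistent rows, OD rows and closedness can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: it assumes the table is stored as a dict of rows, each row a dict from experiment to 0, 1 or "*", with λ written as the empty string. The function names and the small table are hypothetical.

```python
HOLE = "*"  # marker for an unknown entry (a hole)

def is_complete(ot):
    """True if the table has no holes."""
    return all(v != HOLE for row in ot.values() for v in row.values())

def obviously_different(ot, u, v):
    """True if some experiment has defined, distinct values in rows u and v."""
    return any(
        ot[u][e] != HOLE and ot[v][e] != HOLE and ot[u][e] != ot[v][e]
        for e in ot[u]
    )

def consistent(ot, u, v):
    """Rows u and v are consistent when they are not obviously different."""
    return not obviously_different(ot, u, v)

def is_closed(ot, red, blue):
    """Closed: every BLUE row is equal to some RED row (meant for tables without holes)."""
    return all(any(ot[t] == ot[s] for s in red) for t in blue)

# Hypothetical table over the alphabet {a, b}, experiments E = {λ, a}
ot = {
    "":  {"": 1, "a": 0},
    "a": {"": 0, "a": HOLE},
    "b": {"": 1, "a": 0},
}
red, blue = [""], ["a", "b"]
```

In this made-up table, row a is OD from row λ (they differ on experiment λ), while row b is consistent with row λ.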

[Figure: two observation tables, one labelled "This table is closed" and one labelled "This table is not closed".]

From a table to an automaton
Table must be complete.
Table must be closed.
S must be prefix-closed and E must be suffix-closed.
A language L is
    prefix-closed if uv ∈ L ⟹ u ∈ L,
    suffix-closed if uv ∈ L ⟹ v ∈ L.

Building an automaton from a complete and closed table
We define CA(OT,S,E) = (Σ, Q, δ, q0, F) as follows:
    Q = {q_s : s ∈ RED}
    q0 = q_λ
    F = {q_u : u ∈ RED ∧ OT[u][λ] = 1}
    ∀a ∈ Σ, ∀s ∈ RED:
        δ(q_s, a) = q_sa if sa ∈ RED
                    any q_t such that t ∈ RED ∧ OT[t] = OT[sa], if sa ∉ RED
[Figure: the automaton built from the table.]
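
The construction above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the slides' own code: states are represented directly by the RED strings, λ is the empty string and is assumed to be both in RED and in E, and the table is a dict of rows as dicts. The example table is hypothetical (it encodes "even number of a's" over {a, b}).

```python
def build_automaton(ot, red, alphabet):
    """Build (states, initial, finals, delta) from a complete and closed table."""
    states = set(red)
    finals = {s for s in red if ot[s][""] == 1}  # OT[u][λ] = 1
    delta = {}
    for s in red:
        for a in alphabet:
            sa = s + a
            if sa in states:
                delta[(s, a)] = sa
            else:
                # Any RED row equal to the row of sa; closedness guarantees
                # one exists (next() raises StopIteration otherwise).
                delta[(s, a)] = next(t for t in red if ot[t] == ot[sa])
    return states, "", finals, delta

def accepts(automaton, word):
    """Run the DFA on word and report whether it ends in a final state."""
    _, state, finals, delta = automaton
    for a in word:
        state = delta[(state, a)]
    return state in finals

# Hypothetical complete and closed table, RED = {λ, a}, E = {λ, a}
ot = {
    "":   {"": 1, "a": 0},
    "a":  {"": 0, "a": 1},
    "b":  {"": 1, "a": 0},
    "aa": {"": 1, "a": 0},
    "ab": {"": 0, "a": 1},
}
dfa = build_automaton(ot, red=["", "a"], alphabet="ab")
```

When several RED rows equal the row of sa, this sketch takes the first one in the order of `red`; the definition allows any of them.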

Compatibility
Theorem: Let OT(S,E) be an observation table, closed and complete. If S is prefix-closed and E is suffix-closed then CA(OT,S,E) is consistent with the data in OT(S,E).

Example (why E should be suffix-closed)
[Figure: an observation table and the automaton built from it, with a single RED row and a single state.]
But suppose the experiments are not suffix-closed...
Notice that the intermediate column cannot exist.

4 What do we do with the holes?

Building an automaton from a table
We define CA(OT,S,E) = (Σ, Q, δ, q0, F) as follows:
    Q = {q_s : s ∈ RED}
    q0 = q_λ
    F = {q_ue : ue ∈ RED ∧ OT[u][e] = 1}
    δ(q_s, a) = q_sa if sa ∈ RED
                the only q_t such that t ∈ RED ∧ OT[t] = OT[sa], otherwise

Given a sample X and a prefix-closed test set S, it is always possible to select a set of experiments E such that the table OT(S,E) contains all the information in X.
But normally this table is going to have some holes.

Algorithm to build a table given X and RED
    E ← SUFF(X+) ∪ SUFF(X-)
    BLUE ← RED·Σ \ RED
    For each p in RED ∪ BLUE do
        For each e in E do
            If p·e ∈ X+ then OT[p][e] ← 1
            else If p·e ∈ X- then OT[p][e] ← 0
            else OT[p][e] ← *
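
The table-building loop above could be written in Python roughly as follows. This is a sketch under assumptions: strings are Python `str`, λ is the empty string, X+ and X- are sets, and the function names and the tiny sample are hypothetical.

```python
HOLE = "*"  # marker for an unknown entry (a hole)

def suffixes(words):
    """SUFF: all suffixes (including λ) of the given words."""
    return {w[i:] for w in words for i in range(len(w) + 1)}

def build_table(x_pos, x_neg, red, alphabet):
    """Build OT for the given RED strings from the labelled sample (X+, X-)."""
    # E = SUFF(X+) ∪ SUFF(X-), sorted by length then alphabetically
    experiments = sorted(suffixes(x_pos) | suffixes(x_neg),
                         key=lambda e: (len(e), e))
    # BLUE = RED·Σ \ RED
    blue = sorted({s + a for s in red for a in alphabet} - set(red))
    ot = {}
    for p in list(red) + blue:
        ot[p] = {}
        for e in experiments:
            if p + e in x_pos:
                ot[p][e] = 1
            elif p + e in x_neg:
                ot[p][e] = 0
            else:
                ot[p][e] = HOLE
    return ot, blue, experiments

# Hypothetical sample: X+ = {aa}, X- = {a}, RED = {λ}, Σ = {a, b}
ot, blue, experiments = build_table({"aa"}, {"a"}, red=[""], alphabet="ab")
```

With this made-up sample the row for b contains only holes, which illustrates why the table normally has to be filled in afterwards.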

Theorem. If ∃t ∈ BLUE such that OT[t] is obviously different from every OT[s] (s ∈ RED), then no filling of holes in OT(S,E) can produce a closed table.
[Figure: a table in which one BLUE row is OD with each RED row s.]

General algorithm
1. Given X, build the initial table OT(RED = {λ}, BLUE = Σ, E = SUFF(X)).
2. (loop) Find the RED states: whenever a BLUE state is OD from all RED states, promote it, updating the table.
3. Fill in the holes.
4. If the automaton obtained from OT(S,E) is inconsistent with X, return PTA(X+).

Algorithm for finding the RED states
    RED = {λ}
    build OT(S,E) from X with E suffix-closed
    while ∃s ∈ BLUE: s is OD from all RED rows do
        add s to RED
        add sa to BLUE for each a ∈ Σ, where sa ∉ RED
        update OT(S,E)

Algorithm for filling in the holes
    For each p ∈ RED ∪ BLUE, e ∈ E s.t. OT[p][e] = *
        if ∃u,v s.t. uv = pe ∧ OT[u][v] ≠ * then OT[p][e] ← OT[u][v]
        else
            Let t be a row s.t. OT[t] is consistent with OT[p]
            OT[p][e] ← OT[t][e]
There can be several such t.

Example run
X+ = {,,, }  X- = {,,, }
Two rows are OD.
1) We promote a line.
2) We expand the table, adding two lines.
3) A line is OD.
1) We promote a line.
2) We expand the table, adding two lines.
3) We construct the automaton, as no line is OD.
[Figure: the resulting automaton.]

X+ = {,,, }  X- = {,,,, }
[Figure: the automaton obtained for this sample.]
The automaton is inconsistent with X. We shall have to return the PTA instead.

Equivalence of problems
Let S be a prefix-closed set, and X be a sample.
Let OT(S,E) be an observation table consistent with all the data in X, with E suffix-closed.
The question:
    Does there exist a DFA with the states from RED and consistent with X?
is equivalent to:
    Can we fill the holes such that OT(S,E) is closed?

Complexity
The problem:
    Given a labelled sample X and a set of strings RED, is there a DFA with states in RED and consistent with X?
is NP-complete.

Corollary
The question:
    Given a labelled sample X and a positive integer n, is there a DFA with n states consistent with X?
is NP-complete.

Identification in the limit
There is a learning algorithm Gold such that:
1) Gold is consistent;
2) Gold identifies in the limit any regular language;
3) Gold is polynomial in ‖X‖;
4) Gold has polynomial characteristic samples.

Details
If the size of the canonical acceptor of the language is n, then there is a characteristic sample CS_L with ‖CS_L‖ = 2n²(|Σ|+1), such that Gold(X) produces the canonical acceptor for all X ⊇ CS_L.

Conclusion
The algorithm will find the correct automaton when a characteristic sample is included in the data.
The algorithm runs in polynomial time.

Open questions
Can one fill the holes in a more intelligent way?
How fast can we detect that a choice (for filling) is good or bad?

Exercise
Run Gold's algorithm for the following data:
X+ = {,,, }  X- = {,,,,, }
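
When running the algorithm by hand, the RED-promotion loop is the step most easily checked by machine. The sketch below is one possible Python rendering of "Algorithm for finding the RED states", under assumptions that are not in the slides: λ is the empty string, X+ and X- are Python sets, rows are recomputed from the sample at each iteration, and when several BLUE rows are OD from all RED rows the shortest (then alphabetically first) is promoted. The sample used is hypothetical, since the exercise data is not reproduced here.

```python
HOLE = "*"  # marker for an unknown entry (a hole)

def suffixes(words):
    """All suffixes (including λ) of the given words."""
    return {w[i:] for w in words for i in range(len(w) + 1)}

def row(p, experiments, x_pos, x_neg):
    """Row of p: 1 if p·e is in X+, 0 if it is in X-, a hole otherwise."""
    return [1 if p + e in x_pos else 0 if p + e in x_neg else HOLE
            for e in experiments]

def obviously_different(r1, r2):
    """True if some column holds defined, distinct values in the two rows."""
    return any(a != HOLE and b != HOLE and a != b for a, b in zip(r1, r2))

def find_red_states(x_pos, x_neg, alphabet):
    """Promote BLUE strings that are OD from every RED string until none is left."""
    experiments = sorted(suffixes(x_pos | x_neg), key=lambda e: (len(e), e))
    red = [""]
    while True:
        blue = sorted({s + a for s in red for a in alphabet} - set(red),
                      key=lambda s: (len(s), s))
        rows = {p: row(p, experiments, x_pos, x_neg) for p in red + blue}
        promotable = [t for t in blue
                      if all(obviously_different(rows[t], rows[s]) for s in red)]
        if not promotable:
            return red, blue, experiments
        red.append(promotable[0])  # assumption: promote the first candidate

# Hypothetical sample: X+ = {λ, aa}, X- = {a}, Σ = {a, b}
red, blue, experiments = find_red_states({"", "aa"}, {"a"}, "ab")
```

The loop terminates because a string can only be OD from another if its row has a defined entry, which requires it to be a prefix of some string in the finite sample.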