Inductive and statistical learning of formal grammars

Pierre Dupont

Machine Learning
- Goal: to give the learning ability to a machine.
- Design programs whose performance improves over time.
- Inductive learning is a particular instance of machine learning.
- Goal: to find a general law from examples.
- A subproblem of theoretical computer science, artificial intelligence or pattern recognition.

Outline
- Grammar induction definition
- Learning paradigms
- DFA learning from positive and negative examples
- RPNI algorithm
- Probabilistic DFA learning
- Application to a natural language task
- Links with Markov models
- Related problems and future work

Grammar Induction or Grammatical Inference
- Grammar induction is a particular case of inductive learning.
- The general law is represented by a formal grammar or an equivalent machine.
- The set of examples, known as the positive sample, is usually made of strings or sequences over a specific alphabet.
- A negative sample, i.e. a set of strings not belonging to the target language, can sometimes help the induction process.

Example: Data → Induction → Grammar $S \to aSb$, $S \to \lambda$

Grammar Induction: Examples
- Natural language sentence
- Speech
- Chronological series
- Successive actions of a WEB user
- Successive moves during a chess game
- A musical piece
- A program
- A form characterized by a chain code
- A biological sequence (DNA, proteins, ...)

[Figure: chromosome classification; grey density and grey density derivative along the median axis, with the centromere marked, encoded as a string of primitives]
String of primitives: "=====CDFDCBBBBBBBA==cdc==DGFB=cc==... ==cffc=ccc==cd==bcb==dfdc====="

Grammar Induction: a modeling hypothesis
$G_0$ → Generation → Data → Induction → Grammar $G$
- Find $G$ as close as possible to $G_0$.
- The induction process does not prove the existence of $G_0$.
- It is a modeling hypothesis.

Learning paradigms: how to characterize learning?
- Which concept classes can or cannot be learned?
- What is a good example?
- Is it possible to learn in polynomial time?

Identification in the limit
$G_0$ → Generation → Data $d_1, d_2, \ldots, d_n$ → Induction → Grammar $G^*$
- Convergence in finite time to $G^*$.
- $G^*$ is a representation of $L(G_0)$ (exact learning).

PAC Learning
$G_0$ → Generation → Data $d_1, d_2, \ldots, d_n$ → Induction → Grammar $G^*$
- Convergence to $G^*$.
- $G^*$ is close enough to $G_0$ with high probability: Probably Approximately Correct learning.
- Polynomial time complexity.

PAC Learning (continued)
- Define a probability distribution $D$ on a set of strings $\Sigma^{\le n}$:
$$P\left[P_D\left(L(G^*) \,\triangle\, L(G_0)\right) < \epsilon\right] > 1 - \delta$$
- The same unknown distribution $D$ is used to generate the sample and to measure the error.
- The result must hold for any distribution $D$ (distribution-free requirement).
- The algorithm must return a hypothesis in polynomial time with respect to $1/\epsilon$, $1/\delta$, $n$ and $|R(L)|$.

Other learnability results
- Identification in the limit in polynomial time: DFAs cannot be efficiently identified in the limit unless we can ask equivalence and membership queries to an oracle.
- PAC learning: DFAs are not PAC learnable (under some cryptographic limitation assumption) unless we can ask membership queries to an oracle.

Identification in the limit: good and bad news
- The bad one... Theorem. No superfinite class of languages is identifiable in the limit from positive data only.
- The good one... Theorem. Any admissible class of languages is identifiable in the limit from positive and negative data.

PAC learning with simple examples, i.e. examples drawn according to the conditional Solomonoff-Levin distribution
$$P_c(x) = \lambda_c \, 2^{-K(x \mid c)}$$
- $K(x \mid c)$ denotes the Kolmogorov complexity of $x$ given a representation $c$ of the concept to be learned.
- Regular languages are PACS learnable with positive examples only.
- But Kolmogorov complexity is not computable!
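The PAC error term $P_D(L(G^*) \triangle L(G_0))$ is the probability mass, under $D$, of the strings on which the hypothesis and the target disagree. A minimal sketch that estimates it by sampling follows. It assumes a hypothetical distribution $D$ (stop after each symbol with a fixed probability, otherwise append a uniformly drawn symbol), and languages given as Python membership predicates. Both are illustration choices, not part of the PAC framework itself.

```python
import random

def sample_string(alphabet, stop_prob, rng):
    """Draw one string from a hypothetical D: after each step, stop with
    probability stop_prob, otherwise append a uniformly chosen symbol."""
    symbols = []
    while rng.random() >= stop_prob:
        symbols.append(rng.choice(alphabet))
    return "".join(symbols)

def estimate_error(target, hypothesis, alphabet, stop_prob=0.3, n_draws=10_000, seed=0):
    """Monte Carlo estimate of P_D(L(G*) symmetric-difference L(G0)).
    target / hypothesis are membership predicates (hypothetical stand-ins
    for L(G0) and L(G*)); the same D is used for every draw."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(n_draws):
        x = sample_string(alphabet, stop_prob, rng)
        if target(x) != hypothesis(x):  # x lies in the symmetric difference
            disagreements += 1
    return disagreements / n_draws

def even_a(x):
    """Example target language: strings with an even number of a's."""
    return x.count("a") % 2 == 0

print(estimate_error(even_a, even_a, "ab"))          # identical languages
print(estimate_error(even_a, lambda x: True, "ab"))  # over-general hypothesis
```

A PAC learner must make this quantity smaller than $\epsilon$ with probability at least $1 - \delta$ for every $D$, not only for the single distribution simulated here.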

Cognitive relevance of learning paradigms: a largely unsolved question
Learning paradigms seem irrelevant to model human learning:
- Gold's identification in the limit framework has been criticized, as children seem to learn natural language without negative examples.
- All learning models assume a known representation class.
- Some learnability results are based on enumeration.
However, learning models show that:
- an oracle can help;
- some examples are useless, others are good: characteristic samples, typical examples;
- learning well is learning efficiently;
- example frequency matters;
- good examples are simple examples (cognitive economy).

Regular Inference from Positive and Negative Data
- Additional hypothesis: the underlying theory is a regular grammar or, equivalently, a finite state automaton.
- Property. Any regular language $L$ has a canonical automaton $A(L)$ which is deterministic and minimal (minimal DFA).
[Figure: example of a regular language and its canonical automaton]
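The canonical automaton $A(L)$ can be obtained from any complete DFA for $L$ by removing unreachable states and merging equivalent ones. The sketch below uses Moore-style partition refinement, one standard way to minimize a DFA; it is not taken from the slides. The representation (integer states, a dict `delta[(state, symbol)]`, a complete transition function) is an assumption made for illustration.

```python
def minimize(alphabet, delta, start, finals):
    """Moore partition refinement on a complete DFA.
    Returns (number of states of the minimal DFA, block index of each reachable state)."""
    # Keep only the states reachable from the initial state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Initial partition: accepting vs. non-accepting states.
    block = {q: int(q in finals) for q in reachable}
    while True:
        # A state's signature: its block plus the blocks reached on each symbol.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            for q in reachable
        }
        ids = {}
        new_block = {q: ids.setdefault(signature[q], len(ids)) for q in sorted(reachable)}
        # The new partition refines the old one; same block count means it is stable.
        if len(ids) == len(set(block.values())):
            return len(ids), new_block
        block = new_block

# Hypothetical 4-state DFA for "even number of a's" over {a, b}:
# states 0 and 2 (even) are equivalent, as are 1 and 3 (odd).
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 0, (3, "a"): 0, (3, "b"): 1}
n, blocks = minimize("ab", delta, 0, {0, 2})
print(n, blocks)
```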

A few definitions
Definition. A positive sample $S_+$ is structurally complete with respect to an automaton $A$ if, when generating $S_+$ from $A$:
- every transition of $A$ is used at least once;
- every final state is used as accepting state of at least one string.
The positive data can be represented by a prefix tree acceptor (PTA).
[Figure: prefix tree acceptors built from two small positive samples, the second one containing the empty string λ]

A theorem
Theorem 3. If the positive sample is structurally complete with respect to a canonical automaton $A(L_0)$, then there exists a partition $\pi$ of the state set of $PTA$ such that $PTA/\pi = A(L_0)$.

Merging is fun
[Figure: an automaton $A_1$ and the quotient automaton $A_2$ obtained by merging two of its states]
- Merging ⇔ definition of a partition $\pi$ on the set of states.
- If $A_2 = A_1/\pi$ then $L(A_1) \subseteq L(A_2)$: merging states generalizes the language.
- How are we going to find the right partition? Use negative data!
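A minimal sketch of the PTA and of a quotient $A/\pi$ follows. States are the prefixes of $S_+$, and a partition is given as a hypothetical mapping `block_of` from states to block identifiers. Because merging can make the quotient non-deterministic, its transitions map to sets of blocks. The example shows that merging generalizes: the PTA of $\{b, ab\}$ rejects $aaab$, while merging the initial state with state $a$ accepts it.

```python
def build_pta(positive):
    """Prefix tree acceptor: one state per prefix of S+, the empty prefix
    is the initial state and the strings of S+ are the accepting states."""
    states, delta = {""}, {}
    for word in positive:
        for i, symbol in enumerate(word):
            prefix, nxt = word[:i], word[: i + 1]
            delta[(prefix, symbol)] = nxt
            states.add(nxt)
    return states, delta, set(positive)

def quotient(delta, finals, block_of):
    """A/pi: states in the same block are merged. The result may be
    non-deterministic, so each transition maps to a set of blocks."""
    q_delta = {}
    for (q, a), r in delta.items():
        q_delta.setdefault((block_of[q], a), set()).add(block_of[r])
    return q_delta, {block_of[q] for q in finals}

def nfa_accepts(delta, start, finals, word):
    """Parse `word` while tracking the set of reachable states."""
    current = {start}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)

states, delta, finals = build_pta(["b", "ab"])
identity = {q: q for q in states}                # trivial partition: the PTA itself
d0, f0 = quotient(delta, finals, identity)
merged = {"": 0, "a": 0, "b": 1, "ab": 2}        # merge the initial state with state "a"
d1, f1 = quotient(delta, finals, merged)
print(nfa_accepts(d0, "", f0, "aaab"), nfa_accepts(d1, 0, f1, "aaab"))
```

Every string of $S_+$ is still accepted after the merge, which is the inclusion $L(A_1) \subseteq L(A_2)$ at work; only the negative sample can tell whether such a generalization went too far.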

Summary
$A(L_0)$ → Generation → Data → Induction → Grammar; $PTA$ → $\pi$ → $PTA/\pi$?
- We observe some positive and negative data.
- The positive sample $S_+$ comes from a regular language $L_0$.
- The positive sample is assumed to be structurally complete with respect to the canonical automaton $A(L_0)$ of the target language $L_0$ (not an additional hypothesis but a way to restrict the search to reasonable generalizations!).
- We build the Prefix Tree Acceptor of $S_+$. By construction $L(PTA) = S_+$.
- Merging states ⇒ generalize $S_+$.
- The negative sample $S_-$ helps to control over-generalization.
Note: finding the minimal DFA consistent with $S_+, S_-$ is NP-complete!

An automaton induction algorithm
Algorithm Automaton Induction
    input S+                              // positive sample
          S-                              // negative sample
    A ← PTA(S+)                           // PTA
    while (i, j) ← choose_states() do     // Choose a state pair
        if compatible(i, j, S-) then      // Check for compatibility of merging i and j
            A ← A/π_ij
        end if
    end while
    return A

RPNI algorithm
- RPNI is a particular instance of the "generalization as search" paradigm.
- RPNI follows the prefix order in PTA.
- Polynomial time complexity with respect to sample size $(S_+, S_-)$.
- RPNI identifies in the limit the class of regular languages.
- A characteristic sample, i.e. a sample such that RPNI is guaranteed to produce the correct solution, has a quadratic size with respect to $|A(L_0)|$.
- Additional heuristics exist to improve performance when such a sample is not provided.
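Structural completeness can be checked mechanically by parsing $S_+$ with a DFA and recording which transitions and final states are used. A minimal sketch follows. It assumes a trimmed DFA given as a dict `delta[(state, symbol)]` with no sink state, since a sink transition could never be used by a positive string.

```python
def is_structurally_complete(positive, delta, start, finals):
    """True iff parsing S+ uses every transition of the DFA at least once
    and every final state accepts at least one string of S+."""
    used_transitions, used_finals = set(), set()
    for word in positive:
        q = start
        for a in word:
            if (q, a) not in delta:
                return False  # word is not in L(A): S+ cannot come from A
            used_transitions.add((q, a))
            q = delta[(q, a)]
        if q not in finals:
            return False
        used_finals.add(q)
    return used_transitions == set(delta) and used_finals == set(finals)

# Hypothetical canonical DFA for a*b: state 0 loops on a, goes to final state 1 on b.
delta = {(0, "a"): 0, (0, "b"): 1}
print(is_structurally_complete(["b"], delta, 0, {1}))   # the a-loop is never used
print(is_structurally_complete(["ab"], delta, 0, {1}))
```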

RPNI algorithm: pseudo-code
    input  S+, S-
    output A DFA consistent with S+, S-
    begin
        A ← PTA(S+)                               // N denotes the number of states of PTA(S+)
        π ← {{0}, {1}, ..., {N-1}}                // One state for each prefix according to the standard order <
        for i = 1 to |π| - 1                      // Loop over partition subsets
            for j = 0 to i - 1                    // Loop over subsets of lower rank
                π' ← π \ {B_j, B_i} ∪ {B_i ∪ B_j}   // Merging B_i and B_j
                A/π' ← derive(A, π')
                π'' ← determ_merging(A/π')
                if compatible(A/π'', S-) then     // Deterministic parsing of S-
                    π ← π''
                    break                         // Break j loop
                end if
            end for                               // End j loop
        end for                                   // End i loop
        return A/π
    end

Search space characterization
- Conditions on the learning sample to guarantee the existence of a solution
- DFA and NFA in the lattice
- Characterization of the set of maximal generalizations, similar to the G set from Version Space
- Efficient incremental lattice construction is possible
- RPNI algorithm
- Possible search by genetic optimization

[Figure: an execution step of RPNI, showing a sequence of candidate state merges]
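The pseudo-code above can be sketched in Python as follows. This is a minimal, unoptimized rendering under stated assumptions, not a reference implementation: a block is represented by its lowest-ranked state (a union-find style `block` list), `determ_merge` plays the role of `determ_merging` by folding non-deterministic transitions until the quotient is a DFA, and compatibility is checked by deterministic parsing of $S_-$. All function names are hypothetical.

```python
def rpni(positive, negative):
    # PTA states: prefixes of S+ in standard (length-lexicographic) order.
    prefixes = sorted({w[:i] for w in positive for i in range(len(w) + 1)},
                      key=lambda p: (len(p), p))
    index = {p: k for k, p in enumerate(prefixes)}
    edges = [(index[p[:-1]], p[-1], index[p]) for p in prefixes if p]
    finals = {index[w] for w in positive}
    block = list(range(len(prefixes)))  # pi: each state starts in its own block

    def find(b, q):
        """Representative (lowest-ranked state) of q's block."""
        while b[q] != q:
            q = b[q]
        return q

    def determ_merge(b, i, j):
        """Merge the blocks of i and j, then keep merging the targets of
        conflicting transitions until the quotient is deterministic."""
        pending = [(i, j)]
        while pending:
            x, y = (find(b, s) for s in pending.pop())
            if x == y:
                continue
            b[max(x, y)] = min(x, y)  # the lower-ranked block absorbs the other
            seen = {}
            for src, a, dst in edges:
                key, target = (find(b, src), a), find(b, dst)
                if key in seen and seen[key] != target:
                    pending.append((seen[key], target))
                seen.setdefault(key, target)
        return b

    def accepts(b, word):
        """Deterministic parsing of `word` in the quotient A/pi."""
        trans = {(find(b, s), a): find(b, d) for s, a, d in edges}
        q = find(b, 0)
        for a in word:
            if (q, a) not in trans:
                return False
            q = trans[(q, a)]
        return q in {find(b, f) for f in finals}

    for i in range(1, len(prefixes)):
        if find(block, i) != i:
            continue  # state i already belongs to a lower-ranked block
        for j in range(i):
            if find(block, j) != j:
                continue  # only try the representatives of lower-ranked blocks
            candidate = determ_merge(block.copy(), i, j)
            if not any(accepts(candidate, w) for w in negative):
                block = candidate  # compatible merge: keep it
                break              # break j loop
    trans = {(find(block, s), a): find(block, d) for s, a, d in edges}
    return find(block, 0), trans, {find(block, f) for f in finals}

def run_dfa(start, trans, finals, word):
    q = start
    for a in word:
        if (q, a) not in trans:
            return False
        q = trans[(q, a)]
    return q in finals

# Hypothetical target: strings over {a} with an even number of a's.
start, trans, finals = rpni(["", "aa", "aaaa"], ["a", "aaa"])
print(trans, finals)
```

On this toy sample the first candidate merge (states 1 and 0) folds the whole PTA into one state and accepts the negative string $a$, so it is rejected; merging states 2 and 0 then yields the two-state parity automaton.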

Probabilistic DFA
[Figure: a probabilistic DFA with transition and end-of-string probabilities, and the probability it assigns to a string]
- A structural and probabilistic model
- An explicit and noise tolerant theory
- A combined inductive learning and statistical estimation problem
- Learning from positive examples only and frequency information
- Outside of the scope of the previous learning paradigms

A probabilistic automaton induction algorithm
Algorithm Probabilistic Automaton Induction
    input S+                              // positive sample
          α                               // precision parameter
    A ← PPTA(S+)                          // Probabilistic PTA
    while (i, j) ← choose_states() do     // Choose a pair of states
        if compatible(i, j, α) then       // Check for compatibility of merging i and j
            A ← A/π_ij
        end if
    end while
    return A

Probabilistic prefix tree acceptor (PPTA)
[Figure: a probabilistic prefix tree acceptor whose probabilities are relative frequencies computed from the positive sample]

Compatibility criterion
- ALERGIA, RLIPS: two states are compatible (can be merged) if their suffix distributions are close enough.
- MDI: two states are compatible if the prior probability gain of the merged model compensates for the likelihood loss of the data: Bayesian learning (not strictly in this case) based on Kullback-Leibler divergence.
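A PPTA is the PTA annotated with counts: $C(q)$ is the number of strings of $S_+$ passing through state $q$, and $C(q, a)$ the number of times symbol $a$ (or the end-of-string marker $\#$) is used from $q$. A minimal sketch follows, assuming states are prefixes and that `#` does not belong to the alphabet. The relative frequencies $C(q,a)/C(q)$ make the PPTA the maximum likelihood model of $S_+$: each string gets its relative frequency in the sample.

```python
from collections import Counter

def build_ppta(positive):
    """Counts and transition probabilities of a probabilistic PTA.
    '#' marks the end of a string; S+ may contain repeated strings."""
    state_count, trans_count = Counter(), Counter()
    for word in positive:
        for i in range(len(word) + 1):
            prefix = word[:i]
            state_count[prefix] += 1                       # C(q)
            symbol = word[i] if i < len(word) else "#"
            trans_count[(prefix, symbol)] += 1             # C(q, a)
    probs = {(q, a): c / state_count[q] for (q, a), c in trans_count.items()}
    return state_count, trans_count, probs

def string_probability(probs, word):
    """Product of transition probabilities times the end-of-string probability."""
    p = 1.0
    for i, a in enumerate(word):
        p *= probs.get((word[:i], a), 0.0)
    return p * probs.get((word, "#"), 0.0)

_, _, probs = build_ppta(["a", "a", "ab"])
print(string_probability(probs, "a"), string_probability(probs, "ab"))
```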

ALERGIA
- RPNI state merging order.
- Compatibility measure: $q_1$ and $q_2$ are $\alpha$-compatible if
$$\left|\frac{C(q_1,a)}{C(q_1)} - \frac{C(q_2,a)}{C(q_2)}\right| < \sqrt{\frac{1}{2}\ln\frac{2}{\alpha}}\left(\frac{1}{\sqrt{C(q_1)}} + \frac{1}{\sqrt{C(q_2)}}\right), \quad \forall a \in \Sigma \cup \{\#\},$$
and $\delta(q_1,a)$ and $\delta(q_2,a)$ are $\alpha$-compatible, $\forall a \in \Sigma$.
- Remarks: it is a recursive measure of suffix proximity. This measure does not depend on the prefixes of $q_1$ and $q_2$ ⇒ local criterion.

Kullback-Leibler divergence
$$D(P_{A_0} \| P_{A_1}) \stackrel{\text{notation}}{=} D(A_0 \| A_1) = \sum_{x \in \Sigma^*} P_{A_0}(x) \log \frac{P_{A_0}(x)}{P_{A_1}(x)} = -\sum_{x \in \Sigma^*} P_{A_0}(x) \log P_{A_1}(x) - H(A_0)$$
- Likelihood of $x$ given model $A$: $P_A(x) = P(x \mid A)$.
- Cross entropy between $A_0$ and $A_1$: $-\sum_{x \in \Sigma^*} P_{A_0}(x) \log P_{A_1}(x)$.
- When $A_0$ is a maximum likelihood estimate, e.g. the PPTA, cross entropy measures the likelihood loss while going from $A_0$ to $A_1$.

Bayesian learning
- Find a model $\hat{M}$ which maximizes the likelihood of the data $P(X \mid M)$ and the prior probability of the model $P(M)$:
$$\hat{M} = \arg\max_M P(X \mid M) \cdot P(M)$$
- The PPTA maximizes the data likelihood.
- A smaller model (number of states) is a priori assumed more likely.

MDI algorithm
- RPNI state merging order.
- Compatibility measure: small divergence increase (= small likelihood loss) with respect to size reduction (= prior probability increase) ⇒ global criterion:
$$\frac{\Delta(A_1, A_2)}{|A_1| - |A_2|} < \alpha_M$$
- Efficient computation of the divergence increase:
$$D(A_0 \| A_2) = D(A_0 \| A_1) + \Delta(A_1, A_2)$$
$$\Delta(A_1, A_2) = \sum_{q_i \in Q_{12}} \sum_{a \in \Sigma \cup \{\#\}} c_i \, \gamma_1(q_i, a) \log \frac{\gamma_1(q_i, a)}{\gamma_2(q_i, a)}$$
where $Q_{12} = \{q_i \in Q_1 \mid B_{\pi_2}(q_i) \neq B_{\pi_1}(q_i)\}$ denotes the set of states of $A_1$ which have been merged to get $A_2$ from $A_1$.
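The ALERGIA compatibility measure above is a Hoeffding-bound test applied recursively to the suffix trees of two PPTA states. The following minimal sketch computes the PPTA counts $C(q)$ and $C(q,a)$ itself, with prefixes as states and `#` as end marker. One assumption is labeled in the code: a state with no observed strings is treated as compatible, because it offers no evidence against the merge.

```python
import math
from collections import Counter

def counts(positive):
    """C(q) and C(q, a) of the PPTA, with '#' as the end-of-string marker."""
    c_state, c_trans = Counter(), Counter()
    for w in positive:
        for i in range(len(w) + 1):
            c_state[w[:i]] += 1
            c_trans[(w[:i], w[i] if i < len(w) else "#")] += 1
    return c_state, c_trans

def different(f1, n1, f2, n2, alpha):
    """Hoeffding test: f1/n1 and f2/n2 differ by at least the bound."""
    bound = math.sqrt(0.5 * math.log(2 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) >= bound

def compatible(q1, q2, c_state, c_trans, alphabet, alpha):
    """Recursive alpha-compatibility of two PPTA states (suffix proximity)."""
    n1, n2 = c_state[q1], c_state[q2]
    if n1 == 0 or n2 == 0:
        return True  # assumption: no data, no evidence against merging
    for a in list(alphabet) + ["#"]:
        if different(c_trans[(q1, a)], n1, c_trans[(q2, a)], n2, alpha):
            return False
    # Successors on every symbol of the alphabet must be compatible too.
    return all(compatible(q1 + a, q2 + a, c_state, c_trans, alphabet, alpha)
               for a in alphabet)

sample = ["aa"] * 10 + ["ab"] * 10 + ["ba"] * 10 + ["bb"] * 10
cs, ct = counts(sample)
print(compatible("a", "b", cs, ct, "ab", 0.05))   # identical suffix distributions
print(compatible("a", "aa", cs, ct, "ab", 0.05))  # end-of-string frequency 0 vs 1
```

Only counts below $q_1$ and $q_2$ are consulted, which is why the slide calls this a local criterion, in contrast with the global divergence-based criterion of MDI.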

Comparative results
[Figure: perplexity as a function of training sample size for ALERGIA and MDI]
- Perplexity measures the prediction power of the model: the smaller, the better.

Natural language application: the ATIS task
Air travel information system, spontaneous American English:
"Uh, I'd like to go from, uh, Pittsburgh to Boston next Tuesday, no wait, Wednesday."
- Lexicon (alphabet): 9 words
- Learning sample: 3 sentences, 33 words
- Validation set: 9 sentences, 3 words
- Test set: sentences, 3 words

Perplexity
- $P(x_i^j \mid q^i)$: probability of generating $x_i^j$, the $i$-th symbol of the $j$-th string, from state $q^i$.
$$LL = -\frac{1}{\|S\|} \sum_{j=1}^{|S|} \sum_{i=1}^{|x^j|} \log_2 P(x_i^j \mid q^i), \qquad PP = 2^{LL}$$
where $\|S\|$ denotes the total number of symbols in the test sample $S$.
- $PP = 1$: perfectly predictive model.
- $PP = |\Sigma|$: uniform random guessing over $\Sigma$.
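The perplexity formula can be computed directly once a model supplies $P(x_i^j \mid q^i)$. In the minimal sketch below, that model is a hypothetical callback `model_prob(prefix, symbol)`, standing in for the probability of emitting `symbol` from the state reached after parsing `prefix`. Logarithms are in base 2 so that $PP = 2^{LL}$.

```python
import math

def perplexity(model_prob, test_strings):
    """PP = 2^LL with LL = -(1/||S||) * sum_j sum_i log2 P(x_i^j | q^i).
    model_prob(prefix, symbol) is a hypothetical model callback."""
    total_log, total_symbols = 0.0, 0
    for x in test_strings:
        for i, symbol in enumerate(x):
            total_log += math.log2(model_prob(x[:i], symbol))
            total_symbols += 1          # ||S|| counts every symbol of S
    ll = -total_log / total_symbols
    return 2 ** ll

# Uniform guessing over a 4-symbol alphabet gives PP = |Sigma| = 4.
print(perplexity(lambda prefix, symbol: 0.25, ["abcd", "ab"]))
```

A model that assigns probability 1 to every observed symbol reaches the lower bound $PP = 1$. Any symbol given probability 0 makes the logarithm undefined, which is one practical reason for the smoothing problem discussed below.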

Equivalence between PNFA and HMM
- Probabilistic non-deterministic automata (PNFA), with no end-of-string probabilities, are equivalent to Hidden Markov Models (HMMs).
[Figure: a PNFA, an equivalent HMM with emission on transitions, and an equivalent HMM with emission on states]

Links with Markov chains
- A subclass of regular languages: the k-testable languages in the strict sense (k-TSS).
- A k-TSS language is generated by an automaton such that all subsequences sharing the same last $k-1$ symbols lead to the same state.
- Example: $\hat{p}(a \mid b) = \frac{C(ba)}{C(b)}$.
- A probabilistic k-TSS language is equivalent to a $(k-1)$-order Markov chain.
- There exist probabilistic regular languages not reducible to Markov chains of any finite order.
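The link with Markov chains can be made concrete by estimating a probabilistic k-TSS model as a Markov chain whose state is the last $k-1$ symbols. With $k = 2$ this is the bigram estimate $\hat{p}(a \mid b) = C(ba)/C(b)$. In the minimal sketch below, $C(b)$ counts the occurrences of $b$ used as a conditioning context, i.e. followed by another symbol. Contexts are simply shortened at the start of a string, and end-of-string events are left out; both are simplifying assumptions.

```python
from collections import Counter

def estimate_ktss(sample, k):
    """Maximum-likelihood (k-1)-order Markov chain estimate:
    p(a | context) = C(context + a) / C(context used as a context)."""
    context_count, ngram_count = Counter(), Counter()
    for w in sample:
        for i, a in enumerate(w):
            context = w[max(0, i - (k - 1)):i]  # last k-1 symbols (shorter at start)
            context_count[context] += 1
            ngram_count[(context, a)] += 1
    return {(c, a): n / context_count[c] for (c, a), n in ngram_count.items()}

p = estimate_ktss(["ab", "bb", "ba"], k=2)
print(p[("b", "a")], p[("a", "b")])  # p(a | b) and p(b | a)
```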

The smoothing problem
- A probabilistic DFA defines a probability distribution over a set of strings.
- Some strings are not observed on the training sample but they could be observed ⇒ their probability should be strictly positive.
- The smoothing problem: how to assign a reasonable probability to (yet) unseen random events?
- Highly optimized smoothing techniques exist for Markov chains.
- How to adapt these techniques to more general probabilistic automata?

Related problems and approaches
I did not talk about:
- other induction problems (NFA, CFG, tree grammars, ...)
- heuristic approaches such as neural nets or genetic algorithms
- how to use prior knowledge
- smoothing techniques
- how to parse natural language without grammars (decision trees)
- how to learn transducers
- benchmarks, applications
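To make the smoothing problem concrete, the sketch below applies additive (add-δ, or Laplace when δ = 1) smoothing to the conditional estimates of one state, here a Markov-chain context. This is one of the simplest smoothing techniques. It is not one of the highly optimized techniques alluded to above, and it is not a method proposed in these slides. The `counts` argument is a hypothetical mapping from `(context, symbol)` to observed counts.

```python
from collections import Counter

def additive_smoothing(counts, alphabet, context, delta=1.0):
    """Add-delta estimate of p(a | context): every symbol, seen or unseen,
    receives a strictly positive probability, and the estimates sum to 1."""
    total = sum(counts.get((context, a), 0) for a in alphabet)
    return {a: (counts.get((context, a), 0) + delta) / (total + delta * len(alphabet))
            for a in alphabet}

# 'a' was seen 3 times after 'b'; 'b' after 'b' was never observed.
counts = Counter({("b", "a"): 3})
print(additive_smoothing(counts, "ab", "b"))  # the unseen event gets 1/5
```

The open question raised on the slide is how to carry such corrections over to general probabilistic automata, where the states are learned by merging rather than fixed by the last k − 1 symbols.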

Ongoing and future work
- Definition of a theoretical framework for inductive and statistical learning
- Links with HMMs: parameter estimation, structural induction
- Smoothing techniques improvement ⇒ key issue for practical applications
- Applications to probabilistic modeling of proteins
- Automatic translation
- Applications to text categorization or text mining
