Context-Free Language Induction by Evolution of Deterministic Push-Down Automata Using Genetic Programming


Afra Zomorodian
Computer Science Department
Stanford University
P. O. Box 7171
Stanford, CA 94309
afra@cs.stanford.edu

Abstract

The process of learning often consists of Inductive Inference, making generalizations from samples. The problem here is finding generalizations (Grammars) for Formal Languages from finite sets of positive and negative sample sentences. The focus of this paper is on Context-Free Languages (CFL's) as defined by Context-Free Grammars (CFG's), some of which are accepted by Deterministic Push-Down Automata (D-PDA). This paper describes a meta-language for constructing D-PDA's. This language is then combined with Genetic Programming to evolve D-PDA's which accept languages. The technique is illustrated with two favorite CFL's.

1 Introduction and Overview

Intelligent behavior often consists of Inductive Inference, making generalizations from sample incidents. The problem discussed here is finding generalizations for Formal Languages from finite sets of positive and negative sample sentences. A positive sentence is defined to be a sentence accepted by the grammar of a language and hence included in the language. A negative sentence is defined accordingly.

Chomsky (Chomsky 1962) divides all languages and automata into four classes, from the most powerful (type 0, corresponding to Turing Machines and unrestricted grammars) to the least powerful (type 3, corresponding to Finite State Machines and Regular Expressions). Context-Free Languages and Grammars (type 2) are of fundamental importance in Computer Science since high-level procedural programming languages, such as Pascal or C, can be represented by CFG's and parsed by Deterministic Push-Down Automata (D-PDA).

A D-PDA is defined to be a finite-state machine coupled with a bottomless stack. The machine is capable of reading letters off an input tape, pushing symbols onto a stack, and popping them off the stack. The machine rejects the sentence if it reads off the end of the sentence (beyond the end-of-sentence marker) or if it reaches the reject state.
It accepts the sentence if it reaches an accept state, regardless of the status of its stack.

Automatic generation of D-PDA's would be advantageous in the analysis of new programming languages. An automatically generated D-PDA could educate the researcher about the idiosyncrasies or faults of the language and provide alternative, unknown ways of parsing the language. Another motivation for the desirability of automatically defined D-PDA's is from a theoretical standpoint. Evolution of D-PDA's can be viewed as a step toward the evolution of Turing Machines capable of learning more complex languages.

In this paper, a language is developed that is capable of encoding the structure of any automaton in the space of D-PDA's. This language is then successfully used along with Genetic Programming (GP) to evolve D-PDA's which accept languages.

The remainder of this paper is organized as follows: Section 2 describes some previous work on language induction. Section 3 provides some motivation for using Genetic Programming. Section 4 presents a new approach to GP language induction and describes a meta-language. Section 5 gives results of the application of the work on two CFL's. Section 6 presents a discussion of the lessons learned in this research. Section 7 provides some future directions and Section 8 concludes this paper.

2 Previous Work

Much work has been carried out in language inference since Gold (Gold 1967) initially established the possibility of inferring languages from sentences. The inference methods have included enumerative methods, hill climbing (Tomita 1982), higher order enhanced recurrent neural nets (Giles 1990), second-order recurrent neural nets (Watrous 1992), and Genetic Programming (Dunay 1994). Most of this work consists in inferring Regular Languages and the corresponding Deterministic Finite Automata (DFA's).

Tomita (Tomita 1982) was successful in evolving DFA's using hill climbing and a static number of states. He did, however, encounter a problem when his algorithm failed to find the correct machine by climbing a local hill.

Dunay, Petry, and Buckles (Dunay et al. 1994) realized the usefulness of GP for inferring Regular Languages by representing DFA's as S-expressions and allowing GP to determine the number of states needed. In their translation scheme, the program generated is never evaluated as a program but is the DFA. (This is analogous to the DNA being the living creature.) A problem that emerged, however, was that back pointers, pointers from states to previous states, were hard to evolve in their translation scheme. In other words, the system was not capable of evolving DFA's which accept regular languages with many backpointers.

The problem faced by Dunay and others stems from the fact that a state machine is a directed graph and an S-expression is a tree. Dunay's solution to this problem was a translation scheme to convert DFA's to S-expressions. In this scheme, DFA's with backpointers are hard to evolve. These backpointers represent conditional finite and infinite loops, which are basic building blocks of state machines.

3 Motivation For Using Genetic Programming

In language induction, we are essentially searching the entire space of D-PDA's to find a generalized solution which has learned a language. Genetic Algorithms are effective search algorithms which are less prone than hill climbing (Tomita 1982) to getting trapped in local minima. Conventional genetic algorithms, however, are not directly applicable to the problem at hand because their chromosomes have constant length and the number of states in the solution D-PDA is unknown. Genetic Programs may be viewed as Genetic Algorithms with variable-sized chromosomes.

4 A New Approach

GP is not directly applicable to the generation of D-PDA's. As Dunay explains, "[u]nless a mechanism can be added to the GP to more easily account for back pointers, there will continue to be simple DFA's which are difficult to evolve using their translation scheme" (Dunay et al. 1994). Traditionally, S-expressions are used in GP because the cross-over of two S-expressions is always syntactically valid. S-expressions, however, cannot represent loops and backpointers elegantly. One possible solution to the backpointer problem is the invention of another set of languages which have closure under the cross-over operation. Another approach would be to add a level of indirection into the GP pipeline. The latter approach is presented in this paper.

4.1 APDAL: A PDA Language

The approach taken here is different from previous work done in this field in that the GP generated program is not the solution but rather the constructing program for the solution.
The encoding language is a meta-language which describes the structure of a D-PDA. This adds a level of indirection to the traditional pipeline of GP. Figure 1 shows a partial representation of the GP pipeline. In each generation, a D-PDA is constructed by running an APDAL program from the population. Once constructed, the D-PDA is asked to evaluate a number of sentences. The program's fitness depends on its D-PDA's performance inasmuch as an animal's DNA fitness depends on the animal's survival.

[Figure 1: GP problem solving pipeline. Population of Programs, D-PDA construction, D-PDA, Fitness Evaluation on positive (+) and negative (-) Sentences.]

The encoding language, APDAL, consists of a number of macros. A Lisp-like macro is executed before any of its parameters are. In APDAL, the parameters are macros themselves which create D-PDA states or pseudo-states (as defined below) and attach their creation to the state or pseudo-state created by the calling macro. The APDAL macros are:

READ (arity 3): creates a read state and attaches it to the previous state. The read state has pointers to three other states as specified by its three parameter macros. A read state reads a character off a sample sentence and, depending on the character (a, b, or empty), chooses a transition to another state.
POP (arity 3): like READ, except the created state reads a character off the D-PDA's stack.
PUSHA, PUSHB (arity 1): creates a pseudo-state which pushes an A or B onto the stack, respectively. A pseudo-state is a state which can only be pointed to by one state.
REJECT, ACCEPT (arity 0): attaches the calling macro's state to the reject or accept state. For example, if REJECT is the first argument to READ, it will point the read state's first pointer to the reject state.

A special scheme is used to solve the backpointer problem: during the assembly of a D-PDA from an APDAL program, the states are numbered cumulatively as they are created. The pseudo-states created by PUSHA and PUSHB are not numbered. Furthermore, a variable (let's call it LASTSTATE) keeps track of the last state created. The following two special macros allow an APDAL program to change the value of this variable:

DEC (arity 1): DEC is a special operator which does not create a state, but decrements LASTSTATE. Note: DEC always decrements LASTSTATE regardless of whether a TOLAST call will be used or not. If the current state is the start state, LASTSTATE is not decremented.
TOLAST (arity 0): This macro attaches the calling macro's state to the state specified by LASTSTATE.

Therefore, any state can point to any of the previously created states by using DEC to decrement LASTSTATE, and TOLAST to achieve the connection. The value of LASTSTATE is updated to be the number of the last state created every time READ, POP, or TOLAST is called.

[Figure 3: Start State]

[Figure 4: Start and Read states]

4.2 An Example: a^n b^n

The best way to become comfortable with APDAL is to see the development of a D-PDA from an APDAL program. Consider the program in Figure 2. This program develops a D-PDA for the language a^n b^n (n a's followed by n b's).

(PUSHA (TOLAST)) (REJECT) (DEC (TOLAST)) (REJECT) (REJECT) (ACCEPT))) (REJECT) (REJECT)) (TOLAST))
Figure 2: Program for a^n b^n

[Figure 5: Special Macro TOLAST. States shown: Start, Read, Push.]

The construction of any D-PDA begins with a default Start State (Figure 3). The first macro in the program is a READ macro which creates a read state and attaches it to the Start state (Figure 4). Note that the read state has three state-transition pointers for characters a, b, and nothing (end of line marker). The states created by the READ macro's three parameters will attach themselves to these pointers. The next macro is a PUSHA macro which creates a pseudo-state to push an a onto the D-PDA's stack if an a was read by the read state (Figure 5). PUSHA is a one-arity macro and its parameter is a call to TOLAST which connects the push state's pointer to the read state, the last real state created.
Execution of the entire APDAL program in Figure 2 results in the D-PDA in Figure 6. The interested reader should confirm that the D-PDA is in fact a state machine accepting the CFL a^n b^n.

[Figure 6: D-PDA for CFL a^n b^n (see Figure 2)]

4.3 Implementation

APDAL was implemented and debugged in ANSI C using Symantec 7.0 on a Macintosh. The code was ported to a Sun SPARCstation 20 Model 61 and compiled using GNU's gcc. DGPC (Dave's Genetic Programming Code) by David André was used to generate and evolve APDAL programs.
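As a minimal sketch of the assembly scheme of Section 4.1, the following Python fragment builds a D-PDA from an APDAL program and runs it on sentences. It is not the ANSI C implementation described above. The names parse, Builder and accepts, the use of 0 to denote the start state, and the reading of DEC as never decrementing below the start state are assumptions made for illustration. The program string is likewise an assumed reconstruction of Figure 2, with READ and POP placed where arity-3 macros are required.

```python
import re


def parse(src):
    """Parse an APDAL S-expression into nested lists, e.g. ['READ', [...], ...]."""
    tokens = re.findall(r"[()]|[A-Z]+", src)
    pos = 0

    def expr():
        nonlocal pos
        assert tokens[pos] == "("
        node = [tokens[pos + 1]]
        pos += 2
        while tokens[pos] != ")":
            node.append(expr())
        pos += 1
        return node

    return expr()


class Builder:
    """Assembles a D-PDA; numbered states are READ/POP, 0 is the start state."""

    def __init__(self):
        self.states = {}   # number -> (kind, [target on a, on b, on empty])
        self.created = 0   # number of the last state created
        self.last = 0      # LASTSTATE

    def build(self, node):
        """Run one macro and return the transition target it attaches."""
        op, args = node[0], node[1:]
        if op in ("ACCEPT", "REJECT"):
            return op
        if op in ("READ", "POP"):
            # A macro runs before its parameters: create the state first.
            self.created += 1
            number = self.last = self.created
            targets = [None, None, None]
            self.states[number] = (op, targets)
            for i, arg in enumerate(args):
                targets[i] = self.build(arg)
            return number
        if op in ("PUSHA", "PUSHB"):
            # Unnumbered pseudo-state: push a symbol, then move on.
            return ("PUSH", op[-1].lower(), self.build(args[0]))
        if op == "DEC":
            if self.last > 0:          # assumed: never go below the start state
                self.last -= 1
            return self.build(args[0])
        if op == "TOLAST":
            target = self.last
            self.last = self.created   # LASTSTATE is reset after TOLAST
            return target
        raise ValueError("unknown macro " + op)


def accepts(entry, states, sentence, max_steps=75):
    """Simulate the D-PDA on a sentence over {a, b} for at most max_steps steps."""
    stack, pos, target = [], 0, entry
    for _ in range(max_steps):
        if target in ("ACCEPT", "REJECT"):
            return target == "ACCEPT"
        if isinstance(target, tuple):      # push pseudo-state
            _, symbol, target = target
            stack.append(symbol)
        elif target == 0:                  # back pointer to the start state
            target = entry
        else:
            kind, targets = states[target]
            if kind == "READ":
                if pos > len(sentence):    # read beyond the end marker
                    return False
                symbol = sentence[pos] if pos < len(sentence) else ""
                pos += 1
            else:                          # POP on a bottomless stack
                symbol = stack.pop() if stack else ""
            target = targets["ab".index(symbol) if symbol else 2]
    return False                           # ran out of time


# Assumed reconstruction of the Figure 2 program (not verbatim from the paper).
PROGRAM = ("(READ (PUSHA (TOLAST)) (POP (READ (REJECT) (DEC (TOLAST)) "
           "(POP (REJECT) (REJECT) (ACCEPT))) (REJECT) (REJECT)) (TOLAST))")
builder = Builder()
entry = builder.build(parse(PROGRAM))
for s in ["", "ab", "aabb", "aab", "abb", "ba"]:
    print(repr(s), accepts(entry, builder.states, s))
```

Under these assumptions the sketch accepts the empty sentence, ab and aabb, and rejects aab, abb and ba. The step limit plays the role of the allotted time T, so a machine that loops without reaching the accept or reject state is rejected.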

Table 1: Genetic Programming Parameters

Objective: Evolve a program whose output is a D-PDA which accepts a CFL.
Terminal Set: ACCEPT, REJECT, TOLAST
Function Set and arity: READ (3), POP (3), PUSHA (1), PUSHB (1), DEC (1)
Fitness Cases: For each developed D-PDA: a number of positive and negative samples from the language.
Raw Fitness: $(1 - C)/2$, where
$$C = \frac{N_{tp} N_{tn} - N_{fn} N_{fp}}{\sqrt{(N_{tn} + N_{fn})(N_{tn} + N_{fp})(N_{tp} + N_{fn})(N_{tp} + N_{fp})}}$$
and $N_{tp}$ = Number of True Positives, $N_{tn}$ = Number of True Negatives, $N_{fp}$ = Number of False Positives, $N_{fn}$ = Number of False Negatives.
Standard Fitness: Same as Raw Fitness.
Hits: The number of fitness cases for which the D-PDA produces the right output ($N_{tp} + N_{tn}$).
Wrapper: None.
Other Params.: M = 3000 individuals, T = 75 state transitions, G = 200 generations, Crossover(node) = 80%, Crossover(leaves) = 10%, Mutation = 0%
Success Predicate: The Standard Fitness equals zero.

(PUSHB ) (PUSHB (DEC (REJECT )) (DEC ) (REJECT ) ) (REJECT )) )
Figure 7: Solution at generation 11 for a^n b^n

[Figure 8: An exact solution for a^n b^n. States shown: Start, Read, Read, Push, Push. Annotations: The D-PDA uses "b"s to count the number of "a"s. Useless branch: it will never get here since it pushes b's on the stack. For some reason, the D-PDA is concerned with not leaving any "a"s on the stack. But there are no "Push a" states.]

4.4 Genetic Programming Parameters

Table 1 summarizes the GP parameters that were used for the first language discussed in the Results section. Similar parameters were used for the other language.

5 Results

Two languages were chosen for this research. The languages are a^n b^n and balanced parentheses. Both languages are favorites in Computer Science as examples of CFL's and are frequently used to demonstrate the difference between a Regular Language and a CFL by the Pumping Lemma for Regular Languages (Hopcroft and Ullman 1979).

5.1 a^n b^n

a^n b^n is a very small language since, for any n, the fraction of strings of length up to 2n belonging to this language is
$$\frac{n + 1}{2^{2n+1} - 1}$$
since the language only accepts one string from every schema of even length and none from a schema of odd length.
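The fraction (n + 1)/(2^(2n+1) - 1) can be checked by brute force for small n. The sketch below enumerates every string over {a, b} of length up to 2n and counts the members of a^n b^n. The function names are illustrative assumptions, not part of the APDAL system.

```python
from itertools import product


def in_anbn(s):
    """True if s consists of k a's followed by k b's for some k >= 0."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


def fraction_bruteforce(n):
    """Count (members, all strings) over {a, b} with length 0..2n."""
    members = total = 0
    for length in range(2 * n + 1):
        for letters in product("ab", repeat=length):
            total += 1
            members += in_anbn("".join(letters))
    return members, total


def fraction_closed_form(n):
    """Numerator and denominator of (n + 1) / (2^(2n+1) - 1)."""
    return n + 1, 2 ** (2 * n + 1) - 1


for n in range(5):
    print(n, fraction_bruteforce(n), fraction_closed_form(n))
```

The denominator counts all strings of length 0 through 2n, and the numerator counts one member for each even length, including the empty string.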
A set of ten positive and nineteen negative sample sentences was chosen for the language. The number of negative samples is higher because the language is very restrictive (most sentences do not belong to the language). In a sample run, the best-of-generation for generation 0 had a fitness of 0.33515 with 16 hits. The solution emerged in generation 11 and is shown in Figure 7. The D-PDA generated by the program is shown in Figure 8. Note the following: the D-PDA uses b's to count a's. While this is bizarre to a human, it is only so because humans associate counting a's with the symbol a. The machine does not differentiate between symbols and uses one for counting purposes. Also note that the solution in Figure 7 is not only an exact solution, it is almost the same as the human-generated one in Figure 6!
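To make the fitness values reported above concrete, the correlation-based fitness of Table 1 can be sketched as follows. The function names and the treatment of a zero denominator (taking C = 0) are assumptions made for illustration. The square root in the denominator assumes C is the usual correlation coefficient of a two-by-two table of outcomes.

```python
from math import sqrt


def standard_fitness(ntp, ntn, nfp, nfn):
    """Return (1 - C) / 2, which is 0 for a perfect D-PDA and 1 for the worst one."""
    denom = sqrt((ntn + nfn) * (ntn + nfp) * (ntp + nfn) * (ntp + nfp))
    # Assumed convention: an undefined correlation counts as C = 0.
    c = (ntp * ntn - nfn * nfp) / denom if denom else 0.0
    return (1 - c) / 2


def hits(ntp, ntn):
    """Number of fitness cases classified correctly."""
    return ntp + ntn


# Ten positive and nineteen negative samples, all classified correctly.
print(standard_fitness(10, 19, 0, 0), hits(10, 19))
# A D-PDA that rejects every sentence scores 19 hits but gains no fitness.
print(standard_fitness(0, 19, 0, 10), hits(0, 19))
```

Unlike a plain hit count, this measure does not reward a machine for rejecting everything, which matters when negative samples outnumber positive ones.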

5.2 Balanced Parentheses

For balanced parentheses, the symbol a was used to denote ( and the symbol b was used to denote ). There were 11 positive samples and 12 negative samples. The best-of-generation for generation 0 had a fitness of 0.28035 with 15 hits. At generation 24, a solution emerged. The program is shown in Figure 9 and the generated D-PDA in Figure 10.

The solution generated for balanced parentheses is rather clever. It has the first pop state in an unfortunate position (right after the start state). This pop state checks whether there are enough open parentheses in the sentence by popping the stack. When the machine is activated, however, the stack is empty. The D-PDA's solution is to push a symbol on the stack in order to reach the read state. Another interesting aspect of this D-PDA is that it pushes a b and an a onto the stack for every open parenthesis. It uses the symbol b to count the number of open parentheses, so before making any decisions in a pop state, it always pops off all the a's. Other than the misplaced pop state, the solution has exactly the same number of active states as the human-generated solution and it has the same structure too!

6 Discussion

The successful generation of D-PDA's using APDAL and GP was the fruit of the lessons learned during the design of APDAL and the many unsuccessful runs which led to design and parameter changes. I will now briefly discuss some of the issues raised by the APDAL-GP system.

A fast implementation of APDAL is an important concern. The memory for the states in the program is therefore not dynamically allocated but is taken off a static global array of states. The same memory is then used for all the individuals and for all the generations. Thus, the system can be used on computers with a small amount of memory and is also faster since it is not allocating and deallocating memory hundreds of thousands of times per run.

Another interesting implementation issue is how the states are connected.
The final implementation uses a stack: every state is responsible for attaching itself to the last state by popping the stack, and in turn allows other states to be connected to it by pushing its successor pointers onto the stack. The stack is not needed by DGPC but is included so that the language can be used to generate D-PDA's for other purposes.

At first, the standard fitness of an individual was defined to be the maximum number of hits minus the individual's number of hits. Because of this definition, however, the number of positive and negative sample sentences had to be the same (ten sample sentences were used for each category). In addition, a population of 1000 was used for 100 generations with an allotted time of 100 ticks per string (M = 1000, G = 100, T = 100). The first successful individual for the language a^n b^n was evolved on the second run on generation 17, as shown in Figure 11.

(PUSHB (TOLAST (PUSHB (PUSHA )) (DEC ) (REJECT ) (REJECT )) (PUSHB ) ) (REJECT )))
Figure 9: Solution at generation 24 for Balanced Parentheses

[Figure 10: Clarifying Diagram of solution for Balanced Parentheses. States shown: Start, Push, Read, Push, Push, Push. Annotations: The D-PDA gets rid of the useless a's it pushes on the stack. It is counting with the symbol "b". Intelligent behavior: the D-PDA pushes a symbol on the stack in order to reach the Read State! The D-PDA pops off all the "a"s because it is counting with b's and wants to see if there are any "b"s left on the stack. If there are, there were too many "("s. This is the reject state for too many b's (too many ")"s). The D-PDA pushes an "a" and "b" on the stack for every Read. This is useless since there is always an "a" on top of "b" in the stack.]

(PUSHA(PUSHA(TOLAST))(PUSHA(PUSHB(TOLAST)))(DEC(TOLAST))(REJECT)(ACCEPT))(REJECT)(ACCEPT)(REJECT)))(PUSHB(REJECT))(REJECT)(TOLAST)(ACCEPT))(PUSHB(TOLAST))))(REJECT)(ACCEPT)(TOLAST)))(PUSHA(REJECT))(DEC(TOLAST)))(PUSHA(PUSHA(REJECT)))(PUSHB(REJECT))(DEC(TOLAST)))(DEC(PUSHA(ACCEPT)))))
Figure 11: First successful individual, Generation 17, Second Run

The individual is not only long and composed of many more states than needed, but is also defective. The discerning user can observe that it accepts a string which is clearly not in the language a^n b^n. The following probable causes for the evolution of this defective individual were identified:

The bottomless stack (a stack which always returns an empty-stack symbol when popped while empty) allowed the individual to get large and allowed for defects.
The allotted time of 100 ticks allowed large individuals to take a long time and successfully process the sample sentences (the longest time the human solution takes for a string is 34 ticks). The individuals should be punished more for running out of time, hence increasing the selective pressure to remain small.
The language is very restrictive and more negative samples are needed.

The last cause seemed the most probable, and the lack of enough negative samples had always been a disturbing factor. The author noted that the string was accepted by the D-PDA because there were not enough negative samples with more of one symbol than the other, and that the D-PDA had simply learned the samples. The standard fitness was changed to involve correlation, which allows for different numbers of positive and negative samples. Nine more negative sentences, with more of one symbol than the other or starting with b's, were added.

After making these changes, the runs became unsuccessful. Many of the runs would have best-of-generation individuals with up to 23 hits, but none of the runs were able to solve the problem. The number of generations was increased to 200 with no result and then up to 250 without any change. The author observed, however, that there were usually no improvements in fitness after about generation 150. This meant the population size was too small. The population size was increased to 2000 and fitness improved remarkably. A solution was found when the population was raised to 3000.

Having a large population means that a solution could presumably be found by random chance (essentially changing our search algorithm from the Genetic Algorithm to Random Search).
Fitness was set to be the same for all the individuals in the population (fitness = 1) for several runs. No solution emerged from this random search. Further, no improvements were seen in the number of hits of individuals in the generations. Therefore, the Genetic Algorithm was responsible for generating solutions.

The allowed time for processing a string was then reduced from 100 to 75 and a solution was still found, except it was smaller. The number of possible nodes in the Result-Producing Branch (RPB) was also lowered from 50 to 30 and the result was even smaller programs.

The lessons learned here were:

A small number of nodes in the RPB along with a small allotted time for string processing would apply enough selective pressure to keep the programs small.
The problem space is too big to be solved with a population of 1000. Further, 100 generations is enough to solve the problems.
The search space is extremely large (a very rough estimate of 8^30) and a population of 3000 will not help the random search algorithm to find a solution to the D-PDA problem.

7 Future Work

Future directions include:

Testing with a variety of CFL's.
Analysis of the dependability of the APDAL-GP system to develop D-PDA's for deterministic CFL's.
Extension of APDAL to handle languages with more than 2 symbols (with eventual extension to handle any language regardless of number of symbols).
Evolution of more complex state machines from the Chomsky Hierarchy.

8 Conclusions

Genetic Programming with the APDAL encoding language is a powerful system to develop D-PDA's which learn a CFL from a small set of sample sentences from the language. The contribution of this research is the encoding of D-PDA's into a Lisp-like language and the application of GP to inferring CFL's.

Acknowledgments

The idea for APDAL was inspired by Frédéric Gruau's Genetic Micro Programming of Neural Networks (Gruau 1994). I would like to thank Professor Koza for persuading me that this project topic was superior to my other ones, and for his many helpful comments. I would also like to thank David André and Scott Brave for DGPC.
Finally, my thanks go to my advisor Maggie Johnson for introducing me to CS Theory and for her incredible notes on the subject.

References

Chomsky, Noam. 1962. Context Free Grammar and Pushdown Storage. Quarterly Progress Report 65. The MIT Research Laboratory in Electronics. MIT.

Dunay, B. D., Petry, F. E., and Buckles, W. P. 1994. Regular language induction with genetic programming. In Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE Press. Volume I. Pages 396-400.

Giles, C. L., Miller, C. B., Chen, D., Chen, H. H., Sun, G. Z., and Lee, Y. C. 1992. Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks. Neural Computation 4. Pages 393-405.

Gold, E. M. 1967. Language Identification in the Limit. Information and Control 10. Pages 447-474.

Gruau, Frédéric. 1994. Genetic micro programming of neural networks. In Kinnear, Kenneth E. Jr. (editor). Advances in Genetic Programming. Cambridge, MA: The MIT Press. Pages 495-518.

Holland, John H. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press.

Hopcroft, J., and Ullman, J. 1979. Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison Wesley.

Koza, John R. 1992. Genetic Programming: On The Programming of Computers by Natural Selection. Cambridge, MA: MIT Press.

Tomita, Masaru. 1982. Dynamic Construction Of Finite-State Automata From Examples Using Hill-Climbing. Proceedings of the Fourth Annual Cognitive Science Conference. Pages 105-108.

Watrous, Raymond L., and Kuhn, Gary M. 1992. Induction of Finite-State Languages Using Second-Order Recurrent Networks. Neural Computation 4. Pages 404-414.