Tournament selection in zeroth-level classifier systems based on average reward reinforcement learning


Zang Zhaoxiang, Li Zhao, Wang Junying, Dan Zhiping
(Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei, China)

Abstract: As a genetics-based machine learning technique, the zeroth-level classifier system (ZCS) is based on a discounted reward reinforcement learning algorithm, the bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large ones. Undiscounted reinforcement learning methods are available, such as R-learning, which optimize the average reward per time step. In this paper, R-learning is used as the reinforcement learning component of ZCS, replacing its discounted reward reinforcement learning approach, and tournament selection is used to replace roulette wheel selection in ZCS. The modification results in classifier systems that can support long action chains and are thus able to solve large multi-step problems.

Key words: average reward; reinforcement learning; R-learning; learning classifier systems (LCS); zeroth-level classifier system (ZCS); multi-step problems

1 Introduction

Learning Classifier Systems (LCSs) are rule-based adaptive systems which use a Genetic Algorithm (GA) and other machine learning methods to facilitate rule discovery and rule learning [1]. LCSs are competitive with other techniques on classification tasks, data mining [2, 3] and robot control applications [4, 5]. In general, an LCS is a model of an intelligent agent interacting with an environment. Its ability to choose the best policy for acting in the environment, namely its adaptability, improves with experience. The source of the improvement is learning from the reinforcement, i.e. payoff, provided by the environment. The aim of an LCS is to maximize the achieved environmental payoffs. To do this, LCSs try to evolve a population of compact and maximally general "condition-action-payoff" rules, called classifiers, which tell the system in each state (identified by the condition) the amount of payoff for any available action. LCSs can therefore be seen as a special kind of reinforcement learning method that provides a different approach to generalization.

The original Learning Classifier System framework proposed by Holland is now referred to as the traditional framework. Later, Wilson proposed the strength-based Zeroth-level Classifier System (ZCS) [6] and the accuracy-based X Classifier System (XCS) [7]. The XCS classifier system solved the former main

shortcoming of LCSs, the problem of strong over-generals, through its accuracy-based fitness approach. Bull and Hurst [8] have recently shown that, despite its relative simplicity, ZCS is able to perform optimally through its use of fitness sharing. That is, with appropriate parameters, ZCS was shown to perform as well as the more complex XCS on a number of tasks. Although current research has focused on the use of accuracy of rule predictions as the fitness measure, the present work departs from this popular approach and takes a step backward, aiming to uncover the potential of strength-based LCSs (and particularly ZCS) in sequential decision problems. In this direction, we discuss the use of average reward in ZCS and introduce an undiscounted reinforcement learning technique called R-learning [9, 10] for ZCS to optimize average reward, which is a different metric from the discounted reward optimized by the original ZCS. In particular, we apply R-learning-based ZCS to large multi-step problems and compare it with ZCS. Experimental results are encouraging, in that ZCS with R-learning can perform optimally or near optimally in these problems. In the following, we refer to our proposal as "ZCSAR", where "AR" stands for "average reward".

The rest of the paper is structured as follows: Section 2 provides some necessary background on reinforcement learning, including Sarsa and R-learning. Section 3 provides a brief description of ZCS and of maze environments. How ZCS can be modified to include average reward reinforcement learning is described in Section 4, while Section 5 analyzes the trouble resulting from our modification to ZCS and presents a solution to it. Experiments with our proposal and some related discussion are given in Section 6. Finally, Section 7 ends the paper, presents our main conclusions and gives some directions for future research.

2 Reinforcement learning

Reinforcement learning is a formal framework in which an agent manipulates its environment through a series of actions and receives some rewards as feedback to its actions, but is not told what the correct actions would have been. The agent stores its knowledge about how to make decisions that maximize rewards or minimize costs over a period of time. A reinforcement learner must learn to perform a task by trial and error from a reinforcement signal (the reward values) that is not as informative as might be desired. In reinforcement learning for multi-step problems, the reinforcement signal usually gives delayed reward, which typically comes at the end of a series of actions. Delayed reward makes learning much more difficult.

Generally, the reinforcement learning framework consists of:
- a discrete set of environment states, S;
- a discrete set of available actions, A;
- an immediate reinforcement function R, mapping S × A into the real value r, where r is the expected environmental payoff after performing the action a, from A, in a particular state s, from S.

On each step of interaction the agent perceives the environment to be in state s_t; the agent then chooses an action a_t in the set A, and the action a_t is performed in the environment. As a result of taking action a_t, the agent receives a reward r_t and observes a new state s_{t+1}.
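To make this interaction cycle concrete, the following minimal Python sketch runs one trial under an arbitrary policy. The Environment interface (reset, step) and the policy function are our own placeholders, not something defined in the paper.

```python
def run_trial(env, policy, max_steps=500):
    """One trial of the generic agent-environment loop described above.

    `env` is assumed to expose reset() -> state and step(action) -> (reward, next_state, done);
    `policy` maps a state to an action. Both are hypothetical placeholders.
    """
    s = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(s)                  # agent chooses an action from A
        r, s_next, done = env.step(a)  # environment returns reward and next state
        total_reward += r
        s = s_next
        if done:                       # e.g. the goal (food) has been reached
            break
    return total_reward
```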

The agent's job is to find a policy, mapping states to actions, that maximizes some long-run measure of reinforcement. There are mainly two measures for valuing a policy: discounted reward optimality and average reward optimality. In discounted reinforcement learning, the performance measure being optimized is usually the infinite-horizon discounted model [11], which takes the long-run reward of the agent into account, but rewards received in the future are geometrically discounted according to a discount factor γ, 0 ≤ γ < 1:

\lim_{N \to \infty} E\left[ \sum_{t=0}^{N} \gamma^t r_t(s) \right]    (1)

where E denotes the expected value and r_t(s) is the reward received at time t, starting from state s, under a policy π. An optimal discounted policy maximizes the above infinite-horizon discounted reward.

On the other hand, undiscounted reinforcement learning usually optimizes the average reward model [9], in which the agent is supposed to take actions that maximize its long-run average reward per step:

\rho^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N} E\left[ \sum_{t=0}^{N-1} r_t^{\pi}(s) \right]    (2)

If a policy maximizes the average reward over all states, it is referred to as a gain optimal policy. Usually the average reward ρ^π(s) can be denoted simply as ρ, since it is state independent [12], which greatly simplifies the design of average reward algorithms.

How does the agent find a policy that maximizes the long-run measure of reinforcement? Most reinforcement learning algorithms are based on estimating a state-action value function (called the action value function) that indicates how good it is for the agent to perform a given action in a given state. Here, "how good" is defined in terms of future expected reward value, usually as (1) or (2), corresponding to discounted reward and average reward optimality. We now give a brief description of two typical reinforcement learning algorithms, based on discounted reward and on average reward optimality respectively.

2.1 Sarsa Algorithm

Sarsa is a well-known reinforcement learning algorithm that can be seen as a variant of the Q-learning algorithm [11]. It is based on iteratively approximating the table of all action values Q(s, a), named the Q-table. Initially, all the Q(s, a) values are set to 0. At time step t, the agent perceives the environment state s_t and chooses an action a_t by the ε-greedy policy. The action a_t is performed in the environment, and the agent receives an immediate reward r_{imm}(s_t, a_t) for doing action a_t, together with a new environment state s_{t+1}. Then, the entry Q(s_t, a_t) is updated using the following rule:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ \hat{Q}(s_t, a_t) - Q(s_t, a_t) \right]    (3)

Here, 0 < α ≤ 1 is the learning rate controlling how quickly errors in the estimated action values are corrected; \hat{Q}(s_t, a_t) is the new estimate of Q(s_t, a_t), and is computed as

\hat{Q}(s_t, a_t) = r_{imm}(s_t, a_t) + \gamma Q(s_{t+1}, a_{t+1})    (4)

where r_{imm}(s_t, a_t) is the immediate reward received for performing a_t in state s_t.

2.2 R-learning

Since Q-learning discounts future rewards, it prefers actions that result in short-term ordinary rewards to those that result in long-term sustained or considerable rewards. In contrast, the R-learning algorithm [9] proposed by Schwartz maximizes the average reward per time step. R-learning is similar to Q-learning in form. It is based on iteratively approximating the action values R(s, a), which represent the average-adjusted reward of doing an action a in state s once and then following the corresponding policy thereafter. The R-learning algorithm consists of the following steps:

1) Initialize all the R(s, a) values to zero, and initialize the average reward variable ρ to zero as well.
2) Let the current time step be t. From the current state s_t, choose an action a_t by some exploration/action-selection mechanism, such as the ε-greedy policy.
3) Perform the action a_t, and observe the immediate reward r_{imm}(s_t, a_t) received and the subsequent state s_{t+1}.
4) Update the R values using the following rule:

R(s_t, a_t) \leftarrow R(s_t, a_t) + \alpha_R \left[ r_{imm}(s_t, a_t) - \rho + \max_{a \in A} R(s_{t+1}, a) - R(s_t, a_t) \right]    (5)

5) If R(s_t, a_t) = \max_{a \in A} R(s_t, a) (i.e. if a greedy/non-random action a_t was chosen), then update the average reward ρ according to the rule:

\rho \leftarrow \rho + \alpha_\rho \left[ r_{imm}(s_t, a_t) - \rho + \max_{a \in A} R(s_{t+1}, a) - \max_{a \in A} R(s_t, a) \right]    (6)

6) Set t ← t + 1 and go to step 2.

Here, 0 < α_R ≤ 1 is the learning rate for updating the action values R(·, ·), and 0 < α_ρ ≤ 1 is the learning rate for updating the average reward ρ. The update rule for the action values R(·, ·) differs from the rule of Q-learning in subtracting the average reward ρ from the immediate reward, and in not discounting the next maximum action value. The estimation of the average reward ρ is a critical task in R-learning. As mentioned above, the average reward, under some conditions, does not depend on any state and is constant over the whole state space [12]. This facilitates the use of average reward algorithms.
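For concreteness, the following Python sketch contrasts the two tabular update rules, Equations (3)-(4) for Sarsa and (5)-(6) for R-learning. The dictionary-based tables, function names and default parameter values are our own illustrative choices, not part of the original algorithm descriptions.

```python
from collections import defaultdict

# Tables of action values; missing entries default to 0, as required by step 1).
Q = defaultdict(float)
R = defaultdict(float)

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Discounted Sarsa update, Equations (3) and (4)."""
    q_hat = r + gamma * Q[(s_next, a_next)]        # new estimate of Q(s_t, a_t)
    Q[(s, a)] += alpha * (q_hat - Q[(s, a)])

def r_learning_update(R, rho, s, a, r, s_next, actions, alpha_R=0.1, alpha_rho=0.005):
    """Average-reward R-learning update, Equations (5) and (6). Returns the new rho."""
    max_next = max(R[(s_next, b)] for b in actions)
    max_curr = max(R[(s, b)] for b in actions)
    greedy = R[(s, a)] == max_curr                 # was a greedy/non-random action chosen?
    R[(s, a)] += alpha_R * (r - rho + max_next - R[(s, a)])
    if greedy:
        rho += alpha_rho * (r - rho + max_next - max_curr)
    return rho
```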

Following the basic R-learning algorithm, [10] proposed some variations. The variations mainly focus on different ways to update the average reward, corresponding to step 5 given above.

3 ZCS Classifier System and Its Testing Environments

3.1 A Brief Description of ZCS

The following is a brief description of ZCS; further information can be found in [6] and [8]. The ZCS architecture was introduced by Stewart Wilson in 1994. It is a Michigan-style LCS without internal memory, which periodically receives a binary-encoded input from its environment. The system determines an appropriate response based on this input and performs the indicated action, usually altering the state of the environment. The action is rewarded by a scalar reinforcement. Internally the system cycles through a sequence of performance, reinforcement and discovery.

The ZCS rule base consists of a population of classifiers, denoted [P]. This population has a fixed maximum size N. Each classifier is a condition-action-strength rule (c, a, sr). The rule condition c is a string of characters from the ternary alphabet {0, 1, #}, where # acts as a wildcard allowing a classifier to generalize over different input messages. The action a ∈ {a_1, ..., a_n} is represented by a binary string, and both conditions and actions are initialized randomly. The strength scalar sr acts as an indication of the perceived utility of that rule within the system. The strength of each rule is initialized to a predetermined value termed S_0.

On receipt of an environmental input message s_t, the rule base is scanned and any classifier whose condition matches the input message s_t is placed in the match set [M]. The match set [M] is a subset of the whole population [P] of classifiers. If, on some time step, [M] is empty or has a total strength Sr_{[M]} that is less than a fixed fraction φ (0 < φ ≤ 1) of the mean strength of the population [P], then a covering operator is invoked. A new rule is created with a condition that matches the environmental input and a randomly selected action. The rule's condition is then made less specific by the random inclusion of #'s at a probability of P_# per bit. The new rule is given a strength equal to the population average and inserted into the population, overwriting a rule selected for deletion. The deleted rules are chosen using roulette wheel selection based on the reciprocal of strength.

A particular action a is then selected from the match set by a roulette wheel selection policy based on the total strength Sr(s_t, a) of the classifiers in [M] that advocate that action. For each action a ∈ {a_1, ..., a_n} present in [M], Sr(s_t, a) is called the system strength and is computed as:

Sr(s_t, a) = \sum_{cl \in [M],\, cl.a = a} cl.sr    (7)

where cl stands for a classifier, cl.sr for the strength of cl, and cl.a for its action.
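The performance cycle just described can be sketched as follows in Python. The Classifier class and the helper names are our own simplifications; in particular, the sketch assumes the match set is non-empty (covering guarantees this in ZCS) and that all strengths are positive, as roulette wheel selection requires.

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str    # string over the ternary alphabet {'0', '1', '#'}
    action: int
    strength: float

def matches(condition, message):
    """A condition matches a binary message if every non-# position agrees."""
    return all(c == '#' or c == m for c, m in zip(condition, message))

def build_match_set(population, message):
    return [cl for cl in population if matches(cl.condition, message)]

def system_strength(match_set, action):
    """Equation (7): total strength of the classifiers in [M] advocating `action`."""
    return sum(cl.strength for cl in match_set if cl.action == action)

def roulette_action(match_set):
    """Select an action with probability proportional to its system strength."""
    actions = sorted({cl.action for cl in match_set})
    strengths = [system_strength(match_set, a) for a in actions]
    pick = random.uniform(0.0, sum(strengths))
    acc = 0.0
    for a, s in zip(actions, strengths):
        acc += s
        if pick <= acc:
            return a
    return actions[-1]
```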

When an action has been selected, all rules in [M] that advocate this action are placed in the action set [A] and the system executes the action. Depending on environmental circumstances, a scalar reward r (possibly null) is supplied to ZCS as a consequence of executing a, together with a new input configuration s_{t+1}. Reinforcement in ZCS consists of redistributing payoff between subsequent action sets. In each cycle, a "bucket-brigade" credit-assignment policy similar to Sarsa is employed:

1) A fixed fraction β (0 < β ≤ 1) of the strength of each member of [A]_t at the current time step t is deducted and placed in a common bucket B_t: sr_{[A]_t}(i) ← (1 − β) sr_{[A]_t}(i), B_t = β Σ_i sr_{[A]_t}(i), where sr_{[A]_t}(i) stands for the strength of the i-th classifier of [A]_t. B_t is initially set to zero.
2) If a reward r is received from the environment as a consequence of executing a_{t-1} at the previous time step t−1, then a fixed fraction β of r is distributed evenly amongst the members of [A]_{t-1}: sr_{[A]_{t-1}}(i) ← sr_{[A]_{t-1}}(i) + β r / |[A]_{t-1}|, where |[A]_{t-1}| is the number of classifiers in [A]_{t-1}.
3) Classifiers in [A]_{t-1} (if it is non-empty) have their strengths incremented by γ B_t / |[A]_{t-1}|: sr_{[A]_{t-1}}(i) ← sr_{[A]_{t-1}}(i) + γ B_t / |[A]_{t-1}|, where γ is a predetermined discount factor (0 < γ ≤ 1) and B_t is the total amount put in the current bucket in step 1.
4) Finally, the bucket B_t is emptied, and all classifiers in the set difference [M]_t − [A]_t have their strengths reduced by a small fraction τ (0 < τ ≤ 1), which acts as a "tax" to encourage exploitation of strong classifier sets: ∀ cl ∈ [M]_t, cl ∉ [A]_t: cl.sr ← (1 − τ) cl.sr.

The above process can then be written as a re-assignment:

Sr_{[A]_{t-1}} \leftarrow Sr_{[A]_{t-1}} + \beta \left( r + \gamma Sr_{[A]_t} - Sr_{[A]_{t-1}} \right)    (8)

where Sr_{[A]_{t-1}} is the total strength of the members of [A]_{t-1}, also known as Sr(s_{t-1}, a_{t-1}), and Sr_{[A]_t} is the total strength of the members of [A]_t, also known as Sr(s_t, a_t). So, Equation (8) can be rewritten as

Sr(s_{t-1}, a_{t-1}) \leftarrow Sr(s_{t-1}, a_{t-1}) + \beta \left( r + \gamma Sr(s_t, a_t) - Sr(s_{t-1}, a_{t-1}) \right)    (9)

ZCS employs a GA as its discovery mechanism, operating over the whole rule set [P] (panmictic). On each cycle there is a fixed probability of GA invocation (the GA rate). When called, the GA uses roulette wheel selection to determine the parent rules based on strength. Two offspring are produced via crossover (single point, with probability χ) and mutation (with probability μ). The parents then donate half their strength to their offspring, who replace existing members of the population. The deleted rules are chosen based on the reciprocal of strength.
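A compact Python sketch of the bucket-brigade credit assignment of steps 1)-4) above, reusing the hypothetical Classifier objects from the previous sketch; the parameter values are only illustrative defaults, not the settings used in the paper's experiments.

```python
def bucket_brigade(prev_action_set, action_set, match_set, reward,
                   beta=0.2, gamma=0.71, tau=0.1):
    """One reinforcement cycle of ZCS (steps 1-4, summarized by Equations (8)-(9)).

    prev_action_set is [A]_{t-1}, action_set is [A]_t, match_set is [M]_t.
    """
    # 1) deduct a fraction beta of each member of [A]_t into the common bucket
    bucket = 0.0
    for cl in action_set:
        bucket += beta * cl.strength
        cl.strength *= (1.0 - beta)
    if prev_action_set:
        n = len(prev_action_set)
        for cl in prev_action_set:
            # 2) distribute a fraction beta of the external reward evenly over [A]_{t-1}
            cl.strength += beta * reward / n
            # 3) pass the discounted bucket back to [A]_{t-1}
            cl.strength += gamma * bucket / n
    # 4) tax classifiers that matched but did not advocate the chosen action
    in_action_set = {id(cl) for cl in action_set}
    for cl in match_set:
        if id(cl) not in in_action_set:
            cl.strength *= (1.0 - tau)
```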

3.2 Maze Environments

Maze problems, usually represented as grid-like two-dimensional areas that may contain different objects of any quantity and with different properties (for example obstacle, goal, or empty), serve as a simplified virtual model of a real environment, and can be used for developing the core algorithms of many real-world applications related to the problem of navigation. The agent should learn the shortest path to the goal states, without knowing the environmental model in advance.

[Figure 1. (a) The Maze6 environment; (b) the Woods14 environment. The food object is marked with F, and obstacles are marked with T.]

LCSs have been the most widely used class of algorithms for reinforcement learning in mazes for the last twenty years, and have presented the most promising performance results [6, 13]. A typical example is the Woods1 maze environment [6]. A maze may contain different obstacles in any quantity, such as T standing for tree, and some objects for learning purposes, like the virtual food F, which is the agent's goal to reach. It must be noted that, if a maze does not have enough obstacles to mark its boundary, the left and right edges of the maze are connected, as are the top and bottom. In this paper, the agent is randomly placed in the maze on an empty cell, and the agent has two boolean sensors for each of the eight adjacent squares. The agent can move into any adjacent square that is free.

4 Adding R-learning to ZCS

In this section we show how ZCS can be modified to include R-learning [9, 10] in order to optimize average reward, which is different from the discounted reward optimized by Sarsa-style learning. The implementation of our system, ZCSAR, is also discussed here.

As mentioned above, ZCS uses a "bucket-brigade" credit-assignment policy similar to Sarsa to update the classifier population. From Equation (9), the bucket-brigade algorithm in ZCS is indeed similar to the Sarsa update rule (3). In addition, the comparison shows that (i) ZCS represents each entry of the Q-table by a set of classifiers, i.e. Q(s_{t-1}, a_{t-1}) is represented by the classifiers in [A]_{t-1}, and Q(s_t, a_t) is represented by the classifiers in [A]_t; (ii) the system strength Sr(s_{t-1}, a_{t-1}),

specified in Equation (7) and also written Sr_{[A]_{t-1}}, corresponds to the value Q(s_t, a_t) in Equation (3), and r + γ Sr(s_t, a_t) in Equation (9) corresponds to the estimate \hat{Q}(s_t, a_t) of the value Q(s_t, a_t) in Equation (4); (iii) only one entry Q(s_t, a_t) is updated by the tabular Sarsa algorithm at time step t, while in ZCS a set of classifiers is usually updated in one time step.

R-learning was introduced in Section 2 and is a different type of reinforcement learning. R-learning and the Sarsa algorithm are similar in form but not in meaning, since the Sarsa algorithm is based on discounted reward optimality, while R-learning, based on average reward optimality, maximizes the average reward per step. In R-learning, we can define the estimate of R(s_t, a_t) as

\hat{R}(s_t, a_t) = r_{imm}(s_t, a_t) - \rho + \max_{a \in A} R(s_{t+1}, a)    (10)

Thus, Equation (5) can be rewritten as

R(s_t, a_t) \leftarrow R(s_t, a_t) + \alpha_R \left[ \hat{R}(s_t, a_t) - R(s_t, a_t) \right]    (11)

The major difference between Equations (11) and (3) is that they use different methods to compute the estimates \hat{R}(s_t, a_t) and \hat{Q}(s_t, a_t). Additionally, R-learning needs to estimate the average reward ρ, which is extra work compared with the Sarsa algorithm.

The analogies between Sarsa and ZCS, and the differences and similarities between Sarsa and R-learning, have now been presented. From them we conclude that the system strength Sr(s_{t-1}, a_{t-1}) in ZCS corresponds to the action value R(s_t, a_t), and r + γ Sr(s_t, a_t) in ZCS corresponds to the new estimate \hat{R}(s_t, a_t) of R(s_t, a_t). In order to add R-learning to ZCS, we only need to focus on how \hat{R}(s_t, a_t) in Equation (10) and r + γ Sr(s_t, a_t) in Equation (9) are computed. Given the correspondence between the system strength Sr(s_{t-1}, a_{t-1}) and the action value R(s_t, a_t), the average reward approach replaces r + γ Sr(s_t, a_t) in Equation (9) with r − ρ + Sr(s_t, a_t). Thus, Equation (9) is changed to:

Sr(s_{t-1}, a_{t-1}) \leftarrow Sr(s_{t-1}, a_{t-1}) + \beta \left( r - \rho + Sr(s_t, a_t) - Sr(s_{t-1}, a_{t-1}) \right)    (12)

Equation (12) replaces Equation (9) in ZCS and changes the whole reinforcement learning mechanism employed by the original ZCS. Concerning the specific update rule for the classifiers in [A]_{t-1}, steps 2 and 3 of Section 3.1 are modified as follows:

2) If a reward r is received from the environment as a consequence of executing a_{t-1} at the previous time step t−1, and the estimate of the average reward is ρ, then a fixed fraction β of (r − ρ) is distributed evenly amongst the members of [A]_{t-1}: sr_{[A]_{t-1}}(i) ← sr_{[A]_{t-1}}(i) + β (r − ρ) / |[A]_{t-1}|, where |[A]_{t-1}| is the number of classifiers in [A]_{t-1}.
3) Classifiers in [A]_{t-1} (if it is non-empty) have their strengths incremented by B_t / |[A]_{t-1}|: sr_{[A]_{t-1}}(i) ← sr_{[A]_{t-1}}(i) + B_t / |[A]_{t-1}|, where B_t is the total amount put in the current bucket.
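The modified credit assignment (Equation (12) together with the revised steps 2) and 3)) might look as follows in code. This is an illustration built on the earlier hypothetical Classifier sketch, not the authors' implementation; in particular, it assumes the tax of step 4) is kept unchanged from ZCS.

```python
def zcsar_credit_assignment(prev_action_set, action_set, match_set, reward, rho,
                            beta=0.2, tau=0.1):
    """Average-reward variant of the bucket brigade (Equation (12)).

    Differences from standard ZCS: rho is subtracted from the external reward,
    and the bucket passed back to [A]_{t-1} is no longer discounted by gamma.
    """
    # 1) deduct a fraction beta of each member of [A]_t into the bucket (unchanged)
    bucket = 0.0
    for cl in action_set:
        bucket += beta * cl.strength
        cl.strength *= (1.0 - beta)
    if prev_action_set:
        n = len(prev_action_set)
        for cl in prev_action_set:
            cl.strength += beta * (reward - rho) / n   # revised step 2)
            cl.strength += bucket / n                  # revised step 3): undiscounted bucket
    # step 4), the tax on [M]_t - [A]_t, is assumed unchanged from ZCS
    in_action_set = {id(cl) for cl in action_set}
    for cl in match_set:
        if id(cl) not in in_action_set:
            cl.strength *= (1.0 - tau)
```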

Next, a procedure to estimate the average reward ρ needs to be added to ZCS. Step 5 in the description of the R-learning algorithm in Section 2.2 can be moved to ZCS with some modifications. To do so, step 5 of Section 2.2 is rewritten as follows: if Sr_{[A]_{t-1}} = max_{a∈A} Sr(s_{t-1}, a) (i.e. if a greedy/non-random action a_{t-1} was chosen), then update the average reward ρ according to the rule:

\rho \leftarrow \rho + \alpha_\rho \left[ r - \rho + \max_{a \in A} Sr(s_t, a) - \max_{a \in A} Sr(s_{t-1}, a) \right]    (13)

This new form of step 5 is inserted into the procedure of ZCS, located just before the update of the classifiers in [A]_{t-1}. It must be noted that, at the first time step of each trial in an experiment, there is no need to update the average reward, since no previous environmental reward is available at that time. At the beginning of an experiment, ρ is initialized to zero. In addition, the updated value of the average reward ρ is not used in Equation (12) directly. Instead, its more stable moving average is adopted, to avoid the heavy oscillations of the raw updates, since the average reward is updated with the immediate reward r, which is stochastic and fluctuates greatly. The window size for the moving average is 100, i.e. the moving average is computed as the average of the last 100 updated values. If the window size is too small, the moving average has no effect; if the window size is too big, the changing trend of the average reward is hidden, which limits the immediate feedback function of the average reward.

Through the two steps above, we have replaced the Sarsa-like algorithm in ZCS with R-learning, obtaining the new system ZCSAR. However, in order to speed up convergence in ZCSAR, the fluctuation of the estimate ρ needs to be reduced over time. We therefore make the learning rate α_ρ in Equation (13) decay over time using a simple rule:

\alpha_\rho \leftarrow \alpha_\rho - \frac{\alpha_\rho^{max} - \alpha_\rho^{min}}{NumOfTrials}    (14)

where α_ρ^{max} is the initial value of α_ρ, α_ρ^{min} is the minimum learning rate required, and NumOfTrials is the number of exploration trials (problems) in an experiment. α_ρ is updated at the beginning of each exploration trial using Equation (14), not at each time step.
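The ρ-estimation machinery just described (Equation (13), the 100-value moving average, and the decay rule of Equation (14)) could be organized as in the following sketch. The class and method names are our own, and the default parameter values only mirror the experimental settings reported in Section 6, with the minimum learning rate being an assumption.

```python
from collections import deque

class AverageRewardEstimator:
    """Tracks rho via Equation (13), smoothed by a moving average of window 100."""

    def __init__(self, alpha_max=0.005, alpha_min=0.0001, num_trials=10000, window=100):
        self.rho = 0.0
        self.alpha = alpha_max
        self.alpha_min = alpha_min
        self.decrement = (alpha_max - alpha_min) / num_trials   # Equation (14)
        self.history = deque(maxlen=window)

    def start_trial(self):
        # alpha_rho is decayed once per exploration trial, not at each time step
        self.alpha = max(self.alpha - self.decrement, self.alpha_min)

    def update(self, reward, max_sr_next, max_sr_prev, greedy):
        # Equation (13): rho is only updated after a greedy (non-random) action
        if greedy:
            self.rho += self.alpha * (reward - self.rho + max_sr_next - max_sr_prev)
        self.history.append(self.rho)

    def smoothed(self):
        """Moving average of the last `window` values of rho, used in Equation (12)."""
        return sum(self.history) / len(self.history) if self.history else 0.0
```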

5 Subtraction Trouble and Tournament Selection

When ZCSAR uses Equation (12) as its reinforcement learning mechanism, some issues arise. The update rule for the system strength Sr(·, ·) differs from the rule of Sarsa-style learning in subtracting the average reward ρ from the immediate reward and in not discounting the next system strength (action value). The subtraction may make the system strength Sr(·, ·) negative, which does not occur in the original ZCS or in the discounted reward reinforcement learning algorithm Sarsa. A negative Sr(·, ·) occurs when the value of r − ρ + Sr(s_t, a_t) is continuously negative over a number of time steps. In most time steps the reward is delayed, so r is zero. Estimating the average reward ρ is not an easy task in sparse reward domains; the estimate may differ greatly from the true average reward in the early stages of learning. Thus, whether the value of r − ρ + Sr(s_t, a_t) is negative or not depends mainly on the difference between ρ and Sr(s_t, a_t). If Sr(·, ·) is negative, the sum of the strengths of the classifiers in the action set is also less than zero, which means that some classifiers in the action set have negative strength. However, all components of ZCS were designed on the supposition that a classifier's strength is greater than zero. In particular, roulette wheel selection (proportionate selection) based on a classifier's strength (or its reciprocal) is adopted as the action selection method in the match set [M], as the parent selection method in the GA, and as the classifier selection method for GA deletion and covering-operator deletion. It is known that a classifier's strength must be positive for roulette wheel selection. ZCS is in line with this requirement, but ZCSAR is not. This is the problem caused by the subtraction.

To address this problem, an easy way is to set negative values to zero, i.e. to keep a classifier's strength from falling below zero. We call this method "truncation". In other words, if ZCSAR still uses roulette wheel selection, truncation is an easy way to adapt it. However, is the truncation method proper and effective for ZCSAR? Is there any alternative way to tackle this problem?

A promising proposal is to replace roulette wheel selection with tournament selection in ZCSAR. Tournament selection with tournament sizes proportionate to the actual set size has been shown to outperform roulette wheel selection in the widely used classifier system XCS [14]. So it is expected that tournament selection can also improve the performance of ZCSAR. Importantly, in contrast to roulette wheel selection, tournament selection is independent of fitness scaling and does not require positive classifier strengths, so classifier strengths can be less than zero in ZCSAR with tournament selection.

In tournament selection, classifiers are not selected in proportion to their strength; instead, tournaments are held in which the classifier with the highest strength wins. Stochastic tournaments are not considered here. Participants for a tournament are chosen at random from the classifier set in which selection is applied. The tournament size depends on the size of that classifier set: each tournament contains a fraction, in (0, 1], of the corresponding classifier set, and this fraction controls the selection pressure. Instead of roulette wheel selection for action selection in the match set [M], parent selection in the GA, and classifier deletion selection in the GA and the covering operator, three independent kinds of tournaments are held in which the classifier with the highest (or lowest) strength is selected; the corresponding fractions are 0.1, 0.4 and 0.6 respectively. In the remainder of this work we refer to our proposals as "ZCSAR+Roulette" and "ZCSAR+Tournament", indicating ZCSAR with roulette wheel selection and the truncation method, and ZCSAR with tournament selection, respectively.
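A minimal sketch of such strength-based tournament selection is given below (Python, reusing the hypothetical Classifier objects from the earlier sketches). The exact way the winner's action is used for action selection in [M] is our assumption, since the paper specifies only the tournament-size fractions.

```python
import random

def tournament_select(classifier_set, fraction, lowest=False):
    """Pick one classifier by a deterministic tournament.

    `fraction` is the share of the set entering the tournament (e.g. 0.1 for action
    selection, 0.4 for GA parent selection, 0.6 for deletion). Set lowest=True when
    selecting a classifier for deletion, so that the weakest participant wins.
    Negative strengths are handled naturally, unlike in roulette wheel selection.
    """
    size = max(1, int(round(fraction * len(classifier_set))))
    participants = random.sample(list(classifier_set), size)
    if lowest:
        return min(participants, key=lambda cl: cl.strength)
    return max(participants, key=lambda cl: cl.strength)

def tournament_action(match_set, fraction=0.1):
    """Action selection in [M]: execute the action of the tournament winner."""
    return tournament_select(match_set, fraction).action
```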

6 Experiments in Maze Environments

Two maze problems are tested and studied here to illustrate the generality and effectiveness of our approaches, with ZCS used for comparison.

6.1 Experimental Setup

Each experiment consists of 12000 problems (trials) that the agent must solve. For each problem, the agent is placed in a randomly chosen empty square of the maze. The agent then moves under the control of the classifier system, avoiding obstacles, until either it reaches the food or it has taken 500 steps, at which point the problem ends unconditionally. The agent does not change its position if it chooses an action that would move it into a square containing an obstacle, though one time step still elapses. When the agent reaches the food, it receives a constant reward of 1000; otherwise, it receives a reward of 0. In order to evaluate the final policy evolved, in each experiment exploration is turned off during the last 2000 problems and the system works only in exploitation. In exploitation problems, the action which predicts the highest payoff is always selected in the match set [M], and the genetic algorithm is turned off. System performance is computed as the average number of steps to food over the last 50 problems. Every statistical result presented in this paper is averaged over 10 experiments.

The following classifier structure was used for the LCSs in the experiments: each classifier has 16 binary bits in the condition field, two bits for each of the 8 neighbouring squares, with 00 representing the situation that the square is empty, 11 that it contains food (F), and 10 that it is an obstacle (T). The general LCS parameters used for ZCS, ZCSAR+Roulette and ZCSAR+Tournament are set as follows: β = 0.6, τ = 0.1, GA rate = 0.25, φ = 0.5, χ = 0.5, μ = 0.002, S_0 = 20.0, P_# = 0.33, N = 800. Some specific parameters are set as follows: for ZCSAR+Roulette and ZCSAR+Tournament, α_ρ^{max} = 0.005 and α_ρ^{min} = 0.0001; and in ZCS, γ = 0.71. A detailed description of these parameters is available in [6] and [8].

6.2 Experimental Results and Discussion

In the first experiment, we applied ZCS, ZCSAR+Roulette and ZCSAR+Tournament to the Maze6 environment (Figure 1(a)). Maze6 is a typical and somewhat difficult environment for testing learning systems, since the goal position the agent has to reach is hidden behind obstacles and there is no regularity in the maze. Almost every sensory-action pair in this maze needs a special classifier to cover it (i.e. it allows only few generalizations), so ZCS is likely to produce over-general classifiers in it. Besides, the optimal solution in Maze6 requires the agent to perform long sequences of actions to reach the goal state. The optimal average path to the food in Maze6 is 5.19 steps. This experiment is used to show that ZCS with average reward reinforcement learning can solve a general maze problem.

Figure 2 reports the performance of ZCS, ZCSAR+Roulette and ZCSAR+Tournament in the Maze6 environment. In all three cases the results converge to near optimum during the last 2000 exploitation problems, and there is almost no difference between them: about 5.85, 6.2 and 6.02 steps respectively. ZCSAR+Roulette and ZCSAR+Tournament perform almost as well as ZCS in this environment. During the learning period (the first 10000 problems), the performance of the three systems deviates from the optimum, since the GA continues to function and probabilistic action selection (roulette wheel selection or tournament selection) is

used. In addition, ZCSAR+Tournament changes continuously and oscillates heavily within the first 10000 learning problems, which is possibly caused by tournament selection being used as the action selection mechanism in the match set [M].

[Figure 2. Performance (number of steps to goal versus number of problems) of ZCSAR+Roulette and ZCSAR+Tournament in Maze6, compared with ZCS and the optimum. Error bars represent the standard error. Curves are averages over 10 experiments.]

[Figure 3. Performance (number of steps to goal versus number of problems) of ZCSAR+Roulette and ZCSAR+Tournament in Woods14, compared with ZCS and the optimum. Error bars represent the standard error. Curves are averages over 10 experiments.]

In the second experiment, the testing environment is Woods14 (Figure 1(b)), a corridor of 18 blank cells with a food cell at the end. The optimal average path to the food in Woods14 is 9.5 steps. The agent needs longer sequences of actions to reach the goal position, resulting in a sparser reception of delayed reward, so it is a complex problem for most LCSs [15].

It can be seen from Figure 3 that, in Woods14, the performance of the three systems oscillates above the optimum during the training period, while promising solutions evolve during the last 2000 exploitation problems. ZCSAR+Tournament takes about 9.50 steps to find the food, and ZCS takes about 10.70 steps. ZCSAR+Roulette performs less well, converging to about 12.36 steps. ZCSAR+Tournament can thus obtain the optimal solution in Woods14. This seems to be due to the average reward reinforcement learning and the tournament selection employed by ZCSAR+Tournament, which ensure that the system can effectively disambiguate the early states in long action chains.

7 Conclusions

In this paper, owing to the similarity between Sarsa and the bucket-brigade algorithm in ZCS, and the similarity in form between the Sarsa algorithm and R-learning, the bucket-brigade algorithm in ZCS was replaced with R-learning through some modifications. R-learning is an undiscounted reinforcement learning technique that optimizes average reward, a different metric from the discounted reward optimized by the bucket-brigade algorithm. Thus ZCS with R-learning, called ZCSAR, is able to maximize the average reward per time step rather than the cumulative discounted reward. This helps to support long action chains in large multi-step learning problems.

However, R-learning causes the strength of some classifiers in ZCSAR to become negative. This violates the supposition, made throughout ZCS, that a classifier's strength is greater than zero. In particular, roulette wheel selection based on a classifier's strength (or its reciprocal), as used in ZCS, requires that a classifier's strength be positive. To address this problem, two extended systems were presented: "ZCSAR+Roulette" and "ZCSAR+Tournament". ZCSAR+Roulette denotes ZCSAR with roulette wheel selection and the truncation method, while ZCSAR+Tournament denotes ZCSAR with tournament selection. Truncation means cutting off negative strength values, i.e. setting them to zero.

We tested ZCSAR+Roulette and ZCSAR+Tournament on two well-known multi-step problems and compared them with ZCS. Overall, the experiments show that ZCSAR+Tournament can evolve optimal or near-optimal solutions in these typically difficult multi-step environments, while ZCSAR+Roulette only reaches a suboptimum in the Woods14 environment. Especially in the Woods14 environment the performance of ZCSAR+Tournament is very good, whereas ZCS only reaches a near-optimal performance. Because of the basic change to the reinforcement learning employed by ZCS, and because tournament selection replaces roulette wheel selection, ZCSAR+Tournament still needs extra testing to study its performance on other problems. Additionally, we plan to consider the impact of average reward reinforcement learning in ZCS when the environment is stochastic.

References:

[1] Bull, L. A brief history of learning classifier systems: from CS-1 to XCS and its variants. Evolutionary Intelligence, 2015.
[2] Ebadi, T., et al. Human-interpretable Feature Pattern Classification System using Learning

Classifier Systems. Evolutionary Computation.
[3] Tzima, F.A. and Mitkas, P.A. ZCS Revisited: Zeroth-Level Classifier Systems for Data Mining. In: Proceedings of the 2008 IEEE International Conference on Data Mining Workshops. IEEE Computer Society, Washington, DC, USA, 2008.
[4] Cádrik, T. and Mach, M. Control of agents in a multi-agent system using ZCS evolutionary classifier systems. In: 2014 IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, Herl'any, Slovakia, 2014.
[5] Cádrik, T. and Mach, M. Usage of ZCS Evolutionary Classifier System as a Rule Maker for Cleaning Robot Task. In: Sinčák, P., et al. (eds.), Emergent Trends in Robotics and Intelligent Systems. Springer International Publishing, 2015.
[6] Wilson, S.W. ZCS: A zeroth level classifier system. Evolutionary Computation, 1994, 2(1): 1-18.
[7] Wilson, S.W. Classifier Fitness Based on Accuracy. Evolutionary Computation, 1995, 3(2): 149-175.
[8] Bull, L. and Hurst, J. ZCS Redux. Evolutionary Computation, 2002, 10(2): 185-205.
[9] Schwartz, A. A reinforcement learning method for maximizing undiscounted rewards. In: Utgoff, P. (ed.), Proceedings of the Tenth International Conference on Machine Learning. Morgan Kaufmann, 1993.
[10] Singh, S.P. Reinforcement learning algorithms for average-payoff Markovian decision processes. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (vol. 1). American Association for Artificial Intelligence, Menlo Park, CA, USA, 1994.
[11] Sutton, R.S. and Barto, A.G. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 1998.
[12] Mahadevan, S. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 1996, 22: 159-195.
[13] Zatuchna, Z. and Bagnall, A. A learning classifier system for mazes with aliasing clones. Natural Computing.
[14] Butz, M.V., Sastry, K. and Goldberg, D.E. Strong, Stable, and Reliable Fitness Pressure in XCS due to Tournament Selection. Genetic Programming and Evolvable Machines, 2005, 6(1): 53-77.
[15] Zang, Z., et al. Learning classifier system with average reward reinforcement learning. Knowledge-Based Systems, 2013.


More information

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks - Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

Basic Circuit Elements Professor J R Lucas November 2001

Basic Circuit Elements Professor J R Lucas November 2001 Basic Circui Elemens - J ucas An elecrical circui is an inerconnecion of circui elemens. These circui elemens can be caegorised ino wo ypes, namely acive and passive elemens. Some Definiions/explanaions

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

A Reinforcement Learning Approach for Collaborative Filtering

A Reinforcement Learning Approach for Collaborative Filtering A Reinforcemen Learning Approach for Collaboraive Filering Jungkyu Lee, Byonghwa Oh 2, Jihoon Yang 2, and Sungyong Park 2 Cyram Inc, Seoul, Korea jklee@cyram.com 2 Sogang Universiy, Seoul, Korea {mrfive,yangjh,parksy}@sogang.ac.kr

More information

Fishing limits and the Logistic Equation. 1

Fishing limits and the Logistic Equation. 1 Fishing limis and he Logisic Equaion. 1 1. The Logisic Equaion. The logisic equaion is an equaion governing populaion growh for populaions in an environmen wih a limied amoun of resources (for insance,

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Chapter Floating Point Representation

Chapter Floating Point Representation Chaper 01.05 Floaing Poin Represenaion Afer reading his chaper, you should be able o: 1. conver a base- number o a binary floaing poin represenaion,. conver a binary floaing poin number o is equivalen

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion

More information

Shiva Akhtarian MSc Student, Department of Computer Engineering and Information Technology, Payame Noor University, Iran

Shiva Akhtarian MSc Student, Department of Computer Engineering and Information Technology, Payame Noor University, Iran Curren Trends in Technology and Science ISSN : 79-055 8hSASTech 04 Symposium on Advances in Science & Technology-Commission-IV Mashhad, Iran A New for Sofware Reliabiliy Evaluaion Based on NHPP wih Imperfec

More information

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Space

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Space Inernaional Journal of Indusrial and Manufacuring Engineering Applying Geneic Algorihms for Invenory Lo-Sizing Problem wih Supplier Selecion under Sorage Space Vichai Rungreunganaun and Chirawa Woarawichai

More information

Stability and Bifurcation in a Neural Network Model with Two Delays

Stability and Bifurcation in a Neural Network Model with Two Delays Inernaional Mahemaical Forum, Vol. 6, 11, no. 35, 175-1731 Sabiliy and Bifurcaion in a Neural Nework Model wih Two Delays GuangPing Hu and XiaoLing Li School of Mahemaics and Physics, Nanjing Universiy

More information

Announcements: Warm-up Exercise:

Announcements: Warm-up Exercise: Fri Apr 13 7.1 Sysems of differenial equaions - o model muli-componen sysems via comparmenal analysis hp//en.wikipedia.org/wiki/muli-comparmen_model Announcemens Warm-up Exercise Here's a relaively simple

More information

Waveform Transmission Method, A New Waveform-relaxation Based Algorithm. to Solve Ordinary Differential Equations in Parallel

Waveform Transmission Method, A New Waveform-relaxation Based Algorithm. to Solve Ordinary Differential Equations in Parallel Waveform Transmission Mehod, A New Waveform-relaxaion Based Algorihm o Solve Ordinary Differenial Equaions in Parallel Fei Wei Huazhong Yang Deparmen of Elecronic Engineering, Tsinghua Universiy, Beijing,

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Particle Swarm Optimization

Particle Swarm Optimization Paricle Swarm Opimizaion Speaker: Jeng-Shyang Pan Deparmen of Elecronic Engineering, Kaohsiung Universiy of Applied Science, Taiwan Email: jspan@cc.kuas.edu.w 7/26/2004 ppso 1 Wha is he Paricle Swarm Opimizaion

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON4325 Moneary Policy Dae of exam: Tuesday, May 24, 206 Grades are given: June 4, 206 Time for exam: 2.30 p.m. 5.30 p.m. The problem se covers 5 pages

More information

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent

Chapter 21. Reinforcement Learning. The Reinforcement Learning Agent CSE 47 Chaper Reinforcemen Learning The Reinforcemen Learning Agen Agen Sae u Reward r Acion a Enironmen CSE AI Faculy Why reinforcemen learning Programming an agen o drie a car or fly a helicoper is ery

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

Intermediate Macro In-Class Problems

Intermediate Macro In-Class Problems Inermediae Macro In-Class Problems Exploring Romer Model June 14, 016 Today we will explore he mechanisms of he simply Romer model by exploring how economies described by his model would reac o exogenous

More information