SCALED GRADIENT DESCENT LEARNING RATE Reinforcement learning with light-seeking robot


Kary Främling
Helsinki University of Technology, P.O. Box 5400, FI-02015 HUT, Finland

Keywords: Linear function approximation, Gradient descent, Learning rate, Reinforcement learning, Light-seeking robot

Abstract: Adaptive behaviour through machine learning is challenging in many real-world applications such as robotics. This is because learning has to be rapid enough to be performed in real time and to avoid damage to the robot. Models using linear function approximation are interesting in such tasks because they offer rapid learning and have small memory and processing requirements. Adalines are a simple model for gradient descent learning with linear function approximation. However, the performance of gradient descent learning even with a linear model greatly depends on identifying a good value for the learning rate to use. In this paper it is shown that the learning rate should be scaled as a function of the current input values. A scaled learning rate makes it possible to avoid weight oscillations without slowing down learning. The advantages of using the scaled learning rate are illustrated using a robot that learns to navigate towards a light source. This light-seeking robot performs a Reinforcement Learning task, where the robot collects training samples by exploring the environment, i.e. taking actions and learning from their results by a trial-and-error procedure.

1 INTRODUCTION

The use of machine learning in real-world control applications is challenging. Real-world tasks, such as those using real robots, involve noise coming from sensors, non-deterministic actions and uncontrollable changes in the environment. In robotics, experiments are also longer than simulated ones, so learning must be relatively rapid and possible to perform without causing damage to the robot. Only information that is available from robot sensors can be used for learning. This means that the learning methods have to be able to handle partially missing information and sensor noise, which may be difficult to take into account in simulated environments. Artificial neural networks (ANNs) are a well-known technique for machine learning in noisy environments.
In real robotics applications, however, ANN learning may become too slow to be practical, especially if the robot has to explore the environment and collect training samples by itself. Learning by autonomous exploration of the environment by a learning agent is often performed using reinforcement learning (RL) methods. Due to these requirements, one-layer linear function approximation ANNs (often called Adalines (Widrow & Hoff, 1960)) are an interesting alternative. Their training is much faster than for non-linear ANNs and their convergence properties are also better. Finally, they have small memory and computing power requirements.

However, when Adaline inputs come from sensors that give values of different magnitude, it becomes difficult to determine what learning rate to use in order to avoid weight oscillation. Furthermore, as shown in the experiments section of this paper, using a fixed learning rate may be problematic also because the optimal learning rate changes depending on the state of the agent and the environment. This is why the use of a scaled learning rate is proposed, where the learning rate value is modified according to the Adaline input values. The scaled learning rate makes learning steps of similar magnitude independently of the input values. It is also significantly easier to determine a suitable value for the scaled learning rate than it is for a fixed learning rate.

After this introduction, Section 2 gives background information about gradient descent learning and RL. Section 3 defines the scaled learning rate, followed by experimental results in Section 4. Related work is treated in Section 5, followed by conclusions.

2 GRADIENT DESCENT REINFORCEMENT LEARNING

In gradient descent learning, the free parameters of a model are gradually modified so that the difference between the output given by the model and the corresponding correct or target value becomes as small as possible for all training samples available. In such supervised learning, each training sample consists of input values and the corresponding target values. Real-world training samples typically involve noise, which means that it is not possible to obtain a model that would give the exact target value for all training samples. The goal of learning is rather to minimize a statistical error measure, e.g. the Root Mean Square Error (RMSE)

    RMSE_i = sqrt( (1/M) * sum_{k=1}^{M} (t_{i,k} - y_{i,k})^2 )    (1)

where M is the number of training examples, t_{i,k} is the target value for output i and training sample k, and y_{i,k} is the model output for output i and training sample k. In RL tasks, each output typically corresponds to one possible action.

RL differs from supervised learning at least in the following ways:
- The agent has to collect the training samples by exploring the environment, which forces it to keep a balance between exploring the environment for new training samples and exploiting what it has learned from the existing ones (the exploration/exploitation trade-off). In supervised learning, all training samples are usually pre-collected into a training set, so learning can be performed off-line.
- Target values are only available for the used actions. In supervised learning, target values are typically provided for all actions.
- The target value is not always available directly; it may be available only after the agent has performed several actions. Then we speak about a delayed reward learning task.

RL methods usually model the environment as a Markov Decision Process (MDP), where every state of the environment needs to be uniquely identifiable. This is why the model used for RL learning is often a simple lookup-table, where each environment state corresponds to one row (or column) in the table and the columns (or rows) correspond to possible actions. The values of the table express how good each action is in the given state.
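As a small illustration of the error measure (1) above, the RMSE for one output over M training samples can be computed as follows (the function and variable names are my own, not from the paper):

```python
import math

def rmse(targets, outputs):
    """Root Mean Square Error, equation (1), for one output over M samples."""
    m = len(targets)
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(targets, outputs)) / m)

# Two noisy samples: squared errors 9 and 16, so the mean squared error is 12.5
print(rmse([10.0, 20.0], [13.0, 16.0]))  # sqrt(12.5) = 3.5355...
```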
Lookup-tables are not suitable for tasks involving sensor noise or other reasons for the agent not being able to uniquely identify the current state of the environment (such tasks are called hidden state tasks). This is one of the reasons for using state generalization techniques instead of lookup-tables. Generalization in RL is based on the assumption that an action that is good in some state is probably good also in similar states. Various classification techniques have been used for identifying similar states. Some kind of ANN is typically used for the generalization. ANNs can handle any state descriptions, not only discrete ones. Therefore they are well adapted for problems involving continuous-valued state variables and noise, which is usually the case in robotics applications.

2.1 Gradient descent learning with Adalines

The simplest ANN is the linear Adaline (Widrow & Hoff, 1960), where neurons calculate their output value as a weighted sum of their input values

    y_i(s) = sum_{j=1}^{N} w_{i,j} * s_j    (2)

where w_{i,j} is the weight of neuron i associated with the neuron's input j, y_i(s) is the output value of neuron i, s_j is the value of input j and N is the number of inputs. Adalines are trained using the Widrow-Hoff training rule (Widrow & Hoff, 1960)

    w_{i,j}^new = w_{i,j} + alpha * (t_i - y_i) * s_j    (3)

where alpha is a learning rate. The Widrow-Hoff learning rule is obtained by inserting equation (2) into the RMSE expression and taking the partial derivative with respect to the weights w_{i,j}. It can easily be shown that there is only one optimal solution for the error as a function of the Adaline weights. Therefore gradient descent is guaranteed to converge if the learning rate is selected sufficiently small. When the back-propagation rule for gradient descent in multi-layer ANNs was developed (Rumelhart et al., 1988), it became possible to learn non-linear function approximations and classifications. Unfortunately, learning non-linear functions by gradient descent tends to be slow and to converge to locally optimal solutions.

2.2 Reinforcement learning

RL methods often assume that the environment can be modelled as an MDP. A (finite) MDP is a tuple M = (S, A, T, R), where:

S is a finite set of states; A = {a_1, ..., a_k} is a set of k actions; T = {P_sa(s') : s in S, a in A} are the next-state transition probabilities, with P_sa(s') giving the probability of transitioning to state s' upon taking action a in state s; and R specifies the reward values given in different states s in S.

RL methods are based on the notion of value functions. Value functions are either state-values (i.e. the value of a state) or action-values (i.e. the value of taking an action in a given state). The value of a state s in S can be defined formally as

    V^pi(s) = E_pi{ sum_{k=0}^{inf} gamma^k * r_{t+k+1} | s_t = s }    (4)

where V^pi(s) is the state value that corresponds to the expected return when starting in s and following policy pi thereafter. The factor r_{t+k+1} is the reward obtained when arriving into states s_{t+1}, s_{t+2} etc., and gamma is a discounting factor that determines to what degree future rewards affect the value of state s. Action value functions are usually denoted Q(s,a), where a in A. In control applications, the goal of RL is to learn an action-value function that allows the agent to use a policy that is as close to optimal as possible. However, since the action-values are initially unknown, the agent first has to explore the environment in order to learn it.

2.2.1 Exploring the environment for training samples

Although convergence is guaranteed for Widrow-Hoff learning in Adalines, in RL tasks convergence of gradient descent cannot always be guaranteed even for Adalines (Boyan & Moore, 1995). This is because the agent itself has to explore the environment and collect training samples by doing so. If the action selection policy pi does not make the agent collect relevant and representative training samples, then learning may fail to converge to a good solution. Therefore, the action selection policy must provide sufficient exploration of the environment to ensure that good training samples are collected. At the same time, the goal of training is to improve the performance of the agent, i.e. the action selection policy, so that the learned model can be exploited. Balancing the exploration/exploitation trade-off is one of the most difficult problems in RL for control (Thrun, 1992). A random search policy achieves maximal exploration, while a greedy policy gives maximal exploitation by always taking the action that has the highest action value.
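The random and greedy policies above, together with the ε-greedy and Boltzmann rules discussed below, can be sketched as follows (a minimal illustration with my own function names, assuming action values are held in a list):

```python
import math
import random

def greedy(q_values):
    """Maximal exploitation: pick the action with the highest value estimate."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy(q_values, epsilon):
    """Greedy with probability 1 - epsilon, uniform random otherwise."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return greedy(q_values)

def boltzmann(q_values, temperature):
    """Selection probabilities proportional to exp(Q/T), as in equation (5)."""
    weights = [math.exp(q / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]

random.seed(0)
q = [0.2, 0.8, 0.5]
picks = [epsilon_greedy(q, 0.1) for _ in range(1000)]
print(picks.count(greedy(q)) / 1000)  # mostly the greedy action 1
```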
A commonly used method for balancing exploration and exploitation is ε-greedy exploration, where the greedy action is selected with probability (1 - ε) and an arbitrary action is selected with probability ε using a uniform probability distribution. This method is an undirected exploration method in the sense that it does not use any task-specific information. Another undirected exploration method selects actions according to Boltzmann-distributed probabilities

    Prob(a_i) = exp(Q(s, a_i)/T) / sum_k exp(Q(s, a_k)/T)    (5)

where T (called temperature) adjusts the randomness of action selection. The main difference between this method and ε-greedy exploration is that non-greedy actions are selected with a probability that is proportional to their current value estimate, while ε-greedy exploration selects non-greedy actions randomly.

Directed exploration uses task-specific knowledge for guiding exploration. Many such methods guide the exploration so that the entire state space would be explored in order to learn the value function as well as possible. In real-world tasks exhaustive exploration may be impossible or dangerous. However, a technique called optimistic initial values offers a possibility of encouraging exploration of previously un-encountered states mainly in the beginning of exploration. It can be implemented by using initial value function estimates that are bigger than the expected ones. This gives the effect that unused actions have bigger action value estimates than used ones, so unused actions tend to be selected rather than already used actions. When all actions have been used a sufficient number of times, the true value function overrides the initial value function estimates. In this paper, ε-greedy exploration is used for exploration. The effect of using optimistic initial values on learning is also studied.

2.2.2 Delayed reward

When a reward is not immediate for every state transition, rewards somehow need to be propagated backwards through the state history. Temporal Difference (TD) methods (Sutton, 1988) are currently the most used RL methods for handling delayed reward. TD methods update the value of a state not only based on immediate reward, but also based on the discounted value of the next state of the agent.
Therefore TD methods update the value function on every state transition, not only after transitions that result in a direct reward.

(Footnote: Thrun (1992) calls this semi-uniform distributed exploration.)

When a reward has been temporally back-propagated a sufficient number of times, these discounted reward values can be used as target values for gradient descent learning (Barto, Sutton & Watkins, 1990). Such gradient descent learning allows using almost any ANN as the model for RL, but unfortunately the MDP assumption underlying TD methods often gives convergence problems. Delayed reward tasks are out of the scope of this paper, which is the reason why such tasks are not analysed more deeply here. Good overviews on delayed reward are (Kaelbling, Littman & Moore, 1996) and (Sutton & Barto, 1998). The main goal of this paper is to show how scaling the Adaline learning rate improves learning, which is illustrated using an immediate reward RL task.

3 SCALED LEARNING RATE

In methods based on gradient descent, the learning rate has a great influence on the learning speed and on whether learning succeeds at all. In this section it is shown why the learning rate should be scaled as a function of the input values of Adaline-type ANNs. Scaling the learning rate of other ANNs is also discussed.

3.1 Adaline learning rate

If we combine equations (2) and (3), we obtain the following expression for the new output value after updating the Adaline weights using the Widrow-Hoff learning rule:

    y_i^new(s) = sum_j (w_{i,j} + alpha*(t_i - y_i)*s_j) * s_j
               = sum_j w_{i,j}*s_j + alpha*(t_i - y_i) * sum_j s_j^2
               = y_i + alpha * (t_i - y_i) * sum_j s_j^2    (6)

where y_i^new(s) is the new output value for output i after the weight update when the input values s are presented again. If the learning rate is set to alpha = 1, then y_i^new(s) becomes exactly equal to t_i if the expression

    s_j * (t_i - y_i)    (7)

is multiplied by

    1 / sum_j s_j^2    (8)

Then, by continuing from equation (6):

    y_i^new(s) = y_i + alpha * (t_i - y_i) * (sum_j s_j^2 / sum_j s_j^2)
               = y_i + alpha * (t_i - y_i)    (9)

when the learning rate alpha is multiplied by expression (8). Multiplying (7) by (8) scales the weight modification in such a way that the new output value will approach the target value with the ratio given by the learning rate, independently of the input values s_j. In the rest of the paper, the term scaled learning rate (slr) will be used for denoting a learning rate that is multiplied by expression (8).
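As a sketch (the function names and numbers are my own, not from the paper's implementation), one Widrow-Hoff update (3) with the learning rate multiplied by expression (8) behaves exactly as equation (9) states: the output moves towards the target by the fraction slr, regardless of the input magnitude:

```python
def scaled_update(weights, inputs, target, slr):
    """Widrow-Hoff update (3) with the rate divided by sum(s_j^2), expr. (8)."""
    y = sum(w * s for w, s in zip(weights, inputs))
    lr = slr / sum(s * s for s in inputs)   # the squared sum must not be zero
    return [w + lr * (target - y) * s for w, s in zip(weights, inputs)]

for inputs in ([10.0, 12.0, 9.0], [80.0, 75.0, 78.0]):  # small vs large inputs
    w0 = [0.0, 0.0, 0.0]
    w1 = scaled_update(w0, inputs, target=1.0, slr=0.8)
    y1 = sum(w * s for w, s in zip(w1, inputs))
    print(round(y1, 6))  # 0.8 of the way from 0 to the target, for both scales
```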
If the value of the (un-scaled) learning rate is greater than the value given by expression (8), then the weights are modified so that the new output value will be on the opposite side of the target value in the gradient direction. Such overshooting easily leads to uncontrolled weight modifications, where weight values tend to oscillate towards infinity. This kind of weight oscillation usually makes learning fail completely.

The squared sum of input values in expression (8) cannot be allowed to be zero. This can only happen if all input values s_j are zero. However, the bias term used in most Adaline implementations avoids this. A bias term is a supplementary input with a constant value of one, by which expression (2) can represent any linear function, no matter what the input space dimensionality is. Most multi-layer ANNs implicitly also avoid situations where all Adaline input values would be zero at the same time. This is studied in more detail in the following section.

3.2 Non-linear, multi-layer ANNs

In ANNs, neurons are usually organized into layers, where the output values of neurons in a layer are independent of each other and can therefore be calculated simultaneously. Figure 1 shows a feed-forward ANN with one input and one output (there may be an arbitrary number of both). ANN input values are distributed as input values to the neurons of the hidden layer. Hidden neurons usually have a non-linear output function with values limited to the range [0, 1]. Some non-linear output functions have values limited to the range [-1, 1].

Figure 1. Three-layer feed-forward ANN with sigmoid outputs in the hidden layer and a linear output layer.

A commonly used output function for hidden neurons is the sigmoid function

    o = f(A) = 1 / (1 + e^(-A))    (10)

where o is the output value and A is the weighted sum of the inputs (2). The output neurons of the output layer may be linear or non-linear. If they are linear, they are usually Adalines. Learning in multi-layer ANNs can be performed in many ways. For ANNs like the one in Figure 1, a gradient descent method called back-propagation is the most used one (Rumelhart et al., 1988). For such learning, two questions arise:
- Is it possible to use a scaled learning rate also for non-linear neurons?
- If the output layer (or some other layer) is an Adaline, is it then useful to use the scaled learning rate?

The answer to the first question is probably no. The reason for this is that functions like the sigmoid function (10) do not allow arbitrary output values. Therefore, if the target value is outside the range of possible output values, then it is not possible to find weights that would modify the output value so that it would become exactly equal to the target value. Instead, the learning rate of non-linear neurons is usually dynamically adapted depending on an estimation of the steepness of the gradient over several training steps (Haykin, 1999).

The answer to the second question is maybe. If Adaline inputs are limited to the range [0, 1], then the squared sum in (8) remains limited. Still, in the beginning of training, hidden neuron outputs are generally very small. Then a scaled learning rate might accelerate output layer learning, while slowing down when hidden neurons become more activated. This could be an interesting direction of future research to investigate.

4 EXPERIMENTAL RESULTS

Experiments were performed using a robot built with the Lego Mindstorms Robotics Invention System (RIS). The RIS offers a cheap, standard and simple platform for performing real-world tests.
In addition to Lego building blocks, it includes two electrical motors, two touch sensors and one light sensor. The main block contains a small computer (RCX) with connectors for motors and sensors. Among others, the Java programming language can be used for programming the RCX.

Figure 2. Lego Mindstorms robot. The light sensor is at the top in the front, directed forwards. One touch sensor is installed at the front and another at the rear.

The robot had one motor on each side, touch sensors in the front and in the back, and a light sensor directed straight forward mounted in the front (Figure 2). Robots usually have more than one light sensor, which were simulated here by turning the robot around and getting light readings from three different directions. One light reading was from the direction straight forward and the two others about 25 degrees left/right, obtained by letting one motor go forward and the other motor backward for 250 milliseconds and then inversing the operation. The light sensor reading from the forward direction after performing an action is directly used as the reward value, thus avoiding hand tuning of the reward function. Five actions are used, which consist in doing one of the following motor commands for 450 milliseconds: 1) both motors forward, 2/3) one forward, the other stopped, 4/5) one forward, the other backward. Going straight forward means advancing about 5 cm, actions 2/3 going forward about 2 cm and turning about 15 degrees, and actions 4/5 turning about 40 degrees without advancing.
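The learning setup just described (three light readings as state variables, five actions, the forward light reading as immediate reward) can be sketched as a single learning step. This is my own illustrative reconstruction, not the paper's code: `read_reward` stands in for the robot's light sensor, and all names and numbers are assumptions.

```python
import random

N_ACTIONS, N_INPUTS = 5, 3   # 5 motor commands, 3 light readings

def one_learning_step(weights, state, epsilon, slr, read_reward):
    """One immediate-reward RL step: epsilon-greedy action choice, then a
    Widrow-Hoff update with the scaled learning rate on the output of the
    chosen action only (target values exist only for used actions)."""
    q = [sum(w * s for w, s in zip(weights[a], state)) for a in range(N_ACTIONS)]
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q[a])
    reward = read_reward(action)             # forward light reading after acting
    lr = slr / sum(s * s for s in state)     # scaled learning rate
    weights[action] = [w + lr * (reward - q[action]) * s
                       for w, s in zip(weights[action], state)]
    return action

random.seed(1)
# optimistic initial weights in [0, 1)
weights = [[random.random() for _ in range(N_INPUTS)] for _ in range(N_ACTIONS)]
state = [10.0, 12.0, 9.0]                    # left, middle, right light readings
one_learning_step(weights, state, epsilon=0.1, slr=0.5,
                  read_reward=lambda a: 11.0)
```

After the update, the chosen action's value estimate has moved the fraction slr of the way towards the observed reward, as in equation (9).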

The robot starts about 100 centimetres from the lamp, initially directed straight towards it. Reaching a light value of 80 out of 100 signifies that the goal is reached, which means one to fifteen centimetres from the lamp depending on the approach direction and sensor noise. In order to reach the goal light value, the robot has to be very precisely directed straight towards the lamp. The lamp is on the floor level and gives directed light in a half-sphere in front of it. If the robot hits an obstacle or drives behind the lamp, then it is manually put back to the start position and direction.

The test room is an office room with noise due to floor reflections, walls and shelves with different colours etc. The robot itself is also a source of noise due to imprecise motor movements, battery charge etc. However, the light sensor is clearly the biggest source of noise, as shown in Figure 3, where light sensor samples are indicated for two different levels of luminosity.

Figure 3. 50 light samples for two different light conditions, taken at 50-millisecond intervals. Average values are approximately 10 and 60. Raw values are shown to the left, value distributions to the right.

Table 1. Hand-coded weights. One row per action, one column per state variable (light sensor reading).

    Action          Left   Middle   Right
    Forward         0.1    0.8      0.1
    Left/forward    0.6    0.3      0.1
    Right/forward   0.1    0.3      0.6
    Left            0.6    0.2      0.2
    Right           0.2    0.2      0.6

When using an ANN there is one output per action, where the output value corresponds to the action-value estimate of the corresponding action. With five actions and three state variables, a 5x3 weight matrix can represent the weights (no bias input is used here). A hand-coded agent with predefined weights (Table 1) was used in order to prove that an Adaline linear function approximator can solve the control task, and as a reference for judging how good the performance is for learning agents. These weights were determined based on the principle that if the light value is greatest in the middle, then make the forward-going action have the biggest output value.
In the same way, if the light value is greater to the left, then favour some left-turning action, and vice versa for the right side. The hand-coded agent reached the goal after a 30-episode average of about 70 steps.

Learning agents used the same Adaline architecture as the hand-coded agent. Weights are modified by the Widrow-Hoff training rule (3). All agents used ε-greedy exploration with ε = 0.1, which seemed to be the best value after experimentation. Tests were performed both with weights initialised to small random values in the range [0, 0.1) and with weights having optimistic initial values in the range [0, 1). Such weights are optimistic because their expected sum per action neuron is 1.5, while weight values after training should converge to values whose sum is close to one. This is because state variable values and reward values are all light sensor readings, so the estimated reward value should be close to at least one of the state variable values. If the RL is successful, then the estimated reward should even be a little bit bigger, since the goal by definition of RL is to make the agent move towards states giving higher reward.

An un-scaled learning rate value of 0.0001 was used after a lot of experimentation. This value is a compromise. Far from the lamp, light sensor values are about 10, so expression (8) gives the value 1/(10^2 + 10^2 + 10^2) = 0.00333. Close to the lamp, light sensor values approach 80, so the corresponding value would be about 0.00005. According to expression (8), the un-scaled learning rate value 0.0001 corresponds to light sensor values around 58, which is already close to the lamp. Therefore, excessive weight modifications probably occur close to the lamp, but then the number of steps remaining to reach the goal is usually so small that the weights do not have time to oscillate.

Figure 4. Comparison between static (lr = 0.0001) and scaled (slr = 0.8) learning rate. Averages over multiple runs.

Figure 4 compares the performance of an agent using the un-scaled learning rate 0.0001 and an agent using slr = 0.8. With this scaled learning rate, the first episode is slightly faster. Convergence is also much smoother with the scaled learning rate than with the un-scaled learning rate. The statistics shown in Table 2 further emphasize the advantage of using a scaled learning rate. The total number of steps is slightly smaller, but the average length of the last five episodes is clearly lower for the agents using the scaled learning rate. This is most probably because the un-scaled learning rate sometimes causes excessive weight modifications that prevent the agent from converging to optimal weights. The number of manual resets is a further indication of excessive weight modifications. One bad light sensor reading may be sufficient to make the robot get stuck into using the same action for several steps. If that action happens to be going straight forward, then the robot usually hits a wall after a few steps. Reducing the value of the un-scaled learning rate could reduce this phenomenon, but it would also make learning slower.

Table 2. Statistics for agents using different learning rates. Averages over multiple runs.

    Agent         Total   Aver. 5 last   Man. resets
    lr = 0.0001     …          …              …
    slr = 0.1       …          …              …
    slr = 0.5       …          …              …
    slr = 0.8       …          …              …

Figure 5 compares the performance of agents using different values for the scaled learning rate. All graphs are smooth and converge nearly as rapidly. This result shows that the scaled learning rate is tolerant to different values. The meaning of the scaled learning rate is also easy to understand (i.e. the percentage of modification of the output value towards the target value), so determining a good value for the scaled learning rate is easier than for the un-scaled learning rate.

Figure 5. Results for different values of the scaled learning rate (slr = 0.1, 0.5, 0.8). Averages over multiple runs.

Figure 6 and Table 3 compare the performance of agents whose weights are initialised with random values from the range [0, 0.1) and agents whose weights are initialised with optimistic initial values, i.e. random values from the range [0, 1). Using optimistic initial values clearly gives faster initial exploration.
The number of manual resets with optimistic initial values is also lower for agents using slr = 0.5 and slr = 0.8, but it is instead higher for lr = 0.0001 and slr = 0.1. Finally, when setting ε to 0 after the training episodes, i.e. always taking the greedy action, the trained agents had identical performance to the hand-coded agent. However, one should remember that learning agents could also adapt to changes in the environment or differences in sensor sensitivity, for instance.

Figure 6. Results for different initial weights. Averages over multiple runs.

Table 3. Statistics for different initial weights. Averages over multiple runs.

    Initial weights   Total   Aver. 5 last   Man. resets
    [0, 0.1)            …          …              …
    [0, 1)              …          …              …

5 RELATED WORK

The amount of literature on gradient descent learning is abundant. One of the most recent and exhaustive sources on the subject is (Haykin, 1999). Adjusting the Adaline learning rate has been studied previously at least by Luo (1991), who shows that the Adaline learning rate should be reduced during learning in order to avoid cyclically jumping around the optimal solution. References in (Luo, 1991) also offer a good overview of research concerning the gradient descent learning rate. However, to the author's knowledge, the concept of a scaled learning rate introduced in this paper is new.

RL has been used in many robotic tasks, but most of them have been performed in simulated environments. Only a few results have been reported

on the use of RL on real robots. The experimental setting used here resembles behaviour learning performed by Lin (1991) and Mahadevan & Connell (1992). Behavioural tasks treated by them include wall following, going through a door, docking into a charger (guided by light sensors), finding boxes, pushing boxes and getting un-wedged from stalled states. Some of these behaviours are more challenging than the light-seeking behaviour used in this paper, but the simple linear Adaline model used here for state generalization greatly simplifies the learning task compared to previous work. An example of non-RL work on light-seeking robots is Lebeltel et al. (2004).

6 CONCLUSIONS

One of the most important advantages of the scaled learning rate presented in this paper is that it is easy to understand the signification of the values used for it. Evidence is also shown that the scaled learning rate improves learning because it makes the network output values approach the corresponding target values by a similar amount independently of the input values. Experimental results with a real-world light-seeking robot illustrate the improvement in learning results obtained by using the scaled learning rate.

It seems rather surprising that a scaled learning rate has not been used before, to the author's best knowledge. One explanation might be that in supervised learning tasks, the training samples are usually available beforehand, which makes it possible to normalize them into suitable values. In real-world RL tasks, with constraints on learning time and the availability of training samples, this may not be possible. Using multi-layer non-linear ANNs also might reduce the utility of scaling the learning rate, as explained in Section 3.2.

In addition to the scaled learning rate, the RL exploration/exploitation trade-off is also addressed in the paper. The exploration policy used determines the quality of collected training samples and therefore greatly affects learning speed and the quality of learned solutions. Empirical results are shown mainly on the advantages of using optimistic initial values for the network weights when possible. Future work includes improving exploration policies and handling delayed reward.
Obtaining further results on the use of the scaled learning rate for other than RL tasks would also be useful.

ACKNOWLEDGEMENTS

I would like to thank Brian Bagnall for writing the article "Building a Light-Seeking Robot with Q-Learning", published on-line on April 9th. It gave me the idea to use the Lego Mindstorms kit, and his source code was of valuable help.

REFERENCES

Barto, A.G., Sutton, R.S., Watkins, C.J.C.H. (1990). Learning and Sequential Decision Making. In M. Gabriel and J. Moore (eds.), Learning and computational neuroscience: foundations of adaptive networks. M.I.T. Press.

Boyan, J.A., Moore, A.W. (1995). Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., Touretzky, D., Leen, T. (eds.), NIPS 1994 proceedings, Vol. 7. MIT Press.

Haykin, S. (1999). Neural Networks - a comprehensive foundation. Prentice-Hall, New Jersey, USA.

Kaelbling, L.P., Littman, M.L., Moore, A.W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, Vol. 4, 237-285.

Lebeltel, O., Bessière, P., Diard, J., Mazer, E. (2004). Bayesian Robot Programming. Autonomous Robots, Vol. 16.

Lin, L.-J. (1991). Programming robots using reinforcement learning and teaching. In Proc. of the Ninth National Conference on Artificial Intelligence (AAAI).

Luo, Z. (1991). On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, Vol. 3.

Mahadevan, S., Connell, J. (1992). Automatic Programming of Behavior-based Robots using Reinforcement Learning. Artificial Intelligence, Vol. 55, Nos. 2-3.

Rumelhart, D.E., McClelland, J.L. et al. (1988). Parallel Distributed Processing, Vol. 1. MIT Press, Massachusetts.

Sutton, R.S. (1988). Learning to predict by the method of temporal differences. Machine Learning, Vol. 3, 9-44.

Sutton, R.S., Barto, A.G. (1998). Reinforcement Learning. MIT Press, Cambridge, MA.

Thrun, S.B. (1992). The role of exploration in learning control. In D.A. White & D.A. Sofge (eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, New York.

Widrow, B., Hoff, M.E. (1960). Adaptive switching circuits. 1960 WESCON Convention Record, Part IV, Institute of Radio Engineers, New York, 96-104.


More information

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus ESI 34 tmospherc Dnmcs I Lesson 1 Vectors nd Vector lculus Reference: Schum s Outlne Seres: Mthemtcl Hndbook of Formuls nd Tbles Suggested Redng: Mrtn Secton 1 OORDINTE SYSTEMS n orthonorml coordnte sstem

More information

Variable time amplitude amplification and quantum algorithms for linear algebra. Andris Ambainis University of Latvia

Variable time amplitude amplification and quantum algorithms for linear algebra. Andris Ambainis University of Latvia Vrble tme mpltude mplfcton nd quntum lgorthms for lner lgebr Andrs Ambns Unversty of Ltv Tlk outlne. ew verson of mpltude mplfcton;. Quntum lgorthm for testng f A s sngulr; 3. Quntum lgorthm for solvng

More information

INTRODUCTION TO COMPLEX NUMBERS

INTRODUCTION TO COMPLEX NUMBERS INTRODUCTION TO COMPLEX NUMBERS The numers -4, -3, -, -1, 0, 1,, 3, 4 represent the negtve nd postve rel numers termed ntegers. As one frst lerns n mddle school they cn e thought of s unt dstnce spced

More information

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands In ths Chpter Chp. 3 Mrov chns nd hdden Mrov models Bontellgence bortory School of Computer Sc. & Eng. Seoul Ntonl Unversty Seoul 5-74, Kore The probblstc model for sequence nlyss HMM (hdden Mrov model)

More information

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p*

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p* R. Smpth Kumr, R. Kruthk, R. Rdhkrshnn / Interntonl Journl of Engneerng Reserch nd Applctons (IJERA) ISSN: 48-96 www.jer.com Vol., Issue 4, July-August 0, pp.5-58 Constructon Of Mxed Smplng Plns Indexed

More information

Two Coefficients of the Dyson Product

Two Coefficients of the Dyson Product Two Coeffcents of the Dyson Product rxv:07.460v mth.co 7 Nov 007 Lun Lv, Guoce Xn, nd Yue Zhou 3,,3 Center for Combntorcs, LPMC TJKLC Nnk Unversty, Tnjn 30007, P.R. Chn lvlun@cfc.nnk.edu.cn gn@nnk.edu.cn

More information

Demand. Demand and Comparative Statics. Graphically. Marshallian Demand. ECON 370: Microeconomic Theory Summer 2004 Rice University Stanley Gilbert

Demand. Demand and Comparative Statics. Graphically. Marshallian Demand. ECON 370: Microeconomic Theory Summer 2004 Rice University Stanley Gilbert Demnd Demnd nd Comrtve Sttcs ECON 370: Mcroeconomc Theory Summer 004 Rce Unversty Stnley Glbert Usng the tools we hve develoed u to ths ont, we cn now determne demnd for n ndvdul consumer We seek demnd

More information

Online Appendix to. Mandating Behavioral Conformity in Social Groups with Conformist Members

Online Appendix to. Mandating Behavioral Conformity in Social Groups with Conformist Members Onlne Appendx to Mndtng Behvorl Conformty n Socl Groups wth Conformst Members Peter Grzl Andrze Bnk (Correspondng uthor) Deprtment of Economcs, The Wllms School, Wshngton nd Lee Unversty, Lexngton, 4450

More information

International Journal of Pure and Applied Sciences and Technology

International Journal of Pure and Applied Sciences and Technology Int. J. Pure Appl. Sc. Technol., () (), pp. 44-49 Interntonl Journl of Pure nd Appled Scences nd Technolog ISSN 9-67 Avlle onlne t www.jopst.n Reserch Pper Numercl Soluton for Non-Lner Fredholm Integrl

More information

A Tri-Valued Belief Network Model for Information Retrieval

A Tri-Valued Belief Network Model for Information Retrieval December 200 A Tr-Vlued Belef Networ Model for Informton Retrevl Fernndo Ds-Neves Computer Scence Dept. Vrgn Polytechnc Insttute nd Stte Unversty Blcsburg, VA 24060. IR models t Combnng Evdence Grphcl

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

Improving Anytime Point-Based Value Iteration Using Principled Point Selections

Improving Anytime Point-Based Value Iteration Using Principled Point Selections In In Proceedngs of the Twenteth Interntonl Jont Conference on Artfcl Intellgence (IJCAI-7) Improvng Anytme Pont-Bsed Vlue Iterton Usng Prncpled Pont Selectons Mchel R. Jmes, Mchel E. Smples, nd Dmtr A.

More information

Smart Motorways HADECS 3 and what it means for your drivers

Smart Motorways HADECS 3 and what it means for your drivers Vehcle Rentl Smrt Motorwys HADECS 3 nd wht t mens for your drvers Vehcle Rentl Smrt Motorwys HADECS 3 nd wht t mens for your drvers You my hve seen some news rtcles bout the ntroducton of Hghwys Englnd

More information

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning 3/6/ Hdden Mrkov Models: Explnton nd Model Lernng Brn C. Wllms 6.4/6.43 Sesson 2 9/3/ courtesy of JPL copyrght Brn Wllms, 2 Brn C. Wllms, copyrght 2 Redng Assgnments AIMA (Russell nd Norvg) Ch 5.-.3, 2.3

More information

A Reinforcement Learning System with Chaotic Neural Networks-Based Adaptive Hierarchical Memory Structure for Autonomous Robots

A Reinforcement Learning System with Chaotic Neural Networks-Based Adaptive Hierarchical Memory Structure for Autonomous Robots Interntonl Conference on Control, Automton nd ystems 008 Oct. 4-7, 008 n COEX, eoul, Kore A Renforcement ernng ystem wth Chotc Neurl Networs-Bsed Adptve Herrchcl Memory tructure for Autonomous Robots Msno

More information

Electrochemical Thermodynamics. Interfaces and Energy Conversion

Electrochemical Thermodynamics. Interfaces and Energy Conversion CHE465/865, 2006-3, Lecture 6, 18 th Sep., 2006 Electrochemcl Thermodynmcs Interfces nd Energy Converson Where does the energy contrbuton F zϕ dn come from? Frst lw of thermodynmcs (conservton of energy):

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations Smultneous estmton of rewrds nd dynmcs from nosy expert demonstrtons Mchel Hermn,2, Tobs Gndele, Jo rg Wgner, Felx Schmtt, nd Wolfrm Burgrd2 - Robert Bosch GmbH - 70442 Stuttgrt - Germny 2- Unversty of

More information

Statistics 423 Midterm Examination Winter 2009

Statistics 423 Midterm Examination Winter 2009 Sttstcs 43 Mdterm Exmnton Wnter 009 Nme: e-ml: 1. Plese prnt your nme nd e-ml ddress n the bove spces.. Do not turn ths pge untl nstructed to do so. 3. Ths s closed book exmnton. You my hve your hnd clcultor

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ M/G//GD/ / System! Pollcze-Khnchn (PK) Equton L q 2 2 λ σ s 2( + ρ ρ! Stedy-stte probbltes! π 0 ρ! Fndng L, q, ) 2 2 M/M/R/GD/K/K System! Drw the trnston dgrm! Derve the stedy-stte probbltes:! Fnd L,L

More information

Computing a complete histogram of an image in Log(n) steps and minimum expected memory requirements using hypercubes

Computing a complete histogram of an image in Log(n) steps and minimum expected memory requirements using hypercubes Computng complete hstogrm of n mge n Log(n) steps nd mnmum expected memory requrements usng hypercubes TAREK M. SOBH School of Engneerng, Unversty of Brdgeport, Connectcut, USA. Abstrct Ths work frst revews

More information

THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR

THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR REVUE D ANALYSE NUMÉRIQUE ET DE THÉORIE DE L APPROXIMATION Tome 32, N o 1, 2003, pp 11 20 THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR TEODORA CĂTINAŞ Abstrct We extend the Sheprd opertor by

More information

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past Usng Predctons n Onlne Optmzton: Lookng Forwrd wth n Eye on the Pst Nngjun Chen Jont work wth Joshu Comden, Zhenhu Lu, Anshul Gndh, nd Adm Wermn 1 Predctons re crucl for decson mkng 2 Predctons re crucl

More information

Review of linear algebra. Nuno Vasconcelos UCSD

Review of linear algebra. Nuno Vasconcelos UCSD Revew of lner lgebr Nuno Vsconcelos UCSD Vector spces Defnton: vector spce s set H where ddton nd sclr multplcton re defned nd stsf: ) +( + ) (+ )+ 5) λ H 2) + + H 6) 3) H, + 7) λ(λ ) (λλ ) 4) H, - + 8)

More information

Solution of Tutorial 5 Drive dynamics & control

Solution of Tutorial 5 Drive dynamics & control ELEC463 Unversty of New South Wles School of Electrcl Engneerng & elecommunctons ELEC463 Electrc Drve Systems Queston Motor Soluton of utorl 5 Drve dynmcs & control 500 rev/mn = 5.3 rd/s 750 rted 4.3 Nm

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

1 Probability Density Functions

1 Probability Density Functions Lis Yn CS 9 Continuous Distributions Lecture Notes #9 July 6, 28 Bsed on chpter by Chris Piech So fr, ll rndom vribles we hve seen hve been discrete. In ll the cses we hve seen in CS 9, this ment tht our

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Introduction to Numerical Integration Part II

Introduction to Numerical Integration Part II Introducton to umercl Integrton Prt II CS 75/Mth 75 Brn T. Smth, UM, CS Dept. Sprng, 998 4/9/998 qud_ Intro to Gussn Qudrture s eore, the generl tretment chnges the ntegrton prolem to ndng the ntegrl w

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Physics 121 Sample Common Exam 2 Rev2 NOTE: ANSWERS ARE ON PAGE 7. Instructions:

Physics 121 Sample Common Exam 2 Rev2 NOTE: ANSWERS ARE ON PAGE 7. Instructions: Physcs 121 Smple Common Exm 2 Rev2 NOTE: ANSWERS ARE ON PAGE 7 Nme (Prnt): 4 Dgt ID: Secton: Instructons: Answer ll 27 multple choce questons. You my need to do some clculton. Answer ech queston on the

More information

Non-Linear Data for Neural Networks Training and Testing

Non-Linear Data for Neural Networks Training and Testing Proceedngs of the 4th WSEAS Int Conf on Informton Securty, Communctons nd Computers, Tenerfe, Spn, December 6-8, 005 (pp466-47) Non-Lner Dt for Neurl Networks Trnng nd Testng ABDEL LATIF ABU-DALHOUM MOHAMMED

More information

Effects of polarization on the reflected wave

Effects of polarization on the reflected wave Lecture Notes. L Ros PPLIED OPTICS Effects of polrzton on the reflected wve Ref: The Feynmn Lectures on Physcs, Vol-I, Secton 33-6 Plne of ncdence Z Plne of nterfce Fg. 1 Y Y r 1 Glss r 1 Glss Fg. Reflecton

More information

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters Stte Estmton n PN nd PPN Gudnce Lws by Usng Unscented nd Extended Klmn Flters S.H. oospour*, S. oospour**, mostf.sdollh*** Fculty of Electrcl nd Computer Engneerng, Unversty of brz, brz, Irn, *s.h.moospour@gml.com

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization On-lne Renforcement Lernng Usng Incrementl Kernel-Bsed Stochstc Fctorzton André M. S. Brreto School of Computer Scence McGll Unversty Montrel, Cnd msb@cs.mcgll.c Don Precup School of Computer Scence McGll

More information

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity Interntonl Journl of Engneerng & Computer Scence IJECS-IJENS Vol:1 No:04 81 Two Actvton Functon Wvelet Network for the Identfcton of Functons wth Hgh Nonlnerty Wsm Khld Abdulkder Abstrct-- The ntegrton

More information

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk Chpter 5 Supplementl Text Mterl 5-. Expected Men Squres n the Two-fctor Fctorl Consder the two-fctor fxed effects model y = µ + τ + β + ( τβ) + ε k R S T =,,, =,,, k =,,, n gven s Equton (5-) n the textook.

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

Investigation phase in case of Bragg coupling

Investigation phase in case of Bragg coupling Journl of Th-Qr Unversty No.3 Vol.4 December/008 Investgton phse n cse of Brgg couplng Hder K. Mouhmd Deprtment of Physcs, College of Scence, Th-Qr, Unv. Mouhmd H. Abdullh Deprtment of Physcs, College

More information

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G.

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G. CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION Indu Mnckm, Andrew S. Ln, nd Rchrd G. Brnuk Rce Unversty ABSTRACT Optmzng the selecton of lernng resources nd prctce

More information

7.2 Volume. A cross section is the shape we get when cutting straight through an object.

7.2 Volume. A cross section is the shape we get when cutting straight through an object. 7. Volume Let s revew the volume of smple sold, cylnder frst. Cylnder s volume=se re heght. As llustrted n Fgure (). Fgure ( nd (c) re specl cylnders. Fgure () s rght crculr cylnder. Fgure (c) s ox. A

More information

CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS

CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVES Rodolphe Prm, Ntle Shlomo Southmpton Sttstcl Scences Reserch Insttute Unverst of Southmpton Unted Kngdom SAE, August 20 The BLUE-ETS Project s fnnced

More information

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order Hndw Publshng Corporton Interntonl Journl of Dfferentl Equtons Volume 0, Artcle ID 7703, pges do:055/0/7703 Reserch Artcle On the Upper Bounds of Egenvlues for Clss of Systems of Ordnry Dfferentl Equtons

More information

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example.

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example. CIS587 - rtfcl Intellgence Uncertnty K for medcl dgnoss. Exmple. We wnt to uld K system for the dgnoss of pneumon. rolem descrpton: Dsese: pneumon tent symptoms fndngs, l tests: Fever, Cough, leness, WC

More information

arxiv: v2 [cs.lg] 9 Nov 2017

arxiv: v2 [cs.lg] 9 Nov 2017 Renforcement Lernng under Model Msmtch Aurko Roy 1, Hun Xu 2, nd Sebstn Pokutt 2 rxv:1706.04711v2 cs.lg 9 Nov 2017 1 Google Eml: urkor@google.com 2 ISyE, Georg Insttute of Technology, Atlnt, GA, USA. Eml:

More information

Machine Learning Support Vector Machines SVM

Machine Learning Support Vector Machines SVM Mchne Lernng Support Vector Mchnes SVM Lesson 6 Dt Clssfcton problem rnng set:, D,,, : nput dt smple {,, K}: clss or lbel of nput rget: Construct functon f : X Y f, D Predcton of clss for n unknon nput

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9 CS434/541: Pttern Recognton Prof. Olg Veksler Lecture 9 Announcements Fnl project proposl due Nov. 1 1-2 prgrph descrpton Lte Penlt: s 1 pont off for ech d lte Assgnment 3 due November 10 Dt for fnl project

More information

10/20/2009. Announcements. Last Time: Q-Learning. Example: Pacman. Function Approximation. Feature-Based Representations

10/20/2009. Announcements. Last Time: Q-Learning. Example: Pacman. Function Approximation. Feature-Based Representations Introducton to Artfcl Intellgence V.047-00 Fll 009 Lecture : Renforcement Lernng Announcement Agnment due next Mondy t mdnght Plee end eml to me bout fnl exm Rob Fergu Dept of Computer Scence, Cournt Inttute,

More information

The Schur-Cohn Algorithm

The Schur-Cohn Algorithm Modelng, Estmton nd Otml Flterng n Sgnl Processng Mohmed Njm Coyrght 8, ISTE Ltd. Aendx F The Schur-Cohn Algorthm In ths endx, our m s to resent the Schur-Cohn lgorthm [] whch s often used s crteron for

More information

Bi-level models for OD matrix estimation

Bi-level models for OD matrix estimation TNK084 Trffc Theory seres Vol.4, number. My 2008 B-level models for OD mtrx estmton Hn Zhng, Quyng Meng Abstrct- Ths pper ntroduces two types of O/D mtrx estmton model: ME2 nd Grdent. ME2 s mxmum-entropy

More information

4.4 Areas, Integrals and Antiderivatives

4.4 Areas, Integrals and Antiderivatives . res, integrls nd ntiderivtives 333. Ares, Integrls nd Antiderivtives This section explores properties of functions defined s res nd exmines some connections mong res, integrls nd ntiderivtives. In order

More information

Let us look at a linear equation for a one-port network, for example some load with a reflection coefficient s, Figure L6.

Let us look at a linear equation for a one-port network, for example some load with a reflection coefficient s, Figure L6. ECEN 5004, prng 08 Actve Mcrowve Crcut Zoy Popovc, Unverty of Colordo, Boulder LECURE 5 IGNAL FLOW GRAPH FOR MICROWAVE CIRCUI ANALYI In mny text on mcrowve mplfer (e.g. the clc one by Gonzlez), gnl flow-grph

More information

Reinforcement Learning with a Gaussian Mixture Model

Reinforcement Learning with a Gaussian Mixture Model Renforcement Lernng wth Gussn Mxture Model Alejndro Agostn, Member, IEEE nd Enrc Cely Abstrct Recent pproches to Renforcement Lernng (RL) wth functon pproxmton nclude Neurl Ftted Q Iterton nd the use of

More information

Fitting a Polynomial to Heat Capacity as a Function of Temperature for Ag. Mathematical Background Document

Fitting a Polynomial to Heat Capacity as a Function of Temperature for Ag. Mathematical Background Document Fttng Polynol to Het Cpcty s Functon of Teperture for Ag. thetcl Bckground Docuent by Theres Jul Zelnsk Deprtent of Chestry, edcl Technology, nd Physcs onouth Unversty West ong Brnch, J 7764-898 tzelns@onouth.edu

More information

Identification of Robot Arm s Joints Time-Varying Stiffness Under Loads

Identification of Robot Arm s Joints Time-Varying Stiffness Under Loads TELKOMNIKA, Vol.10, No.8, December 2012, pp. 2081~2087 e-issn: 2087-278X ccredted by DGHE (DIKTI), Decree No: 51/Dkt/Kep/2010 2081 Identfcton of Robot Arm s Jonts Tme-Vryng Stffness Under Lods Ru Xu 1,

More information

A Family of Multivariate Abel Series Distributions. of Order k

A Family of Multivariate Abel Series Distributions. of Order k Appled Mthemtcl Scences, Vol. 2, 2008, no. 45, 2239-2246 A Fmly of Multvrte Abel Seres Dstrbutons of Order k Rupk Gupt & Kshore K. Ds 2 Fculty of Scence & Technology, The Icf Unversty, Agrtl, Trpur, Ind

More information

Department of Mechanical Engineering, University of Bath. Mathematics ME Problem sheet 11 Least Squares Fitting of data

Department of Mechanical Engineering, University of Bath. Mathematics ME Problem sheet 11 Least Squares Fitting of data Deprtment of Mechncl Engneerng, Unversty of Bth Mthemtcs ME10305 Prolem sheet 11 Lest Squres Fttng of dt NOTE: If you re gettng just lttle t concerned y the length of these questons, then do hve look t

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Course Review Introduction to Computer Methods

Course Review Introduction to Computer Methods Course Revew Wht you hopefully hve lerned:. How to nvgte nsde MIT computer system: Athen, UNIX, emcs etc. (GCR). Generl des bout progrmmng (GCR): formultng the problem, codng n Englsh trnslton nto computer

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1 mercn Interntonl Journl of Reserch n cence Technology Engneerng & Mthemtcs vlble onlne t http://wwwsrnet IN (Prnt: 38-349 IN (Onlne: 38-3580 IN (CD-ROM: 38-369 IJRTEM s refereed ndexed peer-revewed multdscplnry

More information

7.2 The Definite Integral

7.2 The Definite Integral 7.2 The Definite Integrl the definite integrl In the previous section, it ws found tht if function f is continuous nd nonnegtive, then the re under the grph of f on [, b] is given by F (b) F (), where

More information

8. INVERSE Z-TRANSFORM

8. INVERSE Z-TRANSFORM 8. INVERSE Z-TRANSFORM The proce by whch Z-trnform of tme ere, nmely X(), returned to the tme domn clled the nvere Z-trnform. The nvere Z-trnform defned by: Computer tudy Z X M-fle trn.m ued to fnd nvere

More information

Soft Set Theoretic Approach for Dimensionality Reduction 1

Soft Set Theoretic Approach for Dimensionality Reduction 1 Interntonl Journl of Dtbse Theory nd pplcton Vol No June 00 Soft Set Theoretc pproch for Dmensonlty Reducton Tutut Herwn Rozd Ghzl Mustf Mt Ders Deprtment of Mthemtcs Educton nversts hmd Dhln Yogykrt Indones

More information

Linear and Nonlinear Optimization

Linear and Nonlinear Optimization Lner nd Nonlner Optmzton Ynyu Ye Deprtment of Mngement Scence nd Engneerng Stnford Unversty Stnford, CA 9430, U.S.A. http://www.stnford.edu/~yyye http://www.stnford.edu/clss/msnde/ Ynyu Ye, Stnford, MS&E

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

Pyramid Algorithms for Barycentric Rational Interpolation

Pyramid Algorithms for Barycentric Rational Interpolation Pyrmd Algorthms for Brycentrc Rtonl Interpolton K Hormnn Scott Schefer Astrct We present new perspectve on the Floter Hormnn nterpolnt. Ths nterpolnt s rtonl of degree (n, d), reproduces polynomls of degree

More information

An Introduction to Support Vector Machines

An Introduction to Support Vector Machines An Introducton to Support Vector Mchnes Wht s good Decson Boundry? Consder two-clss, lnerly seprble clssfcton problem Clss How to fnd the lne (or hyperplne n n-dmensons, n>)? Any de? Clss Per Lug Mrtell

More information

SVMs for regression Multilayer neural networks

SVMs for regression Multilayer neural networks Lecture SVMs for regresson Muter neur netors Mos Husrecht mos@cs.ptt.edu 539 Sennott Squre Support vector mchne SVM SVM mmze the mrgn round the seprtng hperpne. he decson functon s fu specfed suset of

More information

2008 Mathematical Methods (CAS) GA 3: Examination 2

2008 Mathematical Methods (CAS) GA 3: Examination 2 Mthemticl Methods (CAS) GA : Exmintion GENERAL COMMENTS There were 406 students who st the Mthemticl Methods (CAS) exmintion in. Mrks rnged from to 79 out of possible score of 80. Student responses showed

More information

6.6 The Marquardt Algorithm

6.6 The Marquardt Algorithm 6.6 The Mqudt Algothm lmttons of the gdent nd Tylo expnson methods ecstng the Tylo expnson n tems of ch-sque devtves ecstng the gdent sech nto n tetve mtx fomlsm Mqudt's lgothm utomtclly combnes the gdent

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and MANAGEMENT SCIENCE Vol. 53, No. 2, Februry 2007, pp. 308 322 ssn 0025-1909 essn 1526-5501 07 5302 0308 nforms do 10.1287/mnsc.1060.0614 2007 INFORMS Bs nd Vrnce Approxmton n Vlue Functon Estmtes She Mnnor

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

The Number of Rows which Equal Certain Row

The Number of Rows which Equal Certain Row Interntonl Journl of Algebr, Vol 5, 011, no 30, 1481-1488 he Number of Rows whch Equl Certn Row Ahmd Hbl Deprtment of mthemtcs Fcult of Scences Dmscus unverst Dmscus, Sr hblhmd1@gmlcom Abstrct Let be X

More information

Advanced Machine Learning. An Ising model on 2-D image

Advanced Machine Learning. An Ising model on 2-D image Advnced Mchne Lernng Vrtonl Inference Erc ng Lecture 12, August 12, 2009 Redng: Erc ng Erc ng @ CMU, 2006-2009 1 An Isng model on 2-D mge odes encode hdden nformton ptchdentty. They receve locl nformton

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below.

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below. Dulity #. Second itertion for HW problem Recll our LP emple problem we hve been working on, in equlity form, is given below.,,,, 8 m F which, when written in slightly different form, is 8 F Recll tht we

More information

For the percentage of full time students at RCC the symbols would be:

For the percentage of full time students at RCC the symbols would be: Mth 17/171 Chpter 7- ypothesis Testing with One Smple This chpter is s simple s the previous one, except it is more interesting In this chpter we will test clims concerning the sme prmeters tht we worked

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information