Autonomous Learning of High-Level States and Actions in Continuous Environments. Jonathan Mugan and Benjamin Kuipers, Fellow, IEEE


Abstract: How can an agent bootstrap up from a low-level representation to autonomously learn high-level states and actions using only domain-general knowledge? In this paper we assume that the learning agent has a set of continuous variables describing the environment. There exist methods for learning models of the environment, and there also exist methods for planning. However, for autonomous learning, these methods have been used almost exclusively in discrete environments. We propose attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell if the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. These models are converted into plans to perform actions. The agent then uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot is sitting at a table that contains a block and other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine if it learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.

Index Terms: unsupervised learning, reinforcement learning, qualitative reasoning, intrinsic motivation, active learning.

J. Mugan is with the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: jmugan@cs.cmu.edu). B. Kuipers is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: kuipers@umich.edu).

I. INTRODUCTION

We would like to build intelligent agents that can autonomously learn to predict and control the environment using only domain-general knowledge. Such agents could simply be placed in an environment, and they would learn it. After they had learned the environment, the agents could be directed to achieve specified goals. The intelligence of the agents would free engineers from having to design new agents for each environment. These agents would be flexible and robust because they would be able to adapt to unanticipated aspects of the environment.

Designing such agents is a difficult problem because the environment can be almost infinitely complex. This complexity means that an agent with limited resources cannot represent and reason about the environment without describing it in a simpler form. And possibly more importantly, the complexity of the environment means that it is a challenge to generalize from experience, since each experience will be in some respect different.

A solution to the difficulty of learning in a complex environment is for the agent to autonomously learn useful and appropriate abstractions. There are approaches that do impressive learning in specific scenarios [1]–[3]. But most autonomous learning methods require a discrete representation and a given set of discrete actions [4]–[8]. Our goal is to enable autonomous learning to be done in a continuous environment and to enable an agent to learn its first actions.

The Problem: The world is continuous and infinitely complex (or effectively so). The dynamics of the world are also continuous, and add further complexity.
An agent acts in that world by sending continuous low-level motor signals. Our goal is to show how such an agent can learn a hierarchy of actions for acting effectively in such a world.

We simplify the problem of perception for our agent. Instead of dealing with continuous high-bandwidth pixel-level perception, we assume that the agent has trackers for a small number of moving objects (including its own body parts) within an otherwise static environment. These trackers provide the agent with a perceptual stream consisting of a relatively small number of continuous variables. Learning such trackers and the models that provide dynamically updated model parameters is done by methods outside the scope of this project, such as the Object Semantic Hierarchy [9].

Our solution to the problem of learning a hierarchy of actions in continuous environments is the Qualitative Learner of Action and Perception, QLAP.

A. The Qualitative Learner of Action and Perception, QLAP

QLAP has two major processes:

(1) Modeling Events: An event is a qualitative change in the value of some state variable (Section II). QLAP starts by identifying contingencies between events: situations where the observation of one event E1 means that another event E2 is more likely to occur soon after. QLAP searches for improved descriptions of the conditions for contingencies, trying to find sufficiently reliable simple models that predict the results of actions (Section III). This is done by introducing new distinctions into the qualitative abstractions of the domains of particular variables, and by identifying additional dependencies on context variables, making predictions more reliable. These improved models of the forward dynamics of the environment are represented as dynamic Bayesian networks (DBNs). This process is depicted in Figure 1.

(Portions of this material were previously presented at conferences [10]–[14]. This journal article is based on the first author's Ph.D. thesis [15] and extends the method, providing both a more unified description and a more detailed evaluation.)

(2) Modeling Actions: Define an action to be the occurrence of a particular event: a particular qualitative change to some variable. A reliable DBN model that leads to that event can be transformed (by familiar RL methods) into a plan for accomplishing that consequent event by means of achieving antecedent events, and thus for carrying out that action. Since the plan thus consists of embedded actions, we get a natural hierarchical structure on actions, via the plans available to carry them out (Section IV).

A hierarchy of actions and plans must ground out in motor actions: qualitative changes to the values of motor variables that the agent can carry out simply by willing them. QLAP ensures that its higher-level actions and plans have this grounding in motor actions because it learns everything autonomously, starting from random exploration of the consequences of setting its motor variables (i.e. "motor babbling"). The process of modeling actions is depicted in Figure 2.

To perform exploration and learning, the two processes of Modeling Events and Modeling Actions run continuously as the agent acts in the world (Section V). Those actions can be driven initially by random motor babbling. However, they can be driven by autonomous exploration through various intrinsic motivation drives. They can also be driven by explicit coaching, or by play, posing various goals and making plans to achieve them.

QLAP uses a qualitative representation (Section II) to discretize quantitative (continuous) variables. The quantitative representation includes important but implicit distinctions that the agent needs to capture, which we refer to as natural joints. However, the quantitative representation is so complex that inference is intractable. Therefore, we need an abstraction to a discrete representation. But if we pick an abstraction in advance, before we understand the domain, we are likely to discard distinctions we need. So to get the best of both worlds, QLAP dynamically constructs a qualitative representation. QLAP starts by constructing an overly-abstract qualitative description. Non-deterministic contingencies draw the agent's attention to qualitative states in this representation that need further distinctions. We can then use statistical methods like Fayyad and Irani [16] to identify new distinctions, letting experience with the environment tell us how to change the granularity of the qualitative abstraction. QLAP does this in a greedy, hill-climbing way, adding just enough qualitative distinctions to allow reasonably reliable predictions. (After the representations and inferences are explained, Algorithm 1 in Section V gives a high-level description of the QLAP algorithm. See [17] for a video explaining QLAP.)

B. Contributions

To the best of our knowledge, QLAP is the only algorithm that learns states and hierarchical actions through autonomous exploration in continuous, dynamic environments with continuous motor commands. For the field of Autonomous Mental Development, QLAP provides a method for a developing agent to learn its first temporally-extended actions and to learn more complex actions on top of previously-learned actions.

Fig. 1: Perception in QLAP. (The figure shows the flow over time from images to continuous variables, to discrete variables, to models of the environment, with feedback from the models to the discretization, and down to low-level motor commands.)

Fig. 2: Actions in QLAP. (a) Models are converted into plans. (b) Plans are different ways to do actions. (c) Actions and plans are put together into a hierarchy.

The related field of reinforcement learning is about enabling an agent to learn from experience to maximize a reward signal [18]. QLAP addresses three challenges in reinforcement learning: (1) continuous states and actions, (2) automatic hierarchy construction, and (3) automatic generation of reinforcement learning problems. Continuous states and actions are a challenge because it is hard to know how to generalize from experience since no two states are exactly alike. There exist many function approximation methods, and other methods that use real values [3], [19]–[23], but QLAP provides a method for discretizing the state and action space so that the discretization corresponds to the natural joints in the environment. Finding these natural joints allows QLAP to use simple learning algorithms while still representing the complexity of the environment. Learning of hierarchies can enable an agent to explore the space more effectively because it can aggregate smaller actions into larger ones. QLAP creates a hierarchical set of actions from continuous motor variables. Currently, most reinforcement learning problems must be designed by the human experimenter. QLAP autonomously creates reinforcement learning problems as part of its developmental progression.

Section II discusses the qualitative representation used to discretize the continuous data. Section III explains how QLAP learns models of events, and Section IV describes how models of events are converted into actions. Section V discusses how the QLAP agent explores and learns about the world. Experimental results are presented in Section VI, and the paper concludes with a discussion (Section VII), an overview of related work (Section VIII), and a summary (Section IX).

II. QUALITATIVE REPRESENTATION

A qualitative representation allows an agent to bridge the gap between continuous and discrete values. It does this by encoding the values of continuous variables relative to known landmarks [24]. The value of a continuous variable can be described qualitatively. Its qualitative magnitude is described as equal to a landmark value or as in the open interval between two adjacent landmark values. Its qualitative direction of change can be increasing, steady, or decreasing. Because a landmark is intended to represent an important value of a variable where the system behavior may change qualitatively, a qualitative representation allows the agent to generalize and to focus on important events.

A. Landmarks

A landmark is a symbolic name for a point on a number line. Using landmarks, QLAP can convert a continuous variable ṽ with an infinite number of values into a qualitative variable v with a finite set of qualitative values Q(v) called a quantity space [24]. A quantity space Q(v) = L(v) ∪ I(v), where L(v) = {v1, ..., vn} is a totally ordered set of landmark values, and I(v) = {(−∞, v1), (v1, v2), ..., (vn, +∞)} is the set of mutually disjoint open intervals that L(v) defines in the real number line. A quantity space with two landmarks might be described by ⟨v1, v2⟩, which implies five distinct qualitative values,

Q(v) = {(−∞, v1), v1, (v1, v2), v2, (v2, +∞)}.

This is shown in Figure 3.

Fig. 3: Landmarks divide the number line into a discrete set of qualitative values.

QLAP perceives the world through a set of continuous input variables and affects the world through a set of continuous motor variables.² For each continuous input variable ṽ, two qualitative variables are created: a discrete variable v(t) that represents the qualitative magnitude of ṽ(t), and a discrete variable v̇(t) that represents the qualitative direction of change of ṽ(t). Also, a qualitative variable u(t) is created for each continuous motor variable ũ.³ The result of these transformations is three types of qualitative variables that the agent can use to affect and reason about the world: motor variables, magnitude variables, and direction of change variables. The properties of these variables are shown in Table I.

² QLAP can handle discrete (nominal) input variables. See [15, App. A] for details.
³ When the distinction between motor variables and non-motor variables is unimportant, we will refer to the variable as v.

TABLE I: Types of Qualitative Variables

Type of Variable | Initial Landmarks | Learn Landmarks?
motor | {0} | yes
magnitude | {} | yes
direction of change | {0} | no

Each direction of change variable v̇ has a single intrinsic landmark at 0, so its quantity space is Q(v̇) = {(−∞, 0), 0, (0, +∞)}, which can be abbreviated as Q(v̇) = {[−], [0], [+]}. Motor variables are also given an initial landmark at 0. Magnitude variables initially have no landmarks, treating zero as just another point on the number line. Initially, when the agent knows of no meaningful qualitative distinctions among values for ṽ(t), we describe the quantity space with the empty list of landmarks, {}, as Q(v) = {(−∞, +∞)}. However, the agent can learn new landmarks for magnitude and motor variables. Each additional landmark allows the agent to perceive or affect the world at a finer granularity.

B. Events

If a is a qualitative value of a qualitative variable A, meaning a ∈ Q(A), then the event A_t→a is defined by A(t−1) ≠ a and A(t) = a. That is, an event takes place when a discrete variable A changes to a value a at time t, from some other value. We will often drop the t and describe this simply as A→a. We will also refer to an event as E when the variable and value involved are not important, and we use the notation E(t) to indicate that event E occurs at time t.

For magnitude variables, A→a is really two possible events, depending on the direction that the value is coming from. If at time t−1, A(t−1) < a, then the event approaches a from below; likewise, if at time t−1, A(t−1) > a, then the event approaches a from above. However, for ease of notation, we generally refer to either event as A_t→a. We also say that event A→a is satisfied if A_t = a.
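To make this representation concrete, here is a minimal sketch (ours, not the authors' implementation) of a quantity space, the direction-of-change abstraction, and event detection as defined above; the variable name h_x and its landmark values are illustrative.

```python
# Minimal sketch of a QLAP-style quantity space (not the authors' code).
# Landmarks split the number line into alternating intervals and points.

class QuantitySpace:
    def __init__(self, landmarks=()):
        self.landmarks = sorted(landmarks)          # e.g. [v1, v2] -> 5 qualitative values

    def qual_value(self, x, eps=1e-6):
        """Qualitative magnitude of continuous value x: a landmark or an open interval."""
        lo = float('-inf')
        for v in self.landmarks:
            if abs(x - v) <= eps:
                return ('at', v)                    # equal to a landmark
            if x < v:
                return ('in', (lo, v))              # open interval below landmark v
            lo = v
        return ('in', (lo, float('inf')))           # interval above the last landmark

def direction_of_change(x_prev, x_curr, eps=1e-6):
    """Qualitative direction of change: '[+]', '[0]', or '[-]'."""
    d = x_curr - x_prev
    return '[+]' if d > eps else '[-]' if d < -eps else '[0]'

def detect_event(q_prev, q_curr):
    """An event A->a occurs when the qualitative value changes to a."""
    return q_curr if q_curr != q_prev else None

hx = QuantitySpace(landmarks=[-1.0, 1.0])           # hypothetical magnitude variable h_x
q1, q2 = hx.qual_value(0.9), hx.qual_value(1.0)
print(detect_event(q1, q2))                         # ('at', 1.0): the event h_x -> 1.0
print(direction_of_change(0.9, 1.0))                # '[+]'
```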
III. MODELING EVENTS

There are many methods for learning predictive models in continuous environments. Such models have been learned, for example, using regression [3], [19]–[21], neural networks [22], and Gaussian processes [23]. But as described in the introduction, we want to break up the environment and represent it using a qualitative representation. In a discretized environment, dynamic Bayesian networks (DBNs) are a convenient way to encode predictive models. Most work on learning DBNs learns a network to predict each variable at the next timestep for each action, e.g. [25]–[27]. However, QLAP learns the set of actions, and QLAP works in environments where events may take more than one timestep.

QLAP learns two different types of DBN models. The first type of DBN models are those that predict events on change variables (change DBNs). The second type of DBN models are those for reaching magnitude values (magnitude DBNs), described in Section III-E.

To learn change DBNs, QLAP uses a novel DBN learning algorithm. Given the current discretization, QLAP tracks statistics on all pairs of events to search for contingencies (Section III-A) where an antecedent event leads to a consequent event. When such a contingency is found, QLAP converts it to a DBN with the antecedent event as the parent variable and the consequent event as the child variable. QLAP adds context variables to the DBN one at a time as they make the DBN more reliable. For each DBN, QLAP also searches for a new discretization that will make the DBN more reliable. This new discretization then creates new possible events and allows new DBNs to be learned. This method is outlined in Figure 4.

A. Searching for Contingencies

The search for change DBNs begins with a search for contingencies. A contingency represents the knowledge that if the antecedent event occurs, then the consequent event will soon occur. An example would be: if you flip a light switch, then the light will go off. QLAP searches for contingencies by tracking statistics on pairs of events E1, E2 and extracting those pairs into a contingency where the occurrence of event E1 indicates that event E2 is more likely to occur soon after E1 than it would be otherwise.

1) Definition: To define contingencies in a continuous environment, we have to discretize both variable values and time. To discretize variable values, we create a special Boolean variable event(t, X, x). For clarity, we can rewrite event(t, X, x) as event(t, X→x). Variable event(t, X→x) is true if event X_t→x occurs:

event(t, X→x) ≡ X_t→x    (1)

To discretize time, we use a time window. We define the Boolean variable soon(t, Y, y) (rewritten as soon(t, Y→y)) that is true if event Y→y occurs within a time window of length k:

soon(t, Y→y) ≡ ∃t′ [t ≤ t′ < t + k ∧ event(t′, Y→y)]    (2)

(The length of the time window k is learned by noting how long it takes for motor commands to be observed as changes in the world; see [15, App. C].) With these variables, we define a contingency as

event(t, X→x) ⇒ soon(t, Y→y)    (3)

which represents the proposition that if the antecedent event X→x occurs, then the consequent event Y→y will occur within k timesteps.

2) The Pairwise Search: QLAP looks for contingencies using a pairwise search by tracking statistics on pairs of events X→x and Y→y to determine if the pair is a contingency. QLAP learns a contingency E1 ⇒ E2 if, when the event E1 occurs, the event E2 is more likely to soon occur than it would have been otherwise:

Pr(soon(t, E2) | E1(t)) > Pr(soon(t, E2))    (4)

where Pr(soon(t, E2)) is the probability of event E2 occurring within a random window of k timesteps. Specifically, the contingency is learned when

Pr(soon(t, E2) | E1(t)) − Pr(soon(t, E2)) > θ_pen    (5)

where θ_pen = 0.05 is a penalty term whose value was determined by experimentation. This is an important parameter. If it is set too small, then too many contingencies may be learned, and if it is set too large, then it might be difficult to learn new contingencies.

Fig. 4: (a) Do a pairwise search for contingencies that use one event to predict another. The antecedent events are along the y-axis, and the consequent events are along the x-axis. The color indicates the probability that the consequent event will soon follow the antecedent event (lighter corresponds to higher probability). When the probability of the consequent event is sufficiently high, the pair is converted into a contingency (yellow). (b) When a contingency is found, it is used to create a DBN. (c) Once a DBN is created, context variables are added to make it more reliable. (d) The DBN creates a self-supervised learning problem to predict when the consequent event will follow the antecedent event. This allows new landmarks to be found. Those landmarks create new events for the pairwise search. (Best viewed in color.)

QLAP performs this search considering all pairs of events, excluding those where

1) The consequent event is on a magnitude variable (since these are handled by the models on magnitude variables as discussed in Section III-E).
2) The consequent event is on a direction of change variable going to the landmark value [0] (since we want to predict changes that result in moving towards or away from landmarks).
3) The magnitude variable corresponding to the direction of change variable on the consequent event matches the magnitude variable on the antecedent event (since we want to learn how the values of variables are affected by other variables).
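The following is a small sketch (ours) of the pairwise test in Eq. (5). The trace format, event names, and window length are illustrative; QLAP learns the window length k and tracks these statistics incrementally rather than over a stored trace.

```python
# Sketch of the pairwise contingency test in Eq. (5): adopt E1 => E2 when
# Pr(soon(E2) | E1) exceeds the base rate Pr(soon(E2)) by more than theta_pen.

THETA_PEN = 0.05
K = 5   # "soon" window length; in QLAP this is learned from motor latency

def soon(trace, t, event, k=K):
    """True if `event` occurs in trace[t : t+k]."""
    return any(event in events for events in trace[t:t + k])

def is_contingency(trace, e1, e2, k=K, theta=THETA_PEN):
    n = len(trace)
    base = sum(soon(trace, t, e2, k) for t in range(n)) / n        # Pr(soon(E2))
    t1 = [t for t in range(n) if e1 in trace[t]]                    # times E1 occurred
    if not t1:
        return False
    cond = sum(soon(trace, t, e2, k) for t in t1) / len(t1)         # Pr(soon(E2) | E1)
    return cond - base > theta

# Toy trace: each entry is the set of events observed at that timestep.
trace = [{'ux>300'} if t % 10 == 0 else
         {'hx_dot->[+]'} if t % 10 == 1 else set()
         for t in range(200)]
print(is_contingency(trace, 'ux>300', 'hx_dot->[+]'))   # True: E2 reliably follows E1
```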

B. Converting Contingencies to DBNs

In this section we describe how QLAP converts a contingency of the form event(t, X→x) ⇒ soon(t, Y→y) into a dynamic Bayesian network. A dynamic Bayesian network (DBN) is a compact way to describe a probability distribution over time-series data. For a particular variable, the DBN indicates the other variables that affect its value. Dynamic Bayesian networks allow QLAP to identify situations when the contingency will be reliable.

1) Adding Context: The consequent event may only follow the antecedent event in certain contexts, so we also want to learn a set of qualitative context variables C that predict when event Y→y will soon follow X→x. This can be represented as a DBN r of the form

r = ⟨C : event(t, X→x) ⇒ soon(t, Y→y)⟩    (6)

which we abbreviate to

r = ⟨C : X→x ⇒ Y→y⟩    (7)

In this notation, event E1 = X→x is the antecedent event, and event E2 = Y→y is the consequent event. We can further abbreviate this QLAP DBN r as

r = ⟨C : E1 ⇒ E2⟩    (8)

Figure 5 shows the correspondence between this notation and standard DBN notation.

Fig. 5: Correspondence between QLAP DBN notation and traditional graphical DBN notation. (a) QLAP notation of a DBN. Context C consists of a set of qualitative variables. Event E1 is an antecedent event and event E2 is a consequent event. (b) Traditional graphical notation. Boolean parent variable event(t, E1) is true if event E1 occurs at time t. Boolean child variable soon(t, E2) is true if event E2 occurs within k timesteps of t. The other parent variables are the context variables in C. The conditional probability table (CPT) gives the probability of soon(t, E2) for each value of its parents. For all elements of the CPT where event(t, E1) is false, the probability is undefined. The remaining probabilities are learned through experience.

Because we consider only cases where event E1 occurs, we can treat the conditional probability table (CPT) of DBN r as defined over the probability that event Y→y will soon follow event X→x for each qualitative value in context C. If the antecedent event does not occur, then the CPT does not define the probability for the consequent event occurring. If the antecedent event occurs, and the consequent event does follow soon after, we say that the DBN succeeds. Likewise, if the antecedent event occurs, and the consequent event does not follow soon after, we say that the DBN fails. These models are referred to as dynamic Bayesian networks and not simply Bayesian networks because we are using them to model a dynamic system. An example of a DBN learned by QLAP is shown in Figure 6.

Fig. 6: An example DBN. This DBN says that if the motor value of u_x becomes greater than 300, and the location of the hand, h_x, is within an interval that covers most of its range of movement, then the variable ḣ_x will most likely soon become [+] (the hand will move to the right). (Outside the limits of movement of h_x, the prior of 0.5 dominates.)

The set C = {v1, ..., vn} consists of the variables in the conditional probability table (CPT) of the DBN r = ⟨C : E1 ⇒ E2⟩. The CPT is defined over the product space

Q(C) = Q(v1) × Q(v2) × ⋯ × Q(vn)    (9)

Since C is a subset of the variables available to the agent, Q(C) is an abstraction of the overall state space S:

Q(C) ⊆ Q(v1) × Q(v2) × ⋯ × Q(vm) = S    (10)

where m ≥ n is the number of variables available to the agent.⁴

⁴ In our experiments, we place a small limit on n.

2) Notation of DBNs: We define the reliability for q ∈ Q(C) for DBN r as

rel(r, q) = Pr(soon(t, E2) | E1(t), q)    (11)

which is the probability of success for the DBN for the value q ∈ Q(C). (Note that we may also use rel(r, s) to mean the reliability of DBN r in state s.) These probabilities come from the CPT and are calculated using observed counts. The best reliability of a DBN gives the highest probability of success in any context state. We define the best reliability brel(r) of DBN r as

brel(r) = max_{q ∈ Q(C)} rel(r, q)    (12)

(We require 5 actual successes for q ∈ Q(C) before it can be considered for best reliability.) By increasing the best reliability brel(r) we increase the reliability of DBN r. And we say that DBN r is sufficiently reliable if at any time brel(r) > θ_SR. Like θ_pen, it is important that the value of θ_SR is not too big or too small. A thorough evaluation of robustness remains to be done. However, as experimental evidence that QLAP is robust, neither this parameter nor any other parameters had to be changed for any environment evaluated in this paper or in [15].

The entropy of a DBN r = ⟨C : E1 ⇒ E2⟩ is a measure of how well the context C predicts that event E2 will soon follow event E1. Since we only consider the timesteps where event E1 occurs, we define the entropy H(r) of DBN r as

H(r) = Σ_{q ∈ Q(C)} H(soon(t, E2) | q, E1(t)) Pr(q | E1(t))    (13)

By decreasing the entropy H(r) of DBN r, we increase the determinism of DBN r.
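A minimal sketch (ours) of how rel(r, q), brel(r), and H(r) in Eqs. (11)–(13) can be computed from success/failure counts per context value; the counts and the minimum-success requirement shown are illustrative.

```python
import math

# Per-context reliability, best reliability, and entropy of a change DBN,
# computed from (success, failure) counts gathered when the antecedent occurs.

def reliability(successes, failures):
    total = successes + failures
    return successes / total if total else 0.0

def best_reliability(counts, min_successes=5):
    """counts: dict q -> (successes, failures) for each context value q."""
    rels = [reliability(s, f) for s, f in counts.values() if s >= min_successes]
    return max(rels) if rels else 0.0

def dbn_entropy(counts):
    """Entropy of soon(E2) given the context, over timesteps where E1 occurred."""
    total = sum(s + f for s, f in counts.values())
    h = 0.0
    for s, f in counts.values():
        n = s + f
        if n == 0:
            continue
        p_q = n / total                      # Pr(q | E1)
        for p in (s / n, f / n):             # Pr(success | q), Pr(failure | q)
            if p > 0:
                h -= p_q * p * math.log2(p)
    return h

counts = {('hx_low',): (12, 3), ('hx_high',): (2, 9)}   # hypothetical CPT counts
print(best_reliability(counts), dbn_entropy(counts))
```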

C. Adding Context Variables

QLAP iteratively adds context variables to DBNs to make them more reliable and deterministic. This hillclimbing process of adding one context variable at a time is inspired by the marginal attribution process in Drescher's [25] schema mechanism. To implement marginal attribution, QLAP initially hillclimbs on best reliability brel(r) because the DBN models will eventually be used for planning. In our experiments, we found this to be necessary to make good plans, because we want to find some context state in which the model is reliable. This allows the agent to set a subgoal of getting to that reliable context state. However, we also want the model to be deterministic, so after a model is sufficiently reliable, QLAP hillclimbs on reduction of entropy H(r). Essentially, if r is sufficiently reliable, then an improved DBN r′ must provide a reduction in entropy. If r is not sufficiently reliable, then r′ must provide an increase in best reliability. The required amount of reduction in entropy or increase in best reliability is greater when the context for r′ is larger than the context for r. (See [15, sect. 4.3].)

The hillclimbing procedure is not completely greedy. QLAP also considers the possibility that the DBN needs fewer context variables. So the hillclimbing algorithm first sets aside the current context C. Then, QLAP creates an entirely new context C′ by hillclimbing, adding variables to improve the DBN. QLAP then compares the new context C′ with the original context C to see if the new context is an improvement over r with the original context C. See [15, sect. 4.3] for details.

D. Learning New Landmarks

Learning new landmarks allows the agent to perceive and represent the world at a higher resolution. This increase in resolution allows existing models to be made more reliable and allows new models to be learned. QLAP has two mechanisms for learning landmarks. The first is to learn a new landmark to make an existing DBN more reliable. The second is to learn a new landmark that predicts the occurrence of an event. In both cases, when a new landmark is learned, it applies to all DBNs, and each DBN recalculates the reliability statistics of that variable. To recalculate these statistics, for each DBN, QLAP saves the real value of each variable over a window of the most recent occurrences of the antecedent event of the DBN (although in the implementation, the data trace is saved and the DBN just stores the line number).

1) New Landmarks on Existing DBNs: QLAP learns new landmarks based on previously-learned models (DBNs). For any particular DBN, predicting when the consequent event will follow the antecedent event is a supervised learning problem. This is because once the antecedent event occurs, the environment will determine if the consequent event will occur. QLAP takes advantage of this supervisory signal to learn new landmarks that improve the predictive ability of DBNs. For each DBN r = ⟨C : E1 ⇒ E2⟩, QLAP searches for a landmark on each magnitude and motor variable v in each open interval q ∈ Q(v). Finding this landmark is done using the information theoretic method of Fayyad and Irani [16]. The best candidate is then adopted if it has sufficient information gain, the product of the information gain and the probability of being in that interval Pr(v = q) is sufficient, and the adopted landmark would improve DBN r. See [15, sect. 4.4] for details.

2) New Landmarks to Predict Events: QLAP also learns new landmarks to predict events. QLAP needs this second landmark learning process because some events may not be preceded by another known event. An example of this is when an object moves because it is hit by another object.
In this case, it needs to learn that a distance of 0 between objects is significant, because it causes one of the objects to move. To find these landmarks, QLAP needs to compare the distribution of a variable ṽ just before an event E with the overall distribution of ṽ. Since we are looking at the continuous values of variables, we use bins to estimate the distributions. We let the bin size for each bin b_ṽ for variable ṽ be two times the average change value (excluding the first timestep t = 0). A change is determined to occur if t > 1 and |ṽ(t) − ṽ(t−1)| exceeds a small threshold. QLAP then creates a landmark for bin b_ṽ when

Pr(ṽ_t ∈ b_ṽ | E(t)) − Pr(ṽ_t ∈ b_ṽ) > θ_E    (14)

where θ_E = 0.30, and b_ṽ corresponds to the bin that has the highest value of Pr(ṽ_t ∈ b_ṽ | E(t)) − Pr(ṽ_t ∈ b_ṽ). The range [lb, ub] of the landmark corresponding to b_ṽ is given by the bounds of the bucket b_ṽ.

These probabilities are computed under two different normalization conditions. The first is that QLAP normalizes over three buckets in each direction from b_ṽ. This allows QLAP to find local spikes in differences of the probability distributions. The second normalization condition is to normalize the probability over all of the buckets. QLAP first looks for a landmark using the first normalization condition. If none is found, QLAP looks for a landmark using the second normalization condition.
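Here is a rough sketch (ours) of the event-landmark test in Eq. (14). It only illustrates the second (global) normalization condition, and the bin size, event, and values are illustrative.

```python
# Propose a landmark on variable v where Pr(v in bin | E) - Pr(v in bin) > theta_E.
from collections import Counter

THETA_E = 0.30

def bin_index(x, bin_size):
    return int(x // bin_size)

def propose_event_landmark(all_values, pre_event_values, bin_size):
    """all_values: values of v over the whole history.
    pre_event_values: values of v just before occurrences of event E."""
    p_all = Counter(bin_index(x, bin_size) for x in all_values)
    p_pre = Counter(bin_index(x, bin_size) for x in pre_event_values)
    n_all, n_pre = len(all_values), len(pre_event_values)
    best_bin, best_gap = None, 0.0
    for b, c in p_pre.items():
        gap = c / n_pre - p_all.get(b, 0) / n_all
        if gap > best_gap:
            best_bin, best_gap = b, gap
    if best_bin is not None and best_gap > THETA_E:
        lb, ub = best_bin * bin_size, (best_bin + 1) * bin_size
        return (lb, ub)          # range of the proposed landmark
    return None

# Example: near-zero hand-block distances tend to precede "block moved".
hist = [0.1 * i for i in range(100)]            # hypothetical overall history
pre  = [0.02, 0.05, 0.01, 0.04, 0.03, 0.06]     # values just before the event
print(propose_event_landmark(hist, pre, bin_size=0.1))   # approximately (0.0, 0.1)
```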

E. Magnitude DBN Models

A magnitude value can be less than, greater than, or equal to a qualitative value. We want to have models for a variable ṽ reaching a qualitative value q. Intuitively, if we want v = q and currently v(t) < q, then we need to set v̇ = [+]. This section describes how this process is modeled. For each magnitude variable v and each qualitative value q ∈ Q(v), QLAP creates two models, one that corresponds to approaching the value v = q from below on the number line, and another that corresponds to approaching v = q from above. For each magnitude variable Y and each value y ∈ Q(Y), these models can be written as

r⁺ = ⟨C : Ẏ→[+] ⇒ Y→y⟩    (15)
r⁻ = ⟨C : Ẏ→[−] ⇒ Y→y⟩    (16)

DBN r⁺ means that if Y_t < y and Ẏ = [+], then eventually event Y→y will occur (DBN r⁻ is analogous in this discussion). As the notation suggests, we can treat Ẏ→[+] ⇒ Y→y similarly to how we treat a contingency, and we can learn context variables for when this model will be reliable. These models are based on the test-operate-test-exit (TOTE) models of Miller et al. [28].

Magnitude DBNs do not use the soon predicate because how long it takes to reach a qualitative value is determined by how far away the variable is from that value. Instead, statistics are gathered on magnitude DBNs when the agent sets Ẏ = [+] to bring about Y→y. DBN r⁺ is successful if Y→y occurs while Ẏ = [+], and it fails if the agent is unable to maintain Ẏ = [+] long enough to bring about event Y→y. Like change DBNs, magnitude DBNs will be used in planning as described in Section IV.⁵

⁵ While context variables are learned on magnitude DBNs, experiments showed that landmarks learned from these models were not useful to the agent, so these models are not used for learning landmarks in QLAP.

IV. MODELING ACTIONS

QLAP uses the learned DBN models to create plans for performing actions. There are two broad planning frameworks within AI: STRIPS-based goal regression [29], and Markov Decision Process (MDP) planning [30]. Goal regression has the advantage of working well when only some of the variables are relevant, and MDP planning has the advantage of providing a principled framework for probabilistic actions [31]. Planning in QLAP was designed to exploit the best of both frameworks. A broad principle of QLAP is that the agent should learn a factored model of the environment to make learning and planning more tractable. QLAP uses MDP planning to plan within each learned factor and uses goal regression to stitch the factors together.

QLAP defines an action to achieve each qualitative value of each variable. Once QLAP learns a qualitative value on a motor variable, the agent considers achieving that value to be a primitive action and can just set that motor variable to a real value (determined randomly) within the range specified by the qualitative value. Actions on non-motor variables are performed using plans. Actions can have multiple plans; each plan is a different way to perform an action, and each plan is learned from a DBN. The actions that can be called by each plan are QLAP actions to bring about qualitative events. This stitches the plans together and leads to an action hierarchy, because the plans to perform one QLAP action call other QLAP actions as if they were primitive actions. This ability to treat high-level actions as primitive actions is what makes the action network that QLAP learns hierarchical. QLAP can simply call an action such as "hit the block right" within a plan, and even though such an action may require hundreds of low-level timesteps and will be performed differently in different situations, QLAP does not have to consider the details of how this action is carried out while making a plan to use this action as part of a plan to perform some, yet, higher-level action. This hierarchical action network encodes all of the learned skills of the agent. See Figure 7.

In this section, we define actions and plans in QLAP. We discuss how change and magnitude DBNs are converted into plans. We then discuss how QLAP can learn when variables need to be added to the state space of a plan, and we conclude with a description of how actions are performed in QLAP.

Fig. 7: Planning in QLAP. (a) QLAP defines an action for each qualitative value of each variable. This action is to bring variable Y to value y. (b) Each action can have multiple plans, which are different ways to perform the action. Each plan comes from an MDP. The value Q_i(s, a) is the cumulative, expected, discounted reward of choosing action a in state s in MDP M_i. Given the function Q_i, the policy π_i can be computed. (c) Plans are created from models. The state space for an MDP is the cross product of the values of X, Y, Z, and W from the model (although more can be added if needed, see Section IV-D). (d) The actions for each plan are QLAP actions to move to different locations in the state space of the MDP. This is reminiscent of goal regression. Above, we see that one of the actions for M_i is to call the QLAP action to bring about X→x.

A. Actions and Plans in QLAP

Actions are how the QLAP agent brings about changes in the world. An action a(v, q) is created for each combination of qualitative variable v and qualitative value q ∈ Q(v). An action a(v, q) is called by the agent and is said to be successful if v = q when it terminates. Action a(v, q) fails if it terminates with v ≠ q. Statistics are tracked on the reliability of actions. The reliability of an action a is denoted by rel(a), which gives the probability of a succeeding if it is called.

When an action is called, the action chooses a plan to carry it out. Each plan implements only one action, and an action can have multiple different plans, where each plan is a different way to perform the action. This gives QLAP the advantage of being able to use different plans in different situations instead of having one big plan that must cover all situations. As with actions, we say that a plan associated with action a(v, q) is successful if it terminates with v = q and fails if it terminates with v ≠ q.

Models learned by QLAP can result in plans. Each plan is represented as a policy π_i over an MDP M_i = ⟨S_i, A_i, T_i, R_i⟩. A Markov Decision Process (MDP) is a framework for temporal decision making [30]. Since QLAP learns multiple models of the environment, QLAP learns multiple MDPs. And as with models, each MDP represents a small part of the environment. The actions used within each MDP are QLAP actions. And since these actions, in turn, use plans that call other QLAP actions, the actions and plans of QLAP are tied together, and planning takes on the flavor of goal regression.
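The sketch below (ours) illustrates the structures this subsection describes: one action per (variable, value) pair, each action holding a small number of plans, and each plan wrapping a policy whose choices are themselves QLAP actions. All names and fields are illustrative simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:                      # roughly an option <I, pi, beta>
    mdp_states: list             # cross product of the model's quantity spaces
    policy: dict                 # state -> QLAP action chosen by argmax Q(s, a)
    max_steps: int = 300         # resource-based termination (beta)

@dataclass
class Action:
    variable: str
    value: object                # a qualitative value, e.g. '[+]' or an interval
    plans: list = field(default_factory=list)   # a few alternative plans
    successes: int = 0
    calls: int = 0

    def reliability(self):       # rel(a): probability of success when called
        return self.successes / self.calls if self.calls else 0.0

# Example: a motor action is primitive; a hand-motion action is performed by a
# plan whose policy calls the motor action as if it were primitive.
set_ux_pos = Action('ux', '(0, +inf)')                 # primitive motor action
move_hand_right = Action('hx_dot', '[+]',
                         plans=[Plan(mdp_states=[], policy={'any': set_ux_pos})])
```

Keeping several small plans per action, rather than one large plan, is what lets different DBN models serve the same goal in different regions of the state space.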

We can think of this policy π_i as being part of an option o_i = ⟨I_i, π_i, β_i⟩ where I_i is a set of initiation states, π_i is the policy, and β_i is a set of termination states or a termination function [32]. An option is like a subroutine that can be called to perform a task. Options in QLAP follow this pattern except that π_i is a policy over QLAP actions instead of being over primitive actions or other options. We use the terminology of a plan being an option because options are common in the literature, and because QLAP takes advantage of the non-Markov termination function β_i that can terminate after a fixed number of timesteps. However, plans in QLAP differ from options philosophically because options are usually used with the assumption that there is some underlying large MDP. QLAP assumes no large, underlying MDP, but rather creates many little, independent MDPs that are connected by actions. Each small MDP M_i created by QLAP has one policy π_i.

B. Converting Change DBNs to Plans

When a DBN of the form r_i = ⟨C : X→x ⇒ Y→y⟩ becomes sufficiently reliable, it is converted into a plan to bring about Y→y.⁶ This plan can then be called by the action a(Y, y). This plan is based on an MDP M_i (there is a one-to-one correspondence between MDPs and plans). In this section, we will first describe how QLAP creates MDP M_i from DBN r_i. We will then describe how QLAP learns a policy for this MDP. And finally, we will describe how this policy is mapped to an option.

⁶ There are some restrictions on this to limit resource usage. For example, actions that can be reliably performed successfully do not get more plans, the agent must be able to bring about the antecedent event X→x with sufficient reliability, and each action may have at most three plans. See [15, sect. 6].

1) Creating the MDP from the DBN: QLAP converts DBNs of the form r_i = ⟨C : X→x ⇒ Y→y⟩ to an MDP of the form M_i = ⟨S_i, A_i, T_i, R_i⟩. The state space S_i consists of the Cartesian product of the values of the variables in DBN r_i. The actions in A_i are the QLAP actions to bring the agent to the different states of S_i. The transition function T_i comes from the CPT of r_i and the reliability rel(a) of different actions a ∈ A_i. The reward function simply penalizes each action with a cost and gives a reward of 0 for reaching the goal of Y = y. The value 0 for reaching the goal and the action cost were chosen through experimentation.

2) Learning a Policy for the MDP: QLAP uses three different methods to learn the policy π_i for each MDP M_i. (1) QLAP uses the transition function T_i and the reward function R_i to learn the Q-table Q_i using dynamic programming with value iteration [33]. The Q-table value Q_i(s, a) represents the cumulative, expected, discounted reward of taking action a ∈ A_i in state s ∈ S_i. The policy π_i then follows directly from Q_i, because for each state s the agent can choose the action a that maximizes Q_i(s, a) (or it can choose some other action to explore the world). (2) As the agent further experiences the world, policy π_i is updated using the temporal difference learning method Sarsa(λ) [33]. (3) And as the agent gathers more statistics, its transition model may be improved. QLAP also updates the model by occasionally running one loop of the dynamic programming.

3) Mapping the Policy to an Option: An option has the form o_i = ⟨I_i, π_i, β_i⟩. We have described how the policy π_i is learned. When an option o_i is created for DBN r_i = ⟨C : X→x ⇒ Y→y⟩, the set of initiation states I_i is the set of all states.
The termination function β_i terminates option o_i when it succeeds (the consequent event occurs), or when it exceeds resource constraints (300 timesteps, or 5 action calls), or when the agent gets stuck. The agent is considered stuck if none of the "self" variables (see Section V-C for a discussion of how the agent learns which variables are part of "self") or variables in S_i change over a short window of timesteps.

C. Converting Magnitude DBNs into Plans

As discussed in Section III, each qualitative value y ∈ Q(Y) on each magnitude variable Y has two DBNs

r⁺ = ⟨C : Ẏ→[+] ⇒ Y→y⟩    (17)
r⁻ = ⟨C : Ẏ→[−] ⇒ Y→y⟩    (18)

that correspond to achieving the event Y→y from below and above the value Y = y, respectively. Both of these DBN models are converted into plans to achieve Y→y. And as with change DBNs, each magnitude DBN r_i is converted into a plan based on an MDP M_i. For MDP M_i, the state space S_i, the set of available actions A_i, the transition function T_i, and the reward function R_i are computed similarly as they are for change plans. The result of this is that each action a(v, q) on a magnitude variable has two plans. There is one plan to perform the action when v < q, and another plan to perform the action when v > q. See [15, sect. 5.3] for further details.

D. Improving the State Space of Plans

The state space of a plan consists of the Cartesian product of the quantity spaces Q(v) of the variables in the model from which it was created. But what if there are variables that were not part of the model, but that are nonetheless necessary to successfully carry out the plan? To learn when new variables should be added to plans, QLAP keeps statistics on the reliability of each plan and uses those statistics to determine when a variable should be added.

1) Tracking Statistics on Plans: QLAP tracks statistics on plans the same way it does when learning models. For change DBN models, QLAP tracks statistics on the reliability of the contingency. For magnitude DBN models, QLAP tracks statistics on the ability of a variable to reach a qualitative value if moving in that direction. For plans, QLAP tracks statistics on the agent's ability to successfully complete the plan when called. To track these statistics on the probability of the plan o_i = ⟨I_i, π_i, β_i⟩ of MDP M_i being successful, QLAP creates a second-order DBN

r_{o_i} = ⟨C_{o_i} : call(t, o_i) ⇒ succeeds(t, o_i)⟩    (19)

The child variable of second-order DBN r_{o_i} is succeeds(t, o_i), which is true if option o_i succeeds after being called at time t and is false otherwise. The parent variables of r_{o_i} are call(t, o_i) and the context variables in C_{o_i}. The Boolean variable call(t, o_i) is true when the option is called at time t and is false otherwise. When created, model r_{o_i} initially has an empty context, and context variables are added, as they are for magnitude and change DBNs, when they are found to make the model more reliable and deterministic. The notation for these models is the same as for magnitude and change models: QLAP computes rel(o_i), brel(o_i), and rel(o_i, s) in particular states s. Therefore a plan can also be sufficiently reliable if at any time brel(o_i) > θ_SR.

2) Adding New Variables to the State Space: Second-order DBNs allow the agent to identify other variables necessary for the success of an option o_i, because those variables will be added to its context. Each variable that is added to r_{o_i} is also added to the state space S_i of its associated MDP M_i. For example, for a plan created from model r_i = ⟨C : X→x ⇒ Y→y⟩, the state space S_i is updated so that

S_i = Q(C_{o_i}) × Q(C) × Q(X) × Q(Y)    (20)

(variables in more than one of C_{o_i}, C, {X}, or {Y} are only represented once in S_i). For both magnitude and change options, an action a(v, q) where v is in C_{o_i} is treated the same way as a variable in C.

E. Performing Actions

QLAP actions are performed using plans, and these plans call other QLAP actions. This leads to a hierarchy of plans and actions.

1) Calling and Processing Actions: When an action is called, it chooses a plan and then starts executing the policy of that chosen plan. Executing that policy results in more QLAP actions being called, and this process continues until a motor action is reached. When an action a(u, q) is called on a motor variable u, then QLAP sends a random motor value within the range covered by the qualitative value u = q to the body. This hierarchical structure of actions and plans means that multiple actions will be performed simultaneously. Each plan only keeps track of what action it is currently performing. And when that action terminates, the next action is called based on the policy. So as the initial action called by the agent is being processed, the path between that initial action and a motor action continually changes.

2) Terminating Actions: An action a(v, q) terminates if v = q, in which case it succeeds. It also terminates if it fails. An action fails if (a) it has no plans, or (b) for every plan for this action, the action to bring about the antecedent event of the plan is already in the call list, or (c) its chosen plan fails. Similar to an action, a plan to bring about v = q terminates if v = q, in which case it succeeds. It also terminates if it fails. A plan to bring about v = q fails if (a) the termination function β is triggered by resource constraints, or (b) there is no applicable action in the current state, or (c) the action chosen by the policy is already in the call list, or (d) the action chosen by the policy immediately fails when it is called.
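As an illustration of this call-and-terminate behavior, here is a toy sketch (ours). The "world" is a single scalar, the only primitive action nudges it, and a "plan" is just an ordered list of sub-actions standing in for a learned policy; the loop check and resource cap mirror the failure conditions above.

```python
class World:
    def __init__(self):
        self.x = 0.0
    def step(self, delta):
        self.x += delta

def perform(action, world, call_list=(), budget=300):
    """action: ('motor', delta) or ('goal', test, plan), where plan is a list of
    sub-actions to try in order (a stand-in for a learned policy)."""
    kind = action[0]
    if kind == 'motor':
        world.step(action[1])                  # ground out in a motor command
        return True
    _, test, plan = action
    if action in call_list:                    # would create a loop: fail
        return False
    for _ in range(budget):
        if test(world):                        # v = q achieved: success
            return True
        for sub in plan:                       # policy chooses the next sub-action
            if not perform(sub, world, call_list + (action,), budget):
                return False
    return False                               # resource constraint reached

nudge_right = ('motor', 0.1)
hand_right_of_1 = ('goal', lambda w: w.x > 1.0, [nudge_right])
w = World()
print(perform(hand_right_of_1, w), round(w.x, 1))   # True 1.1
```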
V. EXPLORATION AND LEARNING

The QLAP agent explores and learns autonomously without being given a task. This autonomous exploration and learning raises many issues. For example, how can the agent decide what is worth exploring? As the agent explores, it learns new representations. How can it keep from learning unnecessary representations and getting bogged down? Should the agent use the same criteria for learning all representations? Or should it treat some representations as especially important? And finally, can the agent learn that some parts of the environment can be controlled with high reliability and low latency, so that they can be considered part of "self"?

Previous sections have explained how QLAP learns representations that take the form of landmarks, DBNs, plans, and actions. This section explains how learning in QLAP unfolds over time. We first discuss how the agent explores the environment. We then discuss developmental restrictions that determine what representations the agent learns and the order in which it learns them. We then discuss how QLAP pays special attention to goals that are hard to achieve. And finally, we discuss how the agent learns what is part of "self."

A. Exploration

The QLAP agent explores the environment autonomously without being given a task. Instead of trying to learn to do a particular task, the agent tries to learn to predict and control all of the variables in its environment. However, this raises difficulties because there might be many variables in the environment, and some may be difficult or impossible to predict or control. This section explains how the agent determines what should be explored and the best way to go about that exploration.

Initially, the agent motor babbles by repeatedly choosing random motor values and maintaining those values for a random number of timesteps. After that point, QLAP begins to practice its learned actions. An outline of the execution of QLAP is shown in Algorithm 1. Note that Algorithm 1 references the deletion of DBNs and plans. DBNs are deleted to save resources; this can occur if the action corresponding to the consequent event of the DBN becomes sufficiently reliable and that DBN is not already a plan, or if the DBN does not become sufficiently reliable within a large fixed number of timesteps. There are a maximum of three plans for an action; a plan can be deleted if a better plan is found to replace it, or if its reliability rel(o) falls below a threshold.

The agent continually makes three types of choices during its exploration. These choices vary in time scale from coarse to fine. The agent chooses: a learned action a(v, q) to practice, the best plan o_i for performing the action a(v, q), and the action a based on policy π_i for plan o_i.

1) Choosing a Learned Action to Practice: One method for choosing where to explore is to measure prediction error and then to motivate the agent to explore parts of the space for which it currently does not have a good model. This form of intrinsic motivation is used in [34]–[36]. However, focusing attention on states where the model has poor prediction ability can cause the agent to explore spaces where learning is too difficult. Schmidhuber [37] proposed a method whereby an agent learns to predict the decrease in the error of the model that results from taking each action. The agent can then choose the action that will cause the biggest decrease in prediction error. Oudeyer, Kaplan, and Hafner [38] apply this approach with a developing agent and have the agent explore regions of the sensory-motor space that are expected to produce the largest decrease in predictive error. Their method is called Intelligent Adaptive Curiosity (IAC).

QLAP uses IAC to determine which action to practice. After the motor babbling period, QLAP chooses a motor babbling action with a fixed small probability; otherwise it uses IAC to choose a learned action to practice. Choosing a learned action to practice consists of two steps: (1) determine the set of applicable actions that could be practiced in the current state s, and (2) choose an action from that set. The set of applicable actions to practice consists of the set of actions that are not currently accomplished, but could be performed. For a change action, this means that the action must have at least one plan. For a magnitude action a(v, q), this means that if v_t < q then a(v̇, [+]) must have at least one plan (and similarly for v_t > q). QLAP chooses an action to practice by assigning a weight w to each action in the set of applicable actions. The action is then chosen randomly based on this weight w. The weights are assigned using a version of Intelligent Adaptive Curiosity (IAC) [38] that measures the change in the agent's ability to perform the action over time and then chooses actions where that ability is increasing.

2) Choosing the Best Plan to Perform an Action: When an action is called, it chooses a plan to perform the action. QLAP seeks to choose the plan that is most likely to be successful in the current state. To compare plans, QLAP computes a weight w_o^s for each plan o in state s. To compute the weight w_o^s for plan o in state s, QLAP computes the product of the reliability of the DBN r that led to the plan, rel(r, s), and the reliability of the second-order DBN, rel(o, s), so that

w_o^s = rel(r, s) · rel(o, s)    (21)

To choose the plan to perform the action, QLAP uses ε-greedy plan selection (ε = 0.05). With probability 1 − ε, QLAP chooses the plan with the highest weight, and with probability ε it chooses a plan randomly. To prevent loops in the calling list, a plan whose DBN has its antecedent event already in the call list is not applicable and cannot be chosen.

3) Choosing an Action within a Plan: Recall from Section IV that QLAP learns a Q-table for each plan that gives a value for taking each action a in state s. Here again, QLAP uses ε-greedy selection. With probability 1 − ε, in state s, QLAP chooses an action a that maximizes Q_i(s, a), and with probability ε, QLAP chooses a random action. This action selection method balances exploration with exploitation [33].
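A small sketch (ours) of the plan-selection rule in Eq. (21) with ε-greedy selection; the reliability functions and plan names are stand-ins for the statistics QLAP maintains.

```python
import random

EPSILON = 0.05

def choose_plan(plans, state, rel_dbn, rel_plan, epsilon=EPSILON):
    """plans: candidate plans; rel_dbn(plan, state) and rel_plan(plan, state)
    return rel(r, s) and rel(o, s) for the plan's DBN and second-order DBN."""
    weights = {p: rel_dbn(p, state) * rel_plan(p, state) for p in plans}
    if random.random() < epsilon:
        return random.choice(list(plans))          # explore: random plan
    return max(weights, key=weights.get)           # exploit: highest w_o^s

# Toy usage with hypothetical reliabilities.
plans = ['via_ux', 'via_uy']
rel_dbn  = lambda p, s: {'via_ux': 0.9, 'via_uy': 0.6}[p]
rel_plan = lambda p, s: {'via_ux': 0.8, 'via_uy': 0.9}[p]
print(choose_plan(plans, state=None, rel_dbn=rel_dbn, rel_plan=rel_plan))
```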
Algorithm 1 The Qualitative Learner of Action and Perception (QLAP)

1: for t = 0 : ∞ do
2:   sense environment
3:   convert input to qualitative values using current landmarks
4:   update statistics for learning new contingencies
5:   update statistics for each DBN
6:   if mod(t, 2000) == 0 then
7:     learn new DBNs
8:     update contexts on existing DBNs
9:     delete unneeded DBNs and plans
10:    if mod(t, 4000) == 0 then
11:      learn new landmarks on events
12:    else
13:      learn new landmarks on DBNs
14:    end if
15:    convert DBNs to plans
16:  end if
17:  if current exploration action is completed then
18:    choose new exploration action and action plan
19:  end if
20:  get low-level motor command based on current qualitative state and plan of current exploration action
21:  pass motor command to robot
22: end for

B. Targeted Learning

Since QLAP creates an action for each variable and qualitative value combination, a QLAP agent is faced with many potential actions that could be learned. QLAP can choose different actions to practice based on the learning gradient, but what about the thresholds to learn predictive DBN models and plans? Some actions might be more difficult to learn than others, so it seems reasonable that the requirements for learning the representations that lead to learning such actions should be loosened. QLAP does targeted learning for difficult actions. To learn a plan for an action chosen for targeted learning, QLAP

1) Lowers the threshold needed to learn a contingency. Recall from Section III-A that a contingency is learned when

Pr(soon(t, E2) | E1(t)) − Pr(soon(t, E2)) > θ_pen    (22)

where θ_pen = 0.05. If event E2 is chosen for targeted learning, QLAP makes it more likely that a contingency will be learned by lowering θ_pen.

2) Lowers the threshold needed to learn a plan. Recall from Section IV-B that one of the requirements to convert a change DBN r into a plan is that brel(r) > θ_SR. If event E2 is chosen for targeted learning, QLAP makes it more likely that a DBN will be converted to a plan by lowering θ_SR.

This leaves the question of when to use targeted learning of actions. An event is chosen as a goal for targeted learning if the probability of being in a state where the event is satisfied is less than 0.05; we call such an event sufficiently rare. This is reminiscent of Bonarini et al. [39]. They consider desirable states to be those that are rarely reached or are easily left once reached.

C. Learning What Is Part of "Self"

One step towards tool use is making objects in the environment part of "self" so that they can be used to perform useful tasks. The representation of self is straightforward in QLAP. A change variable is part of self if it can be quickly and reliably manipulated. QLAP learns what is part of self by looking for variables that it can reliably control with low latency.

Marjanovic [40] enabled a robot to identify what was part of self by having the robot wave its arm and having the robot assume that the only thing moving in the scene was itself. The work of Metta and Fitzpatrick [41] is similar but more sophisticated because it looks for optical flow that correlates with motor commands of the arm. Gold and Scassellati [42] use the time between giving a motor command and seeing movement to determine what is part of self. Our method for learning self is similar to that of Gold and Scassellati, but we learn what is part of self while learning actions. A direction of change variable v̇ is part of self if:

1) the average time it takes for the action to set v̇ = [+] and v̇ = [−] is less than k, and
2) the actions for both v̇ = [+] and v̇ = [−] are sufficiently reliable,

where k is how long it takes for motor commands to be observed as changes in the world; see [15, App. C].
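A minimal sketch (ours) of this self test. The latency bound k matches the definition above, but the numeric reliability threshold is a stand-in for QLAP's "sufficiently reliable" criterion, and the statistics are illustrative.

```python
K = 10             # latency bound: time for motor commands to show up as changes
THETA_SELF = 0.75  # stand-in for the "sufficiently reliable" threshold

def is_part_of_self(stats_plus, stats_minus, k=K, theta=THETA_SELF):
    """Each stats dict holds 'avg_time' and 'reliability' for the action that
    sets the direction-of-change variable to [+] or [-]."""
    quick = stats_plus['avg_time'] < k and stats_minus['avg_time'] < k
    reliable = (stats_plus['reliability'] > theta and
                stats_minus['reliability'] > theta)
    return quick and reliable

hand_dx = ({'avg_time': 3.2, 'reliability': 0.95},    # set hx_dot = [+]
           {'avg_time': 2.8, 'reliability': 0.97})    # set hx_dot = [-]
block_dx = ({'avg_time': 40.0, 'reliability': 0.55},
            {'avg_time': 38.0, 'reliability': 0.50})
print(is_part_of_self(*hand_dx))    # True:  the hand behaves like self
print(is_part_of_self(*block_dx))   # False: the block does not
```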
VI. EVALUATION

A. Evaluation Environment

The evaluation environment is implemented in Breve [43] and has realistic physics. Breve simulates physics using the Open Dynamics Engine (ODE) [44]. The simulation consists of a robot at a table with a block and floating objects. The robot has an orthogonal arm that can move in the x, y, and z directions. The environment is shown in Figure 8, and the variables perceived by the agent for the core environment are shown in Table II. The block has a width that varies between 1 and 3 units. The block is replaced when it is out of reach and not moving, or when it hits the floor. Each timestep in the simulator corresponds to 0.05 seconds, i.e., 20 timesteps per second, 1,200 timesteps per minute, or 72,000 timesteps per hour. See [15, sect. 7] for further details.

The robot can grasp the block in a way that is reminiscent of both the palmar reflex [45] and having a "sticky mitten" [46]. The palmar reflex is a reflex that is present from birth until the age of 4-6 months in human babies. The reflex causes the baby to close its hand when something touches the palm. In the sticky mittens experiments, three-month-old infants wore mittens covered with Velcro that allowed them to more easily grasp objects. Grasping is implemented on the robot to allow it to grasp only when over the block. Specifically, the block is grasped if the hand and block are colliding, and the Euclidean 2D distance from the center of the block in the x and y directions is less than half the width of the palm, 3/2 = 1.5 units.

In addition to the core environment, QLAP is also evaluated with distractor objects. This is done using the floating extension environment, which adds two floating objects that the agent can observe but cannot interact with. The purpose of this environment is to evaluate QLAP's ability to focus on learnable relationships in the presence of unlearnable ones. The objects float around in an invisible box. The variables added to the core environment to make the floating extension environment are shown in Table III.

Fig. 8: The evaluation environment (shown here with floating objects). (a) Not grasping. (b) Grasping.

Evaluating autonomous learning is difficult because there is no pre-set task on which to evaluate performance. The approach we take is to first have the agent learn autonomously in an environment; we then evaluate if the agent is able to perform a set of tasks. It is important to note that during learning the agent does not know on which tasks it will be evaluated.

B. Experimental Conditions

We compare the performance of QLAP after autonomous learning with the performance of a directed reinforcement learning algorithm trained only on the evaluation tasks. This puts QLAP at a disadvantage on the evaluation tasks, because QLAP is not informed of the evaluation tasks and QLAP learns more than the evaluation tasks. We hope QLAP can demonstrate developmental learning by getting better at the tasks over time, and that QLAP can do as well as the directed learner. Each agent is evaluated on three tasks, referred to as the core tasks.

1) Move the block: The evaluator picks a goal to move the block left (Ṫ_L = [+]), right (Ṫ_R = [−]), or forward (Ṫ_T = [+]). The goal is chosen randomly based on the relative position of the hand and the block. A trial is terminated early if the agent hits the block in the wrong direction.

2) Hit the block to the floor: The goal is to make bang = true.

3) Pick up the block: The goal is to get the hand in just the right place so the robot can grasp the block and make T = true. A trial is terminated early if the agent hits the block out of reach.

The directed learner (SupLrn) is trained using the algorithm that Sutton and Barto [33] refer to as linear, gradient-descent Sarsa(λ) with binary features, where the binary features come from tile coding, and a reward function that provides a large reward upon reaching the goal and small penalties for each step.⁷ Tile coding is a way to discretize continuous input for reinforcement learning.

⁷ There were 6 tilings and a memory size of 65,536. The motor variables u_x and u_y were each divided into 10 equal-width bins, so that there were 20 actions, with each action either setting u_x or u_y to a nonzero value. The change variables were each divided into 3 bins: (−∞, −0.05), [−0.05, 0.05], (0.05, ∞). The goal was represented with a discrete variable. The remaining variables were treated as continuous; they were normalized to the range [0, 1] using the minimum and maximum values observed during a typical run of QLAP. The generalization was 0.5, and the parameter values used were λ = 0.9 and γ = 1.0, with ε-greedy action selection. The code for the implementation came from PLASTK [47].

Both the QLAP and the directed learning agents are evaluated on the core tasks in both the core environment and the floating extension environment under three experimental conditions:

1) QLAP: The QLAP algorithm.
2) SupLrn-1: Directed learning, choosing an action every timestep (Sarsa(λ) with tile coding).
3) SupLrn-10: Directed learning, choosing an action every 10 timesteps (Sarsa(λ) with tile coding).

SupLrn-1 and SupLrn-10 are both used because SupLrn-1 has difficulty learning the core tasks due to a high task diameter (a large number of steps are required to reach the goal).

QLAP learns autonomously for 250,000 timesteps (corresponding to about 3.5 hours of physical experience) as described in Section V. The directed learning agents repeatedly perform trials of a particular core task for 250,000 timesteps. At the beginning of each trial, the core task that the directed learning agent will practice is chosen randomly. The state of the agent is saved every 10,000 timesteps (about every 8 minutes of physical experience). The agent is then evaluated on how well it can do the specified task using the representations from each stored state.

At the beginning of each trial, the block is placed in a random location within reach of the agent and the hand is moved to a random location. Then, the goal is given to the agent. The agent makes and executes plans to achieve the goal. If the QLAP agent cannot make a plan to achieve the goal, it moves randomly. The trial is terminated after 300 timesteps or when the goal is achieved. The agent receives a penalty of 0.01 for each timestep it does not achieve the goal and a reward of 9.99 on the timestep it achieves the goal. (SupLrn-10 gets a penalty of 0.1 every 10th timestep it does not reach the goal and a reward of 9.99 on the timestep it reaches the goal.) Each evaluation consists of a fixed number of trials. The rewards over the trials are averaged, and the average reward is taken as a measure of ability. For each experiment, multiple QLAP agents and multiple directed learning agents are trained.
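As a concrete reading of the reward scheme just described, here is a small sketch (ours); whether the goal timestep itself also incurs the step penalty is an assumption.

```python
def trial_reward(goal_timestep, max_steps=300, step_penalty=0.01, goal_reward=9.99):
    """goal_timestep: 1-based timestep at which the goal was achieved, or None."""
    if goal_timestep is None:
        return -step_penalty * max_steps          # goal never reached in the trial
    return -step_penalty * (goal_timestep - 1) + goal_reward

print(trial_reward(50))     # goal reached quickly -> high reward (9.5)
print(trial_reward(None))   # goal never reached   -> -3.0
```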
TABLE II: Variables of the core environment

  u_x, u_y, u_z: force in the x, y, and z directions
  u_UG: ungrasp the block
  h_x, h_y, h_z: global location of the hand in the x, y, and z directions
  ḣ_x, ḣ_y, ḣ_z: derivative of h_x, h_y, h_z
  y_TB, ẏ_TB: top of the hand in the frame of reference of the bottom of the block (y direction)
  y_BT, ẏ_BT: bottom of the hand in the frame of reference of the top of the block (y direction)
  x_RL, ẋ_RL: right side of the hand in the frame of reference of the left side of the block (x direction)
  x_LR, ẋ_LR: left side of the hand in the frame of reference of the right side of the block (x direction)
  z_BT, ż_BT: bottom side of the hand in the frame of reference of the top of the block (z direction)
  z_F, ż_F: distance to the floor
  x_TL, ẋ_TL: location of the nearest edge of the block in the x direction in the coordinate frame defined by the left edge of the table
  x_TR, ẋ_TR: location of the nearest edge of the block in the x direction in the coordinate frame defined by the right edge of the table
  y_TT, ẏ_TT: location of the nearest edge of the block in the y direction in the coordinate frame defined by the top edge of the table
  c_x, ċ_x: location of the hand in the x direction relative to the center of the block
  c_y, ċ_y: location of the hand in the y direction relative to the center of the block
  T: block is grasped, true or false; becomes true when the hand is touching the block and the 2D distance between the center of the hand and the center of the block is less than 1.5
  bang: true when the block hits the floor

TABLE III: Variables added to the core environment to make up the floating extension environment

  f1_x, f1_y, f1_z: location of the first floating object in the x, y, and z directions
  ḟ1_x, ḟ1_y, ḟ1_z: derivative of f1_x, f1_y, f1_z
  f2_x, f2_y, f2_z: location of the second floating object in the x, y, and z directions
  ḟ2_x, ḟ2_y, ḟ2_z: derivative of f2_x, f2_y, f2_z

C. Results

The results are shown in Figures 9 and 10. Figure 9 compares QLAP and directed learning on the task of moving the block in the specified direction. As can be seen in Figure 9(a), SupLrn-1 was not able to do the task well compared to QLAP due to the high task diameter. Having the directed learning agents choose an action every 10 timesteps improved their performance, as can be seen in Figure 9(b). But as can be seen by visually inspecting Figure 9(c), the performance of directed learning degrades much more than the performance of QLAP degrades when the distractor objects are added. This same pattern, with QLAP outperforming SupLrn-1, SupLrn-10 doing as well as or better than QLAP in the environment without the distractor objects, but QLAP not degrading with the distractor objects, was also observed for the tasks of hitting the block off the table and picking up the block. Figure 10 shows the performance of QLAP and directed learning on the tasks of hitting the block off the table and picking up the block. For brevity, this figure only contains the final comparison of QLAP and SupLrn-10 with floating objects on those tasks.

Fig. 9: Moving the block. (a) QLAP does better than SupLrn-1 because of the high task diameter. (b) SupLrn-10 does better than QLAP. (c) When the floating objects are added, the performance of SupLrn-10 degrades much more than the performance of QLAP degrades. [Panels: (a) QLAP and SupLrn-1; (b) QLAP and SupLrn-10; (c) Float: QLAP and SupLrn-10. Axes: average reward per episode (move task) vs. timesteps (x 10,000).]

Fig. 10: QLAP outperforms reinforcement learning using tile coding on the more difficult tasks in the floating extension environment. (a) Knock the block off the table. (b) Pick up the block. [Panels: QLAP and SupLrn-10 in the floating environment. Axes: average reward per episode (hit to floor / pickup task) vs. timesteps (x 10,000).]

We see that QLAP does better than SupLrn-10 on these more difficult tasks in the environment with floating objects. For all three tasks, QLAP displays developmental learning and gets better over time. These results suggest the following conclusions: (1) QLAP autonomously selects an appropriately coarse temporal coding for actions, outperforming the fine-grained actions used in SupLrn-1, which suffer from the resulting large task diameter; (2) QLAP is more robust to distractor events than SupLrn-10. Also note that QLAP generally learns between 0 and 5 landmarks per variable, and in the core environment QLAP learns fewer than 400 DBNs on average.

D. Additional evaluation: QLAP learns landmarks that are generally useful

We want to show that the learned landmarks really do represent the natural joints in the environment. Since we have no ground truth for what the natural joints of the environment are, we compare the results of tabular Q-learning using landmarks learned with QLAP with the results of tabular Q-learning using randomly generated landmarks. If the landmarks do represent the joints in the environment, then the tabular Q-learner using learned landmarks should do better than the one using random landmarks.

1) Experimental Environment: Tabular Q-learning does not generalize well. During exploratory experiments, the state space of the core environment was so large that tabular Q-learning rarely visited the same state more than once. We therefore evaluate this claim using a smaller environment and a simple task. We use the 2D core environment, in which the hand only moves in two dimensions. It removes the variables of the z direction from the core environment; it subtracts u_z, u_UG, h_z, ḣ_z, z_BT, ż_BT, c_x, ċ_x, c_y, ċ_y, T, and bang.

2) Experimental Conditions: (a) QLAP landmarks: Tabular Q-learning using landmarks learned by QLAP during a previous run of 100,000 timesteps on the 2D core environment. (b) random landmarks: Tabular Q-learning using randomly generated landmarks. To generate the random landmarks, for each magnitude or motor variable v, a random number of landmarks between 0 and 5 is chosen. Each landmark is then placed in a randomly chosen location within the minimum and maximum range observed for v during a typical run of QLAP. Note that motor variables already have a landmark at 0, so each motor variable had between 1 and 6 landmarks.

3) Results: The results are shown in Figure 11. Tabular Q-learning works much better using the learned landmarks than using the random ones.

Fig. 11: QLAP landmarks enable the agent to learn the task better than do random landmarks. [Plot: average reward per episode (move task) vs. timesteps (x 10,000); QLAP landmarks vs. random landmarks.]

VII. DISCUSSION

A. Description of Learned Representations

One of the first contingencies (models) that QLAP learns is that if it gives a positive force to the hand, then the hand will move to the right. But this contingency is not very reliable because in the simulator it takes a force of at least 300 units to move the hand. The agent learns a landmark at 300 on that motor variable, and modifies the model to state that if the force is at least 300, then the hand will move to the right. But this model still is not completely reliable, because if the hand is already all the way to the right, then it cannot move any farther. But from this model, the agent can note the location of its hand each time it applies a force of 300, and it can learn a landmark to indicate when the hand is all the way to the right. The completed model states that if the hand is not all the way to the right and a force of at least 300 is given, then the hand will usually move to the right. Even this model is not completely reliable, because there are unusual situations where, for example, the hand is stuck on the block. But the model is probabilistic, so it can handle nondeterminism.

The agent also learns that the location of the right side of the hand in the frame of reference of the left side of the block has a special value at 0. It learns this because it notices that the block begins to move to the right when that value is achieved. It then creates a landmark to indicate that value and an action to reach that value. Based on this landmark, QLAP can learn a contingency (model) that says if the value goes to 0, then the block will move to the right. It can then learn other landmarks that indicate in which situations this will be successful. In a similar way, it learns to pick up the block and knock the block off the table.

B. Theoretical Bounds

QLAP is reasonably efficient in time and space. Let V be the number of variables. QLAP searches for pairs of events to form a contingency, so this process is O(V²) in both time and space. QLAP searches for context variables and landmarks for each learned contingency. It does this by considering one variable at a time for each DBN, so these processes are O(V³) in time and space. This of course means that if you had a very large number of variables, such as 1,000, the variables would need to be categorized so that only a subset of the possible contingencies was considered.

MDP planning is known to be computationally expensive because the state space grows exponentially with the number of variables. However, this explosion is limited in QLAP because QLAP builds many MDPs, each consisting of only a few variables. Essentially, QLAP searches for the simplest MDP models that give reasonable descriptions of the observed dynamics of the environment. The search is a greedy search, incrementally increasing the number of variables in the MDP (via the DBN).

C. Assumptions of QLAP

QLAP assumes that any goal that an outside observer would want the agent to accomplish is represented with an input variable. QLAP also assumes that meaningful landmarks can be found on single variables. In some cases when these assumptions are violated, QLAP can do a search on combinations of variables (see [15, sect. 10.3]). QLAP also assumes a set of continuous motor primitives that correspond to orthogonal directions of movement. QLAP builds on the work of Pierce and Kuipers [48]. In their work, an agent was able to use principal components analysis (PCA) [49] to learn a set of motor primitives corresponding to turn and travel for a robot that had motors to turn each of two wheels independently.
VIII. RELATED WORK

QLAP learns states and hierarchical actions in continuous, dynamic environments with continuous motors through autonomous exploration. The closest direct competitor to QLAP is the work of Barto, Jonsson, and Vigorito. Given a DBN model of the environment, the VISA algorithm [50] creates a causal graph which it uses to identify state variables for options. Like QLAP, the VISA algorithm performs state abstraction by finding the relevant variables for each option. Jonsson and Barto [26] learn DBNs through an agent's interaction with a discrete environment by maximizing the posterior of the DBN given the data, building a tree to represent the conditional probability. Vigorito and Barto [51] extend [26], [50] by proposing an algorithm for learning options when there is no specific task. This work differs from QLAP in that learning takes place in discrete environments with events that are assumed to occur over one-timestep intervals. The work also assumes that the agent begins with a set of discrete actions. Because QLAP is designed for continuous environments with dynamics, QLAP uses a qualitative representation. This qualitative representation leads to a novel DBN learning algorithm for learning predictive models, and a novel method for converting those models into a set of hierarchical actions.

Shen's LIVE algorithm [52] learns a set of rules in first-order logic and then uses goal regression to perform actions. The algorithm assumes that the agent already has basic actions, and the experiments presented are in environments without dynamics, such as the Tower of Hanoi. Another method for

learning planning rules in first-order logic is [7], [8]. The rules they learn are probabilistic: given a context and an action, their learned rules provide a distribution over results. This algorithm assumes a discrete state space and that the agent already has basic actions such as pick up.

QLAP's structure of actions and plans is reminiscent of the MAXQ value function decomposition [53]. QLAP defines its own actions and plans as it learns, and as the agent learns a more refined discretization, the hierarchy changes. There has also been much work on learning a hierarchy. Like QLAP, Digney [54] creates a task to achieve each discrete value of each variable. However, QLAP learns the discretization. Work has been done on learning a hierarchical decomposition of a factored Markov decision process by identifying exits. Exits are combinations of variable values and actions that cause some state variable to change its value [50]. Exits roughly correspond to the DBNs found by QLAP, except that there is no explicit action needed for QLAP DBNs. Hengst [55] determined an order on the input variables based on how often they changed value. Using this ordering, he identified exits to change the next variable in the order and created an option for each exit.

There has been other work on structure learning in sequential decision processes where the environment can be modeled as a factored MDP. Degris et al. [25] proposed a method called SDYNA that learns a structured representation in the form of a decision tree and then uses that structure to compute a value function. Strehl et al. [27] learn a DBN to predict each component of a factored-state MDP. Hester and Stone [56] learn decision trees to predict both the reward and the change in the next state. All of these methods are evaluated in discrete environments where transitions occur over one-timestep intervals.

IX. SUMMARY AND CONCLUSION

The Qualitative Learner of Action and Perception (QLAP) is an unsupervised learning algorithm that allows an agent to autonomously learn states and actions in continuous environments. Learning actions from a learned representation is significant because it moves the state of the art of autonomous learning from grid worlds to continuous environments. Another contribution of QLAP is providing a method for factoring the environment into small pieces. Instead of learning one large predictive model, QLAP learns many small models. And instead of learning one large plan to perform an action, QLAP learns many small plans that are useful in different situations.

QLAP starts with a bottom-up process that detects contingencies and builds DBN models to identify the conditions (i.e., values of context variables) under which there are near-deterministic relations among events. Meanwhile, an action is defined for each event, where the action has the intended effect of making that event occur. The desirable situation is for an action to have one or more sufficiently reliable plans for implementing the action. If there are no plans for an action, or if the plans are not sufficiently reliable, the action is still defined, but it is not useful, so it will not be used as a step in a higher-level plan. Each plan comes from a sufficiently reliable DBN, where the overall structure is that DBNs lead to MDPs, MDPs are converted into policies, and policies are plans. QLAP explores autonomously and tries to learn to achieve each qualitative value of each variable. To explore, the agent continually chooses an action to practice. To choose which action to practice, QLAP uses Intelligent Adaptive Curiosity (IAC). IAC motivates the agent to practice actions that it is getting better at, and IAC motivates the agent to stop practicing actions that are too hard or too easy.

QLAP was evaluated in environments with simulated physics.
The evaluation was performed by having QLAP explore autonomously and then measuring how well it could perform a set of tasks. The agent learned to hit a block in a specified direction and to pick up the block as well as or better than a directed learner trained only on the task. The evaluation also showed that the landmarks learned by QLAP were broadly useful. Future work will consist of incorporating continuous learning methods within the discretized representation learned by QLAP. This should enable QLAP to leverage the best of both discrete learning and continuous learning.

ACKNOWLEDGMENT

This work has taken place in the Intelligent Robotics Lab at the Artificial Intelligence Laboratory, The University of Texas at Austin. Research of the Intelligent Robotics Lab is supported in part by grants from the Texas Advanced Research Program ( ), and from the National Science Foundation (IIS-04357, IIS-07350, and IIS-07500).

REFERENCES

[1] A. Saxena, J. Driemeyer, J. Kearns, and A. Ng, "Robotic grasping of novel objects," Advances in Neural Information Processing Systems, vol. 19, 2007.
[2] P. Fitzpatrick, G. Metta, L. Natale, S. Rao, and G. Sandini, "Learning about objects through action: initial steps towards artificial cognition," in IEEE International Conference on Robotics and Automation (ICRA 2003), vol. 3, 2003.
[3] S. Vijayakumar, A. D'Souza, and S. Schaal, "Incremental online learning in high dimensions," Neural Computation, vol. 17, 2005.
[4] C. M. Vigorito and A. G. Barto, "Intrinsically motivated hierarchical skill learning in structured environments," IEEE Transactions on Autonomous Mental Development (TAMD), vol. 2, no. 2, 2010.
[5] G. L. Drescher, Made-Up Minds: A Constructivist Approach to Artificial Intelligence. Cambridge, MA: MIT Press, 1991.
[6] P. R. Cohen, M. S. Atkin, T. Oates, and C. R. Beal, "Neo: Learning conceptual knowledge by sensorimotor interaction with an environment," in Agents '97. Marina del Rey, CA: ACM, 1997.
[7] L. S. Zettlemoyer, H. Pasula, and L. P. Kaelbling, "Learning planning rules in noisy stochastic worlds," in Proc. 20th Conf. on Artificial Intelligence (AAAI-2005), 2005.
[8] H. Pasula, L. Zettlemoyer, and L. Kaelbling, "Learning symbolic models of stochastic domains," Journal of Artificial Intelligence Research, vol. 29, 2007.
[9] C. Xu and B. Kuipers, "Towards the Object Semantic Hierarchy," in Proc. of the Int. Conf. on Development and Learning (ICDL 2010), 2010.
[10] J. Mugan and B. Kuipers, "Learning to predict the effects of actions: Synergy between rules and landmarks," in Proc. of the Int. Conf. on Development and Learning, 2007.
[11] J. Mugan and B. Kuipers, "Learning distinctions and rules in a continuous world through active exploration," in Proc. of the Int. Conf. on Epigenetic Robotics, 2007.
[12] J. Mugan and B. Kuipers, "Towards the application of reinforcement learning to undirected developmental learning," in Proc. of the Int. Conf. on Epigenetic Robotics, 2008.

[13] J. Mugan and B. Kuipers, "A comparison of strategies for developmental action acquisition in QLAP," in Proc. of the Int. Conf. on Epigenetic Robotics, 2009.
[14] J. Mugan and B. Kuipers, "Autonomously learning an action hierarchy using a learned qualitative state representation," in Proc. of the Int. Joint Conf. on Artificial Intelligence, 2009.
[15] J. Mugan, "Autonomous Qualitative Learning of Distinctions and Actions in a Developing Agent," Ph.D. dissertation, University of Texas at Austin, 2010.
[16] U. Fayyad and K. Irani, "On the handling of continuous-valued attributes in decision tree generation," Machine Learning, vol. 8, pp. 87-102, 1992.
[17] J. Mugan and B. Kuipers, "The qualitative learner of action and perception, QLAP," in AAAI Video Competition (AIVC 2010), 2010.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
[19] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning," Artificial Intelligence Review, vol. 11, no. 1/5, pp. 11-73, 1997.
[20] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning for control," Artificial Intelligence Review, vol. 11, no. 1/5, pp. 75-113, 1997.
[21] S. Vijayakumar and S. Schaal, "Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space," in Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 2000.
[22] M. Jordan and D. Rumelhart, "Forward models: Supervised learning with a distal teacher," Cognitive Science, vol. 16, 1992.
[23] C. Rasmussen, "Gaussian processes in machine learning," Advanced Lectures on Machine Learning, pp. 63-71, 2006.
[24] B. Kuipers, Qualitative Reasoning. Cambridge, Massachusetts: The MIT Press, 1994.
[25] T. Degris, O. Sigaud, and P. Wuillemin, "Learning the structure of factored Markov decision processes in reinforcement learning problems," in ICML, 2006.
[26] A. Jonsson and A. Barto, "Active learning of dynamic Bayesian networks in Markov decision processes," Lecture Notes in Artificial Intelligence: Abstraction, Reformulation, and Approximation (SARA), 2007.
[27] A. Strehl, C. Diuk, and M. Littman, "Efficient structure learning in factored-state MDPs," in AAAI, 2007.
[28] G. A. Miller, E. Galanter, and K. H. Pribram, Plans and the Structure of Behavior. Holt, Rinehart and Winston, 1960.
[29] N. J. Nilsson, Principles of Artificial Intelligence. Tioga Publishing Company, 1980.
[30] M. Puterman, Markov Decision Problems. New York: Wiley, 1994.
[31] C. Boutilier, T. Dean, and S. Hanks, "Decision theoretic planning: Structural assumptions and computational leverage," Journal of Artificial Intelligence Research, vol. 11, pp. 1-94, 1999.
[32] R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artificial Intelligence, vol. 112, no. 1-2, pp. 181-211, 1999.
[33] R. S. Sutton and A. G. Barto, Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
[34] X. Huang and J. Weng, "Novelty and Reinforcement Learning in the Value System of Developmental Robots," Proc. 2nd Inter. Workshop on Epigenetic Robotics, 2002.
[35] J. Marshall, D. Blank, and L. Meeden, "An emergent framework for self-motivation in developmental robotics," Proc. of the 3rd Int. Conf. on Development and Learning (ICDL 2004), 2004.
[36] R. Brafman and M. Tennenholtz, "R-max: a general polynomial time algorithm for near-optimal reinforcement learning," The Journal of Machine Learning Research, vol. 3, pp. 213-231, 2003.
[37] J. Schmidhuber, "Curious model-building control systems," in Proc. Int. Joint Conf. on Neural Networks, vol. 2, 1991.
[38] P. Oudeyer, F. Kaplan, and V. Hafner, "Intrinsic Motivation Systems for Autonomous Mental Development," IEEE Transactions on Evolutionary Computation, vol. 11, no. 2, 2007.
[39] A. Bonarini, A. Lazaric, and M.
Restelli, "Incremental Skill Acquisition for Self-Motivated Learning Animats," in From Animals to Animats 9: 9th International Conference on Simulation of Adaptive Behavior (SAB). Springer, 2006.
[40] M. Marjanovic, B. Scassellati, and M. Williamson, "Self-taught visually guided pointing for a humanoid robot," in From Animals to Animats 4: Proc. Fourth Int'l Conf. Simulation of Adaptive Behavior, 1996.
[41] G. Metta and P. Fitzpatrick, "Early integration of vision and manipulation," Adaptive Behavior, vol. 11, no. 2, pp. 109-128, 2003.
[42] K. Gold and B. Scassellati, "Learning acceptable windows of contingency," Connection Science, vol. 18, no. 2, pp. 217-228, 2006.
[43] J. Klein, "Breve: a 3d environment for the simulation of decentralized systems and artificial life," in Proc. of the Int. Conf. on Artificial Life, 2003.
[44] R. Smith, "Open Dynamics Engine v0.5 user guide."
[45] V. G. Payne and L. D. Isaacs, Human Motor Development: A Lifespan Approach. McGraw-Hill Humanities/Social Sciences/Languages, 2007.
[46] A. Needham, T. Barrett, and K. Peterman, "A pick-me-up for infants' exploratory skills: Early simulated experiences reaching for objects using sticky mittens enhances young infants' object exploration skills," Infant Behavior and Development, vol. 25, no. 3, 2002.
[47] J. Provost, PLASTK, sourceforge.net, 2008.
[48] D. M. Pierce and B. J. Kuipers, "Map learning with uninterpreted sensors and effectors," Artificial Intelligence, vol. 92, pp. 169-227, 1997.
[49] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley-Interscience Publication, 2000.
[50] A. Jonsson and A. Barto, "Causal graph based decomposition of factored MDPs," The Journal of Machine Learning Research, vol. 7, 2006.
[51] C. M. Vigorito and A. G. Barto, "Autonomous hierarchical skill acquisition in factored MDPs," in Yale Workshop on Adaptive and Learning Systems, New Haven, Connecticut, 2008.
[52] W.-M. Shen, Autonomous Learning from the Environment. W. H. Freeman and Company, 1994.
[53] T. Dietterich, "The MAXQ method for hierarchical reinforcement learning," ICML, 1998.
[54] B. Digney, "Emergent hierarchical control structures: Learning reactive/hierarchical relationships in reinforcement environments," in From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior. The MIT Press, 1996.
[55] B. Hengst, "Discovering hierarchy in reinforcement learning with HEXQ," in Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
[56] T. Hester and P. Stone, "Generalized model learning for reinforcement learning in factored domains," in Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2009.

Jonathan Mugan is a Postdoctoral Fellow at Carnegie Mellon University. He received his Ph.D. in computer science from the University of Texas at Austin. He received his M.S. from the University of Texas at Dallas, and he received his M.B.A. and B.A. from Texas A&M University.

Benjamin Kuipers joined the University of Michigan in January 2009 as a Professor of Computer Science and Engineering. Prior to that, he held an endowed Professorship in Computer Sciences at the University of Texas at Austin. He received his B.A. from Swarthmore College, and his Ph.D. from MIT.
