Monte Carlo Value Iteration with Macro-Actions

In Advances in Neural Information Processing Systems (NIPS), 2011

Zhanwei Lim, David Hsu, Wee Sun Lee
Department of Computer Science, National University of Singapore
Singapore, , Singapore

Abstract

POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions for temporal abstraction. We provide sufficient conditions for Macro-MCVI to inherit the good theoretical properties of MCVI. Macro-MCVI does not require explicit construction of probabilistic models for macro-actions and is thus easy to apply in practice. Experiments show that Macro-MCVI substantially improves the performance of MCVI with suitable macro-actions.

1 Introduction

The partially observable Markov decision process (POMDP) provides a principled general framework for planning with imperfect state information. In POMDP planning, we represent an agent's possible states probabilistically as a belief and systematically reason over the space of all beliefs in order to derive a policy that is robust under uncertainty. POMDP planning, however, faces two major computational challenges. The first is the "curse of dimensionality". A complex planning task involves a large number of states, which results in a high-dimensional belief space. The second obstacle is the "curse of history". In applications such as robot motion planning, an agent often takes many actions before reaching the goal, resulting in a long planning horizon. The complexity of the planning task grows very fast with the horizon.

Point-based approximate algorithms [10, 14, 9] have brought dramatic progress to POMDP planning. Some of the fastest ones, such as HSVI [14] and SARSOP [9], can solve moderately complex POMDPs with hundreds of thousands of states in reasonable time.
The recently introduced Monte Carlo Value Iteration (MCVI) [2] takes one step further. It can tackle POMDPs with very large discrete state spaces or continuous state spaces. The main idea of MCVI is to sample both an agent's state space and the corresponding belief space simultaneously, thus avoiding the prohibitive computational cost of unnecessarily processing these spaces in their entirety. It uses Monte Carlo sampling in conjunction with dynamic programming to compute a policy represented as a finite state controller. Both theoretical analysis and experiments on several robotic motion planning tasks indicate that MCVI is a promising approach for planning under uncertainty with very large state spaces, and it has already been applied successfully to compute the threat resolution logic for aircraft collision avoidance systems in 3-D space [1].

However, the performance of MCVI degrades as the planning horizon increases. Temporal abstraction using macro-actions is effective in mitigating this negative effect and has achieved good results in earlier work on Markov decision processes (MDPs) and POMDPs (see Section 2). In this work, we show that macro-actions can be seamlessly integrated into MCVI, leading to the Macro-MCVI algorithm. Unfortunately, the theoretical properties of MCVI, such as the approximation error bounds [2], do not carry over to Macro-MCVI automatically if arbitrary mappings from beliefs to actions are allowed as macro-actions. We give sufficient conditions for the good theoretical properties to be retained, transforming POMDPs into a particular type of partially observable semi-Markov decision processes (POSMDPs) in which the lengths of macro-actions are not observable.

A major advantage of the new algorithm is its ability to abstract away the lengths of macro-actions in planning and reduce the effect of long planning horizons. Furthermore, it does not require explicit probabilistic models for macro-actions and treats them just like primitive actions in MCVI. This simplifies macro-action construction and is a major benefit in practice. Macro-MCVI can also be used to construct a hierarchy of macro-actions for planning in large spaces. Experiments show that the algorithm is effective with suitably designed macro-actions.

2 Related Works

Macro-actions have long been used to speed up planning and learning algorithms for MDPs (see, e.g., [6, 15, 3]). Similarly, they have been used in offline policy computation for POMDPs [16, 8]. Macro-actions can be composed hierarchically to further improve scalability [4, 11]. These earlier works rely on vector representations for beliefs and value functions, making it difficult to scale up to large state spaces. Macro-actions have also been used in online search algorithms for POMDPs [7].

Macro-MCVI is related to Hansen and Zhou's work [5]. The earlier work uses finite state controllers for policy representation and policy iteration for policy computation, but it has not yet been shown to work on large state spaces. Expectation-maximization (EM) can be used to train finite state controllers [17] and potentially handle large state spaces, but it often gets stuck in local optima.

3 Planning with Macro-actions

We would like to generalize POMDPs to handle macro-actions. Ideally, the generalization should retain properties of POMDPs such as piecewise linear and convex finite horizon value functions. We would also like the approximation bounds for MCVI [2] to hold with macro-actions. We would like to allow our macro-actions to be as powerful as possible.
A very powerful representation for a macro-action would be to allow it to be an arbitrary mapping from belief to action that will run until some termination condition is met. Unfortunately, the value function of a process with such macro-actions need not even be continuous. Consider the following simple finite horizon example, with horizon one. Assume that there are two primitive actions, both with constant rewards, regardless of state. Consider two macro-actions, one that always selects the poorer primitive action and another that selects the better primitive action for some beliefs. Clearly, the second macro-action dominates the first macro-action over the entire belief space. The reward for the second macro-action takes two possible values depending on which action is selected for the belief. The reward function also forms the optimal value function of the process and need not even be continuous, as the macro-action can be an arbitrary mapping from belief to action.

Next, we give sufficient conditions for the process to retain piecewise linearity and convexity of the value function. We do this by constructing a type of partially observable semi-Markov decision process (POSMDP) with the desired property. The POSMDP does not need to have the length of the macro-action observed, a property that can be practically very useful as it allows the branching factor for search to be significantly smaller. Furthermore, the process is a strict generalization of a POMDP, as it reduces to a POMDP when all the macro-actions have length one.

3.1 Partially Observable Semi-Markov Decision Process

Finite-horizon (undiscounted) POSMDPs were studied in [18]. Here, we focus on a type of infinite-horizon discounted POSMDP whose transition intervals are not observable. Our POSMDP is formally defined as a tuple $(S, A, O, T, R, \gamma)$, where $S$ is a state space, $A$ is a macro-action space, $O$ is a macro-observation space, $T$ is a joint transition and observation function, $R$ is a reward function, and $\gamma \in (0, 1)$ is a discount factor.
If we apply a macro-action $a$ with start state $s_i$, $T = p(s_j, o, k \mid s_i, a)$ encodes the joint conditional probability of the end state $s_j$, macro-observation $o$, and the number of time steps $k$ that it takes for $a$ to reach $s_j$ from $s_i$. We could decompose $T$ into a state-transition function and an observation function, but avoid doing so here to remain general and simplify the notation. The reward function $R$ gives the discounted cumulative reward for a macro-action $a$ that starts at state $s$: $R(s, a) = \sum_{t=0}^{\infty} \gamma^t E(r_t \mid s, a)$, where $E(r_t \mid s, a)$ is the expected reward at step $t$. Here we assume that the reward is 0 once a macro-action terminates.

For convenience, we will work with reweighted beliefs, instead of beliefs. Assuming that the number of states is $n$, a reweighted belief (like a belief) is a vector of $n$ non-negative numbers that sums to one. By assuming that the POSMDP process will stop with probability $1 - \gamma$ at each time step, we can interpret the reweighted belief as the conditional probability of a state given that the process has not stopped. This gives an interpretation of the reweighted belief in terms of the discount factor. Given a reweighted belief, we compute the next reweighted belief given macro-action $a$ and observation $o$, $b' = \tau(b, a, o)$, as follows:

$$b'(s) = \frac{\sum_{k=1}^{\infty} \gamma^{k-1} \sum_{i=1}^{n} p(s, o, k \mid s_i, a)\, b(s_i)}{\sum_{k=1}^{\infty} \gamma^{k-1} \sum_{j=1}^{n} \sum_{i=1}^{n} p(s_j, o, k \mid s_i, a)\, b(s_i)}. \tag{1}$$

We will simply refer to the reweighted belief as a belief from here on. We denote the denominator $\sum_{k=1}^{\infty} \gamma^{k-1} \sum_{j=1}^{n} \sum_{i=1}^{n} p(s_j, o, k \mid s_i, a)\, b(s_i)$ by $p_\gamma(o \mid a, b)$. The value of $p_\gamma(o \mid a, b)$ can be interpreted as the probability that observation $o$ is received and the POSMDP has not stopped. Note that $\sum_o p_\gamma(o \mid a, b)$ may sum to less than 1 due to discounting.

A policy $\pi$ is a mapping from a belief to a macro-action. Let $R(b, a) = \sum_s b(s) R(s, a)$. The value of a policy $\pi$ can be defined recursively as

$$V_\pi(b) = R(b, \pi(b)) + \gamma \sum_{o \in O} p_\gamma(o \mid \pi(b), b)\, V_\pi(\tau(b, \pi(b), o)).$$

Note that the policy operates on the belief and may not know the number of steps taken by the macro-actions. If knowledge of the number of steps is important, it can be added into the observation function in the modeling process.

We now define the backup operator $H$ that operates on a value function $V_m$ and returns $V_{m+1}$:

$$HV(b) = \max_{a \in A} \Big[ R(b, a) + \gamma \sum_{o \in O} p_\gamma(o \mid a, b)\, V(\tau(b, a, o)) \Big]. \tag{2}$$

The backup operator is a contractive mapping.¹

Lemma 1 Given value functions $U$ and $V$, $\|HU - HV\|_\infty \le \gamma \|U - V\|_\infty$.

Let the value of an optimal policy, $\pi^*$, be $V^*$. The following theorem is a consequence of the Banach fixed point theorem and Lemma 1.

Theorem 1 $V^*$ is the unique fixed point of $H$ and satisfies the Bellman equation $V^* = HV^*$.

We call a policy an m-step policy if the number of times the macro-actions are applied is $m$. For m-step policies, $V^*$ can be approximated by a finite set of linear functions; the weight vectors of these linear functions are called the $\alpha$-vectors.
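As an illustration of equation (1), the following minimal sketch computes the reweighted belief update for a tiny discrete model. The dictionary layout for $p(s_j, o, k \mid s_i, a)$, the function name `reweighted_belief_update`, and the toy numbers are all assumptions made for this example; they do not come from the paper, and Macro-MCVI itself never builds such an explicit table.

```python
def reweighted_belief_update(b, p, gamma, o):
    """Equation (1) for one fixed macro-action a.

    b     : list of n state probabilities (the current reweighted belief).
    p     : dict mapping (i, j, obs, k) -> p(s_j, obs, k | s_i, a)
            (hypothetical explicit table, used only for illustration).
    gamma : discount factor in (0, 1).
    o     : the macro-observation that was received.
    Returns (next belief, p_gamma(o | a, b)).
    """
    n = len(b)
    numer = [0.0] * n
    for (i, j, obs, k), prob in p.items():
        if obs == o:
            # A macro-action that ran for k steps is weighted by gamma^(k-1).
            numer[j] += gamma ** (k - 1) * prob * b[i]
    p_gamma = sum(numer)  # denominator of equation (1)
    if p_gamma == 0.0:
        raise ValueError("observation o cannot occur from belief b")
    return [x / p_gamma for x in numer], p_gamma


# Toy model (assumed): from state 0 the macro-action either stops at once in
# state 0 (k = 1) or reaches state 1 after two steps; state 1 is absorbing.
p_toy = {(0, 0, "z", 1): 0.5, (0, 1, "z", 2): 0.5, (1, 1, "z", 1): 1.0}
b_next, p_gamma = reweighted_belief_update([0.5, 0.5], p_toy, 0.5, "z")
```

With these toy numbers the denominator is 0.875 and the next belief is (2/7, 5/7): the two-step path into state 1 is down-weighted by one extra factor of $\gamma$, which is exactly the "process has not stopped" interpretation given above.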
Theorem 2 The value function for an m-step policy is piecewise linear and convex and can be represented as

$$V_m(b) = \max_{\alpha \in \Gamma_m} \sum_{s \in S} \alpha(s)\, b(s), \tag{3}$$

where $\Gamma_m$ is a finite collection of $\alpha$-vectors.

As $V_m$ is convex and converges to $V^*$, $V^*$ is also convex.

¹ Proofs of the results in this section are included in the supplementary material.

3.2 Macro-action Construction

We would like to construct macro-actions from primitive actions of a POMDP in order to use temporal abstraction to help solve difficult POMDP problems. A partially observable Markov decision process (POMDP) is defined by a finite state space $S$, a finite action space $A$, a reward function $R(s, a)$, an observation space $O$, and a discount $\gamma \in (0, 1)$.

In our POSMDP, the probability function $p(s_j, o, k \mid s_i, a)$ for a macro-action must be independent of the history given the current state $s_i$; hence the selection of primitive actions and termination conditions within the macro-action cannot depend on the belief. We examine some allowable dependencies here. Due to partial observability, it is often not possible to allow the primitive action and the termination condition to be functions of the initial state. Dependence on the portion of history that occurs after the macro-action has started is, however, allowed. In some POMDPs, a subset of the state variables are always observed and can be used to decide the next action. In fact, we may sometimes explicitly construct observed variables to remember relevant parts of the history prior to the start of the macro-action (see Section 5); these can be considered as parameters that are passed on to the macro-action. Hence, one way to construct the next action in a macro-action is to make it a function of the history since the macro-action started, $x_k, a_k, o_{k+1}, \ldots, x_{t-1}, a_{t-1}, o_t, x_t$, where $x_i$ is the fully observable subset of state variables at time $i$, and $k$ is the starting time of the macro-action. Similarly, when the termination criterion and the observation function of the macro-action depend only on the history $x_k, a_k, o_{k+1}, \ldots, x_{t-1}, a_{t-1}, o_t, x_t$, the macro-action can retain a transition function that is independent of the history given the initial state.

Note that the observation to be passed on to the POSMDP to create the POSMDP observation space, $O$, is part of the design tradeoff: usually it is desirable to reduce the number of observations in order to reduce complexity without degrading the value of the POSMDP too much. In particular, we may not wish to include the execution length of the macro-action if it does not contribute much towards obtaining a good policy.

4 Monte Carlo Value Iteration with Macro-Actions

We have shown that if the action space $A$ and the observation space $O$ of a POSMDP are discrete, then the optimal value function $V^*$ can be approximated arbitrarily closely by a piecewise-linear, convex function. Unfortunately, when $S$ is very high-dimensional (or continuous), a vector representation is no longer effective. In this section, we show how the Monte Carlo Value Iteration (MCVI) algorithm [2], which has been designed for POMDPs with very large or infinite state spaces, can be extended to POSMDPs.

Instead of $\alpha$-vectors, MCVI uses an alternative policy representation called a policy graph $G$. A policy graph is a directed graph with labeled nodes and edges.
Each node of $G$ is labeled with a macro-action $a$ and each edge of $G$ is labeled with an observation $o$. To execute a policy $\pi_G$, it is treated as a finite state controller whose states are the nodes of $G$. Given an initial belief $b$, a starting node $v$ of $G$ is selected and its associated macro-action $a_v$ is performed. The controller then transitions from $v$ to a new node $v'$ by following the edge $(v, v')$ labeled with the observation received, $o$. The process then repeats with the new controller node $v'$.

Let $\pi_{G,v}$ denote a policy represented by $G$, when the controller always starts in node $v$ of $G$. We define the value $\alpha_v(s)$ to be the expected total reward of executing $\pi_{G,v}$ with initial state $s$. Hence

$$V_G(b) = \max_{v \in G} \sum_{s \in S} \alpha_v(s)\, b(s). \tag{4}$$

$V_G$ is completely determined by the $\alpha$-functions associated with the nodes of $G$.

4.1 MC-Backup

One way to approximate the value function is to repeatedly run the backup operator $H$ starting from an arbitrary value function until it is close to convergence. This algorithm is called value iteration (VI). Value iteration can be carried out on policy graphs as well, as a policy graph provides an implicit representation of a value function. Let $V_G$ be the value function for a policy graph $G$. Substituting (4) into (2), we get

$$HV_G(b) = \max_{a \in A} \Big\{ \sum_{s \in S} R(s, a)\, b(s) + \gamma \sum_{o \in O} p_\gamma(o \mid a, b) \max_{v \in G} \sum_{s \in S} \alpha_v(s)\, b'(s) \Big\}, \tag{5}$$

where $b' = \tau(b, a, o)$. It is then possible to evaluate the right-hand side of (5) via sampling and Monte Carlo simulation at a belief $b$. The outcome is a new policy graph $G'$ with value function $\hat{H}_b V_G$. This is called MC-backup of $G$ at $b$ (Algorithm 1) [2].

There are $|A||G|^{|O|}$ possible ways to generate a new policy graph $G'$ that has one more node than the old policy graph. Algorithm 1 computes an estimate of the best new policy graph at $b$ using only $N|A||G|$ samples. Furthermore, we can show that MC-backup approximates the standard VI backup (equation (5)) well at $b$, with error decreasing at the rate $O(1/\sqrt{N})$. Let $R_{\max}$ be the largest absolute value of the reward, $r_t$, at any time step.
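The controller execution and equation (4) described above can be sketched in a few lines. This is only an illustrative sketch: the `Node` class, the `simulate` callback signature (returning the end state, the macro-observation, the discounted reward collected inside the macro-action, and its length `k`), and the truncation after `max_macro_steps` are assumptions of this example, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    macro_action: str
    # Maps each observation to the index of the next controller node.
    edges: dict = field(default_factory=dict)


def run_controller(graph, start, simulate, state, gamma, max_macro_steps=50):
    """Execute the policy graph as a finite state controller.

    simulate(state, macro_action) is a hypothetical black-box simulator that
    returns (next_state, observation, discounted_reward, k).
    Returns the discounted total reward, truncated after max_macro_steps.
    """
    total, discount, v = 0.0, 1.0, start
    for _ in range(max_macro_steps):
        node = graph[v]
        state, obs, reward, k = simulate(state, node.macro_action)
        total += discount * reward
        discount *= gamma ** k   # the macro-action consumed k time steps
        v = node.edges[obs]      # follow the edge labeled with the observation
    return total


def graph_value(alphas, b):
    """Equation (4): V_G(b) = max over nodes v of sum_s alpha_v(s) b(s)."""
    return max(sum(a_s * b_s for a_s, b_s in zip(alpha_v, b)) for alpha_v in alphas)


# Toy usage (assumed model): one macro-action that always takes 2 steps,
# yields reward 1, and emits observation "z"; a single self-looping node.
def toy_simulate(state, macro_action):
    return state, "z", 1.0, 2


toy_graph = [Node("go", {"z": 0})]
toy_return = run_controller(toy_graph, 0, toy_simulate, 0, 0.5, max_macro_steps=3)
toy_value = graph_value([[1.0, 0.0], [0.0, 2.0]], [0.5, 0.5])
```

Averaging `run_controller` over many simulated runs from a fixed start state gives a Monte Carlo estimate of $\alpha_v(s)$, which is how a policy graph can stand in for explicit $\alpha$-vectors when the state space is too large to enumerate.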

Algorithm 1 MC-Backup of a policy graph $G$ at a belief $b \in \mathcal{B}$ with $N$ samples.

MC-BACKUP($G$, $b$, $N$)
  For each action $a \in A$, $R_a \leftarrow 0$.
  For each action $a \in A$, each observation $o \in O$, and each node $v \in G$, $V_{a,o,v} \leftarrow 0$.
  for each action $a \in A$ do
    for $i = 1$ to $N$ do
      Sample a state $s_i$ with probability $b(s_i)$.
      Simulate taking macro-action $a$ in state $s_i$. Generate a new state $s'_i$, observation $o_i$, and discounted reward $R'(s_i, a)$ by sampling from $p(s_j, o, k \mid s_i, a)$.
      $R_a \leftarrow R_a + R'(s_i, a)$.
      for each node $v \in G$ do
        Set $V'$ to be the expected total reward of simulating the policy represented by $G$, with initial controller state $v$ and initial state $s'_i$.
        $V_{a,o_i,v} \leftarrow V_{a,o_i,v} + V'$.
    for each observation $o \in O$ do
      $V_{a,o} \leftarrow \max_{v \in G} V_{a,o,v}$.
      $v_{a,o} \leftarrow \arg\max_{v \in G} V_{a,o,v}$.
    $V_a \leftarrow (R_a + \gamma \sum_{o \in O} V_{a,o})/N$.
  $V^* \leftarrow \max_{a \in A} V_a$.
  $a^* \leftarrow \arg\max_{a \in A} V_a$.
  Create a new policy graph $G'$ by adding a new node $u$ to $G$. Label $u$ with $a^*$. For each $o \in O$, add the edge $(u, v_{a^*,o})$ and label it with $o$.
  return $G'$.

Theorem 3 Given a policy graph $G$ and a point $b \in \mathcal{B}$, MC-BACKUP($G$, $b$, $N$) produces an improved policy graph such that

$$\big| \hat{H}_b V_G(b) - HV_G(b) \big| \le \frac{2 R_{\max}}{1 - \gamma} \sqrt{\frac{2 \big( |O| \ln |G| + \ln(2|A|) + \ln(1/\delta) \big)}{N}}$$

with probability at least $1 - \delta$.

The proof uses the Hoeffding bound together with the union bound. Details can be found in [2].

MC-backup can be combined with point-based POMDP planning, which samples the belief space $\mathcal{B}$. Point-based POMDP algorithms use a set $B$ of points sampled from $\mathcal{B}$ as an approximate representation of $\mathcal{B}$. In contrast to the standard VI backup operator $H$, which performs backup at every point in $\mathcal{B}$, the operator $\hat{H}_B$ applies MC-BACKUP($G_m$, $b$, $N$) on a policy graph $G_m$ at every point in $B$. This results in $|B|$ new policy graph nodes. $\hat{H}_B$ then produces a new policy graph $G_{m+1}$ by adding the new policy graph nodes to the previous policy graph $G_m$.

Let $\delta_B = \sup_{b \in \mathcal{B}} \min_{b' \in B} \|b - b'\|_1$ be the maximum $L_1$ distance from any point in $\mathcal{B}$ to the closest point in $B$. Let $V_0$ be the value function for some initial policy graph and $V_{m+1} = \hat{H}_B V_m$.
The theorem below bounds the approximation error between $V_m$ and the optimal value function $V^*$.

Theorem 4 For every $b \in \mathcal{B}$,

$$|V^*(b) - V_m(b)| \le \frac{2 R_{\max}}{(1 - \gamma)^2} \sqrt{\frac{2 \big( |O| \ln(|B| m) + \ln(2|A|) + \ln(|B| m / \delta) \big)}{N}} + \frac{2 R_{\max}}{(1 - \gamma)^2}\, \delta_B + \frac{2 \gamma^m R_{\max}}{1 - \gamma},$$

with probability at least $1 - \delta$.

The proof requires the contraction property and a Lipschitz property that can be derived from the piecewise linearity of the value function. Having established those results in Section 3.1, the rest of the proof follows from the proof in [2]. The first term in the bound in Theorem 4 comes from Theorem 3, showing that the error from sampling decays at the rate $O(1/\sqrt{N})$ and can be reduced by taking a large enough sample size. The second term depends on how well the set $B$ covers $\mathcal{B}$ and can be reduced by sampling a larger number of beliefs. The last term depends on the number of MC-backup iterations and decays exponentially with $m$.
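Returning to Algorithm 1, the MC-backup step can be sketched as follows for a generic black-box simulator. The names `mc_backup`, `rollout`, `sample_state`, and `simulate`, the dictionary-based graph layout, and the fixed rollout depth are assumptions of this sketch. It also makes one modeling choice explicit: each sampled continuation value is weighted by $\gamma^k$ for the sampled macro-action length $k$, which is what combining equations (1) and (5) suggests; Algorithm 1 as printed states the discount more compactly.

```python
import random


def rollout(graph, v, state, simulate, gamma, rng, depth=20):
    """Discounted return of running policy graph `graph` from node v and `state`."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        node = graph[v]
        state, obs, reward, k = simulate(state, node["action"], rng)
        total += discount * reward
        discount *= gamma ** k
        v = node["edges"][obs]
    return total


def mc_backup(graph, sample_state, simulate, actions, observations, gamma, n, rng):
    """One MC-backup at the belief represented by `sample_state` (a sketch).

    sample_state(rng) draws a state from the belief b; simulate(s, a, rng)
    returns (next_state, observation, discounted_reward, k). Both are
    hypothetical callbacks supplied by the user of this sketch.
    Returns (new_graph, estimated value at b).
    """
    best = None
    for a in actions:
        r_sum = 0.0
        v_sum = {(o, v): 0.0 for o in observations for v in range(len(graph))}
        for _ in range(n):
            s = sample_state(rng)
            s_next, obs, reward, k = simulate(s, a, rng)
            r_sum += reward
            for v in range(len(graph)):
                v_sum[(obs, v)] += gamma ** k * rollout(graph, v, s_next, simulate, gamma, rng)
        # For each observation, keep the best successor node and its value.
        edges, continuation = {}, 0.0
        for o in observations:
            best_v = max(range(len(graph)), key=lambda v: v_sum[(o, v)])
            edges[o] = best_v
            continuation += v_sum[(o, best_v)]
        value = (r_sum + continuation) / n
        if best is None or value > best[0]:
            best = (value, a, edges)
    value, a, edges = best
    return graph + [{"action": a, "edges": edges}], value


# Toy usage (assumed model): one state, "good" pays 1 and "bad" pays 0,
# both macro-actions last a single step and always emit observation "z".
def toy_simulate(state, action, rng):
    return state, "z", (1.0 if action == "good" else 0.0), 1


old_graph = [{"action": "bad", "edges": {"z": 0}}]
new_graph, est = mc_backup(old_graph, lambda rng: 0, toy_simulate,
                           ["bad", "good"], ["z"], 0.9, 5, random.Random(0))
```

In the toy run the backup adds a node labeled "good" whose only edge leads back to the old "bad" node, so the estimated value at the belief is 1. No transition or observation matrices are stored at any point; only the simulator is called, which mirrors the property emphasized in the text.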

Figure 1: (a) Underwater Navigation: A reduced map with a grid is shown with "S" marking the possible initial positions, "D" marking the destinations, "R" marking the rocks and "O" marking the locations where the robot can localize completely. (b) Collaborative search and capture: Two robotic agents catching 12 escaped crocodiles in a grid. (c) Vehicular ad-hoc networking: A UAV maintains an ad-hoc network over four ground vehicles in a grid with "B" marking the base and "D" the destinations.

4.2 Algorithm

Theorem 4 bounds the performance of the algorithm when given a set of beliefs. Macro-MCVI, like MCVI, samples beliefs incrementally in practice and performs backup at the sampled beliefs. Branch and bound is used to avoid sampling unimportant parts of the belief space. See [2] for details.

The other important component in a practical algorithm is the generation of the next belief; Macro-MCVI uses a particle filter for that. Given the macro-action construction as described in Section 3.2, a simple particle filter is easily implemented to approximate the next belief function in equation (1): sample a set of states from the current belief; from each sampled state, simulate the current macro-action until termination, keeping track of its path length, $t$; if the observation at termination matches the desired observation, keep the particle; the set of particles that are kept are weighted by $\gamma^t$ and then renormalized to form the next belief.² Similarly, MC-backup is performed by simply running simulations of the macro-actions; there is no need to store additional transition and observation matrices, allowing the method to run for very large state spaces.

5 Experiments

We now illustrate the use of macro-actions for temporal abstraction in three POMDPs of varying complexity. Their state spaces range from relatively small to very large. Correspondingly, the macro-actions range from relatively simple ones to much more complex ones forming a hierarchy.

Underwater Navigation: The underwater navigation task was introduced in [9].
In this task, an autonomous underwater vehicle (AUV) navigates in an environment modeled as a 51 x 52 grid map. The AUV needs to move from the left border to the right border while avoiding the rocks scattered near its destination. The AUV has six actions: move north, move south, move east, move north-east, move south-east or stay in the same location. Due to poor visibility, the AUV can only localize itself along the top or bottom borders where there are beacon signals.

This problem has several interesting characteristics. First, the relatively small state space size of 2653 means that solvers that use $\alpha$-vectors, such as SARSOP [9], can be used. Second, the dynamics of the robot are noiseless, hence the main difficulty is localization from the robot's initially unknown location.

We use 5 macro-actions that move in a direction (north, south, east, north-east, or south-east) until either a beacon signal or the destination is reached. We also define an additional macro-action that navigates to the nearest goal location if the AUV position is known, or simply stays in the same location if the AUV position is not known. To enable proper behaviour of the last macro-action, we augment the state space with a fully observable state variable that indicates the current AUV location. The variable is initialized to a value denoting "unknown" but takes the value of the current AUV location after the beacon signal is received. This gives a simple example where the original state space is augmented with a fully observable state variable to allow more sophisticated macro-action behaviour.

² More sophisticated approximation of the belief can be constructed but may require more knowledge of the underlying POMDP and more computation.
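A "move in one direction until a beacon is reached" macro-action, together with the simple particle filter from Section 4.2, can be sketched as below. Everything concrete here is hypothetical and chosen only for illustration: the beacon is assumed to lie on row 0, a state is a `(row, col)` pair, the dynamics are noiseless, and the observation is the bare string `"beacon"` rather than a position.

```python
from collections import defaultdict


def move_north_until_beacon(state):
    """Closed-loop macro-action: step north until the beacon row (row 0).

    Returns (end_state, observation, t), where t is the number of primitive
    steps taken. At least one step is always taken.
    """
    row, col = state
    t = 0
    while True:
        row = max(row - 1, 0)  # one primitive "move north" step
        t += 1
        if row == 0:           # termination condition: beacon signal received
            return (row, col), "beacon", t


def particle_filter_step(particles, macro_action, observation, gamma):
    """Approximate equation (1) by simulation.

    Each particle is simulated until the macro-action terminates; particles
    whose observation matches are kept, weighted by gamma^t, and renormalized.
    Returns a dict mapping end states to probabilities.
    """
    weights = defaultdict(float)
    for state in particles:
        end_state, obs, t = macro_action(state)
        if obs == observation:
            weights[end_state] += gamma ** t
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no particle is consistent with the observation")
    return {s: w / total for s, w in weights.items()}


# Two particles at different depths and columns (toy values).
next_belief = particle_filter_step([(2, 0), (4, 1)], move_north_until_beacon,
                                   "beacon", 0.5)
```

In this toy run the particle that needed two steps receives weight $0.5^2$ and the one that needed four steps receives $0.5^4$, giving probabilities 0.8 and 0.2 after renormalization. Weighting by $\gamma^t$ rather than $\gamma^{t-1}$ differs from equation (1) only by a constant factor, which the renormalization removes.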

Collaborative Search and Capture: In this problem, a group of crocodiles has escaped from its enclosure into the environment and two robotic agents have to collaborate to hunt down and capture the crocodiles (see Figure 1). Both agents are centrally controlled and each agent can make a one-step move in one of the four directions (north, south, east and west) or stay still at each time instance. There are twelve crocodiles in the environment. At every time instance, each crocodile moves to a location furthest from the agent that is nearest to it with probability $1 - p$ ($p = 0.05$ in the experiments). With probability $p$, the crocodile moves randomly. A crocodile is captured when it is at the same location as an agent. The agents do not know the exact location of the crocodiles, but each agent knows the number of crocodiles in the top left, top right, bottom left and bottom right quadrants around itself from the noise made by the crocodiles. Each captured crocodile gives a reward of 10, while movement is free.

We define twenty-five macro-actions in which each agent moves (north, south, east, west, or stay) along a passageway until one of them reaches an intersection. In addition, each macro-action returns only the observation made at the point when it terminates, reducing the complexity of the problem, possibly at a cost of some sub-optimality. In this problem, the macro-actions are simple, but the state space is extremely large (approximately ).

Vehicular Ad-hoc Network: In a post-disaster search and rescue scenario, a group of rescue vehicles are deployed for operation work in an area where communication infrastructure has been destroyed. The rescue units need a high-bandwidth network to relay images of ground situations. An Unmanned Aerial Vehicle (UAV) can be deployed to maintain WiFi network communication between the ground units. The UAV needs to visit each vehicle as often as possible to pick up and deliver data packets [13].

In this task, 4 rescue vehicles and 1 UAV navigate in a terrain modeled as a 10 x 10 grid map.
There are obstacles on the terrain that are impassable to ground vehicles but passable to the UAV. The UAV can move in one of the four directions (north, south, east, and west) or stay in the same location at every time step. The vehicles set off from the same base and move along some predefined path towards their pre-assigned destinations where they will start their operations, randomly stopping along the way. Upon reaching its destination, a vehicle may roam around the environment randomly while carrying out its mission. The UAV knows its own location on the map and can observe the location of a vehicle if they are in the same grid square. To elicit a policy with low network latency, there is a penalty of 0.1 × the number of time steps since the last visit of a vehicle, for each time step and for each vehicle. There is a reward of 10 each time a vehicle is visited by the UAV. The state space consists of the vehicles' locations, the UAV location in the grid map and the number of time steps since each vehicle was last seen (for computing the reward).

We abstract the movements of the UAV to search for and visit a single vehicle as macro-actions. There are two kinds of search macro-actions for each vehicle: search for a vehicle along its predefined path and search for a vehicle that has started to roam randomly. To enable the macro-actions to work effectively, the state space is also augmented with the previously seen location of each vehicle. Each macro-action is in turn hierarchically constructed by solving the simplified POMDP task of searching for a single vehicle on the same map using basic actions and some simple macro-actions that move along the paths. This problem has both complex hierarchically constructed macro-actions and a very large state space.

5.1 Experimental setup

We applied Macro-MCVI to the above tasks and compared its performance with the original MCVI algorithm. We also compared with a state-of-the-art off-line POMDP solver, SARSOP [9], on the underwater navigation task. SARSOP could not run on the other two tasks, due to their large state space sizes. For each task, we ran Macro-MCVI until the average total reward stabilized.
We then ran the competing algorithms for at least the same amount of time. The exact running times are difficult to control because of our implementation limitations. To confirm the comparison results, we also ran the competing algorithms 100 times longer when possible. All experiments were conducted on a 16-core Intel Xeon 2.4 GHz computer server.

Neither MCVI nor SARSOP uses macro-actions. We are not aware of other efficient off-line macro-action POMDP solvers that have been demonstrated on very large state space problems. Some online search algorithms, such as PUMA [7], use macro-actions and have shown strong results. Online search algorithms do not generate a policy, making a fair comparison difficult. Despite that, they are useful as baseline references; we implement a variant of PUMA as one such reference. In our experiments, we simply gave the online search algorithms as much or more time than Macro-MCVI and report the results here. PUMA uses open-loop macro-actions. As a baseline reference for online solvers with closed-loop macro-actions, we also created an online search variant of Macro-MCVI by removing the MC-backup component. We refer to this variant as Online-Macro. It is similar to other recent online POMDP algorithms [12], but uses the same closed-loop macro-actions as Macro-MCVI does.

5.2 Results

The performance of the different algorithms is shown in Figure 2 with 95% confidence intervals.

The underwater navigation task consists of two phases: the localization phase and the navigate-to-goal phase. Macro-MCVI's policy takes one macro-action, moving north-east until reaching the border, to localize and another macro-action, navigating to the goal, to reach the goal. In contrast, both MCVI and SARSOP fail to match the performance of Macro-MCVI even when they are run 100 times longer. Online-Macro does well, as the planning horizon is short with the use of macro-actions. PUMA, however, does not do as well, as it uses the less powerful open-loop macro-actions, which move in the same direction for a fixed number of time steps.

Figure 2: Performance comparison. Columns: Reward, Time (s). Rows: Underwater Navigation (Macro-MCVI, MCVI, SARSOP, PUMA, Online-Macro); Collaborative Search & Capture (Macro-MCVI, MCVI, PUMA with reward 1.04, Online-Macro); Vehicular Ad-Hoc Network (Macro-MCVI, MCVI, Greedy).

For the collaborative search & capture task, MCVI fails to match the performance of Macro-MCVI even when it is run for 100 times longer. PUMA and Online-Macro do badly as they fail to search deep enough and do not have the benefit of reusing sub-policies obtained from the backup operation. To confirm that it is the backup operation and not the shorter per-macro-action time that is responsible for the performance difference, we ran Online-Macro for a much longer time and found the result unchanged.
The vehicular ad-hoc network task was solved hierarchically in two stages. We first used Macro-MCVI to solve for the policy that finds a single vehicle. This stage took roughly 8 hours of computation time. We then used the single-vehicle policy as a macro-action and solved for the higher-level policy that plans over the macro-actions. Although it took substantial computation time, Macro-MCVI generated a reasonable policy in the end. In contrast, MCVI, without macro-actions, fails badly for this task. Due to the long running time involved, we did not run MCVI 100 times longer. To confirm that the policy computed by Macro-MCVI at the higher level of the hierarchy is also effective, we manually crafted a greedy policy over the single-vehicle macro-actions. This greedy policy always searches for the vehicle that has not been visited for the longest duration. The experimental results indicate that the higher-level policy computed by Macro-MCVI is more effective than the greedy policy. We did not apply online algorithms to this task, as we are not aware of any simple way to hierarchically construct macro-actions online.

6 Conclusions

We have successfully extended MCVI, an algorithm for solving very large state space POMDPs, to include macro-actions. This allows MCVI to use temporal abstraction to help solve difficult POMDP problems. The method inherits the good theoretical properties of MCVI and is easy to apply in practice. Experiments show that it can substantially improve the performance of MCVI when used with appropriately chosen macro-actions.

Acknowledgement

We thank Tomás Lozano-Pérez and Leslie Kaelbling from MIT for many insightful discussions. This work is supported in part by MoE grant MOE2010-T and MDA GAMBIT grant R

References

[1] H. Bai, D. Hsu, M.J. Kochenderfer, and W. S. Lee. Unmanned aircraft collision avoidance using continuous-state POMDPs. In Proc. Robotics: Science & Systems.
[2] H. Bai, D. Hsu, W. S. Lee, and V. Ngo. Monte Carlo Value Iteration for Continuous-State POMDPs. In Algorithmic Foundations of Robotics IX, Proc. Int. Workshop on the Algorithmic Foundations of Robotics (WAFR). Springer.
[3] Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13, 2003.
[4] T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artificial Intelligence Research, 13.
[5] E. Hansen and R. Zhou. Synthesis of hierarchical finite-state controllers for POMDPs. In Proc. Int. Conf. on Automated Planning and Scheduling.
[6] M. Hauskrecht, N. Meuleau, L.P. Kaelbling, T. Dean, and C. Boutilier. Hierarchical solution of Markov decision processes using macro-actions. In Proc. Conf. on Uncertainty in Artificial Intelligence. Citeseer.
[7] R. He, E. Brunskill, and N. Roy. PUMA: Planning under uncertainty with macro-actions. In Proc. AAAI Conf. on Artificial Intelligence.
[8] H. Kurniawati, Y. Du, D. Hsu, and W. S. Lee. Motion planning under uncertainty for robotic tasks with long time horizons. Int. J. Robotics Research, 30(3).
[9] H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science & Systems.
[10] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Int. Jnt. Conf. on Artificial Intelligence, volume 18.
[11] J. Pineau, N. Roy, and S. Thrun. A hierarchical approach to POMDP planning and execution. In Workshop on Hierarchy & Memory in Reinforcement Learning (ICML), volume 156.
[12] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(1).
[13] A. Sivakumar and C.K.Y. Tan. UAV swarm coordination using cooperative control for establishing a wireless communications backbone. In Proc. Int. Conf. on Autonomous Agents & Multiagent Systems.
[14] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Proc. Conf. on Uncertainty in Artificial Intelligence. AUAI Press.
[15] R.S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1).
[16] G. Theocharous and L. P. Kaelbling. Approximate planning in POMDPs with macro-actions. Advances in Neural Information Processing Systems, 17.
[17] M. Toussaint, L. Charlin, and P. Poupart. Hierarchical POMDP controller optimization by likelihood maximization. Proc. Conf. on Uncertainty in Artificial Intelligence.
[18] C.C. White. Procedures for the solution of a finite-horizon, partially observed, semi-Markov optimization problem. Operations Research, 24(2).

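Supplementary Material for Monte Carlo Value Iteration with Macro-Actions

Lemma 1 Given value functions $U$ and $V$, $\|HU - HV\|_\infty \le \|U - V\|_\infty$.

Proof. Let $b$ be an arbitrary belief and assume that $HV(b) \le HU(b)$ holds. Let $a^*$ be the optimal macro-action for $HU(b)$. Then

$$
\begin{aligned}
0 \le HU(b) - HV(b)
&\le R(b, a^*) + \sum_{o \in O} p_\gamma(o \mid a^*, b)\, U(\tau(b, o, a^*)) - R(b, a^*) - \sum_{o \in O} p_\gamma(o \mid a^*, b)\, V(\tau(b, o, a^*)) \\
&= \sum_{o \in O} p_\gamma(o \mid a^*, b)\, \bigl[ U(\tau(b, o, a^*)) - V(\tau(b, o, a^*)) \bigr] \\
&\le \sum_{o \in O} p_\gamma(o \mid a^*, b)\, \|U - V\|_\infty \\
&\le \|U - V\|_\infty .
\end{aligned}
$$

Since $\|\cdot\|_\infty$ is symmetrical, the result is the same for the case of $HU(b) \le HV(b)$. By taking $\|\cdot\|_\infty$ over all weighted beliefs, we get $\|HU - HV\|_\infty \le \|U - V\|_\infty$. Thus, $H$ is a contractive mapping. $\square$

Theorem 2 The value function for an $m$-step policy is piecewise linear and convex and can be represented as

$$
V_m(b) = \max_{\alpha \in \Gamma_m} \sum_{s \in S} \alpha(s)\, b(s) \qquad (1)
$$

where $\Gamma_m$ is a finite collection of $\alpha$-vectors.

Proof. We prove this property by induction. When $m = 1$, the initial value function $V_1$ is the best expected reward and can be written as

$$
V_1(b) = \max_{a} R(b, a) = \max_{a} \sum_{s \in S} R(s, a)\, b(s).
$$

This has the same form as $V_m(b) = \max_{\alpha_m \in \Gamma_m} \sum_{s \in S} \alpha_m(s)\, b(s)$, where there is one linear $\alpha$-vector for each macro-action. $V_1(b)$ can therefore be represented as a finite collection of $\alpha$-vectors.

Assume that the optimal value function for any $b_{i-1}$ is represented using a finite set of $\alpha$-vectors $\Gamma_{i-1} = \{\alpha^0_{i-1}, \alpha^1_{i-1}, \ldots\}$ and

$$
V_{i-1}(b_{i-1}) = \max_{\alpha_{i-1} \in \Gamma_{i-1}} \sum_{s \in S} b_{i-1}(s)\, \alpha_{i-1}(s). \qquad (2)
$$

Substituting

$$
b_{i-1}(s) = \frac{\sum_{j=1}^{\infty} \gamma^{j-1} \sum_{s' \in S} p(s, o, j \mid s', a)\, b_i(s')}{p_\gamma(o \mid a, b_i)}
$$

into (2), we get

$$
V_{i-1}(b_{i-1}) = \max_{\alpha_{i-1} \in \Gamma_{i-1}} \sum_{s \in S} \frac{\sum_{j=1}^{\infty} \gamma^{j-1} \sum_{s' \in S} p(s, o, j \mid s', a)\, b_i(s')}{p_\gamma(o \mid a, b_i)}\, \alpha_{i-1}(s).
$$

Substituting it into the backup equation gives

$$
\begin{aligned}
V_i(b_i)
&= \max_{a} \Bigl\{ R(b_i, a) + \sum_{o \in O} p_\gamma(o \mid a, b_i) \max_{\alpha_{i-1} \in \Gamma_{i-1}} \sum_{s \in S} \frac{\sum_{j=1}^{\infty} \gamma^{j-1} \sum_{s' \in S} p(s, o, j \mid s', a)\, b_i(s')}{p_\gamma(o \mid a, b_i)}\, \alpha_{i-1}(s) \Bigr\} \\
&= \max_{a} \Bigl\{ R(b_i, a) + \sum_{o \in O} \max_{\alpha_{i-1} \in \Gamma_{i-1}} \sum_{s \in S} \sum_{j=1}^{\infty} \gamma^{j-1} \sum_{s' \in S} p(s, o, j \mid s', a)\, b_i(s')\, \alpha_{i-1}(s) \Bigr\} \\
&= \max_{a}\; \max_{\alpha^1_{i-1} \in \Gamma_{i-1}, \ldots, \alpha^{|O|}_{i-1} \in \Gamma_{i-1}} \sum_{s' \in S} b_i(s') \Bigl[ R(s', a) + \sum_{o \in O} \sum_{j=1}^{\infty} \sum_{s \in S} \gamma^{j-1}\, p(s, o, j \mid s', a)\, \alpha^{o}_{i-1}(s) \Bigr].
\end{aligned}
$$

The expression in the square bracket can evaluate to $|A|\,|\Gamma_{i-1}|^{|O|}$ different vectors. We can rewrite $V_i(b_i)$ as

$$
V_i(b_i) = \max_{\alpha_i \in \Gamma_i} \sum_{s \in S} \alpha_i(s)\, b_i(s).
$$

Hence $V_i(b_i)$ can be represented by a finite set of $\alpha$-vectors. $\square$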
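The two results above can be checked numerically on a toy problem. The Python sketch below is an illustration only, not code from the paper. It assumes a small discrete model with macro-action durations truncated to `nJ` steps, and all names (`P`, `M`, `backup_vectors`, `backup_direct`) are hypothetical. It enumerates the α-vectors from the proof of Theorem 2 and compares the resulting value with a direct evaluation of the backup. It also checks the bound of Lemma 1 for a pair of value functions that differ by a constant.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nO, nJ = 3, 2, 2, 3      # toy sizes (assumed); nJ truncates the duration j
gamma = 0.9

# P[a, sp, s, o, j] stands for p(s, o, j+1 | sp, a), normalized over (s, o, j).
P = rng.random((nA, nS, nS, nO, nJ))
P /= P.sum(axis=(2, 3, 4), keepdims=True)
R = rng.random((nS, nA))         # R[sp, a] stands for R(s', a)

# M[a, sp, s, o] = sum_j gamma^(j-1) * p(s, o, j | sp, a)
disc = gamma ** np.arange(nJ)
M = np.einsum("apsoj,j->apso", P, disc)


def value(Gamma, b):
    """V(b) = max over alpha-vectors of sum_s alpha(s) b(s)."""
    return max(float(alpha @ b) for alpha in Gamma)


def backup_vectors(Gamma):
    """Enumerate the |A| * |Gamma|^|O| vectors from the square bracket."""
    new = []
    for a in range(nA):
        for choice in itertools.product(Gamma, repeat=nO):
            vec = R[:, a].copy()
            for o, alpha in enumerate(choice):
                vec += M[a, :, :, o] @ alpha   # sum over s, indexed by sp
            new.append(vec)
    return new


def backup_direct(Gamma, b):
    """HV(b) = max_a { R(b,a) + sum_o p_gamma(o|a,b) V(tau(b,o,a)) }."""
    best = -np.inf
    for a in range(nA):
        total = float(b @ R[:, a])
        for o in range(nO):
            w = b @ M[a, :, :, o]              # unnormalized next belief
            p_gamma = w.sum()
            total += p_gamma * value(Gamma, w / p_gamma)
        best = max(best, total)
    return best


Gamma1 = [R[:, a] for a in range(nA)]          # one alpha-vector per macro-action
Gamma2 = backup_vectors(Gamma1)

b = rng.dirichlet(np.ones(nS))
print(len(Gamma2), value(Gamma2, b), backup_direct(Gamma1, b))
```

Adding a constant `c` to every vector in `Gamma1` gives a value function at sup-norm distance exactly `c` from the original, because beliefs sum to one. Under the assumptions of this sketch, the backed-up values at any belief should then differ by at most `c`, which is what Lemma 1 states.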

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

CS 188: Artificial Intelligence Spring 2007

CS 188: Artificial Intelligence Spring 2007 CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment

More information

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral Improper Integrls Every time tht we hve evluted definite integrl such s f(x) dx, we hve mde two implicit ssumptions bout the integrl:. The intervl [, b] is finite, nd. f(x) is continuous on [, b]. If one

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004 Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz

More information

7.2 The Definite Integral

7.2 The Definite Integral 7.2 The Definite Integrl the definite integrl In the previous section, it ws found tht if function f is continuous nd nonnegtive, then the re under the grph of f on [, b] is given by F (b) F (), where

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as Improper Integrls Two different types of integrls cn qulify s improper. The first type of improper integrl (which we will refer to s Type I) involves evluting n integrl over n infinite region. In the grph

More information

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018 DATA620006 魏忠钰 Serch I Mrch 7 th, 2018 Outline Serch Problems Uninformed Serch Depth-First Serch Bredth-First Serch Uniform-Cost Serch Rel world tsk - Pc-mn Serch problems A serch problem consists of:

More information

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b.

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b. Mth 255 - Vector lculus II Notes 4.2 Pth nd Line Integrls We begin with discussion of pth integrls (the book clls them sclr line integrls). We will do this for function of two vribles, but these ides cn

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Uninformed Search Lecture 4

Uninformed Search Lecture 4 Lecture 4 Wht re common serch strtegies tht operte given only serch problem? How do they compre? 1 Agend A quick refresher DFS, BFS, ID-DFS, UCS Unifiction! 2 Serch Problem Formlism Defined vi the following

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 2013 Outline 1 Riemnn Sums 2 Riemnn Integrls 3 Properties

More information

MATH 144: Business Calculus Final Review

MATH 144: Business Calculus Final Review MATH 144: Business Clculus Finl Review 1 Skills 1. Clculte severl limits. 2. Find verticl nd horizontl symptotes for given rtionl function. 3. Clculte derivtive by definition. 4. Clculte severl derivtives

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh Lnguges nd Automt Finite Automt Informtics 2A: Lecture 3 John Longley School of Informtics University of Edinburgh jrl@inf.ed.c.uk 22 September 2017 1 / 30 Lnguges nd Automt 1 Lnguges nd Automt Wht is

More information

Review of basic calculus

Review of basic calculus Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below

More information

221B Lecture Notes WKB Method

221B Lecture Notes WKB Method Clssicl Limit B Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving

More information

13: Diffusion in 2 Energy Groups

13: Diffusion in 2 Energy Groups 3: Diffusion in Energy Groups B. Rouben McMster University Course EP 4D3/6D3 Nucler Rector Anlysis (Rector Physics) 5 Sept.-Dec. 5 September Contents We study the diffusion eqution in two energy groups

More information

Math 61CM - Solutions to homework 9

Math 61CM - Solutions to homework 9 Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

Jonathan Mugan. July 15, 2013

Jonathan Mugan. July 15, 2013 Jonthn Mugn July 15, 2013 Imgine rt in Skinner box. The rt cn see screen of imges, nd dot in the lower-right corner determines if there will be shock. Bottom-up methods my not find this dot, but top-down

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

CBE 291b - Computation And Optimization For Engineers

CBE 291b - Computation And Optimization For Engineers The University of Western Ontrio Fculty of Engineering Science Deprtment of Chemicl nd Biochemicl Engineering CBE 9b - Computtion And Optimiztion For Engineers Mtlb Project Introduction Prof. A. Jutn Jn

More information

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 UNIFORM CONVERGENCE Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 Suppose f n : Ω R or f n : Ω C is sequence of rel or complex functions, nd f n f s n in some sense. Furthermore,

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

A Fast and Reliable Policy Improvement Algorithm

A Fast and Reliable Policy Improvement Algorithm A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce

More information

4.4 Areas, Integrals and Antiderivatives

4.4 Areas, Integrals and Antiderivatives . res, integrls nd ntiderivtives 333. Ares, Integrls nd Antiderivtives This section explores properties of functions defined s res nd exmines some connections mong res, integrls nd ntiderivtives. In order

More information

The practical version

The practical version Roerto s Notes on Integrl Clculus Chpter 4: Definite integrls nd the FTC Section 7 The Fundmentl Theorem of Clculus: The prcticl version Wht you need to know lredy: The theoreticl version of the FTC. Wht

More information

Improper Integrals, and Differential Equations

Improper Integrals, and Differential Equations Improper Integrls, nd Differentil Equtions October 22, 204 5.3 Improper Integrls Previously, we discussed how integrls correspond to res. More specificlly, we sid tht for function f(x), the region creted

More information

8 Laplace s Method and Local Limit Theorems

8 Laplace s Method and Local Limit Theorems 8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

Chapters 4 & 5 Integrals & Applications

Chapters 4 & 5 Integrals & Applications Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Lecture 1. Functional series. Pointwise and uniform convergence.

Lecture 1. Functional series. Pointwise and uniform convergence. 1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

Math& 152 Section Integration by Parts

Math& 152 Section Integration by Parts Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Monte Carlo method in solving numerical integration and differential equation

Monte Carlo method in solving numerical integration and differential equation Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The

More information

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0) 1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this

More information

CS667 Lecture 6: Monte Carlo Integration 02/10/05

CS667 Lecture 6: Monte Carlo Integration 02/10/05 CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of

More information

LECTURE NOTE #12 PROF. ALAN YUILLE

LECTURE NOTE #12 PROF. ALAN YUILLE LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, K-mens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations

Fig. 1. Open-Loop and Closed-Loop Systems with Plant Variations ME 3600 Control ystems Chrcteristics of Open-Loop nd Closed-Loop ystems Importnt Control ystem Chrcteristics o ensitivity of system response to prmetric vritions cn be reduced o rnsient nd stedy-stte responses

More information

Lesson 1: Quadratic Equations

Lesson 1: Quadratic Equations Lesson 1: Qudrtic Equtions Qudrtic Eqution: The qudrtic eqution in form is. In this section, we will review 4 methods of qudrtic equtions, nd when it is most to use ech method. 1. 3.. 4. Method 1: Fctoring

More information

The steps of the hypothesis test

The steps of the hypothesis test ttisticl Methods I (EXT 7005) Pge 78 Mosquito species Time of dy A B C Mid morning 0.0088 5.4900 5.5000 Mid Afternoon.3400 0.0300 0.8700 Dusk 0.600 5.400 3.000 The Chi squre test sttistic is the sum of

More information

Lecture 19: Continuous Least Squares Approximation

Lecture 19: Continuous Least Squares Approximation Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for

More information

Chapter 5 : Continuous Random Variables

Chapter 5 : Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll

More information

Numerical Integration

Numerical Integration Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the

More information

Operations with Polynomials

Operations with Polynomials 38 Chpter P Prerequisites P.4 Opertions with Polynomils Wht you should lern: How to identify the leding coefficients nd degrees of polynomils How to dd nd subtrct polynomils How to multiply polynomils

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

APPROXIMATE INTEGRATION

APPROXIMATE INTEGRATION APPROXIMATE INTEGRATION. Introduction We hve seen tht there re functions whose nti-derivtives cnnot be expressed in closed form. For these resons ny definite integrl involving these integrnds cnnot be

More information

Bayesian Networks: Approximate Inference

Bayesian Networks: Approximate Inference pproches to inference yesin Networks: pproximte Inference xct inference Vrillimintion Join tree lgorithm pproximte inference Simplify the structure of the network to mkxct inferencfficient (vritionl methods,

More information

Quantum Physics II (8.05) Fall 2013 Assignment 2

Quantum Physics II (8.05) Fall 2013 Assignment 2 Quntum Physics II (8.05) Fll 2013 Assignment 2 Msschusetts Institute of Technology Physics Deprtment Due Fridy September 20, 2013 September 13, 2013 3:00 pm Suggested Reding Continued from lst week: 1.

More information

221A Lecture Notes WKB Method

221A Lecture Notes WKB Method A Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using ψ x, t = e

More information

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model:

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model: 1 2 MIXED MODELS (Sections 17.7 17.8) Exmple: Suppose tht in the fiber breking strength exmple, the four mchines used were the only ones of interest, but the interest ws over wide rnge of opertors, nd

More information

A. Limits - L Hopital s Rule ( ) How to find it: Try and find limits by traditional methods (plugging in). If you get 0 0 or!!, apply C.! 1 6 C.

A. Limits - L Hopital s Rule ( ) How to find it: Try and find limits by traditional methods (plugging in). If you get 0 0 or!!, apply C.! 1 6 C. A. Limits - L Hopitl s Rule Wht you re finding: L Hopitl s Rule is used to find limits of the form f ( x) lim where lim f x x! c g x ( ) = or lim f ( x) = limg( x) = ". ( ) x! c limg( x) = 0 x! c x! c

More information

Indefinite Integral. Chapter Integration - reverse of differentiation

Indefinite Integral. Chapter Integration - reverse of differentiation Chpter Indefinite Integrl Most of the mthemticl opertions hve inverse opertions. The inverse opertion of differentition is clled integrtion. For exmple, describing process t the given moment knowing the

More information

Line and Surface Integrals: An Intuitive Understanding

Line and Surface Integrals: An Intuitive Understanding Line nd Surfce Integrls: An Intuitive Understnding Joseph Breen Introduction Multivrible clculus is ll bout bstrcting the ides of differentition nd integrtion from the fmilir single vrible cse to tht of

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

Continuous Random Variables

Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht

More information

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees

Decision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize

More information

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007 A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

1 The Riemann Integral

1 The Riemann Integral The Riemnn Integrl. An exmple leding to the notion of integrl (res) We know how to find (i.e. define) the re of rectngle (bse height), tringle ( (sum of res of tringles). But how do we find/define n re

More information

( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that

( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that Arc Length of Curves in Three Dimensionl Spce If the vector function r(t) f(t) i + g(t) j + h(t) k trces out the curve C s t vries, we cn mesure distnces long C using formul nerly identicl to one tht we

More information
