Efficient Planning in R-max

Marek Grześ and Jesse Hoey
David R. Cheriton School of Computer Science, University of Waterloo
200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
{mgrzes, ...}

ABSTRACT

PAC-MDP algorithms are particularly efficient in terms of the number of samples obtained from the environment which are needed by the learning agent in order to achieve near optimal performance. These algorithms, however, execute a time consuming planning step after each new state-action pair becomes known to the agent, that is, after the pair has been sampled sufficiently many times to be considered known by the algorithm. This fact is a serious limitation on broader application of these kinds of algorithms. This paper examines the planning problem in PAC-MDP learning. Value iteration, prioritized sweeping, and backward value iteration are investigated. Through the exploitation of the specific nature of the planning problem in the considered reinforcement learning algorithms, we show how these planning algorithms can be improved. Our extensions yield significant improvements in all evaluated algorithms, and in standard value iteration in particular. Theoretical justification for all contributions is provided, and all approaches are further evaluated empirically. With our extensions, we managed to solve problems of sizes which have never been approached by PAC-MDP learning in the existing literature.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search

General Terms

Algorithms, Experimentation, Theory

Keywords

Reinforcement learning, Planning, MDP, Value Iteration

Cite as: Efficient Planning in R-max, Marek Grześ and Jesse Hoey, Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), Tumer, Yolum, Sonenberg and Stone (eds.), May 2-6, 2011, Taipei, Taiwan, pp. XXX-XXX. Copyright (c) 2011, International Foundation for Autonomous Agents and Multiagent Systems. All rights reserved.

1 Introduction

The key research challenge in the area of reinforcement learning (RL) is how to balance the exploration-exploitation trade-off. One of the best approaches to exploration in RL, which has good theoretical properties, is so-called PAC-MDP learning (PAC means Probably Approximately Correct). State-of-the-art examples of this idea are E^3 [9] and R-max [3]. PAC-MDP learning defines an exploration strategy which guarantees that with high probability the algorithm performs near optimally for all but a polynomial number of time steps (i.e., polynomial in the relevant parameters of the underlying process). This means that PAC-MDP algorithms are considerably efficient in terms of the number of samples which are needed during learning in order to achieve near optimal performance. These algorithms, however, execute a time consuming planning step after each new state-action pair becomes known to the agent, i.e., after the pair was sampled sufficiently many times to be considered known by the algorithm, and this is a serious limitation against broader application of these kinds of algorithms [21]. This paper examines the planning problem in PAC-MDP learning. A number of algorithms are investigated with regard to planning in PAC-MDP RL (this includes value iteration, prioritized sweeping, and backward value iteration), and the contributions of this paper can be summarized as follows: First, we show how the standard R-max algorithm can reduce the worst case number of planning steps from |S||A| to |S|. Second, exploiting the special nature of the planning problem in the considered RL algorithms, a new update operator is proposed which updates only the best action of each state until convergence within the given state. This approach yields a significant improvement in all evaluated algorithms, and in standard value iteration in particular.
Next, an extension is proposed to the prioritized sweeping algorithm which again exploits properties of the planning problem in PAC-MDP learning. Specifically, only policy predecessors of each state are added to the priority queue, in contrast to adding all predecessors as in the standard prioritized sweeping algorithm. Finally, we apply backward value iteration (BVI) to planning in R-max, and we show that the original algorithm from the literature [4] can fail on a broad class of MDPs. We show the problem, and after that our correction to the BVI algorithm is proposed for the general case. Then, our extensions to the corrected version of BVI, which are again specific to planning in PAC-MDP learning, are proposed. Theoretical justification for all contributions is provided, and all approaches are further evaluated empirically on two domains. Regardless of which particular PAC-MDP algorithm is considered, the time consuming planning step is required after a new state-action pair becomes known. This problem applies also to other model-based RL algorithms which are not PAC-MDP, such as the Bayesian Exploration Bonus algorithm [10]. Our work is to improve the planning step of these kinds of algorithms, and it applies to all existing flavours of PAC-MDP learning [16, 19]. In this paper, we focus on R-max, a popular example of PAC-MDP learning, and our work is equally applicable to other related model-based RL algorithms (including those which heuristically modify rewards [1]).

2 Background

The underlying mathematical model of the RL methodology is the Markov Decision Process (MDP). An MDP is defined as a tuple (S, A, T, R, γ), where S is the state space, A is the action space, T : S × A × S → [0, 1] is the transition function, R : S × A × S → ℝ is the reward function (which is assumed here to be bounded above by the value R_max), and 0 ≤ γ ≤ 1 is the discount factor which determines how the long-term reward is calculated from immediate rewards [15]. The problem of solving an MDP is to find a policy (i.e., a mapping from states to actions) which maximizes the accumulated reward. The Bellman equation defines optimality conditions when the environment dynamics (i.e., transition probabilities and the reward function) are known [2]. In such a case, the problem of finding the policy becomes a planning problem which can be solved using iterative approaches like policy and value iteration [2]. These algorithms take (S, A, T, R, γ) as input and return a policy which determines which action should be taken in each state so that the long-term reward is maximized. In algorithms which represent the policy via the value function, Q(s, a) reflects the expected long-term reward when action a is executed in state s, and V(s) = max_a Q(s, a). The policy and value iteration methods require access to an explicit, mathematical model of the environment, that is, the transition probabilities, T, and the reward function, R, of the controlled process. When such a model is not available, there is a need for algorithms which can learn from experience. Algorithms which learn the policy from simulation in the absence of the MDP model are known as reinforcement learning [18, 2]. The first major approach to RL is to estimate the missing model of the environment using, e.g., statistical techniques. Repeated simulation is used to approximate or average the model. Once such an estimation of the model is available, standard techniques for solving MDPs are again applicable. This approach is known as model-based RL [17]. This paper investigates a special type of model-based RL which is known as PAC-MDP learning. An alternative class of approaches to RL, which is not considered in this paper, does not attempt to estimate the model of the environment, and because of that is called model-free RL. Algorithms of this type directly estimate the value function or policy [13] from repeated simulation. Standard examples of this approach are the Q-learning and SARSA algorithms [18]. PAC-MDP learning is a particular approach to exploration in RL and is based on optimism in the face of uncertainty [9, 3]. As in standard model-based learning, in PAC-MDP model-based algorithms, the dynamics of the underlying MDP are estimated from data. If a certain state-action pair has been experienced enough times (parameter m controls this in R-max), then the estimated dynamics are close to the true values. The optimism under uncertainty plays a crucial role when dealing with state-action pairs which have not been experienced m times. For such pairs, the algorithm assumes the highest possible value of their Q-values. State-action pairs for which n(s, a) < m are named unknown, and known when n(s, a) ≥ m, where n(s, a) is the number of times the state-action pair was experienced. When a new state-action pair becomes known, the existing approximation, M̂, of the true model, M, is used to compute the corresponding optimal policy for M̂ which, when executed, will encourage the algorithm to try unknown actions and learn their dynamics. Such an exploration strategy guarantees that with high probability the algorithm performs near optimally for all but a polynomial number of steps (i.e., polynomial in the relevant parameters of the underlying MDP). The prototypical R-max algorithm uses the standard Bellman backup (see Algorithm 1) and value iteration to compute the policy, π̂, for the model M̂, where the policy π̂(s) is defined in Equation 1:

π̂(s) = arg max_a Q̂(s, a)   (1)

Summarizing, the R-max algorithm works as follows: it acts greedily according to the current V̂.

Algorithm 1 Backup(s): Bellman backup for state s
  old_val ← V̂(s)
  V̂(s) = max_a Q̂(s, a) = max_a [ R̂(s, a) + γ Σ_{s'} T̂(s, a, s') V̂(s') ]
  return |old_val − V̂(s)|
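For concreteness, the following is a minimal Python sketch of Algorithm 1 and of the value-iteration loop built on it. The dictionary-based model representation (T[s][a] as a list of (next state, probability) pairs, and R and Q keyed by (s, a)) is an assumption made here for illustration, not the paper's C++ implementation.

# Sketch of Algorithm 1 (Bellman backup) and a value-iteration loop.
# Assumed representation: T[s][a] is a list of (s2, prob) pairs, R[(s, a)]
# is the expected reward, and Q[(s, a)] holds the current Q-value.

def value_of(Q, actions, s):
    """V(s) = max_a Q(s, a)."""
    return max(Q[(s, a)] for a in actions[s])

def backup(Q, R, T, actions, gamma, s):
    """Bellman backup of state s; returns the residual |old V(s) - new V(s)|."""
    old_val = value_of(Q, actions, s)
    for a in actions[s]:
        Q[(s, a)] = R[(s, a)] + gamma * sum(
            p * value_of(Q, actions, s2) for s2, p in T[s][a])
    return abs(old_val - value_of(Q, actions, s))

def value_iteration(Q, R, T, actions, states, gamma, eps=1e-4):
    """Sweep all states until the largest residual drops below eps. With
    incremental planning, Q starts from the previous planning step's values
    instead of a fresh initialization."""
    while max(backup(Q, R, T, actions, gamma, s) for s in states) >= eps:
        pass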
Once a new state-action pair becomes known, it performs planning with the updated model (i.e., the model with the new known state-action pair), and again acts greedily according to the updated V̂. A natural and the most efficient approach to planning in this scenario is to use the outcome of the previous planning process as the initial value function for the new planning, which we refer to in the paper as incremental planning. This is assumed for all algorithms and experiments of this paper. The proofs and theoretical analysis of PAC-MDP algorithms can be found in the relevant literature [8, 16]. In our analysis one specific property of such algorithms is advocated: the optimism under uncertainty which guarantees that the inequality V̂(s) ≥ V*(s) is always satisfied during learning, where V*(s) is the optimal value function which corresponds to the true MDP model M.

3 Known States in R-max

The focus of this paper is how to perform the planning step in R-max efficiently. In original R-max, the planning step is executed every time a new state-action pair becomes known [3] (this is also the case in known implementations [1]). While investigating the range of planning algorithms which are discussed below, we found that the efficiency of planning in R-max can be improved by taking into account the fact that the value of a given state does not change until all its actions become known. This is because if all unknown state-action pairs are initialized with V_max (as is the case in R-max), where V_max = R_max/(1 − γ) when γ < 1 and V_max = R_max if γ = 1, then V(s) = V_max as long as at least one action remains unknown in state s. If the R-max algorithm executes the planning algorithm after the pair (s, a) becomes known, while there still exists at least one action which is unknown in s, then only one Q-value will change, i.e., the value of the pair (s, a). If, after the update, Q(s, a) < V(s) = V_max, the value of s will not change. Action a will not be executed next time in state s, and another action will be used. In this way, unknown actions are correctly explored by the policy π̂ from Equation 1, but we observe here that the update is useless. Our novel improvement, which comes from the above observation, is to extend the notion of a known state-action pair by a notion of a known state, where known(s) = true iff known(s, a) = true for all a. With this extension, our approach is to execute the planning step in R-max only when a new state, s, becomes known (i.e., known(s) becomes true). The only issue now is that the action selection according to Equation 1 has to be changed in order to deal properly with states for which known(s) = false. This can be addressed by selecting actions using Algorithm 2 instead of Equation 1.

Algorithm 2 GetAction(s): modified action selection method
  if known(s) then
    return π̂(s) {see Equation 1}
  else
    return any action a for which known(s, a) = false
  end if

As explained above, this procedure will not change the exploration of the R-max algorithm when ties are broken randomly.
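A sketch of the known-state bookkeeping and of Algorithm 2 in the same assumed representation; the visit counter n[(s, a)] and the helper names are illustrative choices.

# known(s, a) holds when the pair was visited at least m times; a state is
# known only when all of its actions are known (the extension of Section 3).

def known_sa(n, m, s, a):
    return n[(s, a)] >= m

def known_state(n, m, actions, s):
    return all(known_sa(n, m, s, a) for a in actions[s])

def get_action(Q, n, m, actions, s):
    """Algorithm 2: act greedily in known states; otherwise return an unknown
    action, which Equation 1 would also select (its Q-value is still Vmax)."""
    if known_state(n, m, actions, s):
        return max(actions[s], key=lambda a: Q[(s, a)])  # Equation 1
    return next(a for a in actions[s] if not known_sa(n, m, s, a))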

Normally, when the planning step is executed after learning each new state-action pair, its Q-value satisfies Q(s, a) ≤ V(s) = V_max when there exists at least one unknown action in s. When ties are broken randomly (this is the case when Q(s, a) = V_max for the updated known action a), this is equivalent to postponing planning and executing another action which is still unknown while known(s) = false. This improvement is particularly useful for planning algorithms which do systematic updates of the entire Q-table, as value iteration does, because when known(s) = false the entire planning process changes the Q-values only of those actions which have just become known and there are no changes in the Q-values of any other states, whereas value iteration would iterate and perform (useless) Bellman updates for all states. Experimental validation of our extension is in the experimental section of the paper. Since this improvement yielded a considerable speed-up and represents a more efficient implementation of R-max, if not stated otherwise, we use this extension in all experiments presented in the paper. The main goal of this paper is to speed up the R-max algorithm with regard to planning, and our approach presented here reduces the number of executions of the planner (regardless of which planner is used) from O(|S||A|) to O(|S|).

4 Best-action Only Updates

From this point, we look at ways of improving the planning algorithms themselves. The first extension which is introduced in this section is applicable to all algorithms investigated in the paper. However, in order to make the presentation easier to understand and to explain the intuition behind this extension, we first show how it applies to value iteration. Its application to other planning approaches is discussed in detail in subsequent sections. Let us assume the standard scenario of R-max learning when value iteration is used as the planning method, together with the incremental approach indicated at the end of Section 2. This means that the initial value function at the beginning of planning is always optimistic with regard to the value which is the result of planning. Additionally, under conditions specified below, the value function after each Bellman backup is also optimistic with regard to the value function after the previous Bellman backup (in R-max, values are successively decreased to reflect the changes in the model which made the model less optimistic when a new state became known). The intuition which motivates Algorithm 3 is that the change of V(s) in a given iteration can be triggered only by a change of the Q-value of the best action of s, because all Q(s, a) are always optimistic with regard to the optimal value function and to the values after succeeding Bellman backups, and we argue here that in each state the action which has the highest Q(s, a) should be updated first. This can be explained as follows. If the value of the best action does not change after its update, which means that V(s) will not change in the current iteration, then all other remaining actions can be skipped in this iteration because they have lower values and they will not influence V(s) (this explains why the for loop in Algorithm 3 can back up only the best actions). If the value of the best action changes after the update, on the other hand, then another action may become the best and it is reasonable to update the currently best action of the same state again (this explains why the external loop of Algorithm 3 makes sense). We recall here that in the standard Bellman backup (see Algorithm 1) all actions are updated. Our idea here is that it is profitable to focus Bellman backups only on the best action of each state instead of performing updates of all actions, when the optimistic initialization satisfies the conditions defined below. This concept is named best-action only updates (BAO) and is captured by Algorithm 3. The two formal arguments below prove that Algorithm 3 is valid.
Algorithm 3 BAO(s): best-action only backup of state s
  old_val ← V(s)
  repeat
    best_actions = all a in s s.t. max_i Q(s, i) − Q(s, a) < ɛ
    δ = 0
    for each a in best_actions do
      old_Q = Q(s, a)
      Q(s, a) = R(s, a) + γ Σ_{s'} T(s, a, s') max_{a'} Q(s', a')
      if |old_Q − Q(s, a)| > δ then
        δ = |old_Q − Q(s, a)|
      end if
    end for
  until δ < ɛ
  return |old_val − V(s)|

DEFINITION 1. Optimistic initialization with one step monotonicity (OOSM) is the special case of optimistic initialization of the Q-table which satisfies the following property: Q(s, a) ≥ R(s, a) + γ Σ_{s'} T(s, a, s') V(s').

The OOSM property is satisfied, e.g., in any MDP as long as all Q-values are initialized with V_max. It will be shown in what follows that planning in R-max satisfies the OOSM requirement as well. In order to prove Algorithm 3, we first prove the following lemma:

LEMMA 1. If all Q(s, a) are initialized according to optimism with one step monotonicity (OOSM), then after each individual (t+1)-st Bellman backup of the Q-table, the following inequality is satisfied: ∀s,a Q_t(s, a) ≥ Q_{t+1}(s, a), where Q_t is the value function after the previous, t-th, Bellman backup.

PROOF. We prove this lemma by induction on the number of performed Bellman backups of Q-values. To prove the base case, we show that the lemma is satisfied after the first Bellman backup. This follows directly from the definition of optimism with one step monotonicity (see Definition 1). After proving the base case, we assume that the statement holds after t Bellman backups, and we show that it holds after t + 1 backups using the following argument:

Q_t(s, a) = R(s, a) + γ Σ_{s'} T(s, a, s') V_{t−1}(s')
          ≥ R(s, a) + γ Σ_{s'} T(s, a, s') V_t(s')
          = Q_{t+1}(s, a).

The first equation shows that the update of Q_t(s, a) in backup t is based on the values of all next states, s', after t − 1 backups, and the third equation is analogous for backup t + 1. The second step follows from the induction hypothesis, which implies that V_{t−1}(s') ≥ V_t(s'). The following corollary results from Lemma 1:

COROLLARY 1. Q-values converge monotonically to Q*(s, a) when all Q(s, a) entries are OOSM initialized in value iteration.
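As a concrete reading of Definition 1, the following small helper (in the same assumed representation as the earlier sketches; the function name and tolerance are our own choices) tests whether a Q-table satisfies the OOSM property:

def satisfies_oosm(Q, R, T, actions, states, gamma, tol=1e-9):
    """Definition 1: Q(s, a) >= R(s, a) + gamma * sum_s2 T(s, a, s2) V(s2)
    for every pair, where V(s2) = max_a2 Q(s2, a2)."""
    for s in states:
        for a in actions[s]:
            backed_up = R[(s, a)] + gamma * sum(
                p * max(Q[(s2, a2)] for a2 in actions[s2])
                for s2, p in T[s][a])
            if Q[(s, a)] < backed_up - tol:
                return False
    return True

For γ < 1, initializing every entry with V_max = R_max/(1 − γ) makes this check pass, since no backed-up value can then exceed R_max + γ V_max = V_max.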

THEOREM 1. Value iteration with the best-action only updates of Algorithm 3 converges to the same values as standard value iteration with the Bellman backup of Algorithm 1 when the value function is OOSM initialized, i.e., when the optimistic initialization satisfies Definition 1.

PROOF. In order to prove this theorem, it is sufficient to show that non-best actions do not have to be updated. Let us assume that a is a non-best action of a particular state s, i.e., an action s.t. Q(s, a) < max_i Q(s, i). Because all Q-values are initially OOSM optimistic, we know from Lemma 1 that Q(s, a) cannot be made higher than its current value in any of the future iterations of value iteration. This means that Q(s, a) cannot be made higher than max_i Q(s, i) by updating Q(s, a), and the only way to make a the best action in s is to reduce the value of max_i Q(s, i), which may happen only by updating the action i which attains max_i Q(s, i). This shows that if the value function is initialized with OOSM optimism, it is sufficient to update the best action only. Additionally, if the change of max_i Q(s, i) is smaller than ɛ, V(s) cannot change in the current iteration of value iteration (within the given precision ɛ) and the algorithm can move to updating other states in this iteration.

This proof makes BAO updates applicable to general value iteration planning with OOSM optimistic initialization. As mentioned before, OOSM is naturally satisfied in any MDP as long as all values are initialized with V_max. This requirement is rather weak and easy to satisfy, and in this way the applicability of BAO is substantial. A short explanation is required on why OOSM is satisfied in R-max. In our approach, each new planning step starts with the value function of the previous planning step (incremental planning). The new MDP model is different from the previous one just in having one more known state. Thus, all states which were known in the previous model satisfy OOSM with equality, and the state which has just become known still has its V(s) = V_max, which cannot be made higher, and so satisfies OOSM as well. Due to the nature of the BAO update, this method is expected to yield a particularly significant improvement in domains with a larger number of actions in each state. It also has great potential to improve planning in domains with continuous actions, because only a limited number of continuous actions should be updated.
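Continuing the same sketch representation, a Python version of the BAO backup of Algorithm 3; as in the pseudocode, every action whose Q-value lies within ɛ of the current maximum is treated as a best action.

# Sketch of Algorithm 3 (best-action only backup). Only near-best actions are
# backed up; the outer loop repeats because lowering the best action's value
# may expose a different best action.

def bao(Q, R, T, actions, gamma, s, eps=1e-4):
    old_val = max(Q[(s, a)] for a in actions[s])
    while True:
        top = max(Q[(s, a)] for a in actions[s])
        best_actions = [a for a in actions[s] if top - Q[(s, a)] < eps]
        delta = 0.0
        for a in best_actions:
            old_q = Q[(s, a)]
            Q[(s, a)] = R[(s, a)] + gamma * sum(
                p * max(Q[(s2, a2)] for a2 in actions[s2])
                for s2, p in T[s][a])
            delta = max(delta, abs(old_q - Q[(s, a)]))
        if delta < eps:
            break
    return abs(old_val - max(Q[(s, a)] for a in actions[s]))

Used in place of backup() inside the value_iteration() sketch above, this gives the VI-BAO configuration evaluated in Section 7.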
5 Prioritized Sweeping for R-max

Prioritized sweeping (PS) has been popular for its improved empirical convergence rates, but its theoretical convergence was only expected by [12] to be provable based on the convergence results for asynchronous dynamic programming (ADP), by observing that PS is an ADP algorithm. The first formal proof for general PS was recently presented by [11], and the PS algorithm of [12] was also proved as a special case under the rather restrictive condition that initially all states have to be assigned a non-zero priority. This is a restrictive assumption with regard to the incremental planning found in R-max, because in R-max usually not all states require being updated even once. In what follows, we prove that PS converges when used for planning in R-max without those restrictive assumptions. This holds also for our extension to basic PS (shown in Algorithm 4), which is based on the idea that it is sufficient to add to the priority queue only the policy predecessors of a state s, defined as

PolicyPred(s) = { s̄ : T(s̄, π(s̄), s) > 0 },   (2)

(see Line 6 in Algorithm 4) instead of all predecessors, defined as

Pred(s) = { s̄ : ∃a T(s̄, a, s) > 0 },   (3)

as is the case in standard PS [12].

Algorithm 4 PS-PP(s_k): prioritized sweeping with policy predecessors for incremental planning in R-max after state s_k becomes known
1: PQ ← s_k
2: while PQ ≠ ∅ do
3:   remove the first element s from PQ
4:   residual(s) ← Backup(s)
5:   if residual(s) > ɛ then
6:     for all s̄ ∈ PolicyPred(s) do
7:       priority ← T(s̄, π(s̄), s) · residual(s)
8:       if s̄ ∉ PQ then
9:         insert s̄ into PQ according to priority
10:      else
11:        update s̄ in PQ if the new priority is higher
12:      end if
13:    end for
14:  end if
15: end while

LEMMA 2. The prioritized sweeping algorithm specified in Algorithm 4 drives the Bellman error to 0 (with the required precision ɛ) when executed for a newly learned state, s_k, in R-max, initializing the value function using the value function of the previous planning step in which s_k was not known.

PROOF. Let F ⊆ S be the set of states which do not have s_k in their policy graph. Since the value of s_k can only decrease in the current planning process (because in the previous planning process it was unknown with V(s_k) = V_max, and now it has become known with V(s_k) ≤ V_max), state s_k will not appear in the optimal policy graph of any state in F; therefore the current values of all states in F are correct, do not require updates, and their Bellman error is already 0. This argument proves that states in F do not have to be updated, and only states in S \ F should be updated, that is, policy predecessors of s_k. This proves that the backward expansion over policy predecessors in Line 6 is correct, and constitutes our extension to the standard PS algorithm [12] for planning in R-max. Let S_k be S \ F. Since s_k is the only state in S_k which changed its dynamics, s_k is the only state from which the modified value function should be back-propagated. The argument of the previous paragraph showed that this back-propagation can keep updating only policy predecessors of state s_k; therefore the last condition to prove is that the predecessors of a state s should be visited only when residual(s) > ɛ. We do this by showing that if residual(s) ≤ ɛ for all states s which can be reached when any action is executed in s', then residual(s') ≤ ɛ. This means that if all successors of s' changed by less than ɛ, then s' does not have to be backed up given precision ɛ. Writing ΔV(s) for the change of V(s), and noting that residual(s) = |ΔV(s)|, this can be derived as follows:

residual(s') = | max_a { R(s', a) + γ Σ_s T(s', a, s) [V(s) + ΔV(s)] } − max_a { R(s', a) + γ Σ_s T(s', a, s) V(s) } |
             ≤ max_a γ Σ_s T(s', a, s) |ΔV(s)|
             = max_a γ Σ_s T(s', a, s) residual(s)
             ≤ max_a γ Σ_s T(s', a, s) ɛ = γɛ ≤ ɛ.

The first equation is the definition of residual(s'), where the current V(s') was computed from the values V(s), and the new V(s') is computed from V(s) + ΔV(s) for each successor s of s'. The next steps are simple algebraic operations, and the inequalities follow from |a + b| ≤ |a| + |b|, residual(s) ≤ ɛ, and γ ≤ 1. The backward search from s_k in Algorithm 4 will not expand a state s' only when all successors s of s' for the given policy action have residual(s) ≤ ɛ (s' will be visited if residual(s) > ɛ for at least one s). This ends the proof that V(s') is within the required precision ɛ when the algorithm terminates.

Algorithm 4 would normally use the Backup(s) method of Algorithm 1 in Line 4. The proof of Theorem 1 extends to Algorithm 4 with OOSM initialization as well, and the BAO procedure presented in Algorithm 3 can also be used in Algorithm 4 by replacing, in Line 4, Backup(s) with BAO(s).
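A compact Python sketch of Algorithm 4. The paper's implementation uses a trinomial heap, which supports increasing an element's priority in constant time [20]; this sketch instead assumes Python's heapq and emulates priority increases with lazy (stale-entry) deletion, a standard substitute.

import heapq
import itertools

# Sketch of Algorithm 4 (PS-PP). policy_pred(s) is assumed to yield the pairs
# (s_bar, T(s_bar, pi(s_bar), s)) with positive probability, i.e., only the
# predecessors that reach s under their current policy action; backup(s) is
# Backup(s) of Algorithm 1 or BAO(s) of Algorithm 3 and returns the residual.

def ps_pp(s_k, backup, policy_pred, eps=1e-4):
    tie = itertools.count()          # avoids comparing states on priority ties
    best = {s_k: 0.0}                # best queued priority per state
    heap = [(0.0, next(tie), s_k)]   # min-heap over negated priorities
    while heap:
        neg_p, _, s = heapq.heappop(heap)
        if best.get(s) != -neg_p:
            continue                 # stale entry, superseded by a higher one
        del best[s]
        residual = backup(s)
        if residual > eps:
            for s_bar, prob in policy_pred(s):
                priority = prob * residual
                if priority > best.get(s_bar, -1.0):
                    best[s_bar] = priority  # "update if the new priority is higher"
                    heapq.heappush(heap, (-priority, next(tie), s_bar))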

6 Backward Value Iteration with Loops

Backward value iteration (BVI) is an algorithm for planning in general MDPs with a set of terminal states [4]. This algorithm traverses the transpose of the policy graph using breadth- or depth-first search which starts from the goal state, and it checks for duplicates so that each state is updated only once in the same iteration. States are backed up in the order in which they are encountered during the search. Before applying this algorithm to planning in R-max and proposing our extensions, we show that the original version of the algorithm can fail to compute the correct value function. Let us assume the original version of the BVI algorithm from [4], as summarized above, and the use of this algorithm for planning in the domain whose four states are shown in Figure 1.

Figure 1: An example when the original backward value iteration fails on a loop (panels a, b, c).

First, in Figure 1a, the current policy actions are shown before any updates of the current iteration of BVI. Figure 1b shows the policy actions after performing backups on states b and d, after which the policy action of state d changed (the new action is highlighted using a thick style). Figure 1c shows updates of states a and c, after which the best action of state c changed (again the thick style shows the new action). After these updates, there is a loop which involves states c and d, and the BVI algorithm will not update these states again in the current iteration because each state is updated only once; moreover, the algorithm will also never update these two states again in any of the future iterations, because the policy actions of all states in the loop do not lead to any states outside of the loop (so neither c nor d will be the previous state - according to policy actions - of any state outside of the loop). This situation can happen in the broad class of MDPs in which states are revisited, as in our testing domains, and it applies also to stochastic actions when all actions of all states in the loop lead to states in the loop only. It is worth noting that in [4], where the BVI algorithm was introduced, all domains require many steps to revisit a state (actions are not easily reversible due to velocity in the state space). Our example shows that the standard version of the BVI algorithm can fail by encountering loops in a broad class of MDPs. This problem of the standard BVI algorithm was found empirically during our experimentation in this research, in which the R-max agent was getting stuck in such loops. It is worth recalling here that the PS-PP algorithm of the previous section also expands only policy predecessors; however, it does not suffer from the same problem because PS-PP guarantees that s' will be visited if residual(s) > ɛ for at least one successor s; thus the states which constitute the loop will be updated as well and they will converge to proper values. The BVI algorithm, with policy predecessors and updating each state once in each iteration, will fail in the case indicated in Figure 1. A brief analysis of Figure 1c indicates one simple solution to the presented problem of the standard BVI algorithm. Since the states which are in the loop have other, non-policy actions which lead to states outside of the loop (e.g., state d has a non-policy action which leads to state b), the straightforward solution to the loop problem is to perform the backward search on all predecessors of a given state, as opposed to policy predecessors as is the case in the original BVI algorithm. This is the first extension to BVI which is proposed in this paper, and the BVI algorithm modified in this way is named LBVI, which stands for BVI with loops. The LBVI algorithm with this modification is applicable to general MDP planning.
Our additional extensions to the LBVI algorithm are specific to the incremental planning in R-max which is studied in this paper. The complete algorithm is presented in Algorithm 5. This is the standard version of the BVI algorithm with the following extensions: (1) all predecessors are used in the state expansion in Line 13 (to deal with the problem of Figure 1), (2) the residual is checked in Line 12 (to prune the state expansion when possible), and (3) the BAO backup can be applied in Line 8.

Algorithm 5 LBVI(s_k): backward value iteration for incremental planning in R-max after state s_k becomes known
1: repeat
2:   appended(s) ← false for all s
3:   LargestResidual ← 0
4:   FIFOQ ← s_k
5:   appended(s_k) ← true
6:   while FIFOQ ≠ ∅ do
7:     remove the first element s from FIFOQ
8:     residual(s) ← Backup(s)
9:     if residual(s) > LargestResidual then
10:      LargestResidual ← residual(s)
11:    end if
12:    if residual(s) > ɛ then
13:      for all s̄ ∈ Pred(s) do
14:        if appended(s̄) = false then
15:          append s̄ to FIFOQ
16:          appended(s̄) ← true
17:        end if
18:      end for
19:    end if
20:  end while
21: until LargestResidual < ɛ

LEMMA 3. The backward value iteration algorithm specified in Algorithm 5 drives the Bellman error to 0 (with the required precision ɛ) when executed for a newly learned state, s_k, in R-max, initializing the value function using the value function of the previous planning step in which s_k was not known.

PROOF. Let E ⊆ S be the set of states from which state s_k cannot be reached using any policy or non-policy actions. Since state s_k is not reachable from any state in E and s_k is the only state whose dynamics changed, none of the states in E requires being updated; hence the Bellman error of all states in E is already 0. Let S_k be S \ E. Since s_k is the only state in S_k which changed its dynamics, s_k is the only state from which the modified value function should be back-propagated. Since the backward search process expands all predecessors of each state and starts from s_k, all states which reach state s_k (using both policy and non-policy actions) will be updated. Therefore the last condition to prove is that the predecessors of a state s should be visited only when residual(s) > ɛ. In the proof of Lemma 2, it has already been shown that if residual(s) ≤ ɛ for all s which can be reached from s', then residual(s') ≤ ɛ. The backward search from s_k in Algorithm 5 will not expand a state s' only when all successors s of s' have residual(s) ≤ ɛ (s' will be visited if residual(s) > ɛ for at least one s). This ends the proof that when the algorithm terminates, V(s) is within the required precision ɛ.

Algorithm 5 would normally back up states in Line 8 using the Bellman backup shown in Algorithm 1. The proof of Theorem 1 extends to Algorithm 5 as well, and the BAO procedure presented in Algorithm 3 for backing up states can also be used in Algorithm 5 by replacing, in Line 8, Backup(s) with BAO(s).
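A Python sketch of Algorithm 5 in the same style. Here pred(s) is assumed to return all predecessors of s (policy and non-policy), which is the loop correction of this section, and backup can be either Backup(s) of Algorithm 1 or BAO(s) of Algorithm 3.

from collections import deque

# Sketch of Algorithm 5 (LBVI). Each state is appended at most once per outer
# iteration; expanding ALL predecessors (not just policy predecessors) avoids
# the loop failure of Figure 1, and the residual check prunes the expansion.

def lbvi(s_k, backup, pred, eps=1e-4):
    while True:
        largest_residual = 0.0
        appended = {s_k}
        fifo = deque([s_k])
        while fifo:
            s = fifo.popleft()
            residual = backup(s)
            largest_residual = max(largest_residual, residual)
            if residual > eps:               # residual check (Line 12)
                for s_bar in pred(s):        # all predecessors (Line 13)
                    if s_bar not in appended:
                        appended.add(s_bar)
                        fifo.append(s_bar)
        if largest_residual < eps:
            break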

7 Empirical Evaluation

This section presents an empirical evaluation of the proposed approaches to incremental planning in R-max. Planning time is the measure that one wishes to minimize in R-max.

7.1 Algorithms

The first experiment evaluates the extension to the R-max algorithm introduced in Section 3. Specifically, the standard R-max with value iteration and action selection according to Equation 1 is compared against the modified R-max with our predicate known(s) and the action selection rule specified by Algorithm 2 instead of Equation 1. The goal of the main empirical evaluation is to check how different extensions to the standard planning algorithms improve the planning time, and for this reason all proposed extensions are also evaluated separately to see their individual influence. Therefore, the following configurations are evaluated in the empirical study of the paper:

VI: standard value iteration
VI-BAO: value iteration with BAO updates
PS: standard prioritized sweeping [12]
PS-PP: standard prioritized sweeping with policy predecessors
PS-BAO: standard prioritized sweeping with BAO updates
PS-PP-BAO: prioritized sweeping with policy predecessors and BAO updates
LBVI: backward value iteration which copes with loops (backward search over all predecessors)
LBVI-RES: LBVI with the residual check (Line 12 in Algorithm 5)
LBVI-BAO: LBVI with BAO updates
LBVI-RES-BAO: LBVI with the residual check and BAO updates

All algorithms were implemented in C++, and the goal was to provide the same amount of optimization to each algorithm. With this in mind, the crucial element of the prioritized sweeping algorithms was the priority queue. Since the operation of increasing the priority of an element in the priority queue is required (Line 11 in Algorithm 4), the trinomial heap was used because it supports this operation in constant time [20] (the heapq-based sketch after Algorithm 4 above emulates this operation with lazy updates instead). In the implementation of the queue used in LBVI, memory buffers were reused in order to have fast operations on the FIFO queue. As mentioned before, if not stated otherwise, all algorithms use the modified treatment of unknown states specified in Algorithm 2 in Section 3, which significantly reduces the number of times the planners are executed. In all experiments, the R-max parameter m was set to 5, and a fixed planning precision ɛ was used. Experiments on the maze domain present average values of 30 runs, and on the hand washing domain of 10 runs. The standard error of the mean (SEM) is shown both in the graphs and in the table.
7.2 Domains

The first domain is a version of the navigation maze task which can be found in the literature. In our implementation a scaled up version of such a maze from [1] is used, containing a substantially larger number of grid positions. The second domain is a simplified model of a situated prompting system that assists multiple persons with dementia in completing activities of daily living (ADL) more independently by giving appropriate prompts when needed. Such situations arise in shared spaces, e.g. a smart long-term care facility, or a smart home with multiple residents in need of assistance. Prompting for each ADL-resident combination can be done using a (PO)MDP [6], but the situation is more complex when multiple residents are present, as prompts can interfere across ADLs and between residents. The optimal solution (pursued here) is to model the complete joint space of all residents and ADLs, although approximate distributed solutions are also possible [5]. Our specific implementation follows the description in [14]. In our case, each MDP has 9 states and there are 3 prompts (do nothing or issue one of the two prompts specific to the current plan step) for each state. When prompting many clients at the same time, prompts for one client can influence other clients, whereas some prompts cannot be executed for more than one client at a time, e.g., audio prompts. For example, the domain with 4 clients already has a very large number of Q(s, a) entries in its Q-table. Other sizes can be calculated analogously.
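The joint model sizes implied by these numbers can be worked out directly. A small calculation, assuming the joint model is the full cross-product of per-client states and prompts (an assumption on our part; as noted above, some prompts cannot be issued to more than one client at a time, so the actual joint action set may be somewhat smaller):

# Joint hand washing model size for k clients, assuming a full cross-product:
# each client contributes 9 states and 3 prompts, giving 9**k joint states,
# 3**k joint actions, and 9**k * 3**k Q(s, a) entries.
for k in range(1, 6):
    states, acts = 9 ** k, 3 ** k
    print(f"{k} clients: {states} states, {acts} actions, {states * acts} Q-entries")

Under this assumption the 4-client instance has 531,441 Q(s, a) entries and the 5-client instance over 14 million, which is consistent with the very large Q-tables discussed in Section 7.3.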

7.3 Results

The first test was to evaluate the improvement from our modified notion of a state being known to the R-max algorithm, introduced in Section 3. As specified in the first paragraph of Section 7.1, two versions of the R-max algorithm were evaluated on the maze domain. These two versions of R-max were executed 30 times and the user time was compared. The version of the algorithm with our approach to distinguishing known and unknown states (from Section 3) was 2.3 times faster than the original version. The applicability of this extension does not depend on the planning algorithm, and all succeeding experiments use this modification to standard R-max. The next experiments evaluate the major contributions of this paper. Figures 2 and 3 show the evaluation of all 10 algorithms specified in Section 7.1 on the maze domain.

Figure 2: Planning time [ms] in the maze experiment.

Figure 3: Number of Q(s, a) backups in the maze experiment.

These algorithms determine how planning is done, and in principle the R-max algorithm should be able to explore in exactly the same way regardless of which planning algorithm is used. In order to verify this, the obtained results are compared with regard to the asymptotic convergence of the R-max algorithm, and the average cumulative reward as a function of the episode number is presented in Figure 4. This figure shows that the exploration was the same, and this can be seen as an empirical proof that all planning algorithms were returning the same exploration policy at their output.

Figure 4: The cumulative reward of the learning agent (average cumulative reward vs. number of episodes).

The BAO approach to updating states shows a substantial improvement in all three algorithms. In particular, value iteration, which is traditionally slower than, for example, prioritized sweeping, significantly reduced its planning time and its number of backups. This result is particularly significant not only for planning in R-max, but also for general value-based planning in MDPs whenever the initialization satisfies the requirement of Definition 1, which uniform optimistic initialization with V_max does. With our BAO approach, value iteration can be made much faster in a straightforward way. A closer analysis of the PS performance indicates that both policy predecessors and BAO updates yield improvements when applied individually, and a further improvement is gained when both techniques are used together. Overall, with our extensions, PS when used for incremental planning in R-max narrows its gap to BVI, which was shown in [4] to outperform PS in the standard case due to the overhead of maintaining the priority queue. The LBVI algorithm was evaluated with residual checking and with BAO updates. Here, these extensions yield improvements when applied individually, and additional gains are obtained when they are used together. The fastest planning algorithm in this experiment was LBVI with both the residual check and BAO updates. In our implementation, BVI is used with our modification which updates all predecessors instead of policy predecessors, since this was shown to be a straightforward solution to the loop problem of the standard BVI algorithm discussed in Section 6. This leads, however, to an increase in the number of state expansions, but our extensions proved to be sufficient to guarantee fast planning of the modified BVI algorithm. We acknowledge that there is another direction for improving the performance of BVI by still using policy predecessors; however, a solution has to be found for avoiding the loops which are reported in Section 6. This loop problem is detrimental for the R-max agent because the agent gets stuck in such loops during exploration. Results on the hand washing domain are in Table 1.

Table 1: Planning time [ms] for different sizes of the hand washing domain (1 to 5 clients), mean ± SEM, for VI, VI-BAO, PS, PS-PP, PS-BAO, PS-PP-BAO, LBVI, LBVI-RES, LBVI-BAO, and LBVI-RES-BAO; the numeric entries are not recoverable in this copy.

The ranking of each algorithm is the same as in the maze domain above. The significance of our improvements, BAO in particular, becomes more evident when the state and action spaces are bigger. It is worth noting that in the last two instances (4 and 5 clients), we were able to do off-line planning in R-max despite the very large numbers of state-action pairs in the Q-table! Experiments for which it was infeasible to wait for completion are indicated with '-'.

8 Related Work

The fact that planning is a bottleneck of PAC-MDP learning has recently been emphasized also in [21], where Monte Carlo on-line planning algorithms for PAC-MDP learning were proposed. These algorithms are interesting because their complexity does not depend on the number of states. This is achieved by sampling C times from each state (which limits the branching factor), and the horizon is additionally limited by the discount factor. In this way, it is sufficient to do Monte Carlo sampling only in a limited neighbourhood of a given state. The disadvantage of these algorithms is that they require the entire process to be repeated for each action selection. The algorithms which are proposed in this paper also make use of the fact that when a new state becomes known, mostly only its neighbourhood needs to be updated, which is reflected very well in our results. Our conjecture here is that the algorithms which we propose in this paper could be proven to have complexity dependent only on the close neighbourhood of the state which triggers the planning process. The rationale for this theoretical future work is indicated by our results in this paper. In [21], the authors report results with Monte Carlo planning on a flag domain with a 5×5 grid and 6 flags possibly appearing, where VI did not succeed. In the experiments of this paper, we report results on large domains where, even though VI was very inefficient or did not work at all, our extensions to VI-based planning proved to be successful. Such off-line algorithms require planning only once for each known state, and once planning is done, the policy can be used very fast, whereas Monte Carlo methods plan for every step. Our methods could further scale the off-line methods up when used with factored planners for MDPs [7].
We are additionally not aware of any PAC-MDP results with off-line planning on domains as large as those solved in this paper.

9 Conclusion

PAC-MDP algorithms are particularly efficient in terms of the number of samples which are needed by the learning agent in order to achieve near optimal performance. These algorithms, however, execute a time consuming planning step after each new state-action pair (or new state, according to our extension) becomes known to the agent. This fact is a serious limitation on broader application of these kinds of algorithms. This paper examined the planning problem in PAC-MDP learning, and sought ways of shortening the duration of the planning step. The contributions of this paper can be summarized as follows:

- The number of executions of the planner can be reduced when planning is triggered by a new state becoming known, as introduced in Section 3.
- A new update operator, BAO, was proposed which, instead of updating all actions of a given state once, updates only the best action of each state but continues this updating until convergence within the given state. This approach yields significant improvements in all evaluated algorithms, and in standard value iteration in particular. This approach is also applicable beyond planning in R-max, since optimistic initialization with V_max can be easily applied in general value-based MDP planning, and this contribution has the potential to bear an impact on the field.
- An extension to the prioritized sweeping algorithm was proposed which exploits properties of the planning problem in PAC-MDP learning. Specifically, only policy predecessors of each state are added to the priority queue, in contrast to adding all predecessors in the standard prioritized sweeping algorithm.
- It was shown that the original backward value iteration algorithm from the literature - which updates each state exactly once in each iteration - can fail on a broad class of MDP domains. The problem and one straightforward correction were shown. Then, our extensions to the corrected version of BVI, which are specific to planning in PAC-MDP learning, were proposed. Specifically, it was shown that a predecessor state does not have to be expanded in a given iteration when all its successors have their residuals smaller than the precision ɛ.
- Instances of the hand washing domain with large state spaces were solved, which extends the applicability of the PAC-MDP paradigm considerably beyond the existing PAC-MDP evaluations which can be found in the literature.
- All algorithms presented in the paper are equally applicable to goal-based as well as infinite horizon RL problems, because both in prioritized sweeping and in backward value iteration, planning starts from a specific state, and it does not matter whether the domain has goal states or not.

Theoretical justification for all contributions was provided, and all approaches were further evaluated empirically. Regardless of the more specific details of the empirical evaluation, a particularly substantial contribution of this work is that the standard value iteration algorithm can be made considerably faster by the straightforward application of the BAO update rule proposed in this paper.

10 Acknowledgements

This research was sponsored by the American Alzheimer's Association under an ETAC grant.

References
[1] J. Asmuth, M. L. Littman, and R. Zinkov. Potential-based shaping in model-based reinforcement learning. In Proceedings of AAAI, 2008.
[2] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[3] R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR, 3:213-231, 2002.
[4] P. Dai and E. A. Hansen. Prioritizing Bellman backups without a priority queue. In Proceedings of ICAPS, 2007.
[5] J. Hoey and M. Grześ. Distributed control of situated assistance in large domains with many tasks. In Proceedings of ICAPS, 2011.
[6] J. Hoey, P. Poupart, A. von Bertoldi, T. Craig, C. Boutilier, and A. Mihailidis. Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Computer Vision and Image Understanding, 114(5), May 2010.
[7] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: Stochastic planning using decision diagrams. In Proceedings of UAI, 1999.
[8] S. M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[9] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49:209-232, 2002.
[10] J. Z. Kolter and A. Y. Ng. Near-Bayesian exploration in polynomial time. In Proceedings of ICML, 2009.
[11] L. Li and M. L. Littman. Prioritized sweeping converges to the optimal value function. Technical report, Rutgers University, 2008.
[12] A. W. Moore and C. G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
[13] A. Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of Uncertainty in Artificial Intelligence, 2000.
[14] P. Poupart, N. Vlassis, J. Hoey, and K. Regan. An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of ICML, 2006.
[15] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1994.
[16] A. L. Strehl and M. L. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74:1309-1331, 2008.
[17] R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of ICML, 1990.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[19] I. Szita and C. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of ICML, 2010.
[20] T. Takaoka. Theory of trinomial heaps. In Proceedings of the International Conference on Computing and Combinatorics (COCOON), LNCS, 2000.
[21] T. J. Walsh, S. Goschin, and M. L. Littman. Integrating sample-based planning and model-based reinforcement learning. In Proceedings of AAAI, 2010.


Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Efficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction

Efficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction Efficient Plnning 1 Tuesdy clss summry: Plnning: ny computtionl process tht uses model to crete or improve policy Dyn frmework: 2 Questions during clss Why use simulted experience? Cn t you directly compute

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

Vyacheslav Telnin. Search for New Numbers.

Vyacheslav Telnin. Search for New Numbers. Vycheslv Telnin Serch for New Numbers. 1 CHAPTER I 2 I.1 Introduction. In 1984, in the first issue for tht yer of the Science nd Life mgzine, I red the rticle "Non-Stndrd Anlysis" by V. Uspensky, in which

More information

Bernoulli Numbers Jeff Morton

Bernoulli Numbers Jeff Morton Bernoulli Numbers Jeff Morton. We re interested in the opertor e t k d k t k, which is to sy k tk. Applying this to some function f E to get e t f d k k tk d k f f + d k k tk dk f, we note tht since f

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

Cf. Linn Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability & Statistics, 1999.

Cf. Linn Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability & Statistics, 1999. Cf. Linn Sennott, Stochstic Dynmic Progrmming nd the Control of Queueing Systems, Wiley Series in Probbility & Sttistics, 1999. D.L.Bricker, 2001 Dept of Industril Engineering The University of Iow MDP

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below.

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below. Dulity #. Second itertion for HW problem Recll our LP emple problem we hve been working on, in equlity form, is given below.,,,, 8 m F which, when written in slightly different form, is 8 F Recll tht we

More information

Section 6.1 INTRO to LAPLACE TRANSFORMS

Section 6.1 INTRO to LAPLACE TRANSFORMS Section 6. INTRO to LAPLACE TRANSFORMS Key terms: Improper Integrl; diverge, converge A A f(t)dt lim f(t)dt Piecewise Continuous Function; jump discontinuity Function of Exponentil Order Lplce Trnsform

More information

DIRECT CURRENT CIRCUITS

DIRECT CURRENT CIRCUITS DRECT CURRENT CUTS ELECTRC POWER Consider the circuit shown in the Figure where bttery is connected to resistor R. A positive chrge dq will gin potentil energy s it moves from point to point b through

More information

Monte Carlo method in solving numerical integration and differential equation

Monte Carlo method in solving numerical integration and differential equation Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The

More information

On the Adders with Minimum Tests

On the Adders with Minimum Tests Proceeding of the 5th Ain Tet Sympoium (ATS '97) On the Adder with Minimum Tet Seiji Kjihr nd Tutomu So Dept. of Computer Science nd Electronic, Kyuhu Intitute of Technology Atrct Thi pper conider two

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Chapters 4 & 5 Integrals & Applications

Chapters 4 & 5 Integrals & Applications Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

Taylor Polynomial Inequalities

Taylor Polynomial Inequalities Tylor Polynomil Inequlities Ben Glin September 17, 24 Abstrct There re instnces where we my wish to pproximte the vlue of complicted function round given point by constructing simpler function such s polynomil

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 Automt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Problem (II) Chpter II.6.: Push Down Automt Remrk: This mteril is no longer tught nd not directly exm relevnt Anton Setzer (Bsed

More information

20 MATHEMATICS POLYNOMIALS

20 MATHEMATICS POLYNOMIALS 0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of

More information

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac REVIEW OF ALGEBRA Here we review the bsic rules nd procedures of lgebr tht you need to know in order to be successful in clculus. ARITHMETIC OPERATIONS The rel numbers hve the following properties: b b

More information

Jim Lambers MAT 169 Fall Semester Lecture 4 Notes

Jim Lambers MAT 169 Fall Semester Lecture 4 Notes Jim Lmbers MAT 169 Fll Semester 2009-10 Lecture 4 Notes These notes correspond to Section 8.2 in the text. Series Wht is Series? An infinte series, usully referred to simply s series, is n sum of ll of

More information

Working with Powers and Exponents

Working with Powers and Exponents Working ith Poer nd Eponent Nme: September. 00 Repeted Multipliction Remember multipliction i y to rite repeted ddition. To y +++ e rite. Sometime multipliction i done over nd over nd over. To rite e rite.

More information

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007 A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus

More information

different methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s).

different methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s). Mth 1A with Professor Stnkov Worksheet, Discussion #41; Wednesdy, 12/6/217 GSI nme: Roy Zho Problems 1. Write the integrl 3 dx s limit of Riemnn sums. Write it using 2 intervls using the 1 x different

More information

How can we approximate the area of a region in the plane? What is an interpretation of the area under the graph of a velocity function?

How can we approximate the area of a region in the plane? What is an interpretation of the area under the graph of a velocity function? Mth 125 Summry Here re some thoughts I ws hving while considering wht to put on the first midterm. The core of your studying should be the ssigned homework problems: mke sure you relly understnd those

More information

For the percentage of full time students at RCC the symbols would be:

For the percentage of full time students at RCC the symbols would be: Mth 17/171 Chpter 7- ypothesis Testing with One Smple This chpter is s simple s the previous one, except it is more interesting In this chpter we will test clims concerning the sme prmeters tht we worked

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Math& 152 Section Integration by Parts

Math& 152 Section Integration by Parts Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 UNIFORM CONVERGENCE Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 Suppose f n : Ω R or f n : Ω C is sequence of rel or complex functions, nd f n f s n in some sense. Furthermore,

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

SPACE VECTOR PULSE- WIDTH-MODULATED (SV-PWM) INVERTERS

SPACE VECTOR PULSE- WIDTH-MODULATED (SV-PWM) INVERTERS CHAPTER 7 SPACE VECTOR PULSE- WIDTH-MODULATED (SV-PWM) INVERTERS 7-1 INTRODUCTION In Chpter 5, we briefly icue current-regulte PWM inverter uing current-hyterei control, in which the witching frequency

More information

Convergence of Fourier Series and Fejer s Theorem. Lee Ricketson

Convergence of Fourier Series and Fejer s Theorem. Lee Ricketson Convergence of Fourier Series nd Fejer s Theorem Lee Ricketson My, 006 Abstrct This pper will ddress the Fourier Series of functions with rbitrry period. We will derive forms of the Dirichlet nd Fejer

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

M. A. Pathan, O. A. Daman LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS

M. A. Pathan, O. A. Daman LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS DEMONSTRATIO MATHEMATICA Vol. XLVI No 3 3 M. A. Pthn, O. A. Dmn LAPLACE TRANSFORMS OF THE LOGARITHMIC FUNCTIONS AND THEIR APPLICATIONS Abtrct. Thi pper del with theorem nd formul uing the technique of

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

Review of basic calculus

Review of basic calculus Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below

More information

EE Control Systems LECTURE 8

EE Control Systems LECTURE 8 Coyright F.L. Lewi 999 All right reerved Udted: Sundy, Ferury, 999 EE 44 - Control Sytem LECTURE 8 REALIZATION AND CANONICAL FORMS A liner time-invrint (LTI) ytem cn e rereented in mny wy, including: differentil

More information

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018 DATA620006 魏忠钰 Serch I Mrch 7 th, 2018 Outline Serch Problems Uninformed Serch Depth-First Serch Bredth-First Serch Uniform-Cost Serch Rel world tsk - Pc-mn Serch problems A serch problem consists of:

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

ELECTRICAL CIRCUITS 10. PART II BAND PASS BUTTERWORTH AND CHEBYSHEV

ELECTRICAL CIRCUITS 10. PART II BAND PASS BUTTERWORTH AND CHEBYSHEV 45 ELECTRICAL CIRCUITS 0. PART II BAND PASS BUTTERWRTH AND CHEBYSHEV Introduction Bnd p ctive filter re different enough from the low p nd high p ctive filter tht the ubject will be treted eprte prt. Thi

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 2013 Outline 1 Riemnn Sums 2 Riemnn Integrls 3 Properties

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

Before we can begin Ch. 3 on Radicals, we need to be familiar with perfect squares, cubes, etc. Try and do as many as you can without a calculator!!!

Before we can begin Ch. 3 on Radicals, we need to be familiar with perfect squares, cubes, etc. Try and do as many as you can without a calculator!!! Nme: Algebr II Honors Pre-Chpter Homework Before we cn begin Ch on Rdicls, we need to be fmilir with perfect squres, cubes, etc Try nd do s mny s you cn without clcultor!!! n The nth root of n n Be ble

More information

Lesson 1: Quadratic Equations

Lesson 1: Quadratic Equations Lesson 1: Qudrtic Equtions Qudrtic Eqution: The qudrtic eqution in form is. In this section, we will review 4 methods of qudrtic equtions, nd when it is most to use ech method. 1. 3.. 4. Method 1: Fctoring

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

1 Probability Density Functions

1 Probability Density Functions Lis Yn CS 9 Continuous Distributions Lecture Notes #9 July 6, 28 Bsed on chpter by Chris Piech So fr, ll rndom vribles we hve seen hve been discrete. In ll the cses we hve seen in CS 9, this ment tht our

More information

1 The Riemann Integral

1 The Riemann Integral The Riemnn Integrl. An exmple leding to the notion of integrl (res) We know how to find (i.e. define) the re of rectngle (bse height), tringle ( (sum of res of tringles). But how do we find/define n re

More information