A Fast and Reliable Policy Improvement Algorithm
A Fast and Reliable Policy Improvement Algorithm

Yasin Abbasi-Yadkori (Queensland University of Technology), Peter L. Bartlett (UC Berkeley and QUT), Stephen J. Wright (University of Wisconsin-Madison)

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

1 Introduction

Markov decision problems (MDPs) are sequential decision problems where the loss has memory (also known as state). The objective is to find a policy, a mapping from states to actions, that yields a high discounted cumulative reward. In large-state problems, finding an optimal policy is challenging and one has to resort to approximations. Unfortunately, many approximate MDP algorithms do not always improve monotonically. We propose a computationally efficient algorithm and show that it generates a sequence of increasingly better policies.

We consider MDPs with finite state and action spaces, and a reward function r defined on the state space. The distribution of the state at time t+1 is a function of the state x_t and action a_t at the previous time t. We define a transition matrix P, with rows indexed by state-action pairs and columns indexed by subsequent states, so that P_{(x_t, a_t)} is the vector of probabilities of the state x_{t+1}. A policy π is a mapping from states to probability distributions over actions. We write π(a|x) for the probability of action a in state x under policy π. (We also use π(x_t) to denote the random action a_t distributed according to π(·|x_t).)

Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51. Copyright 2016 by the authors.
For a starting state x_0, the value function corresponding to π is defined by

    V^π(x_0) = E[ Σ_{t=0}^∞ γ^t r(x_t) ],    (1)

where γ ∈ (0, 1) is a discount factor, x_t is the state at time t, and a_t ∼ π(·|x_t). The expectation is over the stochasticity in the policy and in the evolution of states. The objective is to find a policy π such that the total cumulative reward V^π(x_0) is near-optimal. (The optimal policy is the one for which V^π(x_0) is maximized.) We assume that the reward function is bounded in [0, (1−γ)b] for some b ∈ (0, 1).

There is a vast literature on Markov decision problems and reinforcement learning (RL) (Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996). Dynamic programming (DP) algorithms, such as value iteration and policy iteration, are standard techniques for computing the optimal policy. In large state space problems, exact DP is not feasible, because the computational complexity scales at least quadratically with the number of states. In such problems, the optimal value function can be approximated with a linear combination of a small number of features, with the understanding that searching in this low-dimensional subspace is easier than solving the original problem. Unlike exact DP, approximate DP does not necessarily improve the policy in each iteration (Kakade and Langford, 2002).

Given a stochastic policy π, our method finds an estimate V̂^π for its value, and returns an improved policy π̃ such that V^{π̃}(x_0) ≥ V^π(x_0) − E(V̂^π, V^π) + Δ, for some policy evaluation error E(V̂^π, V^π) and some positive scalar Δ. Our performance bounds are composed of a policy evaluation (PE) error term and a positive policy improvement (PI) term. The main advantage of both our method and CPI, by comparison with API, is that we can obtain strict policy improvement as long as the PI term is bigger than the PE term. If the PE error is very large, our algorithm might fail to improve the policy. The same is true of the CPI approach of
Kakade and Langford (2002). Value estimates, however, are needed only at the states that the agent visits under the policy. Estimates can be obtained by performing roll-outs from the current state. By choosing the number of roll-outs appropriately, we can control the accuracy of these estimates, and thus ensure policy improvement. For API, the performance is only guaranteed to not degrade by more than the PE error. The policy π̃ is randomized, assigning larger probabilities to actions with larger value estimates.

The closest to our work is the Conservative Policy Iteration (CPI) of Kakade and Langford (2002), which uses an approximate greedy update. Pirotta et al. (2013) study several extensions of CPI. Thomas et al. (2015) propose a different approach that guarantees safe policy improvement, but the computational complexity of their method is high.

Our contributions are as follows: (1) We propose a policy iteration scheme that makes a step towards the greedy policy; however, unlike CPI, the mixture coefficients are state-dependent, and unlike Pirotta et al. (2013), these state-dependent coefficients can be computed efficiently. (2) We analyze the proposed algorithm and show that its performance improvement is larger than that of CPI. While the improvement in CPI has the form of the quadratic of an expectation, our improvement has the form of the expectation of quadratics. Moreover, the mixture coefficients can be significantly larger in our updates, making our algorithm practical while guaranteed to improve the initial policy. (3) We study the proposed algorithm numerically on chain-walk and inverted-pendulum benchmarks, showing that it performs well in these domains.

1.1 Notation

The expectation of a random variable z with respect to a distribution v is denoted by E_v z = Σ_p v(p) z(p), where the summation is over the countable domain of z. For a policy π, we write E_{π(·|x)} z = Σ_a π(a|x) z(x, a) and E_{π(·|x)} P z = Σ_a π(a|x) P_{(x,a)} z. Similarly, Var_{π(·|x)} z = E_{π(·|x)} z² − (E_{π(·|x)} z)². Variables z and y can be scalars, vectors, or matrices.
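As a small concrete check of this notation (a sketch; the function names are mine, not from the paper), the per-state expectation and variance under a policy are just probability-weighted sums over actions:

```python
import numpy as np

def policy_expectation(pi_x, z_x):
    """E_{pi(.|x)} z = sum_a pi(a|x) z(x, a)."""
    return pi_x @ z_x

def policy_variance(pi_x, z_x):
    """Var_{pi(.|x)} z = E_{pi(.|x)} z^2 - (E_{pi(.|x)} z)^2."""
    mean = pi_x @ z_x
    return pi_x @ z_x**2 - mean**2
```

Here `pi_x` is the probability vector π(·|x) and `z_x` is the vector of values z(x, ·) over actions.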
We use P^π to denote the probability transition matrix under policy π. We use L^π to denote the Bellman operator: for any V ∈ R^X,

    (L^π V)(x) = Σ_a π(a|x) (r(x, a) + γ P_{(x,a)} V).

2 Algorithm

We assume that the reward is independent of the action. From here on, we use r(x) to represent r(x, a), since the reward is independent of a ∈ A. (For any V ∈ R^X, P_{(x,a)} V = Σ_{x'} P(x'|x, a) V(x').) Fix a constant b < 1 and scale rewards such that r(x) ∈ [0, (1−γ)b]. This implies that V^π(x) ∈ (0, b) for any policy π and state x.

We say a function V ∈ R^X is a consistent value estimate if for any state x,

    min_a (r(x) + γ P_{(x,a)} V) ≤ V(x) ≤ max_a (r(x) + γ P_{(x,a)} V).

Let π be an arbitrary policy. Let V^π ∈ R^X be the value of π. Let V̂^π ∈ R^X be an approximation of V^π, and define Q̂^π(x, a) = r(x) + γ P_{(x,a)} V̂^π. First, check if V̂^π is a consistent value estimate:

    min_a Q̂^π(x, a) ≤ V̂^π(x) ≤ max_a Q̂^π(x, a).    (2)

If (2) holds, find a policy ν such that

    V̂^π(x) = E_{ν(·|x)} Q̂^π(x, ·) + Var_{ν(·|x)} Q̂^π(x, ·).    (3)

Otherwise find a policy ν such that

    E_{π(·|x)} Q̂^π(x, ·) = E_{ν(·|x)} Q̂^π(x, ·) + Var_{ν(·|x)} Q̂^π(x, ·).    (4)

Equation (4) always has a solution ν. If we choose ν = π, then the LHS is no more than the RHS. On the other hand, if ν assigns all the probability mass to argmin_a Q̂^π(x, a), then Var_{ν(·|x)} Q̂^π(x, ·) = 0 and the LHS is no less than the RHS. As the RHS is a continuous function of ν, the above equation has a solution, and at least one solution is a convex combination of π(·|x) and 1{a = argmin_a Q̂^π(x, a)}. Similarly, (3) has a solution under condition (2). Because of monotonicity, the solution can be found efficiently by binary search.

Let Δ^π(x, a) = Q̂^π(x, a) − E_{ν(·|x)} Q̂^π(x, ·) and π̄(a|x) = ν(a|x)(1 + Δ^π(x, a)). Inclusion of the term E_{ν(·|x)} Q̂^π(x, ·) ensures that the probabilities sum to one: Σ_{a∈A} π̄(a|x) = 1 for all x ∈ X. In the absence of estimation error, that is, V̂^π = V^π, it can be shown that L^{π̄} V^π = V^π. (See Lemma 2.) Although π̄ might be different from π, it has the same value function: V^{π̄} = V^π.

Let F(π̄) = max_{x,a} |Δ^π(x, a)|. Choose s = 1/F(π̄) and define the policy

    π̃(a|x) = ν(a|x)(1 + s Δ^π(x, a)).    (5)
If Δ^π(x, a) = 0 for all x and a, we use the convention that 0 · 1/0 = 0. If we do not have access to a good estimate of F(π̄), choose s = 1/(γ max_x V̂^π(x)). This ensures that γ (P_{(x,a)} V̂^π) s ≤ 1. If we do not have access to a good estimate of max_x V̂^π(x), then we can use the more conservative choice of s = 1/(γb). In practice, when estimating F(π̄) and max_x V̂^π(x) is hard, we start from a large value of s and decrease it when we observe a negative π̃ value.
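Condition (4) can be solved per state by the binary search described above, over the convex combination of π(·|x) and the argmin indicator. The following is a minimal sketch under my own naming (not the authors' code): `q` is the vector Q̂^π(x, ·), `pi` is π(·|x), and `target` is E_{π(·|x)} Q̂^π(x, ·); the bracketing and monotonicity follow the argument in the text.

```python
import numpy as np

def solve_nu(q, pi, target, tol=1e-10):
    """Binary search for nu solving E_nu[q] + Var_nu[q] = target (Eq. (4)),
    over nu = (1 - beta) * pi + beta * e_amin, beta in [0, 1]."""
    e_min = np.zeros_like(pi)
    e_min[np.argmin(q)] = 1.0          # point mass on argmin_a q

    def rhs(beta):
        nu = (1.0 - beta) * pi + beta * e_min
        mean = nu @ q
        return mean + nu @ (q - mean) ** 2   # E_nu q + Var_nu q

    lo, hi = 0.0, 1.0                  # rhs(0) >= target >= rhs(1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) >= target:         # rhs decreases in beta
            lo = mid
        else:
            hi = mid
    return (1.0 - lo) * pi + lo * e_min
```

For a two-action state with q = (1, 0) and uniform π, this returns the ν with E_ν q + Var_ν q = 1/2, i.e. ν(a₁|x) = (2 − √2)/2.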
Input: Policy π, constant s;
for t = 1, 2, ... do
    Observe state x_t;
    Estimate V̂^π(x_t) and Q̂^π(x_t, a) for a ∈ A;
    if Inequality (2) holds then
        Obtain ν(·|x) such that (3) is satisfied;
    else
        Obtain ν(·|x) such that (4) is satisfied;
    end if
    Take action a sampled according to π̃(·|x_t) := ν(·|x_t)(1 + s Δ^π(x_t, ·)), where s is defined in the text;
end for

Figure 1: Linearized Policy Improvement Algorithm.

Input: Initial policy π_1, constant s, time horizon T;
for i = 1, 2, ..., I do
    Run policy π_i for T steps;
    Estimate V̂^{π_i};
    Obtain ν from (3) or (4);
    Define the new policy for all x, a: π_{i+1}(a|x) := ν(a|x)(1 + s Δ^{π_i}(x, a)), where s is defined in the text;
end for

Figure 2: Iterative LPI Algorithm.

The definition of s ensures that π̃(a|x) ≥ 0 for all x and a. In summary, we reshape policy π and obtain π̄, which has the same value function. Then π̃(·|x) is obtained by increasing the probability of actions with positive Δ^π(x, a). We calculate ν(·|x) only when we visit state x, so we do not need to perform these calculations for all states beforehand. We call the resulting algorithm the LPI algorithm, for Linearized Policy Improvement. Pseudo-code of the algorithm is given in Figure 1.

Let I(V) be the set of states for which V is a consistent value estimate. In Theorem 4, we show that for any starting state distribution c ∈ R^X, we have

    c^T V^{π̃} ≥ c^T V^π − c^T |V̂^π − V^π| + B(s−1)/2 − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|,    (6)

where v_{π̃,c}^T = c^T Σ_{t=0}^∞ γ^t (P^{π̃})^t and B = 2 Σ_x v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·). In particular, if V̂^π = V^π, then

    c^T V^{π̃} ≥ c^T V^π + B(s−1)/2.

All quantities on the RHS can be estimated by roll-outs, which provides an efficient way to estimate the policy improvement.

We can iterate the procedure of Figure 1 to improve the policy. The resulting algorithm, called Iterative LPI or ILPI, is shown in Figure 2.

Our update rule (5) has similarities with the CPI rule of Kakade and Langford (2002), although it is not a convex combination of the current policy and the greedy policy. Also, unlike CPI, our update is nonuniform across the state space.
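In code, one step of update (5) at a single state is a multiplicative tilt of ν. This sketch (my naming, not the paper's implementation) also mirrors the practical heuristic from the text of decreasing s when a negative probability appears:

```python
import numpy as np

def lpi_update(nu, q, s):
    """Update (5): pi_tilde(a|x) = nu(a|x) * (1 + s * Delta(x, a)),
    with Delta(x, a) = Q(x, a) - E_nu Q. Halve s until all
    probabilities are nonnegative, as suggested in the text."""
    delta = q - nu @ q               # centering makes the result sum to 1
    while np.any(nu * (1.0 + s * delta) < 0.0):
        s *= 0.5
    return nu * (1.0 + s * delta)
```

With ν uniform over two actions, q = (1, 0), and s = 1, this tilts the policy to (0.75, 0.25): probability mass moves toward the action with the larger Q̂ value while the probabilities still sum to one.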
Update rule (5) makes small changes to the current policy when there are small differences in Q̂^π values, and larger changes when the differences in Q̂^π values are more substantial. Interestingly, our theorem also reflects this; our theoretical improvement is more significant compared to CPI when differences in Q̂^π values vary across the state space. (See Section 2.3, where we show that our algorithm enjoys stronger performance guarantees.)

Policy improvement in (6) depends on the error in estimating the value of the previous policy π. An effective way to keep this error small is to perform roll-outs in states that we visit under policy π̃. Unfortunately, the computational cost increases exponentially with the number of iterations I in the ILPI algorithm, making this approach effective only when I is small. An alternative approach, which we use in our inverted-pendulum experiments in Section 3, is to estimate V^{π_i} by a linear combination of columns of a feature matrix: V̂^{π_i} = Φθ, where Φ ∈ R^{X×d} is a feature matrix and θ ∈ R^d is a parameter vector. For example, we can use the value iteration algorithm to estimate θ: θ_0 = 0,

    θ_{k+1} = ( Σ_{x∈S} Φ(x)^T Φ(x) )^{-1} Σ_{x∈S} Φ(x)^T t_k(x, π_i(x)),

where S is a set of states visited while running policy π_i and t_k(x, a) = r(x) + γ P_{(x,a)} Φθ_k. Notice that in our performance guarantee (6), there is no estimation error in states where V̂^{π_i} is a consistent value estimate. For this reason, we propose the following modified procedure, where target values are thresholded with appropriate min/max values:

    θ_{k+1} = ( Σ_{x∈S} Φ(x)^T Φ(x) )^{-1} Σ_{x∈S} Φ(x)^T y_k(x, π_i(x)),
where y_k(x, a) = r(x) + γ P_{(x,a)} z_k and

    z_k(x) = { min_a t_k(x, a)   if Φ(x)θ_k < min_a t_k(x, a);
               max_a t_k(x, a)   if Φ(x)θ_k > max_a t_k(x, a);
               Φ(x)θ_k           otherwise. }

2.1 Analysis

In this section, we show a performance bound for the LPI algorithm. We start with a useful lemma that expresses the objective c^T V^{π'} in terms of c^T V and a Bellman error. The lemma is from Kakade and Langford (2002). Its proof can also be extracted from the proof of Theorem 1 of de Farias and Van Roy (2003).

Lemma 1 (Kakade and Langford (2002)). Fix a policy π' and vectors V, c ∈ R^X. Let P^{π'} denote the probability transition kernel under policy π'. Define the measure

    v_{π',c}^T = c^T Σ_{t=0}^∞ γ^t (P^{π'})^t = c^T (I − γ P^{π'})^{-1}.    (7)

We have

    c^T V^{π'} = c^T V + v_{π',c}^T (L^{π'} V − V).    (8)

Lemma 2. Consider the policy π̄(a|x) = ν(a|x)(1 + Q̂^π(x, a) − E_{ν(·|x)} Q̂^π(x, ·)). Under Condition (3), we have (L^{π̄} V̂^π)(x) = V̂^π(x), and under Condition (4), we have (L^{π̄} V̂^π)(x) = E_{π(·|x)} Q̂^π(x, ·).

Proof. First consider Condition (4). We want to show that, for a state x,

    E_{π(·|x)} Q̂^π(x, ·) = Σ_a ν(a|x)(1 + Q̂^π(x, a) − E_{ν(·|x)} Q̂^π(x, ·)) Q̂^π(x, a).

Expanding the RHS,

    Σ_a ν(a|x)(1 + Q̂^π(x, a) − E_{ν(·|x)} Q̂^π(x, ·)) Q̂^π(x, a)
        = E_{ν(·|x)} Q̂^π(x, ·) + E_{ν(·|x)} Q̂^π(x, ·)² − (E_{ν(·|x)} Q̂^π(x, ·))²
        = E_{ν(·|x)} Q̂^π(x, ·) + Var_{ν(·|x)} Q̂^π(x, ·),

which equals E_{π(·|x)} Q̂^π(x, ·) by Condition (4). We have a similar argument when Condition (3) holds.

Lemma 3. Let π_w(a|x) = ν(a|x)(1 + Δ^π(x, a) w). Consider the function

    h(w) = c^T (w V̂^π) + v_{π̃,c}^T (L^{π_w}(w V̂^π) − w V̂^π).

Then h(w) = (1/2) B w² + g w + f, where

    f = v_{π̃,c}^T r,
    g = c^T V̂^π − v_{π̃,c}^T V̂^π + γ v_{π̃,c}^T (P^ν V̂^π),
    B = 2 Σ_x v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·).

The proof is in Appendix A. The main result of this section is as follows.

Theorem 4. Let I(V) be the set of states such that V is a consistent value estimate (as defined at the beginning of this section). For any starting state distribution c,

    c^T V^{π̃} ≥ c^T V^π − c^T |V̂^π − V^π| + B(s−1)/2 − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|.

Proof. Recall the definition of π_w and h(w) from Lemma 3. Notice that π_s = π̃.
The function h(w) can be written as

    h(w) = c^T (w V̂^π) + Σ_x v_{π̃,c}(x) ( Σ_a ν(a|x)(1 + w Δ^π(x, a))(r(x) + γ P_{(x,a)} w V̂^π) − w V̂^π(x) ).

We have that

    h(0) = v_{π̃,c}^T r = c^T (I − γ P^{π̃})^{-1} r = c^T V^{π̃},

where the second equality holds by the definition of v_{π̃,c} in Lemma 1. If we set V = s V̂^π and π' = π̃ = π_s, then it is apparent by comparing (8) with the definition of h(·) in Lemma 3 that h(s) = c^T V^{π̃}. Thus, h(0) = h(s). On the other hand,

    h(1) = c^T V̂^π + Σ_x v_{π̃,c}(x) ((L^{π̄} V̂^π)(x) − V̂^π(x))
         ≥ c^T V^π − c^T |V̂^π − V^π| − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |(L^{π̄} V̂^π)(x) − V̂^π(x)|
         = c^T V^π − c^T |V̂^π − V^π| − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|,

where the terms with x ∈ I(V̂^π) vanish because (L^{π̄} V̂^π)(x) = V̂^π(x) there, and the last step holds by Lemma 2. Because h is convex and 0 < 1 < s, h(1) ≤ h(s).

We can calculate the improvement. Write h in the quadratic form h(w) = (1/2) B w² + g w + f, where B, g, f are defined in Lemma 3. We know that h(s) = h(0) = f. Thus the improvement is h(s) − h(1) = −g − B/2. On the other hand, h(s) = B s²/2 + g s + f = f, and so g = −Bs/2. Thus, h(s) − h(1) = B(s−1)/2, from which the theorem statement follows.
2.2 Choosing s

As Theorem 4 suggests, a bigger value of s gives a bigger policy improvement. On the other hand, the analysis is valid as long as the probabilities π̃(a|x) = ν(a|x)(1 + s Δ^π(x, a)) are positive, and this prevents us from choosing very large values of s. The next corollary relaxes the positivity condition and shows that if these probabilities are negative only in a small subset of the state space, we can still have policy improvement.

Corollary 5. Let G be the set of good states where ν(·|x)(1 + s Δ^π(x, ·)) is positive, and let B̄ = X \ G. Define the policy

    π'_w(a|x) = { ν(a|x)(1 + w Δ^π(x, a))   if x ∈ G;
                  ν(a|x)(1 + Δ^π(x, a))     if x ∈ B̄, }

and π̃ = π'_s. Let

    B = 2 Σ_x v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·).

We have that

    c^T V^{π̃} ≥ c^T V^π − c^T |V̂^π − V^π| − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|
                 − (s−1) Σ_{x ∈ B̄} v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·) + B(s−1)/2.

Proof. Consider the function

    h'(w) = c^T (w V̂^π) + v_{π̃,c}^T (L^{π'_w}(w V̂^π) − w V̂^π).

Similar to the argument in the proof of Theorem 4, we have that h'(0) = v_{π̃,c}^T r = c^T V^{π̃} and h'(s) = c^T V^{π̃}. Thus, h'(0) = h'(s). As before,

    h'(1) ≥ c^T V^π − c^T |V̂^π − V^π| − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|.

Let B' = 2 Σ_{x ∈ G} v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·). The new h' is also quadratic and can be written as h'(w) = (1/2) B' w² + g' w + f', for some g' and f'. We know that h'(s) = h'(0) = f'. Thus the improvement is h'(s) − h'(1) = −g' − B'/2. On the other hand, h'(s) = B' s²/2 + g' s + f' = f', and so g' = −B's/2. Thus,

    h'(s) − h'(1) = B'(s−1)/2 = B(s−1)/2 − (s−1) Σ_{x ∈ B̄} v_{π̃,c}(x) Var_{ν(·|x)} Q̂^π(x, ·),

from which the statement follows.

Input: Initial policy π_1, negativity threshold ε, time horizon T, initial s_0;
for i = 1, 2, ..., I do
    s = s_0;
    repeat
        Run policy π_i for T steps;
        Estimate G_i(s) using (9);
        If G_i(s) > ε, set s = s/2;
    until G_i(s) ≤ ε;
    Estimate V̂^{π_i};
    Obtain ν_i from (3) or (4);
    Define the new π_{i+1} based on Corollary 5;
end for

Figure 3: The Adaptive Iterative LPI Algorithm.

In particular, if Σ_{x ∈ B̄} (1−γ) v_{π̃,c}(x) ≤ ε for some small ε (so that, using Var_{ν(·|x)} Q̂^π(x, ·) ≤ b²/4, the penalty for the bad states is at most ε b²(s−1)/(4(1−γ))), then

    c^T V^{π̃} ≥ c^T V^π − c^T |V̂^π − V^π| − ε b²(s−1)/(4(1−γ)) + B(s−1)/2
                 − Σ_{x ∉ I(V̂^π)} v_{π̃,c}(x) |E_{π̄(·|x)} Q̂^π(x, ·) − V̂^π(x)|.
This argument motivates an adaptive procedure for updating s: start from a big value of s and decrease it only when the frequency of visits to bad states becomes larger than a threshold. The adaptive algorithm, called AILPI, is shown in Figure 3. In the figure, π_i is the ith policy, ν_i is the corresponding base policy,

    π_{i+1}(a|x) = { ν_i(a|x)(1 + s Δ^{π_i}(x, a))   if x ∈ G;
                     ν_i(a|x)(1 + Δ^{π_i}(x, a))     if x ∈ B̄, }

    B_i(s) = { x : ∃a, ν_i(a|x)(1 + s Δ^{π_i}(x, a)) < 0 },

and

    G_i(s) = Σ_{x ∈ B_i(s)} (1−γ) v_{π_i,c}(x).    (9)

To simplify the presentation, we estimate G_i(s) after running the policy for a fixed number of rounds. We can also design a version that updates the estimate in an online fashion and decreases s as soon as the number of visits to bad states becomes large.

2.3 Comparison with Conservative Policy Iteration

Let us compare the performance bound in Theorem 4 with the performance bound of Conservative Policy Iteration. To simplify the argument, we assume the exact value functions are available and ν is the uniform
policy. Let Q^π(x, a) = r(x) + γ P_{(x,a)} V^π be the state-action value of policy π and let A^π(x, a) = Q^π(x, a) − V^π(x) be the advantage function. Let g_π(x) = argmax_a Q^π(x, a) be the greedy policy with respect to policy π, and let A^π_{π'}(x) = Σ_a π'(a|x) A^π(x, a) be the policy advantage of π' with respect to π. Let

    A_g = (1−γ) Σ_x v_{π,c}(x) A^π_{g_π}(x) = (1−γ) Σ_x v_{π,c}(x) (max_a Q^π(x, a) − V^π(x)).

Let E_CPI = A_g²/(8b). Kakade and Langford (2002) propose Conservative Policy Iteration, which uses an approximate greedy update

    π_CPI(a|x) = (1−α) π(a|x) + α 1{a = g_π(x)}    (10)

for some α ∈ (0, 1). Kakade and Langford (2002) show that, using the choice of α = (1−γ)A_g/(4b),

    c^T V^{π_CPI} ≥ c^T V^π + E_CPI.

Let N_x = max_a Q^π(x, a) − min_a Q^π(x, a) denote the range of Q^π(x, ·). The CPI improvement can be upper bounded by

    E_CPI ≤ (1/(8b)) ( Σ_x (1−γ) v_{π,c}(x) N_x )².

Theorem 4 shows an improvement of

    c^T V^{π̃} = c^T V^π + B(s−1)/2 = c^T V^π + (s−1) Σ_x v_{π̃,c}(x) Var_{ν(·|x)} Q^π(x, ·).

Define

    E_LPI := (s−1) Σ_x v_{π̃,c}(x) Var_{ν(·|x)} Q^π(x, ·).

Because ν is assumed to be uniform (over the two actions), Var_{ν(·|x)} Q^π(x, ·) = N_x²/4. Thus, E_LPI = ((s−1)/4) Σ_x v_{π̃,c}(x) N_x². Let us choose b = γ and s = 1/(bγ) (the most conservative choice of s). Thus

    E_LPI = ((1−γ²)/(4γ²)) Σ_x v_{π̃,c}(x) N_x².

Thus,

    E_LPI ≥ ((1+γ)/(4γ²)) Σ_x (1−γ) v_{π̃,c}(x) N_x²,
    E_CPI ≤ (1/(8γ)) ( Σ_x (1−γ) v_{π,c}(x) N_x )².

A direct comparison is not possible because v_{π̃,c} is different from v_{π,c}. If we assume that v_{π̃,c} and v_{π,c} are similar, by Jensen's inequality we expect E_LPI to be bigger than E_CPI. We attribute this difference to the fact that, unlike CPI, the mixture coefficient in our update rule is not constant and depends on the state and action. Even if N_x is uniform over the state space and equal to a constant N, we still have an improvement:

    E_LPI − E_CPI ≥ N² ( (1+γ)/(4γ²) − 1/(8γ) ) ≥ 3N²/(8γ).

In practice, the recommended choice of α = (1−γ)A_g/(4b) leads to very conservative updates and very slow progress (Scherrer, 2014). Often one needs to choose a much larger α to make CPI practical, but there are no theoretical guarantees for such choices.
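The Jensen step can be checked numerically. With any nonnegative weights summing to one (playing the role of (1−γ)v_{π,c}), the "expectation of quadratics" dominates the "quadratic of an expectation"; this is a toy check of the inequality, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.dirichlet(np.ones(20))        # nonnegative weights summing to 1
n = rng.uniform(0.0, 1.0, size=20)    # stand-ins for the per-state ranges N_x

lhs = v @ n**2                        # expectation of quadratics (LPI-style)
rhs = (v @ n) ** 2                    # quadratic of an expectation (CPI-style)
assert lhs >= rhs                     # Jensen's inequality
```

The gap lhs − rhs is the weighted variance of n, which is exactly why the advantage grows when the Q-value ranges vary across states.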
Scherrer (2014) proposes doing a line search to find the best α. But unlike our adaptive method, such a procedure lacks theoretical justification. As we show in the experiments, even our most conservative choice of s = 1/(bγ) results in faster progress than CPI.

The above argument assumes maximum variance for ν. If π is deterministic, then ν is also deterministic, Var_{ν(·|x)} Q^π(x, ·) = 0, B = 0, and the performance bound in Theorem 4 shows no improvement. CPI does not have this restriction and can be applied with initial deterministic policies. Also, we require rewards to be action-independent, while CPI applies to more general reward functions.

Let m_π = max_x ||1{· = argmax_a Q^π(x, a)} − π(·|x)||_1, ΔA^π_{π'} = max_{x,x'} |A^π_{π'}(x) − A^π_{π'}(x')|, and α* = (1−γ)A_g/(γ m_π ΔA^π_{g_π}). Pirotta et al. (2013) improve the theoretical analysis of Kakade and Langford (2002) and show that if α* ≤ 1 and we update the policy according to (10) with the choice of mixture coefficient α*, the policy improvement is at least A_g²/(2γ m_π ΔA^π_{g_π}). Although this improves upon CPI, estimating m_π and α* is computationally hard in large state problems. Pirotta et al. (2013) also propose a multi-parameter version that uses a different value of α for each state, but the improvement over the single-parameter version is not shown and the method is computationally expensive.

3 Experiments

We implemented the ILPI algorithm in Python and tested its performance on three problems: two chain walk problems and balancing an inverted pendulum. The performance of the algorithm is compared with the performance of CPI (Kakade and Langford, 2002).
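The chain-walk environment used below can be written down directly. This is a sketch of the environment only (my construction, not the authors' code; I assume the rewarding states 10 and 41 in the text are 1-based, so they become indices 9 and 40, and that an action that would step off the chain leaves the state unchanged):

```python
import numpy as np

def chain_walk(n=50, p=0.9):
    """Transition tensor P[a, x, x'] and reward vector for the n-state
    chain: each action moves in its direction w.p. p, opposite w.p. 1-p."""
    P = np.zeros((2, n, n))
    for x in range(n):
        left, right = max(x - 1, 0), min(x + 1, n - 1)
        P[0, x, left] += p       # action Left, intended move
        P[0, x, right] += 1 - p  # action Left, slip
        P[1, x, right] += p      # action Right, intended move
        P[1, x, left] += 1 - p   # action Right, slip
    r = np.zeros(n)
    r[[9, 40]] = 1.0             # states 10 and 41 (1-based) give reward +1
    return P, r
```

With such an explicit model, the "exact" versions of ILPI can compute V^π = (I − γP^π)^{-1} r directly rather than estimating it from roll-outs.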
Figure 4: Performance of ILPI on the chain walk benchmark (50 states). Each run is repeated 10 times, and means and standard deviations are reported. ILPI finds an optimal policy in fewer than 10 iterations.

3.1 Chain Walk Domains

We tested the performance of the algorithm on two simple chain walk problems. (See Section 9.1 in Lagoudakis and Parr (2003).) The first chain has 50 states, and there are two actions (Left and Right) available in each state. An action moves the state in the intended direction with probability 0.9, and moves the state in the opposite direction with probability 0.1. The reward is +1 in states 10 and 41, and is zero in other states. The discount factor is 0.9.

Figure 4 shows the performance of the exact version of the ILPI algorithm on this benchmark. The initial policy π_1 is the uniform random policy that takes Left and Right with equal probability. We chose s = 1/F(π_1) and b = 0.9 in the ILPI algorithm. Figure 4 shows that the ILPI algorithm achieves the performance of the optimal policy in fewer than 10 iterations. In comparison, the USPI algorithm of Pirotta et al. (2013) needs 74 iterations to achieve this performance. (The value of the optimal policy that we find is slightly different than the value reported by Pirotta et al. (2013).) CPI exhibits much slower progress (Pirotta et al., 2013).

The second chain has 4 states. The action set, discount factor, and transition dynamics are the same as before. Lagoudakis and Parr (2003) show that LSPI finds the optimal policy in this problem, although Koller and Parr (2000) show that an algorithm that is a combination of LSTD and policy improvement oscillates between the suboptimal policies RRRR and LLLL (always going to the right and always going to the left).

Figure 5: Performance of ILPI on the chain walk benchmark with 4 states. 95% confidence intervals are shown for approximate algorithms.

Figure 5 shows the performance of five versions of the ILPI algorithm on this benchmark. The initial policy
is always the uniform random policy that takes Left and Right with equal probability. The first three versions (shown by blue circles, stars, and red circles) use s_i = 1/F(π_i), s_i = 1/(γ max_x V̂^{π_i}(x)), and s = 1/(γb), respectively, and the value functions V^π are computed exactly. Notice that the first two versions change s_i adaptively in each iteration. The fourth version (shown by triangles) uses s = 1/(γb). Value functions are estimated by averaging over 4 roll-outs of length 20. Other quantities (ν and Q̂^π) are also estimated by averaging over 4 samples. The last version (shown by the pink line) uses only one roll-out to estimate each quantity. This last version fails to improve the initial policy (apparently due to large estimation errors). We also show the performance of the CPI algorithm, which improves the policies very slowly. Pirotta et al. (2013) show that their algorithms find a near-optimal policy in 49 iterations; however, as discussed in Section 2.3, these approximate algorithms use the quantity m_π = max_x ||1{· = argmax_a Q^π(x, a)} − π(·|x)||_1, and having access to such a quantity for an approximate algorithm is questionable.

We make a few observations. First, all versions of the exact ILPI algorithm are faster than CPI. Second, using roll-outs to estimate value functions is sufficient to improve policies; however, the number of roll-outs should be sufficiently large so that estimation errors become small.

3.2 Inverted Pendulum

The problem is to balance an inverted pendulum at the upright position by applying horizontal forces to the cart that the pendulum is attached to. The length and mass of the pendulum are unknown to the learner. The actions are left force (−50N), right force (50N), or no force (0N). A uniform perturbation in [−10, 10] is added to the action. The state vector consists of the vertical
angle θ and the angular velocity θ̇ of the pendulum. Given an action a, the state evolves according to

    θ̈ = ( 9.8 sin(θ) − α m l (θ̇)² sin(2θ)/2 − α cos(θ) a ) / ( 4l/3 − α m l cos²(θ) ).

Here, m = 2 kg is the mass of the pendulum, M = 8 kg is the mass of the cart, l = 0.5 m is the length of the pendulum, and α = 1/(m + M). The simulation step is 0.1 seconds. The objective is to keep the angle in [−π/2, π/2]. An episode ends when the angle of the pendulum is outside this interval or when the episode exceeds 3000 steps.

Figure 6: Performance of ILPI and AILPI on the inverted pendulum benchmark. 95% confidence intervals are shown. (a) Performance of ILPI (s = 1/(γb)). (b) Performance of AILPI. (c) Performance of ILPI (s = 100).

We tested the performance of the iterative policy improvement algorithm on this problem. We used 10 basis functions to estimate the value of policies:

    Ψ(x) = (1, exp(−||x − p_1||²/2), ..., exp(−||x − p_9||²/2)),

where {p_1, ..., p_9} = {−π/4, 0, π/4} × {−1, 0, +1}. To estimate the value of policy π_i, we collected data by running π_i for 100 episodes. Then we used this data and estimated V̂^{π_i} by an approximate value iteration (AVI) algorithm (using the additional trick that we introduced at the end of Section 2). The number of iterations of AVI is 100. We performed 20 policy improvements (so I = 20 in Figure 2). We chose γ = 0.95, b = 0.9, and s = 1/(γb) in the ILPI algorithm. Figure 6(a) shows the performance of the ILPI algorithm. The CPI algorithm exhibits very slow progress; even after 100 iterations, the number of balancing steps is less than 15.

The performance of the ILPI algorithm can be significantly improved by using a larger s. Because the state space is continuous, calculating max_x V̂^{π_i} or F(π_i) is not easy. Instead, we run the AILPI algorithm, which adaptively updates s. Figure 6(b) shows the performance of AILPI with initial s = 20. We choose ε = 0.2, and 100 episodes are used for value estimation. Figure 6(c) shows that ILPI with a fixed s = 100 finds the optimal policy in 2 iterations.
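The dynamics above are straightforward to simulate with Euler steps of 0.1 s. A minimal sketch (my code, using the constants from the text; the uniform noise on the action is omitted here):

```python
import math

def pendulum_step(theta, theta_dot, u, dt=0.1, m=2.0, M=8.0, l=0.5, g=9.8):
    """One Euler step of the inverted-pendulum dynamics from the text.
    u is the applied horizontal force; returns (theta, theta_dot)."""
    a = 1.0 / (m + M)
    num = (g * math.sin(theta)
           - a * m * l * theta_dot**2 * math.sin(2 * theta) / 2.0
           - a * math.cos(theta) * u)
    den = 4.0 * l / 3.0 - a * m * l * math.cos(theta) ** 2
    theta_ddot = num / den
    return theta + dt * theta_dot, theta_dot + dt * theta_ddot
```

An episode would then apply one of the three forces {−50, 0, 50} (plus noise) per step and terminate once |θ| > π/2 or 3000 steps elapse.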
4 Conclusions

We proposed a policy iteration algorithm that is guaranteed to improve the performance of the initial stochastic policy. We showed that the theoretical improvement is bigger than that of the Conservative Policy Iteration algorithm. Our theorem has two advantages compared with the guarantees that are known for CPI: First, the mixture coefficients are state-dependent, and because of this, our improvement has the form of the expectation of quadratics while the improvement of CPI has the form of the quadratic of an expectation. Second, our theorem allows for much bigger steps towards the greedy policy, hence faster convergence. Our experiments are consistent with these theoretical advantages.

Acknowledgements

We gratefully acknowledge the support of the Australian Research Council through an Australian Laureate Fellowship (FL ) and through the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).
References

D. P. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51, 2003.

S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In ICML, 2002.

D. Koller and R. Parr. Policy iteration for factored MDPs. In Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000.

M. G. Lagoudakis and R. Parr. Least-squares policy iteration. JMLR, 4:1107-1149, 2003.

M. Pirotta, M. Restelli, A. Pecorino, and D. Calandriello. Safe policy iteration. In ICML, 2013.

B. Scherrer. Approximate policy iteration schemes: A comparison. In ICML, 2014.

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. A Bradford Book. MIT Press, 1998.

P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence policy improvement. In ICML, 2015.
More information3.4 Numerical integration
3.4. Numericl integrtion 63 3.4 Numericl integrtion In mny economic pplictions it is necessry to compute the definite integrl of relvlued function f with respect to "weight" function w over n intervl [,
More informationSolution for Assignment 1 : Intro to Probability and Statistics, PAC learning
Solution for Assignment 1 : Intro to Probbility nd Sttistics, PAC lerning 10-701/15-781: Mchine Lerning (Fll 004) Due: Sept. 30th 004, Thursdy, Strt of clss Question 1. Bsic Probbility ( 18 pts) 1.1 (
More informationSUMMER KNOWHOW STUDY AND LEARNING CENTRE
SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18
More informationChapter 3 Solving Nonlinear Equations
Chpter 3 Solving Nonliner Equtions 3.1 Introduction The nonliner function of unknown vrible x is in the form of where n could be non-integer. Root is the numericl vlue of x tht stisfies f ( x) 0. Grphiclly,
More informationMath 270A: Numerical Linear Algebra
Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner
More informationAPPROXIMATE INTEGRATION
APPROXIMATE INTEGRATION. Introduction We hve seen tht there re functions whose nti-derivtives cnnot be expressed in closed form. For these resons ny definite integrl involving these integrnds cnnot be
More informationDiscrete Least-squares Approximations
Discrete Lest-squres Approximtions Given set of dt points (x, y ), (x, y ),, (x m, y m ), norml nd useful prctice in mny pplictions in sttistics, engineering nd other pplied sciences is to construct curve
More informationChapter 3 Polynomials
Dr M DRAIEF As described in the introduction of Chpter 1, pplictions of solving liner equtions rise in number of different settings In prticulr, we will in this chpter focus on the problem of modelling
More informationRiemann is the Mann! (But Lebesgue may besgue to differ.)
Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >
More informationLecture 1. Functional series. Pointwise and uniform convergence.
1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is
More informationChapter 4 Contravariance, Covariance, and Spacetime Diagrams
Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz
More informationReview of basic calculus
Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below
More informationMath& 152 Section Integration by Parts
Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible
More informationNumerical Integration. 1 Introduction. 2 Midpoint Rule, Trapezoid Rule, Simpson Rule. AMSC/CMSC 460/466 T. von Petersdorff 1
AMSC/CMSC 46/466 T. von Petersdorff 1 umericl Integrtion 1 Introduction We wnt to pproximte the integrl I := f xdx where we re given, b nd the function f s subroutine. We evlute f t points x 1,...,x n
More informationThe steps of the hypothesis test
ttisticl Methods I (EXT 7005) Pge 78 Mosquito species Time of dy A B C Mid morning 0.0088 5.4900 5.5000 Mid Afternoon.3400 0.0300 0.8700 Dusk 0.600 5.400 3.000 The Chi squre test sttistic is the sum of
More informationNUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.
NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with
More informationAbstract inner product spaces
WEEK 4 Abstrct inner product spces Definition An inner product spce is vector spce V over the rel field R equipped with rule for multiplying vectors, such tht the product of two vectors is sclr, nd the
More informationTHE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.
THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem
More informationA REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007
A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus
More informationNumerical Analysis: Trapezoidal and Simpson s Rule
nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More informationOnline Supplements to Performance-Based Contracts for Outpatient Medical Services
Jing, Png nd Svin: Performnce-bsed Contrcts Article submitted to Mnufcturing & Service Opertions Mngement; mnuscript no. MSOM-11-270.R2 1 Online Supplements to Performnce-Bsed Contrcts for Outptient Medicl
More information1 Probability Density Functions
Lis Yn CS 9 Continuous Distributions Lecture Notes #9 July 6, 28 Bsed on chpter by Chris Piech So fr, ll rndom vribles we hve seen hve been discrete. In ll the cses we hve seen in CS 9, this ment tht our
More informationMath 61CM - Solutions to homework 9
Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ
More informationThe Wave Equation I. MA 436 Kurt Bryan
1 Introduction The Wve Eqution I MA 436 Kurt Bryn Consider string stretching long the x xis, of indeterminte (or even infinite!) length. We wnt to derive n eqution which models the motion of the string
More informationLecture 21: Order statistics
Lecture : Order sttistics Suppose we hve N mesurements of sclr, x i =, N Tke ll mesurements nd sort them into scending order x x x 3 x N Define the mesured running integrl S N (x) = 0 for x < x = i/n for
More informationRecitation 3: More Applications of the Derivative
Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech
More information221B Lecture Notes WKB Method
Clssicl Limit B Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using
More informationDiscrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17
EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,
More informationAdvanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004
Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More information1 The Lagrange interpolation formula
Notes on Qudrture 1 The Lgrnge interpoltion formul We briefly recll the Lgrnge interpoltion formul. The strting point is collection of N + 1 rel points (x 0, y 0 ), (x 1, y 1 ),..., (x N, y N ), with x
More informationECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance
Generl structure ECO 37 Economics of Uncertinty Fll Term 007 Notes for lectures 4. Stochstic Dominnce Here we suppose tht the consequences re welth mounts denoted by W, which cn tke on ny vlue between
More informationDuality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below.
Dulity #. Second itertion for HW problem Recll our LP emple problem we hve been working on, in equlity form, is given below.,,,, 8 m F which, when written in slightly different form, is 8 F Recll tht we
More informationW. We shall do so one by one, starting with I 1, and we shall do it greedily, trying
Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)
More informationOrthogonal Polynomials and Least-Squares Approximations to Functions
Chpter Orthogonl Polynomils nd Lest-Squres Approximtions to Functions **4/5/3 ET. Discrete Lest-Squres Approximtions Given set of dt points (x,y ), (x,y ),..., (x m,y m ), norml nd useful prctice in mny
More informationLECTURE NOTE #12 PROF. ALAN YUILLE
LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, K-mens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of
More informationMIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model:
1 2 MIXED MODELS (Sections 17.7 17.8) Exmple: Suppose tht in the fiber breking strength exmple, the four mchines used were the only ones of interest, but the interest ws over wide rnge of opertors, nd
More informationAn approximation to the arithmetic-geometric mean. G.J.O. Jameson, Math. Gazette 98 (2014), 85 95
An pproximtion to the rithmetic-geometric men G.J.O. Jmeson, Mth. Gzette 98 (4), 85 95 Given positive numbers > b, consider the itertion given by =, b = b nd n+ = ( n + b n ), b n+ = ( n b n ) /. At ech
More informationSection 11.5 Estimation of difference of two proportions
ection.5 Estimtion of difference of two proportions As seen in estimtion of difference of two mens for nonnorml popultion bsed on lrge smple sizes, one cn use CLT in the pproximtion of the distribution
More informationNumerical Integration
Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the
More information8 Laplace s Method and Local Limit Theorems
8 Lplce s Method nd Locl Limit Theorems 8. Fourier Anlysis in Higher DImensions Most of the theorems of Fourier nlysis tht we hve proved hve nturl generliztions to higher dimensions, nd these cn be proved
More informationDiscrete Mathematics and Probability Theory Summer 2014 James Cook Note 17
CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking
More informationGoals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite
Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite
More informationp-adic Egyptian Fractions
p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction
More informationStudent Activity 3: Single Factor ANOVA
MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether
More informationLecture 19: Continuous Least Squares Approximation
Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for
More informationDriving Cycle Construction of City Road for Hybrid Bus Based on Markov Process Deng Pan1, a, Fengchun Sun1,b*, Hongwen He1, c, Jiankun Peng1, d
Interntionl Industril Informtics nd Computer Engineering Conference (IIICEC 15) Driving Cycle Construction of City Rod for Hybrid Bus Bsed on Mrkov Process Deng Pn1,, Fengchun Sun1,b*, Hongwen He1, c,
More informationDefinite integral. Mathematics FRDIS MENDELU
Definite integrl Mthemtics FRDIS MENDELU Simon Fišnrová Brno 1 Motivtion - re under curve Suppose, for simplicity, tht y = f(x) is nonnegtive nd continuous function defined on [, b]. Wht is the re of the
More informationSection 4.8. D v(t j 1 ) t. (4.8.1) j=1
Difference Equtions to Differentil Equtions Section.8 Distnce, Position, nd the Length of Curves Although we motivted the definition of the definite integrl with the notion of re, there re mny pplictions
More informationODE: Existence and Uniqueness of a Solution
Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =
More informationNumerical Integration
Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl
More informationa < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1
Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the
More informationP 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)
1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this
More informationdifferent methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s).
Mth 1A with Professor Stnkov Worksheet, Discussion #41; Wednesdy, 12/6/217 GSI nme: Roy Zho Problems 1. Write the integrl 3 dx s limit of Riemnn sums. Write it using 2 intervls using the 1 x different
More informationMath 426: Probability Final Exam Practice
Mth 46: Probbility Finl Exm Prctice. Computtionl problems 4. Let T k (n) denote the number of prtitions of the set {,..., n} into k nonempty subsets, where k n. Argue tht T k (n) kt k (n ) + T k (n ) by
More informationCBE 291b - Computation And Optimization For Engineers
The University of Western Ontrio Fculty of Engineering Science Deprtment of Chemicl nd Biochemicl Engineering CBE 9b - Computtion And Optimiztion For Engineers Mtlb Project Introduction Prof. A. Jutn Jn
More informationapproaches as n becomes larger and larger. Since e > 1, the graph of the natural exponential function is as below
. Eponentil nd rithmic functions.1 Eponentil Functions A function of the form f() =, > 0, 1 is clled n eponentil function. Its domin is the set of ll rel f ( 1) numbers. For n eponentil function f we hve.
More informationMAA 4212 Improper Integrals
Notes by Dvid Groisser, Copyright c 1995; revised 2002, 2009, 2014 MAA 4212 Improper Integrls The Riemnn integrl, while perfectly well-defined, is too restrictive for mny purposes; there re functions which
More informationResearch Article Moment Inequalities and Complete Moment Convergence
Hindwi Publishing Corportion Journl of Inequlities nd Applictions Volume 2009, Article ID 271265, 14 pges doi:10.1155/2009/271265 Reserch Article Moment Inequlities nd Complete Moment Convergence Soo Hk
More information( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that
Arc Length of Curves in Three Dimensionl Spce If the vector function r(t) f(t) i + g(t) j + h(t) k trces out the curve C s t vries, we cn mesure distnces long C using formul nerly identicl to one tht we
More informationNear-Bayesian Exploration in Polynomial Time
J. Zico Kolter kolter@cs.stnford.edu Andrew Y. Ng ng@cs.stnford.edu Computer Science Deprtment, Stnford University, CA 94305 Abstrct We consider the explortion/exploittion problem in reinforcement lerning
More informationA recursive construction of efficiently decodable list-disjunct matrices
CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct
More informationAcceptance Sampling by Attributes
Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More informationEstimation of Binomial Distribution in the Light of Future Data
British Journl of Mthemtics & Computer Science 102: 1-7, 2015, Article no.bjmcs.19191 ISSN: 2231-0851 SCIENCEDOMAIN interntionl www.sciencedomin.org Estimtion of Binomil Distribution in the Light of Future
More informationLecture Note 9: Orthogonal Reduction
MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A
More informationDefinite integral. Mathematics FRDIS MENDELU. Simona Fišnarová (Mendel University) Definite integral MENDELU 1 / 30
Definite integrl Mthemtics FRDIS MENDELU Simon Fišnrová (Mendel University) Definite integrl MENDELU / Motivtion - re under curve Suppose, for simplicity, tht y = f(x) is nonnegtive nd continuous function
More informationEntropy and Ergodic Theory Notes 10: Large Deviations I
Entropy nd Ergodic Theory Notes 10: Lrge Devitions I 1 A chnge of convention This is our first lecture on pplictions of entropy in probbility theory. In probbility theory, the convention is tht ll logrithms
More informationPhysics 116C Solution of inhomogeneous ordinary differential equations using Green s functions
Physics 6C Solution of inhomogeneous ordinry differentil equtions using Green s functions Peter Young November 5, 29 Homogeneous Equtions We hve studied, especilly in long HW problem, second order liner
More informationMath 360: A primitive integral and elementary functions
Mth 360: A primitive integrl nd elementry functions D. DeTurck University of Pennsylvni October 16, 2017 D. DeTurck Mth 360 001 2017C: Integrl/functions 1 / 32 Setup for the integrl prtitions Definition:
More informationVariational Techniques for Sturm-Liouville Eigenvalue Problems
Vritionl Techniques for Sturm-Liouville Eigenvlue Problems Vlerie Cormni Deprtment of Mthemtics nd Sttistics University of Nebrsk, Lincoln Lincoln, NE 68588 Emil: vcormni@mth.unl.edu Rolf Ryhm Deprtment
More informationWe partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b.
Mth 255 - Vector lculus II Notes 4.2 Pth nd Line Integrls We begin with discussion of pth integrls (the book clls them sclr line integrls). We will do this for function of two vribles, but these ides cn
More informationModule 6: LINEAR TRANSFORMATIONS
Module 6: LINEAR TRANSFORMATIONS. Trnsformtions nd mtrices Trnsformtions re generliztions of functions. A vector x in some set S n is mpped into m nother vector y T( x). A trnsformtion is liner if, for
More informationMath 113 Exam 2 Practice
Mth 3 Exm Prctice Februry 8, 03 Exm will cover 7.4, 7.5, 7.7, 7.8, 8.-3 nd 8.5. Plese note tht integrtion skills lerned in erlier sections will still be needed for the mteril in 7.5, 7.8 nd chpter 8. This
More informationRiemann Sums and Riemann Integrals
Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 2013 Outline 1 Riemnn Sums 2 Riemnn Integrls 3 Properties
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment
More informationIntegral equations, eigenvalue, function interpolation
Integrl equtions, eigenvlue, function interpoltion Mrcin Chrząszcz mchrzsz@cernch Monte Crlo methods, 26 My, 2016 1 / Mrcin Chrząszcz (Universität Zürich) Integrl equtions, eigenvlue, function interpoltion
More information