A Fast and Reliable Policy Improvement Algorithm

Yasin Abbasi-Yadkori (Queensland University of Technology), Peter L. Bartlett (UC Berkeley and QUT), Stephen J. Wright (University of Wisconsin-Madison)

(Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51. Copyright 2016 by the authors.)

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

1 Introduction

Markov decision problems (MDPs) are sequential decision problems where loss has a memory (also known as a state). The objective is to find a policy, a mapping from states to actions, that yields a high discounted cumulative reward. In large-state problems, finding an optimal policy is challenging and one has to resort to approximations. Unfortunately, many approximate MDP algorithms do not always improve monotonically. We propose a computationally efficient algorithm and show that it generates a sequence of increasingly better policies.

We consider MDPs with finite state and action spaces, and a reward function r defined on the state space. The distribution of the state at time t+1 is a function of the state x_t and action a_t at the previous time t. We define a transition matrix P, with rows indexed by state-action pairs and columns indexed by subsequent states, so that P(x_t, a_t) is the vector of probabilities of state x_{t+1}. A policy π is a mapping from states to probability distributions over actions. We write π(a|x) as the probability of action a in state x under policy π. (We also use π(x_t) to denote the random action a_t distributed according to π(·|x_t).) For a starting state x_0, the value function corresponding to π is defined by

$$V_\pi(x_0) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t r(x_t)\Big], \qquad (1)$$

where γ ∈ (0, 1) is a discount factor, x_t is the state at time t, and a_t ∼ π(·|x_t). The expectation is over the stochasticity in the policy and in the evolution of states. The objective is to find a policy π such that the total cumulative reward V_π(x_0) is near-optimal. (The optimal policy is the one for which V_π(x_0) is maximized.) We assume that the reward function is bounded in [0, (1−γ)b] for some b ∈ (0, 1).

There is a vast literature on Markov decision problems and reinforcement learning (RL) (Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996). Dynamic programming (DP) algorithms, such as value iteration and policy iteration, are standard techniques for computing the optimal policy. In large state space problems, exact DP is not feasible, because the computational complexity scales at least quadratically with the number of states. In such problems, the optimal value function can be approximated with a linear combination of a small number of features, with the understanding that searching in this low dimensional subspace is easier than solving the original problem. Unlike exact DP, approximate DP does not necessarily improve the policy in each iteration (Kakade and Langford, 2002).

Given a stochastic policy π, our method finds an estimate Ṽ_π of its value, and returns an improved policy π̄ such that V_π̄(x_0) ≥ V_π(x_0) − E(Ṽ_π, V_π) + δ for some policy evaluation error E(Ṽ_π, V_π) and some positive scalar δ. Our performance bounds are composed of a policy evaluation (PE) error term and a positive policy improvement (PI) term. The main advantage of both our method and CPI, by comparison with API, is that we can obtain a strict policy improvement as long as the PI term is bigger than the PE term. If the PE error is very large, our algorithm might fail to improve the policy. The same is true of the CPI approach of Kakade and Langford (2002).

Value estimates, however, are needed only at the states that the agent visits under the policy. Estimates can be obtained by performing roll-outs from the current state. By choosing the number of roll-outs appropriately, we can control the accuracy of these estimates, and thus ensure policy improvement. For API, the performance is only guaranteed to not degrade by more than the PE error. The policy π̄ is randomized, assigning larger probabilities to actions with larger value estimates.

The closest to our work is Conservative Policy Iteration (CPI) of Kakade and Langford (2002), which uses an approximate greedy update. Pirotta et al. (2013) study several extensions of CPI. Thomas et al. (2015) propose a different approach that guarantees safe policy improvement, but the computational complexity of their method is high.

Our contributions are as follows: (1) We propose a policy iteration scheme that makes a step towards the greedy policy; however, unlike CPI, the mixture coefficients are state-dependent, and unlike Pirotta et al. (2013), these state-dependent coefficients can be computed efficiently. (2) We analyze the proposed algorithm and show that its performance improvement is larger than that of CPI. While the improvement in CPI has the form of the quadratic of an expectation, our improvement has the form of the expectation of quadratics. Moreover, the mixture coefficients can be significantly larger in our updates, making our algorithm practical while guaranteed to improve the initial policy. (3) We study the proposed algorithm numerically on chain-walk and inverted-pendulum benchmarks, showing that it performs well in these domains.

1.1 Notation

The expectation of a random variable z with respect to a distribution v is denoted by E_v z = Σ_p v(p) z(p), where the summation is over the countable domain of z. For a policy π, we write E_{π(·|x)} z = Σ_a π(a|x) z(x, a) and E_{π(·|x)} P z = Σ_a π(a|x) P_{(x,a)} z. Similarly, Var_{π(·|x)} z = E_{π(·|x)} z² − (E_{π(·|x)} z)². Variables z and y can be scalars, vectors, or matrices. We use P^π to denote the probability transition matrix under policy π. We use L_π to denote the Bellman operator: for any V ∈ R^X, (L_π V)(x) = Σ_a π(a|x)(r(x, a) + γ P_{(x,a)} V).

2 Algorithm

We assume that the reward is independent of the action. From here on, we use r(x) to represent r(x, a), since the reward is independent of a ∈ A. Fix a constant b < 1 and scale the rewards such that r(x) ∈ [0, (1−γ)b]. This implies that V_π(x) ∈ (0, b) for any policy π and state x. We say a function V ∈ R^X is a consistent value estimate if, for any state x,

$$\min_a \big(r(x) + \gamma P_{(x,a)} V\big) \;\le\; V(x) \;\le\; \max_a \big(r(x) + \gamma P_{(x,a)} V\big).$$

(Footnote 1: For any V ∈ R^X, P_{(x,a)} V = Σ_{x'} P(x'|x, a) V(x').)

Let π be an arbitrary policy. Let V_π ∈ R^X be the value of π. Let Ṽ_π ∈ R^X be an approximation of V_π, and define Q̃_π(x, a) = r(x) + γ P_{(x,a)} Ṽ_π. First, check if Ṽ_π is a consistent value estimate:

$$\min_a \widetilde Q_\pi(x, a) \;\le\; \widetilde V_\pi(x) \;\le\; \max_a \widetilde Q_\pi(x, a). \qquad (2)$$

If (2) holds, find a policy ν such that

$$\widetilde V_\pi(x) \;=\; E_{\nu(\cdot|x)} \widetilde Q_\pi(x, \cdot) + \mathrm{Var}_{\nu(\cdot|x)} \widetilde Q_\pi(x, \cdot). \qquad (3)$$

Otherwise find a policy ν such that

$$E_{\pi(\cdot|x)} \widetilde Q_\pi(x, \cdot) \;=\; E_{\nu(\cdot|x)} \widetilde Q_\pi(x, \cdot) + \mathrm{Var}_{\nu(\cdot|x)} \widetilde Q_\pi(x, \cdot). \qquad (4)$$

Equation (4) always has a solution ν. If we choose ν = π, then the LHS is no more than the RHS. On the other hand, if ν assigns all the probability mass to argmin_a Q̃_π(x, a), then Var_{ν(·|x)} Q̃_π(x, ·) = 0 and the LHS is no less than the RHS. As the RHS is a continuous function of ν, the above equation has a solution, and at least one solution is a convex combination of π(·|x) and 1{a = argmin_a Q̃_π(x, a)}. Similarly, (3) has a solution under condition (2). Because of monotonicity, the solution can be found efficiently by binary search. Let

Δ_π(x, a) = Q̃_π(x, a) − E_{ν(·|x)} Q̃_π(x, ·)   and   π̃(a|x) = ν(a|x)(1 + Δ_π(x, a)).

Inclusion of the term E_{ν(·|x)} Q̃_π(x, ·) ensures that the probabilities sum to one: Σ_{a∈A} π̃(a|x) = 1 for all x ∈ X. In the absence of estimation error, that is, Ṽ_π = V_π, it can be shown that L_π̃ V_π = V_π. (See Lemma 2.) Although π̃ might be different from π, it has the same value function, V_π̃ = V_π. Let F(π̃) = max_{x,a} |Δ_π(x, a)|. Choose s = 1/F(π̃) and define the policy

$$\bar\pi(a|x) \;=\; \nu(a|x)\big(1 + s\,\Delta_\pi(x, a)\big). \qquad (5)$$

If Δ_π(x, a) = 0 for all x and a, we use the convention that 0 · (1/0) = 0. If we do not have access to a good estimate of F(π̃), choose s = 1/(γ max_x Ṽ_π(x)). This ensures that γ (P_{(x,a)} Ṽ_π) s ≤ 1. If we do not have access to a good estimate of max_x Ṽ_π(x), then we can use the more conservative choice s = 1/(γb). In practice, when estimating F(π̃) and max_x Ṽ_π(x) is hard, we start from a large value of s and decrease it when we observe a negative π̄ value.
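To make the per-state computation concrete, here is a minimal Python sketch (Python is also the language used for the experiments in Section 3) of conditions (2)-(4) and the reshaping step (5). It is an illustrative sketch rather than the authors' implementation: the array layout, the bracketing endpoints used by the bisection, and the function name are assumptions.

```python
import numpy as np

def lpi_update(q, pi_x, v_tilde_x, s, tol=1e-10):
    """One LPI step at a single state x (a sketch of (2)-(5)).

    q         : numpy array of Q~_pi(x, a) over actions
    pi_x      : current policy probabilities pi(.|x)
    v_tilde_x : value estimate V~_pi(x)
    s         : step size, e.g. 1/F(pi~) or 1/(gamma*b)
    Returns pi_bar(.|x) = nu(.|x) * (1 + s * Delta_pi(x, .)).
    """
    # Condition (2): is V~_pi a consistent value estimate at x?
    consistent = q.min() <= v_tilde_x <= q.max()
    # Right-hand-side target of (3) (consistent case) or (4) (otherwise).
    target = v_tilde_x if consistent else float(pi_x @ q)

    def rhs(nu):
        # E_nu Q~ + Var_nu Q~
        m = float(nu @ q)
        return m + float(nu @ (q - m) ** 2)

    # Endpoints bracketing the target: the point mass on argmin Q~ gives
    # rhs = min(q) <= target; pi(.|x) or the argmax point mass gives rhs >= target.
    lo = np.eye(len(q))[int(np.argmin(q))]
    hi = pi_x if rhs(pi_x) >= target else np.eye(len(q))[int(np.argmax(q))]

    # Bisection over the segment between hi and lo (the binary search in the text).
    t_hi, t_lo = 0.0, 1.0
    while t_lo - t_hi > tol:
        t = 0.5 * (t_hi + t_lo)
        if rhs((1 - t) * hi + t * lo) >= target:
            t_hi = t
        else:
            t_lo = t
    nu = (1 - t_hi) * hi + t_hi * lo

    # Delta_pi(x, a) = Q~(x, a) - E_nu Q~(x, .), and the reshaped policy (5).
    delta = q - float(nu @ q)
    # Sums to one because E_nu Delta = 0; s must be small enough to keep it nonnegative.
    return nu * (1.0 + s * delta)
```

The bisection relies only on the facts stated above: the right-hand side of (3)/(4) is continuous in ν, and the two endpoints bracket the target value.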

Input: Policy π, constant s;
for t = 1, 2, ... do
  Observe state x_t;
  Estimate Ṽ_π(x_t) and Q̃_π(x_t, a) for a ∈ A;
  if Inequality (2) holds then obtain ν(·|x_t) such that (3) is satisfied;
  else obtain ν(·|x_t) such that (4) is satisfied;
  end if
  Take an action sampled according to π̄(·|x_t) := ν(·|x_t)(1 + s Δ_π(x_t, ·)), where s is defined in the text;
end for

Figure 1: Linearized Policy Improvement Algorithm.

Input: Initial policy π_1, constant s, time horizon T;
for i = 1, 2, ..., I do
  Run policy π_i for T steps;
  Estimate Ṽ_{π_i};
  Obtain ν from (3) or (4);
  Define the new policy for all x, a: π_{i+1}(a|x) := ν(a|x)(1 + s Δ_{π_i}(x, a)), where s is defined in the text;
end for

Figure 2: Iterative LPI Algorithm.

The definition of s ensures that π̄(a|x) ≥ 0 for all x and a. In summary, we reshape policy π and obtain π̃, which has the same value function. Then π̄(·|x) is obtained by increasing the probability of actions with positive Δ_π(x, a). We calculate ν(·|x) only when we visit state x, so we do not need to perform these calculations for all states beforehand. We call the resulting algorithm the LPI algorithm, for Linearized Policy Improvement. Pseudo-code of the algorithm is given in Figure 1.

Let I(V) be the set of states at which V is a consistent value estimate. In Theorem 4, we show that for any starting state distribution c ∈ R^X, we have

$$c^\top V_{\bar\pi} \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) + \frac{B(s-1)}{2} - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi,c}(x)\,\Big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x, \cdot) - \widetilde V_\pi(x) \Big|, \qquad (6)$$

where v_{π̄,c}^⊤ = c^⊤ Σ_{t=0}^∞ γ^t (P^π̄)^t and B = 2 Σ_x v_{π̄,c}(x) Var_{ν(·|x)} Q̃_π(x, a). In particular, if Ṽ_π = V_π, then c^⊤ V_π̄ ≥ c^⊤ V_π + B(s−1)/2. All quantities on the RHS can be estimated by roll-outs, which provides an efficient way to estimate the policy improvement. We can iterate the procedure of Figure 1 to improve the policy. The resulting algorithm, called Iterative LPI or ILPI, is shown in Figure 2.

Our update rule (5) has similarities with the CPI rule of Kakade and Langford (2002), although it is not a convex combination of the current policy and the greedy policy. Also, unlike CPI, our update is nonuniform across the state space. Update rule (5) makes small changes to the current policy when there are small differences in Q̃_π values, and larger changes when the differences in Q̃_π values are more substantial. Interestingly, our theorem also reflects this; our theoretical improvement is more significant compared to CPI when the differences in Q̃_π values vary across the state space. (See Section 2.3, where we show that our algorithm enjoys stronger performance guarantees.)

Policy improvement in (6) depends on the error in estimating the value of the previous policy π. An effective way to keep this error small is to perform roll-outs in the states that we visit under policy π. Unfortunately, the computational cost increases exponentially with the number of iterations I in the ILPI algorithm, making this approach effective only when I is small. An alternative approach, which we use in our inverted-pendulum experiments in Section 3, is to estimate V_{π_i} by a linear combination of the columns of a feature matrix: Ṽ_{π_i} ≈ Φθ, where Φ ∈ R^{X×d} is a feature matrix and θ ∈ R^d is a parameter vector. For example, we can use the value iteration algorithm to estimate θ: θ_0 = 0,

$$\theta_{k+1} = \Big(\sum_{x \in S} \Phi(x)^\top \Phi(x)\Big)^{-1} \sum_{x \in S} \Phi(x)^\top t_k(x, \pi_i(x)),$$

where S is a set of states visited while running policy π_i and t_k(x, a) = r(x) + γ P_{(x,a)} Φ θ_k. Notice that in our performance guarantee (6), there is no estimation error in states where Ṽ_{π_i} is a consistent value estimate. For this reason, we propose the following modified procedure, in which the target values are thresholded with appropriate min/max values:

$$\theta_{k+1} = \Big(\sum_{x \in S} \Phi(x)^\top \Phi(x)\Big)^{-1} \sum_{x \in S} \Phi(x)^\top y_k(x, \pi_i(x)),$$

where y_k(x, a) = r(x) + γ P_{(x,a)} z_k and

$$z_k(x) = \begin{cases} \min_a t_k(x,a) & \text{if } \Phi(x)\theta_k < \min_a t_k(x,a), \\ \max_a t_k(x,a) & \text{if } \Phi(x)\theta_k > \max_a t_k(x,a), \\ \Phi(x)\theta_k & \text{otherwise.} \end{cases}$$
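The following is a minimal Python sketch of this thresholded least-squares value iteration. It assumes the expected next-state features E[Φ(x')|x, a] are available for the sampled states (in practice they would be approximated from observed transitions), and it projects the clipped targets z_k back onto the feature space as one possible way to evaluate P_{(x,a)} z_k; these choices and the names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def thresholded_avi(phi, r, next_phi, policy_actions, gamma, n_iters=100):
    """Least-squares value iteration with thresholded targets (a sketch).

    phi            : (n, d) features Phi(x) for the visited states S
    r              : (n,) rewards r(x)
    next_phi       : (n, A, d) expected next-state features E[Phi(x') | x, a]
    policy_actions : (n,) integer action pi_i(x) chosen by the current policy
    gamma          : discount factor
    Returns theta with V~_{pi_i}(x) ~= Phi(x) theta.
    """
    n, d = phi.shape
    theta = np.zeros(d)
    gram_inv = np.linalg.pinv(phi.T @ phi)      # (sum_x Phi(x)^T Phi(x))^{-1}
    for _ in range(n_iters):
        # t_k(x, a) = r(x) + gamma * P_(x,a) Phi theta_k, for every action a.
        t_k = r[:, None] + gamma * (next_phi @ theta)              # (n, A)
        # z_k(x): clip the current estimate Phi(x) theta_k to [min_a t_k, max_a t_k].
        z_k = np.clip(phi @ theta, t_k.min(axis=1), t_k.max(axis=1))
        # y_k(x, a) = r(x) + gamma * P_(x,a) z_k. Since z_k is no longer exactly
        # linear in the features, project it back onto the feature space first
        # (an extra approximation made only in this sketch).
        theta_z = gram_inv @ (phi.T @ z_k)
        y_k = r + gamma * (next_phi[np.arange(n), policy_actions] @ theta_z)
        # Least-squares regression onto the thresholded targets.
        theta = gram_inv @ (phi.T @ y_k)
    return theta
```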

2.1 Analysis

In this section, we show a performance bound for the LPI algorithm. We start with a useful lemma that expresses the objective c^⊤ V_π in terms of c^⊤ V and a Bellman error. The lemma is from Kakade and Langford (2002). Its proof can also be extracted from the proof of Theorem 1 of de Farias and Van Roy (2003).

Lemma 1 (Kakade and Langford (2002)). Fix a policy π and vectors V, c ∈ R^X. Let P^π denote the probability transition kernel under policy π. Define the measure

$$v_{\pi,c}^\top \;=\; c^\top \sum_{t=0}^{\infty} \gamma^t (P^\pi)^t \;=\; c^\top (I - \gamma P^\pi)^{-1}. \qquad (7)$$

We have

$$c^\top V_\pi \;=\; c^\top V + v_{\pi,c}^\top (L_\pi V - V). \qquad (8)$$

Lemma 2. Consider the policy π̃(a|x) = ν(a|x)(1 + Q̃_π(x, a) − E_{ν(·|x)} Q̃_π(x, ·)). Under Condition (3), we have (L_π̃ Ṽ_π)(x) = Ṽ_π(x), and under Condition (4), we have (L_π̃ Ṽ_π)(x) = E_{π(·|x)} Q̃_π(x, ·).

Proof. First consider Condition (4). For any state x,

$$(L_{\tilde\pi} \widetilde V_\pi)(x) \;=\; \sum_a \nu(a|x)\big(1 + \widetilde Q_\pi(x,a) - E_{\nu(\cdot|x)}\widetilde Q_\pi(x,\cdot)\big)\,\widetilde Q_\pi(x,a) \;=\; E_{\nu(\cdot|x)}\widetilde Q_\pi(x,\cdot) + E_{\nu(\cdot|x)}\widetilde Q_\pi(x,\cdot)^2 - \big(E_{\nu(\cdot|x)}\widetilde Q_\pi(x,\cdot)\big)^2,$$

which equals E_{ν(·|x)} Q̃_π(x, ·) + Var_{ν(·|x)} Q̃_π(x, ·) and hence, by Condition (4), equals E_{π(·|x)} Q̃_π(x, ·). We have a similar argument when Condition (3) holds.

Lemma 3. Let π_w(a|x) = ν(a|x)(1 + Δ_π(x, a) w). Consider the function

$$h(w) \;=\; c^\top (w \widetilde V_\pi) + v_{\bar\pi,c}^\top \big( L_{\pi_w}(w \widetilde V_\pi) - w \widetilde V_\pi \big).$$

Then h(w) = ½ B w² + g w + f, where

f = v_{π̄,c}^⊤ r,   g = c^⊤ Ṽ_π − v_{π̄,c}^⊤ Ṽ_π + γ v_{π̄,c}^⊤ (P^ν Ṽ_π),   B = 2 Σ_x v_{π̄,c}(x) Var_{ν(·|x)} Q̃_π(x, a).

The proof is in Appendix A. The main result of this section is as follows.

Theorem 4. Let I(V) be the set of states such that V is a consistent value estimate (as defined in the beginning of this section). For any starting state distribution c,

$$c^\top V_{\bar\pi} \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) + \frac{B(s-1)}{2} - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi,c}(x)\,\Big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x, \cdot) - \widetilde V_\pi(x) \Big|.$$

Proof. Recall the definition of π_w and h(w) from Lemma 3. Notice that π_s = π̄. The function h(w) can be written as

$$h(w) = c^\top (w\widetilde V_\pi) + \sum_x v_{\bar\pi,c}(x) \Big( \sum_a \nu(a|x)\big(1 + w\Delta_\pi(x,a)\big)\big(r(x) + \gamma w P_{(x,a)} \widetilde V_\pi\big) - w\widetilde V_\pi(x) \Big).$$

We have that

$$h(0) = v_{\bar\pi,c}^\top r = c^\top (I - \gamma P^{\bar\pi})^{-1} r = c^\top V_{\bar\pi},$$

where the second equality holds by the definition of the measure in Lemma 1. If we set V = s Ṽ_π and the policy to π̄ = π_s, then it is apparent by comparing (8) with the definition of h(·) in Lemma 3 that h(s) = c^⊤ V_π̄. Thus, h(0) = h(s). On the other hand,

$$h(1) = c^\top \widetilde V_\pi + \sum_x v_{\bar\pi,c}(x)\big( (L_{\tilde\pi} \widetilde V_\pi)(x) - \widetilde V_\pi(x) \big) \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi,c}(x)\,\big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x,\cdot) - \widetilde V_\pi(x) \big|,$$

where the last step holds by Lemma 2: the Bellman residual of π̃ vanishes on the states in I(Ṽ_π) and equals E_{π(·|x)} Q̃_π(x, ·) − Ṽ_π(x) on the remaining states. Because h is convex and 0 < 1 < s, h(1) ≤ h(s). We can calculate the improvement: write h in the quadratic form h(w) = ½ B w² + g w + f, where B, g, f are defined in Lemma 3. We know that h(s) = h(0) = f. Thus the improvement is h(s) − h(1) = −g − B/2. On the other hand, h(s) = B s²/2 + g s + f = f, and so g = −B s/2. Thus, h(s) − h(1) = B(s−1)/2, from which the theorem statement follows.
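Appendix A, which contains the proof of Lemma 3, is not part of this transcription. For completeness, here is a brief editorial sketch of the expansion behind it, under the reading of h(·) used above and using the fact that the reward does not depend on the action:

```latex
\begin{align*}
h(w) &= c^\top (w\widetilde{V}_\pi)
      + \sum_x v_{\bar\pi,c}(x)\Big(\sum_a \nu(a|x)\big(1 + w\Delta_\pi(x,a)\big)
        \big(r(x) + \gamma w P_{(x,a)}\widetilde{V}_\pi\big) - w\widetilde{V}_\pi(x)\Big)\\
     &= \underbrace{v_{\bar\pi,c}^\top r}_{f}
      + w\,\underbrace{\Big(c^\top \widetilde{V}_\pi - v_{\bar\pi,c}^\top \widetilde{V}_\pi
        + \gamma\, v_{\bar\pi,c}^\top P^{\nu}\widetilde{V}_\pi\Big)}_{g}
      + w^2 \sum_x v_{\bar\pi,c}(x)\,
        \operatorname{Var}_{\nu(\cdot|x)}\widetilde{Q}_\pi(x,a).
\end{align*}
```

The cross terms involving Σ_a ν(a|x) Δ_π(x, a) r(x) vanish because E_{ν(·|x)} Δ_π(x, ·) = 0, and the w² coefficient uses γ P_{(x,a)} Ṽ_π = Q̃_π(x, a) − r(x), so that Σ_a ν(a|x) Δ_π(x, a) γ P_{(x,a)} Ṽ_π = Var_{ν(·|x)} Q̃_π(x, ·). Matching with h(w) = ½Bw² + gw + f gives the stated f, g, and B.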

2.2 Choosing s

As Theorem 4 suggests, a bigger value of s gives a bigger policy improvement. On the other hand, the analysis is valid as long as the probabilities π̄(a|x) = ν(a|x)(1 + s Δ_π(x, a)) are positive, and this prevents us from choosing very large values of s. The next corollary relaxes the positivity condition and shows that if these probabilities are negative only in a small subset of the state space, we can still have a policy improvement.

Corollary 5. Let G be the set of good states where ν(a|x)(1 + s Δ_π(x, a)) is positive for all a, and let X \ G be the remaining (bad) states. Define the policy

$$\bar\pi'_w(a|x) = \begin{cases} \nu(a|x)\big(1 + w\,\Delta_\pi(x,a)\big) & \text{if } x \in G, \\ \nu(a|x)\big(1 + \Delta_\pi(x,a)\big) & \text{if } x \in X \setminus G, \end{cases}$$

and π̄' = π̄'_s. Let B = 2 Σ_x v_{π̄',c}(x) Var_{ν(·|x)} Q̃_π(x, a). We have that

$$c^\top V_{\bar\pi'} \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) + \frac{B(s-1)}{2} - (s-1)\sum_{x \in X\setminus G} v_{\bar\pi',c}(x)\,\mathrm{Var}_{\nu(\cdot|x)}\widetilde Q_\pi(x,a) - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi',c}(x)\,\big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x,\cdot) - \widetilde V_\pi(x) \big|.$$

Proof. Consider the function

$$h'(w) = c^\top (w\widetilde V_\pi) + v_{\bar\pi',c}^\top\big( L_{\bar\pi'_w}(w\widetilde V_\pi) - w\widetilde V_\pi \big).$$

Similar to the argument in the proof of Theorem 4, we have that h'(0) = v_{π̄',c}^⊤ r = c^⊤ V_π̄' and h'(s) = c^⊤ V_π̄'. Thus, h'(0) = h'(s). As before,

$$h'(1) \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi',c}(x)\,\big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x,\cdot) - \widetilde V_\pi(x) \big|.$$

Let B' = 2 Σ_{x∈G} v_{π̄',c}(x) Var_{ν(·|x)} Q̃_π(x, a). The new h' is also quadratic and can be written as h'(w) = ½ B' w² + g' w + f', for some g' and f'. We know that h'(s) = h'(0) = f'. Thus the improvement is h'(s) − h'(1) = −g' − B'/2. On the other hand, h'(s) = B' s²/2 + g' s + f' = f', and so g' = −B' s/2. Thus,

$$h'(s) - h'(1) = \frac{B'(s-1)}{2} = \frac{B(s-1)}{2} - (s-1)\sum_{x \in X\setminus G} v_{\bar\pi',c}(x)\,\mathrm{Var}_{\nu(\cdot|x)}\widetilde Q_\pi(x,a),$$

from which the statement follows.

In particular, if Σ_{x∈X\G} (1−γ) v_{π̄',c}(x) ≤ ε for some small ε, then

$$c^\top V_{\bar\pi'} \;\ge\; c^\top V_\pi + c^\top(\widetilde V_\pi - V_\pi) + \frac{B(s-1)}{2} - \frac{\epsilon\, b^2 (s-1)}{4(1-\gamma)} - \sum_{x \notin I(\widetilde V_\pi)} v_{\bar\pi',c}(x)\,\big| E_{\pi(\cdot|x)} \widetilde Q_\pi(x,\cdot) - \widetilde V_\pi(x) \big|.$$

This argument motivates an adaptive procedure for updating s: start from a big value of s and decrease it only when the frequency of visits to bad states becomes larger than a threshold. The adaptive algorithm, called AILPI, is shown in Figure 3. In the figure, π_i is the ith policy, ν_i is the corresponding base policy,

$$\pi_{i+1}(a|x) = \begin{cases} \nu_i(a|x)\big(1 + s\,\Delta_{\pi_i}(x,a)\big) & \text{if } x \notin B_i(s), \\ \nu_i(a|x)\big(1 + \Delta_{\pi_i}(x,a)\big) & \text{if } x \in B_i(s), \end{cases}$$

B_i(s) = {x : ∃a, ν_i(a|x)(1 + s Δ_{π_i}(x, a)) < 0}, and

$$G_i(s) = \sum_{x \in B_i(s)} (1-\gamma)\, v_{\pi_i,c}(x). \qquad (9)$$

To simplify the presentation, we estimate G_i(s) after running the policy for a fixed number of rounds. We can also design a version that updates the estimate in an online fashion and decreases s as soon as the number of visits to bad states becomes large.

Input: Initial policy π_1, negativity threshold ε, time horizon T, initial s_0;
for i = 1, 2, ..., I do
  s = s_0;
  repeat
    Run policy π_i for T steps;
    Estimate G_i(s) using (9);
    If G_i(s) > ε, set s = s/2;
  until G_i(s) ≤ ε;
  Estimate Ṽ_{π_i};
  Obtain ν_i from (3) or (4);
  Define the new policy π_{i+1} based on Corollary 5;
end for

Figure 3: The Adaptive Iterative LPI Algorithm.
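A minimal Python sketch of the inner repeat loop of Figure 3, that is, the adaptive choice of s, is below. The two callables stand in for running the current policy and for membership in B_i(s); their names, and the single-rollout Monte Carlo estimator of (9), are illustrative assumptions.

```python
def estimate_G(visited_states, is_bad, gamma):
    """Monte Carlo estimate of G_i(s) in (9): (1 - gamma) times the discounted
    count of visits to bad states along a single rollout started from c."""
    return (1.0 - gamma) * sum(gamma ** t * float(is_bad(x))
                               for t, x in enumerate(visited_states))

def adapt_step_size(run_policy, is_bad_for, s0, eps, gamma, max_halvings=30):
    """Halve s until the estimated frequency of bad states drops below eps.

    run_policy : callable returning the states visited by pi_i over T steps
    is_bad_for : callable mapping s to a predicate x -> (x in B_i(s)), i.e.
                 whether some action gets nu_i(a|x)(1 + s*Delta(x, a)) < 0
    """
    s = s0
    for _ in range(max_halvings):
        if estimate_G(run_policy(), is_bad_for(s), gamma) <= eps:
            break
        s /= 2.0            # halve s when bad states are visited too often
    return s
```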

2.3 Comparison with Conservative Policy Iteration

Let us compare the performance bound in Theorem 4 with the performance bound of Conservative Policy Iteration. To simplify the argument, we assume the exact value functions are available and ν is the uniform policy. Let Q_π(x, a) = r(x) + γ P_{(x,a)} V_π be the state-action value of policy π and let A_π(x, a) = Q_π(x, a) − V_π(x) be the advantage function. Let g_π(x) = argmax_a Q_π(x, a) be the greedy policy with respect to policy π and let A^{π'}_π(x) = Σ_a π'(a|x) A_π(x, a) be the policy advantage of π' with respect to π. Let

$$A_g = (1-\gamma)\sum_x v_{\pi,c}(x)\, A^{g_\pi}_\pi(x) = (1-\gamma)\sum_x v_{\pi,c}(x)\big(\max_a Q_\pi(x,a) - V_\pi(x)\big).$$

Let E_CPI = A_g²/(8b). Kakade and Langford (2002) propose Conservative Policy Iteration, which uses the approximate greedy update

$$\pi_{CPI}(a|x) = (1-\alpha)\,\pi(a|x) + \alpha\,\mathbf{1}\{a = g_\pi(x)\} \qquad (10)$$

for some α ∈ (0, 1). Kakade and Langford (2002) show that, using the choice α = (1−γ) A_g/(4b), c^⊤ V_{π_CPI} ≥ c^⊤ V_π + E_CPI. Let N_x = max_a Q_π(x, a) − min_a Q_π(x, a) denote the range of Q_π(x, ·). The CPI improvement can be upper bounded by

$$E_{CPI} \;\le\; \frac{1}{8b}\Big(\sum_x (1-\gamma)\, v_{\pi,c}(x)\, N_x\Big)^2.$$

Theorem 4 shows an improvement of

$$c^\top V_{\bar\pi} \;\ge\; c^\top V_\pi + \frac{B(s-1)}{2} = c^\top V_\pi + (s-1)\sum_x v_{\bar\pi,c}(x)\,\mathrm{Var}_{\nu(\cdot|x)} Q_\pi(x,a).$$

Define

$$E_{LPI} \;\overset{\mathrm{def}}{=}\; (s-1)\sum_x v_{\bar\pi,c}(x)\,\mathrm{Var}_{\nu(\cdot|x)} Q_\pi(x,a).$$

Because ν is assumed to be uniform, Var_{ν(·|x)} Q_π(x, a) = N_x²/4. Thus, E_LPI = ((s−1)/4) Σ_x v_{π̄,c}(x) N_x². Let us choose b = γ and s = 1/(bγ) (the most conservative choice of s). Thus

$$E_{LPI} \;\ge\; \frac{1-\gamma^2}{4\gamma^2}\sum_x v_{\bar\pi,c}(x)\, N_x^2,$$

and therefore

$$E_{LPI} - E_{CPI} \;\ge\; \frac{1+\gamma}{4\gamma^2}\sum_x (1-\gamma)\, v_{\bar\pi,c}(x)\, N_x^2 \;-\; \frac{1}{8\gamma}\Big(\sum_x (1-\gamma)\, v_{\pi,c}(x)\, N_x\Big)^2.$$

A direct comparison is not possible because v_{π̄,c} is different from v_{π,c}. If we assume that v_{π̄,c} and v_{π,c} are similar, by Jensen's inequality we expect E_LPI to be bigger than E_CPI. We attribute this difference to the fact that, unlike CPI, the mixture coefficient in our update rule is not constant and depends on the state and action. Even if N_x is uniform over the state space and equal to a constant N, we still have an improvement:

$$E_{LPI} - E_{CPI} \;\ge\; N^2\Big(\frac{1+\gamma}{4\gamma^2} - \frac{1}{8\gamma}\Big) \;\ge\; \frac{3N^2}{8\gamma}.$$

In practice, the recommended choice α = (1−γ) A_g/(4b) leads to very conservative updates and very slow progress (Scherrer, 2014). Often one needs to choose a much larger α to make CPI practical, but there are no theoretical guarantees for such choices. Scherrer (2014) proposes doing a line search to find the best α. But unlike our adaptive method, such a procedure lacks theoretical justification. As we show in experiments, even our most conservative choice of s = 1/(bγ) results in faster progress than CPI.

The above argument assumes maximum variance for ν. If π is a deterministic policy, then ν is also deterministic, Var_{ν(·|x)} Q_π(x, a) = 0, B = 0, and the performance bound in Theorem 4 shows no improvement. CPI does not have this restriction and can be applied with initial deterministic policies. Also, we require rewards to be action-independent, while CPI applies to more general reward functions.

Let m_π = max_x ‖1{· = argmax_a Q_π(x, ·)} − π(·|x)‖_1, ΔA^{π'}_π = max_{x,x'} |A^{π'}_π(x) − A^{π'}_π(x')|, and α* = (1−γ) A_g/(γ m_π ΔA^{g_π}_π). Pirotta et al. (2013) improve the theoretical analysis of Kakade and Langford (2002) and show that if α* ≤ 1 and we update the policy according to (10) with the mixture coefficient α*, the policy improvement is at least A_g²/(2γ m_π ΔA^{g_π}_π). Although this improves upon CPI, estimating m_π and α* is computationally hard in large state problems. Pirotta et al. (2013) also propose a multi-parameter version that uses a different value of α for each state, but the improvement over the single parameter version is not shown and the method is computationally expensive.
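As a purely illustrative, editorial calculation (not from the paper), the two bounds can be evaluated for made-up visitation weights and Q-ranges; the caveat above that E_LPI and E_CPI involve different visitation measures still applies, and this toy computation ignores it.

```python
import numpy as np

gamma, b = 0.9, 0.9
s = 1.0 / (b * gamma)                      # the most conservative step size
w = np.array([0.5, 0.3, 0.2])              # toy values of (1 - gamma) * v(x), a distribution
N = np.array([0.05, 0.30, 0.60])           # toy ranges N_x of Q_pi(x, .)

# E_LPI = ((s - 1)/4) * sum_x v(x) N_x^2, rewritten in terms of w = (1 - gamma) v.
E_LPI = (s - 1) / (4 * (1 - gamma)) * np.sum(w * N ** 2)
# Upper bound on the CPI improvement: (1/(8b)) * (sum_x (1 - gamma) v(x) N_x)^2.
E_CPI = np.sum(w * N) ** 2 / (8 * b)
print(E_LPI, E_CPI)   # here the expectation of squares dominates the squared expectation
```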

3 Experiments

We implemented the ILPI algorithm in Python and tested its performance on three problems: two chain walk problems and balancing an inverted pendulum. The performance of the algorithm is compared with the performance of CPI (Kakade and Langford, 2002).

3.1 Chain Walk Domains

We tested the performance of the algorithm on two simple chain walk problems. (See Section 9.1 in Lagoudakis and Parr (2003).) The first chain has 50 states, and there are two actions (Left and Right) available in each state. An action moves the state in the intended direction with probability 0.9, and moves the state in the opposite direction with probability 0.1. The reward is +1 in states 10 and 41, and is zero in the other states. The discount factor is 0.9.

Figure 4: Performance of ILPI on the chain walk benchmark (50 states). Each run is repeated 10 times and means and standard deviations are reported. ILPI finds an optimal policy in less than 10 iterations.

Figure 4 shows the performance of the exact version of the ILPI algorithm on this benchmark. The initial policy π_1 is the uniform random policy that takes Left and Right with equal probability. We chose s = 1/F(π_1) and b = 0.9 in the ILPI algorithm. Figure 4 shows that the ILPI algorithm achieves the performance of the optimal policy in less than 10 iterations. In comparison, the USPI algorithm of Pirotta et al. (2013) needs 74 iterations to achieve this performance. (The value of the optimal policy that we find is slightly different from the value reported by Pirotta et al. (2013).) CPI exhibits much slower progress (Pirotta et al., 2013).

The second chain has 4 states. The action set, discount factor, and transition dynamics are the same as before. Lagoudakis and Parr (2003) show that LSPI finds the optimal policy in this problem, although Koller and Parr (2000) show that an algorithm that is a combination of LSTD and policy improvement oscillates between the suboptimal policies RRRR and LLLL (always going to the right and always going to the left).

Figure 5: Performance of ILPI on the chain walk benchmark with 4 states. 95% confidence intervals are shown for the approximate algorithms.

Figure 5 shows the performance of five versions of the ILPI algorithm on this benchmark. The initial policy is always the uniform random policy that takes Left and Right with equal probability. The first three versions (shown by blue circles, stars, and red circles) use s_i = 1/F(π_i), s_i = 1/(γ max_x Ṽ_{π_i}(x)), and s = 1/(γb), respectively, and the value functions Ṽ_π are computed exactly. Notice that the first two versions change s_i in each iteration adaptively. The fourth version (shown by triangles) uses s = 1/(γb); value functions are estimated by averaging over 4 roll-outs of length 20, and the other quantities (ν and Q̃_π) are also estimated by averaging over 4 samples. The last version (shown by the pink line) uses only one roll-out to estimate each quantity. This last version fails to improve the initial policy (apparently due to large estimation errors). We also show the performance of the CPI algorithm, which improves the policies very slowly. Pirotta et al. (2013) show that their algorithms find a near optimal policy in 49 iterations; however, as discussed in Section 2.3, these approximate algorithms use the quantity m_π = max_x ‖1{· = argmax_a Q_π(x, ·)} − π(·|x)‖_1, and having access to such a quantity for an approximate algorithm is questionable.

We make a few observations. First, all versions of the exact ILPI algorithm are faster than CPI. Second, using roll-outs to estimate value functions is sufficient to improve policies; however, the number of roll-outs should be sufficiently large so that estimation errors become small.
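A small Python sketch of the 50-state chain and of exact policy evaluation (as used by the exact version of ILPI) follows. The 0-indexing of states and the convention that a move off either end of the chain leaves the state unchanged are assumptions of this sketch.

```python
import numpy as np

def chain_walk(n=50, p=0.9, reward_states=(9, 40), gamma=0.9):
    """The 50-state chain walk described above (Lagoudakis and Parr, 2003).
    States are 0-indexed, so the rewarding states 10 and 41 become 9 and 40.
    Returns the transition tensor P of shape (n, 2, n) for (Left, Right) and rewards r."""
    P = np.zeros((n, 2, n))
    for x in range(n):
        left, right = max(x - 1, 0), min(x + 1, n - 1)
        P[x, 0, left] += p                 # Left succeeds with probability p
        P[x, 0, right] += 1 - p
        P[x, 1, right] += p                # Right succeeds with probability p
        P[x, 1, left] += 1 - p
    r = np.zeros(n)
    r[list(reward_states)] = 1.0
    return P, r, gamma

def evaluate_policy(P, r, gamma, pi):
    """Exact V_pi and Q_pi of a stochastic policy pi (n, 2): solve (I - gamma P^pi) V = r."""
    P_pi = np.einsum('xa,xay->xy', pi, P)
    V = np.linalg.solve(np.eye(len(r)) - gamma * P_pi, r)
    Q = r[:, None] + gamma * np.einsum('xay,y->xa', P, V)
    return V, Q
```

Starting from the uniform policy np.full((50, 2), 0.5) and alternating this exact evaluation with the per-state update sketched in Section 2 gives an exact-ILPI-style iteration.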
3.2 Inverted Pendulum

The problem is to balance an inverted pendulum at the upright position by applying horizontal forces to the cart that the pendulum is attached to. The length and mass of the pendulum are unknown to the learner. The actions are a left force (-50 N), a right force (50 N), or no force (0 N). A uniform perturbation in [-10, 10] is added to the action. The state vector consists of the vertical angle θ and the angular velocity θ̇ of the pendulum. Given an action a, the state evolves according to

$$\ddot\theta \;=\; \frac{9.8\,\sin(\theta) - \alpha m l\, \dot\theta^{\,2} \sin(2\theta)/2 - \alpha \cos(\theta)\, a}{4l/3 - \alpha m l \cos^2(\theta)},$$

where a denotes the applied (noisy) force, m = 2 kg is the mass of the pendulum, M = 8 kg is the mass of the cart, l = 0.5 m is the length of the pendulum, and α = 1/(m + M).
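A minimal Python sketch of one simulation step of these dynamics follows; the explicit Euler integration and the update order (velocity first, then angle) are assumptions of the sketch rather than details stated in the paper.

```python
import numpy as np

def pendulum_step(theta, theta_dot, action, dt=0.1, m=2.0, M=8.0, l=0.5):
    """One simulation step of the inverted pendulum dynamics given above.
    action is -50, 0 or +50 (Newtons); uniform noise in [-10, 10] is added."""
    a = action + np.random.uniform(-10.0, 10.0)
    alpha = 1.0 / (m + M)
    theta_ddot = (9.8 * np.sin(theta)
                  - 0.5 * alpha * m * l * theta_dot ** 2 * np.sin(2.0 * theta)
                  - alpha * np.cos(theta) * a) \
                 / (4.0 * l / 3.0 - alpha * m * l * np.cos(theta) ** 2)
    theta_dot = theta_dot + dt * theta_ddot    # explicit Euler step (assumed)
    theta = theta + dt * theta_dot
    done = abs(theta) > np.pi / 2.0            # the episode ends when the pole falls
    return theta, theta_dot, done
```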

(a) Performance of ILPI (s = 1/(γb)). (b) Performance of AILPI. (c) Performance of ILPI (s = 100).
Figure 6: Performance of ILPI and AILPI on the inverted pendulum benchmark. 95% confidence intervals are shown.

The simulation step is 0.1 seconds. The objective is to keep the angle in [-π/2, π/2]. An episode ends when the angle of the pendulum is outside this interval or when the episode exceeds 3000 steps. We tested the performance of the iterative policy improvement algorithm on this problem. We used 10 basis functions to estimate the value of policies: Ψ(x) = (1, exp(-‖x - p_1‖²/2), ..., exp(-‖x - p_9‖²/2)), where {p_1, ..., p_9} = {-π/4, 0, π/4} × {-1, 0, +1}. To estimate the value of policy π_i, we collected data by running π_i for 100 episodes. Then we used this data and estimated Ṽ_{π_i} by an approximate value iteration (AVI) algorithm (using the additional thresholding modification that we introduced at the end of Section 2). The number of iterations of AVI is 100. We performed 20 policy improvements (so I = 20 in Figure 2). We chose γ = 0.95, b = 0.9, and s = 1/(γb) in the ILPI algorithm.

Figure 6(a) shows the performance of the ILPI algorithm. The CPI algorithm exhibits very slow progress; even after 100 iterations, the number of balancing steps is less than 15. The performance of the ILPI algorithm can be significantly improved by using a larger s. Because the state space is continuous, calculating max_x Ṽ_{π_i} or F(π_i) is not easy. Instead, we run the AILPI algorithm, which adaptively updates s. Figure 6(b) shows the performance of AILPI with initial s = 20. We choose ε = 0.2, and 100 episodes are used for value estimation. Figure 6(c) shows that ILPI with a fixed s = 100 finds the optimal policy in 2 iterations.

4 Conclusions

We proposed a policy iteration algorithm that is guaranteed to improve the performance of the initial stochastic policy. We showed that the theoretical improvement is bigger than that of the Conservative Policy Iteration algorithm. Our theorem has two advantages compared with the guarantees that are known for CPI: first, the mixture coefficients are state-dependent, and because of this our improvement has the form of the expectation of quadratics while the improvement of CPI has the form of the quadratic of an expectation; second, our theorem allows for much bigger steps towards the greedy policy, hence faster convergence. Our experiments are consistent with these theoretical advantages.

Acknowledgements

We gratefully acknowledge the support of the Australian Research Council through an Australian Laureate Fellowship (FL) and through the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).

References

D. P. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51, 2003.

S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In ICML, 2002.

D. Koller and R. Parr. Policy iteration for factored MDPs. In Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000.

M. G. Lagoudakis and R. Parr. Least-squares policy iteration. JMLR, 4:1107-1149, 2003.

M. Pirotta, M. Restelli, A. Pecorino, and D. Calandriello. Safe policy iteration. In ICML, 2013.

B. Scherrer. Approximate policy iteration schemes: A comparison. In ICML, 2014.

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. A Bradford Book. MIT Press, 1998.

P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence policy improvement. In ICML, 2015.
