Metrics for Finite Markov Decision Processes


Norm Ferns
School of Computer Science, McGill University, Montréal, Canada, H3A 2A7

Prakash Panangaden
School of Computer Science, McGill University, Montréal, Canada, H3A 2A7

Doina Precup
School of Computer Science, McGill University, Montréal, Canada, H3A 2A7

Abstract

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.

1 Introduction

Markov decision processes (MDPs) offer a popular mathematical tool for planning and learning in the presence of uncertainty (Boutilier et al., 1999). MDPs are a standard formalism for describing multi-stage decision making in probabilistic environments. The objective of the decision making is to maximize a cumulative measure of long-term performance, called the return. Dynamic programming algorithms, e.g., value iteration or policy iteration (Puterman, 1994), allow us to compute the optimal expected return for any state, as well as the way of behaving (policy) that generates this return. However, in many practical applications, the state space of an MDP is simply too large, possibly even continuous, for such standard algorithms to be applied. A typical means of overcoming such circumstances is to partition the state space in the hope of obtaining an essentially equivalent reduced system. One defines a new MDP over the partition blocks, and if it is small enough, it can be solved by classical methods. The hope is that optimal values and policies for the reduced MDP can be extended to optimal values and policies for the original MDP.
Recent MDP research on defining equivalence relations on MDPs (Givan et al., 2003) has built on the notion of strong probabilistic bisimulation from concurrency theory. Bisimulation was introduced by Larsen and Skou (1991) based on ideas of Park (1981) and Milner (1980). Roughly speaking, two states of a process are deemed equivalent if all the transitions of one state can be matched by transitions of the other state, and the results are themselves bisimilar. The extension of bisimulation to transition systems with rewards was carried out in the context of MDPs by Givan, Dean and Greig (2003) and in the context of performance evaluation by Bernardo and Bravetti (2003). In both cases, the motivation is to use the equivalence relation to aggregate the states and get smaller state spaces. The basic notion of bisimulation is modified only slightly by the introduction of rewards.

The notion of equivalence for stochastic processes is problematic to use in practice because it requires that the transition probabilities agree exactly. This is not a robust concept, especially considering that usually, the numbers used in probabilistic models come from experimentation or are approximate estimates. A small change in probability estimates can cause bisimilar states to appear non-bisimilar. Dean, Givan and Leach (1997) addressed this issue by allowing the state space to be partitioned into blocks of states such that the states within a block are close in terms of their transition probabilities. However, their technique involves moving to a slightly generalized model, namely, the bounded-parameter MDP.

In this paper we address the same problem in a different way, by developing metrics, or distance functions, on the states of an MDP. Unlike an equivalence relation, a metric may vary smoothly as a function of the transition probabilities. Yet, a metric can be used to aggregate states in a manner similar to an equivalence relation. For example, we can choose a tolerance parameter, $\varepsilon$, and cluster together states that are in $\varepsilon$-neighborhoods. A metric can have broader applicability to other classes of function approximators as well.
For instance, a metric can be used in a nearest-neighbor approximator in order to decide on the data points to be used as prototypes.

The metrics we develop are based on the notion of bisimulation. More precisely, we will require that if one of our metrics assigns a distance of 0 to a pair of states, then those states have to be bisimilar. Thus, our metrics provide a quantitative analogue of bisimulation. Additionally, our metrics will possess the following pleasing property: if the system parameters of two bisimilar states are perturbed slightly, then the two states will remain close in metric distance. We build on previous work by Desharnais, Panangaden, Jagadeesan and Gupta (Desharnais et al., 1999; Desharnais et al., 2002) and by van Breugel and Worrell (2001), in which the theory of bisimulation, metrics and approximation was developed for labeled Markov processes with continuous state spaces. Their work was developed in the context of formal verification; here we take the first steps to apply and extend their results in the context of optimization problems. Although we present our work currently in the context of discrete MDPs, our recent research indicates that the results can be extended for continuous MDPs.

The paper is organized as follows. Sections 2 and 3 provide the definitions and theoretical results required to construct our metrics. In section 4 we introduce two kinds of bisimulation metrics and in section 5 provide bounds on the optimal value function of MDPs that can be obtained by using these metrics for state aggregation. In section 6 we provide some experimental results to compare and contrast our metrics. Section 7 contains conclusions and directions for future work.

2 Background

A finite Markov decision process consists of a finite set of states, $S$, a finite set of actions, $A$, and for every pair of states $s$ and $s'$ and action $a$, a Markovian state transition probability, $P^a_{ss'}$, and a numerical reward, $r^a_s$. In the rest of this work we will focus on a fixed, known MDP. Moreover, since rewards are necessarily bounded, we will assume without loss of generality that $\forall s, a.\ 0 \le r^a_s \le 1$.

We will now review briefly some basic definitions and results from MDP theory (e.g., Puterman, 1994). A way of behaving or policy is defined as a mapping from states to actions, $\pi : S \to A$.
The value of a state $s$ under policy $\pi$, $V^\pi(s)$, is defined as
\[ V^\pi(s) = E_\pi\Big[ \sum_{t=0}^{\infty} \gamma^t r_t \,\Big|\, s_0 = s \Big], \]
where $s_0$ is the state at time 0, $\gamma \in (0, 1)$ is a discount factor for future rewards, $r_t$ is the reward obtained at time $t$, and the expectation is achieved by following the state dynamics induced by $\pi$. The mapping $V^\pi : S \to \mathbb{R}$ is called the value function according to $\pi$. The goal of decision making in an MDP is to find a policy $\pi$ that maximizes $V^\pi(s)$ for each $s \in S$. Such a maximizing policy and its associated value function are said to be optimal. Note that while there may be many optimal policies, the optimal value function, $V^*$, is unique and satisfies a family of fixed point equations,
\[ V^*(s) = \max_{a \in A} \Big( r^a_s + \gamma \sum_{s' \in S} P^a_{ss'} V^*(s') \Big), \quad \forall s \in S. \]
These are known as the Bellman optimality equations. They lead to the following theorem, which expresses $V^*$ as the limit of a sequence of iterates.

Theorem 2.1. Let $V^0(s) = 0$ for all $s \in S$ and
\[ V^{n+1}(s) = \max_{a \in A} \Big( r^a_s + \gamma \sum_{s' \in S} P^a_{ss'} V^n(s') \Big), \quad \forall s \in S. \]
Then $\{V^n\}$ converges to $V^*$ uniformly.

These results can be realized via a dynamic programming (DP) algorithm that computes the value function up to a prescribed degree of accuracy. For example, if one is given a positive tolerance $\varepsilon$ then iterating until the maximum difference between consecutive iterates is at most $\frac{\varepsilon(1-\gamma)}{2\gamma}$ guarantees that the current iterate differs from the true value function by at most $\varepsilon$.

Footnote 1 (on the bounded-reward assumption): If rewards are bounded between $R_{\min}$ and $R_{\max}$, we can achieve $0 \le r^a_s \le 1$ by subtracting $R_{\min}$ from all rewards and dividing by $R_{\max} - R_{\min}$.

Unfortunately, it is sometimes the case that the state space is too large for DP to be feasible. A standard strategy is to approximate the given MDP by aggregating its state space. The hope is that one can obtain a smaller equivalent MDP, with an easily computable value function, that could provide information about the value function of the original MDP. Givan, Dean, and Greig (2003) investigated several notions of state equivalence and determined that the most appropriate is stochastic bisimulation:

Definition 2.2. A stochastic bisimulation relation is an equivalence relation $R$ on $S$ that satisfies the following property:
\[ sRs' \iff \forall a \in A,\ \Big( r^a_s = r^a_{s'} \ \text{and}\ \forall C \in S/R,\ P^a_s(C) = P^a_{s'}(C) \Big), \]
where $S/R$ is the state partition induced by $R$ and $P^a_s(C) = \sum_{c \in C} P^a_{sc}$.
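The iteration of Theorem 2.1, together with the stopping rule stated just before Definition 2.2, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `value_iteration` and the nested-list layout `P[a][s][t]`, `r[a][s]` are assumptions made for the example, and rewards are assumed to be already normalized to $[0, 1]$ with $0 < \gamma < 1$.

```python
def value_iteration(P, r, gamma, eps):
    """Approximate the optimal value function V* of a finite MDP.

    Hypothetical data layout (an assumption of this sketch):
      P[a][s][t] -- probability of moving from state s to state t under action a
      r[a][s]    -- reward for taking action a in state s, assumed in [0, 1]
    Requires 0 < gamma < 1 and eps > 0.
    """
    n_actions, n_states = len(r), len(r[0])
    # Stopping threshold from the text: eps * (1 - gamma) / (2 * gamma).
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    V = [0.0] * n_states  # V^0 = 0, as in Theorem 2.1
    while True:
        # One Bellman backup: V^{n+1}(s) = max_a (r_s^a + gamma * sum_t P_st^a V^n(t)).
        V_next = [
            max(
                r[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(V_next, V))
        V = V_next
        if gap <= threshold:
            return V


# Toy MDP with one action: state 0 moves to state 1 (reward 0),
# state 1 loops on itself (reward 1). With gamma = 0.5, V* = (1, 2).
P_toy = [[[0.0, 1.0], [0.0, 1.0]]]
r_toy = [[0.0, 1.0]]
V_toy = value_iteration(P_toy, r_toy, gamma=0.5, eps=1e-6)
```

Each sweep costs on the order of $|A||S|^2$ operations, which is the cost that becomes prohibitive for large state spaces and motivates aggregation.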
Stochastic bisimulation, $\sim$, is the largest stochastic bisimulation relation.

In (Givan et al., 2003) it was shown that the stochastic bisimulation (henceforth simply bisimulation) partition could be found by iteratively refining partitions based on rewards and equivalence class transition probabilities, beginning with an initial partition in which all states are lumped together. This could be done in $O(|A||S|^3)$ operations.

Unfortunately, bisimulation is too stringent. Consider the sample MDP in figure 1 with 4 states labeled $s$, $t$, $u$, and $v$, and 1 action labeled $a$. Suppose $r^a_v = 0$. Then all states are bisimilar, because they share the same immediate reward and transition among themselves w.p. 1. On the other hand, if $r^a_v > 0$ then $v$ is the only state in its bisimulation class since it is the only one with a positive reward. Moreover, $s$ and $t$ are bisimilar iff they share the same probability of transitioning to $v$'s bisimulation class. Each is bisimilar to $u$ iff that probability is zero. Thus, $u, s, t \nsim v$; $s \sim t \iff p = q$; $s \sim u \iff p = 1$; and $t \sim u \iff q = 1$.

[Figure 1: Sample MDP]

This example demonstrates that bisimulation is simply too strong a notion; if $r^a_v$ is just slightly positive, and $p$ differs only slightly from $q$ then we should expect $s$ and $t$ to be practically bisimilar. From the point of view of the value function, these states will also be very close, and one can argue that aggregating them would be safe. However, such a fine distinction cannot be made using bisimulation alone. Therefore, we seek a quantitative notion of bisimulation so that we can obtain a measure of how bisimilar two states are. To formulate such a notion we use semimetrics, distance functions on the state space.

Definition 2.3. A semimetric on $S$ is a map $d : S \times S \to [0, \infty)$ such that for all $s, s', s''$:
1. $s = s' \Rightarrow d(s, s') = 0$
2. $d(s, s') = d(s', s)$
3. $d(s, s'') \le d(s, s') + d(s', s'')$
If the converse of the first axiom holds as well, we say $d$ is a metric.

Let $M$ be the set of all semimetrics on $S$ that assign distances of at most 1. Note that every semimetric $d$ induces an equivalence relation, $R_d$, on $S$, obtained by equating points assigned distance zero by $d$.

Definition 2.4. We say that $d \in M$ is a bisimulation relation metric if $R_d$ is a bisimulation relation. We say that $d$ is a bisimulation metric if $R_d$ is $\sim$.

3 Probability metrics

Our goal is to construct a class of bisimulation metrics for use in MDP state aggregation. Specifically, such metrics would be required to be easily computable and provide information concerning the optimal values of states. However, if we denote by $\mathbb{I}_{X \nsim Y}$ the bisimulation metric that assigns distance 1 to states that are not bisimilar then it is not hard to show that $\mathbb{I}_{X \nsim Y}$ satisfies both requirements, while possessing no more distinguishing power than that of bisimulation itself.
So we additionally require that metric distances vary smoothly and proportionally with differences in rewards and differences in probabilities. Formally, we will construct bisimulation metrics via a metric on rewards and a metric on probability functions. The choice of metric on rewards is an obvious one: we simply use the absolute value of the difference. However, there are many ways of defining useful probability metrics (Gibbs & Su, 2002). Two of the most important are the Kantorovich metric and the total variation metric.

Given $d \in M$, the Kantorovich metric, $T_K(d)$, applied to state probability functions $P$ and $Q$ is defined by the following linear program:
\[ \max_{u_i} \sum_{i=1}^{|S|} \big( P(s_i) - Q(s_i) \big) u_i \]
subject to:
\[ \forall i, j.\ u_i - u_j \le d(s_i, s_j), \qquad \forall i.\ 0 \le u_i \le 1, \]
which is equivalent to the following dual program:
\[ \min_{l_{kj}} \sum_{k,j=1}^{|S|} l_{kj}\, d(s_k, s_j) \]
subject to:
\[ \forall k.\ \sum_{j} l_{kj} = P(s_k), \qquad \forall j.\ \sum_{k} l_{kj} = Q(s_j), \qquad \forall k, j.\ l_{kj} \ge 0. \]

The origins of the Kantorovich metric lie in mass transportation theory. Consider two copies of the state space, one in which states are labeled as supply nodes, and the other in which states are labeled as demand nodes. Each supply node has a supply whose value is equal to the probability mass of the corresponding state under $P$. Each demand has a value equal to the probability mass of the corresponding state under $Q$. Furthermore, imagine there is a transportation arc from each supply node to each demand node, labeled with a cost equal to the distance of the corresponding states under $d$. This constitutes a transportation network. A flow with respect to this network is an assignment of quantities to be shipped along each arc subject to the conditions that the total flow leaving a supply node is equal to its supply, and the total flow entering a demand node is equal to its demand. The cost of a flow along an arc is the value of the flow along that arc multiplied by the cost assigned to that arc. The goal of the Kantorovich optimal mass transportation problem is to find the best total flow for the given network, i.e. the flow of minimal cost.
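As a small worked illustration (an added example, not taken from the original text), consider a hypothetical two-state space $S = \{s_1, s_2\}$ with $d(s_1, s_2) = \eta \le 1$ and distributions $P = (p, 1-p)$, $Q = (q, 1-q)$ with $p \ge q$. Under these assumptions both linear programs can be solved by hand:

```latex
\begin{align*}
\text{primal:}\quad
  & \max_{u_1, u_2}\ (p - q)\,u_1 + \big((1-p) - (1-q)\big)\,u_2
    \;=\; \max_{u_1, u_2}\ (p - q)(u_1 - u_2) \\
  & \text{subject to } |u_1 - u_2| \le \eta,\quad 0 \le u_1, u_2 \le 1
    \quad\Longrightarrow\quad u_1 = \eta,\ u_2 = 0,\ \text{value } (p - q)\,\eta, \\[4pt]
\text{dual:}\quad
  & l_{11} = q,\quad l_{12} = p - q,\quad l_{21} = 0,\quad l_{22} = 1 - p \\
  & \Longrightarrow\quad \text{cost } l_{12}\, d(s_1, s_2) = (p - q)\,\eta .
\end{align*}
```

The flow ships the surplus mass $p - q$ from $s_1$ to $s_2$ and leaves everything else in place. The two values coincide, as linear programming duality requires, and for $\eta = 1$ the result reduces to $|p - q|$, which is the total variation distance between $P$ and $Q$.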
The mass transportation formulation is captured exactly in the dual program above. The distance assigned to $P$ and $Q$, $T_K(d)(P, Q)$, is the cost of the optimal flow, which is known to be computable in strongly polynomial time. This formulation can be computed in $O(|S|^2 \log |S|)$ time (Orlin, 1988).

Footnote 2 (on the choice of probability metrics): Note that the Kullback-Leibler divergence, also known as KL-distance, which is commonly used to estimate the similarity of probability distributions, is not a metric.

Since the underlying cost function $d$ is a semimetric, the Kantorovich metric may be further simplified.

Lemma 3.1. Let $d \in M$. Then
\[ T_K(d)(P, Q) = \max_{v_C} \sum_{C \in S/R_d} \big( P(C) - Q(C) \big) v_C \]
subject to:
\[ \forall C, D.\ v_C - v_D \le \min_{i \in C,\, j \in D} d(s_i, s_j), \qquad \forall C.\ 0 \le v_C \le 1, \]
and $T_K(d)(P, Q) = 0 \iff P(C) = Q(C),\ \forall C \in S/R_d$.

Proof. Let $(v_i)$ be any feasible solution to the primal LP for $T_K(d)(P, Q)$. Note that if $s_i R_d s_j$ then we must have $v_i = v_j$. Define for each $C \in S/R_d$, $v_C = v_i$ for some $s_i \in C$. Then collecting terms yields the desired expression. From this expression it is clear that if $P(C) = Q(C)$ for every equivalence class $C$, then $T_K(d)(P, Q) = 0$. For the converse, suppose that $\exists C$ such that $P(C) \ne Q(C)$. Without loss of generality, suppose $P(C) > Q(C)$. Clearly $C \ne S$, so we may take $v_C = \min_{k \in C,\, j \notin C} d(s_k, s_j) > 0$ and $v_D = 0$ for all other classes and obtain a positive lower bound on $T_K(d)(P, Q)$.

By contrast, the total variation probability metric, $T_{TV}$, is defined independently of $d$ by
\[ T_{TV}(P, Q) = \frac{1}{2} \sum_{s \in S} |P(s) - Q(s)|, \]
which is half the $L_1$-norm of $P - Q$. It clearly has the advantage of being simply defined and easily computable. Yet, it may still be placed within the previous context since $T_{TV}$ can be expressed as $T_K(\mathbb{I}_{X \ne Y})$, where $\mathbb{I}_{X \ne Y}$ is the discrete metric that assigns distance 1 to distinct states.

4 Bisimulation Metrics

Our construction of bisimulation metrics is heavily based on the following two lemmas, which are important consequences of lemma 3.1. Here the usefulness of the Kantorovich metric becomes evident.

Lemma 4.1. If $d$ is a bisimulation metric then $\forall s, s' \in S$,
\[ d(s, s') = 0 \iff \Big( \forall a \in A,\ r^a_s = r^a_{s'} \ \text{and}\ T_K(d)(P^a_s, P^a_{s'}) = 0 \Big). \qquad (1) \]

Since condition (1) is necessary for $d \in M$ to be a bisimulation metric, the question naturally arises as to whether or not it is sufficient as well. In general, the answer is negative. However, it is sufficient for $d$ to be a bisimulation relation metric.

Lemma 4.2. Suppose $d \in M$ satisfies (1).
Then $d(s, s') = 0 \Rightarrow s \sim s'$.

We have stated that our goal is to construct bisimulation metrics that provide useful information concerning the optimal values of states, but we have not mentioned how this can be done. For inspiration we look to the Bellman optimality equations for the optimal value function, which yield the following bound:
\[ |V^*(s) - V^*(s')| \le \max_{a \in A} \Big( |r^a_s - r^a_{s'}| + \gamma \Big| \sum_{u \in S} \big( P^a_{su} - P^a_{s'u} \big) V^*(u) \Big| \Big). \]
The first component of the RHS is simply the distance in immediate rewards, while the second component is strikingly similar to the primal LP for the Kantorovich distance in distributions. Based on these observations we fix a particular form for our bisimulation metrics, namely
\[ d(s, s') = \max_{a \in A} \Big( c_R\, |r^a_s - r^a_{s'}| + c_T\, d_P(P^a_s, P^a_{s'}) \Big), \]
where $d_P$ is some probability metric and $c_R$ and $c_T$ are two positive 1-bounded constants. Intuitively, these constants weight the importance given to the distance between rewards relative to the distance between transition probabilities respectively. For instance, in MDPs a natural choice would be $c_T = \gamma$ and $c_R = 1 - \gamma$. The particular choice of probability metric leads to two kinds of bisimulation metrics, which we now describe in detail.

4.1 Fixed-Point Metrics

In this section, we will use the Kantorovich distance as a basis for formulating a bisimulation metric. Before we do so, we need some definitions and results from fixed-point theory. These may be found, for example, in (Winskel, 1993). We present them in general notation first, then we explain it in the context of our problem.

Let $(X, \sqsubseteq)$ be a partial order. An $\omega$-chain of this partial order is an increasing sequence $\{x_n\}$. The partial order is said to be an $\omega$-complete partial order ($\omega$-cpo) if it contains least upper bounds of all $\omega$-chains. It is called an $\omega$-cpo with bottom if it additionally contains a least element, $\bot$, called bottom. A function $f : X \to Y$ between $\omega$-cpos is said to be monotonic if $x \sqsubseteq x' \Rightarrow f(x) \sqsubseteq f(x')$. It is continuous if for every $\omega$-chain $\{x_n\}$, $f(\bigsqcup_n x_n) = \bigsqcup_n f(x_n)$. A point $x \in X$ is said to be a prefixed-point of $f$ if $f(x) \sqsubseteq x$. It is a fixed-point if $x = f(x)$. With these definitions, the following important theorem can be established.

Theorem 4.3 (Fixed-Point Theorem).
Let $f : X \to X$ be a continuous function on an $\omega$-cpo with bottom $X$. Define $\mathrm{fix}(f) = \bigsqcup_n f^n(\bot)$. Then $\mathrm{fix}(f)$ is the least prefixed-point of $f$ and the least fixed-point of $f$.
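To make the construction $\mathrm{fix}(f) = \bigsqcup_n f^n(\bot)$ concrete, here is a minimal sketch on a much simpler partial order than the one used in this paper: subsets of a finite set ordered by inclusion, with the empty set as bottom. The graph `edges`, the function `step`, and the helper `least_fixed_point` are all hypothetical names introduced only for this illustration; on a finite lattice the chain of iterates stabilizes after finitely many steps, so the loop terminates.

```python
def least_fixed_point(f, bottom):
    """Iterate f from bottom until the value stops changing.

    For a monotone f on a finite partial order this returns the limit of
    the chain bottom, f(bottom), f(f(bottom)), ..., i.e. the least fixed point.
    """
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt


# Hypothetical directed graph used only for this example.
edges = {0: {1}, 1: {2}, 2: set(), 3: {0}}


def step(reached):
    """Monotone map on subsets: add node 0 and all successors of reached nodes."""
    successors = frozenset(t for s in reached for t in edges[s])
    return frozenset({0}) | reached | successors


# The least fixed point of `step` is the set of nodes reachable from node 0.
reachable = least_fixed_point(step, frozenset())
```

The iterates here are $\emptyset \subseteq \{0\} \subseteq \{0,1\} \subseteq \{0,1,2\}$, after which the value no longer changes. The bisimulation metric below is obtained by the same kind of iteration, with semimetrics ordered pointwise in place of sets ordered by inclusion, although there the limit is in general only approached, not reached in finitely many steps.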

In order to use this result, we equip $M$ with the usual pointwise ordering: $d \le d'$ iff $d(s, s') \le d'(s, s')$ for all $s, s' \in S$. As a result, we obtain an $\omega$-cpo with bottom, where $\bot$ is the constant zero function and $\bigsqcup_n d_n$ is given by $(\bigsqcup_n d_n)(s, s') = \sup_n d_n(s, s')$. Moreover, the same can be said of the set $M_P$ of semimetrics on the set of probability functions on $S$. With this in mind it is now easy to see that this ordering is preserved by the Kantorovich metric, i.e.

Lemma 4.4. $T_K : M \to M_P$ is continuous.

Proof. See appendix.

We are now ready to establish the bisimulation metric based on the Kantorovich probability metric:

Theorem 4.5. Let $c_R, c_T \ge 0$ with $c_R + c_T \le 1$. Define $F : M \to M$ by
\[ F(d)(s, s') = \max_{a \in A} \Big( c_R\, |r^a_s - r^a_{s'}| + c_T\, T_K(d)(P^a_s, P^a_{s'}) \Big). \]
Then $F$ has a least fixed-point, $d_{fix}$, and $d_{fix}$ is a bisimulation metric.

Proof. Clearly, existence of the least fixed-point will follow from theorem 4.3, so we only need to show that $F$ is continuous. For future reference we will denote the iterates, $F^n(\bot)$, by $d_n$ and remark that they form an $\omega$-chain in $M$. Continuity of $F$ follows from lemma 4.4, since it establishes the monotonicity of $F$, and from the fact that given an $\omega$-chain $\{x_n\}$ in $M$ and a pair of states $s$ and $s'$,
\begin{align*}
F\Big(\bigsqcup_n x_n\Big)(s, s')
 &= \max_{a} \Big( c_R |r^a_s - r^a_{s'}| + c_T\, T_K\Big(\bigsqcup_n x_n\Big)(P^a_s, P^a_{s'}) \Big) \\
 &= \max_{a} \Big( c_R |r^a_s - r^a_{s'}| + c_T \sup_n T_K(x_n)(P^a_s, P^a_{s'}) \Big) \\
 &= \sup_n \max_{a} \Big( c_R |r^a_s - r^a_{s'}| + c_T\, T_K(x_n)(P^a_s, P^a_{s'}) \Big) \\
 &= \sup_n F(x_n)(s, s') = \Big(\bigsqcup_n F(x_n)\Big)(s, s').
\end{align*}
So $d_{fix}$ exists, and $d_{fix} = \bigsqcup_n F^n(\bot)$. Note that by construction, $d_{fix}$ satisfies (1), and so, from lemma 4.2, $d_{fix}(s, s') = 0 \Rightarrow s \sim s'$. On the other hand, since $\mathbb{I}_{X \nsim Y}$ is a bisimulation metric, by applying lemma 4.1 and the definition of $F$, $F(\mathbb{I}_{X \nsim Y})$ is also a bisimulation metric. Therefore, $F(\mathbb{I}_{X \nsim Y}) \le \mathbb{I}_{X \nsim Y}$, i.e. $\mathbb{I}_{X \nsim Y}$ is a prefixed-point of $F$. So $d_{fix} \le \mathbb{I}_{X \nsim Y}$, since $d_{fix}$ is the least prefixed-point of $F$. Thus, $s \sim s' \Rightarrow d_{fix}(s, s') = 0$.

Note that by induction $d_{fix} - d_n \le c_T^n$ for every $n$. Thus, we can compute $d_{fix}$ up to a prescribed degree of accuracy $\delta$ by iteratively applying $F$ for $\lceil \ln \delta / \ln c_T \rceil$ steps. Since this essentially reduces to computing a Kantorovich metric at each iteration for every action and pair of states, $d_{fix}$ can be computed in $O\big(|A||S|^4 \log |S| \frac{\ln \delta}{\ln c_T}\big)$ operations.
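The step count above follows directly from the geometric bound $d_{fix} - d_n \le c_T^n$: it is the smallest $n$ with $c_T^n \le \delta$. A minimal sketch of that arithmetic is given below; the function name `iterations_for_accuracy` is a hypothetical helper introduced for illustration, and it assumes $0 < \delta < 1$ and $0 < c_T < 1$.

```python
import math


def iterations_for_accuracy(delta, c_T):
    """Number of applications of F after which c_T**n <= delta.

    Implements n = ceil(ln(delta) / ln(c_T)); both logarithms are negative
    under the assumed ranges, so the ratio is positive.
    """
    if not (0.0 < delta < 1.0 and 0.0 < c_T < 1.0):
        raise ValueError("expected 0 < delta < 1 and 0 < c_T < 1")
    return math.ceil(math.log(delta) / math.log(c_T))


# Example: with c_T = 0.5, an accuracy of 0.01 needs 7 iterations
# (0.5**7 = 0.0078125 <= 0.01, while 0.5**6 = 0.015625 > 0.01).
n_steps = iterations_for_accuracy(0.01, 0.5)
```

Because the count grows like $1 / \ln(1/c_T)$, choices of $c_T$ close to 1 (for example $c_T = \gamma$ with a high discount factor) make the fixed-point metric noticeably more expensive to approximate.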
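4.2 Metrics based on Total Variation

We remarked in the proof of theorem 4.5 that $F(\mathbb{I}_{X \nsim Y})$, which we now denote by $d_{\sim}$, is also a bisimulation metric. The advantage to using $d_{\sim}$ in place of $d_{fix}$ is that its component probability semimetric, $T_K(\mathbb{I}_{X \nsim Y})$, admits an explicit, easily computable formulation, similar to that of the total variation metric.

Lemma 4.6.
\[ T_K(\mathbb{I}_{X \nsim Y})(P, Q) = \frac{1}{2} \sum_{C \in S/\sim} |P(C) - Q(C)|. \]

Proof: By lemma 3.1, we have:
\[ T_K(\mathbb{I}_{X \nsim Y})(P, Q) = \max_{u_C} \sum_{C \in S/\sim} \big( P(C) - Q(C) \big) u_C \]
subject to:
\[ \forall C, D.\ u_C - u_D \le \min_{i \in C,\, j \in D} \mathbb{I}_{X \nsim Y}(s_i, s_j), \qquad \forall C.\ 0 \le u_C \le 1. \]
However, for distinct bisimulation equivalence classes $C$ and $D$, $\mathbb{I}_{X \nsim Y}(s_i, s_j)$ is 1, and so the first constraint is extraneous. Thus, if we define $u_C$ to be 1 if $P(C) \ge Q(C)$ and 0 otherwise, then it is clear that $(u_C)$ is a feasible solution at which the maximum is achieved. For this solution we have, using $P(S) = Q(S) = 1$,
\begin{align*}
T_K(\mathbb{I}_{X \nsim Y})(P, Q)
 &= \sum_{C \in S/\sim} \big( P(C) - Q(C) \big) u_C \\
 &= \sum_{C \in S/\sim} \big( P(C) - Q(C) \big) u_C - \frac{1}{2} \big( P(S) - Q(S) \big) \\
 &= \frac{1}{2} \sum_{C \in S/\sim} \big( P(C) - Q(C) \big) (2 u_C - 1) \\
 &= \frac{1}{2} \sum_{C \in S/\sim} |P(C) - Q(C)|.
\end{align*}

Thus, $d_{\sim}$ can be computed via the bisimulation partition in $O(|A||S|^3)$ operations.

5 Value Function Bounds

We are now ready to provide value function bounds. We will state the bounds in terms of $d_{fix}$ only. The bounds hold immediately for $d_{\sim}$ as well, because $d_{fix} \le d_{\sim}$.

Theorem 5.1. Suppose $\gamma \le c_T$. Then $\forall s, s' \in S$:
1. $c_R\, |V^n(s) - V^n(s')| \le d_n(s, s')$
2. $c_R\, |V^*(s) - V^*(s')| \le d_{fix}(s, s')$

Proof: Clearly the proof of the second item follows from the first by taking limits. For the proof of the first item we proceed by induction. Note that since $\gamma \le c_T$,
\[ 0 \le \frac{c_R \gamma}{c_T} V^n(u) \le \frac{(1 - c_T)\gamma}{c_T (1 - \gamma)} \le 1, \]
and by the induction hypothesis
\[ \frac{c_R \gamma}{c_T} V^n(u) - \frac{c_R \gamma}{c_T} V^n(v) \le c_R\, |V^n(u) - V^n(v)| \le d_n(u, v). \]
So $\{ \frac{c_R \gamma}{c_T} V^n(u) : u \in S \}$ constitutes a feasible solution to the primal LP for $T_K(d_n)(P^a_s, P^a_{s'})$. It follows that
\begin{align*}
c_R\, |V^{n+1}(s) - V^{n+1}(s')|
 &= c_R \Big| \max_a \Big( r^a_s + \gamma \sum_u P^a_{su} V^n(u) \Big) - \max_a \Big( r^a_{s'} + \gamma \sum_u P^a_{s'u} V^n(u) \Big) \Big| \\
 &\le c_R \max_a \Big| r^a_s - r^a_{s'} + \gamma \sum_u \big( P^a_{su} - P^a_{s'u} \big) V^n(u) \Big| \\
 &\le \max_a \Big( c_R |r^a_s - r^a_{s'}| + c_T \Big| \sum_u \big( P^a_{su} - P^a_{s'u} \big) \frac{c_R \gamma}{c_T} V^n(u) \Big| \Big) \\
 &\le \max_a \Big( c_R |r^a_s - r^a_{s'}| + c_T\, T_K(d_n)(P^a_s, P^a_{s'}) \Big) \\
 &= F(d_n)(s, s') = d_{n+1}(s, s').
\end{align*}

These bounds can be extended to relate the optimal values of states in the given MDP and an aggregate MDP. First, let us fix some notation and assumptions concerning the form of an aggregate MDP. We assume the aggregate is given by $(S', A, \{P^a_{CD} : a \in A,\ C, D \in S'\}, \{r^a_C : a \in A,\ C \in S'\})$ where $S'$ is a partition of the state space, $A$ is the same finite set of actions, and transition probabilities and rewards are each averaged over equivalence classes, i.e.
\[ P^a_{CD} = \frac{1}{|C|} \sum_{s \in C} P^a_s(D) \quad \text{and} \quad r^a_C = \frac{1}{|C|} \sum_{s \in C} r^a_s. \]
Additionally in the following we will denote the map from $S$ to $S'$ taking a state to its equivalence class by $\rho$, and the average distance from a state $s$ to all states in its equivalence class under a semimetric $d$, by $g(s, d) = \frac{1}{|\rho(s)|} \sum_{s' \in \rho(s)} d(s, s')$.

Theorem 5.2. Suppose $\gamma \le c_T$. Then $\forall s \in S$, the following inequalities hold:
1. $c_R\, |V^n(\rho(s)) - V^n(s)| \le g(s, d_n) + \sum_{k=1}^{n-1} \gamma^{n-k} \max_u g(u, d_k)$
2. $c_R\, |V^*(\rho(s)) - V^*(s)| \le g(s, d_{fix}) + \frac{\gamma}{1 - \gamma} \max_u g(u, d_{fix})$

Proof: See appendix.

The proposed distance metrics can be used for aggregating states in a straightforward way. For some positive $\varepsilon$ we choose several seed states and for each, we cluster all the states within an $\varepsilon$-neighborhood (while ensuring that each state is placed in only one cluster). Then for a cluster $C$ and any state $s$ belonging to it, the above theorem tells us that $c_R\, |V^*(C) - V^*(s)| \le \frac{2\varepsilon}{1 - \gamma}$, provided $\gamma \le c_T$. Thus, as $\varepsilon$ decreases, the optimal values of a class and its states converge.

6 Illustration

We illustrate our distance metrics and error bounds on a very simple toy MDP, consisting of a $5 \times 5$ grid. There are 5 actions, north, south, east, west and stay. Transitions for each cell are uniformly distributed among adjacent cells. Rewards are distributed as follows. Moving south from rows 1-4 to rows 2-5 yields rewards of 0.1, 0.2, 0.3, 0.4 respectively. Moving east from columns 1-4 to columns 2-5 yields rewards of 0.5, 0.53, 0.56, 0.59 respectively.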
Finally, staying in the southeast corner yields a reward of 1. All other actions give reward 0. We used these parameters in order to be able to inspect the partitions obtained. More extensive (but similar) illustrations, using random MDPs, are discussed in (Ferns, 2003). In all experiments, $c_R = 1 - \gamma$ and $c_T = \gamma$.

We first compute the pairwise distances between all pairs of states. Note that this is not a practical approach; here, we are just trying to understand the behavior of the metrics. Then, from an initial seed state we grow a partition of $\varepsilon$-clusters of states, adding a new cluster each time we encounter a state at distance greater than $\varepsilon$ from the seed states of each cluster presently in the partition. Of course, the quality of the partition will depend on the choice of seeds, and more sophisticated methods can be employed here (e.g., picking the seeds for subsequent partitions as far as possible from the previous ones). Once a partition is established, we perform value iteration to find the value of the optimal policy.

We varied the parameter $\varepsilon$ which bounds the allowed distance between states, as well as the discount factor $\gamma$. Note that a low $\varepsilon$ (close to 0) means that we only allow states to be aggregated if they are very close in terms of the distance. Hence, at this end of the spectrum, very little aggregation will occur and the value function in the aggregated MDP should be very close (or identical) to the one in the original MDP. When $\varepsilon = 1$, all states can be aggregated, resulting in a single-state MDP, and a poor approximation of the optimal value function.

Figure 2 shows the size of the aggregated MDPs, obtained using the Kantorovich metric and the total variation metric, for values of $\gamma = 0.1$, $0.5$ and $0.9$. The two metrics are close for low $\gamma$ but behave quite differently for high values of $\gamma$ (which are typical in the MDP community). In particular, the total variation metric has a very abrupt transition from no aggregation to aggregating all states in one lump. We note, though, that this metric is much faster to compute (by an order of 4 in our experiments, in a Java implementation).
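The clustering and aggregation steps described above can be sketched as follows. This is an illustrative reading of the procedure, not the code used for the experiments (which the text says was written in Java): the names `epsilon_clusters` and `aggregate_mdp`, the nested-list layout `P[a][s][t]`, `r[a][s]`, and the choice to visit states in index order with each new cluster's first state as its seed are all assumptions of this sketch. The averaging follows the aggregate-MDP definition of Section 5.

```python
def epsilon_clusters(dist, eps):
    """Greedy seed-based clustering from a pairwise distance matrix.

    A state joins the first existing cluster whose seed is within eps;
    otherwise it starts a new cluster and becomes that cluster's seed.
    """
    seeds, clusters = [], []
    for s in range(len(dist)):
        for k, seed in enumerate(seeds):
            if dist[s][seed] <= eps:
                clusters[k].append(s)
                break
        else:  # no seed within eps: open a new cluster
            seeds.append(s)
            clusters.append([s])
    return clusters


def aggregate_mdp(P, r, clusters):
    """Average transitions and rewards over each cluster.

    P_agg[a][C][D] = (1/|C|) * sum_{s in C} sum_{t in D} P[a][s][t]
    r_agg[a][C]    = (1/|C|) * sum_{s in C} r[a][s]
    """
    P_agg = [
        [
            [sum(P[a][s][t] for s in C for t in D) / len(C) for D in clusters]
            for C in clusters
        ]
        for a in range(len(P))
    ]
    r_agg = [[sum(r[a][s] for s in C) / len(C) for C in clusters] for a in range(len(r))]
    return P_agg, r_agg


# Hypothetical 3-state, 1-action example: states 0 and 1 are close, state 2 is far.
dist_toy = [[0.0, 0.05, 0.9], [0.05, 0.0, 0.9], [0.9, 0.9, 0.0]]
clusters_toy = epsilon_clusters(dist_toy, eps=0.1)
P_toy3 = [[[0.0, 0.0, 1.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]]
r_toy3 = [[0.0, 0.2, 1.0]]
P_agg_toy, r_agg_toy = aggregate_mdp(P_toy3, r_toy3, clusters_toy)
```

The resulting aggregate MDP can be solved with ordinary value iteration. As the text notes, the partition depends on the order in which seeds are encountered, so a different visiting order may give a different (and possibly better) aggregation.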
Figure 3 compares the metrics in terms of actual and estimated error. The lower curves represent the maximum error between the optimal value functions of the aggregated MDP and the original MDP. The higher curves are the upper bound on the error, based on Theorem 5.2. The straight line is the naive estimate, $\frac{2\varepsilon}{1 - \gamma}$. Note that the bounds in the theorem are much tighter than the naive bound (which is omitted in the last graph to make the figure clear). The bounds get looser as $\gamma$ increases, due to the $1/(1 - \gamma)$ dependence. We note, though, that the shape of the bound mimics very well the shape of the actual error.

[Figure 2: Size of aggregated MDP as a function of $\varepsilon$, for $\gamma = 0.1$ (left), $\gamma = 0.5$ (middle) and $\gamma = 0.9$ (right). Vertical axis: size of aggregate MDP. Curves: Total variation, Kantorovich.]

[Figure 3: True error and estimated error bounds between the optimal value function of the original and aggregated MDP, as a function of $\varepsilon$, for $\gamma = 0.1$ (left), $\gamma = 0.5$ (middle) and $\gamma = 0.9$ (right). Vertical axis: maximum error. Curves: True error Total Variation, True error Kantorovich, Bound Total variation, Bound Kantorovich, Naive bound.]

7 Conclusion

In this paper, we introduced metrics for measuring the distance between the states of an MDP, based on the notion of bisimulation. Unlike equivalence relations, the metrics are robust to perturbations in the parameters of the MDP: if two bisimilar states are slightly perturbed, the metric will still show them as close. Moreover, the same can be said of the states' optimal values, as reflected by the bounds relating these to our metrics. Such metrics are obviously useful for state aggregation, but also for other value function approximators (e.g. memory-based). We are currently pursuing an interesting connection to diffusion kernels on graphs (Kondor & Lafferty, 2002).

The existence of bisimulation metrics for finite MDPs allows us to tackle compression of such systems in a new manner. A metric defined on the state space of an MDP can be extended to a metric on the space of finite MDPs. With this in mind, we are now concerned with answering the following question: given a finite MDP and a positive integer $k$, what is its best $k$-state approximation? Here by best we mean a $k$-state MDP of minimal distance to the original.
We also aim to extend these results to other probabilistic models. We have mostly established an extension for continuous-state MDPs. In the future, we hope to tackle factored MDPs and partially observable MDPs as well.

Appendix: Proof of Lemma 4.4

Fix probability functions $P$ and $Q$. Monotonicity of $T_K$ follows from the primal LP: for, if $d \le d'$ then every feasible solution to $T_K(d)(P, Q)$ is a feasible solution to $T_K(d')(P, Q)$. Thus, $T_K(d) \le T_K(d')$. Next, given an $\omega$-chain $\{d_n\}$ note that by monotonicity $\sup_n T_K(d_n)(P, Q) \le T_K(\bigsqcup_n d_n)(P, Q)$. For the other direction, we use the dual LP. For each $n$, let $(l^n_{kj})$ denote a feasible solution of $T_K(d_n)$ yielding the minimum. Then each is also a feasible solution for $T_K(\bigsqcup_n d_n)$. Define $\varepsilon^n_{kj} = (\bigsqcup_n d_n)(s_k, s_j) - d_n(s_k, s_j)$ and $\delta_{kj} = \min(P(s_k), Q(s_j))$. Then for every $k$, $j$, and $n$, $\varepsilon^n_{kj} \ge 0$, $\lim_n \varepsilon^n_{kj} = 0$, and $l^n_{kj} \le \delta_{kj}$. Thus,
\begin{align*}
T_K\Big(\bigsqcup_n d_n\Big)(P, Q)
 &\le \sum_{k,j} l^n_{kj} \Big(\bigsqcup_n d_n\Big)(s_k, s_j) \\
 &= T_K(d_n)(P, Q) + \sum_{k,j} l^n_{kj}\, \varepsilon^n_{kj} \\
 &\le \sup_n T_K(d_n)(P, Q) + \sum_{k,j} \delta_{kj}\, \varepsilon^n_{kj}.
\end{align*}
By taking $n \to \infty$ on both sides of the inequality, we obtain the desired result.

Appendix: Proof of Theorem 5.2

Once more we proceed by induction.
\begin{align*}
c_R\, |V^{n+1}(\rho(s)) - V^{n+1}(s)|
 &= c_R \Big| \max_a \Big( r^a_{\rho(s)} + \gamma \sum_{D \in S'} P^a_{\rho(s) D} V^n(D) \Big) - \max_a \Big( r^a_s + \gamma \sum_{u \in S} P^a_{su} V^n(u) \Big) \Big| \\
 &\le c_R \max_a \Big| \frac{1}{|\rho(s)|} \sum_{s' \in \rho(s)} \Big( r^a_{s'} - r^a_s + \gamma \sum_{u} \big( P^a_{s'u} V^n(\rho(u)) - P^a_{su} V^n(u) \big) \Big) \Big| \\
 &\le \frac{1}{|\rho(s)|} \sum_{s' \in \rho(s)} \max_a \Big( c_R |r^a_{s'} - r^a_s| + c_T \Big| \sum_u \big( P^a_{s'u} - P^a_{su} \big) \frac{c_R \gamma}{c_T} V^n(u) \Big| \\
 &\qquad\qquad + c_R \gamma \sum_u P^a_{s'u}\, |V^n(\rho(u)) - V^n(u)| \Big).
\end{align*}
Note by theorem 5.1 that $\{ \frac{c_R \gamma}{c_T} V^n(u) : u \in S \}$ constitutes a feasible solution to the primal LP for $T_K(d_n)(P^a_{s'}, P^a_s)$. Hence we can continue as follows:
\begin{align*}
 &\le \frac{1}{|\rho(s)|} \sum_{s' \in \rho(s)} \max_a \Big( c_R |r^a_{s'} - r^a_s| + c_T\, T_K(d_n)(P^a_{s'}, P^a_s) \Big) + \gamma \max_u c_R\, |V^n(\rho(u)) - V^n(u)| \\
 &= \frac{1}{|\rho(s)|} \sum_{s' \in \rho(s)} d_{n+1}(s', s) + \gamma \max_u c_R\, |V^n(\rho(u)) - V^n(u)| \\
 &= g(s, d_{n+1}) + \gamma \max_u c_R\, |V^n(\rho(u)) - V^n(u)| \\
 &\le g(s, d_{n+1}) + \gamma \max_u \Big( g(u, d_n) + \sum_{k=1}^{n-1} \gamma^{n-k} \max_v g(v, d_k) \Big) \\
 &= g(s, d_{n+1}) + \sum_{k=1}^{n} \gamma^{n+1-k} \max_u g(u, d_k).
\end{align*}

References

Bernardo, M., & Bravetti, M. (2003). Performance measure sensitive congruences for Markovian process algebras. Theoretical Computer Science, 290, 117-160.

Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.

Dean, T., Givan, R., & Leach, S. (1997). Model reduction techniques for computing approximately optimal solutions for Markov decision processes. Proceedings of UAI (pp. 124-131).

Desharnais, J., Gupta, V., Jagadeesan, R., & Panangaden, P. (1999). Metrics for labeled Markov systems. International Conference on Concurrency Theory.

Desharnais, J., Gupta, V., Jagadeesan, R., & Panangaden, P. (2002). The metric analogue of weak bisimulation for probabilistic processes. Logic in Computer Science. IEEE Computer Society.

Ferns, N. (2003). Metrics for Markov decision processes. Master's thesis, McGill University. URL: nferns/mythesis.ps.

Gibbs, A. L., & Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70.

Givan, R., Dean, T., & Greig, M. (2003). Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147.

Kondor, R. I., & Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. Proceedings of the ICML.

Larsen, K., & Skou, A. (1991). Bisimulation through probabilistic testing. Information and Computation, 94, 1-28.

Milner, R. (1980). A calculus of communicating systems. Lecture Notes in Computer Science Vol. 92. Springer-Verlag.

Orlin, J. (1988). A faster strongly polynomial minimum cost flow algorithm. Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing. ACM Press.

Park, D. (1981). Concurrency and automata on infinite sequences. Proceedings of the 5th GI-Conference on Theoretical Computer Science. Springer-Verlag.

Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc.

van Breugel, F., & Worrell, J. (2001). An algorithm for quantitative verification of probabilistic transition systems. Proceedings of the 12th International Conference on Concurrency Theory. Springer-Verlag.

Winskel, G. (1993). The formal semantics of programming languages. Foundations of Computing. The MIT Press.


Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Review of basic calculus

Review of basic calculus Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

Numerical Integration

Numerical Integration Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the

More information

Chapter 5 : Continuous Random Variables

Chapter 5 : Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll

More information

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 UNIFORM CONVERGENCE Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 Suppose f n : Ω R or f n : Ω C is sequence of rel or complex functions, nd f n f s n in some sense. Furthermore,

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

LECTURE NOTE #12 PROF. ALAN YUILLE

LECTURE NOTE #12 PROF. ALAN YUILLE LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, K-mens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

A recursive construction of efficiently decodable list-disjunct matrices

A recursive construction of efficiently decodable list-disjunct matrices CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

ODE: Existence and Uniqueness of a Solution

ODE: Existence and Uniqueness of a Solution Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =

More information

ECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance

ECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance Generl structure ECO 37 Economics of Uncertinty Fll Term 007 Notes for lectures 4. Stochstic Dominnce Here we suppose tht the consequences re welth mounts denoted by W, which cn tke on ny vlue between

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

Improper Integrals, and Differential Equations

Improper Integrals, and Differential Equations Improper Integrls, nd Differentil Equtions October 22, 204 5.3 Improper Integrls Previously, we discussed how integrls correspond to res. More specificlly, we sid tht for function f(x), the region creted

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

Continuous Random Variables

Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht

More information

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior Reversls of Signl-Posterior Monotonicity for Any Bounded Prior Christopher P. Chmbers Pul J. Hely Abstrct Pul Milgrom (The Bell Journl of Economics, 12(2): 380 391) showed tht if the strict monotone likelihood

More information

Strong Bisimulation. Overview. References. Actions Labeled transition system Transition semantics Simulation Bisimulation

Strong Bisimulation. Overview. References. Actions Labeled transition system Transition semantics Simulation Bisimulation Strong Bisimultion Overview Actions Lbeled trnsition system Trnsition semntics Simultion Bisimultion References Robin Milner, Communiction nd Concurrency Robin Milner, Communicting nd Mobil Systems 32

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

Math 270A: Numerical Linear Algebra

Math 270A: Numerical Linear Algebra Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner

More information

Numerical Integration

Numerical Integration Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Jim Lambers MAT 169 Fall Semester Lecture 4 Notes

Jim Lambers MAT 169 Fall Semester Lecture 4 Notes Jim Lmbers MAT 169 Fll Semester 2009-10 Lecture 4 Notes These notes correspond to Section 8.2 in the text. Series Wht is Series? An infinte series, usully referred to simply s series, is n sum of ll of

More information

Advanced Calculus: MATH 410 Uniform Convergence of Functions Professor David Levermore 11 December 2015

Advanced Calculus: MATH 410 Uniform Convergence of Functions Professor David Levermore 11 December 2015 Advnced Clculus: MATH 410 Uniform Convergence of Functions Professor Dvid Levermore 11 December 2015 12. Sequences of Functions We now explore two notions of wht it mens for sequence of functions {f n

More information

Notes on length and conformal metrics

Notes on length and conformal metrics Notes on length nd conforml metrics We recll how to mesure the Eucliden distnce of n rc in the plne. Let α : [, b] R 2 be smooth (C ) rc. Tht is α(t) (x(t), y(t)) where x(t) nd y(t) re smooth rel vlued

More information

Credibility Hypothesis Testing of Fuzzy Triangular Distributions

Credibility Hypothesis Testing of Fuzzy Triangular Distributions 666663 Journl of Uncertin Systems Vol.9, No., pp.6-74, 5 Online t: www.jus.org.uk Credibility Hypothesis Testing of Fuzzy Tringulr Distributions S. Smpth, B. Rmy Received April 3; Revised 4 April 4 Abstrct

More information

Entropy and Ergodic Theory Notes 10: Large Deviations I

Entropy and Ergodic Theory Notes 10: Large Deviations I Entropy nd Ergodic Theory Notes 10: Lrge Devitions I 1 A chnge of convention This is our first lecture on pplictions of entropy in probbility theory. In probbility theory, the convention is tht ll logrithms

More information

Lecture 19: Continuous Least Squares Approximation

Lecture 19: Continuous Least Squares Approximation Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for

More information

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar)

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar) Lecture 3 (5.3.2018) (trnslted nd slightly dpted from lecture notes by Mrtin Klzr) Riemnn integrl Now we define precisely the concept of the re, in prticulr, the re of figure U(, b, f) under the grph of

More information

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a).

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a). The Fundmentl Theorems of Clculus Mth 4, Section 0, Spring 009 We now know enough bout definite integrls to give precise formultions of the Fundmentl Theorems of Clculus. We will lso look t some bsic emples

More information

1.9 C 2 inner variations

1.9 C 2 inner variations 46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

SYDE 112, LECTURES 3 & 4: The Fundamental Theorem of Calculus

SYDE 112, LECTURES 3 & 4: The Fundamental Theorem of Calculus SYDE 112, LECTURES & 4: The Fundmentl Theorem of Clculus So fr we hve introduced two new concepts in this course: ntidifferentition nd Riemnn sums. It turns out tht these quntities re relted, but it is

More information

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1 Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

Math& 152 Section Integration by Parts

Math& 152 Section Integration by Parts Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

We will see what is meant by standard form very shortly

We will see what is meant by standard form very shortly THEOREM: For fesible liner progrm in its stndrd form, the optimum vlue of the objective over its nonempty fesible region is () either unbounded or (b) is chievble t lest t one extreme point of the fesible

More information

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

Infinite Geometric Series

Infinite Geometric Series Infinite Geometric Series Finite Geometric Series ( finite SUM) Let 0 < r < 1, nd let n be positive integer. Consider the finite sum It turns out there is simple lgebric expression tht is equivlent to

More information

Spanning tree congestion of some product graphs

Spanning tree congestion of some product graphs Spnning tree congestion of some product grphs Hiu-Fi Lw Mthemticl Institute Oxford University 4-9 St Giles Oxford, OX1 3LB, United Kingdom e-mil: lwh@mths.ox.c.uk nd Mikhil I. Ostrovskii Deprtment of Mthemtics

More information

Frobenius numbers of generalized Fibonacci semigroups

Frobenius numbers of generalized Fibonacci semigroups Frobenius numbers of generlized Fiboncci semigroups Gretchen L. Mtthews 1 Deprtment of Mthemticl Sciences, Clemson University, Clemson, SC 29634-0975, USA gmtthe@clemson.edu Received:, Accepted:, Published:

More information

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac REVIEW OF ALGEBRA Here we review the bsic rules nd procedures of lgebr tht you need to know in order to be successful in clculus. ARITHMETIC OPERATIONS The rel numbers hve the following properties: b b

More information

N 0 completions on partial matrices

N 0 completions on partial matrices N 0 completions on prtil mtrices C. Jordán C. Mendes Arújo Jun R. Torregros Instituto de Mtemátic Multidisciplinr / Centro de Mtemátic Universidd Politécnic de Vlenci / Universidde do Minho Cmino de Ver

More information

1B40 Practical Skills

1B40 Practical Skills B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Tech. Rpt. # UMIACS-TR-99-31, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, June 3, 1999.

Tech. Rpt. # UMIACS-TR-99-31, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, June 3, 1999. Tech. Rpt. # UMIACS-TR-99-3, Institute for Advnced Computer Studies, University of Mrylnd, College Prk, MD 20742, June 3, 999. Approximtion Algorithms nd Heuristics for the Dynmic Storge Alloction Problem

More information

The practical version

The practical version Roerto s Notes on Integrl Clculus Chpter 4: Definite integrls nd the FTC Section 7 The Fundmentl Theorem of Clculus: The prcticl version Wht you need to know lredy: The theoreticl version of the FTC. Wht

More information

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0) 1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this

More information

CS667 Lecture 6: Monte Carlo Integration 02/10/05

CS667 Lecture 6: Monte Carlo Integration 02/10/05 CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

3.4 Numerical integration

3.4 Numerical integration 3.4. Numericl integrtion 63 3.4 Numericl integrtion In mny economic pplictions it is necessry to compute the definite integrl of relvlued function f with respect to "weight" function w over n intervl [,

More information

How can we approximate the area of a region in the plane? What is an interpretation of the area under the graph of a velocity function?

How can we approximate the area of a region in the plane? What is an interpretation of the area under the graph of a velocity function? Mth 125 Summry Here re some thoughts I ws hving while considering wht to put on the first midterm. The core of your studying should be the ssigned homework problems: mke sure you relly understnd those

More information

Overview of Calculus I

Overview of Calculus I Overview of Clculus I Prof. Jim Swift Northern Arizon University There re three key concepts in clculus: The limit, the derivtive, nd the integrl. You need to understnd the definitions of these three things,

More information

Recitation 3: Applications of the Derivative. 1 Higher-Order Derivatives and their Applications

Recitation 3: Applications of the Derivative. 1 Higher-Order Derivatives and their Applications Mth 1c TA: Pdric Brtlett Recittion 3: Applictions of the Derivtive Week 3 Cltech 013 1 Higher-Order Derivtives nd their Applictions Another thing we could wnt to do with the derivtive, motivted by wht

More information

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as Improper Integrls Two different types of integrls cn qulify s improper. The first type of improper integrl (which we will refer to s Type I) involves evluting n integrl over n infinite region. In the grph

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral Improper Integrls Every time tht we hve evluted definite integrl such s f(x) dx, we hve mde two implicit ssumptions bout the integrl:. The intervl [, b] is finite, nd. f(x) is continuous on [, b]. If one

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk bout solving systems of liner equtions. These re problems tht give couple of equtions with couple of unknowns, like: 6 2 3 7 4

More information

Heat flux and total heat

Heat flux and total heat Het flux nd totl het John McCun Mrch 14, 2017 1 Introduction Yesterdy (if I remember correctly) Ms. Prsd sked me question bout the condition of insulted boundry for the 1D het eqution, nd (bsed on glnce

More information

Math 360: A primitive integral and elementary functions

Math 360: A primitive integral and elementary functions Mth 360: A primitive integrl nd elementry functions D. DeTurck University of Pennsylvni October 16, 2017 D. DeTurck Mth 360 001 2017C: Integrl/functions 1 / 32 Setup for the integrl prtitions Definition:

More information

20 MATHEMATICS POLYNOMIALS

20 MATHEMATICS POLYNOMIALS 0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of

More information

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b.

We partition C into n small arcs by forming a partition of [a, b] by picking s i as follows: a = s 0 < s 1 < < s n = b. Mth 255 - Vector lculus II Notes 4.2 Pth nd Line Integrls We begin with discussion of pth integrls (the book clls them sclr line integrls). We will do this for function of two vribles, but these ides cn

More information

Conservation Law. Chapter Goal. 5.2 Theory

Conservation Law. Chapter Goal. 5.2 Theory Chpter 5 Conservtion Lw 5.1 Gol Our long term gol is to understnd how mny mthemticl models re derived. We study how certin quntity chnges with time in given region (sptil domin). We first derive the very

More information

Chapter 3 Polynomials

Chapter 3 Polynomials Dr M DRAIEF As described in the introduction of Chpter 1, pplictions of solving liner equtions rise in number of different settings In prticulr, we will in this chpter focus on the problem of modelling

More information

MAA 4212 Improper Integrals

MAA 4212 Improper Integrals Notes by Dvid Groisser, Copyright c 1995; revised 2002, 2009, 2014 MAA 4212 Improper Integrls The Riemnn integrl, while perfectly well-defined, is too restrictive for mny purposes; there re functions which

More information

Math Lecture 23

Math Lecture 23 Mth 8 - Lecture 3 Dyln Zwick Fll 3 In our lst lecture we delt with solutions to the system: x = Ax where A is n n n mtrix with n distinct eigenvlues. As promised, tody we will del with the question of

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

APPROXIMATE INTEGRATION

APPROXIMATE INTEGRATION APPROXIMATE INTEGRATION. Introduction We hve seen tht there re functions whose nti-derivtives cnnot be expressed in closed form. For these resons ny definite integrl involving these integrnds cnnot be

More information

Bayesian Networks: Approximate Inference

Bayesian Networks: Approximate Inference pproches to inference yesin Networks: pproximte Inference xct inference Vrillimintion Join tree lgorithm pproximte inference Simplify the structure of the network to mkxct inferencfficient (vritionl methods,

More information

MATH 144: Business Calculus Final Review

MATH 144: Business Calculus Final Review MATH 144: Business Clculus Finl Review 1 Skills 1. Clculte severl limits. 2. Find verticl nd horizontl symptotes for given rtionl function. 3. Clculte derivtive by definition. 4. Clculte severl derivtives

More information

Review of Riemann Integral

Review of Riemann Integral 1 Review of Riemnn Integrl In this chpter we review the definition of Riemnn integrl of bounded function f : [, b] R, nd point out its limittions so s to be convinced of the necessity of more generl integrl.

More information

ODE: Existence and Uniqueness of a Solution

ODE: Existence and Uniqueness of a Solution Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry dierentil eqution (ODE) du f(t) dt with initil condition u() : Just

More information

STUDY GUIDE FOR BASIC EXAM

STUDY GUIDE FOR BASIC EXAM STUDY GUIDE FOR BASIC EXAM BRYON ARAGAM This is prtil list of theorems tht frequently show up on the bsic exm. In mny cses, you my be sked to directly prove one of these theorems or these vrints. There

More information