Finding Correlated Equilibria in General Sum Stochastic Games

Size: px
Start display at page:

Download "Finding Correlated Equilibria in General Sum Stochastic Games"

Transcription

1 Finding Correlted Equilibri in Generl Sum Stochstic Gmes Chris Murry nd Geoff Gordon June 2007 CMU-ML

2 Finding Correlted Equilibri in Generl Sum Stochstic Gmes Chris Murry nd Geoff Gordon June 2007 CMU-ML School of Computer Science Crnegie-Mellon University Pittsburgh, PA Abstrct Often problems rise where multiple self-interested gents with individul gols cn coordinte their ctions to improve their outcomes. We model these problems s generl sum stochstic gmes. We develop trctble pproximtion lgorithm for computing subgme-perfect correlted equilibri in these gmes. Our lgorithm is n extension of stndrd dynmic progrmming methods like vlue itertion nd Q-lerning. And, it is conservtive: while it is not gurnteed to find ll vlue vectors chievble in correlted equilibrium, ny policy which it does find is gurnteed to be n exct equilibrium of the stochstic gme (to within limits of ccurcy which depend on the number of bckups nd not on the pproximtion scheme). Our new lgorithm is bsed on the plnning lgorithm of [1]. Tht lgorithm computes subgme-perfect Nsh equilibri, but ssumes tht it is given set of punishment policies s input. Our new lgorithm requires only the description of the gme, n importnt improvement since suitble punishment policies my be difficult to come by.

3 Keywords: Multi-gent plnning, subgme perfect correlted equilibrium, stochstic gmes.

4 1 INTRODUCTION We model the multi-gent plnning problem, where self-interested rtionl gents interct with ech other nd with the world, s generl sum stochstic gme. The world stte nd the plyers joint ction determine the rewrds to ech plyer nd the world s next stte, nd the process repets. Since our gents re self-interested, simply finding policy tht chieves some vlue for ech gent will not suffice: we need to find policies where every gent hs n incentive to cooperte, tht is, equilibri of the gme. An equilibrium keeps every gent in line by promising rewrds for complince or thretening punishment for devition. More specificlly, in order to give gents the mximum flexibility in jointly choosing ctions, we will look for policies tht re subgme-perfect correlted equilibri. Unlike Nsh equilibri, correlted equilibri llow gents to correlte their ctions t ny given point in the gme. Tht is, gents smple from distribution over joint ctions, rther thn ech gent individully smpling her own ction nd hving the joint ction distribution be the product of the individul ction distributions. Correlted equilibri llow richer set of policies to be chieved, nd llow gents to void executing unintended joint ctions. In ddition, s we will see below, trgeting correlted equilibri llows us to develop clener lgorithm, since ech bckup opertion cn be viewed s pproximting convex set of pyoff vectors. Subgme perfection mens tht our equilibrium policies will contin no incredible threts: even fter devition by one gent (which might led to sitution which cn never be observed in equilibrium ply), no gent wishes to mke nother devition. In other words, if our policy deters devition with the thret of some punishment, the punishment policy is itself n equilibrium. (Of course, if we just wish to compute correlted equilibri without worrying bout subgme perfection, our lgorithm lso provides wy to do so.) Our focus here is on the plnning problem: given stochstic gme, we wnt to find the set of vlues tht cn be chieved in correlted equilibrium, nd policies to chieve these vlues. We won t focus on the problem of selecting n equilibrium from this set or executing policy once it is chosen: we will imgine tht modertor cn serve both of these functions. If modertor is unvilble, the negotition protocol in [1] cn be used to select n equilibrium 1 nd the cryptogrphic protocol in [2] cn be used to smple from distribution over joint ctions. 1 The negotition protocol requires disgreement policy s input, which we cn tke to be ny prespecified element of V(s strt). Good choices for disgreement policy re often domin-specific, but one resonble domin-independent choice might be tht, if the plyers disgree, they will pick vlue vector uniformly t rndom from the Preto frontier of V(s strt) nd use it s trget. 3

5 2 STOCHASTIC GAMES A stochstic gme represents multi-gent plnning problem in the sme wy tht Mrkov Decision Process [3] represents single-gent plnning problem. As in n MDP, trnsitions in stochstic gme depend on the current stte nd ction. Unlike MDPs, the current (joint) ction is vector of individul ctions, one for ech plyer. More formlly, generl sum stochstic gme G is tuple (S,s strt,p,a,t,r,γ). S is set of sttes, nd s strt S is the strt stte. P is the number of plyers. A = A 1 A 2... A P is the finite set of joint ctions. We write for joint ction, nd α for plyer s individul ction. When joint ction should be executed, but plyer p devites nd plys individul ction α insted of α, we write the resulting joint ction s p:α. We del with fully observble stochstic gmes with perfect monitoring, where ll plyers cn observe the true stte nd true joint ction. T : S A P(S) is the trnsition function, where P(S) is the set of probbility distributions over S. R : S A R P is the rewrd function. We will write R p (s,) for the pth component of R(s,). γ [0,1) is the discount fctor. Plyer p wnts to mximize the expecttion of her discounted totl vlue for the observed sequence of sttes nd joint ctions s 1, 1,s 2, 2,...: V p = γ t 1 R p (s t, t ) t=1 A (sttionry, joint) policy is function π p : S P(A) which tells the plyers how to pick their joint ction t ech stte. A nonsttionry policy is function π : ( t=0 (S A A) t S) P(A) which tkes history of sttes, recommended joint ctions, nd ctul joint ctions, nd produces distribution over recommended joint ctions for the next time step. For ny nonsttionry policy, there is sttionry policy tht chieves the sme vlue t every stte [4]; but, the sttionry policy my not be n equilibrium even if the originl nonsttionry policy is. For both kinds of policy, we imgine tht there is modertor who observes the current stte s (or the history of sttes nd ctions h), smples joint ction from π(s) (or π(h)), nd tells ech plyer her recommended ction p. We need such modertor (or n equivlent cryptogrphic protocol) to mke sure tht no plyer lerns nother plyer s recommended ction before choosing her own ction. The vlue function Vp π : S R gives expected vlues for plyer p under joint policy π t every stte. (For nonsttionry policy π we will define Vp π (h) to be the vlue fter observing history h.) And, the ction-vlue function Q π p : S A R gives expected vlues if we strt t stte s nd perform ction. (If π is nonsttionry, we will write Q π p(h,) for its vlue to p when strting t history h nd performing joint ction.) The vlue nd ction-vlue functions for π stisfy the liner equtions: V π p (h) = (π(h))()q π p(h,,) (1) 4

6 Q π p(h,, ) = R p (s(h), ) + γ s P(s s(h), )V π p ( h,,,s ) (2) Here s(h) is the stte corresponding to history h (i.e., the lst element of h). Note tht Vp π (h) in Eq. 1 depends only on Q-vlues for mtching recommended nd ctul ctions, but Eq. 2 defines Q-vlues for both mtching nd nonmtching cses. Also note tht we hve written Eqs. 1 2 for the more generl cse of nonsttionry policies; if π is sttionry, we cn simplify the equtions by replcing ech history with its finl stte. The vlue vector t stte s, V π (s), is the vector with components Vp π (s) (nd similrly for V π (h)). We will write V(s) to represent the set of vlue vectors which re chievble strting from stte s nd following ny correlted equilibrium policy (either sttionry or nonsttionry). This set is convex, since the modertor cn rndomize. 3 CORRELATED EQUILIBRIUM A joint policy π is correlted equilibrium if, when following π, no plyer ever hs incentive to devite. It s tempting to think of the correlted equilibrium condition s simply Vp π (s) Vp π (s) for ny policy π which is the sme s π except tht plyer p plys some individul ctions differently. However, this is not correct for two resons. First, if plyer p devites from π, the other plyers my rect nd chnge their ctions t subsequent sttes to punish p. So, the immedite benefit which p chieves by deviting must be weighed ginst her predicted future loss from being punished. Second, nd more subtly, p my consider not only unconditionl devitions of the form ignore wht I m supposed to do nd ply ction α insted, but lso conditionl devitions. During the execution of policy, t some time t nd stte s, p first lerns the individul ction α which she is recommended, nd then decides whether or not to follow tht ction. Lerning α tells plyer p conditionl distribution on wht joint ction will be followed; so, plyer p cn esily compute the conditionl expecttion of her future discounted vlue hving been recommended ction α. Crucilly, this vlue depends on the recommendtion α. Similrly, p s expected loss from being punished fter deviting cn lso depend on α. So, p my wish to devite fter some recommendtions nd not others. Putting these two requirements together, if policy wnts to recommend some individul ction α to plyer p fter history h, it must promise plyer p more by following α thn p would get from ny possible devition (nd the resulting punishment). More formlly, write Q π p(h,α,α ) for plyer p s expected vlue if she receives recommendtion α nd plys α insted, given tht the current history is h nd the policy is π: Q π p(h,α,α ) = P π ( α,h)q π p(h,, p:α ) (3) 5

7 For convenience, we will define Q π p(h,α,α ) to be zero if π never recommends ction α to plyer p given history h. Note tht Q π p(h,, p:α ) cn include penlty for plyer p if α α, since π cn prescribe tht the other plyers will chnge their behvior fter observing p:α insted of. With these definitions, the subgme-perfect correlted equilibrium condition is just ( h,p,α,α ) Q π p(h,α,α) Q π p(h,α,α ) (4) We cn relx the condition of subgme perfection by requiring Eq. 4 to hold only t histories h which re rechble during on-policy ply. 4 VALUE BACKUPS The lgorithm we use to find ll the chievble vlue vectors in stochstic gme is bsed on dynmic progrmming, nd is similr to (but more complicted thn) vlue itertion or Q-lerning for MDPs. The finl result of the lgorithm will be P-dimensionl convex set for ech stte, telling wht vectors of vlues re chievble in correlted equilibrium beginning in tht stte. The lgorithm will lso return informtion sufficient for us to reconstruct policy tht chieves ny one of those vlue vectors. The vlue bckups themselves ren t tht different from the norml MDP vlue bckups, s long s the multipliction nd ddition opertors re defined properly to work on convex sets of vectors. However, we must lso include pruning step which removes policies where some gent hs n incentive to devite. In this section we will describe the simpler bckup opertor tht doesn t enforce equilibrium constrints, nd thus finds ll vlue vectors chievble by ny policy in gme, rther thn only those chievble vi correlted equilibrium policy. In Section 5, we will show how to dd in the incentive constrints to rrive t the complete bckup opertor which does enforce equilibrium constrints. 4.1 MDP bckups To derive the set-vlued bckup opertor, we will strt from the ordinry Bellmn equtions for Mrkov decision processes: Q MDP (s,) = R(s,) + γ s P(s s,)v MDP (s ) (5) V MDP (s) = mx Q MDP (s,) (6) Here P(s s,) is the probbility of trnsitioning from stte s to stte s when tking ction. The MDP bckup works by treting Eqs. 5 nd 6 s ssignments: the opertor T MDP cn be written [ ] T MDP (V )(s) = mx R(s,) + γ s P(s s)v (s ) 6

8 or, in mtrix nottion, T MDP (V ) = mx [R + γp V ] (7) In this nottion, the mx opertion opertes componentwise. 4.2 Set-vlued bckups In the multi-plyer generliztion, V(s) R P is set of vlue vectors chievble strting from stte s: ech v V(s) hs one component for every plyer. But, the rules for bcking up the vlue for single plyer under fixed joint ction re exctly the sme s in Eq. 5. To pply Eq. 5 to n entire set of vlue vectors t once, we cn define ddition nd multipliction to work in the usul wy on sets of vectors: for two sets A nd B, sclr c, nd vector d, ca = {c A} d+a = {d+ A} A+B = {+b A,b B} If V is vector of sets nd M is mtrix of sclrs, the bove definitions of ddition nd sclr multipliction lso llow us to interpret the mtrix multipliction MV. With these definitions, ssuming tht V noprune (s) is the set of chievble vlue vectors t stte s, we cn write Q noprune (s,) = R(s,) + γ s P(s s,)v noprune (s ) (8) for the set of ll vlue vectors tht we cn chieve by strting t stte s nd executing ction. Eq. 8 is one hlf of the Bellmn equtions for stochstic gmes without pruning. The remining hlf defines V in terms of Q: V noprune (s) = conv A Q noprune (s,) (9) Here the conv opertor first tkes the union of its rguments, nd then finds the convex hull of the result. The reson we need conv in Eq. 9 (insted of the mx in Eq. 6) is tht we wnt ll vlue vectors tht cn be chieved from stte s, not just the one which mximizes some plyer s pyoff. (Convex combintions of chievble vlue vectors re chievble since the modertor cn rndomize mong joint ctions.) By combining Equtions (8) nd (9), we cn define the simplified trnsition opertor, which is the sme s one itertion of the exct vlue itertion lgorithm except tht it omits the pruning step. T noprune (V) = conv A [R + γp V] (10) If our gol were to find ll vlue vectors chievble vi ny policy, regrdless of equilibrium constrints, then repeted ppliction of T noprune would converge 7

9 to the correct nswer. 2 However, we wnt to prune out those vlues tht ren t chievble by correlted equilibrium policy. The following section detils how to do so. 5 INDIVIDUAL RATIONALITY In the full Bellmn equtions for discounted stochstic gmes (nd in the corresponding bckup opertor), we cn define Q exctly s we did for the no-pruning cse (cf. Eq. 8): Q(s,) = R(s,) + γ s P(s s,)v(s ) (11) But now, insted of finding V(s) by tking the convex hull of Q(s,) for ll, we need to define new pruning opertion which removes non-equilibrium ction distributions so tht V(s) = prune Q(s,) (12) The rest of this section defines the prune opertor; Appendix A shows tht our definition is correct. (Tht is, it shows tht the unique mximl solution of Eqs consists of exctly the vlue vectors chievble in subgme-perfect correlted equilibrium.) In the expression V = prune Q, the set V nd ll of the sets Q re subsets of R P. 5.1 Anlyzing V π ( s ) By definition, V(s) consists of ll vlue vectors V π ( s ) tht cn be chieved strting from stte s under ny subgme-perfect correlted equilibrium policy π. We cn brek π into two pieces: first, we hve n immedite distribution over recommended ctions, ω = π( s ). And second, we hve policy for the future: if we recommended n ction nd took possibly-different ction, then our future policy is π s,, (h) = π( s,,,h ). From Eqs. 1 2 nd the definitions of ω nd π s,,, we know tht [ ] V π ( s ) = ω() R(s,) + γ s P(s s,)v πs,, ( s ) (13) Eq. 13 shows tht the future policy π s,, influences V π ( s ) only through the vlue vectors V πs,, ( s ) t sttes s tht we might rech fter one step. Since we cn choose our future policy rbitrrily (our future ctions re not limited by how we rrived t s ), nd since V(s ) tells us wht vlue vectors we cn chieve t s, Eq. 13 mens tht we don t need to worry bout our exct future 2 Repeted ppliction will converge s long s we initilize V(s) to nonempty, bounded set for ech s. For the full lgorithm, we will in ddition need to initilize V(s) to set contining the correct nswer, such s the lrge cube suggested in Fig. 1. 8

10 policy: for purposes of computing V(s), we just need to keep trck of how much vlue our future policy will give us t ech stte s hving followed ech possible joint ction. In fct, by exmining Eq. 11, we cn see tht the term in brckets in Eq. 13 is n element of Q(s,). So, we hve V π p ( s ) = ω()q (14) where ω is probbility distribution over ctions nd where q Q(s,) for ech. If we llowed ll choices of ω nd q, we would rrive t the no-pruning Bellmn equtions described in Sec. 4. But, not every choice of ω nd q will correspond to n equilibrium policy. (So, prune Q(s,) will be subset of conv Q(s,).) Therefore, to compute V(s), we still need to enforce the individul rtionlity constrints, Eq Enforcing Eq. 4 Fixing h = s nd substituting the definition of Q π p(h,α,α ) into Eq. 4, we hve: P π ( α,s)q π p( s,,) P π ( α,s)q π p( s,, p:α ) (15) For π to be rtionl t s, Eq. 15 must hold for ll p, α, nd α with P π (α s) > 0. By Byes rule, P π ( α,s) = P π (α,s)p π ( s)/p π (α s). The first term, P π (α,s), is either 0 or 1 depending on whether α is consistent with. The second term, P π ( s), is given by our immedite ction distribution ω. And, since the lst term is positive nd doesn t depend on, we cn fctor it out nd cncel it from both sides of Eq. 15: ω()q π p( s,, p:α ) (16) α ω()q π p( s,,) α for ll p, α, nd α. (We cn drop the qulifiction P π (α s) > 0 since Eq. 16 is vcuous if P π (α s) = 0.) The sum is over ll ctions which re consistent with the recommendtion α. On the left-hnd side of Eq. 16, Q π p( s,,) is just the pth element of q from Eq. 14 bove. On the right-hnd side, Q π p( s,, p:α ) tells us how much vlue plyer p will get by deviting to α. Since the off-policy Q-vlue Q π p( s,, p:α ) does not directly influence V π (s), we don t need to be concerned with its exct vlue except to mke sure tht Eq. 16 is stisfied. Tht is, we only need to mke sure tht policy π punishes devitions sufficiently severely to deter them. By vrying the future policy π s,, over equilibrium policies, we cn mke the vector Q π ( s,, p:α ) be n rbitrry element of Q(s, p:α ). So, define Q p (s,) = min Q Q(s,) Q p 9

11 And, define Q(s,) to be the vector whose pth element is Q p (s,). Q p (s,) is the vlue of the hrshest punishment tht the other plyers cn impose on plyer p (within the bounds of equilibrium) given tht we strt with stte s nd ction. So, if Eq. 16 cn be stisfied t ll, it will be stisfied when Q π p( s,, p:α ) = Q p (s, p:α ). Tht mens tht Eq. 16 reduces to α ω()q α ω()q(s, p:α ) (17) for ll plyers p, recommendtions α, nd devitions α. Here ω is probbility distribution, nd q Q(s,) for ech. In Eq. 17, the opertion on length-p vectors is interpreted componentwise. If we wish to compute correlted equilibri without regrd to subgme perfection, we cn replce Q p (s,) by the miniml vlue tht ny fesible policy ssigns to plyer p; since this punishment will not be visited during equilibrium ply, it does not need to be n equilibrium unless we wnt subgme perfection. To compute the miniml fesible vlue for ech plyer t ech stte nd ction, we cn run the no-pruning version of our lgorithm to completion. 5.3 Putting it together Summrizing, we hve tht { } V(s) = ω()q (18) where the distribution ω nd the vlue vectors q Q(s,) re constrined to stisfy Eq. 17. Eqs constitute definition of the prune opertor from Eq. 12. However, to mke it esier to compute V(s), we will rerrnge this definition slightly: define q = ω()q And, ssume tht we re given system of inequlities defining Q(s,), Q(s,) = {q M q + b 0} (A mtrix M nd vector b for such system lwys exist, since Q(s,) is convex set; however, M nd b my be infinitely tll if Q(s,) hs curved boundry.) Then V(s) is chrcterized by the liner system of inequlities V = q (19) q ω()q(s, p:α ) p,α,α (20) α α M q + ω()b 0 (21) 10

12 ω() = 1 (22) ω() 0 (23) The constnts Q(s,) in Eq. 20 cn be precomputed from Q(s,). Inequlity 21 ensures tht q ω()q(s,), where ω()q(s,) is copy of Q(s,) scled down by ω(). 5.4 Policy execution If we hve solution to the Bellmn equtions (Eqs ), then we cn use Eqs to find, for ny trget vlue vector v V(s), probbility distribution ω() nd vlue vectors q Q(s,) such tht v = ω()q. (If q = 0, then q is not determined, but my be chosen rbitrrily since ω() = 0.) And, we know tht ech q stisfies q = R(s,) + γ s P(s s,)v (s ) (24) for some vectors v (s ) V(s ). The distribution ω nd vectors v (s ) for ll nd s tell us how to chieve the trget vlue vector v from stte s: 1. Drw joint ction ccording to the distribution ω(). 2. Attempt to execute tht joint ction. 3. If plyer p devited, switch to the policy corresponding to Q p (s,) Else, let s be the new stte. () Set the current stte s to be s. (b) Set the trget vlue vector v to be v (s ). (c) Recompute ω, q, nd v (s) for ll,s ccording to Eqs nd 24. (d) Goto 1. 6 ALGORITHM Using the Bellmn equtions (Eqs ) nd the liner-inequlity representtion of the prune opertor (Eqs ), we cn design dynmic progrmming lgorithm which computes n pproximtion to V(s) for ll s. We first present conceptul, exct lgorithm tht is intrctble to implement; below, in Sec. 6.1, we show how to modify the lgorithm so tht we cn implement it efficiently. 3 A subsequent devition by nother plyer p will cuse us to switch to some other punishment policy, corresponding to Q p (s, ) for the stte s nd ction involved in the devition. 11

13 Initiliztion for s S V(s) {V V Rmx 1 γ } end Repet until converged for itertion 1,2,... for s S Compute vlue vector set nd punishment vlue for ech joint ction for A Q(s,) {R(s,)} + γ s S P(s s,)v(s ) for p {1...P }, Q p (s,) min Q Q(s,) Q p end Do bckups for enforceble one-step joint-ction distributions V(s) {V (V,ω, q ) IRC} end end Figure 1: Dynmic progrmming using exct opertions on sets of vlue vectors. The set IRC is the intersection of the individul rtionlity constrints in Eqs In our lgorithm, the set V(s) for ech stte is initilized to lrge P- dimensionl hypercube, centered t the origin nd extending distnce Rmx 1 γ in ech direction, where R mx is the bsolute vlue of the lrgest rewrd vilble in the gme. This initiliztion mens tht V(s) strts out contining ll vlue vectors which plyers could ever hope to chieve. From this initil vlue of V, we compute Q ccording to Eq. 11. Then, for ech stte s, we intersect the liner inequlities to find vlue vectors V, probbility distributions ω, nd scled Q-vlues q tht re consistent with individul rtionlity for one step into the future. We updte V(s) to be the set of ll one-step individully rtionl vlue vectors V. We then continue in this mnner, lterntely recomputing Q nd V; ech dditionl such bckup extends the plnning horizon nd the individul rtionlity constrints nother step into the future. Finlly, once we hve computed V to the desired ccurcy, we cn select n element of V(s strt ) nd begin executing our policy s described in Sec Write V T(V) for the bckup opertion. We show in Appendix A tht T k (V) converges s k. Unfortuntely, we hve not been ble to show tht the convergence is liner s it is for MDPs: the problem is tht we could need to find very ccurte pproximtion to V(s ) before we relize tht some ction distribution ω is irrtionl t s. If tht ction distribution ws being used s punishment to support equilibri, such chnge could cuse rpid djustment to our vlue sets. 12

14 6.1 Approximte bckups The lgorithm given in the previous section is intrctble since it opertes on rbitrry convex sets. We cn mke trctble lgorithm by storing finite number of points to represent ech convex set V(s). Since the sets re convex, we need only store points on the exterior. Since the vlue vectors lie in R P, we use finite set of directions w 1...w K R P nd store only the points on the exterior of the convex sets frthest in ech direction w i. More precisely, t step k of the lgorithm we pproximte ech convex set T k (V 0 )(s) s V k (s) = conv {V i k(s) i = 1...K} where V i k (s) is the point in T(V k 1)(s) frthest in the w i direction, V i k(s) = rg mx V w i V T(V k 1 )(s) This pproximtion is conservtive, since our pproximte V k (s) is contined in the exct T k (V 0 )(s). So, while our pproximte lgorithm might miss equilibri, it will never erroneously clim tht non-equilibrium is n equilibrium. Using this representtion, the pproximte lgorithm is the sme s the exct lgorithm in Fig. 1, except tht we replce the line with V(s) {V (V,ω, q ) IRC} for i = 1...K, V i (s) rg mx V V w i s.t. (V,ω, q ) IRC (25) The constrints nd objectives for the mximiztions in Eq. 25 re liner, so we cn implement the pproximte lgorithm vi clls to stndrd LP solver. In prticulr, since the sets Q(s, ) re represented s the convex hull of finitely mny points, there re finitely mny liner constrints on the q vectors. We cn sve some time by not computing Q(s,) explicitly, but insted finding the frthest point in Q(s,) in the direction w i directly from the points V i (s): rg mx Q w i = R(s,) + γ P(s s,)rg mx V w i Q Q(s,) v V(s) s = R(s,) + γ s P(s s,)v i (s) 7 EXPERIMENTS In order to test our lgorithm nd our intuition, we creted simple repeted gme clled three-wy mtching pennies. 4 This is three-plyer symmetric gme where every plyer holds penny nd ech turn cn revel heds or tils. If plyers one nd two revel the sme side of the coin, plyer three gets 4 Repeted gmes re subset of generl sum stochstic gmes. 13

15 Tble 1: Pyoff mtrix to the three-wy mtching pennies gme. Plyer 1 Plyer 2 Plyer 3 Tils Heds Tils Tils (0,0,1) (0,0,1) Tils Heds (1,1,0) (0,0,1) Heds Tils (0,0,1) (1,1,0) Heds Heds (0,0,1) (0,0,1) point. If plyers one nd two revel different sides of their coins, they my ern point, depending on wht side plyer three shows. Thus the gme inherently fvors plyer three. The pyoffs re shown in tble 1. An importnt point is tht plyers one nd two will both be better off if they coordinte their joint ctions: ny time they both revel the sme side of the coin they will do poorly. Thus we expect tht solution policies which re correlted equilibri will llow plyers one nd two to do better thn solution policies which re merely Nsh equilibri. Figure 2 shows the chievble vlue vectors for this gme. By coordinting their ctions nd lwys plying either HT or TH, plyers one nd two cn do s well s plyer three nd score hlf the time (corresponding to the point (1,1) in the figure). This is the best plyers one nd two cn do: if they ply HT more thn hlf the time (or less thn hlf the time), plyer three will just ply T (or H) nd reduce their score. If plyers one nd two didn t coordinte but mde their ction choices independently, they would score qurter of the time (corresponding to the point (0.5,1.5)), since they would often end up plying the sme side of the coin nd gurnteeing plyer three point. And, if plyers one nd two nticoordinte their ction choices, they cn score none of the time (corresponding to the point (0,2)). This unfortunte outcome is still n equilibrium: plyers two nd three cting together cn keep plyer one from scoring ny points (nd likewise plyers one nd three cn keep plyer two from scoring), so with the thret of such punishment, there is no incentive for plyer one or two to devite. 8 CONCLUSION We presented trctble pproximtion lgorithm for finding subgme-perfect correlted equilibri in generl sum stochstic gmes. This plnning lgorithm is importnt since it llows self-interested gents to find policies where they cn jointly chieve higher pyoffs by cooperting, nd since subgme-perfect correlted equilibrium is strong condition. To use this lgorithm in prctice, the gents would need to coordinte on vlue vector in V(s strt ) nd would need to simulte modertor; for these purposes the gents cn use the negotition protocol in [1] nd the cryptogrphic protocol in [2], respectively. 14

16 2 1.5 Vlue to plyer Vlue to plyers 1 nd 2 Figure 2: Achievble correlted equilibrium vlue vectors for the three-wy mtching pennies gme with discount fctor of γ = 0.5. Though the true vlue vectors lie in three dimensions, the gme s pyoff structure trets plyers one nd two the sme, so the plot shows vlue to plyers one nd two on the X xis nd vlue to plyer three on the Y xis. Acknowledgements The uthors would like to thnk Ron Prr for helpful comments nd discussion t n erly stge of this work. This reserch ws supported in prt by grnt from DARPA s Computer Science Study Pnel progrm. References [1] Chris Murry nd Geoff Gordon. Multi-robot negotition: Approximting the set of subgme perfect equilibri in generl-sum stochstic gmes. In NIPS, [2] Yevgeniy Dodis, Shi Hlevi, nd Tl Rbin. A cryptogrphic solution to gme theoretic problem. In Lecture Notes in Computer Science, volume 1880, pge 112. Springer, Berlin, [3] D. P. Bertseks. Dynmic Progrmming nd Optiml Control. Athen Scientific, Msschusetts, [4] Prjit K. Dutt. A folk theorem for stochstic gmes. Journl of Economic Theory, 66:1 32,

17 A Proofs In this ppendix we will prove tht the exct lgorithm presented in Figure 1 is correct. We do so by mking the following rguments: Monotonicity The sets of vlue vectors stored by the lgorithm decrese monotoniclly s the lgorithm progresses, so if the lgorithm is llowed to run for long enough they will converge to finl sets of vlue vectors. (Since T is continuous, the finl sets form fixed point V of T, TV = V.) Achievbility After the lgorithm hs converged, ll vlue vectors tht it stores re chievble in subgme-perfect correlted equilibrium. Conservtive initiliztion The vlue vector sets re initilized to contin ll possible vlue vectors tht could be chieved in the gme. Conservtive bckups As the lgorithm runs, it never throws out vlue vector which is chievble in correlted equilibrium. These properties, together, ssure tht the lgorithm finds ll vlue vectors chievble in correlted equilibrium. It is interesting to note tht there my be more thn one solution to the Bellmn equtions (Eqs ). (In prticulr, setting V(s) = for ll s yields trivil fixed point.) The chievbility property mens tht ll of these fixed points contin only vlue vectors chievble in subgme-perfect correlted equilibrium. However, becuse of the conservtive initiliztion nd conservtive bckup properties, our lgorithm finds the (unique) lrgest fixed point, which includes ll equilibrium vlue vectors. The conservtive initiliztion property is esy to show: Figure 1 initilizes V(s) to the hypercube [ R mx /(1 γ),r mx /(1 γ)] P, nd no policy cn possibly chieve more thn this mount of rewrd. The following sections contin proofs of the remining properties. A.1 Monotonicity We will show, first, tht the bckup opertor T is monotone. Tht is, if V nd W re two set-vlue functions with V(s) W(s) for ll s (we will write this property s V W) then T(V) T(W) (26) We will then show tht, s long s we pick our initil set-vlue function V 0 ppropritely, T(V 0 ) V 0 (27) Using (27) s bse cse nd (26) s n inductive step, we will then hve tht, s climed, our sequence of vlue functions decreses monotoniclly s our lgorithm progresses. Lemm 1 T is monotone (Eqution (26)). 16

18 Proof: Write Then by definition Q (V) = R + γp V (28) T(V)(s) = prune Q (V)(s) (29) It is esy to see tht, if V(s) W(s) for ll s, then Q (V)(s) Q (W)(s) for ll s nd : liner opertions on sets preserve subset reltionships. So, if we cn show tht the pruning opertor lso preserves subset reltionships, we will hve the desired result. The sets Q (V)(s) pper in two plces in Eqs (the definition of the pruning opertor): first, they influence the fesible set for q, nd second, they influence the punishment vlues Q(s,). In the first cse, shrinking Q (V)(s) only leds to tighter constrint on q. And, in the second cse, shrinking Q (V)(s) only leds to higher vlue for Q(s,); since Q(s,) ppers with positive sign on the right-hnd side of constrint, rising Q(s,) lso results in tighter constrint. So, the fesible set described by Eqs when using Q (V)(s) is contined in the fesible set when using Q (W)(s), which is wht we wnted to prove. Lemm 2 With V 0 defined s in the initiliztion of Fig. 1, T(V 0 ) V 0. Proof: By stndrd MDP rgument, Q (V 0 ) V 0 : the stochstic mtrix P mps the origin-centered cube V 0 into itself, nd the discount fctor γ shrinks the cube enough tht the offset R cnnot plce the resulting set outside of V 0. But, s rgued in Sec. 5, prune Q conv Q ; the desired result follows. Given these two lemms, the inductive rgument outlined t the beginning of the section shows tht (T k+1 (V 0 ))(s) (T k (V 0 ))(s) for ech s nd k. So, the sequence (T k (V 0 ))(s) converges for ech s, since it is decresing nd bounded below by the empty set. (In fct, T k (V 0 ) is bounded below by ny fixed point V: becuse no policy cn chieve more thn R mx /(1 γ) or less thn R mx /(1 γ), we know V V 0. By monotonicity, T k V T k V 0, nd by the fixed point ssumption, T k V = V. So, T k V 0 contins V for ll k. Therefore, V = lim k T k V 0 contins ny fixed point V, mening tht V is the unique lrgest fixed point of T.) A.2 Achievbility Lemm 3 Let V be fixed set of T, tht is, V = T(V ). For ny v V (s), the policy π v,s described in Sec. 5.4 chieves v in expecttion strting from stte s. And, no gent hs n incentive to devite from π v,s t ny step. Proof: The proof will be by induction. Specificlly, we will show tht, for ll v nd s, following π v,s for k steps will yield n ctul expected discounted vlue vector A k (v,s) which stisfies A k (v,s) v γ k R mx 1 γ (30) 17

19 (The norm will lwys be the mx (infinity) norm, but we will leve off the subscript to void clutter.) So, lim A k(v,s) = v (31) k for ll s nd v V (s). Since the prune Q opertion enforces incentive constrints under the ssumption tht the Q sets correctly describe chievble vlues, Eq. 31 mens tht incentive constrints re correctly enforced. (More precisely, the incentive for ny plyer to devite is bounded by twice the mxnorm error in chieving trget vlue vector; since Eq. 31 shows tht this error is zero in the limit, there is no incentive to devite.) Bse cse: Following ny policy for 0 steps from stte s chieves A 0 (v,s) = 0, while v V (s) mens tht v Rmx 1 γ. So, A 0(v,s) v γ 0 Rmx 1 γ. Inductive cse: By the inductive hypothesis, we cn strt in ny stte s nd, for ny v V (s), chieve in k steps vlue vector A k (v,s) stisfying A k (v,s) v γ k Rmx 1 γ. Since V is fixed set, for every v V (s) there exists distribution ω nd vlues v,s for ll nd s stisfying one-step incentive constrints nd [ ] v = ω() R(s,) + γ P(s s,)v,s s (32) v,s V (s ) (33) Our k + 1-step policy will use the ction distribution ω on the first step, nd will trget v,s for the next k steps to chieve [ ] A k+1 (v,s) = ω() R(s,) + γ s P(s s,)a k (v,s,s ) (34) This lets us write the mx-norm distnce between the vlue chieved by the k + 1-step policy nd the trget vlue s A k+1 (v,s) v = A k+1(v,s) [ ω() R(s,) + γ P(s s,)v,s ] (35) s = ω()γ P(s s,)[a k (v,s,s ) v,s ] (36) s γ ω()p(s s,) A k (v,s,s ) v,s (37) s γ ω()p(s s,)γ K R mx (38) 1 γ s = γ K+1 R mx 1 γ 18 (39)

20 Here Eq. 35 expnds v using eqution 32. Eq. 36 expnds A k+1 (v,s) using eqution 34. This expnsion lets the rewrd terms ω()r(s,) cncel out. Eqution 37 uses the fct tht the norm of sum of terms is not greter thn the sum of the norms of the terms. Eqution 38 uses the inductive hypothesis, which sys tht K-step policy strting from ny stte s cn cn chieve vlue within γ K Rmx 1 γ of ny vlue v(s ) V (s ). The lst eqution uses the fct tht probbility distributions sum to one. Tking Eqs together, we hve verified the inductive cse nd hve therefore proven the lemm. A.3 Conservtive bckups Lemm 4 If V 0 contins the vlue vectors for ll subgme-perfect correlted equilibri t ll sttes, then T k (V 0 ) lso contins the vlue vectors for ll subgme-perfect correlted equilibri t ll sttes. Proof: The proof is by induction. The bse cse (k = 0) is ssumed in the lemm. For the inductive step, ssume the lemm is true for T k. The bckup opertor consists of two steps, defined in Eqs The first step throws out only vlue vectors tht re not chievble by n rbitrry initil ction followed by ny equilibrium policy. The second step throws out only vlue vectors for which the individul rtionlity constrints re violted t the first step, nd must therefore leve subgme-perfect correlted equilibri untouched. 19

21 Crnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA Crnegie Mellon University does not discriminte nd Crnegie Mellon University is required not to discriminte in dmission, employment, or dministrtion of its progrms or ctivities on the bsis of rce, color, ntionl origin, sex or hndicp in violtion of Title VI of the Civil Rights Act of 1964, Title IX of the Eductionl Amendments of 1972 nd Section 504 of the Rehbilittion Act of 1973 or other federl, stte, or locl lws or executive orders. In ddition, Crnegie Mellon University does not discriminte in dmission, employment or dministrtion of its progrms on the bsis of religion, creed, ncestry, belief, ge, vetern sttus, sexul orienttion or in violtion of federl, stte, or locl lws or executive orders. However, in the judgment of the Crnegie Mellon Humn Reltions Commission, the Deprtment of Defense policy of, "Don't sk, don't tell, don't pursue," excludes openly gy, lesbin nd bisexul students from receiving ROTC scholrships or serving in the militry. Nevertheless, ll ROTC clsses t Crnegie Mellon University re vilble to ll students. Inquiries concerning ppliction of these sttements should be directed to the Provost, Crnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213, telephone (412) or the Vice President for Enrollment, Crnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA 15213, telephone (412) Obtin generl informtion bout Crnegie Mellon University by clling (412)

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Chapter 14. Matrix Representations of Linear Transformations

Chapter 14. Matrix Representations of Linear Transformations Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn

More information

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below.

Duality # Second iteration for HW problem. Recall our LP example problem we have been working on, in equality form, is given below. Dulity #. Second itertion for HW problem Recll our LP emple problem we hve been working on, in equlity form, is given below.,,,, 8 m F which, when written in slightly different form, is 8 F Recll tht we

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

SUMMER KNOWHOW STUDY AND LEARNING CENTRE

SUMMER KNOWHOW STUDY AND LEARNING CENTRE SUMMER KNOWHOW STUDY AND LEARNING CENTRE Indices & Logrithms 2 Contents Indices.2 Frctionl Indices.4 Logrithms 6 Exponentil equtions. Simplifying Surds 13 Opertions on Surds..16 Scientific Nottion..18

More information

Administrivia CSE 190: Reinforcement Learning: An Introduction

Administrivia CSE 190: Reinforcement Learning: An Introduction Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these

More information

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:

More information

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7

CS 188 Introduction to Artificial Intelligence Fall 2018 Note 7 CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees

More information

2D1431 Machine Learning Lab 3: Reinforcement Learning

2D1431 Machine Learning Lab 3: Reinforcement Learning 2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed

More information

Lecture 1. Functional series. Pointwise and uniform convergence.

Lecture 1. Functional series. Pointwise and uniform convergence. 1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

Theoretical foundations of Gaussian quadrature

Theoretical foundations of Gaussian quadrature Theoreticl foundtions of Gussin qudrture 1 Inner product vector spce Definition 1. A vector spce (or liner spce) is set V = {u, v, w,...} in which the following two opertions re defined: (A) Addition of

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 2013 Outline 1 Riemnn Sums 2 Riemnn Integrls 3 Properties

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk bout solving systems of liner equtions. These re problems tht give couple of equtions with couple of unknowns, like: 6 2 3 7 4

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

Math 426: Probability Final Exam Practice

Math 426: Probability Final Exam Practice Mth 46: Probbility Finl Exm Prctice. Computtionl problems 4. Let T k (n) denote the number of prtitions of the set {,..., n} into k nonempty subsets, where k n. Argue tht T k (n) kt k (n ) + T k (n ) by

More information

NOTES ON HILBERT SPACE

NOTES ON HILBERT SPACE NOTES ON HILBERT SPACE 1 DEFINITION: by Prof C-I Tn Deprtment of Physics Brown University A Hilbert spce is n inner product spce which, s metric spce, is complete We will not present n exhustive mthemticl

More information

A recursive construction of efficiently decodable list-disjunct matrices

A recursive construction of efficiently decodable list-disjunct matrices CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3

UNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 UNIFORM CONVERGENCE Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 Suppose f n : Ω R or f n : Ω C is sequence of rel or complex functions, nd f n f s n in some sense. Furthermore,

More information

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar)

Lecture 3 ( ) (translated and slightly adapted from lecture notes by Martin Klazar) Lecture 3 (5.3.2018) (trnslted nd slightly dpted from lecture notes by Mrtin Klzr) Riemnn integrl Now we define precisely the concept of the re, in prticulr, the re of figure U(, b, f) under the grph of

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling CSE 547/Stt 548: Mchine Lerning for Big Dt Lecture Multi-Armed Bndits: Non-dptive nd Adptive Smpling Instructor: Shm Kkde 1 The (stochstic) multi-rmed bndit problem The bsic prdigm is s follows: K Independent

More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

Improper Integrals, and Differential Equations

Improper Integrals, and Differential Equations Improper Integrls, nd Differentil Equtions October 22, 204 5.3 Improper Integrls Previously, we discussed how integrls correspond to res. More specificlly, we sid tht for function f(x), the region creted

More information

ODE: Existence and Uniqueness of a Solution

ODE: Existence and Uniqueness of a Solution Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =

More information

221B Lecture Notes WKB Method

221B Lecture Notes WKB Method Clssicl Limit B Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using

More information

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004 Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when

More information

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz

More information

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a).

The First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a). The Fundmentl Theorems of Clculus Mth 4, Section 0, Spring 009 We now know enough bout definite integrls to give precise formultions of the Fundmentl Theorems of Clculus. We will lso look t some bsic emples

More information

Math Lecture 23

Math Lecture 23 Mth 8 - Lecture 3 Dyln Zwick Fll 3 In our lst lecture we delt with solutions to the system: x = Ax where A is n n n mtrix with n distinct eigenvlues. As promised, tody we will del with the question of

More information

We will see what is meant by standard form very shortly

We will see what is meant by standard form very shortly THEOREM: For fesible liner progrm in its stndrd form, the optimum vlue of the objective over its nonempty fesible region is () either unbounded or (b) is chievble t lest t one extreme point of the fesible

More information

Equations and Inequalities

Equations and Inequalities Equtions nd Inequlities Equtions nd Inequlities Curriculum Redy ACMNA: 4, 5, 6, 7, 40 www.mthletics.com Equtions EQUATIONS & Inequlities & INEQUALITIES Sometimes just writing vribles or pronumerls in

More information

Lecture 3. Limits of Functions and Continuity

Lecture 3. Limits of Functions and Continuity Lecture 3 Limits of Functions nd Continuity Audrey Terrs April 26, 21 1 Limits of Functions Notes I m skipping the lst section of Chpter 6 of Lng; the section bout open nd closed sets We cn probbly live

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Online Supplements to Performance-Based Contracts for Outpatient Medical Services

Online Supplements to Performance-Based Contracts for Outpatient Medical Services Jing, Png nd Svin: Performnce-bsed Contrcts Article submitted to Mnufcturing & Service Opertions Mngement; mnuscript no. MSOM-11-270.R2 1 Online Supplements to Performnce-Bsed Contrcts for Outptient Medicl

More information

Math 270A: Numerical Linear Algebra

Math 270A: Numerical Linear Algebra Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

UniversitaireWiskundeCompetitie. Problem 2005/4-A We have k=1. Show that for every q Q satisfying 0 < q < 1, there exists a finite subset K N so that

UniversitaireWiskundeCompetitie. Problem 2005/4-A We have k=1. Show that for every q Q satisfying 0 < q < 1, there exists a finite subset K N so that Problemen/UWC NAW 5/7 nr juni 006 47 Problemen/UWC UniversitireWiskundeCompetitie Edition 005/4 For Session 005/4 we received submissions from Peter Vndendriessche, Vldislv Frnk, Arne Smeets, Jn vn de

More information

Bellman Optimality Equation for V*

Bellman Optimality Equation for V* Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Sufficient condition on noise correlations for scalable quantum computing

Sufficient condition on noise correlations for scalable quantum computing Sufficient condition on noise correltions for sclble quntum computing John Presill, 2 Februry 202 Is quntum computing sclble? The ccurcy threshold theorem for quntum computtion estblishes tht sclbility

More information

dt. However, we might also be curious about dy

dt. However, we might also be curious about dy Section 0. The Clculus of Prmetric Curves Even though curve defined prmetricly my not be function, we cn still consider concepts such s rtes of chnge. However, the concepts will need specil tretment. For

More information

13: Diffusion in 2 Energy Groups

13: Diffusion in 2 Energy Groups 3: Diffusion in Energy Groups B. Rouben McMster University Course EP 4D3/6D3 Nucler Rector Anlysis (Rector Physics) 5 Sept.-Dec. 5 September Contents We study the diffusion eqution in two energy groups

More information

and that at t = 0 the object is at position 5. Find the position of the object at t = 2.

and that at t = 0 the object is at position 5. Find the position of the object at t = 2. 7.2 The Fundmentl Theorem of Clculus 49 re mny, mny problems tht pper much different on the surfce but tht turn out to be the sme s these problems, in the sense tht when we try to pproimte solutions we

More information

For the percentage of full time students at RCC the symbols would be:

For the percentage of full time students at RCC the symbols would be: Mth 17/171 Chpter 7- ypothesis Testing with One Smple This chpter is s simple s the previous one, except it is more interesting In this chpter we will test clims concerning the sme prmeters tht we worked

More information

Math 520 Final Exam Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008

Math 520 Final Exam Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008 Mth 520 Finl Exm Topic Outline Sections 1 3 (Xio/Dums/Liw) Spring 2008 The finl exm will be held on Tuesdy, My 13, 2-5pm in 117 McMilln Wht will be covered The finl exm will cover the mteril from ll of

More information

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral Improper Integrls Every time tht we hve evluted definite integrl such s f(x) dx, we hve mde two implicit ssumptions bout the integrl:. The intervl [, b] is finite, nd. f(x) is continuous on [, b]. If one

More information

Lecture Note 9: Orthogonal Reduction

Lecture Note 9: Orthogonal Reduction MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A

More information

Bernoulli Numbers Jeff Morton

Bernoulli Numbers Jeff Morton Bernoulli Numbers Jeff Morton. We re interested in the opertor e t k d k t k, which is to sy k tk. Applying this to some function f E to get e t f d k k tk d k f f + d k k tk dk f, we note tht since f

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

Recitation 3: Applications of the Derivative. 1 Higher-Order Derivatives and their Applications

Recitation 3: Applications of the Derivative. 1 Higher-Order Derivatives and their Applications Mth 1c TA: Pdric Brtlett Recittion 3: Applictions of the Derivtive Week 3 Cltech 013 1 Higher-Order Derivtives nd their Applictions Another thing we could wnt to do with the derivtive, motivted by wht

More information

Main topics for the First Midterm

Main topics for the First Midterm Min topics for the First Midterm The Midterm will cover Section 1.8, Chpters 2-3, Sections 4.1-4.8, nd Sections 5.1-5.3 (essentilly ll of the mteril covered in clss). Be sure to know the results of the

More information

Lecture 19: Continuous Least Squares Approximation

Lecture 19: Continuous Least Squares Approximation Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for

More information

MORE FUNCTION GRAPHING; OPTIMIZATION. (Last edited October 28, 2013 at 11:09pm.)

MORE FUNCTION GRAPHING; OPTIMIZATION. (Last edited October 28, 2013 at 11:09pm.) MORE FUNCTION GRAPHING; OPTIMIZATION FRI, OCT 25, 203 (Lst edited October 28, 203 t :09pm.) Exercise. Let n be n rbitrry positive integer. Give n exmple of function with exctly n verticl symptotes. Give

More information

20 MATHEMATICS POLYNOMIALS

20 MATHEMATICS POLYNOMIALS 0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of

More information

Monte Carlo method in solving numerical integration and differential equation

Monte Carlo method in solving numerical integration and differential equation Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The

More information

5.7 Improper Integrals

5.7 Improper Integrals 458 pplictions of definite integrls 5.7 Improper Integrls In Section 5.4, we computed the work required to lift pylod of mss m from the surfce of moon of mss nd rdius R to height H bove the surfce of the

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

Energy Bands Energy Bands and Band Gap. Phys463.nb Phenomenon

Energy Bands Energy Bands and Band Gap. Phys463.nb Phenomenon Phys463.nb 49 7 Energy Bnds Ref: textbook, Chpter 7 Q: Why re there insultors nd conductors? Q: Wht will hppen when n electron moves in crystl? In the previous chpter, we discussed free electron gses,

More information

AP Calculus Multiple Choice: BC Edition Solutions

AP Calculus Multiple Choice: BC Edition Solutions AP Clculus Multiple Choice: BC Edition Solutions J. Slon Mrch 8, 04 ) 0 dx ( x) is A) B) C) D) E) Divergent This function inside the integrl hs verticl symptotes t x =, nd the integrl bounds contin this

More information

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 April 7, 2014 More on utomt Michel George Mrch 24 April 7, 2014 1 Automt constructions Now tht we hve forml model of mchine, it is useful to mke some generl constructions. 1.1 DFA Union / Product construction Suppose

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

Lecture 14: Quadrature

Lecture 14: Quadrature Lecture 14: Qudrture This lecture is concerned with the evlution of integrls fx)dx 1) over finite intervl [, b] The integrnd fx) is ssumed to be rel-vlues nd smooth The pproximtion of n integrl by numericl

More information

Spanning tree congestion of some product graphs

Spanning tree congestion of some product graphs Spnning tree congestion of some product grphs Hiu-Fi Lw Mthemticl Institute Oxford University 4-9 St Giles Oxford, OX1 3LB, United Kingdom e-mil: lwh@mths.ox.c.uk nd Mikhil I. Ostrovskii Deprtment of Mthemtics

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1

Exam 2, Mathematics 4701, Section ETY6 6:05 pm 7:40 pm, March 31, 2016, IH-1105 Instructor: Attila Máté 1 Exm, Mthemtics 471, Section ETY6 6:5 pm 7:4 pm, Mrch 1, 16, IH-115 Instructor: Attil Máté 1 17 copies 1. ) Stte the usul sufficient condition for the fixed-point itertion to converge when solving the eqution

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Math 61CM - Solutions to homework 9

Math 61CM - Solutions to homework 9 Mth 61CM - Solutions to homework 9 Cédric De Groote November 30 th, 2018 Problem 1: Recll tht the left limit of function f t point c is defined s follows: lim f(x) = l x c if for ny > 0 there exists δ

More information

Lecture 3: Equivalence Relations

Lecture 3: Equivalence Relations Mthcmp Crsh Course Instructor: Pdric Brtlett Lecture 3: Equivlence Reltions Week 1 Mthcmp 2014 In our lst three tlks of this clss, we shift the focus of our tlks from proof techniques to proof concepts

More information

Chapter 3. Vector Spaces

Chapter 3. Vector Spaces 3.4 Liner Trnsformtions 1 Chpter 3. Vector Spces 3.4 Liner Trnsformtions Note. We hve lredy studied liner trnsformtions from R n into R m. Now we look t liner trnsformtions from one generl vector spce

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Generalized Fano and non-fano networks

Generalized Fano and non-fano networks Generlized Fno nd non-fno networks Nildri Ds nd Brijesh Kumr Ri Deprtment of Electronics nd Electricl Engineering Indin Institute of Technology Guwhti, Guwhti, Assm, Indi Emil: {d.nildri, bkri}@iitg.ernet.in

More information

Continuous Random Variables

Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht

More information

Student Activity 3: Single Factor ANOVA

Student Activity 3: Single Factor ANOVA MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether

More information

Is there an easy way to find examples of such triples? Why yes! Just look at an ordinary multiplication table to find them!

Is there an easy way to find examples of such triples? Why yes! Just look at an ordinary multiplication table to find them! PUSHING PYTHAGORAS 009 Jmes Tnton A triple of integers ( bc,, ) is clled Pythgoren triple if exmple, some clssic triples re ( 3,4,5 ), ( 5,1,13 ), ( ) fond of ( 0,1,9 ) nd ( 119,10,169 ). + b = c. For

More information

AQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system

AQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system Complex Numbers Section 1: Introduction to Complex Numbers Notes nd Exmples These notes contin subsections on The number system Adding nd subtrcting complex numbers Multiplying complex numbers Complex

More information

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007

A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007 A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

221A Lecture Notes WKB Method

221A Lecture Notes WKB Method A Lecture Notes WKB Method Hmilton Jcobi Eqution We strt from the Schrödinger eqution for single prticle in potentil i h t ψ x, t = [ ] h m + V x ψ x, t. We cn rewrite this eqution by using ψ x, t = e

More information

SOLUTIONS FOR ADMISSIONS TEST IN MATHEMATICS, COMPUTER SCIENCE AND JOINT SCHOOLS WEDNESDAY 5 NOVEMBER 2014

SOLUTIONS FOR ADMISSIONS TEST IN MATHEMATICS, COMPUTER SCIENCE AND JOINT SCHOOLS WEDNESDAY 5 NOVEMBER 2014 SOLUTIONS FOR ADMISSIONS TEST IN MATHEMATICS, COMPUTER SCIENCE AND JOINT SCHOOLS WEDNESDAY 5 NOVEMBER 014 Mrk Scheme: Ech prt of Question 1 is worth four mrks which re wrded solely for the correct nswer.

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning

Solution for Assignment 1 : Intro to Probability and Statistics, PAC learning Solution for Assignment 1 : Intro to Probbility nd Sttistics, PAC lerning 10-701/15-781: Mchine Lerning (Fll 004) Due: Sept. 30th 004, Thursdy, Strt of clss Question 1. Bsic Probbility ( 18 pts) 1.1 (

More information

Riemann Integrals and the Fundamental Theorem of Calculus

Riemann Integrals and the Fundamental Theorem of Calculus Riemnn Integrls nd the Fundmentl Theorem of Clculus Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University September 16, 2013 Outline Grphing Riemnn Sums

More information

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties; Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

Review of basic calculus

Review of basic calculus Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below

More information

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac REVIEW OF ALGEBRA Here we review the bsic rules nd procedures of lgebr tht you need to know in order to be successful in clculus. ARITHMETIC OPERATIONS The rel numbers hve the following properties: b b

More information