Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP

Size: px
Start display at page:

Download "Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP"

Transcription

1 Towards a Belef-Based Theory of Repeated Games wth Prvate Montorng: An Applcaton of POMDP KANDORI, Mchhro Faculty of Economcs Unversty of Tokyo OBARA, Ichro Department of Economcs UCLA Ths Verson: June 3, 2010 Abstract An equlbrum n a repeated game wth mperfect prvate motorng s called a fnte state equlbrum, feachplayer sactonon the equlbrum path s gven by an automaton wth a fnte number of states. We provde a tractable general method to check the equlbrum condtons n ths class. Our method s based on the belef-based approach and employs the theory of POMDP (Partally Observable Markov Decson Processes). Ths encompasses the majorty of exstng works. 1 Introducton Repeated games wth mperfect prvate montorng represent long-term relatonshps, where each player receves a nosy prvate sgnal about others actons. Although ths class of games has a wde range of applcatons, the characterzaton of all equlbra has yet to be obtaned. Ths s n sharp contrast to the theory of repeated games wth perfect or mperfect publc montorng, where complete characterzatons of all equlbra have been obtaned. The present paper provdes valuable general methods to verfy equlbrum condtons n repeated games wth prvate montorng. In partcular, we focus on a fnte state equlbrum, where each player s acton on the equlbrum path s gven by an automaton wth a fnte state space. We provde a complete characterzaton of ths class of equlbra and provde a tractable computatonal method to determne f a gven profle of fnte automata (one for Prevous versons of ths paper has been crculated under the ttle "Fnte State Equlbra". e-mal: kandor@e.u-tokyo.ac.jp e-mal: obara@econ.ucla.edu 1

2 each player to determne hs acton on the path of play) can consttute a (fnte) equlbrum. The present paper provdes a unfyng general theory to encompass the majorty of the exstng works, because most of them are based on some form of fnte state equlbra. The belef-based approach by Sekguch (1997), Bhaskar and Obara (2002) consder trgger strateges on the path of play, hence fnte state equlbra. The present paper can be regarded as a generalzaton of those papers. The beleffree approach by Ely and Välmäk (2002) consders an equlbrum whch can be mplemented by a fnte state automata on and off the path of play. Proposton 4 n Ely, Horner and Olszewsk (2005) (bang-bang property) shows that most of belef-free equlbrum payoffs (ndeed all of them f the dscount factor s close to 1) can be obtaned by a fnte state equlbrum wth only two states. Matsushma s revew strategy equlbra (2004) and Hörner and Olszewsk (2006) also employ fnte equlbra. An excepton whch does not employ a fnte equlbrum s Pccone (2002), whose equlbrum path requres nfnte (countably many) states. However, the result by Ely, Horner and Olszewsk shows that Pccone s equlbrum payoff can be obtaned by a fnte state equlbrum. A more recent paper by Phelan and Skrzypacz (2009) also proposes an algorthm to compute a class of statonary fnte state equlbra. Ther approach focuses on dynamcs of belefs, whle we utlze the theory of POMDP (Partally Observable Markov Decson Processes) to solve the belef-based dynamc programmng problem. 2 Repeated Games wth Prvate Montorng We use repeated games wth prvate montorng as our base model. Let A be the (fnte) set of actons for player =1,...,N and A := A 1 A N. Wthn each perod, player observes her own acton a and prvate sgnal ω Ω. We denote ω =(ω 1,...,ω N ) Ω := Ω 1 Ω N and let q(ω a) be the probablty of prvate sgnal profle ω gven acton profle a (we assume that Ω s a fnte set). We denote the margnal dstrbuton of ω by q (ω a). It s also assumed that no player can nfer whch actons were taken (or not taken) for sure; to ths end, we assume that each ω Ω occurs wth a postve probablty for any a A ( full support assumpton). Player s realzed payoff s determned by her own acton and sgnal, and denoted π (a, ω ). Hence her expected payoff s gven by g (a) = X ω Ω π (a, ω )q(ω a). Ths formulaton ensures that the realzed payoff π conveys no more nformaton than a and ω do. Amxedactonforplayer s denoted by α (A ),where (A ) s the set of probablty dstrbutons over A. Wth an abuse of notaton, we denote the expected payoff and the sgnal dstrbuton under a mxed acton profle α =(α 1,...,α N ) by g (α) and q(ω α) respectvely. The stage game s to be played 2

3 repeatedly over an nfnte tme horzon t =1, 2,... Player s dscounted payoff from a sequence of acton profles a(t) A, t =0, 1, 2... s gven by P t=1 δt g (a(t)), where δ (0, 1) s the dscount factor. 3 Repeated Game Strateges and Path Automata We now explore several ways to represent repeated game strateges. We start wth the conventonal representaton of strateges n the repeated game defned above. 3.1 Repeated Game Strateges A prvate hstory for player at the begnnng of tme t s the record of player s past actons and sgnals, h t =(a (0), ω (0),...,a (t 1), ω (t 1)) H t := (A Ω ) t. To determne the ntal acton of each player, we ntroduce a dummy ntal hstory (or null hstory) h 0 and let H0 be a sngleton set {h0 }. A pure strategy s for player s a functon specfyng an acton after any hstory: formally, s : H A,where H = t 0 H t. Smlarly, a (behavorally) mxed strategy for player s denoted by σ : H (A ). 3.2 Path Automata and Fnte State Equlbra A path automaton M (Θ, b θ,f,t ) of player specfes the path of play (but not the behavor off the path of play) for player, by specfyng the followng: 1. a set of states Θ 2. the ntal state b θ Θ 3. (pure) acton choce for each state, f : Θ A (wthout loss of generalty, we can assume that a pure acton s played n each state 1.) 4. (possbly stochastc) state transton T : Θ Ω (Θ ). Specfcally, T (θ (t +1) θ (t), ω (t)) s the probablty of the next state beng θ (t +1) gven the current state θ (t), actona (t) =f (θ (t)), and prvate sgnal ω (t). A path automaton wthout the specfcaton of the ntal state, denoted by m (Θ,f,T ),sreferredtoasapath preautomaton. Ths concept turns out to be useful n our analyss. For any path preautomaton m,wedenotethe correspondng path automaton wth ntal state θ Θ by (m, θ ). 1 Stochastc transton functon can represent mxed acton. For example, suppose that acton C and D are played wth an equal probablty at state θ. Then, we can splt ths state θ nto two states θ C and θ D and assume that n state θ a, pure acton a s played (a = C, D). Furthermore we can specfy the stochastc state transson functon such that, n the event that θ s to be the next stae, state θ C or θ D realzes wth an equal probablty. 3

4 An mportant part of our defnton above s that the transton functon T presumes that the equlbrum acton a (t) =f (θ (t)) s played. Hence, our path automaton does not specfy the behavor after a dsequlbrum acton a (t) 6= f (θ (t)) (and therefore a path automaton only represents a part of a repeated game strategy). To represent a repeated game strategy, one need to extend the transton functon to specfy the state transton after a devatng acton a (t) 6= f (θ (t)). To ths end, let us defne an extended transton functon T : Θ Ω A (Θ ),where T (θ (t+1) θ (t), ω (t),a (t)) stheprobabltyofthenextstatebengθ (t+1) gven the current state θ (t), an arbtrary current acton a (t), and the current prvate sgnal ω (t). An extended automaton s denoted by M (Θ, b θ,f, T ). An extended automaton specfes an acton after any prvate hstory, so t nduces a full repeated game strategy. A message of the present paper s that, somewhat surprsngly, path automata often turn out to be more useful than extended automata n the belefbased analyss of repeated games wth prvate montorng. A fnte path automaton s a path automaton wth a fnte number of states. A fnte path preautomaton and a fnte extended automaton are defned smlarly. We are nterested n equlbra where each player s behavor on the equlbrum can be descrbed by a fnte path automaton. We call a profle of fnte path preautomata m =(m 1,...,m N ) compatble f, for every, there exsts some state θ Θ and some belef b (Θ ) such that (m, θ ) s the optmal plan gven hs subjectve belef b. Compatblty s necessary for any profle of fnte path preautomata to be played on the equlbrum path. Note that compatblty does not guarantee that a set of such state-belef pars across players are consstent: t may not be generated by some jont dstrbuton on Θ. Suppose that there s a common pror r (Θ) = (Θ 1 Θ N ) such that (m, θ ) s optmal aganst r( θ ) (Θ ) for every θ Θ n the support of r and =1,..,N. Snce we assume full support (of the margnal dstrbuton on Ω ), such profle of fnte path preautomata and a jont dstrbuton r on Θ consttutes a (correlated) sequental equlbrum once optmal strateges are assgned after every off the equlbrum hstory. 2 A fnte state equlbrum s such a correlated sequental equlbrum where on-path behavor can be represented by fnte state preautomata and a jont dstrbuton on the product state space. Defnton 1 A fnte state equlbrum s a (correlated) sequental equlbrum of a repeated game wth prvate montorng, where players behavor on the equlbrum path s gven by fnte path preautomata m (Θ,f,T ), =1,...,N and a jont probablty dstrbuton of the ntal states r (Θ). 2 Snce we assume full support, (1) there s no ambguty regardng the defnton of sequental equlbrum although our game s an nfnte horzon game and (2) every correlated sequental equlbrum s a correlated equlbrum.e. sequental ratonalty s not a restrcton. 4

5 The probablty dstrbuton r s the ntal correlaton devce. At the begnnng of t =0,aprofle of recommended ntal states b θ realzes wth probablty r( b θ), and player observes hs recommended ntal state b θ. In a fnte state equlbrum, each player fnds t optmal to follow path automaton (m, b θ ), gven that others obey ther recommended ntal states. In other words, (m, r) consttutes a correlated equlbrum. A standard argument shows that (m, r) can always be "extended" to obtan a sequental equlbrum: Remark 1 The path preautomata m 1,...,m N do not specfy how each player should behave after hs own devatons, but we can always fnd, for each player, arepeated game strategy σ (m ) such that () t specfes the same behavor as m on the equlbrum path and () t specfes an optmal contnuaton strategy for each nformaton set of player. 3 We wll refer both (σ 1 (m 1 ),...,σ N (m N ),r) and (m, r) as a fnte state equlbrum, f no confuson ensues. Note that, n a fnte state equlbrum, a player s behavor off the equlbrum path need not be descrbed by a fnte automata. In fact, there s an equlbrum dentfed n the exstng lterature whch has the on-path behavor descrbed by fnte path automata, whle specfyng ts off-path behavor requres an extended automaton wth nfntely many states: Example 1 The followng example s from Bhaskar and Obara (2002). Consder a standard repeated prsoner s dlemma game wth almost perfect and ndependent prvate montorng: A = {C, D}, Ω = {c, d}, Pr (ω = c a,c)=pr(ω = d a,d)= 1 ε for any a A. Suppose that player 2 s play s determned by the followng trgger strategy" preautomaton m, Θ = {R, P }, f(r) = C, f (P )= D, T (R R, c) = 1,T (P R, d) =T (P P, ω 2 )=1for any ω 2 Ω 2. Let b 1 be player 1 s belef that player 2 s at state R. For a certan range of dscount factor and small enough ε > 0, t can be shown that the optmal strategy aganst ths preautomaton can be represented as a smple cut-off strategy ( belef-based strategy") n terms of belef: play C f b 1 >b and play D f b 1 <b for some cut-off belef b (0, 1). It can be shown that ths optmal strategy can be descrbed by a fnte state automaton gven any belef. In fact, t s exactly gven by the above preautomaton tself: (m, P ) s the optmal strategy for b 1 [0,b ] and (m, R) s the optmal strategy for b 1 [b, b] 3 The strategy σ (m ) s obtaned by specfyng an optmal contnuaton strategy for each nformaton set of player that s not reached on the equlbrum path (such an nformaton set can bereachedonlybyplayer s own devatons). Ths does not change player s payoff, andt does not affect other playres best reples ether (because player s devaton s never detected by other players under the full support assumpton). It remans to check f σ (m ) specfes optmal contnuaton strateges on the nformaton sets that are reached on the equlbrum path. If ths were not true, player s payoff assocated wth m couldbemprovedbyreplacngthesupoptmal contnuaton strategy (specfed by m ) wth an optmal one n some nformaton set that s reached wth a postve probablty. Ths would mply that m were not a best reply, contradctng our premse that (m, r) s a correlated equlbrum. 5

6 ([0, b] s a belef-closed set,.e. player 1 s belef never leaves [0, b] when the ntal belef s n [0, b].. So we don t have to consder b 1 above b). However, player 1 s optmal strategy off the equlbrum path cannot be descrbed by a fnte state automaton. Consder a hstory where b 1 s close to 0, for example, a hstory such as(..., Dd, Dd, Dd, Dd). Suppose that player 1 devates repeatedly by playng C (because D s the unque optmal acton for b 1 [0,b ]) and observes c n many perods. In ths off the equlbrum path, player 1 s belef eventually exceeds b after many realzaton of (Cc), after whch the optmal contnuaton strategy must be (m, R). Tomplementsuchastrategyoff the equlbrum path by an extended fnte state automaton, one needs to use as many number of states as the requred number of Cc to go back to (m, R). But ths number can be arbtrarly large f b 1 s arbtrarly close to 0. Hence there s no fnte extended automaton to mplement suchastrategy.. 4 Prvate Montorng Problem as Partally Observable Markov Decson Process (POMDP). In a fnte state equlbrum, player opponents equlbrum behavor s gven by fnte path preautomata m j = (Θ j,f j,t j ), j 6=. Under the full support assumpton of prvate sgnals, player never receves an evdence that hs opponents j 6= devated from ther equlbrum behavor, so that he always beleves that hs opponents are usng ther path preautomata. Hence, after any prvate hstory, player s nformaton regardng the current and future behavor of player j 6= s summarzed by hs belef over hs opponents states b 4 (Θ ). Belef b compactly summarzes the relevant nformaton contaned n a prvate hstory h t =(a (1), ω (1),...,a (t), ω (t)) (a much more complcated object than b ). Furthermore, player s belef at tme t, denoted by b (t), can be calculated by hs prevous belef b (t 1), acton a (t 1) and sgnal ω (t 1) (the dynamcs of belefs s a controlled Markov process). Ths means that, n a fnte state equlbrum, each player faces a relatvely smple problem: a Markov decson problem (dynamc programmng problem) wth a fnte dmensonal state space 4 (Θ ). Ths pont has been emphaszed by, for example, Bhaskar and Obara (2002), Malath and Morrs (2002), and Phelan and Skrzypacz (2009). We go one step further and observe that ths problem corresponds to a reasonably tractable class of Markov decson problems, known as POMDP (Partally Observable Markov Decson Process). The present paper ntroduces a general technque to solve such a Markov decson problem, buldng heavly on a technque developed n operatons research and computer scence (see, for example, Kaelblng, Lttman, and Cassandra (1998)). A crucal observaton s that ths decson problem s easer to solve than t appears. An apparent dffculty n solvng ths problem s that the value functon W s defned on uncountable number of states b 4 (Θ ). Hence, computng value teraton W n+1 (b )=Γ W n(b ) or verfyng the fxed pont equaton W (b )= 6

7 Γ W (b ) for all b n prncple nvolves uncountably many calculatons (one for each b ). Ths s certanly true for a general non-lnear value functons. Fortunately, however, the theory of POMDP shows that we can confne attenton to pecewse lnear value functons, and for those partcular value functons the computaton can be exactly done wth a fnte number of calculatons. Let us explan how the theory of POMDP works to verfy that a gven fnte path preautomata m (Θ,f,T ), =1,...,N, consttute a correlated sequental equlbrum for some ntal correlaton devce r (Θ). We wll focus on player s decson problem, gven the opponents fnte state path preautomata m. The relevant state for player s the profle of hs opponents state θ,whchevolvesby the Markov transton functons T j (j 6= ). However, the true state s not drectly observable, and player only obtans partal nformaton through hs prvate sgnal ω. Ths s an nstance of a Markov decson process wth partally observable state varable. The jont dstrbuton of current sgnal ω and the next state θ 0 gven the current state and acton (θ,a ) s gven by r (ω, θ 0 θ,a ) X Y Tj (θ 0 j θ j, ω j )q(ω, ω a,f (θ )), (1) ω j6= Usng r thus defned, we can derve player s posteror belef χ [a, ω,b ] after current acton a and prvate sgnal ω gven her current belef b (Θ ): χ [a, ω,b ](θ 0 ) = Prb,a (ω, θ 0, ) Pr b,a (ω ) P θ = r (ω, θ 0 θ,a )b (θ ) Pθ q (ω a,f (θ ))b (θ ). Gven any value functon W : (Θ ) <, defne the Bellman operator Γ W by Γ W (b )= X max b (θ ) X a A θ j θ j Call W (b ) a belef-based value functon. (2) g (a,f (θ )) + δ X W (χ [a, ω,b ])q (ω a,f (θ )). ω The standard theory of dynamc programmng shows that the optmal value functon W (b ), whch represents the maxmum dscounted payoff to player under current belef b, s the unque fxed pont Γ W = W.Furthermore,W can be obtaned as the lmt of the sequence of value functons {W n n+1 },wherew = Γ W n (W 0 can be any functon). The theory of POMDP shows that the Bellman operator Γ W (2) has a much smpler expresson for some W. To explan how t works, we need to ntroduce a couple of concepts. Frst, for any fnte set of player s path automata M = {M 1,...,MK }, consder a path automaton such that 7

8 1. t starts wth a state where some pure acton a s played and 2. after ω s realzed, an automaton n M, denoted by M (ω ), s mplemented. Such a path automaton (a,m ( )) s called a one-shot extenson of path automata M. It s a path automaton constructed by attachng an ntal state to the set of path automata M. Let fm denote the set of all one-shot extensons of M. Note that M f has a fnte number (= A M Ω )ofelements. Second, for any path automaton M of player, letv M θ be player s payoff assocated wth M, when the opponents states are θ. Then the expected payoff assocated wth M under belef b 4 (Θ ) s gven by V M (b )= X v M θ b (θ ). (3) θ As an expected payoff, ths functon s lnear n belef b. Ths fact plays an mportant role n what follows. Gven those concepts, the essence of POMDP can be summarzed by the followng proposton. It shows that the Bellman operator Γ W (2) has a very smple representaton for some partcular value functons W : Proposton 1 (POMDP) Let M be a set of player s path automata and consder value functon W (b ) max M M V M (b ). Then, the Bellman operator for ths value functon s gven by Γ W (b )= max M M V M (b ), where f M s the set of one-shot extensons of M. The proof s gven by a drect calculaton. Here we provde an ntuton. The value functon W (b ) max M M V M (b ) s smply player s payoff under the constraned best reply to b (gven that player must chose a path automata n set M ). Hence, the maxmand on the rght-hand sde of the Bellman operator Γ W (2) s the payoff assocated wth some acton a today, followed by a best path automata n M. Hence, Γ W (b ) s the payoff assocated wth the best one-shot extenson of M, gven belef b (=max M M V M (b )). Why s ths proposton useful? In prncple, computng Γ W (b ) for all b 4 (Θ ) nvolves uncountably many calculatons (one for each b ). However, computaton of max M M V M (b ) s easy. Ths s smply the upper envelope of a fnte number of lnear functons V M (b ), M M f, and t can be exactly calculated n a fnte number of steps. Now let us formally explan how the theory of POMDP works. Recall that our goal s to verfy that the gven fnte path preautomaton profle m =(m 1,...,m N ) 8

9 can consttute a fnte state equlbrum. follows. The POMDP procedure s descrbes as P OMDP V alue Iteraton: Defne the ntal canddate path automata as M 0 {(m, θ )} θ Θ. In step n (n=1,2...), gven the set of ntal canddate path automata M n 1, compute the value functon by W n (b )= max M M n 1 V M (b ), where M fn 1 s the set of one-shot extensons of path automata M n 1. Ths can be computed n a fnte number of steps, because W n s the upper envelope of a fnte number of lnear functons V M (b ). Then, construct the set of canddate path automata for the nest step by "cream skmmng" the one-shot extensons: n M n M M f o n 1 W n (b )=V M (b ) for some b. Ths defnes an ncreasng sequence of value functons W 0 W 1,where, for each n<, W n s a pecewse lnear, convex and contnuous functon (because t s the upper envelope of a fnte number of lnear functons). The optmal value functon s obtaned as the lmt W =lm n W n. The monotoncty of the sequence of value functon s shown by a standard argument. It s easy to show W 0 W 1.4 By the defnton, operator Γ s monotone: W 1 W 0 mples W 2 = Γ W 1 Γ W 0 = W 1. Proceedng nductvely, we obtan W 0 W 1... Also the followng s a standard result n POMDP. Lemma 1 The optmal value functon W s convex and contnuous. 4 The proof W 0 W 1 : If the transton functons of canddate automata n M 0 are determnstc, the proof s trval (because M 0 M 0 ). Here we consder the case wth stochastc transton functons. W 1 (b ) s the payoff when () an optmal acton a s played today and () the contnuaton play after ω s gven by a best on-path automaton n M 0, gven the posteror belef b = χ [a, ω,b ]. On the other hand, W 0 (b ) s contnuaton payoff s asspcated wth some on-path automataon n M 0. Hence, t s the payoff when () some acton s played today and () the contnuaton play after ω s gven by a probablty dstrbuton over M 0 (ths s gven by the possbly stochastc transton functon T of preoutomaton m ). Snce the choce of contnuaton automata n () may not be optmal, W 0 (b ) s no greater than W 1 (b ), whch uses optmal contnuaton automata n M 0. Q.E.D. 9

10 Proof. Let M be the set of optmal path automata (.e., an element of M s a path automaton whch s optmal for some belef b ). Then, W (b ) s the upper envelope of a famly of lnear functons V M (b ), M M. Hence t s convex and contnuous. Remark 2 An mportant goal n the lterature on POMDP s to desgn a fast computer algorthm to mplement the value teraton descrbed above, and there are some free programs ("POMDP solvers") avalable over the nternet. The POMDP value teraton provdes a teratve procedure to fnd player s best reples aganst m j, whch represents varous contnuaton strateges the opponents mght use n the equlbrum. In each step n, afnte number of path automata M n s gven and the procedure search for better reples n a neghborhood of M n. Ths neghborhood (denoted M gn ) s the set of one-shot extensons of M n. The cream skmmng procedure s just to fnd undomnated strateges n M gn (n the game wth restrcted strategy spaces gven by M gn and m ). (Note that the cream skmmng procedure s to fnd strateges whch can be a best reply to some jont dstrbuton over the opponents states. A standard result n game theory shows that such a strategy s equvalent to a strategy that s not strctly domnated.) The POMDP teraton shows that repeated search for better reples by fndng undomnated oneshot extensons eventually leads to the best response. A partcularly useful mplcaton s that verfyng optmalty s easy. To show that a fnte set of path automata M provdes best reples to m, one only needs to show that there s no better reples n the one-shot extensons of M. Ths s a counterpart of the celebrated one-shot devaton prncple n the repeated games wth publc montorng. We elaborate on ths pont n Secton Fnte State Equlbrum as Multperson POMDP So when does a gven fnte state path automaton m =(m 1,...,m N ) consttute a fnte state equlbrum? Observe that (m, θ ) s optmal gven b 4 (Θ ) f and only f the dscounted value generated by path automaton (m, θ ) s exactly equal to the optmal value functon W (b ). Hence m s compatble f and only f W (b )=W 0 (b ) for some b for every. Proposton 2 Aprofle of path preautomata m =(m 1,...,m N ) s compatble f there exsts b (Θ ) such that W (b )=W 0 (b ) for =1,...,N. Once we verfy that m s compatble, then we need to fnd a jont dstrbuton r on (Θ) such that (m, r) consttutes a correlated sequental equlbrum. Thus our basc computatonal procedure works as follows. 10

11 Verfcaton Procedure: Gven fnte path preautomata m =(m 1,...,m N ) 1. Use the theory of POMDP to fnd the optmal value functon W. 2. Look for the regon of belefs where the optmal value concdes wth the payoff assocated wth a canddate path automaton (m, θ ): B θ {b W (b )=V (m,θ ) (b )}. Snce W s convex and contnuous and V (m,θ ) s lnear, B θ s a (possbly empty) closed and convex set. If ths s empty for all θ, m cannot consttute an equlbrum. Otherwse, proceed to Fnd an ntal correlaton devce r (Θ) such that r (θ ) > 0 mples r ( θ ) B θ 6= (r s the margnal dstrbuton, and r ( θ ) s the condtonal dstrbuton of θ ). If there s such r, then(m, r) s a fnte state equlbrum. If there s no such r, m cannot consttute an equlbrum. We frst dscuss the frst two steps, whch concerns wth the compatblty of path automata, then dscuss the thrd step n the next subsecton. To verfy compatblty, we frst need to fnd the optmal value functon W. One possblty s that we fnd a fxed pont W n = Γ W n = W n a fnte number of step. We have examples of ths case n Sectons It s sometmes useful to focus on a subset of belefs. A belef-closed set X (Θ ) for player s the set of player s belefs such that player s posteror belef wll never leave t f player s current belef s n t after any prvate hstory of player (ncludng off the equlbrum hstory). Formally, X s a belef-closed set f χ [a, ω,b ] X for any a A, ω Ω and b Θ. If we can fnd a fxed pont W = Γ W on X,thentmustbethecasethatW (b )=W (b ) on X. Ths s because player s dynamc programmng problem s unchanged even f we use a smaller state space X when the ntal belef b s n X. Examples of ths case are provded n Sectons The prevous case s a specal case where X = (Θ ), whch s obvously a belef-closed set. The followng proposton summarzes ths observaton. Proposton 3 Aprofle of path preautomata m =(m 1,...,m N ) s compatble f there exsts a belef-closed set X (Θ ) and W for each such that (1) W (b )= Γ W (b ) for every b X and (2) W (b 0 )=V (m,θ ) (b 0 ) for some b0 X and some θ. 11

12 Another possblty s that W n s strctly ncreasng n each step (at least for some belefs), hence t takes (or seems to take) nfnte teratons to reach the optmal value functon. Now suppose that, n such a case, there s a fnte n and some belef b where W n (b )=V (m,θ ) (b ) (= W 0 (b )). Even f ths holds for a farly large n, however, there s a possblty that n a succeedng step k>nthe value functon shfts upward W k (b ) >V (m,θ ) (b ) and the optmalty of automaton (m, θ ) at belef b s dsproved. How can we verfy the optmalty of a canddate automaton n a fnte step, whenw n does not (seem to) converge to W n any fnte step? The followng proposton addresses ths ssue. To state our proposton, we need to defne the followng concepts. Gven a profle of on-path preautomata m, wedefne a profle of on-path belef closed sets for player as X =(X (θ )) θ Θ, that satsfes θ X (θ ) 4 (Θ ) (X (θ ) can be an empty set) and the followng property: For any θ 0, θ 00,b, ω, f b X (θ 0 ) and T (θ θ 0, ω ) > 0, then χ [f (θ 0 ), ω,b ] X (θ ). Ths means that player s posteror belef never leaves the on-path belef closed sets as long as player does not devate from the specfed acton at each state. Let p (M,X,b ) be the probablty that the posteror belef n the next perod moves outsde of X when the current belef s b and a path-automaton M s played. For any belef-based value functons V and W,defne V W sup b (Θ ) V (b ) W (b ). Proposton 4 (Optmalty Verfcaton n A Fnte Step) Let X =(X (θ )) θ Θ be any profle of on-path belef closed sets of player wth respect to m. Suppose that, at the nth step of the POMDP value teraton, for any θ and b X (θ ), we have W n(b )=V (m,θ ) (b ) and V (m,θ ) (b ) max M M n ½ ¾ V M (b )+p(m,x,b ) δn+1 1 δ W 1 W 0. (4) Then, path automaton (m, θ ) s optmal at all b X (θ ): W (b )=V (m,θ ) (b ) for any b X (θ ) and any θ Θ. Remark 3 Ths proposton s useful when () W n = V (m,θ ) for some belefs but () for some other belefs the value teraton takes (or seems to take) nfnte steps to reach the optmal value functon W. The ntuton of ths proposton s qute smple. The standard theory of dynamc programmng shows that the value functon at the nth teraton s very close to the optmal one, for a large n. In partcular, the optmal value functon s bounded above by W n + 1 δ δn W 1 W 0, and the condton (4) bascally says that any one-shot devaton s unproftable, even when the player can receve ths upper bound of the optmal value off the equlbrum path. 12

13 Proof. Defne U 0 by U 0 (b ) = V (m,θ ) (b ) f b X (θ ) for some θ and U 0 (b )=W ) for every other belef (outsde of X ). (U 0 (b ) s well-defned even when b belongs to more than one sets X (θ ),X (θ 0 ),..., because our premse requres V (m,θ ) (b )=V (m,θ 0 ) (b )= = W n(b ).) We wll show that U 0 = W. We frst show that U 0 (b )=Γ U 0 (b ) f b X (θ ) for some θ. When we denote player s current expected stage payoff gven current acton a and belef b by u (a,b ) and the posteror belef n the next perod by b 0, Γ U 0(b ) s expressed as Γ U 0 (b ) = max u (a,b )+δe[u 0 (b 0 ) a,b ] ª a A = u (a 1,b )+δe[u 0 (b 0 ) a 1,b ]. Snce U 0 canbeexpressedas ( U 0 (b 0 W )= n(b0 ) ( = V (m,θ ) (b 0 ) )fb0 X (θ ) for some θ W (b0 ) otherwse, we have Γ U 0 (b ) = u (a 1,b )+δe[w n (b 0 )+ U 0 (b 0 ) W n (b 0 ) ª a 1,b ]. u (a 1,b )+δe[w n (b 0 ) a 1,b ]+p M,X 0,b δ W W n, where M 0 s any path automaton whose ntal acton s a1.recallthatp(m 0,X,b ) s the probablty that the posteror b 0 moves out of the profle of belef-closed sets ( b 0 / X (θ ) for any θ ) under automaton M 0, so that t only depends on the ntal acton of M 0. Now, by the standard theory of dynamc programmng ( W W n 1 δ δn W 1 W 0 ), we have Γ U 0 (b ) max u (a,b )+δe[w a n (b 0 ) a,b ] ª + p M,X 0 δ n+1,b 1 δ W 1 W 0. Snce the frst term on the rght hand sde s equal to Γ W n we have Γ U 0 (b ) =max M M n V M (b ), ½ ¾ max V M M M n (b )+p(m,x,b ) δn+1 1 δ W 1 W 0. (5) By the premse of the proposton (4), ths s no greater than V (m,θ ) (b ). Snce U 0(b )=V (m,θ ) (b ) for b X (θ ),wehaveestablshedthatγ U 0(b ) U 0(b ) f b X (θ ) for some θ. Now we show Γ U 0(b ) U 0(b ) f b X (θ ) for some θ. Ths follows from the fact that, f b X (θ ) for some θ, Γ U 0 (b ) u (f (θ ),b )+δe[u 0 (b 0 ) f (θ ),b ]=V (m,θ ) (b )=U 0 (b ). 13

14 The frst equalty holds because of the followng reasonng. Suppose that player plays acton f (θ ) today under current belef b X (θ ), and consder the case where the posteror belef n the next perod s b 0 X (θ 0 ). Ths means that tomorrow s state s θ 0 (for a moment, consder the case where (m, θ ) has determnstc transton functon). Therefore, the contnuaton payoff of automaton (m, θ ) s V (m,θ 0 ) (b 0 ). By defnton, ths s equal to U 0 (b0 ) (because b0 X (θ 0 )). Hence, u (f (θ ),b )+ δe[u 0 (b0 ) f (θ ),b ] represents the value assocated wth automaton (m, θ ),sothat t s equal to V (m,θ ) (b ). The case of stochastc transton functon s smlar. 5 Hence, we have shown U 0(b )=Γ U 0(b ) for b X (θ ),forsomeθ. Lastly, for b / X (θ ) for any θ,weshowγ U 0(b )=U 0(b ). ClearlyΓ U 0 (b ) W (b )=U 0 (b ) f b / X (θ ) for any θ,becauseu 0 (b ) W (b ) for every b. Hence, we have U 1 (b )=Γ U 0 (b ) U 0 (b ) for all b,andu n would be a decreasng sequence and lm n U n = W by the theory of dynamc programmng. Therefore, f U 1 (b0 )=Γ U 0 (b0 ) <W (b0 )(=U0 (b )) for some b 0 / X (θ ) for any θ,wewould have a contradcton W (b0 ) U 1 (b0 ) <W (b0 ). Hence,tmustbethecasethat Γ U 0 (b )=U 0 (b ) f b / X (θ ) for any θ. Those arguments have shown that U 0 s a fxed pont of Γ, so we obtan W = U 0 everywhere. In partcular, W (b )=U 0(b )=V (m,θ ) (b ) for any b X (θ ) and any θ Θ. An example of Proposton 4 (Example 4) s provded n Secton A Smpler Verfcaton Method When Off-Path Canddate Automata Are Also Gven Suppose that you have some guess about "comprehensve" path preautomata m and a belef-closed set X,=1,...,N that work. That s, m should nclude not only the automata to be played on the equlbrum path but also all automata used off the equlbrum path. You expect that W 0, whch s generated from m, s a fxed pont on X. We can use POMDP to verfy ths numercally as descrbed above. But there s a smpler, albet equvalent, way to verfy ths. When we have such a "comprehensve" canddate m, ourverfcaton procedure descrbed above bols down to: Belef-Based One-Shot Devaton Prncple: Canddate preautomaton m provdes the best reples aganst m on a belef-closed set X,ftcannotbemproved upon by one-shot extensons. That s, there s no one-shot extenson M of 5 When m has a stochastc transton functon, two staes θ 0 and θ may be possble for a gven posteror belef b 0 (because of the stochastc transton functon). Hence the same posteor belef b 0 may be contaned n X (θ 0 ) and X (θ ). However, the premse n ths proposton guarantees V (m,θ 0 ) (b 0 )=V (m,θ ) (b 0 )(=W n (b 0 )), sothatwecanemploythesameargumentasbefore. 14

15 {(m, θ ) θ Θ } and b X such that V M (b ) >V (m,θ ) (b ) for all θ. Ths statement s equvalent to Γ W 0(b )=W 0(b ) for all b X,wthW 0(b )= max θ V (m,θ ) (b ). In what follows, we show that, nstead of checkng Γ W 0(b )= W 0(b ) for all belefs, we only need to check ths at a fnte number of belefs (ths s essentally because the maxmzed value functon s the upper envelope of lnear functons). Let B θ,0 4 (Θ ) be the set of belefs where the followng holds W 0 (b )=V (m,θ ) (b ). Clearly [ B θ,0 covers 4 (Θ ). Snce each B θ,0 s an ntersecton of a fnte number θ of half spaces (and a hyperplane P θ b (θ )=1), t s a closed convex polyhedron. Take any convex polyhedron D such that X D. Defne a fnte subset of D as follows: n D θ = b D b s an extreme pont of D B θ,0 o, θ Θ. Remember that the objectve functon of the dynamc programmng problem s lnear n belefs. Hence, f we can verfy that Γ W 0(b )=W 0(b ) for every b D θ, whchsjustasetoffnte ponts, then we also obtan Γ W 0(b )=W 0(b ) every where n D B θ,0.e. W 0 s a fxed pont on D ( X ). Fgure 1 llustrates our pont. Ths corresponds to a two-player game, where the canddate path pre automaton of each player has three states, θ 1, θ 2,andθ 3. In the fgure, D B θk,0 s denoted by B k. D s the set of all belefs (= 4 (Θ )) n ths example, and player has canddate optmal plans m, θ 1, m, θ 2,and m, θ 3 [. conssts of seven ponts b 1,..., b 7, and each of them can be easly computed. θ D θ For example, b 3 s a soluton to the system of lnear equaltes V (m,θ 1 ) (b )=V (m,θ 2 ) (b ) V (m,θ 2 ) (b )=V (m,θ 3 ) P (b ). θ b(θ )=1 Let Θ b Θ be the set of states such that B θ,0 ncludes b 0 [ θ D θ. Our argument above can be summarzed as follows: Proposton 5 (Fnte Verfcaton of the Belef-Based One-Shot Devaton Prncple) To verfy Γ W 0 = W 0 =max θ V (m,θ ) on a belef-closed set X,t 15

16 b 1 b 2 b 3 B 1 b 4 B 2 B 3 b 5 b 6 b 7 Fgure 1: s suffcent to check the followng condtons ("no gan from one-shot devatons") at a fnte number of "extreme pont" belefs b [ D θ, θ (1) for each θ Θ b, path automaton (m, θ ) transts to an optmal path automata among m, θ 0 ª, θ 0 Θ after every realzaton of prvate sgnal ω (because (m, θ ) must be optmal gven b ),.e. χ [f(θ ), ω,b ] B θ0,0 for any ω and θ 0 wth T θ 0 θ, ω > 0. (2) for any acton a A that s not played by any θ Θ b, no one-shot extenson (a,m ( )) should generate a larger value than V (m,θ ) (b ) for any θ Θ b Ṡecton 6.2 provdes an example of ths proposton. 5.2 When Can We Fnd a Rght Intal Correlaton Devce? We show that, when a profle of path preautomata s compatble, the exstence of a rght ntal correlaton devce s guaranteed under a mld set of assumptons. Theorem 4 Let m (Θ,f,T ), =1,...,N be the canddate fnte path preautomata, and suppose that () Markov chan nduced by (m 1,...,m N ) on Θ has a unque recurrent communcaton class 6 X Θ () X contans an aperodc state 7 6 AsubsetX Θ s called a recurrent communcaton class f () any two ponts n X are mutually reachable and () no pont θ / X s reachable from any state wthn X. 7 Astatex Θ s aperodc f the greatest common dvsor of {t >0 Pr(θ(t) =x θ(0) = x) > 0} 16

17 and () for all, there s a belef b 0 such that V θ0 (b 0 )=W (b0 ) for some θ0 Θ, where W s the optmal value functon. Then, there s a probablty dstrbuton μ over X such that choosng the ntal states by μ and followng (m 1,...,m N ) s a correlated equlbrum. Example 2 Let us present an example of condtons () and (). Ths s the example n the next secton. Let m be the path preautomaton assocated wth the grm trgger strategy n our prsoner s dlemma game. The state space s Θ = {R, P }, andr and P are the cooperatve and non-cooperatve states respectvely. The Markov chan nduced by (m 1,m 2 ) has a unque recurrent communcaton class X = {(P, P)}, and (P, P) s an aperodc state (because {t >0 Pr(θ(t) =(R, R) θ(0) = (R, R)) > 0} = {1, 2,...}). Remark: A smlar observaton s made n Phelan and Skrzypacz (2009). It seems that, lke us, PS mpose some suffcent condtons on the transton of Markov chan so that there s the unque and globally stable ergodc dstrbuton. Snce the jont dstrbuton on Θ converges to the statonary dstrbuton for any ntal dstrbuton, they can use the statonary dstrbuton as the ntal jont dstrbuton to check f the fnte extended automata to be an equlbrum. Then they check whether one-shot devaton constrant s satsfed at every possble subjectve belef condtonal on each state (but the set of condtonal subjectve belefs s larger n ther settng because they keep track of belefs of on and off the equlbrum path). On the other hand, we frst fnd a belef on whch a gven path automaton s optmal. Then we mpose suffcent condtons for the exstence of unque and globally stable ergodc dstrbuton on Θ andverfythatpathpreautomatonsoptmalnthelmt,.e. when the ergodc dstrbuton s used as a correlaton devce. Proof. Let MC(Θ,m) be the Markov chan nduced by (m 1,...,m N ) on Θ. The theory of Markov chan shows that there s a unque nvarant dstrbuton μ of MC(Θ,m) and () ts support s X and () for any ntal dstrbuton of state profle μ 0 (Θ), the probablty dstrbuton of state profle at tme t, denotedby μ t, converges to μ. By condton () n the Theorem, each player has a belef b 0 under whch automaton (m, θ 0 ) s hs optmal strategy. Let μ 0 be the ntal dstrbuton of Markov chan MC(Θ,m) where player s n state θ 0 and the probablty of other players states s gven by b 0 (θ ). (More precsely, μ 0 (θ 0, θ ) = b 0 (θ ) and μ 0 (θ) =0f θ 6= θ 0.) Let μ t be the probablty dstrbuton of θ(t), gvenμ 0. Then we have lm t μ t = μ (μ s the nvarant dstrbuton wth support X). Snce μ assgns a strctly postve probablty for any θ X, sodoesμ t for all suffcently large t. Hence, for any θ X and any, the condtonal dstrbutons gven θ are well defned for all large t and we have s one. lm t μt (θ θ )=μ(θ θ ) for all θ X. 17

18 Fx any θ X and let H t(θ ) be the set of possble prvate hstores leadng to state θ at tme t: Formally, H t(θ ) s the set of prvate hstores h t such that () θ (0) = θ 0, θ (t) =θ,and()h t realzes wth a postve probablty under Markov chan MC(Θ,m) wth ntal dstrbuton μ 0. The above argument shows that H t(θ ) 6=, for all suffcently large t. Then, the condtonal probablty s well defned for all large t and expressed as μ t Pr(θ(t) =θ) (θ θ )= Pr(θ (t) =θ ) P h = t Ht (θ ) b (θ h t )Pr(ht ) Ph t Ht (θ ) Pr(ht ), (6) where b (θ h t ) s the condtonal probablty of θ (t) =θ gven prvate hstory h t (thssequaltoplayer s belef after ht ). Snce automaton (m, θ 0 ) s optmal for player at t =0, followng t after any prvate hstory whch realzes wth a postve probablty should also be optmal. In partcular, after prvate hstory h t Ht (θ ) player s n state θ and therefore followng automaton (m, θ ) must be optmal after h t. Note also that player has belef b ( h t ) after ht. Those facts, taken together, mply that automaton (m, θ ) s optmal under belef b ( h t ), for any ht Ht (θ ). Now let B θ be the set of belefs over θ under whch automaton (m, θ ) s optmal for player. Snce () B θ = {b W (b )=V θ (b )} () the optmal value functon W s a contnuous convex functon and () V θ ( ) s a lnear functon, B θ s a closed convex set (ths may be stated as a lemma elsewhere. Also note that a drect proof, based on the lnearty of expected payoff wth respect to belefs, s easy.). Equaton (6) and our argument n the prevous paragraph show that μ t ( θ ) s a convex combnaton of the belefs (.e., b (θ h t ), ht Ht (θ )) nb θ. By the convexty of B θ,wehave μ t ( θ ) B θ for all t t. By the closedness of B θ and lm t μ t ( θ )=μ( θ ) mples μ( θ ) B θ for all θ X. Ths means that the jont dstrbuton μ over ntal state profle θ X nduces a correlated equlbrum. 18

19 6 Examples 6.1 Example 1 We apply the POMDP technque to the prsoner s dlemma model analyzed by Sekguch (1997) and Bhaskar and Obara (2002). The stage payoff s gven by C D C 1, 1 1+g, l D l,1+g 0, 0 Each player s prvate sgnal s ω = c, d, whch s a nosy observaton of the opponent s acton. For example, when the opponent chose C, player s more lkely to receve the correct sgnal ω = c, but sometmes an observaton error provdes a wrong sgnal ω = d. More precsely, we assume that exactly one player receves a wrong sgnal s ε > 0 and both receve wrong sgnals wth probablty ξ > 0. For example, when acton profle s (C, C), the jont dstrbuton of prvate sgnals s c d c 1 2ε ξ ε d ε ξ Hence n ths model we have fve parameters g, l, ε, ξ, and δ. In Example 1, we assume that ε =(1 r)r, ξ = r 2 (ndependent observaton errors), r =1/6, g =1, l =1,andδ =0.9. The canddate path preautomaton m corresponds to the grm trgger strategy and s shown n the followng fgure.we show that ths preautomaton consttutes a correlated equlbrum, gven some ntal dstrbuton of states. Frst, we determne v θ θ j,player s payoff under state profle (θ, θ j ). Ths can be done by fndng a soluton to the system of equatons The soluton shows v RR =1+δ (1 2ε ξ)v RR + εv RP + εv PR + ξv PPª v RP = l + δ (ε + ξ)v RP +(1 ε ξ)v PPª v PR =(1+g)+δ (ε + ξ)v PR +(1 ε ξ)v PPª. (7) v PP =0+δv PP v RR = ,v RP = ,v PR = ,v PP =0. Let b denote player s belef that j s n state R, andletv θ (b) the expected payoff to player when hs state s θ = R, P. 8 Then, the graphs of V θ (b) = bv θ R +(1 b)v θ P, θ = R, P are gven n the followng fgure. The horzontal axs measures b. 8 In our general notaton, v θ θ j = v (m,θ )θ j and V θ = V (m,θ ). 19

20 = c,d = c P acton = D = d R acton = C Fgure 2: y x From top to bottom (on the y axs) V P (blue) and V R (black). The ntal value functon n the POMDP teraton W 0 s the upper envelope of those two lnear functons (W 0 (b) =max{v R (b),v P (b)}). In the frst step of POMDP, we consder one-shot extensons of path automata {(m,r), (m,p)}. Let M a zz 0 (a = C, D and z,z 0 = P, R) be the one-shot ex- 20

21 tenson that () starts wth a state wth acton a, () moves to state z of m after ω = c, and () moves to state z 0 of m after ω = d. For example, M CRP starts wth acton C and goes to state R (to play the grm trgger strategy) after ω = c and moves on to P (to play permanent defecton) after ω = d. It s easy to see that M CRP actually mplements the same strategy as (m,r) (the grm trgger strategy). To perform the cream skmmng of those one-shot extensons, lookng at the belef dynamcs s useful. Let χ a ω (b) denote the posteror probablty of θ j = R when the current acton, sgnal, belef of player are a, ω,andb. By Bayes rule we have b(1 2ε ξ) χ Cc (b) = b(1 ε ξ)+(1 b)(ε + ξ), χ Cd (b) = χ Dc (b) = bε b(ε + ξ)+(1 b)(1 ε ξ), bε b(1 ε ξ)+(1 b)(ε + ξ), and bξ χ Dd (b) = b(ε + ξ)+(1 b)(1 ε ξ). For our parameter values n ths example, t can be checked that χ ad (b) < χ ac (b) for all b and a = C, D (bad sgnal d always reduces the probablty that the opponent s stll cooperatng). Let b be the pont where V R and V P ntersect. Snce ½ V W 0 (b) f b b (b) = V P (b) f b b, W 0 (χ ac (b)) = V P (χ ac (b)) mples W 0 (χ ad (b)) = V P (χ ad (b)). Ths means that, for any b, the maxmum value to acheve ΓW 0 (b) n the Bellman equaton (2) s not acheved by M CPR or M DPR. Hence, for any b, ΓW 0 (b) s the assocated wth one of the followng sx automata: M CRR,M CRP,M CPP,M DRR,M DRP,M DPP. Note that M CRP and M DPP mplement the same strateges as M R (the grm trgger strategy) and M P (permanent defecton). Let V a zz 0 (b) be the expected payoff to player, when he plays ths automaton M a zz 0 under belef b. Ths s a lnear functon n belef b: V a zz 0 (b) =bv a zz 0,R +(1 b)v a zz 0,P, (8) where v a zz 0,R s the payoff under automaton M a zz 0 when the opponent plays (m j,r) (v a zz 0,P s defned smlarly). To compute v a zz 0,θ j,wecanjustusev R (b), V P (b) and belef functons χ a ω (b). For example, v Czz0,R (zz 0 = RR, P P ) s determned by ³ v Czz0,R = g 1 (C, C)+δ V z (χ Cc (1)) Pr(ω = c CC)+V z0 (χ Cd (1)) Pr(ω = d CC). 21

22 Computaton shows v CRR,R = ,v CPP,R = ,v CRR,P = ,v CPP,P = 1, v DRR,R = ,v DRP,R =1.7059,v DRR,P = ,v DRP,P = Then, the new value functon W 1 (b) s the upper envelope of sx lnear functons V a zz 0 (b), a zz 0 = CRR,CRP,CPP,DRR,DRP,DPP. It s gven by the followng fgure. y x -1-2 From top to bottom (on the y axs) V P = V DPP (blue), V DRP (green dotted), V CPP (pnk dotted), V DRR (brown dotted), V R = V CRP (black), and V CRR (red). POMDP to compute W 1 The upper envelope W 1 s almost the same as the orgnal value functon W 0 (= the upper envelope of the blue and black lnes). Note that the red lne (V CRR ) concdes wth W 1 only for very hgh belef b [ , 1], andonecancheck that χ a ω (b) does not le n ths regon for any a, ω and b. Ths mples that W 1 (χ a ω (b)) = W 0 (χ a ω (b)) for all a, ω,andb, whchshowsγw 1 = ΓW 0 = W 1. Hence POMDP converged to the optmal value functon W = W 1 on Θ =[0, 1] n two steps. Note that we can fnd a fxed pont one step earler f we restrct attenton to a belef-closed set X =[0, ( )], where s a fxed pont of χ Cc n (0, 1). 22

23 If the current belef s n X,then every posteror belef s n X gven any (a, ω ) because there s always a postve probablty that a bad sgnal s observed even when C s played. Snce V CRR matters only for b = , W 0 s the fxed pont on X. Concluson: The above fgure and our computaton show 1. the optmal value functon W concdes wth V P for b [0, 0.625] 2. the optmal value functon W concdes wth V R for b [0.625, ] Hence, when a correlaton devce can generate ntal belefs b R [0.625, ] and b P [0, 0.625], we get a correlated equlbrum where players use automata M R (the grm trgger strategy) and M p (permanent defecton). One such equlbrum s the mxed strategy equlbrum dentfed by Bhaskar and Obara, n whch the ntal belef s gven by (where the player s ndfferent between M R and M P ). 6.2 Example 1 - A Smpler Verfcaton The prevous secton llustrates how POMDP works, but for ths partcular example, there s a much smpler verfcaton of equlbrum condtons, based on Proposton 5(Fnte Verfcaton of the Belef-Based One-Shot Devaton Prncple). Let b c = ( ) be the fxed pont of χ Cc. Then, one can see that X =[0,b c ] s a belef-closed set. The reason s the followng. If we start wth b XÂ{0} and play C and observe ω = c repeatedly, the posteror belef ncreases and approaches the fxed pont b c. Ths s because the graph of posteror belef functon χ Cc s ncreasng and cross the 45 o degree lne at b c from above (.e., b c s a stable fxed pont). If current acton and sgnal s not (C, c), the posteror belef goes down. Fnally, b =0s absorbng. Hence X s belef-closed. We wll show that ΓW 0 = W 0 =max θ =R,P V θ on X. Proposton 5 shows that we only need check ths at three ponts, 0,b (the belef at whch V P and V R cross), and b c. 23

24 Checkng ΓW 0 (0) = W 0 (0) s trval, because b =0means that the opponent s playng D forever and the best reply s permanent defecton. To check ΓW 0 (b )= W 0 (b ), we check condton (1) n Proposton 5. If D s played today, the posteror belef s always less than b and the optmal contnuaton (accordng to W 0 )s permanent defecton. Ths s exactly what automaton (m,p) specfes. Now suppose C s played today. If the prvate sgnal s c, the posteror belef moves up (t s above b ), and the optmal contnuaton (accordng to W 0 ) s grm trgger. If the prvate sgnal s d, the posteror belef goes down (t s below b ), and the optmal contnuaton (accordng to W 0 ) s permanent defecton. Agan, ths s exactly what automaton (m,r) specfes. Hence condton (1) of Proposton 5 s satsfed. Lastly, let us check ΓW 0 (b c )=W 0 (b c ). We know χ Cc (b c )=b c and calculaton shows χ Cd (b c ) <b, so condton (1) s satsfed. When D s played today, χ Dc (b c ) and χ Dd (b c ) are both below b, and therefore the maxmum value of one-shot extenson whch plays D s gven by V P (b c ) (permanent defecton: the blue lne n the fgure). Snce ths s less than V R (b c ), the condton (2) of Proposton 5 s satsfed. Hence, the analyss n Sekguch (1997) and Bhaskar and Obara (2002) can be much smplfed as above, accordng to our general belef-based methodology. 6.3 Example 2 Modfy the stage game of the prevous example as follows: C C0 D C 1, 1 1, 1 ε l, 1+g C 0 1 ε, 1 1 ε, 1 ε l ε, 1+g D 1+g, l 1+g, l ε 0, 0 You can nterpret C 0 as C plus some montorng actvty that costs ε > 0. If a player chooses C 0, then he can observe the other player s acton perfectly. (Ths volates 24

25 our full support assumpton, but the analyss below remans essentally the same even when we assume that a player wth acton C 0 can observe the other player s acton almost perfectly.) From the other player, C and C 0 are ndstngushable,.e. the other player observes c wth probablty 1 r and d wth probablty r whether C or C 0 s played. We show that the grm trgger strategy stll consttutes a correlated sequental equlbrum for some ntal dstrbuton. Let s use the same parameter values for g, l, r and δ as before. Let V R and V P be the value functon assocated wth R and P as before. In the frst step of POMDP, we need to consder eght one-shot extensons ths tme: M a,z,z 0,a {C, C 0,D},z,z 0 {R, P }. Agan we don t have to consder a combnaton of (z,z 0)=(R, P ). Let V a,z,z 0 be the dscounted payoff functon assocated wth M a,z,z 0. We already know that V CPP,V DRR,V DRP are domnated. We have three new payoff functons because of C 0 : V C0RR,V C0RP,V C0PP. It s easy to show that we only need to consder V C0RP (remember that montorng s perfect gven C 0 ). How does ths new functon V C0RP affect the updated value functon? Suppose that ε =0for the tme beng. Then M C0RP domnates M CRR and M CRP because C 0 generates the same expected payoff as C n the current perod, but more nformatve than C. So V C0RP s above V CRR or V CRP. Only at b =1,V C0RP concdes wth V CRR because the player s sgnal does not convey any nformaton about the the other player s current state (whch he knows to be R). Thus V C0RP looks as follows. 9 9 Note that v C0 RP,R = v CRP,R and v C0 RP,P = l = 1. Hence (wth ε =0), V C0 RP (b) =bv C0 RP,R +(1 b)v C0 RP,P =3.1176b (1 b). 25

Finite State Equilibria in Dynamic Games

Finite State Equilibria in Dynamic Games Fnte State Equlbra n Dynamc Games Mchhro Kandor Faculty of Economcs Unversty of Tokyo Ichro Obara Department of Economcs UCLA June 21, 2007 Abstract An equlbrum n an nfnte horzon game s called a fnte state

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

Tit-For-Tat Equilibria in Discounted Repeated Games with. Private Monitoring

Tit-For-Tat Equilibria in Discounted Repeated Games with. Private Monitoring 1 Tt-For-Tat Equlbra n Dscounted Repeated Games wth Prvate Montorng Htosh Matsushma 1 Department of Economcs, Unversty of Tokyo 2 Aprl 24, 2007 Abstract We nvestgate nfntely repeated games wth mperfect

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

The Folk Theorem for Games with Private Almost-Perfect Monitoring

The Folk Theorem for Games with Private Almost-Perfect Monitoring The Folk Theorem for Games wth Prvate Almost-Perfect Montorng Johannes Hörner y Wojcech Olszewsk z October 2005 Abstract We prove the folk theorem for dscounted repeated games under prvate, almost-perfect

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Coordination failure in repeated games with almost-public monitoring

Coordination failure in repeated games with almost-public monitoring Theoretcal Economcs 1 (2006), 311 340 1555-7561/20060311 Coordnaton falure n repeated games wth almost-publc montorng GEORGE J. MAILATH Department of Economcs, Unversty of Pennsylvana STEPHEN MORRIS Department

More information

Perfect Competition and the Nash Bargaining Solution

Perfect Competition and the Nash Bargaining Solution Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that Artcle forthcomng to ; manuscrpt no (Please, provde the manuscrpt number!) 1 Onlne Appendx Appendx E: Proofs Proof of Proposton 1 Frst we derve the equlbrum when the manufacturer does not vertcally ntegrate

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

k t+1 + c t A t k t, t=0

k t+1 + c t A t k t, t=0 Macro II (UC3M, MA/PhD Econ) Professor: Matthas Kredler Fnal Exam 6 May 208 You have 50 mnutes to complete the exam There are 80 ponts n total The exam has 4 pages If somethng n the queston s unclear,

More information

Portfolios with Trading Constraints and Payout Restrictions

Portfolios with Trading Constraints and Payout Restrictions Portfolos wth Tradng Constrants and Payout Restrctons John R. Brge Northwestern Unversty (ont wor wth Chrs Donohue Xaodong Xu and Gongyun Zhao) 1 General Problem (Very) long-term nvestor (eample: unversty

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

The RepeatedPrisoners DilemmawithPrivate Monitoring: a N-player case

The RepeatedPrisoners DilemmawithPrivate Monitoring: a N-player case The RepeatedPrsoners DlemmawthPrvate Montorng: a N-player case Ichro Obara Department of Economcs Unversty of Pennsylvana obara@ssc.upenn.edu Frst verson: December, 1998 Ths verson: Aprl, 2000 Abstract

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India February 2008

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India February 2008 Game Theory Lecture Notes By Y. Narahar Department of Computer Scence and Automaton Indan Insttute of Scence Bangalore, Inda February 2008 Chapter 10: Two Person Zero Sum Games Note: Ths s a only a draft

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

a b a In case b 0, a being divisible by b is the same as to say that

a b a In case b 0, a being divisible by b is the same as to say that Secton 6.2 Dvsblty among the ntegers An nteger a ε s dvsble by b ε f there s an nteger c ε such that a = bc. Note that s dvsble by any nteger b, snce = b. On the other hand, a s dvsble by only f a = :

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space.

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space. Lnear, affne, and convex sets and hulls In the sequel, unless otherwse specfed, X wll denote a real vector space. Lnes and segments. Gven two ponts x, y X, we defne xy = {x + t(y x) : t R} = {(1 t)x +

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

The equation of motion of a dynamical system is given by a set of differential equations. That is (1)

The equation of motion of a dynamical system is given by a set of differential equations. That is (1) Dynamcal Systems Many engneerng and natural systems are dynamcal systems. For example a pendulum s a dynamcal system. State l The state of the dynamcal system specfes t condtons. For a pendulum n the absence

More information

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION Advanced Mathematcal Models & Applcatons Vol.3, No.3, 2018, pp.215-222 ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EUATION

More information

6. Stochastic processes (2)

6. Stochastic processes (2) 6. Stochastc processes () Lect6.ppt S-38.45 - Introducton to Teletraffc Theory Sprng 5 6. Stochastc processes () Contents Markov processes Brth-death processes 6. Stochastc processes () Markov process

More information

6. Stochastic processes (2)

6. Stochastic processes (2) Contents Markov processes Brth-death processes Lect6.ppt S-38.45 - Introducton to Teletraffc Theory Sprng 5 Markov process Consder a contnuous-tme and dscrete-state stochastc process X(t) wth state space

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Endogenous timing in a mixed oligopoly consisting of a single public firm and foreign competitors. Abstract

Endogenous timing in a mixed oligopoly consisting of a single public firm and foreign competitors. Abstract Endogenous tmng n a mxed olgopoly consstng o a sngle publc rm and oregn compettors Yuanzhu Lu Chna Economcs and Management Academy, Central Unversty o Fnance and Economcs Abstract We nvestgate endogenous

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

Hidden Markov Models

Hidden Markov Models Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,

More information

A 2D Bounded Linear Program (H,c) 2D Linear Programming

A 2D Bounded Linear Program (H,c) 2D Linear Programming A 2D Bounded Lnear Program (H,c) h 3 v h 8 h 5 c h 4 h h 6 h 7 h 2 2D Lnear Programmng C s a polygonal regon, the ntersecton of n halfplanes. (H, c) s nfeasble, as C s empty. Feasble regon C s unbounded

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Learning from Private Information in Noisy Repeated Games

Learning from Private Information in Noisy Repeated Games Learnng from Prvate Informaton n Nosy Repeated Games Drew Fudenberg and Yuch Yamamoto Frst verson: February 18, 2009 Ths verson: November 3, 2010 Abstract We study the perfect type-contngently publc ex-post

More information

On Tacit Collusion among Asymmetric Firms in Bertrand Competition

On Tacit Collusion among Asymmetric Firms in Bertrand Competition On Tact Colluson among Asymmetrc Frms n Bertrand Competton Ichro Obara Department of Economcs UCLA Federco Zncenko Department of Economcs UCLA November 11, 2011 Abstract Ths paper studes a model of repeated

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

(1 ) (1 ) 0 (1 ) (1 ) 0

(1 ) (1 ) 0 (1 ) (1 ) 0 Appendx A Appendx A contans proofs for resubmsson "Contractng Informaton Securty n the Presence of Double oral Hazard" Proof of Lemma 1: Assume that, to the contrary, BS efforts are achevable under a blateral

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

Subjective Uncertainty Over Behavior Strategies: A Correction

Subjective Uncertainty Over Behavior Strategies: A Correction Subjectve Uncertanty Over Behavor Strateges: A Correcton The Harvard communty has made ths artcle openly avalable. Please share how ths access benefts you. Your story matters. Ctaton Publshed Verson Accessed

More information

Hidden Markov Models

Hidden Markov Models CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte

More information

Section 8.3 Polar Form of Complex Numbers

Section 8.3 Polar Form of Complex Numbers 80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Implementation and Detection

Implementation and Detection 1 December 18 2014 Implementaton and Detecton Htosh Matsushma Department of Economcs Unversty of Tokyo 2 Ths paper consders mplementaton of scf: Mechansm Desgn wth Unqueness CP attempts to mplement scf

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

Random Walks on Digraphs

Random Walks on Digraphs Random Walks on Dgraphs J. J. P. Veerman October 23, 27 Introducton Let V = {, n} be a vertex set and S a non-negatve row-stochastc matrx (.e. rows sum to ). V and S defne a dgraph G = G(V, S) and a drected

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

We Can Cooperate Even When the Monitoring Structure Will Never Be Known

We Can Cooperate Even When the Monitoring Structure Will Never Be Known We Can Cooperate Even When the Montorng Structure Wll Never Be Known Yuch Yamamoto Frst Draft: March 29, 2014 Ths Verson: Aprl 8, 2017 Abstract Ths paper consders nfnte-horzon stochastc games wth hdden

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness.

20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness. 20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The frst dea s connectedness. Essentally, we want to say that a space cannot be decomposed

More information

Infinitely Split Nash Equilibrium Problems in Repeated Games

Infinitely Split Nash Equilibrium Problems in Repeated Games Infntely Splt ash Equlbrum Problems n Repeated Games Jnlu L Department of Mathematcs Shawnee State Unversty Portsmouth, Oho 4566 USA Abstract In ths paper, we ntroduce the concept of nfntely splt ash equlbrum

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Edge Isoperimetric Inequalities

Edge Isoperimetric Inequalities November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Lecture 17 : Stochastic Processes II

Lecture 17 : Stochastic Processes II : Stochastc Processes II 1 Contnuous-tme stochastc process So far we have studed dscrete-tme stochastc processes. We studed the concept of Makov chans and martngales, tme seres analyss, and regresson analyss

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

First day August 1, Problems and Solutions

First day August 1, Problems and Solutions FOURTH INTERNATIONAL COMPETITION FOR UNIVERSITY STUDENTS IN MATHEMATICS July 30 August 4, 997, Plovdv, BULGARIA Frst day August, 997 Problems and Solutons Problem. Let {ε n } n= be a sequence of postve

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Common Learning and Cooperation in Repeated Games

Common Learning and Cooperation in Repeated Games Common Learnng and Cooperaton n Repeated Games Takuo Sugaya and Yuch Yamamoto Frst Draft: December 10, 2012 Ths Verson: September 29, 2018 Abstract We study repeated games n whch players learn the unknown

More information

A note on the one-deviation property in extensive form games

A note on the one-deviation property in extensive form games Games and Economc Behavor 40 (2002) 322 338 www.academcpress.com Note A note on the one-devaton property n extensve form games Andrés Perea Departamento de Economía, Unversdad Carlos III de Madrd, Calle

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

Appendix B. Criterion of Riemann-Stieltjes Integrability

Appendix B. Criterion of Riemann-Stieltjes Integrability Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for

More information

A SURVEY OF PROPERTIES OF FINITE HORIZON DIFFERENTIAL GAMES UNDER ISAACS CONDITION. Contents

A SURVEY OF PROPERTIES OF FINITE HORIZON DIFFERENTIAL GAMES UNDER ISAACS CONDITION. Contents A SURVEY OF PROPERTIES OF FINITE HORIZON DIFFERENTIAL GAMES UNDER ISAACS CONDITION BOTAO WU Abstract. In ths paper, we attempt to answer the followng questons about dfferental games: 1) when does a two-player,

More information

Maximizing the number of nonnegative subsets

Maximizing the number of nonnegative subsets Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum

More information

Lecture Space-Bounded Derandomization

Lecture Space-Bounded Derandomization Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval

More information

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting Onlne Appendx to: Axomatzaton and measurement of Quas-hyperbolc Dscountng José Lus Montel Olea Tomasz Strzaleck 1 Sample Selecton As dscussed before our ntal sample conssts of two groups of subjects. Group

More information

Week 2. This week, we covered operations on sets and cardinality.

Week 2. This week, we covered operations on sets and cardinality. Week 2 Ths week, we covered operatons on sets and cardnalty. Defnton 0.1 (Correspondence). A correspondence between two sets A and B s a set S contaned n A B = {(a, b) a A, b B}. A correspondence from

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information