Nash and Correlated Equilibria for Pursuit-Evasion Games Under Lack of Common Knowledge

Daniel T. Larsson, Georgios Kotsalis, Panagiotis Tsiotras

D. Larsson is a PhD student with the D. Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA. daniel.larsson@gatech.edu. G. Kotsalis is with the H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA. gkotsalis3@gatech.edu. P. Tsiotras is a Professor with the D. Guggenheim School of Aerospace Engineering and the Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA. tsiotras@gatech.edu.

Abstract: The majority of work in pursuit-evasion games assumes perfectly rational players who are omnipotent and have complete knowledge of the environment and the capabilities of other agents and, consequently, are correct in their assumption of the game that is played. This is rarely the case in practice. More often than not, the players have different knowledge about the environment, either because of sensing limitations or because of prior experience. In this paper, we wish to relax this assumption and consider pursuit-evasion games in a stochastic setting, where the players involved in the game have different perspectives regarding the transition probabilities that govern the world dynamics. We show the existence of a (Nash) equilibrium in this setting and discuss the computational aspects of obtaining such an equilibrium. We also investigate a relaxation of this problem employing the notion of correlated equilibria. Finally, we demonstrate the approach using a gridworld example with two players in the presence of obstacles.

I. INTRODUCTION

Game theory studies multi-agent decision problems in which the payoff of each player depends not only on its own actions, but also on the actions of the other player(s). Traditionally, pursuit-evasion games have been studied in the context of differential games [1]-[3]. In these formulations, it is typically assumed that both players are in agreement in regards to the environment in which their interactions take place. Moreover, it is also assumed that the two players not only mutually agree about the environment they operate in, but are also aware of this agreement, a situation that is referred to in the literature as common knowledge [4]. In pursuit-evasion problems this leads to modeling the strategic interaction between the two players as a zero-sum differential game.

We would like to progressively relax this assumption. We start by investigating the case when the two players have potentially different perceptions of the evolution of the overall system. This implies that the equilibrium calculations for each player may be performed under potentially erroneous assumptions about the environment. In order to illustrate the peculiarities of the proposed problem formulation, we investigate a pursuit-evasion problem where the two players have different perceptions of the transition probabilities of the overall system, in the context of a stochastic game [5]. Stochastic games provide a discrete analog of differential games and offer a natural framework to study pursuit-evasion problems. The term stochastic game refers to the scenario where multiple players interact in a dynamic probabilistic environment comprising a finite number of states, where each of the players has finitely many actions at its disposal.
The motivation for studying this problem stems from situations of asymmetric and imprecise information on the underlying characteristics of the game as a result of deception [6], insufficient measurement capabilities [7], or erroneous modeling assumptions about the environment. Consider, for example, a pursuit-evasion scenario between two small UAVs in the presence of an external wind field. The presence of the wind field may have a large impact on the ensuing vehicle trajectories [8]-[11]. For a pursuit-evasion problem, accurate knowledge of the wind field (or lack thereof) will thereby impact the outcome of the game. Depending on the on-board sensors and the individual players' modeling assumptions, each player is faced with a problem where its perception about the environment (and, subsequently, the other player's dynamics) may differ. This discrepancy in each player's belief about the true state of the world is called lack of common knowledge. In our work, we investigate the impact of lack of common knowledge on a pursuit-evasion game in a stochastic setting. We show that lack of common knowledge leads naturally to a non-zero-sum setting, even if the reward structure for both players initially leads to a zero-sum game (under common knowledge). We show the existence of an equilibrium in such a case, and provide a discussion regarding the numerical aspects of computing equilibria for such general (that is, non-zero-sum) games in Section V. Finally, we present a simple two-player pursuit-evasion game where the players do not share the same understanding of the world dynamics.

II. NOTATION AND PRELIMINARIES

The set of non-negative integers is denoted by $\mathbb{Z}_{\geq 0}$, the set of positive integers by $\mathbb{Z}_+$, the set of real numbers by $\mathbb{R}$, and the set of non-negative reals by $\mathbb{R}_+$. For $n \in \mathbb{Z}_+$, let $\mathbb{R}^n$ denote the Euclidean $n$-space and $\mathbb{R}^n_+$ the non-negative orthant in $\mathbb{R}^n$. For $n, m \in \mathbb{Z}_+$, let $\mathbb{R}^{n \times m}$ denote the space of $n$-by-$m$ real matrices. Typically, vectors and matrices will be written with boldface letters. The transpose of the column vector $x \in \mathbb{R}^n$ is denoted by $x^T$ and, for $i \in \mathbb{Z}_+$, $[x]_i$ refers to the $i$-th entry of $x$. Similarly, for $i, j \in \mathbb{Z}_+$, $[A]_{ij}$ refers to the $(i,j)$-th entry of the matrix $A \in \mathbb{R}^{n \times m}$. The vector of 1's in $\mathbb{R}^n$ is denoted by $\mathbf{1}_n = [1 \cdots 1]^T$. The set $\Delta_n = \{x \in \mathbb{R}^n_+ : \sum_{i=1}^n [x]_i = 1\}$ denotes the simplex in $\mathbb{R}^n$.

Given $x \in \mathbb{R}^n$, $\|x\|_\infty = \max_{i \in \{1,\dots,n\}} |[x]_i|$ and $\|x\|_1 = \sum_{i=1}^n |[x]_i|$. Similarly, for $f : \{1,\dots,n\} \to \mathbb{R}$, $\|f\|_\infty = \max_{i \in \{1,\dots,n\}} |f(i)|$ and, for $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ denotes the inner product of the two vectors. For vectors representing joint probability distributions, $x_{j,k} = \Pr\{J = j \,\&\, K = k\}$ for random variables $J, K$. For payoff matrices $A^i \in \mathbb{R}^{n \times m}$, $A^i_{j,k}$ is the entry corresponding to Player $i$ playing action $k$ when all others play action $j$. For a given set $A$, the set of probability measures on $A$ will be denoted by $\mathcal{P}[A]$. The power set (the set of all subsets) of $A$ will be denoted by $2^A$.

III. PROBLEM FORMULATION

The notation and terminology used in this work is taken mainly from [5]. We consider a discrete-time, infinite-horizon differential game with two players, indexed by $i \in I = \{1, 2\}$. The game evolves on the finite state space $S = \{1, \dots, N\}$, where $N \in \mathbb{Z}_+$, $N \geq 2$. We will refer to discrete time instances as stages. At each stage $t \in \mathbb{Z}_{\geq 0}$ the system occupies a state, say, $s_t \in S$, and player $i$ chooses an action $a^i_t$ from the set $A^i = \{1, \dots, m_i\}$. For notational simplicity, we assume that the action sets are not state dependent. The sample space is $\Omega = (S \times A^1 \times A^2)^\infty$, and its sigma-algebra is denoted by $\Sigma = 2^\Omega$. For $t \in \mathbb{Z}_{\geq 0}$, $i \in I$, we define the maps $S_t : \Omega \to S$ and $A^i_t : \Omega \to A^i$ such that, for $\omega = (\omega_1, \omega_2, \dots) = ((s_0, a^1_0, a^2_0), (s_1, a^1_1, a^2_1), \dots)$, the relationships $S_t(\omega) = s_t$ and $A^i_t(\omega) = a^i_t$ hold. We assume that at each stage $t \in \mathbb{Z}_{\geq 0}$, and for all $\omega \in \Omega$, both players have access to the full state $S_t(\omega) \in S$ of the system.

We restrict ourselves to randomized Markovian strategies [12]. This means that when the system occupies the state $s \in S$, each player $i \in I$ will select a probability distribution on $A^i$ that depends on previous system states and actions only through the current state. Formally, the strategy of player $i \in I$ is given by the map $f^i : S \times A^i \to \mathbb{R}_+$ where, for all $s \in S$, one has

\[ \sum_{a^i \in A^i} f^i(s, a^i) = 1. \quad (1) \]

The set of all strategies of player $i \in I$ will be denoted by $F^i$. For each $i \in I$ and every $f^i \in F^i$, let us write $f^i = [f^{iT}_1, \dots, f^{iT}_N]^T$ where, for $s \in S$, $f^i_s = [f^i(s,1), \dots, f^i(s, m_i)]^T$. Each player $i \in I$ has its own perception about the evolution of the system, encoded in the transition map $p^i : S \times S \times A^1 \times A^2 \to \mathbb{R}_+$. In particular, player $i \in I$ believes that, for each $(s, a^1, a^2) \in S \times A^1 \times A^2$, the system will make a transition to the state $s' \in S$ with probability $p^i(s', s, a^1, a^2)$. Given a pair of strategies

\[ f = (f^1, f^2) \in F = F^1 \times F^2, \quad (2) \]

the evolution of the system on $S$ is Markovian. For a given initial condition $s_0 \in S$ and a fixed strategy pair $f \in F$, the measurable space $(\Omega, \Sigma)$ will be equipped with two measures, namely $\mu^i_{f,s_0}$, $i = 1, 2$, as follows. Given $t \in \mathbb{Z}_{\geq 0}$, consider the path $((s_0, a^1_0, a^2_0), (s_1, a^1_1, a^2_1), \dots, (s_t, a^1_t, a^2_t)) \in (S \times A^1 \times A^2)^{t+1}$ and let $O \in \Sigma$ denote the corresponding cylinder set, $O = \{\omega \in \Omega : S_k(\omega) = s_k, \ A^i_k(\omega) = a^i_k, \ i \in I, \ 0 \leq k \leq t\}$. Then, for $i \in I$ and $f \in F$, one has

\[ \mu^i_{f,s_0}[O] = \Big( \prod_{k=1}^{t} f^1(s_{k-1}, a^1_{k-1}) \, f^2(s_{k-1}, a^2_{k-1}) \, p^i(s_k, s_{k-1}, a^1_{k-1}, a^2_{k-1}) \Big) f^1(s_t, a^1_t) \, f^2(s_t, a^2_t). \]

Let $r^i : S \times A^1 \times A^2 \to \mathbb{R}$ denote the reward map of player $i \in I$. For each choice $(a^1, a^2) \in A^1 \times A^2$ at state $s \in S$, player $i$ will collect an immediate reward $r^i(s, a^1, a^2)$. For the class of differential games we have in mind (e.g., pursuit-evasion games) the players are in a purely antagonistic situation. We will therefore assume that $r^1(s, a^1, a^2) = -r^2(s, a^1, a^2)$ for all $(s, a^1, a^2) \in S \times A^1 \times A^2$. For $t \in \mathbb{Z}_{\geq 0}$, $i \in I$, define $R^i_t : \Omega \to \mathbb{R}$ as follows:

\[ R^i_t(\omega) = r^i(S_t(\omega), A^1_t(\omega), A^2_t(\omega)). \quad (3) \]
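To make the objects of the formulation concrete, the following minimal sketch (an illustration of this note, not the authors' implementation) stores a finite two-player discounted stochastic game with player-specific transition models as NumPy arrays; the class name, array shapes, and helper method are assumptions of the sketch.

    import numpy as np

    class StochasticGame:
        """Container for G = {beta, I, S, A1, A2, r1, r2, p1, p2} (illustrative).

        p[i] has shape (N, m1, m2, N): p[i][s, a1, a2, s'] is the probability,
        under player i's model, of moving to s' from s when (a1, a2) is played.
        r[i] has shape (N, m1, m2).
        """

        def __init__(self, beta, p1, p2, r1):
            assert 0.0 <= beta < 1.0
            self.beta = beta
            self.p = [np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)]
            # Purely antagonistic rewards: r2(s, a1, a2) = -r1(s, a1, a2).
            self.r = [np.asarray(r1, dtype=float), -np.asarray(r1, dtype=float)]
            self.N, self.m1, self.m2 = self.r[0].shape
            for pi in self.p:
                # Each player's model is a proper transition kernel.
                assert pi.shape == (self.N, self.m1, self.m2, self.N)
                assert np.allclose(pi.sum(axis=-1), 1.0)

        def random_markov_strategy(self, player):
            """A randomized Markovian strategy f^i: one row of probabilities per state."""
            m = self.m1 if player == 0 else self.m2
            f = np.random.rand(self.N, m)
            return f / f.sum(axis=1, keepdims=True)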
We may write $E^i_{f,s_0}[R^i_t] = E^i_f[R^i_t \mid S_0 = s_0]$ to denote the expected value of the reward $R^i_t$ of player $i \in I$ at stage $t \in \mathbb{Z}_{\geq 0}$ for a given strategy pair $f \in F$ and initial state $s_0 \in S$ under the measure $\mu^i_{f,s_0}$. Let $\beta \in [0, 1)$ denote a discount factor. For a fixed strategy $f \in F$, we denote the discounted value map of player $i$ by $v^i_f : S \to \mathbb{R}$, where, for $s_0 \in S$,

\[ v^i_f(s_0) = \sum_{t=0}^{\infty} \beta^t E^i_{f,s_0}[R^i_t]. \quad (4) \]

The vector of discounted values for player $i \in I$ of the streams of expected rewards resulting from the use of the strategy pair $f \in F$ for every initial state $s_0 \in S$ is then given by $v^i_f = [v^i_f(1), \dots, v^i_f(N)]^T$, with joint Q-values for each player defined by [5], [13]-[19]

\[ [Q^i(s, v^i_f)]_{a^1, a^2} = r^i(s, a^1, a^2) + \beta \sum_{s' \in S} p^i(s', s, a^1, a^2) \, v^i_f(s'). \quad (5) \]

The joint Q-values defined in (5) will serve as single-stage payoff matrices when formulating optimization problems for computing equilibrium strategies, as discussed in Section V. We will refer to the collection of objects $G = \{\beta, I, S, A^1, A^2, r^1, r^2, p^1, p^2\}$ as a discounted stochastic game. Note that if we assume, as we do in this paper, that $p^1 \neq p^2$, then $G$ is a stochastic game with lack of common knowledge. At this point we need to establish which pair $f = (f^1, f^2) \in F$ is a solution to the stochastic game $G$.

Definition 3.1: A pair of strategies $f^o = (f^{1,o}, f^{2,o}) \in F$ is a Nash equilibrium of the stochastic game $G$ if

\[ v^1_{(f^1, f^{2,o})} \leq v^1_{(f^{1,o}, f^{2,o})}, \quad \forall f^1 \in F^1, \quad (6a) \]
\[ v^2_{(f^{1,o}, f^2)} \leq v^2_{(f^{1,o}, f^{2,o})}, \quad \forall f^2 \in F^2. \quad (6b) \]

The above inequalities are understood component-wise with respect to the partial order induced by the non-negative orthant $\mathbb{R}^N_+$. The solution concept considered is that of a Nash equilibrium for a non-zero-sum stochastic game.
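Continuing the sketch above, the joint Q-values in (5) can be assembled state by state as $m_1 \times m_2$ payoff matrices under each player's own transition model; the function name and the reuse of the StochasticGame container are assumptions of the sketch.

    import numpy as np

    def joint_q_matrix(game, player, s, v):
        """Q^i(s, v) from Eq. (5): an m1-by-m2 single-stage payoff matrix.

        [Q]_{a1,a2} = r^i(s, a1, a2) + beta * sum_{s'} p^i(s', s, a1, a2) v(s'),
        where v is a length-N value vector and `game` is the sketch above.
        """
        r = game.r[player][s]              # shape (m1, m2)
        p = game.p[player][s]              # shape (m1, m2, N)
        return r + game.beta * p @ np.asarray(v, dtype=float)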

When $p^1 = p^2$, the stochastic game $G$ reduces to a traditional zero-sum stochastic game and the two sets of inequalities (6) reduce to a single set of saddle-point inequalities

\[ v_{(f^1, f^{2,o})} \leq v_{(f^{1,o}, f^{2,o})} \leq v_{(f^{1,o}, f^2)}, \quad \forall f^1 \in F^1, \ f^2 \in F^2, \]

where, for $f \in F$, it holds that $v_f = v^1_f = -v^2_f$. We first turn our attention to showing the existence of an equilibrium in the sense of (6).

Remark: For the two-player zero-sum case, it can be shown [5], [18] that

\[ v^i_{f^o}(s) = \mathrm{Nash}^i(Q^1_{f^o}(s, v^1_{f^o}), Q^2_{f^o}(s, v^2_{f^o})), \quad i \in I, \quad (7) \]

where $\mathrm{Nash}^i(\cdot)$ is the expected payoff to player $i$ when the players use $f^o \in F$. The computation of this value and other aspects of the problem are discussed in Section V.

IV. EXISTENCE OF EQUILIBRIA

The equilibria and the corresponding strategies are computed via the calculation of the best response maps for each agent. To this end, and for each $i \in I$, let $-i$ stand for the opponent of player $i$, i.e., $-i \in I \setminus \{i\}$.

Definition 4.1: The best response map for each player $i \in I$ is the set-valued map $B^i : F^{-i} \rightrightarrows F^i$ where, for $f^{-i} \in F^{-i}$,

\[ B^i(f^{-i}) = \arg\max_{f^i \in F^i} v^i_{(f^i, f^{-i})}. \quad (8) \]

The best response map for each player can be easily computed by noticing that, given $f^{-i} \in F^{-i}$, agent $i$ is faced with a one-sided optimization problem, which is a Markov decision process (MDP). To see this, let us take the perspective of Player 1, keeping in mind that the analysis for Player 2 is completely analogous. Accordingly, let us fix $f^2 \in F^2$ and define the transition map $p^1_{f^2} : S \times S \times A^1 \to \mathbb{R}_+$, where, for $(s', s, a^1) \in S \times S \times A^1$,

\[ p^1_{f^2}(s', s, a^1) = \sum_{a^2 \in A^2} p^1(s', s, a^1, a^2) \, f^2(s, a^2). \quad (9) \]

The immediate expected reward map for Player 1 as a function of the randomized strategy of Player 2 is $r^1_{f^2} : S \times A^1 \to \mathbb{R}$, where, for $(s, a^1) \in S \times A^1$,

\[ r^1_{f^2}(s, a^1) = \sum_{a^2 \in A^2} r^1(s, a^1, a^2) \, f^2(s, a^2). \quad (10) \]

The collection of objects $M^1_{f^2} = \{\beta, S, A^1, p^1_{f^2}, r^1_{f^2}\}$ is referred to as the MDP faced by Player 1 when Player 2 employs the randomized strategy $f^2 \in F^2$. Given $M^1_{f^2}$, let

\[ v^{1,o}_{f^2} = \max_{f^1 \in F^1} v^1_{(f^1, f^2)}. \quad (11) \]

Each element $f^1 \in B^1(f^2)$ then satisfies the relation $v^1_{(f^1, f^2)} = v^{1,o}_{f^2}$. The above optimization problem is standard in the theory of Markov decision processes and it is well posed for every $\beta \in [0, 1)$. Define now the composite response map for all players as $B : F \rightrightarrows F$ where, for $f \in F$,

\[ B(f) = \begin{bmatrix} B^1(f^2) \\ B^2(f^1) \end{bmatrix}. \quad (12) \]

The map $B$ denotes the best response map for the game. In view of (6), the existence of a fixed point of the map $B$ is equivalent to the existence of an equilibrium for the given stochastic game $G$. The existence of a fixed point of $B$ can be shown using Kakutani's fixed point theorem; see, for instance, [4].

Theorem 4.2 ([4]): Let $X \subset \mathbb{R}^n$ be a compact and convex set and let $F : X \rightrightarrows X$ be a set-valued map such that: i) for all $x \in X$ the set $F(x)$ is non-empty and convex; ii) the graph of $F$ is closed, i.e., for all sequences $\{x_k\}$ and $\{y_k\}$ in $X$ such that $y_k \in F(x_k)$ for all $k \in \mathbb{Z}_+$, $x_k \to x$ and $y_k \to y$ imply $y \in F(x)$. Then there exists $x^* \in X$ such that $x^* \in F(x^*)$.

Next, we show that the set-valued map $B$ satisfies the assumptions of Theorem 4.2.

Theorem 4.3: The stochastic game $G$ has at least one equilibrium.

Proof: The convexity of $F^i$, for $i \in I$, follows directly from its definition. For compactness, consider, for each $i \in I$, the map $H^i : \prod_{k=1}^{N} \Delta_{m_i} \to F^i$ where, for $u = [u_1^T, \dots, u_N^T]^T \in \prod_{k=1}^{N} \Delta_{m_i}$, $H^i[u] = f^i$ with $u_{kl} = f^i(k, l)$, $k \in S$, $l \in A^i$, and notice that $H^i$ is a continuous bijection with a compact domain. Since the $F^i$ are convex and compact for all $i \in I$, the same holds for $F$. Inspection of (8) suggests that, for $i \in I$ and $f^{-i} \in F^{-i}$ fixed, player $i$, in the process of calculating the elements of the set that constitutes $B^i(f^{-i})$, is faced with a one-sided optimization problem which is a Markov decision process (MDP). Since the solution of this problem always exists, an optimal best response to $f^2 \in F^2$ exists and thus $B^1(f^2)$ is not empty.
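The construction of $M^1_{f^2}$ in (9)-(10) amounts to averaging Player 1's transition and reward models over Player 2's randomized strategy. A minimal sketch, reusing the illustrative container above (the function name is an assumption of this note):

    import numpy as np

    def induced_mdp(game, f2):
        """Player 1's MDP M^1_{f2} from Eqs. (9)-(10) for a fixed randomized f^2.

        f2 has shape (N, m2), one distribution over A^2 per state.  Returns the
        transition array p1_f2 with shape (N, m1, N) and rewards r1_f2 with
        shape (N, m1).
        """
        f2 = np.asarray(f2, dtype=float)
        # (9): p^1_{f2}(s'| s, a1) = sum_{a2} p^1(s'| s, a1, a2) f^2(s, a2)
        p1_f2 = np.einsum('sabn,sb->san', game.p[0], f2)
        # (10): r^1_{f2}(s, a1) = sum_{a2} r^1(s, a1, a2) f^2(s, a2)
        r1_f2 = np.einsum('sab,sb->sa', game.r[0], f2)
        return p1_f2, r1_f2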
For $s \in S$ and $f^2 \in F^2$, let $P^1_{f^2,s} \in \mathbb{R}^{m_1 \times N}$ be defined by

\[ [P^1_{f^2,s}]_{ij} = p^1_{f^2}(j, s, i), \quad (13) \]

and let $r^1_{f^2,s} \in \mathbb{R}^{m_1}$ be defined by $[r^1_{f^2,s}]_i = r^1_{f^2}(s, i)$. Every element $f^1 \in B^1(f^2)$ satisfies, for each state $s \in S$, Bellman's equations of optimality [5], [12]

\[ [v^{1,o}_{f^2}]_s = \max_{f^1_s \in \Delta_{m_1}} \big\langle f^1_s, \, r^1_{f^2,s} + \beta P^1_{f^2,s} v^{1,o}_{f^2} \big\rangle. \quad (14) \]

From the linear programming formulation of MDPs (see, for instance, [5]), the optimal policies are obtained as the solution of a linear program and, as such, the convex combination of two optimal policies is optimal as well. Thus, we have shown that, for $f^2 \in F^2$, $B^1(f^2)$ is non-empty and convex and, by a symmetric argument, so is $B^2(f^1)$ for $f^1 \in F^1$. It follows that for every $f \in F$ the set $B(f)$ is non-empty and convex.
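Computationally, since the maximum in (14) of a linear function of $f^1_s$ over the simplex is attained at a vertex, $v^{1,o}_{f^2}$ can be obtained by standard value iteration on $M^1_{f^2}$, which also yields a deterministic element of $B^1(f^2)$ (anticipating the single-agent approach discussed in Section V). A hedged sketch, reusing the induced_mdp helper above:

    import numpy as np

    def best_response_value(p1_f2, r1_f2, beta, tol=1e-8, max_iter=10000):
        """Solve Bellman's equation (14) for M^1_{f2} by value iteration.

        Returns the optimal value v^{1,o}_{f2} and a greedy (deterministic)
        best-response policy, one action index per state.
        """
        N, m1 = r1_f2.shape
        v = np.zeros(N)
        for _ in range(max_iter):
            # Q(s, a1) = r^1_{f2}(s, a1) + beta * sum_{s'} p^1_{f2}(s'| s, a1) v(s')
            q = r1_f2 + beta * p1_f2 @ v          # shape (N, m1)
            v_new = q.max(axis=1)
            if np.max(np.abs(v_new - v)) < tol:   # contraction => convergence
                v = v_new
                break
            v = v_new
        return v, q.argmax(axis=1)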

It remains to show that the graph of $B$ is closed. To this end, consider a sequence $\{f^2_n\} \subset F^2$ such that $f^2_n \to \bar f^2 \in F^2$. Furthermore, consider the sequence $\{f^1_n\} \subset F^1$ such that, for every $n \in \mathbb{Z}_+$, $f^1_n \in B^1(f^2_n)$, and suppose that $f^1_n \to \bar f^1 \in F^1$. It must be shown that $\bar f^1 \in B^1(\bar f^2)$, which involves sensitivity arguments regarding the variation of the optimal value and strategy of Player 1 as a function of the strategy of Player 2. To this end, consider the map $V^1 : F^1 \times F^2 \times \mathbb{R}^N \to \mathbb{R}^N$ where, for any $(f^1, f^2, v) \in F^1 \times F^2 \times \mathbb{R}^N$ and $s \in S$,

\[ [V^1(f^1, f^2, v)]_s = \big\langle f^1_s, \, r^1_{f^2,s} + \beta P^1_{f^2,s} v \big\rangle. \]

The map $V^1$ is continuous, since each of its coordinates is polynomial in its arguments and linear in $f^1$. For a fixed $f^2 \in F^2$, define the Bellman operator $U^1_{f^2} : \mathbb{R}^N \to \mathbb{R}^N$ of the corresponding MDP of Player 1 ($M^1_{f^2}$) as

\[ [U^1_{f^2}(v)]_s = \max_{f^1_s \in \Delta_{m_1}} [V^1(f^1, f^2, v)]_s, \]

where $v \in \mathbb{R}^N$ and $s \in S$. It is known from standard theory of MDPs [12] that, for every $f^2 \in F^2$, the mapping $U^1_{f^2}$ is a contraction with respect to the max-norm and, as such, its fixed point $v^{1,o}_{f^2}$ is unique. Furthermore, the family of maps $\{U^1_{f^2} : f^2 \in F^2\}$ is equicontinuous. For $v \in \mathbb{R}^N$, consider the map $W^1_v : F^2 \to \mathbb{R}^N$ where, for $f^2 \in F^2$, $W^1_v(f^2) = U^1_{f^2}(v)$. The map $W^1_v$ is continuous on its domain and, for any closed and bounded set $B \subset \mathbb{R}^N$, the family of maps $\{W^1_v : v \in B\}$ is equicontinuous. It follows that if $\|v^{1,o}_{f^2_n} - v_0\|_\infty \to 0$, then $v^{1,o}_{\bar f^2} = v_0$. Employing the triangle inequality, one has

\[ \|V^1(\bar f^1, \bar f^2, v_0) - v_0\|_\infty \leq \|V^1(\bar f^1, \bar f^2, v_0) - V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n})\|_\infty + \|V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n}) - v_0\|_\infty = \|V^1(\bar f^1, \bar f^2, v_0) - V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n})\|_\infty + \|v^{1,o}_{f^2_n} - v_0\|_\infty, \]

where the last equality uses the fact that $f^1_n \in B^1(f^2_n)$, so that $V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n}) = v^{1,o}_{f^2_n}$. By taking the limit $n \to \infty$, it follows that $V^1(\bar f^1, \bar f^2, v_0) = v_0$, showing that $\bar f^1 \in B^1(\bar f^2)$. In other words, the best response map $B^1$ has a closed graph. By a similar argument, so does the best response map $B^2$, and therefore $B$ has a closed graph.

V. COMPUTATIONAL ASPECTS

We have shown that if two players have different perceptions of the environment, then the associated discounted stochastic game $G$ has an equilibrium. We now consider the computational aspects of the problem, discussing both Nash and correlated equilibria and how such solutions can be found in games under lack of common knowledge. An inherently attractive approach is to solve the single-sided MDPs $M^1_{f^2}$ and $M^2_{f^1}$. Solutions to these optimization problems are well known and can be found through linear programming or value iteration [5], [20]. The resulting optimal policy in $M^i_{f^{-i}}$ is stationary as well as deterministic and is constructed as follows. With knowledge of the optimal value of player $i$, $v^{i,o}_{f^{-i}}$, form the single-agent Q-values

\[ Q^{i,o}_{f^{-i}}(s, a^i) = r^i_{f^{-i}}(s, a^i) + \beta \sum_{s' \in S} p^i_{f^{-i}}(s', s, a^i) \, v^{i,o}_{f^{-i}}(s'), \quad (15) \]

where the optimal policy in $s \in S$ is then $f^{i,o}(a^i, s) = \delta(a^i - a^{i,o}(s))$, with $a^{i,o}(s) = \arg\max_{a^i \in A^i} Q^{i,o}_{f^{-i}}(s, a^i)$ [5], [15], [12]. While existence of equilibria in stochastic games is guaranteed, they need not be deterministic, and thus the above method will not generally yield a NE [15]. Secondly, the presence of other players that do not have fixed behavior (e.g., due to learning) induces non-stationarity in the MDP transitions defined by (9), violating assumptions of the reinforcement learning techniques that have been suggested to solve this problem [19], [21]-[23]. Instead, one can view zero-sum games as a collection of matrix games, replacing the single-stage payoff matrices with the joint Q-values given by (5) [5], [14]-[17], [23]. However, for the game to be zero-sum it must be of common knowledge ($p^1 = p^2$), so that the condition $v_f = v^1_f = -v^2_f$ is satisfied. In this case, finding a NE is done by recursively solving a linear program, as the traditional zero-sum structure guarantees the existence of uniquely-valued NE [5], [14]-[16], [24]. However, in the case addressed in this paper, this structure is lost since $p^1 \neq p^2$.
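For the common-knowledge (zero-sum) case described above, the stage game $Q(s, v)$ at each state is a matrix game whose value and maximin strategy follow from the classical linear program [24]. The sketch below uses scipy.optimize.linprog; the implementation choice is this note's, not the paper's.

    import numpy as np
    from scipy.optimize import linprog

    def zero_sum_value(Q):
        """Maximin strategy and value of a zero-sum matrix game (row player maximizes).

        LP: max t  s.t.  Q^T x >= t * 1,  x in the simplex.  Intended for the
        stage games Q(s, v) arising in zero-sum value iteration.
        """
        m1, m2 = Q.shape
        c = np.zeros(m1 + 1)
        c[-1] = -1.0                                 # minimize -t  <=>  maximize t
        A_ub = np.hstack([-Q.T, np.ones((m2, 1))])   # t - (Q^T x)_j <= 0 for all j
        b_ub = np.zeros(m2)
        A_eq = np.hstack([np.ones((1, m1)), np.zeros((1, 1))])
        b_eq = np.ones(1)                            # x sums to one
        bounds = [(0, None)] * m1 + [(None, None)]   # x >= 0, t free
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:m1], res.x[-1]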
Unfortunately, such non-zero-sum games may have multiple NE, and finding these solutions amounts to solving a non-concave quadratic program, as follows [5], [25]-[27]. For each $s \in S$, let $Q^1(s, v^1)$, $Q^2(s, v^2)$ be the payoff matrices and take $f^1_s$ and $f^2_s$ to be the strategies for Players 1 and 2, respectively. The NE strategy and value for each player can be found by solving

\[ \max_{f^1_s, f^2_s, \alpha, \gamma} \ f^{1T}_s \big[ Q^1(s, v^1) + Q^2(s, v^2) \big] f^2_s - \alpha - \gamma, \quad (16) \]

subject to

\[ Q^1(s, v^1) f^2_s - \alpha \mathbf{1}_{m_1} \leq 0, \qquad Q^{2T}(s, v^2) f^1_s - \gamma \mathbf{1}_{m_2} \leq 0, \]
\[ \langle \mathbf{1}_{m_1}, f^1_s \rangle - 1 = 0, \qquad \langle \mathbf{1}_{m_2}, f^2_s \rangle - 1 = 0, \]
\[ [f^1_s]_j \geq 0, \ j \in \{1, \dots, m_1\}, \qquad [f^2_s]_k \geq 0, \ k \in \{1, \dots, m_2\}, \quad (17) \]

where $\alpha, \gamma \in \mathbb{R}$ are the expected payoffs to Players 1 and 2, respectively, when playing the strategy pair $(f^{1,o}_s, f^{2,o}_s)$ (i.e., $\mathrm{Nash}^1(\cdot) = \alpha$, $\mathrm{Nash}^2(\cdot) = \gamma$). Although the problem is non-concave, the objective function has a global maximum of zero corresponding to a NE [5], [26]. Thus, a feasible solution to (17) with objective function equal to zero is a NE [5]. Other approaches to the problem have been formulated that involve solving larger nonlinear programs, opting not to break the game into smaller state-wise components [28]-[30]. These approaches are high-dimensional and nonlinear, making the problem no easier to solve. While these programs provide means to find NE, they may converge to non-Nash solutions [22]. Thus, in order to arrive at quantifiable solutions, we consider the correlated equilibrium.

Definition 5.1: Consider the two-player game defined by payoff matrices $Q^1(s, v^1)$ and $Q^2(s, v^2)$, with $l = m_1 m_2$, and let $z^o \in \Delta_l$ be a joint strategy. The strategy $z^o$ is a correlated equilibrium if

\[ \sum_{a^{-i} \in A^{-i}} \big( Q^i(s, v^i)_{a^i_k, a^{-i}} - Q^i(s, v^i)_{a^i_h, a^{-i}} \big) \, z^o_{a^i_k, a^{-i}} \geq 0 \quad (18) \]

is satisfied for all $a^i_k, a^i_h \in A^i$, $i \in I$. The correlated equilibrium (CE) is defined on the joint strategy space and allows for correlation of player actions, thus relaxing the requirement that strategies be independent [13].
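The conditions of Definition 5.1 are linear in the joint strategy $z$, so a CE can be selected by a linear program (formalized as (19)-(20) in the next paragraph). The sketch below builds the constraint matrix and uses a utilitarian selection function (maximize the sum of expected payoffs), matching the choice made later in the numerical example; the flattening convention, function name, and use of scipy are assumptions of the sketch.

    import numpy as np
    from scipy.optimize import linprog

    def correlated_equilibrium(Q1, Q2):
        """Utilitarian correlated equilibrium of a bimatrix stage game (sketch).

        The variable z is a joint distribution over A^1 x A^2, flattened
        row-major (index a1 * m2 + a2).  Each row of L encodes Definition 5.1:
        switching from a recommended action to any alternative cannot raise a
        player's expected payoff.
        """
        m1, m2 = Q1.shape
        rows = []
        for k in range(m1):                  # Player 1: recommended a1 = k
            for h in range(m1):
                if h != k:
                    row = np.zeros((m1, m2))
                    row[k, :] = Q1[k, :] - Q1[h, :]
                    rows.append(row.ravel())
        for k in range(m2):                  # Player 2: recommended a2 = k
            for h in range(m2):
                if h != k:
                    row = np.zeros((m1, m2))
                    row[:, k] = Q2[:, k] - Q2[:, h]
                    rows.append(row.ravel())
        L = np.vstack(rows)                  # L z >= 0, n = sum_i m_i (m_i - 1) rows
        c = (Q1 + Q2).ravel()                # selection function: maximize joint payoff
        res = linprog(-c, A_ub=-L, b_ub=np.zeros(L.shape[0]),
                      A_eq=np.ones((1, m1 * m2)), b_eq=np.ones(1),
                      bounds=[(0, None)] * (m1 * m2))
        z = res.x.reshape(m1, m2)
        # Expected values CE^1, CE^2 of the players under the joint strategy z.
        return z, float((Q1 * z).sum()), float((Q2 * z).sum())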

Therefore, the set of CE contains the NE, since the NE are the cases where the joint strategy can be factored into independent distributions satisfying Definition 3.1 [13]. The CE assumes that a third party recommends actions to the players according to $z^o$ and, hence, agents are not aware of the actions given to their opponent(s). The attractiveness of CE is that, even in the general-sum case, they can be found by solving a linear program [13]. The optimization problem requires two components: (1) a selection function, defining the CE to seek, and (2) constraints encoding Definition 5.1. Given payoff matrices $Q^1(s, v^1)$, $Q^2(s, v^2)$, take $c \in \mathbb{R}^l$ as $c = \mathcal{F}(Q^1(s, v^1), Q^2(s, v^2))$, where $\mathcal{F}$ is a selection function. Then the solution to

\[ \max_{z \in \Delta_l} \ c^T z, \quad (19) \]

such that

\[ L z \geq 0, \quad (20) \]

is a CE, where $L \in \mathbb{R}^{n \times l}$, with $n = \sum_{i=1}^{2} m_i (m_i - 1)$, enforces Definition 5.1 [13], [31]. The expected value to player $i \in I$ is given by $\mathrm{CE}^i(\cdot)$ and is found by taking the expectation of the player's payoff under the strategy $z^o$ [13], [31]. Then, provided a model of the environment, finding a CE is reduced to a recursive linear program similar to value iteration [31]. The stochastic game is again viewed as a collection of static games, where the value of player $i \in I$ is updated according to

\[ [v^i_{k+1}]_s = \mathrm{CE}^i(Q^1_k(s, v^1_k), Q^2_k(s, v^2_k)). \quad (21) \]

Although the iterations suggested by (21) may not converge to a stationary strategy, the algorithm may give other meaningful solutions, such as a cyclic correlated equilibrium [31]. These concepts are next demonstrated with an example of a pursuit-evasion game under lack of common knowledge.
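Putting the pieces together, the recursion (21) can be sketched as a CE-based value iteration that reuses the illustrative joint_q_matrix and correlated_equilibrium helpers above; each player's stage-game payoffs are built with that player's own transition model, which is where the lack of common knowledge enters. As noted, the iterates may cycle rather than converge.

    import numpy as np

    def ce_value_iteration(game, num_iters=200):
        """CE-based value iteration, Eq. (21) (sketch; convergence not guaranteed)."""
        v = [np.zeros(game.N), np.zeros(game.N)]
        for _ in range(num_iters):
            v_next = [np.zeros(game.N), np.zeros(game.N)]
            for s in range(game.N):
                Q1 = joint_q_matrix(game, 0, s, v[0])   # Player 1's model and values
                Q2 = joint_q_matrix(game, 1, s, v[1])   # Player 2's model and values
                _, ce1, ce2 = correlated_equilibrium(Q1, Q2)
                v_next[0][s], v_next[1][s] = ce1, ce2
            v = v_next
        return v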
VI. NUMERICAL EXAMPLE

We consider a two-player pursuit-evasion game on a 6x6 grid world, as shown in Fig. 1. The evading player wins the game and is awarded +100 points for reaching either of the two evade states alone (it may not co-occupy the state with the pursuer). Capture occurs if the evader occupies the same cell as the pursuer, or any one of the neighboring cells in the four cardinal directions, thereby paying -100 points and losing the game. If both players navigate into obstacles on the same move, the game is over with 0 points given to each (draw). In the event of a crash, the game is awarded to the non-crashing player and classified as either evasion (pursuer crash) or capture (evader crash).

Fig. 1: Grid for the pursuit-evasion game: the start, evade (goal), and obstacle cells are indicated.

For moves within the game, the evading player is given rewards according to $r^2(s) = \kappa \|p(s)\|$, where $\kappa > 0$ and $p(s) \in \mathbb{R}^2$ is the relative x-y position of the players when in $s \in S$. Since the game is zero-sum, $r^1(s) = -r^2(s)$ for all $s \in S$. Each agent has the same action set and can select to move in one of the four cardinal directions of the grid by selecting up (U), down (D), left (L) or right (R); therefore $A^1 = A^2 = \{U, D, L, R\}$. The discount factor is $\beta = 0.7$. Transitions are generally stochastic and are created by providing each agent an action success probability and distributing the remaining probability uniformly over neighboring cells. We can view the stochastic transitions as a UAV navigating an environment in the presence of a disturbance. In what follows, we seek CE that maximize the joint rewards of the players, with the resulting strategy simulated over 5,000 games. To understand how differences in asymmetric environment knowledge change the game, we consider the following scenarios:

CKD: The game is common knowledge with deterministic world dynamics, with both agents able to deduce the absence of a disturbance.
CKS: The agents have the same perception of the world and are both able to sense a disturbance.
LOCK-EV: Lack-of-common-knowledge game in which the evader is aware of a disturbance and the pursuer is not.
LOCK-PS: Lack-of-common-knowledge game in which the pursuer is aware of the lack of a disturbance and the evader is not.

Fig. 2 shows sample trajectories for CKD and CKS, with Table I showing results for all cases. Fig. 3 displays player movements under lack of common knowledge.

Fig. 2: Sample trajectories for the stochastic game with common knowledge. Both cases show the game won by the evading player. (a) CKD; (b) CKS.

In CKD the evading player wins all games, electing to pass directly next to the obstacles while maintaining its 2-cell advantage. Contrast this with CKS, where both agents are aware of the disturbance, but the probabilistic nature of the environment makes evasion substantially more difficult, since a single mistake when executing a maneuver may lead to capture. Further, note that in the sample trajectory for this case (Fig. 2(b)) the evading player tends to pass further away from the obstacles. Interestingly, LOCK-EV does not differ greatly from CKS, albeit there is a greater number of games in which the evader wins, with an increased number of pursuer crashes, indicative of the pursuer's false information regarding the environment. In LOCK-PS, the evader again navigates away from obstacles, with the pursuer able to threaten interception, which forces the evader to back-track, leading to eventual capture.

Fig. 3: Sample trajectories for the stochastic game with lack of common knowledge. Both cases show the game won by the pursuing player. (a) LOCK-EV; (b) LOCK-PS.

TABLE I: Capture and evasion percentages (capture, capture due to evader crash, evasion, evasion due to pursuer crash, and average number of moves) for the CKD, CKS, LOCK-EV, and LOCK-PS scenarios. Results are for 5,000 simulated games.

In this case, the success of evasion is drastically lower than in CKD, in which the evader holds the correct understanding of the environment. Nonetheless, the evader manages to prolong capture by requiring a substantial number of moves, on average, until capture occurs.

VII. CONCLUSION

In this paper we considered the problem where two players engage in a pursuit-evasion scenario in a stochastic setting while having different perspectives about the probabilistic environment at their disposal. Under this assumption, we have established that there exists an equilibrium for the corresponding non-zero-sum stochastic game and provided an illustrative example utilizing the correlated equilibrium.

ACKNOWLEDGMENTS

Support for this work has been provided by ONR awards N and N and NSF award CMMI.

REFERENCES

[1] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. Philadelphia: SIAM.
[2] R. Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Dover Publications.
[3] A. Balakrishnan, Stochastic Differential Systems, I. Springer-Verlag.
[4] M. J. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: MIT Press.
[5] J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag.
[6] Y. Yavin, Pursuit-evasion differential games with deception or interrupted observation, Computers & Mathematics with Applications, vol. 13.
[7] W. Sun and P. Tsiotras, Pursuit evasion game of two players under an external flow field, in Proceedings of the American Control Conference, Chicago, IL, July 2015.
[8] R. P. Anderson, E. Bakolas, D. Milutinović, and P. Tsiotras, Optimal feedback guidance of a small aerial vehicle in a stochastic wind, Journal of Guidance, Control, and Dynamics, vol. 36, no. 4, July 2013.
[9] E. Bakolas and P. Tsiotras, Feedback navigation in an uncertain flow-field and connections with pursuit strategies, AIAA Journal of Guidance, Control, and Dynamics, vol. 35, no. 4, July-August 2012.
[10] M. Soulignac, Feasible and optimal path planning in strong current fields, IEEE Transactions on Robotics, vol. 27, no. 1, Feb.
[11] Y. Ketema and Y. J. Zhao, Controllability and reachability for micro-aerial-vehicle trajectory planning in winds, AIAA Journal of Guidance, Control and Dynamics, vol. 33, no. 3, 2010.
[12] M. L. Puterman, Markov Decision Processes. Wiley-Interscience.
[13] A. Greenwald and K. Hall, Correlated Q-learning, in Proceedings of the 20th International Conference on Machine Learning, 2003.
[14] M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, in Eleventh International Conference on Machine Learning, 1994.
[15] M. L. Littman, Value-function reinforcement learning in Markov games, Journal of Cognitive Systems Research, vol. 1, 2001.
[16] M. L. Littman, Friend-or-Foe Q-learning in general-sum games, in Eighteenth International Conference on Machine Learning, 2001.
[17] E. Munoz de Cote and M. L. Littman, A polynomial-time Nash equilibrium algorithm for repeated stochastic games, in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, 2008.
[18] J. Hu and M. P. Wellman, Nash Q-learning for general-sum stochastic games, Journal of Machine Learning Research, vol. 4, 2003.
[19] M. Bowling and M. Veloso, An analysis of stochastic game theory for multiagent reinforcement learning, Carnegie Mellon University, Tech. Rep., October 2000.
[20] R. S. Sutton and A. G. Barto, Reinforcement Learning. MIT Press.
[21] P. Stone and M. Veloso, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, vol. 8, 2000.
[22] M. H. Bowling, Convergence problems of general-sum multiagent reinforcement learning, International Conference on Machine Learning, 2000.
[23] L. Buşoniu, R. Babuška, and B. De Schutter, A comprehensive survey of multiagent reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, March 2008.
[24] A. W. Tucker, Solving a matrix game by linear programming, IBM Journal of Research and Development, vol. 4, no. 5, November.
[25] O. Mangasarian and H. Stone, Two-person nonzero-sum games and quadratic programming, Journal of Mathematical Analysis and Applications, vol. 9.
[26] O. L. Mangasarian, Equilibrium points of bimatrix games, Journal of the Society for Industrial and Applied Mathematics, vol. 12, no. 4, December.
[27] C. Lemke and J. T. Howson Jr., Equilibrium points of bimatrix games, Journal of the Society for Industrial and Applied Mathematics, vol. 12, no. 2, June.
[28] J. A. Filar and T. A. Schultz, Nonlinear programming and stationary strategies in stochastic games, Mathematical Programming, vol. 35.
[29] T. E. S. Raghavan and J. A. Filar, Algorithms for stochastic games - a survey, Methods and Models of Operations Research, vol. 35.
[30] M. Breton, A. Haurie, and J. A. Filar, On the computation of equilibria in discounted stochastic dynamic games, Journal of Economic Dynamics and Control, vol. 10.
[31] M. Zinkevich, A. Greenwald, and M. Littman, Cyclic equilibria in Markov games, Advances in Neural Information Processing Systems 18, 2006.


More information

VALUATIVE CRITERIA BRIAN OSSERMAN

VALUATIVE CRITERIA BRIAN OSSERMAN VALUATIVE CRITERIA BRIAN OSSERMAN Intuitively, one can think o separatedness as (a relative version o) uniqueness o limits, and properness as (a relative version o) existence o (unique) limits. It is not

More information

Markov Decision Processes and Dynamic Programming

Markov Decision Processes and Dynamic Programming Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

Circuit Complexity / Counting Problems

Circuit Complexity / Counting Problems Lecture 5 Circuit Complexity / Counting Problems April 2, 24 Lecturer: Paul eame Notes: William Pentney 5. Circuit Complexity and Uniorm Complexity We will conclude our look at the basic relationship between

More information

Planning in Markov Decision Processes

Planning in Markov Decision Processes Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information

Optimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games

Optimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games Optimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games Ronen I. Brafman Department of Computer Science Stanford University Stanford, CA 94305 brafman@cs.stanford.edu Moshe Tennenholtz

More information

Markov Decision Processes and Dynamic Programming

Markov Decision Processes and Dynamic Programming Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

On the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs

On the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs On the Girth o (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs Sudarsan V S Ranganathan, Dariush Divsalar and Richard D Wesel Department o Electrical Engineering, University o Caliornia, Los

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

The Clifford algebra and the Chevalley map - a computational approach (detailed version 1 ) Darij Grinberg Version 0.6 (3 June 2016). Not proofread!

The Clifford algebra and the Chevalley map - a computational approach (detailed version 1 ) Darij Grinberg Version 0.6 (3 June 2016). Not proofread! The Cliord algebra and the Chevalley map - a computational approach detailed version 1 Darij Grinberg Version 0.6 3 June 2016. Not prooread! 1. Introduction: the Cliord algebra The theory o the Cliord

More information

Stochastic Game Approach for Replay Attack Detection

Stochastic Game Approach for Replay Attack Detection Stochastic Game Approach or Replay Attack Detection Fei Miao Miroslav Pajic George J. Pappas. Abstract The existing tradeo between control system perormance and the detection rate or replay attacks highlights

More information

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012 CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline

More information

Reinforcement Learning and Deep Reinforcement Learning

Reinforcement Learning and Deep Reinforcement Learning Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q

More information

Solving Continuous Linear Least-Squares Problems by Iterated Projection

Solving Continuous Linear Least-Squares Problems by Iterated Projection Solving Continuous Linear Least-Squares Problems by Iterated Projection by Ral Juengling Department o Computer Science, Portland State University PO Box 75 Portland, OR 977 USA Email: juenglin@cs.pdx.edu

More information

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does

CHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does Geosciences 567: CHAPTER (RR/GZ) CHAPTER : INTRODUCTION Inverse Theory: What It Is and What It Does Inverse theory, at least as I choose to deine it, is the ine art o estimating model parameters rom data

More information