Nash and Correlated Equilibria for Pursuit-Evasion Games Under Lack of Common Knowledge
Daniel T. Larsson, Georgios Kotsalis, Panagiotis Tsiotras

Abstract: The majority of work in pursuit-evasion games assumes perfectly rational players who are omnipotent, have complete knowledge of the environment and of the capabilities of other agents and, consequently, are correct in their assumption of the game that is played. This is rarely the case in practice. More often than not, the players have different knowledge about the environment, either because of sensing limitations or because of prior experience. In this paper, we wish to relax this assumption and consider pursuit-evasion games in a stochastic setting, where the players involved in the game have different perspectives regarding the transition probabilities that govern the world dynamics. We show the existence of a (Nash) equilibrium in this setting and discuss the computational aspects of obtaining such an equilibrium. We also investigate a relaxation of this problem employing the notion of correlated equilibria. Finally, we demonstrate the approach using a gridworld example with two players in the presence of obstacles.

I. INTRODUCTION

Game theory studies multi-agent decision problems in which the payoff of each player depends not only on its own actions, but also on the actions of the other player(s). Traditionally, pursuit-evasion games have been studied in the context of differential games [1]-[3]. In these formulations, it is typically assumed that both players are in agreement regarding the environment in which their interactions take place. Moreover, it is also assumed that the two players not only mutually agree about the environment they operate in, but they are also aware of this agreement, a situation that is referred to in the literature as common knowledge [4]. In pursuit-evasion problems this leads to modeling the strategic interaction between the two players as a zero-sum differential game.
We would like to progressively relax this assumption. We start by investigating the case when the two players have potentially different perceptions of the evolution of the overall system. This implies that the equilibrium calculations for each player may be performed under potentially erroneous assumptions about the environment. In order to illustrate the peculiarities of the proposed problem formulation, we investigate a pursuit-evasion problem where the two players have different perceptions of the transition probabilities of the overall system, in the context of a stochastic game [5]. Stochastic games provide a discrete analog of differential games and offer a natural framework to study pursuit-evasion problems. The term stochastic game refers to the scenario where multiple players interact in a dynamic probabilistic environment comprising a finite number of states, where each of the players has finitely many actions at their disposal. The motivation for studying this problem stems from situations of asymmetric and imprecise information on the underlying characteristics of the game, as a result of deception [6], insufficient measurement capabilities [7], or erroneous modeling assumptions about the environment. Consider, for example, a pursuit-evasion scenario between two small UAVs in the presence of an external wind field. The presence of the wind field may have a large impact on the ensuing vehicle trajectories [8]-[11].

D. Larsson is a PhD student with the D. Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA. daniel.larsson@gatech.edu
G. Kotsalis is with the H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA. gkotsalis3@gatech.edu
P. Tsiotras is a Professor with the D. Guggenheim School of Aerospace Engineering and the Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA. tsiotras@gatech.edu
For a pursuit-evasion problem, accurate knowledge (or lack thereof) will thereby impact the outcome of the game. Depending on the on-board sensors and the individual players' modeling assumptions, each player's perception of the environment (and, subsequently, of each other's dynamics) may differ. This discrepancy in each player's belief about the true state of the world is called lack of common knowledge. In our work, we investigate the impact of lack of common knowledge on a pursuit-evasion game in a stochastic setting. We show that lack of common knowledge leads naturally to a non-zero-sum setting, even if the reward structure for both players initially leads to a zero-sum game (under common knowledge). We show the existence of an equilibrium in such a case, and provide a discussion regarding the numerical aspects of computing equilibria for such general (that is, non-zero-sum) games in Section V. Finally, we present a simple two-player pursuit-evasion game where the players do not share the same understanding of the world dynamics.

II. NOTATION AND PRELIMINARIES

The set of non-negative integers is denoted by Z≥0, the set of positive integers by Z+, the set of real numbers by R, and the set of non-negative reals by R+. For n ∈ Z+, let R^n denote the Euclidean n-space and R^n_+ the non-negative orthant in R^n. For n, m ∈ Z+, let R^{n×m} denote the space of n-by-m real matrices. Typically, vectors and matrices will be written with boldface letters. The transpose of the column vector x ∈ R^n is denoted by x^T and, for i ∈ Z+, [x]_i refers to the i-th entry of x. Similarly, for i, j ∈ Z+, [A]_{ij} refers to the (i, j)-th entry of the matrix A ∈ R^{n×m}. The vector of 1's in R^n is denoted by 1_n = [1, ..., 1]^T. The set Δ_n = {x ∈ R^n_+ : Σ_{i=1}^n [x]_i = 1} denotes the simplex
in R^n. Given x ∈ R^n, ||x||_∞ = max_{i ∈ {1,...,n}} |[x]_i| and ||x||_1 = Σ_{i=1}^n |[x]_i|. Similarly, for f : {1,...,n} → R, ||f||_∞ = max_{i ∈ {1,...,n}} |f(i)|, and for x, y ∈ R^n, ⟨x, y⟩ denotes the inner product of the two vectors. For vectors representing joint probability distributions, x_{j,k} = Pr{J = j, K = k} for random variables J, K. For payoff matrices A^i ∈ R^{n×m}, A^i_{j,k} is the entry corresponding to Player i playing action k when all others play action j. For a given set A, the set of probability measures on A will be denoted by P[A]. The power set (the set of all subsets) of A will be denoted by 2^A.

III. PROBLEM FORMULATION

The notation and terminology used in this work is taken mainly from [5]. We consider a discrete-time, infinite-horizon stochastic game with two players, indexed by i ∈ I = {1, 2}. The game evolves on the finite state space S = {1, ..., N}, where N ∈ Z+, N ≥ 2. We will refer to discrete time instances as stages. At each stage t ∈ Z≥0 the system occupies a state, say, s_t ∈ S, and player i chooses an action a^i_t from the set A^i = {1, ..., m_i}. For notational simplicity, we assume that the action sets are not state-dependent. The sample space is Ω = (S × A^1 × A^2)^∞, and its sigma-algebra is denoted by Σ = 2^Ω. For t ∈ Z≥0, i ∈ I, we define the maps S_t : Ω → S and A^i_t : Ω → A^i, where, for ω = (ω_1, ω_2, ...) = ((s_0, a^1_0, a^2_0), (s_1, a^1_1, a^2_1), ...), the relationships S_t(ω) = s_t and A^i_t(ω) = a^i_t hold. We assume that at each stage t ∈ Z≥0, and for all ω ∈ Ω, both players have access to the full state S_t(ω) ∈ S of the system. We restrict ourselves to randomized Markovian strategies [12]. This means that when the system occupies the state s ∈ S, each player i ∈ I will select a probability distribution on A^i that depends on previous system states and actions only through the current state. Formally, the strategy of player i ∈ I is given by the map f^i : S × A^i → R+ where, for all s ∈ S, one has

  Σ_{a^i ∈ A^i} f^i(s, a^i) = 1.  (1)

The set of all strategies of player i ∈ I will be denoted by F^i.
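To make the strategy normalization (1) concrete, a randomized Markovian strategy can be stored as a row-stochastic array with one row per state. The following sketch (our own illustration with hypothetical names, not code from the paper) samples such a strategy and checks that every row lies in the simplex:

```python
import numpy as np

def random_markov_strategy(num_states: int, num_actions: int, rng=None) -> np.ndarray:
    """Sample a randomized Markovian strategy f^i: row s is a probability
    distribution over the m_i actions, so each row sums to 1 (eq. (1))."""
    rng = np.random.default_rng(rng)
    raw = rng.random((num_states, num_actions))
    return raw / raw.sum(axis=1, keepdims=True)

strategy = random_markov_strategy(num_states=4, num_actions=3, rng=0)
assert np.allclose(strategy.sum(axis=1), 1.0)  # every row is in the simplex
```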
For each i ∈ I and every f^i ∈ F^i, let us write f^i = [(f^i_1)^T, ..., (f^i_N)^T]^T, where, for s ∈ S, f^i_s = [f^i(s, 1), ..., f^i(s, m_i)]^T. Each player i ∈ I has its own perception about the evolution of the system, encoded in the transition map p^i : S × S × A^1 × A^2 → R+. In particular, player i ∈ I believes that, for each (s, a^1, a^2) ∈ S × A^1 × A^2, the system will make a transition to the state s' ∈ S with probability p^i(s', s, a^1, a^2). Given a pair of strategies

  f = (f^1, f^2) ∈ F = F^1 × F^2,  (2)

the evolution of the system on S is Markovian. For a given initial condition s_0 ∈ S and a fixed strategy pair f ∈ F, the measurable space (Ω, Σ) will be equipped with two measures, namely, μ^i_{f,s_0}, i = 1, 2, as follows. Given t ∈ Z≥0, consider the path ((s_0, a^1_0, a^2_0), (s_1, a^1_1, a^2_1), ..., (s_t, a^1_t, a^2_t)) ∈ (S × A^1 × A^2)^{t+1} and let O ∈ Σ denote the corresponding cylinder set, O = {ω ∈ Ω : S_k(ω) = s_k, A^i_k(ω) = a^i_k, i ∈ I, 0 ≤ k ≤ t}. Then, for i ∈ I and f ∈ F, one has

  μ^i_{f,s_0}[O] = ( Π_{k=1}^t f^1(s_{k-1}, a^1_{k-1}) f^2(s_{k-1}, a^2_{k-1}) p^i(s_k, s_{k-1}, a^1_{k-1}, a^2_{k-1}) ) f^1(s_t, a^1_t) f^2(s_t, a^2_t).

Let r^i : S × A^1 × A^2 → R denote the reward map of player i ∈ I. For each choice (a^1, a^2) ∈ A^1 × A^2 at state s ∈ S, player i will collect an immediate reward r^i(s, a^1, a^2). For the class of games we have in mind (e.g., pursuit-evasion games) the players are in a purely antagonistic situation. We will therefore assume that r^1(s, a^1, a^2) = -r^2(s, a^1, a^2) for all (s, a^1, a^2) ∈ S × A^1 × A^2. For t ∈ Z≥0, i ∈ I, define R^i_t : Ω → R as follows:

  R^i_t(ω) = r^i(S_t(ω), A^1_t(ω), A^2_t(ω)).  (3)

We may write E^i_{f,s_0}[R^i_t] = E^i_f[R^i_t | S_0 = s_0] to denote the expected value of the reward R^i_t of player i ∈ I at stage t ∈ Z≥0 for a given strategy pair f ∈ F and initial state s_0 ∈ S, under the measure μ^i_{f,s_0}. Let β ∈ [0, 1) denote a discount factor. For a fixed strategy f ∈ F, we denote the discounted value map of player i by v^i_f : S → R, where, for s_0 ∈ S,

  v^i_f(s_0) = Σ_{t=0}^∞ β^t E^i_{f,s_0}[R^i_t].  (4)
The vector of discounted values for player i ∈ I of the streams of expected rewards resulting from the use of the strategy pair f ∈ F, for every initial state s_0 ∈ S, is then given by v^i_f = [v^i_f(1), ..., v^i_f(N)]^T, with joint Q-values for each player defined by [5], [13]-[19]

  [Q^i(s, v^i_f)]_{a^1,a^2} = r^i(s, a^1, a^2) + β Σ_{s' ∈ S} p^i(s', s, a^1, a^2) v^i_f(s').  (5)

The joint Q-values defined in (5) will serve as single-stage payoff matrices when formulating optimization problems for computing equilibrium strategies, as discussed in Section V. We will refer to the collection of objects G = {β, I, S, A^1, A^2, r^1, r^2, p^1, p^2} as a discounted stochastic game. Note that if we assume, as we do in this paper, that p^1 ≠ p^2, then G is a stochastic game with lack of common knowledge. At this point we need to establish which pair f = (f^1, f^2) ∈ F is a solution to the stochastic game G.

Definition 3.1: A pair of strategies f^o = (f^{1,o}, f^{2,o}) ∈ F is a Nash equilibrium of the stochastic game G if

  v^1_{(f^1, f^{2,o})} ≤ v^1_{(f^{1,o}, f^{2,o})}, for all f^1 ∈ F^1,  (6a)
  v^2_{(f^{1,o}, f^2)} ≤ v^2_{(f^{1,o}, f^{2,o})}, for all f^2 ∈ F^2.  (6b)

The above inequalities are understood component-wise with respect to the partial order induced by the non-negative orthant R^N_+. The solution concept considered is that of a Nash equilibrium for a non-zero-sum stochastic game. When p^1 = p^2, the stochastic game G reduces to a traditional zero-sum stochastic game and the two sets of
inequalities (6) reduce to a single set of saddle-point inequalities

  v_{(f^1, f^{2,o})} ≤ v_{(f^{1,o}, f^{2,o})} ≤ v_{(f^{1,o}, f^2)}, for all f ∈ F,

where, for f ∈ F, it holds that v_f = v^1_f = -v^2_f. We first turn our attention to showing the existence of an equilibrium in the sense of (6).

Remark: For the two-player zero-sum case, it can be shown [5], [18] that

  v^i_{f^o}(s) = Nash^i(Q^1_{f^o}(s, v^1_{f^o}), Q^2_{f^o}(s, v^2_{f^o})), i ∈ I,  (7)

where Nash^i(·) is the expected payoff to player i when the players use f^o ∈ F. The computation of this value and other aspects of the problem are discussed in Section V.

IV. EXISTENCE OF EQUILIBRIA

The equilibria and the corresponding strategies are computed via the calculation of the best response maps for each agent. To this end, and for each i ∈ I, let -i stand for the opponent of player i, i.e., -i ∈ I \ {i}.

Definition 4.1: The best response map for each player i ∈ I is the set-valued map B^i : F^{-i} ⇉ F^i, where, for f^{-i} ∈ F^{-i},

  B^i(f^{-i}) = arg max_{f^i ∈ F^i} v^i_{(f^i, f^{-i})}.  (8)

The best response map for each player can be easily computed by noticing that, given f^{-i} ∈ F^{-i}, agent i is faced with a one-sided optimization problem, which is a Markov decision process (MDP). To see this, let us take the perspective of Player 1, keeping in mind that the analysis for Player 2 is completely analogous. Accordingly, let us fix f^2 ∈ F^2 and define the transition map p^1_{f^2} : S × S × A^1 → R+, where, for (s', s, a^1) ∈ S × S × A^1,

  p^1_{f^2}(s', s, a^1) = Σ_{a^2 ∈ A^2} p^1(s', s, a^1, a^2) f^2(s, a^2).  (9)

The immediate expected reward map for Player 1 as a function of the randomized strategy of Player 2 is r^1_{f^2} : S × A^1 → R, where, for (s, a^1) ∈ S × A^1,

  r^1_{f^2}(s, a^1) = Σ_{a^2 ∈ A^2} r^1(s, a^1, a^2) f^2(s, a^2).  (10)

The collection of objects M^1_{f^2} = {β, S, A^1, p^1_{f^2}, r^1_{f^2}} is referred to as the MDP faced by Player 1 when Player 2 employs the randomized strategy f^2 ∈ F^2. Given M^1_{f^2}, let

  v^{1,o}_{f^2} = max_{f^1 ∈ F^1} v^1_{(f^1, f^2)}.  (11)

Each element f^1 ∈ B^1(f^2) then satisfies the relation v^1_{(f^1, f^2)} = v^{1,o}_{f^2}. The above optimization problem is standard in the theory of Markov decision processes and is well posed for every β ∈ [0, 1).
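The reduction to a one-sided MDP can be sketched numerically: equations (9) and (10) marginalize Player 2's strategy out of Player 1's model, and the resulting MDP is solved by value iteration. This is a minimal illustration under our own array conventions, not code from the paper:

```python
import numpy as np

def induced_mdp(p1, r1, f2):
    """Marginalize Player 2's randomized strategy out of Player 1's model.
    p1: (N, N, m1, m2), with p1[t, s, a1, a2] = p^1(t, s, a1, a2)
    r1: (N, m1, m2), immediate rewards r^1(s, a1, a2)
    f2: (N, m2), row-stochastic strategy of Player 2
    Returns p (N, N, m1) and r (N, m1) as in eqs. (9) and (10)."""
    p = np.einsum('tsab,sb->tsa', p1, f2)
    r = np.einsum('sab,sb->sa', r1, f2)
    return p, r

def best_response_value(p, r, beta, tol=1e-10):
    """Value iteration for the MDP M^1_{f^2}; returns v^{1,o} and a greedy policy."""
    N, m1 = r.shape
    v = np.zeros(N)
    while True:
        q = r + beta * np.einsum('tsa,t->sa', p, v)  # Q(s, a1)
        v_new = q.max(axis=1)                        # Bellman update, cf. (14)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new
```

Since the Bellman operator is a β-contraction, the loop terminates for any β ∈ [0, 1).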
Define now the composite response map for all players as B : F ⇉ F, where, for f ∈ F,

  B(f) = [ B^1(f^2) ; B^2(f^1) ].  (12)

The map B denotes the best response map for the game. In view of (6), the existence of a fixed point of the map B is equivalent to the existence of an equilibrium for the given stochastic game G. The existence of a fixed point of B can be shown using Kakutani's fixed-point theorem; see, for instance, [4].

Theorem 4.2 ([4]): Let X ⊂ R^n be a compact and convex set and let F : X ⇉ X be a set-valued map such that: i) for all x ∈ X the set F(x) is non-empty and convex; ii) the graph of F is closed, i.e., for all sequences {x_k} and {y_k} in X such that y_k ∈ F(x_k) for all k ∈ Z+, x_k → x and y_k → y imply y ∈ F(x). Then there exists x* ∈ X such that x* ∈ F(x*).

Next, we show that the set-valued map B satisfies the assumptions of Theorem 4.2.

Theorem 4.3: The stochastic game G has at least one equilibrium.

Proof: The convexity of F^i, for i ∈ I, follows directly from its definition. For compactness, consider, for each i ∈ I, the map H^i : Π_{k=1}^N Δ_{m_i} → F^i, where, for u = [u^T_1, ..., u^T_N]^T ∈ Π_{k=1}^N Δ_{m_i}, H^i[u] = f^i with u_{kl} = f^i(k, l), k ∈ S, l ∈ A^i, and notice that H^i is a continuous bijection with a compact domain. Since F^i is convex and compact for all i ∈ I, the same holds for F. Inspection of (8) suggests that, for i ∈ I and f^{-i} ∈ F^{-i} fixed, player i, in the process of calculating the elements of the set B^i(f^{-i}), is faced with a one-sided optimization problem, which is a Markov decision process (MDP). Since the solution of this problem always exists, an optimal best response to f^2 ∈ F^2 exists and thus B^1(f^2) is not empty. For s ∈ S, f^2 ∈ F^2, let P^1_{f^2,s} ∈ R^{m_1×N} be defined by

  [P^1_{f^2,s}]_{ij} = p^1_{f^2}(j, s, i),  (13)

and r^1_{f^2,s} ∈ R^{m_1} by [r^1_{f^2,s}]_i = r^1_{f^2}(s, i). Every element f^1 ∈ B^1(f^2) satisfies, for each state s ∈ S, Bellman's equations of optimality [5], [12]

  [v^{1,o}_{f^2}]_s = max_{f^1_s ∈ Δ_{m_1}} ⟨f^1_s, r^1_{f^2,s} + β P^1_{f^2,s} v^{1,o}_{f^2}⟩.  (14)
From the linear programming formulation of MDPs (see, for instance, [5]), the optimal policies are obtained as the solution of a linear program, and as such the convex combination of two optimal policies is optimal as well. Thus, we have shown that, for f^2 ∈ F^2, B^1(f^2) is non-empty and convex and, by a symmetric argument, so is B^2(f^1) for f^1 ∈ F^1. It follows that, for every f ∈ F, the set B(f) is non-empty and convex. It remains to show that the graph of B is closed. To this end, consider a sequence {f^2_n} ⊂ F^2 such that f^2_n → f^2 ∈ F^2. Furthermore, consider the sequence {f^1_n} ⊂ F^1 such that, for every n ∈ Z+, f^1_n ∈ B^1(f^2_n), and suppose that f^1_n → f^1 ∈ F^1. It must be shown that f^1 ∈ B^1(f^2), which will involve sensitivity arguments regarding the variations of
the optimal value and strategy of Player 1 as a function of the strategy of Player 2. To this end, consider the map V^1 : F^1 × F^2 × R^N → R^N where, for any (f^1, f^2, v) ∈ F^1 × F^2 × R^N and s ∈ S,

  [V^1(f^1, f^2, v)]_s = ⟨f^1_s, r^1_{f^2,s} + β P^1_{f^2,s} v⟩.

The map V^1 is continuous, since each of its coordinates is polynomial in its arguments and linear in f^1. For a fixed f^2 ∈ F^2, define the Bellman operator U^1_{f^2} : R^N → R^N of the corresponding MDP of Player 1 (M^1_{f^2}) as

  [U^1_{f^2}(v)]_s = max_{f^1_s ∈ Δ_{m_1}} [V^1(f^1, f^2, v)]_s,

where v ∈ R^N and s ∈ S. It is known from the standard theory of MDPs [12] that, for every f^2 ∈ F^2, the mapping U^1_{f^2} is a contraction with respect to the max-norm and, as such, its fixed point v^{1,o}_{f^2} is unique. Furthermore, the family of maps {U^1_{f^2} : f^2 ∈ F^2} is equicontinuous. For v ∈ R^N, consider the map W^1_v : F^2 → R^N where, for f^2 ∈ F^2, W^1_v(f^2) = U^1_{f^2}(v). The map W^1_v is continuous on its domain and, for any closed and bounded set B ⊂ R^N, the family of maps {W^1_v : v ∈ B} is equicontinuous. It follows that if v^{1,o}_{f^2_n} → v_0, then v^{1,o}_{f^2} = v_0. Employing the triangle inequality, one has

  ||V^1(f^1, f^2, v_0) - v_0|| ≤ ||V^1(f^1, f^2, v_0) - V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n})|| + ||V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n}) - v_0||
    = ||V^1(f^1, f^2, v_0) - V^1(f^1_n, f^2_n, v^{1,o}_{f^2_n})|| + ||v^{1,o}_{f^2_n} - v_0||.

By taking the limit n → ∞, it follows that V^1(f^1, f^2, v_0) = v_0, showing that f^1 ∈ B^1(f^2). In other words, the best response map B^1 has a closed graph. By a similar argument, so does the best response map B^2, and therefore B has a closed graph.

V. COMPUTATIONAL ASPECTS

We have shown that if two players have different perceptions of the environment, then the associated discounted stochastic game G has an equilibrium. We now consider the computational aspects of the problem, discussing both Nash and correlated equilibria and how such solutions can be found in games under lack of common knowledge. An inherently attractive approach is to solve the single-sided MDPs M^1_{f^2} and M^2_{f^1}. Solutions to these optimization problems are well known and can be found through linear programming or value iteration [5], [20].
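To illustrate the linear-programming route for the single-sided MDPs, the following sketch (ours, with assumed array shapes) solves a generic discounted MDP via the standard primal LP: minimize Σ_s v(s) subject to v(s) ≥ r(s, a) + β Σ_{s'} p(s'|s, a) v(s') for every state-action pair.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(p, r, beta):
    """Solve a discounted MDP by linear programming.
    p: (N, N, m), with p[t, s, a] = Pr(next state = t | state = s, action = a)
    r: (N, m), immediate rewards. Returns the optimal value vector v.
    Primal LP: min 1^T v  s.t.  v(s) - beta * sum_t p[t, s, a] v(t) >= r(s, a)."""
    N, m = r.shape
    A_ub, b_ub = [], []
    for s in range(N):
        for a in range(m):
            # Rewrite the constraint in A_ub @ v <= b_ub form by negating it.
            row = -np.eye(N)[s] + beta * p[:, s, a]
            A_ub.append(row)
            b_ub.append(-r[s, a])
    res = linprog(c=np.ones(N), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * N, method='highs')
    return res.x
```

At the optimum the minimal element-wise feasible v coincides with the fixed point of the Bellman operator.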
The resulting optimal policy in M^i_{f^{-i}} is stationary as well as deterministic and is constructed as follows. With knowledge of the optimal value of player i, v^{i,o}_{f^{-i}}, form the single-agent Q-values

  Q^{i,o}_{f^{-i}}(s, a^i) = r^i_{f^{-i}}(s, a^i) + β Σ_{s' ∈ S} p^i_{f^{-i}}(s', s, a^i) v^{i,o}_{f^{-i}}(s'),  (15)

where the optimal policy in s ∈ S is then f^{i,o}(s, a^i) = δ(a^i - a^{i,o}(s)), with a^{i,o}(s) = arg max_{a^i} Q^{i,o}_{f^{-i}}(s, a^i) [5], [15], [21]. While the existence of equilibria in stochastic games is guaranteed, they need not be deterministic, and thus the above method will not generally yield a NE [15]. Secondly, the presence of other players that do not have fixed behavior (e.g., due to learning) induces non-stationarity in the MDP transitions defined by (9), violating the assumptions of reinforcement learning techniques that have been suggested to solve this problem [19], [21]-[23]. Instead, one can view zero-sum games as a collection of matrix games, replacing single-stage payoff matrices with the joint Q-values given by (5) [5], [14]-[17], [23]. However, for the game to be zero-sum it must be common knowledge (p^1 = p^2), so that the condition v_f = v^1_f = -v^2_f is satisfied. In this case, finding a NE is done by recursively solving a linear program, as the traditional zero-sum structure guarantees the existence of uniquely-valued NE [5], [14]-[16], [24]. However, in the case addressed in this paper, this structure is lost since p^1 ≠ p^2. Unfortunately, such non-zero-sum games may have multiple NE, and finding these solutions amounts to solving a non-concave quadratic program, as follows [5], [25]-[27]. For each s ∈ S, let Q^1(s, v^1), Q^2(s, v^2) be the payoff matrices and take f^1_s and f^2_s to be the strategies of Players 1 and 2, respectively.
The NE strategy and value for each player can be found by solving

  max_{f^1_s, f^2_s, α, γ}  (f^1_s)^T [ Q^1(s, v^1) + Q^2(s, v^2) ] f^2_s - α - γ,  (16)

subject to

  Q^1(s, v^1) f^2_s - α 1_{m_1} ≤ 0,
  Q^2(s, v^2)^T f^1_s - γ 1_{m_2} ≤ 0,
  ⟨1_{m_1}, f^1_s⟩ - 1 = 0,
  ⟨1_{m_2}, f^2_s⟩ - 1 = 0,
  [f^1_s]_j ≥ 0, j ∈ {1, ..., m_1},
  [f^2_s]_k ≥ 0, k ∈ {1, ..., m_2},  (17)

where α, γ ∈ R are the expected payoffs to Players 1 and 2, respectively, when playing the strategy pair (f^{1,o}_s, f^{2,o}_s) (i.e., Nash^1(·) = α, Nash^2(·) = γ). Although the problem is non-concave, the objective function has a global maximum of zero, corresponding to a NE [5], [26]. Thus, a feasible solution to (17) with objective function equal to zero is a NE [5]. Other approaches to the problem have been formulated that involve solving larger nonlinear programs, opting not to break the game into smaller state-wise components [28]-[30]. These approaches are high-dimensional and nonlinear, making the problem no easier to solve. While these programs provide means to find NE, they may converge to non-Nash solutions [22]. Thus, in order to arrive at quantifiable solutions, we consider the correlated equilibrium.

Definition 5.1: Consider the two-player game defined by payoff matrices Q^1(s, v^1) and Q^2(s, v^2), with l = m_1 m_2, and let z^o ∈ Δ_l be a joint strategy. The strategy z^o is a correlated equilibrium if

  Σ_{a^{-i}} ( [Q^i(s, v^i)]_{a^i_k, a^{-i}} - [Q^i(s, v^i)]_{a^i_h, a^{-i}} ) [z^o]_{a^i_k, a^{-i}} ≥ 0  (18)

is satisfied for all a^i_k, a^i_h ∈ A^i and i ∈ I.
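The conditions (18) are linear in the joint distribution z, so a CE of a single stage game can be computed with an off-the-shelf LP solver. The sketch below is our own illustration, not code from the paper; the utilitarian selection (maximizing the sum of expected payoffs) is one of several possible selection functions:

```python
import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(Q1, Q2):
    """Utilitarian correlated equilibrium of the bimatrix stage game (Q1, Q2).
    Q1[a1, a2] is the payoff to Player 1, Q2[a1, a2] to Player 2.
    The variable z ranges over joint actions (row-major) in the simplex."""
    m1, m2 = Q1.shape
    A_ub, b_ub = [], []
    # Player 1: for each recommended a_k and deviation a_h (eq. (18)):
    #   sum_{a2} z[a_k, a2] * (Q1[a_k, a2] - Q1[a_h, a2]) >= 0
    for k in range(m1):
        for h in range(m1):
            if k == h:
                continue
            row = np.zeros((m1, m2))
            row[k, :] = Q1[h, :] - Q1[k, :]  # negated: <= 0 form for linprog
            A_ub.append(row.ravel()); b_ub.append(0.0)
    # Player 2, symmetrically over the columns of Q2.
    for k in range(m2):
        for h in range(m2):
            if k == h:
                continue
            row = np.zeros((m1, m2))
            row[:, k] = Q2[:, h] - Q2[:, k]
            A_ub.append(row.ravel()); b_ub.append(0.0)
    c = -(Q1 + Q2).ravel()  # maximize the joint payoff
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.ones((1, m1 * m2)), b_eq=[1.0],
                  bounds=[(0, None)] * (m1 * m2), method='highs')
    return res.x.reshape(m1, m2)
```

Any Nash equilibrium of the stage game is feasible for this LP as a product distribution, so the program is always feasible.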
The correlated equilibrium (CE) is defined on the joint strategy space and allows for correlation of player actions, thus relaxing the requirement that strategies be independent [13]. Therefore, the set of CE contains the NE, since the NE are the cases where the joint strategy can be factored into independent distributions satisfying Definition 3.1 [13]. The CE assumes that a third party recommends actions to the players according to z^o, and hence agents are not aware of the actions given to their opponent(s). The attractiveness of CE is that, even in the general-sum case, they can be found by solving a linear program [13]. The optimization problem requires two components: (1) a selection function, defining which CE to seek, and (2) constraints encoding Definition 5.1. Given payoff matrices Q^1(s, v^1), Q^2(s, v^2), take c ∈ R^l as c = F(Q^1(s, v^1), Q^2(s, v^2)), where F is a selection function. Then the solution to

  max_{z ∈ Δ_l} c^T z,  (19)

such that

  L z ≥ 0,  (20)

is a CE, where L ∈ R^{n×l}, n = Σ_{i=1}^2 m_i(m_i - 1), enforces Definition 5.1 [13], [31]. The expected value to player i ∈ I is given by CE^i(·) and is found by taking the expectation of the player's payoff under the strategy z^o [13], [31]. Then, provided a model of the environment, finding a CE reduces to a recursive linear program similar to value iteration [31]. The stochastic game is again viewed as a collection of static games, where the value of player i ∈ I is updated according to

  [v^i_{k+1}]_s = CE^i(Q^1_k(s, v^1_k), Q^2_k(s, v^2_k)).  (21)

Although the iterations suggested by (21) may not converge to a stationary strategy, the algorithm may give other meaningful solutions, such as a cyclic correlated equilibrium [31]. These concepts are next illustrated with an example of a pursuit-evasion game under lack of common knowledge.

VI. NUMERICAL EXAMPLE

We consider a two-player pursuit-evasion game on a 6x6 grid world, as shown in Fig. 1. The evading player wins the game and is awarded +100 points for reaching either of the two evade states alone (it may not co-occupy a state with the pursuer).
Capture occurs if the evader occupies the same cell as the pursuer, or any one of the neighboring cells in the four cardinal directions, thereby paying -100 points and losing the game.

Fig. 1: Grid for the pursuit-evasion game: start, goal, and obstacle cells are indicated (obstacle cluster, two evade states, pursuer start, evader start).

If both players navigate into obstacles on the same move, the game is over with 0 points given to each (a draw). In the event of a crash, the game is awarded to the non-crashing player and classified as either evasion (pursuer crash) or capture (evader crash). For moves within the game, the evading player is given rewards according to r^2(s) = κ ||p(s)||, where κ > 0 and p(s) ∈ R^2 is the relative x-y position of the players in state s ∈ S. Since the game is zero-sum, r^1(s) = -r^2(s) for all s ∈ S. Each agent has the same action set and can move in one of the four cardinal directions of the grid by selecting up (U), down (D), left (L), or right (R); therefore A^1 = A^2 = {U, D, L, R}. The discount factor is β = 0.7. Transitions are generally stochastic and are created by providing each agent an action success probability and distributing the remaining probability uniformly over neighboring cells. We can view the stochastic transitions as a UAV navigating an environment in the presence of a disturbance. In what follows, we seek CE that maximize the joint rewards of the players, with the resulting strategy simulated over 5,000 games. To understand how differences in asymmetric environment knowledge change the game, we consider the following scenarios.

CKD: The game is common knowledge with deterministic world dynamics; both agents are able to deduce the absence of a disturbance.
CKS: Agents have the same perception of the world and are both able to sense a disturbance.
LOCK-EV: Lack-of-common-knowledge game where the evader is aware of a disturbance and the pursuer is not.
LOCK-PS: Lack-of-common-knowledge game where the pursuer is aware of the lack of a disturbance and the evader is not.
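The transition construction described above can be sketched as follows. This is our own minimal reading, not the paper's code: the intended cell receives the action success probability, the remainder is spread uniformly over the other cardinal neighbors, and the handling of grid boundaries (moves off the grid keep the agent in place) is an assumption:

```python
MOVES = {'U': (0, 1), 'D': (0, -1), 'L': (-1, 0), 'R': (1, 0)}

def transition_distribution(cell, action, success_prob, size=6):
    """Distribution over next cells on a size x size grid: the intended
    neighbor gets `success_prob`; the remaining mass is split uniformly
    among the other three cardinal neighbors."""
    def step(c, d):
        # Clamp to the grid so off-grid moves leave the agent in place.
        x, y = c[0] + d[0], c[1] + d[1]
        return (min(max(x, 0), size - 1), min(max(y, 0), size - 1))

    dist = {}
    intended = step(cell, MOVES[action])
    dist[intended] = dist.get(intended, 0.0) + success_prob
    others = [a for a in MOVES if a != action]
    for a in others:
        nxt = step(cell, MOVES[a])
        dist[nxt] = dist.get(nxt, 0.0) + (1.0 - success_prob) / len(others)
    return dist
```

With success probability 1 the dynamics are deterministic, recovering the CKD scenario; lowering it models the disturbance sensed (or not) in the other scenarios.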
Fig. 2 shows sample trajectories for CKD and CKS, with Table I showing the results for all cases. Fig. 3 displays player movements under lack of common knowledge.

Fig. 2: Sample trajectories given the stochastic game with common knowledge. Both cases show the game won by the evading player. (a) CKD; (b) CKS.

In CKD, the evading player wins all games, electing to pass directly next to the obstacles while maintaining its 2-cell advantage. Contrast this with CKS, where both agents are aware of the disturbance, but the probabilistic nature of the environment makes evasion substantially more difficult, since a single mistake when executing a maneuver may lead to capture. Further, note that in the sample trajectory for this case (Fig. 2(b)), the evading player tends to pass farther away from the obstacles. Interestingly, LOCK-EV does not differ greatly from CKS, although there is a greater number of games in which the evader wins, with an increased number of pursuer crashes, indicative of its false information regarding the environment. In LOCK-PS, the evader again navigates away from obstacles, with the pursuer able to threaten interception,
which forces the evader to back-track, leading to eventual capture.

Fig. 3: Sample trajectories given the stochastic game with lack of common knowledge. Both cases show the game won by the pursuing player. (a) LOCK-EV; (b) LOCK-PS.

TABLE I: Capture and evasion percentages for each scenario. Results are for 5,000 simulated games. (Rows: Capture, Due to Evader Crash, Evasion, Due to Pursuer Crash, Average Number of Moves; columns: CKD, CKS, LOCK-EV, LOCK-PS.)

In this case, the success of evasion is drastically lower than in CKD, where the evader held the correct understanding of the environment. Nonetheless, the evader manages to prolong capture by requiring a substantial number of moves, on average, until capture occurs.

VII. CONCLUSION

In this paper we considered the problem in which two players engage in a pursuit-evasion scenario in a stochastic setting while having different perspectives about the probabilistic environment at their disposal. Under this assumption, we established that there exists an equilibrium for the corresponding non-zero-sum stochastic game, and we provided an illustrative example utilizing the correlated equilibrium.

ACKNOWLEDGMENTS

Support for this work has been provided by ONR awards N and N and NSF award CMMI.

REFERENCES
[1] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. Philadelphia: SIAM.
[2] R. Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Dover Publications.
[3] A. Balakrishnan, Stochastic Differential Systems I. Springer-Verlag.
[4] M. J. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: MIT Press.
[5] J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag.
[6] Y. Yavin, "Pursuit-evasion differential games with deception or interrupted observation," Computers & Mathematics with Applications, vol. 13.
[7] W. Sun and P. Tsiotras, "Pursuit-evasion game of two players under an external flow field," in Proceedings of the American Control Conference, Chicago, IL, July 2015.
[8] R. P. Anderson, E. Bakolas, D. Milutinović, and P. Tsiotras, "Optimal feedback guidance of a small aerial vehicle in a stochastic wind," Journal of Guidance, Control, and Dynamics, vol. 36, no. 4, July 2013.
[9] E. Bakolas and P. Tsiotras, "Feedback navigation in an uncertain flow-field and connections with pursuit strategies," AIAA Journal of Guidance, Control, and Dynamics, vol. 35, no. 4, July-August 2012.
[10] M. Soulignac, "Feasible and optimal path planning in strong current fields," IEEE Transactions on Robotics, vol. 27, no. 1, Feb. 2011.
[11] Y. Ketema and Y. J. Zhao, "Controllability and reachability for micro-aerial-vehicle trajectory planning in winds," AIAA Journal of Guidance, Control, and Dynamics, vol. 33, no. 3, 2010.
[12] M. L. Puterman, Markov Decision Processes. Wiley-Interscience.
[13] A. Greenwald and K. Hall, "Correlated Q-learning," in Proceedings of the 20th International Conference on Machine Learning, pp. 242-249, 2003.
[14] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Eleventh International Conference on Machine Learning, 1994.
[15] M. L. Littman, "Value-function reinforcement learning in Markov games," Journal of Cognitive Systems Research, vol. 1, 2001.
[16] M. L. Littman, "Friend-or-foe Q-learning in general-sum games," in Eighteenth International Conference on Machine Learning, 2001.
[17] E. Munoz de Cote and M. L. Littman, "A polynomial-time Nash equilibrium algorithm for repeated stochastic games," in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, 2008.
[18] J. Hu and M. P. Wellman, "Nash Q-learning for general-sum stochastic games," Journal of Machine Learning Research, vol. 4, pp. 1039-1069, 2003.
[19] M. Bowling and M. Veloso, "An analysis of stochastic game theory for multiagent reinforcement learning," Carnegie Mellon University, Tech. Rep., October 2000.
[20] R. S. Sutton and A. G. Barto, Reinforcement Learning. MIT Press.
[21] P. Stone and M. Veloso, "Multiagent systems: A survey from a machine learning perspective," Autonomous Robots, vol. 8, 2000.
[22] M. H. Bowling, "Convergence problems of general-sum multiagent reinforcement learning," in International Conference on Machine Learning, 2000.
[23] L. Buşoniu, R. Babuška, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, March 2008.
[24] A. W. Tucker, "Solving a matrix game by linear programming," IBM Journal of Research and Development, vol. 4, no. 5, November.
[25] O. Mangasarian and H. Stone, "Two-person nonzero-sum games and quadratic programming," Journal of Mathematical Analysis and Applications, vol. 9.
[26] O. L. Mangasarian, "Equilibrium points of bimatrix games," Journal of the Society for Industrial and Applied Mathematics, vol. 12, no. 4, December.
[27] C. Lemke and J. T. Howson, Jr., "Equilibrium points of bimatrix games," Journal of the Society for Industrial and Applied Mathematics, vol. 12, no. 2, June.
[28] J. A. Filar and T. A. Schultz, "Nonlinear programming and stationary strategies in stochastic games," Mathematical Programming, vol. 35.
[29] T. E. S. Raghavan and J. A. Filar, "Algorithms for stochastic games: A survey," Methods and Models of Operations Research, vol. 35.
[30] M. Breton, A. Haurie, and J. A. Filar, "On the computation of equilibria in discounted stochastic dynamic games," Journal of Economic Dynamics and Control, vol. 10.
[31] M. Zinkevich, A. Greenwald, and M. Littman, "Cyclic equilibria in Markov games," in Advances in Neural Information Processing Systems 18, 2006.
An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at
More informationA Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games
Learning in Average Reward Stochastic Games A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games Jun Li Jun.Li@warnerbros.com Kandethody Ramachandran ram@cas.usf.edu
More informationOn Equilibria of Distributed Message-Passing Games
On Equilibria of Distributed Message-Passing Games Concetta Pilotto and K. Mani Chandy California Institute of Technology, Computer Science Department 1200 E. California Blvd. MC 256-80 Pasadena, US {pilotto,mani}@cs.caltech.edu
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Artificial Intelligence Review manuscript No. (will be inserted by the editor) Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Howard M. Schwartz Received:
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More informationConvergence and No-Regret in Multiagent Learning
Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent
More informationarxiv: v1 [cs.ai] 22 Oct 2017
Hierarchical State Abstractions for Decision-Making Problems with Computational Constraints arxiv:1710.07990v1 [cs.ai 22 Oct 2017 Daniel T. Larsson, Daniel Braun, Panagiotis Tsiotras Abstract In this semi-tutorial
More informationMarkov decision processes
CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationLearning to Coordinate Efficiently: A Model-based Approach
Journal of Artificial Intelligence Research 19 (2003) 11-23 Submitted 10/02; published 7/03 Learning to Coordinate Efficiently: A Model-based Approach Ronen I. Brafman Computer Science Department Ben-Gurion
More informationA Simple Explanation of the Sobolev Gradient Method
A Simple Explanation o the Sobolev Gradient Method R. J. Renka July 3, 2006 Abstract We have observed that the term Sobolev gradient is used more oten than it is understood. Also, the term is oten used
More informationMDP Preliminaries. Nan Jiang. February 10, 2019
MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process
More informationSEPARATED AND PROPER MORPHISMS
SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN The notions o separatedness and properness are the algebraic geometry analogues o the Hausdor condition and compactness in topology. For varieties over the
More informationRoberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points
Roberto s Notes on Dierential Calculus Chapter 8: Graphical analysis Section 1 Extreme points What you need to know already: How to solve basic algebraic and trigonometric equations. All basic techniques
More informationCorrelated Q-Learning
Journal of Machine Learning Research 1 (27) 1 1 Submitted /; Published / Correlated Q-Learning Amy Greenwald Department of Computer Science Brown University Providence, RI 2912 Keith Hall Department of
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationFunction Approximation for Continuous Constrained MDPs
Function Approximation for Continuous Constrained MDPs Aditya Undurti, Alborz Geramifard, Jonathan P. How Abstract In this work we apply function approximation techniques to solve continuous, constrained
More informationChapter 9. Mixed Extensions. 9.1 Mixed strategies
Chapter 9 Mixed Extensions We now study a special case of infinite strategic games that are obtained in a canonic way from the finite games, by allowing mixed strategies. Below [0, 1] stands for the real
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationRobust feedback linearization
Robust eedback linearization Hervé Guillard Henri Bourlès Laboratoire d Automatique des Arts et Métiers CNAM/ENSAM 21 rue Pinel 75013 Paris France {herveguillardhenribourles}@parisensamr Keywords: Nonlinear
More informationObjectives. By the time the student is finished with this section of the workbook, he/she should be able
FUNCTIONS Quadratic Functions......8 Absolute Value Functions.....48 Translations o Functions..57 Radical Functions...61 Eponential Functions...7 Logarithmic Functions......8 Cubic Functions......91 Piece-Wise
More informationSEPARATED AND PROPER MORPHISMS
SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN Last quarter, we introduced the closed diagonal condition or a prevariety to be a prevariety, and the universally closed condition or a variety to be complete.
More informationLecture December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about
0368.4170: Cryptography and Game Theory Ran Canetti and Alon Rosen Lecture 7 02 December 2009 Fall 2009 Scribe: R. Ring In this lecture we will talk about Two-Player zero-sum games (min-max theorem) Mixed
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationFinite Dimensional Hilbert Spaces are Complete for Dagger Compact Closed Categories (Extended Abstract)
Electronic Notes in Theoretical Computer Science 270 (1) (2011) 113 119 www.elsevier.com/locate/entcs Finite Dimensional Hilbert Spaces are Complete or Dagger Compact Closed Categories (Extended bstract)
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationCMU Lecture 11: Markov Decision Processes II. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 11: Markov Decision Processes II Teacher: Gianni A. Di Caro RECAP: DEFINING MDPS Markov decision processes: o Set of states S o Start state s 0 o Set of actions A o Transitions P(s s,a)
More informationDifferential game strategy in three-player evasion and pursuit scenarios
Journal o Systems Engineering and Electronics Vol. 29, No. 2, pril 218, pp.352 366 Dierential game strategy in three-player evasion and pursuit scenarios SUN Qilong 1,QINaiming 1,*, XIO Longxu 2, and LIN
More informationPercentile Policies for Inventory Problems with Partially Observed Markovian Demands
Proceedings o the International Conerence on Industrial Engineering and Operations Management Percentile Policies or Inventory Problems with Partially Observed Markovian Demands Farzaneh Mansouriard Department
More informationDistributed Game Strategy Design with Application to Multi-Agent Formation Control
5rd IEEE Conference on Decision and Control December 5-7, 4. Los Angeles, California, USA Distributed Game Strategy Design with Application to Multi-Agent Formation Control Wei Lin, Member, IEEE, Zhihua
More informationGradient descent for symmetric and asymmetric multiagent reinforcement learning
Web Intelligence and Agent Systems: An international journal 3 (25) 17 3 17 IOS Press Gradient descent for symmetric and asymmetric multiagent reinforcement learning Ville Könönen Neural Networks Research
More informationScattered Data Approximation of Noisy Data via Iterated Moving Least Squares
Scattered Data Approximation o Noisy Data via Iterated Moving Least Squares Gregory E. Fasshauer and Jack G. Zhang Abstract. In this paper we ocus on two methods or multivariate approximation problems
More informationVALUATIVE CRITERIA FOR SEPARATED AND PROPER MORPHISMS
VALUATIVE CRITERIA FOR SEPARATED AND PROPER MORPHISMS BRIAN OSSERMAN Recall that or prevarieties, we had criteria or being a variety or or being complete in terms o existence and uniqueness o limits, where
More information( x) f = where P and Q are polynomials.
9.8 Graphing Rational Functions Lets begin with a deinition. Deinition: Rational Function A rational unction is a unction o the orm ( ) ( ) ( ) P where P and Q are polynomials. Q An eample o a simple rational
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationInformation, Utility & Bounded Rationality
Information, Utility & Bounded Rationality Pedro A. Ortega and Daniel A. Braun Department of Engineering, University of Cambridge Trumpington Street, Cambridge, CB2 PZ, UK {dab54,pao32}@cam.ac.uk Abstract.
More informationMulti-Agent Learning with Policy Prediction
Multi-Agent Learning with Policy Prediction Chongjie Zhang Computer Science Department University of Massachusetts Amherst, MA 3 USA chongjie@cs.umass.edu Victor Lesser Computer Science Department University
More informationOptimality, Duality, Complementarity for Constrained Optimization
Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear
More informationMultiagent Learning Using a Variable Learning Rate
Multiagent Learning Using a Variable Learning Rate Michael Bowling, Manuela Veloso Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3890 Abstract Learning to act in a multiagent
More informationStrong Lyapunov Functions for Systems Satisfying the Conditions of La Salle
06 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 6, JUNE 004 Strong Lyapunov Functions or Systems Satisying the Conditions o La Salle Frédéric Mazenc and Dragan Ne sić Abstract We present a construction
More informationAdaptive State Feedback Nash Strategies for Linear Quadratic Discrete-Time Games
Adaptive State Feedbac Nash Strategies for Linear Quadratic Discrete-Time Games Dan Shen and Jose B. Cruz, Jr. Intelligent Automation Inc., Rocville, MD 2858 USA (email: dshen@i-a-i.com). The Ohio State
More informationIncremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests
Incremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests Nancy Fulda and Dan Ventura Department of Computer Science Brigham Young University
More informationTelescoping Decomposition Method for Solving First Order Nonlinear Differential Equations
Telescoping Decomposition Method or Solving First Order Nonlinear Dierential Equations 1 Mohammed Al-Reai 2 Maysem Abu-Dalu 3 Ahmed Al-Rawashdeh Abstract The Telescoping Decomposition Method TDM is a new
More informationPrioritized Sweeping Converges to the Optimal Value Function
Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science
More informationApproximative Methods for Monotone Systems of min-max-polynomial Equations
Approximative Methods or Monotone Systems o min-max-polynomial Equations Javier Esparza, Thomas Gawlitza, Stean Kieer, and Helmut Seidl Institut ür Inormatik Technische Universität München, Germany {esparza,gawlitza,kieer,seidl}@in.tum.de
More informationGradient Descent for Symmetric and Asymmetric Multiagent Reinforcement Learning
1 Gradient Descent for Symmetric and Asymmetric Multiagent Reinforcement Learning Ville Könönen Neural Networks Research Centre Helsinki University of Technology P.O. Box 5400, FI-02015 HUT, Finland Tel:
More informationA Note on the Existence of Ratifiable Acts
A Note on the Existence of Ratifiable Acts Joseph Y. Halpern Cornell University Computer Science Department Ithaca, NY 14853 halpern@cs.cornell.edu http://www.cs.cornell.edu/home/halpern August 15, 2018
More informationMathematical Optimization Models and Applications
Mathematical Optimization Models and Applications Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 1, 2.1-2,
More informationLecture notes for Analysis of Algorithms : Markov decision processes
Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with
More informationSemideterministic Finite Automata in Operational Research
Applied Mathematical Sciences, Vol. 0, 206, no. 6, 747-759 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/0.2988/ams.206.62 Semideterministic Finite Automata in Operational Research V. N. Dumachev and
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationMachine Learning I Reinforcement Learning
Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationMulti-Pursuer Single-Evader Differential Games with Limited Observations*
American Control Conference (ACC) Washington DC USA June 7-9 Multi-Pursuer Single- Differential Games with Limited Observations* Wei Lin Student Member IEEE Zhihua Qu Fellow IEEE and Marwan A. Simaan Life
More informationReinforcement Learning
Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha
More informationPlanning by Probabilistic Inference
Planning by Probabilistic Inference Hagai Attias Microsoft Research 1 Microsoft Way Redmond, WA 98052 Abstract This paper presents and demonstrates a new approach to the problem of planning under uncertainty.
More informationFeedback Linearization
Feedback Linearization Peter Al Hokayem and Eduardo Gallestey May 14, 2015 1 Introduction Consider a class o single-input-single-output (SISO) nonlinear systems o the orm ẋ = (x) + g(x)u (1) y = h(x) (2)
More information1 Relative degree and local normal forms
THE ZERO DYNAMICS OF A NONLINEAR SYSTEM 1 Relative degree and local normal orms The purpose o this Section is to show how single-input single-output nonlinear systems can be locally given, by means o a
More informationRobust Learning Equilibrium
Robust Learning Equilibrium Itai Ashlagi Dov Monderer Moshe Tennenholtz Faculty of Industrial Engineering and Management Technion Israel Institute of Technology Haifa 32000, Israel Abstract We introduce
More informationApproximate Nash Equilibria with Near Optimal Social Welfare
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 015) Approximate Nash Equilibria with Near Optimal Social Welfare Artur Czumaj, Michail Fasoulakis, Marcin
More information(C) The rationals and the reals as linearly ordered sets. Contents. 1 The characterizing results
(C) The rationals and the reals as linearly ordered sets We know that both Q and R are something special. When we think about about either o these we usually view it as a ield, or at least some kind o
More informationCS 7180: Behavioral Modeling and Decisionmaking
CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationSome AI Planning Problems
Course Logistics CS533: Intelligent Agents and Decision Making M, W, F: 1:00 1:50 Instructor: Alan Fern (KEC2071) Office hours: by appointment (see me after class or send email) Emailing me: include CS533
More information(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)
Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not
More informationRobust Residual Selection for Fault Detection
Robust Residual Selection or Fault Detection Hamed Khorasgani*, Daniel E Jung**, Gautam Biswas*, Erik Frisk**, and Mattias Krysander** Abstract A number o residual generation methods have been developed
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationDocuments de Travail du Centre d Economie de la Sorbonne
Documents de Travail du Centre d Economie de la Sorbonne On equilibrium payos in wage bargaining with discount rates varying in time Ahmet OZKARDAS, Agnieszka RUSINOWSKA 2014.11 Maison des Sciences Économiques,
More informationBest Guaranteed Result Principle and Decision Making in Operations with Stochastic Factors and Uncertainty
Stochastics and uncertainty underlie all the processes of the Universe. N.N.Moiseev Best Guaranteed Result Principle and Decision Making in Operations with Stochastic Factors and Uncertainty by Iouldouz
More informationAnalysis of the regularity, pointwise completeness and pointwise generacy of descriptor linear electrical circuits
Computer Applications in Electrical Engineering Vol. 4 Analysis o the regularity pointwise completeness pointwise generacy o descriptor linear electrical circuits Tadeusz Kaczorek Białystok University
More informationNoncooperative continuous-time Markov games
Morfismos, Vol. 9, No. 1, 2005, pp. 39 54 Noncooperative continuous-time Markov games Héctor Jasso-Fuentes Abstract This work concerns noncooperative continuous-time Markov games with Polish state and
More informationEconomics Noncooperative Game Theory Lectures 3. October 15, 1997 Lecture 3
Economics 8117-8 Noncooperative Game Theory October 15, 1997 Lecture 3 Professor Andrew McLennan Nash Equilibrium I. Introduction A. Philosophy 1. Repeated framework a. One plays against dierent opponents
More informationVALUATIVE CRITERIA BRIAN OSSERMAN
VALUATIVE CRITERIA BRIAN OSSERMAN Intuitively, one can think o separatedness as (a relative version o) uniqueness o limits, and properness as (a relative version o) existence o (unique) limits. It is not
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process
More informationMarkov Decision Processes
Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour
More informationCircuit Complexity / Counting Problems
Lecture 5 Circuit Complexity / Counting Problems April 2, 24 Lecturer: Paul eame Notes: William Pentney 5. Circuit Complexity and Uniorm Complexity We will conclude our look at the basic relationship between
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationOptimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games
Optimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games Ronen I. Brafman Department of Computer Science Stanford University Stanford, CA 94305 brafman@cs.stanford.edu Moshe Tennenholtz
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationOn the Girth of (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs
On the Girth o (3,L) Quasi-Cyclic LDPC Codes based on Complete Protographs Sudarsan V S Ranganathan, Dariush Divsalar and Richard D Wesel Department o Electrical Engineering, University o Caliornia, Los
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationThe Clifford algebra and the Chevalley map - a computational approach (detailed version 1 ) Darij Grinberg Version 0.6 (3 June 2016). Not proofread!
The Cliord algebra and the Chevalley map - a computational approach detailed version 1 Darij Grinberg Version 0.6 3 June 2016. Not prooread! 1. Introduction: the Cliord algebra The theory o the Cliord
More informationStochastic Game Approach for Replay Attack Detection
Stochastic Game Approach or Replay Attack Detection Fei Miao Miroslav Pajic George J. Pappas. Abstract The existing tradeo between control system perormance and the detection rate or replay attacks highlights
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationReinforcement Learning and Deep Reinforcement Learning
Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q
More informationSolving Continuous Linear Least-Squares Problems by Iterated Projection
Solving Continuous Linear Least-Squares Problems by Iterated Projection by Ral Juengling Department o Computer Science, Portland State University PO Box 75 Portland, OR 977 USA Email: juenglin@cs.pdx.edu
More informationCHAPTER 1: INTRODUCTION. 1.1 Inverse Theory: What It Is and What It Does
Geosciences 567: CHAPTER (RR/GZ) CHAPTER : INTRODUCTION Inverse Theory: What It Is and What It Does Inverse theory, at least as I choose to deine it, is the ine art o estimating model parameters rom data
More information