Operator approach to stochastic games with varying stage duration

G. Vigeral (with S. Sorin)
CEREMADE, Université Paris Dauphine
4 December 2015, Stochastic methods in Game theory

Table of contents

1 Zero-sum stochastic games
2 Exact games with varying stage duration
   Finite horizon
   Discounted evaluation
3 Discretization of a continuous timed game
4 Conclusion and remarks

1 Zero-sum stochastic games

Zero-sum stochastic game

A zero-sum stochastic game $\Gamma$ is a 5-tuple $(\Omega, I, J, g, \rho)$ where:
- $\Omega$ is the set of states.
- $I$ (resp. $J$) is the action set of Player 1 (resp. Player 2).
- $g : I \times J \times \Omega \to [-1,1]$ is the payoff function (that Player 1 maximizes and Player 2 minimizes).
- $\rho : I \times J \times \Omega \to \Delta(\Omega)$ is the transition probability.

How the game is played

An initial state $\omega_1$ is given, known by each player. At each stage $k \in \mathbb{N}$:
- The players observe the current state $\omega_k$.
- According to the past history, Player 1 (resp. Player 2) chooses a mixed action $x_k$ in $X = \Delta(I)$ (resp. $y_k$ in $Y = \Delta(J)$). This is done independently by each player.
- An action $i_k$ of Player 1 (resp. $j_k$ of Player 2) is drawn according to his mixed strategy $x_k$ (resp. $y_k$).
- This gives the payoff at stage $k$: $g_k = g(i_k, j_k, \omega_k)$.
- A new state $\omega_{k+1}$ is drawn according to $\rho(i_k, j_k, \omega_k)$.

The n-stage game

For any stochastic game $\Gamma$, any finite horizon $n \in \mathbb{N}$, and any starting state $\omega_1$, the n-stage game $\Gamma_n(\omega_1)$ is the zero-sum game with payoff
\[ E\Big\{ \sum_{k=1}^{n} g_k \Big\}, \]
that Player 1 maximizes and Player 2 minimizes.
The value of $\Gamma_n(\omega_1)$ is denoted by $V_n(\omega_1)$.
Normalized value: $v_n = V_n / n$.

The discounted game

For any stochastic game $\Gamma$, any discount factor $\lambda \in \,]0,1[$, and any starting state $\omega_1$, the discounted game $\Gamma_\lambda(\omega_1)$ is the zero-sum game with payoff
\[ E\Big\{ \sum_{k=1}^{+\infty} (1-\lambda)^{k-1} g_k \Big\}, \]
that Player 1 maximizes and Player 2 minimizes.
The value of $\Gamma_\lambda(\omega_1)$ is denoted by $W_\lambda(\omega_1)$.
Normalized value: $w_\lambda = \lambda W_\lambda$.

Recursive structure

Shapley (1953) proved that the values satisfy a recursive structure:
\[ V_n(\omega) = \sup_{x \in X} \inf_{y \in Y} \big\{ g(x,y,\omega) + E_{\rho(x,y,\omega)}(V_{n-1}(\cdot)) \big\} = \inf_{y \in Y} \sup_{x \in X} \big\{ g(x,y,\omega) + E_{\rho(x,y,\omega)}(V_{n-1}(\cdot)) \big\} \]
\[ W_\lambda(\omega) = \sup_{x \in X} \inf_{y \in Y} \big\{ g(x,y,\omega) + (1-\lambda) E_{\rho(x,y,\omega)}(W_\lambda(\cdot)) \big\} = \inf_{y \in Y} \sup_{x \in X} \big\{ g(x,y,\omega) + (1-\lambda) E_{\rho(x,y,\omega)}(W_\lambda(\cdot)) \big\}. \]

Shapley operator

This can be summarized by
\[ V_n = \Psi(V_{n-1}) = \Psi^n(0), \qquad W_\lambda = \Psi\big((1-\lambda) W_\lambda\big), \qquad w_\lambda = \lambda \Psi\Big(\frac{1-\lambda}{\lambda}\, w_\lambda\Big) \]
for some operator $\Psi$:
\[ \Psi(f)(\omega) = \sup_{x \in X} \inf_{y \in Y} \big\{ g(x,y,\omega) + E_{\rho(x,y,\omega)}(f(\cdot)) \big\} = \inf_{y \in Y} \sup_{x \in X} \big\{ g(x,y,\omega) + E_{\rho(x,y,\omega)}(f(\cdot)) \big\}. \]
$\Psi$ is nonexpansive for the infinite norm:
\[ \|\Psi(f) - \Psi(f')\|_\infty \le \|f - f'\|_\infty. \]
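As an illustration of the operator $\Psi$ and of the recursion $V_n = \Psi^n(0)$, here is a minimal Python sketch. It assumes a finite game in which each player has exactly two actions, so that the value of each one-shot matrix game has a closed form. The game data (`g`, `rho`) and the helper names `val2x2` and `shapley` are hypothetical and not taken from the talk.

```python
def val2x2(a):
    """Value of the 2x2 zero-sum matrix game a (row player maximizes)."""
    maximin = max(min(row) for row in a)
    minimax = min(max(a[0][j], a[1][j]) for j in range(2))
    if maximin == minimax:            # a pure saddle point exists
        return maximin
    (p, q), (r, s) = a                # otherwise both players mix
    return (p * s - q * r) / (p + s - q - r)


def shapley(f, g, rho):
    """Psi(f)(w) = val_{x,y} { g(x,y,w) + E_{rho(x,y,w)}[f] } for 2x2 action sets."""
    n = len(f)
    out = []
    for w in range(n):
        a = [[g[w][i][j] + sum(rho[w][i][j][w2] * f[w2] for w2 in range(n))
              for j in range(2)] for i in range(2)]
        out.append(val2x2(a))
    return out


# Hypothetical two-state game (assumption, for illustration only).
# State 0: payoff matrix [[1, 0], [0, 1]]; the action pair (0, 0) leads to state 1.
# State 1: absorbing, payoff 0.
g = [[[1, 0], [0, 1]],
     [[0, 0], [0, 0]]]
rho = [[[[0, 1], [1, 0]], [[1, 0], [1, 0]]],
       [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]]

# V_0 = 0 and V_n = Psi(V_{n-1}).
V = [[0.0, 0.0]]
for _ in range(3):
    V.append(shapley(V[-1], g, rho))
print(V)
```

For larger action sets the closed form no longer applies and the value of each matrix game would have to be computed by linear programming.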

Framework

This was proven by Shapley in the finite case but is true in a very wide framework. For example if:
- $\Omega$ is finite, $X$ and $Y$ are compact, $g$ and $\rho$ are continuous.
- $\Omega$, $X$ and $Y$ are compact metric, $g$ and $\rho$ are continuous.
See Maitra Parthasarathy, Nowak, Mertens Sorin Zamir for more general frameworks.

2 Exact games with varying stage duration

Definition

Definition due to Neyman (2013).
Instead of playing at times $1, 2, \dots, n, \dots$, players play at times $t_1, t_2, \dots, t_n, \dots$
The intensity of both payoff and transition at time $t_k$ is $h_k = t_{k+1} - t_k$.
That is, $g_h = h g$ and $\rho_h = (1-h)\,\mathrm{Id} + h \rho$.
Shapley operator of the "exact game" with duration $h$:
\[ \Psi_h = (1-h)\,\mathrm{Id} + h \Psi. \]

Some natural questions

1. What happens, for a fixed horizon $t$ or discount factor $\lambda$, when the duration $h_i$ of each stage vanishes? Does the value converge, and to which limit?
2. What happens, for a fixed sequence of stage durations $h_i$, when the horizon goes to infinity or the discount factor goes to 0? Does the normalized value converge, and to which limit?
3. What happens when both $\lambda$ (or $1/n$) and $h_i$ go to 0?
4. What can be said of optimal strategies in games with varying duration?

Neyman answers questions [...] for finite discounted games. Here we use the operator approach to give a general answer to [...]

Game with finite horizon and varying duration

Finite horizon $t$, finite sequence of stage durations $h_1, \dots, h_n$ with $\sum_i h_i = t$.
The value $V$ of such a game satisfies $V = z_n$ with
\[ z_{i+1} = \Psi_{h_i}(z_i) = (1-h_i)\, z_i + h_i \Psi(z_i), \qquad \frac{z_{i+1} - z_i}{h_i} = -(\mathrm{Id} - \Psi)(z_i). \]
This is the Eulerian scheme associated to $f' = -(\mathrm{Id} - \Psi)(f)$.
One can use general results associated to such schemes, for any nonexpansive operator defined on a Banach space.
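The scheme above can be sketched in a few lines of Python. The operator `psi` below is a hypothetical example chosen for simplicity (two states, one action per player, so no optimization is involved): state 0 pays 0 and moves to state 1, state 1 pays 1 and is absorbing. The function name `exact_value` is also an assumption.

```python
def psi(f):
    # Hypothetical Shapley operator: Psi(f) = g + P f with
    # g = (0, 1) and both states moving to state 1.
    return [0.0 + f[1], 1.0 + f[1]]


def exact_value(durations, z0=(0.0, 0.0)):
    """Iterate z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i) over the given stage durations."""
    z = list(z0)
    for h in durations:
        pz = psi(z)
        z = [(1 - h) * zi + h * pi for zi, pi in zip(z, pz)]
    return z


# Two partitions of the same horizon t = 1.
uniform = exact_value([0.1] * 10)
uneven = exact_value([0.05] * 4 + [0.2] * 4)
print(uniform, uneven)

# With h = 1 at every stage the scheme is the usual recursion V_n = Psi^n(0).
print(exact_value([1.0] * 3))
```

In this example the two partitions give nearby but distinct values in state 0, which is the dependence on the mesh that the propositions on Eulerian schemes quantify.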

Eulerian schemes in Banach spaces

For general nonexpansive $\Psi$, let $f_t(z_0)$ denote the solution at time $t$ of $f' = -(\mathrm{Id} - \Psi)(f)$ starting from $z_0$.

Proposition (Miyadera-Oharu 70)
\[ \| f_{nh}(z_0) - \Psi_h^n(z_0) \| \le \| z_0 - \Psi(z_0) \| \, h \sqrt{n}. \]

Proposition (V. 10)
If $z_{i+1} = (1-h_i)\, z_i + h_i \Psi(z_i)$, then
\[ \| f_t(z_0) - z_n \| \le \| z_0 - \Psi(z_0) \| \sqrt{\sum_{i=1}^{n} h_i^2} \]
with $t = \sum_{i=1}^{n} h_i$.

Finite horizon: result with t fixed

Let $h = \max_i h_i$ and $t = \sum_i h_i$, then
\[ \| V - f(t) \| \le K \sqrt{h t}. \]
Hence as the mesh $h$ goes to 0, the value of the game goes to $f(t)$.
$f(t)$ can be interpreted as the value of a game played in continuous time (Neyman 13).
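The convergence of the value to $f(t)$ as the mesh vanishes can be checked numerically on a toy case. The sketch below assumes a hypothetical operator with two states and one action per player (state 0 pays 0 and moves to state 1, which pays 1 and is absorbing), for which the continuous-time solution of $f' = -(\mathrm{Id} - \Psi)(f)$, $f(0) = 0$, is known in closed form: $f(t) = (t - 1 + e^{-t},\ t)$. The bound printed is $\|z_0 - \Psi(z_0)\| \sqrt{\sum_i h_i^2}$, which is at most $K\sqrt{ht}$.

```python
import math


def psi(f):
    # Hypothetical operator: Psi(f) = g + P f, g = (0, 1), both states move to state 1.
    return [f[1], 1.0 + f[1]]


def euler(durations):
    """Value of the exact game: z_{i+1} = (1 - h_i) z_i + h_i Psi(z_i), z_0 = 0."""
    z = [0.0, 0.0]
    for h in durations:
        z = [(1 - h) * a + h * b for a, b in zip(z, psi(z))]
    return z


def flow(t):
    """Closed-form solution of f' = Psi(f) - f with f(0) = 0 for this example."""
    return [t - 1 + math.exp(-t), t]


def sup_dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


t = 2.0
K = sup_dist([0.0, 0.0], psi([0.0, 0.0]))   # ||z_0 - Psi(z_0)|| with z_0 = 0
report = []
for n in (10, 100, 1000):
    # Uneven partition of [0, t]: the first half of the stages is twice as long.
    weights = [2] * (n // 2) + [1] * (n // 2)
    durations = [t * w / sum(weights) for w in weights]
    err = sup_dist(euler(durations), flow(t))
    bound = K * math.sqrt(sum(h * h for h in durations))
    report.append((n, err, bound))
    print(n, err, bound)
```

In this linear example the observed error is much smaller than the bound, which is only a worst-case estimate valid for every nonexpansive operator.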

Finite horizon: asymptotic results

For any $h_i$,
\[ \Big\| \frac{V}{t} - \frac{f(t)}{t} \Big\| \le \frac{K}{\sqrt{t}}. \]
All the repeated games with varying stage duration have the same (normalized) asymptotic behavior.
Same asymptotic behavior for the normalized value in continuous time $f(t)/t$ and for the normalized value of the original game $v_n$.

Game with discount factor and varying duration

Discount factor $\lambda$ = weight on the payoff on $[0,1]$ compared to $[0,+\infty]$.
Infinite sequence of stage durations $h_1, \dots, h_n, \dots$
When $h$ is constant, the normalized value satisfies
\[ w^h_\lambda = \lambda \Psi_h\Big( \frac{1 - \lambda h}{\lambda}\, w^h_\lambda \Big). \]
In general $w$ is
\[ w = \prod_{i=1}^{+\infty} D^{h_i}_\lambda (0) \quad \text{with} \quad D^h_\lambda(f) = \lambda \Psi_h\Big( \frac{1 - \lambda h}{\lambda}\, f \Big). \]

Discounted evaluation: result with λ fixed and vanishing duration

For a uniform duration $h$, $w^h_\lambda = w_\mu$ with
\[ \mu = \frac{\lambda}{1 + \lambda - \lambda h}. \]
For any $\lambda$ and $h_i \le h$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
\[ \| w - \hat{w}_\lambda \| \le K h \quad \text{with} \quad \hat{w}_\lambda := w_{\frac{\lambda}{1+\lambda}}. \]
Hence as the mesh $h$ goes to 0, the value of the game goes to $w_{\frac{\lambda}{1+\lambda}}$. Already known when the game is finite (Neyman 2013).
$\hat{w}_\lambda$ can be interpreted as the value of a game played in continuous time (Neyman 13).
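Both statements can be checked numerically on a toy case. The sketch below is an illustration under assumptions: the operator `psi` is a hypothetical example (two states, one action per player), the operator $D^h_\lambda(f) = \lambda \Psi_h\big(\frac{1-\lambda h}{\lambda} f\big)$ with $\Psi_h = (1-h)\mathrm{Id} + h\Psi$ is iterated from 0, and the infinite product is read as a composition with the first stage outermost, truncated after finitely many stages.

```python
def psi(f):
    # Hypothetical operator: state 0 pays 0 and moves to state 1,
    # state 1 pays 1 and is absorbing.
    return [f[1], 1.0 + f[1]]


def psi_h(f, h):
    """Exact-game operator Psi_h = (1 - h) Id + h Psi."""
    return [(1 - h) * a + h * b for a, b in zip(f, psi(f))]


def D(f, lam, h):
    """D^h_lambda(f) = lam * Psi_h((1 - lam * h) / lam * f)."""
    c = (1 - lam * h) / lam
    return [lam * v for v in psi_h([c * x for x in f], h)]


def discounted_value(lam, durations):
    """Approximate D^{h_1} o ... o D^{h_N}(0), composing from the last stage backwards."""
    w = [0.0, 0.0]
    for h in reversed(durations):
        w = D(w, lam, h)
    return w


def sup_dist(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


lam, h = 0.2, 0.5

# Uniform duration h: compare w^h_lambda with w_mu, mu = lam / (1 + lam - lam * h).
w_h = discounted_value(lam, [h] * 2000)
mu = lam / (1 + lam - lam * h)
w_mu = discounted_value(mu, [1.0] * 2000)      # duration 1 is the original game
print(w_h, w_mu)

# Small, non-constant durations: compare with w_{lam / (1 + lam)}.
small = [0.01 if k % 2 else 0.02 for k in range(4000)]
w_small = discounted_value(lam, small)
w_hat = discounted_value(lam / (1 + lam), [1.0] * 2000)
print(w_small, w_hat, sup_dist(w_small, w_hat))
```

The truncation is harmless here because $D^h_\lambda$ is a contraction with factor $1 - \lambda h$, so the contribution of late stages decays geometrically.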

Discounted evaluation: asymptotic results

Assumption: there exist nondecreasing $k : \,]0,1] \to \mathbb{R}_+$ and $\ell : [0,+\infty] \to \mathbb{R}_+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and
\[ \| D^1_\lambda(z) - D^1_\mu(z) \| \le k(|\lambda - \mu|)\, \ell(\|z\|) \]
for all $(\lambda, \mu) \in \,]0,1]^2$ and $z \in Z$.
Always true for Shapley operators of games with bounded payoff.
Then for any $\lambda$ and $h_i$, the value $w$ of the $\lambda$-discounted game with stage durations $h_i$ satisfies
\[ \| w - w_\lambda \| \le K \lambda. \]
All the repeated games with varying stage duration have the same (normalized) asymptotic behavior as $\lambda$ goes to 0.
Same asymptotic behavior for the normalized value in continuous time $\hat{w}_\lambda$ and for the normalized value of the original game $w_\lambda$.

3 Discretization of a continuous timed game

Model

Finite state space.
$P^t(i,j)$ is a continuous time homogeneous Markov chain on $\Omega$, indexed by $\mathbb{R}_+$, with generator $Q(i,j)$:
\[ \dot{P}^t(i,j) = P^t(i,j)\, Q(i,j). \]
$G^h$ is the discretization with mesh $h$ of the game in continuous time $G$ where the state variable follows $P^t$ and is controlled by both players (Zachrisson 64, Tanaka Wakuta 77, Guo Hernández-Lerma 03, Neyman 12).
Players act at time $s = kh$ by choosing actions $(i_s, j_s)$ (at random according to some $x_s$, resp. $y_s$), knowing the current state.
Between time $s$ and $s + h$, the state $\omega_t$ evolves with conditional law $P^t$.

Results

The Shapley operator of $G^h$ is
\[ \Psi^h(f) = \mathrm{val}_{X \times Y} \{ g^h + P^h f \} \]
where $g^h(\omega_0, x, y)$ stands for $E\big[ \int_0^h g(\omega_t; x, y)\, dt \big]$ and $P^h(x,y) = \int_{I \times J} P^h(i,j)\, x(di)\, y(dj)$.
It is close to the exact-game operator $\Psi_h = (1-h)\,\mathrm{Id} + h\Psi$:
\[ \| \Psi^h(f) - \Psi_h(f) \| = (1 + \|f\|)\, O(h^2) \]
where $\Psi$ is the Shapley operator of the (discrete time) stochastic game with payoff $g$ and transition $\mathrm{Id} + Q$.
Hence all the results of the previous section involving small $h$ still hold.
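The $O(h^2)$ gap between the discretized operator and the exact-game operator can be observed on a toy case. The sketch below assumes a hypothetical chain with two states and a single action per player, with generator $Q = \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix}$ and payoff $g = (0, 1)$, so that $P^h = e^{hQ}$ and $g^h$ have closed forms and no `val` computation is needed. The function names are assumptions.

```python
import math


def psi_exact(f, h):
    """Exact-game operator (1 - h) Id + h Psi, with Psi(f) = g + (Id + Q) f."""
    # Id + Q = [[0, 1], [0, 1]] and g = (0, 1).
    return [f[0] + h * (f[1] - f[0]), f[1] + h]


def psi_discretized(f, h):
    """Discretized operator g^h + P^h f for the continuous-time chain."""
    e = math.exp(-h)
    # P^h = exp(hQ) = [[e^{-h}, 1 - e^{-h}], [0, 1]].
    # g^h(w) = E[ integral_0^h g(omega_t) dt ] starting from w.
    g_h = [h - 1 + e, h]
    return [g_h[0] + e * f[0] + (1 - e) * f[1], g_h[1] + f[1]]


def gap(f, h):
    a, b = psi_discretized(f, h), psi_exact(f, h)
    return max(abs(x - y) for x, y in zip(a, b))


f = [1.0, -2.0]
for h in (0.1, 0.05, 0.01):
    # The ratio gap / h^2 stays bounded as h decreases.
    print(h, gap(f, h), gap(f, h) / h ** 2)
```

Halving $h$ divides the gap by roughly four, consistent with a second-order discrepancy.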

4 Conclusion and remarks

Conclusion

We recover and generalize some results of Neyman 13, using only properties of nonexpansive operators.
The only assumptions are:
a) $\Psi$ is well defined and 1-Lipschitz;
b) the current state is observed.
Same asymptotic structure for the original game, games with varying duration, and the game in continuous time.
The counterexamples to the convergence of values with observed states (V., Ziliotto, Sorin V.) thus also oscillate with varying stage duration.

Open questions

- What happens with a general weight on the payoff (not finite horizon or constant discount factor)? When $h$ goes to 0, there are results by Neyman (finite games) and Sorin (using viscosity techniques).
- What happens when all the weight goes to infinity (analogous to $t$ going to infinity or $\lambda$ to 0)?
- What if the state is not observed?

A stupid game

[Figure: two-state diagram with payoffs 0 and 1 and actions S and C.]

Only one player.
He observes (and remembers) his moves but not the state.
Starting state $\omega$.
Clearly $w_\lambda$ tends to 1 as $\lambda$ goes to 0.

A (not so) stupid game with varying stage duration

[Figure: the same two-state diagram with payoffs 0 and 1 and actions S and C.]

As long as Player 1 plays change, the probability of being in the first state satisfies
\[ p_{t+1} = (1-h)\, p_t + h (1 - p_t) = (1 - 2h)\, p_t + h. \]
Hence for $h \le \tfrac{1}{2}$, $w^h_\lambda$ tends to $\tfrac{1}{2}$ as $\lambda$ goes to 0.
In fact $w^h_\lambda \le \tfrac{1}{2}$ for any $h$ [...]
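The recursion on $p_t$ is easy to iterate. The short Python sketch below follows the belief that the state is still the starting one when the player always plays change; the function name `belief_path` and the chosen values of $h$ are assumptions made for illustration.

```python
def belief_path(h, stages, p0=1.0):
    """Iterate p_{t+1} = (1 - 2h) p_t + h, starting from p_0 (state known at the start)."""
    p, path = p0, [p0]
    for _ in range(stages):
        p = (1 - 2 * h) * p + h
        path.append(p)
    return path


# Original game (h = 1): the player switches state with certainty.
print(belief_path(1.0, 2))

# Duration h = 1/2: the belief jumps to 1/2 and stays there.
print(belief_path(0.5, 3))

# Small duration: the belief decreases towards 1/2 without crossing it.
slow = belief_path(0.1, 50)
print(slow[:4], slow[-1])
```

Since $p_{t+1} - \tfrac12 = (1 - 2h)(p_t - \tfrac12)$, the belief stays at or above $\tfrac12$ whenever $h \le \tfrac12$, so the unobserved state cannot be steered towards the other state with probability above one half.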

Thank you for your attention!
