The Value Functions of Markov Decision Processes

Ehud Lehrer, Eilon Solan, and Omri N. Solan

(Lehrer acknowledges the support of the Israel Science Foundation, Grant #963/. Solan acknowledges the support of the Israel Science Foundation, Grant #33/3. Affiliations: School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel, and INSEAD, Bd. de Constance, Fontainebleau Cedex, France, lehrer@post.tau.ac.il; School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel, eilons@post.tau.ac.il; School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel, omrisola@post.tau.ac.il.)

Abstract

We provide a full characterization of the set of value functions of Markov decision processes.

1 Introduction

Markov decision processes are a standard tool for studying dynamic optimization problems. The discounted value of such a problem is the maximal total discounted amount that the decision maker can guarantee to himself. By Blackwell (1965), the function λ ↦ v_λ(s) that assigns to each discount factor λ the discounted value at the initial state s is the maximum of finitely many rational functions (with real coefficients). Standard arguments show that the roots of the polynomial in the denominator of these rational functions either lie outside the unit ball in the complex plane, or lie on the boundary of the unit ball, in which case they have multiplicity 1.

Using the theory of eigenvalues of stochastic matrices one can show that the roots on the boundary of the unit ball must be unit roots. In this note we prove the converse result: every function λ ↦ v_λ that is the maximum of finitely many rational functions, such that each root of the polynomials in the denominators either lies outside the unit ball in the complex plane or is a unit root with multiplicity 1, is the value function of some Markov decision process.

2 The Model and the Main Theorem

Definition 1 A Markov decision process is a tuple (S, µ, A, r, q) where

- S is a finite set of states.
- µ ∈ Δ(S) is a distribution according to which the initial state is chosen. (For every finite set X, the set of probability distributions over X is denoted Δ(X).)
- A = (A(s))_{s∈S} is the family of sets of actions available at each state s ∈ S. Denote SA := {(s, a) : s ∈ S, a ∈ A(s)}.
- r : SA → R is a payoff function.
- q : SA → Δ(S) is a transition function.

The process starts at an initial state s_1 ∈ S, chosen according to µ. It then evolves in discrete time: at every stage n ∈ N the process is in a state s_n ∈ S, the decision maker chooses an action a_n ∈ A(s_n), and a new state s_{n+1} is chosen according to q(· | s_n, a_n). A finite history is a sequence h_n = (s_1, a_1, s_2, a_2, ..., s_n) ∈ H := ∪_{k≥0} (SA)^k × S. A pure strategy is a function σ : H → ∪_{s∈S} A(s) such that σ(h_n) ∈ A(s_n) for every finite history h_n = (s_1, a_1, ..., s_n), and a behavior strategy is a function σ : H → ∪_{s∈S} Δ(A(s)) such that σ(h_n) ∈ Δ(A(s_n)) for every such finite history. The set of behavior strategies is denoted B. In other words, σ assigns to every finite history a distribution over actions, which we call a mixed action. A strategy is stationary if for every finite history h_n = (s_1, a_1, ..., s_n), the mixed action σ(h_n) is a function of s_n and is independent of (s_1, a_1, ..., a_{n−1}).

Every behavior strategy σ together with a prior distribution µ over the state space induces a probability distribution P_{µ,σ} over the space of infinite histories (SA)^∞ (which is endowed with the product σ-algebra). Expectation w.r.t. this probability distribution is denoted E_{µ,σ}. For every discount factor λ ∈ [0, 1), the λ-discounted payoff is

γ_λ(µ, σ) := E_{µ,σ}[ Σ_{n=1}^∞ λ^{n−1} r(s_n, a_n) ].

When µ is a probability measure that is concentrated on a single state s, we denote the λ-discounted payoff also by γ_λ(s, σ). The λ-discounted value of the Markov decision process with the prior µ over the initial state is

v_λ(µ) := sup_{σ∈B} γ_λ(µ, σ).    (1)

A behavior strategy is λ-discounted optimal if it attains the maximum in (1). Denote by V the set of all functions λ ↦ v_λ(µ) that are the value function of some Markov decision process starting with some prior µ ∈ Δ(S). The goal of the present note is to characterize the set V.

Notation (i) Denote by F the set of all rational functions P/Q such that each root of Q is (a) outside the unit ball, or (b) a unit root with multiplicity 1. (Recall that a complex number ω ∈ C is a unit root if there exists n ∈ N such that ω^n = 1.) (ii) Denote by MaxF the set of functions that are the maximum of a finite number of functions in F.

The next proposition states that any function in V is the maximum of a finite number of functions in F.

Proposition 1 V ⊆ MaxF.

Proof. By Blackwell (1965), for every λ ∈ [0, 1) there is a λ-discounted pure stationary optimal strategy. Since the number of pure stationary strategies is finite, it is sufficient to show that the function λ ↦ γ_λ(µ, σ) is in F for every pure stationary strategy σ.

For every pure stationary strategy σ, every prior µ, and every discount factor λ ∈ [0, 1), the vector (γ_λ(s, σ))_{s∈S} is the unique solution of a system of |S| linear equations:

γ_λ(s, σ) = r(s, σ(s)) + λ Σ_{s'∈S} q(s' | s, σ(s)) γ_λ(s', σ),   ∀ s ∈ S,

where r(s, σ(s)) := Σ_{a∈A(s)} σ(a | s) r(s, a) and q(s' | s, σ(s)) := Σ_{a∈A(s)} σ(a | s) q(s' | s, a) are the multilinear extensions of r and q, respectively. (The system is valid for every stationary strategy, not necessarily pure; we will need it only for pure stationary strategies.) It follows that

γ_λ(·, σ) = (I − λQ(·, σ(·)))^{−1} r(·, σ(·)),

where Q(·, σ(·)) = Q = (Q_{s,s'})_{s,s'∈S} is the transition matrix induced by σ, that is, Q_{s,s'} = q(s' | s, σ(s)). By Cramer's rule, the entries of (I − λQ)^{−1} are rational functions of λ whose common denominator is det(I − λQ). In particular, the roots of the denominator are the inverses of the eigenvalues of Q. Since the denominator is independent of s, it is also the denominator of γ_λ(µ, σ) = Σ_{s∈S} µ(s) γ_λ(s, σ).

Denote the expected payoff at stage n by x_n := E_{µ,σ}[r(s_n, σ(h_n))], so that γ_λ(µ, σ) = Σ_{n=1}^∞ x_n λ^{n−1}. Since |x_n| ≤ max_{s∈S, a∈A(s)} |r(s, a)| for every n ∈ N, it follows that the denominator det(I − λQ(·, σ(·))) does not have roots in the interior of the unit ball, and that all its roots that lie on the boundary of the unit ball have multiplicity 1. Moreover, by, e.g., Higham and Lin (2011), the roots that lie on the boundary of the unit ball must be unit roots.
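To see this direction at work, here is a small symbolic sketch in Python/SymPy; the two-state MDP and its payoffs are invented for illustration and are not taken from the paper. For each pure stationary strategy it solves the linear system above, γ_λ(·, σ) = (I − λQ)^{−1} r, and prints the resulting rational functions of λ, whose pointwise maximum is the value function, as in Proposition 1.

```python
import sympy as sp

lam = sp.symbols('lambda', nonnegative=True)

# A tiny hypothetical MDP (invented numbers): two states; state 0 has two actions, state 1 has one.
# r[s][a] is the payoff and P[s][a] is the transition row q(.|s, a).
r = {0: {0: 1, 1: 4}, 1: {0: 0}}
P = {0: {0: [sp.Rational(1, 2), sp.Rational(1, 2)], 1: [0, 1]},
     1: {0: [1, 0]}}

values = []
for a0 in r[0]:                       # the pure stationary strategies differ only at state 0
    Q = sp.Matrix([P[0][a0], P[1][0]])
    rvec = sp.Matrix([r[0][a0], r[1][0]])
    # gamma_lambda(., sigma) = (I - lambda*Q)^(-1) r, payoffs weighted by lambda**(n-1)
    gamma = (sp.eye(2) - lam * Q).inv() * rvec
    values.append(sp.simplify(gamma[0]))   # discounted payoff at the initial state 0

print(values)            # each entry is a rational function of lambda
print(sp.Max(*values))   # the value function at state 0 is their pointwise maximum
```

In this toy instance the two strategies yield 2/(2 − λ − λ²) and 4/(1 − λ²); the denominator roots are 1, −2 and 1, −1, in line with the root conditions defining F.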

The main result of this note is that the converse holds as well.

Theorem 1 A function w : [0, 1) → R is in V if and only if it is the maximum of a finite number of functions in F.

To avoid cumbersome notation we write f(λ) for the function λ ↦ f(λ). In particular, λf(λ) will denote the function λ ↦ λf(λ). We start with the following observation.

Lemma 1 If f, g ∈ V then max{f, g} ∈ V.

Proof. Let M_f = (S_f, µ_f, A_f, r_f, q_f) and M_g = (S_g, µ_g, A_g, r_g, q_g) be Markov decision processes that implement f and g, respectively. To implement max{f, g}, define a Markov decision process that contains M_f, M_g, and an additional state s*, in which the decision maker chooses one of M_f and M_g. Formally, let M* = (S_f ∪ S_g ∪ {s*}, A*, r*, q*) be a Markov decision process in which A*, r*, and q* coincide with A_f, r_f, and q_f (resp. A_g, r_g, and q_g) on S_f (resp. S_g), and in which at the state s* the decision maker chooses whether to follow M_f or M_g; the payoff and transitions at that state are the expectation of the payoff and transitions at the first stage in M_f (resp. M_g) according to µ_f (resp. µ_g):

A*(s*) := (Π_{s∈S_f} A_f(s)) × (Π_{s∈S_g} A_g(s)) × {f, g},

and for every a_f ∈ Π_{s∈S_f} A_f(s) and every a_g ∈ Π_{s∈S_g} A_g(s),

r*(s*, (a_f, a_g, x)) := Σ_{s∈S_f} µ_f(s) r_f(s, a_f(s)) if x = f, and Σ_{s∈S_g} µ_g(s) r_g(s, a_g(s)) if x = g;
q*(· | s*, (a_f, a_g, x)) := Σ_{s∈S_f} µ_f(s) q_f(· | s, a_f(s)) if x = f, and Σ_{s∈S_g} µ_g(s) q_g(· | s, a_g(s)) if x = g.

The reader can verify that the value function of M* at the initial state s* is max{f, g}.

3 Degenerate Markov Decision Processes

As mentioned earlier, for every pure stationary strategy σ and every initial state s, the function λ ↦ γ_λ(s, σ) is a rational function of λ. Since there is a λ-discounted pure stationary optimal strategy, and since there are finitely many such functions, it follows that the function λ ↦ v_λ is the maximum of finitely many rational functions, each of which is the payoff function of some pure stationary strategy. When the decision maker follows a pure stationary strategy, we are reduced to a Markov decision process in which there is a single action in each state. This observation leads us to the following definition.

A Markov decision process is degenerate if |A(s)| = 1 for every s ∈ S. When M is a degenerate Markov decision process we omit the reference to the action in the functions r and q. A degenerate Markov decision process is thus a quadruple (S, µ, r, q), where S is the state space, µ is a probability distribution over S, r : S → R, and q(· | s) is a probability distribution over S for every state s ∈ S. Denote by V_D the set of all functions that are payoff functions of some degenerate Markov decision process. Plainly, V_D ⊆ V. To prove Theorem 1, we first prove that degenerate Markov decision processes implement all functions in F.

Theorem 2 F ⊆ V_D.

Theorem 2 and Lemma 1 show that MaxF ⊆ V, and together with Proposition 1 they establish Theorem 1. The rest of the paper is dedicated to the proof of Theorem 2.
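Because a degenerate Markov decision process involves no choices, its value function can be computed directly from the linear system used in the proof of Proposition 1. The following SymPy sketch (a hypothetical helper, not part of the paper) returns the value function of a degenerate MDP (S, µ, r, q) as a rational function of λ; the first example, a single state with payoff 1 that loops to itself, gives 1/(1 − λ), a function that reappears in the proof of Lemma 3 below.

```python
import sympy as sp

lam = sp.symbols('lambda')

def degenerate_value(mu, r, Q):
    # Value function of a degenerate MDP (S, mu, r, q), payoffs weighted by lambda**(n-1):
    # it equals mu . (I - lambda*Q)^(-1) r, a rational function of lambda.
    n = len(r)
    gamma = (sp.eye(n) - lam * sp.Matrix(Q)).inv() * sp.Matrix(r)
    return sp.simplify((sp.Matrix([mu]) * gamma)[0])

# A single state with payoff 1 that loops to itself: value 1/(1 - lambda).
print(degenerate_value([1], [1], [[1]]))

# A hypothetical two-state cycle with payoffs (1, 0): value 1/(1 - lambda**2).
print(degenerate_value([1, 0], [1, 0], [[0, 1], [1, 0]]))
```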

3.1 Characterizing the set V_D

The following lemma lists several properties of the functions implementable by degenerate Markov decision processes.

Lemma 2 For every f ∈ V_D we have

a) af(λ) ∈ V_D for every a ∈ R.
b) f(−λ) ∈ V_D.
c) λf(λ) ∈ V_D.
d) f(cλ) ∈ V_D for every c ∈ [0, 1].
e) f(λ) + g(λ) ∈ V_D for every g ∈ V_D.
f) f(λ^n) ∈ V_D for every n ∈ N.

Proof. Let M_f = (S_f, µ_f, r_f, q_f) be a degenerate Markov decision process whose value function is f.

To prove Part (a), we multiply all payoffs in M_f by a. Formally, define a degenerate Markov decision process M* = (S_f, µ_f, r*, q_f) that differs from M_f only in its payoff function: r*(s) := a r_f(s) for every s ∈ S_f. The reader can verify that the value function of M* is af(λ).

To prove Part (b), multiply the payoff in even stages by −1. Formally, let Ŝ be a copy of S_f; for every state s ∈ S_f we denote by ŝ its copy in Ŝ. Define a degenerate Markov decision process M* = (S_f ∪ Ŝ, µ_f, r*, q*) with initial distribution µ_f (whose support is S_f) that visits states in Ŝ in even stages and states in S_f in odd stages, as follows:

r*(s) := r_f(s), r*(ŝ) := −r_f(s), ∀ s ∈ S_f,
q*(ŝ' | s) = q*(s' | ŝ) := q_f(s' | s), ∀ s, s' ∈ S_f,
q*(s' | s) = q*(ŝ' | ŝ) := 0, ∀ s, s' ∈ S_f.

The reader can verify that the value function of M* is f(−λ).

To prove Part (c), add a state with payoff 0 from which the transition probability to a state in S_f coincides with µ_f. Formally, define a degenerate Markov decision process M* = (S_f ∪ {s*}, µ*, r*, q*) in which µ* assigns probability 1 to s*, r* coincides with r_f on S_f while r*(s*) := 0, and q* coincides with q_f on S_f while at the state s*, q*(s' | s*) := µ_f(s'). The value function of M* is λf(λ).

To prove Part (d), consider the transition function that at every stage moves to an absorbing state with payoff 0 with probability 1 − c, and with probability c continues as in M_f. (A state s* ∈ S is absorbing if q(s* | s*, a) = 1 for every action a ∈ A(s*).) Formally, define a degenerate Markov decision process M* = (S_f ∪ {s*}, µ_f, r*, q*) in which r* coincides with r_f on S_f, r*(s*) := 0, q*(s* | s*) := 1 (that is, s* is an absorbing state), and

q*(s* | s) := 1 − c, q*(s' | s) := c q_f(s' | s), ∀ s, s' ∈ S_f.

The value function of M* is f(cλ).

To prove Part (e), we show that (f + g)/2 is in V_D and then use Part (a) with a = 2. The function (f + g)/2 is the value function of the degenerate Markov decision process in which the prior chooses, with probability 1/2 each, one of two degenerate Markov decision processes that implement f and g. Formally, let M_g = (S_g, µ_g, r_g, q_g) be a degenerate Markov decision process whose value function is g. Let M* = (S_f ∪ S_g, µ*, r*, q*) be the degenerate Markov decision process whose state space consists of disjoint copies of S_f and S_g, the functions r* (resp. q*) coincide with r_f and r_g (resp. q_f and q_g) on S_f (resp. S_g), and the initial distribution is µ* = (µ_f + µ_g)/2. The value function of M* is (f + g)/2.

To prove Part (f), we space out the Markov decision process in a way that stage k of the Markov decision process that implements f becomes stage 1 + (k − 1)n, and the payoff in all other stages is 0. Formally, let M* = (S_f × {1, 2, ..., n}, µ*, r*, q*) be a degenerate Markov decision process where µ* assigns probability µ_f(s) to the state (s, 1), and

q*((s, k + 1) | (s, k)) := 1, ∀ k ∈ {1, ..., n − 1}, s ∈ S_f,
q*((s', 1) | (s, n)) := q_f(s' | s), ∀ s, s' ∈ S_f,
r*((s, 1)) := r_f(s), ∀ s ∈ S_f,
r*((s, k)) := 0, ∀ k ∈ {2, 3, ..., n}, s ∈ S_f.

The value function of M* with the prior µ* is f(λ^n).
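The constructions in Parts (c) and (d) can be verified mechanically. The sketch below (SymPy, with an invented two-state M_f; the degenerate_value helper is restated so the block runs on its own) builds the modified processes exactly as described above and checks symbolically that their values are λf(λ) and f(cλ).

```python
import sympy as sp

lam, c = sp.symbols('lambda c')

def degenerate_value(mu, r, Q):
    # value function of a degenerate MDP, payoffs weighted by lambda**(n-1)
    n = len(r)
    gamma = (sp.eye(n) - lam * sp.Matrix(Q)).inv() * sp.Matrix(r)
    return sp.simplify((sp.Matrix([mu]) * gamma)[0])

# A hypothetical two-state M_f (numbers invented for illustration).
mu_f = [1, 0]
r_f = [2, -1]
Q_f = [[sp.Rational(1, 3), sp.Rational(2, 3)], [1, 0]]
f = degenerate_value(mu_f, r_f, Q_f)

# Part (c): a new initial state with payoff 0 that moves into S_f according to mu_f.
part_c = degenerate_value([1, 0, 0], [0] + r_f,
                          [[0] + mu_f, [0] + Q_f[0], [0] + Q_f[1]])
print(sp.simplify(part_c - lam * f))                  # prints 0

# Part (d): at every stage move to an absorbing 0-payoff state with probability 1 - c.
part_d = degenerate_value(mu_f + [0], r_f + [0],
                          [[c * Q_f[0][0], c * Q_f[0][1], 1 - c],
                           [c * Q_f[1][0], c * Q_f[1][1], 1 - c],
                           [0, 0, 1]])
print(sp.simplify(part_d - f.subs(lam, c * lam)))     # prints 0
```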

Lemma 3
a) Every polynomial P is in V_D, and if f ∈ V_D then Pf is also in V_D.
b) Let P and Q be two polynomials. If 1/Q ∈ V_D then P/Q ∈ V_D. In particular, if Q' divides Q and 1/Q ∈ V_D, then 1/Q' ∈ V_D.
c) If Q is a polynomial whose roots are all unit roots of multiplicity 1, then 1/Q ∈ V_D.

(Throughout the paper, whenever we refer to a polynomial we mean a polynomial with real coefficients.)

Proof. Part (a) follows from Lemma 2(a,c,e) and the observation that any constant function a is in V_D, which holds since the constant function a is the value function of the degenerate Markov decision process that starts with a state whose payoff is a and continues to an absorbing state whose payoff is 0. Part (b) follows from Part (a).

We turn to prove Part (c). The degenerate Markov decision process with a single state in which the payoff is 1 yields payoff 1/(1 − λ), and therefore 1/(1 − λ) ∈ V_D. By Lemma 2(f) it follows that 1/(1 − λ^n) ∈ V_D for every n ∈ N. Let n be large enough such that Q divides 1 − λ^n. The result now follows by Part (b).

To complete the proof of Theorem 2 we characterize the polynomials Q that satisfy 1/Q ∈ V_D. To this end we need the following property of V_D.

Lemma 4 If f, g ∈ V_D then f(λ)g(cλ) ∈ V_D for every c ∈ (0, 1).

Proof. The proof of Lemma 4 is the most intricate part of the proof of Theorem 2. We start with an example that will help us illustrate the formal definition of the degenerate MDP that implements f(λ)g(cλ).

Let M_f and M_g be the degenerate Markov decision processes that are depicted in Figure 1, with the initial distributions µ_f(s_{1,f}) = 1 and µ_g(s_{1,g}) = 1, in which the payoff at each state appears in a square next to the state. Denote by f and g the value functions of M_f and M_g, respectively.

[Figure 1: An example of two MDPs, M_f and M_g.]

Consider the degenerate Markov decision process M* depicted in Figure 2, where c ∈ (0, 1) and the initial state is s_{1,g}.

[Figure 2: The degenerate MDP M*.]

The MDP M* is composed of one copy of M_g, and for every state in S_g it contains one copy of M_f. It starts at s_{1,g}, the initial state of M_g. Then, at every stage, with probability c it continues as in M_g, and with probability 1 − c it moves to a copy of M_f. In case a transition to a copy of M_f occurs, the new state is chosen according to the transitions q_f(· | s_{1,f}). This induces a distribution similar to that of the second stage of M_f. The payoff in each of the copies of M_f is the product of the payoff in M_f times the payoff of the state in M_g that has been assigned to that copy. The payoff in each state s_g ∈ S_g is (1 − c) r_g(s_g) times the expected payoff in the first stage of M_f.

Thus, each state s_g ∈ S_g serves three purposes (see Figure 2). First, it is a regular state in the copy of M_g. Second, once a transition from M_g to a copy of M_f occurs (at each stage this occurs with probability 1 − c), it serves as the first stage in M_f. Finally, once a transition from s_g to a copy of M_f occurs, the payoffs in the copy are set to the product of the original payoffs in M_f times r_g(s_g).

We now turn to the formal construction of M*. Let f, g ∈ V_D and let M_f = (S_f, µ_f, r_f, q_f) (resp. M_g = (S_g, µ_g, r_g, q_g)) be the degenerate Markov decision process that implements f (resp. g). Define the following degenerate Markov decision process M* = (S, µ, r, q):

The set of states is S = (S_g × S_f) ∪ S_g. In words, the set of states contains a copy of S_g, and for each state s_g ∈ S_g it contains a copy of S_f.

The initial distribution is µ_g:

µ(s_g) = µ_g(s_g), ∀ s_g ∈ S_g,
µ(s_g, s_f) = 0, ∀ (s_g, s_f) ∈ S_g × S_f.

The transition is as follows. In each copy of S_f, the transition is the same as the transition in M_f:

q((s_g, s'_f) | (s_g, s_f)) := q_f(s'_f | s_f), ∀ s_g ∈ S_g, s_f, s'_f ∈ S_f,
q((s'_g, s'_f) | (s_g, s_f)) := 0, ∀ s'_g ≠ s_g ∈ S_g, s_f, s'_f ∈ S_f.

In the copy of S_g, with probability c the transition is as in M_g, and with probability 1 − c it is as in M_f starting with the initial distribution µ_f:

q(s'_g | s_g) := c q_g(s'_g | s_g), ∀ s_g, s'_g ∈ S_g,
q((s_g, s_f) | s_g) := (1 − c) Σ_{s'_f∈S_f} µ_f(s'_f) q_f(s_f | s'_f), ∀ s_g ∈ S_g, s_f ∈ S_f,
q((s'_g, s_f) | s_g) := 0, ∀ s'_g ≠ s_g ∈ S_g, s_f ∈ S_f.

The payoff function is as follows:

r(s_g) := (1 − c) r_g(s_g) Σ_{s_f∈S_f} µ_f(s_f) r_f(s_f), ∀ s_g ∈ S_g,
r(s_g, s_f) := r_g(s_g) r_f(s_f), ∀ s_g ∈ S_g, s_f ∈ S_f.

We will now calculate the value function of M*. Denote by E_{µ_f}[r_f] := Σ_{s_f∈S_f} µ_f(s_f) r_f(s_f) the expected payoff in M_f at the first stage, and by R the expected payoff in M_f from the second stage on, so that f(λ) = E_{µ_f}[r_f] + λR.

At every stage, with probability c the process remains in S_g and with probability 1 − c the process leaves it. In particular, the probability that at stage n the process is still in S_g is c^{n−1}, in which case (a) the payoff at that stage is (1 − c) r_g(s_n) E_{µ_f}[r_f], and (b) with probability 1 − c the process moves to a copy of M_f, and the total discounted payoff from stage n + 1 on is R. It follows that the total discounted payoff is

Σ_{n=1}^∞ c^{n−1} λ^{n−1} ( (1 − c) r_g(s_n) E_{µ_f}[r_f] + (1 − c) λ r_g(s_n) R )
= (1 − c) Σ_{n=1}^∞ c^{n−1} λ^{n−1} r_g(s_n) f(λ)
= (1 − c) g(cλ) f(λ).

The result follows by Lemma 2(a).
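The construction can also be checked numerically. The following NumPy sketch uses two invented two-state degenerate MDPs (not the Figure 1 example), builds M* exactly as defined above, and verifies that its value equals (1 − c) f(λ) g(cλ) at a few discount factors; Lemma 2(a) then removes the factor 1 − c.

```python
import numpy as np

def value(mu, r, Q, lam):
    # lambda-discounted value of a degenerate MDP, payoffs weighted by lam**(n-1)
    return mu @ np.linalg.solve(np.eye(len(r)) - lam * Q, r)

# Two hypothetical degenerate MDPs (numbers invented for illustration).
r_f = np.array([1.0, 2.0]);  Q_f = np.array([[0.3, 0.7], [0.6, 0.4]]);  mu_f = np.array([1.0, 0.0])
r_g = np.array([3.0, -1.0]); Q_g = np.array([[0.5, 0.5], [0.2, 0.8]]); mu_g = np.array([1.0, 0.0])
c, nf, ng = 0.7, len(r_f), len(r_g)

# Composite MDP of Lemma 4: the states of S_g first, then one copy of S_f per state of S_g.
n = ng + ng * nf
idx = lambda sg, sf: ng + sg * nf + sf           # index of the copy-state (sg, sf)
Q, r, mu = np.zeros((n, n)), np.zeros(n), np.zeros(n)
mu[:ng] = mu_g
for sg in range(ng):
    r[sg] = (1 - c) * r_g[sg] * (mu_f @ r_f)                 # payoff in the S_g part
    Q[sg, :ng] = c * Q_g[sg]                                 # stay in S_g with probability c
    for sf in range(nf):
        Q[sg, idx(sg, sf)] = (1 - c) * (mu_f @ Q_f[:, sf])   # jump into the copy of M_f
        r[idx(sg, sf)] = r_g[sg] * r_f[sf]                   # rescaled payoffs inside the copy
        Q[idx(sg, sf), idx(sg, 0):idx(sg, 0) + nf] = Q_f[sf]  # M_f dynamics inside the copy

for lam in (0.0, 0.4, 0.9):
    lhs = value(mu, r, Q, lam)
    rhs = (1 - c) * value(mu_f, r_f, Q_f, lam) * value(mu_g, r_g, Q_g, c * lam)
    print(lam, lhs, rhs)    # the two numbers agree for every lambda
```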

Lemma 5 Let ω ∈ C be a complex number that lies outside the unit ball. (For every complex number ω, the conjugate of ω is denoted ω̄.)

a) If ω ∈ C \ R then 1/((ω − λ)(ω̄ − λ)) ∈ V_D.
b) If ω ∈ R then 1/(ω − λ) ∈ V_D.

Proof. We start by proving Part (a). For every complex number ω ∈ C \ R that lies outside the unit ball there are three natural numbers k < l < m and three nonnegative reals α_1, α_2, α_3 that sum up to 1 such that

1 = α_1 ω^k + α_2 ω^l + α_3 ω^m.

Consider the degenerate Markov decision process that is depicted in Figure 3. That is, the set of states is S := {s_1, s_2, ..., s_m}, the payoff function is

r(s_m) := 1, r(s_j) := 0, ∀ j < m,

and the transition function is

q(s_{m−k+1} | s_m) := α_1, q(s_{m−l+1} | s_m) := α_2, q(s_1 | s_m) := α_3, q(s_{j+1} | s_j) := 1, ∀ j < m.

[Figure 3: The degenerate MDP in the proof of Lemma 5.]

The discounted value satisfies

v_λ(s_j) = λ v_λ(s_{j+1}), ∀ 1 ≤ j < m,
v_λ(s_m) = 1 + λ ( α_1 v_λ(s_{m−k+1}) + α_2 v_λ(s_{m−l+1}) + α_3 v_λ(s_1) ).

It follows that

v_λ(s_m) = 1 / (1 − α_1 λ^k − α_2 λ^l − α_3 λ^m).

Hence this function is in V_D. Since ω is one of the roots of the denominator (and, because the denominator has real coefficients, so is ω̄), by Lemma 3(b) we obtain that 1/((ω − λ)(ω̄ − λ)) ∈ V_D, as desired.

We turn to prove Part (b). Let ω be a real number with ω > 1. By Lemma 3(c), 1/(1 − λ) is in V_D. By Lemma 2(d), 1/(1 − λ/ω) ∈ V_D, and by Lemma 2(a), 1/(ω − λ) ∈ V_D. When ω < −1, by Lemma 2(a,b), 1/(ω − λ) ∈ V_D.
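As a sanity check on the closed form derived in Part (a), the following SymPy sketch builds the chain of Figure 3 for one arbitrary choice of k < l < m and of nonnegative α_1, α_2, α_3 summing to 1 (these numbers are invented for illustration and are not derived from a particular ω), and confirms that the value at s_m equals 1/(1 − α_1 λ^k − α_2 λ^l − α_3 λ^m).

```python
import sympy as sp

lam = sp.symbols('lambda')

# Invented data: k < l < m and nonnegative alphas summing to 1 (not derived from a specific omega).
k, l, m = 1, 2, 4
a1, a2, a3 = sp.Rational(1, 2), sp.Rational(1, 3), sp.Rational(1, 6)

# The chain of Figure 3: s_1 -> s_2 -> ... -> s_m, and s_m branches back with probabilities a1, a2, a3.
Q = sp.zeros(m, m)
for j in range(m - 1):
    Q[j, j + 1] = 1                  # q(s_{j+1} | s_j) = 1
Q[m - 1, m - k] += a1                # q(s_{m-k+1} | s_m) = a1
Q[m - 1, m - l] += a2                # q(s_{m-l+1} | s_m) = a2
Q[m - 1, 0] += a3                    # q(s_1 | s_m) = a3
r = sp.Matrix([0] * (m - 1) + [1])   # payoff 1 at s_m, zero elsewhere

v = (sp.eye(m) - lam * Q).inv() * r
print(sp.simplify(v[m - 1] - 1 / (1 - a1 * lam**k - a2 * lam**l - a3 * lam**m)))  # prints 0
```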

4 The Proof of Theorem 2

Let Q ≠ 0 be a polynomial with real coefficients whose roots are either outside the unit ball or unit roots with multiplicity 1. To complete the proof of Theorem 2 we prove that 1/Q ∈ V_D.

Denote by Ω_1 the set of all roots of Q that are unit roots, by Ω_2 the set of all roots of Q that lie outside the unit ball and have a positive imaginary part, and by Ω_3 the set of all real roots of Q that lie outside the unit ball. If some roots have multiplicity larger than 1, then they appear several times in Ω_2 or Ω_3. For i = 1, 3 denote Q_i = Π_{ω∈Ω_i} (ω − λ), and set Q_2 = Π_{ω∈Ω_2} (ω − λ)(ω̄ − λ); when Ω_i = ∅ we set Q_i = 1. Then Q = Q_1 Q_2 Q_3 (up to a nonzero multiplicative constant, which is harmless by Lemma 2(a)).

If Ω_1 ≠ ∅, then by Lemma 3(c) we have 1/Q_1 ∈ V_D. Otherwise Q_1 = 1, in which case 1/Q_1 ∈ V_D by Lemma 3(a).

Fix ω ∈ Ω_2 and let c ∈ R be such that 1 < c < |ω|. Since ω/c lies outside the unit ball, Lemma 5(a) implies that g_ω(λ) := 1/((ω/c − λ)(ω̄/c − λ)) is in V_D. By Lemma 4 (applied with the constant 1/c ∈ (0, 1)),

g_ω(λ/c) · (1/Q_1) = c² / ((ω − λ)(ω̄ − λ) Q_1) ∈ V_D.

By Lemma 2(a), 1/((ω − λ)(ω̄ − λ) Q_1) ∈ V_D. Applying this argument successively for the remaining roots in Ω_2, one obtains that 1/(Q_1 Q_2) ∈ V_D.

To complete the proof we apply a similar idea to ω ∈ Ω_3. Fix ω ∈ Ω_3 and let c ∈ R be such that 1 < c < |ω|. By Lemma 5(b), 1/(ω/c − λ) ∈ V_D, and again by Lemma 4 and Lemma 2(a), 1/((ω − λ) Q_1 Q_2) ∈ V_D. By iterating this argument for every ω ∈ Ω_3, one obtains that 1/(Q_1 Q_2 Q_3) ∈ V_D, as desired.

5 Final Remark

The set V contains all value functions of MDPs in which the state at the first stage is chosen according to a probability distribution µ. One can wonder whether the set of implementable value functions shrinks if one restricts attention to MDPs in which the state at the first stage is given; that is, µ assigns probability 1 to one state. The answer is negative: the value function of any MDP in which the initial state is chosen according to a probability distribution (prior) can be obtained as the value function of an MDP in which the initial state is deterministic. Indeed, let M be an MDP with a prior. One can construct M' by adding to M an initial state s* in which the payoff is the expected payoff at the first stage of M and the transitions are the expected transitions after the first stage of M. We applied a similar idea in the proof of Lemma 1.

References

[1] Blackwell D. (1965) Discounted Dynamic Programming. Annals of Mathematical Statistics, 36, 226-235.

[2] Higham N.J. and Lin L. (2011) On pth Roots of Stochastic Matrices. Linear Algebra and its Applications, 435, 448-463.
