The Value Functions of Markov Decision Processes
The Value Functions of Markov Decision Processes

arXiv:.0377v [math.PR] 7 Nov 2015

Ehud Lehrer, Eilon Solan, and Omri N. Solan

November 2015

Abstract

We provide a full characterization of the set of value functions of Markov decision processes.

1 Introduction

Markov decision processes are a standard tool for studying dynamic optimization problems. The discounted value of such a problem is the maximal total discounted amount that the decision maker can guarantee to himself. By Blackwell (1965), the function λ ↦ v_λ(s) that assigns the discounted value at the initial state s to each discount factor λ is the maximum of finitely many rational functions (with real coefficients). Standard arguments show that the roots of the polynomial in the denominator of these rational functions either lie outside the unit ball in the complex plane, or on the boundary of the unit ball, in which case they have

Lehrer acknowledges the support of the Israel Science Foundation, Grant #963/. Solan acknowledges the support of the Israel Science Foundation, Grant #33/3.
School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel, and INSEAD, Bd. de Constance, 77305 Fontainebleau Cedex, France. lehrer@post.tau.ac.il.
School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel. eilons@post.tau.ac.il.
School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel. omrisola@post.tau.ac.il.
multiplicity 1. Using the theory of eigenvalues of stochastic matrices one can show that the roots on the boundary of the unit ball must be unit roots. In this note we prove the converse result: every function λ ↦ v_λ that is the maximum of finitely many rational functions, such that each root of the polynomials in the denominators either lies outside the unit ball in the complex plane or is a unit root with multiplicity 1, is the value of some Markov decision process.

2 The Model and the Main Theorem

Definition 1 A Markov decision process is a tuple (S, µ, A, r, q) where

- S is a finite set of states.
- µ ∈ Δ(S) is a distribution according to which the initial state is chosen.
- A = (A(s))_{s∈S} is the family of sets of actions available at each state s ∈ S. Denote SA := {(s, a) : s ∈ S, a ∈ A(s)}.
- r : SA → R is a payoff function.
- q : SA → Δ(S) is a transition function.

The process starts at an initial state s_1 ∈ S, chosen according to µ. It then evolves in discrete time: at every stage n ∈ N the process is in a state s_n ∈ S, the decision maker chooses an action a_n ∈ A(s_n), and a new state s_{n+1} is chosen according to q(· | s_n, a_n). A finite history is a sequence h_n = (s_1, a_1, s_2, a_2, …, s_n) ∈ H := ∪_{k≥0} (SA)^k × S. A pure strategy is a function σ : H → ∪_{s∈S} A(s) such that σ(h_n) ∈ A(s_n) for every finite history h_n = (s_1, a_1, …, s_n), and a behavior strategy is a function σ : H → ∪_{s∈S} Δ(A(s)) such that σ(h_n) ∈ Δ(A(s_n)) for every such finite history. The set of behavior strategies is denoted B. In other words, σ assigns to every finite history a distribution over actions, which we call a mixed action. A strategy is stationary

¹ For every finite set X, the set of probability distributions over X is denoted Δ(X).
if for every finite history h_n = (s_1, a_1, …, s_n), the mixed action σ(h_n) is a function of s_n and is independent of (s_1, a_1, …, a_{n−1}). Every behavior strategy σ together with a prior distribution µ over the state space induces a probability distribution P_{µ,σ} over the space of infinite histories (SA)^∞ (which is endowed with the product σ-algebra). Expectation w.r.t. this probability distribution is denoted E_{µ,σ}. For every discount factor λ ∈ [0, 1), the λ-discounted payoff is

γ_λ(µ, σ) := E_{µ,σ} [ Σ_{n=1}^∞ λ^{n−1} r(s_n, a_n) ].

When µ is a probability measure that is concentrated on a single state s we denote the λ-discounted payoff also by γ_λ(s, σ). The λ-discounted value of the Markov decision process with the prior µ over the initial state is

v_λ(µ) := sup_{σ∈B} γ_λ(µ, σ).    (1)

A behavior strategy is λ-discounted optimal if it attains the maximum in (1). Denote by V the set of all functions λ ↦ v_λ(µ) that are the value function of some Markov decision process starting with some prior µ ∈ Δ(S). The goal of the present note is to characterize the set V.

Notation (i) Denote by F the set of all rational functions P/Q such that each root of Q is (a) outside the unit ball, or (b) a unit root² with multiplicity 1. (ii) Denote by MaxF the set of functions that are the maximum of a finite number of functions in F.

The next proposition states that any function in V is the maximum of a finite number of functions in F.

Proposition 1 V ⊆ MaxF.

Proof. By Blackwell (1965), for every λ ∈ [0, 1) there is a λ-discounted pure stationary optimal strategy. Since the number of pure stationary strategies is

² Recall that a complex number ω ∈ C is a unit root if there exists n ∈ N such that ω^n = 1.
finite, it is sufficient to show that the function λ ↦ γ_λ(µ, σ) is in F for every pure stationary strategy σ. For every pure stationary strategy σ, every prior µ, and every discount factor λ ∈ [0, 1), the vector (γ_λ(s, σ))_{s∈S} is the unique solution of a system of |S| linear equations³ in λ:

γ_λ(s, σ) = r(s, σ(s)) + λ Σ_{s'∈S} q(s' | s, σ(s)) γ_λ(s', σ),  s ∈ S,

where r(s, σ(s)) := Σ_{a∈A(s)} σ(a | s) r(s, a) and q(s' | s, σ(s)) := Σ_{a∈A(s)} σ(a | s) q(s' | s, a) are the multilinear extensions of r and q, respectively. It follows that

γ_λ(·, σ) = (I − λQ(·, σ(·)))^{−1} r(·, σ(·)),

where Q(·, σ(·)) = Q = (Q_{s,s'})_{s,s'∈S} is the transition matrix induced by σ, that is, Q_{s,s'} = q(s' | s, σ(s)). By Cramer's rule, the function λ ↦ (I − λQ)^{−1} is a rational function whose denominator is det(I − λQ). In particular, the roots of the denominator are the inverses of the eigenvalues of Q. Since the denominator is independent of s, it is also the denominator of γ_λ(µ, σ) = Σ_{s∈S} µ(s) γ_λ(s, σ). Denote the expected payoff at stage n by x_n := E_{µ,σ}[r(s_n, σ(h_n))], so that γ_λ(µ, σ) = Σ_{n=1}^∞ x_n λ^{n−1}. Since |x_n| ≤ max_{s∈S, a∈A(s)} |r(s, a)| for every n ∈ N, it follows that the denominator det(I − λQ(·, σ(·))) does not have roots in the interior of the unit ball and that all its roots that lie on the boundary of the unit ball have multiplicity 1. Moreover, by, e.g., Higham and Lin (2011) it follows that the roots that lie on the boundary of the unit ball must be unit roots.

The main result of this note is that the converse holds as well.

Theorem 1 A function w : [0, 1) → R is in V if and only if it is the maximum of a finite number of functions in F.

To avoid cumbersome notation we write f(λ) for the function λ ↦ f(λ). In particular, λf(λ) will denote the function λ ↦ λf(λ). We start with the following observation.

³ The result is valid for every stationary strategy, not necessarily pure. We will need it only for pure stationary strategies.
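As a numerical aside, the linear system derived in the proof above can be solved directly to evaluate γ_λ(·, σ); a minimal sketch, assuming NumPy is available (the transition matrix, payoffs, and discount factor below are invented for illustration):

```python
import numpy as np

def discounted_value(Q, r, lam):
    """Solve gamma = r + lam * Q @ gamma, i.e. gamma = (I - lam*Q)^{-1} r."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - lam * Q, r)

# A small 3-state chain under a fixed pure stationary strategy (made-up data).
Q = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
r = np.array([1.0, 0.0, 2.0])
lam = 0.9

v = discounted_value(Q, r, lam)

# Cross-check against the power series sum_{n>=1} lam^{n-1} Q^{n-1} r.
series = np.zeros(3)
term = r.copy()
for _ in range(2000):
    series += term
    term = lam * (Q @ term)
assert np.allclose(v, series, atol=1e-8)
print(v)
```

Varying `lam` traces out the rational function γ_λ(s, σ) whose denominator is det(I − λQ), as in the proof.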
Lemma 1 If f, g ∈ V then max{f, g} ∈ V.

Proof. Let M_f = (S_f, µ_f, A_f, r_f, q_f) and M_g = (S_g, µ_g, A_g, r_g, q_g) be Markov decision processes that implement f and g, respectively. To implement max{f, g}, define a Markov decision process that contains M_f, M_g, and an additional state s*, in which the decision maker chooses one of M_f and M_g. Formally, let M* = (S_f ∪ S_g ∪ {s*}, A*, r*, q*) be a Markov decision process in which A*, r*, and q* coincide with A_f, r_f, and q_f (resp. A_g, r_g, and q_g) on S_f (resp. S_g), and in which in the state s* the decision maker chooses whether to follow M_f or M_g; the payoff and transitions at that state are the expectation of the payoff and transitions at the first stage in M_f (resp. M_g) according to µ_f (resp. µ_g):

A*(s*) := (Π_{s∈S_f} A_f(s)) × (Π_{s∈S_g} A_g(s)) × {f, g},

and for every a_f ∈ Π_{s∈S_f} A_f(s) and every a_g ∈ Π_{s∈S_g} A_g(s),

r*(s*, (a_f, a_g, x)) := Σ_{s∈S_f} µ_f(s) r_f(s, a_f(s)) if x = f, and Σ_{s∈S_g} µ_g(s) r_g(s, a_g(s)) if x = g;
q*(· | s*, (a_f, a_g, x)) := Σ_{s∈S_f} µ_f(s) q_f(· | (s, a_f(s))) if x = f, and Σ_{s∈S_g} µ_g(s) q_g(· | (s, a_g(s))) if x = g.

The reader can verify that the value function of M* at the initial state s* is max{f, g}.

3 Degenerate Markov Decision Processes

As mentioned earlier, for every pure stationary strategy σ and every initial state s, the function λ ↦ γ_λ(s, σ) is a rational function of λ. Since there is a λ-discounted pure stationary optimal strategy, and since there are finitely many such functions, it follows that the function λ ↦ v_λ is the maximum of finitely many rational functions, each of which is the payoff function of some pure stationary strategy. When the decision maker follows a pure stationary strategy, we are reduced to a Markov decision process in which there is a single action in each state. This observation leads us to the following definition.
A Markov decision process is degenerate if |A(s)| = 1 for every s ∈ S. When M is a degenerate Markov decision process we omit the reference to the action in the functions r and q. A degenerate Markov decision process is thus a quadruple (S, µ, r, q), where S is the state space, µ is a probability distribution over S, r : S → R, and q(· | s) is a probability distribution for every state s ∈ S. Denote by V_D the set of all functions that are payoff functions of some degenerate Markov decision process. Plainly, V_D ⊆ V. To prove Theorem 1, we first prove that degenerate Markov decision processes implement all functions in F.

Theorem 2 F ⊆ V_D.

Theorem 2 and Lemma 1 show that MaxF ⊆ V, and together with Proposition 1 they establish Theorem 1. The rest of the paper is dedicated to the proof of Theorem 2.

3.1 Characterizing the set V_D

The following lemma lists several properties of the functions implementable by degenerate Markov decision processes.

Lemma 2 For every f ∈ V_D we have

a) af(λ) ∈ V_D for every a ∈ R.
b) f(−λ) ∈ V_D.
c) λf(λ) ∈ V_D.
d) f(cλ) ∈ V_D for every c ∈ [0, 1].
e) f(λ) + g(λ) ∈ V_D for every g ∈ V_D.
f) f(λ^n) ∈ V_D for every n ∈ N.
Proof. Let M_f = (S_f, µ_f, r_f, q_f) be a degenerate Markov decision process whose value function is f.

To prove Part (a), we multiply all payoffs in M_f by a. Formally, define a degenerate Markov decision process M' = (S_f, µ_f, r', q_f) that differs from M_f only in its payoff function: r'(s) := a·r_f(s) for every s ∈ S_f. The reader can verify that the value function of M' is af(λ).

To prove Part (b), multiply the payoff in even stages by −1. Formally, let Ŝ be a copy of S_f; for every state s ∈ S_f we denote by ŝ its copy in Ŝ. Define a degenerate Markov decision process M' = (S_f ∪ Ŝ, µ_f, r', q') with initial distribution µ_f (whose support is S_f) that visits states in Ŝ in even stages and states in S_f in odd stages, as follows:

r'(s) := r_f(s), r'(ŝ) := −r_f(s), s ∈ S_f,
q'(ŝ' | s) = q'(s' | ŝ) := q_f(s' | s), s, s' ∈ S_f,
q'(s' | s) = q'(ŝ' | ŝ) := 0, s, s' ∈ S_f.

The reader can verify that the value function of M' is f(−λ).

To prove Part (c), add a state with payoff 0 from which the transition probability to a state in S_f coincides with µ_f. Formally, define a degenerate Markov decision process M' = (S_f ∪ {s*}, µ', r', q') in which µ' assigns probability 1 to s*, r' coincides with r_f on S_f while r'(s*) := 0, and q' coincides with q_f on S_f while at the state s*, q'(s' | s*) := µ_f(s'). The value function of M' is λf(λ).

To prove Part (d), consider the transition function that, at every stage, moves to an absorbing state⁴ with payoff 0 with probability 1 − c, and with probability c continues as in M_f. Formally, define a degenerate Markov decision process M' = (S_f ∪ {s*}, µ', r', q') in which µ' coincides with µ_f, r' and q' coincide with r_f and q_f on S_f, r'(s*) := 0, and q'(s* | s*) := 1 (that is, s* is an absorbing state), and

q'(s* | s) := 1 − c, q'(s' | s) := c·q_f(s' | s), s, s' ∈ S_f.

The value function of M' is f(cλ).

⁴ A state s ∈ S is absorbing if q(s | s, a) = 1 for every action a ∈ A(s).
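The constructions for Parts (c) and (d) lend themselves to a quick numerical check; the following sketch, assuming NumPy is available (the random four-state chain is invented for illustration), builds both modified processes and compares their values with λf(λ) and f(cλ):

```python
import numpy as np

rng = np.random.default_rng(0)

def value(mu, Q, r, lam):
    """Discounted value of a degenerate MDP: mu^T (I - lam*Q)^{-1} r."""
    return mu @ np.linalg.solve(np.eye(len(r)) - lam * Q, r)

# A random 4-state degenerate MDP implementing some f (made-up instance).
n = 4
Qf = rng.random((n, n)); Qf /= Qf.sum(axis=1, keepdims=True)
rf = rng.standard_normal(n)
muf = rng.random(n); muf /= muf.sum()
lam, c = 0.7, 0.6

f = lambda x: value(muf, Qf, rf, x)

# Part (c): prepend a payoff-0 state s* that moves according to muf.
Qc = np.zeros((n + 1, n + 1))
Qc[0, 1:] = muf
Qc[1:, 1:] = Qf
rc = np.concatenate(([0.0], rf))
muc = np.eye(n + 1)[0]          # start at s* with probability 1
assert np.isclose(value(muc, Qc, rc, lam), lam * f(lam))

# Part (d): at every stage move to an absorbing payoff-0 state s*
# with probability 1 - c, and continue as in M_f with probability c.
Qd = np.zeros((n + 1, n + 1))
Qd[0, 0] = 1.0                  # s* is absorbing
Qd[1:, 0] = 1.0 - c
Qd[1:, 1:] = c * Qf
rd = np.concatenate(([0.0], rf))
mud = np.concatenate(([0.0], muf))
assert np.isclose(value(mud, Qd, rd, lam), f(c * lam))
print("parts (c) and (d) verified")
```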
To prove Part (e), we show that (f + g)/2 is in V_D and then use Part (a) with a = 2. The function (f + g)/2 is the value function of the degenerate Markov decision process in which the prior chooses, with probability 1/2 each, one of two degenerate Markov decision processes that implement f and g. Formally, let M_g = (S_g, µ_g, r_g, q_g) be a degenerate Markov decision process whose value function is g. Let M' = (S_f ∪ S_g, µ', r', q') be the degenerate Markov decision process whose state space consists of disjoint copies of S_f and S_g, the function r' (resp. q') coincides with r_f and r_g (resp. q_f and q_g) on S_f (resp. S_g), and the initial distribution is µ' = ½µ_f + ½µ_g. The value function of M' is ½(f + g).

To prove Part (f), we space out the Markov decision process in a way that stage k of the Markov decision process that implements f becomes stage 1 + (k − 1)n, and the payoff in all other stages is 0. Formally, let M' = (S_f × {1, 2, …, n}, µ', r', q') be a degenerate Markov decision process where µ'(s, 1) = µ_f(s) and

q'((s, k + 1) | (s, k)) := 1, k ∈ {1, 2, …, n − 1}, s ∈ S_f,
q'((s', 1) | (s, n)) := q_f(s' | s), s, s' ∈ S_f,
r'((s, 1)) := r_f(s), s ∈ S_f,
r'((s, k)) := 0, k ∈ {2, 3, …, n}, s ∈ S_f.

The value function of M' with the prior µ' is f(λ^n).

Lemma 3
a) Every polynomial P is in V_D, and if f ∈ V_D then Pf is also in V_D.
b) Let P and Q be two polynomials. If 1/Q ∈ V_D then P/Q ∈ V_D. In particular, if Q' divides Q and 1/Q ∈ V_D, then 1/Q' ∈ V_D.
c) If Q is a polynomial whose roots are all unit roots of multiplicity 1, then 1/Q ∈ V_D.

Throughout the paper, whenever we refer to a polynomial we mean a polynomial with real coefficients.
Proof. Part (a) follows from Lemma 2(a,c,e) and the observation that any constant function a is in V_D, which holds since the constant function a is the value function of the degenerate Markov decision process that starts with a state whose payoff is a and continues to an absorbing state whose payoff is 0. Part (b) follows from Part (a). We turn to prove Part (c). The degenerate Markov decision process with a single state in which the payoff is 1 yields payoff 1/(1 − λ), and therefore 1/(1 − λ) ∈ V_D. By Lemma 2(f) it follows that 1/(1 − λ^n) ∈ V_D for every n ∈ N. Let n be large enough such that Q divides 1 − λ^n. The result now follows by Part (b).

To complete the proof of Theorem 2 we characterize the polynomials Q that satisfy 1/Q ∈ V_D. To this end we need the following property of V_D.

Lemma 4 If f, g ∈ V_D then f(λ)g(λc) ∈ V_D for every c ∈ (0, 1).

Proof. The proof of Lemma 4 is the most intricate part of the proof of Theorem 2. We start with an example that will help us illustrate the formal definition of the degenerate MDP that implements f(λ)g(λc). Let M_f and M_g be the degenerate Markov decision processes that are depicted in Figure 1, with the initial distributions µ_f(s_{1,f}) = 1 and µ_g(s_{1,g}) = 1, in which the payoff at each state appears in a square next to the state. Denote by f and g the value functions of M_f and M_g, respectively.

[Figure 1: An example of two MDPs, M_f (states s_{1,f}, s_{2,f}, s_{3,f}) and M_g (states s_{1,g}, s_{2,g}, s_{3,g}); the payoffs next to the states are not fully recoverable from the transcription.]

Consider the degenerate Markov decision process M depicted in Figure 2, where c ∈ (0, 1) and the initial state is s_{1,g}.
[Figure 2: The degenerate MDP M, composed of one copy of M_g (states s_{1,g}, s_{2,g}, s_{3,g}) and, for every state of M_g, one copy of M_f; the transition probabilities and payoffs shown in the transcription are not fully recoverable.]

The MDP M is composed of one copy of M_g, and for every state in S_g it contains one copy of M_f. It starts at s_{1,g}, the initial state of M_g. Then, at every stage, with probability c it continues as in M_g, and with probability 1 − c it moves to a copy of M_f. In case a transition to a copy of M_f occurs, the new state is chosen according to the transitions q_f(· | s_{1,f}). This induces a distribution similar to that of the second stage of M_f. The payoff in each of the copies of M_f is the product of the payoff in M_f times the payoff of the state in M_g that has been assigned to that copy. The payoff in each state s_g ∈ S_g is (1 − c)r_g(s_g) times the expected payoff in the first stage of M_f. Thus, each state s_g ∈ S_g serves three purposes (see Figure 2). First, it is a regular state in the copy of M_g. Second, once a transition from M_g to a copy of M_f occurs (at each stage it occurs with probability 1 − c), it serves as the first stage in M_f. Finally, once a transition from s_g to a copy of M_f occurs, the payoffs in the copy are set to the product of the original payoffs in M_f times r_g(s_g).

We now turn to the formal construction of M. Let f, g ∈ V_D and let M_f = (S_f, µ_f, r_f, q_f) (resp. M_g = (S_g, µ_g, r_g, q_g)) be the degenerate Markov decision process that implements f (resp. g). Define the following degenerate Markov decision process M = (S, µ, r, q): The set of states is S = (S_g × S_f) ∪ S_g. In words, the set of states contains
a copy of S_g, and for each state s_g ∈ S_g it contains a copy of S_f.

The initial distribution is µ_g:

µ(s_g) = µ_g(s_g), s_g ∈ S_g,
µ(s_g, s_f) = 0, (s_g, s_f) ∈ S_g × S_f.

The transition is as follows. In each copy of S_f, the transition is the same as the transition in M_f:

q((s_g, s_f') | (s_g, s_f)) := q_f(s_f' | s_f), s_g ∈ S_g, s_f, s_f' ∈ S_f,
q((s_g', s_f') | (s_g, s_f)) := 0, s_g' ≠ s_g ∈ S_g, s_f, s_f' ∈ S_f.

In the copy of S_g, with probability c the transition is as in M_g, and with probability 1 − c it is as in M_f starting with the initial distribution µ_f:

q(s_g' | s_g) := c·q_g(s_g' | s_g), s_g, s_g' ∈ S_g,
q((s_g, s_f) | s_g) := (1 − c) Σ_{s_f'∈S_f} µ_f(s_f') q_f(s_f | s_f'), s_g ∈ S_g, s_f ∈ S_f,
q((s_g', s_f) | s_g) := 0, s_g' ≠ s_g ∈ S_g, s_f ∈ S_f.

The payoff function is as follows:

r(s_g) := (1 − c) r_g(s_g) Σ_{s_f∈S_f} µ_f(s_f) r_f(s_f), s_g ∈ S_g,
r(s_g, s_f) := r_g(s_g) r_f(s_f), s_g ∈ S_g, s_f ∈ S_f.

We will now calculate the value function of M. Denote by E_{µ_f}[r_f] := Σ_{s_f∈S_f} µ_f(s_f) r_f(s_f) the expected payoff in M_f at the first stage, and by R the expected discounted payoff in M_f from the second stage on. Then f(λ) = E_{µ_f}[r_f] + λR.
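The construction can be sanity-checked numerically at this point; a sketch, assuming NumPy is available (the small random instances of M_f and M_g are invented), builds M as defined above and compares its value with (1 − c)f(λ)g(cλ), the closed form derived in the computation that follows:

```python
import numpy as np

rng = np.random.default_rng(2)

def value(mu, Q, r, lam):
    """Discounted value of a degenerate MDP: mu^T (I - lam*Q)^{-1} r."""
    return mu @ np.linalg.solve(np.eye(len(r)) - lam * Q, r)

def rand_dmdp(n):
    Q = rng.random((n, n)); Q /= Q.sum(axis=1, keepdims=True)
    r = rng.standard_normal(n)
    mu = rng.random(n); mu /= mu.sum()
    return mu, Q, r

m, n = 3, 2                        # |S_g|, |S_f| (made-up sizes)
mug, Qg, rg = rand_dmdp(m)
muf, Qf, rf = rand_dmdp(n)
lam, c = 0.8, 0.5

N = m + m * n                      # the g-copy first, then one f-copy per g-state
idx = lambda i, j: m + i * n + j
Q = np.zeros((N, N)); r = np.zeros(N)
second = muf @ Qf                  # second-stage distribution of M_f
for i in range(m):
    Q[i, :m] = c * Qg[i]                           # stay in the g-copy w.p. c
    for j in range(n):
        Q[i, idx(i, j)] = (1 - c) * second[j]      # jump to copy i w.p. 1 - c
        Q[idx(i, j), idx(i, 0):idx(i, n)] = Qf[j]  # inside copy i: M_f dynamics
    r[i] = (1 - c) * rg[i] * (muf @ rf)
    r[idx(i, 0):idx(i, n)] = rg[i] * rf
mu = np.concatenate([mug, np.zeros(m * n)])

f = lambda x: value(muf, Qf, rf, x)
g = lambda x: value(mug, Qg, rg, x)
assert np.isclose(value(mu, Q, r, lam), (1 - c) * f(lam) * g(c * lam))
print("Lemma 4 construction verified")
```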
At every stage, with probability c the process remains in S_g and with probability 1 − c it leaves it. In particular, the probability that at stage n the process is still in S_g is c^{n−1}, in which case (a) the payoff is (1 − c) r_g(s_n) E_{µ_f}[r_f], and (b) with probability 1 − c the process moves to a copy of M_f, and the total discounted payoff from stage n + 1 on is r_g(s_n)R. It follows that the total discounted payoff is

Σ_{n=1}^∞ c^{n−1} λ^{n−1} ( (1 − c) E[r_g(s_n)] E_{µ_f}[r_f] + (1 − c) λ E[r_g(s_n)] R )
= (1 − c) Σ_{n=1}^∞ c^{n−1} λ^{n−1} E[r_g(s_n)] f(λ)
= (1 − c) g(cλ) f(λ).

The result follows by Lemma 2(a).

Lemma 5 Let ω ∈ C be a complex number that lies outside the unit ball.⁶
a) If ω ∈ C \ R then 1/((ω − λ)(ω̄ − λ)) ∈ V_D.
b) If ω ∈ R then 1/(ω − λ) ∈ V_D.

Proof. We start by proving Part (a). For every complex number ω ∈ C \ R that lies outside the unit ball there are three natural numbers k < l < m and three nonnegative reals α_1, α_2, α_3 that sum up to 1 such that

1 = α_1 ω^k + α_2 ω^l + α_3 ω^m.

Consider the degenerate Markov decision process that is depicted in Figure 3. That is, the set of states is S := {s_1, s_2, …, s_m}, the payoff function is

r(s_m) := 1, r(s_j) := 0, j < m,

and the transition function is

q(s_{m−k+1} | s_m) := α_1, q(s_{m−l+1} | s_m) := α_2, q(s_1 | s_m) := α_3,
q(s_{j+1} | s_j) := 1, j < m.

⁶ For every complex number ω, the conjugate of ω is denoted ω̄.
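The chain just defined is easy to instantiate numerically; a sketch, assuming NumPy is available (the instance ω = 2i with k, l, m = 1, 3, 4 and α = (3/4, 3/16, 1/16) is invented for illustration), checks that its value at s_m is 1/(1 − α_1λ^k − α_2λ^l − α_3λ^m):

```python
import numpy as np

# Invented instance for the proof of Lemma 5(a): omega = 2i lies outside
# the unit ball and is non-real.  With k, l, m = 1, 3, 4 and
# alpha = (3/4, 3/16, 1/16) one checks a1*w^k + a2*w^l + a3*w^m = 1.
omega = 2j
k, l, m = 1, 3, 4
a1, a2, a3 = 0.75, 0.1875, 0.0625
assert np.isclose(a1 + a2 + a3, 1.0)
assert np.isclose(a1 * omega**k + a2 * omega**l + a3 * omega**m, 1.0)

# The chain of Figure 3: s_1 -> ... -> s_m, and from s_m back to
# s_{m-k+1}, s_{m-l+1}, s_1 with probabilities a1, a2, a3.
Q = np.zeros((m, m))
for j in range(m - 1):
    Q[j, j + 1] = 1.0
Q[m - 1, m - k] += a1        # s_{m-k+1} has 0-based index m-k
Q[m - 1, m - l] += a2
Q[m - 1, 0] += a3
r = np.zeros(m); r[m - 1] = 1.0

lam = 0.9
v = np.linalg.solve(np.eye(m) - lam * Q, r)
assert np.isclose(v[m - 1], 1 / (1 - a1 * lam**k - a2 * lam**l - a3 * lam**m))
print("Lemma 5(a) chain verified")
```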
[Figure 3: The degenerate MDP in the proof of Lemma 5: a chain s_1 → s_2 → … → s_m, with transitions from s_m to s_{m−k+1}, s_{m−l+1}, and s_1 with probabilities α_1, α_2, and α_3, respectively.]

The discounted value satisfies

v_λ(s_j) = λ v_λ(s_{j+1}), 1 ≤ j < m,
v_λ(s_m) = 1 + λ ( α_1 v_λ(s_{m−k+1}) + α_2 v_λ(s_{m−l+1}) + α_3 v_λ(s_1) ).

It follows that

v_λ(s_m) = 1 / (1 − α_1 λ^k − α_2 λ^l − α_3 λ^m).

Hence this function is in V_D. Since ω is one of the roots of its denominator, by Lemma 3(b) we obtain that 1/((ω − λ)(ω̄ − λ)) ∈ V_D, as desired.

We turn to prove Part (b). Let ω be a real number with |ω| > 1. By Lemma 3(b), 1/(1 − λ) is in V_D. By Lemma 2(d), 1/(1 − λ/ω) ∈ V_D, and by Lemma 2(a), 1/(ω − λ) ∈ V_D. When ω < −1, by Lemma 2(a,b), 1/(ω − λ) ∈ V_D.

4 The Proof of Theorem 2

Let Q ≠ 0 be a polynomial with real coefficients whose roots are either outside the unit ball or unit roots with multiplicity 1. To complete the proof of Theorem 2 we prove that 1/Q ∈ V_D. Denote by Ω_1 the set of all roots of Q that are unit roots, by Ω_2 the set of all roots of Q that lie outside the unit ball and have a positive imaginary part, and by Ω_3 the set of all real roots of Q that lie outside the unit ball. If some roots have multiplicity larger than 1, then they appear several times in Ω_2 or Ω_3. For i = 1, 3 denote Q_i = Π_{ω∈Ω_i} (ω − λ), and set Q_2 = Π_{ω∈Ω_2} (ω − λ)(ω̄ − λ); when Ω_i = ∅ we set Q_i = 1. Then Q = Q_1 Q_2 Q_3. If Ω_1 ≠ ∅, then by Lemma 3(c) we have 1/Q_1 ∈ V_D. Otherwise Q_1 = 1, in which case 1/Q_1 ∈ V_D by Lemma 3(a).
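For concreteness, here is one invented instance of this decomposition (not taken from the paper). Take Q(λ) = (λ² − 1)(λ² − 2λ + 2)(2 − λ): its roots are ±1 (unit roots of multiplicity 1), 1 ± i (of modulus √2 > 1), and 2 (real, outside the unit ball), so Ω_1 = {1, −1}, Ω_2 = {1 + i}, Ω_3 = {2}, and

```latex
\begin{align*}
Q_1(\lambda) &= (1 - \lambda)(-1 - \lambda) = \lambda^2 - 1,\\
Q_2(\lambda) &= (1 + i - \lambda)(1 - i - \lambda) = \lambda^2 - 2\lambda + 2,\\
Q_3(\lambda) &= 2 - \lambda,
\end{align*}
```

with Q = Q_1 Q_2 Q_3, so the argument below yields 1/Q ∈ V_D for this Q.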
Fix ω ∈ Ω_2 and let c ∈ R be such that 1 < c < |ω|. Since ω/c lies outside the unit ball, Lemma 5(a) implies that g_ω(λ) := 1/((ω/c − λ)(ω̄/c − λ)) is in V_D. By Lemma 4,

g_ω(λ/c) · (1/Q_1) = c² / ((ω − λ)(ω̄ − λ) Q_1) ∈ V_D.

By Lemma 2(a), 1/((ω − λ)(ω̄ − λ) Q_1) ∈ V_D. Applying this argument successively for the remaining roots in Ω_2, one obtains that 1/(Q_1 Q_2) ∈ V_D.

To complete the proof we apply a similar idea to the roots in Ω_3. Fix ω ∈ Ω_3 and let c ∈ R be such that 1 < c < |ω|. By Lemma 5(b), 1/(ω/c − λ) ∈ V_D, and again by Lemma 4 and Lemma 2(a), 1/((ω − λ) Q_1 Q_2) ∈ V_D. By iterating this argument for every ω ∈ Ω_3 one obtains that 1/(Q_1 Q_2 Q_3) ∈ V_D, as desired.

5 Final Remark

The set V contains all value functions of MDPs in which the state at the first stage is chosen according to a probability distribution µ. One may wonder whether the set of implementable value functions shrinks if one restricts attention to MDPs in which the initial state is given; that is, µ assigns probability 1 to one state. The answer is negative: the value function of any MDP in which the initial state is chosen according to a probability distribution (prior) can be obtained as the value function of an MDP in which the initial state is deterministic. Indeed, let M be an MDP with a prior. One can construct M' by adding to M an initial state s* in which the payoff is the expected payoff at the first stage of M and the transitions are the expected transitions after the first stage of M. We applied a similar idea in the proof of Lemma 1.

References

[1] Blackwell D. (1965) Discounted Dynamic Programming. Annals of Mathematical Statistics, 36, 226–235.

[2] Higham N.J. and Lin L. (2011) On pth Roots of Stochastic Matrices. Linear Algebra and its Applications, 435, 448–463.
More informationTHE DYNAMICS OF PREFERENCES, PREDICTIVE PROBABILITIES, AND LEARNING. August 16, 2017
THE DYNAMICS OF PREFERENCES, PREDICTIVE PROBABILITIES, AND LEARNING EHUD LEHRER AND ROEE TEPER August 16, 2017 Abstract. We take a decision theoretic approach to predictive inference as in Blackwell [4,
More informationNORMS ON SPACE OF MATRICES
NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system
More informationUNIQUENESS OF ENTIRE OR MEROMORPHIC FUNCTIONS SHARING ONE VALUE OR A FUNCTION WITH FINITE WEIGHT
Volume 0 2009), Issue 3, Article 88, 4 pp. UNIQUENESS OF ENTIRE OR MEROMORPHIC FUNCTIONS SHARING ONE VALUE OR A FUNCTION WITH FINITE WEIGHT HONG-YAN XU AND TING-BIN CAO DEPARTMENT OF INFORMATICS AND ENGINEERING
More informationExchangeable Processes: de Finetti Theorem Revisited
Exchangeable Processes: de Finetti Theorem Revisited Ehud Lehrer and Dimitry Shaiderman January 16, 2018 Abstract: A sequence of random variables is exchangeable if the joint distribution of any finite
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationVISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.
VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)
More informationChapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS
Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process
More informationReal Time Value Iteration and the State-Action Value Function
MS&E338 Reinforcement Learning Lecture 3-4/9/18 Real Time Value Iteration and the State-Action Value Function Lecturer: Ben Van Roy Scribe: Apoorva Sharma and Tong Mu 1 Review Last time we left off discussing
More informationCSE 573. Markov Decision Processes: Heuristic Search & Real-Time Dynamic Programming. Slides adapted from Andrey Kolobov and Mausam
CSE 573 Markov Decision Processes: Heuristic Search & Real-Time Dynamic Programming Slides adapted from Andrey Kolobov and Mausam 1 Stochastic Shortest-Path MDPs: Motivation Assume the agent pays cost
More information(IV.C) UNITS IN QUADRATIC NUMBER RINGS
(IV.C) UNITS IN QUADRATIC NUMBER RINGS Let d Z be non-square, K = Q( d). (That is, d = e f with f squarefree, and K = Q( f ).) For α = a + b d K, N K (α) = α α = a b d Q. If d, take S := Z[ [ ] d] or Z
More informationi=1 β i,i.e. = β 1 x β x β 1 1 xβ d
66 2. Every family of seminorms on a vector space containing a norm induces ahausdorff locally convex topology. 3. Given an open subset Ω of R d with the euclidean topology, the space C(Ω) of real valued
More informationOn a Markov Game with Incomplete Information
On a Markov Game with Incomlete Information Johannes Hörner, Dinah Rosenberg y, Eilon Solan z and Nicolas Vieille x{ January 24, 26 Abstract We consider an examle of a Markov game with lack of information
More informationBELIEF CONSISTENCY AND TRADE CONSISTENCY. E. Lehrer D. Samet. Working Paper No 27/2012 December Research No
BELIEF CONSISTENCY AND TRADE CONSISTENCY by E. Lehrer D. Samet Working Paper No 27/2012 December 2012 Research No. 06510100 The School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel
More informationStochastic convexity in dynamic programming
Economic Theory 22, 447 455 (2003) Stochastic convexity in dynamic programming Alp E. Atakan Department of Economics, Columbia University New York, NY 10027, USA (e-mail: aea15@columbia.edu) Received:
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationMatrix functions that preserve the strong Perron- Frobenius property
Electronic Journal of Linear Algebra Volume 30 Volume 30 (2015) Article 18 2015 Matrix functions that preserve the strong Perron- Frobenius property Pietro Paparella University of Washington, pietrop@uw.edu
More informationStochastic Realization of Binary Exchangeable Processes
Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,
More informationALGEBRA 8: Linear algebra: characteristic polynomial
ALGEBRA 8: Linear algebra: characteristic polynomial Characteristic polynomial Definition 8.1. Consider a linear operator A End V over a vector space V. Consider a vector v V such that A(v) = λv. This
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationOnly Intervals Preserve the Invertibility of Arithmetic Operations
Only Intervals Preserve the Invertibility of Arithmetic Operations Olga Kosheleva 1 and Vladik Kreinovich 2 1 Department of Electrical and Computer Engineering 2 Department of Computer Science University
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationLecture Notes Math 371: Algebra (Fall 2006) by Nathanael Leedom Ackerman
Lecture Notes Math 371: Algebra (Fall 2006) by Nathanael Leedom Ackerman October 17, 2006 TALK SLOWLY AND WRITE NEATLY!! 1 0.1 Factorization 0.1.1 Factorization of Integers and Polynomials Now we are going
More informationValue Iteration. 1 Introduction. Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2
Value Iteration Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2 1 University of California, Berkeley 2 EPFL, Switzerland Abstract. We survey value iteration algorithms on graphs. Such algorithms can
More informationKernels of Directed Graph Laplacians. J. S. Caughman and J.J.P. Veerman
Kernels of Directed Graph Laplacians J. S. Caughman and J.J.P. Veerman Department of Mathematics and Statistics Portland State University PO Box 751, Portland, OR 97207. caughman@pdx.edu, veerman@pdx.edu
More informationProbabilistic Aspects of Computer Science: Probabilistic Automata
Probabilistic Aspects of Computer Science: Probabilistic Automata Serge Haddad LSV, ENS Paris-Saclay & CNRS & Inria M Jacques Herbrand Presentation 2 Properties of Stochastic Languages 3 Decidability Results
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME
ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME SAUL D. JACKA AND ALEKSANDAR MIJATOVIĆ Abstract. We develop a general approach to the Policy Improvement Algorithm (PIA) for stochastic control problems
More informationTHE DYNAMICS OF SUCCESSIVE DIFFERENCES OVER Z AND R
THE DYNAMICS OF SUCCESSIVE DIFFERENCES OVER Z AND R YIDA GAO, MATT REDMOND, ZACH STEWARD Abstract. The n-value game is a dynamical system defined by a method of iterated differences. In this paper, we
More informationMATH 8253 ALGEBRAIC GEOMETRY WEEK 12
MATH 8253 ALGEBRAIC GEOMETRY WEEK 2 CİHAN BAHRAN 3.2.. Let Y be a Noetherian scheme. Show that any Y -scheme X of finite type is Noetherian. Moreover, if Y is of finite dimension, then so is X. Write f
More informationarxiv: v2 [math.co] 25 Jun 2015
ON THE SET OF ELASTICITIES IN NUMERICAL MONOIDS THOMAS BARRON, CHRISTOPHER O NEILL, AND ROBERTO PELAYO arxiv:409.345v [math.co] 5 Jun 05 Abstract. In an atomic, cancellative, commutative monoid S, the
More informationDynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)
Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic
More informationContents Partial Fraction Theory Real Quadratic Partial Fractions Simple Roots Multiple Roots The Sampling Method The Method of Atoms Heaviside s
Contents Partial Fraction Theory Real Quadratic Partial Fractions Simple Roots Multiple Roots The Sampling Method The Method of Atoms Heaviside s Coverup Method Extension to Multiple Roots Special Methods
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationProofs for Large Sample Properties of Generalized Method of Moments Estimators
Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my
More information4 Inverse function theorem
Tel Aviv University, 2014/15 Analysis-III,IV 71 4 Inverse function theorem 4a What is the problem................ 71 4b Simple observations before the theorem..... 72 4c The theorem.....................
More informationPart III. 10 Topological Space Basics. Topological Spaces
Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.
More informationThe Target Discounted-Sum Problem
The Target Discounted-Sum Problem Udi Boker The Interdisciplinary Center (IDC), Herzliya, Israel Email: udiboker@idc.ac.il Thomas A. Henzinger and Jan Otop IST Austria Klosterneuburg, Austria Email: {tah,
More informationהאוניברסיטה העברית בירושלים
האוניברסיטה העברית בירושלים THE HEBREW UNIVERSITY OF JERUSALEM TOWARDS A CHARACTERIZATION OF RATIONAL EXPECTATIONS by ITAI ARIELI Discussion Paper # 475 February 2008 מרכז לחקר הרציונליות CENTER FOR THE
More informationPerturbations of Markov Chains with Applications to Stochastic Games
Perturbations of Markov Chains with Applications to Stochastic Games Eilon Solan June 14, 2000 In this lecture we will review several topics that are extensively used in the study of n-player stochastic
More informationNotes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009
Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 These notes are based on Chapter 12 of David Blackwell and M. A.Girshick, Theory of Games and Statistical Decisions, John Wiley
More informationAN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES
AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim
More informationTHE BIG MATCH WITH A CLOCK AND A BIT OF MEMORY
THE BIG MATCH WITH A CLOCK AND A BIT OF MEMORY KRISTOFFER ARNSFELT HANSEN, RASMUS IBSEN-JENSEN, AND ABRAHAM NEYMAN Abstract. The Big Match is a multi-stage two-player game. In each stage Player hides one
More informationLearning to Coordinate Efficiently: A Model-based Approach
Journal of Artificial Intelligence Research 19 (2003) 11-23 Submitted 10/02; published 7/03 Learning to Coordinate Efficiently: A Model-based Approach Ronen I. Brafman Computer Science Department Ben-Gurion
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationThe complex projective line
17 The complex projective line Now we will to study the simplest case of a complex projective space: the complex projective line. We will see that even this case has already very rich geometric interpretations.
More informationLinear Algebra I. Ronald van Luijk, 2015
Linear Algebra I Ronald van Luijk, 2015 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents Dependencies among sections 3 Chapter 1. Euclidean space: lines and hyperplanes 5 1.1. Definition
More information18.175: Lecture 2 Extension theorems, random variables, distributions
18.175: Lecture 2 Extension theorems, random variables, distributions Scott Sheffield MIT Outline Extension theorems Characterizing measures on R d Random variables Outline Extension theorems Characterizing
More informationLecture 6: April 25, 2006
Computational Game Theory Spring Semester, 2005/06 Lecture 6: April 25, 2006 Lecturer: Yishay Mansour Scribe: Lior Gavish, Andrey Stolyarenko, Asaph Arnon Partially based on scribe by Nataly Sharkov and
More informationA Survey of Stochastic ω-regular Games
A Survey of Stochastic ω-regular Games Krishnendu Chatterjee Thomas A. Henzinger EECS, University of California, Berkeley, USA Computer and Communication Sciences, EPFL, Switzerland {c krish,tah}@eecs.berkeley.edu
More informationPolynomial stochastic games via sum of squares optimization
Polynomial stochastic games via sum of squares optimization Parikshit Shah University of Wisconsin, Madison, WI email pshah@discoverywiscedu Pablo A Parrilo Massachusetts Institute of Technology, Cambridge,
More informationMultiagent Value Iteration in Markov Games
Multiagent Value Iteration in Markov Games Amy Greenwald Brown University with Michael Littman and Martin Zinkevich Stony Brook Game Theory Festival July 21, 2005 Agenda Theorem Value iteration converges
More information4 Divergence theorem and its consequences
Tel Aviv University, 205/6 Analysis-IV 65 4 Divergence theorem and its consequences 4a Divergence and flux................. 65 4b Piecewise smooth case............... 67 4c Divergence of gradient: Laplacian........
More information(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε
1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional
More informationNATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part II
NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part II Chapter 2 Further properties of analytic functions 21 Local/Global behavior of analytic functions;
More informationComplex Numbers. Basic algebra. Definitions. part of the complex number a+ib. ffl Addition: Notation: We i write for 1; that is we
Complex Numbers Definitions Notation We i write for 1; that is we define p to be p 1 so i 2 = 1. i Basic algebra Equality a + ib = c + id when a = c b = and d. Addition A complex number is any expression
More information