Optimal Control of Partially Observable Piecewise Deterministic Markov Processes
1 Optimal Control of Partially Observable Piecewise Deterministic Markov Processes
Nicole Bäuerle, based on joint work with D. Lange. Wien, April 2018
2 Outline
- Motivation
- PDMP with Partial Observation
- Controlled PDMP with Partial Observation
- Reduction to a Discrete-Time Problem
- The Filter
- Existence Results
- Application
3 Motivation
[Figure: a typical path of a PDMP — the process state drifts deterministically between the 1st and 2nd jump times while a running cost accumulates.]
4-7 Ingredients of a PDMP
In order to define a PDMP we need the following data:
- The drift $\Phi : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d$ is continuous and satisfies for all $y \in \mathbb{R}^d$ and $s, t > 0$: $\Phi(y, t+s) = \Phi(\Phi(y, s), t)$.
- The jump times $0 =: T_0 < T_1 < \dots$ are $\mathbb{R}_+$-valued random variables with sojourn times $S_n := T_n - T_{n-1}$, $n \in \mathbb{N}$, $S_0 := 0$. They are generated by an intensity $\lambda : \mathbb{R}^d \to (0, \infty)$.
- A transition kernel $Q$ from $\mathbb{R}^d$ to $\mathbb{R}^d$ gives the probability $Q(B \mid y)$ that the process jumps into $B$ given the pre-jump state $y$.
The PDMP is then defined by $Y_t := \Phi(Y_{T_n}, t - T_n)$ for $T_n \le t < T_{n+1}$, $n \in \mathbb{N}_0$.
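The three ingredients above can be exercised in a short simulation sketch. Everything model-specific here is an illustrative assumption, not the model from the talk: a linear flow $\Phi(y, t) = y + vt$, a constant jump intensity $\lambda$ (so sojourn times are exponential), and a jump kernel that halves the pre-jump state.

```python
import random

def simulate_pdmp(y0, T, v=1.0, lam=0.5, seed=0):
    """Simulate one PDMP path on [0, T] with linear flow Phi(y, t) = y + v*t,
    constant jump intensity lam (hence Exp(lam) sojourn times), and an
    illustrative jump kernel Q that halves the pre-jump state."""
    rng = random.Random(seed)
    t, y = 0.0, y0
    jump_times, post_jump_states = [0.0], [y0]
    while True:
        s = rng.expovariate(lam)      # next sojourn time S_n
        if t + s > T:
            break                     # no further jump before the horizon
        t += s
        y = 0.5 * (y + v * s)         # flow to Phi(y, s), then jump: y' = Phi(y, s) / 2
        jump_times.append(t)
        post_jump_states.append(y)
    return jump_times, post_jump_states

times, states = simulate_pdmp(y0=1.0, T=10.0)
```

Between the recorded jump times the path is recovered deterministically from the flow, which is exactly the defining identity $Y_t = \Phi(Y_{T_n}, t - T_n)$.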
8 A PDMP with Partial Observation
Let $(\varepsilon_n)_{n \in \mathbb{N}}$ be iid $\mathbb{R}^d$-valued random variables. We assume that the controller is only able to observe $X_n := Y_{T_n} + \varepsilon_n$. Define $\Lambda(y, t) := \int_0^t \lambda(\Phi(y, s))\, ds$. Then
$$\mathbb{P}_{x,y}\big(S_n \le t, Y_{T_n} \in C, X_n \in D \mid S_0, Y_{T_0}, X_0, \dots, S_{n-1}, Y_{T_{n-1}}, X_{n-1}\big) = \int_0^t e^{-\Lambda(Y_{T_{n-1}}, s)}\, \lambda\big(\Phi(Y_{T_{n-1}}, s)\big) \int_C Q^\varepsilon(D - y')\, Q\big(dy' \mid \Phi(Y_{T_{n-1}}, s)\big)\, ds.$$
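The observation model can be sketched in two lines. Gaussian noise is only one admissible choice for the noise distribution $Q^\varepsilon$; the slides leave it general.

```python
import random

def observe(post_jump_states, sigma=0.3, seed=1):
    """The controller never sees Y_{T_n} itself, only X_n = Y_{T_n} + eps_n.
    Here eps_n is iid Gaussian noise -- an illustrative choice of Q^eps."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, sigma) for y in post_jump_states]

observations = observe([1.0, -0.5, 2.0])
```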
9-13 Policies for POPDMP
- Consider the marked point process $(T_n, \hat Y_n := Y_{T_n}, X_n)$.
- Relaxed controls on the action space $A$: $\mathcal{R} := \{r : [0, \infty) \to \mathcal{P}(A) \mid r \text{ measurable}\}$.
- Observable histories: $H_0 := \mathbb{R}^d$ and for $n \in \mathbb{N}$, $H_n := H_{n-1} \times \mathcal{R} \times \mathbb{R}_+ \times \mathbb{R}^d$, with elements $h_n = (x_0, r_0, s_1, x_1, \dots, r_{n-1}, s_n, x_n) \in H_n$.
- A discrete-time history-dependent relaxed control policy is $\pi^D := (\pi^D_0, \pi^D_1, \dots)$ where $\pi^D_n : H_n \to \mathcal{R}$ is measurable.
- Continuous control: $\pi_t := \sum_{n=0}^\infty 1_{[T_n, T_{n+1})}(t)\, \pi^D_n(H_n)(t - T_n)$.
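A relaxed control attaches to every time point a probability distribution over actions rather than a single action. A minimal sketch on a two-point action set (the concrete control and weights are illustrative):

```python
def relaxed_control(t):
    """A relaxed control r on A = {-1, +1}: for t <= 0.5 all mass sits on
    action -1, afterwards the two actions are mixed evenly (illustrative)."""
    return {-1.0: 1.0, 1.0: 0.0} if t <= 0.5 else {-1.0: 0.5, 1.0: 0.5}

def integrate_action(f, r_t):
    """Compute  int_A f(a) r_t(da)  for a finite-support measure r_t,
    the kind of integral that appears in all controlled quantities below."""
    return sum(f(a) * w for a, w in r_t.items())

mean_action = integrate_action(lambda a: a, relaxed_control(1.0))
```

A deterministic control is the special case where every $r_t$ is a Dirac measure.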
14-17 Controlled Data
We assume that the policy influences the drift, the intensity and the transition kernel:
- Drift $\Phi : \mathcal{R} \times \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d$, notation $\Phi^r(y, t)$.
- Jump rate $\lambda^A : \mathbb{R}^d \times A \to (0, \infty)$.
- Jump kernel $Q^A$ from $\mathbb{R}^d \times A$ to $\mathbb{R}^d$.
18 Controlled Process
A controlled partially observable piecewise deterministic Markov process with local characteristics $(\Phi^r, \lambda^A, Q^A, Q^\varepsilon)$ satisfies
$$\mathbb{P}^{\pi^D}_{x,y}\big(S_n \le t, \hat Y_n \in C, X_n \in D \mid S_0, \dots, S_{n-1}, \hat Y_{n-1}, X_{n-1}, \pi^D_{n-1}\big) = \int_0^t e^{-\Lambda^{\pi^D_{n-1}}(\hat Y_{n-1}, s)} \int_A \lambda^A\big(\Phi^{\pi^D_{n-1}}(\hat Y_{n-1}, s), a\big) \int_C Q^\varepsilon(D - y')\, Q^A\big(dy' \mid \Phi^{\pi^D_{n-1}}(\hat Y_{n-1}, s), a\big)\, \pi^D_{n-1}(H_{n-1}, s)(da)\, ds,$$
where $\Lambda^r(y, t) := \int_0^t \int_A \lambda^A(\Phi^r(y, s), a)\, r_s(da)\, ds$.
19-21 The Optimization Problem
For a fixed policy $\pi^D$ we define the cost as
$$J(x, \pi^D) := \int \mathbb{E}^{\pi^D}_{x,y}\left[\int_0^\infty e^{-\beta t} \int_A c(Y_t, a)\, \pi_t(da)\, dt\right] Q_0(dy \mid x).$$
The value function is defined as $J(x) := \inf_{\pi^D} J(x, \pi^D)$ for all $x \in \mathbb{R}^d$. A policy $\pi^*$ is optimal if $J(x) = J(x, \pi^*)$ for all $x \in \mathbb{R}^d$.
22-26 A discrete-time POMDP
We define a POMDP as follows:
- State space $\mathbb{R}_+ \times \mathbb{R}^{2d} \ni (s, y, x)$.
- Action space $\mathcal{R} \ni r$.
- Transition law
$$\tilde Q\big([0, t] \times C \times D \mid y, r\big) = \int_0^t e^{-\Gamma^r(y, u)} \int_A \lambda^A\big(\Phi^r(y, u), a\big) \int_C Q^\varepsilon(D - y')\, Q^A\big(dy' \mid \Phi^r(y, u), a\big)\, r_u(da)\, du,$$
with $\Gamma^r(y, t) := \beta t + \int_0^t \int_A \lambda^A(\Phi^r(y, u), a)\, r_u(da)\, du$.
- The one-stage cost $g(y, r) = \int_0^\infty e^{-\Gamma^r(y, t)} \int_A c(\Phi^r(y, t), a)\, r_t(da)\, dt$.
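The one-stage cost $g$ is an ordinary improper integral once a control is fixed, so it can be approximated by quadrature. A sketch for the Dirac control $r_t \equiv \delta_a$ under illustrative model choices (linear flow, constant rate $\lambda$, running cost $c(y, a) = y^2$), where $\Gamma(y, t) = (\beta + \lambda)t$:

```python
import math

def one_stage_cost(y, a, beta=1.0, lam=1.0, v=1.0, T=20.0, n=20000):
    """Trapezoid-rule sketch of g(y, delta_a) for the constant control
    r_t = delta_a, with flow Phi^a(y, t) = y + a*v*t, constant rate lam
    and cost c(y, a) = y**2 (all model choices illustrative).  The exact
    value is  int_0^inf exp(-(beta+lam)*t) * (y + a*v*t)**2 dt,
    truncated here at T where the discount factor is negligible."""
    h = T / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        w = 0.5 if i in (0, n) else 1.0    # trapezoid end-point weights
        total += w * math.exp(-(beta + lam) * t) * (y + a * v * t) ** 2
    return total * h

g_const = one_stage_cost(1.0, 0.0)   # exact value: 1 / (beta + lam) = 0.5
```

For a genuinely relaxed control one would additionally average the integrand over $r_t(da)$ at each quadrature node.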
27-29 The Problem for the discrete-time POMDP
For a fixed policy $\pi^D$ we define the cost as
$$\tilde J(x, \pi^D) := \tilde{\mathbb{E}}^{\pi^D}_x\left[\sum_{k=0}^\infty g\big(\hat Y_k, \pi^D_k(H_k)\big)\right].$$
The value function is defined as $\tilde J(x) := \inf_{\pi^D} \tilde J(x, \pi^D)$, $x \in \mathbb{R}^d$. A policy $\pi^*$ is optimal if $\tilde J(x) = \tilde J(x, \pi^*)$, $x \in \mathbb{R}^d$.
30 Equivalence of the two Problems
Theorem. Let $x \in \mathbb{R}^d$ be an initial observation and $\pi^D$ a policy. Then it holds that $J(x, \pi^D) = \tilde J(x, \pi^D)$.
31-35 Assumptions
(C1) The action space $A$ is a compact metric space.
(C2) $\lambda^A : \mathbb{R}^d \times A \to (0, \infty)$ is continuous and $0 < \underline{\lambda} \le \lambda^A \le \overline{\lambda}$.
(C3) $Q^A$ is weakly continuous, i.e. $(x, a) \mapsto \int v(z)\, Q^A(dz \mid x, a)$ is continuous and bounded for all $v : \mathbb{R}^d \to \mathbb{R}$ continuous and bounded.
(B1) There exists a finite set $E_0 = \{y_1, \dots, y_d\} \subset \mathbb{R}^d$ such that for all $y \in \mathbb{R}^d$ and $a \in A$ we have $Q^A(E_0 \mid y, a) = 1$ and $Q_0(E_0) = 1$.
(B2) $Q^\varepsilon$ has a bounded density $f^\varepsilon$ with respect to some $\sigma$-finite measure $\nu$.
36-37 The Filter
Assumption (B2) implies that the transition law has the density
$$q(s, y', x \mid y, r) = e^{-\Gamma^r(y, s)}\, f^\varepsilon(x - y') \int_A \lambda^A\big(\Phi^r(y, s), a\big)\, Q^A\big(y' \mid \Phi^r(y, s), a\big)\, r_s(da).$$
Define for $y' \in E_0$ the updating operator
$$\Psi(\rho, r, s, x)(y') := \frac{\sum_{y \in E_0} q(s, y', x \mid y, r)\, \rho(y)}{\sum_{\hat y \in E_0} \sum_{y \in E_0} q(s, \hat y, x \mid y, r)\, \rho(y)}$$
and for $h_n = (x_0, r_0, s_1, x_1, \dots, r_{n-1}, s_n, x_n)$:
$$\mu_0(\cdot \mid x_0) := Q_0(\cdot \mid x_0), \qquad \mu_n(\cdot \mid h_n) = \mu_n(\cdot \mid h_{n-1}, r_{n-1}, s_n, x_n) := \Psi\big(\mu_{n-1}(\cdot \mid h_{n-1}), r_{n-1}, s_n, x_n\big).$$
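Because $E_0$ is finite, the updating operator $\Psi$ is a finite Bayes step: weight each candidate post-jump state by its likelihood under the observed sojourn time and signal, then normalize. A sketch, where the concrete density `toy_q` is an illustrative stand-in for the model-specific $q$ above:

```python
import math

def filter_update(rho, s, x, q):
    """One step of the updating operator Psi: given the prior rho over the
    finite set E_0, an observed sojourn time s and noisy signal x, return
    the posterior over the post-jump state.  q(s, y_new, x, y_old) plays
    the role of the transition density from the slide."""
    unnorm = {y_new: sum(q(s, y_new, x, y_old) * p for y_old, p in rho.items())
              for y_new in rho}
    z = sum(unnorm.values())            # normalizing constant (denominator of Psi)
    return {y: u / z for y, u in unnorm.items()}

def toy_q(s, y_new, x, y_old):
    """Illustrative density: Exp(1) sojourn, uniform jump target on three
    points, Gaussian-shaped signal likelihood.  The real q also depends on
    y_old through the flow; this toy version does not."""
    return math.exp(-s) * (1.0 / 3.0) * math.exp(-0.5 * (x - y_new) ** 2)

prior = {-2.0: 1 / 3, 0.0: 1 / 3, 2.0: 1 / 3}
post = filter_update(prior, s=1.0, x=1.9, q=toy_q)
```

With an observation near 2 the posterior concentrates on the state 2, exactly the behavior the filter theorem on the next slide formalizes.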
38 The Filter
Theorem. It holds that
$$\mathbb{P}^{\pi^D}_x\big(\hat Y_n \in C \mid X_0, R_0, S_1, X_1, \dots, R_{n-1}, S_n, X_n\big) = \mu_n(C \mid X_0, R_0, S_1, X_1, \dots, R_{n-1}, S_n, X_n).$$
39-43 Filtered MDP
We define an MDP as follows:
- State space $\mathcal{P}(E_0) \ni \rho$.
- Action space $\mathcal{R} \ni r$.
- Transition law
$$\hat Q(B \mid \rho, r) = \sum_{y \in E_0} \int_0^\infty \int 1_B\big(\Psi(\rho, r, s, x)\big)\, q^{SX}(s, x \mid y, r)\, \nu(dx)\, ds\, \rho(y),$$
where $q^{SX}(s, x \mid y, r) := \sum_{y' \in E_0} q(s, y', x \mid y, r)$.
- The one-stage cost $\hat g(\rho, r) := \sum_{y \in E_0} g(y, r)\, \rho(y)$.
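The one-stage cost of the filtered MDP is just the belief-weighted average of the underlying costs. A minimal sketch (the belief and the cost function $g(y, \cdot)$ used here are illustrative):

```python
def g_hat(rho, g, r):
    """One-stage cost of the filtered MDP: the belief-weighted average
    g_hat(rho, r) = sum_{y in E_0} g(y, r) * rho(y)."""
    return sum(g(y, r) * p for y, p in rho.items())

belief = {-2.0: 0.25, 0.0: 0.5, 2.0: 0.25}
val = g_hat(belief, lambda y, r: y * y, r=None)   # 0.25*4 + 0.5*0 + 0.25*4 = 2.0
```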
44-46 Filtered MDP
A policy $\pi = (f_0, f_1, \dots)$ with $f_n : \mathcal{P}(E_0) \to \mathcal{R}$ can be seen as a special policy $\pi^D \in \Pi^D$ by setting $\pi^D_n(h_n) := f_n(\mu_n(\cdot \mid h_n))$. We denote the cost of policy $\pi$ by
$$V(\rho, \pi) := \hat{\mathbb{E}}^\pi_\rho\left[\sum_{n=0}^\infty \hat g\big(\mu_n, f_n(\mu_n)\big)\right].$$
The value function is defined as $V(\rho) := \inf_{\pi \in \Pi} V(\rho, \pi)$ for all $\rho \in \mathcal{P}(E_0)$.
47 Equivalence of the two Problems
Theorem. Let $x \in \mathbb{R}^d$ be an initial observation and let $\pi$ and $\pi^D$ be given as on the previous slide. Then it holds that $V(Q_0(\cdot \mid x), \pi) = J(x, \pi^D)$.
48-49 Further Assumptions
(C4) $r \mapsto \Phi^r(y, t)$ is continuous for all $y \in E_0$, $t \ge 0$.
(C5) $c : \mathbb{R}^d \times A \to \mathbb{R}_+$ is lower semicontinuous with respect to the product topology.
50-51 Regularization of the Filter
Let $h_\sigma : \mathbb{R} \to \mathbb{R}$, $\sigma > 0$, be a regularization kernel, i.e.
(i) $h_\sigma(t) \ge 0$ for all $t \in \mathbb{R}$,
(ii) $\int_{\mathbb{R}} h_\sigma(t)\, dt = 1$,
(iii) $\lim_{\sigma \to 0} \int_{-a}^{a} h_\sigma(t)\, dt = 1$ for all $a > 0$.
We use a regularized filter of the form
$$\hat\Psi(\rho, r, s, x)(y') := \frac{\int_{\mathbb{R}} \sum_{y \in E_0} q(u, y', x \mid y, r)\, \rho(y)\, h_\sigma(s - u)\, du}{\int_{\mathbb{R}} \sum_{\hat y \in E_0} \sum_{y \in E_0} q(u, \hat y, x \mid y, r)\, \rho(y)\, h_\sigma(s - u)\, du}.$$
Note that we have $\lim_{\sigma \to 0} \hat\Psi = \Psi$.
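The three kernel properties are easy to check numerically for a concrete choice. A sketch using a box kernel, uniform on $[-\sigma, \sigma]$ (one valid regularization kernel among many):

```python
def h_box(t, sigma):
    """Box regularization kernel: the uniform density on [-sigma, sigma].
    It is nonnegative (i) and has unit mass (ii) by construction."""
    return 1.0 / (2.0 * sigma) if -sigma <= t <= sigma else 0.0

def mass_on(a, sigma, n=100000):
    """Midpoint-rule approximation of  int_{-a}^{a} h_sigma(t) dt,
    used to check property (iii): the mass in a fixed window [-a, a]
    tends to 1 as sigma -> 0."""
    h = 2.0 * a / n
    return sum(h_box(-a + (i + 0.5) * h, sigma) for i in range(n)) * h

# For the window a = 1 the captured mass grows to 1 as sigma shrinks.
masses = [mass_on(1.0, s) for s in (2.0, 1.0, 0.5)]
```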
52 Solution of the Problem
Theorem. Under all assumptions we have for the regularized version of the problem:
a) $V_\sigma(\rho) = \inf_{r \in \mathcal{R}} \left\{\hat g(\rho, r) + \int_{\mathcal{P}(E_0)} V_\sigma(\rho')\, \hat Q(d\rho' \mid \rho, r)\right\}$.
b) There exists an $f^*$ which attains the minimum in a), and $(f^*, f^*, \dots)$ is optimal for the filtered problem. The optimal policy for the original problem is $(\pi^*_0, \pi^*_1, \dots)$ with
$$\pi^*_0(x)(t) = f^*\big(Q_0(\cdot \mid x)\big)(t), \; x \in \mathbb{R}^d, \qquad \pi^*_n(h_n)(t) = f^*\big(\mu_n(\cdot \mid h_n)\big)(t), \; h_n \in H_n.$$
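The fixed-point equation in part a) is solved numerically by value iteration once the belief space and the control set are discretized. A generic sketch on a tiny hand-made instance (states, actions, costs and kernel are all illustrative; on the slides the discounting is folded into the transition law, here an explicit factor is kept for clarity):

```python
def value_iteration(states, actions, cost, trans, beta=0.9, tol=1e-10, max_iter=10_000):
    """Value iteration for a finite approximation of the filtered MDP:
    iterate  V(s) <- min_a { cost(s, a) + beta * sum_s' P(s'|s, a) V(s') }
    until the sup-norm change falls below tol."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        V_new = {s: min(cost[(s, a)] +
                        beta * sum(p * V[s2] for s2, p in trans[(s, a)].items())
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
    return V

# Tiny illustrative instance: two belief points, two controls.
states, actions = ["rho1", "rho2"], ["stay", "switch"]
cost = {("rho1", "stay"): 1.0, ("rho1", "switch"): 2.0,
        ("rho2", "stay"): 0.0, ("rho2", "switch"): 0.5}
trans = {("rho1", "stay"): {"rho1": 1.0}, ("rho1", "switch"): {"rho2": 1.0},
         ("rho2", "stay"): {"rho2": 1.0}, ("rho2", "switch"): {"rho1": 1.0}}
V = value_iteration(states, actions, cost, trans)
```

Here the minimizer pays the one-off switching cost from "rho1" to reach the cost-free state "rho2", so $V(\text{rho1}) = 2$ and $V(\text{rho2}) = 0$.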
53 Application: Toy Example
(i) The state space is $\mathbb{R}$, $E_0 := \{-2, 0, 2\}$, the action space is $A := [-1, 1]$.
(ii) The controlled drift is given by $\frac{d}{dt}\Phi^r(y, t) = \int_A a\, r_t(da)$, $\Phi(y, 0) = y$.
(iii) We set $\lambda^A \equiv 1$ and $\beta := 1$.
(iv) The jump transition kernel $Q^A$ and the cost function are given in the picture on the next slide.
(vii) Signal density: $f^\varepsilon(-1) = f^\varepsilon(0) = f^\varepsilon(1) = \frac{1}{3}$.
54 Application: Transition Kernel and Cost Function
[Figure: the cost function $c(y)$ plotted against $y$; the jump kernel is $Q(y; \cdot) = \delta_{-2}(\cdot)$, $\delta_0(\cdot)$ or $\delta_2(\cdot)$, depending on the region in which $y$ lies.]
55 Solution of the Problem: Optimal Policy
Theorem. In this POPDMP all assumptions which we previously made are satisfied, and there exists an optimal policy.
In this example we have also computed the value function and the optimal policy numerically by value iteration. For the optimal policy we obtained
$$\pi^*_n(h_n, t) := \begin{cases} -1_{\{t \le \frac{1}{2}\}} & \text{if } \mu^1_n \ge \mu^3_n, \\ \phantom{-}1_{\{t \le \frac{1}{2}\}} & \text{if } \mu^1_n \le \mu^3_n. \end{cases}$$
56 Solution of the Problem: Value Function
[Figure: the numerically computed value function.]
57 References
- N.B. and Lange, D.: Optimal control of partially observable piecewise deterministic Markov processes. SIAM J. Control Optim., 56.
- N.B. and Rieder, U.: Markov Decision Processes with Applications to Finance. Springer.
- Brandejsky, A., de Saporta, B. and Dufour, F.: Optimal stopping for partially observed piecewise-deterministic Markov processes. Stochastic Process. Appl., 123.
- Davis, M.H.A.: Markov Models and Optimization. Chapman and Hall.
- Yushkevich, A.A.: On reducing a jump controllable Markov model to a model with discrete time. Theory Probab. Appl., 25, 58-69, 1980.
58 Thank you very much for your attention!
More informationReinforcement Learning as Classification Leveraging Modern Classifiers
Reinforcement Learning as Classification Leveraging Modern Classifiers Michail G. Lagoudakis and Ronald Parr Department of Computer Science Duke University Durham, NC 27708 Machine Learning Reductions
More informationA Dynamic Contagion Process with Applications to Finance & Insurance
A Dynamic Contagion Process with Applications to Finance & Insurance Angelos Dassios Department of Statistics London School of Economics Angelos Dassios, Hongbiao Zhao (LSE) A Dynamic Contagion Process
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationRectifiability of sets and measures
Rectifiability of sets and measures Tatiana Toro University of Washington IMPA February 7, 206 Tatiana Toro (University of Washington) Structure of n-uniform measure in R m February 7, 206 / 23 State of
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationStatistics 581, Problem Set 1 Solutions Wellner; 10/3/ (a) The case r = 1 of Chebychev s Inequality is known as Markov s Inequality
Statistics 581, Problem Set 1 Solutions Wellner; 1/3/218 1. (a) The case r = 1 of Chebychev s Inequality is known as Markov s Inequality and is usually written P ( X ɛ) E( X )/ɛ for an arbitrary random
More informationNested Uncertain Differential Equations and Its Application to Multi-factor Term Structure Model
Nested Uncertain Differential Equations and Its Application to Multi-factor Term Structure Model Xiaowei Chen International Business School, Nankai University, Tianjin 371, China School of Finance, Nankai
More informationStochastic Volatility and Correction to the Heat Equation
Stochastic Volatility and Correction to the Heat Equation Jean-Pierre Fouque, George Papanicolaou and Ronnie Sircar Abstract. From a probabilist s point of view the Twentieth Century has been a century
More informationZero-sum semi-markov games in Borel spaces: discounted and average payoff
Zero-sum semi-markov games in Borel spaces: discounted and average payoff Fernando Luque-Vásquez Departamento de Matemáticas Universidad de Sonora México May 2002 Abstract We study two-person zero-sum
More informationA SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO
A SKOROHOD REPRESENTATION THEOREM WITHOUT SEPARABILITY PATRIZIA BERTI, LUCA PRATELLI, AND PIETRO RIGO Abstract. Let (S, d) be a metric space, G a σ-field on S and (µ n : n 0) a sequence of probabilities
More information{σ x >t}p x. (σ x >t)=e at.
3.11. EXERCISES 121 3.11 Exercises Exercise 3.1 Consider the Ornstein Uhlenbeck process in example 3.1.7(B). Show that the defined process is a Markov process which converges in distribution to an N(0,σ
More informationA NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM
J. Appl. Prob. 49, 876 882 (2012 Printed in England Applied Probability Trust 2012 A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM BRIAN FRALIX and COLIN GALLAGHER, Clemson University Abstract
More informationLarge Deviations Principles for McKean-Vlasov SDEs, Skeletons, Supports and the law of iterated logarithm
Large Deviations Principles for McKean-Vlasov SDEs, Skeletons, Supports and the law of iterated logarithm Gonçalo dos Reis University of Edinburgh (UK) & CMA/FCT/UNL (PT) jointly with: W. Salkeld, U. of
More informationRobust Optimal Control Using Conditional Risk Mappings in Infinite Horizon
Robust Optimal Control Using Conditional Risk Mappings in Infinite Horizon Kerem Uğurlu Monday 9 th April, 2018 Department of Applied Mathematics, University of Washington, Seattle, WA 98195 e-mail: keremu@uw.edu
More informationHomogenization for chaotic dynamical systems
Homogenization for chaotic dynamical systems David Kelly Ian Melbourne Department of Mathematics / Renci UNC Chapel Hill Mathematics Institute University of Warwick November 3, 2013 Duke/UNC Probability
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationThe Contraction Method on C([0, 1]) and Donsker s Theorem
The Contraction Method on C([0, 1]) and Donsker s Theorem Henning Sulzbach J. W. Goethe-Universität Frankfurt a. M. YEP VII Probability, random trees and algorithms Eindhoven, March 12, 2010 joint work
More informationHarnack Inequalities and Applications for Stochastic Equations
p. 1/32 Harnack Inequalities and Applications for Stochastic Equations PhD Thesis Defense Shun-Xiang Ouyang Under the Supervision of Prof. Michael Röckner & Prof. Feng-Yu Wang March 6, 29 p. 2/32 Outline
More informationZ. Zhou On the classical limit of a time-dependent self-consistent field system: analysis. computation
On the classical limit of a time-dependent self-consistent field system: analysis and computation Zhennan Zhou 1 joint work with Prof. Shi Jin and Prof. Christof Sparber. 1 Department of Mathematics Duke
More informationYaglom-type limit theorems for branching Brownian motion with absorption. by Jason Schweinsberg University of California San Diego
Yaglom-type limit theorems for branching Brownian motion with absorption by Jason Schweinsberg University of California San Diego (with Julien Berestycki, Nathanaël Berestycki, Pascal Maillard) Outline
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationMarkov Decision Processes and Solving Finite Problems. February 8, 2017
Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:
More informationOn the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces
Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (06), 5536 5543 Research Article On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces
More informationMean field simulation for Monte Carlo integration. Part I : Intro. nonlinear Markov models (+links integro-diff eq.) P. Del Moral
Mean field simulation for Monte Carlo integration Part I : Intro. nonlinear Markov models (+links integro-diff eq.) P. Del Moral INRIA Bordeaux & Inst. Maths. Bordeaux & CMAP Polytechnique Lectures, INLN
More informationReinforcement Learning
Reinforcement Learning Markov decision process & Dynamic programming Evaluative feedback, value function, Bellman equation, optimality, Markov property, Markov decision process, dynamic programming, value
More informationChain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics
3.33pt Chain Rule MATH 311, Calculus III J. Robert Buchanan Department of Mathematics Spring 2019 Single Variable Chain Rule Suppose y = g(x) and z = f (y) then dz dx = d (f (g(x))) dx = f (g(x))g (x)
More information4th Preparation Sheet - Solutions
Prof. Dr. Rainer Dahlhaus Probability Theory Summer term 017 4th Preparation Sheet - Solutions Remark: Throughout the exercise sheet we use the two equivalent definitions of separability of a metric space
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationLecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.
1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if
More information