
APPLICATIONES MATHEMATICAE 27,4 (2000), pp. 455–464

V. S. BORKAR (Mumbai)

THE VALUE FUNCTION IN ERGODIC CONTROL OF DIFFUSION PROCESSES WITH PARTIAL OBSERVATIONS II

Abstract. The problem of minimizing the ergodic or time-averaged cost for a controlled diffusion with partial observations can be recast as an equivalent control problem for the associated nonlinear filter. In analogy with the completely observed case, one may seek the value function for this problem as the vanishing discount limit of value functions for the associated discounted cost problems. This passage is justified here for the scalar case under a stability hypothesis, leading in particular to a martingale formulation of the dynamic programming principle.

1. Introduction. The usual approach to control of partially observed diffusions is via the equivalent separated control problem for the associated nonlinear filter. This approach has proved quite successful for the finite horizon and infinite horizon discounted costs (see, e.g., Borkar (1989), Chapter V). For the average cost or ergodic control problem, however, a completely satisfactory treatment is still lacking. While the existence of optimal controls in appropriate classes of controls is known in many cases (see, e.g., Bhatt and Borkar (1996)), the characterization of optimality via dynamic programming has not yet been fully developed. Limited results are available in Bhatt and Borkar (1996), which takes a convex duality approach, and in Borkar (1999), where the vanishing discount limit is justified for the limited class of the so-called asymptotically flat diffusions of Basak, Borkar and Ghosh (1997). The aim here is to present another special case where this limit can be justified, that of stable scalar diffusions.

2000 Mathematics Subject Classification: Primary 93E20.
Key words and phrases: ergodic control, partial observations, scalar diffusions, value function, vanishing discount limit.
Research partially supported by grant No. DST/MS/III-45/96 from the Department of Science and Technology, Government of India.

Specifically, consider the scalar diffusion X(·) and an associated observation process Y(·) = [Y_1(·), ..., Y_m(·)]^T described by the stochastic differential equations

(1.1)  X(t) = X_0 + ∫_0^t m(X(s), u(s)) ds + ∫_0^t σ(X(s)) dW(s),

(1.2)  Y(t) = ∫_0^t h(X(s)) ds + W′(t),

for t ≥ 0. Here:

(i) m(·,·) : ℝ × U → ℝ, for a prescribed compact metric space U, is a bounded continuous map, Lipschitz in its first argument uniformly w.r.t. the second,

(ii) σ(·) : ℝ → ℝ_+ is bounded Lipschitz and uniformly bounded away from zero,

(iii) h(·) : ℝ → ℝ^m is bounded and twice continuously differentiable with bounded first and second derivatives,

(iv) X_0 is a random variable with prescribed law π_0 ∈ P(ℝ) (here and later, for a Polish space S, P(S) := the Polish space of probability measures on S with the Prokhorov topology),

(v) W(·), W′(·) are resp. one- and m-dimensional standard Brownian motions and (X_0, W(·), W′(·)) are independent,

(vi) u(·) : ℝ_+ → U is a control process with measurable paths, adapted to {G_t} := the natural filtration of Y(·). Call such u(·) strict sense admissible controls. (More generally, u(·) is said to be admissible if for t ≥ s, W(t) − W(s) is independent of (u(y), W′(y), y ≤ s, X_0).)

Given a bounded continuous running cost k : ℝ × U → ℝ, the ergodic control problem under partial observations seeks to minimize over all strict sense admissible controls the ergodic or average cost

(1.3)  lim sup_{t→∞} (1/t) ∫_0^t E[k(X(s), u(s))] ds.

We consider the weak formulation of this problem described in Chapter I of Borkar (1989) and the relaxed control framework. The latter in particular implies the following: we suppose that U = P(Q) for a compact metric space Q and that there exist bounded continuous m̄ : ℝ × Q → ℝ, Lipschitz in the first argument uniformly w.r.t. the second, and bounded continuous k̄ : ℝ × Q → ℝ, such that

m(x,u) = ∫ m̄(x,y) u(dy),  k(x,u) = ∫ k̄(x,y) u(dy)  for all x, u.

See Borkar (1989), Chapter I, for background.
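As a concrete illustration (ours, not part of the original paper), the pair (1.1)–(1.2) can be simulated with an Euler–Maruyama scheme. All coefficients below are hypothetical choices satisfying (i)–(iii): the bounded Lipschitz drift m(x,u) = −tanh(x) + u/2 with U = [−1,1], the diffusion coefficient σ(x) = 1 + cos(x)/2, which takes values in [1/2, 3/2], and h(x) = tanh(x) (so m = 1). The feedback used is a Markov control in the sense of the stability assumption below, not a strict sense admissible one.

import numpy as np

def simulate_state_observation(T=10.0, dt=1e-3, x0=0.0, seed=0):
    # Euler-Maruyama discretization of (1.1)-(1.2) with hypothetical
    # coefficients: m(x,u) = -tanh(x) + u/2, sigma(x) = 1 + cos(x)/2,
    # h(x) = tanh(x) (one-dimensional observation).
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    X = np.empty(n + 1)
    Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0
    for i in range(n):
        u = np.clip(-X[i], -1.0, 1.0)       # a Markov control v(X(t)), not strict sense admissible
        drift = -np.tanh(X[i]) + 0.5 * u    # bounded, Lipschitz in x: assumption (i)
        sig = 1.0 + 0.5 * np.cos(X[i])      # in [1/2, 3/2], bounded away from zero: (ii)
        X[i + 1] = X[i] + drift * dt + sig * np.sqrt(dt) * rng.standard_normal()
        Y[i + 1] = Y[i] + np.tanh(X[i]) * dt + np.sqrt(dt) * rng.standard_normal()
    return X, Y

X, Y = simulate_state_observation()
k = np.tanh(X[:-1] ** 2)                    # a bounded running cost k(x,u), also hypothetical
print(k.mean())                             # time average approximating (1.3) for this control

Only X enters the cost, while the controller observes only Y; this is what motivates passing to the filter in Section 2.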

The stability assumption is the following: Say u(·) is a Markov control if u(t) = v(X(t)) for all t, for some measurable v : ℝ → U. (This is not strict sense admissible in general.) Equation (1.1) has a unique strong solution X(·) under any Markov control (see, e.g., Borkar (1989), pp. 10–12). We assume that any such X(·) is stable, i.e., positive recurrent (Bhattacharya (1981)), under all Markov controls u(·) = v(X(·)). By abuse of notation, we may refer to the map v(·) itself as the Markov control. Consequences of stability are given in Section 3, following a description of the separated control problem in the next section. The final section establishes the vanishing discount limit, leading to a martingale formulation of the dynamic programming principle in the spirit of Striebel (1984).

2. The separated control problem. Let {F_t} := the natural filtration of (u(·), Y(·)) and {F̃_t} := the natural filtration of (u(·), Y(·), X(·), W(·), W′(·)). Let π_t ∈ P(ℝ) be the regular conditional law of X(t) given F_t. Let π_t(f) = ∫ f dπ_t for measurable f : ℝ → ℝ (when defined) and

L_u f(x) = (1/2) σ²(x) (d²f/dx²)(x) + m(x,u) (df/dx)(x),  x ∈ ℝ, u ∈ U,

for f ∈ C²(ℝ). The evolution of {π_t} is given by the nonlinear filter (see, e.g., Borkar (1989), Section V.1)

(2.1)  π_t(f) = π_0(f) + ∫_0^t π_s(L_{u(s)} f) ds + ∫_0^t ⟨π_s(hf) − π_s(f)π_s(h), dŶ(s)⟩,  t ≥ 0,

where the innovations process Ŷ(t) := Y(t) − ∫_0^t π_s(h) ds, t ≥ 0, is a standard Brownian motion in ℝ^m. Rewrite (1.3) as

(2.2)  lim sup_{t→∞} (1/t) ∫_0^t E[π_s(k(·, u(s)))] ds.

The separated control problem equivalent to the one above is to control {π_t}, given by (2.1), over strict sense admissible u(·) so as to minimize the cost (2.2). For well-posedness issues concerning (2.1), see Borkar (1989), Section V.1. (Assumption (iii) above plays a role here. See Fleming and Pardoux (1982) for weaker conditions.)
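Equation (2.1) rarely admits a closed-form solution; a standard numerical surrogate (again ours, purely illustrative) is a bootstrap particle filter whose empirical measure plays the role of π_t. One step, for the hypothetical scalar coefficients used above:

import numpy as np

def particle_filter_step(particles, u, dY, dt, rng):
    # Prediction: propagate each particle through the state dynamics (1.1),
    # i.e. sample from the transition kernel generated by L_u.
    drift = -np.tanh(particles) + 0.5 * u
    sig = 1.0 + 0.5 * np.cos(particles)
    prop = particles + drift * dt + sig * np.sqrt(dt) * rng.standard_normal(particles.size)
    # Correction: reweight by the likelihood of the observation increment dY,
    # the discrete analogue of the innovations term in (2.1) (here h(x) = tanh x).
    w = np.exp(np.tanh(prop) * dY - 0.5 * np.tanh(prop) ** 2 * dt)
    w /= w.sum()
    # Resample back to equal weights (multinomial resampling).
    return rng.choice(prop, size=particles.size, p=w)

rng = np.random.default_rng(1)
particles = rng.standard_normal(5000)       # i.i.d. draws from pi_0 = N(0,1)
particles = particle_filter_step(particles, u=0.0, dY=0.01, dt=1e-2, rng=rng)
# pi_t(f) is approximated by f(particles).mean(); in particular the separated
# running cost pi_t(k(., u(t))) appearing in (2.2) is estimated by:
print(np.tanh(particles ** 2).mean())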

Following Fleming and Pardoux (1982), we enlarge the class of controls under consideration as follows. Let (Ω, F, P) denote the underlying probability space. Without loss of generality, let F = ⋁_t F̃_t. Define on (Ω, F) another probability measure P° as follows. If P_t, P_t° are the restrictions to (Ω, F̃_t) of P, P° respectively, then

dP_t/dP_t° = exp( ∫_0^t ⟨h(X(s)), dY(s)⟩ − (1/2) ∫_0^t ‖h(X(s))‖² ds ),  t ≥ 0.

Under P°, Y(·) is a standard Brownian motion in ℝ^m, independent of X_0, W(·). Call u(·) wide sense admissible if under P°, for t ≥ s, Y(t) − Y(s) is independent of {X_0, W(·), u(y), Y(y), y ≤ s}. (Note that this includes strict sense admissible controls.) The problem now is to minimize (2.2) over all such u(·).

To summarize, we consider the separated control problem defined by:

– the P(ℝ)-valued controlled Markov process {π_t}, whose evolution is described by (2.1),
– a U-valued control process u(·), assumed to be wide sense admissible in the sense defined above,
– the objective of minimizing the associated ergodic cost (2.2) over all such u(·).

Note that we are considering the so-called weak formulation of the control problem (Borkar (1989), Chapter I), i.e., a solution of the above control system is a pair of processes {π_t, u(t)} satisfying the foregoing on some probability space. Call it a stationary pair if they form a jointly stationary process in P(ℝ) × U, and an optimal stationary pair if the corresponding ergodic cost (2.2) (wherein the lim sup will perforce be a lim) is the least attainable.

3. Consequences of stability. Let (X(·), u(·)) be as in (1.1). Define Markov controls v_m, v_M such that v_m(x) ∈ Argmin(m(x, ·)), v_M(x) ∈ Argmax(m(x, ·)) for all x. This is possible by a standard measurable selection theorem (see, e.g., Borkar (1989), p. 20). Let X_m(·), X_M(·) be the corresponding solutions to (1.1) on the same probability space as X(·), with the same W(·) and initial condition X_0. By the comparison theorem of Ikeda and Watanabe (1981), pp. 352–353,

(3.1)  X_m(t) ≤ X(t) ≤ X_M(t)  for all t, a.s.

Thus for any y > 0,

P(|X(t)| ≥ y) ≤ P(|X_m(t)| ≥ y) + P(|X_M(t)| ≥ y).

Since X_m(·), X_M(·) are stable, this implies tightness of the laws of X(t), t ≥ 0, and therefore of the laws of π_t, t ≥ 0, by Lemma 3.6, pp. 126–127, of Borkar (1989). It then follows as in Lemmas 3.1, 3.2 of Bhatt and Borkar (1996) that an optimal stationary pair {π_t, u(t)} exists.
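The ordering (3.1) is easy to see numerically: driving X_m(·), X(·), X_M(·) with a common Brownian path preserves it, up to discretization error. A toy check (ours) with the hypothetical coefficients above, where v_m ≡ −1 and v_M ≡ +1 are the extreme points of U = [−1, 1]:

import numpy as np

def em_path(drift_fn, x0, dW, dt):
    # Euler-Maruyama path driven by the shared noise increments dW.
    x = np.empty(dW.size + 1)
    x[0] = x0
    for i in range(dW.size):
        x[i + 1] = x[i] + drift_fn(x[i]) * dt + (1.0 + 0.5 * np.cos(x[i])) * dW[i]
    return x

rng = np.random.default_rng(2)
dt = 1e-3
dW = np.sqrt(dt) * rng.standard_normal(20_000)                  # one shared Brownian path
m = lambda x, u: -np.tanh(x) + 0.5 * u
X_m = em_path(lambda x: m(x, -1.0), 0.0, dW, dt)                # drift minimizer v_m
X_u = em_path(lambda x: m(x, np.clip(-x, -1, 1)), 0.0, dW, dt)  # an arbitrary Markov control
X_M = em_path(lambda x: m(x, +1.0), 0.0, dW, dt)                # drift maximizer v_M
print(np.mean((X_m <= X_u) & (X_u <= X_M)))                     # ~1.0: the pathwise ordering (3.1)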

Now let τ(x) = inf{t ≥ 0 : X(t) = x}. Define τ_m(x), τ_M(x) analogously, with X_m(·), resp. X_M(·), in place of X(·). By (3.1), E_x[τ(0)] ≤ E_x[τ_M(0)] (resp., E_x[τ_m(0)]) for x ≥ 0 (resp., x ≤ 0). Since X_m(·), X_M(·) are stable, the right-hand side is finite and so is φ(x) := sup_{u(·)} E_x[τ(0)] for all x.

Let g : ℝ → [0,1] be a smooth map with g(0) = 0 and g(x) → 1 monotonically as x → ±∞. Let β ∈ (0,1) denote the optimal cost for the ergodic control problem that seeks to maximize

lim sup_{t→∞} (1/t) ∫_0^t E[g(X(s))] ds.

The value function V(·) : ℝ → ℝ for this problem (see Borkar (1989), Ch. VI) is given by

(3.2)  V(x) = sup_{u(·) admissible} E_x[ ∫_0^{τ(0)} (g(X(s)) − β) ds ] ≤ 2φ(x).

Then V(·) is a C² solution to the associated Hamilton–Jacobi–Bellman equation (ibid.)

(1/2) σ²(x) (d²V/dx²)(x) + max_u ( m(x,u) (dV/dx)(x) ) = β − g(x).

For our choice of g(·), it follows that there exist ε, a > 0 such that

(3.3)  max_u L_u V(x) ≤ −ε  for |x| > a.

Remark. In conjunction with Itô's formula, (3.3) leads to: For x ≥ a,

V(a) = E_x[V(X(τ(a)))] ≤ V(x) − ε E_x[τ(a)].

Thus

ε φ(a) + V(x) ≥ ε (E_x[τ(a)] + E_a[τ(0)]) + V(a) = ε E_x[τ(0)] + V(a),

leading to V(x) ≥ ε(φ(x) − φ(a)) + V(a). Together with its counterpart for x ≤ −a and (3.2), this shows that V(·), φ(·) have similar growth as |x| → ∞.
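The quantity φ(x) = sup_{u(·)} E_x[τ(0)], which controls the growth of V, can be probed by simulation. For x > 0 the worst case is the maximal drift v_M (cf. (3.1) and the identity φ(x) = E[τ_M(0) | X_M(0) = x] used below), so a crude Monte Carlo proxy for φ(1) under the hypothetical coefficients above is the following sketch (ours; truncation horizon and step size are arbitrary):

import numpy as np

def mean_hitting_time(x0, n_paths=500, dt=1e-2, t_max=100.0, seed=3):
    # Monte Carlo estimate of E_x[tau(0)] for X_M (constant control u = +1).
    # Paths are truncated at t_max; with the stable drift -tanh(x) + 1/2 the
    # truncation is rarely active, so the bias is small.
    rng = np.random.default_rng(seed)
    times = np.empty(n_paths)
    for j in range(n_paths):
        x, t = x0, 0.0
        while x > 0.0 and t < t_max:
            x += (-np.tanh(x) + 0.5) * dt \
                 + (1.0 + 0.5 * np.cos(x)) * np.sqrt(dt) * rng.standard_normal()
            t += dt
        times[j] = t
    return times.mean()

print(mean_hitting_time(1.0))   # proxy for phi(1); finite by stability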

Now consider independent scalar Brownian motions W_1(·), W_2(·), and for x_1, x_2 ∈ ℝ let X_1(·), X_2(·) be the processes given by

X_i(t) = x_i + ∫_0^t m(X_i(s), u(s)) ds + ∫_0^t σ(X_i(s)) dW_i(s),  t ≥ 0,

for i = 1, 2, u(·) being a common control process admissible for both. Let ξ = inf{t ≥ 0 : X_1(t) = X_2(t)}.

Lemma 3.1. E[ξ] ≤ K_1 (V(x_1) + V(x_2)) for a suitable constant K_1 > 0.

Proof. Without loss of generality, suppose that x_1 ≥ x_2. Define X̃_i(t), t ≥ 0, i = 1, 2, by

X̃_i(t) = x_i + ∫_0^t m(X̃_i(s), u_i(s)) ds + ∫_0^t σ(X̃_i(s)) dW_i(s),  t ≥ 0,

for i = 1, 2, with u_1(·) = v_M(X̃_1(·)), u_2(·) = v_m(X̃_2(·)). By the same comparison principle as in (3.1), it suffices to verify that E[ξ̃] ≤ K_1(V(x_1) + V(x_2)) for ξ̃ = inf{t ≥ 0 : X̃_1(t) = X̃_2(t)}. From (3.3) it follows that Ṽ(x,y) = V(x) + V(y) serves as a stochastic Lyapunov function for X̃(·) = (X̃_1(·), X̃_2(·)), implying that the expected first hitting time thereof for any open ball in ℝ² is bounded (see, e.g., Bhattacharya (1981)). In particular, this is so if the ball is separated from the point (x_1, x_2) by the line {(x,y) : x = y}. The claim follows by standard arguments.

The following lemma gives a useful estimate:

Lemma 3.2. For T > 0, the law of X(t), t ∈ (0,T], conditioned on X_0 = x, has a density p(t,x,·) satisfying the estimates

C_1 t^{−1/2} exp(−C_2 |y − x|²/t) ≤ p(t,x,y) ≤ C_3 t^{−1/2} exp(−C_4 |y − x|²/t),

where C_1, C_2, C_3, C_4 are constants that depend on T.

Proof. For controls of the type u(t) = v(X(t), t), t ≥ 0, with a measurable v : ℝ × ℝ_+ → U, these are the estimates of Aronson (1967). The general case follows from the fact that the one-dimensional marginals of X(·) under any admissible u(·) can be mimicked by u(·) of the above type (Bhatt and Borkar (1996), p. 1552).

Corollary 3.1. For any s ≥ 0, the conditional law of X(s+t), t ∈ (0,T], given F_s has a density satisfying the above estimates with x replaced by X(s) throughout.

Proof. Combine the above with Theorem 1.6, p. 13, of Borkar (1989).

Let P_e(ℝ) = {µ ∈ P(ℝ) : ∫ e^{a|x|} µ(dx) < ∞ for all a > 0}. For x > 0, φ(x) = E[τ_M(0) | X_M(0) = x]. Thus φ(·) satisfies φ(0) = 0 and

(1/2) σ²(x) (d²φ/dx²)(x) + m(x, v_M(x)) (dφ/dx)(x) = −1  on (0, ∞).

Explicit solution of this o.d.e. shows that φ(·) has at most exponential growth on [0, ∞).
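The "explicit solution" step can be spelled out (a sketch of the standard computation, not in the original, using only the bounds in (i)–(ii)): with the integrating factor

ρ(x) := exp( ∫_0^x 2 m(y, v_M(y))/σ²(y) dy ),

the o.d.e. becomes (ρ φ′)′(x) = −2ρ(x)/σ²(x), whence

φ′(x) = ρ(x)^{−1} ( φ′(0) − ∫_0^x 2ρ(y)/σ²(y) dy ).

Since m is bounded and σ is bounded as well as bounded away from zero, ρ(x)^{±1} ≤ e^{Cx} for a suitable C > 0, and the exponential growth bound for φ′, hence for φ, on [0, ∞) follows.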

A symmetric argument shows the same for E[τ_m(0) | X_m(0) = x] on (−∞, 0]. Thus V(·) has at most exponential growth and hence ∫ V(x) µ(dx) < ∞ for µ ∈ P_e(ℝ). What is more, by Corollary 3.1 above, if π_0 ∈ P_e(ℝ), then E[V(X(t))] < ∞ for t ≥ 0, implying E[π_t(V)] < ∞ and hence π_t(V) < ∞ a.s. A similar argument shows that π_0 ∈ P_e(ℝ) implies π_t ∈ P_e(ℝ) a.s. for t ≥ 0, allowing one to view {π_t} as a P_e(ℝ)-valued process.

4. The vanishing discount limit. The associated discounted cost problem with discount factor α > 0 is to minimize over all wide sense admissible u(·) the discounted cost

J_α(u(·), π_0) = E[ ∫_0^∞ e^{−αt} k(X(t), u(t)) dt ] = E[ ∫_0^∞ e^{−αt} π_t(k(·, u(t))) dt ].

For π ∈ P(ℝ) define the discounted value function

ψ_α(π) = inf E[ ∫_0^∞ e^{−αt} π_t(k(·, u(t))) dt | π_0 = π ],

where the infimum is over all wide sense admissible u(·). This infimum is, in fact, a minimum; see Borkar (1989), Chapter V.
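A Monte Carlo sketch of J_α (ours, with the same hypothetical coefficients and feedback as before, and with π_0 = δ_0) also illustrates the heuristic behind this section: α ψ_α should stabilize as α ↓ 0.

import numpy as np

def discounted_cost(alpha, T=80.0, dt=1e-2, n_paths=100, seed=4):
    # Estimates J_alpha(u, pi_0) = E int_0^infty e^{-alpha t} k(X(t), u(t)) dt
    # for the feedback u = clip(-x, -1, 1), truncating the horizon at T
    # (truncation error is at most (sup|k|) e^{-alpha T} / alpha).
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    disc = np.exp(-alpha * dt * np.arange(n))
    est = 0.0
    for _ in range(n_paths):
        x = 0.0
        run = np.empty(n)
        for i in range(n):
            run[i] = np.tanh(x * x)         # bounded running cost k, hypothetical
            u = np.clip(-x, -1.0, 1.0)
            x += (-np.tanh(x) + 0.5 * u) * dt \
                 + (1.0 + 0.5 * np.cos(x)) * np.sqrt(dt) * rng.standard_normal()
        est += float(disc @ run) * dt
    return est / n_paths

for a in (0.5, 0.1, 0.02):
    print(a, a * discounted_cost(a))        # alpha * J_alpha, roughly constant as alpha -> 0

Lemmas 4.1 and 4.2 below supply the estimates that make this vanishing discount passage precise on P_e(ℝ).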

We shall need a bound on |ψ_α(π) − ψ_α(π′)| for π ≠ π′. For this purpose, we first construct on a common probability space two solutions to (1.1), (1.2) with different initial laws, but a common wide sense admissible u(·), as follows. (We closely follow Borkar (1999).) Let (Ω, F, P°) be a probability space on which we have ℝ-valued random variables X̂_0, X̃_0 with laws π, π′ respectively, scalar Brownian motions W_1(·), W_2(·) and m-dimensional Brownian motions Ŷ(·), Ỹ(·), such that [X̂_0, X̃_0, W_1(·), W_2(·), Ŷ(·), Ỹ(·)] is an independent family. Also defined on (Ω, F, P°) is a U-valued process u(·) with measurable sample paths, independent of (X̂_0, X̃_0, W_1(·), W_2(·), Ỹ(·)), and satisfying: for t ≥ s, Ŷ(t) − Ŷ(s) is independent of the foregoing and of u(y), Ŷ(y), y ≤ s. Let X̂(·), X̃(·) denote the solutions to (1.1) with initial conditions X̂_0, X̃_0 and driving Brownian motions W_1(·), W_2(·) replacing W(·), respectively. Define F_t° = the right-continuous completion of σ(X̂(s), X̃(s), Ŷ(s), Ỹ(s), W_1(s), W_2(s), u(s), s ≤ t), t ≥ 0. Without any loss of generality, let F = ⋁_t F_t°. Define a new probability measure P on (Ω, F) as follows: Let P_t, P_t° be the restrictions of P, P° respectively to (Ω, F_t°) for t ≥ 0. Then

dP_t/dP_t° = exp( ∫_0^t (⟨h(X̂(s)), dŶ(s)⟩ + ⟨h(X̃(s)), dỸ(s)⟩) − (1/2) ∫_0^t (‖h(X̂(s))‖² + ‖h(X̃(s))‖²) ds ),  t ≥ 0.

Novikov's criterion (see, e.g., Ikeda and Watanabe (1981)) ensures that the right-hand side is a legal Radon–Nikodym derivative. By Girsanov's theorem (ibid.), under P,

Ŷ(t) = ∫_0^t h(X̂(s)) ds + Ŵ(t),  Ỹ(t) = ∫_0^t h(X̃(s)) ds + W̃(t),

for t ≥ 0, where Ŵ(·), W̃(·) are m-dimensional Brownian motions and (X̂_0, X̃_0, W_1(·), W_2(·), Ŵ(·), W̃(·)) is an independent family. Further, u(·) is a wide sense admissible control for both X̂(·), X̃(·). What this construction achieves is to identify each wide sense admissible control u(·) for π with one wide sense admissible control u(·) for π′. (This identification can be many-one.) By a symmetric argument that interchanges the roles of π, π′, one may identify every wide sense admissible control for π′ with one for π.

Now suppose that ψ_α(π) ≤ ψ_α(π′). Then for a wide sense admissible u(·) that is optimal for π′ for the α-discounted cost problem,

|ψ_α(π) − ψ_α(π′)| = ψ_α(π′) − ψ_α(π) ≤ J_α(u(·), π′) − J_α(u(·), π) ≤ sup |J_α(u(·), π′) − J_α(u(·), π)|,

where we use the above identification of controls and the supremum is correspondingly interpreted as being over appropriate wide sense admissible controls. If ψ_α(π) > ψ_α(π′), a symmetric argument works. We have proved:

Lemma 4.1. |ψ_α(π) − ψ_α(π′)| ≤ sup |J_α(u(·), π) − J_α(u(·), π′)|.

Let π, π′ ∈ P_e(ℝ). Then this leads to:

Lemma 4.2. For a suitable constant K > 0, |ψ_α(π) − ψ_α(π′)| ≤ K (π(V) + π′(V)).

Proof. Let K > 0 be a bound on |k(·,·)| and K_1 > 0 as in Lemma 3.1. Let ξ = inf{t ≥ 0 : X̂(t) = X̃(t)} and set X′(t) = X̃(t) I{t ≤ ξ} + X̂(t) I{t > ξ}. Then X′(·) satisfies (1.1) with the same u(·) as for X̂(·), X̃(·) and the driving Brownian motion

W̄(·) = W_2(·) I{· ≤ ξ} + (W_2(ξ) + W_1(·) − W_1(ξ)) I{· > ξ}.

Then

|ψ_α(π) − ψ_α(π′)| ≤ sup |J_α(u(·), π) − J_α(u(·), π′)|
  ≤ sup ∫_0^∞ e^{−αt} E[ |k(X̂(t), u(t)) − k(X′(t), u(t))| ] dt
  ≤ 2K sup E[ξ] ≤ 2KK_1 E[V(X̂_0) + V(X̃_0)] = 2KK_1 (π(V) + π′(V)).

From Borkar (1989), Chapter V, we know that ψ_α(·) satisfies the martingale dynamic programming principle: for t ≥ 0,

e^{−αt} ψ_α(π_t) + ∫_0^t e^{−αs} π_s(k(·, u(s))) ds

is an {F_t}-submartingale. That is, for t ≥ s,

(4.1)  ψ_α(π_s) ≤ E[ e^{−α(t−s)} ψ_α(π_t) + ∫_s^t e^{−α(y−s)} π_y(k(·, u(y))) dy | F_s ]  a.s.

Fix π_0 ∈ P_e(ℝ) and let ψ̄_α(π) = ψ_α(π) − ψ_α(π_0), ψ(π) = lim sup_{α→0} ψ̄_α(π) and ϱ = lim inf_{α→0} α ψ_α(π_0). Clearly, |ϱ| is bounded by any bound on |k(·,·)| and, in view of Lemma 4.2, ψ(π) = O(π(V)) = O(π(φ)). Rewrite (4.1) as

ψ̄_α(π_s) ≤ E[ e^{−α(t−s)} ψ̄_α(π_t) + ∫_s^t e^{−α(y−s)} (π_y(k(·, u(y))) − α ψ_α(π_0)) dy | F_s ]

(here one uses the identity ψ_α(π_0)(1 − e^{−α(t−s)}) = ∫_s^t α e^{−α(y−s)} ψ_α(π_0) dy). Taking lim sup as α → 0 on both sides, we get

ψ(π_s) ≤ E[ ψ(π_t) + ∫_s^t (π_y(k(·, u(y))) − ϱ) dy | F_s ].

Thus we have:

Theorem 4.1. Under any wide sense admissible u(·),

ψ(π_t) + ∫_0^t (π_s(k(·, u(s))) − ϱ) ds,  t ≥ 0,

is an {F_t}-submartingale. Further, if {(π_t, u(t)) : t ≥ 0} is a stationary pair under which it is a martingale, then it must be an optimal stationary pair and ϱ the optimal cost.

Proof. The first part is proved above. The second follows exactly as in Theorem 3.1 of Borkar (1999).

This is a weak verification theorem, weak because the existence of a stationary pair as above is not guaranteed, even though the existence of an optimal stationary pair is. This is so because a priori ϱ need not equal the optimal cost. However, one can show as in Theorem 3.1 of Borkar (1999) that it is less than or equal to the optimal cost. It is conjectured that if (π_t, u(t)), t ≥ 0, in (4.1) is an optimal stationary pair with the law of π_t equal to µ ∈ P(P_e(ℝ)), then ϱ obtained as above is indeed the optimal cost for µ-a.s. π_0.

References

D. G. Aronson (1967), Bounds for the fundamental solution of a parabolic equation, Bull. Amer. Math. Soc. 73, 890–896.
G. K. Basak, V. S. Borkar and M. K. Ghosh (1997), Ergodic control of degenerate diffusions, Stochastic Anal. Appl. 15, 1–17.
A. G. Bhatt and V. S. Borkar (1996), Occupation measures for controlled Markov processes: characterization and optimality, Ann. Probab. 24, 1531–1562.
R. N. Bhattacharya (1981), Asymptotic behaviour of several dimensional diffusions, in: Stochastic Nonlinear Systems in Physics, Chemistry and Biology, L. Arnold and R. Lefever (eds.), Springer Ser. Synerg. 8, Springer, Berlin, 86–99.
V. S. Borkar (1989), Optimal Control of Diffusion Processes, Pitman Res. Notes Math. Ser. 203, Longman Sci. and Tech., Harlow.
V. S. Borkar (1999), The value function in ergodic control of diffusion processes with partial observations, Stochastics Stochastics Reports 67, 255–266.
W. H. Fleming and E. Pardoux (1982), Optimal control of partially observed diffusions, SIAM J. Control Optim. 20, 261–285.
N. Ikeda and S. Watanabe (1981), Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam, and Kodansha, Tokyo.
C. Striebel (1984), Martingale methods for the optimal control of continuous time stochastic systems, Stochastic Process. Appl. 18, 329–347.

Vivek S. Borkar
School of Technology and Computer Science
Tata Institute of Fundamental Research
Homi Bhabha Road
Mumbai 400 005, India
E-mail: borkar@tifr.res.in

Received on 14.1.2000; revised version on 31.7.2000