FINITE HORIZON DECISION TIMING WITH PARTIALLY OBSERVABLE POISSON PROCESSES

MICHAEL LUDKOVSKI AND SEMIH O. SEZER

Abstract. We study decision timing problems on finite horizon with Poissonian information arrivals. In our model, a decision maker wishes to optimally time her action in order to maximize her expected reward. The reward depends on an unobservable Markovian environment, and information about the environment is collected through a (compound) Poisson observation process. Examples of such systems arise in investment timing, reliability theory, Bayesian regime detection and technology adoption models. We solve the problem by studying an optimal stopping problem for a piecewise-deterministic process, which gives the posterior likelihoods of the unobservable environment. Our method lends itself to simple numerical implementation, and we present several illustrative numerical examples.

2000 Mathematics Subject Classification. Primary 62L10; Secondary 62L15, 62C10, 60G40.
Key words and phrases. Markov-modulated Poisson processes, Bayesian sequential analysis, optimal stopping, decision making.

1. Introduction

Decision timing under uncertainty is one of the fundamental problems in Operations Research. In a typical setting, an economic agent (called the decision-maker, or DM) has a set of possible actions A, where each action has a (random) reward associated with it. The objective of the DM is to select a single action and time it so as to maximize her expected reward. More precisely, the DM picks a stopping time τ and an action k from the set A at τ. The reward H that the DM receives is a function of the pair (τ, k), as well as of some stochastic state variable Y. In classical examples (e.g. investment timing, American option pricing, natural resource management), Y is an observable stochastic process (e.g. asset prices, market demand), and the DM's objective is a standard optimal stopping problem. More complicated stopping problems involving unobserved system states have also been considered in the literature; see, for example, Bather 1973, Monahan 1980, Jensen 1982, McCardle 1985, Mazziotto 1986, Jensen and Hsu 1993, Stadje 1997, Schöttl 1998, Fuh 2003, Décamps et al. 2005, Dayanik and Goulding 2009. Such models are especially natural when one wishes to capture the inherent conflict between gathering information (which makes waiting valuable) and the time-value of money (which makes waiting costly). Indeed, most realistic settings involve a DM who is only partially aware of the environment and must collect data before making a decision.

In a multi-period setting, it is natural to capture this uncertainty in the environment through an unobservable stochastic process M := {M_t}_{t≥0}, where M_t represents the state of the world at time t. The DM starts with an initial guess about M, collects information via relevant news, and updates her beliefs. At the time of decision she then receives a reward that depends on the present environment, H = H(τ, k, M_τ). In such problems, a common approach is to postulate that the process M is a partially observable (Markov decision) process (POMDP), in which case we have a hidden Markov model (HMM). Such models have been studied extensively both in discrete and continuous time.

The reader may refer to Bertsekas 1976, Monahan 1982, Elliott et al. 1995 and Cappé et al. 2005 for a comprehensive treatment of discrete-time models and also for many other references on the subject. For continuous-time models and applications, Bensoussan 1992, Liptser and Shiryaev 2001, Mamon and Elliott 2007 and Elliott et al. 1995, and the references therein, can be consulted.

In continuous-time models, if news (such as changes in asset prices) arrives in infinitesimal amounts, then it is intuitive to have a continuum of information, which is typically captured by the filtration of an observed diffusion process. However, in many instances, a more realistic representation is to use discrete information amounts. Corporate developments, engineering failures, insurance claims, and economic surveys are all discrete events, and the corresponding news arrive in chunks. Note that discreteness of information is distinct from discreteness of time. The model is still in continuous time, since the events may take place at any instant. However, each event carries a strictly positive amount of information. Moreover, "no news" is still informative and affects the beliefs of the DM.

Mathematically, discrete information in continuous time may be represented by the filtration of an observed marked point process (MPP). In such a model, the instantaneous arrival intensity and the distribution of the marks typically depend on the current state of the process M. That is, the observable point process encodes information about the hidden environment M via its arrival times and/or marks. Filtering with continuous-time point process observations has been considered in Brémaud 1981, Arjas et al. 1992, Elliott and Malcolm 2005, and it is known that the dynamics of the conditional probabilities of M are of the piecewise-deterministic process (PDP) type. In other words, the DM's beliefs evolve deterministically between arrivals of new information, and experience random jumps at event times. From the control perspective, various aspects of optimal stopping of PDPs have been studied by Lenhart and Liao 1985, Gugerli 1986 and Costa and Davis 1988.

In this paper, we study a class of finite-horizon decision-making problems within the PDP framework by considering a general partially-observable regime-switching model with Poisson information arrivals. More precisely, we consider a setting where the observations of the DM come from a compound Poisson process X with arrival rate λ and mark/jump distribution ν. The local characteristics (λ, ν) of X are modulated by the current state of an unobservable finite-state Markov process M. In this setting, the DM can stop at any time τ less than some horizon T < ∞ and select an action from a set A := {1, ..., a}. Action k ∈ A yields a terminal reward (cost) equal to ∑_{i} µ_{k,i} 1_{M_τ = i}, as a function of the unobservable state of M. Here, µ_{k,i} is a given finite number, which can also be interpreted as the expected value of an independent random variable Φ_{k,i} representing the uncertain payoff of taking action k when M_t = i. The DM may alternatively delay her decision and continue to observe the process X in order to collect more information, or in order to stop later when M appears to be in a better state. Delaying the decision carries penalties (rewards) due to the cost of observation or lost opportunity (or operating revenues). We allow these terms to depend on M, and we assume that an amount with present value ∫_0^τ e^{−ρt} ( ∑_{i} c_i 1_{M_t = i} ) dt is accumulated until the decision time τ.
Here ρ ≥ 0 is the discount factor, and c_i is the instantaneous cost or revenue of running the system when M is in state i ∈ E. Also, we allow ρ to be zero; this makes the formulation suitable for non-financial applications where the quality of the decision is more important than its timing. In this setup, the objective of the DM is to find an admissible pair (τ, d) that maximizes her total expected reward and resolves the trade-off between exploring (getting more observations) and exploiting (engaging in an action).

Since the DM collects information by observing X, τ must be a stopping time of the filtration F^X generated by X. Also, the decision d should be measurable with respect to the information F_τ^X revealed until τ. Let π = (π_1, ..., π_n) := (P(M_0 = 1), ..., P(M_0 = n)) denote the prior beliefs of the DM about the initial state of M, and let P^π be the corresponding probability measure. Then the objective of the DM is to compute

(1.1)   U(T, π) := sup_{τ ≤ T, d} E^π [ ∫_0^τ e^{−ρt} ( ∑_{i∈E} c_i 1_{M_t = i} ) dt + e^{−ρτ} ∑_{k∈A} 1_{d = k} ∑_{i∈E} µ_{k,i} 1_{M_τ = i} ],

and, if it exists, find a pair (τ, d) attaining this value.

In our paper, we solve the problem in (1.1) in its general form without any restrictive assumption. We give a full characterization of the value function with a direct proof of the dynamic programming principle. We also identify optimal and ε-optimal policies for the DM. Moreover, we study the qualitative properties of the solution structure and provide a numerical approach that can be readily implemented; see Sections 4 and 5.

Special cases of this optimal stopping problem have been considered by Jensen and Hsu 1993 in connection with system reliability studies, Jensen 1997 and Schöttl 1998 in the context of insurance premium re-pricing, and Peskir and Shiryaev 2000, Gapeev 2002, Bayraktar et al. 2006, Dayanik et al. 2008a for classical Poisson disorder and regime detection problems. This line of work links together the MPP filtering literature with the PDP results of Lenhart and Liao 1985, Gugerli 1986, Costa and Davis 1988. Here, we extend these works in three major directions. First, we consider a general continuous-time finite-state Markov chain for the environment M, and impose no restrictions on the arrival rate and mark distribution of the observed MPP X. Second, we consider a general discount/cost structure that can be used to encode a variety of economic objectives. So far, all the aforementioned papers have dealt only with special cases by imposing additional assumptions on either X, or M, or the c_i, ρ, µ_{k,i}'s. Finally, we work in the context of finite horizon, where value functions are time-inhomogeneous. The introduction of time-to-maturity as a state variable adds analytical complexity and leads to the appearance of novel effects that are not possible with infinite-horizon stationary models.

In Section 5, we illustrate the strength and wide applicability of our approach on two key applications. In Section 5.1, we revisit the machine replacement problem of Jensen and Hsu 1993 in the finite horizon setting and without their assumptions. Next, in Section 5.2, we give the solution of the finite horizon formulation of the hypothesis-testing problem studied in Peskir and Shiryaev 2000, Gapeev 2002 and Dayanik et al. 2008a. In the Bayesian sequential analysis literature, continuous-time change-detection and hypothesis-testing problems have attracted considerable interest, especially for Poisson and Wiener processes; see for example Dayanik et al. 2008a,b and the references therein. Earlier works in this field study these two problems on the infinite horizon. One exception is Gapeev and Peskir 2004, which solves the finite horizon change-detection problem for the Wiener process, long after its infinite horizon formulation was solved by Shiryaev. Our analysis in Sections 3 and 4 gives the solutions of these problems for the compound Poisson process as an immediate corollary, and this is another contribution of our paper.

This paper is organized as follows.
Below in Section 2, we describe the formal setting of our model and then show that the problem in (1.1) is equivalent to an optimal stopping problem in terms of the conditional probability process, which is a piecewise-deterministic process. In Section 3, we describe how the value function of this stopping problem can be computed via a sequential procedure. The results of Section 3 are used in Section 4 in order to identify an optimal strategy and study its properties. Following this, in Section 5 we give examples illustrating our results.

Finally, the Appendices at the end include supplementary proofs and additional remarks. Appendix A extends our model to the case of discrete costs incurred at each event time, and Appendix B comments on the relationship between finite- and infinite-horizon problems and optimal controls.

2. Problem Statement

2.1. Model. Let (Ω, H, P) be a probability space hosting a continuous-time Markov process M taking values in E := {1, ..., n}, for n ∈ N, with infinitesimal generator Q = (q_{ij})_{i,j∈E}. Also, we have a collection of independent compound Poisson processes X^{(1)}, ..., X^{(n)} with local parameters (λ_1, ν_1), ..., (λ_n, ν_n), respectively. In terms of these independent processes, we define the observation process

(2.1)   X_t := X_0 + ∑_{i∈E} ∫_{(0,t]} 1_{M_s = i} dX_s^{(i)},   t ≥ 0,

which is a Markov-modulated Poisson process, also called a Cox process (see Cox and Isham 1980 and Grandell 1976). In the remainder, we let σ_0, σ_1, ... denote the arrival times of the process X,

σ_m := inf{ t > σ_{m−1} : X_t ≠ X_{t−} },   m ≥ 1,

with σ_0 := 0, and the variables Y_1, Y_2, ... denote the R^d-valued marks observed at these arrival times: Y_m := X_{σ_m} − X_{σ_m−}, m ≥ 1. Finally, to compute relative likelihoods of different marks, we introduce the measure ν defined as ν := ν_1 + ... + ν_n, and we let f_i(·) denote the density of ν_i with respect to ν.

2.2. Conditional probability process. For a point π in D := { π ∈ R_+^n : π_1 + ... + π_n = 1 }, let P^π denote the probability measure (with expectation operator E^π) under which M has initial distribution π. Moreover, let F := {F_t^X}_{t≥0} be the filtration of the process X in (2.1). With this notation, we define the D-valued conditional probability process Π_t := (Π_t^{(1)}, ..., Π_t^{(n)}) such that

(2.2)   Π_t^{(i)} := P^π{ M_t = i | F_t^X },   for i ∈ E and t ≥ 0.

The process Π is clearly adapted to F, and each component gives the conditional probability that the current state of M is i given the information generated by X until the current time t. Moreover, using standard arguments as in Shiryaev 1978 and Dayanik et al. 2008a, Proof of Proposition 2.1, it can be shown that the problem in (1.1) is equivalent to a fully observed optimal stopping problem with the process Π as the new hyperstate. More precisely, the value function U in (1.1) can be written as

(2.3)   U(T, π) = V(T, π) := sup_{τ ≤ T} E^π [ ∫_0^τ e^{−ρt} C(Π_t) dt + e^{−ρτ} H(Π_τ) ],

in terms of the functions

(2.4)   C(π) := ∑_{i∈E} c_i π_i   and   H(π) := max_{k∈A} H_k(π),   where H_k(π) := ∑_{i∈E} µ_{k,i} π_i.

If there is a stopping time τ attaining the supremum in (2.3), then the admissible strategy (τ, d(τ)) is an optimal rule for the problem in (1.1) if we define

(2.5)   d(τ) ∈ arg max_{k∈A} H_k(Π_τ).
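To make the observation model concrete, the following minimal sketch simulates the Markov-modulated process X of (2.1) by time discretization. It is our illustration, not the paper's: the two-state generator, rates, and standard normal marks standing in for the ν_i are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state example: Q and lam are illustrative, and N(0, 1)
# marks stand in for the nu_i.  Over one time step dt, the regime switches
# with probability -Q[m, m] * dt and an arrival occurs with probability
# lam[m] * dt.
Q = np.array([[-0.5, 0.5],
              [ 0.2, -0.2]])
lam = np.array([1.0, 4.0])

def simulate_mmpp(T=10.0, dt=1e-3, m0=0):
    """Return the list of (arrival time sigma_k, mark Y_k) of X on (0, T]."""
    t, m, arrivals = 0.0, m0, []
    while t < T:
        t += dt
        if rng.random() < -Q[m, m] * dt:        # M switches regime
            m = 1 - m                           # two states: jump to the other
        if rng.random() < lam[m] * dt:          # arrival of X at rate lam_m
            arrivals.append((t, rng.normal()))  # record (sigma_k, Y_k)
    return arrivals

print(len(simulate_mmpp()))                     # number of arrivals N_T
```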

2.3. Sample paths of Π. Let us take a sample path of the observation process X in which m arrivals are observed on (0, t], and let (t_k)_{k≤m} denote those arrival times. If we knew that the process M stayed at state i without any transition, then the (conditional) likelihood of this path would be

P^π{ σ_k ∈ dt_k, Y_k ∈ dy_k; k ≤ m | M_s = i, s ≤ t } = λ_i e^{−λ_i t_1} dt_1 ⋯ λ_i e^{−λ_i (t_m − t_{m−1})} dt_m · e^{−λ_i (t − t_m)} ∏_{k=1}^m f_i(y_k) ν(dy_k) = e^{−λ_i t} ∏_{k=1}^m λ_i dt_k f_i(y_k) ν(dy_k).

By construction, the observation process X has independent increments conditionally on M = {M_t}_{t≥0}. Therefore, we have

(2.6)   1_{M_t = i} P^π{ σ_k ∈ dt_k, Y_k ∈ dy_k; k ≤ m | M_s; s ≤ t } = 1_{M_t = i} exp( −∫_0^t ∑_{j∈E} λ_j 1_{M_s = j} ds ) ∏_{k=1}^m ( ∑_{j∈E} 1_{M_{t_k} = j} λ_j f_j(y_k) ) dt_k ν(dy_k).

By taking expectations of the expressions above, we obtain the unconditional likelihoods, in terms of which we give an explicit representation for the process Π in Lemma 2.1 below.

Lemma 2.1. For i ∈ E, let us define

(2.7)   L_i^π( t, m; (t_k, y_k), k ≤ m ) := E^π [ 1_{M_t = i} e^{−I(t)} ∏_{k=1}^m l(t_k, y_k) ],

where

(2.8)   I(t) := ∫_0^t ∑_{i=1}^n λ_i 1_{M_s = i} ds   and   l(t, y) := ∑_{j∈E} 1_{M_t = j} λ_j f_j(y).

Also, let L^π( t, m; (t_k, y_k), k ≤ m ) := ∑_{j∈E} L_j^π( t, m; (t_k, y_k), k ≤ m ). Then we have

(2.9)   Π_t^{(i)} = L_i^π( t, N_t; (σ_k, Y_k), k ≤ N_t ) / L^π( t, N_t; (σ_k, Y_k), k ≤ N_t ),   P^π-a.s.,

for all t ≥ 0 and i ∈ E; that is, the ratio L_i^π/L^π evaluated at m = N_t and (t_k, y_k) = (σ_k, Y_k), k ≤ m.

Lemma 2.1 indicates that the conditional probability of M_t being in state i is simply the relative likelihood of the observed path until t on the event {M_t = i}. Using the explicit form in (2.9), we describe the behavior of the sample paths of Π in Remark 2.1 below.

Remark 2.1. The process Π has piecewise-deterministic sample paths: between two arrival times of X it moves deterministically, and at an arrival time it jumps from one point to another depending on the observed mark size (see Figure 1). In precise terms, the sample paths have the characterization

(2.10)   Π(t) = x( t − σ_m, Π(σ_m) ) for σ_m ≤ t < σ_{m+1}, and Π(σ_m) = R( Π(σ_m−), Y_m ),   m ∈ N,

where x(t, π) := (x_1(t, π), ..., x_n(t, π)) is defined as

(2.11)   x_i(t, π) := E^π[ 1_{M_t = i} e^{−I(t)} ] / E^π[ e^{−I(t)} ] = P^π{ σ_1 > t, M_t = i } / P^π{ σ_1 > t },   for i ∈ E,

and R(π, y) is defined by

(2.12)   R(π, y) := ( λ_1 π_1 f_1(y) / ∑_{j∈E} λ_j π_j f_j(y), ..., λ_n π_n f_n(y) / ∑_{j∈E} λ_j π_j f_j(y) ).

Note that the paths t ↦ x(t, π) have the semigroup property x(t + u, π) = x(u, x(t, π)) for t, u ≥ 0. The i-th component of the vector flow x_i(·, ·) indicates how likely it is to have a period (0, t] without any arrival on the event {M_t = i}. Moreover, by analysis similar to Dayanik et al. 2008a, Section 2, Π is a (P^π, F)-Markov process for every π ∈ D.

Figure 1. Sample paths of the process Π for four different examples. Solid lines represent actual sample paths. Dashed lines in panels (c) and (d) are the deterministic parts in (2.11). In panels (a) and (b) there are two hidden states, and in panels (c) and (d) there are three. The examples use generators Q_a, Q_b, Q_c, Q_d with arrival rates λ_a = (1, 2), λ_b = (1, 4), λ_c = (1, 2, 3), λ_d = (1, 3, 5). In each example, jumps of the process X are always of unit size.
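The flow x(·, ·) and the jump operator R(·, ·) are straightforward to evaluate numerically. The sketch below is our illustration (the two-state parameters and the Gaussian mark densities are hypothetical): it computes x(t, π) through the exponential formula m(t, π) = π e^{t(Q−Λ)} of Corollary 2.1 below, and applies R(π, y) of (2.12).

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import norm

# Hypothetical two-state example.  Raw Gaussian densities stand in for the
# f_i: the common normalization by nu cancels inside R(pi, y), so using the
# densities of nu_1, nu_2 directly gives the same posterior update.
Q = np.array([[-0.5, 0.5], [0.2, -0.2]])
lam = np.array([1.0, 4.0])
Lam = np.diag(lam)
f = [norm(0.0, 1.0).pdf, norm(1.0, 1.0).pdf]

def flow(t, pi):
    """Deterministic part x(t, pi) = m(t, pi) / sum_j m_j(t, pi)."""
    m = pi @ expm(t * (Q - Lam))
    return m / m.sum()

def jump(pi, y):
    """Jump operator R(pi, y) of (2.12) applied at an arrival with mark y."""
    w = np.array([lam[i] * pi[i] * f[i](y) for i in range(len(pi))])
    return w / w.sum()

pi = np.array([0.5, 0.5])
pi = flow(0.3, pi)    # no arrival over a period of length 0.3
pi = jump(pi, 0.7)    # then an arrival with mark y = 0.7
print(pi)
```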

Corollary 2.1. Using an infinitesimal last-step analysis, it can be shown (see, for example, Darroch and Morris 1968, page 416, and Karlin and Taylor 1998, Chapter 6.7) that the vector

(2.13)   m(t, π) := ( m_1(t, π), ..., m_n(t, π) ) := ( E^π[ 1_{M_t = 1} e^{−I(t)} ], ..., E^π[ 1_{M_t = n} e^{−I(t)} ] )

has the form m(t, π) = π e^{t(Q − Λ)} in terms of the n × n diagonal matrix Λ with Λ_{i,i} := λ_i, and the components of m(t, π) satisfy dm_i(t, π)/dt = −λ_i m_i(t, π) + ∑_{j∈E} m_j(t, π) q_{j,i}. Then, thanks to the chain rule and (2.11), we have

(2.14)   dx_i(t, π)/dt = ∑_{j=1}^n q_{j,i} x_j(t, π) − λ_i x_i(t, π) + x_i(t, π) ∑_{j=1}^n λ_j x_j(t, π).

Hence, the process Π in (2.10) has the dynamics

(2.15)   dΠ_t^{(i)} = [ ∑_{j=1}^n q_{j,i} Π_t^{(j)} − λ_i Π_t^{(i)} + Π_t^{(i)} ∑_{j=1}^n λ_j Π_t^{(j)} ] dt + ∫_{R^d} [ λ_i f_i(y) Π_{t−}^{(i)} / ∑_{j∈E} λ_j f_j(y) Π_{t−}^{(j)} − Π_{t−}^{(i)} ] p(dt, dy),   i ∈ E,

where p(·, ·) is the point process given by p( (0, t] × B ) := ∑_{i∈N} 1_{(0,t]×B}(σ_i, Y_i), for every Borel set B ∈ B(R^d) and t ≥ 0.

3. Constructing the Value Function

The characterization of the sample paths in (2.15) and the general theory of optimal stopping (see, for example, Bensoussan 1992, Lenhart and Liao 1985) imply that the free-boundary problem associated with the optimal stopping problem in (2.3) has the form

(3.1)   max{ (−ρ + L) f(s, π) + C(π) ; H(π) − f(s, π) } = 0,

in terms of the infinitesimal generator

L f(s, π) = −∂f(s, π)/∂s + ∑_{i∈E} ( ∑_{j∈E} q_{j,i} π_j − λ_i π_i + π_i ∑_{j∈E} λ_j π_j ) ∂f(s, π)/∂π_i + ∑_{i∈E} π_i λ_i ∫_{y∈R^d} ( f(s, R(π, y)) − f(s, π) ) ν_i(dy),

acting on (smooth) functions f(·, ·) on [0, T] × D. Studying the equation (−ρ + L)f(s, π) + C(π) = 0 and determining the stopping regions is not easy even when n = 2; see, for example, Peskir and Shiryaev 2000, which solves a free-boundary problem similar to (3.1) for an infinite horizon problem with n = 2. Moreover, it is known that the value function of such a stopping problem may not be differentiable at every point of its domain, as illustrated in Dayanik and Sezer 2005, in which case the equation (3.1) should be considered in the viscosity sense. Instead of studying the problem in (3.1), we will employ a sequential approximation technique to compute the value function, following Gugerli 1986 and Davis 1993, Chapter 5. A similar approach is taken in Bayraktar et al. 2006 and Dayanik et al. 2008a for the disorder-detection and hypothesis-testing problems, respectively, on infinite horizon. Below, we tailor this method to the finite-horizon setting. We focus on the non-trivial modifications that arise due to time-dependent operators and the more general form of M, and otherwise refer to the results of Dayanik et al. 2008a. All the proofs are delegated to the Appendix.

3.1. A sequential approximation. Let us first define the sequence of functions

(3.2)   V(s, π) := sup_{τ ≤ s} E^π [ ∫_0^τ e^{−ρt} C(Π_t) dt + e^{−ρτ} H(Π_τ) ],   and
        V_m(s, π) := sup_{τ ≤ s} E^π [ ∫_0^{τ∧σ_m} e^{−ρt} C(Π_t) dt + e^{−ρ(τ∧σ_m)} H(Π_{τ∧σ_m}) ],   for m ∈ N,

on the domain [0, T] × D, where the first argument s should be considered as the remaining time to maturity. Proposition 3.1 below shows that the V_m's converge to V uniformly; see also the proof of Davis 1993, Theorem 53.4 and Dayanik et al. 2008a, Proposition 3.1 for related results. Proposition 3.1 generalizes these results to the finite horizon case.

Proposition 3.1. The sequence {V_m}_{m≥1} converges to V uniformly on [0, T] × D. More precisely, we have

(3.3)   V_m(s, π) ≤ V(s, π) ≤ V_m(s, π) + ( T‖C‖ + 2‖H‖ ) ( λ̄T/(m−1) )^{1/2} ( λ̄/(2ρ + λ̄) )^{m/2},

for all (s, π) ∈ [0, T] × D and m ∈ N, where ‖C‖ := max_{π∈D} |C(π)|, ‖H‖ := max_{π∈D} |H(π)| and λ̄ := max_{i∈E} λ_i.

Let us consider the second problem in (3.2) for fixed m ∈ N, and let τ ≤ s be an F-stopping time. Note that the first arrival time σ_1 is a regeneration time of the Markov process Π; therefore, on the event {τ ≥ σ_1}, the maximal expected reward that the DM can achieve after σ_1 should be V_{m−1}(s − σ_1, Π_{σ_1}). Define the operator

(3.4)   Jw(τ, s, π) := E^π [ ∫_0^{τ∧σ_1} e^{−ρt} C(Π_t) dt + 1_{τ<σ_1} e^{−ρτ} H(Π_τ) + 1_{σ_1≤τ} e^{−ρσ_1} w(s − σ_1, Π_{σ_1}) ].

Then, the dynamic programming intuition suggests that V_m(·) should solve the equation V_m(s, π) = 𝒥V_{m−1}(s, π), where the operator 𝒥 is defined as

(3.5)   𝒥w(s, π) := sup_{τ ≤ s} Jw(τ, s, π) = sup_{t∈[0,s]} Jw(t, s, π)

for a bounded function w : [0, T] × D → R. The second equality in (3.5) is due to the characterization of F-stopping times (Davis 1993, Lemma A2.3, p. 261), whereby for every m ∈ N there exists an F_{σ_m}^X-measurable random variable R_m such that τ ∧ σ_{m+1} = (σ_m + R_m) ∧ σ_{m+1}, P-a.s. on {τ ≥ σ_m}.

Note that, with the notation in (2.13), we have P^π{σ_1 > u} = E^π[ e^{−I(u)} ] and P^π{ σ_1 ∈ du, M_u = i } = E^π[ λ_i 1_{M_u = i} e^{−I(u)} ] du = λ_i m_i(u, π) du, and using the characterization of the paths in (2.10) and (2.14), the operator J in (3.4) can be rewritten as

(3.6)   Jw(t, s, π) = ∑_{i∈E} m_i(t, π) e^{−ρt} H( x(t, π) ) + ∫_0^t e^{−ρu} ∑_{i∈E} m_i(u, π) [ C( x(u, π) ) + λ_i S_i w( s − u, x(u, π) ) ] du,

in terms of the operators (see (2.12))

(3.7)   S_i w(t, π) := ∫_{R^d} w( t, R(π, y) ) f_i(y) ν(dy),   for i ∈ E.
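For concreteness, here is a deliberately crude numerical sketch of one application of 𝒥 evaluated through (3.6)-(3.7), for a hypothetical two-state model observed through a simple Poisson process with unit marks, in which case the densities f_i cancel and S_i w(t, π) = w(t, R(π)) with R(π) := (λ_1π_1, λ_2π_2)/∑_j λ_jπ_j. Repeated application starting from v_0 = H implements the sequential approximation (see (3.8) below). The parameters, grids and nearest-node interpolation are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state model observed through a simple Poisson process
# with unit marks; all numerical values below are illustrative.
Q = np.array([[-1.0, 1.0], [0.0, 0.0]])        # generator of M, state 2 absorbing
lam = np.array([2.0, 4.0])                     # arrival rates
c = np.array([0.0, -1.0])                      # running rates c_i
mu = np.array([[1.0, -1.0], [-1.0, 1.0]])      # terminal rewards mu_{k,i}
rho, T = 0.1, 1.0

H = lambda pi: (mu @ pi).max()                 # H(pi) = max_k sum_i mu_{k,i} pi_i
C = lambda pi: c @ pi                          # C(pi) = sum_i c_i pi_i
R = lambda pi: lam * pi / (lam @ pi)           # jump operator for unit marks

_expm_cache = {}
def m_vec(t, pi):                              # m(t, pi) = pi exp(t (Q - Lam))
    if t not in _expm_cache:
        _expm_cache[t] = expm(t * (Q - np.diag(lam)))
    return pi @ _expm_cache[t]

s_grid = np.linspace(0.0, T, 21)               # remaining time to maturity
p_grid = np.linspace(0.0, 1.0, 41)             # pi_1 coordinate of the simplex

def apply_J(w, du=0.01):
    """One application of the operator J of (3.5), evaluated through (3.6)."""
    def w_at(s, pi):                           # crude nearest-node lookup in s
        a = np.abs(s_grid - s).argmin()
        return np.interp(pi[0], p_grid, w[a])
    Jw = np.empty_like(w)
    for a, s in enumerate(s_grid):
        for b, p1 in enumerate(p_grid):
            pi = np.array([p1, 1.0 - p1])
            best, integral = H(pi), 0.0        # the candidate t = 0 gives H(pi)
            for t in np.arange(du, s + du / 2, du):
                m = m_vec(t, pi)
                x = m / m.sum()                # deterministic flow x(t, pi)
                # integrand of (3.6): e^{-rho u} sum_i m_i [C(x) + lam_i S_i w]
                integral += du * np.exp(-rho * t) * (
                    m.sum() * C(x) + (m * lam).sum() * w_at(s - t, R(x)))
                best = max(best, m.sum() * np.exp(-rho * t) * H(x) + integral)
            Jw[a, b] = best
    return Jw

V = np.array([[H(np.array([p, 1.0 - p])) for p in p_grid] for _ in s_grid])
for _ in range(6):                             # v_0 = H and v_{m+1} = J v_m
    V = apply_J(V)
```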

The following lemma provides basic properties of the operator 𝒥.

Lemma 3.1. If w(·, ·) is a bounded continuous function on [0, T] × D, then so is 𝒥w(·, ·). Also, if w_1(·, ·) ≤ w_2(·, ·), then 𝒥w_1(·, ·) ≤ 𝒥w_2(·, ·). Moreover, if the mapping π ↦ w(s, π) is convex for each s ∈ [0, T], then so is π ↦ 𝒥w(s, π) for each s ∈ [0, T].

Let us now define the sequence

(3.8)   v_0(s, π) := H(π),   and   v_{m+1}(s, π) := 𝒥v_m(s, π),   for m ≥ 0,

on [0, T] × D. Thanks to Lemma 3.1, we immediately see that the sequence {v_m(·, ·)}_{m∈N} is non-decreasing; hence the pointwise limit v(·, ·) := sup_{m∈N} v_m(·, ·) is well defined on [0, T] × D. Moreover, again by Lemma 3.1, each v_m(·, ·) is bounded and continuous on [0, T] × D, and the mapping π ↦ v_m(s, π) is convex for each s ∈ [0, T].

Proposition 3.2. The sequences defined in (3.2) and (3.8) coincide. That is, we have v_m(·, ·) = V_m(·, ·) for every m ∈ N.

Corollary 3.1. Each V_m is continuous, and hence their uniform limit (see Proposition 3.1) V(·, ·) is also continuous on [0, T] × D. As the upper envelope of the convex mappings π ↦ v_m(s, π) = V_m(s, π), the mapping π ↦ V(s, π) is again convex for each s ∈ [0, T].

Proposition 3.3 below is the dynamic programming equation for V(·, ·), characterizing the value function as a fixed point of the operator 𝒥 defined in (3.5).

Proposition 3.3. The value function satisfies V(s, π) = 𝒥V(s, π), and it is the smallest bounded solution of this equation greater than H(·).

4. An Optimal Strategy

Recall that the process Π has right-continuous paths (with left limits), and the functions V(·, ·) and H(·) are continuous due to Corollary 3.1. Hence the paths of the process V(s − t, Π_t) − H(Π_t) are also right-continuous and have left limits. Therefore, for ε ≥ 0 the random time

(4.1)   U_ε(s, π) := inf{ t ∈ [0, s] : V(s − t, Π_t) ≤ ε + H(Π_t) }

is a well-defined F-stopping time. We also have U_ε(s, π) ∧ σ_1 = r_ε(s, π) ∧ σ_1, where

(4.2)   r_ε(s, π) := inf{ t ∈ [0, s] : V(s − t, x(t, π)) ≤ ε + H(x(t, π)) },

which can be considered as the deterministic counterpart of (4.1).

Proposition 4.1. The stopping time U_ε(s, π) defined in (4.1) is an ε-optimal stopping time for the problem in (2.3); i.e.,

(4.3)   E^π [ ∫_0^{U_ε(s,π)} e^{−ρt} C(Π_t) dt + e^{−ρ U_ε(s,π)} H( Π_{U_ε(s,π)} ) ] ≥ V(s, π) − ε,

for all ε ≥ 0 and (s, π) ∈ [0, T] × D.
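In practice, once V (or an approximation V_m) has been tabulated, the rule (4.1) is applied path by path: monitor the gap V(s − t, Π_t) − H(Π_t) along the realized posterior trajectory and stop when it first drops to ε or below. A minimal sketch, where V_fn and H_fn are assumed callables (for instance, interpolants built from the iteration of Section 3) and (times, pi_path) is any discretized trajectory of Π:

```python
def first_entry_time(V_fn, H_fn, s, times, pi_path, eps=0.0):
    """U_eps(s, pi_0) of (4.1) along a discretized posterior path.

    Stops at the horizon s at the latest, since V(0, pi) = H(pi)."""
    for t, pi in zip(times, pi_path):
        if t > s:
            break
        if V_fn(s - t, pi) <= eps + H_fn(pi):
            return t
    return s
```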

Before proceeding with the proof of Proposition 4.1, we first state an immediate consequence of this result.

Corollary 4.1. The pair ( U_0(T, π), d(U_0(T, π)) ) is an optimal admissible strategy for the problem in (1.1).

Proof of Proposition 4.1. Let us define

(4.4)   Z_t := ∫_0^t e^{−ρu} C(Π_u) du + e^{−ρt} V(s − t, Π_t),   t ∈ [0, s],

which is a bounded process on t ∈ [0, s], s ≤ T. The ε-optimality of U_ε(s, π) follows easily once we establish

(4.5)   E^π[ Z_{U_ε(s,π)} ] = Z_0,

since this equality would imply

(4.6)   V(s, π) = E^π[ Z_{U_ε(s,π)} ] = E^π [ ∫_0^{U_ε(s,π)} e^{−ρt} C(Π_t) dt + e^{−ρ U_ε(s,π)} V( s − U_ε(s,π), Π_{U_ε(s,π)} ) ] ≤ E^π [ ∫_0^{U_ε(s,π)} e^{−ρt} C(Π_t) dt + e^{−ρ U_ε(s,π)} H( Π_{U_ε(s,π)} ) ] + ε,

due to the regularity of the paths t ↦ V(s − t, Π_t) − H(Π_t). In the remainder of the proof we show (4.5) by establishing E^π[ Z_{U_ε(s,π) ∧ σ_m} ] = Z_0, for m = 1, 2, ..., inductively. After taking the limit as m → ∞ in this equality, we obtain (4.5) due to the bounded convergence theorem. For typographical convenience we write r_ε := r_ε(s, π) and U_ε := U_ε(s, π).

First, we consider m = 1. Recall that U_ε(s, π) ∧ σ_1 = r_ε ∧ σ_1. Then

(4.7)   E^π[ Z_{U_ε∧σ_1} ] = E^π[ Z_{r_ε∧σ_1} ] = E^π [ ∫_0^{r_ε∧σ_1} e^{−ρt} C(Π_t) dt + 1_{σ_1≤r_ε} e^{−ρσ_1} V(s − σ_1, Π_{σ_1}) + 1_{σ_1>r_ε} e^{−ρr_ε} H(Π_{r_ε}) + 1_{σ_1>r_ε} e^{−ρr_ε} ( V(s − r_ε, Π_{r_ε}) − H(Π_{r_ε}) ) ] = JV(r_ε, s, π) + e^{−ρr_ε} P^π{σ_1 > r_ε} ( V(s − r_ε, x(r_ε, π)) − H(x(r_ε, π)) ),

where we used Proposition 3.3. Analogously to Dayanik et al. 2008a, Lemma 3.8, we have that for deterministic times u ≤ t ≤ s and for a bounded function w(·, ·),

(4.8)   Jw(t, s, π) = Jw(u, s, π) + P^π{σ_1 > u} e^{−ρu} ( Jw(t − u, s − u, x(u, π)) − H(x(u, π)) ).

For t < r_ε(s, π), we have V(s − t, x(t, π)) − H(x(t, π)) > ε. Then (4.8) yields

JV(t, s, π) ≤ sup_{u∈[t,s]} JV(u, s, π) − ε P^π{σ_1 > t} e^{−ρt} ≤ sup_{u∈[0,s]} JV(u, s, π) − ε P^π{σ_1 > t} e^{−ρt} < sup_{u∈[0,s]} JV(u, s, π).

Therefore, the supremum in sup_{t∈[0,s]} JV(t, s, π) must be achieved on [r_ε(s, π), s], and combining (4.8) with (4.7) we get

E^π[ Z_{U_ε∧σ_1} ] = sup_{u∈[r_ε,s]} JV(u, s, π) = 𝒥V(s, π) = V(s, π) = Z_0.

Now suppose by induction that E^π[ Z_{U_ε(s,π) ∧ σ_m} ] = Z_0 for m ≥ 1, and consider the equality

(4.9)   E^π[ Z_{U_ε∧σ_{m+1}} ] = E^π[ 1_{U_ε<σ_1} Z_{U_ε} + 1_{U_ε≥σ_1} Z_{U_ε∧σ_{m+1}} ] = E^π [ 1_{U_ε<σ_1} ( ∫_0^{U_ε} e^{−ρt} C(Π_t) dt + e^{−ρU_ε} V(s − U_ε, Π_{U_ε}) ) + 1_{U_ε≥σ_1} ( ∫_0^{U_ε∧σ_{m+1}} e^{−ρt} C(Π_t) dt + e^{−ρ(U_ε∧σ_{m+1})} V( s − U_ε∧σ_{m+1}, Π_{U_ε∧σ_{m+1}} ) ) ].

On the event {U_ε ≥ σ_1}, we have U_ε ∧ σ_{m+1} = σ_1 + (U_ε ∧ σ_m) ∘ θ_{σ_1}, where θ is the time-shift operator on Ω; i.e., X_t ∘ θ_s = X_{t+s}. Using the strong Markov property of Π, equation (4.9) becomes

(4.10)   E^π[ Z_{U_ε∧σ_{m+1}} ] = E^π [ 1_{U_ε<σ_1} ( ∫_0^{U_ε} e^{−ρt} C(Π_t) dt + e^{−ρU_ε} V(s − U_ε, Π_{U_ε}) ) + 1_{U_ε≥σ_1} ( ∫_0^{σ_1} e^{−ρt} C(Π_t) dt + e^{−ρσ_1} η(s − σ_1, Π_{σ_1}) ) ],

where

(4.11)   η(u, π) := E^π [ ∫_0^{U_ε∧σ_m} e^{−ρt} C(Π_t) dt + e^{−ρ(U_ε∧σ_m)} V( u − U_ε∧σ_m, Π_{U_ε∧σ_m} ) ] = V(u, π),

thanks to the induction hypothesis for m. Combining (4.10) and (4.11) and the definition of Z in (4.4), we get

E^π[ Z_{U_ε∧σ_{m+1}} ] = E^π[ 1_{U_ε<σ_1} Z_{U_ε} + 1_{U_ε≥σ_1} Z_{σ_1} ] = E^π[ Z_{U_ε∧σ_1} ] = Z_0,

where the last equality follows from our result for m = 1. This completes the induction step.

4.1. A nearly-optimal strategy. On a practical level, one cannot compute V directly, but instead computes the approximate value functions V_m defined in (3.2) and employs the corresponding nearly-optimal strategies (see (4.12)). It is therefore important to know the error associated with this approximation. For a given error level ε > 0, let us fix

m := inf{ k ∈ N : ( T‖C‖ + 2‖H‖ ) ( λ̄T/(k−1) )^{1/2} ( λ̄/(2ρ + λ̄) )^{k/2} ≤ ε/2 },

so that ‖V_m − V‖ ≤ ε/2 on [0, T] × D via (3.3). Next, let us define the stopping times

(4.12)   U^{(m)}_{ε/2}(s, π) := inf{ t ∈ [0, s] : V_m(s − t, Π_t) ≤ ε/2 + H(Π_t) }.

The regularity of the paths t ↦ Π_t implies that V( s − U^{(m)}_{ε/2}(s, π), Π_{U^{(m)}_{ε/2}(s,π)} ) − H( Π_{U^{(m)}_{ε/2}(s,π)} ) ≤ ε. Then the arguments in the proof of Proposition 4.1 (see (4.4), (4.5) and (4.6)) can easily be modified to show that

(4.13)   V(s, π) = E^π [ ∫_0^{U^{(m)}_{ε/2}(s,π)} e^{−ρt} C(Π_t) dt + e^{−ρ U^{(m)}_{ε/2}(s,π)} V( s − U^{(m)}_{ε/2}(s,π), Π_{U^{(m)}_{ε/2}(s,π)} ) ] ≤ E^π [ ∫_0^{U^{(m)}_{ε/2}(s,π)} e^{−ρt} C(Π_t) dt + e^{−ρ U^{(m)}_{ε/2}(s,π)} H( Π_{U^{(m)}_{ε/2}(s,π)} ) ] + ε.

Hence, if we apply the admissible strategy ( U^{(m)}_{ε/2}(T, π), d(U^{(m)}_{ε/2}(T, π)) ), which requires computing (3.2) only up to the index m defined above, the resulting error is no more than ε.
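The index m above is easy to compute mechanically. A small helper, using the bound (3.3) in the form reconstructed above (the bound is the paper's; the helper and the sample values are our illustration):

```python
import math

def min_iterations(T, C_norm, H_norm, lam_bar, rho, eps):
    """Smallest m with (T*||C|| + 2*||H||) * sqrt(lam_bar*T/(m-1))
    * (lam_bar/(2*rho + lam_bar))**(m/2) <= eps/2, per (3.3)."""
    m = 2
    while ((T * C_norm + 2 * H_norm)
           * math.sqrt(lam_bar * T / (m - 1))
           * (lam_bar / (2 * rho + lam_bar)) ** (m / 2)) > eps / 2:
        m += 1
    return m

# Illustrative inputs: with rho > 0 the second factor decays geometrically,
# and with rho = 0 the first factor still forces termination.
print(min_iterations(T=1.0, C_norm=1.0, H_norm=2.0, lam_bar=4.0, rho=0.1, eps=0.05))
```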

4.2. Stopping and continuation regions. Let

(4.14)   C_T := { (s, π) ∈ [0, T] × D : V(s, π) > H(π) },   Γ_T := { (s, π) ∈ [0, T] × D : V(s, π) = H(π) }

denote the continuation and stopping regions, respectively. The stopping region can further be decomposed as the union ∪_{k∈A} Γ_{T,k} of the regions

(4.15)   Γ_{T,k} := { (s, π) ∈ [0, T] × D : V(s, π) = H_k(π) },   k ∈ A,

where H_k is defined in (2.4). Corollary 4.1 states that in the optimal solution ( U_0(T, π), d(U_0(T, π)) ), one observes the process Π until U_0(T, π), when it enters the region Γ_T. At this time, if Π is in the set Γ_{T,k}, we take d(U_0(T, π)) = k.

Remark 4.1. The definition of the value function V in (2.3) implies that the mapping s ↦ V(s, π) is non-decreasing. Therefore, if (s, π) ∈ Γ_{T,k} for some (s, π) ∈ [0, T] × D, then we have (t, π) ∈ Γ_{T,k} for all t ≤ s.

Remark 4.2. For fixed s ≤ T, let (s, π_1) and (s, π_2) be two points in the region Γ_{T,k}, and let α ∈ (0, 1). As the upper envelope of the convex mappings π ↦ v_m(s, π) (see Corollary 3.1), the mapping π ↦ V(s, π) is convex for each s ∈ [0, T]. Using this property we obtain

H_k( απ_1 + (1−α)π_2 ) ≤ V( s, απ_1 + (1−α)π_2 ) ≤ α V(s, π_1) + (1−α) V(s, π_2) = α H_k(π_1) + (1−α) H_k(π_2) = H_k( απ_1 + (1−α)π_2 ),

which implies that ( s, απ_1 + (1−α)π_2 ) ∈ Γ_{T,k}, and the region Γ_{T,k} ∩ ({s} × D) is convex for each fixed s ≤ T and k ∈ A.

Remark 4.3. Note that Γ_T ⊇ { (0, π) : π ∈ D }. The region { (s, π) ∈ Γ_T : s > 0 } may, however, be empty: in an example where min_i c_i > 0 and the µ_{k,i}'s are all the same, it is never optimal to stop prior to the terminal time T. Moreover, the region { (s, π) ∈ Γ_T : s > 0 } may be non-empty but have an empty interior. For example, in the hypothesis-testing problem discussed in Section 5.2, all the states of the unobservable Markov process are absorbing, and each component Π_t^{(i)} is a martingale. Since the terminal cost function of the corresponding minimization problem (see (2.4)), H(·) = min_{k∈A} H_k(·), is concave, the process H(Π_t) is a super-martingale on [0, T]. If we select ρ = 0 and c_i = 0 for all i ∈ E, there is no penalty associated with a delay in the decision, and it is therefore never optimal to stop early in the interior of D: τ = T is optimal unless π is at a corner of the simplex D.

Lemma 4.1. For i ∈ E, let A(i) := { k ∈ A : µ_{k,i} = max_{j∈A} µ_{j,i} }. If the inequality

c_i − ρµ_{k,i} + ∑_{j≠i} ( µ_{k,j} − µ_{k,i} ) q_{i,j} > 0

holds for all k ∈ A(i), then there exists π_i^c < 1, independent of T, such that { (s, π) ∈ (0, T] × D : π_i ≥ π_i^c } ⊆ C_T.

If the hidden process M is known to be in state i ∈ E, then the expression ρµ_{k,i} is the instantaneous decay of the payoff from selecting action k ∈ A immediately, and c_i is the instantaneous cost (or revenue) of waiting. Moreover, under action k ∈ A, the term ∑_{j≠i} ( µ_{k,j} − µ_{k,i} ) q_{i,j} is the marginal rate of return from waiting for the hidden process M to jump to another state. Therefore, the sum in Lemma 4.1 is the instantaneous net return enjoyed by the DM under action k ∈ A. Lemma 4.1 indicates that if there is strong a posteriori evidence that M is in state i, and if the instantaneous net return is positive under all favorable actions around the i-th corner of D, the decision maker should not stop (unless T = 0).
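The condition of Lemma 4.1 is a finite set of linear inequalities in the model primitives, so it can be checked mechanically. A small sketch (the function itself is our illustration):

```python
import numpy as np

def lemma_4_1_holds(i, Q, c, mu, rho):
    """Check c_i - rho*mu_{k,i} + sum_{j != i} (mu_{k,j} - mu_{k,i}) q_{i,j} > 0
    for every k in A(i) = argmax_k mu_{k,i}; mu has shape (actions, states)."""
    n = Q.shape[0]
    for k in np.flatnonzero(np.isclose(mu[:, i], mu[:, i].max())):
        net = c[i] - rho * mu[k, i] + sum(
            (mu[k, j] - mu[k, i]) * Q[i, j] for j in range(n) if j != i)
        if net <= 0:
            return False
    return True
```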

4.3. Stopping regions for reward maximization with running cost. Here, we consider the problem in (2.3) under the assumptions c_i ≤ 0 (running costs) for i ∈ E, and µ := max_{k,i} µ_{k,i} > 0 (terminal rewards). The second condition is not restrictive if ρ = 0, since we can always add (and subtract) the same constant to (and from) the terminal reward function. Let us define

(4.16)   I := { i ∈ E : max_{k∈A} µ_{k,i} = µ },

which is the set of states of M at which the DM can get the highest terminal reward. Since c_i ≤ 0 for all i ∈ E, we obviously have ∪_{i∈I} { (s, π) : s ∈ [0, T], π_i = 1 } ⊆ Γ_T. In general, if there is a penalty associated with waiting, we expect that it is optimal to stop at the points (s, π) for which the best component π_i, i ∈ I, is sufficiently high, for any s > 0. Lemma 4.2 provides a sufficient condition for this to be true.

Lemma 4.2. Let i ∈ I. If ρ > 0 or c_i < 0, then there exists a constant π_i^s < 1, independent of T, such that Γ_T ⊇ { (s, π) ∈ [0, T] × D : π_i ≥ π_i^s }.

Remark 4.4. If H(·) ≥ 0, the statement of the stopping problem in (2.3) implies that the value function V is non-increasing as a function of the discount factor ρ. If we denote the dependence of the stopping region on ρ by Γ_T(ρ), then we have Γ_T(ρ_1) ⊇ Γ_T(ρ_2) whenever ρ_1 ≥ ρ_2. Moreover, the dynamics of the process Π are independent of ρ, and U_0(s, π) is the hitting time of Π to Γ_T. Therefore, the time that the DM can afford for observing the process X in the presence of a lower discount factor is no less than that spent under heavier discounting. A similar claim also holds for the dependence of U_0(s, π) and Γ_T on the running costs c_i: namely, an observer with lower (in absolute value) running costs stops no sooner than one with heavier running costs.

5. Examples

Below, we revisit the well-known Bayesian regime detection problem and the machine replacement problem of Jensen and Hsu 1993 in our finite horizon setting. For both problems, we also provide numerical solutions, which are obtained by discretizing the domain [0, T] × D of V(·, ·) and solving the fixed point equation V(·, ·) = 𝒥V(·, ·) recursively. We set the number of iterations m ∈ N such that the error ‖V_m(·) − V(·)‖ is negligible (see (3.3)). Our model is applicable in many other settings that have been considered elsewhere, including launch of insurance products (Schöttl 1998), technology adoption (Ulu and Smith 2007) and various disorder detection problems; see Ludkovski and Sezer 2007 for further details and examples.

5.1. Optimal replacement of a system. Here, we consider the reliability problem in Jensen and Hsu 1993, where the aim is to find the best time to replace a machine in order to maximize its lifetime net earnings. The objective is to compute

(5.1)   sup_{τ ≤ T} E^π [ ∫_0^τ ∑_{i∈E} c_i 1_{M_t = i} dt + ∑_{i∈E} µ_i 1_{M_τ = i} ].

In this setting, the observations come from a simple Poisson process representing the number of defective items produced by the machine, and the process M represents the current productivity level. The n-th state (the defective state) is absorbing, while all others are transient. Related models have appeared in Makis and Jiang 2003 and Stadje 1994, and go all the way back to the classical POMDP work of Smallwood and Sondik 1973.

Figure 2. Value function V(T, π) of the reliability example of Section 5.1. The shaded regions represent the stopping regions { π ∈ D : V(T, π) = H(π) }. Left and right panels are for the values T = 1.5 and T = 0.2, respectively. The shaded regions are the same in both panels; note however the different z-scales. The panels also show the line 3.5π_1 + 1.5π_2 − π_3 = 0, which is the stopping boundary of the ILA rule.

Assumption 1. In Jensen and Hsu 1993, it is assumed that (i) q_i > 0 for i = 1, ..., n−1, with q_n = 0; (ii) r_1 ≥ r_2 ≥ ... ≥ r_n = c_n, with c_n < 0; (iii) 0 < λ_1 ≤ ... ≤ λ_n; (iv) q_{i,n} > λ_n − λ_i for i = 1, ..., n−1.

These assumptions ensure that the infinitesimal look-ahead (ILA) rule τ_ILA := inf{ t ≥ 0 : ∑_i r_i Π_t^{(i)} < 0 } is optimal, where r_i := c_i + ∑_{j≠i} ( µ_j − µ_i ) q_{i,j} (cf. Lemma 4.1). It follows as a corollary to Jensen 1989, Theorem 3.1 that τ_ILA ∧ T is an optimal stopping rule for the finite horizon problem, and the region { π ∈ D : V(T, π) = H(π) } does not depend on T. This occurs because the instantaneous revenue rates r_i completely summarize the relative worth of different machine states, and the sum ∑_i r_i Π_t^{(i)} is monotonically non-increasing over time, P^π-a.s. for all π ∈ D (see Jensen and Hsu 1993, Theorem 2). Thus, T only plays a role insofar as allowing the DM to collect profits before the machine deteriorates. We illustrate this degeneracy in Figure 2.

In this example, we select the parameters to fit the framework of Jensen and Hsu 1993. We have a machine that moves through three regimes E = {1, 2, 3} with a given transition rate matrix Q. At the different states, the running profit from operating the machine is c = (1, 0, −1), and shutting down the machine involves a cost of µ = (−1, −1, 0). In each state, the breakdowns occur according to independent Poisson processes with intensities λ = (2, 3, 4). In this setting, we have r = (3.5, 1.5, −1), so that τ_ILA = inf{ t : 3.5Π_t^{(1)} + 1.5Π_t^{(2)} − Π_t^{(3)} < 0 }.

The left and right panels of Figure 2 show the functions V(T, π) and the regions { π ∈ D : (T, π) ∈ Γ_T } for T = 1.5 and T = 0.2, respectively. We see that V(0.2, π) < V(1.5, π), but the regions { π ∈ D : V(T, π) = H(π) } for T = 0.2 and T = 1.5 coincide with the region { π ∈ D : 3.5π_1 + 1.5π_2 − π_3 ≤ 0 }, at least modulo the D-discretization necessary for numerical implementation. This degenerate structure would disappear if one removes some of the assumptions in Jensen and Hsu 1993. Nevertheless, the sequential construction of Section 3 can still safely be employed with the optimal stopping rule given in Corollary 4.1.
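The quantities driving this example are easy to reproduce. The sketch below computes r_i = c_i + ∑_{j≠i}(µ_j − µ_i) q_{i,j} and evaluates the ILA boundary; the generator used is hypothetical, chosen so that the resulting r matches the text's r = (3.5, 1.5, −1), since the paper's actual matrix Q is not reproduced here.

```python
import numpy as np

# Hypothetical generator chosen so that r comes out to (3.5, 1.5, -1);
# the paper's actual Q is not reproduced here.
Q = np.array([[-3.0, 0.5, 2.5],
              [ 0.0, -1.5, 1.5],
              [ 0.0,  0.0, 0.0]])         # state 3 (defective) is absorbing
c = np.array([1.0, 0.0, -1.0])            # running profits c_i
mu = np.array([-1.0, -1.0, 0.0])          # terminal rewards mu_i

# r_i = c_i + sum_{j != i} (mu_j - mu_i) q_{i,j}  (cf. Lemma 4.1)
r = np.array([c[i] + sum((mu[j] - mu[i]) * Q[i, j] for j in range(3) if j != i)
              for i in range(3)])
print(r)                                  # [ 3.5  1.5 -1. ]

def ila_stop(pi):
    """tau_ILA fires as soon as sum_i r_i pi_i drops below zero."""
    return float(r @ pi) < 0.0
```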

Figure 3. The second example for the reliability problem of Section 5.1 with the new parameters in (5.2). In the left panel T = 2, in the middle T = 0.5, and in the right panel T = 0.1. In each picture, the function V(T, π) is plotted on D. Shaded regions are the sets { π ∈ D : V(T, π) = H(π) }.

We give an example in Figure 3 where

(5.2)   q_{1,3} = q_{2,3} = 0.5 (state 3 remains absorbing)   and   λ = (λ_1, λ_2, λ_3) = (1, 4, 7).

We keep the other parameters the same as in the previous example. Now, the instantaneous net gain ∑_i r_i Π_t^{(i)} = 1.5Π_t^{(1)} + 0.5Π_t^{(2)} − Π_t^{(3)} is no longer monotonically non-increasing P^π-a.s. for all π ∈ D. Figure 3 shows that the stopping region is now time-dependent and expands as time to maturity decreases. For this choice of c and µ, one can modify the proof of Lemma 4.2 to show that there always exists a stopping region around the absorbing state (for all T ≥ 0). Furthermore, Lemma 4.1 implies that it is never optimal to stop around the corners of the simplex D corresponding to non-absorbing states. Also note that the transition rates of M are now lower; therefore, the DM can obtain a positive net gain when M starts from state 1 and there is enough time to operate the system. Indeed, the first panel in Figure 3 shows that for T = 2 the value function is positive around the corner {1}.

5.2. Sequential hypothesis-testing. In this problem, a compound Poisson process X = {X_t}_{t≥0} is observed starting from t = 0. The arrival rate λ and mark distribution ν of X are not known precisely. Rather, they depend on the static regime of a Markov process M with n absorbing states (i.e., M_t = M_0 for all t ≥ 0). Each state corresponds to the realization of one of the n simple hypotheses

(5.3)   A_1 : (λ, ν) = (λ_1, ν_1),   ...,   A_n : (λ, ν) = (λ_n, ν_n),

with given prior likelihoods π_i, for i = 1, ..., n. The objective of the DM is to identify the current regime as quickly as possible, with minimal probability of wrong decision. In earlier work on this problem, the trade-off between observing and stopping is generally modeled via the Bayes risk

(5.4)   E^π [ τ + ∑_{k,i=1}^n µ_{k,i} 1_{d = k, M_0 = i} ],

where τ is the decision time, d ∈ {1, ..., n} represents the hypothesis selected, and µ_{k,i} is the cost of selecting the wrong hypothesis A_k when the correct one is A_i. The DM then needs to minimize (5.4) and find a pair (τ, d), if one exists, that attains this infimum. The infinite horizon version of (5.4) was solved for the first time by Peskir and Shiryaev 2000 for a simple Poisson process with n = 2. Later, Gapeev 2002 provided the solution (again with n = 2) where the jump size is exponentially distributed under each hypothesis, and the mean of the exponential distribution is the same as the proposed arrival rate. The solution for any jump distribution and for n ∈ N was recently provided by Dayanik et al. 2008a. Below we treat the finite horizon version of that problem, where a decision must be made before the horizon T < ∞.

Remark 5.1. Let V(∞, π) denote the value function of this minimization problem on infinite horizon, and for 1 ≤ k ≤ n let Γ_{∞,k} := { π ∈ D : V(∞, π) = H_k(π) } in terms of the functions H_k(π) := ∑_i µ_{k,i} π_i. Dayanik et al. 2008a showed that each region Γ_{∞,k} is closed and convex, with a non-empty interior around the k-th corner of the simplex D. This structure also extends to the finite-horizon problem. Since V(∞, π) ≤ V(T, π), we have Γ_{∞,k} ⊆ Γ_{T,k}, for k ∈ E and T < ∞. Then, Remarks 4.1 and 4.2 and Corollary 4.1 imply that there are time-dependent closed convex sets (with non-empty interiors) around the corners of D such that it is optimal to stop the first time the process Π enters one of these sets. At this time, if the conditional likelihood process Π is around the k-th corner, we select hypothesis A_k.

In Figure 4, we illustrate the time-dependence of the solution structure using a simple example with two hypotheses, A_1 : Λ = λ_1 and A_2 : Λ = λ_2, on the arrival rate only. This problem was solved in Peskir and Shiryaev 2000 on infinite horizon, and the authors show that immediate stopping is optimal if and only if µ_{2,1} µ_{1,2} (λ_2 − λ_1) ≤ µ_{2,1} + µ_{1,2}. Hence, the inequality µ_{2,1} µ_{1,2} (λ_2 − λ_1) > µ_{2,1} + µ_{1,2} has to be satisfied in any finite-horizon problem with a non-trivial solution.

In Figure 4, the arrival rates are λ_1 = 1 and λ_2 = 5. For the Bayes risk given in (5.4), we select µ_{1,2} = µ_{2,1} = 2 for the penalty costs. This numerical example matches Peskir and Shiryaev 2000, Figures 2-3. The left panel of Figure 4 shows the value functions V(T, ·) with horizons T = 0.1, T = 0.2, T = 0.4 and T = 2, respectively, together with the terminal cost H(π) = min{ µ_{1,2} π_2 ; µ_{2,1} (1 − π_2) }, as functions of π_2 ∈ [0, 1]. We see that as T increases, the value function decreases, as expected. The right panel of Figure 4 shows that the continuation region widens as time to maturity increases. We also observe that the boundary curves approach the solution structure of the problem with infinite horizon: Peskir and Shiryaev 2000 obtained a continuation region of (0.22, 0.7), very close to ours of (0.23, 0.75) for T > 1.

Let us define the lower boundary curve T ↦ b_1(T) := sup{ π_2 ∈ (0, 1) : V(T, π) = 2π_2 }. Clearly b_1(0) = 0.5. In the right panel, we remarkably observe that the lower boundary curve b_1(·) has a discontinuity at T = 0 and then remains constant until about T = 0.2. Note that the point π = (π_1, π_2) = (0.5, 0.5) is the global maximum of the terminal cost function H(π). Starting at the point (0.5 + ε, 0.5 − ε), for ε ≥ 0 small, as long as there is no jump, the conditional likelihood process Π drifts (quickly) toward π = (π_1, π_2) = (1, 0) and away from this maximum.
Intuitively speaking, for very small values of T, the probability of observing a jump is low, and thus it is optimal to continue; therefore, the lower curve in Figure 4 is discontinuous around T = 0. The drift of the process Π toward (1, 0) decreases as π_2 decreases and the process approaches (1, 0) (see (2.14)). As a result, at points π where π_2 is small, the effect of the waiting cost becomes dominant, and it is optimal to stop even if T is small.
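This behavior is easy to visualize by simulating the posterior Π^{(2)} directly. In the two-hypothesis setting Q = 0, so the flow and the jump operator specialize to the closed forms used below. The rates λ = (1, 5) are the example's; the simulation itself (time step, seed, true arrival rate) is our illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([1.0, 5.0])    # lambda_1, lambda_2 from the example

def drift(dt, p2):
    """Flow of pi_2 between arrivals (Q = 0): reweight by e^{-lam_i dt}."""
    w = np.array([1.0 - p2, p2]) * np.exp(-lam * dt)
    return w[1] / w.sum()

def jump(p2):
    """Jump operator R for arrival-rate-only hypotheses: pi_2 moves up."""
    w = lam * np.array([1.0 - p2, p2])
    return w[1] / w.sum()

def posterior_path(p2, T, dt=1e-3, true_rate=1.0):
    """Simulate Pi^{(2)} when arrivals actually come at rate true_rate."""
    path = [p2]
    for _ in range(int(T / dt)):
        p2 = drift(dt, p2)                  # drifts toward pi_2 = 0
        if rng.random() < true_rate * dt:   # each arrival pushes pi_2 up
            p2 = jump(p2)
        path.append(p2)
    return np.array(path)

print(posterior_path(0.5, T=2.0)[::500].round(3))
```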

Figure 4. Bayesian regime detection example of Section 5.2. The left panel shows the value functions V(T, π) for various time horizons T. The right panel shows the stopping regions Γ_{T,k} (namely Γ_{T,1} below the lower curve and Γ_{T,2} above the upper curve) for T ∈ [0, 2].

The following proposition summarizes our discussion of this example and states that this behavior of the lower boundary curve around T = 0 holds for any set of parameters λ_2 > λ_1, µ_{1,2}, µ_{2,1}.

Proposition 5.1. Consider the hypothesis-testing problem in (5.4) with two simple hypotheses on the arrival rate, A_1 : Λ = λ_1 and A_2 : Λ = λ_2 (with λ_2 > λ_1). The continuation region C_T is non-empty (for T > 0) if and only if µ_{2,1} µ_{1,2} (λ_2 − λ_1) > µ_{2,1} + µ_{1,2}. The boundary curve T ↦ b_1(T) := sup{ π_2 ∈ (0, 1) : V(T, π) = µ_{1,2} π_2 } is discontinuous at T = 0, and there is an interval around T = 0 on which b_1(·) is constant.

Remark 5.2. As a final note, we would like to add that our analysis in Sections 3 and 4 can also be applied easily to solve the finite horizon change-detection problem. In this problem, the local parameters (λ, ν) of an observed compound Poisson process change at some unobservable time θ when the process M hits one of its absorbing states, and the objective is to find the best time τ that minimizes E[(τ − θ)^+] + c P(τ < θ), which is another special case of (1.1). In the infinite horizon setting, Dayanik and Sezer 2005 and Bayraktar and Sezer 2009 show that the stopping region consists of closed convex regions (again with non-empty interiors) around the corners of D corresponding to absorbing states. In the finite-horizon formulation, Remarks 4.1 and 4.2 and Lemma 4.2 imply that there are time-dependent closed convex sets around these corners, and the hitting time of Π to those regions is an optimal alarm time (thanks to Corollary 4.1). Moreover, Lemma 4.1 implies that it is never optimal to stop around the remaining corners of D corresponding to non-absorbing states.

Acknowledgments. The authors would like to thank the editors and anonymous referees for many helpful comments and remarks that improved the presentation of the paper.

Appendices

Appendix A. Discrete information costs

The objective function in (1.1) is applicable to a variety of economic settings, and this has allowed us to provide a unified treatment of many disparate models. Returning to the economic interpretation of the running costs appearing in the first term in (1.1), in a typical setting they represent information acquisition expenses or opportunity costs. Alternatively, observation costs may be discrete and be incurred only when new information arrives. This, for example, happens if new information corresponds to opportunities lost (e.g. deals signed by competitors), leading to a cost structure of the form ∑_{j=1}^{N_τ} e^{−ρσ_j} K(Y_j). Here, N_τ is the number of arrivals by time τ, (σ_j, Y_j) are the arrival times and marks respectively, and K(Y_j) is the cost incurred upon an arrival of size Y_j, with K : R^d → R satisfying ν_i K^+ := ∫_{R^d} K^+(y) ν_i(dy) < ∞, i ∈ E. In this case, one deals with the objective function

(A.1)   Û(T, π) := sup_{τ ≤ T, d ∈ F_τ^X} E^π [ ∑_{j=1}^{N_τ} e^{−ρσ_j} K(Y_j) + e^{−ρτ} ∑_{k=1}^a 1_{d = k} ∑_{i∈E} µ_{k,i} 1_{M_τ = i} ],

by solving the equivalent stopping problem V̂(T, π) := sup_{τ ≤ T} E^π [ ∑_{j=1}^{N_τ} e^{−ρσ_j} K(Y_j) + e^{−ρτ} H(Π_τ) ], as in (2.3). In this case, one can verify that the sequential approximation method of Section 3 holds for the value function V̂. Namely, if we define the sequence of functions {V̂_m(·, ·)}_{m≥0}, where

V̂_m(s, π) := sup_{τ ≤ s} E^π [ ∑_{j=1}^{N_τ ∧ m} e^{−ρσ_j} K(Y_j) + e^{−ρ(τ∧σ_m)} H(Π_{τ∧σ_m}) ],

it can be shown (see Proposition 3.2) that we have V̂_{m+1}(s, π) = Ĵ V̂_m(s, π), where the operator Ĵ is defined as

Ĵw(s, π) := sup_{t∈[0,s]} { E^π[ e^{−I(t)} ] e^{−ρt} H( x(t, π) ) + ∫_0^t e^{−ρu} ∑_{i∈E} m_i(u, π) λ_i ( ∫_{R^d} K(y) ν_i(dy) + S_i w( s − u, x(u, π) ) ) du },

for a bounded function w : [0, T] × D → R.

Clearly {V̂_m}_{m≥0} is an increasing sequence. Using the inequality E[ ∑_{j=1}^{N_T} K^+(Y_j) ] ≤ (max_i λ_i) T (max_i ν_i K^+) and the truncation arguments in the proof of Proposition 3.1, one can show that the sequence converges to V̂ uniformly with the error bound

‖V̂ − V̂_m‖ ≤ ( (max_i λ_i) T (max_i ν_i K^+) + 2‖H‖ ) ( λ̄T/(m−1) )^{1/2} ( λ̄/(2ρ + λ̄) )^{m/2}.

The arguments in Sections 3 and 4 can then be replicated to conclude that

E^π [ ∑_{j=1}^{N_{Û_ε(s,π)}} e^{−ρσ_j} K(Y_j) + e^{−ρ Û_ε(s,π)} H( Π_{Û_ε(s,π)} ) ] ≥ V̂(s, π) − ε,

for the stopping time Û_ε(s, π) := inf{ t ∈ [0, s] : V̂(s − t, Π_t) ≤ ε + H(Π_t) }. Hence, the admissible strategy ( Û_ε(s, π), d(Û_ε(s, π)) ) is an ε-optimal strategy for the problem in (A.1), as expected. Furthermore, the other results of Section 4 can be adjusted for this new objective function. Below, we summarize these results in a remark, and we conclude our discussion here.

Remark A.1. Let ν_j K := ∫_{R^d} K(y) ν_j(dy), for j ∈ E.

(i) For a given index i ∈ E, define A(i) := { k ∈ A : µ_{k,i} = max_{j∈A} µ_{j,i} } as in Lemma 4.1. If −ρµ_{k,i} + λ_i ν_i K + ∑_{j≠i} ( µ_{k,j} − µ_{k,i} ) q_{i,j} > 0 holds for all k ∈ A(i), then there exists some π̂_i^c < 1 (for all T > 0) such that it is optimal to continue on the region { (s, π) ∈ (0, T] × D : π_i ≥ π̂_i^c }.

(ii) Assume ν_j K ≤ 0 for all j ∈ E and µ := max_{k,i} µ_{k,i} > 0, and let I be as in (4.16). For i ∈ I, if ν_i K < 0 or ρ > 0, there exists a number π̂_i^s < 1 (free of T) such that it is optimal to stop at the points π for which π_i ≥ π̂_i^s. That is, Γ_{T,i} ⊇ { (s, π) ∈ [0, T] × D : π_i ≥ π̂_i^s } for all T ≥ 0.

(iii) In the case where ν_j K ≤ 0 for all j ∈ E and H(·) ≥ 0, the stopping region is monotone in ρ and in ν_j K, j ∈ E. Namely, if we increase one of these factors in absolute terms (keeping everything else fixed), the stopping region expands, and the DM is forced to make a decision sooner.

(iv) For a given ε > 0, let m ∈ N be such that ‖V̂(T, ·) − V̂_m(T, ·)‖ ≤ ε/2. Then the stopping time Û^{(m)}_{ε/2}(s, π) := inf{ t ∈ [0, s] : V̂_m(s − t, Π_t) ≤ ε/2 + H(Π_t) } gives an ε-optimal strategy.

(v) If ρ > 0, or K(·) ≤ 0 with max_i ν_i K < 0, then V̂(T, ·) → V̂(∞, ·) uniformly as in (B.2) if we redefine

Err(T) := e^{−ρT} ( (max_i λ_i)(max_i |ν_i K|)/ρ + 2‖H‖ ), if ρ > 0, and
Err(T) := 2‖H‖ ( max_{k,i} µ_{k,i} − min_{k,i} µ_{k,i} ) / ( T (min_i λ_i) |max_i ν_i K| ), if ρ = 0, K(·) ≤ 0 and max_i ν_i K < 0.

Appendix B. Remarks on the infinite horizon problem

In general, if there is a strict penalty for waiting, it is likely that the DM will make a decision prior to the final time T for moderate or large values of T. In this case, the constraint τ ≤ T in (2.3) is of less importance, and one essentially faces an infinite horizon stopping problem. Solving the infinite horizon problem can be computationally more appealing, since we eliminate the time-dimension of the state space [0, T] × D. Below, we show that the value function of the finite-horizon problem converges uniformly to that of the infinite horizon under the assumption

(B.1)   either ρ > 0 or max_{i∈E} c_i < 0.

The infinite horizon problem is defined (as in (2.3) and (1.1)) by removing the constraint τ ≤ T. With the notation in (2.3), let V(∞, π) be the value function of this stopping problem.

Lemma B.1. As T → ∞, the function V(T, π) converges to V(∞, π) uniformly on D, and we have

(B.2)   V(T, π) ≤ V(∞, π) ≤ V(T, π) + Err(T), for all π ∈ D and T ≥ 0,

where

Err(T) := e^{−ρT} ( ‖C‖/ρ + 2‖H‖ ), if ρ > 0, and
Err(T) := 2‖H‖ ( max_{k,i} µ_{k,i} − min_{k,i} µ_{k,i} ) / ( T |max_{i∈E} c_i| ), if ρ = 0 and max_{i∈E} c_i < 0.


Filtrations, Markov Processes and Martingales. Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition Filtrations, Markov Processes and Martingales Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition David pplebaum Probability and Statistics Department,

More information

Multi-dimensional Stochastic Singular Control Via Dynkin Game and Dirichlet Form

Multi-dimensional Stochastic Singular Control Via Dynkin Game and Dirichlet Form Multi-dimensional Stochastic Singular Control Via Dynkin Game and Dirichlet Form Yipeng Yang * Under the supervision of Dr. Michael Taksar Department of Mathematics University of Missouri-Columbia Oct

More information

Finding the Value of Information About a State Variable in a Markov Decision Process 1

Finding the Value of Information About a State Variable in a Markov Decision Process 1 05/25/04 1 Finding the Value of Information About a State Variable in a Markov Decision Process 1 Gilvan C. Souza The Robert H. Smith School of usiness, The University of Maryland, College Park, MD, 20742

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

A COLLOCATION METHOD FOR THE SEQUENTIAL TESTING OF A GAMMA PROCESS

A COLLOCATION METHOD FOR THE SEQUENTIAL TESTING OF A GAMMA PROCESS Statistica Sinica 25 2015), 1527-1546 doi:http://d.doi.org/10.5705/ss.2013.155 A COLLOCATION METHOD FOR THE SEQUENTIAL TESTING OF A GAMMA PROCESS B. Buonaguidi and P. Muliere Bocconi University Abstract:

More information

Sequential Decision Problems

Sequential Decision Problems Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted

More information

OPTIMAL STOPPING OF A BROWNIAN BRIDGE

OPTIMAL STOPPING OF A BROWNIAN BRIDGE OPTIMAL STOPPING OF A BROWNIAN BRIDGE ERIK EKSTRÖM AND HENRIK WANNTORP Abstract. We study several optimal stopping problems in which the gains process is a Brownian bridge or a functional of a Brownian

More information

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT A MODEL FOR HE LONG-ERM OPIMAL CAPACIY LEVEL OF AN INVESMEN PROJEC ARNE LØKKA AND MIHAIL ZERVOS Abstract. We consider an investment project that produces a single commodity. he project s operation yields

More information

Applications of Optimal Stopping and Stochastic Control

Applications of Optimal Stopping and Stochastic Control Applications of and Stochastic Control YRM Warwick 15 April, 2011 Applications of and Some problems Some technology Some problems The secretary problem Bayesian sequential hypothesis testing the multi-armed

More information

CROSS-VALIDATION OF CONTROLLED DYNAMIC MODELS: BAYESIAN APPROACH

CROSS-VALIDATION OF CONTROLLED DYNAMIC MODELS: BAYESIAN APPROACH CROSS-VALIDATION OF CONTROLLED DYNAMIC MODELS: BAYESIAN APPROACH Miroslav Kárný, Petr Nedoma,Václav Šmídl Institute of Information Theory and Automation AV ČR, P.O.Box 18, 180 00 Praha 8, Czech Republic

More information

Notes from Week 9: Multi-Armed Bandit Problems II. 1 Information-theoretic lower bounds for multiarmed

Notes from Week 9: Multi-Armed Bandit Problems II. 1 Information-theoretic lower bounds for multiarmed CS 683 Learning, Games, and Electronic Markets Spring 007 Notes from Week 9: Multi-Armed Bandit Problems II Instructor: Robert Kleinberg 6-30 Mar 007 1 Information-theoretic lower bounds for multiarmed

More information

Economics 2010c: Lectures 9-10 Bellman Equation in Continuous Time

Economics 2010c: Lectures 9-10 Bellman Equation in Continuous Time Economics 2010c: Lectures 9-10 Bellman Equation in Continuous Time David Laibson 9/30/2014 Outline Lectures 9-10: 9.1 Continuous-time Bellman Equation 9.2 Application: Merton s Problem 9.3 Application:

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

Lecture 22 Girsanov s Theorem

Lecture 22 Girsanov s Theorem Lecture 22: Girsanov s Theorem of 8 Course: Theory of Probability II Term: Spring 25 Instructor: Gordan Zitkovic Lecture 22 Girsanov s Theorem An example Consider a finite Gaussian random walk X n = n

More information

Chapter 2 Event-Triggered Sampling

Chapter 2 Event-Triggered Sampling Chapter Event-Triggered Sampling In this chapter, some general ideas and basic results on event-triggered sampling are introduced. The process considered is described by a first-order stochastic differential

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

University of Warwick, EC9A0 Maths for Economists Lecture Notes 10: Dynamic Programming

University of Warwick, EC9A0 Maths for Economists Lecture Notes 10: Dynamic Programming University of Warwick, EC9A0 Maths for Economists 1 of 63 University of Warwick, EC9A0 Maths for Economists Lecture Notes 10: Dynamic Programming Peter J. Hammond Autumn 2013, revised 2014 University of

More information

Lecture 17 Brownian motion as a Markov process

Lecture 17 Brownian motion as a Markov process Lecture 17: Brownian motion as a Markov process 1 of 14 Course: Theory of Probability II Term: Spring 2015 Instructor: Gordan Zitkovic Lecture 17 Brownian motion as a Markov process Brownian motion is

More information

Monitoring actuarial assumptions in life insurance

Monitoring actuarial assumptions in life insurance Monitoring actuarial assumptions in life insurance Stéphane Loisel ISFA, Univ. Lyon 1 Joint work with N. El Karoui & Y. Salhi IAALS Colloquium, Barcelona, 17 LoLitA Typical paths with change of regime

More information

MAXIMAL COUPLING OF EUCLIDEAN BROWNIAN MOTIONS

MAXIMAL COUPLING OF EUCLIDEAN BROWNIAN MOTIONS MAXIMAL COUPLING OF EUCLIDEAN BOWNIAN MOTIONS ELTON P. HSU AND KAL-THEODO STUM ABSTACT. We prove that the mirror coupling is the unique maximal Markovian coupling of two Euclidean Brownian motions starting

More information

Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control

Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control Proving the Regularity of the Minimal Probability of Ruin via a Game of Stopping and Control Erhan Bayraktar University of Michigan joint work with Virginia R. Young, University of Michigan K αρλoβασi,

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago December 2015 Abstract Rothschild and Stiglitz (1970) represent random

More information

STOCHASTIC PERRON S METHOD AND VERIFICATION WITHOUT SMOOTHNESS USING VISCOSITY COMPARISON: OBSTACLE PROBLEMS AND DYNKIN GAMES

STOCHASTIC PERRON S METHOD AND VERIFICATION WITHOUT SMOOTHNESS USING VISCOSITY COMPARISON: OBSTACLE PROBLEMS AND DYNKIN GAMES STOCHASTIC PERRON S METHOD AND VERIFICATION WITHOUT SMOOTHNESS USING VISCOSITY COMPARISON: OBSTACLE PROBLEMS AND DYNKIN GAMES ERHAN BAYRAKTAR AND MIHAI SÎRBU Abstract. We adapt the Stochastic Perron s

More information

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach 1 Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis Abstract A general model of decentralized

More information

Online Appendix for. Breakthroughs, Deadlines, and Self-Reported Progress: Contracting for Multistage Projects. American Economic Review, forthcoming

Online Appendix for. Breakthroughs, Deadlines, and Self-Reported Progress: Contracting for Multistage Projects. American Economic Review, forthcoming Online Appendix for Breakthroughs, Deadlines, and Self-Reported Progress: Contracting for Multistage Projects American Economic Review, forthcoming by Brett Green and Curtis R. Taylor Overview This supplemental

More information

Optimal stopping for non-linear expectations Part I

Optimal stopping for non-linear expectations Part I Stochastic Processes and their Applications 121 (2011) 185 211 www.elsevier.com/locate/spa Optimal stopping for non-linear expectations Part I Erhan Bayraktar, Song Yao Department of Mathematics, University

More information

Worst Case Portfolio Optimization and HJB-Systems

Worst Case Portfolio Optimization and HJB-Systems Worst Case Portfolio Optimization and HJB-Systems Ralf Korn and Mogens Steffensen Abstract We formulate a portfolio optimization problem as a game where the investor chooses a portfolio and his opponent,

More information

A Barrier Version of the Russian Option

A Barrier Version of the Russian Option A Barrier Version of the Russian Option L. A. Shepp, A. N. Shiryaev, A. Sulem Rutgers University; shepp@stat.rutgers.edu Steklov Mathematical Institute; shiryaev@mi.ras.ru INRIA- Rocquencourt; agnes.sulem@inria.fr

More information

Uniformly Uniformly-ergodic Markov chains and BSDEs

Uniformly Uniformly-ergodic Markov chains and BSDEs Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,

More information

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process

More information

Wars of Attrition with Budget Constraints

Wars of Attrition with Budget Constraints Wars of Attrition with Budget Constraints Gagan Ghosh Bingchao Huangfu Heng Liu October 19, 2017 (PRELIMINARY AND INCOMPLETE: COMMENTS WELCOME) Abstract We study wars of attrition between two bidders who

More information

Maximum Process Problems in Optimal Control Theory

Maximum Process Problems in Optimal Control Theory J. Appl. Math. Stochastic Anal. Vol. 25, No., 25, (77-88) Research Report No. 423, 2, Dept. Theoret. Statist. Aarhus (2 pp) Maximum Process Problems in Optimal Control Theory GORAN PESKIR 3 Given a standard

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha

More information

ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME

ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME SAUL D. JACKA AND ALEKSANDAR MIJATOVIĆ Abstract. We develop a general approach to the Policy Improvement Algorithm (PIA) for stochastic control problems

More information

Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning

Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands

More information

Change-point models and performance measures for sequential change detection

Change-point models and performance measures for sequential change detection Change-point models and performance measures for sequential change detection Department of Electrical and Computer Engineering, University of Patras, 26500 Rion, Greece moustaki@upatras.gr George V. Moustakides

More information

Scheduling Markovian PERT networks to maximize the net present value: new results

Scheduling Markovian PERT networks to maximize the net present value: new results Scheduling Markovian PERT networks to maximize the net present value: new results Hermans B, Leus R. KBI_1709 Scheduling Markovian PERT networks to maximize the net present value: New results Ben Hermans,a

More information

Quickest Detection With Post-Change Distribution Uncertainty

Quickest Detection With Post-Change Distribution Uncertainty Quickest Detection With Post-Change Distribution Uncertainty Heng Yang City University of New York, Graduate Center Olympia Hadjiliadis City University of New York, Brooklyn College and Graduate Center

More information

A Rothschild-Stiglitz approach to Bayesian persuasion

A Rothschild-Stiglitz approach to Bayesian persuasion A Rothschild-Stiglitz approach to Bayesian persuasion Matthew Gentzkow and Emir Kamenica Stanford University and University of Chicago September 2015 Abstract Rothschild and Stiglitz (1970) introduce a

More information

Optimal exit strategies for investment projects. 7th AMaMeF and Swissquote Conference

Optimal exit strategies for investment projects. 7th AMaMeF and Swissquote Conference Optimal exit strategies for investment projects Simone Scotti Université Paris Diderot Laboratoire de Probabilité et Modèles Aléatories Joint work with : Etienne Chevalier, Université d Evry Vathana Ly

More information

Liquidity risk and optimal dividend/investment strategies

Liquidity risk and optimal dividend/investment strategies Liquidity risk and optimal dividend/investment strategies Vathana LY VATH Laboratoire de Mathématiques et Modélisation d Evry ENSIIE and Université d Evry Joint work with E. Chevalier and M. Gaigi ICASQF,

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

Optimal Stopping and Applications

Optimal Stopping and Applications Optimal Stopping and Applications Alex Cox March 16, 2009 Abstract These notes are intended to accompany a Graduate course on Optimal stopping, and in places are a bit brief. They follow the book Optimal

More information

Optimal Control of an Inventory System with Joint Production and Pricing Decisions

Optimal Control of an Inventory System with Joint Production and Pricing Decisions Optimal Control of an Inventory System with Joint Production and Pricing Decisions Ping Cao, Jingui Xie Abstract In this study, we consider a stochastic inventory system in which the objective of the manufacturer

More information

Technical Appendix to "Sequential Exporting"

Technical Appendix to Sequential Exporting Not for publication Technical ppendix to "Sequential Exporting" acundo lbornoz University of irmingham Héctor. Calvo Pardo University of Southampton Gregory Corcos NHH Emanuel Ornelas London School of

More information

Affine Processes. Econometric specifications. Eduardo Rossi. University of Pavia. March 17, 2009

Affine Processes. Econometric specifications. Eduardo Rossi. University of Pavia. March 17, 2009 Affine Processes Econometric specifications Eduardo Rossi University of Pavia March 17, 2009 Eduardo Rossi (University of Pavia) Affine Processes March 17, 2009 1 / 40 Outline 1 Affine Processes 2 Affine

More information

Reflected Brownian Motion

Reflected Brownian Motion Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide

More information

Online Appendix Durable Goods Monopoly with Stochastic Costs

Online Appendix Durable Goods Monopoly with Stochastic Costs Online Appendix Durable Goods Monopoly with Stochastic Costs Juan Ortner Boston University March 2, 2016 OA1 Online Appendix OA1.1 Proof of Theorem 2 The proof of Theorem 2 is organized as follows. First,

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information

Optimal Stopping Games for Markov Processes

Optimal Stopping Games for Markov Processes SIAM J. Control Optim. Vol. 47, No. 2, 2008, (684-702) Research Report No. 15, 2006, Probab. Statist. Group Manchester (21 pp) Optimal Stopping Games for Markov Processes E. Ekström & G. Peskir Let X =

More information

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes BMS Basic Course Stochastic Processes II/ Wahrscheinlichkeitstheorie III Michael Scheutzow Lecture Notes Technische Universität Berlin Sommersemester 218 preliminary version October 12th 218 Contents

More information

Some Fixed-Point Results for the Dynamic Assignment Problem

Some Fixed-Point Results for the Dynamic Assignment Problem Some Fixed-Point Results for the Dynamic Assignment Problem Michael Z. Spivey Department of Mathematics and Computer Science Samford University, Birmingham, AL 35229 Warren B. Powell Department of Operations

More information

Deterministic Dynamic Programming

Deterministic Dynamic Programming Deterministic Dynamic Programming 1 Value Function Consider the following optimal control problem in Mayer s form: V (t 0, x 0 ) = inf u U J(t 1, x(t 1 )) (1) subject to ẋ(t) = f(t, x(t), u(t)), x(t 0

More information

Sample of Ph.D. Advisory Exam For MathFinance

Sample of Ph.D. Advisory Exam For MathFinance Sample of Ph.D. Advisory Exam For MathFinance Students who wish to enter the Ph.D. program of Mathematics of Finance are required to take the advisory exam. This exam consists of three major parts. The

More information

INDEX POLICIES FOR DISCOUNTED BANDIT PROBLEMS WITH AVAILABILITY CONSTRAINTS

INDEX POLICIES FOR DISCOUNTED BANDIT PROBLEMS WITH AVAILABILITY CONSTRAINTS Applied Probability Trust (4 February 2008) INDEX POLICIES FOR DISCOUNTED BANDIT PROBLEMS WITH AVAILABILITY CONSTRAINTS SAVAS DAYANIK, Princeton University WARREN POWELL, Princeton University KAZUTOSHI

More information

U n iversity o f H ei delberg. Informativeness of Experiments for MEU A Recursive Definition

U n iversity o f H ei delberg. Informativeness of Experiments for MEU A Recursive Definition U n iversity o f H ei delberg Department of Economics Discussion Paper Series No. 572 482482 Informativeness of Experiments for MEU A Recursive Definition Daniel Heyen and Boris R. Wiesenfarth October

More information

Properties of an infinite dimensional EDS system : the Muller s ratchet

Properties of an infinite dimensional EDS system : the Muller s ratchet Properties of an infinite dimensional EDS system : the Muller s ratchet LATP June 5, 2011 A ratchet source : wikipedia Plan 1 Introduction : The model of Haigh 2 3 Hypothesis (Biological) : The population

More information

Value and Policy Iteration

Value and Policy Iteration Chapter 7 Value and Policy Iteration 1 For infinite horizon problems, we need to replace our basic computational tool, the DP algorithm, which we used to compute the optimal cost and policy for finite

More information

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011

HJB equations. Seminar in Stochastic Modelling in Economics and Finance January 10, 2011 Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague petrasek@karlin.mff.cuni.cz Seminar in Stochastic Modelling in Economics and Finance

More information

of space-time diffusions

of space-time diffusions Optimal investment for all time horizons and Martin boundary of space-time diffusions Sergey Nadtochiy and Michael Tehranchi October 5, 2012 Abstract This paper is concerned with the axiomatic foundation

More information

Preliminary Results on Social Learning with Partial Observations

Preliminary Results on Social Learning with Partial Observations Preliminary Results on Social Learning with Partial Observations Ilan Lobel, Daron Acemoglu, Munther Dahleh and Asuman Ozdaglar ABSTRACT We study a model of social learning with partial observations from

More information

Random Times and Their Properties

Random Times and Their Properties Chapter 6 Random Times and Their Properties Section 6.1 recalls the definition of a filtration (a growing collection of σ-fields) and of stopping times (basically, measurable random times). Section 6.2

More information

Information obfuscation in a game of strategic experimentation

Information obfuscation in a game of strategic experimentation MANAGEMENT SCIENCE Vol. 00, No. 0, Xxxxx 0000, pp. 000 000 issn 0025-1909 eissn 1526-5501 00 0000 0001 INFORMS doi 10.1287/xxxx.0000.0000 c 0000 INFORMS Authors are encouraged to submit new papers to INFORMS

More information

Poisson random measure: motivation

Poisson random measure: motivation : motivation The Lévy measure provides the expected number of jumps by time unit, i.e. in a time interval of the form: [t, t + 1], and of a certain size Example: ν([1, )) is the expected number of jumps

More information

An iterative procedure for constructing subsolutions of discrete-time optimal control problems

An iterative procedure for constructing subsolutions of discrete-time optimal control problems An iterative procedure for constructing subsolutions of discrete-time optimal control problems Markus Fischer version of November, 2011 Abstract An iterative procedure for constructing subsolutions of

More information

CHAPTER 9 MAINTENANCE AND REPLACEMENT. Chapter9 p. 1/66

CHAPTER 9 MAINTENANCE AND REPLACEMENT. Chapter9 p. 1/66 CHAPTER 9 MAINTENANCE AND REPLACEMENT Chapter9 p. 1/66 MAINTENANCE AND REPLACEMENT The problem of determining the lifetime of an asset or an activity simultaneously with its management during that lifetime

More information

A Review of the E 3 Algorithm: Near-Optimal Reinforcement Learning in Polynomial Time

A Review of the E 3 Algorithm: Near-Optimal Reinforcement Learning in Polynomial Time A Review of the E 3 Algorithm: Near-Optimal Reinforcement Learning in Polynomial Time April 16, 2016 Abstract In this exposition we study the E 3 algorithm proposed by Kearns and Singh for reinforcement

More information

arxiv: v1 [q-fin.tr] 15 Feb 2009

arxiv: v1 [q-fin.tr] 15 Feb 2009 OPTIMAL TRADE EXECUTION IN ILLIQUID MARKETS ERHAN BAYRAKTAR AND MICHAEL LUDKOVSKI arxiv:0902.2516v1 [q-fin.tr] 15 Feb 2009 Abstract. We study optimal trade execution strategies in financial markets with

More information

Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets

Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets Tim Leung 1, Qingshuo Song 2, and Jie Yang 3 1 Columbia University, New York, USA; leung@ieor.columbia.edu 2 City

More information

Lecture Notes - Dynamic Moral Hazard

Lecture Notes - Dynamic Moral Hazard Lecture Notes - Dynamic Moral Hazard Simon Board and Moritz Meyer-ter-Vehn October 27, 2011 1 Marginal Cost of Providing Utility is Martingale (Rogerson 85) 1.1 Setup Two periods, no discounting Actions

More information

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania Stochastic Dynamic Programming Jesus Fernande-Villaverde University of Pennsylvania 1 Introducing Uncertainty in Dynamic Programming Stochastic dynamic programming presents a very exible framework to handle

More information

Prof. Erhan Bayraktar (University of Michigan)

Prof. Erhan Bayraktar (University of Michigan) September 17, 2012 KAP 414 2:15 PM- 3:15 PM Prof. (University of Michigan) Abstract: We consider a zero-sum stochastic differential controller-and-stopper game in which the state process is a controlled

More information