Solving Sequential Decision Problems via Continuation Values 1


Solving Sequential Decision Problems via Continuation Values 1

Qingyin Ma (a) and John Stachurski (b)

(a), (b) Research School of Economics, Australian National University

September 1, 2016

ABSTRACT. We study a solution method for sequential decision problems based around the continuation value function, rather than the value function. This approach turns out to have significant advantages. One is that continuation value functions are smoother, allowing for sharper analysis of optimal policies and more efficient computation. Another is that, for a range of problems, the continuation value function exists in a lower dimensional space than the value function, mitigating the curse of dimensionality. In one typical experiment, the lower state dimension reduces computation time from over a week to less than three minutes.

1 The authors acknowledge support from the Australian Research Council (Discovery Grant DP). Email addresses: qingyin.ma@anu.edu.au, john.stachurski@anu.edu.au

PREFACE

Thesis Title: Essays on Sequential Decision Problems in Economic Dynamics. Supervisor: Prof. John Stachurski.

In many economic problems, agents located in a stochastically evolving environment must choose between acting now or waiting for a better opportunity. These problems can be modeled in an optimal stopping framework. The thesis attempts to provide a systematic analysis of this class of problems. Three main contributions are made.

Firstly, the thesis extends standard dynamic programming theory by providing a systematic treatment of unbounded returns in optimal stopping problems. In general settings where unbounded return functions are permitted, the thesis provides easy-to-check sufficient conditions for the existence and uniqueness of solutions to the Bellman equation, and the unique fixed point of the Bellman operator is shown to be the value function (VF). The theory is applicable to a broad class of applications in economics and finance.

Secondly, the thesis proposes an alternative approach to solving optimal stopping problems. The idea involves calculating the continuation value function (CVF) directly, and has significant advantages over standard approaches based on the VF: (1) In a wide range of economic applications, the CVF exists in a lower dimensional space than the VF, while the converse never holds. This allows us to mitigate one of the primary stumbling blocks for numerical analysis: the curse of dimensionality. (2) The CVF is typically smoother than the VF, making it easier to approximate numerically. (3) The CVF-based approach allows a sharp analysis of the optimal policy.

Finally, some preliminary extensions have been completed: the theory is shown to work well for repeated optimal stopping problems. The next stage is to build a unified theoretical framework that treats optimal stopping problems with recursive preferences. Although only economic applications are presented, the theory developed contributes to many other areas, including mathematical finance, operations research, and sequential analysis.

The thesis is structured as follows: Chapter I: Introduction; Chapter II: Optimal Stopping with Unbounded Returns; Chapter III: The Continuation Value Based Approach; Chapter IV: Extensions; Chapter V: Conclusions. This paper is mainly based on results presented in Chapter III.

1. INTRODUCTION

In many economic problems, agents face stochastically evolving environments and choose between acting immediately or waiting for a better opportunity. One such scenario is that faced by job seekers, who can either accept their current wage offer or continue job hunting (see, e.g., McCall (1970) or Pissarides (2000)). Another is that faced by firms choosing whether to enter a market or to wait, or to exit when incumbent (e.g., Jovanovic (1982), Hopenhayn (1992), Ericson and Pakes (1995), Fajgelbaum et al. (2015)). Other problems in this category include American call and put options (Karatzas and Shreve (1998), Shiryaev (1999), Duffie (2010)), consumer search problems (Burdett and Judd (1983), Kiyotaki and Wright (1993), Trejos and Wright (1995), Shi (1995, 1997)), optimal default (Choi et al. (2003), Albuquerque and Hopenhayn (2004), Arellano (2008)), optimal replacement of durable goods (Rust (1986, 1987)), optimal timing of investment (Dixit and Pindyck (1994)), timing of retirement (Huggett et al. (2011)), timing of harvesting agricultural products (Insley and Wirjanto (2010)) and optimal monopoly pricing with unknown demand across multiple markets (Rothschild (1974)).

In solving these problems, the standard path is to first seek the value function, which gives maximal expected rewards from the flow of possible payoffs. From the value function, one can calculate the continuation value by taking the expectation of the value function in the next period, appropriately discounted and combined with flow benefits from continuation. Once the continuation value is obtained, it can be compared with the reward from stopping. The optimal policy is to stop if and only if the reward from stopping is larger.

An alternative approach was introduced by Jovanovic (1982) in the context of firm exit decisions. The idea involves calculating the continuation value directly, using an operator that we refer to below as the continuation value operator. In this paper, we show that Jovanovic's approach extends naturally to almost all optimal stopping problems of interest to economists. We systematically study the method and its relationship to traditional dynamic programming. We show that, for many problems, this method has significant advantages over traditional methods based around the value function.

Optimal stopping problems also play major roles in related fields. For example, in finance, American options provide the right to buy or sell an asset at a predetermined price or continue to the next period (Duffie (2010)). Analysis of options in financial markets has led to the study of various economic and political decisions using the framework of real options (Alvarez and Dixit (2014), Backus (2014)). Within operations research, problems such as adaptive routing and optimal dynamic mechanism design are solved using the theory of optimal stopping.

One advantage is that, for a range of interesting problems, the continuation value function exists in a lower dimensional space than the value function. 3 For example, in the classic job search model of McCall (1970), wage offers are independent draws from a fixed distribution. The current offer affects lifetime rewards only if the agent decides to accept it. If not, then the process updates with the preceding draw forgotten. Hence the current wage draw appears in the value function, since the offer in hand can impact lifetime rewards, but has no impact on the continuation value. 4 The practical impact of lower dimensionality can be very large, as has been pointed out by many authors (see, e.g., Bellman (1969) or Rust (1997)). For example, while solving a well known version of the job search model in Section 5.1, we find that the continuation value based approach takes only 171 seconds to compute the optimal policy to a given level of accuracy, as opposed to more than 7 days for the value function iteration approach.

A second potential benefit of the continuation value based approach is that the continuation value function is often smoother than the value function. The intuition behind this result is that the value function is typically kinked at points where it is optimal to switch between continuing and stopping. However, when transitions are stochastic and shocks have a degree of smoothness (for example, the distributions have densities), such kinks are smoothed out in the continuation value function. As a result, the continuation value function becomes easier to approximate and more useful for making inferences about the optimal policy. For example, we use smoothness of the continuation value function to obtain new results on the differentiability of transition thresholds (e.g., reservation wages) as functions of other state variables.

In extending Jovanovic's continuation value function method to the whole spectrum of sequential decision problems used by economists, several challenges must be addressed. One is that, in many applications, rewards are unbounded, meaning that traditional methods based around contractions with respect to supremum norms do not apply. 5 To this end, we study the continuation value function in general settings where unbounded payoff functions

3 While every state variable that appears in the continuation value function must appear in the value function, the converse is not true. Hence, the number of arguments in the continuation value function is always weakly less than the number of arguments in the value function, and sometimes strictly so.

4 Of course the current wage offer could affect the continuation value in a variety of ways, some of which are considered below. For example, McCall (1970) considers a mechanism where the current offer matters for the state of knowledge, as described by a belief distribution. In this case, however, the value function is still higher dimensional than the continuation value function, since the value function must track both the current offer and the parameters of the belief distribution, while the continuation value function tracks only the latter.

5 While in some cases this problem can be eliminated by compactifying the state space of the underlying model, in other cases such changes are problematic. For example, wages might be driven by a state process with a unit root (see Example 2.2 below), in which case the state space cannot be compactified. Alternatively, in studies

are allowed. This is achieved by using weighted supremum norms. This approach turns out to interact well with the continuation value function operator, leading to simple sufficient conditions that are straightforward to check in applications.

Since we tackle unbounded problems, our research is also connected to earlier studies on unbounded dynamic programming. In economics, the weighted supremum norm approach was pioneered by Boyd (1990) and has been used in numerous other studies of unbounded dynamic programming. 6 When adapting this method to continuation value functions, we find it possible to develop a simple and direct version of the methodology that includes bounded problems as a special case. Another line of research treats unboundedness via the local contraction approach, which constructs a local contraction based on a suitable sequence of increasing compact subsets. See, for example, Rincón-Zapatero and Rodríguez-Palmero (2003, 2009), Martins-da Rocha and Vailakis (2010) and Matkowski and Nowak (2011). One of the motivations of this line of work is to deal with dynamic programming problems that are unbounded both above and below. For our problem, we show that the weighted supremum norm based method can tackle this case effectively, and hence we do not consider local contractions.

The paper is structured as follows. Section 2 outlines the method and provides the basic optimality results. Section 3 discusses the properties of the continuation value function, such as continuity and differentiability. Section 4 explores the connections between the continuation value and the optimal policy. Section 5 compares the computational efficiency of our approach with the value function approach. Section 6 concludes. Proofs are provided in the appendix. 7

2. OPTIMALITY RESULTS

This section presents the optimality results. Prior to discussing technical details, we first give an overview of the method and our terminology.

2.1. Overview. Consider a decision problem where an agent is faced at each point in time with the choice between stopping (e.g., exercising an option, exiting a market, accepting a

of firm decisions, interest might center on the tails of the firm size distribution, so compactifying the state space is undesirable.

6 Examples include Becker and Boyd (1997), Alvarez and Stokey (1998), Durán (2000), Durán (2003) and Le Van and Vailakis (2005).

7 Due to the page limit, the Appendix has been substantially shortened. However, a complete technical appendix is available upon request.

job) or continuing to the next stage. Suppose that the value function v* satisfies a Bellman equation of the form

v*(z) = max { r(z), c(z) + β ∫ v*(z′) P(z, dz′) }    (1)

where z ∈ Z is the current state, z′ is next period's state, r(z) is the payoff to stopping, c(z) is the flow payoff to continuing and P gives one step transition probabilities for the state. For example, r(z) might be the liquidation value of a firm considering whether to exit a market and c(z) might be one period profit conditional on remaining active, given the state z. In this case, v*(z) is the value of the firm prior to deciding whether to continue or exit.

The continuation value function associated with this problem is the second term on the right hand side of (1). We write it as

ψ*(z) := c(z) + β ∫ v*(z′) P(z, dz′).    (2)

It is straightforward to write down a functional equation such that ψ* is at least one of the solutions: from (1) and (2), we have v*(z) = max{r(z), ψ*(z)} for all z. Inserting this identity into the right hand side of (2) leads us to the equation

ψ(z) = c(z) + β ∫ max{r(z′), ψ(z′)} P(z, dz′)    (3)

for all z ∈ Z. To analyze this equation, we study the operator Q defined by

Qψ(z) = c(z) + β ∫ max{r(z′), ψ(z′)} P(z, dz′).    (4)

By construction, fixed points of Q solve (3). As shown below, they are also continuation value functions and from them we can derive value functions, optimal stopping rules and so on. Once the fundamental optimality results are in place, we turn to properties of the continuation value function, such as continuity, differentiability and monotonicity, and deduce implications for the optimal stopping rule. Prior to these tasks, we recall some facts related to optimal stopping and weighted supremum norms.

2.2. Preliminaries. For real numbers a and b we set a ∨ b := max{a, b}. If f and g are functions, then (f ∨ g)(x) := f(x) ∨ g(x). If (Z, 𝒵) is a measurable space, then bZ is the set of 𝒵-measurable bounded functions from Z to R, with norm ‖f‖ := sup_{z∈Z} |f(z)|. For unbounded functions we use weighted supremum norms. Given a function κ : Z → [1, ∞), the κ-weighted supremum norm of f : Z → R is defined as

‖f‖_κ := ‖f/κ‖ = sup_{z∈Z} |f(z)|/κ(z).

If ‖f‖_κ < ∞, then we say that f is κ-bounded. The symbol b_κZ will denote the set of all functions from Z to R that are both 𝒵-measurable and κ-bounded. We use ρ_κ to represent

the metric ρ_κ(f, g) := ‖f − g‖_κ on b_κZ. As is well known, the pair (b_κZ, ρ_κ) forms a Banach space.

A stochastic kernel P on (Z, 𝒵) is a map P : Z × 𝒵 → [0, 1] such that z ↦ P(z, B) is 𝒵-measurable for each B ∈ 𝒵 and B ↦ P(z, B) is a probability measure for each z ∈ Z. Below, we understand P(z, B) as representing the probability of a state transition from z ∈ Z to B ∈ 𝒵 in one unit of time.

2.3. Set Up. Let (Z_n)_{n≥0} be a time-homogeneous Markov process defined on probability space (Ω, 𝓕, P) and taking values in measurable space (Z, 𝒵). Let P denote the corresponding stochastic kernel. Let {𝓕_n}_{n≥0} be a filtration contained in 𝓕 and such that (Z_n)_{n≥0} is adapted to {𝓕_n}_{n≥0}. Let P_z indicate probability conditioned on Z_0 = z, while E_z is expectation conditioned on the same event. In proofs we take (Ω, 𝓕) to be the canonical sequence space, so that Ω = ×_{n≥0} Z and 𝓕 is the product σ-algebra generated by 𝒵. For the formal construction of P_z on (Ω, 𝓕) given P and z ∈ Z, see Meyn and Tweedie (2012) or Section 8.2 of Stokey et al. (1989).

A random variable τ taking values in N_0 := {0, 1, ...} is called a (finite) stopping time with respect to the filtration {𝓕_n}_{n≥0} if P{τ < ∞} = 1 and {τ ≤ n} ∈ 𝓕_n for all n ≥ 0. Below, τ = n has the interpretation of choosing to act at time n. Let M denote the set of all stopping times on Ω with respect to the filtration {𝓕_n}_{n≥0}.

Let r : Z → R and c : Z → R be measurable functions, referred to below as the exit payoff and flow continuation payoff respectively. Consider a problem where, at each time t ≥ 0, an agent observes Z_t and chooses between stopping and continuing. Stopping generates final payoff r(Z_t). Continuing involves continuation payoff c(Z_t) and transition to the next period, where the agent observes Z_{t+1} and the process repeats. Future payoffs are discounted at rate β ∈ (0, 1). The value function is defined at z ∈ Z by

v*(z) := sup_{τ∈M} E_z { Σ_{t=0}^{τ−1} β^t c(Z_t) + β^τ r(Z_τ) }.    (5)

A stopping time τ ∈ M is called an optimal stopping time if it attains the supremum in (5). A policy is a map σ from Z to {0, 1}, with 0 indicating the decision to continue and 1 indicating the decision to stop. A policy σ is called an optimal policy if τ defined by τ := inf{t ≥ 0 : σ(Z_t) = 1} is an optimal stopping time.

To guarantee existence of the value function and related properties without insisting that the payoff functions are bounded, we adopt the next assumption:

Assumption 2.1. There exist a 𝒵-measurable function g : Z → R_+ and constants m, d ∈ R_+ such that βm < 1 and, for all z ∈ Z,

max { ∫ |r(z′)| P(z, dz′), |c(z)| } ≤ g(z)    (6)

and

∫ g(z′) P(z, dz′) ≤ m g(z) + d.    (7)

The interpretation of Assumption 2.1 is that both r and c are small in absolute value relative to some function g such that E_z g(Z_t) does not grow too quickly. Slow growth in E_z g(Z_t) is imposed by (7), which can be understood as a geometric drift condition (see, e.g., Meyn and Tweedie (2012), chapter 15). 8

Example 2.1. A standard example of an optimal stopping problem in economics is job search. As a simple example, suppose that a worker can either accept a current wage offer w_t and work permanently at that wage, or reject the offer, receive unemployment compensation c, and reconsider next period. Let the current wage offer be a function w_t = w(Z_t) of some idiosyncratic or aggregate state process (Z_t)_{t≥0}. The exit reward is r(z) = u(w(z))/(1 − β), where u is a utility function and β < 1 is the discount factor. The flow continuation payoff is the constant c. 9 If u is bounded, then we can set g(z) equal to the constant ‖r‖ ∨ c, and Assumption 2.1 is satisfied with m = 1 and d = 0.

Example 2.2. Consider the same setting as Example 2.1, with state process

z_{t+1} = ρ z_t + b + ε_{t+1},  (ε_t) ~ IID N(0, σ²).    (8)

Let w_t = exp(z_t), so that wages are lognormal. We consider several standard utility functions that are unbounded.

(1) u(w) = ln w. If β|ρ| < 1, let g(w) = |ln w|, m = |ρ|, and d = σ√(2/π) + |b|; then Assumption 2.1 holds. Since the correlation coefficient |ρ| = 1 is allowed, our theory can treat nonstationary state processes.

(2) u(w) = w^{1−γ}/(1 − γ), where γ ≥ 0 and γ ≠ 1. Notice that when γ > 0, the utility function is of constant relative risk aversion form, with a coefficient of relative risk aversion γ. When γ = 0, the utility function reduces to u(w) = w.

(a) If ρ ∈ [0, 1] and β exp[(1 − γ)ρb + (1 − γ)²ρ²σ²/2] < 1, then Assumption 2.1 holds by letting m = d = exp[(1 − γ)ρb + (1 − γ)²ρ²σ²/2] and g(w) = w^{(1−γ)ρ}.

8 To verify Assumption 2.1, it suffices to obtain a 𝒵-measurable function g : Z → R_+, constants m, d ∈ R_+ with βm < 1 and constants a_1, a_2, a_3 and a_4 in R_+ such that ∫ |r(z′)| P(z, dz′) ≤ a_1 g(z) + a_2, |c(z)| ≤ a_3 g(z) + a_4 and (7) holds. We use this fact in the applications below.

9 The classical McCall model used an IID wage process (McCall (1970)). We follow many subsequent studies in assuming Markov dynamics for wages (see, e.g., Jovanovic (1987) or Bull and Jovanovic (1988)).

(b) If ρ ∈ [−1, 0] and β exp[−(1 − γ)ρb + (1 − γ)²ρ²σ²/2] < 1, then Assumption 2.1 holds by letting m = exp[−(1 − γ)ρb + (1 − γ)²ρ²σ²/2], d = 0, and g(w) = w^{(1−γ)ρ} + w^{−(1−γ)ρ}.

Example 2.3. Consider the asset pricing problem of a perpetual call option (see, e.g., Shiryaev (1999), Duffie (2010)), an infinite-horizon American call option with no fixed maturity or exercise limit. Let x be the current price of the asset. Recall the stochastic process defined in (8), and let the sequence of asset prices (x_t)_{t≥0} be given by x_t = e^{z_t} for all t ≥ 0. The value of the option to buy the asset at a strike price K is given by

v*(x) = max { (x − K)^+, e^{−γ} ∫ v*(x′) f(x′|x) dx′ }

where f(·|x) = LN(ρ ln x + b, σ²), and γ > 0 is the riskless rate of return. If ρ ∈ [0, 1] and β exp(ρb + ρ²σ²/2) < 1, then Assumption 2.1 holds by letting m = d = exp(ρb + ρ²σ²/2) and g(x) = x^ρ. If ρ ∈ [−1, 0] and β exp(−ρb + ρ²σ²/2) < 1, then Assumption 2.1 holds by letting m = exp(−ρb + ρ²σ²/2), d = 0, and g(x) = x^ρ + x^{−ρ}.

2.4. Optimality. Let g be as in Assumption 2.1 and let

k(z) := Σ_{t≥0} β^t E_z { |r(Z_t)| + g(Z_t) } + 1.    (9)

In a supplementary appendix to this paper, Ma (2016) shows that, under Assumption 2.1, the value function v* is a well-defined element of b_kZ that satisfies the Bellman equation (1), and that the Bellman operator

Tv(z) = max { r(z), c(z) + β ∫ v(z′) P(z, dz′) }

is a contraction mapping on b_kZ when paired with the weighted supremum norm ‖·‖_k. Hence v* is the unique fixed point. With the notation introduced in Section 2.2, the Bellman equation (1) can be expressed in functional notation as v* = r ∨ (c + βPv*), and the continuation value function can be defined by ψ* = c + βPv*. Since v* satisfies the Bellman equation, we also have v* = r ∨ ψ*. Ma (2016) also shows that the optimal stopping time is

τ* := inf{t ≥ 0 : r(Z_t) ≥ ψ*(Z_t)}.    (10)

Thus, the optimal strategy is a Markov strategy, with the action at time t depending only on the current state Z_t.

2.5. The Continuation Value Operator. Without loss of generality, consider the case m > 1 and d/(1 − β) ≥ 1. Let ℓ be the weighting function

ℓ(y) = g(y) + d/(m − 1).    (11)

Let Q be the operator from b_ℓZ to itself defined by (4). As we now show, the fixed point of Q is the continuation value function ψ* defined in (2).

Theorem 2.1. If Assumption 2.1 holds, then the following statements are true:

(1) Q is a contraction mapping on (b_ℓZ, ρ_ℓ) of modulus βm.

(2) The unique fixed point of Q in b_ℓZ is ψ*.

(3) The policy σ* defined pointwise by σ*(z) = 1{r(z) ≥ ψ*(z)} is an optimal policy.

Example 2.2 (Continued). Recall the extended job search model of McCall (1970), in which a general Markov process (z_t)_{t≥0} generates the wage process. For each type of utility function u, the continuation value operator satisfies

Qψ(w) = c + β ∫ max { u(w′)/(1 − β), ψ(w′) } f(w′|w) dw′.

Since Assumption 2.1 has been verified, from Theorem 2.1 we know that there exists a unique fixed point of Q in b_ℓZ that coincides with ψ*, the continuation value function, which in the current case represents the expected value of rejecting the current offer and waiting for a new draw.

Example 2.3 (Continued). Recall the perpetual option problem of Shiryaev (1999). The continuation value operator for the perpetual option satisfies

Qψ(x) = e^{−γ} ∫ max { (x′ − K)^+, ψ(x′) } f(x′|x) dx′.

By Theorem 2.1, Q admits a unique fixed point ψ* in b_ℓZ, which in this case can be interpreted as the expected value of holding the option in the current period and considering exercising it at a later stage.

Example 2.4 (Firm Exit I). Consider a firm exit model in the style of Hopenhayn (1992). At the beginning of each period, a productivity shock a is realized and observed by an incumbent firm in the industry. The firm must decide whether or not to exit the market in the next period (before a′ is realized). The output of the firm is q(a, l) = a l^α, where α ∈ (0, 1) and l denotes labor demand. Suppose that the productivity shock process (a_t)_{t≥0} satisfies a_t = e^{z_t} for all t ≥ 0, where (z_t)_{t≥0} is defined in (8). A fixed cost c_f > 0 must be paid every period by the incumbent firm, which can be treated as a fixed outside opportunity cost for some resources (e.g., managerial ability) used by the firm. Given output and input prices p and w, profit maximization implies that the exit payoff and flow continuation payoff of staying in the industry are r(a) = c(a) = G a^{1/(1−α)} − c_f, where G = (αp/w)^{1/(1−α)} ((1 − α)/α) w. The continuation value operator satisfies

Qψ(a) = (G a^{1/(1−α)} − c_f) + β ∫ max { G a′^{1/(1−α)} − c_f, ψ(a′) } f(a′|a) da′

where f(·|a) = LN(ρ ln a + b, σ²). It can be verified that if ρ ∈ [0, 1] and β exp[b/(1 − α) + σ²/(2(1 − α)²)] < 1, then Assumption 2.1 holds by letting g(a) = a^{1/(1−α)} and m = d = exp[b/(1 − α) + σ²/(2(1 − α)²)]. If ρ ∈ [−1, 0] and β exp[|b|/(1 − α) + σ²/(2(1 − α)²)] < 1, then Assumption 2.1 holds by letting g(a) = a^{1/(1−α)} + a^{−1/(1−α)}, m = exp[|b|/(1 − α) + σ²/(2(1 − α)²)], and d = 0. By Theorem 2.1, Q admits a unique fixed point in b_ℓZ that corresponds to the continuation value function ψ*, which can be understood as the expected value of staying in the industry for the next period and behaving optimally afterwards.

Example 2.5 (Firm Exit II). Consider the firm exit model of Jovanovic (1982). Let q be the output of a firm, and C(q) a cost function that satisfies C(0) = C′(0) = 0, C′(q) > 0, C″(q) > 0, and lim_{q→∞} C′(q) = ∞. The total cost is C(q)x, where (x_t)_{t≥0} is a stochastic process that satisfies x_t = l(η_t); here l is a positive, strictly increasing, and continuous function with lim_{η→−∞} l(η) = α_1 > 0 and lim_{η→∞} l(η) = α_2; and (η_t)_{t≥0} is a stochastic process that satisfies

η_t = ξ + ε_t,  (ε_t) ~ IID N(0, σ²)

where ξ denotes firm type, which is connected to firm efficiency and is unobservable. At the beginning of each period, the firm observes x, and must decide whether or not to exit the industry. The firm has prior belief ξ ~ N(µ, γ) and updates it in a Bayesian manner after observing x′, so the posterior satisfies ξ | x′ ~ N(µ′, γ′), where γ′ = (1/γ + 1/σ²)^{−1} and µ′ = γ′ (µ/γ + l^{−1}(x′)/σ²).

Let π(p, x) = max_q [pq − C(q)x] be the maximal profit, where (p_t)_{t≥0} is a bounded price sequence which is Markovian with transition probability h. Jovanovic (1982) shows that π is a bounded and continuous function. Let W > 0 denote the expected present value of the firm's fixed factor in a different industry. Then the continuation value operator satisfies

Qψ(p, x, µ, γ) = π(p, x) + β ∫ max { W, ψ(p′, x′, µ′, γ′) } f(x′|µ, γ) h(p′|p) d(x′, p′).

Since both the exit and flow continuation payoffs are bounded, Assumption 2.1 is satisfied trivially by letting g be a constant upper bound of W ∨ |π|, m = 1 and d = 0. So Q admits a unique fixed point ψ* in bZ that can be interpreted as the value of staying in the industry for one period and behaving optimally afterwards.

The two operators Q and T are closely related, in the sense that the n-th iterate of the value function can be obtained from the n-th iterate of the continuation value function by taking the pointwise maximum of this function and r. In particular, iterates of these operators converge to their respective fixed points at the same rate. The next proposition clarifies:

Proposition 2.1. Fix ψ_0 ∈ b_ℓZ and let v_0 := r ∨ ψ_0. If v_n := T^n v_0 and ψ_n := Q^n ψ_0 for some n ∈ N, then v_n = r ∨ ψ_n.
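To make the mapping from Theorem 2.1 and Proposition 2.1 to computation concrete, the following minimal sketch iterates the operator Q in (4) for the job search setting of Example 2.2 with u(w) = ln w. The integral is approximated with Gauss-Hermite quadrature over the normal shock in (8), and ψ is stored on a wage grid with linear interpolation. The parameter values, grid bounds, quadrature order, and tolerance are illustrative assumptions of this sketch, not values taken from the paper.

```python
import numpy as np

# Illustrative primitives for Example 2.2 with u(w) = ln w
beta, rho, b, sigma, c = 0.95, 0.6, 0.0, 1.0, 1.0

def r(w):
    # exit payoff: accept wage w and keep it forever
    return np.log(w) / (1 - beta)

# Gauss-Hermite nodes/weights for eps ~ N(0, sigma^2)
nodes, weights = np.polynomial.hermite.hermgauss(21)
eps = np.sqrt(2) * sigma * nodes
probs = weights / np.sqrt(np.pi)

# Log-spaced wage grid (illustrative bounds)
w_grid = np.exp(np.linspace(-4, 4, 200))

def Q(psi):
    """One application of the continuation value operator (4) on the grid."""
    new_psi = np.empty_like(psi)
    for i, w in enumerate(w_grid):
        # next-period wages: ln w' = rho * ln w + b + eps
        w_next = np.exp(rho * np.log(w) + b + eps)
        psi_next = np.interp(w_next, w_grid, psi)   # interpolate psi at w'
        new_psi[i] = c + beta * np.sum(probs * np.maximum(r(w_next), psi_next))
    return new_psi

# Successive approximation: Q is a contraction, so this converges to psi*
psi, err, tol = np.zeros(len(w_grid)), np.inf, 1e-8
while err > tol:
    psi_new = Q(psi)
    err = np.max(np.abs(psi_new - psi))
    psi = psi_new

v = np.maximum(r(w_grid), psi)   # v* = r ∨ ψ*, as in Proposition 2.1
```

On the grid, the array v recovers the value function via v* = r ∨ ψ*, and the stopping region is simply {w : r(w) ≥ ψ(w)}; only the one-dimensional object ψ ever needs to be iterated.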

3. PROPERTIES OF CONTINUATION VALUES

In this section we explore some further properties of the continuation value function. The most significant result we establish is that the continuation value function ψ* is smooth (continuously differentiable) under mild assumptions. While the value function v* usually has kinks, ψ* can be smoother because the integration inside the operator creates a smoothing effect. This makes the continuation value based approach more favorable for numerical computation than value function based approaches, since smooth functions are easier to approximate numerically.

3.1. Continuity. We establish two results on continuity. The first serves general problems, while the second works well when the stochastic kernel P admits a density representation.

Assumption 3.1. The flow continuation payoff function c is continuous.

Assumption 3.2. The function z ↦ ∫ max{r(z′), ψ(z′)} P(z, dz′) is continuous for every continuous function ψ ∈ b_ℓZ.

Assumption 3.3. The exit payoff function r is continuous.

We have the following general result on the continuity of ψ*. The continuity of v* is obtained as a byproduct under the additional continuity assumption on r.

Proposition 3.1. If Assumptions 2.1, 3.1 and 3.2 hold, and g is continuous, then ψ* is continuous. If in addition Assumption 3.3 holds, then v* is continuous.

In many applications, the stochastic kernel P has a density representation, which makes the verification of Assumption 3.2 easier.

Definition 3.1. A stochastic density kernel (or density kernel) on Z is a measurable function f : Z × Z → R_+ such that

∫ f(z′|z) dz′ := ∫ f(z′|z) λ(dz′) = 1 for all z ∈ Z,

where λ denotes the Lebesgue measure. We say that the stochastic kernel P has a density representation if there exists a density kernel f such that

P(z, B) = ∫ 1{z′ ∈ B} f(z′|z) dz′ for all z ∈ Z and B ∈ 𝒵.

The following result provides an alternative way to obtain the continuity of ψ* and v* when P has a density representation, which is highly valuable in applications.

Proposition 3.2. Suppose that Assumptions 2.1 and 3.1 and the following conditions hold:

(1) P has a density representation f, and z ↦ f(z′|z) is continuous for all z′ ∈ Z;

(2) z ↦ ∫ |r(z′)| f(z′|z) dz′, z ↦ ∫ g(z′) f(z′|z) dz′, and g are continuous.

Then ψ* is continuous. If in addition Assumption 3.3 holds, then v* is continuous.

Remark 3.1. When the return functions r and c are bounded, as is the case in many standard economic models, establishing the continuity of ψ* is even easier. For general problems, we only require that Assumption 3.1 holds and that P satisfies the Feller property. When P has a density representation f, Assumption 3.1 and the continuity of z ↦ f(z′|z) (for all z′ ∈ Z) are sufficient for ψ* to be continuous.

Example 2.5 (Continued). Recall the firm exit model of Jovanovic (1982). The exit payoff W and flow continuation payoff π are bounded and continuous, and the Feller property in this case can be easily verified by applying Lemma 7.1 in the Appendix. Therefore, ψ* is continuous. Since the exit payoff W is constant, v* is continuous.

Remark 3.2. The continuity of ψ* does not necessarily require the continuity of r, while the continuity of v* usually does. Intuitively, the integration operation inside the operator Q has a smoothing effect.

Example 2.2 (Continued). Recall the job search problem where the wage sequence is driven by a general Markov process (z_t)_{t≥0}. Notice that P has a density representation f, and w ↦ f(w′|w) is continuous for all w′ ∈ Z. Moreover, it is easy to verify the following statements:

(1) ∫ |ln w′| f(w′|w) dw′ = σ√(2/π) exp[−(ρ ln w + b)²/(2σ²)] + (ρ ln w + b) [1 − 2Φ(−(ρ ln w + b)/σ)], where Φ denotes the standard normal cumulative distribution function;

(2) ∫ w′^a f(w′|w) dw′ = w^{aρ} e^{ab + a²σ²/2}  (a ≠ 0).

So the second condition of Proposition 3.2 holds. Therefore, we can show that ψ* and v* are continuous for all three types of utility function u.

Example 2.3 (Continued). For the perpetual option problem presented previously, P admits a density representation f, and x ↦ f(x′|x) is continuous for all x′ ∈ Z. Since ∫ x′^ρ f(x′|x) dx′ = exp(ρ² ln x + ρb + ρ²σ²/2) for all ρ ∈ R, we can easily verify the second condition of Proposition 3.2 by applying Lemma 7.1 in the Appendix. Therefore, ψ* and v* are continuous.

Example 2.4 (Continued). Recall the firm exit problem of Hopenhayn (1992). Notice that c(a) = G a^{1/(1−α)} − c_f is continuous. Through a similar analysis as in Example 2.3, we can show that ψ* and v* are continuous.
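As a quick numerical sanity check of condition (2) of Proposition 3.2 in this lognormal setting, the closed-form conditional moment in statement (2) above can be compared against direct quadrature of the density kernel. The sketch below does this for a few illustrative parameter values and exponents (chosen here for illustration, not taken from the paper).

```python
import numpy as np
from scipy.integrate import quad

rho, b, sigma = 0.6, 0.1, 0.5   # illustrative parameters for the process (8)

def f(w_next, w):
    """Density kernel of w' given w: ln w' ~ N(rho * ln w + b, sigma^2)."""
    mu = rho * np.log(w) + b
    return np.exp(-(np.log(w_next) - mu) ** 2 / (2 * sigma ** 2)) \
        / (w_next * sigma * np.sqrt(2 * np.pi))

def moment_numeric(w, a):
    # \int w'^a f(w'|w) dw' by quadrature
    val, _ = quad(lambda wp: wp ** a * f(wp, w), 0, np.inf)
    return val

def moment_closed_form(w, a):
    # w^{a rho} exp(a b + a^2 sigma^2 / 2), as in statement (2) above
    return w ** (a * rho) * np.exp(a * b + a ** 2 * sigma ** 2 / 2)

for w in (0.5, 1.0, 2.0):
    for a in (1.0, -0.5):
        print(w, a, moment_numeric(w, a), moment_closed_form(w, a))
```

The two columns agree to quadrature precision, confirming that the moment map w ↦ ∫ w′^a f(w′|w) dw′ is a smooth, finite function of the current state, which is exactly what condition (2) requires.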

Example 3.1 (Firm Entry). Consider the firm entry problem in the style of Fajgelbaum et al. (2015). At the beginning of each period, the firm observes an investment cost f, where (f_t) ~ IID h = LN(µ_f, γ_f). Based on its belief about the fundamental, the firm has two choices: enter the market, incur the observed investment cost and obtain a stochastic dividend x_t through production, or wait and reconsider next period. The firm aims to find a decision rule that maximizes the expected net present value. The stochastic dividend follows x_t = ξ_t + ε^x_t, (ε^x_t) ~ IID N(0, γ_x), where ξ_t and ε^x_t are respectively the persistent and transient components. A public signal y_t is released at the end of each period, where y_t = ξ_t + ε^y_t, (ε^y_t) ~ IID N(0, γ_y). Suppose that the firm has prior belief ξ ~ N(µ, γ) at the beginning of each period and updates it in a Bayesian way after observing y; the posterior then satisfies ξ | y ~ N(µ′, γ′), where γ′ = (1/γ + 1/γ_y)^{−1} and µ′ = γ′ (µ/γ + y/γ_y). The firm has constant absolute risk aversion u(x) = (1/a)(1 − e^{−ax}), a > 0. The continuation value operator satisfies

Qψ(µ, γ) = β ∫ max { E_{µ′,γ′}[u(x′)] − f′, ψ(µ′, γ′) } p(f′, y′|µ, γ) d(f′, y′)    (12)

where p(f′, y′|µ, γ) = h(f′) l(y′|µ, γ) with l(·|µ, γ) = N(µ, γ + γ_y). Moreover, the exit payoff is r(f, µ, γ) = E_{µ,γ}[u(x)] − f = (1/a) [1 − exp(−aµ + a²(γ + γ_x)/2)] − f.

This is another example with unbounded returns. To apply our method, consider the state space Y = R × R_{++} with typical element y ∈ Y taking the form y = (µ, γ). Consider ℓ : Y → [1, ∞) defined by ℓ(µ, γ) = exp(−aµ + a²γ/2) + 1. Then from Theorem 2.1, Proposition 3.1, and Lemma 7.1 in the Appendix, we can show that (see the Appendix for a detailed proof):

(1) Q is a well-defined mapping from b_ℓY into itself, and it is a contraction mapping of modulus β on the complete metric space (b_ℓY, ρ_ℓ);

(2) ψ* and v* are continuous functions.

3.2. Shape Properties. We now study shape properties of the continuation value function, including monotonicity and concavity.

Assumption 3.4. The flow continuation payoff c is increasing (resp. decreasing).

Assumption 3.5. The function z ↦ ∫ max{r(z′), ψ(z′)} P(z, dz′) is increasing (resp. decreasing) for every increasing (resp. decreasing) function ψ ∈ b_ℓZ.

Assumption 3.6. The exit payoff r is increasing (resp. decreasing).

Remark 3.3. If Assumption 3.6 holds and P is stochastically increasing, in the sense that P(z, ·) first-order stochastically dominates P(z̃, ·) whenever z ≥ z̃, then Assumption 3.5 holds.

We have the following result regarding monotonicity.

Proposition 3.3. Under Assumptions 2.1, 3.4 and 3.5, ψ* is increasing (resp. decreasing). If in addition Assumption 3.6 holds, then v* is increasing (resp. decreasing).

The next result studies concavity properties of ψ*.

Proposition 3.4. Suppose that Assumption 2.1 holds, r ≥ 0, P has a density representation f, and that z ↦ f(z′|z) (for all z′ ∈ Z) and c are concave (resp. convex) functions. Then ψ* is a concave (resp. convex) function.

Example 2.2 (Continued). In the job search problem where the wage process (w_t)_{t≥0} is driven by a Markov process (z_t)_{t≥0}, the flow continuation payoff is constant, and each type of exit payoff is increasing. From the properties of the lognormal distribution we know that if ρ ≥ 0, the stochastic kernel corresponding to the density kernel f is stochastically increasing. By Theorem 2.1 and Proposition 3.3, ψ* and v* are increasing under the following circumstances: (1) u(w) = ln w and ρ ∈ [0, 1/β); (2) u(w) = w^{1−γ}/(1 − γ) (γ ≥ 0, γ ≠ 1), ρ ∈ [0, 1] and β exp[(1 − γ)ρb + (1 − γ)²ρ²σ²/2] < 1.

Example 2.3 (Continued). Recall the pricing problem of the perpetual option. The exit payoff r(x) = (x − K)^+ is increasing. Following a similar analysis as in Example 2.2, we can show that ψ* and v* are increasing.

Example 2.4 (Continued). For the firm exit problem of Hopenhayn (1992), both r and c are increasing functions. As in Examples 2.2 and 2.3, we can show that ψ* and v* are increasing functions.

Example 3.1 (Continued). For the firm entry problem of Fajgelbaum et al. (2015), Proposition 3.3 shows that ψ* is increasing in µ, and v* is increasing in µ and decreasing in f.

3.3. Differentiability. Suppose Z ⊂ R^m; then a typical element z ∈ Z takes the form z = (z_1, ..., z_m). For a given function h defined on Z and for all z ∈ int(Z), define D_i h(z) := ∂h(z)/∂z_i, i = 1, ..., m. For given z_0 ∈ Z and δ > 0, define B_δ(z_0) := {z ∈ Z : ‖z − z_0‖ < δ} and B_δ(z_0^i) := {z^i ∈ Z^{(i)} : |z^i − z_0^i| < δ}, with closures B̄_δ(z_0) and B̄_δ(z_0^i), where ‖·‖ is the Euclidean norm, Z^{(i)} is the i-th dimension of Z and Z^{(−i)} denotes the remaining m − 1 dimensions of Z.

Assumption 3.7. P has a density representation f, and for all z′ ∈ Z, z ↦ f(z′|z) is differentiable at interior points in the sense that D_i f(z′|z) exists for all z ∈ int(Z), i = 1, ..., m.

Assumption 3.8. For all z_0 ∈ int(Z), there exists δ > 0 such that, for i = 1, ..., m, the following functions take finite values:

(1) z_0^i ↦ ∫ sup_{z^i ∈ B̄_δ(z_0^i)} |D_i f(z′|z)| dz′;

(2) z_0^i ↦ ∫ |r(z′)| sup_{z^i ∈ B̄_δ(z_0^i)} |D_i f(z′|z)| dz′;

(3) z_0^i ↦ ∫ g(z′) sup_{z^i ∈ B̄_δ(z_0^i)} |D_i f(z′|z)| dz′.

Assumption 3.9. The flow continuation payoff function c is differentiable at interior points in the sense that D_i c(z) exists for all z ∈ int(Z), i = 1, ..., m.

The following result provides a group of sufficient conditions for ψ* to be differentiable.

Proposition 3.5. Under Assumptions 2.1 and 3.7–3.9, ψ* is differentiable at interior points in the sense that D_i ψ*(z) exists for all z ∈ int(Z), i = 1, ..., m.

We next consider an alternative way to establish differentiability.

Assumption 3.10. For all z′ ∈ Z, z ↦ f(z′|z) is twice differentiable at interior points in the sense that D_i² f(z′|z) exists for all z ∈ int(Z), i = 1, ..., m. Moreover, each (z, z′) ↦ D_i f(z′|z) is continuous.

Assumption 3.11. The following conditions hold for i = 1, ..., m:

(1) There are finitely many solutions to D_i f(z′|z) = 0, and for all z_0 ∈ int(Z) there exists δ > 0 such that each solution, written as a function (z′, z_0^{−i}) ↦ z^i(z′, z_0^{−i}), satisfies z^i(z′, z_0^{−i}) ∉ B̄_δ(z_0^i) as z′ → ∞;

(2) The following functions take finite values on int(Z): (a) z ↦ ∫ |D_i f(z′|z)| dz′; (b) z ↦ ∫ |r(z′) D_i f(z′|z)| dz′; (c) z ↦ ∫ g(z′) |D_i f(z′|z)| dz′. Moreover, r and g are continuous.

Remark 3.4. The following sufficient condition for condition (1) of Assumption 3.11 is frequently used when the state space is unbounded: there are finitely many solutions to D_i f(z′|z) = 0, and each solution (z′, z^{−i}) ↦ z^i(z′, z^{−i}) satisfies |z^i(z′, z^{−i})| → ∞ as |z′| → ∞ for given z^{−i} ∈ int(Z^{(−i)}).

The following proposition, which avoids verifying Assumption 3.8, is useful in applications with unbounded state spaces, as shown below.

Proposition 3.6. Under Assumptions 2.1 and 3.9–3.11, ψ* is differentiable at interior points in the sense that D_i ψ*(z) exists for all z ∈ int(Z), i = 1, ..., m.

Besides being highly valuable for numerical computation, smoothness is a desirable property in many applications in which we want to characterize the optimal policy, as shown in the next section.

Assumption 3.12. For i = 1, ..., m, the following conditions hold:

(1) The following functions are continuous on int(Z): (a) z ↦ ∫ |D_i f(z′|z)| dz′; (b) z ↦ ∫ |r(z′) D_i f(z′|z)| dz′; and (c) z ↦ ∫ g(z′) |D_i f(z′|z)| dz′;

(2) The flow continuation payoff function c is continuously differentiable at interior points in the sense that z ↦ D_i c(z) is continuous on int(Z).

The next result provides sufficient conditions for ψ* to be smooth.

Proposition 3.7. Suppose that Assumption 3.12 holds, and that either (1) or (2) holds:

(1) The assumptions of Proposition 3.5 hold, and each z ↦ D_i f(z′|z) is continuous on int(Z);

(2) The assumptions of Proposition 3.6 hold.

Then ψ* is continuously differentiable at interior points in the sense that z ↦ D_i ψ*(z) is continuous on int(Z), i = 1, ..., m.

Remark 3.5. When the return functions r and c are bounded, conditions (1.b) and (1.c) of Assumption 3.12 are not required to establish the smoothness of ψ* in Proposition 3.7.

Example 2.2 (Continued). In the extended job search model where (w_t)_{t≥0} is generated by the Markov process (z_t)_{t≥0}, it is straightforward to verify the following statements (write u := ln w′ − ρ ln w − b):

(1) for each w′, the equation D f(w′|w) = 0 has finitely many solutions in w, each of which can be written in closed form as a function w(w′) of w′ that leaves any bounded neighborhood as w′ → ∞;

(2) ∫ |D f(w′|w)| dw′ = (ρ/(σw)) √(2/π);

(3) |ln w′| |D f(w′|w)| ≤ [ρ/(√(2π) σ³ w w′)] exp[−u²/(2σ²)] [(ln w′)² + |ρ ln w + b| |ln w′|];

(4) w′^a |D f(w′|w)| = [ρ/(√(2π) σ³ w w′)] exp[−u²/(2σ²)] |u| w′^a, a ≠ 0;

(5) the terms on both sides of statements (3) and (4) are continuous in w;

(6) the integrals of the right-hand-side terms of statements (3) and (4) with respect to w′ are continuous in w.

From the first statement we know that condition (1) of Assumption 3.11 holds. Based on statements (2)–(6) and Lemma 7.1 in the Appendix, we can show that condition (2) of Assumption 3.11 holds. The remaining conditions of Proposition 3.7 are easy to verify. Therefore, ψ* is continuously differentiable.

To see that ψ* is smoother than v*, we run the following simulation. For simplicity, we consider r(w) = w/(1 − β), and set β = 0.96, ρ = 0.6, σ = 1, b = 0 and c = 1. From Figure 1 we can see that although v* has a kink in the interior of the state space, ψ* is smooth in the sense that it is continuously differentiable and exhibits no kinks.

Example 2.3 (Continued). Recall the pricing problem of the perpetual option. By a similar analysis as in Example 2.2, we can show that ψ* is continuously differentiable. This is the case despite the fact that the exit payoff r(x) = (x − K)^+ has a kink at x = K. Therefore,

in general, the exit payoff function is not required to be differentiable for the continuation value function to be smooth.

FIGURE 1. Comparison of ψ* and v*

Example 2.4 (Continued). For the firm exit model of Hopenhayn (1992), through a similar analysis as in Examples 2.2 and 2.3, we can show that ψ* is continuously differentiable.

3.4. Parametric Continuity. In applications, we are often interested in how the value function, continuation value function, and optimal policy change in response to variation in key parameters. In such circumstances, parametric continuity is highly valuable.

Consider the parameter space Θ ⊂ R^k. Let P_θ, r_θ, c_θ, v*_θ, and ψ*_θ denote the stochastic kernel, exit payoff, flow continuation payoff, value function, and continuation value function with respect to parameter θ ∈ Θ, respectively. Under Assumption 2.1, for all θ ∈ Θ, there exist a measurable map g_θ : Z → R_+ and constants m_θ, d_θ ∈ R_+ with βm_θ < 1 such that, for all z ∈ Z:

(1) max { ∫ |r_θ(z′)| P_θ(z, dz′), |c_θ(z)| } ≤ g_θ(z); and

(2) ∫ g_θ(z′) P_θ(z, dz′) ≤ m_θ g_θ(z) + d_θ.

Define m := sup_{θ∈Θ} m_θ and d := sup_{θ∈Θ} d_θ.

Assumption 3.13. βm < 1 and d < ∞.

Remark 3.6. To simplify the analysis, we consider a parameter space Θ that does not include the space of β. An alternative way to treat this problem is to consider β ∈ [0, a], where a ∈ [0, 1), and include this space as part of Θ. In this case, Assumption 3.13 is replaced by am < 1 and d < ∞. All the theoretical results on parametric continuity in this paper remain true if we make this change.

Assumption 3.14. For all θ ∈ Θ, P_θ has a density representation f_θ. For all z, z′ ∈ Z, θ ↦ f_θ(z′|z) is continuous. For all z ∈ Z, θ ↦ ∫ |r_θ(z′)| f_θ(z′|z) dz′ and θ ↦ ∫ g_θ(z′) f_θ(z′|z) dz′ are continuous.

Assumption 3.15. For all z ∈ Z, θ ↦ r_θ(z), θ ↦ c_θ(z) and θ ↦ g_θ(z) are continuous.

Under these assumptions we have the following result for parametric continuity.

Proposition 3.8. Under Assumptions 2.1 and 3.13–3.15, θ ↦ ψ*_θ(z) and θ ↦ v*_θ(z) are continuous for all z ∈ Z.

Example 2.2 (Continued). Recall the extension of the job search model of McCall (1970). For simplicity, consider u(w) = ln w. Let the parameter space be Θ = (−1/β, 1/β) × A × B × C, where A and B are bounded subsets of R_{++} and R respectively, and C ⊂ R. A typical element θ ∈ Θ takes the form θ = (ρ, σ, b, c). Based on Proposition 3.8, θ ↦ ψ*_θ(w) and θ ↦ v*_θ(w) are continuous for all w ∈ Z. Similarly, we can establish the parametric continuity property for u(w) = w^{1−γ}/(1 − γ) (γ ≥ 0, γ ≠ 1).

Remark 3.7. The parametric continuity results for Examples 2.2–2.5 and 3.1 can be established similarly. To simplify the analysis, unless explicitly specified, we do not discuss parametric continuity for other examples, though this property holds for each of them.

4. OPTIMAL POLICIES

In this section, we discuss several other significant advantages of the continuation value based approach over traditional approaches based on the value function. To begin with, for a broad range of problems, the continuation value function exists in a lower dimensional space than the value function. The relationship is asymmetric: while each state variable that appears in the continuation value function must appear in the value function, the converse is not true. This facilitates numerical computation significantly, since the curse of dimensionality is greatly mitigated. Moreover, among these problems, the decision rule usually exhibits threshold behavior with respect to some state variable, in the sense that the sequential decision process terminates whenever a threshold level is reached by that state process. In such cases, the continuation value based method allows for a sharp analysis of the optimal policy. This type of problem is pervasive in quantitative and theoretical economic modeling, as we now formulate.

Suppose that the state space Z ⊂ R^m can be written as Z = X × Y, where X is a convex subset of R 10 and Y is a convex subset of R^{m−1}. The state process (Z_n)_{n≥0} is then

10 To simplify the analysis, we assume that X is one dimensional. In general, the dimension of X can be higher.

{(X_n, Y_n)}_{n≥0}, where (X_n)_{n≥0} and (Y_n)_{n≥0} are two stochastic processes taking values in X and Y respectively. In particular, the period-n state vector is Z_n = (X_n, Y_n), where X_n represents the first dimension and Y_n the remaining m − 1 dimensions of the random variable Z_n. Assume that the stochastic processes (X_n)_{n≥0} and (Y_n)_{n≥0} satisfy the following properties:

(1) (Monotonicity) The exit payoff function r is monotone on X; and

(2) (Conditional Independence) Conditional on each Y_n, the next period states (X_{n+1}, Y_{n+1}) and the current state X_n are independent.

We call each random variable X_n the threshold state variable of period n, and each Y_n the environment state vector (or environment states, or environment) of period n. Moreover, we call X the threshold state space and Y the environment space. Assume further, for this threshold state optimal stopping problem, that the flow continuation payoff c is defined on the environment space, i.e., c : Y → R.

Denote by x the threshold state variable and by y the environment, so that the vector of state variables in the current period is z = (x, y). Let z′ = (x′, y′) be the vector of states of the next period. We know from the definition of the threshold state variable that the stochastic kernel P(z, dz′) can be represented by the conditional distribution function of (x′, y′) given y, denoted by F_y(x′, y′), i.e., P(z, dz′) = P((x, y), d(x′, y′)) = dF_y(x′, y′). Notice that under this setup, the continuation value ψ* is a function of y only, while the value function v* is a function of both x and y. So ψ* has strictly fewer arguments than v*. 11

Assumption 4.1. r is strictly monotone on X. Moreover, for all y ∈ Y, there exists x ∈ X such that r(x, y) = c(y) + β ∫ v*(x′, y′) dF_y(x′, y′).

Under Assumption 4.1, the reservation rule property holds. When the exit payoff r is strictly increasing in x, for instance, this property states that if the agent terminates at state x ∈ X at a given point in time, then he would have terminated at any higher state at that moment. Specifically, there is a decision threshold x̄ : Y → X such that when the state variable x attains this threshold level, i.e., x = x̄(y), the agent is indifferent between terminating and continuing, i.e., r(x̄(y), y) = ψ*(y) for all y ∈ Y.

As shown in Theorem 2.1, the optimal policy σ* : Z → {0, 1} satisfies σ*(z) = 1{r(z) ≥ ψ*(z)}. For threshold state optimal stopping problems, this policy is fully specified by the decision threshold x̄. In particular, under Assumption 4.1, the optimal policy is σ*(x, y) = 1{x ≥ x̄(y)} if r is strictly increasing in x, and σ*(x, y) = 1{x ≤ x̄(y)} if r is strictly decreasing in x.

Based on the properties of the continuation value function, the properties of the decision threshold x̄ can be easily established. We summarize them in what follows. Firstly, we have the following result for continuity.

11 In this case, since the threshold state is assumed to be one-dimensional, ψ* has one less argument than v*. In general, the difference in the number of arguments of ψ* and v* can be strictly larger than one.

Proposition 4.1. Suppose that either the assumptions of Proposition 3.1 or of Proposition 3.2 hold, and that Assumption 4.1 holds. Then x̄ is continuous.

The next result provides sufficient conditions for x̄ to be monotone.

Proposition 4.2. Suppose that the assumptions of Proposition 3.3 and Assumption 4.1 hold, and that r is defined on X. If ψ* is increasing and r is strictly increasing (resp. decreasing), then x̄ is increasing (resp. decreasing). If ψ* is decreasing and r is strictly increasing (resp. decreasing), then x̄ is decreasing (resp. increasing).

A typical element y ∈ Y takes the form y = (y_1, ..., y_{m−1}). For i = 1, ..., m − 1 and given functions h : Y → R and l : X × Y → R, define D_i h(y) := ∂h(y)/∂y_i, D_i l(x, y) := ∂l(x, y)/∂y_i, and D_x l(x, y) := ∂l(x, y)/∂x. The following result on the smoothness of x̄ follows from Proposition 3.7 and the implicit function theorem.

Proposition 4.3. Suppose that the assumptions of Proposition 3.7 and Assumption 4.1 hold. Moreover, r is continuously differentiable on int(Z). Then x̄ is continuously differentiable on int(Y). In particular,

D_i x̄(y) = − [D_i r(x̄(y), y) − D_i ψ*(y)] / D_x r(x̄(y), y)  for all y ∈ int(Y).

Intuitively, (x, y) ↦ r(x, y) − ψ*(y) is the premium of terminating the sequential decision process. So the functions (x, y) ↦ D_i r(x, y) − D_i ψ*(y) and (x, y) ↦ D_x r(x, y) denote the instantaneous rate of change in the terminating premium in response to an instantaneous change in the environment state y_i and the threshold state x, respectively. Holding the terminating premium at 0, the change in the premium resulting from a change in x cancels the premium change resulting from the variation in y. Therefore, the instantaneous rate of change of x̄(y) with respect to y_i equals the ratio of the instantaneous rates of change in the premium. The negative sign is due to the zero-sum property of the terminating premium at the decision threshold x̄.

Let x̄_θ be the decision threshold with respect to θ ∈ Θ. We have the following result for parametric continuity.

Proposition 4.4. Suppose that the assumptions of Proposition 3.8 and Assumptions 3.3 and 4.1 hold. Then θ ↦ x̄_θ(y) is continuous for all y ∈ Y.

Example 3.1 (Continued). Recall the firm entry problem of Fajgelbaum et al. (2015). This is a typical threshold state optimal stopping problem. In particular, the threshold state space is X = R_+ and the threshold state variable is x = f. The environment space is Y = R × R_{++} with environment states y = (µ, γ). The value function of the firm satisfies

v*(f, µ, γ) = max { E_{µ,γ}[u(x)] − f, β ∫ v*(f′, µ′, γ′) p(f′, y′|µ, γ) d(f′, y′) }

Since there are 3 state variables, v* is defined on a 3-dimensional space. However, ψ* is defined on a 2-dimensional space, since the environment space has one dimension less. Moreover, the optimal policy is determined by a reservation cost function f̄ : Y → R such that when f = f̄(µ, γ), the firm is indifferent between entering the market and waiting. In particular, f̄(µ, γ) = E_{µ,γ}[u(x)] − ψ*(µ, γ) and the optimal policy is σ*(f, µ, γ) = 1{f ≤ f̄(µ, γ)} for all (f, µ, γ) ∈ Z. By Proposition 4.1, we can show that f̄ is continuous.

5. COMPUTATIONAL EFFICIENCY

The aim of this section is to illustrate the computational efficiency of the continuation value based method relative to traditional value function based methods. Numerical experiments show that the impact of the lower dimensionality of the continuation value can be huge, even when the difference between the arguments of this function and those of the value function is only a single variable. For example, while solving a well known version of the job search model in Section 5.1, continuation value iteration takes only 171 seconds to compute the optimal policy at an accuracy level of 10^{-6} (see the group-3 experiments), as opposed to more than 7 days for value function iteration. Moreover, we do not provide a detailed comparison of the two approaches in Section 5.2, as the computation via the value function takes too long (more than 7 days) due to the curse of dimensionality. However, our approach takes only minutes to compute the optimal policy to a high level of accuracy. Finally, all the applications demonstrate the effectiveness of our approach in characterizing the optimal policy.

5.1. Job Search II. Consider another extension of McCall's job search model, presented by Ljungqvist and Sargent (2012). The model is as in the benchmark case, apart from the fact that the wage offer distribution h is unknown. The worker knows that there are two possible densities f and g. At the start of time, nature selects h to be either f or g. The choice is not observed by the worker, who puts prior probability π_0 on f being chosen. By Bayes' rule, π_t updates via

π_{t+1} = π_t f(w_{t+1}) / [π_t f(w_{t+1}) + (1 − π_t) g(w_{t+1})].

We can express the value function of the unemployed worker recursively as

v*(w, π) = max { w/(1 − β), c + β ∫ v*(w′, π′) h_π(w′) dw′ }

where π′ = q(w′, π) = π f(w′) / [π f(w′) + (1 − π) g(w′)] and h_π(w′) := π f(w′) + (1 − π) g(w′). This is a typical threshold state optimal stopping problem, in which the threshold state variable is w and the environment is π. In particular, ψ* is defined on a space of lower dimension than the state space on which v* is defined, in the sense that ψ* is a function of π only, while v* is a function of both w and π.

Following Ljungqvist and Sargent (2012), we set f = Beta(1, 1) and g = Beta(3, 1.2). The state space is then Z = [0, 2] × [0, 1]. Based on our theory, the optimal policy is characterized by a reservation wage function w̄ : [0, 1] → R such that when w = w̄(π), the worker is indifferent between accepting and rejecting the offer. Denote by b[0, 1] the set of bounded functions on [0, 1]. Consider the Banach space (b[0, 1], ‖·‖) as the space of candidate functions. The continuation value operator defined on this space satisfies

Qψ(π) = c + β ∫ max { w′/(1 − β), ψ(q(w′, π)) } h_π(w′) dw′.    (13)

This is the special case of our theory in which the state space is compact and both exit and flow continuation payoffs are bounded.

Proposition 5.1. When the unemployment compensation satisfies c ∈ [0, 2], the following statements hold:

(1) Q is a well-defined mapping from b[0, 1] into itself, and it is a contraction mapping of modulus β on the Banach space (b[0, 1], ‖·‖).

(2) The value function is v*(w, π) = max { w/(1 − β), ψ*(π) }, the reservation wage is w̄(π) = (1 − β)ψ*(π), and the optimal policy is σ*(w, π) = 1{w ≥ w̄(π)} for all (w, π) ∈ Z.

(3) ψ*, w̄, and v* are continuous functions.

FIGURE 2. The reservation wage

Following Section 6.6 of Ljungqvist and Sargent (2012), we set β = 0.95 and c = 0.6. In the benchmark simulation, the grid points (w, π) lie in [0, 2] × [10^{-4}, 1 − 10^{-4}], with 100 points for the w grid and 50 points for the π grid. As shown in Figure 2, the reservation wage w̄ is decreasing in π. Intuitively, f is a less attractive offer distribution than g, and a larger π means more weight on f and less on g. Therefore, a larger π depresses the worker's assessment of his future prospects, and relatively low current offers become more attractive.
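The following sketch implements the one-dimensional iteration on (13) that underlies Figure 2: ψ is stored on a grid for π, the integral over w′ is computed by simple quadrature on [0, 2], and ψ(q(w′, π)) is evaluated by linear interpolation. It assumes that the Beta densities are rescaled to the wage support [0, 2]; this rescaling, the grid sizes, and the quadrature rule are choices made for this illustration rather than details taken from the paper.

```python
import numpy as np
from scipy.stats import beta as beta_dist

disc, c, w_max = 0.95, 0.6, 2.0   # disc is the discount factor

# Offer densities, rescaled to [0, w_max] (assumed setup)
def f(w):
    return beta_dist.pdf(w / w_max, 1, 1) / w_max

def g(w):
    return beta_dist.pdf(w / w_max, 3, 1.2) / w_max

pi_grid = np.linspace(1e-4, 1 - 1e-4, 50)
w_nodes = np.linspace(1e-6, w_max, 100)      # quadrature nodes for w'
dw = w_nodes[1] - w_nodes[0]

def q(w, pi):
    """Bayesian update of the probability placed on f after observing w."""
    num = pi * f(w)
    return num / (num + (1 - pi) * g(w))

def Q(psi):
    """Continuation value operator (13) evaluated on the pi grid."""
    new_psi = np.empty_like(psi)
    for i, pi in enumerate(pi_grid):
        h = pi * f(w_nodes) + (1 - pi) * g(w_nodes)          # mixture density
        psi_next = np.interp(q(w_nodes, pi), pi_grid, psi)   # psi(q(w', pi))
        integrand = np.maximum(w_nodes / (1 - disc), psi_next) * h
        new_psi[i] = c + disc * np.sum(integrand) * dw
    return new_psi

psi, err, tol = np.zeros(len(pi_grid)), np.inf, 1e-6
while err > tol:
    psi_new = Q(psi)
    err = np.max(np.abs(psi_new - psi))
    psi = psi_new

w_bar = (1 - disc) * psi    # reservation wage: w_bar(pi) = (1 - beta) * psi*(pi)
```

Plotting w_bar against pi_grid should produce a downward-sloping reservation wage curve consistent with the pattern described for Figure 2; note that the wage grid enters only through the quadrature nodes, never as a state dimension.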

Since the computation is 2-dimensional via value function iteration (VFI) and only 1-dimensional via continuation value function iteration (CVI), we can expect the computation via CVI to be much faster. To make a comparison, we conduct several groups of experiments and report the time taken by the two approaches. All the experiments are processed in a standard Python environment on a laptop with a 2.5 GHz Intel Core CPU.

5.1.1. Group-1 Experiments. In this group, we explore the time taken by the two approaches to compute the fixed point at different levels of accuracy and across different parameterizations. Specifically, Table 1 lists the experiments we perform. In all simulations, the setup of the grid points is the same as in the baseline simulation. For each given test and level of accuracy, we run the simulation 50 times for CVI and 20 times for VFI, and calculate the average time. The results are provided in Table 2.

TABLE 1. Group-1 Experiments (values of β and c for Tests 1–5)

TABLE 2. Time Taken in Group-1 Experiments (average VFI and CVI times for Tests 1–5 at each precision level)

As can be seen in Table 2, our method performs much better than VFI. On average, CVI is 141 times faster than VFI. In the best case, CVI is 207 times faster: in Test 5, CVI attains an accuracy of 10^{-3} in only 1.33 seconds. Even in the worst case, CVI is 109 times faster: in Test 5, CVI reaches an accuracy of 10^{-6} in only 2.99 seconds.
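The timings above measure the time needed for successive approximation to reach a fixed sup-norm tolerance. A minimal, hypothetical harness for producing such numbers (a sketch, not the authors' code) simply times the iteration until the sup-norm change falls below the target precision:

```python
import time
import numpy as np

def solve_by_iteration(T, x_init, tol):
    """Iterate x <- T(x) until the sup-norm change is below tol.

    Returns the approximate fixed point, elapsed seconds, and iteration count.
    """
    x = x_init
    start = time.perf_counter()
    n = 0
    while True:
        x_new = T(x)
        n += 1
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, time.perf_counter() - start, n
        x = x_new

# Example usage (illustrative): time CVI at several precisions, given an
# operator Q and an initial guess psi0 such as those in the sketch above.
# for tol in (1e-3, 1e-4, 1e-5, 1e-6):
#     _, seconds, n_iter = solve_by_iteration(Q, psi0, tol)
#     print(tol, seconds, n_iter)
```

The same harness can wrap a Bellman operator on the full (w, π) grid, which is what makes a like-for-like VFI versus CVI timing comparison possible.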

5.1.2. Group-2 Experiments. In applications, more grid points are needed to make the numerical approximation more accurate. In this group of experiments, we compare how the two approaches perform under different grid sizes. The parameterization is the same as in the benchmark setup. Again, we run the simulation 50 times for CVI and 20 times for VFI, and calculate the average time. Information on and results of these experiments are provided in Table 3 and Table 4, respectively.

TABLE 3. Group-2 Experiments (grid sizes for π and w in Tests 2 and 6–10)

TABLE 4. Time Taken in Group-2 Experiments (average VFI and CVI times for Tests 2 and 6–10 at each precision level)

As can be seen, our approach outperforms VFI more markedly as the grid size increases. In Table 4 we see that as we increase the number of grid points for w, the speed of CVI is not affected, while the speed of VFI falls significantly. Amongst Tests 2, 6 and 7, CVI is 19 times faster than VFI on average. In the best case, CVI is 386 times faster: in Test 7, CVI attains an accuracy of 10^{-3} in only 0.9 seconds. As we increase the grid for w from 100 to 200 points, CVI is not affected, but the time taken by VFI almost doubles. Obviously, this is because the grid points for w are not used by CVI, while they are part of the grid for VFI.
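This scaling pattern follows directly from the per-iteration work: each sweep of VFI evaluates an integral at every (w, π) grid point, whereas CVI does so only at every π grid point. A back-of-the-envelope operation count (purely illustrative accounting, assuming one quadrature sum of length n_quad per grid point) makes the asymmetry explicit:

```python
# Rough per-iteration operation counts (illustrative accounting only)
def ops_per_iteration(n_w, n_pi, n_quad):
    vfi = n_w * n_pi * n_quad   # VFI: one quadrature sum per (w, pi) point
    cvi = n_pi * n_quad         # CVI: one quadrature sum per pi point
    return vfi, cvi

for n_w in (100, 200):
    vfi, cvi = ops_per_iteration(n_w, n_pi=50, n_quad=100)
    # doubling n_w doubles the VFI work and leaves the CVI work unchanged
    print(n_w, vfi, cvi, vfi / cvi)
```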

As we increase the grid size for both w and π, the speed of CVI decreases only slightly, whereas the speed of VFI deteriorates almost exponentially. Amongst Tests 2 and 8–10, CVI is again far faster than VFI on average. In Test 10, for example, CVI achieves the precision level 10⁻³ in only 1.83 seconds, which is 386 times faster than VFI.

5.1.3. Group-3 Experiments. Since the total number of grid points increases exponentially in the number of state variables, the speed of computation drops dramatically as the number of states increases. With 3 state variables, for example, VFI suffers from the curse of dimensionality, while CVI still works quite well. To illustrate this point, we consider the parametric class problem with respect to the unemployment compensation c, in which case c is treated as an additional state variable. VFI then has 3 state variables and the computation takes more than 7 days, whereas CVI has only 2 state variables and the computation finishes within 171 seconds. Hence, we can conveniently calculate via CVI the reservation wage as a function of both π and c. Figure 3 provides the result.

FIGURE 3. The reservation wage

This figure, in which a whole class of c values is considered, serves as a generalization of Figure 2. Not surprisingly, the reservation wage increases as c increases, since a higher level of compensation weakens the agent's incentive to enter the labor market.
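For this exercise the only change on the CVI side is that the fixed point is computed on a two-dimensional (π, c) grid, with the update vectorized over the c axis. A minimal sketch, with illustrative grid ranges and the same Bayesian updating as before, might look as follows.

```python
# A minimal sketch of the parametric class problem: c becomes an extra grid
# dimension for CVI, so the fixed point lives on a (pi, c) grid, versus a
# (w, pi, c) grid for VFI.  Grid ranges and sizes are illustrative.
import numpy as np
from scipy.stats import beta as beta_dist

discount = 0.95
w_grid = np.linspace(1e-8, 2.0, 200)
pi_grid = np.linspace(1e-4, 1 - 1e-4, 50)
c_grid = np.linspace(0.0, 1.5, 30)            # grid for unemployment compensation
dw = w_grid[1] - w_grid[0]

f = beta_dist.pdf(w_grid, 1, 1, scale=2)
g = beta_dist.pdf(w_grid, 3, 1.2, scale=2)

def Q(psi):                                   # psi has shape (len(pi_grid), len(c_grid))
    psi_new = np.empty_like(psi)
    for i, pi in enumerate(pi_grid):
        h = pi * f + (1 - pi) * g             # predictive density of w'
        pi_next = pi * f / h                  # updated belief for each w'
        # continuation values at the updated belief, one row per c value
        cont = np.array([np.interp(pi_next, pi_grid, psi[:, j])
                         for j in range(len(c_grid))])
        integrand = np.maximum(w_grid / (1 - discount), cont) * h
        psi_new[i, :] = c_grid + discount * integrand.sum(axis=1) * dw
    return psi_new

psi = np.zeros((len(pi_grid), len(c_grid)))
for _ in range(500):                          # plenty of iterations for modulus 0.95
    psi = Q(psi)

w_bar = (1 - discount) * psi                  # reservation wage over (pi, c), as in Figure 3
```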

5.2. Job Search III. Consider the adaptive search model proposed (though not implemented) in McCall (1970). The model explores how the reservation utility changes in response to the agent's expectation of the mean and variance of the unknown wage offer distribution. Suppose the wage process follows

w = ξ + ε_w,  ε_w ∼ N(0, γ_w),    (14)

where ξ is the persistent component with prior belief ξ ∼ N(µ, γ), and ε_w is a transitory component. The worker's current estimate of the next period wage distribution is f(w′ | µ, γ) = N(µ, γ + γ_w). After observing w′ next period, the posterior belief is ξ | w′ ∼ N(µ′, γ′), where

γ′ = (1/γ + 1/γ_w)⁻¹  and  µ′ = γ′ (µ/γ + w′/γ_w).

The worker has constant absolute risk aversion, with utility u(w) = (1/a)(1 − e^{−aw}), a > 0. Once he accepts the offer, the search process terminates and he obtains the same utility u(w) in each future period. If the agent rejects the offer, he obtains utility c from unemployment compensation and reconsiders next period. The value function satisfies

v*(w, µ, γ) = max{ u(w)/(1 − β), c + β ∫ v*(w′, µ′, γ′) f(w′ | µ, γ) dw′ }.    (15)

This is another threshold state optimal stopping problem. In particular, the threshold state space is X = R with threshold state variable x = w, and the environment space is Y = R × R₊ with environment states y = (µ, γ). Since there are 3 state variables, standard approaches via VFI suffer the curse of dimensionality, and the computation via VFI is as time-consuming as in the Group-3 experiments of Section 5.1. The computation via CVI, however, is only 2-dimensional, and our theory works well.

Notice that the exit payoff is unbounded below. We consider a weight function l : Y → [1, ∞) defined by l(µ, γ) = exp(−aµ + a²γ/2) + 1 and the space of candidate functions (b_l Y, ρ_l). For all ψ ∈ b_l Y, the continuation value operator satisfies

Qψ(µ, γ) = c + β ∫ max{ u(w′)/(1 − β), ψ(µ′, γ′) } f(w′ | µ, γ) dw′,    (16)

where µ′, γ′ and f(w′ | µ, γ) are defined as above. Based on the theory of Section 4, the optimal policy is determined by a reservation wage function w̄ : Y → R such that when w = w̄(µ, γ), the worker is indifferent between accepting and rejecting the job offer.

Proposition 5.2. Suppose that the unemployment compensation satisfies c < 1/a. Then the following statements hold:
(1) Q is a well-defined mapping from b_l Y into itself, and it is a contraction mapping of modulus β on the complete metric space (b_l Y, ρ_l).
(2) For all (w, µ, γ) ∈ Z, the value function is v*(w, µ, γ) = max{ u(w)/(1 − β), ψ*(µ, γ) }, the reservation wage is w̄(µ, γ) = −(1/a) ln[1 − a(1 − β)ψ*(µ, γ)], and the optimal policy is σ*(w, µ, γ) = 1{w ≥ w̄(µ, γ)}.
(3) ψ*, w̄, and v* are continuous functions.
(4) ψ* and w̄ are increasing in µ, and v* is increasing in w and µ.
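As an illustration of how the operator in (16) can be iterated in practice, the sketch below combines the normal updating formulas for (µ′, γ′) with Monte Carlo integration over w′ and linear interpolation of ψ on a (µ, γ) grid. The parameter values, grid ranges and number of draws are illustrative assumptions, not the settings used for Figure 4.

```python
# A minimal sketch of CVI for the adaptive search model: the operator in (16)
# is evaluated by Monte Carlo over w' ~ N(mu, gamma + gamma_w), with psi
# interpolated linearly on a (mu, gamma) grid.  All numbers are illustrative.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

discount, a, c, gamma_w = 0.95, 0.6, 0.6, 1.0
mu_grid = np.linspace(-10.0, 10.0, 60)
gamma_grid = np.linspace(1e-4, 5.0, 30)
draws = np.random.default_rng(0).standard_normal(500)   # common random numbers

def u(w):
    return (1 - np.exp(-a * w)) / a                      # CARA utility

def Q(psi):
    interp = RegularGridInterpolator((mu_grid, gamma_grid), psi,
                                     bounds_error=False, fill_value=None)
    psi_new = np.empty_like(psi)
    for i, mu in enumerate(mu_grid):
        for j, gamma in enumerate(gamma_grid):
            w_next = mu + np.sqrt(gamma + gamma_w) * draws          # w' draws
            gamma_next = 1 / (1 / gamma + 1 / gamma_w)              # posterior variance
            mu_next = gamma_next * (mu / gamma + w_next / gamma_w)  # posterior mean
            points = np.column_stack([mu_next, np.full_like(mu_next, gamma_next)])
            cont = interp(points)                                   # psi(mu', gamma')
            stop = u(w_next) / (1 - discount)                       # exit payoff
            psi_new[i, j] = c + discount * np.maximum(stop, cont).mean()
    return psi_new

psi = np.zeros((len(mu_grid), len(gamma_grid)))
for _ in range(300):                                     # fixed-point iteration
    psi = Q(psi)

reservation_utility = (1 - discount) * psi               # reservation utility over (mu, gamma)
```

In practice the two inner loops would be vectorized, but the structure above mirrors the operator in (16) directly.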

Remark 5.1. When risk aversion is considered, the exit payoff is bounded above, though it is unbounded below. However, it is easy to verify that our theory applies to all settings where the exit payoff takes the form r(w) = aw + b with a, b ∈ R₊, or r(w) = −a e^{−w} + b with a, b ∈ R₊, and the flow continuation payoff satisfies c ≤ b.

Since in the current context (1 − β)ψ* is a monotone transformation of the reservation wage and possesses clear economic intuition, we define it as the reservation utility function and use it for the remaining analysis.

In the simulation, we set β = 0.95 and a = 0.6. To parallel Ljungqvist and Sargent (2012), we set c by transforming their parameterization through the utility function u. The literature provides little guidance on γ_w, so we perform a sensitivity analysis. The grid points (µ, γ) lie in [−50, 50] × [10⁻⁴, 5], with 150 points for the µ grid and 75 points for the γ grid. The grid is scaled to be more dense when the absolute values of µ and γ are small. We set the threshold function outside the grid to its value at the closest grid point. The integration is computed via Monte Carlo with 1000 draws (changing the number of Monte Carlo samples, the grid range and the grid density produces almost the same results). Figure 4 provides the simulation results.

FIGURE 4. The reservation utility

There are several key characteristics in Figure 4. Firstly, in each case the reservation utility is an increasing function of µ, which parallels the result of Proposition 5.2. Naturally, a more optimistic agent (higher µ) expects that higher offers can be obtained, so he will not accept the current offer until the utility it yields is high enough. Secondly, for relatively small values of µ the reservation utility is increasing in γ, but as µ gets large it becomes decreasing in γ. Intuitively, although a pessimistic worker (low µ) expects low wage offers on average, part of the downside risk is chopped off: in the worst case he is guaranteed the unemployment compensation c > 0. A higher level of uncertainty (higher γ) in the offer distribution therefore gives the worker a better chance of drawing a good offer, which pushes up the reservation utility. For an optimistic (high µ) but risk-averse worker, since the choice is irreversible, facing a higher level of uncertainty gives him an incentive to enter the labor market earlier so as to avoid downside risk, which depresses the reservation utility. For similar reasons, increasing γ_w creates a positive effect on the reservation utility when µ is small.

5.3. Job Search IV. We consider another extension of the standard job search model of McCall (1970). Assume that the wage process follows

w_t = η_t + θ_t ξ_t,    (17)
ln θ_t = ρ ln θ_{t−1} + ln u_t,    (18)

where ρ ∈ [−1, 1] is a constant. The sequences {ξ_t} are IID with density h and ∫ ξ h(ξ) dξ < ∞, {η_t} are IID with density v and ∫ η v(η) dη < ∞, and {u_t} are IID LN(0, σ_u²). Moreover, {ξ_t}, {η_t}, and {u_t} are independent, and the sequence {θ_t} is independent of {ξ_t} and {η_t}.

The process in (17) and (18) is general in the sense that it incorporates several standard setups. For example, when {ξ_t} and {η_t} are lognormally distributed, it simplifies to the setup of Kaplan and Violante (2010), where income fluctuation problems are studied. Furthermore, when {ξ_t} is IID N(0, 1), a slight modification turns this process into one that incorporates the standard stochastic volatility model (see, e.g., Taylor, 1982).

We set h = LN(0, σ_ξ²) and v = LN(µ_η, σ_η²). In this case, θ_t and ξ_t are the persistent and transitory components of income, respectively, while u_t is a shock to the persistent component; η_t can be interpreted as social security, gifts, etc. The threshold state space is X = R₊ with threshold state process {w_t}, and the environment space is Y = R₊ with environment process {θ_t}.

This is another example for which computation via VFI lacks efficiency but our method performs very well. The value function of the agent satisfies

v*(w, θ) = max{ w/(1 − β), c + β ∫ v*(w′, θ′) f(θ′ | θ) h(ξ′) v(η′) d(θ′, ξ′, η′) },

and the continuation value operator takes the form

Qψ(θ) = c + β ∫ max{ w′/(1 − β), ψ(θ′) } f(θ′ | θ) h(ξ′) v(η′) d(θ′, ξ′, η′),

where w′ = η′ + θ′ξ′ and f(θ′ | θ) = LN(ρ ln θ, σ_u²) is the density kernel of the Markov process {θ_t}. Suppose ρ ∈ [−1, 1] and β exp(σ_u²/2) < 1; then Assumption 2.1 holds by letting g(θ) = θ^ρ + θ^{−ρ}, m = exp(ρ²σ_u²/2), and d = (1 − β)⁻¹.

Proposition 5.3. Suppose ρ ∈ [−1, 1], λ := β exp(ρ²σ_u²/2) < 1, and the unemployment compensation c ∈ R₊. Then the following statements hold:
(1) Q is a well-defined mapping from b_l Y into itself, and it is a contraction mapping of modulus λ on the complete metric space (b_l Y, ρ_l).
(2) The value function is v*(w, θ) = max{ w/(1 − β), ψ*(θ) }, the reservation wage is w̄(θ) = (1 − β)ψ*(θ), and the optimal policy is σ*(w, θ) = 1{w ≥ w̄(θ)} for all (w, θ) ∈ Z.
(3) ψ* and w̄ are continuously differentiable, and v* is continuous.

We choose β = 0.95 and µ_η = 0 for the baseline parameterization, and set σ_ξ = 0.05 and σ_η = 0.001. In the first simulation, we consider the parametric class problem with respect to c, where we let c ∈ [0, 10] with 50 grid points and set ρ = 1. In the second simulation, we consider the parametric class problem with respect to ρ, where ρ ∈ [0.5, 1] with 20 grid points and we set c = 0.6 as in Ljungqvist and Sargent (2012). We set θ ∈ [10⁻³, 5] with 100 grid points, and the grid is scaled to be more dense when θ is smaller. As before, the reservation wage outside the grid points is set to its value at the closest grid point, and the integration is computed via Monte Carlo with 1000 draws.

We see in Figure 5 that the reservation wage is an increasing function of θ. When the realization of θ is small, the reservation wage is increasing in the unemployment compensation c; when θ gets large, the reservation wage becomes less sensitive to c. Intuitively, once θ is well above c, since the shock is highly persistent (ρ = 1), the reservation wage is determined almost entirely by the realization of the permanent shock. In Figure 6, we see that for any ρ ∈ [0.5, 1] the reservation wage is an increasing function of θ, and the reservation wage function is steeper for larger ρ. Intuitively, ρ measures the degree of income persistence: as ρ gets larger, the effect of a positive shock lasts longer, which pushes up the worker's reservation wage.
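A sketch of the corresponding CVI is given below. The continuation value is a function of θ alone, the expectation is computed by Monte Carlo over (u′, ξ′, η′), and ψ is interpolated linearly on the θ grid. The value of σ_u and the reading of the σ's as standard deviations of the underlying normal shocks are assumptions made for the illustration; the remaining numbers follow the baseline parameterization above.

```python
# A minimal sketch of CVI for the model in (17)-(18).  sigma_u = 0.2 is an
# assumed value (not pinned down above), and the sigmas are treated as standard
# deviations of the underlying normal shocks.
import numpy as np

discount, c, rho = 0.95, 0.6, 1.0
sigma_xi, mu_eta, sigma_eta, sigma_u = 0.05, 0.0, 0.001, 0.2

theta_grid = np.geomspace(1e-3, 5.0, 100)     # denser near zero, as in the text

rng = np.random.default_rng(0)
n = 1000
u_draws = np.exp(sigma_u * rng.standard_normal(n))               # u' ~ LN(0, sigma_u^2)
xi_draws = np.exp(sigma_xi * rng.standard_normal(n))             # xi' ~ LN(0, sigma_xi^2)
eta_draws = np.exp(mu_eta + sigma_eta * rng.standard_normal(n))  # eta' ~ LN(mu_eta, sigma_eta^2)

def Q(psi):
    psi_new = np.empty_like(psi)
    for i, theta in enumerate(theta_grid):
        theta_next = theta**rho * u_draws              # ln theta' = rho ln theta + ln u'
        w_next = eta_draws + theta_next * xi_draws     # w' = eta' + theta' xi'
        cont = np.interp(theta_next, theta_grid, psi)  # psi at next period's theta
        psi_new[i] = c + discount * np.maximum(w_next / (1 - discount), cont).mean()
    return psi_new

psi = np.zeros_like(theta_grid)
for _ in range(400):                                   # discount * exp(sigma_u**2 / 2) < 1 here
    psi = Q(psi)

w_bar = (1 - discount) * psi                           # reservation wage, as in Proposition 5.3
```

Figures 5 and 6 correspond to sweeping a routine of this kind over a grid of c values (with ρ = 1) and over a grid of ρ values (with c = 0.6), respectively.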

FIGURE 5. The reservation wage

FIGURE 6. The reservation wage

6. CONCLUSION

In this paper, we study an alternative solution method for sequential decision problems. The idea involves calculating the continuation value directly. We show that not only is the set of possible applications of this method very broad, but it also turns out to have significant advantages over traditional methods based on the value function.

7. APPENDIX

Denote by (X, 𝒳) a measurable space and by (Y, 𝒴, u) a measure space.

Lemma 7.1. Let p : Y × X → R be a measurable map that is continuous in x. If there exists a measurable map q : Y × X → R₊ that is continuous in x, with q(y, x) ≥ |p(y, x)| for all (y, x) ∈ Y × X, and such that x ↦ ∫ q(y, x) u(dy) is continuous, then the mapping x ↦ ∫ p(y, x) u(dy) is continuous.

Proof. Since q(y, x) ≥ |p(y, x)| for all (y, x) ∈ Y × X, the maps (y, x) ↦ q(y, x) ± p(y, x) are nonnegative and measurable. Let (x_n) be a sequence in X with x_n → x. By Fatou's lemma and the continuity of p and q in x, we have

∫ [q(y, x) ± p(y, x)] u(dy) = ∫ lim inf_n [q(y, x_n) ± p(y, x_n)] u(dy) ≤ lim inf_n ∫ [q(y, x_n) ± p(y, x_n)] u(dy).

From the given assumptions we know that lim_n ∫ q(y, x_n) u(dy) = ∫ q(y, x) u(dy). Combining this with the above inequality yields

± ∫ p(y, x) u(dy) ≤ lim inf_n ( ± ∫ p(y, x_n) u(dy) ).

The two inequalities together imply that ∫ p(y, x_n) u(dy) → ∫ p(y, x) u(dy) as n → ∞, so the mapping x ↦ ∫ p(y, x) u(dy) is continuous.
