Time inconsistency of optimal policies of distributionally robust inventory models

Time inconsistency of optimal policies of distributionally robust inventory models Alexander Shapiro Linwei Xin Abstract In this paper, we investigate optimal policies of distributionally robust (risk averse) inventory models. We demonstrate that if the respective risk measures are not strictly monotone, then there may exist infinitely many optimal policies which are not base-stock and not time consistent. This is in a sharp contrast with the risk neutral formulation of the inventory model where all optimal policies are time consistent. This also extends previous studies of time inconsistency in the robust setting. Keywords: inventory model, base-stock policy, time consistency, distributional robustness, moment constraints, coherent risk measures School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA, e-mail: ashapiro@isye.gatech.edu. Booth School of Business, The University of Chicago, Chicago, Illinois 60637, USA, e-mail: Linwei.Xin@chicagobooth.edu. 1

1 Introduction Consider the following classical inventory model: [ T ] E t=1 c t(x t y t ) + b t [D t x t ] + + h t [x t D t ] + min x t y t s.t. y t+1 = x t D t, t = 1,..., T 1, (1.1) where D 1,..., D T is a (random) demand process, c t, b t, h t are the ordering, backorder penalty and holding costs per unit, respectively, y t is the inventory level and x t y t is the order quantity at time t. The initial inventory level y 1 is given. The decisions x t are viewed as control variables and y t as state variables. Precise meaning of problem (1.1) will be discussed later. We use capital letter for D t viewed as a random variable and d t for its particular realization through the paper. Assume that b t > c t 0, h t > 0, t = 1,..., T. Assuming further that the demand process is stagewise independent 1, it is well known that base-stock policy is optimal for problem (1.1) (e.g., [32]). These base-stock policies are time consistent in the sense that the optimal policy computed at the initial period of the decision process, before any realization of the demand process became available, remains optimal at the later periods. Recently considerable attention was attracted to risk averse and distributionally robust formulations of stochastic programs (e.g., [7],[9],[10],[13],[14],[15],[16],[28],[29],[30], and references therein). In the distributionally robust approach it is argued that the true distribution of the data process is never known exactly and this motivates to consider the worst-distribution approach for a specified family of probability distributions (probability measures). In the risk averse approach the expectation operator is replaced by a risk functional (risk measure) defined on an appropriate space of random variables. In the influential paper Artzner et al [2], it was suggested that a good risk measure should satisfy certain natural conditions (axioms) and such risk measures were called coherent. By duality arguments distributionally robust and risk averse approaches, with coherent risk measures, in a sense are equivalent to each other (e.g., [26, Section 6.3]). In a pioneering paper, Scarf [19] gave an elegant solution for the worst-distribution formulation in case of the static inventory model, T = 1, when only first and second order moments of the demand distribution are specified. An extension of such distributionally robust approach to the multi-period setting, when T > 1, is delicate. We show in this paper that in some distributionally robust formulations of the inventory model, apart from the base-stock policies, there exist infinitely many other optimal policies which are not time consistent. This is somewhat surprising since time inconsistency can happen in seemingly natural formulations already in the two period, T = 2, setting. 1 It is said that the demand process is stagewise independent if D t+1 is independent of (D 1,..., D t), t = 1,..., T 1. 1

When the employed risk functional has a nested form, an optimal policy is time consistent if and only if (iff) it satisfies the respective dynamic programming equations. Therefore time consistent policies always exist provided the dynamic programming equations have solutions. If an optimal policy is not time consistent, then it is inferior to the time consistent policies in the sense that for some realizations of the random data process it is strictly worse than the corresponding time consistent policy although both policies have the same optimal value from the first stage point of view (e.g., [25, Section 5.2]). In the risk neutral setting this does not happen since the expectation operator is strictly monotone. On the other hand, in distributionally robust multistage settings it could happen that there are many optimal solutions which do not satisfy the dynamic programming equations and are not time consistent. In the framework of robust objective, i.e. when using the max-type risk functional, it was shown in [3] that in certain settings affine policies (decision rules) are optimal, but do not satisfy the dynamic programming equations and are not time consistent. Such a phenomenon of time inconsistency is claimed to be by no means an exception, but rather a general fact, intrinsic in any robust multi-stage decision model (e.g., [8]). Time inconsistent policies were also explicitly constructed in an inventory setting in [8]). In the robust setting 2 a concept essentially equivalent to the time consistency was referred to as Pareto Efficiency in [12]. In this paper, we will extend the study to risk averse (distributionally robust) models and further show that time inconsistency is not unique to robust multistage decision making, but may happen for a large class of risk averse/distributionally robust settings. We will demonstrate this by using two inventory models: distributionally robust inventory models defined by the first n-th order moment constraints (Section 3.1), and risk averse inventory models associated with spectral risk measure functionals (Section 3.2). This paper is organized as follows. In Section 2, we give a general discussion of risk averse (distributionally robust) inventory models. Section 3 is devoted to specific constructions of such time inconsistent optimal policies. We also provide concluding remarks in Section 4. 2 Distributionally robust inventory model In this section, we provide a general discussion of risk averse (distributionally robust) inventory models of the form min x t y t [ T ] R t=1 c t(x t y t ) + b t [D t x t ] + + h t [x t D t ] + s.t. y t+1 = x t D t, t = 1,..., T 1. 2 By robust setting we mean the worst case setting, i.e., the max-type objective functional is used. (2.2) 2

Here R( ) is a real valued functional defined on a space of random variables. If R is the expectation operator, i.e., R := E, then (2.2) becomes the risk neutral formulation (1.1). Optimization in (2.2) is performed over policies x t = x t (d [t 1] ), t = 1,..., T, which are nonanticipative functions of the demand process 3. In the risk neutral setting the feasibility constraints of the corresponding problem should be satisfied for almost every (a.e.) realization of the random demand process. In the risk averse formulation this can be more involved since in some settings there is no reference probability measure and it does not make sense to talk about a.e. realization of the demand process. We will discuss this later. Assuming that the functional R can be represented in a nested form (e.g., [17]), it is possible to write the following dynamic programming equations for problem (2.2). At each time period t = T,..., 2, the cost-to-go function V t (y t ) is given by the optimal value of problem with V T +1 ( ) 0 and { min ct (x t y t ) + ρ t [ψ t (x t, D t ) + V t+1 (x t D t )] }, (2.3) x t y t ψ t (x t, d t ) := b t [d t x t ] + + h t [x t d t ] +. (2.4) For t = 1, Problem (2.3) represents the first period optimization problem and its optimal value coincides with the optimal value of Problem (2.2). We assume here that the cost-to-go functions V t ( ) in the above dynamic equations do not depend on history of the demand process. In the risk neutral setting this is ensured by the stagewise independence assumption. In the considered distributionally robust case it is often referred to as the rectangularity condition (e.g., [13], [16]). Consider intervals [α t, β t ] R +, t = 1,..., T, and denote by P t the set of all (Borel) probability measures on [α t, β t ]. The functionals ρ t are assumed to be of the form ρ t (Z) := sup P t M t E Pt [Z], t = 1,..., T, (2.5) where M t is a nonempty subset of P t. If the sets M t = { P t } are singletons for t = 1,..., T, then functionals ρ t = E Pt become the respective expectations. In that case, the functional R = E P is the expectation taken with respect to the product probability distribution P = P 1 P T, supported on the set S := [α 1, β 1 ] [α T, β T ], and equations (2.3) are the standard dynamic programming equations for the risk neutral formulation (1.1) under the assumption of stagewise independence (e.g., [32]). We consider two basic ways for constructing sets M t. In one approach we assume existence of a reference probability measure and the set M t is supposed to consist of probability measures 3 We denote by d [t] := (d 1,..., d t) history of the demand process up to time t. 3

absolutely continuous with respect to the reference measure. In another approach the sets M t are defined by moment constraints. In that case there is no reference measure. By writing that a property holds for every realization of a considered random variable we mean that it holds with probability one, or almost everywhere, with respect to the reference probability measure in the first setting, while in the second setting it supposed to hold for all possible realizations. We will discuss this later while considering specific examples. The risk functional R of Problem (2.2), associated with the dynamic equations (2.5), has the nested form R( ) = ρ 1 (ρ 2 D1 ( ρt D[T 1] ( ) )), (2.6) where E Pt D [t 1] denotes the conditional expectation with respect to the distribution P t conditional on D [t 1] and ρ t D[t 1] ( ) := sup P t M t E Pt D [t 1] [ ], t = 2,..., T. Such nested risk averse formulations of multistage programs are discussed in, e.g., [11],[17],[18],[26], and for inventory models in, e.g., [1],[5],[6]. It is also possible to write the functional R in the form R( ) = sup E P [ ], P M for some set M of probability measures on S. In general, the set M is different from the following set of probability measures M := {P = P 1 P t : P t M t, t = 1,..., T }, (2.7) and can be a complicated derivative of the sets M t used in definition of the functionals ρ t. This happens because of the nested structure of the risk functional R (for a detailed discussion of this phenomenon we can refer to [23]). Of course M = M in the risk neutral case when the sets M t are singletons. There are two other interesting cases where the sets M and M do coincide. Suppose that M t = P t, t = 1,..., T, i.e., M t is the set of all probability measures on the interval [α t, β t ]. Then ρ t is the supremum operator ρ t (Z t ) = sup(z t ) of Z t : [α t, β t ] R, and R(Z) = sup(z) of Z : S R is the respective supremum operator. In that case, M consists of all probability measures on S and coincides with the respective set M. Note that since here the considered functions of the demand are continuous, the respective maximum is attained and is finite and hence we can treat R as the maximum operator. This corresponds to the robust approach to multistage programming, we consider such an example of the maximum operator measure in Section 3.1. 4

As another example, suppose that the sets M t are defined by the first-order moment constraints, that is, M t := {P t P t : E Pt [D t ] = µ t } for some µ t [α t, β t ], t = 1,..., T. In that case, for any convex function φ t : R R the maximum of E Pt [φ t (D t )] over P t M t is attained at a unique probability measure supported on the two end points of the interval [α t, β t ], and hence M = M (e.g., [22]). We consider such an example in Section 3.1. It is also possible to consider situations where the sets M t are defined by general moment constraints, say by the first and second order moments. However, in such cases the set M is generally different from the set M, given by formula (2.7), and could have a complicated structure. In that respect we can note that a different version of distributionally robust inventory model, defined by the first and second order moment constraints, was studied in [31]. More specifically in [31] the set M is defined by a set of product measures as in (2.7), with the respective sets M t defined by the first and second order moments. In that setting it was demonstrated in [31] that the functional R may not be always equivalent to the nested form defined by the sets M t in the sense of having the same optimal value. As a consequence, Problem (2.2) may have no optimal policy of base-stock form. In Section 3.1 we consider a two-period moment constrained problem such that the risk functional R can be represented in a nested form, and there always exists an optimal base-stock policy. There is a long history of discussion of time consistency in multistage settings and there is no general consensus of its precise meaning. One intuitive approach is to ensure that a policy given by an optimal solution of Problem (2.2) remains optimal at all stages t = 1,..., T. This requires a precise definition of what optimality means at every stage t, conditional on the observed realization of the demand process up to the time t. In the risk neutral setting such conditional optimality criterion comes naturally as the respective conditional expectation. In the distributionally robust and risk averse settings the situation is more involved. In the recent literature on risk averse stochastic optimization, time consistency is often discussed from the point of view of employed conditional risk functionals and is related to their decomposability (see, e.g., [4] for a recent survey and references therein). In view of the nested form (2.6) of the considered risk functional, it is natural here to consider conditional optimality criteria of the nested form ( R t D[t 1] ( ) := ρ t D[t 1] ρt D[T 1] ( ) ), t = 2,..., T. This leads to the following notion of time consistency of an optimal policy. 5

Definition 2.1 Let π = { x 1,..., x T (d [T 1] )} be an optimal solution (optimal policy) of problem (2.2). It is said that policy π is time consistent if for every time period t = 2,..., T, the remaining policy { x t (d [t 1] ),..., x T (d [T 1] )} is optimal for the problem min x τ y τ [ T ] R t D[t 1] τ=t c τ (x τ y τ ) + b τ [D τ x τ ] + + h τ [x τ D τ ] + s.t. y τ+1 = x τ D τ, τ = t,..., T 1, conditional on observed realization D [t 1] = d [t 1] of the demand process. Naturally time consistency is related to the respective dynamic programming equations. Definition 2.2 It is said that policy π = { x 1,..., x T (d [T 1] )} satisfies the dynamic programming equations if for t = 1,..., T, x t arg min x t y t { ct (x t y t ) + ρ t [ψ t (x t, D t ) + V t+1 (x t D t )] }, conditional on D [t 1] = d [T 1] for every realization of the demand process. That is, at time t = 2,..., T we observed realization of the demand process, up to this time, and made decision x t 1 according to the considered policy π. Time consistency of π means that, looking into the future, conditional on the demand realization and our decision, policy π remains optimal with respect to the nested tail R t D[t 1] of our optimization criterion. Note that this tail optimization criterion is conditional on the observed realization of the demand process. In the risk neutral case when R := E, the corresponding tail risk measure is given by the respective conditional expectation. We refer to [26, Section 6.8.5] for a further discussion of these concepts in risk averse settings. A natural question, which we address here, is whether every optimal policy is time consistent. Note that each functional ρ t, defined in (2.5), is convex, positively homogeneous, translation equivariant and monotone. Monotonicity means that 4 if Z Z, then ρ t (Z) ρ t (Z ). We will need a stronger notion of monotonicity. Definition 2.3 We say that functional ρ t, of the form (2.5), is strictly monotone if Z Z and Z Z imply ρ t (Z) > ρ(z ). Any policy satisfying the dynamic programming equations is time consistent (in the sense of Definition 2.1); and conversely if an optimal policy is time consistent, then it satisfies the dynamic programming equations 5. Assuming that the considered risk functionals are monotone, it is shown 4 By writing Z Z we mean that this inequality holds for every realization of random variables Z and Z. 5 By saying that policy is optimal we mean that it is an optimal solution of the corresponding multistage problem. 6

in [26, Proposition 6.80] (see also [27]) that if the considered multistage problem has unique optimal solution (optimal policy), then this policy is time consistent and satisfies the respective dynamic programming equations. If the employed risk functionals are strictly monotone, then every optimal policy is time consistent and satisfies the respective dynamic programming equations. On the other hand, as it was discussed in [27] and [24], if the risk functionals are not strictly monotone, then there may exist optimal policies which are not time consistent and do not satisfy the dynamic programming equations. In the robust setting, when M t = P t, this was pointed out in [3] and such an example was explicitly constructed in [8]. As it was already pointed in Section 1, distributionally robust and risk averse approaches with coherent risk measures, in a sense are equivalent to each other. A popular coherent risk measure is the Average Value-at-Risk (AV@R) measure. This risk measure is not strictly monotone, and we consider an example of the AV@R measure in Section 3.2 in a general framework of spectral risk measures. For convex functionals, necessary and sufficient conditions for strict monotonicity are given in [27, Proposition 7] in terms of the respective subdifferentials. If the set M t is defined by a finite number of moment constraints (in particular if M t = P t ), and the interval [α t, β t ] is nondegenerate (i.e., α t < β t ), then the corresponding functional ρ t is never strictly monotone (cf., [24]). In the next section, we are going to demonstrate, in the setting of the inventory model, that time inconsistent optimal policies, which are no longer of base-stock form, exist already in the two period case (when T = 2), if the corresponding risk functional is not strictly monotone. This is somewhat surprising since time inconsistency can happen in seemingly natural formulations already in the two-period setting. In view of the analysis in [24], it seems that without the strict monotonicity, existence of an infinite number of such time inconsistent optimal policies rather is a rule than an exception (in the robust setting this was pointed out in [3],[8]). 3 Examples of time inconsistent optimal policies In this section we investigate existence of time inconsistent optimal policies in the case of two-period inventory model when T = 2. As in (2.7) we assume that the considered distributions of the demand vector (D 1, D 2 ) are of the form P 1 P 2 with P 1 M 1 and P 2 M 2 being marginal distributions of D 1 and D 2 on intervals [α 1, β 1 ] and [α 2, β 2 ] respectively. We consider two settings for defining the set M 1 and functional R. In Section 3.2, we assume existence of a reference probability measure on the interval [α 1, β 1 ]. As it was already mentioned in the previous section, in that case probabilistic statements are made with respect to the reference probability measure and the constraints in problem (3.8) (below) are understood to hold for almost every D 1. In Section 3.1 we assume that the set M 1 consists of all 7

probability measures on [α 1, β 1 ] satisfying the specified moment constraints. In those sections the constraints should be satisfied for all D 1 [α 1, β 1 ]. In both cases the set M 2 does not play an essential role and can be arbitrary. For T = 2 the corresponding problem (2.2) becomes (up to the constant c 1 y 1 ) min x 1,x 2 ( ) (c 1 c 2 )x 1 + R [ψ 1 (x 1, D 1 ) + c 2 (x 2 (D 1 ) + D 1 ) + ψ 2 (x 2 (D 1 ), D 2 )] s.t. x 1 y 1, x 2 (D 1 ) x 1 D 1. (3.8) Since R( ) = ρ 1 (ρ 2 D1 ( )) with functionals ρ 1 and ρ 2 of the form (2.5), the objective function in (3.8) can be written as (c 1 c 2 )x 1 + sup P 1 M 1 E P1 [ [ ψ 1 (x 1, D 1 ) + c 2 (x 2 (D 1 ) + D 1 ) + sup E P2 D 1 ψ2 (x 2 (D 1 ), D 2 ) ] ]. P 2 M 2 The second stage cost-to-go function V 2 (x 1, d 1 ) here is given by the optimal value of the problem { } min c 2 x 2 + sup E P2 [ψ 2 (x 2, D 2 )] s.t. x 2 x 1 d 1, (3.9) x 2 P 2 M 2 and the first stage problem is min (c 1 c 2 )x 1 + sup E P1 [c 2 D 1 + ψ 1 (x 1, D 1 ) + V 2 (x 1, D 1 )]. (3.10) x 1 y 1 P 1 M 1 We have that a policy ( x 1, x 2 (d 1 )) is an optimal solution of problem (3.8) iff x 1 is an optimal solution of problem (3.10) and x 2 ( ) is an optimal solution of [ ψ 1 ( x 1, D 1 ) + c 2 D 1 + c 2 x 2 (D 1 ) + sup min x 2 ( ) sup E P1 P 1 M 1 s.t. x 2 (D 1 ) x 1 D 1. [ E P2 D 1 ψ2 (x 2 (D 1 ), D 2 ) ] ] P 2 M 2 Such optimal policy is time consistent iff x 2 (D 1 ) is an optimal solution of the problem min x 2 ( ) (3.11) c 2 x 2 (D 1 ) + sup E P2 D 1 [ψ 2 (x 2 (D 1 ), D 2 )] P 2 M 2 (3.12) s.t. x 2 (D 1 ) x 1 D 1 for every realization of the random variable D 1. That is, an optimal policy ( x 1, x 2 (d 1 )) is time inconsistent iff x 2 (D 1 ) is not an optimal solution of (3.12) for some set of realizations of D 1. Let x 2 be an optimal solution of the unconstrained version (i.e., after removing the feasibility constraint x 2 x 1 d 1 ) of problem (3.9). If the unconstrained version of problem (3.9) has multiple optimal solutions, we assume that x 2 is the largest one when constructing time inconsistent policies. Then x 2 (d 1 ) := max{x 2, x 1 d 1 } is an optimal solution of the second stage problem (3.12). That is, the set of time consistent policies coincides with the basestock policies. (There is unique time 8

consistent policy if the first stage optimal solution x 1 is unique and the unconstrained version of problem (3.9) has unique optimal solution.) Therefore time consistent optimal policies always exist. Any time inconsistent optimal policy (if it exists) is inferior to the time consistent solutions in the sense that for some realizations of the demand D 1 the corresponding second stage cost is strictly worse than the one of time consistent policy. The time inconsistent optimal policies do not happen if the functional ρ 1 is strictly monotone. We are going to demonstrate on several seemingly natural examples that without strict monotonicity, besides time consistent, there may exist many optimal time inconsistent policies. By the above discussion this does not happen if problem (3.11) has unique optimal solution. Let x 1 be an optimal solution of the first stage optimization problem and consider function Ψ(x 2, d 1 ) := ψ 1 ( x 1, d 1 ) + c 2 d 1 + c 2 x 2 + sup P 2 M 2 E P2 [ ψ2 (x 2, D 2 )], (3.13) where ψ t (, ) are defined in (2.4). Note that the function Ψ(, ) is convex. We assume that the expectation, with respect to P 2 M 2, term in (3.13) is well defined and the corresponding supremum is finite valued for every considered x 2. Then optimal solutions (optimal policies) of the respective problem (3.8) are ( x 1, ˆx 2 (d 1 )) with ˆx 2 (d 1 ) being an optimal solution of the problem min x 2 ( ) R[Ψ(x 2 (D 1 ), D 1 )] s.t. x 2 (D 1 ) x 1 D 1, (3.14) where R(Z) = sup P M 1 E P [Z]. We consider two settings for defining the set M 1 and the associated functional R. In Section 3.2, we assume existence of a reference probability measure P on the interval [α 1, β 1 ]. As it was already mentioned, in that case the constraints in (3.14) are understood to hold for almost every D 1 with respect to the probability measure P (i.e., with probability one). In Section 3.1 we assume that the set M 1 consists of all probability measures on [α 1, β 1 ] satisfying the specified moment constraints. In those sections the constraints should be satisfied for all D 1 [α 1, β 1 ]. We write x 2 ( ) in order to emphasize that the minimization in (3.14) is performed over functions of d 1 [α 1, β 1 ]. By interchanging the minimization and risk functional R in (3.14), we obtain problem [ ] R min Ψ(x 2, D 1 ). (3.15) x 2 x 1 D 1 Then because of the monotonicity of R, problems (3.14) and (3.15) have the same optimal value, denoted ϑ (e.g., [24]). Here the optimal policy ( x 1, ˆx 2 (d 1 )) is time consistent if ˆx 2 (d 1 ) is optimal 9

for the second period problem for all d 1 [α 1, β 1 ]. That is the policy ( x 1, ˆx 2 (d 1 )) is time consistent if ˆx 2 (d 1 ) is an optimal solution of the second period problem for any realization d 1 [α 1, β 1 ]. min Ψ(x 2, d 1 ) (3.16) x 2 x 1 d 1 Let x 2 be the (largest) unconstrained minimizer of Ψ(x 2, d 1 ) over x 2 R as defined above. Note that x 2 is the minimizer of c 2x 2 + sup P2 M 2 E P2 [ ψ2 (x 2, D 2 )] and does not depend on d 1. Then x 2 (d 1 ) := max{ x 1 d 1, x 2} is the optimal solution of (3.16) for any d 1 [α 1, β 1 ]. Therefore, [ ] ϑ = R min Ψ(x 2, D 1 ) = R[Ψ( x 2 (D 1 ), D 1 )]. x 2 x 1 D 1 Consider the following min-function, associated with problem (3.16) and first stage solution x 1, where ϕ(d 1 ) := min x 2 x 1 d 1 Ψ(x 2, d 1 ) = Ψ( x 2 (d 1 ), d 1 ) = ψ 1 ( x 1, d 1 ) + c 2 d 1 + φ(d 1 ), (3.17) φ(d 1 ) := { } [ min c 2 x 2 + sup E P2 ψ2 (x 2, D 2 )]. x 2 x 1 d 1 P 2 M 2 [ Recall that it is assumed that sup E P2 ψ2 (, D 2 )] is finite, hence the functions φ( ) and ϕ( ) are P 2 M 2 finite valued. The minimization problem (3.16) can be written as minimization of the extended real valued function Ψ(x 2, d 1 ) := Ψ(x 2, d 1 ) if x 2 x 1 d 1, and Ψ(x 2, d 1 ) := + otherwise. Since Ψ(, ) and hence Ψ(, ) are convex, it follows that the function ϕ( ) is convex and hence is continuous. Remark 3.1 Note that if x 1 y 1 and a (measurable) function x 2 : [α 1, β 1 ] R is such that x 2 ( ) x 2 ( ), then the corresponding policy π = ( x 1, x 2 ( )) is feasible for Problem (3.8). Moreover, if this policy is optimal, i.e. R[Ψ( x 2 (D 1 ), D 1 )] = ϑ, then it is time consistent iff x 2 ( ) = x 2 ( ). If such optimal policy is not time consistent, i.e., x 2 ( ) x 2 ( ), then for any t (0, 1) the policy ( x 1, t x 2 ( )+(1 t) x 2 ( )) is also optimal and time inconsistent. Therefore if there exists one optimal time inconsistent policy, then there is an infinity number of such time inconsistent policies. In the rest of the section, we demonstrate existence of time inconsistent optimal policies for some examples of not strictly monotone R. Again M 2 can be arbitrary and we will focus on the discussion of M 1. 10

3.1 Moments constraints In this section we discuss existence of time inconsistent policies when the set M 1 of probability measures is defined by the first n-th moment constraints for any n 0. 3.1.1 The problem of moments Since in this section we deal only with the first stage interval, in order to simplify notation we drop the subscripts and write [α, β] for this interval. We recall in this section some classical results for the problem of moments. We address the question for what points µ := (µ 0, µ 1,..., µ n ) R n+1 there exists a measure P on the interval [α, β] satisfying the moment constraints β α t k dp (t) = µ k, k = 0,..., n. (3.18) If such a (nonnegative) measure P exists we say that it is a representing measure. By the Richter - Rogosinski Theorem we have that if such representing measure exists, then there exists representing measure P having support of no more than n + 1 points, i.e., P = l i=1 p iδ ti for some 6 t i [α, β] and l n + 1. Note that µ 0 is the normalization constant. That is, if P is a positive representing measure, then µ 0 > 0 and Q := µ 1 0 P is a probability measure satisfying the moment constraints E Q [D k ] = µ k /µ 0, k = 1,..., n. By C n+1 we denote the set of points µ R n+1 for which there exists at least one representing measure. Note that C n+1 includes the null vector corresponding to the zero measure. The set C n+1 is a convex closed cone. For µ R n+1 consider the following so-called Hankel matrices. For even n = 2m consider matrices defined as H 2m ( µ) = [µ i+j ] m i,j=0 and H 2m ( µ) = [(α + β)µ i+j+1 µ i+j+2 αβµ i+j ] m 1 i,j=0, and for odd n = 2m + 1 consider matrices defined as H 2m+1 ( µ) = [µ i+j+1 αµ i+j ] m i,j=0 and H 2m+1 ( µ) = [βµ i+j µ i+j+1 ] m i,j=0. Note that matrices H 2m ( µ) and H 2m ( µ) are of the respective orders (m + 1) (m + 1) and m m; and matrices H 2m+1 ( µ) and H 2m+1 ( µ) are of order (m + 1) (m + 1). In the following theorem we summarize classical results relevant to our problem (e.g., [20, Chapter 10]). Recall that the boundary of the convex closed cone C n+1 is the set C n+1 \ int(c n+1 ), where int(c n+1 ) denotes the interior of C n+1. 6 By δ t we denote measure of mass one at point t. 11

Theorem 3.1 (i) Point µ R n+1 belongs to the cone C n+1 iff the matrices H n ( µ) and H n ( µ) are positive semidefinite. (ii) Point µ R n+1 belongs to the interior of C n+1 iff the matrices H n ( µ) and H n ( µ) are positive definite. (iii) If µ R n+1 is a boundary point of C n+1, then the corresponding representing measure is unique. (iv) If µ is an interior point of C n+1, then for any d [α, β] there exists a representing measure P having a positive mass at the point d. For example consider n = 1. Then Hankel matrices are 1 1 matrices [µ 1 αµ 0 ] and [βµ 0 µ 1 ]. Hence the corresponding cone C 2 = {(µ 0, µ 1 ) : µ 1 αµ 0 0, βµ 0 µ 1 0}, which implies that µ 0 0. For µ 0 = 1 the corresponding representing set of probability measures is nonempty iff µ 1 [α, β]. The interior of C 2 is defined by the constraints µ 1 αµ 0 > 0 and βµ 0 µ 1 > 0, and boundary happens when µ 1 αµ 0 = 0 or βµ 0 µ 1 = 0. For n = 2 the corresponding Hankel matrices are [ ] µ0 µ 1 H 2 ( µ) = µ 1 µ 2 and H 2 ( µ) = [(α + β)µ 1 µ 2 αβµ 0 ], and hence C 3 = {(µ 0, µ 1, µ 2 ) : µ 0 0, µ 0 µ 2 µ 2 1 0, (α + β)µ 1 µ 2 αβµ 0 0}. For µ 0 = 1, in terms of the variance σ 2 := µ 2 µ 2 1, the corresponding set of feasible solutions is defined by the inequalities σ 2 0 and σ 2 (β µ 1 )(µ 1 α). Boundary solutions happen if µ 1 [α, β] and σ 2 = 0 or σ 2 = (β µ 1 )(µ 1 α). 3.1.2 Existence of time inconsistent policies We discuss now existence of time inconsistent policies when the set M 1 is defined by moment constraints E P [D k ] = µ k, k = 1,..., n, i.e., M 1 consists of the representing measures, satisfying (3.18), with µ 0 = 1. As before we use notation µ := (µ 0, µ 1,..., µ n ) R n+1, with µ 0 = 1. We assume that the problem of moments has a feasible solution, i.e., µ C n+1. If vector µ is a boundary point of the cone C n+1, then as it was pointed in Theorem 3.1(iii), the corresponding set M 1 = {P } is a singleton with P = l i=1 p iδ ti and l n + 1. Then R[Ψ(x 2 (D), D] = E P [Ψ(x 2 (D), D] = l p i Ψ(x 2 (t i ), t i ). (3.19) i=1 Let x 2 (d 1 ) := x 2 (d 1 ) for d 1 {t 1,..., t l } and x 2 (d 1 ) > x 2 (d 1 ) otherwise. Clearly policy x 2 ( ) is feasible, and because of (3.19) the corresponding optimal value for x 2 (d 1 ) and x 2 (d 1 ) is the same. Hence there is an infinite number of time inconsistent policies. Note that although here the set 12

M 1 is a singleton, and hence the problem becomes the risk neutral problem with respect to the corresponding distribution P, existence of time inconsistent policies follows because the distribution P has a finite support which is smaller than the whole interval [α 1, β 1 ]. Now suppose that µ is an interior point of C n+1. Consider function ϕ( ) defined in (3.17) and the problem max E P [ ϕ(d)] subject to E P [D k ] = µ k, k = 1,..., n. (3.20) This problem of maximization with respect to probability measures satisfying the moment constraints, has a dual which can be written as the following semi-infinite programming problem (e.g., [26, Section 6.6]) min z R n+1 z 0 + µ 1 z 1 +... + µ n z n s.t. ϕ(t) z 0 + tz 1 +... + t n z n, t [α 1, β 1 ]. (3.21) Since the interval [α 1, β 1 ] is a compact set, there is no duality gap between the primal problem (3.20) and its dual problems (3.21). Moreover, the set of optimal solutions of the dual problem (3.21) is nonempty and bounded iff µ is an interior point of C n+1 (e.g., [21, Proposition 3.4]). For a feasible point z of problem (3.21) consider the index set of active constraints (z) := {t [α 1, β 1 ] : ϕ(t) = z 0 + tz 1 +... + t n z n }. (3.22) We need the following auxiliary result about the set (z) before stating a necessary and sufficient condition for time inconsistency. Lemma 3.1 If ( z) = [α 1, β 1 ] for some optimal solution z of problem (3.21), then any P M 1 is an optimal solution of problem (3.20). Proof. that Suppose that ( z) = [α 1, β 1 ] for some optimal solution z. Then for any P M 1 we have E P [ ϕ(d)] = E P [ z 0 + z 1 D +... + z n D n ] = z 0 + z 1 µ 1 +... + z n µ n. That is, E P [ ϕ(d)] is the same for every P M 1, and hence every P M 1 is an optimal solution of problem (3.20). Now we state the first main result of the paper, which provides a necessary and sufficient condition for time inconsistency in the moment setting. Theorem 3.2 Suppose that µ is an interior point of C n+1. Then there exists an infinite number of optimal time inconsistent policies if and only if the dual problem (3.21) has an optimal solution z such that ( z) [α 1, β 1 ]. (3.23) 13

Proof. Let µ be an interior point of the cone C n+1, and hence the dual problem (3.21) possesses an optimal solution z. Suppose that condition (3.23) holds. Note that since the objective function of the dual problem is linear, the corresponding set ( z) of active constraints is nonempty. By the first order optimality conditions we have that a feasible point z of problem (3.21) is optimal iff there exist points t i ( z), and λ i 0, i = 1,..., k, with k n + 1, such that µ k λ i g i = 0, i=1 where g i := (1, t i,..., t n i ) Rn+1. Let us note that the set ( z) is closed, and by the assumption (3.23) we can choose z such that ( z) [α, β]. So we can choose a nonempty close interval I [α, β] \ ( z). Consider a policy x 2 ( ) such that x 2 (d 1 ) := x 2 (d 1 ) for all d 1 [α 1, β 1 ]\I, and x 2 (d 1 ) := x 2 (d 1 )+ε for d 1 I and some ε > 0. Clearly this policy is feasible and hence the value sup E P [Ψ( x 2 (D 1 ), D 1 )] (3.24) P M 1 is not less than ϑ. Let us show that for ε > 0 small enough actually the value of (3.24) is ϑ, and hence the policy with x 2 ( ) replaced by x 2 ( ) is also optimal. Consider function ϕ(d 1 ) := Ψ( x 2 (d 1 ), d 1 ) and the dual semi-infinite program of problem (3.24). We have that ϕ(d 1 ) = ϕ(d 1 ) for d 1 [α 1, β 1 ] \ I, and sup d1 I ϕ(d 1 ) ϕ(d 1 ) can be made arbitrary small for sufficiently small ε > 0. Since ϕ(t) < z 0 + t z 1 +... + t n z n for all t I, we can choose ε > 0 small enough such that ϕ(t) < z 0 + t z 1 +... + t n z n for all t I, and hence ϕ(t) z 0 + t z 1 +... + t n z n for all t [α 1, β 1 ]. It follows that z is a feasible point of the dual semi-infinite program of problem (3.24). Moreover, the corresponding set of active constraints, which is obtained by replacing ϕ in (3.22) with ϕ, does not change. By the first order optimality conditions the optimal solution z of problem (3.21) is also an optimal solution of the dual of problem (3.24), and hence these two semi-infinite programs have the same optimal value ϑ. It follows that the optimal value of problem (3.24) is also ϑ. This shows existence of an optimal solution different from x 2 ( ). Conversely suppose that condition (3.23) does not hold, which implies that ( z) = [α 1, β 1 ] for some dual optimal solution z. Then by Lemma 3.1, any P M 1 is optimal for the problem (3.20). We argue now by a contradiction. Suppose that there is an optimal time inconsistent policy ( x 1, x 2 ( )). It follows that there exists d 0 [α 1, β 1 ] such that x 2 (d 0 ) x 2 (d 0 ) and Ψ( x 2 (d 0 ), d 0 ) < Ψ( x 2 (d 0 ), d 0 ). Also by optimality of x 2 ( ) we have that Ψ( x 2 (d), d) Ψ( x 2 (d), d) for all d [α 1, β 1 ]. Moreover, by Theorem 3.1(iv), there exists P 0 M 1 that has a positive mass at the point d 0. Hence ϑ = E P0 [Ψ( x 2 (D 1 ), D 1 )] < E P0 [Ψ( x 2 (D 1 ), D 1 )] sup E P [Ψ( x 2 (D 1 ), D 1 )]. P M 1 14

This implies that the policy ( x 1, x 2 ( )) is not optimal, giving the designed contradiction. The assumption (3.23) could be violated only in rather specific cases. For example in the robust setting, which can be viewed as n = 0, problem (3.21) becomes the problem of minimization of z 0 subject to z 0 ϕ(t), t [α 1, β 1 ]. Since function ϕ( ) is not constant on the interval [α 1, β 1 ] this assumption always holds, and hence in the robust setting there always exists an infinite number of time inconsistent policies (cf., [8]). Corollary 3.1 In the robust setting (i.e., n = 0), there exists an infinite number of time inconsistent policies. Suppose now that n 1. The following corollary states that optimal time inconsistent policies always exist if the first stage problem solution belongs to (α 1, β 1 ). Corollary 3.2 Suppose that n 1 and the first stage problem (3.10) has an optimal solution x 1 such that α 1 < x 1 < β 1. Then there exists an infinite number of optimal time inconsistent policies. Proof. The subdifferential of the function ψ 1 ( x 1, ) is not a singleton at d 1 = x 1. By (3.17) it follows that ϕ(d 1 ) is not differentiable at d 1 = x 1. Since x 1 is an interior point of the interval [α 1, β 1 ], it follows that ϕ( ) cannot coincide with a polynomial on the [α 1, β 1 ]. The proof can be concluded by applying Theorem 3.2. Theorem 3.2 and Corollary 3.2 suggest that only when x 1 is equal to one of the edge points of the interval [α 1, β 1 ] or falls outside of the interval, it could happen that the problem does not have time inconsistent policies. We demonstrate this point through the following result. This shows a difference from the robust setting in which optimal time inconsistent policies always exist. Corollary 3.3 Suppose that n 1 and β 1 x 1 x 2. Then there are no optimal time inconsistent policies. Proof. Under the conditions, we have ϕ(d 1 ) = h 1 ( x 1 d 1 ) + c 2 d 1 + φ(0), (3.25) which is linear in d 1. Hence there exists an optimal solution of problem (3.21) that exactly coincides ϕ(d 1 ). Therefore, Assumption (3.23) is violated and the proof is complete. Finally, we would like to point out that the duality argument in the proof of Theorem 3.2 is quite general and does not rely on the specific inventory setting. 15

3.2 Spectral risk measures In this section we discuss existence of time inconsistent policies when the functional R is a spectral risk measure. We assume existence of a reference probability measure P, associated with demand D 1, and that the support of P is the interval [α 1, β 1 ], i.e., P([α 1, β 1 ]) = 1 and for any nonempty open set A [α 1, β 1 ] it follows that P(A) > 0. Spectral risk measure functional is defined as R(Z) := 1 0 σ(t)f 1 Z (t)dt, (3.26) where F Z (z) = P(Z z) is the cumulative distribution function (cdf) of Z (with respect to the reference measure P), F 1 Z (t) = inf{z : F Z(z) t} is the respective quantile, and σ : [0, 1) R + is a monotonically nondecreasing, right side continuous function such that 1 0 σ(t)dt = 1. For example 7 for σ( ) := (1 γ) 1 1 [γ,1] ( ), with γ [0, 1), this becomes the so-called Average Valueat-Risk measure. Spectral risk measures are law invariant and coherent in the sense of [2]. It is known that spectral risk measure (3.26) is strictly monotone iff the corresponding spectral function σ(t) is strictly positive for all t (0, 1) (e.g., [26]). In other words spectral risk measure (3.26) is not strictly monotone iff there exists γ (0, 1) such that σ(γ) = 0. In particular the AV@R γ risk measure is not strictly monotone for any γ (0, 1). Now we state the second main result of the paper, which provides general sufficient conditions for time inconsistency when the spectral risk measure functional is not strictly monotone. Recall that function ϕ( ), defined in (3.17), is convex and continuous. We make the following assumption. Assumption 3.1 The function ϕ( ) is not constant on any subinterval of the interval [α 1, β 1 ]. Theorem 3.3 Consider spectral risk measure (3.26). Suppose that σ(γ) = 0, for some γ (0, 1), and Assumption 3.1 holds. Then there exists an infinite number of optimal time inconsistent policies. Proof. Let F (z) := P( ϕ(d 1 ) z) be the cdf of random variable ϕ(d 1 ). Consider z := F 1 (γ). We have that P( ϕ(d 1 ) z ) γ and P( ϕ(d 1 ) < z ) γ. Since ϕ( ) is not constant on the interval {d 1 [α 1, β 1 ] : ϕ(d 1 ) z }, it follows that the set {d 1 [α 1, β 1 ] : ϕ(d 1 ) < z } is nonempty. Since the function ϕ is convex continuous, this set is an open interval. Let A be a nondegenerate closed subinterval of {d 1 [α 1, β 1 ] : ϕ(d 1 ) < z }. Since [α 1, β 1 ] is the support of P, it follows that P(A) > 0. We have then that there exists (measurable) function x 2 : [α 1, β 1 ] R such that x 2 (d 1 ) = x 2 (d 1 ) for all d 1 [α 1, β 1 ] \ A, and x 2 (d 1 ) > x 2 (d 2 ) and Ψ( x 2 (d 1 ), d 1 ) < z for all d 1 A. Consider the 7 By 1 A we denote the indicator function of set A. 16

cdf F ( ) of the random variable Ψ( x 2 (D 1 ), D 1 ), i.e., F (z) := P(Ψ( x2 (D 1 ), D 1 ) z). We have then that F 1 (t) = F 1 (t) for all t [γ, 1]. Combining with the monotonicity of σ(t), it follows that R( Z) = R( Z), where Z := ϕ(d 1 ) and Z := Ψ( x 2 (D 1 ), D 1 ). Thus the policy ( x 1, x 2 (d 1 )) has the same value as the policy ( x 1, x 2 (d 1 )), and hence is also optimal. Assumption 3.1 is a mild condition, it could be violated in rather specific cases. For example, this assumption holds if x 1 x 2 and h 1 > c 2. Indeed, under this conditions, one can check that ϕ(d 1 ) = h 1 ( x 1 d 1 ) + + b 1 (d 1 x 1 ) + + c 2 d 1 + φ(0), which satisfies Assumption 3.1. 4 Conclusion In this paper, we investigated optimal policies of distributionally robust (risk averse) inventory models. We demonstrated through several examples that if the respective risk measures are not strictly monotone, then there may exist infinitely many optimal policies which are not base-stock and not time consistent. This is in a sharp contrast with the risk neutral formulation of the inventory model where all optimal policies are time consistent. This extends previous studies of time inconsistency in the robust setting. Acknowledgement Work of the first author was partly supported by NSF grant 1633196 and DARPA EQUiPS program, grant SNL 014150709. References [1] S. Ahmed, U. Cakmak, and A. Shapiro, Coherent risk measures in inventory problems, European Journal of Operational Research, 182, 226-238, 2007. [2] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance, 9, 203 228, 1999. [3] D. Bertsimas, D. Iancu and P. Parrilo, Optimality of Affine Policies in Multistage Robust Optimization, Mathematics of Operations Research, 35, 363-394, 2010. [4] T.R. Bielecki, I. Cialenco, M. Pitera, A survey of time consistency of dynamic risk measures and dynamic performance measures in discrete time: LM-measure perspective, Probability, Uncertainty and Quantitative Risk, 2(3), 1-52, 2017. 17

[5] X. Chen, M. Sim, D. Simchi-Levi, and P. Sun, Risk Aversion in Inventory Management, Operations Research, 55(5), 828-842, 2007. [6] X. Chen and P. Sun, Optimal structural policies for ambiguity and risk averse inventory and pricing models, SIAM Journal on Control and Optimization, 50(1), 133-146, 2012. [7] F. J. C. T. de Ruiter, R.C. M. Brekelmans and D. den Hertog, The impact of the existence of multiple adjustable robust solutions, Math. Program., Ser. A, 160, 531-545, 2016. [8] E. Delage and D. Iancu, Robust Multistage Decision Making, INFORMS Tutorials in Operations Research, http://dx.doi.org/10.1287/educ.2015.0139, 2015. [9] E. Delage and Y. Ye, Distributionally robust optimization under moment uncertainty with application to data-driven problems, Operations Research, 58, 595 6127, 2010. [10] P.M. Esfahani and D. Kuhn, 2017, Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations, Mathematical Programming, to appear. [11] D.A. Iancu, M. Petrik, and D. Subramanian, Tight approximations of dynamic risk measures, Mathematics of Operations Research, 40(3), 655-682, 2015. [12] D.A. Iancu and N. Trichakis, Pareto Efficiency in Robust Optimization, Management Science, 60(1), 130 147, 2014. [13] G.N. Iyengar, Robust dynamic programming, Mathematics of Operations Research, 30, 257 280, 2005. [14] R. Jiang and Y. Guan, Data-driven chance constrained stochastic program, Mathematical Programming, 158, 291-327, 2016. [15] H. Lam and S. Ghosh, Iterative methods for robust estimation under bivariate distributional uncertainty, Proceedings of the Winter Simulation Conference, 2013. [16] A. Nilim and L. El Ghaoui, Robust control of Markov decision processes with uncertain transition matrices, Operations Research, 53, 780 798, 2005. [17] A. Ruszczyński and A. Shapiro, Conditional risk mappings, Mathematics of Operations Research, 31, 544 561, 2006. [18] A. Ruszczyński, Risk-averse dynamic programming for Markov decision processes, Mathematical Programming, Series B, 125, 235 261, 2010. 18

[19] H. Scarf, A min-max solution of an inventory problem, Studies in the Mathematical Theory of Inventory and Production (Chapter 12), 1958. [20] K. Schmüdgen, The Moment Problem, Springer International Publishing AG, Switzerland, 2017. [21] A. Shapiro, On duality theory of conic linear problems, in Semi-Infinite Programming: Recent Advances, Miguel A. Goberna and Marco A. Lopez, Eds., Kluwer Academic Publishers, pp. 135 165, 2001. [22] A. Shapiro, Minimax and risk averse multi-stage stochastic programming, European Journal of Operational Research, 219, 719 726, 2012. [23] A. Shapiro, Rectangular sets of probability measures, Operations Research, 64, 528-541, 2016. [24] A. Shapiro, Interchangeability principle and dynamic equations in risk averse stochastic programming, Operations Research Letters, 45, 377-381, 2017. [25] A. Shapiro, Tutorial on risk neutral, distributionally robust and risk averse multistage stochastic programming, http://www.optimization-online.org/db FILE/2018/02/6455.pdf, 2018. [26] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, Second Edition, SIAM, Philadelphia, 2014. [27] A. Shapiro and A. Pichler, Time and Dynamic Consistency of Risk Averse Stochastic Programs, http://www.optimization-online.org/db HTML/2016/09/5654.html, 2016. [28] Z. Wang, P.W. Glynn, and Y. Ye, Likelihood robust optimization for data-driven problems, Computational Management Science, 13, 241-261, 2016. [29] W. Wiesemann, D. Kuhn, and B. Rustem, Robust Markov decision processes, Mathematics of Operations Research, 38, 153 183, 2013. [30] W. Wiesemann, D. Kuhn, and M. Sim, Distributionally robust convex optimization, Operations Research, 62, 1358 1376, 2014. [31] L. Xin, D.A. Goldberg, and A. Shapiro, Time (in) consistency of multistage distributionally robust inventory models with moment constraints, arxiv preprint arxiv:1304.3074, 2013. [32] P.H. Zipkin, Foundations of Inventory Management, McGraw-Hill, Boston, 2000. 19