Risk Measures. A. Shapiro, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA. ICSP 2016
2 Min-max (distributionally robust) approach to stochastic programming:

  Min_{x ∈ X} { f(x) := sup_{μ ∈ M} E_μ[F(x, ω)] },

where X ⊂ R^n, F : R^n × Ω → R, and M is a set of probability measures (distributions) on the sample space (Ω, F).

Optimization of mean-risk models:

  Min_{x ∈ X} ρ[F_x(ω)],

where ρ : Z → R is a mean-risk function, Z is a (linear) space of allowable functions Z(ω), and F_x(·) = F(x, ·) ∈ Z for all x ∈ X.
3 Axiomatic approach (coherent measures of risk), by Artzner, Delbaen, Eber, Heath (1999):

(A1) Convexity: ρ(αZ_1 + (1 - α)Z_2) ≤ αρ(Z_1) + (1 - α)ρ(Z_2) for all Z_1, Z_2 ∈ Z and α ∈ [0, 1].
(A2) Monotonicity: If Z_1, Z_2 ∈ Z and Z_2 ≥ Z_1, then ρ(Z_2) ≥ ρ(Z_1).
(A3) Translation Equivariance: If a ∈ R and Z ∈ Z, then ρ(Z + a) = ρ(Z) + a.
(A4) Positive Homogeneity: ρ(αZ) = αρ(Z) for all Z ∈ Z, α > 0.

If ρ satisfies (A1)-(A3), it is said that ρ is a convex risk measure.
4 Space Z := L_p(Ω, F, P), where P is a (reference) probability measure on (Ω, F) and p ∈ [1, ∞). That is, Z is the space of random variables Z(ω) having finite p-th order moments. Space Z is paired with its dual space Z* = L_q(Ω, F, P), where 1/p + 1/q = 1, and the scalar product (bilinear form)

  ⟨ζ, Z⟩ := ∫_Ω ζ(ω)Z(ω) dP(ω),  ζ ∈ Z*, Z ∈ Z.

We also consider the space Z := L_∞(Ω, F, P) of essentially bounded (measurable) functions Z : Ω → R, paired with the space L_1(Ω, F, P).

Theorem 1 (i) If ρ : Z → R satisfies axioms (A1) (convexity) and (A2) (monotonicity), then ρ(·) is continuous in the norm topology of the space Z = L_p(Ω, F, P), p ∈ [1, ∞]. (ii) If ρ : Z → R ∪ {+∞}, p ∈ [1, ∞), is proper, satisfies axioms (A1)-(A3), and its domain {Z ∈ Z : ρ(Z) < +∞} has nonempty interior, then ρ is finite valued and continuous on Z.
5 Dual representation of risk functions. By the Fenchel-Moreau theorem, if ρ : Z → R is convex (assumption (A1)) and lower semicontinuous, then

  ρ(Z) = sup_{ζ ∈ A} { ⟨ζ, Z⟩ - ρ*(ζ) },

where

  ρ*(ζ) := sup_{Z ∈ Z} { ⟨ζ, Z⟩ - ρ(Z) },
  A := dom(ρ*) = { ζ ∈ Z* : ρ*(ζ) < +∞ }.

It is possible to show that condition (A2) (monotonicity) holds iff ζ ≥ 0 for every ζ ∈ A. Condition (A3) (translation equivariance) holds iff ∫_Ω ζ dP = 1 for every ζ ∈ A. If ρ is positively homogeneous, then ρ*(ζ) = 0 for every ζ ∈ A.
6 If conditions (A1)-(A4) hold, then A is a set of density functions and

  ρ(Z) = sup_{μ ∈ Q} E_μ[Z],

where Q = { μ : dμ = ζ dP, ζ ∈ A } is a set of probability measures absolutely continuous with respect to P. Consequently, the problem Min_{x ∈ X} ρ[F(x, ω)] can be represented as a min-max problem.
7 Average Value-at-Risk (also called Conditional Value-at-Risk):

  AV@R_α(Z) := inf_{t ∈ R} { t + α^{-1} E[Z - t]_+ }.

Note that the minimum in the above is attained at t* = H_Z^{-1}(1 - α), where H_Z(t) := Pr(Z ≤ t) is the cdf of Z and

  H_Z^{-1}(1 - α) = V@R_α(Z) = inf{ t : H_Z(t) ≥ 1 - α }.

Indeed,

  ∂/∂t ( t + α^{-1} E[Z - t]_+ ) = 1 + α^{-1} ∂E[Z - t]_+/∂t = 1 - α^{-1} Pr(Z > t) = 1 - α^{-1}(1 - H_Z(t)).
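The two expressions for AV@R can be checked against each other numerically. A minimal sketch (my own toy sample with an equal-weight empirical distribution, not taken from the slides): the variational formula inf_t { t + α^{-1} E[Z - t]_+ } and the tail-average formula give the same value.

```python
def avar_variational(sample, alpha):
    # inf_t { t + E[Z - t]_+ / alpha }: the objective is piecewise linear
    # and convex with kinks at the sample points, so scanning them is exact.
    n = len(sample)
    return min(t + sum(max(z - t, 0.0) for z in sample) / (alpha * n)
               for t in sample)

def avar_tail(sample, alpha):
    # (1/alpha) * integral_{1-alpha}^1 H^{-1}(u) du for the empirical cdf:
    # average the top alpha probability mass, splitting the atom that
    # straddles the (1 - alpha) level.
    n, mass, total = len(sample), alpha, 0.0
    for z in sorted(sample, reverse=True):
        take = min(mass, 1.0 / n)
        total += take * z
        mass -= take
        if mass <= 1e-15:
            break
    return total / alpha

sample = [1.0, 2.0, 3.0, 4.0, 10.0]
for alpha in (0.2, 0.4, 0.5):
    assert abs(avar_variational(sample, alpha) - avar_tail(sample, alpha)) < 1e-12
```

For alpha = 0.2 both formulas return the worst 20% tail of this sample, i.e. the atom 10.0.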
8 Also

  AV@R_α(Z) = (1/α) ∫_{1-α}^{1} H_Z^{-1}(t) dt.

If H_Z(z) is continuous at z = H_Z^{-1}(1 - α), then

  AV@R_α(Z) = E[ Z | Z ≥ H_Z^{-1}(1 - α) ].

Note that AV@R_α(Z) ≥ H_Z^{-1}(1 - α), and the following inequalities are equivalent:

  V@R_α(Z - γ) ≤ 0,  V@R_α(Z) - γ ≤ 0,  H_Z^{-1}(1 - α) ≤ γ,  1 - α ≤ H_Z(γ),  Pr(Z ≤ γ) ≥ 1 - α,  Pr(Z > γ) ≤ α.
9 The risk measure ρ(Z) = AV@R_α(Z) is coherent. It is natural to take here Z = L_1(Ω, F, P) and Z* = L_∞(Ω, F, P). The dual representation holds with

  A = { ζ ∈ Z* : 0 ≤ ζ(ω) ≤ α^{-1} a.e. ω ∈ Ω, ∫_Ω ζ(ω) dP(ω) = 1 }.

Note that as α ↓ 0, AV@R_α(Z) converges to

  ess sup Z(ω) := inf { sup_{ω ∈ Ω} Z'(ω) : Z'(ω) = Z(ω) a.e. ω ∈ Ω }.

So AV@R_0(·) should be defined on the space Z = L_∞(Ω, F, P).
10 Example (mean-variance risk function, c > 0):

  ρ(Z) := E[Z] + c Var[Z],  Z ∈ Z := L_2(Ω, F, P).

Dual representation:

  ρ(Z) = sup_{ζ ∈ Z, E[ζ]=1} { ⟨ζ, Z⟩ - (4c)^{-1} Var[ζ] }.

It satisfies (A1) and (A3), but does not satisfy (A2) and (A4).

Example (mean-upper-semideviation risk function of order p ∈ [1, +∞)): Z ∈ Z := L_p(Ω, F, P), c ≥ 0, and

  ρ(Z) := E[Z] + c ( E{ [Z - E[Z]]_+^p } )^{1/p}.

Here ρ satisfies (A1), (A3), (A4), and also (A2) (monotonicity) if c ≤ 1. The max-representation ρ(Z) = sup_{ζ ∈ A} ∫_Ω Z(ω)ζ(ω) dP(ω) holds with

  A = { ζ : ζ = 1 + h - ∫_Ω h dP, ‖h‖_q ≤ c, h ≥ 0 }.
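A small sketch of the second example (my own toy sample; empirical mean-upper-semideviation of order p = 1 with c = 0.5), checking numerically that it satisfies translation equivariance (A3) and positive homogeneity (A4):

```python
def mean_semidev(sample, c=0.5):
    # rho(Z) = E[Z] + c * E[Z - E[Z]]_+ on an equal-weight sample
    m = sum(sample) / len(sample)
    sd = sum(max(z - m, 0.0) for z in sample) / len(sample)
    return m + c * sd

Z = [1.0, 2.0, 6.0, 3.0]
rho = mean_semidev(Z)
# (A3): rho(Z + a) = rho(Z) + a
assert abs(mean_semidev([z + 5.0 for z in Z]) - (rho + 5.0)) < 1e-12
# (A4): rho(t Z) = t * rho(Z) for t > 0
assert abs(mean_semidev([3.0 * z for z in Z]) - 3.0 * rho) < 1e-12
```

For this sample the mean is 3.0 and the upper semideviation is 0.75, so rho = 3.375.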
11 It is said that a risk measure ρ : Z → R is law invariant, with respect to the reference distribution P, if for any distributionally equivalent Z, Z' ∈ Z (i.e., Pr(Z ≤ t) = Pr(Z' ≤ t) for all t ∈ R) it follows that ρ(Z) = ρ(Z'). That is, a law invariant risk measure ρ(Z) is a function of the cdf H_Z of the random variable Z.

It is said that a probability space (Ω, F, P) is standard if Ω = [0, 1] is equipped with its Borel sigma algebra F and the uniform probability measure P. The standard probability space is atomless. Two random variables Z and Z' defined on the standard probability space are distributionally equivalent iff there exists a measure preserving transformation T : Ω → Ω such that Z'(ω) = Z(T(ω)). In particular, Z(ω) is distributionally equivalent to H_Z^{-1}(ω) = inf{ t : H_Z(t) ≥ ω }. (A measurable T : Ω → Ω is measure preserving if it is one-to-one, onto, and P(T(A)) = P(A) for any A ∈ F.)
12 We say that σ : [0, 1) → R_+ is a spectral function if σ(·) is right side continuous, monotonically nondecreasing, and such that ∫_0^1 σ(t) dt = 1. Note that any spectral function is a probability density function.

Let (Ω, F, P) be the standard probability space, Z = L_p(Ω, F, P), and let ρ : Z → R be a law invariant coherent risk measure. It has the dual representation

  ρ(Z) = sup_{ζ ∈ A} ∫_0^1 ζ(t)Z(t) dt,  Z ∈ Z.

Note that ρ is law invariant iff the set A is invariant with respect to measure-preserving transformations. Since Z is distributionally equivalent to H_Z^{-1}, by law invariance of ρ we have ρ(Z) = ρ(H_Z^{-1}).
13 We say that a set Υ ⊂ Z* of spectral functions is a generating set of ρ if

  ρ(Z) = sup_{σ ∈ Υ} ∫_0^1 σ(t) H_Z^{-1}(t) dt,  Z ∈ Z.

If Υ = {σ} is a singleton,

  ρ(Z) = ∫_0^1 σ(t) H_Z^{-1}(t) dt,  Z ∈ Z,

and ρ is said to be a spectral risk measure. It is said that ζ̄ ∈ A is an exposed point of A if there exists Z ∈ Z such that ∫_Ω ζZ dP attains its maximum over ζ ∈ A at the unique point ζ̄. In a sense, a minimal generating set is given by the intersection of the set of exposed points of A with the set of spectral functions.
14 Note that

  AV@R_{1-α}(Z) = (1/(1 - α)) ∫_α^1 H_Z^{-1}(t) dt

is a spectral risk measure with spectral function σ = (1 - α)^{-1} 1_{[α,1)}. A convex combination

  ρ(Z) := λ_1 AV@R_{1-α_1}(Z) + ... + λ_m AV@R_{1-α_m}(Z)

of Average Value-at-Risk measures is also a spectral risk measure, with spectral function σ = Σ_{i=1}^m λ_i (1 - α_i)^{-1} 1_{[α_i,1)}.
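A numerical sketch of this identity (my own toy sample; the empirical quantile function stands in for H_Z^{-1}): the convex combination of two AV@R values, each computed from the variational formula, matches the spectral integral ∫_0^1 σ(t) H^{-1}(t) dt with the mixture spectral function.

```python
import math

def quantile(zs, t):
    # Empirical H^{-1}(t) for a sorted equal-weight sample zs.
    k = min(len(zs) - 1, max(0, math.ceil(t * len(zs)) - 1))
    return zs[k]

def spectral_integral(zs, sigma, steps=20000):
    # Midpoint rule for integral_0^1 sigma(t) * H^{-1}(t) dt.
    h = 1.0 / steps
    return h * sum(sigma((i + 0.5) * h) * quantile(zs, (i + 0.5) * h)
                   for i in range(steps))

def avar(zs, alpha):
    # inf_t { t + E[Z - t]_+ / alpha }, scanned over the sample kinks.
    n = len(zs)
    return min(t + sum(max(z - t, 0.0) for z in zs) / (alpha * n)
               for t in zs)

zs = sorted([1.0, 2.0, 3.0, 4.0, 10.0])
lam, a1, a2 = 0.3, 0.6, 0.8
# Mixture spectral function: lam/(1-a1) on [a1,1) plus (1-lam)/(1-a2) on [a2,1)
sigma = lambda t: lam / (1 - a1) * (t >= a1) + (1 - lam) / (1 - a2) * (t >= a2)
lhs = lam * avar(zs, 1 - a1) + (1 - lam) * avar(zs, 1 - a2)
rhs = spectral_integral(zs, sigma)
assert abs(lhs - rhs) < 1e-6
```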
15 For a probability measure (distribution) μ on the interval [0, 1), consider the spectral function

  σ(t) := ∫_{[0,t]} (1 - α)^{-1} dμ(α),  t ∈ [0, 1).

This equation defines a mapping σ = Tμ from the set of probability measures on [0, 1) to the set of spectral functions. The mapping T is one-to-one and onto, and its inverse is given by

  (T^{-1}σ)([0, α]) = (1 - α)σ(α) + ∫_0^α σ(t) dt,  α ∈ [0, 1).  (1)

For σ = Tμ we have

  ∫_0^1 σ(t) H^{-1}(t) dt = ∫_0^1 (1 - α)^{-1} ∫_α^1 H^{-1}(t) dt dμ(α) = ∫_0^1 AV@R_{1-α}(H) dμ(α).
16 This leads to the Kusuoka representation

  ρ(Z) = sup_{μ ∈ M} ∫_0^1 AV@R_{1-α}(Z) dμ(α),

where M is a set of probability measures on [0, 1) given by M = T^{-1}(Υ). For example, consider the absolute semideviation risk measure

  ρ(Z) := E[Z] + c E{ [Z - E[Z]]_+ },  c ∈ [0, 1].

Its (minimal) Kusuoka representation is

  ρ(Z) = sup_{α ∈ (0,1)} { (1 - cα) AV@R_1(Z) + cα AV@R_α(Z) }

(note that AV@R_1(Z) = E[Z]).
17 Statistical properties of risk measures. Let ρ : Z → R be a law invariant risk measure. Recall that we can consider it as a function ρ(H) of the cdf H = H_Z. Now let Z_1, ..., Z_N be an iid sample of the random variable Z. Then we can estimate ρ(H) by replacing H with the empirical cdf (empirical measure) Ĥ_N = N^{-1} Σ_{j=1}^N δ(Z_j). For example, for ρ = AV@R_{1-α} we have

  ρ(Ĥ_N) = inf_{t ∈ R} { t + (1 - α)^{-1} E_{Ĥ_N}[Z - t]_+ } = inf_{t ∈ R} { t + (1/((1 - α)N)) Σ_{j=1}^N [Z_j - t]_+ }.

An optimal solution of the above problem is t* = Z_(k), where Z_(1) ≤ ... ≤ Z_(N) are the respective order statistics and k is the smallest integer ≥ αN. In particular, if N < (1 - α)^{-1}, then k = N. In that case ρ(Ĥ_N) = t* = max_{1 ≤ j ≤ N} Z_j.
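A sketch of the empirical AV@R_{1-α} estimator (my own toy numbers), checking that a minimizer is the order statistic Z_(k) with k the smallest integer ≥ αN, and that for N < (1 - α)^{-1} the estimate collapses to the sample maximum:

```python
import math

def avar_hat(sample, alpha):
    # inf_t { t + sum_j [Z_j - t]_+ / ((1 - alpha) * N) }; the objective
    # is piecewise linear and convex, so scanning the sample points is exact.
    n = len(sample)
    return min(t + sum(max(z - t, 0.0) for z in sample) / ((1 - alpha) * n)
               for t in sample)

sample = [3.1, 0.4, 2.2, 5.0, 1.7]
alpha = 0.9
n = len(sample)
k = math.ceil(alpha * n)                 # smallest integer >= alpha * N
t_star = sorted(sample)[k - 1]           # the order statistic Z_(k)
obj = lambda t: t + sum(max(z - t, 0.0) for z in sample) / ((1 - alpha) * n)
assert abs(obj(t_star) - avar_hat(sample, alpha)) < 1e-12

# Here N = 5 < 1/(1 - alpha) = 10, so the estimate is the sample maximum:
assert abs(avar_hat(sample, alpha) - max(sample)) < 1e-12
```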
18 Law of Large Numbers for risk measures.

Theorem 2 Let ρ : Z → R be a real valued law invariant convex risk measure and Ĥ_N be the empirical cdf associated with an iid sample of Z ∈ Z. Then ρ(Ĥ_N) converges to ρ(Z) w.p.1 as N → ∞.

What about the rate of convergence? Consider θ* := AV@R_{1-α}(Z) and its sample estimate θ̂_N based on an iid sample of size N. We have that θ̂_N → θ* w.p.1 as N → ∞. Central Limit Theorem: N^{1/2}(θ̂_N - θ*) converges in distribution to normal N(0, σ²), where

  σ² = (1 - α)^{-2} Var( [Z - t*]_+ )

with t* = H_Z^{-1}(α), provided this quantile is unique.
19 Moreover, E[θ̂_N] < θ* and

  E[θ̂_N] - θ* = - α / (2N h(t*)) + o(N^{-1}),

where h(t*) = ∂H_Z(t*)/∂t is the corresponding density, provided it exists and h(t*) ≠ 0.

Example. Consider the exponential distribution H(z) = 1 - e^{-z}, z ≥ 0. Then H^{-1}(α) = log (1 - α)^{-1} and θ* = 1 + log (1 - α)^{-1}. For N < (1 - α)^{-1},

  Pr( θ̂_N ≥ θ* - ε ) = 1 - [1 - e^{-(1-ε)}(1 - α)]^N.

Suppose that N is equal to the integer part of (1 - α)^{-1}. Then

  lim inf_{α → 1} Pr( θ̂_N ≥ θ* + ε ) ≥ 1 - e^{-e^{-(1+ε)}}.
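A numerical illustration of the downward bias in the exponential example above. For N < (1 - α)^{-1} the estimator θ̂_N is the sample maximum, and for iid Exp(1) variables E[max of N draws] equals the harmonic number H_N = 1 + 1/2 + ... + 1/N (a standard fact about exponential spacings), so the bias can be evaluated exactly:

```python
import math

alpha = 0.9                                        # so 1/(1 - alpha) = 10
theta_star = 1.0 + math.log(1.0 / (1.0 - alpha))   # = 1 + log 10 ~ 3.3026

for N in range(1, 10):                             # all N < 10: theta_hat = max
    e_theta_hat = sum(1.0 / k for k in range(1, N + 1))   # harmonic number H_N
    assert e_theta_hat < theta_star                # E[theta_hat_N] < theta*

# Even at N = 9 the (negative) bias is substantial:
bias = sum(1.0 / k for k in range(1, 10)) - theta_star    # H_9 - theta*
assert bias < -0.4
```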
20 We also need a type of uniform LLN. Consider a random function F(x, ξ) depending on a random vector ξ = ξ(ω), and a law invariant convex risk measure ρ : Z → R. Suppose that for every x ∈ R^n the random variable F_x(ξ) = F(x, ξ) belongs to the space Z. Consider the composite function φ(x) = ρ(F_x). Let ξ^j, j = 1, ..., N, be an iid sample of the random vector ξ, and let Ĥ_{x,N} be the empirical cdf associated with the sample F(x, ξ^1), ..., F(x, ξ^N). Then φ̂_N(x) = ρ(Ĥ_{x,N}) is a sample estimate of φ(x).
21 The problem

  Min_{x ∈ X} { φ(x) = ρ(F_x) }

can be approximated by the problem Min_{x ∈ X} φ̂_N(x).

Theorem 3 Suppose that: (i) F_x ∈ Z for all x ∈ R^n, (ii) for a.e. ξ the function F(·, ξ) is convex, (iii) ρ : Z → R is law invariant and convex. Then the composite functions φ : R^n → R and φ̂_N : R^n → R are convex, and φ̂_N(·) converges to φ(·) w.p.1 uniformly on any compact set C ⊂ R^n.
22 Two-stage risk averse stochastic programs. First stage problem:

  Min_{x ∈ X} ρ(F_x(ω)),

where F_x(ω) = F(x, ω) is given by the optimal value of the second stage problem:

  Min_{y ∈ G(x,ω)} g(x, y, ω),

where g : R^n × R^m × Ω → R is a random lower semicontinuous function and G : R^n × Ω ⇉ R^m is a closed valued measurable multifunction. For example,

  g(x, y, ω) := c^T x + q(ω)^T y  and  G(x, ω) := { y ∈ R^m : T(ω)x + W(ω)y ≤ b(ω) }.
23 Interchangeability principle for risk measures. Let ρ : Z → R be a convex risk measure and G_y(ω) = G(y, ω) be such that inf_{y ∈ Y} G_y(ω) ∈ Z, Y ⊂ R^m. Then

  ρ( inf_{y ∈ Y} G_y(ω) ) = inf_{y(·) : y(ω) ∈ Y} ρ( G(y(ω), ω) ).

By the interchangeability principle the two stage problem can be written in the following equivalent form:

  Min_{x ∈ X, y(·)} ρ[ g(x, y(ω), ω) ]  subject to  y(ω) ∈ G(x, ω) a.e. ω ∈ Ω.

How can this be extended to a dynamic process (multistage programming)?
24 With every law invariant risk measure ρ(Z) we can associate the respective conditional risk measure, denoted ρ(Z|Y) or ρ_Y(Z), conditional on a random variable Y, by employing the conditional distribution of Z given Y. Note that ρ(Z|Y) is a function of Y, and we can consider the composite risk measure ρ(ρ(Z|Y)). For example, for ρ(Z) := E[Z] and ρ(Z|Y) = E[Z|Y] we have E[E[Z|Y]] = E[Z]. If Z and Y are independent, then ρ_Y(Z) = ρ(Z).
25 Examples. With the mean-upper-semideviation risk measure we can associate its conditional version

  ρ_Y(Z) = E_Y[Z] + c ( E_Y{ [Z - E_Y[Z]]_+^p } )^{1/p}.

Conditional version of the Average Value-at-Risk:

  AV@R_{α|Y}(Z) = inf_{t ∈ R} E_Y{ t + α^{-1} [Z - t]_+ }.

The minimum in the above is attained at t* = (1 - α)-quantile of the conditional distribution of Z given Y. Of course, t* is a function of Y here.
26 Risk neutral multistage stochastic programming. Let ξ_1, ..., ξ_T be a random data process, with ξ_1 being deterministic. Denote ξ_[t] := (ξ_1, ..., ξ_t). Nested formulation of the multistage stochastic programming problem:

  Min_{x_1 ∈ X_1} F_1(x_1) + E_{ξ_1}[ inf_{x_2 ∈ X_2(x_1, ξ_2)} F_2(x_2, ξ_2) + ...
    + E_{ξ_[T-2]}[ inf_{x_{T-1} ∈ X_{T-1}(x_{T-2}, ξ_{T-1})} F_{T-1}(x_{T-1}, ξ_{T-1})
    + E_{ξ_[T-1]}[ inf_{x_T ∈ X_T(x_{T-1}, ξ_T)} F_T(x_T, ξ_T) ] ] ].

Here F_t : R^{n_t} × R^{d_t} → R are real valued functions and X_t : R^{n_{t-1}} × R^{d_t} ⇉ R^{n_t}, t = 2, ..., T, are multifunctions.
27 For example,

  F_t(x_t, ξ_t) := c_t^T x_t,
  X_t(x_{t-1}, ξ_t) := { x_t ≥ 0 : B_t x_{t-1} + A_t x_t ≤ b_t }, t = 2, ..., T,
  X_1 := { x_1 ≥ 0 : A_1 x_1 ≤ b_1 },

with ξ_t = (c_t, A_t, B_t, b_t), corresponds to linear multistage stochastic programming. Note that

  E[Z] = E_{ξ_1}[ E_{ξ_[2]}[ ... E_{ξ_[T-1]}[Z] ] ].
28 This decomposition property of the expectation operator, together with the interchangeability of the expectation and minimization operators, allows writing the nested formulation in the equivalent form

  Min_{x_1, x_2(·), ..., x_T(·)} E[ F_1(x_1) + F_2(x_2(ξ_[2]), ξ_2) + ... + F_T(x_T(ξ_[T]), ξ_T) ]
  s.t. x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T.

The optimization is performed over (nonanticipative) policies x_1, x_2(ξ_[2]), ..., x_T(ξ_[T]) satisfying the feasibility constraints, and the feasibility constraints should be satisfied for almost every (a.e.) realization of the random data, i.e., they should hold with probability one (w.p.1).
29 Dynamic Programming Equations. For the last period T we have

  Q_T(x_{T-1}, ξ_T) := inf_{x_T ∈ X_T(x_{T-1}, ξ_T)} F_T(x_T, ξ_T),
  𝒬_T(x_{T-1}, ξ_[T-1]) := E_{ξ_[T-1]}[ Q_T(x_{T-1}, ξ_T) ],

and for t = T - 1, ..., 2,

  Q_t(x_{t-1}, ξ_[t]) = inf_{x_t ∈ X_t(x_{t-1}, ξ_t)} { F_t(x_t, ξ_t) + 𝒬_{t+1}(x_t, ξ_[t]) },

where

  𝒬_{t+1}(x_t, ξ_[t]) := E_{ξ_[t]}{ Q_{t+1}(x_t, ξ_[t+1]) }.

Finally, at the first stage we solve the problem

  Min_{x_1 ∈ X_1} F_1(x_1) + E[ Q_2(x_1, ξ_2) ].
30 It is said that the data process is stagewise independent if the random vector ξ_{t+1} is independent of ξ_[t] = (ξ_1, ..., ξ_t), t = 1, ..., T - 1. In the case of stagewise independence, by induction going backward in t = T, ..., 2, it is possible to show that the cost-to-go functions 𝒬_t(x_{t-1}) do not depend on the data process. The dynamic programming equations take the form

  Q_t(x_{t-1}, ξ_t) = inf_{x_t ∈ X_t(x_{t-1}, ξ_t)} { F_t(x_t, ξ_t) + 𝒬_{t+1}(x_t) },

where

  𝒬_{t+1}(x_t) = E{ Q_{t+1}(x_t, ξ_{t+1}) }.
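A toy backward recursion illustrating this structure (my own 3-stage inventory example with discretized stock levels and stagewise independent demand, not from the slides): under stagewise independence the expected cost-to-go is a function of the decision x_t only, not of the observed history.

```python
T = 3
demands, probs = [1, 2, 3], [1/3, 1/3, 1/3]    # iid demand at each stage
order_c, hold_c, backlog_c = 1.0, 0.5, 5.0
levels = range(0, 8)                           # admissible stock levels

V = {s: 0.0 for s in levels}                   # V_{T+1} == 0
policy = {}
for t in range(T, 0, -1):
    # Expected cost-to-go as a function of the post-order stock x only:
    cost_to_go = {}
    for x in levels:
        cost_to_go[x] = sum(
            p * (hold_c * max(x - d, 0) + backlog_c * max(d - x, 0)
                 + V[max(x - d, 0)])
            for d, p in zip(demands, probs))
    V_new, policy[t] = {}, {}
    for s in levels:
        # Inner minimization: order up to some level x >= current stock s.
        best_x = min((x for x in levels if x >= s),
                     key=lambda x: order_c * (x - s) + cost_to_go[x])
        V_new[s] = order_c * (best_x - s) + cost_to_go[best_x]
        policy[t][s] = best_x
    V = V_new

assert V[0] > 0.0                              # starting empty costs something
assert policy[1][0] >= 1                       # with backlog 5x order cost, order
```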
31 Risk averse multistage programming. Nested formulation of the risk averse multistage programming problem:

  Min_{x_1 ∈ X_1} F_1(x_1) + ρ_{ξ_1}[ inf_{x_2 ∈ X_2(x_1, ξ_2)} F_2(x_2, ξ_2) + ...
    + ρ_{ξ_[T-2]}[ inf_{x_{T-1} ∈ X_{T-1}(x_{T-2}, ξ_{T-1})} F_{T-1}(x_{T-1}, ξ_{T-1})
    + ρ_{ξ_[T-1]}[ inf_{x_T ∈ X_T(x_{T-1}, ξ_T)} F_T(x_T, ξ_T) ] ] ],

where ρ_{ξ_[t]}(·), t = 1, ..., T - 1, are conditional law invariant coherent (convex) risk measures. For example,

  ρ_{ξ_[t]}(·) := λ E_{ξ_[t]}[·] + (1 - λ) AV@R_{α|ξ_[t]}[·]

is a convex combination of the conditional expectation and the conditional Average Value-at-Risk measure.
32 We can write the risk averse multistage programming problem as

  Min_{x_1, x_2(·), ..., x_T(·)} ρ[ F_1(x_1) + F_2(x_2(ξ_[2]), ξ_2) + ... + F_T(x_T(ξ_[T]), ξ_T) ]
  s.t. x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T,

where

  ρ(Z_1 + ... + Z_T) = ρ_{ξ_1}( ρ_{ξ_[2]}( ... ρ_{ξ_[T-1]}(Z_1 + ... + Z_T) ))
    = Z_1 + ρ_{ξ_1}( Z_2 + ρ_{ξ_[2]}( ... + ρ_{ξ_[T-1]}(Z_T) ))

is the corresponding composite risk measure. The optimization is performed over (nonanticipative) policies x_1, x_2(ξ_[2]), ..., x_T(ξ_[T]) satisfying the feasibility constraints.
33 If ρ_{ξ_[t]}(·) := E_{ξ_[t]}(·) are conditional expectations, then ρ(·) = E(·); in that case this becomes risk neutral stochastic programming. If ρ_{ξ_[t]}(·) := ess sup(·) = AV@R_{0|ξ_[t]}(·), then ρ(·) = AV@R_0(·); this case corresponds to multistage robust optimization.

Let ρ be a law invariant coherent risk measure. It turns out that only the risk measures ρ(·) := E(·) and ρ(·) := ess sup(·) have the decomposition property

  ρ(ρ_Y(Z)) = ρ(Z),  ∀ Z ∈ Z.
34 Distributionally robust multistage stochastic programming. We can write the risk neutral multistage problem as

  Min_{π ∈ Π} E_P[Z_π],  (2)

where P is the probability distribution of the random vector ξ_[T] = (ξ_1, ..., ξ_T), Π is the set of policies satisfying the feasibility constraints

  x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T,

and Z_π = Z_π(ξ_[T]) is defined as

  Z_π := F_1(x_1) + F_2(x_2(ξ_[2]), ξ_2) + ... + F_T(x_T(ξ_[T]), ξ_T).
35 It looks natural to formulate the following distributionally robust analogue of problem (2). Consider a set M of probability distributions of ξ_[T], supported on a set Ξ ⊂ R^{d_1} × ... × R^{d_T} equipped with its Borel sigma algebra B, and the problem

  Min_{π ∈ Π} sup_{Q ∈ M} E_Q[Z_π].  (3)

However, there is a problem with formulation (3). The expectation operator has the following property (recall that ξ_1 is deterministic):

  E[Z] = E_{ξ_1}[ E_{ξ_[2]}[ ... E_{ξ_[T-1]}(Z) ] ].
36 For Z = Z(ξ_[T]) ∈ Z and Q ∈ M we have that

  E_Q[Z] = E_Q[ E_{Q|ξ_[2]}[ ... E_{Q|ξ_[T-1]}[Z] ] ],

and hence

  sup_{Q ∈ M} E_Q[Z] ≤ sup_{Q ∈ M} E_Q[ sup_{Q ∈ M} E_{Q|ξ_[2]}[ ... sup_{Q ∈ M} E_{Q|ξ_[T-1]}[Z] ] ].  (4)

In a certain sense the right hand side of (4) represents the tightest upper bound for the left hand side among all possible coherent and time consistent upper bounds.
37 Rectangularity concept. Let Z be a subset of the space Z. We say that a set M̂ of probability measures on (Ξ, B) is a rectangular set, associated with the sets M and Z, if

  sup_{Q ∈ M̂} E_Q[Z] = sup_{Q ∈ M} E_Q[ sup_{Q ∈ M} E_{Q|ξ_[2]}[ ... sup_{Q ∈ M} E_{Q|ξ_[T-1]}[Z] ] ],  ∀ Z ∈ Z.

In particular, if this holds for M̂ = M, we say that the set M is rectangular (with respect to Z).
38 Theorem 4 Suppose that the right hand side of (4) is finite for all Z ∈ Z. Then there exists a bounded set M̂ ⊂ Z* of probability measures which is rectangular with respect to M and Z.

Theorem 5 Suppose that the set M is convex, bounded, and weakly closed, and let M̂ ⊂ Z* be a rectangular set with respect to M and Z such that M ⊆ M̂. Then M is rectangular with respect to Z iff M̂ = M.

It could be noted that taking the weak closure of the convex hull of the set M does not change the optimal value of the original problem. Therefore the condition that M be convex and weakly closed is not that restrictive. Of course, if the set Ξ is finite, and hence the spaces Z and Z* are finite dimensional, then the weak topology is the same as the standard topology of Z*.
39 The analysis simplifies if we assume that the set M consists of product measures. That is,

  M := { Q = Q_1 × ... × Q_T : Q_t ∈ M_t, t = 1, ..., T },

where Ξ = Ξ_1 × ... × Ξ_T and M_t is a set of probability measures on (Ξ_t, B_t), t = 1, ..., T. If we view ξ_1, ..., ξ_T as a random process having distribution Q ∈ M, then this means that the random vectors ξ_t, t = 1, ..., T, are mutually independent with respective marginal distributions Q_t ∈ M_t. For Q = Q_1 × ... × Q_T ∈ M we can write

  E_{Q|ξ_[T-1]}[Z] = ∫_{Ξ_T} Z(ξ_[T-1], ξ_T) dQ_T(ξ_T) =: E_{Q_T|ξ_[T-1]}[Z],

and hence

  sup_{Q ∈ M} E_Q[Z] ≤ sup_{Q_1 ∈ M_1} E_{Q_1}[ sup_{Q_2 ∈ M_2} E_{Q_2|ξ_[1]}[ ... sup_{Q_T ∈ M_T} E_{Q_T|ξ_[T-1]}[Z] ] ].
40 Dynamic Programming Equations. For the last period T we have

  Q_T(x_{T-1}, ξ_T) := inf_{x_T ∈ X_T(x_{T-1}, ξ_T)} F_T(x_T, ξ_T),
  𝒬_T(x_{T-1}, ξ_[T-1]) := ρ_{ξ_[T-1]}[ Q_T(x_{T-1}, ξ_T) ],

and for t = T - 1, ..., 2,

  Q_t(x_{t-1}, ξ_[t]) = inf_{x_t ∈ X_t(x_{t-1}, ξ_t)} { F_t(x_t, ξ_t) + 𝒬_{t+1}(x_t, ξ_[t]) },

where

  𝒬_{t+1}(x_t, ξ_[t]) := ρ_{ξ_[t]}{ Q_{t+1}(x_t, ξ_[t+1]) }.

Finally, at the first stage we solve the problem

  Min_{x_1 ∈ X_1} F_1(x_1) + ρ_{ξ_1}[ Q_2(x_1, ξ_2) ].
41 In the case of stagewise independence, the cost-to-go functions 𝒬_t(x_{t-1}) do not depend on the data process, and the dynamic programming equations take the form

  Q_t(x_{t-1}, ξ_t) = inf_{x_t ∈ X_t(x_{t-1}, ξ_t)} { F_t(x_t, ξ_t) + 𝒬_{t+1}(x_t) },  t = T, ..., 2,

where

  𝒬_{t+1}(x_t) := ρ{ Q_{t+1}(x_t, ξ_{t+1}) },

with 𝒬_{T+1}(·) ≡ 0. Finally, at the first stage we solve the problem

  Min_{x_1 ∈ X_1} F_1(x_1) + ρ[ Q_2(x_1, ξ_2) ].
42 Time consistency of stochastic programming problems. Consider a multiperiod stochastic program

  Min_{x_1, x_2(·), ..., x_T(·)} ϱ( F_1(x_1), F_2(x_2(ξ_[2]), ξ_2), ..., F_T(x_T(ξ_[T]), ξ_T) )
  s.t. x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T,

where ϱ : Z_1 × Z_2 × ... × Z_T → R is a multiperiod risk measure. Is it time consistent? For example, is

  Min_{x_1, x_2(·), ..., x_T(·)} AV@R_α( F_1(x_1) + ... + F_T(x_T(ξ_[T]), ξ_T) )
  s.t. x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T,

time consistent? Note that for α ∈ (0, 1),

  AV@R_α(·) ≠ AV@R_{α|ξ_1}( AV@R_{α|ξ_[2]}( ... AV@R_{α|ξ_[T-1]}(·) ) ).
43 Principle of conditional optimality: at every state of the system, optimality of our decisions should not depend on scenarios which we already know cannot happen in the future.

Let us consider the following example. Let

  ϱ(Z_1, ..., Z_T) := Z_1 + Σ_{t=2}^T AV@R_α(Z_t).

We can write

  ϱ(Z_1, ..., Z_T) = inf_{r_2, ..., r_T} E{ Z_1 + Σ_{t=2}^T ( r_t + α^{-1} [Z_t - r_t]_+ ) }.

The objective function of the corresponding multiperiod optimization problem can be written as

  F_1(x_1) + Σ_{t=2}^T r_t + E[ Σ_{t=2}^T α^{-1} [F_t(x_t, ξ_t) - r_t]_+ ],

with additional variables r = (r_2, ..., r_T). The problem can be viewed as a standard multistage stochastic program with (r, x_1) being first stage decision variables.
44 Dynamic programming equations: the cost-to-go function Q_t(x_{t-1}, r_t, ..., r_T, ξ_[t]) is given by the optimal value of the problem

  Min_{x_t ∈ X_t(x_{t-1}, ξ_t)} α^{-1} [F_t(x_t, ξ_t) - r_t]_+ + E_{ξ_[t]}[ Q_{t+1}(x_t, r_{t+1}, ..., r_T, ξ_[t+1]) ].

Although it is possible to write dynamic programming equations for this problem, note that the decision variables r_2, ..., r_T are decided at the first stage, and their optimal values depend on all scenarios starting at the root node at stage t = 1. Consequently, optimal decisions at later stages depend on scenarios other than those following the considered node. That is, here the principle of conditional optimality does not hold.
45 A well known quotation of Bellman: "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." In order to make this precise we have to define what we mean by saying that an optimal policy remains optimal at every stage of the process, conditional on an observed realization of the data process.
46 We assume that with the multiperiod problem is associated a sequence of mappings ϱ_{t,T} : Z_t × ... × Z_T → Z_t, t = 1, ..., T. We say that {ϱ_{t,T}} is a sequence of multiperiod mappings if it satisfies the condition of monotonicity: if Z, Z' ∈ Z_t × ... × Z_T and Z ≤ Z', then ϱ_{t,T}(Z) ≤ ϱ_{t,T}(Z').

Examples. Conditional expectations:

  ϱ_{t,T}(Z_t, ..., Z_T) := E_{ξ_[t]}(Z_t + ... + Z_T).

Conditional max-risk measures (recall that AV@R_0(·) = ess sup(·)):

  ϱ_{t,T}(Z_t, ..., Z_T) := AV@R_{0|ξ_[t]}(Z_t + ... + Z_T).

For a law invariant coherent (convex) risk measure ρ(·):

  ϱ_{t,T}(Z_t, ..., Z_T) := ρ_{ξ_[t]}(Z_t + ... + Z_T).
47 Nested form:

  ϱ_{t,T}(Z_t, ..., Z_T) := ρ_{ξ_[t]}( ... ρ_{ξ_[T-1]}(Z_t + ... + Z_T) )
    = Z_t + ρ_{ξ_[t]}( Z_{t+1} + ... + ρ_{ξ_[T-1]}(Z_T) ).

Recall that only conditional expectations and conditional max-risk measures can be represented in the nested form of coherent risk measures.
48 For t = 2, ..., T, consider the problems

  (P_t):  Min_{x_t(·), ..., x_T(·)} ϱ_{t,T}( F_t(x_t(ξ_[t]), ξ_t), ..., F_T(x_T(ξ_[T]), ξ_T) )
    s.t. x_τ(ξ_[τ]) ∈ X_τ(x_{τ-1}(ξ_[τ-1]), ξ_τ), τ = t, ..., T,

conditional on ξ_[t] and x_{t-1}. At t = 1 the problem (P_1) is

  Min_{x_1, x_2(·), ..., x_T(·)} ϱ_{1,T}( F_1(x_1), F_2(x_2(ξ_[2]), ξ_2), ..., F_T(x_T(ξ_[T]), ξ_T) )
    s.t. x_1 ∈ X_1, x_t(ξ_[t]) ∈ X_t(x_{t-1}(ξ_[t-1]), ξ_t), t = 2, ..., T,

with ϱ_{1,T} : Z_1 × Z_2 × ... × Z_T → R being a multiperiod risk measure.
49 We say that an optimal policy x̄_1, x̄_2(ξ_[2]), ..., x̄_T(ξ_[T]) of problem (P_1) is time consistent if the policy x̄_t(ξ_[t]), ..., x̄_T(ξ_[T]) is optimal for the problem (P_t), t = 2, ..., T, conditional on ξ_[t] and x_{t-1} = x̄_{t-1}(ξ_[t-1]). For the nested conditional risk measures

  ϱ_{t,T}(Z_t, ..., Z_T) = Z_t + ρ_{ξ_[t]}( Z_{t+1} + ... + ρ_{ξ_[T-1]}(Z_T) ),

every optimal policy of (P_1) is time consistent.

(C1) For all 1 ≤ τ < θ ≤ T and Z, Z' ∈ Z_τ × ... × Z_T, the conditions Z_t = Z'_t, t = τ, ..., θ - 1, and ϱ_{θ,T}(Z_θ, ..., Z_T) ≤ ϱ_{θ,T}(Z'_θ, ..., Z'_T) imply that ϱ_{τ,T}(Z_τ, ..., Z_T) ≤ ϱ_{τ,T}(Z'_τ, ..., Z'_T).
50 Proposition 1 Suppose that condition (C1) holds and the multiperiod problem (P_1) has a unique optimal solution policy x̄_1, x̄_2(ξ_[2]), ..., x̄_T(ξ_[T]). Then the policy x̄_t(ξ_[t]), ..., x̄_T(ξ_[T]) is optimal for the problem (P_t), t = 2, ..., T, conditional on ξ_[t] and x_{t-1} = x̄_{t-1}.

In case problem (P_1) has more than one optimal solution, we need a stronger condition to ensure that an optimal solution of (P_1) is conditionally optimal for (P_t), t = 2, ..., T:

(C1') For all 1 ≤ τ < θ ≤ T and Z, Z' ∈ Z_τ × ... × Z_T, the conditions Z_t = Z'_t, t = τ, ..., θ - 1, and ϱ_{θ,T}(Z_θ, ..., Z_T) < ϱ_{θ,T}(Z'_θ, ..., Z'_T) imply that ϱ_{τ,T}(Z_τ, ..., Z_T) < ϱ_{τ,T}(Z'_τ, ..., Z'_T).
51 Consider the following conditions.

(C2) For t = 1, ..., T, it holds that ϱ_{t,T}(0, ..., 0) = 0, and for all (Z_1, ..., Z_T) ∈ Z and t = 1, ..., T - 1 it holds that

  ϱ_{t,T}(Z_t, Z_{t+1}, ..., Z_T) = Z_t + ϱ_{t,T}(0, Z_{t+1}, ..., Z_T).

(C3) For all 1 ≤ τ < θ ≤ T and Z ∈ Z it holds that

  ϱ_{τ,T}(Z_τ, ..., Z_θ, ..., Z_T) = ϱ_{τ,θ}( Z_τ, ..., Z_{θ-1}, ϱ_{θ,T}(Z_θ, ..., Z_T) ).
52 Multiperiod coherent mappings of the nested form satisfy conditions (C1), (C2) and (C3), but not necessarily (C1'). We have the following relations between these conditions.

Proposition 2 Let {ϱ_{t,T}} be a sequence of multiperiod mappings. Then the following holds. (i) Conditions (C1) and (C2) imply condition (C3). (ii) Condition (C3) implies condition (C1).
53 It is tempting to impose the following condition on the sequence of multiperiod mappings: for all 1 ≤ τ < θ ≤ T and Z, Z' ∈ Z_τ × ... × Z_T, the conditions Z_t = Z'_t, t = τ, ..., θ - 1, and ϱ_{τ,T}(Z_τ, ..., Z_T) ≤ ϱ_{τ,T}(Z'_τ, ..., Z'_T) imply that ϱ_{θ,T}(Z_θ, ..., Z_T) ≤ ϱ_{θ,T}(Z'_θ, ..., Z'_T). This condition would imply the desired property of the sequence of multiperiod mappings. Unfortunately, this condition does not hold even for a sequence of conditional expectation mappings ϱ_{t,T}(Z_t, ..., Z_T) := E_{ξ_[t]}[Z_t + ... + Z_T].
54 Suppose that conditions (C1) and (C2) hold. Then condition (C3) follows, and hence we can write

  ϱ_{1,T}(Z_1, ..., Z_T) = ϱ_{1,T-1}( Z_1, ..., Z_{T-2}, ϱ_{T-1,T}(Z_{T-1}, Z_T) )
    = ϱ_{1,T-1}( Z_1, ..., Z_{T-2}, Z_{T-1} + ϱ_{T-1,T}(0, Z_T) ).

By continuing this process backwards we obtain the nested representation of ϱ_{1,T}:

  ϱ_{1,T}(Z_1, ..., Z_T) = Z_1 + ρ_2[ Z_2 + ... + ρ_{T-1}[ Z_{T-1} + ρ_T[Z_T] ] ],

where ρ_{t+1} : Z_{t+1} → Z_t are one step mappings defined as

  ρ_{t+1}(Z_{t+1}) := ϱ_{t,t+1}(0, Z_{t+1}) = ϱ_{t,T}(0, Z_{t+1}, 0, ..., 0).

Clearly the mappings ρ_{t+1} inherit such properties of ϱ_{t,T} as monotonicity, convexity and positive homogeneity.
55 Approximate dynamic programming. The basic idea is to approximate the cost-to-go functions by a class of computationally manageable functions. Since the functions 𝒬_t(·) are convex, it is natural to approximate them by piecewise linear functions given by maxima of cutting planes: the Stochastic Dual Dynamic Programming (SDDP) method (Pereira and Pinto, 1991). For trial decisions x̄_t, t = 1, ..., T - 1, at the backward step of the SDDP algorithm, piecewise linear approximations Q̂_t(·) of the cost-to-go functions 𝒬_t(·) are constructed by solving the problems

  Min_{x_t ∈ R^{n_t}} (c_t^j)^T x_t + Q̂_{t+1}(x_t)  s.t. B_t^j x̄_{t-1} + A_t^j x_t = b_t^j, x_t ≥ 0,  j = 1, ..., N_t,

and their duals, going backward in time t = T, ..., 1. Denote by v_0 and v̂_N the respective optimal values of the true and SAA problems.
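An illustrative sketch of the cutting-plane idea behind this construction (my own toy convex cost-to-go function, not the full SDDP machinery with LP duals): the approximation built as a maximum of first-order cuts collected at trial points is piecewise linear, convex, and underestimates the true function everywhere, with equality at the trial points.

```python
def Q(x):            # stand-in convex cost-to-go function
    return (x - 2.0) ** 2 + 1.0

def Q_grad(x):       # its derivative (a subgradient)
    return 2.0 * (x - 2.0)

cuts = []            # list of (intercept, slope) pairs: cut(x) = a + b*x

def add_cut(x_trial):
    # Supporting hyperplane of Q at the trial point.
    b = Q_grad(x_trial)
    a = Q(x_trial) - b * x_trial
    cuts.append((a, b))

def Q_hat(x):        # maximum of the collected cutting planes
    return max(a + b * x for a, b in cuts)

for x_trial in [0.0, 4.0, 1.0, 3.0]:   # trial points from "forward passes"
    add_cut(x_trial)

# Lower-bound property: Q_hat <= Q on a grid, with equality at trial points.
grid = [i / 10.0 for i in range(0, 51)]
assert all(Q_hat(x) <= Q(x) + 1e-12 for x in grid)
assert abs(Q_hat(1.0) - Q(1.0)) < 1e-12
```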
56 By construction,

  Q̂_t(·) ≤ 𝒬_t(·),  t = 2, ..., T.

Therefore the optimal value of

  Min_{x_1 ∈ R^{n_1}} c_1^T x_1 + Q̂_2(x_1)  s.t. A_1 x_1 = b_1, x_1 ≥ 0

gives a lower bound for the optimal value v̂_N of the SAA problem. We also have that v_0 ≥ E[v̂_N]. Therefore on average v̂_N is also a lower bound for the optimal value of the true problem.
57 The approximate cost-to-go functions Q̂_2, ..., Q̂_T and a feasible first stage solution x̄_1 define a feasible policy. That is, for a realization (sample path) ξ_1, ..., ξ_T of the data process, x̄_t = x̄_t(ξ_[t]) are computed recursively in t = 2, ..., T as solutions of

  Min_{x_t} c_t^T x_t + Q̂_{t+1}(x_t)  s.t. B_t x̄_{t-1} + A_t x_t = b_t, x_t ≥ 0.

In the forward step of the SDDP algorithm, M sample paths (scenarios) are generated, and the corresponding x̄_t, t = 2, ..., T, are used as trial points in the next iteration of the backward step. It is essential for convergence of this algorithm that at each iteration of the forward step the paths (scenarios) are resampled, i.e., generated independently of the previous iterations. Note that the functions Q̂_2, ..., Q̂_T and x̄_1 define a feasible policy also for the true problem.
58 Convergence of the SDDP algorithm. It is possible to show that, under mild regularity conditions, the SDDP algorithm converges as the number of iterations goes to infinity. That is, the computed optimal values and generated policies converge w.p.1 to their counterparts for the considered SAA problem. However, the convergence can be very slow, and one should take such mathematical proofs very cautiously. Moreover, it should be remembered that the SAA problem is just an approximation of the true problem. It is possible to show that, in a certain probabilistic sense, the SAA problem converges to the true problem as all sample sizes N_t, t = 2, ..., T, tend to infinity. It was found in our numerical experiments that optimal solutions of the SAA problems started to stabilize for sample sizes of about N_t = 100, t = 2, ..., T.
59 Stopping criteria. The policy value E[ Σ_{t=1}^T c_t^T x̄_t(ξ_[t]) ] can be estimated in the forward step of the algorithm. That is, let ξ_2^i, ..., ξ_T^i, i = 1, ..., M, be sample paths (scenarios) generated at a current iteration of the forward step, and let

  ϑ^i := Σ_{t=1}^T (c_t^i)^T x̄_t^i,  i = 1, ..., M,

be the corresponding cost values. Then E[ϑ^i] = E[ Σ_{t=1}^T c_t^T x̄_t(ξ_[t]^i) ], and hence

  ϑ̄ = (1/M) Σ_{i=1}^M ϑ^i

gives an unbiased estimate of the policy value.
60 Also,

  σ̂² = (1/(M - 1)) Σ_{i=1}^M (ϑ^i - ϑ̄)²

estimates the variance of the sample ϑ^1, ..., ϑ^M. Hence

  ϑ̄ + z_α σ̂ / √M

gives an upper bound for the policy value with confidence of about 100(1 - α)%. Here z_α is the corresponding critical value. At the same time this gives an upper bound for the optimal value of the corresponding multistage problem, the SAA or the true problem, depending on from what data process the random scenarios were generated. A typical example of the behavior of the lower and upper bounds produced by the SDDP algorithm for an SAA problem is shown on the next slide.
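The stopping-criterion computation can be sketched in a few lines (the forward-pass cost values below are hypothetical placeholders, not from the slides):

```python
import math
from statistics import mean, stdev, NormalDist

costs = [10.2, 9.8, 11.5, 10.9, 10.4, 9.6, 10.8, 11.1]  # hypothetical forward-pass values
M = len(costs)
theta_bar = mean(costs)              # unbiased estimate of the policy value
sigma_hat = stdev(costs)             # sample standard deviation (divisor M - 1)
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value z_alpha
upper = theta_bar + z * sigma_hat / math.sqrt(M)
assert upper > theta_bar             # the bound sits above the point estimate
```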
61 [Figure: lower and upper bounds versus iteration number for an SAA problem with 8 state variables, 120 stages, 1 cut per iteration.]
62 Theoretical analysis and numerical experiments indicate that the computational complexity of the SDDP algorithm grows fast with an increase in the number of state variables.

Sensitivity to initial conditions: individual stage costs for the risk neutral approach in two cases, where all the reservoirs start at 25% or at 75% of the maximum capacity. The yellow curve denotes the 75% initial reservoir level and the dark green denotes the 25% initial level.
63 [Figure: individual stage costs for the 25% and 75% initial reservoir levels.]
64 Variability of SAA problems. The table shows the 95% confidence interval for the lower bound and the average policy value at iteration 3000, over a sample of 20 SAA problems. Each of the policy value observations was computed using 2000 scenarios. The last two columns of the table show the range divided by the average of the lower bound (where the range is the difference between the maximum and minimum observation) and the standard deviation divided by the average value. This problem has relatively low variability (approx. 4%) for both the lower bound and the average policy value.

SAA variability for risk neutral SDDP (values ×10^9; entries lost in transcription shown as ...):

                  | 95% C.I. left | Average | 95% C.I. right | range/average | sdev/average
  Lower bound     |      ...      |   ...   |      ...       |      ...      |    4.07%
  Average policy  |      ...      |   ...   |      ...       |      ...      |    4.12%
65 Risk averse approach. How can one control risk, i.e., reduce the chances of extreme costs, at every stage of the time process? Value-at-Risk of a random outcome (variable) Z at level α ∈ (0, 1):

  V@R_α(Z) = inf{ t : F_Z(t) ≥ 1 - α },

where F_Z(t) = Pr(Z ≤ t) is the cdf of Z. That is, V@R_α(Z) is the (1 - α)-quantile of the distribution of Z. Note that V@R_α(Z) ≤ c is equivalent to Pr(Z > c) ≤ α. Therefore it could be a natural approach to impose constraints (chance constraints) of the form V@R_α(Z) ≤ c, for Z = cost, a chosen constant c, and a significance level α, at every stage of the process.
66 There are two problems with such an approach: it is difficult to handle chance constraints numerically, and they could lead to infeasibility problems. Average Value-at-Risk (also called Conditional Value-at-Risk):

  AV@R_α(Z) = inf_{t ∈ R} { t + α^{-1} E[Z - t]_+ }.

Note that the minimum in the above is attained at t* = V@R_α(Z). If the cdf F_Z(z) is continuous, then

  AV@R_α(Z) = E[ Z | Z ≥ V@R_α(Z) ].

It follows that AV@R_α(Z) ≥ V@R_α(Z). Therefore the constraint AV@R_α(Z) ≤ c is a conservative approximation of the chance constraint V@R_α(Z) ≤ c.
67 In the problem of minimizing the expected cost E[Z] subject to the constraint AV@R_α(Z) ≤ c, we impose an infinite penalty for violating this constraint. This could result in infeasibility of the obtained problem. Instead we can impose a finite penalty and consider the problem of minimizing

  E[Z] + κ AV@R_α(Z)

for some constant κ > 0. Note that this is equivalent to minimizing ρ(Z), where

  ρ(Z) = (1 − λ) E[Z] + λ AV@R_α(Z)

for λ ∈ (0, 1) with κ = λ/(1 − λ).
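The equivalence of the two penalized objectives is a matter of rescaling: with λ = κ/(1 + κ), i.e. κ = λ/(1 − λ), the objective E[Z] + κ AV@R_α(Z) is exactly (1 + κ) times (1 − λ)E[Z] + λ AV@R_α(Z), so the two rank any candidate decisions identically. A sketch checking this on hypothetical cost distributions:

```python
import numpy as np

def avar(z, alpha):
    # AV@R_alpha via its minimization formula, evaluated at the empirical quantile.
    z = np.sort(np.asarray(z, dtype=float))
    t = z[int(np.ceil((1 - alpha) * len(z))) - 1]
    return t + np.mean(np.maximum(z - t, 0.0)) / alpha

rng = np.random.default_rng(2)
alpha, kappa = 0.05, 1.5
lam = kappa / (1.0 + kappa)  # lambda = kappa/(1+kappa), i.e. kappa = lambda/(1-lambda)

# Three hypothetical cost distributions to rank.
costs = [rng.normal(loc=m, scale=s, size=50_000)
         for m, s in [(0.0, 1.0), (0.2, 0.5), (-0.1, 2.0)]]

obj1 = [np.mean(z) + kappa * avar(z, alpha) for z in costs]            # E[Z] + kappa*AV@R
obj2 = [(1 - lam) * np.mean(z) + lam * avar(z, alpha) for z in costs]  # (1-lam)E + lam*AV@R

# Same ranking, and obj1 = (1 + kappa) * obj2 term by term.
assert np.argmin(obj1) == np.argmin(obj2)
assert np.allclose(np.array(obj1), (1 + kappa) * np.array(obj2))
```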
68 This leads to the following (nested) formulation of the risk averse multistage problem:

  Min_{x_1 ≥ 0 : A_1 x_1 = b_1}  c_1^T x_1
    + ρ_{2|ξ_1} [ inf_{x_2 ≥ 0 : B_2 x_1 + A_2 x_2 = b_2}  c_2^T x_2 + ⋯
    + ρ_{T−1|ξ[T−2]} [ inf_{x_{T−1} ≥ 0 : B_{T−1} x_{T−2} + A_{T−1} x_{T−1} = b_{T−1}}  c_{T−1}^T x_{T−1}
    + ρ_{T|ξ[T−1]} [ inf_{x_T ≥ 0 : B_T x_{T−1} + A_T x_T = b_T}  c_T^T x_T ] ] ],

with

  ρ_{t|ξ[t]}(·) := (1 − λ) E[· | ξ[t]] + λ AV@R_α(· | ξ[t])

being the conditional analogue of ρ(·).
69 We can write the risk averse multistage programming problem as

  Min_{x_1, x_2(·), …, x_T(·)}  ρ[ F_1(x_1) + F_2(x_2(ξ[2]), ξ_2) + ⋯ + F_T(x_T(ξ[T]), ξ_T) ]
  s.t.  x_1 ∈ X_1,  x_t(ξ[t]) ∈ X_t(x_{t−1}(ξ[t−1]), ξ_t),  t = 2, …, T,

where F_t(x_t, ξ_t) = c_t^T x_t and X_t(x_{t−1}, ξ_t) = { x_t : B_t x_{t−1} + A_t x_t = b_t, x_t ≥ 0 }. Here

  ρ(Z_1 + ⋯ + Z_T) = ρ_{|ξ_1}( ρ_{|ξ[2]}( ⋯ ρ_{|ξ[T−1]}(Z_1 + ⋯ + Z_T) ⋯ ))
                   = Z_1 + ρ_{|ξ_1}( Z_2 + ρ_{|ξ[2]}( ⋯ + ρ_{|ξ[T−1]}(Z_T) ⋯ ))

is the corresponding composite risk measure. The optimization is performed over (nonanticipative) policies x_1, x_2(ξ[2]), …, x_T(ξ[T]) satisfying the feasibility constraints.
70 With some modifications the SDDP algorithm can be applied to the above multistage problem. Assuming stagewise independence, the dynamic programming equations for the adaptive risk averse problem take the form

  Q_t(x_{t−1}, ξ_t) = inf_{x_t ∈ R^{n_t}} { c_t^T x_t + 𝒬_{t+1}(x_t) : B_t x_{t−1} + A_t x_t = b_t, x_t ≥ 0 },

t = T, …, 2, where 𝒬_{T+1}(·) ≡ 0 and

  𝒬_{t+1}(x_t) := ρ_{t+1|ξ[t]}[ Q_{t+1}(x_t, ξ_{t+1}) ].

Since ξ_{t+1} is independent of ξ[t], the cost-to-go functions 𝒬_{t+1}(x_t) do not depend on the data process. In order to apply the backward step of the SDDP algorithm we only need to know how to compute subgradients of the cost-to-go functions.
71 The value of this problem corresponds to the total objective

  ρ(Z_1 + ⋯ + Z_T) = ρ_{|ξ[1]}( ⋯ ρ_{|ξ[T−1]}(Z_1 + ⋯ + Z_T) ⋯ )
                   = Z_1 + ρ_{|ξ[1]}( Z_2 + ⋯ + ρ_{|ξ[T−1]}(Z_T) ⋯ ).

The dynamic programming equations of the risk averse formulation of the SAA program take the form

  Q_t^j(x_{t−1}) = inf_{x_t} { (c_t^j)^T x_t + 𝒬_{t+1}(x_t) : B_t^j x_{t−1} + A_t^j x_t = b_t^j, x_t ≥ 0 },
  j = 1, …, N_t,  t = T, …, 2,

and

  𝒬_{t+1}(x_t) = ρ( Q_{t+1}^1(x_t), …, Q_{t+1}^{N_{t+1}}(x_t) ),

with 𝒬_{T+1}(·) ≡ 0 and the first stage problem

  Min_{x_1 ≥ 0 : A_1 x_1 = b_1}  c_1^T x_1 + ρ( Q_2^1(x_1), …, Q_2^{N_2}(x_1) ).
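To make the SAA recursion concrete, here is a minimal two-stage sketch (Python; the newsvendor-style data are hypothetical, not from the lecture). The second stage cost Q_2^j(x_1) has a closed form, and the first stage minimizes x_1 + ρ(Q_2^1(x_1), …, Q_2^N(x_1)) with ρ = (1 − λ)·mean + λ·AV@R_α over equally weighted scenarios:

```python
import numpy as np

def rho(Q, alpha, lam):
    # (1 - lam)*mean + lam*AV@R_alpha over equally weighted scenario values Q_1..Q_N.
    Q = np.sort(np.asarray(Q, dtype=float))
    N = len(Q)
    Zi = Q[int(np.ceil((1 - alpha) * N)) - 1]  # (1 - alpha)-quantile
    return (1 - lam) * Q.mean() + lam * (Zi + np.maximum(Q - Zi, 0.0).sum() / (alpha * N))

# Hypothetical data: demand scenarios d_j, unit order cost 1, unit shortfall cost 5,
# so the second stage cost is Q_2^j(x1) = 5 * max(d_j - x1, 0).
d = np.array([3.0, 4.0, 5.0, 6.0, 10.0])
alpha, lam = 0.2, 0.5

def first_stage(x1):
    return x1 + rho(5.0 * np.maximum(d - x1, 0.0), alpha, lam)

grid = np.linspace(0, 12, 1201)
x_rn = grid[np.argmin([x + 5.0 * np.maximum(d - x, 0.0).mean() for x in grid])]  # risk neutral
x_ra = grid[np.argmin([first_stage(x) for x in grid])]                           # risk averse
# The risk averse solution hedges against the extreme demand scenario.
assert x_ra >= x_rn
print(x_rn, x_ra)
```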
72 For ρ(·) = (1 − λ)E[·] + λ AV@R_α(·) and (Z_1, …, Z_{N_{t+1}}) = ( Q_{t+1}^1(x_t), …, Q_{t+1}^{N_{t+1}}(x_t) ) we have that

  𝒬_{t+1}(x_t) = (1 − λ)/N_{t+1} Σ_{j=1}^{N_{t+1}} Z_j + λ [ Z_ι + 1/(α N_{t+1}) Σ_{j : Z_j > Z_ι} ( Z_j − Z_ι ) ],

where Z_ι is the (1 − α)-quantile of Z_1, …, Z_{N_{t+1}}. Note that if α N_{t+1} < 1, then Z_ι = max{ Z_1, …, Z_{N_{t+1}} }.

A subgradient of 𝒬_{t+1}(x_t) is given by

  ∇𝒬_{t+1}(x_t) = (1 − λ)/N_{t+1} Σ_{j=1}^{N_{t+1}} ∇Q_{t+1}^j(x_t) + λ [ ∇Q_{t+1}^ι(x_t) + 1/(α N_{t+1}) Σ_{j : Z_j > Z_ι} ( ∇Q_{t+1}^j(x_t) − ∇Q_{t+1}^ι(x_t) ) ].
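The closed-form expression for the discrete mean-AV@R combination can be cross-checked against direct minimization of the AV@R formula over t (the minimum of the piecewise linear objective is attained at one of the sample atoms). A sketch in Python, with a hypothetical sample standing in for the values Q_{t+1}^j(x_t):

```python
import numpy as np

def mean_avar_discrete(Z, alpha, lam):
    """Closed form of (1-lam)*E[Z] + lam*AV@R_alpha for an equally weighted sample Z."""
    Z = np.asarray(Z, dtype=float)
    N = len(Z)
    Zi = np.sort(Z)[int(np.ceil((1 - alpha) * N)) - 1]  # (1 - alpha)-quantile Z_iota
    tail = Z[Z > Zi] - Zi
    return (1 - lam) * Z.mean() + lam * (Zi + tail.sum() / (alpha * N))

rng = np.random.default_rng(3)
Z = rng.exponential(size=200)  # hypothetical scenario values
alpha, lam = 0.1, 0.4

val = mean_avar_discrete(Z, alpha, lam)
# Cross-check the AV@R part by brute-force minimization of t + E[Z - t]_+ / alpha
# over the sample atoms (a minimizer of this convex piecewise linear function).
avar_bf = min(t + np.maximum(Z - t, 0.0).mean() / alpha for t in Z)
assert abs(val - ((1 - lam) * Z.mean() + lam * avar_bf)) < 1e-10
print(val)
```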
73 These formulas allow construction of cuts in the backward step of the SDDP algorithm. In the forward step, trial points are generated in the same way as in the risk neutral case.

Remarks

Unfortunately there is no easy way to evaluate the value of the risk objective of generated policies, and hence to construct a corresponding upper bound. Some suggestions have been made in the recent literature; however, in larger problems the optimality gap (between the upper and lower bounds) never approaches zero in any realistic time. Therefore stopping criteria based on stabilization of the lower bound (and perhaps of the optimal solutions) could be reasonable. It should also be remembered that there is no intuitive interpretation of the risk objective ρ(cost) applied to the total cost; rather, the goal is to control risk at every stage of the process.