Online supplementary material for the paper: Multistep stochastic mirror descent for risk-averse convex stochastic programs based on extended polyhedral risk measures

Vincent Guigues, FGV/EMAp, Rio de Janeiro, Brazil

1 Numerical experiments

1.1 Comparison of the confidence intervals from Section 3 and from [18]

We compare the coverage probabilities and the computational time of two confidence intervals, with confidence level at least 1 − α = 0.9, on the optimal value of problems (.6) and (.7), built using a sample ξ_N = (ξ₁, ..., ξ_N) of size N of ξ:

1. the (non-asymptotic) confidence interval C₁ = [Low₁(Θ, N), Up₁(Θ, N)] proposed in Section 3, with the parameters Θ chosen as in the corresponding corollary;

2. the (non-asymptotic) confidence interval C₂ = [Low₂(Θ, N), Up₂(Θ, N)] proposed in [18], where

   Low₂(Θ, N) = f̂_N − ∆(θ, Θ, N),   (1.1)

with ∆(θ, Θ, N) an explicit deviation term of order D_{ω,X} M/√(µ(ω)N), depending on θ and Θ (see [18] for its exact expression), and

   f̂_N = min_{x ∈ X} (1/N) Σ_{t=1}^{N} [ g(x_t, ξ_t) + G(x_t, ξ_t)ᵀ(x − x_t) ],

taking for x₁, ..., x_N the sequence of points generated by the stochastic mirror descent (SMD) algorithm with a constant step γ proportional to θ √(µ(ω)) D_{ω,X}/(M√N). In this expression, M satisfies E[exp{‖G(x, ξ)‖²/M²}] ≤ exp{1} for all x ∈ X. (Note that the parameter D_{ω,X} of [18] is the parameter D_{ω,X} given by (.9), up to a constant factor.) Using the deviation theorem of [18], we have P(f(x⋆) < Low₂(Θ, N)) ≤ 6 exp{−Θ/c₁} + exp{−Θ/c₂} + exp{−c₃ΘN}, where c₁, c₂, c₃ are numerical constants given in [18]. Recalling that P(f(x⋆) > Up₂(Θ, N)) ≤ exp{−Θ/c₄} for a numerical constant c₄, it follows that we can take for the upper bound the parameter Θ solving exp{−Θ/c₄} = α/2, and for the lower bound the parameter Θ solving 6 exp{−Θ/c₁} + exp{−Θ/c₂} + exp{−c₃ΘN} = α/2. All simulations were implemented in Matlab using the Mosek Optimization Toolbox.

1.1.1 Comparison of the confidence intervals on a risk-neutral problem

We consider problem (.6) with α₁ = ., α₂ = 0.9, λ = b = , a = , n ∈ {, 6, 8, }, and where ξ is a random vector with i.i.d. Bernoulli entries: Prob(ξ(i) = 1) = Ψ(i), Prob(ξ(i) = −1) = 1 − Ψ(i), with Ψ(i) randomly drawn over [0, 1].
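As a point of reference, the constant-step mirror descent recursion with the entropy distance-generating function on the simplex has a closed-form prox step: a multiplicative update followed by renormalization. A minimal sketch of this recursion (the function names and the averaging convention are ours, not the paper's):

```python
import numpy as np

def entropy_prox_step(x, g, gamma):
    """Mirror-descent step for omega(x) = sum_i x_i ln(x_i) on the simplex:
    the prox mapping reduces to a multiplicative update, renormalized."""
    y = x * np.exp(-gamma * g)
    return y / y.sum()

def smd_average(stoch_grad, n, N, gamma, rng):
    """Constant-step stochastic mirror descent started at the simplex center;
    returns the averaged iterate, the usual output of the method."""
    x = np.full(n, 1.0 / n)
    avg = np.zeros(n)
    for _ in range(N):
        avg += x / N
        x = entropy_prox_step(x, stoch_grad(x, rng), gamma)
    return avg
```

For instance, minimizing a stochastic linear objective over the simplex with this routine concentrates the averaged iterate on the coordinate with the smallest expected cost.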
Table 1: Average ratio of the widths of the confidence intervals (width of C₂ over width of C₁) for problem (.6), for several problem sizes n and sample sizes N.

Table 2: Average computational time (in seconds) of a confidence interval, estimated over the computed confidence intervals for problem (.6).

It follows that f(x) = α₁ µᵀx + α₂ xᵀV x, where µ(i) = E[ξ(i)] = 2Ψ(i) − 1 and V(i, j) = E[ξ(i)] E[ξ(j)] = (2Ψ(i) − 1)(2Ψ(j) − 1) for i ≠ j, while V(i, i) = E[ξ(i)²] = 1. For the norm ‖·‖ we take the ℓ₁-norm, and for the distance-generating function the entropy function ω(x) = Σ_{i=1}^{n} x(i) ln(x(i)). We first take θ = 1 in (1.1), meaning that C₂ is obtained running the SMD algorithm with a constant step γ proportional to √(µ(ω)) D_{ω,X}/(M√N), where M = α₁ + α₂. We simulate a set of instances of this problem and compute for each instance the confidence intervals C₁ and C₂. The coverage probabilities of the two non-asymptotic confidence intervals are equal to one for all parameter combinations. We report in Table 1 the mean ratio of the widths of the non-asymptotic confidence intervals. Interestingly, we observe that the confidence interval C₁ we proposed in Section 3 is less conservative than C₂: in these experiments, the mean width of C₂ divided by the mean width of C₁ varies between .8 and .8, as can be seen in Table 1. Another advantage of C₁ is that it tends to be computed more quickly (see Table 2 for problem sizes n = , 6, 8, and ), especially when the problem size n increases (see Table 3 for n = , , , and ), due to the fact that C₁ is computed from an analytic formula, while computing C₂ requires solving an additional optimization problem of size n. We now fix the problem size n =  and compute realizations of the confidence intervals on the optimal value of that problem. On the top left plot of Figure 1, we report the optimal value as well as the approximate optimal values ĝ_N using both variants of the SMD algorithm for three sample sizes N.
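The mean and second-moment formulas above follow directly from ξ(i) ∈ {−1, 1} with Prob(ξ(i) = 1) = Ψ(i): E[ξ(i)] = Ψ(i) − (1 − Ψ(i)) = 2Ψ(i) − 1, E[ξ(i)²] = 1, and independence factorizes the off-diagonal entries. A quick numerical check (the helper name is ours):

```python
import numpy as np

def pm1_moments(psi):
    """Mean vector and second-moment matrix of a vector of independent
    {-1, +1} entries with Prob(xi_i = 1) = psi_i."""
    mu = 2.0 * psi - 1.0            # E[xi_i] = psi_i - (1 - psi_i)
    V = np.outer(mu, mu)            # E[xi_i xi_j] = E[xi_i] E[xi_j] for i != j
    np.fill_diagonal(V, 1.0)        # E[xi_i^2] = 1 since xi_i^2 = 1
    return mu, V
```

A Monte Carlo draw of such a vector matches these closed forms to sampling accuracy.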
Table 3: Average computational time (in seconds) of a confidence interval, estimated over the computed confidence intervals for problem (.6).

On the remaining plots of this figure, the upper and
lower bounds of the confidence intervals C₁ and C₂ are reported for three sample sizes N. We observe that the upper limits of C₁ and C₂ are very close (though not identical, since the two variants use different steps). When the sample size N increases, ĝ_N gets closer to the optimal value and the upper (resp. lower) limits tend to decrease (resp. increase). In this figure, we also see that the C₁ lower limit is much larger than the C₂ lower limit (in accordance with the results of Table 1). We also note that the C₁ and C₂ lower bounds appear to be almost straight lines in these simulations; this comes from the fact that the random part ĝ_N in these bounds is quite small compared to the deterministic part (the remaining terms).

Figure 1: Approximate optimal value, upper and lower bounds for C₁ and C₂, on instances of problem (.6) of size n = .

Finally, we consider for the parameter θ involved in the computation of C₂ the range of values considered in [18]. For these values of θ, the average ratios of the widths of C₂ and C₁ are given in Table 4. These average ratios are all above .79 and as high as  for (θ, N, n) = (., , ), which shows again that C₂ is much more conservative than the interval C₁ proposed in Section 3 for this range of values of θ.

1.1.2 Comparison of the confidence intervals on a risk-averse problem

We reproduce the experiments of the previous section for problem (.7), with the norm ‖·‖ = ‖·‖₂ and the distance-generating function ω(x) = ½‖x‖₂². We take for M its corresponding expression for this problem (a function of α₁, α₂, ε, and n),
and two sets of values for (α₁, α₂, ε): (α₁, α₂, ε) = (0.9, ., 0.9) and the more risk-averse variant (α₁, α₂, ε) = (., 0.9, .). For these problems, we first discretize ξ, generating a sample which becomes the sample space. We compute the optimal value of (.7) using this sample, and we sample from this set of scenarios to generate the problem instances. For different problem and sample sizes, we generate the instances as before. Coverage probabilities of the non-asymptotic confidence intervals are equal to one for all parameter combinations. The time required to compute these confidence intervals is given in Table 5, while the average ratios of the widths of C₂ and C₁ are reported in Table 6. We observe again on this problem that C₂ is much more conservative than C₁, and that C₁ is computed more quickly than C₂ for all problem sizes. When ε is small and more weight is given to the CVaR, the optimization problem becomes more difficult, i.e., we need a larger sample size to obtain a solution of good quality. This can be seen in Figures 2 and 3. On the top left plots of Figures 2 and 3, for a problem of size n = , we plot realizations of the approximate optimal values ĝ_N using both variants of the SMD algorithm for two sample sizes (ε = . for Figure 2 and ε = 0.9 for Figure 3).

Table 4: Average ratio of the widths of the confidence intervals C₂ and C₁ for problem (.6), for several values of θ, sample sizes N, and problem sizes n.

Table 5: CVaR optimization (problem (.7)). Average computational time (in seconds) of a confidence interval, for both values of ε and several problem and sample sizes.

Table 6: CVaR optimization (problem (.7)). Average ratio of the widths of the confidence intervals C₁ and C₂.
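For readers reproducing the CVaR experiments, the sample CVaR of a loss distribution can be computed by averaging the worst ε-fraction of the losses, equivalently via the Rockafellar-Uryasev representation. A minimal estimator (our helper, with ε the tail probability; the exact parameterization used in problem (.7) may differ):

```python
import numpy as np

def cvar(losses, eps):
    """Sample CVaR_eps via the Rockafellar-Uryasev formula
    min_u u + E[(L - u)+]/eps, whose minimum is attained at
    u = VaR_eps, the (1 - eps)-quantile of the losses."""
    u = np.quantile(losses, 1.0 - eps)
    return u + np.mean(np.maximum(losses - u, 0.0)) / eps
```

With a mean-CVaR objective of the form α₁ E[L] + α₂ CVaR_ε[L], a small ε puts the weight on the far tail of the loss distribution, which is why larger sample sizes are needed to reach the same accuracy.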
Figure 2: CVaR optimization (problem (.7)). Approximate optimal value ĝ_N, upper and lower bounds of C₁ and C₂ on instances, problem size n =  and the smaller value of ε.
Figure 3: CVaR optimization (problem (.7)). Approximate optimal value ĝ_N, upper and lower bounds of C₁ and C₂ on instances, problem size n =  and ε = 0.9.

For a fixed sample size N, the realizations for ε = 0.9 are much closer to the optimal value than those for the smaller ε. On the remaining plots of Figures 2 and 3, we report the upper and lower bounds of the confidence intervals C₁ and C₂. We observe again that (i) the upper (resp. lower) bounds decrease (resp. increase) when the sample size increases, (ii) the C₁ and C₂ upper bounds are very close, and (iii) the C₁ lower bound is much larger than the C₂ lower bound (reflecting the fact that C₂ is much more conservative than C₁). Additionally, we observe that when ε is small and more weight is given to the CVaR (α₂ = 0.9), the upper and lower bounds become more distant from the optimal value, i.e., the width of the confidence intervals increases.

To conclude, the confidence intervals C₁ and C₂ cannot be compared directly, because both the constants involved and the steps used to generate the points x₁, ..., x_N are different. However, we hypothesize that the additional optimization problem involved in the computation of C₂ accounts for both its conservativeness and the difference in computation time.

1.2 Comparing the multistep and non-multistep variants of SMD to solve problem (.6)

We solve various instances of problem (.6) (with a = , b = ) using SMD and its multistep version MS-SMD defined in Section , taking ω(x) = ½‖x‖₂². In this case, these algorithms are the RSA and multistep RSA (MS-RSA). We fix the parameters α₁ = 0.9, α₂ = ., λ = , the initial point x₁ = [; ; ... ; ], and D_X = , and recall that µ(ω) = M(ω) = µ(f) = , ρ = , with the constants L, M, and M taking their corresponding expressions for this problem (functions of α₁, α₂, λ, and n).
Figure 4: Steps (left plot), average (computed over the runs) approximate optimal values (middle plot), and average (computed over the runs) value of the objective function at the solution (right plot), along the iterations of the SMD and MS-SMD algorithms run on problem (.6) with n =  and N = 8.

In this and the next section, ξ is again a random vector with i.i.d. Bernoulli entries: Prob(ξ(i) = 1) = Ψ(i), Prob(ξ(i) = −1) = 1 − Ψ(i), with Ψ(i) randomly drawn over [0, 1]. We first take n =  and choose the number of iterations using Proposition ., namely we take N =  + 78 A(f, ω) = 8, which ensures for the MS-SMD algorithm that E[f(y_{steps+1}^N) − f(x⋆)] ≤ . (we also check that for this value of N, relation (.76), an assumption of Proposition ., holds). For this value of N, the values of γ_t for each iteration of the MS-SMD algorithm, as well as the constant value of γ for the SMD algorithm, are represented in the left plot of Figure 4. We observe that the MS-RSA algorithm starts with larger steps (when we are still far from the optimal solution) and ends with smaller steps (when we get closer to the optimal solution) than the RSA algorithm. We run each algorithm several times and report in the middle plot of Figure 4 the average (over the runs) of the approximate optimal values computed along the iterations with both algorithms. We also report in the right plot of Figure 4 the average (over these runs) of the value of the objective function at the SMD and MS-SMD solutions. More precisely, for each run of the SMD algorithm, the approximate optimal value at iteration i is ĝ_i = (1/i) Σ_{k=1}^{i} g(x_k, ξ_k) (defined in Algorithm ), while at iteration j of the i-th step of the MS-SMD algorithm the approximate optimal value is ĝ_{i,j} = (1/j) Σ_{k=1}^{j} g(x_{i,k}, ξ_{i,k}) (defined in Algorithm ), where ξ_{i,k} and x_{i,k} are, respectively, the k-th realization of ξ and the k-th point generated within step i (of course, for a given run, the same samples are used for SMD and MS-SMD).
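The running approximate optimal value ĝ_i defined above is simply a prefix average of the observed stochastic costs, which can be computed in one vectorized pass. A small sketch (the function name is ours):

```python
import numpy as np

def running_averages(costs):
    """Prefix averages g_i = (1/i) * sum_{k<=i} g(x_k, xi_k) of the observed
    stochastic costs, for i = 1, ..., N."""
    costs = np.asarray(costs, dtype=float)
    return np.cumsum(costs) / np.arange(1, costs.size + 1)
```

Plotting these prefix averages along the iterations gives exactly the "approximate optimal value" curves shown in the figures.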
We observe that we get better (lower) approximations of the optimal value using the MS-RSA algorithm. After a large number of iterations, the two algorithms provide very close approximations of the optimal value (themselves close to the optimal value of the problem), which is in agreement with the results of Sections  and , which state that for both algorithms the approximate optimal values converge in probability to the optimal value of the problem. However, the MS-RSA algorithm provides an approximate solution of good quality much more quickly than the RSA algorithm. We also observe that while the sample size N = 8 chosen on the basis of Proposition . indeed allows us to solve the problem with good accuracy, it is very conservative. In a second series of experiments, we choose various problem sizes n and smaller sample sizes N, namely (n, N) = (, ), (n, N) = (, ), (n, N) = (, ), and (n, N) = (, ), still
observing solutions of good quality. For these values of the pair (n, N), the values of the steps used by the SMD and MS-SMD algorithms are reported in Figure 5. Here again, the MS-RSA algorithm starts with larger steps and ends with smaller steps.

Figure 5: Steps used by the SMD and MS-SMD algorithms to solve problem (.6) with (n, N) = (, ) (top left plot), (n, N) = (, ) (top right plot), (n, N) = (, ) (bottom left), and (n, N) = (, ) (bottom right).

The average (over the runs) of the approximate optimal value and of the value of the objective function at the SMD and MS-SMD solutions are reported in Figures 6 and 7. We still observe in these simulations that MS-SMD obtains a solution of good quality much more quickly than SMD and ends up with a better solution, even when only two different step sizes are used for MS-SMD.
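The step pattern visible in these figures (a constant step within each stage, shrinking from one stage to the next) can be reproduced with a piecewise-constant schedule. A sketch of such a schedule; the halving factor and stage lengths here are illustrative choices of ours, not the paper's formulas:

```python
import numpy as np

def multistep_schedule(stage_lengths, gamma0, shrink=0.5):
    """Piecewise-constant step schedule: gamma0 on the first stage, multiplied
    by `shrink` at the start of each subsequent stage."""
    return np.concatenate([np.full(n_k, gamma0 * shrink**k)
                           for k, n_k in enumerate(stage_lengths)])
```

Compared with a single constant step tuned for the whole horizon, such a schedule starts with larger steps and ends with smaller ones, matching the qualitative behavior reported for the multistep variant.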
Figure 6: Average over the realizations of the approximate optimal values computed by the SMD and MS-SMD algorithms to solve (.6). Top left: (n, N) = (, ), top right: (n, N) = (, ), bottom left: (n, N) = (, ), bottom right: (n, N) = (, ).
Figure 7: Average over the realizations of the values of the objective function at the approximate solutions computed by the SMD and MS-SMD algorithms to solve (.6). Top left: (n, N) = (, ), top right: (n, N) = (, ), bottom left: (n, N) = (, ), bottom right: (n, N) = (, ).
1.3 Comparing the multistep and non-multistep variants of SMD to solve problem (.8)

We reproduce the experiment of the previous section, running SMD and MS-SMD on problem (.8), taking ω(x) = ½‖x‖₂², ε = 0.9, α₁ = ., α₂ = 0.9, λ = , the initial point x₁ = [; ; ... ; ], and D_X = , and recall that µ(ω) = M(ω) = µ(f) = , ρ = , with the constants L, M, and M taking their corresponding expressions for this problem (functions of α₁, α₂, ε, λ, and n). We consider again four combinations of the pair (n, N): (n, N) = (, ), (, ), (, ), and (, ). The steps used along the iterations of the SMD and MS-SMD algorithms are reported in Figure 8.

Figure 8: Steps used by the SMD and MS-SMD algorithms to solve problem (.8) with (n, N) = (, ) (top left plot), (n, N) = (, ) (top right plot), (n, N) = (, ) (bottom left), and (n, N) = (, ) (bottom right).

The average (computed running the algorithms several times) of the approximate optimal values
Figure 9: Average over the realizations of the approximate optimal values computed by the SMD and MS-SMD algorithms to solve (.8). Top left: (n, N) = (, ), top right: (n, N) = (, ), bottom left: (n, N) = (, ), bottom right: (n, N) = (, ).

and of the values of the objective function at the approximate solutions are reported in Figures 9 and 10. In these experiments, we observe again that the MS-SMD approximate solutions are better along the iterations and at the end of the optimization process.
Figure 10: Average over the realizations of the values of the objective function at the approximate solutions computed by the SMD and MS-SMD algorithms to solve (.8). Top left: (n, N) = (, ), top right: (n, N) = (, ), bottom left: (n, N) = (, ), bottom right: (n, N) = (, ).
More informationLinearly-Convergent Stochastic-Gradient Methods
Linearly-Convergent Stochastic-Gradient Methods Joint work with Francis Bach, Michael Friedlander, Nicolas Le Roux INRIA - SIERRA Project - Team Laboratoire d Informatique de l École Normale Supérieure
More informationMGMT 69000: Topics in High-dimensional Data Analysis Falll 2016
MGMT 69000: Topics in High-dimensional Data Analysis Falll 2016 Lecture 14: Information Theoretic Methods Lecturer: Jiaming Xu Scribe: Hilda Ibriga, Adarsh Barik, December 02, 2016 Outline f-divergence
More informationData-Driven Risk-Averse Stochastic Optimization with Wasserstein Metric
Data-Driven Risk-Averse Stochastic Optimization with Wasserstein Metric Chaoyue Zhao and Yongpei Guan School of Industrial Engineering and Management Oklahoma State University, Stillwater, OK 74074 Department
More informationORIGINS OF STOCHASTIC PROGRAMMING
ORIGINS OF STOCHASTIC PROGRAMMING Early 1950 s: in applications of Linear Programming unknown values of coefficients: demands, technological coefficients, yields, etc. QUOTATION Dantzig, Interfaces 20,1990
More informationWeighted uniform consistency of kernel density estimators with general bandwidth sequences
E l e c t r o n i c J o u r n a l o f P r o b a b i l i t y Vol. 11 2006, Paper no. 33, pages 844 859. Journal URL http://www.math.washington.edu/~ejpecp/ Weighted uniform consistency of kernel density
More informationAn Introduction to Laws of Large Numbers
An to Laws of John CVGMI Group Contents 1 Contents 1 2 Contents 1 2 3 Contents 1 2 3 4 Intuition We re working with random variables. What could we observe? {X n } n=1 Intuition We re working with random
More informationFinancial Optimization ISE 347/447. Lecture 21. Dr. Ted Ralphs
Financial Optimization ISE 347/447 Lecture 21 Dr. Ted Ralphs ISE 347/447 Lecture 21 1 Reading for This Lecture C&T Chapter 16 ISE 347/447 Lecture 21 2 Formalizing: Random Linear Optimization Consider the
More informationAdaptive Rejection Sampling with fixed number of nodes
Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, Brazil. Abstract The adaptive rejection sampling
More informationEstimating Unknown Sparsity in Compressed Sensing
Estimating Unknown Sparsity in Compressed Sensing Miles Lopes UC Berkeley Department of Statistics CSGF Program Review July 16, 2014 early version published at ICML 2013 Miles Lopes ( UC Berkeley ) estimating
More informationRome - May 12th Université Paris-Diderot - Laboratoire Jacques-Louis Lions. Mean field games equations with quadratic
Université Paris-Diderot - Laboratoire Jacques-Louis Lions Rome - May 12th 2011 Hamiltonian MFG Hamiltonian on the domain [0, T ] Ω, Ω standing for (0, 1) d : (HJB) (K) t u + σ2 2 u + 1 2 u 2 = f (x, m)
More informationLecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)
Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:
More informationAdvanced computational methods X Selected Topics: SGD
Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety
More informationChapter 2. Poisson point processes
Chapter 2. Poisson point processes Jean-François Coeurjolly http://www-ljk.imag.fr/membres/jean-francois.coeurjolly/ Laboratoire Jean Kuntzmann (LJK), Grenoble University Setting for this chapter To ease
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationOptimization Tools in an Uncertain Environment
Optimization Tools in an Uncertain Environment Michael C. Ferris University of Wisconsin, Madison Uncertainty Workshop, Chicago: July 21, 2008 Michael Ferris (University of Wisconsin) Stochastic optimization
More informationLecture 9: October 25, Lower bounds for minimax rates via multiple hypotheses
Information and Coding Theory Autumn 07 Lecturer: Madhur Tulsiani Lecture 9: October 5, 07 Lower bounds for minimax rates via multiple hypotheses In this lecture, we extend the ideas from the previous
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More information5 December 2016 MAA136 Researcher presentation. Anatoliy Malyarenko. Topics for Bachelor and Master Theses. Anatoliy Malyarenko
5 December 216 MAA136 Researcher presentation 1 schemes The main problem of financial engineering: calculate E [f(x t (x))], where {X t (x): t T } is the solution of the system of X t (x) = x + Ṽ (X s
More informationAlternative Characterizations of Markov Processes
Chapter 10 Alternative Characterizations of Markov Processes This lecture introduces two ways of characterizing Markov processes other than through their transition probabilities. Section 10.1 describes
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More information3.4 Linear Least-Squares Filter
X(n) = [x(1), x(2),..., x(n)] T 1 3.4 Linear Least-Squares Filter Two characteristics of linear least-squares filter: 1. The filter is built around a single linear neuron. 2. The cost function is the sum
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationHmms with variable dimension structures and extensions
Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating
More informationStochastic Dual Dynamic Programming with CVaR Risk Constraints Applied to Hydrothermal Scheduling. ICSP 2013 Bergamo, July 8-12, 2012
Stochastic Dual Dynamic Programming with CVaR Risk Constraints Applied to Hydrothermal Scheduling Luiz Carlos da Costa Junior Mario V. F. Pereira Sérgio Granville Nora Campodónico Marcia Helena Costa Fampa
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationChapter 9: Basic of Hypercontractivity
Analysis of Boolean Functions Prof. Ryan O Donnell Chapter 9: Basic of Hypercontractivity Date: May 6, 2017 Student: Chi-Ning Chou Index Problem Progress 1 Exercise 9.3 (Tightness of Bonami Lemma) 2/2
More informationRevisiting some results on the complexity of multistage stochastic programs and some extensions
Revisiting some results on the complexity of multistage stochastic programs and some extensions M.M.C.R. Reaiche IMPA, Rio de Janeiro, RJ, Brazil October 30, 2015 Abstract In this work we present explicit
More informationUniform Convergence of a Multilevel Energy-based Quantization Scheme
Uniform Convergence of a Multilevel Energy-based Quantization Scheme Maria Emelianenko 1 and Qiang Du 1 Pennsylvania State University, University Park, PA 16803 emeliane@math.psu.edu and qdu@math.psu.edu
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the
More informationAPPLICATIONS OF DIFFERENTIABILITY IN R n.
APPLICATIONS OF DIFFERENTIABILITY IN R n. MATANIA BEN-ARTZI April 2015 Functions here are defined on a subset T R n and take values in R m, where m can be smaller, equal or greater than n. The (open) ball
More informationSome new facts about sequential quadratic programming methods employing second derivatives
To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov
More information