Weierstraß-Institut für Angewandte Analysis und Stochastik, Leibniz-Institut im Forschungsverbund Berlin e. V., Preprint ISSN


Are Quasi-Monte Carlo algorithms efficient for two-stage stochastic programs?

Holger Heitsch 1, Hernan Leövey 2, Werner Römisch 2

submitted: April 19,

1 Weierstrass Institute, Mohrenstr., Berlin, Germany, holger.heitsch@wias-berlin.de
2 Humboldt University Berlin, Institute of Mathematics, Berlin, Germany, leovey@math.hu-berlin.de, romisch@math.hu-berlin.de

No. Berlin

Mathematics Subject Classification: 90C15.

Key words and phrases: Stochastic programming, two-stage, scenario, Quasi-Monte Carlo, effective dimension, dimension reduction.

The authors wish to express their gratitude to Prof. Ian Sloan (University of New South Wales, Sydney) for inspiring conversations. Much of the work on this paper was done while the first author held a position at Humboldt University Berlin. The research of the first author is partially supported by a grant of Kisters AG and by the Deutsche Forschungsgemeinschaft within project B04 of SFB/Transregio 154 "Mathematical Modelling, Simulation and Optimization using the Example of Gas Networks". The research of the second author is supported by a grant of the German Bundesministerium für Wirtschaft und Technologie (BMWi), and that of the third author by the DFG Research Center MATHEON in Berlin.

Edited by: Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Leibniz-Institut im Forschungsverbund Berlin e. V., Mohrenstraße, Berlin, Germany. Fax: ; World Wide Web:

Abstract

Quasi-Monte Carlo algorithms are studied for designing discrete approximations of two-stage linear stochastic programs with random right-hand side and continuous probability distribution. The latter should allow for a transformation to a distribution with independent marginals. The two-stage integrands are piecewise linear, but neither smooth nor contained in the function spaces considered for QMC error analysis. We show that under a weak geometric condition on the two-stage model all terms of their ANOVA decomposition, except the one of highest order, are continuously differentiable and that first and second order ANOVA terms have mixed first order partial derivatives and belong to $L_2$. Hence, randomly shifted lattice rules (SLR) may achieve the optimal rate of convergence $O(n^{-1+\delta})$ with $\delta \in (0, \frac12]$ and a constant not depending on the dimension if the effective superposition dimension is at most two. We discuss effective dimensions and dimension reduction for two-stage integrands. The geometric condition is shown to be satisfied almost everywhere if the underlying probability distribution is normal and principal component analysis (PCA) is used for transforming the covariance matrix. Numerical experiments for a large scale two-stage stochastic production planning model with normal demand show that convergence rates close to the optimal one are indeed achieved when using SLR and randomly scrambled Sobol' point sets accompanied by PCA for dimension reduction.

1 Introduction

Two-stage stochastic programs arise as deterministic equivalents of improperly posed random linear programs

$$\min\{\langle c, x\rangle : x \in X,\ Tx = h(\xi)\},\qquad(1)$$

where $X$ is a convex polyhedral subset of $\mathbb{R}^m$, $T$ a matrix, $\xi$ a $d$-dimensional random vector, $h(\xi) = (\xi, h)$ for some $h \in \mathbb{R}^{r-d}$, $d \le r$, and $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^m$.
The modeling idea consists in compensating a possible deviation $h(\xi(\omega)) - Tx$ for a given realization $\xi(\omega)$ of $\xi$ by introducing additional costs $\Phi(x, \xi(\omega))$ whose mean with respect to the probability distribution $P$ of $\xi$ is added to the objective of (1). In two-stage stochastic programming it is assumed that the additional costs represent the optimal value of a second-stage linear program, i.e.,

$$\Phi(x, \xi) = \inf\{\langle q, y\rangle : y \in \mathbb{R}^{\bar m},\ Wy = h(\xi) - Tx,\ y \ge 0\},\qquad(2)$$

where $W$ is an $(r, \bar m)$-matrix called recourse matrix, $q \in \mathbb{R}^{\bar m}$ the recourse costs and $y$ the recourse decision. The deterministic equivalent program then is of the form

$$\min\Big\{\langle c, x\rangle + \int_{\mathbb{R}^d}\Phi(x, \xi)\,P(d\xi) : x \in X\Big\}.\qquad(3)$$

In practical applications of stochastic programming the dimension $d$ is often large, e.g., in economics, energy, finance or transportation (see [62] for a survey of applied models). It is worth noting that the option pricing models that served as motivating examples for the further development of Quasi-Monte Carlo algorithms (e.g. in [65, 66, 69]) may be reformulated as linear two-stage stochastic programs

whose stochastic inputs are means of geometric Brownian motion paths. So, in a sense, the models considered here may be regarded as extensions of such financial models (see Example 4). The standard approach to solving the optimization model (3) consists in approximating the underlying probability distribution by discrete distributions $P_n$ based on a finite number $n$ of samples or scenarios $\xi^j \in \mathbb{R}^d$ with probabilities $p_j$, $j = 1,\dots,n$, and in considering the approximate stochastic program

$$\min\Big\{\langle c, x\rangle + \sum_{j=1}^{n} p_j\,\Phi(x, \xi^j) : x \in X\Big\}.$$

The case of random samples is studied in detail at least for independent and identically distributed (iid) samples (see e.g. Chapters 6 and 7 in [51] and [49, Sect. 4]), where the convergence rate (in probability or quadratic mean) is $O(n^{-\frac12})$. Only a few papers related to stochastic programming deal with the situation of deterministic samples with identical weights $p_j = n^{-1}$ and prove (general) convergence results (see [7, 45, 19, 46], [23] for randomized samples, or [50] for an overview). There exist two main approaches for the generation of discrete approximations to $P$ based on deterministic samples with identical weights. The first one is called optimal quantization of probability distributions (see [12], [42]) and determines such quantizations by (approximately) solving best approximation problems for $P$ in terms of the $L_p$-minimal (or $L_p$-Wasserstein) metric $\ell_p$, $p \ge 1$ (see Section 2.5 in [48]). The primal and dual representations of $\ell_1$ together with a classical result (see [8, Proposition 2.1]) imply that

$$c\,n^{-\frac1d} \le \ell_1(P, P_n) = \sup_{f \in F_d,\ \|f\|_L \le 1}\int_{\mathbb{R}^d} f(\xi)\,(P - P_n)(d\xi) \le \ell_p(P, P_n)$$

holds for sufficiently large $n$ and some constant $c > 0$ if $P$ has a density on $\mathbb{R}^d$, where $F_d$ denotes the Banach space of Lipschitz functions on $\mathbb{R}^d$ equipped with the Lipschitz norm $\|\cdot\|_L$. This shows that the convergence rate of $\ell_p(P, P_n)$ is at best $O(n^{-\frac1d})$. This rate is indeed established in [12, Theorem 6.2] under certain additional conditions on $P$.
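The gap between random and deterministic equal-weight samples is easy to observe numerically. A minimal sketch (assuming SciPy's `scipy.stats.qmc` module; the smooth product integrand is an illustrative choice, not from the paper) comparing a Sobol' point set with iid Monte Carlo sampling at the same budget:

```python
import numpy as np
from scipy.stats import qmc

d, n = 4, 1024                                          # n a power of 2 for the Sobol' net
f = lambda u: np.prod(1.0 + 0.5 * (u - 0.5), axis=1)    # exact integral over [0,1]^d is 1

# equal-weight rule n^{-1} sum_j f(x^j) on a plain (unscrambled) Sobol' point set
x_qmc = qmc.Sobol(d, scramble=False).random(n)
err_qmc = abs(f(x_qmc).mean() - 1.0)

# Monte Carlo with identical weights p_j = 1/n and iid uniform samples
x_mc = np.random.default_rng(0).random((n, d))
err_mc = abs(f(x_mc).mean() - 1.0)
```

For a smooth integrand like this one, the deterministic low-discrepancy sample typically beats the iid sample by several orders of magnitude at the same budget.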
It is well known that the unit ball $\{f \in F_d : \|f\|_L \le 1\}$ is too large for obtaining better rates. The second approach utilizes Quasi-Monte Carlo algorithms, which are of the form

$$Q_{n,d}(f) = n^{-1}\sum_{j=1}^{n} f(x^j)\qquad(n \in \mathbb{N})$$

and rely on the concept of equidistributed or low discrepancy point sets $\{x^j\}_{j=1}^{n}$ or sequences $(x^j)_{j\in\mathbb{N}}$ in $[0,1)^d$ (see [56, 33, 29, 5]). As observed in [16], certain reproducing kernel Hilbert spaces $F_d$ of functions $f : [0,1]^d \to \mathbb{R}$ are particularly useful for estimating the quadrature error. Let $K : [0,1]^d \times [0,1]^d \to \mathbb{R}$ be a kernel satisfying $K(\cdot, y) \in F_d$ and $\langle f, K(\cdot, y)\rangle = f(y)$ for each $y \in [0,1]^d$ and $f \in F_d$. If $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ denote the inner product and norm in $F_d$, and the integral $I_d(f) = \int_{[0,1]^d} f(x)\,dx$ is a continuous functional on $F_d$, the worst-case quadrature error $e_n(F_d)$ allows the representation

$$e_n(F_d) = \sup_{f \in F_d,\ \|f\| \le 1}\big|I_d(f) - Q_{n,d}(f)\big| = \sup_{\|f\| \le 1}\langle f, h_n\rangle = \|h_n\|\qquad(4)$$

according to the Riesz representation theorem for linear bounded functionals on Hilbert spaces. The representer $h_n \in F_d$ of the quadrature error is of the form

$$h_n(x) = \int_{[0,1]^d} K(x, y)\,dy - n^{-1}\sum_{j=1}^{n} K(x, x^j)\qquad(x \in [0,1]^d).$$

Here, we consider the weighted tensor product Sobolev space [54, 3]

$$F_d = W^{(1,\dots,1)}_{2,\mathrm{mix}}([0,1]^d) = \bigotimes_{i=1}^{d} W^1_2([0,1])\qquad(5)$$

equipped with a weighted norm $\|f\|_\gamma$, where the index $\gamma$ refers to a positive and nonincreasing sequence $(\gamma_i)$. The norm is given by $\|f\|^2_\gamma = \langle f, f\rangle_\gamma$ and the inner product (see Section 4 for the notation) by

$$\langle f, g\rangle_\gamma = \sum_{u \subseteq \{1,\dots,d\}}\gamma_u^{-1}\int_{[0,1]^{|u|}} I^u(f)(x_u)\,I^u(g)(x_u)\,dx_u,\qquad(6)$$

where $I^u(f)$ and $\gamma_u$ are defined for any subset $u$ of $\{1,\dots,d\}$ by

$$I^u(f)(x_u) = \int_{[0,1]^{d-|u|}}\frac{\partial^{|u|}}{\partial x_u}f(x)\,dx_{-u}\quad\text{and}\quad\gamma_u = \prod_{i \in u}\gamma_i.$$

The space $F_d$ is also a reproducing kernel Hilbert space with the kernel

$$K_{d,\gamma}(x, y) = \prod_{i=1}^{d}\Big(1 + \gamma_i\big(\tfrac12 B_2(|x_i - y_i|) + B_1(x_i)B_1(y_i)\big)\Big)\qquad(x, y \in [0,1]^d),$$

where $B_1(x) = x - \tfrac12$ and $B_2(x) = x^2 - x + \tfrac16$ are the Bernoulli polynomials of order 1 and 2, respectively [3, 25]. Another example is a weighted tensor product Walsh space consisting of Walsh series (see [5, Example 2.8] and [4]). These spaces became important for analyzing the recently developed randomized lattice rules, namely, randomly shifted lattice rules [55, 24, 26, 35] and random digitally shifted polynomial lattice rules (see [4, 5]). Both are special cases of randomized Quasi-Monte Carlo algorithms (RQMC), which will be discussed in Section 2. Here, we just mention that randomly shifted lattice rules

$$Q_{n,d}(\Delta, f) = n^{-1}\sum_{j=0}^{n-1} f\Big(\Big\{\frac{jg}{n} + \Delta\Big\}\Big)\qquad(7)$$

can be constructed, where $\Delta$ is uniformly distributed in $[0,1)^d$, $g \in \mathbb{Z}^d$ is the generator of the lattice, which is obtained by a component-by-component algorithm [24], and $\{\cdot\}$ means taking componentwise the fractional part.
For $f$ belonging to the weighted (un)anchored tensor product Sobolev space $F_d$, the root mean square error of such randomly shifted lattice rules can be bounded by [55, 24, 3]

$$\sqrt{\mathbb{E}\big|I_d(f) - Q_{n,d}(\Delta, f)\big|^2} \le C(\delta)\,n^{-1+\delta}\,\|f\|_\gamma\qquad(\delta \in (0, 0.5]),\qquad(8)$$

where the constant $C(\delta)$ does not depend on the dimension $d$ if the sequence of nonnegative weights $(\gamma_j)$ satisfies

$$\sum_{j=1}^{\infty}\gamma_j^{\frac{1}{2(1-\delta)}} < \infty.\qquad(9)$$
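A randomly shifted lattice rule (7) is straightforward to implement once a generator vector is available. The sketch below uses the classical Fibonacci generator $g = (1, 377)$ with $n = 610$ for $d = 2$ (an illustrative choice; generators for general $d$ come from a component-by-component construction as in [24]) and averages over independent shifts, so the spread over shifts gives a practical error estimate:

```python
import numpy as np

def shifted_lattice_rule(f, n, g, rng, shifts=16):
    """Randomly shifted rank-1 lattice rule (7):
    Q_{n,d}(Delta, f) = n^{-1} sum_{j=0}^{n-1} f({ j g / n + Delta }),
    averaged over independent uniform shifts Delta in [0,1)^d."""
    d = len(g)
    base = (np.arange(n)[:, None] * np.asarray(g)[None, :] / n) % 1.0   # points {j g / n}
    vals = np.array([f((base + rng.random(d)) % 1.0).mean() for _ in range(shifts)])
    # mean over shifts is the estimate; the shift spread yields an error estimate
    return vals.mean(), vals.std(ddof=1) / np.sqrt(shifts)

f = lambda u: 4.0 * u[:, 0] * u[:, 1]      # exact integral over [0,1]^2 is 1
est, err = shifted_lattice_rule(f, 610, (1, 377), np.random.default_rng(1))
```

The randomization makes the estimator unbiased while keeping the lattice structure of each replication intact.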

Unfortunately, typical integrands in linear two-stage stochastic programming (see Section 3) do not belong to such tensor product Sobolev or Walsh spaces and are not even of bounded Hardy and Krause variation (on $[0,1]^d$). The latter condition represents the standard requirement on the integrand $f$ for justifying Quasi-Monte Carlo algorithms via the Koksma-Hlawka theorem [33, Theorem 2.11]. Alternatively, it is suggested in the literature to study the so-called ANOVA decomposition (see Section 4) of such integrands, the smoothness of the ANOVA terms, and effective dimensions and/or sensitivity indices of the integrands. The aim of the present paper is to follow these suggestions and to derive theoretical arguments that explain why modern RQMC methods, with focus on randomly shifted lattice rules (7), converge with nearly the optimal rate (8) for the considered class of stochastic programs although the integrands do not satisfy standard requirements of QMC analysis, e.g., do not belong to the weighted tensor product Sobolev space (5). As a first step in this direction we show in Section 5 that, under some geometric condition on the second stage program, all ANOVA terms except the one of highest order are continuously differentiable and possess second order partial derivatives almost everywhere. In particular, the first and second order ANOVA terms belong to a mixed Sobolev space which is defined in Section 4. Error estimates show that the QMC convergence rate dominates the error if the effective superposition dimension is equal to 2 (Remark 10 in Section 5). In addition, we show in Section 6 that the geometric condition is satisfied for almost all covariance matrices if the underlying random vector is Gaussian. The meaning of "almost all" is also explained there. We also provide estimates of sensitivity indices and mean dimension in Section 7 and discuss techniques for dimension reduction.
In accordance with the theoretical results of Section 5, our preliminary computational results in Section 8 show that scrambled Sobol' sequences and randomly shifted lattice rules applied to a large scale two-stage stochastic program achieve convergence rates close to the optimal rate (8) if principal component analysis (PCA) is employed for dimension reduction. Both randomized QMC algorithms clearly outperform Monte Carlo methods.

2 Randomized Quasi-Monte Carlo methods

Randomized Quasi-Monte Carlo algorithms (RQMC) permit us to combine the good features of Monte Carlo with those of Quasi-Monte Carlo methods for practical error estimation. If $f$ has mixed partial derivatives of second order in each variable in $L_2([0,1]^d)$, then the convergence rate (8) can be improved to nearly $O(n^{-2})$ by embedding the function into an appropriate Korobov space through the so-called tent or baker's transformation (see [6, Section 5]). Although this is theoretically true, this improved rate of convergence (over the already good $O(n^{-1+\delta})$) for smoother integrands is rarely observed for RQMC in practical applications of high-dimensional integration, where only moderate or small sample sizes $n$ are affordable for computations [17]. A large class of QMC rules that can be randomized are the well known $(t, m, d)$-nets and $(t, d)$-sequences [33]. The randomization techniques for these constructions follow mainly two schemes: random digital shifts and random scramblings. Random digital shifting of $(t, m, d)$-nets and $(t, d)$-sequences can be performed in a similar way as mentioned for randomly shifted lattice rules, but the operations adding the shift must be carried out in the base $b$ used to define the $(t, m, d)$-nets (see [6, Section 6]). The resulting RQMC point set preserves the original net structure.
Similar bounds for the root mean square error as in (8) can be obtained for integrands belonging to the weighted (anchored and unanchored) tensor product Sobolev space $F_d$ by using a special class of $(t, m, d)$-nets called polynomial lattice rules, see again [6, Section 6].
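The tent (baker's) transformation mentioned above is elementary to apply in practice: each coordinate $u$ is replaced by $1 - |2u - 1|$ before evaluating the integrand, which periodizes smooth integrands. A minimal sketch:

```python
import numpy as np

def tent(points):
    """Componentwise tent (baker's) transformation phi(u) = 1 - |2u - 1|."""
    return 1.0 - np.abs(2.0 * np.asarray(points) - 1.0)

u = np.array([[0.1, 0.8], [0.25, 0.5]])
v = tent(u)     # [[0.2, 0.4], [0.5, 1.0]]
```

Applying `tent` to the points of a shifted lattice rule and evaluating $f$ on the result implements the Korobov-space embedding discussed above.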

The random scrambling method was first introduced by Owen in [36]. The basic properties of Owen's scrambling are the following:

Proposition 1 (Equidistribution) A randomized $(t, m, d)$-net in base $b$ using Owen's scrambling is again a $(t, m, d)$-net in base $b$ with probability 1. A randomized $(t, d)$-sequence in base $b$ using Owen's scrambling is again a $(t, d)$-sequence in base $b$ with probability 1.

Proposition 2 (Uniformity) Let $\tilde z_i$ be the randomized version of a point $z_i$ originally belonging to a $(t, m, d)$-net in base $b$ or a $(t, d)$-sequence in base $b$, using Owen's scrambling. Then $\tilde z_i$ has a uniform distribution in $[0,1)^d$, that is, for any Lebesgue measurable set $G \subseteq [0,1)^d$, $P(\tilde z_i \in G) = \lambda^d(G)$, with $\lambda^d$ the $d$-dimensional Lebesgue measure.

Note that the uniformity property stated above ensures that the resulting RQMC estimator $\hat Q_{n,d}(\cdot)$ is unbiased. We mention here the general results about the variance of an RQMC estimator $\hat Q_{n,d}(\cdot)$ obtained by applying Owen's random scrambling technique to $(t, m, d)$-nets in base $b$, for functions $f \in L_2([0,1]^d)$ (see [37]).

Theorem 3 Let $\tilde z_i$, $1 \le i \le n$, be the points of a scrambled $(t, m, d)$-net in base $b$, and let $f$ be a function on $[0,1)^d$ with integral $I_d(f)$ and variance $\sigma^2 := \int (f(z) - I_d(f))^2\,dz < \infty$. Let $\hat Q_{n,d}(f) = n^{-1}\sum_{i=1}^{n} f(\tilde z_i)$ with $n = b^m$ be the RQMC estimator. Then its variance $\mathrm{Var}(\hat Q_{n,d}(f))$ has the properties

$$\mathrm{Var}(\hat Q_{n,d}(f)) = o(n^{-1})\ \text{as}\ n \to \infty\qquad\text{and}\qquad\mathrm{Var}(\hat Q_{n,d}(f)) \le \frac{b^t}{n}\Big(\frac{b+1}{b-1}\Big)^d\sigma^2.$$

For $t = 0$ we have

$$\mathrm{Var}(\hat Q_{n,d}(f)) \le \frac1n\Big(\frac{b}{b-1}\Big)^{d-1}\sigma^2 \le \frac1n\,e\,\sigma^2.$$

Note that the last inequality for $t = 0$ holds since in this case one must have $b \ge d$. If the function $f$ has bounded variation in the sense of Hardy and Krause, $V_{HK}(f) < \infty$, then by the equidistribution property stated above the classical Koksma-Hlawka inequality holds with probability 1 for randomly scrambled $(t, m, d)$-nets; therefore the classical discrepancy bounds for $(t, m, d)$-nets [33] lead to

$$\mathrm{Var}(\hat Q_{n,d}(f)) = O\big(n^{-2}(\log n)^{2(d-1)}\big).$$
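Scrambling in the spirit of Owen's method is available in standard libraries. Independent scrambled replications give an unbiased estimator (Proposition 2) together with an empirical root mean square error, in the spirit of Theorem 3. A sketch assuming SciPy's scrambled Sobol' implementation and a smooth toy integrand with known integral $(e-1)^3$:

```python
import numpy as np
from scipy.stats import qmc

d, m, reps = 3, 10, 20
f = lambda u: np.exp(u).prod(axis=1)
exact = (np.e - 1.0) ** d

ests = np.array([
    f(qmc.Sobol(d, scramble=True, seed=r).random_base2(m)).mean()   # n = 2^m points
    for r in range(reps)
])
rmse = np.sqrt(np.mean((ests - exact) ** 2))   # empirical RMSE over replications
```

Each replication is itself a valid net (Proposition 1), so the fast QMC convergence is retained while the replications provide the error estimate.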
If the integrand $f$ has a mixed partial derivative of order $d$ which satisfies a Hölder condition, the above rate of convergence can be improved to [37, 38]

$$\mathrm{Var}(\hat Q_{n,d}(f)) = O\big(n^{-3}(\log n)^{d-1}\big).$$

Further improved results for functions having finite generalized Hardy and Krause variation can be found in [5, Theorem 13.25]. Note, however, that in contrast to (8), sequences of the form $(n^{-\alpha}(\log n)^{d-1})$ increase as long as $n < \exp(\frac{d-1}{\alpha})$ and, hence, require extremely large sample sizes $n$ for higher dimensions $d$ to become small.

The piecewise linear convex functions arising in stochastic programming (see Section 3) do not even have mixed partial derivatives (in the sense of Sobolev) in general. They do not have finite (generalized) Hardy and Krause variation either. The latter is shown for the classical Hardy and Krause variation of the special function $f_d(x) = \max\{x_1 + \dots + x_d - 1, 0\}$, $d \ge 3$, in [40, Proposition 17], but the proof carries over to the generalized variation. Thus, none of the results stated or mentioned above for RQMC can be used to formally justify an observed root mean square error convergence near to $O(n^{-1})$ (see Section 8) for integrands appearing in linear two-stage stochastic programming. Several modifications of the original scrambling method proposed by Owen have been investigated in order to provide efficient implementations of scramblings for practical applications; see the survey [28] and [31, 20, 60, 41], for example. Recent QMC constructions that aim to take advantage of a setting with even higher smoothness of the integrands are the so-called higher order digital nets in combination with higher order scramblings. For further information on this topic we refer the reader to [1, 5].

3 Integrands of linear two-stage stochastic programs

As described in the introduction, the integrands of two-stage linear stochastic programs with random right-hand sides are

$$\Phi(x, \xi) = \phi(h(\xi) - Tx),\qquad(10)$$

where $\phi$ denotes the optimal value function assigning to each $t \in \mathbb{R}^r$ the infimum $\phi(t) = \inf\{\langle q, y\rangle : Wy = t,\ y \ge 0\}$ in $\bar{\mathbb{R}} = \mathbb{R}\cup\{-\infty, +\infty\}$. Due to duality in linear programming, the function $\phi$ is finite and

$$\phi(t) = \sup\{\langle t, z\rangle : W^\top z \le q\}\qquad(11)$$

if $t \in \mathrm{dom}\,\phi = \{t \in \mathbb{R}^r : \phi(t) < \infty\}$ and the dual feasible set $D = \{z \in \mathbb{R}^r : W^\top z \le q\}$ is nonempty. Here, $q \in \mathbb{R}^{\bar m}$, $W$ is an $(r, \bar m)$-matrix, and $t$ varies in the polyhedral cone $\mathrm{dom}\,\phi = W(\mathbb{R}^{\bar m}_+)$.
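The dual representation (11) can be checked on a tiny instance. The sketch below uses hypothetical data (SciPy's `linprog` stands in for a generic LP solver): $r = 1$, $\bar m = 2$ with $W = (1, -1)$ and $q = (1, 1)$, for which $D = [-1, 1]$ with vertices $v^1 = -1$, $v^2 = 1$, and hence $\phi(t) = \max\{-t, t\} = |t|$:

```python
import numpy as np
from scipy.optimize import linprog

q = np.array([1.0, 1.0])
W = np.array([[1.0, -1.0]])          # recourse matrix, r = 1, m_bar = 2

def phi_primal(t):
    """phi(t) = inf{ <q, y> : W y = t, y >= 0 }, solved as an LP."""
    return linprog(c=q, A_eq=W, b_eq=[t], bounds=[(0, None)] * 2).fun

def phi_dual(t):
    """phi(t) = max over the vertices v^1 = -1, v^2 = 1 of D of <v^j, t>."""
    return max(-t, t)

vals = [(phi_primal(t), phi_dual(t)) for t in (-0.7, 0.0, 1.3)]
# primal and dual values agree: phi(t) = |t| on dom(phi) = R
```

This illustrates why the two-stage integrand is piecewise linear: it is a pointwise maximum of finitely many linear functions of $t$.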
If $D$ is nonempty, it is of the form $D = \mathrm{conv}\{v^1,\dots,v^l\} + (\mathrm{dom}\,\phi)^*$, where $v^1,\dots,v^l$ are the vertices of $D$, conv means convex hull and $(\mathrm{dom}\,\phi)^*$ is the polar cone to the cone $\mathrm{dom}\,\phi = W(\mathbb{R}^{\bar m}_+)$, i.e.,

$$(\mathrm{dom}\,\phi)^* = \{d \in \mathbb{R}^r : \langle d, t\rangle \le 0,\ \forall t \in W(\mathbb{R}^{\bar m}_+)\} = \{d \in \mathbb{R}^r : W^\top d \le 0\}.$$

Furthermore, there exist polyhedral cones $K_j$, $j = 1,\dots,l$, decomposing $\mathrm{dom}\,\phi$. The cone $K_j$ is the normal cone to the vertex $v^j$, i.e.,

$$K_j = \{t \in \mathrm{dom}\,\phi : \langle t, z - v^j\rangle \le 0,\ \forall z \in D\}\qquad(j = 1,\dots,l)\qquad(12)$$
$$\phantom{K_j} = \{t \in \mathrm{dom}\,\phi : \langle t, v^i - v^j\rangle \le 0,\ i = 1,\dots,l,\ i \ne j\}.\qquad(13)$$

Moreover,

$$\phi(t) = \langle v^j, t\rangle\ (t \in K_j)\qquad\text{and}\qquad\phi(t) = \max_{j=1,\dots,l}\langle v^j, t\rangle\ (t \in \mathrm{dom}\,\phi)\qquad(14)$$

and $\bigcup_{j=1,\dots,l} K_j = \mathrm{dom}\,\phi$. The intersection $K_j \cap K_{j'}$ for $j \ne j'$ coincides with a common closed face of dimension less than $r$. It is a common closed face of dimension $r-1$ iff the two cones are adjacent. In the latter case, the intersection is contained in

$$\{t \in \mathrm{dom}\,\phi : \langle t, v^j - v^{j'}\rangle = 0\}.\qquad(15)$$

If there exists $k \in \{1,\dots,d\}$ such that the $k$th components of $v^j$ and $v^{j'}$ coincide, the common closed face of $K_j$ and $K_{j'}$ contains at least one of the two one-dimensional cones $\{(0,\dots,0,t_k,0,\dots,0) : t_k \ge 0\}$ and $\{(0,\dots,0,t_k,0,\dots,0) : t_k \le 0\}$. The cones $K_j$ may also be represented by

$$K_j = \Big\{\sum_{i \in I_j}\lambda_i w^i : \lambda_i \ge 0,\ i \in I_j\Big\},$$

where $w^i \in \mathbb{R}^r$ are the columns of $W$ and $I_j = \{i \in \{1,\dots,\bar m\} : \langle w^i, v^j\rangle = q_i\}$. Each vertex $v^j$ is determined by $r$ linearly independent equations out of the $\bar m$ equations $\langle w^i, v\rangle = q_i$, $i = 1,\dots,\bar m$. In the following sections we need the conditions:

(A1) $h(\xi) - Tx \in W(\mathbb{R}^{\bar m}_+)$ for all $\xi \in \mathbb{R}^d$, $x \in X$ (relatively complete recourse).
(A2) The dual feasible set $D$ is nonempty (dual feasibility).
(A3) $\int_{\mathbb{R}^d}\|\xi\|^2\,P(d\xi) < \infty$ (finite second order moment).
(A4) $P$ has a density of the form $\rho(\xi) = \prod_{i=1}^{d}\rho_i(\xi_i)$ $(\xi \in \mathbb{R}^d)$, where $\rho_i$ is a continuous (marginal) density on $\mathbb{R}$, $i = 1,\dots,d$ (independent components).
(A5) The first $d$ components of the adjacent vertices of $D$ are distinct, i.e., all common closed faces of the normal cones to two adjacent vertices of $D$ are not parallel to any of the first $d$ coordinate axes in $\mathbb{R}^r$ (geometric condition).

Conditions (A3) and (A4) are needed for defining and using the ANOVA decomposition in Sections 4 and 5. Our present analysis applies to random vectors $\xi$ of the form $\xi = B\zeta$, where $B$ is a nonsingular $(d, d)$-matrix and the $d$-dimensional random vector $\zeta$ has independent components. Then the equality constraint $Wy = h(\xi) - Tx$ in (2) has to be replaced by $\hat B^{-1}Wy = h(\zeta) - \hat B^{-1}Tx$, where

$$\hat B = \begin{pmatrix} B & 0 \\ 0 & I \end{pmatrix}\qquad(16)$$

and $I$ denotes the $(r-d, r-d)$ identity matrix. The structure $\xi = B\zeta$ applies, in particular, to multivariate normal probability distributions, where the matrix $B$ may be chosen to be orthogonal (see Section 6). The geometric condition (A5) has technical character and will be further discussed in Section 6.
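For the multivariate normal case, one concrete choice of the factor $B$ in $\xi = B\zeta$ comes from the eigendecomposition of the covariance matrix (PCA), which sorts the variance into the leading components of $\zeta$; this is the dimension reduction device used later in Section 8. A minimal sketch with an illustrative covariance matrix:

```python
import numpy as np

def pca_factor(cov):
    """PCA factor B with cov = B B^T: eigenvalues sorted decreasingly, so the
    leading components of zeta carry the most variance."""
    lam, U = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return U[:, order] * np.sqrt(lam[order])   # scales the i-th column by sqrt(lam_i)

cov = np.array([[2.0, 0.8], [0.8, 1.0]])
B = pca_factor(cov)
# xi = B @ zeta has covariance 'cov' when zeta ~ N(0, I)
```

Compared with a Cholesky factor, the PCA factor concentrates variance in the first coordinates, which tends to lower the effective truncation dimension of the transformed integrand.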
Conditions (A1), (A2), (A3) imply that the two-stage stochastic program (3) is well defined and represents an optimization problem with finite convex objective and polyhedral convex feasible set. If $X$ is compact, its optimal value $v(P)$ is finite and its solution set $S(P)$ is nonempty, closed and convex. The quantitative stability results [49, Theorems 5 and 9] for general stochastic programs imply the perturbation estimates

$$|v(P) - v(Q)| \le L\,\sup_{x \in X}\Big|\int_{\mathbb{R}^d}\Phi(x, \xi)\,(P - Q)(d\xi)\Big|\qquad(17)$$

$$S(Q) \subseteq S(P) + \psi_P^{-1}\Big(\sup_{x \in X}\Big|\int_{\mathbb{R}^d}\Phi(x, \xi)\,(P - Q)(d\xi)\Big|\Big)\,\mathbb{B},\qquad(18)$$

where $\mathbb{B}$ is the unit ball in $\mathbb{R}^m$, $\psi_P$ is the growth function of the objective

$$\psi_P(\tau) = \inf\Big\{\langle c, x\rangle + \int_{\mathbb{R}^d}\Phi(x, \xi)\,P(d\xi) - v(P) : d(x, S(P)) \ge \tau,\ x \in X\Big\}\qquad(\tau \ge 0),$$

its inverse is defined by $\psi_P^{-1}(t) = \sup\{\tau \in \mathbb{R}_+ : \psi_P(\tau) \le t\}$, and $Q$ is a probability measure satisfying (A3), too.

For further information on linear parametric programming and two-stage stochastic programming we refer to [61, 34] and [51, 52, 70]. To give an example for (3), we show that option pricing models considered as stimulating examples for the recent developments in QMC theory (see e.g. [66, 67]) may be reformulated as linear two-stage stochastic programs.

Example 4 Let the first stage variable $x$ represent the strike price at the expiration date $T_e$. The dimensions are set to $m = 1$, $\bar m = 2$ and the matrix $W$ is set to $W = (w, -w)$ with $w = \exp(rT_e)$ and $r$ denoting the risk-free interest rate. The second stage program and its dual are

$$\min\{y_1 : Wy = \xi - x,\ y \in \mathbb{R}^2,\ y \ge 0\} = \max\{(\xi - x)z : z \in \mathbb{R},\ W^\top z \le (1, 0)^\top\} = \max\{(\xi - x)z : 0 \le wz \le 1\}.$$

The terminal payoff is $\exp(-rT_e)\max\{0, \xi - x\}$, and $v = 0$ and $v = \frac1w$ are the only vertices. Taking the expectation then leads to the optimization model

$$\min\Big\{x + \exp(-rT_e)\int_{\mathbb{R}}\max\{0, \xi - x\}\,\rho(\xi)\,d\xi : x \ge 0\Big\}$$

for maximizing the strike price. Now, it depends on the kind of option how the random variable $\xi$ depends on the geometric Brownian motion $S$ given by $S_t = S_0\exp((r - \frac12\sigma^2)t + \sigma B_t)$ with volatility $\sigma$ and standard Brownian motion $(B_t)_{t \ge 0}$. For example, for arithmetic Asian options one has [65]

$$\xi = \frac1d\sum_{i=1}^{d} S_{t_i}\qquad\text{with}\qquad t_i = \frac{iT_e}{d},\ i = 1,\dots,d.$$

Hence, in a sense, the integrand (10) extends the situations encountered in such option pricing models. It is, however, much more involved.

4 ANOVA decomposition of integrands and effective dimension

The analysis of variance (ANOVA) decomposition of a function was first proposed as a tool in statistical analysis (see [18] and the survey [59]). In [56] it was first used for the analysis of quadrature methods. We consider a density function $\rho$ on $\mathbb{R}^d$ and assume (A4) from Section 3. As in [15] we consider the weighted $L_p$ space over $\mathbb{R}^d$, i.e., $L_{p,\rho}(\mathbb{R}^d)$, with the norm

$$\|f\|_{p,\rho} = \begin{cases}\Big(\int_{\mathbb{R}^d}|f(\xi)|^p\rho(\xi)\,d\xi\Big)^{\frac1p} & \text{if } 1 \le p < +\infty,\\[4pt] \operatorname*{ess\,sup}_{\xi \in \mathbb{R}^d}|f(\xi)| & \text{if } p = +\infty.\end{cases}$$

Let $D = \{1,\dots,d\}$ and $f \in L_{1,\rho}(\mathbb{R}^d)$.
The projection $P_k$, $k \in D$, is given by

$$(P_k f)(\xi) := \int_{-\infty}^{\infty} f(\xi_1,\dots,\xi_{k-1}, s, \xi_{k+1},\dots,\xi_d)\,\rho_k(s)\,ds\qquad(\xi \in \mathbb{R}^d).$$

Clearly, the function $P_k f$ is constant with respect to $\xi_k$. For $u \subseteq D$ we use $|u|$ for its cardinality, $-u$ for $D \setminus u$, and write

$$P_u f = \Big(\prod_{k \in u} P_k\Big)(f),$$

where the product means composition. We note that the ordering within the product is not important because of Fubini's theorem. The function $P_u f$ is constant with respect to all $\xi_k$, $k \in u$. Note that $P_u$ satisfies the properties of a projection, namely, $P_u$ is linear and $P_u^2 = P_u$. The ANOVA decomposition of $f \in L_{1,\rho}(\mathbb{R}^d)$ is of the form [65, 27]

$$f = \sum_{u \subseteq D} f_u\qquad(19)$$

with $f_u$ depending only on $\xi^u$, i.e., on the variables $\xi_j$ with indices $j \in u$. It satisfies the property $P_j f_u = 0$ for all $j \in u$ and the recurrence relation

$$f_\emptyset = I_{d,\rho}(f) := P_D(f)\qquad\text{and}\qquad f_u = P_{-u}(f) - \sum_{v \subsetneq u} f_v.$$

It is known from [27] that the ANOVA terms are given explicitly by

$$f_u = \sum_{v \subseteq u}(-1)^{|u|-|v|}P_{-v}f = P_{-u}(f) + \sum_{v \subsetneq u}(-1)^{|u|-|v|}P_{u-v}(P_{-u}(f)),\qquad(20)$$

where $P_{-u}$ and $P_{u-v}$ mean integration with respect to $\xi_j$, $j \in D \setminus u$ and $j \in u \setminus v$, respectively. The second representation shows that $f_u$ is essentially as smooth as $P_{-u}(f)$ due to the Inheritance Theorem [15, Theorem 2]. The following result is well known (e.g. [65]).

Proposition 5 If $f$ belongs to $L_{2,\rho}(\mathbb{R}^d)$, the ANOVA functions $\{f_u\}_{u \subseteq D}$ are orthogonal in $L_{2,\rho}(\mathbb{R}^d)$.

We define the variances of $f$ and $f_u$ by $\sigma^2(f) = \|f - I_{d,\rho}(f)\|^2_{2,\rho}$ and $\sigma^2_u(f) = \|f_u\|^2_{2,\rho}$, and have

$$\sigma^2(f) = \|f\|^2_{2,\rho} - (I_{d,\rho}(f))^2 = \sum_{\emptyset \ne u \subseteq D}\|f_u\|^2_{2,\rho} = \sum_{\emptyset \ne u \subseteq D}\sigma^2_u(f).$$

In the literature, the ANOVA decomposition is often considered for functions $g \in L_1([0,1]^d)$. Then the projections are defined by

$$(P^*_k g)(\upsilon) := \int_0^1 g(\upsilon_1,\dots,\upsilon_{k-1}, s, \upsilon_{k+1},\dots,\upsilon_d)\,ds\qquad(\upsilon \in [0,1]^d)$$

and

$$P^*_u g := \Big(\prod_{k \in u} P^*_k\Big)(g)\qquad(u \subseteq D).$$

Similarly to the case of $\mathbb{R}^d$, the ANOVA decomposition of $g \in L_1([0,1]^d)$ is of the form

$$g = \sum_{u \subseteq D} g_u,\qquad g_\emptyset := I_d(g) := P^*_D(g)\qquad\text{and}\qquad g_u := P^*_{-u}(g) - \sum_{v \subsetneq u} g_v,$$

with $g_u$ depending only on $\upsilon^u$, i.e., on the variables $\upsilon_j$ with indices $j \in u$.
Note that $P^*_u$ is indeed again a projection and, assuming that $g \in L_2([0,1]^d)$, the same orthogonality property (now in $L_2([0,1]^d)$) as in Proposition 5 follows.
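The recurrence defining the ANOVA terms can be illustrated numerically for $d = 2$ on $[0,1]^2$, approximating the projections $P^*_u$ by midpoint sums on a tensor grid (a sketch with an illustrative test function; on the grid the discrete decomposition is exact, so the term variances add up to the total variance):

```python
import numpy as np

n = 400
s = (np.arange(n) + 0.5) / n                 # midpoint grid on [0,1]
X1, X2 = np.meshgrid(s, s, indexing="ij")
F = X1 * X2 + X1**2                          # test function f(x1, x2)

f_empty = F.mean()                           # f_{} = I(f)
f1 = F.mean(axis=1) - f_empty                # f_{1}: integrate out x2
f2 = F.mean(axis=0) - f_empty                # f_{2}: integrate out x1
f12 = F - f1[:, None] - f2[None, :] - f_empty  # f_{1,2} via the recurrence

# orthogonality: the variances of the terms add up to the total variance
total = np.var(F)
parts = np.mean(f1**2) + np.mean(f2**2) + np.mean(f12**2)
```

Each lower order term integrates to zero in each of its own variables, which is exactly the property $P_j f_u = 0$ for $j \in u$ stated above.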

Assuming now for simplicity that $\rho_j(t) > 0$ for all $t \in \mathbb{R}$, $j = 1,\dots,d$, an integrand $f \in L_{1,\rho}(\mathbb{R}^d)$ can be transformed into a function $g$ defined on $[0,1]^d$ by inverting the function $\varphi := (\varphi_1,\dots,\varphi_d)$,

$$\varphi_i(t) := \int_{-\infty}^{t}\rho_i(s)\,ds\qquad(i \in D)\qquad(21)$$

and by defining

$$g(\upsilon) := \begin{cases}(f \circ \varphi^{-1})(\upsilon) & \text{if } \upsilon \in (0,1)^d,\\ 0 & \text{if } \upsilon \in [0,1]^d \setminus (0,1)^d.\end{cases}\qquad(22)$$

Then the ANOVA terms $g_u$ of $g$ satisfy

$$f_u(\xi^u) = g_u(\varphi_u(\xi^u))\ \text{for}\ \xi^u \in \mathbb{R}^{|u|},\qquad g_u(\upsilon^u) = (f_u \circ \varphi_u^{-1})(\upsilon^u)\ \text{for}\ \upsilon^u \in (0,1)^{|u|},\qquad(23)$$

where $\varphi_u := (\varphi_{j_1},\dots,\varphi_{j_{|u|}})$ and $\varphi_u^{-1} := (\varphi_{j_1}^{-1},\dots,\varphi_{j_{|u|}}^{-1})$ $(j_k \in u,\ 1 \le k \le |u|,\ j_k < j_l\ \text{for}\ k < l)$. When setting $\sigma^2_u(g) := \int_{[0,1]^{|u|}} g_u^2(\upsilon^u)\,d\upsilon^u$ for $\emptyset \ne u \subseteq D$ and $\sigma^2_\emptyset(g) := 0$, one obtains $\sigma^2_u(g) = \sigma^2_u(f)$ for $u \subseteq D$. We return to the $\mathbb{R}^d$ case and assume $\sigma(f) > 0$ in the following to avoid trivial cases. The normalized ratios $\frac{\sigma^2_u(f)}{\sigma^2(f)}$ serve as indicators for the importance of the variables $\xi^u$ in $f$. They are used in [57] to define global sensitivity indices of a set $u \subseteq D$ by

$$S_u = \frac{1}{\sigma^2(f)}\sum_{v \subseteq u}\sigma^2_v(f)\qquad\text{and}\qquad\bar S_u = 1 - S_{-u} = \frac{1}{\sigma^2(f)}\sum_{v \cap u \ne \emptyset}\sigma^2_v(f).$$

If $\bar S_u$ is small, then the variables $\xi^u$ are considered inessential for $f$ in [57]. The normalized ratios are also used in [39, 30] to define and study the dimension distribution of a function $f$ in two ways. The dimension distribution of $f$ in the superposition (truncation) sense is a probability measure $\nu_S$ ($\nu_T$) defined on the power set of $D$ by

$$\nu_S(s) := \nu_S(\{s\}) = \sum_{|u| = s}\frac{\sigma^2_u(f)}{\sigma^2(f)}\qquad\Big(\nu_T(s) = \sum_{\max\{j : j \in u\} = s}\frac{\sigma^2_u(f)}{\sigma^2(f)}\Big)\qquad(s \in D).$$

Hence, the mean dimension in the superposition (truncation) sense is

$$d_S = \sum_{\emptyset \ne u \subseteq D}|u|\,\frac{\sigma^2_u(f)}{\sigma^2(f)}\qquad\Big(d_T = \sum_{\emptyset \ne u \subseteq D}\max\{j : j \in u\}\,\frac{\sigma^2_u(f)}{\sigma^2(f)}\Big).\qquad(24)$$

It is proved in [30, Theorem 2] that the mean dimension $d_S$ in the superposition sense is closely related to the global sensitivity indices of subsets of $D$ containing a single element, namely,

$$d_S = \sum_{j=1}^{d}\bar S_{\{j\}}.\qquad(25)$$

The paper [30] also provides a formula for the dimension variance based on $\bar S_u$ for all subsets $u$ of $D$ containing two indices.
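Transformation (21)-(22) is what one implements in practice: QMC points in $[0,1)^d$ are mapped to $\mathbb{R}^d$ through the inverse marginal distribution functions. A sketch for standard normal marginals (so each $\varphi_i^{-1}$ is `scipy.stats.norm.ppf`) applied to a piecewise linear, two-stage-like integrand:

```python
import numpy as np
from scipy.stats import norm

f = lambda xi: np.maximum(xi.sum(axis=-1), 0.0)   # piecewise linear, max-type integrand

def g(u):
    """g = f o phi^{-1} on (0,1)^d, with phi_i the standard normal CDF (21)-(22)."""
    return f(norm.ppf(u))

u = np.array([[0.5, 0.5], [0.8, 0.5]])
vals = g(u)     # ppf(0.5) = 0, so the first value is f(0, 0) = 0
```

By (23) the ANOVA terms, and hence all effective dimensions, are the same whether computed for $f$ on $\mathbb{R}^d$ or for $g$ on $[0,1]^d$.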

For small $\varepsilon \in (0,1)$ ($\varepsilon = 0.01$ is suggested in a number of papers), the effective superposition (truncation) dimension $d_S(\varepsilon) \in D$ ($d_T(\varepsilon) \in D$) is the $(1-\varepsilon)$-quantile of $\nu_S$ ($\nu_T$), i.e.,

$$d_S(\varepsilon) = \min\Big\{s \in D : \sum_{|u| \le s}\frac{\sigma^2_u(f)}{\sigma^2(f)} \ge 1 - \varepsilon\Big\},\qquad d_T(\varepsilon) = \min\{s \in D : \nu_T(\{1,\dots,s\}) \ge 1 - \varepsilon\}.$$

Note that $d_S(\varepsilon) \le d_T(\varepsilon)$ and (see [65, 13])

$$\max\Big\{\Big\|f - \sum_{|u| \le d_S(\varepsilon)} f_u\Big\|_{2,\rho},\ \Big\|f - \sum_{u \subseteq \{1,\dots,d_T(\varepsilon)\}} f_u\Big\|_{2,\rho}\Big\} \le \sqrt{\varepsilon}\,\sigma(f).\qquad(26)$$

A small effective superposition dimension $d_S(\varepsilon)$, even if $d_T(\varepsilon)$ is large, suggests that we may expect superiority of QMC over MC. We note that there exist algorithms based on MC or QMC to compute global sensitivity indices and effective dimensions approximately (see [57, 65, 58, 66] for example). Since the algorithms are often described for functions on $[0,1]^d$, we mention that the dimension distribution and, hence, any effective dimension of $f$ is the same as for $g$ given by (22). The algorithm of [65] for estimating the effective truncation dimension can be carried out equivalently for $f$, with its obvious adaptation to the $\mathbb{R}^d$ setting. All these notions are discussed in [39] for different classes of functions, including additive and multiplicative functions. We record here the result for additive functions for later reference.

Example 6 For functions $f$ having separability structure, i.e., $f$ of the form

$$f(\xi) = \sum_{j=1}^{d} g_j(\xi_j)\qquad(\xi \in \mathbb{R}^d)$$

with $g_j \in L_{2,\rho_j}(\mathbb{R})$, $j = 1,\dots,d$, the second and higher order ANOVA terms vanish (see [39]). Hence, the effective superposition dimension $d_S(\varepsilon)$ is equal to 1 for every $\varepsilon \in (0,1)$, while the effective truncation dimension $d_T(\varepsilon) = s$ if

$$\sum_{j=s+1}^{d}\sigma_j^2 \le \varepsilon\sum_{j=1}^{d}\sigma_j^2,$$

where $\sigma_j^2$ is the variance of $g_j$, $j = 1,\dots,d$. The importance of the ANOVA decomposition in the context of this paper is due to the fact that the ANOVA terms $f_u$ with $|u| < d$ may be (much) smoother than the original integrand $f$ under certain conditions (see [14, 15]). As in [15] we use the notation $D_i f$ for $i \in D$ to denote the classical partial derivative $\frac{\partial f}{\partial x_i}$.
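Example 6 can be checked directly: for an additive $f$ only first order terms contribute, so $d_S(\varepsilon) = 1$, and $d_T(\varepsilon)$ is the smallest $s$ whose variance tail $\sum_{j>s}\sigma_j^2$ falls below $\varepsilon\sum_j\sigma_j^2$. A sketch with illustrative variances $\sigma_j^2$:

```python
import numpy as np

sigma2 = np.array([4.0, 2.0, 1.0, 0.5, 0.25])   # variances of g_j (illustrative)
eps = 0.05
total = sigma2.sum()

# tails[s-1] = sum_{j > s} sigma_j^2; d_T(eps) = min{ s : tails[s-1] <= eps * total }
tails = [sigma2[s:].sum() for s in range(1, len(sigma2) + 1)]
d_T = 1 + next(s for s, t in enumerate(tails) if t <= eps * total)
```

Ordering the variables so that the variances decrease (as PCA does for normal inputs) is exactly what keeps $d_T(\varepsilon)$ small here.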
For a multi-index $\alpha = (\alpha_1,\dots,\alpha_d)$ with $\alpha_i \in \mathbb{N}_0$ we set

$$D^\alpha f = \prod_{i=1}^{d} D_i^{\alpha_i} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}}$$

and call $D^\alpha f$ the partial derivative of order $|\alpha| = \sum_{i=1}^{d}\alpha_i$. The function $D^\alpha f$ is called weak or Sobolev derivative of order $|\alpha|$ if it is measurable on $\mathbb{R}^d$ and satisfies

$$\int_{\mathbb{R}^d}(D^\alpha f)(\xi)\,v(\xi)\,d\xi = (-1)^{|\alpha|}\int_{\mathbb{R}^d} f(\xi)\,(D^\alpha v)(\xi)\,d\xi\qquad\text{for all}\ v \in C_0^\infty(\mathbb{R}^d),$$

where $C_0^\infty(\mathbb{R}^d)$ denotes the space of infinitely differentiable functions with compact support in $\mathbb{R}^d$ and $D^\alpha v$ is a classical derivative. Then classical derivatives are also weak derivatives. In accordance with the notation (5) we consider in the next section the mixed Sobolev space

$$W^{(1,\dots,1)}_{2,\rho,\mathrm{mix}}(\mathbb{R}^d) = \big\{f \in L_{2,\rho}(\mathbb{R}^d) : D^\alpha f \in L_{2,\rho}(\mathbb{R}^d)\ \text{if}\ \alpha_i \le 1,\ i \in D\big\}.$$

5 ANOVA decomposition of linear two-stage integrands

According to Section 3, the integrands in linear two-stage stochastic programming map from $\mathbb{R}^d$ to $\mathbb{R}$ and are given by

$$f(\xi) = f_x(\xi) = \max_{j=1,\dots,l}\big\langle v^j, (\xi, h) - Tx\big\rangle\qquad(x \in X),\qquad(27)$$

where the $v^j$, $j = 1,\dots,l$, are the vertices of the dual feasible set $D = \{z \in \mathbb{R}^r : W^\top z \le q\}$ and the $K_j$ are the normal cones to the $v^j$, $j = 1,\dots,l$. The integrands are parametrized by the first-stage decision $x$ varying in $X$. Such functions do not belong to the tensor product Sobolev spaces described in Section 1 and, in general, are not of bounded variation in the sense of Hardy and Krause (see [40, Proposition 17]). Next we intend to compute the projections $P_k(f)$ for $k \in D$. Let $x \in X$ be fixed and $\xi_i \in \mathbb{R}$, $i = 1,\dots,d$, $i \ne k$, be given. We set $\xi^{-k} = (\xi_1,\dots,\xi_{k-1},\xi_{k+1},\dots,\xi_d)$ and $\xi^k_s = (\xi_1,\dots,\xi_{k-1}, s, \xi_{k+1},\dots,\xi_d)$. We assume (A1)-(A5) and have, according to Section 3, $(\xi^k_s, h) - Tx \in \mathrm{dom}\,\phi$ for every $s \in \mathbb{R}$, and by definition of the projection

$$(P_k f)(\xi^{-k}) = \int_{-\infty}^{\infty} f(\xi^k_s)\,\rho_k(s)\,ds = \sum_{j=1}^{l}\int_{\{s\,:\,(\xi^k_s, h) - Tx \in K_j\}} f(\xi_1,\dots,\xi_{k-1}, s, \xi_{k+1},\dots,\xi_d)\,\rho_k(s)\,ds.\qquad(28)$$

The one-dimensional affine subspace $\{(\xi^k_s, h) - Tx : s \in \mathbb{R}\}$ intersects a finite number of the polyhedral cones $K_j$. Hence, there exist $p = p(k) \in \mathbb{N}\cup\{0\}$, $s_i = s_i^k \in \mathbb{R}$, $i = 1,\dots,p$, and $j_i = j_i^k \in \{1,\dots,l\}$, $i = 1,\dots,p+1$, such that $s_i < s_{i+1}$ and

$$(\xi^k_s, h) - Tx \in K_{j_1} \Leftrightarrow s \in (-\infty, s_1],\qquad(\xi^k_s, h) - Tx \in K_{j_i} \Leftrightarrow s \in [s_{i-1}, s_i]\ (i = 2,\dots,p),\qquad(\xi^k_s, h) - Tx \in K_{j_{p+1}} \Leftrightarrow s \in [s_p, +\infty).$$
By setting $s_0 := -\infty$, $s_{p+1} := \infty$, we obtain the following explicit representation of $P_k f$:

$$(P_k f)(\xi^{-k}) = \sum_{i=1}^{p+1}\int_{s_{i-1}}^{s_i}\big\langle v^{j_i}, (\xi^k_s, h) - Tx\big\rangle\,\rho_k(s)\,ds,\qquad(29)$$

where the points $s_i$, $i = 1,\dots,p$, satisfy the equations

$$0 = \big\langle(\xi^k_{s_i}, h) - Tx,\ v^{j_{i+1}} - v^{j_i}\big\rangle = \big\langle(\xi^k_{s_i}, 0) + (0, h) - Tx,\ v^{j_{i+1}} - v^{j_i}\big\rangle = \sum_{j \ne k}\xi_j\big(v^{j_{i+1}}_j - v^{j_i}_j\big) + s_i\big(v^{j_{i+1}}_k - v^{j_i}_k\big) + \big\langle(0, h) - Tx,\ v^{j_{i+1}} - v^{j_i}\big\rangle$$

according to (15). By setting $w^i = v^{j_{i+1}} - v^{j_i}$ for $i=1,\dots,p$ and $z(x) = (0,h) - Tx$, this leads to the explicit formula
$$s_i = s_i(\xi^k, x) = -\frac{1}{w^i_k} \Big[ \sum_{\substack{j=1 \\ j \ne k}}^{d} w^i_j \xi_j + \langle z(x), w^i \rangle \Big] \qquad (i=1,\dots,p). \tag{30}$$
Hence, all $s_i$, $i=1,\dots,p$, are affine functions of the remaining components $\xi_j$, $j \ne k$.

The first step in our analysis consists in studying smoothness properties of the projection $P_k f$. We note that $f$ and $P_k f$ are finite convex functions on $\mathbb{R}^d$ and $\mathbb{R}^{d-1}$, respectively, and, hence, twice differentiable almost everywhere on their domains due to Alexandroff's theorem (see, for example, [10, Section 6.4]). Our analysis shows that the integration in (28) even improves the smoothness properties.

In the following, we consider a point $\xi^k_0 \in \mathbb{R}^{d-1}$ and an open ball $B_{\epsilon_0}(\xi^k_0)$ around $\xi^k_0$ with radius $\epsilon_0$. Assume that the ball $B_{\epsilon_0}(\xi^k_0)$ is small enough that the set of cones
$$\mathcal{K}_\epsilon(\xi^k_0) := \big\{ K_j : K_j \cap \{(\bar\xi^k_s,h) - Tx : s \in \mathbb{R}\} \ne \emptyset \text{ for some } \bar\xi^k \in B_\epsilon(\xi^k_0) \big\} \tag{31}$$
satisfies $\mathcal{K}_\epsilon(\xi^k_0) = \mathcal{K}_{\epsilon_0}(\xi^k_0)$ for $0 < \epsilon < \epsilon_0$. Thus, the relevant intersecting cones are fixed in $B_\epsilon(\xi^k_0)$. We also consider the set of intersecting cones at an arbitrary point $\bar\xi^k \in B_\epsilon(\xi^k_0)$,
$$\mathcal{K}(\bar\xi^k) := \big\{ K_j : K_j \cap \{(\bar\xi^k_s,h) - Tx : s \in \mathbb{R}\} \ne \emptyset \big\}. \tag{32}$$
Because any affine one-dimensional space $\{(\bar\xi^k_s,h) - Tx : s \in \mathbb{R}\}$ for some $\bar\xi^k \in B_\epsilon(\xi^k_0)$ is in fact a parallel translation of $\{(\xi^k_{0,s},h) - Tx : s \in \mathbb{R}\}$, $\epsilon_0$ can even be chosen small enough that $\mathcal{K}(\bar\xi^k) \subseteq \mathcal{K}_{\epsilon_0}(\xi^k_0)$ for every $\bar\xi^k \in B_{\epsilon_0}(\xi^k_0)$. Therefore we have
$$\mathcal{K}(\xi^k_0) = \mathcal{K}_{\epsilon_0}(\xi^k_0). \tag{33}$$
Moreover, since the cones $K_j \in \mathcal{K}_{\epsilon_0}(\xi^k_0)$ are convex, their intersection with the affine one-dimensional space $\{(\xi^k_{0,s},h) - Tx : s \in \mathbb{R}\}$ is either an interval or a single point. The latter can only occur if the intersection meets just the vertex or an edge (i.e., a face of dimension zero or one) of a polyhedral cone belonging to $\mathcal{K}(\xi^k_0)$. Hence, the subset of $\mathbb{R}^d$ that corresponds to such single points has Lebesgue measure $0$ in $\mathbb{R}^d$.
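Since the restriction of the integrand to a coordinate line is the maximum of finitely many affine functions of $s$, the breakpoints $s_i$ can also be located numerically. The following is a minimal sketch (not from the paper; `a[j]` and `b[j]` stand for the intercepts $\langle v^j, (\xi^k_0,h)-Tx\rangle$ and slopes $v^j_k$ of the affine pieces, and the grid bounds are arbitrary); adjacent active pieces are intersected exactly, in the spirit of the affine formula (30):

```python
import numpy as np

def breakpoints_of_max(a, b, lo=-50.0, hi=50.0, tol=1e-9):
    """Breakpoints s_1 < ... < s_p of s -> max_j (a_j + b_j * s),
    i.e., of the two-stage integrand restricted to one coordinate line.
    Sketch: find the active affine piece on a grid, then intersect
    each pair of adjacent active pieces exactly."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    grid = np.linspace(lo, hi, 20001)
    # index of the maximal affine piece at every grid point
    active = np.argmax(a[:, None] + b[:, None] * grid[None, :], axis=0)
    pts = []
    for i in np.flatnonzero(np.diff(active)):
        j0, j1 = active[i], active[i + 1]
        if abs(b[j1] - b[j0]) > tol:   # pieces not parallel
            # exact intersection of the two adjacent active pieces
            pts.append(float((a[j0] - a[j1]) / (b[j1] - b[j0])))
    return pts

# the function max{-2s - 1, s, 1} (Example 12 below with x = 0, xi_2 = 1)
pts = breakpoints_of_max([-1.0, 0.0, 1.0], [-2.0, 1.0, 0.0])  # -> [-1.0, 1.0]
```

The grid only serves to order the active pieces; the returned breakpoints themselves are exact intersections and, as (30) states, depend affinely on the remaining components.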
In case the intersection is an interval $I_{K_j}(\xi^k_0)$, we have due to (A5) that the interior of $I_{K_j}(\xi^k_0)$, denoted $I^\circ_{K_j}(\xi^k_0)$, contains only interior points of $K_j$. This is true because otherwise the interval $I_{K_j}(\xi^k_0)$ would have to lie in a facet of $K_j$, which would imply that some facet is parallel to one of the canonical basis elements $e_k$, $1 \le k \le d$, in contradiction to (A5). This implies that we can partition the affine one-dimensional space $\{(\xi^k_{0,s},h) - Tx : s \in \mathbb{R}\}$ by considering the intervals $(s_i, s_{i+1})$, $0 \le i \le p(\xi^k_0)$, with $s_0 = -\infty$ and $s_{p(\xi^k_0)+1} = +\infty$, such that $\{(\xi^k_{0,s},h) - Tx : s \in (s_i, s_{i+1})\} \subset K_{j_{i+1}}$.

Now we are ready to state our first result on smoothness properties of $P_k f$.

Theorem 7. Let $k \in D$ and $x \in X$. Assume (A1)–(A5) and let $f = f_x$ be the integrand (27) of the linear two-stage stochastic program (3). Then the $k$th projection $P_k f$ of $f$ is continuously differentiable on $\mathbb{R}^{d-1}$. Moreover, $P_k f$ is $s$-times continuously differentiable almost everywhere if the density $\rho_k$ belongs to $C^{s-2}(\mathbb{R})$ for some $s \in \mathbb{N}$, $s \ge 2$.

Proof. We consider the two possible cases for open balls around $\xi^k_0$:

P 1.) There exists $\epsilon_0 > 0$ such that $\mathcal{K}(\bar\xi^k) = \mathcal{K}(\xi^k_0)$ for all $\bar\xi^k \in B_{\epsilon_0}(\xi^k_0)$.

P 2.) For each $\epsilon > 0$ there exists $\bar\xi^k \in B_\epsilon(\xi^k_0)$ such that $\mathcal{K}(\bar\xi^k) \ne \mathcal{K}(\xi^k_0)$.

In case P 1.), the functions $s_i$ are differentiable in the entire neighborhood $B_{\epsilon_0}(\xi^k_0)$, because they admit a representation as affine functions. Thus, we obtain from (29) for any $l \in D$, $l \ne k$, that $P_k f$ is partially differentiable with respect to $\xi_l$ at $\xi^k$ and
$$\frac{\partial P_k f}{\partial \xi_l}(\xi^k) = \frac{\partial}{\partial \xi_l} \sum_{i=1}^{p+1} \int_{s_{i-1}}^{s_i} \langle v^{j_i}, (\xi^k_s,h)-Tx \rangle\, \rho_k(s)\,ds$$
$$= \sum_{i=1}^{p+1} \int_{s_{i-1}}^{s_i} v^{j_i}_l\, \rho_k(s)\,ds + \sum_{i=1}^{p} \langle v^{j_i}, (\xi^k_{s_i},h)-Tx \rangle\, \rho_k(s_i)\, \frac{\partial s_i}{\partial \xi_l} - \sum_{i=2}^{p+1} \langle v^{j_i}, (\xi^k_{s_{i-1}},h)-Tx \rangle\, \rho_k(s_{i-1})\, \frac{\partial s_{i-1}}{\partial \xi_l}$$
$$= \sum_{i=1}^{p+1} v^{j_i}_l \int_{s_{i-1}}^{s_i} \rho_k(s)\,ds = \sum_{i=1}^{p+1} v^{j_i}_l\, \big(\varphi_k(s_i) - \varphi_k(s_{i-1})\big),$$
where we used the identity $\langle v^{j_i}, (\xi^k_{s_i},h)-Tx \rangle = \langle v^{j_{i+1}}, (\xi^k_{s_i},h)-Tx \rangle$ for each $i=1,\dots,p$, and $\varphi_k$ denotes the marginal distribution function with density $\rho_k$. By reordering the latter sum we have
$$\frac{\partial P_k f}{\partial \xi_l}(\xi^k) = -\sum_{i=1}^{p} w^i_l\, \varphi_k(s_i) + v^{j_{p+1}}_l. \tag{34}$$
Hence, the behavior of all first order partial derivatives of $P_k f$ depends only on the $k$th marginal distribution function $\varphi_k$. The latter is again differentiable, and it follows for $r \in D$, $r \ne k$, that
$$\frac{\partial^2 P_k f}{\partial \xi_r \partial \xi_l}(\xi^k) = \sum_{i=1}^{p} \frac{w^i_l\, w^i_r}{w^i_k}\, \rho_k(s_i). \tag{35}$$
Hence, $P_k f$ is twice continuously differentiable on the neighborhood $B_{\epsilon_0}(\xi^k_0)$. More generally, if $\rho_k \in C^{s-2}(\mathbb{R})$ for some $s \in \mathbb{N}$, $s \ge 2$, then $P_k f$ is $s$-times continuously differentiable on $B_{\epsilon_0}(\xi^k_0)$.

In case P 2.), we use the identity (33) and consider all cones belonging to $\mathcal{K}(\xi^k_0)$. Let $K^i$, $i=1,\dots,p+1$, denote all such cones, which are normal cones to the vertices $v^{j_i}$ of D. Furthermore, let $s_i$, $i=1,\dots,p$, be nondecreasing and defined by $(\xi^k_{s_i},h)-Tx \in K^i \cap K^{i+1}$, and set $s_0 = -\infty$ and $s_{p+1} = +\infty$. We explicitly allow that $s_{i-1} = s_i$ holds for some $i \in \{1,\dots,p-1\}$. Then we have
$$P_k f(\xi^k_0) = \sum_{i=1}^{p+1} \int_{s_{i-1}}^{s_i} \langle v^{j_i}, (\xi^k_{0,s},h)-Tx \rangle\, \rho_k(s)\,ds,$$
where $p = p(\xi^k_0)$ and $s_i = s_i(\xi^k_0,x)$, $i=1,\dots,p$, is given by (30). Now, let $\bar\xi^k \in B_{\epsilon_0}(\xi^k_0)$.
Due to (33), $P_k f(\bar\xi^k)$ may be represented by a subset of the set $\mathcal{K}(\xi^k_0)$. Of course, $K^1$ and $K^{p+1}$ and all cones $K^i$ with $s_{i-1}(\xi^k_0,x) < s_i(\xi^k_0,x)$ also appear in the

representation of $P_k f(\bar\xi^k)$. Those cones $K^i$ with $s_{i-1}(\xi^k_0,x) = s_i(\xi^k_0,x)$ may either disappear or appear with $s_{i-1}(\bar\xi^k,x) < s_i(\bar\xi^k,x)$. If they disappear, we set $s_{i-1}(\bar\xi^k,x) = s_i(\bar\xi^k,x)$ and include them formally into the representation, which is of the form
$$P_k f(\bar\xi^k) = \sum_{i=1}^{p+1} \int_{s_{i-1}(\bar\xi^k,x)}^{s_i(\bar\xi^k,x)} \langle v^{j_i}, (\bar\xi^k_s,h)-Tx \rangle\, \rho_k(s)\,ds.$$
In a small ball around $\bar\xi^k$ this representation does not change. Hence, $P_k f$ is differentiable also in case P 2.), and the partial derivative is of the form
$$\frac{\partial P_k f}{\partial \xi_l}(\bar\xi^k) = \sum_{i=1}^{p+1} v^{j_i}_l \int_{s_{i-1}(\bar\xi^k,x)}^{s_i(\bar\xi^k,x)} \rho_k(s)\,ds \tag{36}$$
as in case P 1.) and, hence, continuous. The integrals with $s_{i-1}(\bar\xi^k,x) = s_i(\bar\xi^k,x)$ are again formally included in (36). The second order partial derivative $\frac{\partial^2 P_k f}{\partial \xi_r \partial \xi_l}(\bar\xi^k)$ for $r \in D$, $r \ne k$, is, however, not of the form (35) in general, since summands are missing that correspond to integrals with $s_{i-1}(\bar\xi^k,x) = s_i(\bar\xi^k,x)$. Moreover, note that the points $\xi^k_0 \in \mathbb{R}^{d-1}$ with the property $s_{i-1}(\xi^k_0,x) = s_i(\xi^k_0,x)$ and $s_{i-1}(\bar\xi^k,x) < s_i(\bar\xi^k,x)$ for some $\bar\xi^k \in B_{\epsilon_0}(\xi^k_0)$, $\bar\xi^k \ne \xi^k_0$, and $i \in \{1,\dots,p\}$, belong to the solution set of linear equations (see (30)) in $d-1$ variables of the form $s_{i-1}(\omega^k,x) = s_i(\omega^k,x)$, with $s_{i-1}$ and $s_i$ not parallel to each other, $i \in \{1,\dots,p\}$. Thus, those singular points $\xi^k_0 \in \mathbb{R}^{d-1}$ where the second order partial derivatives may not exist belong to a finite union of hyperplanes in $\mathbb{R}^{d-1}$, each of dimension at most $d-2$, and hence form a set of Lebesgue measure zero in $\mathbb{R}^{d-1}$. Hence, $P_k f$ is continuously differentiable on $\mathbb{R}^{d-1}$, but the mixed second order partial derivatives may exist only almost everywhere on $\mathbb{R}^{d-1}$. $\square$

Corollary 8. Let $\emptyset \ne u \subseteq D$ and $x \in X$. Assume (A1)–(A5). Then the projection $P_u f$ is continuously differentiable on $\mathbb{R}^{d-|u|}$ and twice continuously differentiable almost everywhere on $\mathbb{R}^{d-|u|}$.

Proof. If $|u| = 1$ the result follows from Theorem 7.
For $u = \{k,j\}$ with $k,j \in D$, $k \ne j$, we obtain from the Leibniz theorem [15, Theorem 1] for $l \notin u$ and $r \notin u$
$$D_l P_u f(\xi^u) := \frac{\partial}{\partial \xi_l} P_u f(\xi^u) = P_j \frac{\partial}{\partial \xi_l} P_k f(\xi^u), \qquad D_r D_l P_u f(\xi^u) := \frac{\partial^2}{\partial \xi_l \partial \xi_r} P_u f(\xi^u) = P_j \frac{\partial^2}{\partial \xi_l \partial \xi_r} P_k f(\xi^u)$$
if the derivatives exist, and since integration by $P_j$ preserves the smoothness (see [15, Theorem 2]), the claim is proved. Moreover, from the proof of Theorem 7 we obtain the explicit forms
$$D_l P_u f(\xi^u) = -\sum_{i=1}^{p} w^i_l \int_{\mathbb{R}} \varphi_k(s_i(\xi^k))\,\rho_j(\xi_j)\,d\xi_j + v^{j_{p+1}}_l \qquad \text{for all } \xi^u \in \mathbb{R}^{d-|u|}, \tag{37}$$
$$D_r D_l P_u f(\xi^u) = \sum_{i=1}^{p} \frac{w^i_l\, w^i_r}{w^i_k} \int_{\mathbb{R}} \rho_k(s_i(\xi^k))\,\rho_j(\xi_j)\,d\xi_j \qquad \text{a.e. in } \mathbb{R}^{d-|u|}. \tag{38}$$
If $u$ contains more than two elements, the integrals on the right-hand side become multiple integrals. In all cases, however, such an integral is a continuous function of the remaining variables $\xi_i$, $i \in D \setminus u$. This can be shown using Lebesgue's dominated convergence theorem, since $\varphi_k$ and $\rho_k$ are continuous and bounded on $\mathbb{R}$. $\square$

The following is the main result of this section.

Theorem 9. Assume (A1)–(A5). Then all ANOVA terms of $f$, except the one of highest order, are first order continuously differentiable on $\mathbb{R}^d$; all their second order partial derivatives exist and are continuous except on a set of Lebesgue measure zero, and are quadratically integrable with respect to the density $\rho$. In particular, the first and second order ANOVA terms of $f$ belong to the mixed Sobolev space $W^{(1,\dots,1)}_{2,\rho,\mathrm{mix}}(\mathbb{R}^d)$.

Proof. According to (20) the ANOVA terms of $f$ are defined by
$$f_u = P_{D \setminus u}(f) + \sum_{v \subsetneq u} (-1)^{|u|-|v|}\, P_{D \setminus v}(f)$$
for all nonempty subsets $u$ of $D$. Hence, all ANOVA terms $f_u$ with $u \ne D$ are continuously differentiable on $\mathbb{R}^d$, and their second order partial derivatives exist and are continuous almost everywhere on $\mathbb{R}^d$. The non-vanishing first order partial derivatives of the second order ANOVA terms are of the form
$$D_l f_{\{l,r\}}(\xi_l,\xi_r) = D_l P_{D\setminus\{l,r\}} f(\xi_l,\xi_r) - D_l P_{D\setminus\{l\}} f(\xi_l) = -\sum_{i=1}^{p} w^i_l \int_{\mathbb{R}^{d-2}} \varphi_k(s_i(\xi^k)) \prod_{\substack{m \in D\setminus\{l,r\} \\ m \ne k}} \rho_m(\xi_m)\, d\xi^{\{l,r\}} - D_l P_{D\setminus\{l\}} f(\xi_l)$$
for all $l,r \in D$ and some $k \in D$. Since $\varphi_k$ is Lipschitz continuous, the functions $D_l f_{\{l,r\}}$ and $D_r f_{\{l,r\}}$ are Lipschitz continuous with respect to each of the two variables $\xi_l$ and $\xi_r$ when the other variable is fixed almost everywhere. Hence, $D_l f_{\{l,r\}}$ and $D_r f_{\{l,r\}}$ are partially differentiable with respect to $\xi_r$ and $\xi_l$, respectively, in the sense of Sobolev (see, for example, [10, Section 4.2.3]). Furthermore, the second order partial derivative is almost everywhere bounded (see also (35)) and, due to (A3), quadratically integrable with respect to $\rho$. $\square$

Remark 10. The second order ANOVA approximation of $f$, i.e.,
$$f^{(2)} := \sum_{\substack{u \subseteq D \\ 1 \le |u| \le 2}} f_u \tag{39}$$
belongs to the tensor product Sobolev space $W^{(1,\dots,1)}_{2,\mathrm{mix}}(\mathbb{R}^d)$. Hence, if the effective superposition dimension is at most $2$, $f^{(2)}$ is a good approximation of $f$ due to (26), and a favorable behavior of randomly shifted lattice rules may be expected.
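The ANOVA decomposition used throughout can be checked numerically in a toy setting. The following sketch (not from the paper; uniform marginals on $[0,1]^2$ instead of $\rho$, trapezoidal quadrature, and a simple piecewise linear integrand are assumed for illustration) computes $f_\emptyset$, the first order terms, and the highest order interaction term of a function of two variables:

```python
import numpy as np

def anova_terms_2d(f, n=2001):
    """ANOVA terms of f on [0,1]^2 with uniform weights:
    f = f0 + f1(x1) + f2(x2) + f12(x1, x2), where the projection P_k
    integrates out the k-th variable (trapezoidal-rule sketch)."""
    t = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1)); w[0] *= 0.5; w[-1] *= 0.5   # trapezoid weights
    X1, X2 = np.meshgrid(t, t, indexing="ij")
    F = f(X1, X2)
    P2 = F @ w                    # integrate out x2 -> function of x1
    P1 = w @ F                    # integrate out x1 -> function of x2
    f0 = w @ P2                   # overall mean
    f1, f2 = P2 - f0, P1 - f0     # first order terms (mean zero)
    f12 = F - f0 - f1[:, None] - f2[None, :]   # highest order term
    return f0, f1, f2, f12

# piecewise linear integrand max(x1, x2): mean 2/3, f1(x1) = x1^2/2 - 1/6
f0, f1, f2, f12 = anova_terms_2d(np.maximum)
```

Here the first order terms come out with mean zero, matching the orthogonality of ANOVA terms, and for $\max(x_1,x_2)$ the interaction term $f_{12}$ is the only nonsmooth part, in line with Theorem 9's statement that the highest order term absorbs the kinks.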
The following two examples provide some more insight into the smoothness of the first order projections. The first example verifies that some projection fails to be continuously differentiable if (A5) is violated, while the second shows that conditions (A1)–(A5) do not imply second order partial differentiability of all projections.

Example 11. Let $m = 3$, $d = 2$, $\Xi = \mathbb{R}^2$, and let $P$ denote a probability distribution with independent marginal densities $\rho_i$, $i=1,2$, whose means are w.l.o.g. equal to $0$. We assume that (A3) is satisfied for $P$. Let the vector $q$ and the matrix $W$, with the rows of $W^\top$ given by $(1,1)$, $(-1,1)$, $(0,-1)$ and $q = (1,1,0)^\top$,

be given. Then (A1) and (A2) are satisfied and the dual feasible set D (Figure 1) is
$$D = \{z \in \mathbb{R}^2 : W^\top z \le q\} = \{z \in \mathbb{R}^2 : z_1+z_2 \le 1,\ -z_1+z_2 \le 1,\ -z_2 \le 0\},$$
i.e., D is a triangle with the three vertices
$$v^1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad v^2 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad v^3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Hence, the second components of the two adjacent vertices $v^1$ and $v^2$ coincide.

[Figure 1: Illustration of D, its vertices $v^j$ and the normal cones $K_j$ to its vertices.]

According to (13) the normal cones $K_j$ to D at $v^j$, $j=1,2,3$, are
$$K_1 = \{z \in \mathbb{R}^2 : z_1 \ge 0,\ z_2 \le z_1\}, \qquad K_2 = \{z \in \mathbb{R}^2 : z_1 \le 0,\ z_2 \le -z_1\}, \qquad K_3 = \{z \in \mathbb{R}^2 : z_2 \ge z_1,\ z_2 \ge -z_1\}.$$
The function $\phi$ (see (11)) and the integrand are of the form
$$\phi(t) = \max_{i=1,2,3} \langle v^i, t \rangle = \max\{t_1,\ -t_1,\ t_2\} = \max\{|t_1|,\ t_2\}, \qquad f(x,\xi) = \max\{|\xi_1 - [Tx]_1|,\ \xi_2 - [Tx]_2\}.$$
The ANOVA projection $P_1 f$ is defined by
$$(P_1 f)(\xi_2) = \int_{-\infty}^{+\infty} \max\{|\xi_1 - [Tx]_1|,\ \xi_2 - [Tx]_2\}\,\rho_1(\xi_1)\,d\xi_1 \qquad (\xi_2 \in \mathbb{R}).$$
For $\xi_2 - [Tx]_2 \le 0$ one obtains
$$(P_1 f)(\xi_2) = \int_{-\infty}^{+\infty} |\xi_1 - [Tx]_1|\,\rho_1(\xi_1)\,d\xi_1 = \int_{-\infty}^{[Tx]_1} ([Tx]_1 - \xi_1)\,\rho_1(\xi_1)\,d\xi_1 + \int_{[Tx]_1}^{+\infty} (\xi_1 - [Tx]_1)\,\rho_1(\xi_1)\,d\xi_1,$$
and in case $\xi_2 - [Tx]_2 > 0$,
$$(P_1 f)(\xi_2) = \int_{-\infty}^{[Tx]_1 - (\xi_2 - [Tx]_2)} ([Tx]_1 - \xi_1)\,\rho_1(\xi_1)\,d\xi_1 + (\xi_2 - [Tx]_2) \int_{[Tx]_1 - (\xi_2 - [Tx]_2)}^{[Tx]_1 + (\xi_2 - [Tx]_2)} \rho_1(\xi_1)\,d\xi_1 + \int_{[Tx]_1 + (\xi_2 - [Tx]_2)}^{+\infty} (\xi_1 - [Tx]_1)\,\rho_1(\xi_1)\,d\xi_1.$$

Hence, $P_1 f$ belongs to $C^1(\mathbb{R})$ for all $x \in X$ if $\rho_1$ is continuous. When calculating the ANOVA projection $P_2 f$, notice that assumption (A5) is violated. We obtain
$$(P_2 f)(\xi_1) = |\xi_1 - [Tx]_1| \int_{-\infty}^{[Tx]_2 + |\xi_1 - [Tx]_1|} \rho_2(\xi_2)\,d\xi_2 + \int_{[Tx]_2 + |\xi_1 - [Tx]_1|}^{+\infty} (\xi_2 - [Tx]_2)\,\rho_2(\xi_2)\,d\xi_2,$$
and $P_2 f$ does not belong to $C^1(\mathbb{R})$ for all $x \in X$.

Example 12. Let $m = 3$, $d = 2$, and let $P$ denote a two-dimensional probability distribution with independent continuous marginal densities $\rho_i$, $i=1,2$, whose means are w.l.o.g. equal to $0$. Again we assume that (A3) is satisfied for $P$. Let the vector $q$ and the matrix $W$, with the rows of $W^\top$ given by $(1,1)$, $(-1,1)$, $(1,-3)$ and $q = (1,1,1)^\top$, be given. Then (A1) and (A2) are satisfied and the dual feasible set D (Figure 2) is
$$D = \{z \in \mathbb{R}^2 : W^\top z \le q\} = \{z \in \mathbb{R}^2 : z_1+z_2 \le 1,\ -z_1+z_2 \le 1,\ z_1 - 3z_2 \le 1\},$$
i.e., D is also a triangle with the three vertices
$$v^1 = \begin{pmatrix} -2 \\ -1 \end{pmatrix}, \qquad v^2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad v^3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Hence, both the first and the second components of the vertices $v^j$, $j=1,2,3$, are pairwise distinct. This means that (A4) and (A5) are satisfied. The normal cones $K_j$ to D at $v^j$, $j=1,2,3$, are
$$K_1 = \{z \in \mathbb{R}^2 : z_2 \le -z_1,\ z_2 \le -3z_1\}, \qquad K_2 = \{z \in \mathbb{R}^2 : z_2 \ge -3z_1,\ z_2 \le z_1\}, \qquad K_3 = \{z \in \mathbb{R}^2 : z_2 \ge z_1,\ z_2 \ge -z_1\}.$$

[Figure 2: Illustration of D, its vertices $v^j$ and the normal cones $K_j$ to its vertices.]

The function $\phi$ in (11) and the integrand are of the form
$$\phi(t) = \max_{i=1,2,3} \langle v^i, t \rangle = \max\{-2t_1 - t_2,\ t_1,\ t_2\}, \qquad f(x,\xi) = \phi(\xi_1 - [Tx]_1,\ \xi_2 - [Tx]_2).$$
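Returning briefly to Example 11, the loss of smoothness of $P_2 f$ is easy to observe numerically. A sketch (not from the paper; standard normal marginals and $x = 0$ are assumed, so $P_2 f(\xi_1) = \mathbb{E}\max\{|\xi_1|, \xi_2\}$, and the quadrature parameters are arbitrary) comparing the one-sided slopes at $\xi_1 = 0$:

```python
import numpy as np

def P2f(x1, n=40001, L=8.0):
    """P_2 f for Example 11 with x = 0 and a standard normal xi_2:
    integrate max(|x1|, xi_2) against the normal density (trapezoid rule)."""
    s = np.linspace(-L, L, n)
    rho = np.exp(-s**2 / 2.0) / np.sqrt(2.0 * np.pi)
    vals = np.maximum(abs(x1), s) * rho
    dx = s[1] - s[0]
    return dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

h = 1e-3
slope_right = (P2f(h) - P2f(0.0)) / h    # close to +1/2
slope_left  = (P2f(0.0) - P2f(-h)) / h   # close to -1/2
```

The two one-sided slopes disagree (for $\xi_1 > 0$ the derivative is $\Phi(\xi_1) \to 1/2$, and by symmetry the left slope tends to $-1/2$), so $P_2 f$ has a kink at $0$: exactly the failure of $C^1$-smoothness caused by the violated condition (A5).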

The ANOVA projection $P_1 f$ is given by
$$(P_1 f)(\xi_2) = \int_{-\infty}^{+\infty} \max\{-2(s-[Tx]_1) - \xi_2 + [Tx]_2,\ s - [Tx]_1,\ \xi_2 - [Tx]_2\}\,\rho_1(s)\,ds$$
for every $\xi_2 \in \mathbb{R}$. For simplicity let $x = 0$. First let $\xi_2 > 0$. Then
$$(P_1 f)(\xi_2) = \int_{-\infty}^{+\infty} \max\{-2s - \xi_2,\ s,\ \xi_2\}\,\rho_1(s)\,ds = \int_{-\infty}^{s_1} (-2s - \xi_2)\,\rho_1(s)\,ds + \xi_2 \int_{s_1}^{s_2} \rho_1(s)\,ds + \int_{s_2}^{+\infty} s\,\rho_1(s)\,ds,$$
where $s_1 = s_1(\xi_2) = -\xi_2$ and $s_2 = s_2(\xi_2) = \xi_2$. Now we compute the partial derivatives for $\xi_2 > 0$ (the boundary terms cancel since the integrand is continuous at $s_1$ and $s_2$) and obtain
$$\frac{\partial P_1 f}{\partial \xi_2}(\xi_2) = -\int_{-\infty}^{-\xi_2} \rho_1(s)\,ds + \int_{-\xi_2}^{\xi_2} \rho_1(s)\,ds = \varphi_1(\xi_2) - 2\varphi_1(-\xi_2), \qquad \frac{\partial P_1 f}{\partial \xi_2}(0+) = -\varphi_1(0),$$
$$\frac{\partial^2 P_1 f}{\partial \xi_2^2}(\xi_2) = \rho_1(\xi_2) + 2\rho_1(-\xi_2).$$
Thus, for $\xi_2 > 0$, $P_1 f$ is $s$-times continuously differentiable if $\rho_1 \in C^{s-2}(\mathbb{R})$ for any $s \in \mathbb{N}$, $s \ge 2$. Now let $\xi_2 < 0$. Then we obtain, with $s_1(\xi_2) = -\frac{\xi_2}{3}$,
$$(P_1 f)(\xi_2) = \int_{-\infty}^{+\infty} \max\{-2s - \xi_2,\ s,\ \xi_2\}\,\rho_1(s)\,ds = \int_{-\infty}^{s_1} (-2s - \xi_2)\,\rho_1(s)\,ds + \int_{s_1}^{+\infty} s\,\rho_1(s)\,ds,$$
$$\frac{\partial P_1 f}{\partial \xi_2}(\xi_2) = -\varphi_1\Big(-\frac{\xi_2}{3}\Big), \qquad \frac{\partial P_1 f}{\partial \xi_2}(0-) = -\varphi_1(0), \qquad \frac{\partial^2 P_1 f}{\partial \xi_2^2}(\xi_2) = \frac{1}{3}\,\rho_1\Big(-\frac{\xi_2}{3}\Big).$$
Thus, for $\xi_2 < 0$, $P_1 f$ is again $s$-times continuously differentiable if $\rho_1 \in C^{s-2}(\mathbb{R})$ for any $s \in \mathbb{N}$, $s \ge 2$. Hence, $P_1 f$ belongs to $C^1(\mathbb{R})$ (the one-sided first derivatives at $\xi_2 = 0$ coincide), but its second derivative jumps from $\frac{1}{3}\rho_1(0)$ to $\rho_1(0) + 2\rho_1(0)$ at $\xi_2 = 0$ and is therefore discontinuous there whenever $\rho_1(0) > 0$. The same holds for $P_2 f$.

Remark 13 (error estimate). If the assumptions of Theorem 9 are satisfied and all marginal densities $\rho_j$, $j \in D$, are positive and continuously differentiable, all ANOVA terms $g_u$, $|u| = 1,2$, of $g$ given by (23) are continuously differentiable on $\mathbb{R}^d$ and the second order partial derivatives exist in the sense of Sobolev. However, as observed already in [53], the partial derivatives are not quadratically integrable and, hence, $g_u$ does

not belong to the tensor product Sobolev space (5). Meanwhile, a theory was developed in [26] in which a suitable weight function is incorporated into the integrals defining the inner product (6) and into the kernel to remedy this issue. In this way, the same convergence rate (8) is obtained in [26] for a number of probability distributions (including the normal distribution). In order not to complicate the argument with technical details, we assume here for simplicity that the ANOVA terms $g_u$, $|u| = 1,2$, belong to (5). Then the QMC quadrature error may be estimated as follows:
$$\Big| \int_{\mathbb{R}^d} f(\xi)\,\rho(\xi)\,d\xi - n^{-1}\sum_{j=1}^{n} f(\xi^j) \Big| = \Big| \int_{[0,1]^d} g(x)\,dx - n^{-1}\sum_{j=1}^{n} g(x^j) \Big| \le \sum_{0 < |u| \le d} \Big| \int_{[0,1]^{|u|}} g_u(x^u)\,dx^u - n^{-1}\sum_{j=1}^{n} g_u(x^j) \Big|$$
$$\le \sum_{|u|=1}^{2} \mathrm{Disc}_{n,u}(x^1,\dots,x^n)\,\|g_u\|_\gamma \tag{40}$$
$$\quad + \sum_{|u|=3}^{d} \Big| \int_{[0,1]^d} g_u(x)\,dx - n^{-1}\sum_{j=1}^{n} g_u(x^j) \Big|, \tag{41}$$
where $x^j_i = \varphi_i(\xi^j_i) \in (0,1)$, $j=1,\dots,n$, $i=1,\dots,d$, are the QMC points and $\mathrm{Disc}_{n,u}$ is the weighted $L_2$-discrepancy
$$\mathrm{Disc}^2_{n,u}(x^1,\dots,x^n) = \gamma_u \int_{[0,1]^{|u|}} \mathrm{disc}^2_u(x^u)\,dx^u,$$
where the discrepancy $\mathrm{disc}_u$ is given by
$$\mathrm{disc}_u(x^u) = \prod_{i \in u} x_i - n^{-1}\,\big|\{j \in \{1,\dots,n\} : x^j \in [0, x^u)\}\big|,$$
and $\|g_u\|_\gamma$ is the weighted norm of $g_u$ given by (6) in the weighted tensor product Sobolev space (5). Recalling the arguments in the introduction, one may conclude that all terms in (40) converge with the optimal rate (8), while all terms in (41) also converge to $0$ due to Proinov's convergence result [47] (as the $g_u$ are continuous). In addition, the sum (41) can be further estimated by
$$\sum_{|u|=3}^{d} \Big( \int_{[0,1]^d} g^2_u(x)\,dx + n^{-1}\sum_{j=1}^{n} g^2_u(x^j) \Big)^{\frac12} = \sum_{|u|=3}^{d} \Big( \|f_u\|^2_{L_2} + n^{-1}\sum_{j=1}^{n} f^2_u(\xi^j) \Big)^{\frac12}. \tag{42}$$
Since (26) implies $\sum_{|u|=3}^{d} \|f_u\|^2_{L_2} \le \varepsilon\,\sigma^2(f)$ if $d_S(\varepsilon) \le 2$, and the second term on the right-hand side of (42) represents a QMC approximation of the first term, we may conclude that the term in (41) is of the form $O(\sqrt{\varepsilon})$. Hence, we obtain the estimate
$$\Big| \int_{\mathbb{R}^d} f(\xi)\,\rho(\xi)\,d\xi - n^{-1}\sum_{j=1}^{n} f(\xi^j) \Big| \le C(\delta)\,n^{-1+\delta} + O(\sqrt{\varepsilon}) \tag{43}$$
if the condition $d_S(\varepsilon) \le 2$ is satisfied.
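The randomly shifted lattice rules behind (40)–(43) are straightforward to implement once a generating vector is available. A minimal sketch (not from the paper; the generating vector `z` below is an arbitrary placeholder rather than a CBC-optimized one — random shifting keeps the estimator unbiased for any `z`):

```python
import numpy as np

def shifted_lattice_rule(g, n, z, n_shifts=8, rng=None):
    """Randomly shifted rank-1 lattice rule for int_{[0,1]^d} g(x) dx:
    points x_j = frac(j*z/n + Delta) with independent uniform shifts Delta.
    Returns the mean over shifts and its standard error."""
    rng = np.random.default_rng(rng)
    z = np.asarray(z)
    base = (np.arange(n)[:, None] * z[None, :] / n) % 1.0   # the lattice
    estimates = []
    for _ in range(n_shifts):
        pts = (base + rng.random(z.size)) % 1.0             # shifted copy
        estimates.append(g(pts).mean())
    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_shifts)

# integral of x1 + x2 + x3 + x4 over [0,1]^4 equals 2
mean, err = shifted_lattice_rule(lambda x: x.sum(axis=1), 1024,
                                 z=np.array([1, 433, 229, 615]), rng=0)
```

Averaging over the independent shifts also yields the practical error estimate (the reported standard error) used to observe the convergence rates in the numerical experiments.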
The latter may be achieved by applying dimension reduction techniques (see Section 7).

Moreover, when recalling the results in [68], one may hope that the convergence rate for the terms in (40) is even better. Finally, we note that the constants involved in the estimate (43) may be chosen uniformly with respect to $x \in X$. Together with the perturbation estimates (17) and (18) in Section 3 one hence obtains
$$|v(P) - v(P_n)| \le \hat{C}(\delta)\,n^{-1+\delta} + O(\sqrt{\varepsilon}), \qquad S(P_n) \subseteq S(P) + \psi^{-1}_P\big(\hat{C}(\delta)\,n^{-1+\delta} + O(\sqrt{\varepsilon})\big)$$
if $d_S(\varepsilon) \le 2$. Here, $P_n$ is the discrete probability measure representing the QMC method, i.e., $P_n = n^{-1}\sum_{j=1}^{n} \delta_{\xi^j}$, where $\delta_\xi$ denotes the Dirac measure placing unit mass at $\xi$.

6 Orthogonal transformations and the Gaussian case

We consider the stochastic program (3) with $\Phi(x,\xi) = \phi(h(\xi) - Tx)$ and the infimal function (11) of the second stage problem given by (14). Assume that (A1)–(A3) are satisfied. Further we assume that $\xi$ is of the form $\xi = Q\zeta$ with some orthogonal $(d,d)$ matrix $Q$ and with $\zeta$ satisfying (A4). Then the relevant integrand is of the form
$$f(\xi) = \max_{j=1,\dots,\ell} \langle v^j, (Q\zeta, h) - Tx \rangle = \max_{j=1,\dots,\ell} \langle \hat{Q}^\top v^j, (\zeta, h) - \hat{Q}^\top Tx \rangle,$$
where the $(r,r)$ matrix $\hat{Q}$ is defined as in (16) with $B$ replaced by $Q$. Hence, the results of Section 5 apply if the vertices $\hat{Q}^\top v^j$, $j=1,\dots,\ell$, of the linearly transformed dual feasible set $\hat{Q}^\top D$ satisfy the corresponding assumption (A5). The set $\hat{Q}^\top D$ may be represented in the form
$$\hat{Q}^\top D = \{\hat{Q}^\top z : W^\top z \le q\} = \{z \in \mathbb{R}^r : (\hat{Q}^\top W)^\top z \le q\}.$$
The geometric condition (A5) is violated only if some facet of $\hat{Q}^\top D$ is parallel to one of the first $d$ coordinate axes. To study the set of orthogonal $(d,d)$ matrices $Q$ for which the latter property holds, we consider the set $\mathcal{M}$ of all nonsingular $(d,d)$ matrices as elements of $\mathbb{R}^{d^2}$ and the subset $\mathcal{M}_o$ of all orthogonal $(d,d)$ matrices. We recall that $\mathcal{M}_o$ is a compact subset of $\mathcal{M}$ of dimension $l(d) = \frac{1}{2}d(d-1)$. Hence, $\mathcal{M}_o$ can be considered as a subset of $\mathbb{R}^{l(d)}$, equipped with the Lebesgue measure $\lambda_{l(d)}$. The $(d,d)$ matrices $Q$ cause rotations of $\mathbb{R}^d$ and, thus, changes of the first $d$ variables of the elements of D.
If all $d$ variables change, i.e., if $Q$ causes a full-dimensional rotation, then there exist only countably many matrices $Q$ such that (A5) is violated for $\hat{Q}^\top D$. All orthogonal matrices $Q$ for which fewer than $d$ variables change form a subset of $\mathcal{M}_o$ of dimension smaller than $l(d)$. In both situations the set of all orthogonal matrices $Q \in \mathcal{M}_o$ such that (A5) is violated for $\hat{Q}^\top D$ has Lebesgue measure $\lambda_{l(d)}$ zero. Hence, we may say that (A5) is satisfied for almost every orthogonal matrix $Q$ if $\xi$ is given in the form $\xi = Q\zeta$ with $\zeta$ satisfying (A4). We note that this argument applies to the principal component analysis (PCA) decomposition of a normal random vector with nonsingular covariance matrix (see Section 7).

Corollary 14. Let $x \in X$ be fixed and assume (A1), (A2) and that $\xi$ is normal with nonsingular covariance matrix $\Sigma$. For almost every covariance matrix the ANOVA approximation $f^{(2)}$ of $f$ given by (39) belongs to the mixed Sobolev space $W^{(1,\dots,1)}_{2,\rho,\mathrm{mix}}(\mathbb{R}^d)$.
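The PCA decomposition referred to above is easy to state concretely. A sketch (numpy-based; any factorization $\Sigma = AA^\top$ induces $\xi = A\zeta$ with independent standard normal $\zeta$ — PCA is the choice that orders the columns by decreasing explained variance, which the dimension reduction of Section 7 exploits):

```python
import numpy as np

def pca_factor(Sigma):
    """PCA factor A with Sigma = A @ A.T: the columns are eigenvectors
    scaled by sqrt(eigenvalue), ordered by decreasing eigenvalue, so
    xi = A @ zeta concentrates variance in the leading coordinates."""
    w, Q = np.linalg.eigh(Sigma)          # eigh returns ascending eigenvalues
    order = np.argsort(w)[::-1]           # reorder to descending
    w, Q = w[order], Q[:, order]
    return Q * np.sqrt(w)                 # A = Q @ diag(sqrt(w))

Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
A = pca_factor(Sigma)   # leading column carries variance 3, second column 1
```

Since $Q$ is orthogonal, the argument above applies: for almost every covariance matrix the rotated dual feasible set satisfies (A5), which is the content of Corollary 14.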


More information

1 Continuity Classes C m (Ω)

1 Continuity Classes C m (Ω) 0.1 Norms 0.1 Norms A norm on a linear space X is a function : X R with the properties: Positive Definite: x 0 x X (nonnegative) x = 0 x = 0 (strictly positive) λx = λ x x X, λ C(homogeneous) x + y x +

More information

Chapter 8 Integral Operators

Chapter 8 Integral Operators Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,

More information

Tutorial on quasi-monte Carlo methods

Tutorial on quasi-monte Carlo methods Tutorial on quasi-monte Carlo methods Josef Dick School of Mathematics and Statistics, UNSW, Sydney, Australia josef.dick@unsw.edu.au Comparison: MCMC, MC, QMC Roughly speaking: Markov chain Monte Carlo

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Scenario generation Contents

Scenario generation Contents Scenario generation W. Römisch Humboldt-University Berlin Department of Mathematics www.math.hu-berlin.de/~romisch Page 1 of 39 Tutorial, ICSP12, Halifax, August 14, 2010 (1) Aproximation issues (2) Generation

More information

Wiener Measure and Brownian Motion

Wiener Measure and Brownian Motion Chapter 16 Wiener Measure and Brownian Motion Diffusion of particles is a product of their apparently random motion. The density u(t, x) of diffusing particles satisfies the diffusion equation (16.1) u

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Math The Laplacian. 1 Green s Identities, Fundamental Solution

Math The Laplacian. 1 Green s Identities, Fundamental Solution Math. 209 The Laplacian Green s Identities, Fundamental Solution Let be a bounded open set in R n, n 2, with smooth boundary. The fact that the boundary is smooth means that at each point x the external

More information

Singular Integrals. 1 Calderon-Zygmund decomposition

Singular Integrals. 1 Calderon-Zygmund decomposition Singular Integrals Analysis III Calderon-Zygmund decomposition Let f be an integrable function f dx 0, f = g + b with g Cα almost everywhere, with b

More information

THE RADON NIKODYM PROPERTY AND LIPSCHITZ EMBEDDINGS

THE RADON NIKODYM PROPERTY AND LIPSCHITZ EMBEDDINGS THE RADON NIKODYM PROPERTY AND LIPSCHITZ EMBEDDINGS Motivation The idea here is simple. Suppose we have a Lipschitz homeomorphism f : X Y where X and Y are Banach spaces, namely c 1 x y f (x) f (y) c 2

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Unbounded Convex Semialgebraic Sets as Spectrahedral Shadows

Unbounded Convex Semialgebraic Sets as Spectrahedral Shadows Unbounded Convex Semialgebraic Sets as Spectrahedral Shadows Shaowei Lin 9 Dec 2010 Abstract Recently, Helton and Nie [3] showed that a compact convex semialgebraic set S is a spectrahedral shadow if the

More information

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space

PROBLEMS. (b) (Polarization Identity) Show that in any inner product space 1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide

Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide aliprantis.tex May 10, 2011 Aliprantis, Border: Infinite-dimensional Analysis A Hitchhiker s Guide Notes from [AB2]. 1 Odds and Ends 2 Topology 2.1 Topological spaces Example. (2.2) A semimetric = triangle

More information

Functional Analysis I

Functional Analysis I Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

Lipschitz and differentiability properties of quasi-concave and singular normal distribution functions

Lipschitz and differentiability properties of quasi-concave and singular normal distribution functions Ann Oper Res (2010) 177: 115 125 DOI 10.1007/s10479-009-0598-0 Lipschitz and differentiability properties of quasi-concave and singular normal distribution functions René Henrion Werner Römisch Published

More information

MAT-INF4110/MAT-INF9110 Mathematical optimization

MAT-INF4110/MAT-INF9110 Mathematical optimization MAT-INF4110/MAT-INF9110 Mathematical optimization Geir Dahl August 20, 2013 Convexity Part IV Chapter 4 Representation of convex sets different representations of convex sets, boundary polyhedra and polytopes:

More information

Optimization Problems with Probabilistic Constraints

Optimization Problems with Probabilistic Constraints Optimization Problems with Probabilistic Constraints R. Henrion Weierstrass Institute Berlin 10 th International Conference on Stochastic Programming University of Arizona, Tucson Recommended Reading A.

More information

Conditioning of linear-quadratic two-stage stochastic programming problems

Conditioning of linear-quadratic two-stage stochastic programming problems Conditioning of linear-quadratic two-stage stochastic programming problems W. Römisch Humboldt-University Berlin Institute of Mathematics http://www.math.hu-berlin.de/~romisch (K. Emich, R. Henrion) (WIAS

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

Real Analysis Prelim Questions Day 1 August 27, 2013

Real Analysis Prelim Questions Day 1 August 27, 2013 Real Analysis Prelim Questions Day 1 August 27, 2013 are 5 questions. TIME LIMIT: 3 hours Instructions: Measure and measurable refer to Lebesgue measure µ n on R n, and M(R n ) is the collection of measurable

More information

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first Math 632/6321: Theory of Functions of a Real Variable Sample Preinary Exam Questions 1. Let (, M, µ) be a measure space. (a) Prove that if µ() < and if 1 p < q

More information

Extremal Solutions of Differential Inclusions via Baire Category: a Dual Approach

Extremal Solutions of Differential Inclusions via Baire Category: a Dual Approach Extremal Solutions of Differential Inclusions via Baire Category: a Dual Approach Alberto Bressan Department of Mathematics, Penn State University University Park, Pa 1682, USA e-mail: bressan@mathpsuedu

More information

Scientific Computing WS 2018/2019. Lecture 15. Jürgen Fuhrmann Lecture 15 Slide 1

Scientific Computing WS 2018/2019. Lecture 15. Jürgen Fuhrmann Lecture 15 Slide 1 Scientific Computing WS 2018/2019 Lecture 15 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 15 Slide 1 Lecture 15 Slide 2 Problems with strong formulation Writing the PDE with divergence and gradient

More information

Cambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information

Cambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information Introduction Consider a linear system y = Φx where Φ can be taken as an m n matrix acting on Euclidean space or more generally, a linear operator on a Hilbert space. We call the vector x a signal or input,

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

Math 341: Convex Geometry. Xi Chen

Math 341: Convex Geometry. Xi Chen Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry

More information

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent Chapter 5 ddddd dddddd dddddddd ddddddd dddddddd ddddddd Hilbert Space The Euclidean norm is special among all norms defined in R n for being induced by the Euclidean inner product (the dot product). A

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Behaviour of Lipschitz functions on negligible sets. Non-differentiability in R. Outline

Behaviour of Lipschitz functions on negligible sets. Non-differentiability in R. Outline Behaviour of Lipschitz functions on negligible sets G. Alberti 1 M. Csörnyei 2 D. Preiss 3 1 Università di Pisa 2 University College London 3 University of Warwick Lars Ahlfors Centennial Celebration Helsinki,

More information

SYMPLECTIC GEOMETRY: LECTURE 5

SYMPLECTIC GEOMETRY: LECTURE 5 SYMPLECTIC GEOMETRY: LECTURE 5 LIAT KESSLER Let (M, ω) be a connected compact symplectic manifold, T a torus, T M M a Hamiltonian action of T on M, and Φ: M t the assoaciated moment map. Theorem 0.1 (The

More information

BOUNDARY VALUE PROBLEMS ON A HALF SIERPINSKI GASKET

BOUNDARY VALUE PROBLEMS ON A HALF SIERPINSKI GASKET BOUNDARY VALUE PROBLEMS ON A HALF SIERPINSKI GASKET WEILIN LI AND ROBERT S. STRICHARTZ Abstract. We study boundary value problems for the Laplacian on a domain Ω consisting of the left half of the Sierpinski

More information

Classification of root systems

Classification of root systems Classification of root systems September 8, 2017 1 Introduction These notes are an approximate outline of some of the material to be covered on Thursday, April 9; Tuesday, April 14; and Thursday, April

More information

COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE

COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE MICHAEL LAUZON AND SERGEI TREIL Abstract. In this paper we find a necessary and sufficient condition for two closed subspaces, X and Y, of a Hilbert

More information

Poisson random measure: motivation

Poisson random measure: motivation : motivation The Lévy measure provides the expected number of jumps by time unit, i.e. in a time interval of the form: [t, t + 1], and of a certain size Example: ν([1, )) is the expected number of jumps

More information

Scenario Generation in Stochastic Programming with Application to Optimizing Electricity Portfolios under Uncertainty

Scenario Generation in Stochastic Programming with Application to Optimizing Electricity Portfolios under Uncertainty Scenario Generation in Stochastic Programming with Application to Optimizing Electricity Portfolios under Uncertainty W. Römisch Humboldt-University Berlin Department of Mathematics www.math.hu-berlin.de/~romisch

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

Lecture 21 Representations of Martingales

Lecture 21 Representations of Martingales Lecture 21: Representations of Martingales 1 of 11 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 21 Representations of Martingales Right-continuous inverses Let

More information

Band-limited Wavelets and Framelets in Low Dimensions

Band-limited Wavelets and Framelets in Low Dimensions Band-limited Wavelets and Framelets in Low Dimensions Likun Hou a, Hui Ji a, a Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 Abstract In this paper,

More information

Digital lattice rules

Digital lattice rules Digital lattice rules Multivariate integration and discrepancy estimates A thesis presented to The University of New South Wales in fulfilment of the thesis requirement for the degree of Doctor of Philosophy

More information

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Yongsheng Han, Ji Li, and Guozhen Lu Department of Mathematics Vanderbilt University Nashville, TN Internet Analysis Seminar 2012

More information

A NEW PROOF OF THE ATOMIC DECOMPOSITION OF HARDY SPACES

A NEW PROOF OF THE ATOMIC DECOMPOSITION OF HARDY SPACES A NEW PROOF OF THE ATOMIC DECOMPOSITION OF HARDY SPACES S. DEKEL, G. KERKYACHARIAN, G. KYRIAZIS, AND P. PETRUSHEV Abstract. A new proof is given of the atomic decomposition of Hardy spaces H p, 0 < p 1,

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication 7. Banach algebras Definition 7.1. A is called a Banach algebra (with unit) if: (1) A is a Banach space; (2) There is a multiplication A A A that has the following properties: (xy)z = x(yz), (x + y)z =

More information

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS APPLICATIONES MATHEMATICAE 22,3 (1994), pp. 419 426 S. G. BARTELS and D. PALLASCHKE (Karlsruhe) SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS Abstract. Two properties concerning the space

More information

Math 209B Homework 2

Math 209B Homework 2 Math 29B Homework 2 Edward Burkard Note: All vector spaces are over the field F = R or C 4.6. Two Compactness Theorems. 4. Point Set Topology Exercise 6 The product of countably many sequentally compact

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information