October 10, 2004
Outline: The one-period case; Distances of probability measures; Tensor products of trees; Tree reduction
A decision problem is subject to uncertainty. Uncertainty is represented by probability. To reduce computational complexity, a discrete probability with only a few mass points is preferred. The discrete probability distribution representing uncertainty is called the scenario model.
The two phases of scenario generation
Phase 1. Find the correct (appropriate, reasonable) probability distribution - the uncertainty model.
Phase 2. Find a scenario model such that the information loss going from the uncertainty model to the scenario model is small.
Phase 1 is done by:
- statistical estimation
- expert opinion
- regulator's rules
- any combination of the above
Multistage decisions and multistage scenarios
ξ_1, ..., ξ_T: a multivariate time series process (e.g. future interest rates, future asset prices, etc.)
F_0, ..., F_{T-1}: a filtration
x_0, ..., x_{T-1}: decisions. Observations and decisions alternate:
t = 0: decision x_0
t = t_1: observe the r.v. ξ_1, then decision x_1
t = t_2: observe the r.v. ξ_2, then decision x_2
t = t_3: observe the r.v. ξ_3, then decision x_3
The success of the strategy is expressed in terms of a cost function H(x_0, ξ_1, x_1, ..., x_{T-1}, ξ_T). The final objective is to minimize a functional F of the cost function, such as the expectation, a quantile or some other functional:

Minimize in x_0, x_1(ξ_1), ..., x_{T-1}(ξ_1, ..., ξ_{T-1}):
    F[H(x_0, ξ_1, ..., x_{T-1}, ξ_T)]
under the information constraints as well as other constraints.

Information (measurability) constraints:
x_0 = a constant (F_0-measurable)
x_1 = x_1(ξ_1) (F_1-measurable)
...
x_{T-1} = x_{T-1}(ξ_1, ξ_2, ..., ξ_{T-1}) (F_{T-1}-measurable)
Stochastic processes
Two views of a stochastic process ξ_1, ..., ξ_T:
- as a random element of R^T;
- as a sequence of transition probabilities P{ξ_1 ∈ A_1}, P{ξ_2 ∈ A_2 | ξ_1}, P{ξ_3 ∈ A_3 | ξ_1, ξ_2}, ...
Only the second view models the information structure along with the uncertainty structure.
Tree Approximations
The transition probability view of a stochastic process leads to tree processes in the discrete world. A stochastic process ν_1, ..., ν_T is a tree process if all conditional distributions P{ν_1, ..., ν_{t-1} | ν_t} are degenerate or - equivalently - σ(ν_t) = σ(ν_1, ν_2, ..., ν_t). Every stochastic process is a function of a tree process.
[Figure: a scenario tree with node values v_0, v_1, ..., v_10 and branch probabilities p_4, ..., p_10]
The approximating problem
Minimize in x_0, x_1(ξ̃_1), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}):
    F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)]
under the information constraints as well as other constraints.

Information (measurability) constraints:
x_0 = a constant (F̃_0-measurable)
x_1 = x_1(ξ̃_1) (F̃_1-measurable)
x_2 = x_2(ξ̃_1, ξ̃_2) (F̃_2-measurable)
...
x_{T-1} = x_{T-1}(ξ̃_1, ξ̃_2, ..., ξ̃_{T-1}) (F̃_{T-1}-measurable)

Error = | min(original problem) - min(approximating problem) |.
The one-period case
[Figure: G, the original distribution function, and G̃, a discrete approximation]
G̃: values v_1, v_2, ..., v_m with probabilities p_1, p_2, ..., p_m
Distances of probability measures
Let H be a family of functions. Define a semidistance by
    d_H(G_1, G_2) = sup{ | ∫ h(w) dG_1(w) - ∫ h(w) dG_2(w) | : h ∈ H }.
If H is the class of indicator functions 1l_A of all measurable sets, the variational distance is obtained:
    d_V(G_1, G_2) = sup{ | ∫ 1l_A(u) d[G_1 - G_2](u) | : A measurable }.
If H is the class of indicator functions of half-unbounded rectangles in R^k, the Kolmogorov-Smirnov distance is obtained:
    d_KS(G_1, G_2) = sup{ |G_1(x) - G_2(x)| : x ∈ R^k }.
If H is the class of Lipschitz-continuous functions with Lipschitz constant 1, the Wasserstein distance is obtained:
    d_W(G_1, G_2) = sup{ | ∫ h dG_1 - ∫ h dG_2 | : |h(u) - h(v)| ≤ |u - v| }.
If H is the class of Lipschitz functions of order p, the Fortet-Mourier distance is obtained:
    d_FMp(G_1, G_2) = sup{ | ∫ h dG_1 - ∫ h dG_2 | : L_p(h) ≤ 1 },
where L_p(h) = inf{ L : |h(u) - h(v)| ≤ L |u - v| max(1, |u|^{p-1}, |v|^{p-1}) }.
Suppose that the stochastic optimization problem is to maximize ∫ h(x, u) dG(u), and that H is a family of functions containing the functions {h(x, ·) : x ∈ X}. Then defining a semidistance as
    d_H(G_1, G_2) = sup{ | ∫ h(w) dG_1(w) - ∫ h(w) dG_2(w) | : h ∈ H }
entails that
    Error ≤ sup_x | ∫ h(x, w) dG_1(w) - ∫ h(x, w) dG_2(w) | ≤ d_H(G_1, G_2).
Typical integrands are of the form
    u ↦ Σ_i α_i(x) [c_i(x) - u d_i(x)]^+,
i.e. they are Lipschitz in u.
KS-distance and W-distance: a comparison
Let ξ_1 ~ G_1, ξ_2 ~ G_2. Then
    d_KS(G_1, G_2) = sup_u |G_1(u) - G_2(u)| = sup_u |P(ξ_1 ≤ u) - P(ξ_2 ≤ u)|.
Hlawka-Koksma inequality:
    | ∫ h(u) dG_1(u) - ∫ h(u) dG_2(u) | ≤ d_KS(G_1, G_2) · V(h),
where V(h) is the total variation of h:
    V(h) = sup{ Σ_i |h(z_i) - h(z_{i-1})| : z_1 < z_2 < ... < z_n, n arbitrary }.
If K is a monotonic function, then d_KS(K(ξ_1), K(ξ_2)) = d_KS(ξ_1, ξ_2). Therefore, using the quantile transform, the problem reduces to approximating the uniform [0,1] distribution. If the distribution with values u_1 ≤ u_2 ≤ ... ≤ u_m and probabilities p_1, p_2, ..., p_m approximates the uniform [0,1] distribution with d_KS distance ε, then the distribution with values G^{-1}(u_1) ≤ G^{-1}(u_2) ≤ ... ≤ G^{-1}(u_m) and probabilities p_1, p_2, ..., p_m approximates G, also with d_KS distance ε.
The optimal approximation of a continuous distribution G by a distribution sitting on n mass points z_1, ..., z_n with probabilities p_1, ..., p_n w.r.t. the KS distance is given by
    z_i = G^{-1}((2i - 1) / (2n)),   p_i = 1/n.
The attained distance is 1/(2n).
[Figure: the CDF of a continuous distribution and its KS-optimal discrete approximation]
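As a sketch, the KS-optimal quantizer above can be computed directly from any inverse CDF; taking G to be the standard normal (via the standard library's NormalDist) is purely an illustrative choice:

```python
from statistics import NormalDist

def ks_optimal_points(inv_cdf, n):
    """KS-optimal n-point approximation of a continuous distribution G:
    mass points at the quantiles G^{-1}((2i-1)/(2n)), each with probability 1/n."""
    return [inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)], [1.0 / n] * n

# Example: 5-point KS-optimal approximation of the standard normal
points, probs = ks_optimal_points(NormalDist().inv_cdf, 5)

# Achieved KS distance: the sup of |G - G_n| is attained at the mass points
cdf = NormalDist().cdf
ks = max(max(abs(cdf(z) - i / 5), abs(cdf(z) - (i + 1) / 5))
         for i, z in enumerate(points))
# ks evaluates to 1/(2n) = 0.1 here
```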
Other KS approximations
The Monte Carlo method. If (U_i) is a sequence of i.i.d. random variables from a uniform [0,1] distribution, then the distribution G_n sitting on the points v_i = G^{-1}(U_i) with masses 1/n satisfies
    P{ sup_v |G(v) - G_n(v)| ≥ ε/√n } ≤ 58 exp(-2ε²)
(Dvoretzky-Kiefer-Wolfowitz inequality).
The Quasi Monte Carlo method. A sequence (u_i) is called a low discrepancy sequence in [0,1] (e.g. the Sobol sequence) if the KS distance between the uniform distribution and the empirical distribution based on (u_i) is O(log(n)/n). Then with v_i = G^{-1}(u_i) with masses 1/n we have
    d_KS(G, G_n) ≤ C log(n)/n for all n.
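A Sobol generator is not in the Python standard library, so the following sketch uses the base-2 van der Corput sequence as a stand-in low-discrepancy sequence, and compares its KS distance to the uniform with that of a Monte Carlo sample (seed and sample size are arbitrary choices of this illustration):

```python
import random

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput low-discrepancy sequence."""
    seq = []
    for i in range(n):
        q, denom, x = i, 1, 0.0
        while q:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        seq.append(x)
    return seq

def ks_vs_uniform(points):
    """KS distance between uniform[0,1] and the empirical distribution of points."""
    pts = sorted(points)
    n = len(pts)
    return max(max(abs((i + 1) / n - x), abs(x - i / n)) for i, x in enumerate(pts))

n = 256
random.seed(42)
ks_mc = ks_vs_uniform([random.random() for _ in range(n)])   # order 1/sqrt(n)
ks_qmc = ks_vs_uniform(van_der_corput(n))                    # order log(n)/n
# Mapping the points through any quantile function v_i = G^{-1}(u_i)
# preserves these KS distances (monotone transform invariance above).
```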
The W-distance and the facility location problem
    d_W(G_1, G_2) = sup{ | ∫ h dG_1 - ∫ h dG_2 | : |h(u) - h(v)| ≤ |u - v| }.
Theorem (Kantorovich-Rubinstein). Dual version of the Wasserstein distance:
    d_W(G_1, G_2) = inf{ E(|X - Y|) : (X, Y) is a bivariate r.v. with given marginal distribution functions G_1 and G_2 }.
Remark. If both measures sit on a finite number of mass points {v_1, v_2, ...}, then d_W(G_1, G_2) is the optimal value of the following linear optimization problem:
    Minimize Σ_{i,j} κ_ij r_ij
    subject to Σ_i κ_ij = μ_j for all j
               Σ_j κ_ij = ν_i for all i.
Here μ_i resp. ν_i is the mass sitting on v_i, and r_ij = |v_i - v_j|. Interpretation: an optimal mass transportation problem.
[Figure: two discrete distributions and the optimal mass transportation between them]
For an infinite space, it is in general impossible to find the explicit form of the joint distribution of X and Y which minimizes E(|X - Y|). For the case k = 1 and a convex function ψ, the explicit solution of the mass transportation problem is known:
Lemma (Dall'Aglio (1972)).
    inf{ E(ψ(X - Y)) : (X, Y) is a bivariate r.v. with marginals G_1 and G_2 } = ∫_0^1 ψ(G_1^{-1}(u) - G_2^{-1}(u)) du.
Corollary. The Wasserstein distance on R^1 satisfies
    d_W(G_1, G_2) = ∫_0^1 |G_1^{-1}(u) - G_2^{-1}(u)| du = ∫ |G_1(u) - G_2(u)| du.
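For discrete distributions on R^1, the corollary gives a direct algorithm: walk the merged breakpoints of the two CDFs and integrate |G_1^{-1}(u) - G_2^{-1}(u)| piecewise. A sketch (function name my own):

```python
def wasserstein_1d(xs, px, ys, py):
    """d_W(G1, G2) = integral_0^1 |G1^{-1}(u) - G2^{-1}(u)| du for two discrete
    distributions given as (values, probabilities), via the merged-CDF walk."""
    a = sorted(zip(xs, px))
    b = sorted(zip(ys, py))
    i = j = 0
    u = 0.0                     # probability level processed so far
    ca, cb = a[0][1], b[0][1]   # cumulative probabilities of the two CDFs
    dist = 0.0
    while True:
        # on the level interval [u, min(ca, cb)) the quantiles are constant
        dist += (min(ca, cb) - u) * abs(a[i][0] - b[j][0])
        u = min(ca, cb)
        if u >= 1.0 - 1e-12:
            return dist
        if ca <= cb:
            i += 1; ca += a[i][1]
        else:
            j += 1; cb += b[j][1]

# point mass at 0 vs. point mass at 1: d_W = 1
d = wasserstein_1d([0.0], [1.0], [1.0], [1.0])
```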
An example
Suppose we want to approximate the t-distribution with 2 degrees of freedom by a probability measure sitting on five points. Using the KS distance one gets the solution G̃_1:
    value        -1.8856  -0.6172  0       0.6172  1.8856
    probability   0.2      0.2     0.2     0.2     0.2
Using the Wasserstein distance one gets G̃_2:
    value        -4.58    -1.56    0       1.56    4.58
    probability   0.0446   0.2601  0.3906  0.2601  0.0446
[Figure: the density of the t-distribution with the two five-point approximations]
Example (continued) - a newsboy problem
Consider the family of functions
    w ↦ h(w, a) = [w - a]^+ + 0.3 [w - a]^-.
We have plotted a ↦ ∫ h(w, a) dG(w) (solid), a ↦ ∫ h(w, a) dG̃_1(w) (dotted) and a ↦ ∫ h(w, a) dG̃_2(w) (dashed). One sees that G̃_2 gives a better approximation, and the location of the minimum is also closer to the true one.
[Figure: the three objective curves as functions of a]
Finding the WD approximation
Calculating a W-distance minimal n-point approximation G̃_n of a distribution G with density g amounts to solving a transportation problem. The goal is to find the set of borders B = {b_i, i = 1, ..., n+1} which solves
    minimize Σ_{i=1}^n ∫_{b_i}^{b_{i+1}} g(x) |x - v_i| dx.
The i-th transportation center v_i can be calculated as the median of G restricted to [b_i, b_{i+1}]:
    v_i = G^{-1}( ∫_{b_1}^{b_i} g(x) dx + (1/2) ∫_{b_i}^{b_{i+1}} g(x) dx ).
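The cell-median formula above can be embedded in a simple alternating (Lloyd-type) local search: given centers, the optimal borders for the |x - v| cost are the midpoints of neighboring centers; given borders, the optimal centers are the cell medians. This is only a local heuristic of my own for illustration, with G taken to be the standard normal as an assumption:

```python
from statistics import NormalDist

G = NormalDist()  # illustrative choice of G

def lloyd_w1(n, iters=200):
    """Local search for a W-distance minimal n-point approximation of G:
    alternately set borders to midpoints of neighboring centers (optimal for
    the |x - v| cost) and centers to the median of G within each cell,
    v_i = G^{-1}(G(b_i) + (G(b_{i+1}) - G(b_i))/2)."""
    # start from the KS-optimal points
    v = [G.inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    for _ in range(iters):
        borders = [(v[i] + v[i + 1]) / 2 for i in range(n - 1)]
        cum = [0.0] + [G.cdf(b) for b in borders] + [1.0]
        v = [G.inv_cdf((cum[i] + cum[i + 1]) / 2) for i in range(n)]
        probs = [cum[i + 1] - cum[i] for i in range(n)]
    return v, probs

v, p = lloyd_w1(5)
# As in the t-distribution example: outer points move out, outer cells
# carry less probability than the central cell.
```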
Global optimization. Branch and bound can be applied to calculate the global minimum. Divide the search space into N sections and create the B&B tree recursively: set the first border b_2 to all N - n + 1 possible positions; these form subproblems with n - 1 transportation centers. This method lowers the dimension of the problem. Upper bounds are evaluated at the leaves of the B&B tree, and lower bounds can be calculated by finding the minimum possible transportation cost within two borders.
Local optimization. Meta-heuristics may be applied to find local minima much faster, e.g. a combination of iterative local search with simulated annealing:
1. Generate an initial set of borders B (e.g. the KS-distance optimal borders).
2. Move the border b_x which best improves the optimal value of the problem.
3. If no border can be moved improving the optimal value, start simulated annealing (randomly choose a border; if the movement improves the optimal value, move it; otherwise draw a random number and decide whether to move it or not; lower the temperature; ...).
Approximation: Fortet-Mourier distance
Via a transformation, approximable with Wasserstein distance minimization:
    χ_p(u) = u             if |u| ≤ 1
             sign(u)|u|^p  if |u| > 1
Choose p. Transform G with χ_p: G_{1/p} = G ∘ χ_{1/p}. Approximate G_{1/p} by G̃_{1/p} with min(Wasserstein). Backtransformation: G̃ = G̃_{1/p} ∘ χ_p.
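The χ_p transform and its inverse χ_{1/p} can be written directly; a minimal sketch (function name my own):

```python
def chi(u, p):
    """chi_p(u) = u for |u| <= 1 and sign(u)*|u|**p for |u| > 1.
    chi_{1/p} is the inverse of chi_p."""
    return u if abs(u) <= 1 else (1.0 if u > 0 else -1.0) * abs(u) ** p

# Pipeline sketch: map sample points with chi_p (this realizes
# G_{1/p} = G o chi_{1/p}), run a Wasserstein-minimal approximation there,
# then map the chosen mass points back with chi_{1/p}
# (the backtransformation G~ = G~_{1/p} o chi_p).
```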
Approximation methods: summary
- Full Monte Carlo
- Quasi Monte Carlo
- Moment matching
- KS approximation
- Wasserstein and Fortet-Mourier approximation
- some more
Moment matching?
Advantage: easy to do.
Disadvantage: moments do not characterize the distribution, and the moment distance
    d(G, G̃) = sup_k | ∫ u^k dG(u) - ∫ u^k dG̃(u) |
is not a distance.
[Figure: two quite different densities whose low-order moments coincide]
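A concrete instance of this caveat (my own illustrative example, not necessarily the one plotted on the slide): the three-point distribution below matches the first four moments 0, 1, 0, 3 of the standard normal exactly, yet is nothing like it.

```python
from math import sqrt

# Three-point distribution: +/- sqrt(3) with probability 1/6 each, 0 with 2/3.
values = [-sqrt(3), 0.0, sqrt(3)]
probs = [1 / 6, 2 / 3, 1 / 6]

def moment(vals, ps, k):
    """k-th raw moment of a discrete distribution."""
    return sum(p * v ** k for v, p in zip(vals, ps))

# moments 1..4 equal 0, 1, 0, 3 - the same as N(0,1) - although the
# distribution is discrete, so matching moments does not imply closeness.
```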
Phase 1: Estimation and simulation of paths
Parametric approach: use data, estimate a (time-series) model, simulate paths over the whole time horizon / all stages.
[Figure: Dow Jones Industrial: n = 10 simulated GARCH(1,1) paths, t = -4, ..., 2]
Phase 2: Generating multi-variate, multi-period trees
Step 1. Approximate the data in the first stage (e.g. Wasserstein distance).
[Figure: simulated paths with the first-stage values collapsed to tree nodes, t = 0, ..., 3]
Generating multi-variate, multi-period trees
Step 2. For subsequent stages, use the paths conditional on the predecessor.
[Figure: conditional paths emanating from a first-stage node, t = 0, ..., 3]
Generating multi-variate, multi-period trees
Step 3. Iterate through the whole tree.
[Figure: the completed scenario tree, t = 0, ..., 3]
Mathematics behind
Let P_t(A_t | x^{t-1}) be the conditional probability of ξ_t ∈ A_t given the past ξ^{(t-1)} = (ξ_1, ..., ξ_{t-1}) = x^{t-1} = (x_1, ..., x_{t-1}). Let P̃ be another such probability, again dissected into its chain of conditional probabilities. We define
    d(P, P̃) = d(P_1, P̃_1) + Σ_{t=2}^T sup_{x^{(t-1)}} d(P_t(· | x^{(t-1)}), P̃_t(· | x^{(t-1)})),
where d is the Wasserstein distance. For an empirical measure (or any finitely supported measure), the conditional probability exists only for finitely many conditions; we define the summand as zero wherever the conditional distribution is not defined. d does not fulfill the triangle inequality. However, d reflects the information structure: notice that the usual empirical measure does not converge w.r.t. d; only the tree empirical measure does.
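For two-stage trees with scalar values, the stagewise distance above can be evaluated directly. A sketch with an ad-hoc dict representation of trees (all names and the representation are my own):

```python
def w1(xs, px, ys, py):
    """1-D Wasserstein distance between two discrete distributions."""
    a, b = sorted(zip(xs, px)), sorted(zip(ys, py))
    i = j = 0
    u, ca, cb, dist = 0.0, a[0][1], b[0][1], 0.0
    while True:
        dist += (min(ca, cb) - u) * abs(a[i][0] - b[j][0])
        u = min(ca, cb)
        if u >= 1.0 - 1e-12:
            return dist
        if ca <= cb:
            i += 1; ca += a[i][1]
        else:
            j += 1; cb += b[j][1]

def tree_dist(P, Q):
    """d(P, Q) = d_W(P1, Q1) + sup_{x1} d_W(P2(.|x1), Q2(.|x1)) for two-stage
    trees given as {x1: (prob of x1, {x2: conditional prob})}; a summand is
    taken as 0 where one of the conditionals is undefined, as in the text."""
    d1 = w1(list(P), [P[x][0] for x in P], list(Q), [Q[x][0] for x in Q])
    worst = 0.0
    for x1 in set(P) & set(Q):
        c1, c2 = P[x1][1], Q[x1][1]
        worst = max(worst, w1(list(c1), list(c1.values()),
                              list(c2), list(c2.values())))
    return d1 + worst

# identical first stages; the conditionals at x1 = 0.0 differ
P = {0.0: (0.5, {-1.0: 0.5, 1.0: 0.5}), 1.0: (0.5, {0.0: 1.0})}
Q = {0.0: (0.5, {0.0: 1.0}), 1.0: (0.5, {0.0: 1.0})}
```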
Empiricals and tree empiricals
[Figure: ordinary empirical and tree empirical measures of the same sample]
Tensor products of tree processes
Suppose that (ξ_t^(1)) and (ξ_t^(2)) are two tree processes. There is just one way of constructing the joint tree process (ξ_t^(1), ξ_t^(2)) such that the two components are independent. More often, however, one needs to construct a joint process with a given correlation.
Joint distributions with given marginals and given correlation
A joint distribution Q with given marginals X = (x_j, p_1j), Y = (y_i, p_2i) and given correlation ρ can be constructed such that an additional (tree) reduction is obtained, by solving the non-linear program
    minimize   Σ_{i=1}^m Σ_{j=1}^n η(q_ij)
    subject to Σ_{i=1}^m q_ij = p_1j,  j = 1, ..., n
               Σ_{j=1}^n q_ij = p_2i,  i = 1, ..., m
               Σ_{i=1}^m Σ_{j=1}^n q_ij x_j y_i = E(X)E(Y) + ρ √(Var(X) Var(Y))
with
    η(x) = 0 if x = 0,  M if x > 0
(M a large constant, so the objective counts the support points of Q).
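The support-minimizing program above needs an NLP solver. A simpler construction satisfying the same marginal and correlation constraints (though without minimizing the support) mixes the independent coupling with the comonotone one; since covariance is linear in Q and the independent part contributes zero correlation, the mixture weight can be solved for in closed form. This λ-mixture is my own illustration, not the method on the slide; the marginals are those of the example that follows.

```python
from math import sqrt

def comonotone(xs, px, ys, py):
    """Comonotone (quantile) coupling of two discrete marginals."""
    a, b = sorted(zip(xs, px)), sorted(zip(ys, py))
    i = j = 0
    u, ca, cb = 0.0, a[0][1], b[0][1]
    q = {}
    while True:
        step = min(ca, cb) - u
        if step > 1e-15:
            key = (a[i][0], b[j][0])
            q[key] = q.get(key, 0.0) + step
        u = min(ca, cb)
        if u >= 1.0 - 1e-12:
            return q
        if ca <= cb:
            i += 1; ca += a[i][1]
        else:
            j += 1; cb += b[j][1]

def corr(q):
    """Correlation of a joint distribution {(x, y): prob}."""
    ex = sum(p * x for (x, y), p in q.items())
    ey = sum(p * y for (x, y), p in q.items())
    cov = sum(p * x * y for (x, y), p in q.items()) - ex * ey
    vx = sum(p * x * x for (x, y), p in q.items()) - ex * ex
    vy = sum(p * y * y for (x, y), p in q.items()) - ey * ey
    return cov / sqrt(vx * vy)

def joint_with_correlation(xs, px, ys, py, rho):
    """Q = (1 - lam)*independent + lam*comonotone, with lam chosen so that
    corr(Q) = rho; marginals are preserved by both couplings."""
    indep = {(x, y): p * r for x, p in zip(xs, px) for y, r in zip(ys, py)}
    como = comonotone(xs, px, ys, py)
    lam = rho / corr(como)
    assert 0.0 <= lam <= 1.0, "target correlation not reachable by this mixture"
    return {k: (1 - lam) * indep.get(k, 0.0) + lam * como.get(k, 0.0)
            for k in set(indep) | set(como)}

# Marginals from the example slide, target correlation 0.4
stocks, p_s = [1.12, 1.03, 0.95], [0.21, 0.52, 0.27]
bonds, p_b = [1.054, 1.04, 1.031], [0.32, 0.42, 0.26]
Q = joint_with_correlation(stocks, p_s, bonds, p_b, 0.4)
```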
An example
Two asset categories, first stage, 3 nodes each.
Stocks: x_i = 1.12, 1.03, 0.95 with p_i = 0.21, 0.52, 0.27.
Bonds:  x_i = 1.054, 1.04, 1.031 with p_i = 0.32, 0.42, 0.26.
Joint distribution with ρ = 0.4 and reduction:
    Bonds \ Stocks   1.12    1.03    0.95
    1.054            0.142   0       0.058
    1.04             0.112   0.409   0
    1.031            0.067   0       0.202
Tree reduction
- Distance based tree reduction (Dupačová, Gröwe, Römisch; Heitsch)
- EVPI based tree reduction (Dempster, Thompson, Consigli)
Distance based tree reduction
Rule: if two branches of the tree are closer in distance than some threshold, collapse the two.
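A minimal single-stage sketch of the collapse rule; placing the merged point at the probability-weighted mean and merging greedily are simple choices of this illustration, not necessarily the rule used in the cited papers:

```python
def reduce_scenarios(values, probs, threshold):
    """Collapse mass points closer than threshold: repeatedly merge adjacent
    pairs, placing the merged point at the probability-weighted mean and
    giving it the combined probability."""
    pts = sorted(zip(values, probs))
    merged = True
    while merged and len(pts) > 1:
        merged = False
        for i in range(len(pts) - 1):
            (v1, p1), (v2, p2) = pts[i], pts[i + 1]
            if v2 - v1 < threshold:
                pts[i:i + 2] = [((p1 * v1 + p2 * v2) / (p1 + p2), p1 + p2)]
                merged = True
                break
    return [v for v, _ in pts], [p for _, p in pts]

# the two nearby points 0.0 and 0.01 collapse; 1.0 survives
vals, ps = reduce_scenarios([0.0, 0.01, 1.0], [0.25, 0.25, 0.5], 0.1)
```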
EVPI based tree reduction
EVPI = feasible minimum - clairvoyant's minimum
Feasible minimum = min{ F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)] : x_0, x_1(ξ̃_1), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}) }
Clairvoyant's minimum = min{ F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)] : x_0(ξ̃_1, ..., ξ̃_{T-1}), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}) }
The EVPI may be defined for every subtree (given the optimal decisions in earlier stages). This EVPI process is a supermartingale. However, there is no relation between the EVPIs of the full (finer) model and a reduced (coarser) model.
EVPI
[Figure: a scenario tree with node-wise EVPI values (3.2, 5.1, 2.4, 4, 0.5, 2.7, 0.4, 0.1, 0.8)]
Empirical convergence of central moments (1)
Data: weekly data of DJI and 10-year T-Bonds, 01/1993 - 01/1999.
Approximation: Wasserstein, Monte Carlo, Sobol sequence, moment matching.
[Figure: first and second central moment approximation (n = 2 : 35)]
Empirical convergence of central moments (2)
[Figure: third and fourth central moment approximation (n = 2 : 35)]
Moment matching (Kaut [2003]) started to match at n = 15 : 20.
Approximation errors: single-stage portfolio selection
Mathematical models M - risk functionals F:
- CVaR_α: inf{ a + 1/(1-α) E[Y - a]^+ : a ∈ R }
- Mean Absolute Deviation (MAD): E| X - E(X) | with X = Σ_{i=1}^n x_i ξ_i, the portfolio return
Approximation methods S:
- Moment matching (Høyland et al. [2003])
- Probability distance minimization (Wasserstein distance)
- Quasi-random sampling (low discrepancy series: Sobol sequence)
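For a discrete loss distribution, the CVaR_α formula above can be evaluated by searching over the mass points, since the infimum is attained at an α-quantile of Y. A sketch (treating Y as a loss, with made-up data):

```python
def cvar(losses, probs, alpha):
    """CVaR_alpha(Y) = inf_a { a + 1/(1-alpha) * E[Y - a]^+ } for a discrete
    loss Y; the infimum is attained at a mass point, so we search those."""
    def objective(a):
        return a + sum(p * max(y - a, 0.0)
                       for y, p in zip(losses, probs)) / (1 - alpha)
    return min(objective(a) for a in losses)

# Y uniform on {1, 2, 3, 4}, alpha = 0.75: CVaR is the mean of the worst 25%
c = cvar([1.0, 2.0, 3.0, 4.0], [0.25] * 4, 0.75)  # -> 4.0
```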
Risk measures: linear programming formulations
Mean Absolute Deviation [Konno/Yamazaki 1991] [Feinstein/Thapa 1993]:
    maximize   E(x, ξ) - Σ_{s=1}^S p_s z_s
    subject to z_s ≥ ±( ⟨x, ξ_s⟩ - E(x, ξ) ),  s = 1, ..., S
               x ∈ X
Conditional Value at Risk [Rockafellar/Uryasev 1999] [Uryasev 2000]:
    maximize   γ - 1/(1-α) Σ_{s=1}^S p_s z_s
    subject to z_s ≥ γ - ⟨x, ξ_s⟩,  z_s ≥ 0,  s = 1, ..., S
               x ∈ X
Portfolio constraints X: Σ_{a=1}^A x_a ≤ 1, x_a ≤ 0.5 for all a, ...
Empirical results 13 (somewhat randomly selected) assets: AOL, C, CSCO, DIS, EMC, GE, HPQ, MOT, NT, PFE, SUNW, WMT, XOM Weekly data from January 1993 to January 2003. Rolling horizon: asset allocation, backtracking Optimize MAD and CVaR with full data and approximations
Empirical results: MAD
Example (asset allocation): data 01/1993 - 01/1995, n = 50.
[Figure: allocation pie charts over the 13 assets for Full dataset, Wasserstein, Moment matching and Sobol sequence]
Empirical results: CVaR_α
Example (asset allocation): data 01/1999 - 01/2001, α = 0.1, n = 50.
[Figure: allocation pie charts over the 13 assets for Full dataset, Wasserstein, Moment matching and Sobol]
Scenario Generation and Backtracking (1)
Expected Shortfall (CV@R), α = 0.4, n = 50.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Moment Matching and Crude Monte Carlo]
Scenario Generation and Backtracking (2)
Mean Absolute Deviation (MAD), n = 50.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Moment Matching and Crude Monte Carlo]
Scenario Generation and Backtracking (3)
Expected Shortfall (CV@R), α = 0.4, n = 20. Moment Matching failed and was substituted with Quasi Random Sampling.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Quasi Monte Carlo and Crude Monte Carlo]
Scenario Generation and Backtracking (4)
Mean Absolute Deviation (MAD), n = 20. Moment Matching failed and was substituted with Quasi Random Sampling.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Quasi Monte Carlo and Crude Monte Carlo]
Scenario Generation and Backtracking (5)
Expected Shortfall (CV@R), α = 0.4, n = 150.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Moment Matching and Crude Monte Carlo]
Scenario Generation and Backtracking (6)
Mean Absolute Deviation (MAD), n = 150.
[Figure: portfolio value 1996 - 2003 for Complete Dataset, Wasserstein, Moment Matching and Crude Monte Carlo]