Scenario estimation and generation


October 10, 2004

Outline: The one-period case; Distances of probability measures; Tensor products of trees; Tree reduction.

A decision problem is subject to uncertainty. Uncertainty is represented by probability. To reduce computational complexity, a discrete probability distribution with only a few mass points is preferred. The discrete probability distribution representing the uncertainty is called the scenario model.

The two phases of scenario generation
Phase 1. Find the correct (appropriate, reasonable) probability distribution: the uncertainty model.
Phase 2. Find a scenario model such that the information loss in going from the uncertainty model to the scenario model is small.
Phase 1 is done by statistical estimation, expert opinion, regulator's rules, or any combination of the above.

Multistage decisions and multistage scenarios
ξ_1, ..., ξ_T: a multivariate time series process (e.g. future interest rates, future asset prices, etc.)
F_0, ..., F_{T-1}: a filtration

x_0, ..., x_{T-1}: decisions.
[Timeline: decision x_0 at t = 0; observation of the r.v. ξ_1, then decision x_1 at t = t_1; observation of the r.v. ξ_2, then decision x_2 at t = t_2; observation of the r.v. ξ_3, then decision x_3 at t = t_3.]

The success of the strategy is expressed in terms of a cost function H(x_0, ξ_1, x_1, ..., x_{T-1}, ξ_T). The final objective is to minimize a functional F of the cost, such as the expectation, a quantile or some other functional:

Minimize in x_0, x_1(ξ_1), ..., x_{T-1}(ξ_1, ..., ξ_{T-1}):
F[H(x_0, ξ_1, ..., x_{T-1}, ξ_T)]
under the information constraints as well as other constraints.

Information (measurability) constraints:
x_0 = a constant (F_0-measurable)
x_1 = x_1(ξ_1) (F_1-measurable)
...
x_{T-1} = x_{T-1}(ξ_1, ξ_2, ..., ξ_{T-1}) (F_{T-1}-measurable)

Stochastic processes
One may see a stochastic process ξ_1, ..., ξ_T as a random element of R^T, or as a sequence of transition probabilities
P{ξ_1 ∈ A_1}, P{ξ_2 ∈ A_2 | ξ_1}, P{ξ_3 ∈ A_3 | ξ_1, ξ_2}, ...
Only the second view models the information structure along with the uncertainty structure.

Tree Approximations
The transition probability view of a stochastic process leads to tree processes in the discrete world. A stochastic process ν_1, ..., ν_T is a tree process if all conditional distributions P{ν_1, ..., ν_{t-1} | ν_t} are degenerate or, equivalently, σ(ν_t) = σ(ν_1, ν_2, ..., ν_t). Every stochastic process is a function of a tree process.
[Figure: a scenario tree with root v_0, inner nodes v_1, v_2, v_3 and leaves v_4, ..., v_10 carrying probabilities p_4, ..., p_10.]
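As a toy illustration of the tree view (all node values and branch probabilities below are invented for the example), a scenario tree can be stored as nodes with parent pointers; the history up to a node is the path from the root, and the unconditional probability of a node is the product of the conditional branch probabilities along that path, so the leaf probabilities must sum to one:

```python
# Hypothetical three-stage tree: node id -> (parent, value, conditional branch prob.)
tree = {
    0: (None, 100.0, 1.0),                     # root v_0
    1: (0, 110.0, 0.4), 2: (0, 90.0, 0.6),     # stage-1 nodes
    3: (1, 120.0, 0.5), 4: (1, 105.0, 0.5),    # children of node 1
    5: (2, 95.0, 0.3),  6: (2, 80.0, 0.7),     # children of node 2
}

def path(node):
    """Value path from the root to `node` (the history of the tree process)."""
    out = []
    while node is not None:
        parent, value, _ = tree[node]
        out.append(value)
        node = parent
    return out[::-1]

def path_prob(node):
    """Unconditional probability of reaching `node`: product of branch probabilities."""
    p = 1.0
    while node is not None:
        parent, _, q = tree[node]
        p *= q
        node = parent
    return p

leaves = [n for n in tree if not any(tree[m][0] == n for m in tree)]
total = sum(path_prob(n) for n in leaves)      # must equal 1.0
```

Any process value attached to a node is then a function of the node (i.e. of the tree process), which is the discrete counterpart of the statement above.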

The approximating problem
Minimize in x_0, x_1(ξ̃_1), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}):
F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)]
under the information constraints as well as other constraints.

Information (measurability) constraints:
x_0 = a constant (F̃_0-measurable)
x_1 = x_1(ξ̃_1) (F̃_1-measurable)
x_2 = x_2(ξ̃_1, ξ̃_2) (F̃_2-measurable)
...
x_{T-1} = x_{T-1}(ξ̃_1, ξ̃_2, ..., ξ̃_{T-1}) (F̃_{T-1}-measurable)

Error = |min of the original problem − min of the approximating problem|.

The one-period case
[Figure: original distribution function G and a discrete approximation G̃.]
G̃: values v_1, v_2, ..., v_m with probabilities p_1, p_2, ..., p_m.

Distances of probability measures
Let H be a family of functions. Define a semidistance by
d_H(G_1, G_2) = sup{ |∫ h(w) dG_1(w) − ∫ h(w) dG_2(w)| : h ∈ H }.
If H is the class of indicator functions 1_A of all measurable sets, the variational distance is obtained:
d_V(G_1, G_2) = sup{ |∫ 1_A(u) d[G_1 − G_2](u)| : A measurable }.
If H is the class of indicator functions of half-unbounded rectangles in R^k, the Kolmogorov-Smirnov distance is obtained:
d_KS(G_1, G_2) = sup{ |G_1(x) − G_2(x)| : x ∈ R^k }.

If H is the class of Lipschitz-continuous functions with Lipschitz constant 1, the Wasserstein distance is obtained:
d_W(G_1, G_2) = sup{ |∫ h dG_1 − ∫ h dG_2| : |h(u) − h(v)| ≤ |u − v| }.
If H is the class of Lipschitz functions of order p, the Fortet-Mourier distance is obtained:
d_FMp(G_1, G_2) = sup{ |∫ h dG_1 − ∫ h dG_2| : L_p(h) ≤ 1 },
where L_p(h) = inf{ L : |h(u) − h(v)| ≤ L |u − v| max(1, |u|^{p-1}, |v|^{p-1}) }.

Suppose that the stochastic optimization problem is to maximize ∫ h(x, u) dG(u), and let H be a family of functions containing the functions {h(x, ·) : x ∈ X}. Then defining a semidistance as
d_H(G_1, G_2) = sup{ |∫ h(w) dG_1(w) − ∫ h(w) dG_2(w)| : h ∈ H }
entails that
Error ≤ sup_x |∫ h(x, w) dG_1(w) − ∫ h(x, w) dG_2(w)| ≤ d_H(G_1, G_2).
Typical integrands are of the form
u ↦ Σ_i α_i(x) [c_i(x) u − d_i(x)]^+,
i.e. they are Lipschitz in u.

KS-distance and W-distance: a comparison
Let ξ_1 ~ G_1, ξ_2 ~ G_2. Then
d_KS(G_1, G_2) = sup_u |G_1(u) − G_2(u)| = sup_u |P(ξ_1 ≤ u) − P(ξ_2 ≤ u)|.
Hlawka-Koksma inequality:
|∫ h(u) dG_1(u) − ∫ h(u) dG_2(u)| ≤ d_KS(G_1, G_2) · V(h),
where V(h) is the total variation of h:
V(h) = sup{ Σ_i |h(z_i) − h(z_{i-1})| : z_1 < z_2 < ... < z_n, n arbitrary }.

If K is a monotonic function, then d_KS(K(ξ_1), K(ξ_2)) = d_KS(ξ_1, ξ_2). Therefore, using the quantile transform, the problem reduces to approximating the uniform [0,1] distribution. If the distribution with values u_1 ≤ u_2 ≤ ... ≤ u_m and probabilities p_1, p_2, ..., p_m approximates the uniform [0,1] distribution with d_KS-distance ε, then the distribution with values G^{-1}(u_1) ≤ G^{-1}(u_2) ≤ ... ≤ G^{-1}(u_m) and probabilities p_1, p_2, ..., p_m approximates G, also with d_KS-distance ε.

The optimal approximation of a continuous distribution G by a distribution sitting on n mass points with respect to the KS distance is given by
z_i = G^{-1}((2i − 1)/(2n)), p_i = 1/n.
The resulting distance is 1/(2n).
[Figure: the distribution function of G and of its n-point KS-optimal approximation.]
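This construction is easy to reproduce numerically. The sketch below (using scipy and choosing the t-distribution with 2 degrees of freedom as the example G) places the atoms at the quantiles (2i − 1)/(2n) with equal weights:

```python
import numpy as np
from scipy import stats

def ks_optimal_approx(ppf, n):
    """n-point KS-optimal approximation of a continuous distribution:
    atoms at the quantiles (2i-1)/(2n), each with probability 1/n."""
    u = (2 * np.arange(1, n + 1) - 1) / (2 * n)
    return ppf(u), np.full(n, 1.0 / n)

# Example: 5-point approximation of the t-distribution with 2 degrees of freedom.
values, probs = ks_optimal_approx(stats.t(df=2).ppf, 5)
# values ≈ [-1.8856, -0.6172, 0, 0.6172, 1.8856], each with probability 0.2
```

For n = 5 this recovers the five-point t(2) solution used as an example later in these slides.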

Other KS approximations
The Monte Carlo method. If U_i is a sequence of i.i.d. random variables from a uniform [0,1] distribution, then the distribution G_n sitting on the points v_i = G^{-1}(U_i) with masses 1/n satisfies
P{ sup_v |G(v) − G_n(v)| ≥ ε/√n } ≤ 58 exp(−2ε²)
(Dvoretzky-Kiefer-Wolfowitz inequality).
The Quasi Monte Carlo method. A sequence u_i is called a low discrepancy sequence in [0,1] (e.g. the Sobol sequence) if the KS distance between the uniform distribution and the empirical distribution based on (u_i) is O(log(n)/n). Then, with v_i = G^{-1}(u_i) and masses 1/n, we have
d_KS(G, G_n) ≤ C log(n)/n for all n.
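The gap between the two rates is easy to see numerically. A quick sketch (sample size and seed are arbitrary choices) compares the KS distance to the uniform for a Monte Carlo sample and for an unscrambled Sobol sequence:

```python
import numpy as np
from scipy.stats import qmc

def ks_to_uniform(u):
    """KS distance between the empirical distribution of points u in [0,1]
    and the uniform [0,1] distribution (checked on both sides of each jump)."""
    u = np.sort(np.asarray(u))
    n = len(u)
    ecdf_hi = np.arange(1, n + 1) / n    # ECDF just after each jump
    ecdf_lo = np.arange(0, n) / n        # ECDF just before each jump
    return max(np.max(np.abs(ecdf_hi - u)), np.max(np.abs(ecdf_lo - u)))

n = 128
rng = np.random.default_rng(0)
d_mc = ks_to_uniform(rng.uniform(size=n))                                # ~ 1/sqrt(n)
d_qmc = ks_to_uniform(qmc.Sobol(d=1, scramble=False).random(n).ravel())  # ~ log(n)/n
```

For n = 128 (a power of two, where Sobol points are best behaved) the Sobol sample is far closer to the uniform than a typical Monte Carlo sample.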

The W-distance and the facility location problem
d_W(G_1, G_2) = sup{ |∫ h dG_1 − ∫ h dG_2| : |h(u) − h(v)| ≤ |u − v| }.
Theorem (Kantorovich-Rubinstein). Dual version of the Wasserstein distance:
d_W(G_1, G_2) = inf{ E(|X − Y|) : (X, Y) is a bivariate r.v. with given marginal distribution functions G_1 and G_2 }.

Remark. If both measures sit on a finite number of mass points {v_1, v_2, ...}, then d_W(G_1, G_2) is the optimal value of the following linear optimization problem:
Minimize Σ_{i,j} κ_{ij} r_{ij}
subject to Σ_i κ_{ij} = μ_j for all j
Σ_j κ_{ij} = ν_i for all i.
Here μ_i resp. ν_i is the mass sitting on v_i and r_{ij} = |v_i − v_j|.
Interpretation as an optimal mass transportation problem.
[Figure: two discrete distributions and the optimal mass transportation between them.]
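A minimal sketch of this transportation LP, with made-up atoms and masses, cross-checked against scipy's closed-form one-dimensional Wasserstein distance:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

# Two discrete distributions on the real line (atoms and masses are invented).
v1, mu = np.array([0.0, 1.0, 3.0]), np.array([0.5, 0.3, 0.2])
v2, nu = np.array([0.5, 2.0]),      np.array([0.6, 0.4])

# Transportation LP: minimize sum_ij kappa_ij * |v1_i - v2_j|
# subject to row sums = mu_i and column sums = nu_j, kappa >= 0.
r = np.abs(v1[:, None] - v2[None, :]).ravel()      # cost vector, row-major
m, n = len(v1), len(v2)
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0               # mass leaving atom v1_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0                        # mass arriving at atom v2_j
b_eq = np.concatenate([mu, nu])
res = linprog(r, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
d_lp = res.fun                                     # the Wasserstein distance
```

The LP optimum coincides with `wasserstein_distance(v1, v2, mu, nu)`, which evaluates the quantile-function formula from the next slide.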

For an infinite space it is in general impossible to find the explicit form of the joint distribution of X and Y which minimizes E(|X − Y|). For the case k = 1 and a convex function ψ, the explicit solution of the mass transportation problem is known:
Lemma (Dall'Aglio (1972)).
inf{ E(ψ(X − Y)) : (X, Y) is a bivariate r.v. with marginals G_1 and G_2 } = ∫_0^1 ψ(G_1^{-1}(u) − G_2^{-1}(u)) du.
Corollary. The Wasserstein distance on R^1 satisfies
d_W(G_1, G_2) = ∫_0^1 |G_1^{-1}(u) − G_2^{-1}(u)| du = ∫ |G_1(u) − G_2(u)| du.
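Both integral representations in the corollary are easy to verify numerically. In the sketch below (the distributions are chosen arbitrarily for the check) G_1 = N(0,1) and G_2 = N(1,1), a pure location shift, so the distance is exactly 1:

```python
from scipy.integrate import quad
from scipy.stats import norm

# G1 = N(0,1), G2 = N(1,1): a location shift by 1, hence d_W(G1, G2) = 1.

# Quantile form: integral of |G1^{-1}(u) - G2^{-1}(u)| over (0,1)
# (endpoints clipped to avoid the infinite quantiles at 0 and 1).
d_quantile = quad(lambda u: abs(norm.ppf(u) - norm.ppf(u, loc=1.0)),
                  1e-9, 1.0 - 1e-9)[0]

# CDF form: integral of |G1(x) - G2(x)| over the real line (tails truncated).
d_cdf = quad(lambda x: abs(norm.cdf(x) - norm.cdf(x, loc=1.0)), -10, 10)[0]
```

Both numerical integrals return 1 up to quadrature error, as the corollary predicts.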

An example
Suppose we want to approximate the t-distribution with 2 degrees of freedom by a probability measure sitting on five points. Using the KS distance one gets the solution G̃_1:
values: -1.8856, -0.6172, 0, 0.6172, 1.8856
probabilities: 0.2, 0.2, 0.2, 0.2, 0.2
Using the Wasserstein distance one gets G̃_2:
values: -4.58, -1.56, 0, 1.56, 4.58
probabilities: 0.0446, 0.2601, 0.3906, 0.2601, 0.0446
[Figure: the t-distribution with the two five-point approximations.]

Example (continued): a newsboy problem
Consider the family of functions h(w, a) = [w − a]^+ + 0.3 [w − a]^−. We have plotted
a ↦ ∫ h(w, a) dG(w) (solid),
a ↦ ∫ h(w, a) dG̃_1(w) (dotted),
a ↦ ∫ h(w, a) dG̃_2(w) (dashed).
One sees that G̃_2 gives a better approximation, and the location of the minimum is also closer to the true one.
[Figure: the three curves plotted for a between -6 and 0.]

Finding the WD approximation
Calculating a W-distance minimal n-point approximation G_n of a distribution G amounts to solving a transportation problem. The goal is to find the set of borders b_i, i = 1, ..., n+1, which solve the problem
minimize Σ_{i=1}^n ∫_{b_i}^{b_{i+1}} g(x) |x − v_i| dx.
The i-th transportation center v_i can be calculated as the conditional median of its cell:
v_i = G^{-1}( (G(b_i) + G(b_{i+1})) / 2 ).

Global optimization. Branch and bound can be applied to calculate the global minimum. Divide the search space into N sections and create the B&B tree recursively: set the first border b_2 to each of the N − n + 1 possible positions; each choice forms a subproblem with n − 1 transportation centers. This lowers the dimension of the problem. Upper bounds are evaluated at the leaves of the B&B tree, and lower bounds can be calculated by finding the minimum possible transportation cost between two borders.
Local optimization. Meta-heuristics may be applied to find local minima much faster, e.g. a combination of iterative local search with simulated annealing:
1. Generate an initial set of borders B (e.g. the KS-distance optimal borders).
2. Move the border b_x which best improves the optimal value of the problem.
3. If no border move improves the optimal value, start simulated annealing (randomly choose a border; if moving it improves the optimal value, move it; otherwise draw a random number to decide whether to move it anyway; lower the temperature; ...).
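A simpler local-search alternative (an illustration of the idea, not the slide's exact algorithm; the standard normal and n = 5 are arbitrary choices) alternates between the two optimality conditions: borders at midpoints of neighboring centers, centers at conditional medians of their cells. Each half-step can only lower the transportation cost:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def cost(borders, centers):
    """Total W1 transportation cost: sum over cells of E[|X - v_i|; cell]."""
    return sum(quad(lambda x, v=v: norm.pdf(x) * abs(x - v), lo, hi)[0]
               for lo, hi, v in zip(borders[:-1], borders[1:], centers))

n = 5
# Start from the KS-optimal atoms and their Voronoi borders.
centers = norm.ppf((2 * np.arange(1, n + 1) - 1) / (2 * n))
borders = np.r_[-np.inf, (centers[:-1] + centers[1:]) / 2, np.inf]
c0 = cost(borders, centers)

for _ in range(50):
    # optimal centers given borders: conditional medians of the cells
    F = norm.cdf(borders)
    centers = norm.ppf((F[:-1] + F[1:]) / 2)
    # optimal borders given centers: midpoints of neighboring centers
    borders = np.r_[-np.inf, (centers[:-1] + centers[1:]) / 2, np.inf]

c1 = cost(borders, centers)
probs = np.diff(norm.cdf(borders))   # cell masses = atom probabilities
```

Like any coordinate-wise scheme, this only guarantees a local optimum, which is why the slide resorts to B&B for the global one.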

Approximation: Fortet-Mourier distance
Via a transformation, the Fortet-Mourier approximation is obtainable by Wasserstein distance minimization. Define
χ_p(u) = u for |u| ≤ 1, χ_p(u) = sign(u) |u|^p for |u| > 1.
Choose p. Transform G with χ_p: G_{1/p} = G ∘ χ_{1/p}. Approximate G_{1/p} by G̃_{1/p} with minimal Wasserstein distance. Backtransformation: G̃ = G̃_{1/p} ∘ χ_p.

Approximation methods: Summary
Full Monte Carlo; Quasi Monte Carlo; moment matching; KS-approximation; Wasserstein and Fortet-Mourier approximation; some more.

Moment matching?
Advantage: easy to do. But moments do not characterize the distribution, and the moment distance
d(G, G̃) = sup_k |∫ u^k dG(u) − ∫ u^k dG̃(u)|
is not a distance.
[Figure: two quite different densities with matching moments.]

Phase 1: Estimation and simulation of paths
Parametric approach: use data, estimate a (time-series) model, simulate paths over the whole time horizon / all stages.
[Figure: Dow Jones Industrial, n = 10 simulated GARCH(1,1) paths over t = -4, ..., 2.]

Phase 2: Generating multivariate, multi-period trees
Step 1. Approximate the data in the first stage (e.g. with the Wasserstein distance).
[Figure: simulated paths over t = 0, ..., 3 with the first-stage values collapsed to a few nodes.]

Generating multivariate, multi-period trees
Step 2. For subsequent stages, use the paths conditional on the predecessor node.
[Figure: paths conditional on a first-stage node, t = 0, ..., 3.]

Generating multivariate, multi-period trees
Step 3. Iterate through the whole tree.
[Figure: the resulting scenario tree, t = 0, ..., 3.]

Mathematics behind
Let P_t(A_t | x^{(t-1)}) be the conditional probability of ξ_t ∈ A_t given the past ξ^{(t-1)} = (ξ_1, ..., ξ_{t-1}) = x^{(t-1)} = (x_1, ..., x_{t-1}). Let P̃ be another such probability, again dissected into its chain of conditional probabilities. We define
d(P, P̃) = d(P_1, P̃_1) + Σ_{t=2}^T sup_{x^{(t-1)}} d(P_t(· | x^{(t-1)}), P̃_t(· | x^{(t-1)})),
where d on the right is the Wasserstein distance. For an empirical measure (or any finitely supported measure), the conditional probability exists only for finitely many conditions; we define the distance as zero when the conditional distribution is not defined. d does not fulfill the triangle inequality. However, d reflects the information structure. Notice that the usual empirical measure does not converge w.r.t. d; only the tree empirical does.
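A toy computation of this stage-wise distance for two two-stage trees sharing the same first-stage atoms (all values and probabilities below are invented), using scipy's one-dimensional Wasserstein distance for each term:

```python
from scipy.stats import wasserstein_distance as wd

# Two toy two-stage trees on the same first-stage atoms {1.0, 2.0}:
# atom -> (children values, conditional children probabilities)
P  = {1.0: ([0.9, 1.1], [0.5, 0.5]), 2.0: ([1.8, 2.2], [0.4, 0.6])}
Pt = {1.0: ([0.8, 1.2], [0.5, 0.5]), 2.0: ([1.9, 2.1], [0.5, 0.5])}
atoms, p1, pt1 = [1.0, 2.0], [0.3, 0.7], [0.3, 0.7]   # first-stage laws

d_stage1 = wd(atoms, atoms, p1, pt1)   # distance of the first-stage laws (0 here)
# supremum over histories of the conditional Wasserstein distances
d_stage2 = max(wd(P[a][0], Pt[a][0], P[a][1], Pt[a][1]) for a in atoms)
d_total = d_stage1 + d_stage2
```

Because the first-stage atoms coincide, every conditional distance is well defined; with atoms that never match, the convention above would set those terms to zero, which is exactly why the ordinary empirical measure fails to converge in this distance.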

Empiricals and tree empiricals
[Figure: an ordinary empirical measure (scattered paths) versus a tree empirical (paths branching from common nodes).]

Tensor products of tree processes
Suppose that (ξ_t^(1)) and (ξ_t^(2)) are two tree processes. There is just one way of constructing the joint tree process (ξ_t^(1), ξ_t^(2)) such that the two components are independent. However, more often one needs to construct a joint process with a given correlation.

Joint distributions with given marginals and given correlation
A joint distribution Q with given marginals X = (x_i, p_{1i}), Y = (y_j, p_{2j}) and given correlation ρ, such that an additional (tree) reduction is obtained, can be calculated by solving the non-linear program
minimize Σ_{i=1}^m Σ_{j=1}^n η(q_{ij})
subject to Σ_j q_{ij} = p_{1i} for i = 1, ..., m
Σ_i q_{ij} = p_{2j} for j = 1, ..., n
Σ_i Σ_j q_{ij} x_i y_j = E(X) E(Y) + ρ √(Var(X) Var(Y)),
with η(x) = 0 if x = 0 and η(x) = M if x > 0.

An example
Two asset categories, first stage, 3 nodes each.
Stocks: x_i = 1.12, 1.03, 0.95 with p_i = 0.21, 0.52, 0.27.
Bonds: y_j = 1.054, 1.04, 1.031 with p_j = 0.32, 0.42, 0.26.
Joint distribution with ρ = 0.4 and reduction:
Bonds \ Stocks   1.12    1.03    0.95
1.054            0.142   0       0.058
1.04             0.112   0.409   0
1.031            0.067   0       0.202
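The counting objective η makes the program combinatorial, but a useful relaxation is to solve just the feasibility version as an LP: a basic (vertex) solution then automatically has few nonzero entries, which serves the sparsity goal. A sketch with the data above (scipy's linprog assumed; the returned coupling need not equal the table, which is only one of several valid sparse solutions):

```python
import numpy as np
from scipy.optimize import linprog

x, p1 = np.array([1.12, 1.03, 0.95]), np.array([0.21, 0.52, 0.27])    # stocks
y, p2 = np.array([1.054, 1.04, 1.031]), np.array([0.32, 0.42, 0.26])  # bonds
rho = 0.4

mx, vx = p1 @ x, p1 @ x**2 - (p1 @ x) ** 2
my, vy = p2 @ y, p2 @ y**2 - (p2 @ y) ** 2
target = mx * my + rho * np.sqrt(vx * vy)        # required value of E(XY)

m, n = len(x), len(y)
A_eq = np.zeros((m + n + 1, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0             # stock marginals
for j in range(n):
    A_eq[m + j, j::n] = 1.0                      # bond marginals
A_eq[-1] = np.outer(x, y).ravel()                # correlation constraint
b_eq = np.concatenate([p1, p2, [target]])
res = linprog(np.zeros(m * n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
q = res.x.reshape(m, n)                          # a sparse joint distribution
```

The solution reproduces both marginals exactly and attains the prescribed correlation ρ = 0.4.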

Tree reduction
Distance-based tree reduction (Dupačová, Gröwe-Kuska, Römisch; Heitsch)
EVPI-based tree reduction (Dempster, Thompson, Consigli)

Distance-based tree reduction
Rule: if two branches of the tree are closer in distance than some threshold, collapse the two.

EVPI-based tree reduction
EVPI = feasible minimum − clairvoyant's minimum.
Feasible minimum = min{ F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)] : x_0, x_1(ξ̃_1), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}) }.
Clairvoyant's minimum = min{ F[H(x_0, ξ̃_1, ..., x_{T-1}, ξ̃_T)] : x_0(ξ̃_1, ..., ξ̃_{T-1}), ..., x_{T-1}(ξ̃_1, ..., ξ̃_{T-1}) }.
EVPI may be defined for every subtree (given the optimal decisions in earlier stages). This EVPI process is a supermartingale. However, there is no relation between the EVPIs of the full (finer) model and a reduced (coarser) model.
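A toy illustration of the definition (a one-stage problem with invented data: two equally likely scenarios and cost |x − ξ|):

```python
import numpy as np

scenarios = np.array([1.0, 3.0])       # two equally likely outcomes of xi
probs = np.array([0.5, 0.5])

# Feasible minimum: one here-and-now decision x, chosen before xi is observed.
grid = np.linspace(0.0, 4.0, 401)
expected_cost = [probs @ np.abs(x - scenarios) for x in grid]
feasible_min = min(expected_cost)      # = 1.0 (attained by any x in [1, 3])

# Clairvoyant's minimum: x may depend on xi, so the cost is 0 in each scenario.
clairvoyant_min = probs @ np.abs(scenarios - scenarios)   # = 0.0

evpi = feasible_min - clairvoyant_min  # = 1.0
```

The gap of 1.0 is exactly the expected value of perfect information for this toy problem: what the decision maker would pay to observe ξ before deciding.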

EVPI
[Figure: a scenario tree annotated with node-wise EVPI values (3.2, 5.1, 2.4, 4, 0.5, 2.7, 0.4, 0.1, 0.8).]

Empirical convergence of central moments (1)
Data: weekly data of DJI and 10-year T-Bonds, 01/1993 - 01/1999.
Approximation: Wasserstein, Monte Carlo, Sobol sequence, moment matching.
[Figure: first and second central moment approximation (n = 2 : 35).]

Empirical convergence of central moments (2)
[Figure: third and fourth central moment approximation (n = 2 : 35).]
Moment matching (Kaut [2003]) started to match at n = 15 : 20.

Approximation errors: Single-stage portfolio selection
Mathematical models M. Risk functionals F:
CVaR_α: inf{ a + (1/(1−α)) E[Y − a]^+ : a ∈ R }
Mean Absolute Deviation (MAD): E| Σ_{i=1}^n x_i ξ_i − E(X) | for the portfolio return X = Σ_{i=1}^n x_i ξ_i
Approximation methods S:
Moment matching (Høyland et al. [2003])
Probability distance minimization (Wasserstein distance)
Quasi-random sampling (low discrepancy series: Sobol sequence)

Risk measures: Linear programming formulations
With scenario returns Y_s = x^T ξ_s and probabilities p_s:
Mean Absolute Deviation [Konno/Yamazaki 1991][Feinstein/Thapa 1993]:
maximize E(x, ξ) − Σ_{s=1}^S p_s z_s
subject to z_s ≥ |Y_s − Σ_{s'=1}^S p_{s'} Y_{s'}|, s = 1, ..., S
x ∈ X
Conditional Value at Risk [Rockafellar/Uryasev 1999][Uryasev 2000]:
maximize γ − (1/(1−α)) Σ_{s=1}^S p_s z_s
subject to z_s ≥ γ − Y_s, z_s ≥ 0, s = 1, ..., S
x^T e = 1, x ∈ X
Portfolio constraints X: Σ_{a=1}^A x_a ≤ 1; for all a: x_a ≤ 0.5; ...
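The CVaR formulation is directly solvable with any LP solver. A self-contained sketch (toy scenario returns invented for the example; scipy's linprog assumed) maximizing the CVaR of a two-asset portfolio under a budget constraint:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: 4 equally likely scenarios, 2 assets (one riskless, one risky).
xi = np.array([[1.0, 0.5],
               [1.0, 1.1],
               [1.0, 1.2],
               [1.0, 1.6]])            # scenario returns, shape (S, A)
p = np.full(4, 0.25)
alpha = 0.25
S, A = xi.shape

# Variables: [x (A), gamma (1), z (S)].
# Maximize gamma - (1/(1-alpha)) * p @ z, i.e. minimize the negated objective.
c = np.concatenate([np.zeros(A), [-1.0], p / (1 - alpha)])
# z_s >= gamma - x @ xi_s  <=>  -xi_s @ x + gamma - z_s <= 0
A_ub = np.hstack([-xi, np.ones((S, 1)), -np.eye(S)])
b_ub = np.zeros(S)
A_eq = np.concatenate([np.ones(A), [0.0], np.zeros(S)])[None, :]  # budget x'e = 1
bounds = [(0, None)] * A + [(None, None)] + [(0, None)] * S
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
x_opt, cvar_opt = res.x[:A], -res.fun
```

With this data the risky asset's left tail is too heavy for α = 0.25, so the optimum puts everything into the riskless asset and the optimal CVaR equals its sure return of 1.0.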

Empirical results
13 (somewhat randomly selected) assets: AOL, C, CSCO, DIS, EMC, GE, HPQ, MOT, NT, PFE, SUNW, WMT, XOM.
Weekly data from January 1993 to January 2003.
Rolling horizon: asset allocation, backtracking.
Optimize MAD and CVaR with the full data and with the approximations.

Empirical results: MAD
Example (asset allocation): data 01/1993 - 01/1995, n = 50.
[Figure: four asset-allocation pie charts over the 13 tickers: full dataset, Wasserstein, moment matching, Sobol sequence.]

Empirical results: CVaR_α
Example (asset allocation): data 01/1999 - 01/2001, α = 0.1, n = 50.
[Figure: four asset-allocation pie charts over the 13 tickers: full dataset, Wasserstein, moment matching, Sobol.]

Scenario Generation and Backtracking (1)
Expected Shortfall (CV@R), α = 0.4, n = 50.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, moment matching and crude Monte Carlo.]

Scenario Generation and Backtracking (2)
Mean Absolute Deviation (MAD), n = 50.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, moment matching and crude Monte Carlo.]

Scenario Generation and Backtracking (3)
Expected Shortfall (CV@R), α = 0.4, n = 20. Moment matching failed and was substituted with quasi-random sampling.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, quasi Monte Carlo and crude Monte Carlo.]

Scenario Generation and Backtracking (4)
Mean Absolute Deviation (MAD), n = 20. Moment matching failed and was substituted with quasi-random sampling.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, quasi Monte Carlo and crude Monte Carlo.]

Scenario Generation and Backtracking (5)
Expected Shortfall (CV@R), α = 0.4, n = 150.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, moment matching and crude Monte Carlo.]

Scenario Generation and Backtracking (6)
Mean Absolute Deviation (MAD), n = 150.
[Figure: wealth over 1996-2003 for the complete dataset, Wasserstein, moment matching and crude Monte Carlo.]