Superhedging and distributionally robust optimization with neural networks

1 Superhedging and distributionally robust optimization with neural networks
Stephan Eckstein, joint work with Michael Kupper and Mathias Pohl
Robust Techniques in Quantitative Finance, University of Oxford, September 5, 2018

2 Robust optimization problems
Setup: a set of probability measures Q on R^d and a continuous and bounded function f : R^d → R.
Objective: solve (P) := sup_{ν ∈ Q} ∫ f dν.
Different choices of Q lead to:
Constrained optimal transport (see e.g. Ekren and Soner (2018); includes martingale optimal transport)
Distributionally robust optimization (see e.g. Esfahani and Kuhn (2015), Blanchet and Murthy (2016), Bartl et al. (2017), Obłój and Wiesel (2018))
Mixtures of the above (Gao and Kleywegt (2017))

6 Example: Martingale optimal transport (MOT)
A stock S_t for times t = 1, 2 has known marginals µ_1, µ_2 ∈ P(R).
Q = { ν ∈ P(R²) : ν_1 = µ_1, ν_2 = µ_2, and if (S_1, S_2) ∼ ν, then E[S_2 | S_1] = S_1 }
The function f can model an exotic option, e.g. f(s_1, s_2) = (s_2 - K s_1)^+.
The elements of Q are all potential martingale couplings of µ_1 and µ_2.

7 Numerical approach: Discretization
Idea: Reduce problem (P) to a finite-dimensional problem (P_n) by going over to a discrete space.
In the prior example for instance: approximate the marginals,
µ_1 ≈ µ_1^n = Σ_{i=1}^n α_i δ_{x_i},   µ_2 ≈ µ_2^n = Σ_{i=1}^n β_i δ_{y_i},
and replace Q by
Q_n = { ν ∈ P(R²) : ν_1 = µ_1^n, ν_2 = µ_2^n, and if (S_1, S_2) ∼ ν, then E[S_2 | S_1] = S_1 }.
Solve (P_n) = sup_{ν ∈ Q_n} ∫ f dν instead of (P).

8 Numerical approach: Discretization
(Figure) In the prior example: measures in Q_n only put mass on the intersections of an n × n grid.

9 Numerical approach: Discretization
Discretizing (P) and solving the discrete versions of (P) have been studied and applied successfully in many works:
MOT: Alfonsi et al. (2017), Guo and Obłój (2018)
OT: see e.g. Peyré and Cuturi (2018)
Distributionally robust optimization: Esfahani and Kuhn (2015)
Difficulty: Discretization scales badly with dimension. In the prior example, (P_n) has n² parameters. In a MOT problem with T time steps and K-dimensional assets, (P_n) has n^{TK} parameters!
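
The quadratic blow-up can be made concrete with a minimal sketch (not from the slides) of the discretized problem (P_n) as a linear program: the variable is the n × n coupling matrix, so the LP already has n² unknowns, and the marginal and (relaxed) martingale conditions become linear constraints. The function name and the relaxation parameter are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discrete_mot(x, y, alpha, beta, f, eps=1e-6):
    """Upper MOT bound sup_{nu in Q_n} int f dnu for atomic marginals.

    x, alpha : support points and weights of mu_1^n
    y, beta  : support points and weights of mu_2^n
    f        : payoff f(s1, s2), vectorized over numpy arrays
    eps      : relaxation of the martingale constraint
    """
    n, m = len(x), len(y)
    F = f(x[:, None], y[None, :])             # n x m payoff matrix
    c = -F.reshape(-1)                        # linprog minimizes, so negate

    A_eq, b_eq = [], []
    # first marginal: sum_j nu_ij = alpha_i
    for i in range(n):
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(alpha[i])
    # second marginal: sum_i nu_ij = beta_j
    for j in range(m):
        row = np.zeros(n * m)
        row[j::m] = 1.0
        A_eq.append(row); b_eq.append(beta[j])

    A_ub, b_ub = [], []
    # relaxed martingale constraint: |sum_j nu_ij (y_j - x_i)| <= eps * alpha_i
    for i in range(n):
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = y - x[i]
        A_ub.append(row);  b_ub.append(eps * alpha[i])
        A_ub.append(-row); b_ub.append(eps * alpha[i])

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return -res.fun, res.x.reshape(n, m)      # value and optimal coupling
```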

11 Alternative: Parametrization
Idea: Work with {ν_λ : λ ∈ Λ} ⊆ Q, where Λ is a finite-dimensional parameter space that one can scale independently of the dimension.
Problem: No expressive sets {ν_λ : λ ∈ Λ} are available that are also numerically feasible to work with.

12 Using the dual formulation
We consider problems (P) = sup_{ν ∈ Q} ∫ f dν which allow for a dual formulation of the form
(D) = inf_{h ∈ H: h ≥ f} ϕ(h),
where H ⊆ C(R^d) and ϕ : H → R is a linear functional.
Example: For the MOT problem from before,
H = { h(x, y) = h_1(x) + h_2(y) + h_3(x)(y - x) : h_i ∈ C_b(R) },
ϕ(h) = ∫_R h_1 dµ_1 + ∫_R h_2 dµ_2.
Parametrizing H is simpler than parametrizing Q! (See also Henry-Labordère (2013).)

14 Why use neural networks?
(Feed-forward) neural networks are parametrized functions built by concatenating several layers of simple functions:
R^k ∋ x ↦ A_l ∘ σ ∘ A_{l-1} ∘ ... ∘ σ ∘ A_0 (x),
where the A_i are affine transformations and σ : R → R is a non-linear activation function applied element-wise.
Neural networks can approximate a large class of functions with a high, but numerically feasible, number of parameters.
They are successful in practice, even though the theoretical understanding of the numerical schemes is lacking.
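
As a minimal illustration (not part of the slides), a PyTorch sketch of such a feed-forward network x ↦ A_l ∘ σ ∘ ... ∘ σ ∘ A_0(x); the depth, width and ReLU activation are arbitrary placeholder choices.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """x -> A_l ( sigma ( A_{l-1} ( ... sigma ( A_0 x ) ... ) ) )"""

    def __init__(self, dim_in, dim_out, hidden=64, n_hidden_layers=3,
                 activation=nn.ReLU):
        super().__init__()
        layers, d = [], dim_in
        for _ in range(n_hidden_layers):
            layers += [nn.Linear(d, hidden), activation()]   # affine A_i, then sigma
            d = hidden
        layers.append(nn.Linear(d, dim_out))                  # final affine A_l, no sigma
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# example: a candidate hedging function h_1 : R -> R
h = FeedForward(dim_in=1, dim_out=1)
print(h(torch.randn(5, 1)).shape)   # torch.Size([5, 1])
```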

15 Solution approach using neural networks
(D) = inf_{h ∈ H: h ≥ f} ϕ(h)
Step 1: Replace H by a set H^m of neural network functions. Trading strategies are thereby restricted to feed-forward neural networks. The resulting finite-dimensional problem is
(D^m) = inf_{h ∈ H^m: h ≥ f} ϕ(h).
Problem: The constraint h ≥ f prevents the application of numerical schemes based on gradient descent.

16 Solution approach using neural networks
(D^m) = inf_{h ∈ H^m: h ≥ f} ϕ(h)
Step 2: Penalize the inequality constraint h ≥ f:
(D^m_θ) = inf_{h ∈ H^m} ϕ(h) + ∫ max{f - h, 0} dθ.
If θ gives positive mass to every open ball, then (D^m_θ) = (D^m).
Implementation problem: The mapping x ↦ max{0, x} has no useful gradients.

17 Solution approach using neural networks
(D^m_θ) = inf_{h ∈ H^m} ϕ(h) + ∫ max{f - h, 0} dθ
Step 3: Approximate x ↦ max{x, 0} by a sequence of differentiable, nondecreasing, convex functions (β_γ)_{γ>0}, e.g. β_γ(x) = γ max{0, x}²:
(D^m_{θ,γ}) = inf_{h ∈ H^m} ϕ(h) + ∫ β_γ(f - h) dθ.
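
A rough sketch of how (D^m_{θ,γ}) can be minimized in practice for the MOT example: ϕ(h) and the penalty integral are replaced by Monte Carlo averages over samples from µ_1, µ_2 and θ = µ_1 ⊗ µ_2, and the resulting objective is minimized by stochastic gradient descent. This assumes the quadratic penalty β_γ(x) = γ max{0, x}², the payoff f(s_1, s_2) = (s_2 - s_1)^+ from the numerics below, and the FeedForward class sketched above; the marginal samplers here are simple stand-ins (the slides use mixtures of normals).

```python
import torch

gamma = 1000.0
beta_gamma = lambda x: gamma * torch.clamp(x, min=0.0) ** 2   # penalty beta_gamma

def payoff(x, y):                    # f(s1, s2) = (s2 - s1)^+
    return torch.clamp(y - x, min=0.0)

def sample_mu1(n):                   # stand-in samplers, in convex order
    return torch.randn(n, 1)

def sample_mu2(n):
    return 1.5 * torch.randn(n, 1)

# h(x, y) = h1(x) + h2(y) + h3(x) * (y - x), each h_i a small network
h1, h2, h3 = FeedForward(1, 1), FeedForward(1, 1), FeedForward(1, 1)
params = list(h1.parameters()) + list(h2.parameters()) + list(h3.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

for step in range(20000):
    x, y = sample_mu1(512), sample_mu2(512)
    phi_h = h1(x).mean() + h2(y).mean()              # phi(h) = int h1 dmu1 + int h2 dmu2
    h_xy = h1(x) + h2(y) + h3(x) * (y - x)           # h evaluated under theta = mu1 x mu2
    penalty = beta_gamma(payoff(x, y) - h_xy).mean() # int beta_gamma(f - h) dtheta
    loss = phi_h + penalty                           # objective of (D^m_{theta,gamma})
    opt.zero_grad()
    loss.backward()
    opt.step()
```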

18 Solution approach: Overview

Label        | Statement                               | Description
(D)          | inf_{h ∈ H: h ≥ f} ϕ(h)                 | initial problem
(D^m)        | inf_{h ∈ H^m: h ≥ f} ϕ(h)               | finite-dimensional version of (D)
(D^m_θ)      | inf_{h ∈ H^m} ϕ(h) + ∫ (f - h)^+ dθ     | dominated version of (D^m)
(D^m_{θ,γ})  | inf_{h ∈ H^m} ϕ(h) + ∫ β_γ(f - h) dθ    | penalized version of (D^m_θ)

Table: Summary of problems occurring in the approach.

19 Solution approach: Theoretical results
Under mild assumptions on H, H^m and θ it holds that
(D^m) → (D) for m → ∞,   (1)
(D^m_{θ,γ}) → (D^m) for γ → ∞.   (2)
Related results: To (1): e.g. Bühler et al. (2018). To (2): e.g. Cominetti and San Martín (1994), Cuturi (2013), and many others related to the Sinkhorn distance, Bregman projections and the Schrödinger problem.

21 Numerics: MOT example
MOT as in the introduction, with f(s_1, s_2) = (s_2 - s_1)^+ and marginals µ_1, µ_2 given by mixtures of normals. Two solvers are compared on the superhedging price and the subhedging price (inf over Q instead):
Neural network: NN with 4 layers and hidden dimension 64, θ = µ_1 ⊗ µ_2, β_γ(x) = 1000 max{0, x}²; yields an approximate dual optimizer ĥ.
Discretization: n = 600 (relaxed martingale constraint: ε = 10^-6); yields an approximate coupling ν̂.
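
The slide does not specify the mixture parameters; the following sketch shows one hypothetical way to sample from mixture-of-normals marginals and from the reference measure θ = µ_1 ⊗ µ_2 used in the training loop above.

```python
import torch

def sample_mixture(n, means, stds, weights):
    """Draw n samples from a 1-d mixture of normals, returned with shape (n, 1)."""
    comp = torch.multinomial(torch.tensor(weights), n, replacement=True)
    mu = torch.tensor(means)[comp]
    sd = torch.tensor(stds)[comp]
    return (mu + sd * torch.randn(n)).unsqueeze(1)

# hypothetical mixtures; the second marginal is a mean-preserving spread of the first
sample_mu1 = lambda n: sample_mixture(n, [-1.0, 1.0], [0.3, 0.3], [0.5, 0.5])
sample_mu2 = lambda n: sample_mixture(n, [-1.0, 1.0], [0.6, 0.6], [0.5, 0.5])

# reference measure theta = mu_1 x mu_2: pair the two samples independently
def sample_theta(n):
    return sample_mu1(n), sample_mu2(n)
```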

22 MOT: Dual optimizer: superhedging (figure)

23 MOT: Dual optimizer: subhedging (figure)

24 Back to the primal
Goal: Use the neural network solution of the dual to obtain a (near) optimal measure for the primal!
The problem
(D_{θ,γ}) = inf_{h ∈ H} ϕ(h) + ∫ β_γ(f - h) dθ
has a primal formulation
(P_{θ,γ}) = sup_{ν ∈ Q} ∫ f dν - ∫ β_γ*(dν/dθ) dθ,
where β_γ* is the convex conjugate of β_γ. If (D_{θ,γ}) has an optimizer ĥ, one can often show that the measure ν̂ given by
dν̂/dθ = β'_γ(f - ĥ)
is an optimizer of (P_{θ,γ}).
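
Continuing the training sketch above (and assuming the quadratic penalty β_γ(x) = γ max{0, x}², whose derivative is β'_γ(x) = 2γ max{0, x}), the relation dν̂/dθ = β'_γ(f - ĥ) can be used directly: evaluate the density on samples from θ and, for instance, resample with those weights to visualize an approximately optimal coupling. All names are the placeholders introduced earlier.

```python
import torch

def primal_density(x, y):
    """dnu/dtheta evaluated at (x, y): beta_gamma'(f - h_hat)."""
    with torch.no_grad():
        h_xy = h1(x) + h2(y) + h3(x) * (y - x)
        return 2.0 * gamma * torch.clamp(payoff(x, y) - h_xy, min=0.0)

# draw a large sample from theta and resample proportionally to the density
x, y = sample_mu1(100000), sample_mu2(100000)
w = primal_density(x, y).squeeze(1)
idx = torch.multinomial(w / w.sum(), 5000, replacement=True)
coupling_sample = torch.cat([x[idx], y[idx]], dim=1)   # approximate draws from nu_hat
```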

27 MOT: Primal optimizer
Approximately optimal coupling: superhedging (figure)

28 MOT: Primal optimizer
Approximately optimal coupling: subhedging (figure)

30 Risk aggregation (DNB case study, Aas and Puccetti (2014))
A bank is exposed to 6 types of risks X_1, ..., X_6 with known marginal exposures. An expert opinion µ̄ ∈ P(R⁶) about the joint distribution of (X_1, ..., X_6) is given.
Goal: Calculate bounds on AVaR_α(Σ_{i=1}^6 X_i) under the constraints that
(X_1, ..., X_6) have marginals µ_1, ..., µ_6, and
the joint distribution ν of (X_1, ..., X_6) lies in a Wasserstein ball around µ̄ of a given radius ρ: W_p(µ̄, ν) ≤ ρ.
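
For reference, AVaR_α (average value at risk) is the mean of the worst (1 - α) fraction of outcomes. A short Monte Carlo sketch of this objective, with a placeholder joint model for the risks:

```python
import numpy as np

def avar(losses, alpha=0.95):
    """Average value at risk: mean of the worst (1 - alpha) tail of the losses."""
    losses = np.sort(np.asarray(losses))
    k = int(np.ceil((1.0 - alpha) * len(losses)))
    return losses[-k:].mean()

# example with a placeholder joint model for (X_1, ..., X_6)
samples = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.5, size=(100000, 6))
print(avar(samples.sum(axis=1), alpha=0.95))
```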

32 Bank risk aggregation (DNB case study)
Using only marginal information: Average Value at Risk for α = 0.95 under the reference model (4165), together with the best-case and worst-case values over all joint distributions with the given marginals.

33 Bank risk aggregation (DNB case study)
Worst case around the reference model: Average Value at Risk for α = 0.95 under the reference model (4165), and the worst case over a Wasserstein ball around µ̄ of radius ρ, reported for ρ = 1/6 and ρ = 1, lying between the best case and the worst case obtained from marginal information only.

34 Thank you

35 References
Aas, Kjersti and Puccetti, Giovanni. Bounds on total economic capital: the DNB case study. Springer, 17(4), 2014.
Alfonsi, Aurélien, Corbetta, Jacopo and Jourdain, Benjamin. Sampling of probability measures in the convex order and approximation of Martingale Optimal Transport problems. arXiv preprint, 2017.
Bartl, Daniel, Drapeau, Samuel and Tangpi, Ludovic. Computational aspects of robust optimized certainty equivalents. arXiv preprint, 2017.
Blanchet, Jose and Murthy, Karthyek R. A. Quantifying distributional model risk via optimal transport. arXiv preprint, 2016.
Buehler, H., Gonon, L., Teichmann, J. and Wood, B. Deep hedging. arXiv preprint, 2018.

36 Cominetti, R. and San Martín, J. Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming, 67(1-3), 1994.
Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, 2013.
Eckstein, Stephan and Kupper, Michael. Computation of optimal transport and related hedging problems via penalization and neural networks. arXiv preprint, 2018.
Ekren, I. and Soner, H. M. Constrained optimal transport. Archive for Rational Mechanics and Analysis, 2017.
Esfahani, Peyman Mohajerin and Kuhn, Daniel. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming, 2017.

37 Gao, R. and Kleywegt, A. Data-driven robust optimization with known marginal distributions. Working paper, 2017.
Henry-Labordère, Pierre. Automated option pricing: numerical methods. International Journal of Theoretical and Applied Finance, 2013.
Guo, Gaoyue and Obłój, Jan. Computational methods for Martingale Optimal Transport problems. arXiv preprint, 2017.
Obłój, Jan and Wiesel, Johannes. Statistical estimation of superhedging prices. arXiv preprint, 2018.
Peyré, Gabriel and Cuturi, Marco. Computational optimal transport. arXiv preprint, 2018.
