Optimal Transport in Risk Analysis

Jose Blanchet (based on work with Y. Kang and K. Murthy)
Stanford University (Management Science and Engineering) and Columbia University (Department of Statistics and Department of IEOR)

Goal
Present a comprehensive framework for decision making under model uncertainty...
This presentation is an invitation to read these two papers:
https://arxiv.org/abs/1604.01446
https://arxiv.org/abs/1610.05627

Example: Model Uncertainty in Ruin Probabilities
$R(t)$ = the reserve (perhaps multiple lines) at time $t$.
Ruin probability (in finite time horizon $T$):
$$u_T = P_{true}(R(t) \in B \text{ for some } t \in [0, T]).$$
$B$ is a set which models bankruptcy.

A Distributionally Robust Risk Analysis Formulation
Our solution: Estimate $u_T$ by solving
$$\sup_{D_c(P_0, P) \le \delta} P(R(t) \in B \text{ for some } t \in [0, T]),$$
where $P_0$ is a suitable model.
$P_0$ = proxy for $P_{true}$.
$P_0$: right trade-off between fidelity and tractability.

Desirable Elements of Distributionally Robust Formulation
Would like $D_c(\cdot)$ to have wide flexibility (even non-parametric).
Want optimization to be tractable.
Want to preserve advantages of using $P_0$.

Connections to Distributionally Robust Optimization
Standard choices based on divergence (such as Kullback-Leibler), see Hansen & Sargent (2016):
$$D(v \,\|\, \mu) = E_v\left(\log \frac{dv}{d\mu}\right).$$
Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009).
Big problem: Absolute continuity may typically be violated...

Elements of Optimal Transport Costs
Let $S_X$ and $S_Y$ be Polish spaces.
Let $\mathcal{B}_{S_X}$ and $\mathcal{B}_{S_Y}$ be the associated Borel σ-fields.
Let $c(\cdot) : S_X \times S_Y \to [0, \infty)$ be lower semicontinuous.
Let $\mu(\cdot)$ and $v(\cdot)$ be Borel probability measures defined on $S_X$ and $S_Y$.

Definition of Optimal Transport Costs
Define
$$D_c(\mu, v) = \min_{\pi} \{E_\pi(c(X, Y)) : \pi_X = \mu \text{ and } \pi_Y = v\}.$$
This is the so-called Kantorovich problem (see Villani (2008)).
If $c(\cdot)$ is a metric then $D_c(\mu, v)$ is a Wasserstein distance of order 1.
If $c(x, y) = 0$ if and only if $x = y$ then $D_c(\mu, v) = 0$ if and only if $\mu = v$.
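The Kantorovich problem can be made concrete on a tiny discrete example. The sketch below is an illustration only, not taken from the talk: it assumes both measures are uniform on the same number of atoms, in which case an optimal coupling can be chosen to be a permutation and brute force is exact. The function name `transport_cost` and the data are made up for the example.

```python
from itertools import permutations


def transport_cost(xs, ys, cost):
    """Optimal transport cost between two uniform empirical measures.

    Assumes n equally weighted atoms on each side, so that an optimal
    coupling can be taken to be a permutation. Brute force over all
    permutations is exact but only sensible for very small n.
    """
    n = len(xs)
    assert n == len(ys), "this sketch assumes equally many atoms on each side"
    return min(
        sum(cost(x, ys[j]) for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )


# Hypothetical atoms of mu and v on the real line.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.5, 4.0]

# With c(x, y) = |x - y| (a metric) this is the order-1 Wasserstein distance.
d_abs = transport_cost(xs, ys, lambda x, y: abs(x - y))

# c(x, y) = 0 iff x = y, so the cost of a measure to itself is zero.
d_self = transport_cost(xs, xs, lambda x, y: abs(x - y))
```

For larger problems one would normally solve the same minimization as a linear program rather than enumerate permutations.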

Illustration of Optimal Transport Costs
[Figure: illustration of optimal transport costs.]

Fundamental Theorem of Robust Performance Analysis
Theorem (B. and Murthy (2016)). Suppose that $c(\cdot)$ is lower semicontinuous and that $f(\cdot)$ is upper semicontinuous with $E_{P_0}|f(X)| < \infty$. Then,
$$\sup_{D_c(P_0, P) \le \delta} E_P(f(Y)) = \inf_{\lambda \ge 0} E_{P_0}\left[\lambda \delta + \sup_z \{f(z) - \lambda c(X, z)\}\right].$$
Moreover, $\pi^*$ and dual $\lambda^*$ are primal-dual solutions if and only if
$$f(y) - \lambda^* c(x, y) = \sup_z \{f(z) - \lambda^* c(x, z)\} \quad (x, y)\ \pi^*\text{-a.s.},$$
$$\lambda^* \left(E_{\pi^*}[c(X, Y)] - \delta\right) = 0.$$
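As a sanity check of the duality, the following sketch evaluates the dual objective on a grid for a toy case chosen here for illustration (it is not from the talk): $f(z) = z$, $c(x, z) = (x - z)^2$, and $P_0$ uniform on three made-up atoms. In this case the inner supremum equals $x + 1/(4\lambda)$, so the dual value is $E_{P_0}[X] + \sqrt{\delta}$, which is matched by the primal candidate that shifts every atom by $\sqrt{\delta}$.

```python
import math


def dual_objective(lam, xs, delta, f, cost, z_offsets):
    """lam * delta + E_{P0}[ sup_z { f(z) - lam * c(X, z) } ].

    The inner supremum is approximated on a grid of offsets around each atom.
    """
    inner = [
        max(f(x + dz) - lam * cost(x, x + dz) for dz in z_offsets)
        for x in xs
    ]
    return lam * delta + sum(inner) / len(xs)


xs = [0.0, 1.0, 3.0]  # atoms of the baseline model P0 (made-up data)
delta = 0.25


def f(z):
    return z


def cost(x, z):
    return (x - z) ** 2


# Grids for the inner sup over z and the outer inf over lambda (assumed ranges).
z_offsets = [k * 0.01 for k in range(-500, 501)]
lambdas = [0.2 + 0.05 * k for k in range(97)]
dual_value = min(
    dual_objective(lam, xs, delta, f, cost, z_offsets) for lam in lambdas
)

# Primal candidate: shift every atom by sqrt(delta); its transport cost is delta.
shift = math.sqrt(delta)
primal_value = sum(f(x + shift) for x in xs) / len(xs)
primal_cost = sum(cost(x, x + shift) for x in xs) / len(xs)
```

Here the complementary slackness condition holds with the whole budget spent: the candidate coupling has cost exactly $\delta$ and the minimizing $\lambda$ is strictly positive.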

Implications for Risk Analysis
Theorem (B. and Murthy (2016)). Suppose that $c(\cdot)$ is lower semicontinuous and that $B$ is a closed set. Let $c_B(x) = \inf\{c(x, y) : y \in B\}$, then
$$\sup_{D_c(P_0, P) \le \delta} P(Y \in B) = P_0(c_B(X) \le 1/\lambda^*),$$
where $\lambda^* \ge 0$ satisfies (under mild assumptions on $c_B(X)$)
$$\delta = E_0[c_B(X)\, I(c_B(X) \le 1/\lambda^*)].$$
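The threshold rule has a simple interpretation: move the mass that is cheapest to transport into $B$ until the budget $\delta$ is spent. The sketch below applies this idea to an empirical $P_0$. It is an assumption-laden illustration, not the method of the paper: because an empirical measure has atoms, the last atom may be split fractionally, which the continuous statement above does not need. The function name, the set $B = [b, \infty)$ and the data are hypothetical.

```python
def worst_case_probability(samples, cost_to_B, delta):
    """Greedy worst-case P(Y in B) for a uniform empirical baseline.

    Atoms are moved into B in increasing order of c_B(x); each atom carries
    mass 1/n, so moving it fully costs c_B(x) / n. The first atom that does
    not fit in the remaining budget is moved only partially.
    """
    n = len(samples)
    costs = sorted(cost_to_B(x) for x in samples)
    budget = delta
    prob = 0.0
    for c in costs:
        atom_cost = c / n
        if atom_cost <= budget:
            prob += 1.0 / n
            budget -= atom_cost
        else:
            prob += (budget / atom_cost) / n  # fractional move of one atom
            break
    return prob


# Hypothetical example: B = [b, infinity) on the real line, c(x, y) = |x - y|.
b = 3.5
samples = [0.0, 1.0, 2.0, 3.0, 4.0]


def cost_to_B(x):
    return max(b - x, 0.0)


nominal = sum(1 for x in samples if x >= b) / len(samples)
robust = worst_case_probability(samples, cost_to_B, delta=0.3)
```

In this example one atom is already in $B$, one is moved fully, and two thirds of a third atom is moved, so the robust probability exceeds the nominal one.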

Application 1: Back to Classical Risk Problem
Suppose that $c(x, y) = d_J(x(\cdot), y(\cdot))$ = Skorokhod $J_1$ metric:
$$d_J(x, y) = \inf_{\varphi(\cdot) \text{ bijection}} \max\left\{ \sup_{t \in [0,1]} |x(t) - y(\varphi(t))|,\ \sup_{t \in [0,1]} |\varphi(t) - t| \right\}.$$
If $R(t) = b - Z(t)$, then ruin during time interval $[0, 1]$ is
$$B_b = \{z(\cdot) : b \le \sup_{t \in [0,1]} z(t)\}.$$

Application 1: Computing Distance to Bankruptcy
So:
$$\{c_{B_b}(z) \le 1/\lambda^*\} = \left\{\sup_{t \in [0,1]} z(t) \ge b - 1/\lambda^*\right\},$$
and
$$\sup_{D_c(P_0, P) \le \delta} P(Z \in B_b) = P_0\left(\sup_{t \in [0,1]} Z(t) \ge b - 1/\lambda^*\right).$$
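The set identity rests on a short calculation of the distance to bankruptcy, which can be sketched as follows for $c = d_J$. This is a reconstruction of the reasoning step, stated under the assumption that a vertical shift of a path stays in the path space; the shifted path $y_\varepsilon$ is notation introduced here.

```latex
\begin{align*}
\text{Upper bound: } & y_\varepsilon(t) = z(t) + \varepsilon,\quad
  \varepsilon = \Bigl(b - \sup_{t \in [0,1]} z(t)\Bigr)^{+}
  \ \Rightarrow\ y_\varepsilon \in B_b,\quad
  d_J(z, y_\varepsilon) \le \varepsilon \quad (\text{take } \varphi(t) = t). \\
\text{Lower bound: } & y \in B_b \ \Rightarrow\
  \sup_{t} |z(t) - y(\varphi(t))|
  \ \ge\ \sup_{t} y(\varphi(t)) - \sup_{t} z(t)
  \ =\ \sup_{t} y(t) - \sup_{t} z(t)
  \ \ge\ b - \sup_{t} z(t). \\
\text{Hence: } & c_{B_b}(z) = \Bigl(b - \sup_{t \in [0,1]} z(t)\Bigr)^{+},
  \qquad
  \{c_{B_b}(z) \le 1/\lambda^*\}
  = \Bigl\{\sup_{t \in [0,1]} z(t) \ge b - 1/\lambda^*\Bigr\}.
\end{align*}
```

The lower bound uses only that $\varphi$ is a bijection of $[0, 1]$, so a time change cannot alter the supremum of $y$.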

Application 1: Computing Uncertainty Size
Note any coupling $\pi$ so that $\pi_X = P_0$ and $\pi_Y = P$ satisfies
$$D_c(P_0, P) \le E_\pi[c(X, Y)],$$
so $E_\pi[c(X, Y)]$ is a valid choice of $\delta$.
So use any coupling between evidence and $P_0$ or expert knowledge.
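A minimal sketch of this calibration idea, with made-up numbers: any pairing of model samples with observed data is a coupling of the two empirical measures, so its average cost is an upper bound on their transport cost and can serve as $\delta$. The helper name `delta_from_coupling`, the squared cost and the samples are assumptions for illustration; this is not the calibration algorithm used in the talk.

```python
def delta_from_coupling(pairs, cost):
    """Average cost E_pi[c(X, Y)] of a coupling given as a list of (x, y) pairs."""
    return sum(cost(x, y) for x, y in pairs) / len(pairs)


def squared(x, y):
    return (x - y) ** 2


# Hypothetical samples: draws from the model P0 and observed data.
model = [0.2, 1.1, 2.9, 4.0]
data = [3.1, 0.0, 4.6, 1.0]

# Pairing by index is one coupling; pairing by rank (sorted values) is another.
delta_index = delta_from_coupling(list(zip(model, data)), squared)
delta_sorted = delta_from_coupling(list(zip(sorted(model), sorted(data))), squared)
```

Both values are admissible uncertainty sizes; a better coupling gives a smaller $\delta$ and therefore a less conservative bound.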

Application 1: Illustration of Coupling
Given arrivals and claim sizes let
$$Z(t) = m_2^{-1/2} \sum_{k=1}^{N(t)} (X_k - m_1).$$

Application 1: Coupling in Action
[Figure: coupling in action.]

Application 1: Numerical Example
Assume Poisson arrivals.
Pareto claim sizes with index 2.2 ($P(V > t) = 1/(1 + t)^{2.2}$).
Cost $c(x, y) = d_J(x, y)^2$ (note the power of 2).
Used Algorithm 1 to calibrate (estimating means and variances from data).

b      P_0(Ruin) / P_true(Ruin)    P_robust(Ruin) / P_true(Ruin)
100    1.07 x 10^-1                12.28
150    2.52 x 10^-4                10.65
200    5.35 x 10^-8                10.80
250    1.15 x 10^-12               10.98
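The claim model in this example is easy to simulate by inverse transform, since $P(V > t) = (1 + t)^{-2.2}$ inverts in closed form. The sketch below only generates arrivals and claim sizes; the arrival rate, horizon and seed are placeholders chosen here, and the code does not reproduce the calibration or the table.

```python
import random

ALPHA = 2.2  # Pareto index from the example


def pareto_survival(t, alpha=ALPHA):
    """P(V > t) = (1 + t)^(-alpha) for t >= 0."""
    return (1.0 + t) ** (-alpha)


def pareto_quantile(u, alpha=ALPHA):
    """Inverse of the CDF 1 - (1 + t)^(-alpha), for u in [0, 1)."""
    return (1.0 - u) ** (-1.0 / alpha) - 1.0


def simulate_claims(rate, horizon, rng):
    """Poisson arrival times on [0, horizon] with i.i.d. Pareto claim sizes."""
    times, sizes = [], []
    t = rng.expovariate(rate)  # exponential inter-arrival times
    while t <= horizon:
        times.append(t)
        sizes.append(pareto_quantile(rng.random()))
        t += rng.expovariate(rate)
    return times, sizes


rng = random.Random(0)  # fixed seed, for reproducibility only
times, sizes = simulate_claims(rate=50.0, horizon=1.0, rng=rng)
```

With index 2.2 the claims have finite variance but a heavy tail, which is why a light-tailed baseline can underestimate ruin probabilities by many orders of magnitude.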

Additional Applications: Multidimensional Ruin Problems
Paper: Quantifying Distributional Model Risk via Optimal Transport (B. & Murthy 2016), https://arxiv.org/abs/1604.01446, contains more applications:
Multidimensional risk processes (explicit evaluation of $c_B(x)$ for $d_J$ metric).
Control: $\min_\theta \sup_{P : D(P, P_0) \le \delta} E[L(\theta, Z)]$ (robust optimal reinsurance).

Connections to Machine Learning
Connection to machine learning helps further understand why optimal transport costs are sensible choices...
Paper: Robust Wasserstein Profile Inference and Applications to Machine Learning (B., Murthy & Kang 2016), https://arxiv.org/abs/1610.05627

Robust Performance Analysis in Machine Learning
Consider estimating $\beta \in R^m$ in linear regression
$$Y_i = \beta^T X_i + e_i,$$
where $\{(Y_i, X_i)\}_{i=1}^n$ are data points.
Optimal Least Squares approach consists in estimating $\beta$ via
$$\min_\beta MSE(\beta) = \min_\beta \frac{1}{n} \sum_{i=1}^n \left(Y_i - \beta^T X_i\right)^2.$$

Connection to Sqrt-Lasso
Theorem (B., Kang, Murthy (2016)). Suppose that
$$c((x, y), (x', y')) = \begin{cases} \|x - x'\|_q^2 & \text{if } y = y' \\ \infty & \text{if } y \ne y'. \end{cases}$$
Then, if $1/p + 1/q = 1$,
$$\max_{P : D_c(P, P_n) \le \delta} E_P^{1/2}\left[\left(Y - \beta^T X\right)^2\right] = \sqrt{MSE(\beta)} + \sqrt{\delta}\, \|\beta\|_p.$$
Remark 1: This is sqrt-lasso (Belloni et al. (2011)).
Remark 2: Also representations for support vector machines, LAD lasso, group lasso, adaptive lasso, and more!
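For $p = q = 2$ the right-hand side can be checked by hand: shifting each $X_i$ against $\beta$ by an amount proportional to its residual spends exactly the budget $\delta$ and scales every residual by the same factor. The sketch below verifies that this particular perturbation attains $\sqrt{MSE(\beta)} + \sqrt{\delta}\,\|\beta\|_2$. The data, $\beta$ and $\delta$ are made up, and the construction is an illustration for the Euclidean case only, not the proof in the paper.

```python
import math


def residuals(xs, ys, beta):
    """Residuals y_i - beta^T x_i."""
    return [y - sum(b * xj for b, xj in zip(beta, x)) for x, y in zip(xs, ys)]


def root_mse(xs, ys, beta):
    r = residuals(xs, ys, beta)
    return math.sqrt(sum(ri * ri for ri in r) / len(r))


# Hypothetical data set and coefficient vector.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 2.0]
beta = [0.5, 0.5]
delta = 0.04

beta_norm = math.sqrt(sum(b * b for b in beta))  # ||beta||_2 (p = 2)
bound = root_mse(xs, ys, beta) + math.sqrt(delta) * beta_norm

# Shift x_i by -t_i * beta / ||beta||_2 with t_i proportional to the residual.
r = residuals(xs, ys, beta)
scale = math.sqrt(delta) / root_mse(xs, ys, beta)
shifted_xs = [
    [xj - scale * ri * b / beta_norm for xj, b in zip(x, beta)]
    for x, ri in zip(xs, r)
]

# Transport cost of the shift under c = ||x - x'||_2^2 (responses unchanged).
transport = sum(
    sum((xj - sj) ** 2 for xj, sj in zip(x, s)) for x, s in zip(xs, shifted_xs)
) / len(xs)
attained = root_mse(shifted_xs, ys, beta)
```

The responses are left untouched, which matches the infinite cost assigned to changing $y$.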

For Instance... Regularized Estimators
Theorem (B., Kang, Murthy (2016)). Suppose that
$$c((x, y), (x', y')) = \begin{cases} \|x - x'\|_q & \text{if } y = y' \\ \infty & \text{if } y \ne y'. \end{cases}$$
Then,
$$\sup_{P : D_c(P, P_n) \le \delta} E_P\left[\log\left(1 + e^{-Y \beta^T X}\right)\right] = \frac{1}{n} \sum_{i=1}^n \log\left(1 + e^{-Y_i \beta^T X_i}\right) + \delta \|\beta\|_p.$$
Remark 1: This is regularized logistic regression (see also Esfahani and Kuhn 2015).
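One direction of this identity is easy to probe numerically: because $u \mapsto \log(1 + e^u)$ is 1-Lipschitz, any perturbation of the covariates with average cost at most $\delta$ raises the logistic loss by at most $\delta \|\beta\|_p$. The sketch below checks this for $p = 1$, $q = \infty$ with an adversarial sign shift. All data and parameter values are invented for illustration, and the supremum itself is not computed.

```python
import math


def log_loss(xs, ys, beta):
    """Average logistic loss log(1 + exp(-y * beta^T x)) with labels in {-1, +1}."""
    return sum(
        math.log1p(math.exp(-y * sum(b * xj for b, xj in zip(beta, x))))
        for x, y in zip(xs, ys)
    ) / len(xs)


def sign(v):
    return (v > 0) - (v < 0)


# Hypothetical data, coefficients and budget.
xs = [[1.0, -0.5], [0.3, 2.0], [-1.2, 0.4]]
ys = [1, -1, 1]
beta = [0.8, -0.6]
delta = 0.1

empirical = log_loss(xs, ys, beta)
bound = empirical + delta * sum(abs(b) for b in beta)  # p = 1 penalty

# Shift every covariate by delta in sup-norm (q = infinity) against the margin.
shifted_xs = [
    [xj - y * delta * sign(b) for xj, b in zip(x, beta)]
    for x, y in zip(xs, ys)
]
perturbed = log_loss(shifted_xs, ys, beta)
```

Each margin drops by exactly $\delta \|\beta\|_1$, so the perturbed loss lies strictly between the empirical loss and the regularized objective.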

Connections to Empirical Likelihood
Paper: https://arxiv.org/abs/1610.05627
The paper also chooses $\delta$ optimally, introducing an extension of Empirical Likelihood called "Robust Wasserstein Profile Inference".

The Robust Wasserstein Profile Function
Pick $\delta$ = 95% quantile of $R_n(\beta^*)$ and we show that
$$n R_n(\beta^*) \lesssim_d \frac{E[e^2]}{E[e^2] - (E|e|)^2}\, \|N(0, Cov(X))\|_q^2.$$

Conclusions
Presented systematic approach for quantifying model misspecification.
Approach based on optimal transport:
$$\sup_{D(P, P_0) \le \delta} P(X \in B) = P_0(c_B(X) \le 1/\lambda^*).$$
Closed-form solutions, tractable in terms of $P_0$, feasible calibration of $\delta$.
New statistical estimators, connections to machine learning & regularization.