Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and Department of IEOR). Blanchet (Columbia U. and Stanford U.) 1 / 25
Goal: Goal: Present a comprehensive framework for decision making under model uncertainty... This presentation is an invitation to read these two papers: https://arxiv.org/abs/1604.01446 https://arxiv.org/abs/1610.05627 Blanchet (Columbia U. and Stanford U.) 2 / 25
Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Blanchet (Columbia U. and Stanford U.) 3 / 25
Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). Blanchet (Columbia U. and Stanford U.) 3 / 25
Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). B is a set which models bankruptcy. Blanchet (Columbia U. and Stanford U.) 3 / 25
Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). B is a set which models bankruptcy. Problem: Model (P true ) may be complex, intractable or simply unknown... Blanchet (Columbia U. and Stanford U.) 3 / 25
A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. Blanchet (Columbia U. and Stanford U.) 4 / 25
A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. Blanchet (Columbia U. and Stanford U.) 4 / 25
A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. Blanchet (Columbia U. and Stanford U.) 4 / 25
A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. δ is the distributional uncertainty size. Blanchet (Columbia U. and Stanford U.) 4 / 25
A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. δ is the distributional uncertainty size. D c ( ) is the distributional uncertainty region. Blanchet (Columbia U. and Stanford U.) 4 / 25
Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Blanchet (Columbia U. and Stanford U.) 5 / 25
Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Blanchet (Columbia U. and Stanford U.) 5 / 25
Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Want to preserve advantages of using P 0. Blanchet (Columbia U. and Stanford U.) 5 / 25
Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Want to preserve advantages of using P 0. Want a way to estimate δ. Blanchet (Columbia U. and Stanford U.) 5 / 25
Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Blanchet (Columbia U. and Stanford U.) 6 / 25
Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Blanchet (Columbia U. and Stanford U.) 6 / 25
Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Blanchet (Columbia U. and Stanford U.) 6 / 25
Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Think of using Brownian motion as a proxy model for R (t)... Blanchet (Columbia U. and Stanford U.) 6 / 25
Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Think of using Brownian motion as a proxy model for R (t)... We advocate using optimal transport costs (e.g. Wasserstein distance). Blanchet (Columbia U. and Stanford U.) 6 / 25
Elements of Optimal Transport Costs S X and S Y be Polish spaces. Blanchet (Columbia U. and Stanford U.) 7 / 25
Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. Blanchet (Columbia U. and Stanford U.) 7 / 25
Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. Blanchet (Columbia U. and Stanford U.) 7 / 25
Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. µ ( ) and v ( ) Borel probability measures defined on S X and S Y. Blanchet (Columbia U. and Stanford U.) 7 / 25
Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. µ ( ) and v ( ) Borel probability measures defined on S X and S Y. Given π a Borel prob. measure on S X S Y, π X (A) = π (A S Y ) and π Y (C ) = π (S X C ). Blanchet (Columbia U. and Stanford U.) 7 / 25
Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π Blanchet (Columbia U. and Stanford U.) 8 / 25
Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). Blanchet (Columbia U. and Stanford U.) 8 / 25
Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. Blanchet (Columbia U. and Stanford U.) 8 / 25
Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. If c (x, y) = 0 if and only if x = y then D c (µ, v) = 0 if and only if µ = v. Blanchet (Columbia U. and Stanford U.) 8 / 25
Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. If c (x, y) = 0 if and only if x = y then D c (µ, v) = 0 if and only if µ = v. Kantorovich s problem is a "nice" infinite dimensional linear programming problem. Blanchet (Columbia U. and Stanford U.) 8 / 25
Illustration of Optimal Transport Costs Blanchet (Columbia U. and Stanford U.) 9 / 25
Fundamental Theorem of Robust Performance Analysis Theorem (B. and Murthy (2016)) Suppose that c ( ) is lower semicontinuous and that h ( ) is upper semicontinuous with E P0 f (X ) <. Then, sup E P (f (Y )) = inf E P 0 [λδ + sup{f (z) λc (X, z)}]. D c (P 0,P ) δ λ 0 z Moreover, (π ) and dual λ are primal-dual solutions if and only if f (y) λ c (x, y) = sup{f (z) λ c (x, z)} (x, y) -π a.s. z λ (E π [c (X, Y ) δ]) = 0. Blanchet (Columbia U. and Stanford U.) 10 / 25
Implications for Risk Analysis Theorem (B. and Murthy (2016)) Suppose that c ( ) is lower semicontinuous and that B is a closed set. Let c B (x) = inf{c (x, y) : y B}, then sup P (Y B) = P 0 (c B (X ) 1/λ ), D c (P 0,P ) δ where λ 0 satisfies (under mild assumptions on c B (X )) δ = E 0 [c B (X ) I (c B (X ) 1/λ )]. Blanchet (Columbia U. and Stanford U.) 11 / 25
Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] Blanchet (Columbia U. and Stanford U.) 12 / 25
Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] If R (t) = b Z (t), then ruin during time interval [0, 1] is B b = {z ( ) : b sup z (t)}. t [0,1] Blanchet (Columbia U. and Stanford U.) 12 / 25
Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] If R (t) = b Z (t), then ruin during time interval [0, 1] is B b = {z ( ) : b sup z (t)}. t [0,1] Let P 0 ( ) be the Wiener measure want to compute sup P (Z B b ). D c (P 0,P ) δ Blanchet (Columbia U. and Stanford U.) 12 / 25
Application 1: Computing Distance to Bankruptcy So: {c Bb (z) 1/λ } = {sup t [0,1] z (t) b 1/λ }, and ( ) sup P (Z B b ) = P 0 sup Z (t) b 1/λ. D c (P 0,P ) δ t [0,1] Blanchet (Columbia U. and Stanford U.) 13 / 25
Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. Blanchet (Columbia U. and Stanford U.) 14 / 25
Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. So use any coupling between evidence and P 0 or expert knowledge. Blanchet (Columbia U. and Stanford U.) 14 / 25
Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. So use any coupling between evidence and P 0 or expert knowledge. We discuss choosing δ non-parametrically in a moment Blanchet (Columbia U. and Stanford U.) 14 / 25
Application 1: Illustration of Coupling Given arrivals and claim sizes let Z (t) = m 1/2 2 N (t) k=1 (X k m 1 ) Blanchet (Columbia U. and Stanford U.) 15 / 25
Application 1: Coupling in Action Blanchet (Columbia U. and Stanford U.) 16 / 25
Application 1: Numerical Example Assume Poisson arrivals. Pareto claim sizes with index 2.2 (P (V > t) = 1/(1 + t) 2.2 ). Cost c (x, y) = d J (x, y) 2 < note power of 2. Used Algorithm 1 to calibrate (estimating means and variances from data). b P 0 (Ruin) P true (Ruin) P robust (Ruin) P true (Ruin) 100 1.07 10 1 12.28 150 2.52 10 4 10.65 200 5.35 10 8 10.80 250 1.15 10 12 10.98 Blanchet (Columbia U. and Stanford U.) 17 / 25
Additional Applications: Multidimensional Ruin Problems Paper: Quantifying Distributional Model Risk via Optimal Transport (B. & Murthy 16) https://arxiv.org/abs/1604.01446 contains more applications Multidimensional risk processes (explicit evaluation of c B (x) for d J metric). Control: min θ sup P :D (P,P0 ) δ E [L (θ, Z )] < robust optimal reinsurance. Blanchet (Columbia U. and Stanford U.) 18 / 25
Connections to Machine Learning Connection to machine learning helps further understand why optimal transport costs are sensible choices... Paper: Robust Wasserstein Profile Inference and Applications to Machine Learning (B., Murthy & Kang 16) https://arxiv.org/abs/1610.05627 Blanchet (Columbia U. and Stanford U.) 19 / 25
Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Blanchet (Columbia U. and Stanford U.) 20 / 25
Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Optimal Least Squares approach consists in estimating β via n 1 ( ) 2 MSE (β) = min β n Y i β T X i. i=1 Blanchet (Columbia U. and Stanford U.) 20 / 25
Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Optimal Least Squares approach consists in estimating β via n 1 ( ) 2 MSE (β) = min β n Y i β T X i. i=1 Apply the distributionally robust estimator based on optimal transport. Blanchet (Columbia U. and Stanford U.) 20 / 25
Connection to Sqrt-Lasso Theorem (B., Kang, Murthy (2016)) Suppose that c ( (x, y), ( { x, y )) x x = 2 q if y = y if y = y. Then, if 1/p + 1/q = 1 ( ( ) ) 2 max E 1/2 P Y β T X = P :D c (P,P n ) δ MSE (β) + δ β 2 p. Remark 1: This is sqrt-lasso (Belloni et al. (2011)). Remark 2: Also representations for support vector machines, LAD lasso, group lasso, adaptive lasso, and more! Blanchet (Columbia U. and Stanford U.) 21 / 25
For Instance... Regularized Estimators Theorem (B., Kang, Murthy (2016)) Suppose that c ( (x, y), ( x, y )) = { x x q if y = y if y = y. Then, ] sup E P [log(1 + e Y βt X ) P : D c (P,P n ) δ = 1 n n log(1 + e Y i β T X i ) + δ β p. i=1 Remark 1: This is regularized logistic regression (see also Esfahani and Kuhn 2015). Blanchet (Columbia U. and Stanford U.) 22 / 25
Connections to Empirical Likelihood Paper: https://arxiv.org/abs/1610.05627 Also chooses δ optimally introducing an extension of Empirical Likelihood called "Robust Wasserstein Profile Inference". Blanchet (Columbia U. and Stanford U.) 23 / 25
The Robust Wasserstein Profile Function Pick δ = 95% quantile of R n (β ) and we show that nr n (β ) d E [e 2 ] E [e 2 ] (E e ) 2 N(0, Cov (X )) 2 q. Blanchet (Columbia U. and Stanford U.) 24 / 25
Conclusions Presented systematic approach for quantifying model misspecification. Blanchet (Columbia U. and Stanford U.) 25 / 25
Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Blanchet (Columbia U. and Stanford U.) 25 / 25
Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. Blanchet (Columbia U. and Stanford U.) 25 / 25
Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. New statistical estimators, connections to machine learning & regularization. Blanchet (Columbia U. and Stanford U.) 25 / 25
Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. New statistical estimators, connections to machine learning & regularization. Extensions of Empirical Likelihood & connections to optimal regularization. Blanchet (Columbia U. and Stanford U.) 25 / 25