Optimal Transport in Risk Analysis

Similar documents
Optimal Transport Methods in Operations Research and Statistics

PRIVATE RISK AND VALUATION: A DISTRIBUTIONALLY ROBUST OPTIMIZATION VIEW

Distributionally Robust Stochastic Optimization with Wasserstein Distance

Quantifying Stochastic Model Errors via Robust Optimization

Distirbutional robustness, regularizing variance, and adversaries

Superhedging and distributionally robust optimization with neural networks

CS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018

Robust Dual-Response Optimization

Semidefinite and Second Order Cone Programming Seminar Fall 2012 Project: Robust Optimization and its Application of Robust Portfolio Optimization

Distributionally Robust Groupwise Regularization Estimator

Distributionally Robust Convex Optimization

Distributionally Robust Optimization with ROME (part 1)

Introduction to Rare Event Simulation

Random Convex Approximations of Ambiguous Chance Constrained Programs

Proceedings of the 2014 Winter Simulation Conference A. Tolk, S. D. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller, eds.

Robust Actuarial Risk Analysis

Nishant Gurnani. GAN Reading Group. April 14th, / 107

arxiv: v2 [math.oc] 16 Jul 2016

Wasserstein GAN. Juho Lee. Jan 23, 2017

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Posterior Regularization

Ambiguity Sets and their applications to SVM

Robust Optimization for Risk Control in Enterprise-wide Optimization

Expectation Propagation Algorithm

Does Distributionally Robust Supervised Learning Give Robust Classifiers?

Distributionally Robust Discrete Optimization with Entropic Value-at-Risk

Introduction. Chapter 1

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)

Foundations of Nonparametric Bayesian Methods

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia , USA

Lecture 35: December The fundamental statistical distances

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

arxiv: v1 [stat.me] 2 Nov 2017

Support Vector Machines

PATTERN RECOGNITION AND MACHINE LEARNING

Data-Driven Distributionally Robust Chance-Constrained Optimization with Wasserstein Metric

CORC Tech Report TR Robust dynamic programming

Are You a Bayesian or a Frequentist?

Optimal Transport and Wasserstein Distance

Distributionally Robust Convex Optimization

Handout 8: Dealing with Data Uncertainty

arxiv: v3 [stat.ml] 4 Aug 2017

CSC 2541: Bayesian Methods for Machine Learning

Lecture 2: Statistical Decision Theory (Part I)

Lecture 7 Introduction to Statistical Decision Theory

Learning Energy-Based Models of High-Dimensional Data

Bayesian Learning (II)

arxiv: v1 [stat.ml] 27 May 2018

Robust portfolio selection under norm uncertainty

Lagrange Relaxation: Introduction and Applications

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

On Distributionally Robust Chance Constrained Program with Wasserstein Distance

Information theoretic perspectives on learning algorithms

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.

Time inconsistency of optimal policies of distributionally robust inventory models

Ambiguous Joint Chance Constraints under Mean and Dispersion Information

Distributionally Robust Reward-risk Ratio Programming with Wasserstein Metric

CORC Technical Report TR Ambiguous chance constrained problems and robust optimization

Reducing Model Risk With Goodness-of-fit Victory Idowu London School of Economics

Nonparametric Bayesian Methods - Lecture I

Lecture notes on statistical decision theory Econ 2110, fall 2013

Lecture 2: August 31

BAYESIAN ESTIMATION OF UNKNOWN PARAMETERS OVER NETWORKS

Lecture : Probabilistic Machine Learning

arxiv: v3 [math.oc] 25 Apr 2018

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS

Convex Optimization Lecture 16

Duality in Regret Measures and Risk Measures arxiv: v1 [q-fin.mf] 30 Apr 2017 Qiang Yao, Xinmin Yang and Jie Sun

Estimation and Optimization: Gaps and Bridges. MURI Meeting June 20, Laurent El Ghaoui. UC Berkeley EECS

THE stochastic and dynamic environments of many practical

STA414/2104 Statistical Methods for Machine Learning II

Generalized quantiles as risk measures

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Econometrics I, Estimation

Discussion of Sensitivity and Informativeness under Local Misspecification

Statistical Measures of Uncertainty in Inverse Problems

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Robust Hypothesis Testing Using Wasserstein Uncertainty Sets

Generative Models and Optimal Transport

A PARAMETRIC MODEL FOR DISCRETE-VALUED TIME SERIES. 1. Introduction

Covariance function estimation in Gaussian process regression

Machine learning, shrinkage estimation, and economic theory

Lecture 6: Conic Optimization September 8

On Information Divergence Measures, Surrogate Loss Functions and Decentralized Hypothesis Testing

Bayesian Machine Learning

arxiv: v1 [cs.it] 30 Jun 2017

CS-E3210 Machine Learning: Basic Principles

Habilitationsvortrag: Machine learning, shrinkage estimation, and economic theory

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

Sparse and Robust Optimization and Applications

Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning

Regression I: Mean Squared Error and Measuring Quality of Fit

Machine Learning 3. week

Statistical Inference

Decomposition Algorithms for Two-Stage Distributionally Robust Mixed Binary Programs

Machine Learning. Support Vector Machines. Manfred Huber

Ambiguous Chance Constrained Programs: Algorithms and Applications

Assessing financial model risk

RISK AND RELIABILITY IN OPTIMIZATION UNDER UNCERTAINTY

Transcription:

Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and Department of IEOR). Blanchet (Columbia U. and Stanford U.) 1 / 25

Goal: Goal: Present a comprehensive framework for decision making under model uncertainty... This presentation is an invitation to read these two papers: https://arxiv.org/abs/1604.01446 https://arxiv.org/abs/1610.05627 Blanchet (Columbia U. and Stanford U.) 2 / 25

Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Blanchet (Columbia U. and Stanford U.) 3 / 25

Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). Blanchet (Columbia U. and Stanford U.) 3 / 25

Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). B is a set which models bankruptcy. Blanchet (Columbia U. and Stanford U.) 3 / 25

Example: Model Uncertainty in Ruin Probabilities R (t) = the reserve (perhaps multiple lines) at time t. Ruin probability (in finite time horizon T ) u T = P true (R (t) B for some t [0, T ]). B is a set which models bankruptcy. Problem: Model (P true ) may be complex, intractable or simply unknown... Blanchet (Columbia U. and Stanford U.) 3 / 25

A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. Blanchet (Columbia U. and Stanford U.) 4 / 25

A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. Blanchet (Columbia U. and Stanford U.) 4 / 25

A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. Blanchet (Columbia U. and Stanford U.) 4 / 25

A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. δ is the distributional uncertainty size. Blanchet (Columbia U. and Stanford U.) 4 / 25

A Distributionally Robust Risk Analysis Formulation Our solution: Estimate u T by solving sup P (R (t) B for some t [0, T ]), D c (P 0,P ) δ where P 0 is a suitable model. P 0 = proxy for P true. P 0 right trade-off between fidelity and tractability. δ is the distributional uncertainty size. D c ( ) is the distributional uncertainty region. Blanchet (Columbia U. and Stanford U.) 4 / 25

Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Blanchet (Columbia U. and Stanford U.) 5 / 25

Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Blanchet (Columbia U. and Stanford U.) 5 / 25

Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Want to preserve advantages of using P 0. Blanchet (Columbia U. and Stanford U.) 5 / 25

Desirable Elements of Distributionally Robust Formulation Would like D c ( ) to have wide flexibility (even non-parametric). Want optimization to be tractable. Want to preserve advantages of using P 0. Want a way to estimate δ. Blanchet (Columbia U. and Stanford U.) 5 / 25

Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Blanchet (Columbia U. and Stanford U.) 6 / 25

Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Blanchet (Columbia U. and Stanford U.) 6 / 25

Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Blanchet (Columbia U. and Stanford U.) 6 / 25

Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Think of using Brownian motion as a proxy model for R (t)... Blanchet (Columbia U. and Stanford U.) 6 / 25

Connections to Distributionally Robust Optimization Standard choices based on divergence (such as Kullback-Leibler) - Hansen & Sargent (2016) ( )) dv D (v µ) = E v (log. dµ Robust Optimization: Ben-Tal, El Ghaoui, Nemirovski (2009). Big problem: Absolute continuity may typically be violated... Think of using Brownian motion as a proxy model for R (t)... We advocate using optimal transport costs (e.g. Wasserstein distance). Blanchet (Columbia U. and Stanford U.) 6 / 25

Elements of Optimal Transport Costs S X and S Y be Polish spaces. Blanchet (Columbia U. and Stanford U.) 7 / 25

Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. Blanchet (Columbia U. and Stanford U.) 7 / 25

Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. Blanchet (Columbia U. and Stanford U.) 7 / 25

Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. µ ( ) and v ( ) Borel probability measures defined on S X and S Y. Blanchet (Columbia U. and Stanford U.) 7 / 25

Elements of Optimal Transport Costs S X and S Y be Polish spaces. B SX and B SY be the associated Borel σ-fields. c ( ) : S X S X [0, ) be lower semicontinuous. µ ( ) and v ( ) Borel probability measures defined on S X and S Y. Given π a Borel prob. measure on S X S Y, π X (A) = π (A S Y ) and π Y (C ) = π (S X C ). Blanchet (Columbia U. and Stanford U.) 7 / 25

Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π Blanchet (Columbia U. and Stanford U.) 8 / 25

Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). Blanchet (Columbia U. and Stanford U.) 8 / 25

Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. Blanchet (Columbia U. and Stanford U.) 8 / 25

Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. If c (x, y) = 0 if and only if x = y then D c (µ, v) = 0 if and only if µ = v. Blanchet (Columbia U. and Stanford U.) 8 / 25

Definition of Optimal Transport Costs Define D c (µ, v) = min{e π (c (X, Y )) : π X = µ and π Y = v}. π This is the so-called Kantorovich problem (see Villani (2008)). If c ( ) is a metric then D c (µ, v) is a Wasserstein distance of order 1. If c (x, y) = 0 if and only if x = y then D c (µ, v) = 0 if and only if µ = v. Kantorovich s problem is a "nice" infinite dimensional linear programming problem. Blanchet (Columbia U. and Stanford U.) 8 / 25

Illustration of Optimal Transport Costs Blanchet (Columbia U. and Stanford U.) 9 / 25

Fundamental Theorem of Robust Performance Analysis Theorem (B. and Murthy (2016)) Suppose that c ( ) is lower semicontinuous and that h ( ) is upper semicontinuous with E P0 f (X ) <. Then, sup E P (f (Y )) = inf E P 0 [λδ + sup{f (z) λc (X, z)}]. D c (P 0,P ) δ λ 0 z Moreover, (π ) and dual λ are primal-dual solutions if and only if f (y) λ c (x, y) = sup{f (z) λ c (x, z)} (x, y) -π a.s. z λ (E π [c (X, Y ) δ]) = 0. Blanchet (Columbia U. and Stanford U.) 10 / 25

Implications for Risk Analysis Theorem (B. and Murthy (2016)) Suppose that c ( ) is lower semicontinuous and that B is a closed set. Let c B (x) = inf{c (x, y) : y B}, then sup P (Y B) = P 0 (c B (X ) 1/λ ), D c (P 0,P ) δ where λ 0 satisfies (under mild assumptions on c B (X )) δ = E 0 [c B (X ) I (c B (X ) 1/λ )]. Blanchet (Columbia U. and Stanford U.) 11 / 25

Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] Blanchet (Columbia U. and Stanford U.) 12 / 25

Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] If R (t) = b Z (t), then ruin during time interval [0, 1] is B b = {z ( ) : b sup z (t)}. t [0,1] Blanchet (Columbia U. and Stanford U.) 12 / 25

Application 1: Back to Classical Risk Problem Suppose that c (x, y) = d J (x ( ), y ( )) = Skorokhod J 1 metric. = inf { sup x (t) y (φ (t)), sup φ (t) t }. φ( ) bijection t [0,1] t [0,1] If R (t) = b Z (t), then ruin during time interval [0, 1] is B b = {z ( ) : b sup z (t)}. t [0,1] Let P 0 ( ) be the Wiener measure want to compute sup P (Z B b ). D c (P 0,P ) δ Blanchet (Columbia U. and Stanford U.) 12 / 25

Application 1: Computing Distance to Bankruptcy So: {c Bb (z) 1/λ } = {sup t [0,1] z (t) b 1/λ }, and ( ) sup P (Z B b ) = P 0 sup Z (t) b 1/λ. D c (P 0,P ) δ t [0,1] Blanchet (Columbia U. and Stanford U.) 13 / 25

Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. Blanchet (Columbia U. and Stanford U.) 14 / 25

Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. So use any coupling between evidence and P 0 or expert knowledge. Blanchet (Columbia U. and Stanford U.) 14 / 25

Application 1: Computing Uncertainty Size Note any coupling π so that π X = P 0 and π Y = P satisfies D c (P 0, P) E π [c (X, Y )] δ. So use any coupling between evidence and P 0 or expert knowledge. We discuss choosing δ non-parametrically in a moment Blanchet (Columbia U. and Stanford U.) 14 / 25

Application 1: Illustration of Coupling Given arrivals and claim sizes let Z (t) = m 1/2 2 N (t) k=1 (X k m 1 ) Blanchet (Columbia U. and Stanford U.) 15 / 25

Application 1: Coupling in Action Blanchet (Columbia U. and Stanford U.) 16 / 25

Application 1: Numerical Example Assume Poisson arrivals. Pareto claim sizes with index 2.2 (P (V > t) = 1/(1 + t) 2.2 ). Cost c (x, y) = d J (x, y) 2 < note power of 2. Used Algorithm 1 to calibrate (estimating means and variances from data). b P 0 (Ruin) P true (Ruin) P robust (Ruin) P true (Ruin) 100 1.07 10 1 12.28 150 2.52 10 4 10.65 200 5.35 10 8 10.80 250 1.15 10 12 10.98 Blanchet (Columbia U. and Stanford U.) 17 / 25

Additional Applications: Multidimensional Ruin Problems Paper: Quantifying Distributional Model Risk via Optimal Transport (B. & Murthy 16) https://arxiv.org/abs/1604.01446 contains more applications Multidimensional risk processes (explicit evaluation of c B (x) for d J metric). Control: min θ sup P :D (P,P0 ) δ E [L (θ, Z )] < robust optimal reinsurance. Blanchet (Columbia U. and Stanford U.) 18 / 25

Connections to Machine Learning Connection to machine learning helps further understand why optimal transport costs are sensible choices... Paper: Robust Wasserstein Profile Inference and Applications to Machine Learning (B., Murthy & Kang 16) https://arxiv.org/abs/1610.05627 Blanchet (Columbia U. and Stanford U.) 19 / 25

Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Blanchet (Columbia U. and Stanford U.) 20 / 25

Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Optimal Least Squares approach consists in estimating β via n 1 ( ) 2 MSE (β) = min β n Y i β T X i. i=1 Blanchet (Columbia U. and Stanford U.) 20 / 25

Robust Performance Analysis in Machine Learning Consider estimating β R m in linear regression Y i = βx i + e i, where {(Y i, X i )} n i=1 are data points. Optimal Least Squares approach consists in estimating β via n 1 ( ) 2 MSE (β) = min β n Y i β T X i. i=1 Apply the distributionally robust estimator based on optimal transport. Blanchet (Columbia U. and Stanford U.) 20 / 25

Connection to Sqrt-Lasso Theorem (B., Kang, Murthy (2016)) Suppose that c ( (x, y), ( { x, y )) x x = 2 q if y = y if y = y. Then, if 1/p + 1/q = 1 ( ( ) ) 2 max E 1/2 P Y β T X = P :D c (P,P n ) δ MSE (β) + δ β 2 p. Remark 1: This is sqrt-lasso (Belloni et al. (2011)). Remark 2: Also representations for support vector machines, LAD lasso, group lasso, adaptive lasso, and more! Blanchet (Columbia U. and Stanford U.) 21 / 25

For Instance... Regularized Estimators Theorem (B., Kang, Murthy (2016)) Suppose that c ( (x, y), ( x, y )) = { x x q if y = y if y = y. Then, ] sup E P [log(1 + e Y βt X ) P : D c (P,P n ) δ = 1 n n log(1 + e Y i β T X i ) + δ β p. i=1 Remark 1: This is regularized logistic regression (see also Esfahani and Kuhn 2015). Blanchet (Columbia U. and Stanford U.) 22 / 25

Connections to Empirical Likelihood Paper: https://arxiv.org/abs/1610.05627 Also chooses δ optimally introducing an extension of Empirical Likelihood called "Robust Wasserstein Profile Inference". Blanchet (Columbia U. and Stanford U.) 23 / 25

The Robust Wasserstein Profile Function Pick δ = 95% quantile of R n (β ) and we show that nr n (β ) d E [e 2 ] E [e 2 ] (E e ) 2 N(0, Cov (X )) 2 q. Blanchet (Columbia U. and Stanford U.) 24 / 25

Conclusions Presented systematic approach for quantifying model misspecification. Blanchet (Columbia U. and Stanford U.) 25 / 25

Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Blanchet (Columbia U. and Stanford U.) 25 / 25

Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. Blanchet (Columbia U. and Stanford U.) 25 / 25

Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. New statistical estimators, connections to machine learning & regularization. Blanchet (Columbia U. and Stanford U.) 25 / 25

Conclusions Presented systematic approach for quantifying model misspecification. Approach based on optimal transport sup P (X B) = P 0 (c B (X ) 1/λ ). D (P,P 0 ) δ Closed forms solutions, tractable in terms of P 0, feasible calibration of δ. New statistical estimators, connections to machine learning & regularization. Extensions of Empirical Likelihood & connections to optimal regularization. Blanchet (Columbia U. and Stanford U.) 25 / 25