Generative Models and Optimal Transport

Size: px
Start display at page:

Download "Generative Models and Optimal Transport"

Transcription

1 Generative Models and Optimal Transport Marco Cuturi Joint work / work in progress with G. Peyré, A. Genevay (ENS), F. Bach (INRIA), G. Montavon, K-R Müller (TU Berlin)

2 Statistics 0.1 : Density Fitting We collect data NX data = 1 N i=1 x i data 2

3 Statistics 0.1 : Density Fitting We collect data NX data = 1 N i=1 x i p 0 data We fit a parametric family of densities {p, 2 } e.g. =(m, ); p = N (m, ) 2

4 Density Fitting p 1 data

5 Density Fitting data p 2

6 Density Fitting data p done! We stop when there is a good fit.

7 Maximum Likelihood Estimation data NX max 2 1 N i=1 log p (x i ) p done!

8 Maximum Likelihood Estimation data NX max 2 1 N i=1 log p (x i ) log 0 = 1 p (x i ) must be > 0 p done!

9 Maximum Likelihood Estimation Equivalent to a KL projection in the space of probability measures data data {p, 2 } p 1 KL p done! p 2 p done! min KL( datakp ) 2

10 Maximum Likelihood Estimation Equivalent to a KL projection in the space of probability measures data data KL {p, 2 } p 1 p 0 p done! p 2 p done! min KL( datakp ) 2

11 In higher dimensional spaces data min KL( datakp ) 2 8

12 In higher dimensional spaces p data min KL( datakp ) 2 8

13 In higher dimensional spaces p data Data space has dimension

14 Generative Models data 10

15 Generative Models µ latent space data data space 10

16 Generative Models µ f : latent space! data space latent space data data space 10

17 Generative Models µ z latent space f : latent space! data space data z = data space 10

18 Generative Models µ z f : latent space! data space latent space f (z) data data space z = f 10

19 Generative Models µ z f : latent space! data space latent space f (z) data data space z = f 10

20 Generative Models µ f : latent space! data space latent space data f ] µ data space 10

21 Generative Models µ f : latent space! data space latent space data f ] µ data space Push-forward: 8B, f ] µ(b) :=µ(f 1 (B)) 10

22 Generative Models µ f : latent space! data space latent space data f ] µ data space Goal: find such that f ] µ fits data 11

23 Generative Models µ f : latent space! data space latent space data f ] µ data space Goal: find such that f ] µ fits data 11

24 Generative Models µ f : latent space! data space latent space data f ] µ data space Di erence between fitting a push forward measure f ] µ vs. a density p? 12

25 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log p (x i ) = min KL( datakp ) 2 13

26 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log f ] µ(x i ) min KL( datakf ] µ) 2 14

27 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log f ] µ(x i ) min KL( datakf ] µ) 2 14

28 Generative Models µ f : latent space! data space latent space data f ] µ data space Need a more flexible discrepancy function to compare data and f ] µ 15

29 µ Workarounds? latent space data space data Formulation as adversarial problem [GPM 14] min 2 max Accuracy g ((f ] µ, +1), ( data, 1)) classifiers g Use a richer metric for probability measures, able to handle measures with non-overlapping supports: min 2 ( data, p ), not min 2 KL( datakp ) 16

30 Minimum Estimation l 1 17

31 Minimum Kantorovich Estimation Use optimal transport theory, namely Wasserstein distances to define discrepancy. min W ( data,f ] µ) 2 Optimal transport? fertile field in mathematics. Monge Kantorovich Koopmans Dantzig Brenier Otto McCann Villani Nobel 75 Fields 10 18

32 What is Optimal Transport? A geometric toolbox to compare probability measures supported on a metric space. p h 1 p 0 d h 2 Statistical Models Bags of features Brain Activation Maps Empirical Measures, i.e. data µ 19 Color Histograms

33 What is Optimal Transport? A geometric toolbox to compare probability measures supported on a metric space. p p 0 d h 2 Statistical Models Bags of features Brain Activation Maps Empirical Measures, i.e. data µ 20 Color Histograms

34 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. 21 [SDPC.. 15]

35 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. Wasserstein Barycenters [Agueh 11] 21 [SDPC.. 15]

36 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. Wasserstein Barycenters [Agueh 11] 22 [SDPC.. 15]

37 Origins: Monge s Problem 23

38 Origins: Monge s Problem 24

39 Origins: Monge s Problem 24

40 Origins: Monge s Problem 24

41 Origins: Monge s Problem 24

42 Origins: Monge s Problem 24

43 Origins: Monge s Problem 24

44 Origins: Monge s Problem x 24

45 Origins: Monge s Problem x 24

46 Origins: Monge s Problem x y = T (x) 24

47 Origins: Monge s Problem x y = T (x) D(x, T (x)) 24

48 Origins: Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x T (x) 25

49 Origins: Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: find a map T :! [Brenier 87] If = R d, c = k k 2, µ, a.c., then T = ru, u convex. x T (x) 25

50 Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x T (x) 26

51 Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x 26

52 [Kantorovich 42] Relaxation Instead of maps T :!, consider P 2 P( ) probabilistic maps, i.e. couplings : (µ, ) def = {P 2 P( ) 8A, B, P (A ) =µ(a), P ( B) = (B)} 27

53 [Kantorovich 42] Relaxation (µ, ) def = {P 2 P( ) 8A, B, P (A ) =µ(a), P ( B) = (B)} } { } { } { 0.6 µ(x) ν(y) 0.4 P x y 3 4 P (x, y)

54 [Kantorovich 42] Relaxation Joint Probabilities of (µ, ν) (µ, ) def = {P 2 P( ) 8A, B, Π(µ, ν) = For probability P Π(µ, measures ν), on Ω 2 P (A ) =µ(a), P ( B) = (B)} P ({x},} Ω) =µ({x}) with { marginals } and P µ (Ω, and{ {y}) ν. } =ν({y}) { 0.6 µ(x) ν(y) 0.4 P x y 3 4 P (x, y)

55 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL 29

56 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL 29

57 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL Z Z W p p (µ, ) = sup '2L 1 (µ), 2L 1 ( ) '(x)+ (y)appled p (x,y) 29 'dµ + d. DUAL

58 W is versatile Discrete - Discrete Discrete - Continuous Continuous - Continuous 30

59 W is versatile Discrete - Discrete - Network flow solvers - Entropic regularization Discrete - Continuous low dim. [M 11][KMB 16] [L 15] Continuous - Continuous Stochastic Optimization [GCPB 16] 30

60 Minimum Kantorovich Estimators min W ( data,f ] µ) 2 [Bassetti 06] 1st reference discussing this approach. [MMC 16] use regularization in a finite setting. [ACB 17] (WGAN) [BJGR 17] (Wasserstein ABC). Hot topics: approximate & differentiate W efficiently. Today: ideas from our recent preprint [GPC 17] 31

61 Wasserstein on Empirical Measures nx µ = i=1 a i xi (, D) mx 32 = j=1 b j yj

62 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} 2 y 1... y m 3 2 b 1... b m 3 x 1. 6 D(x i, y j ) p 4 x n a 1. 6 P 1 m = a 4 a n 7 5

63 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} 2 y 1... y m x 1. 6 D(x i, y j ) p 4 x n b 1... b m a P T 1 4 n = b a n 3 7 5

64 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} Def. Optimal Transport Problem W p p (µ, ) = min hp,m XY i P 2U(a,b) 33

65 Discrete OT Problem M XY U(a, b) 34

66 Discrete OT Problem M XY P? U(a, b) 35

67 Discrete OT Problem M XY P? U(a, b) Def. Dual OT problem W p p (µ, ) = max T a + 2R n, 2R m i + j appled(x i,y j ) p 35 T b

68 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Note: flow/pde formulations [Beckman 61]/[Benamou 98] can be used for p=1/p=2 for a sparse-graph metric/euclidean metric. 35

69 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? 36

70 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? P? Solution unstable and not always unique. 36

71 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) {P? } P? Solution unstable and not always unique. 36

72 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) {P? } P? Solution unstable and not always unique. 37

73 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Solution unstable and not always unique. P? 37

74 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Solution unstable and not always unique. P? W p p (µ, ) not di erentiable. 37

75 Entropic Regularization [Wilson 62] Def. Regularized Wasserstein, 0 W (µ, ) def = min P 2U(a,b) hp,m XY i E(P ) E(P ) def = nmx i,j=1 P ij (log P ij ) Note: Unique optimal solution because of strong concavity of Entropy 38

76 Entropic Regularization [Wilson 62] Def. Regularized Wasserstein, 0 W (µ, ) def = min P 2U(a,b) hp,m XY i E(P ) µ P Note: Unique optimal solution because of strong concavity of Entropy 38

77 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / 39

78 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / L(P,, )= X ij P ij M ij + P ij log P ij + T (P 1 a)+ T (P T 1 ij = M ij + (log P ij + 1) + i + j (@L/@P ij = 0) )P ij = e i e M ij e j = u i K ij v j 39

79 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / [Sinkhorn 64] fixed-point iterations for (u, v) u a/kv, v b/k T u O(nm) complexity, GPGPU parallel [C 13]. O(n d+1 ) if = {1,...,n} d and separable. D p [S..C.. 15] 39

80 Sinkhorn Divergence Def. For > 0, let W (µ, ) def = hp,m XY i Prop. W (µ, µ) > 0 Def. Normalized Sinkhorn Divergence W (µ, ) def = W (µ, ) 1 2 (W (µ, µ)+w (, )) Prop. If p = 1, W (µ, )!!1 ED(µ, ) 40

81 Algorithmic Formulation Def. For L 1, define W L (µ, ) def = hp L,M XY i, where P L def = diag(u L )Kdiag(v L ), v 0 = 1 m ; l 0, u l def = a/kv l, v l+1 def = b/k T u l. L can be computed recursively, in O(L) kernelk vector products. 41

82 Proposal: Autodiff OT using Sinkhorn Approximate W loss by the transport cost W L after L Sinkhorn iterations. (z1,...,zm) (x1,...,xm) Input data (c(x i,y j )) i,j e C/" C... (y1,...,yn) K nk 1 b`+1 m 1/ 1/ ÊL ( ) ` ` +1 mk > Generative model Sinkhorn ` =1,...,L 1 b` a`+1 h(c K)b L,a L i 42 [GPC 17]

83 Example: MNIST, Learning f 43

84 Example: Generation of Images (a) MMD (b) (c) MMD-GAN τ = 1000 τ=10 CIFAR 10 images In these examples the cost function is also learned adversarially, as a NN mapping onto feature vectors. 44

85 Concluding Remarks Regularized OT is much faster than OT. Regularized OT can interpolate between W and the MMD / Energy distance metrics. The solution of regularized OT is auto-differentiable. Many open problems remain! NIPS 17 WORKSHOP NIPS 17 TUTORIAL 45

Numerical Optimal Transport and Applications

Numerical Optimal Transport and Applications Numerical Optimal Transport and Applications Gabriel Peyré Joint works with: Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Justin Solomon www.numerical-tours.com Histograms in Imaging

More information

softened probabilistic

softened probabilistic Justin Solomon MIT Understand geometry from a softened probabilistic standpoint. Somewhere over here. Exactly here. One of these two places. Query 1 2 Which is closer, 1 or 2? Query 1 2 Which is closer,

More information

Optimal Transport: A Crash Course

Optimal Transport: A Crash Course Optimal Transport: A Crash Course Soheil Kolouri and Gustavo K. Rohde HRL Laboratories, University of Virginia Introduction What is Optimal Transport? The optimal transport problem seeks the most efficient

More information

Optimal Transport in ML

Optimal Transport in ML Optimal Transport in ML Rémi Gilleron Inria Lille & CRIStAL & Univ. Lille Feb. 2018 Main Source also some figures Computational Optimal transport, G. Peyré and M. Cuturi Rémi Gilleron (Magnet) OT Feb.

More information

Optimal transport for Gaussian mixture models

Optimal transport for Gaussian mixture models 1 Optimal transport for Gaussian mixture models Yongxin Chen, Tryphon T. Georgiou and Allen Tannenbaum Abstract arxiv:1710.07876v2 [math.pr] 30 Jan 2018 We present an optimal mass transport framework on

More information

Optimal mass transport as a distance measure between images

Optimal mass transport as a distance measure between images Optimal mass transport as a distance measure between images Axel Ringh 1 1 Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden. 21 st of June 2018 INRIA, Sophia-Antipolis, France

More information

Wasserstein Training of Boltzmann Machines

Wasserstein Training of Boltzmann Machines Wasserstein Training of Boltzmann Machines Grégoire Montavon, Klaus-Rober Muller, Marco Cuturi Presenter: Shiyu Liang December 1, 2016 Coordinated Science Laboratory Department of Electrical and Computer

More information

AdaGAN: Boosting Generative Models

AdaGAN: Boosting Generative Models AdaGAN: Boosting Generative Models Ilya Tolstikhin ilya@tuebingen.mpg.de joint work with Gelly 2, Bousquet 2, Simon-Gabriel 1, Schölkopf 1 1 MPI for Intelligent Systems 2 Google Brain Radford et al., 2015)

More information

Optimal Transport and Wasserstein Distance

Optimal Transport and Wasserstein Distance Optimal Transport and Wasserstein Distance The Wasserstein distance which arises from the idea of optimal transport is being used more and more in Statistics and Machine Learning. In these notes we review

More information

Nishant Gurnani. GAN Reading Group. April 14th, / 107

Nishant Gurnani. GAN Reading Group. April 14th, / 107 Nishant Gurnani GAN Reading Group April 14th, 2017 1 / 107 Why are these Papers Important? 2 / 107 Why are these Papers Important? Recently a large number of GAN frameworks have been proposed - BGAN, LSGAN,

More information

Optimal transport for machine learning

Optimal transport for machine learning Optimal transport for machine learning Rémi Flamary AG GDR ISIS, Sète, 16 Novembre 2017 2 / 37 Collaborators N. Courty A. Rakotomamonjy D. Tuia A. Habrard M. Cuturi M. Perrot C. Févotte V. Emiya V. Seguy

More information

arxiv: v1 [cs.lg] 13 Nov 2018

arxiv: v1 [cs.lg] 13 Nov 2018 SEMI-DUAL REGULARIZED OPTIMAL TRANSPORT MARCO CUTURI AND GABRIEL PEYRÉ arxiv:1811.05527v1 [cs.lg] 13 Nov 2018 Abstract. Variational problems that involve Wasserstein distances and more generally optimal

More information

Introduction to optimal transport

Introduction to optimal transport Introduction to optimal transport Nicola Gigli May 20, 2011 Content Formulation of the transport problem The notions of c-convexity and c-cyclical monotonicity The dual problem Optimal maps: Brenier s

More information

Optimal Transport Methods in Operations Research and Statistics

Optimal Transport Methods in Operations Research and Statistics Optimal Transport Methods in Operations Research and Statistics Jose Blanchet (based on work with F. He, Y. Kang, K. Murthy, F. Zhang). Stanford University (Management Science and Engineering), and Columbia

More information

The dynamics of Schrödinger bridges

The dynamics of Schrödinger bridges Stochastic processes and statistical machine learning February, 15, 2018 Plan of the talk The Schrödinger problem and relations with Monge-Kantorovich problem Newton s law for entropic interpolation The

More information

Mathematical Foundations of Data Sciences

Mathematical Foundations of Data Sciences Mathematical Foundations of Data Sciences Gabriel Peyré CNRS & DMA École Normale Supérieure gabriel.peyre@ens.fr https://mathematical-tours.github.io www.numerical-tours.com January 7, 2018 Chapter 18

More information

Approximations of displacement interpolations by entropic interpolations

Approximations of displacement interpolations by entropic interpolations Approximations of displacement interpolations by entropic interpolations Christian Léonard Université Paris Ouest Mokaplan 10 décembre 2015 Interpolations in P(X ) X : Riemannian manifold (state space)

More information

Wasserstein GAN. Juho Lee. Jan 23, 2017

Wasserstein GAN. Juho Lee. Jan 23, 2017 Wasserstein GAN Juho Lee Jan 23, 2017 Wasserstein GAN (WGAN) Arxiv submission Martin Arjovsky, Soumith Chintala, and Léon Bottou A new GAN model minimizing the Earth-Mover s distance (Wasserstein-1 distance)

More information

Some geometric consequences of the Schrödinger problem

Some geometric consequences of the Schrödinger problem Some geometric consequences of the Schrödinger problem Christian Léonard To cite this version: Christian Léonard. Some geometric consequences of the Schrödinger problem. Second International Conference

More information

Information theoretic perspectives on learning algorithms

Information theoretic perspectives on learning algorithms Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez

More information

Geometric Inference for Probability distributions

Geometric Inference for Probability distributions Geometric Inference for Probability distributions F. Chazal 1 D. Cohen-Steiner 2 Q. Mérigot 2 1 Geometrica, INRIA Saclay, 2 Geometrica, INRIA Sophia-Antipolis 2009 June 1 Motivation What is the (relevant)

More information

Concentration inequalities: basics and some new challenges

Concentration inequalities: basics and some new challenges Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,

More information

Discrete transport problems and the concavity of entropy

Discrete transport problems and the concavity of entropy Discrete transport problems and the concavity of entropy Bristol Probability and Statistics Seminar, March 2014 Funded by EPSRC Information Geometry of Graphs EP/I009450/1 Paper arxiv:1303.3381 Motivating

More information

Optimal Transportation. Nonlinear Partial Differential Equations

Optimal Transportation. Nonlinear Partial Differential Equations Optimal Transportation and Nonlinear Partial Differential Equations Neil S. Trudinger Centre of Mathematics and its Applications Australian National University 26th Brazilian Mathematical Colloquium 2007

More information

A new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems

A new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems A new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems Giuseppe Savaré http://www.imati.cnr.it/ savare Dipartimento di Matematica, Università di Pavia Nonlocal

More information

Understanding GANs: Back to the basics

Understanding GANs: Back to the basics Understanding GANs: Back to the basics David Tse Stanford University Princeton University May 15, 2018 Joint work with Soheil Feizi, Farzan Farnia, Tony Ginart, Changho Suh and Fei Xia. GANs at NIPS 2017

More information

Convex discretization of functionals involving the Monge-Ampère operator

Convex discretization of functionals involving the Monge-Ampère operator Convex discretization of functionals involving the Monge-Ampère operator Quentin Mérigot CNRS / Université Paris-Dauphine Joint work with J.D. Benamou, G. Carlier and É. Oudet Workshop on Optimal Transport

More information

Fast Computation of Wasserstein Barycenters

Fast Computation of Wasserstein Barycenters Marco Cuturi Graduate School of Informatics, Kyoto University Arnaud Doucet Department of Statistics, University of Oxford MCUTURI@I.KYOTO-U.AC.JP DOUCET@STAT.OXFORD.AC.UK Abstract We present new algorithms

More information

Geometric Inference for Probability distributions

Geometric Inference for Probability distributions Geometric Inference for Probability distributions F. Chazal 1 D. Cohen-Steiner 2 Q. Mérigot 2 1 Geometrica, INRIA Saclay, 2 Geometrica, INRIA Sophia-Antipolis 2009 July 8 F. Chazal, D. Cohen-Steiner, Q.

More information

Displacement convexity of the relative entropy in the discrete h

Displacement convexity of the relative entropy in the discrete h Displacement convexity of the relative entropy in the discrete hypercube LAMA Université Paris Est Marne-la-Vallée Phenomena in high dimensions in geometric analysis, random matrices, and computational

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Optimal Transport for Data Analysis

Optimal Transport for Data Analysis Optimal Transport for Data Analysis Bernhard Schmitzer 2017-05-30 1 Introduction 1.1 Reminders on Measure Theory Reference: Ambrosio, Fusco, Pallara: Functions of Bounded Variation and Free Discontinuity

More information

Numerical Optimal Transport. Applications

Numerical Optimal Transport. Applications Numerical Optimal Transport http://optimaltransport.github.io Applications Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE Grayscale Image Equalization Figure 2.9: 1-D optimal couplings:

More information

Superhedging and distributionally robust optimization with neural networks

Superhedging and distributionally robust optimization with neural networks Superhedging and distributionally robust optimization with neural networks Stephan Eckstein joint work with Michael Kupper and Mathias Pohl Robust Techniques in Quantitative Finance University of Oxford

More information

Lecture 14: Deep Generative Learning

Lecture 14: Deep Generative Learning Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han

More information

Optimal transportation and optimal control in a finite horizon framework

Optimal transportation and optimal control in a finite horizon framework Optimal transportation and optimal control in a finite horizon framework Guillaume Carlier and Aimé Lachapelle Université Paris-Dauphine, CEREMADE July 2008 1 MOTIVATIONS - A commitment problem (1) Micro

More information

arxiv: v2 [math.ap] 23 Apr 2014

arxiv: v2 [math.ap] 23 Apr 2014 Multi-marginal Monge-Kantorovich transport problems: A characterization of solutions arxiv:1403.3389v2 [math.ap] 23 Apr 2014 Abbas Moameni Department of Mathematics and Computer Science, University of

More information

Optimal Transport for Data Analysis

Optimal Transport for Data Analysis Optimal Transport for Data Analysis Bernhard Schmitzer 2017-05-16 1 Introduction 1.1 Reminders on Measure Theory Reference: Ambrosio, Fusco, Pallara: Functions of Bounded Variation and Free Discontinuity

More information

Math 5051 Measure Theory and Functional Analysis I Homework Assignment 3

Math 5051 Measure Theory and Functional Analysis I Homework Assignment 3 Math 551 Measure Theory and Functional Analysis I Homework Assignment 3 Prof. Wickerhauser Due Monday, October 12th, 215 Please do Exercises 3*, 4, 5, 6, 8*, 11*, 17, 2, 21, 22, 27*. Exercises marked with

More information

Information Geometry

Information Geometry 2015 Workshop on High-Dimensional Statistical Analysis Dec.11 (Friday) ~15 (Tuesday) Humanities and Social Sciences Center, Academia Sinica, Taiwan Information Geometry and Spontaneous Data Learning Shinto

More information

Dynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Was

Dynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Was Dynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Wasserstein Space With many discussions with Yann Brenier and Wilfrid Gangbo Brenierfest, IHP, January 9-13, 2017 ain points of the

More information

Optimal Transport And WGAN

Optimal Transport And WGAN Optimal Transport And WGAN Yiping Lu 1 1 School of Mathmetical Science Peking University Data Science Seminar, PKU, 2017 Yiping Lu (pku) Geometry Of WGAN 1/34 Outline 1 Introduction To Optimal Transport.

More information

Some theoretical properties of GANs. Gérard Biau Toulouse, September 2018

Some theoretical properties of GANs. Gérard Biau Toulouse, September 2018 Some theoretical properties of GANs Gérard Biau Toulouse, September 2018 Coauthors Benoît Cadre (ENS Rennes) Maxime Sangnier (Sorbonne University) Ugo Tanielian (Sorbonne University & Criteo) 1 video Source:

More information

Stability results for Logarithmic Sobolev inequality

Stability results for Logarithmic Sobolev inequality Stability results for Logarithmic Sobolev inequality Daesung Kim (joint work with Emanuel Indrei) Department of Mathematics Purdue University September 20, 2017 Daesung Kim (Purdue) Stability for LSI Probability

More information

PATH FUNCTIONALS OVER WASSERSTEIN SPACES. Giuseppe Buttazzo. Dipartimento di Matematica Università di Pisa.

PATH FUNCTIONALS OVER WASSERSTEIN SPACES. Giuseppe Buttazzo. Dipartimento di Matematica Università di Pisa. PATH FUNCTIONALS OVER WASSERSTEIN SPACES Giuseppe Buttazzo Dipartimento di Matematica Università di Pisa buttazzo@dm.unipi.it http://cvgmt.sns.it ENS Ker-Lann October 21-23, 2004 Several natural structures

More information

Numerical Approximation of L 1 Monge-Kantorovich Problems

Numerical Approximation of L 1 Monge-Kantorovich Problems Rome March 30, 2007 p. 1 Numerical Approximation of L 1 Monge-Kantorovich Problems Leonid Prigozhin Ben-Gurion University, Israel joint work with J.W. Barrett Imperial College, UK Rome March 30, 2007 p.

More information

On a Class of Multidimensional Optimal Transportation Problems

On a Class of Multidimensional Optimal Transportation Problems Journal of Convex Analysis Volume 10 (2003), No. 2, 517 529 On a Class of Multidimensional Optimal Transportation Problems G. Carlier Université Bordeaux 1, MAB, UMR CNRS 5466, France and Université Bordeaux

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Optimal Transport for Applied Mathematicians

Optimal Transport for Applied Mathematicians Optimal Transport for Applied Mathematicians Calculus of Variations, PDEs and Modelling Filippo Santambrogio 1 1 Laboratoire de Mathématiques d Orsay, Université Paris Sud, 91405 Orsay cedex, France filippo.santambrogio@math.u-psud.fr,

More information

Penalized Barycenters in the Wasserstein space

Penalized Barycenters in the Wasserstein space Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles

More information

arxiv: v1 [math.oc] 27 May 2016

arxiv: v1 [math.oc] 27 May 2016 Stochastic Optimization for Large-scale Optimal Transport arxiv:1605.08527v1 [math.oc] 27 May 2016 Aude Genevay CEREMADE, Univ. Paris-Dauphine INRIA Mokaplan project-team genevay@ceremade.dauphine.fr Gabriel

More information

Martingale optimal transport with Monge s cost function

Martingale optimal transport with Monge s cost function Martingale optimal transport with Monge s cost function Martin Klimmek, joint work with David Hobson (Warwick) klimmek@maths.ox.ac.uk Crete July, 2013 ca. 1780 c(x, y) = x y No splitting, transport along

More information

Regularity for the optimal transportation problem with Euclidean distance squared cost on the embedded sphere

Regularity for the optimal transportation problem with Euclidean distance squared cost on the embedded sphere Regularity for the optimal transportation problem with Euclidean distance squared cost on the embedded sphere Jun Kitagawa and Micah Warren January 6, 011 Abstract We give a sufficient condition on initial

More information

arxiv: v3 [stat.ml] 12 Mar 2018

arxiv: v3 [stat.ml] 12 Mar 2018 Wasserstein Auto-Encoders Ilya Tolstikhin 1, Olivier Bousquet 2, Sylvain Gelly 2, and Bernhard Schölkopf 1 1 Max Planck Institute for Intelligent Systems 2 Google Brain arxiv:1711.01558v3 [stat.ml] 12

More information

CS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018

CS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 1 Overview This lecture mainly covers Recall the statistical theory of GANs

More information

Optimal Transport in Risk Analysis

Optimal Transport in Risk Analysis Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and

More information

Stochastic Optimization for Large-scale Optimal Transport

Stochastic Optimization for Large-scale Optimal Transport Stochastic Optimization for Large-scale Optimal Transport Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach To cite this version: Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach. Stochastic

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

NEW FUNCTIONAL INEQUALITIES

NEW FUNCTIONAL INEQUALITIES 1 / 29 NEW FUNCTIONAL INEQUALITIES VIA STEIN S METHOD Giovanni Peccati (Luxembourg University) IMA, Minneapolis: April 28, 2015 2 / 29 INTRODUCTION Based on two joint works: (1) Nourdin, Peccati and Swan

More information

GAN, Optimal Transport and Monge-Ampère Equation

GAN, Optimal Transport and Monge-Ampère Equation GAN, Optimal Transport and Monge-Ampère Equation David Xianfeng Gu 1,2, Shing-Tung Yau 2 1 Computer Science & Applied Mathematics Stony Brook University 2 Center of Mathematical Sciences and Applications

More information

Energy-Based Generative Adversarial Network

Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network J. Zhao, M. Mathieu and Y. LeCun Learning to Draw Samples: With Application to Amoritized MLE for Generalized Adversarial

More information

A Closed-form Gradient for the 1D Earth Mover s Distance for Spectral Deep Learning on Biological Data

A Closed-form Gradient for the 1D Earth Mover s Distance for Spectral Deep Learning on Biological Data A Closed-form Gradient for the D Earth Mover s Distance for Spectral Deep Learning on Biological Data Manuel Martinez, Makarand Tapaswi, and Rainer Stiefelhagen Karlsruhe Institute of Technology, Karlsruhe,

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Gradient Flow in the Wasserstein Metric

Gradient Flow in the Wasserstein Metric 3 2 1 Time Position -0.5 0.0 0.5 0 Graient Flow in the Wasserstein Metric Katy Craig University of California, Santa Barbara JMM, AMS Metric Geometry & Topology January 11th, 2018 graient flow an PDE Examples:

More information

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds CS 6347 Lecture 8 & 9 Lagrange Multipliers & Varitional Bounds General Optimization subject to: min ff 0() R nn ff ii 0, h ii = 0, ii = 1,, mm ii = 1,, pp 2 General Optimization subject to: min ff 0()

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Revisiting the Exploration-Exploitation Tradeoff in Bandit Models

Revisiting the Exploration-Exploitation Tradeoff in Bandit Models Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making

More information

Convex discretization of functionals involving the Monge-Ampère operator

Convex discretization of functionals involving the Monge-Ampère operator Convex discretization of functionals involving the Monge-Ampère operator Quentin Mérigot CNRS / Université Paris-Dauphine Joint work with J.D. Benamou, G. Carlier and É. Oudet GeMeCod Conference October

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Entropic and Displacement Interpolation: Tryphon Georgiou

Entropic and Displacement Interpolation: Tryphon Georgiou Entropic and Displacement Interpolation: Schrödinger bridges, Monge-Kantorovich optimal transport, and a Computational Approach Based on the Hilbert Metric Tryphon Georgiou University of Minnesota IMA

More information

Convex relaxation for Combinatorial Penalties

Convex relaxation for Combinatorial Penalties Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Geometry and Optimization of Relative Arbitrage

Geometry and Optimization of Relative Arbitrage Geometry and Optimization of Relative Arbitrage Ting-Kam Leonard Wong joint work with Soumik Pal Department of Mathematics, University of Washington Financial/Actuarial Mathematic Seminar, University of

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Convex inequalities, isoperimetry and spectral gap III

Convex inequalities, isoperimetry and spectral gap III Convex inequalities, isoperimetry and spectral gap III Jesús Bastero (Universidad de Zaragoza) CIDAMA Antequera, September 11, 2014 Part III. K-L-S spectral gap conjecture KLS estimate, through Milman's

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

APPLICATIONS DU TRANSPORT OPTIMAL DANS LES JEUX À POTENTIEL

APPLICATIONS DU TRANSPORT OPTIMAL DANS LES JEUX À POTENTIEL APPLICATIONS DU TRANSPORT OPTIMAL DANS LES JEUX À POTENTIEL Adrien Blanchet IAST/TSE/Université Toulouse 1 Capitole Journées SMAI-MODE 216 En collaboration avec G. Carlier & P. Mossay & F. Santambrogio

More information

Logarithmic Sobolev Inequalities

Logarithmic Sobolev Inequalities Logarithmic Sobolev Inequalities M. Ledoux Institut de Mathématiques de Toulouse, France logarithmic Sobolev inequalities what they are, some history analytic, geometric, optimal transportation proofs

More information

Second order models for optimal transport and cubic splines on the Wasserstein space

Second order models for optimal transport and cubic splines on the Wasserstein space Second order models for optimal transport and cubic splines on the Wasserstein space Jean-David Benamou, Thomas Gallouët, François-Xavier Vialard To cite this version: Jean-David Benamou, Thomas Gallouët,

More information

arxiv: v1 [math.oc] 29 Dec 2017

arxiv: v1 [math.oc] 29 Dec 2017 Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications Ernest K. Ryu, Yongxin Chen, Wuchen Li, and Stanley Osher arxiv:1712.10279v1 [math.oc] 29 Dec 2017 December 29, 2017 Abstract

More information

Determinant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University

Determinant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University Determinant maximization with linear matrix inequality constraints S. Boyd, L. Vandenberghe, S.-P. Wu Information Systems Laboratory Stanford University SCCM Seminar 5 February 1996 1 MAXDET problem denition

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Large Scale Semi-supervised Linear SVMs. University of Chicago

Large Scale Semi-supervised Linear SVMs. University of Chicago Large Scale Semi-supervised Linear SVMs Vikas Sindhwani and Sathiya Keerthi University of Chicago SIGIR 2006 Semi-supervised Learning (SSL) Motivation Setting Categorize x-billion documents into commercial/non-commercial.

More information

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that

More information

Free Energy, Fokker-Planck Equations, and Random walks on a Graph with Finite Vertices

Free Energy, Fokker-Planck Equations, and Random walks on a Graph with Finite Vertices Free Energy, Fokker-Planck Equations, and Random walks on a Graph with Finite Vertices Haomin Zhou Georgia Institute of Technology Jointly with S.-N. Chow (Georgia Tech) Wen Huang (USTC) Yao Li (NYU) Research

More information

A convex relaxation for weakly supervised classifiers

A convex relaxation for weakly supervised classifiers A convex relaxation for weakly supervised classifiers Armand Joulin and Francis Bach SIERRA group INRIA -Ecole Normale Supérieure ICML 2012 Weakly supervised classification We adress the problem of weakly

More information

Introduction to Optimal Transport Theory

Introduction to Optimal Transport Theory Introduction to Optimal Transport Theory Filippo Santambrogio Grenoble, June 15th 2009 These very short lecture notes do not want to be an exhaustive presentation of the topic, but only a short list of

More information

A Gaussian Type Kernel for Persistence Diagrams

A Gaussian Type Kernel for Persistence Diagrams TRIPODS Summer Bootcamp: Topology and Machine Learning Brown University, August 2018 A Gaussian Type Kernel for Persistence Diagrams Mathieu Carrière joint work with S. Oudot and M. Cuturi Persistence

More information

Part 4: Conditional Random Fields

Part 4: Conditional Random Fields Part 4: Conditional Random Fields Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 1 / 39 Problem (Probabilistic Learning) Let d(y x) be the (unknown) true conditional distribution.

More information

11. Learning graphical models

11. Learning graphical models Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical

More information

arxiv: v1 [cs.sy] 14 Apr 2013

arxiv: v1 [cs.sy] 14 Apr 2013 Matrix-valued Monge-Kantorovich Optimal Mass Transport Lipeng Ning, Tryphon T. Georgiou and Allen Tannenbaum Abstract arxiv:34.393v [cs.sy] 4 Apr 3 We formulate an optimal transport problem for matrix-valued

More information

Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem

Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem https://doi.org/10.1007/s41884-018-0002-8 RESEARCH PAPER Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem Shun-ichi Amari

More information

Optimal Transport for Domain Adaptation

Optimal Transport for Domain Adaptation Optimal Transport for Domain Adaptation Nicolas Courty, Rémi Flamary, Devis Tuia, Alain Rakotomamonjy To cite this version: Nicolas Courty, Rémi Flamary, Devis Tuia, Alain Rakotomamonjy. Optimal Transport

More information

Geometrical Insights for Implicit Generative Modeling

Geometrical Insights for Implicit Generative Modeling Geometrical Insights for Implicit Generative Modeling Leon Bottou a,b, Martin Arjovsky b, David Lopez-Paz a, Maxime Oquab a,c a Facebook AI Research, New York, Paris b New York University, New York c Inria,

More information

The optimal partial transport problem

The optimal partial transport problem The optimal partial transport problem Alessio Figalli Abstract Given two densities f and g, we consider the problem of transporting a fraction m [0, min{ f L 1, g L 1}] of the mass of f onto g minimizing

More information

SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE

SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE FRANÇOIS BOLLEY Abstract. In this note we prove in an elementary way that the Wasserstein distances, which play a basic role in optimal transportation

More information

An Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University

An Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University An Overview of Edward: A Probabilistic Programming System Dustin Tran Columbia University Alp Kucukelbir Eugene Brevdo Andrew Gelman Adji Dieng Maja Rudolph David Blei Dawen Liang Matt Hoffman Kevin Murphy

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

On dissipation distances for reaction-diffusion equations the Hellinger-Kantorovich distance

On dissipation distances for reaction-diffusion equations the Hellinger-Kantorovich distance Weierstrass Institute for! Applied Analysis and Stochastics On dissipation distances for reaction-diffusion equations the Matthias Liero, joint work with Alexander Mielke, Giuseppe Savaré Mohrenstr. 39

More information