Generative Models and Optimal Transport
|
|
- Barnard Allen
- 5 years ago
- Views:
Transcription
1 Generative Models and Optimal Transport Marco Cuturi Joint work / work in progress with G. Peyré, A. Genevay (ENS), F. Bach (INRIA), G. Montavon, K-R Müller (TU Berlin)
2 Statistics 0.1 : Density Fitting We collect data NX data = 1 N i=1 x i data 2
3 Statistics 0.1 : Density Fitting We collect data NX data = 1 N i=1 x i p 0 data We fit a parametric family of densities {p, 2 } e.g. =(m, ); p = N (m, ) 2
4 Density Fitting p 1 data
5 Density Fitting data p 2
6 Density Fitting data p done! We stop when there is a good fit.
7 Maximum Likelihood Estimation data NX max 2 1 N i=1 log p (x i ) p done!
8 Maximum Likelihood Estimation data NX max 2 1 N i=1 log p (x i ) log 0 = 1 p (x i ) must be > 0 p done!
9 Maximum Likelihood Estimation Equivalent to a KL projection in the space of probability measures data data {p, 2 } p 1 KL p done! p 2 p done! min KL( datakp ) 2
10 Maximum Likelihood Estimation Equivalent to a KL projection in the space of probability measures data data KL {p, 2 } p 1 p 0 p done! p 2 p done! min KL( datakp ) 2
11 In higher dimensional spaces data min KL( datakp ) 2 8
12 In higher dimensional spaces p data min KL( datakp ) 2 8
13 In higher dimensional spaces p data Data space has dimension
14 Generative Models data 10
15 Generative Models µ latent space data data space 10
16 Generative Models µ f : latent space! data space latent space data data space 10
17 Generative Models µ z latent space f : latent space! data space data z = data space 10
18 Generative Models µ z f : latent space! data space latent space f (z) data data space z = f 10
19 Generative Models µ z f : latent space! data space latent space f (z) data data space z = f 10
20 Generative Models µ f : latent space! data space latent space data f ] µ data space 10
21 Generative Models µ f : latent space! data space latent space data f ] µ data space Push-forward: 8B, f ] µ(b) :=µ(f 1 (B)) 10
22 Generative Models µ f : latent space! data space latent space data f ] µ data space Goal: find such that f ] µ fits data 11
23 Generative Models µ f : latent space! data space latent space data f ] µ data space Goal: find such that f ] µ fits data 11
24 Generative Models µ f : latent space! data space latent space data f ] µ data space Di erence between fitting a push forward measure f ] µ vs. a density p? 12
25 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log p (x i ) = min KL( datakp ) 2 13
26 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log f ] µ(x i ) min KL( datakf ] µ) 2 14
27 Generative Models µ f : latent space! data space latent space data f ] µ data space MLE max 2 1 N NX i=1 log f ] µ(x i ) min KL( datakf ] µ) 2 14
28 Generative Models µ f : latent space! data space latent space data f ] µ data space Need a more flexible discrepancy function to compare data and f ] µ 15
29 µ Workarounds? latent space data space data Formulation as adversarial problem [GPM 14] min 2 max Accuracy g ((f ] µ, +1), ( data, 1)) classifiers g Use a richer metric for probability measures, able to handle measures with non-overlapping supports: min 2 ( data, p ), not min 2 KL( datakp ) 16
30 Minimum Estimation l 1 17
31 Minimum Kantorovich Estimation Use optimal transport theory, namely Wasserstein distances to define discrepancy. min W ( data,f ] µ) 2 Optimal transport? fertile field in mathematics. Monge Kantorovich Koopmans Dantzig Brenier Otto McCann Villani Nobel 75 Fields 10 18
32 What is Optimal Transport? A geometric toolbox to compare probability measures supported on a metric space. p h 1 p 0 d h 2 Statistical Models Bags of features Brain Activation Maps Empirical Measures, i.e. data µ 19 Color Histograms
33 What is Optimal Transport? A geometric toolbox to compare probability measures supported on a metric space. p p 0 d h 2 Statistical Models Bags of features Brain Activation Maps Empirical Measures, i.e. data µ 20 Color Histograms
34 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. 21 [SDPC.. 15]
35 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. Wasserstein Barycenters [Agueh 11] 21 [SDPC.. 15]
36 What is Optimal Transport? A powerful geometric toolbox to compare probability measures. Wasserstein Barycenters [Agueh 11] 22 [SDPC.. 15]
37 Origins: Monge s Problem 23
38 Origins: Monge s Problem 24
39 Origins: Monge s Problem 24
40 Origins: Monge s Problem 24
41 Origins: Monge s Problem 24
42 Origins: Monge s Problem 24
43 Origins: Monge s Problem 24
44 Origins: Monge s Problem x 24
45 Origins: Monge s Problem x 24
46 Origins: Monge s Problem x y = T (x) 24
47 Origins: Monge s Problem x y = T (x) D(x, T (x)) 24
48 Origins: Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x T (x) 25
49 Origins: Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: find a map T :! [Brenier 87] If = R d, c = k k 2, µ, a.c., then T = ru, u convex. x T (x) 25
50 Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x T (x) 26
51 Monge s Problem a probability space, c :! R. µ, two probability measures in P( ). [Monge 81] problem: Z find a map T :! inf T ] µ= c(x, T (x))µ(dx) x 26
52 [Kantorovich 42] Relaxation Instead of maps T :!, consider P 2 P( ) probabilistic maps, i.e. couplings : (µ, ) def = {P 2 P( ) 8A, B, P (A ) =µ(a), P ( B) = (B)} 27
53 [Kantorovich 42] Relaxation (µ, ) def = {P 2 P( ) 8A, B, P (A ) =µ(a), P ( B) = (B)} } { } { } { 0.6 µ(x) ν(y) 0.4 P x y 3 4 P (x, y)
54 [Kantorovich 42] Relaxation Joint Probabilities of (µ, ν) (µ, ) def = {P 2 P( ) 8A, B, Π(µ, ν) = For probability P Π(µ, measures ν), on Ω 2 P (A ) =µ(a), P ( B) = (B)} P ({x},} Ω) =µ({x}) with { marginals } and P µ (Ω, and{ {y}) ν. } =ν({y}) { 0.6 µ(x) ν(y) 0.4 P x y 3 4 P (x, y)
55 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL 29
56 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL 29
57 Wasserstein Distances Def. For p 1, the p-wasserstein distance between µ, in P( ), defined by a metric D on, ZZ W p p (µ, ) def = inf P 2 (µ, ) D(x, y) p P (dx, dy). PRIMAL Z Z W p p (µ, ) = sup '2L 1 (µ), 2L 1 ( ) '(x)+ (y)appled p (x,y) 29 'dµ + d. DUAL
58 W is versatile Discrete - Discrete Discrete - Continuous Continuous - Continuous 30
59 W is versatile Discrete - Discrete - Network flow solvers - Entropic regularization Discrete - Continuous low dim. [M 11][KMB 16] [L 15] Continuous - Continuous Stochastic Optimization [GCPB 16] 30
60 Minimum Kantorovich Estimators min W ( data,f ] µ) 2 [Bassetti 06] 1st reference discussing this approach. [MMC 16] use regularization in a finite setting. [ACB 17] (WGAN) [BJGR 17] (Wasserstein ABC). Hot topics: approximate & differentiate W efficiently. Today: ideas from our recent preprint [GPC 17] 31
61 Wasserstein on Empirical Measures nx µ = i=1 a i xi (, D) mx 32 = j=1 b j yj
62 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} 2 y 1... y m 3 2 b 1... b m 3 x 1. 6 D(x i, y j ) p 4 x n a 1. 6 P 1 m = a 4 a n 7 5
63 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} 2 y 1... y m x 1. 6 D(x i, y j ) p 4 x n b 1... b m a P T 1 4 n = b a n 3 7 5
64 Wasserstein on Empirical Measures nx mx Consider µ = a i xi and = b j yj. i=1 M XY def =[D(x i, y j ) p ] ij j=1 U(a, b) def = {P 2 R n m + P 1 m = a,p T 1 n = b} Def. Optimal Transport Problem W p p (µ, ) = min hp,m XY i P 2U(a,b) 33
65 Discrete OT Problem M XY U(a, b) 34
66 Discrete OT Problem M XY P? U(a, b) 35
67 Discrete OT Problem M XY P? U(a, b) Def. Dual OT problem W p p (µ, ) = max T a + 2R n, 2R m i + j appled(x i,y j ) p 35 T b
68 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Note: flow/pde formulations [Beckman 61]/[Benamou 98] can be used for p=1/p=2 for a sparse-graph metric/euclidean metric. 35
69 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? 36
70 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? P? Solution unstable and not always unique. 36
71 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) {P? } P? Solution unstable and not always unique. 36
72 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) {P? } P? Solution unstable and not always unique. 37
73 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Solution unstable and not always unique. P? 37
74 Discrete OT Problem M XY network flow solver used in practice. O(n 3 log(n)) U(a, b) P? Solution unstable and not always unique. P? W p p (µ, ) not di erentiable. 37
75 Entropic Regularization [Wilson 62] Def. Regularized Wasserstein, 0 W (µ, ) def = min P 2U(a,b) hp,m XY i E(P ) E(P ) def = nmx i,j=1 P ij (log P ij ) Note: Unique optimal solution because of strong concavity of Entropy 38
76 Entropic Regularization [Wilson 62] Def. Regularized Wasserstein, 0 W (µ, ) def = min P 2U(a,b) hp,m XY i E(P ) µ P Note: Unique optimal solution because of strong concavity of Entropy 38
77 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / 39
78 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / L(P,, )= X ij P ij M ij + P ij log P ij + T (P 1 a)+ T (P T 1 ij = M ij + (log P ij + 1) + i + j (@L/@P ij = 0) )P ij = e i e M ij e j = u i K ij v j 39
79 Fast & Scalable Algorithm Prop. If P def = argmin hp,m XY i E(P ) P 2U(a,b) then 9!u 2 R n +, v 2 R m +, such that P = diag(u)kdiag(v), K def = e M XY / [Sinkhorn 64] fixed-point iterations for (u, v) u a/kv, v b/k T u O(nm) complexity, GPGPU parallel [C 13]. O(n d+1 ) if = {1,...,n} d and separable. D p [S..C.. 15] 39
80 Sinkhorn Divergence Def. For > 0, let W (µ, ) def = hp,m XY i Prop. W (µ, µ) > 0 Def. Normalized Sinkhorn Divergence W (µ, ) def = W (µ, ) 1 2 (W (µ, µ)+w (, )) Prop. If p = 1, W (µ, )!!1 ED(µ, ) 40
81 Algorithmic Formulation Def. For L 1, define W L (µ, ) def = hp L,M XY i, where P L def = diag(u L )Kdiag(v L ), v 0 = 1 m ; l 0, u l def = a/kv l, v l+1 def = b/k T u l. L can be computed recursively, in O(L) kernelk vector products. 41
82 Proposal: Autodiff OT using Sinkhorn Approximate W loss by the transport cost W L after L Sinkhorn iterations. (z1,...,zm) (x1,...,xm) Input data (c(x i,y j )) i,j e C/" C... (y1,...,yn) K nk 1 b`+1 m 1/ 1/ ÊL ( ) ` ` +1 mk > Generative model Sinkhorn ` =1,...,L 1 b` a`+1 h(c K)b L,a L i 42 [GPC 17]
83 Example: MNIST, Learning f 43
84 Example: Generation of Images (a) MMD (b) (c) MMD-GAN τ = 1000 τ=10 CIFAR 10 images In these examples the cost function is also learned adversarially, as a NN mapping onto feature vectors. 44
85 Concluding Remarks Regularized OT is much faster than OT. Regularized OT can interpolate between W and the MMD / Energy distance metrics. The solution of regularized OT is auto-differentiable. Many open problems remain! NIPS 17 WORKSHOP NIPS 17 TUTORIAL 45
Numerical Optimal Transport and Applications
Numerical Optimal Transport and Applications Gabriel Peyré Joint works with: Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Justin Solomon www.numerical-tours.com Histograms in Imaging
More informationsoftened probabilistic
Justin Solomon MIT Understand geometry from a softened probabilistic standpoint. Somewhere over here. Exactly here. One of these two places. Query 1 2 Which is closer, 1 or 2? Query 1 2 Which is closer,
More informationOptimal Transport: A Crash Course
Optimal Transport: A Crash Course Soheil Kolouri and Gustavo K. Rohde HRL Laboratories, University of Virginia Introduction What is Optimal Transport? The optimal transport problem seeks the most efficient
More informationOptimal Transport in ML
Optimal Transport in ML Rémi Gilleron Inria Lille & CRIStAL & Univ. Lille Feb. 2018 Main Source also some figures Computational Optimal transport, G. Peyré and M. Cuturi Rémi Gilleron (Magnet) OT Feb.
More informationOptimal transport for Gaussian mixture models
1 Optimal transport for Gaussian mixture models Yongxin Chen, Tryphon T. Georgiou and Allen Tannenbaum Abstract arxiv:1710.07876v2 [math.pr] 30 Jan 2018 We present an optimal mass transport framework on
More informationOptimal mass transport as a distance measure between images
Optimal mass transport as a distance measure between images Axel Ringh 1 1 Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden. 21 st of June 2018 INRIA, Sophia-Antipolis, France
More informationWasserstein Training of Boltzmann Machines
Wasserstein Training of Boltzmann Machines Grégoire Montavon, Klaus-Rober Muller, Marco Cuturi Presenter: Shiyu Liang December 1, 2016 Coordinated Science Laboratory Department of Electrical and Computer
More informationAdaGAN: Boosting Generative Models
AdaGAN: Boosting Generative Models Ilya Tolstikhin ilya@tuebingen.mpg.de joint work with Gelly 2, Bousquet 2, Simon-Gabriel 1, Schölkopf 1 1 MPI for Intelligent Systems 2 Google Brain Radford et al., 2015)
More informationOptimal Transport and Wasserstein Distance
Optimal Transport and Wasserstein Distance The Wasserstein distance which arises from the idea of optimal transport is being used more and more in Statistics and Machine Learning. In these notes we review
More informationNishant Gurnani. GAN Reading Group. April 14th, / 107
Nishant Gurnani GAN Reading Group April 14th, 2017 1 / 107 Why are these Papers Important? 2 / 107 Why are these Papers Important? Recently a large number of GAN frameworks have been proposed - BGAN, LSGAN,
More informationOptimal transport for machine learning
Optimal transport for machine learning Rémi Flamary AG GDR ISIS, Sète, 16 Novembre 2017 2 / 37 Collaborators N. Courty A. Rakotomamonjy D. Tuia A. Habrard M. Cuturi M. Perrot C. Févotte V. Emiya V. Seguy
More informationarxiv: v1 [cs.lg] 13 Nov 2018
SEMI-DUAL REGULARIZED OPTIMAL TRANSPORT MARCO CUTURI AND GABRIEL PEYRÉ arxiv:1811.05527v1 [cs.lg] 13 Nov 2018 Abstract. Variational problems that involve Wasserstein distances and more generally optimal
More informationIntroduction to optimal transport
Introduction to optimal transport Nicola Gigli May 20, 2011 Content Formulation of the transport problem The notions of c-convexity and c-cyclical monotonicity The dual problem Optimal maps: Brenier s
More informationOptimal Transport Methods in Operations Research and Statistics
Optimal Transport Methods in Operations Research and Statistics Jose Blanchet (based on work with F. He, Y. Kang, K. Murthy, F. Zhang). Stanford University (Management Science and Engineering), and Columbia
More informationThe dynamics of Schrödinger bridges
Stochastic processes and statistical machine learning February, 15, 2018 Plan of the talk The Schrödinger problem and relations with Monge-Kantorovich problem Newton s law for entropic interpolation The
More informationMathematical Foundations of Data Sciences
Mathematical Foundations of Data Sciences Gabriel Peyré CNRS & DMA École Normale Supérieure gabriel.peyre@ens.fr https://mathematical-tours.github.io www.numerical-tours.com January 7, 2018 Chapter 18
More informationApproximations of displacement interpolations by entropic interpolations
Approximations of displacement interpolations by entropic interpolations Christian Léonard Université Paris Ouest Mokaplan 10 décembre 2015 Interpolations in P(X ) X : Riemannian manifold (state space)
More informationWasserstein GAN. Juho Lee. Jan 23, 2017
Wasserstein GAN Juho Lee Jan 23, 2017 Wasserstein GAN (WGAN) Arxiv submission Martin Arjovsky, Soumith Chintala, and Léon Bottou A new GAN model minimizing the Earth-Mover s distance (Wasserstein-1 distance)
More informationSome geometric consequences of the Schrödinger problem
Some geometric consequences of the Schrödinger problem Christian Léonard To cite this version: Christian Léonard. Some geometric consequences of the Schrödinger problem. Second International Conference
More informationInformation theoretic perspectives on learning algorithms
Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez
More informationGeometric Inference for Probability distributions
Geometric Inference for Probability distributions F. Chazal 1 D. Cohen-Steiner 2 Q. Mérigot 2 1 Geometrica, INRIA Saclay, 2 Geometrica, INRIA Sophia-Antipolis 2009 June 1 Motivation What is the (relevant)
More informationConcentration inequalities: basics and some new challenges
Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,
More informationDiscrete transport problems and the concavity of entropy
Discrete transport problems and the concavity of entropy Bristol Probability and Statistics Seminar, March 2014 Funded by EPSRC Information Geometry of Graphs EP/I009450/1 Paper arxiv:1303.3381 Motivating
More informationOptimal Transportation. Nonlinear Partial Differential Equations
Optimal Transportation and Nonlinear Partial Differential Equations Neil S. Trudinger Centre of Mathematics and its Applications Australian National University 26th Brazilian Mathematical Colloquium 2007
More informationA new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems
A new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems Giuseppe Savaré http://www.imati.cnr.it/ savare Dipartimento di Matematica, Università di Pavia Nonlocal
More informationUnderstanding GANs: Back to the basics
Understanding GANs: Back to the basics David Tse Stanford University Princeton University May 15, 2018 Joint work with Soheil Feizi, Farzan Farnia, Tony Ginart, Changho Suh and Fei Xia. GANs at NIPS 2017
More informationConvex discretization of functionals involving the Monge-Ampère operator
Convex discretization of functionals involving the Monge-Ampère operator Quentin Mérigot CNRS / Université Paris-Dauphine Joint work with J.D. Benamou, G. Carlier and É. Oudet Workshop on Optimal Transport
More informationFast Computation of Wasserstein Barycenters
Marco Cuturi Graduate School of Informatics, Kyoto University Arnaud Doucet Department of Statistics, University of Oxford MCUTURI@I.KYOTO-U.AC.JP DOUCET@STAT.OXFORD.AC.UK Abstract We present new algorithms
More informationGeometric Inference for Probability distributions
Geometric Inference for Probability distributions F. Chazal 1 D. Cohen-Steiner 2 Q. Mérigot 2 1 Geometrica, INRIA Saclay, 2 Geometrica, INRIA Sophia-Antipolis 2009 July 8 F. Chazal, D. Cohen-Steiner, Q.
More informationDisplacement convexity of the relative entropy in the discrete h
Displacement convexity of the relative entropy in the discrete hypercube LAMA Université Paris Est Marne-la-Vallée Phenomena in high dimensions in geometric analysis, random matrices, and computational
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationOptimal Transport for Data Analysis
Optimal Transport for Data Analysis Bernhard Schmitzer 2017-05-30 1 Introduction 1.1 Reminders on Measure Theory Reference: Ambrosio, Fusco, Pallara: Functions of Bounded Variation and Free Discontinuity
More informationNumerical Optimal Transport. Applications
Numerical Optimal Transport http://optimaltransport.github.io Applications Gabriel Peyré www.numerical-tours.com ÉCOLE NORMALE SUPÉRIEURE Grayscale Image Equalization Figure 2.9: 1-D optimal couplings:
More informationSuperhedging and distributionally robust optimization with neural networks
Superhedging and distributionally robust optimization with neural networks Stephan Eckstein joint work with Michael Kupper and Mathias Pohl Robust Techniques in Quantitative Finance University of Oxford
More informationLecture 14: Deep Generative Learning
Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han
More informationOptimal transportation and optimal control in a finite horizon framework
Optimal transportation and optimal control in a finite horizon framework Guillaume Carlier and Aimé Lachapelle Université Paris-Dauphine, CEREMADE July 2008 1 MOTIVATIONS - A commitment problem (1) Micro
More informationarxiv: v2 [math.ap] 23 Apr 2014
Multi-marginal Monge-Kantorovich transport problems: A characterization of solutions arxiv:1403.3389v2 [math.ap] 23 Apr 2014 Abbas Moameni Department of Mathematics and Computer Science, University of
More informationOptimal Transport for Data Analysis
Optimal Transport for Data Analysis Bernhard Schmitzer 2017-05-16 1 Introduction 1.1 Reminders on Measure Theory Reference: Ambrosio, Fusco, Pallara: Functions of Bounded Variation and Free Discontinuity
More informationMath 5051 Measure Theory and Functional Analysis I Homework Assignment 3
Math 551 Measure Theory and Functional Analysis I Homework Assignment 3 Prof. Wickerhauser Due Monday, October 12th, 215 Please do Exercises 3*, 4, 5, 6, 8*, 11*, 17, 2, 21, 22, 27*. Exercises marked with
More informationInformation Geometry
2015 Workshop on High-Dimensional Statistical Analysis Dec.11 (Friday) ~15 (Tuesday) Humanities and Social Sciences Center, Academia Sinica, Taiwan Information Geometry and Spontaneous Data Learning Shinto
More informationDynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Was
Dynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Wasserstein Space With many discussions with Yann Brenier and Wilfrid Gangbo Brenierfest, IHP, January 9-13, 2017 ain points of the
More informationOptimal Transport And WGAN
Optimal Transport And WGAN Yiping Lu 1 1 School of Mathmetical Science Peking University Data Science Seminar, PKU, 2017 Yiping Lu (pku) Geometry Of WGAN 1/34 Outline 1 Introduction To Optimal Transport.
More informationSome theoretical properties of GANs. Gérard Biau Toulouse, September 2018
Some theoretical properties of GANs Gérard Biau Toulouse, September 2018 Coauthors Benoît Cadre (ENS Rennes) Maxime Sangnier (Sorbonne University) Ugo Tanielian (Sorbonne University & Criteo) 1 video Source:
More informationStability results for Logarithmic Sobolev inequality
Stability results for Logarithmic Sobolev inequality Daesung Kim (joint work with Emanuel Indrei) Department of Mathematics Purdue University September 20, 2017 Daesung Kim (Purdue) Stability for LSI Probability
More informationPATH FUNCTIONALS OVER WASSERSTEIN SPACES. Giuseppe Buttazzo. Dipartimento di Matematica Università di Pisa.
PATH FUNCTIONALS OVER WASSERSTEIN SPACES Giuseppe Buttazzo Dipartimento di Matematica Università di Pisa buttazzo@dm.unipi.it http://cvgmt.sns.it ENS Ker-Lann October 21-23, 2004 Several natural structures
More informationNumerical Approximation of L 1 Monge-Kantorovich Problems
Rome March 30, 2007 p. 1 Numerical Approximation of L 1 Monge-Kantorovich Problems Leonid Prigozhin Ben-Gurion University, Israel joint work with J.W. Barrett Imperial College, UK Rome March 30, 2007 p.
More informationOn a Class of Multidimensional Optimal Transportation Problems
Journal of Convex Analysis Volume 10 (2003), No. 2, 517 529 On a Class of Multidimensional Optimal Transportation Problems G. Carlier Université Bordeaux 1, MAB, UMR CNRS 5466, France and Université Bordeaux
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationOptimal Transport for Applied Mathematicians
Optimal Transport for Applied Mathematicians Calculus of Variations, PDEs and Modelling Filippo Santambrogio 1 1 Laboratoire de Mathématiques d Orsay, Université Paris Sud, 91405 Orsay cedex, France filippo.santambrogio@math.u-psud.fr,
More informationPenalized Barycenters in the Wasserstein space
Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles
More informationarxiv: v1 [math.oc] 27 May 2016
Stochastic Optimization for Large-scale Optimal Transport arxiv:1605.08527v1 [math.oc] 27 May 2016 Aude Genevay CEREMADE, Univ. Paris-Dauphine INRIA Mokaplan project-team genevay@ceremade.dauphine.fr Gabriel
More informationMartingale optimal transport with Monge s cost function
Martingale optimal transport with Monge s cost function Martin Klimmek, joint work with David Hobson (Warwick) klimmek@maths.ox.ac.uk Crete July, 2013 ca. 1780 c(x, y) = x y No splitting, transport along
More informationRegularity for the optimal transportation problem with Euclidean distance squared cost on the embedded sphere
Regularity for the optimal transportation problem with Euclidean distance squared cost on the embedded sphere Jun Kitagawa and Micah Warren January 6, 011 Abstract We give a sufficient condition on initial
More informationarxiv: v3 [stat.ml] 12 Mar 2018
Wasserstein Auto-Encoders Ilya Tolstikhin 1, Olivier Bousquet 2, Sylvain Gelly 2, and Bernhard Schölkopf 1 1 Max Planck Institute for Intelligent Systems 2 Google Brain arxiv:1711.01558v3 [stat.ml] 12
More informationCS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018
CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 1 Overview This lecture mainly covers Recall the statistical theory of GANs
More informationOptimal Transport in Risk Analysis
Optimal Transport in Risk Analysis Jose Blanchet (based on work with Y. Kang and K. Murthy) Stanford University (Management Science and Engineering), and Columbia University (Department of Statistics and
More informationStochastic Optimization for Large-scale Optimal Transport
Stochastic Optimization for Large-scale Optimal Transport Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach To cite this version: Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach. Stochastic
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationNEW FUNCTIONAL INEQUALITIES
1 / 29 NEW FUNCTIONAL INEQUALITIES VIA STEIN S METHOD Giovanni Peccati (Luxembourg University) IMA, Minneapolis: April 28, 2015 2 / 29 INTRODUCTION Based on two joint works: (1) Nourdin, Peccati and Swan
More informationGAN, Optimal Transport and Monge-Ampère Equation
GAN, Optimal Transport and Monge-Ampère Equation David Xianfeng Gu 1,2, Shing-Tung Yau 2 1 Computer Science & Applied Mathematics Stony Brook University 2 Center of Mathematical Sciences and Applications
More informationEnergy-Based Generative Adversarial Network
Energy-Based Generative Adversarial Network Energy-Based Generative Adversarial Network J. Zhao, M. Mathieu and Y. LeCun Learning to Draw Samples: With Application to Amoritized MLE for Generalized Adversarial
More informationA Closed-form Gradient for the 1D Earth Mover s Distance for Spectral Deep Learning on Biological Data
A Closed-form Gradient for the D Earth Mover s Distance for Spectral Deep Learning on Biological Data Manuel Martinez, Makarand Tapaswi, and Rainer Stiefelhagen Karlsruhe Institute of Technology, Karlsruhe,
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationGradient Flow in the Wasserstein Metric
3 2 1 Time Position -0.5 0.0 0.5 0 Graient Flow in the Wasserstein Metric Katy Craig University of California, Santa Barbara JMM, AMS Metric Geometry & Topology January 11th, 2018 graient flow an PDE Examples:
More informationCS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds
CS 6347 Lecture 8 & 9 Lagrange Multipliers & Varitional Bounds General Optimization subject to: min ff 0() R nn ff ii 0, h ii = 0, ii = 1,, mm ii = 1,, pp 2 General Optimization subject to: min ff 0()
More informationGame Theory and its Applications to Networks - Part I: Strict Competition
Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More informationConvex discretization of functionals involving the Monge-Ampère operator
Convex discretization of functionals involving the Monge-Ampère operator Quentin Mérigot CNRS / Université Paris-Dauphine Joint work with J.D. Benamou, G. Carlier and É. Oudet GeMeCod Conference October
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationEntropic and Displacement Interpolation: Tryphon Georgiou
Entropic and Displacement Interpolation: Schrödinger bridges, Monge-Kantorovich optimal transport, and a Computational Approach Based on the Hilbert Metric Tryphon Georgiou University of Minnesota IMA
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationGeometry and Optimization of Relative Arbitrage
Geometry and Optimization of Relative Arbitrage Ting-Kam Leonard Wong joint work with Soumik Pal Department of Mathematics, University of Washington Financial/Actuarial Mathematic Seminar, University of
More informationScaling up Bayesian Inference
Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background
More informationConvex inequalities, isoperimetry and spectral gap III
Convex inequalities, isoperimetry and spectral gap III Jesús Bastero (Universidad de Zaragoza) CIDAMA Antequera, September 11, 2014 Part III. K-L-S spectral gap conjecture KLS estimate, through Milman's
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationAPPLICATIONS DU TRANSPORT OPTIMAL DANS LES JEUX À POTENTIEL
APPLICATIONS DU TRANSPORT OPTIMAL DANS LES JEUX À POTENTIEL Adrien Blanchet IAST/TSE/Université Toulouse 1 Capitole Journées SMAI-MODE 216 En collaboration avec G. Carlier & P. Mossay & F. Santambrogio
More informationLogarithmic Sobolev Inequalities
Logarithmic Sobolev Inequalities M. Ledoux Institut de Mathématiques de Toulouse, France logarithmic Sobolev inequalities what they are, some history analytic, geometric, optimal transportation proofs
More informationSecond order models for optimal transport and cubic splines on the Wasserstein space
Second order models for optimal transport and cubic splines on the Wasserstein space Jean-David Benamou, Thomas Gallouët, François-Xavier Vialard To cite this version: Jean-David Benamou, Thomas Gallouët,
More informationarxiv: v1 [math.oc] 29 Dec 2017
Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications Ernest K. Ryu, Yongxin Chen, Wuchen Li, and Stanley Osher arxiv:1712.10279v1 [math.oc] 29 Dec 2017 December 29, 2017 Abstract
More informationDeterminant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University
Determinant maximization with linear matrix inequality constraints S. Boyd, L. Vandenberghe, S.-P. Wu Information Systems Laboratory Stanford University SCCM Seminar 5 February 1996 1 MAXDET problem denition
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationLarge Scale Semi-supervised Linear SVMs. University of Chicago
Large Scale Semi-supervised Linear SVMs Vikas Sindhwani and Sathiya Keerthi University of Chicago SIGIR 2006 Semi-supervised Learning (SSL) Motivation Setting Categorize x-billion documents into commercial/non-commercial.
More information3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that
More informationFree Energy, Fokker-Planck Equations, and Random walks on a Graph with Finite Vertices
Free Energy, Fokker-Planck Equations, and Random walks on a Graph with Finite Vertices Haomin Zhou Georgia Institute of Technology Jointly with S.-N. Chow (Georgia Tech) Wen Huang (USTC) Yao Li (NYU) Research
More informationA convex relaxation for weakly supervised classifiers
A convex relaxation for weakly supervised classifiers Armand Joulin and Francis Bach SIERRA group INRIA -Ecole Normale Supérieure ICML 2012 Weakly supervised classification We adress the problem of weakly
More informationIntroduction to Optimal Transport Theory
Introduction to Optimal Transport Theory Filippo Santambrogio Grenoble, June 15th 2009 These very short lecture notes do not want to be an exhaustive presentation of the topic, but only a short list of
More informationA Gaussian Type Kernel for Persistence Diagrams
TRIPODS Summer Bootcamp: Topology and Machine Learning Brown University, August 2018 A Gaussian Type Kernel for Persistence Diagrams Mathieu Carrière joint work with S. Oudot and M. Cuturi Persistence
More informationPart 4: Conditional Random Fields
Part 4: Conditional Random Fields Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011 1 / 39 Problem (Probabilistic Learning) Let d(y x) be the (unknown) true conditional distribution.
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationarxiv: v1 [cs.sy] 14 Apr 2013
Matrix-valued Monge-Kantorovich Optimal Mass Transport Lipeng Ning, Tryphon T. Georgiou and Allen Tannenbaum Abstract arxiv:34.393v [cs.sy] 4 Apr 3 We formulate an optimal transport problem for matrix-valued
More informationInformation geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem
https://doi.org/10.1007/s41884-018-0002-8 RESEARCH PAPER Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem Shun-ichi Amari
More informationOptimal Transport for Domain Adaptation
Optimal Transport for Domain Adaptation Nicolas Courty, Rémi Flamary, Devis Tuia, Alain Rakotomamonjy To cite this version: Nicolas Courty, Rémi Flamary, Devis Tuia, Alain Rakotomamonjy. Optimal Transport
More informationGeometrical Insights for Implicit Generative Modeling
Geometrical Insights for Implicit Generative Modeling Leon Bottou a,b, Martin Arjovsky b, David Lopez-Paz a, Maxime Oquab a,c a Facebook AI Research, New York, Paris b New York University, New York c Inria,
More informationThe optimal partial transport problem
The optimal partial transport problem Alessio Figalli Abstract Given two densities f and g, we consider the problem of transporting a fraction m [0, min{ f L 1, g L 1}] of the mass of f onto g minimizing
More informationSEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE
SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE FRANÇOIS BOLLEY Abstract. In this note we prove in an elementary way that the Wasserstein distances, which play a basic role in optimal transportation
More informationAn Overview of Edward: A Probabilistic Programming System. Dustin Tran Columbia University
An Overview of Edward: A Probabilistic Programming System Dustin Tran Columbia University Alp Kucukelbir Eugene Brevdo Andrew Gelman Adji Dieng Maja Rudolph David Blei Dawen Liang Matt Hoffman Kevin Murphy
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationOn dissipation distances for reaction-diffusion equations the Hellinger-Kantorovich distance
Weierstrass Institute for! Applied Analysis and Stochastics On dissipation distances for reaction-diffusion equations the Matthias Liero, joint work with Alexander Mielke, Giuseppe Savaré Mohrenstr. 39
More information