Numerical Optimal Transport and Applications

Size: px

Start display at page:

Download "Numerical Optimal Transport and Applications"

Buddy Griffith
5 years ago
Views:

1 Numerical Optimal Transport and Applications Gabriel Peyré Joint works with: Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Justin Solomon

2 Histograms in Imaging and Machine Learning Color histograms: Input image

3 Histograms in Imaging and Machine Learning Color histograms: optimal transport Input image Modified image

4 Histograms in Imaging and Machine Learning Color histograms: optimal transport Bag of words: Input image Modified image

5 Overview Transportation-like Problems Regularized Transport Optimal Transport Barycenters Heat Kernel Approximation

6 Radon Measures and Couplings Positive Radon measures M + (X): On a metric space (X, d). dµ(x) =f(x)dx µ = P i p i x i X = R

7 Radon Measures and Couplings Positive Radon measures M + (X): On a metric space (X, d). dµ(x) =f(x)dx Probability measures. µ = P i p i x i X = R µ(x) =1 R f =1 P i p i =1

8 Radon Measures and Couplings Positive Radon measures M + (X): dµ(x) =f(x)dx Probability measures. Couplings: Marginals: On a metric space (X, d). µ = P i p i x i X = R µ(x) =1 R f =1 P i p i =1 (µ, ) def. = { 2 M + (X X) ; P 1] = µ, P 2] = } P 1] (S) def. = (S, X) P 2] (S) def. = (X, S) µ = P i p i xi p = P j q j y j q

9 Radon Measures and Couplings Positive Radon measures M + (X): dµ(x) =f(x)dx Probability measures. Couplings: Marginals: µ = P i p i xi On a metric space (X, d). R f =1 P i p i =1 (µ, ) def. = { 2 M + (X X) ; P 1] = µ, P 2] = } p P 1] (S) def. = (S, X) P 2] (S) def. = (X, S) = P j q j q µ = P i p i x i X = R µ(x) =1 y j µ

10 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] min hc, i = Z X X c(x, y)d (x, y) ; 2 (µ, )

11 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] W (µ, ) def. = min hc, i = Z X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) =d(x, y), -Wasserstein distance W.

12 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] W (µ, ) def. = min hc, i = Z X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) =d(x, y), -Wasserstein distance W. Linear programming: µ = P N 1 i=1 p i x i, = P N 2 j=1 p j y i

13 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] W (µ, ) def. = min hc, i = Z X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) =d(x, y), -Wasserstein distance W. Linear programming: µ = P N 1 i=1 p i x i, = P N 2 j=1 p j y i Hungarian/Auction: O(N 3 ) µ = 1 N P N i=1 x i, = 1 N P N j=1 y j

14 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] W (µ, ) def. = min hc, i = Z X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) =d(x, y), -Wasserstein distance W. Linear programming: µ = P N 1 i=1 p i x i, = P N 2 Hungarian/Auction: also reveal that the network simplex behaves in O(n 2 ) in our context, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applica- j=1 p j y i tions that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision O(N 3 further speeds up the computation. We observed that the value ) of the final transport cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate µ = 1 N P N i=1 x i, = 1 N P N j=1 y j the results presented in this paper. The results of the performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. Monge-Ampère/Benamou-Brenier, d = 2 2. Time in seconds Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x 2 y = β x 3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent columns show interpolations for =1/4, =1/2, and =3/ 10 4 positive. A negative side effect of this choice is that interpolatin

SEMI-DISCRETE OPTIMAL TRANSPORT OPTIMAL IN 3D 21 also reveal that the network simplex behaves in O(n2 ) in our context, which is a major gain at the scale at which we typical work, i.e. thousands of particles.

15 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] Z def. W (µ, ) = min hc, i = X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) = d(x, y), -Wasserstein distance W. LinearPprogramming: P N1 N2 µ = i=1 pi xi, = j=1 pj yi A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT OPTIMAL IN 3D 21 also reveal that the network simplex behaves in O(n2 ) in our context, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applications that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision further speeds 3 up the computation. We observed that the value of the final transport cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate the iresults presented in this paper. The results of the j performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. Hungarian/Auction: PN 1 µ = N i=1 x, = 1 N O(N ) PN j=1 y Monge-Ampe re/benamou-brenier, d = Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x2 Time in seconds Semi-discrete: Laguerre cells, d = y = β x3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent [Levy, 15] columns show interpolations for = 1/4, = 1/2, and = 3/ Figure 9. More examples of9.semi-discrete optimal transport. optimal Figureof More examples of semi-discrete Figure 9. More examples semi-discrete optimal transport. Note how the solids deform and merge to form the sphere ontothe Note how solids and merge form the sph Note how the solids deform and the merge to deform form the sphere on the first row, and how thefirst branches of the star split and merge on thesplit and m row, and how the branches of the star first row, positive. and how the branches of the star split and merge on the A negative side effect of this choice is that interpolatin second row.

16 Optimal Transport Ground cost c(x, y) on X X. Optimal transport: [Kantorovitch 1942] Z def. W (µ, ) = min hc, i = X X c(x, y)d (x, y) ; 2 (µ, ) For c(x, y) = d(x, y), -Wasserstein distance W. LinearPprogramming: P N1 N2 µ = i=1 pi xi, = j=1 pj yi A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT OPTIMAL IN 3D 21 also reveal that the network simplex behaves in O(n2 ) in our context, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applications that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision further speeds 3 up the computation. We observed that the value of the final transport cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate the iresults presented in this paper. The results of the j performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. Hungarian/Auction: PN 1 µ = N i=1 x, = 1 N O(N ) PN j=1 y Monge-Ampe re/benamou-brenier, d = Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x2 Time in seconds Semi-discrete: Laguerre cells, d = y = β x3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent [Levy, 15] columns show interpolations for = 1/4, = 1/2, and = 3/ Need for fast approximate algorithms for generic c Figure 9. More examples of9.semi-discrete optimal transport. optimal Figureof More examples of semi-discrete Figure 9. More examples semi-discrete optimal transport. Note how the solids deform and merge to form the sphere ontothe Note how solids and merge form the sph Note how the solids deform and the merge to deform form the sphere on the first row, and how thefirst branches of the star split and merge on thesplit and m row, and how the branches of the star first row, positive. and how the branches of the star split and merge on the A negative side effect of this choice is that interpolatin second row.

17 Unbalanced Transport (µ, ) 2 M + (X) KL( µ) def. = R X log d dµ dµ + R (dµ d ) X

18 (µ, ) 2 M + (X) Unbalanced Transport KL( µ) def. = R X log d dµ WF c (µ, ) def. =minhc, i + KL(P 1] µ)+ KL(P 2] ) dµ + R (dµ d ) X

19 (µ, ) 2 M + (X) Unbalanced Transport KL( µ) def. = R X log d dµ WF c (µ, ) def. =minhc, i + KL(P 1] µ)+ KL(P 2] ) dµ + R (dµ d ) X µ

20 (µ, ) 2 M + (X) Unbalanced Transport KL( µ) def. = R X log d dµ WF c (µ, ) def. =minhc, i + KL(P 1] µ)+ KL(P 2] ) dµ + R (dµ d ) X Proposition: If c(x, y) = log(cos(min( d(x,y), 2 )) then WF 1/2 is a distance on M + (X). c [Liereo, Mielke, Savaré 2015] [Chizat, Schmitzer, Peyré, Vialard 2015] µ

21 (µ, ) 2 M + (X) Unbalanced Transport WF c (µ, ) def. =minhc, i + KL(P 1] µ)+ KL(P 2] )! Dynamic Benamou-Brenier formulation. [Liereo, Mielke, Savaré 2015] [Chizat, Schmitzer, Peyré, Vialard 2015] µ [Kondratyev, Monsaingeon, Vorotnikov, 2015] KL( µ) def. = R X log d dµ dµ + R (dµ d ) X Proposition: If c(x, y) = log(cos(min( d(x,y), 2 )) then WF 1/2 is a distance on M + (X). c [Liereo, Mielke, Savaré 2015] [Chizat, Schmitzer, Peyré, Vialard 2015]

22 Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998] µ t+1 =Prox W f (µ t ) def. = argmin µ2m + (X) W (µ t,µ)+ f(µ)

23 Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998] µ t+1 =Prox W f (µ t ) def. = argmin µ2m + (X) Formal limit! t µ =div(µr(f 0 (µ))) W (µ t,µ)+ f(µ)

24 Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998] µ t+1 =Prox W f (µ t ) def. = argmin µ2m + (X) Formal limit! t µ =div(µr(f 0 (µ))) f(µ) = R log( dµ dx t µ = µ W (µ t,µ)+ f(µ) (heat di usion)

25 rw Wasserstein Gradient Flows Evolution µ t rw Evolution µ t Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998] µ t+1 =Prox W f (µ t ) def. = argmin µ2m + (X) W (µ t,µ)+ f(µ) Formal limit! t µ =div(µr(f 0 (µ))) f(µ) = R log( dµ dx t µ = µ f(µ) = R t µ =div(µrw) (heat di usion) (advection)

0: @ t µ =div(µr(f 0 (µ))) f(µ) = R log( dµ dx )dµ @ t µ = µ W (µ t,µ)+ f(µ) (heat di usion)

26 rw Wasserstein Gradient Flows Evolution µ t rw Evolution µ t Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998] µ t+1 =Prox W f (µ t ) def. = argmin µ2m + (X) Formal limit! t µ =div(µr(f 0 (µ))) f(µ) = R log( dµ dx t µ = µ W (µ t,µ)+ f(µ) (heat di usion) f(µ) = R t µ =div(µrw) (advection) R f(µ) = 1 (non-linear di usion) m 1 ( dµ dx )m 1 t µ = µ m

27 Overview Transportation-like Problems Regularized Transport Optimal Transport Barycenters Heat Kernel Approximation

28 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X)

29 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 =

30 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 = Unbalanced transport: f 1 = KL( µ) f 2 = KL( )

31 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 = Unbalanced transport: f 1 = KL( µ) f 2 = KL( ) Gradient flows: f 1 = µt f 2 = f

32 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 = Unbalanced transport: f 1 = KL( µ) f 2 = KL( ) Gradient flows: f 1 = µt f 2 = f Regularization: "KL( 0 ) min E( )+"KL( 0 ) Regularization and positivity barrier.

33 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 = Unbalanced transport: f 1 = KL( µ) f 2 = KL( ) Gradient flows: f 1 = µt f 2 = f Regularization: "KL( 0 ) min E( )+"KL( 0 ) Regularization and positivity barrier. Discretization grid (prescribed support).

34 Transportation-like Problems def. min E( ) = hc, i + f 1 (P 1] )+f 2 (P 2] ) 2M + (X X) Optimal transport: f 1 = µ f 2 = Unbalanced transport: f 1 = KL( µ) f 2 = KL( ) Gradient flows: f 1 = µt f 2 = f Regularization: Prox KL E/"( 0 ) def. = arg min E( )+"KL( 0 ) Regularization and positivity barrier. "KL( 0 ) Discretization grid (prescribed support). Implicit KL stepping. 0 1 =Prox 1 " E( 0) 2 =Prox 1 " E( 1) argmin E( )

35 def. " = argmin Entropy Regularized Transport {hc, i + "KL( 0 ); 2 (µ, )} " c "

36 def. " = argmin Entropy Regularized Transport Schrödinger s problem: {hc, i + "KL( 0 ); 2 (µ, )} " = argmin 2 (µ, ) KL( K) Gibbs kernel: K(x, y) def. = e c(x,y) " 0 (x, y) Landmark computational paper: [Cuturi 2013]. " c "

37 def. " = argmin Entropy Regularized Transport Schrödinger s problem: {hc, i + "KL( 0 ); 2 (µ, )} " = argmin 2 (µ, ) KL( K) Gibbs kernel: K(x, y) def. = e c(x,y) " 0 (x, y) Landmark computational paper: [Cuturi 2013]. " c Proposition: " "!0! argmin 2 (µ, ) [Carlier, Duval, Peyré, Schmitzer 2015] hc, i " "!+1! µ(x) (y) " µ " "

38 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 )

39 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v f 1 (u) f 2 (u) "he u ",Ke v " i

40 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v f 1 (u) f 2 (u) "he u ",Ke v " i (x, y) =a(x)k(x, y)b(y) (a, b) =(e u ",e v " )

41 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v Block coordinates relaxation: f 1 (u) f 2 (u) "he u ",Ke v " i (x, y) =a(x)k(x, y)b(y) max u f 1 (u) (a, b) =(e u ",e v " ) "he u ",Ke v " i (I u )

42 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v Block coordinates relaxation: f 1 (u) f 2 (u) "he u ",Ke v " i (x, y) =a(x)k(x, y)b(y) max u max v f 1 (u) (a, b) =(e u ",e v " ) "he u ",Ke v " i (I u ) f 2 (v) "he v ",K e u " i (I v )

43 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v Block coordinates relaxation: f 1 (u) f 2 (u) "he u ",Ke v " i (x, y) =a(x)k(x, y)b(y) Proposition: max u max v a = ProxKL f 1 /"(Kb) Kb Prox KL f 1 /"(µ) def. f 1 (u) (a, b) =(e u ",e v " ) "he u ",Ke v " i (I u ) f 2 (v) "he v ",K e u " i (I v ) the solutions of (I u ) and (I v ) read: b = ProxKL f 2 /"(K a) K a = argmin f 1 ( )+"KL( µ)

44 Dykstra-like Iterations Primal: min hc, i + f 1 (P 1] )+f 2 (P 2] )+"KL( 0 ) Dual: max u,v Block coordinates relaxation: f 1 (u) f 2 (u) "he u ",Ke v " i (x, y) =a(x)k(x, y)b(y) Proposition: max u max v a = ProxKL f 1 /"(Kb) Kb f 1 (u) (a, b) =(e u ",e v " ) "he u ",Ke v " i (I u ) f 2 (v) "he v ",K e u " i (I v ) the solutions of (I u ) and (I v ) read: b = ProxKL f 2 /"(K a) K a Prox KL f 1 /"(µ) def. = argmin f 1 ( )+"KL( µ)! Only matrix-vector multiplications.! Highly parallelizable.! On regular grids: only convolutions! Linear time iterations.

45 Optimal transport problem: Sinkhorn s Algorithm f 1 = µ f 2 = Prox KL f 1 /"( µ) =µ Prox KL f 2 /"( ) =

46 Optimal transport problem: Sinkhorn s Algorithm f 1 = µ f 2 = Sinkhorn/IPFP algorithm: (`+1) def. a = µ Kb (`) and (`+1) def. b = Prox KL f 1 /"( µ) =µ Prox KL f 2 /"( ) = [Sinkhorn 1967][Deming,Stephan 1940] K a (`+1) (`) " `

47 Optimal transport problem: Sinkhorn s Algorithm Sinkhorn/IPFP algorithm: (`+1) def. a = µ Kb (`) and (`+1) def. b = Proposition: f 1 = µ f 2 = Prox KL f 1 /"( µ) =µ Prox KL f 2 /"( ) = [Sinkhorn 1967][Deming,Stephan 1940] K a (`+1) log( (`) ) log(? ) 1 = O(1 )`, apple 1/" c (`) def. = diag(a (`) )K diag(b (`) ) [Franklin,Lorenz 1989] Local rate: [Knight 2008] (`) " log( log( (`) ) log(? ) 1 ) 0 ` " " `

48 Gradient Flows: Crowd Motion def. µ t+1 = argmin µ W (µ t,µ)+ f(µ) Congestion-inducing function: f(µ) = [0,apple] (µ)+hw, µi [Maury, Roudne -Chupin, Santambrogio 2010]

49 Gradient Flows: Crowd Motion def. µ t+1 = argmin µ W (µ t,µ)+ f(µ) Congestion-inducing function: f(µ) = [0,apple] (µ)+hw, µi [Maury, Roudne -Chupin, Santambrogio 2010] Proposition: Prox 1 " f (µ) =min(e "w µ, apple)

50 Gradient Flows: Crowd Motion def. µ t+1 = argmin µ W (µ t,µ)+ f(µ) Congestion-inducing function: f(µ) = [0,apple] (µ)+hw, µi [Maury, Roudne -Chupin, Santambrogio 2010] Proposition: Prox 1 " f (µ) =min(e "w µ, apple) rw apple = µ t=0 1 apple =2 µ t=0 1 apple =4 µ t=0 1

51 Multiple-Density Gradient Flows (µ 1,t+1,µ 2,t+1 ) def. = argmin (µ 1,µ 2 ) W (µ 1,t,µ 1 )+W (µ 2,t,µ 2 )+ f(µ 1,µ 2 )

52 Multiple-Density Gradient Flows (µ 1,t+1,µ 2,t+1 ) def. = argmin (µ 1,µ 2 ) W (µ 1,t,µ 1 )+W (µ 2,t,µ 2 )+ f(µ 1,µ 2 ) Wasserstein attraction: f(µ 1,µ 2 )=W (µ 1,µ 2 )+h 1 (µ 1 )+h 2 (µ 2 ) rw Example: h i (µ) =hw, µi.

53 Multiple-Density Gradient Flows (µ 1,t+1,µ 2,t+1 ) def. = argmin (µ 1,µ 2 ) W (µ 1,t,µ 1 )+W (µ 2,t,µ 2 )+ f(µ 1,µ 2 ) Wasserstein attraction: f(µ 1,µ 2 )=W (µ 1,µ 2 )+h 1 (µ 1 )+h 2 (µ 2 ) rw Example: h i (µ) =hw, µi.

54 Overview Transportation-like Problems Regularized Transport Optimal Transport Barycenters Heat Kernel Approximation

55 Wasserstein Barycenters P Barycenters of measures (µ k ) k : µ? P 2 argmin k kw (µ k,µ) µ k k =1

56 Wasserstein Barycenters P Barycenters of measures (µ k ) k : µ? P 2 argmin k kw (µ k,µ) µ Generalizes Euclidean barycenter: If µ k = xk then µ? = P k kx k k k =1 µ 1 µ W 2 (µ 1,µ ) W 2 (µ 2,µ ) W 2 (µ 3,µ ) µ 2 µ 3

57 Wasserstein Barycenters P Barycenters of measures (µ k ) k : µ? P 2 argmin k kw (µ k,µ) µ Generalizes Euclidean barycenter: If µ k = xk then µ? = P k kx k k k =1 µ 1 µ W 2 (µ 1,µ ) W 2 (µ 2,µ ) W 2 (µ 3,µ ) µ 2 µ 3

58 Wasserstein Barycenters µ µ2 Mc Cann s displacement interpolation. ) µ1,µ W 2(µ 1 ) ) µ1 µ,µ (µ 3 W2 Generalizes Euclidean barycenter: If µk = xk then µ? = Pk k xk k µ2 =1 2 (µ 2,µ Barycenters of measures (µk )k : k P µ? 2 argmin k k W (µk, µ) W P µ3

59 Wasserstein Barycenters µ µ1,µ W 2(µ 1 ) ) µ1 µ,µ (µ 3 W2 Generalizes Euclidean barycenter: If µk = xk then µ? = Pk k xk ) k µ2 =1 2 (µ 2,µ Barycenters of measures (µk )k : k P µ? 2 argmin k k W (µk, µ) W P µ3 µ2 µ2 Mc Cann s displacement interpolation. µ3 µ1

2 argmin k k W (µk, µ) W P µ3 µ2 µ2 Mc Cann s displacement

60 Wasserstein Barycenters µ µ1,µ W 2(µ 1 ) ) µ1 µ,µ (µ 3 W2 Generalizes Euclidean barycenter: If µk = xk then µ? = Pk k xk ) k µ2 =1 2 (µ 2,µ Barycenters of measures (µk )k : k P µ? 2 argmin k k W (µk, µ) W P µ3 µ2 µ2 Mc Cann s displacement interpolation. [Agueh, Carlier, 2010] Theorem: (for c(x, y) = x y 2 ) if µ1 does not vanish on small sets, µ exists and is unique. µ3 µ1

61 min ( k ) k,µ k Regularized Barycenters ( ) X k (hc, k i + "KL( k 0,k )) ; 8k, k 2 (µ k,µ)

62 min ( k ) k,µ k Regularized Barycenters ( ) X k (hc, k i + "KL( k 0,k )) ; 8k, k 2 (µ k,µ)! Need to fix a discretization grid for µ, i.e. choose ( 0,k ) k

63 min ( k ) k,µ k Regularized Barycenters ( ) X k (hc, k i + "KL( k 0,k )) ; 8k, k 2 (µ k,µ)! Need to fix a discretization grid for µ, i.e. choose ( 0,k ) k! Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyré, 2015].

64 min ( k ) k,µ Regularized Barycenters ( ) X k (hc, k i + "KL( k 0,k )) ; 8k, k 2 (µ k,µ) k! Need to fix a discretization grid for µ, i.e. choose ( 0,k ) k! Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyré, 2015]. [Solomon et al, SIGGRAPH 2015]

65 Color Transfer Input images: (f,g) (chrominance components) Input measures: µ(a) =U(f 1 (A)), (A) =U(g 1 (A)) f µ g

66 Color Transfer Input images: (f,g) (chrominance components) Input measures: µ(a) =U(f 1 (A)), (A) =U(g 1 (A)) f µ g

67 Color Transfer Input images: (f,g) (chrominance components) Input measures: µ(a) =U(f 1 (A)), (A) =U(g 1 (A)) f T T f µ T g g

68 Color Harmonization Raw image sequence

69 Color Harmonization Raw image sequence Compute Wasserstein barycenter Project on the barycenter

70 Overview Transportation-like Problems Regularized Transport Optimal Transport Barycenters Heat Kernel Approximation

71 Optimal Transport on Surfaces Triangulated mesh M. Geodesic distance d M.

72 Optimal Transport on Surfaces Triangulated mesh M. Geodesic distance d M. Ground cost: c(x, y) =d M (x, y). x i d(x i, ) Level sets

73 Optimal Transport on Surfaces Triangulated mesh M. Geodesic distance d M. Ground cost: c(x, y) =d M (x, y). x i d(x i, ) Level sets Computing c (Fast-Marching): N 2 log(n)! too costly.

74 Entropic Transport on Surfaces Heat equation on t u t (x, ) = M u t (x, ), u t=0 (x, ) = x " t

75 Entropic Transport on Surfaces Heat equation on t u t (x, ) = M u t (x, ), u t=0 (x, ) = x Theorem: [Varadhan] " log(u " ) "!0! d 2 M " t

76 Entropic Transport on Surfaces Heat equation on t u t (x, ) = M u t (x, ), u t=0 (x, ) = x Theorem: [Varadhan] " log(u " ) "!0! d 2 M Sinkhorn kernel: K def. = e d 2 M " " u " Id ` M ` " t

77 Barycenter on a Surface µ2 µ

78 Barycenter on a Surface µ2 µ1 0 1 µ2 µ1 µ4 µ3 µ5 µ6 1

79 Barycenter on a Surface µ2 µ1 0 1 µ2 µ1 µ4 µ3 µ5 µ6 = (1,..., 1)/6 1

80 MRI Data Procesing [with A. Gramfort] Ground cost c = d M : geodesic on cortical surface M. L 2 barycenter W 2 2 barycenter

81 Gradient Flows: Crowd Motion with Obstacles M = sub-domain of R 2. apple = µ t=0 1 apple =2 µ t=0 1 apple =4 µ t=0 1 apple =6 µ t=0 1 Potential cos(w)

82 Gradient Flows: Crowd Motion with Obstacles M = sub-domain of R 2. apple = µ t=0 1 apple =2 µ t=0 1 apple =4 µ t=0 1 apple =6 µ t=0 1 Potential cos(w)

83 Crowd Motion on a Surface M = triangulated mesh. apple = µ t=0 1 apple =6 µ t=0 1 Potential cos(w)

84 Crowd Motion on a Surface M = triangulated mesh. apple = µ t=0 1 apple =6 µ t=0 1 Potential cos(w)

85 Histogram features in imaging and machine learning. Conclusion

86 Conclusion Histogram features in imaging and machine learning. Entropic regularization for optimal transport.

87 Conclusion Histogram features in imaging and machine learning. Entropic regularization for optimal transport. Barycenters, unbalanced OT, gradient flows,...

Generative Models and Optimal Transport

Generative Models and Optimal Transport Marco Cuturi Joint work / work in progress with G. Peyré, A. Genevay (ENS), F. Bach (INRIA), G. Montavon, K-R Müller (TU Berlin) Statistics 0.1 : Density Fitting