Optimal mass transport as a distance measure between images


1 Optimal mass transport as a distance measure between images. Axel Ringh, Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden. 21st of June 2018, INRIA, Sophia-Antipolis, France.

2 Acknowledgements. This is based on joint work with Johan Karlsson (Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden) [1]. I acknowledge financial support from the Swedish Research Council (VR) and the Swedish Foundation for Strategic Research (SSF). Code: the code is based on ODL. [1] J. Karlsson and A. Ringh. Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport. SIAM Journal on Imaging Sciences, 10(4), 2017.

3 Outline: Background; Inverse problems; Optimal mass transport; Sinkhorn iterations - solving discretized optimal transport problems; Sinkhorn iterations as dual coordinate ascent; Inverse problems with optimal mass transport priors; Example in computerized tomography.

4-5 Inverse problems. Consider the problem of recovering $f \in X$ from data $g \in Y$, given by $g = A(f) + \text{noise}$. Notation: $X$ is called the reconstruction space, $Y$ is called the data space, $A : X \to Y$ is the forward operator, and $A^* : Y \to X$ denotes the adjoint operator. Problems of interest are ill-posed inverse problems: a solution might not exist, the solution might not be unique, or the solution does not depend continuously on the data. Simply put: $A^{-1}$ does not exist as a continuous bijection! It comes down to finding an approximate inverse $A^\dagger$ so that $g = A(f) + \text{noise}$ implies $A^\dagger(g) \approx f$.

6 Variational regularization. A common technique for solving ill-posed inverse problems is variational regularization:
$$\arg\min_{f \in X} \; G(A(f), g) + \lambda F(f),$$
where $G : Y \times Y \to \mathbb{R}$ is the data discrepancy functional, $F : X \to \mathbb{R}$ is the regularization functional, and $\lambda$ is the regularization parameter, controlling the trade-off between data matching and regularization. A common example in imaging is total variation regularization: $G(h, g) = \|h - g\|_2^2$, $F(f) = \|\nabla f\|_1$. If $A$ is linear this is a convex problem!
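
As a small illustration (my own sketch, not from the talk), the anisotropic total variation $\|\nabla f\|_1$ and the data discrepancy $\|h - g\|_2^2$ of a discrete image can be evaluated as follows; a full variational reconstruction would then minimize their weighted sum.

```python
import numpy as np

def total_variation(f):
    """Anisotropic TV: sum of absolute finite differences in both directions."""
    return np.abs(np.diff(f, axis=0)).sum() + np.abs(np.diff(f, axis=1)).sum()

def data_discrepancy(h, g):
    """Squared L2 data mismatch G(h, g) = ||h - g||_2^2."""
    return np.sum((h - g) ** 2)

# Toy example: a piecewise-constant image and a noisy copy of it.
rng = np.random.default_rng(0)
f = np.zeros((64, 64)); f[16:48, 16:48] = 1.0
g = f + 0.1 * rng.standard_normal(f.shape)

lam = 0.1  # regularization parameter (illustrative value)
objective = data_discrepancy(f, g) + lam * total_variation(f)
print(objective)
```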

7-9 Incorporating prior information in variational schemes. How can one incorporate prior information in such a scheme? One way: consider
$$\arg\min_{f \in X} \; G(A(f), g) + \lambda F(f) + \gamma H(f, \tilde f),$$
where $\tilde f$ is a prior/template and $H$ defines closeness to $\tilde f$. What is a good choice for $H$? Scenarios where this is potentially of interest: incomplete measurements, e.g. limited angle tomography; spatiotemporal imaging, where the data is a time-series of data sets $\{g_t\}_{t=0}^T$. For each set the underlying image has undergone a deformation, and each data set $g_t$ normally contains less information, so $A^*(g_t)$ is a poor reconstruction. Approach: solve the coupled inverse problems
$$\arg\min_{f_0, \ldots, f_T \in X} \; \sum_{j=0}^{T} \big[ G(A(f_j), g_j) + \lambda F(f_j) \big] + \sum_{j=1}^{T} \gamma H(f_{j-1}, f_j).$$

10-12 Measuring distances between functions: the $L^p$ metrics. Given two functions $f_0(x)$ and $f_1(x)$, what is a suitable way to measure the distance between the two? One suggestion: measure it pointwise, e.g., using an $L^p$ metric,
$$\|f_0 - f_1\|_p = \Big( \int_D |f_0(x) - f_1(x)|^p \, dx \Big)^{1/p}.$$
Drawbacks: it is, for example, insensitive to shifts. Example: it gives the same distance from $f_0$ to $f_1$ as from $f_0$ to $f_2$: $\|f_0 - f_1\|_1 = \|f_0 - f_2\|_1 = 8$.
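
As a small numerical illustration of this shift insensitivity (my own example, not from the talk), the sketch below compares the $L^1$ distance between a discretized bump and two shifted copies: once the supports are disjoint, a small shift and a large shift give the same distance.

```python
import numpy as np

x = np.linspace(0, 10, 1001)
dx = x[1] - x[0]

def bump(center, width=1.0, height=4.0):
    """Indicator-like bump of given height and width centered at `center`."""
    return height * (np.abs(x - center) < width / 2).astype(float)

f0 = bump(2.0)   # reference
f1 = bump(4.0)   # small shift
f2 = bump(8.0)   # large shift

# L1 distances: identical as soon as the bumps no longer overlap, so the
# metric cannot tell a small displacement from a large one.
d1 = np.sum(np.abs(f0 - f1)) * dx
d2 = np.sum(np.abs(f0 - f2)) * dx
print(d1, d2)  # identical values (about 8), regardless of the size of the shift
```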

13-18 Optimal mass transport - Monge formulation. Gaspard Monge formulated optimal mass transport: optimal transport of soil for the construction of forts and roads. Let $c(x_0, x_1) : X \times X \to \mathbb{R}_+$ describe the cost of transporting a unit mass from location $x_0$ to $x_1$. Given two functions $f_0, f_1 : X \to \mathbb{R}_+$, find the function $\phi : X \to X$ minimizing the transport cost
$$\int_X c(x, \phi(x)) f_0(x) \, dx,$$
where $\phi$ is a mass preserving map from $f_0$ to $f_1$:
$$\int_{x \in A} f_1(x) \, dx = \int_{\phi(x) \in A} f_0(x) \, dx \quad \text{for all } A \subset X.$$
This is a nonconvex problem!

19-23 Optimal mass transport - Kantorovich formulation. Leonid Kantorovich: convex formulation and duality theory; Nobel Memorial Prize in Economics 1975 for contributions to the theory of optimum allocation of resources. Again, let $c(x_0, x_1)$ denote the cost of transporting a unit mass from the point $x_0$ to the point $x_1$. Given two functions $f_0, f_1 : X \to \mathbb{R}_+$, find a transport plan $M : X \times X \to \mathbb{R}_+$, where $M(x_0, x_1)$ is the amount of mass moved between $x_0$ and $x_1$. Look for the $M$ that minimizes the transportation cost:
$$\min_{M \ge 0} \int_{X \times X} c(x_0, x_1) M(x_0, x_1) \, dx_0 dx_1$$
subject to
$$f_0(x_0) = \int_X M(x_0, x_1) \, dx_1, \quad x_0 \in X, \qquad f_1(x_1) = \int_X M(x_0, x_1) \, dx_0, \quad x_1 \in X.$$
(Figure: illustration of a transportation plan $M$.)

24-25 Measuring distances between functions: optimal mass transport. Define a distance between two functions $f_0(x)$ and $f_1(x)$ using optimal transport:
$$T(f_0, f_1) := \;\min_{M \ge 0} \int_{X \times X} c(x_0, x_1) M(x_0, x_1) \, dx_0 dx_1 \quad \text{s.t.} \quad f_0(x_0) = \int_X M(x_0, x_1) \, dx_1, \; x_0 \in X, \quad f_1(x_1) = \int_X M(x_0, x_1) \, dx_0, \; x_1 \in X.$$
If $d(x, y)$ is a metric on $X$ and $c(x, y) = d(x, y)^p$, then $T(f_0, f_1)^{1/p}$ is a metric on the space of measures. Example revisited: let $c(x, y) = \|x - y\|_2^2$; then $T(f_0, f_1) = 4 \cdot (\sqrt{2})^2 = 8$ while $T(f_0, f_2) = 4 \cdot 5^2 = 100$. This indicates that optimal transport is a more natural distance between two images than $L^p$, at least if one is a deformation of the other.
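
For intuition, here is a small sketch (my own illustration, not from the talk) using SciPy's one-dimensional 1-Wasserstein distance, i.e. optimal transport with cost $c(x,y) = |x - y|$ rather than the squared cost used on the slide: unlike the $L^1$ comparison above, the transport distance grows with the size of the shift.

```python
from scipy.stats import wasserstein_distance

# Point masses on the line at positions 2, 4 and 8 (unit total mass each).
f0, f1, f2 = [2.0], [4.0], [8.0]

# The 1-Wasserstein distance grows with the shift, unlike the L1 distance
# between the densities, which saturates once the supports are disjoint.
print(wasserstein_distance(f0, f1))  # 2.0 (small shift)
print(wasserstein_distance(f0, f2))  # 6.0 (large shift)
```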

26-29 Optimal mass transport - discrete formulation. How can the optimal transport problem be solved? Here: discretize, then optimize. This gives a linear programming problem: for vectors $f_0 \in \mathbb{R}^n$ and $f_1 \in \mathbb{R}^m$ and a cost matrix $C = [c_{ij}] \in \mathbb{R}^{n \times m}$, where $c_{ij}$ defines the transportation cost between pixels $x_i$ and $x_j$, find the transportation plan $M = [m_{ij}] \in \mathbb{R}^{n \times m}$, where $m_{ij}$ is the mass transported between pixels $x_i$ and $x_j$, by solving
$$\min_{m_{ij} \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} m_{ij} \quad \text{subject to} \quad \sum_{j=1}^{m} m_{ij} = f_0(i), \; i = 1, \ldots, n, \qquad \sum_{i=1}^{n} m_{ij} = f_1(j), \; j = 1, \ldots, m,$$
or, in matrix form,
$$\min_{M \ge 0} \; \operatorname{trace}(C^T M) \quad \text{subject to} \quad M \mathbf{1}_m = f_0, \; M^T \mathbf{1}_n = f_1.$$
The number of variables is $n \cdot m$. To compare the distance between two images with $n = m$ pixels each, the plan $M \in \mathbb{R}^{n \times m}$ has $n \cdot m$ variables, which for typical image sizes is prohibitively large!
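
As a concrete illustration of the discrete problem (my own sketch, not from the talk), the code below solves a tiny instance with SciPy's generic LP solver; for realistic image sizes this is exactly the approach that becomes prohibitively expensive.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny discrete OT instance: n = m = 3 bins on a line with squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
f0 = np.array([0.5, 0.3, 0.2])
f1 = np.array([0.2, 0.3, 0.5])           # same total mass as f0
C = (x[:, None] - x[None, :]) ** 2       # c_ij = (x_i - x_j)^2

n, m = C.shape
# Equality constraints: row sums of M equal f0, column sums equal f1.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # sum_j m_ij = f0(i)
for j in range(m):
    A_eq[n + j, j::m] = 1.0              # sum_i m_ij = f1(j)
b_eq = np.concatenate([f0, f1])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
M = res.x.reshape(n, m)
print("T(f0, f1) =", res.fun)
print(M.round(3))
```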

30-34 Optimal mass transport - Sinkhorn iterations. The original problem is too computationally demanding. Solution: introduce an entropy barrier/regularization term $D(M) = \sum_{i,j} \big( m_{ij} \log(m_{ij}) - m_{ij} + 1 \big)$ [1]:
$$T_\epsilon(f_0, f_1) := \min_{M \ge 0} \; \operatorname{trace}(C^T M) + \epsilon D(M) \quad \text{subject to} \quad f_0 = M \mathbf{1}, \; f_1 = M^T \mathbf{1}.$$
Using Lagrangian relaxation gives
$$L(M, \lambda_0, \lambda_1) = \operatorname{trace}(C^T M) + \epsilon D(M) + \lambda_0^T (f_0 - M \mathbf{1}) + \lambda_1^T (f_1 - M^T \mathbf{1}).$$
Given dual variables $\lambda_0, \lambda_1$, the minimizing $m_{ij}$ satisfies
$$0 = \frac{\partial L(M, \lambda_0, \lambda_1)}{\partial m_{ij}} = c_{ij} + \epsilon \log(m_{ij}) - \lambda_0(i) - \lambda_1(j).$$
Expressed elementwise, $m_{ij} = e^{\lambda_0(i)/\epsilon} \, e^{-c_{ij}/\epsilon} \, e^{\lambda_1(j)/\epsilon}$, i.e.,
$$M = \operatorname{diag}(e^{\lambda_0/\epsilon}) \, K \, \operatorname{diag}(e^{\lambda_1/\epsilon}), \quad \text{where } K = \exp(-C/\epsilon).$$
Here and in what follows $\exp(\cdot)$, $\log(\cdot)$, and $./$ denote elementwise functions and operations. The solution $M = \operatorname{diag}(e^{\lambda_0/\epsilon}) K \operatorname{diag}(e^{\lambda_1/\epsilon})$ is thus specified by only $n + m$ unknowns!
[1] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, 2013.

35-37 Optimal mass transport - Sinkhorn iterations. How do we find the values of $\lambda_0$ and $\lambda_1$? Let $u_0 = \exp(\lambda_0/\epsilon)$ and $u_1 = \exp(\lambda_1/\epsilon)$. The optimal solution $M = \operatorname{diag}(u_0) K \operatorname{diag}(u_1)$ needs to satisfy
$$\operatorname{diag}(u_0) K \operatorname{diag}(u_1) \mathbf{1} = f_0 \quad \text{and} \quad \operatorname{diag}(u_1) K^T \operatorname{diag}(u_0) \mathbf{1} = f_1.$$
Theorem (Sinkhorn iterations [2]). For any matrix $K$ with positive elements there are diagonal matrices $\operatorname{diag}(u_0)$, $\operatorname{diag}(u_1)$ such that $M = \operatorname{diag}(u_0) K \operatorname{diag}(u_1)$ has prescribed row- and column-sums $f_0$ and $f_1$. The vectors $u_0$ and $u_1$ can be obtained by alternating marginalization:
$$u_0 = f_0 ./ (K u_1), \qquad u_1 = f_1 ./ (K^T u_0).$$
Each iteration only requires the multiplications $K u_1$ and $K^T u_0$; this is the bottleneck. The convergence rate is linear. The scheme is thus highly computationally efficient, allowing for solving large problems.
[2] R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4), 1967.
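
A minimal NumPy sketch of these iterations (my own illustration; the variable names, the fixed iteration count, and the toy 1D problem are assumptions, not from the talk):

```python
import numpy as np

def sinkhorn(C, f0, f1, eps=0.05, n_iter=500):
    """Entropy-regularized OT via Sinkhorn iterations.

    C  : (n, m) cost matrix
    f0 : (n,) row marginal, f1 : (m,) column marginal (equal total mass)
    Returns the transport plan M = diag(u0) K diag(u1).
    """
    K = np.exp(-C / eps)
    u1 = np.ones(C.shape[1])
    for _ in range(n_iter):
        u0 = f0 / (K @ u1)        # enforce prescribed row sums
        u1 = f1 / (K.T @ u0)      # enforce prescribed column sums
    return u0[:, None] * K * u1[None, :]

# Tiny example on a 1D grid with squared-distance cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
f0 = np.exp(-((x - 0.3) ** 2) / 0.01); f0 /= f0.sum()
f1 = np.exp(-((x - 0.7) ** 2) / 0.01); f1 /= f1.sum()
M = sinkhorn(C, f0, f1)
print("transport cost of the entropic plan:", np.sum(C * M))
```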

38-39 Sinkhorn iterations as dual coordinate ascent. There are several ways to motivate the Sinkhorn iterations [1]: diagonal matrix scaling, Bregman projections, and Dykstra's algorithm. Here we will introduce yet another interpretation: use a dual formulation, in which the Sinkhorn iteration corresponds to dual coordinate ascent. This allows us to generalize the Sinkhorn iterations, giving an approach for addressing inverse problems with an optimal transport term.
[1] J.D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111-A1138, 2015.

40-43 Sinkhorn iterations as dual coordinate ascent. Lagrangian relaxation gave the optimal form of the primal variable, $M^* = \operatorname{diag}(u_0) K \operatorname{diag}(u_1)$. The Lagrangian dual function is
$$\varphi(u_0, u_1) := \min_{M \ge 0} L(M, u_0, u_1) = L(M^*, u_0, u_1) = \ldots = \epsilon \log(u_0)^T f_0 + \epsilon \log(u_1)^T f_1 - \epsilon\, u_0^T K u_1 + \epsilon n m.$$
The dual problem is thus $\max_{u_0, u_1} \varphi(u_0, u_1)$. Taking the gradient with respect to $u_0$ and setting it equal to zero gives $0 = f_0 ./ u_0 - K u_1$, and with respect to $u_1$ gives $0 = f_1 ./ u_1 - K^T u_0$. These are the Sinkhorn iterations!
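
The elided step can be filled in as follows (a sketch of the computation, using the optimality condition $c_{ij} + \epsilon \log m^*_{ij} - \lambda_0(i) - \lambda_1(j) = 0$ from above and $u_k = e^{\lambda_k/\epsilon}$):
$$
\begin{aligned}
L(M^*, \lambda_0, \lambda_1)
&= \sum_{i,j} c_{ij} m^*_{ij} + \epsilon \sum_{i,j}\big(m^*_{ij}\log m^*_{ij} - m^*_{ij} + 1\big) + \lambda_0^T(f_0 - M^*\mathbf{1}) + \lambda_1^T(f_1 - M^{*T}\mathbf{1}) \\
&= \sum_{i,j} m^*_{ij}\big(c_{ij} + \epsilon\log m^*_{ij} - \lambda_0(i) - \lambda_1(j)\big) - \epsilon\sum_{i,j} m^*_{ij} + \epsilon n m + \lambda_0^T f_0 + \lambda_1^T f_1 \\
&= \lambda_0^T f_0 + \lambda_1^T f_1 - \epsilon\, u_0^T K u_1 + \epsilon n m
 = \epsilon \log(u_0)^T f_0 + \epsilon \log(u_1)^T f_1 - \epsilon\, u_0^T K u_1 + \epsilon n m,
\end{aligned}
$$
where the second equality groups $\lambda_0^T M^*\mathbf{1} + \lambda_1^T M^{*T}\mathbf{1} = \sum_{i,j} (\lambda_0(i) + \lambda_1(j)) m^*_{ij}$ with the first two sums, the third uses the optimality condition and $\sum_{i,j} m^*_{ij} = u_0^T K u_1$, and the last uses $\lambda_k = \epsilon \log u_k$.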

44-46 Sinkhorn iterations as dual coordinate ascent. For $g$ proper, convex and lower semicontinuous, we can now consider problems of the form
$$\min_{f_1} \; T_\epsilon(f_0, f_1) + g(f_1) = \min_{M \ge 0,\, f_1} \; \operatorname{trace}(C^T M) + \epsilon D(M) + g(f_1) \quad \text{subject to} \quad f_0 = M \mathbf{1}, \; f_1 = M^T \mathbf{1}.$$
This can be solved by dual coordinate ascent,
$$u_0 = f_0 ./ (K u_1), \qquad 0 \in \partial g^*(-\epsilon \log(u_1)) ./ u_1 - K^T u_0,$$
if the second inclusion can be solved efficiently. It is solvable when $g^*(\cdot)$ is component-wise. Examples of such cases: $g(\cdot) = I_{\{\hat f\}}(\cdot)$, the indicator function on $\{\hat f\}$, which recovers the original optimal transport problem; and, as used on the following slides, $g(\cdot) = \frac{1}{2\sigma}\|\cdot - h\|_2^2$.
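
To see where the second condition comes from (a sketch of the reasoning, consistent with the derivation above): minimizing the Lagrangian over $f_1$ gives $0 \in \partial g(f_1) + \lambda_1$, i.e. $f_1 \in \partial g^*(-\lambda_1) = \partial g^*(-\epsilon \log u_1)$, while primal feasibility requires $f_1 = M^T \mathbf{1} = \operatorname{diag}(u_1) K^T u_0$. Combining the two and dividing elementwise by $u_1 > 0$ yields
$$0 \in \partial g^*(-\epsilon \log(u_1)) ./ u_1 - K^T u_0.$$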

47-50 Inverse problems with optimal mass transport priors. Consider the inverse problem
$$\min_{f_1 \ge 0} \; \|\nabla f_1\|_1 \quad \text{subject to} \quad \|A f_1 - g\|_2 \le \kappa,$$
with TV-regularization term $\|\nabla f_1\|_1$, forward model $A$, data $g$, and data mismatch term $\|A f_1 - g\|_2$. To make the reconstruction $f_1$ close to a prior $f_0$, add an optimal transport term:
$$\min_{f_1 \ge 0} \; \|\nabla f_1\|_1 + \gamma T_\epsilon(f_0, f_1) \quad \text{subject to} \quad \|A f_1 - g\|_2 \le \kappa.$$
This is a large convex optimization problem with several terms, so variable splitting is common: ADMM, the primal-dual hybrid gradient algorithm, or the primal-dual Douglas-Rachford algorithm. A common tool in these algorithms is the proximal operator of the involved functions $F$ [1]:
$$\operatorname{Prox}^\sigma_F(h) = \operatorname*{argmin}_{f} \; F(f) + \frac{1}{2\sigma} \|f - h\|_2^2.$$
[1] R.T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5), 1976.

51-53 Inverse problems with optimal mass transport priors - generalized Sinkhorn iterations. We want to compute the proximal operator of $T_\epsilon(f_0, \cdot)$, given by
$$\operatorname{Prox}^\sigma_{T_\epsilon(f_0, \cdot)}(h) = \operatorname*{argmin}_{f_1} \; T_\epsilon(f_0, f_1) + \frac{1}{2\sigma} \|f_1 - h\|_2^2.$$
Thus we want to solve
$$\min_{M \ge 0,\, f_1} \; \operatorname{trace}(C^T M) + \epsilon D(M) + \frac{1}{2\sigma} \|f_1 - h\|_2^2 \quad \text{subject to} \quad f_0 = M \mathbf{1}, \; f_1 = M^T \mathbf{1}.$$
Using dual coordinate ascent with $g(\cdot) = \frac{1}{2\sigma} \|\cdot - h\|_2^2$, we get the algorithm
1. $u_0 = f_0 ./ (K u_1)$,
2. $u_1 = \exp\!\Big( \frac{h}{\sigma\epsilon} - \omega\big( \frac{h}{\sigma\epsilon} + \log(K^T u_0) - \log(\sigma\epsilon) \big) \Big)$,
where step 2 is the elementwise solution of $\operatorname{diag}(u_1) K^T u_0 + \sigma\epsilon \log(u_1) = h$. Compare with the standard Sinkhorn iterations $u_0 = f_0./(K u_1)$, $u_1 = f_1./(K^T u_0)$. Here $\omega$ denotes the (elementwise) Wright omega function, i.e., the solution of $x = \log(\omega(x)) + \omega(x)$. The second step is solved elementwise; the bottleneck is still the computation of $K u_1$ and $K^T u_0$.
Theorem. The algorithm is globally convergent, with a linear convergence rate.
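
A NumPy/SciPy sketch of this proximal step (my own illustration based on the update written above, with assumed names, parameter values, and toy data; scipy.special.wrightomega provides the elementwise Wright omega function):

```python
import numpy as np
from scipy.special import wrightomega

def sinkhorn_prox(C, f0, h, eps=0.05, sigma=1.0, n_iter=500):
    """Approximate Prox of T_eps(f0, .) at h, i.e.
    argmin_f1 T_eps(f0, f1) + ||f1 - h||^2 / (2*sigma),
    via the generalized Sinkhorn iterations sketched above."""
    K = np.exp(-C / eps)
    u1 = np.ones(C.shape[1])
    for _ in range(n_iter):
        u0 = f0 / (K @ u1)
        # Elementwise solve of u1*(K^T u0) + sigma*eps*log(u1) = h via Wright omega.
        z = h / (sigma * eps) + np.log(K.T @ u0) - np.log(sigma * eps)
        u1 = np.exp(h / (sigma * eps) - np.real(wrightomega(z)))
    return u1 * (K.T @ u0)   # f1 = M^T 1

# Tiny usage example on a 1D grid (same toy setup as the Sinkhorn sketch above).
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
f0 = np.exp(-((x - 0.3) ** 2) / 0.01); f0 /= f0.sum()
h = np.exp(-((x - 0.7) ** 2) / 0.01); h /= h.sum()
f1 = sinkhorn_prox(C, f0, h)
```

Within a splitting method such as Douglas-Rachford or ADMM, a prox of this kind would be evaluated once per outer iteration.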

54 Example in computerized tomography. Computerized tomography (CT) is an imaging modality used in many areas, e.g., medicine. The object is probed with X-rays. Different materials attenuate X-rays differently, so the incoming and outgoing intensities give information about the object. Simplest model:
$$\int_{L_{r,\theta}} f(x) \, dx = \log\Big(\frac{I_0}{I}\Big),$$
where $f(x)$ is the attenuation in the point $x$, which is what we want to reconstruct, $L_{r,\theta}$ is the line along which the X-ray beam travels, and $I_0$ and $I$ are the incoming and outgoing intensities. (Illustration from Wikipedia.)

55 Example in computerized tomography. Parallel beam 2D CT example: reconstruction space of pixels; 30 angles in $[\pi/4, 3\pi/4]$ (limited angle); detector partition: uniform, 350 bins; noise level 5%. (Figures: (a) Shepp-Logan phantom, (b) prior.)
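
Since the talk's code builds on ODL, here is a hedged sketch of how such a limited-angle parallel-beam setup could be assembled with ODL; the grid size, domain bounds, and detector extent are my own assumptions, as the transcript does not specify the actual discretization.

```python
import numpy as np
import odl

# Reconstruction space: a 256 x 256 pixel grid (size assumed for illustration).
space = odl.uniform_discr([-20, -20], [20, 20], [256, 256])

# Limited-angle parallel-beam geometry: 30 angles in [pi/4, 3*pi/4], 350 detector bins.
angle_partition = odl.uniform_partition(np.pi / 4, 3 * np.pi / 4, 30)
detector_partition = odl.uniform_partition(-30, 30, 350)
geometry = odl.tomo.Parallel2dGeometry(angle_partition, detector_partition)

# Forward operator A and noisy data g (roughly 5% additive white noise).
A = odl.tomo.RayTransform(space, geometry)   # requires a backend such as astra or skimage
f_true = odl.phantom.shepp_logan(space, modified=True)
g = A(f_true)
noise_scale = 0.05 * np.abs(g.asarray()).mean()
g_noisy = g + noise_scale * odl.phantom.white_noise(A.range)
```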

56-57 Example in computerized tomography. TV-regularization and $\ell_2^2$ prior:
$$\min_{f_1} \; \gamma \|f_0 - f_1\|_2^2 + \|\nabla f_1\|_1 \quad \text{subject to} \quad \|A f_1 - w\|_2 \le \kappa.$$
TV-regularization and optimal transport prior:
$$\min_{f_1} \; \gamma T_\epsilon(f_0, f_1) + \|\nabla f_1\|_1 \quad \text{subject to} \quad \|A f_1 - w\|_2 \le \kappa.$$
(Figures: Shepp-Logan phantom; prior; TV-regularization; TV-regularization and $\ell_2^2$ prior ($\gamma = 10$); TV-regularization and optimal transport prior ($\gamma = 4$).)

58 Example in computerized tomography. Comparing different regularization parameters for the problem with the $\ell_2^2$ prior:
$$\min_{f_1} \; \gamma \|f_0 - f_1\|_2^2 + \|\nabla f_1\|_1 \quad \text{subject to} \quad \|A f_1 - w\|_2 \le \kappa.$$
(Figure: reconstructions using the $\ell_2^2$ prior with different regularization parameters, panels for $\gamma = 1, 10, 100, 1000, \ldots$)

59-60 Example in computerized tomography. Parallel beam 2D CT example: reconstruction space of pixels; 30 angles in $[0, \pi]$; detector partition: uniform, 350 bins; noise level 3%. (Figures: phantom; prior; TV-regularization; TV-regularization and $\ell_2^2$ prior ($\gamma = 10$); TV-regularization and optimal transport prior ($\gamma = 4$).)

61-62 Conclusions and further work. Conclusions: optimal mass transport is a viable framework for imaging applications; a generalized Sinkhorn iteration computes the proximal operator of the optimal transport cost; variable splitting is used for solving the inverse problem; application to CT reconstruction using optimal transport priors. Potential future directions: application to spatiotemporal image reconstruction,
$$\arg\min_{f_0, \ldots, f_T \in X} \; \sum_{j=0}^{T} \big[ G(A(f_j), g_j) + \lambda F(f_j) \big] + \sum_{j=1}^{T} \gamma H(f_{j-1}, f_j);$$
more efficient ways of solving the dual problem; and learning for inverse problems using optimal transport as a loss function.

63 Thank you for your attention! Questions?
