CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION

WUCHEN LI AND STANLEY OSHER

Abstract. We propose dynamical optimal transport (OT) problems constrained in a parameterized probability subset. In applied problems such as deep learning, the probability distribution is often generated by a parameterized mapping function. In this case, we derive a formulation for the constrained dynamical OT.

1. Introduction

Dynamical optimal transport problems play vital roles in fluid dynamics [12] and mean field games [8]. They provide a type of statistical distance and interesting differential structures in the set of probability densities [14]. The full probability set is often intractable when the dimension of the sample space is large. For this reason, parameterized probability subsets have been widely considered, especially in machine learning problems and information geometry [1, 2, 4, 7, 11, 13]. We are interested in studying dynamical OT problems over a parameterized probability subset.

In this note, we follow a series of work found in [6, 9, ?, 10, ?], and introduce general constrained dynamical OT problems in a parameterized probability subset. As in deep learning [3], the probability subset is often constructed by a parameterized mapping. In these cases, we demonstrate that the constrained dynamical OT problems exhibit simple variational structures.

We arrange this note as follows. In section 2, we briefly review the dynamical OT in both Eulerian and Lagrangian coordinates.¹ Using Eulerian coordinates, we propose the constrained dynamical OT over a parameterized probability subset. In section 3, we derive an equivalent Lagrangian formulation for the constrained problem.

Key words and phrases. Dynamical optimal transport; Mean field games; Information geometry; Machine learning.

The research is supported by AFOSR MURI proposal number 18RT0073.
¹ In fluid dynamics, the Eulerian coordinates represent the evolution of the probability density function of particles, while the Lagrangian coordinates describe the motion of particles. In learning problems, the Eulerian coordinates naturally connect with the minimization problem in terms of probability densities, while the Lagrangian coordinates refer to the variational problem formulated in samples, whose analogs are particles. In learning, we model problems in Eulerian coordinates and compute them in Lagrangian coordinates. In other words, we often write the objective function in terms of densities and compute it via samples.
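As a minimal sketch of the "model in Eulerian, compute in Lagrangian" remark, the snippet below evaluates the same density functional $\int V(x)\rho(x)\,dx$ in two ways: by quadrature on a grid (Eulerian) and by averaging over samples (Lagrangian). The potential `V`, the Gaussian density and all numerical settings are illustrative assumptions, not taken from the note.

```python
import numpy as np

def V(x):
    """Hypothetical potential, chosen only for illustration."""
    return x ** 2

mean, std = 2.0, 0.5  # assumed density rho = N(mean, std^2)

# Eulerian view: integrate V(x) * rho(x) over a grid in x.
grid = np.linspace(mean - 10 * std, mean + 10 * std, 4001)
dx = grid[1] - grid[0]
rho = np.exp(-0.5 * ((grid - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
eulerian_value = np.sum(V(grid) * rho) * dx

# Lagrangian view: average V over samples (particles) drawn from rho.
rng = np.random.default_rng(0)
samples = mean + std * rng.standard_normal(200_000)
lagrangian_value = np.mean(V(samples))

# Both approximate the exact value mean**2 + std**2.
print(eulerian_value, lagrangian_value)
```

The grid approach needs a mesh over the whole sample space, which becomes impractical in high dimensions, whereas the sample average only needs draws from the density.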
2. Constrained dynamical OT

In this section, we briefly review the dynamical OT in a full probability set via both Eulerian and Lagrangian formalisms. Using the Eulerian coordinates, we propose the dynamical OT in a parameterized probability subset.

For simplicity of exposition, all our derivations assume smoothness. Consider densities
$$\rho_0, \rho_1 \in \mathcal{P}_+(\Omega) = \Big\{\rho(x) \in C^\infty(\Omega) : \rho(x) > 0,\ \int_\Omega \rho(x)\,dx = 1\Big\},$$
where $\Omega$ is an $n$-dimensional sample space. Here $\Omega$ can be $\mathbb{R}^n$, or a convex compact region in $\mathbb{R}^n$ with zero flux conditions or periodic boundary conditions.

Dynamical OT studies a variational problem in density space, known as the Benamou-Brenier formula [5]. Given a Lagrangian function $L\colon T\Omega \to [0, +\infty)$ under suitable conditions, consider
$$C(\rho_0, \rho_1) := \inf_{v_t} \int_0^1 \mathbb{E}\, L\big(X_t(\omega), v(t, X_t(\omega))\big)\,dt, \tag{1a}$$
where $\mathbb{E}$ is the expectation operator over realizations $\omega$ in event space and the infimum is taken over all vector fields $v_t = v(t, \cdot)$, such that
$$\dot{X}_t(\omega) = v(t, X_t(\omega)), \quad X_0 \sim \rho_0, \quad X_1 \sim \rho_1. \tag{1b}$$
Here $X_i \sim \rho_i$ represents that $X_i(\omega)$ satisfies the probability density $\rho_i(x)$, for $i = 0, 1$.

Equivalently, denote the density of particles $X_t(\omega)$ at space $x$ and time $t$ by $\rho(t, x)$. Then problem (1) refers to a variational problem in density space:
$$C(\rho_0, \rho_1) := \inf_{v_t} \int_0^1 \int_\Omega L(x, v(t, x))\,\rho(t, x)\,dx\,dt, \tag{2a}$$
where the infimum is taken over all Borel vector fields $v_t = v(t, \cdot)$, such that the density function $\rho(t, x)$ satisfies the continuity equation:
$$\frac{\partial \rho(t, x)}{\partial t} + \nabla \cdot \big(\rho(t, x)\, v(t, x)\big) = 0, \quad \rho(0, x) = \rho_0(x), \quad \rho(1, x) = \rho_1(x). \tag{2b}$$
Here $\nabla \cdot$ is the divergence operator in $\Omega$.

In the language of fluid dynamics, problem (1) refers to the Lagrangian formalism, while problem (2) is the associated Eulerian formalism; see details in [14]. The Lagrangian formalism focuses on the motion of each individual particle, while the Eulerian formalism describes the global behavior of all particles. Here (1) and (2) are equivalent since they represent the same variational problem using different coordinate systems.
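The equivalence of the two formalisms can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not part of the note: it takes the quadratic Lagrangian $L(x, v) = |v|^2$ in one dimension, and a hand-picked (hypothetical) flow $X_t = (1 + a t) X_0 + m t$ with $X_0 \sim N(0, 1)$, so that $\rho(t, \cdot) = N(mt, (1 + at)^2)$ solves the continuity equation for the velocity field `v`. It then evaluates the action of this one feasible velocity field in both coordinates; no infimum is computed.

```python
import numpy as np

# Hypothetical setup (not from the note): rho_0 = N(0, 1) is moved to
# rho_1 = N(m, (1 + a)^2) by the particle flow X_t = (1 + a t) X_0 + m t.
m, a = 1.0, 0.5

def v(t, x):
    """Velocity field generating the flow above: m + a (x - m t) / (1 + a t)."""
    return m + a * (x - m * t) / (1.0 + a * t)

def lagrangian_action(n_particles=200_000, n_steps=100, seed=0):
    """Approximate int_0^1 E[|v(t, X_t)|^2] dt by following sampled particles."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)    # X_0 ~ rho_0
    dt = 1.0 / n_steps
    action = 0.0
    for k in range(n_steps):
        vel = v(k * dt, x)
        action += np.mean(vel ** 2) * dt     # sample average replaces E
        x = x + dt * vel                     # explicit Euler step of dX/dt = v(t, X)
    return action

def eulerian_action(n_steps=100):
    """Approximate int_0^1 int |v(t, x)|^2 rho(t, x) dx dt on a fixed grid."""
    grid = np.linspace(-15.0, 16.0, 6201)
    dx = grid[1] - grid[0]
    dt = 1.0 / n_steps
    action = 0.0
    for k in range(n_steps):
        t = k * dt
        std = 1.0 + a * t                    # rho(t, .) = N(m t, (1 + a t)^2)
        rho = np.exp(-0.5 * ((grid - m * t) / std) ** 2) / (std * np.sqrt(2 * np.pi))
        action += np.sum(v(t, grid) ** 2 * rho) * dx * dt
    return action

lag = lagrangian_action()
eul = eulerian_action()
print(lag, eul)  # both should be close to m**2 + a**2
```

The particle version only touches samples, while the grid version has to represent the whole density at every time step; the two numbers agree up to Monte Carlo error.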
In addition, one often considers $L(x, v) = \|v\|^p$, $p \geq 1$, where $\|\cdot\|$ is the Euclidean norm. In this case, the optimal value of the variational problem defines a distance function on the set of probability densities. Denote
$$W_p(\rho_0, \rho_1) := C(\rho_0, \rho_1)^{\frac{1}{p}},$$
where $W_p$ is called the $L^p$-Wasserstein distance.

We next study the variational problem (2) constrained on a parameterized probability density set. In other words, consider a parameter space $\Theta \subset \mathbb{R}^d$ with
$$P_\Theta = \Big\{\rho(\theta, x) \in C^\infty(\Omega) : \theta \in \Theta,\ \int_\Omega \rho(\theta, x)\,dx = 1,\ \rho(\theta, x) > 0,\ x \in \Omega\Big\}.$$
Here we assume that $\rho\colon \Theta \to \mathcal{P}_+(\Omega)$ is an injective mapping.² We introduce the constrained dynamical OT as follows:
$$c(\theta_0, \theta_1) := \inf_{v_t} \int_0^1 \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx\,dt, \tag{3a}$$
where $\theta_t = \theta(t) \in \Theta$, $t \in [0, 1]$, is a path in parameter space, and the infimum is taken over the Borel vector fields $v_t = v(t, \cdot)$, such that the constrained continuity equation holds:
$$\partial_t \rho(\theta_t, x) + \nabla \cdot \big(\rho(\theta_t, x)\, v(t, x)\big) = 0, \quad \theta_0,\ \theta_1 \text{ are fixed}. \tag{3b}$$
We notice that the infimum of problem (3) is taken over density paths lying in the parameterized probability set, i.e. $\rho(\theta_t, \cdot) \in \rho(\Theta)$, the image of $\Theta$ under $\rho$. Here the rate of change of the density is reduced to a finite-dimensional direction, i.e.
$$\partial_t \rho(\theta_t, x) = \Big(\nabla_{\theta_t} \rho(\theta_t, x),\ \frac{d\theta_t}{dt}\Big),$$
where $(\cdot, \cdot)$ is an inner product in $\mathbb{R}^d$.

² We abuse the notation of $\rho$. Notice that $\rho(\theta, x)$ is a probability distribution parameterized by $\theta \in \Theta$, while $\rho(x)$ is a probability distribution function in the full probability set.

A natural question arises. Variational problem (2), together with its constrained problem (3), is written in Eulerian coordinates. Both unavoidably evolve the entire probability density function. For practical reasons, can we find the Lagrangian coordinates for the constrained problem (3)? In other words, what are the analogs of (1) in $\rho(\Theta)$? We next demonstrate an answer to this question. We show that there is an expression for the motion of particles, whose density path moves according to the constrained continuity equation (3b).

3. Lagrangian formulations

In this section, we show the main result of this note: the constrained dynamical OT (3) has a simple Lagrangian formulation, stated in Proposition 1.

Consider a parameterized mapping or implicit generative model as follows. Given an input space $\mathcal{Z} \subset \mathbb{R}^{n_1}$, $n_1 \leq n$, let
$$g_\theta\colon \mathcal{Z} \to \Omega, \quad x = g(\theta, z), \quad \text{for } z \in \mathcal{Z}.$$
Here $g_\theta$ is a mapping function depending on parameters $\theta \in \Theta$. Given realizations $\omega$ in event space, we assume that the random variable $z(\omega)$ satisfies a density function $\mu(z) \in \mathcal{P}_+(\mathcal{Z})$, and denote $x(\omega) = g(\theta, z(\omega))$ satisfying the density function $\rho(\theta, x)$. This means that the map $g_\theta$ pushes forward $\mu(z)$ to $\rho(\theta, x)$, denoted by $\rho(\theta, x) = g_{\theta\#}\mu(z)$:
$$\int_{\mathcal{Z}} f(g(\theta, z))\,\mu(z)\,dz = \int_\Omega f(x)\,\rho(\theta, x)\,dx, \quad \text{for any } f \in C_c^\infty(\Omega). \tag{4}$$
In this case, the parameterized probability set is given as follows:
$$\rho(\Theta) = \Big\{\rho(\theta, x) \in C^\infty(\Omega) : \theta \in \Theta,\ \rho(\theta, x) = g_{\theta\#}\mu(z)\Big\}.$$

We next present the constrained dynamical OT in Lagrangian coordinates. We note that the vector field in the optimal density path of problem (2) or (3) satisfies
$$v(t, x) = D_p H(x, \nabla \Phi(t, x)),$$
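where
$$H(x, p) = \sup_{v \in T_x\Omega}\ \big\{ pv - L(x, v) \big\}$$
is the Hamiltonian function associated with $L$.

Proposition 1 (Constrained dynamical OT in Lagrangian formulation). The constrained dynamical OT has the following formulation:
$$c(\theta_0, \theta_1) = \inf \Big\{ \int_0^1 \mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big)\,dt \ :\ \frac{d}{dt} g(\theta_t, z) = D_p H\big(g(\theta_t, z), \nabla_x \Phi(t, g(\theta_t, z))\big),\ \theta(0) = \theta_0,\ \theta(1) = \theta_1 \Big\}, \tag{5}$$
where the infimum is taken over all feasible potential functions $\Phi\colon [0, 1] \times \Omega \to \mathbb{R}$ and parameter paths $\theta\colon [0, 1] \to \mathbb{R}^d$.

Proof. Denote $\frac{d}{dt} g(\theta_t, z) = v(t, g(\theta_t, z))$, with
$$v(t, g(\theta_t, z)) = D_p H\big(g(\theta_t, z), \nabla \Phi(t, g(\theta_t, z))\big).$$
We show that the probability density transition equation of $g(\theta_t, z)$ satisfies the constrained continuity equation
$$\partial_t \rho(\theta_t, x) + \nabla \cdot \big(\rho(\theta_t, x)\, v(t, x)\big) = 0, \tag{6}$$
and
$$\mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) = \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx. \tag{7}$$
On the one hand, consider $f \in C_c^\infty(\Omega)$; then
$$\frac{d}{dt} \mathbb{E}_{z \sim \mu} f(g(\theta_t, z)) = \frac{d}{dt} \int_{\mathcal{Z}} f(g(\theta_t, z))\,\mu(z)\,dz = \frac{d}{dt} \int_\Omega f(x)\,\rho(\theta_t, x)\,dx = \int_\Omega f(x)\,\partial_t \rho(\theta_t, x)\,dx, \tag{8}$$
where the second equality holds from the push forward relation (4).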
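On the other hand, consider
$$\begin{aligned}
\frac{d}{dt} \mathbb{E}_{z \sim \mu} f(g(\theta_t, z))
&= \lim_{\Delta t \to 0} \mathbb{E}_{z \sim \mu} \frac{f(g(\theta_{t + \Delta t}, z)) - f(g(\theta_t, z))}{\Delta t} \\
&= \lim_{\Delta t \to 0} \int_{\mathcal{Z}} \frac{f(g(\theta_{t + \Delta t}, z)) - f(g(\theta_t, z))}{\Delta t}\,\mu(z)\,dz \\
&= \int_{\mathcal{Z}} \nabla f(g(\theta_t, z)) \cdot \frac{d}{dt} g(\theta_t, z)\,\mu(z)\,dz \\
&= \int_{\mathcal{Z}} \nabla f(g(\theta_t, z)) \cdot v(t, g(\theta_t, z))\,\mu(z)\,dz \\
&= \int_\Omega \nabla f(x) \cdot v(t, x)\,\rho(\theta_t, x)\,dx \\
&= -\int_\Omega f(x)\,\nabla \cdot \big(v(t, x)\,\rho(\theta_t, x)\big)\,dx,
\end{aligned} \tag{9}$$
where $\nabla$, $\nabla\cdot$ are the gradient and divergence operators w.r.t. $x$. The second to last equality holds from the push forward relation (4), and the last equality holds by integration by parts w.r.t. $x$. Since (8) = (9) for any $f \in C_c^\infty(\Omega)$, we have proven (6). In addition, by the definition of the push forward operator (4), we have
$$\mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) = \int_{\mathcal{Z}} L\big(g(\theta_t, z), v(t, g(\theta_t, z))\big)\,\mu(z)\,dz = \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx.$$
Thus we prove (7).

It is interesting to compare variational problems (1) with (5). We can view $g(\theta_t, z)$ as parameterized particles, whose density function is constrained in the parameterized probability set $\rho(\Theta)$. Their motions result in the evolution of probability transition densities in $\rho(\Theta)$, satisfying the constrained continuity equation (3b). For this reason, we call (5) the Lagrangian formalism of constrained dynamical OT. It is also worth noting that each movement of $g(\theta_t, z)$ results in a motion along the density path $\rho(\theta_t, x)$. The change of the density path identifies a potential function $\Phi(t, x)$ depending on $\theta_t$.

In addition, the cost functional in dynamical OT can involve general potential energies, such as the linear potential energy
$$\mathcal{V}(\rho) = \int_\Omega V(x)\,\rho(x)\,dx,$$
and the interaction energy
$$\mathcal{W}(\rho) = \int_\Omega \int_\Omega w(x, y)\,\rho(x)\,\rho(y)\,dx\,dy.$$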
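Here $V(x)$ is a linear potential, and $w(x, y) = w(y, x)$ is a symmetric interaction potential. If $\rho(\theta, x) \in \rho(\Theta)$, then
$$\mathcal{V}(\rho(\theta, \cdot)) = \int_\Omega V(x)\,\rho(\theta, x)\,dx = \int_{\mathcal{Z}} V(g(\theta, z))\,\mu(z)\,dz = \mathbb{E}_{z \sim \mu} V(g(\theta, z)),$$
and
$$\begin{aligned}
\mathcal{W}(\rho(\theta, \cdot)) &= \int_\Omega \int_\Omega w(x, y)\,\rho(\theta, x)\,\rho(\theta, y)\,dx\,dy \\
&= \int_{\mathcal{Z}} \int_{\mathcal{Z}} w(g(\theta, z_1), g(\theta, z_2))\,\mu(z_1)\,\mu(z_2)\,dz_1\,dz_2 \\
&= \mathbb{E}_{(z_1, z_2) \sim \mu \times \mu}\, w(g(\theta, z_1), g(\theta, z_2)),
\end{aligned}$$
where the second equality in each of the above two formulas holds because of the constrained mapping relation (4), and $\mu \times \mu$ represents an independent joint density function supported on $\mathcal{Z} \times \mathcal{Z}$ with marginals $\mu(z_1)$, $\mu(z_2)$. Similarly to Proposition 1, we have
$$\begin{aligned}
c(\theta_0, \theta_1)
&= \inf_{v_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx - \mathcal{V}(\rho(\theta_t, \cdot)) - \mathcal{W}(\rho(\theta_t, \cdot)) \Big] dt \ :\ \partial_t \rho(\theta_t, x) + \nabla \cdot \big(\rho(\theta_t, x)\, v(t, x)\big) = 0 \Big\} \\
&= \inf_{\Phi_t,\ \theta_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \int_\Omega L\big(x, D_p H(x, \nabla \Phi(t, x))\big)\,\rho(\theta_t, x)\,dx - \mathcal{V}(\rho(\theta_t, \cdot)) - \mathcal{W}(\rho(\theta_t, \cdot)) \Big] dt \ :\ \partial_t \rho(\theta_t, x) + \nabla \cdot \big(\rho(\theta_t, x)\, D_p H(x, \nabla \Phi(t, x))\big) = 0 \Big\} \\
&= \inf_{\Phi_t,\ \theta_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) - \mathbb{E}_{z \sim \mu} V(g(\theta_t, z)) - \mathbb{E}_{(z_1, z_2) \sim \mu \times \mu}\, w(g(\theta_t, z_1), g(\theta_t, z_2)) \Big] dt \ :\ \frac{d}{dt} g(\theta_t, z) = D_p H\big(g(\theta_t, z), \nabla_x \Phi(t, g(\theta_t, z))\big) \Big\}.
\end{aligned}$$

We next demonstrate an example of constrained dynamical OT problems.

Example 1 (Constrained $L^2$-Wasserstein distance). Let $L(x, v) = \|v\|^2$ and denote $W_2(\theta_0, \theta_1) = c(\theta_0, \theta_1)^{\frac{1}{2}}$; then
$$W_2(\theta_0, \theta_1)^2 = \inf \Big\{ \int_0^1 \mathbb{E}_{z \sim \mu} \Big\| \frac{d}{dt} g(\theta_t, z) \Big\|^2 dt \ :\ \frac{d}{dt} g(\theta_t, z) = \nabla \Phi(t, g(\theta_t, z)),\ \theta(0) = \theta_0,\ \theta(1) = \theta_1 \Big\}.$$
Observe that (5) forms a geometric action energy function in parameter space $\Theta$, in which the metric tensor can be extracted explicitly. In other words, denote $G(\theta) \in \mathbb{R}^{d \times d}$ by
$$\dot{\theta}^T G(\theta)\,\dot{\theta} = \dot{\theta}^T\, \mathbb{E}_{z \sim \mu} \big( \nabla_\theta g(\theta, z)\, \nabla_\theta g(\theta, z)^T \big)\, \dot{\theta},$$
with the constraint
$$\big(\dot{\theta}, \nabla_\theta g(\theta, z)\big) = \nabla_x \Phi(g(\theta, z)).$$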
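To make the matrix $\mathbb{E}_{z \sim \mu}(\nabla_\theta g\, \nabla_\theta g^T)$ concrete, the following sketch estimates it by Monte Carlo for a hypothetical one-dimensional location-scale map $g(\theta, z) = \theta_1 + \theta_2 z$ with $z \sim N(0, 1)$. The map, the helper names and the array layout (one $d \times n$ Jacobian per sample, here $d = 2$, $n = 1$) are assumptions made for illustration. In one dimension every velocity field is a gradient, so the potential constraint holds automatically in this toy case.

```python
import numpy as np

def grad_theta_g(theta, z):
    """Jacobian of the assumed map g(theta, z) = theta[0] + theta[1] * z w.r.t. theta.

    Returns an array of shape (N, d, n) with d = 2 parameters and n = 1
    output dimension, one Jacobian per sample z. For this affine map the
    Jacobian does not depend on theta.
    """
    ones = np.ones_like(z)
    return np.stack([ones, z], axis=1)[:, :, None]

def metric_tensor(theta, z):
    """Monte Carlo estimate of G(theta) = E_z[ grad_theta g  grad_theta g^T ], shape (d, d)."""
    J = grad_theta_g(theta, z)                      # (N, d, n)
    return np.einsum("kin,kjn->ij", J, J) / z.shape[0]

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)                    # samples from mu = N(0, 1)
theta = np.array([0.3, 1.2])                        # illustrative parameter value
G = metric_tensor(theta, z)

theta_dot = np.array([1.0, 0.5])                    # illustrative parameter velocity
kinetic = theta_dot @ G @ theta_dot                 # theta_dot^T G(theta) theta_dot
print(G)        # close to the 2 x 2 identity for this map
print(kinetic)  # close to 1.0**2 + 0.5**2
```

For this family the estimated matrix is close to the identity, which is consistent with the known closed form of the $L^2$-Wasserstein distance between one-dimensional Gaussians in (mean, standard deviation) coordinates.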
In the constraint above, $\nabla_\theta g(\theta, z) \in \mathbb{R}^{d \times n}$, and $\Phi$ is a potential function satisfying
$$-\nabla \cdot \big(\rho(\theta, x)\, \nabla \Phi(x)\big) = \big(\nabla_\theta \rho(\theta, x), \dot{\theta}\big),$$
and
$$G(\theta) = \mathbb{E}_{z \sim \mu} \big( \nabla_\theta g(\theta, z)\, \nabla_\theta g(\theta, z)^T \big) \in \mathbb{R}^{d \times d}$$
is a positive semidefinite matrix.

References

[1] S.-i. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2):251-276, 1998.
[2] S. Amari. Information Geometry and Its Applications. Number 194 in Applied Mathematical Sciences. Springer, Japan, 2016.
[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875 [cs, stat], 2017.
[4] N. Ay, J. Jost, H. V. Lê, and L. J. Schwachhöfer. Information Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge: A Series of Modern Surveys in Mathematics, volume 64. Springer, Cham, 2017.
[5] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375-393, 2000.
[6] Y. Chen and W. Li. Natural gradient in Wasserstein statistical manifold. arXiv:1805.08380 [cs, math], 2018.
[7] A. Halder and T. T. Georgiou. Gradient Flows in Uncertainty Propagation and Filtering of Linear Gaussian Systems. arXiv:1704.00102 [cs, math], 2017.
[8] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229-260, 2007.
[9] W. Li. Geometry of probability simplex via optimal transport. arXiv:1803.06360 [math], 2018.
[10] W. Li and G. Montufar. Natural gradient via optimal transport I. arXiv:1803.07033 [cs, math], 2018.
[11] K. Modin. Geometry of matrix decompositions seen through optimal transport and information geometry. Journal of Geometric Mechanics, 9(3):335-390, 2017.
[12] E. Nelson. Derivation of the Schrödinger Equation from Newtonian Mechanics. Physical Review, 150(4):1079-1085, 1966.
[13] G. Pistone. Lagrangian Function on the Finite State Space Statistical Bundle. Entropy, 20(2):139, 2018.
[14] C. Villani. Optimal Transport: Old and New. Number 338 in Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009.
E-mail address: wcli@math.ucla.edu
E-mail address: sjo@math.ucla.edu
Department of Mathematics, University of California, Los Angeles.