WUCHEN LI AND STANLEY OSHER


CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION

Abstract. We propose dynamical optimal transport (OT) problems constrained in a parameterized probability subset. In applied problems such as deep learning, the probability distribution is often generated by a parameterized mapping function. In this case, we derive a formulation for the constrained dynamical OT.

1. Introduction

Dynamical optimal transport problems play vital roles in fluid dynamics [12] and mean field games [8]. They provide a type of statistical distance and interesting differential structures in the set of probability densities [14]. The full probability set is often intractable when the dimension of the sample space is large. For this reason, parameterized probability subsets have been widely considered, especially in machine learning problems and information geometry [1, 2, 4, 7, 11, 13]. We are interested in studying dynamical OT problems over a parameterized probability subset.

In this note, we follow a series of works found in [6, 9, ?, 10, ?], and introduce general constrained dynamical OT problems in a parameterized probability subset. As in deep learning [3], the probability subset is often constructed by a parameterized mapping. In these cases, we demonstrate that the constrained dynamical OT problems exhibit simple variational structures.

We arrange this note as follows. In section 2, we briefly review the dynamical OT in both Eulerian and Lagrangian coordinates$^1$. Using Eulerian coordinates, we propose the constrained dynamical OT over a parameterized probability subset. In section 3, we derive an equivalent Lagrangian formulation for the constrained problem.

Key words and phrases. Dynamical optimal transport; Mean field games; Information geometry; Machine learning.

The research is supported by AFOSR MURI proposal number 18RT0073.
$^1$ In fluid dynamics, the Eulerian coordinates represent the evolution of the probability density function of particles, while the Lagrangian coordinates describe the motion of the particles themselves. In learning problems, the Eulerian coordinates naturally connect with the minimization problem in terms of probability densities, while the Lagrangian coordinates refer to the variational problem formulated in samples, whose analogs are particles. In learning, we model problems in Eulerian coordinates and compute them in Lagrangian coordinates. In other words, we often write the objective function in terms of densities and compute it via samples.

2. Constrained dynamical OT

In this section, we briefly review the dynamical OT in a full probability set via both Eulerian and Lagrangian formalisms. Using the Eulerian coordinates, we propose the dynamical OT in a parameterized probability subset. For the simplicity of exposition, all our derivations assume smoothness.

Consider densities
\[
\rho_0, \rho_1 \in \mathcal{P}_+(\Omega) = \Big\{ \rho(x) \in C^\infty(\Omega) \colon \rho(x) > 0,\ \int_\Omega \rho(x)\,dx = 1 \Big\},
\]
where $\Omega$ is an $n$-dimensional sample space. Here $\Omega$ can be $\mathbb{R}^n$, or a convex compact region in $\mathbb{R}^n$ with zero flux conditions or periodic boundary conditions.

Dynamical OT studies a variational problem in density space, known as the Benamou-Brenier formula [5]. Given a Lagrangian function $L \colon T\Omega \to [0, +\infty)$ under suitable conditions, consider
\[
C(\rho_0, \rho_1) := \inf_{v_t} \int_0^1 \mathbb{E}\, L(X_t(\omega), v(t, X_t(\omega)))\,dt, \tag{1a}
\]
where $\mathbb{E}$ is the expectation operator over realizations $\omega$ in event space and the infimum is taken over all vector fields $v_t = v(t, \cdot)$, such that
\[
\dot{X}_t(\omega) = v(t, X_t(\omega)), \quad X_0 \sim \rho_0, \quad X_1 \sim \rho_1. \tag{1b}
\]
Here $X_i \sim \rho_i$ means that $X_i(\omega)$ has the probability density $\rho_i(x)$, for $i = 0, 1$.

Equivalently, denote the density of particles $X_t(\omega)$ at space $x$ and time $t$ by $\rho(t, x)$. Then problem (1) refers to a variational problem in density space:
\[
C(\rho_0, \rho_1) := \inf_{v_t} \int_0^1 \int_\Omega L(x, v(t, x))\,\rho(t, x)\,dx\,dt, \tag{2a}
\]
where the infimum is taken over all Borel vector fields $v_t = v(t, \cdot)$, such that the density function $\rho(t, x)$ satisfies the continuity equation:
\[
\frac{\partial \rho(t, x)}{\partial t} + \nabla \cdot (\rho(t, x) v(t, x)) = 0, \quad \rho(0, x) = \rho_0(x), \quad \rho(1, x) = \rho_1(x). \tag{2b}
\]
Here $\nabla \cdot$ is the divergence operator in $\Omega$.

In the language of fluid dynamics, problem (1) refers to the Lagrangian formalism, while problem (2) is the associated Eulerian formalism; see details in [14]. The Lagrangian formalism focuses on the motion of each individual particle, while the Eulerian formalism describes the global behavior of all particles. Here (1) and (2) are equivalent since they represent the same variational problem using different coordinate systems.
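The agreement between the particle cost in (1a) and the density cost in (2a) can be checked numerically for one fixed vector field. The sketch below is only an illustration under assumptions that are not in the note: a one-dimensional affine flow $X_t = (1 + t(s-1))X_0 + t m$ with $X_0 \sim N(0,1)$, the choice $L(x, v) = v^2$, and hypothetical function names. For this flow both costs equal $m^2 + (s-1)^2$.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def lagrangian_cost(m, s, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the particle cost (1a) with L(x, v) = v^2.

    Assumed flow (hypothetical example): X_t = (1 + t (s - 1)) X_0 + t m.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n_samples)      # X_0 ~ N(0, 1)
    velocity = (s - 1.0) * x0 + m            # dX_t/dt, constant in t per particle
    # The integrand does not depend on t, so the integral over [0, 1]
    # equals the sample mean of velocity^2.
    return float(np.mean(velocity ** 2))


def eulerian_cost(m, s, n_t=201, n_x=4001):
    """Quadrature estimate of the density cost (2a) for the same flow."""
    times = np.linspace(0.0, 1.0, n_t)
    cost_t = np.empty(n_t)
    for k, t in enumerate(times):
        sigma = 1.0 + t * (s - 1.0)          # standard deviation of rho(t, .)
        mean = t * m                         # mean of rho(t, .)
        x = np.linspace(mean - 10.0 * sigma, mean + 10.0 * sigma, n_x)
        rho = np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        # Eulerian velocity: the particle velocity expressed at position x.
        v = (s - 1.0) * (x - mean) / sigma + m
        cost_t[k] = trapezoid(v ** 2 * rho, x)
    return trapezoid(cost_t, times)


if __name__ == "__main__":
    m, s = 1.5, 2.0
    print("closed form:", m ** 2 + (s - 1.0) ** 2)
    print("Lagrangian (samples):", lagrangian_cost(m, s))
    print("Eulerian (densities):", eulerian_cost(m, s))
```

The sample version never forms the density, which is the practical appeal of the Lagrangian formalism when the dimension of the sample space is large.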
In addition, one often considers $L(x, v) = \|v\|^p$, $p \geq 1$, where $\|\cdot\|$ is the Euclidean norm. In this case, the optimal value of the variational problem defines a distance function on the set of probability densities. Denote
\[
W_p(\rho_0, \rho_1) := C(\rho_0, \rho_1)^{\frac{1}{p}},
\]
where $W_p$ is called the $L^p$-Wasserstein distance.

We next study the variational problem (2) constrained on a parameterized probability density set. In other words, consider a parameter space $\Theta \subset \mathbb{R}^d$ with
\[
\mathcal{P}_\Theta = \Big\{ \rho(\theta, x) \in C^\infty(\Omega) \colon \theta \in \Theta,\ \int_\Omega \rho(\theta, x)\,dx = 1,\ \rho(\theta, x) > 0,\ x \in \Omega \Big\}.
\]
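As a concrete illustration (an assumption added here, not an example from the note), take the one-dimensional Gaussian family with $\theta = (m, s)$, $\Theta = \mathbb{R} \times (0, \infty) \subset \mathbb{R}^2$, as the parameterized set. For two of its members the $L^2$-Wasserstein distance has the well-known closed form $W_2 = \sqrt{(m_0 - m_1)^2 + (s_0 - s_1)^2}$, and in one dimension the optimal coupling of two equal-size samples matches them in sorted order. The sketch compares the two; all function names are hypothetical.

```python
import numpy as np


def sample_gaussian(mean, std, n_samples, seed):
    """Draw n_samples points from N(mean, std^2)."""
    rng = np.random.default_rng(seed)
    return mean + std * rng.standard_normal(n_samples)


def w2_gaussian_closed_form(m0, s0, m1, s1):
    """Closed-form W_2 between N(m0, s0^2) and N(m1, s1^2) in one dimension."""
    return float(np.sqrt((m0 - m1) ** 2 + (s0 - s1) ** 2))


def w2_empirical_1d(x, y):
    """W_2 between two equal-size 1D samples.

    In one dimension the optimal transport plan for the squared cost pairs
    the i-th smallest point of x with the i-th smallest point of y.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))


if __name__ == "__main__":
    theta0 = (0.0, 1.0)   # (m0, s0)
    theta1 = (2.0, 1.5)   # (m1, s1)
    x = sample_gaussian(*theta0, n_samples=200_000, seed=0)
    y = sample_gaussian(*theta1, n_samples=200_000, seed=1)
    print("closed form:", w2_gaussian_closed_form(*theta0, *theta1))
    print("from samples:", w2_empirical_1d(x, y))
```

The sorted-matching shortcut is specific to one dimension; it is used here only to keep the check self-contained.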

We assume that the mapping $\rho \colon \Theta \to \mathcal{P}_+(\Omega)$ is injective$^2$. We introduce the constrained dynamical OT as follows:
\[
c(\theta_0, \theta_1) := \inf_{v_t} \int_0^1 \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx\,dt, \tag{3a}
\]
where $\theta_t = \theta(t) \in \Theta$, $t \in [0, 1]$, is a path in parameter space, and the infimum is taken over the Borel vector fields $v_t = v(t, \cdot)$, such that the constrained continuity equation holds:
\[
\partial_t \rho(\theta_t, x) + \nabla \cdot (\rho(\theta_t, x) v(t, x)) = 0, \quad \theta_0, \theta_1 \text{ are fixed}. \tag{3b}
\]

$^2$ We abuse the notation of $\rho$. Notice that $\rho(\theta, x)$ is a probability distribution parameterized by $\theta \in \Theta$, while $\rho(x)$ is a probability distribution function in the full probability set.

We notice that the infimum of problem (3) is taken over density paths lying in the parameterized probability set, i.e. $\rho(\theta_t, \cdot) \in \rho(\Theta)$, where $\rho(\Theta)$ denotes the image of $\Theta$ under $\rho$. Here the rate of change of the density is reduced to a finite dimensional direction, i.e.
\[
\partial_t \rho(\theta_t, x) = \Big( \nabla_{\theta_t} \rho(\theta_t, x), \frac{d\theta_t}{dt} \Big),
\]
where $(\cdot, \cdot)$ is an inner product in $\mathbb{R}^d$.

A natural question arises. Variational problem (2), together with its constrained problem (3), is written in Eulerian coordinates, so both unavoidably evolve the entire probability density function. For practical reasons, can we find the Lagrangian coordinates for the constrained problem (3)? In other words, what are the analogs of (1) in $\rho(\Theta)$? We next demonstrate an answer to this question. We show that there is an expression for the motion of particles whose density path moves according to the constrained continuity equation (3b).

3. Lagrangian formulations

In this section, we show the main result of this note: the constrained dynamical OT (3) has a simple Lagrangian formulation, stated in Proposition 1.

Consider a parameterized mapping or implicit generative model as follows. Given an input space $\mathcal{Z} \subset \mathbb{R}^{n_1}$, $n_1 \leq n$, let
\[
g_\theta \colon \mathcal{Z} \to \Omega, \quad x = g(\theta, z), \quad \text{for } z \in \mathcal{Z}.
\]
Here $g_\theta$ is a mapping function depending on parameters $\theta \in \Theta$. Given realizations $\omega$ in event space, we assume that the random variable $z(\omega)$ has a density function $\mu(z) \in \mathcal{P}_+(\mathcal{Z})$, and denote by $x(\omega) = g(\theta, z(\omega))$ the random variable with density function $\rho(\theta, x)$. This means that the map $g_\theta$ pushes forward $\mu(z)$ to $\rho(\theta, x)$, denoted by $\rho(\theta, x) = g_{\theta\sharp}\mu(z)$:
\[
\int_{\mathcal{Z}} f(g(\theta, z))\,\mu(z)\,dz = \int_\Omega f(x)\,\rho(\theta, x)\,dx, \quad \text{for any } f \in C_c^\infty(\Omega). \tag{4}
\]
In this case, the parameterized probability set is given as follows:
\[
\rho(\Theta) = \Big\{ \rho(\theta, x) \in C^\infty(\Omega) \colon \theta \in \Theta,\ \rho(\theta, x) = g_{\theta\sharp}\mu(z) \Big\}.
\]

We next present the constrained dynamical OT in Lagrangian coordinates. We notice the fact that the vector field in the optimal density path of problem (2) or (3) satisfies
\[
v(t, x) = D_p H(x, \nabla \Phi(t, x)),
\]

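where
\[
H(x, p) = \sup_{v \in T_x\Omega} \big\{ p \cdot v - L(x, v) \big\}
\]
is the Hamiltonian function associated with $L$.

Proposition 1 (Constrained dynamical OT in Lagrangian formulation). The constrained dynamical OT has the following formulation:
\[
c(\theta_0, \theta_1) = \inf \Big\{ \int_0^1 \mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big)\,dt \colon \frac{d}{dt} g(\theta_t, z) = D_p H\big(g(\theta_t, z), \nabla_x \Phi(t, g(\theta_t, z))\big),\ \theta(0) = \theta_0,\ \theta(1) = \theta_1 \Big\}, \tag{5}
\]
where the infimum is taken over all feasible potential functions $\Phi \colon [0, 1] \times \Omega \to \mathbb{R}$ and parameter paths $\theta \colon [0, 1] \to \mathbb{R}^d$.

Proof. Denote
\[
\frac{d}{dt} g(\theta_t, z) = v(t, g(\theta_t, z)), \quad \text{with} \quad v(t, g(\theta_t, z)) = D_p H\big(g(\theta_t, z), \nabla \Phi(t, g(\theta_t, z))\big).
\]
We show that the probability density transition equation of $g(\theta_t, z)$ satisfies the constrained continuity equation
\[
\partial_t \rho(\theta_t, x) + \nabla \cdot (\rho(\theta_t, x) v(t, x)) = 0, \tag{6}
\]
and
\[
\mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) = \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx. \tag{7}
\]
On the one hand, consider $f \in C_c^\infty(\Omega)$, then
\[
\frac{d}{dt} \mathbb{E}_{z \sim \mu} f(g(\theta_t, z)) = \frac{d}{dt} \int_{\mathcal{Z}} f(g(\theta_t, z))\,\mu(z)\,dz = \frac{d}{dt} \int_\Omega f(x)\,\rho(\theta_t, x)\,dx = \int_\Omega f(x)\,\partial_t \rho(\theta_t, x)\,dx, \tag{8}
\]
where the second equality holds from the push forward relation (4).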

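On the other hand, consider
\[
\begin{aligned}
\frac{d}{dt} \mathbb{E}_{z \sim \mu} f(g(\theta_t, z))
&= \lim_{\Delta t \to 0} \mathbb{E}_{z \sim \mu} \frac{f(g(\theta_{t + \Delta t}, z)) - f(g(\theta_t, z))}{\Delta t} \\
&= \lim_{\Delta t \to 0} \int_{\mathcal{Z}} \frac{f(g(\theta_{t + \Delta t}, z)) - f(g(\theta_t, z))}{\Delta t}\,\mu(z)\,dz \\
&= \int_{\mathcal{Z}} \nabla f(g(\theta_t, z)) \cdot \frac{d}{dt} g(\theta_t, z)\,\mu(z)\,dz \\
&= \int_{\mathcal{Z}} \nabla f(g(\theta_t, z)) \cdot v(t, g(\theta_t, z))\,\mu(z)\,dz \\
&= \int_\Omega \nabla f(x) \cdot v(t, x)\,\rho(\theta_t, x)\,dx \\
&= -\int_\Omega f(x)\,\nabla \cdot (v(t, x) \rho(\theta_t, x))\,dx,
\end{aligned} \tag{9}
\]
where $\nabla$, $\nabla \cdot$ are the gradient and divergence operators w.r.t. $x$. The second to last equality holds from the push forward relation (4), and the last equality holds using integration by parts w.r.t. $x$. Since (8) = (9) for any $f \in C_c^\infty(\Omega)$, we have proven (6). In addition, by the definition of the push forward operator (4), we have
\[
\mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) = \int_{\mathcal{Z}} L\big(g(\theta_t, z), v(t, g(\theta_t, z))\big)\,\mu(z)\,dz = \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx.
\]
Thus we prove (7).

It is interesting to compare variational problems (1) and (5). We can view $g(\theta_t, z)$ as parameterized particles, whose density function is constrained to the parameterized probability set $\rho(\Theta)$. Their motion results in the evolution of probability transition densities in $\rho(\Theta)$, satisfying the constrained continuity equation (3b). For this reason, we call (5) the Lagrangian formalism of constrained dynamical OT. It is also worth noting that each movement of $g(\theta_t, z)$ results in a motion of the density path $\rho(\theta_t, x)$. The change of the density path identifies a potential function $\Phi(t, x)$ depending on $\theta_t$.

In addition, the cost functional in dynamical OT can involve general potential energies, such as the linear potential energy
\[
\mathcal{V}(\rho) = \int_\Omega V(x)\,\rho(x)\,dx,
\]
and the interaction energy
\[
\mathcal{W}(\rho) = \int_\Omega \int_\Omega w(x, y)\,\rho(x)\,\rho(y)\,dx\,dy.
\]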
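For a density generated by a mapping, the push forward relation (4) suggests that such energies can be estimated from samples of $z$ instead of integrating against the density. The following sketch compares both routes under assumptions chosen purely for illustration: the affine generator $g(\theta, z) = m + s z$ with $z \sim N(0, 1)$, the potential $V(x) = x^2/2$, and the interaction $w(x, y) = (x - y)^2/2$. For these choices $\mathcal{V} = (m^2 + s^2)/2$ and $\mathcal{W} = s^2$. All names are hypothetical.

```python
import numpy as np


def linear_potential(x):
    """Assumed example potential V(x) = x^2 / 2."""
    return 0.5 * x ** 2


def interaction_potential(x, y):
    """Assumed example symmetric interaction w(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def energies_by_quadrature(m, s, n_x=1501):
    """Evaluate both energies by integrating against the density N(m, s^2)."""
    x = np.linspace(m - 10.0 * s, m + 10.0 * s, n_x)
    dx = x[1] - x[0]
    rho = np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    v_energy = np.sum(linear_potential(x) * rho) * dx
    pair = interaction_potential(x[:, None], x[None, :])   # w on the product grid
    w_energy = np.sum(pair * rho[:, None] * rho[None, :]) * dx ** 2
    return float(v_energy), float(w_energy)


def energies_by_sampling(m, s, n_samples=200_000, seed=0):
    """Estimate both energies from latent samples pushed through g(theta, z) = m + s z."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_samples)
    z2 = rng.standard_normal(n_samples)     # independent copy for the pair term
    x1 = m + s * z1
    x2 = m + s * z2
    v_energy = np.mean(linear_potential(x1))
    w_energy = np.mean(interaction_potential(x1, x2))
    return float(v_energy), float(w_energy)


if __name__ == "__main__":
    m, s = 0.5, 1.2
    print("closed form:", 0.5 * (m ** 2 + s ** 2), s ** 2)
    print("quadrature: ", energies_by_quadrature(m, s))
    print("sampling:   ", energies_by_sampling(m, s))
```

The quadrature route needs a grid over $\Omega \times \Omega$ for the interaction term, whereas the sampling route only needs pairs of independent latent draws.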

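In these energies, $V(x)$ is a linear potential, and $w(x, y) = w(y, x)$ is a symmetric interaction potential. If $\rho(\theta, x) \in \rho(\Theta)$, then
\[
\mathcal{V}(\rho(\theta, \cdot)) = \int_\Omega V(x)\,\rho(\theta, x)\,dx = \int_{\mathcal{Z}} V(g(\theta, z))\,\mu(z)\,dz = \mathbb{E}_{z \sim \mu} V(g(\theta, z)),
\]
and
\[
\begin{aligned}
\mathcal{W}(\rho(\theta, \cdot))
&= \int_\Omega \int_\Omega w(x, y)\,\rho(\theta, x)\,\rho(\theta, y)\,dx\,dy \\
&= \int_{\mathcal{Z}} \int_{\mathcal{Z}} w(g(\theta, z_1), g(\theta, z_2))\,\mu(z_1)\,\mu(z_2)\,dz_1\,dz_2 \\
&= \mathbb{E}_{(z_1, z_2) \sim \mu \times \mu}\, w(g(\theta, z_1), g(\theta, z_2)),
\end{aligned}
\]
where the second equality in each of the two formulas above holds because of the constrained mapping relation (4), and $\mu \times \mu$ represents an independent joint density function supported on $\mathcal{Z} \times \mathcal{Z}$ with marginals $\mu(z_1)$, $\mu(z_2)$. Similarly to Proposition 1, we have
\[
\begin{aligned}
c(\theta_0, \theta_1)
&= \inf_{v_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \int_\Omega L(x, v(t, x))\,\rho(\theta_t, x)\,dx - \mathcal{V}(\rho(\theta_t, \cdot)) - \mathcal{W}(\rho(\theta_t, \cdot)) \Big] dt \colon \\
&\qquad\qquad \partial_t \rho(\theta_t, x) + \nabla \cdot (\rho(\theta_t, x) v(t, x)) = 0 \Big\} \\
&= \inf_{\Phi_t,\ \theta_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \int_\Omega L\big(x, D_p H(x, \nabla \Phi(t, x))\big)\,\rho(\theta_t, x)\,dx - \mathcal{V}(\rho(\theta_t, \cdot)) - \mathcal{W}(\rho(\theta_t, \cdot)) \Big] dt \colon \\
&\qquad\qquad \partial_t \rho(\theta_t, x) + \nabla \cdot (\rho(\theta_t, x) v(t, x)) = 0 \Big\} \\
&= \inf_{\Phi_t,\ \theta_t,\ \theta(0) = \theta_0,\ \theta(1) = \theta_1} \Big\{ \int_0^1 \Big[ \mathbb{E}_{z \sim \mu} L\Big(g(\theta_t, z), \frac{d}{dt} g(\theta_t, z)\Big) - \mathbb{E}_{z \sim \mu} V(g(\theta_t, z)) - \mathbb{E}_{(z_1, z_2) \sim \mu \times \mu}\, w(g(\theta_t, z_1), g(\theta_t, z_2)) \Big] dt \colon \\
&\qquad\qquad \frac{d}{dt} g(\theta_t, z) = D_p H\big(g(\theta_t, z), \nabla_x \Phi(t, g(\theta_t, z))\big) \Big\},
\end{aligned}
\]
where in the second line $v(t, x) = D_p H(x, \nabla \Phi(t, x))$.

We next demonstrate an example of constrained dynamical OT problems.

Example 1 (Constrained $L^2$-Wasserstein distance). Let $L(x, v) = \|v\|^2$ and denote $W_2(\theta_0, \theta_1) = c(\theta_0, \theta_1)^{\frac{1}{2}}$, then
\[
W_2(\theta_0, \theta_1)^2 = \inf \Big\{ \int_0^1 \mathbb{E}_{z \sim \mu} \Big\| \frac{d}{dt} g(\theta_t, z) \Big\|^2 dt \colon \frac{d}{dt} g(\theta_t, z) = \nabla \Phi(t, g(\theta_t, z)),\ \theta(0) = \theta_0,\ \theta(1) = \theta_1 \Big\}.
\]
Observe that (5) forms a geometric action energy function in parameter space $\Theta$, in which the metric tensor can be extracted explicitly. In other words, denote $G(\theta) \in \mathbb{R}^{d \times d}$ by
\[
\dot{\theta}^T G(\theta) \dot{\theta} = \dot{\theta}^T\, \mathbb{E}_\mu \big( \nabla_\theta g(\theta, z) \nabla_\theta g(\theta, z)^T \big)\, \dot{\theta},
\]
with the constraint
\[
\big( \dot{\theta}, \nabla_\theta g(\theta, z) \big) = \nabla_x \Phi(g(\theta, z)).
\]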

Here $\nabla_\theta g(\theta, z) \in \mathbb{R}^{d \times n}$, $\Phi$ is a potential function satisfying
\[
-\nabla \cdot (\rho(\theta, x) \nabla \Phi(x)) = \big( \nabla_\theta \rho(\theta, x), \dot{\theta} \big),
\]
and $G(\theta) = \mathbb{E}_{z \sim \mu} \big( \nabla_\theta g(\theta, z) \nabla_\theta g(\theta, z)^T \big) \in \mathbb{R}^{d \times d}$ is a positive semidefinite matrix.

References

[1] S.-i. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2):251-276, 1998.
[2] S. Amari. Information Geometry and Its Applications. Number volume 194 in Applied mathematical sciences. Springer, Japan, 2016.
[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875 [cs, stat], 2017.
[4] N. Ay, J. Jost, H. V. Lê, and L. J. Schwachhöfer. Information Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge, A Series of Modern Surveys in Mathematics, volume 64. Springer, Cham, 2017.
[5] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375-393, 2000.
[6] Y. Chen and W. Li. Natural gradient in Wasserstein statistical manifold. arXiv:1805.08380 [cs, math], 2018.
[7] A. Halder and T. T. Georgiou. Gradient Flows in Uncertainty Propagation and Filtering of Linear Gaussian Systems. arXiv:1704.00102 [cs, math], 2017.
[8] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229-260, 2007.
[9] W. Li. Geometry of probability simplex via optimal transport. arXiv:1803.06360 [math], 2018.
[10] W. Li and G. Montufar. Natural gradient via optimal transport I. arXiv:1803.07033 [cs, math], 2018.
[11] K. Modin. Geometry of matrix decompositions seen through optimal transport and information geometry. Journal of Geometric Mechanics, 9(3):335-390, 2017.
[12] E. Nelson. Derivation of the Schrödinger Equation from Newtonian Mechanics. Physical Review, 150(4):1079-1085, 1966.
[13] G. Pistone. Lagrangian Function on the Finite State Space Statistical Bundle. Entropy, 20(2):139, 2018.
[14] C. Villani. Optimal Transport: Old and New. Number 338 in Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009.
E-mail address: wcli@math.ucla.edu
E-mail address: sjo@math.ucla.edu
Department of Mathematics, University of California, Los Angeles.
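Supplementary sketch for Example 1. The matrix $G(\theta) = \mathbb{E}_{z \sim \mu}(\nabla_\theta g \nabla_\theta g^T)$ can be estimated from latent samples. The code below is a minimal illustration under assumptions that are not part of the note: a hypothetical affine generator $g(\theta, z) = \theta_1 + \theta_2 z$ with $z \sim N(0, 1)$ (so $d = 2$, $n = 1$), a central finite difference for $\nabla_\theta g$, and the convention that $\nabla_\theta g$ is stored with shape $(d, n)$. For this generator $\nabla_\theta g = (1, z)^T$, so $G(\theta)$ should be close to the $2 \times 2$ identity.

```python
import numpy as np


def sample_latent(n_samples, seed=0):
    """Draw latent samples z ~ N(0, 1), shape (n_samples, 1)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_samples, 1))


def generator(theta, z):
    """Hypothetical affine generator g(theta, z) = theta[0] + theta[1] * z."""
    return theta[0] + theta[1] * z            # shape (n_samples, n) with n = 1


def grad_theta(g, theta, z, eps=1e-6):
    """Central finite-difference estimate of grad_theta g.

    Returns an array of shape (n_samples, d, n): one (d, n) matrix per sample.
    """
    theta = np.asarray(theta, dtype=float)
    rows = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        rows.append((g(theta + step, z) - g(theta - step, z)) / (2.0 * eps))
    return np.stack(rows, axis=1)


def metric_tensor(g, theta, z):
    """Monte Carlo estimate of G(theta) = E[grad_theta g grad_theta g^T], shape (d, d)."""
    jac = grad_theta(g, theta, z)             # (n_samples, d, n)
    # Sum over samples s and output coordinates n, then average over samples.
    return np.einsum("sin,sjn->ij", jac, jac) / jac.shape[0]


if __name__ == "__main__":
    theta = [0.3, 1.7]
    z = sample_latent(200_000, seed=0)
    G = metric_tensor(generator, theta, z)
    theta_dot = np.array([1.0, -0.5])
    print("G(theta) estimate:\n", G)
    print("theta_dot^T G theta_dot:", float(theta_dot @ G @ theta_dot))
```

For a generator that is not affine in $\theta$, the finite difference is only an approximation, and automatic differentiation would be the more usual choice.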