CONSTRUCTING OPTIMAL MAPS FOR MONGE'S TRANSPORT PROBLEM AS A LIMIT OF STRICTLY CONVEX COSTS

Similar documents
CONSTRUCTING OPTIMAL MAPS FOR MONGE S TRANSPORT PROBLEM AS A LIMIT OF STRICTLY CONVEX COSTS

Rearrangements and polar factorisation of countably degenerate functions G.R. Burton, School of Mathematical Sciences, University of Bath, Claverton D

Near convexity, metric convexity, and convexity

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Introduction to Real Analysis Alternative Chapter 1

Topological properties

Some Background Material


REGULARITY OF MONOTONE TRANSPORT MAPS BETWEEN UNBOUNDED DOMAINS

Metric Spaces and Topology

Optimal Transportation. Nonlinear Partial Differential Equations

The optimal partial transport problem

2. Function spaces and approximation

On a Class of Multidimensional Optimal Transportation Problems

Part III. 10 Topological Space Basics. Topological Spaces

Calculus of Variations

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

PROPERTY OF HALF{SPACES. July 25, Abstract

Real Analysis Notes. Thomas Goller

Implicit Functions, Curves and Surfaces

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

Continuity of convex functions in normed spaces

Chapter 2 Metric Spaces

Lebesgue Measure on R n

A STRATEGY FOR NON-STRICTLY CONVEX TRANSPORT COSTS AND THE EXAMPLE OF x y p IN R 2

Overview of normed linear spaces

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Real Analysis Problems

LECTURE 15: COMPLETENESS AND CONVEXITY

Mathematical Analysis Outline. William G. Faris

Tools from Lebesgue integration

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

01. Review of metric spaces and point-set topology. 1. Euclidean spaces

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN. H.T. Banks and Yun Wang. Center for Research in Scientic Computation

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

A VERY BRIEF REVIEW OF MEASURE THEORY

B. Appendix B. Topological vector spaces

Exercises to Applied Functional Analysis

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1

Maths 212: Homework Solutions

The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria

Centre for Mathematics and Its Applications The Australian National University Canberra, ACT 0200 Australia. 1. Introduction

Convex Analysis and Economic Theory AY Elementary properties of convex functions

CHAPTER 6. Differentiation

r( = f 2 L 2 (1.2) iku)! 0 as r!1: (1.3) It was shown in book [7] that if f is assumed to be the restriction of a function in C

Local semiconvexity of Kantorovich potentials on non-compact manifolds

Extreme points of compact convex sets

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

MATH 202B - Problem Set 5

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first

On supporting hyperplanes to convex bodies

An introduction to some aspects of functional analysis

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS

Set, functions and Euclidean space. Seungjin Han

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

MA651 Topology. Lecture 10. Metric Spaces.

Real Analysis Qualifying Exam May 14th 2016

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set

g 2 (x) (1/3)M 1 = (1/3)(2/3)M.

1 Directional Derivatives and Differentiability

58 Appendix 1 fundamental inconsistent equation (1) can be obtained as a linear combination of the two equations in (2). This clearly implies that the

Recent developments in elliptic partial differential equations of Monge Ampère type

Winter Lecture 10. Convexity and Concavity

Lecture 9 Metric spaces. The contraction fixed point theorem. The implicit function theorem. The existence of solutions to differenti. equations.

Analysis Comprehensive Exam Questions Fall 2008

Course 212: Academic Year Section 1: Metric Spaces

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

LECTURE 6. CONTINUOUS FUNCTIONS AND BASIC TOPOLOGICAL NOTIONS

Lebesgue Measure on R n

The Heine-Borel and Arzela-Ascoli Theorems

Exercise Solutions to Functional Analysis

Math 396. Bijectivity vs. isomorphism

McGill University Math 354: Honors Analysis 3

Real Analysis Chapter 4 Solutions Jonathan Conder

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

Convex Analysis Background

A Representation of Excessive Functions as Expected Suprema

56 4 Integration against rough paths

Analysis Comprehensive Exam Questions Fall F(x) = 1 x. f(t)dt. t 1 2. tf 2 (t)dt. and g(t, x) = 2 t. 2 t

The Arzelà-Ascoli Theorem

Garrett: `Bernstein's analytic continuation of complex powers' 2 Let f be a polynomial in x 1 ; : : : ; x n with real coecients. For complex s, let f

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

ARE202A, Fall Contents

c Birkhauser Verlag, Basel 1997 GAFA Geometric And Functional Analysis

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

MINIMAL GRAPHS PART I: EXISTENCE OF LIPSCHITZ WEAK SOLUTIONS TO THE DIRICHLET PROBLEM WITH C 2 BOUNDARY DATA

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

REGULARITY FOR INFINITY HARMONIC FUNCTIONS IN TWO DIMENSIONS

MATHS 730 FC Lecture Notes March 5, Introduction

The Lebesgue Integral

Examples of Dual Spaces from Measure Theory

REAL AND COMPLEX ANALYSIS

Transcription:

JOURNAL OF THE AMERICAN MATHEMATICAL SOCIETY Volume 15, Number 1, Pages 1{6 S 0894-0347(01)00376-9 Article electronically published on July 31, 001 CONSTRUCTING OPTIMAL MAPS FOR MONGE'S TRANSPORT PROBLEM AS A LIMIT OF STRICTLY CONVEX COSTS LUIS A. CAFFARELLI, MIKHAIL FELDMAN, AND ROBERT J. MCCANN The Monge-Kantorovich problem is to move one distribution of mass onto another as eciently as possible, where Monge's original criterion for eciency [19] was to minimize the average distance transported. Subsequently studied by many authors, it was not until 1976 that Sudakov showed solutions to be realized in the original sense of Monge, i.e., as mappings from R n to R n [3]. A second proof of this existence result formed the subject of a recent monograph by Evans and Gangbo [7], who avoided Sudakov's measure decomposition results by using a partial dierential equations approach. In the present manuscript, we give a third existence proof for optimal mappings, which has some advantages (and disadvantages) relative to existing approaches: it requires no continuity or separation of the mass distributions, yet our explicit construction yields more geometrical control than the abstract method of Sudakov. (Indeed, this control turns out to be essential for addressing a gap which has recently surfaced in Sudakov's approach to the problem in dimensions n 3; see the remarks at the end of this section.) It is also shorter and more exible than either, and can be adapted to handle transportation on Riemannian manifolds or around obstacles, as we plan to show in a subsequent work [13]. The problem considered here is the classical one: Problem 1 (Monge). Fix a norm d(x; y) = kx yk on R n, and two densities non-negative Borel functions f +, f L 1 (R n ) satisfying the mass balance condition (1) f + (x)dx = R n f (y)dy: R n In the set A( + ; ) of Borel maps r : R n! R n which push the measure d + = f + (x)dx forward to d = f (y)dy, nd a map s which minimizes the cost functional () I[r] := kr(x) xkf + (x)dx: R n Received by the editors March 15, 000. 000 Mathematics Subject Classication. Primary 49Q0; Secondary 6B10, 8A50, 58E17, 90B06. Key words and phrases. Monge-Kantorovich mass transportation, resource allocation, optimal map, optimal coupling, innite dimensional linear programming, dual problem, Wasserstein distance. This research was supported by grants DMS 9714758, 96376, 9970577, and 96997 of the US National Science Foundation, and grant 17006-99 RGPIN of the Natural Sciences and Engineering Research Council of Canada. The hospitality of the Max-Planck Institutes at Bonn and Leipzig are gratefully acknowledged by the second and third authors respectively. 1 c001 American Mathematical Society

L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN Here r A( + ; ) is sometimes denoted by r # + =, and means merely that (3) (r(x))f + (x)dx = (y)f (y)dy R R n n holds for each continuous test function on R n. Though the norm kx yk need not be Euclidean, throughout the present manuscript we assume there exist constants ; > 0 such that all x; y R n satisfy the uniform smoothness and convexity estimates: (4) kyk 1 kx + yk kxk + 1 kx yk kyk : The estimates (4) assert some uniform convexity and smoothness [3] of the unit ball; they are certainly satised if, e.g., the unit sphere kxk = 1 is a C surface in R n with positive principal curvatures. In particular, = = 1 makes (4) an identity in the Euclidean case. For a further discussion of Monge's problem, its history, and applications, we refer the reader to Evans [5], Evans and Gangbo [7], Gangbo and McCann [15] or Rachev and Ruschendorf [0]. Part of the diculty of this problem is the degeneracy which results from failure of the norm to be strictly convex (radially). Even in the simplest one-dimensional examples this leads to non-uniqueness of the minimizing map. By contrast, when the transportation cost function ky xk is replaced by a strictly convex function such as ky xk p with p > 1, the problem simplies considerably, and following ideas of Brenier and others it is possible to show that a unique map solves the problem and possesses nice measurability properties in both the Euclidean [4], [15] and Riemannian [18] settings. In this paper, our key idea for resolving this degeneracy is to rst nd a change of coordinates which adapts the problem's local geometry so that all transport directions become parallel, and then solve these one-dimensional transportation problems separately before invoking Fubini's theorem to complete the proof. The map we construct in this way might, in principle, be recovered in the p! 1 limit from the unique maps solving the p > 1 problems. Although we don't carry out this limit directly, we do use structural features of the optimal maps for p > 1 to facilitate several aspects of the proof. This distinguishes our solution from that of Evans and Gangbo, as illustrated by the book-shifting example f + = [0;n] and f = [1;n+1] on the line R 1 [15]. For this example, the map we construct is the translation s(x) = x+1, whereas Evans and Gangbo would leave the mass common to f + and f in its place a priori, obtaining the map s(x) = x on x [1; n] and s(x) = x + n on [0; 1] as a result. We anticipate that the ability to deal with overlapping densities f + and f will be signicant in applications. We also point out that a sequel shows the map constructed below is the only optimal map to preserve the ordering of pairs of points with collinear images [1]. We now give a heuristic outline of our existence proof. Following previous authors, we begin by solving a dual problem whose solution denes the set of transport rays, according to the terminology of Evans and Gangbo [7]. These rays are determined by the property that the Lipschitz potential u : R n! R from the dual problem decreases along them with maximum admissible rate. As we show below, the optimal mapping s takes each transport ray into itself. We therefore restrict the measures + and to each ray, so that mass balance holds for the restrictions, and solve a transportation problem on each ray. These one-dimensional problems are easy to solve. Thus we get an optimal map on each ray, and as the result a map

CONSTRUCTING OPTIMAL MAPS 3 from R n into R n. We show then that this map pushes + forward to, and is optimal. The most delicate step in this procedure involves restricting the measures to rays, and it is here that our approach diverges from Sudakov's. Instead of building on the measure decomposition results of Halmos [16] or Rokhlin [1], we seek a local change of variables in R n so that the new coordinate x n measures distance along each ray, while the remaining n 1 coordinates vary across nearby rays. For the Euclidean norm on R n the directions of rays are given by the gradient of Monge's potential u, and thus it is natural to use level sets of u to parametrize rays, i.e., the variables x 1 ; : : :; x n 1 will be coordinates on a xed level set of u. This can also be adapted to more general norms, if one denes the gradient of u using the appropriate (Finsler) identication of vectors with one-forms. But we also need certain properties of this change of variables in order to be able to express + and in the new coordinates: Indeed, expressing these measures is tantamount to changing variables under the integral, and the change of variables must be Lipschitz continuous to apply the Area formula. However, the typical pattern of rays is too complicated for us to achieve this globally. We therefore decompose the set of all rays into a countable collection of special subsets, chosen so that the rays enjoy a more \regular" structure within each subset while the mass of + still balances, and perform a Lipschitz change of variables on each subset separately. Thus the Lipschitz control on directions of rays given by Lemma 16 is absolutely crucial to our proof. The estimate which provides this control is a restriction on the geometry of quadrilaterals in a smooth, uniformly convex Banach space; established in Lemma 14, this estimate holds some independent interest (cf. Federer [10, x4.8(8)] and Feldman [11, Appendix A]). The remainder of this paper is organized as follows. In the rst section we recall the general duality theory for Monge-type problems introduced by Kantorovich [17], and the construction of optimal maps for transportation costs given by strictly convex functions instead of a norm [4], [15]. The Kantorovich dual problem is solved by taking a limit of such costs, and the section concludes with a criterion for optimality. It is followed by a section which introduces the transport rays and geometry dictated by the Kantorovich solution and criterion for optimality. Several observations by Evans and Gangbo are summarized here, followed by our key new estimate giving Lipschitz control on the directions of transport rays. In the third section we construct the local changes of variables which parallelize nearby transport rays, while the fourth section veries that the traces of f + and f weighted by a Jacobian factor accounting for the change of variables are balanced on each individual ray. Finally, these ingredients are combined in Section 5 to give a proof of our main theorem by constructing a map solving Monge's problem: Theorem 1 (Existence of Optimal Maps). Fix a norm on R n satisfying the uniform smoothness and convexity conditions (4), and two L 1 (R n ) densities f + ; f 0 with compact support and the same total mass (1). Then there exists a Borel map s : R n! R n which solves Monge's problem, in the sense that it minimizes the average distance () transported among all maps pushing f + forward to f (3). Remarks added in revision. After the submission of this manuscript, the authors learned of several signicant developments. Foremost among them is the concurrent but independent discovery of a existence result for optimal maps in Monge's problem by Trudinger and Wang [4]. Although very similar to our approach, the argument there is streamlined somewhat by their decision to focus exclusively on

4 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN the Euclidean norm. Lecture notes subsequently released by Ambrosio [] contain an excellent summary of progress on Monge's problem cast into the framework of geometric measure theory; they include a derivation for existence and uniqueness of a transport density corresponding to f L 1 (R n ) which parallels certain results of Feldman and McCann [1]. Ambrosio's notes also highlight a logical gap in the solution of Monge's problem proposed by Sudakov. Without Lipschitz control on ray directions, it is impossible to know that the conditional restriction of f yields an absolutely continuous measure along almost every ray, as required for Sudakov's proof. In two dimensions, disjointness prevents nearby rays from turning too much without bumping into each other, so Lipschitz control is automatic and the gap can be bridged (as long as the norm has a strictly convex unit ball). But a counterexample in R 3 due to Alberti, Kircheim, and Preiss [1] shows an uncountable union of disjoint segments can be constructed, whose midpoints form the support of an absolutely continuous probability measure on R 3. The restriction of this measure to each of the segments yields a Dirac mass at its midpoint, violating the absolute continuity claimed by Sudakov. In this context the geometrical control provided by Lemma 16 is required to preclude such a collection of segments from forming transport rays in Monge's problem. Thus it would seem that Evans-Gangbo [7] contains the rst complete proof of existence for optimal maps between Lipschitz densities f with disjoint support, while the present manuscript and Trudinger-Wang [4] complete the rst proofs for more general f L 1 (R n ). Note that all complete proofs require a Euclidean ball, or at least the uniform smoothness and convexity hypothesis (4), which Sudakov explicitly eschews [3, p. 164]. 1. Duality in the limit of strictly convex costs In this section we recall a problem formulated by Kantorovich as a dual to Monge's problem. We construct its solution, and extract properties germane to our purposes. Consider R n metrized by a norm k k satisfying the uniform smoothness and convexity conditions (4), and denote the associated distance by d(x; y) := kx yk. Then the problem asserted by Kantorovich [17] to be dual to Monge's problem is formulated as follows. Let Lip 1 (X ; d) denote the set of functions on X R n which are Lipschitz continuous with Lipschitz constant no greater than one; thus Lip 1 (R n ; d) = u : R n! R 1 j ju(x) u(y)j d(x; y) for any x; y R n : Problem (Kantorovich). For f + ; f L 1 (R n ) from Monge's Problem 1, maximize ^K[v] on Lip1 (R n ; d), where ^K[v] := (vf + vf ) dx: R n To solve the Monge and Kantorovich problems, we consider a second pair of dual problems in which the metric d(x; y) is replaced by a more general transportation cost function c " (x; y) on R n R n. The Monge problem analogous to (1) then becomes: Problem 3 (Primal). Fix two Borel densities f + ; f 0 in L 1 (R n ) with compact support satisfying the mass balance condition (1). Among Borel maps r

CONSTRUCTING OPTIMAL MAPS 5 A( + ; ) which push the measure d + = f + (x)dx forward to d = f (y)dy as in (3), nd a map s : R n! R n which minimizes the total transportation cost (5) I " [r] := c " (x; r(x))f + (x)dx: R n The corresponding dual problem is: Problem 4 (Dual). Take f +, f L 1 (R n ) as in Problem 3. Denote the support of f + by X and of f by Y. Among all pairs of continuous functions '; in (6) J " (X ; Y) := f('; ) C(X ) C(Y) j nd a pair (' " ; " ) minimizing the functional (7) K('; ) := 'f + dx + '(x) + (y) c " (x; y) on X Yg X Y f dy: The duality assertion I " [s] = K(' " ; " ) which relates these two problems holds rather generally; see Rachev and Ruschendorf [0]. However, the Dual Problem 4 takes a fundamentally dierent form than the Kantorovich problem, due to the fact that the cost c " (x; y) need no longer satisfy a triangle inequality. This generalization is useful, since it permits us to replace the distance function d(x; y) = kx yk by a strictly convex cost function (8) c " (x; y) := h " (x y) = kx yk 1+" ; for which existence, uniqueness, and a characterization of optimal maps in Monge's problem can be found in Caarelli [4] and Gangbo and McCann [14], [15]. Noting that h (x) is C 1;" (R n ) smooth and strictly convex from Lemma 11 below, we recall the relevant results as follows: Theorem (Duality and Optimal Maps for Strictly Convex Costs [4], [14]). Take f + ; f L 1 (R n ) and denote X := spt f + and Y := spt f as in Problem 4. If the transportation cost c " (x; y) satises (8) and (4), then for " > 0: (i). Some pair (' " ; " ) minimizing K('; ) on J " (X ; Y) in the dual problem satises (9) (10) ' " (x) = sup( c " (x; y) " (y)); yy "(y) = sup( c " (x; y) ' " (x)): xx (ii). The function ' " is Lipschitz on X (as " is on Y), with Lipschitz constant dominated by the Lipschitz constant of c " (x; y) on X Y. (iii). For a.e. x X there exists a unique y Y such that (11) ' " (x) + " (y) = c " (x; y): (iv). Dene the mapping s " : X! Y by assigning to a.e. x X the unique y Y for which (11) holds. Then s " pushes the measure d + = f + (x)dx forward to d = f (y)dy and is the unique minimizer for the primal Problem 3. Our rst goal is to extract a pair of functions minimizing K('; ) on J 0 (X ; Y) from the limit "! 0 of this theorem. This follows from a simple compactness result:

6 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN Proposition 3 (Limit of Minimizing Pairs). For some sequence " j > 0 which tends to zero, the Dual Problem 4 admits a sequence of pairs (' "j ; "j ) which minimize K('; ) on J "j (X ; Y) and converge uniformly on the compact sets X and Y respectively to limits ' "j! ' 0 and "j! 0 as j! 1. The limit functions ' 0 ; 0 minimize K('; ) on J 0 (X ; Y) and satisfy (1) (13) ' 0 (x) = sup( kx yk 0 (y)); yy 0(y) = sup( kx yk ' 0 (x)): xx Proof. Fix x 0 X and observe that K('; ) = K(' A; + A) for each A R 1 according to the mass balance condition (1). Thus any pair (' " ; " ) minimizing K('; ) on J " (X ; Y) may be shifted by A = ' " (x 0 ) to ensure ' " (x 0 ) = 0. Now X and Y are compact, so for " (0; 1) the costs c " (x; y) = kx yk 1+" form an equi-lipschitz family on X Y. The minimizing functions ' " and " in Theorem (ii) also form equi-lipschitz families on X and Y respectively. Moreover ' " (x 0 ) = 0, so the functions ' " are uniformly bounded on X. Also, jc " (x; y)j C for all (x; y; ") X Y (0; 1), implying a uniform bound on the " in (9). The Ascoli-Arzela theorem then yields a subsequence " j! 0 such that ' "j and "j converge uniformly on X and Y respectively to ' 0 C(X ) and 0 C(Y); (1){ (13) follow from (9){(10) and imply that (' 0 ; 0 ) J 0 (X ; Y). It remains to show that (' 0 ; 0 ) minimizes K('; ) on J 0 (X ; Y). For any other pair ( ~'; ~ ) J 0 (X ; Y) and " > 0, dene ~' " ~' and ~ " (y) = ~ (y) + max xx [ c "(x; y) + d(x; y)]: Compactness of X yields ~ " C(Y), while ( ~' " ; ~ ") J " (X ; Y) from their definition and (6). Moreover, ~ "! ~ uniformly on Y as "! 0. For each j these competitors satisfy K(' "j ; "j ) K( ~' "j ; ~ " j ). Uniform convergence yields K(' 0 ; 0 ) K( ~'; ~ ) in the limit j! 1, so the proposition is proved. Next we demonstrate equivalence of the Dual Problem 4 in the case " = 0 to the Kantorovich Problem via the triangle inequality; see (14) and (17) especially. Proposition 4 (Lipschitz Maximizer). Suppose (' 0 ; 0 ) satisfy (1){(13) and minimize K('; ) on J 0 (X ; Y). Then there exists u Lip 1 (R n ; d) such that (14) u = ' 0 on X ; u = 0 on Y: Moreover, u maximizes ^K[v] on Lip1 (R n ; d) and satises (15) u(x) = min (u(y) + kx yk) yy for any x X ; u(y) = max (u(x) kx yk) xx for any y Y: Proof. Extend ' 0 ; 0 to the whole space R n using the right-hand sides of (1){(13). We show rst that ' 0 ; 0 Lip 1 (R n ; d). Indeed, let x 1 ; x R n. Continuity of 0 on the compact set Y yields a point y 1 Y where the supremum (1) is attained: ' 0 (x 1 ) = kx 1 y 1 k 0 (y 1 ). Also (1) implies ' 0 (x ) kx y 1 k 0 (y 1 ). Thus ' 0 (x 1 ) ' 0 (x ) kx 1 y 1 k + kx y 1 k kx 1 x k by the triangle inequality. Thus ' 0 Lip 1 (R n ; d), and 0 Lip 1 (R n ; d) similarly.

CONSTRUCTING OPTIMAL MAPS 7 (16) Next we show that ' 0 + 0 = 0 on X. For any x X, (13) yields ' 0 (x) + 0 (x) 0 on X. Suppose for some z X a strict inequality holds: ' 0 (z) + 0 (z) > 0. By (1){(13) and continuity of ' 0 and 0, there exist x X and y Y such that ' 0 (z) = kz yk 0 (y); 0(z) = kz xk ' 0 (x): Combined with ' 0 (z) + 0 (z) > 0 and (' 0 ; 0 ) J 0 (X ; Y) this implies kz yk + kz xk = ' 0 (z) 0 (z) ' 0 (x) 0 (y) < ' 0 (x) 0 (y) kx yk; contradicting the triangle inequality. Thus ' 0 + 0 0 on X. In conjunction with (16) this yields ' 0 + 0 = 0 on X as desired. Thus, denoting u = 0 in R n we have shown u Lip 1 (R n ; d) and both parts of (14). Also, (15) follows directly from (1){(13). It remains to prove u maximizes ^K[v] in the Kantorovich Problem. Note that ^K[u] = K[' 0 ; 0 ] (17) by (14). Let v Lip 1 (R n ; d). Then the pair ^'; ^ dened by ^' = v on X and ^ = v on Y belongs to the set J 0 (X ; Y) dened in (6); indeed, for (x; y) X Y we have Now ^'(x) + ^(y) = v(x) + v(y) kx yk: ^K[u] = K[' 0 ; 0 ] K[ ^'; ^] = ^K[v]; where the last equality follows from the denition of ^'; ^, and the proposition is proved. Denition 5 (Kantorovich Potentials). Any function u which maximizes ^K[v] on Lip 1 (R n ; d) may be referred to as a Kantorovich potential. Such potentials exist by Propositions 3 and 4. However, the Kantorovich potentials obtained in this way via a limit (' 0 ; 0 ) of pairs from Theorem (i) have additional virtues ((14){ (15)). We call such u a limiting Kantorovich potential and exploit its existence hereafter. Finally, we discuss the connection between the primal and dual problems. For the strictly convex costs (8) this connection is given by Theorem, which shows how the primal problem can be solved using a solution to the dual problem. However, for the non-strictly cost c 0 (x; y) the uniqueness assertion of Theorem (iii) would fail, so the corresponding map is not well dened: its direction is clear, but its distance ambiguous. Indeed, when a minimizing pair (' 0 ; 0 ) for K('; ) satises (1){(13) and ' 0 (x) + 0 (y) = kx yk holds for some (x; y) X Y, we shall see ' 0 (x) + 0 (z) = kx zk for all z [x; y] \ Y, meaning all z = tx + (1 t)y Y with t [0; 1]. The next lemma exhibits the connection between the primal and dual problems for the cost function c 0 (x; y) = kx yk. It shows in particular that to obtain an

8 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN optimal map in the primal problem, it is sucient to start from a Kantorovich potential u and construct any admissible map consistent with (18). The rest of this paper is devoted to carrying out this program on R n, suitably normed. Lemma 6 (Dual Criteria for Optimality). Fix u Lip 1 (R n ; d) and let s : R n! R n be a mapping which pushes + forward to. If (18) u(x) u(s(x)) = kx s(x)k for + a.e. x X ; then: (i). u is a Kantorovich potential maximizing Problem. (ii). s is an optimal map in Problem 1. (iii). The inmum I[s] in Problem 1 is equal to the supremum ^K[u] in Problem. (iv). Every optimal map ^s and Kantorovich potential ^u also satisfy (18). Proof. For any map r : R n! R n pushing forward + to + and v Lip 1 (R n ; d) we compute: I[r] = kx r(x)kd + (x) R n (19) [v(x) v(r(x))]d + (x) R n = v(x)d + (x) v(y)d (y) R R n n = ^K[v]; using (3). Thus the minimum value of I[r] on A( + ; ) is at least as large as the maximum of ^K[v] on Lip1 (R n ; d). On the other hand, our hypothesis (18) produces a case of equality I[s] = ^K[u] in (19). This implies the assertions (i) ^K[u] is a maximum; (ii) I[s] is a minimum; and (iii) I[s] = ^K[u] of the lemma. Now let r A( + ; ) and v Lip 1 (R n ; d) be any other optimal map and Kantorovich potential. Then I[r] = I[s] and ^K[v] = ^K[u] combine with (iii) to yield I[r] = ^K[v]. But this implies a pointwise equality + almost everywhere in (19), so the proofs of assertion (iv) and hence the lemma are complete.. Transport rays and their geometry The preceding section reduced the problem of nding an optimal map in Monge's problem to constructing an admissible map which also satises (18). We carry out this program on R n metrized by the norm d(x; y) = kx yk. Our starting point is a Kantorovich potential u Lip 1 (R n ; k k). In this section, we study the geometric meaning of condition (18), and introduce the transport rays and transport sets which are ultimately used to construct an optimal map. We study the properties of transport rays, in particular proving a Lipschitz estimate for how much the direction of nearby rays can vary if none of the rays are too short. The underlying idea is that smoothness and uniform convexity of the norm ball (4) impose geometrical constraints on each quadrilateral whose opposite sides are formed by transport rays. This estimate is much in the spirit of Federer's theorem on Euclidean distance functions [10, x4.8(8)]; see also Feldman [11, Appendix A] Fix two measures + and dened by non-negative densities f + ; f L 1 (R n ) satisfying the mass balance condition (1). Assume that + and have compact supports, denoted by X and Y R n respectively. Through xx{5, we x a limiting

CONSTRUCTING OPTIMAL MAPS 9 Kantorovich potential u a maximizer in Problem obtained from a limit of solutions to dual problems with strictly convex costs c " (x; y) = kx yk 1+". Such a potential exists and satises (15) by Propositions 3 and 4 and Denition 5. Note that u has Lipschitz constant one with respect to the distance d(x; y) = kx yk. The derivative of any function ' : R n! R 1 at x R n viewed as a linear functional on the tangent space is denoted by D'(x) (R n ). Since we want to investigate the geometrical implications of (18) for u, suppose x X and y Y satisfy From the Lipschitz constraint (0) u(x) u(y) = kx yk: ju(z 1 ) u(z )j kz 1 z k for any z 1 ; z R n ; it follows that on the segment connecting x and y the function u is ane and decreasing with the maximum rate compatible with (0). We will call maximal segments [x; y] having these properties the transport rays. More precisely: Denition 7 (Transport Rays). A transport ray R is a segment with endpoints a, b R n such that (i). a X, b Y, a 6= b; (ii). u(a) u(b) = ka bk; (iii). Maximality: for any t > 0 such that a t := a + t(a b) X there holds ju(a t ) u(b)j < k a t bk; and for any t > 0 such that b t := b + t(b a) Y there holds ju(b t ) u(a)j < k b t ak: We call the points a and b the upper and lower ends of R, respectively. Since u(a) u(b) = ka bk, it follows from (0) that any point z R satises (1) u(z) = u(b) + kz bk = u(a) ka zk: Denition 8 (Rays of Length ero). Denote by T 1 the set of all points which lie on transport rays. Dene a complementary set T 0, called the rays of length zero, by T 0 := fz X \ Y : ju(z) u(z 0 )j < kz z 0 k for any z 0 X [ Y; z 0 6= zg: From these two denitions and the property (15) of u we immediately infer the following lemma, whose obvious proof is omitted. Lemma 9 (Data is Supported Only on Transport Rays). X [ Y T 0 [ T 1. To study the properties of rays, let us call a point z R n an interior point of a segment [a; b], where a; b R n, if z = ta + (1 t)b for some 0 < t < 1. We denote by [a; b] 0 the set of interior points of [a; b]. The basic observation which goes back to Monge is that transport rays do not cross. Lemma 10 (Transport Rays Are Disjoint). Let two transport rays R 1 6= R share a common point c. Then R 1 \ R = fcg and c is either the upper end of both rays, or the lower end of both rays. In particular, an interior point of a transport ray does not lie on any other transport ray.

10 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN Proof. First note the strict convexity of the unit ball kxk 1 asserted in Lemma 11 implies that equality kx yk + ky zk = kx zk holds if and only if y lies on the segment [x; z]. Since R 1 6= R share the point c, they cannot be collinear; otherwise (1) and the maximality part of Denition 7 would force R 1 = R. Thus the two rays can only intersect in a single point: R 1 \R = fcg. It remains to prove either c = a 1 = a or c = b 1 = b, where a k denotes the upper end and b k the lower end of R k, k = 1;. We shall assume c 6= b and argue that this forces c = a 1. It then follows that c 6= b 1 which by symmetry forces c = a to complete the proof. The other possibility c 6= a is handled similarly, leading to the conclusion that c = b 1 = b must be the lower end of both rays. Assuming c 6= b means b = R 1. By (1) thus u(c) = u(b ) + kc b k; u(c) = u(a 1 ) ka 1 ck; u(a 1 ) u(b ) = ka 1 ck + kc b k ka 1 b k: Strict inequality would violate the Lipschitz condition (0). Thus equality must hold, meaning c lies in the segment [a 1 ; b ] as well as R 1 = [a 1 ; b 1 ]. Since b 6 R 1, these two segments, like the two rays, are not collinear. Their sole intersection point is a 1, hence c = a 1. By our above remarks this completes the proof: c 6= b 1 hence c = a is the upper end of both rays. Denoting the norm by N(x) := kxk and its square by F (x) := kxk, the next lemma highlights some smoothness and strict convexity which follow from (4). From (3) it is clear that the strict convexity is uniform over the sphere @B. Lemma 11 (Norm Smoothness and Strict Convexity). If the norm N(x) := kxk satises (4), then F (x) := kxk is of smoothness class C 1;1 (R n ). Moreover, the unit ball B := fx R n j N(x) 1g is strictly convex, and () jdn(x) yj < 1; DN(x) x = 1 for all y 6= x with kxk = kyk = 1; where DN(x) y denotes the pairing of DN(x) (R n ) and y R n. Proof. Every norm N(x) is convex throughout R n and bounded by some multiple of the Euclidean norm: N(y) Ljyj. Thus both N(x) and its square F (x) = kxk are continuous functions. The midpoint convexity condition (4) therefore implies convexity of F (x). We shall use the opposite inequality to conclude concavity of g(x) := F (x) L jxj ; indeed, for x; y R n, g satises the midpoint estimate g(x + y) + g(x y) g(x) = F (x + y) + F (x y) F (x) L jyj F (y) L jyj 0: Now recall that the distributional second derivative of a convex function is a nonnegative denite matrix of Radon measures D ijf (x) [8, x6.3]. Concavity of g implies a pointwise bound on this matrix: 0 D F (x) L I. Thus F belongs to the Sobolev space W ;1 (R n ) and is dierentiable Lipschitz continuously [6, x5.8.{3]: F C 1;1 (R n ).

CONSTRUCTING OPTIMAL MAPS 11 To address (), rst observe strict convexity of the closed unit ball: given two distinct points a; b B, their midpoint must lie in the interior of B according to (4): a + b + a b kak + kbk (3) 1 with > 0. Now the triangle inequality implies (4) DN(x) y := kx + tyk kxk lim t!0 t kyk 1 while homogeneity yields DN(x) x = kxk = 1. Thus the supporting hyperplane to the ball at x @B consists of those z R n satisfying DN(x) z = 1. Strict convexity prevents this hyperplane from touching the ball at more than one point, whence (4) can be sharpened to DN(x) y < 1 for y B n fxg. Similarly, DN(x) ( y) < 1 for y B n fxg, which concludes the proof of (). Lemma 1 (Dierentiability of Kantorovich Potential Along Rays). If z 0 lies in the relative interior of some transport ray R, then u is dierentiable at z 0. Indeed, setting e := (a b)=ka bk where a; b are the upper and lower ends of R yields: jdu(z 0 )yj 1 for all kyk = 1, with equality if and only if y = e. Remark 13. This proof requires a modication of the Euclidean case dealt with by Evans and Gangbo [7, Lemma 4.1]. Proof of Lemma 1. Choose z 0 in the interior of R. By (1), for some small r 0 > 0, we have Rescale u by setting u(z 0 + te) = u(z 0 ) + t on r 0 t r 0 : u r (z) = u(z 0 + rz) u(z 0 ) for 0 < r r 0 : r Then u r satises the same Lipschitz condition (0) as u but is centered at u r (0) = 0. Hence for some subsequence r k! 0 we have u rk! v where the convergence is uniform on every compact subset of R n. Clearly v Lip 1 (R n ; k k) and (5) We shall now show linearity (6) v(te) = t for all t R 1 : v(z) = DN(e) z for all z R n ; by exploiting the Lipschitz condition v inherits from u together with the rst order Taylor expansion of N(e + z=t) := ke + z=tk around e guaranteed by Lemma 11: v(te) v(z) t kte zk = kek DN(e) z jtj t + o(1 t ): Subtracting v(te)=t = kek from both sides, the two limits t! 1 of this inequality combine to yield (6). We conclude that lim r!1 u(z 0 + rz) u(z 0 ) r = DN(e) z uniformly for z B 1 (0):

1 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN This implies that u is dierentiable at z 0, with Du(z 0 ) = DN(e). The remaining assertions of Lemma 1 follow directly from (). The next lemma exploits uniform convexity and smoothness of the norm to produce a quantitative estimate of how far away any two rays must be from crossing. When F (x) = kxk, it states that the sums of the squares of the diagonals AC and BD of any quadrilateral ABCD are controlled by the distance between the midpoints of the shorter pair (in least squares sense) of opposite sides. In particular, no quadrilateral can be folded in such a way that the midpoints of these two sides are brought close together unless both pairs of opposite corners are also driven together with a particular rate. In the Euclidean or Hilbert space setting, the rate constant (1 + =)=(1 + ) = 1 given by the polarization identity is seen to be sharp by folding up a square. Alternately, the estimate (8) can be interpreted as a reverse form of the triangle inequality, which holds for vectors that are suciently aligned. Lemma 14 (Twisted Quadrilateral Non-crossing Estimate). Let F : V! R be any function on a vector space V, uniformly smooth and convex enough that for some ; > 0 and all x; y V the following inequalities hold: (7) F (y) 1 F (x + y) F (x) + 1 F (x y) F (y): If four points a; b; c; d V satisfy F (a b) + F (c d) F (a d) + F (c b), then a c b d F + F 1 + (=) a + b F c + d (8) : 1 + Proof. Applying uniform convexity (7) with both (x; y) = (a c; b d)= and (x; y) = (b d; a c)= and then summing yields a c b d (9) (1 + ) F + F a c F + b d + F ~ a c b d ; where F ~ (z) := [F (z) + F ( z)]=. The desired inequality will follow if we can show that the second-to-last term controls the last one. Applying uniform convexity again yields F ~ a b c d F (a b) + F (c d) (30) a b F + c d ; either with or without the tilde. Uniformity of the smoothness gives F (a d) + F (c b) a d F + c b a d (31) F c b But the left hand side of (31) dominates the right hand side of (30) by hypothesis, so F ~ a c b d a + b F c + d : Together with (9), this completes the proof of (8). :

CONSTRUCTING OPTIMAL MAPS 13 The next lemma is crucial for the denition of the change to variables in which one variable is along transport rays. The lemma shows that if transport rays intersect a level set of u(z) in their interior points, then directions of rays have a Lipschitz dependence on the point of intersection, provided distances from the point of intersection to endpoints of a ray are uniformly bounded away from zero for all rays. Taking x = 0 and F (y) = F ( y) symmetrical implies 1 in (7), so the Lipschitz constant of Lemma 16 is seen to satisfy C 1 with equality only in the Hilbert space case. Denition 15 (Ray Directions). Dene a function : R n! R n as follows. If z is an interior point of a transport ray R with upper and lower endpoints a, b (note that R is uniquely dened by z in view of Lemma 10), then (3) (z) := a b ka bk : Dene (z) = 0 for any point z R n not the interior point of a transport ray. We call (z) the direction function corresponding to the Kantorovich potential u. Lemma 1 shows that on transport rays, the direction function (z) is nothing but the gradient of u computed in the Finsler setting. Lemma 16 (Ray Directions Vary Lipschitz Continuously). Let R 1 and R be transport rays, with upper end a k and lower end b k for k = 1; respectively. If there are interior points y k (R k ) 0 where both rays pierce the same level set of Monge's potential u(y 1 ) = u(y ), then the ray directions (3) satisfy a Lipschitz bound (33) k(y 1 ) (y )k C ky 1 y k; with constant C + = (1 + 1 )=(1 + ) depending on the norm (4) and the distance := min k=1; fky k a k k; ky k b k kg to the ends of the rays. Proof. Let z k ; x k R k denote the points at distance above and below y k on the ray, so that (34) (35) (36) (37) (38) for k = 1;. Thus (39) u(z k ) = u(y 1 ) + ; u(x k ) = u(y 1 ) ; kz k x k k = ; y k = (z k + x k )=; and (y k ) = (z k x k )= k(y 1 ) (y )k = 1 z 1 x 1 z x ; while uniform convexity of the norm (4) gives z 1 z + x x 1 (40) 1 kz 1 z k + 1 kx x 1 k z 1 z x x 1 :

14 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN Combining (34){(36), kz 1 x 1 k = u(z 1 ) u(x ) kz 1 x k; kz x k = u(z ) u(x 1 ) kz x 1 k; where the Lipschitz condition (0) controls the cross-terms. Lemma 14 therefore applies to F (x) = kxk and the four points (a; b; c; d) = (z 1 ; x 1 ; z ; x ), to yield z 1 z + x 1 x 1 + (=) 1 + z 1 + x 1 z + x (41) : Combining (39){(41) with the identity (37) gives k(y 1 ) (y )k 1 + (=) ky 1 y k ; 1 + to complete the proof. Remark 17. The proof of Lemma 16 uses only the Lipschitz property of u, and not the optimality of u in the Kantorovich Problem. Thus its conclusions hold true for any u Lip 1 (R n ; k k), if we call each segment [a; b] on which u(a) u(b) = ka bk a transport ray, and dene the direction function accordingly. 3. Measure decomposing change of variables It is in this section that we construct the change of variables on R n which is the heart of our proof. Lemma 16 suggests how these new coordinates must be dened: n 1 of the new variables are used to parametrize a given level set of the Kantorovich potential u, while the nal coordinate x n measures distance to this set along the transport rays which pierce it. Thus the eect of this change of variables will be to atten level sets of u while making transport rays parallel. But the conditions of the lemma make clear that we retain Lipschitz control only if we restrict our transformation to clusters of rays in which all rays intersect a given level set of u, and the intersections take place a uniform distance away from both endpoints of each ray. These observations motivate the construction to follow. We begin by parametrizing the level sets of u using a lemma of Federer [9, x3..9]. The key observation is that we only need this parametrization on the interiors of transport rays, where Du 6= 0 exists in view of Lemma 1. From now on, it will be convenient to x a Euclidean structure in R n. The Euclidean scalar product and associated norm are denoted by (; ) and jzj := (z; z) 1=, while B R (z) denotes the Euclidean ball of radius R centered at z R n. Of course, a function is Lipschitz in one norm if and only if it is Lipschitz in all norms, though the Lipschitz constants may dier. Lemma 18 (Bi-Lipschitz Parametrization of Level Sets). Let u : R n! R 1 be a Lipschitz function, p R 1, and S p the level set fx R n j u(x) = pg. Then the set S p \ fx R n j u is dierentiable at x and Du(x) 6= 0g has a countable covering consisting of Borel sets S i p S p, such that for each i N there exist Lipschitz coordinates U : R n! R n 1 and V : R n 1! R n satisfying (4) V (U(x)) = x for all x S i p:

CONSTRUCTING OPTIMAL MAPS 15 Proof. Note that if Du(x) 6= 0 exists, then Du(x)(R n ) = R 1. Federer [9, x3..9] asserts that fx R n j u is dierentiable at x and Du(x)(R n ) = R 1 g has a countable covering consisting of Borel sets E i such that there exist orthogonal projections i : R n! R n 1 in O (n; n 1) and Lipschitz maps (43) with Clearly the sets (44) ^U i : R n! R 1 R n 1 and ^Vi : R 1 R n 1! R n ^U i (x) = (u(x); i (x)) and ^Vi [ ^Ui (x)] = x for all x E i : cover S p. For any xed i N, dene (45) by S i p := S p \ E i U : R n! R n 1 and V : R n 1! R n U := ^Ui ; where : R 1 R n 1! R n 1 is the projection (x 1 ; X)! X; V (X) = ^V i (p; X) for all X R n 1 : Clearly U and V are Lipschitz continuous, while U(x) = i (x) for x S i p, and V (U(x)) = ^V i (p; i (x)) = ^V i [ ^U i (x)] = x establishes (4). For each rational level p Q and integer i N, we shall extend these coordinates to the transport rays intersecting Sp i. Taken together, these coordinate charts must parametrize all points T 1 R n on transport rays (cf. Denition 8). It is convenient to dene them on a countable collection of subsets called clusters of rays: Denition 19 (Ray Clusters). Fix p Q, a Kantorovich potential u, and the Borel cover fs i p g i of the level set S p := fx R n j u(x) = pg in Lemma 18. For each i; j N let the cluster T pij := S R z denote the union of all transport rays R z which intersect S i p, and for which the point of intersection z S i p is separated from both endpoints of the ray by distance greater than 1=j in k k. The same cluster, but with ray ends omitted, is denoted by T 0 pij := S z (R0 z). Lemma 0 (Clusters Cover Rays). The clusters T pij indexed by p Q and i; j N dene a countable covering of all transport rays T 1 R n. Moreover, each T pij and transport ray R satisfy: (46) Either (R) 0 T pij, or (R) 0 \ T pij = ;: Proof. A transport ray R = [a; b] has positive length by Denition 7. Along it, the Kantorovich potential u is an ane function with non-zero slope, according to Lemma 1. Thus there is some rational number p (u(a); u(b)), for which R intersects the level set S p := fx j u(x) = pg. The point x of intersection belongs to one of the covering sets S i p S p of Lemma 18, and lies a positive distance from each end of the ray, so R T pij for some j N. Conversely, if the interior of some other ray R 0 intersects one of the rays R z comprising the cluster T pij, the non-crossing property of Lemma 10 forces R = R z T pij, which completes the proof of (46).

16 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN Denition 1 (Ray Ends). Denote by E T 1 the set of endpoints of transport rays. On each ray cluster T pij we are now ready to dene the Lipschitz change of variables which inspired the title of this section: Lemma (Lipschitz Change of Variables). Each ray cluster T pij R n admits coordinates G = G pij : T 0 pij! Rn 1 R 1 with inverse F = F pij : G(T 0 ) pij! Rn satisfying: (i). F extends to a Lipschitz mapping between R n 1 R 1 and R n. (ii). For each > 0, G is Lipschitz on Tpij := fx T 0 pij j kx ak; kx bk > g, where a and b denote the endpoints of the (unique) transport ray R x. (iii). F (G(x)) = x for each x T 0 pij. (iv). If a transport ray R z T pij intersects Sp i at z, then each interior point x (R z ) 0 of the ray satises (47) G(x) = (U(z); u(x) u(z)); where U : R n! R n 1 gives the Lipschitz coordinates (4) on S i p. Remark 3 (Flattening Level Sets). The nal assertion of Lemma implies: (a) F maps the part of the hyperplane R n 1 f0g which lies within G(T 0 pij) onto S i p; (b) F maps the segment where each \vertical" line fxg R 1 intersects G(T 0 pij ) onto a transport ray. Thus in the new coordinates (X; x n ) R n 1 R 1, the level sets of u are attened: they are parametrized by the variables X = (x 1 ; : : :; x n 1 ) while x n varies along the transport rays. Proof. Lemma 10 shows that rays do not cross, while Denition 7 (or Lemma 1) shows that u is an ane function on each ray, with slope as large as permitted by the Lipschitz constraint (0). Thus every point x T 0 pij lies on a unique transport ray, and this ray intersects the level set S p in a single point z Sp, i so the expression (47) denes a map G : T 0 pij! Rn 1 R 1 throughout the cluster. It remains to construct the inverse map F on G(T 0 ) pij Rn 1 R 1. Let (X; x n ) G(T 0 pij ), and let V be the map (4) parametrizing Sp. i Then the point V (X) Sp i is an interior point of some transport ray R, both of whose endpoints are separated from V (X) by a distance exceeding 1=j. Let ( ) be the direction function (3) associated with the Kantorovich potential u, and dene (48) F (X; x n ) := V (X) + x n (V (X)): That F inverts G (assertion (iii)) now follows from (4), (47) and the fact that u is ane with maximal slope along the ray R. To prove F is Lipschitz on G(T 0 pij ) Rn 1 R 1, introduce (49) := fx R n 1 j (X; 0) G(T 0 pij)g: We rst claim the ray direction V is a Lipschitz function of X. Indeed, recalling that V (X) S i p is separated from the endpoints of R V (X) by a distance greater than 1=j, we invoke Lemma 16 to conclude that X; X 0 satisfy (50) (51) k(v (X)) (V (X 0 ))k jc 1 kv (X) V (X 0 )k jc jx X 0 j

CONSTRUCTING OPTIMAL MAPS 17 because V : R n 1! R n was Lipschitz in Lemma 18. To complete the proof that F is Lipschitz, it remains only to bound x n in (48). Since the supports X and Y of the original measures were compact, the transport rays T pij T 1 lie in a bounded set. It follows from the denition (47) of G that (X; x n ) G(T 0 pij ) is also bounded, since u and U are Lipschitz on R n. Finally, we can extend F to all of R n 1 R 1 while preserving the Lipschitz bound (51) using Kirszbraun's theorem [9, x.10.43], to conclude the proof of assertion (i). It remains to prove assertion (ii) of the lemma. Let > 0. We rst show the direction function ( ) to be Lipschitz on Tpij. Being discontinuous at the mutual end of two rays, its Lipschitz constant must depend on. Let x; x 0 Tpij lie on the transport rays R and R 0. If kx x 0 k = there is nothing to prove, since k(x) (x 0 )k 4 kx x0 k: Therefore, assume kx x 0 k < = and hence ju(x) u(x 0 )j kx x 0 k < =. The point y 0 := x 0 + [u(x) u(x 0 )](x 0 ) then lies on the ray R 0, since the ends of R 0 are at least distance from x 0. Moreover, u(y 0 ) = u(x 0 ) + [u(x) u(x 0 )] = u(x), and the distances from x and y 0 to the ends of R and R 0 are at least = respectively. Invoking Lemma 16 again yields: (5) k(x) (x 0 )k = k(x) (y 0 )k C kx y0 k: Moreover, x 0 ; y 0 R 0 lie on the same transport ray, and u(x) = u(y 0 ), so (53) kx 0 y 0 k = ju(x 0 ) u(y 0 )j = ju(x 0 ) u(x)j kx 0 xk combines with the triangle inequality kx y 0 k kx x 0 k + kx 0 y 0 k to produce the desired bound for ( ) on T pij: (54) k(x) (x 0 )k 4C kx x0 k: (55) Turning to G, we estimate jg(x) G(x 0 )j jg(x) G(y 0 )j + jg(y 0 ) G(x 0 )j: Since x 0 and y 0 lie on R 0, denition (47) yields (56) jg(y 0 ) G(x 0 )j = ju(x 0 ) u(y 0 )j = ju(x 0 ) u(x)j kx x 0 k: Let z and z 0 be the points where R and R 0 pierce S i p. Since u(x) u(z) = u(y0 ) u(z 0 ), the same denition gives (57) jg(x) G(y 0 )j = ju(z) U(z 0 )j: Setting := u(z) u(x), we have z = x + (x) and z 0 = y 0 + (x 0 ). Also jj is bounded by the diameter of the cluster T pij. Because the coordinates U were Lipschitz, we have (58) (59) ju(z) U(z 0 )j C 3 kx y 0 k + C 3 k(x) (x 0 )k C 3 ( + 4C 1 )kx x 0 k from (53){(54). Now (55){(59) imply G is Lipschitz on Tpij, to complete the lemma.

18 L. A. CAFFARELLI, M. FELDMAN, AND R. J. MCCANN The next step is to address measurability of the sets T pij and G(Tpij). 0 As in Evans and Gangbo [7], this is done with the help of the distance functions to the upper and lower ends of rays: Lemma 4 (Semicontinuity of Distance to Ray Ends). At each z R n dene (60) (61) (z) := supfkz yk j y Y; u(z) u(y) = kz ykg; (z) := supfkz xk j x X ; u(x) u(z) = kz xkg; where sup ; := 1. Then ; : R n! R [ f 1g are both upper semicontinuous. Proof. We prove only the upper semicontinuity of (z); the proof for (z) is similar. Given any sequence of points z n! z for which 0 := lim n (z n ) exists, we need only show 0 (z). It costs no loss of generality to assume 0 > 1 and (z n ) > 1; moreover, (z n ) < 1 since the support Y of the measure was assumed compact. From (60), (6) (z n ) 1=n kz n y n k = u(z n ) u(y n ) for some sequence y n Y. By compactness of Y, a convergent subsequence y n! y Y exists. The (Lipschitz) continuity of u yields 0 kz yk = u(z) u(y) (z) from the limit of (6), which proves the lemma. Geometrically, the functions, have the following meaning: If z lies on a transport ray R, then (z) and (z) are the distances (in k k) from z to the lower and upper end of R respectively; thus at ray ends z E, exactly one of these distances vanishes. If z T 0 is a ray of zero length, then (z) = (z) = 0. If z R n n (T 0 [ T 1 ), then either (z) = 1 or (z) = 1. We combine these functions with our change of variables to show the clusters of ray interiors T 0 pij are Borel sets, and to give a much simpler proof than Evans and Gangbo that the ray ends have measure zero [7, Proposition 5.1]. In what follows, n-dimensional Lebesgue measure is denoted L n. Lemma 5 (Measurability of Clusters / Negligibility of Ray Ends). The ray ends E T 1 form a Borel set of measure zero: L n (E) = 0. The rays of length zero T 0 R n also form a Borel set. Finally, for each p Q and i; j N, the cluster T 0 pij of ray interiors and its attened image G(T 0 pij ) from Lemma are Borel. Proof. First observe that T 0 = fz R n j (z) = (z) = 0g while E = fz R n j (z)(z) = 0 but (z) + (z) > 0g. Both of these sets are Borel by the upper semicontinuity of and shown in Lemma 4. Therefore, x p Q and i; j N and recall the Borel set Sp i R n and Lipschitz coordinates U : R n! R n 1 on it from Lemma 18. Since U is univalent (i.e., one-to-one) on Sp, i it follows from Federer [9, x..10, p. 67] that U(Sp) i is a Borel subset of R n 1. Moreover, the set dened in (49) is given by = fx U(S i p) j (U 1 (X)); (U 1 (X)) > 1=jg according to (47), which with Denition 19 also yields the image G(T 0 pij) = f(x; x n ) j X ; (V (X)) < x n < (V (X))g of the ray cluster in attened coordinates. Here V = U 1 is Lipschitz, so V, V are upper semicontinuous in view of Lemma 4. Thus we conclude that

CONSTRUCTING OPTIMAL MAPS 19 both R n 1 and G(Tpij) 0 R n 1 R 1 are Borel. Lemma (iii) shows that the transformation F = G 1 back to the original coordinates is well dened and univalent on G(T 0 pij ). Since F extends to a Lipschitz function throughout Rn and T 0 = pij F (G(T 0 )), we conclude, using Federer [9, x..10] again, that pij T 0 pij is Borel. To show the ray ends have measure zero, consider the corresponding points G R n 1 R 1 of T pij in the attened coordinate system: G = f(x; (V (X))) j X g [ f(x; (V (X))) j X g: Using upper semicontinuity of V and V we conclude G is a Borel set, and L n (G) = 0 by Fubini's theorem. Now E \ T pij = F (G). Since F : R n 1 R 1! R n is a Lipschitz map, we can use L n (G) = 0 and the Area formula [9, x3..3] to conclude that L n (E \ T pij ) = 0 (and hence is Lebesgue measurable). By Lemma 0, the clusters ft pij g form a countable cover for E T 1, so L n (E) = 0 to conclude the proof. As a particular consequence of this lemma, the set T 1 of all transport rays is Borel, being a countable union of Borel sets T 0 pij with E. Also, the sets T pij are Lebesgue measurable, being the union of a Borel set with a subset of a negligible set. Finally, we can take the clusters T pij of rays to be disjoint. Indeed, enumerate the triples (p; i; j) so the collection of clusters ft pij g becomes ft (k) g, k = 1; ; : : :. For k > 1 redene T (k)! T (k) n ( S k 1 l=1 T (l)). Redene T(k) 0! T (k) 0 n (S k 1 l=1 T (l) 0 ) analogously. We will continue to denote the modied sets by T pij and Tpij. 0 Note that the structure of the clusters T pij remains the same: for each T pij we have a Borel subset S pij := T pij \ S p of Sp i R n on which there are Lipschitz coordinates U, V (4) satisfying (63) V (U(x)) = x for all x S pij : Indeed, since the new cluster is a subset of the old, the former maps U, V will suce. From the modication procedure it also follows that the ray property (46) holds for the modied sets which justies calling them clusters and that the ray R z corresponding to each z S pij extends far enough on both sides of S p (i.e., (z); (z) > 1=j) to dene coordinates F, G on T pij satisfying all assertions of Lemma (again, the original maps F and G work for the modied clusters). The measurability Lemma 5 holds for the new clusters, as follows readily from the modication procedure. Thus from now on we assume: (64) The clusters of ray interiors T 0 pij 4. Mass balance on rays are disjoint. Since we intend to solve Monge's problem by constructing a map which moves mass along rays, it is essential to know that f + and f assign the same amount of mass to each transport ray. In the Euclidean case this is the content of Evans and Gangbo [7, Lemma 5.1]. Here it remains true, but our proof exploits the fact that the optimal maps s " (x) between f + onto f for the cost c " (x; y) = kx yk 1+" in Theorem (iv) accumulate onto transport rays of the limiting Kantorovich potential. While individual rays all have mass zero, one can consider arbitrary collections of transport rays instead. The ray ends, having measure zero, are neglected. Thus: