A variational proof of partial regularity for optimal transportation maps between sets

M. Goldman, F. Otto

October 24, 2017

Abstract

We provide a new proof of the known partial regularity result for the optimal transportation map (Brenier map) between two sets. Contrary to the existing regularity theory for the Monge-Ampère equation, which is based on the maximum principle, our approach is purely variational. By constructing a competitor on the level of the Eulerian (Benamou-Brenier) formulation, we show that locally, the velocity is close to the gradient of a harmonic function provided the transportation cost is small. We then translate back to the Lagrangian description and perform a Campanato iteration to obtain an $\varepsilon$-regularity result.

1 Introduction

Let $E$ and $F$ be two bounded open subsets of $\mathbb{R}^d$ of equal (Lebesgue) measure and let $T$ be the solution of the optimal transportation problem
\[ \min_{T\#\chi_E=\chi_F} \int_E |T(x)-x|^2\,dx, \tag{1.1} \]
where with a slight abuse of notation $T\#\chi_E$ denotes the push-forward by $T$ of the measure $\chi_E\,dx$ (existence and characterization of $T$ as the gradient of a convex function $\psi$ are given by Brenier's Theorem, see [18, Th. 2.12]). Our main result is a partial regularity theorem for $T$:

Theorem 1.1. There exist open sets $E'\subset E$ and $F'\subset F$ of full measure such that $T$ is a $C^\infty$-diffeomorphism between $E'$ and $F'$.

This theorem is a consequence of the Alexandrov Theorem [19, Th. 14.25] and the following $\varepsilon$-regularity theorem (plus a bootstrap argument):

Theorem 1.2. Let $T$ be the minimizer of (1.1) and let $R>0$ be such that $B_{2R}\subset E\cap F$. For every $\alpha\in(0,1)$, there exists $\varepsilon(\alpha)$ such that if
\[ \frac{1}{(2R)^{d+2}}\int_{B_{2R}} |T-x|^2\,dx + \frac{1}{(2R)^{d+2}}\int_{B_{2R}} |T^{-1}-x|^2\,dx \le \varepsilon(\alpha), \]

then $T$ is $C^{1,\alpha}$ inside $B_R$.

Theorem 1.1 was already obtained by Figalli and Kim [12] (see also [11] for a far-reaching generalization), but our proof departs from the usual scheme for proving regularity for the Monge-Ampère equation. Indeed, while most proofs use some variants of the maximum principle, our proof is variational. The classical approach operates on the level of the convex potential $\psi$, and the main difficulty is to prove that a Brenier solution to the Monge-Ampère equation is actually a strictly convex Alexandrov solution, at which point Caffarelli's regularity theory [7] applies (see in particular [12], or [11] where an $\varepsilon$-regularity theorem based on a Campanato iteration is proven). Maximum principle arguments are also underlying the global regularity results of Caffarelli [5, 8, 6]. On the contrary, we work directly at the level of the optimal transportation map $T$, and besides the $L^\infty$ bound (4.5) given by McCann's displacement convexity, we only use variational arguments.

The main idea behind the proof is the well-known fact that the linearization of the Monge-Ampère equation gives rise to the Laplace equation [18, Sec. 7.6]. We prove that if the energy in a given ball is small enough, then in the half-sized ball, $T$ is close to the gradient of a harmonic function (see Proposition 4.4). This result is actually established at the Eulerian level (i.e. for the solutions of the Benamou-Brenier formulation of optimal transportation, see [18, Th. 8.1] or [3, Chap. 8]), see Proposition 4.1. It is for this result that we need the outcome of McCann's displacement convexity, cf. (4.5), since it is required for the quasi-orthogonality property (4.12). Our argument is variational and proceeds by defining a competitor based on the solution of the Laplace equation with suitable flux boundary conditions, and a boundary-layer construction. The boundary-layer construction is carried out in Lemma 3.3; by a duality argument it reduces to the trace estimate (3.6).
This part of the proof is reminiscent of arguments from [1]. Once we have the harmonic approximation result, using that harmonic functions are close to their second-order Taylor expansion, we establish improvement of flatness by tilting, see Proposition 4.5. This means that if the energy in a given ball is small then, up to a change of coordinates, the energy has a geometric decay on a smaller scale. The last step is to perform a Campanato iteration of this one-step improvement. This is done in Proposition 4.6, where we use our last fundamental ingredient, namely the invariance of the variational problem under affine transformations. This entire approach to $\varepsilon$-regularity is guided by De Giorgi's strategy for minimal surfaces (see [14] for instance). Let us notice that because of the natural scaling of the problem, our Campanato iteration operates directly at the $C^{1,\alpha}$-level for $T$, as opposed to [12, 11], where $C^{0,\alpha}$-regularity is obtained first.

The plan of the paper is the following. In Section 2 we gather some notation that we will use throughout the paper. Then, in Section 3, we recall some well-known facts about harmonic functions and then prove estimate (3.10), the proof of which is based on the trace estimate (3.6). In the final section, we prove Theorem 1.2 and then Theorem 1.1. We will soon put a new version of this preprint containing the extension of our proof to

the case of non-constant densities. We are currently working on the extension of the proof to more general cost functions.

2 Notation

In the paper we will use the following notation. The symbols $\sim$, $\gtrsim$, $\lesssim$ indicate estimates that hold up to a global constant $C$, which typically only depends on the dimension $d$ and the Hölder exponent $\alpha$ (if applicable). For instance, $f\lesssim g$ means that there exists such a constant with $f\le Cg$, and $f\sim g$ means $f\lesssim g$ and $g\lesssim f$. An assumption of the form $f\ll 1$ means that there exists $\varepsilon>0$, typically only depending on dimension and the Hölder exponent, such that if $f\le\varepsilon$, then the conclusion holds. We write $|E|$ for the Lebesgue measure of a set $E$. Inclusions will always be understood as holding up to a set of Lebesgue measure zero, that is for two sets $E$ and $F$, $E\subset F$ means that $|E\setminus F|=0$. When no confusion can be made, we will drop the integration measures in the integrals. For $R>0$ and $x_0\in\mathbb{R}^d$, $B_R(x_0)$ denotes the ball of radius $R$ centered in $x_0$. When $x_0=0$, we will simply write $B_R$ for $B_R(0)$. We will also use the notation
\[ ⨍_{B_R} f := \frac{1}{|B_R|}\int_{B_R} f. \]
For a function $\rho$ defined on a set $B$ we introduce the Hölder semi-norm of exponent $\alpha\in(0,1)$
\[ [\rho]_{\alpha,B} := \sup_{x\neq y\in B} \frac{|\rho(x)-\rho(y)|}{|x-y|^\alpha}. \]

3 Preliminaries

In this section, we first recall some well-known estimates for harmonic functions.

Lemma 3.1. Given $f\in L^2(\partial B_1)$ we consider a solution $\phi$ of
\[ \begin{cases} \Delta\phi = 0 & \text{in } B_1,\\ \frac{\partial\phi}{\partial\nu} = f & \text{on } \partial B_1, \end{cases} \tag{3.1} \]
where $\nu$ denotes the outer normal to $B_1$. We have
\[ \int_{B_1} |\nabla\phi|^2 \lesssim \int_{\partial B_1} f^2, \tag{3.2} \]
\[ \sup_{B_{1/2}} \left( |\nabla^3\phi|^2 + |\nabla^2\phi|^2 + |\nabla\phi|^2 \right) \lesssim \int_{B_1} |\nabla\phi|^2, \tag{3.3} \]
and for every $r\le 1$, letting $A_r := B_1\setminus B_{1-r}$,
\[ \int_{A_r} |\nabla\phi|^2 \lesssim r\int_{\partial B_1} f^2. \tag{3.4} \]
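As a sanity check of (3.2) and (3.4), one can look at the explicit harmonic functions $\phi = \rho^n\cos(n\theta)$ on the unit disk ($d=2$), for which the Neumann datum is $f = n\cos(n\theta)$ and all integrals are available in closed form. The following is only an illustrative sketch: the function names are ours, the restriction to $d=2$ and to this one-parameter family is an assumption made for the example, and the bounds $1$ and $2$ observed below are specific to this family rather than the constants of Lemma 3.1.

```python
import math

def closed_form_ratios(n, r):
    """Ratios LHS/RHS of (3.2) and (3.4) for phi = rho**n * cos(n*theta), d = 2."""
    boundary = math.pi * n ** 2                            # integral of f^2 over the unit circle
    bulk = math.pi * n                                     # Dirichlet energy of phi in B_1
    annulus = math.pi * n * (1.0 - (1.0 - r) ** (2 * n))   # energy in the annulus of width r
    return bulk / boundary, annulus / (r * boundary)

def annulus_energy_quadrature(n, r, cells=4000):
    """Midpoint rule for the Dirichlet energy of phi in B_1 minus B_{1-r}."""
    h = r / cells
    total = 0.0
    for i in range(cells):
        rho = 1.0 - r + (i + 0.5) * h
        # |grad phi|^2 = n^2 rho^(2n-2); the angular integral contributes 2*pi
        total += 2.0 * math.pi * n ** 2 * rho ** (2 * n - 1) * h
    return total

if __name__ == "__main__":
    for n in (1, 2, 5, 20):
        for r in (0.01, 0.1, 0.5, 1.0):
            print(n, r, closed_form_ratios(n, r))
```

For this family the first ratio equals $1/n$ and the second is at most $2$ by Bernoulli's inequality, so the energy in the boundary layer indeed scales linearly in its width $r$, uniformly in the frequency $n$.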

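Proof. We start with (3.2). Changing $\phi$ by an additive constant, we may assume that $\int_{B_1}\phi = 0$. Testing (3.1) with $\phi$, we obtain
\[ \int_{B_1} |\nabla\phi|^2 = \int_{\partial B_1} f\phi \le \left(\int_{\partial B_1} f^2\right)^{1/2}\left(\int_{\partial B_1}\phi^2\right)^{1/2} \lesssim \left(\int_{\partial B_1} f^2\right)^{1/2}\left(\int_{B_1}|\nabla\phi|^2\right)^{1/2}, \]
where we used the trace estimate in conjunction with Poincaré's estimate for mean-value zero. This yields (3.2). Estimate (3.3) follows from the mean-value property of harmonic functions applied to $\nabla\phi$ and its derivatives. We finally turn to (3.4). By sub-harmonicity of $|\nabla\phi|^2$ (which can for instance be inferred from the Bochner formula), we have the mean-value property in the form
\[ \int_{\partial B_r} |\nabla\phi|^2 \le \int_{\partial B_1} |\nabla\phi|^2 \quad\text{for } r\le 1. \]
Integrating this inequality between $1-r$ and $1$, using the Pohozaev identity, that is,
\[ (d-2)\int_{B_1} |\nabla\phi|^2 = \int_{\partial B_1} |\nabla_\tau\phi|^2 - \left(\frac{\partial\phi}{\partial\nu}\right)^2, \tag{3.5} \]
and (3.2), we obtain (3.4).

We will also need a trace estimate in the spirit of [1, Lem. 3.2].

Lemma 3.2. For $r\le 1$, letting $A_r := B_1\setminus B_{1-r}$, it holds for every function $\psi$,
\[ \left(\int_{\partial B_1}\int_0^1 (\psi-\bar\psi)^2\right)^{1/2} \lesssim r^{1/2}\left(\int_{A_r}\int_0^1 |\nabla\psi|^2\right)^{1/2} + \frac{1}{r^{(d+1)/2}}\int_{A_r}\int_0^1 |\partial_t\psi|, \tag{3.6} \]
where $\bar\psi(x) := \int_0^1 \psi(t,x)\,dt$.

Proof. By a standard density argument, we may assume $\psi\in C^1(\overline{A_r}\times[0,1])$. Because of $\int_0^1 |\nabla(\psi-\bar\psi)|^2 \le \int_0^1 |\nabla\psi|^2$, we may rewrite (3.6) in terms of $v := \psi-\bar\psi$ as
\[ \left(\int_{\partial B_1}\int_0^1 v^2\right)^{1/2} \lesssim r^{1/2}\left(\int_{A_r}\int_0^1 |\nabla v|^2\right)^{1/2} + \frac{1}{r^{(d+1)/2}}\int_{A_r}\int_0^1 |\partial_t v|. \]
Since for every $x\in B_1$, $\int_0^1 v = 0$, we have $\left(\int_0^1 v^2\right)^{1/2} \le \int_0^1 |\partial_t v|$, so that it is enough to prove
\[ \left(\int_{\partial B_1}\int_0^1 v^2\right)^{1/2} \lesssim r^{1/2}\left(\int_{A_r}\int_0^1 |\nabla v|^2\right)^{1/2} + \frac{1}{r^{(d+1)/2}}\int_{A_r}\left(\int_0^1 v^2\right)^{1/2}. \]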

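Introducing $V := \left(\int_0^1 v^2\right)^{1/2}$ and noting that $|\nabla V|^2 \le \int_0^1 |\nabla v|^2$, we see that it is sufficient to establish
\[ \left(\int_{\partial B_1} V^2\right)^{1/2} \lesssim r^{1/2}\left(\int_{A_r} |\nabla V|^2\right)^{1/2} + \frac{1}{r^{(d+1)/2}}\int_{A_r} V. \tag{3.7} \]
We now cover the sphere $\partial B_1$ by (geodesic) cubes $Q$ of side-length $\sim r$ in such a way that there is only a locally finite overlap. Then the annulus $A_r$ is covered by the corresponding conical sets $Q_r$. By summation over $Q$ and the super-additivity of the square function, for (3.7) it is enough to prove for every $Q$
\[ \left(\int_Q V^2\right)^{1/2} \lesssim r^{1/2}\left(\int_{Q_r} |\nabla V|^2\right)^{1/2} + \frac{1}{r^{(d+1)/2}}\int_{Q_r} V. \]
Since $Q_r$ is the bi-Lipschitz image of the Euclidean cube $(0,r)^d$, it is enough to establish
\[ \int_{\{0\}\times(0,r)^{d-1}} V^2 \lesssim r\int_{(0,r)^d} |\nabla V|^2 + \frac{1}{r^{d+1}}\left(\int_{(0,r)^d} V\right)^2. \tag{3.8} \]
By rescaling, for (3.8) it is sufficient to consider $r=1$. By a one-dimensional trace inequality we have for every $x'\in(0,1)^{d-1}$
\[ V(0,x') \lesssim \int_0^1 |\partial_1 V(x_1,x')|\,dx_1 + \int_0^1 V(x_1,x')\,dx_1. \]
Taking squares, integrating and using Jensen's inequality, we get
\[ \int_{\{0\}\times(0,1)^{d-1}} V^2 \lesssim \int_{(0,1)^d} |\nabla V|^2 + \int_{(0,1)^d} V^2. \]
Using the Poincaré inequality in the form $\int_{(0,1)^d} V^2 \lesssim \int_{(0,1)^d} |\nabla V|^2 + \left(\int_{(0,1)^d} V\right)^2$, we obtain (3.8).

This trace estimate is used in a similar spirit as in [1, Lem. 3.3] to obtain

Lemma 3.3. Let $f\in L^2(\partial B_1\times(0,1))$ be such that for a.e. $x\in\partial B_1$, $\int_0^1 f(x,t)\,dt = 0$. For $r>0$ we introduce $A_r := B_1\setminus B_{1-r}$ and define $\Lambda$ as the set of pairs $(s,q)$ with $|s|\le 1/2$ and such that for $\psi\in C^1(\overline{B}_1\times[0,1])$ (see footnote i),
\[ \int_{A_r}\int_0^1 s\,\partial_t\psi + q\cdot\nabla\psi = \int_{\partial B_1}\int_0^1 f\psi. \tag{3.9} \]
Provided $r \gg \left(\int_{\partial B_1}\int_0^1 f^2\right)^{1/(d+1)}$ we then have
\[ \inf_{(s,q)\in\Lambda} \int_{A_r}\int_0^1 \frac12 |q|^2 \lesssim r\int_{\partial B_1}\int_0^1 f^2. \tag{3.10} \]

Footnote i. For $(s,q)$ regular, (3.9) just means $\partial_t s + \operatorname{div} q = 0$ in $A_r$, $s(\cdot,0) = s(\cdot,1) = 0$, $q\cdot\nu = 0$ on $\partial B_{1-r}\times(0,1)$ and $q\cdot\nu = f$ on $\partial B_1\times(0,1)$.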

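Proof. We first note that the class $\Lambda$ is not empty: For $t\in(0,1)$, let $u_t$ be defined as the (mean-free) solution of the Neumann problem
\[ \begin{cases} \Delta u_t = \frac{1}{|A_r|}\int_{\partial B_1} f & \text{in } A_r\times(0,1),\\ \frac{\partial u_t}{\partial\nu} = f & \text{on } \partial B_1\times(0,1),\\ \frac{\partial u_t}{\partial\nu} = 0 & \text{on } \partial B_{1-r}\times(0,1), \end{cases} \]
and set $q(x,t) := \nabla u_t(x)$. The definition
\[ s(x,t) := -\int_0^t \operatorname{div} q(x,z)\,dz = -\frac{1}{|A_r|}\int_0^t\int_{\partial B_1} f \]
ensures that (3.9) is satisfied, and $r \gg \left(\int_{\partial B_1}\int_0^1 f^2\right)^{1/2}$ then yields $|s|\le 1/2$.

As in [1, Lem. 3.3], we now prove (3.10) with help of duality:
\[ \inf_{(s,q)\in\Lambda} \int_{A_r}\int_0^1 \frac12|q|^2 = \inf_{(s,q),\,|s|\le 1/2}\,\sup_\psi \left\{ \int_{A_r}\int_0^1 \frac12|q|^2 - \int_{A_r}\int_0^1 \left(s\,\partial_t\psi + q\cdot\nabla\psi\right) + \int_{\partial B_1}\int_0^1 f\psi \right\} \]
\[ = \sup_\psi\,\inf_{(s,q),\,|s|\le 1/2} \left\{ \int_{A_r}\int_0^1 \frac12|q|^2 - \int_{A_r}\int_0^1 \left(s\,\partial_t\psi + q\cdot\nabla\psi\right) + \int_{\partial B_1}\int_0^1 f\psi \right\}, \]
where the swapping of the sup and inf is allowed since the functional is convex in $(s,q)$ and linear in $\psi$ (see for instance [4, Prop. 1.1]). Minimizing in $(s,q)$, and using $\int_0^1 f = 0$ which allows us to smuggle in $\bar\psi := \int_0^1\psi$, we obtain
\[ \inf_{(s,q)\in\Lambda} \int_{A_r}\int_0^1 \frac12|q|^2 = \sup_\psi \left\{ -\int_{A_r}\int_0^1 \frac12\left(|\nabla\psi|^2 + |\partial_t\psi|\right) + \int_{\partial B_1}\int_0^1 f\psi \right\} \]
\[ = \sup_\psi \left\{ -\int_{A_r}\int_0^1 \frac12\left(|\nabla\psi|^2 + |\partial_t\psi|\right) + \int_{\partial B_1}\int_0^1 f(\psi-\bar\psi) \right\} \]
\[ \le \sup_\psi \left\{ -\int_{A_r}\int_0^1 \frac12\left(|\nabla\psi|^2 + |\partial_t\psi|\right) + \left(\int_{\partial B_1}\int_0^1 f^2\right)^{1/2}\left(\int_{\partial B_1}\int_0^1 (\psi-\bar\psi)^2\right)^{1/2} \right\}. \]
With the abbreviation $F := \left(\int_{\partial B_1}\int_0^1 f^2\right)^{1/2}$ we have just established the inequality
\[ \inf_{(s,q)\in\Lambda} \int_{A_r}\int_0^1 \frac12|q|^2 \le \sup_\psi \left\{ F\left(\int_{\partial B_1}\int_0^1 (\psi-\bar\psi)^2\right)^{1/2} - \int_{A_r}\int_0^1 \frac12\left(|\nabla\psi|^2 + |\partial_t\psi|\right) \right\}. \]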

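Using now (3.6), where we denote the constant by $C$, and Young's inequality, we find that provided $r$ is so large that $CF\,r^{-(d+1)/2}\le\frac12$ (in line with our assumption $r \gg \left(\int_{\partial B_1}\int_0^1 f^2\right)^{1/(d+1)}$),
\[ \inf_{(s,q)\in\Lambda} \int_{A_r}\int_0^1 \frac12|q|^2 \le \sup_\psi \left\{ \frac{C^2F^2}{2}\,r + \frac{CF}{r^{(d+1)/2}}\int_{A_r}\int_0^1 |\partial_t\psi| - \frac12\int_{A_r}\int_0^1 |\partial_t\psi| \right\} \lesssim F^2 r = r\int_{\partial B_1}\int_0^1 f^2. \]
This concludes the proof of (3.10).

4 Proofs of the main results

Let $E$ and $F$ be two bounded open subsets of $\mathbb{R}^d$ with $|E| = |F|$ and let $T$ be the minimizer of
\[ \min_{T\#\chi_E=\chi_F} \int_E |T(x)-x|^2\,dx, \tag{4.1} \]
where by a slight abuse of notation $T\#\chi_E$ denotes the push-forward by $T$ of the measure $\chi_E\,dx$. If $T'$ is the optimal transportation map between $\chi_F$ and $\chi_E$, then (see for instance [3, Rem. 6.2.11])
\[ T'(T(x)) = x \quad\text{and}\quad T(T'(y)) = y \quad\text{for a.e. } (x,y)\in E\times F. \tag{4.2} \]
By another abuse of notation, we will denote $T^{-1} := T'$. Now for $t\in[0,1]$ and $x\in\mathbb{R}^d$ we set $T_t(x) := tT(x) + (1-t)x$ and consider the non-negative and $\mathbb{R}^d$-valued measures defined through
\[ \rho(\cdot,t) := T_t\#\chi_E \quad\text{and}\quad j(\cdot,t) := T_t\#[(T-\mathrm{Id})\chi_E]. \tag{4.3} \]
It is easy to check that $j(\cdot,t)$ is absolutely continuous with respect to $\rho(\cdot,t)$. The couple $(\rho,j)$ solves the Eulerian (or Benamou-Brenier) formulation of optimal transportation (see [18, Th. 8.1] or [3, Chap. 8], see also [17, Prop. 5.32] for the uniqueness), i.e. it is the minimizer of
\[ \min\left\{ \int_{\mathbb{R}^d}\int_0^1 \frac{1}{\rho}|j|^2 \;:\; \partial_t\rho + \operatorname{div} j = 0,\ \rho(\cdot,0) = \chi_E,\ \rho(\cdot,1) = \chi_F \right\}, \tag{4.4} \]
where the continuity equation including its boundary conditions are imposed in a distributional sense and where the functional is defined through (see [2, Th. 2.34])
\[ \int_{\mathbb{R}^d}\int_0^1 \frac{1}{\rho}|j|^2 := \begin{cases} \int_{\mathbb{R}^d}\int_0^1 \left|\frac{dj}{d\rho}\right|^2 d\rho & \text{if } j\ll\rho,\\ +\infty & \text{otherwise.} \end{cases} \]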
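To make the objects in (4.1) and (4.3) concrete, here is a minimal sketch of a discrete, one-dimensional analogue: equal-weight point clouds instead of the sets $E$ and $F$, for which the optimal map for the quadratic cost is the monotone (sorted) matching. This is an illustration under these simplifying assumptions, not the continuum problem studied in the paper; the helper names `monotone_map`, `cost` and `interpolate` are hypothetical.

```python
import itertools

def monotone_map(xs, ys):
    """Match the k-th smallest source to the k-th smallest target (1D, quadratic cost)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ys_sorted = sorted(ys)
    images = [0.0] * len(xs)
    for rank, i in enumerate(order):
        images[i] = ys_sorted[rank]
    return images

def cost(xs, images):
    """Discrete version of the transport cost: average of |T(x) - x|^2."""
    return sum((y - x) ** 2 for x, y in zip(xs, images)) / len(xs)

def interpolate(xs, images, t):
    """Displacement interpolation T_t(x) = t T(x) + (1 - t) x."""
    return [t * y + (1.0 - t) * x for x, y in zip(xs, images)]

def brute_force_cost(xs, ys):
    """Minimal cost over all assignments; only feasible for a handful of points."""
    return min(cost(xs, perm) for perm in itertools.permutations(ys))

if __name__ == "__main__":
    xs = [0.1, 0.9, 0.5, 0.3, 0.7]
    ys = [1.2, 1.0, 2.0, 1.5, 1.1]
    T = monotone_map(xs, ys)
    print(cost(xs, T), brute_force_cost(xs, ys))
    print(interpolate(xs, T, 0.5))
```

Each particle moves along a straight line with constant velocity $T(x)-x$, so the kinetic energy integrated over $t\in(0,1)$ coincides with the transport cost; this is the discrete counterpart of the equivalence between (4.1) and (4.4). Since the matching is monotone, the interpolated points never cross.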

Since $T$ is the gradient of a convex function, by the Alexandrov Theorem [19, Th. 14.25], $T$ is differentiable a.e., that is for a.e. $x_0$, there exists a symmetric matrix $A$ such that
\[ T(x) = T(x_0) + A(x-x_0) + o(|x-x_0|). \]
Moreover, $A$ coincides a.e. with the absolutely continuous part of the distributional derivative $DT$ of the map $T$. We will from now on denote $\nabla T(x_0) := A$. For $t\in[0,1]$, by [18, Prop. 5.9], $\rho(\cdot,t)$ (and thus also $j$) is absolutely continuous with respect to the Lebesgue measure and the Jacobian equation
\[ \rho(t,T_t(x))\,\det\nabla T_t(x) = \chi_E(x) \]
holds a.e. (see [19, Ex. 11.2] or [18, Th. 4.8]). In particular, $\det\nabla T(x) = \chi_E(x)$. By concavity of $\det(\cdot)^{1/d}$ on non-negative symmetric matrices, we get that
\[ \rho\le 1, \tag{4.5} \]
which is a special instance of McCann's displacement convexity (see [15, Cor. 4.4]). The functional can therefore be rewritten as
\[ \int_{\mathbb{R}^d}\int_0^1 \frac{1}{\rho}|j|^2 = \int_{\mathbb{R}^d}\int_0^1 \frac{1}{\rho}|j|^2(x,t)\,dx\,dt, \]
where
\[ \frac{1}{\rho}|j|^2(x,t) := \begin{cases} \frac{1}{\rho(x,t)}|j(x,t)|^2 & \text{if } \rho(x,t)\neq 0,\\ 0 & \text{otherwise.} \end{cases} \]

We first prove that the deviation of the velocity field $v := \frac{dj}{d\rho}$ from being the gradient of a harmonic function is locally controlled by the energy. The construction we use is somewhat reminiscent of the Dacorogna-Moser construction (see [17]).

Proposition 4.1. Let $(\rho,j)$ be the minimizer of (4.4) and assume that $B_1\subset E\cap F$. Then, there exists $\phi$ harmonic in $B_{1/2}$ with
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j-\rho\nabla\phi|^2 \lesssim \left(\int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2\right)^{\frac{d+2}{d+1}} \tag{4.6} \]
and
\[ \int_{B_{1/2}} |\nabla\phi|^2 \lesssim \int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2. \tag{4.7} \]

Proof. Step 1 [Definition of $\phi$] Using that $\rho\le 1$, cf. (4.5), and Fubini, we can find a radius $R\in(1/2,1)$ such that
\[ \int_{\partial B_R}\int_0^1 |j|^2 \le \int_{\partial B_R}\int_0^1 \frac{1}{\rho}|j|^2 \lesssim \int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2 \tag{4.8} \]

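with the understanding that $R$ is a Lebesgue point of $r\mapsto j\in L^2(\partial B_r\times(0,1))$ with respect to the weak topology. Based on the latter, we claim that for every function $\zeta\in H^1(B_R\times(0,1))$ (see footnote ii),
\[ \int_{B_R}\int_0^1 \rho\,\partial_t\zeta + j\cdot\nabla\zeta = \int_{\partial B_R}\int_0^1 \zeta f + \int_{B_R} \zeta(\cdot,1) - \zeta(\cdot,0), \tag{4.9} \]
where $f := j\cdot\nu$ denotes the normal component of $j$. To this purpose, for $0<\varepsilon\ll 1$ we introduce the cut-off function
\[ \eta_\varepsilon(x) := \begin{cases} 1 & \text{if } |x|\le R-\varepsilon,\\ \frac{R-|x|}{\varepsilon} & \text{if } R-\varepsilon\le|x|\le R,\\ 0 & \text{otherwise,} \end{cases} \]
and obtain by admissibility of $(\rho,j)$
\[ \int_{\mathbb{R}^d} \eta_\varepsilon\left(\zeta(\cdot,1)-\zeta(\cdot,0)\right) = \int_{\mathbb{R}^d}\int_0^1 \partial_t(\zeta\eta_\varepsilon)\rho + \nabla(\zeta\eta_\varepsilon)\cdot j = \int_{\mathbb{R}^d}\int_0^1 \eta_\varepsilon\,\partial_t\zeta\,\rho + \eta_\varepsilon\nabla\zeta\cdot j - \frac{1}{\varepsilon}\int_{B_R\setminus B_{R-\varepsilon}}\int_0^1 \zeta\, j\cdot\nu. \]
Letting $\varepsilon$ go to zero and using the above Lebesgue-point property of $R$, we obtain (4.9). We now may define $\phi$ as a solution of
\[ \begin{cases} \Delta\phi = 0 & \text{in } B_R,\\ \frac{\partial\phi}{\partial\nu} = \bar f & \text{on } \partial B_R, \end{cases} \tag{4.10} \]
where $\bar f(x) := \int_0^1 f(x,t)\,dt$. We will show that
\[ \int_{B_R}\int_0^1 \frac{1}{\rho}|j-\rho\nabla\phi|^2 \lesssim \left(\int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2\right)^{\frac{d+2}{d+1}}. \tag{4.11} \]
Once (4.11) is established, applying (3.2) from Lemma 3.1 (with the radius $1$ replaced by $R\sim 1$), we have by (4.8),
\[ \int_{B_{1/2}} |\nabla\phi|^2 \le \int_{B_R} |\nabla\phi|^2 \lesssim \int_{\partial B_R} \bar f^2 \overset{(4.8)}{\lesssim} \int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2, \]
which concludes the proof. In order to keep notation light, we will assume that $R = 1/2$.

Footnote ii. We consider here a larger class of test functions than $C^1(\overline{B}_R\times[0,1])$ since we want to apply (4.9) to the harmonic function $\phi$ defined in (4.10).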

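Step 2 [Quasi-orthogonality] Here we prove that
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j-\rho\nabla\phi|^2 \le \int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 - \int_{B_{1/2}} |\nabla\phi|^2. \tag{4.12} \]
Notice that if $\rho = 0$ then $j = 0$ and thus also $j-\rho\nabla\phi = 0$, so that the left-hand side of (4.12) is well defined (see the discussion below (4.4)). Based on this we compute
\[ \frac12\int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j-\rho\nabla\phi|^2 = \frac12\int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 - \int_{B_{1/2}}\int_0^1 j\cdot\nabla\phi + \frac12\int_{B_{1/2}}\int_0^1 \rho|\nabla\phi|^2 \]
\[ = \frac12\int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 - \int_{B_{1/2}}\int_0^1 \frac{1-\rho}{2}|\nabla\phi|^2 - \int_{B_{1/2}}\int_0^1 (j-\nabla\phi)\cdot\nabla\phi - \frac12\int_{B_{1/2}} |\nabla\phi|^2 \]
\[ \overset{(4.5)}{\le} \frac12\int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 - \frac12\int_{B_{1/2}} |\nabla\phi|^2 - \int_{B_{1/2}}\int_0^1 (j-\nabla\phi)\cdot\nabla\phi. \]
Using (4.9) with $\zeta = \phi$ and testing (4.10) with $\phi$, we have
\[ \int_{B_{1/2}}\int_0^1 (j-\nabla\phi)\cdot\nabla\phi = \int_{\partial B_{1/2}} \phi\left(\int_0^1 f - \bar f\right) = 0, \]
where we recall that $\bar f = \int_0^1 f$ and since $B_1\subset E\cap F$, $\rho(\cdot,0) = \rho(\cdot,1) = 1$ in $B_{1/2}\subset B_1$. This proves (4.12).

Step 3 [The main estimate] In this last step, we establish that
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 - \int_{B_{1/2}} |\nabla\phi|^2 \lesssim \left(\int_{B_1}\int_0^1 \frac{1}{\rho}|j|^2\right)^{\frac{d+2}{d+1}}. \tag{4.13} \]
Thanks to (4.12), this would yield (4.6). By minimality of $(\rho,j)$, it is enough to construct a competitor $(\tilde\rho,\tilde j)$ that agrees with $(\rho,j)$ outside of $B_{1/2}\times(0,1)$ and that satisfies the upper bound in (4.13). Let $r>0$ and $A_r := B_{1/2}\setminus B_{\frac12(1-r)}$. We now make the following ansatz
\[ (\tilde\rho,\tilde j) := \begin{cases} (1,\nabla\phi) & \text{in } B_{\frac12(1-r)}\times(0,1),\\ (1+s,\nabla\phi+q) & \text{in } A_r\times(0,1), \end{cases} \]
with $(s,q)\in\Lambda$, where $\Lambda$ is the set defined in Lemma 3.3 with $f$ replaced by $f-\bar f$ and the radius $1$ replaced by $1/2$. Thanks to (4.10) for $\phi$, (3.9) for $(s,q)$ and (4.9), $(\tilde\rho,\tilde j)$ extended by $(\rho,j)$ outside $B_{1/2}\times(0,1)$ is indeed admissible for (4.4). By Lemma 3.3, if $r \gg \left(\int_{\partial B_{1/2}}\int_0^1 (f-\bar f)^2\right)^{1/(d+1)}$, we may choose $(s,q)\in\Lambda$ such that
\[ \int_{A_r}\int_0^1 \frac12|q|^2 \lesssim r\int_{\partial B_{1/2}}\int_0^1 (f-\bar f)^2. \tag{4.14} \]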

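We now estimate using that $|s|\le 1/2$:
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\tilde\rho}|\tilde j|^2 = \int_{B_{\frac12(1-r)}} |\nabla\phi|^2 + \int_{A_r}\int_0^1 \frac{1}{1+s}|\nabla\phi+q|^2 \le \int_{B_{\frac12(1-r)}} |\nabla\phi|^2 + 4\int_{A_r}\int_0^1 |\nabla\phi|^2 + |q|^2. \]
By (3.4) (with $\bar f$ playing the role of $f$) and (4.14), we thus have
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\tilde\rho}|\tilde j|^2 - \int_{B_{1/2}} |\nabla\phi|^2 \lesssim r\int_{\partial B_{1/2}} \bar f^2 + r\int_{\partial B_{1/2}}\int_0^1 (f-\bar f)^2 = r\int_{\partial B_{1/2}}\int_0^1 f^2. \]
Taking $r$ to be a large but order-one multiple of $\left(\int_{\partial B_{1/2}}\int_0^1 (f-\bar f)^2\right)^{1/(d+1)} \le \left(\int_{\partial B_{1/2}}\int_0^1 f^2\right)^{1/(d+1)}$ yields (4.13).

Remark 4.2. The crucial point in (4.6) is that $\frac{d+2}{d+1} > 1$.

Remark 4.3. The quasi-orthogonality property (4.12) is a generalization of the following classical fact: If $\phi$ is a harmonic function with $\frac{\partial\phi}{\partial\nu} = f$ on $\partial B_1$, then for every divergence-free vector-field $b$ with $b\cdot\nu = f$ on $\partial B_1$
\[ \int_{B_1} |b-\nabla\phi|^2 = \int_{B_1} |b|^2 - \int_{B_1} |\nabla\phi|^2, \]
so that the minimizers $b$ of the left-hand side coincide with the minimizers of the right-hand side. See for instance [16, Lem. 2.2] for an application of this idea in a different context.

We now prove that (4.6) implies a similar statement in the Lagrangian setting, namely that the distance of the displacement $T-x$ to the set of gradients of harmonic functions is (locally) controlled by the energy. This is reminiscent of the harmonic approximation property for minimal surfaces (see [14, Sec. III.5]). For this we introduce the excess energy for $T$ a minimizer of (4.1) and $R>0$,
\[ \mathcal{E}(T,R) := R^{-2}\,⨍_{B_R} |T-x|^2 + |T^{-1}-x|^2. \]

Proposition 4.4. Let $T$ be the minimizer of (4.1) and assume that $B_1\subset E\cap F$. Then there exists a harmonic function $\phi$ in $B_{1/16}$ such that
\[ \int_{B_{1/16}} |T-(x+\nabla\phi)|^2 + \int_{B_{1/16}} |T^{-1}-(x-\nabla\phi)|^2 \lesssim \mathcal{E}(T,1)^{\frac{d+2}{d+1}} \tag{4.15} \]
and
\[ \int_{B_{1/16}} |\nabla\phi|^2 \lesssim \mathcal{E}(T,1). \tag{4.16} \]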
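The excess energy is invariant under the natural scaling, which can be seen on linear maps: for $T(x) = Ax$ with $A$ symmetric and invertible, the identity $⨍_{B_R} x_ix_j = \delta_{ij}R^2/(d+2)$ shows that the averaged excess equals $(|A-\mathrm{Id}|^2 + |A^{-1}-\mathrm{Id}|^2)/(d+2)$ in the Frobenius norm, independently of $R$. The sketch below checks this numerically in $d=2$ with a midpoint rule in polar coordinates. It is only an illustration: the function names are ours, and a linear map is of course optimal only between suitably matched sets.

```python
import math

def excess_linear_2d(a11, a12, a22, R, n_r=100, n_theta=64):
    """Midpoint-rule value of the averaged excess for T(x) = A x, A symmetric, d = 2."""
    det = a11 * a22 - a12 * a12
    inverse = (a22 / det, -a12 / det, a11 / det)   # entries of A^{-1}
    dr, dth = R / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        rad = (i + 0.5) * dr
        for k in range(n_theta):
            ang = (k + 0.5) * dth
            x, y = rad * math.cos(ang), rad * math.sin(ang)
            for m11, m12, m22 in ((a11, a12, a22), inverse):
                u = m11 * x + m12 * y - x
                v = m12 * x + m22 * y - y
                total += (u * u + v * v) * rad * dr * dth
    # average over B_R and multiply by R^{-2}
    return total / (math.pi * R * R) / (R * R)

def excess_linear_closed_form(a11, a12, a22):
    """(|A - Id|^2 + |A^{-1} - Id|^2) / (d + 2) with d = 2 and Frobenius norms."""
    det = a11 * a22 - a12 * a12
    b11, b12, b22 = a22 / det, -a12 / det, a11 / det
    frob_a = (a11 - 1.0) ** 2 + 2.0 * a12 ** 2 + (a22 - 1.0) ** 2
    frob_b = (b11 - 1.0) ** 2 + 2.0 * b12 ** 2 + (b22 - 1.0) ** 2
    return (frob_a + frob_b) / 4.0

if __name__ == "__main__":
    for R in (1.0, 0.3):
        print(R, excess_linear_2d(1.1, 0.0, 1.0 / 1.1, R),
              excess_linear_closed_form(1.1, 0.0, 1.0 / 1.1))
```

In particular the excess of a map that is exactly linear does not decay under zooming in, which is why the one-step improvement below has to change coordinates ("tilting") before comparing scales.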

Proof. Notice first that we may assume that $\mathcal{E}(T,1)\ll 1$ since otherwise we can take $\phi = 0$.

Step 1 [An $L^\infty$ bound] We start by proving that
\[ \sup_{B_{3/4}} |T-x| \lesssim \left(\int_{B_1} |T-x|^2\right)^{1/(d+2)} \quad\text{and}\quad \sup_{B_{3/4}} |T^{-1}-x| \lesssim \left(\int_{B_1} |T^{-1}-x|^2\right)^{1/(d+2)}. \tag{4.17} \]
We prove just the first inequality. Let $u(x) := T(x)-x$. By monotonicity of $T$, for a.e. $x,y\in B_1$,
\[ (u(x)-u(y))\cdot(x-y) \ge -|x-y|^2. \tag{4.18} \]
Let $y\in B_{3/4}$ be such that (4.18) holds for a.e. $x\in B_1$. By translation we may assume that $y = 0$. By rotation, it is enough to prove for the first coordinate of $u$ that
\[ u_1(0) \lesssim \left(\int_{B_{1/4}} |u|^2\right)^{1/(d+2)}. \]
Taking $y = 0$ in (4.18), we find for a.e. $x\in B_{1/4}$
\[ u(0)\cdot x \le u(x)\cdot x + |x|^2 \lesssim |u(x)|^2 + |x|^2. \]
Integrating the previous inequality over the ball $B_r(re_1)$, we obtain
\[ u(0)\cdot re_1 \lesssim ⨍_{B_r(re_1)} |u|^2 + r^2, \]
so that
\[ u_1(0) \lesssim \frac{1}{r^{d+1}}\int_{B_{1/4}} |u|^2 + r. \]
Optimizing in $r$ yields (4.17).

As a first consequence of (4.17), we have that
\[ T^{-1}(B_{1/16}) \subset B_{1/8}, \tag{4.19} \]
and if for $t\in[0,1]$ and $x\in E$, we let $T_t(x) := tT(x) + (1-t)x$, we also have
\[ T_t(B_{1/8}) \subset B_{1/4}. \tag{4.20} \]
We now claim that for $t\in[0,1]$, we have on the level of the pre-image
\[ T_t^{-1}(B_{1/2}) \subset B_1. \tag{4.21} \]

Indeed, if $x\in E$ is such that $T_t(x)\in B_{1/2}$, then by (4.17) in the form of $|T_t(0)| = o(1)$, where $o(1)$ denotes a function that goes to zero as $\mathcal{E}(T,1)$ goes to zero,
\[ \frac14(1+o(1)) \ge |T_t(x)-T_t(0)|^2 = t^2|T(x)-T(0)|^2 + 2t(1-t)(T(x)-T(0))\cdot x + (1-t)^2|x|^2 \]
\[ \overset{(4.18)}{\ge} t^2|T(x)-T(0)|^2 + (1-t)^2|x|^2 \ge \frac12\min\left(|T(x)-T(0)|^2, |x|^2\right). \]
From this we see that $x$ or $T(x)$ is in $B_{\frac{1}{\sqrt2}+o(1)}\subset B_{3/4}$. In the first case, (4.21) is proven while in the second, we have thanks to (4.17) that $x = T^{-1}(T(x)) \in T^{-1}(B_{3/4}) \subset B_1$ from which we get (4.21) as well.

Step 2 [From Lagrangian to Eulerian viewpoint] We recall the definitions of the measures $\rho(\cdot,t) := T_t\#\chi_E$ and $j(\cdot,t) := T_t\#[(T-\mathrm{Id})\chi_E]$. We note that the velocity field $v = \frac{dj}{d\rho}$ satisfies $v(T_t(x),t) = T(x)-x$ for a.e. $x\in E$ (this can be seen arguing for instance as in the proof of [18, Th. 8.1]). Hence by definition of the expression $\frac{1}{\rho}|j|^2$ and that of $\rho$,
\[ \int_{B_{1/2}}\int_0^1 \frac{1}{\rho}|j|^2 = \int_0^1\int_{B_{1/2}} |v|^2\,d\rho = \int_0^1\int_{T_t^{-1}(B_{1/2})} |T-x|^2 \overset{(4.21)}{\le} \int_{B_1} |T-x|^2. \]
By Proposition 4.1 with the radius $1$ replaced by $1/2$, we infer that there exists a harmonic $\phi$ in $B_{1/4}$ such that
\[ \int_{B_{1/4}}\int_0^1 \frac{1}{\rho}|j-\rho\nabla\phi|^2 \lesssim \mathcal{E}(T,1)^{\frac{d+2}{d+1}} \quad\text{and}\quad \int_{B_{1/4}} |\nabla\phi|^2 \lesssim \mathcal{E}(T,1). \tag{4.22} \]

Step 3 [Conclusion] In order to show (4.15), it is enough to prove its first part
\[ \int_{B_{1/8}} |T-(x+\nabla\phi)|^2 \lesssim \mathcal{E}(T,1)^{\frac{d+2}{d+1}}. \tag{4.23} \]

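Indeed, we would then have
\[ \int_{B_{1/16}} |T^{-1}-(x-\nabla\phi)|^2 = \int_{T^{-1}(B_{1/16})} |x-(T-\nabla\phi\circ T)|^2 \lesssim \int_{T^{-1}(B_{1/16})} |T-(x+\nabla\phi)|^2 + \int_{T^{-1}(B_{1/16})} |\nabla\phi-\nabla\phi\circ T|^2 \]
\[ \overset{(4.19)}{\le} \int_{B_{1/8}} |T-(x+\nabla\phi)|^2 + \sup_{B_{1/8}} |\nabla^2\phi|^2 \int_{B_{1/8}} |x-T|^2 \overset{(4.23)}{\lesssim} \mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \left(\int_{B_{1/4}} |\nabla\phi|^2\right)\mathcal{E}(T,1), \]
where in the last estimate we have used (3.3) with the radius $1/4$ instead of $1$. Using the second part of (4.22) and $\mathcal{E}(T,1)^{\frac{d+2}{d+1}} \ge \mathcal{E}(T,1)^2$, we obtain
\[ \int_{B_{1/16}} |T^{-1}-(x-\nabla\phi)|^2 \lesssim \mathcal{E}(T,1)^{\frac{d+2}{d+1}}, \]
which together with (4.23) would give the full (4.15). Let us finally show (4.23). By the triangle inequality we have
\[ \int_{B_{1/8}} |T-(x+\nabla\phi)|^2 \lesssim \int_0^1\int_{B_{1/8}} |T-(x+\nabla\phi\circ T_t)|^2 + \int_0^1\int_{B_{1/8}} |\nabla\phi-\nabla\phi\circ T_t|^2. \]
Arguing as above and using that for $t\in[0,1]$, $|T_t(x)-x| \le |T(x)-x|$, the second term on the right-hand side is estimated by $\mathcal{E}(T,1)^2$. We thus just need to estimate the first term. Recall that $v = \frac{dj}{d\rho}$ satisfies $v(T_t(x),t) = T(x)-x$, so that we obtain for the integrand
\[ T(x)-(x+\nabla\phi(T_t(x))) = (v(t,\cdot)-\nabla\phi)(T_t(x)) \quad\text{for a.e. } x\in E. \]
Hence by definition of $\rho$ and by our convention on how to interpret $\frac{1}{\rho}|j-\rho\nabla\phi|^2$ when $\rho$ vanishes,
\[ \int_0^1\int_{B_{1/8}} |T-(x+\nabla\phi\circ T_t)|^2 = \int_0^1\int_{T_t(B_{1/8})} |v-\nabla\phi|^2\,d\rho = \int_0^1\int_{T_t(B_{1/8})} \frac{1}{\rho}|j-\rho\nabla\phi|^2 \overset{(4.20)}{\le} \int_0^1\int_{B_{1/4}} \frac{1}{\rho}|j-\rho\nabla\phi|^2 \overset{(4.22)}{\lesssim} \mathcal{E}(T,1)^{\frac{d+2}{d+1}}. \]

Analogously to De Giorgi's proof of regularity for minimal surfaces (see for instance [14, Chap. 25.2]), we are going to prove an "excess improvement by tilting" estimate. By this we mean that if at a certain scale $R$, the map $T$ is close to being linear, i.e. if $\mathcal{E}(T,R)\ll 1$, then on a scale $\theta R$, after a proper change of coordinates, it is even closer to being linear. Together with (4.15) from Proposition 4.4, the main ingredient of the proof is the regularity estimate (3.3) from Lemma 3.1 for harmonic functions.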

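Proposition 4.5. For every $\alpha\in(0,1)$ there exists $\theta = \theta(d,\alpha) > 0$ with the property that for every $R>0$ with $B_R\subset E\cap F$ and $\mathcal{E}(T,R)\ll 1$, there exist a symmetric matrix $B$ of unit determinant and a vector $b$ with
\[ |B-\mathrm{Id}|^2 \lesssim \mathcal{E}(T,R) \quad\text{and}\quad \frac{1}{R^2}|b|^2 \lesssim \mathcal{E}(T,R), \tag{4.24} \]
such that, letting $\hat T(x) := B(T(Bx)-b)$,
\[ \mathcal{E}(\hat T,\theta R) \le \theta^{2\alpha}\mathcal{E}(T,R). \tag{4.25} \]

Proof. By a rescaling $\tilde x = R^{-1}x$, which amounts to the re-definition $\tilde T(\tilde x) := R^{-1}T(R\tilde x)$ (which preserves optimality) and $\tilde b := R^{-1}b$, we may assume that $R = 1$. Let $\phi$ be the harmonic function given by Proposition 4.4. Let then $b := \nabla\phi(0)$ and $A := \nabla^2\phi(0)$ and set $B := e^{-A/2}$. Since $\phi$ is harmonic, $\operatorname{Tr} A = 0$ so that $\det B = 1$. Using (3.3) from Lemma 3.1 and (4.16) from Proposition 4.4, we see that (4.24) is satisfied. Introducing $\hat T(x) := B(T(Bx)-b)$ we have
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 = \theta^{-2}\,⨍_{B(B_\theta)} |B(T-b)-B^{-1}x|^2 \overset{(4.24)}{\lesssim} \theta^{-2}\,⨍_{B_{2\theta}} |T-(B^{-2}x+b)|^2, \]
where we used $\mathcal{E}(T,1)\ll 1$. We split the right-hand side into three terms
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 \lesssim \theta^{-2}\,⨍_{B_{2\theta}} |T-(x+\nabla\phi)|^2 + \theta^{-2}\,⨍_{B_{2\theta}} |(B^{-2}-\mathrm{Id}-A)x|^2 + \theta^{-2}\,⨍_{B_{2\theta}} |\nabla\phi-b-Ax|^2 \]
\[ \lesssim \theta^{-2}\,⨍_{B_{2\theta}} |T-(x+\nabla\phi)|^2 + |B^{-2}-\mathrm{Id}-A|^2 + \theta^{-2}\sup_{B_{2\theta}} |\nabla\phi-b-Ax|^2. \]
Recalling $B = e^{-A/2}$, $A = \nabla^2\phi(0)$, and $b = \nabla\phi(0)$, we obtain
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 \overset{(4.15)}{\lesssim} \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + |\nabla^2\phi(0)|^4 + \theta^2\sup_{B_{2\theta}} |\nabla^3\phi|^2 \overset{(3.3)\&(4.16)}{\lesssim} \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \mathcal{E}(T,1)^2 + \theta^2\mathcal{E}(T,1). \]
Since $\frac{d+2}{d+1} < 2$ and $\mathcal{E}(T,1)\ll 1$, this simplifies to
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 \lesssim \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \theta^2\mathcal{E}(T,1). \tag{4.26} \]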

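We now prove a similar estimate for $\hat T^{-1}$. Notice first that $\hat T^{-1}(x) = B^{-1}T^{-1}(B^{-1}x+b)$ so that
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T^{-1}-x|^2 = \theta^{-2}\,⨍_{B^{-1}B_\theta+b} |B^{-1}T^{-1}-B(x-b)|^2 \overset{(4.24)}{\lesssim} \theta^{-2}\,⨍_{B_{2\theta}} |T^{-1}-B^2(x-b)|^2, \]
where we had to strengthen the assumption to $\mathcal{E}(T,1)\ll\theta^2$. We now need to split into four terms
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T^{-1}-x|^2 \lesssim \theta^{-2}\,⨍_{B_{2\theta}} |T^{-1}-(x-\nabla\phi)|^2 + |B^2-\mathrm{Id}+A|^2 + \theta^{-2}|(\mathrm{Id}-B^2)b|^2 + \theta^{-2}\sup_{B_{2\theta}} |\nabla\phi-b-Ax|^2 \]
\[ \lesssim \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + |\nabla^2\phi(0)|^4 + \theta^{-2}|\nabla^2\phi(0)|^2|\nabla\phi(0)|^2 + \theta^2\sup_{B_{2\theta}} |\nabla^3\phi|^2 \]
\[ \lesssim \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \mathcal{E}(T,1)^2 + \theta^{-2}\mathcal{E}(T,1)^2 + \theta^2\mathcal{E}(T,1) \lesssim \theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \theta^2\mathcal{E}(T,1). \]
Combining this with (4.26), we find that there exists $C>0$ such that
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 + |\hat T^{-1}-x|^2 \le C\left(\theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} + \theta^2\mathcal{E}(T,1)\right). \]
We now fix $\theta$ such that $C\theta^2 \le \frac12\theta^{2\alpha}$, which is possible because $\alpha<1$. If $\mathcal{E}(T,1)$ is small enough, $C\theta^{-(d+2)}\mathcal{E}(T,1)^{\frac{d+2}{d+1}} \le \frac12\theta^{2\alpha}\mathcal{E}(T,1)$ and thus
\[ \theta^{-2}\,⨍_{B_\theta} |\hat T-x|^2 + |\hat T^{-1}-x|^2 \le \theta^{2\alpha}\mathcal{E}(T,1). \]

Equipped with the one-step improvement of Proposition 4.5, the next proposition is the outcome of a Campanato iteration (see for instance [13, Chap. 5] for an application of Campanato iteration to obtain Schauder theory). It is a Campanato iteration on the $C^{1,\alpha}$ level for the transportation map $T$ and thus proceeds by comparing $T$ to affine maps. The main ingredient is the affine invariance of transportation. Proposition 4.6 amounts to an $\varepsilon$-regularity result.

Proposition 4.6. Assume that for some $R>0$, $B_{2R}\subset E\cap F$ and that $\mathcal{E}(T,2R)\ll 1$. Then $T$ is of class $C^{1,\alpha}$ in $B_R$, with
\[ [\nabla T]_{\alpha,B_R} \lesssim R^{-\alpha}\mathcal{E}(T,2R)^{1/2}. \]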
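The choice of $\theta$ and of the smallness threshold at the end of the proof of Proposition 4.5 can be made explicit once the constant $C$ of the combined estimate is known. The following sketch treats $C$ as a hypothetical input (the paper does not track its value, and we assume $C\ge 1/2$ so that $\theta\le 1$) and saturates both conditions $C\theta^2\le\frac12\theta^{2\alpha}$ and $C\theta^{-(d+2)}\mathcal{E}^{\frac{d+2}{d+1}}\le\frac12\theta^{2\alpha}\mathcal{E}$. The additional requirement $\mathcal{E}\ll\theta^2$ used for the inverse map is not modelled here.

```python
def one_step_parameters(C, d, alpha):
    """Largest theta with C*theta^2 <= theta^(2 alpha)/2, and the matching threshold on the excess."""
    theta = (2.0 * C) ** (-1.0 / (2.0 * (1.0 - alpha)))
    # C * theta^-(d+2) * E^((d+2)/(d+1)) <= theta^(2 alpha) * E / 2
    #   <=>  E^(1/(d+1)) <= theta^(2 alpha + d + 2) / (2 C)
    eps = (theta ** (2.0 * alpha + d + 2) / (2.0 * C)) ** (d + 1)
    return theta, eps

def one_step_bound(C, d, theta, excess):
    """Right-hand side C * (theta^-(d+2) * E^((d+2)/(d+1)) + theta^2 * E)."""
    return C * (theta ** (-(d + 2)) * excess ** ((d + 2) / (d + 1)) + theta ** 2 * excess)

if __name__ == "__main__":
    C, d, alpha = 4.0, 2, 0.5          # C = 4 is an arbitrary illustrative value
    theta, eps = one_step_parameters(C, d, alpha)
    print(theta, eps, one_step_bound(C, d, theta, eps) / eps, theta ** (2 * alpha))
```

The threshold degenerates as $\alpha\uparrow 1$, since then $\theta\to 0$; this reflects that the argument yields $C^{1,\alpha}$ for every $\alpha<1$ but with an $\alpha$-dependent smallness condition.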

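Proof. By Campanato's theory, see [10, Th. 5.I], we have to prove that $\mathcal{E}(T,2R)\ll 1$ implies
\[ \sup_{x_0\in B_R}\,\sup_{r\le\frac12 R}\,\min_{A,b}\, \frac{1}{r^{2(1+\alpha)}}\,⨍_{B_r(x_0)} |T-(Ax+b)|^2 \lesssim R^{-2\alpha}\mathcal{E}(T,2R). \tag{4.27} \]
Let us first notice that if $\mathcal{E}(T,2R)\ll 1$, then for every $x_0\in B_R$
\[ \mathcal{E} := R^{-2}\,⨍_{B_R(x_0)} |T-x|^2 + |T^{-1}-x|^2 \ll 1. \tag{4.28} \]
Therefore, in order to prove (4.27), it is enough to prove that (4.28) implies that for $r\le\frac12 R$,
\[ \min_{A,b}\, \frac{1}{r^2}\,⨍_{B_r(x_0)} |T-(Ax+b)|^2 \lesssim r^{2\alpha}\mathcal{E}. \tag{4.29} \]
Without loss of generality we may assume that $x_0 = 0$. Thanks to (4.28), Proposition 4.5 applies and there exist a (symmetric) matrix $B_1$ with $\det B_1 = 1$ and a vector $b_1$ such that $T_1(x) := B_1(T(B_1x)-b_1)$ satisfies $\mathcal{E}(T_1,\theta R) \le \theta^{2\alpha}\mathcal{E}(T,R)$. If $T$ is a minimizer of (4.1), then so is $T_1$ with $(E,F)$ replaced by $(B_1^{-1}E, B_1(F-b_1))$ (indeed, since $\det B_1 = 1$, $T_1$ sends $\chi_{B_1^{-1}E}$ on $\chi_{B_1(F-b_1)}$ and if $T$ is the gradient of a convex function $\psi$ then $T_1 = \nabla\psi_1$ where $\psi_1(x) := \psi(B_1x) - b_1\cdot B_1x$ is also a convex function, which characterizes optimality [18, Th. 2.12]). Notice also that if $\mathcal{E}(T,R)$ is small enough, so that by (4.24), $|B_1-\mathrm{Id}|$ and $R^{-1}|b_1|$ are very small, then $B_{\theta R}\subset B_1^{-1}E\cap B_1(F-b_1)$. Therefore, we may iterate Proposition 4.5 to find a sequence of (symmetric) matrices $B_k$, a sequence of vectors $b_k$, and a sequence of maps $T_k$ such that $T_k(x) = B_k(T_{k-1}(B_kx)-b_k)$ and
\[ \mathcal{E}(T_k,\theta^kR) \le \theta^{2\alpha k}\mathcal{E}(T,R), \quad |B_k-\mathrm{Id}|^2 \lesssim \theta^{2\alpha k}\mathcal{E}(T,R) \quad\text{and}\quad |b_k|^2 \lesssim \theta^{2(\alpha+1)k}R^2\mathcal{E}(T,R). \tag{4.30} \]
Letting $A_k := B_kB_{k-1}\cdots B_1$ (so that $\det A_k = 1$) and $d_k := \sum_{i=1}^k B_kB_{k-1}\cdots B_i b_i$, we see that $T_k(x) = A_kT(A_k^*x) - d_k$. By (4.30),
\[ |A_k-\mathrm{Id}|^2 \lesssim \mathcal{E}(T,R), \tag{4.31} \]
so that $B_{\frac12\theta^kR} \subset A_k^*(B_{\theta^kR})$. We then conclude by definition of $T_k$ that
\[ \min_{A,b}\, \frac{1}{(\frac12\theta^kR)^2}\,⨍_{B_{\frac12\theta^kR}} |T-(Ax+b)|^2 \lesssim \frac{1}{(\theta^kR)^2}\,⨍_{A_k^*(B_{\theta^kR})} |T-(A_k^{-1}A_k^{-*}x + A_k^{-1}d_k)|^2 \]
\[ = \frac{1}{(\theta^kR)^2}\,⨍_{B_{\theta^kR}} |A_k^{-1}(T_k-x)|^2 \overset{(4.31)}{\lesssim} \frac{1}{(\theta^kR)^2}\,⨍_{B_{\theta^kR}} |T_k-x|^2 \overset{(4.30)}{\lesssim} \theta^{2\alpha k}\mathcal{E}(T,R). \]
From this (4.29) follows, which concludes the proof of (4.27).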
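The passage from (4.30) to (4.31) rests on the summability of a geometric series: if $|B_i-\mathrm{Id}|\le c\,\theta^{\alpha i}\mathcal{E}^{1/2}$, then by sub-multiplicativity of the operator norm $|B_k\cdots B_1-\mathrm{Id}| \le \prod_{i\le k}(1+c\,\theta^{\alpha i}\mathcal{E}^{1/2}) - 1$, which stays of order $\mathcal{E}^{1/2}$ uniformly in $k$. The sketch below only tracks these scalar bounds; the constant `c` stands for the implicit constant in (4.30) and is a hypothetical input, as are the function names.

```python
import math

def accumulated_deviation(c, theta, alpha, excess, steps):
    """Bound on |B_k ... B_1 - Id| when |B_i - Id| <= c * theta**(alpha*i) * sqrt(excess)."""
    product = 1.0
    for i in range(1, steps + 1):
        product *= 1.0 + c * theta ** (alpha * i) * math.sqrt(excess)
    return product - 1.0

def geometric_series_bound(c, theta, alpha, excess):
    """Step-independent bound exp(sum_i delta_i) - 1 with delta_i = c * theta**(alpha*i) * sqrt(excess)."""
    q = theta ** alpha
    return math.expm1(c * math.sqrt(excess) * q / (1.0 - q))

if __name__ == "__main__":
    c, theta, alpha, excess = 3.0, 0.125, 0.5, 1e-4   # illustrative values only
    for steps in (1, 5, 50):
        print(steps, accumulated_deviation(c, theta, alpha, excess, steps))
    print("uniform bound:", geometric_series_bound(c, theta, alpha, excess))
```

The same bookkeeping applies to the translations $d_k$, whose increments decay even faster because of the extra factor $\theta^k R$ in (4.30).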

With this $\varepsilon$-regularity result at hand, we now may prove that $T$ is a $C^{1,\alpha}$ diffeomorphism outside of a set of measure zero.

Theorem 4.7. Let $E$ and $F$ be two bounded open sets with $|E| = |F|$ and let $T$ be the minimizer of (4.1). Then, there exist open sets $E'\subset E$ and $F'\subset F$ with $|E\setminus E'| = |F\setminus F'| = 0$ and such that $T$ is a $C^\infty$ diffeomorphism between $E'$ and $F'$.

Proof. By the Alexandrov Theorem [19, Th. 14.25], there exist two sets of full measure $\widetilde E\subset E$ and $\widetilde F\subset F$ such that for all $(x_0,y_0)\in\widetilde E\times\widetilde F$, $T$ and $T^{-1}$ are differentiable at $x_0$ and $y_0$ in the sense that there exist $A$, $B$ symmetric such that for a.e. $(x,y)\in E\times F$,
\[ T(x) = T(x_0) + A(x-x_0) + o(|x-x_0|) \quad\text{and}\quad T^{-1}(y) = T^{-1}(y_0) + B(y-y_0) + o(|y-y_0|). \tag{4.32} \]
Moreover, we may assume that (4.2) holds for every $(x_0,y_0)\in\widetilde E\times\widetilde F$. Using (4.2), it is not hard to show that if $T(x_0) = y_0$, then $A = B^{-1}$ and $\det A = \det B = 1$. We finally let $E' := \widetilde E\cap T^{-1}(\widetilde F)$ and $F' := T(E') = \widetilde F\cap T(\widetilde E)$. Notice that since $T$ sends sets of measure zero to sets of measure zero, $|E\setminus E'| = |F\setminus F'| = 0$. We are going to prove that $E'$ and $F'$ are open sets and that $T$ is a $C^{1,\alpha}$ diffeomorphism from $E'$ to $F'$.

Let $x_0\in E'$, and thus automatically $y_0 := T(x_0)\in F'$, be given; we shall prove that $T$ is of class $C^{1,\alpha}$ for given $\alpha\in(0,1)$ in a neighborhood of $x_0$. By (4.32) we have in particular
\[ \lim_{R\to 0} \frac{1}{R^2}\,⨍_{B_R(x_0)} |T-y_0-A(x-x_0)|^2 + \frac{1}{R^2}\,⨍_{B_R(y_0)} |T^{-1}-x_0-A^{-1}(y-y_0)|^2 = 0. \tag{4.33} \]
We make the change of variables $x = A^{-1/2}\hat x + x_0$, $y = A^{1/2}\hat y + y_0$, which leads to $\hat T(\hat x) := A^{-1/2}(T(x)-y_0)$ and $\hat T^{-1}(\hat y) = A^{1/2}(T^{-1}(y)-x_0)$, and note that $\hat T$ is the optimal transportation map between $\hat E := A^{1/2}(E-x_0)$ and $\hat F := A^{-1/2}(F-y_0)$ (indeed, if $T = \nabla\psi$ for a convex function $\psi$, then $\hat T = \hat\nabla\hat\psi$, where $\hat\psi(\hat x) = \psi(x) - A^{-1/2}y_0\cdot\hat x$). This change of variables is made such that (4.33) turns into
\[ \lim_{R\to 0} \frac{1}{R^2}\,⨍_{B_R} |\hat T-\hat x|^2 + |\hat T^{-1}-\hat y|^2 = 0. \]
Moreover, for $R$ small enough, we have $B_R(x_0)\subset E$ and $B_R(y_0)\subset F$ and thus, possibly after reducing $R$, $B_R\subset\hat E\cap\hat F$. Hence we may apply Proposition 4.6 to $\hat T$ to obtain that $\hat T$ is of class $C^{1,\alpha}$ in a neighborhood of zero. Similarly, we obtain that $\hat T^{-1}$ is $C^{1,\alpha}$ in a neighborhood of zero.
Going back to the original map, this means that $T$ is a $C^{1,\alpha}$ diffeomorphism of a neighborhood $U$ of $x_0$ on the neighborhood $T(U)$ of $T(x_0)$. In particular, $U\times T(U)\subset E'\times F'$ so that $E'$ and $F'$ are both open and thanks to (4.2), $T$ is a global $C^{1,\alpha}$ diffeomorphism from $E'$ to $F'$. If $\psi$ is a convex function such that $\nabla\psi = T$, this means that $\psi\in C^{2,\alpha}(E')$ and it solves (in the classical sense) $\det\nabla^2\psi = \chi_E$, which is now a uniformly elliptic equation. By the Evans-Krylov Theorem (see [9]) and Schauder estimates we obtain by bootstrap the $C^\infty$ regularity of $T$.

Acknowledgment

Part of this research was funded by the program PGMO of the FMJH through the project COCA. The hospitality of the IHES and of the MPI-MIS is gratefully acknowledged.

References

[1] G. Alberti, R. Choksi, and F. Otto, Uniform energy distribution for an isoperimetric problem with long-range interactions, J. Amer. Math. Soc. 22 (2009), no. 2, 569-605.
[2] L. Ambrosio, N. Fusco, and D. Pallara, Functions of bounded variation and free discontinuity problems, Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 2000.
[3] L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, second ed., Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2008.
[4] H. Brézis, Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, North-Holland Publishing Co., Amsterdam-London; American Elsevier Publishing Co., Inc., New York, 1973, North-Holland Mathematics Studies, No. 5. Notas de Matemática (50).
[5] L. A. Caffarelli, Interior $W^{2,p}$ estimates for solutions of the Monge-Ampère equation, Ann. of Math. (2) 131 (1990), no. 1, 135-150.
[6] L. A. Caffarelli, A localization property of viscosity solutions to the Monge-Ampère equation and their strict convexity, Ann. of Math. (2) 131 (1990), no. 1, 129-134.
[7] L. A. Caffarelli, Some regularity properties of solutions of Monge Ampère equation, Comm. Pure Appl. Math. 44 (1991), no. 8-9, 965-969.
[8] L. A. Caffarelli, The regularity of mappings with a convex potential, J. Amer. Math. Soc. 5 (1992), no. 1, 99-104.
[9] L. A. Caffarelli and X. Cabré, Fully nonlinear elliptic equations, American Mathematical Society Colloquium Publications, vol. 43, American Mathematical Society, Providence, RI, 1995.
[10] S. Campanato, Proprietà di una famiglia di spazi funzionali, Ann. Scuola Norm. Sup. Pisa (3) 18 (1964), 137-160.
[11] G. De Philippis and A. Figalli, Partial regularity for optimal transport maps, Publ. Math. Inst. Hautes Études Sci. 121 (2015), 81-112.
[12] A. Figalli and Y.-H. Kim, Partial regularity of Brenier solutions of the Monge-Ampère equation, Discrete Contin. Dyn. Syst. 28 (2010), no. 2, 559-565.
[13] M. Giaquinta and L. Martinazzi, An introduction to the regularity theory for elliptic systems, harmonic maps and minimal graphs, second ed., Appunti. Scuola Normale Superiore di Pisa (Nuova Serie) [Lecture Notes. Scuola Normale Superiore di Pisa (New Series)], vol. 11, Edizioni della Normale, Pisa, 2012.
[14] F. Maggi, Sets of finite perimeter and geometric variational problems, Cambridge Studies in Advanced Mathematics, vol. 135, Cambridge University Press, Cambridge, 2012, An introduction to geometric measure theory.
[15] R. McCann, A convexity theory for interacting gases and equilibrium crystals, PhD Thesis, Princeton Univ. (1994).
[16] B. Merlet, A highly anisotropic nonlinear elasticity model for vesicles II: Derivation of the thin bilayer bending theory, Arch. Ration. Mech. Anal. 217 (2015), no. 2, 681-740.
[17] F. Santambrogio, Optimal transport for applied mathematicians, Progress in Nonlinear Differential Equations and their Applications, vol. 87, Birkhäuser/Springer, Cham, 2015, Calculus of variations, PDEs, and modeling.
[18] C. Villani, Topics in optimal transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI, 2003.
[19] C. Villani, Optimal transport. Old and new, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009.