Optimal transport for Seismic Imaging

Similar documents
Seismic imaging and optimal transport

Full Waveform Inversion via Matched Source Extension

Registration-guided least-squares waveform inversion

RANDOM PROPERTIES BENOIT PAUSADER

Fast convergent finite difference solvers for the elliptic Monge-Ampère equation

A Survey of Computational High Frequency Wave Propagation II. Olof Runborg NADA, KTH

Optimal Transportation. Nonlinear Partial Differential Equations

A Narrow-Stencil Finite Difference Method for Hamilton-Jacobi-Bellman Equations

A VARIATIONAL METHOD FOR THE ANALYSIS OF A MONOTONE SCHEME FOR THE MONGE-AMPÈRE EQUATION 1. INTRODUCTION

Nishant Gurnani. GAN Reading Group. April 14th, / 107

Overview Avoiding Cycle Skipping: Model Extension MSWI: Space-time Extension Numerical Examples

Filtered scheme and error estimate for first order Hamilton-Jacobi equations

Optimal Transport: A Crash Course

Source estimation for frequency-domain FWI with robust penalties

Numerical Solutions of Geometric Partial Differential Equations. Adam Oberman McGill University

The semi-geostrophic equations - a model for large-scale atmospheric flows

A projected Hessian for full waveform inversion

Nonlinear seismic imaging via reduced order model backprojection

softened probabilistic

Class Meeting # 1: Introduction to PDEs

Elements of linear algebra

arxiv: v2 [math.na] 2 Aug 2013

Vector Derivatives and the Gradient

Wavefield Reconstruction Inversion (WRI) a new take on wave-equation based inversion Felix J. Herrmann

Brittany D. Froese. Contact Information

The X-ray transform for a non-abelian connection in two dimensions

FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION

An Inexact Newton Method for Nonlinear Constrained Optimization

1.2 Derivation. d p f = d p f(x(p)) = x fd p x (= f x x p ). (1) Second, g x x p + g p = 0. d p f = f x g 1. The expression f x gx

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

Computing High Frequency Waves By the Level Set Method

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

Youzuo Lin and Lianjie Huang

A new Hellinger-Kantorovich distance between positive measures and optimal Entropy-Transport problems

Motion Estimation (I) Ce Liu Microsoft Research New England

CONVERGENCE THEORY. G. ALLAIRE CMAP, Ecole Polytechnique. 1. Maximum principle. 2. Oscillating test function. 3. Two-scale convergence

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Algorithms for Nonsmooth Optimization

On second order sufficient optimality conditions for quasilinear elliptic boundary control problems

Lecture No 1 Introduction to Diffusion equations The heat equat

THE INVERSE FUNCTION THEOREM

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

LECTURES ON MEAN FIELD GAMES: II. CALCULUS OVER WASSERSTEIN SPACE, CONTROL OF MCKEAN-VLASOV DYNAMICS, AND THE MASTER EQUATION

Discrete transport problems and the concavity of entropy

Cubic regularization of Newton s method for convex problems with constraints

Dynamic and Stochastic Brenier Transport via Hopf-Lax formulae on Was

Partial Differential Equations

Gradient Flows: Qualitative Properties & Numerical Schemes

Motion Estimation (I)

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

POD for Parametric PDEs and for Optimality Systems

Investigating the Influence of Box-Constraints on the Solution of a Total Variation Model via an Efficient Primal-Dual Method

Introduction to Aspects of Multiscale Modeling as Applied to Porous Media

ARE211, Fall 2005 CONTENTS. 5. Characteristics of Functions Surjective, Injective and Bijective functions. 5.2.

LECTURE # 0 BASIC NOTATIONS AND CONCEPTS IN THE THEORY OF PARTIAL DIFFERENTIAL EQUATIONS (PDES)

Bounded uniformly continuous functions

Second Order Elliptic PDE

Blowup dynamics of an unstable thin-film equation

Optimal control problems with PDE constraints

Robust Optimization for Risk Control in Enterprise-wide Optimization

GERARD AWANOU. i 1< <i k

A few words about the MTW tensor

Recent developments in elliptic partial differential equations of Monge Ampère type

Stability of optimization problems with stochastic dominance constraints

On the approximation of the principal eigenvalue for a class of nonlinear elliptic operators

Lecture Introduction

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

NONTRIVIAL SOLUTIONS TO INTEGRAL AND DIFFERENTIAL EQUATIONS

Variational approach to mean field games with density constraints

Half of Final Exam Name: Practice Problems October 28, 2014

1 Non-negative Matrix Factorization (NMF)

Entropic Schemes for Conservation Laws

Dual methods for the minimization of the total variation

Bound-state solutions and well-posedness of the dispersion-managed nonlinear Schrödinger and related equations

Unravel Faults on Seismic Migration Images Using Structure-Oriented, Fault-Preserving and Nonlinear Anisotropic Diffusion Filtering

TD 1: Hilbert Spaces and Applications

Some Aspects of Solutions of Partial Differential Equations

ENO and WENO schemes. Further topics and time Integration

Comparison between least-squares reverse time migration and full-waveform inversion

Numerical Modeling of Methane Hydrate Evolution

Optimal mass transport as a distance measure between images

Degenerate Monge-Ampère equations and the smoothness of the eigenfunction

Microseismic Event Estimation Via Full Waveform Inversion

Iterative regularization of nonlinear ill-posed problems in Banach space

Matrix Derivatives and Descent Optimization Methods

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly.

Uniformly Uniformly-ergodic Markov chains and BSDEs

Nonlinear Diffusion. Journal Club Presentation. Xiaowei Zhou

On the regularity of solutions of optimal transportation problems

Full Waveform Inversion (FWI) with wave-equation migration. Gary Margrave Rob Ferguson Chad Hogan Banff, 3 Dec. 2010

Errata Applied Analysis

PDEs, part 1: Introduction and elliptic PDEs

Transform methods. and its inverse can be used to analyze certain time-dependent PDEs. f(x) sin(sxπ/(n + 1))

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Convexity of level sets for solutions to nonlinear elliptic problems in convex rings. Paola Cuoghi and Paolo Salani

Weighted Essentially Non-Oscillatory limiters for Runge-Kutta Discontinuous Galerkin Methods

Nonlinear acoustic imaging via reduced order model backprojection

z x = f x (x, y, a, b), z y = f y (x, y, a, b). F(x, y, z, z x, z y ) = 0. This is a PDE for the unknown function of two independent variables.

Existence and uniqueness results for the MFG system

Transcription:

Optimal transport for Seismic Imaging Bjorn Engquist In collaboration with Brittany Froese and Yunan Yang ICERM Workshop - Recent Advances in Seismic Modeling and Inversion: From Analysis to Applications, Brown University, November 6-10, 2017

Outline 1. Remarks on Full Waveform Inversion (FWI) 2. Measures of mismatch 3. Optimal transport and Wasserstein metric 4. Monge-Ampère equation and its numerical approximation 5. Applications to full waveform inversion 6. Conclusions Background: matching problem and technique

1. Remarks on Full Waveform Inversion (FWI) Full Waveform Inversion is an increasingly important technique in the inverse seismic imaging process It is a PDE constrained optimization formulation Model parameters v are determined to fit data min m(x) ( u comp (m) u data A + λ Lv ) B

FWI: PDE constrained optimization FWI: Measured and processed data is compared to a computed wave field based on model parameters v to be determined (for example, P-wave velocity) min m(x) ( u comp (m) u data A + λ Lv ) B. A measure of mismatch L 2 the standard choice Lc B potential regularization term, which we will omit for this presentation at the surface Over determined boundary conditions at the surface

Mathematical and computational challenges Important computational steps Relevant measure of mismatch ( ) Fast wave field solver In our case scalar wave equation in time or frequency domain, for example u tt = m(x) 2 Δu, Efficient optimization Adjoint state method for gradient computation

2. Measures of mismatch We will denote the computed wave field by f(x,t;v) and the data by g(x,t), u comp (x,t;m) = f (x,t;m), u data (x,t) = g(x,t) The common and original measure of mismatch between the computed signal f and the measured data g is L 2, [Tarantola, 1984, 1986] min m f m g L2 We will make some remarks on different Measures of mismatch starting with local estimates to more global

Global minimum It can be expected that the mismatch functional will have local minima that complicates minimization algorithms Ideally, local minima different from the global min should be avoided for some natural parameterizations as shift and dilations (f (t) = g(at - s)) Shift as a function of t, dilation as a function of x u tt = m 2 u xx, x > 0, t > 0 u(0,t) = u 0 (t) u = u 0 (t x / m)

Local measures In the L 2 local mismatch, estimators f and g are compared point wise, J(v) = f g L2 = ( f (x i,t j ) g(x i,t j ) 2 ) 1/2 i, j This works well if the starting values for the model parameters are good otherwise there is risk for trapping in local minima cycle skipping

Cycle skipping The need for better mismatch functionals can be seen from a simple shift example small basin of attraction For other examples, [Vireux et al 2009] L 2 2 Shift or displacement Cycle skipping Local minima

Global measures Different measures have been introduced to to compare all of f and g not just point wise. Integrated functions NIM [Liu et al 2014] [Donno et al 2014] Stationary marching filters Example AWI, [Warner et al 2014] Non-stationary marching filters Example [Fomel et al 2013] Measures based on optimal transport ( )

Integrated functions f and g are integrated, typically in 1D-time, before L 2 comparison J = F G L2 = ( F(x i,t j ) G(x i,t j ) 2 ) 1/2, i, j j j F i, j = f (x i,t k ), G i, j = g(x i,t k ), k=1 In mathematical notation this is the H -1 semi-norm Slight increase in wave length for short signals (Ricker wavelet) Often applied to modified signals like squaring scaling or envelope to have f and g positive and with equal integral k=1

Matching filters The filter based measures typically has two steps Computing filter coefficients K K = argmin K f g L2 Estimation of difference between computed filter and the identity map. K I The filter can be stationary or non-stationary The optimal transport based techniques Does this in one step Minimization is of a measure of transform K or as it is called transport

3. Optimal transport and Wasserstein metric Wasserstein metric measures the cost for optimally transport one measure (signal) f to the other: g Monge-Kantorivich optimal transport measure earth movers distance in computer science f(x) g(y) Compare travel time distance Classic in seismology

Optimal transport and Wasserstein metric The Wasserstein metric is directly based on one cost function Signals in exploration seismology are not as clean as above and a robust functional combining features of L 2 and travel time is desirable Extensive mathematical foundation f(x) g(y)

Wasserstein distance W p ( f, g) = inf γ d(x, y) p dγ(x, y) Here the plan T is the optimal transport map from positive Borel measures f to g of equal mass Well developed mathematical theory, [Villani, 2003, 2009] X Y γ Γ X Y, the set of product measure : f and g f (x)dx = g(y)dy, f, g 0 X Y 1/p 2 W 2 ( f, g) = inf x T f,g (x) T f,g 2 f (x)dx X 1/2

Wasserstein distance s f g distance = s 2 (mass)

Wasserstein distance s f g In this model example W 2 and L 2 is equal (modulo a constant) to leading order when separation distance s is small. Recall L 2 is the standard measure

Wasserstein distance s f g When s is large W 2 = s = travel distance (time), ( higher frequency ), L 2 independent of s Potential for avoiding cycle skipping

Wasserstein distance vs L 2 Fidelity measure Cycle skipping Local minima L 2 2 Function of displacement

Wasserstein distance vs L 2 Fidelity measure L 2 2 W 2 2

Wasserstein distance vs L 2 Fidelity measure This is the basic motivation for exploring Wasserstein metric to measure the misfit Local min are well known problems L 2 2 W 2 2

Wasserstein distance vs L 2 Fidelity measure We will see that there are hidden difficulties in making this work in practice Normalized signal in right frame L 2 2 W 2 2

Analysis Theorem 1: W 2 2 is convex with respect to translation, s and dilation, a, W 2 2 ( f, g)[α, s], f (x) = g(ax s)α d, a > 0, x, s R d Theorem 2: W 2 2 is convex with respect to local amplitude change, λ # % g(x)λ, x Ω W 2 1 2 ( f, g)[β], f (x) = $ β R, Ω = Ω 1 Ω 2 &% βg(x)λ, x Ω 2 λ = ( ) gdx / gdx + β gdx Ω Ω2 Ω 1 (L 2 only satisfies 2 nd theorem)

Remarks The scalar dilation ax can be generalized to Ax where A is a positive definite matrix. Convexity is then in terms of the eigenvalues The proof of theorem 1 is based on c-cyclic monotonicity ( x j, x j ) ( ) c( x j, x σ ( j) ) { } Γ c x j, x j The proof of theorem two is based on the inequality j W 2 2 (sf 1 + (1 s) f 2, g) sw 2 2 ( f 1, g)+ (1 s)w 2 2 ( f 2, g) j

Illustration: discrete proof (theorem 1) Equal point masses then weak limit for generl theorem alternative to using the c-cyclic propery

Illustration: discrete proof W 2 2 = min σ min σ min σ $ & % $ & % x ο j J j=1 J j=1 J j=1 x ο j x j x ο j x j x ο j (x j sξ) 2 2 = x j σ j = j 2 = ( σ : permutation) J ' 2s ( x ο j x ) j ξ + J sξ 2 ) = j=1 ( ' + J sξ 2 J J ), from x ο j = x j ( j=1 j=1

Noise W 2 2 less sensitive to noise than L 2 Theorem 3: f = g + δ, δ uniformly distributed uncorrelated random noise, (f > 0), discrete i.e. piecewise constant: N intervals Proof by domain decomposition dimension by dimension and standard deviation estimates using closed 1D formula 2 f g L2 = O (1), W 2 2 f = ( f 1, f 2,.., f J ) ( f g) = O(N 1 )

Computing the optimal transport In 1D, optimal transport is equivalent to sorting with efficient numerical algorithms O(Nlog(N)) complexity, N data points W 2 ( f, g) = F(x) = ( F 1 (y) G 1 (y))dy x x f (ξ)dξ, g(x) = g(ξ)dξ In higher dimensions such combinatorial methods as the Hungarian algorithm are very costly O(N 3 ), Alternatives: linear programming, sliced Wasserstein, ADMM

Computing of optimal transport For higher dimensions fortunately the optimal transport related to W 2 can be solved via a Monge-Ampère equation [Brenier 1991, 1998] 2 W 2 ( f, g) = x u(x) 2 X f (x)dx ( ) = f (x) / g( u(x)) det D 2 (u) Brenier map T(x) = u(x) 1/2 Recently there are now alternative PDF formulations

4. Monge-Ampère equation and its numerical approximation Nonlinear equation with potential loss of regularity Weak viscosity solution u if u is both a sub and super solution Sub solution (super analogous) x 0 ( ) f (x) = 0, u convex, f C 0 (Ω) det D 2 (u) Ω, if local max of u φ, then ( ) f (x 0 ) det D 2 φ 1D u xx = f, φ(x 0 ) = u(x 0 ), φ(x 0 ) = u(x 0 ), φ(x) u(x) φ xx f

Numerical approximation Consistent, stable and monotone finite difference approximations will converge to Monge-Ampère viscosity solutions [Barles, Souganidis, 1991] ( ) = u v j v j det D 2 u d + j=1 ( ) { v j }: eigenvectors of D 2 u [Benamou, Froese, Oberman, 2014]

5. Applications to full waveform inversion First example: Problem with reflection from two layers dependence on parameters, with Froese. S R t Robust convergence with direct minimization algorithm (Simplex) geometrical optics and positive f, g Offset = R-S

Reflections and inversion example L 2 W 2

Gradient for optimization For large scale optimization, gradient of J(f) = W 22 (f,g) with respect to wave velocity is required in a quasi Newton method in the PDE constrained optimization step Based on linearization of J and Monge-Ampère equation resulting in linear elliptic PDE (adjoint source) J +δj = ( f +δ f ) x (u f +δu) 2 dx f +δ f = g( (u f +δu))det(d 2 (u f +δu)) L(v) = g( u f )tr((d 2 u f ) D 2 (v))+ det(d 2 u f )g( u f ) v = δ f

W2 Remarks + Captures important features of distance in both travel time and L 2, Convexity with respect to natural parameters + There exists fast algorithms and technique robust vs. noise - Constraints that are not natural for seismology X f (x)dx = g(y)dy, f, g 0, g > 0, M A Y Normalize: transform f, g to be positive and with the same integral

Large scale applications Early normalizations: squaring, consider positive and negative parts of f and g separately not appropriate for adjoint state technique Successful normalization linear!f (x) = ( f (x)+ c) / ( f (x)+ c)dx,!g(x) =... Efficient alternative to W 2 (2D): trace by trace W 2 (1D) coupled to L 2 [Yang et al 2016] Other alternatives, W 1, unbalanced transport [Chizat et al 2015], Dual formulation of optimal transport Normalized W 2 + λl 2 is an unbalanced transport measure, λ > 0

Applications Seismic test cases Marmousi model (velocity field) Original model and initial velocity field to start optimization

Marmousi model Original and FWI reconstruction with different initializations: W 2-1D, W 2-2D, L 2

Marmousi model Robustness to noise: good for data but allows for oscillations in optimal computed velocity, numerical M A errors Remedy: trace by trace, TV - regularization

Marmousi model Robustness to noise: good for data but allows for oscillations in optimal computed velocity, numerical M A errors Remedy: trace by trace, TV - regularization W 2 W 1

BP 2004 model High contrast salt deposit, W 2-1D, W 2-2D, L 2

Camembert

W 1 example Example below: W 1 measure and Marmousi p-velocity model [Metivier et. al, 2016] Similar quality but more sensitive to noise and L 2 when f g Solver with better 2D performance

W 2 with noise Slightly temporally correlated uniformly distributed noise

Remarks Troubling issues Theoretical results of convexity based on normalized signals of the form squaring etc. but not practically useful (squaring: not sign sensitive, requires compact support and problem with adjoint state method) The practically working normalization based on linear normalization does not satisfy convexity with respect to shifts

Remarks Linear scaling: misfit as function of shift, Ricker wavelet Function of velocity parameters: v = v p + αz [Metivier et. al, 2016]

Remarks New and currently best normalization!f (x) = ˆf f (x)+ c (x) / ˆf (x)dx, ˆf 1, x 0 (x) = c 1 exp(cf (x), x < 0 Good in practice allows for less accurate initial mode than linear scaling Satisfies our theorems!f for c large enough c 1 f

6. Conclusions Optimal transport and the Wasserstein metric are promising tools in seismic imaging Theory and basic algorithms need to be modified to handle realistic seismic data Ready for field data, [PGS, SEG2017, North Sea]