THE RICE INVERSION PROJECT


THE RICE INVERSION PROJECT

Mario Bencomo, Lei Fu, Jie Hou, Guanghui Huang, Rami Nammour, and William Symes

Annual Report 2016

Copyright 2015-2016 by Rice University

TRIP16 TABLE OF CONTENTS

William W. Symes, The Rice Inversion Project 2016 Report - DRAFT
Jie Hou and William W. Symes, Approximate Gauss-Newton iteration for Full Waveform Inversion
Jie Hou and William W. Symes, Inversion velocity analysis via approximate Born inversion
Guanghui Huang, Rami Nammour, and William W. Symes, Matched Source Waveform Inversion: Space-time Extension
Guanghui Huang, Rami Nammour, and William W. Symes, Matched Source Waveform Inversion: Volume Extension
William W. Symes, Mathematical Fidelity and Open Source Libraries for Large Scale Simulation and Optimization
Mario J. Bencomo, An Implementation of Multipole Point-Sources within the IWave Framework
Guanghui Huang, Rami Nammour, and William W. Symes, Full Waveform Inversion via Source-Receiver Extension
William W. Symes, Gradients of Reduced Objective Functions for Separable Wave Inverse Problems
Jie Hou and William W. Symes, An alternative formula for approximate extended Born inversion
Lei Fu and William W. Symes, Fast extended waveform inversion using Morozov's discrepancy principle


The Rice Inversion Project, TRIP16, 2016

INTRODUCTION TO THE 2016 ANNUAL REPORT

Welcome to the 2016 Annual Report volume of The Rice Inversion Project. This volume contains manuscripts of papers, abstracts, and reports completed during the course of the project year. It will evolve as the year rolls along, and take final form in January 2017. The finished Annual Report web page will link to this volume, and will also include the program of the Annual Review Meeting and links to the slide sets presented there, as in prior years. I am pleased to acknowledge our debt to Sergey Fomel and other contributors to the Madagascar project, whose reproducible research framework makes our approach to distribution of reports possible. WWS, April 2016


The Rice Inversion Project, TRIP16, 2016

Approximate Gauss-Newton iteration for Full Waveform Inversion

Jie Hou, William W. Symes, The Rice Inversion Project, Rice University

ABSTRACT

Full waveform inversion (FWI) reconstructs the subsurface model from observed seismic data by minimizing the energy of the difference between predicted data and observed data. It is often formulated as an optimization problem and solved by computationally intensive iterative methods. The steepest ("gradient") descent method, with or without line search, uses only first-derivative information and is severely slowed by the ill-conditioned behavior of FWI. Newton's method and its relatives take the objective curvature into account, hence converge more quickly, at the cost of more expensive iterates. We describe a preconditioned gradient descent method that approximates the iterates, hence the convergence rate, of the Gauss-Newton method while having essentially the same cost per iterate as the steepest descent method. The preconditioned gradient is based on an approximate inverse to the Born modeling operator. Numerical examples demonstrate that preconditioned gradient descent converges dramatically faster than both conventional gradient descent and the L-BFGS quasi-Newton method.

INTRODUCTION

Full waveform inversion (FWI) (Lailly, 1983; Tarantola, 1984; Virieux and Operto, 2009) aims to recover detailed models of the subsurface through a model-based data-fitting procedure, via minimization of the difference between predicted and observed data in the least squares sense. FWI has the potential to recover high-resolution models, given an initial model that predicts traveltimes to within a half-wavelength at frequencies with sufficiently high S/N. It extends in principle to any modeled physics and data geometry, although many questions remain about recovering multiple parameters, robustness, and computational efficiency.

Mathematically, FWI poses a large-scale nonlinear minimization problem. Due to problem size, only local gradient-based methods are feasible (Gauthier et al., 1986; Pratt, 1999; Crase et al., 1990). An obvious candidate, still used in many FWI exercises, is gradient descent, more properly known as the steepest descent algorithm (Nocedal and Wright, 1999). However, steepest descent converges slowly for ill-conditioned problems, for which the objective changes much more rapidly in some directions than in others, and FWI tends to be ill-conditioned. Newton's method, on the other hand, takes second order (curvature) information into account and thus generally converges to a local minimizer in many fewer iterations than required by steepest descent (Pratt et al., 1998; Akcelik et al., 2003; Métivier et al., 2014). The Newton update (and its close relative, the Gauss-Newton update) requires the solution of a linear system of size equal to the number of degrees of freedom in the model. Such systems must themselves be solved iteratively, due to their size, so that the (Gauss-)Newton steps may be quite expensive. Alternatively, quasi-Newton methods build up an approximation to the inverse Hessian using the gradient at the current step and previous steps. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method (or its limited memory version, L-BFGS) is arguably the most widely used quasi-Newton method (Nocedal and Wright, 1999).

The main contribution of this paper is to show how to use an inexpensive modification of RTM to approximate the inverse Gauss-Newton Hessian, or pseudoinverse of the Born modeling operator (Hou and Symes, 2016), thus achieving something close to the Gauss-Newton convergence rate with updates of roughly the same expense as steepest descent.

Approximations to the inverse Hessian have a long history in computational work on FWI. Many authors have used a diagonal or small-bandwidth approximation to the inverse (Chavent and Plessix, 1999; Shin et al., 2001; Operto et al., 2004), or a version of the Generalized Radon Transform (Kirchhoff, ray-Born, ...) inversion (Jin et al., 1992; Qin et al., 2015; Métivier et al., 2015), to accelerate FWI iterations. Our work is close to the latter in spirit; however, we use an approximate inverse or true amplitude migration operator that does not require any ray-tracing computations. Similar true amplitude migration operators have been discussed by others (Xu et al., 2011; ten Kroode, 2012), in the context of reconstructing angle-based reflectivity. Hou and Symes (2015b) provide an equivalent construction in the subsurface offset domain, and assess its accuracy as an approximate inverse. Subsequently we showed that this approximate inverse is very effective for accelerating linearized inversion ("least squares migration"), both in the subsurface offset domain (Hou and Symes, 2015a) and for physical domain least squares (Hou and Symes, 2016). Here we illustrate

a similar effect for nonlinear full waveform inversion.

This paper is organized as follows: we first review the theory of FWI and common numerical algorithms used for optimization; we then show how to precondition the gradient and modify the approximate inverse operator; we end with numerical tests on the 2D Marmousi model, demonstrating that FWI with the preconditioned gradient exhibits faster convergence than both the steepest (gradient) descent method and the L-BFGS method, all equipped with the same backtracking line search.

THEORY

The 2D wave equation in an acoustic constant density medium can be expressed as
$$\left(\frac{1}{v^2(x)}\frac{\partial^2}{\partial t^2} - \nabla^2\right) u(x,t) = f(t,x,x_s); \qquad u(x,t) \equiv 0,\ t \ll 0. \qquad (1)$$
Here $x$ denotes the location in the modeling region, $v(x)$ is the acoustic velocity, $u$ is the acoustic potential and $f$ is the source term, also parametrized by source position $x_s$ and time $t$. The forward modeling operator $\mathcal{F}$ maps the model $m = v$ to seismic data $d$:
$$\mathcal{F}[m] = d. \qquad (2)$$
FWI is often formulated as an optimization problem by minimizing the following objective function:
$$J[m] = \|\mathcal{F}[m] - d_o\|^2 \ [+\ \text{regularizing terms}]. \qquad (3)$$
Here $m$ is the velocity model to recover, $d_o$ is the observed seismic data and $\|\cdot\|$ stands for the $L^2$ norm.

Iterative Optimization

Iterative methods for minimization of $J[m]$ increment the model along a search direction $p$:
$$m_{i+1} = m_i + \alpha_i p_i, \qquad (4)$$

where $i$ is the iteration number and $\alpha_i$ is a step length, chosen by an approximate optimization along the line through $m_i$ in direction $p_i$, to reduce the objective function. Steepest (or gradient) descent uses the negative gradient of $J$ as search direction:
$$p_i = -g_i = -F^T(\mathcal{F}[m_i] - d_o). \qquad (5)$$
Here $g_i$ represents the gradient, and $F = D\mathcal{F}$ is the derivative of $\mathcal{F}$ with respect to the model, also known as the Born modeling operator. The superscript $T$ denotes the adjoint operator, defined with respect to the Euclidean inner product via the "dot product test":
$$\langle F^T \delta d, \delta m \rangle = \langle \delta d, F \delta m \rangle, \qquad (6)$$
asserted to hold for any data perturbation $\delta d$ and model perturbation $\delta m$. $F^T$ is a version of the RTM operator, for which relatively efficient computational methods are available (Plessix, 2006).

The update direction for Newton's method is the solution of a linear system:
$$F^T F p_i + D^2\mathcal{F}^T(p_i, \mathcal{F}[m_i] - d_o) = -g_i. \qquad (7)$$
Compared to steepest descent, Newton's method converges extremely fast, at least near the minimum of the objective function. This is because the solution of the Newton system 7 compensates for the ill-conditioning mentioned earlier and to some extent for the nonlinearity of the objective function. The second term on the left-hand side of equation 7 vanishes when the model precisely predicts the data, and may be viewed as a nuisance, as it involves the second derivative of the modeling operator (however see Métivier et al. (2014), who show the importance of this term when nonlinear (multiple) scattering is strong). The Gauss-Newton method neglects the second term, and computes $p_i$ as the solution of
$$F^T F p_i = -g_i. \qquad (8)$$
Whether Newton (equation 7) or Gauss-Newton (equation 8) updates are chosen, a very large linear system must be solved, necessarily by an iterative method such as conjugate gradients (Nocedal and Wright, 1999; Akcelik et al., 2003). As a result, each iteration typically requires a considerable number of inner iterations to solve for $p_i$, each inner iteration involving a Born modeling - RTM cycle.

Quasi-Newton methods build up an approximation to the solution $p_i$ of either equation 7 or 8 using a generalization of the secant method, combining current

gradient and previous search directions (Nocedal and Wright, 1999) to produce a low rank approximation to the operator on the left hand sides of these equations, allowing easy solution. An example is the BFGS algorithm. The limited memory version of the BFGS method, L-BFGS, avoids excessive memory use by storing only a limited number of previous search directions on a first-in, first-out basis (Nocedal and Wright, 1999).

Approximating Gauss-Newton

In this section, we will show how to approximately solve the Gauss-Newton system 8 by applying an inexpensive modification to the RTM operator. The update direction (or preconditioned gradient) for this method is
$$p_i = -F^\dagger(\mathcal{F}[m_i] - d), \qquad (9)$$
where the modified RTM operator $F^\dagger$ approximates the pseudoinverse $(F^T F)^{-1} F^T$ of the Born modeling operator.

The approximate pseudoinverse used in our work originates in the construction of a computable approximate inverse to the subsurface offset extended Born modeling operator $\bar{F}$, which acts on model perturbations depending on a subsurface offset ($h$-) axis (ten Kroode, 2012; Hou and Symes, 2014, 2015b):
$$\bar{F}^\dagger = W_{\rm model}\, \bar{F}^T\, W_{\rm data}, \qquad (10)$$
where $W_{\rm model} = 4v^5 LP$ and $W_{\rm data} = I_t^4\, D_{z_s} D_{z_r}$. $L$ is similar to the Laplacian: in the wavenumber domain, it is multiplication by $k_{xz}^2 - k_{hz}^2$. $I_t$ is time integration. $\bar{F}^T$ is the Euclidean adjoint of $\bar{F}$ (extended RTM) and $D_{z_s}, D_{z_r}$ are the source and receiver depth derivatives. $P$ is an oscillatory integral operator, approximately equal to the identity for nearly focused images. We will deal here only with the focused case (kinematically data-consistent velocity model), so we will neglect $P$. Since the extended image volume is then focused at $h = 0$, we can neglect the energy at nonzero $h$, so replace $\bar{F}^T$ with the ordinary (non-extended) $F^T$ (RTM) in equation 10 to obtain an approximate pseudoinverse for $F$ (Hou and Symes, 2016):
$$F^\dagger = W_{\rm model}\, F^T\, W_{\rm data}. \qquad (11)$$
In this case, only $W_{\rm model}$ depends on the subsurface offset $h$, and can be simply applied by padding with zeroes along the subsurface offset axis. Alternatively, we

can use operators defined only on non-extended ($h = 0$) models and data perturbations:
$$W_{\rm model} = 8v^4 k_z, \qquad W_{\rm data} = \omega\, I_t^4\, D_{z_s} D_{z_r}.$$
This reformulation arises from the ray geometry of reflection in the focused case, as shown in Figure 1: at $h = 0$,
$$k_{xz} = \frac{\omega}{v}\cos\theta, \qquad k_{hz} = k_z \cos\theta. \qquad (12)$$

Figure 1: Sketch of the subsurface geometry for $h = 0$. $x_s, x_r$ represent the locations of the source and receiver; $k_s, k_r$ are the slowness vectors of the source and receiver rays; $k_h$ is symmetric to $k_s$ with respect to the vertical axis; $\theta$ is half of the angle between $k_s$ and $k_r$; $\alpha$ is the incidence angle between the source ray and the vertical direction; $\beta$ is the incidence angle between the receiver ray and the vertical direction; $\psi$ is half of the angle between $k_h$ and $k_r$.

Notice that the weight operators $W_{\rm model}$, $W_{\rm data}$ in either case are filters, and together are of order zero in frequency. Therefore, they will not change the frequency balance of the RTM image volume, but only alter amplitude and phase. Notice also that the approximate inverse operator has almost the same computational cost as the conventional RTM operator.

FWI with Approximate Gauss-Newton Iteration

FWI can be implemented in both the time domain and the frequency domain. Here we will test in the time domain only. The workflow of FWI with the approximate Gauss-Newton, or preconditioned gradient descent, method is (sketched in code below):

1. compute the data residual and its norm (equation 3);
2. apply the approximate inverse operator to the data residual (equation 9);
3. determine a suitable step length $\alpha_i$ using a backtracking line-search method;
4. update the velocity model (equation 4).
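For concreteness, the following minimal Python sketch shows how steps 1-4 fit together. The callables `forward` (nonlinear modeling) and `approx_inverse` (application of the approximate pseudoinverse to a residual), together with the line-search constants, are placeholders assumed for illustration; they are not part of the TRIP software, and the sufficient-decrease test is a simplified stand-in for whatever line search is actually used.

```python
import numpy as np

def approx_gauss_newton_fwi(m0, d_obs, forward, approx_inverse,
                            n_iter=20, alpha0=1.0, c=1e-4, shrink=0.5):
    """Preconditioned gradient descent (approximate Gauss-Newton) loop.
    forward(m) returns predicted data; approx_inverse(m, r) applies the
    approximate pseudoinverse to a data residual r. Both are hypothetical."""
    m = m0.copy()
    for _ in range(n_iter):
        r = forward(m) - d_obs                      # step 1: data residual
        J = 0.5 * np.real(np.vdot(r, r))            # least-squares misfit (eq. 3)
        p = -approx_inverse(m, r)                   # step 2: preconditioned gradient (eq. 9)
        alpha = alpha0                              # step 3: backtracking line search
        while alpha > 1e-10:
            r_try = forward(m + alpha * p) - d_obs
            if 0.5 * np.real(np.vdot(r_try, r_try)) < (1.0 - c * alpha) * J:
                break                               # simple sufficient-decrease test
            alpha *= shrink
        m = m + alpha * p                           # step 4: model update (eq. 4)
    return m
```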

NUMERICAL EXAMPLES

We apply the algorithm described above to an example based on the Marmousi model (Versteeg and Grau, 1991). The model is resampled to a m grid. A 5 m water layer (not shown in the figures) is added on the top. A fixed spread of 3 sources and 46 receivers is placed at m below the surface. A 2-8 finite difference scheme is used to simulate 4 s of seismic data. The source, regarded as known in these experiments and not updated, is a ( )-Hz bandpass wavelet with ms time sampling. The initial model (Figure 4) used in FWI is a smoothed version of the true model (Figure 5). Both the gradient and the preconditioned gradient at the first iteration are shown in Figure 2. The amplitude distribution of the preconditioned gradient is clearly more balanced: both the shallow part and the deep part are well recovered, while most of the energy in the conventional gradient focuses in the shallow part.

Figure 2: (a) Gradient; (b) Preconditioned gradient for the first iteration.

We then optimize the FWI objective function with both the gradient descent method and the L-BFGS method. The gradient descent method is applied with both the conventional gradient and the preconditioned gradient; as explained above, in the second case the iteration approximates the Gauss-Newton method. The convergence history is shown in Figure 3. We can see that the approximate Gauss-Newton method reduces the objective function to % of its initial value in iterations, whereas steepest descent and L-BFGS manage to reduce the residual only to 3 % and 5 %, respectively, in 4 iterations. Figure 6 displays the result of steepest descent iterations; Figure 8, the

result of 4 L-BFGS iterations; Figure 7, the result of iterations of the approximate Gauss-Newton method; and Figure 9, the result of 4 iterations of the approximate Gauss-Newton method.

Figure 3: Blue line: residual for the steepest descent method, relative to initial residual. Red line: relative residual for the approximate Gauss-Newton method. Cost per step is approximately the same. Green line: residual for the L-BFGS method.

We set the L-BFGS iteration to retain 5 prior search directions. We can see that iterations of the approximate Gauss-Newton method give a result comparable to that of iterations of steepest descent or 4 L-BFGS iterations. All three algorithms can recover the shallow part of the model accurately, but only approximate Gauss-Newton recovered the deep part at all well.

Figure 4: Smoothed version of true model as initial guess
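A smoothed starting model such as the one in Figure 4 can be produced, for example, by isotropic Gaussian smoothing of the true velocity. The smoothing length and grid in this sketch are arbitrary illustrative choices, not the values used to generate the figures.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_initial_model(v_true, dx, smoothing_length=0.5):
    """Return a smoothed copy of a 2D velocity model (grid spacing dx in km)
    for use as an FWI starting model. smoothing_length (km) is an assumed value."""
    sigma = smoothing_length / dx            # convert smoothing length to grid points
    return gaussian_filter(v_true, sigma=sigma)

# Example with a random stand-in for a Marmousi-sized grid (velocities in km/s):
v_true = 1.5 + 2.5 * np.random.rand(200, 600)
v_init = smooth_initial_model(v_true, dx=0.01)   # 10 m grid spacing assumed
```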

15 Preconditioned FWI Figure 5: Marmousi model Figure 6: Recovered model after iteration steepest descent method

16 Hou and Symes Figure 7: Recovered model after iteration approximate Gauss-Newton method Figure 8: Recovered model after 4 iteration L-BFGS method

17 Preconditioned FWI 3 Figure 9: Recovered model after 4 iteration approximate Gauss-Newton method DISCUSSION AND CONCLUSIONS We have described a preconditioned gradient descent algorithm that approximates the Gauss-Newton algorithm and its rate of convergence, by replacing the adjoint (RTM) operator in the definition of FWI gradient with an approximate inverse operator. In an initial test using the Marmousi velocity model, the approximate Gauss-Newton algorithm significantly outperformed both steepest (gradient) descent and L-BFGS iterations. Most importantly, the proposed algorithm has computational cost per iteration roughly equal to that of the steepest descent method, that is, computation of the FWI gradient. Several fundamental puzzles deserved mention. The approximate pseudoinverse to the Born modeling operator, at the heart of the approximate Gauss- Newton method, is based on high-frequency asymptotic analysis of Born scattering, even though the actual computations do not involve any ray tracing or other asymptotic calculations. The asymptotic analysis assumes separation of scales, a model/data relationship not required for the formulation of FWI, and not valid in principle for the model in our example (Marmousi). We do not at present understand the apparent effectiveness of the algorithm in this setting. Multiparameter models (variable density acoustics, elasticity) may admit a similar approach. We do not begin to understand how to formulate high frequency asymptotic approximation in the presence of attenuation. Finally, we note that the algorithm described here does not solve the cycleskipping problem. The standing assumption in this work is that the initial ve-

18 4 Hou and Symes locity model predicts traveltimes within at most a half-wavelength error at significant S/N. Given that assumption, the approximate Gauss-Newton algorithm appears to offer significant efficiency advantages over other common approaches to FWI. ACKNOWLEDGMENTS We are grateful to the sponsors of The Rice Inversion Project (TRIP) for their longterm support, and to Shell International Exploration and Production Inc for its support of Jie Hou s PhD research. We acknowledge the Texas Advanced Computing Center (TACC) and the Rice University Research Computing Support Group (RCSG) for providing essential HPC resources. REFERENCES Akcelik, V., J. Bielak, G. Biros, I. Epanomeritakis, A. Fernandez, O. Ghattas, E. Kim, J. Lopez, D. O Hallaron, T. Tu, and J. Urbanic, 3, High resolution forward and inverse earthquake modeling on terascale computers: Presented at the Supercomputing 3, Proceedings, Association for Computing Machinery. Chavent, G., and R. Plessix, 999, An optimal true-amplitude least-squares prestack depth-migration operator: Geophysics, 64, Crase, E., A. Pica, M. Noble, J. McDonald, and A. Tarantola, 99, Robust elastic nonlinear waveform inversion: Application to real data: Geophysics, 55, Gauthier, O., A. Tarantola, and J. Virieux, 986, Two-dimensional nonlinear inversion of seismic waveforms: Geophysics, 5, Hou, J., and W. Symes, 4, An approximate inverse to the extended Born modeling operator: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, , 5a, Accelerating extended least squares migration with weighted conjugate gradient iteration: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, , 5b, An approximate inverse to the extended Born modeling operator: Geophysics, 8, no. 6, R33 R349., 6, Accelerating least squares migration with weighted conjugate gradient iteration: Presented at the 78th Annual International Conference and Ex-

19 Preconditioned FWI 5 hibition, Expanded Abstract, European Association for Geoscientists and Engineers. Jin, S., R. Madariaga, J. Virieux, and G. Lambaré, 99, Two-dimensional asymptotic iterative elastic inversion: Geophysical Journal International, 8, Lailly, P., 983, The seismic inverse problem as a sequence of before-stack migrations, in Conference on Inverse Scattering: Theory and Applications: Society for Industrial and Applied Mathematics, 6. Métivier, L., F. Bretaudeau, R. Brossier, S. Operto, and J. Virieux, 4, Full waveform inversion and the truncated newton method: quantitative imaging of complex subsurface structures: Geophysical Prospecting, 6, Métivier, L., R. Brossier, and J. Virieux, 5, Combining asymptotic linearized inversion and full waveform inversion: Geophysical Journal International,, Nocedal, J., and S. Wright, 999, Numerical Optimization: Springer Verlag. Operto, S., C. Ravaut, L. Improta, J. Virieux, A. Herrero, and P. Dell Aversana, 4, Quantitative imaging of complex structures from dense wideaperture seismic data by multiscale traveltime and waveform inversions: a case study: Geophysical Prospecting, 5, no. 6, Plessix, R.-E., 6, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 67, Pratt, R., 999, Seismic waveform inversion in the frequency domain, part : Theory, and verification in a physical scale model: Geophysics, 64, Pratt, R. G., C. Shin, and G. J. Hick, 998, Gaussnewton and full newton methods in frequencyspace seismic waveform inversion: Geophysical Journal International, 33, no., Qin, B., T. Allemand, and G. Lambaré, 5, Full waveform inversion using preserved amplitude reverse time migration: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Shin, C., K. Yoon, K. J. Marfurt, K. Park, D. Yang, H. Y. Lim, S. Chung, and S. Shin,, Efficient calculation of a partial-derivative wavefield using reciprocity for seismic imaging and inversion: Geophysics, 66, no. 6, Tarantola, A., 984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, ten Kroode, F.,, A wave-equation-based Kirchhoff operator: Inverse Problems, 53: 8. Versteeg, R., and G. Grau, 99, Practical aspects of inversion: The Marmousi

20 6 Hou and Symes experience: Proceedings of the EAEG, The Hague. Virieux, J., and S. Operto, 9, An overview of full waveform inversion in exploration geophysics: Geophysics, 74, WCC7 WCC5. Xu, S., Y. Zhang, and B. Tang,, 3D angle gathers from reverse time migration: Geophysics, 76, no., S77 S9.

The Rice Inversion Project, TRIP16, 2016

Inversion velocity analysis via approximate Born inversion

Jie Hou, William W. Symes, The Rice Inversion Project, Rice University

ABSTRACT

Optimization-based Migration Velocity Analysis (MVA) updates long wavelength velocity information by minimizing an objective function that measures the violation of a semblance condition, applied to an image volume. Differential Semblance Optimization (DSO) forms an objective function that is smooth both in velocity and in data, regardless of the data frequency content. Depending on how the image volume is formed, however, the objective function may not be minimized at a kinematically correct velocity, a phenomenon characterized in the literature (somewhat inaccurately) as gradient artifacts. In this paper, we will show that the root of this pathology is imperfect image volume formation resulting from various forms of migration, and that the use of linearized inversion (least squares migration) more or less eliminates it. We demonstrate that an approximate inverse operator, little more expensive than RTM, leads to recovery of a kinematically correct velocity.

INTRODUCTION

Full waveform inversion (FWI) (Lailly, 1983; Tarantola, 1984; Virieux and Operto, 2009) is capable of recovering detailed models of the subsurface structure through a waveform-based data-fitting procedure. However, it may stagnate at physically meaningless solutions in the absence of a kinematically accurate starting model. Within the limitation to single scattering, migration velocity analysis (MVA) (Yilmaz and Chambers, 1984; Yilmaz, 2001) complements FWI by extracting long scale velocity. Image domain MVA involves construction of an extended image volume, which depends not only on the image point but also on an extra parameter (e.g., surface offset, subsurface offset, incidence angle). When velocity and data are kinematically compatible, this volume should have a particularly simple structure (flat, focused, ...). Deviations from this semblance principle can be

22 8 Hou and Symes used to drive velocity updates, either by direct measurement (eg. residual moveout picking, (Stork, 99; Lafond and Levander, 993; Liu and Bleistein, 995; Biondi and Sava, 4)) or via optimization of an objective function. This paper focuses on a particular choice of optimization approach, namely differential semblance optimization (DSO) in the subsurface offset domain (Symes, 8). Implementations of this MVA approach have been based on double square root migration (Shen et al., 3), one-way shot record migration (Shen and Symes, 8), RTM (Shen, ; Weibull and Arntsen, 3), and various sorts of inversion (Biondi and Almomin, ; Liu et al., 4; Lameloise et al., 5). Both numerical and theoretical (Symes, 4; ten Kroode, 4) evidence suggest that this approach should be effective in recovering velocity macromodels under failure conditions for FWI. However other studies have suggested that the method may produce poor velocity update directions, in particular that the objective gradient may be contaminated with artifacts that prevent rapid convergence to a correct velocity (Fei and Williamson, ; Vyas and Tang, ). We show here that use of (linearized) inversion to create image volumes largely eliminates the gradient artifact pathology, and describe a computationally efficient method to achieve this goal. In fact, the artifacts actually are features of the objective function definition, not of the gradients. This is not a new observation: Khoury et al. (6) showed that subsurface offset DSO, using common azimuth migration to construct the image volume, could produce erroneous velocities returning lower objective function values than the true velocity. Liu et al. (4); Lameloise et al. (5) confirm this observation and show that use of inverted (rather than migrated) image volumes tends to improve DSO velocity updates, essentially because the inverted image volume is much better focused at the target velocity. Our innovation is to show that good velocity updates may be achieved with image volumes obtained by an approximation to linearized inversion, costing little more than RTM and involving no ray-theory computations (ten Kroode, ; Hou and Symes, 5). In this paper, we will compare three different imaging operators : conventional RTM operator, the adjoint of the extended Born modeling operator and an approximate inverse to the extended Born modeling operator. The conventional RTM operator only involves cross-correlating the forward source wavefield and backward receiver wavefield. The adjoint operator differs the RTM operator by time derivatives and velocity scaling. Hou and Symes (5) modify the adjoint operator into an approximate inverse operator by applying model and data-domain weight operators. To distinguish it from the other possibilities, we call MVA with

the approximate inverse operator Inversion Velocity Analysis (IVA). In the following sections, we will first review the theory of MVA via DSO in the subsurface offset domain; we then compare three imaging operators and their possible influence on the DSO objective function; we end with numerical tests on the 2D Marmousi model, demonstrating that better imaging leads to better velocities.

THEORY

The 2D constant density acoustic Born modeling operator $F[v_0]$ can be expressed as
$$(F[v_0]\delta v)(x_s,x_r,t) = \frac{\partial^2}{\partial t^2}\int dx\, d\tau\; G(x_s,x,\tau)\,\frac{2\,\delta v(x)}{v_0(x)^3}\,G(x,x_r,t-\tau). \qquad (1)$$
Here $\delta v$ is the model perturbation or reflectivity, $v_0$ is the background velocity model and $G$ is the Green's function. The Born inverse problem is to fit data $d(x_s,x_r,t)$ by proper choice of $v_0(x)$ and $\delta v(x)$. As explained for example in Symes (2008), data fit is impossible unless $v_0$ is kinematically correct to within a half-wavelength error in traveltime prediction. On the other hand, if $\delta v$ is extended to depend on subsurface offset $h$, that is, replaced by $\delta\bar v(x,h)$ and used as input to an extended Born modeling operator $\bar F$,
$$(\bar F[v_0]\delta\bar v)(x_s,x_r,t) = \frac{\partial^2}{\partial t^2}\int dx\, dh\, d\tau\; G(x_s,x-h,\tau)\,\frac{2\,\delta\bar v(x,h)}{v_0(x)^3}\,G(x+h,x_r,t-\tau), \qquad (2)$$
then any model-consistent data $d$ can be fit with essentially any $v_0$ by proper choice of $\delta\bar v$. The extended modeled data $\bar F[v_0]\delta\bar v$ is equal to the (non-extended) Born modeled data $F[v_0]\delta v$ when $\delta\bar v(x,h) = \delta v(x)\delta(h)$, that is, when $\delta\bar v$ is focused at $h = 0$, which is the semblance condition for subsurface offset extended modeling.
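As a small numerical illustration of the semblance condition, the numpy sketch below embeds a physical reflectivity in the subsurface offset volume as a discrete delta at h = 0, and checks that the differential semblance annihilator used in the DSO objective introduced below (multiplication by h) vanishes on such a focused volume. Grid sizes and helper names are ours, not from the paper.

```python
import numpy as np

def extend_focused(dv, n_h, dh):
    """Embed a physical reflectivity dv(z, x) at h = 0 in a subsurface offset
    volume: a discrete version of dv_bar(x, h) = dv(x) * delta(h)."""
    nz, nx = dv.shape
    dv_bar = np.zeros((n_h, nz, nx))
    dv_bar[n_h // 2] = dv / dh          # delta(h) ~ 1/dh at the h = 0 slice
    return dv_bar

# Toy reflectivity: a single flat reflector
dv = np.zeros((101, 201)); dv[50, :] = 1.0
dv_bar = extend_focused(dv, n_h=21, dh=0.01)

h = (np.arange(21) - 10) * 0.01                   # offset axis, h = 0 in the middle
print(np.sum((h[:, None, None] * dv_bar) ** 2))   # annihilator energy: 0 for a focused volume
```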

Imaging Operators

The conventional RTM with the space shift imaging condition is often formulated as
$$I(x,h) = \int dx_s\, dx_r\, dt\, d\tau\; G(x_s,x-h,\tau)\, G(x+h,x_r,t-\tau)\, d(x_s,x_r,t), \qquad (3)$$
where $d$ is seismic reflection data. It is, however, not exactly the adjoint to the extended Born modeling operator, which is
$$I(x,h) = \frac{2}{v_0(x)^3}\int dx_s\, dx_r\, dt\, d\tau\; G(x_s,x-h,\tau)\, G(x+h,x_r,t-\tau)\, \frac{\partial^2}{\partial t^2} d(x_s,x_r,t). \qquad (4)$$
The difference is minor: the velocity scaling and the second time derivative. Nonetheless, the time derivative is quite important in the sense that the extended normal operator $\bar F^T \bar F$ is of order zero. Without the time derivative, the frequency components will be affected. Hou and Symes (2014, 2015) modify the adjoint operator into an approximate inverse to the extended Born modeling operator, by applying model and data domain weight operators. It has the form
$$\bar F^\dagger = W_{\rm model}\, \bar F^T\, W_{\rm data}, \qquad (5)$$
with
$$W_{\rm model} = 4v^5 LP, \qquad W_{\rm data} = I_t^4\, D_{z_s} D_{z_r}. \qquad (6)$$
Here $L = \nabla^2_{(x,z)} - \nabla^2_{(h,z)}$, $I_t$ is time integration, $\bar F^T$ is the adjoint of the extended Born modeling operator and $D_{z_s}, D_{z_r}$ are the source and receiver depth derivatives. $P$ is a Fourier-like operator, approximately equal to the identity near $h = 0$, and will be neglected.

Differential Semblance Optimization

The DSO objective function is
$$J[v] = \|h\, I(x,h)\|^2, \qquad (7)$$

in which $I(x,h)$ is produced via the application of either RTM (equation 3), adjoint Born modeling (equation 4), or the approximate inverse $\bar F^\dagger$ (equation 5) to the data. Since the image volume depends on $v$, so does $J$.

To examine the effect of the imaging operator choice, we first plot the objective function values along a line segment in velocity model space. The model we use combines a constant background velocity model (2.5 km/s) and a single flat reflector at km depth. All three imaging operators are applied to the tapered Born data for a range of velocities from 2 km/s to 3 km/s. The objective function values for the different velocities are plotted in Figure 1. All three objective functions are smooth in velocity and unimodal. However, only the objective function using the approximate inverse operator reaches its minimum at the correct velocity. Both the RTM and adjoint operator versions shift toward lower velocity (RTM: m/s; adjoint operator: 4 m/s). Moreover, the objective function using the approximate inverse better resolves the minimizer.

Figure 1: Normalized objective function with three different imaging operators. The velocity ranges from 2 km/s to 3 km/s.

NUMERICAL EXAMPLES

The comparison is performed on a truncated Marmousi model (Versteeg and Grau, 1991). The true model, shown in Figure 3a, is a smoothed version of the original model, in order to avoid low frequency noise in the RTM image. The synthetic data is generated with 2-8 finite difference modeling. The acquisition geometry is a fixed spread of 5 sources and 3 receivers at m depth. A ( )-Hz bandpass wavelet with ms time sampling is used to generate 3 s of Born data. Optimization is carried out with the L-BFGS algorithm (Nocedal and Wright, 1999). We

set the L-BFGS iteration to retain prior search directions. We start the optimization from a highly smoothed initial model (Figure 3b).

Figure 2: Normalized convergence curves for three different imaging operators. The solid line is the convergence curve and the dashed line is the objective function value at the correct velocity. The green line represents the RTM operator; the blue line represents the adjoint operator; the red line represents the approximate inverse operator. 5 iterations of L-BFGS are used to generate the results.

Figure 2 displays the normalized convergence curves for the optimization with the different imaging operators. All three convergence curves manage to fall below the objective function value at the correct velocity, indicating that the objective function does not reach its minimum at the correct velocity. Among them, the approximate inverse version gives the best objective function behavior, in the sense that both the objective function value at the correct model and the discrepancy between the converged value and the objective function value at the correct model are the smallest. Figure 3 shows a comparison of the true, initial, and updated models for the three different imaging operators. As can be seen in the comparison, the same optimization procedure with different imaging operators can produce quite different results. All five models are then used as background models to apply the approximate inverse operator to the synthetic data (equation 5). Figure 4 compares the zero-offset images. All three recovered models significantly improve the quality of the images, compared to the clearly distorted initial image. Careful examination reveals that the result using the approximate inverse operator is more correct: the locations of the reflectors and the amplitudes are closer to the image with the true velocity. Finally, six offset gathers are pulled out from the middle of the image volume for comparison in Figure 5, demonstrating that the approximate inverse operator produces a more focused image volume.
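For reference, an optimization of this kind could be driven by an off-the-shelf L-BFGS implementation along the following lines. The objective/gradient callable `dso_objective` and the memory length are placeholders standing in for the migration/inversion plus differential-semblance machinery; this is not the TRIP implementation.

```python
import numpy as np
from scipy.optimize import minimize

def run_dso_lbfgs(v0, dso_objective, maxiter=50, memory=10):
    """Minimize a DSO objective J(v) with L-BFGS.

    dso_objective(v) must return (J, grad_J) for a flattened velocity model v.
    `memory` is the number of retained search directions (an assumed value)."""
    result = minimize(dso_objective, v0.ravel(), jac=True, method="L-BFGS-B",
                      options={"maxiter": maxiter, "maxcor": memory})
    return result.x.reshape(v0.shape), result.fun
```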

27 IVA 3 Figure 3: (a) True model; (b) Initial Model; 5 iteration L-BFGS result for (c) RTM operator; (d) the adjoint operator; (e) the approximate inverse operator.

28 4 Hou and Symes Figure 4: Zero-offset approximate inverse image associated with different velocity models shown in Figure 3.

29 IVA 5 Figure 5: Image gathers in the subsurface offset domain corresponding to the images in Figure 4.

30 6 Hou and Symes CONCLUSIONS We have compared velocity analysis via DSO with three different imaging operators and analyzed their corresponding performance. The numerical examples show that the closer that the imaging operator to inversion, the better the DSO velocity estimate. An approximate inverse as in Hou and Symes (5) adds no additional cost but improves velocity estimation substantially. ACKNOWLEDGMENTS We are grateful to the sponsors of The Rice Inversion Project (TRIP) for their longterm support, and to Shell International Exploration and Production Inc for its support of Jie Hou s PhD research. We acknowledge the Texas Advanced Computing Center (TACC) and the Rice University Research Computing Support Group (RCSG) for providing essential HPC resources. REFERENCES Biondi, B., and A. Almomin,, Tomographic full waveform inversion (TFWI) by combining full waveform inversion with wave-equation migration velocity analysis: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SI9.5. Biondi, B., and P. Sava, 4, Wave-equation migration velocity analysis - I: Theory, and II: Subsalt imaging examples: Geophysics, 5, Fei, W., and P. Williamson,, On the gradient artifacts in migration velocity analysis based on differential semblance optimization: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Hou, J., and W. Symes, 4, An approximate inverse to the extended Born modeling operator: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, , 5, An approximate inverse to the extended Born modeling operator: Geophysics, 8, no. 6, R33 R349. Khoury, A., W. Symes, P. Williamson, and P. Shen, 6, DSR migration velocity analysis by differential semblance optimization: 76th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SPMI3.4.

31 IVA 7 Lafond, C. F., and A. R. Levander, 993, Migration moveout analysis and depth focusing: Geophysics, 58, 9. Lailly, P., 983, The seismic inverse problem as a sequence of before-stack migrations, in Conference on Inverse Scattering: Theory and Applications: Society for Industrial and Applied Mathematics, 6. Lameloise, C.-A., H. Chauris, and M. Noble, 5, Improving the gradient of the image-domain objective function using quantitative migration for a more robust migration velocity analysis: Geophysical Prospecting, 63, Liu, Y., W. Symes, and Z. Li, 4, Inversion velocity analysis via differential semblance optimization: Presented at the 76th Annual Meeting, Extended Abstracts, European Association of Geoscientists and Engineers. Liu, Z., and N. Bleistein, 995, Migration velocity analysis: theory and an interative algorithm: Geophysics, 6, Nocedal, J., and S. Wright, 999, Numerical Optimization: Springer Verlag. Shen, P.,, An RTM based automatic migration velocity analysis in image domain: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SVE.. Shen, P., and W. Symes, 8, Automatic velocity analysis via shot profile migration: Geophysics, 73, VE49 6. Shen, P., W. Symes, and C. C. Stolk, 3, Differential semblance velocity analysis by wave-equation migration: 73rd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Stork, C., 99, Reflection tomography in the postmigrated domain: Geophysics, 57, Symes, W., 8, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56, , 4, Seismic inverse problems: recent developments in theory and practice: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics. Tarantola, A., 984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, ten Kroode, F.,, A wave-equation-based Kirchhoff operator: Inverse Problems, 53: 8., 4, A Lie group associated to seismic velocity estimation: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics. Versteeg, R., and G. Grau, eds., 99, The Marmousi experience: Proceedings of the EAEG workshop on practical aspects of inversion: IFP/Technip.

32 8 Hou and Symes Virieux, J., and S. Operto, 9, An overview of full waveform inversion in exploration geophysics: Geophysics, 74, WCC7 WCC5. Vyas, M., and Y. Tang,, Gradients for wave-equation migration velocity analysis: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Weibull, W., and B. Arntsen, 3, Automatic velocity analysis with reverse time migration: Geophysics, 78, S79 S9. Yilmaz, O.,, Seismic data processing, in Investigations in Geophysics No. : Society of Exploration Geophysicists. Yilmaz, O., and R. Chambers, 984, Migration velocity analysis by wavefield extrapolation: Geophysics, 49,

The Rice Inversion Project, TRIP16, 2016

Matched Source Waveform Inversion: Space-time Extension

Guanghui Huang and William W. Symes, Rice University, and Rami Nammour, TOTAL E&P R&T USA

ABSTRACT

Matched Source Waveform Inversion (MSWI) uses non-physical (extended) synthetic energy sources to maintain waveform data fit throughout nonlinear inversion. A maximal variant of this approach extends the synthetic source to the entire space-time volume. Velocity is updated via a penalty on the spread of the synthetic source away from the physical source location: reducing this penalty by focusing the source at its correct location forces the velocity to achieve kinematic consistency with the data. Because data fit is maintained during this process, the stagnation typical of nonlinear full waveform inversion is avoided, and convergence to an acceptable velocity is obtained without extremely low frequency data or particularly accurate starting models.

INTRODUCTION

Full waveform inversion (FWI) can provide detailed maps of subsurface medium parameters, such as compressional wave velocity and density, by iteratively updating these parameters to optimize the fit between simulated data and recorded seismic data in the least squares sense (Tarantola, 1984a,b). However, both synthetic and field data examples illustrate the tendency of FWI to stagnate at unsatisfactory solutions (often described as local minima) with large data misfit, unless the method is provided with an initial model that predicts arrival times to within a half-period in the lowest usable data frequencies (Bunks et al., 1995; Pratt, 1999; Virieux and Operto, 2009). Initial models and/or low frequency data of suitable accuracy are sometimes acquired, and in that case FWI imaging may yield significant improvements over other methods (Vigh et al., 2011; Plessix et al., 2011), but such information is not always available.

34 3 Huang & Symes The approach to nonlinear waveform inversion reported in this paper belongs to a variant of FWI, in which additional parameters are added to the model so that data fit can be maintained throughout the update process. Such model extensions are non-physical, and must be suppressed during the course of the inversion by means of a penalty term added to the data misfit term of ordinary FWI. Many forms of model extension have been explored. The space-time source extension used in the work reported below adds to the model distributed synthetic energy sources, occupying the entire space-time volume used for wave modeling. These extended sources are adjusted, together with the velocity model, so that data is fit and while source energy is focused at the physical source positions. When both goals are achieved, the FWI problem is solved. Various source extensions have been explored in prior work (Song and Symes, 994; Symes, 994; Plessix, ; Plessix et al., ; Pratt and Symes, ; Gao and Williamson, 4; Warner and Guasch, 4). Most of these works use the source-receiver extension, in which a (possibly) different source wavelet (at the physical source location) is used to model each data trace, and a penalty is imposed for trace dependence of the sources. Huang and Symes (5a) apply this concept to crosswell tomography, and show that with suitable choice of penalty, provided that no caustics (or multiple energy paths) occur between source and receiver, the objective of this extended version of FWI has the same stationary points as does the traveltime tomography objective, and in particular is convex over a much wider domain in model space than is the FWI objective. As will be explained below, inversion based on the space-time source extension appears to have the same tomographic property but without the requirement that the wavefield be free of caustics. We mention that another genre of model extension, involving addition of nonphysical axes to velocity and other mechanical parameters of the subsurface, has also been explored extensively (Shen and Symes, 8; Biondi and Almomin,, 4; Lameloise et al., 5), also (Symes, 8) for a survey of older work. These mechanical parameter extensions may achieve roughly the same tomographic goals as do source extensions of the type discussed here, under various conditions.

THEORY

We model seismic wave propagation using the constant density acoustic wave equation with an isotropic point source. The pressure field $u(x,t;x_s)$ for the source position $x = x_s$ satisfies
$$\left(\frac{1}{v^2}\frac{\partial^2}{\partial t^2} - \nabla^2\right) u(x,t;x_s) = f(t)\,\delta(x - x_s), \qquad (1)$$
$$u|_{t=0} = 0, \qquad \frac{\partial u}{\partial t}\Big|_{t=0} = 0. \qquad (2)$$
The forward operator $S[v]f$ relates the velocity $v(x)$ and the wavelet function $f(t)$ to the scattered field at the receiver $x_r$, i.e.,
$$S[v]f(x_r,t;x_s) = u(x_r,t;x_s). \qquad (3)$$
Conventional full waveform inversion based on this model consists in finding the pair $(v,f)$ such that the residual wavefield is minimized in the least squares sense:
$$J_{\rm FWI}[v,f] = \sum_{x_s,x_r}\int \left|S[v]f(x_r,t;x_s) - d(x_r,t;x_s)\right|^2 dt,$$
where $d(x_r,t;x_s)$ is the recorded data.

Extended Model and Annihilator

The extended seismic wavefield $\bar u$ corresponding to the extended source model $\bar f$ and velocity $v$ satisfies a modification of equation 1:
$$\left(\frac{1}{v^2}\frac{\partial^2}{\partial t^2} - \nabla^2\right)\bar u = \bar f(x,t,x_s), \qquad (4)$$
$$\bar u|_{t=0} = 0, \qquad \frac{\partial \bar u}{\partial t}\Big|_{t=0} = 0. \qquad (5)$$
The extended forward modeling operator is defined by
$$\bar S[v]\bar f(x_r,t;x_s) = \bar u(x_r,t;x_s). \qquad (6)$$
Note that the dimension of the data space is smaller than that of the extended source function space, so it is hardly surprising that we can easily fit the data, that is,

solve $d = \bar S[v]\bar f$ by adjusting $\bar f$ for more or less arbitrary $v$. That is, we now have too many solutions of the inverse problem. To eliminate the nonuniqueness, note that the extended model should reduce to $\bar f(x,t,x_s) = \delta(x - x_s)f(t)$ given the true velocity, which motivates us to introduce an annihilator $A$ to penalize nonphysical source energy away from $x = x_s$. Many such annihilators exist. In this work we use
$$A\bar f(x,t;x_s) = |x - x_s|\,\bar f(x,t;x_s). \qquad (7)$$

Matched Source Waveform Inversion (MSWI)

We combine a data fitting term using the extended forward map 5 with the mean square of the annihilator output to produce the matched source waveform inversion (MSWI) objective function,
$$J_\alpha[v,\bar f] = \sum_{x_r,x_s}\int \left|\bar S[v]\bar f(x_r,t;x_s) - d(x_r,t;x_s)\right|^2 dt + \alpha\sum_{x,x_s}\int \left|A\bar f(x,t;x_s)\right|^2 dt. \qquad (8)$$
Note that the MSWI objective functional is quadratic with respect to the extended source function $\bar f$, though it is still nonlinear in the velocity $v$. It is therefore natural to adopt a nested approach: solve first for $\bar f$ - a linear, hence presumably easier, problem - to create a reduced objective depending only on $v$, then minimize the reduced objective over $v$. Nested optimization of partially linear objectives was introduced under the name Variable Projection Method (VPM) by Golub and Pereyra (1973, 2003) and used to estimate sources and parameter perturbations in waveform inversion (van Leeuwen and Mulder, 2009; Rickett, 2012; Li et al., 2013). In fact, VPM and similar nested algorithms were used in extended waveform inversion from the beginning (Kern and Symes, 1994), and for essential reasons, not merely as a computational convenience (Huang and Symes, 2015b).

Algorithm 1. VPM for MSWI

1. Define the $v$-dependent extended source $\bar f[v]$ by minimizing the objective 8 over $\bar f$: this amounts to solving the normal system
$$\left(\bar S[v]^T \bar S[v] + \alpha A^T A\right)\bar f[v] = \bar S[v]^T d. \qquad (9)$$

2. Minimize over $v$ the reduced objective function,
$$J^{\rm red}_\alpha[v] = \sum_{x_r,x_s}\int \left|\bar S[v]\bar f[v] - d\right|^2 dt + \alpha\sum_{x,x_s}\int \left|A\bar f[v]\right|^2 dt.$$

Figure 1: Amplitude of the backpropagated wavefield $\bar S^T d$ for (a) the true velocity, (b) a % low velocity, and (c) a % high velocity.

To illustrate the mechanics of this algorithm, we use a time-harmonic transmission inversion problem for which the correct velocity is the constant $v = $ km/s. We put the single source at $x_s = ( , .6)$ km, and record data at $z_r = $ km from $x_r = $ km to $x_r = 4$ km. The input source frequency is Hz. We compare the amplitude of the backpropagation $\bar S[v]^T d$ of the recorded data, i.e., the right hand side of (9), for the true velocity and for % low and high velocities, as shown in Figure 1a-c. In fact, this field is proportional to the first iteration of any gradient-descent method for solving 9, so it can serve as a proxy for the solution. Only for the correct velocity is this wavefield focused at the source position, while for other velocities it is not.
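The inner step of Algorithm 1 is a regularized linear least-squares solve. A minimal dense frequency-domain sketch is given below; the modeling matrix `S`, the distance-weight array `w`, and all sizes are illustrative assumptions (in the frequency domain the adjoint is taken as the conjugate transpose), not the actual implementation used for the examples.

```python
import numpy as np

def vpm_inner_solve(S, d, w, alpha):
    """Solve (S^H S + alpha * A^T A) f = S^H d for the extended source f,
    where A is diagonal multiplication by the distance weight w (a discrete
    stand-in for |x - x_s|) and S maps the source grid to receiver data."""
    normal = S.conj().T @ S + alpha * np.diag(w ** 2)   # A^T A = diag(w^2)
    return np.linalg.solve(normal, S.conj().T @ d)      # direct solve, as in the text

def reduced_objective(S, d, w, alpha):
    """Evaluate the reduced objective for the current modeling matrix S = S[v]."""
    f = vpm_inner_solve(S, d, w, alpha)
    r = S @ f - d
    return np.vdot(r, r).real + alpha * np.vdot(w * f, w * f).real
```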

Zhang and Gao (2008), Zhang and Wang (2009), and Jin and Plessix (2013) base algorithms for velocity updating on this principle, measuring energy spread via an operator similar to our $A$. Note that the actual solution $\bar f$ of the least-squares problem 9 is much better focused, just as least-squares migration generally produces a better focused image than RTM, and it is therefore a sounder basis for a velocity update.

NUMERICAL EXAMPLES

We adopt a frequency continuation approach to solve the MSWI problem, optimized by the steepest descent gradient method with backtracking line search. For each frequency, a model update is constructed with a fixed number of steepest descent iterations. The updated model is taken as the initial model for inversion at the next frequency. During each iteration, we use a direct matrix solve (Gaussian elimination) to solve the normal equation (9), which requires forming the matrix of the normal operator, to guarantee the accuracy of the gradient.

Figure 2: Transmission configuration: (a) target velocity; (b) shot gather at $x_s = $ km.

Figure 3: Inverted models after 5 iterations of (a) MSWI and (b) FWI.

Example 1 (Transmission). The model (Figure 2a) contains a low velocity anomaly embedded in a constant velocity background. 97 receivers are placed at depth $z_r = .99$ km, from . km to .98 km, spaced . km apart. 5 shots are placed at depth $z_s = .$ km, from . km to .98 km, with a shot interval of .4 km. The frequencies used in the inversion are $f = 6, , 4, 8$ Hz. The regularization parameter is $\alpha = 6$.
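The frequency continuation strategy described above can be summarized in a few lines of Python. The per-frequency update routine and the frequency schedule below are placeholders standing in for the actual MSWI machinery.

```python
def frequency_continuation(v0, freqs, update_at_frequency, n_inner=5):
    """Outer loop of the inversion: sweep frequencies from low to high,
    warm-starting each band with the model inverted at the previous one.

    update_at_frequency(v, freq, n_inner) is assumed to perform a fixed number
    of steepest-descent MSWI iterations (each with the direct normal-equation
    solve of equation 9) and return the updated velocity model."""
    v = v0.copy()
    for freq in sorted(freqs):      # low frequencies first
        v = update_at_frequency(v, freq, n_inner)
    return v

# Example usage with hypothetical inputs:
# v_inv = frequency_continuation(v_init, freqs=[6, 7, 8, 9, 10],
#                                update_at_frequency=mswi_update)
```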

Figures 2a-b show the true velocity and the recorded data at the center shot, respectively. The models inverted after 5 iterations per frequency by the MSWI and FWI methods are shown in Figures 3a-3b, respectively. Due to the absence of the low velocity anomaly, the initial velocity produces a large arrival time error in the predicted data, which causes FWI to fail, while the main low velocity feature is inverted by MSWI to high precision.

Figure 4: (a) True velocity; (b) ray trace at the first shot position $x_s = .5$ km.

Example 2 (Diving Wave). The model consists of a low velocity Gaussian anomaly embedded in a linearly increasing background velocity. receivers are placed at depth $z_r = .4$ km from $x_r = .$ km to 7.98 km with $dx_r = .$ km. 8 shots are placed at depth $z_s = .$ km from $x_s = .5$ km to 7.95 km with $dx_s = .$ km. Successive inversion frequencies are $f = 6, 7, 8, 9,$ Hz. The regularization parameter is $\alpha = 3$. The true velocity is shown in Figure 4a. We perform ray tracing on this model as depicted in Figure 4b. As we can see, there are three types of waves arriving at the receivers: direct waves, diving waves, and later arrivals due to the low velocity anomaly. We start the inversion with the correct background (linear) velocity. The initial velocity and the inverted velocities after 5 iterations per frequency are displayed in Figures 5a-5c. The low velocity anomaly is well reconstructed (Figure 4a).

Example 3 (Marmousi Model). The Marmousi model (Bourgeois et al., 1991) is modified by adding . km of water on the top. Thus the model is 9. km in horizontal extent and 3. km deep. We generate 4 shots starting at $x_s = .8$ km to 9. km with $dx_s = .8$ km at depth $z_s = .6$ km. The receivers are placed at depth $z_r = .4$ km from $x_r = .8$

40 36 Huang & Symes 5 Depth (km) Distance (km) Depth (km) Distance (km) Depth (km) Distance (km) 4 3 Figure 5: (a) Initial velocity; inverted velocity by (b) FWI method and (c) MSWI using (6-) Hz data km to 9. km with dx r =.4 km. Thus each shot generates 7 traces. 5 frequencies are used in the inversion: 4, 5, 6, 7, and 8 Hz. The regularization parameter α = 3. To avoid the obvious inverse crime, we generate the synthesized data using a finer mesh than is used in the inversion. The target velocity model, D linear increasing initial velocity and the inverted velocity after 5 iterations for each frequency are shown in Figure 6a-6c from top to bottom. The inverted and target velocities are quite close, allowing for the limited resolution available to the inversion due to limited frequency range. DISCUSSION AND CONCLUSION MSWI is one kind of nonlinear waveform inversion that imposes few constraints on the model or data geometry, hence can succeed with many kinds of waveform data, including transmission, reflection and diving/refraction, without any preprocessing. The source focus annihilator (A in equation ) forces the extended source to focus at survey source positions, and leads to a velocity update not de-

41 MSWI: Space-time 37 Depth (km) Distance (km) Depth (km) Distance (km) Depth (km) Distance (km) Figure 6: (a) Marmousi model; (b) D initial velocity; (c) inverted velocity after 5 MSWI iterations at each frequency. pending on prior knowledge of the source, an estimate for which is a by-product of the inversion. The data is fit throughout by appropriate choice of the weight α, allowing the method to avoid the stagnation at poorly-fitting models that appears to afflict FWI. MSWI with space-time extension, as defined here, is closely related to Waveform Reconstruction Inversion (WRI) (van Leeuwen and Herrmann, 3). WRI updates the velocity model via minimization of the minimum mean square error by which a trial wavefield u fails to solve the wave equation, penalized by a multiple of its mean square data misfit. The residual solution error field is exactly the extended source f of the present paper, which corresponds - with the trial wavefield u via the uniqueness theorem for the inhomogeneous wave equation. Besides focusing on the right-hand side rather than the solution, the approach explained here employs a different annihilator than do van Leeuwen and Herrmann (3). The implementation used in the examples presented here was formulated in the D frequency domain. Direct extension to 3D would require solution of the 3D Helmholtz equation, currently a limiting factor. Alternatively, 3D time domain solution is certainly possible in principle, but requires storage of the full space-time source wavefield f. Extension to 3D therefore requires either very large floating point or very large storage requirements, a limitation it shares with

42 38 Huang & Symes several other alternatives to FWI. On the other hand, extension to more general physics, for example various flavors of elastodynamics, is in principle straightforward. ACKNOWLEDGEMENT GH and WWS are grateful to TOTAL E&P USA and to the sponsors of The Rice Inversion Project for their support. We thank TOTAL E&P USA for permission to publish this study. We acknowledge the Rice University Computing Support Group for providing HPC resources that have contributed to the research results reported here. REFERENCES Biondi, B., and A. Almomin,, Tomographic full waveform inversion (TFWI) by combining full waveform inversion with wave-equation migration velocity analysis: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SI9.5., 4, Simultaneous inversion of full data bandwidth by tomographic fullwaveform inversion: Geophysics, 79, WA9 WA4. Bourgeois, A., P. Lailly, and R. Versteeg, 99, The Marmousi model, in The Marmousi Experience: IFP/Technip. Bunks, C., F. Saleck, S. Zaleski, and G. Chavent, 995, Multiscale seismic waveform inversion: Geophysics, 6, Gao, F., and P. Williamson, 4, A new objective function for full waveform inversion: differential semblance in the data domain: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Golub, G., and V. Pereyra, 973, The differentiation of pseudoinverses and nonlinear least squares problems whose variables separate: SIAM Journal on Numerical Analysis,, , 3, Separable nonlinear least squares: the variable projection method and its applications: Inverse Problems, 9, R R6. Huang, G., and W. Symes, 5a, Full waveform inversion via matched source extension: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, 3 35.

43 MSWI: Space-time 39 Huang, Y., and W. W. Symes, 5b, Born waveform inversion via variable projection and shot record model extension: 85rd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Jin, S., and R.-E. Plessix, 3, Stack-based full wavefield velocity tomography: 83rd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Kern, M., and W. Symes, 994, Inversion of reflection seismograms by differential semblance analysis: Algorithm structure and synthetic examples: Geophysical Prospecting, 99, Lameloise, C.-A., H. Chauris, and M. Noble, 5, Improving the gradient of the image-domain objective function using quantitative migration for a more robust migration velocity analysis: Geophysical Prospecting, 63, Li, M., J. Rickett, and A. Abubakar, 3, Application of the variable projection scheme to frequency-domain full-waveform inversion: Geophysics, 78, R49 R57. Plessix, R.-E.,, Automatic cross-well tomography: an application of the differential semblance optimization to two real examples: Geophysical Prospecting, 48, Plessix, R.-E., G. Baeten, J. W. de Maag, M. Klaassen, R. Zhang, and Z. Tao,, Application of acoustic full waveform inversion to a low-frequency large-offset land data set: 8st Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Plessix, R.-E., W. Mulder, and F. ten Kroode,, Automatic cross-well tomography by semblance and differential semblance optimization: theory and gradient computation: Geophysical Prospecting, 48, Pratt, R., 999, Seismic waveform inversion in the frequency domain, part : Theory, and verification in a physical scale model: Geophysics, 64, Pratt, R., and W. Symes,, Semblance and differential semblance optimization for waveform tomography: a frequency domain approach: Journal of Conference Abstracts, 7, Rickett, J.,, The variable projection method for waveform inversion with an unknown source function: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, 5. Shen, P., and W. Symes, 8, Automatic velocity analysis via shot profile migration: Geophysics, 73, VE49 6. Song, H., and W. Symes, 994, Inverison of crosswell tomography via differential semblance optimization: Technical report, The Rice Inversion Project. (Annual Report).


The Rice Inversion Project, TRIP16, 2016

Matched Source Waveform Inversion: Volume Extension

Guanghui Huang and William W. Symes, Rice University; Rami Nammour, TOTAL E&P R&T USA

ABSTRACT

Matched source waveform inversion via volume extension introduces additional, non-physical sources acting at time t = 0. The extra sources permit the data to be fit well, even when the velocity is kinematically inconsistent with the data. Assuming point or near-point sources at known locations, a distance-weighted penalty on source energy is minimized when the data is fit and the sources act only at the known locations, thus solving the FWI problem. Good data fit throughout the inversion process produces a considerably more convex objective function than does the standard data-domain FWI formulation. A good theoretical foundation for this approach exists for pure transmission problems, while numerical examples suggest that it may in fact be a feasible approach to reflection and transmission inversion (separately or together) when the data lack adequate S/N at sufficiently low frequencies for conventional FWI to succeed.

INTRODUCTION

Full waveform inversion (FWI) obtains detailed maps of material parameters interior to the earth by matching simulated to recorded data in the least squares sense (Gauthier et al., 1986; Tarantola, 1984). This highly nonlinear and ill-posed inverse problem is also computationally large, forcing reliance on iterative optimization methods which may stagnate at geologically uninformative solutions (Pratt, 1999; Virieux and Operto, 2009). Many alternative methods have been suggested that may circumvent FWI's requirement of a starting model that predicts traveltimes to within one-half wavelength for frequencies with good S/N. Some recent examples include (Shen and Symes, 2008; Weibull and Arntsen, 2013; Biondi and Almomin, 2014; Warner and Guasch, 2014; van Leeuwen and

Herrmann, 2013). See Symes (2008) for an overview of older literature on this topic. All of the works just cited expand the convergence domain of FWI by adding parameters to the earth model. The approach presented in this paper belongs to this model extension genre as well: it is a matched source waveform inversion method. The physical model consists of coefficients in the wave equation (mechanical parameters) and a source mechanism, represented as a right-hand side in the wave equation. We assume that the physical source is localized, essentially point-like, and adjoin a non-physical auxiliary source field that acts at t = 0 (an exploding reflector model, essentially). The role of the auxiliary source field (the additional parameters in this volume extension of the acoustic model) is to enable fit to data, for all velocity models, hence the name matched source. We penalize the auxiliary source field by its mean square, weighted by distance to the source: minimizing this penalty forces the auxiliary source to zero, and recovers the physical source and thus the solution to the inverse problem, since data fit has been maintained throughout the process.

The matched source approach was introduced by Symes (1994): the additional parameters in that work were receiver-dependent source wavelets, acting at the physical source positions. This source-receiver extension has been used in a variety of contexts (Plessix, 2000; Pratt and Symes, 2002; Luo and Sava, 2011; Warner and Guasch, 2014). For combinations of acquisition geometry and velocity that generate unique ray paths between source and receiver, the objective function is equivalent to the traveltime misfit, which explains the large domain of convergence and essentially tomographic nature of the inversion (Huang and Symes, 2015). However, for more refractive models generating multiple ray paths, the domain of convergence shrinks to resemble that of FWI (Symes, 1994). The advantage of the volume extension presented here is the persistence of the large domain of convergence even for highly refractive models generating multiple ray paths, as will be illustrated below. The volume extension appears to share this advantage with the space-time extension, essentially equivalent to Wavefield Reconstruction Inversion (van Leeuwen and Herrmann, 2013). However the volume extension requires only one time level of the field to be stored, as opposed to the full space-time (or space-frequency) volume needed (in principle) in WRI.

We describe the volume extension and the matched source inversion based on it in the next section, along with a sketch of its theoretical justification for transmission inversion (direct or diving waves). Then we present several examples, of both transmission and reflection inversion.

THEORY

We assume that the modeled data $d(x_r,x_s,t)$ is the restriction to known survey source and receiver locations $x_s, x_r$ of a solution $u$ to the constant density acoustic wave equation,

$$\frac{1}{v^2}\frac{\partial^2 u}{\partial t^2} - \nabla^2 u = \delta(x-x_s)\,f_0(t), \qquad (1)$$

$$u\big|_{t=0} = \frac{\partial u}{\partial t}\Big|_{t=0} = 0, \qquad (2)$$

where $v$ is the unknown seismic velocity and $f_0(t)$ is the input source wavelet. We introduce the forward modeling operator $S[v]f_0$ to relate the velocity $v$ and source function $f_0(t)$ to the data as just described. Full Waveform Inversion (FWI) adjusts the velocity to minimize the difference between the modeled data and recorded data in the least squares sense, that is,

$$J_{\mathrm{FWI}}[v] = \frac{1}{2}\sum_{x_s,x_r}\int \big|S[v]f_0(x_r,x_s,t) - d(x_r,x_s,t)\big|^2\,dt.$$

Gradient-based optimization methods applied to this problem tend to stagnate far from a useful model estimate if the initial model does not predict travel times of important events in the data to within a half-wavelength at frequencies with good S/N.

Extended modeling and volume extension

The volume extension is defined by solving the extended acoustic wave equation,

$$\frac{1}{v^2}\frac{\partial^2 \bar u}{\partial t^2} - \nabla^2 \bar u = f_0(t)\,\delta(x-x_s) + f_1(x,x_s)\,\delta(t), \qquad (3)$$

$$\bar u = 0, \quad t < 0. \qquad (4)$$

Here $f_0(t)$ is the physical point source wavelet, the pair $(f_0(t), f_1(x,x_s))$ is the extended source, $\bar u(x,t;x_s)$ is the acoustic potential field, and $v(x)$ is the velocity field. The extended forward modeling operator is defined by sampling the pressure field,

$$S[v](f_0,f_1)(x_r,t;x_s) = \frac{\partial \bar u}{\partial t}(x_r,t;x_s). \qquad (5)$$

If the data $d$ is fit, that is, $S[v](f_0,f_1) = d$, and $f_1$ vanishes, so that the total source focuses at the source position $x_s$, then $v$ solves the FWI problem. We introduce a penalty or annihilator operator

$$A(f_0,f_1)(x,x_s) = |x-x_s|\, f_1(x,x_s),$$

for which $A(f_0,f_1) = 0$ precisely when $f_1 = 0$. Then the solution of the FWI problem also solves

$$\text{minimize } \big\|A(f_0,f_1)\big\| \quad \text{subject to } S[v](f_0,f_1) = d.$$

As unconstrained optimization problems are easier to attack, we introduce a penalty form of this problem (the variable projection method (Golub and Pereyra, 2003)): we minimize over $v$

$$J_\alpha[v] = \frac{1}{2}\left(\sum_{x_r,x_s}\big\|S[v](f_0,f_1) - d\big\|^2 + \alpha^2 \big\|A(f_0,f_1)\big\|^2\right) \qquad (6)$$

subject to

$$N_\alpha[v](f_0,f_1) \equiv \big(S[v]^T S[v] + \alpha^2 A^T A\big)(f_0,f_1) = S[v]^T d \qquad (7)$$

for a penalty parameter $\alpha > 0$ to be determined.

Transmission case

In this section we summarize several important properties of the matched source waveform inversion problem (6) for velocity slowly varying on the wavelength scale. In the asymptotic limit, this simply means that the velocity is smooth as a function of spatial position, so all arrivals are transmitted, i.e. either direct or diving waves, and no reflections exist in the data. This case applies for example to crosswell geometry. We will make a number of assertions about $S$ and related operators; the mathematical justifications are technical and will be given elsewhere.

We idealize the receiver arrays to continuous subsets of a receiver surface embedded in space (for example, $z = z_r$), and assume that the extended sources $f_1$ are constrained to produce no energy propagation (at least asymptotically) along rays that graze the receiver array, i.e. are tangent to it at some point. Assuming this grazing ray constraint, the normal operator $N_\alpha[v]$ defined in equation (7) is continuous in its argument - that is, a small mean-square change in $(f_0,f_1)$ results in a small mean-square change in $N_\alpha[v](f_0,f_1)$. Also, $N_\alpha[v]$ is regular as a function of $v$ as well.

$N_\alpha[v]$ may be rendered invertible by adding a small multiple of the identity operator (Tikhonov regularization). We presume that this has been done, and treat $N_\alpha[v]$ as invertible. Then it is also possible to show that $S[v]\,N_\alpha[v]^{-1}\,S[v]^T$ is smooth in $v$. Since $J_\alpha[v]$ may be re-written as

$$J_\alpha[v] = \frac{1}{2}\left\langle \big( I - S[v]\, N_\alpha[v]^{-1}\, S[v]^T \big)\, d,\; d \right\rangle,$$

$J_\alpha[v]$ is also smooth in $v$. Computationally, the gradient of $J_\alpha$ can be obtained via the adjoint state method (Plessix, 2006).

Most importantly, $N_\alpha$ is approximately local, in the sense that a high-frequency pulse input, localized near $x$ in space and near $k$ in phase space, will produce a high-frequency pulse localized in the same way. That is, $N_\alpha$ does not move events in the extended source $f_1$. For that reason, it is invertible, up to an error of lower order in spatial frequency: it replaces each localized pulse by a scaled version of itself, to good approximation, with the scaling being positive if the ray generated by the position $x$ and momentum $k$ intersects the acquisition geometry. That is, the part of the extended source that generates waves propagating within the experimental aperture is recovered by $N_\alpha[v]$, apart from a position- and dip-dependent scaling. This within-aperture invertibility differentiates the volume extension from the source-receiver extension described by Huang and Symes (2015), which has a similar invertibility property if the velocity model is sufficiently close to homogeneous that it does not generate multiple ray paths connecting sources and receivers, but loses invertibility for more complex models.

NUMERICAL EXAMPLES

Example 4 (Crosswell transmission tomography). The model consists of a low velocity Gaussian anomaly embedded in a constant background velocity v = km/s. 99 receivers are placed at depth z_r = .98 km from . km to .98 km with dx_r = . km for all of 4 shots at depth z_s = . km from .8 km to .9 km with dx_s = .8 km to imitate a transmission problem. In both volume-based MSWI and FWI we invert the data simultaneously with frequencies from f = 9 Hz to f = Hz.

Figure 1: Transmission configuration: (a) true velocity; (b) initial constant velocity.

Figure 2: Shot gather at x_s = km for (a) true velocity and (b) initial constant velocity.

Figure 3: Inverted velocity after 5 iterations by (a) the volume-based MSWI method and (b) the FWI method.

Figure 1a-1b shows the target velocity and the initial velocity, respectively. The recorded data and initial data are plotted in Figure 2a-2b, which show that there is significant multipathing present in the data and a large traveltime difference between the recorded data and the initial data. The inverted results by MSWI and FWI after 5 LBFGS iterations are shown in Figure 3a-3b, respectively. As expected, FWI fails due to the large traveltime mismatch in the initial model, whereas MSWI recovers a reasonable velocity estimate. We note that the MSWI based on the source-receiver extension (Huang and Symes, 2015) also fails to invert this example, due to the multipathing and consequent loss of normal operator invertibility for that extension.

Example 5 (Reflected wave tomography). The model consists of three layers, with dense sampling of sources and receivers distributed across the surface, Figure 4a. The initial velocity model is v = .5 km/s. MSWI and FWI results are shown in Figure 4b-4c, respectively. The FWI method can only reconstruct the first interface position correctly. The volume-based MSWI method produces a rather decent reconstruction with correct velocity between the layers and the right second interface position.

Figure 4: Reflected wave tomography: (a) true velocity, (b) inverted velocity by our method with 5-Hz data after iterations, and (c) inverted velocity by the FWI method with 5-Hz data after iterations.

Example 6 (Salt body reconstruction). The model (a portion of the Pluto model (Stoughton et al., 2001)) consists of a high velocity salt inclusion in a layered medium. The initial model is a 1D velocity increasing linearly with depth. The frequency band used for inversion is from 4 to Hz. The target velocity and the initial velocity are shown in Figure 5a-5b, respectively. The inverted velocity by FWI after iterations is plotted in Figure 5c. It mischaracterizes the structure badly. The MSWI method after iterations recovers the structure reasonably accurately (Figure 5d).

DISCUSSION AND CONCLUSION

We have described matched source waveform inversion based on a volume extension, which allows close data fitting throughout the inversion process, and achieves a good reconstruction of the velocity in cases where FWI fails due to lack of low frequency data. We are able to offer a theoretical explanation for the behaviour of the algorithm in the transmission case. Numerical examples suggest that volume extension MSWI is effective in both transmission and reflection modes.

The current implementation of the inner linear subproblem (7) uses direct matrix methods, and forms the dense matrix explicitly. This approach is feasible in 2D, but problem (7) will need to be solved iteratively in 3D. Both preconditioning the inner solve, and computing the gradient accurately with inaccurate inner solves, are important topics for future research. Additionally, for reflected wave tomography, the outer nonlinear iteration converges quite slowly, which may be caused by the problem's extreme ill-posedness.

ACKNOWLEDGEMENT

GH and WWS are grateful to TOTAL E&P USA and to the sponsors of The Rice Inversion Project for their support. We thank TOTAL E&P USA for permission to publish this study. We acknowledge the Rice University Computing Support Group for providing HPC resources that have contributed to the research results reported here.

Figure 5: Salt body reconstruction: (a) true velocity; (b) initial 1D velocity; inverted velocity by (c) FWI after iterations, and (d) the volume-based MSWI method after iterations.

REFERENCES

Biondi, B., and A. Almomin, 2014, Simultaneous inversion of full data bandwidth by tomographic full-waveform inversion: Geophysics, 79, WA9 WA4.
Gauthier, O., A. Tarantola, and J. Virieux, 1986, Two-dimensional nonlinear inversion of seismic waveforms: Geophysics, 51.
Golub, G., and V. Pereyra, 2003, Separable nonlinear least squares: the variable projection method and its applications: Inverse Problems, 19, R R6.
Huang, G., and W. Symes, 2015, Full waveform inversion via matched source extension: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists.
Luo, S., and P. Sava, 2011, A deconvolution-based objective function for wave-equation inversion: 81st Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists.
Plessix, R.-E., 2000, Automatic cross-well tomography: an application of the differential semblance optimization to two real examples: Geophysical Prospecting, 48.
——, 2006, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 167.
Pratt, R., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: Geophysics, 64.
Pratt, R., and W. Symes, 2002, Semblance and differential semblance optimization for waveform tomography: a frequency domain approach: Journal of Conference Abstracts, 7.
Shen, P., and W. Symes, 2008, Automatic velocity analysis via shot profile migration: Geophysics, 73, VE49 6.
Stoughton, D., J. Stefani, and S. Michell, 2001, 2-D elastic model for wavefield investigations of subsalt objectives, deep water Gulf of Mexico: 71st Annual International Meeting, SEG, Expanded Abstracts.
Symes, W., 1994, Inverting waveforms in the presence of caustics: Technical report, The Rice Inversion Project (Annual Report).
——, 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49.
van Leeuwen, T., and F. Herrmann, 2013, Mitigating local minima in full-waveform inversion by expanding the search space: Geophysical Journal International, 195.

Virieux, J., and S. Operto, 2009, An overview of full waveform inversion in exploration geophysics: Geophysics, 74, WCC7 WCC5.
Warner, M., and L. Guasch, 2014, Adaptive waveform inversion: Theory: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists.
Weibull, W., and B. Arntsen, 2013, Automatic velocity analysis with reverse time migration: Geophysics, 78, S79 S9.


The Rice Inversion Project, TRIP16, 2016

Mathematical Fidelity and Open Source Libraries for Large Scale Simulation and Optimization

William W. Symes

ABSTRACT

Optimization algorithms used to solve inverse problems in geoscience have abstract mathematical descriptions; many of them (Conjugate Gradient iteration, Newton's method, ...) are so-called matrix-free algorithms, that is, they manipulate their mathematical objects (vectors, functions) without reference to their internal details. Similarly, time-stepping algorithms for dynamical simulation may be described in terms of update rules for dynamical states, without reference to the internal structure of these states or the precise action of the rules. Both of these algorithmic settings provide opportunities for the creation of re-usable open source code bases, applying to many different tasks. Not only do such libraries save programmer effort and reduce the incidence of errors, but they could also potentially make possible comparison of inversion techniques by providing common implementations for common components. This paper lays out some examples of features that computational types should inherit from their mathematical models, and some solutions to the programming problems that arise in implementing such types. [To be presented in Workshop 8, EAGE, Vienna, June 2016]

INTRODUCTION

One of the more celebrated papers in the theory of turbulence begins a discussion of fluid dynamics with the statement: The time evolution of a velocity field is given by the Navier-Stokes equations:

$$\frac{dv}{dt} = X_\mu(v)$$

(Ruelle and Takens, 1971). While this description of viscous incompressible fluid flow might seem a bit terse, in fact it was just the right setting for a step change in the understanding of turbulence.

Analogously, the structure of scientific software for dynamical simulation and optimization can also be stripped down to mathematical basics, and the fundamental components reflected accurately in software, with beneficial effect on scope and flexibility. As the concepts involved are mathematical, rather than computational, computer languages unsurprisingly do not provide them, so new types, defined by their behaviour rather than by their implementation, must be supplied. The sections to follow describe some basic types required for expressing algorithms in simulation and optimization, and point out that their computational realizations should possess just those attributes necessary to define their behaviour and relations, and no more. Differing levels of abstraction are appropriate for different parts of a complex application: optimization algorithms deal in vectors and functions, for instance, whereas differential equation solvers are built out of grids and update rules. Open source libraries of optimization algorithms formulated in terms of basic vector calculus types could provide a currently lacking basis for comparison of inversion algorithms in geoscience.

WHAT'S A VECTOR?

A vector is not an array. Consider two length-three arrays, [.5, .5, .5] and [., ., .]. It is tempting to say that these can be added, but they cannot - did I mention that the units of the first three values are km/s, of the last three, g/cm^3? The linear algebra concept of vector space resolves this issue - membership of two vectors in the same vector space asserts that they can sensibly be added. Specification of a vector space of physical data involves specification of units as well, and legitimate manipulation of those units. The operations that vectors must support may be found in the first chapter of a good book on linear algebra, and these operations are attributes of the vector space, not of the vectors that belong to it. Thus a vector is in a vector space when the former refers to the latter for its operations, and for compatibility checks. The reward for formulating algorithms in terms of abstract types such as vector and vector space is that such implementations inherit a guarantee to function in the same way in every specific instance.

Many enormously useful numerical libraries, such as Matlab (Mathworks, 2015) and PETSc (Balay et al., 2015), confound the concept of array with that of vector. One by-product of this confusion is inevitably the demand that dimension be specified - however dimension is not a mandatory attribute of a vector space, even of a computationally realizable one (consider the vector space of polynomials in a real variable, for instance).

Vectors are ultimately implemented with data containers holding floating point samples, of course - usually arrays, but optionally lists, trees, or other suitable data structures. Thus a vector has a data container of some sort, but is not identical to the container that it owns.
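To make this concrete, here is a minimal, hypothetical C++ sketch (illustrative names only, not the RVL interface): the space owns the operations and compatibility checks, and the vector owns only its data container and a reference to its space.

#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical illustration: a space owns the operations, a vector owns only
// its data container and a reference to the space it belongs to.
struct DataContainer { std::vector<double> samples; };

class VectorSpace {
public:
  virtual ~VectorSpace() = default;
  // compatibility check: two vectors may be combined only if their spaces agree
  virtual bool isCompatible(const VectorSpace& other) const = 0;
  virtual std::shared_ptr<DataContainer> build() const = 0;
  // linear-algebra operations are attributes of the space, not of the vectors
  virtual void axpy(double a, const DataContainer& x, DataContainer& y) const = 0;
  virtual double inner(const DataContainer& x, const DataContainer& y) const = 0;
};

class Vector {
public:
  explicit Vector(const VectorSpace& sp) : sp_(sp), d_(sp.build()) {}
  const VectorSpace& space() const { return sp_; }
  // y <- a*x + y, legal only when both vectors refer to compatible spaces
  void axpy(double a, const Vector& x) {
    if (!sp_.isCompatible(x.space()))
      throw std::runtime_error("Vector::axpy: incompatible spaces");
    sp_.axpy(a, *x.d_, *d_);
  }
  double inner(const Vector& x) const {
    if (!sp_.isCompatible(x.space()))
      throw std::runtime_error("Vector::inner: incompatible spaces");
    return sp_.inner(*x.d_, *d_);
  }
private:
  const VectorSpace& sp_;               // the space defines behaviour
  std::shared_ptr<DataContainer> d_;    // the vector merely owns its data
};

Note that nothing in this sketch requires the space to expose a dimension; any concrete space supplies its own container type and operations.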

EVALUATING FUNCTIONS

A function f is a triple: a domain V and a range W, both (subsets of) vector spaces, and a rule for assigning to each vector x in V a member f(x) of W. Computational realizations of functions may entail many intermediate computations and much storage. Freeing the memory used after each evaluation of the function may be wasteful, since some or all of the intermediate results might be re-used. Not freeing this information when it is no longer needed is also inefficient, in a different way.

Consider, for example, a function on a collection of data traces (with samples, headers, etc. making a natural vector space structure) computed by calculating an attribute (for example stack power) of a migrated image. Evaluation requires that the image be computed, but the image is not the result returned by the function. The migrated image must be updated every time the input data vector is changed, so it is not properly an attribute of the function itself. However, if the value corresponding to the same data is requested a second time, it is very inefficient to migrate the data all over again.

Since the mathematical relation y = f(x) appears to present no opportunity to signal whether intermediate data needs to be re-computed, this appears to be an instance in which the mathematical API is inadequate for computational purposes. Indeed, a number of libraries, such as ROL and NOX in the Trilinos collection (Heroux et al., 2005), create a new computational function interface, in effect y = f(x, isnew), giving the programmer the unwelcome responsibility for asserting the new-ness, or otherwise, of an input vector.

With a few reasonable conventions, this burden is unnecessary. If Vector data is encapsulated, so that it is altered only by basic linear algebra operations and by evaluations of functions, then a versioning system is possible, from which in principle it is possible to determine whether vector data has altered since last access. However, the reader will see with a moment's reflection that this information cannot reasonably be an attribute of a function. Instead, it is a natural attribute of an evaluation type, that

combines a function f and a point x in its domain with the semantics of f(x) for variable x. Intermediate data required during evaluation is often re-usable for the computation of gradients, Hessians, and so on. Therefore the evaluation should own an independent copy of the function (the only abstract handle on the necessary data) and have access to the vector. The copy of the function should be freed and re-initialized whenever the version count indicates that the vector data has changed. Figure 1 displays the structure of evaluation, with terminology borrowed from a library that implements it (RVL, described below). Figure 2 flowcharts the evaluation process. The evaluation construction makes the mathematical interface for function evaluation compatible with computational efficiency.

Figure 1: UML-like diagram of a typical function evaluation class. Shows external reference to function and vector, and internal ownership of a copy of the function.

Figure 2: Flow diagram for the value computation attribute of an evaluation object.
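The following is a minimal, hypothetical C++ sketch of the evaluation construction just described (illustrative names only, not RVL's actual classes): the evaluation pairs a function with a vector and recomputes the cached value only when the vector's version counter shows that the data has changed.

#include <cstddef>
#include <vector>

// Hypothetical sketch of the evaluation construction; names are illustrative.
class VersionedVector {
public:
  explicit VersionedVector(std::vector<double> d) : data_(std::move(d)) {}
  const std::vector<double>& data() const { return data_; }
  // every mutation bumps the version, so evaluations can detect staleness
  void set(std::size_t i, double v) { data_.at(i) = v; ++version_; }
  unsigned long version() const { return version_; }
private:
  std::vector<double> data_;
  unsigned long version_ = 0;
};

class Function {
public:
  virtual ~Function() = default;
  virtual double apply(const std::vector<double>& x) const = 0;
};

class Evaluation {
public:
  Evaluation(const Function& f, const VersionedVector& x) : f_(f), x_(x) {}
  // value of f at the current state of x, recomputed only when x has changed
  double value() {
    if (!have_value_ || last_version_ != x_.version()) {
      cached_ = f_.apply(x_.data());   // intermediate work could be cached here too
      last_version_ = x_.version();
      have_value_ = true;
    }
    return cached_;
  }
private:
  const Function& f_;        // in the text the evaluation owns a copy; simplified here
  const VersionedVector& x_;
  double cached_ = 0.0;
  unsigned long last_version_ = 0;
  bool have_value_ = false;
};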

TIME-DEPENDENT SIMULATION

Any time-domain dynamical law has the form mentioned in the first paragraph. Therefore a natural implementation of any time-stepping method would represent the dynamical state as a vector in the sense described above, and the time step as a function. Spatial gridding then becomes an internal detail of the vector type, and the time step is a vector-valued function (see for example Gockenbach et al. (2002)).

However, two compelling reasons argue against this approach. First, the mathematical interface does not conform to computational requirements. Creation and destruction of vectors involves quite a lot of infrastructure not really necessary for time-stepping implementations, and definitely awkward for parallel implementations. Also, assignment is not mathematically legitimate ("x = f(x)"), yet time-stepping is naturally written that way. More practically, excellent libraries such as PETSc (Balay et al., 2015) and Trilinos (Heroux et al., 2005) support implementation of time stepping but, as mentioned above, are not consistent with the vector type explained here, at least not without a considerable amount of wrapper code. Therefore arrays and similar types are the natural domain of time-stepping code, which is naturally used to define functions in the abstract context, and these are available to abstractly formulated algorithms (Symes et al., 2011).
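As a toy illustration of this division of labor (a hypothetical sketch in the spirit of the discussion, not IWave or RVL code), the state and the update rule live at the array level, and only a thin wrapper exposes the whole simulation as a function that abstractly formulated algorithms can call.

#include <cstddef>
#include <vector>

// Hypothetical sketch: the dynamical state is a plain array, and the time step
// is an in-place update rule -- no vector-space machinery at this level.
struct State {
  std::vector<double> u;   // gridded field samples
  double t;                // current time
};

// one explicit step of du/dt = -k*u (a stand-in for a real update rule)
void timeStep(State& s, double dt, double k) {
  for (double& ui : s.u) ui -= dt * k * ui;   // assignment-style update is natural here
  s.t += dt;
}

// A thin wrapper presents the whole simulation as a function of its input state,
// which is the level at which optimization algorithms want to see it.
std::vector<double> simulate(std::vector<double> u0, double dt, double k, std::size_t nsteps) {
  State s{std::move(u0), 0.0};
  for (std::size_t n = 0; n < nsteps; ++n) timeStep(s, dt, k);
  return s.u;
}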

CONCLUSIONS

Several widely-distributed open source libraries incorporate at least some of the observations made here. In particular, the Rapid Optimization Library (ROL) (Kouri et al., 2015), a Trilinos package, uses strong encapsulation of vector data and comes close to a mathematically consistent function interface. It does not however employ a vector space type, nor does it take advantage of the evaluation construction. In a different direction, Madagascar (Holden, 2015) is tied to a specific data representation, but one that is hidden behind a file access interface, and therefore has intrinsically abstract features. So far as the author knows, only the Rice Vector Library (RVL) (Padula et al., 2009) consistently implements the mathematics-as-API concept.

One (of many) obstacles to progress in understanding the effectiveness of Full Waveform Inversion and similar algorithms is the near-impossibility of benchmark comparison: details of data representation, simulator structure, and optimization algorithm pervade FWI algorithms and make head-to-head comparison difficult to carry out and even harder to understand. A common approach to organizing the optimization level of FWI applications, based on the intrinsic mathematical structure of these algorithms, would remove one obstacle to such comparative research. Use of open-source libraries constructed on this basis is essential to create such a level playing field. Disputes over programming language and other issues have impeded adoption of a common optimization framework, but adherence to a common system of types and practices for their use in algorithms, defined by their mathematical traits rather than by their implementation, could sideline these concerns.

ACKNOWLEDGEMENTS

I am grateful to the sponsors of The Rice Inversion Project (TRIP) for their long-term support, and to many students and colleagues who have contributed to (and/or argued against!) the views discussed here.

REFERENCES

Balay, S., S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L. Curfman McInnes, K. Rupp, B. Smith, S. Zampini, and H. Zhang, 2015, PETSc Web page.
Gockenbach, M., D. Reynolds, P. Shen, and W. Symes, 2002, Efficient and automatic implementation of the adjoint state method: ACM Transactions on Mathematical Software, 28, 44.
Heroux, M., R. Bartlett, V. Howle, R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, A. Williams, and K. Stanley, 2005, An overview of the Trilinos project: ACM Transactions on Mathematical Software, 31.
Holden, J., 2015, The genesis of Madagascar: The Leading Edge, 34.
Kouri, D., D. Ridzal, B. van Bloemen Waanders, and G. von Winckel, 2015, ROL Web page.
Mathworks, 2015, Matlab version (R2015a): The Mathworks, Inc.
Padula, A. D., W. Symes, and S. D. Scott, 2009, A software framework for the

abstract expression of coordinate-free linear algebra and optimization algorithms: ACM Transactions on Mathematical Software, 36, 8: 8:36.
Ruelle, D., and F. Takens, 1971, On the nature of turbulence: Communications in Mathematical Physics, 20.
Symes, W., D. Sun, and M. Enriquez, 2011, From modelling to inversion: designing a well-adapted simulator: Geophysical Prospecting, 59.

The Rice Inversion Project, TRIP16, 2016

An Implementation of Multipole Point-Sources within the IWave Framework

Mario J. Bencomo and William Symes

ABSTRACT

The goal of this paper is to document an implementation for enabling the use of multipole point-sources (MPSs) in IWave simulations and eventually inversions. A framework for representing MPSs as combinations of base multipole terms is introduced here via the MPS Space class. Moreover, the MPS to RHS class of linear operators is developed to bridge the multipole representation of sources with a format that is compatible with IWave applications. Some basic numerical results are included demonstrating the use of MPSs in the first-order form of the acoustic equations.

INTRODUCTION

Seismic sources are commonly idealized as sources of point support, i.e., point sources. Simply representing sources mathematically as spatial delta functions may not be sufficiently realistic; for example, a delta function as the source term for the acoustic wave equation will produce an isotropic radiation pattern in homogeneous media, which would be wrong if the source radiation pattern is expected to be anisotropic. In order to represent anisotropic sources as point-sources it is thus vital to incorporate higher-order multipole point-source (MPS) terms, where the delta function constitutes the zeroth-order MPS. In fact, general sources can be approximated by a truncated series of MPSs, depending on the finiteness and anisotropy of the source; this concept is similar to that of the multipole expansion commonly used in electromagnetism. Santosa and Symes () analyze and quantify the MPS approximation for small sources in acoustic media using a combination of far-field and spherical harmonic expansions. Though MPS approximation analysis has not been extended to linear elasticity, the leap from acoustics to elasticity is rather a justifiable one, where spher-

ical harmonic expansions are replaced with a more laborious version involving vector terms. Earthquake seismology has indeed benefitted from the applicability of MPS approximations in elastic media for the determination of earthquake mechanisms. The basic earthquake representation is given by what is referred to as the seismic moment tensor, which is equivalent to a combination of point dipole body forces; see Backus and Mulcahy (1976) and Shearer (2009). Moment tensors of higher degree (related to higher degree multipole terms) have been shown to be important in cases where finiteness of the source is at issue, that is, the fault size is comparable to propagating wavelengths; see Stump and Johnson (1982).

An important aspect of my Ph.D. work will consist of inverting for general sources as a subproblem for the joint determination of model-source parameters. Representing sources as series of MPSs allows for a parametrization of source parameters that is linear with respect to the forward map. In this paper I document my approach for implementing MPSs within the IWave framework for forward modeling. The first section discusses the MPS representation and resulting source parametrization, particularly for the first-order forms of the acoustic and linear elasticity equations, motivating the implementation. My proposed implementation is developed in the second section, where I define two main classes: MPS Space and MPS to RHS. The last section includes some numerical results as proof of concept, primarily the incorporation of MPSs in IWave's asg forward solver.

SOURCE REPRESENTATION

I am primarily interested in the following first-order forms of the acoustic and linear elasticity equations.

Velocity-pressure form of the acoustic equations:

$$\rho\,\frac{\partial v_i}{\partial t} + \frac{\partial p}{\partial x_i} = f^{[v]}_i, \qquad (1a)$$

$$\frac{1}{\kappa}\,\frac{\partial p}{\partial t} + \frac{\partial v_j}{\partial x_j} = f^{[p]}, \qquad (1b)$$

where

ρ(x) = density,
κ(x) = bulk modulus,
p(x, t) = pressure field,
v_i(x, t) = i-th component of the particle velocity vector,
f^{[p]}(x, t) = pressure source term,
f^{[v]}_i(x, t) = i-th component of the velocity source term vector.

Velocity-stress form of the linear elasticity equations:

$$\rho\,\frac{\partial v_i}{\partial t} - \frac{\partial \sigma_{ij}}{\partial x_j} = f^{[v]}_i, \qquad (2a)$$

$$\frac{\partial \sigma_{ij}}{\partial t} - c_{ijmn}\,\frac{\partial v_m}{\partial x_n} = f^{[\sigma]}_{ij}, \qquad (2b)$$

where

c_{ijmn}(x) = fourth-order Hooke's tensor,
σ_{ij}(x, t) = second-order stress tensor,
f^{[σ]}_{ij}(x, t) = second-order stress source term tensor.

Indices i, j, m, n run from 1, ..., d for d = {2, 3}, and Einstein notation is assumed, that is, summation is assumed over repeated indices. Note that in Eq. (1) (or Eq. (2)) I have included source terms in the velocity and pressure (or stress) equations for acoustics (or linear elasticity). Indeed, the relationships among corresponding velocity-pressure (or stress) sources will be of interest when considering source inversion. This write-up, however, will abstain from such discussions given its intended scope. Moreover, I will not consider both types of sources simultaneously in practice, i.e., either the source is injected in the velocity fields or in the pressure field (or stress fields), but not both.

The following, for a given scalar source term f^{[p]}(x, t), defines the MPS approximation of multipole order up to N, centered at $x_0 \in \mathbb{R}^d$:

$$f^{[p]}(x,t) \approx \sum_{|s| \le N} (-1)^{|s|}\, F_s(t)\, D^s \delta(x - x_0), \qquad (3)$$

where I have used multi-index notation with $s \in \mathbb{N}^d$, and

$$D^s = \prod_{i=1}^{d} \left(\frac{\partial}{\partial x_i}\right)^{s_i}, \qquad |s| = \sum_{i=1}^{d} s_i.$$
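As a spelled-out instance (not taken verbatim from the report), truncating the expansion (3) at multipole order N = 1 in two space dimensions (d = 2) keeps the monopole and the two first-order dipole terms:

$$f^{[p]}(x,t) \approx F_{(0,0)}(t)\,\delta(x-x_0) \;-\; F_{(1,0)}(t)\,\frac{\partial}{\partial x_1}\delta(x-x_0) \;-\; F_{(0,1)}(t)\,\frac{\partial}{\partial x_2}\delta(x-x_0),$$

so that this truncated source is parametrized by three time functions per source position.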

Similarly, one can apply the MPS approximation to the components of a vector or second-order tensor source term:

$$f^{[v]}_i(x,t) \approx \sum_{|s| \le N} (-1)^{|s|}\, F_{i;s}(t)\, D^s \delta(x - x_0), \qquad (4)$$

$$f^{[\sigma]}_{ij}(x,t) \approx \sum_{|s| \le N} (-1)^{|s|}\, F_{ij;s}(t)\, D^s \delta(x - x_0). \qquad (5)$$

This source representation, as combinations of MPSs, allows for a source parametrization in terms of time functions $F_s(t)$, $F_{i;s}(t)$, $F_{ij;s}(t)$, which can result in a linear least squares problem for determining said source parameters.

My proposed MPS framework seeks to maintain the source representation given in Eq. (3), Eq. (4), and Eq. (5) by defining a space of MPSs, where the MPS source parameters play the role of coefficients and linear combinations of singular terms $D^s \delta(x - x_0)$ serve as basis vectors. For instance, consider a vector source term with components $f^{[v]}_i$ given by some MPS as dictated in Eq. (4). In vector form, the source term $f^{[v]}(x,t)$ can be written as a span of the canonical basis denoted by $\hat b_{i;s}(x)$:

$$f^{[v]}(x,t) = \sum_{i=1}^{d} \sum_{|s| \le N} F_{i;s}(t)\, \hat b_{i;s}(x), \quad \text{with} \quad \hat b_{i;s}(x) := (-1)^{|s|}\, D^s \delta(x - x_0)\, e_i, \qquad (6)$$

where $e_i \in \mathbb{R}^d$ is the standard Euclidean basis vector. Canonical basis terms can be similarly defined for scalar and tensor source terms:

$$f^{[p]}(x,t) = \sum_{|s| \le N} F_s(t)\, \hat b_s(x), \quad \text{with} \quad \hat b_s(x) = (-1)^{|s|}\, D^s \delta(x - x_0), \qquad (7)$$

and

$$f^{[\sigma]}(x,t) = \sum_{i,j=1}^{d} \sum_{|s| \le N} F_{ij;s}(t)\, \hat b_{ij;s}(x), \quad \text{with} \quad \hat b_{ij;s}(x) = (-1)^{|s|}\, D^s \delta(x - x_0)\, e_i e_j^T. \qquad (8)$$

Thus, I refer to the source parameters (i.e., the MPS time dependency) as the multipole coefficients, with respect to some specified multipole basis, which for the canonical case is given by $\hat b_s(x)$, $\hat b_{i;s}(x)$ or $\hat b_{ij;s}(x)$. (The source position $x_0$ is assumed known.)

Some applications may require constraints on the source representation due to physical reasons or assumptions on the radiation structure of the source. For example, in the study of earthquake source mechanisms, sources are commonly represented by the seismic moment tensor, which corresponds in my notation to $f^{[\sigma]}$ given by an MPS expansion with multipole order N = 0. Using the canonical basis suggested in Eq. (8) results in nine unknown components of the seismic moment tensor to determine in 3-D elasticity. However, the number of unknown components can be reduced to six by requiring the seismic moment tensor to be symmetric, in order to preserve angular momentum, as expected from an indigenous source. This symmetry constraint can be naturally imposed by defining a new multipole basis, for example,

$$b_1(x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \delta(x - x_0), \quad b_2(x) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \delta(x - x_0), \quad b_3(x) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \delta(x - x_0),$$

$$b_4(x) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \delta(x - x_0), \quad b_5(x) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \delta(x - x_0), \quad b_6(x) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \delta(x - x_0).$$

Thus, any symmetric second-order tensor MPS of multipole order N = 0 can be spanned by the proposed multipole basis $\{b_i\}_{i=1}^{6}$, that is,

$$f^{[\sigma]}(x,t) = \sum_{i=1}^{6} F_i(t)\, b_i(x).$$

My proposed MPS implementation follows this framework of representing MPSs by a set of (time dependent) multipole coefficients given a multipole basis. Constraints on the radiation structure of the source are naturally embedded in the choice of multipole basis and thus will have to be user defined in practice. The next section covers my implementation for embodying the MPS representation, given by the class MPS Space, and its connection to IWave related sources.

70 66 Hou and Symes IMPLEMENTATION My proposed implementation subdivides into two components: The MPS representation is implemented by the MPS Space class while the MPS to RHS class provides a link between MPS and IWave source formats. The MPS Space Class The MPS Space class serves as a general template for arbitrary spaces of MPSs, hosting information and methods that are common to all MPS representations. In particular, MPS Space is implemented as a purely virtual subclass of ProductSpace<ireal>, see listing. The MPS Space class can be interpreted as a wrapper class of some core Space cleverly named core. At runtime this core Space will be dynamically allocated by initializing a SEGYSpace related to a given MPS SU file containing the correct number of traces and source positions for the source in question. Thus core will be referred to as a SEGYSpace core of the MPS. using RVL:: Space; 8 using RVL:: StdSpace; using RVL:: ProductSpace; using RVL:: LinearOp; using RVL:: Operator; using RVL:: DataContainer; 4 using TSOpt:: IWaveSpace; 6 using std:: string; using std:: vector; 8 3 template < typename T > void print_vec(vector <T> v){ 3 for(int i=; i<v.size(); i++) cerr << v[i] << "\n"; 34 } 36 typedef struct s_ituple{ IPNT coor; 38 } Ituple; 4 typedef struct s_rtuple{ RPNT coor; 4 } Rtuple; Listing : Definition of MPS Space; in MPS Space.hh

71 MPS Framework typedef struct s_mps_base{ 46 int NT; // number of terms in base vector <Ituple > derv; // vector of multi -indexes for derivative info 48 vector <int> RHS_flags;// flags related to IWAVE RHS } MPS_base; 5 5 // struct MPS_KEYS contains all possible keys required in generating an // MPS_Space or an appropriate SubIWaveSpace RHS. 54 typedef struct s_mps_keys{ string appx_ord; // approximation order of MPS_to_RHS 56 string hdr_file; //su file containing source positions string grid_file; //rsf file containing physical grid info 58 string MPS_file; //su file for initializing MPS_Space string MPS_ref; // reference su file for generating MPS file 6 vector <string > RHS_files; //su files for RHS sources string delta_file; 6 string G_file; } MPS_KEYS; // Functions to add a list of keys and string values to a PARARRAY. void add_to_pars(pararray const &pars, vector <string > keys, vector <string > vals); 68 void add_to_pars(pararray const &pars, string key, string val); 7 //This function extracts source positions of given sufile. vector <Rtuple > ex_src_pos( string sufile ); 7 //This function counts the total number of traces is a SEGY file. 74 int get_ntr(string sufile); 76 //write_ * _SEGY methods write from scratch a segy file for either an MPS_Space //or an IWaveSpace (related to the RHS sources). 78 // Essentially the methods duplicate a given reference trace, from ref_file, //as appropriately needed. 8 //NOTE: source positions are embedded in the created MPS segy file, but not //for the RHS segys. 8 void write_mps_segy( vector <Rtuple > s_pos, int N_basis, 84 string ref_file, string MPS_file ); 86 void write_rhs_segy( int a_ord, 88 Ituple d_ord, int s_dim, 9 int N_src, string ref_file, 9 string RHS_file ); 94 // The MPS_Space class is a purely virtual class representing a space of multipole 96 // point sources (MPS s) with respect to a set "basis" of multipole terms. 98 // DATA MEMBERS: // MPS_KEYS mks = PARARRAY keys for everything needed in MPS code // Space <ireal > const * core = core SEGYSpace // Vector <Rtuple > s_pos = source positions

72 68 Hou and Symes // int s_dim = spatial dimension // string type = description of MPS Space 4 // int t_ord = tensor order of source term, // e.g, =scalar, =vector, =matrix Listing : Definition of MPS base and other data structures; in MPS Space.hh 36 typedef struct s_ituple{ IPNT coor; 38 } Ituple; 4 typedef struct s_rtuple{ RPNT coor; 4 } Rtuple; 44 typedef struct s_mps_base{ 46 int NT; // number of terms in base vector <Ituple > derv; // vector of multi -indexes for derivative info 48 vector <int> RHS_flags;// flags related to IWAVE RHS } MPS_base; A key of element of the MPS Space class is its MPS basis data member, consisting of a collection of MPS base data structures. As suggested by its name, MPS base contains information for specifying a particular multipole base. Another data structure worth mentioning is MPS KEYS, which houses all necessary keys related to files for constructing MPS and IWave sources, and will become more relevant in later parts of the the code. The standard MPS Space constructor is given in listing, starting at line 77. A parameter table pars, along with keys MPS KEYS mks, are given as input arguments from which the parent class can initialize some of the data members. Note that the initialization of data member MPS basis, and others, are left to the subclass constructor where the basis is to be defined. In a standard initialization of an MPS Space object one must procure some SU and RSF files a priori, namely an RSF file with spatial grid information (referred to as the grid file) and an SU file with the proper number of traces and with source positions (referred to as the MPS file). The make flag gives the user the option to make the MPS Space from scratch, that is in the case where the MPS file does not exist one will need to provide an SU file with source position information (referred to as the hdr file) and another SU file with at least one sample time trace to use as reference (referred to as MPS ref). In the making from scratch process the MPS Space constructor will generate an SU file, from which the SEGYSpace core will be initialized, by du-

73 MPS Framework 69 plicating the reference trace in MPS ref as needed by the MPS Space in question. Primarily, the total number is given by the following formula: number of traces = MPS basis.size() number of sources. The MPS Space is a purely virtual class and will require its subclasses to define in particular one method, build MPS basis() which sets the MPS basis for that family of MPSs. An example is in order. MPS Space Example: ExVec MPS Space Consider an MPS space of vector MPSs in -D spanned by the following basis multipoles, which will be referred to as the ExVec MPS Space: b (x) = [ ] δ(x), b (x) = [ ] δ(x), b 3 (x) = [ ]{ + } δ(x), (9) x x The following code demonstrates how ExVec MPS Space is implemented, and particularly how to write the build MPS basis() method for this space, see listing 4 starting at line 3.

74 7 Hou and Symes Listing 3: Definition of ExVec MPS Space; in MPS ExVec.hh / ******************************************************************************* / / *** ExVec_MPS_Space class ***************************************************** / / ******************************************************************************* / class ExVec_MPS_Space : public MPS_Space { 4 private: 6 Space <ireal > * clone() const{return new ExVec_MPS_Space( * this);} 8 public: ExVec_MPS_Space() : MPS_Space(){} ExVec_MPS_Space( ExVec_MPS_Space const &sp) 4 : MPS_Space(sp){} 6 ExVec_MPS_Space( MPS_KEYS mks, PARARRAY const &pars, 8 bool make ); 3 ExVec_MPS_Space(){} 3 void build_mps_basis(); }; }//end TSOpt 38 #endif Listing 4: Definition of ExVec MPS Space constructors and build MPS basis(); in MPS ExVec.cc ////////////////////////////////////////////////////////////////////////// // ExVec_MPS_Space /////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////////////////// // // 4 ExVec_MPS_Space:: ExVec_MPS_Space( MPS_KEYS _mks, PARARRAY const& pars, 6 bool make ) : MPS_Space(_mks,pars,make) { 8 // // try{ if(s_dim!=){ RVLException e; 4 e << "input spatial dimension is not!\n"; throw e; 6 } 8 // setting t_ord and type t_ord = ; 3 type = "Example vector MPS_Space in D";

75 MPS Framework 7 3 // setting MPS_basis and d_ord build_mps_basis(); 34 //case where SEGY core is made from scratch 36 if( make ){ 38 string MPS_file; try{mps_file=valparse <string >(pars,mks.mps_file);} 4 catch(rvlexception &e){ //case where MPS file name is not supplied 4 cerr << "From ExVec_MPS_Space constructor\n" << "no MPS file name given, will generate one with rnd name!\n"; 44 MPS_file = "zzzz_mps_file.su"; } 46 // generate MPS file_and set SEGYSpace core 48 string ref_file = valparse <string >(pars,mks.mps_ref); string hdr_file = valparse <string >(pars,mks.hdr_file); 5 gen_segy_core(mps_file,ref_file,hdr_file); } 5 } 54 catch(rvlexception &e){ e << "ERROR from ExVec_MPS_Space constructor\n"; 56 throw e; } 58 } 6 // // void ExVec_MPS_Space:: build_mps_basis(){ 6 // // try{ 64 //b term = [,] delta(x_,x_) 66 MPS_base b; b.nt = ; 68 b.rhs_flags.resize(); 7 b.rhs_flags[]=; b.rhs_flags[]=; 7 b.derv.resize(b.nt); 74 b.derv[]. coor[]=; b.derv[]. coor[]=; 76 MPS_basis.push_back(b); 78 //b term = [,] delta(x_,x_) 8 MPS_base b; b.nt = ; 8 b.rhs_flags.resize(); 84 b.rhs_flags[]=; b.rhs_flags[]=; 86 b.derv.resize(b.nt); 88 b.derv[]. coor[]=;

76 7 Hou and Symes b.derv[]. coor[]=; 9 MPS_basis.push_back(b); 9 //b3 term = [,] { x _ + x _ } delta 94 MPS_base b3; b3.nt = ; 96 b3.rhs_flags.resize(); 98 b3.rhs_flags[]=; b3.rhs_flags[]=; b3.derv.resize(b3.nt); b3.derv[]. coor[]=; b3.derv[]. coor[]=; 4 b3.derv[]. coor[]=; b3.derv[]. coor[]=; 6 MPS_basis.push_back(b3); 8 // setting d_ord d_ord.coor[]=; d_ord.coor[]=; } catch(rvlexception &e){ 4 e << "ERROR from ExVec_MPS_Space:: build_mps_basis()!\n"; throw e; 6 } } 8 }//end TSOpt The ExVec MPS Space deals with vector (first-order tensor) MPSs in -D, thus t order= and s dim= in the constructors. The standard constructor is given in listing 4, starting in line 84. If the make flag is set to false, that is the MPS file exists, then the SEGYSpace core will get ininitialized by the parent class constructor and the subclass constructor for ExVec MPS Space will take care of MPS basis and other data members. If make==true, then the generation of the MPS file is carried out in gen SEGY core() method, which will be called in the subclass constructor. Lets take a closer look at the MPS base data structure: Consider the base term b 3 from Eq.(9). This multipole base contains two terms (b3.nt=), δ and δ, x x which are encoded by b3.derv. Lastly, vector data of multipole base b 3 is stored in b3.rhs flags, [ ] [,] T b3.rhs flags[]= =. b3.rhs flags[]= The RHS flags essentially correspond to the canonical R d basis vectors, which in

77 MPS Framework 73 turn correspond to components of IWave related fields. For example, assuming that MPSs represented by ExVec MPS Space correspond to vector velocity source for acoustics in pressure-velocity form, having set b3.rhs flags[]= and b3.rhs flags[]= implies that b 3 will contribute to a force injected in the z and x component of the velocity field. Alternately, b3.rhs flags[]= and b3.rhs flags[]= would correspond to a force contribution in only the z-velocity field. The MPS to RHS Operator Formulating source terms f [p],f [v] i ij as MPSs results in a forward map that is linear with respect to multipole coefficients F s (t),f i;s (t),f ij;s (t), given a multipole basis. Consequently in order to evaluate the forward map one must solve a system of PDEs with increasingly singular source terms that involve derivatives of the spatial delta function, D s δ(x x ). Errors due to discretization of singularities can have a detrimental effect on the overall wave propagation simulation, in particular it can hamper the solver s convergence rate if special care is not taken. Waldén (999) develops an approximation to singular sources in a finite difference (FD) setting for the -D Helmholtz equation. He proofs that overall the convergence rate of FD schemes can be preserved if the singular source terms satisfies certain discrete moment conditions. In this work I will follow the same singular source approximation proposed by Waldén (999) for MPSs over a uniform grid. It suffices to think of the approximation as smearing the source multipole coefficients from a single point to several grid points, which is referred to as the source stencil, in a manner depending on the derivative order over δ, source position relative to grid, grid size, and desired approximation order.,f [σ] The MPS to RHS operator provides a link between the MPS representation and RHS sources in an IWave compatible format, primarily by applying an appropriate singular source approximation and writing out corresponding SU files. I have chosen to implement MPS to RHS as a purely virtual subclass of LinearOp with domain MPS Space * and range IWaveSpace *. The MPS to RHS operator will also have make flags for both its domain and range, similar to the make flag for the MPS Space class. The essential forward and adjoint actions of the MPS to RHS operator will be carried out by kernels, mainly MPS to RHS kern and RHS to MPS kern respectively, via low level function calls to standard SU I/O methods such as fgettr() and fputtr().

78 74 Hou and Symes Listing 5: Definition of MPS to RHS; in MPS to RHS.hh 4 // destructor MPS_to_RHS(); 6 const T_MPS_Space & get_mps_domain() const{ return dom;} 8 const IWaveSpace & get_iwave_range() const {return * (rng_ptr);} const Space <ireal > & getdomain() const {return get_mps_domain();} const Space <ireal > & getrange() const {return get_iwave_range();} ostream & write(ostream & str) const{ str<<"mps_to_rhs operator\n"; 4 str<<"domain:\n"; dom.write(cerr); 6 str<<"range:\n"; if(rng_ptr==nullptr) 8 str<<"null pointer\n"; else 3 rng_ptr ->write(cerr); return str; 3 } 34 PARARRAY & get_pars(){ return * pars;} PARARRAY const & get_pars() const{ return * pars;} 36 grid get_grid() const{ return g;} int get_a_ord() const{ return a_ord;} 38 vector <Ituple > get_s_idx() const{ return s_idx;} vector <Rtuple > get_s_res() const{ return s_res;} 4 string get_dom_filename() const; vector <string > get_rng_filenames() const; 4 void print_info() const; }; // // template <class T_MPS_Space > 48 MPS_to_RHS <T_MPS_Space >:: MPS_to_RHS( MPS_to_RHS const &MtoR ) : dom(mtor.dom), 5 rng_ptr(mtor.rng_ptr), pars(null), 5 mks(mtor.mks), make_mps(mtor.make_mps), 54 make_rhs(mtor.make_rhs), a_ord(mtor.a_ord), 56 g(mtor.g), s_idx(mtor.s_idx), 58 s_res(mtor.s_res) { // // 6 try{ 6 if(!rng_ptr ){ RVLException e; 64 e << "empty range space pointer\n"; throw e; 66 } 68 // copying pars

79 MPS Framework 75 pars = ps_new(); 7 if(ps_copy(&pars, * MtoR.pars)){ RVLException e; 7 e <<"failed to copy parameter table\n"; throw e; 74 } } 76 catch(rvlexception &e){ e << "ERROR from MPS_to_RHS constructor\n"; 78 throw e; } 8 } 8 // // template <class T_MPS_Space > 84 MPS_to_RHS <T_MPS_Space >:: MPS_to_RHS( MPS_KEYS _mks, PARARRAY const &_pars, 86 bool _make_mps, bool _make_rhs ) 88 : dom(_mks,_pars,_make_mps), rng_ptr(nullptr), 9 pars(null), mks(_mks), 9 make_mps(_make_mps), make_rhs(_make_rhs){ 94 // // try{ 96 // copying pars 98 pars = ps_new(); if(ps_copy(&pars,_pars)){ RVLException e; e <<"failed to copy parameter table\n"; throw e; } 4 // extracting a_ord Listing 6: Definition of MPS to RHS::apply(); in MPS to RHS.cc Analogously to the MPS Space, the MPS to RHS operator is a purely virtual class and will require its subclasses to define in particular the set dom ptr() method for a given target MPS Space subclass. The following section provides an example of such a subclass for the ExVec MPS Space.
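Before turning to the tests, the following hedged sketch (not one of the actual drivers in mps code/main) indicates how the pieces above might be wired together. All file names and key strings are placeholders, the placement of MPS_KEYS and add_to_pars in the TSOpt namespace is an assumption based on the listings, and the final step of constructing a concrete MPS_to_RHS subclass from the same keys and parameter table is only indicated in a comment, since that subclass is not named in the listings.

// Hypothetical driver fragment; names, keys, and file paths are illustrative only.
#include <iostream>
#include "MPS_ExVec.hh"

int main() {
  TSOpt::MPS_KEYS mks;            // keys under which file names are looked up
  mks.grid_file = "grid_file";    // RSF file with physical grid info
  mks.hdr_file  = "hdr_file";     // SU file supplying source positions
  mks.MPS_ref   = "MPS_ref";      // SU reference trace used when building MPS_file
  mks.MPS_file  = "MPS_file";     // SU file backing the SEGYSpace core

  PARARRAY* pars = ps_new();      // parameter table mapping keys to file names
  TSOpt::add_to_pars(*pars, mks.grid_file, "grid.rsf");
  TSOpt::add_to_pars(*pars, mks.hdr_file,  "hdr.su");
  TSOpt::add_to_pars(*pars, mks.MPS_ref,   "wavelet.su");
  TSOpt::add_to_pars(*pars, mks.MPS_file,  "mps.su");

  // make == true: the SEGYSpace core (mps.su) is generated from hdr.su / wavelet.su,
  // following the ExVec_MPS_Space constructor shown in Listing 4
  TSOpt::ExVec_MPS_Space dom(mks, *pars, true);
  dom.write(std::cout);

  // A concrete MPS_to_RHS subclass for this space, constructed from the same keys
  // and parameter table with make_rhs == true, would then generate the IWave RHS files.
  return 0;
}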

80 76 Hou and Symes TESTS Five tests were created to ensure that the MPS code is working. Currently, tests involve only the asg solver since it is the only forward solver in IWave at the moment that solves a first-order system. The space of scalar and vector MPS sources, spanned by corresponding canonical multipole basis (e.g., Eq.(7) and Eq.(6)), are implemented and also tested. These spaces are implemented by Can- Scal MPS Space and CanVec MPS Space, along with associated operators CanScal MPS to RHS and CanVec MPS to RHS given in MPS Can.cc and MPS Can.hh. The following medium specifications are used for all tests: 5m 5m acoustic domain constant density; ρ = kg/m 3 constant velocity; c = 4 m/s receiver depth of m receiver horizontal displacement of m 48 m, separated by m total recording time is s - staggered grid scheme grid size of 6 m The drivers for these tests are located in the directory mps code/main, and consist of asg canscal.cc (for canonical scalar MPSs), asg canvec.cc (for canonical vector MPSs) and asg exvec.cc (for MPSs from ExVec MPS Space). I would also like to point out that all IWaveSpace s related to RHS source are generated by the MPS to RHS operators in these examples, as oppose to providing correct RHS SU files a priori. In other words, all initializations of MPS to RHS operators have make RHS==true.

81 MPS Framework 77 Test The first test consists of a forward simulation with two scalar MPSs of the following form: f [p] (x,t) = F(t)δ(x x ). Recall that a scalar MPS will be injected in the pressure field, as oppose to the velocity field components. The first source is centered at x = [ 5 m, m] and with F(t) given by a derivative of the Gaussian curve (see Figure, blue curve). Source location x was chosen to be [ 5 m,4 m] for the second source, with F(t) given by a scaled derivative of Gaussian curve (see Figure, red curve). The resulting measured pressure field at receiver locations is given by traces in Figure. Results coincide with expectations. Test The second test also concerns two scalar MPSs with same source locations. For this case, the sources will be dipole point-sources: first source: f [p] (x,t) = F(t) x δ(x x ) second source: f [p] (x,t) = F(t) x δ(x x ). Figure 3 plots traces of the pressure field with expected results. Test 3 Third test involves two vector monopole point-sources, but with same source positions as in tests and : first source: f [v] (x,t) = F(t)[,] T δ(x x ) second source: f [v] (x,t) = F(t)[,] T δ(x x )

Figure 4 plots traces of the pressure field; these traces are similar to those shown in Figure 3, and not by coincidence. Consider the process of rewriting Eq.() as a wave equation in the pressure field p, but carrying over the scalar and vector source terms f^[p], f^[v]_i:

( ∂²/∂t² - c² ∂²/∂x_i∂x_i ) p = κ ∂f^[p]/∂t - c² ∂f^[v]_i/∂x_i.

Thus, the first monopole vector source corresponds to a dipole scalar source, i.e.,

f^[v] = F(t) [1,0]^T δ(x - x_0)   ⟹   f^[p] = F(t) ∂_{x_1} δ(x - x_0),

and similarly for the second source.

Test 4

This test considers three vector MPSs located at depth 5 m and horizontal displacements m, m, and 3 m.

first source: f^[v](x,t) = F(t) [1,0]^T ∂_{x_1} δ(x - x_0)
second source: f^[v](x,t) = F(t) [0,1]^T ∂_{x_2} δ(x - x_0)
third source: f^[v](x,t) = F_1(t) [1,0]^T δ(x - x_0) + F_2(t) [0,1]^T δ(x - x_0)

Resulting pressure-field traces are given in Figure 5. The third source, a sum of vector monopoles, dominates the color axis in magnitude, and thus results for the first and second sources are also shown individually in Figure 6.

Test 5

This last test uses a single vector MPS from the Ex_MPS_Space family. The source is centered at x_0 = [ 5 m, 5 m] and has the following form:

f^[v] = F(t) b_3(x)

where b_3(x) is given in Eq.(9). Results are given in Figure 7.

APPENDIX: DERIVING ADJOINT ACTION OF MPS_TO_RHS

The following notation is introduced in order to derive the adjoint action for any MPS_to_RHS operator. Let f_D denote a tensor corresponding to an input (or domain vector) of some given MPS_to_RHS operator, where f_D contains time traces for each multipole basis of each source. The components, and thus the data content, of the tensor f_D are denoted

f_D[i_t, i_s, i_b]

where
i_t = time index,
i_s = source index,
i_b = multipole basis index.

Similarly, an output (or range vector) is encapsulated by a tensor f_R, containing time traces at each source stencil position of each source for each RHS source component. Thus, components of f_R are given by

f_R[i_t, i_s, i_r, i_n]

where
i_t = time index,
i_s = source index,
i_r = RHS component index,
i_n = source stencil index.

The forward action of any operator from the class MPS_to_RHS is dictated by the method MPS_to_RHS_kern(). A rough outline of the write_RHS() method is

given in Algorithm 1; see lib/mps_base.cc for the actual code. Essentially, the forward action of MPS_to_RHS operators consists of spreading a point source over a stencil and linearly combining time traces (pointwise in time) of multipole bases for each source. The weights associated with these linear combinations correspond to approximations of singular sources, and depend on the multipole basis term (index i_b), the source position (index i_s), the stencil point (index i_n), and in some regards on the RHS component as well (index i_r). Thus, I can collect all of the weights needed in the forward action in a tensor w, with components w[i_s, i_b, i_r, i_n].

Let A denote an MPS_to_RHS operator. Then, given a domain tensor f_D, the forward action of A is given as follows:

(A f_D)[i_t, i_s, i_r, i_n] = Σ_{i_b} w[i_s, i_b, i_r, i_n] f_D[i_t, i_s, i_b].

Defining the adjoint of A will of course depend on the inner products defined for the domain and range. For given vectors in the domain, or range, the following ℓ²-like inner products are defined:

⟨f_D, g_D⟩ = Σ_{i_t, i_s, i_b} f_D[i_t, i_s, i_b] g_D[i_t, i_s, i_b],
⟨f_R, g_R⟩ = Σ_{i_t, i_s, i_r, i_n} f_R[i_t, i_s, i_r, i_n] g_R[i_t, i_s, i_r, i_n].

(These are the inner products defined for the SEGYSpaces and tensor products of such spaces, and currently in use for the domain and range of MPS_to_RHS operators.) Let f_R = A f_D; then

⟨f_R, g_R⟩ = Σ_{i_t, i_s, i_r, i_n} ( Σ_{i_b} w[i_s, i_b, i_r, i_n] f_D[i_t, i_s, i_b] ) g_R[i_t, i_s, i_r, i_n]
           = Σ_{i_t, i_s, i_b} f_D[i_t, i_s, i_b] Σ_{i_r, i_n} w[i_s, i_b, i_r, i_n] g_R[i_t, i_s, i_r, i_n]
           = ⟨f_D, A^T g_R⟩,

thus giving the adjoint of A,

(A^T f_R)[i_t, i_s, i_b] = Σ_{i_r, i_n} w[i_s, i_b, i_r, i_n] f_R[i_t, i_s, i_r, i_n].

The equation for the action of A^T states that the adjoint action of an MPS_to_RHS operator consists of linear combinations of time traces over the different RHS components (index i_r) and stencil points (index i_n), pointwise in time (index i_t), for each source (index i_s).

Algorithm 1 Pseudocode for the write_RHS() method.
for each i_r do                          // loop over RHS components
    for each i_s do                      // loop over sources
        compute global stencil info
        for each i_n do                  // loop over stencil points
            compute stencil point coordinates
            set source and receiver locations in RHS trace
            for each i_b do              // loop over multipole bases
                for each b_i do          // loop over basis terms
                    compute local stencil info
                    compute weight w from singular source approximation
                    for each i_t do      // loop over time
                        update RHS traces: tr_RHS[i_t] += tr_MPS[i_t] * w
                    end for
                end for
            end for
        end for
    end for
end for
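To make the index bookkeeping above concrete, the following sketch applies a weight tensor w in both the forward and adjoint directions over flat, row-major arrays, and checks the two against each other with a dot-product test. It is not the IWave implementation (the production code in lib/mps_base.cc streams traces through SEGY buffers and computes the weights on the fly); the array layout, the dimension sizes, and the class and function names are illustrative assumptions only.

// Sketch: forward and adjoint action of an MPS_to_RHS-like weight tensor.
// Assumes flat row-major storage; nt/ns/nb/nr/nn are illustrative dimensions.
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

struct MPSWeights {
  std::size_t nt, ns, nb, nr, nn;           // time, source, basis, RHS component, stencil
  std::vector<double> w;                    // w[is][ib][ir][in], row-major

  double& wt(std::size_t is, std::size_t ib, std::size_t ir, std::size_t in) {
    return w[((is * nb + ib) * nr + ir) * nn + in];
  }

  // fR[it][is][ir][in] = sum_ib w[is][ib][ir][in] * fD[it][is][ib]
  void forward(const std::vector<double>& fD, std::vector<double>& fR) {
    for (std::size_t it = 0; it < nt; ++it)
      for (std::size_t is = 0; is < ns; ++is)
        for (std::size_t ir = 0; ir < nr; ++ir)
          for (std::size_t in = 0; in < nn; ++in) {
            double acc = 0.0;
            for (std::size_t ib = 0; ib < nb; ++ib)
              acc += wt(is, ib, ir, in) * fD[(it * ns + is) * nb + ib];
            fR[((it * ns + is) * nr + ir) * nn + in] = acc;
          }
  }

  // fD[it][is][ib] = sum_{ir,in} w[is][ib][ir][in] * fR[it][is][ir][in]
  void adjoint(const std::vector<double>& fR, std::vector<double>& fD) {
    for (std::size_t it = 0; it < nt; ++it)
      for (std::size_t is = 0; is < ns; ++is)
        for (std::size_t ib = 0; ib < nb; ++ib) {
          double acc = 0.0;
          for (std::size_t ir = 0; ir < nr; ++ir)
            for (std::size_t in = 0; in < nn; ++in)
              acc += wt(is, ib, ir, in) * fR[((it * ns + is) * nr + ir) * nn + in];
          fD[(it * ns + is) * nb + ib] = acc;
        }
  }
};

// Dot-product test: <A fD, gR> should equal <fD, A^T gR> to machine precision.
int main() {
  MPSWeights A{16, 2, 3, 3, 5, {}};
  A.w.resize(A.ns * A.nb * A.nr * A.nn);
  std::mt19937 gen(0);
  std::uniform_real_distribution<double> u(-1.0, 1.0);
  for (auto& x : A.w) x = u(gen);

  std::vector<double> fD(A.nt * A.ns * A.nb), gR(A.nt * A.ns * A.nr * A.nn);
  for (auto& x : fD) x = u(gen);
  for (auto& x : gR) x = u(gen);

  std::vector<double> AfD(gR.size()), AtgR(fD.size());
  A.forward(fD, AfD);
  A.adjoint(gR, AtgR);

  double lhs = 0.0, rhs = 0.0;
  for (std::size_t k = 0; k < AfD.size(); ++k) lhs += AfD[k] * gR[k];
  for (std::size_t k = 0; k < fD.size(); ++k) rhs += fD[k] * AtgR[k];
  std::cout << "<A fD, gR> = " << lhs << ", <fD, At gR> = " << rhs << "\n";
  return 0;
}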

Figure 1: Plots of the derivative of a Gaussian, unscaled (blue) and scaled by a factor of (red).

Figure 2: Pressure-field traces for test 1: two scalar monopoles.

Figure 3: Pressure-field traces for test 2: two scalar dipoles.

Figure 4: Pressure-field traces for test 3: two vector monopoles.

Figure 5: Pressure-field traces for test 4: three vector MPSs.

Figure 6: Pressure-field traces for the first (a) and second (b) sources of test 4.

Figure 7: Pressure-field traces for test 5: one vector MPS from the Ex_MPS_Space family.

REFERENCES

Backus, G., and M. Mulcahy, 1976, Moment tensors and other phenomenological descriptions of seismic sources - I. Continuous displacements: Geophysical Journal International, 46.
Santosa, F., and W. Symes, Multipole representation of small acoustic sources: Chinese Journal of Mechanics, 6, 5.
Shearer, P. M., 2009, Introduction to seismology: Cambridge University Press.
Stump, B. W., and L. R. Johnson, 1982, Higher-degree moment tensors - the importance of source finiteness and rupture propagation on seismograms: Geophysical Journal International, 69.
Waldén, J., 1999, On the approximation of singular source terms in differential equations: Numerical Methods for Partial Differential Equations, 15, 503-520.


Full Waveform Inversion via Source-Receiver Extension

Guanghui Huang, Rami Nammour, and William Symes

ABSTRACT

The source-receiver extension models each seismic trace by its own proper point source, permitting good data fit even in the absence of velocity information. Thus full waveform inversion might be accomplished by searching amongst extended models fitting the data trace by trace for those that minimize a measure of extension. Provided that a physical source signature is known, an effective measure of source-receiver extension is the RMS time lag of the signature-deconvolved sources. Estimation of extended sources by regularized least squares avoids possible ambiguity caused by multipathing. Numerical examples suggest that a properly formulated version of source-receiver extended inversion is capable of recovering reasonably good velocity models from synthetic data, without the stagnation at physically irrelevant models frequently encountered by standard full waveform inversion.

INTRODUCTION

Full waveform inversion (FWI) estimates subsurface structure with high precision by minimizing the differences between synthesized and recorded data in the least squares sense (Tarantola, 1984; Virieux and Operto, 2009). However, the domain of convexity of the FWI objective function for velocity estimation is generally quite small, on the order of a wavelength in diameter, and iterative optimization methods starting further from the global minimizer may stagnate at physically meaningless apparent optima. The root cause of this behaviour is the tendency of predicted data to be out of phase with, or even orthogonal to, recorded data in large regions of model space ("cycle-skipped"), and therefore very far away in the mean square sense. This problem may be avoided to some extent by a combination of initial model accuracy, high S/N at low frequency

96 9 Huang, Nammour and Symes MSIVA (achieved in some surveys, not in others), and data fitting in expanding frequency bands, from low to high (Bunks et al., 995; Pratt, 999; Pratt and Shipp, 999; Sirgue and Pratt, 4; Virieux and Operto, 9). The stagnation problem is somewhat less severe for fitting of refracted energy (diving waves), as noted already by Gauthier et al. (986). This paper describes an alternative to least-squares data fitting, that we will call matched source waveform inversion (MSWI), and illustrates its behaviour with several synthetic examples. The basis of this approach is the source-receiver extension of constant density acoustic modeling: an independent source trace is provided for each data trace. Appropriate choice of the source trace fits the data trace, for any choice of velocity field. Since the mean-square error is small for all such extended models (velocity plus source traces), some other objective must take over the role of fit error in standard FWI, and drive the extended model towards a physical (non-extended) model that explains the data, hence solves the FWI problem. All traces in a common source gather should share the same source, so we penalize the deviation from common source. Several choices of objective are available for this task. Having chosen a penalty, it becomes a function of velocity, because only one extended source matches the data for each velocity model. Minimizing the common source penalty over the velocity model generates an FWI solution, in a process that fits the data well at every stage. In some circumstances, we have been able to show that various versions of the MSWI penalty have the same stationary points, and even the same gradient and Hessian up to a positive factor, as some variant of traveltime tomography (Song and Symes, 994; Huang and Symes, 5). In these settings, MSWI produces a model of tomographic quality, explaining the kinematics in the data. This idea is not new: Song and Symes (994); Symes (994); Plessix et al. (); Plessix (); Pratt and Symes (); Luo and Sava () investigate data fitting via source-receiver extension to enhance the convergence of FWI. Warner and Guasch (4) uses a very similar approach as part of Adaptive Waveform Inversion, and show its capacity to enlarge the domain of attraction for FWI and its practicality for application to contemporary 3D field surveys. Of several possible choices of penalty, we use the mean-square time lag suggested in several of these works (Plessix, ; Luo and Sava, ; Warner and Guasch, 4). The contribution of this report is to describe and partly remedy a pathology of MSWI, identified already by the third author a number of years ago (Symes, 994) in the context of the crosswell inversion problem: in the presence of multi-

97 MSIVA MSIVA 93 ple energetic arrivals (multipathing), the MSWI penalty functions tend to become multimodal, and MSWI comes to resemble FWI in its tendency to stagnate at non-informative models. Plessix et al. (); Plessix () observed this phenomenon in their application of a version of MSWI to inversion of synthetic and field crosswell data. We provide an explanation: with the source-data relation approximated by circulant convolution, we observe that Green s function with multiple arrivals may generate an effective convolutional null space, destroying the uniqueness of the extended source estimation and accounting for the multimodal penalty function. We also suggest a partial remedy: as is routine with other ill-conditioned inverse problems, we stabilize source estimation by Tihonov regularization (Engl et al., 996). The regularized MSWI problem tends to recover the convexity exhibited by the unregularized problem in the single-arrival case, and retains the same global minimizer within a small error. We illustrate all of these claims with numerical examples. In the following pages, we first review the theoretical foundation of MSWI, and explain the structure of the algorithm. Then we illustrate its behaviour with six D numerical examples. The first two are set in an idealized crosswell geometry, and illustrate the capability of MSWI to provide tomographic-quality solutions, the obstacle to velocity updating posed by multiple arrivals, and a partial remedy through Tihonov regularization. The other four examples use surface acquisition geometry. The first of these is a pure diving wave problem, with no reflections. The target model generates multiple ray paths in the diving wave field, with the same damaging effect on velocity updating via MSWI as in the previous crosswell example. Once again, Tihonov regularization is sufficient to restore convergence to a useful model. We then invert several models with reflected wave energy: a version of the Marmousi model, and two examples with inclusions mimicing the physical properties of salt. All examples begin with homogeneous or simple layered initial guesses. In the salt examples, the inclusion emerges without any encouragement from the background. We end with a discussion of several obvious or not-so-obvious capabiities and limitations of MSWI.

THEORY

The acoustic model of seismic wave propagation treats the excess pressure field u as the solution of the acoustic wave equation

(1/v²) ∂²u/∂t² - ∇²u = δ(x - x_s) f(t),    (1)
u = 0,  t << 0.    (2)

The energy source is modeled here as a point isotropic radiator with source pulse f(t). The forward modeling operator maps the source to the data traces, presumed to be perfect pressure measurements, and depends on the wave velocity:

S[v]f(x_r,t;x_s) = u(x_r,t;x_s).    (3)

In equations 1 and 3, the source and receiver locations, x_s and x_r respectively, define the acquisition geometry of the survey. The standard least-squares inversion, or FWI, problem is: given f(t) and d(x_r,t;x_s), determine v(x) so as to minimize

J_FWI[v] = ‖S[v]f - d‖².    (4)

The vertical bars denote the mean square or L²-norm squared, in other words the sum over all active values of x_r, t, and x_s, possibly scaled by cell volume or other factors. As mentioned in the introduction, the function defined by equation 4 is difficult to minimize, so we will explore an alternative approach to the solution, through models that violate at least some of the modeling assumptions made above.

Source-Receiver Extended Modeling

The source-receiver extension introduces a trace-dependent source function f̄(x_r,t,x_s) to replace f(t). The extended acoustic system is

(1/v²) ∂²ū/∂t² - ∇²ū = δ(x - x_s) f̄(x_r,t,x_s),    (5)
ū = 0,  t << 0,    (6)

and the extended forward modeling operator is defined by sampling the extended pressure field ū as before:

S̄[v]f̄(x_r,t;x_s) = ū(x_r,t;x_s).    (7)

Since the extended source is trace-dependent, it is straightforward to fit the data, which is not the case in general with the non-extended source unless the data kinematics are well-predicted by the velocity.

Denote by G[v](x_r,t;x_s) the causal Green's function of the acoustic wave equation, that is, the solution of the system 1, 2 with f(t) = δ(t). Then

S̄[v]f̄(x_r,t;x_s) = G[v](x_r,t;x_s) ∗_t f̄(x_r,t;x_s).    (8)

Equation 8 shows that computing the source-receiver extended forward map involves minimal expense beyond that of the non-extended forward map defined in equation 3: since G[v] = S[v]δ(t), computing the action of S̄[v] requires computing the action of S[v], followed by one additional convolution per output trace. If we denote by G[v]⁻¹ a convolution inverse to G[v], then

S̄[v](G[v]⁻¹ ∗_t d)(x_r,t;x_s) = d(x_r,t;x_s).    (9)

This operation assumes that the Green's function has a convolution inverse, of course. In this paper, we will assume that all traces are defined on the same time interval [0, T]. To accommodate timing errors associated with erroneous velocities, we extend the time interval for both the (extended) source and the data to [-T, T], padding the data with zeros for t < 0. Then we regard all functions as periodic in t with period 2T, and use circulant convolution, which of course in the Fourier domain amounts to multiplication. So the convolution inverse of the Green's function is simply its reciprocal in the Fourier domain, which is a priori available only if the Fourier transform has no zeroes. Sometimes this is the case, sometimes not, as will be illustrated below.

To make estimation of the extended source f̄ robust against spectral zeroes of the Green's function, we introduce the regularized least squares version of equation 9: choose f̄ to minimize

‖S̄[v]f̄ - d‖² + ε²‖f̄‖².    (10)
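Anticipating the Fourier-domain solution of the normal equation described in the next paragraph, the minimizer of the regularized problem above can be computed trace by trace as a scalar division per frequency. The following sketch shows that division for a single trace; it assumes that the Green's function trace and the data trace have already been transformed to the discrete Fourier domain (the FFT itself is not shown), and the function and variable names are illustrative rather than taken from the authors' code.

// Sketch: trace-by-trace regularized source estimation in the frequency domain.
// fbar_hat(w) = conj(G_hat(w)) * d_hat(w) / (|G_hat(w)|^2 + eps^2),
// i.e. the minimizer of |G * fbar - d|^2 + eps^2 |fbar|^2 under circulant convolution.
#include <complex>
#include <vector>

std::vector<std::complex<double>>
estimate_extended_source(const std::vector<std::complex<double>>& G_hat, // Green's function, one trace
                         const std::vector<std::complex<double>>& d_hat, // data, same trace
                         double eps)                                     // Tikhonov weight
{
  std::vector<std::complex<double>> fbar_hat(G_hat.size());
  for (std::size_t k = 0; k < G_hat.size(); ++k) {
    const double denom = std::norm(G_hat[k]) + eps * eps;  // |G|^2 + eps^2, never zero for eps > 0
    fbar_hat[k] = std::conj(G_hat[k]) * d_hat[k] / denom;
  }
  return fbar_hat;   // inverse FFT recovers the time-domain extended source trace
}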

The minimizer solves the normal equation

(S̄[v]^T S̄[v] + ε²I) f̄(x_r,t;x_s) = S̄[v]^T d(x_r,t;x_s)    (11)

and depends on v, d, and ε. The matrix of the operator on the left-hand side of equation 11 is the autocorrelation of G, prewhitened by addition of ε². In particular, the normal equation is again a convolution equation, and can be solved exactly by means of the Fourier transform.

Matched Source Waveform Inversion

The extended source constructed in equation 9 or 11 is most likely unphysical, in that the physics defined at the beginning of this section required that the source be uniform across all traces, that is, f̄(x_r,t,x_s) = f(t). Since the data can be fit (assuming a deconvolvable Green's function) for any velocity, it is only this requirement that provides velocity updates.

A simple way to quantify failure of the extended source to match a physical one uses an annihilator, that is, an operator A that produces a zero result when applied to a physically consistent source. Two possible choices are differentiation of the source with respect to x_s and x_r (Song and Symes, 1993; Pratt and Symes, ), or forcing the convolution quotient of the extended and target (non-extended) sources to resemble the delta function, for example by penalizing the second moment of the mean square (Plessix et al., 1999; Luo and Sava, 2011; Warner and Guasch, 2014). We will use the second option in the work reported below, as it is somewhat simpler to implement. Specifically,

A f̄(x_r,t;x_s) = t (f⁻¹ ∗ f̄)(x_r,t;x_s),    (12)

in which f⁻¹ denotes a convolution inverse of the source pulse f(t). If we apply A to the result of regularized least squares inversion as defined above, we obtain an index of velocity correctness: if the velocity model and data are kinematically compatible, then all of the inverted sources f̄(x_r,t;x_s) are approximately the same as the source f(t), hence signature deconvolution should yield an approximate delta function at zero lag, and that is in turn nearly annihilated by multiplication by t. Kinematic disagreement between model and data should lead to larger A f̄. We capture this idea in an objective function:

J[v] = (1/2)‖A f̄[v]‖²    (13)

in which f̄[v] is the solution of the inner (quadratic) minimization problem

f̄[v] = argmin_{f̄} ( ‖S̄[v]f̄ - d‖² + ε²‖f̄‖² ).    (14)

In effect, the model over which this objective is to be optimized includes both the velocity v(x) and the extended source function f̄(x_r,t;x_s), hence is an extended model. The optimization is treated as a nested problem, with f̄ determined as a function of v via solution of an inner optimization problem (equation 14), then v determined by minimizing J[v] (equation 13).

Remark. The nested design of the optimization problem defined in equations 13, 14 is essential, not merely a computational convenience. Minimization of alternative objective functions of v(x), f̄(x_r,t;x_s), such as the penalty function

J_α[v, f̄] = ‖A f̄‖² + α‖S̄[v]f̄ - d‖²,    (15)

over v and f̄ jointly, turns out to be very inefficient, with (in principle) arbitrarily slow convergence. A problem more amenable to solution via local optimization can be recovered by optimizing J_α first over f̄ to create a reduced objective depending only on v, which is then optimized over v - that is, a nested optimization, similar to that defined in equations 13, 14. See Symes (2015) for an explanation. We will not use the penalty function J_α in the work reported here.

We will use a gradient-based method to minimize J. Appendix A shows how to compute this gradient. Define w to be the solution of

(S̄[v]^T S̄[v] + ε²I) w = A^T A f̄    (16)

and r by

r = w ⋆ (d - S̄f̄) - f̄ ⋆ S̄w    (17)

(⋆ denotes cross-correlation). Then

∇J[v] = (DS[v]δ)^T r    (18)

in which (DS[v]δ)^T is the well-known impulsive reverse time migration operator. See Appendix A for details.
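To fix ideas, the following sketch evaluates the squared annihilator output for a single trace, given the signature-deconvolved extended source q(t) = (f⁻¹ ∗ f̄)(t) sampled on the padded lag axis [-T, T]; q may be obtained, for example, by the same kind of regularized frequency-domain division shown earlier, with the known wavelet in place of the Green's function. The quadrature and the names used are illustrative assumptions, not the authors' implementation.

// Sketch: contribution of one trace to || A fbar ||^2 with (A q)(t) = t * q(t),
// where q is the signature-deconvolved extended source sampled on lags in [-T, T].
#include <cstddef>
#include <vector>

double timelag_penalty(const std::vector<double>& q,  // deconvolved source, one trace
                       double dt,                     // lag sample interval
                       double T)                      // lags run from -T to +T
{
  double J = 0.0;
  for (std::size_t k = 0; k < q.size(); ++k) {
    const double lag = -T + k * dt;      // kinematically correct models focus q near lag 0
    J += (lag * q[k]) * (lag * q[k]) * dt;
  }
  return J;   // summed over all traces, this is the squared annihilator norm in the MSWI objective
}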

NUMERICAL EXAMPLES

We present two sets of numerical examples illustrating the performance of MSWI in comparison to FWI. The first set uses an idealized crosswell geometry, with the source locations on one face of the rectangular scattering domain and the receivers on the opposite face. The second set mimics surface acquisition, with sources and receivers on the same face of the rectangular domain.

We use the limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm with backtracking line search to assure compliance with the weak version of Wolfe's conditions for global convergence to a stationary point (Nocedal and Wright, 1999). For each example, we will apply LBFGS both to minimize the MSWI objective defined in equation 13 and to solve the full-bandwidth FWI problem by minimizing the FWI objective (equation 4). Forward modeling is implemented in the frequency domain, using a nine-point (4th order, cross-shaped) stencil to approximate the Helmholtz operator, and a direct matrix solver; a sketch of such a stencil application is given just before the first crosswell example below. The least squares problems defined in equations 11 and 16 are diagonal in the Fourier domain because of our use of circulant deconvolution, so these are solved to machine precision as well. The objective function and gradient are therefore computed to machine precision, at the discrete level.

Crosswell Acquisition Geometry

The first two examples use idealized crosswell acquisition, that is, sources lie on one boundary face of a rectangular scattering region, receivers on the opposite face. The first shows that MSWI without regularization (i.e. ε = 0) may converge to a reasonable solution of the inverse problem when FWI fails to do so. In this example, the data exhibits only a single arrival (though partly as a result of wavefront healing). Our previous analysis (Huang and Symes, 2015) explains the behaviour of MSWI in this case: when raypaths from source to receiver are unique, MSWI is equivalent to least-squares traveltime tomography, and delivers a comparable solution. The second shows that unregularized MSWI may fail in the same way as FWI if multiple arrivals are present in the data with significant energy. We observe that this phenomenon may be understood as ill-posedness of the inner (source-estimation) problem, and that Tihonov regularization can restore apparent convergence to a tomographic-quality solution.
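As mentioned above, the Helmholtz operator is discretized with a nine-point, fourth-order, cross-shaped stencil. The report does not spell out the stencil coefficients or the boundary treatment; the sketch below applies the standard fourth-order second-difference weights (-1/12, 4/3, -5/2, 4/3, -1/12)/h² along each axis, treats the two outermost rows and columns as zero, and works matrix-free, whereas the actual implementation assembles a sparse matrix for a direct solver. All of these specifics are illustrative assumptions.

// Sketch: matrix-free application of a 4th-order, cross-shaped 9-point Helmholtz stencil.
// r = (omega^2 / v^2) u + Laplacian_h u on an nx-by-nz grid with spacing h.
#include <complex>
#include <vector>

using cd = std::complex<double>;

void apply_helmholtz(const std::vector<cd>& u, const std::vector<double>& v,
                     std::vector<cd>& r, int nx, int nz, double h, double omega)
{
  // 4th-order centered second-derivative weights along one axis.
  const double c0 = -5.0 / 2.0, c1 = 4.0 / 3.0, c2 = -1.0 / 12.0;
  const double ih2 = 1.0 / (h * h);
  auto idx = [nz](int ix, int iz) { return ix * nz + iz; };

  r.assign(u.size(), cd(0.0, 0.0));
  for (int ix = 2; ix < nx - 2; ++ix) {
    for (int iz = 2; iz < nz - 2; ++iz) {
      const cd lap =
          ih2 * (2.0 * c0 * u[idx(ix, iz)]
                 + c1 * (u[idx(ix - 1, iz)] + u[idx(ix + 1, iz)]
                         + u[idx(ix, iz - 1)] + u[idx(ix, iz + 1)])
                 + c2 * (u[idx(ix - 2, iz)] + u[idx(ix + 2, iz)]
                         + u[idx(ix, iz - 2)] + u[idx(ix, iz + 2)]));
      const double k2 = (omega * omega) / (v[idx(ix, iz)] * v[idx(ix, iz)]);
      r[idx(ix, iz)] = k2 * u[idx(ix, iz)] + lap;
    }
  }
}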

103 MSIVA MSIVA 99 Weak Low-Velocity Lens The target velocity model for the first example is a Gaussian low velocity anomaly embedded in a constant background velocity v = km/s (Figure ), i.e., v(x,z) =.7e (x ).5 (z ).5 km/s The sources and receivers (indicated with white triangle and green circle in Figure, respectively) are placed at x s =. km and x r =.99 km, respectively. Thirty-nine shots are evenly spaced between z s =.5 km to z s =.95 km, and 99 receivers are located from z r =. km to z r =.99 km with z r =. km. For display purposes, we synthesize a time-domain solution of the target problem from frequency-domain fields, and show the recorded data for the center shot at z s = km in Figure a. The effective time-domain source is a boxcar.5-3 Hz bandpass filter. We use the constant velocity v = km/s as the initial model. The simulated data in Figure b using the initial velocity for the centered shot shows a traveltime error larger than a half-wavelength at the median frequency of Hz for the central data traces of the recorded data. For inversion, we use the data frequency band from 3 Hz to Hz. After LBFGS iterations, FWI is stuck in a physically meaningless solution, whereas MSWI produces a reasonable estimate of v, see Figure 3a-3b for the detailed result, respectively. To further confirm the kinmenatic accuracy of the MSWI solution, we display the extended sources at initial velocity (Figure 4a) and final inverted velocity (Figure 4b)..95 Depth (km) Distance (km) Figure : First crosswell example: target velocity model with slow Gaussian anomaly. Lowest velocity is.7 km/s

104 Huang, Nammour and Symes MSIVA Time (s) Time (s) Receiver (km) Receiver (km) Figure : First crosswell example: comparison of shot gathers for the center shot z s = km: (a) target data and (b) simulated data using initial velocity Depth (km).85.8 Depth (km) Distance (km).7 Distance (km) Figure 3: First crosswell example: inverted velocity after LBFGS iterations with 3- Hz bandpass data by (a) FWI and (b) MSWI (ε = ). Initial velocity is v = km/s in both cases.

105 MSIVA MSIVA Time (s) Time (s) Receiver (km) Receiver (km) Figure 4: First crosswell example: extended sources for central shot z s = km using (a) initial velocity, and (b) inverted velocity by MSWI We can understand this example on basis of the explanation presented in Song and Symes (994); Symes (994); Huang and Symes (5). To recapitulate briefly, the MSWI objective is closely related to the standard least-squares traveltime tomography objective when the model predicts a single-arrival wavefield. In fact both the gradient and Hessian of the MSWI objective are asymptotically proportional to the gradient and Hessian of the traveltime objective. This accounts for the tomography-like behaviour of the MSWI iteration. In contrast, the FWI objective is non-convex and requires a much better starting model than v = km/s to converge to a useful velocity estimate. Strong Low-Velocity Lens However, the story changes dramatically if the data exhibits multiple energetic arrivals, as pointed out already in Symes (994): in this case, the (unregularized) MSWI objective may be as non-convex as the FWI objective, and the MSWI algorithm can fail to produce kinematically accurate velocity estimates.

106 Huang, Nammour and Symes MSIVA A modification of the first example exhibits this pathology. The second example (Figure 5) keep the shape velocity anomaly, but makes it stronger: v(x,z) =.6e (x ).5 (z ).5 Now the lowest velocity of this model is.4 km/s. We use the same source and receiver geometry and the data frequency band as the first example. Synthetic data (Figure 6a-6c) for three shot position z s =.,.5, km shows energetic later arrivals. For comparison, Figure 7a-7c shows simulated data with initial velocity v = km/s at the same shot positions. The misfit of traveltimes in the initial data is even more severe than was the case in the first example, and FWI indeed fails as we expect (Figure 8a). However, MSWI without regularization also fails to provide to produce a kinematically accurate velocity after LBFGS iteration (Figure 8b). To understand why MSWI without regularization does not work in the presence of multipathing, recall that this implementation solves the normal equation by circulant deconvolution, that is, by division in the Fourier domain. Besides producing potentially non-causal source estimates, this approach will fail when the Fourier domain Green s function vanishes or comes close to vanishing. We simulated data in Figure 9a-9c with the trial velocity v t =.8v +.v, which is close to the true velocity model. We can see there are still quite obvious triplications present in the simulated data. Taking a single trace z r =.55 km (Figure a) for the center shot gather at z s = km for example, we plot the spectrum of the normal operator S T S, which is the same as the power spectrum of the Green s function, in Figure b. For traces with multiple energetic arrivals, the spectrum oscillates and almost vanishes at several frequencies, suggesting the existence of an effective numerical null space of the normal operator. Regularization therefore seems natural. We plot the extended source, estimated by solving equation with the trial velocity v t, for regularization parameters ε = 6,.5,5 in Figure a-i. Oscillatory and non-decaying traces are present in the extended source function f for ε = 6, but are suppressed by increasing ε. To illustrate the effect of the null space contributions, we plot the objective function for different values of ε = 6,.5,5 on a line segment in model space, between the initial and target velocities. Parameter α = corresponds to the target

107 MSIVA MSIVA 3 velocity, α = to the initial velocity. Figures a,b, and c show that the region of convexity of MSWI objecitive is quite small for small ε, however expands to include the initial velocity for large enough ε..9 Depth (km) Distance (km) Figure 5: Second crosswell example: target velocity model with slow Gaussian anomaly. Lowest velocity is.4 km/s Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 6: Second crosswell example: shot gathers of recorded data for shot at (a) z s =. km, (b) z s =.5 km and z s = km, respectively As mentioned before, the arrival time error in the initial 3- Hz data (Figure 7a-7c) is too large to permit successful FWI, starting with v = km/s (Figure 8a). Having thoroughly explored the behaviour of regularized MSWI with various values of ε, and having found that ε = 5 satisfactorily suppresses the non-decaying noise signal in the estimated extended source, we use that value in estimating the velocity by the algorithm described above. Fifteen LBFGS iterations produces the velocity estimate in Figure 3a. Both position and shape of Gaussian anormaly are well resolved. Additionally, the extended source functions are almost focused on the zero-lag time after regularized MSWI (Figure 4a-4c), indicating that the final velocity estimate is kinematically accurate. In view of its kinematic accuracy,

108 4 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 7: Second crosswell example: shot gathers of simulated data using initial veloctiy v for shot at (a) z s =. km, (b) z s =.5 km and z s = km, respectively.9.9 Depth (km).8.7 Depth (km) Distance (km).4 Distance (km) Figure 8: Second crosswell example: inverted velocity after LBFGS iterations with 3- Hz data by (a) FWI and (b) unregularized MSWI (ε = ). Initial velocity is v = km/s in both cases Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 9: Second crosswell example: shot gathers of simulated data using trial velocity v t =.8v +.v for shot at (a) z s =. km, (b) z s =.5 km and z s = km, respectively

109 MSIVA MSIVA Amplitude Spectrum Time (s) Frequency (Hz) Figure : Second crosswell example: (a) data trace at z r =.55 km for shot at z s = km and (b) spectrum of normal operator S T S of trace (a) the final estimate from MSWI should be a usable initial estimate for FWI: indeed, 5 LBFGS iterations of full-bandwidth FWI produces a quite accurate inversion (Figure 3b). Surface Acquisition Geometry In the remainder of this section, we apply regularized MSWI to waveform inversion to several examples with surface acquisition geometry, that is, both sources and receivers separated by a hyperplane from the scattering region. The extensive sampling of the objective function used to decide the value of ε in the second crosswell example is not generally feasible. Instead, we compute the relative initial data residual e and relative extended initial data residual ē : e = S[v ]f d d, ē = S[v ] f d d in which f is the known source wavelet and f is the extended source function, estimated by solving equation with the initial velocity. We adjusted ε by trialand-error until the ratio ē /e lay in the range..5. The (somewhat arbitrary) bounds are chosen to ensure that ε is large enough that the residual is substantially larger than zero (the expected value for ε = ), but small enough that the data is substantially better fit than is possible with a physical source (ε = ). This

110 6 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km).5.5 Receiver (km).5.5 Receiver (km) Time (s) Time (s) Time (s) Receiver (km).5.5 Receiver (km).5.5 Receiver (km) Time (s) Time (s) Time (s) Receiver (km).5.5 Receiver (km).5.5 Receiver (km) Figure : Second crosswell example: plots of regularized extended source functions for three regularization parameters ε = 6,.5,5 for shot at z s =.,.5, km, respectively. (a) ε = 6,z s =. km; (b) ε = 6,z s =.5 km; (c) ε = 6,z s = km; (d) ε =.5,z s =. km; (e) ε =.5,z s =.5 km; (f) ε =.5,z s = km; (g) ε = 5,z s =. km; (h) ε = 5,z s =.5 km; (i) ε = 5,z s = km; Objective Value.6.4 Objective Value.6.4 Objective Value Parameter (α) Parameter (α) Parameter (α) Figure : Second crosswell example: objective function evaluated at velocities ( α)v +αv, α, for various choices of regularization parameter ε: (a) ε = 6, (b) ε =.5, (c) ε = 5

111 MSIVA MSIVA Depth (km) Depth (km) Distance (km).4 Distance (km) Figure 3: Second crosswell example: (a) inverted velocity by regularized MSWI (ε = 5) after 5 iterations; (b) FWI result after 5 iterations using the regularized MSWI inversion in (a) as initial velocity Time (s) Time (s) Time (s) Receiver (km).5.5 Receiver (km).5.5 Receiver (km) Figure 4: Second crosswell example: extended sources after regularized MSWI inversion for shot at (a) z s =. km, (b) z s =.5 km and z s = km

112 8 Huang, Nammour and Symes MSIVA method is admittedly crude: we will return to the choice of ε in the discussion section. Diving Wave Inversion The predominant contemporary use of FWI is to invert diving wave energy (Virieux and Operto, 9; Vigh et al., 3). This example examines the use of MSWI, with and without regularization, for a model generating diving waves with triplications. The model is smooth on the wavelength scale, hence transparent: the data consists only of direct and diving waves, with no reflections. The target model (Figure 5a) consists of a low velocity Gaussian anomaly embedded in linearly increasing background velocity. receivers are placed at depth z r =.4 km from x r =.4 km to x r = 7.96 km with x r =.8 km. 67 shots are placed at depth z s =.8 km from x s =.4 km to x r = 7.96 km with x s =. km. The frequency band used in inversion is 5- Hz. The choice ɛ = gives ē = 6.45% and e =.95%, satisfying the criterion explained earlier. The initial model in the iterative inversion is the linearly increasing background (Figure 5b), which produces diving wave arrivals without triplication. Comparison of the data for target and initial data in Figures 6a-6c and Figure 6d-6f shows that as expected, first arrival times differ by well over a cycle, and of course the triplication structure does not appear in the initial data at all, Therefore one would expect FWI to stagnate far from a useful model estimate, as indeed happens (Figure 7a). Unregularized MSWI (ε = 6 ) also fails: this is a pure transmission problem, and precisely the same phenomenon occurs as in the strong lens crosswell example (Figure 7b). On the other hand, regularized MSWI with the choice of penalty weight ɛ = explained above produces an satisfactory model estimation in the same number of iterations (Figure 8). Examination of extended sources at initial and final regularized MSWI model suggests that the kinematics of the data have been adequately captured in the final model (Figures 9a, 9b, 9c, 9d, 9e, and 9f). The output of regularized MSWI also performs well as an initial model for FWI, as it has already matched the data arrival times. FWI with 5 iterations of LBFGS, starting at the model shown in Figure 8, produces the slightly more

113 MSIVA MSIVA 9 refined model shown in Figure. The simulated data (Figure a-c) generated by the final inverted results show the best match with the recorded data in Figure 6a-6c. 5 Depth (km) Distance (km) 5 Depth (km) Distance (km) Figure 5: Diving wave example: (a) target velocity model with Gaussian low velocity lens embedded in the linearly increasing background velocity and (b) v (z) linearly increasing initial model Marmousi We modified the Marmousi model (Bourgeois et al., 99) by adding a. km water layer, then truncating the depth range at.35 km. We used the central 6.35 km horizontal extent, yielding the velocity model plotted in Figure a. The model is gridded at x = z = 5 m. Source depth is z s =. km. Sources are evenly spaced from x s =.5 km to x s = 6.3 km with sampling rate.5 km, and the fixed spread receivers placed at z r =.5 km spans from x r =.5 km to x r = 6.3 km with x r =.5 km. The initial velocity (Figure b) is linearly

114 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 6: Diving wave example: shot gathers of recorded data for shot at at (a) x s =. km, (b) x s = km, and (c) x s = 4 km; Shot gathers of simulated data by initial velocity for shot at at (d) x s =. km, (e) x s = km, and (f) x s = 4 km increasing with depth below the water layer. The maximum time in the recorded data is 6 seconds and the frequencies used in the inversion are 4 to 8 Hz. The choice ε =.5 gives ē =.% and e =.6%, satisfying the criterion explained earlier. The data residual between recorded data and simulated data generated by initial model at x s =.,3.,6. km are plotted in Figure 3a-3c, respectively. The arrival times differ significantly for some events at large offset. FWI after LBGFS iterations yields the estimated model in Figure 4. Some fault reflections, which appear at large offsets, do not appear to be well-matched, accounting for the low- and high-velocity artifacts near the center of the model. Regularized MSWI produces the model shown in Figure 5a after LBFGS iterations. The fault structure appears in this result unobstructed by artifacts. The layer structure is also correct. Figure 6a-6c and Figure 6d-6f compare the estimated extended sources for the initial and final MSWI model. The large traveltime error in the large offet for initial velocity have been almost focused

115 MSIVA MSIVA 5 Depth (km) Distance (km) 5 Depth (km) Distance (km) Figure 7: Diving wave example: inverted velocity after iterations using 5- Hz data by (a) FWI and (b) unregularized MSWI (ε = 6 ) 5 Depth (km) Distance (km) Figure 8: Diving wave example: inverted velocity using 5- Hz data after iterations by regularized MSWI (ε = )

116 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 9: Diving wave example: top row: extended sources by initial model for shot at (a) x s =. km, (b) x s = km, and (c) x s = 4 km; Bottom row: extended sources by inverted velocity after regularized MSWI inversion for shot at (d) x s =. km, (e) x s = km, and (f) x s = 4 km 5 Depth (km) Distance (km) Figure : Diving wave example: inverted velocity by 5 FWI iterations, beginning with following regularized MSWI velocity (Figure 8)

117 MSIVA MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure : Diving wave example: shot gathers of simulated data with final MSWI+FWI inverted result (Figure ) for shot at (a) x s =. km, (b) x s = km, and (c) x s = 4 km, respectively Depth (km) Distance (km) Depth (km) Distance (km).5 Figure : Marmousi example: (a) Modified velocity model - water layer is. km deeper than original; (b) v (z) linearly increasing initial velocity

118 4 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 3: Marmousi example: Data residual between recorded data and data with initial velocity with 4-8 Hz source for shot at (a) x s =. km, (b) x s = 3. km, and (c) x s = 6. km Depth (km) Distance (km).5 Figure 4: Marmousi example: inverted velocity obtained by FWI after LBFGS iteration,s using 4-8 Hz data.

119 MSIVA MSIVA 5 on zero lag in time, which also indicates that the inverted model is kinematically correct, within the resolution afforded by the inversion frequency range. In particular, the model in Figure 5a should be a good initial model for FWI, and indeed 5 LBFGS iterations of FWI with frequency band 6- Hz data produces the high resolution result displayed in Figure 5b. Figures 7a-7c show the data residual produced by the final velocity after applying FWI, also a significant improvement of the initial velocity Depth (km) Distance (km) Depth (km) Distance (km).5 Figure 5: Marmousi example: (a) inverted velocity from regularized MSWI (ε =.5) after LBFGS iterations using 4-8 Hz data and (b) inverted velocity from FWI after 5 LBFGS iterations using 6- Hz data with the inverted result in (a) as initial velocity Pluto We modify a portion of the Pluto model (Stoughton et al., ) to create the target model in Figure 8a, mimicing an isolated salt pillow, gridded with a cell of. km. km. The velocity in the salt inclusion is 4.5 km/s; the background

120 6 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) 3 5 Receiver (km) 3 5 Receiver (km) 3 5 Receiver (km) Time (s) Time (s) Time (s) 3 5 Receiver (km) 3 5 Receiver (km) 3 5 Receiver (km) Figure 6: Marmousi example: Top row: regularized extended source functions from initial velocity for shot at (a) x s =. km, (b) x s = 3. km, and (c) x s = 6. km; Bottom row: regularized extended source functions from regularized MSWI velocity for shot at (d) x s =. km, (e) x s = 3. km, and (f) x s = 6. km Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 7: Marmousi example: Data residual between recorded data and data with final MSWI + FWI inverted velocity (Figure 5b) for shot at (a) x s =. km, (b) x s = 3. km, and (c) x s = 6. km

121 MSIVA MSIVA 7 medium is layer-like with velocity averaging. km/s. Source depth is z s =. km. Sources range from x s =.6 km to x s =.94 km. Fixed spread receivers range from x r =. km to x r =.98 km placed at depth z r =.4 km. The maximum time in the recorded data is 4 seconds and the frequency band is 4- Hz. We choose the regularization parameter ɛ =, for which initial relative errors are ē = 3.% and e = 6.6%, respectively. Note that in this example, positions of the sources and receivers permit undershooting, that is, transmitted and reflected ray paths that transit the region under the inclusion. Therefore the data should contain adequate kinematic information to determin the velocity throughout the model, except for the poorly illuminated edges. On the other hand, the large velocity contrast between the salt and surrounding sediments implies that FWI will likely fail to reconstruct the inclusion from an initial model (Figure 8b) in which it is absent. Indeed, FWI LBFGS iterations method locates the inclusion top, mispositions the bottom, and grossly underestimates the velocity in between (Figure 9). Regularized MSWI, in contrast, comes much closer to locating both top and bottom of the inclusion, and filling it with approximately correct velocity values, Figure 3a. Focus of the extended source estimates at zero time lag is much improved (Figure 3d-3f) over the initial model (Figure 3a-3c). In fact, the model depicted in Figure 3a appears to be a reasonable initial guess for FWI, providing approximately correct kinematics. A further 5 LBFGS iterations of FWI with frequency band 4-6 Hz data results in an accurate reconstruction of the inclusion (Figure 3b). The final data residual in Figure 3d-3f implies the inverted model has fit especially the refracted energy in the data quite well, in comparison to the initial data residual, shown in Figure 3a-3c. SEG/EAGE Salt Model The last example illustrates the (unsurprising) limitations on the performance of MSWI when the data provides inadequate illumination of the target. We extracted a D section from the SEG/EAGE salt model (Aminzadeh et al., 997), shown in Figure 33a, with.4 km.4 km grid cell. Acquisition geometry simulates towed streamer: the first shot is located at x s = 6. km, the last ones is placed

122 8 Huang, Nammour and Symes MSIVA Depth (km) Distance (km) Depth (km) Distance (km) Figure 8: Pluto example: (a) target velocity and (b) D initial velocity.5

123 MSIVA MSIVA Depth (km) Distance (km).5 Figure 9: Pluto example: inverted reuslt by FWI method after iterations using 4-Hz data at x s = 5.6 km, with x s =. km and z s =.8 km. The receiver depth is z r =.4 km, group interval is x r =.8 km, near offset is -. km and far offset is -6 km. The velocity in the salt inclusion is again 4.5 km/s and the surrounding sediment velocity ranges from.5 km/s in the water column to roughly 3 km/s in an overpressure zone to the left of the salt. The maximum time of recording is 8 seconds and we invert the data with - 5 Hz frequency band. The regularization parameter ɛ = 6, gives ē = % and e = 55%, conforming to the rule explained earlier. The initial velocity (Figure 33b) is linearly increasing v (z) with substantially larger dv/dz than the sediment model for SEG/EAGE Salt Model. Given the dramatic kinematic difference between initial and target models, FWI would be expected to stagnate. The inverted result by FWI after LBFGS iterations is shown in Figure 34. Top salt is reasonably well delineated, but no other major model features are correctly recovered. Regularized MSWI with LBFGS iterations produces the model shown in Figure 35. The top of the inclusion is accurately recovered, and to some extent the bottom, also the infill velocities are much closer to those of the target. However the salt root is not identified, and estimated velocity on the left below the overpressure zone, and on the right from about.5 km depth, are still far too high. We also compare the regularized extended source function for initial velocity and the inverted velocity by our method as showed in Figure 36a-36c and 36d-36f, re-

124 Huang, Nammour and Symes MSIVA Depth (km) Distance (km) Depth (km) Distance (km).5 Figure 3: Pluto example: (a) inverted result produced by regularized MSWI after iterations using 4- Hz data; (b) FWI result after 5 iterations using 4-6Hz data with regularized MSWI velocity in (a) as initial estimate.

125 MSIVA MSIVA Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Time (s) Time (s) Time (s) Receiver (km) Receiver (km) Receiver (km) Figure 3: Pluto example: Top row: regularized extended source functions with initial velocity for shot at (a) x s =.6 km, (b) x s =.5 km, and (c) x s =.94 km; Bottom row: regularized extended source functions with regularized MSWI velocity for shot at (d) x s =.6 km, (e) x s =.5 km, and (f) x s =.94 km

126 Huang, Nammour and Symes MSIVA Time (s) Time (s) Time (s) Receiver (km) 4 Receiver (km) 4 Receiver (km) Time (s) Time (s) Time (s) Receiver (km) 4 Receiver (km) 4 Receiver (km) Figure 3: Pluto example: (a) Data residual between recorded data and data simulated with initial velocity for shot at (a) x s =.6 km, (b) x s =.5 km, and (b) x s =.94 km; (b) Data residual between recorded data and simulated data using MSWI + FWI velocity (Figure 3b) for shot at (d) x s =.6 km, (e) x s =.5 km, and (f) x s =.94 km

127 MSIVA MSIVA Depth (km) Distance (km) 4.5 Depth (km) Distance (km) Figure 33: SEG Salt example: (a) D SEG/EAGE salt model; (b) v (z) linearly increasing initial velocity 4.5 Depth (km) Distance (km) Figure 34: SEG Salt example: inverted velocity obtained by FWI after 4 LBFGS iterations using -5 Hz data

128 4 Huang, Nammour and Symes MSIVA 4.5 Depth (km) Distance (km) Figure 35: SEG Salt example: inverted velocity obtained by regularized MSWI (ε = 6) using -5 Hz data after LBFGS iterations Time (s) Time (s) Time (s) Receiver Receiver 4 4 Receiver Time (s) Time (s) Time (s) Receiver Receiver 4 4 Receiver Figure 36: SEG Salt example: Top row: regularized extended source functions with initial velocity for shot at x s = 6.6,x s =.8, and x s = 5 km, respectively; Bottom row: regularized extended source functions with MSWI velocity (Figure 35) for shot at x s = 6.6,x s =.8, and x s = 5 km, respectively

129 MSIVA MSIVA Depth (km) Distance (km) Figure 37: SEG Salt example: inverted velocity obtained by FWI after 5 LBFGS iterations using -7 Hz data. MSWI velocity (Figure 35 used as initial estimate. spectively. Focusing at zero time lag is considerably improved, however note the lack of energy on the left of each display: at large offsets, the kinematic disagreement is so violent that the regularization takes over, and completely suppresses the signal. Consequently, 5 FWI iterations with a somewhat broader frequency band (-7 Hz), starting at the MSWI estimate (Figure 37), improves the shallow structure and resolution of the top and bottom of salt somewhat, but does not correct the errors induced by lack of illumination of the deeper structure. We particularly point out the difficulty of correcting the subsalt velocity. This is a well-known issue in the imaging research community, and originates in the lack of energy penetration and aperture in those regions. It has also been recognized, in different form, in the mathematical inverse problems literature, where it has been recognized for a long time that stereotomographic data is generally insufficient to determine slowness in low-velocity regions surrounded by high-velocity zones (see for instance Stefanov and Uhlmann (9)). The region under salt in this example is effectively enclosed, due to the limited range of source and receiver locations. DISCUSSION We have already mentioned in the Introduction that the source-receiver extension used here is the basis for a number of other algorithms, for example Adaptive Waveform Inversion (Warner and Guasch, 4). Other source extension concepts have also been productive. Waveform Reconstruction Inversion as described by

130 6 Huang, Nammour and Symes MSIVA van Leeuwen and Herrmann (3) is roughly equivalent to introducing artificial sources everywhere. Contrast Source Inversion, described for example by Abubakar et al. (), may also be regarded as a source extension approach. Other extended modeling modifications of FWI have been motivated by wave equation migration velocity analysis ( WEMVA, (Biondi and Sava, 4)). These WEMVA-like extensions add parameters to the velocity model itself, for example subsurface offset (space shift) (Shen et al., 3, 5; Khoury et al., 6; Shen and Symes, 8; Shen, ; Sun and Symes, ; Biondi and Almomin, ; Weibull and Arntsen, 4; Lameloise et al., 5; Shen and Symes, 5), time shift (Yang and Sava, ; Biondi and Almomin, 4), scattering angle (De Hoop et al., 3; Shen and Calandra, 5), shot coordinates (Symes and Carazzone, 99; Symes and Kern, 994; Chauris and Plessix, 3), and surface offset (Mulder and ten Kroode, ; Chauris and Noble, ). The common feature in all of these extension based modifications to FWI is their tendency to produce the same sort of long-wavelength velocity updates as does traveltime tomography, that is, to extract kinematic information from the data. In some cases, it is actually possible to see why this is so. Huang and Symes (5) show that under certain restricted assumptions (no multiple ray paths), source-receiver MSWI for crosswell waveform tomography is equivalent to traveltime inversion, in the sense that the respective objective functions have the same stationary points. Similar computations show that subsurfacee offset extended waveform inversion and reflection slope tomography have proportional Hessians at a global solution (Symes, 4; ten Kroode, 4). We have addressed a major difficulty in source-receiver MSWI, its tendency to develop apparent multimodality in the event that energy arrives in the data along multiple ray paths. This phenomenon presents a real impediment to using MSWI for crosswell tomography, for example, since waveguides are quite commonly encountered in that application. Multiple arrivals are also common in diving wave fields. We have advanced an explanation in the context of periodic convolutional modeling, namely the presence of spectral zeros and near-zeros of the Hessian, and partly recovered convexity by Tihonov regularization, a standard technique for selecting solutions of ill-posed problems. We do not expect that this remedy will be the last word on the subject, but it does seem to be effective in many settings, including cartoon examples of crosswell tomography and diving wave FWI, as the examples show. We have used a very large number of LBFGS iterations (hundreds in several

131 MSIVA MSIVA 7 examples). The need for so many iterations may be linked to the rather small amount of data used (an almost all cases, 5 - frequencies), which of course also makes the iterations rather inexpensive. We believe this computational largesse is appropriate for a study designed to prove the MSWI concept. However it certainly begs the question: can reasonable results be obtained in a more reasonable number of RTM applications, say O()? Our use of circulant convolution to model the relation between source and data traces is computationally convenient. In particular, it enables inexpensive and machine-precision solution of the normal equation (optimization problem 4) as the matrices involved are diagonal. As our examples are both computed and inverted entirely in the frequency domain, this is an appropriate methodology. However this methodology is commonly termed an inverse crime - that is, the same tools used for modeling as for inversion. Field data comes in the form of time domain traces, with finite duration and not necessarily amplitude-decaying. A time cutoff is necessary, either implicitly or explicitly, and that operation does not commute with convolution. Therefore an appropriate version of problem 4 applicable to field data will need a different solution mode, either Gaussian elimination or the Levinson algorithm, or (more likely) an iterative solver such as conjugate gradient iteration. An inexact solve of the inner problem brings up another difficulty. The velocity gradient involves the linearized operator or migration operator, as noted in the Appendix. However this operator is not continuous in the sense in which iterative solutions of the inner problem converge, rendering the accuracy of the computed gradient questionable. See Symes (5) for more discussion on this point. Limited bandwidth masks this problem to some extent. However it acts as a barrier to improved resolution. The cited reference outlines some recent theoretical progress in using special accelerators for the inner problem to overcome the discontinuity of the gradient. Note that the use of circulant convolution and consequent machine-precision solution of the inner problem by deconvolution eliminates the source of outer gradient inaccuracy just described. The sharp-eyed reader familiar with the Warner s AWI algorithm (Warner and Guasch, 4) will notice that our definition of the objective function (equation 3) lacks the normalization by the mean square of f employed in AWI. There are excellent reasons to normalize such an objective. The principal reason is the am-

132 8 Huang, Nammour and Symes MSIVA plitude trade-off that occurs in reflection: data fit to a reflection is not affected, to first order, by scaling the reflectivity up and the source down by the same factor. Therefore one would not expect an effective velocity update in the reflection case without normalization of the source. The reader might well ask how we got away with it. The answer is that transmitted waves do not suffer from this scale ambiguity, and all of our examples contained transmitted waves. In particular, the data in the reflection-dominated examples (Marmousi, Pluto) contained direct wave energy, as the boundary conditions used here are absorbing in all directions so the dipole effect of the free surface is absent. Clearly this device is not one to rely upon, and a modification of our algorithm to normalize J following the model of Warner and Guasch (4) (an many previous works on joint sourcemodel estimation, such as (Minkoff and Symes, 997)) is indicated. As pointed out by Warner and Guasch (4), the additional computational expense from normalization is minimal. CONCLUSION We have presented an extended modeling approach to overcoming the tendency of FWI to stagnate at uninformative models. The key ingredient in this approach is the addition of parameters to the model, in the form of unphysical source parameters, which allow the model to fit the data at every stage of the inversion. Specifically, we allow the source pulse to depend on the source and receiver coordinates. Other choices of source extension are possible, see below. To eliminate the unphysical additional parameters and recover an FWI solution, we have imposed a penalty on the variance of the extended source across source and receiver. Since the extended source is uniquely determined by the data and the velocity model, the penalty is a function of the velocity. We have called minimization of this objective matched source waveform inversion (MSWI). An iterative local optimization method applied to MSWI produces a sequence of models approaching a solution of the FWI problem, even starting with a grossly inaccurate initial velocity estimate. The approach is close enough that convergent FWI iteration can be started from an MSWI solution. Thus in a sense the combination MSWI + FWI appears to globalize the convergence of FWI, at least in some cases.

ACKNOWLEDGEMENT

GH and WWS are grateful to TOTAL E&P USA and to the sponsors of The Rice Inversion Project for their support. We thank TOTAL E&P USA for permission to publish this study. We acknowledge the Rice University Computing Support Group for providing HPC resources that have contributed to the research results reported here.
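The appendix that follows derives the velocity gradient; its final formula (equation A-6) is, on sampled wavefields, just a zero-lag cross-correlation of the twice time-differentiated incident field with the adjoint field, summed over shots. The NumPy sketch below is ours, not the paper's code; the array names and layout are assumptions, and G and v stand for precomputed solutions of equations A-4 and A-5.

import numpy as np

def zero_lag_gradient(G, v, dt):
    # Discrete analogue of equation A-6: sum over shots of the zero-lag
    # correlation of d^2 G / dt^2 with the adjoint field v.
    # G, v : arrays of shape (nshot, nt, nz, nx), sampled on the same grid
    # dt   : time step
    Gtt = np.gradient(np.gradient(G, dt, axis=1), dt, axis=1)  # second time derivative
    return np.sum(Gtt * v, axis=(0, 1)) * dt                   # sum over shots, integrate in t

# toy usage with random fields
rng = np.random.default_rng(1)
G = rng.standard_normal((2, 100, 30, 40))
v = rng.standard_normal((2, 100, 30, 40))
grad_m = zero_lag_gradient(G, v, dt=0.004)   # gradient with respect to m, shape (nz, nx)

In practice the two fields would be produced by a time-stepping solver and correlated on the fly rather than stored whole; the imaging condition itself is just this weighted sum.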

APPENDIX: COMPUTATION OF GRADIENT

In this section, we show how to compute the gradient of the inversion velocity analysis objective function via the trace-based extension. The perturbation of the objective function with respect to the model $m = 1/v^2$ is given by
$$\delta J[m] = \langle A^T A \bar{f}, \delta \bar{f} \rangle.$$
Using the matched source equation, we have
$$N_\epsilon\, \delta \bar{f} = \delta \bar{S}^T d - \delta N_\epsilon\, \bar{f} = \delta \bar{S}^T (d - \bar{S}\bar{f}) - \bar{S}^T \delta \bar{S}\, \bar{f}. \qquad (A-1)$$
Let us introduce an auxiliary extended source $w(x_r,t,x_s)$ such that
$$N_\epsilon\, w = A^T A \bar{f}. \qquad (A-2)$$
Then, combining (A-1)-(A-2), we have
$$\delta J[m] = \langle w,\; \delta \bar{S}^T (d - \bar{S}\bar{f}) - \bar{S}^T \delta \bar{S}\,\bar{f} \rangle = \langle \delta \bar{S}\, w,\; d - \bar{S}\bar{f} \rangle - \langle \bar{S} w,\; \delta \bar{S}\,\bar{f} \rangle.$$
Note that $(\delta \bar{S}\,\bar{f})(x_r,t;x_s) = (\bar{f} * \delta \bar{S}\delta)(x_r,t;x_s)$, and that
$$\langle \bar{S}w,\; \bar{f} * \delta \bar{S}\delta \rangle = \langle \bar{f} \star \bar{S}w,\; \delta \bar{S}\delta \rangle,$$
where $\star$ denotes cross-correlation. A similar transformation of the other term above shows that
$$\delta J[m] = \langle r,\; \delta \bar{S}\delta \rangle,$$
where the residual $r(x_r,t,x_s)$ is
$$r = w \star (d - \bar{S}\bar{f}) - \bar{f} \star \bar{S}w$$
(note that this is a trace-by-trace computation). Therefore
$$\nabla J[m] = (\delta \bar{S}\delta)^T r. \qquad (A-3)$$
The transpose of the impulsive Born simulation operator $\delta \bar{S}\delta$ is the well-known impulsive reverse time migration operator, that is, the zero-lag cross-correlation

between the incident Green function $G$ and the adjoint field $v$, with the residual $r$ as the adjoint source:
$$\left[ m \frac{\partial^2}{\partial t^2} - \nabla^2 \right] G(x,t;x_s) = \delta(x - x_s)\,\delta(t), \qquad (A-4)$$
$$\left[ m \frac{\partial^2}{\partial t^2} - \nabla^2 \right] v(x,t;x_s) = \sum_{x_r} \delta(x - x_r)\, r(x_r,t;x_s), \qquad (A-5)$$
and the explicit form of equation A-3 is
$$\nabla J[m](x) = \sum_{x_s} \int_0^T \frac{\partial^2 G}{\partial t^2}(x,t;x_s)\, v(x,t;x_s)\, dt. \qquad (A-6)$$
This is the gradient of $J$ with respect to $m = 1/v^2$. To obtain the gradient with respect to $v$, multiply the right-hand side of equation A-6 by $-2/v^3$.

REFERENCES

Abubakar, A., G. Pan, M. Li, L. Zhang, T. Habashy, and P. van den Berg,, Three-dimensional seismic full-waveform inversion using the finite-difference contrast source inversion method: Geophysical Prospecting, 59, Aminzadeh, F., J. Brac, and T. Kunz, 997, SEG/EAGE salt and overthrust models, in SEG/EAGE 3D Modeling Series: Society of Exploration Geophysicists,. Biondi, B., and A. Almomin,, Tomographic full waveform inversion (TFWI) by combining full waveform inversion with wave-equation migration velocity analysis: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SI9.5., 4, Simultaneous inversion of full data bandwidth by tomographic fullwaveform inversion: Geophysics, 79, WA9 WA4. Biondi, B., and P. Sava, 4, Wave-equation migration velocity analysis - I: Theory, and II: Subsalt imaging examples: Geophysics, 5, Bourgeois, A., P. Lailly, and R. Versteeg, 99, The Marmousi model, in The Marmousi Experience: IFP/Technip. Bunks, C., F. Saleck, S. Zaleski, and G. Chavent, 995, Multiscale seismic waveform inversion: Geophysics, 6,

136 3 Huang, Nammour and Symes MSIVA Chauris, H., and M. Noble,, Two-dimensional velocity macro model estimation from seismic reflection data by local differential semblance optimization: applications synthetic and real data sets: Geophysical Journal International, 44, 4 6. Chauris, H., and R.-E. Plessix, 3, Differential waveform inversion - a way to cope with multiples?: Presented at the 75th Conference EAGE, European Association for Geoscientists and Engineers. (Expanded Abstract). De Hoop, M. V., S. Brandsberg-Dahl, and B. Ursin, 3, Seismic velocity analysis in the scattering-angle/azimuth domain: Geophysical Prospecting, 5, Engl, H., M. Hanke, and A. Neubauer, 996, Regularization of inverse problems: Kluwer Academic Publishers. Gauthier, O., A. Tarantola, and J. Virieux, 986, Two-dimensional nonlinear inversion of seismic waveforms: Geophysics, 5, Huang, G., and W. Symes, 5, Full waveform inversion via matched source extension: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Khoury, A., W. Symes, P. Williamson, and P. Shen, 6, DSR migration velocity analysis by differential semblance optimization: 76th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SPMI3.4. Lameloise, C.-A., H. Chauris, and M. Noble, 5, Improving the gradient of the image-domain objective function using quantitative migration for a more robust migration velocity analysis: Geophysical Prospecting, 63, Luo, S., and P. Sava,, A deconvolution-based objective function for waveequation inversion: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Minkoff, S. E., and W. Symes, 997, Full waveform inversion of marine reflection data in the plane-wave domain: Geophysics, 6, Mulder, W., and F. ten Kroode,, Automatic velocity analysis by differential semblance optimization: Geophysics, 67, Nocedal, J., and S. Wright, 999, Numerical Optimization: Springer Verlag. Plessix, R.-E.,, Automatic cross-well tomography: an application of the differential semblance optimization to two real examples: Geophysical Prospecting, 48, Plessix, R.-E., Y.-H. de Roeck, and G. Chavent, 999, Waveform inversion of reflection seismic data for kinematic parameters by local optimization: SIAM Journal on Scientific Computation,, Plessix, R.-E., W. Mulder, and F. ten Kroode,, Automatic cross-well tomogra-

137 MSIVA MSIVA 33 phy by semblance and differential semblance optimization: theory and gradient computation: Geophysical Prospecting, 48, Pratt, R., 999, Seismic waveform inversion in the frequency domain, part : Theory, and verification in a physical scale model: Geophysics, 64, Pratt, R., and R. Shipp, 999, Seismic waveform inversion in the frequency domain, part : Fault delineation in sediments using crosshole data: Geophysics, 64, Pratt, R., and W. Symes,, Semblance and differential semblance optimization for waveform tomography: a frequency domain approach: Journal of Conference Abstracts, 7, Shen, P.,, An RTM based automatic migration velocity analysis in image domain: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SVE.. Shen, P., and H. Calandra, 5, One-way waveform inversion within the framework of adjoint state differential migration: 75th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SI3.5. Shen, P., and W. Symes, 8, Automatic velocity analysis via shot profile migration: Geophysics, 73, VE49 6., 5, Horizontal contraction in image domain for velocity inversion: Geophysics, 8, R95 R. Shen, P., W. Symes, S. Morton, and H. Calandra, 5, Differential semblance velocity analysis via shot profile migration: 75th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SPVA.4. Shen, P., W. Symes, and C. C. Stolk, 3, Differential semblance velocity analysis by wave-equation migration: 73rd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Sirgue, L., and G. Pratt, 4, Efficient waveform inversion and imaging: a strategy for selecting temporal frequencies: Geophysics, 69, Song, H., and W. Symes, 993, A differential semblance formulation of crosswell tomography: Technical Report TR9-, Department of Computational and Applied Mathematics, Rice University, Houston, TX., 994, Inverison of crosswell tomography via differential semblance optimization: Technical report, The Rice Inversion Project. (Annual Report). Stefanov, P., and G. Uhlmann, 9, Local lens rigidity with incomplete data for a class fo non-simple Riemannian manifolds: Journal of Differential Geometry, 8, Stoughton, D., J. Stefani, and S. Michell,, -D elastic model for wavefield investigations of subsalt objectives, deep water Gulf of Mexico: 7th Annual

138 34 Huang, Nammour and Symes MSIVA International Meetting, SEG, Expanded Abstacts, Sun, D., and W. Symes,, Waveform inversion via nonlinear differential semblance optimization: 8nd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, SI3.3. Symes, W., 994, Inverting waveforms in the presence of caustics: Technical report, The Rice Inversion Project. (Annual Report)., 4, Seismic inverse problems: recent developments in theory and practice: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics., 5, Algorithmic aspects of extended waveform inversion: Presented at the 76th Annual International Conference and Exhibition, Expanded Abstract, European Association for Geoscientists and Engineers. Symes, W., and J. J. Carazzone, 99, Velocity inversion by differential semblance optimization: Geophysics, 56, Symes, W., and M. Kern, 994, Inversion of reflective seismograms by differential semblance analysis: Algorithm structure and synthetic examples: Geoph. Prosp., in press. (also: Tech. Rep. TR9-3 CAAM Rice University). Tarantola, A., 984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, ten Kroode, F., 4, A Lie group associated to seismic velocity estimation: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics. van Leeuwen, T., and F. J. Herrmann, 3, Mitigating local minima in fullwaveform inversion by expanding the search space: Geophysical Journal International, 95, Vigh, D., K. Jiao, W. Huang, N. Moldoveanu, and J. Kapoor, 3, Long-offsetaided Full-waveform Inversion: 75th Annual Meeting, Expanded Abstracts, European Association of Geoscientists and Engineers, 76th Annual Meeting. Virieux, J., and S. Operto, 9, An overview of full waveform inversion in exploration geophysics: Geophysics, 74, WCC7 WCC5. Warner, M., and L. Guasch, 4, Adaptive waveform inversion: Theory: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Weibull, W., and B. Arntsen, 4, Anisotropic migration velocity analysis using reverse time migration: Geophysics, 79, R3 R5. Yang, T., and P. Sava,, Wave-equation migration velocity analysis with timeshift imaging: Geophysical prospecting, 59,

The Rice Inversion Project, TRIP6, July 6, 6

Gradients of Reduced Objective Functions for Separable Wave Inverse Problems

William W. Symes, The Rice Inversion Project

ABSTRACT

The derivative of the forward (modeling) map for wave inverse problems is unbounded in the natural metric of the model and data spaces. This unboundedness complicates the accurate computation of the reduced gradient in separable wave inverse problems: iterative estimation of the linear variables may not lead to convergent computation of the gradient. If the linear part of the modeling operator possesses a computable inverse up to a smoothing error, however, and if the data is high-frequency in a precise sense, then the gradient may be decomposed into a computable non-iterative part and a convergent remainder. The resulting form of the gradient error control is well-suited to integration with inexact trust-region optimization methods.

INTRODUCTION

Separable (partly linear) modeling operators for wave inverse problems arise either via Born approximation (Kern and Symes, 994; Chauris and Noble, ; Mulder and ten Kroode, ; Shen and Symes, 8; Weibull and Arntsen, 3; Biondi and Almomin, 4) or by including source (right-hand side) parameters as model degrees of freedom (Song and Symes, 994; Plessix, ; Pratt and Symes, ; Luo and Sava, ; Warner and Guasch, 4; van Leeuwen and Herrmann, 3). Extended model spaces add yet more degrees of freedom to the linear parameters of a separable modeling operator, to permit data fit for a large set of nonlinear parameter ("velocity model") values. The non-extended ("physical") subspace is the kernel of an annihilator. Minimizing the norm of the annihilator output over all data-fitting models picks out an approximation to the solution of the physical least squares inverse problem. A natural approach to this

minimization is via elimination of the linear variables to create a reduced function of the nonlinear variables (Kern and Symes, 994; van Leeuwen and Mulder, 9; Li et al., 3). If the nonlinear parameters include velocities, indices of refraction, or equivalent, then the modeling operator is differentiable with loss of regularity: a convergent sequence of approximate solutions of the data-fit problem does not (necessarily) lead to a convergent sequence of derivative values. Consequently, the computable approximate solutions of the data-fit problem do not lead to convergent gradient approximations for the reduced objective, which may reduce the effectiveness of iterative optimization strategies (Huang and Symes, 5).

In the paragraphs to follow, I show that the gradient computation may be modified to become convergent if three conditions are satisfied:

- the linear part of the forward modeling operator possesses a computable parametrix (inverse modulo compacts, or "asymptotic inverse") in the form of its adjoint with respect to suitable weighted norms;
- the inverse problem is embedded in a family of problems indexed by a data frequency parameter; and
- the (inevitable) Tikhonov regularization parameter is coupled to data frequency.

Then the reduced gradient computation may be modified to become convergent as well.

Reliance on parametrices might seem surprising. However, parametrices for linearized inverse wave problems have a long history, mostly as a subtopic of computational geometric optics (Cohen and Bleistein, 977; Beylkin, 985; Beylkin and Burridge, 99; Virieux et al., 99; Araya and Symes, 996; Operto et al., ). Recently several authors have described direct ("wave equation") parametrix constructions without any ray-geometric computations (Zhang et al., 5, 7; Xu et al., ; ten Kroode, ; Hou and Symes, 5). For inversion of source parameters, even the usual exact inversion by Fourier division is actually a parametrix, due to the inevitable presence of time cut-off.

The existence of the parametrix permits the decomposition of the gradient into two summands: the first does not involve the solution of the inner (linear)

141 MSIVA MSIVA 37 problem at all, while the second does but is manifestly continuous in the linear variables. Hence approximation of the linear part of the solution by a convergent iteration results in a convergent gradient approximation. While convergence is assured by the three conditions above, properly interpreted, control over the actual error in the gradient is indirect, depending on quantities that are difficult or impossible ot estimate accurately. However I also show that gradient error control of the type established here is compatible with so-called inexact trust-region optimization algorithms (Heinkenschloss and Vicente, ), with (in principle) assured convergence to a stationary point. That is, control over gradient error must be linked to convergence metrics of the optimization algorithm. Some uses of model extension for inverse wave problems have been criticized for so-called gradient artifacts, that is, computed gradients that appear to give rise to slow convergence or undesired model features (Fei and Williamson, ; Vyas and Tang, ). However it now appears that nothing was wrong with the gradients - instead, the reduced objective functions of separable least squares problems were defined with insufficiently precise solution of the linear (inner) inverse problem (Kern and Symes, 994; Liu et al., 4; Lameloise et al., 5). The present paper could be viewed as an elaboration upon the latter work. OPERATORS This is a Hilbert space story. In fact, the natural objects in this study are Hilbert space scales, that is sequences of Hilbert spaces H = {H s,s Z}, decreasing in the sense that H s+ H s,s Z. Define the sets H = s Z H s, H = s Z H s. The inner product in H s will be denoted, s, and the corresponding norm is s. Usually several scales will be involved in most of our assertions; the scale to which the inner product or norm belongs should be clear from context. An operator L of order k Z from a scale H to a scale H is linear function from H to H for which L H s B(H s,hs k ) for all s Z. Denote by Op k (H,H ) a set of operators of order k, satisfying the additional order properties: for L i

142 38 Huang, Nammour and Symes MSIVA Op k i (H,H ),i =,, L L Op k +k (H,H ), [L,L ] Op k +k (H,H ). () Inv k (H,H ) Op k (H,H ) is the subset of invertible operators of order k. The motivating example of this structure are scales of L Sobolev spaces, and scalar (pseudo-)differential operators on them. The domain of a separable extended model consists of two components. The first is a (background) model Hilbert space M b and an open set of admissible models U M b. In all examples, M b is a space of (possibly) vector-valued) smooth (C, low frequency ) functions on a suitable physical domain representing the physical parameters of a wave dynamics model, and members of U obey additional constraints (such as bounds) required to make the dynamical laws they parametrize well-posed. The other component of the model domain is a Hilbert space scale of extended models M = { M k,k Z}. A distinguished subscale M = {M k,k Z},M k M k, represents the non-extended or physical models. The annihilator A Op ( M,N) has M as its kernel: m M, A m = m M. (). The extension map E Op (M, M) is injective. The model range or data space is another Hilbert scale D. The extended modeling operator, or forward map, is a continuous function It is smooth with loss of regularity: but and in particular However, the normal operator is smooth F : U Op ( M,D) m b F[m b ] M k Cp (U,B( M k,d k p )), (3) m b F[m b ] M k Cp (U,B( M k,d k p+ )), (4) m b F[m b ] M Cp (U,B( M,D )). (5) F T F C (U,Op ( M, M), (6)

143 MSIVA MSIVA 39 as follows from the factorization lemma: D F[m b ]δm b = F[m b ](Q [m b ]δm b ) (7) in which Q C (U M b,op ( M, M)) is essentially skew-symmetric: Q + Q T = Q C (U M b,op ( M, M)). (8) Finally, assume that F has the Egorov property: if L C (U,Op k ( M, M)), then there exists L C (U,Op k (D,D)) so that for every m b U, and vis-versa. F[m b ]L [m b ] = L [m b ] F[m b ]. (9) Remark: In the examples, F[m b ] is a Fourier Integral Operator whose canonical relation is a local canonical graph, and whose symbol and canonical relation depends smoothly on m b. Both the regularity with loss of a derivative (Blazek et al., 3) and normal regularity of F T F follow from these facts: in this case, F T F is a pseudodifferential operator. For examples of the factorization lemma, see ten Kroode (4); Symes (4). The Egorov property follows from the usual Egorov s theorem (see for example Taylor (98)). The physical modeling operator is a similar map F : U Op (M,D) with the same smoothness with loss of regularity. The extended and physical modeling operators are related through the extension map: for each m b U, F[m b ] = F[m b ] E. OBJECTIVE In the notation just introduced, the separable inverse problem of this paper may be crudely stated as: Given d D, find m M b,m l M so that F[m]m l d ()
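The Sobolev example mentioned above can be made tangible with a few lines of NumPy: on a periodic grid, the H^s norms are weighted sums of Fourier coefficients, and differentiation is a Fourier multiplier of order 1, so it is bounded from H^s to H^(s-1). The sketch below is purely illustrative and its names are ours.

import numpy as np

def sobolev_norm(u, s, L=2 * np.pi):
    # Discrete periodic H^s norm: (sum over k of (1 + k^2)^s |u_hat(k)|^2)^(1/2)
    n = u.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    uhat = np.fft.fft(u) / n
    return np.sqrt(np.sum((1.0 + k**2)**s * np.abs(uhat)**2))

def derivative(u, L=2 * np.pi):
    # Spectral derivative: the Fourier multiplier i*k, an operator of order 1
    n = u.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

# the ratio ||Du||_{s-1} / ||u||_s is bounded by 1, uniformly in s and u
x = np.linspace(0.0, 2 * np.pi, 256, endpoint=False)
u = np.exp(np.sin(3 * x)) * np.cos(5 * x)
for s in (0, 1, 2):
    print(s, sobolev_norm(derivative(u), s - 1) / sobolev_norm(u, s))

Operators of nonzero order in the sense just defined behave like such wavenumber-weighted multipliers, only with more elaborate symbols.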

144 4 Huang, Nammour and Symes MSIVA In view of the extension structure, this problem is equivalent to Given d D, find m M b, m M so that F[m] m d, A m () For consistent data (that is, for which can be replaced with = in either of the above statements, a solution is also a global minimizer of both F[m] m d and A m () The fundamental difficulty of this type of problem lies in the mapping properties of m F[m] or F[m]: because of these maps are differentiable only with loss of regularity (statements 3, 4, and 5), the first of the two functions in display (the data residual ) is not smooth as a map M b M R. This lack of smoothness is the ultimate reason that solution of the basic full waveform inversion problem is still a topic of active research, rather than a routine tool in exploration and academic seismology, more than thirty years after it was introduced. Two objectives such as can be combined in various ways to produce constrained and unconstrained optimization problems. Both are quadratic in m, so minimization of either in this variable seems a tractable problem. Minimizing the second ( semblance ) function so that the data residual becomes a function of m only (a so called reduced objective) does not improve matters: the resulting unconstrained optimization is equivalent to and has a non-smooth objective. Remarkably, eliminating m by minimizing the data residual, yields a smooth reduced semblance objective: this conclusion will be justified below. Generally it cannot be assumed that F is coercive, even if it is injective. Therefore some form of regularization must be added to the data residual to guarantee existence of a stable minimizer. It is also turns out to be essential to permit more freedom in the metric structure of domain and range spaces also. Introduce a data weighting operator W d C (U,Inv (D,D)). When restricted to D, W d is self-adjoint and positive definite, hence induces a norm on the order data space: d,d d = d,w d d for d,d D (3) Similarly, introduce a model weight operator W m C (U,Inv ( M, M)) on the model space, self-adjoint and positive definite when restricted to M, with corresponding norm and inner product m, m m = m,w d m for m, m M (4)
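A finite-dimensional caricature may help fix ideas about the elimination of the linear variables mentioned above: for each value of the nonlinear parameter, solve a regularized least-squares problem for the extended (linear) variables, then measure the annihilator output. The sketch below is ours; the weight operators of equations 3 and 4 are taken to be identities, R is the identity, and the matrix family standing in for the extended modeling operator is arbitrary, so this is only a cartoon of the structure, not of the wave-equation operators.

import numpy as np

def reduced_objective(m, d, A, lam, build_F):
    # Variable-projection ("reduced semblance") objective:
    # solve (F^T F + lam^2 I) m_ext = F^T d, then return 0.5 ||A m_ext||^2
    F = build_F(m)                                # extended modeling matrix for parameter m
    N = F.T @ F + lam**2 * np.eye(F.shape[1])     # regularized normal operator
    m_ext = np.linalg.solve(N, F.T @ d)           # eliminated linear variables
    return 0.5 * np.linalg.norm(A @ m_ext)**2

# toy usage: data generated by a "physical" model (last 8 extended components zero)
rng = np.random.default_rng(2)
F0, F1 = rng.standard_normal((40, 12)), rng.standard_normal((40, 12))
build_F = lambda m: F0 + m * F1
A = np.hstack([np.zeros((8, 4)), np.eye(8)])      # annihilator: kernel = physical models
m_true = np.concatenate([rng.standard_normal(4), np.zeros(8)])
d = build_F(0.3) @ m_true
print([round(reduced_objective(m, d, A, 0.01, build_F), 4) for m in (0.0, 0.3, 0.6)])

In this toy the objective is near zero at the parameter value used to generate the data, while elsewhere the data fit forces energy into the annihilated components.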

145 MSIVA MSIVA 4 Reformulate data residual minimization for the linear variables m as min m F[m] m d d + λ R m m (5) in which R is a bounded and coercive regularization operator, whose range is some other Hilbert space. The normal equation is N λ [m] m = F [m]d, N λ [m] = F [m] F[m] + λ R R, (6) in which F = Wm F[m] T W d [m] is the adjoint of F with respect to the weighted norms and F T is the adjoint with respect to the scale -norms. Similarly R is the adjoint of R with respect to the weighted model-space norm. Denote by m λ C (U D, M ) the solution of 6, and define the reduced objective function J λ [m,d] = A m λ[m,d] (7) If J λ [m,d] attains its obvious lower bound, then m λ [m,d] M and (m, m λ [m,d]) solves the extended inverse problem. GRADIENT Minimization of J λ by any variant of Newton s method requires computation of the gradient of J λ. Formally, DJ λ [m,d]δm = D m λ [m,d]δm,a T A m λ [m,d] = N λ [m] (DN λ [m]δm) m λ [m,d] + N λ [m] (D F [m]δm)d,a T A m λ [m,d] Inserting W m Wm, referring to the definition of, m, and noting that Wm A T = A and that N λ is self-adjoint in the model norm, obtain = (DN λ [m]δm) m λ [m,d] + (D F [m]δm)d, q λ [m,d] m (8) where q λ [m,d] is the solution of N λ [m] q λ [m,d] = A A m λ [m]. (9) The first term in right-hand side of equation 8 defines a continuous function of m λ [m,d]. Since N λ = Wm F T W d F + λ R R, the rule 9 implies the existence of

146 4 Huang, Nammour and Symes MSIVA P C (U,Inv ( M, M)) so that N λ = Wm F T FP + λ R R. This shows in turn that N λ C (U,Op ( M, M)), thanks to the assumptions on W d,w m, 9 once again, and the 6. In particular, DN λ [m]δm Op ( M, M), so the bilinear form is continuous in M M. Also, q, m (DN λ [m]δm) m, q m () (DN λ [m]δm) m, q m = (D( F [m])δm) F[m] m, q m + ( F [m](d F[m]δm) m, q m = W m [m] (DW m [m]δm) F [m] F[m] m, q m + F [m]w d [m] DW d [m] F[m] m, q m + (D F[m]δm) F[m] m, q m + F [m](d F[m]δm) m, q m = (DW m [m]δm) F [m] F[m] m, q + (DW d [m]δm) F[m] m, F[m]q + F[m] m,(d F[m]δm) q d + (D F[m]δm) m, F[m] q d () Concerning the second term in 8, note that (D F [m]δm) = D[(W m [m]) F[m] T W d [m]d](δm) = W m [m] (DW m [m]δm) F [m]d + F [m]w d [m] DW d [m]d + (D F[m]δm) d () The first two terms in equation involve only bounded operators on M. The contribution to the second term in the right hand side of equation 8 is (D F [m]δm)d, q m = W m [m] (DW m [m]δm) F [m]d, q m + F [m]w d [m] DW d [m]d, q m + (D F[m]δm) d, q m = (DW m [m]δm)δm) F [m]d, q + + (DW d [m]δm)d, F[m] q + d,(d F[m]δm) q d (3) Now the difficulty is plain to see: if q = q λ [m,d] in the last term on the right hand side of equation 3 is replaced with an approximation in the sense of the

147 MSIVA MSIVA 43 norm, as would be the result of an iterative process for solving 9 (or equation 6), no bound on the resulting error in the inner product above, hence in the resulting approximation to DJ λ [m,d]δm or the gradient of J λ, can be asserted, since D F[m]δm is not continuous in the sense of M. Ignoring for the moment the apparent instability, combine equations and 3 to obtain an expression for the derivative of J λ [m,d]: DJ λ [m,d]δm = (DW m [m]δm) F [m]( F[m] m λ [m,d] d), q λ [m,d] + (DW d [m]δm)( F[m] m λ [m,d] d), F[m] q λ [m,d] F[m] m λ [m,d] d,(d F[m]δm) q λ [m,d] d (D F[m]δm) m λ [m,d],f[m] q λ [m,d] d (4) There follows an expression for the gradient, using the partial duals D F[m], a continuous quadratic form : D M M b defined by δm,d F[m] (d, m) Mb = d,(d F[m]δm) m d = (D F[m]δm) d, m m = W d [m] d,(d F[m]δm) m (5) and DW m [m] t, a continuous bilinear map M M M b defined by δm,dw m [m] t ( m, q) Mb = (DW m [m]δm) m, q DW d [m] t is defined similarly. Then the m-gradient of J λ is given by m J λ [m,d] = DW m [m] t ( F [m]( F[m] m λ [m,d] d), q λ [m,d]) +DW d [m] t ( F[m] m λ [m,d] d), F[m] q λ [m,d] D F[m] ( F[m] m λ [m,d], q λ [m,d]) D F[m] ( m λ [m,d], F[m] q λ [m,d]) +D F[m] (d, q λ [m,d]) (6) Every part of this expression is manifestly stable, that is, substitutions of approximations m λ,a, q λ,a in their computation will entail O( m λ m λ,a, q λ q λ,a ) error, except for the last term

148 44 Huang, Nammour and Symes MSIVA STABLILITY In this section, I will show that the last term in 6 can expressed as the value at m = m λ [m,d] and q = q λ [m,d] of of continuous function of d, m, q, thus giving a stable computation of the gradient in the sense explained at the end of the last section. The construction depends on the availability of a parametrix or approximate inverse modulo lower order error, of a particular form. The utility of the construction depends on the computability of the parametrix, that is, on its being a straightforward modification of the transpose. (Hou and Symes, 5) demonstrated the existence of such special parametrices for a particular separable inverse problem (extended linearized constant density acoustic inversion with horizontal subsurface offset extension) and parametrices with similar properties exist for other extended modeling operators as well. In fact, most parametrices of the type I ve mentioned are only microlocal, and moreover can only be computed approximately. These limitations will be addressed in coming sections. Assume that the data and model weight operators W d,w m can be chosen so that F is approximately unitary with respect to the norms d, m : satisfies F F I S C (U,Op ( M, M)). (7) Thus N λ = S + I + λ R R). Thus equations 6, 9 are equivalent to m λ = (I + λ R R) ( F d S m λ ) (8) q λ = (I + λ R R) (A A m λ S q λ ) (9) = (I + λ R R) (A A(I + λ R R) ( F d S m λ ) (I + λ R R) S q λ (3) Replacing q λ in 6 with the right hand side in the last equality of 8, obtain D F[m] (d, q λ [m,d]) = D F[m] (d,(i + λ R R) A A(I + λ R R) F [m]d (I + λ R R) A A(I + λ R R) S[m] m λ [m,d] (I + λ R R) S[m] q λ [m,d]). (3) The operators appearing in the last two terms of 3 are of order, mapping M continuously to M. So the continuity property of D F[m] noted above implies

149 MSIVA MSIVA 45 that the last two terms above are stable, that is, substitution of approximations m λ,a and q λ,a for m λ and q λ in the right-hand side of equation 3 will result in an O( m λ,a m λ, q λ,a q λ ) error. It remains to be seen that the first term on the right hand side of equation 3 is stable: a priori, it only makes sense for d D (hence m λ M ). A continuous extension to d D follows however from the factorization property 7. For convenience, define B = ( + λ R R) A A( + λ R R). Then the first term on the right-hand side of 3 is δm,d F[m] (d,b F [m]d) Mb = d,(d F[m]δm)B F [m]d d = d, F[m](Q [m]δm)b F [m]d d = F [m]d,(q [m]δm)b F [m]d m (3) The essentially-skew property of Q (equation 8) holds also for the weighted inner product, m, since the weight operator W m is invertible and of order : specifically, (Q [m]δm) = (Q [m]δm) + (Q,m [m]δm), (33) with Q,m = Q + W m [Q,W m ] C (U,Op ( M, M)). So the right-hand side of equation C-4 is = F [m]d,([b,(q [m]δm)] + B(Q [m]δm)) F [m]d m + F [m]d,([b,(q [m]δm)] + ((Q,m [m] Q [m])δm)b) F [m]d m. (34) Add the right hand sides of equations C-4 and 34 and divide by to obtain δm,d F[m] (d,b F [m]d) Mb = F [m]d,([b,(q [m]δm)] + (Q,m [m]δm)) F [m]d m. (35)

150 46 Huang, Nammour and Symes MSIVA The right-hand side of equation 35 is a quadratic form in F [m]d, defined by a self-adjoint operator (the sum of the various commutators and products in the above expression) of order zero. Therefore D F[m] (d,b F [m]d) is a -continuous M b -valued quadratic form in d, hence extends continuously to d D, whence m J λ [m,d] is -continuous in d. For Hilbert spaces H,H, denote by Q(H,H ) the Banach space of continuous H -valued quadratic forms on H. Then the result of the foregoing calculations is summarized in Theorem. Define G λ C (U,Q(D M M,M b ) by G λ [m](d, m, q) =D F[m] (d,( + λ R R) A A( + λ R R) F [m]d ( + λ R R) A A( + λ R R) S[m] m ( + λ R R) S[m] q]) + DW m [m] t ( F [m]( F[m] m d), q) + DW d [m] t ( F[m] m d, F[m] q) D F[m] ( F[m] m, q) D F[m] ( m, F[m] q) (36) Then m J λ [m,d] = G λ [m](d, m λ [m,d], q λ [m,d]), (37) in which m λ [m,d] and q λ [m,d] are solutions of equations 6 and 9 respectively. Proof. The content of the definition 36 of G λ is that it is a continuous quadraticform-valued function. This fact has been established for each of the summands: for the first term by equation 35, for the second and third by equation 3 and following discussion, the fourth and fifth by equation, and the last two by equations and and surrounding discussion. These calculations also established equation 37. Remark. For computational purposes, it is more convenient to work directly with the defining Hilbert space structure of the background, domain and range spaces, rather

151 MSIVA MSIVA 47 than with the background, model and data norms, and to formulate the background model space norm also as a weighted norm. For example, δm,d F[m] (d,a A F [m]d) Mb = d,(d F[m]δm)A A F [m]d d So = W d d,(d F[m]δm)W m A T A F [m]d. m J λ [m,d] = W b D F[m] t (W d [m]d,w m [m] A T A F [m]d) + K[m]( m λ, q λ ), (38) where W b is the background model space weight operator defining the norm Mb, and D F[m] t is the transpose with respect to the -norms (usually, L, or in the discrete case Euclidean length) rather than the M b and d norms: δm,d F[m] t (d, m) = d,(d F[m]δm) m. (39) Assuming that F is implemented by a time-stepping finite difference or finite element method, this notion of transpose can be computed by a variant of the adjoint state method. For the case that F is a linearization (that is, Born approximation), so that D F is actually a second derivative, this application of the adjoint state method was introduced by Symes and Santosa (988), and employed by Kern and Symes (994) in computations similar to those presented here. CONTROLLABILITY The stability result of the last two sections is not in itself sufficient foundation for a convergent optimization algorithm, for two reasons:. The solution errors m λ,a m λ and q λ,a q λ are not directly observable, since neither m λ nor q λ are known in practice;. typical iterative solution algorithms for equations 6 and 9 do not necessarily reduce their solution errors.
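Residuals, unlike solution errors, are observable. As a purely schematic example of the kind of estimation process this framework has in mind, the following preconditioned conjugate-gradient routine solves a normal-equation-like system and stops once the residual has fallen by a prescribed factor ε relative to the residual of its first, preconditioned iterate, the criterion quantified below in conditions 50-51. The sketch is ours: plain matrices and vectors stand in for the normal operator, the right-hand side, and the preconditioner.

import numpy as np

def solve_to_relative_residual(N, b, M_inv, eps):
    # Preconditioned CG on N x = b, stopped when ||b - N x|| <= eps * ||e_1||,
    # where e_1 is the residual of the first (preconditioned) iterate M^-1 b.
    x = M_inv(b)                        # first iterate
    r = b - N @ x
    target = eps * np.linalg.norm(r)    # eps times the observable first residual
    z = M_inv(r)
    p = z.copy()
    while np.linalg.norm(r) > target:
        Np = N @ p
        alpha = (r @ z) / (p @ Np)
        x = x + alpha * p
        r_new = r - alpha * Np
        z_new = M_inv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# toy usage: N = I + S + lam^2 I with a small symmetric part S standing in for the smoothing term
rng = np.random.default_rng(3)
n, lam, eps = 50, 0.1, 1e-3
B = rng.standard_normal((n, n)) / n
S = 0.5 * (B + B.T)
N = np.eye(n) + S + lam**2 * np.eye(n)
b = rng.standard_normal(n)
x = solve_to_relative_residual(N, b, lambda v: v / (1.0 + lam**2), eps)

Applied a second time, with right-hand side given by the annihilator normal operator acting on the computed linear solution, the same routine produces the second auxiliary solve required by the gradient formula.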

152 48 Huang, Nammour and Symes MSIVA Observable quantities actually reduced by iterative algorithms include the data residual F[m] m λ,a d d, the residual for each of the equations 6 and 9, and the norms of the approximate solutions m λ,a and q λ,a. It might be objected that the solution errors and residuals for equations 6 and 9 are related, the former being at most λ times the latter (λ being a lower bound for the normal operator N λ ). Of course such bounds are not uniform in λ, hence are not useful in formulation of an effective algorithm. As will be explained in the next section, the necessary form for the gradient error estimate is g J λ [m,d] Mb Kɛ (4) in which K and ɛ is a parameter of the estimation process that produces m λ,a, q λ,a, in effect that m λ,a, q λ,a are functions of m U,d D,e >. Only minimal assumptions will be made about the estimation process - as will be seen below, only requirements on the residuals produced in equations 6 and 9. However, such estimates appear to require explicit use of scale separation, abstracted in part by introducing a family d = {d λ : λ > } D of data, and corresponding separable least squares problems. The high frequency cone condition d λ k C k λ d λ k+, k k (4) gives a quantitative expression of scale separation. The lower index bound k is presumed to be at most. Remark 3. For the Sobolev scale, this condition identifies the regularization parameter λ with wavelength. From here on, m λ [m,d] and q λ [m,d] denote the solutions of the parametrized systems N λ [m] m λ [m,d] = F [m]d λ (4) and Linking the size of the residuals N λ [m] q λ [m,d] = A A m λ [m,d]. (43) e m,λ [m,d] = N λ [m] m λ,a [m,d] F [m]d λ, (44) e q,λ [m,d] = N λ [m] q λ,a [m,d] A A m λ,a [m,d,e] (45)

153 MSIVA MSIVA 49 to the solution errors m λ,a m λ and q λ,a q λ requires some assumption about the approximate solutions m λ,a and q λ,a. Note that any Krylov method for solution of the systems 4 and 43, preconditioned by (I + λ R R), yields first iterates for which the residuals are m λ, [m,d] = (I + λ R R) F [m]d λ (46) q λ, [m,d] = (I + λ R R) A A m λ, [m,d] (47) e m,λ, = S[m](I + λ R R) F [m]d λ, (48) e q,λ, = S[m](I + λ R R) A A m λ, [m,d], (49) Theorem. Suppose that for ɛ >, m λ,a [m,d,ɛ] and q λ,a [m,d,ɛ] are approximate solutions of 4 and 43 for which the residuals e m,λ,ɛ and e q,λ,ɛ satisfy Then there exists K > and λ so that for λ λ, e m,λ,ɛ m ɛ e m,λ, m (5) e q,λ,ɛ m ɛ e q,λ, m (5) m λ,a [m,d,ɛ] m λ [m,d] m Kɛ d λ d (5) q λ,a [m,d,ɛ] q λ [m,d] m Kɛ d λ d (53) uniformly in m U, for fixed choice of {C k } in 4. Proof. Since N λ [m]( m λ,a [m,d,ɛ] m λ [m,d]) = e m,λ,ɛ, m λ,a [m,d,ɛ] m λ [m,d] K λ e m,λ,ɛ Kɛ λ e m,λ, Kɛ λ d λ Kɛ d λ d The next-to-the last inequality follows from the definition of e m,λ, and the uniform Op bound on S[m], the last from 4 for k =. Thus 5 is established. Since e q,λ, takes the same form, a similar estimate holds for it. Corollary. Under the assumptions of Theorem, there exists K for which λ,a[m,d,ɛ] J λ [m,d] Kɛ (54) G λ [m](d, m λ,a [m,d,ɛ], q λ,a [m,d,ɛ]) J λ [m,d] Mb Kɛ (55) for m U and sufficiently small λ.

154 5 Huang, Nammour and Symes MSIVA Proof. Follows from continuity of G λ as described in Theorem, and bounds on the solution errors established in Theorem. OPTIMIZATION In view of the error inherent in the calculations explained in the previous sections, standard variants of Newton s method for local minimization cannot be guaranteed to converge to a local minimizer of J λ. Minimization of the reduced objective requires use of iterations that converge in the presence of inexact gradient and objective function evaluations. Such an algorithm would necessarily need to couple error and step size control. Since error-contaminated function and gradient computations are hardly rare, it is unsurprising that this topic has a fairly large literature (Dembo et al., 98; Carter, 99; Deuflhard, 99; Carter, 993; Eisenstat and Walker, 994). All of these works and many more recent ones share a major drawback: they mandate absolute control of function and/or gradient error, implying in particular that the computed gradient be a descent direction. While of course this condition must eventually hold, it is nearly impossible to check in practice: many sources of error are like those considered in the preceding sections, in that they give no direct measure of gradient or function error, but only allow indirect control. One exception to this pattern is the work of Heinkenschloss and Vicente (); Kouri et al. (3, 4) on error control in the context of trust region methods for sequential qudratic programming, an approach to constrained optimization. Trust region globalization is a general concept that applies also to unconstrained formulations (Conn et al., ; Nocedal and Wright, 999). The condition that Heinkenschloss and Vicente () place on gradient error takes the form: error Kɛ, where ɛ is a control parameter but K is unknown or at least poorly controlled. Thus nothing can be said a priori about the size of the gradient error, or even whether the computed gradient is a descent direction, except in an asymptotic sense. Reference to Corollary shows that the theory developed above provides exactly this sort of condition. The trust region approach to minimization of a C function f defined on a

155 MSIVA MSIVA 5 Hilbert space H bases its kth step on a quadratic model approximating f near x k. m k (s) = f k + s,g k + s,h ks. Here f k is the computed value and g k the computed gradient at x k. H k is an approximation, possibly quite crude, to the Hessian of f at x k. The basic trust region algorithm seeks the step s = x k+ x k as the optimum of the constrained problem minimize m k (s) subject to s k, (56) The trust radius k is also subject to update as the iteration proceeds, so as to satisfy a sufficient decrease criterion leading to assured global convergence to a local minimizer, under various conditions (smoothness, Hessian definiteness, etc.) some of which will be mentioned below. I will describe a simple variant of the trust radius step, depending on two computed quantities: actual reduction, and predicted reduction, actred = f (x k ) f (x k + s) predred = m k () m k (s) = ( s,g k + s,h ks ), and on four magic numbers, < η < η < and < γ < < γ. Choose a search direction p by minimizing the model as an unconstrained problem: that is, p = H k g k. Note that H k is only an approximation to the Hessian of f, and that in a rather loose sense: it may be computed by building up a Krylov or quasi-newton approximation to the Hessian or its Gauss-Newton simplification, for example. Thus p may be only a crude approximation to a Newton or Gauss-Newton step: this algorithm description encompasses so-called truncated Newton-Krylov methods, for example, and the step selection may interact with the computation of H k. Next enter the step update loop:. set s = k p

156 5 Huang, Nammour and Symes MSIVA. compute actred, predred. (a) if actred η predred, k γ k, go to. (b) if actred η predred, k min(,γ k ), 3. x k+ = x k + s, k+ = k, exit. To see how this algorithm accommodates function and gradient error, first presume that there is none: that is, f k = f (x k ),g k = f (x k ), and H k β > for all k. The theory presumes that the function f is twice continuously differentiable, bounded below, and has uniform bounds on the gradient and condition number of the Hessian over the sub-level sets of the objective. It is easy to see that the condition for a successful step implies that f k f k+ C g k k (57) references, these assumptions imply a natural floor under the trust radius k. The trust radius decreases only when the actual function reduction is not at least the lower proportion η of the predicted reduction. However as the trust radius, hence the step, gets smaller, the actual reduction and the predicted reduction become close, since the model becomes close to the function to first order. Once the trust radius is small enough, the conditions for further decrease are never met, and the trust radius never decreases beyond a fixed positive thresshold. Since the function values are decreasing from step to step and bounded below, they must converge. Therefore the actual reduction converges to zero, hence the gradient must converge to zero thanks to inequality 57. Note that this reasoning does not establish that {x k } converges - that requires a bit more reasoning. However, if the sequence of iterates does converge, it must converge to a stationary point. In fact the expected behaviour of this algorithm is that it eventually settles down at k =, takes the full step, and converges at whatever rate the underlying approximation to Newton s method yields. Convergence with inexact function and gradient evaluations follows from an extension of the same reasoning. Assume for the moment that the function values are exact. Then the primary condition to be imposed on the gradient, according to Heinkenschloss and Vicente (), is of the form g k f (x k ) ξ min( g k, k ). (58)

157 MSIVA MSIVA 53 in which it is merely required that the constant ξ be uniform over the sublevel set of f in which the iterates lie. Then once again reduction of k during a step eventually stops short of a uniform positive lower bound, since the inequality 58 forces g k, hence the step s, towards the steepest descent step. For short enough steepest descent steps the actual reduction exceeds η times the predicted reduction, which leads both to the step being taken and in the computed gradient g k being reduced in length. I embed this mechanism in an algorithm for minimization of J λ following the pattern of Heinkenschloss and Vicente (), section 5. Add ξ > to the list of magic numbers required by the trust region algorithm; in theory at least ξ = is adequate. Define (f, g, H) = JET (m, ɛ) via the algorithm. compute initial approximate solutions m, q with residuals e m,,e q, ; m = (I + λ R R) F [m]d λ (59) q = (I + λ R R) A A m (6). compute approximate solutions m, q of the system 4,43 satisfying with residuals e = (e m,e q ) satisfying 3. compute approximate function value, gradient e m m ɛ e m, m (6) e q m ɛ e q, m (6) and approximate Hessian H f = A m (63) g = G λ [m](d, m, q) (64) Formulation of inexact Trust Region algorithm:. (f k,g k,h k ) = JET (m k,ɛ k )

158 54 Huang, Nammour and Symes MSIVA. while ɛ k > ξ min( g k m, k ) (a) ɛ k ξ min( g k m, k ) (b) (f k,g k,h k ) = JET (m k,ɛ k ) 3. s = k H k g k,m k+ = m k + s,ɛ k+ = ɛ k,(f k+,g k+,h k+ ) = JET (m k+,ɛ k+ ) actred = f k f k+ (65) ( predred = ) k k g k,hk g k m (66) 4. if actred η predred, k γ k, go to if actred η predred, k min(,γ k ), 6. next k Corollary. The sequence {m k } produced by the inexact Trust Region algorithm just described converges to a stationary point of J λ. MICROLOCALIZATION With a few exceptions, no examples of forward maps F defined by separable inverse wave problems have actual parametrices, that is, operators F satisfying the definition 7. Instead, they have microlocal parametrices. Abstractly, the microlocal property is captured in an approximate projector Π Op ( M, M), self-adjoint with respect to m and close to idempotent in the sense that Π Π m, Π. (67) 4 A microlocal parametrix F satisfies F [m] F[m] Π = S[m] Op ( M, M). (68) In this section, the approximate projector Π is locally constant, that is, the relation 68 holds in a neighborhood of m M.
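A caricature of the approximate projector Π may be useful: on a periodic grid, a Fourier multiplier by any smooth cutoff χ taking values in [0,1] is self-adjoint, and Π² − Π is the multiplier by χ² − χ, whose operator norm is at most 1/4, as required by condition 67. The sketch below is ours; the particular cutoff shape and parameters are arbitrary.

import numpy as np

def smooth_cutoff(k, k0, width):
    # A [0,1]-valued taper: 1 for |k| < k0, 0 for |k| > k0 + width, cosine roll-off between
    t = np.clip((np.abs(k) - k0) / width, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))

n = 512
k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers
chi = smooth_cutoff(k, k0=60.0, width=40.0)

def Pi(u):
    # Approximate projector: Fourier multiplier by chi (self-adjoint, since chi is real)
    return np.real(np.fft.ifft(chi * np.fft.fft(u)))

# Pi^2 - Pi is the multiplier by chi^2 - chi; its norm is sup|chi^2 - chi| <= 1/4
print(np.max(np.abs(chi**2 - chi)))

A microlocal parametrix in the sense of relation 68 then only needs to invert the modeling operator on the wavenumber region where χ is close to 1.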

159 MSIVA MSIVA 55 Augment the linear least squares problem 5 by adding m,(i Π) m m to objective. The corresponding effect on the normal operator is: N λ [m] = F [m] F[m] + (I Π) + λ R R = I + S[m] + λ R R. Define as before m λ [m,d] to be the solution of N λ [m] m λ [m,d] = F [m]d, and the reduced objective J λ by equation 7. Computation of J λ and J λ goes exactly as before, and Theorems and and Corollary hold verbatim. The only change comes in the relation between the minimizer of J λ and the solution of the inverse problem m, m: the data misfit is small only if one assumes that d can be fit with m nearly annihilated by I Π. That is an a priori assumption about the solution, which in examples entails a similar assumption about data. REFERENCES Araya, K., and W. Symes, 996, D and.5d kirchhoff inversion using upwind finite difference amplitudes: Proc. 66th Annual International Meeting, Society of Exploration Geophysicists, (Expanded abstract). Beylkin, G., 985, Imaging of discontinuities in the inverse scattering problem by inversion of a causal generalized Radon transform: Journal of Mathematical Physics, 6, Beylkin, G., and R. Burridge, 99, Linearized inverse scattering problem of acoustics and elasticity: Wave Motion,, 5. Biondi, B., and A. Almomin, 4, Simultaneous inversion of full data bandwidth by tomographic full-waveform inversion: Geophysics, 79, WA9 WA4. Blazek, K., C. C. Stolk, and W. Symes, 3, A mathematical framework for inverse wave problems in heterogeneous media: Inverse Problems, 9, 65: 34. Carter, R., 99, On the global convergence of trust region algorithms using inexact gradient information: SIAM J.Numer. Anal., 8, 5 65., 993, Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information: SIAM J. Sci. Comp., 4, Chauris, H., and M. Noble,, Two-dimensional velocity macro model estimation from seismic reflection data by local differential semblance optimization:

160 56 Huang, Nammour and Symes MSIVA applications synthetic and real data sets: Geophysical Journal International, 44, 4 6. Cohen, J., and N. Bleistein, 977, An inverse method for determining small variations in propagation speed: SIAM Journal on Applied Mathematics, 3, Conn, A., N. Gould, and P. Toint,, Trust region methods: SIAM. Dembo, R., S. Eisenstat, and T. Steihoug, 98, Inexact Newton methods: SIAM Journal on Numerical Analysis, 9, Deuflhard, P., 99, Global inexact Newton methods for very large scale nonlinear problems: Impact of Computing in Science and Engineering, 3, Eisenstat, S., and H. Walker, 994, Globally convergent inexact Newton methods: SIAM Journal on Optimization, 4, Fei, W., and P. Williamson,, On the gradient artifacts in migration velocity analysis based on differential semblance optimization: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Heinkenschloss, M., and L. Vicente,, Analysis of inexact trust-region SQP algorithms: SIAM J. Optimization, (), Hou, J., and W. Symes, 5, An approximate inverse to the extended Born modeling operator: Geophysics, 8, no. 6, R33 R349. Huang, Y., and W. W. Symes, 5, Born waveform inversion via variable projection and shot record model extension: 85rd Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Kern, M., and W. Symes, 994, Inversion of reflection seismograms by differential semblance analysis: Algorithm structure and synthetic examples: Geophysical Prospecting, 99, Kouri, D. P., M. Heinkenschloss, D. Ridzal, and B. G. van Bloemen Waanders, 3, A trust-region algorithm with adaptive stochastic collocation for PDE optimization under uncertainty: SIAM Journal on Scientific Computing, 35, A847 A879., 4, Inexact objective function evaluations in a trust-region algorithm for pde-constrained optimization under uncertainty: SIAM Journal on Scientific Computing, 36, A3 A39. Lameloise, C.-A., H. Chauris, and M. Noble, 5, Improving the gradient of the image-domain objective function using quantitative migration for a more robust migration velocity analysis: Geophysical Prospecting, 63, Li, M., J. Rickett, and A. Abubakar, 3, Application of the variable projection scheme to frequency-domain full-waveform inversion: Geophysics, 78, R49

161 MSIVA MSIVA 57 R57. Liu, Y., W. Symes, and Z. Li, 4, Inversion velocity analysis via differential semblance optimization: Presented at the 76th Annual Meeting, Extended Abstracts, European Association of Geoscientists and Engineers. Luo, S., and P. Sava,, A deconvolution-based objective function for waveequation inversion: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Mulder, W., and F. ten Kroode,, Automatic velocity analysis by differential semblance optimization: Geophysics, 67, Nocedal, J., and S. Wright, 999, Numerical Optimization: Springer Verlag. Operto, M. S., S. Xu, and G. Lambaré,, Can we quantitativvely image complex structures with rays?: Geophysics, 65, Plessix, R.-E.,, Automatic cross-well tomography: an application of the differential semblance optimization to two real examples: Geophysical Prospecting, 48, Pratt, R., and W. Symes,, Semblance and differential semblance optimization for waveform tomography: a frequency domain approach: Journal of Conference Abstracts, 7, Shen, P., and W. Symes, 8, Automatic velocity analysis via shot profile migration: Geophysics, 73, VE49 6. Song, H., and W. Symes, 994, Inverison of crosswell tomography via differential semblance optimization: Technical report, The Rice Inversion Project. (Annual Report). Symes, W., 4, Seismic inverse problems: recent developments in theory and practice: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics. Symes, W., and F. Santosa, 988, Computation of the Newton Hessian for leastsquares solution of inverse problems in reflection seismology: Inverse Problems, 4, 33. Taylor, M., 98, Pseudodifferential Operators: Princeton University Press. ten Kroode, F.,, A wave-equation-based Kirchhoff operator: Inverse Problems, 53: 8., 4, A Lie group associated to seismic velocity estimation: Presented at the Inverse Problems - from Theory to Application, Proceedings, Institute of Physics. van Leeuwen, T., and F. Herrmann, 3, Mitigating local minima in fullwaveform inversion by expanding the search space: Geophysical Journal International, 95,

162 58 Huang, Nammour and Symes MSIVA van Leeuwen, T., and W. Mulder, 9, A variable projection method for waveform inversion: 7st Annual Meeting, Extended Abstracts, European Association of Geoscientists and Engineers, U4. Virieux, J., S. Jin, R. Madariaga, and G. Lambaré, 99, Two dimensional asymptotic iterative elastic inversion: Geophys. J. Int., 8, Vyas, M., and Y. Tang,, Gradients for wave-equation migration velocity analysis: 8th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Warner, M., and L. Guasch, 4, Adaptive waveform inversion: Theory: 84th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Weibull, W., and B. Arntsen, 3, Automatic velocity analysis with reverse time migration: Geophysics, 78, S79 S9. Xu, S., Y. Zhang, and B. Tang,, 3D angle gathers from reverse time migration: Geophysics, 76, no., S77 S9. Zhang, Y., J. Sun, and S. H. Gray, 7, Reverse time migration: amplitude and implementation issues: 77th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists, Zhang, Y., G. Zhang, and N. Bleistein, 5, Theory of true-amplitude one-way wave equations and true-amplitude common-shot migration: Geophysics, 7.

The Rice Inversion Project, TRIP6, July 6, 6

An alternative formula for approximate extended Born inversion

Jie Hou, William W. Symes

ABSTRACT

Hou and Symes (5b) modify the adjoint of the subsurface offset extended Born modeling operator into an approximate inverse operator with model- and data-domain weight operators. Although the derivation is based on high-frequency approximation, the implementation doesn't involve any ray tracing. However, the model-domain weight operator involves a daunting prospect, namely the Fourier transform of the extended model (a 3D transform for 2D, a 5D transform for 3D). In this paper, we present an alternative formula for calculating the approximate inverse operator. This new formula is simpler and cheaper to implement while preserving the same approximate inversion property. Both theoretical derivation and numerical examples demonstrate the correctness of the alternative formula.

INTRODUCTION

Motivated by ten Kroode's analysis of 3D acoustics (ten Kroode, ), Hou and Symes (5b) described a modification of RTM (with space-shift imaging condition), based on 2D constant density acoustic modeling, that asymptotically inverts the subsurface offset extended Born modeling operator. This approximate inverse takes the form
F = W model F T W data, (1)
in which F is the subsurface offset extended Born modeling operator (dependence on the background model suppressed), F T is its adjoint (a form of RTM), W model and W data are model- and data-domain weight operators, and F denotes the approximate inverse. The weight operators are defined by
W model = 4v5 (D x + D z ) / (D h + D z ) / P (x,z,h,k x,k z,k h ); W data = I 4 t D zs D zr. (2)

Here, D x, D h, D z are the x-, h-, and z-direction derivatives, P(...) is a so-called pseudodifferential operator (PsiDO) which acts approximately as the identity on inputs concentrated near h = 0, I t is the time integral, and D zs, D zr are the source and receiver depth derivatives. In practice, the model-domain weight operator is implemented in the wavenumber domain. Such computation requires the Fourier transform of the extended model or a transform of equivalent complexity. This implies a daunting prospect: a 3D transform in 2D and a 5D transform in 3D. In this paper, we show that an alternative construction is possible, leading to an alternative approximate inverse
F = 8v 4 D z F T D t W data P (x,z,h,k x,k z,k h ) (3)
of roughly the same quality as that constructed in Hou and Symes (5b). The implementation of the alternative formula involves only a simple vertical derivative rather than the expensive Fourier transform. Note that the expression shown in equation 3 does not have the same form as the weighted adjoint relation defined by equation 1. It remains to be seen whether the alternative approximate inverse operator shown in equation 3 can play a similar role in accelerating Krylov iterations (Hou and Symes, 5a, 6a,b).

THEORY

In this section, we will briefly review the approximate inverse operator explained by Hou and Symes (5b) and derive the alternative formula. Throughout this paper, we will cite equations by number from Hou and Symes (5b), and also use the notation $\phi(x_r,x_s,x,z,h) = T(x_s,z_s,x-h,z) + T(x_r,z_r,x+h,z)$ from that paper. The derivation of Hou and Symes (5b) starts from understanding the asymptotic behavior of the slightly modified normal operator (I t F) (I t F), which can be generalized into a class of operators:
$$M_a\, \delta\bar{v}(x,z,h) = \int dx_r\, dx_s\, dx'\, dz'\, dh'\; a(x_r,x_s,x,z,h,x',z',h')\, \delta\big(\phi(x_r,x_s,x,z,h) - \phi(x_r,x_s,x',z',h')\big)\, \delta\bar{v}(x',z',h'). \qquad (4)$$
Appendix A shows that this class of integral operator has the asymptotic repre-

165 MSIVA Alternative Approximate Inversion 6 sentation M a δ v(x,z,h) dk 4π x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h )B (x,z,h,k x /k z,k h /k z ) ( ) v φ αs α r a kz z x s x (x r(x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h). r (5) In this representation, B (x,z,h,k x /k z,k h /k z ) is a symbol of order zero satisfying B (x,z,,k x /k z,k h /k z ). (6) Appendix B studies specifically the asymptotic representation for M a = (I t F) (I t F), which replicates the derivation of equation appeared in Hou and Symes (5b) : (I t F) (I t F)δ v(x,z,h) πv(x,z) 5 dk x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) ( ) αs α r a x s x r r a s P k xz k (x r(x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h). hz (7) If the integrand factor in the bracket were identically =, then the right side recovers the velocity perturbation δ v. Hou and Symes (5b) discuss in details how to apply additional filtering and scaling operators on model and data so that the asymptotic representation above can be simplified into an approximate identity : 4v 5 ((Dx + Dz ) / (Dh + D z ) / F )(It I t D zs I t D zr I t F)δ v(x,z,h) 8π 3 dk x dk z dk h e i(xk x+zk z +hk h ) P (x,z,h,k x /k z,k h /k z ) δ v(k x,k z,k h ). Appendix C depicts the derivation of an alternative formula using the same idea : 8v(x,z) 4 D z (I t F) (D t I t D zs I t D zr I t F)δ v(x,z,h) 8π 3 dk x dk z dk h e i(xk x+zk z +hk h ) B (x,z,h,k x /k z,k h /k z ) δ (9) v(k x,k z,k h ). In view of property 6, equation 9 defines an approximate inverse for F. This definition must be equivalent to the approximate inverse defined by equation 8. Therefore, 8v 4 D z F D t W data F. () (8)

As is clear from equation 9, the approximate inverse defined here differs from the approximate inverse defined in Hou and Symes (5b) by a pseudodifferential operator (PsiDO) with symbol C, defined in the appendix. C is also equal to 1 at h = 0, implying that the two approximations have the same nature. In particular, they are equally good in the focused case, and are in that case inverses modulo smoothing operators, i.e., what are commonly called true amplitude migrations in the seismic imaging literature. The advantage of the approximate inverse derived here is that it commutes with restriction to h = const. planes, so it is suitable for application directly to the physical image or to averaging over a small range of h near h = 0.

NUMERICAL EXAMPLE

In this section, we use a synthetic Marmousi model (Bourgeois et al., 99) to compare the accuracy of the two approximate inverse formulas. Figure 1 shows the smoothed background velocity model and the reflectivity model. The model is discretized on a 3*9 grid with a spacing of m in both horizontal and vertical directions. The generated Born data have 3 common shot gathers every 4m, and each shot has 9 receivers every m. The simulation uses a Hz bandpass wavelet with ms temporal sampling. The recorded length is 4 s in total. Both formulas mentioned above are applied to the Born data for the Marmousi model. The extended images have a subsurface offset range [-5 m : 5 m] with m spacing. Figure 2 plots the extended inversions produced by the two formulas and their difference. The comparison illustrates that the two formulas for the approximate inversion are of the same nature and produce very similar results. The difference mainly corresponds to energy outside the asymptotic framework, e.g., diving waves and refractions. As explained by Hou and Symes (5b), the physical model can also be recovered by stacking along the subsurface offset axis (non-extended inversion). We plot the stacked images and their difference in Figure 3. These two stacked images are almost identical visually. The artifacts appearing in the extended images also cancel during the stacking process. The middle traces extracted from the reflectivity model and the two stacked images (Figure 4a) further reinforce the similarity of the two formulas. In order to further verify the effectiveness of the approximate inverse operators, we apply the extended Born modeling operator to the inverted extended images to resimulate the data.
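Two of the quality measures used in this section are one-liners in code: the non-extended image is obtained by stacking the extended image over the subsurface-offset axis, and the data fit is summarized by a relative misfit. The sketch below is ours; array names, shapes, and the offset sampling are arbitrary stand-ins, not the values used in the experiment.

import numpy as np

def stack_over_offset(ext_image, dh, axis=-1):
    # Non-extended image: integrate the extended image over subsurface offset h
    return ext_image.sum(axis=axis) * dh

def relative_misfit(d_resim, d_obs):
    # Relative L2 data misfit ||d_resim - d_obs|| / ||d_obs||
    return np.linalg.norm(d_resim - d_obs) / np.linalg.norm(d_obs)

# toy usage with arbitrary array sizes
rng = np.random.default_rng(4)
ext = rng.standard_normal((100, 300, 21))     # (nz, nx, nh)
img = stack_over_offset(ext, dh=10.0)
d_obs = rng.standard_normal((30, 1000, 90))   # (nshot, nt, nrec)
print(relative_misfit(0.9 * d_obs, d_obs))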

Figure 1: Marmousi model. (a) Background velocity model obtained by smoothing the original Marmousi model; (b) Reflectivity model obtained by taking the difference between the original model and the background model.

Figure 5 plots the middle shot of the original data, the data resimulated from the extended image inverted with the original formula, the data resimulated from the extended image inverted with the new formula, and the difference between the two resimulated data sets. The approximate inversions using the two formulas produce almost the same level of relative data misfit with respect to the original data. The middle traces extracted from the middle shot of the original data and the two resimulated data sets confirm this view more clearly (Figure 4b). However, the substantial difference between the two resimulated data sets (Figure 5d) shows the effect of the different ΨDOs in the two formulas.

CONCLUSIONS

In this paper, we have proposed an alternative formula for approximate extended Born inversion. By theoretical derivation and a numerical experiment, we have demonstrated that the proposed formula achieves accuracy similar to that of the formula proposed by Hou and Symes (2015b). The new formula requires only a vertical derivative, rather than a wavenumber-domain operation, for the model-domain weight operator, making the implementation easier and cheaper.

ACKNOWLEDGEMENTS

We are grateful to the sponsors of The Rice Inversion Project (TRIP) for their long-term support, and to Shell International Exploration and Production Inc. for its support of Jie Hou's PhD research. We thank Fons ten Kroode for inspiring our work. Thanks also go to Jon Sheiman, Henning Kuehl, and Peng Shen for very helpful discussions. We have benefited greatly from the computational resources provided by the Texas Advanced Computing Center and the Rice University Research Computing Support Group. The Madagascar open-source software package has been critically useful in our work.

Figure 2: (a) Extended image inverted with the original formula; (b) Extended image inverted with the new formula; (c) The difference between the two extended images.

Figure 3: (a) Stacked image corresponding to the extended image in Figure 2a; (b) Stacked image corresponding to the extended image in Figure 2b; (c) The difference between (a) and (b).

Figure 4: Middle-trace comparison. (a) Model comparison: blue line is the reflectivity model; green line is the stacked image using the original formula; red line is the stacked image using the new formula. (b) Data comparison: blue line is the original data; green line is the resimulated data corresponding to the original formula; red line is the resimulated data corresponding to the new formula.

Figure 5: The middle shot of (a) the original Born data; (b) the resimulated Born data corresponding to the extended image in Figure 2a; (c) the resimulated Born data corresponding to the extended image in Figure 2b; (d) the difference between (b) and (c).

173 MSIVA Alternative Approximate Inversion 69 APPENDIX A PSEUDODIFFERENTIAL EXPRESSION FOR A CLASS OF INTEGRAL OPERATORS Let s start with a careful statement of the principle stated after equation by Hou and Symes (5b), concerning the generalized operator defined in equation 4: M a δ v(x,z,h) = dk x dk z dk h dx r dx s dx dh a(x r,x s,x,z,h,x,z(x r,x s,x,h,z,x,h ),h ) ( φ z 8π 3 (x r,x s,x,z(x r,x s,x,h,z,x,h ),h )e ik z ( x k x k z +h k h k z +Z(x r,x s,x,h,z,x,h ) )) δ v(k x,k z,k h ). (A-) The statement implicitly assumes φ, that is, energy propagates vertically z and reflectors are subhorizontal. Otherwise, vertical subsurface offset extension needs to be considered. It also ignores the possibility of diving rays. Apply stationary phase as in equation A-4 of Hou and Symes (5b), with large parameter k z, amplitude 8π 3 a(x r,x s,x,z,h,x,z(x r,x s,x,h,z,x,h ),h ) φ z and phase x k x k z + h k h k z (x r,x s,x,z(x r,x s,x,h,z,x,h ),h ) (A-) + Z(x r,x s,x,h,z,x,h ) (A-3) to approximate the integral over x,h,x r,x s. Stationary phase conditions are given in equation A-6, A-8 of Hou and Symes (5b), and we obtain φ z [ ] [ ] π dk x dk z dk h k z 8π 3 a(x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h,x,z,h) (x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h)e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) det Hess(x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h) / (A-4)

174 7 Hou and Symes MSIVA in which x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ) are determined by tracing rays from (x, z, h) to the source and receiver manifolds determined by the stationary phase conditions. The x-components of the ray initial covectors are given in terms of x,z,h,k x /k z,k h /k z by A-, the sum of the z-components (that is, φ/ z) by A- 4, A-5, and A-6, and the z-components of the individual rays by A- (All equations mentioned here are cited from Hou and Symes (5b)). The Hessian factor det Hess / is buried in A-7 and A-8 of Hou and Symes (5b): we fish it out as ( ) ( ) det Hess / = v αs α r φ B (A-5) x s x r z in which the arguments are suppressed temporarily. B is the quantity in brackets ( ) φ in equation A-8 of Hou and Symes (5b), multiplied by. The precise z form of B will not concern us - we need only know that B(x,z,,k x /k z,k h /k z ) =. (A-6) Combining these results, we obtain v k z ( φ αs α r z x s x r M a δ v(x,z,h) dk 4π x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) ) ab (x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h). (A-7) APPENDIX B ASYMPTOTICAL REPRESENTATION OF NORMAL OPERATOR For the slightly modified normal operator (I t F) (I t F), M a is defined by a(x r,x s,x,z,h,x,z,h ) = 4π v(x,z) 6 a r(x r,x+h,z)a r (x r,x +h,z )a s (x s,x h,z)a s (x s,x h,z ) (B-)

175 MSIVA Alternative Approximate Inversion 7 (from equation A- of Hou and Symes (5b)). Plugging it into equation A-7, we have (I t F) (I t F)δ v(x,h,z) = πv(x,z) 4 dk x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) ( ) φ αs α r kz a r a s B z x s x (x r (x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h). r (B-) From A-9 of Hou and Symes (5b), in which k xz = φ z (x,z,,k x/k z,k h /k z ) = v kx + kz,k hz = kh + k z. Set k z k xz k hz (B-3) C(x,z,h,k x /k z,k h /k z ) = v k xz k hz φ kz z (x,z,h,k x/k z,k h /k z ) (B-4) so that φ z = kz, C(x,z,,k v k xz k x /k z,k h /k z ) =. (B-5) hz It leads to (I t F) (I t F)δ v(x,h,z) = πv(x,z) 5 dk x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) ( ) αs α r a r a s P k xz k hz x s x (x r(x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h), r (B-6) in which P = C B, P (x,z,,k x/k z,k h /k z ) =. (B-7) In fact, P defined above is identical to P defined after equation A-9 of Hou and Symes (5b). The expression above is identical to equation of Hou and Symes (5b).

176 7 Hou and Symes MSIVA APPENDIX C AN ALTERNATIVE FORMULA FOR APPROXIMATE INVERSION According to equations, of Hou and Symes (5b), Whence in which dxdzdha s a r cosθ r v r D t I t D zs I t D zr I t Fδ v(x r,x s,t) cosθ s v s ( ) φ δ(t T z r T s ) πd zδ v v 3 (x,z,h). (C-) (I t F) (D t I t D zs I t D zr I t F)δ v(x,z,h) M a D z δ v(x,z,h), (C-) a = 4π v 3 (v ) 3 a s a r a sa cosθ r cosθ r v r s v s ( ) φ. (C-3) The prime denotes evaluation at x,z,h. Use the stationary phase condition x = x,z = z,h = h (Hou and Symes (5b), equation A-6) and equation 5, we obtain the asymptotic representation (I t F) (D t I t D zs I t D zr I t F)δ v(x,z,h) πv(x,z) 4 dk x dk z dk h e i(xk x+zk z +hk h ) δ v(kx,k z,k h ) ( αr x r α s x s ) a r a cosθ r s v r z cosθ s v (x r(x,z,h,k x /k z,k h /k z ),x s (x,z,h,k x /k z,k h /k z ),x,z,h) s i k z B (x,z,h,k x /k z,k h /k z ). From equations B-6, B-7 of Hou and Symes (5b), a r cosθ r v r ( αr x r ) = 8π = cosθ s a s v s (C-4) ( ) αs (C-5) x s so the RHS of equation C-4 simplifies to = 64π 3 v 4 dk x dk z dk h e i(xk x+zk z +hk h ) i B (x,z,h,k k x /k z,k h /k z ) δ v(k x,k z,k h ). z (C-6)

An additional factor of D z leads to 8v(x,z) 4 D z (I t F) (D t I t D zs I t D zr I t F)δ v(x,z,h) 8π 3 dk x dk z dk h e i(xk x+zk z +hk h ) B (x,z,h,k x /k z,k h /k z ) δ v(k x,k z,k h ). (C-7)

REFERENCES

Bourgeois, A., P. Lailly, and R. Versteeg, 1991, The Marmousi model, in The Marmousi Experience: IFP/Technip.
Hou, J., and W. Symes, 2015a, Accelerating extended least squares migration with weighted conjugate gradient iteration: 85th Annual International Meeting, Expanded Abstracts, Society of Exploration Geophysicists.
Hou, J., and W. Symes, 2015b, An approximate inverse to the extended Born modeling operator: Geophysics, 80, no. 6, R331-R349.
Hou, J., and W. Symes, 2016a, Accelerating extended least-squares migration with weighted conjugate gradient iteration: Geophysics, 81, no. 4, S165-S179.
Hou, J., and W. Symes, 2016b, Accelerating least squares migration with weighted conjugate gradient iteration: Presented at the 78th Annual International Conference and Exhibition, Expanded Abstracts, European Association of Geoscientists and Engineers.
ten Kroode, F., 2012, A wave-equation-based Kirchhoff operator: Inverse Problems, 28.


The Rice Inversion Project, TRIP6, July 6, 6

Fast extended waveform inversion using Morozov's discrepancy principle

Lei Fu and William W. Symes

ABSTRACT

Extended waveform inversion globalizes the convergence of seismic waveform inversion by adding non-physical degrees of freedom to the model, thus permitting it to fit the data well throughout the inversion process. These extra degrees of freedom must be curtailed at the solution, for example by penalizing them as part of an optimization formulation. For separable (partly linear) models, a natural objective function combines a mean-square data residual and a quadratic regularization term penalizing the non-physical (linear) degrees of freedom. The linear variables are eliminated in an inner optimization step, leaving a function of the outer (nonlinear) variables to be optimized. This variable projection method is convenient for computation, but requires that the penalty weight be increased as the estimated model tends to the (physical) solution. We describe an algorithm based on an unusual application of Morozov's discrepancy principle to control the penalty weight as part of the outer optimization. We illustrate this algorithm in the context of constant-density acoustic waveform inversion, by recovering a background model and a perturbation fitting bandlimited waveform data in the Born approximation.

INTRODUCTION

Seismic full waveform inversion is able to yield high-resolution images of the subsurface structure by iteratively minimizing the difference between predicted data and observed data (Virieux and Operto, 2009; Vigh et al., 2013). However, the success of full waveform inversion (FWI) is highly dependent on low-frequency data or an initial model of the earth that contains accurate long-scale velocity information. These attributes are available for some surveys but not for others (Plessix et al.).

Without them, data fit is poor from the outset, and tends to stagnate at uninformative velocity models as the iteration proceeds. Extended waveform inversion enlarges the model with non-physical degrees of freedom, in such a way that data fit may be achieved even with poor prior information. Since the additional degrees of freedom are non-physical, they must be suppressed if the extended model is to converge to a solution of the full waveform inversion problem, which must necessarily be described only by the parameters of the chosen wave physics. The inversion task is to construct a sequence of iterates in the extended model domain for which (1) a measure of data misfit e remains below a chosen error level, and (2) a measure of model non-physicality p tends to zero, so that the limit is a solution of the (physical model) waveform inversion problem. The main contribution of this paper is a complete description of an optimization-based approach to this task. We combine e and p in a single objective function with a penalty weight α: J_α = e + αp. Since the penalty term p should go to zero as the iteration converges, expressing the disappearance of the non-physical extended degrees of freedom, it is to be expected that α should increase, in principle without bound, in the course of the iteration. We describe a simple method for increasing α sporadically, so as to maintain the data error e in an acceptable range around an initial estimate. This criterion for adjusting α via a target range for e is a version of Morozov's discrepancy principle (Engl et al., 1996; Sen and Roy, 2003; Ajo-Franklin et al., 2007). Use of a target range, rather than a target value, allows several steps of an iterative optimization algorithm to be taken with constant α. The iteration continues through α updates by warm-starting the next sequence of iterates. We apply this algorithm to a 2D reflection seismic example, using the subsurface offset extended model of linearized (Born) constant-density acoustic scattering (Stolk et al., 2009) and, for p, the differential semblance penalty, to guide the model towards a physical limit (Shen and Symes, 2008). The example illustrates the substantial improvement in convergence rate obtainable by systematic control of α. Acoustic Born scattering, extended or not, is separable: the model parameters naturally fall into two subsets, which we will call inner and outer, and the inner parameters have a linear influence on the modeled data. For acoustic Born scattering, the outer parameter is the background velocity field, or macromodel.

The inner parameter is the reflectivity, or scaled velocity perturbation, which if extended is a function also of the subsurface offset (the background velocity, or outer parameter, is not extended in this approach). Separable problems may be approached via the variable projection algorithm (Golub and Pereyra, 2003), which minimizes the objective J_α over the inner variables, and treats the minimum value so obtained as a function of the outer variables, the reduced objective. In our example, the reduced objective is a function only of the background velocity field. Minimizing the reduced objective over the outer variable via the variable projection method is equivalent to finding a stationary point of the original objective function of all variables (Golub and Pereyra, 2003). The model-fitting property mentioned above is expressed as follows: for any choice of outer parameters, the inner (extended) parameters may be chosen so that the data is well fit. Actually, the use of variable projection (or some other similar reduction scheme, with possibly different objectives for inner and outer variables) is not optional for this type of problem. Kern and Symes (1994) and Huang and Symes (2015), for example, show that the objective J_α is very (in principle, arbitrarily) ill-conditioned and not amenable to any Newton-like optimization technique. The variable projection objective is far better conditioned, for any problem in which the outer parameters include kinematic fields (for instance, velocity) and the inner parameters are highly resolved fields. Other examples of separable extended scattering models are either Born approximations to more complicated scattering physics, or modifications of the energy source mechanism that violate the modeled data acquisition scheme. For recent examples of the first type, see Biondi and Almomin (2014), Weibull and Arntsen (2013), and Lameloise et al. (2015); see Symes (2008) for a review of older work. Plessix, Luo and Sava, Warner and Guasch (2014), and van Leeuwen and Herrmann (2013) describe various examples of the second type, the source extension. Note that the concept of model extension, as we describe it, is really a very old idea, implicit in the practice of seismic velocity analysis from its inception. The algorithm developed here has several unusual aspects, in comparison with conventional optimization approaches for the solution of inverse problems.

First, it minimizes not one objective function, but a composite of many, namely J_α for various values of α. While a pure penalty approach, that is, fixed α, might seem natural (and has been tried many times; see for example Kern and Symes (1994) and Huang and Symes (2015)), we will show that an algorithm with updated α can yield much faster convergence. Second, Morozov's discrepancy principle in its usual form (as described for instance by Engl et al. (1996)) is a method for controlling error in the solution of ill-posed operator equations or least-squares problems, for which the actual solution may not exist in general and may depend discontinuously on noise in the data. The principle provides a way to adjust a penalty parameter α so that the penalized solutions with data of noise level δ > 0 tend to the noise-free solution as δ → 0, provided that α → 0 in an appropriate δ-dependent way. In our application, by contrast, the data and its noise level are constant, and α → ∞ instead, so that we navigate a path of approximate solutions for which the penalty p goes to zero. Application of the discrepancy principle demands an estimate of the noise level in the data, often the most problematic aspect of the principle's use. Our example poses a peculiar version of this problem: we commit a blatant inverse crime, using the same computational method to calculate the data as to invert it. In particular, the noise level in our example data (with the usual meaning of noise level, that is, the infimum of all data misfits over all possible models) is zero! The discrepancy principle would then imply α = 0, and no progress towards a physical solution could occur. Equally well, a 100% estimated error would imply data fit for any outer parameters with the inner (linear) parameters set to the zero vector, so that no information about the outer parameters could be derived. We get around this issue by a conservative and computationally motivated estimate of noise. We estimate the noise level as the misfit level attained by a reasonable computational effort for the problem with α = 0 and a more or less arbitrary initial choice of the outer variable. Here reasonable means roughly the effort which we intend to devote to each iteration (objective or gradient evaluation). The net result is that the final, near-physical solution estimate fits the data as well as the initial estimate did, at roughly the same cost. This point of view makes sense only for separable extended models of the type described above, for which the inner variables can always be adjusted so that the data is fit to more or less arbitrary tolerance, for any choice of the outer variables (within the feasible set of the model). The remainder of the paper begins with a theory section describing the separable model structure, the variable projection algorithm, the discrepancy principle, and our variable-α algorithm, in abstract form.

The following section first explains the concrete form taken by the algorithm components for the constant-density acoustic Born model and its subsurface offset extension. We then describe the application of this version of the algorithm to an example based on the SEG/EAGE overthrust model. We mention various unresolved issues and possible extensions in the penultimate section, and end by reiterating the conclusions of this study.

THEORY

This section is organized as follows: first, we introduce the separable inverse problem and the variable projection method; then we investigate the role that the weight α plays in the objective function; finally, we show how to interpret Morozov's discrepancy principle as a parameter choice rule that keeps the residual in an acceptable range.

Separable inverse problems

In this section, we present an abstract formulation of the key ideas mentioned in the introduction. We will give the various components concrete form appropriate for acoustic seismic modeling at the beginning of the next section. In this formulation, a model is a pair consisting of an outer parameter m and an inner parameter x. The forward (data simulation, modeling) operator F is linear in x and (possibly) nonlinear in m, as reflected in our notation for its evaluation:

(m, x) -> F[m]x.   (1)

The value F[m]x is a vector in the space of data. The objective has the form described in the introduction, that is, a linear combination of a data error or misfit term e and a penalty term p, the latter applying only to the inner variables. We will assume that the model and data spaces are Hilbert spaces, with inner products <.,.> and norms ||.||. We will use the same notation for inner and outer model and data spaces, distinguishing the (possibly different) norms by context.

The data error and model penalty are both (one-half) norms-squared (the factor of 1/2 is for later computational convenience):

e[m, x] = (1/2) ||F[m]x - d||^2,   (2)

p[m, x] = (1/2) ||Ax||^2,   (3)

and the objective is their weighted sum:

J_α[m, x] = e[m, x] + α p[m, x].   (4)

In concrete instances, the regularization operator A measures the extent of non-physicality of the inner parameter x. Physical inner parameter vectors lie in its null space. As explained in the introduction, the object of the optimization is to remove the non-physical extended degrees of freedom from the model while maintaining data fit. Reducing p as defined in equation 3 moves the model towards the null space of A, which is exactly the physical subspace of the inner parameter space. The weight α controls the amount of penalty applied to the model extension. When α -> 0, the objective function expresses little constraint on the model extension and allows good data fit. When α -> ∞, minimization of J_α forces the extended model to be close to a physical one, so that the optimization approximates non-extended, or physical, inversion. As mentioned in the introduction, the separable nature of this least-squares inverse problem invites use of the variable projection method, a nested optimization approach. First, in the inner loop, the objective function is optimized over the linear parameter x with the nonlinear parameter m fixed. The gradient of the objective function J_α[m, x] with respect to x is

∇_x J_α[m, x] = F[m]^T (F[m]x - d) + α A^T A x,   (5)

where ^T denotes the adjoint. A stationary point of equation 4 satisfies the normal equation

(F[m]^T F[m] + α A^T A) x = F[m]^T d.   (6)

We presume that this system is positive definite for all values of α, and in particular that F[m]^T F[m] is also positive definite; this condition may require that a primal version of the forward map be regularized. The system 6 thus has a unique solution x[m, α], which can be approximated via an iterative method such as conjugate gradient (CG) iteration.
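As a sketch of this inner step of the variable projection method, the snippet below solves the normal equation 6 for x[m, α] with a matrix-free conjugate-gradient iteration and evaluates the two terms e and p of equations 2 and 3. The operators F and A and the data d are small random stand-ins for the extended Born operator, the annihilator, and the observed data; only the structure of the computation is meant to carry over.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def inner_solve(F, A, d, alpha, maxiter=20):
    """Approximately solve (F^T F + alpha A^T A) x = F^T d for x[m, alpha]."""
    n = F.shape[1]
    normal = LinearOperator(
        (n, n),
        matvec=lambda x: F.T @ (F @ x) + alpha * (A.T @ (A @ x)),
        dtype=float,
    )
    rhs = F.T @ d
    x, _ = cg(normal, rhs, maxiter=maxiter)
    return x

# toy sizes: F maps the inner model (n) to data (mdat); A penalizes non-physical parts
mdat, n = 120, 80
F = np.random.rand(mdat, n)
A = np.diag(np.linspace(0.0, 1.0, n))        # e.g. multiplication by offset |h|
d = np.random.rand(mdat)
x_alpha = inner_solve(F, A, d, alpha=1e-6)
e = 0.5 * np.linalg.norm(F @ x_alpha - d) ** 2   # data error, equation 2
p = 0.5 * np.linalg.norm(A @ x_alpha) ** 2       # penalty, equation 3
```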

Since m and α determine the operator on the left-hand side of equation 6, its solution becomes a function x[m, α] of these quantities. We minimize the reduced objective J_α[m, x[m, α]] over m via a gradient-based method such as steepest descent, L-BFGS, or Gauss-Newton iteration (Nocedal and Wright, 1999). The gradient of the reduced objective function J_α[m, x[m, α]] with respect to m is

∇_m J_α[m, x[m, α]] = (DF[m]^T (F[m]x[m, α] - d)) x[m, α]   (7)

(Golub and Pereyra, 1973). Since F[m] is linear in x, (DF[m]δm)x is linear in x for each perturbation δm, and is also linear in δm. DF[m]^T is defined in terms of the inner products in the various model and data spaces by

<(DF[m]^T d) x, δm> = <d, (DF[m]δm) x>   (8)

for any data vector d, outer parameter perturbation δm, and inner parameter vector x.
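A corresponding sketch of the outer ingredients: the reduced objective and its gradient from equation 7, computed by solving the inner problem, forming the data residual, and applying the adjoint derivative of the forward map. The callables forward, dforward_adjoint, annihilate, and inner_solve are hypothetical hooks into a modeling code, not functions of any particular library.

```python
import numpy as np

def reduced_objective_and_gradient(m, d, alpha, forward, dforward_adjoint,
                                   annihilate, inner_solve):
    """Evaluate J_alpha[m, x[m, alpha]] and its gradient with respect to m (equation 7).

    forward(m, x)             -> F[m] x                 (predicted data)
    dforward_adjoint(m, r, x) -> (DF[m]^T r) x          (outer-model-space vector)
    annihilate(x)             -> A x
    inner_solve(m, d, alpha)  -> x[m, alpha], minimizer of the inner problem
    """
    x = inner_solve(m, d, alpha)
    residual = forward(m, x) - d
    e = 0.5 * np.dot(residual, residual)        # data error term, equation 2
    Ax = annihilate(x)
    p = 0.5 * np.dot(Ax, Ax)                    # penalty term, equation 3
    J = e + alpha * p                           # equation 4
    grad_m = dforward_adjoint(m, residual, x)   # reduced gradient, equation 7
    return J, grad_m, e, p
```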

The Discrepancy Principle

As described in the introduction, Morozov's discrepancy principle (in one of its guises) involves setting an acceptable range of data misfit [X-, X+], and adjusting the penalty weight α so that e[m, x[m, α]] lies in this range. The principle so stated applies to the inner optimization over x: since updating m changes the inner problem, the appropriate condition for such separable problems is that e stays in the range [X-, X+] as m is updated. In this subsection we examine the dependence of e on α with a view to understanding how to reset α when m changes. Accordingly, regard m as fixed and suppress it from the notation for the remainder of this subsection, and introduce the abbreviations

e(α) = e[m, x[m, α]],   (9)

p(α) = p[m, x[m, α]].   (10)

Differentiating equation 6 with respect to α leads to the relation

(F^T F + α A^T A) dx/dα = -A^T A x,   (11)

whence

de/dα = <dx/dα, F^T (Fx - d)> = -α <dx/dα, A^T A x> = α <A^T A x, (F^T F + α A^T A)^{-1} A^T A x> ≥ 0.   (12)

Note that the inequality in 12 is strict if p > 0, hence A^T A x ≠ 0, since the normal operator is assumed to be positive definite. The derivative of p(α) with respect to α similarly satisfies a strict inequality if p > 0:

dp/dα = -<A^T A x, (F^T F + α A^T A)^{-1} A^T A x> ≤ 0.   (13)

Equations 12 and 13 together show that increasing α implies increasing e while decreasing p, and

<A^T A x, (F^T F + α A^T A)^{-1} A^T A x> = <(A^T A)^{1/2} x, [(A^T A)^{-1/2} F^T F (A^T A)^{-1/2} + α I]^{-1} (A^T A)^{1/2} x> ≤ (1/α) <A^T A x, x> = (2/α) p.

In view of equation 12,

de/dα ≤ 2p,   (14)

with this inequality also being strict if p > 0. Suppose the current weight is α_c and denote a candidate for an updated weight by α^+. Then from inequality 14,

e(α^+) - e(α_c) ≤ ∫_{α_c}^{α^+} 2p dα.

If α^+ ≥ α_c, then in view of inequality 13, the above is

≤ 2 p(α_c) (α^+ - α_c).   (15)

Let us suppose that e(α_c) < X+. Then setting

α^+ = α_c + (X+ - e(α_c)) / (2 p(α_c))   (16)

implies via inequality 15 that e(α^+) - e(α_c) ≤ X+ - e(α_c). Assuming that p(α^+) > 0, hence p(α) > 0 for α_c ≤ α ≤ α^+, we conclude that if α^+ is given by the rule 16, then

e(α_c) < e(α^+) ≤ X+.   (17)

That is, unless p(α^+) = 0 (in which case a physical solution of the inverse problem has been reached), e(α^+) is larger than e(α_c) but in any case does not exceed X+. The rule 16 therefore provides a feasible updated α consistent with the upper bound in the discrepancy principle.

Practical application of the discrepancy principle

As mentioned in the introduction, the discrepancy principle requires that a range of data noise (in our notation, one-half data noise squared) [X-, X+] be given. We base our algorithm on a data error estimate X, and set X- = γ- X, X+ = γ+ X, where γ- < 1 < γ+ are positive constants at the disposal of the user: typical values might be γ- = (0.7)^2, γ+ = (1.2)^2 (we use these values in the experiment reported below). The choice of the data error estimate X remains. Two approaches to this choice are (i) treat it as hypothetical, with all subsequent results being contingent on it, and choose an initial model (m, x[m, 0]) for α = 0 so that X = e(0) = e[m, x[m, 0]]; (ii) if the normal equation 6 is solved approximately by an iterative method (we used conjugate gradient (CG) iteration), choose a number of iterations to be used throughout, choose x[m, 0] as an approximate solution of 6 computed by the chosen number of iterations, and set X = e(0) = e[m, x[m, 0]]. We used the second method in the experiments described in the next section. Of course, the second approach is really a variant of the first, with indirect rather than direct choice of the hypothetical noise level X. Either approach makes sense only for inverse problems of the character described in this paper, in which the unconstrained (α = 0) data misfit may be made arbitrarily small by choice of the linear parameter x, for any choice of the nonlinear parameter m.

Note the implication for X: in principle, e(0) = 0! Therefore a non-zero X must be selected, which embodies the actual (and initially unknown) data error e for p = 0, and a corresponding linear parameter x for which e(0) = e[m, x]. The second approach outlined above does this in a natural way. However, the arbitrariness of the choice cannot be avoided. The progress of the algorithm and its end result clearly depend on X. We will address this dependence and its implications in the discussion section. With either approach to the choice of X, the algorithm proceeds as follows (a flowchart is shown in Figure 1; a Python-style sketch of the loop appears after the flowchart):

1. Choose initial m, set α = 0, compute x[m, 0], X = e[m, x[m, 0]], X± = γ± X.
2. While (not done),
   2.1 While e[m, x[m, α]] ∈ [X-, X+], update m by means of a continuous optimization algorithm, using the gradient as given in equation 7.
   2.2 If e[m, x[m, α]] > X+, exit.
   2.3 If e[m, x[m, α]] < X-,
       2.3.1 Compute α^+ by equation 16.
       2.3.2 While e^+ = e[m, x[m, α^+]] is not in [X-, X+],
           2.3.2.1 If e^+ < X-, set α^+ <- 2 α^+.
           2.3.2.2 If e^+ > X+, set α^+ <- α^+ / 2.
   2.4 Set α <- α^+.

Figure 1: A flowchart for our implementation of the proposed algorithm.
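The Python-style sketch referenced after the algorithm listing above spells out steps 1-2.4, including the α update of equation 16 and the doubling/halving safeguard of steps 2.3.2.1-2.3.2.2. The helpers inner_solve, data_error, penalty, and update_m are hypothetical stand-ins for the inner CG solve, the evaluation of e and p, and one model update with the gradient of equation 7; the default range factors correspond to the (0.7)^2 and (1.2)^2 values quoted in the text.

```python
def discrepancy_controlled_inversion(m, d, gamma_minus=0.49, gamma_plus=1.44,
                                     max_outer=30, inner_solve=None,
                                     data_error=None, penalty=None, update_m=None):
    """Outer loop of the variable-alpha algorithm (steps 1-2.4)."""
    # Step 1: the alpha = 0 inner solve fixes the target range [X-, X+]
    alpha = 0.0
    x = inner_solve(m, d, alpha)
    X = data_error(m, x, d)
    Xlo, Xhi = gamma_minus * X, gamma_plus * X

    for _ in range(max_outer):                       # step 2: while not done
        e = data_error(m, x, d)
        if Xlo <= e <= Xhi:                          # step 2.1: update m at fixed alpha
            m = update_m(m, x, d, alpha)
            x = inner_solve(m, d, alpha)
        elif e > Xhi:                                # step 2.2: misfit too large, stop
            break
        else:                                        # step 2.3: misfit below range, raise alpha
            p = penalty(m, x)
            alpha_new = alpha + (Xhi - e) / (2.0 * p)    # equation 16
            x = inner_solve(m, d, alpha_new)
            e_new = data_error(m, x, d)
            while not (Xlo <= e_new <= Xhi):         # steps 2.3.2.1 / 2.3.2.2
                alpha_new = 2.0 * alpha_new if e_new < Xlo else 0.5 * alpha_new
                x = inner_solve(m, d, alpha_new)
                e_new = data_error(m, x, d)
            alpha = alpha_new                        # step 2.4
    return m, x, alpha
```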

EXAMPLES

In this section we illustrate the performance of the discrepancy-based algorithm by solving a velocity estimation problem modeled on reflection seismology. We use the subsurface offset extension of the 2D Born (linearized) constant-density acoustic model. The forward modeling operator F in our case is the subsurface extended Born modeling operator. For a detailed description of this model, its origins, and its properties, see Symes (2008). In this separable model, the acoustic wave velocity field v is the nonlinear parameter (denoted m in the discussion above), and the reflectivity r (proportional to the perturbation of v) is the linear parameter (x in the abstract discussion). These quantities appear as coefficients in the wave equations satisfied by the pressure field u and its perturbation δu:

(1/v(x,z)^2 ∂^2/∂t^2 - ∇^2) u(t,x,z;x_s) = w(t) δ(x - x_s) δ(z - z_s),   (18)

(1/v(x,z)^2 ∂^2/∂t^2 - ∇^2) δu(t,x,z;x_s) = ∫_{-H}^{H} dh r(x,z,h) u(t, x + 2h, z; x_s).   (19)

Note that the velocity v depends on the spatial coordinates x, z, whereas the reflectivity r depends on another coordinate h, representing subsurface (half-)offset. The physical meaning of this dependence is that action-at-a-distance is permitted in this model: to first order in perturbation theory, strain at one space-time position (x + 2h, z) is allowed to cause stress instantaneously at a different position at the same depth (x, z). The introduction of the spatial shift compensates for velocity errors, which permits data to be fit well for arbitrary v. Thus this model has the feature required by our construction of the discrepancy-based algorithm. The right-hand side of equation 18 represents an isotropic point radiator located at x = x_s, z = z_s with time dependence w(t). The forward map F[v]r is defined by F[v]r(x_r, t; x_s) = δu(x_r, z_r, t; x_s), in which x_r, z_r range over the receiver positions of the modeled survey. If the reflectivity is concentrated, or focused, at h = 0, that is, r(x,z,h) = r_0(x,z) δ(h), then the perturbation wave equation 19 reduces to the ordinary perturbation equation of linearized acoustics, and the model to the ordinary acoustic Born model. The aim of the inverse problem is to fit the data with an appropriate velocity and a physical reflectivity. Thus an appropriate choice of annihilator is A = multiplication by h: its null space is precisely the collection of r having a factor of δ(h). In the numerical implementation of the wave equations 18 and 19, we use a centered finite difference method of order 2 in time and 8 in space. A 4 m water layer is added on the top. The computational domain is padded by a sufficient number of cells to eliminate boundary reflections from the measured data traces (at positions x_r, z_r).
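For orientation, the sketch below is a minimal constant-density acoustic time-stepping loop of the general kind used to solve equation 18: second order in time and, for brevity, second order in space, whereas the experiments reported here use an eighth-order spatial stencil and absorbing padding. The grid, wavelet, and source location are invented for the example.

```python
import numpy as np

def ricker(t, f0=15.0, t0=0.1):
    """Simple Ricker wavelet used as the source time dependence w(t)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

nz, nx, dz, dx, dt, nt = 200, 300, 10.0, 10.0, 0.001, 1000
v = np.full((nz, nx), 2500.0)                # homogeneous velocity, m/s
u_prev = np.zeros((nz, nx))
u = np.zeros((nz, nx))
src_z, src_x = 4, nx // 2                    # shallow point source

for it in range(nt):
    # 2nd-order Laplacian on the interior points
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = ((u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dz**2 +
                       (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2)
    # leapfrog update: 2nd order in time
    u_next = 2 * u - u_prev + (v * dt) ** 2 * lap
    # inject the point source w(t) delta(x - x_s) delta(z - z_s)
    u_next[src_z, src_x] += (v[src_z, src_x] * dt) ** 2 * ricker(it * dt)
    u_prev, u = u, u_next
```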

The example is modified from the SEG/EAGE 3D overthrust model (Aminzadeh et al., 1997). In the reflectivity model, horizontal layers are distorted by several thrust (reverse) faults (shown in Figure 2b). The background velocity increases with depth (Figure 2a). The velocity is higher in the center, where the anticline structure sits. The basic information is listed in Table 1.

Source wavelet: bandpass 5 Hz
Source position x_s: x: 7 km every 4 m, z = 4 m
Receiver position x_r: x: 8 km every 4 m, z = m
Space and time: x = 8 km, z = km, t = 3 s
Grid size: dx = dh = dz = m, dt = ms
Initial velocity: v = .5 km/s
Max iterations, inner loop:

Table 1: Parameters for the overthrust model example.

Figure 3a shows that the initial background velocity is constant and far from the correct one, so the geological structures can barely be observed in the extended image (Figure 3b). In the subsurface offset gathers, the downward curves indicate slow velocity. First, given the initial velocity, we set the weight α = 0; solving the normal equation by CG gives the data residual term e(0), which corresponds to approximately 5.7% relative data residual. Based on the value of e(0), the lower and upper bounds of the acceptance range are estimated as X- = (0.7)^2 e(0) and X+ = (1.2)^2 e(0). Then the updated weight α = .e-6 is estimated by equation 16. After a velocity update, the data residual drops below the lower bound X-, so we update the weight to α = 4.e-6 again. The residual is still smaller than X-. Following the update algorithm, the value of α is doubled three times, which brings the data residual into the acceptable range. After 3 more velocity updates with weight α = 3.3e-5, the data residual again becomes smaller than X-. Recalculating α and doubling its value brings the data residual back into the acceptable range. After 9 more velocity updates, the data residual drops below the lower bound; the updated weight α = .4e-4 keeps the data residual in the acceptable range.
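The acceptance range quoted above and in Figure 10 follows from the range factors by simple arithmetic: since e is one-half the squared residual norm, the factors (0.7)^2 and (1.2)^2 on e correspond to factors 0.7 and 1.2 on the relative data misfit. A two-line check, starting from the 5.7% initial misfit:

```python
initial_misfit = 0.057                       # relative data misfit of the alpha = 0 solve
lo, hi = 0.7 * initial_misfit, 1.2 * initial_misfit
print(lo, hi)   # about 0.040 and 0.068, i.e. the X- = 4.0% and X+ = 6.9% bounds, up to rounding
```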

Figure 2: (a) True background velocity model; (b) Reflectivity model.

Figure 3: (a) Initial background velocity; (b) Extended reflectivity.

The inverted background velocity model and the extended reflectivity are shown in Figure 4a and Figure 4b. The anticline and reverse-fault structures can be clearly observed. Furthermore, even the reflector beneath the anticline is positioned correctly (at about x = 4 km). The velocity error is mostly at the edges, which is a result of imperfect illumination. The subsurface offset gathers are much more focused towards h = 0 after inversion (Figure 5). With the inverted background velocity, the geological structures are imaged with much higher resolution in the zero-offset section of the reflectivity model (Figure 6). Extending the reflectivity model permits good data fit throughout the inversion process (Figures 7 and 8). Note that in order to improve computational efficiency, an adequate subsurface offset range was estimated adaptively by measuring data fit throughout the inversion process (Fu and Symes, 2015). We increase the weight α at several iterations (visible as the color changes in Figure 9), which significantly accelerates the convergence rate. At the same time, in accordance with Morozov's discrepancy principle, the data residual stays in the acceptable range (see Figure 10). For comparison, results from conventional FWI are shown in Figure 11. Without model extension, conventional FWI clearly fails to recover the correct models with exactly the same experimental setting as the extended case.

DISCUSSION

Because the update rule formulated as equation 16 is derived from inequality 15, the new value of α is more or less underestimated. Steps 2.3.2.1 and 2.3.2.2 are introduced to keep the data residual within the acceptable range, as required by Morozov's discrepancy principle. Based on the previous example, we observe that adjusting the weight α according to Morozov's discrepancy principle significantly improves the convergence of extended waveform inversion. To further investigate how the weight α affects the behavior of the objective function, a scan experiment of the objective function is conducted. The scan test measures the value of the objective function for different velocity errors in a particular direction of the model space (see Figure ).

Figure 4: (a) Inverted background velocity; (b) Extended reflectivity.

Figure 5: (a) Subsurface offset gathers at x = 3, 4, 5, 6 km using the initial background velocity; (b) Subsurface offset gathers using the inverted background velocity.

Figure 6: (a) Zero-offset section of the reflectivity model using the initial background velocity; (b) Zero-offset section of the reflectivity model using the inverted background velocity.

Figure 7: Data gather for a shot at the center of the model: (a) Observed data; (b) Predicted data with the initial background velocity model; (c) Predicted data with the inverted background velocity model.

Figure 8: Comparison of data residuals (a) with the initial background velocity model and (b) with the inverted background velocity model.

The numbers with negative or positive sign represent the percentage by which the velocity is slower or faster than the true velocity. The result shows that the choice of the weight parameter α determines the convexity of the objective function. At the onset of the outer iteration, small values of α provide more degrees of freedom to the model in order to fit the data, which yields a fast convergence rate for models with large velocity error. However, when the model parameters are still far from the optimal ones, a large penalty weight would not provide enough degrees of freedom to the model, so the model updates fail to proceed in the correct direction. On the other hand, when the velocity is closer to the correct value, a small α does not provide enough sensitivity to the velocity error, which suggests that it is necessary to increase α in order to keep an adequate data residual. As the velocity becomes closer to the true one, a larger value of α is needed to ensure fast convergence. In our experiments, a fixed number of CG iterations is used to ensure the accuracy of the inner-loop solution, corresponding to a relative data residual of 5.7% for α = 0. In principle, when the data conforms to the model, any observed data can be predicted with extended waveform inversion, by introducing non-physical degrees of freedom into physical models. In application to field data, all models are more or less inadequate. Although it might still be possible to fit the data well with an extended model, we will not be able to drive the penalty term p towards 0 while keeping the data misfit e within an acceptable range, at least with a constant-density acoustic model. As a result, in any practical application of the discrepancy-based

Figure 9: Model convergence rate (relative model misfit versus iteration number): different colors represent different values of the weight α; dashed lines show the convergence rate without changing α.

Figure 10: Relative data misfit versus iteration number: different colors represent different values of the weight α; the acceptance range is bounded by X+ = 6.9% and X- = 4.0% relative misfit.

Figure 11: FWI results: (a) Inverted background velocity; (b) Extended reflectivity.


More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Nonlinear seismic imaging via reduced order model backprojection

Nonlinear seismic imaging via reduced order model backprojection Nonlinear seismic imaging via reduced order model backprojection Alexander V. Mamonov, Vladimir Druskin 2 and Mikhail Zaslavsky 2 University of Houston, 2 Schlumberger-Doll Research Center Mamonov, Druskin,

More information

Tu SRS2 07 A Full-waveform Inversion Case Study from Offshore Gabon

Tu SRS2 07 A Full-waveform Inversion Case Study from Offshore Gabon Tu SRS2 07 A Full-waveform Inversion Case Study from Offshore Gabon A. Privitera (CGG), A. Ratcliffe* (CGG) & N. Kotova (CGG) SUMMARY We applied full waveform inversion to a seismic dataset from offshore

More information

Geophysical Journal International

Geophysical Journal International Geophysical Journal International Geophys. J. Int. (2012) 188, 1221 1242 doi: 10.1111/j.1365-246X.2011.05314.x Full waveform inversion strategy for density in the frequency domain Woodon Jeong, 1 Ho-Yong

More information

Full waveform inversion with wave equation migration and well control

Full waveform inversion with wave equation migration and well control FWI by WEM Full waveform inversion with wave equation migration and well control Gary F. Margrave, Robert J. Ferguson and Chad M. Hogan ABSTRACT We examine the key concepts in full waveform inversion (FWI)

More information

Seismic tomography with co-located soft data

Seismic tomography with co-located soft data Seismic tomography with co-located soft data Mohammad Maysami and Robert G. Clapp ABSTRACT There is a wide range of uncertainties present in seismic data. Limited subsurface illumination is also common,

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Deconvolution imaging condition for reverse-time migration

Deconvolution imaging condition for reverse-time migration Stanford Exploration Project, Report 112, November 11, 2002, pages 83 96 Deconvolution imaging condition for reverse-time migration Alejandro A. Valenciano and Biondo Biondi 1 ABSTRACT The reverse-time

More information

FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke

FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke Consortium 2010 FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke SLIM University of British Columbia Full Waveform Inversion The Full Waveform

More information

Elastic least-squares reverse time migration using the energy norm Daniel Rocha & Paul Sava Center for Wave Phenomena, Colorado School of Mines

Elastic least-squares reverse time migration using the energy norm Daniel Rocha & Paul Sava Center for Wave Phenomena, Colorado School of Mines Elastic least-squares reverse time migration using the energy norm Daniel Rocha & Paul Sava Center for Wave Phenomena, Colorado School of Mines SUMMARY We derive linearized modeling and migration operators

More information

Migration with Implicit Solvers for the Time-harmonic Helmholtz

Migration with Implicit Solvers for the Time-harmonic Helmholtz Migration with Implicit Solvers for the Time-harmonic Helmholtz Yogi A. Erlangga, Felix J. Herrmann Seismic Laboratory for Imaging and Modeling, The University of British Columbia {yerlangga,fherrmann}@eos.ubc.ca

More information

Two-Scale Wave Equation Modeling for Seismic Inversion

Two-Scale Wave Equation Modeling for Seismic Inversion Two-Scale Wave Equation Modeling for Seismic Inversion Susan E. Minkoff Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 21250, USA RICAM Workshop 3: Wave

More information

Image amplitudes in reverse time migration/inversion

Image amplitudes in reverse time migration/inversion Image amplitudes in reverse time migration/inversion Chris Stolk 1, Tim Op t Root 2, Maarten de Hoop 3 1 University of Amsterdam 2 University of Twente 3 Purdue University TRIP Seminar, October 6, 2010

More information

Acoustic anisotropic wavefields through perturbation theory

Acoustic anisotropic wavefields through perturbation theory GEOPHYSICS, VOL. 78, NO. 5 (SEPTEMBER-OCTOBER 2013); P. WC41 WC50, 22 FIGS. 10.1190/GEO2012-0391.1 Acoustic anisotropic wavefields through perturbation theory Tariq Alkhalifah 1 ABSTRACT Solving the anisotropic

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Full waveform inversion and the inverse Hessian

Full waveform inversion and the inverse Hessian Full waveform inversion and the inverse Hessian Gary Margrave, Matt Yedlin and Kris Innanen ABSTRACT FWI and the inverse Hessian Full waveform inversion involves defining an objective function, and then

More information

Compensating visco-acoustic effects in anisotropic resverse-time migration Sang Suh, Kwangjin Yoon, James Cai, and Bin Wang, TGS

Compensating visco-acoustic effects in anisotropic resverse-time migration Sang Suh, Kwangjin Yoon, James Cai, and Bin Wang, TGS Compensating visco-acoustic effects in anisotropic resverse-time migration Sang Suh, Kwangjin Yoon, James Cai, and Bin Wang, TGS SUMMARY Anelastic properties of the earth cause frequency dependent energy

More information

Iterative linearized migration and inversion

Iterative linearized migration and inversion Stanford Exploration Project, Report 123, October 31, 2005, pages 101 128 Iterative linearized migration and inversion Huazhong Wang 1 ABSTRACT The objective of seismic imaging is to obtain an image of

More information

Full waveform inversion in the Laplace and Laplace-Fourier domains

Full waveform inversion in the Laplace and Laplace-Fourier domains Full waveform inversion in the Laplace and Laplace-Fourier domains Changsoo Shin, Wansoo Ha, Wookeen Chung, and Ho Seuk Bae Summary We present a review of Laplace and Laplace-Fourier domain waveform inversion.

More information

Attenuation compensation in viscoacoustic reserve-time migration Jianyong Bai*, Guoquan Chen, David Yingst, and Jacques Leveille, ION Geophysical

Attenuation compensation in viscoacoustic reserve-time migration Jianyong Bai*, Guoquan Chen, David Yingst, and Jacques Leveille, ION Geophysical Attenuation compensation in viscoacoustic reserve-time migration Jianyong Bai*, Guoquan Chen, David Yingst, and Jacques Leveille, ION Geophysical Summary Seismic waves are attenuated during propagation.

More information

Full-Waveform Inversion with Gauss- Newton-Krylov Method

Full-Waveform Inversion with Gauss- Newton-Krylov Method Full-Waveform Inversion with Gauss- Newton-Krylov Method Yogi A. Erlangga and Felix J. Herrmann {yerlangga,fherrmann}@eos.ubc.ca Seismic Laboratory for Imaging and Modeling The University of British Columbia

More information

Th N Seismic Imaging in Gas Obscured Areas - Q Anomaly Detection and Q Migration Applied to Broadband Data

Th N Seismic Imaging in Gas Obscured Areas - Q Anomaly Detection and Q Migration Applied to Broadband Data Th N107 11 Seismic Imaging in Gas Obscured Areas - Q Anomaly Detection and Q Migration Applied to Broadband Data A. Castiello* (ION GXT), Y. Ren (ION GXT), S. Greenwood (ION GXT), T. Martin (ION GXT),

More information

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Methods in Geochemistry and Geophysics, 36 GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Michael S. ZHDANOV University of Utah Salt Lake City UTAH, U.S.A. 2OO2 ELSEVIER Amsterdam - Boston - London

More information

Traveltime sensitivity kernels: Banana-doughnuts or just plain bananas? a

Traveltime sensitivity kernels: Banana-doughnuts or just plain bananas? a Traveltime sensitivity kernels: Banana-doughnuts or just plain bananas? a a Published in SEP report, 103, 61-68 (2000) James Rickett 1 INTRODUCTION Estimating an accurate velocity function is one of the

More information

A 27-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method

A 27-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method Geophysical Prospecting 04 6 58 77 doi: 0./365-478.090 A 7-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method Jing-Bo Chen Key Laboratory of Petroleum Resources

More information

Wave equation techniques for attenuating multiple reflections

Wave equation techniques for attenuating multiple reflections Wave equation techniques for attenuating multiple reflections Fons ten Kroode a.tenkroode@shell.com Shell Research, Rijswijk, The Netherlands Wave equation techniques for attenuating multiple reflections

More information

Scattering-based decomposition of sensitivity kernels for full-waveform inversion

Scattering-based decomposition of sensitivity kernels for full-waveform inversion Scattering-based decomposition of sensitivity kernels for full-waveform inversion Daniel Leal Macedo (Universidade Estadual de Campinas) and Ivan asconcelos (Schlumberger Cambridge Research) Copyright

More information

Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions

Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions Stanford Exploration Project, Report 97, July 8, 1998, pages 1 13 Implicit 3-D depth migration by wavefield extrapolation with helical boundary conditions James Rickett, Jon Claerbout, and Sergey Fomel

More information

Microseismic Event Estimation Via Full Waveform Inversion

Microseismic Event Estimation Via Full Waveform Inversion Microseismic Event Estimation Via Full Waveform Inversion Susan E. Minkoff 1, Jordan Kaderli 1, Matt McChesney 2, and George McMechan 2 1 Department of Mathematical Sciences, University of Texas at Dallas

More information

Simultaneous estimation of wavefields & medium parameters

Simultaneous estimation of wavefields & medium parameters Simultaneous estimation of wavefields & medium parameters reduced-space versus full-space waveform inversion Bas Peters, Felix J. Herrmann Workshop W- 2, The limit of FWI in subsurface parameter recovery.

More information

Bureau of Economic Geology, The University of Texas at Austin, Austin, TX. Research Associate, Geophysics from 11/2004

Bureau of Economic Geology, The University of Texas at Austin, Austin, TX. Research Associate, Geophysics from 11/2004 Paul C. Sava Bureau of Economic Geology, The University of Texas at Austin, University Station, Box X, Austin, TX 78758, (512) 471-0327 paul.sava@beg.utexas.edu, http://sepwww.stanford.edu/sep/paul Research

More information

Multicomponent imaging with distributed acoustic sensing Ivan Lim Chen Ning & Paul Sava, Center for Wave Phenomena, Colorado School of Mines

Multicomponent imaging with distributed acoustic sensing Ivan Lim Chen Ning & Paul Sava, Center for Wave Phenomena, Colorado School of Mines Multicomponent imaging with distributed acoustic sensing Ivan Lim Chen Ning & Paul Sava, Center for Wave Phenomena, Colorado School of Mines SUMMARY The usage of Distributed Acoustic Sensing (DAS in geophysics

More information

H005 Pre-salt Depth Imaging of the Deepwater Santos Basin, Brazil

H005 Pre-salt Depth Imaging of the Deepwater Santos Basin, Brazil H005 Pre-salt Depth Imaging of the Deepwater Santos Basin, Brazil Y. Huang* (CGGVeritas), D. Lin (CGGVeritas), B. Bai (CGGVeritas), S. Roby (CGGVeritas) & C. Ricardez (CGGVeritas) SUMMARY Several discoveries,

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

P S-wave polarity reversal in angle domain common-image gathers

P S-wave polarity reversal in angle domain common-image gathers Stanford Exploration Project, Report 108, April 29, 2001, pages 1?? P S-wave polarity reversal in angle domain common-image gathers Daniel Rosales and James Rickett 1 ABSTRACT The change in the reflection

More information

The Deconvolution of Multicomponent Trace Vectors

The Deconvolution of Multicomponent Trace Vectors The Deconvolution of Multicomponent Trace Vectors Xinxiang Li, Peter Cary and Rodney Couzens Sensor Geophysical Ltd., Calgary, Canada xinxiang_li@sensorgeo.com Summary Deconvolution of the horizontal components

More information

Geophysical Journal International

Geophysical Journal International Geophysical Journal International Geophys. J. Int. (2013) 194, 1665 1681 Advance Access publication 2013 June 14 doi: 10.1093/gji/ggt178 Multiparameter full waveform inversion of multicomponent ocean-bottom-cable

More information

Waveform inversion and time-reversal imaging in attenuative TI media

Waveform inversion and time-reversal imaging in attenuative TI media Waveform inversion and time-reversal imaging in attenuative TI media Tong Bai 1, Tieyuan Zhu 2, Ilya Tsvankin 1, Xinming Wu 3 1. Colorado School of Mines 2. Penn State University 3. University of Texas

More information

arxiv: v1 [math.na] 1 Apr 2015

arxiv: v1 [math.na] 1 Apr 2015 Nonlinear seismic imaging via reduced order model backprojection Alexander V. Mamonov, University of Houston; Vladimir Druskin and Mikhail Zaslavsky, Schlumberger arxiv:1504.00094v1 [math.na] 1 Apr 2015

More information

Seismic inverse scattering by reverse time migration

Seismic inverse scattering by reverse time migration Seismic inverse scattering by reverse time migration Chris Stolk 1, Tim Op t Root 2, Maarten de Hoop 3 1 University of Amsterdam 2 University of Twente 3 Purdue University MSRI Inverse Problems Seminar,

More information

Amplitude preserving AMO from true amplitude DMO and inverse DMO

Amplitude preserving AMO from true amplitude DMO and inverse DMO Stanford Exploration Project, Report 84, May 9, 001, pages 1 167 Amplitude preserving AMO from true amplitude DMO and inverse DMO Nizar Chemingui and Biondo Biondi 1 ABSTRACT Starting from the definition

More information

2012 SEG SEG Las Vegas 2012 Annual Meeting Page 1

2012 SEG SEG Las Vegas 2012 Annual Meeting Page 1 Wei Huang *, Kun Jiao, Denes Vigh, Jerry Kapoor, David Watts, Hongyan Li, David Derharoutian, Xin Cheng WesternGeco Summary Since the 1990s, subsalt imaging in the Gulf of Mexico (GOM) has been a major

More information

Finite difference modelling of the full acoustic wave equation in Matlab

Finite difference modelling of the full acoustic wave equation in Matlab Acoustic finite difference modelling Finite difference modelling of the full acoustic wave equation in Matlab Hugh D Geiger and Pat F Daley ABSTRACT Two subroutines have been added to the Matlab AFD (acoustic

More information