A projected Hessian for full waveform inversion

CWP-679

Yong Ma & Dave Hale
Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA

Figure 1. Update directions for one iteration of the conjugate gradient method (a), the image-guided conjugate gradient method (b), and a quasi-Newton method with application of the inverse projected Hessian (c).

ABSTRACT

A Hessian matrix in full waveform inversion (FWI) is difficult to compute directly because of high computational cost and an especially large memory requirement. Therefore, Newton-like methods are rarely feasible in realistic large-size FWI problems. We modify the BFGS method to use a projected Hessian matrix that reduces both the computational cost and the memory required, thereby making a quasi-Newton solution to FWI feasible.

Key words: projected Hessian matrix, BFGS method

1 INTRODUCTION

Full waveform inversion (FWI) (Tarantola, 1984; Pratt, 1999) is usually formulated as an optimization problem (Symes, 2008), in which we consider the minimization of a nonlinear objective function E: \mathbb{R}^N \to \mathbb{R}:

    \min_{m \in \mathbb{R}^N} E(m),    (1)

where the function variable m denotes N parameters, such as seismic velocities, of an earth model. In FWI, the objective function often takes a least-squares form:

    E(m) = \frac{1}{2} \| d - F(m) \|^2,

where \|\cdot\| denotes an L2 norm, d denotes the recorded data, and F is a forward data-synthesizing operator, a nonlinear function of the model m.

Various iterative methods (Nocedal and Wright, 2000; Tarantola, 2005) for solving this least-squares optimization problem include the conjugate gradient method, Newton's method, the Gauss-Newton method, and various quasi-Newton methods. Compared to the conjugate gradient method, Newton's method and other Newton-like methods generally converge in fewer iterations. However, Newton-like methods are rarely used to solve realistic seismic full waveform problems, because for such large problems they are costly. Newton-like methods, especially Newton's method, are suitable only for solving small- or medium-size optimization problems (Pratt et al., 1998; Sheen et al., 2006; Virieux and Operto, 2009), in which the number N of model parameters ranges from hundreds to thousands.

Newton's method must compute the 1st- and 2nd-order derivatives of the objective function E(m) with respect to the model parameters in m. We refer to these two derivatives as the gradient g(m) \equiv \partial E / \partial m and the Hessian H(m) \equiv \partial^2 E / \partial m^2, respectively. The Hessian matrix, in particular, consists of a large number, O(N^2), of 2nd derivatives.
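For concreteness, here is a minimal sketch (ours, not the authors' code) of evaluating the least-squares objective defined above; the linear operator F below is only a placeholder for a nonlinear wavefield simulator:

```python
import numpy as np

def misfit(m, d, F):
    """Least-squares FWI objective E(m) = 0.5 * ||d - F(m)||^2."""
    r = d - F(m)               # data residual
    return 0.5 * float(r @ r)

# Toy stand-in for the forward data-synthesizing operator F(m);
# a real F would simulate wavefields and is nonlinear in m.
A = np.random.default_rng(0).normal(size=(50, 10))
F = lambda m: A @ m
d = F(np.ones(10))             # "recorded" data from a known model
print(misfit(np.zeros(10), d, F))
```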

To compute each 2nd derivative in the most straightforward way, we must solve one forward problem F(m). Therefore, the Hessian matrix computed in this way requires the solution of O(N^2) forward problems. Pratt et al. (1998) show a more efficient method that reduces the number of required forward solutions from O(N^2) to O(N), which is still too costly for realistic large-size FWI problems. Moreover, the Hessian matrix consumes a large amount of memory: for a model with N parameters, a Hessian matrix stored in single precision requires 4N^2 bytes.

Because of the high cost of Newton's method for large problems, various methodologies have been proposed to approximate the Hessian matrix. The Gauss-Newton method, for example, ignores 2nd-order terms that account for nonlinearity in the function F(m). This approximation is valid only when the forward operator F is a linear or quasi-linear function of m. If the initial model in FWI is far from the true model and nonlinearity is significant, the Gauss-Newton method may not converge.

Quasi-Newton methods do not compute the Hessian matrix explicitly, but instead iteratively update Hessian approximations. The BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) is an efficient way to update the Hessian matrix. However, although the BFGS method reduces the computation time required to approximate a Hessian matrix, it does not reduce the computation time required to use the Hessian matrix to update model parameters, nor does it decrease the amount of memory required to store Hessian approximations. In practical applications of quasi-Newton methods for FWI, we must reduce both computation time and memory consumption significantly.

In this paper, we first review various Newton-like methods. We then pose FWI in a sparse model space and introduce a projected Hessian matrix in that space. We obtain the projected Hessian by using a projected BFGS method, a version of BFGS in the sparse model space. A projected Hessian matrix built in this way significantly reduces both computation time and memory consumption. Tests of our projected Hessian on the Marmousi II model suggest that quasi-Newton methods like ours may be promising in practical applications of FWI.

2 NEWTON-LIKE METHODS

We consider only iterative methods for solution of the nonlinear FWI optimization problem. In the i-th iteration of such methods, we update the model m in the direction of a vector p_i:

    m_{i+1} = m_i + \alpha_i p_i,    (2)

for some step length \alpha_i. Newton's method, the Gauss-Newton method, and quasi-Newton methods differ in the way in which they compute and use the update vector p_i.

2.1 Newton's method

In Newton's method, the update vector p_i is determined by first ignoring the higher-order (> 2) terms in a Taylor series approximation of the objective function E(m):

    E(m_i + \delta m_i) \approx E(m_i) + \delta m_i^T g_i + \frac{1}{2} \delta m_i^T H_i \delta m_i,    (3)

where E(m_i) denotes the objective function evaluated at m_i, g_i = g(m_i), and H_i = H(m_i). For models m near the current model estimate m_i, equation 3 approximates the objective function with a quadratic function. We then minimize this approximated E(m) by solving a set of linear equations

    H_i \delta m_i = -g_i,    (4)

with solution

    \delta m_i = -H_i^{-1} g_i.    (5)

Therefore, the update direction in Newton's method is simply

    p_i = -H_i^{-1} g_i,    (6)

with step length \alpha_i = 1. However, for models with large numbers of parameters, the high cost of computing the Hessian matrix H_i prevents the application of Newton's method.
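As a minimal numerical illustration (ours, with a toy quadratic objective), one Newton step solves equation 4 for the update direction; the dense solve below is exactly the O(N^3) operation that becomes prohibitive as N grows:

```python
import numpy as np

def newton_direction(H, g):
    """Solve H * dm = -g (equation 4) for the Newton update direction."""
    return np.linalg.solve(H, -g)

# Toy quadratic E(m) = 0.5 m^T H m - b^T m, minimized in a single step.
rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 5))
H = Q @ Q.T + 5.0 * np.eye(5)    # symmetric positive-definite Hessian
b = rng.normal(size=5)
m = np.zeros(5)
g = H @ m - b                    # gradient of the quadratic at m
m = m + newton_direction(H, g)   # step length alpha_i = 1
print(np.allclose(H @ m, b))     # True: the gradient is now zero
```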
To analyze the cost, let us explicitly write each element of H_i. For the least-squares objective function, each element contains two terms:

    H_{jk} = \frac{\partial^2 E}{\partial m_j \partial m_k}
           = J_j^T J_k - \left( \frac{\partial^2 F}{\partial m_j \partial m_k} \right)^T (d - F(m)),    (7)

where j and k are parameter indices in the model m, ranging from 1 to the total number N of model parameters. If, say, N = 500,000 in a 2D FWI problem, the Hessian H_i is a 500,000 x 500,000 matrix. To compute this Hessian matrix in the most straightforward way, we must solve O(N^2), or 2.5 x 10^11, forward problems F(m) in each iteration of FWI. Even using the more efficient way of Pratt et al. (1998), we can only reduce this number to about 500,000 per iteration. For FWI, which normally requires multiple iterations, this cost is prohibitive. Furthermore, the memory required to store this Hessian matrix in single precision is 1 terabyte, or about 1/2 terabyte considering symmetry in the Hessian.

2.2 Gauss-Newton method

Discarding the second term on the right-hand side of equation 7, we get an approximated Hessian H_a, with elements

    H_{jk} \approx J_j^T J_k,    (8)

where J_j and J_k are the j-th and k-th columns of the Jacobian matrix [\partial F / \partial m_1, \partial F / \partial m_2, ..., \partial F / \partial m_N], respectively. Using this approximation, we obtain the Gauss-Newton method, with update direction

    p_i = -H_a^{-1} g_i.    (9)
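The approximation in equations 8 and 9 is easy to state in code; this sketch (ours, with a random toy Jacobian in place of seismic derivatives) shows that H_a requires only first derivatives of F:

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.normal(size=(200, 8))   # toy Jacobian dF/dm, one column per parameter
r = rng.normal(size=200)        # toy data residual d - F(m)

H_a = J.T @ J                   # equation 8: no 2nd derivatives of F needed
g = -J.T @ r                    # gradient of the least-squares objective
p = np.linalg.solve(H_a, -g)    # Gauss-Newton update direction (equation 9)
```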

Equation 8 shows that the cost of computing H_a is equivalent to that of computing the gradient g_i. Pratt et al. (1998) show an efficient approach for calculating g_i, in which only 2 forward data syntheses F(m) are necessary. However, the approximation H_a in effect assumes that the function F(m_i + \delta m_i) is linear with respect to the difference \delta m_i between the true model m and the current model m_i. In practice this assumption is seldom realistic.

2.3 Quasi-Newton method

A quasi-Newton method iteratively approximates the Hessian matrix. The BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) is currently considered the most popular (Nocedal and Wright, 2000) quasi-Newton update formula. The basic idea of BFGS is to update the Hessian using changes in both the model m_i and the gradient g_i from one iteration to the next:

    H_{i+1} = H_i + \frac{y_i y_i^T}{y_i^T \delta m_i} - \frac{H_i \delta m_i (H_i \delta m_i)^T}{\delta m_i^T H_i \delta m_i},    (10)

where y_i = g_{i+1} - g_i and \delta m_i = m_{i+1} - m_i. The BFGS method normally begins with an identity matrix H_0 = I in the first iteration, so that the first iteration of BFGS is similar to a gradient-descent method. Pseudo-code for implementing the BFGS method is as follows (Nocedal and Wright, 2000):

    given m_0, g_0 = \partial E / \partial m (m_0), and H_0 = I
    for i = 0, 1, 2, ..., until convergence do
        p_i = -H_i^{-1} g_i
        search for \alpha_i; \delta m_i = \alpha_i p_i
        m_{i+1} = m_i + \delta m_i
        g_{i+1} = \partial E / \partial m (m_{i+1})
        y_i = g_{i+1} - g_i
        H_{i+1} = H_i + y_i y_i^T / (y_i^T \delta m_i) - H_i \delta m_i (H_i \delta m_i)^T / (\delta m_i^T H_i \delta m_i)
    end for

The BFGS method is an efficient way to compute Hessian approximations. Like the Gauss-Newton method, BFGS requires computation of only gradients g_i, at the cost of only 2 forward modelings F(m). However, unlike the Gauss-Newton method, BFGS does not simply ignore 2nd-derivative terms in the Hessian. Instead, the BFGS method uses differences in models and gradients between iterations to construct a complete Hessian approximation. Unfortunately, the BFGS method does not reduce the O(N^3) cost of solving the set of linear equations (equation 4) for the update vector p_i, nor does it reduce the O(N^2) memory required to store the Hessian approximation.
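The pseudo-code above translates almost line for line into NumPy. The sketch below is our illustration, not the authors' code: it uses SciPy's strong-Wolfe line search and the Rosenbrock function as a stand-in for E(m), and it stores the full N x N approximation H_i whose size the projected method of the next section is designed to shrink.

```python
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def bfgs(E, grad, m, n_iter=100):
    """Classic BFGS (equation 10), storing the full Hessian approximation."""
    H = np.eye(m.size)                      # H_0 = I
    g = grad(m)
    for _ in range(n_iter):
        p = np.linalg.solve(H, -g)          # p_i = -H_i^{-1} g_i
        alpha = line_search(E, grad, m, p)[0]
        if alpha is None:                   # line search failed to converge
            break
        dm = alpha * p                      # delta m_i
        m = m + dm
        g_new = grad(m)
        y = g_new - g                       # y_i = g_{i+1} - g_i
        Hdm = H @ dm
        H += np.outer(y, y) / (y @ dm) - np.outer(Hdm, Hdm) / (dm @ Hdm)
        g = g_new
    return m

print(bfgs(rosen, rosen_der, np.zeros(4)))  # approaches [1, 1, 1, 1]
```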
3 PROJECTED HESSIAN MATRIX

The only way to reduce these costs is to reduce the number N of model parameters. For this purpose, we introduce a projected Hessian matrix in a sparse model space.

3.1 Sparse optimization

In an approach similar to that used in subspace methods (Kennett et al., 1988; Oldenburg et al., 1993) and alternative parameterizations (Pratt et al., 1998, Appendix A), we construct a finely and uniformly sampled (dense) model m from a sparse model s that contains a much smaller number n << N of model parameters:

    m = Rs,    (11)

where R is an interpolation operator that interpolates model parameters from the sparse model s to the dense model m. FWI is then reformulated as a new sparse optimization problem (Pratt et al., 1998, Appendix A; Ma et al., 2010), in which we minimize a new nonlinear objective function E: \mathbb{R}^n \to \mathbb{R}:

    \min_{s \in \mathbb{R}^n} E(Rs).    (12)

In the optimization problem posed in equation 12, model parameters in m are not determined independently as in equation 1. Instead, the n sparse parameters in s constrain the other N - n parameters in m. This constraint is imposed by the interpolation operator R. Therefore, the optimization problem in equation 12 is equivalent to a linearly constrained optimization problem in the dense model space m, where the number of constraints is N - n. In the context of constrained optimization, the projected Hessian matrix is suggested by Gill et al. (1981) and several other authors. We rewrite the objective function in equation 3 as

    E(Rs_i + R \delta s_i) \approx E(Rs_i) + \delta s_i^T R^T g_i + \frac{1}{2} \delta s_i^T R^T H_i R \delta s_i,    (13)

where we refer to R^T H_i R as a projected Hessian matrix (Gill et al., 1981), and R^T g_i as a projected gradient. Here R^T denotes the adjoint of the interpolation operator R. For models s near the current model estimate s_i, equation 13 approximates the objective function with a quadratic function. We then minimize this approximated E(Rs) by solving a set of n linear equations

    R^T H_i R \delta s_i = -R^T g_i,    (14)

with a solution for the n unknowns

    \delta s_i = -(R^T H_i R)^{-1} R^T g_i.    (15)

Therefore, in Newton's method the update direction vector becomes

    p_i = -(R^T H_i R)^{-1} R^T g_i,    (16)

for a step length \alpha_i = 1.
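A sketch of the projected Newton step in equations 14 through 16 (our toy, with dense matrices and a random R standing in for an interpolation operator): the linear solve involves only an n x n system.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 400, 12                          # dense and sparse model sizes, n << N
R = rng.normal(size=(N, n))             # stand-in interpolation operator
Q = rng.normal(size=(N, N))
H = Q @ Q.T + N * np.eye(N)             # stand-in SPD dense-space Hessian
g = rng.normal(size=N)                  # dense-space gradient

H_proj = R.T @ H @ R                    # projected Hessian, n x n (equation 14)
ds = np.linalg.solve(H_proj, -R.T @ g)  # solve for the n unknowns (equation 15)
dm = R @ ds                             # map the update back to the dense model
```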

Figure 2. Schematic for computing a projected Hessian \tilde{H}_i = R^T H_i R. The projected Hessian reduces computation time and memory consumption.

The projected Hessian R^T H_i R is an n x n symmetric matrix. Figure 2 describes this projected Hessian matrix in a schematic fashion: the tall and thin rectangle denotes the interpolation operator R, the short and fat rectangle denotes the adjoint operator R^T, and the large and small squares denote the original and projected Hessian matrices, respectively. This schematic diagram illustrates how memory consumption is reduced from O(N^2) to O(n^2). Computation is likewise reduced from O(N^3) to O(n^3).

3.2 Projected BFGS method

A projected Hessian matrix can be approximated iteratively with an approach similar to the classic BFGS method. Apply the interpolation operator R and its adjoint R^T to both sides of equation 10 to obtain

    R^T H_{i+1} R = R^T H_i R + \frac{R^T y_i y_i^T R}{y_i^T \delta m_i} - \frac{R^T H_i \delta m_i (R^T H_i \delta m_i)^T}{\delta m_i^T H_i \delta m_i}.    (17)

Using the simple relationship \delta m_i = R \delta s_i, we can rewrite equation 17 as

    R^T H_{i+1} R = R^T H_i R + \frac{R^T y_i y_i^T R}{y_i^T R \delta s_i} - \frac{R^T H_i R \delta s_i (R^T H_i R \delta s_i)^T}{\delta s_i^T R^T H_i R \delta s_i}.    (18)

Now simplify equation 18 by defining \tilde{H}_i \equiv R^T H_i R, \tilde{g}_i \equiv R^T g_i, and \tilde{y}_i \equiv R^T y_i = \tilde{g}_{i+1} - \tilde{g}_i to obtain an update formula for the projected Hessian matrix:

    \tilde{H}_{i+1} = \tilde{H}_i + \frac{\tilde{y}_i \tilde{y}_i^T}{\tilde{y}_i^T \delta s_i} - \frac{\tilde{H}_i \delta s_i (\tilde{H}_i \delta s_i)^T}{\delta s_i^T \tilde{H}_i \delta s_i}.    (19)

We can further simplify equation 19 with \delta s_i = \alpha_i \tilde{p}_i and \tilde{H}_i \delta s_i = -\alpha_i \tilde{g}_i (Gill et al., 1981):

    \tilde{H}_{i+1} = \tilde{H}_i + \frac{\tilde{y}_i \tilde{y}_i^T}{\alpha_i \tilde{y}_i^T \tilde{p}_i} + \frac{\tilde{g}_i \tilde{g}_i^T}{\tilde{p}_i^T \tilde{g}_i}.    (20)

Equation 20 has the same structure as equation 10, so we refer to this formula for updating the projected Hessian as the projected BFGS method, a version of the classic BFGS method in the sparse model space s. Pseudo-code for implementing this projected BFGS method is as follows:

    given s_0, \tilde{g}_0 = R^T g_0, and \tilde{H}_0 = I
    for i = 0, 1, 2, ..., until convergence do
        \tilde{p}_i = -\tilde{H}_i^{-1} \tilde{g}_i
        search for \alpha_i; \delta s_i = \alpha_i \tilde{p}_i
        m_{i+1} = m_i + R \delta s_i
        g_{i+1} = \partial E / \partial m (m_{i+1}); \tilde{g}_{i+1} = R^T g_{i+1}
        \tilde{y}_i = \tilde{g}_{i+1} - \tilde{g}_i
        \tilde{H}_{i+1} = \tilde{H}_i + \tilde{y}_i \tilde{y}_i^T / (\alpha_i \tilde{y}_i^T \tilde{p}_i) + \tilde{g}_i \tilde{g}_i^T / (\tilde{p}_i^T \tilde{g}_i)    (equation 20)
    end for

3.3 Wolfe conditions

The projected Hessian matrix \tilde{H} is symmetric positive-definite (SPD) if \tilde{y}_i^T \delta s_i > 0 in equation 19. This SPD property is important for an optimization problem, because it guarantees that a quasi-Newton method that uses this projected Hessian converges to a (possibly local) minimum. This SPD property can be achieved if the step length \alpha_i satisfies the sufficient descent condition

    E(s_i + \alpha_i \tilde{p}_i) \equiv E(\alpha_i) \le E(0) + c_1 E'(0) \alpha_i,    (21)

and the curvature condition

    E'(\alpha_i) \ge c_2 E'(0),    (22)

where E'(0) = \tilde{g}_i^T \tilde{p}_i and E'(\alpha_i) = \tilde{g}_{i+1}^T \tilde{p}_i are directional derivatives. In equations 21 and 22, c_1 and c_2 are constants in (0, 1); suggested values for c_1 and c_2 are 10^{-4} and 0.9 (Nocedal and Wright, 2000), respectively. Together, equations 21 and 22 are referred to as the Wolfe conditions (Nocedal and Wright, 2000). If the curvature condition in equation 22 is replaced by

    |E'(\alpha_i)| \le c_2 |E'(0)|,    (23)

then the Wolfe conditions become the strong Wolfe conditions used in this study.
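Assuming the line search returns a step satisfying the Wolfe conditions (so that \tilde{y}_i^T \delta s_i > 0), one projected BFGS iteration can be sketched as follows. This is our illustration on a toy sparse-space quadratic, not the authors' code; the exact minimizing step used below satisfies the curvature condition automatically.

```python
import numpy as np

def projected_bfgs_update(H, g, g_new, p, alpha):
    """Rank-two update of equation 20 for the projected Hessian H~."""
    y = g_new - g                            # y~_i = g~_{i+1} - g~_i
    return (H + np.outer(y, y) / (alpha * (y @ p))
              + np.outer(g, g) / (p @ g))

# One iteration on a toy sparse-space quadratic E~(s) = 0.5 s^T A s.
rng = np.random.default_rng(4)
n = 12
Q = rng.normal(size=(n, n))
A = Q @ Q.T + n * np.eye(n)                  # SPD "true" projected Hessian
s = rng.normal(size=n)
H = np.eye(n)                                # H~_0 = I
g = A @ s                                    # projected gradient g~_0
p = np.linalg.solve(H, -g)                   # p~_0 = -H~_0^{-1} g~_0
alpha = -(g @ p) / (p @ A @ p)               # exact minimizing step length
s = s + alpha * p
H = projected_bfgs_update(H, g, A @ s, p, alpha)
print(np.linalg.eigvalsh(H).min() > 0)       # True: H~ remains SPD
```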

3.4 Cost of projected BFGS

In each iteration of the projected BFGS method, we perform at least one forward calculation F(m) and compute at least one gradient g (which itself costs 2 forward modelings). Therefore, in the ideal case, the cost of each iteration of the projected BFGS method is that of three forward calculations F(m). Furthermore, if the number of parameters n in the sparse model space s is much smaller than the number N in the dense model m, the amount of memory required to store the projected Hessian is significantly reduced, as shown in Figure 2.

4 EXAMPLES

We test our projected BFGS method on the Marmousi II model. Figure 3a shows the true model m, and Figure 3b shows the initial model m_0, a highly smoothed version of the true model. We use 11 shots uniformly distributed on the surface, and a 15 Hz Ricker wavelet as the source for simulating wavefields. The source interval is 0.76 km. In this example, the number N of parameters in the dense model space is large enough that computation or storage of the true Hessian matrix is infeasible.

Figure 3. The Marmousi II velocity model (a) and an initial velocity model (b).

Figure 4. Structurally constrained sample selection. A total of 165 samples are chosen for our projected BFGS method.

4.1 Projected Hessian and its inverse

Our projected BFGS method poses the optimization problem in a sparse model space s. As described by Ma et al. (2011), we construct the sparse model s using a structurally constrained sample-selection scheme. This selection method considers structures apparent in migrated images, and places samples along structural features. Figure 4 shows a total of 165 scattered sample locations. This set of scattered sample locations is representative, because (1) the samples lie between (not on) reflectors; (2) they are distributed along structural features; and (3) there are more samples in structurally complex areas and fewer samples in simple areas. The scattered locations, together with corresponding values at those locations, comprise a sparse model space s that will be used in the projected BFGS method.

We employ image-guided interpolation (IGI) (Hale, 2009) as the operator R, and the adjoint of image-guided interpolation (Ma et al., 2010) as the operator R^T. IGI interpolates values in the sparse model s to a uniformly sampled dense model m, and the interpolant makes good geological sense because it accounts for structures in the model.

Figure 5a shows the initial projected Hessian \tilde{H}_0, an identity matrix. Figure 5b shows the updated projected Hessian \tilde{H}_1 after one iteration of the projected BFGS method. The update \tilde{H}_1 - \tilde{H}_0 to the Hessian in this first iteration is shown separately in Figure 5c. As we can see, the projected BFGS method adds significant off-diagonal components to the initial Hessian \tilde{H}_0. Because our line search satisfies the strong Wolfe conditions, the inverse of the projected Hessian matrix exists. Figure 5d shows the inverse Hessian \tilde{H}_1^{-1}.
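Image-guided interpolation itself is beyond a short example, but the essential requirement, a linear operator R with an exact adjoint R^T, can be sketched with simple inverse-distance weighting over scattered samples (our stand-in for IGI, not Hale's algorithm):

```python
import numpy as np

def interp_matrix(grid_xz, sample_xz, eps=1e-6):
    """Dense stand-in for R: inverse-distance weights from n scattered
    samples to N grid points. Rows sum to 1; the adjoint is simply R.T."""
    d = np.linalg.norm(grid_xz[:, None, :] - sample_xz[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** 2
    return w / w.sum(axis=1, keepdims=True)

# Toy geometry: a 40 x 30 grid (N = 1200) and n = 165 scattered samples.
nx, nz, n = 40, 30, 165
gx, gz = np.meshgrid(np.arange(nx), np.arange(nz), indexing="ij")
grid = np.column_stack([gx.ravel(), gz.ravel()]).astype(float)
rng = np.random.default_rng(5)
samples = rng.uniform([0, 0], [nx, nz], size=(n, 2))

R = interp_matrix(grid, samples)          # N x n
m = R @ rng.normal(size=n)                # dense model m = Rs (equation 11)
g_proj = R.T @ rng.normal(size=nx * nz)   # projected gradient R^T g
```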

Figure 5. The initial Hessian \tilde{H}_0 (a), the Hessian matrix \tilde{H}_1 after the 1st iteration (b), the Hessian update \tilde{H}_1 - \tilde{H}_0 in the 1st iteration (c), and the inverse Hessian \tilde{H}_1^{-1} (d).

4.2 Eigenvalues and eigenvectors

Note that the last two terms on the right-hand side of equation 19 are rank-one matrices, so each iteration of the projected BFGS method is a rank-two update of the previous Hessian approximation. Figures 6a, 6b, 6c, and 6d show the eigenvalues of the projected Hessian \tilde{H}_i in the 1st, 4th, 7th, and 10th iterations, respectively. As suggested by the rank-two update, the projected BFGS method changes only two eigenvalues in each iteration: one the smallest, and the other the largest.

Figures 7a and 7b show two eigenvectors of the projected Hessian \tilde{H}_1, corresponding to the largest and the smallest eigenvalues, respectively. The eigenvalues and eigenvectors of the projected Hessian have geometric meaning. Consider the objective function E(m) as a multidimensional parabolic bowl (Thacker, 1989). The eigenvector associated with the largest eigenvalue points in the direction of greatest curvature on the bowl. Updating the model m along the eigenvector direction shown in Figure 7a, we get the largest rate of change in the objective function E(m); this eigenvector indicates the component of the model m that can be best determined in FWI. Likewise, the eigenvector associated with the smallest eigenvalue indicates the direction of least curvature. Updating the model m along the eigenvector direction shown in Figure 7b yields the smallest rate of change of E(m); this eigenvector indicates the component of the model m that can be least well determined in FWI.
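The rank-two structure is easy to verify numerically. In this sketch (ours), a single update of equation 19 applied to \tilde{H}_0 = I leaves all but two eigenvalues equal to 1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 12
ds = rng.normal(size=n)                     # model change delta s_i
y = ds + 0.3 * rng.normal(size=n)           # gradient change, y~^T ds > 0
H = np.eye(n)                               # H~_0 = I
Hds = H @ ds
H1 = (H + np.outer(y, y) / (y @ ds)
        - np.outer(Hds, Hds) / (ds @ Hds))  # equation 19
ev = np.linalg.eigvalsh(H1)
print(np.sum(~np.isclose(ev, 1.0)))         # 2: only two eigenvalues moved
```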

Figure 6. Eigenvalues of the projected Hessian in the 1st (a), 4th (b), 7th (c), and 10th (d) iterations.

Figure 7. Eigenvectors of \tilde{H}_1 corresponding to the largest (a) and the smallest (b) eigenvalues. Geometrically, an eigenvector is a direction on the multidimensional surface of an objective function E(m), and the associated eigenvalue determines the curvature on the surface. If the model is updated in the eigenvector direction in (a), the rate of change of E(m) is largest. If the model is updated in the eigenvector direction in (b), the rate of change of E(m) is smallest.

4.3 Projected quasi-Newton FWI

The projected BFGS method updates the model in each iteration, and therefore it can be used directly in quasi-Newton solutions of FWI. The key difference between quasi-Newton FWI and other solutions, such as the conjugate gradient method and Newton's method, is the update direction. Table 1 compares the update directions used in the conjugate gradient method, the image-guided conjugate gradient method (Ma et al., 2011), and the quasi-Newton method.

Table 1. Update directions p_i for the conjugate gradient (CG) method, the image-guided CG method, and our quasi-Newton method.

    Conjugate gradient:  p_0 = -g_0;
                         \beta_i = \beta(g_i, g_{i-1}) = g_i^T (g_i - g_{i-1}) / (g_{i-1}^T g_{i-1});
                         p_i = -g_i + \beta_i p_{i-1}

    Image-guided CG:     p_0 = -R R^T g_0;
                         \beta_i = \beta(R R^T g_i, R R^T g_{i-1});
                         p_i = -R R^T g_i + \beta_i p_{i-1}

    Quasi-Newton:        \tilde{H}_0 = I;
                         p_i = -R \tilde{H}_i^{-1} R^T g_i

As suggested in Ma et al. (2011), the image-guided conjugate gradient method uses a gather-scatter process in applying R R^T, and thereby generates low wavenumbers that are absent from the simple conjugate gradient because recorded data lack low frequencies. The conjugate gradient in Figure 1a and the image-guided conjugate gradient in Figure 1b illustrate this difference.

Figures 8a and 8c show inversion results estimated by the conjugate gradient (CG) method and the image-guided CG method, respectively, after two iterations. The corresponding models estimated in the 10th iteration are shown in Figures 8b and 8d, respectively. Both the conventional CG and the image-guided CG methods mainly update the shallow part of the model, because both the conjugate gradient and the image-guided conjugate gradient have unbalanced amplitudes: high in the shallow part and low in the deeper part.

In the case of quasi-Newton FWI, the update direction shown in Figure 1c not only contains significant low wavenumbers, as does the image-guided conjugate gradient in Figure 1b, but its amplitudes are also more balanced; in Figure 1c, the amplitudes are higher in the deeper part. This improvement is due to the use of the inverse projected Hessian matrix \tilde{H}_i^{-1} (see the last row in Table 1), which acts as a filter applied to the gradient. Figures 8e and 8f show the models estimated in the 2nd and 10th iterations of the quasi-Newton method, respectively. As we can see, quasi-Newton FWI updates the deeper part of the model in early iterations.

Figure 9 shows the convergence of the quasi-Newton method in this Marmousi II example, in terms of both the data misfit function and the model misfit function (an L2 norm of the difference between the true model and the estimated model). Overall, quasi-Newton FWI shows faster convergence.

5 CONCLUSIONS

A projected BFGS method iteratively approximates the Hessian matrix in FWI, thereby reducing both computation time and required memory. The projected BFGS method enables FWI to be performed with a quasi-Newton method. As demonstrated by the Marmousi II example, quasi-Newton FWI converges in fewer iterations. Compared with the conjugate gradient method or the image-guided conjugate gradient method, the primary disadvantage of our projected BFGS method is the computational cost of a single iteration, which is relatively high because the line search must satisfy the Wolfe conditions. Therefore, further investigation of the coefficients in the Wolfe conditions is necessary.

6 ACKNOWLEDGMENT

This work is sponsored by a research agreement between ConocoPhillips and Colorado School of Mines (SST SRA). Yong Ma thanks Diane Witters for help in learning how to polish this manuscript.

REFERENCES
Broyden, C. G., 1970, The convergence of a class of double-rank minimization algorithms: IMA Journal of Applied Mathematics, 6.
Fletcher, R., 1970, A new approach to variable metric algorithms: The Computer Journal, 13.
Gill, P. E., W. Murray, and M. H. Wright, 1981, Practical optimization: Academic Press, London.
Goldfarb, D., 1970, A family of variable-metric methods derived by variational means: Mathematics of Computation, 109.
Hale, D., 2009, Image-guided blended neighbor interpolation of scattered data: SEG Technical Program Expanded Abstracts, 28.
Kennett, B., M. Sambridge, and P. Williamson, 1988, Subspace methods for large inverse problems with multiple parameter classes: Geophysical Journal International, 94.
Ma, Y., D. Hale, B. Gong, and Z. Meng, 2011, Image-guided full waveform inversion: CWP Report.
Ma, Y., D. Hale, Z. Meng, and B. Gong, 2010, Full waveform inversion with image-guided gradient: SEG Technical Program Expanded Abstracts, 29.

Figure 8. Inverted velocity models for the conjugate gradient method (a,b), the image-guided conjugate gradient method (c,d), and the quasi-Newton method (e,f). Left (a,c,e): the 2nd iteration; right (b,d,f): the 10th iteration.

Nocedal, J., and S. J. Wright, 2000, Numerical optimization: Springer.
Oldenburg, D., P. McGillvray, and R. Ellis, 1993, Generalized subspace methods for large-scale inverse problems: Geophysical Journal International, 114.
Pratt, R., C. Shin, and G. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: Geophysics, 64, 888.
Shanno, D. F., 1970, Conditioning of quasi-Newton methods for function minimization: Mathematics of Computation, 111.
Sheen, D. H., K. Tuncay, C. E. Baag, and P. J. Ortoleva, 2006, Time domain Gauss-Newton seismic waveform inversion in elastic media: Geophysical Journal International, 167.
Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56.
Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49.

Figure 9. Convergence of the conjugate gradient method, the image-guided conjugate gradient method, and the projected quasi-Newton method: the data misfit function and the model misfit function.

Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: Society for Industrial and Applied Mathematics.
Thacker, W. C., 1989, The role of the Hessian matrix in fitting models to measurements: Journal of Geophysical Research, 94.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, WCC1-WCC26.
