Inversion of Phase Data for a Phase Velocity Map 101. Summary for CIDER12 Non-Seismologists

Similar documents
Structural Cause of Missed Eruption in the Lunayyir Basaltic

Global surface-wave tomography

Supporting Information for An automatically updated S-wave model of the upper mantle and the depth extent of azimuthal anisotropy

Singular value decomposition. If only the first p singular values are nonzero we write. U T o U p =0

ECE295, Data Assimila0on and Inverse Problems, Spring 2015

Geometric Modeling Summer Semester 2010 Mathematical Tools (1)

SOEE3250/5675/5115 Inverse Theory Lecture 2; notes by G. Houseman

Regularizing inverse problems. Damping and smoothing and choosing...

= (G T G) 1 G T d. m L2

Geophysical Data Analysis: Discrete Inverse Theory

Supplementary Figure 1. Distribution of seismic event locations determined using the final 3-D velocity model. We separate the crust-related

Statistical Geometry Processing Winter Semester 2011/2012

1 Linearity and Linear Systems

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Mathematical Properties of Stiffness Matrices

Data Repository Item

INTERPRETATION OF SEISMOGRAMS

Data Repository Item For: Kinematics and geometry of active detachment faulting beneath the TAG hydrothermal field on the Mid-Atlantic Ridge

ANEWJOINTP AND S VELOCITY MODEL OF THE MANTLE PARAMETERIZED IN CUBIC B-SPLINES

Estimation of S-wave scattering coefficient in the mantle from envelope characteristics before and after the ScS arrival

Inverse Theory Methods in Experimental Physics

29th Monitoring Research Review: Ground-Based Nuclear Explosion Monitoring Technologies

Basic Ray Tracing. Rick Aster and Sue Bilek. October 3, 2003

1.2 Derivation. d p f = d p f(x(p)) = x fd p x (= f x x p ). (1) Second, g x x p + g p = 0. d p f = f x g 1. The expression f x gx

Fitting functions to data

Supporting Online Material for

Image Registration Lecture 2: Vectors and Matrices

Computing tomographic resolution matrices using Arnoldi s iterative inversion algorithm

Operational modal analysis using forced excitation and input-output autoregressive coefficients

Surface wave higher-mode phase velocity measurements using a roller-coaster-type algorithm

Linear Algebra Review. Vectors

Singular value decomposition

LAB 6 SUPPLEMENT. G141 Earthquakes & Volcanoes

SIGNAL AND IMAGE RESTORATION: SOLVING

SUPPLEMENTARY INFORMATION

1. University of Ottawa 2. Dublin Institute for Advanced Studies 3. University of Texas at Austin

E = UV W (9.1) = I Q > V W

Inverse Problems, Information Content and Stratospheric Aerosols

Geophysical Journal International

1. Introduction. Figure 1. The shallow water environment with the cold eddy. (c) European Acoustics Association

The Application of Discrete Tikhonov Regularization Inverse Problem in Seismic Tomography

Geometric Modeling Summer Semester 2012 Linear Algebra & Function Spaces

Stanford Exploration Project, Report 115, May 22, 2004, pages

Vollständige Inversion seismischer Wellenfelder - Erderkundung im oberflächennahen Bereich

Chapter 3 Transformations

Combined tomographic forward and inverse modeling of active seismic refraction profiling data

2.1 Introduction. 16 Chapter2

Approximate Methods for Time-Reversal Processing of Large Seismic Reflection Data Sets. University of California, LLNL

Mantle flow and multistage melting beneath the Galápagos hotspot revealed by seismic imaging

5 Linear Algebra and Inverse Problem

Tomography of the 2011 Iwaki earthquake (M 7.0) and Fukushima

Theory of Model-Based Geophysical Survey and Experimental Design. Part A Linear Problems

Multi-Robotic Systems

C5 Magnetic exploration methods data analysis techniques

Achieving depth resolution with gradient array survey data through transient electromagnetic inversion

Changbaishan volcanism in northeast China linked to subduction-induced mantle upwelling

Properties of Matrices and Operations on Matrices

Receiver Function Inversion

3D IMAGING OF THE EARTH S MANTLE: FROM SLABS TO PLUMES

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems

CS6964: Notes On Linear Systems

On the use of the checker-board test to assess the resolution of tomographic inversions

Co-seismic Gravity Changes Computed for a Spherical Earth Model Applicable to GRACE Data

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

Getting Started with Communications Engineering

Global anisotropic phase velocity maps for higher mode Love and Rayleigh waves

Linear Algebra March 16, 2019

Linear Inverse Problems

Applied Linear Algebra in Geoscience Using MATLAB

A dynamic objective function technique for generating multiple solution models in seismic tomography

Math 302 Outcome Statements Winter 2013

Evidence of an axial magma chamber beneath the ultraslow spreading Southwest Indian Ridge

M. Koch and T.H. Münch. Department of Geohydraulics and Engineering Hydrology University of Kassel Kurt-Wolters-Strasse 3 D Kassel

Databases of surface wave dispersion

Name: INSERT YOUR NAME HERE. Due to dropbox by 6pm PDT, Wednesday, December 14, 2011

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians

EIGENVALUES AND SINGULAR VALUE DECOMPOSITION

What is Image Deblurring?

2008 Monitoring Research Review: Ground-Based Nuclear Explosion Monitoring Technologies

MITOCW watch?v=ztnnigvy5iq

Extraction of Individual Tracks from Polyphonic Music

Hale Collage. Spectropolarimetric Diagnostic Techniques!!!!!!!! Rebecca Centeno

REVIEW FOR EXAM II. The exam covers sections , the part of 3.7 on Markov chains, and

Estimating vertical and horizontal resistivity of the overburden and the reservoir for the Alvheim Boa field. Folke Engelmark* and Johan Mattsson, PGS

Lecture 2: Linear Algebra Review

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)

Final Examination. CS 205A: Mathematical Methods for Robotics, Vision, and Graphics (Fall 2013), Stanford University

Numerical Methods for Inverse Kinematics

Data Repository: Seismic and Geodetic Evidence For Extensive, Long-Lived Fault Damage Zones

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Scientific Computing

Lecture 6 Positive Definite Matrices

MATRICES. knowledge on matrices Knowledge on matrix operations. Matrix as a tool of solving linear equations with two or three unknowns.

Estimation of Cole-Cole parameters from time-domain electromagnetic data Laurens Beran and Douglas Oldenburg, University of British Columbia.

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Seismic Scattering in the Deep Earth

SURFACE WAVE GROUP VELOCITY MEASUREMENTS ACROSS EURASIA

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

M. Matrices and Linear Algebra

CPE 310: Numerical Analysis for Engineers

Transcription:

Inversion of Phase Data for a Phase Velocity Map 101 Summary for CIDER12 Non-Seismologists 1. Setting up a Linear System of Equations This is a quick-and-dirty, not-peer reviewed summary of how a dataset of phase anomalies (or group travel time anomalies, or any dataset for a 2-dimensional ray-path geometry) can be inverted for a phase velocity (anomaly) map (or any 2-dimensional model) (Figure 1). In a nutshell, to describe the problem numerically, we want to invert the linear system of equations d = A m (1) where vector d is the data vector, vector m is the model vector, and matrix A is the data kernels matrix. Setting up kernel matrix A for inversion i=1 i=2 r1 j=1 s1 2 1 i=3 i=4 r3 r2 i=5 sn: source n rm: receiver m j=2 s2 4 3 j=3 s3 5 r4 data vector: dk=phase anomaly along ray k d1: ray 1: s1 - r2 d2: ray 2: s2 - r3 d3: ray 3: s3 - r3 d4: ray 4: s3 - r1 d5: ray 5: s3 - r4 model vector: ml=map_ji m1=map_11 m2=map_12 m3=map_13 m4=map_14 m5=map_15 m6=map_21 m7=map_22... m15=map_35 elements of matrix A: cells A kl value of A kl for each ray k: x k-ji which is length of segment of ray in each of j x i cells in model Figure 1. Top: Schematic diagram of 5 rays from n MAX =3 sources to m MAX =4 receivers (not all possibilities are plotted!) in a phase velocity map with 3 rows (latitudes) and 5 columns (longitudes). Bottom: scheme of how the data vector and the model vectors as well as the data kernel matrix A are set up. 1

At this point, we have not yet made any assumptions on how the data relate to the model, i.e. whether we use ray theory or a more sophisticated finite frequency approach. All that is assumed here is that the dependence of the data on the model is linear. 1.1 Use Ray Theory Now we apply Fermat s Principle and assume that phase anomaly accumulates along the shortest path between a source and a receiver. On a sphere, this path is a great circle. The phase anomaly is assumed to not be influenced by structure away from the great circle. This assumption is valid in the infinite-frequency limit, i.e. when the wavelength of the wave is much smaller than the wavelength of the structure through which it travels. Since the phase accumulates asφ = kx wherek is the wave number andxthe travel distance, a phase anomaly at distancea is δφ = δ k a where k is the path-averaged wave number anomaly along the path,aearth radius and the epicentral distance in radians. k is related to the path-averaged phase velocity, c, throughc = ω/k and δk = ω/c δc/c. So our phase datum with index k, at frequency ω becomes δφ }{{} k ωa d k = c 0 0 dx } {{ } k l A kl δc(θ, φ) c 0 }{{} m l (2) where θ and φ are geographical coordinates. We integrate over δc/c 0 but the integral is rearranged to make the connection to equation (1). The signifies that we replaced the unknown actual path-averaged phase velocity, c, by a known, predicted reference phase velocity,c 0 (e.g. for a known model). It is ok to do this becauseδc c 0. The indexlfor the model vector in the matrix notation relates to the cells in a map as illustrated in Figure 1. 1.2 Make Data Kernel Matrix For illustration purposes on how we set up our inversion, we assume that our map has 3 rows and 5 column. This gives 3 5 = 15 model parameters. We have 5 rays, i.e. our data vector has length 5. The data dependence on the model parameters, as traced from source,s n, to receiver,r m, is the following d 1 =a 11 m 1 +a 12 m 2 +a 13 m 3 +a 14 m 4 d 2 =a 26 m 6 +a 27 m 7 +a 22 m 2 +a 23 m 3 d 3 =a 3(12) m (12) +a 3(13) m (13) +a 28 m 8 +a 33 m 3 (3) d 4 =a 4(12) m (12) +a 47 m 7 +a 42 m 2 d 5 =a 5(12) m (12) +a 5(13) m (13) +a 5(14) m (14) A = x 1 11 x 1 12 x 1 13 x 1 14 0 0 0 0 0 0 0 0 0 0 0 0 x 2 12 x 2 13 0 0 x 2 21 x 2 22 0 0 0 0 0 0 0 0 0 0 x 3 13 0 0 0 0 x 3 23 0 0 0 x 3 32 x 3 33 0 0 0 x 4 12 0 0 0 0 x 4 22 0 0 0 0 x 4 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 x 5 32 x 5 33 x 5 34 0 (4) In a real global inversion we have many more data, e.g. 50,000 or so. Our maps are also much larger. The model vector for a 2-degree map would be 90 180 = 16, 000. To accommodate the fact that a 2 2-degree cell is much smaller near the poles than near the equator but the data resolution is not latitude-dependent, we use an equal-area cell parameterization, i.e. cells are 2 degrees wide near the equator, but much wider near the poles. 2

Consequently, we have 180 cells near the equator but only 3 at±89 latitude. This gives a total of 10,312 cells in the model, reduces the number of model parameters but providing a more evenly spaced model parameterization at the same time. Numerically, less independent data would be required to constrain the model. 2. Setting up the Inversion Formally, we want to determine the model m by inverting equation (1), so m = A 1 d. Numerically, it is difficult to invert a rectangular matrix, and we apply a trick by multiplying equation (1) by A T where T denotes transpose. So, A T d = A T A m (5) We can do this because A has the same eigenvectors and eigenvalues as A T A. Inversion of this equation gives ˆm = (A T A) 1 A T d. (6) where G = (A T A) 1 A T is called the generalized inverse. The hat signifies that because of the imperfection of the data, we will not be able to recover the actual model but a version of the model as seen (or filtered) by the data. An inversion using (6) is a least-squares inversion, i.e. a model is retrieved that minimizes the misfit, χ 2 χ 2 = k ( d k d ˆ k ) 2 σ k were ˆ d k are the predictions calculated with the new model ˆm andσ k are the data errors. Whenχ 2 = 1, the model is said to fit the data to within their errors. If χ 2 < 1, the model overfits the data and contains components that are not required to fit the data, from a numerical point of view. If χ 2 > 1 then the model does not fit the data, either because the model parameterization is not adequate or the data are internally inconsistent (e.g. because of noise contamination or systematic effects that are not modeled such as an erroneous earthquake event times or locations). 2.1 Model Regularization or Damping an Inversion As seen in equation (4), many elements of A are zero as some cells in the model remain unsampled, resulting in a determinant for A T A that is zero. Matrix A is singular and some eigenvalues are zero which does not allow us to retrieve the complete model vector. The matrix is ill-conditioned, with an infinite condition number (largest eigenvalue divided by lowest; a low number signifies a well-conditioned matrix). A strategy often used is that only the non-zero eigenvalues (or the large eigenvalues) and corresponding eigenvectors are used to construct parts of the model. Some workers discard this approach as it leaves unwanted holes in the model though, numerically, this may be the correct approach. An alternative is to regularize an inversion by adding something to the matrix to decrease the condition number, e.g. our inversion may look like ˆm = (A T A +µi) 1 A T d. (7) 3

µ40 Misfit χ2 µ23 µ17 µ3 µ2 µ1 Model norm Figure 2. Trade-off curve between model norm (x-axis) and data misfit (y-axis). Increasing µ indices denote increasing values. The model norm chosen here is m 2. Both the model norm and the data misfit cannot be arbitrarily small at the same time. An optimal model is often chosen at the bend where both model norm and data misfit do not change much with changing regularization parameter µ. where G = (A T A +µi) 1 A T The regularization or damping factorµdetermines how much the inversion is regularized. The identity matrix I in this equation limits the size of the model, where the model length decreases with increasing µ regardless of the structural wavelength in the model (i.e. all elements in the model vector are penalized in the same way). This type of inversion minimizes the weighted sum of data misfit and model norm, or the misfit function,mf, MF = χ 2 +µ m 2. There exists a trade-off between misfit and model norm. A small µ produces a large model but some components may not be constrained by the data as the difference between data and predictions are much smaller than the error bars and the misfit is far below 1. With increasing µ the size of the model decreases, but such a model is less able to predict the data to within their errors, so the misfit increases (Figure 2). In principle, we can impose any regularization on the model, e.g. we can impose a maximum value on individual elements of the model vector (e.g. phase velocity anomalies must not be larger than 10%). We can also impose constraints on gradients or curvatures in the model, in which case we replace the identity matrix in equation (7) by the first or second derivatives: and stands for either the first or second derivative and ˆm = (A T A +µ T ) 1 A T d (8) 4

G = (A T A +µ T ) 1. The misfit function then becomes MF = χ 2 +µ m 2. The choice of the optimal model is somewhat subjective. A model a long the trade-off curve is often chosen for which both the model norm (model norm here meaning the vector length or the length of first or second derivatives thereof) and the data misfit both do not change much when changing the damping parameter. In Figure 2, this would be any model nearµ 17. However, models in that range overfit the data and contain elements that are not required by the data. So, some people would rather choose a model near µ 23. Also, sometimes, models nearµ 17 are unrealistically rough or have unrealistically large values, so a model farther up the trade-off curve is chosen as most optimal model. 2.2 Quick Notes on Numerical Inversion Techniques and Other Approaches On the computer, matrices can be inverted to use canned computer tools. As mentioned in the introduction, a classical way leads through finding the eigenvalues and eigenvectors through singular value decomposition. As seen in Figure 1, many of our matrix elements are zero, so the matrix is sparse. To save computer time, iterative sparse matrix solvers are available. Without going into too much detail, such iterative techniques explore the misfit function by trying to find to fastest way toward the minimum. One such technique is LSQR or least squares OR (see, e.g., van der Sluis and van der Vorst, 1987. This technique is used in the tutorial. Here, one has to make sure to go through enough iterations so that the that the process has converged, i.e. the model no longer changes significantly. A completely different approach to a model is by forward modeling. Here, models are compiled using a variety of strategies. Synthetic data are computed using these models and the misfit determined. The model is kept if it satisfies a certain misfit criterion or discarded if it does not. This way, a group of models can be found that may look completely different from the one obtaining through an inversion but that satisfy the data equally well. Proponents of this approach prefer this over inversions because the latter may get caught in local minima if the misfit function is complex. The art of forward modeling is to decide how to search the model space. Monte Carlo approaches do this randomly. Other approaches using, e.g. genetic algorithms or evolutionary programming, produce models that are bases on mutations of models using certain rules. 3. A Few (Uncomplete) Notes on Model Evaluation If we consider equations (1) and (8) together, we get ˆm = G d = G A m (9) where the resolution matrix R = G A maps the true model, m, into the version of the model, ˆm, as seen by the data through filter R: ˆm = R m. Ideally, the resolution matrix should be the identity matrix. If the diagonal elements are less than 1 then the amplitudes of the corresponding model element is not recovered completely. Any non-zero off-diagonal elements indicate that some cross-mapping between model parameters exists. Though this strategy is somewhat contested (see below) it is useful to evaluate the resolution matrix. For example, in resolution test one model 5

parameter in m is switched on, while all other elements are set to zero. The structure of the resulting ˆm then reveals how this particular element is resolved and/or smeared into other model elements. If the model is a cell-parameterized phase velocity map, then such a test is equivalent to a spike test. If the model is a map expanded in surface spherical harmonics, then such a test is equivalent to a checker board test (Figure 3). Such tests are only meaningful if the same damping and the same error bars are used in this test as in the inversion of the real data, i.e. do the test on equation (8), not on equation (6)! The advantage of using checkerboards as input model for resolution tests in maps is that one gets a quick geographical overview over where in the map structure of a certain wavelength is resolved or not. In our example, good recovery is observed for parts of the western Pacific Ocean (lots of earthquakes) and along the coast of western North America (lots of stations). Input checker boards can have larger wavelengths (lower harmonic degree) or shorter wavelengths (higher harmonic degree). Opponents of checkerboard tests argue that the checkerboard tests do not tell you about the real resolution capabilities of the data. E.g. when one uses ray theory instead of finite-frequency theory, then the checkerboard test does not take into account that not accounting for wavefront healing effects in ray theory degrades resolution. But this argument holds true for any input structure in tests described above, regardless whether it is a checker board, spike or some oddly-shaped structure that looks like the structure that the data imaged (as also often seen in the literature). One could argue that the latter is a most subjective test while a checker board gives a more objective view. Figure 3. Input (bottom) and output (top) of a checkerboard test for a phase velocity map as obtained using the CIDER 12 dataset for 50-s Rayleigh waves. The map view is centered on the Pacific Ocean. The input model is a degree-40 checker board. Note that the inversion strategy used here was LSQR, not a classical SVD matrix inversion. 4. References Dziewonski, A. M. and D. L. Anderson, D.L., 1981. Preliminary reference Earth model. Phys. Earth Planet. Int, 25, 297 356. Menke, W., 1989. Geophysical Data Analysis: Discrete Inverse Theory. Academic Press, 289 pp. 6

and update/matlab version thereof Menke, W., 2012. Geophysical Data Analysis: Discrete Inverse Theory, Third Edition: MATLAB Edition. International Geophysics Series; Academic Press, 348 pp. van der Sluis, A. and van de Vorst, H.A., 1987. Numerical solution of large sparse linear systems arising from tomographic problems. Seismic Tomography, G. Nolet ed, D. Reidel Publishing Company, Dordrecht, pp. 49 83. 7