Bayesian inversion

Main credits:
- Inverse Problem Theory, Tarantola, 2005
- Parameter Estimation and Inverse Problems, R. Aster et al., 2013
- Information theory (inverse problems & signal processing), lecture by D. Gibert and F. Lopez, IPGP, 2013 (http://www.ipgp.fr/~gibert/information_theory_files/) (many slides borrowed from this source)

Olivier Coutant, seismology group, ISTerre Laboratory, Grenoble University
Practice: ftp://ist-ftp.ujf-grenoble.fr/users/coutanto/jig.zip
Plan
1. Introduction to Bayesian inversion
2. Bayesian solution of the inverse problem
   1. Basic examples
3. How to find the solution of the Bayesian inversion
   1. Deterministic or stochastic
4. The deterministic approach
   1. Linear and non-linear
   2. More on regularization
   3. Resolution operator
   4. Joint inversion (e.g. seismic and gravity)
5. A practical case: seismic and gravity on a volcano
Bayesian inversion

We saw this morning the inversion of a linear problem: data = Model x parameters, i.e. d = M p, with d = (d_1, ..., d_N)^T the data vector, M = (M_ij) the N x M model matrix, and p = (p_1, ..., p_M)^T the parameter vector.
Introduction to Bayesian inversion

That we solve by means of different kinds of solutions, e.g. least squares: minimize ||d - M p||^2, write the normal equations M^T d = M^T M p, and obtain the parameters as p = (M^T M)^{-1} M^T d.

What if the data include inherent measurement errors? What if the model also includes inherent errors? How to deal with that? How to infer the transmission of errors from data to parameters?
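The least-squares solution above can be sketched numerically. This is a minimal toy example (a straight-line fit, p = [intercept, slope]); the design matrix, the true parameters and the noise level are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
M = np.column_stack([np.ones_like(x), x])           # N x 2 design matrix
p_true = np.array([1.0, 0.5])
d = M @ p_true + 0.1 * rng.standard_normal(x.size)  # noisy data

# Normal equations: p = (M^T M)^{-1} M^T d
p_normal = np.linalg.solve(M.T @ M, M.T @ d)

# Numerically safer equivalent (SVD based):
p_lstsq, *_ = np.linalg.lstsq(M, d, rcond=None)

print(p_normal)   # close to p_true = [1.0, 0.5]
```

Both routes give the same estimate; `lstsq` avoids explicitly forming M^T M, which squares the condition number.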
Introduction to Bayesian inversion

Data with errors: d = d_true + ε_d
Model with errors: M = M_true + ε_M

M and d may be treated as statistical quantities given the statistics of the errors ε: they are random variables with a given probability density p(x), P(x ∈ I) = ∫_I p(x) dx, characterized by moments of increasing order:
- Expectation (mean) = E(x) = ∫ x p(x) dx
- Variance = Var(x) = ∫ (x - E(x))^2 p(x) dx
Introduction to Bayesian inversion

What if we have a priori knowledge about the parameters? "There is a good chance that the earthquake is located neither in the air nor in the sea; most likely in Ubaye, beneath Barcelonnette."

The Bayesian approach gives a good formalism to introduce:
- a priori knowledge on the parameters
- a description of data and model errors
- non-deterministic or non-exact forward models, and coupling
Bayesian solution of the inverse problem

The basic concept is both simple and very powerful:
Probability(A and B) = Probability(B and A): P(A,B) = P(B,A)
P(A,B) = Probability(A) x Probability(B given A): P(A,B) = P(A) P(B|A)
P(B,A) = Probability(B) x Probability(A given B): P(B,A) = P(B) P(A|B)
P(A) = Σ_B P(A|B) P(B) and P(B) = Σ_A P(B|A) P(A)

Bayes formula: P(A) P(B|A) = P(B) P(A|B)

Let us call event A the data, and event B the parameters (the model); then:

P(B|A) = P(B) P(A|B) / Σ_B P(A|B) P(B)

posterior probability on the model = prior probability on the model x data likelihood knowing the model / evidence
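Bayes' formula in its discrete form can be checked on a toy example: two candidate models B1 and B2 and one observed datum A. All the probability values below are made up for illustration:

```python
# P(B): prior on the two candidate models (made-up values)
prior = {"B1": 0.7, "B2": 0.3}
# P(A|B): likelihood of the observed datum A under each model (made-up values)
likelihood = {"B1": 0.2, "B2": 0.9}

# Evidence P(A) = sum_B P(A|B) P(B)
evidence = sum(prior[b] * likelihood[b] for b in prior)

# Posterior P(B|A) = P(B) P(A|B) / P(A)
posterior = {b: prior[b] * likelihood[b] / evidence for b in prior}

print(posterior)   # B2 is now the more probable model: the datum favors it
```

Even though B1 had the larger prior, the datum is much more likely under B2, and the posterior reverses the ranking.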
Bayesian solution of the inverse problem

Practical issues to obtain the Bayesian posterior probability:

P(B|A) = P(B) P(A|B) / ∫ P(A|B) P(B) dB

The data likelihood of model B, P(A|B), is obtained by computing the probability for the data to be actually observed if model B is the true model. P(A|B) is a probability because:
- the data contain errors, so there is some probability that the data depart from the model prediction;
- the relationship giving the data from the model, the so-called forward problem, may be imprecise or fundamentally of a probabilistic nature, as in quantum mechanics (but not only).

P(A|B) is what will be called the forward problem.

17 sept. 2012 -- Lecture Information theory - D. Gibert - Year 2012-2013
Bayesian solution of the inverse problem: more on data and model errors

Earthquake location problem: knowing seismic arrival times (P, S, Pg, Pn), invert for the source location (x, y, z).

Three velocity models: homogeneous, gradient, gradient + layers. There is a systematic bias between these three models: model 1 introduces a systematic error on the Pn (refracted waves) data type.
Bayesian solution of the inverse problem

Practical issues to obtain the Bayesian posterior probability:

P(B|A) = P(B) P(A|B) / ∫ P(A|B) P(B) dB

The prior probability of model B, P(B), represents the amount of information available BEFORE performing the Bayesian inversion. P(B) may be obtained or applied by:
- using the posterior probability of another, previously solved, inverse problem;
- defining the parameter space so as to restrict it to the a priori probable values of the parameters.

Finally, the Bayesian solution is extended to the continuous case:

ρ(m|d) = ρ(m) ρ(d|m) / ∫ ρ(d|m) ρ(m) dm

m: model, d: data
Bayesian solution of the inverse problem

The solution of a given inverse problem may constitute the prior information of another inverse problem.
Bayesian solution of the inverse problem

Probabilistic forward problem: deterministic physical law and exact data.

(Figure: data A vs model B plane; for model B1, P(A1|B1) = 1 and P(A2|B1) = 0.)
Bayesian solution of the inverse problem

Probabilistic forward problem: fuzzy physical law.

(Figure: data A vs model B plane; for model B1, both P(A1|B1) and P(A2|B1) are non-zero.)
Bayesian solution of the inverse problem

Probabilistic forward problem: deterministic physical law and imprecise data.

(Figure: data A vs model B plane; for model B1, both P(A1|B1) and P(A2|B1) are non-zero.)
Remarks

The forward problem may be more or less difficult to solve, either analytically or numerically; its resolution often represents a great deal of work.

The relationship between data and parameters may be linear:
- For instance, the gravity anomaly is doubled if the density is multiplied by a factor of 2.

The relationship between data and parameters is often non-linear:
- The gravity anomaly is not divided by 2 if the depth of the causative source is multiplied by 2.

The forward problem often has a unique solution:
- A given geological structure produces one gravity anomaly.

A remarkable property of inverse problems is that they generally have multiple acceptable solutions: different geological structures may produce the same, or very comparable, gravity anomalies.
(Figure: gravity measurements δg1, δg2, δg3 from points above the tunnel.)
Example: solve the inverse problem for a single measurement and a single parameter

We want to solve:

ρ(m|d) = ρ(m) ρ(d|m) / ρ(d) = ρ(m) ρ(d|m) / ∫ ρ(m) ρ(d|m) dm

The theoretical relation giving the vertical gravitational field of the tunnel is:

g_z(x) = 2π G ρ r_t^2 z_t / ( (x - x_t)^2 + z_t^2 )

We assume here that the measurement is above the tunnel (x = x_t), that we know the radius and the density, and we solve only for the depth.
Example

With the measurement above the tunnel, the relation reduces to g_z(z_t) = α / z_t, with α = 2π G ρ r_t^2 = -1018.84 µGal·m for ρ = 2700 kg/m³ and r_t = 3 m (negative: the tunnel is a mass deficit). In our problem we measure g = -58.96 µGal.

And the Bayes relation to be solved is:

ρ(z_t|g) = ρ(z_t) ρ(g|z_t) / ∫ ρ(z_t) ρ(g|z_t) dz_t

Let us assume that the measurement g is subject to a Gaussian error with respect to the true value:

p(g|g_true) ∝ exp( -(g - g_true)^2 / 2σ_g^2 )
Example

And that the a priori density on the parameter is a boxcar, z_min < z_t < z_max:

ρ(z_t) = (1/Δz) Π( (z_t - z_moy) / Δz )

Then the solution of the inversion is:

ρ(z_t|g) = Π( (z_t - z_moy)/Δz ) exp( -(g - g_true)^2 / 2σ_g^2 ) / ∫_{z_min}^{z_max} exp( -(g - g_true)^2 / 2σ_g^2 ) dz_t
         = Π( (z_t - z_moy)/Δz ) exp( -(g - α/z_t)^2 / 2σ_g^2 ) / ∫_{z_min}^{z_max} exp( -(g - α/z_t)^2 / 2σ_g^2 ) dz_t
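This one-parameter posterior can be evaluated directly on a grid. A minimal sketch: α and the measured g come from the slides; σ_g and the boxcar prior bounds (z_min, z_max) are assumed values for illustration:

```python
import numpy as np

alpha = -1018.84          # 2*pi*G*rho*r_t^2 in microGal*m (from the slides)
g_obs = -58.96            # measured anomaly, microGal (from the slides)
sigma_g = 10.0            # assumed data standard deviation, microGal
z_min, z_max = 5.0, 50.0  # assumed boxcar prior support, m

z = np.linspace(z_min, z_max, 2000)
dz = z[1] - z[0]
likelihood = np.exp(-0.5 * ((g_obs - alpha / z) / sigma_g) ** 2)
post = likelihood.copy()            # uniform prior on [z_min, z_max]
post /= post.sum() * dz             # normalize: posterior integrates to 1

z_map = z[np.argmax(post)]          # maximum a posteriori depth
z_mean = np.sum(z * post) * dz      # posterior mean
print(z_map, z_mean)                # z_map close to alpha / g_obs = 17.3 m
```

Note that the posterior is skewed: the forward relation α/z_t flattens at depth, so the posterior mean sits deeper than the maximum.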
Example
Example

δg1: find the posterior probability of the tunnel depth. The prior probability is non-uniform and a single datum (δg1) is used.
Example

δg1, δg2: find the posterior probability of the tunnel depth. The prior probability is uniform and two data (δg1 and δg2) are used.
Example

δg2: find the posterior probability of the tunnel depth. The prior probability is uniform and a single datum (δg2) is used.
Example

δg1: find the posterior probability of both the depth and the horizontal position of the tunnel from a single datum. The prior probability is uniform.
Example

δg1, δg2: find the posterior probability of both the depth and the horizontal position of the tunnel from two data. The prior probability is uniform.
How to solve a Bayesian inversion?

ρ(m|d) = ρ(m) ρ(d|m) / ∫ ρ(m) ρ(d|m) dm

1. Find the maximum likelihood point (i.e. the maximum of the a posteriori probability distribution): max_m ρ(m|d) ⇔ a cost function ⇔ an optimization problem.

2. Estimate the a posteriori expectation (mean value): E(m|d) = ∫ m ρ(m|d) dm

Deterministic approaches
How to solve a Bayesian inversion?

3. Explore the a posteriori probability density ρ(m|d) by a numerical approach, "stochastic sampling":
- Metropolis-Hastings (cf. Richard Hobbs' talk)
- Gibbs sampling; nested sampling
- simulated annealing; genetic algorithms
Stochastic approaches

4. Model the a posteriori density by means of an analytical representation: Laplace approximation (Gaussian probability); variational Bayesian methods (see www.variational-bayes.org)
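As a sketch of option 3, here is a minimal Metropolis-Hastings random walk sampling the tunnel-depth posterior of the earlier example. α and g_obs come from the slides; σ_g, the prior bounds, the starting point and the step size are assumed for illustration:

```python
import numpy as np

alpha, g_obs, sigma_g = -1018.84, -58.96, 10.0
z_min, z_max = 5.0, 50.0          # assumed boxcar prior support, m

def log_post(z):
    """Unnormalized log posterior: boxcar prior x Gaussian likelihood."""
    if not (z_min < z < z_max):
        return -np.inf
    return -0.5 * ((g_obs - alpha / z) / sigma_g) ** 2

rng = np.random.default_rng(1)
z, samples = 20.0, []
for _ in range(20000):
    z_new = z + 2.0 * rng.standard_normal()        # symmetric random-walk proposal
    # Accept with probability min(1, post(z_new)/post(z))
    if np.log(rng.uniform()) < log_post(z_new) - log_post(z):
        z = z_new
    samples.append(z)

samples = np.array(samples[5000:])                 # discard burn-in
print(samples.mean(), samples.std())
```

The sample mean and spread approximate the posterior expectation and standard deviation without ever normalizing ρ(m|d), which is the point of stochastic sampling.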
Deterministic approach (cf. the example solved above)

We assume known probability density distributions for:
- the a priori density on the parameters, ρ(m)
- the forward problem, ρ(d|m)
and solve for the maximum likelihood.

In practice we mostly consider Gaussian-type density distributions:

ρ(m) ∝ exp( -1/2 (m - m_prior)^T C_m^{-1} (m - m_prior) )
ρ(d|m) ∝ exp( -1/2 (g(m) - d)^T C_d^{-1} (g(m) - d) )
Deterministic approach

The Bayes formula becomes:

ρ(m|d) ∝ exp( -1/2 [ (g(m) - d)^T C_d^{-1} (g(m) - d) + (m - m_prior)^T C_m^{-1} (m - m_prior) ] )

where g(m) is a general non-linear relation between data and model, and m_prior is our a priori hypothesis on the model. C_m and C_d are covariance matrices on the model and the data:

C_m = [ σ_m1^2   ...   C_ij   ]
      [  ...     ...    ...   ]
      [ C_ij     ...   σ_mj^2 ]      with C_ji = C_ij
Deterministic approach, linear case

In the linear case, g(m) = G m, it reduces to:

ρ(m|d) ∝ exp( -1/2 { (G m - d)^T C_d^{-1} (G m - d) + (m - m_prior)^T C_m^{-1} (m - m_prior) } )

which can also be written as a function of the maximum likelihood estimator m̂:

ρ(m|d) ∝ exp( -1/2 (m - m̂)^T C_M'^{-1} (m - m̂) )

where m̂ maximizes ρ(m|d), or equivalently minimizes

(G m - d)^T C_d^{-1} (G m - d) + (m - m_prior)^T C_m^{-1} (m - m_prior)

and C_M' = (G^T C_d^{-1} G + C_m^{-1})^{-1} is the a posteriori covariance on m.
Deterministic approach, linear case

And the solution is (model space):

m̂ = m_prior + (G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} (d - G m_prior)

(recall the least-squares solution m̂ = (G^T G)^{-1} G^T d)

or (data space):

m̂ = m_prior + C_m G^T (G C_m G^T + C_d)^{-1} (d - G m_prior)

which again minimizes ||G m - d||^2_{C_d} + ||m - m_prior||^2_{C_m}
Deterministic approach, non-linear case

For the non-linear case, we wish to find m̂ that minimizes:

||g(m) - d||^2_{C_d} + ||m - m_prior||^2_{C_m}

We are now back to iterative optimization procedures. The solution using a quasi-Newton method, for instance, is (Tarantola, 2005):

m_{n+1} = m_n - µ_n (G_n^T C_d^{-1} G_n + C_m^{-1})^{-1} ( G_n^T C_d^{-1} (d_n - d) + C_m^{-1} (m_n - m_prior) )

with d_n = g(m_n) and G_n the matrix of partial derivatives at m_n.

Deterministic approach, pros and cons:
- easy to program
- flexible to set a priori constraints on the model
- can fall into secondary minima
- not suited for large-scale problems
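The iteration above can be sketched on the one-parameter tunnel problem, g(z) = α/z, where all matrices reduce to scalars. Here µ_n = 1, and the prior depth and both standard deviations are assumed values for illustration:

```python
import numpy as np

alpha, g_obs = -1018.84, -58.96     # from the tunnel example
sigma_g, sigma_m = 10.0, 20.0       # assumed data / prior standard deviations
m_prior = 25.0                      # assumed prior depth, m
m = m_prior                         # start at the prior

for _ in range(20):
    g_m = alpha / m                              # forward problem g(m_n)
    Gn = -alpha / m**2                           # derivative dg/dm at m_n
    H = Gn**2 / sigma_g**2 + 1.0 / sigma_m**2    # G^T Cd^-1 G + Cm^-1 (scalar)
    grad = Gn * (g_m - g_obs) / sigma_g**2 + (m - m_prior) / sigma_m**2
    m = m - grad / H                             # quasi-Newton step, mu_n = 1

print(m)   # settles near alpha / g_obs = 17.3 m, slightly pulled toward the prior
```

With this weak prior (σ_m = 20 m) the data term dominates and the iteration converges in a few steps; a tighter prior would pull the solution toward m_prior.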
More on regularization

Back to the least-squares problem d = G m with solution m̂ = (G^T G)^{-1} G^T d.

If G, and hence G^T G, is ill-conditioned, or the system is underdetermined, or the solution is not unique, the above linear system may not be solvable and we need to regularize it: e.g. a truncated SVD solution, or damped least squares (Levenberg-Marquardt):

m̂ = (G^T G + λ^2 I)^{-1} G^T d

which minimizes ||G m - d||^2 + λ^2 ||m||^2

Let us compare our Bayesian deterministic solution to the LS solution using regularization techniques (Tikhonov).
More on regularization

To stabilize inversions, one often uses Tikhonov regularization techniques.

Zero order: minimize ||G m - d||^2 + λ^2 ||m||^2 ⇔ m̂ = (G^T G + λ^2 I)^{-1} G^T d; favors model solutions with minimum perturbation.

First order: ||G m - d||^2 + λ^2 ||L m||^2, L ~ first-derivative operator; favors model solutions with zero gradient (flat solutions).

Second order: ||G m - d||^2 + λ^2 ||L m||^2, L ~ second-derivative operator; favors model solutions with minimum roughness (L ~ Laplacian).
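The three orders can be sketched on a toy deconvolution problem. The smoothing kernel G, the step model and the noise level are made up; L1 and L2 are finite-difference first- and second-derivative operators:

```python
import numpy as np

n = 50
x = np.linspace(0.0, 1.0, n)
G = np.exp(-((x[:, None] - x[None, :]) / 0.1) ** 2)   # smoothing kernel, ill-conditioned
m_true = (x > 0.5).astype(float)                      # step model
rng = np.random.default_rng(3)
d = G @ m_true + 0.01 * rng.standard_normal(n)

I = np.eye(n)
L1 = np.diff(I, axis=0)          # (n-1) x n first-derivative operator
L2 = np.diff(I, n=2, axis=0)     # (n-2) x n second-derivative operator

def tikhonov(L, lam):
    # minimize ||G m - d||^2 + lam^2 ||L m||^2
    return np.linalg.solve(G.T @ G + lam**2 * (L.T @ L), G.T @ d)

m0 = tikhonov(I, 0.1)    # zero order: minimum-norm solution
m1 = tikhonov(L1, 0.1)   # first order: flat solution
m2 = tikhonov(L2, 0.1)   # second order: minimum-roughness solution
```

The value λ = 0.1 is arbitrary here; in practice it is tuned (e.g. by an L-curve or cross-validation) to balance data fit against model complexity.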
More on regularization

How does Bayesian inversion compare to Tikhonov regularization? One minimizes:

||G m - d||^2 + (m - m_prior)^T C_m^{-1} (m - m_prior)

The operator acting on the model is C_m^{-1}, the inverse of the covariance matrix.

In 1D, a covariance matrix is strictly equivalent to a smoothing, or convolution, operator: C_m s ⇔ f * s, where s is a signal and f the filter impulse response, with Δ the correlation distance.
More on regularization

C_m^{-1} is thus the reciprocal filter (i.e. a high-pass filter). If we choose a covariance matrix with an exponential shape, C_ij = exp(-|i-j|/Δ), the Fourier transform of exp(-|x|/Δ) is 2Δ / (1 + k^2 Δ^2) (k = wavenumber). The reciprocal filter (i.e. C_m^{-1}) is (1 + k^2 Δ^2) / 2Δ.

By comparison, the second-order Tikhonov regularization operator is k^2 (Laplacian ∂^2/∂x^2).

The Bayesian inversion regularization operator is thus a high-pass filter with a characteristic length.
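The low-pass character of C_m can be verified numerically: applying an exponential covariance matrix to a noisy signal strongly attenuates wavenumbers above 1/Δ. Grid size, spacing, Δ and the test signal are all made up for illustration:

```python
import numpy as np

n, h, Delta = 256, 1.0, 10.0        # grid size, spacing, correlation length (made up)
idx = np.arange(n)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) * h / Delta)   # exponential covariance

rng = np.random.default_rng(5)
s = np.sin(2 * np.pi * idx / 64) + 0.5 * rng.standard_normal(n)  # signal + noise

smoothed = C @ s / C.sum(axis=1)    # row-normalized: a weighted moving average

S_in = np.abs(np.fft.rfft(s))
S_out = np.abs(np.fft.rfft(smoothed))
ratio = S_out[50:].sum() / S_in[50:].sum()   # energy surviving at high wavenumbers
print(ratio)                                 # well below 1: C acts as a low-pass filter
```

The surviving fraction matches the 1/(1 + k^2 Δ^2) roll-off of the continuous analysis: wavelengths much shorter than Δ are essentially removed.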
More on regularization: conclusion

||G m - d||^2_{C_d} + (m - m_prior)^T C_m^{-1} (m - m_prior)

This regularization minimizes the model variations that are smaller than the correlation length Δ. The covariance on the parameters allows one to perform a scaled inversion to enhance model variations that are larger than Δ ⇔ allows for multiscale or multigrid inversion.
More on regularization

Other regularizations:

Total variation: minimize ||G m - d||^2 + λ ||L m||_1, where L = gradient operator ⇒ enhances sharp variations.

Sparsity regularization: minimize ||G m - d||^2 + λ ||m||_1 ⇒ favors solutions with many null parameters.
Resolution operator

It yields some insight into how well a parameter is resolved, and whether neighboring solutions are correlated or independent. For the Bayesian linear case, the resolution operator takes the form:

R = I - C_M' C_M^{-1}, with m̂ = R m_true (taking m_prior = 0 and noise-free data)

where C_M' is the a posteriori covariance matrix and C_M^{-1} the inverse of the a priori covariance matrix.
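A minimal sketch of R on a made-up underdetermined linear problem; the dimensions, sensitivities and covariances are all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 8, 12                          # underdetermined: fewer data than parameters
G = rng.standard_normal((N, M))
Cd_inv = np.eye(N) / 0.01             # inverse data covariance, sigma_d = 0.1 (assumed)
Cm = np.eye(M)                        # prior covariance (assumed identity)
Cm_inv = np.linalg.inv(Cm)

Cpost = np.linalg.inv(G.T @ Cd_inv @ G + Cm_inv)   # a posteriori covariance C_M'
R = np.eye(M) - Cpost @ Cm_inv                     # resolution operator

# With m_prior = 0 and noise-free data, the MAP estimate is exactly R @ m_true
m_true = rng.standard_normal(M)
d = G @ m_true
m_hat = Cpost @ G.T @ Cd_inv @ d
print(np.diag(R))    # diagonal in [0, 1]: 1 would mean a perfectly resolved parameter
```

Rows of R show how the estimate smears the true model; off-diagonal structure in a row means neighboring parameters trade off against each other.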
Resolution operator (Barnoud et al., 2015)

(Figure: (a) resolution matrix row for a parameter at 2 km depth, diagonal term vs all indices of the line; (b, c) resolution lengths for parameters at 2, 4 and 6 km depth along the nodes of the density grid; (d) resolution along horizontal profile A-B; (e) resolution along vertical profile C-D.)
Resolution operator (Barnoud et al., 2015)

References on resolution:
- A simple method for determining the spatial resolution of a general inverse problem, Geophysical Journal International, 2012, 191, 849-864
- Trampert, J.; Fichtner, A. & Ritsema, J. Resolution tests revisited: the power of random numbers, Geophysical Journal International, 2013, 192, 676-680

(Figure: same resolution panels as on the previous slide.)
Joint inversion with a Bayesian deterministic method

We now wish to invert simultaneously the slowness s and the density ρ from travel times t and gravity measurements g, assuming that density and slowness are coupled via s = f(ρ), and minimize:

||T(s) - t||^2_{C_t} + ||G ρ - g||^2_{C_g} + ||s - s_prior||^2_{C_s} + ||ρ - ρ_prior||^2_{C_ρ} + ||s - f(ρ)||^2_{C_c}

We gather the parameters and the data in the same vectors m and d:

||g(m) - d||^2_{C_d} + ||m - m_prior||^2_{C_m} + ||s - f(ρ)||^2_{C_c}
(data)                 (model)                   (coupling)

Onizawa et al., 2002; Coutant et al., 2012
Joint inversion

Relation slowness-density: use a linear coupling?

(Figure, Coutant et al. 2013: (a) Sp slowness (s/km) vs density, with points from independent (ρ, Vp) inversions, the Birch (1961) law, Onizawa (2002), our study, and a linear (ρ, Sp) fit to samples; (b) Vp velocity (km/s) vs density for Mt Pelée samples; (c) Sp slowness vs density at all nodes and at the 25% best resolved nodes, with our study's (ρ, Sp) relation.)
Joint inversion: an experiment

A joint inversion: seismic and gravity on a volcano (an active one!)

Seismic and gravimetric stations. Travel time: t = Σ_nodes S_i X_i (horizontal travel time). Gravity: g = Σ_nodes ρ_i G_i. Two dykes with Vp = 2000 m/s and ρ = 2700 kg/m³.
Joint inversion: an experiment
Joint inversion: an experiment

How to solve? Minimize:

||G m - d||^2_{C_d} + ||m - m_prior||^2_{C_m} + ||s - f(ρ)||^2_{C_c}
(data)                 (model)                   (coupling)

Assume a linear coupling between ρ and s: the coupling term can then be absorbed into C_m, and we minimize

||G m - d||^2_{C_d} + ||m - m_prior||^2_{C_m}
Joint inversion: an experiment

d = G m, with:

d = [ ns travel times ; ns gravity, left slope ; ns gravity, right slope ]

G = [ d(traveltime)/d(slowness)    0
      0                            d(gravity)/d(density) ]

m = [ np slowness dyke1 ; np slowness dyke2 ; np density dyke1 ; np density dyke2 ]

C_d = diag( σ_t^2, σ_g^2, σ_g^2 )

C_m = [ σ_Sdyke1^2   0            coupling      0
        0            σ_Sdyke2^2   0             coupling
        coupling     0            σ_ρdyke1^2    0
        0            coupling     0             σ_ρdyke2^2 ]

m̂ = m_prior + (G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} (d - G m_prior)
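The block structure above can be sketched on a reduced toy problem: 2 slowness and 2 density parameters, coupled through the off-diagonal terms of C_m. All sensitivities, sigmas, the coupling value and the true model are made up for illustration:

```python
import numpy as np

Gt = np.array([[1.0, 0.5], [0.3, 1.2], [0.8, 0.8]])   # d(traveltime)/d(slowness) (made up)
Gg = np.array([[0.6, 0.2], [0.1, 0.9], [0.4, 0.5]])   # d(gravity)/d(density) (made up)
G = np.block([[Gt, np.zeros((3, 2))],
              [np.zeros((3, 2)), Gg]])                # block-diagonal forward operator

Cd = np.diag([0.01, 0.01, 0.01, 0.04, 0.04, 0.04])    # sigma_t^2 and sigma_g^2
c = 0.8                                               # slowness-density coupling (assumed)
Cm = np.array([[1.0, 0.0, c,   0.0],                  # couples (s_i, rho_i) of each dyke
               [0.0, 1.0, 0.0, c  ],
               [c,   0.0, 1.0, 0.0],
               [0.0, c,   0.0, 1.0]])

m_prior = np.zeros(4)
m_true = np.array([0.5, -0.3, 0.4, -0.25])            # correlated s and rho perturbations
d = G @ m_true                                        # noise-free synthetic data

Cd_inv, Cm_inv = np.linalg.inv(Cd), np.linalg.inv(Cm)
m_hat = m_prior + np.linalg.solve(G.T @ Cd_inv @ G + Cm_inv,
                                  G.T @ Cd_inv @ (d - G @ m_prior))
print(m_hat)   # close to m_true
```

The coupling terms in C_m are what lets gravity data constrain the slowness (and vice versa): setting c = 0 decouples the two sub-inversions into independent problems.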
Joint inversion: an experiment Independent inversion, no coupling
Joint inversion: an experiment Joint inversion: parameters
Joint inversion: an experiment Joint inversion: data fitting
Practice: ftp://ist-ftp.ujf-grenoble.fr/users/coutanto/jig.zip