Regularizing inverse problems. Damping and smoothing and choosing...
1 Regularizing inverse problems Damping and smoothing and choosing
2 Regularization The idea behind truncated SVD is to limit the degrees of freedom in the model and fit the data to an acceptable level: retain only those features necessary to fit the data. A general framework for solving non-unique inverse problems is to introduce regularization. Regularization makes a non-unique problem become a unique problem. How does it do this? We minimize a combination of the data misfit and some property of the model that measures extravagant behaviour, e.g. φ(m) = (d − g(m))^T C_D^{-1} (d − g(m)) + μ (m − m_0)^T C_M^{-1} (m − m_0). The inverse problem becomes a nonlinear optimization problem.
3 Solutions to inverse problems Optimal data-fitting model, extremal model, data-acceptable models. The general framework is optimization: φ(m) = (d − g(m))^T C_D^{-1} (d − g(m)) + μ (m − m_0)^T C_M^{-1} (m − m_0).
4 Damped least squares We seek the model that minimizes φ(d, m) = (d − Gm)^T C_D^{-1} (d − Gm) + μ (m − m_0)^T C_M^{-1} (m − m_0). After some algebra we get m_DLS = (G^T C_D^{-1} G + μ C_M^{-1})^{-1} (G^T C_D^{-1} d + μ C_M^{-1} m_0). This is the damped least squares solution. A special case is to minimize a weighted sum of the data misfit and the model norm: min ||d − Gm||_2^2 + μ ||m||_2^2. The normal equations become (G^T G + μI) m = G^T d. This system of linear equations has a unique solution, which is called the Tikhonov solution.
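As a concrete sketch (not from the slides; names and toy data are illustrative), the Tikhonov normal equations can be solved directly with NumPy:

```python
import numpy as np

def tikhonov_solve(G, d, mu):
    """Solve the zeroth-order Tikhonov normal equations (G^T G + mu*I) m = G^T d."""
    M = G.shape[1]
    return np.linalg.solve(G.T @ G + mu * np.eye(M), G.T @ d)

# Toy overdetermined problem with a little noise.
rng = np.random.default_rng(0)
G = rng.standard_normal((30, 10))
m_true = np.ones(10)
d = G @ m_true + 1e-3 * rng.standard_normal(30)
m = tikhonov_solve(G, d, mu=1e-2)
```

For a well-conditioned G and small μ this is close to the ordinary least-squares solution; the damping only matters when G^T G is near-singular.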
5 L-curve solution Perform repeated solutions of the normal equations (G^T G + μI) m = G^T d for different μ and select the one that lies near the elbow of the trade-off curve. With this value of μ we construct a particular Tikhonov solution. The elbow is estimated visually, which is potentially subjective if the elbow is not clear. See example 5.1 of Aster et al. (2005).
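Tracing the trade-off curve is a simple loop over a grid of μ values; a minimal NumPy sketch (illustrative names and data):

```python
import numpy as np

def l_curve_points(G, d, mus):
    """Misfit norm ||d - Gm|| and model norm ||m|| for each trade-off value mu."""
    M = G.shape[1]
    pts = []
    for mu in mus:
        m = np.linalg.solve(G.T @ G + mu * np.eye(M), G.T @ d)
        pts.append((np.linalg.norm(d - G @ m), np.linalg.norm(m)))
    return np.array(pts)

rng = np.random.default_rng(0)
G = rng.standard_normal((40, 15))
d = G @ np.ones(15) + 0.05 * rng.standard_normal(40)
pts = l_curve_points(G, d, np.logspace(-8, 4, 25))
# Plot pts on log-log axes and pick the mu nearest the elbow.
```

As μ increases the misfit norm grows and the model norm shrinks, which is what produces the characteristic L shape on log-log axes.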
6 Tikhonov regularization Choose μ that minimizes the length of the model while specifying a maximum value of the prediction error: min ||m||_2^2 s.t. ||d − Gm||_2^2 ≤ δ^2. The choice of μ depends on knowing the errors in the data; the discrepancy principle gives us a value of δ. Since each δ corresponds to a particular μ, we often refer to the Tikhonov solution as a function of μ on the curve: min ||d − Gm||_2^2 + μ ||m||_2^2, with normal equations (G^T G + μI) m = G^T d. See example 5.1 of Aster et al. (2005).
7 Tikhonov: Discrepancy principle The discrepancy principle gives us a value of δ and consequently a value for μ. If the N data have Gaussian errors with known variance, N(0, σ^2), then we would expect on average that each residual (d_i − Σ_j G_{i,j} m_j) is approximately σ. Hence ||d − Gm||_2^2 = Nσ^2 (c.f. χ^2_N at 50%), giving δ = σ√N. We get an expected value of the norm of the prediction error from the number of data and the standard deviation of the data. Perform repeated solutions of (G^T G + μI) m = G^T d until ||d − Gm||_2 = δ.
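One way to implement the discrepancy principle (a sketch, assuming zeroth-order Tikhonov and known σ) is a bisection in log μ, exploiting the fact that the residual norm grows monotonically with μ:

```python
import numpy as np

def discrepancy_mu(G, d, sigma, lo=1e-12, hi=1e12, iters=100):
    """Find mu such that ||d - G m(mu)||_2 = sigma * sqrt(N)."""
    N, M = G.shape
    delta = sigma * np.sqrt(N)
    def misfit(mu):
        m = np.linalg.solve(G.T @ G + mu * np.eye(M), G.T @ d)
        return np.linalg.norm(d - G @ m)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)        # geometric midpoint in log mu
        if misfit(mid) < delta:
            lo = mid                  # residual too small: damp harder
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(1)
G = rng.standard_normal((50, 8))
sigma = 0.1
d = G @ rng.standard_normal(8) + sigma * rng.standard_normal(50)
mu = discrepancy_mu(G, d, sigma)
```

This assumes δ is achievable, i.e. the unregularized least-squares residual is below σ√N, which holds here since the noise level is known exactly.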
8 Tikhonov Example: The Shaw problem Recall the discretized Shaw problem. Data d(s) and model m(θ) are sampled at N equal angles s_i = θ_i = (i − 0.5)π/n − π/2, (i = 1, 2, ..., n), with d_i = d(s_i) and m_j = m(θ_j). This gives a system of N × N linear equations d = Gm, where G_{i,j} = Δs (cos(s_i) + cos(θ_j))^2 [ sin(π(sin(s_i) + sin(θ_j))) / (π(sin(s_i) + sin(θ_j))) ]^2, with Δs = π/n. We know this problem is ill-posed; we previously solved it with truncated SVD.
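The Shaw kernel on the slide can be built in a few lines of NumPy (an illustrative sketch; `np.sinc(x)` computes sin(πx)/(πx), which handles the removable singularity at sin(s_i) + sin(θ_j) = 0):

```python
import numpy as np

def shaw_matrix(n):
    """Discretized Shaw kernel G, an n x n matrix with d = G m."""
    theta = (np.arange(1, n + 1) - 0.5) * np.pi / n - np.pi / 2
    S, T = np.meshgrid(theta, theta, indexing="ij")  # s_i = theta_i on the same grid
    u = np.sin(S) + np.sin(T)
    # np.sinc(u) = sin(pi*u) / (pi*u), safely equal to 1 at u = 0
    return (np.pi / n) * (np.cos(S) + np.cos(T))**2 * np.sinc(u)**2

G = shaw_matrix(20)
```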
9 Example: Shaw problem with Tikhonov regularization Let's revisit the Shaw problem d = Gm with N = M = 20. The eigenvalue spectrum reveals a severely ill-posed inverse problem: G has an extremely large condition number, giving extreme sensitivity to noise. Resolution is sacrificed to achieve stability. [Figure: input spike model and the data it produces, in model units and data units.]
10 Recall SVD applied to this problem (with noise) d = Gm, m = V_p S_p^{-1} U_p^T d = Σ_{i=1}^p (u_i^T d / s_i) v_i. Add Gaussian noise to the data with σ = 10^{-6}. The presence of small eigenvalues means the solution is sensitive to noise. [Figure: input spike model, data from the spike model, and the recovered model.]
11 SVD: Shaw problem with p = 10 d = Gm, m = V_p S_p^{-1} U_p^T d = Σ_{i=1}^p (u_i^T d / s_i) v_i, using the first 10 eigenvalues only. [Figure: input spike model; no-noise solution and noise solution.] Truncating eigenvalues reduces sensitivity to noise but also the resolving power of the data.
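A truncated-SVD solver matching the formula above is only a few lines of NumPy (illustrative sketch):

```python
import numpy as np

def tsvd_solve(G, d, p):
    """m = sum over i <= p of (u_i^T d / s_i) v_i: keep the p largest singular values."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:p].T @ ((U[:, :p].T @ d) / s[:p])

# With p equal to the full column rank, TSVD reproduces the least-squares solution.
rng = np.random.default_rng(2)
G = rng.standard_normal((12, 5))
d = rng.standard_normal(12)
m_full = tsvd_solve(G, d, 5)
```

Choosing p smaller than the rank discards the small-singular-value terms that amplify noise, at the cost of resolution.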
12 Tikhonov regularization: Shaw problem (G^T G + μI) m = G^T d. The L-curve for the Shaw problem: fortunately a nice clear elbow is visible in this log-log plot, and the trade-off parameter μ is easily picked. With noise N[0, (10^{-6})^2], the slide shows the L-curve choice μ_L, the discrepancy-principle value δ and its corresponding μ_δ, and the resulting Tikhonov solutions (numerical values appear only in the slide figure).
13 Example: Tikhonov regularization How do these choices for μ relate to the SVD truncation level chosen earlier? Comparing μ_L and μ_δ (with δ = σ√N) against the truncated-SVD solution m = V_p S_p^{-1} U_p^T d = Σ_{i=1}^p (u_i^T d / s_i) v_i and the Tikhonov solution (G^T G + μI) m = G^T d: the eigenvalue at the truncation level in SVD is similar in size to the two choices of μ in the Tikhonov scheme.
14 SVD and Tikhonov regularization With the SVD G = U_p S_p V_p^T, the Tikhonov solution m = (G^T G + μI)^{-1} G^T d can be written (see Aster et al., 2005) as m = Σ_{i=1}^M [s_i^2 / (s_i^2 + μ)] (u_i^T d / s_i) v_i, compared with the SVD solution m = Σ_{i=1}^p (u_i^T d / s_i) v_i. Let f_i = s_i^2 / (s_i^2 + μ) be the filter factors. Now we get very different behaviour: as singular values s_i → 0 the solution is not highly sensitive to noise, because f_i → 0 rather than blowing up. If s_i^2 ≫ μ then f_i ≈ 1; if s_i^2 ≪ μ then f_i ≈ 0. (For truncated SVD: s_i > s_p gives f_i = 1, s_i < s_p gives f_i = 0.) μ filters out (damps) the unstable influence of the small eigenvalues.
15 Filter factors Filter factors control the influence of the singular values: m = Σ_{i=1}^M f_i (u_i^T d / s_i) v_i, with f_i = s_i^2 / (s_i^2 + μ) (c.f. f_i = 1 for the retained terms in truncated SVD). As singular values s_i → 0 the solution is not highly sensitive to noise, because f_i → 0 rather than blowing up. If s_i^2 ≫ μ then f_i ≈ 1; if s_i^2 ≪ μ then f_i ≈ 0. μ filters out (damps) the unstable influence of the small eigenvalues.
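The filter-factor form gives an efficient way to compute Tikhonov solutions for many μ from a single SVD; a sketch (illustrative names and data):

```python
import numpy as np

def tikhonov_filter(G, d, mu):
    """Tikhonov solution via filter factors f_i = s_i^2 / (s_i^2 + mu)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    f = s**2 / (s**2 + mu)
    m = Vt.T @ (f * (U.T @ d) / s)
    return m, f

rng = np.random.default_rng(3)
G = rng.standard_normal((20, 6))
d = rng.standard_normal(20)
m, f = tikhonov_filter(G, d, mu=0.1)
```

The result agrees with solving the normal equations (G^T G + μI) m = G^T d directly, but once U, s, V are stored, each new μ costs only a few vector operations.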
16 Model resolution for a Tikhonov solution A Tikhonov solution has the form m = (G^T G + μI)^{-1} G^T d = G^g d, i.e. m = Σ_{i=1}^M [s_i^2 / (s_i^2 + μ)] (u_i^T d / s_i) v_i. The model resolution matrix is R_M = G^g G = V F V^T, where F is the M × M diagonal matrix of filter factors, and m_est = R_M m_true. Each row of the resolution matrix can also be found using a spike test: the output model is the row which corresponds to the parameter in the spike model. Resolution tells us how the estimated model is a linear combination of the true model; it describes the relationship to the true solution.
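The resolution matrix R_M = V F V^T can be formed from the same SVD. For noise-free data m_est = R_M m_true exactly, which the sketch below (illustrative names, including a spike test on one parameter) demonstrates:

```python
import numpy as np

def tikhonov_resolution(G, mu):
    """Model resolution matrix R = V F V^T for a Tikhonov solution."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    f = s**2 / (s**2 + mu)
    return Vt.T @ (f[:, None] * Vt)

rng = np.random.default_rng(4)
G = rng.standard_normal((15, 6))
m_true = np.zeros(6)
m_true[3] = 1.0                      # spike test on parameter 4
mu = 0.5
R = tikhonov_resolution(G, mu)
# Noise-free data, so m_est should equal R @ m_true (= column/row 3 of R).
m_est = np.linalg.solve(G.T @ G + mu * np.eye(6), G.T @ (G @ m_true))
```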
17 Model covariance of a Tikhonov solution A Tikhonov solution has the form m = (G^T G + μI)^{-1} G^T d = G^g d. The model covariance matrix is C_M = G^g C_D (G^g)^T, which describes noise propagation into the solution. Confidence intervals calculated from the diagonal of the covariance matrix give unrealistically small errors (for μ_δ the error bars are too small to see!). As regularization strength increases, the size of the confidence interval decreases. Confidence limits do not represent distance to the true solution: regularized solutions are ALWAYS BIASED!
18 Effect of data noise on regularized solutions Tikhonov: m = (G^T G + μI)^{-1} G^T d = Σ_{i=1}^M [s_i^2 / (s_i^2 + μ)] (u_i^T d / s_i) v_i. TSVD: m = Σ_{i=1}^p (u_i^T d / s_i) v_i. [Figure: input spike model; no-noise solution and noise solution for each method.] Remember: regularized solutions are ALWAYS BIASED!
19 Choosing μ with Generalized cross validation As we have seen, the choice of trade-off parameter μ determines the level of fit to the data possible with a Tikhonov solution, and vice versa (δ = σ√N). What if I really don't have a good idea of how well I should fit my data? How can I choose δ or μ? Can I still get a robust solution? Is all lost? (The L-curve is one answer.) All is not lost. There is usually information in the variability of the data itself about the likely level of fit needed. We can use the data itself to estimate the regularization parameter! This approach is called Cross Validation.
20 Choosing μ by cross validation Cross validation is a way of assessing the quality of a solution based on the principle of measuring the fit to missing data: the leave-one-out principle. Let's fit a curve (red) with one data point (green) left out. A simple model corresponds to a large hyper-parameter μ; a complex model corresponds to a small hyper-parameter μ. In both cases (on average) the missing data tend to be fit poorly!
21 Choosing μ with Cross validation What does cross validation mean? For N data and a chosen value of μ: drop one data value d_k and use your favourite algorithm to fit the remainder d_i (i = 1, ..., k−1, k+1, ..., N), giving m_k(μ) = (G_k^T G_k + μI)^{-1} G_k^T d_k = G_k^g d_k. Compare the prediction of the recovered model with the missing datum: r_k(μ) = d_k − (G m_k(μ))_k. Repeat for each datum (k = 1, ..., N) and sum the squares of the residuals to produce a fitness for μ: V(μ) = Σ_{k=1}^N r_k^2. Find the μ that minimizes V(μ).
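The leave-one-out recipe translates directly to code; a brute-force sketch for zeroth-order Tikhonov, practical when N is small (illustrative names and data):

```python
import numpy as np

def loo_cv(G, d, mu):
    """V(mu): sum of squared prediction errors on each left-out datum."""
    N, M = G.shape
    V = 0.0
    for k in range(N):
        keep = np.arange(N) != k
        Gk, dk = G[keep], d[keep]
        # Fit with datum k removed...
        mk = np.linalg.solve(Gk.T @ Gk + mu * np.eye(M), Gk.T @ dk)
        # ...then score the prediction of the missing datum.
        V += (d[k] - G[k] @ mk) ** 2
    return V

rng = np.random.default_rng(5)
G = rng.standard_normal((25, 4))
d = G @ np.ones(4) + 0.1 * rng.standard_normal(25)
scores = [loo_cv(G, d, mu) for mu in np.logspace(-6, 6, 13)]
```

One then picks the μ with the smallest score, e.g. by grid search as here.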
22 Leave some out A general framework: K-fold cross validation, where V(μ) is the cross validation measure for μ.
23 Example: GCV in curve fitting Problem: find the curve that generated this noisy data. The data are the y_i values for known x_i (i = 1, ..., N), and each has Gaussian random noise N[0, 0.25]. What would you guess is the solution?
24 Example: Curve fitting Let s(x, m) represent the continuous solution curve as a function of x, and let its shape be determined by some parameters m. Let's first choose to fit the data with the smoothest possible curve, i.e. minimize J(s) = ∫ (∂²s/∂x²)² dx and also ψ(d, m) = Σ_{i=1}^N (d_i − s(x_i, m))². Duchon (1976) showed that it is possible to fit the data d_i exactly and obtain a unique curve with minimum J(s) using thin plate splines: s(x, m) = p(x) + Σ_{i=1}^N m_i φ(x − x_i), where p(x) is a polynomial in x and φ(x − x_i) is a thin plate spline basis function. The parameters m_i can be solved for using matrix inversion.
25 The smoothest thin plate spline The parameters m in the thin plate spline solution are given by the solution of a set of linear equations. Does this extremal solution look smooth? What has gone wrong? Answer: the data are over-fit.
26 Choosing the trade-off parameter with GCV We should not try to fit noisy data exactly, but only to an acceptable level. To relax the data fit we minimize φ(d, m) = Σ_{i=1}^N (d_i − s(x_i, m))² + μ J(s). How do we choose the trade-off parameter μ? Answer: there are many ways. If we do not know what an acceptable fit to the data is, we can use Generalized Cross Validation: we use the data to find a value for μ. Leave each datum out in turn and compute the solution: V(μ) = Σ_{i=1}^N (d_i − s_{−i}(x_i, m))², where s_{−i}(x_i, m) is the thin plate spline interpolant with the i-th datum missing. Look how V(μ) behaves as a function of μ: as μ → ∞ the solution is very smooth; as μ → 0 the solution is very rough.
27 Example: Curve fitting with GCV We find the μ that minimizes V(μ) by repeated solutions of the TPS problem. We get a unique minimum in V(μ)!
28 Example: Curve fitting with CV The TPS solution that corresponds to the minimum of V(μ). This is achieved without knowing in advance what a suitable, acceptable data fit was. GCV is in effect a way of estimating the error in the data and using it in model fitting.
29 Regularization by GCV in general For a general linear inverse problem with the k-th datum left out, a solution is obtained by minimizing φ_k(d, m) = Σ_{i≠k} ((Gm)_i − d_i)² + μ m^T L^T L m. This is second-order Tikhonov regularization. If m_k(μ) = (G_k^T G_k + μ L^T L)^{-1} G_k^T d_k = G_k^g d_k is the solution model for a given μ with the k-th datum missing, then the cross validation function is the prediction error over all N solutions: V(μ) = (1/N) Σ_{k=1}^N ((G m_k)_k − d_k)². We find the μ that minimizes V(μ). Evaluating V for any μ would appear to require N solutions to the inverse problem, but it is possible to show (see Aster et al., p. 108, 2005) that V(μ) ≈ N (d − Gm(μ))^T (d − Gm(μ)) / [Tr(I − G G^g_μ)]², which requires only a single optimization solution to evaluate for any μ.
30 Generalized cross validation Note that calculating a single value of the cross validation function V(μ) requires N repeat inversion solutions, where N is the number of data; this can become expensive. Craven & Wahba (1979) devised Generalized cross validation, which removes the need to perform many repeat inversions. Hence it is much more efficient and gives essentially the same answer. Minimize φ_k(d, m) = Σ_{i≠k} ((Gm)_i − d_i)² + μ m^T L^T L m. CV function: V(μ) = (1/N) Σ_{k=1}^N ((G m_k)_k − d_k)², requiring N solutions for each μ. GCV function: V(μ) ≈ N (d − Gm(μ))^T (d − Gm(μ)) / [Tr(I − G G^g_μ)]², requiring one solution for each μ.
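The trace formula makes GCV cheap: one solve per μ. A sketch for Tikhonov with a general roughening matrix L (illustrative names; L = I recovers the zeroth-order case):

```python
import numpy as np

def gcv_score(G, d, mu, L=None):
    """GCV(mu) = N ||d - G m(mu)||^2 / Tr(I - G G^#)^2, one solve per mu."""
    N, M = G.shape
    LtL = np.eye(M) if L is None else L.T @ L
    Ghash = np.linalg.solve(G.T @ G + mu * LtL, G.T)  # generalized inverse G^#
    r = d - G @ (Ghash @ d)                           # residual d - G m(mu)
    trace = N - np.trace(G @ Ghash)                   # Tr(I - G G^#)
    return N * (r @ r) / trace**2

rng = np.random.default_rng(6)
G = rng.standard_normal((30, 5))
d = G @ np.ones(5) + 0.1 * rng.standard_normal(30)
scores = np.array([gcv_score(G, d, mu) for mu in np.logspace(-6, 6, 13)])
```

Minimizing `gcv_score` over a grid (or with a 1-D optimizer in log μ) then selects the regularization parameter.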
31 Example: Cross validation in finite fault slip inversions Fault slip model of the 2004 Sumatra-Andaman earthquake and the vertical sea floor movement, by Chen Ji (Caltech). Australian Tsunami Warning System; model of the 1833 event, courtesy Phil Cummins (Geoscience Australia).
32 How does finite fault slip inversion work? Try to fit the data (body wave and surface wave seismograms, geodetic observations, etc.). Also seek the least complex model, where least complex may mean minimum length, flattest model, smoothest model, or most compact slip distribution (Lohman & Simons, 2004). With a fault slip model d = Gm, this is posed as a regularized linear inverse problem: minimize a combination of fit to data and control on the model, φ = {Data misfit} + r {Model Roughness}.
33 Solving for a finite fault slip model φ = ||d − Gm||² + μ_1² ||L_1 m||² + μ_2² ||L_0 m||². For given values of the hyper-parameters (μ_1, μ_2) we find the model m which minimizes φ. Use the non-negative least squares method (Lawson and Hanson, 1974), which solves the stacked linear system [d; 0; 0] = [G; μ_1 L_1; μ_2 L_0] m, i.e. d′ = G′ m, in a least squares sense subject to positivity constraints on all variables, m ≥ 0. A good, stable, reliable approach. The existence of inequality constraints means GCV is not applicable.
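The stacked system can be assembled as below (a sketch with illustrative names; without the positivity constraint, ordinary least squares on the augmented system reproduces the multi-term Tikhonov solution, and a non-negative least squares routine such as scipy.optimize.nnls would add m ≥ 0):

```python
import numpy as np

def augmented_system(G, d, mu1, L1, mu2, L0):
    """Stack data equations with weighted regularization equations: d' = G' m."""
    A = np.vstack([G, mu1 * L1, mu2 * L0])
    b = np.concatenate([d, np.zeros(L1.shape[0]), np.zeros(L0.shape[0])])
    return A, b

rng = np.random.default_rng(7)
G = rng.standard_normal((20, 6))
d = rng.standard_normal(20)
L0 = np.eye(6)                       # minimum-length term
L1 = np.diff(np.eye(6), axis=0)      # first-difference (flattening) term
A, b = augmented_system(G, d, mu1=0.5, mu2=0.2, L1=L1, L0=L0)
m = np.linalg.lstsq(A, b, rcond=None)[0]
```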
34 2006 Kuril earthquake Fault plane solution obtained with an interactively chosen hyper-parameter. 40 waveforms to fit, 4677 data equations, 1676 model parameters. Toshitaka Baba's solution using deep-ocean tsunami records. Regularization: minimum length + positivity constraints.
35 Interactive selection of hyper-parameters is not efficient enough! Poor data fit gives simple solutions; good data fit gives complex solutions (Baba's solution lies on this trade-off). Interactive selection is time consuming, subjective and difficult.
36 2006 Kuril earthquake with CV
37 2006 Kuril earthquake: comparison between CV and interactive Interactive solution (r_1 = 150) vs. cross validation solution (r_1 = 210). 40 waveforms to fit, 4677 data equations, 1676 model parameters. Regularization: minimum length + positivity constraints.
38 2007 Solomons earthquake with CV A clear minimum again! Baba's cross validation solution. Waveforms to fit; 2562 data equations; 896 model parameters; r_1 = 70, r_2 = 0. Regularization: minimum length + positivity constraints.
39 What about the computational cost of cross validation? It is ideally suited to parallelization, which is possible over both data space and model space. T. Baba parallelized this calculation over data space only and was able to reduce computation from 3 days to 2 hours.
40 Grid search is not the only option φ = ||d − Gm||² + μ_1² ||L_1 m||² + μ_2² ||L_0 m||². More than one hyper-parameter requires a multi-dimensional direct search to minimize V(μ_1, μ_2): either a uniform grid search over (μ_1, μ_2) or an adaptive search (NA, SA, GA).
41 Cross validation properties Cross validation is a way of assessing the quality of a solution based on the principle of measuring the fit to missing data. On the plus side: cross validation can be used in cases where a suitable level of data misfit is not known in advance (it is in effect a bootstrap measure of data variance); it provides an objective criterion for picking a preferred solution; and it can be automated (GCV for linear problems with no inequality constraints). On the minus side: it has been criticized for over-smoothing models, and its computational burden has often seen it ignored (e.g. by geophysicists!).
Regularization methods for large-scale, ill-posed, linear, discrete, inverse problems Silvia Gazzola Dipartimento di Matematica - Università di Padova January 10, 2012 Seminario ex-studenti 2 Silvia Gazzola
More informationQueens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.
Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane c Sateesh R. Mane 2018 3 Lecture 3 3.1 General remarks March 4, 2018 This
More information4. DATA ASSIMILATION FUNDAMENTALS
4. DATA ASSIMILATION FUNDAMENTALS... [the atmosphere] "is a chaotic system in which errors introduced into the system can grow with time... As a consequence, data assimilation is a struggle between chaotic
More informationSpatial Process Estimates as Smoothers: A Review
Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationnonlinear simultaneous equations of type (1)
Module 5 : Solving Nonlinear Algebraic Equations Section 1 : Introduction 1 Introduction Consider set of nonlinear simultaneous equations of type -------(1) -------(2) where and represents a function vector.
More informationIntroduction to Supervised Learning. Performance Evaluation
Introduction to Supervised Learning Performance Evaluation Marcelo S. Lauretto Escola de Artes, Ciências e Humanidades, Universidade de São Paulo marcelolauretto@usp.br Lima - Peru Performance Evaluation
More informationREGULARIZATION PARAMETER SELECTION IN DISCRETE ILL POSED PROBLEMS THE USE OF THE U CURVE
Int. J. Appl. Math. Comput. Sci., 007, Vol. 17, No., 157 164 DOI: 10.478/v10006-007-0014-3 REGULARIZATION PARAMETER SELECTION IN DISCRETE ILL POSED PROBLEMS THE USE OF THE U CURVE DOROTA KRAWCZYK-STAŃDO,
More informationChapter 2 Interpolation
Chapter 2 Interpolation Experiments usually produce a discrete set of data points (x i, f i ) which represent the value of a function f (x) for a finite set of arguments {x 0...x n }. If additional data
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationLinear Inverse Problems
Linear Inverse Problems Ajinkya Kadu Utrecht University, The Netherlands February 26, 2018 Outline Introduction Least-squares Reconstruction Methods Examples Summary Introduction 2 What are inverse problems?
More informationLecture No. 1 Introduction to Method of Weighted Residuals. Solve the differential equation L (u) = p(x) in V where L is a differential operator
Lecture No. 1 Introduction to Method of Weighted Residuals Solve the differential equation L (u) = p(x) in V where L is a differential operator with boundary conditions S(u) = g(x) on Γ where S is a differential
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More information5.1 2D example 59 Figure 5.1: Parabolic velocity field in a straight two-dimensional pipe. Figure 5.2: Concentration on the input boundary of the pipe. The vertical axis corresponds to r 2 -coordinate,
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationThe Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation
The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation Rosemary Renaut Collaborators: Jodi Mead and Iveta Hnetynkova DEPARTMENT OF MATHEMATICS
More informationCS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationNoise & Data Reduction
Noise & Data Reduction Paired Sample t Test Data Transformation - Overview From Covariance Matrix to PCA and Dimension Reduction Fourier Analysis - Spectrum Dimension Reduction 1 Remember: Central Limit
More informationAlgebra of Random Variables: Optimal Average and Optimal Scaling Minimising
Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal
More informationDeconvolution. Parameter Estimation in Linear Inverse Problems
Image Parameter Estimation in Linear Inverse Problems Chair for Computer Aided Medical Procedures & Augmented Reality Department of Computer Science, TUM November 10, 2006 Contents A naive approach......with
More informationM. Holschneider 1 A. Eicker 2 R. Schachtschneider 1 T. Mayer-Guerr 2 K. Ilk 2
SPP project TREGMAT: Tailored Gravity Field Models for Mass Distributions and Mass Transport Phenomena in the Earth System by by M. 1 A. Eicker 2 R. Schachtschneider 1 T. Mayer-Guerr 2 K. Ilk 2 1 University
More informationThe Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation
The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation Rosemary Renaut DEPARTMENT OF MATHEMATICS AND STATISTICS Prague 2008 MATHEMATICS AND STATISTICS
More informationGeneralized Cross Validation
26 november 2009 Plan 1 Generals 2 What Regularization Parameter λ is Optimal? Examples Dening the Optimal λ 3 Generalized Cross-Validation (GCV) Convergence Result 4 Discussion Plan 1 Generals 2 What
More informationInverse Theory Methods in Experimental Physics
Inverse Theory Methods in Experimental Physics Edward Sternin PHYS 5P10: 2018-02-26 Edward Sternin Inverse Theory Methods PHYS 5P10: 2018-02-26 1 / 29 1 Introduction Indirectly observed data A new research
More informationScalable kernel methods and their use in black-box optimization
with derivatives Scalable kernel methods and their use in black-box optimization David Eriksson Center for Applied Mathematics Cornell University dme65@cornell.edu November 9, 2018 1 2 3 4 1/37 with derivatives
More informationPartititioned Methods for Multifield Problems
Partititioned Methods for Multifield Problems Joachim Rang, 22.7.215 22.7.215 Joachim Rang Partititioned Methods for Multifield Problems Seite 1 Non-matching matches usually the fluid and the structure
More information11 More Regression; Newton s Method; ROC Curves
More Regression; Newton s Method; ROC Curves 59 11 More Regression; Newton s Method; ROC Curves LEAST-SQUARES POLYNOMIAL REGRESSION Replace each X i with feature vector e.g. (X i ) = [X 2 i1 X i1 X i2
More informationStochastic Spectral Approaches to Bayesian Inference
Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to
More information8 The SVD Applied to Signal and Image Deblurring
8 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an
More informationSome general observations.
Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationKalman Filter. Predict: Update: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q
Kalman Filter Kalman Filter Predict: x k k 1 = F k x k 1 k 1 + B k u k P k k 1 = F k P k 1 k 1 F T k + Q Update: K = P k k 1 Hk T (H k P k k 1 Hk T + R) 1 x k k = x k k 1 + K(z k H k x k k 1 ) P k k =(I
More informationDegrees of Freedom in Regression Ensembles
Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.
More informationBasic Linear Inverse Method Theory - DRAFT NOTES
Basic Linear Inverse Method Theory - DRAFT NOTES Peter P. Jones 1 1 Centre for Compleity Science, University of Warwick, Coventry CV4 7AL, UK (Dated: 21st June 2012) BASIC LINEAR INVERSE METHOD THEORY
More informationWe consider the problem of finding a polynomial that interpolates a given set of values:
Chapter 5 Interpolation 5. Polynomial Interpolation We consider the problem of finding a polynomial that interpolates a given set of values: x x 0 x... x n y y 0 y... y n where the x i are all distinct.
More informationSTATISTICS 174: APPLIED STATISTICS TAKE-HOME FINAL EXAM POSTED ON WEBPAGE: 6:00 pm, DECEMBER 6, 2004 HAND IN BY: 6:00 pm, DECEMBER 7, 2004 This is a
STATISTICS 174: APPLIED STATISTICS TAKE-HOME FINAL EXAM POSTED ON WEBPAGE: 6:00 pm, DECEMBER 6, 2004 HAND IN BY: 6:00 pm, DECEMBER 7, 2004 This is a take-home exam. You are expected to work on it by yourself
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More information