Singular value decomposition

If only the first $p$ singular values are nonzero we write

$$G = [U_p \; U_o] \begin{bmatrix} S_p & 0 \\ 0 & 0 \end{bmatrix} [V_p \; V_o]^T$$

where
$U_p$ represents the first $p$ columns of $U$
$U_o$ represents the last $N-p$ columns of $U$ (a data null space is created)
$V_p$ represents the first $p$ columns of $V$
$V_o$ represents the last $M-p$ columns of $V$ (a model null space is created)

Properties:

$$U_p^T U_p = I, \quad U_o^T U_o = I, \quad V_p^T V_p = I, \quad V_o^T V_o = I$$
$$U_p^T U_o = 0, \quad U_o^T U_p = 0, \quad V_p^T V_o = 0, \quad V_o^T V_p = 0$$

Since the columns of $V_o$ and $U_o$ multiply by zeros, we get the compact form for $G$:

$$G = U_p S_p V_p^T$$
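The partition above can be checked numerically. A minimal sketch with NumPy, using a hypothetical rank-deficient $4 \times 3$ kernel (the matrix here is purely illustrative):

```python
import numpy as np

# Hypothetical rank-2 kernel matrix G (4 data, 3 model parameters).
G = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(G)          # full SVD: U is 4x4, Vt is 3x3
p = np.sum(s > 1e-10 * s[0])         # number of "nonzero" singular values

# Partition into range and null spaces.
Up, Uo = U[:, :p], U[:, p:]          # data range / data null space
Vp, Vo = Vt[:p, :].T, Vt[p:, :].T    # model range / model null space
Sp = np.diag(s[:p])

# The compact form reproduces G, and the partitions are orthogonal.
print(np.allclose(G, Up @ Sp @ Vp.T))   # True
print(np.allclose(Up.T @ Uo, 0))        # True
```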
Model null space

Consider a vector made up of a linear combination of the columns of $V_o$:

$$m_v = \sum_{i=p+1}^{M} \lambda_i v_i$$

The model $m_v$ lies in the space spanned by the columns of $V_o$, so

$$G m_v = \sum_{i=p+1}^{M} \lambda_i U_p S_p V_p^T v_i = 0$$

Any model of this type has no effect on the data: it lies in the model null space! Where have we seen this before?

Consequence: if any solution exists to the inverse problem, then an infinite number do. Assume the model $m_{ls}$ fits the data, $G m_{ls} = d_{obs}$. Then

$$G(m_{ls} + m_v) = G m_{ls} + G m_v = d_{obs} + 0$$

This is the uniqueness question of Backus and Gilbert: the data cannot constrain models in the model null space.
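The invisibility of null-space models can be demonstrated directly: any combination of the $V_o$ columns maps to zero data. A small sketch with an illustrative rank-2 kernel:

```python
import numpy as np

# Hypothetical 3x3 kernel of rank 2, so the model null space is 1-D.
G = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(G)
p = np.sum(s > 1e-10 * s[0])
Vo = Vt[p:, :].T                      # model null space basis

# An arbitrary linear combination of null-space vectors...
lam = np.random.default_rng(0).normal(size=Vo.shape[1])
m_v = Vo @ lam

# ...produces no data at all.
print(np.allclose(G @ m_v, 0))        # True
```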
Data null space

Consider a data vector with at least one component in $U_o$:

$$d_{obs} = d_o + \lambda_i u_i \quad (i > p)$$

For any model-space vector $m$ we have

$$d_{pre} = G m = U_p S_p V_p^T m = U_p a$$

For the model to fit the data, $d_{obs} = d_{pre}$, we would need

$$d_o + \lambda_i u_i = \sum_{j=1}^{p} a_j u_j$$

but the right-hand side has no component along $u_i$. Where have we seen this before? Data of this type cannot be fit by any model: the data has a component in the data null space!

Consequence: no model exists that can fit the data. This is the existence question of Backus and Gilbert. All this depends on the structure of the kernel matrix $G$!
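Numerically, the $U_o$ component of the data is exactly the part of the least squares residual that no model can remove. A sketch with a hypothetical overdetermined system:

```python
import numpy as np

# Hypothetical 3x2 kernel: its column space is spanned by U_p (2-D),
# leaving a 1-D data null space U_o.
G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
U, s, Vt = np.linalg.svd(G)
Up, Uo = U[:, :2], U[:, 2:]

# Build data with a deliberate component of size 0.5 in the data null space.
d = Up @ np.array([1.0, 2.0]) + 0.5 * Uo[:, 0]

# The best-fitting model still leaves a residual of norm 0.5, all along U_o.
m, *_ = np.linalg.lstsq(G, d, rcond=None)
r = d - G @ m
print(np.isclose(np.linalg.norm(r), 0.5))   # True
print(np.allclose(Uo.T @ r, Uo.T @ d))      # True: residual lies along U_o
```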
Moore–Penrose generalized inverse

$$G^{\dagger} = V_p S_p^{-1} U_p^T$$

The generalized inverse combines the features of the least squares and minimum length solutions.

In a purely over-determined problem it is equivalent to the least squares solution:

$$m^{\dagger} = G^{\dagger} d = (G^T G)^{-1} G^T d$$

In a purely under-determined problem it is equivalent to the minimum length solution:

$$m^{\dagger} = G^{\dagger} d = G^T (G G^T)^{-1} d$$

In general problems it minimizes the data prediction error while also minimizing the length of the solution:

$$\phi(m^{\dagger}) = (d - G m^{\dagger})^T (d - G m^{\dagger}), \qquad L(m^{\dagger}) = (m^{\dagger})^T m^{\dagger}$$
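Both identities can be verified in a few lines. A sketch with a hypothetical overdetermined, full-column-rank kernel, where $G^{\dagger}$ should coincide with both NumPy's pseudoinverse and the least squares inverse $(G^T G)^{-1} G^T$:

```python
import numpy as np

# Hypothetical 3x2 kernel with full column rank (purely over-determined).
G = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(G)
p = np.sum(s > 1e-10 * s[0])
Up, Sp, Vp = U[:, :p], np.diag(s[:p]), Vt[:p, :].T

# Generalized inverse built from the compact SVD.
G_dag = Vp @ np.linalg.inv(Sp) @ Up.T
print(np.allclose(G_dag, np.linalg.pinv(G)))          # True

# Over-determined case: equals the least squares inverse.
ls = np.linalg.inv(G.T @ G) @ G.T
print(np.allclose(G_dag, ls))                         # True
```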
Covariance and resolution of the pseudo inverse

How does data noise propagate into the model? What is the model covariance matrix for the generalized inverse? For the case $C_d = \sigma^2 I$, with $G^{\dagger} = V_p S_p^{-1} U_p^T$:

$$C_M = G^{\dagger} C_d (G^{\dagger})^T = \sigma^2 G^{\dagger} (G^{\dagger})^T = \sigma^2 V_p S_p^{-2} V_p^T \qquad \text{(prove this)}$$

Recall that $S_p$ is a diagonal matrix of ordered singular values, $S_p = \mathrm{diag}[s_1, s_2, \ldots, s_p]$, so

$$C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2} \qquad \text{(prove this)}$$

What is the effect of singular values on the model covariance? As the number of singular values, $p$, increases, the variance of the model parameters increases!
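The two "prove this" identities can at least be confirmed numerically. A sketch with a hypothetical $2 \times 3$ kernel (the matrix and $\sigma$ are illustrative):

```python
import numpy as np

# Hypothetical under-determined kernel: 2 data, 3 model parameters.
sigma = 0.1
G = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
U, s, Vt = np.linalg.svd(G)
p = np.sum(s > 1e-10 * s[0])
Vp, sp = Vt[:p, :].T, s[:p]

# C_M = sigma^2 Vp Sp^{-2} Vp^T ...
C_M = sigma**2 * Vp @ np.diag(1.0 / sp**2) @ Vp.T
# ... equals the sum-of-outer-products form ...
C_sum = sigma**2 * sum(np.outer(Vp[:, i], Vp[:, i]) / sp[i]**2 for i in range(p))
print(np.allclose(C_M, C_sum))                        # True
# ... and equals G_dag C_d G_dag^T directly.
G_dag = np.linalg.pinv(G)
print(np.allclose(C_M, G_dag @ (sigma**2 * np.eye(2)) @ G_dag.T))  # True
```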
Covariance and resolution of the pseudo inverse

How is the estimated model related to the true model?

$$m^{\dagger} = R\, m_{true}$$

Model resolution matrix:

$$R = G^{\dagger} G = V_p S_p^{-1} U_p^T\, U_p S_p V_p^T = V_p V_p^T$$

As $p$ increases the model null space shrinks; as $p \to M$, $V_p^T \to V_p^{-1}$ and $R \to I$.

What is the effect of singular values on the resolution matrix? As the number of singular values, $p$, increases, the resolution of the model parameters increases! We see the trade-off between variance and resolution.
Worked example: tomography

Using rays 1–4 (rays 1–2 cross two cells of unit length; rays 3–4 are diagonal, crossing two cells of length $\sqrt{2}$):

$$\delta d = G\, \delta m, \qquad G = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & \sqrt{2} & \sqrt{2} & 0 \\ \sqrt{2} & 0 & 0 & \sqrt{2} \end{bmatrix}$$

$$G^T G = \begin{bmatrix} 3 & 0 & 1 & 2 \\ 0 & 3 & 2 & 1 \\ 1 & 2 & 3 & 0 \\ 2 & 1 & 0 & 3 \end{bmatrix}$$

This has eigenvalues 6, 4, 2 and 0, so $s_1^2 = 6$, $s_2^2 = 4$, $s_3^2 = 2$, $s_4^2 = 0$, and

$$V_p = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}, \qquad V_o = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}, \qquad G v_o = 0$$
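The eigenvalue spectrum of this kernel is easy to check. A short NumPy sketch of the same $G$:

```python
import numpy as np

# The 2x2-cell tomography kernel: rays 1-2 cross cells of unit length,
# rays 3-4 are diagonal rays with cell length sqrt(2).
r2 = np.sqrt(2.0)
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0,  r2,  r2, 0.0],
              [ r2, 0.0, 0.0,  r2]])

# Eigenvalues of G^T G are the squared singular values of G.
evals = np.sort(np.linalg.eigvalsh(G.T @ G))
print(np.allclose(evals, [0.0, 2.0, 4.0, 6.0]))   # True

# The null-space vector v_o = (1, 1, -1, -1)/2 produces no data.
v_o = np.array([1.0, 1.0, -1.0, -1.0]) / 2
print(np.allclose(G @ v_o, 0))                    # True
```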
Worked example: eigenvectors

[Figure: the eigenvectors plotted on the 2×2 cell grid — the three range-space eigenvectors (columns of $V_p$, with $s_1^2 = 6$, $s_2^2 = 4$, $s_3^2 = 2$) and the null-space eigenvector ($V_o$, with $s_4^2 = 0$).]
Worked example: tomography

Using all nonzero eigenvalues ($s_1$, $s_2$ and $s_3$) the resolution matrix becomes

$$\delta m^{\dagger} = R\, \delta m_{true} = V_p V_p^T\, \delta m_{true}$$

$$R = V_p V_p^T = \begin{bmatrix} 0.75 & -0.25 & 0.25 & 0.25 \\ -0.25 & 0.75 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.75 & -0.25 \\ 0.25 & 0.25 & -0.25 & 0.75 \end{bmatrix}$$

[Figure: input model vs recovered model on the 2×2 grid.]
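Since $V_o$ here is the single vector $v_o = (1, 1, -1, -1)/2$, the resolution matrix is also $R = I - v_o v_o^T$, which a few lines confirm:

```python
import numpy as np

# Same tomography kernel as above.
r2 = np.sqrt(2.0)
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0,  r2,  r2, 0.0],
              [ r2, 0.0, 0.0,  r2]])
U, s, Vt = np.linalg.svd(G)
Vp = Vt[:3, :].T                 # three nonzero singular values

# R = Vp Vp^T projects onto the complement of the null space.
R = Vp @ Vp.T
v_o = np.array([1.0, 1.0, -1.0, -1.0]) / 2
print(np.allclose(R, np.eye(4) - np.outer(v_o, v_o)))   # True
print(np.allclose(np.diag(R), 0.75))                    # True
```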
Worked example: tomography

Using eigenvalues $s_1$, $s_2$ and $s_3$ (with $s_1^2 = 6$, $s_2^2 = 4$, $s_3^2 = 2$), the model covariance becomes

$$C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2} = \frac{\sigma^2}{4}\left[ \frac{1}{6}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} + \frac{1}{4}\begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix} \right]$$

$$C_M = \frac{\sigma^2}{48}\begin{bmatrix} 11 & -7 & 5 & -1 \\ -7 & 11 & -1 & 5 \\ 5 & -1 & 11 & -7 \\ -1 & 5 & -7 & 11 \end{bmatrix}$$
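The final covariance matrix can be verified numerically (with $\sigma^2 = 1$ for the check):

```python
import numpy as np

# Same tomography kernel as above.
r2 = np.sqrt(2.0)
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0,  r2,  r2, 0.0],
              [ r2, 0.0, 0.0,  r2]])
U, s, Vt = np.linalg.svd(G)
Vp, sp = Vt[:3, :].T, s[:3]      # the three nonzero singular values

# C_M with sigma^2 = 1: sum of v_i v_i^T / s_i^2.
C_M = Vp @ np.diag(1.0 / sp**2) @ Vp.T
expected = np.array([[11, -7,  5, -1],
                     [-7, 11, -1,  5],
                     [ 5, -1, 11, -7],
                     [-1,  5, -7, 11]]) / 48
print(np.allclose(C_M, expected))   # True
```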
Worked example: tomography

Repeat using only one singular value, $s_1^2 = 6$, so $V_p = v_1 = \frac{1}{2}(1, 1, 1, 1)^T$.

Model resolution matrix:

$$R = V_p V_p^T = \frac{1}{4}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

Model covariance matrix:

$$C_M = \sigma^2 \sum_{i=1}^{p} \frac{v_i v_i^T}{s_i^2} = \frac{\sigma^2}{24}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

[Figure: input vs output model on the 2×2 grid.]
Recap: singular value decomposition

There may exist a model null space -> models that cannot be constrained by the data.
There may exist a data null space -> data that cannot be fit by any model.
The general linear discrete inverse problem may be simultaneously under- and over-determined (mixed-determined).
Singular value decomposition is a framework for dealing with ill-posed problems.
The pseudo inverse is constructed using SVD and provides a unique model with desirable properties:
it fits the data in a least squares sense, and
it gives a minimum length model (no component in the null space).
Model resolution and covariance can be traded off by choosing the number of eigenvalues to use in the reconstruction.
Ill-posedness = sensitivity to noise

Look what happens when the eigenvalues are small and positive. The truncated SVD (TSVD) solution is

$$m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

Noise in the data is amplified in the model if $s_i \ll 1$ (whether it is depends on how fast $|u_i^T d|$ decays relative to $s_i$ — the discrete Picard condition). The eigenvalue spectrum then needs to be truncated by reducing $p$. This is the stability question of Backus and Gilbert.

TSVD: choose the smallest $p$ such that the data fit is acceptable,

$$\| G m - d \|_2 \le \delta$$

As $N$ or $M$ increase the computational cost increases significantly! (See example 4.3 of Aster et al., 2005.)
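The truncated sum above takes only a few lines of NumPy. A minimal sketch, with a deliberately near-singular toy matrix to show the amplification:

```python
import numpy as np

def tsvd_solve(G, d, p):
    """TSVD solution: sum over the p largest singular values of (u_i.d / s_i) v_i."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    coeffs = (U[:, :p].T @ d) / s[:p]
    return Vt[:p, :].T @ coeffs

# Toy near-singular system: second singular value is 1e-8.
G = np.array([[1.0, 0.0],
              [0.0, 1e-8]])
d = np.array([1.0, 1.0])
print(tsvd_solve(G, d, 2))   # second component is 1e8: amplified
print(tsvd_solve(G, d, 1))   # truncation keeps the solution bounded
```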
SVD example: the Shaw problem

$m(\theta)$ = intensity of light incident on a slit at angle $\theta$, $-\pi/2 \le \theta \le \pi/2$
$d(s)$ = measurements of diffracted light intensity at angle $s$, $-\pi/2 \le s \le \pi/2$

Shaw problem: given $d(s)$, find $m(\theta)$, where

$$d(s) = \int_{-\pi/2}^{\pi/2} (\cos(s) + \cos(\theta))^2 \left( \frac{\sin(\pi(\sin(s) + \sin(\theta)))}{\pi(\sin(s) + \sin(\theta))} \right)^2 m(\theta)\, d\theta$$

Is this a continuous or discrete inverse problem? Is this a linear or nonlinear inverse problem?
SVD example: the Shaw problem

Let's discretize the inverse problem. Sample the data $d(s)$ and model $m(\theta)$ at $n$ equal angles:

$$s_i = \theta_i = \frac{(i - 0.5)\pi}{n} - \frac{\pi}{2} \quad (i = 1, 2, \ldots, n)$$

$$d_i = d(s_i) \quad (i = 1, \ldots, n), \qquad m_j = m(\theta_j) \quad (j = 1, \ldots, n)$$

This gives a system of $n \times n$ linear equations, $d = Gm$, where

$$G_{i,j} = \Delta s\, (\cos(s_i) + \cos(\theta_j))^2 \left( \frac{\sin(\pi(\sin(s_i) + \sin(\theta_j)))}{\pi(\sin(s_i) + \sin(\theta_j))} \right)^2, \qquad \Delta s = \frac{\pi}{n}$$

See the MATLAB routine `shaw`.
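The same discretized kernel can be sketched in Python (a hypothetical `shaw_matrix` helper, not the MATLAB routine itself); `np.sinc` handles the $\sin(x)/x$ term safely as its argument goes to zero:

```python
import numpy as np

def shaw_matrix(n=20):
    """Midpoint-rule discretization of the Shaw kernel on n equal angles."""
    i = np.arange(1, n + 1)
    ang = (i - 0.5) * np.pi / n - np.pi / 2   # s_i = theta_i
    S = ang[:, None]                          # s_i down the rows
    T = ang[None, :]                          # theta_j across the columns
    arg = np.pi * (np.sin(S) + np.sin(T))
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(arg/pi) = sin(arg)/arg, safe at arg = 0
    return (np.pi / n) * (np.cos(S) + np.cos(T))**2 * np.sinc(arg / np.pi)**2

G = shaw_matrix(20)
print(G.shape)                    # (20, 20)
print(np.allclose(G, G.T))        # True: the kernel is symmetric in (s, theta)
```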
Example: ill-posedness

Ill-posedness means solution sensitivity to noise:

$$d = Gm, \qquad m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

With 20 data and 20 unknowns ($N = M = 20$):

[Figure: eigenvalue spectrum $s_i$ versus index $i$ for the Shaw problem.]

The condition number, the ratio of largest to smallest singular value, is $\approx 10^{14}$. A large condition number means severe ill-posedness.
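To make the condition-number idea concrete without reproducing the Shaw kernel, a sketch using another classically ill-conditioned matrix, the Hilbert matrix, as a stand-in:

```python
import numpy as np

def cond(A):
    """Condition number: ratio of largest to smallest singular value."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]

# The n x n Hilbert matrix H_ij = 1/(i + j + 1) is severely ill-conditioned.
n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)
print(f"cond = {cond(H):.2e}")   # on the order of 1e13: data noise can be
                                 # amplified by roughly this factor
```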
Example: ill-posedness

$$m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

[Figure: eigenvectors for different singular values of the Shaw problem — amplitude (model units) of $v_1$, the eigenvector for the largest singular value, and $v_{18}$, the eigenvector for the smallest nonzero singular value.]
Test inversion without noise

$$d = Gm, \qquad m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

[Figure: input spike model (model units), the data it produces (data units), and the recovered model.]
Test inversion with noise

$$d = Gm, \qquad m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

Add Gaussian noise to the data with $\sigma = 10^{-6}$.

[Figure: input spike model, data from the spike model, and the recovered model.]

The presence of small eigenvalues means sensitivity of the solution to noise.
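The same effect can be reproduced in a self-contained sketch, using an ill-conditioned $20 \times 20$ Hilbert matrix as a stand-in for the Shaw kernel (the matrix, seed, and truncation level here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# Ill-conditioned stand-in kernel (Hilbert matrix).
G = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)

# Spike model, data, and sigma = 1e-6 Gaussian noise.
m_true = np.zeros(n); m_true[n // 2] = 1.0
d = G @ m_true + rng.normal(scale=1e-6, size=n)

U, s, Vt = np.linalg.svd(G)
def recover(p):
    """TSVD reconstruction using the p largest singular values."""
    return Vt[:p].T @ ((U[:, :p].T @ d) / s[:p])

err_full = np.linalg.norm(recover(n) - m_true)   # all singular values
err_tsvd = np.linalg.norm(recover(5) - m_true)   # modest truncation level
print(err_full > 1e3)        # tiny noise, huge model error
print(err_tsvd < err_full)   # truncation stabilizes the solution
```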
Shaw problem with p = 10

$$d = Gm, \qquad m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

Use the first 10 eigenvalues only.

[Figure: input spike model (model units) with the no-noise and noisy solutions.]

Truncating eigenvalues reduces sensitivity to noise, but also the resolving power of the data.
Shaw problem: Picard plot

A guide to choosing the SVD truncation level $p$ (= number of eigenvalues):

$$m^{\dagger} = V_p S_p^{-1} U_p^T d = \sum_{i=1}^{p} \left( \frac{u_i^T d}{s_i} \right) v_i$$

[Figure: Picard plot for the Shaw problem, from which the truncation level for the SVD is read off.]