Resolving the White Noise Paradox in the Regularisation of Inverse Problems

Hanne Kekkonen
joint work with Matti Lassas and Samuli Siltanen

Department of Mathematics and Statistics
University of Helsinki, Finland

January 14, 2016
Outline

- Tikhonov approach
- Bayesian approach
Our starting point is an indirect measurement problem

We consider the continuous linear measurement model
$$ m = Au + \varepsilon\delta, \qquad \delta > 0, \tag{1} $$
where
- $u \in H^r(N)$ is the unknown physical quantity of interest and $m \in H^{-s}(N)$ is the measurement,
- $N$ is a closed $d$-dimensional manifold,
- $\varepsilon$ is Gaussian white noise,
- $A$ is a finitely many times smoothing linear operator that models the measurement.
Tikhonov regularisation is a classical way to gain a noise-robust solution

If $\varepsilon\delta = m - Au \in L^2(N)$ we can use Tikhonov regularisation to get a reconstruction of $u$ from the measurement $m$:
$$ u_\delta = \operatorname*{argmin}_{u \in H^r} \bigl\{ \|Au - m\|_{L^2}^2 + \delta^2 \|(I - \Delta)^{r/2} u\|_{L^2}^2 \bigr\}. \tag{2} $$
However, (2) is not well defined if we assume $\varepsilon$ to be white noise, since then $\varepsilon \notin L^2$.
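In any finite-dimensional discretisation, (2) is a perfectly well-posed least-squares problem. As a minimal sketch (the matrices and parameters below are illustrative choices, not the example used later in the talk), the Tikhonov minimiser can be computed from the normal equations:

```python
# A minimal finite-dimensional sketch of (2); the forward matrix A and the
# parameters are made-up illustrations, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
n, delta, r = 200, 0.05, 1

A = np.tril(np.ones((n, n))) / n                   # discrete integration: smoothing
u_true = np.sin(np.linspace(0, 2 * np.pi, n))      # a smooth "unknown"
m = A @ u_true + delta * rng.standard_normal(n)    # noisy indirect data

# discrete (I - D)^r with D a periodic second-difference Laplacian
D = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
D[0, -1] = D[-1, 0] = 1
P = np.linalg.matrix_power(np.eye(n) - D, r)

# minimiser of ||A u - m||^2 + delta^2 ||(I - D)^(r/2) u||^2 via normal equations
u_delta = np.linalg.solve(A.T @ A + delta**2 * P, A.T @ m)
print("relative error:", np.linalg.norm(u_delta - u_true) / np.linalg.norm(u_true))
```

The trouble described on the next slides only appears in the continuum limit of this discrete picture.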
White noise does not belong to $L^2$

Formally
$$ \varepsilon = \sum_{j} \langle \varepsilon, \psi_j \rangle \psi_j, $$
where the $\psi_j$ form an orthonormal basis of $L^2$. The Fourier coefficients of white noise satisfy $\langle \varepsilon, e_k \rangle \sim N(0, 1)$, where $e_k(t) = e^{ikt}$. Hence
$$ \|\varepsilon\|_{L^2}^2 = \sum_{k=-\infty}^{\infty} \langle \varepsilon, e_k \rangle^2 < \infty $$
with probability zero. For white noise we have
i) $\varepsilon \in L^2$ with probability zero,
ii) $\varepsilon \in H^{-s}$, $s > d/2$, with probability one.
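Both claims follow by computing the expected Sobolev norms of $\varepsilon$; a short version of this standard computation, written out here for $d = 1$ (it is not spelled out on the slide):
$$ \mathbb{E}\,\|\varepsilon\|_{H^{-s}}^2 = \sum_{k=-\infty}^{\infty} (1+k^2)^{-s}\, \mathbb{E}\,\langle \varepsilon, e_k \rangle^2 = \sum_{k=-\infty}^{\infty} (1+k^2)^{-s} \begin{cases} < \infty, & s > 1/2, \\ = \infty, & s \le 1/2, \end{cases} $$
so $\varepsilon \in H^{-s}$ almost surely when $s > d/2 = 1/2$. For $s = 0$ the partial sums $\sum_{|k| \le K} \langle \varepsilon, e_k \rangle^2$ grow like $2K$ by the law of large numbers, which gives claim i).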
The white noise paradox

In the discrete world $\|\varepsilon_k\|_{\ell^2} < \infty$ for every finite $k \in \mathbb{N}$. Hence the minimisation problem
$$ u_n^\delta = \operatorname*{argmin}_{u_n} \bigl\{ \|Au_n - m_k\|_{\ell^2}^2 + \delta^2 \|(I - D)^{r/2} u_n\|_{\ell^2}^2 \bigr\} $$
is well defined. However, we know that
$$ \lim_{k \to \infty} \|\varepsilon_k\|_{\ell^2} = \infty. $$
The goal is to build a rigorous theory removing the apparent paradox arising from the infinite $L^2$-norm of the natural limit of white Gaussian noise in $\mathbb{R}^k$ as $k \to \infty$.
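The same dichotomy is easy to see numerically; a quick sanity check (mine, not from the slides): since $\mathbb{E}\,\|\varepsilon_k\|_{\ell^2}^2 = k$, every discrete norm is finite, yet the norms blow up along the discretisation limit.

```python
# Discrete white noise has finite l2-norm for every fixed k, but the norm
# grows without bound as the discretisation is refined: E||eps_k||^2 = k.
import numpy as np

rng = np.random.default_rng(0)
for k in [10, 100, 1_000, 10_000, 100_000]:
    eps_k = rng.standard_normal(k)        # white noise in R^k
    print(f"k = {k:6d}   ||eps_k||_l2^2 = {np.sum(eps_k**2):9.1f}   (mean: k)")
```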
Why are we interested in continuous white noise?

- It is important to be able to connect discrete models to their infinite-dimensional limit models. In practice we do not solve the continuous problem but its discretisation.
- Discrete white noise is used as a noise model in many practical inverse problems.
- If the discrete model is an orthogonal projection of the continuous model onto a finite-dimensional subspace, we can switch consistently between different discretisations, which is important e.g. for multigrid methods.
We can define a new regularised solution by omitting the constant term $\|m\|_{L^2}^2$

When $m - Au \in L^2$ we can write
$$ \|m - Au\|_{L^2}^2 = \|Au\|_{L^2}^2 - 2(m, Au)_{L^2} + \|m\|_{L^2}^2. $$
Now omitting the constant term $\|m\|_{L^2}^2$ in (2) we get a new, well-defined minimisation problem
$$ u_\delta = \operatorname*{argmin}_{u \in H^r} \bigl\{ \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \delta^2 \|u\|_{H^r}^2 \bigr\}. $$
The solution to the problem above is
$$ u_\delta = \bigl( A^*A + \delta^2 (I - \Delta)^r \bigr)^{-1} A^* m, $$
where $A^*$ is a pseudodifferential operator.
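The formula for $u_\delta$ follows by setting the Fréchet derivative of the quadratic functional to zero; a routine step, spelled out here for completeness:
$$ F(u) = \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \delta^2 \|(I - \Delta)^{r/2} u\|_{L^2}^2, \qquad \tfrac{1}{2} F'(u) = A^*Au - A^*m + \delta^2 (I - \Delta)^r u = 0, $$
which gives $(A^*A + \delta^2 (I - \Delta)^r)\, u = A^*m$. The key point is that the pairing $\langle m, Au \rangle = \langle A^*m, u \rangle$ still makes sense when $m \notin L^2$, because $Au$ is smooth enough to be paired with the rough $m$.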
Does omitting $\|m\|_{L^2}^2 = \infty$ cause any trouble?

Example. We consider the problem
$$ m = Au + \varepsilon\delta = \int \Phi(x - y)\, u(y)\, dy + \varepsilon\delta, $$
where $u \in H^1$ is a piecewise linear function, $\varepsilon$ is white noise and
$$ Au = \mathcal{F}^{-1}\bigl( (1 + |n|^2)^{-1} (\mathcal{F}u)(n) \bigr). $$
We have $u \in H^1$ and $u_\delta \in H^1$ for all $\delta > 0$, so does $u_\delta \to u$ in $H^1$ when $\delta \to 0$?

Figure: The unknown function $u$.
Figure: Noiseless data $m = Au$.
Solution $u_\delta$ does not converge to $u$ in $H^1$

We are interested in knowing what happens to the regularised solution $u_\delta$ in different Sobolev spaces when $\delta \to 0$.

Figure: Normalised errors $c(\zeta)\,\|u - u_\delta\|_{H^\zeta(\mathbb{T}^2)}$ in logarithmic scale with different values of $\zeta$. We observe that $\lim_{\delta \to 0} \|u - u_\delta\|_{H^1(\mathbb{T}^2)} \neq 0$.
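This experiment is straightforward to reproduce in one dimension. Below is a self-contained sketch of an analogous experiment on $\mathbb{T}^1$ (my own set-up, not the authors': $d = 1$, $r = 1$, $A$ the two-times smoothing convolution from the previous slide, everything diagonal in the Fourier basis). Theorem 1 on the next slide predicts convergence for $\zeta < r - d/2 = 1/2$; in the output the $H^0$ errors shrink while the $H^1$ errors do not tend to zero.

```python
# A 1-D analogue of the experiment (the talk uses T^2): d = 1, r = 1, and
# A u = F^{-1}((1 + n^2)^{-1} F u), all diagonal in the Fourier basis.
# Grid size, delta-range and zeta-values are my choices.
import numpy as np

n = 4096
k = np.fft.fftfreq(n, d=1.0 / n)             # integer Fourier frequencies
a = 1.0 / (1.0 + k**2)                       # symbol of the 2x smoothing A

x = 2 * np.pi * np.arange(n) / n
u_hat = np.fft.fft(np.abs(x - np.pi))        # piecewise linear u (in H^1)

def h_norm(v_hat, zeta):
    # H^zeta norm from the Fourier coefficients c_k = v_hat[k] / n
    return np.sqrt(np.sum((1.0 + k**2) ** zeta * np.abs(v_hat / n) ** 2))

rng = np.random.default_rng(1)
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)
eps_hat = n * g / np.sqrt(2)                 # white noise: <eps, e_k> ~ N(0, 1)

for delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    m_hat = a * u_hat + delta * eps_hat
    # u_delta = (A*A + delta^2 (I - Lap))^{-1} A* m, diagonal in Fourier
    ud_hat = a * m_hat / (a**2 + delta**2 * (1.0 + k**2))
    e0, e1 = h_norm(ud_hat - u_hat, 0.0), h_norm(ud_hat - u_hat, 1.0)
    print(f"delta = {delta:.0e}   H^0 error = {e0:7.3f}   H^1 error = {e1:7.3f}")
```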
Theorem 1

Let $N$ be a $d$-dimensional closed manifold and $u \in H^r(N)$ with $r \geq 0$. Let $\varepsilon \in H^{-s}(N)$ with some $s > d/2$ and consider the measurement
$$ m_\delta = Au + \delta\varepsilon, $$
where $A : H^r \to H^{r+t}$ is an elliptic pseudodifferential operator of order $t < \min\{0, r - s\}$. Assume that $A : L^2(N) \to L^2(N)$ is injective. Then we have for the regularised solution $u_\delta \in H^r$:
$$ \lim_{\delta \to 0} u_\delta = u \quad \text{in } H^\zeta(N), \quad \text{where } \zeta \leq r - s < r - d/2. $$
Why do we not get convergence in the original space $H^r$?

We know that both $u$ and $u_\delta$ belong to $H^r$, but we can only prove the convergence $u_\delta \to u$ in $H^\zeta$, $\zeta < r - d/2$. If we study the problem only from the regularisation point of view, it is hard to see why we get the upper limit $\zeta < r - d/2$ for the smoothness of the space of convergence.
Outline

- Tikhonov approach
- Bayesian approach
The indirect measurement problem revisited

Let us return to the continuous linear measurement model
$$ M = AU + E\delta, \qquad \delta > 0. \tag{3} $$
This time $M$, $U$ and $E$ are treated as random variables.
- The unknown $U$ takes values in $H^\tau(N)$ with some $\tau \in \mathbb{R}$.
- We assume $E$ to be Gaussian white noise taking values in $H^{-s}(N)$, $s > d/2$.
- The unknown is treated as a random variable since we have only incomplete data on $U$.
The prior distribution contains all other information about the unknown $U$

- The prior distribution shouldn't depend on the measurement.
- The prior distribution should assign a clearly higher probability to those values of $u$ that we expect to see than to unexpected $u$.
- Designing a good prior distribution is one of the main difficulties in Bayesian inversion.
Bayes formula combines data and a priori information

We want to reconstruct the most probable $u$ in light of
- measurement information,
- a priori information.

Bayes formula in the discrete case gives us the posterior distribution $\pi(u_n \mid m_k)$:
$$ \pi(u_n \mid m_k) = \frac{\pi_{pr}(u_n)\, \pi_\varepsilon(m_k \mid u_n)}{\pi(m_k)} = C \exp\Bigl( -\frac{1}{2\delta^2} \|m_k - Au_n\|_{\ell^2}^2 - \frac{1}{2} \|(I - D)^{r/2} u_n\|_{\ell^2}^2 \Bigr). \tag{4} $$
We recover $u_\delta$ as a point estimate from $\pi(u_n \mid m_k)$.
The result of Bayesian inversion is the posterior distribution, but typically one looks at estimates

- Maximum a posteriori (MAP) estimate: $\operatorname*{arg\,max}_{u \in \mathbb{R}^n} \pi(u \mid m)$.
- Conditional mean (CM) estimate: $\int_{\mathbb{R}^n} u\, \pi(u \mid m)\, du$.
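In the linear-Gaussian setting of these slides the posterior (4) is itself Gaussian, so the MAP and CM estimates coincide and both are given by one linear solve. A small finite-dimensional sketch (the blur matrix and parameters are my illustrative choices, not from the talk):

```python
# Posterior (4) for a linear-Gaussian model: precision and mean in closed
# form; for Gaussians MAP = CM. A toy deblurring example.
import numpy as np

n, delta, r = 128, 0.05, 1
rng = np.random.default_rng(2)

i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 3.0) ** 2)   # Gaussian blur
A /= A.sum(axis=1, keepdims=True)

D = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)       # periodic Laplacian
D[0, -1] = D[-1, 0] = 1
prior_prec = np.linalg.matrix_power(np.eye(n) - D, r)       # (I - D)^r

u_true = np.clip(1.0 - np.abs(i - n / 2) / (n / 4), 0.0, None)  # tent function
m = A @ u_true + delta * rng.standard_normal(n)

# posterior ~ N(u_map, C_post) with precision A^T A / delta^2 + (I - D)^r
post_prec = A.T @ A / delta**2 + prior_prec
u_map = np.linalg.solve(post_prec, A.T @ m / delta**2)      # = CM estimate
post_std = np.sqrt(np.diag(np.linalg.inv(post_prec)))

print("relative error:", np.linalg.norm(u_map - u_true) / np.linalg.norm(u_true))
print("max pointwise posterior std:", post_std.max())
```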
There is no general rule to know which estimate is better.
We don't have a Bayes formula for the continuous problem

If we assume that the noise takes values in $L^2$, the MAP estimate of (4) $\Gamma$-converges to the following infinite-dimensional minimisation problem:
$$ \operatorname*{argmin}_{u \in H^r} \Bigl\{ \frac{1}{2\delta^2} \|m - Au\|_{L^2}^2 + \frac{1}{2} \|u\|_{H^r}^2 \Bigr\}. $$
Now if we think of the above as a MAP estimate of a Bayesian problem, we have to assume that $U$ formally has the distribution
$$ \pi_{pr}(u) \overset{\text{formally}}{=} c \exp\bigl( -\tfrac{1}{2} \|u\|_{H^r}^2 \bigr). $$
Simple example in $\mathbb{T}^1$

Next we assume that $U, E \sim N(0, I)$. We know that the realisations satisfy $u, \varepsilon \in H^{-s}$, $s > 1/2$, a.s. The unknown $U$ has the formal distribution
$$ \pi_{pr}(u) \overset{\text{formally}}{=} c \exp\bigl( -\tfrac{1}{2} \|u\|_{L^2}^2 \bigr). $$
Solving the CM/MAP estimate is linked to solving the minimisation problem
$$ u_\delta = \operatorname*{argmin}_{u \in L^2} \bigl\{ \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \delta^2 \|u\|_{L^2}^2 \bigr\}. $$
That is, we are looking for an approximation in $L^2$ even though the realisations of $U$ are in $L^2$ with probability zero.
The prior $U$ takes values in $H^\tau$, where $\tau < r - d/2$

In which Sobolev space $H^\tau$ does $U$ take values? Assume that $U \sim N(0, C_U)$, where $C_U$ is a $2r$ times smoothing covariance operator. Then $U(\omega) \in H^\tau$ a.s. if and only if $C_U (I - \Delta)^\tau$ is a trace class operator in $H^\tau$. The above is true if $\tau < r - d/2$. Now the CM/MAP estimate for $U$ is given by
$$ U_\delta = \bigl( A^*A + \delta^2 C_U^{-1} \bigr)^{-1} A^* M. $$
Note that the approximation $U_\delta(\omega) \in H^r$ with probability one, but the prior $U$ takes values in $H^\tau$, where $\tau < r - d/2$.
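The condition $\tau < r - d/2$ can be read off from the trace computation; a sketch of my own on $\mathbb{T}^d$, normalising "$2r$ times smoothing" to mean that $C_U$ has eigenvalues $\lambda_k \asymp (1 + |k|^2)^{-r}$ in the Fourier basis:
$$ \mathbb{E}\,\|U\|_{H^\tau}^2 = \sum_{k \in \mathbb{Z}^d} (1 + |k|^2)^{\tau} \lambda_k \asymp \sum_{k \in \mathbb{Z}^d} (1 + |k|^2)^{\tau - r} < \infty \iff 2(r - \tau) > d \iff \tau < r - \frac{d}{2}. $$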
What does this mean for our example?

For the Gaussian smoothness prior $r = 1$, but in the two-dimensional case we get that $\tau < 0$: the realisations of the prior do not belong to $L^2$.
Bayesian explanation for why we do not get convergence in $H^r$

Assume that we are looking for an approximation $U_\delta(\omega) \in H^r$. To get the wanted penalty term we have to assume a prior $U$ for which the realisations satisfy $U(\omega) \in H^\tau$, $\tau < r - d/2$, a.s. Now it is natural that we only get the convergence $U_\delta(\omega) \to U(\omega)$ a.s. in $H^\zeta$ with $\zeta < r - d/2$, since the realisations of the prior are in $H^r$ only with probability zero.
Theorem 2: Assumptions

We assume that
- $N$ is a $d$-dimensional closed manifold,
- the operator $C_U$ is $2r$ times smoothing, self-adjoint, injective and elliptic,
- the unknown $U$ is a generalised random variable taking values in $H^{-\tau}$, $\tau > d/2 - r$, with mean zero and covariance operator $C_U$,
- $E$ is white Gaussian noise taking values in $H^{-s}$, $s > d/2$.
Consider the measurement
$$ M_\delta = AU + \delta E, $$
where $A \in \Psi^t$ is an elliptic pseudodifferential operator of order $t < \min\{0, -\tau - s\}$. Assume that $A : L^2(N) \to L^2(N)$ is injective.
Theorem 2: Convergence results

Take $\zeta \leq r - s < r - d/2$. Then the following convergence takes place in the $H^\zeta(N)$ norm:
$$ \lim_{\delta \to 0} \mathbb{E}\, \|U_\delta(\omega) - U(\omega)\|_{H^\zeta(N)} = 0. $$
Above, $U_\delta(\omega) \in H^r$ a.s. We have the following estimates for the speed of convergence:
(i) If $\zeta \leq -s - t$, then $\mathbb{E}\, \|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \leq C\delta$.
(ii) If $-s - t \leq \zeta \leq r - s$, then $\mathbb{E}\, \|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \leq C\delta^{\frac{r - s - \zeta}{t + r}}$.
Back to the minimisation problem

Both the regularised solution in Tikhonov regularisation and the MAP estimate in the Bayesian setting are given by
$$ u_\delta = \operatorname*{arg\,min}_{u \in H^r} \bigl\{ \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \alpha \|C_U^{-1/2} u\|_{L^2}^2 \bigr\}, $$
where $\alpha = \delta^\kappa$, $\kappa > 0$.
- We get the classical Tikhonov problem by choosing $C_U = (I - \Delta)^{-r}$.
- In the Bayesian case we have to choose $\kappa = 2$.
Convergence results in the Tikhonov case

Take $\zeta < \min\bigl\{ r,\; \tfrac{2}{\kappa} r + \bigl(\tfrac{2}{\kappa} - 1\bigr) t - s \bigr\}$. Then we get the following convergence and speed of convergence:
$$ \lim_{\delta \to 0} \|u_\delta - u\|_{H^\zeta} = 0, \qquad \|u_\delta - u\|_{H^\zeta} \leq C \max\bigl\{ \delta^{\frac{\kappa(r - \zeta)}{2(t + r)}},\; \delta^{1 - \frac{\kappa(s + t + \zeta)}{2(t + r)}} \bigr\}. $$
Note that above $u, u_\delta \in H^r(N)$ with $r \geq 0$.
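As a consistency check (my computation, not on the slide), the Bayesian choice $\kappa = 2$ plugs into these exponents as
$$ \frac{\kappa(r - \zeta)}{2(t + r)} \Big|_{\kappa = 2} = \frac{r - \zeta}{t + r}, \qquad 1 - \frac{\kappa(s + t + \zeta)}{2(t + r)} \Big|_{\kappa = 2} = \frac{r - s - \zeta}{t + r}, $$
and since $s > 0$ the second exponent is the smaller one, so for small $\delta$ the bound reduces to $C\delta^{(r - s - \zeta)/(t + r)}$, the rate of Theorem 2 (ii).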
What can we say about the convergence depending on $\kappa$?

The regularised solution
$$ u_\delta = \operatorname*{argmin}_{u \in H^r} \bigl\{ \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \delta^\kappa \|u\|_{H^r}^2 \bigr\} $$
can be written in the form
$$ u_\delta = K^{-1} A^* A u + K^{-1} A^* \delta\varepsilon = u_{\text{noiseless}} + u_{\text{noise}}, $$
where $K = A^*A + \delta^\kappa (I - \Delta)^r$.
- The noise term is dominating if we choose $\kappa \geq 2$.
- We get convergence in $H^r$ when $\kappa \leq 1$.
- Note that in the classical theory we get convergence when $\kappa < 2$.
From Bayesian to Tikhonov and (almost) back

- A prior distribution $U \sim N(0, C_U)$, independent of the measurement.
- The MAP estimate is given by
$$ U_\delta(\omega) = \operatorname*{arg\,min}_{u \in H^r} \bigl\{ \|Au\|_{L^2}^2 - 2\langle m, Au \rangle + \alpha \|C_U^{-1/2} u\|_{L^2}^2 \bigr\}, $$
where $\alpha = \delta^2$.
- Relax the previous assumption: $\alpha = \delta^\kappa$, $\kappa > 0$.
- New a priori distribution $U \sim N(0, \delta^{2 - \kappa} C_U)$. Note that this distribution depends on the noise level!
Thank you for your attention!