Resolving the White Noise Paradox in the Regularisation of Inverse Problems


1 / 32 Resolving the White Noise Paradox in the Regularisation of Inverse Problems. Hanne Kekkonen, joint work with Matti Lassas and Samuli Siltanen. Department of Mathematics and Statistics, University of Helsinki, Finland. January 14, 2016

2 / 32

3 / 32 Outline Tikhonov approach Bayesian approach

4 / 32 Outline Tikhonov approach Bayesian approach

5 / 32 Our starting point is an indirect measurement problem. We consider the continuous linear measurement model
$$m = Au + \varepsilon\delta, \quad \delta > 0, \qquad (1)$$
where $u \in H^r(N)$ is the unknown physical quantity of interest and $m \in H^{-s}(N)$ the measurement, with $N$ a closed $d$-dimensional manifold; $\varepsilon$ is Gaussian white noise; and $A$ is a finitely many times smoothing linear operator that models the measurement.

6 / 32 Tikhonov regularisation is a classical way to gain a noise-robust solution. If $\varepsilon\delta = m - Au \in L^2(N)$ we can use Tikhonov regularisation to get a reconstruction of $u$ from the measurement $m$:
$$u_\delta = \operatorname*{argmin}_{u \in H^r}\Big\{ \|Au - m\|_{L^2}^2 + \delta^2 \|(I-\Delta)^{r/2}u\|_{L^2}^2 \Big\}. \qquad (2)$$
However, (2) is not well defined if we assume $\varepsilon$ to be white noise, since then $\varepsilon \notin L^2$.

7 / 32 White noise does not belong to $L^2$. Formally
$$\varepsilon = \sum_{j=1}^{\infty} \langle \varepsilon, \psi_j\rangle\, \psi_j,$$
where the $\psi_j$ form an orthonormal basis for $L^2$. The Fourier coefficients of white noise satisfy $\langle \varepsilon, e_k\rangle \sim N(0,1)$, where $e_k(t) = e^{ikt}$. Hence
$$\|\varepsilon\|_{L^2}^2 = \sum_{k=-\infty}^{\infty} \langle \varepsilon, e_k\rangle^2 < \infty$$
with probability zero. For the white noise we have
i) $\varepsilon \in L^2$ with probability zero,
ii) $\varepsilon \in H^{-s}$, $s > d/2$, with probability one.
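A small numerical illustration of i) and ii) (my own sketch, not from the slides; here $d = 1$, so any $s > 1/2$ works): sampling the coefficients $\langle \varepsilon, e_k\rangle \sim N(0,1)$ and truncating at level $n$, the partial $L^2$ norm grows like $\sqrt{n}$ while the $H^{-s}$ norm stays bounded.

    # Sketch: truncated Fourier coefficients of 1-d white noise.
    # The L^2 norm of the truncation diverges, the H^{-s} norm (s > 1/2) does not.
    import numpy as np

    rng = np.random.default_rng(0)
    s = 1.0                                     # any s > d/2 = 1/2
    for n in (10**2, 10**4, 10**6):
        k = np.arange(1, n + 1)
        coeff = rng.standard_normal(n)          # <eps, e_k> ~ N(0, 1)
        l2 = np.sqrt(np.sum(coeff**2))                          # grows like sqrt(n)
        hs = np.sqrt(np.sum((1.0 + k**2) ** (-s) * coeff**2))   # stays bounded
        print(n, round(l2, 1), round(hs, 3))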

8 / 32 The white noise paradox. If we are working in a discrete setting, $\|\varepsilon_k\|_{\ell^2} < \infty$ for every realisation $\varepsilon_k \in \mathbb{R}^k$. Hence the minimisation problem
$$u_n^\delta = \operatorname*{argmin}_{u_n}\Big\{ \|Au_n - m_k\|_{\ell^2}^2 + \delta^2 \|(I-D)^{r/2}u_n\|_{\ell^2}^2 \Big\}$$
is well defined. However, we know that
$$\lim_{k\to\infty} \|\varepsilon_k\|_{\ell^2} = \infty.$$
The goal is to build a rigorous theory removing the apparent paradox arising from the infinite $L^2$-norm of the natural limit of white Gaussian noise in $\mathbb{R}^k$ as $k \to \infty$.

9 / 32 Why are we interested in continuous white noise? It is important to be able to connect discrete models to their infinite-dimensional limit models. In practice we do not solve the continuous problem but its discretisation. Discrete white noise is used as a noise model in many practical inverse problems. If the discrete model is an orthogonal projection of the continuous model onto a finite-dimensional subspace, we can switch consistently between different discretisations, which is important e.g. for multigrid methods.

10 / 32 We can define a new regularised solution by omitting the constant term $\|m\|_{L^2}^2$. When $m - Au \in L^2$ we can write
$$\|m - Au\|_{L^2}^2 = \|Au\|_{L^2}^2 - 2(m, Au)_{L^2} + \|m\|_{L^2}^2.$$
Now, omitting the constant term $\|m\|_{L^2}^2$ in (2), we get a new well-defined minimisation problem
$$u_\delta = \operatorname*{argmin}_{u \in H^r}\Big\{ \|Au\|_{L^2}^2 - 2\langle m, Au\rangle + \delta^2 \|u\|_{H^r}^2 \Big\}.$$
The solution to the problem above is
$$u_\delta = \big( A^*A + \delta^2 (I-\Delta)^r \big)^{-1} A^* m,$$
where $A^*$ is a pseudodifferential operator.
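The closed form can be checked by a short formal computation (my addition; it assumes the quadratic functional may be differentiated term by term). Setting the derivative with respect to $u$ to zero gives
$$2A^*Au - 2A^*m + 2\delta^2 (I-\Delta)^r u = 0, \qquad \text{i.e.} \qquad u_\delta = \big( A^*A + \delta^2 (I-\Delta)^r \big)^{-1} A^* m.$$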

11 / 32 Does omitting $\|m\|_{L^2}^2 = \infty$ cause any trouble?
Example. We consider the problem
$$m = Au + \varepsilon\delta = \int \Phi(x - y)\,u(y)\,dy + \varepsilon\delta,$$
where $u \in H^1$ is a piecewise linear function, $\varepsilon$ is white noise and
$$Au = \mathcal{F}^{-1}\big( (1 + n^2)^{-1} (\mathcal{F}u)(n) \big).$$
We have $u \in H^1$ and $u_\delta \in H^1$ for all $\delta > 0$, so does $u_\delta \to u$ in $H^1$ when $\delta \to 0$?
[Figures: the unknown function u; noiseless data m = Au.]

12 / 32 The solution $u_\delta$ does not converge to $u$ in $H^1$. We are interested in knowing what happens to the regularised solution $u_\delta$ in different Sobolev spaces when $\delta \to 0$.
Figure: normalised errors $c(\zeta)\,\|u - u_\delta\|_{H^\zeta(\mathbb{T}^2)}$ on a logarithmic scale for different values of $\zeta$. We observe that $\lim_{\delta \to 0}\|u - u_\delta\|_{H^1(\mathbb{T}^2)} \neq 0$.
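The following one-dimensional analogue of the experiment is my own sketch (the slides work on $\mathbb{T}^2$): it works directly with Fourier coefficients, uses the same symbol $(1+n^2)^{-1}$ for $A$, coefficients decaying like $n^{-2}$ as a stand-in for the piecewise linear $u$, and $r = 1$. The $H^1$ error is expected not to decay as $\delta \to 0$, while the $L^2$ error does.

    # Sketch: discretised version of the example in Fourier coefficients (d = 1).
    import numpy as np

    K, r = 1000, 1
    n = np.arange(-K, K + 1).astype(float)
    w = 1.0 + n**2                      # weight (1 + n^2)
    a = w ** (-1.0)                     # symbol of A
    u_hat = w ** (-1.0)                 # coefficients ~ n^(-2), so u is in H^1

    def h_norm(v_hat, zeta):
        # Sobolev H^zeta norm computed from Fourier coefficients
        return np.sqrt(np.sum(w**zeta * np.abs(v_hat) ** 2))

    rng = np.random.default_rng(1)
    for delta in (1e-1, 1e-2, 1e-3, 1e-4):
        eps_hat = rng.standard_normal(n.size)            # white noise coefficients
        m_hat = a * u_hat + delta * eps_hat
        u_delta_hat = a * m_hat / (a**2 + delta**2 * w**r)
        errs = [round(float(h_norm(u_delta_hat - u_hat, z)), 3) for z in (0.0, 0.4, 1.0)]
        print(delta, errs)   # the H^0 error decays, the H^1 error does not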

13 / 32 Theorem 1. Let $N$ be a $d$-dimensional closed manifold and $u \in H^r(N)$ with $r \ge 0$. Let $\varepsilon \in H^{-s}(N)$ with some $s > d/2$ and consider the measurement
$$m_\delta = Au + \delta\varepsilon,$$
where $A : H^r \to H^{r+t}$ is an elliptic pseudodifferential operator of order $-t < \min\{0, r - s\}$. Assume that $A : L^2(N) \to L^2(N)$ is injective. Then for the regularised solution $u_\delta \in H^r$ of $u$ we have
$$\lim_{\delta \to 0} u_\delta = u \quad \text{in } H^\zeta(N), \quad \text{where } \zeta \le r - s < r - d/2.$$
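For instance (my own instantiation of the theorem, not on the slides), reading the example above on $\mathbb{T}^2$ as in the figure gives $d = 2$, $r = 1$ and $t = 2$, so for $s$ slightly larger than $d/2 = 1$ the assumptions hold and the theorem only gives convergence in
$$H^\zeta, \qquad \zeta \le r - s = 1 - s < 0 = r - \tfrac{d}{2},$$
which is consistent with the observed lack of convergence in $H^1$.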

14 / 32 Why do we not get convergence in the original space $H^r$? We know that both $u$ and $u_\delta$ belong to $H^r$, but we can only prove convergence $u_\delta \to u$ in $H^\zeta$, $\zeta < r - d/2$. If we study the problem only from the regularisation point of view it is hard to see why we get the upper limit $\zeta < r - d/2$ for the smoothness of the space of convergence.

15 / 32 Outline Tikhonov approach Bayesian approach

16 / 32 The indirect measurement problem revisited. Let us return to the continuous linear measurement model
$$M = AU + E\delta, \quad \delta > 0. \qquad (3)$$
This time $M$, $U$ and $E$ are treated as random variables. The unknown $U$ takes values in $H^\tau(N)$ with some $\tau \in \mathbb{R}$. We assume $E$ to be Gaussian white noise taking values in $H^{-s}(N)$, $s > d/2$. The unknown is treated as a random variable since we have only incomplete data on $U$.

17 / 32 The prior distribution contains all other information about the unknown $U$. The prior distribution shouldn't depend on the measurement. The prior distribution should assign a clearly higher probability to those values of $u$ that we expect to see than to unexpected $u$. Designing a good prior distribution is one of the main difficulties in Bayesian inversion.

18 / 32 Bayes' formula combines data and a priori information. We want to reconstruct the most probable $u$ in light of measurement information and a priori information.
Bayes' formula in the discrete case gives us the posterior distribution $\pi(u_n \mid m_k)$:
$$\pi(u_n \mid m_k) = \frac{\pi_{pr}(u_n)\,\pi_\varepsilon(m_k \mid u_n)}{\pi_m(m_k)} = C \exp\Big( -\frac{1}{2\delta^2}\|m_k - Au_n\|_{\ell^2}^2 - \frac{1}{2}\|(I-D)^{r/2}u_n\|_{\ell^2}^2 \Big). \qquad (4)$$
We recover $u_\delta$ as a point estimate from $\pi(u_n \mid m_k)$.
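A minimal discrete sketch of (4) (mine, with a hypothetical smoothing matrix A and a diagonal stand-in for $(I-D)^{r/2}$): for a Gaussian likelihood and prior the posterior is Gaussian, and its mean, which is both the MAP and the CM estimate here, solves the corresponding quadratic minimisation problem.

    # Sketch: Gaussian posterior for a small discrete linear model.
    import numpy as np

    rng = np.random.default_rng(2)
    k, delta, r = 50, 0.05, 1
    i = np.arange(k)
    A = np.exp(-0.1 * (i[:, None] - i[None, :]) ** 2)        # hypothetical smoothing matrix
    L = np.diag((1.0 + i**2) ** (r / 2))                     # stand-in for (I - D)^{r/2}

    u_true = np.sin(2 * np.pi * i / k)
    m = A @ u_true + delta * rng.standard_normal(k)

    # Posterior ~ exp(-|m - A u|^2 / (2 delta^2) - |L u|^2 / 2):
    precision = A.T @ A / delta**2 + L.T @ L                 # posterior precision matrix
    u_map = np.linalg.solve(precision, A.T @ m / delta**2)   # MAP = CM for a Gaussian posterior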

19 / 32 The result of Bayesian inversion is the posterior distribution, but typically one looks at estimates.
Maximum a posteriori (MAP) estimate: $\arg\max_{u \in \mathbb{R}^n} \pi(u \mid m)$.
Conditional mean (CM) estimate: $\int_{\mathbb{R}^n} u\, \pi(u \mid m)\, du$.

20 / 32 There is no general rule to know which estimate is better.

21 / 32 We don't have a Bayes formula for the continuous problem. If we assume that the noise takes values in $L^2$, the MAP estimate of (4) Γ-converges to the following infinite-dimensional minimisation problem:
$$\operatorname*{argmin}_{u \in H^r}\Big\{ \frac{1}{2\delta^2}\|m - Au\|_{L^2}^2 + \frac{1}{2}\|u\|_{H^r}^2 \Big\}.$$
Now, if we think of the above as a MAP estimate of a Bayesian problem, we have to assume that $U$ formally has the distribution
$$\pi_{pr}(u) \overset{\text{formally}}{=} c \exp\Big( -\frac{1}{2}\|u\|_{H^r}^2 \Big).$$

22 / 32 Simple example in $\mathbb{T}^1$. Next we assume that $U, E \sim N(0, I)$. We know that the realisations satisfy $u, \varepsilon \in H^{-s}$, $s > 1/2$, a.s. The unknown $U$ has the formal distribution
$$\pi_{pr}(u) \overset{\text{formally}}{=} c \exp\Big( -\frac{1}{2}\|u\|_{L^2}^2 \Big).$$
Solving the CM/MAP estimate is linked to solving the minimisation problem
$$u_\delta = \operatorname*{argmin}_{u \in L^2}\Big\{ \|Au\|_{L^2}^2 - 2\langle m, Au\rangle + \delta^2\|u\|_{L^2}^2 \Big\}.$$
That is, we are looking for an approximation in $L^2$ even though the realisations of $U$ are in $L^2$ with probability zero.

23 / 32 The prior $U$ takes values in $H^\tau$, where $\tau < r - d/2$. In which Sobolev space $H^\tau$ does $U$ take values? Assume that $U \sim N(0, C_U)$, where $C_U$ is a $2r$ times smoothing covariance operator. Then $U(\omega) \in H^\tau$ if and only if the operator $C_U(I-\Delta)^{\tau}$ is a trace class operator in $H^\tau$. The above is true if $\tau < r - d/2$. Now the CM/MAP estimate for $U$ is given by
$$U_\delta = \big( A^*A + \delta^2 C_U^{-1} \big)^{-1} A^* M.$$
Note that the approximation $U_\delta(\omega) \in H^r$ with probability one, but the prior $U$ takes values in $H^\tau$, where $\tau < r - d/2$.
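A quick check of this by sampling (my own sketch; $d = 1$, $C_U = (I-\Delta)^{-r}$ with $r = 1$, so the threshold is $\tau < r - d/2 = 1/2$): the coefficients of $U$ are independent $N(0, (1+n^2)^{-r})$ variables, and the partial $H^\tau$ norms settle down for $\tau$ below the threshold but keep growing above it.

    # Sketch: realisations of U ~ N(0, (I - Lap)^(-r)) via their Fourier coefficients.
    import numpy as np

    r = 1.0
    rng = np.random.default_rng(3)
    for K in (10**3, 10**5, 10**6):
        n = np.arange(1, K + 1).astype(float)
        coeff = rng.standard_normal(K) * (1.0 + n**2) ** (-r / 2)   # <U, e_n>
        norms = [round(float(np.sqrt(np.sum((1.0 + n**2) ** tau * coeff**2))), 2)
                 for tau in (0.0, 0.8)]          # below / above the threshold 0.5
        print(K, norms)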

24 / 32 What does this mean for our example? For the Gaussian smoothness prior $r = 1$, but in the two-dimensional case we get that $\tau < r - d/2 = 0$. The realisations of the prior do not belong to $L^2$.

25 / 32 Bayesian explanation for why we do not get convergence in $H^r$. Assume that we are looking for an approximation $U_\delta(\omega) \in H^r$. To get the wanted penalty term we have to assume a prior $U$ for which the realisations $U(\omega) \in H^\tau$, $\tau < r - d/2$, a.s. Now it is natural that we only get convergence $U_\delta(\omega) \to U(\omega)$ a.s. in $H^\zeta$ with $\zeta < r - d/2$, since the realisations of the prior are in $H^r$ only with probability zero.

26 / 32 Theorem 2: Assumptions. We assume that $N$ is a $d$-dimensional closed manifold. The operator $C_U$ is $2r$ times smoothing, self-adjoint, injective and elliptic. The unknown $U$ is a generalised random variable taking values in $H^\tau$, $\tau < r - d/2$, with mean zero and covariance operator $C_U$. $E$ is white Gaussian noise taking values in $H^{-s}$, $s > d/2$. Consider the measurement
$$M_\delta = AU + \delta E,$$
where $A \in \Psi^{-t}$ is an elliptic pseudodifferential operator of order $-t < \min\{0, \tau - s\}$. Assume that $A : L^2(N) \to L^2(N)$ is injective.

27 / 32 Theorem 2: Convergence results. Take $\zeta \le r - s < r - d/2$. Then the following convergence takes place in the $H^\zeta(N)$ norm:
$$\lim_{\delta \to 0} \mathbb{E}\,\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} = 0.$$
Above, $U_\delta(\omega) \in H^r$ a.s. We have the following estimates for the speed of convergence:
(i) If $\zeta \le -s - t$ then $\mathbb{E}\,\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta$.
(ii) If $-s - t \le \zeta \le r - s$ then $\mathbb{E}\,\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta^{\frac{r - s - \zeta}{t + r}}$.
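As a quick consistency check (my remark, not on the slides), the two regimes match at the boundary $\zeta = -s - t$:
$$\frac{r - s - \zeta}{t + r}\bigg|_{\zeta = -s - t} = \frac{r - s + s + t}{t + r} = 1,$$
so estimate (ii) reduces to the linear rate $\delta$ of estimate (i) there.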

28 / 32 Back to the minimisation problem. Both the regularised solution in Tikhonov regularisation and the MAP estimate in the Bayesian setting are given by
$$u_\delta = \operatorname*{argmin}_{u \in H^r}\Big\{ \|Au\|_{L^2}^2 - 2\langle m, Au\rangle + \alpha \|C_U^{-1/2}u\|_{L^2}^2 \Big\},$$
where $\alpha = \delta^\kappa$, $\kappa > 0$. We get the classical Tikhonov problem by choosing $C_U = (I-\Delta)^{-r}$. In the Bayesian case we have to choose $\kappa = 2$.

29 / 32 Convergence results in the Tikhonov case. Take $\zeta < \min\big\{ r,\ \tfrac{2}{\kappa} r + \big(\tfrac{2}{\kappa} - 1\big)t - s \big\}$. Then we get the following convergence, with the speed of convergence:
$$\lim_{\delta \to 0} \|u_\delta - u\|_{H^\zeta} = 0,$$
$$\|u_\delta - u\|_{H^\zeta} \le C \max\Big\{ \delta^{\frac{\kappa(r - \zeta)}{2(t + r)}},\ \delta^{\,1 - \frac{\kappa(s + t + \zeta)}{2(t + r)}} \Big\}.$$
Note that above $u, u_\delta \in H^r(N)$ with $r \ge 0$.
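With the Bayesian choice $\kappa = 2$ (my remark), the exponents reduce to those of Theorem 2:
$$\frac{\kappa(r - \zeta)}{2(t + r)} = \frac{r - \zeta}{t + r}, \qquad 1 - \frac{\kappa(s + t + \zeta)}{2(t + r)} = \frac{r - s - \zeta}{t + r},$$
and since the second exponent is the smaller one, the maximum gives back the rate $\delta^{(r - s - \zeta)/(t + r)}$.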

30 / 32 What can we say about the convergence depending on $\kappa$? The regularised solution
$$u_\delta = \operatorname*{argmin}_{u \in H^r}\Big\{ \|Au\|_{L^2}^2 - 2\langle m, Au\rangle + \delta^\kappa \|u\|_{H^r}^2 \Big\}$$
can be written in the form
$$u_\delta = K^{-1}A^*Au + K^{-1}A^*\delta\varepsilon = u_{\text{noiseless}} + u_{\text{noise}},$$
where $K = A^*A + \delta^\kappa (I-\Delta)^r$. The noise term dominates if we choose $\kappa \ge 2$. We get convergence in $H^r$ when $\kappa \le 1$. Note that in the classical theory we get convergence when $\kappa < 2$.
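The effect of $\kappa$ on the noise part can be seen numerically (my own sketch, same one-dimensional Fourier setting and symbol as before, $r = 1$): the $H^1$ norm of $K^{-1}A^*\delta\varepsilon$ decays with $\delta$ for $\kappa = 1$ but grows for $\kappa \ge 2$ in this example.

    # Sketch: size of the noise part K^{-1} A* (delta eps) in the H^1 norm.
    import numpy as np

    Kmax, r = 2000, 1
    n = np.arange(-Kmax, Kmax + 1).astype(float)
    w = 1.0 + n**2
    a = w ** (-1.0)                                   # symbol of A
    eps_hat = np.random.default_rng(4).standard_normal(n.size)

    for kappa in (1.0, 2.0, 3.0):
        for delta in (1e-2, 1e-3, 1e-4):
            noise_hat = a * delta * eps_hat / (a**2 + delta**kappa * w**r)
            h1 = np.sqrt(np.sum(w**r * noise_hat**2))
            print(kappa, delta, round(float(h1), 3))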

31 / 32 From Bayesian to Tikhonov and (almost) back. A prior distribution $U \sim N(0, C_U)$, independent of the measurement. The MAP estimate is given by
$$U_\delta(\omega) = \operatorname*{argmin}_{u \in H^r}\Big\{ \|Au\|_{L^2}^2 - 2\langle m, Au\rangle + \alpha \|C_U^{-1/2}u\|_{L^2}^2 \Big\},$$
where $\alpha = \delta^2$. Relax the previous assumption: $\alpha = \delta^\kappa$, $\kappa > 0$. New a priori distribution $U \sim N(0, \delta^{2-\kappa} C_U)$. Note that this distribution depends on the noise level!
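Where the rescaled covariance comes from (my remark): matching the penalty terms of the two minimisation problems,
$$\delta^2\,\|\Sigma^{-1/2}u\|_{L^2}^2 = \delta^\kappa\,\|C_U^{-1/2}u\|_{L^2}^2 \ \text{ for all } u \quad \Longleftrightarrow \quad \Sigma = \delta^{2-\kappa}\,C_U,$$
so the relaxed parameter choice corresponds to a prior covariance rescaled by $\delta^{2-\kappa}$.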

32 / 32 Thank you for your attention!