An example of Bayesian reasoning

Consider the one-dimensional deconvolution problem with various degrees of prior information.


Model:
$$ g(t) = \int a(t-s)\, f(s)\, ds + e(t), \qquad a(t) \to 0 \ \text{as} \ |t| \to \infty \ \text{(rapidly)}. $$
The problem, even without discretization errors, is ill-posed.

Analytic solution

Noiseless model:
$$ g(t) = \int a(t-s)\, f(s)\, ds. $$
Apply the Fourier transform,
$$ \hat g(k) = \int e^{-ikt}\, g(t)\, dt. $$
The Convolution Theorem implies
$$ \hat g(k) = \hat a(k)\, \hat f(k), $$
so, by inverse Fourier transform,
$$ f(t) = \frac{1}{2\pi} \int e^{ikt}\, \frac{\hat g(k)}{\hat a(k)}\, dk. $$

Example: Gaussian kernel, noisy data:
$$ a(t) = \frac{1}{\sqrt{2\pi\omega^2}} \exp\left( -\frac{t^2}{2\omega^2} \right). $$
In the frequency domain,
$$ \hat a(k) = \exp\left( -\frac{\omega^2 k^2}{2} \right). $$
If $\hat g(k) = \hat a(k)\hat f(k) + \hat e(k)$, the estimate $f_{\rm est}(t)$ of $f$ based on the Convolution Theorem is
$$ f_{\rm est}(t) = f(t) + \frac{1}{2\pi} \int \hat e(k) \exp\left( \frac{\omega^2 k^2}{2} + ikt \right) dk, $$
which may not even be well defined unless $\hat e(k)$ decays superexponentially at high frequencies.

Finite sampling, finite interval: discretized problem
$$ g(t_j) = \int a(t_j - s)\, f(s)\, ds + e_j, \qquad 1 \le j \le m. $$
Discrete approximation of the integral by a quadrature rule: use the piecewise constant approximation with uniformly distributed sampling,
$$ \int a(t_j - s)\, f(s)\, ds \approx \frac{1}{n} \sum_{k=1}^n a(t_j - s_k)\, f(s_k) = \sum_{k=1}^n a_{jk}\, x_k, $$
where $s_k = k/n$, $x_k = f(s_k)$, $a_{jk} = \frac{1}{n}\, a(t_j - s_k)$.
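This discretization is straightforward to set up in code. Below is a minimal Matlab sketch (not the attached Deconvloution.m file) that builds the matrix $A$ for a Gaussian kernel on the unit interval; the grid size, the choice $m = n$ and the kernel width are illustrative assumptions, not prescribed by the slides.

```matlab
% Sketch: build the discretized convolution matrix, A(j,k) = a(t_j - s_k)/n.
% Assumptions: unit interval, t_j = j/m with m = n, Gaussian kernel of width omega.
n     = 160;                  % number of unknowns (illustrative)
m     = n;                    % number of data points, taken equal to n here
omega = 0.05;                 % kernel width (illustrative)
s     = (1:n)'/n;             % quadrature nodes s_k = k/n
t     = (1:m)'/m;             % sampling points t_j
a     = @(t) exp(-t.^2/(2*omega^2))/sqrt(2*pi*omega^2);  % Gaussian kernel a(t)
A     = a(t - s')/n;          % implicit expansion gives the m-by-n matrix
```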

Discrete equation, additive Gaussian noise, likelihood

$$ b = Ax + e, \qquad b_j = g(t_j). $$
Stochastic extension: $X$, $E$ and $B$ random variables,
$$ B = AX + E. $$
Assume that $E$ is Gaussian white noise with variance $\sigma^2$,
$$ E \sim \mathcal N(0, \sigma^2 I), \quad \text{or} \quad \pi_{\rm noise}(e) \propto \exp\left( -\frac{1}{2\sigma^2} \|e\|^2 \right), $$
where $\sigma^2$ is assumed known. Likelihood density
$$ \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 \right). $$

Prior according to the information available

Prior information: The signal $f$ varies rather slowly, the slope being not more than of the order one, and vanishes at $t = 0$. Write an autoregressive Markov model (ARMA):
$$ X_j = X_{j-1} + \sqrt{\theta}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0. $$

How do we choose the variance $\theta$ of the innovation process? Use the slope information: the slope is
$$ S_j = n(X_j - X_{j-1}) = n\sqrt{\theta}\, W_j, $$
and
$$ P\big\{ |S_j| < 2n\sqrt{\theta} \big\} \approx 0.95, \qquad 2n\sqrt{\theta} = \text{two standard deviations}. $$
A reasonable choice would then be $2n\sqrt{\theta} = 1$, or $\theta = \dfrac{1}{4n^2}$.

The ARMA model in matrix form: $X_j - X_{j-1} = \sqrt{\theta}\, W_j$, $1 \le j \le n$, gives
$$ LX = \sqrt{\theta}\, W, \qquad W \sim \mathcal N(0, I), \qquad L = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}. $$
Prior density: $(1/\sqrt{\theta})\, LX = W$, so
$$ \pi_{\rm prior}(x) \propto \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right), \qquad \theta = \frac{1}{4n^2}. $$
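The first-difference matrix $L$ and the variance suggested by the slope argument can be set up in a couple of lines; a sketch, assuming the same grid size as in the earlier snippet:

```matlab
% Sketch: first-difference matrix L (1 on the diagonal, -1 on the subdiagonal).
n     = 160;                                       % illustrative size
L     = spdiags([-ones(n,1) ones(n,1)], [-1 0], n, n);
theta = 1/(4*n^2);                                 % prior variance from 2*n*sqrt(theta) = 1
```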

Posterior density

From Bayes' formula, the posterior density is
$$ \pi_{\rm post}(x) = \pi(x \mid b) \propto \pi_{\rm prior}(x)\, \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2\theta} \|Lx\|^2 \right). $$
This is a Gaussian density, whose maximum is the Maximum A Posteriori (MAP) estimate,
$$ x_{\rm MAP} = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta} \|Lx\|^2 \right\}, $$
or, equivalently, the least squares solution of the system
$$ \begin{bmatrix} \sigma^{-1} A \\ \theta^{-1/2} L \end{bmatrix} x = \begin{bmatrix} \sigma^{-1} b \\ 0 \end{bmatrix}. $$
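Numerically, the MAP estimate is just the solution of the stacked least squares system. A Matlab sketch, assuming A, b, L, sigma and theta are already in the workspace (for instance from the snippets above):

```matlab
% Sketch: MAP estimate as the least squares solution of the stacked system.
% Assumes A (m x n), b (m x 1), L (n x n), sigma and theta are defined.
K     = [A/sigma; full(L)/sqrt(theta)];        % stacked coefficient matrix
rhs   = [b/sigma; zeros(size(L,1),1)];         % stacked right-hand side
x_map = K \ rhs;                               % least squares solution = x_MAP
```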

Posterior covariance: by completing the square,
$$ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta} \|Lx\|^2 = \frac{1}{2}\left( x^{\mathsf T} \underbrace{\left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)}_{=\,\Gamma^{-1}} x - 2\, x^{\mathsf T} \frac{1}{\sigma^2} A^{\mathsf T} b + \frac{1}{\sigma^2} \|b\|^2 \right) $$
$$ = \frac{1}{2} \left( x - \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b \right)^{\mathsf T} \Gamma^{-1} \left( x - \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b \right) + \text{terms independent of } x. $$
Denote
$$ \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b. $$

The posterior density is
$$ \pi_{\rm post}(x) \propto \exp\left( -\frac{1}{2} (x - \bar x)^{\mathsf T} \Gamma^{-1} (x - \bar x) \right), $$
or, in other words,
$$ \pi_{\rm post} = \mathcal N(\bar x, \Gamma), \qquad \Gamma = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1}, $$
$$ \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b = x_{\rm MAP}. $$

Example

Gaussian convolution kernel,
$$ a(t) = \frac{1}{\sqrt{2\pi\omega^2}} \exp\left( -\frac{t^2}{2\omega^2} \right), \qquad \omega = 0.05, $$
number of discretization points $n = 160$. See the attached Matlab file Deconvloution.m.

Test 1: Input function a triangular pulse,
$$ f(t) = \begin{cases} \alpha t, & t \le t_0 = 0.7, \\ \beta (1 - t), & t > t_0, \end{cases} $$
where $\alpha = 0.6$, $\beta = \alpha\, t_0/(1 - t_0) = 1.4 > 1$. Hence, the prior information about the slopes is slightly too conservative.

Posterior credibility envelope

Plotting the MAP estimate and the 95% credibility envelope: if $X$ is distributed according to the Gaussian posterior density, $\operatorname{var}(X_j) = \Gamma_{jj}$. With 95% a posteriori probability,
$$ (\bar x)_j - 2\sqrt{\Gamma_{jj}} < X_j < (\bar x)_j + 2\sqrt{\Gamma_{jj}}. $$
Try the algorithm with various values of $\theta$.
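A sketch of how the envelope can be computed and plotted in Matlab, assuming A, b, L, sigma, theta and x_map are available from the previous snippets (for this problem size the explicit inverse is affordable):

```matlab
% Sketch: posterior covariance and 95% credibility envelope around the MAP estimate.
Gamma   = inv(A'*A/sigma^2 + full(L'*L)/theta);  % posterior covariance
sd      = sqrt(diag(Gamma));                     % pointwise posterior standard deviations
t_nodes = (1:length(x_map))'/length(x_map);
plot(t_nodes, x_map, 'k-', ...
     t_nodes, x_map + 2*sd, 'r--', ...
     t_nodes, x_map - 2*sd, 'r--');              % MAP = posterior mean for a Gaussian
legend('MAP estimate', 'upper 95% bound', 'lower 95% bound');
```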

Test 2: Input function whose slope is locally larger than one:
$$ f(t) = (t - 0.5) \exp\left( -\frac{(t - 0.5)^2}{2\delta^2} \right), \qquad \delta^2 = 0.01. $$
The slope in the interval $[0.4, 0.6]$ is much larger than one, while outside the interval it is very small. The algorithm does not perform well.

Assume that we, from some source, have the following prior information:

Prior information: The signal is almost constant (the slope of the order 0.1, say) outside the interval $[0.4, 0.6]$, while within this interval the slope can be of the order 10.

Non-stationary ARMA model

Write an autoregressive Markov model (ARMA):
$$ X_j = X_{j-1} + \sqrt{\theta_j}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0, $$
i.e., the variance of the innovation process varies. How do we choose the variances? Slope information: the slope is
$$ S_j = n(X_j - X_{j-1}) = n\sqrt{\theta_j}\, W_j, $$
and
$$ P\big\{ |S_j| < 2n\sqrt{\theta_j} \big\} \approx 0.95, \qquad 2n\sqrt{\theta_j} = \text{two standard deviations}. $$
A reasonable choice would then be
$$ 2n\sqrt{\theta_j} = \begin{cases} 10, & 0.4 < t_j < 0.6, \\ 0.1, & \text{else.} \end{cases} $$

Define the variance vector $\theta \in \mathbb R^n$,
$$ \theta_j = \begin{cases} \dfrac{25}{n^2}, & \text{if } 0.4 < t_j < 0.6, \\[1mm] \dfrac{1}{400\, n^2}, & \text{else.} \end{cases} $$
The ARMA model in matrix form:
$$ D^{-1/2} L X = W \sim \mathcal N(0, I), \qquad D = \operatorname{diag}(\theta). $$
This leads to the prior model
$$ \pi_{\rm prior}(x) \propto \exp\left( -\frac{1}{2} \|D^{-1/2} L x\|^2 \right). $$
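A sketch of the non-stationary prior in Matlab; it rebuilds the grid and L from the earlier snippets, and the two variance levels mirror the (reconstructed) values above:

```matlab
% Sketch: componentwise variances theta_j and the whitened prior matrix D^{-1/2} L.
n       = 160;
t_nodes = (1:n)'/n;
L       = spdiags([-ones(n,1) ones(n,1)], [-1 0], n, n);
theta   = ones(n,1)/(400*n^2);                   % slow variation outside the interval
theta(t_nodes > 0.4 & t_nodes < 0.6) = 25/n^2;   % steep slopes allowed inside (0.4, 0.6)
Lw      = spdiags(1./sqrt(theta), 0, n, n) * L;  % D^{-1/2} L, used in place of L/sqrt(theta)
```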

Posterior density

From Bayes' formula, the posterior density is
$$ \pi_{\rm post}(x) = \pi(x \mid b) \propto \pi_{\rm prior}(x)\, \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2} \|D^{-1/2} L x\|^2 \right). $$
This is a Gaussian density, whose maximum is the Maximum A Posteriori (MAP) estimate,
$$ x_{\rm MAP} = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 \right\}, $$
or, equivalently, the least squares solution of the system
$$ \begin{bmatrix} \sigma^{-1} A \\ D^{-1/2} L \end{bmatrix} x = \begin{bmatrix} \sigma^{-1} b \\ 0 \end{bmatrix}. $$

Gaussian posterior density

Posterior mean and covariance:
$$ \pi_{\rm post} = \mathcal N(\bar x, \Gamma), $$
where
$$ \Gamma = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + L^{\mathsf T} D^{-1} L \right)^{-1}, \qquad \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + L^{\mathsf T} D^{-1} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b = x_{\rm MAP}. $$

Test 2 continued: Use the prior based on the non-stationary ARMA model. Try different values for the variance in the interval. Observe the effect on the posterior covariance envelopes.

Test 3: Go back to the stationary ARMA model, and assume this time that
$$ f(t) = \begin{cases} 1, & \tau_1 < t < \tau_2, \\ 0, & \text{else,} \end{cases} $$
where $\tau_1 = 5/16$, $\tau_2 = 9/16$. Observe that the prior is badly in conflict with reality. Notice that the posterior envelopes are very tight, but the true signal is not within them. Is it wrong to say: the $p\%$ credibility envelopes contain the true signal with probability $p\%$?

Smoothness prior and discontinuities

Assume that the prior information is:

Prior information: The signal is almost constant (the slope of the order 0.1, say), but it suffers a jump of order one somewhere around $t_{50} = 5/16$ and $t_{90} = 9/16$.

Build an ARMA model based on this information:
$$ \theta_j = \frac{1}{400\, n^2}, \quad \text{except around } j = 50 \text{ and } j = 90, $$
and adjust the variances around the jumps to correspond to the jump amplitude, e.g., $\theta_j = 1/4$, corresponding to standard deviation $\sqrt{\theta_j} = 0.5$. Notice the large posterior variance around the jumps!

Prior based on incorrect information

Note: I avoid the term incorrect prior: the prior is what we believe a priori; it is not right or wrong, but evidence may turn out to speak against it or to support it. In simulations, it is easy to judge a prior incorrect, because we control the truth.

Test 4: What happens if we have incorrect information concerning the jumps? We believe that there is a third jump, which does not exist in the input; or the prior information concerning the jump locations is completely wrong.

A step up: unknown variance

Going back to the original problem, but with less prior information:

Prior information: The signal $f$ varies rather slowly, and vanishes at $t = 0$, but no particular information about the slope is available.

As before, start with the initial ARMA model:
$$ X_j = X_{j-1} + \sqrt{\theta}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0, $$
or, in matrix form,
$$ LX = \sqrt{\theta}\, W, \qquad W \sim \mathcal N(0, I). $$
As before, we write the prior, pretending that we knew the variance,
$$ \pi_{\rm prior}(x \mid \theta) = C \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right). $$

Normalizing constant: the integral of the density has to be one,
$$ 1 = C(\theta) \int_{\mathbb R^n} \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right) dx, $$
and with the change of variables $x = \sqrt{\theta}\, z$, $dx = \theta^{n/2}\, dz$, we obtain
$$ 1 = \theta^{n/2}\, C(\theta) \underbrace{\int_{\mathbb R^n} \exp\left( -\frac{1}{2} \|Lz\|^2 \right) dz}_{\text{independent of } \theta}, $$
so we deduce that $C(\theta) \propto \theta^{-n/2}$, and we may write
$$ \pi_{\rm prior}(x \mid \theta) \propto \exp\left( -\frac{1}{2\theta} \|Lx\|^2 - \frac{n}{2} \log \theta \right). $$

Stochastic extension

Since $\theta$ is not known, it will be treated as a random variable $\Theta$. Any information concerning $\Theta$ is then coded into a prior probability density called the hyperprior, $\pi_{\rm hyper}(\theta)$. The inverse problem is to infer on the pair of unknowns $(X, \Theta)$. The joint prior density is
$$ \pi_{\rm prior}(x, \theta) = \pi_{\rm prior}(x \mid \theta)\, \pi_{\rm hyper}(\theta). $$
The posterior density of $(X, \Theta)$ is, by Bayes' formula,
$$ \pi_{\rm post}(x, \theta) \propto \pi_{\rm prior}(x, \theta)\, \pi(b \mid x) = \pi_{\rm prior}(x \mid \theta)\, \pi_{\rm hyper}(\theta)\, \pi(b \mid x). $$

Here, we assume that the only prior information concerning $\Theta$ is its positivity,
$$ \pi_{\rm hyper}(\theta) = \pi_+(\theta) = \begin{cases} 1, & \theta > 0, \\ 0, & \theta \le 0. \end{cases} $$
Note: this is, in fact, an improper density, since it is not integrable. In practice, we assume an upper bound that we hope will never play a role. The posterior density is now
$$ \pi_{\rm post}(x, \theta) \propto \pi_+(\theta) \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2\theta} \|Lx\|^2 - \frac{n}{2} \log \theta \right). $$

Sequential optimization: iterative algorithm for the MAP estimate.

1. Initialize $\theta = \theta^0 > 0$, $k = 1$.
2. Update $x$: $x^k = \operatorname{argmax} \{ \pi_{\rm post}(x \mid \theta^{k-1}) \}$.
3. Update $\theta$: $\theta^k = \operatorname{argmax} \{ \pi_{\rm post}(\theta \mid x^k) \}$.
4. Increase $k$ by one and repeat from 2. until convergence.

Notice: $\pi_{\rm post}(x \mid \theta)$ means that we simply fix $\theta$.

Updating steps in practice:

Updating $x$: since $\theta = \theta^{k-1}$ is fixed, we simply have
$$ x^k = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta^{k-1}} \|Lx\|^2 \right\}, $$
which is the Good Old Least Squares Solution (GOLSQ).

Updating $\theta$: the likelihood does not depend on $\theta$, and so we have
$$ \theta^k = \operatorname{argmin} \left\{ \frac{1}{2\theta} \|Lx^k\|^2 + \frac{n}{2} \log \theta \right\}, $$
which is the zero of the derivative of the objective function,
$$ -\frac{1}{2\theta^2} \|Lx^k\|^2 + \frac{n}{2\theta} = 0, \qquad \theta^k = \frac{\|Lx^k\|^2}{n}. $$
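A Matlab sketch of this alternating scheme, assuming A, b, L and sigma are available from the earlier snippets; the initial value and the fixed iteration count are illustrative (in practice one would monitor convergence):

```matlab
% Sketch: alternating MAP updates for (x, theta) with a scalar unknown variance.
% Assumes A (m x n), b, L (n x n) and sigma are defined.
n     = size(A, 2);
theta = 1;                                    % theta^0 > 0, arbitrary starting value
for k = 1:20
    K     = [A/sigma; full(L)/sqrt(theta)];   % x-update: least squares with current theta
    x     = K \ [b/sigma; zeros(n,1)];
    theta = norm(L*x)^2 / n;                  % theta-update: theta^k = ||L x^k||^2 / n
end
```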

Undetermined variances

Qualitative description of the problem: given a noisy indirect observation, recover a signal or an image that varies slowly except for an unknown number of jumps of unknown size and location.

Information concerning the jumps:
The jumps should be sudden, suggesting that the variances should be mutually independent.
There is no obvious preference of one location over others, therefore the components should be identically distributed.
Only a few variances can be significantly large, while most of them should be small, suggesting a hyperprior that allows rare outliers.

Candidate probability densities:

Gamma distribution, $\theta_j \sim \operatorname{Gamma}(\alpha, \theta_0)$,
$$ \pi_{\rm hyper}(\theta) \propto \prod_{j=1}^n \theta_j^{\alpha - 1} \exp\left( -\frac{\theta_j}{\theta_0} \right), \qquad (1) $$
or inverse gamma distribution, $\theta_j \sim \operatorname{InvGamma}(\alpha, \theta_0)$,
$$ \pi_{\rm hyper}(\theta) \propto \prod_{j=1}^n \theta_j^{-\alpha - 1} \exp\left( -\frac{\theta_0}{\theta_j} \right). \qquad (2) $$
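The figure on the next slide presumably shows random realizations of such priors. A sketch of how draws from the hierarchical prior with the gamma hyperprior could be generated in Matlab (gamrnd is from the Statistics and Machine Learning Toolbox; the parameter values are illustrative):

```matlab
% Sketch: random draws from the hierarchical prior, gamma hyperprior case.
% theta_j ~ Gamma(alpha, theta0), then X_j = X_{j-1} + sqrt(theta_j)*W_j with X_0 = 0.
n      = 160;
alpha  = 2;                                      % illustrative shape parameter
theta0 = 1e-4;                                   % illustrative scale parameter
ndraws = 5;
X = zeros(n, ndraws);
for i = 1:ndraws
    theta  = gamrnd(alpha, theta0, n, 1);        % componentwise variances
    X(:,i) = cumsum(sqrt(theta) .* randn(n,1));  % random walk with varying innovation std
end
plot(X);                                         % a few realizations of the prior
```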

Random draws

[Figure: two panels of random draws.]

Estimating the MAP

Outline of the algorithm:

1. Initialize $\theta = \theta^0$, $k = 1$.
2. Update the estimate of the increments: $x^k = \operatorname{argmax}_x\, \pi(x, \theta^{k-1} \mid b)$.
3. Update the estimate of the variances $\theta$: $\theta^k = \operatorname{argmax}_\theta\, \pi(x^k, \theta \mid b)$.
4. Increase $k$ by one and repeat from 2. until convergence.

In practice:
$$ F(x, \theta \mid b) = -\log\big( \pi(x, \theta \mid b) \big), $$
which becomes (up to additive constants), when the hyperprior is the gamma distribution,
$$ F(x, \theta \mid b) = \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 + \frac{1}{\theta_0} \sum_{j=1}^n \theta_j - \left( \alpha - \frac{3}{2} \right) \sum_{j=1}^n \log \theta_j, $$
and, when the hyperprior is the inverse gamma distribution,
$$ F(x, \theta \mid b) = \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 + \theta_0 \sum_{j=1}^n \frac{1}{\theta_j} + \left( \alpha + \frac{3}{2} \right) \sum_{j=1}^n \log \theta_j, $$
where $D = D_\theta = \operatorname{diag}(\theta)$.

1. Updating $x$:
$$ x^k = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 \right\}, \qquad D = D_{\theta^{k-1}}, $$
that is, $x^k$ is the least squares solution of the system
$$ \begin{bmatrix} (1/\sigma) A \\ D^{-1/2} L \end{bmatrix} x = \begin{bmatrix} (1/\sigma) b \\ 0 \end{bmatrix}. $$

2. Updating $\theta$ (gamma hyperprior): $\theta_j^k$ satisfies
$$ \frac{\partial F}{\partial \theta_j}(x^k, \theta) = -\frac{1}{2} \left( \frac{z_j^k}{\theta_j} \right)^2 + \frac{1}{\theta_0} - \frac{\alpha - 3/2}{\theta_j} = 0, \qquad z^k = L x^k, $$
which has an explicit solution,
$$ \theta_j^k = \theta_0 \left( \eta + \sqrt{ \frac{(z_j^k)^2}{2\theta_0} + \eta^2 } \right), \qquad \eta = \frac{1}{2}\left( \alpha - \frac{3}{2} \right). $$
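A Matlab sketch of the resulting iteration for the gamma hyperprior, assuming A, b, L and sigma are defined as before; alpha, theta0 and the iteration count are illustrative choices:

```matlab
% Sketch: alternating updates with componentwise variances, gamma hyperprior.
n      = size(A, 2);
alpha  = 2;  theta0 = 1e-4;  niter = 30;         % illustrative hyperparameters
eta    = (alpha - 3/2)/2;
theta  = theta0 * ones(n, 1);                    % initial variances
for k = 1:niter
    Ds    = spdiags(1./sqrt(theta), 0, n, n);    % D^{-1/2}
    K     = [A/sigma; Ds*full(L)];               % x-update: stacked least squares system
    x     = K \ [b/sigma; zeros(n,1)];
    z     = L*x;                                 % increments z = L x
    theta = theta0*(eta + sqrt(z.^2/(2*theta0) + eta^2));   % explicit theta-update
end
```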

Computed example: signal and data

[Figure: the true signal and the blurred, noisy data. Gaussian blur, noise level 2%.]

MAP estimate, Gamma hyperprior

[Figure: two panels; axes labelled Iteration and t.]

MAP estimate, Gamma hyperprior

[Figure: two panels.]

MAP estimate, Inverse Gamma hyperprior

[Figure: two panels; axes labelled Iteration and t.]

MAP estimate, Inverse Gamma hyperprior

[Figure: two panels.]