An example of Bayesian reasoning

Consider the one-dimensional deconvolution problem with various degrees of prior information.


Model:
$$ g(t) = \int a(t-s)\, f(s)\, ds + e(t), \qquad a(t) \to 0 \ \text{as} \ |t| \to \infty \ \text{(rapidly)}. $$
The problem, even without discretization errors, is ill-posed.

Analytic solution

Noiseless model:
$$ g(t) = \int a(t-s)\, f(s)\, ds. $$
Apply the Fourier transform,
$$ \hat g(k) = \int e^{-ikt}\, g(t)\, dt. $$
The Convolution Theorem implies
$$ \hat g(k) = \hat a(k)\, \hat f(k), $$
so, by inverse Fourier transform,
$$ f(t) = \frac{1}{2\pi} \int e^{ikt}\, \frac{\hat g(k)}{\hat a(k)}\, dk. $$

Example: Gaussian kernel, noisy data:
$$ a(t) = \frac{1}{\sqrt{2\pi\omega^2}} \exp\left( -\frac{t^2}{2\omega^2} \right). $$
In the frequency domain,
$$ \hat a(k) = \exp\left( -\frac{\omega^2 k^2}{2} \right). $$
If $\hat g(k) = \hat a(k)\hat f(k) + \hat e(k)$, the estimate $f_{\rm est}(t)$ of $f$ based on the Convolution Theorem is
$$ f_{\rm est}(t) = f(t) + \frac{1}{2\pi} \int \hat e(k) \exp\left( \frac{\omega^2 k^2}{2} + ikt \right) dk, $$
which may not even be well defined unless $\hat e(k)$ decays superexponentially at high frequencies.

Finite sampling, finite interval: discretized problem
$$ g(t_j) = \int a(t_j - s)\, f(s)\, ds + e_j, \qquad 1 \le j \le m. $$
Discrete approximation of the integral by a quadrature rule: use the piecewise constant approximation with uniformly distributed sampling,
$$ \int a(t_j - s)\, f(s)\, ds \approx \frac{1}{n} \sum_{k=1}^n a(t_j - s_k)\, f(s_k) = \sum_{k=1}^n a_{jk}\, x_k, $$
where $s_k = k/n$, $x_k = f(s_k)$, $a_{jk} = \frac{1}{n}\, a(t_j - s_k)$.
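This discretization is straightforward to set up in code. Below is a minimal Matlab sketch (not the attached Deconvloution.m file) that builds the matrix $A$ for a Gaussian kernel on the unit interval; the grid size, the choice $m = n$ and the kernel width are illustrative assumptions, not prescribed by the slides.

```matlab
% Sketch: build the discretized convolution matrix, A(j,k) = a(t_j - s_k)/n.
% Assumptions: unit interval, t_j = j/m with m = n, Gaussian kernel of width omega.
n     = 160;                  % number of unknowns (illustrative)
m     = n;                    % number of data points, taken equal to n here
omega = 0.05;                 % kernel width (illustrative)
s     = (1:n)'/n;             % quadrature nodes s_k = k/n
t     = (1:m)'/m;             % sampling points t_j
a     = @(t) exp(-t.^2/(2*omega^2))/sqrt(2*pi*omega^2);  % Gaussian kernel a(t)
A     = a(t - s')/n;          % implicit expansion gives the m-by-n matrix
```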

Discrete equation, additive Gaussian noise, likelihood

$$ b = Ax + e, \qquad b_j = g(t_j). $$
Stochastic extension: $X$, $E$ and $B$ random variables,
$$ B = AX + E. $$
Assume that $E$ is Gaussian white noise with variance $\sigma^2$,
$$ E \sim \mathcal N(0, \sigma^2 I), \quad \text{or} \quad \pi_{\rm noise}(e) \propto \exp\left( -\frac{1}{2\sigma^2} \|e\|^2 \right), $$
where $\sigma^2$ is assumed known. Likelihood density
$$ \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 \right). $$

Prior according to the information available

Prior information: The signal $f$ varies rather slowly, the slope being not more than of the order one, and vanishes at $t = 0$. Write an autoregressive Markov model (ARMA):
$$ X_j = X_{j-1} + \sqrt{\theta}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0. $$

How do we choose the variance $\theta$ of the innovation process? Use the slope information: the slope is
$$ S_j = n(X_j - X_{j-1}) = n\sqrt{\theta}\, W_j, $$
and
$$ P\big\{ |S_j| < 2n\sqrt{\theta} \big\} \approx 0.95, \qquad 2n\sqrt{\theta} = \text{two standard deviations}. $$
A reasonable choice would then be $2n\sqrt{\theta} = 1$, or $\theta = \dfrac{1}{4n^2}$.

The ARMA model in matrix form: $X_j - X_{j-1} = \sqrt{\theta}\, W_j$, $1 \le j \le n$, gives
$$ LX = \sqrt{\theta}\, W, \qquad W \sim \mathcal N(0, I), \qquad L = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}. $$
Prior density: $(1/\sqrt{\theta})\, LX = W$, so
$$ \pi_{\rm prior}(x) \propto \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right), \qquad \theta = \frac{1}{4n^2}. $$
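The first-difference matrix $L$ and the variance suggested by the slope argument can be set up in a couple of lines; a sketch, assuming the same grid size as in the earlier snippet:

```matlab
% Sketch: first-difference matrix L (1 on the diagonal, -1 on the subdiagonal).
n     = 160;                                       % illustrative size
L     = spdiags([-ones(n,1) ones(n,1)], [-1 0], n, n);
theta = 1/(4*n^2);                                 % prior variance from 2*n*sqrt(theta) = 1
```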

Posterior density

From Bayes' formula, the posterior density is
$$ \pi_{\rm post}(x) = \pi(x \mid b) \propto \pi_{\rm prior}(x)\, \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2\theta} \|Lx\|^2 \right). $$
This is a Gaussian density, whose maximum is the Maximum A Posteriori (MAP) estimate,
$$ x_{\rm MAP} = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta} \|Lx\|^2 \right\}, $$
or, equivalently, the least squares solution of the system
$$ \begin{bmatrix} \sigma^{-1} A \\ \theta^{-1/2} L \end{bmatrix} x = \begin{bmatrix} \sigma^{-1} b \\ 0 \end{bmatrix}. $$
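Numerically, the MAP estimate is just the solution of the stacked least squares system. A Matlab sketch, assuming A, b, L, sigma and theta are already in the workspace (for instance from the snippets above):

```matlab
% Sketch: MAP estimate as the least squares solution of the stacked system.
% Assumes A (m x n), b (m x 1), L (n x n), sigma and theta are defined.
K     = [A/sigma; full(L)/sqrt(theta)];        % stacked coefficient matrix
rhs   = [b/sigma; zeros(size(L,1),1)];         % stacked right-hand side
x_map = K \ rhs;                               % least squares solution = x_MAP
```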

Posterior covariance: by completing the square,
$$ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta} \|Lx\|^2 = \frac{1}{2}\left( x^{\mathsf T} \underbrace{\left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)}_{=\,\Gamma^{-1}} x - 2\, x^{\mathsf T} \frac{1}{\sigma^2} A^{\mathsf T} b + \frac{1}{\sigma^2} \|b\|^2 \right) $$
$$ = \frac{1}{2} \left( x - \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b \right)^{\mathsf T} \Gamma^{-1} \left( x - \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b \right) + \text{terms independent of } x. $$
Denote
$$ \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b. $$

The posterior density is
$$ \pi_{\rm post}(x) \propto \exp\left( -\frac{1}{2} (x - \bar x)^{\mathsf T} \Gamma^{-1} (x - \bar x) \right), $$
or, in other words,
$$ \pi_{\rm post} = \mathcal N(\bar x, \Gamma), \qquad \Gamma = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1}, $$
$$ \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + \frac{1}{\theta} L^{\mathsf T} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b = x_{\rm MAP}. $$

Example

Gaussian convolution kernel,
$$ a(t) = \frac{1}{\sqrt{2\pi\omega^2}} \exp\left( -\frac{t^2}{2\omega^2} \right), \qquad \omega = 0.05, $$
number of discretization points $n = 160$. See the attached Matlab file Deconvloution.m.

Test 1: Input function a triangular pulse,
$$ f(t) = \begin{cases} \alpha t, & t \le t_0 = 0.7, \\ \beta (1 - t), & t > t_0, \end{cases} $$
where $\alpha = 0.6$, $\beta = \alpha\, t_0/(1 - t_0) = 1.4 > 1$. Hence, the prior information about the slopes is slightly too conservative.

Posterior credibility envelope

Plotting the MAP estimate and the 95% credibility envelope: if $X$ is distributed according to the Gaussian posterior density, $\operatorname{var}(X_j) = \Gamma_{jj}$. With 95% a posteriori probability,
$$ (\bar x)_j - 2\sqrt{\Gamma_{jj}} < X_j < (\bar x)_j + 2\sqrt{\Gamma_{jj}}. $$
Try the algorithm with various values of $\theta$.
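A sketch of how the envelope can be computed and plotted in Matlab, assuming A, b, L, sigma, theta and x_map are available from the previous snippets (for this problem size the explicit inverse is affordable):

```matlab
% Sketch: posterior covariance and 95% credibility envelope around the MAP estimate.
Gamma   = inv(A'*A/sigma^2 + full(L'*L)/theta);  % posterior covariance
sd      = sqrt(diag(Gamma));                     % pointwise posterior standard deviations
t_nodes = (1:length(x_map))'/length(x_map);
plot(t_nodes, x_map, 'k-', ...
     t_nodes, x_map + 2*sd, 'r--', ...
     t_nodes, x_map - 2*sd, 'r--');              % MAP = posterior mean for a Gaussian
legend('MAP estimate', 'upper 95% bound', 'lower 95% bound');
```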

Test 2: Input function whose slope is locally larger than one:
$$ f(t) = (t - 0.5) \exp\left( -\frac{(t - 0.5)^2}{2\delta^2} \right), \qquad \delta^2 = 0.01. $$
The slope in the interval $[0.4, 0.6]$ is much larger than one, while outside the interval it is very small. The algorithm does not perform well.

Assume that we, from some source, have the following prior information:

Prior information: The signal is almost constant (the slope of the order 0.1, say) outside the interval $[0.4, 0.6]$, while within this interval the slope can be of the order 10.

Non-stationary ARMA model

Write an autoregressive Markov model (ARMA):
$$ X_j = X_{j-1} + \sqrt{\theta_j}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0, $$
i.e., the variance of the innovation process varies. How do we choose the variances? Slope information: the slope is
$$ S_j = n(X_j - X_{j-1}) = n\sqrt{\theta_j}\, W_j, $$
and
$$ P\big\{ |S_j| < 2n\sqrt{\theta_j} \big\} \approx 0.95, \qquad 2n\sqrt{\theta_j} = \text{two standard deviations}. $$
A reasonable choice would then be
$$ 2n\sqrt{\theta_j} = \begin{cases} 10, & 0.4 < t_j < 0.6, \\ 0.1, & \text{else.} \end{cases} $$

Define the variance vector $\theta \in \mathbb R^n$,
$$ \theta_j = \begin{cases} \dfrac{25}{n^2}, & \text{if } 0.4 < t_j < 0.6, \\[1mm] \dfrac{1}{400\, n^2}, & \text{else.} \end{cases} $$
The ARMA model in matrix form:
$$ D^{-1/2} L X = W \sim \mathcal N(0, I), \qquad D = \operatorname{diag}(\theta). $$
This leads to the prior model
$$ \pi_{\rm prior}(x) \propto \exp\left( -\frac{1}{2} \|D^{-1/2} L x\|^2 \right). $$
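A sketch of the non-stationary prior in Matlab; it rebuilds the grid and L from the earlier snippets, and the two variance levels mirror the (reconstructed) values above:

```matlab
% Sketch: componentwise variances theta_j and the whitened prior matrix D^{-1/2} L.
n       = 160;
t_nodes = (1:n)'/n;
L       = spdiags([-ones(n,1) ones(n,1)], [-1 0], n, n);
theta   = ones(n,1)/(400*n^2);                   % slow variation outside the interval
theta(t_nodes > 0.4 & t_nodes < 0.6) = 25/n^2;   % steep slopes allowed inside (0.4, 0.6)
Lw      = spdiags(1./sqrt(theta), 0, n, n) * L;  % D^{-1/2} L, used in place of L/sqrt(theta)
```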

Posterior density

From Bayes' formula, the posterior density is
$$ \pi_{\rm post}(x) = \pi(x \mid b) \propto \pi_{\rm prior}(x)\, \pi(b \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2} \|D^{-1/2} L x\|^2 \right). $$
This is a Gaussian density, whose maximum is the Maximum A Posteriori (MAP) estimate,
$$ x_{\rm MAP} = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 \right\}, $$
or, equivalently, the least squares solution of the system
$$ \begin{bmatrix} \sigma^{-1} A \\ D^{-1/2} L \end{bmatrix} x = \begin{bmatrix} \sigma^{-1} b \\ 0 \end{bmatrix}. $$

Gaussian posterior density

Posterior mean and covariance:
$$ \pi_{\rm post} = \mathcal N(\bar x, \Gamma), $$
where
$$ \Gamma = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + L^{\mathsf T} D^{-1} L \right)^{-1}, \qquad \bar x = \Gamma \frac{1}{\sigma^2} A^{\mathsf T} b = \left( \frac{1}{\sigma^2} A^{\mathsf T} A + L^{\mathsf T} D^{-1} L \right)^{-1} \frac{1}{\sigma^2} A^{\mathsf T} b = x_{\rm MAP}. $$

Test 2 continued: Use the prior based on the non-stationary ARMA model. Try different values for the variance in the interval. Observe the effect on the posterior covariance envelopes.

Test 3: Go back to the stationary ARMA model, and assume this time that
$$ f(t) = \begin{cases} 1, & \tau_1 < t < \tau_2, \\ 0, & \text{else,} \end{cases} $$
where $\tau_1 = 5/16$, $\tau_2 = 9/16$. Observe that the prior is badly in conflict with reality. Notice that the posterior envelopes are very tight, but the true signal is not within them. Is it wrong to say: the $p\%$ credibility envelopes contain the true signal with probability $p\%$?

Smoothness prior and discontinuities

Assume that the prior information is:

Prior information: The signal is almost constant (the slope of the order 0.1, say), but it suffers a jump of order one somewhere around $t_{50} = 5/16$ and $t_{90} = 9/16$.

Build an ARMA model based on this information:
$$ \theta_j = \frac{1}{400\, n^2}, \quad \text{except around } j = 50 \text{ and } j = 90, $$
and adjust the variances around the jumps to correspond to the jump amplitude, e.g., $\theta_j = 1/4$, corresponding to standard deviation $\sqrt{\theta_j} = 0.5$. Notice the large posterior variance around the jumps!

Prior based on incorrect information

Note: I avoid the term incorrect prior: the prior is what we believe a priori; it is not right or wrong, but evidence may turn out to speak against it or to support it. In simulations, it is easy to judge a prior incorrect, because we control the truth.

Test 4: What happens if we have incorrect information concerning the jumps? We believe that there is a third jump, which does not exist in the input; or the prior information concerning the jump locations is completely wrong.

A step up: unknown variance

Going back to the original problem, but with less prior information:

Prior information: The signal $f$ varies rather slowly, and vanishes at $t = 0$, but no particular information about the slope is available.

As before, start with the initial ARMA model:
$$ X_j = X_{j-1} + \sqrt{\theta}\, W_j, \qquad W_j \sim \mathcal N(0, 1), \quad X_0 = 0, $$
or, in matrix form,
$$ LX = \sqrt{\theta}\, W, \qquad W \sim \mathcal N(0, I). $$
As before, we write the prior, pretending that we knew the variance,
$$ \pi_{\rm prior}(x \mid \theta) = C \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right). $$

Normalizing constant: the integral of the density has to be one,
$$ 1 = C(\theta) \int_{\mathbb R^n} \exp\left( -\frac{1}{2\theta} \|Lx\|^2 \right) dx, $$
and with the change of variables $x = \sqrt{\theta}\, z$, $dx = \theta^{n/2}\, dz$, we obtain
$$ 1 = \theta^{n/2}\, C(\theta) \underbrace{\int_{\mathbb R^n} \exp\left( -\frac{1}{2} \|Lz\|^2 \right) dz}_{\text{independent of } \theta}, $$
so we deduce that $C(\theta) \propto \theta^{-n/2}$, and we may write
$$ \pi_{\rm prior}(x \mid \theta) \propto \exp\left( -\frac{1}{2\theta} \|Lx\|^2 - \frac{n}{2} \log \theta \right). $$

Stochastic extension

Since $\theta$ is not known, it will be treated as a random variable $\Theta$. Any information concerning $\Theta$ is then coded into a prior probability density called the hyperprior, $\pi_{\rm hyper}(\theta)$. The inverse problem is to infer on the pair of unknowns $(X, \Theta)$. The joint prior density is
$$ \pi_{\rm prior}(x, \theta) = \pi_{\rm prior}(x \mid \theta)\, \pi_{\rm hyper}(\theta). $$
The posterior density of $(X, \Theta)$ is, by Bayes' formula,
$$ \pi_{\rm post}(x, \theta) \propto \pi_{\rm prior}(x, \theta)\, \pi(b \mid x) = \pi_{\rm prior}(x \mid \theta)\, \pi_{\rm hyper}(\theta)\, \pi(b \mid x). $$

Here, we assume that the only prior information concerning $\Theta$ is its positivity,
$$ \pi_{\rm hyper}(\theta) = \pi_+(\theta) = \begin{cases} 1, & \theta > 0, \\ 0, & \theta \le 0. \end{cases} $$
Note: this is, in fact, an improper density, since it is not integrable. In practice, we assume an upper bound that we hope will never play a role. The posterior density is now
$$ \pi_{\rm post}(x, \theta) \propto \pi_+(\theta) \exp\left( -\frac{1}{2\sigma^2} \|b - Ax\|^2 - \frac{1}{2\theta} \|Lx\|^2 - \frac{n}{2} \log \theta \right). $$

Sequential optimization: iterative algorithm for the MAP estimate.

1. Initialize $\theta = \theta^0 > 0$, $k = 1$.
2. Update $x$: $x^k = \operatorname{argmax} \{ \pi_{\rm post}(x \mid \theta^{k-1}) \}$.
3. Update $\theta$: $\theta^k = \operatorname{argmax} \{ \pi_{\rm post}(\theta \mid x^k) \}$.
4. Increase $k$ by one and repeat from 2. until convergence.

Notice: $\pi_{\rm post}(x \mid \theta)$ means that we simply fix $\theta$.

Updating steps in practice:

Updating $x$: since $\theta = \theta^{k-1}$ is fixed, we simply have
$$ x^k = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|b - Ax\|^2 + \frac{1}{2\theta^{k-1}} \|Lx\|^2 \right\}, $$
which is the Good Old Least Squares Solution (GOLSQ).

Updating $\theta$: the likelihood does not depend on $\theta$, and so we have
$$ \theta^k = \operatorname{argmin} \left\{ \frac{1}{2\theta} \|Lx^k\|^2 + \frac{n}{2} \log \theta \right\}, $$
which is the zero of the derivative of the objective function,
$$ -\frac{1}{2\theta^2} \|Lx^k\|^2 + \frac{n}{2\theta} = 0, \qquad \theta^k = \frac{\|Lx^k\|^2}{n}. $$
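A Matlab sketch of this alternating scheme, assuming A, b, L and sigma are available from the earlier snippets; the initial value and the fixed iteration count are illustrative (in practice one would monitor convergence):

```matlab
% Sketch: alternating MAP updates for (x, theta) with a scalar unknown variance.
% Assumes A (m x n), b, L (n x n) and sigma are defined.
n     = size(A, 2);
theta = 1;                                    % theta^0 > 0, arbitrary starting value
for k = 1:20
    K     = [A/sigma; full(L)/sqrt(theta)];   % x-update: least squares with current theta
    x     = K \ [b/sigma; zeros(n,1)];
    theta = norm(L*x)^2 / n;                  % theta-update: theta^k = ||L x^k||^2 / n
end
```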

Undetermined variances

Qualitative description of the problem: given a noisy indirect observation, recover a signal or an image that varies slowly except for an unknown number of jumps of unknown size and location.

Information concerning the jumps:
The jumps should be sudden, suggesting that the variances should be mutually independent.
There is no obvious preference of one location over others, therefore the components should be identically distributed.
Only a few variances can be significantly large, while most of them should be small, suggesting a hyperprior that allows rare outliers.

Candidate probability densities:

Gamma distribution, $\theta_j \sim \operatorname{Gamma}(\alpha, \theta_0)$,
$$ \pi_{\rm hyper}(\theta) \propto \prod_{j=1}^n \theta_j^{\alpha - 1} \exp\left( -\frac{\theta_j}{\theta_0} \right), \qquad (1) $$
or inverse gamma distribution, $\theta_j \sim \operatorname{InvGamma}(\alpha, \theta_0)$,
$$ \pi_{\rm hyper}(\theta) \propto \prod_{j=1}^n \theta_j^{-\alpha - 1} \exp\left( -\frac{\theta_0}{\theta_j} \right). \qquad (2) $$
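The figure on the next slide presumably shows random realizations of such priors. A sketch of how draws from the hierarchical prior with the gamma hyperprior could be generated in Matlab (gamrnd is from the Statistics and Machine Learning Toolbox; the parameter values are illustrative):

```matlab
% Sketch: random draws from the hierarchical prior, gamma hyperprior case.
% theta_j ~ Gamma(alpha, theta0), then X_j = X_{j-1} + sqrt(theta_j)*W_j with X_0 = 0.
n      = 160;
alpha  = 2;                                      % illustrative shape parameter
theta0 = 1e-4;                                   % illustrative scale parameter
ndraws = 5;
X = zeros(n, ndraws);
for i = 1:ndraws
    theta  = gamrnd(alpha, theta0, n, 1);        % componentwise variances
    X(:,i) = cumsum(sqrt(theta) .* randn(n,1));  % random walk with varying innovation std
end
plot(X);                                         % a few realizations of the prior
```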

Random draws

[Figure: two panels of random draws.]

Estimating the MAP

Outline of the algorithm:

1. Initialize $\theta = \theta^0$, $k = 1$.
2. Update the estimate of the increments: $x^k = \operatorname{argmax}_x\, \pi(x, \theta^{k-1} \mid b)$.
3. Update the estimate of the variances $\theta$: $\theta^k = \operatorname{argmax}_\theta\, \pi(x^k, \theta \mid b)$.
4. Increase $k$ by one and repeat from 2. until convergence.

In practice:
$$ F(x, \theta \mid b) = -\log\big( \pi(x, \theta \mid b) \big), $$
which becomes (up to additive constants), when the hyperprior is the gamma distribution,
$$ F(x, \theta \mid b) = \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 + \frac{1}{\theta_0} \sum_{j=1}^n \theta_j - \left( \alpha - \frac{3}{2} \right) \sum_{j=1}^n \log \theta_j, $$
and, when the hyperprior is the inverse gamma distribution,
$$ F(x, \theta \mid b) = \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 + \theta_0 \sum_{j=1}^n \frac{1}{\theta_j} + \left( \alpha + \frac{3}{2} \right) \sum_{j=1}^n \log \theta_j, $$
where $D = D_\theta = \operatorname{diag}(\theta)$.

1. Updating $x$:
$$ x^k = \operatorname{argmin} \left\{ \frac{1}{2\sigma^2} \|Ax - b\|^2 + \frac{1}{2} \|D^{-1/2} L x\|^2 \right\}, \qquad D = D_{\theta^{k-1}}, $$
that is, $x^k$ is the least squares solution of the system
$$ \begin{bmatrix} (1/\sigma) A \\ D^{-1/2} L \end{bmatrix} x = \begin{bmatrix} (1/\sigma) b \\ 0 \end{bmatrix}. $$

2. Updating $\theta$ (gamma hyperprior): $\theta_j^k$ satisfies
$$ \frac{\partial F}{\partial \theta_j}(x^k, \theta) = -\frac{1}{2} \left( \frac{z_j^k}{\theta_j} \right)^2 + \frac{1}{\theta_0} - \frac{\alpha - 3/2}{\theta_j} = 0, \qquad z^k = L x^k, $$
which has an explicit solution,
$$ \theta_j^k = \theta_0 \left( \eta + \sqrt{ \frac{(z_j^k)^2}{2\theta_0} + \eta^2 } \right), \qquad \eta = \frac{1}{2}\left( \alpha - \frac{3}{2} \right). $$
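A Matlab sketch of the resulting iteration for the gamma hyperprior, assuming A, b, L and sigma are defined as before; alpha, theta0 and the iteration count are illustrative choices:

```matlab
% Sketch: alternating updates with componentwise variances, gamma hyperprior.
n      = size(A, 2);
alpha  = 2;  theta0 = 1e-4;  niter = 30;         % illustrative hyperparameters
eta    = (alpha - 3/2)/2;
theta  = theta0 * ones(n, 1);                    % initial variances
for k = 1:niter
    Ds    = spdiags(1./sqrt(theta), 0, n, n);    % D^{-1/2}
    K     = [A/sigma; Ds*full(L)];               % x-update: stacked least squares system
    x     = K \ [b/sigma; zeros(n,1)];
    z     = L*x;                                 % increments z = L x
    theta = theta0*(eta + sqrt(z.^2/(2*theta0) + eta^2));   % explicit theta-update
end
```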

Computed example: signal and data

[Figure: the true signal and the blurred, noisy data. Gaussian blur, noise level 2%.]

MAP estimate, Gamma hyperprior

[Figure: two panels; axes labelled Iteration and t.]

MAP estimate, Gamma hyperprior

[Figure: two panels.]

MAP estimate, Inverse Gamma hyperprior

[Figure: two panels; axes labelled Iteration and t.]

MAP estimate, Inverse Gamma hyperprior

[Figure: two panels.]