Bayesian Paradigm. Maximum A Posteriori Estimation


Simple acquisition model: noise + degradation

Constrained minimization, or

Equivalent formulations: constrained minimization, the Lagrangian (unconstrained minimization), and Maximum A Posteriori (MAP)
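
A sketch of why these coincide (symbols assumed here, since the slide's formulas are not reproduced: H is the degradation operator, R the regularizer):

  \min_u R(u) \ \text{s.t.}\ \|Hu - z\|_2^2 \le \varepsilon
  \;\longleftrightarrow\;
  \min_u \tfrac{1}{2}\|Hu - z\|_2^2 + \lambda R(u)
  \;\longleftrightarrow\;
  \max_u p(u \mid z),

for a suitable λ, since -log p(u|z) = -log p(z|u) - log p(u) + const, with -log p(z|u) proportional to ||Hu - z||^2 for Gaussian noise and -log p(u) proportional to R(u).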

Discrete representation: the image is a random field; each pixel value is a realization of some random variable

Random variables. Discrete: takes a countable number of distinct values, e.g. the number of children in a family. Continuous: x is described by a probability density function ("distribution"). Moments: expectation ("average") and variance.

Random variables Expectation Variance
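
For reference, the standard definitions for a continuous random variable x with density p(x):

  E[x] = \int x\, p(x)\, dx, \qquad
  \operatorname{Var}[x] = E\big[(x - E[x])^2\big] = E[x^2] - E[x]^2.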

Image as a random field

Normal distribution

Multivariate Normal Distribution

Multivariate Normal Distribution
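
The densities in question, in their standard form (μ is the mean, σ² the variance, Σ the d×d covariance matrix):

  N(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),
  \qquad
  N(\mathbf{x};\boldsymbol\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\Big(-\tfrac12 (\mathbf{x}-\boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\Big).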

Gamma distribution
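
One common shape-rate parameterization (the slide's exact convention is not shown):

  \operatorname{Gam}(x; a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-bx}, \quad x > 0,
  \qquad E[x] = \frac{a}{b}, \quad \operatorname{Var}[x] = \frac{a}{b^2}.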

Bayesian Paradigm. The a posteriori distribution: the unknown; the likelihood: given by our problem; the a priori distribution: our prior knowledge. Maximum a posteriori (MAP): max p(u|z). Maximum likelihood (MLE): max p(z|u).
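
Bayes' rule ties the three together:

  p(u \mid z) = \frac{p(z \mid u)\, p(u)}{p(z)} \propto p(z \mid u)\, p(u),
  \qquad
  \hat u_{\mathrm{MAP}} = \arg\max_u p(z \mid u)\, p(u),
  \qquad
  \hat u_{\mathrm{MLE}} = \arg\max_u p(z \mid u).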

Likelihood

Image Prior: intensity histogram and gradient histogram (plots)

Image Prior: intensities and gradients (log-scale plots)

Image Prior: gradient histogram (plot)

Image Prior: gradient histogram with Tikhonov regularization and TV regularization fits (plots)

Image Prior Non-convex regularization
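
A sketch of the three priors being contrasted, written on the image gradient (the exponent p < 1 in the non-convex case is an assumption for illustration):

  R_{\mathrm{Tikhonov}}(u) = \|\nabla u\|_2^2, \qquad
  R_{\mathrm{TV}}(u) = \|\nabla u\|_1, \qquad
  R_{\mathrm{non\text{-}convex}}(u) = \sum_i |(\nabla u)_i|^p, \ \ 0 < p < 1.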

MAP

Kullback-Leibler divergence: measures the dissimilarity between two distributions
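
Its standard definition for densities q and p:

  D_{\mathrm{KL}}(q \,\|\, p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx \;\ge\; 0,

with equality if and only if q = p (almost everywhere).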

Convex function f(x). Jensen's inequality:
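
For reference, convexity of f and Jensen's inequality read:

  f\big(\lambda x + (1-\lambda)y\big) \le \lambda f(x) + (1-\lambda) f(y), \quad 0 \le \lambda \le 1,
  \qquad
  f\big(E[x]\big) \le E\big[f(x)\big].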

Kullback-Leibler divergence is strictly convex

Variational Bayes: approximate the posterior with a simpler, factorized form and minimize the KL divergence with respect to the approximating factors. Euler-Lagrange equation:

Variational Bayes

Variational Bayes: minimize with respect to one factor, keeping all other factors fixed

Variational Bayes Minimize with respect to

Factorized approximation
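
The standard mean-field result (a sketch; q(θ) = \prod_j q_j(θ_j) approximates the posterior p(θ | z)):

  \log q_j^\star(\theta_j) = E_{q_{i \ne j}}\big[\log p(z, \theta)\big] + \text{const},

i.e. each factor is updated with the expectation of the log joint taken under all the other factors, and the updates are cycled until convergence.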

A Gaussian mixture approximated by a single Gaussian: the result depends on the initialization

Denoising with unknown noise level. Noise model: additive Gaussian noise whose level is unknown. Bayes: the posterior is over both the image and the noise parameter, which is unknown.
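
A sketch of the setting under the usual assumptions (additive white Gaussian noise with unknown precision γ; the symbol names are assumptions, the slide does not spell them out):

  z = u + n, \quad n \sim N(0, \gamma^{-1} I),
  \qquad
  p(u, \gamma \mid z) \propto p(z \mid u, \gamma)\, p(u)\, p(\gamma).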

MAP denoising: minimize the -log posterior, alternately with respect to the image and with respect to the noise parameter:
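
As an illustration of the noise step (assuming N pixels and, for simplicity, a flat prior on γ):

  -\log p(z \mid u, \gamma) = \frac{\gamma}{2}\|z - u\|^2 - \frac{N}{2}\log\gamma + \text{const}
  \quad\Rightarrow\quad
  \hat\gamma = \frac{N}{\|z - u\|^2},

while the image step, with γ fixed, is an ordinary regularized denoising problem.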

A Posteriori surface

MAP denoising

Marginalized denoising: the expectation of the marginalized posterior
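
That is, instead of the posterior mode one takes (a sketch):

  \hat u = E[u \mid z] = \int u\, p(u \mid z)\, du,
  \qquad
  p(u \mid z) = \int p(u, \gamma \mid z)\, d\gamma,

with the nuisance noise parameter integrated out rather than jointly maximized.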

Marginalized denoising

VB denoising: find the factorized approximation of the posterior. Update for the image factor:

VB denoising: update for the noise precision factor:

VB denoising

Comparison

Sparse & Redundant Representation: Iterated-Shrinkage Algorithms, Wavelet Thresholding

We Will Talk About the minimization of the functional f(a) = 1/2 ||x - Da||^2 + λ ρ(a). Substitution: u = Da (assuming D is invertible) links this to our original denoising functional; for a non-invertible D, the Moore-Penrose pseudoinverse takes the place of the inverse.

A Closer Look at the Functional. D is a rectangular, overcomplete dictionary -> an underdetermined system; a is a sparse representation; ρ(a) is a measure of sparsity.

Measures of Sparsity: l_p norms; the l_0 norm counts nonzero elements; many other sparsity measures exist, e.g. a smoothed l_1.
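
Written out, the measures referred to above are

  \|a\|_p^p = \sum_i |a_i|^p \ (p > 0),
  \qquad
  \|a\|_0 = \#\{ i : a_i \ne 0 \},

and smooth approximations of the l_1 norm, such as \sum_i \sqrt{a_i^2 + \epsilon}, are also used (the specific smoothing shown here is an assumption).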

l_2 unit ball

l_1 unit ball

l_0.9 unit ball

l_0.5 unit ball

l_0.3 unit ball

Equivalent formulation: method of Lagrange multipliers

l_2-norm: solution via the pseudoinverse is simple but not sparse:
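
For the constrained form min ||a||_2 s.t. Da = x and a full-row-rank D, the minimizer is the standard pseudoinverse solution

  \hat a = D^+ x = D^\top (D D^\top)^{-1} x,

which is generally dense (all entries nonzero) rather than sparse.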

l_1-norm

Why Seek a Sparse Representation? Assume x is a 1-sparse signal in the dictionary; then that single-atom representation is the perfect solution. Overcomplete dictionaries => sparsity.

Classical Solvers. Steepest descent; the Hessian of the quadratic data term is D^T D. Convergence depends on the condition number κ of the Hessian; an ill-conditioned Hessian => poor convergence.
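
A sketch of why (gradient step on the quadratic data term, whose Hessian is D^T D):

  a_{k+1} = a_k + \mu\, D^\top (x - D a_k),
  \qquad
  \kappa = \frac{\lambda_{\max}(D^\top D)}{\lambda_{\min}(D^\top D)};

the error contracts roughly by a factor (κ - 1)/(κ + 1) per iteration, and for correlated or overcomplete dictionaries κ is large (D^T D may even be singular), hence the slow convergence.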

The Unitary Case (DD^T = I). Use the unitarity of D and define b = D^T x; since the l_2 norm is unitarily invariant, we obtain m independent 1D optimization problems.
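
A sketch of the separation, assuming D is square and unitary (so D^T D = D D^T = I) and ρ acts entrywise:

  f(a) = \tfrac12\|x - Da\|_2^2 + \lambda\rho(a)
       = \tfrac12\|D^\top x - a\|_2^2 + \lambda\rho(a)
       = \sum_{j=1}^{m} \Big[ \tfrac12 (b_j - a_j)^2 + \lambda\rho(a_j) \Big],
  \qquad b = D^\top x,

so each coefficient can be optimized on its own.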

The 1D Task. We need to solve the following 1D problem: for a scalar b, find a_opt(b) = argmin_a 1/2 (b - a)^2 + λ ρ(a); the mapping b -> a_opt can be stored as a Look-Up-Table (LUT). Such a LUT can be built for ANY sparsity measure function ρ(a), including non-convex ones and non-smooth ones (e.g., the l_0 norm).

Shrinkage for ρ(a) = |a|^p

Soft & Hard Thresholding Soft thresholding Hard thresholding
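
A minimal sketch of the two rules in NumPy (the threshold t plays the role of λ, up to the scaling implied by the particular ρ):

import numpy as np

def soft_threshold(b, t):
    # Shrinkage rule associated with rho(a) = |a| (the l1 penalty).
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def hard_threshold(b, t):
    # Keep entries with magnitude above t, zero out the rest (l0-type penalty).
    return np.where(np.abs(b) > t, b, 0.0)

# small numeric check
b = np.array([-2.0, -0.3, 0.1, 0.8, 3.0])
print(soft_threshold(b, 0.5))
print(hard_threshold(b, 0.5))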

The Unitary Case: Summary. Minimizing f(a) is done by: multiply x by D^H to get b; apply the shrinkage LUT S_{ρ,λ} to b to obtain â; multiply â by D to get x̂. DONE! The obtained solution is the GLOBAL minimizer of f(a), even if f(a) is non-convex.
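
A minimal end-to-end sketch of this pipeline in NumPy, with a random orthogonal D standing in for a unitary dictionary and ρ = |·| (so the LUT is soft thresholding); all names here are illustrative:

import numpy as np

rng = np.random.default_rng(0)
m = 64
D, _ = np.linalg.qr(rng.standard_normal((m, m)))          # unitary (orthogonal) dictionary
lam = 0.2

a_true = rng.standard_normal(m) * (rng.random(m) < 0.1)   # sparse coefficients
x = D @ a_true + 0.05 * rng.standard_normal(m)            # noisy observed signal

b = D.T @ x                                               # multiply by D^H (here real, so D^T)
a_hat = np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)     # LUT: soft shrinkage S_{rho,lambda}
x_hat = D @ a_hat                                         # multiply by D -> the estimate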

The minimization of f(a) leads to two seemingly contradicting observations: 1. The problem is quite hard; classic optimization methods find it hard. 2. The problem is trivial for the case of a unitary D. How can we enjoy this simplicity in the general case?

Iterated-Shrinkage Algorithms. Common to all is a set of operations in every iteration that includes: (i) multiplication by D, (ii) multiplication by D^T, and (iii) a scalar shrinkage on the solution, S_{ρ,λ}(a).

The Proximal-Point Method. Aim: minimize f(a). Suppose it is found to be too hard. Define a surrogate function g(a, a_0) = f(a) + dist(a - a_0). Then the following algorithm necessarily converges to a local minimum of f(a) [Rockafellar, 1976]: a_0 -> minimize g(a, a_0) -> a_1 -> minimize g(a, a_1) -> a_2 -> ... -> a_k -> minimize g(a, a_k) -> a_{k+1}. Comments: (i) the minimization of g(a, a_0) should be easier; (ii) it looks like it will slow down convergence. Really?

The Proposed Surrogate Function. Original function: f(a) = 1/2 ||x - Da||^2 + λ ρ(a). Distance: a quadratic proximity term chosen so that the troublesome term 1/2 ||Da||^2 vanishes; the beauty is that the remaining coupling is not a function of a. Minimization of g(a, a_0) is then done in closed form by shrinkage, applied to the vector b_k, and this generates the solution a_{k+1} of the next iteration. We again have m independent 1D optimization problems.
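
A sketch of the standard SSF construction (assuming c exceeds the largest eigenvalue of D^T D):

  g(a, a_0) = \tfrac12\|x - Da\|_2^2 + \lambda\rho(a)
            + \tfrac{c}{2}\|a - a_0\|_2^2 - \tfrac12\|Da - Da_0\|_2^2 .

Expanding, the \tfrac12\|Da\|_2^2 terms cancel, and up to terms independent of a,

  g(a, a_0) = \tfrac{c}{2}\|a - b_0\|_2^2 + \lambda\rho(a) + \text{const},
  \qquad
  b_0 = a_0 + \tfrac{1}{c} D^\top (x - D a_0),

which is separable and is minimized entrywise by the shrinkage a_1 = S_{\rho, \lambda/c}(b_0).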

Iterated-Shrinkage Algorithm. While the unitary-case solution is given by a single pass (x -> multiply by D^T -> b -> LUT S_{ρ,λ} -> â -> multiply by D -> x̂), the general case, handled by SSF, requires iterating: form the residual x - Da_k, multiply it by D^T/c and add a_k to get b_k, apply the LUT S_{ρ,λ/c} to obtain a_{k+1}, and finally multiply by D to get x̂.
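
A minimal sketch of this iteration for ρ(a) = |a| (ISTA-style soft thresholding); the factor 1.01 in the choice of c is an assumption made only to satisfy c > ||D||_2^2:

import numpy as np

def ssf_l1(D, x, lam, n_iter=200):
    # Iterated shrinkage (SSF) for f(a) = 0.5*||x - D a||^2 + lam*||a||_1.
    c = 1.01 * np.linalg.norm(D, 2) ** 2        # c must exceed the largest eigenvalue of D^T D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        b = a + D.T @ (x - D @ a) / c           # residual, multiplied by D^T / c, added to a_k
        a = np.sign(b) * np.maximum(np.abs(b) - lam / c, 0.0)   # LUT: S_{rho, lam/c}
    return a

# usage: a_hat = ssf_l1(D, x, lam=0.1); x_hat = D @ a_hat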

Linear Programming. Basis Pursuit: min ||a||_1 s.t. Da = x. LP: interior-point algorithms for LP problems.
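
A sketch of the reformulation, splitting a into nonnegative parts a = a^+ - a^-:

  \min_a \|a\|_1 \ \text{s.t.}\ Da = x
  \quad\Longleftrightarrow\quad
  \min_{a^+, a^- \ge 0} \mathbf{1}^\top (a^+ + a^-) \ \text{s.t.}\ D(a^+ - a^-) = x,

which is a linear program and can therefore be handled by interior-point (or simplex) methods.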

Compressed Sensing. In compressed sensing we compress the signal x of size m by exploiting its origin. This is done by p << m random projections y = Px (P of size p x m). The core idea: y holds all the information about the original signal x, even though p << m. Reconstruction? Done by MAP, i.e. solving for a sparse a with y ≈ PDa = Ma and setting x̂ = Dâ; noise is handled by the same formulation.
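
A sketch of the recovery problem with M = PD (the noisy case simply relaxes the equality to a tolerance ε):

  y = Px = PDa = Ma,
  \qquad
  \hat a = \arg\min_a \rho(a) \ \text{s.t.}\ \|y - Ma\|_2 \le \varepsilon,
  \qquad
  \hat x = D\hat a.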