Bayesian Paradigm: Maximum A Posteriori Estimation
Simple acquisition model: degradation plus noise, z = Hu + n (H is the degradation operator, n the noise).
Constrained minimization: minimize the data-fit term subject to a constraint on the regularizer, or minimize the regularizer subject to a constraint on the data fit.
Equivalent formulations: constrained minimization, the Lagrangian (unconstrained minimization), and maximum a posteriori (MAP).
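For reference, these formulations written out in one standard way (assuming the linear Gaussian model z = Hu + n and a generic regularizer Q(u); the slides' exact notation may differ):

```latex
% constrained minimization
\hat{u} = \arg\min_{u} Q(u) \quad \text{s.t.} \quad \|Hu - z\|_2^2 \le \varepsilon
% Lagrangian (unconstrained) form
\hat{u} = \arg\min_{u} \ \tfrac{1}{2}\|Hu - z\|_2^2 + \lambda\, Q(u)
% MAP: with p(z \mid u) \propto e^{-\|Hu - z\|^2 / 2\sigma^2} and p(u) \propto e^{-\lambda Q(u)},
% maximizing p(u \mid z) \propto p(z \mid u)\, p(u) leads to the same minimization:
\hat{u} = \arg\max_{u} p(u \mid z) = \arg\min_{u}\big[-\log p(z \mid u) - \log p(u)\big]
```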
Discrete representation: the image is a random field; each pixel value is a realization of some random variable.
Random variables. Discrete: takes a countable number of distinct values, e.g. the number of children in a family. Continuous: x has a probability density function (distribution) p(x). Moments: expectation (average), variance.
Random variables: expectation E[x] = ∫ x p(x) dx; variance var[x] = E[(x − E[x])²].
Image as a random field
Normal distribution
Multivariate Normal Distribution
Multivariate Normal Distribution
Gamma distribution
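For reference, the standard densities these slide titles refer to (textbook forms; the slides' own parameterization, e.g. shape-rate vs. shape-scale for the Gamma, may differ):

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = (2\pi)^{-N/2}\,|\Sigma|^{-1/2}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
\mathrm{Gam}(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-b x}, \qquad x > 0
```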
Bayesian paradigm: p(u|z) ∝ p(z|u) p(u). The a posteriori distribution p(u|z) is unknown; the likelihood p(z|u) is given by our problem; the a priori distribution p(u) encodes our prior knowledge. Maximum a posteriori (MAP): max_u p(u|z). Maximum likelihood (MLE): max_u p(z|u).
Likelihood
Image prior (figure): intensity histogram and gradient histogram.
Image prior (figure): intensities and gradients (log scale).
Image prior (figure): gradient histogram.
Image prior (figure): gradient histogram compared with Tikhonov regularization and TV regularization.
Image prior: non-convex regularization.
MAP: û = argmax_u p(u|z) = argmin_u [ −log p(z|u) − log p(u) ].
Kullback-Leibler divergence: measures the dissimilarity between two distributions, KL(q‖p) = ∫ q(x) log( q(x)/p(x) ) dx.
Convex function f(x). Jensen's inequality: f(E[x]) ≤ E[f(x)].
Kullback-Leibler divergence is strictly convex
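The standard argument connecting the last three slides, written out (apply Jensen's inequality to the convex function −log):

```latex
\mathrm{KL}(q\,\|\,p) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx
  = \int q(x)\left(-\log\frac{p(x)}{q(x)}\right)dx
  \;\ge\; -\log\int q(x)\,\frac{p(x)}{q(x)}\,dx = -\log 1 = 0
% with equality if and only if q = p (almost everywhere)
```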
Variational Bayes: approximate the posterior with a simpler form q; minimize KL(q‖p) with respect to q. The Euler-Lagrange conditions give the optimal factors.
Variational Bayes
Variational Bayes: minimize KL(q‖p) with respect to one factor while keeping all other factors fixed.
Variational Bayes: minimize with respect to each factor in turn and iterate until convergence.
Factorized approximation
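The standard mean-field step behind the previous slides, written out: with the factorized approximation q(θ) = ∏_i q_i(θ_i), isolating one factor q_j in KL(q‖p) gives, up to terms that do not depend on q_j,

```latex
\mathrm{KL}(q\,\|\,p) = \int q_j(\theta_j)\Big(\log q_j(\theta_j)
  - \mathbb{E}_{i\neq j}\big[\log p(z,\theta)\big]\Big)\,d\theta_j + \mathrm{const}
% which is itself a KL divergence in q_j and is therefore minimized by
\log q_j^{*}(\theta_j) = \mathbb{E}_{i\neq j}\big[\log p(z,\theta)\big] + \mathrm{const}
```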
A Gaussian mixture approximated by a single Gaussian: the result depends on the initialization.
Denoising with unknown noise level. Noise model: z = u + n with n ~ N(0, γ⁻¹I). Bayes: p(u, γ|z) ∝ p(z|u, γ) p(u) p(γ), where the noise precision γ is unknown.
MAP denoising: minimize the −log posterior alternately with respect to u and with respect to the noise precision γ.
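One way to make these two updates concrete, under assumptions that may differ from the slides (z = u + n with n ~ N(0, γ⁻¹I), a generic image prior p(u) ∝ exp(−R(u)), and a flat prior on the precision γ):

```latex
-\log p(u,\gamma \mid z) = \tfrac{\gamma}{2}\|u - z\|_2^2 - \tfrac{N}{2}\log\gamma + R(u) + \mathrm{const}
% with respect to u (gamma fixed):
\hat{u} = \arg\min_{u} \ \tfrac{\gamma}{2}\|u - z\|_2^2 + R(u)
% with respect to gamma (u fixed), setting the derivative to zero:
\hat{\gamma} = \frac{N}{\|u - z\|_2^2}
```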
A Posteriori surface
MAP denoising
Marginalized denoising: expectation of the marginalized posterior.
Marginalized denoising
VB denoising: find the factorized approximation q(u) q(γ) of the posterior, with one factor for the image u and one for the noise precision γ.
VB denoising: update of the noise-precision factor q(γ).
VB denoising
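A minimal numerical sketch of this kind of VB denoising, under assumptions that are not taken from the slides: a 1D signal, noise model z = u + n with unknown precision γ and a Gamma(a0, b0) hyperprior, and a simple Tikhonov (first-difference) prior with fixed weight lam. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
t = np.linspace(0, 1, N)
u_true = np.sin(2 * np.pi * t) + (t > 0.5)       # piecewise-smooth test signal
z = u_true + rng.normal(scale=0.3, size=N)        # noisy observation

L = np.eye(N, k=1)[:-1] - np.eye(N)[:-1]          # (N-1) x N first-difference operator
LtL = L.T @ L
lam, a0, b0 = 20.0, 1e-3, 1e-3                    # prior weight and broad Gamma hyperprior

E_gamma = 1.0                                     # initial guess for E[gamma]
for _ in range(30):
    # q(u) = N(m, Sigma): precision A = E[gamma] I + lam L^T L, mean m = A^{-1} E[gamma] z
    A = E_gamma * np.eye(N) + lam * LtL
    Sigma = np.linalg.inv(A)
    m = Sigma @ (E_gamma * z)
    # q(gamma) = Gamma(a, b) with E||u - z||^2 = ||m - z||^2 + tr(Sigma)
    a = a0 + 0.5 * N
    b = b0 + 0.5 * (np.sum((m - z) ** 2) + np.trace(Sigma))
    E_gamma = a / b

print("estimated noise std:", 1.0 / np.sqrt(E_gamma))
```

The two updates are alternated exactly as on the slides: a Gaussian update for the image factor and a Gamma update for the noise precision; for 2D images only the prior operator and the linear algebra change.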
Comparison
Sparse & Redundant Representation: Iterated Shrinkage Algorithms and Wavelet Thresholding
We will talk about the minimization of the functional f(a) = ½‖Da − x‖² + λρ(a). Substituting u = Da (and, assuming D is invertible, a = D⁻¹u) recovers our original denoising functional; for a rectangular D the Moore-Penrose pseudoinverse takes the place of D⁻¹.
A closer look at the functional: D is a rectangular, overcomplete dictionary, so Da = x is an underdetermined system; a is a sparse representation; ρ(a) is a measure of sparsity.
Measures of sparsity: the ℓp norms (0 < p ≤ 1); the ℓ0 "norm", which counts the nonzero elements; and many other sparsity measures, e.g. smooth approximations of ℓ1.
Unit balls of the ℓp norms (figures): ℓ2, ℓ1, ℓ0.9, ℓ0.5, ℓ0.3.
Equivalent formulation (method of Lagrange multipliers): the constrained problem min_a ρ(a) s.t. Da = x and the unconstrained problem min_a ½‖Da − x‖² + λρ(a).
ℓ2-norm: the minimum-ℓ2-norm solution of Da = x is given by the pseudoinverse, a = D⁺x = Dᵀ(DDᵀ)⁻¹x; it is simple to compute but not sparse.
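A small illustration of this point (sizes and names are illustrative, not from the slides): the pseudoinverse solution fits Da = x exactly but spreads energy over all coefficients instead of concentrating it on a few.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 50                               # n-dimensional signal, m-atom overcomplete dictionary
D = rng.normal(size=(n, m))
a_sparse = np.zeros(m)
a_sparse[[3, 17, 41]] = [1.0, -2.0, 0.5]    # a 3-sparse representation
x = D @ a_sparse

a_l2 = np.linalg.pinv(D) @ x                # minimum l2-norm solution of D a = x
print("residual ||D a - x||:", np.linalg.norm(D @ a_l2 - x))   # ~0: exact fit
print("nonzeros in a_l2:", np.sum(np.abs(a_l2) > 1e-8))        # typically all m: not sparse
```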
ℓ1-norm: minimizing ‖a‖1 subject to Da = x promotes sparse solutions (basis pursuit).
Why seek a sparse representation? Assume x is a 1-sparse signal in the dictionary: a single atom is the perfect solution. Overcomplete dictionaries => sparsity.
Classical solvers. Steepest descent: its convergence depends on the condition number κ of the Hessian. For an overcomplete D the Hessian (dominated by DᵀD) is ill-conditioned => poor convergence.
The unitary case (DDᵀ = I): use the fact that the ℓ2 norm is unitarily invariant. Define b = Dᴴx; the functional then separates and we have m independent 1D optimization problems.
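The separation this slide uses, written out (for a square unitary D and a separable sparsity measure ρ(a) = Σ_j ρ(a_j)):

```latex
\tfrac12\|\mathbf{D}a - x\|_2^2 + \lambda\rho(a)
  = \tfrac12\|\mathbf{D}(a - \mathbf{D}^{H}x)\|_2^2 + \lambda\rho(a)
  = \tfrac12\|a - b\|_2^2 + \lambda\rho(a), \qquad b = \mathbf{D}^{H}x
% which splits into m independent scalar problems:
\min_{a_j}\ \tfrac12 (a_j - b_j)^2 + \lambda\,\rho(a_j), \qquad j = 1,\dots,m
```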
The 1D task: we need to solve the 1D problem a_opt = argmin_a ½(a − b)² + λρ(a); the resulting map b ↦ a_opt can be stored as a look-up table (LUT). Such a LUT can be built for ANY sparsity measure ρ(a), including non-convex ones and non-smooth ones (e.g., the ℓ0 norm).
Shrinkage curves for ρ(a) = |a|^p (figure).
Soft & hard thresholding. Soft thresholding: S_λ(b) = sign(b) · max(|b| − λ, 0). Hard thresholding: keep b when it exceeds the threshold, set it to zero otherwise.
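A direct sketch of the two operators named above (standard formulas; the threshold parameterization used here is one common convention and may differ from the slides'):

```python
import numpy as np

def soft_threshold(b, lam):
    """Soft thresholding: minimizer of 0.5*(a - b)^2 + lam*|a|."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def hard_threshold(b, lam):
    """Hard thresholding: minimizer of 0.5*(a - b)^2 + lam*(a != 0),
    i.e. keep b where |b| > sqrt(2*lam), zero otherwise."""
    return np.where(np.abs(b) > np.sqrt(2.0 * lam), b, 0.0)

b = np.linspace(-2, 2, 9)
print(soft_threshold(b, 0.5))
print(hard_threshold(b, 0.5))
```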
The unitary case, summary. Minimizing f(a) is done by: x → multiply by Dᴴ → b → LUT S_{ρ,λ}(b) → â → multiply by D → x̂. DONE! The obtained solution is the GLOBAL minimizer of f(a), even if f(a) is non-convex.
The minimization of f(a) leads to two seemingly contradicting observations: 1. The problem is quite hard; classic optimization methods find it hard. 2. The problem is trivial for the case of a unitary D. How can we enjoy this simplicity in the general case?
Iterated-shrinkage algorithms. Common to all is a set of operations in every iteration that includes: (i) multiplication by D, (ii) multiplication by Dᵀ, and (iii) a scalar shrinkage S_{ρ,λ}(a) on the solution.
The proximal-point method. Aim: minimize f(a); suppose it is found to be too hard. Define a surrogate function g(a, a_0) = f(a) + dist(a, a_0). Then the following algorithm necessarily converges to a local minimum of f(a) [Rockafellar, '76]: a_0 → minimize g(a, a_0) → a_1 → minimize g(a, a_1) → a_2 → … → a_k → minimize g(a, a_k) → a_{k+1}. Comments: (i) the minimization of g(a, a_0) should be easier; (ii) it looks like it will slow down convergence. Really?
The proposed surrogate function. Original function: f(a) = ½‖Da − x‖² + λρ(a). Distance: dist(a, a_k) = (c/2)‖a − a_k‖² − ½‖Da − Da_k‖². The beauty is that the quadratic coupling term aᵀDᵀDa vanishes; what remains involving D is not a function of a. Minimization of g(a, a_k) is done in closed form by shrinkage applied to the vector b_k, and this generates the solution a_{k+1} of the next iteration. We have m independent 1D optimization problems.
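Filling in the algebra this slide alludes to (taking c larger than the largest eigenvalue of DᵀD so the added distance is nonnegative):

```latex
g(a, a_0) = \tfrac12\|\mathbf{D}a - x\|_2^2 + \lambda\rho(a)
          + \tfrac{c}{2}\|a - a_0\|_2^2 - \tfrac12\|\mathbf{D}a - \mathbf{D}a_0\|_2^2
% the quadratic term a^T D^T D a cancels, leaving
g(a, a_0) = \tfrac{c}{2}\|a - b_0\|_2^2 + \lambda\rho(a) + \mathrm{const},
\qquad b_0 = a_0 + \tfrac{1}{c}\,\mathbf{D}^{\top}(x - \mathbf{D}a_0)
% so the minimizer is element-wise shrinkage: a_1 = \mathcal{S}_{\rho,\lambda/c}(b_0)
```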
Iterated-shrinkage algorithm. While the unitary-case solution is given by x → multiply by Dᵀ → b → LUT S_{ρ,λ}(b) → â → multiply by D → x̂, the general case, by SSF, requires iterating: a_k → b_k = a_k + (1/c) Dᵀ(x − Da_k) → LUT S_{ρ,λ/c}(b_k) → a_{k+1}, and finally x̂ = Da_{k+1}.
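A compact sketch of this iteration with the soft-thresholding LUT, i.e. ρ = ℓ1 (an ISTA-style instance of SSF). Problem sizes and parameters are illustrative:

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """Iterated shrinkage (SSF with rho = l1): minimize 0.5*||D a - x||^2 + lam*||a||_1."""
    c = np.linalg.norm(D, 2) ** 2 * 1.01                     # c > largest eigenvalue of D^T D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        b = a + D.T @ (x - D @ a) / c                        # multiply by D and by D^T / c
        a = np.sign(b) * np.maximum(np.abs(b) - lam / c, 0.0)  # scalar shrinkage LUT
    return a

rng = np.random.default_rng(2)
n, m = 20, 50
D = rng.normal(size=(n, m)) / np.sqrt(n)
a_true = np.zeros(m)
a_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
x = D @ a_true
a_hat = ista(D, x, lam=0.01)
print("recovered support:", np.nonzero(np.abs(a_hat) > 0.05)[0])
```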
Linear programming. Basis pursuit: min_a ‖a‖1 s.t. Da = x can be cast as a linear program (LP); interior-point algorithms solve such LP problems.
Compressed sensing. In compressed sensing we compress the signal x of size m by exploiting its origin (x = Da with a sparse a). This is done by p << m random projections: y = Px, with P of size p × m. The core idea: y holds all the information about the original signal x, even though p << m. Reconstruction? Done by MAP, i.e. by solving the sparse recovery problem for the combined matrix PD (y ≈ PDa, possibly with noise) and setting x̂ = Dâ.
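A toy numerical sketch of the idea (all sizes and names illustrative, not from the slides): take p << m random projections of a sparse signal and reconstruct it with the same ℓ1 iterated-shrinkage iteration as above, applied to the combined matrix PD.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 200, 40                                   # signal length m, p << m measurements
D = np.eye(m)                                    # simplest dictionary: identity (x itself is sparse)
a_true = np.zeros(m)
a_true[rng.choice(m, 5, replace=False)] = rng.normal(size=5)
x = D @ a_true

P = rng.normal(size=(p, m)) / np.sqrt(p)         # random projection matrix
y = P @ x                                        # compressed measurements

# reconstruct: min 0.5*||P D a - y||^2 + lam*||a||_1 (same iteration as the SSF sketch)
M = P @ D
c = np.linalg.norm(M, 2) ** 2 * 1.01
a = np.zeros(m)
for _ in range(500):
    b = a + M.T @ (y - M @ a) / c
    a = np.sign(b) * np.maximum(np.abs(b) - 1e-3 / c, 0.0)

x_hat = D @ a
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```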