Noise-Blind Image Deblurring Supplementary Material


Meiguang Jin, University of Bern, Switzerland
Stefan Roth, TU Darmstadt, Germany
Paolo Favaro, University of Bern, Switzerland

A. Upper and Lower Bounds

Our approach proposes a Bayesian generalization of maximum a-posteriori (MAP) estimation. It is interesting to understand what relationship exists between MAP, the proposed expected loss, and the loss lower bound. It turns out that the logarithm of the Bayes utility (BU) is bounded from below by our proposed bound and from above by the log of the MAP problem. We have

\log \max_{\hat x} p(y \mid \hat x; \sigma_n)\, p(\hat x)
  = \log \max_{\hat x} p(\hat x, y; \sigma_n)                                          (MAP)
  = \max_x \log \int \max_{\hat x} p(\hat x, y; \sigma_n)\, G(x, \bar x)\, d\bar x
  \geq \max_x \log \int p(\bar x, y; \sigma_n)\, G(x, \bar x)\, d\bar x                (BU)
  \geq \max_x \int G(x, \bar x) \log p(\bar x, y; \sigma_n)\, d\bar x.                 (lower bound)

The second equality is due to the definition of G (it integrates to 1 over \bar x); the first inequality holds since \max_{\hat x} p(\hat x, y; \sigma_n) \geq p(\bar x, y; \sigma_n) for any \bar x, and the second inequality is Jensen's inequality applied to the concave logarithm. Notice that as \sigma \to 0, the lower bound tends towards the MAP and thus they both converge to the logarithm of the BU.

B. A General Family of Image Priors

We here show the applicability of our framework to a general family of image priors whose negative log-likelihood can be written as a concave function \phi of the terms |F_{ik} x - \mu|^2. As in the main paper, F_i is a linear filter, or equivalently a Toeplitz matrix (e.g., a finite-difference operator). Again as before, F_{ik} yields the k-th entry of the output and is therefore a row vector; \mu and \sigma are parameters. Consequently, we have

-\log p(x) = \phi(\xi_1, \dots, \xi_{IJK})  with  \xi_{ik} = \frac{|F_{ik} x - \mu|^2}{2\sigma^2}.    (39)

Table 5 summarizes a few choices of \phi for some popular image priors. Also notice that the type-1 Gumbel prior falls into this family, as

\phi(\xi_0, \xi_1, \dots, \xi_{IJK}) = \xi_0 - \sum_{ik} w_i e^{-\xi_{ik}}  with  \xi_0 = \frac{1}{2\sigma_0^2},  \xi_{ik} = \frac{|F_{ik} x - \mu|^2}{2\sigma^2}    (40)

is a concave function jointly in all its arguments. These priors admit simple surrogate functions that yield simple Majorization-Minimization (MM) [10] iterations. The concavity of \phi gives the following inequality and surrogate function \psi(x \mid x^\tau):

-\log p(x) \leq -\log p(x^\tau) + \sum_{ik} \phi'_{ik}\, \frac{|F_{ik} x - \mu|^2 - |F_{ik} x^\tau - \mu|^2}{2\sigma^2} =: \psi(x \mid x^\tau),    (41)

where \phi'_{ik} denotes the derivative of \phi with respect to \xi_{ik}, evaluated at x^\tau.
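The chain of bounds in Sec. A can be checked numerically on a toy example. The sketch below uses a 1-D Gaussian stand-in for p(x, y; σ_n) and a Gaussian smoothing kernel G of standard deviation σ on a grid; all densities, grid sizes, and parameter values are illustrative choices, not quantities from the paper.

```python
import numpy as np

# Toy 1-D check of the ordering  log MAP >= log BU >= lower bound  (Sec. A).
xs = np.linspace(-8.0, 8.0, 4001)
dx = xs[1] - xs[0]

s_post = 0.7                                  # std of the toy "posterior"
log_p = -0.5 * (xs / s_post) ** 2 - np.log(s_post * np.sqrt(2 * np.pi))
p = np.exp(log_p)

def bounds(sigma):
    """Return (log MAP, log BU, lower bound) for a smoothing kernel of std sigma."""
    log_map = log_p.max()                     # log max_x p(x, y)
    bu, lb = -np.inf, -np.inf
    for xc in xs[::40]:                       # coarse search over the outer max_x
        g = np.exp(-0.5 * ((xs - xc) / sigma) ** 2)
        g /= g.sum() * dx                     # G(x, .) integrates to 1
        bu = max(bu, np.log(np.sum(g * p) * dx))   # log of the smoothed posterior
        lb = max(lb, np.sum(g * log_p) * dx)       # Jensen lower bound
    return log_map, bu, lb

log_map, bu, lb = bounds(sigma=0.5)
assert log_map >= bu >= lb                    # the chain of inequalities

# As sigma -> 0, both BU and the lower bound approach the MAP value
m2, bu2, lb2 = bounds(sigma=0.05)
assert (m2 - bu2) < 0.1 and (m2 - lb2) < 0.1
```

The discrete weights g·dx sum to one, so Jensen's inequality holds exactly on the grid; the assertions therefore cannot fail due to discretization.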

Image prior        \phi(w)                                                  \phi'(w)
Gaussian           \frac{1}{2\sigma^2} \int w(z)\, dz                       \frac{1}{2\sigma^2}
Total Variation    \frac{1}{2\sigma^2} \int \sqrt{w(z)+\epsilon}\, dz       \frac{1}{4\sigma^2 \sqrt{w(z)+\epsilon}}
Sparsity (p < 2)   \frac{1}{2\sigma^2} \big(\int (w(z)+\epsilon)^{p/2} dz\big)^{2/p}   \frac{1}{4\sigma^2} \big(\int (w(z)+\epsilon)^{p/2} dz\big)^{2/p-1} (w(z)+\epsilon)^{p/2-1}

Table 5. Examples of image priors, their negative log probability density functions, the corresponding \phi functions, and the derivatives \phi'. The small coefficient \epsilon > 0 avoids division by zero.

Then, this yields the inequality

\int G(x, \bar x)\, (-\log p(\bar x))\, d\bar x \leq \int G(x, \bar x)\, \psi(\bar x \mid x^\tau)\, d\bar x    (42a)
  = \sum_{ik} \frac{\phi'_{ik}}{2\sigma^2} \int G(x, \bar x)\, |F_{ik} \bar x - \mu|^2\, d\bar x + const      (42b)
  = \sum_{ik} \frac{\phi'_{ik}}{2\sigma^2}\, |F_{ik} x - \mu|^2 + const,                                      (42c)

where all the constant terms do not depend on x.

C. Noise-Adaptive Deblurring

The noise-adaptive algorithm for the general image prior family is analogous to that in the main paper for the Gumbel prior. We need to put all the terms together and solve the maximization of the lower bound

\arg\max_{x, \sigma_n} \int G(x, \bar x) \log p(\bar x, y; \sigma_n)\, d\bar x.    (43)

Thus, we obtain the following iterative algorithm

(x^{\tau+1}, \hat\sigma_n) = \arg\min_{x, \sigma_n} \frac{\|y - Kx\|^2 + M\sigma^2\|k\|^2}{2\sigma_n^2} + N \log \sigma_n + \sum_{ik} \frac{\phi'_{ik}}{2\sigma^2}\, |F_{ik} x - \mu|^2.    (44)

We can now solve explicitly for \hat\sigma_n and, as in the main paper, obtain

\hat\sigma_n^2 = \frac{1}{N} \big( \|y - Kx\|^2 + M\sigma^2\|k\|^2 \big).    (45)

This closed form can be incorporated into an iterative algorithm and yields

x^{\tau+1} = \arg\min_x \frac{N}{2} \log\big( \|y - Kx\|^2 + M\sigma^2\|k\|^2 \big) + \sum_{ik} \frac{\phi'_{ik}}{2\sigma^2}\, |F_{ik} x - \mu|^2.    (46)

As in the noise-blind deblurring formulation, we can explicitly obtain the gradient descent iteration

x^{\tau+1} = x^\tau - \alpha \Big( \lambda^\tau K^\top (K x^\tau - y) + \sum_{ik} \frac{\phi'_{ik}}{\sigma^2}\, F_{ik}^\top (F_{ik} x^\tau - \mu) \Big)    (47)

for some small step size \alpha > 0, where x^\tau is the solution at gradient descent iteration \tau and, as in the main paper, the noise adaptivity is given by \lambda^\tau = N / (\|y - K x^\tau\|^2 + M\sigma^2\|k\|^2).
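A minimal 1-D sketch of the noise-adaptive scheme of Sec. C (Eqs. 45–47) is given below. The blur kernel, test signal, step size, and the smoothed-TV prior term are illustrative stand-ins rather than the paper's exact setup; K is implemented as circular convolution.

```python
import numpy as np

# 1-D sketch: closed-form noise estimate (Eq. 45) and the adaptive weight
# lambda = N / (||y - Kx||^2 + M sigma^2 ||k||^2) inside gradient descent (Eq. 47).
rng = np.random.default_rng(0)
N = M = 256
k = np.ones(5) / 5.0                          # box blur kernel (illustrative)
k2 = np.sum(k ** 2)
noise_std = 0.05

K_hat = np.fft.fft(k, N)
blur = lambda v: np.real(np.fft.ifft(np.fft.fft(v) * K_hat))             # K v
blurT = lambda v: np.real(np.fft.ifft(np.fft.fft(v) * np.conj(K_hat)))   # K^T v

x_true = np.zeros(N); x_true[80:160] = 1.0    # piecewise-constant signal
y = blur(x_true) + noise_std * rng.standard_normal(N)

# Eq. (45) evaluated at the ground truth (with sigma = 0): recovers the noise level
sigma_n_hat = np.sqrt(np.sum((blur(x_true) - y) ** 2) / N)

# Eqs. (46)-(47): gradient descent with a smoothed-TV prior term (cf. Eq. 48)
sigma, eps, alpha = 0.3, 1e-2, 2e-3
x = y.copy()
for _ in range(300):
    r = blur(x) - y
    lam = N / (np.sum(r ** 2) + M * sigma ** 2 * k2)   # noise adaptivity
    g = np.roll(x, -1) - x                             # forward difference
    u = g / (4 * sigma ** 2 * np.sqrt(g ** 2 + eps))   # TV-style prior term
    x = x - alpha * (lam * blurT(r) + (np.roll(u, 1) - u))
```

Here `np.roll(u, 1) - u` is the 1-D adjoint of the forward difference, playing the role of the divergence in Eq. (48).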

TV-L2 Noise-Adaptive Deblurring. Following Eq. (47), the case of TV-L2 is then readily obtained as the gradient descent iteration

x^{\tau+1} = x^\tau - \alpha \Big( \lambda^\tau K^\top (K x^\tau - y) + \nabla^\top \frac{\nabla x^\tau}{4\sigma^2 \sqrt{|\nabla x^\tau|^2 + \epsilon}} \Big)    (48)

for some small step \alpha > 0, where \nabla denotes the finite difference operator along the two coordinate axes, and x^\tau and \lambda^\tau are defined as before.

EPLL Noise-Adaptive Deblurring. In the case of EPLL [37], we modify the original Eq. (4) in [37] by introducing our noise-adaptive term \lambda^\tau = N / (\|y - K x^\tau\|^2 + M\sigma^2\|k\|^2) and obtain

x^{\tau+1} = \big( \lambda^\tau A^\top A + \beta P^\top P \big)^{-1} \big( \lambda^\tau A^\top y + \beta P^\top z^\tau \big).    (49)

D. Back-Propagation in the GradNet Architecture

In the following section we derive the gradients of GradNet with respect to its parameters. We first introduce the notation for the basic components of one stage of GradNet and then compute the gradients of this stage with respect to the parameters. The derivatives for the other stages are analogous.

Initialization. To obtain x_q^0 at stage \tau = 0, we first pad the noisy blurry input using the constant boundary assumption (i.e., zero derivative at the boundary). Then we apply 3 iterations of MATLAB's edgetaper function to the padded noisy blurry input. We find experimentally that this makes the reconstructed image converge faster at the boundaries.

Greedy Training. In the following derivation, we will omit the sample index q for simplicity. In the main paper, we defined x^{\tau+1} as

x^{\tau+1} = x^\tau - \Big( \lambda^\tau H^\top H + I + \frac{\gamma^\tau}{\sigma_0^2} \sum_{ik} B_{ik}^{\tau\top} B_{ik}^\tau \Big)^{-1} \Big( \lambda^\tau K^\top (K x^\tau - y) + \frac{1}{\sigma_0^2} \sum_{ik} F_{ik}^{\tau\top} (F_{ik}^\tau x^\tau - \mu)\, \hat w_i^\tau \exp\Big( -\frac{|F_{ik}^\tau x^\tau - \mu|^2}{2(\sigma^2 + \sigma_1^2)} \Big) \Big) = x^\tau - (\Lambda^\tau)^{-1} \eta^\tau,    (50)

where we have

\lambda^\tau = \frac{N}{\|y - K x^\tau\|^2 + M\sigma^2\|k\|^2},    (51)

\Lambda^\tau = \lambda^\tau H^\top H + I + \frac{\gamma^\tau}{\sigma_0^2} \sum_{ik} B_{ik}^{\tau\top} B_{ik}^\tau,    (52)

and

\eta^\tau = \lambda^\tau K^\top (K x^\tau - y) + \frac{1}{\sigma_0^2} \sum_{ik} F_{ik}^{\tau\top} (F_{ik}^\tau x^\tau - \mu)\, \hat w_i^\tau \exp\Big( -\frac{|F_{ik}^\tau x^\tau - \mu|^2}{2(\sigma^2 + \sigma_1^2)} \Big).    (53)

In stage \tau, the gradient equals

\frac{\partial x^{\tau+1}}{\partial \Theta^\tau} = (\Lambda^\tau)^{-1} \Big( \frac{\partial \Lambda^\tau}{\partial \Theta^\tau} (\Lambda^\tau)^{-1} \eta^\tau - \frac{\partial \eta^\tau}{\partial \Theta^\tau} \Big).    (54)

Hence, given the loss function for stage \tau,

L(\Theta^\tau) = \tfrac{1}{2} \big\| C^{\tau+1} \big( x^{\tau+1} - x_{GT} \big) \big\|^2,    (55)
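The stage-gradient identity of Eq. (54) can be verified numerically: for an update of the form x_new(t) = x − Λ(t)⁻¹ η(t), its derivative in a parameter t is Λ⁻¹(∂Λ/∂t Λ⁻¹ η − ∂η/∂t). The matrices Λ(t) and vector η(t) below are small synthetic stand-ins, not GradNet's actual operators.

```python
import numpy as np

# Finite-difference check of the implicit-update gradient (Eq. 54):
# d/dt [x0 - Lam(t)^{-1} eta(t)] = Lam^{-1} (dLam/dt Lam^{-1} eta - deta/dt).
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # SPD base
B = rng.standard_normal((n, n)); B = B @ B.T                   # SPD direction
c, d = rng.standard_normal(n), rng.standard_normal(n)
x0 = rng.standard_normal(n)

Lam = lambda t: A + t * B          # stand-in for Lambda^tau(Theta)
eta = lambda t: c + t * d          # stand-in for eta^tau(Theta)
x_new = lambda t: x0 - np.linalg.solve(Lam(t), eta(t))

t, h = 0.3, 1e-6
Li = np.linalg.inv(Lam(t))
analytic = Li @ (B @ (Li @ eta(t)) - d)                 # Eq. (54) with dLam/dt = B
numeric = (x_new(t + h) - x_new(t - h)) / (2 * h)       # central difference
assert np.allclose(analytic, numeric, atol=1e-5)
```

This is the same matrix-calculus step (d/dt Λ⁻¹ = −Λ⁻¹ (dΛ/dt) Λ⁻¹) that produces Eq. (54) and, after multiplying by the loss gradient, Eq. (56).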

we can get the gradient of the loss function with respect to \Theta^\tau as

\frac{\partial L(\Theta^\tau)}{\partial \Theta^\tau} = \frac{\partial L(\Theta^\tau)}{\partial x^{\tau+1}} \frac{\partial x^{\tau+1}}{\partial \Theta^\tau} = P^{\tau+1} \Big( \frac{\partial \Lambda^\tau}{\partial \Theta^\tau} (\Lambda^\tau)^{-1} \eta^\tau - \frac{\partial \eta^\tau}{\partial \Theta^\tau} \Big) = -P^{\tau+1} \Big( \frac{\partial \eta^\tau}{\partial \Theta^\tau} + \frac{\partial \Lambda^\tau}{\partial \Theta^\tau} (x^{\tau+1} - x^\tau) \Big),    (56)

where we define P^{\tau+1} = (x^{\tau+1} - x_{GT})^\top (C^{\tau+1})^\top C^{\tau+1} (\Lambda^\tau)^{-1}. Now we can obtain the derivative with respect to each parameter to be learned, i.e., \sigma, \gamma^\tau, \hat w_i^\tau, and f_i^\tau. To calculate \partial L(\Theta^\tau)/\partial\sigma, we first take the derivative \partial L(\Theta^\tau)/\partial\lambda^\tau, after which \partial\lambda^\tau/\partial\sigma can easily be calculated; the chain rule then yields \partial L(\Theta^\tau)/\partial\sigma. We have

\frac{\partial L(\Theta^\tau)}{\partial \lambda^\tau} = -P^{\tau+1} \big( K^\top (K x^\tau - y) + H^\top H (x^{\tau+1} - x^\tau) \big),    (57)

\frac{\partial \lambda^\tau}{\partial \sigma} = -\frac{2\sigma N M \|k\|^2}{\big( \|y - K x^\tau\|^2 + M\sigma^2\|k\|^2 \big)^2}.    (58)

Hence, we get

\frac{\partial L(\Theta^\tau)}{\partial \sigma} = \frac{\partial L(\Theta^\tau)}{\partial \lambda^\tau} \frac{\partial \lambda^\tau}{\partial \sigma} - P^{\tau+1} \frac{1}{\sigma_0^2} \sum_{ik} F_{ik}^{\tau\top} (F_{ik}^\tau x^\tau - \mu)\, \hat w_i^\tau \exp\Big( -\frac{|F_{ik}^\tau x^\tau - \mu|^2}{2(\sigma^2+\sigma_1^2)} \Big) \frac{|F_{ik}^\tau x^\tau - \mu|^2\, \sigma}{(\sigma^2 + \sigma_1^2)^2},    (59)

\frac{\partial L(\Theta^\tau)}{\partial \gamma^\tau} = -P^{\tau+1} \frac{1}{\sigma_0^2} \sum_{ik} B_{ik}^{\tau\top} B_{ik}^\tau (x^{\tau+1} - x^\tau),    (60)

\frac{\partial L(\Theta^\tau)}{\partial \hat w_i^\tau} = -P^{\tau+1} \frac{1}{\sigma_0^2} \sum_k F_{ik}^{\tau\top} (F_{ik}^\tau x^\tau - \mu) \exp\Big( -\frac{|F_{ik}^\tau x^\tau - \mu|^2}{2(\sigma^2+\sigma_1^2)} \Big).    (61)

We omit the derivation of \partial L(\Theta^\tau)/\partial f_i^\tau here. Since our work and shrinkage fields [5] make use of filters in the same way, the derivation is analogous; for details we refer to the supplementary material of [5].

Joint Training. After training each stage separately, we perform a joint training similar to shrinkage fields [5] and the diffusion network [4], to which we refer for more details.

E. Additional Experimental Results

In Tables 7, 8, and 9 we show additional results with both the PSNR and SSIM metrics, as well as intermediate results of our GradNet (after the 1st and the 4th stage). We also show two more visual results in Figs. 6 and 7 at the 2% and 1% noise levels, corresponding to \sigma = 5.10 and 2.55. We can see that GradNet removes more blur in the flower region in Fig. 6. In the 3rd row of Fig. 7 we also show the globally contrast-adjusted difference between each reconstruction and the ground truth. We can see that GradNet compares favorably to EPLL on the dataset of Sun et al. Our noise-adaptive formulation is also able to deal with colored (spatially correlated) noise.
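The closed-form derivative in Eq. (58) is easy to double-check numerically. The constants N, M, the residual R = ‖y − Kx‖², and ‖k‖² below are arbitrary positive placeholders, not values from any experiment.

```python
import numpy as np

# Finite-difference check of Eq. (58):
# lambda(sigma) = N / (R + M sigma^2 k2)  with R = ||y - Kx||^2, k2 = ||k||^2,
# d lambda / d sigma = -2 sigma N M k2 / (R + M sigma^2 k2)^2.
N, M, R, k2 = 256.0, 256.0, 1.7, 0.2

lam = lambda s: N / (R + M * s ** 2 * k2)
dlam = lambda s: -2 * s * N * M * k2 / (R + M * s ** 2 * k2) ** 2

s, h = 0.3, 1e-6
numeric = (lam(s + h) - lam(s - h)) / (2 * h)   # central difference
assert abs(numeric / dlam(s) - 1) < 1e-6
```

The same pattern (symbolic derivative vs. central difference) can be used to sanity-check Eqs. (59)–(61) when implementing the greedy training stage.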
We generate different amounts of white Gaussian noise, 2%, 3%, and 4% (i.e., \sigma = 5.10, 7.65, 10.2), convolve these noise images with a 3×3 uniform filter to make the noise spatially correlated, and then finally add them to the blurry image. Experiments are performed with 32 test images from [6]. The results in Table 6 show that our noise-adaptive formulation is robust to colored noise.

                         PSNR                          SSIM
Method                \sigma=5.10  \sigma=7.65  \sigma=10.2    \sigma=5.10  \sigma=7.65  \sigma=10.2
FD [3] (non-blind)    8.075        7.88         6.85           0.837        0.88         0.795
EPLL [37] + NE        7.35         4.099        .683           0.709        0.556        0.444
EPLL [37] + NA        3.356        8.775        6.54           0.900        0.807        0.687
TV-L2 + NA            30.88        9.086        7.489          0.879        0.80         0.738
BD [6]                9.00         6.495        4.675          0.789        0.685        0.60
GradNet 7S            3.04         9.           6.88           0.903        0.840        0.74

Table 6. Average PSNR (dB) and SSIM on 32 images from [6] for three different colored noise levels.
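The colored-noise construction described above can be sketched as follows: white Gaussian noise is box-filtered with a 3×3 uniform kernel, which correlates neighboring pixels. The 1/9 normalization, image size, and circular boundary handling are illustrative assumptions, since the text does not specify them.

```python
import numpy as np

# Sketch: white Gaussian noise -> 3x3 uniform (box) filter -> colored noise.
rng = np.random.default_rng(0)
sigma = 5.10                                  # 2% of 255
white = sigma * rng.standard_normal((256, 256))

# 3x3 box filtering via shifted sums (circular boundary for simplicity)
colored = sum(np.roll(np.roll(white, dy, axis=0), dx, axis=1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def neighbor_corr(n):
    # correlation between horizontally adjacent pixels
    return np.corrcoef(n[:, :-1].ravel(), n[:, 1:].ravel())[0, 1]

assert abs(neighbor_corr(white)) < 0.05       # white noise: ~uncorrelated
assert neighbor_corr(colored) > 0.5           # box-filtered: strongly correlated
```

Two 3×3 windows shifted by one pixel share 6 of their 9 taps, so the expected neighbor correlation of the filtered noise is 6/9 ≈ 0.67, which the assertion reflects.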

                           PSNR                                  SSIM
Method                  \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2    \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2
FD [3] (non-blind)      30.09   8.396   7.35    6.50       0.890   0.845   0.807   0.78
RTF [4] (\sigma=2.55)   3.355   6.337   .48     7.38       0.95    0.676   0.44    0.30
CSF [5] (non-blind)     9.954   8.6     7.84    6.698      0.88    0.80    0.779   0.750
TNRD [4] (non-blind)    8.88    8.095   7.550   7.8        0.854   0.84    0.800   0.795
TV-L2 (non-blind)       30.870  8.48    7.594   6.54       0.89    0.8     0.79    0.76
EPLL [37] (non-blind)   3.08    9.789   8.3     7.96       0.90    0.874   0.836   0.803
EPLL [37] + NE [36]     3.857   9.77    8.8     7.57       0.99    0.878   0.839   0.807
EPLL [37] + NA          3.60    30.48   8.957   7.85       0.93    0.888   0.856   0.84
TV-L2 + NA              3.050   9.35    8.08    7.6        0.894   0.843   0.8     0.798
BD [6]                  30.4    8.765   7.908   7.89       0.880   0.833   0.807   0.789
GradNet 1S              5.094   4.700   4.6     3.88       0.754   0.736   0.77    0.698
GradNet 4S              30.86   8.97    7.363   6.753      0.869   0.786   0.75    0.734
GradNet 7S              3.43    8.878   7.55    6.960      0.9     0.84    0.797   0.783

Table 7. Average PSNR (dB) and SSIM on 32 test images from [6].

                           PSNR                                  SSIM
Method                  \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2    \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2
FD [3] (non-blind)      30.789  8.898   7.863   7.38       0.85    0.787   0.744   0.74
EPLL [37] (non-blind)   3.049   9.60    8.5     7.338      0.880   0.807   0.758   0.7
CSF [5] (non-blind)     30.875  8.604   7.647   6.969      0.853   0.75    0.78    0.68
TNRD [4] (non-blind)    30.06   8.794   8.040   7.544      0.844   0.790   0.750   0.739
EPLL [37] + NE          3.0     9.600   8.49    7.340      0.878   0.807   0.758   0.74
EPLL [37] + NA          3.8     30.077  8.770   7.806      0.88    0.86    0.775   0.736
TV-L2 + NA              30.07   8.587   7.600   6.886      0.853   0.793   0.75    0.78
GradNet 1S              7.000   6.459   5.99    5.595      0.73    0.706   0.684   0.665
GradNet 4S              30.430  8.67    7.0     6.675      0.85    0.76    0.67    0.650
GradNet 7S              3.745   9.30    8.044   7.540      0.873   0.798   0.750   0.733

Table 8. Average PSNR (dB) and SSIM on 640 test images from [9].

                           PSNR                                  SSIM
Method                  \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2    \sigma=2.55  \sigma=5.10  \sigma=7.65  \sigma=10.2
FD [3] (non-blind)      4.436   3.40    .64     .065       0.664   0.577   0.534   0.49
EPLL [37] (non-blind)   5.377   5.53    .545    .905       0.7     0.590   0.5     0.476
RTF [4] (\sigma=2.55)   5.70    3.454   9.833   6.939      0.73    0.607   0.403   0.80
CSF [5] (non-blind)     4.734   3.603   .88     .440       0.693   0.6     0.558   0.5
TNRD [4] (non-blind)    4.74    3.76    3.7     0.865      0.690   0.63    0.589   0.550
EPLL [37] + NE [36]     5.360   3.53    .545    .904       0.708   0.588   0.50    0.478
EPLL [37] + NA          5.570   3.90    .9      .7         0.74    0.608   0.537   0.493
TV-L2 + NA              4.6     3.65    .896    .336       0.687   0.607   0.546   0.504
GradNet 1S              .94     .695    .455    .34        0.506   0.485   0.464   0.447
GradNet 4S              4.665   3.56    .874    .39        0.687   0.60    0.548   0.5
GradNet 7S              5.57    4.7     3.464   .94        0.73    0.653   0.595   0.55

Table 9. Average PSNR (dB) and SSIM on 50 test images from the Berkeley segmentation dataset with large blurs.

(a) Blurry input (b) Ground truth (c) TV-L2 (d) FD [3] (e) EPLL [37] (f) GradNet 1 stage (g) GradNet 4 stages (h) GradNet 7 stages

Figure 6. Results for the 2% noise case. PSNR results are also shown at the top left corner of the estimated images.

(a) Blurry input (b) Ground truth (c) TV-L2 (d) FD [3] (e) EPLL [37] (f) GradNet 1 stage (g) GradNet 4 stages (h) GradNet 7 stages (i) Difference between EPLL and GT (j) Difference between FD and GT (k) Difference between TV and GT (l) Difference between GradNet and GT

Figure 7. Results for the 1% noise case (best viewed on screen). PSNR results are also shown at the top left corner of the estimated images.