SOS Boosting of Image Denoising Algorithms


SOS Boosting of Image Denoising Algorithms
Yaniv Romano and Michael Elad, The Technion - Israel Institute of Technology, Haifa 32000, Israel.
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Program (ERC Grant agreement no. 320649) and from the Intel Collaborative Research Institute for Computational Intelligence.

Leading Image Denoising Methods
Leading denoisers are built upon powerful patch-based (local) image models:
- Non-Local Means (NLM): exploits self-similarity within natural images
- K-SVD: sparse representation modeling of image patches
- BM3D: combines a sparsity prior with non-local self-similarity
- Kernel regression: offers a local directional filter
- EPLL: exploits a GMM model of the image patches
Today we present a way to improve such state-of-the-art image denoising methods, simply by applying the original algorithm as a black box several times.

Background

Boosting Methods for Denoising
Improved results can be achieved by processing the residual/method-noise image:
- Noisy image: $y$
- Denoised image: $\hat{x} = f(y)$
- Method noise: $y - \hat{x}$

Processing the Residual Image
Twicing [Tukey ('77), Charest et al. ('06)]:
$$\hat{x}^{k+1} = \hat{x}^k + f\left(y - \hat{x}^k\right)$$
Method-noise whitening [Romano & Elad ('13)]: recovering the "stolen" content from the method noise, using the same basis elements that were chosen to represent the initially denoised patches.
TV denoising using the Bregman distance [Bregman ('67), Osher et al. ('05)]:
$$\hat{x}^{k+1} = f\left(y + \sum_{i=1}^{k}\left(y - \hat{x}^i\right)\right)$$

Boosting Methods
Diffusion [Perona & Malik ('90), Coifman et al. ('06), Milanfar ('12)]: removes the noise leftovers that are found in the denoised image,
$$\hat{x}^{k+1} = f\left(\hat{x}^k\right)$$
SAIF [Talebi et al. ('12)]: automatically chooses the local improvement mechanism, either diffusion or twicing.
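
For reference, minimal sketches of the two classical recursions above (twicing and diffusion), written against a black-box denoiser interface denoise(image, sigma); this interface is an assumption for illustration only.

```python
def twicing(y, denoise, sigma, n_iters=5):
    """Twicing: x^{k+1} = x^k + f(y - x^k), feeding the method noise back."""
    x = denoise(y, sigma)
    for _ in range(n_iters):
        x = x + denoise(y - x, sigma)
    return x

def diffusion(y, denoise, sigma, n_iters=5):
    """Diffusion: x^{k+1} = f(x^k), repeatedly filtering the previous estimate."""
    x = denoise(y, sigma)
    for _ in range(n_iters):
        x = denoise(x, sigma)
    return x
```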

Reducing the Local/Global Gap
EPLL [Zoran & Weiss ('11), Sulam & Elad ('14)] treats a major shortcoming of patch-based methods: the gap between the local patch processing and the global need for a whole restored image. It does so by encouraging the patches of the final image (i.e., after patch aggregation) to comply with the local prior. In practice, this amounts to iterated denoising with a diminishing variance:
I. Denoise the patches of $\hat{x}^k$
II. Obtain $\hat{x}^{k+1}$ by averaging the overlapping patches

SOS Boosting

Strengthen - Operate - Subtract (SOS) Boosting
Given any denoiser, how can we improve its performance?
I. Strengthen the signal: add the previous result to the noisy image
II. Operate the denoiser on the strengthened image
III. Subtract the previous estimation from the outcome
SOS formulation:
$$\hat{x}^{k+1} = f\left(y + \hat{x}^k\right) - \hat{x}^k$$
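
To make the recursion concrete, here is a minimal sketch of the SOS iteration in Python, treating the denoiser as a black box. The denoise(image, sigma) interface and the median-filter stand-in are assumptions for illustration; any of the algorithms above can be plugged in unchanged.

```python
import numpy as np
from scipy.ndimage import median_filter

def sos_boost(y, denoise, sigma, n_iters=10):
    """SOS boosting: x^{k+1} = f(y + x^k) - x^k, with x^0 = f(y).

    y        -- noisy image (2D numpy array)
    denoise  -- black-box denoiser, called as denoise(image, sigma)
    sigma    -- noise level passed to the denoiser (the strengthened image
                y + x^k has a higher effective noise level, so in practice
                this parameter may need to be re-tuned, as discussed later)
    """
    x = denoise(y, sigma)              # plain denoising as initialization
    for _ in range(n_iters):
        x = denoise(y + x, sigma) - x  # strengthen, operate, subtract
    return x

# Toy usage with a stand-in denoiser (median filter).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile(np.linspace(0, 255, 64), (64, 1))
    noisy = clean + 25 * rng.standard_normal(clean.shape)
    toy_denoiser = lambda img, sigma: median_filter(img, size=3)
    boosted = sos_boost(noisy, toy_denoiser, sigma=25, n_iters=5)
```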

Strengthen - Operate - Subtract Boosting
An improvement is obtained since the strengthened image has a better SNR: $\mathrm{SNR}(y + \hat{x}) \geq \mathrm{SNR}(y)$. In the ideal case, where $\hat{x} = x$, we get $\mathrm{SNR}(y + x) = 2\,\mathrm{SNR}(y) > \mathrm{SNR}(y)$.
We suggest strengthening the underlying signal, rather than:
- Adding the residual back to the noisy image: twicing converges to the noisy image
- Filtering the previous estimate over and over again: diffusion could lead to over-smoothing, converging to a piecewise-constant image

Image Denoising: A Matrix Formulation
In order to study the convergence of the SOS recursion, we represent the denoiser in its matrix form, $\hat{x} = f(y) = Wy$.
The properties of $W$:
- Kernel-based methods (e.g., the Bilateral filter, NLM, Kernel Regression) can be approximated as row-stochastic positive-definite matrices [Milanfar ('13)]
- Such a $W$ has eigenvalues in the range $[0, 1]$
What about sparsity-based methods?
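
As a quick sanity check of these properties, the sketch below builds a tiny NLM/Bilateral-style filter matrix on a 1D signal and inspects it numerically. The Gaussian kernel, bandwidth h, and toy signal are arbitrary illustrative choices, not part of the original slides.

```python
import numpy as np

def nlm_like_filter_matrix(y, h=10.0):
    """Row-stochastic filter matrix with Gaussian weights on pixel-value
    differences (a crude 1D stand-in for NLM / the Bilateral filter)."""
    diff = y[:, None] - y[None, :]
    K = np.exp(-(diff ** 2) / (2 * h ** 2))      # symmetric, positive-definite kernel
    return K / K.sum(axis=1, keepdims=True)      # normalize each row to sum to 1

y = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 51.0])   # two flat regions
W = nlm_like_filter_matrix(y)

print(np.allclose(W.sum(axis=1), 1.0))           # rows sum to 1, i.e. W 1 = 1
print(np.sort(np.linalg.eigvals(W).real))        # eigenvalues lie in [0, 1]
```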

The Sparsity Model: The Basics
We assume the existence of a dictionary $D \in \mathbb{R}^{d \times n}$ whose columns are the atom signals. Signals are modeled as sparse linear combinations of the dictionary atoms: $x = D\alpha$, where $\alpha$ is sparse, meaning that it is assumed to contain mostly zeros.
The computation of $\alpha$ from $x$ (or its noisy version) is called sparse coding. The OMP is a popular sparse-coding technique, especially for low-dimensional signals.
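
As an illustration of sparse coding, here is a compact OMP implementation applied to a random dictionary; the dictionary size, sparsity level, and noise level are assumptions chosen only for this toy example.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily find a k-sparse alpha with y ~ D alpha.
    D has unit-norm columns (atoms); y is the signal to be sparse-coded."""
    residual = y.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # re-fit all selected coefficients by least squares on the support
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    alpha[support] = coeffs
    return alpha

# Toy usage: recover a 3-sparse code from a noisy measurement.
rng = np.random.default_rng(0)
d, n = 32, 64
D = rng.standard_normal((d, n))
D /= np.linalg.norm(D, axis=0)                    # normalize the atoms
alpha_true = np.zeros(n)
alpha_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
y = D @ alpha_true + 0.05 * rng.standard_normal(d)
alpha_hat = omp(D, y, k=3)
```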

K-SVD Image Denoising [Elad & Aharon ('06)]
The algorithm alternates between updating the dictionary (using K-SVD) and denoising each patch of the noisy image (using OMP). Each denoised patch is a linear combination of a few atoms:
$$\hat{p}_i = D_{S_i}\hat{\alpha}_i = D_{S_i}\left(D_{S_i}^T D_{S_i}\right)^{-1} D_{S_i}^T R_i y,$$
where
$$\hat{\alpha}_i = \arg\min_z \left\| D_{S_i} z - R_i y \right\|_2^2,$$
$R_i$ extracts the $i$-th patch from $y$, and $S_i$ denotes the support (the set of atoms) chosen by the OMP.

K-SVD Image Denoising [Elad & Aharon ('06)]
After all patches are denoised, the reconstructed image is obtained by averaging the overlapping denoised patches together with the noisy image:
$$\hat{x} = \arg\min_x \mu\left\| x - y \right\|_2^2 + \sum_i \left\| \hat{p}_i - R_i x \right\|_2^2,$$
whose closed-form solution is linear in $y$:
$$\hat{x} = \left(\mu I + \sum_i R_i^T R_i\right)^{-1}\left(\mu I + \sum_i R_i^T D_{S_i}\left(D_{S_i}^T D_{S_i}\right)^{-1} D_{S_i}^T R_i\right) y = Wy$$
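
The sketch below mimics this pipeline on a 1D signal: overlapping patches are "denoised" by a toy per-patch routine (a DCT hard-threshold, a hypothetical stand-in for OMP over a trained K-SVD dictionary) and then aggregated exactly as in the closed form above. Since the matrix sum of R_i^T R_i is diagonal, the aggregation reduces to a per-sample weighted average.

```python
import numpy as np
from scipy.fft import dct, idct

def toy_patch_denoiser(p, k=2):
    """Hypothetical per-patch denoiser: keep only the k largest DCT coefficients
    (a stand-in for OMP over a learned dictionary)."""
    c = dct(p, norm='ortho')
    keep = np.argsort(np.abs(c))[-k:]
    c_sparse = np.zeros_like(c)
    c_sparse[keep] = c[keep]
    return idct(c_sparse, norm='ortho')

def aggregate_patches(y, denoise_patch, patch_size=8, mu=0.5):
    """Global aggregation x = (mu*I + sum_i R_i^T R_i)^(-1) (mu*y + sum_i R_i^T p_i).
    sum_i R_i^T R_i is diagonal (it counts how many patches cover each sample),
    so the closed form becomes a per-sample weighted average."""
    numer = mu * y.astype(float).copy()            # accumulates mu*y + sum R_i^T p_i
    denom = mu * np.ones(len(y))                   # accumulates the diagonal weights
    for i in range(len(y) - patch_size + 1):
        p_hat = denoise_patch(y[i:i + patch_size])  # locally denoised patch
        numer[i:i + patch_size] += p_hat
        denom[i:i + patch_size] += 1.0
    return numer / denom

rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 4 * np.pi, 128))
noisy = clean + 0.3 * rng.standard_normal(128)
denoised = aggregate_patches(noisy, toy_patch_denoiser)
```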

K-SVD: A Matrix Formulation
$W$ is a sum of projection matrices, and has the following properties:
- Symmetric: $W^T = W$ (by assuming a periodic boundary condition)
- Positive definite: $W \succ 0$
- Minimal eigenvalue $\lambda_{\min}(W) \geq c$, where $c = \frac{\mu}{\mu + n} > 0$; $\mu$ is the global averaging constant and $n$ is the patch size
- $W\mathbf{1} = \mathbf{1}$, thus $\lambda = 1$ is an eigenvalue corresponding to the eigenvector $\mathbf{1}$
- The spectral radius $\|W\|_2 = 1$, thus $\lambda_{\max}(W) = 1$
- $W$ may have negative entries, which violates the classic definition of row-stochasticity
- As a result, $\|I - W\|_2 \leq 1 - c < 1$

Convergence Study
SOS formulation: $\hat{x}^k = W_k\left(y + \hat{x}^{k-1}\right) - \hat{x}^{k-1}$
Given:
- $\hat{x}^*$, the denoised image obtained for a large $k$
- $W_k$, $k = 1, 2, \ldots$, a series of filter matrices
The error $e^k = \hat{x}^k - \hat{x}^*$ of the recursive function is expressed by
$$e^k = \left(W_k - I\right) e^{k-1} + \left(W_k - W^*\right)\left(y + \hat{x}^*\right)$$
By assuming $W_k = W^*$ from a certain $k$ (which comes up in practice):
$$e^k = \left(W^* - I\right) e^{k-1}$$

Convergence Study
The SOS recursive function converges if $\|I - W\|_2 < 1$. This holds both for kernel-based methods (the Bilateral filter, NLM, Kernel Regression) and for sparsity-based methods (K-SVD).
For most denoising algorithms, the SOS boosting is therefore guaranteed to converge, to
$$\hat{x}^* = \left(I + \left(I - W\right)\right)^{-1} W y$$
The condition for convergence is relaxed in the generalized SOS version.
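
A numerical check of this claim, under the assumption of a fixed synthetic filter matrix W (here a symmetric matrix with eigenvalues in [0.2, 1], mimicking the spectral properties claimed above): iterating the SOS recursion indeed converges to the closed-form steady state.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic filter matrix: symmetric PSD with eigenvalues in [0.2, 1].
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q @ np.diag(np.linspace(0.2, 1.0, n)) @ Q.T

y = rng.standard_normal(n)

# Closed-form steady state: x* = (I + (I - W))^(-1) W y = (2I - W)^(-1) W y
x_star = np.linalg.solve(2 * np.eye(n) - W, W @ y)

# Run the SOS recursion x^k = W (y + x^{k-1}) - x^{k-1}
x = W @ y                                   # x^0 = W y
for _ in range(100):
    x = W @ (y + x) - x

print(np.linalg.norm(np.eye(n) - W, 2))     # ||I - W||_2 = 0.8 < 1
print(np.linalg.norm(x - x_star))           # tiny: the iteration reached x*
```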

Generalization
We introduce two parameters that modify:
- The steady-state outcome
- The requirements for convergence (the admissible range of eigenvalues)
- The rate of convergence
The first parameter, $\rho$, controls the signal emphasis and affects the steady-state outcome:
$$\hat{x}^{k+1} = f\left(y + \rho\hat{x}^k\right) - \rho\hat{x}^k$$

Generalization
The second parameter, $\tau$, controls the rate of convergence. The steady-state outcome satisfies
$$\hat{x}^* = f\left(y + \rho\hat{x}^*\right) - \rho\hat{x}^*$$
Multiplying both sides by $\tau$, and adding $\hat{x}^* - \hat{x}^*$ to the right-hand side (which does not affect the steady-state result):
$$\tau\hat{x}^* = \tau f\left(y + \rho\hat{x}^*\right) - \tau\rho\hat{x}^* + \hat{x}^* - \hat{x}^*$$
Rearranging the equation above, and replacing $\hat{x}^*$ with $\hat{x}^k$, leads to the generalized SOS recursive function:
$$\hat{x}^{k+1} = \tau f\left(y + \rho\hat{x}^k\right) - \left(\tau\rho + \tau - 1\right)\hat{x}^k$$
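
A minimal sketch of the generalized recursion, with rho and tau exposed as parameters; the black-box interface denoise(image, sigma) is the same illustrative assumption used earlier, and rho = tau = 1 recovers the plain SOS iteration.

```python
def sos_boost_generalized(y, denoise, sigma, rho=1.0, tau=1.0, n_iters=10):
    """Generalized SOS: x^{k+1} = tau*f(y + rho*x^k) - (tau*rho + tau - 1)*x^k.
    rho controls the signal emphasis (and the steady-state outcome),
    tau controls the rate of convergence."""
    x = denoise(y, sigma)
    for _ in range(n_iters):
        x = tau * denoise(y + rho * x, sigma) - (tau * rho + tau - 1.0) * x
    return x
```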

Generalization
By assuming $W_k = W$, the error $e^k = \hat{x}^k - \hat{x}^*$ is given by
$$e^k = \left(\tau\rho W - \left(\tau\rho + \tau - 1\right)I\right) e^{k-1}$$
As a result, $\rho$ and $\tau$ control the rate of convergence. We suggest a closed-form expression for the optimal $(\rho, \tau)$ setting: given $\rho$, what is the best $\tau$, leading to the fastest convergence?
[Plot: the largest eigenvalue magnitude of the error's transition matrix as a function of $\rho$ and $\tau$.]
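
The slides refer to a closed-form choice; a numerically equivalent way to see it is to scan tau and pick the value that minimizes the spectral radius of the transition matrix tau*rho*W - (tau*rho + tau - 1)*I. The sketch below performs this scan for a synthetic W (an assumption for illustration), reproducing the qualitative behavior of the plot.

```python
import numpy as np

def best_tau(W, rho, taus=np.linspace(0.1, 2.0, 200)):
    """Pick tau minimizing the spectral radius of the error transition matrix
    tau*rho*W - (tau*rho + tau - 1)*I, for a fixed rho."""
    n = W.shape[0]
    radii = []
    for tau in taus:
        T = tau * rho * W - (tau * rho + tau - 1.0) * np.eye(n)
        radii.append(np.max(np.abs(np.linalg.eigvals(T))))
    best = int(np.argmin(radii))
    return taus[best], radii[best]

# Synthetic symmetric W with eigenvalues in [0.2, 1] (illustrative only).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
W = Q @ np.diag(np.linspace(0.2, 1.0, 30)) @ Q.T

tau_star, radius = best_tau(W, rho=1.0)
print(tau_star, radius)   # fastest convergence factor for this rho
```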

Experiments

Verifying the Convergence Study
We apply the K-SVD on a noisy House image ($\sigma = 25$) with a fixed $W$:
- $\tau$ is the outcome of our closed-form expression, achieving the fastest convergence
- $\gamma$ is our analytic expression that bounds the error (the largest eigenvalue magnitude of the error's transition matrix)
[Plot: $\log_{10}\|e^k\|$ versus iteration number.]

Convergence
We apply the K-SVD on a noisy House image ($\sigma = 25$) with a fixed $W$. The SOS boosting converges faster and improves the PSNR.
[Plots: $\log_{10}\|e^k\|$ versus iteration number, and PSNR of $\hat{x}^k$ [dB] versus iteration number.]

Results
We successfully boost several state-of-the-art denoising algorithms: K-SVD, NLM, BM3D, and EPLL, without any modifications, simply by applying the original software as a black box. We manually tuned two parameters:
- $\rho$, the signal-emphasis factor
- $\sigma$, the noise level given as an input to the denoiser, since the noise level of $y + \rho\hat{x}^k$ is higher than that of $y$

Quantitative Comparison
Average boost in PSNR* [dB] over 5 images (higher is better):

  Noise std σ | K-SVD | NLM  | BM3D | EPLL
  10          | 0.13  | 0.44 | 0.01 | 0.13
  20          | 0.22  | 0.34 | 0.02 | 0.25
  25          | 0.26  | 0.41 | 0.03 | 0.26
  50          | 0.77  | 0.30 | 0.07 | 0.30
  75          | 1.26  | 0.56 | 0.11 | 0.32
  100         | 0.81  | 0.36 | 0.14 | 0.27

*PSNR = $20\log_{10}\left(255/\sqrt{\mathrm{MSE}}\right)$

Visual Comparison: K-SVD
Original K-SVD result, σ = 25: 29.06 dB

Visual Comparison: K-SVD
SOS K-SVD result, σ = 25: 29.41 dB

Visual Comparison: EPLL
Original EPLL results, σ = 25: Foreman 32.44 dB, House 32.07 dB

Visual Comparison: EPLL
SOS EPLL results, σ = 25: Foreman 32.78 dB, House 32.38 dB

Reducing the Local-Global Gap

Reaching a Consensus
It turns out that the SOS boosting treats a major shortcoming of many patch-based methods: the local processing of patches versus the global need for a whole denoised result. We define the local disagreements by
Disagreement patch = (locally, independently denoised patch) - (globally averaged patch)
The disagreements:
- Naturally exist, since each noisy patch is denoised independently
- Are based on the intermediate results

Sharing the Disagreement
Inspired by the consensus-and-sharing problem from game theory:
- There are several agents, each of which aims to minimize its individual cost (i.e., representing its noisy patch sparsely)
- These agents affect a shared objective term, describing the overall goal (i.e., obtaining the globally denoised image)
Imitating this concept, we suggest sharing the disagreements among the patches.
[Diagram: noisy image → noisy patches, plus the per-patch disagreement → patch-based denoising → patch averaging → estimated image.]

Connection to SOS Boosting
Interestingly, for a fixed filter matrix $W$, sharing the disagreement and the SOS boosting are equivalent:
$$\hat{x}^{k+1} = W\left(y + \hat{x}^k\right) - \hat{x}^k$$
The connection to the SOS is far from trivial, because:
- The SOS is blind to the intermediate results (the independently denoised patches, before patch averaging)
- The intermediate results are crucial for the sharing-the-disagreement approach
Consequently, the SOS boosting reduces the local/global gap.

Graph-Based Interpretation

Graph-Based Analysis
Essentially, the filter matrix $W$ is an adaptive filter, where the $i$-th denoised pixel is obtained by
$$\hat{x}_i = \sum_j W_{i,j}\, y_j$$
$W_{i,j}$ measures the similarity between the $i$-th and $j$-th pixels: a large value implies large similarity, a small value implies small similarity.
- For kernel-based methods (NLM, Bilateral, LARK): $W_{i,j}$ is a function of the Euclidean distance between the pixels
- For sparsity-based methods (K-SVD): $W_{i,j}$ is a function of the dictionary atoms that were chosen to represent patch $j$, i.e., it measures the affinity between pixels through the dictionary

Graph Laplacian
The normalized graph Laplacian can be defined as
$$L = I - W$$
It encapsulates the structure of the underlying signal:
- Most of the image content is represented by the eigenvectors that correspond to the small eigenvalues
- Most of the noise is represented by the eigenvectors that correspond to the large eigenvalues
What can we do with $L$? Regularize the inverse problem by encouraging similar pixels to remain similar in the final estimate.
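
To illustrate why L = I - W acts as a regularizer, the sketch below builds L from a kernel-type filter matrix and compares the quadratic form x^T L x for a smooth signal versus a noisy one; the Gaussian kernel, bandwidth, and signals are arbitrary illustrative assumptions.

```python
import numpy as np

def filter_matrix(y, h=0.3):
    """Row-stochastic filter W built from a Gaussian kernel on pixel values."""
    K = np.exp(-((y[:, None] - y[None, :]) ** 2) / (2 * h ** 2))
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
smooth = np.sin(2 * np.pi * t)                 # smooth underlying signal
noisy = smooth + 0.3 * rng.standard_normal(t.size)

W = filter_matrix(noisy)                       # graph built from the noisy data
L = np.eye(t.size) - W                         # graph Laplacian L = I - W

# The Laplacian quadratic form penalizes variation across similar
# (strongly connected) pixels: small for the smooth signal, larger
# for the noisy one.
print(smooth @ (L @ smooth))
print(noisy @ (L @ noisy))
```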

Graph Laplacian Regularization
The regularization can be defined as [Elmoataz et al. ('08), Bougleux et al. ('09)]
$$\hat{x} = \arg\min_x \left\| x - y \right\|_2^2 + \rho\, x^T L x,$$
which seeks an estimate that is close to the noisy version while promoting similar pixels to remain similar. Another option is to integrate the filter also into the data-fidelity term, using the adaptive filter as a weight matrix [Kheradmand & Milanfar ('13)]:
$$\hat{x} = \arg\min_x \left(x - y\right)^T W \left(x - y\right) + \rho\, x^T L x$$

Graph Laplacian Regularization
Another natural option is to minimize the following cost function, which seeks an estimate that is close to the denoised version:
$$\hat{x} = \arg\min_x \left\| x - Wy \right\|_2^2 + \rho\, x^T L x$$
Its closed-form solution is the steady-state outcome of the SOS (with signal-emphasis parameter $\rho$):
$$\hat{x}^* = \left(I + \rho\left(I - W\right)\right)^{-1} W y = \left(I + \rho L\right)^{-1} W y$$
Thus, the SOS boosting acts as a graph Laplacian regularizer. We have also expressed the previous cost functions in the SOS language; more can be found in our paper.
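
For completeness, a one-line check (not from the slides) that the closed form above is indeed the minimizer, assuming a symmetric L: setting the gradient of the cost to zero gives

```latex
% Minimizing  ||x - Wy||_2^2 + rho * x^T L x  with symmetric L = I - W:
\nabla_x \left( \|x - Wy\|_2^2 + \rho\, x^T L x \right)
  = 2\,(x - Wy) + 2\rho L x = 0
\;\Longrightarrow\;
(I + \rho L)\,\hat{x} = Wy
\;\Longrightarrow\;
\hat{x} = (I + \rho L)^{-1} W y .
```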

Conclusions
The SOS boosting algorithm:
- Is easy to use: in practice, we treat the denoiser $f$ as a black box
- Is applicable to a wide range of denoising algorithms
- Is guaranteed to converge for most denoising algorithms, and thus has a straightforward stopping criterion
- Reduces the local-global gap
- Acts as a graph Laplacian regularizer
- Improves state-of-the-art methods

We are done. Thank you! Questions?