Regularization Theory

Regularization Theory: Solving the Inverse Problem of Super-Resolution with a CNN
Aditya Ganeshan, under the guidance of Dr. Ankik Kumar Giri
December 13, 2016

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Material origin
- Regularization of Inverse Problems, a book by Dr. Heinz Werner Engl, Dr. Martin Hanke-Bourgeois and Dr. Andreas Neubauer.
- Image Super-Resolution Using Deep Convolutional Networks, a research paper by Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang, published in Computer Vision - ECCV 2014, Volume 8692.

What are Inverse Problems?

What are Inverse Problems?
Hadamard's conditions for well-posedness:
- For all admissible data, a solution must exist.
- For all admissible data, the solution is unique.
- The solution depends continuously on the data.

What are Inverse Problems?
Ill-posed problems: problems which do not satisfy all of Hadamard's conditions are called ill-posed problems. Inverse problems are mostly ill-posed.

What are Inverse Problems?
Inverse problems are concerned with determining causes for a desired or observed effect. Comparing them with Hadamard's conditions:
- They might not have a solution in the strict sense.
- The solution might not be unique.
- The solution might not depend continuously on the data.

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Generalized Inverse
Definition (2.1). Let $T : X \to Y$ be a bounded linear operator.
1. $x \in X$ is called a least-squares solution of $Tx = y$ if
$\|Tx - y\| = \inf\{\|Tz - y\| \mid z \in X\}. \quad (1)$

Generalized Inverse
Definition (2.1, continued).
2. $x \in X$ is called the best-approximate solution of $Tx = y$ if $x$ is a least-squares solution of $Tx = y$ and
$\|x\| = \inf\{\|z\| \mid z \text{ is a least-squares solution of } Tx = y\}. \quad (2)$

Generalized Inverse
Definition (2.2). The Moore-Penrose generalized inverse $T^\dagger$ of $T \in \mathcal{L}(X, Y)$ is defined as the unique linear extension of $\tilde{T}^{-1}$ to
$\mathcal{D}(T^\dagger) = \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp \quad (3)$
with
$\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp, \quad (4)$
where
$\tilde{T} := T|_{\mathcal{N}(T)^\perp} : \mathcal{N}(T)^\perp \to \mathcal{R}(T). \quad (5)$

Generalized Inverse
Theorem. Let $y \in \mathcal{D}(T^\dagger)$. Then $Tx = y$ has a unique best-approximate solution, which is given by
$x^\dagger := T^\dagger y. \quad (6)$
The set of all least-squares solutions is $x^\dagger + \mathcal{N}(T)$.
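In finite dimensions the best-approximate solution is exactly the minimum-norm least-squares solution, which standard linear-algebra routines compute directly. A small numerical sketch (the matrix and data below are illustrative assumptions, not taken from the slides):

```python
# Sketch: best-approximate solution x^dagger = T^dagger y as the minimum-norm
# least-squares solution. The matrix T and data y are illustrative assumptions.
import numpy as np

T = np.array([[1.0, 0.0],
              [0.0, 1e-4],
              [0.0, 0.0]])      # R(T) is a proper subspace of R^3
y = np.array([1.0, 2e-4, 0.5])  # last component lies in R(T)^perp, so Tx = y has no exact solution

x_dagger = np.linalg.pinv(T) @ y                 # Moore-Penrose: T^dagger y
x_lstsq, *_ = np.linalg.lstsq(T, y, rcond=None)  # minimum-norm least-squares solution

print(x_dagger)                        # approximately [1., 2.]
print(np.allclose(x_dagger, x_lstsq))  # True
```

Note that the small singular value $10^{-4}$ already amplifies perturbations in the second data component by a factor $10^4$; this instability of $T^\dagger$ is what regularization, introduced next, is designed to control.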

Regularization
Regularization is the approximation of an ill-posed problem by a family of neighboring well-posed problems. We want to find the best-approximate solution $x^\dagger = T^\dagger y$, but only noisy data $y^\delta$ is known, with
$\|y^\delta - y\| \le \delta.$

Regularization
For ill-posed problems, $T^\dagger$ is unbounded, so $T^\dagger y^\delta$ is in general not a good approximation of $x^\dagger$ (it might not even exist!). Hence, we look for an approximation $x_\alpha^\delta$ which
- depends continuously on the noisy data $y^\delta$, and
- tends to $x^\dagger$ as the noise level decreases to zero (if the regularization parameter $\alpha$ is selected appropriately).

Regularization
- As we look not for specific values of $y$, but rather for every $y \in \mathcal{D}(T^\dagger)$, we regularize the solution operator $T^\dagger$.
- A simple regularization of $T^\dagger$ is the replacement of the unbounded operator $T^\dagger$ by a parameter-dependent family $\{R_\alpha\}$ of continuous operators, taking $x_\alpha^\delta = R_\alpha y^\delta$.
- In this way we define a regularization operator for the whole collection of equations $Tx = y$, $y \in \mathcal{D}(T^\dagger)$.
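A classical example of such a family $\{R_\alpha\}$ is Tikhonov regularization, $R_\alpha = (T^*T + \alpha I)^{-1} T^*$, which is bounded for every $\alpha > 0$. A minimal numerical sketch of its behaviour (the discretized operator, exact solution and noise level are my own illustrative assumptions):

```python
# Sketch of a parameter-dependent family {R_alpha}: Tikhonov regularization
# R_alpha = (T^T T + alpha I)^{-1} T^T. The test operator, exact solution and
# noise level below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)              # exponentially decaying singular values: ill-conditioned T
T = U @ np.diag(s) @ V.T

x_true = V[:, 0] + 0.5 * V[:, 1]      # exact solution
y = T @ x_true
delta = 1e-6
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)   # ||y_delta - y|| = delta

def R_alpha(alpha, data):
    """Tikhonov regularization operator applied to the data vector."""
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ data)

for alpha in (1e-2, 1e-6, 1e-10, 1e-14):
    print(alpha, np.linalg.norm(R_alpha(alpha, y_delta) - x_true))
# Large alpha over-smooths (large bias); very small alpha amplifies the noise.
```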

Regularization
Definition (3.1). Let $T : X \to Y$ be a bounded linear operator between Hilbert spaces $X$ and $Y$, and let $\alpha_0 \in (0, +\infty]$. For every $\alpha \in (0, \alpha_0)$, let $R_\alpha : Y \to X$ be a continuous (not necessarily linear) operator. The family $\{R_\alpha\}$ is called a regularization or a regularization operator for $T^\dagger$ if, for all $y \in \mathcal{D}(T^\dagger)$, there exists a parameter choice rule $\alpha = \alpha(\delta, y^\delta)$ such that
$\lim_{\delta \to 0} \sup\{\|R_{\alpha(\delta, y^\delta)} y^\delta - T^\dagger y\| \mid y^\delta \in Y, \|y^\delta - y\| \le \delta\} = 0 \quad (7)$
holds.

Regularization
Definition (continued). Here,
$\alpha : \mathbb{R}^+ \times Y \to (0, \alpha_0) \quad (8)$
is such that
$\lim_{\delta \to 0} \sup\{\alpha(\delta, y^\delta) \mid y^\delta \in Y, \|y^\delta - y\| \le \delta\} = 0. \quad (9)$
For a specific $y \in \mathcal{D}(T^\dagger)$, a pair $(R_\alpha, \alpha)$ is called a convergent regularization method if (7) and (9) hold.

Definition (3.2). Let $\alpha$ be a parameter choice rule according to Definition 3.1. If $\alpha$ does not depend on $y^\delta$, but only on $\delta$, then we call $\alpha$ an a-priori parameter choice rule and write $\alpha = \alpha(\delta)$. Otherwise, we call it an a-posteriori parameter choice rule. (If $\alpha = \alpha(y^\delta)$ depends on $y^\delta$ only, $\alpha$ is called an error-free parameter choice rule.)

Order Optimality
The rate at which
$\|x_\alpha - x^\dagger\| \to 0$ as $\alpha \to 0, \quad (10)$
or
$\|x_{\alpha(\delta, y^\delta)}^\delta - x^\dagger\| \to 0$ as $\delta \to 0. \quad (11)$

Order Optimality
Definition (3.3). The worst-case error under the information that $\|y^\delta - y\| \le \delta$ and the a-priori information that $x \in M$ is given by
$\Delta(\delta, M, R) = \sup\{\|R y^\delta - x\| \mid x \in M, y^\delta \in Y, \|Tx - y^\delta\| \le \delta\}. \quad (12)$

Order Optimality
Convergence rates can only be obtained on subsets of $\mathcal{D}(T^\dagger)$, i.e., under a-priori assumptions on the exact data. Hence, we consider subsets of the form
$\{x \in X \mid x = Bw, \|w\| \le \rho\},$
where $B$ is a linear operator from some Hilbert space into $X$. For the choice $B = (T^*T)^\mu$ for some $\mu > 0$, we denote the set so formed by
$X_{\mu,\rho} := \{x \in X \mid x = (T^*T)^\mu w, \|w\| \le \rho\}. \quad (13)$

Order Optimality
We further use the notation
$X_\mu := \bigcup_{\rho > 0} X_{\mu,\rho} = \mathcal{R}((T^*T)^\mu). \quad (14)$
These sets are usually called source sets; an element $x \in X_{\mu,\rho}$ is said to have a source representation. This requirement can be considered as a smoothness condition.

Order Optimality
Definition (3.4). Let $\mathcal{R}(T)$ be non-closed and let $\{R_\alpha\}$ be a regularization operator for $T^\dagger$. For $\mu, \rho > 0$ and $y \in T X_{\mu,\rho}$, let $\alpha$ be a parameter choice rule. We call $(R_\alpha, \alpha)$ optimal in $X_{\mu,\rho}$ if
$\Delta(\delta, X_{\mu,\rho}, R_\alpha) = \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \quad (15)$
holds for all $\delta > 0$. We call $(R_\alpha, \alpha)$ of optimal order in $X_{\mu,\rho}$ if there exists a constant $c \ge 1$ such that
$\Delta(\delta, X_{\mu,\rho}, R_\alpha) \le c\, \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \quad (16)$
holds for all $\delta > 0$.
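As an illustration of what optimal order means in a concrete case (a standard result for Tikhonov regularization, stated here as an example and not taken from the slides): under the source condition $x^\dagger \in X_{\mu,\rho}$ with $0 < \mu \le 1$, the a-priori choice of $\alpha$ below yields the optimal-order rate.

```latex
\[
  \alpha \sim \left(\frac{\delta}{\rho}\right)^{\frac{2}{2\mu+1}}
  \quad\Longrightarrow\quad
  \|x_\alpha^\delta - x^\dagger\|
  = O\!\left(\delta^{\frac{2\mu}{2\mu+1}}\,\rho^{\frac{1}{2\mu+1}}\right),
  \qquad 0 < \mu \le 1 .
\]
```

For Tikhonov regularization this rate saturates at $O(\delta^{2/3})$ (the case $\mu = 1$); higher smoothness of $x^\dagger$ does not improve it further.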

Continuous Regularization Methods
Various parameter choice rules exist which give optimal-order solutions under specific conditions, such as:
- The a-priori parameter choice rule
$\alpha \sim \left(\frac{\delta}{\rho}\right)^{\frac{2}{2\mu+1}}. \quad (17)$
- The discrepancy principle
$\alpha(\delta, y^\delta) = \sup\{\alpha > 0 \mid \|T x_\alpha^\delta - y^\delta\| \le \tau\delta\}, \quad (18)$
where
$\tau > \sup\{|r_\alpha(\lambda)| \mid \alpha > 0, \lambda \in [0, \|T\|^2]\}. \quad (19)$
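A minimal sketch of the discrepancy principle (18) for Tikhonov regularization (the test problem, the value $\tau = 1.1$ and the $\alpha$-grid are my own assumptions). Since the Tikhonov residual $\|T x_\alpha^\delta - y^\delta\|$ is non-decreasing in $\alpha$, the supremum in (18) can be approximated by sweeping $\alpha$ from large to small and stopping at the first value whose residual falls below $\tau\delta$:

```python
# Sketch of Morozov's discrepancy principle for Tikhonov regularization;
# the test operator, noise level, tau and alpha-grid are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 1.0 / np.arange(1, n + 1) ** 2                   # polynomially decaying singular values
T = U @ np.diag(s) @ V.T

x_true = V @ (1.0 / np.arange(1, n + 1))             # exact solution with decaying coefficients
y = T @ x_true
delta = 1e-4
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)  # ||y_delta - y|| = delta

tau = 1.1                                            # safety factor, tau > 1

def tikhonov(alpha):
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ y_delta)

alpha_dp = None
for alpha in np.logspace(0, -12, 200):               # sweep alpha from large to small
    if np.linalg.norm(T @ tikhonov(alpha) - y_delta) <= tau * delta:
        alpha_dp = alpha                             # first alpha meeting the bound approximates the sup in (18)
        break

print("alpha from the discrepancy principle:", alpha_dp)
print("reconstruction error:", np.linalg.norm(tikhonov(alpha_dp) - x_true))
```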

Continuous Regularization Methods
From here on, various types of regularization methods are treated in a general framework, and the conditions required for the existence of a solution, as well as for the (order) optimality of the solution, are studied. The regularization techniques covered include Tikhonov regularization, Landweber iteration, the ν-methods, etc.

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Image Super-Resolution
Single-image super-resolution aims at recovering a high-resolution image from a single low-resolution image. Since multiple high-resolution solutions exist for any given low-resolution pixel, this problem is inherently ill-posed, due to the non-uniqueness of the solution.

Image Super-Resolution
Most state-of-the-art methods adopt an example-based strategy. These methods
- exploit internal similarities of the same image, or
- learn mapping functions from external low- and high-resolution exemplar pairs.

How is this model different?
- This method builds a convolutional neural network that directly learns an end-to-end mapping between low-resolution and high-resolution images.
- It does not explicitly learn dictionaries; they are implicitly achieved through the hidden layers.
- In this approach, the entire super-resolution pipeline is fully obtained through learning, with little pre/post-processing.

How is this model different?
- Its structure is intentionally designed with simplicity in mind.
- It provides superior accuracy when compared with other state-of-the-art example-based methods.
- With a moderate number of filters and layers, this method achieves fast speed for practical on-line usage even on a CPU. (It also does not require solving any optimization problem at usage time, which makes it even faster.)

Image Super-Resolution
Single-image super-resolution algorithms can be classified into four types:
- prediction models,
- edge-based methods,
- image statistical methods,
- patch-based methods.
The majority of SR algorithms focus on grey-scale or single-channel image super-resolution. For color images, the aforementioned methods first transform the problem to a different color space, such as YCbCr, and SR is applied only to the luminance channel.

CNN for Super-Resolution
Consider a single low-resolution image. We first upscale it to the desired size using bicubic interpolation, and denote the interpolated image by Y. The aim is to recover from Y an image F(Y) that is as similar as possible to the ground-truth high-resolution image X.
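A minimal sketch of this preprocessing step using Pillow (the file name and upscaling factor are illustrative assumptions; splitting off the luminance channel follows the YCbCr remark above):

```python
# Sketch of the bicubic upscaling that produces the network input Y;
# the file name and upscaling factor are illustrative assumptions.
from PIL import Image

scale = 3
lr = Image.open("low_res.png").convert("YCbCr")           # work in YCbCr, as noted earlier
w, h = lr.size
Y_img = lr.resize((w * scale, h * scale), Image.BICUBIC)  # interpolated image Y
y_channel, cb, cr = Y_img.split()                         # SR is applied to the luminance channel only
```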

CNN for Super-Resolution
We will learn the mapping F, which conceptually consists of three operations:
- patch extraction and representation,
- non-linear mapping,
- reconstruction.

Patch Extraction and Representation
We convolve the image with a set of filters. The first layer can be represented as
$F_1(Y) = \max(0, W_1 * Y + B_1), \quad (20)$
where $W_1$ and $B_1$ represent the filters and biases, respectively. $W_1$ corresponds to $n_1$ filters of support $c \times f_1 \times f_1$, where $c$ denotes the number of channels in the input image and $f_1$ denotes the spatial size of a filter. $B_1$ is an $n_1$-dimensional vector, each element of which is associated with a filter.

Non-Linear Mapping
The first layer extracts an $n_1$-dimensional feature for each patch. Each of these $n_1$-dimensional vectors is now mapped into an $n_2$-dimensional vector. This is equivalent to applying $n_2$ filters which have only trivial spatial support $1 \times 1$; when the filter size is $3 \times 3$, etc., the non-linear mapping acts on a patch of the feature map instead. The operation of the second layer is
$F_2(Y) = \max(0, W_2 * F_1(Y) + B_2), \quad (21)$
where $W_2$ corresponds to $n_2$ filters of support $n_1 \times f_2 \times f_2$ and $B_2$ is an $n_2$-dimensional vector.

Reconstruction
Traditionally, the predicted overlapping high-resolution patches are averaged to produce the final full image. Here, averaging can be considered as a pre-defined filter acting on a set of feature maps. This layer is defined as
$F_3(Y) = W_3 * F_2(Y) + B_3, \quad (22)$
where $W_3$ corresponds to $c$ filters of support $n_2 \times f_3 \times f_3$ and $B_3$ is a $c$-dimensional vector.
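Putting the three operations together, here is a minimal PyTorch sketch of the three-layer mapping F described above. The filter sizes and widths ($f_1 = 9$, $f_2 = 1$, $f_3 = 5$, $n_1 = 64$, $n_2 = 32$) are assumed typical SRCNN settings, and same-padding is used for simplicity; neither choice is dictated by the slides.

```python
# Sketch of the three-layer mapping F (patch extraction, non-linear mapping,
# reconstruction). Filter sizes/widths are assumed typical values; c = 1 means
# a single luminance channel.
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, c=1, n1=64, n2=32, f1=9, f2=1, f3=5):
        super().__init__()
        self.patch_extraction = nn.Conv2d(c, n1, kernel_size=f1, padding=f1 // 2)  # W1, B1
        self.nonlinear_map = nn.Conv2d(n1, n2, kernel_size=f2, padding=f2 // 2)    # W2, B2
        self.reconstruction = nn.Conv2d(n2, c, kernel_size=f3, padding=f3 // 2)    # W3, B3
        self.relu = nn.ReLU()

    def forward(self, Y):
        F1 = self.relu(self.patch_extraction(Y))  # F1(Y) = max(0, W1 * Y + B1)
        F2 = self.relu(self.nonlinear_map(F1))    # F2(Y) = max(0, W2 * F1(Y) + B2)
        return self.reconstruction(F2)            # F3(Y) = W3 * F2(Y) + B3
```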

Training the CNN
Learning the end-to-end mapping function F requires the estimation of the network parameters $\Theta = \{W_1, W_2, W_3, B_1, B_2, B_3\}$. This is done by minimizing the loss between the reconstructed images $F(Y_i; \Theta)$ and the corresponding ground-truth high-resolution images $X_i$:
$L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \|F(Y_i; \Theta) - X_i\|^2, \quad (23)$
where $n$ is the number of training samples.
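A hedged sketch of one training step minimizing (23) with stochastic gradient descent (the optimizer settings, batch shape and random stand-in data are assumptions; real training would iterate over sub-image pairs cropped from a training set):

```python
# Sketch of minimizing the loss (23); optimizer settings and the random
# stand-in batch are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(                    # same three-layer mapping as the sketch above
    nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
    nn.Conv2d(64, 32, 1), nn.ReLU(),
    nn.Conv2d(32, 1, 5, padding=2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
mse = nn.MSELoss()                        # mean squared error, i.e. (23) up to a constant normalization

def train_step(Y_batch, X_batch):
    """One gradient step on a batch of interpolated inputs Y_i and ground-truth images X_i."""
    optimizer.zero_grad()
    loss = mse(model(Y_batch), X_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors standing in for real (Y_i, X_i) sub-image pairs:
Y_batch = torch.rand(8, 1, 33, 33)
X_batch = torch.rand(8, 1, 33, 33)
print(train_step(Y_batch, X_batch))
```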

Thank you