Regularization Theory

Regularization Theory: Solving the Inverse Problem of Super-Resolution with a CNN
Aditya Ganeshan, under the guidance of Dr. Ankik Kumar Giri
December 13, 2016

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Material origin
- Regularization of Inverse Problems, a book by Dr. Heinz Werner Engl, Dr. Martin Hanke-Bourgeois and Dr. Andreas Neubauer.
- Image Super-Resolution Using Deep Convolutional Networks, a research paper by Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang, published in Computer Vision - ECCV 2014, Volume 8692.

What are Inverse Problems?

What are Inverse Problems?
Hadamard's conditions for well-posedness:
- For all admissible data, a solution must exist.
- For all admissible data, the solution is unique.
- The solution depends continuously on the data.

What are Inverse Problems?
Ill-posed problems: problems which do not satisfy all of Hadamard's conditions are called ill-posed problems. Inverse problems are mostly ill-posed.

What are Inverse Problems?
Inverse problems are concerned with determining causes for a desired or observed effect. Comparing them with Hadamard's conditions:
- They might not have a solution in the strict sense.
- The solution might not be unique.
- The solution might not depend continuously on the data.

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Generalized Inverse
Definition (2.1). Let $T : X \to Y$ be a bounded linear operator.
1. $x \in X$ is called a least-squares solution of $Tx = y$ if
$\|Tx - y\| = \inf\{\|Tz - y\| \mid z \in X\}. \quad (1)$

Generalized Inverse
Definition (2.1, continued).
2. $x \in X$ is called the best-approximate solution of $Tx = y$ if $x$ is a least-squares solution of $Tx = y$ and
$\|x\| = \inf\{\|z\| \mid z \text{ is a least-squares solution of } Tx = y\}. \quad (2)$

Generalized Inverse
Definition (2.2). The Moore-Penrose generalized inverse $T^\dagger$ of $T \in \mathcal{L}(X, Y)$ is defined as the unique linear extension of $\tilde{T}^{-1}$ to
$\mathcal{D}(T^\dagger) = \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp \quad (3)$
with
$\mathcal{N}(T^\dagger) = \mathcal{R}(T)^\perp, \quad (4)$
where
$\tilde{T} := T|_{\mathcal{N}(T)^\perp} : \mathcal{N}(T)^\perp \to \mathcal{R}(T). \quad (5)$

Generalized Inverse
Theorem. Let $y \in \mathcal{D}(T^\dagger)$. Then $Tx = y$ has a unique best-approximate solution, which is given by
$x^\dagger := T^\dagger y. \quad (6)$
The set of all least-squares solutions is $x^\dagger + \mathcal{N}(T)$.
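In finite dimensions the best-approximate solution is exactly the minimum-norm least-squares solution, which standard linear-algebra routines compute directly. A small numerical sketch (the matrix and data below are illustrative assumptions, not taken from the slides):

```python
# Sketch: best-approximate solution x^dagger = T^dagger y as the minimum-norm
# least-squares solution. The matrix T and data y are illustrative assumptions.
import numpy as np

T = np.array([[1.0, 0.0],
              [0.0, 1e-4],
              [0.0, 0.0]])      # R(T) is a proper subspace of R^3
y = np.array([1.0, 2e-4, 0.5])  # last component lies in R(T)^perp, so Tx = y has no exact solution

x_dagger = np.linalg.pinv(T) @ y                 # Moore-Penrose: T^dagger y
x_lstsq, *_ = np.linalg.lstsq(T, y, rcond=None)  # minimum-norm least-squares solution

print(x_dagger)                        # approximately [1., 2.]
print(np.allclose(x_dagger, x_lstsq))  # True
```

Note that the small singular value $10^{-4}$ already amplifies perturbations in the second data component by a factor $10^4$; this instability of $T^\dagger$ is what regularization, introduced next, is designed to control.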

Regularization
Regularization is the approximation of an ill-posed problem by a family of neighboring well-posed problems. We want to find the best-approximate solution $x^\dagger = T^\dagger y$, but only noisy data $y^\delta$ is known, with
$\|y^\delta - y\| \le \delta.$

Regularization
For ill-posed problems, $T^\dagger$ is unbounded, so $T^\dagger y^\delta$ is in general not a good approximation of $x^\dagger$ (it might not even exist!). Hence, we look for an approximation $x_\alpha^\delta$ which
- depends continuously on the noisy data $y^\delta$, and
- tends to $x^\dagger$ as the noise level decreases to zero (if the regularization parameter $\alpha$ is selected appropriately).

Regularization
- As we look not for specific values of $y$, but rather for every $y \in \mathcal{D}(T^\dagger)$, we regularize the solution operator $T^\dagger$.
- A simple regularization of $T^\dagger$ is the replacement of the unbounded operator $T^\dagger$ by a parameter-dependent family $\{R_\alpha\}$ of continuous operators, taking $x_\alpha^\delta = R_\alpha y^\delta$.
- In this way we define a regularization operator for the whole collection of equations $Tx = y$, $y \in \mathcal{D}(T^\dagger)$.
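A classical example of such a family $\{R_\alpha\}$ is Tikhonov regularization, $R_\alpha = (T^*T + \alpha I)^{-1} T^*$, which is bounded for every $\alpha > 0$. A minimal numerical sketch of its behaviour (the discretized operator, exact solution and noise level are my own illustrative assumptions):

```python
# Sketch of a parameter-dependent family {R_alpha}: Tikhonov regularization
# R_alpha = (T^T T + alpha I)^{-1} T^T. The test operator, exact solution and
# noise level below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)              # exponentially decaying singular values: ill-conditioned T
T = U @ np.diag(s) @ V.T

x_true = V[:, 0] + 0.5 * V[:, 1]      # exact solution
y = T @ x_true
delta = 1e-6
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)   # ||y_delta - y|| = delta

def R_alpha(alpha, data):
    """Tikhonov regularization operator applied to the data vector."""
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ data)

for alpha in (1e-2, 1e-6, 1e-10, 1e-14):
    print(alpha, np.linalg.norm(R_alpha(alpha, y_delta) - x_true))
# Large alpha over-smooths (large bias); very small alpha amplifies the noise.
```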

Regularization
Definition (3.1). Let $T : X \to Y$ be a bounded linear operator between Hilbert spaces $X$ and $Y$, and let $\alpha_0 \in (0, +\infty]$. For every $\alpha \in (0, \alpha_0)$, let $R_\alpha : Y \to X$ be a continuous (not necessarily linear) operator. The family $\{R_\alpha\}$ is called a regularization or a regularization operator for $T^\dagger$ if, for all $y \in \mathcal{D}(T^\dagger)$, there exists a parameter choice rule $\alpha = \alpha(\delta, y^\delta)$ such that
$\lim_{\delta \to 0} \sup\{\|R_{\alpha(\delta, y^\delta)} y^\delta - T^\dagger y\| \mid y^\delta \in Y, \|y^\delta - y\| \le \delta\} = 0 \quad (7)$
holds.

Regularization
Definition (continued). Here,
$\alpha : \mathbb{R}^+ \times Y \to (0, \alpha_0) \quad (8)$
is such that
$\lim_{\delta \to 0} \sup\{\alpha(\delta, y^\delta) \mid y^\delta \in Y, \|y^\delta - y\| \le \delta\} = 0. \quad (9)$
For a specific $y \in \mathcal{D}(T^\dagger)$, a pair $(R_\alpha, \alpha)$ is called a convergent regularization method if (7) and (9) hold.

Definition (3.2). Let $\alpha$ be a parameter choice rule according to Definition 3.1. If $\alpha$ does not depend on $y^\delta$, but only on $\delta$, then we call $\alpha$ an a-priori parameter choice rule and write $\alpha = \alpha(\delta)$. Otherwise, we call it an a-posteriori parameter choice rule. (If $\alpha = \alpha(y^\delta)$ depends on $y^\delta$ only, $\alpha$ is called an error-free parameter choice rule.)

Order Optimality
The rate at which
$\|x_\alpha - x^\dagger\| \to 0$ as $\alpha \to 0, \quad (10)$
or
$\|x_{\alpha(\delta, y^\delta)}^\delta - x^\dagger\| \to 0$ as $\delta \to 0. \quad (11)$

Order Optimality
Definition (3.3). The worst-case error under the information that $\|y^\delta - y\| \le \delta$ and the a-priori information that $x \in M$ is given by
$\Delta(\delta, M, R) = \sup\{\|R y^\delta - x\| \mid x \in M, y^\delta \in Y, \|Tx - y^\delta\| \le \delta\}. \quad (12)$

Order Optimality
Convergence rates can only be obtained on subsets of $\mathcal{D}(T^\dagger)$, i.e., under a-priori assumptions on the exact data. Hence, we consider subsets of the form
$\{x \in X \mid x = Bw, \|w\| \le \rho\},$
where $B$ is a linear operator from some Hilbert space into $X$. For the choice $B = (T^*T)^\mu$ for some $\mu > 0$, we denote the set so formed by
$X_{\mu,\rho} := \{x \in X \mid x = (T^*T)^\mu w, \|w\| \le \rho\}. \quad (13)$

Order Optimality
We further use the notation
$X_\mu := \bigcup_{\rho > 0} X_{\mu,\rho} = \mathcal{R}((T^*T)^\mu). \quad (14)$
These sets are usually called source sets; an element $x \in X_{\mu,\rho}$ is said to have a source representation. This requirement can be considered as a smoothness condition.

Order Optimality
Definition (3.4). Let $\mathcal{R}(T)$ be non-closed and let $\{R_\alpha\}$ be a regularization operator for $T^\dagger$. For $\mu, \rho > 0$ and $y \in T X_{\mu,\rho}$, let $\alpha$ be a parameter choice rule. We call $(R_\alpha, \alpha)$ optimal in $X_{\mu,\rho}$ if
$\Delta(\delta, X_{\mu,\rho}, R_\alpha) = \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \quad (15)$
holds for all $\delta > 0$. We call $(R_\alpha, \alpha)$ of optimal order in $X_{\mu,\rho}$ if there exists a constant $c \ge 1$ such that
$\Delta(\delta, X_{\mu,\rho}, R_\alpha) \le c\, \delta^{\frac{2\mu}{2\mu+1}} \rho^{\frac{1}{2\mu+1}} \quad (16)$
holds for all $\delta > 0$.
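As an illustration of what optimal order means in a concrete case (a standard result for Tikhonov regularization, stated here as an example and not taken from the slides): under the source condition $x^\dagger \in X_{\mu,\rho}$ with $0 < \mu \le 1$, the a-priori choice of $\alpha$ below yields the optimal-order rate.

```latex
\[
  \alpha \sim \left(\frac{\delta}{\rho}\right)^{\frac{2}{2\mu+1}}
  \quad\Longrightarrow\quad
  \|x_\alpha^\delta - x^\dagger\|
  = O\!\left(\delta^{\frac{2\mu}{2\mu+1}}\,\rho^{\frac{1}{2\mu+1}}\right),
  \qquad 0 < \mu \le 1 .
\]
```

For Tikhonov regularization this rate saturates at $O(\delta^{2/3})$ (the case $\mu = 1$); higher smoothness of $x^\dagger$ does not improve it further.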

Continuous Regularization Methods
Various parameter choice rules exist which give optimal-order solutions under specific conditions, such as:
- The a-priori parameter choice rule
$\alpha \sim \left(\frac{\delta}{\rho}\right)^{\frac{2}{2\mu+1}}. \quad (17)$
- The discrepancy principle
$\alpha(\delta, y^\delta) = \sup\{\alpha > 0 \mid \|T x_\alpha^\delta - y^\delta\| \le \tau\delta\}, \quad (18)$
where
$\tau > \sup\{|r_\alpha(\lambda)| \mid \alpha > 0, \lambda \in [0, \|T\|^2]\}. \quad (19)$
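A minimal sketch of the discrepancy principle (18) for Tikhonov regularization (the test problem, the value $\tau = 1.1$ and the $\alpha$-grid are my own assumptions). Since the Tikhonov residual $\|T x_\alpha^\delta - y^\delta\|$ is non-decreasing in $\alpha$, the supremum in (18) can be approximated by sweeping $\alpha$ from large to small and stopping at the first value whose residual falls below $\tau\delta$:

```python
# Sketch of Morozov's discrepancy principle for Tikhonov regularization;
# the test operator, noise level, tau and alpha-grid are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 1.0 / np.arange(1, n + 1) ** 2                   # polynomially decaying singular values
T = U @ np.diag(s) @ V.T

x_true = V @ (1.0 / np.arange(1, n + 1))             # exact solution with decaying coefficients
y = T @ x_true
delta = 1e-4
noise = rng.standard_normal(n)
y_delta = y + delta * noise / np.linalg.norm(noise)  # ||y_delta - y|| = delta

tau = 1.1                                            # safety factor, tau > 1

def tikhonov(alpha):
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ y_delta)

alpha_dp = None
for alpha in np.logspace(0, -12, 200):               # sweep alpha from large to small
    if np.linalg.norm(T @ tikhonov(alpha) - y_delta) <= tau * delta:
        alpha_dp = alpha                             # first alpha meeting the bound approximates the sup in (18)
        break

print("alpha from the discrepancy principle:", alpha_dp)
print("reconstruction error:", np.linalg.norm(tikhonov(alpha_dp) - x_true))
```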

Continuous Regularization Methods
From here on, various types of regularization methods are treated in a general framework, and the conditions required for the existence of a solution, as well as for the (order) optimality of the solution, are studied. The regularization techniques covered include Tikhonov regularization, Landweber iteration, the ν-methods, etc.

Table of Contents
1 Introduction: Material Coverage; Introduction to Inverse Problems
2 Regularization Theory: Moore-Penrose Generalized Inverse; Regularization Operator; Order Optimality; Continuous Regularization Methods
3 Image Super-Resolution: Introduction; Image Super-Resolution; Training the CNN

Image Super-Resolution
Single-image super-resolution aims at recovering a high-resolution image from a single low-resolution image. Since multiple high-resolution solutions exist for any given low-resolution pixel, this problem is inherently ill-posed, due to the non-uniqueness of the solution.

Image Super-Resolution
Most state-of-the-art methods adopt an example-based strategy. These methods
- exploit internal similarities of the same image, or
- learn mapping functions from external low- and high-resolution exemplar pairs.

How is this model different?
- This method builds a convolutional neural network that directly learns an end-to-end mapping between low-resolution and high-resolution images.
- It does not explicitly learn dictionaries; they are implicitly achieved through the hidden layers.
- In this approach, the entire super-resolution pipeline is fully obtained through learning, with little pre/post-processing.

How is this model different?
- Its structure is intentionally designed with simplicity in mind.
- It provides superior accuracy when compared with other state-of-the-art example-based methods.
- With a moderate number of filters and layers, this method achieves fast speed for practical on-line usage even on a CPU. (It also does not require solving any optimization problem at usage time, which makes it even faster.)

Image Super-Resolution
Single-image super-resolution algorithms can be classified into four types:
- prediction models,
- edge-based methods,
- image statistical methods,
- patch-based methods.
The majority of SR algorithms focus on grey-scale or single-channel image super-resolution. For color images, the aforementioned methods first transform the problem to a different color space, such as YCbCr, and SR is applied only to the luminance channel.

CNN for Super-Resolution
Consider a single low-resolution image. We first upscale it to the desired size using bicubic interpolation, and denote the interpolated image by Y. The aim is to recover from Y an image F(Y) that is as similar as possible to the ground-truth high-resolution image X.
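A minimal sketch of this preprocessing step using Pillow (the file name and upscaling factor are illustrative assumptions; splitting off the luminance channel follows the YCbCr remark above):

```python
# Sketch of the bicubic upscaling that produces the network input Y;
# the file name and upscaling factor are illustrative assumptions.
from PIL import Image

scale = 3
lr = Image.open("low_res.png").convert("YCbCr")           # work in YCbCr, as noted earlier
w, h = lr.size
Y_img = lr.resize((w * scale, h * scale), Image.BICUBIC)  # interpolated image Y
y_channel, cb, cr = Y_img.split()                         # SR is applied to the luminance channel only
```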

CNN for Super-Resolution
We will learn the mapping F, which conceptually consists of three operations:
- patch extraction and representation,
- non-linear mapping,
- reconstruction.

Patch Extraction and Representation
We convolve the image with a set of filters. The first layer can be represented as
$F_1(Y) = \max(0, W_1 * Y + B_1), \quad (20)$
where $W_1$ and $B_1$ represent the filters and biases, respectively. $W_1$ corresponds to $n_1$ filters of support $c \times f_1 \times f_1$, where $c$ denotes the number of channels in the input image and $f_1$ denotes the spatial size of a filter. $B_1$ is an $n_1$-dimensional vector, each element of which is associated with a filter.

Non-Linear Mapping
The first layer extracts an $n_1$-dimensional feature for each patch. Each of these $n_1$-dimensional vectors is now mapped into an $n_2$-dimensional vector. This is equivalent to applying $n_2$ filters which have only trivial spatial support $1 \times 1$; when the filter size is $3 \times 3$, etc., the non-linear mapping acts on a patch of the feature map instead. The operation of the second layer is
$F_2(Y) = \max(0, W_2 * F_1(Y) + B_2), \quad (21)$
where $W_2$ corresponds to $n_2$ filters of support $n_1 \times f_2 \times f_2$ and $B_2$ is an $n_2$-dimensional vector.

Reconstruction
Traditionally, the predicted overlapping high-resolution patches are averaged to produce the final full image. Here, averaging can be considered as a pre-defined filter acting on a set of feature maps. This layer is defined as
$F_3(Y) = W_3 * F_2(Y) + B_3, \quad (22)$
where $W_3$ corresponds to $c$ filters of support $n_2 \times f_3 \times f_3$ and $B_3$ is a $c$-dimensional vector.
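Putting the three operations together, here is a minimal PyTorch sketch of the three-layer mapping F described above. The filter sizes and widths ($f_1 = 9$, $f_2 = 1$, $f_3 = 5$, $n_1 = 64$, $n_2 = 32$) are assumed typical SRCNN settings, and same-padding is used for simplicity; neither choice is dictated by the slides.

```python
# Sketch of the three-layer mapping F (patch extraction, non-linear mapping,
# reconstruction). Filter sizes/widths are assumed typical values; c = 1 means
# a single luminance channel.
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, c=1, n1=64, n2=32, f1=9, f2=1, f3=5):
        super().__init__()
        self.patch_extraction = nn.Conv2d(c, n1, kernel_size=f1, padding=f1 // 2)  # W1, B1
        self.nonlinear_map = nn.Conv2d(n1, n2, kernel_size=f2, padding=f2 // 2)    # W2, B2
        self.reconstruction = nn.Conv2d(n2, c, kernel_size=f3, padding=f3 // 2)    # W3, B3
        self.relu = nn.ReLU()

    def forward(self, Y):
        F1 = self.relu(self.patch_extraction(Y))  # F1(Y) = max(0, W1 * Y + B1)
        F2 = self.relu(self.nonlinear_map(F1))    # F2(Y) = max(0, W2 * F1(Y) + B2)
        return self.reconstruction(F2)            # F3(Y) = W3 * F2(Y) + B3
```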

Training the CNN
Learning the end-to-end mapping function F requires the estimation of the network parameters $\Theta = \{W_1, W_2, W_3, B_1, B_2, B_3\}$. This is done by minimizing the loss between the reconstructed images $F(Y_i; \Theta)$ and the corresponding ground-truth high-resolution images $X_i$:
$L(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \|F(Y_i; \Theta) - X_i\|^2, \quad (23)$
where $n$ is the number of training samples.
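A hedged sketch of one training step minimizing (23) with stochastic gradient descent (the optimizer settings, batch shape and random stand-in data are assumptions; real training would iterate over sub-image pairs cropped from a training set):

```python
# Sketch of minimizing the loss (23); optimizer settings and the random
# stand-in batch are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(                    # same three-layer mapping as the sketch above
    nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
    nn.Conv2d(64, 32, 1), nn.ReLU(),
    nn.Conv2d(32, 1, 5, padding=2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
mse = nn.MSELoss()                        # mean squared error, i.e. (23) up to a constant normalization

def train_step(Y_batch, X_batch):
    """One gradient step on a batch of interpolated inputs Y_i and ground-truth images X_i."""
    optimizer.zero_grad()
    loss = mse(model(Y_batch), X_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Random tensors standing in for real (Y_i, X_i) sub-image pairs:
Y_batch = torch.rand(8, 1, 33, 33)
X_batch = torch.rand(8, 1, 33, 33)
print(train_step(Y_batch, X_batch))
```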

Thank you