Numerical Methods for Separable Nonlinear Inverse Problems with Constraint and Low Rank


Numerical Methods for Separable Nonlinear Inverse Problems with Constraint and Low Rank

Taewon Cho

Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master in Mathematics.

Julianne Chung, Chair
Matthias Chung
Mark Embree

Nov 20, 2017
Blacksburg, Virginia

Keywords: Nonlinear Inverse Problem, Image Deblurring, Gauss-Newton Method, Variable Projection, Alternating Optimization
Copyright 2017, Taewon Cho

Numerical Methods for Separable Nonlinear Inverse Problems with Constraint and Low Rank

Taewon Cho

(ABSTRACT)

Inverse problems arise in many applications, ranging from astronomy to geoscience. For example, image reconstruction and deblurring require methods for solving inverse problems. Because these problems are subject to many factors and to noise, we cannot simply apply general inversion methods. Furthermore, in the problems of interest the number of unknown variables is huge, and some of them may depend nonlinearly on the data, so that nonlinear problems must be solved. Solving nonlinear inverse problems is quite different from, and significantly more challenging than, solving linear ones, and more sophisticated methods are needed for these kinds of problems.

Numerical Methods for Separable Nonlinear Inverse Problems with Constraint and Low Rank

Taewon Cho

(GENERAL AUDIENCE ABSTRACT)

In many research areas there are quantities of interest that cannot be measured directly, for physical or economic reasons. Instead, these unknown quantities can be recovered from measurements that are available. This process can be modeled and solved mathematically.

Contents

1 Introduction
2 Background
  2.1 Point Spread Function (PSF)
    2.1.1 One-dimensional
    2.1.2 Two-dimensional
    2.1.3 Low-Rank PSF problem
  2.2 Regularization for the Linear Problem
    2.2.1 Picard Condition
    2.2.2 Spectral Filtering Methods
    2.2.3 Choosing the Regularization Parameter
  2.3 Gauss-Newton Method for Nonlinear Least Squares
  2.4 Variable Projection for Separable Nonlinear Least-Squares Problems
3 Exploiting a Low Rank PSF in Solving Nonlinear Inverse Problems
  3.1 Symmetric PSF
  3.2 Non-symmetric PSF
  3.3 Reformulation
4 Numerical Results
  4.1 Variable projection with low rank PSF
  4.2 x_true - Alternating Optimization
  4.3 Alternating Optimization 3 ways
5 Conclusions and Discussion
6 References

List of Figures

1.1 Forward Problem
1.2 Example of image blurring
2.1 Blurring by PSF
2.2 Discrete Picard Conditions from [2]
4.1 True parameters y_true, z_true
4.2 Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035
4.3 Error comparison for the non-symmetric PSF with λ_nl = 0.03
4.4 A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with non-symmetric PSF and λ_nl = 0.03
4.5 A comparison of the true and reconstructed PSFs with λ_nl = 0.03
4.6 Error comparison for Alternating Optimization
4.7 Error comparison for Alternating Optimization 3 ways
4.8 Rotated true and blurred images with rotations of 0°, 90°, 180°, 270°
4.9 A comparison of the errors of y and z for Alternating Optimization 3 ways with rotation
4.10 Relative errors for Alternating Optimization 3 ways with rotation
4.11 Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.12 Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size)
4.13 Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size)
4.14 Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size)

List of Tables

4.1 Table of relative norm errors for the non-symmetric PSF with λ_nl = 0.03
4.2 Table of relative norm errors for Alternating Optimization

Chapter 1 Introduction

Figure 1.1: Forward Problem (input x, forward operation A, output b).

We are usually interested in forward problems, where we simply compute data from given parameters. In other words, if we have input data and a forward procedure, we observe output data. In this case the input data and the forward procedure are known quantities, and the output is unknown. For the problems of interest here, evaluating the forward procedure usually does not require much time or cost. But consider instead the problem where we know the output data and do not know one or more of the input parameters or the forward procedure. Inversion then becomes a much more complicated task, because we need to invert the system to recover the unknown data. Problems where the goal is to compute unknown input data and the forward procedure from output data are called inverse problems [2]. In simulated problems, we generate observed data from the forward system, and the goal is to recover the original input; this is an inverse problem.

Figure 1.2: Example of image blurring. (a) True Image; (b) Blur; (c) Blurred Image.

Other examples arise when we take blurred images in astronomy, medicine, or geoscience. We wish to recover true images from blurred images (Figure 1.2), or to infer characteristics of the earth's interior from surface measurements.

In order to understand the inverse problem, we need tools from basic linear algebra and matrix computations. Let A in R^{n×n} be a linear operator and let x, b in R^n be vectors. Then the forward model is given by

    b = Ax.

That is, in the forward model we know A and x, and we compute b by matrix-vector multiplication. On the other hand, if we only know A and b, then finding the exact x becomes a complicated problem. This type of problem is known as an inverse problem, and in well-posed situations we could get x by using the inverse matrix (or pseudoinverse) of A. But in many real applications, A is more likely to be ill-conditioned or singular. When A is singular, A^{-1} does not exist; and even when A is well-conditioned but large, computing A^{-1} and solving x = A^{-1}b can require an enormous amount of time and cost, even on high-performance computers. Moreover, if A is not square but an m-by-n rectangular matrix with m > n, standard inversion does not apply, and x may not be unique. So we need to solve the least squares problem, min_x ||b - Ax||, using techniques such

as the normal equations, the QR decomposition, and the Singular Value Decomposition (SVD), A = U Σ V^T, where U and V are orthogonal matrices and Σ is a diagonal matrix with nonnegative real entries.

Regularization is one approach to impose prior knowledge and solve the inverse problem more accurately. By solving accurately, we mean obtaining a regularized solution x_reg that is close to x_true. To analyze what happens when solving for x, we use the SVD form, where x_reg is expressed by

    x_reg = Σ_{i=1}^{n} φ_i (u_i^T b / σ_i) v_i,    (1.1)

where the σ_i are the diagonal elements of Σ, the u_i are the columns of U, and the v_i are the columns of V. The φ_i are filter factors, which play a central role in regularization. We will discuss regularization further in Chapter 2.

Even if we use regularization to stabilize the inversion process, a challenge remains: the forward operator A may depend on some unknown parameters. In many real cases A is not known exactly. Thus, when we build A, we need to incorporate a new variable y and consider the form A = A(y), where y in R^n. Then b = A(y)x, and we need to solve a nonlinear least squares problem in both the linear parameters x and the nonlinear parameters y,

    min_{x,y} ||b - A(y)x||_2^2.    (1.2)
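As a small illustration of the separable structure in (1.2), the following MATLAB sketch builds a one-dimensional blur matrix A(y) from an assumed Gaussian-shaped parameter vector y (zero boundary conditions) and forms noisy data b = A(y)x + e. The sizes, PSF shape, and noise level are illustrative assumptions only, not values used later in the thesis.

    % Illustrative sketch only: a 1-D separable forward model b = A(y)x.
    n = 64;
    t = (1:n)';
    x = double(t > 20 & t < 45);            % a simple "true" signal
    y = exp(-((1:9)' - 5).^2 / 4);          % assumed blur parameters (length 9)
    y = y / sum(y);                         % normalize so the PSF sums to one
    col = [y(5:9); zeros(n-5,1)];           % first column of the blur matrix
    row = [y(5:-1:1); zeros(n-5,1)];        % first row (zero boundary conditions)
    A = toeplitz(col, row);                 % A(y): changes whenever y changes
    b = A*x + 1e-2*randn(n,1);              % noisy blurred data

If y changes, A(y) changes with it, which is exactly the coupling between the linear parameters x and the nonlinear parameters y in (1.2).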

In many scenarios it may be desirable to include additional solution constraints. For example, since images often represent light intensities or densities, pixel values should be nonnegative, so we could constrain x such that x ≥ 0. Various methods have been investigated for this, such as the active set method [1]. However, in our numerical experience we did not observe significant improvements from the active set approach, so we do not consider it here. We do, however, enforce y ≥ 0. Therefore, our goal in this thesis is to solve the following constrained nonlinear least squares problem:

    min_{x,y} ||A(y)x - b||_2^2   subject to y ≥ 0,    (1.3)

where A is the forward operator matrix depending on the unknown parameters y, b is the observed data vector, and x is the true parameter vector. To solve this problem we will use the Gauss-Newton method, but computing the Jacobian at every iteration is very expensive. Thus we discuss how variable projection methods can reduce the computational cost and exploit the separable model. We also consider alternating optimization methods that can exploit problem structure.

In this thesis, we start by introducing how to construct the PSF and how to describe image deblurring mathematically as a linear model. We then look at how to regularize the linear problem and investigate methods for nonlinear least squares problems, including Gauss-Newton, variable projection, and alternating optimization. Finally, we apply the numerical methods to blind image deblurring problems and analyze the numerical results.

Chapter 2 Background

2.1 Point Spread Function (PSF)

First we describe what the point spread function (PSF) is. The PSF is very important in image processing because it can be used to describe a blur and to define the forward operation. There are different sources of blur; mainly we can split them into physical and mechanical processes. For example, when taking a photo, moving the camera or imaging through the atmosphere causes blur physically, while a deformed or broken lens is a mechanical source of blur [3, 2].

The PSF can be used to construct mathematical models. It is based on the assumption that each pixel is blurred by its neighboring pixels. For example, in two dimensions, if the only intensity is at the center of the matrix, then applying a small PSF array (for example, one with entries 0.1 and 0.2) results in the origin image being blurred into a copy of the PSF. In Figure 2.1, the operation means that each cell of the blurred matrix is calculated as the sum of the component-wise multiplication between the origin matrix and the PSF array, under the

Figure 2.1: Blurring by PSF (Origin, PSF array, Blurred).

assumption that components outside the origin matrix are zero. By matching the center of the PSF array with a chosen node, we can compute the blurred matrix as follows: center the flipped PSF array at node (1,1) of the zero-padded origin matrix, multiply the two arrays component-wise, and sum all of the products to obtain entry (1,1) of the blurred matrix. Repeating the same procedure for the center nodes (1,2), (1,3), ..., (3,3) produces the remaining entries of the blurred matrix. Regardless of the location of the pixel, the blur is the same; this is called a spatially invariant blur. This is just one example of a blur PSF, and various boundary conditions can be used for the image. In the blurring process, regardless of the PSF, each pixel is influenced by its neighboring pixels.

2.1.1 One-dimensional

First let us see how to construct the blur matrix in one dimension. We need to define the forward operation describing how each pixel affects the others with a weight determined by the PSF. If p(s) and x(s) are continuous functions, then the convolution of p and x is the function b, given by a Fredholm integral equation of the first kind,

    b(s) = ∫ p(s - t) x(t) dt.    (2.1)

Then, for each s, b(s) is obtained by integrating x(t) against a weight from the function p; we have to flip the function p and shift it to obtain p(s - t). For the discrete version of convolution, we consider the vectors x, p, and b as the true image, PSF array, and blurred

image, respectively. In one dimension, for example, the true image and the PSF in R^3 are

    [w; x_1; x_2; x_3; y]   and   [p_1; p_2; p_3],

where w and y are pixels outside of the original image, called the boundary. To get a pixel value in the blurred image we flip and shift the PSF array, and obtain

    b_1 = p_3 w   + p_2 x_1 + p_1 x_2,
    b_2 = p_3 x_1 + p_2 x_2 + p_1 x_3,
    b_3 = p_3 x_2 + p_2 x_3 + p_1 y.

We can write this convolution as

    [b_1; b_2; b_3] = [p_3 p_2 p_1 0 0; 0 p_3 p_2 p_1 0; 0 0 p_3 p_2 p_1] [w; x_1; x_2; x_3; y].

Depending on how w and y are defined, we obtain different boundary conditions or assumptions.

Zero boundary condition: set the boundary pixels to zero, i.e., w = 0 and y = 0. The blur matrix is then a Toeplitz matrix,

    [b_1; b_2; b_3] = [p_2 p_1 0; p_3 p_2 p_1; 0 p_3 p_2] [x_1; x_2; x_3].

Periodic boundary condition: set the boundary pixels to be periodic with respect to the interior image pixels, i.e., w = x_3 and y = x_1. The blur matrix is then a circulant matrix,

    [b_1; b_2; b_3] = [p_2 p_1 p_3; p_3 p_2 p_1; p_1 p_3 p_2] [x_1; x_2; x_3].

Reflexive boundary condition: set the boundary pixels to reflect the interior image pixels, i.e., w = x_1 and y = x_3. The blur matrix is then a Toeplitz-plus-Hankel matrix,

    [b_1; b_2; b_3] = [p_2 + p_3  p_1  0; p_3  p_2  p_1; 0  p_3  p_2 + p_1] [x_1; x_2; x_3].
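The three boundary conditions can be checked directly in MATLAB. The sketch below builds the 3 × 3 matrices above for an arbitrary PSF p = [p_1; p_2; p_3]; the numerical values are made up for illustration.

    % Sketch only: the three 3x3 blur matrices for a PSF p = [p1; p2; p3].
    p = [0.2; 0.5; 0.3];
    Azero     = toeplitz([p(2); p(3); 0], [p(2); p(1); 0]);        % zero boundary (Toeplitz)
    Aperiodic = toeplitz([p(2); p(3); p(1)], [p(2); p(1); p(3)]);  % periodic (circulant)
    Areflex   = Azero + [p(3) 0 0; 0 0 0; 0 0 p(1)];               % reflexive (Toeplitz + Hankel)
    x = [1; 2; 3];
    b = Areflex*x;    % e.g., blur with reflexive boundary conditions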

2.1.2 Two-dimensional

Now consider the two-dimensional case in R^{3×3}. Consider the padded image, the PSF, and the blurred image

    [w_1 w_2 w_3 w_4 w_5; w_6 x_11 x_12 x_13 w_7; w_8 x_21 x_22 x_23 w_9; w_10 x_31 x_32 x_33 w_11; w_12 w_13 w_14 w_15 w_16],
    P = [p_11 p_12 p_13; p_21 p_22 p_23; p_31 p_32 p_33],   and   B = [b_11 b_12 b_13; b_21 b_22 b_23; b_31 b_32 b_33],

where X = [x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33] is the true image and the w_i are pixels outside of the original image. To get a blurred image by the convolution operation in two dimensions, we flip the matrix P vertically and horizontally and shift it. Then, for instance, we get the elements of B such as

    b_11 = p_33 w_1 + p_32 w_2 + p_31 w_3 + p_23 w_6 + p_22 x_11 + p_21 x_12 + p_13 w_8 + p_12 x_21 + p_11 x_22,
    b_22 = p_33 x_11 + p_32 x_12 + p_31 x_13 + p_23 x_21 + p_22 x_22 + p_21 x_23 + p_13 x_31 + p_12 x_32 + p_11 x_33,

and similarly for the remaining entries. For these elements we need to take into account boundary conditions such as zero, periodic, or reflexive. Then, for example, we can describe the relation between b = vec(B) and x = vec(X) with zero boundary conditions as

    b = [b_11; b_21; b_31; b_12; b_22; b_32; b_13; b_23; b_33] = A(P) x,   with   x = [x_11; x_21; x_31; x_12; x_22; x_32; x_13; x_23; x_33],

where A(P) is a BTTB (block Toeplitz with Toeplitz blocks) matrix. With zero boundary conditions it has the block form

    A(P) = [B_0 B_{-1} 0; B_1 B_0 B_{-1}; 0 B_1 B_0],

with

    B_0 = [p_22 p_12 0; p_32 p_22 p_12; 0 p_32 p_22],   B_1 = [p_23 p_13 0; p_33 p_23 p_13; 0 p_33 p_23],   B_{-1} = [p_21 p_11 0; p_31 p_21 p_11; 0 p_31 p_21].

Notice that, by a change of variables, we can rewrite the same product with the roles of the image and the PSF exchanged, b = A(X) vec(P), where A(X) has exactly the same structure with the entries of X in place of the entries of P. Thus, for invariant PSFs, we have the property

    A(P) x = A(X) vec(P),    (2.2)

where x = vec(X). We will exploit this property in our algorithms.
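Property (2.2) is easy to verify numerically. The following sketch uses MATLAB's conv2 with zero boundary conditions as a stand-in for the forward operator; the sizes and random entries are arbitrary.

    % Quick numerical check of (2.2): blurring X by P equals "blurring" P by X.
    X = rand(5);  P = rand(5);  P = P / sum(P(:));
    B1 = conv2(X, P, 'same');     % plays the role of A(P) x
    B2 = conv2(P, X, 'same');     % plays the role of A(X) vec(P)
    norm(B1(:) - B2(:))           % agrees up to rounding error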

2.1.3 Low-Rank PSF problem

The reason we are interested in low-rank PSFs is that the corresponding matrix A can be described by a Kronecker product. For instance, assume that the PSF is n × n and can be written as

    P = y z^T = [y_1; ...; y_n][z_1 ... z_n].    (2.3)

Then we can write

    A = A(P) = A(P(y, z)) = A_z ⊗ A_y,    (2.4)

where A_y and A_z are the banded Toeplitz blur matrices built from y and z, with the center entries of y and z on their diagonals; if n is odd, position (1 + ⌊n/2⌋, 1 + ⌊n/2⌋) is the center of A_y and A_z. The reason we focus on the Kronecker product is that we can use the properties of Kronecker products for efficient computation [3, 6]. If A is a Kronecker product, then we can write its SVD as

    (U_y Σ_y V_y^T) ⊗ (U_z Σ_z V_z^T) = (U_y ⊗ U_z)(Σ_y ⊗ Σ_z)(V_y ⊗ V_z)^T.    (2.5)

If P is a sum of rank-1 matrices,

    P = Σ_{j=1}^{n} y_j z_j^T,    (2.6)

then A can be expressed as a sum of Kronecker products (see Kamm and Nagy [3]),

    A = Σ_{j=1}^{n} A_y^(j) ⊗ A_z^(j).    (2.7)
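A quick numerical check of (2.5), here for A = A_z ⊗ A_y as in (2.4): the SVD factors of a Kronecker product are the Kronecker products of the factors (the product below is an SVD of A only up to a reordering of the diagonal entries). The matrix sizes are arbitrary.

    % Sketch only: Kronecker-product SVD property (2.5).
    Ay = rand(4);  Az = rand(4);
    [Uy, Sy, Vy] = svd(Ay);  [Uz, Sz, Vz] = svd(Az);
    A   = kron(Az, Ay);
    lhs = kron(Uz, Uy) * kron(Sz, Sy) * kron(Vz, Vy)';
    norm(A - lhs)        % essentially zero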

2.2 Regularization for the Linear Problem

First we consider the linear problem and investigate why regularization is needed to compute a solution. The objective of regularization is to constrain unwanted components and to reconstruct more stable solutions that are close to the exact one. Let us first recall classical perturbation theory. For some perturbation e, consider the two solutions x and x_exact such that A x_exact = b_exact and A x = b_exact + e, where A is a nonsingular square matrix. Then we have a bound of the form

    ||x - x_exact|| / ||x_exact|| ≤ cond(A) ||e|| / ||b_exact||,    (2.8)

where cond(A) = ||A|| ||A^{-1}|| [2, 4]. When A is ill-posed, cond(A) is very large, and x may not be close to x_exact. Although (2.8) is only an upper bound, empirically the error between x and x_exact tends to follow it. So regularization is needed to make the solution x close to x_exact [1].

In this thesis we use the SVD formulation (1.1) to express and describe the regularization methods, where the φ_i are the filter factors determined by the regularization method. Regularization for inverse problems is a well-studied field with many excellent textbooks and papers, e.g., [1, 2, 3].

2.2.1 Picard Condition

For the linear problem Ax = b with A a nonsingular square matrix, we can describe the inverse solution x from the SVD:

    V^T x = Σ^{-1} U^T b,   i.e.,   x = V Σ^{-1} U^T b = Σ_{i=1}^{n} (u_i^T b / σ_i) v_i.    (2.9)

The discrete Picard condition is proposed and studied in [2, 5]:

- The Discrete Picard Condition: Let τ be the level at which the computed singular values σ_i level off because of rounding errors. The discrete Picard condition requires that, for all singular values larger than τ, the corresponding coefficients |u_i^T b| decay faster than the σ_i.

Without noise, if the solution satisfies the discrete Picard condition, |u_i^T b| decays faster than σ_i until round-off error. With noise, however, the condition starts to be violated. If we examine the Picard plot (Figure 2.2) of |u_i^T b|/σ_i, |u_i^T b|, and σ_i, these quantities should be decreasing, with |u_i^T b| decaying faster than σ_i, so that |u_i^T b|/σ_i also decreases. However, when noise e is added to b, |u_i^T b| eventually decays more slowly than σ_i; hence |u_i^T b|/σ_i starts to increase without bound at some point, which pushes the solution x far away from the true x. From this observation, if we can suppress the noise components after that point, we can obtain a more stable

Figure 2.2: Discrete Picard Conditions from [2]. (a) Rounding error; (b) noise.

approximate solution. To avoid these noise components, we can use the truncated SVD or Tikhonov regularization.

2.2.2 Spectral Filtering Methods

For problems where the SVD of A can be computed, we use the SVD of A as in (2.9) to compute x. The noise components start to disturb the solution when σ_i decreases faster than |u_i^T b| decays. To obtain a stable approximate solution, the solution needs to be freed from the noise components, so let us look for a way to avoid them. First, we can simply cut the noisy part off from the sum of the terms (u_i^T b / σ_i) v_i; by cutting them, we keep x from becoming unbounded. This method is called the Truncated SVD (TSVD). We split the sum at some index k,

    x = Σ_{i=1}^{k} (u_i^T b / σ_i) v_i + Σ_{i=k+1}^{n} (u_i^T b / σ_i) v_i,

where the decay of |u_i^T b| becomes slower than that of σ_i at index k. TSVD cuts off all of the noise components, and we get the regularized solution

    x = Σ_{i=1}^{k} (u_i^T b / σ_i) v_i.

We still need to determine the index k at which to truncate. Another option is to damp the noise components using filters. Rather than cutting off at some index k, we define a filter function φ_i for each i = 1, ..., n, with a given parameter λ > 0,

    φ_i = σ_i^2 / (λ^2 + σ_i^2),

and multiply by this filter to obtain the regularized solution

    x = Σ_{i=1}^{n} φ_i (u_i^T b / σ_i) v_i.

If σ_i is much larger than λ, then φ_i is close to 1; if σ_i is much smaller than λ, then φ_i is close to 0. Thus we can minimize the disturbance from noise. This method is called Tikhonov regularization. It can be shown that this form is equivalent to the optimization problem

    min_x ||Ax - b||_2^2 + λ^2 ||x||_2^2.    (2.10)

Notice that

    min_x ||Ax - b||_2^2 + λ^2 ||x||_2^2 = min_x || [A; λI] x - [b; 0] ||_2^2.

The solution x can then be obtained from the normal equations,

    x = (A^T A + λ^2 I)^{-1} A^T b
      = (V Σ U^T U Σ V^T + λ^2 V V^T)^{-1} V Σ U^T b
      = (V (Σ^2 + λ^2 I) V^T)^{-1} V Σ U^T b
      = V (Σ^2 + λ^2 I)^{-1} V^T V Σ U^T b    (2.11)
      = V (Σ^2 + λ^2 I)^{-1} Σ U^T b
      = Σ_{i=1}^{n} φ_i (u_i^T b / σ_i) v_i.

As with TSVD, we need to select a regularization parameter λ that gives us a stable approximate solution.
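A minimal sketch of the TSVD and Tikhonov filtered solutions on a small, classically ill-conditioned test matrix. The matrix, noise level, truncation index k, and λ are chosen by hand purely for illustration (Section 2.2.3 discusses how to choose them).

    % Sketch only: spectral filtering via the SVD, as in (1.1) and (2.11).
    n = 32;
    A = hilb(n);  xtrue = ones(n,1);
    b = A*xtrue + 1e-6*randn(n,1);
    [U, S, V] = svd(A);  s = diag(S);  beta = U'*b;
    k = 8;                                        % TSVD truncation index (hand-picked)
    x_tsvd = V(:,1:k) * (beta(1:k) ./ s(1:k));    % truncated SVD solution
    lambda = 1e-4;                                % Tikhonov parameter (hand-picked)
    phi = s.^2 ./ (s.^2 + lambda^2);              % Tikhonov filter factors
    x_tik = V * (phi .* beta ./ s);               % filtered solution (2.11)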

2.2.3 Choosing the Regularization Parameter

In this section we describe various methods to select k or λ. Define the diagonal filter matrix Φ(λ) = diag(φ_1, φ_2, ..., φ_n), and consider the error of the Tikhonov-regularized solution:

    x_exact - x = x_exact - V Φ(λ) Σ^{-1} U^T b
                = x_exact - V Φ(λ) Σ^{-1} U^T (b_exact + e)
                = x_exact - V Φ(λ) Σ^{-1} U^T b_exact - V Φ(λ) Σ^{-1} U^T e
                = x_exact - V Φ(λ) Σ^{-1} U^T A x_exact - V Φ(λ) Σ^{-1} U^T e    (2.12)
                = (I - V Φ(λ) Σ^{-1} U^T A) x_exact - V Φ(λ) Σ^{-1} U^T e
                = (V V^T - V Φ(λ) Σ^{-1} U^T U Σ V^T) x_exact - V Φ(λ) Σ^{-1} U^T e
                = V (I - Φ(λ)) V^T x_exact - V Φ(λ) Σ^{-1} U^T e.

In this way, we can split the error of the solution into two parts. The first term is called the regularization error and the second the perturbation error. When λ is close to zero, the regularization error is very small, because Φ(λ) approaches I as λ approaches 0, but the perturbation error can be large. Conversely, as λ increases the perturbation error decreases but the regularization error increases. So we need an appropriate value of λ that balances the regularization and perturbation errors in order to minimize the error of the solution. A similar form of the filter factors can be obtained for TSVD, where φ_i = 1 or φ_i = 0.

We explore three methods to choose the parameter λ for Tikhonov regularization or the index k for TSVD: the Discrepancy Principle (DP), the L-curve, and the Generalized Cross Validation (GCV) method, which is a statistical method. These methods use the norms ||x_λ||_2^2 and ||Ax_λ - b||_2^2, or ||x_k||_2^2 and ||Ax_k - b||_2^2, since these norms also decrease to zero as x approaches x_exact.

First, the Discrepancy Principle is one of the simplest approaches. If the noise level ||e||_2 is known, then we choose k_dp or λ_dp such that

    ||A x_{k_dp} - b||_2 ≥ ν_dp ||e||_2 ≥ ||A x_{k_dp + 1} - b||_2,   or   ||A x_λ - b||_2 = ν_dp ||e||_2,

respectively, where ν_dp > 1 is a safety factor [2]. So, if we know the noise level, we can choose a parameter by making the residual norm equal to the noise norm times a safety factor. A critical disadvantage is that we usually do not know ||e||_2 exactly; but when it is known, the Discrepancy Principle is very simple to compute.

Second, the L-curve method uses the curvature of the curve (log ||Ax_λ - b||_2, log ||x_λ||_2) and seeks the point where the curve transitions between its horizontal and vertical parts. With ξ = ||x_λ||_2^2, ρ = ||Ax_λ - b||_2^2, and ξ' = dξ/dλ, it chooses λ to maximize the curvature

    ĉ_λ = 2 (ξ ρ / ξ') (λ^2 ξ' ρ + 2 λ ξ ρ + λ^4 ξ ξ') / (λ^2 ξ^2 + ρ^2)^{3/2}.

For TSVD, we choose k at the corner of the L-curve. Unfortunately, the L-curve method fails when the coefficients v_i^T x_exact decay to zero quickly, or when the change in the residual and solution norms between two consecutive values of k is small.

Last, GCV is a very common and useful method. Consider the difference between b_exact and A x_k for the rank-k TSVD solution. Then

    A x_k - b_exact = A V Φ(k) Σ^{-1} U^T b - b_exact
                    = U Σ V^T V Φ(k) Σ^{-1} U^T (b_exact + e) - U U^T b_exact
                    = U [I_k 0; 0 0] U^T b_exact + U [I_k 0; 0 0] U^T e - U U^T b_exact
                    = U [I_k 0; 0 0] U^T e - U [0 0; 0 I_{n-k}] U^T b_exact,

where Φ(k) = diag(1, ..., 1, 0, ..., 0). Thus the error norm becomes

    ||A x_k - b_exact||_2^2 = Σ_{i=1}^{k} (u_i^T e)^2 + Σ_{i=k+1}^{n} (u_i^T b_exact)^2.

If we knew b_exact and the noise e, we could find an appropriate index k minimizing this error. But the noise is usually not known and b_exact is not available. Since we do not know b_exact, we estimate each element of b by using the other elements.

Consider the Tikhonov case. Remove the i-th row of A and of b, call the results A^(i) and b^(i) respectively, and solve the Tikhonov problem

    x_λ^(i) = ((A^(i))^T A^(i) + λ^2 I_n)^{-1} (A^(i))^T b^(i);

then use x_λ^(i) to estimate the element b_i by computing A(i,:) x_λ^(i). Hence our goal is to minimize the prediction errors,

    min_λ (1/n) Σ_{i=1}^{n} (A(i,:) x_λ^(i) - b_i)^2.

With some technical computation this can be written as

    min_λ (1/n) Σ_{i=1}^{n} ( (A(i,:) x_λ - b_i) / (1 - h_ii) )^2,

where the h_ii are the diagonal elements of the matrix A (A^T A + λ^2 I)^{-1} A^T and x_λ is the Tikhonov solution. This still has an issue, because the result depends on the ordering of the h_ii; to remedy this, we replace each h_ii by their average. The resulting method is called generalized cross validation (GCV), with the minimization form

    min_λ (1/n) Σ_{i=1}^{n} ( (A(i,:) x_λ - b_i) / (1 - trace(A (A^T A + λ^2 I)^{-1} A^T)/n) )^2.

By using the SVD of A,

    trace(A (A^T A + λ^2 I)^{-1} A^T) = trace(U Σ V^T V (Σ^2 + λ^2 I)^{-1} V^T V Σ U^T)
                                      = trace(U Σ (Σ^2 + λ^2 I)^{-1} Σ U^T)
                                      = trace(U Φ(λ) U^T) = trace(Φ(λ)) = Σ_{i=1}^{n} φ_i(λ).

Hence GCV chooses λ_GCV to minimize

    ||A x_λ - b||_2^2 / ( n - Σ_{i=1}^{n} φ_i(λ) )^2.    (2.13)

For TSVD, since Φ(k) = diag(1, ..., 1, 0, ..., 0), k_GCV minimizes

    ||A x_k - b||_2^2 / ( n - k )^2.    (2.14)

In summary, we have seen that regularization is needed for a linear inverse problem in order to obtain a more stable solution that is close to the exact one. Through the Picard condition and Picard plots, we observed that we should keep only the first components of the SVD expansion (2.9) to avoid unwanted errors. Standard methods for choosing the regularization parameter include DP, the L-curve, and GCV.
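A minimal sketch of choosing λ by minimizing the GCV function (2.13) on a grid, for a square nonsingular A; the test matrix and noise level are the same illustrative choices as in the previous sketch, and the grid bounds are arbitrary.

    % Sketch only: GCV (2.13) evaluated on a grid via the SVD of A.
    n = 32;  A = hilb(n);
    b = A*ones(n,1) + 1e-6*randn(n,1);
    [U, S, ~] = svd(A);  s = diag(S);  beta = U'*b;
    % For square nonsingular A, ||A x_lambda - b||^2 = sum(((1-phi_i)*beta_i)^2).
    gcvfun = @(lam) sum(((lam^2 ./ (s.^2 + lam^2)) .* beta).^2) / ...
                    (n - sum(s.^2 ./ (s.^2 + lam^2)))^2;
    lambdas = logspace(-8, 0, 200);
    [~, idx] = min(arrayfun(gcvfun, lambdas));
    lambda_gcv = lambdas(idx);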

2.3 Gauss-Newton Method for Nonlinear Least Squares

Next we review optimization methods for solving nonlinear least squares problems. Our goal is to minimize f(x) = (1/2) Σ_{j=1}^{n} r_j^2(x), where each r_j is a smooth function from R^n to R. We call r : R^n -> R^n, r(x) = (r_1(x), r_2(x), ..., r_n(x))^T, the residual vector, so that f(x) = (1/2) ||r(x)||_2^2. We can express the derivatives of f(x) in terms of the Jacobian J(x) in R^{n×n},

    J(x) = [∇r_1(x)^T; ∇r_2(x)^T; ...; ∇r_n(x)^T],

such that

    ∇f(x) = J(x)^T r(x),   ∇^2 f(x) = J(x)^T J(x) + Σ_{j=1}^{n} r_j(x) ∇^2 r_j(x),

where ∇f and ∇^2 f are the gradient vector and Hessian matrix, respectively [4]. For linear problems with r(x) = Ax - b, we have J(x) = A and ∇f = A^T(Ax - b). The standard Newton method for minimizing f(x) is the iteration

    x_{k+1} = x_k + α_k p_k,

where α_k is a step size and the descent direction p_k is computed from ∇^2 f(x_k) p_k = -∇f(x_k) at each step k. The Gauss-Newton method uses the approximation of the Hessian

    ∇^2 f(x_k) ≈ J(x_k)^T J(x_k).

Then, together with the gradient ∇f(x_k) = J(x_k)^T r(x_k), we get the Gauss-Newton step p_k^GN from

    J(x_k)^T J(x_k) p_k^GN = -J(x_k)^T r(x_k).    (2.15)

The reason we use this approximation is to avoid computing the Hessian matrix ∇^2 f; furthermore, J(x)^T J(x) usually dominates the second term of ∇^2 f(x) in the Taylor series. Let us now return to the nonlinear problem min_{x,y} ||b - A(y)x||_2^2. Following Chung and Nagy [7], we define the coupled least squares problem

    min_w ψ(w) = min_w (1/2) ||f(w)||_2^2,    (2.16)

where

    f(w) = f(x, y) = [A(y); λI] x - [b; 0]   and   w = [x; y].

The problem min_w ψ(w) can be solved by the Gauss-Newton method, in which the iterates are given by

    w_{l+1} = w_l + d_l,   l = 0, 1, 2, ...,

where w_0 is an initial guess and d_l is computed by solving

    J_ψ(w_l)^T J_ψ(w_l) d_l = -J_ψ(w_l)^T f(w_l).

If we define r = -f, then J_ψ(w_l)^T J_ψ(w_l) d_l = J_ψ(w_l)^T r(w_l). Finding the search direction d_l is equivalent to

    min_d ||J_ψ(w_l)^T J_ψ(w_l) d - J_ψ(w_l)^T r(w_l)||_2^2 = min_d ||J_ψ(w_l)^T (J_ψ(w_l) d - r(w_l))||_2^2 = min_d ||J_ψ(w_l) d - r(w_l)||_2^2.

The Jacobian matrix J_ψ can be written as

    J_ψ = [f_x  f_y] = [∂f(x,y)/∂x   ∂f(x,y)/∂y].    (2.17)

In summary, the Gauss-Newton method for min_w ψ(w) has the following general form:
1. Choose initial w_0 = [x_0; y_0]
2. for l = 0, 1, 2, ...
   2.1 r_l = [b; 0] - [A(y_l); λI] x_l
   2.2 d_l = argmin_d ||J_ψ d - r_l||_2
   2.3 w_{l+1} = w_l + d_l
3. end
But computing J_ψ and solving with it at every step can be expensive and time-consuming when x and y are large. Alternative approaches include variable projection [6, 10] and alternating optimization [24].
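As a minimal, self-contained illustration of the Gauss-Newton iteration above (with step size α_l = 1), the following MATLAB sketch fits a two-parameter exponential model to synthetic data; the model and all numbers are invented for illustration only and are not part of the thesis experiments.

    % Sketch only: generic Gauss-Newton iteration for min 0.5*||r(w)||^2.
    t = (0:0.1:1)';
    wtrue = [2; -1.5];
    b = wtrue(1)*exp(wtrue(2)*t) + 1e-3*randn(size(t));   % synthetic data
    res = @(w) w(1)*exp(w(2)*t) - b;                      % residual r(w)
    jac = @(w) [exp(w(2)*t), w(1)*t.*exp(w(2)*t)];        % Jacobian of r(w)
    w = [1; -1];                                          % initial guess w_0
    for l = 1:10
        d = -jac(w) \ res(w);   % Gauss-Newton direction: argmin_d ||J*d + r||
        w = w + d;              % w_{l+1} = w_l + d_l
    end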

2.4 Variable Projection for Separable Nonlinear Least-Squares Problems

In Golub and Pereyra [6] and O'Leary and Rust [10], for a given observation b = [b_1, ..., b_n]^T, a separable nonlinear least-squares problem consists of a linear combination of nonlinear functions that have multiple parameters. The residual vector r is defined by

    r_i(x, y) = b_i - Σ_{j=1}^{n} x_j φ_j(y; t_i),

where
- the t_i are independent variables associated with the b_i;
- the nonlinear functions φ_j(y; t_i), evaluated at all t_i, are the columns of A(y);
- the x_j and the n-dimensional vector y are obtained by minimizing ||r(x, y)||_2^2.

Thus we have ||r(x, y)||_2^2 = ||b - A(y)x||_2^2. If we assume the nonlinear parameters y are known, then the linear parameters x can be computed as

    x = A(y)^+ b,

where A(y)^+ is the pseudoinverse of A(y), so that x is the solution of the linear least squares problem for fixed y. Incorporating this into the nonlinear problem, the optimization problem has the form

    min_y ||(I - A(y)A(y)^+) b||_2^2,

where the linear parameters have vanished. The quantity (I - A(y)A(y)^+)b is called the variable projection of b, and I - A(y)A(y)^+ is the projector onto the orthogonal complement of the column space of A(y). The variable projection method is an iterative nonlinear algorithm used to solve this minimization problem in the reduced space. In general it tends to converge in fewer iterations than for the original minimization problem; however, convergence is not guaranteed. By eliminating x implicitly, we reduce the cost so that it depends only on y. Thus this method is appropriate when y has relatively fewer parameters than x.

Because f(x, y) in (2.16) is linear in x, we can consider

    ρ(y) ≡ ψ(x(y), y) = (1/2) || [A(y); λI] x(y) - [b; 0] ||_2^2,    (2.18)

where x(y) is the solution of min_x ψ(x, y). Now, in order to apply the Gauss-Newton algorithm to ρ(y), we need to compute

    ρ'(y) = d/dy ψ(x(y), y) = ψ_x (dx/dy) + ψ_y.

Since x is the solution of min_x ψ(x, y), we have

    ψ_x = [A(y); λI]^T ( [A(y); λI] x - [b; 0] ) = 0.

Thus

    ρ'(y) = 0 · (dx/dy) + ψ_y = ψ_y,

and with f = [f_1, f_2, ..., f_{2n}]^T given by f = [A(y); λI] x - [b; 0], we have ψ_y = f_y^T f, so that ρ'(y) = f_y^T f. From this calculation we can check that

    J_ρ = f_y = ∂f/∂y = ∂/∂y [A(y)x - b; λx] = [∂(A(y)x)/∂y; 0],   and   ρ'(y) = J_ρ^T f.    (2.19)

Let Ĵ_ρ = ∂(A(y)x)/∂y and r̂ = b - A(y)x. Since

    J_ρ^T J_ρ = [Ĵ_ρ; 0]^T [Ĵ_ρ; 0] = Ĵ_ρ^T Ĵ_ρ,
    J_ρ^T f = [Ĵ_ρ; 0]^T ( [A(y); λI] x - [b; 0] ) = Ĵ_ρ^T (A(y)x - b) = -Ĵ_ρ^T r̂,

we can get the search direction d_l for the Gauss-Newton algorithm by solving

    Ĵ_ρ^T Ĵ_ρ d_l = Ĵ_ρ^T r̂.

Finding d_l is equivalent to

    min_d ||Ĵ_ρ^T Ĵ_ρ d - Ĵ_ρ^T r̂||_2^2 = min_d ||Ĵ_ρ^T (Ĵ_ρ d - r̂)||_2^2 = min_d ||Ĵ_ρ d - r̂||_2^2.

The reduced Gauss-Newton method for min_y ρ(y) has the following general form:
1. Choose initial y_0
2. for l = 0, 1, 2, ...
   2.1 x_l = argmin_x || [A(y_l); λ_l I] x - [b; 0] ||_2
   2.2 r̂_l = b - A(y_l) x_l
   2.3 d_l = argmin_d || Ĵ_ρ d - r̂_l ||_2
   2.4 y_{l+1} = y_l + d_l
3. end
Besides computing the search direction d_l at each step, we also need to choose the step length, i.e., the distance of descent. There are algorithms for this, such as the Armijo rule [20] for the step length α_l in y_{l+1} = y_l + α_l d_l. In our numerical examples, however, the step size did not significantly affect the results, so we set α_l = 1. For choosing a regularization parameter for the linear least squares subproblem, GCV (2.13)

is applied to the Tikhonov problem

    x_l = argmin_x || [A(y_l); λ_l I] x - [b; 0] ||_2.

To obtain the nonlinear parameter direction

    d_l = argmin_d || Ĵ_ρ d - r̂_l ||_2,

we solve the normal equations, d_l = (Ĵ_ρ^T Ĵ_ρ)^{-1} Ĵ_ρ^T r̂_l. To enforce the constraint y ≥ 0, we use the built-in Matlab code lsqnonlin.m, which is based on [8, 9].
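As a toy, self-contained illustration of the reduced (variable projection) Gauss-Newton iteration above, the following sketch uses a model with one linear parameter x and one nonlinear parameter y, b ≈ x e^{-yt}, and sets λ = 0 for simplicity. The model, data, and iteration count are assumptions made only for illustration; the thesis applies the same idea with A(y) a blur matrix and a Tikhonov subproblem.

    % Sketch only: reduced Gauss-Newton with variable projection (lambda = 0).
    t = (0:0.05:1)';
    b = 3*exp(-2*t) + 1e-3*randn(size(t));    % synthetic data (x_true = 3, y_true = 2)
    y = 1;                                    % initial nonlinear parameter y_0
    for l = 1:15
        a    = exp(-y*t);                     % the single column of A(y)
        x    = (a'*b) / (a'*a);               % x(y): linear least squares solve
        rhat = b - a*x;                       % reduced residual r-hat
        Jhat = -x * (t .* a);                 % J-hat = d(A(y)x)/dy
        d    = Jhat \ rhat;                   % d_l = argmin || Jhat*d - rhat ||
        y    = y + d;                         % y_{l+1} = y_l + d_l
    end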

Chapter 3 Exploiting a Low Rank PSF in Solving Nonlinear Inverse Problems

The general separable nonlinear least squares problem can be written as

    min_{x,y} ||A(y)x - b||_2^2   subject to y ≥ 0,    (3.1)

where x in R^n, y in R^n_{≥0}, A(y) in R^{n×n}, and b in R^n. For our problem, we consider point spread functions P of rank 1. First we consider the symmetric case, where y in R^n and P = (1/s(y)) y y^T with s(y) = Σ_{i=1}^n Σ_{j=1}^n y_i y_j. We also consider the non-symmetric case, where y, z in R^n and P(y, z) = (1/s(y,z)) y z^T with s(y,z) = Σ_{i=1}^n Σ_{j=1}^n y_i z_j. So we can assume that P is rank 1. In order to implement the Gauss-Newton approach for the reduced problem, we need to derive the Jacobian Ĵ_ρ. First we assume that the entries of the PSF are nonnegative

and that the sum of the entries of the PSF is 1.

3.1 Symmetric PSF

Let the PSF be P = (1/s(y)) y y^T, where s(y) = Σ_{i=1}^n Σ_{j=1}^n y_i y_j. Then the Jacobian Ĵ_ρ in R^{n^2 × n} contains all partial derivatives of vec(P(y)) with respect to y, and has the form

    Ĵ_ρ = ∂vec(P(y))/∂y = ∂/∂y [ (1/s(y)) (y ⊗ y) ],    (3.2)

because, for y in R^n,

    P(y) = (1/s(y)) y y^T = (1/s(y)) [y_1 y   y_2 y   ...   y_n y],
    vec(P(y)) = (1/s(y)) [y_1 y; y_2 y; ...; y_n y] = (1/s(y)) (y ⊗ y).

For all k = 1, ..., n,

    ∂s(y)/∂y_k = ∂/∂y_k ( Σ_{i=1}^n Σ_{j=1}^n y_i y_j ) = 2 Σ_{i=1}^n y_i = 2 ||y||_1.

Therefore

the k-th column of ∂vec(P(y))/∂y, whose entries are the partial derivatives ∂(y_i y_j / s(y))/∂y_k, is

    ∂vec(P(y))/∂y_k = (1/s(y)) (e_k ⊗ y + y ⊗ e_k) - (2 ||y||_1 / s(y)^2) (y ⊗ y),

where e_k is the k-th standard basis vector. Collecting these columns for k = 1, ..., n gives the full matrix ∂vec(P(y))/∂y.

Thus, with 1^T = [1 ... 1] in R^{1×n},

    Ĵ_ρ = ∂vec(P(y))/∂y = (1/s(y)) (y ⊗ I_n + I_n ⊗ y) - (2 ||y||_1 / s(y)^2) (y ⊗ y) 1^T,    (3.3)

and Ĵ_ρ in R^{n^2 × n}.
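Formula (3.3) can be sanity-checked against a central finite-difference approximation; the sketch below does this for a random nonnegative y of small size n (the size and step are arbitrary).

    % Sketch only: finite-difference check of the symmetric-PSF Jacobian (3.3).
    n = 5;  y = rand(n,1);
    s = sum(y)^2;                                 % s(y) = (sum y_i)^2 = ||y||_1^2 for y >= 0
    vecP = @(v) kron(v, v) / sum(v)^2;            % vec(P(v)) = (v kron v) / s(v)
    J = (kron(y, eye(n)) + kron(eye(n), y)) / s ...
        - (2*sum(y)/s^2) * kron(y, y) * ones(1, n);   % formula (3.3)
    e = 1e-6;  Jfd = zeros(n^2, n);
    for k = 1:n
        ek = zeros(n,1);  ek(k) = e;
        Jfd(:,k) = (vecP(y + ek) - vecP(y - ek)) / (2*e);  % central differences
    end
    norm(J - Jfd)     % on the order of the finite-difference error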

3.2 Non-symmetric PSF

Now consider the scaled PSF built from two different nonnegative vectors y in R^n and z in R^n,

    P(y, z) = (1/s(y,z)) y z^T,   where s(y,z) = Σ_{i=1}^n Σ_{j=1}^n y_i z_j.

The Jacobians can be derived as

    J_y = ∂vec(P(y,z))/∂y = ∂/∂y [ (1/s(y,z)) (z ⊗ y) ],
    J_z = ∂vec(P(y,z))/∂z = ∂/∂z [ (1/s(y,z)) (z ⊗ y) ],

because, for y, z in R^n,

    vec(P(y,z)) = (1/s(y,z)) vec(y z^T) = (1/s(y,z)) [y_1 z_1; ...; y_n z_1; ...; y_1 z_n; ...; y_n z_n] = (1/s(y,z)) (z ⊗ y).

For all k = 1, ..., n,

    ∂s(y,z)/∂y_k = Σ_{i=1}^n z_i = ||z||_1,   ∂s(y,z)/∂z_k = Σ_{i=1}^n y_i = ||y||_1.

Therefore

the k-th column of ∂vec(P(y,z))/∂y, whose entries are the partial derivatives ∂(y_i z_j / s(y,z))/∂y_k, is

    ∂vec(P(y,z))/∂y_k = (1/s(y,z)) (z ⊗ e_k) - (||z||_1 / s(y,z)^2) (z ⊗ y),

and collecting these columns for k = 1, ..., n gives the full matrix ∂vec(P(y,z))/∂y.

Thus, with 1^T = [1 ... 1] in R^{1×n},

    J_y = ∂vec(P(y,z))/∂y = (1/s(y,z)) (z ⊗ I_n) - (||z||_1 / s(y,z)^2) (z ⊗ y) 1^T.    (3.4)

Similarly,

    J_z = ∂vec(P(y,z))/∂z = (1/s(y,z)) (I_n ⊗ y) - (||y||_1 / s(y,z)^2) (z ⊗ y) 1^T.    (3.5)

Note that both J_y and J_z are in R^{n^2 × n}. Therefore we finally obtain the Jacobian

    Ĵ_ρ = [J_y  J_z] = [ (1/s(y,z))(z ⊗ I_n) - (||z||_1/s(y,z)^2)(z ⊗ y)1^T,   (1/s(y,z))(I_n ⊗ y) - (||y||_1/s(y,z)^2)(z ⊗ y)1^T ],

with Ĵ_ρ in R^{n^2 × 2n}. We will use this Jacobian in Section 4.1 for the reduced Gauss-Newton method.

3.3 Reformulation

We also note that a reformulation can be used to simplify the problem. For a non-symmetric PSF, P = y z^T, if we fix x_0 and y_0, then the problem becomes linear in z. Let vec(X_0) = x_0. Then

    min_z ||A(y_0 z^T) x_0 - b||_2^2 = min_z ||A(X_0)(z ⊗ y_0) - b||_2^2.

We also have

    z ⊗ y_0 = [z_1 y_0; z_2 y_0; ...; z_n y_0] = (I_n ⊗ y_0) z.    (3.6)

Thus

    min_z ||A(y_0 z^T) x_0 - b||_2^2 = min_z ||A(X_0)(I_n ⊗ y_0) z - b||_2^2.    (3.7)

Let Y = A(X_0)(I_n ⊗ y_0). Then we have min_z ||Y z - b||_2^2, a linear least squares problem. In the same way, if we fix z_0 and x_0, we can reformulate the nonlinear problem as a problem that is linear in y:

    min_y ||A(y z_0^T) x_0 - b||_2^2 = min_y ||A(X_0)(z_0 ⊗ y) - b||_2^2.

We can write

    z_0 ⊗ y = [z_{0,1} y; z_{0,2} y; ...; z_{0,n} y] = (z_0 ⊗ I_n) y,    (3.8)

and thus

    min_y ||A(y z_0^T) x_0 - b||_2^2 = min_y ||A(X_0)(z_0 ⊗ I_n) y - b||_2^2.    (3.9)

Let Z = A(X_0)(z_0 ⊗ I_n). Then we have min_y ||Z y - b||_2^2, again a linear least squares problem. With these reformulations, the alternating optimization method can be carried out efficiently; see Section 4.2.
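The Kronecker-product identities underlying (3.6) and (3.8) can be verified directly; the sketch below checks them for random vectors (the size is arbitrary).

    % Sketch only: the vec/Kronecker identities behind the reformulation.
    n = 6;
    y0 = rand(n,1);  z0 = rand(n,1);  y = rand(n,1);  z = rand(n,1);
    norm(kron(z, y0) - kron(eye(n), y0)*z)       % (3.6): z kron y0 = (I_n kron y0) z
    norm(kron(z0, y) - kron(z0, eye(n))*y)       % (3.8): z0 kron y = (z0 kron I_n) y
    norm(kron(z, y0) - reshape(y0*z', [], 1))    % both equal vec(y0*z')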

Chapter 4 Numerical Results

4.1 Variable projection with low rank PSF

In this chapter we solve a nonlinear inverse problem with a separable low-rank PSF on an image deblurring example. This experiment is known as blind deconvolution. The forward process is (3.1). In real processing we do not know how the images were blurred. Many types of blur have been investigated, e.g., Gaussian blur [7], but our framework allows more general and realistic blurs through low-rank PSFs. We assume that the PSF is a rank-1 matrix. Thus we can consider the symmetric case, P = y y^T with y in R^n and y ≥ 0, or the non-symmetric case, P = y z^T with y, z in R^n and y ≥ 0, z ≥ 0. We apply the PSF to the Grain image

from Chung and Nagy [7]. The image size is 256 × 256. This experiment uses a non-symmetric PSF, so we set the true parameters y_true and z_true shown in Figure 4.1.

Figure 4.1: True parameters y_true, z_true. (a) y_true and y_0; (b) z_true and z_0.

After constructing the PSF, the matrix A is formed, and with the true image we obtain b. To simulate noisy observed data, 1% Gaussian white noise is added to b. We force y and z to satisfy that the sum of the entries of y z^T is 1 by dividing by the sum of the elements of y z^T. We set the initial guess y_0 by convolving y_true with a Gaussian kernel [8], [9], and z_0 similarly. We use Ĵ_ρ from Chapter 3 for the non-symmetric case,

    Ĵ_ρ = [J_y  J_z] = [ (1/s(y,z))(z ⊗ I_n) - (||z||_1/s(y,z)^2)(z ⊗ y)1^T,   (1/s(y,z))(I_n ⊗ y) - (||y||_1/s(y,z)^2)(z ⊗ y)1^T ],

and apply the Gauss-Newton method. We need to solve

    min_{x,y,z} ||A(P(y,z))x - b||_2^2   subject to y, z ≥ 0.    (4.1)

Figure 4.2: Comparing λ_nl = 0.025, 0.028, 0.03, 0.032, 0.035. (a) Relative error of x; (b) relative error of y; (c) relative error of z; (d) relative error of x, y, and z.

This problem is ill-posed, however. Thus regularization parameters for the linear parameters, λ_l, and for the nonlinear parameters, λ_nl, are added to give the restricted problem

    min_{x,y,z} ||A(P(y,z))x - b||_2^2 + λ_l ||x||_2^2 + λ_nl ||[y; z]||_2^2   subject to y, z ≥ 0.    (4.2)

The linear regularization parameter λ_l is chosen by GCV (2.13). Without the nonlinear regularization parameter λ_nl, y and z are unstable because the condition number of Ĵ_ρ is a very

Figure 4.3: Error comparison for the non-symmetric PSF with λ_nl = 0.03. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of x, y, and z.

large number, near 10^4, even though we assumed Σ_{i=1}^n Σ_{j=1}^n y_i z_j = 1 in Chapter 3. To suppress y and z, λ_nl is chosen experimentally; see Figure 4.2. By varying λ_nl between 0.025 and 0.035, we can observe the change in the relative errors of x, y, and z. Among those values, λ_nl = 0.035 gives the smallest relative error of x, while the relative errors of y and z drastically increase. In terms of the total relative error, λ_nl = 0.03 is the best nonlinear regularization parameter in this experiment. Having chosen λ_nl = 0.03, we can check that y and z move slightly closer to y_true and z_true in the graphs of Figure 4.3. Also, the relative error of x moves much closer to zero as the iteration number increases, but after iteration l = 15 there was no significant change

in the errors. The relative error values can be checked in Table 4.1.

Table 4.1: Table of relative norm errors for the non-symmetric PSF with λ_nl = 0.03 (columns: l, ||x_true - x_l||_2/||x_true||_2, ||y_true - y_l||_2/||y_true||_2, ||z_true - z_l||_2/||z_true||_2).

    l    x error    y error    z error
    1    0280       02560      0267
    2    02528      02539      02652
    3    0276       02504      0266
    4    0962       02470      0258
    5    0824       02439      02547
    6    0720       0240       0256
    7    0647       02384      02488
    8    0580       02360      0246
    9    058        02338      02435
    10   046        0237       024
    11   0407       02299      02389
    12   0363       02283      02368
    13   032        02268      02348
    14   0284       02255      02330
    15   0245       02244      0233
    16   0245       02244      0233

In Figure 4.4 we see how the Grain image changes as the reduced Gauss-Newton method is applied. The final image still looks far from the true one, but the images become closer to the true image as l increases. In addition, there were unexpected results when the same experiment was tried with a symmetric blur: the relative errors of x and y do not decrease monotonically, and they end up increasing at some point. In order to investigate this problem, we fix x_true and alternate between solving the linear problems for y and z. In Figure 4.5 we see how the PSF changes as the reduced Gauss-Newton method is applied. The true PSF contains oscillations in both y and z; the initial guesses y_0 and z_0 have no oscillation, so the initial PSF starts from a smooth blur. As l increases, the final PSF looks closer to the true PSF than the initial PSF.

Figure 4.4: A comparison of the true, blurred, and deblurred images from the reduced Gauss-Newton method with the non-symmetric PSF and λ_nl = 0.03 (panels: True Image, Blurred Image, deblurred iterates).

Figure 4.5: A comparison of the true and reconstructed PSFs with λ_nl = 0.03 (panels: True PSF, reconstructed iterates).

4.2 x_true - Alternating Optimization

In Section 3.3 we converted the nonlinear problem into a sequence of linear problems by fixing x_true and y_0, or x_true and z_0. For these linear problems we use the Matlab function lsqlin.m to enforce nonnegativity [8] and linear equality [7] constraints when solving a constrained least squares problem. First, for fixed x_true, we have the problem

    min_{y,z} ||A(P(y,z)) x_true - b||_2^2   subject to y, z ≥ 0, ||y||_1 = ||z||_1 = 1,    (4.3)

where P(y,z) = (1/s(y,z)) y z^T with s(y,z) = Σ_{i=1}^n Σ_{j=1}^n y_i z_j. The alternating optimization then has the following general form (a sketch of one constrained subproblem solve follows the listing):
1. Choose y_0
2. For l = 0, 1, 2, ...
   2.1 Set z_initial such that min_z ||A(X_true)(I_n ⊗ y_l) z - b||_2; modify z_initial to be nonnegative with ||z_initial||_1 = 1. Find z_{l+1} such that min_{z ≥ 0, ||z||_1 = 1} ||A(X_true)(I_n ⊗ y_l) z - b||_2, starting from z_initial.
   2.2 Set y_initial such that min_y ||A(X_true)(z_{l+1} ⊗ I_n) y - b||_2; modify y_initial to be nonnegative with ||y_initial||_1 = 1. Find y_{l+1} such that min_{y ≥ 0, ||y||_1 = 1} ||A(X_true)(z_{l+1} ⊗ I_n) y - b||_2, starting from y_initial.
3. end
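As a rough sketch of step 2.1 above, the following uses the Matlab function lsqlin.m (Optimization Toolbox) to solve one constrained subproblem. Here Y stands for the matrix A(X_true)(I_n ⊗ y_l) of the algorithm; the random Y, b, and size n are placeholders only.

    % Sketch only: one constrained subproblem solve with lsqlin.m.
    n = 16;
    Y = rand(32, n);  b = rand(32, 1);               % placeholder problem data
    Aeq = ones(1, n);  beq = 1;                      % equality constraint: sum(z) = 1
    lb = zeros(n, 1);  ub = [];                      % nonnegativity: z >= 0
    z0 = max(Y \ b, 0);  z0 = z0 / sum(z0);          % feasible starting vector
    z  = lsqlin(Y, b, [], [], Aeq, beq, lb, ub, z0); % z_{l+1}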

Figure 4.6: Error comparison for Alternating Optimization. (a) y_true, y_initial, y_computed; (b) z_true, z_initial, z_computed; (c) relative errors of y and z.

To start this algorithm, y_0 is chosen by convolving y_true with a Gaussian kernel [8], [9]. When setting the PSF P(y,z), we make the sum of all its elements equal to 1 by dividing y z^T by Σ_{i=1}^n Σ_{j=1}^n y_i z_j, as in Chapter 3. However, by forcing ||y||_1 = ||z||_1 = 1, the sum of all elements of P(y,z) is automatically equal to 1, because

    Σ_{i=1}^n Σ_{j=1}^n y_i z_j = (Σ_{i=1}^n y_i)(Σ_{j=1}^n z_j) = ||y||_1 ||z||_1 = 1,   since y, z ≥ 0 and ||y||_1 = ||z||_1 = 1.

Table 4.2: Table of relative norm errors for Alternating Optimization (columns: l, ||y_true - y_l||_2/||y_true||_2, ||z_true - z_l||_2/||z_true||_2).

    l    y error      z error
    0    02584774     00459049
    1    004462480    004392798
    2    004437537    004393802
    3    00443469     004394230
    4    0044342      004392070
    5    004433983    00439204
    6    004433946    0043923
    7    004433936    0043925
    8    004433934    0043926
    9    004433933    0043926
    10   004433933    0043926

For the linear least squares problems with y, z ≥ 0 and ||y||_1 = ||z||_1 = 1, the Matlab code lsqlin.m is used with initial vectors y_initial and z_initial, respectively. When z_initial and y_initial are set for the first iteration, they are not nonnegative and their 1-norm is not 1. To give a feasible initial vector for lsqlin.m, we apply a small modification, setting small negative entries to zero and normalizing. Once the alternating optimization has started, the relative error of y drops to a smaller value; the error of z starts small and shows no great change as the iterations continue. See Figure 4.6. Moreover, the computed y and z are very close to the true y and z. Strictly speaking, we would need to show convergence of the alternating optimization method in order to give a valid theoretical foundation; without a proof of convergence, it cannot be expected in general that y_l, z_l converge to the true y, z. In Chan and Wong [24], if a model is not convex then it can admit multiple solutions; although they use a PSF with only one parameter, they established global convergence, with the limit depending on the initial guess. In this thesis we assume that the PSF is a rank-1 matrix, so the nonlinear parameters are the 2n entries of y and z, which means we must consider many more parameters than in the model of Chan and Wong. Therefore we only

present numerical experiments, without convergence results; establishing theoretical convergence remains future work.

4.3 Alternating Optimization 3 ways

This section is an extension of alternating optimization that solves for x, y, and z, which is considerably more difficult. After initializing y_0 and z_0, we compute an initial x_0 by solving

    min_x ||A(P(y_0, z_0)) x - b||_2    (4.4)

using a regularization method, namely the weighted-GCV method for Lanczos-hybrid regularization [23] implemented in the Matlab code HyBR.m [2, 22, 23]. As before, we enforce y, z ≥ 0 and ||y||_1 = ||z||_1 = 1. The alternating optimization 3 ways then has the general form:
1. Choose y_0, z_0
2. For l = 0, 1, 2, ...
   2.1 Compute x_l such that min_x ||A(P(y_l, z_l)) x - b||_2 and set vec(X_l) = x_l.
   2.2 Set z_initial such that min_z ||A(X_l)(I_n ⊗ y_l) z - b||_2; modify z_initial to be nonnegative with ||z_initial||_1 = 1. Find z_{l+1} such that min_{z ≥ 0, ||z||_1 = 1} ||A(X_l)(I_n ⊗ y_l) z - b||_2, starting from z_initial.
   2.3 Set y_initial such that min_y ||A(X_l)(z_{l+1} ⊗ I_n) y - b||_2; modify y_initial to be nonnegative with ||y_initial||_1 = 1. Find y_{l+1} such that min_{y ≥ 0, ||y||_1 = 1} ||A(X_l)(z_{l+1} ⊗ I_n) y - b||_2, starting from y_initial.
3. end
For this experiment we consider a smaller example, e.g., 64 × 64, obtained by cutting out the middle

Figure 4.7: Error comparison for Alternating Optimization 3 ways. (a) Relative errors of x, y, and z.

part of the Grain image. Then A in R^{64^2 × 64^2} and b, x in R^{64^2}. When we simply try to find x, y, and z through the alternating optimization 3 ways, it does not give satisfactory results, as shown in Figure 4.7. Since we start with x_0, which is derived from y_0 and z_0 using HyBR.m, x_0 may be far from x_true. To compensate for this weakness, we add more information by using rotated images, under the assumption that the PSF is invariant. In this case A and b have to be modified. Since we may lose parts of the original image, interpolation would be needed for rotations by angles other than 90°, 180°, and 270°; thus we consider only these three rotations, see Figure 4.8. For the image vector x, let R_k x denote the rotated image, with rotation matrices R_k in R^{n^2 × n^2} for k = 0, 1, 2, 3; R_k x rotates the image x counterclockwise by k · 90°. The corresponding observed image is b_k = A(P(y,z)) R_k x.

Figure 4.8: Rotated true and blurred images with rotations of 0°, 90°, 180°, 270°.

Thus, with k = 0, 1, 2, 3, we can restate our problem as

    min_{x,y,z} || [A(P(y,z))R_0; A(P(y,z))R_1; ...; A(P(y,z))R_k] x - [b_0; b_1; ...; b_k] ||_2^2   subject to y, z ≥ 0, ||y||_1 = ||z||_1 = 1,    (4.5)

and let

    Â(P(y,z)) = [A(P(y,z))R_0; A(P(y,z))R_1; ...; A(P(y,z))R_k]   and   b̂ = [b_0; b_1; ...; b_k].    (4.6)

Now the problem can be considered as minimizing ||Â(P(y,z))x - b̂||_2^2, where Â(P(y,z)) in R^{kn^2 × n^2} and b̂ in R^{kn^2}. The relative errors are shown in Figure 4.10; we can expect that more rotations give better reconstructions.
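A short sketch of how the rotated data in (4.5)-(4.6) can be assembled, using MATLAB's rot90 for the rotations R_k and conv2 (zero boundary conditions) as a stand-in for A(P(y,z)); the image, PSF, and noise level below are placeholders only.

    % Sketch only: stacking rotated, blurred, noisy observations b_k = A(P) R_k x.
    X = rand(64);  P = rand(64);  P = P / sum(P(:));
    K = 3;                                     % number of extra 90-degree rotations
    bhat = [];
    for k = 0:K
        Bk = conv2(rot90(X, k), P, 'same');    % b_k = A(P) R_k x (zero boundary)
        bhat = [bhat; Bk(:) + 1e-2*randn(numel(Bk), 1)];
    end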

Figure 4.9: A comparison of the errors of y and z for Alternating Optimization 3 ways with rotation. (a) Computed y and true y; (b) computed z and true z (0, 1, 2, and 3 rotations).

From Figure 4.10 we can see that the relative errors of x, y, z, and the PSF decrease as the number of rotations grows. Using three rotations, we get the smallest errors overall. From this we can expect better computed x, y, and z if we have more rotated images. In Figure 4.9, consistent with the relative errors in Figure 4.10, the alternating optimization 3 ways with three rotations gives a meaningful error between the true and computed y, z. The errors shown are the absolute differences for each corresponding y_i, z_i, i = 1, ..., 64. While the other cases drift away from the true y, z, three rotations give final computed y, z that are close to the true y, z. A potential problem of the alternating optimization 3 ways with rotation is that it takes more time for larger n: as n (the image size) and k (the number of rotations) grow, the algorithm needs more time to compute. Looking at Figures 4.11 and 4.12, the results show that the computed image and PSF with more rotations are better. In Figure 4.13 we repeat the same experiment with the original Grain image, whose size is 256 × 256. Without rotation, the deblurred image gets worse, and the first image is still better than the result after 16 iterations of Alternating Optimization 3 ways. However, with 2 or 3

Figure 4.10: Relative errors for Alternating Optimization 3 ways with rotation. (a) Relative errors of x; (b) relative errors of y; (c) relative errors of z; (d) relative errors of the PSF (0, 1, 2, and 3 rotations).

rotations, the computed images are much better than with no rotation or one rotation. In Figure 4.14, for the 256 × 256 PSF, 2 or 3 rotations are better than 0 or 1 rotation; however, the reconstructed PSFs are still poor approximations of the true PSF. When we use the larger Grain image, the results are not as good as in the 64 × 64 case. However, with more rotations, better reconstructions are likely.

Figure 4.11: Computed images by Alternating Optimization 3 ways with rotation (64 × 64 size). (a) No rotation; (b) one rotation; (c) two rotations; (d) three rotations (each panel: True Image, Initial Image, Final Image).

Figure 4.12: Computed PSFs by Alternating Optimization 3 ways with rotation (64 × 64 size). (a) No rotation; (b) one rotation; (c) two rotations; (d) three rotations (each panel: True PSF, Initial PSF, Final PSF).

Figure 4.13: Computed images by Alternating Optimization 3 ways with rotation (256 × 256 size). (a) No rotation; (b) one rotation; (c) two rotations; (d) three rotations (each panel: True Image, Initial Image, Final Image).

Figure 4.14: Computed PSFs by Alternating Optimization 3 ways with rotation (256 × 256 size). (a) No rotation; (b) one rotation; (c) two rotations; (d) three rotations (each panel: True PSF, Initial PSF, Final PSF).