Statistically-Based Regularization Parameter Estimation for Large Scale Problems

Statistically-Based Regularization Parameter Estimation for Large Scale Problems. Rosemary Renaut, joint work with Jodi Mead and Iveta Hnetynkova. March 1, 2010. National Science Foundation: Division of Computational Mathematics.

Signal/Image Restoration: Integral Model of Signal Degradation. The continuous model is $b(t) = \int K(t, s)\, x(s)\, ds$, where $K(t, s)$ describes the blur of the signal. For a convolutional (shift-invariant) model, $K(t, s) = K(t - s)$ is the point spread function (PSF). Sampling typically introduces noise $e(t)$, so the model is $b(t) = \int K(t - s)\, x(s)\, ds + e(t)$. Discrete model: given discrete samples $b$, find samples $x$ of $x(s)$. Let $A$ discretize $K$ (assumed known); the model is $b = Ax + e$. Naïvely invert the system to find $x$!

Example: 1-D original and blurred noisy signal. [Figure: the original signal $x$, and the blurred and noisy signal $b$ (Gaussian PSF).]

The Solution: Regularization is Needed. [Figure: the naïve solution and a regularized solution.]

Two-Dimensional Image Deblurring. How do we obtain the deblurred image?

Least Squares for $Ax = b$: A Quick Review. Consider discrete systems $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, with data $b = Ax + e$. Classical approach: linear least squares ($A$ full rank), $x_{LS} = \arg\min_x \|Ax - b\|_2^2$. Difficulty: $x_{LS}$ is sensitive to changes in the right-hand side $b$ when $A$ is ill-conditioned, and for convolutional models the system is ill-posed.
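
The sensitivity claim is easy to check numerically. The following minimal Python sketch (an illustrative setup of my own, not from the talk) naïvely solves a square ill-conditioned system: a relative data error near $10^{-6}$ is amplified by the conditioning of $A$ into a useless solution.

```python
# Illustrative sketch (assumed setup): naive inversion of an ill-conditioned
# system amplifies tiny data errors.  hilbert(n) is a classic ill-conditioned
# test matrix; a discretized convolution operator behaves similarly.
import numpy as np
from scipy.linalg import hilbert

rng = np.random.default_rng(0)
n = 32
A = hilbert(n)
x_true = np.sin(np.linspace(0.0, np.pi, n))
b_exact = A @ x_true

e = 1e-6 * rng.standard_normal(n)        # small measurement noise
b = b_exact + e

x_naive = np.linalg.solve(A, b)          # "naively invert the system"
print("cond(A)              :", np.linalg.cond(A))
print("relative data error  :", np.linalg.norm(e) / np.linalg.norm(b_exact))
print("relative solution err:", np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))
```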

Introduce Regularization to Pick a Solution: Weighted Fidelity with Regularization. Regularize: $x_{RLS}(\lambda) = \arg\min_x \{ \|b - Ax\|_{W_b}^2 + \lambda^2 R(x) \}$, with weighting matrix $W_b$. Here $R(x)$ is a regularization term and $\lambda$ is a regularization parameter, which is unknown. The solution $x_{RLS}(\lambda)$ depends on $\lambda$, on the regularization operator $R$, and on the weighting matrix $W_b$.

The Weighting Matrix: Some Assumptions for Multiple Data Measurements. Given multiple measurements of the data $b$: usually the error $e$ in $b$ is an $m$-vector of random measurement errors with mean $0$ and positive definite covariance matrix $C_b = E(ee^T)$. For uncorrelated measurements $C_b$ is a diagonal matrix of the error variances (colored noise); for white noise $C_b = \sigma^2 I$. Weighting the data-fit term by $W_b = C_b^{-1}$ means that, theoretically, $\tilde e = W_b^{1/2} e$ is uncorrelated. Difficulty: $W_b$ may increase the ill-conditioning of $A$!

Formulation: Generalized Tikhonov Regularization with Weighting. Use $R(x) = \|D(x - x_0)\|^2$: $\hat x = \arg\min J(x) = \arg\min \{ \|Ax - b\|_{W_b}^2 + \lambda^2 \|D(x - x_0)\|^2 \}$. (1) $D$ is a suitable operator, often a derivative approximation; assume $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$. $x_0$ is a reference solution, often $x_0 = 0$. Question: given $D$ and $W_b$, how do we find $\lambda$?
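
Equation (1) can be solved as one augmented least squares problem in the shifted variable $y = x - x_0$. A hedged sketch (the function name and defaults are mine): $W_b$ enters through any square root Wb_half with Wb_half.T @ Wb_half = $W_b$, e.g. a Cholesky factor.

```python
# Sketch of a generalized Tikhonov solve for equation (1):
#   min ||Ax - b||_{W_b}^2 + lambda^2 ||D(x - x0)||^2
# via the stacked system  [Wb_half A; lambda D] y ~ [Wb_half (b - A x0); 0].
import numpy as np

def tikhonov_solve(A, b, D, lam, x0=None, Wb_half=None):
    m, n = A.shape
    p = D.shape[0]
    x0 = np.zeros(n) if x0 is None else x0
    Wb_half = np.eye(m) if Wb_half is None else Wb_half
    r = b - A @ x0                                   # shift to y = x - x0
    K = np.vstack([Wb_half @ A, lam * D])            # augmented operator
    rhs = np.concatenate([Wb_half @ r, np.zeros(p)])
    y, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x0 + y
```

For large problems one would not form the stacked matrix explicitly; the LSQR-based approach discussed later avoids this.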

Example: solutions for increasing $\lambda$, with $D = I$. [Figure: regularized solutions for a sequence of increasing $\lambda$ values.]

Choice of $\lambda$ is crucial: different algorithms yield different solutions. Standard approaches include the Discrepancy Principle, the L-Curve, Generalized Cross Validation (GCV), and Unbiased Predictive Risk Estimation (UPRE).

The Discrepancy Principle. Suppose the noise is white, $C_b = \sigma_b^2 I$. Find $\lambda$ such that the regularized residual satisfies $\sigma_b^2 = \frac{1}{m}\|b - Ax(\lambda)\|_2^2$. (2) This can be implemented by a Newton root-finding algorithm, but the discrepancy principle typically oversmooths.
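
For the standard-form problem ($D = I$, $W_b = I$) and a known noise level $\sigma_b$, equation (2) is a monotone scalar root-finding problem in $\lambda$. A sketch using the SVD of $A$ and a bracketing root finder (a Newton iteration, as mentioned above, works equally well); the bracket endpoints are assumptions and must straddle the root.

```python
# Sketch of the discrepancy principle (2) for D = I, W_b = I and white noise
# of known variance sigma_b**2.  The residual norm is evaluated cheaply from
# the SVD; brentq assumes the bracket [lam_lo, lam_hi] straddles the root.
import numpy as np
from scipy.optimize import brentq

def discrepancy_lambda(A, b, sigma_b, lam_lo=1e-10, lam_hi=1e4):
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out = np.linalg.norm(b)**2 - np.linalg.norm(beta)**2   # part of b outside range(A)

    def resid2(lam):                                        # ||b - A x(lam)||^2
        return np.sum((lam**2 * beta / (s**2 + lam**2))**2) + out

    return brentq(lambda lam: resid2(lam) / m - sigma_b**2, lam_lo, lam_hi)
```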

Some standard approaches I: L-curve — find the corner. Let $r(\lambda)$ denote the regularized residual, with influence matrix $A(\lambda) = A(A^T W_b A + \lambda^2 D^T D)^{-1} A^T$. Plot $\log(\|Dx(\lambda)\|)$ against $\log(\|r(\lambda)\|)$ and trade off the two contributions. Expensive: requires a range of $\lambda$, although the GSVD makes the calculations efficient. Not statistically based; the corner must be found, and there may be no corner.

Generalized Cross-Validation (GCV). Let $A(\lambda) = A(A^T W_b A + \lambda^2 D^T D)^{-1} A^T$ (one can pick $W_b = I$). Minimize the GCV function $\|b - Ax(\lambda)\|_{W_b}^2 / [\mathrm{trace}(I_m - A(\lambda))]^2$, which estimates the predictive risk. Expensive: requires a range of $\lambda$, although the GSVD makes the calculations efficient. A minimum is required; there may be multiple minima, and the function is sometimes flat.
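
As a concrete reference point, here is a minimal GCV sketch for the standard-form case $W_b = I$, $D = I$, where $\mathrm{trace}(I_m - A(\lambda)) = m - \sum_i s_i^2/(s_i^2 + \lambda^2)$ follows from the SVD of $A$; the log-spaced grid is an assumption, chosen only for robustness against flat or multi-minimum GCV curves.

```python
# GCV sketch for W_b = I, D = I: evaluate the GCV function on a log grid of
# lambda values using the SVD filter factors f_i = s_i^2 / (s_i^2 + lambda^2).
import numpy as np

def gcv_lambda(A, b, lambdas=np.logspace(-8, 2, 200)):
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out = np.linalg.norm(b)**2 - np.linalg.norm(beta)**2
    vals = []
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)
        resid2 = np.sum(((1.0 - f) * beta)**2) + out        # ||b - A x(lam)||^2
        vals.append(resid2 / (m - np.sum(f))**2)            # divide by trace(I - A(lam))^2
    return lambdas[int(np.argmin(vals))]
```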

Unbiased Predictive Risk Estimation (UPRE). Minimize the expected value of the predictive risk, i.e. minimize the UPRE function $\|b - Ax(\lambda)\|_{W_b}^2 + 2\,\mathrm{trace}(A(\lambda)) - m$. Expensive: requires a range of $\lambda$, although the GSVD makes the calculations efficient. An estimate of the trace is required, and a minimum is needed.
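
The same SVD quantities give UPRE for white noise with known (or estimated) variance sigma2; only the objective changes. Again a hedged standard-form sketch with $W_b = I$, $D = I$, not the weighted form used in the talk.

```python
# UPRE sketch for W_b = I, D = I and white noise of variance sigma2:
#   U(lambda) = ||b - A x(lambda)||^2 + 2 sigma2 trace(A(lambda)) - m sigma2.
import numpy as np

def upre_lambda(A, b, sigma2, lambdas=np.logspace(-8, 2, 200)):
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out = np.linalg.norm(b)**2 - np.linalg.norm(beta)**2

    def upre(lam):
        f = s**2 / (s**2 + lam**2)
        resid2 = np.sum(((1.0 - f) * beta)**2) + out
        return resid2 + 2.0 * sigma2 * np.sum(f) - m * sigma2

    vals = [upre(lam) for lam in lambdas]
    return lambdas[int(np.argmin(vals))]
```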

An Alternative: Iterative Methods with Stopping Criteria. Iterate to find an approximate solution of $Ax \approx b$, e.g. using LSQR, and introduce stopping criteria based on detecting the noise in the solution: the residual periodogram (O'Leary and Rust), or stopping when the noise dominates (Hnetynkova et al.). Hybrid methods combine the iteration with regularization parameter estimation; the cost of regularizing the reduced system is minimal, and any regularization may be used for the subproblem (Nagy etc.). Open questions: how can we be sure when to stop the LSQR iteration, and how are the statistics included in the subproblem? Here we include the statistics directly.

Background: Statistics of the Least Squares Problem. Theorem (Rao 1973, first fundamental theorem): let $r$ be the rank of $A$ and suppose $b \sim N(Ax, \sigma_b^2 I)$ (measurement errors normally distributed with mean $0$ and covariance $\sigma_b^2 I$); then $J = \min_x \|Ax - b\|^2 \sim \sigma_b^2\, \chi^2(m - r)$, i.e. $J$ follows a $\chi^2$ distribution with $m - r$ degrees of freedom. This is basically the discrepancy principle. Corollary (weighted least squares): for $b \sim N(Ax, C_b)$ and $W_b = C_b^{-1}$, $J = \min_x \|Ax - b\|_{W_b}^2 \sim \chi^2(m - r)$.

Extension: Statistics of the Regularized Least Squares Problem. Theorem ($\chi^2$ distribution of the regularized functional, Renaut/Mead 2008); note the weighting matrix on the regularization term: $\hat x = \arg\min J_D(x) = \arg\min\{ \|Ax - b\|_{W_b}^2 + \|x - x_0\|_{W_D}^2 \}$, $W_D = D^T W_x D$. (3) Assume: $W_b$ and $W_x$ are symmetric positive definite; the problem is uniquely solvable, $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$; the Moore-Penrose generalized inverse of $W_D$ is $C_D$; and, statistically, the errors in the right-hand side satisfy $e \sim N(0, C_b)$ and $x_0$ is known so that $x - x_0 = f \sim N(0, C_D)$, i.e. $x_0$ is the mean vector of the model parameters. Then $J_D(\hat x(W_D)) \sim \chi^2(m + p - n)$.
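
The theorem is easy to verify empirically in the simplest special case $D = I$ (so $p = n$ and the degrees of freedom are $m + p - n = m$), $C_b = I$ and $W_x = \lambda^2 I$, so $C_D = W_D^\dagger = \lambda^{-2} I$. The Monte Carlo sketch below (problem sizes and seed are arbitrary choices of mine) draws $x$ and $e$ from exactly the assumed distributions and checks that the minimal functional value has mean $\approx m$ and variance $\approx 2m$.

```python
# Monte Carlo check of the chi^2 theorem in the special case D = I, C_b = I,
# W_x = lam^2 I (so C_D = I / lam^2 and the degrees of freedom are m + p - n = m).
import numpy as np

rng = np.random.default_rng(1)
m, n, lam, trials = 50, 20, 3.0, 5000
A = rng.standard_normal((m, n))
x0 = np.zeros(n)

J = np.empty(trials)
for k in range(trials):
    x_true = x0 + rng.standard_normal(n) / lam      # x - x0 = f ~ N(0, C_D)
    b = A @ x_true + rng.standard_normal(m)         # e ~ N(0, I)
    K = np.vstack([A, lam * np.eye(n)])             # solve the Tikhonov problem
    rhs = np.concatenate([b - A @ x0, np.zeros(n)])
    y, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    xhat = x0 + y
    J[k] = np.sum((A @ xhat - b)**2) + lam**2 * np.sum((xhat - x0)**2)

print("mean of J_D:", J.mean(), "  expected:", m)       # chi^2(m) mean
print("var  of J_D:", J.var(),  "  expected:", 2 * m)   # chi^2(m) variance
```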

Significance of the $\chi^2$ result. For sufficiently large $\tilde m = m + p - n$, $J_D \sim \chi^2(m + p - n)$, so $E(J_D(\hat x(W_D))) = m + p - n$ and $\mathrm{Var}(J_D) = 2(m + p - n)$. Moreover, $\tilde m - \sqrt{2\tilde m}\, z_{\alpha/2} < J_D(\hat x(W_D)) < \tilde m + \sqrt{2\tilde m}\, z_{\alpha/2}$, (4) where $z_{\alpha/2}$ is the relevant $z$-value for a $\chi^2$ distribution with $\tilde m = m + p - n$ degrees of freedom.
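
A small helper (names are mine) for the interval (4), using the normal approximation to $\chi^2(\tilde m)$ with standard deviation $\sqrt{2\tilde m}$:

```python
# Confidence interval (4) for J_D under the chi^2(m + p - n) result,
# using the normal approximation with standard deviation sqrt(2 m_tilde).
import numpy as np
from scipy.stats import norm

def chi2_interval(m, n, p, alpha=0.05):
    m_tilde = m + p - n                   # degrees of freedom
    z = norm.ppf(1.0 - alpha / 2.0)       # z_{alpha/2}
    half = np.sqrt(2.0 * m_tilde) * z
    return m_tilde - half, m_tilde + half
```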

Key Aspects of the Proof I: The Functional $J$. Algebraic simplifications rewrite the functional as a quadratic form. The regularized solution is given in terms of the resolution matrix $R(W_D)$: $\hat x = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b r$ (5) $= x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D)$, where $r = b - Ax_0$ (6) and $R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}$. (7) The functional is given in terms of the influence matrix $A(W_D) = W_b^{1/2} A R(W_D)$: (8) with $\tilde r = W_b^{1/2} r$, $J_D(\hat x) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r$ (9) $= \tilde r^T (I_m - A(W_D)) \tilde r$, a quadratic form. (10)

Key Aspects of the Proof II: Properties of a Quadratic Form. $\chi^2$ distribution of quadratic forms $x^T P x$ for normal variables (Fisher-Cochran theorem): suppose the components $x_i$ are independent normal variables, $x_i \sim N(0, 1)$, $i = 1:n$. A necessary and sufficient condition that $x^T P x$ has a central $\chi^2$ distribution is that $P$ is idempotent, $P^2 = P$, in which case the number of degrees of freedom of the $\chi^2$ is $\mathrm{rank}(P) = \mathrm{trace}(P)$. When the means of the $x_i$ are $\mu_i \neq 0$, $x^T P x$ has a non-central $\chi^2$ distribution with non-centrality parameter $c = \mu^T P \mu$. A $\chi^2$ random variable with $n$ degrees of freedom and centrality parameter $c$ has mean $n + c$ and variance $2(n + 2c)$.

Key Aspects of the Proof III: Requires the GSVD. Lemma: assume invertibility and $m \geq n \geq p$. There exist unitary matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{p \times p}$, and a nonsingular matrix $X \in \mathbb{R}^{n \times n}$ such that $A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T$, $D = V [M,\; 0_{p \times (n-p)}] X^T$, (11) where $\Upsilon = \mathrm{diag}(\upsilon_1, \ldots, \upsilon_p, 1, \ldots, 1) \in \mathbb{R}^{n \times n}$, $M = \mathrm{diag}(\mu_1, \ldots, \mu_p) \in \mathbb{R}^{p \times p}$, $0 \leq \upsilon_1 \leq \cdots \leq \upsilon_p \leq 1$, $1 \geq \mu_1 \geq \cdots \geq \mu_p > 0$, and $\upsilon_i^2 + \mu_i^2 = 1$, $i = 1, \ldots, p$. (12) The functional with the GSVD: let $Q = \mathrm{diag}(\mu_1, \ldots, \mu_p, 0_{n-p}, I_{m-n})$; then $J = \tilde r^T (I_m - A(W_D)) \tilde r = \|Q U^T \tilde r\|_2^2 = \|k\|_2^2$.
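
There is no GSVD routine in numpy/scipy, but for the common case of a square invertible $D$ the generalized singular values $\gamma_i = \upsilon_i/\mu_i$ of the pair $(A, D)$ coincide with the singular values of $A D^{-1}$, which is often enough for checking filter factors and the root-finding function sketched later. A small numerical aid with an assumed bidiagonal $D$:

```python
# Not the full GSVD: for square invertible D, the generalized singular values
# gamma_i = upsilon_i / mu_i of the pair (A, D) are the singular values of A @ inv(D).
import numpy as np

rng = np.random.default_rng(2)
m, n = 12, 8
A = rng.standard_normal((m, n))
D = np.eye(n) - np.diag(np.ones(n - 1), 1)      # invertible bidiagonal "derivative-like" operator

gammas = np.linalg.svd(A @ np.linalg.inv(D), compute_uv=False)
lam = 0.1
print(np.round(gammas**2 / (gammas**2 + lam**2), 3))   # Tikhonov filter factors
```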

Proof IV: Statistical Distribution of the Weighted Residual. Covariance structure: the errors in $b$ are $e \sim N(0, C_b)$. Now $b$ depends on $x$ through $b = Ax + e$, hence we can show $b \sim N(Ax_0, C_b + A C_D A^T)$ ($x_0$ is the mean of $x$). The residual $r = b - Ax_0 \sim N(0, C_b + A C_D A^T)$, so $\tilde r = W_b^{1/2} r \sim N(0, I + \tilde A C_D \tilde A^T)$ with $\tilde A = W_b^{1/2} A$. Using the GSVD, $I + \tilde A C_D \tilde A^T = U Q^{-2} U^T$, where here $Q = \mathrm{diag}(\mu_1, \ldots, \mu_p, I_{n-p}, I_{m-n})$. Now $k = Q U^T \tilde r$, so $k \sim N(0, Q U^T (U Q^{-2} U^T) U Q) = N(0, I_m)$. But $J = \|\tilde k\|^2$, where $\tilde k$ is the vector $k$ excluding components $p+1 : n$. Thus $J_D \sim \chi^2(m + p - n)$.

When the mean of the parameters is not known, or $x_0 = 0$ is not the mean. Corollary (non-central $\chi^2$ distribution of the regularized functional): recall $\hat x = \arg\min J_D(x) = \arg\min\{ \|Ax - b\|_{W_b}^2 + \|x - x_0\|_{W_D}^2 \}$, $W_D = D^T W_x D$. Make all the assumptions as before, except that $x_0$ need not equal the mean vector $\bar x$ of the model parameters. Let $c = \|\mathbf{c}\|_2^2 = \|Q U^T W_b^{1/2} A(\bar x - x_0)\|_2^2$. Then $J_D \sim \chi^2(m + p - n, c)$: the functional at the optimum follows a non-central $\chi^2$ distribution.

A further result when $A$ is not of full column rank: the rank-deficient solution. Suppose $A$ does not have full column rank, with rank $r$. Then the filtered solution can be written in terms of the GSVD as $x_{FILT}(\lambda) = \sum_{i=p-r+1}^{p} \frac{\gamma_i^2}{\upsilon_i(\gamma_i^2 + \lambda^2)}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i = \sum_{i=1}^{p} \frac{f_i}{\upsilon_i}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i$, where $f_i = 0$ for $i = 1 : p-r$ and $f_i = \gamma_i^2/(\gamma_i^2 + \lambda^2)$ for $i = p-r+1 : p$. This yields $J(x_{FILT}(\lambda)) \sim \chi^2(m - n + r, c)$; notice that the degrees of freedom are reduced.

General Result: the cost functional follows a $\chi^2$ statistical distribution. Suppose the degrees of freedom are $\tilde m$ and the centrality parameter is $c$; then $E(J_D) = \tilde m + c$ and $\mathrm{Var}(J_D) = 2\tilde m + 4c$. This suggests trying to find $W_D$ so that $E(J_D) = \tilde m + c$. Mead proposed an expensive nonlinear algorithm for general $W_D$ with $c = 0$. Here we first find $\lambda$ only, i.e. find $W_x = \lambda^2 I$.

What do we need to apply the theory? Requirements: the covariance $C_b$ of the data $b$ (or of the model parameters $x$!), and a priori information $x_0$, the mean $\bar x$. But $\bar x$ (and hence $x_0$) is not known. If it is not known, use repeated data measurements to calculate $C_b$ and the mean $\bar b$, and hence estimate the centrality parameter: $E(b) = A\,E(x)$ implies $\bar b = A \bar x$, hence $c = \|\mathbf{c}\|_2^2 = \|Q U^T W_b^{1/2} (\bar b - Ax_0)\|_2^2$ and $E(J_D) = E(\|Q U^T W_b^{1/2} (b - Ax_0)\|_2^2) = m + p - n + \|\mathbf{c}\|_2^2$. Given the GSVD, estimate the degrees of freedom $\tilde m$; then we can use $E(J_D)$ to find $\lambda$.
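
The repeated-measurement step is just sample statistics; a trivial sketch (function name is mine; B holds one measurement of b per column, and the diagonal restriction corresponds to the uncorrelated-noise assumption):

```python
# Estimate the data mean and covariance C_b from repeated measurements.
# B has shape (m, n_meas): one measurement of b per column.
import numpy as np

def data_statistics(B, uncorrelated=True):
    b_bar = B.mean(axis=1)                  # estimate of E(b) = A E(x)
    C_b = np.cov(B)                         # m x m sample covariance (rows = variables)
    if uncorrelated:
        C_b = np.diag(np.diag(C_b))         # keep only the variances
    return b_bar, C_b
```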

Assume $x_0$ is the mean (experimentalists know something about the model parameters). Designing the algorithm, I. Recall: if $C_b$ and $C_x$ are good estimates of the covariances, then $J_D(\hat x) - (m + p - n)$ should be small. Thus, letting $\tilde m = m + p - n$, we want $\tilde m - \sqrt{2\tilde m}\, z_{\alpha/2} < J_D(\hat x(W_D)) < \tilde m + \sqrt{2\tilde m}\, z_{\alpha/2}$, where $z_{\alpha/2}$ is the relevant $z$-value for a $\chi^2$ distribution with $\tilde m$ degrees of freedom. Goal: find $W_x$ to make this interval tight. In the single-variable case, find $\lambda$ such that $J_D(\hat x(\lambda)) \approx \tilde m$.

A Newton line-search algorithm to find $\lambda = 1/\sigma$ (basic algebra). Use Newton's method to solve $F(\sigma) = J_D(\sigma) - \tilde m = 0$, where $\sigma = 1/\lambda$ and $y(\sigma^{(k)})$ is the current solution, with $x(\sigma^{(k)}) = y(\sigma^{(k)}) + x_0$. Then $\frac{\partial}{\partial\sigma} J_D(\sigma) = -\frac{2}{\sigma^3}\|Dy(\sigma)\|^2 < 0$, and hence we have the basic Newton iteration $\sigma^{(k+1)} = \sigma^{(k)} \left(1 + \frac{1}{2}\left(\frac{\sigma^{(k)}}{\|Dy\|}\right)^2 (J_D(\sigma^{(k)}) - \tilde m)\right)$.
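
A hedged sketch of this iteration for the unweighted case $W_b = I$, reusing the tikhonov_solve helper sketched earlier (both function names are mine). The bracketing and line-search safeguards described on the next slide are omitted here for brevity.

```python
# Basic Newton iteration on F(sigma) = J_D(sigma) - m_tilde for sigma = 1/lambda,
# with W_b = I.  Each step solves a Tikhonov problem at the current lambda.
import numpy as np

def chi2_newton(A, b, D, x0, m_tilde, sigma0=1.0, tol=1e-3, maxit=50):
    sigma = sigma0
    for _ in range(maxit):
        lam = 1.0 / sigma
        x = tikhonov_solve(A, b, D, lam, x0=x0)       # sketched after equation (1)
        y = x - x0
        Dy = np.linalg.norm(D @ y)
        J = np.sum((A @ x - b)**2) + (Dy / sigma)**2  # J_D(sigma) with W_b = I
        F = J - m_tilde
        if abs(F) < tol * m_tilde or Dy == 0.0:
            break
        sigma *= 1.0 + 0.5 * (sigma / Dy)**2 * F      # the Newton update above
    return 1.0 / sigma, x
```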

Algorithm Using the GSVD. Use the GSVD of the pair $[W_b^{1/2}A, D]$. With $\gamma_i$ the generalized singular values and $s = U^T W_b^{1/2} r$, set $\tilde m = m - n + p$, $\tilde s_i = s_i/(\gamma_i^2 \sigma_x^2 + 1)$ for $i = 1, \ldots, p$, and $t_i = \tilde s_i \gamma_i$. Find the root of $F(\sigma_x) = \sum_{i=1}^{p} \frac{s_i^2}{\gamma_i^2 \sigma_x^2 + 1} + \sum_{i=n+1}^{m} s_i^2 - \tilde m = 0$; equivalently, solve $F = 0$ using $F(\sigma_x) = \tilde s^{\,T} s - \tilde m$ and $F'(\sigma_x) = -2\sigma_x \|t\|_2^2$.
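
Once the $\gamma_i$ and the transformed residual $s = U^T W_b^{1/2}(b - Ax_0)$ are available, each evaluation of $F$ and $F'$ costs only $O(p)$ operations, so the root find is essentially free. A sketch; the use of brentq is my stand-in for the talk's safeguarded Newton, and the bracket is an assumption.

```python
# Per-iteration work of the chi^2 root find, given the generalized singular
# values gammas (length p) and s = U.T @ Wb_half @ (b - A @ x0) (length m >= n >= p).
import numpy as np
from scipy.optimize import brentq

def F_and_deriv(sigma_x, gammas, s, m, n, p):
    m_tilde = m - n + p
    s_filt = s[:p] / (gammas**2 * sigma_x**2 + 1.0)        # s~_i
    F = np.sum(s_filt * s[:p]) + np.sum(s[n:]**2) - m_tilde
    dF = -2.0 * sigma_x * np.sum((gammas * s_filt)**2)     # F'(sigma_x) = -2 sigma_x ||t||^2
    return F, dF

def chi2_root(gammas, s, m, n, p, lo=1e-8, hi=1e8):
    # A bracketing solver stands in for the safeguarded Newton iteration.
    return brentq(lambda sig: F_and_deriv(sig, gammas, s, m, n, p)[0], lo, hi)
```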

Discussion of Convergence. $F$ is monotonically decreasing ($F'(\sigma_x) = -2\sigma_x \|Dy\|_2^2$), so either a solution exists and is unique for positive $\sigma$, or no solution exists: $F(0) < 0$ implies incorrect statistics in the model. Theoretically, $\lim_{\sigma\to\infty} F > 0$ is also possible; this is equivalent to $\lambda = 0$, i.e. no regularization is needed.

Practical Details of the Algorithm: finding the parameter. Step 1: bracket the root by a logarithmic search on $\sigma$ to handle the asymptotes; this yields sigmamax and sigmamin. Step 2: calculate the step, with steepness controlled by a tolerance told. Let $t = \|Dy\|/\sigma^{(k)}$, where $y$ is the current update; then $\mathrm{step} = \frac{1}{2}\left(\frac{1}{\max\{t, \mathrm{told}\}}\right)^2 (J_D(\sigma^{(k)}) - \tilde m)$. Step 3: introduce a line search $\alpha^{(k)}$ in the Newton update, $\mathrm{sigmanew} = \sigma^{(k)}(1 + \alpha^{(k)}\,\mathrm{step})$, with $\alpha^{(k)}$ chosen such that sigmanew lies within the bracket.
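
Step 1 is a simple sign-change search on a logarithmic grid; a sketch (grid limits and density are assumptions of mine):

```python
# Bracket the root of F(sigma) by a logarithmic sweep, returning the first
# adjacent pair of grid points where F changes sign (sigmamin, sigmamax).
import numpy as np

def bracket_root(F, sigma_min=1e-10, sigma_max=1e10, points=80):
    grid = np.logspace(np.log10(sigma_min), np.log10(sigma_max), points)
    vals = np.array([F(s) for s in grid])
    idx = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
    if idx.size == 0:
        return None          # no sign change: the statistics of the model are suspect
    i = idx[0]
    return grid[i], grid[i + 1]
```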

Practical Details of the Algorithm: Large Scale Problems. Initialization: convert the generalized Tikhonov problem to standard form (if the operator $L$ is not invertible, you only need to know how to form $Ax$ and $A^T x$, and the null space of $L$); use the LSQR algorithm to find the bidiagonal matrix for the projected problem; obtain a solution of the bidiagonal problem for a given initial $\sigma$. Subsequent steps: increase the dimension of the projection space if needed, reusing the existing bidiagonalization (a smaller system may also be used if appropriate); each $\sigma$ calculation of the algorithm reuses saved information from the Lanczos bidiagonalization.
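
For reference, here is a generic Golub-Kahan (Lanczos) bidiagonalization sketch for the standard-form problem (no reorthogonalization, breakdown not handled); after $k$ steps $A V_k = U_{k+1} B_k$, and the $\chi^2$ parameter search can be run on the small projected problem $\min \|B_k z - \beta_1 e_1\|^2 + \lambda^2\|z\|^2$ with $x_k = V_k z$. This is the standard building block, not the authors' exact implementation.

```python
# Minimal Golub-Kahan bidiagonalization: after k steps, A @ V == U @ B with
# B lower bidiagonal of shape (k+1, k).  The chi^2 Newton iteration is then
# applied to the small projected Tikhonov problem instead of the full one.
import numpy as np

def golub_kahan(A, b, k):
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    B = np.zeros((k + 1, k))

    beta1 = np.linalg.norm(b)
    U[:, 0] = b / beta1
    for j in range(k):
        w = A.T @ U[:, j]
        if j > 0:
            w -= B[j, j - 1] * V[:, j - 1]
        alpha = np.linalg.norm(w)
        V[:, j] = w / alpha
        B[j, j] = alpha

        w = A @ V[:, j] - alpha * U[:, j]
        beta = np.linalg.norm(w)
        U[:, j + 1] = w / beta
        B[j + 1, j] = beta
    return U, B, V, beta1
```

Saving U, V and B lets later sigma evaluations, and any later increase of the subspace dimension, reuse the existing factorization, which is the reuse referred to above.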

Comparison with the Standard Hybrid LSQR Algorithm. Our algorithm can concurrently regularize and find $\lambda$; standard hybrid LSQR solves the projected system and then adds regularization. The latter is also possible here, and results for both approaches are shown. Advantages and costs: the method needs only the cost of the standard LSQR algorithm, with some additional solution updates for the iterated $\sigma$; the regularization introduced by the LSQR projection may help prevent problems with the GSVD expansion; and this makes the algorithm viable for large scale problems.

Illustrating the Results for Problem Size 512: Two Standard Test Problems. Comparison for noise level 10%; on the left $D = I$ and on the right $D$ is a first-derivative operator. Notice that the L-curve and $\chi^2$-LSQR perform well; UPRE does not perform well.

Real Data: Seismic Signal Restoration — the data set and goal. A real data set of 48 signals of length 3000; the point spread function is derived from the signals, and the signal variance is calculated pointwise over all 48 signals. Goal: restore the signal $x$ from $Ax = b$, where $A$ is the PSF matrix and $b$ is the given blurred signal. Method of comparison (no exact solution is known): use convergence with respect to downsampling.

Comparison at High Resolution, White Noise. There is greater contrast with the $\chi^2$ solution; UPRE is insufficiently regularized, and the L-curve severely undersmooths (not shown). The parameters are not consistent across resolutions.

The UPRE Solution, $x_0 = 0$, White Noise. The regularization parameters are consistent: $\sigma = 0.01005$ at all resolutions.

The LSQR Hybrid Solution, White Noise. The regularization is quite consistent from resolution 2 to 100: $\sigma = 0.0000029, 0.0000029, 0.0000029, 0.0000057, 0.0000057$.

Illustrating the Deblurring Result: Problem Size 65536. The example is taken from RestoreTools (Nagy et al., 2007-8), with 15% noise. The computational cost is minimal: the projected problem size is 15, and $\lambda = 0.58$.

Problem: Grain, with 15% noise added, for increasing subproblem size. [Figure: signal-to-noise ratio $10\log_{10}(1/e)$ for relative error $e$, and the regularization parameter $\sigma$ in each case.]

Illustrating the progress of the Newton algorithm post LSQR.

Illustrating the progress of the Newton algorithm with LSQR.

Problem: Grain, with 15% noise added, for increasing subproblem size. [Figure: signal-to-noise ratio $10\log_{10}(1/e)$ for relative error $e$.]

Conclusions and Observations. A new statistical method for estimating the regularization parameter: it compares favorably with UPRE and with the L-curve in performance (GCV is not competitive). The method can be used for large scale problems; the problem size of the hybrid LSQR subproblem is very small. The method is very efficient, and the Newton method is robust and fast. A priori information is needed, but it can be obtained directly from the data, e.g. using local statistical information of the image.

Future Work: other results and future work. A software package; diagonal weighting schemes; edge-preserving regularization (total variation); better handling of colored noise; bound constraints (with Mead, accepted).

THANK YOU!