The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation

Rosemary Renaut. Collaborators: Jodi Mead and Iveta Hnetynkova.
Department of Mathematics and Statistics. Denmark, 2008.

Outline
1. Introduction and Motivation: Some Standard (or Not) Statistical Methods for Regularization Parameter Estimation
2. Statistical Results for Least Squares
3. Implications of Statistical Results for Regularized Least Squares
4. Newton Algorithm
5. Algorithm with LSQR
6. Results
7. Conclusions and Future Work

Least Squares for Ax = b: A Quick Review
Consider discrete systems with A ∈ R^(m×n), b ∈ R^m, x ∈ R^n, and Ax = b + e.
Classical approach, linear least squares:
x_LS = argmin_x ||Ax − b||²₂.
Difficulty: x_LS is sensitive to changes in the right-hand side b when A is ill-conditioned; the system is numerically ill-posed.
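To make the sensitivity concrete, here is a minimal sketch (my own illustration, not from the slides) in which a tiny perturbation of b produces a large change in the least squares solution; the Hilbert matrix simply stands in for an ill-conditioned operator.

```python
# Sensitivity of least squares for an ill-conditioned A (illustrative only).
import numpy as np
from scipy.linalg import hilbert, lstsq

rng = np.random.default_rng(0)
n = 12
A = hilbert(n)                        # condition number ~ 1e16
x_true = np.ones(n)
b = A @ x_true

x_ls, *_ = lstsq(A, b)                # essentially exact data
x_pert, *_ = lstsq(A, b + 1e-8 * rng.standard_normal(n))  # tiny data change

print("cond(A):", np.linalg.cond(A))
print("relative change in x_LS:",
      np.linalg.norm(x_pert - x_ls) / np.linalg.norm(x_ls))
```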

Example, 1-D Signal Restoration
Figure: the original and the blurred, noisy 1-D signal.

Example, 1-D Signal Restoration
Figure: the unregularized solution, illustrating the sensitivity.

Alternative: Introduce a Mechanism for Regularization
Weighted fidelity with regularization:
x_LS(λ) = argmin_x { ||b − Ax||²_{W_b} + λ² R(x) },
where the weighting matrix W_b is the inverse covariance matrix for the data b, R(x) is a regularization term, and λ is a regularization parameter, which is unknown.
Notice that the solution x_LS(λ) depends on λ; it also depends on the choice of R.
Requirement: the result depends on R. What should we choose?

A Specific Choice, R(x) = ||D(x − x_0)||²: Generalized Tikhonov Regularization
Given a suitable matrix D, the regularized solution is
x̂(λ) = argmin_x J(x) = argmin_x { ||Ax − b||²_{W_b} + λ² ||D(x − x_0)||² }.   (1)
Assume N(A) ∩ N(D) = {0}; x_0 is a reference solution, often x_0 = 0.
Question: given D, how do we find λ? The choice of λ impacts the solution.
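As a concrete reference for what x̂(λ) in (1) means computationally, here is a minimal sketch (my own, under the assumption that forming the normal equations is acceptable for a small dense problem) solving (A^T W_b A + λ² D^T D) x = A^T W_b b + λ² D^T D x_0.

```python
# Generalized Tikhonov solution via the regularized normal equations
# (small, dense problems only; illustrative sketch).
import numpy as np

def tikhonov_solution(A, b, D, lam, W_b=None, x0=None):
    m, n = A.shape
    W_b = np.eye(m) if W_b is None else W_b     # default: identity weighting
    x0 = np.zeros(n) if x0 is None else x0      # default reference solution
    lhs = A.T @ W_b @ A + lam**2 * (D.T @ D)
    rhs = A.T @ W_b @ b + lam**2 * (D.T @ D @ x0)
    return np.linalg.solve(lhs, rhs)
```

The helper name tikhonov_solution and its arguments are hypothetical and are reused in the sketches below.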

Solution for Increasing λ, D = I
Figures: a sequence of reconstructions showing how the solution changes as λ increases.

Choice of λ is Crucial
Different algorithms yield different solutions. What is the correct choice? Use some prior information. But there is no single correct choice.

The Discrepancy Principle
Suppose the noise is white: C_b = σ_b² I. Find λ such that the regularized residual satisfies
σ_b² = (1/m) ||b − Ax(λ)||²₂.   (3)
This can be implemented by a Newton root-finding algorithm, but the discrepancy principle typically oversmooths.
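A hedged sketch of how (3) can be solved for λ: the residual gap is a monotone function of λ, so any scalar root finder works; Brent's method is used here for brevity instead of the Newton iteration mentioned on the slide, and the hypothetical tikhonov_solution helper from above is reused.

```python
# Discrepancy principle: find lam with (1/m)||b - A x(lam)||^2 = sigma_b^2.
import numpy as np
from scipy.optimize import brentq

def discrepancy_lambda(A, b, D, sigma_b, lam_min=1e-8, lam_max=1e4):
    m = A.shape[0]
    def gap(lam):
        x = tikhonov_solution(A, b, D, lam)
        return np.linalg.norm(b - A @ x) ** 2 / m - sigma_b ** 2
    # assumes the root is bracketed by [lam_min, lam_max]
    return brentq(gap, lam_min, lam_max)
```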

Generalized Cross-Validation (GCV)
Let A(λ) = A (A^T W_b A + λ² D^T D)^(−1) A^T; one can pick W_b = I. Minimize the GCV function
GCV(λ) = ||b − Ax(λ)||²_{W_b} / [trace(I_m − A(λ))]²,
which estimates the predictive risk.
Expensive: requires a range of λ; the GSVD makes the calculations efficient.
Requires a minimum; there may be multiple minima, and the function is sometimes flat.
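For small dense problems the GCV function can be evaluated literally from the definition above; the following sketch does exactly that (my own illustration; the influence matrix is taken as the map from b to A x(λ), a judgment call about the intended definition, and the GSVD is what makes this efficient in practice).

```python
# Literal GCV evaluation for small problems (illustrative sketch).
import numpy as np

def hat_matrix(A, D, lam, W_b):
    # maps b to A x(lam) for the generalized Tikhonov solution
    return A @ np.linalg.solve(A.T @ W_b @ A + lam**2 * (D.T @ D), A.T @ W_b)

def gcv(A, b, D, lam, W_b):
    m = A.shape[0]
    x = tikhonov_solution(A, b, D, lam, W_b=W_b)
    r = b - A @ x
    return (r @ W_b @ r) / np.trace(np.eye(m) - hat_matrix(A, D, lam, W_b)) ** 2

# lam_gcv = min(lams, key=lambda lam: gcv(A, b, D, lam, W_b))  # scan a lam range
```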

Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk, i.e. the UPRE function
UPRE(λ) = ||b − Ax(λ)||²_{W_b} + 2 trace(A(λ)) − m.
Expensive: requires a range of λ; the GSVD makes the calculations efficient.
Needs an estimate of the trace; a minimum is needed.
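The UPRE function can be sketched the same way, reusing the hypothetical hat_matrix helper from the GCV sketch above; again this is only a small-problem illustration of the formula on the slide.

```python
# Literal UPRE evaluation for small problems (illustrative sketch).
import numpy as np

def upre(A, b, D, lam, W_b):
    m = A.shape[0]
    x = tikhonov_solution(A, b, D, lam, W_b=W_b)
    r = b - A @ x
    return r @ W_b @ r + 2.0 * np.trace(hat_matrix(A, D, lam, W_b)) - m
```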

Iterative Methods with Stopping Criteria
Iterate to find an approximate solution of Ax ≈ b, with a stopping criterion based on detecting the noise in the solution, e.g. the residual periodogram (O'Leary and Rust).
Hybrid LSQR: iterate to reduce the problem size and stop when noise dominates (Hnetynkova et al.). The hybrid method solves the reduced system with additional regularization; the cost of regularizing the reduced system is minimal, and any regularization may be used for the subproblem (Nagy et al.).
Open questions: How do we find the appropriate regularization approach? How can we be sure when to stop the LSQR iteration? How are statistics included in the subproblem? Our approach: include the statistics directly.

Background: Statistics of the Least Squares Problem
Theorem (Rao, 1973; First Fundamental Theorem). Let r be the rank of A and let b ~ N(Ax, σ_b² I), i.e. the errors in the measurements are normally distributed with mean 0 and covariance σ_b² I. Then
J = min_x ||Ax − b||² / σ_b² ~ χ²(m − r),
that is, J follows a χ² distribution with m − r degrees of freedom. This is basically the discrepancy principle.
Corollary (Weighted Least Squares). For b ~ N(Ax, C_b) and W_b = C_b^(−1),
J = min_x ||Ax − b||²_{W_b} ~ χ²(m − r).
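The theorem is easy to check numerically; the following Monte Carlo sketch (my own, with an arbitrary well-posed test matrix) verifies that the minimal weighted residual has the mean m − r and variance 2(m − r) of a χ²(m − r) variable.

```python
# Monte Carlo check of J = min_x ||Ax - b||^2 / sigma_b^2 ~ chi^2(m - r).
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma_b = 50, 10, 0.1
A = rng.standard_normal((m, n))          # full column rank, so r = n
x_true = rng.standard_normal(n)
r = np.linalg.matrix_rank(A)

samples = []
for _ in range(5000):
    b = A @ x_true + sigma_b * rng.standard_normal(m)
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    samples.append(np.linalg.norm(A @ x_ls - b) ** 2 / sigma_b ** 2)

print("sample mean:", np.mean(samples), " expected:", m - r)
print("sample var :", np.var(samples),  " expected:", 2 * (m - r))
```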

Extension: Statistics of the Regularized Least Squares Problem
Two new results help find the regularization parameter.
Theorem (χ² distribution of the regularized functional). Let
x̂ = argmin_x J_D(x) = argmin_x { ||Ax − b||²_{W_b} + ||x − x_0||²_{W_D} },  W_D = D^T W_x D.   (4)
Assume:
W_b and W_x are symmetric positive definite;
the problem is uniquely solvable, N(A) ∩ N(D) = {0};
the Moore-Penrose generalized inverse of W_D is C_D;
statistics: b − Ax = e ~ N(0, C_b) and x − x_0 = f ~ N(0, C_D), i.e. x_0 is the mean vector of the model parameters.
Then J_D ~ χ²(m + p − n).

Corollary: a priori information is not the mean value, e.g. x_0 = 0
Corollary (non-central χ² distribution of the regularized functional). Let
x̂ = argmin_x J_D(x) = argmin_x { ||Ax − b||²_{W_b} + ||x − x_0||²_{W_D} },  W_D = D^T W_x D.   (5)
Assume all assumptions as before, except that x_0 is not the mean vector x̄ of the model parameters. Let the centrality parameter be
c = ||c̃||²₂ = ||Q U^T W_b^(1/2) A (x̄ − x_0)||²₂.
Then J_D ~ χ²(m + p − n, c).

Implications of the Result
Statistical distribution of the functional: the mean and variance are prescribed,
E(J_D) = m + p − n + c,  Var(J_D) = 2(m + p − n) + 4c.
Can we use this? Yes: try to find W_D so that E(J_D) = m − n + p + c.
Mead presented an expensive nonlinear algorithm for the case c = 0, but it does find W_D. Our proposal: find λ only.

What Do We Need to Apply the Theory?
Requirements: covariance information C_b on the data b (or on the model parameters x!), and a priori information, either x_0 as the mean or the mean value x̄.
But x̄ and x_0 are not known. For repeated data measurements C_b can be calculated, and the mean b̄ of b can be found. But E(b) = A E(x) implies b̄ = A x̄. Hence
c = ||c̃||²₂ = ||Q U^T W_b^(1/2) (b̄ − A x_0)||²₂,
E(J_D) = m + p − n + ||Q U^T W_b^(1/2) (b̄ − A x_0)||²₂.
Then we can use E(J_D) to find λ.

Designing the Algorithm, I: Assume x_0 Is the Mean (experimentalists do know something about the model parameters)
Recall: if C_b and C_x are good estimates of the covariances, then J_D(x̂) − (m + p − n) should be small. Thus, letting m̃ = m + p − n, we want
m̃ − √(2m̃) z_(α/2) < J_D(x̂(W_D)) < m̃ + √(2m̃) z_(α/2),   (6)
where z_(α/2) is the relevant z-value for a χ² distribution with m̃ degrees of freedom.
Goal: find W_D to make (6) tight. In the single-variable case, find λ such that J_D(x̂(λ)) ≈ m̃.
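A small sketch of the interval test (6), under the assumption that z_(α/2) is taken from the standard normal approximation to the χ² distribution; the function names are hypothetical.

```python
# Acceptance interval (6) for J_D, with m_tilde = m + p - n.
import numpy as np
from scipy.stats import norm

def chi2_interval(m, n, p, alpha=0.05):
    m_tilde = m + p - n
    half_width = np.sqrt(2.0 * m_tilde) * norm.ppf(1.0 - alpha / 2.0)
    return m_tilde - half_width, m_tilde + half_width

def jd_in_interval(J_D, m, n, p, alpha=0.05):
    lo, hi = chi2_interval(m, n, p, alpha)
    return lo < J_D < hi
```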

A Newton Line-Search Algorithm to Find λ (basic algebra)
Use Newton's method to solve F(σ) = J_D(σ) − m̃ = 0, where σ = 1/λ and y(σ^(k)) is the current solution, for which x(σ^(k)) = y(σ^(k)) + x_0. Then
dJ_D/dσ = −(2/σ³) ||D y(σ)||² < 0.
Hence we have the basic Newton iteration
σ^(k+1) = σ^(k) (1 + (1/2) (σ^(k)/||Dy||)² (J_D(σ^(k)) − m̃)),
and, adding a line search,
σ^(k+1) = σ^(k) (1 + (α^(k)/2) (σ^(k)/||Dy||)² (J_D(σ^(k)) − m̃)).

Algorithm Using the GSVD
Use the GSVD of [W_b^(1/2) A; D]. For γ_i the generalized singular values, s = U^T W_b^(1/2) r, and m̃ = m − n + p, set
s̃_i = s_i / (γ_i² σ_x² + 1), i = 1, ..., p,  t_i = s̃_i γ_i.
Find the root of
F(σ_x) = Σ_{i=1..p} s_i² / (γ_i² σ_x² + 1) + Σ_{i=n+1..m} s_i² − m̃ = 0.
Equivalently, solve F = 0, where F(σ_x) = s̃^T s − m̃ and F'(σ_x) = −2 σ_x ||t||²₂.
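A minimal sketch of the resulting root finder, assuming gamma holds the p generalized singular values and s = U^T W_b^(1/2) r has length m (0-based indexing, so the second sum runs over s[n:]); this is plain Newton without the bracketing and line search described on the next slides.

```python
# Newton iteration for F(sigma_x) = s_tilde^T s - m_tilde = 0 using GSVD data.
import numpy as np

def newton_sigma(gamma, s, m, n, p, sigma0=1.0, tol=1e-8, maxit=50):
    m_tilde = m - n + p
    sigma = sigma0
    for _ in range(maxit):
        filt = gamma**2 * sigma**2 + 1.0          # length-p filter factors
        s_tilde = s[:p] / filt
        t = s_tilde * gamma
        F = s_tilde @ s[:p] + np.sum(s[n:] ** 2) - m_tilde
        dF = -2.0 * sigma * np.sum(t ** 2)        # F'(sigma) < 0
        if abs(F) < tol or dF == 0.0:
            break
        sigma -= F / dF
    return sigma
```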

Discussion on Convergence
F is monotone decreasing (F'(σ_x) = −2 σ_x ||t||²₂).
Either a solution exists and is unique for positive σ, or no solution exists:
F(0) < 0 implies incorrect statistics in the model;
theoretically, lim_{σ→∞} F > 0 is possible, which is equivalent to λ = 0, i.e. no regularization is needed.

Practical Details of the Algorithm: Finding the Parameter
Step 1: Bracket the root by a logarithmic search on σ to handle the asymptotes; this yields sigmamax and sigmamin.
Step 2: Calculate the step, with steepness controlled by told. Let t = ||Dy|| / σ^(k), where y is the current update obtained from the GSVD; then
step = (1/2) (1 / max{t, told})² (J_D(σ^(k)) − m̃).
Step 3: Introduce the line search α^(k) in the Newton update,
sigmanew = σ^(k) (1 + α^(k) step),
with α^(k) chosen such that sigmanew lies within the bracket.

Practical Details of the Algorithm: Large-Scale Problems
Algorithm initialization: convert the generalized Tikhonov problem to standard form; use the LSQR algorithm to find the bidiagonal matrix for the projected problem; obtain a solution of the bidiagonal problem for the given initial σ.
Subsequent steps: increase the dimension of the space if needed, reusing the existing bidiagonalization. Each σ calculation reuses saved information from the Lanczos bidiagonalization, and the system is augmented if needed. A smaller system may also be used if appropriate.
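As a sketch of the large-scale path (under the assumptions of standard form, D = I, x_0 = 0, and with my own helper names), a few steps of Golub-Kahan bidiagonalization give A V_k = U_(k+1) B_k, and the Tikhonov subproblem is then solved in the small space for each trial value of the parameter; reorthogonalization and the augmentation/reuse logic described above are omitted.

```python
# Projected (LSQR-style) Tikhonov sketch via Golub-Kahan bidiagonalization.
import numpy as np

def golub_kahan(A, b, k):
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b); U[:, 0] = b / beta1
    v = A.T @ U[:, 0]
    for i in range(k):
        alpha = np.linalg.norm(v); V[:, i] = v / alpha; B[i, i] = alpha
        u = A @ V[:, i] - alpha * U[:, i]
        beta = np.linalg.norm(u); U[:, i + 1] = u / beta; B[i + 1, i] = beta
        v = A.T @ U[:, i + 1] - beta * V[:, i]
    return U, B, V, beta1

def projected_tikhonov(A, b, k, lam):
    U, B, V, beta1 = golub_kahan(A, b, k)
    rhs = np.zeros(k + 1); rhs[0] = beta1          # beta1 * e1
    # small regularized problem: min ||B y - beta1 e1||^2 + lam^2 ||y||^2
    y = np.linalg.solve(B.T @ B + lam**2 * np.eye(k), B.T @ rhs)
    return V @ y
```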

Comparison with the Standard Hybrid LSQR Algorithm
This algorithm concurrently regularizes and solves the system; standard hybrid LSQR solves the projected system and then adds regularization.
Advantages and costs: only the cost of the standard LSQR algorithm is needed, plus some updates of the solution solves for the iterated σ. The regularization introduced by the LSQR projection may be useful for preventing problems with the GSVD expansion. This makes the algorithm viable for large-scale problems.

Recall: Implementation Assumptions
Covariance of the error (statistics of the measurement errors): information on the covariance structure of the errors in b is needed.
Use C_b = σ_b² I for common covariance (white noise).
Use C_b = diag(σ_1², σ_2², ..., σ_m²) for colored, uncorrelated noise.
With no noise information, use C_b = I.
Use b̄, the mean of the measured b, when implementing with the centrality parameter and x_0 = 0.

Illustrating the Results for Problem Size 512: Two Standard Test Problems
Figure: comparison for noise level 10%; on the left D = I, on the right D is the first derivative operator.
Notice that the L-curve and χ²-LSQR perform well; UPRE does not perform well.

Real Data: Seismic Signal Restoration
The data set and goal: a real data set of 48 signals of length 3000. The point spread function is derived from the signals, and the signal variance is calculated pointwise over all 48 signals.
Goal: restore the signal x from Ax = b, where A is the PSF matrix and b is a given blurred signal.
Method of comparison (no exact solution is known): use convergence with respect to downsampling.

Comparison at High Resolution: White Noise (left) and Colored Noise (right)
Greater contrast with χ². UPRE is insufficiently regularized. The L-curve severely undersmooths (not shown). The parameters are not consistent across resolutions.

The UPRE Solution: White Noise and Colored Noise, x_0 = 0
The regularization parameters are consistent: σ = 0.01005 for all resolutions.

The LSQR Hybrid Solution: White Noise (left) and Colored Noise (right), x_0 = 0
Regularization is quite consistent from resolution 2 to 100:
σ = 0.0000029, 0.0000029, 0.0000029, 0.0000057, 0.0000057 (left);
σ = 0.00007, 0.00007, 0.00007, 0.00007, 0.00012 (right).
Notice that colored noise eliminates the second arrival of the signal but gives excellent contrast for identifying the primary arrival.

Sensitivity of LSQR to Parameter Choices
Figure: the dependence of J(σ) on σ for a problem with noise level 10%, for increasing inner tolerance in the LSQR iteration.
The subproblem size increases with increasing σ; the subproblem size decreases with decreasing tolerance.

Some Comparisons of LSQR and GSVD Implementations (problem size 64)

Problem   ε       Average      Newton Steps      P value           Relative Error
                  Inner Steps  GSVD    LSQR      Iteration   σ     GSVD     LSQR
shaw      5e-02   7.4          26.7    26.5      0.9         1.0   0.189    0.189
shaw      1e-01   6.9          26.5    25.8      0.8         1.0   0.215    0.215
ilaplace  5e-02   4.7          25.6    30.8      0.9         1.0   0.161    0.162
ilaplace  1e-01   4.3          24.1    30.9      0.7         1.0   0.211    0.211
heat      5e-02   14.8         26.1    30.3      1.0         1.0   0.304    0.304
heat      1e-01   11.1         24.5    30.3      0.8         1.0   0.375    0.375
phillips  5e-02   8.0          24.6    30.2      0.9         1.0   0.079    0.079
phillips  1e-01   7.0          25.4    30.5      0.9         1.0   0.112    0.112

Some Comparisons with Other Methods (problem size 128)

                  Least Squares Error               Failure Count
Problem   ε       GSVD    LSQR    LC      UPRE      GSVD   LSQR   LC    UPRE
shaw      5e-02   0.229   0.206   0.200   0.239     1      1      0     63
shaw      1e-01   0.276   0.236   0.241   0.259     5      4      0     71
ilaplace  5e-02   0.200   0.204   0.166   0.195     22     18     0     71
ilaplace  1e-01   0.247   0.230   0.231   0.230     67     53     63    112
heat      5e-02   0.327   0.324   NaN     0.305     46     40     250   21
heat      1e-01   0.408   0.404   NaN     0.396     119    109    250   61
phillips  5e-02   0.100   0.099   0.080   0.109     0      0      0     34
phillips  1e-01   0.137   0.136   0.123   0.139     2      2      0     28

Main Observations
The GSVD and LSQR implementations are consistent.
The method requires few iterations of the root finding.
LSQR uses a small bidiagonal system and is relatively robust to the inner tolerance.
Increased σ implies reduced regularization by Tikhonov but increased regularization by LSQR.

Conclusions
A new statistical method for estimating the regularization parameter.
It compares favorably with UPRE and with the L-curve with respect to performance (GCV is not competitive).
The method can be used for large-scale problems.
The method is very efficient; the Newton method is robust and fast.
But a priori information is needed.

Difficulties When the Centrality Parameter Is Required
What are the issues? The function need not be monotonic, which is more problematic for the non-central version with x_0 not the mean (i.e. x_0 = 0). However, σ can be bounded by the result of the central case, and the range of σ is given by the range of the γ_i.

Future Work
Other results and future work:
Degrees of freedom are reduced when using the GSVD.
Image deblurring (implementation to use minimal storage).
Diagonal weighting schemes.
Edge-preserving regularization.
Constraint implementation (with Mead, submitted).