Statistically-Based Regularization Parameter Estimation for Large Scale Problems


Rosemary Renaut, joint work with Jodi Mead and Iveta Hnetynkova
March 1, 2010
Supported by the National Science Foundation, Division of Computational Mathematics

Signal/Image Restoration: Integral Model of Signal Degradation

$b(t) = \int K(t,s)\, x(s)\, ds$

$K(t,s)$ describes the blur of the signal. Convolutional model: the invariant kernel $K(t,s) = K(t-s)$ is the Point Spread Function (PSF). Sampling typically includes noise $e(t)$, so the model is

$b(t) = \int K(t-s)\, x(s)\, ds + e(t).$

Discrete model: given discrete samples $b$, find samples $x$ of $x(s)$. Let $A$ discretize $K$, assumed known; the model is

$b = Ax + e.$

Naïvely invert the system to find $x$!
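The following sketch builds a discretized version of this model: a Gaussian-PSF blur matrix $A$, a test signal, and a blurred noisy observation $b = Ax + e$. The grid size, PSF width, test signal, and 5% noise level are all invented for illustration, not taken from the talk.

```python
import numpy as np

def gaussian_blur_matrix(n, width=0.03):
    """Discretize a Gaussian PSF K(t - s) on a uniform grid of n points in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)
    A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * width**2))
    return A / A.sum(axis=1, keepdims=True)   # normalize rows so the blur preserves mass

n = 512
A = gaussian_blur_matrix(n)

# A piecewise-smooth test signal (purely illustrative).
t = np.linspace(0.0, 1.0, n)
x_true = np.where((t > 0.2) & (t < 0.4), 1.0, 0.0) + np.exp(-200 * (t - 0.7) ** 2)

rng = np.random.default_rng(0)
sigma_b = 0.05 * np.abs(A @ x_true).max()      # 5% white noise level (assumption)
b = A @ x_true + sigma_b * rng.standard_normal(n)
```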


Example: 1-D Original and Blurred Noisy Signal
(Figures: original signal x; blurred and noisy signal b, Gaussian PSF.)

The Solution: Regularization is Needed
(Figures: the naïve solution versus a regularized solution.)

Two-Dimensional Image Deblurring
How do we obtain the deblurred image?


Least Squares for Ax = b: A Quick Review

Consider discrete systems $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, with $Ax = b + e$.

Classical approach: linear least squares ($A$ full rank),

$x_{LS} = \arg\min_x \|Ax - b\|_2^2.$

Difficulty: $x_{LS}$ is sensitive to changes in the right-hand side $b$ when $A$ is ill-conditioned. For convolutional models the system is ill-posed.
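A minimal sketch (reusing the illustrative A, b, and x_true from the previous snippet) of why the naïve inversion fails here: the blur matrix is severely ill-conditioned, so the inverted noise swamps the solution.

```python
import numpy as np

# Condition number of the blur matrix: huge, so small data errors are amplified enormously.
print("cond(A) =", np.linalg.cond(A))

x_naive = np.linalg.solve(A, b)          # naive inversion of b = Ax + e
print("relative error of naive solution:",
      np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))
```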


Introduce Regularization to Pick a Solution

Weighted fidelity with regularization:

$x_{RLS}(\lambda) = \arg\min_x \{ \|b - Ax\|^2_{W_b} + \lambda^2 R(x) \}.$

$W_b$ is a weighting matrix, $R(x)$ is a regularization term, and $\lambda$ is a regularization parameter, which is unknown. The solution $x_{RLS}(\lambda)$ depends on $\lambda$, on the regularization operator $R$, and on the weighting matrix $W_b$.


The Weighting Matrix: Some Assumptions for Multiple Data Measurements

Given multiple measurements of the data $b$: usually the error $e$ in $b$ is an m-vector of random measurement errors with mean 0 and positive definite covariance matrix $C_b = E(ee^T)$.

For uncorrelated measurements $C_b$ is a diagonal matrix of the variances of the errors (colored noise). For white noise $C_b = \sigma^2 I$.

Weighting by $W_b = C_b^{-1}$ in the data-fit term means that, theoretically, $\tilde{e} = W_b^{1/2} e$ is uncorrelated.

Difficulty: $W_b$ may increase the ill-conditioning of $A$!


Formulation: Generalized Tikhonov Regularization With Weighting

Use $R(x) = \|D(x - x_0)\|^2$:

$\hat{x} = \arg\min J(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}.$   (1)

$D$ is a suitable operator, often a derivative approximation. Assume $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$. $x_0$ is a reference solution, often $x_0 = 0$.

Question: given $D$ and $W_b$, how do we find $\lambda$?
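A minimal sketch of solving (1) for a fixed λ by stacking the weighted fidelity and regularization terms into one ordinary least squares problem. The helper name, the choice of a first-derivative D, W_b = I, and λ = 0.1 are all illustrative assumptions, not the talk's settings.

```python
import numpy as np

def tikhonov_solve(A, b, lam, D=None, x0=None, Wb_sqrt=None):
    """Solve min ||Ax - b||^2_{W_b} + lam^2 ||D(x - x0)||^2 via a stacked least squares problem."""
    m, n = A.shape
    D = np.eye(n) if D is None else D
    x0 = np.zeros(n) if x0 is None else x0
    Wb_sqrt = np.eye(m) if Wb_sqrt is None else Wb_sqrt    # Wb_sqrt = C_b^{-1/2}
    # Augmented system: [Wb^{1/2} A; lam D] y = [Wb^{1/2}(b - A x0); 0], with y = x - x0.
    K = np.vstack([Wb_sqrt @ A, lam * D])
    rhs = np.concatenate([Wb_sqrt @ (b - A @ x0), np.zeros(D.shape[0])])
    y, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x0 + y

# First-derivative operator (forward differences), size (n-1) x n.
n = A.shape[1]
D1 = np.diff(np.eye(n), axis=0)
x_reg = tikhonov_solve(A, b, lam=0.1, D=D1)    # lambda = 0.1 is an arbitrary illustrative value
```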


Example: Solutions for Increasing λ, D = I
(Figure sequence: the regularized solution for a sequence of increasing values of λ.)


The Choice of λ is Crucial

Different algorithms yield different solutions:
Discrepancy Principle
L-Curve
Generalized Cross Validation (GCV)
Unbiased Predictive Risk (UPRE)


The Discrepancy Principle

Suppose the noise is white: $C_b = \sigma_b^2 I$. Find λ such that the regularized residual satisfies

$\sigma_b^2 = \frac{1}{m} \|b - Ax(\lambda)\|_2^2.$   (2)

This can be implemented by a Newton root-finding algorithm, but the discrepancy principle typically oversmooths.
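A sketch of (2) using a simple bracketing root finder on log λ rather than Newton's method; it reuses the hypothetical tikhonov_solve helper and the noise level sigma_b from the earlier snippets, and assumes the bracket contains a sign change.

```python
import numpy as np
from scipy.optimize import brentq

def discrepancy_lambda(A, b, sigma_b, lam_lo=1e-6, lam_hi=1e2):
    """Find lambda with ||b - A x(lambda)||^2 / m = sigma_b^2 by root finding on log10(lambda)."""
    m = len(b)

    def f(log_lam):
        x = tikhonov_solve(A, b, lam=10.0**log_lam)
        return np.linalg.norm(b - A @ x) ** 2 / m - sigma_b**2

    # brentq needs f to change sign between the bracket endpoints.
    return 10.0 ** brentq(f, np.log10(lam_lo), np.log10(lam_hi))

lam_dp = discrepancy_lambda(A, b, sigma_b)
```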


Some Standard Approaches I: L-Curve - Find the Corner

Let $r(\lambda) = b - Ax(\lambda)$ denote the regularized residual, with influence matrix $A(\lambda) = A(A^T W_b A + \lambda^2 D^T D)^{-1} A^T$.
Plot $\log \|Dx(\lambda)\|$ against $\log \|r(\lambda)\|$ and trade off the two contributions by locating the corner of the curve.
Expensive: requires a range of λ values (the GSVD makes the calculations efficient). Not statistically based.
(Figures: an L-curve with a corner to be found, and one with no corner.)
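A sketch that tabulates the L-curve points over a logarithmic range of λ, reusing the hypothetical tikhonov_solve and the first-derivative D1 from earlier; locating the corner itself (e.g. by maximum curvature) is not shown.

```python
import numpy as np

lams = np.logspace(-4, 1, 40)
residual_norms, seminorms = [], []
for lam in lams:
    x_lam = tikhonov_solve(A, b, lam=lam, D=D1)
    residual_norms.append(np.linalg.norm(b - A @ x_lam))
    seminorms.append(np.linalg.norm(D1 @ x_lam))

# The L-curve is the set of points (log ||r(lam)||, log ||D x(lam)||); its corner
# balances the data fit against the size of the regularized solution.
lcurve = np.column_stack([np.log(residual_norms), np.log(seminorms)])
```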

Generalized Cross-Validation (GCV)

Let $A(\lambda) = A(A^T W_b A + \lambda^2 D^T D)^{-1} A^T$; one can pick $W_b = I$. Minimize the GCV function

$\frac{\|b - Ax(\lambda)\|^2_{W_b}}{[\mathrm{trace}(I_m - A(\lambda))]^2},$

which estimates the predictive risk.

Expensive: requires a range of λ (the GSVD makes the calculations efficient). A minimum is required; there may be multiple minima, and the function is sometimes flat.
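A sketch of the GCV function for the simplest setting the slide allows, $W_b = I$ and $D = I$, so the trace of the influence matrix is a sum of Tikhonov filter factors computed from the SVD of A; the λ grid is arbitrary.

```python
import numpy as np

def gcv_curve(A, b, lams):
    """GCV function for Tikhonov with D = I, W_b = I, evaluated via the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = A.shape[0]
    vals = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)               # Tikhonov filter factors
        resid2 = np.sum(((1.0 - f) * beta) ** 2) + (np.linalg.norm(b)**2 - np.linalg.norm(beta)**2)
        trace_term = m - np.sum(f)                # trace(I_m - A(lambda))
        vals.append(resid2 / trace_term**2)
    return np.array(vals)

lams = np.logspace(-4, 1, 60)
lam_gcv = lams[np.argmin(gcv_curve(A, b, lams))]
```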

Unbiased Predictive Risk Estimation (UPRE)

Minimize the expected value of the predictive risk, i.e. minimize the UPRE function

$\|b - Ax(\lambda)\|^2_{W_b} + 2\,\mathrm{trace}(A(\lambda)) - m.$

Expensive: requires a range of λ (the GSVD makes the calculations efficient). An estimate of the trace is needed, and a minimum is required.
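A companion sketch of the UPRE function for the unweighted Tikhonov problem with D = I and white noise of known variance sigma_b² (so the residual is scaled by 1/sigma_b², playing the role of the W_b-weighted norm); these simplifications are assumptions for illustration only.

```python
import numpy as np

def upre_curve(A, b, sigma_b, lams):
    """UPRE function for Tikhonov with D = I and white noise of known variance sigma_b^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    m = A.shape[0]
    vals = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)
        resid2 = np.sum(((1.0 - f) * beta) ** 2) + (np.linalg.norm(b)**2 - np.linalg.norm(beta)**2)
        vals.append(resid2 / sigma_b**2 + 2.0 * np.sum(f) - m)
    return np.array(vals)

lams = np.logspace(-4, 1, 60)
lam_upre = lams[np.argmin(upre_curve(A, b, sigma_b, lams))]
```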

An Alternative: Iterative Methods with Stopping Criteria

Iterate to find an approximate solution of Ax ≈ b, e.g. using LSQR, and introduce stopping criteria based on detecting the noise in the solution: the residual periodogram (O'Leary and Rust), or stopping when noise dominates (Hnetynkova et al.).

Hybrid methods combine iteration with a regularization parameter: the cost of regularizing the reduced system is minimal, and any regularization may be used for the subproblem (Nagy et al.).

Questions: how can we be sure when to stop the LSQR iteration, and how are statistics included in the subproblem? Here, we include statistics directly.


Background: Statistics of the Least Squares Problem

Theorem (Rao 1973: First Fundamental Theorem). Let r be the rank of A, and let $b \sim N(Ax, \sigma_b^2 I)$ (errors in the measurements are normally distributed with mean 0 and covariance $\sigma_b^2 I$). Then

$J = \min_x \|Ax - b\|^2 \sim \sigma_b^2\, \chi^2(m - r),$

i.e. J follows a χ² distribution with m − r degrees of freedom: basically the discrepancy principle.

Corollary (Weighted Least Squares). For $b \sim N(Ax, C_b)$ and $W_b = C_b^{-1}$,

$J = \min_x \|Ax - b\|^2_{W_b} \sim \chi^2(m - r).$


Extension: Statistics of the Regularized Least Squares Problem

Theorem: χ² distribution of the regularized functional (Renaut/Mead 2008). (Note: the weighting matrix is on the regularization term.)

$\hat{x} = \arg\min J_D(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, \quad W_D = D^T W_x D.$   (3)

Assume:
$W_b$ and $W_x$ are symmetric positive definite.
The problem is uniquely solvable: $\mathcal{N}(A) \cap \mathcal{N}(D) = \{0\}$.
The Moore-Penrose generalized inverse of $W_D$ is $C_D$.
Statistics: the errors in the right-hand side satisfy $e \sim N(0, C_b)$, and $x_0$ is known so that $x - x_0 = f \sim N(0, C_D)$, i.e. $x_0$ is the mean vector of the model parameters.

Then

$J_D(\hat{x}(W_D)) \sim \chi^2(m + p - n).$
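The following Monte Carlo sketch checks the theorem numerically on a small random problem with D = I, W_b = C_b⁻¹ and W_x = C_x⁻¹ known exactly; all sizes, standard deviations, and the number of repetitions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m_test, n_test = 60, 40                      # small test sizes, invented for illustration
A_test = rng.standard_normal((m_test, n_test))
sigma_e, sigma_x = 0.1, 2.0                  # data and prior standard deviations (invented)
x0 = np.zeros(n_test)

def J_D(A, b, x0, sigma_e, sigma_x):
    """Solve min ||Ax-b||^2/sigma_e^2 + ||x-x0||^2/sigma_x^2 and evaluate the functional there."""
    n = A.shape[1]
    lhs = A.T @ A / sigma_e**2 + np.eye(n) / sigma_x**2
    y = np.linalg.solve(lhs, A.T @ (b - A @ x0) / sigma_e**2)
    x = x0 + y
    return np.linalg.norm(A @ x - b) ** 2 / sigma_e**2 + np.linalg.norm(x - x0) ** 2 / sigma_x**2

samples = []
for _ in range(2000):
    x_draw = x0 + sigma_x * rng.standard_normal(n_test)     # f ~ N(0, C_D), C_D = sigma_x^2 I
    b_draw = A_test @ x_draw + sigma_e * rng.standard_normal(m_test)
    samples.append(J_D(A_test, b_draw, x0, sigma_e, sigma_x))

# With D = I we have p = n, so the predicted degrees of freedom are m + p - n = m.
print("sample mean", np.mean(samples), "predicted", m_test)
print("sample variance", np.var(samples), "predicted", 2 * m_test)
```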


Significance of the χ² Result

For sufficiently large $\tilde{m} = m + p - n$, $J_D \sim \chi^2(m + p - n)$ gives

$E(J_D(\hat{x}(W_D))) = m + p - n, \qquad \mathrm{Var}(J_D) = 2(m + p - n).$

Moreover,

$\tilde{m} - \sqrt{2\tilde{m}}\, z_{\alpha/2} < J_D(\hat{x}(W_D)) < \tilde{m} + \sqrt{2\tilde{m}}\, z_{\alpha/2},$   (4)

where $z_{\alpha/2}$ is the relevant z-value for a χ² distribution with m̃ degrees of freedom.
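A tiny sketch of the interval (4) under the normal approximation to the χ² distribution, reusing the Monte Carlo samples from the previous snippet; the 95% level (α = 0.05) is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

m_tilde = m_test                          # m + p - n for the test problem above (D = I)
z = norm.ppf(1 - 0.05 / 2)                # z_{alpha/2} for alpha = 0.05
lo, hi = m_tilde - np.sqrt(2 * m_tilde) * z, m_tilde + np.sqrt(2 * m_tilde) * z
coverage = np.mean([(lo < s < hi) for s in samples])
print(f"interval ({lo:.1f}, {hi:.1f}), empirical coverage {coverage:.3f}")
```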

Key Aspects of the Proof I: The Functional J

Algebraic simplifications: rewrite the functional as a quadratic form. The regularized solution is given in terms of a resolution matrix $R(W_D)$:

$\hat{x} = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b r, \quad r = b - Ax_0,$   (5)
$\;\; = x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D),$   (6)
$R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}.$   (7)

The functional is given in terms of the influence matrix $A(W_D)$:

$A(W_D) = W_b^{1/2} A\, R(W_D),$   (8)
$J_D(\hat{x}) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r;$ let $\tilde{r} = W_b^{1/2} r,$   (9)
$\;\; = \tilde{r}^T (I_m - A(W_D))\, \tilde{r},$ a quadratic form.   (10)

Key Aspects of the Proof II: Properties of a Quadratic Form

χ² distribution of quadratic forms $x^T P x$ for normal variables (Fisher-Cochran Theorem):
Suppose the components $x_i$ are independent normal variables, $x_i \sim N(0,1)$, $i = 1 : n$. A necessary and sufficient condition that $x^T P x$ has a central χ² distribution is that P is idempotent, $P^2 = P$, in which case the number of degrees of freedom of the χ² distribution is $\mathrm{rank}(P) = \mathrm{trace}(P) = n$.
When the means of the $x_i$ are $\mu_i \neq 0$, $x^T P x$ has a non-central χ² distribution with non-centrality parameter $c = \mu^T P \mu$.
A χ² random variable with n degrees of freedom and centrality parameter c has mean n + c and variance 2(n + 2c).

Key Aspects of the Proof III: Requires the GSVD

Lemma. Assume invertibility and $m \geq n \geq p$. There exist unitary matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{p \times p}$, and a nonsingular matrix $X \in \mathbb{R}^{n \times n}$ such that

$A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T, \qquad D = V [M, \; 0_{p \times (n-p)}] X^T,$   (11)

$\Upsilon = \mathrm{diag}(\upsilon_1, \ldots, \upsilon_p, 1, \ldots, 1) \in \mathbb{R}^{n \times n}, \quad M = \mathrm{diag}(\mu_1, \ldots, \mu_p) \in \mathbb{R}^{p \times p},$
$0 \leq \upsilon_1 \leq \cdots \leq \upsilon_p \leq 1, \quad 1 \geq \mu_1 \geq \cdots \geq \mu_p > 0, \quad \upsilon_i^2 + \mu_i^2 = 1, \; i = 1, \ldots, p.$   (12)

The functional with the GSVD: let $Q = \mathrm{diag}(\mu_1, \ldots, \mu_p, 0_{n-p}, I_{m-n})$; then

$J = \tilde{r}^T (I_m - A(W_D))\, \tilde{r} = \|Q U^T \tilde{r}\|_2^2 = \|\tilde{k}\|_2^2.$

Proof IV: Statistical Distribution of the Weighted Residual

Covariance structure: the errors in b are $e \sim N(0, C_b)$. Since b depends on x through $b = Ax + e$, we can show $b \sim N(Ax_0, C_b + A C_D A^T)$ ($x_0$ is the mean of x). Hence

$r = b - Ax_0 \sim N(0, C_b + A C_D A^T), \qquad \tilde{r} = W_b^{1/2} r \sim N(0, I + \tilde{A} C_D \tilde{A}^T), \quad \tilde{A} = W_b^{1/2} A.$

Use the GSVD: $I + \tilde{A} C_D \tilde{A}^T = U Q^{-2} U^T$, where $Q = \mathrm{diag}(\mu_1, \ldots, \mu_p, I_{n-p}, I_{m-n})$.

Now let $k = Q U^T \tilde{r}$; then $k \sim N(0,\, Q U^T (U Q^{-2} U^T) U Q) = N(0, I_m)$.

But $J = \|\bar{k}\|^2$, where $\bar{k}$ is the vector k with the components $p+1 : n$ excluded. Thus $J_D \sim \chi^2(m + p - n)$.

When the Mean of the Parameters is Not Known, or x_0 = 0 is Not the Mean

Corollary: non-central χ² distribution of the regularized functional. Recall

$\hat{x} = \arg\min J_D(x) = \arg\min \{ \|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, \quad W_D = D^T W_x D.$

Assume all assumptions as before, except that $x_0$ is not the mean vector $\bar{x}$ of the model parameters. Let

$c = \|\mathbf{c}\|_2^2 = \|Q U^T W_b^{1/2} A(\bar{x} - x_0)\|_2^2.$

Then $J_D \sim \chi^2(m + p - n, c)$: the functional at the optimum follows a non-central χ² distribution.

A Further Result when A is Not of Full Column Rank

The rank-deficient solution: suppose A is not of full column rank. Then the filtered solution can be written in terms of the GSVD as

$x_{FILT}(\lambda) = \sum_{i=p-r+1}^{p} \frac{\gamma_i^2}{\upsilon_i(\gamma_i^2 + \lambda^2)}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i = \sum_{i=1}^{p} \frac{f_i}{\upsilon_i}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i.$

Here $f_i = 0$ for $i = 1 : p - r$ and $f_i = \gamma_i^2/(\gamma_i^2 + \lambda^2)$ for $i = p - r + 1 : p$. This yields

$J(x_{FILT}(\lambda)) \sim \chi^2(m - n + r, c);$

notice that the degrees of freedom are reduced.

General Result: The Cost Functional Follows a χ² Distribution

Suppose the degrees of freedom are m̃ and the centrality parameter is c. Then

$E(J_D) = \tilde{m} + c, \qquad \mathrm{Var}(J_D) = 2\tilde{m} + 4c.$

This suggests: try to find $W_D$ so that $E(J_D) = \tilde{m} + c$.
Mead: an expensive nonlinear algorithm for general $W_D$, with c = 0.
Here: first find λ only, i.e. find $W_x = \lambda^2 I$.


What Do We Need to Apply the Theory?

Requirements: the covariance $C_b$ of the data b (or of the model parameters x!), and a priori information $x_0$, the mean $\bar{x}$. But $\bar{x}$ (and hence $x_0$) is not known.

If not known, use repeated data measurements to calculate $C_b$ and the mean $\bar{b}$, and hence estimate the centrality parameter: $E(b) = A E(x)$ implies $\bar{b} \approx A\bar{x}$. Hence

$c = \|\mathbf{c}\|_2^2 = \|Q U^T W_b^{1/2} (\bar{b} - A x_0)\|_2^2,$
$E(J_D) = E(\|Q U^T W_b^{1/2} (b - A x_0)\|_2^2) = m + p - n + \|\mathbf{c}\|_2^2.$

Given the GSVD, estimate the degrees of freedom m̃. Then we can use $E(J_D)$ to find λ.
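A small sketch of estimating the data mean and (diagonal) covariance from repeated measurements; the 48 repeats mirror the 48 signals of the seismic example later in the talk, but the uncorrelated-noise model and the synthetic repeats are assumptions for illustration.

```python
import numpy as np

# Suppose B_rep holds n_rep repeated measurements of the same data vector b (rows = repeats).
n_rep = 48
B_rep = A @ x_true + sigma_b * np.random.default_rng(2).standard_normal((n_rep, len(b)))

b_mean = B_rep.mean(axis=0)                  # estimate of E(b) = A E(x)
var_b = B_rep.var(axis=0, ddof=1)            # pointwise variances over the repeats
C_b = np.diag(var_b)                         # uncorrelated (colored-noise) model: diagonal C_b
Wb_sqrt = np.diag(1.0 / np.sqrt(var_b))      # W_b^{1/2} = C_b^{-1/2} for whitening the fit term
```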


Designing the Algorithm I: Assume x_0 is the Mean
(Experimentalists know something about the model parameters.)

Recall: if $C_b$ and $C_x$ are good estimates of the covariances, then $J_D(\hat{x}) - (m + p - n)$ should be small. Thus, with $\tilde{m} = m + p - n$, we want

$\tilde{m} - \sqrt{2\tilde{m}}\, z_{\alpha/2} < J_D(\hat{x}(W_D)) < \tilde{m} + \sqrt{2\tilde{m}}\, z_{\alpha/2},$

where $z_{\alpha/2}$ is the relevant z-value for a χ² distribution with m̃ degrees of freedom.

Goal: find $W_x$ to make the interval (4) tight. In the single-variable case, find λ such that $J_D(\hat{x}(\lambda)) \approx \tilde{m}$.


A Newton Line-Search Algorithm to Find λ = 1/σ (Basic Algebra)

Newton's method to solve $F(\sigma) = J_D(\sigma) - \tilde{m} = 0$. We use $\sigma = 1/\lambda$, and $y(\sigma^{(k)})$ is the current solution, for which $x(\sigma^{(k)}) = y(\sigma^{(k)}) + x_0$. Then

$\frac{\partial}{\partial\sigma} J_D(\sigma) = -\frac{2}{\sigma^3}\, \|Dy(\sigma)\|^2 < 0.$

Hence we have a basic Newton iteration

$\sigma^{(k+1)} = \sigma^{(k)} \left(1 + \frac{1}{2} \left(\frac{\sigma^{(k)}}{\|Dy(\sigma^{(k)})\|}\right)^2 \left(J_D(\sigma^{(k)}) - \tilde{m}\right)\right).$
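A minimal dense-matrix sketch of this basic Newton iteration for D = I, W_b = σ_b⁻² I and x_0 = 0 (all simplifying assumptions); only a crude safeguard is included here, whereas a practical implementation would use the bracketing and line search described below and work through the GSVD or an LSQR projection.

```python
import numpy as np

def chi2_newton_sigma(A, b, sigma_b, D=None, x0=None, sigma0=1.0, tol=1e-3, maxit=50):
    """Basic Newton iteration for F(sigma) = J_D(sigma) - m_tilde = 0, with lambda = 1/sigma."""
    m, n = A.shape
    D = np.eye(n) if D is None else D
    x0 = np.zeros(n) if x0 is None else x0
    m_tilde = m + D.shape[0] - n                    # degrees of freedom m + p - n
    Wb = 1.0 / sigma_b**2                           # W_b = sigma_b^{-2} I (white noise assumption)
    r = b - A @ x0
    sigma = sigma0
    for _ in range(maxit):
        lam2 = 1.0 / sigma**2
        y = np.linalg.solve(Wb * (A.T @ A) + lam2 * (D.T @ D), Wb * (A.T @ r))
        J = Wb * np.linalg.norm(A @ y - r) ** 2 + lam2 * np.linalg.norm(D @ y) ** 2
        F = J - m_tilde
        if abs(F) < tol * m_tilde:
            break
        step = 0.5 * (sigma / np.linalg.norm(D @ y)) ** 2 * F      # Newton step in relative form
        sigma = sigma * (1.0 + step) if step > -0.9 else 0.1 * sigma  # crude safeguard; see bracketing below
    return sigma, x0 + y

sigma_hat, x_chi2 = chi2_newton_sigma(A, b, sigma_b)
lam_chi2 = 1.0 / sigma_hat
```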


Algorithm Using the GSVD

Use the GSVD of $[W_b^{1/2} A;\, D]$. With $\gamma_i$ the generalized singular values and $s = U^T W_b^{1/2} r$, set $\tilde{m} = m - n + p$ and

$\tilde{s}_i = s_i / (\gamma_i^2 \sigma_x^2 + 1), \quad i = 1, \ldots, p, \qquad t_i = \tilde{s}_i \gamma_i.$

Find the root of

$F(\sigma_x) = \sum_{i=1}^{p} \frac{s_i^2}{\gamma_i^2 \sigma_x^2 + 1} + \sum_{i=n+1}^{m} s_i^2 - \tilde{m} = 0.$

Equivalently: solve F = 0, where $F(\sigma_x) = s^T \tilde{s} - \tilde{m}$ (with the convention $\tilde{s}_i = 0$ for $p < i \leq n$ and $\tilde{s}_i = s_i$ for $i > n$) and $F'(\sigma_x) = -2\sigma_x \|t\|_2^2$.

Discussion of Convergence

F is monotonically decreasing ($F'(\sigma_x) = -2\sigma_x \|t\|_2^2 < 0$), so either the solution exists and is unique for positive σ, or no solution exists:
F(0) < 0 implies the statistics of the model are incorrect.
Theoretically, $\lim_{\sigma \to \infty} F > 0$ is possible; this is equivalent to λ = 0, i.e. no regularization is needed.

Practical Details of the Algorithm: Finding the Parameter

Step 1: Bracket the root by a logarithmic search on σ to handle the asymptotes; this yields sigmamax and sigmamin.

Step 2: Calculate the step, with steepness controlled by told. Let $t = \|Dy\|/\sigma^{(k)}$, where y is the current update; then

$\mathrm{step} = \frac{1}{2} \left(\frac{1}{\max\{t, \mathrm{told}\}}\right)^2 \left(J_D(\sigma^{(k)}) - \tilde{m}\right).$

Step 3: Introduce a line search $\alpha^{(k)}$ in the Newton update, sigmanew $= \sigma^{(k)}(1 + \alpha^{(k)}\, \mathrm{step})$, with $\alpha^{(k)}$ chosen such that sigmanew lies within the bracket.


Practical Details of the Algorithm: Large-Scale Problems

Initialization:
Convert the generalized Tikhonov problem to standard form (if the operator L = D is not invertible, you only need to know how to form Ax and A^T x, and the null space of L).
Use the LSQR algorithm to find the bidiagonal matrix for the projected problem.
Obtain a solution of the bidiagonal problem for a given initial σ.

Subsequent steps:
Increase the dimension of the projection space if needed, reusing the existing bidiagonalization; a smaller system may also be used if appropriate.
Each σ evaluation of the algorithm reuses saved information from the Lanczos bidiagonalization, as in the sketch below.
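A sketch, under the simplifying assumptions of standard-form regularization (D = I) and W_b = I, of how a k-step Golub-Kahan (Lanczos) bidiagonalization can be computed once and then reused to evaluate the regularized functional cheaply for many σ values; the function names and the fixed k = 15 (echoing the projected problem size quoted later for the deblurring example) are illustrative, not the authors' implementation.

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization: A V_k = U_{k+1} B_k, B_k lower bidiagonal.
    No reorthogonalization is done here; a practical code would reorthogonalize."""
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta
    for j in range(k):
        v = A.T @ U[:, j] - (B[j, j - 1] * V[:, j - 1] if j > 0 else 0.0)
        B[j, j] = np.linalg.norm(v); V[:, j] = v / B[j, j]
        u = A @ V[:, j] - B[j, j] * U[:, j]
        B[j + 1, j] = np.linalg.norm(u); U[:, j + 1] = u / B[j + 1, j]
    return U, V, B, beta

def projected_J(B, beta, sigma):
    """Evaluate the Tikhonov functional (D = I, W_b = I, lambda = 1/sigma) on the projected problem."""
    k1, k = B.shape
    e1 = np.zeros(k1); e1[0] = beta
    lam = 1.0 / sigma
    K = np.vstack([B, lam * np.eye(k)])            # min ||B w - beta e1||^2 + lam^2 ||w||^2
    w, *_ = np.linalg.lstsq(K, np.concatenate([e1, np.zeros(k)]), rcond=None)
    return np.linalg.norm(B @ w - e1) ** 2 + lam**2 * np.linalg.norm(w) ** 2, w

# One bidiagonalization, reused for every sigma visited by the Newton iteration.
U, V, B, beta = golub_kahan(A, b, k=15)
for sigma in (0.1, 1.0, 10.0):                     # arbitrary trial values of sigma
    J, w = projected_J(B, beta, sigma)
    x_proj = V @ w                                 # lift the projected solution back to the full space
```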

Comparison with the Standard Hybrid LSQR Algorithm

The algorithm can concurrently regularize and find λ. Standard hybrid LSQR solves the projected system and then adds regularization; this is also possible here, and results of both approaches are shown.

Advantages and costs: it needs only the cost of the standard LSQR algorithm, with some additional solution updates for the iterated σ. The regularization introduced by the LSQR projection may be useful for preventing problems with the GSVD expansion. This makes the algorithm viable for large-scale problems.

Illustrating the Results for Problem Size 512: Two Standard Test Problems

(Figures: comparison for noise level 10%; on the left D = I, on the right D is a first-derivative operator.) Notice that the L-curve and χ²-LSQR perform well; UPRE does not perform well.

Real Data: Seismic Signal Restoration

The data set and goal: a real data set of 48 seismic signals. The point spread function is derived from the signals, and the signal variance is calculated pointwise over all 48 signals. Goal: restore the signal x from Ax = b, where A is the PSF matrix and b is a given blurred signal. Method of comparison (no exact solution is known): use convergence with respect to downsampling.

Comparison at High Resolution, White Noise

(Figures.) There is greater contrast with the χ² solution; UPRE is insufficiently regularized, and the L-curve severely undersmooths (not shown). The parameters are not consistent across resolutions.

The UPRE Solution: x_0 = 0, White Noise

(Figures.) The regularization parameters are consistent: the same σ is obtained at all resolutions.

The LSQR Hybrid Solution: White Noise

(Figures.) The regularization is quite consistent from resolution 2 to 100.

Illustrating the Deblurring Result

(Figures.) Example taken from RESTORE TOOLS (Nagy et al.), 15% noise. The computational cost is minimal: the projected problem size is 15, with λ = 0.58.

Problem: Grain, 15% Noise Added, for Increasing Subproblem Size

(Figures: signal-to-noise ratio 10 log10(1/e) for relative error e, and the regularization parameter σ in each case.)

Illustrating the Progress of the Newton Algorithm Post-LSQR
(Figures.)

Illustrating the Progress of the Newton Algorithm with LSQR
(Figures.)

Problem: Grain, 15% Noise Added, for Increasing Subproblem Size

(Figures: signal-to-noise ratio 10 log10(1/e) for relative error e.)

Conclusions

Observations:
A new statistical method for estimating the regularization parameter.
It compares favorably with UPRE and with the L-curve with respect to performance (GCV is not competitive).
The method can be used for large-scale problems; the problem size of the hybrid LSQR used is very small.
The method is very efficient; the Newton iteration is robust and fast.
A priori information is needed, but it can be obtained directly from the data, e.g. using local statistical information of the image.


Other Results and Future Work

A software package!
Diagonal weighting schemes.
Edge-preserving regularization - total variation.
Better handling of colored noise.
Bound constraints (with Mead, accepted).

THANK YOU!


More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

Inverse Theory. COST WaVaCS Winterschool Venice, February Stefan Buehler Luleå University of Technology Kiruna

Inverse Theory. COST WaVaCS Winterschool Venice, February Stefan Buehler Luleå University of Technology Kiruna Inverse Theory COST WaVaCS Winterschool Venice, February 2011 Stefan Buehler Luleå University of Technology Kiruna Overview Inversion 1 The Inverse Problem 2 Simple Minded Approach (Matrix Inversion) 3

More information

Mathematical Beer Goggles or The Mathematics of Image Processing

Mathematical Beer Goggles or The Mathematics of Image Processing How Mathematical Beer Goggles or The Mathematics of Image Processing Department of Mathematical Sciences University of Bath Postgraduate Seminar Series University of Bath 12th February 2008 1 How 2 How

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Principles of the Global Positioning System Lecture 11

Principles of the Global Positioning System Lecture 11 12.540 Principles of the Global Positioning System Lecture 11 Prof. Thomas Herring http://geoweb.mit.edu/~tah/12.540 Statistical approach to estimation Summary Look at estimation from statistical point

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

Discrete ill posed problems

Discrete ill posed problems Discrete ill posed problems Gérard MEURANT October, 2008 1 Introduction to ill posed problems 2 Tikhonov regularization 3 The L curve criterion 4 Generalized cross validation 5 Comparisons of methods Introduction

More information

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University. SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

1 Cricket chirps: an example

1 Cricket chirps: an example Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number

More information

A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS

A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS SILVIA NOSCHESE AND LOTHAR REICHEL Abstract. Truncated singular value decomposition (TSVD) is a popular method for solving linear discrete ill-posed

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information c de Gruyter 27 J. Inv. Ill-Posed Problems 5 (27), 2 DOI.55 / JIP.27.xxx Parameter estimation: A new approach to weighting a priori information J.L. Mead Communicated by Abstract. We propose a new approach

More information

Iterative Methods for Ill-Posed Problems

Iterative Methods for Ill-Posed Problems Iterative Methods for Ill-Posed Problems Based on joint work with: Serena Morigi Fiorella Sgallari Andriy Shyshkov Salt Lake City, May, 2007 Outline: Inverse and ill-posed problems Tikhonov regularization

More information

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation

More information

UPRE Method for Total Variation Parameter Selection

UPRE Method for Total Variation Parameter Selection UPRE Method for Total Variation Parameter Selection Youzuo Lin School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85287 USA. Brendt Wohlberg 1, T-5, Los Alamos National

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

ETNA Kent State University

ETNA Kent State University Electronic Transactions on Numerical Analysis. Volume 28, pp. 49-67, 28. Copyright 28,. ISSN 68-963. A WEIGHTED-GCV METHOD FOR LANCZOS-HYBRID REGULARIZATION JULIANNE CHUNG, JAMES G. NAGY, AND DIANNE P.

More information

LPA-ICI Applications in Image Processing

LPA-ICI Applications in Image Processing LPA-ICI Applications in Image Processing Denoising Deblurring Derivative estimation Edge detection Inverse halftoning Denoising Consider z (x) =y (x)+η (x), wherey is noise-free image and η is noise. assume

More information

Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact

Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact Zdeněk Strakoš Academy of Sciences and Charles University, Prague http://www.cs.cas.cz/ strakos Hong Kong, February

More information

Signal Denoising with Wavelets

Signal Denoising with Wavelets Signal Denoising with Wavelets Selin Aviyente Department of Electrical and Computer Engineering Michigan State University March 30, 2010 Introduction Assume an additive noise model: x[n] = f [n] + w[n]

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Numerical Linear Algebra and. Image Restoration

Numerical Linear Algebra and. Image Restoration Numerical Linear Algebra and Image Restoration Maui High Performance Computing Center Wednesday, October 8, 2003 James G. Nagy Emory University Atlanta, GA Thanks to: AFOSR, Dave Tyler, Stuart Jefferies,

More information

Singular value decomposition. If only the first p singular values are nonzero we write. U T o U p =0

Singular value decomposition. If only the first p singular values are nonzero we write. U T o U p =0 Singular value decomposition If only the first p singular values are nonzero we write G =[U p U o ] " Sp 0 0 0 # [V p V o ] T U p represents the first p columns of U U o represents the last N-p columns

More information

σ 11 σ 22 σ pp 0 with p = min(n, m) The σ ii s are the singular values. Notation change σ ii A 1 σ 2

σ 11 σ 22 σ pp 0 with p = min(n, m) The σ ii s are the singular values. Notation change σ ii A 1 σ 2 HE SINGULAR VALUE DECOMPOSIION he SVD existence - properties. Pseudo-inverses and the SVD Use of SVD for least-squares problems Applications of the SVD he Singular Value Decomposition (SVD) heorem For

More information

Extension of GKB-FP algorithm to large-scale general-form Tikhonov regularization

Extension of GKB-FP algorithm to large-scale general-form Tikhonov regularization Extension of GKB-FP algorithm to large-scale general-form Tikhonov regularization Fermín S. Viloche Bazán, Maria C. C. Cunha and Leonardo S. Borges Department of Mathematics, Federal University of Santa

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Tikhonov Regularization for Weighted Total Least Squares Problems

Tikhonov Regularization for Weighted Total Least Squares Problems Tikhonov Regularization for Weighted Total Least Squares Problems Yimin Wei Naimin Zhang Michael K. Ng Wei Xu Abstract In this paper, we study and analyze the regularized weighted total least squares (RWTLS)

More information

Regularizing inverse problems. Damping and smoothing and choosing...

Regularizing inverse problems. Damping and smoothing and choosing... Regularizing inverse problems Damping and smoothing and choosing... 141 Regularization The idea behind SVD is to limit the degree of freedom in the model and fit the data to an acceptable level. Retain

More information

Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake

Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake Seung Keun Park and Hae Sung Lee ABSTRACT This paper presents a system identification (SI) scheme

More information

Computer Vision & Digital Image Processing

Computer Vision & Digital Image Processing Computer Vision & Digital Image Processing Image Restoration and Reconstruction I Dr. D. J. Jackson Lecture 11-1 Image restoration Restoration is an objective process that attempts to recover an image

More information

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 1 One Picture and a Thousand Words Using Matrix Approximations Dianne P. O Leary

More information

Downloaded 06/11/15 to Redistribution subject to SIAM license or copyright; see

Downloaded 06/11/15 to Redistribution subject to SIAM license or copyright; see SIAM J. SCI. COMPUT. Vol. 37, No. 2, pp. B332 B359 c 2015 Society for Industrial and Applied Mathematics A FRAMEWORK FOR REGULARIZATION VIA OPERATOR APPROXIMATION JULIANNE M. CHUNG, MISHA E. KILMER, AND

More information

Generalized Local Regularization for Ill-Posed Problems

Generalized Local Regularization for Ill-Posed Problems Generalized Local Regularization for Ill-Posed Problems Patricia K. Lamm Department of Mathematics Michigan State University AIP29 July 22, 29 Based on joint work with Cara Brooks, Zhewei Dai, and Xiaoyue

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

10. Multi-objective least squares

10. Multi-objective least squares L Vandenberghe ECE133A (Winter 2018) 10 Multi-objective least squares multi-objective least squares regularized data fitting control estimation and inversion 10-1 Multi-objective least squares we have

More information

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016 Linear models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark October 5, 2016 1 / 16 Outline for today linear models least squares estimation orthogonal projections estimation

More information

c 1999 Society for Industrial and Applied Mathematics

c 1999 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 21, No. 1, pp. 185 194 c 1999 Society for Industrial and Applied Mathematics TIKHONOV REGULARIZATION AND TOTAL LEAST SQUARES GENE H. GOLUB, PER CHRISTIAN HANSEN, AND DIANNE

More information

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

8 The SVD Applied to Signal and Image Deblurring

8 The SVD Applied to Signal and Image Deblurring 8 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an

More information

Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems

Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems Lanczos Bidiagonalization with Subspace Augmentation for Discrete Inverse Problems Per Christian Hansen Technical University of Denmark Ongoing work with Kuniyoshi Abe, Gifu Dedicated to Dianne P. O Leary

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods

Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Empirical Risk Minimization as Parameter Choice Rule for General Linear Regularization Methods Frank Werner 1 Statistical Inverse Problems in Biophysics Group Max Planck Institute for Biophysical Chemistry,

More information

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL.

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL. Adaptive Filtering Fundamentals of Least Mean Squares with MATLABR Alexander D. Poularikas University of Alabama, Huntsville, AL CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is

More information

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM

More information

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb

More information

Least-Squares Performance of Analog Product Codes

Least-Squares Performance of Analog Product Codes Copyright 004 IEEE Published in the Proceedings of the Asilomar Conference on Signals, Systems and Computers, 7-0 ovember 004, Pacific Grove, California, USA Least-Squares Performance of Analog Product

More information

Lecture 1: Numerical Issues from Inverse Problems (Parameter Estimation, Regularization Theory, and Parallel Algorithms)

Lecture 1: Numerical Issues from Inverse Problems (Parameter Estimation, Regularization Theory, and Parallel Algorithms) Lecture 1: Numerical Issues from Inverse Problems (Parameter Estimation, Regularization Theory, and Parallel Algorithms) Youzuo Lin 1 Joint work with: Rosemary A. Renaut 2 Brendt Wohlberg 1 Hongbin Guo

More information

JACOBI S ITERATION METHOD

JACOBI S ITERATION METHOD ITERATION METHODS These are methods which compute a sequence of progressively accurate iterates to approximate the solution of Ax = b. We need such methods for solving many large linear systems. Sometimes

More information

10. Linear Models and Maximum Likelihood Estimation

10. Linear Models and Maximum Likelihood Estimation 10. Linear Models and Maximum Likelihood Estimation ECE 830, Spring 2017 Rebecca Willett 1 / 34 Primary Goal General problem statement: We observe y i iid pθ, θ Θ and the goal is to determine the θ that

More information

Optimal State Estimators for Linear Systems with Unknown Inputs

Optimal State Estimators for Linear Systems with Unknown Inputs Optimal tate Estimators for Linear ystems with Unknown Inputs hreyas undaram and Christoforos N Hadjicostis Abstract We present a method for constructing linear minimum-variance unbiased state estimators

More information

Introduction. Chapter One

Introduction. Chapter One Chapter One Introduction The aim of this book is to describe and explain the beautiful mathematical relationships between matrices, moments, orthogonal polynomials, quadrature rules and the Lanczos and

More information

Deconvolution. Parameter Estimation in Linear Inverse Problems

Deconvolution. Parameter Estimation in Linear Inverse Problems Image Parameter Estimation in Linear Inverse Problems Chair for Computer Aided Medical Procedures & Augmented Reality Department of Computer Science, TUM November 10, 2006 Contents A naive approach......with

More information

Linear Inverse Problems

Linear Inverse Problems Linear Inverse Problems Ajinkya Kadu Utrecht University, The Netherlands February 26, 2018 Outline Introduction Least-squares Reconstruction Methods Examples Summary Introduction 2 What are inverse problems?

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix Definition: Let L : V 1 V 2 be a linear operator. The null space N (L) of L is the subspace of V 1 defined by N (L) = {x

More information

An l 2 -l q regularization method for large discrete ill-posed problems

An l 2 -l q regularization method for large discrete ill-posed problems Noname manuscript No. (will be inserted by the editor) An l -l q regularization method for large discrete ill-posed problems Alessandro Buccini Lothar Reichel the date of receipt and acceptance should

More information

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING JIAN-FENG CAI, STANLEY OSHER, AND ZUOWEI SHEN Abstract. Real images usually have sparse approximations under some tight frame systems derived

More information

The Kalman filter is arguably one of the most notable algorithms

The Kalman filter is arguably one of the most notable algorithms LECTURE E NOTES «Kalman Filtering with Newton s Method JEFFREY HUMPHERYS and JEREMY WEST The Kalman filter is arguably one of the most notable algorithms of the 0th century [1]. In this article, we derive

More information

Inverse scattering problem from an impedance obstacle

Inverse scattering problem from an impedance obstacle Inverse Inverse scattering problem from an impedance obstacle Department of Mathematics, NCKU 5 th Workshop on Boundary Element Methods, Integral Equations and Related Topics in Taiwan NSYSU, October 4,

More information

On the regularization properties of some spectral gradient methods

On the regularization properties of some spectral gradient methods On the regularization properties of some spectral gradient methods Daniela di Serafino Department of Mathematics and Physics, Second University of Naples daniela.diserafino@unina2.it contributions from

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

System Identification

System Identification System Identification Lecture : Statistical properties of parameter estimators, Instrumental variable methods Roy Smith 8--8. 8--8. Statistical basis for estimation methods Parametrised models: G Gp, zq,

More information

6 The SVD Applied to Signal and Image Deblurring

6 The SVD Applied to Signal and Image Deblurring 6 The SVD Applied to Signal and Image Deblurring We will discuss the restoration of one-dimensional signals and two-dimensional gray-scale images that have been contaminated by blur and noise. After an

More information

SPARSE SIGNAL RESTORATION. 1. Introduction

SPARSE SIGNAL RESTORATION. 1. Introduction SPARSE SIGNAL RESTORATION IVAN W. SELESNICK 1. Introduction These notes describe an approach for the restoration of degraded signals using sparsity. This approach, which has become quite popular, is useful

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information