Lecture 1: Numerical Issues from Inverse Problems (Parameter Estimation, Regularization Theory, and Parallel Algorithms)

1 Lecture 1: Numerical Issues from Inverse Problems (Parameter Estimation, Regularization Theory, and Parallel Algorithms). Youzuo Lin (1), joint work with: Rosemary A. Renaut (2), Brendt Wohlberg (1), Hongbin Guo (2). 1. Los Alamos National Laboratory; 2. School of Mathematical and Statistical Sciences, Arizona State University. Graduate Student Workshop on Inverse Problems, 2016

2 Outline 1 Introduction: Inverse Problem and Regularization 2 Topic 1: Multisplitting for Regularized Least Squares Regularization Parallelization Multiple Right Hand Side Problem 3 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Optimal Parameter Selection, UPRE Approach 4 Topic 3: Projected Krylov Subspace Solvers on GPU Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU Optimizations of Krylov Subspace Algorithms on GPU Numerical Results 5 References Inverse Theory Inverse Problems Workshop 1 / 64


4 Inverse Problems and Ill-posedness General Linear Inverse Problem b = Af + η, Measured data b and system (or transform) matrix A Input Data f Inverse Theory Inverse Problems Workshop 2 / 64

5 Inverse Problems and Ill-posedness General Linear Inverse Problem b = Af + η, Measured data b and system (or transform) matrix A Input Data f Image Restoration - Deconvolution Transform matrix A is blurring matrix, η is the noise Inverse Theory Inverse Problems Workshop 2 / 64

6 Inverse Problems and Ill-posedness General Linear Inverse Problem b = Af + η, Measured data b and system (or transform) matrix A Input Data f Image Restoration - Deconvolution Transform matrix A is blurring matrix, η is the noise Image Reconstruction - Tomography Transform matrix A is Radon transform, η is the noise Inverse Theory Inverse Problems Workshop 2 / 64


8 Direct Inverse Filtering How about f = A^{-1} b? Inverse Theory Inverse Problems Workshop 3 / 64

9 Direct Inverse Filtering How about f = A^{-1} b? Figure: Left: Blurred and Noisy Image with SNR of (dB), Right: Restored Image by Direct Inversion with SNR of (dB) Inverse Theory Inverse Problems Workshop 3 / 64
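
To see why direct inverse filtering fails, the following small NumPy sketch (not from the slides; the blur matrix, signal, and noise level are hypothetical) builds a smoothing, badly conditioned A and shows how a tiny amount of data noise is amplified into a large error in f = A^{-1} b:

    import numpy as np

    np.random.seed(0)
    n = 50
    d = np.arange(n)
    A = np.exp(-0.3 * (d[:, None] - d[None, :]) ** 2)   # Gaussian "blurring" matrix (ill-conditioned)
    f_true = np.zeros(n)
    f_true[20:30] = 1.0                                  # simple piecewise-constant signal
    eta = 1e-3 * np.random.randn(n)                      # small additive noise
    b = A @ f_true + eta

    f_direct = np.linalg.solve(A, b)                     # direct inverse filtering f = A^{-1} b
    print("cond(A)             :", np.linalg.cond(A))
    print("relative data noise :", np.linalg.norm(eta) / np.linalg.norm(A @ f_true))
    print("relative sol. error :", np.linalg.norm(f_direct - f_true) / np.linalg.norm(f_true))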

10 Constrained Least Squares and Regularization Model Constrained Least Squares: min_f J(f), subject to ||Af − b||_2^2 = σ^2. Regularization Model: min_f { ||Af − b||_2^2 + λ^2 J(f) }, λ > 0. ||Af − b||_2^2: Data Fidelity Term; J(f): Regularization Term; λ: Regularization Parameter. Inverse Theory Inverse Problems Workshop 4 / 64

11 Specific Regularization Tikhonov Regularization: When J(f) is a quadratic functional f^T L^T L f, min_f { ||Af − b||_2^2 + λ^2 ||Lf||_2^2 }. Total Variation (TV) Regularization: When J(f) is a nonquadratic functional like Total Variation, min_f { ||Af − b||_2^2 + λ^2 ||∇f||_1 }. Others: Learning Based Regularization, Compressive Sensing, etc. Inverse Theory Inverse Problems Workshop 5 / 64
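
As a concrete illustration of the Tikhonov case, a minimal NumPy sketch (not the implementation used in the talk; lam and L are placeholders) solves the penalized problem through the equivalent stacked least-squares system [A; λL] f ≈ [b; 0]:

    import numpy as np

    def tikhonov(A, b, lam, L=None):
        """Minimize ||A f - b||_2^2 + lam^2 ||L f||_2^2 via the stacked system [A; lam L] f ~ [b; 0]."""
        n = A.shape[1]
        if L is None:
            L = np.eye(n)                       # default regularization operator
        K = np.vstack([A, lam * L])
        rhs = np.concatenate([b, np.zeros(L.shape[0])])
        f, *_ = np.linalg.lstsq(K, rhs, rcond=None)
        return f

    # Example (hypothetical): f_tk = tikhonov(A, b, lam=0.1)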

12 Outline 1 Introduction: Inverse Problem and Regularization 2 Topic 1: Multisplitting for Regularized Least Squares Regularization Parallelization Multiple Right Hand Side Problem 3 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Optimal Parameter Selection, UPRE Approach 4 Topic 3: Projected Krylov Subspace Solvers on GPU Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU Optimizations of Krylov Subspace Algorithms on GPU Numerical Results 5 References Inverse Theory Inverse Problems Workshop 6 / 64

13 Topic 1: Multisplitting for Regularized Least Squares [Renaut, Lin and Guo, 2009] Regularization Parallelization Bottleneck for Applications Formulation-Multisplitting and Domain Decomposition Local Solver Selection and Computational Cost Update Scheme Multiple Right-Hand Sides Problem A Second Look at Multisplitting Hybrid Regularization with Multisplitting and Multiple Right Hand Sides Computational Cost Analysis Application in Image Reconstruction and Restoration Numerical Results Inverse Theory Inverse Problems Workshop 6 / 64

14 Tikhonov Solution and Memory Bottleneck Solving the Tikhonov (TK) regularization problem is equivalent to solving a large-scale linear system: (A^T A + λ^2 L^T L) f = A^T b. What about the RAM requirement? In a positron emission tomography (PET) image reconstruction example in single precision, for an image of size , with projection angles from 0° to 180° in 2° increments, the required RAM for storing A is GB! What can we do about this? We apply the domain decomposition idea to our problem. Inverse Theory Inverse Problems Workshop 7 / 64

15 Formulation - Multisplitting Linear multisplitting (LMS): Given a matrix A ∈ R^{n×n} and a collection of matrices M^(i), N^(i), E^(i) ∈ R^{n×n}, i = 1,..., p, satisfying (1) A = M^(i) − N^(i) for each i = 1,..., p; (2) M^(i) is regular (nonsingular), i = 1,..., p; (3) E^(i) is a non-negative diagonal matrix, i = 1,..., p, and Σ_{i=1}^p E^(i) = I. Then the collection of triples (M^(i), N^(i), E^(i)), i = 1,..., p, is called a multisplitting of A and the LMS method is defined by the iteration: x^(k+1) = Σ_{i=1}^p E^(i) (M^(i))^{-1} (N^(i) x^(k) + b). Inverse Theory Inverse Problems Workshop 8 / 64
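
A minimal sketch of the LMS iteration defined above, assuming the simplest block-Jacobi style splitting: M^(i) agrees with A on the i-th diagonal block and is the identity elsewhere, N^(i) = M^(i) − A, and E^(i) selects the i-th block (function and variable names are hypothetical). The goal is only to show the structure of the iteration; convergence needs the usual multisplitting assumptions (e.g., suitable diagonal dominance of A).

    import numpy as np

    def lms_iteration(A, b, splits, x0, iters=50):
        """Linear multisplitting x^(k+1) = sum_i E^(i) (M^(i))^{-1} (N^(i) x^(k) + b)."""
        n = A.shape[0]
        x = x0.copy()
        for _ in range(iters):
            x_new = np.zeros(n)
            for idx in splits:                      # idx: indices owned by subdomain i
                M = np.eye(n)                       # identity away from the diagonal block
                M[np.ix_(idx, idx)] = A[np.ix_(idx, idx)]
                N = M - A                           # so that A = M - N
                y = np.linalg.solve(M, N @ x + b)   # (M^(i))^{-1} (N^(i) x^(k) + b)
                x_new[idx] = y[idx]                 # E^(i) keeps only subdomain i
            x = x_new
        return x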

16 Formulation - Domain Decomposition Domain decomposition is a particular method of multisplitting in which the submatrices M^(i) are defined to be consistent with a partitioning of the domain: x = (x_1^T, x_2^T, ..., x_p^T)^T. Inverse Theory Inverse Problems Workshop 9 / 64

17 Formulation - Domain Decomposition Domain decomposition is a particular method of multisplitting in which the submatrices M^(i) are defined to be consistent with a partitioning of the domain: x = (x_1^T, x_2^T, ..., x_p^T)^T. e.g., Different Splitting Schemes Inverse Theory Inverse Problems Workshop 9 / 64

18 Formulation - Operator Split [Renaut 1998] Corresponding to the splitting of the image x, we have a splitting of the kernel operator A: A = (A_1, A_2, ..., A_p). Inverse Theory Inverse Problems Workshop 10 / 64

19 Formulation - Operator Split [Renaut 1998] Corresponding to the splitting of the image x, we have a splitting of the kernel operator A: A = (A_1, A_2, ..., A_p). The linear system Ax = b is replaced with the split systems A_i y_i = b_i(x), where b_i(x) = b − Σ_{j≠i} A_j x_j = b − Ax + A_i x_i. Inverse Theory Inverse Problems Workshop 10 / 64

20 Formulation - Local Problem Local Problem - Solving Ax = b turns into solving p subproblems: min_{y ∈ R^{n_i}} ||A_i y_i − b_i(x)||_2, 1 ≤ i ≤ p. Inverse Theory Inverse Problems Workshop 11 / 64

21 Formulation - Local Problem Local Problem - Solving Ax = b turns into solving p subproblems: min_{y ∈ R^{n_i}} ||A_i y_i − b_i(x)||_2, 1 ≤ i ≤ p. Update Scheme - The global solution update from local solutions at step k to step k + 1 is given by x^(k+1) = Σ_{i=1}^p τ_i^(k+1) (x_local)_i^(k+1), where (x_local)_i^(k+1) = ((x_1^(k))^T, ..., (x_{i−1}^(k))^T, (y_i^(k+1))^T, (x_{i+1}^(k))^T, ..., (x_p^(k))^T)^T. Inverse Theory Inverse Problems Workshop 11 / 64

22 Formulation - Regularized Multisplitting Tikhonov Regularization: min_x { ||Ax − b||_2^2 + λ^2 ||Lx||_2^2 }, λ > 0. Inverse Theory Inverse Problems Workshop 12 / 64

23 Formulation - Regularized Multisplitting Tikhonov Regularization: min_x { ||Ax − b||_2^2 + λ^2 ||Lx||_2^2 }, λ > 0, equivalent to min_x || [A; LΛ] x − [b; 0] ||_2^2, where Λ is block diagonal with entries λ_i I_{n_i}. Inverse Theory Inverse Problems Workshop 12 / 64

24 Formulation - Regularized Multisplitting Similarly, we have a splitting of the stacked operator: [A; LΛ] = ([A_1; λ_1 L_1], [A_2; λ_2 L_2], ..., [A_p; λ_p L_p]). Inverse Theory Inverse Problems Workshop 13 / 64

25 Formulation - Regularized Multisplitting Similarly, we have a splitting of the stacked operator: [A; LΛ] = ([A_1; λ_1 L_1], [A_2; λ_2 L_2], ..., [A_p; λ_p L_p]). The i-th block equation of the normal equations is given by (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i Σ_{j=1, j≠i}^p λ_j L_i^T L_j x_j. Inverse Theory Inverse Problems Workshop 13 / 64

26 Local Solver Selection and Computational Cost How to select the local solver? Inverse Theory Inverse Problems Workshop 14 / 64

27 Local Solver Selection and Computational Cost How to select the local solver? Computation Cost and Memory Cost Inverse Theory Inverse Problems Workshop 14 / 64

28 Local Solver Selection and Computational Cost How to select the local solver? Computation Cost and Memory Cost Iterative Solver: Conjugate Gradient for Least Squares (CGLS) Inverse Theory Inverse Problems Workshop 14 / 64

29 Local Solver Selection and Computational Cost How to select the local solver? Computation Cost and Memory Cost Iterative Solver: Conjugate Gradient for Least Squares (CGLS) Computation Cost: C_CGLS ≈ 2 n_i^2 m + K (2(k_i + 1) n_i^2 + 10 k_i n_i); Memory Cost: matrix-vector multiplication, only need to store matrix A_i Inverse Theory Inverse Problems Workshop 14 / 64

30 Local Solver Selection and Computational Cost How to select the local solver? Computation Cost and Memory Cost Iterative Solver: Conjugate Gradient for Least Squares (CGLS) Computation Cost: C_CGLS ≈ 2 n_i^2 m + K (2(k_i + 1) n_i^2 + 10 k_i n_i); Memory Cost: matrix-vector multiplication, only need to store matrix A_i Solver: CGLS seems to be reasonable at this point! Inverse Theory Inverse Problems Workshop 14 / 64
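
For reference, a compact CGLS sketch (the standard algorithm with hypothetical naming, not the exact code behind the reported costs) that only ever applies A and A^T, which is what keeps the per-subdomain memory footprint at the cost of storing A_i:

    import numpy as np

    def cgls(A, b, x0=None, tol=1e-8, maxit=200):
        """Conjugate Gradient for Least Squares: minimizes ||A x - b||_2 using only
        products with A and A^T, so only A itself has to be stored or applied."""
        n = A.shape[1]
        x = np.zeros(n) if x0 is None else x0.copy()
        r = b - A @ x                 # data-space residual
        s = A.T @ r                   # normal-equations residual
        p = s.copy()
        gamma = s @ s
        nrm = np.linalg.norm(A.T @ b)
        for _ in range(maxit):
            q = A @ p
            alpha = gamma / (q @ q)
            x = x + alpha * p
            r = r - alpha * q
            s = A.T @ r
            gamma_new = s @ s
            if np.sqrt(gamma_new) < tol * nrm:
                break
            p = s + (gamma_new / gamma) * p
            gamma = gamma_new
        return x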

31 Four Different Update Schemes Block Jacobi Update: set τ_i^(k+1) = τ_i = 1, so x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, ..., (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, ..., (y_p^(k))^T)^T. Inverse Theory Inverse Problems Workshop 15 / 64

32 Four Different Update Schemes Block Jacobi Update: set τ_i^(k+1) = τ_i = 1, so x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, ..., (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, ..., (y_p^(k))^T)^T. Convex Update: set τ_i^(k+1) = τ_i = 1/p, x^(k+1) = x^(k) + (1/p)(y^(k+1) − x^(k)). Inverse Theory Inverse Problems Workshop 15 / 64

33 Four Different Update Schemes Block Jacobi Update: set τ_i^(k+1) = τ_i = 1, so x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, ..., (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, ..., (y_p^(k))^T)^T. Convex Update: set τ_i^(k+1) = τ_i = 1/p, x^(k+1) = x^(k) + (1/p)(y^(k+1) − x^(k)). Local Update: x^(k+1) = (x_local)_i^(k+1). Inverse Theory Inverse Problems Workshop 15 / 64

34 Four Different Update Schemes Block Jacobi Update: set τ_i^(k+1) = τ_i = 1, so x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, ..., (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, ..., (y_p^(k))^T)^T. Convex Update: set τ_i^(k+1) = τ_i = 1/p, x^(k+1) = x^(k) + (1/p)(y^(k+1) − x^(k)). Local Update: x^(k+1) = (x_local)_i^(k+1). Optimal τ Update: min_τ ||r(x^(k+1))||, where r(x^(k+1)) = r(x^(k)) − Σ_{i=1}^p τ_i^(k+1) [A_i; λ_i L_i] δ_i^(k+1). Inverse Theory Inverse Problems Workshop 15 / 64
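
A small sketch of the convex update, assuming the local solutions y_i over the index sets splits[i] are already available (hypothetical names); with τ_i = 1/p it reduces to x + (1/p)(y − x) as stated above:

    def convex_update(x, y_blocks, splits):
        """Convex update with tau_i = 1/p: x^(k+1) = x^(k) + (1/p)(y^(k+1) - x^(k)),
        where y^(k+1) collects every local solution into its own block of x."""
        p = len(splits)
        y_full = x.copy()
        for idx, y_i in zip(splits, y_blocks):   # place each local block into a copy of x
            y_full[idx] = y_i
        return x + (y_full - x) / p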

35 A Second Look at the Subproblem The i-th block equation of the normal equations is given by (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i Σ_{j=1, j≠i}^p λ_j L_i^T L_j x_j. The system matrix is unchanged; Inverse Theory Inverse Problems Workshop 16 / 64

36 A Second Look at the Subproblem Rewrite the right hand side (RHS) as b_i(x^(k+1)) = b_i(x^(k)) − Σ_{j=1, j≠i}^p τ_j^(k+1) B_j^(k+1) + OB_i^(k+1), where B_j^(k+1) = A_j(x_j^(k+1) − x_j^(k)) and OB_i^(k+1) is the update of the overlapped problem. The new RHS is updated based on the previous RHS; Inverse Theory Inverse Problems Workshop 16 / 64

37 Linear System with Multiple RHS Multiple RHS Problem Setup AX = B = [b^(1), ..., b^(s)], where A ∈ R^{m×n}, X ∈ R^{n×s} and B ∈ R^{m×s}. Inverse Theory Inverse Problems Workshop 17 / 64

38 Linear System with Multiple RHS Multiple RHS Problem Setup AX = B = [b^(1), ..., b^(s)], where A ∈ R^{m×n}, X ∈ R^{n×s} and B ∈ R^{m×s}. What about solving the s systems independently? Direct Method (LU): major computational cost O(n^3) + s O(n^2); Iterative Method (CGLS): major computational cost s O(n^2); Inverse Theory Inverse Problems Workshop 17 / 64

39 Linear System with Multiple RHS Multiple RHS Problem Setup AX = B = [b^(1), ..., b^(s)], where A ∈ R^{m×n}, X ∈ R^{n×s} and B ∈ R^{m×s}. What about solving the s systems independently? Direct Method (LU): major computational cost O(n^3) + s O(n^2); Iterative Method (CGLS): major computational cost s O(n^2); Several algorithms have been proposed to speed up solving the above system. Inverse Theory Inverse Problems Workshop 17 / 64

40 Efficient Iterative Algorithm for Solving Multiple RHS If b^(1), ..., b^(s) are random or completely orthogonal to each other, there is no way to speed up the algorithm! Inverse Theory Inverse Problems Workshop 18 / 64

41 Efficient Iterative Algorithm for Solving Multiple RHS If b^(1), ..., b^(s) are random or completely orthogonal to each other, there is no way to speed up the algorithm! If b^(1), ..., b^(s) are close to each other and share information, Galerkin-projection based Krylov subspace methods provide an efficient solver. We choose CG as the specific Krylov subspace method, as proposed in [Chan and Wan, 1997], because it is well supported theoretically. Inverse Theory Inverse Problems Workshop 18 / 64

42 Basic Idea of Projected CG Algorithm for Solving MRHS Step 1: Solve the seed system. Select the seed; apply the CGLS algorithm to solve the seed system and save the conjugate directions for Step 2; Inverse Theory Inverse Problems Workshop 19 / 64

43 Basic Idea of Projected CG Algorithm for Solving MRHS Step 1: Solve the seed system. Select the seed; apply the CGLS algorithm to solve the seed system and save the conjugate directions for Step 2; Step 2: Solve the remaining systems by projection. Use the conjugate directions from the seed system to span the Krylov subspace; project the systems onto the Krylov subspace for an approximate solution. Inverse Theory Inverse Problems Workshop 19 / 64

44 Basic Idea of Projected CG Algorithm for Solving MRHS Step 1: Solve the seed system. Select the seed; apply the CGLS algorithm to solve the seed system and save the conjugate directions for Step 2; Step 2: Solve the remaining systems by projection. Use the conjugate directions from the seed system to span the Krylov subspace; project the systems onto the Krylov subspace for an approximate solution. Step 3: Restart the seed and refine the solution. A new seed might be selected from the remaining systems to restart the procedure; refine the approximation if the accuracy is insufficient. Inverse Theory Inverse Problems Workshop 19 / 64


46 Adapt to Our Problem: Solving Each Subproblem from RLSMS, Scheme 1 Scheme 1 (Project/Restart): Treat the i-th subproblem at all iterations as a multiple RHS problem with s = 2, i.e., at iteration k = 1, solve the subproblem as the seed; for iterations k > 1, solve it by projection and refine it as necessary; Inverse Theory Inverse Problems Workshop 20 / 64

47 Adapt to Our Problem: Solving Each Subproblem from RLSMS, Scheme 1 Scheme 1 (Project/Restart): Treat the i-th subproblem at all iterations as a multiple RHS problem with s = 2, i.e., at iteration k = 1, solve the subproblem as the seed; for iterations k > 1, solve it by projection and refine it as necessary; Comment: Straightforward adaptation, but usually the accuracy of the approximation remaining after projection is not enough; extra steps are needed for further refinement; Inverse Theory Inverse Problems Workshop 20 / 64

48 Adapt to Our Problem: Solving Each Subproblem from RLSMS, Scheme 2 Scheme 2 (Project/Augment): Based on seed selection as in Scheme 1; in the refinement stages of solving the remaining systems, store the newly generated conjugate directions with those from solving the seeds; Inverse Theory Inverse Problems Workshop 21 / 64

49 Adapt to Our Problem: Solving Each Subproblem from RLSMS, Scheme 2 Scheme 2 (Project/Augment): Based on seed selection as in Scheme 1; in the refinement stages of solving the remaining systems, store the newly generated conjugate directions with those from solving the seeds; Comment: The motivation of this change is again to improve the Krylov subspace. By updating the conjugate directions, we are able to add new information which does not come from solving the seed system. Numerical tests indicate this scheme works best on large scale problems! Inverse Theory Inverse Problems Workshop 21 / 64

50 Computational Cost Analysis The computational cost of projected CGLS consists of two components: one is the cost of solving the seed system, which is the cost of CGLS; the other is the cost of solving the remaining systems, which includes the projection and the few extra steps of CGLS as needed: C_ProjCG ≈ 2(K + 2k_i + 2) m n_i + (6(K − 1) k_i + 10 k_i + 1) n_i + µ_CG. Inverse Theory Inverse Problems Workshop 22 / 64

51 1-D Phillips, Restoration Result Figure: Restoration results using different methods for the Phillips sample of size , noise level 6%. Inverse Theory Inverse Problems Workshop 23 / 64

52 1-D Phillips, Global and Local Iteration Result Figure: Maximum numbers of inner iterations against outer iteration count. Inverse Theory Inverse Problems Workshop 24 / 64

53 2-D Seismic Reconstruction, Phantom Size Figure: SNR (Without Splitting): dB, SNR (CGLS/Zero): dB, SNR (Project/Restart): dB, SNR (Project/Augment): dB. Inverse Theory Inverse Problems Workshop 25 / 64

54 2-D Seismic Reconstruction, Global and Local Iteration Result Figure: Maximum numbers of inner iterations against outer iteration count. Inverse Theory Inverse Problems Workshop 26 / 64

55 Outline 1 Introduction: Inverse Problem and Regularization 2 Topic 1: Multisplitting for Regularized Least Squares Regularization Parallelization Multiple Right Hand Side Problem 3 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Optimal Parameter Selection, UPRE Approach 4 Topic 3: Projected Krylov Subspace Solvers on GPU Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU Optimizations of Krylov Subspace Algorithms on GPU Numerical Results 5 References Inverse Theory Inverse Problems Workshop 27 / 64

56 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Iteratively Reweighted Norm Algorithm Lagged Diffusivity Fixed Point Iteration Algorithm Optimal Regularization Parameter Selection, Trace Approximation UPRE Approach Numerical Results Inverse Theory Inverse Problems Workshop 27 / 64

57 Iteratively Reweighted Norm Algorithm Approximate the L1 norm by a weighted L2 norm; TV regularization becomes: min_f { (1/2) ||Af − b||_2^2 + λ^2 ||W_r^{1/2} D f||_2^2 }, i.e., min_f { (1/2) || [I 0; 0 W_r^{1/2}] ( [A; λD] f − [b; 0] ) ||_2^2 }, where W_r = diag( (2((D_x f)^2 + (D_y f)^2)^{1/2})^{-1} ). This minimization problem can be solved by a CG solver. Ref: B. Wohlberg, P. Rodríguez, An Iteratively Reweighted Norm Algorithm for Minimization of Total Variation Functionals, IEEE Signal Processing Letters, 2007 Inverse Theory Inverse Problems Workshop 28 / 64
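
The idea can be sketched for 1-D denoising (A = I) as follows; this is an illustrative IRLS-style loop with hypothetical names and an eps safeguard, not the exact weighting, scaling, or stopping rule of the cited IRN paper:

    import numpy as np

    def irn_tv_denoise_1d(b, lam, iters=30, eps=1e-6):
        """IRLS/IRN-style loop for 1-D TV denoising (A = I): the L1 term ||D f||_1 is
        replaced at each outer iteration by ||W^{1/2} D f||_2^2 with
        W = diag(1 / max(|D f|, eps)), and the resulting quadratic problem is solved."""
        b = np.asarray(b, dtype=float)
        n = len(b)
        D = np.diff(np.eye(n), axis=0)                 # forward-difference operator, (n-1) x n
        f = b.copy()
        for _ in range(iters):
            w = 1.0 / np.maximum(np.abs(D @ f), eps)   # reweighting from the current iterate
            H = np.eye(n) + lam**2 * D.T @ (w[:, None] * D)
            f = np.linalg.solve(H, b)                  # normal equations of the weighted problem
        return f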

58 Lagged Diffusivity Fixed Point Iteration Algorithm Approximate the L1 norm by ψ(f) = ((df/dx)^2 + β^2)^{1/2}. Gradient = A^T (Af − b) + λ L(f) f, Approximate Hessian = A^T A + λ L(f), where L(f) = D^T diag(ψ'(f)) D. Based on the quasi-Newton idea, we solve for f iteratively by f_{k+1} = f_k − (A^T A + λ L(f_k))^{-1} Gradient, which is called the Lagged Diffusivity Fixed Point Iteration. Ref: C. R. Vogel, M. E. Oman, Iterative Methods for Total Variation Denoising, SIAM J. Sci. Comput., 17 (1996) Inverse Theory Inverse Problems Workshop 29 / 64
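
A minimal 1-D sketch of this fixed-point iteration, assuming a dense A and a forward-difference D (hypothetical names; the scaling of λ and the exact form of ψ follow the slide, not necessarily the original paper):

    import numpy as np

    def lagged_diffusivity(A, b, lam, beta=1e-3, iters=20):
        """Lagged diffusivity fixed point (1-D sketch): linearize the TV term through
        L(f) = D^T diag(1 / sqrt((D f)^2 + beta^2)) D and take the quasi-Newton step
        f <- f - (A^T A + lam L(f))^{-1} (A^T (A f - b) + lam L(f) f)."""
        n = A.shape[1]
        D = np.diff(np.eye(n), axis=0)
        f = A.T @ b                                    # simple starting guess
        for _ in range(iters):
            psi = 1.0 / np.sqrt((D @ f) ** 2 + beta**2)
            L = D.T @ (psi[:, None] * D)
            grad = A.T @ (A @ f - b) + lam * (L @ f)
            H = A.T @ A + lam * L                      # approximate Hessian
            f = f - np.linalg.solve(H, grad)
        return f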

59 Image Deblurring Comparison using TV and Tikhonov Figure: Left: Original Clean Image, Right: Blurred and Noisy Image Inverse Theory Inverse Problems Workshop 30 / 64

60 Image Deblurring Comparison using TV and Tikhonov Figure: Left: TK based Deblurred Result, Right: TV Based Deblurred Result Inverse Theory Inverse Problems Workshop 31 / 64

61 Problem Description - Simple Example (figure-only sequence, slides 61-73) Inverse Theory Inverse Problems Workshop 32 / 64

74 Fundamental Idea of UPRE Derived from the Mean Squared Error (MSE) of the predictive error: (1/n)||P_λ||^2 = (1/n)||A f_λ − A f_true||^2, where f_λ is the regularized solution and f_true is the true solution. To avoid using f_true, we define a functional which is an unbiased estimator of the above, and name it UPRE (Unbiased Predictive Risk Estimator): λ_opt = arg min_λ UPRE(λ). Inverse Theory Inverse Problems Workshop 33 / 64

75 UPRE for Tikhonov Vogel in [1] proved that, because the solution for Tikhonov regularization depends linearly on the right-hand side, i.e., f_TK = R_TK (b + η), where R_TK = (A^T A + λI)^{-1} A^T, we have UPRE_Tikhonov(λ) = E( (1/n)||P_λ||^2 ) = (1/n)||r_λ||^2 + (2σ^2/n) trace(A_λ) − σ^2, where r_λ = A f_λ − b, A_λ = A(A^T A + λI)^{-1} A^T, and σ^2 is the variance of the noise. Inverse Theory Inverse Problems Workshop 34 / 64
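
For small problems this UPRE curve can be evaluated directly from the SVD of A, since the filter factors s_i^2/(s_i^2 + λ) give both r_λ and trace(A_λ); a minimal sketch (hypothetical names, assumes the noise variance sigma2 is known):

    import numpy as np

    def upre_tikhonov(A, b, sigma2, lambdas):
        """UPRE(lam) = (1/n)||r_lam||^2 + (2 sigma^2 / n) trace(A_lam) - sigma^2 for the
        Tikhonov solution f_lam = (A^T A + lam I)^{-1} A^T b, evaluated through the SVD."""
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        n = len(b)
        beta = U.T @ b
        vals = []
        for lam in lambdas:
            phi = s**2 / (s**2 + lam)          # filter factors; trace(A_lam) = sum(phi)
            r = U @ (phi * beta) - b           # r_lam = A f_lam - b
            vals.append((r @ r) / n + 2.0 * sigma2 * phi.sum() / n - sigma2)
        return np.array(vals)

    # lam_opt = lambdas[np.argmin(upre_tikhonov(A, b, sigma2, lambdas))]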

76 UPRE for TV Difficulty 1: Nonquadratic penalty functional in the TV term. Hint: Linearization approximation! In the Lagged Diffusivity approach, f_TV = R_TV (b + η), where R_TV = (A^T A + λ^2 L(f_k))^{-1} A^T, and the influence matrix is A_λ = A(A^T A + λ^2 L(f_k))^{-1} A^T. A_λ can be proved to be symmetric, which gives us the same derivation of the UPRE functional for TV. Inverse Theory Inverse Problems Workshop 35 / 64

77 UPRE for TV Difficulty 2: How to compute trace(A_λ) = trace(A(A^T A + λ^2 L(f_k))^{-1} A^T)? Hint: Krylov space method! Basic Idea: trace(f(M)) ≈ E(u^T f(M) u), where u is a discrete multivariate random variable whose entries take the values −1 and +1 with probability 0.5, and the matrix M is symmetric positive definite (SPD). Inverse Theory Inverse Problems Workshop 36 / 64

78 UPRE for TV Golub [2] pointed out that E(u^T f(M) u) = Σ_{i=1}^k ω_i f(θ_i), where θ_i are the eigenvalues of M, and ω_i are the squares of the first components of the normalized eigenvectors of M. Start the Lanczos procedure on M, and find the most significant eigenvectors until the accuracy is satisfied. Inverse Theory Inverse Problems Workshop 37 / 64
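
The expectation E(u^T f(M) u) itself is cheap to sample; the sketch below shows only this Hutchinson-style sampling step with Rademacher vectors (hypothetical names), not the Lanczos/Gauss-quadrature evaluation used in the talk:

    import numpy as np

    def hutchinson_trace(apply_M, n, num_samples=20, rng=None):
        """Estimate trace(M) as the sample mean of u^T (M u) over Rademacher vectors u
        (entries +/-1 with probability 0.5); apply_M(u) must return M @ u, so M never
        has to be formed explicitly."""
        rng = np.random.default_rng() if rng is None else rng
        samples = []
        for _ in range(num_samples):
            u = rng.choice([-1.0, 1.0], size=n)
            samples.append(u @ apply_M(u))
        return float(np.mean(samples))

    # For trace(A_lam) = trace(A (A^T A + lam^2 L)^{-1} A^T), for example:
    # apply_M = lambda u: A @ np.linalg.solve(A.T @ A + lam**2 * L, A.T @ u)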

79 Trace Approximation Results Figure: Left: Accurate Trace Vs. Estimated Trace, Right: Time Cost for Computing Accurate Trace Vs. Time Cost for Computing Estimated Trace Inverse Theory Inverse Problems Workshop 38 / 64

80 TV Based Deblurring Using Optimal and Random Parameters Figure: Left: Use Optimal Parameter, Right: Use Random Parameter Inverse Theory Inverse Problems Workshop 39 / 64

81 More General Results for Deblurring Figure: Plot of Optimal SNR and Estimated SNR versus original noisy SNR on satellite image. Inverse Theory Inverse Problems Workshop 40 / 64

82 More General Results for Denoising Figure: Plot of Optimal SNR and Estimated SNR versus original noisy SNR on satellite image. Ref: G. Gilboa, N. Sochen, Y. Y. Zeevi, Estimation of optimal PDE-based denoising in the SNR sense, IEEE Transactions on Image Processing 15(8), Inverse Theory Inverse Problems Workshop 41 / 64

83 Conclusions Tikhonov Regularization: Pros: easy to implement, low cost; Cons: the result is usually smoothed out; Total Variation (TV) Regularization: Pros: preserves image edges given a reasonable parameter, piecewise-constant results; Cons: expensive cost Inverse Theory Inverse Problems Workshop 42 / 64

84 Outline 1 Introduction: Inverse Problem and Regularization 2 Topic 1: Multisplitting for Regularized Least Squares Regularization Parallelization Multiple Right Hand Side Problem 3 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Optimal Parameter Selection, UPRE Approach 4 Topic 3: Projected Krylov Subspace Solvers on GPU Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU Optimizations of Krylov Subspace Algorithms on GPU Numerical Results 5 References Inverse Theory Inverse Problems Workshop 43 / 64

85 Topic 3: Projected Krylov Subspace Solvers on GPU [Lin and Renaut, 2010] Problem Description and Krylov Subspace Solvers 3-D Medical Image Reconstruction Inverse Linear Systems with Multiple Right-hand Sides Conjugate Gradient Algorithm Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU GPU Structure BLAS Performance on GPU CG on GPU (ver 0) Optimizations of Krylov Subspace Algorithms on GPU Projected CG Modified Projected CG Numerical Results Inverse Theory Inverse Problems Workshop 43 / 64

86 3-D Medical Image Reconstruction Figure: Image Slices from a 3-D Scan Inverse Theory Inverse Problems Workshop 44 / 64

87 Inverse Linear Systems with Multiple Right-hand Sides Multiple Right-hand Sides System: AX = B = [b^(1), ..., b^(s)], where b^(i) stands for the i-th measurement with respect to the i-th slice, as in a 3-D medical image. Tikhonov regularization is applied to the above systems. The regularization parameter is assumed to be close enough to the optimal value. Inverse Theory Inverse Problems Workshop 45 / 64

88 Conjugate Gradient Algorithm k = 0, r_0 = b − A x_0; while ( ||r_k||_2 > TOL ||b||_2 ) k = k + 1; if (k == 1) p_1 = r_0; else β_k = <r_{k−1}, r_{k−1}> / <r_{k−2}, r_{k−2}>; p_k = r_{k−1} + β_k p_{k−1}; end; α_k = <r_{k−1}, r_{k−1}> / <p_k, A p_k>; x_k = x_{k−1} + α_k p_k; r_k = r_{k−1} − α_k A p_k; end (while). Inverse Theory Inverse Problems Workshop 46 / 64
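
A direct NumPy transcription of this pseudocode (the direction update is folded to the end of the loop, which is algebraically equivalent; names are hypothetical):

    import numpy as np

    def cg(A, b, x0=None, tol=1e-6, maxit=1000):
        """Conjugate gradient for SPD A, following the pseudocode above."""
        x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float)
        r = b - A @ x
        p = r.copy()
        rho = r @ r
        nb = np.linalg.norm(b)
        k = 0
        while np.sqrt(rho) > tol * nb and k < maxit:
            k += 1
            Ap = A @ p
            alpha = rho / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rho_new = r @ r
            p = r + (rho_new / rho) * p
            rho = rho_new
        return x, k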

89 GPU Structure - NVIDIA Tesla C1060 Figure: NVIDIA Tesla C1060 Inverse Theory Inverse Problems Workshop 47 / 64

90 GPU Structure - NVIDIA Tesla C1060 Streaming Multiprocessors (SM): 30 Streaming Processors (SP): 8 per SM Registers: 16 K per SM Shared Memory: 16 KB per SM Constant Memory: 64 KB Global Memory: 4 GB Bandwidth: 102 GB/Sec GFLOPS (PEAK) Single Precision: 933 Double Precision: 78 Figure: Structure of NVIDIA Tesla C1060 Inverse Theory Inverse Problems Workshop 48 / 64

91 BLAS Performance on GPU - Matrix Vector Multiplication Testing Environment Operating System: Ubuntu, 64 bit CPU (Host): Intel Quad Core Xeon E5506, 2.13 GHz GPU (Device): NVIDIA Tesla C1060 Programming on Host: Matlab using Single / Multi Core(s) Programming on Device: CUDA + CUBLAS, Matlab + Jacket Inverse Theory Inverse Problems Workshop 49 / 64

92 BLAS Performance on GPU - Matrix Vector Multiplication Testing matrices are selected from Matrix Market Benchmark Prob. Size bcsstm bus 1138 bcsstm bcsstk s1rmt3m bcsstk Inverse Theory Inverse Problems Workshop 50 / 64

93 BLAS Performance on GPU - Matrix Vector Multiplication Figure: Time Cost Figure: Speedup Ratio Inverse Theory Inverse Problems Workshop 51 / 64

94 BLAS Performance on GPU - Matrix Vector Multiplication Figure: Relative Error Inverse Theory Inverse Problems Workshop 52 / 64

95 BLAS Performance on GPU - BLAS 1, BLAS 2 and BLAS 3 Figure: GFLOPS Comparison Comment: BLAS 3 outperforms BLAS 1 and BLAS 2 significantly. Inverse Theory Inverse Problems Workshop 53 / 64

96 CG on GPU (ver 0) A synthetic problem setup using Chan's example [Chan and Wan, 1997]: A = diag(1, ..., n), n = , b_i(t) = sin(t + i t), i = 1, ..., n, and t = 2π/100. TOL = . Results: CG on Host: sec on a single core or sec on a multi-core. CG on Device: 5.49 sec. Inverse Theory Inverse Problems Workshop 54 / 64

97 CG on GPU (ver 0) - A closer look Comment: Mat-Vec is too expensive! Total Time: 5.49 sec; Mat-Vec: 3.98 sec (72.5%); Dot-Prod: 0.57 sec (10.4%); SAXPY: 0.16 sec (2.9%); Inverse Theory Inverse Problems Workshop 55 / 64

98 Projected CG (PrCG) - Algorithm By applying the Lanczos-Galerkin projection method to CG, we have PrCG [Chan and Wan, 1997]. i = 0; r_0^(q,1) = b^(q) − A x_0^(q,1); for i = 1 to k: α_i^(q,1) = <p_i^(1), r_{i−1}^(q,1)> / <p_i^(1), A p_i^(1)>; x_i^(q,1) = x_{i−1}^(q,1) + α_i^(q,1) p_i^(1); r_i^(q,1) = r_{i−1}^(q,1) − α_i^(q,1) A p_i^(1); end (for); % Restart the system if needed; Inverse Theory Inverse Problems Workshop 56 / 64
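
The projection stage amounts to sweeping once over the saved seed directions; a minimal sketch assuming the columns of P hold A-conjugate directions from the seed solve (hypothetical names):

    import numpy as np

    def project_onto_directions(A, b, P, x0=None):
        """Projection stage of PrCG: reuse the A-conjugate directions P = [p_1 ... p_k]
        from the seed solve to approximate A x = b for a new right-hand side."""
        n = A.shape[0]
        x = np.zeros(n) if x0 is None else x0.copy()
        r = b - A @ x
        for i in range(P.shape[1]):
            p = P[:, i]
            Ap = A @ p
            alpha = (p @ r) / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
        return x, r   # if ||r|| is still too large, refine with ordinary CG started from x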

99 Projected CG (PrCG) Total Time: 1.91 sec; Mat-Vec: 1.01 sec (52.8%); Dot-Prod: 0.45 sec (23.6%); SAXPY: 0.29 sec (15.2%); Comment: SAXPY and Dot-Prod are becoming relatively expensive. Inverse Theory Inverse Problems Workshop 57 / 64

100 Augmented Projected CG (APrCG) Using the orthogonality properties that <p_j^(1), A p_i^(1)> = 0, j ≠ i, and <p_j^(1), r_i^(q,1)> = 0, j < i, and taking the dot product with p_i^(1) on both sides of r_{k+1}^(q,1) = b^(q) − Σ_{j=1}^k α_j^(q,1) A p_j^(1), we can rewrite α_i as α_i^(q,1) = <p_i^(1), b^(q)> / <p_i^(1), A p_i^(1)>. Inverse Theory Inverse Problems Workshop 58 / 64

101 Augmented Projected CG (APrCG) - Algorithm Let P_k = [p_1, ..., p_k] and AP_k = [A p_1, ..., A p_k]. i = 0; r_0^(q,1) = b^(q) − A x_0^(q,1); α = <P_k, r_0^(q,1)> ./ diagvec(<P_k, AP_k>); Λ = diag(α); x^(q,1) = x_0^(q,1) + sum(P_k Λ); r^(q,1) = r_0^(q,1) − sum(AP_k Λ); Inverse Theory Inverse Problems Workshop 59 / 64
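
In matrix form the same projection becomes a handful of dense products, which is what moves the work from BLAS-1/2 toward BLAS-3 on the GPU; a minimal sketch assuming x_0 = 0 and A-conjugate columns in P (hypothetical names):

    import numpy as np

    def augmented_projection(A, b, P):
        """APrCG-style batched projection: all step lengths alpha_i = <p_i, b> / <p_i, A p_i>
        are formed at once, replacing the per-direction loop of PrCG by dense products."""
        AP = A @ P
        alpha = (P.T @ b) / np.einsum('ij,ij->j', P, AP)   # denominators: diag(P^T A P)
        x = P @ alpha
        r = b - AP @ alpha
        return x, r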

102 Augmented Projected CG (APrCG) Figure: PrCG Figure: APrCG Inverse Theory Inverse Problems Workshop 60 / 64

103 Numerical Test / 3D Shepp-Logan Phantom Figure: Slice Show of A 3D Shepp-Logan Phantom Inverse Theory Inverse Problems Workshop 61 / 64

104 3D Shepp-Logan Phantom - Results Table: Reconstruction of four consecutive slices from 65 to 68; the 64th slice is selected as the seed. Columns: Slice, then Cost and SNR for each of CG, PrCG, and APrCG, and ImpRate (%) (numerical entries not preserved). Inverse Theory Inverse Problems Workshop 62 / 64

105 3D Shepp-Logan Phantom - Results Figure: Reconstruction Results on Host and Device, slice index = 66. Inverse Theory Inverse Problems Workshop 63 / 64

106 Outline 1 Introduction: Inverse Problem and Regularization 2 Topic 1: Multisplitting for Regularized Least Squares Regularization Parallelization Multiple Right Hand Side Problem 3 Topic 2: Total-Variation Regularization Numerical Methods for Total Variation Optimal Parameter Selection, UPRE Approach 4 Topic 3: Projected Krylov Subspace Solvers on GPU Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU Optimizations of Krylov Subspace Algorithms on GPU Numerical Results 5 References Inverse Theory Inverse Problems Workshop 64 / 64

107 References 1. R. A. Renaut, Y. Lin, and H. Guo. Solving Least Squares Problems by Using a Regularized Multisplitting Method. Numerical Linear Algebra with Applications, vol. 19, issue 4. 2. Y. Lin, B. Wohlberg, and H. Guo. UPRE Method for Total Variation Parameter Selection. Signal Processing, vol. 90, no. 8. Inverse Theory Inverse Problems Workshop 64 / 64


More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

A Dual Formulation of the TV-Stokes Algorithm for Image Denoising

A Dual Formulation of the TV-Stokes Algorithm for Image Denoising A Dual Formulation of the TV-Stokes Algorithm for Image Denoising Christoffer A. Elo, Alexander Malyshev, and Talal Rahman Department of Mathematics, University of Bergen, Johannes Bruns gate 12, 5007

More information

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning An Accelerated Block-Parallel Newton Method via Overlapped Partitioning Yurong Chen Lab. of Parallel Computing, Institute of Software, CAS (http://www.rdcps.ac.cn/~ychen/english.htm) Summary. This paper

More information

GPU accelerated Arnoldi solver for small batched matrix

GPU accelerated Arnoldi solver for small batched matrix 15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

A Limited Memory, Quasi-Newton Preconditioner. for Nonnegatively Constrained Image. Reconstruction

A Limited Memory, Quasi-Newton Preconditioner. for Nonnegatively Constrained Image. Reconstruction A Limited Memory, Quasi-Newton Preconditioner for Nonnegatively Constrained Image Reconstruction Johnathan M. Bardsley Department of Mathematical Sciences, The University of Montana, Missoula, MT 59812-864

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 21: Sensitivity of Eigenvalues and Eigenvectors; Conjugate Gradient Method Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis

More information

Conjugate gradient acceleration of non-linear smoothing filters Iterated edge-preserving smoothing

Conjugate gradient acceleration of non-linear smoothing filters Iterated edge-preserving smoothing Cambridge, Massachusetts Conjugate gradient acceleration of non-linear smoothing filters Iterated edge-preserving smoothing Andrew Knyazev (knyazev@merl.com) (speaker) Alexander Malyshev (malyshev@merl.com)

More information

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

Iterative Methods for Linear Systems of Equations

Iterative Methods for Linear Systems of Equations Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations

Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations Hybrid Analog-Digital Solution of Nonlinear Partial Differential Equations Yipeng Huang, Ning Guo, Kyle Mandli, Mingoo Seok, Yannis Tsividis, Simha Sethumadhavan Columbia University Hybrid Analog-Digital

More information

PARALLEL VARIABLE DISTRIBUTION FOR TOTAL LEAST SQUARES

PARALLEL VARIABLE DISTRIBUTION FOR TOTAL LEAST SQUARES PARALLEL VARIABLE DISTRIBUTION FOR TOTAL LEAST SQUARES HONGBIN GUO AND ROSEMARY A. RENAUT Abstract. A novel parallel method for determining an approximate total least squares (TLS) solution is introduced.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

MARCH 24-27, 2014 SAN JOSE, CA

MARCH 24-27, 2014 SAN JOSE, CA MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Numerical Methods for Solving Large Scale Eigenvalue Problems

Numerical Methods for Solving Large Scale Eigenvalue Problems Peter Arbenz Computer Science Department, ETH Zürich E-mail: arbenz@inf.ethz.ch arge scale eigenvalue problems, Lecture 2, February 28, 2018 1/46 Numerical Methods for Solving Large Scale Eigenvalue Problems

More information

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian FE661 - Statistical Methods for Financial Engineering 2. Linear algebra Jitkomut Songsiri matrices and vectors linear equations range and nullspace of matrices function of vectors, gradient and Hessian

More information

Seismic imaging and optimal transport

Seismic imaging and optimal transport Seismic imaging and optimal transport Bjorn Engquist In collaboration with Brittany Froese, Sergey Fomel and Yunan Yang Brenier60, Calculus of Variations and Optimal Transportation, Paris, January 10-13,

More information

Projection methods to solve SDP

Projection methods to solve SDP Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method

More information