On the regularization properties of some spectral gradient methods


1 On the regularization properties of some spectral gradient methods
Daniela di Serafino, Department of Mathematics and Physics, Second University of Naples
With contributions from R. De Asmundis, G. Landi, W.W. Hager, G. Toraldo, M. Viola, H. Zhang
PING (Inverse Problems in Geophysics) GNCS Project Opening Workshop, Florence, April 6, 2016

2 Outline
1 Linear discrete inverse problems and gradient methods
2 Recent spectral gradient methods for QP: SDA and SDC
3 Regularization properties of SDA and SDC
4 Extension to bound-constrained QP
5 Possible applications in solving nonlinear inverse problems

4 Linear discrete inverse problems and gradient methods
Linear discrete inverse problem

$b = Ax + n$,  $A \in \mathbb{R}^{p \times n}$, $n \in \mathbb{R}^p$, $x \in \mathbb{R}^n$, $p \geq n$

$A$ and $b$: known data, $A$ ill-conditioned, with singular values decaying to zero, and full rank
$n$: unknown, representing perturbations in the data
$x$: unknown, representing the object to be recovered

Reformulation as linear least squares problem: $\min_{x \in \mathbb{R}^n} \|b - Ax\|_2^2$

SVD: $A = U \Sigma V^T$, $U = [u_1, \ldots, u_p] \in \mathbb{R}^{p \times p}$, $V = [v_1, \ldots, v_n] \in \mathbb{R}^{n \times n}$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{p \times n}$

Exact least squares solution:
$x_{LS} = A^{\dagger} b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i = x_{\mathrm{true}} + \sum_{i=1}^{n} \frac{u_i^T n}{\sigma_i} v_i$

useless, because the noise is amplified!
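A minimal NumPy sketch (with synthetic, purely illustrative data) of the exact least-squares solution computed through the SVD, showing how the components associated with small singular values amplify the noise:

```python
# Naive least-squares solution of b = A x + n via the SVD,
# x = sum_i (u_i^T b / sigma_i) v_i, illustrating noise amplification.
import numpy as np

def naive_svd_solution(A, b):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ b) / s          # u_i^T b / sigma_i
    return Vt.T @ coeffs

# Toy ill-conditioned example (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n = 50
sigma = 10.0 ** np.linspace(0, -10, n)           # rapidly decaying singular values
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T
x_true = np.ones(n)
b = A @ x_true + 1e-3 * rng.standard_normal(n)   # noisy data

x_ls = naive_svd_solution(A, b)
print(np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true))  # huge relative error
```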

6 Linear discrete inverse problems and gradient methods
Filter factors and iterative regularization

Regularization by filter factors: $x_{\mathrm{reg}} = \sum_{i=1}^{n} \phi_i \frac{u_i^T b}{\sigma_i} v_i$

choose $\phi_i \approx 1$ to preserve the components of the solution corresponding to large $\sigma_i$'s, and $\phi_i \approx 0$ to filter out the components corresponding to small $\sigma_i$'s

Iterative regularization methods, with a suitable early stop, can provide useful regularized solutions $x_{\mathrm{reg}}$.

Widely investigated classical iterative methods (see, e.g., [Hanke 95; Engl, Hanke & Neubauer 96; Nagy & Palmer 05]):
- Landweber and Steepest Descent (SD): very slow but stable convergence, rarely used in practice unless they are coupled with ad hoc preconditioners
- CG (CGLS, LSQR): fast in reducing the error, but too sensitive to stopping criteria (an early or late stopping may significantly deteriorate the solution)
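A minimal NumPy sketch of regularization by filter factors; the TSVD-like choice of the factors below is only one simple example, not the one discussed in the talk:

```python
# Regularized solution x_reg = sum_i phi_i (u_i^T b / sigma_i) v_i.
import numpy as np

def filtered_solution(A, b, phi):
    """phi: filter factors, phi_i ~ 1 for large sigma_i and ~ 0 for small ones."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ (phi * (U.T @ b) / s)

def tsvd_filters(A, k):
    """A simple choice of phi: keep only the k largest singular components."""
    n = min(A.shape)
    phi = np.zeros(n)
    phi[:k] = 1.0
    return phi
```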

8 Linear discrete inverse problems and gradient methods
Gradient methods for convex quadratic problems

QP: $\min_{x \in \mathbb{R}^n} f(x) \equiv \frac{1}{2} x^T Q x - c^T x$

General framework:
  choose $x_0 \in \mathbb{R}^n$; $k = 0$
  while (not stop cond) do
    $g_k = Q x_k - c$
    compute a suitable steplength $\alpha_k$
    $x_{k+1} = x_k - \alpha_k g_k$
    $k = k + 1$
  end while

- old origins [Cauchy 1847; Akaike 1959; Forsythe 1968]; long considered bad and ineffective because of slow convergence rate and oscillatory behaviour
- starting from [Barzilai & Borwein 88], several more efficient gradient methods have been developed, with steplengths related to Hessian spectral properties [Friedlander, Martínez, Molina & Raydan 99; Dai & Yuan 03, 05; Fletcher 05, 12; Dai, Hager, Schittkowski & Zhang 06; Yuan 06, 08; Frassoldati, Zanni & Zanghirati 08; De Asmundis, di Serafino, Riccio & Toraldo 13; De Asmundis, di Serafino, Hager, Toraldo & Zhang 14; Gonzaga & Schneider 15]
- interest in the use of the new gradient methods as regularization methods [Ascher, van den Doel, Huang & Svaiter 09; Cornelio, Porta, Prato & Zanni 13; De Asmundis, di Serafino & Landi 16]
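A minimal NumPy sketch of the general gradient framework above for $f(x) = \frac{1}{2} x^T Q x - c^T x$; the steplength rule is passed in as a callable, and the Cauchy (exact line search) steplength is shown as one possible choice:

```python
# General gradient method for QP: x_{k+1} = x_k - alpha_k g_k, g_k = Q x_k - c.
import numpy as np

def gradient_method(Q, c, steplength, x0, tol=1e-6, maxit=1000):
    x = x0.copy()
    for k in range(maxit):
        g = Q @ x - c                  # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            break
        alpha = steplength(k, x, g)    # e.g. Cauchy, BB, SDA/SDC rules
        x = x - alpha * g
    return x

def cauchy_step(Q):
    """Exact line search (Cauchy) steplength: g^T g / (g^T Q g)."""
    return lambda k, x, g: (g @ g) / (g @ (Q @ g))
```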

10 Linear discrete inverse problems and gradient methods
Analysis of gradient methods (for linear least squares)

$g_k = A^T (A x_k - b)$,  $k = 0, 1, 2, \ldots$

Write $g_k$ in terms of the SVD of $A$: if $g_0 = \sum_{i=1}^{n} \mu_i^0 v_i$, then

$g_k = \sum_{i=1}^{n} \mu_i^k v_i$,  $\mu_i^k = \mu_i^0 \prod_{j=0}^{k-1} (1 - \alpha_j \sigma_i^2)$

- if at the $k$-th iteration $\mu_i^k = 0$ for some $i$, then $\mu_i^l = 0$ for $l > k$
- $\mu_i^k = 0$ iff $\mu_i^0 = 0$ or $\alpha_j = 1/\sigma_i^2$ for some $j < k$
- $\alpha_k \approx 1/\sigma_i^2$ implies $|\mu_i^{k+1}| \ll |\mu_i^k|$; moreover $|\mu_r^{k+1}| < |\mu_r^k|$ if $r > i$, and $|\mu_r^{k+1}| > |\mu_r^k|$ if $r < i$ and $\sigma_r^2 > 2 \sigma_i^2$

Non-restrictive assumptions: $\sigma_1 > \sigma_2 > \cdots > \sigma_n > 0$, $\mu_1^0 \neq 0$, $\mu_n^0 \neq 0$
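A small NumPy check of the recursion above: after one step with $\alpha_k = 1/\sigma_i^2$, the gradient component along $v_i$ is annihilated (synthetic data, for illustration only):

```python
# Verify mu_i^{k+1} = (1 - alpha_k sigma_i^2) mu_i^k for one gradient step.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
x = rng.standard_normal(20)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

g = A.T @ (A @ x - b)
mu = Vt @ g                              # components of g along the v_i
alpha = 1.0 / s[3] ** 2                  # aim at the 4th singular value
x_new = x - alpha * g
mu_new = Vt @ (A.T @ (A @ x_new - b))

print(np.allclose(mu_new, (1 - alpha * s**2) * mu))   # True (up to rounding)
print(abs(mu_new[3]))                                  # ~ 0: that component is removed
```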

11 Recent spectral gradient methods for QP: SDA and SDC
A framework for building fast gradient methods

A new steplength selection rule:

$\alpha_k = \begin{cases} \alpha_k^{SD} & \text{if } \mathrm{mod}(k, h+m) < h \\ \bar{\alpha}_s & \text{otherwise, with } s = \max\{i \leq k : \mathrm{mod}(i, h+m) = h\} \end{cases}$,  $h \geq 2$

$\alpha_k^{SD}$: classical (Cauchy) steplength, i.e. the exact line search step $g_k^T g_k / (g_k^T Q g_k)$
$\bar{\alpha}_s$: special steplength with spectral properties

In other words: make $h$ consecutive exact line searches and then compute a different steplength, to be kept constant and applied in $m$ consecutive gradient iterations.
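A minimal sketch of the $h/m$ alternation rule; how the Cauchy steplengths are stored (here, a list indexed by iteration) and the concrete special steplength are assumptions of this sketch:

```python
# h consecutive Cauchy steps, then a special "spectral" steplength kept
# constant for m iterations; the special rule (SDA, SDC, ...) is a callable.
def alternated_steplength(k, h, m, cauchy_steps, special_step):
    """cauchy_steps[j]: the Cauchy steplength used at iteration j (j <= k)."""
    if k % (h + m) < h:
        return cauchy_steps[k]                        # Cauchy phase
    s = max(i for i in range(k + 1) if i % (h + m) == h)
    return special_step(s)                            # constant special steplength
```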

13 Recent spectral gradient methods for QP: SDA and SDC
SDA method [De Asmundis, di Serafino, Riccio & Toraldo 13]

Set $\bar{\alpha}_s = \alpha_s^A$, where  $\alpha_s^A = \left( \dfrac{1}{\alpha_{s-1}^{SD}} + \dfrac{1}{\alpha_s^{SD}} \right)^{-1}$

Theorem. Let $\{x_k\}$ be the sequence of iterates generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then

$\lim_{k \to \infty} \alpha_k^A = \dfrac{1}{\sigma_1^2 + \sigma_n^2}$

SDA (SD with Alignment) combines
- the tendency of SD to choose its search direction in $\mathrm{span}\{v_1, v_n\}$
- the tendency of the gradient method with $\alpha_k = 1/(\sigma_1^2 + \sigma_n^2)$ to align the search direction with $v_n$

R-linear convergence, but significant improvement of practical convergence speed over SD.
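A hedged sketch of the constant SDA-type steplength; the formula below is a reconstruction consistent with the limit stated above (a harmonic-mean-like combination of two consecutive Cauchy steplengths), and the original paper should be consulted for the authoritative definition:

```python
# Reconstructed SDA-type constant steplength (assumption, see lead-in above):
# alpha_s^A = (1/alpha_{s-1}^SD + 1/alpha_s^SD)^(-1) -> 1/(sigma_1^2 + sigma_n^2).
def sda_steplength(alpha_sd_prev, alpha_sd_curr):
    return 1.0 / (1.0 / alpha_sd_prev + 1.0 / alpha_sd_curr)
```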

15 Recent spectral gradient methods for QP: SDA and SDC
SDC method [De Asmundis, di Serafino, Hager, Toraldo & Zhang 14]

Set $\bar{\alpha}_s$ equal to the Yuan steplength [Yuan 06]:

$\alpha_s^Y = \dfrac{2}{\sqrt{\left( \dfrac{1}{\alpha_{s-1}^{SD}} - \dfrac{1}{\alpha_s^{SD}} \right)^2 + \dfrac{4 \|g_s\|^2}{\left( \alpha_{s-1}^{SD} \|g_{s-1}\| \right)^2}} + \dfrac{1}{\alpha_{s-1}^{SD}} + \dfrac{1}{\alpha_s^{SD}}}$

Theorem. Let $\{x_k\}$ be the sequence generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then $\lim_{k \to \infty} \alpha_k^Y = \dfrac{1}{\sigma_1^2}$.

SDC (SD with Constant Yuan steps)
- uses a finite sequence of Cauchy steps in order to force the search in $\mathrm{span}\{v_1, v_n\}$ and to get a suitable approximation of $1/\sigma_1^2$
- applies this approximation in multiple steps in order to drive toward zero the component of the gradient along $v_1$

R-linear convergence, but significant improvement of practical convergence speed over SD.
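A sketch of the Yuan steplength as reconstructed above, from two consecutive Cauchy steplengths and the corresponding gradient norms; see [Yuan 06] and [De Asmundis, di Serafino, Hager, Toraldo & Zhang 14] for the authoritative form:

```python
# Yuan steplength used by SDC (as reconstructed in the formula above).
import numpy as np

def yuan_steplength(alpha_prev, alpha_curr, g_norm_prev, g_norm_curr):
    a, b = 1.0 / alpha_prev, 1.0 / alpha_curr
    root = np.sqrt((a - b) ** 2 + 4.0 * g_norm_curr ** 2 / (alpha_prev * g_norm_prev) ** 2)
    return 2.0 / (root + a + b)
```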

17 Recent spectral gradient methods for QP: SDA and SDC
Some remarks

If $\sigma_1 \gg \sigma_n$, then $1/(\sigma_1^2 + \sigma_n^2) \approx 1/\sigma_1^2$, and SDA fosters the elimination of the component of $g_k$ corresponding to $\sigma_1$.

In the ideal case where the component of $g_k$ along $v_1$ is completely removed, the problem size decreases by 1, and SDA and SDC tend to drive toward zero the component of $g_k$ along $v_2$. The same reasoning applies to $v_i$ for $i > 2$.

In general SDA and SDC are non-monotone. However,
- for small values of $m$, such as $m = 2, 3, 4$, SDA and SDC show monotonicity in practice if $h$ is sufficiently large
- when very low accuracy is required, as in the regularization of ill-posed inverse problems, $h$ is not required to be too large ($h = 2, 3$ and $m = 2$ is a good combination)

18 Regularization properties of SDA and SDC
Filter factors of SDA and SDC [De Asmundis, di Serafino & Landi 16]

Filter factors of gradient methods (with $x_0 = 0$):

$x_{k+1} = \sum_{i=1}^{n} \underbrace{\left( 1 - \prod_{r=0}^{k} (1 - \alpha_r \sigma_i^2) \right)}_{\phi_i^{k+1}} \frac{u_i^T b}{\sigma_i} v_i$

- The better $\alpha_r$ approximates $1/\sigma_i^2$ for some $r$, the closer $\phi_i^{k+1}$ will be to 1; multiple values of $\alpha_r$ close to $1/\sigma_i^2$ push $\phi_i^{k+1}$ to quickly go toward 1
- If $1/\alpha_r \gg \sigma_i^2$ for all $r$, then $\phi_i^k \approx 0$

The tendency of SDA and SDC to push toward zero the components of the gradient, according to the decreasing order of the singular values, translates into the approximation of the most significant components of the solution.
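A minimal NumPy sketch computing the filter factors $\phi_i^{k+1}$ of a generic gradient method from the singular values and the steplength history, as in the formula above:

```python
# phi_i^{k+1} = 1 - prod_{r=0..k} (1 - alpha_r * sigma_i^2), for x_0 = 0.
import numpy as np

def gradient_filter_factors(sigma, alphas):
    """sigma: singular values of A; alphas: steplengths alpha_0, ..., alpha_k."""
    sigma = np.asarray(sigma, dtype=float)
    prod = np.ones_like(sigma)
    for alpha in alphas:
        prod *= (1.0 - alpha * sigma ** 2)
    return 1.0 - prod
```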

19 Regularization properties of SDA and SDC
Comparison of filter factors

heat problem from Regularization Tools [Hansen 94], size(A) = 64 x 64, cond(A) ~ 10^28, Gaussian noise, noise level 0.01, SDA/SDC with h = 3 and m = 2

[Plots of the filter factors phi_i at k = 5 and k = 10, comparing SDA, SDC, CG (left) and SDA, SDC, DY, BB (right)]

21 Regularization properties of SDA and SDC
Comparison of filter factors (cont'd) and relative errors

[Plots of the filter factors phi_i at k = 40 (SDA, SDC, CG and SDA, SDC, DY, BB) and of the relative error versus iteration (log scale) for the heat problem]

23 Regularization properties of SDA and SDC
Experiments on image restoration problems: paralleltomo

parallel-beam tomography, AIR Tools [Hansen & Saxild-Hansen 12], image size = 50 x 50, 36 angles (0-179 degrees), 75 parallel rays, cond(A) ~ 10^15, h = 3, m = 2

[Relative error versus iteration (log scale) for SDA, SDC, DY, CG, with noise levels nl = 0.025 and nl = 0.075; reconstructed images: original, SDA, CG]

24 Regularization properties of SDA and SDC
Experiments on image restoration problems: peppers

image deblurring problem, image size = ..., Gaussian PSF, noise level nl = 0.01, cond(A) ~ 10^18, h = 3, m = 2

[Relative error versus iteration (log scale) for SDA, SDC, DY, CG; images: original, noisy & blurred, SDA and CG restorations]

26 Extension to bound-constrained QP
Extending SDA and SDC to bound-constrained problems

BCQP: minimize $f(x) = \frac{1}{2} x^T Q x - c^T x$  s.t. $x \in \Omega$,  $\Omega = \{x : l \leq x \leq u\}$

$Q \in \mathbb{R}^{n \times n}$ symmetric (positive definite), $l \in \{\mathbb{R} \cup \{-\infty\}\}^n$, $u \in \{\mathbb{R} \cup \{+\infty\}\}^n$

Gradient Projection (GP) methods [Goldstein, 1964; Levitin & Polyak, 1966; Calamai & Moré, 1987]

General framework:
  $x_0 \in \mathbb{R}^n$; $k = 0$
  while (not stop cond) do
    $g_k = Q x_k - c$
    compute a suitable steplength $\alpha_k$
    $x_{k+1} = P_\Omega(x_k - \alpha_k g_k)$
    $k = k + 1$
  end while

$P_\Omega(x) = \mathrm{argmin}\{\|x - z\| : z \in \Omega\}$

The spectral properties of SDA and SDC are not preserved!
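A minimal NumPy sketch of the gradient projection framework above for box constraints; the steplength rule is again a callable supplied by the caller:

```python
# Gradient projection for BCQP with box constraints l <= x <= u.
import numpy as np

def project_box(x, l, u):
    return np.minimum(np.maximum(x, l), u)

def gradient_projection(Q, c, l, u, steplength, x0, maxit=1000, tol=1e-6):
    x = project_box(x0, l, u)
    for k in range(maxit):
        g = Q @ x - c
        alpha = steplength(k, x, g)
        x_new = project_box(x - alpha * g, l, u)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```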

28 Extension to bound-constrained QP
Two-phase GP algorithm: basics

active set at $x$: $A(x) = \{i : x_i = l_i \text{ or } x_i = u_i\}$

projected gradient at $x \in \Omega$:
$(\nabla_\Omega f(x))_i = \begin{cases} \nabla_i f(x), & x_i \in (l_i, u_i) \\ \min\{\nabla_i f(x), 0\}, & x_i = l_i \\ \max\{\nabla_i f(x), 0\}, & x_i = u_i \end{cases}$

binding set at $x$: $B(x) = \{i : (x_i = l_i \text{ and } \nabla_i f(x) \geq 0) \text{ or } (x_i = u_i \text{ and } \nabla_i f(x) \leq 0)\}$

$x^{\ast}$ nondegenerate stationary point: $\nabla_i f(x^{\ast}) \neq 0$ for all $i \in A(x^{\ast})$

Identification of the active constraints at the solution [Calamai & Moré, 1987]: if $\{x_k\}$ converges to a nondegenerate $x^{\ast} \in \Omega$ and $\{\|\nabla_\Omega f(x_k)\|\}$ converges to 0, then $A(x_k) = A(x^{\ast})$ for all sufficiently large $k$.

Basic idea (as in [Moré & Toraldo, 1991]):
- use a GP method to select a candidate active set
- use SDA/SDC to explore the face of $\Omega$ identified by GP (unconstrained subproblem)
so as to preserve the spectral properties of the new gradient methods.
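A minimal NumPy sketch of the active set, binding set and projected gradient for box constraints, matching the definitions above (exact equality with the bounds is assumed, as in the slide):

```python
# Active set, binding set and projected gradient for box constraints.
import numpy as np

def active_set(x, l, u):
    return np.where((x == l) | (x == u))[0]

def binding_set(x, g, l, u):
    # g = grad f(x); binding: at a bound with the gradient pointing out of the box
    return np.where(((x == l) & (g >= 0)) | ((x == u) & (g <= 0)))[0]

def projected_gradient(x, g, l, u):
    pg = g.copy()
    pg[x == l] = np.minimum(g[x == l], 0.0)
    pg[x == u] = np.maximum(g[x == u], 0.0)
    return pg
```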

29 Extension to bound-constrained QP
Two-phase GP algorithm [di Serafino, Toraldo, Viola, work in progress]

Sketch of the algorithm:
  $x_0 \in \mathbb{R}^n$; $k = 0$
  while (not stop cond) do
    apply a GP method to BCQP: starting from $y_0 = x_k$, generate $\{y_j\}$ until cond1 is satisfied
    $\bar{x}_k = y_{j_k}$, where $y_{j_k}$ is the last $y_j$
    apply SDA/SDC to $\min\{f_k(d) \equiv f(\bar{x}_k + d) : d_i = 0 \ \forall\, i \in A(\bar{x}_k)\}$: starting from $d_0 = 0$, generate $\{d_j\}$ until cond2 is satisfied
    $x_{k+1} = P_\Omega(\bar{x}_k + \alpha_k d_{r_k})$, with $d_{r_k}$ the last $d_j$ and $\alpha_k$ computed by a projected search
    if $A(x_{k+1}) = B(x_{k+1})$, then continue with SDA/SDC
  end while

cond1: $A(y_j) = A(y_{j-1})$ or $f(y_{j-1}) - f(y_j) \leq \eta_2 \max\{f(y_{l-1}) - f(y_l),\ 1 \leq l < j\}$
cond2: $f_k(d_{j-1}) - f_k(d_j) \leq \eta_1 \max\{f_k(d_{l-1}) - f_k(d_l),\ 1 \leq l < j\}$

30 Extension to bound-constrained QP
Two-phase GP algorithm: convergence

Projected search along $-\nabla f(\bar{x}_k)$ and $d_k$: generate a sequence of trial steplengths such that

$\alpha_k^{(l+1)} \in \left[ \gamma_1 \alpha_k^{(l)},\ \gamma_2 \alpha_k^{(l)} \right]$,  $0 < \gamma_1 < \gamma_2 < 1$,  $\alpha_k^{(0)} > 0$,

and set $\alpha_k = \alpha_k^{(r)}$, the first trial steplength satisfying an Armijo-like condition for $f$.

Convergence: if $Q$ is symmetric positive definite and $x^{\ast}$ is the solution of BCQP, then any sequence $\{x_k\}$ generated by the two-phase GP algorithm is such that either $x_k = x^{\ast}$ after a finite number of iterations or $x_k \to x^{\ast}$.
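A hedged sketch of a projected backtracking search with an Armijo-like acceptance test; the exact sufficient-decrease condition used by the authors may differ from the standard one adopted here:

```python
# Projected backtracking search: reduce alpha until an Armijo-like condition
# holds at the projected trial point.
import numpy as np

def projected_search(f, g, x, d, project, alpha0=1.0, gamma=0.5, eta=1e-4, maxtry=50):
    """g = grad f(x); returns a steplength and the corresponding projected point."""
    fx = f(x)
    alpha = alpha0
    for _ in range(maxtry):
        x_trial = project(x + alpha * d)
        # Armijo-like sufficient decrease at the projected trial point
        if f(x_trial) <= fx + eta * np.dot(g, x_trial - x):
            return alpha, x_trial
        alpha *= gamma   # reduction factor, playing the role of [gamma1, gamma2]
    return alpha, project(x + alpha * d)
```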

31 Extension to bound-constrained QP
Some numerical experiments

random $Q$ with $n = 10^4$ and varying $\kappa(Q)$; bounds $-\beta \leq x_i \leq \beta$, $\beta = 1, 5, 9$; $x_0 = 0$; stopping criterion $\|\nabla_\Omega f(x_k)\| \leq 10^{-5} \|\nabla_\Omega f(x_0)\|$; SDC with $h = m = 4$ (Matlab implementation)

[Table: number of matrix-vector products of the two-phase GP algorithm (with SDC) and of GPCG, for varying $\kappa(Q)$, $\eta_1$, $\eta_2$, with 10%, 50% and 90% active constraints at the solution; the numerical entries are not reproduced here]

- the two-phase GP algorithm with SDC is competitive with GPCG, especially on the most difficult problems
- it is also less sensitive to $\eta_1$ and $\eta_2$ than GPCG

33 Possible applications in solving nonlinear inverse problems
Exploiting gradient methods for QP/BCQP in nonlinear inverse problems - 1

$(h, b)$ = data, $m(x, h)$ = model function, $x$ = parameters to be estimated, $r(x) = b - m(x, h)$ = error in model prediction

$\min_{x \in \mathbb{R}^n} f(x)$,  $f(x) = \frac{1}{2} \|r(x)\|_2^2 = \frac{1}{2} \sum_{i=1}^{m} r_i^2(x)$

Regularized Gauss-Newton method:
  $x_0 \in \mathbb{R}^n$; $k = 0$
  while (not stop cond) do
    compute $J_k$, the Jacobian of $r$ at $x_k$
    compute $d_k$, a regularized solution of $\min_{d \in \mathbb{R}^n} \|J_k d + r(x_k)\|_2^2$
    compute $\alpha_k$ by a suitable line search
    $x_{k+1} = x_k + \alpha_k d_k$
    $k = k + 1$
  end while
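A hedged NumPy sketch of a regularized Gauss-Newton loop in which the inner linearized least-squares problem is solved only approximately by a few early-stopped gradient iterations; a plain Cauchy-step loop stands in here for the SDA/SDC solver suggested in the talk, and a fixed damping factor replaces the line search:

```python
# Regularized Gauss-Newton: early-stopped gradient iterations on the inner
# linearized least-squares problem act as iterative regularization.
import numpy as np

def regularized_gauss_newton(r, jac, x0, outer_it=20, inner_it=10, damping=0.5):
    x = x0.copy()
    for _ in range(outer_it):
        rk, Jk = r(x), jac(x)
        d = np.zeros_like(x)
        for _ in range(inner_it):                       # early stopping = regularization
            g = Jk.T @ (Jk @ d + rk)                    # gradient of the linearized LS
            if np.linalg.norm(g) < 1e-12:
                break
            step = (g @ g) / np.dot(Jk @ g, Jk @ g)     # Cauchy steplength
            d = d - step * g
        x = x + damping * d                             # fixed damping stands in for a line search
    return x
```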

34 Possible applications in solving nonlinear inverse problems
Exploiting gradient methods for QP/BCQP in nonlinear inverse problems - 1 (cont'd)

Compute a regularized solution of $\min_{d \in \mathbb{R}^n} \|J_k d + r(x_k)\|_2^2$

[Deidda, Fenu & Rodriguez, 2014]: compute a TSVD solution (or a TGSVD one, in order to introduce a regularization matrix)

Possible alternative: use SDA/SDC to compute a regularized solution
- less sensitive to the estimate of the noise norm
- easy to use in a matrix-free regime
- effective? efficient?

35 Possible applications in solving nonlinear inverse problems
Exploiting gradient methods for QP/BCQP in nonlinear inverse problems - 2

minimize $f(u) \equiv f^{\mathrm{fit}}(u) + \lambda f^{\mathrm{reg}}(u)$  s.t. $u \geq 0$

$f^{\mathrm{fit}}(u) = \mathrm{KL}(Au, b)$, Kullback-Leibler divergence
$f^{\mathrm{reg}}(u) = TV(u)$ or $f^{\mathrm{reg}}(u) = \|Wu\|_1$ (frame-based regularization)

Solve the problem by combining
- the Iteratively Reweighted Norm approach [Wolke & Schwetlick, 1988; Rodriguez & Wohlberg, 2009]
- a Weighted Least Squares approximation of the KL fidelity term [Shen, Yin & Zhang, 2015]

[Work in progress (just started), with G. Landi]

36 Possible applications in solving nonlinear inverse problems
Exploiting gradient methods for QP/BCQP in nonlinear inverse problems - 2 (cont'd)

Algorithm (sketch):
  $u_0 \in \mathbb{R}^n$; $k = 0$
  while (not stop cond) do
    1. compute $f_k^{\mathrm{fit}}(u)$, a quadratic approximation of $f^{\mathrm{fit}}(u)$ (using $u_k$)
    2. compute $f_k^{\mathrm{reg}}(u)$, a quadratic approximation of $f^{\mathrm{reg}}(u)$ (using $u_k$)
    3. compute $u_{k+1} \in \mathrm{argmin}_{u \geq 0}\ f_k^{\mathrm{fit}}(u) + \lambda f_k^{\mathrm{reg}}(u)$
    4. $k = k + 1$
  end while

1. Weighted Least Squares approximation
2. Iteratively Reweighted Norm approach
3. Two-phase GP algorithm
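A very generic, hedged sketch of the outer loop above; the builders of the two quadratic models (WLS approximation of the KL term, IRN reweighting of the regularizer) and the bound-constrained quadratic solver are placeholder callables, since their concrete forms are not given in the talk:

```python
# Outer loop: quadratic models of fit and regularization terms, then a
# nonnegatively constrained quadratic subproblem (e.g. two-phase GP).
def irn_wls_outer_loop(u0, fit_model, reg_model, solve_bcqp, lam, maxit=30):
    """fit_model/reg_model return (Q, c) of a quadratic model 0.5 u^T Q u - c^T u
    built at the current iterate; solve_bcqp solves it subject to u >= 0.
    All three callables are placeholders (assumptions of this sketch)."""
    u = u0
    for _ in range(maxit):
        Qf, cf = fit_model(u)     # WLS quadratic model of the KL fit term
        Qr, cr = reg_model(u)     # IRN quadratic model of the regularizer
        u = solve_bcqp(Qf + lam * Qr, cf + lam * cr)
        # a stopping test on successive iterates / objective values would go here
    return u
```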

37 Possible applications in solving nonlinear inverse problems

CAN WE EFFICIENTLY EXPLOIT SPECTRAL GRADIENT METHODS IN THE PING PROJECT?
