Astronomy & Astrophysics — Least-squares methods with Poissonian noise: Analysis and comparison with the Richardson-Lucy algorithm


A&A 436 (2005), DOI: 10.1051/0004-6361:20041997, © ESO 2005

Astronomy & Astrophysics

Least-squares methods with Poissonian noise: Analysis and comparison with the Richardson-Lucy algorithm

R. Vio (1,3), J. Bardsley (2), and W. Wamsteker (3)

1 Chip Computers Consulting s.r.l., Viale Don L. Sturzo 8, S. Liberale di Marcon, 30020 Venice, Italy — robertovio@tin.it
2 Department of Mathematical Sciences, The University of Montana, Missoula, MT, USA — bardsleyj@mso.umt.edu
3 ESA-VILSPA, Apartado 50727, 28080 Madrid, Spain — willem.wamsteker@esa.int

Received September 2004 / Accepted 8 February 2005

Abstract. It is well known that the noise associated with the collection of an astronomical image by a CCD camera is largely Poissonian. One would expect, therefore, that computational approaches that incorporate this a priori information will be more effective than those that do not. The Richardson-Lucy (RL) algorithm, for example, can be viewed as a maximum-likelihood (ML) method for image deblurring when the data noise is assumed to be Poissonian. Least-squares (LS) approaches, on the other hand, are based on the assumption that the noise is Gaussian with fixed variance across pixels, which is rarely accurate. Given this, it is surprising that in many cases results obtained using LS techniques are relatively insensitive to whether the noise is Gaussian or Poissonian. Furthermore, in the presence of Poissonian noise, results obtained using LS techniques are often comparable with those obtained by the RL algorithm. We seek an explanation of these phenomena via an examination of the regularization properties of particular LS algorithms. In addition, a careful analysis of the RL algorithm yields an explanation as to why it is more effective than LS approaches for star-like objects, and why it provides similar reconstructions for extended objects. Finally, a comparative convergence analysis of the two algorithms is carried out, with a section devoted to the convergence properties of the RL algorithm. Numerical results are presented throughout the paper.
The subject treated in this paper is not purely academic. In comparison with many ML algorithms, the LS algorithms are much easier to use and to implement, are computationally more efficient, and are more flexible regarding the incorporation of constraints on the solution. Consequently, if little to no improvement is gained in the use of an ML approach over an LS algorithm, the latter will often be the preferred approach.

Key words. methods: data analysis – methods: statistical – techniques: image processing

Article published by EDP Sciences.

1. Introduction

The restoration of images is a common problem in Astronomy. Astronomical images are blurred due to several factors: the refractive effects of the atmosphere, the diffractive effects of the finite aperture of the telescope, the statistical fluctuations inherent in the collection of images by a CCD camera, and instrumental defects. An example is represented by the spherical aberration of the primary mirror of the Hubble Space Telescope (White 1991), which limited the quality of the images before the detector system was refurbished. The widespread interest in this subject has resulted in the development of a large number of algorithms with different degrees of sophistication (for a review, see Starck et al. 2002). For example, recent wavelet-based approaches have been shown to provide excellent results (e.g., see Neelamani et al. 2004). Unfortunately, these algorithms are very expensive to implement, prohibiting their use on large-scale image restoration problems and on problems that require the restoration of a large number of images. Consequently, for many restoration problems, less sophisticated and computationally more efficient algorithms must be used. In this respect, the algorithms based on a linear Least-Squares (LS) methodology represent an interesting class. In this paper we discuss two LS approaches: direct and iterative. Direct methods, which we discuss in Sect. 3.1, are the most computationally efficient, while iterative techniques, which we discuss in Sect. 3.2, allow for the straightforward incorporation of constraints. In spite of the advantages of the LS-based algorithms, astronomers typically use techniques based on a non-linear approach. Such algorithms are usually more difficult to implement, are less flexible, and often have slow convergence rates. In particular, the original Richardson-Lucy (RL) algorithm (Richardson 1972; Lucy 1974; Shepp & Vardi 1982) and later modifications have attracted much attention. RL can be viewed as an Expectation-Maximization (EM) algorithm associated

R. Vio et al.: Algorithms for image restoration

with a statistical Poissonian noise model. Linear LS methods, on the other hand, can be viewed as the maximum-likelihood (ML) approach when the noise contaminating the image(s) of interest is additive Gaussian with constant variance across pixels. For CCD camera noise, the statistical assumption inherent in the RL algorithm is much more accurate than that of the LS approach (see Sect. 2). Nonetheless, it is often the case that these two methods provide results of similar quality (see Sect. 4). This is disappointing, since it means that in certain instances the RL algorithm is not able to exploit the a priori statistical information. This is particularly relevant when the incorporation of the a priori information results in algorithms that are more expensive and difficult to implement. The aims of the present paper are as follows: I) to determine the performance of the LS algorithms when the noise is predominantly Poissonian; II) to determine when the LS and RL approaches will give very similar qualitative results. Such questions are not only of academic interest. We believe that, due to certain distinct computational advantages, the LS algorithms should be used preferentially. However, the LS approach is not always the best choice. In the next section, we present the statistical model for CCD camera image formation, as well as the approximate model that we will use here. After some preliminary considerations in Sect. 2, in Sect. 3 we explore the convergence properties of the two LS approaches. We also discuss the performance of these algorithms on different objects. In Sect. 4 we explore in detail the convergence properties of the RL algorithm. Throughout the paper we present numerical results. We give our conclusions in Sect. 5.

2. Statistical considerations

Astronomical image data are typically collected with a CCD camera. The following statistical model (see Snyder et al.
1993, 1995) applies to image data from a CCD detector array:

b(i, j) = n_obj(i, j) + n_bck(i, j) + n_rdn(i, j). (1)

Here, b(i, j) is the data acquired by a readout of the pixels of the CCD detector array; i, j = 0, 1, ..., N − 1 (without loss of generality, square images are considered); n_obj(i, j) is the number of object-dependent photoelectrons; n_bck(i, j) is the number of background electrons; n_rdn(i, j) is the readout noise. The random variables n_obj(i, j), n_bck(i, j) and n_rdn(i, j) are assumed to be independent of one another and of n_obj(i′, j′), n_bck(i′, j′) and n_rdn(i′, j′) for (i, j) ≠ (i′, j′). For clarity we use matrix-vector notation and rewrite Eq. (1) as

b = n_obj + n_bck + n_rdn, (2)

where the vectors have been obtained by column-stacking the corresponding two-dimensional arrays. The random vector n_obj has a Poissonian distribution with parameter Ax, where x is the true image (or object) and A is a matrix that corresponds to the point spread function (PSF). Depending on the assumed boundary conditions, its structure is typically block-circulant or block-Toeplitz (e.g., see Vio et al. 2003; Vogel 2002). n_bck and n_rdn are random vectors containing independent entries that have a Poissonian distribution with mean β and a Gaussian distribution with mean 0 and fixed variance σ²_rdn, respectively. We will use the following notation to denote model (2):

b = P[Ax] + P[β 1] + N(0, σ²_rdn 1), (3)

where P[µ] denotes a Poissonian random vector with independent entries and mean given by µ, and N(µ, σ²) represents a Gaussian random vector with independent entries and mean and variance given by the vectors µ and σ², respectively. For iid entries, µ = µ1 and σ² = σ²1. If σ²_rdn is not too small, we have P[σ²_rdn] ≈ N(σ²_rdn, σ²_rdn) (see Feller 1971), and hence, using the independence properties of the random variables in Eq. (2), we obtain the following approximation of Eq. (3):

b = P[Ax + (β + σ²_rdn) 1] − σ²_rdn 1. (4)

In order to simplify the analysis, we assume the following simplified model

b = P[Ax].
(5)

The analysis is easily extended to the model given by Eq. (4). Furthermore, in regions of high intensity, (Ax)_i is much larger than both β and σ²_rdn, in which case model (4) is well approximated by model (5). A further useful approximation is possible if the elements of b are large. In this case model (5) can be well approximated by

b = Ax + z, (6)

where z is a zero-mean Gaussian random vector

z = γ ⊙ N(0, 1). (7)

Here the symbol ⊙ denotes Hadamard (element-wise) multiplication and γ = (Ax)^{1/2} = {(Ax)_i^{1/2}}. In other words, through Eq. (6), the nonlinear noise model (5) can be approximated by a linear, additive, nonstationary, Gaussian noise model. Our own numerical experiments suggest that, in order for the approximation (6), (7) to be accurate, it is sufficient that the b_i exceed a few tens of counts. This condition is true in many situations of practical interest (recall that b_i is the number of photons detected by the ith pixel of the CCD camera). From Eqs. (6) and (7) we see that z has a flat spectrum, i.e., the expected modulus of its Discrete Fourier Transform (DFT) is constant. However, unlike a stationary Gaussian white-noise process, here the various Fourier frequencies are not independent of each other (e.g., see Fig. 1). The reason is that pointwise multiplication in the spatial domain corresponds to convolution in the Fourier domain, and vice versa. Thus, from Eq. (7), we have

ẑ(i, j) = [ γ̂ ∗ N̂(0, 1) ](i, j), (8)

where the symbol ^ indicates the DFT, ∗ denotes convolution, and (i, j) represents a two-dimensional, discrete frequency index. Since convolution is a linear operation, and the N̂(0, 1)(i, j) are iid complex Gaussian random variables with zero mean and unit variance, the ẑ(i, j) are complex Gaussian random variables with zero mean and a constant variance equal to Σ_{i,j} γ²(i, j). However, they are not independent of each other.
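The accuracy of the Gaussian approximation (6), (7) to the Poissonian model (5) is easy to probe numerically. The following sketch (Python/NumPy, which the paper itself of course does not use; the array size and count level are illustrative assumptions) draws samples from both models and compares their first two moments:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_image(ax):
    """Exact noise model (5): b = P[Ax]."""
    return rng.poisson(ax).astype(float)

def gaussian_approx_image(ax):
    """Approximate model (6)-(7): b = Ax + gamma * N(0, 1), gamma = (Ax)^(1/2)."""
    return ax + np.sqrt(ax) * rng.standard_normal(ax.shape)

# Hypothetical blurred image Ax with ~100 counts per pixel.
ax = np.full((64, 64), 100.0)
b_p = poisson_image(ax)
b_g = gaussian_approx_image(ax)

# Both samples share (approximately) mean Ax and variance Ax, per Eq. (7).
print(b_p.mean(), b_p.var(), b_g.mean(), b_g.var())
```

At count levels of a few tens per pixel and above, histograms of `b_p` and `b_g` become practically indistinguishable, which is the regime in which the LS analysis of the following sections applies.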

Fig. 1. Comparison of Poissonian vs. stationary Gaussian noise and the corresponding power spectra for the satellite image shown in Fig. 4. The variance of the two noises is the same.

3. Performance of the least-squares approach

The above comments regarding the noise process z provide some insight into the performance of the LS deblurring algorithms in the presence of Poissonian noise. In particular, since the LS approach corresponds to the statistical assumption that the contaminating noise is stationary Gaussian, while z is well approximated by a nonstationary Gaussian random variable, a possible worsening of the performance of the LS algorithms is to be expected, due to their inability to take into account the dependence between the Fourier frequencies that characterizes the spectrum of z. The question then arises as to what happens if such a dependence is not taken into account. In order to understand this point, we remark that the LS estimate of the true image x in Eq. (6), given by

x_LS = argmin_x ‖Ax − b‖², (9)
     = (AᵀA)⁻¹ Aᵀ b, (10)

is extremely unstable, since the matrix A is highly ill-conditioned. The equality in Eq. (10) holds provided A has full rank, in which case the least-squares cost function in problem (9) is strictly convex.

Because of the instability of x_LS, a regularized or biased approximation is typically computed. Such an approximation is obtained by filtering out the high-frequency components of x_LS (e.g., see Vogel 2002). This fact, together with the following two arguments, suggests that in many astronomical applications the consequences of the LS approach ignoring the dependence between the Fourier frequencies in z will in many instances be small:

1. The images b obtained from astronomical experiments have spectra in which only the lowest frequencies have intensities significantly different from zero. This is a consequence of the fact that the PSFs are nearly band-limited, i.e., they are very smooth functions. Furthermore, if a function is in C^k (i.e., it has k continuous derivatives), then its spectrum decreases at least as fast as 1/ν^{k+1}. Consequently, this constitutes the lower limit with which the spectrum of the images can be expected to decrease;
2. The discrete Picard condition plus the Riemann-Lebesgue lemma (Hansen 1997) show that the only Fourier frequencies useful for the restoration of the image are (roughly) those where the contribution of the object is larger than that of the noise.

From such considerations, it is possible to conclude that in the construction of the deblurred image only a few frequencies (i.e., the lowest ones) are primarily used, whereas most of the remaining frequencies are of only marginal importance. For example, in the case of a star-like source (i.e., a non-bandlimited function) and a Gaussian PSF with circular symmetry (a C^∞ function) and dispersion (in pixels) σ_p, the spectrum of the observed image is again a Gaussian function with circular symmetry and dispersion (in pixels) σ̂_p given by

σ̂_p = N / (2π σ_p). (11)

In several of the numerical experiments presented below, we have used N = 256 and a PSF dispersion σ_p of about 12 pixels; in this case, σ̂_p ≈ 3.5. With the noise levels used in the simulations, it happens that in b̂(i, j) the noise becomes dominant when (approximately) i, j ≥ ϱ, where ϱ is small compared to N. Hence, the LS algorithms can be expected to be almost insensitive to the nature of the noise. Although the set of frequencies most corrupted by noise is determined both by the noise level and by the spectrum of the PSF, it is the latter factor that has the most influence.
To see this we note that, for the case of the star-like source and Gaussian PSF considered above, it is possible to show that the frequency index (i, j) at which the spectrum of the signal and that of z have the same level is given by

r̄ ≈ [N / (2π σ_p)] √(2 ln(A_s / σ_z)),

where r̄ = (i² + j²)^{1/2}, A_s is the amplitude of the source, and σ_z is the level of the noise spectrum. In the next two sections, these arguments will be checked in the context of the direct and the iterative LS methods.

The approximation (7) motivates the use of a weighted least-squares (WLS) technique. In this approach, the l₂ norm in Eq. (9) is replaced by an energy norm ‖·‖_C, where C is the diagonal covariance matrix of the (approximate) noise random variable z, and

‖u‖²_C = uᵀ C⁻¹ u. (12)

In the present context, the entries of C are given by {γ²_i}, i.e., they depend on x. The WLS approach will provide more accurate reconstructions than LS in the case of nonstationary Gaussian noise, i.e. when C is not a constant multiple of the identity, but the direct approach that is discussed in the next section does not, in general, allow for a fast implementation in the case of WLS. An iterative approach can be used for WLS problems, but the cost of implementing these algorithms is the same as that of the RL algorithm. Thus, since RL arises from a more accurate statistical noise model, and is much more widely used in astronomical imaging, we will use RL in the comparisons below.
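The WLS idea of Eq. (12) can be made concrete on a small dense problem. The following sketch (Python/NumPy; the one-dimensional problem size, PSF width, background level, and regularization value are illustrative assumptions, not values from the paper) solves the Tikhonov-regularized LS and WLS normal equations, estimating C from the data as diag(b) since γ²_i = (Ax)_i ≈ b_i:

```python
import numpy as np

rng = np.random.default_rng(3)

def blur_matrix(n, sigma):
    """Dense n x n Gaussian blurring matrix with unit column sums."""
    i = np.arange(n)
    a = np.exp(-0.5 * ((i[:, None] - i[None, :]) / sigma) ** 2)
    return a / a.sum(axis=0, keepdims=True)

n = 64
a = blur_matrix(n, 3.0)
x_true = np.zeros(n)
x_true[24:40] = 400.0                              # bright extended source
b = rng.poisson(a @ x_true + 10.0).astype(float)   # Poissonian data on a background

lam2 = 1e-3                                  # illustrative lambda^2
c_inv = np.diag(1.0 / np.maximum(b, 1.0))    # C^-1 with gamma_i^2 ~ b_i, per Eq. (12)

# Ordinary Tikhonov LS, cf. Eq. (14): (A^T A + lam^2 I) x = A^T b
x_ls = np.linalg.solve(a.T @ a + lam2 * np.eye(n), a.T @ b)
# Weighted LS: (A^T C^-1 A + lam^2 I) x = A^T C^-1 b
# (in practice lambda would be re-tuned for the weighted problem)
x_wls = np.linalg.solve(a.T @ c_inv @ a + lam2 * np.eye(n), a.T @ c_inv @ b)
```

The weighted system down-weights the bright pixels, where the Poissonian variance is largest, which is exactly the nonstationarity that plain LS ignores.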

3.1. The direct approach

One of the most standard approaches for handling the ill-conditioning of problem (9) is known as Tikhonov regularization. In the Tikhonov approach, one obtains a stable estimate of the solution of the linear system of interest by imposing a norm penalization, or bias, on the solution itself. This means that we solve, instead, the penalized least-squares problem

x_λ = argmin_x ( ‖Ax − b‖² + λ² ‖x‖² ), (13)

or, equivalently, the linear system

(AᵀA + λ² I) x = Aᵀ b. (14)

Here λ is a positive scalar known as the regularization parameter. Direct solutions of the system (14) can be obtained very efficiently using fast implementations of the DFT (Hunt 1977). Moreover, there are various reliable and well-tested criteria that allow for the estimation of λ. A standard technique is known as the generalized cross-validation (GCV) method. With this approach, the optimal value of λ is estimated via the minimization of the GCV function

GCV(λ) = [ ‖Ax_λ − b‖² / N² ] / [ trace(I − A(λ)) / N² ]². (15)

Here A(λ) is the influence matrix that defines the estimator of b, i.e., A(λ)b = Ax_λ, and N² is the number of pixels in the image. For Tikhonov regularization,

A(λ) = A (AᵀA + λ² I)⁻¹ Aᵀ. (16)

It is useful to express model (13) and the GCV function (15) in the Fourier domain:

x̂_λ(i, j) = [ Â*(i, j) / ( |Â(i, j)|² + λ² ) ] b̂(i, j), (17)

and

GCV(λ) = [ Σ_{i,j=0}^{N−1} |b̂(i, j)|² / ( |Â(i, j)|² + λ² )² ] / [ Σ_{i,j=0}^{N−1} 1 / ( |Â(i, j)|² + λ² ) ]². (18)

Here, one can compute both x_λ and the minimizer of GCV(λ) very efficiently. Figures 2-5 compare the results obtainable with this method when the noise is stationary Gaussian and Poissonian, respectively. The image b_p(i, j), contaminated by Poissonian noise, has been obtained by simulating a nonstationary Poissonian process with local mean given by the values of the pixels in the blurred image, i.e. using model (5). Four peak signal-to-noise (S/N) ratios have been considered: 20, 30, 40 and 60 dB (here S/N = 20 log₁₀(b_max / b_max^{1/2}) dB, where b_max is the maximum value of the image Ax). They correspond to situations of very high, intermediate, and very low noise levels. The PSF used in the simulations is a two-dimensional Gaussian function with circular symmetry. The image b_g(i, j), contaminated by Gaussian noise, has been obtained by the addition of a discrete stationary white-noise process to the blurred images. Both the Poissonian and the Gaussian noises have been simulated through a classic inverse-distribution method (e.g., see Johnson 1987) by transforming the same set of uniform random deviates. They have exactly the same variance. Here, the subject of interest is superimposed on a sky whose intensity, in the blurred image, is set to a small percentage of the maximum value of the image. This means that, contrary to the Gaussian case, where the noise level is constant across the image, in the Poissonian case the noise is mostly concentrated in the pixels with the highest values. In spite of this fact, these figures show that the results provided by Tikhonov coupled with GCV are quite similar regardless of whether the noise is Gaussian or Poissonian. These results can be explained if one considers Eq. (17), where it is clear that the role of λ is to replace the Fourier coefficients Â(i, j) with small modulus, i.e., those coefficients that make the deblurring operation unstable. According to the two points mentioned above, the optimal value of λ should replace all the Fourier coefficients whose modulus is smaller than the expected level of the noise in the Fourier domain. Since the noise level is the same in b_p(i, j) and b_g(i, j), such a replacement will be equally important and will have a similar effect for both images. This is shown by the results in Figs. 6-9. In particular, the c) panels show that GCV chooses λ so that the frequencies corresponding to the flat part of the spectra (i.e., those dominated by the noise) are filtered out. The consequence of this is that, for both the Gaussian and the Poissonian noise, almost the same number of coefficients are filtered.
Moreover, as is shown in the d) panels, the coefficients |b̂_p(i, j)| and |b̂_g(i, j)| corresponding to the coefficients Â(i, j) with the largest modulus are very similar. From this, one can conclude that the deblurred images x_λ can be expected to be very similar regardless of the nature of the noise. The reason why the two GCV curves are almost identical (see the b) panels) is that, independently of the nature of the noise, in Eq. (18) the quantity b̂(i, j) can be replaced by b̂(i, j) = Â(i, j) x̂(i, j) + ẑ_n(i, j), where ẑ_n(i, j) is given by Eq. (8) or by a stationary white-noise process. Now, taking the expected value of the resulting GCV(λ), it is not difficult to show that

E[GCV(λ)] = [ Σ_{i,j=0}^{N−1} ( |Â(i, j)|² |x̂(i, j)|² + σ²_{ẑ_n} ) / ( |Â(i, j)|² + λ² )² ] / [ Σ_{i,j=0}^{N−1} 1 / ( |Â(i, j)|² + λ² ) ]². (19)

Since the term σ²_{ẑ_n} is constant, the function E[GCV(λ)] is independent of the specific nature of the noise. The same is not true for the variance. However, because of the arguments presented above, no instabilities are to be expected. This is supported by the fact that in our numerical experiments we have never experienced stability problems (see also Fig. 10).

Fig. 2. Comparison of the results obtained by Tikhonov coupled with GCV in the case of Gaussian and Poissonian noise (see text). The images have sizes 256 × 256 pixels, the PSF is Gaussian with circular symmetry with the dispersion given in the text, and S/N = 20 dB. rms_T and rms_I are the rms values of the true residuals calculated on the entire image and only on the pixels corresponding to the satellite, respectively.

Fig. 3. As in Fig. 2 but with S/N = 30 dB.

Fig. 4. As in Fig. 2 but with S/N = 40 dB.

Fig. 5. As in Fig. 2 but with S/N = 60 dB.

It is not difficult to see that similar arguments also hold when the norm penalization in model (13) is substituted by a smoothness one, i.e., when

x_λ = argmin_x ( ‖Ax − b‖² + λ² ‖Lx‖² ). (20)

Typically L is taken to be a discrete approximation of some derivative operator. One can also use an iterative approach together with Tikhonov regularization, i.e., an iterative method for solving problem (13). Such approaches allow for the incorporation of nonnegativity constraints, while the direct approach does not. See Vogel (2002) for a thorough discussion.

3.2. The iterative approach

To deal with the ill-conditioning of problem (9), an iterative approach can also be used. Although computationally less efficient than the direct method discussed above, iterative techniques are much more flexible, in that they allow for the straightforward incorporation of constraints. These algorithms provide regularization via a semiconvergence property; that is, the iterates first reconstruct the low-frequency components of the signal, i.e.
those less contaminated by noise, and then the high-frequency ones. In other words, the number of iterations plays the same role as the regularization parameter λ in the Tikhonov approach. Semiconvergence has been rigorously proved only for a limited number of algorithms (see Vogel 2002; Lee & Nagy 2004). For others, some theoretical results are available, but the primary evidence stems from many years of successful use in applied problems.

Fig. 6. Comparison of the results obtained by Tikhonov coupled with GCV in the case of Gaussian and Poissonian noise; this figure corresponds to the experiment shown in Fig. 2. Panel a): the coefficients |b̂_g(i, j)| and |b̂_p(i, j)| in decreasing order; the two horizontal lines represent the values of λ for the two noises. Panel b): the corresponding GCV functions. Panel c): the coefficients |b̂_p(i, j)| and |b̂_g(i, j)| corresponding to the first coefficients of Â(i, j) shown in panel a); the vertical lines show the indices of the b̂(i, j) closest to λ. Panel d): the relative difference |b̂_g(i, j) − b̂_p(i, j)| / |b̂_g(i, j)| vs. the corresponding first coefficients of Â(i, j) with the largest modulus. S/N = 20 dB.

Fig. 7. As in Fig. 6 but with S/N = 30 dB.

Fig. 8. As in Fig. 6 but with S/N = 40 dB.

Fig. 9. As in Fig. 6 but with S/N = 60 dB.

The prototypical iterative regularization algorithm for least-squares problems is the Landweber (LW) method. This is a gradient-descent algorithm for solving problem (9). In particular, it creates a sequence of iterates that, provided A has full column rank, converges to the solution of Eq. (10). Because of its slow convergence, it is not frequently used in practical applications. However, because of its simplicity, it is often utilized in the theoretical analysis of the potential performance of the LS approach. If x_0 is the starting image (often x_0 = 0), then the iterations take the form

x_k = x_{k−1} + ω Aᵀ [ b − A x_{k−1} ], (21)

where k = 1, 2, ..., and ω is a real positive parameter satisfying 0 < ω < 2/‖AᵀA‖. The value of ω determines, in part, the convergence of the iteration.
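The LW iteration (21) is a one-liner per step when A is a convolution implemented with FFTs. The sketch below (Python/NumPy; object, PSF, and iteration counts are illustrative assumptions) also records the restoration error along the iterations, the quantity plotted in semiconvergence curves of this kind:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_psf_fft(n, sigma_p):
    """DFT of a unit-volume Gaussian PSF."""
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f, indexing="ij")
    return np.exp(-2.0 * np.pi**2 * sigma_p**2 * (fx**2 + fy**2))

def landweber(b, a_hat, omega=1.0, n_iter=50):
    """Eq. (21): x_k = x_{k-1} + omega A^T (b - A x_{k-1}), with x_0 = 0."""
    x = np.zeros_like(b)
    b_hat = np.fft.fft2(b)
    for _ in range(n_iter):
        resid_hat = b_hat - a_hat * np.fft.fft2(x)
        x = x + omega * np.fft.ifft2(np.conj(a_hat) * resid_hat).real
    return x

n = 32
x_true = np.zeros((n, n))
x_true[8:24, 8:24] = 100.0
a_hat = gaussian_psf_fft(n, 2.0)
ax = np.fft.ifft2(a_hat * np.fft.fft2(x_true)).real
b = rng.poisson(np.clip(ax, 0.0, None)).astype(float)

# Early iterations recover the low frequencies first: the error drops quickly,
# and only much later do the noisy high frequencies creep in (semiconvergence).
errs = [np.linalg.norm(landweber(b, a_hat, n_iter=k) - x_true) for k in (1, 10, 50)]
```

Continuing the iteration far enough would eventually make the error rise again, which is why the iteration count acts as the regularization parameter.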
The semiconvergence property of this algorithm is typically proved using arguments based on the singular value decomposition of the matrix A (for a discussion of this, see Vogel 2002). However, it is perhaps more instructive to rewrite Eq. (21) in the Fourier domain, obtaining

x̂_k(i, j) = [ b̂(i, j) / Â(i, j) ] [ 1 − ( 1 − ω |Â(i, j)|² )^k ], (22)

with 0 < ω < 2 / max |Â(i, j)|². If, as usual, the PSF is assumed to have unit volume, then max |Â(i, j)|² = 1 and 0 < ω < 2. From this equation, one can see that, for a given frequency index (i, j), the closer the term ω |Â(i, j)|² is to one,

Fig. 10. Relationship between the estimates λ_g and λ_p, corresponding to the Gaussian and the Poissonian noise, respectively, obtained in different realizations of the experiment shown in Fig. 3. The mean values of λ_g and λ_p coincide to the reported precision, and both standard deviations are small.

the more rapid is the convergence to b̂(i, j)/Â(i, j), which corresponds to the unregularized solution. Since, as mentioned above, the largest values of the spectrum Â(i, j) correspond to the lowest frequencies, it is evident from Eq. (22) that the lower frequencies are restored in early iterations, while progressively higher frequencies are restored as the iteration progresses. There are gradient-descent algorithms similar to Landweber for minimizing a least-squares function subject to nonnegativity constraints (e.g., see Vogel 2002).

Convergence properties

Equation (22) shows that the convergence of LW is driven by the rate with which the term

K_k(i, j) = ( 1 − ω |Â(i, j)|² )^k (23)

goes to zero. In order to understand what this means in practical situations, it is useful to see what happens in the case of a noise-free image when the PSF is a two-dimensional Gaussian with circular symmetry and dispersion σ_p. Without loss of generality, we set ω = 1. Then

K_k(r) ≈ [ 1 − exp( −r² / σ̂_p² ) ]^k, (24)

where r = (i² + j²)^{1/2} and σ̂_p is the dispersion of the PSF in the frequency domain (see Eq. (11)). From this equation, even in the case of moderate values of k, the term within square brackets on the rhs of Eq. (22) can be well approximated by the Boxcar function
(6) The requirement that K k (i, j) ɛ, with <ɛ<implies ln ɛ k > ln [ ] (7) exp( r / σ p) This result shows that the restoration of the highest frequencies (i.e., large r), requires a number of iterations that becomes rapidly huge. For the case of the star-like sources, where all the frequencies have to be restored, this means an extremely slow convergence. More specifically, from Eq. (7) one can see that for frequencies (i, j) suchasr < σ p, k r, while forlargervaluesofr, k increases exponentially. For example, some of the experiments presented in this paper are based on images with size pixels and with a ian PSF with σ p 3.5. In this case, if ɛ =, in order to have r,k = σ p, σ p, 3 σ p, 6 σ p (i.e., r,k = 3.5, 7,, ), it is necessary that k, 4, 56, 3 5, respectively. The obvious conclusion is that is useful only for the restoration of objects for which the low frequencies are dominant, e.g. extended objects with smooth light distributions Numerical results Since, regardless of the noise type, reconstructs the lower frequency components of the image first (i.e., the frequencies where the contribution of the noise is negligible), we expect the following for both b p (i, j)andb g (i, j):. The resulting deblurred images should be very similar; Π k (i, j) = { if r r,k ; otherwise, (5). In early iterations the convergence rate of the algorithms should be almost identical. Article published by EDP Sciences and available at or

Fig. 12. LW deblurring of the images shown in Fig. 2, corresponding to the smallest value of the rms of the true residuals calculated on the entire image, rms_T. rms_I is the rms calculated by using only the pixels containing the contribution of the satellite.

Fig. 13. LW deblurring of the images shown in Fig. 3. rms_T and rms_I as in Fig. 12.

Fig. 14. LW deblurring of the images shown in Fig. 4. rms_T and rms_I as in Fig. 12.

Fig. 15. LW deblurring of the images shown in Fig. 5. rms_T and rms_I as in Fig. 12.

These statements are supported by Figs. 12-15 and Figs. 16-19. In particular, from the last set of figures one can see that the convergence curves are almost identical until the minimum rms of the true residual is reached. After that, because the high frequencies (the ones most sensitive to the nature of the noise) begin to be included in the restoration, the curves diverge.

4. Richardson-Lucy algorithm

In the previous sections we have shown that the LS methods are relatively insensitive to the specific nature of the noise. However, this does not mean that they are optimal. In principle, methods that exploit the a priori knowledge of the statistical
Assuming, instead, noise model (5), one can show that the corresponding MLE is the minimizer of the negative log-likelihood function J(x) = T [ Ax b log(ax) ], (8) subject to the nonnegativity constraint x. The nonnegativity constraint can be motivated physically by noting that the photon count at each pixel is positive. The function J in Eq. (8) is convex. In fact, assuming A is positive definite, J is Article published by EDP Sciences and available at or

Fig. 16. Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 2 (S/N = 20 dB).

Fig. 17. Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 3 (S/N = 30 dB).

Fig. 18. Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 4 (S/N = 40 dB).

Fig. 19. Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 5 (S/N = 60 dB).

strictly convex. In this case, the unique minimizer x of J subject to the nonnegativity constraint x ≥ 0 must satisfy the Kuhn-Tucker condition

x ⊙ ∇J(x) = 0, (29)

where

∇J(x) = Aᵀ 1 − Aᵀ ( b / Ax ). (30)

If the standard assumption Aᵀ1 = 1 is made, Eq. (29) can then be written

x = x ⊙ Aᵀ ( b / Ax ), (31)

where the fraction of two vectors denotes Hadamard (component-wise) division. From Eq. (31) we obtain the fixed-point iteration

x_{k+1} = x_k ⊙ Aᵀ ( b / b_k ), (32)

where b_k = A x_k. Iteration (32) is known as Richardson-Lucy (RL). It can also be obtained using the EM formalism (e.g., see Vogel 2002; Shepp & Vardi 1982). Since RL exploits the a priori knowledge regarding the statistics of photon counts, it should be expected to yield more accurate reconstructions than an approach that does not use this information. In reality, as shown by Figs. 20-27, the situation is not so clear. These figures provide the convergence rates and the performances of the RL and LW methods for objects with a size that is increasing with respect to the size of the PSF
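The fixed-point iteration (32) is equally compact in code. The sketch below (Python/NumPy; sizes, count levels, and iteration numbers are illustrative assumptions) implements RL with FFT-based convolutions; note that each step requires two convolutions (one with A and one with Aᵀ), i.e. twice the DFT work of an LW step:

```python
import numpy as np

rng = np.random.default_rng(4)

def gaussian_psf_fft(n, sigma_p):
    """DFT of a unit-volume Gaussian PSF, so that A^T 1 = 1 holds exactly."""
    f = np.fft.fftfreq(n)
    fx, fy = np.meshgrid(f, f, indexing="ij")
    return np.exp(-2.0 * np.pi**2 * sigma_p**2 * (fx**2 + fy**2))

def richardson_lucy(b, a_hat, n_iter=20):
    """Eq. (32): x_{k+1} = x_k * A^T(b / A x_k), started from a flat image."""
    x = np.full_like(b, b.mean())
    for _ in range(n_iter):
        b_k = np.fft.ifft2(a_hat * np.fft.fft2(x)).real               # A x_k
        ratio = b / np.maximum(b_k, 1e-12)                            # guarded division
        x = x * np.fft.ifft2(np.conj(a_hat) * np.fft.fft2(ratio)).real  # A^T(...)
    return x

n = 32
x_true = np.full((n, n), 5.0)            # faint background ...
x_true[12:20, 12:20] = 200.0             # ... plus a bright extended source
a_hat = gaussian_psf_fft(n, 2.0)
ax = np.fft.ifft2(a_hat * np.fft.fft2(x_true)).real
b = rng.poisson(ax).astype(float)

x_rl = richardson_lucy(b, a_hat, n_iter=20)
```

Two properties of the iteration are visible immediately: the iterates remain nonnegative whenever x_0 ≥ 0, and the total flux is conserved (sum of x_{k+1} equals sum of b, a direct consequence of Aᵀ1 = 1).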

10 75 R. Vio et al.: Algorithms for image restoration RL RL Fig.. x k x / x vs. the number of iterations. The object of interest, located in the center of the image, is a two-dimensional ian with circular symmetry and dispersion (in pixels) given in the top of each panel. It is superimposed on a background whose level is % of the peak value of the blurred image. The PSF is a two-dimensional ian with a dispersion of 6 pixels. The size of the image are pixels. The noise is ian with peak S/N = 3 db. Two deblurring algorithms are used: Richardson-Lucy (RL) and. 8 RL Fig.. As in Fig.. 3 (a two-dimensional ian with circular symmetry). Two different types of objects are considered: a two-dimensional ian and a rectangular function. Since the first target object presents an almost band-limited spectrum, whereas for the second target object the high-frequency Fourier components are important, their restorations represent very different problems. For both experiments, a background with an intensity of % of the maximum value in the blurred image has been added. Finally, two different levels of noise have been considered corresponding to a peak S/N of 3 and 4 db, respectively. The first case provides a background with an expected number of counts Fig.. As in Fig. but with S/N = 4 db. 3 8 RL 3 3 Fig. 3. As in Fig. but with S/N = 4 db. 3 3 approximately equal to 3, i.e., a level for which the ian approximation of the ian distribution is not very good. From Figs. 7it appearsthattheperformanceofrl for objects narrower than the PSF is in general superior to for the band-limited target. The same is not true for the other objects. Interestingly, though, for extended objects, i.e. 
smooth objects with high intensity profiles over large regions, the performance of RL is roughly equivalent to that of LS (to properly compare the convergence rate, it is necessary to take into account that, for each iteration, RL requires the computation of twice the number of two-dimensional DFT than is required by ). This is especially true for the images characterized by the best S/N. Motivated by these numerical results, we seek answers to the following questions: (i) why does RL perform better than LS on star-like objects; and (ii) why do the RL and LS approaches yield roughly the same results on extended objects? Article published by EDP Sciences and available at or
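The comparison described above can be reproduced in outline with a short simulation. This is a minimal sketch, not the authors' code: the grid size, source width, background level and count rates below are illustrative assumptions, and the PSF is taken symmetric so that A^T = A can be applied with a single FFT-based convolution routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
y, x = np.mgrid[0:n, 0:n] - n // 2

def gauss2d(sigma):
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.sum()

psf = gauss2d(6.0)                                      # PSF: 2-D Gaussian, dispersion 6 pixels
obj = 1e4 * gauss2d(3.0) / gauss2d(3.0).max() + 10.0    # Gaussian source on a flat background

F = np.fft.fft2(np.fft.ifftshift(psf))
conv = lambda v: np.real(np.fft.ifft2(F * np.fft.fft2(v)))  # A (= A^T for a symmetric PSF)

b = rng.poisson(conv(obj)).astype(float)                # Poissonian data

def rl(b, iters):
    xk = np.full_like(b, b.mean())
    for _ in range(iters):
        xk = xk * conv(b / np.maximum(conv(xk), 1e-12))  # Eq. (32)
    return xk

def landweber(b, iters, omega=1.0):
    xk = np.zeros_like(b)
    for _ in range(iters):
        xk = xk + omega * conv(b - conv(xk))             # x_{k+1} = x_k + w A^T (b - A x_k)
    return xk

errs = {}
for k in (10, 50):
    for name, alg in (("RL", rl), ("Landweber", landweber)):
        e = np.linalg.norm(alg(b, k) - obj) / np.linalg.norm(obj)
        errs[(name, k)] = e
        print(f"{name:9s} k = {k:3d}   relative error = {e:.3f}")
```

Tracking the relative error over many iterations reproduces the semiconvergent behavior discussed in the text; note also that, since the PSF is normalized, the RL iterates conserve the total number of counts.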

[Fig. 14. ||x_k − x||/||x|| vs. the number of iterations. The object of interest, located in the center of the image, is a two-dimensional rectangular function with side length (in pixels) given at the top of each panel. It is superimposed on a background whose level is a small percentage of the peak value of the blurred image. The PSF is a two-dimensional Gaussian with a dispersion of 6 pixels. The images are 128 × 128 pixels. The noise is Poissonian with peak S/N = 30 dB. Two deblurring algorithms are used: Richardson-Lucy (RL) and Landweber.]
[Fig. 15. As in Fig. 14.]
[Figs. 16 and 17. As in Figs. 14 and 15 but with S/N = 40 dB.]

4.1. RL vs. LS: Preliminary comments

In practice, when either the RL or LS approach is used in solving image reconstruction problems, the exact computation of the maximum likelihood estimate (MLE) is not sought. For example, as was stated above, the Landweber iteration implements regularization via the iteration count. In fact, the objective when using Landweber is to stop the iteration late enough that an accurate reconstruction is obtained, but before the reconstruction is corrupted by the noise in the high-frequency components of the image. Notice, for example, that in Figs. 10-17 the relative error begins to increase at a certain point in both the RL and Landweber iterations.

As was stated above, one can show that the Landweber iterations provide regularized solutions of A^T A x = A^T b via the singular value decomposition (SVD) of the matrix A. Unfortunately, such an analysis of RL is impossible due to the nonlinearity of the RL algorithm; in particular, note the Hadamard multiplication and division in iteration (32). Instead, we first note that if A is an invertible matrix, the RL iterations converge to the MLE corresponding to the statistical model (5) (see Wu 1983). The MLE is also the minimizer of the function (28) subject to x ≥ 0.
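The SVD-filtering interpretation of Landweber just mentioned can be illustrated on a small one-dimensional deconvolution problem; the matrix size, blur width and iteration count below are illustrative assumptions, not taken from the paper. With x_0 = 0 and step ω, k Landweber steps coincide with the spectral filter factors φ_i = 1 − (1 − ω s_i²)^k applied to the SVD solution:

```python
import numpy as np

# Small 1-D Gaussian blurring matrix (ill-conditioned), illustrative sizes
n, sigma = 64, 4.0
i = np.arange(n)
A = np.exp(-((i[:, None] - i[None, :]) ** 2) / (2 * sigma**2))
A /= A.sum(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(A)
b = A @ np.sin(2 * np.pi * i / n)         # noise-free data from a smooth object

k, omega = 25, 1.0
# Landweber after k steps == spectral filtering with factors 1 - (1 - w s_i^2)^k
phi = 1.0 - (1.0 - omega * s**2) ** k
x_filt = Vt.T @ (phi * (U.T @ b) / s)

x_iter = np.zeros(n)                      # explicit Landweber iteration, x_0 = 0
for _ in range(k):
    x_iter += omega * (A.T @ (b - A @ x_iter))

print(np.allclose(x_filt, x_iter))        # the two forms coincide
```

The filter factors φ_i damp the components associated with small singular values, which is exactly the regularization-by-iteration-count mechanism described in the text.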
When the RL algorithm is used on image deblurring problems it exhibits convergence behavior similar to that of the Landweber iteration. Specifically, for ill-conditioned problems, the RL iterates {x_k} provide increasingly accurate reconstructions in the early iterations (the semiconvergence property), while in later iterations blow-up occurs (Lucy 1974; Carasso 1999). To explain why this occurs, we note that at the positive components x_j of the MLE vector x we must have, from Eqs. (29) and (30), that

[ 1 − b/(Ax) ]_j = 0, (33)

or [Ax]_j = [b]_j. If x_j > 0 for all j and A is invertible, we have x = A^{-1} b, which is what the Landweber iterations will converge to if A is invertible. Thus, it is not surprising that, since blow-up occurs as x_k → A^{-1} b in the Landweber iteration when A is a poorly conditioned matrix, we see the same behavior when we use RL. One consequence of this fact is that during reconstruction RL uses only a few frequencies and therefore cannot fully exploit prior statistical information regarding the noise (this would require the use of the entire spectrum).

4.2. RL vs. LS: A sensitivity analysis

To obtain a deeper understanding of the RL and LS approaches, and to answer the two questions posed above, we introduce the quantities

Δ_LW(x_k) = −A^T r_k ; (34)
Δ_RL(x_k) = −x_k ⊙ A^T ( r_k/(Ax_k) ), (35)

which provide the corrections to the solution x_k at the kth iteration for Landweber and RL, respectively. Here,

r_k = A x_k − b, (36)

and, without loss of generality, we have set ω = 1 in the Landweber iteration. We note that in order to obtain Eq. (35) we needed the identity A^T 1 = 1 to hold. From these equations it is evident that at each iteration Landweber corrects the solution x_k with a quantity proportional to r_k, while the correction provided by RL is also proportional to x_k itself. Thus it is not surprising that RL outperforms Landweber when applied to reconstructing objects composed of star-like sources on a flat background, since in the early stages of both iterations the entries of x_k are large and increasing in regions corresponding to the positions of the objects in the image, while the values of r_k are correspondingly small and decreasing. However, as has been shown by the simulations presented above, RL does not outperform Landweber when applied to reconstructing objects with a smooth light distribution whose spatial extension is broader than the PSF.
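The different behavior of the two corrections can be seen directly on a toy configuration (the sizes and intensities below are illustrative assumptions): for a star on a flat background, the factor x_k in Eq. (35) amplifies the RL correction precisely where the source sits.

```python
import numpy as np

n = 64
y, x = np.mgrid[0:n, 0:n] - n // 2
psf = np.exp(-(x**2 + y**2) / (2 * 4.0**2)); psf /= psf.sum()
F = np.fft.fft2(np.fft.ifftshift(psf))
conv = lambda v: np.real(np.fft.ifft2(F * np.fft.fft2(v)))  # A (= A^T, symmetric PSF)

obj = np.full((n, n), 1.0)
obj[n // 2, n // 2] = 1e3          # star-like source on a flat background
b = conv(obj)                      # noise-free blurred data
xk = conv(b)                       # a crude early iterate

rk = conv(xk) - b
d_lw = -conv(rk)                   # Eq. (34): Landweber correction
d_rl = -xk * conv(rk / conv(xk))   # Eq. (35): RL correction

c = n // 2
print("at the star :  LW %.3g   RL %.3g" % (d_lw[c, c], d_rl[c, c]))
print("background  :  LW %.3g   RL %.3g" % (d_lw[0, 0], d_rl[0, 0]))
```

The RL correction is strongly concentrated at the source position, while the Landweber correction depends only on the residual r_k.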
In order to understand this phenomenon, consider the negative Jacobian matrices of the corrections (34) and (35):

J_LW(x_k) = A^T A (37)

and

J_RL(x_k) = diag[ A^T ( r_k/(Ax_k) ) ] + diag[x_k] A^T diag[ b/(Ax_k ⊙ Ax_k) ] A. (38)

These matrices provide the sensitivities of the Landweber and RL algorithms to changes in the components of the iterate x_k. Equations (37) and (38) allow us to make several observations. We begin by considering Eq. (37). Since in general astronomical applications A is the discretization of a PSF with almost compact spatial support, the sensitivity matrix A^T A will also have almost compact spatial support. From this observation we can conclude that, for a given pixel, the corresponding component of the Landweber correction will be most sensitive to changes in the value of the pixel itself, as well as in the values of nearby pixels. Here, the term "nearby" is defined by the characteristics of the PSF: as the spread of the PSF increases, so does the collection of nearby pixels. Perhaps an even more important observation is that the sensitivity of the Landweber iteration to perturbations in x_k is independent of both x_k and b. Consequently, the Landweber algorithm has no means of distinguishing between low- and high-intensity regions within the object, and hence perturbations of the same magnitude are allowed for components of x_k corresponding to regions of both low and high light intensity. This explains why, in areas of low light intensity (where the margin of error is very small), Landweber, and the least-squares approach in general, does poorly. The sensitivity matrix (38) for the RL iteration is more difficult to analyze. However, some simplification is possible when one considers the restoration of a flat background, or of regions of an image in which the intensity distribution varies smoothly (e.g., the interior of the rectangular function considered in the simulations).
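Equations (37) and (38) can be verified numerically. The sketch below, on a small random system whose size and entries are arbitrary assumptions, compares the matrix of Eq. (38) against a finite-difference Jacobian of the RL correction (35).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.uniform(0.1, 1.0, (n, n))
A /= A.sum(axis=0, keepdims=True)       # enforce A^T 1 = 1, as assumed in the text
xt = rng.uniform(0.5, 2.0, n)
b = A @ xt                              # consistent, strictly positive data
xk = rng.uniform(0.5, 2.0, n)           # an arbitrary positive iterate

def d_rl(v):
    r = A @ v - b
    return -v * (A.T @ (r / (A @ v)))   # Eq. (35)

# Eq. (38): negative Jacobian of the RL correction at x_k
r = A @ xk - b
J_rl = np.diag(A.T @ (r / (A @ xk))) + np.diag(xk) @ A.T @ np.diag(b / (A @ xk) ** 2) @ A

# central finite-difference approximation of the same negative Jacobian
eps = 1e-6
J_fd = np.empty((n, n))
for j in range(n):
    e = np.zeros(n); e[j] = eps
    J_fd[:, j] = -(d_rl(xk + e) - d_rl(xk - e)) / (2 * eps)

print(np.max(np.abs(J_fd - J_rl)))      # small: analytic and numeric Jacobians agree
```

The agreement (to finite-difference accuracy) confirms that Eq. (38) is the correct sensitivity matrix for the RL correction.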
In this case it is possible to define a region Ω where the image can be considered constant or almost constant. Because of the semiconvergence property of RL, in such regions the components of the vector r_k converge rapidly to zero (this has been verified via numerical simulations). Thus the first term in Eq. (38) converges to zero. The same is not true for the second term; thus, it is reasonable to expect that the second term will provide an accurate approximation of the sensitivity of RL within Ω. Provided that the spread of the PSF is small relative to the size of Ω, early in the RL iterations the vector x_k is approximately constant and close to b within Ω, i.e., those pixel values are reconstructed rapidly. Hence, the vector b/(Ax_k ⊙ Ax_k) ≈ 1/(Ax_k) is also approximately constant within Ω. In addition, if we define D_Ω to be the diagonal matrix with components

[D_Ω]_jj = { 1, j ∈ Ω; 0, j ∉ Ω }, (39)

then

D_Ω A x_k ≈ D_Ω A D_Ω x_k (40)

will be accurate within the interior of Ω. To obtain Eq. (40) we used the fact that x_k is approximately constant on Ω and that the spread of A is small compared to the size of Ω. Finally, the second term of Eq. (38) can be approximated within Ω as follows:

D_Ω diag[x_k] A^T diag[ b/(Ax_k ⊙ Ax_k) ] A
≈ diag[x_k] D_Ω A^T D_Ω diag[ b/(Ax_k ⊙ Ax_k) ] A (41)
≈ diag[ D_Ω (x_k/(Ax_k)) ] A^T A (42)
≈ D_Ω A^T A. (43)

Approximation (41) follows from Eq. (40). Approximation (42) follows from the fact that, as stated above, early in the RL iterations b/(Ax_k ⊙ Ax_k) ≈ 1/(Ax_k) is approximately constant; approximation (43) follows since x_k ≈ Ax_k within Ω. Thus we see that not only does the second term in Eq. (38) not converge to zero, it is well approximated within Ω by the sensitivity (37). Recalling

that the first term in Eq. (38) converges rapidly to zero in the RL iterations, it is therefore not surprising that RL and Landweber provide similar results in the interior of the rectangular object mentioned above. We can extend this discussion to extended objects in general by noting that such objects can be decomposed into a union of regions in which the light intensity is approximately constant. Hence, RL and Landweber should provide similar results for extended objects in general.

4.3. RL vs. LS: Convergence properties

As shown above, Landweber presents an acceptable convergence rate only in the case of the restoration of extended objects. Unfortunately, understanding the convergence properties of the RL algorithm (32) is difficult, since it is not possible to carry out an analysis similar to that done in Sect. 3. For this reason we consider, again, a noise-free signal b and a Gaussian PSF with circular symmetry and variance σ_p². In addition, we suppose that the object of interest is a circular Gaussian source with variance σ_o². The amplitude of the source is not considered, since RL conserves the total number of counts. Due to the connection between the RL and Landweber iterations discussed in the previous section, an understanding of RL convergence may provide further understanding of the convergence of the Landweber iteration. For simplicity, in what follows we work in the continuum; the results will later be discretized. If a Gaussian function with variance σ² is denoted by G[σ²], then

G[σ₁²] / G[σ₂²] = G[ σ₁²σ₂² / (σ₂² − σ₁²) ] ; (44)
G[σ₁²] · G[σ₂²] = G[ σ₁²σ₂² / (σ₁² + σ₂²) ] ; (45)
G[σ₁²] ⋆ G[σ₂²] = G[ σ₁² + σ₂² ]. (46)

Here, the symbol ⋆ indicates convolution, and the equalities hold up to multiplicative constants. From these equations it is evident that each of the above operations produces a new Gaussian function. Only the first operation requires that a condition be satisfied, namely σ₂² > σ₁²; this will always be satisfied during the RL iteration (see Eq. (50) below). If we define σ̄² to be the variance of the Gaussian on the right-hand side of Eqs. (44)-(46), then for Eq. (44) we have σ̄² > σ₁²; in Eq. (45) we have σ̄² < σ₁² and σ̄² < σ₂²; and in Eq. (46) we have σ̄² = σ₁² + σ₂². Consequently, only the operation (45) results in a Gaussian function with a variance that is smaller than both σ₁² and σ₂².

If the true object x is a Gaussian with variance σ_o², then, using Eq. (46) and the fact that Ax = b, it is

σ_b² = σ_o² + σ_p². (47)

As an initial guess in the RL algorithm we take x₀ = 1. Now, let us suppose that x_k = G[σ_k²]. Using Eqs. (44)-(46) one can obtain

G[σ_{k+1}²] = G[ σ_k² (σ_k²σ_b² + σ_p⁴ + σ_k²σ_p²) / (σ_k² + σ_p²)² ]. (48)

It is a straightforward exercise to show that

R(σ_k²) = σ_{k+1}²/σ_k² = (σ_k²σ_b² + σ_p⁴ + σ_k²σ_p²) / (σ_k² + σ_p²)² < 1 (49)

provided

σ_k² > σ_b² − σ_p². (50)

To prove that Eq. (50) holds for all k, we use induction. Since x₀ = 1, A^T 1 = 1, and A^T = A, we have that x₁ = A^T b = G[σ_p² + σ_b²]. Then σ₁² = σ_p² + σ_b², and hence Eq. (50) is satisfied for k = 1. Now we show that if Eq. (50) holds for k, it must hold also for k + 1. By replacing σ_{k+1}² in Eq. (50) by the argument of the Gaussian function on the right-hand side of Eq. (48), one obtains an equivalent inequality involving σ_k², given by

q(σ_k²) > 0, (51)

where q(σ²) is defined by

q(σ²) = σ² ( σ²σ_b² + σ_p⁴ + σ²σ_p² ) − ( σ_b² − σ_p² )( σ² + σ_p² )²
      = 2σ_p² (σ²)² + ( 3σ_p⁴ − 2σ_p²σ_b² ) σ² + ( σ_p⁶ − σ_b²σ_p⁴ ).

Notice that q is a quadratic function of σ². We can therefore find its zeros via the quadratic formula. These are given by

σ² = −σ_p²/2, σ_b² − σ_p². (52)

Since σ_p² > 0, we know that the graph of q is an upward-opening parabola. Furthermore, by Eq. (47) we have σ_b² − σ_p² = σ_o² > 0, and hence we know that if σ² > σ_b² − σ_p², then q(σ²) > 0. Thus Eq. (50) follows from the inductive hypothesis, and our proof is complete. In light of these findings, it is possible to consider some convergence properties of the RL algorithm. We begin by showing that the sequence {σ_k²} converges to σ_∞² = σ_b² − σ_p². First, note that Eqs. (49) and (50) imply that {σ_k²} is a decreasing sequence that is bounded below by σ_b² − σ_p². Hence, {σ_k²} converges to some σ_∞² ≥ σ_b² − σ_p². From inequalities (49) and (50), we have that R(σ_∞²) = 1.
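The recurrence (48), the contraction factor (49) and the bound (50) are easy to explore numerically. The sketch below iterates the variance recurrence for a few choices of σ_o² (the PSF dispersion of 6 pixels matches the simulations; the other values are illustrative assumptions) and checks that σ_k² decreases monotonically toward σ_b² − σ_p² = σ_o².

```python
import numpy as np

def R(s2, sb2, sp2):
    # Eq. (49): ratio sigma_{k+1}^2 / sigma_k^2 of the RL variance recurrence
    return (s2 * sb2 + sp2**2 + s2 * sp2) / (s2 + sp2) ** 2

sp2 = 6.0**2                              # PSF variance (dispersion 6 pixels)
for so2 in (2.0**2, 6.0**2, 24.0**2):     # object variances, illustrative values
    sb2 = so2 + sp2                       # Eq. (47)
    s2 = sp2 + sb2                        # sigma_1^2 = sigma_p^2 + sigma_b^2
    for k in range(2000):
        # the bound of Eq. (50) is preserved (up to floating-point rounding)
        assert s2 > sb2 - sp2 - 1e-6
        s2 *= R(s2, sb2, sp2)             # sigma_{k+1}^2 = R(sigma_k^2) * sigma_k^2
    print(f"sigma_o^2 = {so2:6.1f}  ->  sigma_k^2 = {s2:8.3f}  (limit {sb2 - sp2:.1f})")
```

For the narrow (star-like) source the approach to the limit is slow, while for the extended source the first iterate is already close to the fixed point, in agreement with the discussion that follows.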
Furthermore, the arguments used in the proof of Eq. (51) imply that if σ² > σ_b² − σ_p², then R(σ²) < 1. Thus it must be that σ_∞² = σ_b² − σ_p² = σ_o²; that is, the iterates converge to the true object. With regard to the convergence rate of the RL algorithm, Eq. (49) shows that, almost independently of the characteristics of the object, rapid progress is made in the first iteration: when σ₁² = σ_p² + σ_b², we have

R(σ₁²) = (σ_b⁴ + 2σ_b²σ_p² + 2σ_p⁴) / (σ_b² + 2σ_p²)². (53)

In fact, if x₀ = 1 (i.e., σ₀² = ∞), it is not difficult to see that x₁ = A b, i.e., the result of the first iteration is given by G[σ_p² + σ_b²]. At this point, there are two possible situations:

1. For extended objects, we have σ_b² ≈ σ_o² ≫ σ_p². In this case, R(σ_k²) ≈ 1. Since x₁ = Ab is already close to the solution, this means that we can expect rapid progress in the early iterations; after that, the convergence rate slows down remarkably. This behavior is similar to that of the Landweber algorithm;

[Fig. 18. σ_k² vs. the number of iterations for RL in the case of a Gaussian source with various values of σ_o² and a Gaussian PSF with dispersion σ_p = 6.]
[Fig. 19. R^{1/2}(σ_k²) vs. the number of iterations for the cases shown in Fig. 18.]
[Fig. 23. Deblurring results obtained with RL and Landweber for a star-like object superimposed on an extended source given by a Gaussian with circular symmetry and dispersion set to 3 pixels. An additional background is present with a level set to a small percentage of the maximum of the noise-free blurred image. The PSF is a Gaussian with circular symmetry and dispersion set to 6 pixels. The images are 128 × 128 pixels. The noise is Poissonian with peak S/N = 30 dB (panels: original image, observed image, RL and Landweber reconstructions). Each image presented corresponds to the result with the smallest rms of the true residuals.]

2. For star-like objects, we have σ_b² ≈ σ_p². Now, if we set σ_k² = ασ_p², then

R(σ_k²) = (1 + 2α) / (1 + α)². (54)

For example, when α = 1, 1/2, 1/3, 1/6, then R(σ_k²) ≈ 0.75, 0.89, 0.94, 0.98, respectively. In other words, although the convergence rate of RL slows down as the iteration progresses, this effect is not as pronounced as it is for the Landweber algorithm.

These statements are confirmed by Figs. 18 and 19. A comparison between the RL solution for star-like sources at the kth iterate,

x_k(i, j) ∝ exp( −r² / (2σ_k²) ), (55)

where r is the distance of pixel (i, j) from the source center, with the corresponding Landweber solution

x_k(i, j) = Π_k(i, j), (56)

with Π_k the corresponding Landweber filter, provides some additional insight into the convergence properties of these algorithms. From Eq. (55) it is evident that although the high frequencies are filtered in the RL algorithm, the filter is less stringent for high frequencies than is the Landweber filter.
The consequence is that, in general, at a given k, RL has available a broader range of frequencies with which to restore the object. On the one hand, this can improve the convergence rate of RL compared to Landweber; on the other hand, it can create problems when one or more star-like objects are superimposed on an extended object. A few RL iterations are sufficient to restore the extended object; the same is not true for the star-like objects, for which more iterations are necessary. However, because of the amplification of the noise, this means the degradation of the results in the parts of the image not occupied by the star-like objects. This effect is clearly visible in the experiment shown in Fig. 23.

5. Conclusions

In this paper we have provided explanations for why, in spite of the incorporation of a priori information regarding the noise statistics of image formation, the RL deblurring algorithm often does not provide results that are superior to those obtained by techniques based on an LS approach. In particular, we have identified a possible explanation in the regularization approaches of the specific algorithms. The adoption of a penalty term in


A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;

More information

Coherent imaging without phases

Coherent imaging without phases Coherent imaging without phases Miguel Moscoso Joint work with Alexei Novikov Chrysoula Tsogka and George Papanicolaou Waves and Imaging in Random Media, September 2017 Outline 1 The phase retrieval problem

More information

More on Unsupervised Learning

More on Unsupervised Learning More on Unsupervised Learning Two types of problems are to find association rules for occurrences in common in observations (market basket analysis), and finding the groups of values of observational data

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

A theoretical analysis of L 1 regularized Poisson likelihood estimation

A theoretical analysis of L 1 regularized Poisson likelihood estimation See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/228359355 A theoretical analysis of L 1 regularized Poisson likelihood estimation Article in

More information

Application of Image Restoration Technique in Flow Scalar Imaging Experiments Guanghua Wang

Application of Image Restoration Technique in Flow Scalar Imaging Experiments Guanghua Wang Application of Image Restoration Technique in Flow Scalar Imaging Experiments Guanghua Wang Center for Aeromechanics Research Department of Aerospace Engineering and Engineering Mechanics The University

More information

Introduction. Chapter One

Introduction. Chapter One Chapter One Introduction The aim of this book is to describe and explain the beautiful mathematical relationships between matrices, moments, orthogonal polynomials, quadrature rules and the Lanczos and

More information

AN EFFICIENT COMPUTATIONAL METHOD FOR TOTAL VARIATION-PENALIZED POISSON LIKELIHOOD ESTIMATION. Johnathan M. Bardsley

AN EFFICIENT COMPUTATIONAL METHOD FOR TOTAL VARIATION-PENALIZED POISSON LIKELIHOOD ESTIMATION. Johnathan M. Bardsley Volume X, No. 0X, 200X, X XX Web site: http://www.aimsciences.org AN EFFICIENT COMPUTATIONAL METHOD FOR TOTAL VARIATION-PENALIZED POISSON LIKELIHOOD ESTIMATION Johnathan M. Bardsley Department of Mathematical

More information

Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery

Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery Daniel L. Pimentel-Alarcón, Ashish Tiwari Georgia State University, Atlanta, GA Douglas A. Hope Hope Scientific Renaissance

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Introduction to Maximum Likelihood Estimation

Introduction to Maximum Likelihood Estimation Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Krylov subspace iterative methods for nonsymmetric discrete ill-posed problems in image restoration

Krylov subspace iterative methods for nonsymmetric discrete ill-posed problems in image restoration Krylov subspace iterative methods for nonsymmetric discrete ill-posed problems in image restoration D. Calvetti a, B. Lewis b and L. Reichel c a Department of Mathematics, Case Western Reserve University,

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

APPLICATIONS OF A NONNEGATIVELY CONSTRAINED ITERATIVE METHOD WITH STATISTICALLY BASED STOPPING RULES TO CT, PET, AND SPECT IMAGING

APPLICATIONS OF A NONNEGATIVELY CONSTRAINED ITERATIVE METHOD WITH STATISTICALLY BASED STOPPING RULES TO CT, PET, AND SPECT IMAGING APPLICATIONS OF A NONNEGATIVELY CONSTRAINED ITERATIVE METHOD WITH STATISTICALLY BASED STOPPING RULES TO CT, PET, AND SPECT IMAGING JOHNATHAN M. BARDSLEY Abstract. In this paper, we extend a nonnegatively

More information

A fast nonstationary preconditioning strategy for ill-posed problems, with application to image deblurring

A fast nonstationary preconditioning strategy for ill-posed problems, with application to image deblurring A fast nonstationary preconditioning strategy for ill-posed problems, with application to image deblurring Marco Donatelli Dept. of Science and High Tecnology U. Insubria (Italy) Joint work with M. Hanke

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Mathematical Beer Goggles or The Mathematics of Image Processing

Mathematical Beer Goggles or The Mathematics of Image Processing How Mathematical Beer Goggles or The Mathematics of Image Processing Department of Mathematical Sciences University of Bath Postgraduate Seminar Series University of Bath 12th February 2008 1 How 2 How

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

OPTICAL PHOTOMETRY. Observational Astronomy (2011) 1

OPTICAL PHOTOMETRY. Observational Astronomy (2011) 1 OPTICAL PHOTOMETRY Observational Astronomy (2011) 1 The optical photons coming from an astronomical object (star, galaxy, quasar, etc) can be registered in the pixels of a frame (or image). Using a ground-based

More information

Optimization with nonnegativity constraints

Optimization with nonnegativity constraints Optimization with nonnegativity constraints Arie Verhoeven averhoev@win.tue.nl CASA Seminar, May 30, 2007 Seminar: Inverse problems 1 Introduction Yves van Gennip February 21 2 Regularization strategies

More information

Towards Improved Sensitivity in Feature Extraction from Signals: one and two dimensional

Towards Improved Sensitivity in Feature Extraction from Signals: one and two dimensional Towards Improved Sensitivity in Feature Extraction from Signals: one and two dimensional Rosemary Renaut, Hongbin Guo, Jodi Mead and Wolfgang Stefan Supported by Arizona Alzheimer s Research Center and

More information

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Sean Borman and Robert L. Stevenson Department of Electrical Engineering, University of Notre Dame Notre Dame,

More information

The Global Krylov subspace methods and Tikhonov regularization for image restoration

The Global Krylov subspace methods and Tikhonov regularization for image restoration The Global Krylov subspace methods and Tikhonov regularization for image restoration Abderrahman BOUHAMIDI (joint work with Khalide Jbilou) Université du Littoral Côte d Opale LMPA, CALAIS-FRANCE bouhamidi@lmpa.univ-littoral.fr

More information

Observer Anomaly(?): Recent Jitter and PSF Variations

Observer Anomaly(?): Recent Jitter and PSF Variations Instrument Science Report TEL 2005-01 Observer Anomaly(?): Recent Jitter and PSF Variations R. L. Gilliland gillil@stsci.edu February 2005 Abstract An anomaly in the HST Pointing Control System (PCS) has

More information

Blind Image Deconvolution Using The Sylvester Matrix

Blind Image Deconvolution Using The Sylvester Matrix Blind Image Deconvolution Using The Sylvester Matrix by Nora Abdulla Alkhaldi A thesis submitted to the Department of Computer Science in conformity with the requirements for the degree of PhD Sheffield

More information

Regression Clustering

Regression Clustering Regression Clustering In regression clustering, we assume a model of the form y = f g (x, θ g ) + ɛ g for observations y and x in the g th group. Usually, of course, we assume linear models of the form

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

Inverse Theory. COST WaVaCS Winterschool Venice, February Stefan Buehler Luleå University of Technology Kiruna

Inverse Theory. COST WaVaCS Winterschool Venice, February Stefan Buehler Luleå University of Technology Kiruna Inverse Theory COST WaVaCS Winterschool Venice, February 2011 Stefan Buehler Luleå University of Technology Kiruna Overview Inversion 1 The Inverse Problem 2 Simple Minded Approach (Matrix Inversion) 3

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

A Hybrid LSQR Regularization Parameter Estimation Algorithm for Large Scale Problems

A Hybrid LSQR Regularization Parameter Estimation Algorithm for Large Scale Problems A Hybrid LSQR Regularization Parameter Estimation Algorithm for Large Scale Problems Rosemary Renaut Joint work with Jodi Mead and Iveta Hnetynkova SIAM Annual Meeting July 10, 2009 National Science Foundation:

More information

Blind image restoration as a convex optimization problem

Blind image restoration as a convex optimization problem Int. J. Simul. Multidisci.Des. Optim. 4, 33-38 (2010) c ASMDO 2010 DOI: 10.1051/ijsmdo/ 2010005 Available online at: http://www.ijsmdo.org Blind image restoration as a convex optimization problem A. Bouhamidi

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

DETERMINATION OF HOT PLASMA CHARACTERISTICS FROM TRACE IMAGES. S. Gburek 1 and T. Mrozek 2

DETERMINATION OF HOT PLASMA CHARACTERISTICS FROM TRACE IMAGES. S. Gburek 1 and T. Mrozek 2 DETERMINATION OF HOT PLASMA CHARACTERISTICS FROM TRACE IMAGES. S. Gburek 1 and T. Mrozek 2 1 Space Research Centre, Polish Academy of Sciences, Solar Physics Division, 51-622 Wroclaw, ul. Kopernika 11,

More information

Choosing the Regularization Parameter

Choosing the Regularization Parameter Choosing the Regularization Parameter At our disposal: several regularization methods, based on filtering of the SVD components. Often fairly straightforward to eyeball a good TSVD truncation parameter

More information

Accelerated Gradient Methods for Constrained Image Deblurring

Accelerated Gradient Methods for Constrained Image Deblurring Accelerated Gradient Methods for Constrained Image Deblurring S Bonettini 1, R Zanella 2, L Zanni 2, M Bertero 3 1 Dipartimento di Matematica, Università di Ferrara, Via Saragat 1, Building B, I-44100

More information

CSE 554 Lecture 7: Alignment

CSE 554 Lecture 7: Alignment CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with H. Bauschke and J. Bolte Optimization

More information

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016 CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday

More information

Tikhonov Regularization in General Form 8.1

Tikhonov Regularization in General Form 8.1 Tikhonov Regularization in General Form 8.1 To introduce a more general formulation, let us return to the continuous formulation of the first-kind Fredholm integral equation. In this setting, the residual

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Lecture 6 Positive Definite Matrices

Lecture 6 Positive Definite Matrices Linear Algebra Lecture 6 Positive Definite Matrices Prof. Chun-Hung Liu Dept. of Electrical and Computer Engineering National Chiao Tung University Spring 2017 2017/6/8 Lecture 6: Positive Definite Matrices

More information

BLIND SEPARATION OF TEMPORALLY CORRELATED SOURCES USING A QUASI MAXIMUM LIKELIHOOD APPROACH

BLIND SEPARATION OF TEMPORALLY CORRELATED SOURCES USING A QUASI MAXIMUM LIKELIHOOD APPROACH BLID SEPARATIO OF TEMPORALLY CORRELATED SOURCES USIG A QUASI MAXIMUM LIKELIHOOD APPROACH Shahram HOSSEII, Christian JUTTE Laboratoire des Images et des Signaux (LIS, Avenue Félix Viallet Grenoble, France.

More information

A study on regularization parameter choice in Near-field Acoustical Holography

A study on regularization parameter choice in Near-field Acoustical Holography Acoustics 8 Paris A study on regularization parameter choice in Near-field Acoustical Holography J. Gomes a and P.C. Hansen b a Brüel & Kjær Sound and Vibration Measurement A/S, Skodsborgvej 37, DK-285

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Generalized Local Regularization for Ill-Posed Problems

Generalized Local Regularization for Ill-Posed Problems Generalized Local Regularization for Ill-Posed Problems Patricia K. Lamm Department of Mathematics Michigan State University AIP29 July 22, 29 Based on joint work with Cara Brooks, Zhewei Dai, and Xiaoyue

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information