Image Reconstruction in Radio Interferometry

Draft January 28, 2008

S.T. Myers
National Radio Astronomy Observatory, P.O. Box O, Socorro, NM 87801
and
Los Alamos National Laboratory, Los Alamos, NM 87545

ABSTRACT

The problem of image reconstruction in radio interferometry is traditionally solved through the application of iterative deconvolution algorithms such as CLEAN and MEM. Imaging from interferometer data is hampered by incomplete sampling of the aperture plane, but aided by our knowledge of the electrical and optical properties of the antennas. The imaging equation is linear but not invertible. Solving it requires a statistical approach, both because of the instrumental and sky noise and because prior knowledge of the image properties must be applied for convergence. In this article, I describe the mathematical formalism for standard radio (multiplicative) interferometry and suggest some interesting avenues for exploration of new reconstruction techniques.

Subject headings: radio astronomy, interferometry, data analysis

1. Introduction

Radio astronomy has flourished in the past 50 years, largely due to the development of connected-element and very long baseline interferometry, allowing the synthesis of apertures much larger than those allowed by the physical construction of single telescopes. Radio interferometry is characterized by its ability to measure the complex values (amplitude and phase) of the correlations of the electric field, sampled as voltages amplified by the telescope receivers. The ability to measure the phases of the correlations is critical to radio interferometric imaging. Because most radio interferometers are designed to obtain the highest angular resolutions while retaining the minimal surface brightness sensitivities, they tend to be sparse arrays equivalent to unfilled apertures.
This in turn leads to a number of challenges in image reconstruction from these instruments, some of which have not been satisfactorily solved, particularly in the context of high dynamic range and high fidelity that we desire of our next generation of radio arrays. In this article, we review the state of the art in (radio) interferometric image reconstruction and deconvolution,

outline the challenges that must be overcome for the success of future instruments, and propose some avenues for exploration.

There are a number of books that can serve as references on the theory and techniques of radio interferometry, for example Thompson, Moran, & Swenson (1986) and Taylor & Carilli (1999). Examples of radio interferometers include the Very Large Array (VLA) and Very Long Baseline Array (VLBA), both of which are described in Taylor & Carilli (1999).

2. Interferometry

For simplicity, we illustrate the specific case of an interferometer such as the Cosmic Background Imager (CBI; Padin et al. 2001) with antennas located in a plane. In the following exposition of interferometric imaging, we have built upon the ideas presented in Myers et al. (2003), and adopted the notation used in Myers et al. (2006). In this interferometer, the complex voltage from each pair of elements is brought together and multiplied in the correlator. This is shown schematically in Figure 1 (reproduced from Myers et al. 2006). Thus, a wavefront arriving at the interferometer along the array axis ends up in phase at each multiplier.

In a general interferometer, such as the VLA or VLBA, the elements are located in 3-dimensional space, though usually roughly confined to the surface of the Earth at varying elevations. In this case, delays are introduced in the signal chain from each element to the correlator to compensate, bringing into phase wavefronts from a particular direction referred to as the phase center. The phase center is usually, though not always, set to be the pointing direction of the antennas of the array at a given time. For connected-element interferometers such as the VLA, these delays are adjusted in real time, while the VLBA records the signals at each antenna and then plays them back at the correlator, introducing the proper delays at that time. For our purposes, this is equivalent to the planar array with a phase center oriented along the array normal.
We calibrate such a system so that a point source located at infinite distance in the pointing direction of the array, whose wavefronts are thus perpendicular to the normal to the array, produces a correlated signal with constant amplitude equal to the flux density of the source in Janskys (Jy; 1 Jy = $10^{-26}$ W m$^{-2}$ Hz$^{-1}$).

For interferometers that have a field-of-view restricted to a few degrees or less, it is convenient to use a celestial coordinate system defined in a tangent plane perpendicular to the phase center direction. As in Myers et al. (2003), we define direction cosines as a vector $\boldsymbol{\theta}$, with

$$\boldsymbol{\theta} = (\theta_x, \theta_y), \qquad \theta_x = \cos\delta\,\sin(\alpha - \alpha_0), \qquad \theta_y = \sin\delta\,\cos\delta_0 - \cos\delta\,\sin\delta_0\,\cos(\alpha - \alpha_0) \tag{1}$$

between the direction with right ascension and declination $\alpha, \delta$ and the phase center of the array

in direction $\alpha_0, \delta_0$. The $\boldsymbol{\theta}$ are then Cartesian coordinates in this tangent plane.

Fig. 1. Schematic of a planar interferometer such as the CBI. Signals from pairs of elements are correlated, with the delays set such that the signals from on-axis wavefronts arrive coherently at the multipliers. Wavefronts coming from an angle $\theta$ off-axis correlate with a residual phase $2\pi\theta B/\lambda$ between antenna pairs with projected baseline $B$ at observing wavelength $\lambda$. The real and imaginary parts of the complex correlations are computed, and the outputs are complex uncalibrated visibilities. (Reproduced from Myers et al. 2006.)

Consider a pair of antennas in the array separated by the baseline vector (in meters) $\mathbf{B}_{ij}$ from antenna $i$ to antenna $j$, projected to the aperture plane perpendicular to the phase center direction. We choose a convention with the positive y-axis of both $\boldsymbol{\theta}$ and $\mathbf{B}_{ij}$ in the direction toward the North Celestial Pole, and the positive x-axis towards the East. Then, the phase of the correlation for this

baseline for a source in direction $\boldsymbol{\theta}$ is

$$\phi_{ij} = -2\pi\,\boldsymbol{\theta}\cdot\mathbf{B}_{ij}/\lambda \tag{2}$$

where $\lambda$ is the observing wavelength (in meters). Therefore, we can represent the response to a point source of flux density $S_0$ at $\boldsymbol{\theta}$ as

$$S_{ij} = S_0\, e^{i\phi_{ij}} = S_0\, e^{-i 2\pi\,\boldsymbol{\theta}\cdot\mathbf{B}_{ij}/\lambda}. \tag{3}$$

The phase factor is the same as the phase of the kernel of a Fourier transform between the aperture plane of $\mathbf{B}$ and the sky plane of $\boldsymbol{\theta}$, and the interferometry equations can thus be formulated in terms of Fourier transforms of the sky brightness projected onto tangent planes (Thompson, Moran, & Swenson 1986).

The natural decomposition of the sky into waves is through spherical harmonics. For angular scales of a few degrees or less, the spherical harmonics become Fourier modes, and the sky decomposes into plane waves $e^{i\mathbf{k}\cdot\boldsymbol{\theta}}$ with k-vectors $\mathbf{k} = 2\pi\mathbf{u}$. Here $\mathbf{u} = (u, v)$ is the coordinate in the aperture plane, commonly called the uv-plane in radio interferometry, and $\tilde{I}$ is the Fourier transform of the sky intensity field $I$. We have adopted the convention

$$\tilde{I}(\mathbf{u}) = \int d^2\theta\; I(\boldsymbol{\theta})\, e^{-2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}} \tag{4}$$

with inverse transform

$$I(\boldsymbol{\theta}) = \int d^2u\; \tilde{I}(\mathbf{u})\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}}. \tag{5}$$

Since $I$ is real, the transform $\tilde{I}$ is Hermitian in the complex uv-plane and only the real and imaginary parts over a half-plane are independent.

Consider the case where our interferometer is sensitive only to the intensity of the radiation. In the absence of noise, our interferometer would then measure a set of complex visibilities, one for each baseline pair of antennas and each time and frequency channel. Our interferometer sums the sky signal over all relative pointing directions $\boldsymbol{\theta}$ with the phase in the aperture plane given above. The wavefronts are modified by the aperture illumination functions of the individual antennas. For visibility index $k$, we write in the absence of noise,

$$\tilde{v}_k = \int d^2u\; \tilde{A}_k(\mathbf{u}_k - \mathbf{u})\, \tilde{I}(\mathbf{u})\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k} \tag{6}$$

$$\phantom{\tilde{v}_k} = \int d^2\theta\; A_k(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, I(\boldsymbol{\theta})\, e^{-2\pi i\,\mathbf{u}_k\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)} \tag{7}$$

where $\boldsymbol{\theta}_k$ designates the pointing direction of the array with respect to the phase center when visibility $k$ was taken.
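As a concrete illustration of Equations 1 and 7, the following Python sketch computes tangent-plane offsets and the noise-free visibility of a single point source. It assumes NumPy; the function names, the Gaussian primary beam, and the `beam_fwhm` parameter are hypothetical choices made for this example and are not taken from any interferometry package.

```python
import numpy as np

def direction_cosines(ra, dec, ra0, dec0):
    """Tangent-plane offsets (theta_x, theta_y) of Equation 1; angles in radians."""
    dra = ra - ra0
    theta_x = np.cos(dec) * np.sin(dra)
    theta_y = (np.sin(dec) * np.cos(dec0)
               - np.cos(dec) * np.sin(dec0) * np.cos(dra))
    return np.array([theta_x, theta_y])

def point_source_visibility(flux, theta_src, u_k, theta_k=(0.0, 0.0), beam_fwhm=None):
    """Equation 7 for I(theta) = flux * delta(theta - theta_src).

    u_k is the baseline in wavelengths, theta_k the pointing offset.
    The Gaussian primary beam is an assumption of this sketch; with
    beam_fwhm=None the beam is taken to be unity everywhere.
    """
    offset = np.asarray(theta_src, dtype=float) - np.asarray(theta_k, dtype=float)
    if beam_fwhm is None:
        beam = 1.0
    else:
        sigma = beam_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
        beam = np.exp(-0.5 * np.dot(offset, offset) / sigma**2)
    phase = -2.0 * np.pi * np.dot(u_k, offset)
    return flux * beam * np.exp(1j * phase)

# A 2 Jy source offset by 1 mrad in theta_x, seen on a 250-wavelength
# east-west baseline, is rotated by a quarter turn in phase.
vis = point_source_visibility(2.0, (1e-3, 0.0), np.array([250.0, 0.0]))
```

A source at the phase center returns the flux density itself, which is the calibration condition described above; an extended sky would be handled by summing this expression over pixels.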
In general, $\boldsymbol{\theta}_k$ is fixed (and is zero when pointed at the phase center) if we are observing a single source, or will raster across the sky if imaging a mosaic. The function $\tilde{A}_k$ is the cross-correlation of the aperture (complex voltage) patterns of the pair of antennas involved in the visibility correlation, and $A_k$ is the primary beam of the interferometer

$$A_k(\boldsymbol{\theta}) = \int d^2u\; \tilde{A}_k(\mathbf{u})\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}} \tag{8}$$

which sets the field-of-view. Note that the aperture function for correlation $k$ between antennas $i$ and $j$ observed at wavelength $\lambda_k$ is centered at

$$\mathbf{u}_k = \mathbf{B}_{ij}/\lambda_k \tag{9}$$

and has a width equal to the sum of the dish diameters in wavelengths, with no support outside this region of the uv-plane:

$$\tilde{A}_k(\mathbf{u}) = 0 \quad \text{for} \quad |\mathbf{u}| > (D_i + D_j)/\lambda_k \tag{10}$$

for antenna diameters $D_i$ and $D_j$. This is illustrated in Figure 2.

Fig. 2. Left: The CBI. Each baseline between pairs of antennas is correlated. If a coherent signal were transmitted from the pair, it would project a sinusoidal fringe on the sky with k-vector $\mathbf{k} = 2\pi\mathbf{B}/\lambda$. Thus, a given baseline is sensitive to plane waves with this $\mathbf{k}$. Right: In the uv-plane, the baseline is the center of a locus of points of width equal to the sum of the dish diameters in wavelengths. The correlation sums together the plane waves whose $\mathbf{k} = 2\pi\mathbf{u}$ fall in this region. (Adapted from Figure 2 in Myers et al. 2006.)

If we sampled the entire uv-plane with $\tilde{v}$ at a single pointing $\boldsymbol{\theta}_k$ using a single aperture function $\tilde{A}_k = \tilde{A}$, then we could take the inverse Fourier transform of Equation 6,

$$\int d^2u'\; \tilde{v}(\mathbf{u}')\, e^{2\pi i\,\mathbf{u}'\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)} = A(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, I(\boldsymbol{\theta}). \tag{11}$$

Thus, we expect that by Fourier transforming our visibilities we can recover an image of the sky $I$ multiplied by the primary beam $A$.
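To make Equations 9 and 10 concrete, the sketch below converts a baseline in meters to uv coordinates and flags which cells of a uv grid can contribute to a given visibility. It assumes NumPy; the function names and the example dish sizes are illustrative only, and the support limit is the one quoted in Equation 10.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light

def uv_center(baseline_m, freq_hz):
    """Equation 9: u_k = B_ij / lambda_k, in wavelengths."""
    return np.asarray(baseline_m, dtype=float) * freq_hz / C_M_PER_S

def cells_with_support(u_grid, u_k, d_i, d_j, freq_hz):
    """Mask of uv cells u where A~_k(u_k - u) may be non-zero (Equation 10).

    u_grid has shape (n_cells, 2); d_i and d_j are dish diameters in meters.
    """
    limit = (d_i + d_j) * freq_hz / C_M_PER_S
    return np.linalg.norm(u_grid - u_k, axis=-1) <= limit

# Illustrative numbers: a 1 m east-west baseline of 0.9 m dishes at 30 GHz.
u_k = uv_center([1.0, 0.0], 30e9)
grid = np.array([[100.0, 0.0], [100.0, 150.0], [400.0, 0.0]])
mask = cells_with_support(grid, u_k, 0.9, 0.9, 30e9)
```

Only the cells flagged by the mask need to be included in the sum over uv cells for that visibility, which is what keeps the response operator sparse in practice.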

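2.1. Interferometric Polarimetry

Radio receivers are sensitive to the polarization of the incoming electromagnetic waves. Typically, the system is set up to output one or both of a pair of orthogonal polarizations, either linear (X and Y) or circular (R and L). Our correlator will then produce products $pq$ of these, $pq$ = RR, LL, RL, LR for circularly polarized receiver systems (such as the VLA, VLBA or CBI), or $pq$ = XX, YY, XY, YX for a linearly polarized system (such as ATCA or ALMA). We then need to map these to the intensity fields corresponding to the Stokes parameters $I, Q, U, V$. See the series of papers Hamaker et al. (1996), Sault et al. (1996), and Hamaker (2000) for a detailed description of radio interferometric polarimetry.

Consider an interferometer sensitive to circular polarization (R and L). Then, in the absence of noise,

$$\tilde{v}^{RR}_k = \int d^2\theta\; A^{RR}_k(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, [I(\boldsymbol{\theta}) + V(\boldsymbol{\theta})]\, e^{-2\pi i\,\mathbf{u}_k\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)} \tag{12}$$

$$\tilde{v}^{RL}_k = \int d^2\theta\; A^{RL}_k(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, [Q(\boldsymbol{\theta}) + i\,U(\boldsymbol{\theta})]\, e^{-2\pi i\,\mathbf{u}_k\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)}\, e^{-i2\psi_k} \tag{13}$$

$$\tilde{v}^{LR}_k = \int d^2\theta\; A^{LR}_k(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, [Q(\boldsymbol{\theta}) - i\,U(\boldsymbol{\theta})]\, e^{-2\pi i\,\mathbf{u}_k\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)}\, e^{i2\psi_k} \tag{14}$$

$$\tilde{v}^{LL}_k = \int d^2\theta\; A^{LL}_k(\boldsymbol{\theta} - \boldsymbol{\theta}_k)\, [I(\boldsymbol{\theta}) - V(\boldsymbol{\theta})]\, e^{-2\pi i\,\mathbf{u}_k\cdot(\boldsymbol{\theta} - \boldsymbol{\theta}_k)} \tag{15}$$

or in the Fourier domain

$$\tilde{v}^{RR}_k = \int d^2u\; \tilde{A}^{RR}_k(\mathbf{u}_k - \mathbf{u})\, [\tilde{I}(\mathbf{u}) + \tilde{V}(\mathbf{u})]\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k} \tag{16}$$

$$\tilde{v}^{RL}_k = \int d^2u\; \tilde{A}^{RL}_k(\mathbf{u}_k - \mathbf{u})\, [\tilde{Q}(\mathbf{u}) + i\,\tilde{U}(\mathbf{u})]\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k}\, e^{-i2\psi_k} \tag{17}$$

$$\tilde{v}^{LR}_k = \int d^2u\; \tilde{A}^{LR}_k(\mathbf{u}_k - \mathbf{u})\, [\tilde{Q}(\mathbf{u}) - i\,\tilde{U}(\mathbf{u})]\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k}\, e^{i2\psi_k} \tag{18}$$

$$\tilde{v}^{LL}_k = \int d^2u\; \tilde{A}^{LL}_k(\mathbf{u}_k - \mathbf{u})\, [\tilde{I}(\mathbf{u}) - \tilde{V}(\mathbf{u})]\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k}. \tag{19}$$

The $Q$ and $U$ Stokes fields describe the linear polarization, and represent components of a polarization (pseudo-)vector in the coordinate frame of the celestial sphere, with $Q$ aligned North ($Q > 0$) or East ($Q < 0$) and with $U$ at $\pm\pi/4$ with respect to North (NE/NW). The Stokes $V$ field represents the circular polarization, with handedness of RCP ($V > 0$) and LCP ($V < 0$), again with respect to the celestial sphere. See Hamaker & Bregman (1996) for a discussion of the conventions adopted in interferometry.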
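The mapping between Stokes parameters and circular correlation products can be sketched for the simplest case of a point source at the phase center with unit primary beam, where the integrals in Equations 12 to 15 reduce to algebra. The sketch assumes NumPy; the function names are hypothetical, and the sign of the rotation applied to RL, exp(-2i psi), is an assumed convention that should be checked against the data at hand.

```python
import numpy as np

def stokes_to_circular(i, q, u, v, psi):
    """Circular correlation products for Stokes (I, Q, U, V) at angle psi.

    Assumes a unit primary beam and a source at the phase center, and the
    convention RL ~ (Q + iU) exp(-2i psi), LR ~ (Q - iU) exp(+2i psi).
    """
    rot = np.exp(-2j * psi)
    return {
        "RR": i + v,
        "RL": (q + 1j * u) * rot,
        "LR": (q - 1j * u) * np.conj(rot),
        "LL": i - v,
    }

def circular_to_stokes(vis, psi):
    """Invert stokes_to_circular by de-rotating RL and LR, then combining."""
    rot = np.exp(-2j * psi)
    p = vis["RL"] / rot               # Q + iU
    p_conj = vis["LR"] / np.conj(rot)  # Q - iU
    i = 0.5 * (vis["RR"] + vis["LL"])
    v = 0.5 * (vis["RR"] - vis["LL"])
    q = 0.5 * (p + p_conj)
    u = -0.5j * (p - p_conj)
    return i, q, u, v

vis = stokes_to_circular(1.0, 0.1, -0.2, 0.05, psi=0.3)
stokes = circular_to_stokes(vis, psi=0.3)
```

The round trip recovers the input Stokes values only when the same angle is used in both directions, which is why the angle must be tracked per visibility.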
The quantity $\psi_k$ is the parallactic angle of the visibility $k$, and represents the relative orientation of the antenna frame and the celestial frame for the antenna pair in the correlation. For the planar CBI array, the antennas are locked to the deck of the telescope and thus rotate along with the baselines with respect to the sky. Then $\psi_k = \chi(\mathbf{u}_k) - \chi_{ij0}$, where $\chi(\mathbf{u}_k)$ is the baseline

orientation of visibility $k$,

$$\chi(\mathbf{u}) = \tan^{-1}(v/u) \tag{20}$$

and $\chi_{ij0}$ is the reference orientation of baseline $ij$ in the array (e.g. at deck-angle zero).

For a tracking array like the VLA, the parallactic angle depends upon the design of the telescope mount. For an equatorial mount, such as on the WSRT, the receiver system maintains the same orientation with respect to the sky unless physically rotated around its optical axis, thus usually $\psi_k = 0$ in this case. For antennas on altitude-azimuth mounts, the parallactic angle rotates as the pointing direction moves across the sky. In general, $2\psi_k = \psi_i + \psi_j$ for a baseline between antennas $i$ and $j$. For antennas located in the same geographical location such as the VLA, $\psi_i \approx \psi_j \approx \psi_k$, while for widely separated antennas such as in the VLBA, these are different and must be computed separately. For alt-az telescopes, at time $t$

$$\psi(t) = \arctan\left[\frac{\cos(\mathrm{lat})\,\sin(\mathrm{ha}(t))}{\sin(\mathrm{lat})\,\cos(\delta) - \cos(\mathrm{lat})\,\sin(\delta)\,\cos(\mathrm{ha}(t))}\right] \tag{21}$$

where lat is the latitude of the telescope location and ha is the hour angle of the source, that is, the difference between its right ascension $\alpha$ and the current local sidereal time (LST), which is the right ascension currently on the meridian. See Taylor & Carilli (1999) for more details.

From Equation 17 one can see that the linear polarization of the radiation field is described by the combination

$$\tilde{P}(\mathbf{u}) = \tilde{Q}(\mathbf{u}) + i\,\tilde{U}(\mathbf{u}) \tag{22}$$

which is the Fourier transform of the complex polarization field

$$P(\boldsymbol{\theta}) = Q(\boldsymbol{\theta}) + i\,U(\boldsymbol{\theta}). \tag{23}$$

Note that $\tilde{P}$ is defined over the entire complex uv-plane and is not Hermitian, although $\tilde{Q}$ and $\tilde{U}$ are individually. Then

$$\tilde{v}^{RL}_k = \int d^2u\; \tilde{A}^{RL}_k(\mathbf{u}_k - \mathbf{u})\, \tilde{P}(\mathbf{u})\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k}\, e^{-i2\psi_k} \tag{24}$$

$$\tilde{v}^{LR}_k = \int d^2u\; \tilde{A}^{LR}_k(\mathbf{u}_k - \mathbf{u})\, \tilde{P}^*(-\mathbf{u})\, e^{2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_k}\, e^{i2\psi_k}. \tag{25}$$

3. Basics of Interferometric Imaging

The observed visibilities are related to the distribution of (polarized) intensity on the sky through the above systems of linear integral equations, Equations 12 to 15 and 16 to 19.
We can assemble the visibility data points into a data vector $\tilde{v}$, and operate on this accordingly. Consider a sky signal vector $s$ with $s_i = S(\boldsymbol{\theta}_i)$ over a set of pixels $i$ on the sky. This true image $s$ is related to the Fourier domain signal $\tilde{s}$ through

$$s = F\,\tilde{s}, \qquad \tilde{s} = F^{-1} s \tag{26}$$

where $\tilde{s}_l = \tilde{S}(\mathbf{u}_l)$ for uv-plane cell $\mathbf{u}_l = (u_l, v_l)$, and $F$ is the Fourier transform linear (matrix) operator with inverse $F^{-1}$,

$$F_{il} = e^{2\pi i\,\mathbf{u}_l\cdot\boldsymbol{\theta}_i}, \qquad F^{-1}_{li} = e^{-2\pi i\,\mathbf{u}_l\cdot\boldsymbol{\theta}_i}. \tag{27}$$

We can set up the interferometric imaging problem as a linear operation relating the time-ordered correlations or visibilities $\tilde{v}$ to the uv-plane signal $\tilde{s}$. We turn Equation 6 into a linear algebraic matrix operation

$$\tilde{v} = \tilde{A}\,\tilde{s} + \tilde{n} \tag{28}$$

where we have now added an instrumental noise vector $\tilde{n}$ to the visibility data. The elements of $\tilde{A}$ contain the aperture cross-correlation function and interferometer phase factor of Equation 6,

$$\tilde{A}_{kl} = \tilde{A}_k(\mathbf{u}_k - \mathbf{u}_l)\, e^{2\pi i\,\mathbf{u}_l\cdot\boldsymbol{\theta}_k}. \tag{29}$$

Thus $\tilde{A}$ has dimensions of the number of visibilities $k$ times the number of cells $l$ in uv-space. The sampling of the uv-plane is encapsulated in the set of $\mathbf{u}_k$ spanned by the visibilities.

We also treat the data and signal vectors as real, although they are most naturally formulated as complex quantities. In this case, the real and imaginary parts can be packed into the real vector with the transform symmetries built into the kernels for the operators. We will switch between real quantities in the matrix notation and complex quantities where convenient, so beware.

It is the noise and the incomplete sampling of the uv-plane that prevent us from simply inverting the imaging equation, e.g. computing $\hat{s} = \tilde{A}^{-1}\tilde{v}$. There are three main effects to deal with when looking for possible solutions to Equation 28:

1. unknown noise $\tilde{n}$,
2. convolution due to the non-zero size (in uv-space) of kernel $\tilde{A}$, and
3. incomplete sampling of the uv-plane.

The first effect leads us to adopt a statistical inference approach to the reconstruction, while the third leads us to apply some sort of prior information to break degeneracies in solutions caused by the incomplete sampling.

Although the exact values of the noise vector $\tilde{n}$ are unknowable, we can use the probability distribution function for $\tilde{n}$ to determine the best (most probable) model for $\tilde{s}$.
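For example, we will assume that $\tilde{n}$ is a realization of Gaussian noise with zero mean and covariance

$$\tilde{N} = \langle \tilde{n}\,\tilde{n}^t \rangle. \tag{30}$$

Note that the noise need not be independent between visibilities and thus $\tilde{N}$ need not be diagonal. The probability distribution function for the noise $\tilde{n} = \tilde{v} - \tilde{A}\,\tilde{s}$ is thus given by the likelihood function

$$L = \left[\det\!\left(\tilde{N}^{-1}/2\pi\right)\right]^{1/2} \exp\left[-\tfrac{1}{2}\,(\tilde{v} - \tilde{A}\,\tilde{s})^t\, \tilde{N}^{-1}\, (\tilde{v} - \tilde{A}\,\tilde{s})\right]. \tag{31}$$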
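The matrix form of the imaging equation and the likelihood of Equation 31 can be sketched on a one-dimensional toy uv-axis. The sketch assumes NumPy; the Gaussian form of the aperture kernel, the grid sizes, the noise level, and all function names are assumptions made for illustration. Complex residuals are packed into a real vector, following the real-vector convention described above.

```python
import numpy as np

def response_matrix(u_vis, u_cells, kernel_width, theta_k=0.0):
    """A~_kl = A~(u_k - u_l) exp(2 pi i u_l theta_k), Equation 29, in 1-D.

    A Gaussian A~ is assumed here; a real aperture function has compact support.
    """
    du = u_vis[:, None] - u_cells[None, :]
    phase = np.exp(2j * np.pi * u_cells[None, :] * theta_k)
    return np.exp(-0.5 * (du / kernel_width) ** 2) * phase

def pack_real(z):
    """Stack real and imaginary parts into one real vector."""
    return np.concatenate([z.real, z.imag])

def log_likelihood(vis, A, s_fourier, noise_cov):
    """ln L of Equation 31 for the real-packed residual v - A s."""
    r = pack_real(vis - A @ s_fourier)
    chi2 = r @ np.linalg.solve(noise_cov, r)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * noise_cov)
    return -0.5 * (chi2 + logdet)

rng = np.random.default_rng(0)
u_cells = np.arange(-8, 9, dtype=float)      # 17 uv cells
u_vis = np.array([-5.5, -2.0, 1.25, 4.0])    # 4 visibilities
A = response_matrix(u_vis, u_cells, kernel_width=1.0)
s_true = rng.normal(size=u_cells.size) + 1j * rng.normal(size=u_cells.size)
sigma = 0.1
N = sigma**2 * np.eye(2 * u_vis.size)        # independent noise on re and im
noise = sigma * (rng.normal(size=u_vis.size) + 1j * rng.normal(size=u_vis.size))
vis = A @ s_true + noise                     # Equation 28
print(log_likelihood(vis, A, s_true, N))
print(log_likelihood(vis, A, np.zeros_like(s_true), N))
```

With four visibilities and seventeen uv cells the matrix has rank of at most four, so many different signals reproduce the data equally well; this is the degeneracy that prior information has to break.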

By solving for the signal that maximizes $L$ (solving $dL/d\tilde{s} = 0$), one can construct the maximum likelihood estimate (MLE) model $\tilde{m}_{MLE}$ for the signal $\tilde{s}$,

$$\tilde{m}_{MLE} = \tilde{R}_{MLE}^{-1}\, \tilde{A}^t\, \tilde{N}^{-1}\, \tilde{v} \tag{32}$$

where

$$\tilde{R}_{MLE} = \tilde{A}^t\, \tilde{N}^{-1}\, \tilde{A}. \tag{33}$$

See Hobson & Maisinger (2002) for an example of this in CMB interferometer analysis. In effect, to obtain this MLE model we are convolving the data with kernel $\tilde{A}^t$, and then deconvolving with $\tilde{R}^{-1}$. The inverse noise variance weighting that optimizes the signal-to-noise ratio is given by $\tilde{N}^{-1}$. In most cases however, the operator $\tilde{R}$ is ill-conditioned (and usually singular) as it involves convolution by $\tilde{A}$ and also has zeroes due to the incomplete Fourier-space sampling described above.

The noise and incompleteness of the information make reconstruction inexact except in the case of very simple signals. This makes the process of deconvolution and imaging in radio interferometry often more of an art than an algorithm, as one tries to find a method to obtain an appropriate model $\tilde{m}$ that can be transformed into an acceptable image $m = F\,\tilde{m}$.

We can consider $\tilde{m}_{MLE}$ as a special case of a general map

$$\tilde{m} = \tilde{R}^{-1}\,\tilde{d} \tag{34}$$

where

$$\tilde{d} = H\,\tilde{v} = \tilde{R}\,\tilde{s} + \tilde{n}_d \tag{35}$$

is a gridded version of the visibilities using kernel $H$. In this case

$$\tilde{R} = H\,\tilde{A} \tag{36}$$

is the response operator of the gridded data to the signal, and

$$\tilde{n}_d = H\,\tilde{n} \tag{37}$$

is the gridded noise vector. Note that using $H_{MLE} = \tilde{A}^t\,\tilde{N}^{-1}$ recovers the MLE map $\tilde{m}_{MLE}$.

Note that no more information is contained in $\tilde{m}$ (or $m$) than in $\tilde{d}$, even assuming $\tilde{R}$ is invertible. Therefore, one need only convolve the data with a kernel that approximates $H_{MLE}$ to be close to the optimal solution. The $\tilde{d}$ can thus be used for further processing and analysis.

One can make a dirty (i.e. not deconvolved) image $d$ of the sky by transforming

$$d = F\,\tilde{d} = R\,s + n_d, \qquad R = F\,\tilde{R}\,F^{-1}. \tag{38}$$

The matrix $R$ encodes the mapping from the true sky $s$ to the image $d$. For example, the vector formed from the diagonal of $R$ corresponds to the primary beam of the array, relating a unit

input sky pixel $s_i$ to an output pixel $d_i$, while the point spread functions (PSF) are a set of vectors taken from the rows of $R$ normalized by the diagonal.

Note that these dirty images are effectively multiplied by an extra primary beam attenuation factor, due to the presence of $\tilde{A}^t$ in $H$. A standard dirty interferometer image, for example made with AIPS, would use a much narrower gridding convolution function. Note that this factor is necessary in order to optimally weight and produce a single mosaiced image made from visibilities taken in a raster of different pointings $\boldsymbol{\theta}_k$ (see Myers et al. 2003 for a discussion of this). This does mean that in images made through transforming gridded data using this kernel, signals from sources in the image are attenuated by the square of the primary beam instead of just by the primary beam. This can be corrected for by dividing by the square of the primary beam to make a corrected image.

3.1. The Point-spread Function

One can construct a vector for the response to a point source at a fiducial position $\boldsymbol{\theta}_0$ using Equation 4,

$$s_{PSF}(\boldsymbol{\theta}) = \delta^2(\boldsymbol{\theta} - \boldsymbol{\theta}_0), \qquad \tilde{s}_{PSF}(\mathbf{u}) = e^{-2\pi i\,\mathbf{u}\cdot\boldsymbol{\theta}_0} \tag{39}$$

where $\delta$ is the Dirac delta function. Then,

$$\tilde{d}_{PSF} = \tilde{R}\,\tilde{s}_{PSF}, \qquad d_{PSF} = R\,s_{PSF} \tag{40}$$

where the map $d_{PSF}$ is what is traditionally referred to as the point-spread function (PSF). Since $s_{PSF}$ is a delta function, the PSF $d_{PSF}$ is just a column of $R$ for the pixel corresponding to $\boldsymbol{\theta}_0$. This is another way of saying that Equation 38 implies that our dirty image is the true sky convolved with the response $R$. Note that in general sources at different positions $\boldsymbol{\theta}_0$ have different PSF vectors, particularly if they are made from mosaiced data (which gives different weights to different patches of the image plane). For simple images made from single field data, the PSF is the same across the image except for the amplitude, which reflects the primary beam attenuation.

3.2. Polarimetric Imaging

As in the definition of $\tilde{s}$, we can define a polarization vector $\tilde{p}$ with $\tilde{p}_l = \tilde{P}(\mathbf{u}_l)$. Using Equation 24, we have a polarization imaging equation corresponding to the intensity Equation 28,

$$\tilde{v}^P = \tilde{A}^P\,\tilde{p} + \tilde{n}^P \tag{41}$$

with

$$\tilde{A}^P_{kl} = \tilde{A}^P_k(\mathbf{u}_k - \mathbf{u}_l)\, e^{2\pi i\,\mathbf{u}_l\cdot\boldsymbol{\theta}_k}\, e^{-i2\psi_k} \tag{42}$$

where as before $\psi_k$ is the parallactic angle of visibility $k$. The $\tilde{v}^P$ is equivalent to an RL correlation product as it works on $P = Q + iU$. The LR visibilities are treated as complex-conjugated RL correlations reflected about the origin in the uv-plane. Note that in most data that one gets from radio interferometers the parallactic angle rotation $e^{-i2\psi_k}$ (for RL correlation products) is already applied so that the visibility data can be transformed easily to make $Q$ and $U$ Stokes images in the standard software packages. Also, the polarization primary beam transform $\tilde{A}^P$ is in general different from $\tilde{A}$ for the intensity, unless the optical system is perfectly symmetric (on-axis). For example, there is a significant polarization structure to the VLA primary beam (Brisken 2003).

As in the case for $\tilde{d}$ for intensity in Equation 35, we can construct the gridded linear polarization estimators

$$\tilde{d}^P = H^P\,\tilde{v}^P = \tilde{R}^P\,\tilde{p} + \tilde{n}^P_d, \qquad \tilde{R}^P = H^P\,\tilde{A}^P, \qquad \tilde{n}^P_d = H^P\,\tilde{n}^P. \tag{43}$$

One can then construct a dirty polarization image $d^P$ equivalent to $d$ using Equation 38. Unlike the case for total intensity, the linear polarization image is complex, being a map of $Q + iU$.

In the absence of circular polarization (Stokes $V$), our total intensity estimator $\tilde{d}$ can be made by gridding the co-polar visibilities $\tilde{v}^{RR}$ and $\tilde{v}^{LL}$ together in Equation 35. Note that if circular polarization is significant, then our intensity estimators $\tilde{d}$ made from gridding $\tilde{v}^{RR}$ and $\tilde{v}^{LL}$ together will be contaminated with $V$ (see Equations 16 and 19), unless the relative weights between RR and LL visibilities are exactly balanced. In general, images of $I$ and $V$ will have to be constructed from the sum and difference of separate $\tilde{d}^{RR}$ and $\tilde{d}^{LL}$ or from the sum and difference of images made from them. This is fraught with peril, as any mismatch between RR and LL will show up as spurious $V$ even in the case where $V = 0$.
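A short worked example shows how a gain mismatch between RR and LL turns into spurious circular polarization. It is a minimal sketch assuming a point source with unit primary beam, so that RR = I + V and LL = I - V; the function names and the gain parameters are hypothetical.

```python
import numpy as np

def observed_copolar(i_true, v_true, gain_rr=1.0, gain_ll=1.0):
    """Co-polar signals RR = I + V and LL = I - V with residual gain errors."""
    return gain_rr * (i_true + v_true), gain_ll * (i_true - v_true)

def stokes_iv_from_copolar(rr, ll):
    """I and V estimates from the sum and difference of RR and LL."""
    return 0.5 * (rr + ll), 0.5 * (rr - ll)

# An unpolarized 1 Jy source observed with a 1% amplitude error on RR only.
rr, ll = observed_copolar(1.0, 0.0, gain_rr=1.01)
i_est, v_est = stokes_iv_from_copolar(rr, ll)
print(i_est, v_est)
```

In this example half of the fractional gain mismatch appears as apparent V, so a 1% error produces about 0.5% spurious circular polarization, which is comparable to or larger than the true signal of many sources.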
It is therefore difficult to reconstruct circular polarization from interferometer data taken with circularly polarized receivers. For this reason, linearly polarized receivers are typically used when circular polarization is the target.

4. Image Reconstruction and Deconvolution

We now turn to details of the more involved and interesting problems in the reconstruction of images using interferometer data. This comes down to finding a model image $m$ that best fits the data $\tilde{v}$ and is free of artifacts introduced by the loss of information due to the observing process, as encapsulated in the operator $\tilde{A}$.

In our formulation, we grid the visibility data onto a uv-plane map $\tilde{d}$ using a kernel $H$. This is a regular Cartesian grid in $\mathbf{u}$, and thus can be (inverse) Fourier transformed into a sky image $d$. Our model image $m$ is thus congruent to the dirty image $d$, and both live as rasters on an image (pixel) grid $\boldsymbol{\theta}$.

One can think of this process as moving between the different spaces in which the visibility

data and the image live. We have the relation

$$\text{Image Space} \;\Leftrightarrow\; \text{Fourier Space} \;\Leftrightarrow\; \text{Visibility Data Space} \tag{44}$$

where we use Fourier space and uv-space interchangeably. The operators $\tilde{A}$, $H$ and $F$ carry out the transformations between these spaces. For example, $\tilde{s} = F^{-1} s$ transforms an image space signal into a Fourier-space signal, $\tilde{v} = \tilde{A}\,\tilde{s} + \tilde{n}$ transforms a Fourier-space signal to visibility data, and $\tilde{d} = H\,\tilde{v}$ grids the visibility data into Fourier space. Note that not all operators have inverses ($\tilde{A}$ and $H$ are not invertible).

Given a model image $m$, we need to compute the contribution to the visibilities corresponding to that model. We first transform

$$\tilde{m} = F^{-1} m \tag{45}$$

and then find, in analogy with Equation 28,

$$\tilde{v}_m = \tilde{A}\,\tilde{m} = \tilde{A}\,F^{-1} m. \tag{46}$$

At some level, finding the best model will involve minimizing the residual visibility

$$\delta\tilde{v}_m = \tilde{v} - \tilde{v}_m = \tilde{A}\,(\tilde{s} - \tilde{m}) + \tilde{n} \tag{47}$$

which includes errors in the model plus the noise.

4.1. Model Spaces

Although we wish to produce a clean model image $m$, we need not construct our model reconstruction directly on the image space. We can instead determine a model vector $h$ on a hidden space (see Maisinger et al. 2004) which can be transformed to the image space

$$m = K_f\,h \tag{48}$$

with the kernel $K_f$ giving the forward transform from the hidden space to the image space. The columns of $K_f$ contain the basis functions that we are using to decompose the sky, while the coefficients are in $h$. Note that $K_f$ need not be invertible and so there may not be a direct transformation from image space to the hidden space. In fact, inversion is possible only if the dimensionality of the image and hidden spaces are the same. For example, Fourier space is equivalent to a hidden space of the same dimensionality as image space and is thus an invertible decomposition. Note that an eigenmode decomposition of $m$ would be another example of a congruent transformation that is useful.
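The chain from hidden space to residual visibilities (Equations 45 to 48) can be sketched with small dense matrices. This is an illustration under several assumptions: a one-dimensional image, a toy operator that simply selects a few uv cells in place of the full aperture kernel, and a hypothetical basis of Gaussians at a few positions and scales. It assumes NumPy, and none of the names come from an existing package.

```python
import numpy as np

n_pix = 32
theta = np.arange(n_pix)                     # pixel coordinates
u = np.fft.fftfreq(n_pix)                    # uv cells, cycles per pixel
# Image -> Fourier operator (Equation 27, with a 1/n normalization added).
F_inv = np.exp(-2j * np.pi * np.outer(u, theta)) / n_pix
sampled = [1, 2, 3, 5, 8, 13]                # uv cells hit by visibilities
A = np.eye(n_pix)[sampled]                   # toy A~: pure uv sampling

def gaussian_basis(theta, centers, scales):
    """Columns of K_f: one unit-peak Gaussian per (center, scale) pair."""
    cols = [np.exp(-0.5 * ((theta - c) / s) ** 2)
            for c in centers for s in scales]
    return np.stack(cols, axis=1)

def model_visibilities(m):
    """Equation 46: v_m = A~ F^-1 m for an image-space model m."""
    return A @ (F_inv @ m)

def residual_visibilities(vis, h, K_f):
    """Equation 47 for the image-space model m = K_f h of Equation 48."""
    return vis - model_visibilities(K_f @ h)

K_f = gaussian_basis(theta, centers=[8, 16, 24], scales=[1.0, 3.0])
h_true = np.array([1.0, 0.0, 0.0, 0.5, 2.0, 0.0])
vis = model_visibilities(K_f @ h_true)       # noise-free data
resid = residual_visibilities(vis, h_true, K_f)
```

Here six coefficients describe a 32-pixel image, so the hidden space is much smaller than the image space, and any fitting procedure would adjust `h` to shrink the residual rather than adjusting pixels directly.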
In general, however, we want to choose a hidden space with different dimensionality to image space, otherwise we could work directly in the image or Fourier space. One of our main goals will

be to look at classes of multiscale models which can capture shapes and sizes of objects in the sky.

We can modify our transformation model in Equation 44 to include the hidden space

$$\text{Hidden Space} \;\Rightarrow\; \text{Image Space} \;\Leftrightarrow\; \text{Fourier Space} \;\Leftrightarrow\; \text{Visibility Data Space}. \tag{49}$$

In order to compare our hidden model $h$ to the data, we need to transform to uv-space. We find

$$\tilde{m} = F^{-1} m = F^{-1} K_f\,h \tag{50}$$

or

$$\tilde{m} = \tilde{K}_f\,h, \qquad \tilde{K}_f = F^{-1} K_f. \tag{51}$$

Then, to compare to the visibilities, we compute

$$\tilde{v}_h = \tilde{A}\,F^{-1} K_f\,h = \tilde{A}\,\tilde{K}_f\,h \tag{52}$$

using Equation 46. The residual visibility with respect to the hidden model is

$$\delta\tilde{v}_h = \tilde{v} - \tilde{v}_h = \tilde{A}\,(\tilde{s} - \tilde{h}) + \tilde{n} \tag{53}$$

where

$$\tilde{h} = \tilde{K}_f\,h = F^{-1} K_f\,h \tag{54}$$

is the Fourier-space representation of $h$.

4.2. Deconvolution

Let us first consider the traditional deconvolution problem in radio interferometry: find the best, or at least an acceptable, model image $m$ that fits the data $\tilde{v}$ and our preconceived notions of what the sky looks like. Note that some sort of prior information is needed to fill in the information gaps and to prevent us from overfitting to the noise.

There are a number of practical methods that have been devised to solve this problem; see the lecture notes in Taylor & Carilli (1999) and the discussions in Sault & Oosterloo (1996). The two most widely used are the CLEAN algorithm (described in Section 4.2.1) and the Maximum Entropy Method (MEM, described in Section 4.2.2). Other approaches have been tried, such as direct least-squares model-fitting to a number of components. Examples of these methods are discussed in Section 4.2.3.

4.2.1. CLEAN

The CLEAN algorithm was introduced to radio astronomy by Högbom (1974). A detailed mathematical description of CLEAN was given by Schwarz (1978), similar to the simplified version given below.

We will illustrate the CLEAN algorithm using a brute-force toy version that leaves out some of the features that are used to accelerate the method. In its simplest form, CLEAN iteratively cleans out point sources from the dirty image and places these into a model image. We start with the dirty image $d$ which is constructed from the gridded visibilities using Equations 35 and 38.

Let us start with the original raw data, which we will put into an initial vector

$$\delta\tilde{v}_0 = \tilde{v}, \tag{55}$$

and the initial dirty image

$$d_0 = F\,\tilde{d}_0 = F\,H\,\delta\tilde{v}_0. \tag{56}$$

We also start with an initial model which is blank, $m_0 = 0$. In the first iteration, we mask off the dirty image and place a fraction $f$ of that signal into our model

$$\delta m_1 = f\,M\,d_0 \tag{57}$$

where our mask $M$ is zero except at the pixel with maximum brightness, where it has the value 1. The factor $f$ is known as the loop gain and is usually chosen to be $f = 0.1$ or a similarly small (but not too small) value. This model increment is added to the initial model

$$m_1 = m_0 + \delta m_1 = \delta m_1. \tag{58}$$

We then can form the cumulative model visibilities

$$\tilde{v}_1 = \tilde{A}\,\tilde{m}_1 = \tilde{A}\,F^{-1} m_1 \tag{59}$$

and the residual visibility vector

$$\delta\tilde{v}_1 = \tilde{v} - \tilde{v}_1 \tag{60}$$

after the first iteration. This is then gridded and transformed to make a new residual dirty image

$$d_1 = F\,H\,\delta\tilde{v}_1. \tag{61}$$

This whole process is then repeated. At iteration $i$:

1. Find the pixel $q$ containing the maximum in the residual dirty image $d_{i-1}$,
2. If the maximum residual $d_{i-1,q}$ is below some threshold (usually a multiple of the expected rms residual), then terminate iterations, otherwise,
3. Form mask $M^{(i)}$ with $M^{(i)}_{mn} = \delta_{mq}\,\delta_{nq}$ (Kronecker delta),
4. Compute model increment $\delta m_i = f\,M^{(i)}\,d_{i-1}$,
5. Add to cumulative model $m_i = m_{i-1} + \delta m_i$,
6. Compute cumulative model visibilities $\tilde{v}_i = \tilde{A}\,F^{-1} m_i$,
7. Compute residual visibility vector $\delta\tilde{v}_i = \tilde{v} - \tilde{v}_i$,
8. Make new residual dirty image $d_i = F\,H\,\delta\tilde{v}_i$.

Most of the expense in this algorithm is at Steps 6 to 8, where the model is transformed to model visibilities, a residual is formed, and then transformed back to a new dirty image. Most of the implemented versions of CLEAN change Steps 1 to 4 so that they find a number of new model components in the image plane before returning to the uv-plane. This significantly speeds up the process, but does not change the essence of the algorithm.

It is often said that CLEAN is a non-linear algorithm. The only non-linearity comes about in finding the maximum in the residual vector (and also using it as a stopping criterion). The mask is a function of the dirty residual image that it works upon, $M^{(i)}(d_{i-1})$. If the model increment $\delta m_i$ could be computed as a linear function of $d_{i-1}$, then the entire sequence of operations could be written as one long linear equation. However, this is only a fairly benign non-linearity, and CLEAN has proven to be a reasonably robust deconvolution method in practice. Its problems come in its poor performance on extended objects (CLEAN picks out point-source pixels) and other artifacts.

If the true sky signal consisted of a single bright point source, then it is clear that our toy CLEAN implementation would recover the signal until we reached the level of the noise (typically within a few sigma of the noise). In complicated distributions of point sources, the sidelobes of some sources will interfere with the signals of other point sources. The use of a small loop-gain factor $f$ in the algorithm allows these sidelobes to be cleaned out incrementally, and in practice CLEAN works well for nearly any distribution of point sources (again, down to within a few sigma of the noise level).
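The eight steps above can be sketched directly with dense matrices. This is a toy illustration, not an implementation from any imaging package: it assumes a one-dimensional image, replaces the aperture kernel with an operator that simply selects sampled uv cells (so the primary beam is unity), chooses the gridding kernel as the transpose of that operator scaled so the PSF peak is 1, takes the real part of the dirty image (using the Hermitian symmetry of a real sky), and searches for the peak in absolute value. All names and parameter defaults are hypothetical.

```python
import numpy as np

def make_operators(n_pix, sampled_cells):
    """Return F, F^-1, a toy sampling operator A~, and a gridding kernel H."""
    theta = np.arange(n_pix)
    u = np.fft.fftfreq(n_pix)
    F = np.exp(2j * np.pi * np.outer(theta, u))      # Fourier -> image
    F_inv = F.conj().T / n_pix                       # image -> Fourier
    A = np.eye(n_pix)[sampled_cells]                 # select sampled uv cells
    H = A.T * (n_pix / len(sampled_cells))           # scaled so PSF peak = 1
    return F, F_inv, A, H

def toy_clean(vis, F, F_inv, A, H, gain=0.1, threshold=1e-3, max_iter=1000):
    """Brute-force CLEAN: one model component per pass through the uv-plane."""
    model = np.zeros(F.shape[0])
    resid_vis = vis.copy()
    for _ in range(max_iter):
        dirty = np.real(F @ (H @ resid_vis))         # residual dirty image
        q = int(np.argmax(np.abs(dirty)))            # step 1: find the peak
        if abs(dirty[q]) < threshold:                # step 2: stopping test
            break
        model[q] += gain * dirty[q]                  # steps 3-5: update model
        resid_vis = vis - A @ (F_inv @ model)        # steps 6-7: new residual
    final_dirty = np.real(F @ (H @ resid_vis))       # step 8 for the last model
    return model, final_dirty

# Noise-free test: one 1 Jy point source at pixel 10, sparsely sampled.
n_pix, cells = 64, [1, 2, 3, 5, 8, 13, 21, 27]
F, F_inv, A, H = make_operators(n_pix, cells)
sky = np.zeros(n_pix)
sky[10] = 1.0
vis = A @ (F_inv @ sky)
model, residual = toy_clean(vis, F, F_inv, A, H)
```

For this single-source case each iteration removes a fraction `gain` of the remaining peak, so the model flux approaches the true value geometrically and the loop stops once the residual peak drops below `threshold`. Practical implementations avoid the dense matrices by using FFTs and by finding many components per pass.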
Using the above procedure, after some number of iterations we arrive at a final CLEAN model $m_f$ with a final residual dirty image $d_f$. Note that the model $m_f$ is made up of individual model pixels and thus has a resolution given by the pixel size. For viewing or analysis purposes, it is customary to construct a CLEAN image $d_c$ that smooths the model by a regularizing kernel and adds back the final residuals (which may contain some residual signal, plus the noise),

$$ d_c = B\, m_f + d_f . \qquad (62) $$

This restored image has the resolution given by the restoring beam $B$, which is usually chosen to be a Gaussian with the same width as the core of the PSF.

The reconstruction process can be greatly aided if one knows a priori where in the image the emission is located. This is equivalent to saying that the sky signal $s$ is band-limited, and therefore there are intrinsic correlations in the Fourier domain that can be exploited to reconstruct unmeasured modes. Consider the clean mask $M$. If $M = 0$ except within a patch of size $\theta$, then the uv-space signal corresponding to $F^{-1} M d$ would be correlated on uv scales of width $1/\theta$. An extreme example is the delta function mask used in our CLEAN procedure. Because $M_{mn}(i) = \delta_{qq}$, the Fourier space signal corresponding to this model component is fully correlated over uv-space, so $F^{-1} M d$ has elements given by $F^{-1}_{lq}\, d_q = e^{2\pi i u_l \theta_q}\, d_q$.

4.2.2. MEM

The MEM algorithm for astronomical imaging was proposed by Ables (1974), with practical application implemented by Gull & Daniell (1978). An excellent review of astronomical applications of MEM, including radio interferometric deconvolution, was presented in Narayan & Nityananda (1986). Although that review describes the state of the art as of 1986, the situation for radio interferometric reconstruction has not changed greatly in the past 20 years. There has, however, been some notable recent work in this field, including Bayesian approaches using wavelets (Maisinger et al. 2004) and Gibbs sampling (Sutton & Wandelt 2006).

One feature of most MEM approaches is that positivity of the image is enforced. This is justifiable in most image-based applications (e.g. optical images or photon-counting X-ray images), and the positivity constraint is helpful to the convergence of these methods. However, for purely interferometric imaging the mean level is unmeasurable, and thus positivity is an artificial constraint that will lead to biasing. Even in the case where some single-dish or interferometer autocorrelation data are available, care must be taken to properly include these measurements rather than relying upon a positivity enforcement or prior. Also, reconstruction of Stokes Q, U, and V should be possible using the same method, and these have no positivity requirements. There have been attempts to generalize MEM to non-positive images (for example by treating the image as the difference between two positive images), but these have not been wildly successful.

4.2.3. Model Fitting

One can take a more direct approach and fit parameterized component models directly to the visibility data, for example using the Adaptive Scale Pixel (ASP) method (Bhatnagar & Cornwell 2004). In another example, a Markov-Chain Monte Carlo (MCMC) approach to fitting parameterized components was explored by Rao-Venkata & Cornwell (2006).

4.2.4. Other Approaches

A number of other approaches have been explored for image reconstruction and deconvolution. One popular method that has seen commercialization is the Pixon method of Pina & Puetter (1993).
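As an illustration of the restoring step in equation (62), the following NumPy sketch convolves a component model with a circular Gaussian and adds back the residuals. The function names, the unit-peak normalization of the beam, the FWHM expressed in pixels, and the use of an FFT-based (periodic) convolution are all assumptions made for this example rather than details taken from the text.

```python
import numpy as np


def gaussian_beam(shape, fwhm_pix):
    """Unit-peak circular Gaussian centred on pixel (ny // 2, nx // 2)."""
    ny, nx = shape
    y, x = np.indices(shape)
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    r2 = (y - ny // 2) ** 2 + (x - nx // 2) ** 2
    return np.exp(-0.5 * r2 / sigma ** 2)


def restore(model, residual, fwhm_pix):
    """Form d_c = B m_f + d_f, with B applied as a periodic convolution."""
    beam = gaussian_beam(model.shape, fwhm_pix)
    # ifftshift moves the beam centre to pixel (0, 0) so that the
    # convolution does not shift the model components.
    beam_ft = np.fft.fft2(np.fft.ifftshift(beam))
    smoothed = np.real(np.fft.ifft2(np.fft.fft2(model) * beam_ft))
    return smoothed + residual


# Hypothetical example: one CLEAN component plus a flat residual level.
model = np.zeros((32, 32))
model[10, 20] = 2.0
residual = np.full((32, 32), 0.1)
restored = restore(model, residual, fwhm_pix=4.0)
```

Because the beam has unit peak, an isolated component keeps its flux as the peak value of the restored image, which corresponds to the usual "per beam" brightness units. A production implementation would more likely fit an elliptical Gaussian to the PSF core and pad the image to avoid wrap-around.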

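The direct model-fitting idea of Section 4.2.3 can be illustrated in its simplest form with a single point-source component. This sketch is neither the ASP nor the MCMC method. It assumes a visibility model $V(u,v) = S\,e^{-2\pi i (u l_0 + v m_0)}$ (the sign convention and all names are choices made here), searches a grid of trial positions, and solves for the flux by linear least squares at each position.

```python
import numpy as np


def point_source_vis(u, v, flux, l0, m0):
    """Model visibilities of a point source (assumed sign convention)."""
    return flux * np.exp(-2j * np.pi * (u * l0 + v * m0))


def fit_point_source(u, v, vis, l_grid, m_grid):
    """Grid search over position; the flux is solved linearly at each trial."""
    best = (np.inf, 0.0, 0.0, 0.0)
    for l0 in l_grid:
        for m0 in m_grid:
            phase = np.exp(-2j * np.pi * (u * l0 + v * m0))
            # Least-squares real flux for this position: Re(sum conj(p) V) / N
            flux = np.real(np.vdot(phase, vis)) / len(vis)
            chi2 = np.sum(np.abs(vis - flux * phase) ** 2)
            if chi2 < best[0]:
                best = (chi2, flux, l0, m0)
    return best[1:]


# Hypothetical data: 200 random uv points (in wavelengths) with noise.
rng = np.random.default_rng(0)
u = rng.uniform(-500.0, 500.0, 200)
v = rng.uniform(-500.0, 500.0, 200)
noise = 0.05 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
vis = point_source_vis(u, v, 1.5, 0.002, -0.001) + noise
grid = np.linspace(-0.005, 0.005, 21)
flux, l0, m0 = fit_point_source(u, v, vis, grid, grid)
```

The fit works entirely in the visibility domain, so no gridding or deconvolution is involved. A practical fitter would use a nonlinear optimizer or sampler, weight by the noise, and allow several components of varying shape.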
5. Future Interferometry and Algorithms

We have been illustrating our imaging problems using single frequency or narrow-band data. However, the next generation of radio astronomical interferometers will have the capability to correlate the signals over wide bands (2:1 fractional bandwidths). Since the uv-locus of a baseline is inversely proportional to the wavelength, the spatial and spectral information is mixed up in the uv-plane. Multi-frequency synthesis (MFS) algorithms have been developed to cope with this; see Rao-Venkata et al. (2006) and references therein. These will necessarily also require multi-scale methods.

The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. STM gratefully acknowledges the support of the Orson Anderson Scholarship at the IGGP at the Los Alamos National Laboratory and the support of CITA while on sabbatical during the writing of this paper.

REFERENCES

Ables, J.G. 1974, A&AS, 15, 383
Bhatnagar, S. & Cornwell, T.J. 2004, A&A, 426, 747
Bracewell, R.N. 1986, The Fourier Transform and Its Applications (New York: McGraw-Hill)
Brisken, W. 2003, EVLA Memo 53 (http://www.aoc.nrao.edu/evla/geninfo/memoseries/evlamemo58.pdf)
Cornwell, T.J. & Evans, K.F. 1985, A&A, 143, 77
Gull, S.F. & Daniell, G.J. 1978, Nature, 272, 686
Hamaker, J.P., Bregman, J.D. & Sault, R.J. 1996, A&AS, 117, 137
Hamaker, J.P. & Bregman, J.D. 1996, A&AS, 117, 161
Hamaker, J.P. 2000, A&AS, 143, 515
Hamaker, J.P. 2006, A&A, 456, 395
Hobson, M.P. & Maisinger, K. 2002, MNRAS, 334, 569
Högbom, J.A. 1974, A&AS, 15, 417
Maisinger, K., Hobson, M.P. & Lasenby, A.N. 2004, MNRAS, 347, 339
Myers, S.T., et al. 2003, ApJ, 591, 575
Myers, S.T., et al. 2006, NewAstRev, 50, 951
Narayan, R. & Nityananda, R. 1986, ARAA, 24, 127
Padin, S., et al. 2001, ApJ, 549, L1
Pina, R.K. & Puetter, R.C. 1993, PASP, 105, 630
Rajguru, N., et al. 2005, MNRAS, 363, 1125
Rao-Venkata, U., Cornwell, T.J. & Myers, S.T. 2006, EVLA Memo 101 (http://www.aoc.nrao.edu/evla/geninfo/memoseries/evlamemo101.pdf)
Rao-Venkata, U. & Cornwell, T.J. 2006, EVLA Memo 102 (http://www.aoc.nrao.edu/evla/geninfo/memoseries/evlamemo102.pdf)
Sault, R.J., Hamaker, J.P. & Bregman, J.D. 1996, A&AS, 117, 149
Sault, R.J. & Oosterloo, T.A. 1996, in URSI Review of Radio Science 1993-1996, ed. W. Ross Stone (Oxford University Press) (astro-ph/0701171)
Schwarz, U.J. 1978, A&A, 65, 345
Sutton, E.C. & Wandelt, B.D. 2006, ApJS, 162, 401
Synthesis Imaging in Radio Astronomy II, eds. G.B. Taylor & C.L. Carilli, ASP Conf. Ser. 180 (San Francisco: ASP), 1999
Thompson, A.R., Moran, J.M. & Swenson, G.W. Jr. 1986, Interferometry and Synthesis in Radio Astronomy (New York: Wiley)

This preprint was prepared with the AAS LaTeX macros v5.2.