On the Recovery of a Function on a Circular Domain

Similar documents
MOMENT functions are used in several computer vision

Science Insights: An International Journal

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

Lecture Notes 1: Vector spaces

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Notation. General. Notation Description See. Sets, Functions, and Spaces. a b & a b The minimum and the maximum of a and b

Convex Hodge Decomposition of Image Flows

TOPICS. P. Lax, Functional Analysis, Wiley-Interscience, New York, Basic Function Theory in multiply connected domains.

A NEW IDENTITY FOR PARSEVAL FRAMES

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Optimum Sampling Vectors for Wiener Filter Noise Reduction

FARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS

THE information capacity is one of the most important

Recall that any inner product space V has an associated norm defined by

Deviation Measures and Normals of Convex Bodies

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

A New Method for Varying Adaptive Bandwidth Selection

Decomposition of Riesz frames and wavelets into a finite union of linearly independent sets

Some Background Material

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

SMOOTHNESS PROPERTIES OF SOLUTIONS OF CAPUTO- TYPE FRACTIONAL DIFFERENTIAL EQUATIONS. Kai Diethelm. Abstract

Introduction to the Mathematics of Medical Imaging

A Plane Wave Expansion of Spherical Wave Functions for Modal Analysis of Guided Wave Structures and Scatterers

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

UNIFORM BOUNDS FOR BESSEL FUNCTIONS

Cambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information

The Fractional Fourier Transform with Applications in Optics and Signal Processing

Vectors in Function Spaces

On the variety generated by planar modular lattices

On Riesz-Fischer sequences and lower frame bounds

Robustness of Principal Components

INDEX. Bolzano-Weierstrass theorem, for sequences, boundary points, bounded functions, 142 bounded sets, 42 43

A Smorgasbord of Applications of Fourier Analysis to Number Theory

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

The Skorokhod reflection problem for functions with discontinuities (contractive case)

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

Kernel B Splines and Interpolation

Complex symmetric operators

Defect Detection using Nonparametric Regression

Cauchy Integral Formula Consequences

Hausdorff operators in H p spaces, 0 < p < 1

arxiv: v1 [math.cv] 14 Nov 2017

NONLINEAR FREDHOLM ALTERNATIVE FOR THE p-laplacian UNDER NONHOMOGENEOUS NEUMANN BOUNDARY CONDITION

EECS 598: Statistical Learning Theory, Winter 2014 Topic 11. Kernels

1 Coherent-Mode Representation of Optical Fields and Sources

Discrete Simulation of Power Law Noise

Remarks on the Rademacher-Menshov Theorem

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

Decompositions of frames and a new frame identity

A geometric proof of the spectral theorem for real symmetric matrices

A DECOMPOSITION THEOREM FOR FRAMES AND THE FEICHTINGER CONJECTURE

A Variational Approach to Reconstructing Images Corrupted by Poisson Noise

ECE521 week 3: 23/26 January 2017

Geometry of subanalytic and semialgebraic sets, by M. Shiota, Birkhäuser, Boston, MA, 1997, xii+431 pp., $89.50, ISBN

Analogues for Bessel Functions of the Christoffel-Darboux Identity

The dichotomy between structure and randomness. International Congress of Mathematicians, Aug Terence Tao (UCLA)

L p Approximation of Sigma Pi Neural Networks

Hamburger Beiträge zur Angewandten Mathematik

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Chapter 2 Fourier Series Phase Object Spectra

Course Description for Real Analysis, Math 156

Archiv der Mathematik Holomorphic approximation of L2-functions on the unit sphere in R^3

SCALE INVARIANT FOURIER RESTRICTION TO A HYPERBOLIC SURFACE

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

WAVELET EXPANSIONS OF DISTRIBUTIONS

Richard F. Bass Krzysztof Burdzy University of Washington

Efficient packing of unit squares in a square

TOPICS IN HARMONIC ANALYSIS WITH APPLICATIONS TO RADAR AND SONAR. Willard Miller

A PLANAR SOBOLEV EXTENSION THEOREM FOR PIECEWISE LINEAR HOMEOMORPHISMS

An Efficient Approach to Multivariate Nakagami-m Distribution Using Green s Matrix Approximation

5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

Invariant Image Watermark Using Zernike Moments

Math 302 Outcome Statements Winter 2013

Recurrence Relations and Fast Algorithms

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

Contents. 2 Sequences and Series Approximation by Rational Numbers Sequences Basics on Sequences...

On the Class of Functions Starlike with Respect to a Boundary Point

Course Outline. Date Lecture Topic Reading

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

COMPLEX SIGNALS are used in various areas of signal

M E M O R A N D U M. Faculty Senate approved November 1, 2018

Accurate Orthogonal Circular Moment Invariants of Gray-Level Images

Modeling Multiscale Differential Pixel Statistics

XXXV Konferencja Statystyka Matematyczna -7-11/ XII 2009

Introduction to Mathematical Physics

Gaussian Measure of Sections of convex bodies

On Ridge Functions. Allan Pinkus. September 23, Technion. Allan Pinkus (Technion) Ridge Function September 23, / 27

Title Project Summary

MATHEMATICAL METHODS AND APPLIED COMPUTING

Tobias Holck Colding: Publications. 1. T.H. Colding and W.P. Minicozzi II, Dynamics of closed singularities, preprint.

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation

arxiv:math/ v1 [math.rt] 9 Oct 2004

Introduction to Proofs

96 CHAPTER 4. HILBERT SPACES. Spaces of square integrable functions. Take a Cauchy sequence f n in L 2 so that. f n f m 1 (b a) f n f m 2.

Estimation of a quadratic regression functional using the sinc kernel

Uncertainty Principle Applied to Focused Fields and the Angular Spectrum Representation

An Efficient Algorithm for Fast Computation of Pseudo-Zernike Moments

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra

Transcription:

2736 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 On the Recovery of a Function on a Circular Domain Mirosław Pawlak, Member, IEEE, and Simon X. Liao, Member, IEEE Abstract We consider the problem of estimating a function ( ) on the unit disk ( ): 2 + 2 1, given a discrete and noisy data recorded on a regular square grid. An estimate of ( ) based on a class of orthogonal and complete functions over the unit disk is proposed. This class of functions has a distinctive property of being invariant to rotation of axes about the origin of coordinates yielding therefore a rotationally invariant estimate. For radial functions, the orthogonal set has a particularly simple form being related to the classical Legendre polynomials. We give the statistical accuracy analysis of the proposed estimate of ( ) in the sense of the 2 metric. It is found that there is an inherent limitation in the precision of the estimate due to the geometric nature of a circular domain. This is explained by relating the accuracy issue to the celebrated problem in the analytic number theory called the lattice points of a circle. In fact, the obtained bounds for the mean integrated squared error are determined by the best known result so far on the problem of lattice points within the circular domain. Index Terms Accuracy, circle orthogonal polynomials, circle problem, circular domain, lattice points, nonparametric estimate, radial functions, rotational invariance, two-dimensional (2-D) functions, Zernike functions. I. INODUCTION SUPPOSE that is a function which belongs to over the unit disk. We consider the problem of estimating, given the following data: is a zero-mean, finite-variance, spatially uncorrelated noise process. Let denote the variance of. We assume that the data are observed on a square grid of edge width, i.e.,. We consider functions on the circular domain since it is a common situation in a wide range of applications. These include the analysis of data from a positron scanning devices [1] [5], the diffraction theory of optical aberrations [6] [8], invariant pattern recognition [9] [14], image analysis [15] [18], [12], [19] [21], and statistical models of circular data [22], [23]. Nevertheless, relatively little estimation theory in such a setting has been done. In this paper, we propose an estimation technique for recovering utilizing a specific class of orthogonal basis Manuscript received December 18, 2000; revised January 21, 2002. This work was supported by NSERC. M. Pawlak is with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada (e-mail: pawlak@ee.umanitoba.ca). S. X. Liao is with the Department of Business Computing, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada (e-mail: sliao@uwinnipeg.ca). Communicated by G. Lugosi, Associate Editor for Nonparametric Estimation, Classification, and Neural Networks. Publisher Item Identifier 10.1109/TIT.2002.802627. (1) on. Although it is possible to construct a number of orthogonal systems on [24] we use the so-called Zernike orthogonal functions [25], [6], [1], [8] due to some of their unique properties. First, they are invariant in form with respect to orthogonal transformations of the -plane yielding a popular class of invariant features in object recognition [9] [11], [15] [18], [12], [19] [21], [13], [14]. Second, the Zernike functions have properties similar to both classical trigonometric series and Jacobi polynomials and they define a minimal set of permissible functions [25]. Furthermore, Radon transforms of the Zernike functions are also orthogonal [1]. The latter plays a fundamental role in tomographic reconstruction of objects from their projections [1] [5]. It is also worth mentioning that the Zernike orthogonal basis has recently found applications in ophthalmology for corneal imaging allowing eye-care providers to fit contacts and spectacles [26]. In this paper, an accuracy analysis of the estimate of based on Zernike functions is carried out. Let be an estimate of derived from data record (1) which utilizes the first Zernike functions. We give bounds for the mean integrated squared error () (2) for a large class of functions belonging to. This includes bounded variation functions, radial functions, and functions of a specific structural form. This analysis reveals the dependence of the error on the truncation point, image smoothness, noise characteristics, sampling rate, and the circular geometry of the support of. This is quantified by decomposing the into the variance, bias, discretization, and geometric error components. The first three factors are standard in the nonparametric curve estimation theory and yield the well-studied variance/bias/discretization tradeoff [27], [28]. The geometric error is caused by the fact that our reconstruction problem is confined to a unit circle and some lattice squares along the circular domain may be included and other excluded. This kind of error is quantified by using the celebrated problem in the analytic number theory referred to as lattice points of a circle due originally to Gauss [29] [33]. We show that the geometric error has a dominant influence on the overall performance of our reconstruction method. This is a distinctive feature of the function reconstruction problem on a circular domain. II. ORTHOGONAL BASES ON Functions that form a complete and orthogonal set in can be easily defined by applying the Gram Schmidt orthogonalization process to all monomials with respect to a circularly symmetric weight function defined on, see Remark 1 below for an example of such a 0018-9448/02$17.00 2002 IEEE

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2737 set. An important property to have in many applications is an orthogonal basis that is invariant to rotations of axes about the origin. It has been shown in [25], [6] that such a set has to be in the following generic form: (3),,, and is an integer that takes positive, negative, or zero values, and is a polynomial in of degree, containing no power of lower than. It is clear that there is an infinite number of orthogonal sets satisfying (3). Hence, to further narrow down the class of invariant bases we choose the circle polynomial, which are obtained from the orthogonalization of the linearly independent functions (a) with respect to a weight function. In [25], [6] it has been demonstrated that there is only one set of orthogonal functions obtained from (4) and which contains a polynomial for each permissible pair of values of (degree) and (angular dependence), i.e., for integral values of and such that (4) and is even (5) This basis is often referred to as the Zernike functions [8], [25], [6]. An explicit expression for the radial Zernike polynomial is given by with and. Polynomial (6) is of degree with the lowest power equal to and is an even or an odd polynomial according as is even or odd. Fig. 1 depicts the radial section of for and. Also, the density plot of for is shown. The orthogonality relation for is or in polar coordinates (6) (7) Fig. 1. (a) Radial section of R () for (p; q) =(8; 0) (bold line) and (p; q)=(9; 1) (light line). (b) Density plot of R (). A useful property of the polynomial is that it is closely related to the classical Jacobi polynomials [25], [1], [24], [34] according to the relation (b) (10) and is the Jacobi polynomial of order with the parameter. The Jacobi polynomials are orthogonal on the interval with respect to the weight function,,, [34]. Identity (10) implies that if then the asterisk denotes the complex conjugate. Combining (3) and (8), the orthogonality relation for the radial polynomials is (8) (9) while for (11) we have (12) is the th-order classical Legendre polynomial de-, [35], [34]. It is also known [35] that (13) fined on

2738 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 For further considerations, it is useful to know the growth of the polynomial. First, let us recall the fact that attains its maximum in at one of the endpoints [34, p. 168]. Hence, observing that Denoting and observing that is -periodic in we can rewrite (17) in the following equivalent form: (18) and then using (10) and (13), we obtain the following useful bound for : (19) with. Remark 1: The following functions (14) define the orthonormal and complete system in, see [24] and references cited therein. Observe, however, that this set is not in the form of (3) and thus is not invariant. In [13], a class of orthogonal functions on referred to as disk-harmonics has been proposed. They are eigenfunctions of the Laplacian operator confined to. This class is of the form in (3), hence invariant, with, is the th-order Bessel function of the first kind. Here, is a sequence of numbers that must be selected appropriately in order to assure orthogonality and completeness. We refer to [13], [37], and references cited therein for some theory and applications of such orthogonal bases. Observe, however, that this set has no simple connections to classical orthogonal systems and therefore its analytical properties are difficult to establish. Finally, let us also mention another interesting property of the Zernike functions. It has been shown in [38] that the generalization of the concept of prolate functions to the circular domain leads to a class of functions being closely related to the Zernike radial polynomials. III. FUNCTION APPROXIMATION BY ZERNIKE FUNCTIONS The completeness and orthogonality of allow us to represent any by the following series (15) the summation is carried out for satisfying (5) and, due to (7), is the normalizing constant. The Fourier coefficient of order with repetition is given by (16) is the usual th Fourier coefficient of the function. The preceding identities allow us to infer about the coefficient under different symmetry conditions on. Throughout the paper, we assume that the origin of the coordinate system is the center of symmetry. Hence if is symmetrical for every direction, i.e., is an even function of, then we have Hence, in this case is a real number. The fundamental property of the Zernike functions and consequently coefficients is their rotational invariance. Hence, if is rotated by the angle, then using (17) we can obtain that the Zernike coefficient of the rotated function is given by (20) Similarly, for the reflected function, the Zernike coefficient is equal to. Thus, the magnitude of can be used as an invariant feature. In fact, it is common in object recognition to use for a few values of as a feature vector. See [7], [11], [14], [18] [21], and references cited therein for applications of Zernike coefficients in pattern recognition. Let us also mention that (20) can be used to estimate the rotation angle [15] from the reference pattern and its rotated version. This can also serve as a tool for estimating the symmetry of images. It is important for further studies to evaluate the form of for various classes of functions defined on. Let us first consider an important class of radial functions, i.e., let be a function of only; write it as. In this case, it is easy to observe that defined in (19) is given by denotes the indicator function of a set. By this and (12) we have (21) Recalling (3), we can obtain the following useful polar coordinate version of (16): (17) is ellip- A more general case to the preceding is when tically contoured, i.e., (22)

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2739 for some nonsingular matrix. In particular, let and a single variable function Hence, represents the best radial approximation of and, consequently, there is an unique decomposition (29) plays the role of the correlation coefficient between the variables and. Assuming, we observe that (22) becomes (23) Note that for, the Zernike coefficient of function (22) is given by (21). For general, is defined in (18) with (24) It is interesting to know the behavior of as a function of. Expanding (24) into a Taylor series around and after some tedious algebra, we can obtain that (25) and is orthogonal to, i.e., Remark 2: It is interesting to note that the orthogonal projection of the invariant function in (3) onto is given by. It is of some interest to evaluate the size of the term in (29). For the function class defined in (23), it can be easily shown that the function in (23) satisfies (26). Here is the norm. Yet another important functional class [41], [4] consists of functions which can be decomposed into separate angle and radial components of the following form: and (30) for some measurable functions,,. A simple algebra shows that the Zernike coefficients for this class of functions have the following form: Here it has been assumed that possesses two continuous derivatives. Clearly, the term is identical to (21) and as we have already explained, it represents the Zernike coefficient for radial functions. It is interesting to observe that (25) does not possess a linear term with respect to. Generally, all terms with odd powers of are zero, provided that corresponding higher order derivatives of exist. Let us now assume that is Lipschitz, i.e., that for and (26) Then it can be easily shown that (31) is the th Fourier coefficient (see (19)) of the angle component and It can be easily shown that for the orthogonal projection of class (30) onto we have the following bound: (27) This result gives the bound for the distance between the Zernike coefficients of functions defined in (23) and the corresponding coefficients of radial function. The result (27) can be viewed as an example of a more general concept of the radial approximation of a given class of functions. Hence, let be a class of radial functions in. Then it can be shown [40] that the orthogonal projection of onto is of the form (28) it is assumed that all satisfy (26). On the other hand, a bound in terms of the global variation of is of the form See the Appendix for the proof of this result. (32) Remark 3: The class of functions defined in (30) is important since it is known [39] that it defines a dense subset in

2740 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 greater than, it can be easily shown that for. Due to (28) we also note that the orthogonal projection of the class of ridge functions onto is of the form is dense in. A special interesting case of class (30) is the additive model (33) for some measurable functions and. We refer to [42] for an extensive overview of the theory and applications of such models. It should be noted that (33) describes the additive model in the polar coordinate system. This should be contrasted with a counterpart of (33),, in the coordinates. The Zernike coefficient for in (33) takes the following form: For satisfying (26), the distance between and in (35) is bounded by On the other hand, a bound on variability of is given by in terms of the global (34) It is also clear that the orthogonal projection of class (33) onto is given by The distance between in (33) and is bounded by. In fact, the set of all finite linear combinations The proof of this bound is similar to (32). The interpretation of the above inequality and that in (32) is that if oscillations in the angular components are small then is closer to. Let us also mention the class of ridge functions with directions The proof of this fact can be obtained in the analogous way as the proof of (32). IV. THE RECONSUCTION ALGORITHM Let us now consider the problem of estimating,, given noisy data generated from observation model (1). Let us assume that lattice points from are located at the center of the lattice squares. Fig. 2 shows the geometry of lattice points. The first task is to estimate the Zernike coefficients defined in (16). Using a piecewise approximation of over the lattice squares we can define the following estimate of : (36) (35) are some single variable functions defined on. This class of functions plays an important role in projection-based approximation of multidimensional functions [41], [4]. In particular, it has been shown [4] that any polynomial of degree in and can be represented by (35) provided that are single variable polynomials of degree and are distinct directions. Furthermore, the ridge functions can be used as a powerful tool for approximating radial functions and functions with smooth angular behavior [41]. It is straightforward to show that the Zernike coefficient for (35) takes the following form: It is also assumed that is defined on by (3), (6), and is zero otherwise. Let us observe that is just a numerical approximation of the integral representing over the lattice. It is worth noting that if for all then we have. Such an unusual phenomenon does not hold for functions defined over the rectangular domain. In our case, we are dealing with the circular domain and the lattice squares cannot fill the circle perfectly. This problem will be examined in detail in Section V. Representation (15) and (36) yield the following estimate of : This formula can be further simplified assuming a specific form of ridge functions. In particular, for being polynomials of degree not for being even,. (37)

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2741 The performance of is measured by defined in (2). By virtue of (7) and Parseval s formula, we obtain the following decomposition of : VAR VAR BIAS (40) is the integrated variance, as BIAS (41) Fig. 2. The design points f(x ;y ); 1 i; j ng; (x ;y ) 2 D. Estimate (37) is a standard orthogonal series estimate defined by a truncation parameter. Let us observe that there are terms in (37). Remark 4: Let us note that due to the invariant form of our estimate is rotationally invariant. Hence when any rotation, is applied, the estimate is transformed into an estimate of the same form, i.e., To the best of our knowledge, no classical curve estimation method possesses this useful property. For the class of radial functions considered in Section III, our estimate can have a reduced complexity due to (21). In fact, we can define the following estimate of : is the integrated squared bias. The term is a bias caused by the truncation at. The term measures the discretization and geometric errors in estimating the coefficient. We show in Section V-C that can be further decomposed into the discretization error that reflects an usual need of numerical approximation of the integral and the geometric error being a distinctive feature of our estimation problem. The geometric error measures the accuracy of the lattice approximation of the circular domain. It turns out that this approximation is described by a celebrated problem in the analytic number theory on the number of lattice points inside a circle [29], [31]. The contribution of each term in (40) and (41) to the overall error is examined in Section V. Concerning the estimate in (38) and (39), let us first observe that This is decomposed as follows: (38) (39) is the estimate of in (21) with. Here, we denote, is the th Legendre polynomial. Furthermore, are observations at the lattice points taken along the diagonal direction. Hence, the actual number of points taken into account is of order.it is also clear that one can use other directions such as or. VAR (42) is in (21) with. It should be noted that in the case of radial functions the geometric error does not appear. The estimation problem studied in this paper falls into the category of nonparametric curve estimation since we do not assume any parametric knowledge of a class of functions taken into account. We use the orthogonal expansion approach utilizing a very specific class of rotationally invariant polynomials and possessing some other aforementioned unique properties. In the statistical literature on this subject [27], [43], [16], [28] one

2742 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 can find other nonparametric methods for function recovering. For example, one could apply the popular kernel estimate (43) The bound for this term is given in the following lemma. Lemma 1: Let the observation model (1) be in force and let the noise process be uncorrelated with zero mean and variance. Then for estimate (36) and (37), we have VAR The proof of this bound, see the Appendix, reveals that is a bandwidth parameter and function such that is a bivariate kernel Using the arguments as in Section V-C, see also the discussion following (76), we can show that Using arguments developed in this paper one can argue that the accuracy of the kernel estimate may be comparable to our method. Nevertheless, the accuracy of the kernel estimate depends not only on the choice of the bandwidth but also on the choice of the kernel function. Indeed, the order (quantified by the number of vanishing moments) of the kernel function must be selected according to the smoothness of the unknown function. It is also known that the kernel estimate suffers from the boundary effect [27], [28], i.e., at a boundary kernel mass falls outside the support of the function to be estimated and is lost. This effect will be further amplified by the presence of the inherent geometric error caused by the circular nature of the domain of estimated functions. It is also not clear whether the kernel estimate can have the property similar to (21) allowing to propose an estimate of reduced complexity for the important class of radial functions. Moreover, the kernel estimate is not invariant to rotations. As we have already mentioned, the Zernike basis has been a popular technique in optical pattern recognition and medical imaging one uses a few Zernike coefficients to define a feature space in order to summarize complex data efficiently. Clearly, the kernel estimate cannot be used for such applications. describes the so-called geometric error. The counterpart of Lemma 1 for the class of radial functions takes the following form. Lemma 2: Let the observation model (1) be in force and let the noise process be uncorrelated with zero mean and variance. Then for estimate (38) and (39), we have VAR Comparing Lemmas 1 and 2, we observe that for the case of radial functions the variance grows linearly with, as for a general class of functions it increases as. B. Truncation Error Owing to (40), the truncation for is given by even (45) as for by (46) V. ACCURACY ANALYSIS In this section, we examine the components of exhibited in decompositions (40) (42). All the proofs are deferred to the Appendix. In Section V-A, we deal with the stochastic component of the error quantified by the term VAR. Then, the truncation error will be evaluated (Section V-B) followed by the detailed analysis of the discretization and geometric errors in Section V-C. Section V-D summarizes all the obtained results by giving the evaluation of the overall error. Throughout most of the paper, we use the observation model (1). Nevertheless, in Section V-E we extend our results to the model with nonhomogeneous variance. A. Variance Let us recall that the integrated variance term is given by the following formula: It is clear that, as, and the rate at which, will be determined by the degree of smoothness of the functions being approximated by the Zernike functions expansion. Let us consider the class BV of functions of bounded variation on the set, i.e., functions which have finite variations in the amplitude as well as the length of the contours along which they occur. Let us note that, without loss of generality, we can choose. In fact, in our case using the polar coordinates to represent we can take. Formally, we say that BV if the total variation is finite. Here the modulus of the gradient is defined as (47) for VAR (44) being even. It is important to observe that the class BV can admit discontinuous functions. In this case, the derivatives are meant in the generalized distribution sense. For example, if

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2743 is the characteristic function of a set having piecewise-smooth boundary, then is equal to the length of. We refer to [44] [47] for various definitions of bounded variation functions of many variables. In particular, in this paper we use the concept of bounded variation for functions of two variables due to Hardy [45]. It is also worth noting that the class BV has been recently used in image analysis [36], [48]. A simple characterization (see [47]) of the class BV is that BV iff (48) of Lipschitz continuous functions. This yields the following bound: (49) and the constant depends on. Similarly, if one assumes that all partial derivatives exist and that is the total variation of in the -direction and analogously for. It is also clear that are Lipschitz (with ) then It is worth noting that in contrary to the one-dimensional case, the bivariate function belonging to BV need not be bounded [46], [47]. The following result gives the approximation error for the Zernike functions expansion of functions of the class BV. Let recall the notation. It is clear that the requirement BV is equivalent to BV. Lemma 3: Let BV and let be bounded. Then we have depends on. The condition of bounded variation for the class of functions of the additive form introduced in (30) can be easily verified. In fact, function (30) is in BV if BV and BV,. The ridge function in (35) is in BV if each belongs to BV. Regarding the case of radial functions let us assume that is in. The following lemma gives a bound for. Lemma 4: Let BV. Then we have is the total variation of on and. Lemmas 3 and 4 hold for functions of bounded variation. This class allows functions that are discontinuous, which is often the case in image processing applications. Let us consider a class of Lipschitz continuous functions, i.e., for,, and a positive constant. To evaluate the truncation error for this case one can use the known results [49] on the best polynomial approximation for (50) the constant depends on the -norms of and the Lipschitz constant. The bounds in (49) and (50) can be improved in some special cases. For example, result (49) can be improved in the case of radial functions as shown in the following lemma. Lemma 5: Let. Then we have Hence, we have the rate which should be compared with in (49) for. The rate in Lemma 5 can also be extended to the class of additive functions defined in (33), i.e., the functions of the following form: (51) Formula (34) gives the Zernike coefficient for the additive functions. The first term in (34) is identical to the Zernike coefficient of radial functions. As for the second term, we will prove that it is exponentially small. Hence, the angular component in (51) plays a negligible role in comparison to the radial one in the overall approximation error. Lemma 6: Let be of the additive form (51). Let and let. Then we have for some positive constants,, and. It should be noted that the constant can be selected as, is given in Lemma 5. To get further insight into the

2744 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 order of the approximation error let us consider a specific class of functions defined on. Example 1: Let be radial, denote this by. First, consider the function (52) It is clear that BV and also satisfies the condition of Lemma 5 if. The discussion in Section III reveals, see (21), that for radial functions we need to evaluate the following integral: (53) is the th-order Legendre polynomial. Using the result in [50], we can show that (53) is of order. This applied in (46) yields (54) In the same way, we can argue that for the function,,, the rate in (54) holds. It should be noted that both and are monotonic on. Let us now consider the function,, which is not monotonic but convex on. Again, using [50] we can obtain that the integral in (53) is of the order and, consequently, we have This is a twice slower rate than in (54). The following characteristic represents the class of discontinuous piecewise-constant functions (55) and, without loss of generality,. Note that BV with. A tedious but otherwise straightforward algebra making use of some properties of the Legendre polynomials [35] yields the following bound: This leads to This is consistent with the general result given in Lemma 4. Let us finally consider the case of an unbounded function and also not being in BV. Hence, let It is clear that if only. Some algebra shows that Hence, we obtain This is clearly a very slow rate of convergence. C. Geometric and Discretization Errors In order to reveal the nature of the geometric error let us consider the indicator function of, i.e.,. Then we obtain Noting that (see (7)) and the case, i.e., for otherwise, it suffices to consider (56) The above term is not equal to zero due to the fact that if the center of a lattice square falls inside the border of the unit disk, this square is used in the computation of the estimate otherwise, the square is discarded. Therefore, the area used for computing is not equal to the area of the unit disk. Fig. 2 shows the union of the squares whose centers fall inside the unit circle. Note that some squares are not entirely inside the circle; on the other hand, some parts of the circle are not covered by the squares. Recalling that we can rewrite (56) as follows: (57) denotes the number of the points inside the unit circle. As we shall see, (57) fully describes the geometric error. Hence, let us define the following number: (58) Since our theory models the performance of function recovery on the grid which becomes increasingly fine, it is crucial to know the size of, i.e., how fast tends to zero as. It turns out that the quantity has been extensively examined in the analytic number theory with the relation to the so-called lattice points of a circle problem due originally to Gauss [29] [33]. Gauss problem on the number of points inside a circle is to determine the correct order of magnitude of as. First, it is known after Gauss that. This result was substantially improved by Sierpiński in 1906 to the form of. Generally,, with the following significant steps in the history of finding the largest possible : Gauss (1834) Sierpinski (1906) Walfisz (1927) Titchmarsh (1935) Hua (1942) We refer to [29], [31], and [33] for a detail history of this fascinating problem of the analytic number theory.

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2745 Fig. 3. The geometric error jr(1)j versus n, 10 n 128; n=2=1. Recently, [32], [30], [31] the following results have been proved: Iwaniec and Mozzochi (1988) for arbitrary small ones. Since the number greatly influences the accuracy of the estimate, therefore, is more likely to be positively biased. We refer to [51] for similar numerical results concerning the evaluation of. Let us now consider the term in (41), i.e., and the best result known so far Huxley (1990) for arbitrary small Note that and. It is conjectured that the value of can be arbitrarily close to but it is known that is impossible. This still remains an open problem in the analytic number theory, see [29], [31], [33], and references cited therein. Hence, with the latest results due to Huxley [29] [31] we have the following bound for the size of : To evaluate this term it suffices to consider the difference. First, it is an important observation that can be decomposed into two separate components describing the discretization and geometric errors, i.e., (61) The conjectured rate is of order (59) and (62) (60) (63) A numerical study has been conducted to evaluate for a large range of. Fig. 3 depicts as a function of for, i.e.,. The oscillatory behavior of can be observed with the error taking both positive and negative values. However, the positive values (around 78%) are more frequent than the negative Here, denotes the lattice square centered at and is the region being the intersection of with the union of those grid squares whose centers fall outside the circle, see Fig. 2. The magnitude of the discretization error is given in the following lemma.

2746 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 Lemma 7: Let BV and let be bounded. Then for any we have Thus, with the best known result on the lattice approximation of a circle we obtain (67) is the total variation of and With the best possible result on this would be The magnitude of the discretization error can be further reduced assuming a bit stronger conditions on. Hence, if is Lipschitz of order then, by a simple modification of the proof of Lemma 7, we obtain (64) The following lemma summarizes the analogous results for radial functions. This is implied by Lemma 8 and formula (42). Lemma 11: a) Let the conditions of Lemma 8 a) hold. Then we have b) Let the conditions of Lemma 8 b) hold. Then we have in (38) is de- for some constant. The discretization error for the estimate scribed in the following lemma. Lemma 8: a) Let BV. Then b) Let. Then Using the bounds obtained in Lemma 8 the constants are given by and. and D. Taking into consideration the results of Sections V-A V-C, one can now easily evaluate the overall estimation error. In fact, combining Lemmas 1, 3, and 10 we obtain the following. Theorem 1: Let BV and let be bounded. Then we have Let us now turn to the geometric error expressed by (63). It is important to observe that the area of the region in (63) is bounded by the error of the lattice approximation of defined in (58), i.e., we have Area (65) This, (7) and Cauchy Schwarz inequality readily lead to the following result. Lemma 9: Let be bounded. Then for any we have Hence, the geometric error depends on the lattice approximation factor introduced in (58) (see Fig. 3). Using the best known bound so far on given in (59) we obtain (66) Comparing this with the results of Lemma 7 or (64) we can conclude that the geometric error dominates the magnitude of. Lemmas 7 and 9 yield the following result concerning the discretization and geometric term. Lemma 10: Let BV and let be bounded. Then we have (68) for some positive constant. Remark 5: Formula (68) reveals an apparent tradeoff between the variance, the discretization, geometric errors, and the truncation error. It is important to note that the overall error is completely dominated by the geometric and truncation errors. This should be contrasted with the classical variance bias tradeoff well documented in a number of nonparametric curve estimation problems [27]. It is clear that there is the truncation parameter minimizing (68) and yielding the best possible error bound for the assumed class of functions. Hence, replacing in (68) by, and taking into account only the dominating terms in (68) we can find that the optimal minimizing (68) is of order (69) for some constant. Inserting into the formula in Theorem 1 we find that (70) for all functions of bounded variation. Using the best know value for, see (59), we obtain our final result. Corollary 1: Under the conditions of Theorem 1 we have (71) with the truncation parameter selected as.

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2747 If one uses the conjectured size for from (69) and (70) that, see (60), we obtain (72) with. We can conjecture that (72) represents the best possible rate of any linear nonparametric recovering technique for the problem of estimating a function of the class BV. The for functions satisfying the condition leading to the approximation error in (50) is of the following order: (73) with the truncation parameter selected as Fig. 4. Bounds for ( ^f ) versus T, 1 T 30 for two different classes of functions. is used the latter rate be- If the best possible result for comes Hence, we can conjecture that for smooth functions, the optimal rate of convergence is of the order The rate given in (73) can be improved if some structural properties of are exploited. In fact, Lemma 6 shows that for the class of additive functions is of order. By this, Lemma 2 and the result in (67) we obtain with selected as. This result should be compared with (73) for, which is of order. It is clear that. The aforementioned considerations clearly reveal that the accuracy of our estimate is almost completely controlled by the geometric error. This is a distinctive feature of the estimation problem on the circular domain. Let us now turn to the case of radial functions, i.e., let us consider the accuracy of the estimate. Recalling Lemmas 2, 4, and 8 we obtain the following. Theorem 2: a) Let BV. Then we have b) If in turn then The last term in (74) is derived from the bound in Lemma 5 by observing that. Direct minimization of the bounds in Theorem 2 yields the following result concerning the rate of convergence. Corollary 2: Under the condition of Theorem 2 a) we have with the truncation parameter selected as. In turn, under the condition of Theorem 2 b) we obtain with the truncation parameter selected as. Example 2: To illustrate the above results we plot in Fig. 4 the bound in (68). We use with the signal-to-noise ratio equals, hence,, and also. We take with. In this case, our numerical studies (partially illustrated in Fig. 3) show that is of order. It is seen that the optimal value of the truncation parameter is given by. In the same plot, we depict bound (68) with the bias term replaced by. This corresponds to functions with all partial derivatives of order one satisfying the Lipschitz condition, see (50). It can be observed that now. Remark 6: In practice, it is of great importance to know how the truncation parameter should be chosen for a given, finite sample size. For some general results on data-driven selection of smoothing and regularization parameters in nonparametric curve estimation we refer to [27] and the references cited therein. In our context, one could use the well-studied cross-validation method seeking which minimizes CV is the norm of. (74) is an estimate the same type as except that it is computed without data value.it should be noted that in the definition of in (36)

2748 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 we have to integrate over two pixels in the case of pixels adjacent to the pixel. The CV criterion corresponds to squared error SE which measures the discrepancy between and on the grid points. Let be the truncation value minimizing SE. It is an interesting problem to assess the accuracy of as an estimate of. Using the results of this paper and techniques developed in [52], [12] we can show that in probability. Details of this result and other issues related to data-driven selection of will be reported else. E. Generalizations Thus far we have considered the rather classical observation model defined in (1). In a number of applications, in particular in image analysis, one is confronted with the so-called signaldependent noise model, see [53] and references cited therein. Hence, (1) is replaced by (75) is a nonnegative function representing the inhomogeneity of the noise process. Examples include the film grain noise, and the multiplicative noise model. It is clear that this type of noise model is going merely to influence the variance of the estimate. Hence, the term VAR in decomposition (40) has to be examined. Proceeding as in the proof of Lemma 1 we can obtain Using arguments as in (61), i.e., observing that we can approximate the sum in (76) as follows: (76) (77) is the error term of the approximation. Clearly, is of order. The dependence of VAR on, however, is weaker than in the bias term as contributes in (77) to the second-order effect. The results of this paper can be extended into other directions. For instance, one can consider the three-dimensional (3-D) case (which finds applications in computer vision and crystallography). In fact, the following expression is a complete and orthogonal set of functions in, is the 3-D unit ball. Here, is the radial component and, are angular variables. The problem of lattice points which is critical for our developments has also been studied for. Hence, let have the meaning as in (58), i.e., it is the number of lattice cubes falling into. Then it is known that [29], [31], [33], [54] It is rather surprising result that the geometric error in the 3-D case is of smaller order than the one for the two-dimensional (2-D) circle. For the recent improvement of the above result from to we refer to [54]. VI. CONCLUDING REMARKS In this paper, we have presented a basic theory of function recovery defined on the circular domain. The estimate based on a class of invariant orthogonal functions has been proposed and its statistical properties have been evaluated. Various bounds for the mean integrated squared error have been demonstrated. These bounds have been obtained by decomposing the error into four noninterfering terms that reflect different errors such as the amount of noise in the data, the discretization error, the geometric error, and the approximation term. In fact, the estimate of the Fourier coefficient is decomposed as follows: (78) the first term in the parentheses represents the stochastic component of, as the second term in the parentheses is the bias term of. We show that can be decomposed further into geometric and numerical errors. The stochastic term is in probability, as the numerical error is, depends on the smoothness of. The geometric error, on the other hand, can never exceed. The best known result on the circle lattice problem [29] [31] yields the geometric error of order. Hence, the geometric error decreases much slower than the other terms in decomposition (78). Although our results are based on upper bounds, we believe that the order of convergence rates are correct and that the geometric error is an inherent factor of any function reconstruction problem defined on nonrectangular domain. We refer to [30] and [31] for lattice approximation results concerning a circle and other convex domains in. We have observed that the geometric error can be eliminated for the case of radial functions for which the Zernike coefficients take a particularly simple form (see (21)) allowing one to define the simplified estimate, see (38). In practice, however, it is difficult to say whether the observed data in (1) represent a radial function. It is interesting to note, however, that property (21) allows us to form a simple statical test to verify the radiality of the underlying function. In fact, let us consider the following statistic:

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2749 the summation is taken for all such that is even and. Our considerations show that in probability, as. Hence tends to Proof of (32): First observe that for the function defined in (30) we have which due to (21) is equal to zero. Thus, one could test the radiality by checking whether is sufficiently small. Further details of using for testing the radiality will be presented else. Our estimate based on the radial polynomials, being global functions in nature, suffers degradation in accuracy due to the geometric error. To alleviate the error one could use local multiresolution/wavelet expansions [48] with possible different spatial scales. The issue of invariance, however, remains still open as standard wavelet expansions do not have this desirable property. This topic is left for future studies. APPENDIX For further considerations we shall need the following two auxiliary results. Lemma 12 (Wirtinger s Inequality): If,, and either or, then Due to the Cauchy Schwarz inequality we obtain and Using Lemma 12 we have (80) (81) This follows from the result in [55, eq. (7.7.1) ]. Lemma 13: Let satisfy the following conditions: for all for all This yields the following bound for the right-hand side of (81): Since (82) for all and (79) a) Any function whose variation in is finite may be expressed in the form and satisfy conditions (79). b) Let be a finite and integrable function on, and let satisfy conditions (79). Then and (80) (82) the proof of inequality (32) is completed. Proof of Lemma 1: Let denote the th lattice square centered at. By the fact that are uncorrelated we have By virtue of the Cauchy Schwarz inequality this does not exceed,. These results were proved originally in [45] and are not easily accessible in standard books on multivariate calculus. Note that the result in a) represents a counterpart of the classical decomposition of an univariate function of bounded variation as a difference of two nondecreasing and positive functions. On the other hand, part b) is a generalization of the second mean-value theorem to the case of 2-D functions. the equality is due to (7). Since there are terms in (37), we can conclude the proof of Lemma 1. Proof of Lemma 2: By (39) we obtain the polynomial is supported on the interval.

2750 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 The Cauchy Schwarz inequality and (9) yield Using (10), we find that (87) is equal to. This, in turn, is equal to Recalling the definition of VAR in (42), we can conclude the proof of Lemma 2. Proof of Lemma 3: Let denote the total variation of and let. By virtue of Lemma 13 a) and (17) we can write Using the identity in [57, p. 848] and some algebra we can find that the last integral is of the order This combined with (86) yields (83) and are functions defined over the rectangle satisfying conditions (79). A quick inspection of the decomposition given in [45] reveals that Consequently Using the bound for established in (14) one can observe that is a finite and integrable function on. Thus, by Lemma 13 b) we can rewrite (83) as follows: (88) To evaluate the above double sum let us split the inner sum in (88) as follows: (84) and,. Assuming, without loss of generality, that and we can conclude that (84) is equal to (89) It is clear that the first term on the right-hand side of (89) is bounded by (85) Noting that and are functions of bounded variation on and using the result in [56] we have that The second term in (89) is bounded by Consequently, the sum on the right-hand side of (88) does not exceed and By this and (85) it suffices to evaluate the integral for. (86) (87) This concludes the proof of Lemma 3. Proof of Lemma 4: Using the notation as in (42) and by virtue of (21) we have. (90) is a bounded variation function on

PAWLAK AND LIAO: ON THE RECOVERY OF A FUNCTION ON A CIRCULAR DOMAIN 2751 Using the result in [35, p. 202] we have Hence, by this and (46) we have the summation in the second term is carried out over satisfying (5) with. Using the result of Lemma 5, we can easily show the first expression in (93) is bounded by. This yields the first term of the bound in Lemma 6. In order to evaluate the second term in (93), let us consider the integral. This, due to identity (10), is equal to (94) This concludes the proof of Lemma 4. Proof of Lemma 5: Let. By the formula in [57, p. 847], the right-hand side of (94) is equal to is defined in (90) and is the derivative of. Recalling the following formula (see [35, p. 176]): is the Gamma function. Using the fact that one can find that the above formula is equivalent to and then using integration by parts we obtain Applying Parseval s formula and (91) we obtain (91) Using Stirling s formula for the Gamma function we observe that for large Integration by parts of the left-hand side of this equality yields (92) Since, this term decays exponentially. Hence, the second term in (93) is of the order for some constants,. The proof of Lemma 6 is thus completed. Proof of Lemma 7: By virtue of Cauchy Schwarz inequality, we obtain Using (46) and (90) we have By (92) this is bounded by (95) Let us observe that Since the proof of Lemma 5 is completed. Proof of Lemma 6: Formula (34) implies that (93) is the oscillation of over the lattice square. The above inequality, (95), and (7) yield the proof of Lemma 7. Proof of Lemma 8: Part a) of Lemma 8 can be proved in the analogous way as the proof of Lemma 7. Regarding part b),

2752 IEEE ANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 10, OCTOBER 2002 let us observe that using arguments as in the proof of Lemma 7 we obtain By Lemma 12 we have This readily yields The proof of Lemma 8 is thus completed. ACKNOWLEDGMENT The authors wish to thank the reviewers and Dr. G. Lugosi, the Associate Editor, for their valuable comments. REFERENCES [1] S. Deans, The Radon Transform and Some of Its Applications. New York: Wiley, 1983. [2] I. M. Johnstone and B. Silverman, Speed of estimation in positron emission tomography and related inverse problems, Ann. Statist., vol. 18, pp. 251 280, 1990. [3] M. C. Jones and B. W. Silverman, An orthogonal series density estimation approach to reconstructing positron emission tomography images, J. Appl. Statist., vol. 16, pp. 177 191, 1989. [4] B. F. Logan and L. A. Shepp, Optimal reconstruction of a function from its projections, Duke Math. J., vol. 42, pp. 645 659, 1975. [5] P. Milanfar, W. C. Karl, and A. S. Willsky, A moment-based varational approach to tomographic reconstruction, IEEE Trans. Image Processing, vol. 5, pp. 459 470, Mar. 1996. [6] M. Born and E. Wolf, Principles of Optics. Oxford, U.K.: Pergamon, 1975. [7] P. H. Hu, J. Stone, and T. Stanley, Applications of Zernike polynomials to atmospheric propagation problems, J. Opt. Soc. Amer. Ser. A, vol. 6, pp. 1595 1608, 1989. [8] F. Zernike, Beugungstheorie des schneidenverfahrens und seiner verbesserten form, der phasenkontrastmethode, Physica, vol. 1, pp. 689 701, 1934. [9] Y. S. Abu-Mostafa and D. Psaltis, Recognitive aspects of moment invariants, IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 698 706, Nov. 1984. [10] M. K. Hu, Visual problem recognition by moment invariant, IRE Trans. Inform. Theory, vol. IT-8, pp. 179 187, Feb. 1962. [11] A. Khotanzad and Y. H. Hong, Invariant image recognition by Zernike moments, IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 489 498, May 1990. [12] M. Pawlak, On the reconstruction aspects of moment descriptors, IEEE Trans. Inform. Theory, vol. 38, pp. 1698 1708, Nov. 1992. [13] S. C. Verral and R. Kakarala, Disk-harmonic coefficients for invariant pattern recognition, J. Opt. Soc. Amer. Ser. A, vol. 15, pp. 389 401, 1998. [14] L. Wang and G. Healey, Using Zernike moments for the illumination and geometry invariant classification of multispectral texture, IEEE Trans. Image Processing, vol. 7, pp. 196 203, Feb. 1998. [15] W. Y. Kim and Y. S. Kim, Robust rotation angle estimator, IEEE Trans. Pattern Anal. Machine Intell., vol. 21, pp. 768 773, Aug. 1999. [16] A. P. Korostelev and A. B. Tsybakov, Minimax Theory of Image Reconstruction. New York: Springer-Verlag, 1993. [17] S. Liao and M. Pawlak, On image analysis by moments, IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 254 266, Mar. 1996. [18], On the accuracy of Zernike moments for image analysis, IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 1358 1364, Dec. 1998. [19] R. J. Prokop and A. P. Reeves, A survey of moment-based techniques for unoccluded object representation and recognition, in CVGIP: Graphical Models and Image Processing, vol. 54, 1992, pp. 438 460. [20] M. R. Teague, Image analysis via the general theory of moments, J. Opt. Soc. Amer., vol. 70, pp. 920 930, 1980. [21] C. H. Teh and R. T. Chin, On image analysis by the methods of moments, IEEE Trans. Pattern Anal. Machine Intell., vol. 10, pp. 496 512, July 1988. [22] T. Chang, Spherical regression, Ann. Statist., vol. 14, pp. 907 924, 1989. [23] N. I. Fisher, Statistical Analysis of Circular Data. Cambridge, U.K.: Cambridge Univ. Press, 1993. [24] T. Koornwinder, Two-variable analogues of the classical orthogonal polynomials, in Theory and Applications of Special Functions. New York: Academic, 1975, pp. 435 495. [25] A. B. Bhatia and M. Born, On the circle polynomials of Zernike and related orthogonal sets, in Proc. Cambridge Phil. Soc., vol. 50, 1954, pp. 40 53. [26] J. Schwiegerling, J. E. Greivenkamp, and J. M. Miller, Representation of videokeratoscopic height data with Zernike polynomials, J. Opt. Soc. Amer. Ser. A, vol. 12, pp. 2105 2113, 1995. [27] S. Efromovich, Nonparametric Curve Estimation: Methods, Theory, and Applications. New York: Springer-Verlag, 1999. [28] J. S. Simonoff, Smoothing Methods in Statistics. New York: Springer- Verlag, 1996. [29] R. K. Guy, Gauss lattice point problems, in Unsolved Problems in Number Theory. New York: Springer-Verlag, 1994, pp. 240 241. [30] M. N. Huxley, Exponential sums and lattice points, Proc. London Math. Soc., vol. 60, pp. 471 502, 1990. [31], Area, Lattice Points, and Exponetial Sums. Oxford, U.K.: Clarendon, 1996. [32] H. Iwaniec and C. J. Mozzochi, On the divisor and circle problems, J. Number Theory, vol. 29, pp. 60 93, 1988. [33] J. R. Wilton, The lattice points of a circle: An historical account of problem, Messenger of Math., vol. 58, pp. 67 80, 1929. [34] G. Szegö, Orthogonal Polynomials. Providence, RI: Amer. Math. Soc. Colloquie Pub., 1975. [35] G. Sansone, Orthogonal Functions. New York: Interscience, 1955. [36] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys. D, vol. 60, pp. 259 268, 1992. [37] A. A. Kakarala and J. Cadzow, Estimation of phase for noisy linear phase signals, IEEE Trans. Signal Processing, vol. 44, pp. 2483 2497, Oct. 1996. [38] D. Slepian, Prolate spheroidal wave functions, Fourier analysis and uncertainty IV: Extensions to many dimensions; generalized prolate spheroidal functions, Bell Syst. Tech. J., vol. 43, pp. 3009 3056, 1964. [39] M. W. Wong, Weyl Transforms. New York: Springer-Verlag, 1998. [40] Y. Yang, Y. N. Hsu, and H. H. Arsenault, Optimal circular symmetrical filters and their uses in optical pattern recognition, Opt. Acta, vol. 29, pp. 627 644, 1982. [41] D. L. Donoho and I. M. Johnstone, Projection-based approximation and a duality with kernel method, Ann. Statist., vol. 17, pp. 58 106, 1989. [42] T. J. Hastie and R. J. Tibshirani, Generalized Additive Models. London, U.K.: Chapman & Hall, 1990. [43] P. Hall and M. Raimondo, On the global performance of approximations to smooth curves using gridded data, Ann. Statist., vol. 26, pp. 2206 2217, 1998. [44] E. Giusti, Minimal Surfaces and Functions of Bounded Variation. Boston, MA: Birkhäuser, 1984. [45] G. J. Hardy, On the double Fourier series and especially those which represent the double zeta-function with real and imcommensurable parameters, Quart. J. Math., vol. 37, pp. 53 68, 1906. [46] A. S. Kronrod, On functions of two variables, Usp. Math. Nauk, vol. 5, pp. 24 134, 1950. [47] W. P. Ziemer, Weakly Differentiable Functions. New York: Springer- Verlag, 1989. [48] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. San Diego, CA: Academic, 1999. [49] A. F. Timan, Theory of Approximation of Functions of a Real Variable. New York: Dover, 1994.