IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 38, NO. 2, MARCH 1992

Singularity Detection and Processing with Wavelets

Stephane Mallat and Wen Liang Hwang

Abstract: Most of the information in a signal is often carried by irregular structures and transient phenomena. The mathematical characterization of singularities with Lipschitz exponents is explained. Theorems are reviewed that estimate local Lipschitz exponents of functions from the evolution across scales of their wavelet transform. It is then proven that the local maxima of the wavelet transform modulus detect the locations of irregular structures and provide numerical procedures to compute their Lipschitz exponents. The wavelet transform of singularities with fast oscillations has a particular behavior that is studied separately. The local frequency of such oscillations is measured from the wavelet transform modulus maxima. It has been shown numerically that one- and two-dimensional signals can be reconstructed, with a good approximation, from the local maxima of their wavelet transform modulus. As an application, an algorithm is developed that removes white noise from signals by analyzing the evolution of the wavelet transform maxima across scales. In two dimensions, the wavelet transform maxima indicate the location of edges in images. The denoising algorithm is extended for image enhancement.

Index Terms: Edges, fractals, Lipschitz exponents, noise, singularities, wavelets.

I. INTRODUCTION

Singularities and irregular structures often carry the most important information in signals. In images, the discontinuities of the intensity provide the locations of the object contours, which are particularly meaningful for recognition purposes. For many other types of signals, from electrocardiograms to radar signals, the interesting information is given by transient phenomena such as peaks.
In physics, it is also important to study irregular structures to infer properties about the underlying physical phenomena [1], [3], [19]. Until recently, the Fourier transform was the main mathematical tool for analyzing singularities. The Fourier transform is global and provides a description of the overall regularity of signals, but it is not well adapted for finding the location and the spatial distribution of singularities. This was a major motivation for studying the wavelet transform in mathematics [22] and in applied domains [13]. By decomposing signals into elementary building blocks that are well localized both in space and frequency, the wavelet transform can characterize the local regularity of signals. The wavelet transform and its main properties are briefly introduced in Section II. In mathematics, the local regularity of a function is often measured with Lipschitz exponents. Section III is a tutorial review on Lipschitz exponents and their characterization with the Fourier transform and the wavelet transform. We explain the basic theorems that relate local Lipschitz exponents to the evolution across scales of the wavelet transform values. In practice, these theorems do not provide simple and direct strategies for detecting and characterizing singularities in signals. The following sections show that the local maxima of the wavelet transform modulus provide enough information for analyzing these singularities. The detection of singularities with multiscale transforms has been studied not only in mathematics but also in signal processing.

Manuscript received October 25. This work was supported by NSF Grant IRI, an AFOSR Grant, and an ONR Grant. This work was presented in part at the NATO ASI Workshop on Stochastic Processes, Italy, in July. The authors are with the Courant Institute, New York University, 251 Mercer Street, New York, NY.
In Section IV, we explain the relation between the multiscale edge detection algorithms used in computer vision and the approach of Grossmann [12], based on the phase of the wavelet transform. The detection of the local maxima of the wavelet transform modulus is strongly motivated by these techniques. Section V is a mathematical analysis of the modulus maxima properties. We prove that modulus maxima detect all singularities and that local Lipschitz exponents can often be measured from their evolution across scales. We derive practical algorithms to analyze isolated or nonisolated singularities in signals. Numerical examples illustrate the mathematical results. The wavelet transform has a particular behavior when singularities have fast oscillations; this case is studied separately. The local frequency of the oscillations can be measured from the points where the modulus of the wavelet transform is locally maximum both along the scale and the spatial variable. This approach is closely related to the work of Escudie and Torresani [10] for measuring the modulation law of asymptotic signals [9].

An important issue is to understand how much information is carried by the local maxima of a wavelet transform modulus. Is it possible to reconstruct the original signal, or a close approximation, from these modulus maxima? Meyer proved [23] that the local maxima of a wavelet transform modulus do not uniquely characterize a function. However, a numerical algorithm developed by Zhong and one of us [18] is able to reconstruct a close approximation of the original signal. We can thus process the singularities of a signal by modifying the local maxima of its wavelet transform modulus, and then reconstruct the corresponding function. The reconstruction algorithm is briefly introduced and we describe an application to the removal of noise from signals. In denoising problems,
we often have some prior information on the differences between the signal singularities and the noise singularities. The algorithm differentiates the signal components from the noise by selecting the wavelet transform modulus maxima that correspond to the signal singularities. After removing the modulus maxima of the noise fluctuations, we reconstruct a denoised signal. The detection of the wavelet transform maxima has been extended to two dimensions for image processing applications [18]. We explain how to detect edges in images from the local maxima of the wavelet transform modulus. Images can also be reconstructed from these modulus maxima with no visual distortions. This allows us to extend the denoising algorithm to image enhancement. We discriminate the noise from the original image information by analyzing the behavior of the wavelet transform maxima across scales. We also take advantage of the spatial coherence of the image singular structures to suppress the noise components.

Notation: $L^2(\mathbb{R})$ denotes the Hilbert space of measurable, square-integrable functions $f(x)$, that is, functions such that

$$\int_{-\infty}^{+\infty} |f(x)|^2 \, dx < +\infty .$$

The classical norm of $f(x) \in L^2(\mathbb{R})$ is given by

$$\|f\|^2 = \int_{-\infty}^{+\infty} |f(x)|^2 \, dx .$$

We denote the convolution of two functions $f(x) \in L^2(\mathbb{R})$ and $g(x) \in L^2(\mathbb{R})$ by

$$f * g(x) = \int_{-\infty}^{+\infty} f(u)\, g(x - u)\, du .$$

The Fourier transform of a function $f(x)$ is written $\hat{f}(\omega)$ and defined by

$$\hat{f}(\omega) = \int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx .$$

For any function $f(x)$, $f_s(x)$ denotes the dilation of $f(x)$ by the scale factor $s$:

$$f_s(x) = \frac{1}{s}\, f\!\left(\frac{x}{s}\right).$$

$L^2(\mathbb{R}^2)$ is the Hilbert space of measurable, square-integrable two-dimensional functions. The classical norm of $f(x, y) \in L^2(\mathbb{R}^2)$ is given by

$$\|f\|^2 = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} |f(x, y)|^2 \, dx\, dy .$$

The Fourier transform of $f(x, y) \in L^2(\mathbb{R}^2)$ is written $\hat{f}(\omega_x, \omega_y)$ and is defined by

$$\hat{f}(\omega_x, \omega_y) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} f(x, y)\, e^{-i(\omega_x x + \omega_y y)}\, dx\, dy .$$

For any function $f(x, y) \in L^2(\mathbb{R}^2)$, $f_s(x, y)$ denotes the dilation of $f(x, y)$ by the scale factor $s$:

$$f_s(x, y) = \frac{1}{s^2}\, f\!\left(\frac{x}{s}, \frac{y}{s}\right).$$

II. CONTINUOUS WAVELET TRANSFORM

This section reviews the main properties of the wavelet transform. The formalism of the continuous wavelet transform was first introduced by Morlet and Grossmann [13]. Let $\psi(x)$ be a complex valued function. The function $\psi(x)$ is said to be a wavelet if and only if its Fourier transform $\hat{\psi}(\omega)$ satisfies

$$\int_{0}^{+\infty} \frac{|\hat{\psi}(\omega)|^2}{\omega}\, d\omega = \int_{-\infty}^{0} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega = C_\psi < +\infty . \tag{1}$$

This condition implies that

$$\int_{-\infty}^{+\infty} \psi(u)\, du = 0 .$$

Let $\psi_s(x) = (1/s)\,\psi(x/s)$ be the dilation of $\psi(x)$ by the scale factor $s$. The wavelet transform of a function $f(x) \in L^2(\mathbb{R})$ is defined by

$$Wf(s, x) = f * \psi_s(x). \tag{2}$$

The Fourier transform of $Wf(s, x)$ with respect to the $x$ variable is simply given by

$$\hat{W}f(s, \omega) = \hat{f}(\omega)\, \hat{\psi}(s\omega). \tag{3}$$

The wavelet transform can easily be extended to tempered distributions, which is useful for the scope of this paper. For a quick presentation of the theory of distributions, the reader might want to consult the book of Folland [11]. If $f(x)$ is a tempered distribution of order $n$ and if the wavelet $\psi(x)$ is $n$ times continuously differentiable, then the wavelet transform of $f(x)$ defined by (2) is well defined. For example, a Dirac $\delta(x)$ is a tempered distribution of order 0 and $W\delta(s, x) = \psi_s(x)$, if $\psi(x)$ is continuous. One can prove [13] that the wavelet transform is invertible and $f(x)$ is recovered with

$$f(x) = \frac{1}{C_\psi} \int_{0}^{+\infty}\!\int_{-\infty}^{+\infty} Wf(s, u)\, \bar{\psi}_s(u - x)\, du\, \frac{ds}{s}, \tag{4}$$

where $\bar{\psi}_s(x)$ denotes the complex conjugate of $\psi_s(x)$. The wavelet transform $Wf(s, x)$ is a function of the scale $s$ and the spatial position $x$. The plane defined by the pair of variables $(s, x)$ is called the scale-space plane [27]. Not every function $F(s, x)$ is the wavelet transform of some function $f(x)$. One can prove that $F(s, x)$ is a wavelet transform if and only if it satisfies the reproducing kernel equation

$$F(s_0, x_0) = \int_{0}^{+\infty}\!\int_{-\infty}^{+\infty} F(s, x)\, K(s_0, s, x_0, x)\, dx\, \frac{ds}{s}, \tag{5}$$

with

$$K(s_0, s, x_0, x) = \frac{1}{C_\psi} \int_{-\infty}^{+\infty} \bar{\psi}_s(u - x)\, \psi_{s_0}(x_0 - u)\, du . \tag{6}$$

The reproducing kernel $K(s_0, s, x_0, x)$ expresses the intrinsic redundancy between the value of the wavelet transform at $(s, x)$ and its value at $(s_0, x_0)$.

III. CHARACTERIZATION OF LOCAL REGULARITY WITH THE WAVELET TRANSFORM

As mentioned in the introduction, a remarkable property of the wavelet transform is its ability to characterize the local regularity of functions. In mathematics, this local regularity is often measured with Lipschitz exponents.

Definition 1: Let $n$ be a positive integer and $n \le \alpha \le n + 1$. A function $f(x)$ is said to be Lipschitz $\alpha$ at $x_0$ if and only if there exist two constants $A$ and $h_0 > 0$, and a polynomial of order $n$, $P_n(h)$, such that for $h < h_0$

$$|f(x_0 + h) - P_n(h)| \le A\, |h|^{\alpha}. \tag{7}$$

The function $f(x)$ is uniformly Lipschitz $\alpha$ over the interval $]a, b[$ if and only if there exists a constant $A$ and, for any $x_0 \in\, ]a, b[$, there exists a polynomial of order $n$, $P_n(h)$, such that (7) is satisfied if $x_0 + h \in\, ]a, b[$. We call Lipschitz regularity of $f(x)$ at $x_0$ the supremum of all values $\alpha$ such that $f(x)$ is Lipschitz $\alpha$ at $x_0$. We say that a function is singular at $x_0$ if it is not Lipschitz 1 at $x_0$.

A function $f(x)$ that is continuously differentiable at a point is Lipschitz 1 at this point. If the derivative of $f(x)$ is bounded but discontinuous at $x_0$, $f(x)$ is still Lipschitz 1 at $x_0$, and following Definition 1, we consider that $f(x)$ is not singular at $x_0$. One can easily prove that if $f(x)$ is Lipschitz $\alpha$, for $\alpha > n$, then $f(x)$ is $n$ times differentiable at $x_0$ and the polynomial $P_n(h)$ is the first $n + 1$ terms of the Taylor series of $f(x)$ at $x_0$. For $n = 0$, we have $P_0(h) = f(x_0)$. The Lipschitz regularity $\alpha_0$ gives an indication of the differentiability of $f(x)$ but it is more precise. If the Lipschitz regularity $\alpha_0$ of $f(x)$ satisfies $n < \alpha_0 < n + 1$, then we know that $f(x)$ is $n$ times differentiable at $x_0$ but its $n$th derivative is singular at $x_0$, and $\alpha_0$ characterizes this singularity.
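As a concrete illustration of Definition 1 (this worked example is an addition and does not appear in the original text), take $f(x) = |x|^{3/2}$ at $x_0 = 0$ with $n = 1$. The sketch below checks (7) with the Taylor polynomial $P_1(h) = 0$, using $|h| < h_0$, and indicates why the Lipschitz regularity at 0 should be exactly $3/2$:

```latex
\begin{align*}
f(x) &= |x|^{3/2}, \qquad f(0) = 0, \qquad f'(0) = 0
  \;\Longrightarrow\; P_1(h) = f(0) + f'(0)\,h = 0, \\
|f(0 + h) - P_1(h)| &= |h|^{3/2} \le A\,|h|^{\alpha}
  \quad \text{for } |h| < h_0 = 1,\ A = 1,\ 1 \le \alpha \le 3/2, \\
\frac{|f(0 + h) - P_1(h)|}{|h|^{\alpha}} &= |h|^{3/2 - \alpha} \longrightarrow +\infty
  \quad \text{as } h \to 0 \text{ when } \alpha > 3/2 .
\end{align*}
```

No other first-order polynomial $a + bh$ can rescue the case $\alpha > 3/2$: a nonzero $a$ leaves an error that does not vanish, and a nonzero $b$ leaves an error of order $|h|$. Here $f$ is once differentiable at 0, while $f'(x) = \tfrac{3}{2}\,\mathrm{sign}(x)\,|x|^{1/2}$ is only Lipschitz $1/2$ at 0 and is therefore singular there, which matches the case $n < \alpha_0 < n + 1$ described above.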
One can prove that if $f(x)$ is Lipschitz $\alpha$ then its primitive $g(x)$ is Lipschitz $\alpha + 1$. However, it is not true that if a function is Lipschitz $\alpha$ at a point $x_0$, then its derivative is Lipschitz $\alpha - 1$ at the same point. This is due to oscillatory phenomena that are further studied in Section V-C. On the other hand, one can prove that if $\alpha$ is not an integer and $\alpha > 1$, a function is uniformly Lipschitz $\alpha$ on an interval $]a, b[$ if and only if its derivative is uniformly Lipschitz $\alpha - 1$ on the same interval. By extending this property to $\alpha < 1$, we can define negative uniform Lipschitz exponents for tempered distributions. Integer Lipschitz exponents have a different behavior that is not studied in this article. It is necessary to define properly the notion of negative Lipschitz exponents for tempered distributions, because they are often encountered in numerical computations.

Definition 2: Let $f(x)$ be a tempered distribution of finite order. Let $\alpha$ be a noninteger real number and $[a, b]$ an interval of $\mathbb{R}$. The distribution $f(x)$ is said to be uniformly Lipschitz $\alpha$ on $]a, b[$ if and only if its primitive is uniformly Lipschitz $\alpha + 1$ on $]a, b[$.

For example, the second order primitive of a Dirac is a function which is piecewise linear in the neighborhood of $x = 0$. This function is uniformly Lipschitz 1 in the neighborhood of 0, and thus uniformly Lipschitz $\alpha$ for $\alpha < 1$. As a consequence of Definition 2, we can derive that a Dirac is uniformly Lipschitz $\alpha$, for $\alpha < -1$, in the neighborhood of 0. Since Definition 2 is not valid for integer Lipschitz exponents, it does not allow us to conclude that a Dirac is uniformly Lipschitz $-1$ in a neighborhood of 0. We can however derive that in a neighborhood of 0, the uniform Lipschitz regularity of a Dirac, which is the supremum of the Lipschitz exponents, is equal to $-1$. Definition 2 is global because uniform Lipschitz exponents are defined over intervals but not at points.
It is possible to make a local extension of Lipschitz exponents to negative values through the microlocalization theory of Bony [6], [17], but these sophisticated results go beyond the scope of this article. For isolated singularities, one can define pointwise Lipschitz exponents through Definition 2. We shall say that a distribution $f(x)$ has an isolated singularity Lipschitz $\alpha$ at $x_0$ if and only if $f(x)$ is uniformly Lipschitz $\alpha$ over an interval $]a, b[$, with $x_0 \in\, ]a, b[$, and $f(x)$ is uniformly Lipschitz 1 over any subinterval of $]a, b[$ that does not include $x_0$. For example, a Dirac centered at 0 has an isolated singularity at $x = 0$, whose Lipschitz regularity is $-1$.

A classical tool for measuring the Lipschitz regularity of a function $f(x)$ is to look at the asymptotic decay of its Fourier transform $\hat{f}(\omega)$. One can prove that a bounded function $f(x)$ is uniformly Lipschitz $\alpha$ over $\mathbb{R}$ if it satisfies

$$\int_{-\infty}^{+\infty} |\hat{f}(\omega)|\, (1 + |\omega|^{\alpha})\, d\omega < +\infty . \tag{8}$$

This condition is sufficient but not necessary. It gives a global regularity condition over the whole real line, but from this condition one cannot determine whether the function is locally more regular at a particular point $x_0$. This is because the Fourier transform delocalizes the information along the spatial variable $x$. The Fourier transform is therefore not well adapted to measure the local Lipschitz regularity of functions.

If the wavelet has a compact support, the value of $Wf(s, x_0)$ depends upon the values of $f(x)$ in a neighborhood of $x_0$, of size proportional to the scale $s$. At fine scales, it provides localized information on $f(x)$. Theorems 1 and 2 relate the asymptotic decay of the wavelet transform at small scales to the local Lipschitz regularity. We suppose that the wavelet $\psi(x)$ is continuously differentiable, with real values and a compact support, although the last two conditions are not necessary. The first theorem is a well-known result and a proof can be found in [15].
Theorem 1: Let $f(x) \in L^2(\mathbb{R})$ and $[a, b]$ an interval of $\mathbb{R}$. Let $0 < \alpha < 1$. For any $\varepsilon > 0$, $f(x)$ is uniformly Lipschitz $\alpha$ over $]a + \varepsilon, b - \varepsilon[$ if and only if for any $\varepsilon > 0$ there exists a constant $A_\varepsilon$ such that for $x \in\, ]a + \varepsilon, b - \varepsilon[$ and $s > 0$,

$$|Wf(s, x)| \le A_\varepsilon\, s^{\alpha}. \tag{9}$$

If $f(x) \in L^2(\mathbb{R})$, for any scale $s_0 > 0$, by applying the Cauchy-Schwarz inequality, we can easily prove that the function $|Wf(s, x)|$ is bounded over the domain $s > s_0$. Hence, (9) is really a condition on the asymptotic decay of $|Wf(s, x)|$ when the scale $s$ goes to zero. The sufficient condition (8) based on the Fourier transform implies that $|\hat{f}(\omega)|$ has a decay faster than $1/\omega^{\alpha}$ for large frequencies $\omega$. Equation (9) is similar if one considers the scale $s$ as locally equivalent to $1/\omega$. However, in contrast to the Fourier transform condition, (9) is a necessary and sufficient condition and is localized on intervals, not over the whole real line.

In order to extend Theorem 1 to Lipschitz exponents $\alpha$ larger than 1, we must impose that the wavelet $\psi(x)$ has enough vanishing moments. A wavelet $\psi(x)$ is said to have $n$ vanishing moments if and only if for all positive integers $k < n$, it satisfies

$$\int_{-\infty}^{+\infty} x^{k}\, \psi(x)\, dx = 0 . \tag{10}$$

If the wavelet $\psi(x)$ has $n$ vanishing moments, then Theorem 1 remains valid for any noninteger value $\alpha$ such that $0 < \alpha < n$. Let us see how this extension works, in order to understand the impact of vanishing moments. Since $\psi(x)$ has a compact support, $\hat{\psi}(\omega)$ is $n$ times continuously differentiable and one can derive from (10) that $\hat{\psi}(\omega)$ has a zero of order $n$ at $\omega = 0$. For any integer $p < n$, $\hat{\psi}(\omega)$ can thus be factorized into

$$\hat{\psi}(\omega) = (i\omega)^{p}\, \hat{\psi}^{1}(\omega). \tag{11}$$

In the spatial domain we have

$$\psi(x) = \frac{d^{p} \psi^{1}(x)}{dx^{p}}, \tag{12}$$

and the function $\psi^{1}(x)$ satisfies the wavelet admissibility condition (1). The $p$th derivative of any function $f(x)$ is well defined in the sense of distributions. Hence,

$$Wf(s, x) = f * \psi_s(x) = s^{p} \left( \frac{d^{p} f}{dx^{p}} * \psi^{1}_{s} \right)(x). \tag{13}$$

The wavelet transform of $f(x)$ with respect to the wavelet $\psi(x)$ is thus equal to the wavelet transform of its $p$th derivative, computed with the wavelet $\psi^{1}(x)$, and multiplied by $s^{p}$.
Let $p$ be an integer such that $0 < \alpha - p < 1$. The function $f(x)$ is uniformly Lipschitz $\alpha$ on an interval $]a, b[$ if and only if $d^{p}f/dx^{p}$ is uniformly Lipschitz $\alpha - p$ on the same interval. Since $0 < \alpha - p < 1$, Theorem 1 applies to the wavelet transform of $d^{p}f/dx^{p}$ defined with respect to the wavelet $\psi^{1}$. Theorem 1 proves that $d^{p}f/dx^{p}$ is uniformly Lipschitz $\alpha - p$ over intervals $]a + \varepsilon, b - \varepsilon[$ if and only if we can find constants $A_\varepsilon > 0$ such that for $x \in\, ]a + \varepsilon, b - \varepsilon[$,

$$\left| \frac{d^{p} f}{dx^{p}} * \psi^{1}_{s}(x) \right| \le A_\varepsilon\, s^{\alpha - p}.$$

Equation (13) proves that this is true if and only if

$$|Wf(s, x)| \le A_\varepsilon\, s^{\alpha}. \tag{14}$$

Equation (14) extends Theorem 1 for $\alpha < n$. If $\psi(x)$ has $n$ vanishing moments but not $n + 1$, then the decay of $|Wf(s, x)|$ does not tell us anything about Lipschitz exponents for $\alpha > n$. For example, the function $f(x) = \sin(x)$ is uniformly Lipschitz $+\infty$ on any interval, but if $\psi(x)$ has exactly $n$ vanishing moments, one can easily prove that the asymptotic decay of $|Wf(s, x)|$ is equivalent to $s^{n}$ on any interval. This decay does not allow us to derive anything on the regularity of the $(n + 1)$th derivative of $\sin(x)$.

For $\alpha < 0$ and $\alpha \notin \mathbb{Z}$, (9) of Theorem 1 remains valid to characterize uniform Lipschitz exponents. In this case, we do not need to impose more than one vanishing moment on the wavelet $\psi(x)$. The proof can easily be derived from the statement of Definition 2.

For integer Lipschitz exponents $\alpha$, (9) is necessary but not sufficient to prove that a function $f(x)$ is uniformly Lipschitz $\alpha$ over intervals $]a + \varepsilon, b - \varepsilon[$. If $\alpha = 1$ and the wavelet has at least two vanishing moments, the class of functions that satisfy (9), for any $x \in \mathbb{R}$, is called the Zygmund class. This class of functions is larger than the set of functions that are uniformly Lipschitz 1. For example, $x \log(x)$ belongs to the Zygmund class although it is not Lipschitz 1 at $x = 0$. The reader is referred to Meyer's book [22] for more detailed explanations on the Zygmund class.
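The decay rule (9) can be checked by hand on two elementary singularities. The following worked sketch is an addition to the text and rests on an assumption made only for this illustration: the wavelet is $\psi = \theta'$, where $\theta$ is a smooth, rapidly decaying function with nonzero integral, and $H$ denotes the unit step function.

```latex
\begin{align*}
W\delta(s, x) &= \delta * \psi_s(x) = \psi_s(x) = \frac{1}{s}\,\psi\!\left(\frac{x}{s}\right)
  \;\Longrightarrow\; \sup_x |W\delta(s, x)| = s^{-1} \sup_u |\psi(u)|, \\
WH(s, x) &= H * \psi_s(x) = \int_{-\infty}^{x} \psi_s(u)\,du = \theta\!\left(\frac{x}{s}\right)
  \;\Longrightarrow\; \sup_x |WH(s, x)| = s^{0} \sup_u |\theta(u)| .
\end{align*}
```

The Dirac yields a modulus that grows like $s^{-1}$ at fine scales, in agreement with its uniform Lipschitz regularity $-1$, while the step yields a modulus whose supremum does not depend on $s$, which corresponds to the exponent 0 of a bounded discontinuity.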
Theorem 1 gives a characterization of the Lipschitz regularity over intervals but not at a point. The second theorem, proved by Jaffard [16], shows that one can also estimate the Lipschitz regularity of $f(x)$ precisely at a point $x_0$. The theorem gives a necessary condition and a sufficient condition, but not a necessary and sufficient condition. We suppose that $\psi(x)$ has $n$ vanishing moments, is $n$ times continuously differentiable, and has a compact support.

Theorem 2: Let $n$ be a positive integer and $\alpha \le n$. Let $f(x) \in L^2(\mathbb{R})$. If $f(x)$ is Lipschitz $\alpha$ at $x_0$, then there exists a constant $A$ such that for all points $x$ in a neighborhood of $x_0$ and any scale $s$

$$|Wf(s, x)| \le A\, (s^{\alpha} + |x - x_0|^{\alpha}). \tag{15}$$

Conversely, let $\alpha < n$ be a noninteger value. The function $f(x)$ is Lipschitz $\alpha$ at $x_0$ if the following two conditions hold.

- There exists some $\varepsilon > 0$ and a constant $A$ such that for all points $x$ in a neighborhood of $x_0$ and any scale $s$

$$|Wf(s, x)| \le A\, s^{\varepsilon}. \tag{16}$$

- There exists a constant $B$ such that for all points $x$ in a neighborhood of $x_0$ and any scale $s$

$$|Wf(s, x)| \le B \left( s^{\alpha} + \frac{|x - x_0|^{\alpha}}{\bigl|\log |x - x_0|\bigr|} \right). \tag{17}$$

As a consequence of Theorem 1, we know that (16) imposes that $f(x)$ is uniformly Lipschitz $\varepsilon$ in some neighborhood of $x_0$. The value $\varepsilon$ can be arbitrarily small. To interpret (15) and (17), let us define in the scale-space the cone of points $(s, x)$ that satisfy $|x - x_0| \le s$. For $(s, x)$ inside this cone, (15) and (17) impose that when $s$ goes to zero, $|Wf(s, x)| = O(s^{\alpha})$. Below this cone, the value of $|Wf(s, x)|$ is controlled by the distance of $x$ with respect to $x_0$, but the necessary and sufficient conditions have different upper bounds. Equation (17) means that for $(s, x)$ below the cone,

$$|Wf(s, x)| = O\!\left( \frac{|x - x_0|^{\alpha}}{\bigl|\log |x - x_0|\bigr|} \right).$$

The behavior of the wavelet transform inside a cone pointing to $x_0$ and below this cone are two components that must often be treated separately.

Theorems 1 and 2 prove that the wavelet transform is particularly well adapted to estimate the local regularity of functions. For example, Holschneider and Tchamitchian [15] used a similar result to analyze the differentiability of the Riemann-Weierstrass function. As mentioned in the introduction, we often want to detect and characterize the irregular parts of signals. Many interesting physical processes yield irregular structures [3]. A well-known example is high Reynolds number turbulence, for which there is still no comprehensive theory to understand the nature and distribution of irregular structures [5]. For numerical experiments, it is, however, difficult to apply Theorems 1 and 2 directly in order to detect singularities and to characterize their Lipschitz exponents. Indeed, these theorems require measuring the decay of $|Wf(s, x)|$ in a whole two-dimensional neighborhood of $x_0$ in the scale-space $(s, x)$, which requires a lot of computations. The next section briefly reviews the different techniques that have been used to numerically detect singularities with a wavelet transform.
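To make the cost of a direct check of (9) concrete, the following sketch measures the decay of the supremum of $|Wf(s, x)|$ over an interval for $f(x) = |x|^{1/2}$, which is uniformly Lipschitz $1/2$. It is an illustration added here, not the authors' procedure. The derivative-of-Gaussian wavelet (one vanishing moment, not compactly supported), the Riemann-sum convolution, the grid, and all function names are assumptions chosen for this example.

```python
import numpy as np


def gaussian_derivative(x):
    """Assumed wavelet: derivative of exp(-x^2 / 2); it has one vanishing moment."""
    return -x * np.exp(-0.5 * x**2)


def wavelet_transform(f, dx, scales, psi=gaussian_derivative, support=8.0):
    """Riemann-sum approximation of Wf(s, x) = (f * psi_s)(x), psi_s(x) = psi(x/s)/s.

    The kernel is truncated at |x| <= support * s, where this wavelet is negligible.
    Samples outside the signal are treated as zero, so values near the borders
    of the grid are not meaningful.
    """
    n = len(f)
    W = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        half = int(np.ceil(support * s / dx))
        u = np.arange(-half, half + 1) * dx      # kernel grid centered on 0
        psi_s = psi(u / s) / s
        full = np.convolve(f, psi_s) * dx        # full linear convolution
        W[i] = full[half:half + n]               # realign with the signal grid
    return W


def decay_exponent(W, scales, mask):
    """Least-squares slope of log sup|Wf(s, x)| against log s over the masked interval."""
    sup = np.abs(W[:, mask]).max(axis=1)
    slope, _ = np.polyfit(np.log(scales), np.log(sup), 1)
    return slope


dx = 1e-3
x = np.arange(-1000, 1001) * dx                  # grid on [-1, 1] containing x = 0
scales = np.array([0.01, 0.02, 0.04, 0.08])
interior = np.abs(x) <= 0.3                      # stay away from the grid borders

f = np.sqrt(np.abs(x))                           # uniformly Lipschitz 1/2
W = wavelet_transform(f, dx, scales)
alpha_hat = decay_exponent(W, scales, interior)
print("estimated decay exponent:", alpha_hat)    # expected to be close to 0.5
```

Even in this small case the whole array of values over the interval and over all scales has to be computed before the decay can be read off, which is the burden that motivates working with modulus maxima only.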
We then explain how singular points are related to the local maxima of the wavelet transform modulus.

IV. DETECTION AND MEASUREMENT OF SINGULARITIES

In his pioneering work on wavelets, Grossmann [12] gives an approach to detect singularities with a wavelet which is a Hardy function. A Hardy function $g(x)$ is a complex function whose Fourier transform satisfies

$$\hat{g}(\omega) = 0, \quad \text{for } \omega < 0 . \tag{18}$$

Let $f(x) \in L^2(\mathbb{R})$ and $Wf(s, x)$ be the complex wavelet transform built with a Hardy wavelet. For a fixed scale $s$, (3) implies that the Fourier transform $\hat{W}f(s, \omega)$ is also zero at negative frequencies, so $Wf(s, x)$ is also a Hardy function. Let us observe that a Hardy wavelet does not satisfy (1) for negative frequencies. The reconstruction formula (4) is valid if and only if $f(x)$ is also a Hardy function. Let $\phi(s, x)$ and $\rho(s, x)$ be, respectively, the argument and modulus of the complex number $Wf(s, x)$. The argument $\phi(s, x)$ is called the phase of the wavelet transform. Grossmann [12] indicates that in the neighborhood of an isolated singularity located at $x_0$, the lines in the scale-space $(s, x)$ where the phase $\phi(s, x)$ remains constant converge to the abscissa $x_0$ when the scale $s$ goes to 0. One can use this observation to detect singularities, but the phase $\phi(s, x)$ is not sufficient to measure their Lipschitz regularity. Moreover, the value of $\phi(s, x)$ is unstable when the modulus $\rho(s, x)$ is close to zero. It is thus necessary to combine the modulus and the phase information to characterize the different singularities, but no effective method has been derived yet.

In computer vision, it is extremely important to detect the edges that appear in images, and many researchers [7], [20], [21], [26], [27] have developed techniques based on multiscale transforms. These multiscale transforms are equivalent to a wavelet transform but were studied before the development of the wavelet formalism.
Let us call a smoothing function any real function $\theta(x)$ such that $\theta(x) = O(1/(1 + x^2))$ and whose integral is nonzero. A smoothing function can be viewed as the impulse response of a low-pass filter. An important example often used in computer vision is the Gaussian function. Let $\theta_s(x) = (1/s)\,\theta(x/s)$. Let $f(x)$ be a real function in $L^2(\mathbb{R})$. Edges at the scale $s$ are defined as local sharp variation points of $f(x)$ smoothed by $\theta_s(x)$. Let us explain how to detect these edges with a wavelet transform. Let $\psi^{1}(x)$ and $\psi^{2}(x)$ be the two wavelets defined by

$$\psi^{1}(x) = \frac{d\theta(x)}{dx} \quad \text{and} \quad \psi^{2}(x) = \frac{d^{2}\theta(x)}{dx^{2}}. \tag{19}$$

The wavelet transforms defined with respect to each of these wavelets are given by

$$W^{1}f(s, x) = f * \psi^{1}_{s}(x) \quad \text{and} \quad W^{2}f(s, x) = f * \psi^{2}_{s}(x). \tag{20}$$

Hence,

$$W^{1}f(s, x) = f * \left( s\, \frac{d\theta_s}{dx} \right)(x) = s\, \frac{d}{dx}(f * \theta_s)(x) \tag{21}$$

and

$$W^{2}f(s, x) = f * \left( s^{2}\, \frac{d^{2}\theta_s}{dx^{2}} \right)(x) = s^{2}\, \frac{d^{2}}{dx^{2}}(f * \theta_s)(x). \tag{22}$$

The wavelet transforms $W^{1}f(s, x)$ and $W^{2}f(s, x)$ are proportional, respectively, to the first and second derivative of $f(x)$ smoothed by $\theta_s(x)$. For a fixed scale $s$, the local extrema of $W^{1}f(s, x)$ along the $x$ variable correspond to the zero-crossings of $W^{2}f(s, x)$ and to the inflection points of $f * \theta_s(x)$ (see Fig. 1). If the wavelet $\psi^{2}(x)$ is continuously differentiable, the wavelet transform $W^{2}f(s, x)$ is a differentiable surface in the scale-space plane. Hence, the zero-crossings of $W^{2}f(s, x)$ define a set of smooth curves that often look like fingerprints [27]. Let us prove that one can define a particular Hardy wavelet such that the phase of the wavelet transform remains constant or changes sign along these fingerprints. Let $\psi^{3}(x)$ be the Hilbert transform of $\psi^{2}(x)$ and $\psi^{4}(x) = \psi^{2}(x) + i\,\psi^{3}(x)$. The wavelet $\psi^{4}(x)$ is a Hardy wavelet. Let $W^{4}f(s, x) = f * \psi^{4}_{s}(x)$. The real part of $W^{4}f(s, x)$ is equal to $W^{2}f(s, x)$. Hence, the phase $\phi(s, x)$

is equal to $\pi/2$ or $-\pi/2$ if and only if $W^{2}f(s, x) = 0$. Since $W^{4}f(s, x)$ is a continuous function, the phase $\phi(s, x)$ cannot jump from $\pi/2$ to $-\pi/2$ along a connected line in the scale space, unless the modulus is equal to 0. If the modulus of $W^{4}f(s, x)$ is equal to 0, the phase is not defined and it can change sign at these points. Similarly to lines of constant phase, the zero-crossing fingerprints indicate the locations of sharp variation points and singularities but do not characterize their Lipschitz regularity. We need more information about the decay of $|W^{2}f(s, x)|$ in the neighborhood of these zero-crossing lines.

Fig. 1. The extrema of $W^{1}f(s, x)$ and the zero-crossings of $W^{2}f(s, x)$ are the inflection points of $f * \theta_s(x)$. Points of abscissa $x_0$ and $x_2$ are sharp variations of $f * \theta_s(x)$ and are local maxima of $|W^{1}f(s, x)|$. The local minimum of $|W^{1}f(s, x)|$ at $x_1$ is also an inflection point but it is a slow variation point.

Detecting the zero-crossings of $W^{2}f(s, x)$ or the local extrema of $W^{1}f(s, x)$ are similar procedures, but the local extrema approach has several important advantages. An inflection point of $f * \theta_s(x)$ can either be a maximum or a minimum of the absolute value of its first derivative. As at the abscissas $x_0$ and $x_2$ of Fig. 1, the local maxima of the absolute value of the first derivative are sharp variation points of $f * \theta_s(x)$, whereas the minima correspond to slow variations (abscissa $x_1$). These two types of inflection points can be distinguished by checking whether an extremum of $|W^{1}f(s, x)|$ is a maximum or a minimum, but they cannot be differentiated from the zero-crossings of $W^{2}f(s, x)$. For edge or singularity detection, we are only interested in the local maxima of $|W^{1}f(s, x)|$. When detecting the local maxima of $|W^{1}f(s, x)|$, we can also keep the value of the wavelet transform at the corresponding location. With the results of Theorems 1 and 2, we prove in the next section that the values of these modulus maxima often characterize the Lipschitz exponents of the signal irregularities.

V. WAVELET TRANSFORM MODULUS MAXIMA

If the wavelet $\psi(x)$ is the first derivative of a smoothing function, it has only one vanishing moment. In general, we do not want to impose only one vanishing moment because, as explained in Section III, in that case we cannot estimate Lipschitz exponents larger than 1. In the following sections, we study the mathematical properties of the wavelet modulus maxima and explain how to measure Lipschitz exponents. We suppose that the wavelet $\psi(x)$ and the function $f(x)$ we analyze are both real.

A. General Properties

Let us precisely define what we mean by local maxima of the wavelet transform modulus.

Definition 3: Let $Wf(s, x)$ be the wavelet transform of a function $f(x)$.

- We call local extremum any point $(s_0, x_0)$ such that $\partial Wf(s_0, x)/\partial x$ has a zero-crossing at $x = x_0$, when $x$ varies.
- We call modulus maximum any point $(s_0, x_0)$ such that $|Wf(s_0, x)| < |Wf(s_0, x_0)|$ when $x$ belongs to either a right or a left neighborhood of $x_0$, and $|Wf(s_0, x)| \le |Wf(s_0, x_0)|$ when $x$ belongs to the other side of the neighborhood of $x_0$.
- We call maxima line any connected curve in the scale space $(s, x)$ along which all points are modulus maxima.

A modulus maximum $(s_0, x_0)$ of the wavelet transform is a strict local maximum of the modulus either on the right or on the left side of $x_0$. The first theorem proves that if the wavelet transform has no modulus maximum at fine scales in a given interval, then the function is uniformly Lipschitz $\alpha$, for $\alpha < n$, in this interval.

Theorem 3: Let $n$ be a strictly positive integer. Let $\psi(x)$ be a wavelet with compact support, $n$ vanishing moments, and $n$ times continuously differentiable. Let $f(x) \in L^{1}([a, b])$.
If there exists a scale $s_0 > 0$ such that for all scales $s < s_0$ and $x \in\, ]a, b[$, $|Wf(s, x)|$ has no local maxima, then for any $\varepsilon > 0$ and $\alpha < n$, $f(x)$ is uniformly Lipschitz $\alpha$ in $]a + \varepsilon, b - \varepsilon[$. If $\psi(x)$ is the $n$th derivative of a smoothing function, then $f(x)$ is uniformly Lipschitz $n$ on any such interval $]a + \varepsilon, b - \varepsilon[$.

The proof of this theorem is in Appendix A. Theorem 3 proves that a function is not singular in any neighborhood where its wavelet transform has no modulus maxima at fine scales. In the following, we suppose that $\psi(x)$ is the $n$th derivative of a smoothing function. Let us define the closure of the wavelet transform maxima as the set of points on the real line that are arbitrarily close to some modulus maxima in the scale-space $(s, x)$. This means that for any point $x_0$ in this closure and for any $\varepsilon > 0$, there exists a wavelet transform modulus maximum at a point $(s_1, x_1)$ that satisfies $|x_1 - x_0| < \varepsilon$ and $s_1 < \varepsilon$.

Corollary 1: The closure of the set of points $x \in \mathbb{R}$ where $f(x)$ is not Lipschitz $n$ is included in the closure of the wavelet transform maxima of $f(x)$.

This corollary is a straightforward implication of Theorem 3. It proves that all singularities of $f(x)$ can be located by following the maxima lines when the scale goes to zero. It is, however, not true that the closure of the points where $f(x)$ is not Lipschitz $n$ is equal to the closure of the wavelet

transform maxima. Equation (32) proves, for example, that if $\psi(x)$ is antisymmetrical then for $f(x) = \sin(x)$, all the points $p\pi$ ($p \in \mathbb{Z}$) belong to the closure of the wavelet modulus maxima, although $\sin(x)$ is infinitely continuously differentiable at these points. It is even possible to create functions that are infinitely continuously differentiable and whose wavelet transforms have an infinite number of maxima lines that converge inside a finite interval $[a, b]$. Let us now study how to use the value of the wavelet transform maxima in order to estimate the Lipschitz regularity of $f(x)$ at the points that belong to the closure of the wavelet transform maxima.

B. Nonoscillating Singularities

In this section, we study the characterization of singularities when locally the function has no oscillations. The next section explains the potential impact of oscillations. We suppose that the wavelet $\psi(x)$ has a compact support, is $n$ times continuously differentiable, and is the $n$th derivative of a smoothing function. The following theorem characterizes a particular class of isolated singularities from the behavior of the wavelet transform modulus maxima.

Theorem 4: Let $f(x)$ be a tempered distribution whose wavelet transform is well defined over $]a, b[$, and let $x_0 \in\, ]a, b[$. We suppose that there exists a scale $s_0 > 0$ and a constant $C$ such that for $x \in\, ]a, b[$ and $s < s_0$, all the modulus maxima of $Wf(s, x)$ belong to a cone defined by

$$|x - x_0| \le C s . \tag{23}$$

Then, at all points $x_1 \in\, ]a, b[$, $x_1 \ne x_0$, $f(x)$ is uniformly Lipschitz $n$ in a neighborhood of $x_1$. Let $\alpha < n$ be a noninteger. The function $f(x)$ is Lipschitz $\alpha$ at $x_0$ if and only if there exists a constant $A$ such that at each modulus maximum $(s, x)$ in the cone defined by (23)

$$|Wf(s, x)| \le A\, s^{\alpha}. \tag{24}$$

The proof of this theorem is in Appendix B. Equation (24) is equivalent to

$$\log |Wf(s, x)| \le \log(A) + \alpha \log(s). \tag{25}$$

If the wavelet transform maxima satisfy the cone distribution imposed by Theorem 4, (25) proves that the Lipschitz regularity at $x_0$ is the maximum slope of straight lines that remain above $\log |Wf(s, x)|$ on a logarithmic scale. The fact that all modulus maxima remain in a cone that points to $x_0$ also implies that $f(x)$ is Lipschitz $n$ at all points $x \in\, ]a, b[$, $x \ne x_0$. Fig. 3 shows the wavelet transform of a function with isolated singularities that verify the cone localization hypothesis. To compute this wavelet transform we used a wavelet with only one vanishing moment. The graphs of $\psi(x)$ and its primitive $\theta(x)$ are shown in Fig. 2. The Fourier transform of $\psi(x)$ is

$$\hat{\psi}(\omega) = i\omega \left( \frac{\sin(\omega/4)}{\omega/4} \right)^{4}. \tag{26}$$

Fig. 2. (a) Graph of a wavelet $\psi(x)$ with compact support and one vanishing moment. It is a quadratic spline. (b) Graph of the primitive $\theta(x)$ with compact support. It is a cubic spline.

This wavelet belongs to a class for which the wavelet transform can be computed with a fast algorithm [29]. In numerical computations, the input function is not known at all abscissa values $x$ but is characterized by a uniform sampling which approximates $f(x)$ at a resolution that depends upon the sampling interval [18]. These samples are generally the result of a low-pass filtering of $f(x)$ followed by a uniform sampling. If we suppose, for normalization purposes, that the resolution is 1, we can only compute the wavelet transform at scales larger than 1. When a function is approximated at a finite resolution, strictly speaking, it is not meaningful to speak about singularities, discontinuities, and Lipschitz exponents. This is illustrated by the fact that we cannot compute the asymptotic decay of the wavelet transform amplitude at scales smaller than 1. In practice, we still want to use the mathematical tools that describe singularities, even though we are limited by the resolution of measurements.
Suppose that the approximation of $f(x)$ at the resolution 1 is given by a set of samples $(f_n)_{n \in \mathbb{Z}}$, with $f_n = 0$ for $n < n_0$ and $f_n = 1$ for $n \ge n_0$, as at the abscissa 0.88 of Fig. 3(a). We would like to say that at the resolution 1, $f(x)$ behaves as if it has a discontinuity at $n = n_0$, although it is possible that $f(x)$ is continuous at $n_0$ but has a sharp transition at that point which is not visible at the resolution 1. The characterization of singularities from the decay of the wavelet transform enables us to give a precise meaning to this discontinuity at the resolution 1. Since we cannot measure the asymptotic decay of the wavelet transform when the scale goes to 0, we measure the decay of the wavelet transform up to the finest scale available. The Lipschitz regularity is computed by finding the coefficient $\alpha$ such that $A s^{\alpha}$ best approximates the decay of $|Wf(s, x)|$ over a given range of scales larger than 1 (see Fig. 3(b)). With this approach, we can use Lipschitz exponents to characterize the irregularities of discrete signals. In Fig. 3(b), the discontinuity appears clearly from the fact that $|Wf(s, x)|$ remains approximately constant over a large range of scales in the neighborhood of the abscissa 0.88. Negative Lipschitz exponents correspond to sharp irregularities where the wavelet transform modulus increases at fine scales. A sequence $(f_n)_{n \in \mathbb{Z}}$, with $f_n = 0$ for $n \ne n_0$ and $f_{n_0} = 1$, can be viewed as the approximation of a Dirac at the resolution 1. At the abscissa 0.44, the signal of Fig. 3(a) has such a discrete Dirac. The wavelet transform maxima increase proportionally to $s^{-1}$, over a large range of

scales in the corresponding neighborhood. In the rest of this paper, we suppose that all numerical experiments are performed on functions approximated at the resolution 1, and we consider that the decay of the wavelet transform at scales larger than 1 characterizes the Lipschitz regularity of the function up to the resolution 1. Fast algorithms to compute the wavelet transform are described in [14], [18].

Fig. 3. (a) In the left neighborhood of the abscissa 0.16, the signal locally behaves like $1 + |x - 0.16|^{0.2}$, whereas in the right neighborhood it behaves like $1 + |x - 0.16|^{0.6}$. At the abscissa 0.44 the signal has a discrete Dirac (Lipschitz regularity equal to $-1$). At 0.7, the Lipschitz regularity is 1.5, and at the abscissa 0.88 the signal is discontinuous. (b) Wavelet transform between the scales 1 and 2' (upper exponent illegible), computed with the wavelet shown in Fig. 2(a). The finer scales are at the top, and the scale varies linearly along the vertical. Black, grey, and white points indicate that the wavelet transform has respectively negative, zero, and positive values. (c) Each black point indicates the position of a modulus maximum in the wavelet transform shown in (b). The singularity of the derivative cannot be detected at the abscissa 0.7 because the wavelet has only one vanishing moment. (d) Modulus maxima of the wavelet transform of the signal (a), computed with a wavelet with two vanishing moments. The number of maxima lines increases. The singularity of the derivative at 0.7 can now be detected from the decay of the wavelet modulus maxima. (e) Decay of $\log_2 |Wf(s, x)|$ as a function of $\log_2(s)$ along the two maxima lines that converge to the point of abscissa 0.16, computed with the wavelet of Fig. 2(a). Two different slopes show that $f(x)$ has a different singular behavior in the left and right neighborhoods of 0.16, and we can measure the two exponents 0.2 and 0.6.
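The finite-resolution procedure described above, finding $\alpha$ such that $A s^{\alpha}$ best fits the decay of $|Wf(s, x)|$ over a range of scales larger than 1, can be sketched as a least-squares line fit in log-log coordinates. This is only an illustration: the function name `estimate_lipschitz` and the synthetic values are assumptions, not taken from the paper, and the paper does not specify which fitting criterion was used.

```python
import math

def estimate_lipschitz(scales, moduli):
    """Least-squares fit of log2|Wf| = log2(A) + alpha * log2(s) along one maxima line.

    Returns (alpha, A). Hypothetical helper: the least-squares criterion is an assumption.
    """
    xs = [math.log2(s) for s in scales]
    ys = [math.log2(m) for m in moduli]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log2_a = mean_y - alpha * mean_x
    return alpha, 2.0 ** log2_a

# Synthetic maxima line (made-up data): |Wf(s, x)| = 3 * s**0.6 at scales 2, 4, ..., 64.
scales = [2.0 ** j for j in range(1, 7)]
moduli = [3.0 * s ** 0.6 for s in scales]
alpha, A = estimate_lipschitz(scales, moduli)

# A discrete Dirac: the modulus grows like 1/s, so the fitted exponent is negative.
alpha_dirac, _ = estimate_lipschitz(scales, [1.0 / s for s in scales])
```

On real data the fitted slope depends on the chosen range of scales, so the range should be reported together with the estimate.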
We shall no longer worry about the issues related to asymptotic measurements and finite resolution. The modulus maxima of the wavelet transform of Fig. 3(b) are shown in Fig. 3(c). The black lines indicate the position of the modulus maxima in the scale-space. Fig. 3(e) gives the value of $\log_2 |Wf(s, x)|$ as a function of $\log_2(s)$ along the two maxima lines that converge to the point of abscissa 0.16. At fine scales, the slopes of these two maxima lines are different and are approximately equal to 0.2 and 0.6. This shows that $f(x)$ behaves like a function Lipschitz 0.2 in its left neighborhood and a function Lipschitz 0.6 in its right neighborhood. The Lipschitz regularity of $f(x)$ at 0.16 is 0.2, which is the smallest slope of the two maxima lines.

At this point one might wonder how to choose the number of vanishing moments to analyze a particular class of signals. If we want to estimate Lipschitz exponents up to a maximum value $n$, we know that we need a wavelet with at least $n$ vanishing moments. In Fig. 3(c), there is one maxima line converging to the abscissa 0.7, along which the decay of $\log |Wf(s, x)|$ is proportional to $\log(s)$. The signal was built from a function whose derivative is singular, but this cannot be detected from the slope of $\log |Wf(s, x)|$ because the wavelet has only one vanishing moment. Fig. 3(d) shows the maxima lines obtained from a wavelet that has two vanishing moments. The decay of the wavelet transform along the two maxima lines that converge to the abscissa 0.7 indicates that $f(x)$ is Lipschitz 1.5 at this location. Using

wavelets with more vanishing moments has the advantage of being able to measure the Lipschitz regularity up to a higher order, but it also increases the number of maxima lines, as can be observed by comparing Fig. 3(c) and 3(d). Let us prove this last observation. A wavelet $\psi^1(x)$ with $n + 1$ vanishing moments is the derivative of a wavelet $\psi^2(x)$ with $n$ vanishing moments. Similarly to (21), we obtain

$$W^1 f(s, x) = s \frac{d}{dx} W^2 f(s, x).$$

The wavelet transform of $f(x)$ defined with respect to $\psi^1(x)$ is proportional to the derivative of the wavelet transform of $f(x)$ defined with respect to $\psi^2(x)$. Hence, the number of local maxima of $|W^1 f(s, x)|$ is always larger than the number of local maxima of $|W^2 f(s, x)|$. The number of maxima at a given scale often increases linearly with the number of moments of the wavelet. In order to minimize the amount of computation, we want to have the minimum number of maxima necessary to detect the interesting irregular behavior of the signal. This means that we must choose a wavelet with as few vanishing moments as possible, but with enough moments to detect the Lipschitz exponents of highest order that we are interested in.

Another related property that influences the number of modulus maxima is the number of oscillations of the wavelet $\psi(x)$. For most types of singularities, the number of maxima lines converging to the singularity depends upon the number of local extrema of the wavelet itself. A Dirac $\delta(x)$ gives a simple verification of this property, since $W\delta(s, x) = (1/s)\psi(x/s)$. A wavelet with $n$ vanishing moments has at least $n + 1$ local extrema. For numerical computations, it is better to choose a wavelet with exactly $n + 1$ local extrema. In image processing, we often want to detect discontinuities and peaks that have Lipschitz exponents smaller than 1. It is, therefore, sufficient to use a wavelet with only one vanishing moment.
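The relation between vanishing moments and local extrema can be checked numerically on a simple family. The sketch below assumes the wavelet is the $n$th derivative of the Gaussian $e^{-x^2/2}$ (which has $n$ vanishing moments) and counts its local extrema on a grid; the helper names and grid parameters are illustrative choices, and this family is not the spline wavelet used in the paper's figures.

```python
import math

def gaussian_derivative(n, x):
    """n-th derivative of exp(-x^2/2), using the Hermite recurrence
    He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)."""
    h_prev, h = 0.0, 1.0  # He_{-1} taken as 0, He_0 = 1
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev
    return (-1) ** n * h * math.exp(-0.5 * x * x)

def count_local_extrema(n, half_width=8.0, step=0.01):
    """Count sign changes of the discrete slope of the n-th Gaussian derivative."""
    m = int(round(half_width / step))
    values = [gaussian_derivative(n, i * step) for i in range(-m, m + 1)]
    count, last_sign = 0, 0
    for a, b in zip(values, values[1:]):
        d = b - a
        if d == 0.0:
            continue  # ignore flat steps
        sign = 1 if d > 0 else -1
        if last_sign != 0 and sign != last_sign:
            count += 1
        last_sign = sign
    return count

# Number of local extrema for wavelets with 1, 2 and 3 vanishing moments.
extrema = {n: count_local_extrema(n) for n in (1, 2, 3)}
```

For this family the count equals $n + 1$, the minimum stated above, which is one reason Gaussian derivatives are a convenient choice when few maxima lines are wanted.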
In signals obtained from turbulent fluids, interesting structures have a Lipschitz exponent between 0 and 2 [4]. We thus need a wavelet with two vanishing moments to analyze turbulent structures.

Let us suppose that the wavelet $\psi(x)$ has a symmetrical support equal to $[-K, K]$. We call cone of influence of $x_0$ in the scale-space plane the set of points $(s, x)$ that satisfy $|x - x_0| \le Ks$. It is the set of points $(s, x)$ for which $Wf(s, x)$ is influenced by the value of $f(x)$ at $x_0$. In order to characterize the regularity of $f(x)$ at a point $x_0$, one might think that it is sufficient to measure the decay of the wavelet transform within the cone of influence of $x_0$. Theorem 2 proves that this is wrong in general and that one must also measure the decay of the wavelet transform below this cone of influence. This is due to oscillations that can create a singularity at $x_0$. The next theorem proves that if we suppose that $f(x)$ has no such oscillations, then the regularity of $f(x)$ at a point $x_0$ is characterized by the behavior of its wavelet transform along any line that belongs to a cone strictly smaller than the cone of influence. Section V-C explains why this property is wrong when $f(x)$ oscillates too much. In the following, we suppose that $\psi(x)$ is a real wavelet which is $n$ times continuously differentiable, has a support equal to $[-K, K]$, and is equal to the $n$th derivative of a function $\theta(x)$. We also impose that $\theta(x)$ is strictly positive on the interval $]-K, K[$.

Theorem 5: Let $x_0 \in \mathbb{R}$ and let $f(x) \in L^2(\mathbb{R})$. We suppose that there exists a neighborhood $]a, b[$ of $x_0$ and a scale $s_0 > 0$ such that the wavelet transform $Wf(s, x)$ has a constant sign for $s < s_0$ and $x \in ]a, b[$. Let us also suppose that there exist a constant $B$ and $\epsilon > 0$ such that for all points $x \in ]a, b[$ and any scale $s$,

$$|Wf(s, x)| \le B s^{\epsilon}. \quad (28)$$

Let $x = X(s)$ be a curve in the scale-space $(s, x)$ such that $|x_0 - X(s)| \le Cs$, with $C < K$.
If there exists a constant $A$ such that for any scale $s < s_0$, the wavelet transform satisfies

$$|Wf(s, X(s))| \le A s^{\gamma}, \quad \text{with } 0 \le \gamma \le n, \quad (29)$$

then $f(x)$ is Lipschitz $\alpha$ at $x_0$, for any $\alpha < \gamma$.

The proof of this theorem is in Appendix C. One can easily prove that the sign constraint over the wavelet transform of $f(x)$ is equivalent to imposing that the $n$th derivative of $f(x)$ is a distribution which has a constant sign in a neighborhood of $x_0$. Theorem 5 proves that the regularity of $f(x)$ is controlled by the behavior of its wavelet transform in the cone of influence, if its $n$th derivative does not have an oscillatory behavior that accelerates in the neighborhood of $x_0$. A similar theorem can be obtained if we suppose that the $n$th derivative of $f(x)$ has a constant sign over a left and a right neighborhood of $x_0$, but changes sign at $x_0$. This means that in the neighborhood of $x_0$, $Wf(s, x)$ has only one zero-crossing at any fixed scale $s$ that is small enough. When $s$ goes to zero, the zero-crossing curve converges to the abscissa $x_0$. In this case, we need to control the decay of the wavelet transform along two lines that remain, respectively, in the left and the right part of the cone of influence of $x_0$.

From Theorem 5, one can compute the Lipschitz regularity of certain types of nonisolated singularities from the behavior of the wavelet transform modulus maxima. We find whether the wavelet transform has a constant sign in a neighborhood of $x_0$ by testing the sign at the locations where its modulus is locally maximum. It is also sufficient to verify (28) along the lines of maxima that converge in the same neighborhood of $x_0$. The Lipschitz regularity of $f(x)$ at $x_0$ is then obtained from the decay of the wavelet transform modulus along any line of maxima that converges towards $x_0$, inside the cone of influence.
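The sign test described above can be sketched on one sampled row of the wavelet transform (one fixed scale). The sketch assumes a modulus maximum is a sample where $|Wf|$ is a local maximum that is strict on at least one side; both function names and the sample rows are hypothetical.

```python
def modulus_maxima(row):
    """Indices where |row[i]| is a local maximum, strict on at least one side.

    Assumed discrete convention; endpoints are never reported.
    """
    mod = [abs(v) for v in row]
    return [
        i for i in range(1, len(mod) - 1)
        if mod[i] >= mod[i - 1] and mod[i] >= mod[i + 1]
        and (mod[i] > mod[i - 1] or mod[i] > mod[i + 1])
    ]

def constant_sign_at_maxima(row, lo, hi):
    """True if the transform has a single sign at all modulus maxima with lo <= index < hi."""
    signs = {row[i] > 0 for i in modulus_maxima(row) if lo <= i < hi and row[i] != 0}
    return len(signs) <= 1

# Made-up rows: the first keeps one sign, the second changes sign between its two maxima.
row_same = [0.0, 0.2, 0.9, 0.3, 0.1, 0.5, 0.2, 0.0]
row_mixed = [0.0, 0.2, 0.9, 0.3, -0.1, -0.5, -0.2, 0.0]
same_ok = constant_sign_at_maxima(row_same, 0, len(row_same))
mixed_ok = constant_sign_at_maxima(row_mixed, 0, len(row_mixed))
```

In practice the test would be repeated at every scale $s < s_0$ before the decay along a maxima line is fitted.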
Let us emphasize again that if, at each scale, the wavelet transform has only one zero-crossing in a neighborhood of $x_0$, Theorem 5 can be extended by measuring the decay of the wavelet transform along two maxima lines that are respectively in the left and the right parts of the cone of influence of $x_0$. This is the case for the singularity located at the abscissa 0.16 in Fig. 3(a). A devil staircase is an interesting example to illustrate the application of Theorem 5 to the detection of nonisolated singularities. The derivative of a devil staircase is a Cantor

measure. For the devil staircase shown in Fig. 5(a), the Cantor measure is built recursively as follows. For $p = 0$, the support of the measure $\mu_0$ is the interval $[0, 1]$, and it has a uniform density equal to 1 on $[0, 1]$. The measure $\mu_p$ is defined by subdividing each domain where $\mu_{p-1}$ has a uniform density equal to a constant $c > 0$ into three domains whose respective relative sizes are 1/5, 2/5, and 2/5. The density of the measure $\mu_p$ is equal to 0 in the central part, to $c/3$ in the first part, and to $2c/3$ in the last part (see Fig. 4). One can verify that $\int \mu_p(dx) = 1$. The limit measure $\mu_\infty$ obtained with this iterative process is a Cantor measure. The devil staircase is defined by

$$f(x) = \int_0^x \mu_\infty(dx).$$

Fig. 4. Recursive operation for building a multifractal Cantor measure. The Cantor measure is obtained at the limit of this iterative procedure.

Fig. 5(a) shows the graph of a devil staircase, and Fig. 5(b) its wavelet transform computed with the wavelet of Fig. 2(a). For a devil staircase, we can prove that the maxima lines converge exactly to the points where the function $f(x)$ is singular. There is no maxima line that converges to a point where the function is not singular.

Proof: By definition, the set of points where the maxima lines converge is the closure of the wavelet transform maxima, and Corollary 1 proves that it includes the closure of the points where $f(x)$ is singular. For a devil staircase, the set of points where $f(x)$ is singular is equal to the support of the Cantor measure, which is a closed set. It is thus equal to its closure. For any point $x_0$ outside this closed set, we can find a neighborhood $]x_0 - \epsilon, x_0 + \epsilon[$ that does not intersect the support of $\mu_\infty$. On this open interval, $f(x)$ is constant, so for $s$ small enough and $x \in ]x_0 - \epsilon/2, x_0 + \epsilon/2[$, $Wf(s, x)$ is equal to zero. The point $x_0$, therefore, cannot belong to the closure of the wavelet transform maxima.
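The recursive construction can be sketched in a few lines of Python. One assumption is made explicit here: for the total mass to remain 1 at every step, the weights 1/3 and 2/3 are interpreted as the fractions of the mass of the parent interval given to the first and last parts (of relative sizes 1/5 and 2/5). The function name `staircase` and the use of exact fractions are illustrative choices, not the paper's.

```python
from fractions import Fraction

def staircase(x, depth):
    """F_p(x) = mu_p([0, x]) after `depth` subdivisions, for x in [0, 1].

    Assumption: the first part (relative size 1/5) carries 1/3 of the mass,
    the central part (2/5) carries none, the last part (2/5) carries 2/3.
    """
    if depth == 0:
        return x  # mu_0 is uniform on [0, 1]
    if x < Fraction(1, 5):
        return Fraction(1, 3) * staircase(5 * x, depth - 1)
    if x <= Fraction(3, 5):
        return Fraction(1, 3)  # flat step over the central gap
    rescaled = (x - Fraction(3, 5)) * Fraction(5, 2)
    return Fraction(1, 3) + Fraction(2, 3) * staircase(rescaled, depth - 1)

# Sample the approximation F_8 on a regular grid with exact rational arithmetic.
depth = 8
values = [staircase(Fraction(k, 100), depth) for k in range(101)]

# At x = 5**-3 the staircase equals 3**-3, i.e. a local exponent log(3)/log(5) at 0.
value_near_zero = staircase(Fraction(1, 125), depth)
```

Passing `Fraction` inputs keeps the arithmetic exact, which makes the flat steps and the self-similar scaling easy to verify; floats also work at the cost of rounding.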
This proves that the closure of the wavelet transform maxima is included in the singular support of $f(x)$. Since the opposite is also true, it implies that both sets are equal.

For the particular devil staircase that we defined, the Lipschitz regularity of each singular point depends upon the location of the point. One can prove [4] that at all locations, the Lipschitz regularity $\alpha$ satisfies

$$\frac{\log(2/3)}{\log(2/5)} \le \alpha \le \frac{\log(1/3)}{\log(1/5)}.$$

Fig. 5. (a) Devil staircase. (b) Wavelet transform of the devil staircase computed with the wavelet of Fig. 2(a). Black and white points indicate respectively that the wavelet transform is zero or strictly positive. (c) Modulus maxima of the wavelet transform shown in (b).

Hence, (28) of Theorem 5 is verified for $\epsilon < \log(2/3)/\log(2/5)$. Since a devil staircase is monotonically increasing and our wavelet is the derivative of a positive function, the wavelet transform remains positive. Theorem 5 proves that the local Lipschitz regularity of $f(x)$ at any singular point can be estimated from the decay of the wavelet transform along the maxima line that converges to that point. Fig. 5(c) shows the position of the maxima lines in the scale-space. The renormalization properties of the Cantor set appear as renormalization properties of the graph of maxima lines. Arneodo, Bacry, and Muzy [2], [24] have shown that one can precisely compute the singularity spectrum $f(\alpha)$ of multifractal signals from the evolution across scales of the wavelet transform modulus maxima. These results are particularly interesting for studying irregular physical phenomena such as turbulence [2], [24].

C. Singularities with Fast Oscillations

If the function $f(x)$ is oscillating quickly in the neighborhood of $x_0$, then one cannot characterize the Lipschitz regularity of $f(x)$ from the behavior of its wavelet transform in the cone of influence of $x_0$. We say that a function $f(x)$ has fast oscillations at $x_0$ if and only if there exists $\alpha > 0$ such that $f(x)$ is not Lipschitz $\alpha$
at $x_0$ although its primitive is Lipschitz $\alpha + 1$ at $x_0$. This situation occurs when $f(x)$ is a function which oscillates very quickly, and whose singular behavior at $x_0$ is dominated by these oscillations. The

primitive of $f(x)$ is computed through an integral that averages $f(x)$, so the oscillations are attenuated and the Lipschitz exponent at $x_0$ increases by more than 1. Singularities with such an oscillatory behavior have been thoroughly studied in mathematics [30]. A classical example is the function $f(x) = \sin(1/x)$ in the neighborhood of $x = 0$. This function is not continuous at 0 but is bounded in the neighborhood of 0, so that its Lipschitz regularity is equal to 0 at $x = 0$. Let $g(x)$ be a primitive of $\sin(1/x)$. One can easily prove that $|g(x) - g(0)| = O(x^2)$ in the neighborhood of $x = 0$, so $g(x)$ is Lipschitz 2 at $x = 0$. [...] Similarly to (21), we can prove that

$$W^1 g(s, x) = g * \psi^1_s(x) = s (f * \psi_s)(x) = s\, Wf(s, x).$$

We thus derive that [...] (31)

This equation proves that although $f(x)$ is not Lipschitz $\alpha$, in the cone of influence of $x_0$ we have $|Wf(s, x)| = O(s^{\alpha})$. The fact that $f(x)$ is not Lipschitz $\alpha$ cannot be detected from the decay of $|Wf(s, x)|$ inside the cone of influence of $x_0$, but only by looking at its decay below the cone of influence, as a function of $|x - x_0|$. For example, for $(s, x)$ below the cone of influence of $x_0$, $f(x)$ might not satisfy the necessary condition $|Wf(s, x)| = O(|x - x_0|^{\alpha})$. When a function has fast oscillations, its worst singular behavior at a point $x_0$ is observed below the cone of influence of $x_0$ in the scale-space plane.

Let us study in more detail the case of $f(x) = \sin(1/x)$. Since the primitive is Lipschitz 2, we can take $\alpha = 1$. Equation (31) implies that in the cone of influence of 0, the wavelet transform satisfies $|Wf(s, x)| = O(s)$. Fig. 6(b) shows the wavelet transform of $\sin(1/x)$. It has a high amplitude along a curve in the scale-space $(s, x)$ which reaches $(0, 0)$ below the cone of influence of 0. It is along this path in the scale-space that the singular part of $f(x)$ reaches 0. Let us interpret this curve and show that it is a parabola.
Through this analysis we derive a procedure to estimate locally the size of the oscillations of $f(x)$. The function $f(x) = \sin(1/x)$ can be written $f(x) = \sin(\omega_x x)$, where $\omega_x = 1/x^2$ can be viewed as an "instantaneous" frequency. Let us compute the wavelet transform of a sinusoidal wave of constant frequency $\omega_0$. If we suppose that the wavelet $\psi(x)$ is antisymmetrical, as is the case in our numerical computations, from (3) we derive that the wavelet transform of $h(x) = \sin(\omega_0 x)$ satisfies [...] (32)

For a symmetrical wavelet, the cosine is replaced by a sine in the right-hand side of this equation. For a fixed abscissa $x$, the decay of $|Wh(s, x)|$ as a function of $s$ is proportional to the decay of $|\hat{\psi}(s\omega_0)|$. If $|\hat{\psi}(\omega)|$ reaches its maximum at $\omega = \omega_m$, then for $x$ fixed, $|Wh(s, x)|$ is maximum at $s_0 = \omega_m/\omega_0$. The scale where $|Wh(s, x)|$ is maximum is inversely proportional to the frequency of the sinusoidal wave. The value of $Wh(s, x)$ depends upon the values of $h(x)$ in a neighborhood of size proportional to the scale $s$, so the frequency measurement is local.

Fig. 6. (a) Graph of $\sin(1/x)$. (b) Wavelet transform of $\sin(1/x)$. The amplitude is maximum along a parabola that converges to $(0, 0)$ in the scale-space. (c) Modulus maxima of the wavelet transform. (d) Maxima lines are displayed from the scale where the largest general maximum is located. The extremity of each maxima line indicates the position of a general maximum point, and it belongs to a parabola in the scale-space $(s, x)$.

Since $f(x) = \sin(1/x)$ has an instantaneous frequency $\omega_x = 1/x^2$, for a fixed abscissa $x$, $|Wf(s, x)|$ is globally maximum for

$s = \omega_m/\omega_x = \omega_m x^2$. This is why we see in Fig. 6(b) that the wavelet transform has a maximum amplitude along a parabola that converges to the abscissa 0 in the scale-space. This instantaneous frequency measurement is based on an idea that has been developed previously by Escudie and Torresani [10] for measuring the modulation law of asymptotic signals. The results of Escudie and Torresani have also been refined by Delprat et al. [9], who explain how to precisely extract the amplitude and frequency modulation laws by using Hardy wavelets.

Let us now study the behavior of the wavelet transform maxima. The inflection points of $f(x)$ are located at $x = 1/(n\pi)$, for $n \in \mathbb{Z}$, and $f(x)$ is continuously differentiable at these points. Since the wavelet $\psi(x)$ has only one vanishing moment, all the maxima lines converge toward the points $x = 1/(n\pi)$, and the wavelet transform along these maxima lines satisfies [...]

The derivative of $f(x)$ at $1/(n\pi)$ is equal to $(-1)^{n+1} n^2 \pi^2$, so one can derive that $A_n = O(n^2)$. It is interesting to observe that along all maxima lines in the neighborhood of 0, the wavelet transform decays proportionally to the scale $s$, although $f(x)$ is discontinuous at 0. This singularity at 0 can, however, be detected because the constants $A_n$ grow to $+\infty$ when we get closer to 0. Fig. 6(c) displays the modulus maxima of the wavelet transform of $\sin(1/x)$. In the neighborhood of 0, at fine scales, the maxima lines have a different geometry in the scale-space $(s, x)$, due to the aliasing when sampling $\sin(1/x)$ for numerical computations. Let us introduce the general maxima points and explain how they are related to the size of the oscillations of $f(x)$.

Definition 4: We call general maximum of $Wf(s, x)$ any point $(s_0, x_0)$ where $|Wf(s, x)|$ has a strict local maximum within a two-dimensional neighborhood in the scale-space plane $(s, x)$.
A general maximum is also a modulus maximum of the wavelet transform, as defined by Definition 3, and thus belongs to a modulus maxima line. General maxima are points where $|Wf(s, x)|$ reaches local maxima when the variables $(s, x)$ vary along a maxima line. Equation (32) proves that the maxima lines of the wavelet transform of $\sin(\omega_0 x)$ are vertical lines in the scale-space plane $(s, x)$ given by $x = n\pi/\omega_0$, for $n \in \mathbb{Z}$. If for $\omega > 0$, $|\hat{\psi}(\omega)|$ has one global maximum at $\omega_m$ and no other local maxima, then (32) implies that there is only one general maximum along each maxima line, and that it appears at the scale $s_0 = \omega_m/\omega_0$. A wavelet $\psi(x)$ equal to the $n$th derivative of a Gaussian has such a property. If for $\omega > 0$, $|\hat{\psi}(\omega)|$ has several local maxima, there are several general maxima along each maxima line, but the one where $|Wf(s, x)|$ has the highest value is at the scale $s_0 = \omega_m/\omega_0$. One can thus recover the frequency $\omega_0$ from the location of this general maximum.

Fig. 6(d) displays the subpart of each maxima line that is below the general maximum of largest amplitude. In the scale-space, these general maxima belong to a parabola whose equation is approximately given by $s = \omega_m/\omega_x = \omega_m x^2$. This equation is only an approximation because the frequency $\omega_x$ is not constant. A finer analysis of this type of property can be found in the work of Delprat et al. [9]. If $f(x)$ is locally equal to the sum of several sinusoidal waves whose frequencies are well apart, so that they can be discriminated by $\hat{\psi}(s\omega)$ when $s$ varies (see (32)), then we can measure the frequency of each of these sinusoidal waves from the scales of the general maxima that they produce. The efficiency of this method depends on how concentrated the support of $\hat{\psi}(\omega)$ is. Here, we are limited by the uncertainty principle, which imposes that $\psi(x)$ cannot have its energy well concentrated both in the spatial and frequency domains.
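The rule $s_0 = \omega_m/\omega_0$ can be checked numerically. The sketch below assumes the wavelet is the first derivative of the Gaussian $e^{-x^2/2}$, for which $|\hat{\psi}(\omega)|$ is proportional to $|\omega| e^{-\omega^2/2}$ and therefore peaks at $\omega_m = 1$, and it assumes the normalization $\psi_s(x) = (1/s)\psi(x/s)$. The function names, the quadrature step, and the scale grid are illustrative choices, not the paper's implementation.

```python
import math

def psi(v):
    """First derivative of exp(-v^2/2): antisymmetric, one vanishing moment."""
    return -v * math.exp(-0.5 * v * v)

def wavelet_transform_sine(s, x, omega0, half_width=8.0, step=0.01):
    """Riemann-sum approximation of Wh(s, x) = integral of sin(omega0 (x - s v)) psi(v) dv,
    i.e. the convolution of h(x) = sin(omega0 x) with psi_s(x) = psi(x / s) / s."""
    m = int(round(half_width / step))
    total = 0.0
    for i in range(-m, m + 1):
        v = i * step
        total += math.sin(omega0 * (x - s * v)) * psi(v)
    return total * step

omega0 = 4.0
scales = [0.05 + 0.01 * k for k in range(96)]  # scales from 0.05 to 1.00
# x = 0 lies on a maxima line (x = n * pi / omega0) for this antisymmetric wavelet.
moduli = [abs(wavelet_transform_sine(s, 0.0, omega0)) for s in scales]
s_best = scales[moduli.index(max(moduli))]
```

With $\omega_0 = 4$ the largest modulus is found near $s = 0.25$, so the frequency can be read back as $\omega_m / s$ up to the resolution of the scale grid.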
To distinguish spectral lines that are too close, it is necessary to use more sophisticated methods, as described by Delprat et al. [9].

Let us now give a spatial domain interpretation of this frequency measurement. We show that if the wavelet $\psi(x)$ has only one vanishing moment, the general maxima points provide a measurement of the local oscillations, even if the function is not locally similar to a sinusoidal wave. If $\psi(x)$ is the derivative of a smoothing function $\theta(x)$, (21) proves that

$$Wf(s, x) = s \frac{d}{dx}(f * \theta_s)(x),$$

hence

$$Wf(s, x) = s \left(\frac{df}{dx} * \theta_s\right)(x).$$

If locally $f(x)$ has a simple oscillation like in Fig. 7, $df(x)/dx$ has a constant sign between the two top points $x_1$ and $x_2$ of the oscillation. The point $(s_0, x_0)$ is a general maximum if the support of $\theta((x_0 - x)/s_0)$ covers as much as possible the positive part of $df(x)/dx$, without paying the cost of covering a domain where $df(x)/dx$ is too negative. Hence, the distance between the two top points of the oscillation is of the order of the size of the support of $\theta(x)$, multiplied by the scale $s_0$:

$$x_2 - x_1 \approx K s_0. \quad (35)$$

This spatial domain interpretation shows that even if the function is not locally similar to a sinusoidal wave, the size of the oscillation is approximately proportional to the scale $s_0$ of the general maximum point that is created. If the wavelet $\psi(x)$ has more than one vanishing moment, this spatial interpretation is not valid.

With (31), we saw that if a function $f(x)$ has fast oscillations in the neighborhood of $x_0$, then the regularity at $x_0$ depends on the behavior of $Wf(s, x)$ below the cone of influence of $x_0$. To estimate this behavior, one approach is to measure the decay of $|Wf(s_r, x_r)|$ at the general maxima points $(s_r, x_r)$ that are below the cone of influence of $x_0$, when $x_r$ converges to $x_0$. Indeed, these general maxima points characterize the size of the oscillations of $f(x)$ and give an upper bound for the value of the wavelet transform along each maxima line.
Theorem 2 proves that $f(x)$ is Lipschitz $\alpha$ at $x_0$ only if $|Wf(s, x)| = O(|x - x_0|^{\alpha})$ below the cone of influence. Hence, $f(x)$ can be Lipschitz $\alpha$ at a point $x_0$ only if the general maxima points $(s_r, x_r)$ below the cone of influence of $x_0$ satisfy

$$|Wf(s_r, x_r)| = O(|x_r - x_0|^{\alpha}). \quad (36)$$

This necessary condition gives an upper bound on the Lipschitz exponents at $x_0$. For $f(x) = \sin(1/x)$, (36) is satisfied only for $\alpha = 0$. We thus detect the discontinuity at $x = 0$ from the values of the general maxima points. In most situations, the general maxima points must be used in conjunction with the modulus maxima lines in order to estimate the decay of $|Wf(s, x)|$ inside and below the cone of influence of $x_0$.

Fig. 7. We suppose that the wavelet is the first derivative of a smoothing function $\theta(x)$. The point $(s_0, x_0)$ is a general maximum of $Wf(s, x)$ if $\theta_{s_0}(x - x_0)$ covers as much as possible the domain where $f(x)$ has a derivative of constant sign.

VI. COMPLETENESS OF THE WAVELET MODULUS MAXIMA

We proved that the singularities of a function can be detected from the wavelet transform modulus maxima. One might wonder how much information is carried by the positions and the values of the wavelet transform modulus maxima. The reconstruction of a function from the local maxima of its wavelet transform modulus has been studied numerically by Zhong and one of us [18]. Modulus maxima are detected only along a dyadic sequence of scales $(2^j)_{j \in \mathbb{Z}}$ to obtain efficient numerical implementations. At each scale $2^j$, we record the position of the local maxima of $|Wf(2^j, x)|$ and the value of $Wf(2^j, x)$ at the corresponding location. The reconstruction algorithm recovers an approximation of the original signal, with a signal to error ratio of the order of 40 dB. The approximation error is mostly concentrated at high frequencies. More recently, Meyer [23] proved that the wavelet transform modulus maxima do not provide a complete signal representation. He constructed different functions whose wavelet transforms have the same local extrema at all scales. These functions mostly differ at high frequencies, and their relative $L^2(\mathbb{R})$ distance is of the same order as the precision of the numerical reconstruction algorithm. This seems to indicate that the wavelet transform modulus maxima provide a complete characterization of functions, modulo a small high-frequency error that remains to be identified mathematically. This section reviews briefly the properties of a dyadic wavelet transform as well as the algorithm based on modulus maxima. Section VII describes an application to the suppression of white noise with a local estimation of Lipschitz exponents.

We call dyadic wavelet transform the sequence of functions of the variable $x$

$$(Wf(2^j, x))_{j \in \mathbb{Z}}. \quad (37)$$

Equation (3) implies that the Fourier transform of $Wf(2^j, x)$ is given by

$$\hat{W}f(2^j, \omega) = \hat{\psi}(2^j \omega) \hat{f}(\omega). \quad (38)$$

The function $f(x)$ can be reconstructed from its wavelet transform, and the reconstruction is stable [8], [18], if and only if there exist two constants $A > 0$ and $B > 0$ such that

$$A \le \sum_{j=-\infty}^{+\infty} |\hat{\psi}(2^j \omega)|^2 \le B. \quad (39)$$

One can then prove that there exists a (nonunique) reconstructing wavelet $\chi(x)$ whose Fourier transform satisfies

$$\sum_{j=-\infty}^{+\infty} \hat{\psi}(2^j \omega) \hat{\chi}(2^j \omega) = 1.$$

The function $f(x)$ is recovered from its dyadic wavelet transform with

$$f(x) = \sum_{j=-\infty}^{+\infty} Wf(2^j, \cdot) * \chi_{2^j}(x).$$

Similarly to the continuous wavelet transform, the dyadic wavelet transform is overcomplete. This means that an arbitrary sequence $(g_j(x))_{j \in \mathbb{Z}}$ is not a priori the dyadic wavelet transform of some function $f(x) \in L^2(\mathbb{R})$. One can prove [18] that such a sequence is the dyadic wavelet transform of some function in $L^2(\mathbb{R})$ if and only if it satisfies the reproducing kernel equations

$$\forall j \in \mathbb{Z}, \quad \sum_{l=-\infty}^{+\infty} g_l * K_{l,j}(x) = g_j(x), \quad (40)$$

with [...]

If the wavelet satisfies (39), Theorems 1 and 2 remain valid if we restrict the scale to the sequence $(2^j)_{j \in \mathbb{Z}}$ [16]. Hence, the Lipschitz regularity of a function is characterized by the decay across dyadic scales of the wavelet transform modulus. The results and theorems of Section V are also valid if we restrict the scale parameter $s$ to the values $(2^j)_{j \in \mathbb{Z}}$.

Fig. 8(b) is the dyadic wavelet transform of the signal in Fig. 8(a), computed with the wavelet shown in Fig. 2(a). The finest scale is limited by the resolution of the original discrete signal. We also stop the decomposition at a finite largest scale $2^J$. One can prove [18] that the information at scales larger than $2^J$, $(Wf(2^j, x))_{j > J}$, can be aggregated in a single low-frequency function $Sf(2^J, x)$. The function $Sf(2^J, x)$ is obtained by convolving $f(x)$ with an appropriate smoothing function, dilated by $2^J$ [18]. In Fig. 8(b), the largest scale $2^J$ is $2^6$. Fig. 8(c) displays the

modulus maxima of the wavelet transform. Each Dirac indicates the position of a modulus maximum and the value of $Wf(2^j, x)$ at the corresponding location. Since the wavelet is the first derivative of a smoothing function, the wavelet transform maxima are located where the signal has sharp transitions. They provide an adaptive description of the signal information. The more irregularities in the signal, the more wavelet maxima. We also keep the remaining low-frequency information $Sf(2^6, x)$ with the maxima information.

Fig. 8. (a) Original signal. (b) Wavelet transform computed up to the scale $2^6$. The bottom graph gives the remaining low frequencies, at scales larger than $2^6$. (c) At each scale, each Dirac indicates the position of a modulus maximum and the value of the wavelet transform at the corresponding location. (d) Signal reconstructed from the wavelet transform modulus maxima shown in (c) plus the low-frequency signal $Sf(2^6, x)$.

Let us explain the algorithm introduced by Zhong and one of us [18] that reconstructs the approximation of a function from the modulus maxima of its wavelet transform. Let $f(x) \in L^2(\mathbb{R})$ and $(Wf(2^j, x))_{j \in \mathbb{Z}}$ be its dyadic wavelet transform. We want to find the set of functions $h(x)$ such that at each scale $2^j$, the modulus maxima of $Wh(2^j, x)$ are the same as the modulus maxima of $Wf(2^j, x)$. We suppose that the wavelet $\psi(x)$ is differentiable in the sense of Sobolev. Since $Wf(2^j, x)$ is obtained through a convolution with $\psi_{2^j}(x)$, it is also differentiable in the sense of Sobolev and it has at most a countable number of modulus maxima. Let $(x_{j,n})_{n \in \mathbb{Z}}$ be the abscissa where $|Wf(2^j, x)|$ is locally maximum. The maxima constraints on $Wh(2^j, x)$ can be decomposed in two conditions.

a) At each scale $2^j$, the local maxima of $|Wh(2^j, x)|$ are located at the abscissa $(x_{j,n})_{n \in \mathbb{Z}}$.

b) At each scale $2^j$ and any abscissa $x_{j,n}$, $Wh(2^j, x_{j,n}) = Wf(2^j, x_{j,n})$.

The condition a) is difficult to use as such because it is not convex. We replace this condition by the minimization of the following Sobolev norm:

$$|(Wh(2^j, x))_{j \in \mathbb{Z}}|^2 = \sum_{j=-\infty}^{+\infty} \left( \|Wh(2^j, x)\|^2 + 2^{2j} \left\| \frac{dWh(2^j, x)}{dx} \right\|^2 \right). \quad (41)$$

Let us explain why this minimization has a similar effect. At each scale $2^j$, we try to decrease the $L^2(\mathbb{R})$ norm $\|Wh(2^j, x)\|$, so that $Wh(2^j, x)$ is as small as possible. In conjunction with condition b), this constraint has a tendency to create local maxima at the positions $x_{j,n}$. By minimizing $\|dWh(2^j, x)/dx\|$, we also impose that $Wh(2^j, x)$ has as few oscillations as possible, and thus as few spurious local maxima as possible, outside the abscissa $x_{j,n}$. The weight on the derivative component is proportional to the scale because the smoothness of $Wh(2^j, x)$ increases with the scale $2^j$. For a large class of wavelets defined in [18], one can prove that for any $h(x) \in L^2(\mathbb{R})$, $|(Wh(2^j, x))_{j \in \mathbb{Z}}|$ is finite.

Let us now interpret the condition b). Let $K$ be the space of all sequences of functions $(g_j(x))_{j \in \mathbb{Z}}$ such that $|(g_j(x))_{j \in \mathbb{Z}}| < +\infty$. The norm $|\cdot|$ defines a Hilbert structure over $K$. The space $V$ of all dyadic wavelet transforms of functions in $L^2(\mathbb{R})$ is included in $K$, since (41) has a finite value. Let us define the set $\Gamma$ of all sequences of functions $(g_j(x))_{j \in \mathbb{Z}} \in K$ such that for any index $j$ and all maxima positions $x_{j,n}$,

$$g_j(x_{j,n}) = Wf(2^j, x_{j,n}).$$

The set $\Gamma$ is an affine space which is closed in $K$. The dyadic wavelet transforms that satisfy the condition b) are the sequences of functions that belong to $\Lambda = \Gamma \cap V$. The reconstruction problem thus consists in minimizing the norm $|\cdot|$ over the closed affine space $\Lambda$. This minimization has a unique solution that we compute with alternate projections. Let $P_\Gamma$ be the projection on the convex set $\Gamma$, which is orthogonal with respect to the norm $|\cdot|$. This operator is characterized in [18]. Let $P_V$ be the projection on the Hilbert space $V$, which is orthogonal with respect to $|\cdot|$. One can prove [18] that this operator is defined by the reproducing kernel equations (40).
Let $P = P_V \circ P_\Gamma$ be the alternate projections on both spaces. Let $P^{(n)}$ be the composition $n$ times of the operator $P$. Since $\Gamma$ is an affine space and $V$ a Hilbert space, a classical result on alternate projections [28] proves that for any sequence of functions $(g_j(x))_{j \in Z} \in K$, $\lim_{n \to +\infty} P^{(n)}(g_j(x))_{j \in Z}$ converges strongly to the orthogonal projection of $(g_j(x))_{j \in Z}$ onto $\Lambda$. Hence, if $g_j(x) = 0$ for all $j \in Z$, the alternate projections converge to the element of $\Lambda$ whose norm $|\cdot|$ is minimum. The algorithm is illustrated by Fig. 9.

When the signal has sharp isolated singularities such as step edges, the solution of the minimization problem can generate a wavelet transform with spurious oscillations, similar to Gibbs phenomena. To avoid these oscillations, we impose some further convex constraints on the sign of the reconstructed wavelet transform and on the sign of its derivative, between two consecutive modulus maxima. Numerical experiments [18] show that the convergence is not affected by these additional constraints. The theoretical and numerical stability of this reconstruction algorithm is further studied in [18], and an efficient discrete implementation is described. If the discrete signal has a total of $N$ samples, the computational complexity of the projections on $V$ and $\Gamma$ is $O(N \log(N))$.

In general, the algorithm does not reconstruct the original wavelet transform, but it converges to a close element in $\Lambda$ (see Fig. 9). We reconstruct the corresponding signal by applying the inverse wavelet transform operator. In all numerical experiments, we found that the signal-to-noise ratio (SNR) of this reconstruction is larger than 30 dB after 20 iterations of the operator $P$ [18]. Fig. 8(d) is an example of a signal reconstructed with 20 iterations. The differences with the original function are not visible on the graph.
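The iteration of $P = P_V \circ P_\Gamma$ from the zero element can be illustrated on a small finite-dimensional problem. The sketch below is only an analogy: it assumes the Euclidean norm on $R^3$ instead of the Sobolev norm (41), a subspace given by a hypothetical orthonormal `basis` in place of the space $V$ of dyadic wavelet transforms, and point constraints `idx`, `values` in place of condition b). The function name and its parameters are illustrative, not taken from [18].

```python
import numpy as np

def alternate_projections(basis, idx, values, n_iter=50):
    """Iterate P = P_V o P_Gamma starting from the zero element.

    basis  : (n, k) array with orthonormal columns spanning the subspace V
    idx    : positions where a value is imposed (stand-in for maxima positions)
    values : imposed values at those positions (stand-in for condition b)
    """
    g = np.zeros(basis.shape[0])
    for _ in range(n_iter):
        g = g.copy()
        g[idx] = values              # P_Gamma: nearest point satisfying the constraints
        g = basis @ (basis.T @ g)    # P_V: orthogonal projection onto V
    return g

# Toy setting: V = span{(1,1,0)/sqrt(2), (0,0,1)}, Gamma = {g : g[0] = 1}.
basis = np.array([[1 / np.sqrt(2), 0.0],
                  [1 / np.sqrt(2), 0.0],
                  [0.0, 1.0]])
idx, values = np.array([0]), np.array([1.0])
g = alternate_projections(basis, idx, values)

# Direct minimum-norm element of Gamma ∩ V, for comparison.
g_direct = basis @ (np.linalg.pinv(basis[idx]) @ values)
print(g, g_direct)
```

In this toy case the iterates approach the minimum-norm element $(1, 1, 0)$ of the intersection, with the error halving at each iteration. The convergence rate of the actual algorithm depends on the geometry of $\Gamma$ and $V$ and is not modeled here.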
If we increase the number of iterations, the reconstruction error decreases, but the SNR reaches a limit of the order of 40 dB because we do not converge to the original wavelet transform. There is no mathematical proof that, for any signal, this algorithm reconstructs a wavelet transform that is a good approximation of the original wavelet transform. However, extensive numerical experiments show that the precision of the reconstructions is sufficient for a large class of signal processing applications. The next section describes an application to denoising.

Fig. 9. An approximation of the wavelet transform of $f(x)$ is reconstructed by alternating orthogonal projections on an affine space $\Gamma$ and on the space $V$ of all dyadic wavelet transforms. Projections begin from the zero element and converge to its orthogonal projection on $\Gamma \cap V$.

VII. SIGNAL DENOISING BASED ON WAVELET MAXIMA IN ONE DIMENSION

The properties of a signal can be modified by processing its wavelet transform maxima and by reconstructing the corresponding function. We describe an application to denoising based on a local analysis of the signal and the noise singularities.

Let us first describe the properties of the wavelet transform of a white noise. Let $n(x)$ be a real, wide-sense stationary white noise of variance $\sigma^2$, and $Wn(s, x)$ be its wavelet transform. We denote by $E(X)$ the expected value of a random variable $X$. We suppose that the wavelet $\psi(x)$ is real. Grossmann et al. [12] have shown that the decay of $E(|Wn(s, x)|^2)$ is proportional to $1/s$. Indeed,

$$ |Wn(s, x)|^2 = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} n(u)\, n(v)\, \psi_s(x - u)\, \psi_s(x - v)\, du\, dv. $$

Since $n(x)$ is a white noise, $E(n(u) n(v)) = \sigma^2 \delta(u - v)$, hence

$$ E(|Wn(s, x)|^2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \sigma^2 \delta(u - v)\, \psi_s(x - u)\, \psi_s(x - v)\, du\, dv. $$

We thus derive that

$$ E(|Wn(s, x)|^2) = \frac{\|\psi\|^2 \sigma^2}{s}. \tag{42} $$

At a given scale $s$, the wavelet transform $Wn(s, x)$ is a random process in $x$. If we suppose that the white noise $n(x)$ is a Gaussian white noise, then $Wn(s, x)$ is also a Gaussian process.
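The $1/s$ decay in (42) can be checked numerically by integrating $\psi_s^2$, since $E(|Wn(s, x)|^2) = \sigma^2 \|\psi_s\|^2$. The following sketch assumes, purely for illustration, that $\psi$ is the derivative of a unit-variance Gaussian and that $\psi_s(x) = (1/s)\psi(x/s)$; the function names and the integration grid are hypothetical choices, not part of the paper.

```python
import numpy as np

def psi(x):
    """Assumed wavelet: first derivative of a unit-variance Gaussian."""
    return -x * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def noise_energy(s, sigma=1.0, half_width=200.0, n=400001):
    """sigma^2 * integral of psi_s(u)^2 du, with psi_s(u) = psi(u / s) / s."""
    u = np.linspace(-half_width, half_width, n)
    du = u[1] - u[0]
    psi_s = psi(u / s) / s
    return sigma**2 * np.sum(psi_s**2) * du

scales = 2.0 ** np.arange(1, 5)                       # 2^1 .. 2^4
energies = np.array([noise_energy(s) for s in scales])
# s * E(|Wn(s, x)|^2) should stay constant and equal ||psi||^2 * sigma^2.
print(scales * energies)
```

For this particular wavelet $\|\psi\|^2 = 1/(4\sqrt{\pi})$, so the printed products should all be close to that constant, which means the expected energy halves each time the scale doubles.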
Since $Wn(s, x)$ is then a Gaussian process, we prove in Appendix D that, at a scale $s$, the average density of wavelet transform modulus maxima is given by (43), an expression in $\psi(x)$ and its first and second derivatives $\psi^{(1)}(x)$ and $\psi^{(2)}(x)$. The average density of modulus maxima is thus inversely proportional to the scale $s$.

The realization of a white noise is a distribution which is almost everywhere singular. One can prove that such a realization is a distribution which is uniformly Lipschitz $-\frac{1}{2} - \epsilon$, for any $\epsilon > 0$. Fig. 10(a) was obtained by adding a Gaussian white noise to the signal in Fig. 8(a), and the SNR is 5.4 dB. Fig. 10(b) shows its dyadic wavelet transform computed over four scales. The bottom graph is the remaining low-frequency signal $Sf(2^4, x)$, at scales larger than $2^4$.

Let us suppose that the original signal has singularities whose Lipschitz regularity is positive. This is the case for the signal in Fig. 8(a). In the left part, the worst singularities are discontinuities, which are Lipschitz 0. The right part is the realization of a Brownian process, which is uniformly Lipschitz $\frac{1}{2} - \epsilon$, for any $\epsilon > 0$. Since the noise creates singularities whose Lipschitz regularity is negative, we can discriminate the modulus maxima created by the white noise from the ones produced by the signal by looking at the evolution of their amplitude across scales. If the modulus maxima have an amplitude which increases strongly when the scale decreases, it indicates that the corresponding singularities have negative Lipschitz exponents. These maxima are mostly dominated by the white noise and are thus removed.

Fig. 10. (a) Signal of Fig. 8(a) to which we added a Gaussian white noise. SNR is 5.4 dB. (b) Wavelet transform computed up to the scale $2^4$. (c) Modulus maxima of the wavelet transform.

At the locations where the signal has singularities with positive Lipschitz exponents, the noise adds singularities with negative Lipschitz exponents. Mathematically, the sum is a signal whose singularities have negative Lipschitz exponents. However, if the signal has localized singularities of larger amplitude than the noise, at large scales the modulus maxima created by the signal can be discriminated from the ones produced by the noise fluctuations, and their amplitude increases only slightly when the scale decreases (see Fig. 10(c)). At the finest scale $2^1$, when the SNR is too small, the signal is dominated by the noise and it is extremely difficult to recover any information from the wavelet transform modulus maxima (see Fig. 10(c)).

In order to evaluate the behavior of the wavelet maxima across scales, we need to make a correspondence between the maxima that appear at different scales $2^j$. We say that a maximum at a scale $2^j$ propagates to another maximum at the coarser scale $2^{j+1}$ if both maxima belong to the same maxima line in the scale space $(s, x)$. Equation (43) proves that for a white noise, on average, the number of maxima decreases by a factor 2 when the scale increases by 2. Half of the maxima do not propagate from the scale $2^j$ to the scale $2^{j+1}$. In order to find which maxima propagate to the next scale, one should compute the wavelet transform on a dense sequence of scales. However, with a simple ad hoc algorithm, one can still try to find which maxima propagate to the next scale by looking at their value and position with respect to other maxima at the next scale. We suppose that a modulus maximum propagates from a scale $2^j$ to a coarser scale $2^{j+1}$ if it has a large amplitude and its position is close to a maximum at the scale $2^{j+1}$ having the same sign. Such an ad hoc algorithm is not exact but saves computations, since we do not need to compute the wavelet transform at any other scale.

The denoising algorithm removes all maxima whose amplitude increases on average when the scale decreases, or which do not propagate to larger scales. These are the modulus maxima that are mostly influenced by the noise fluctuations. Since the noise dominates the signal at the finest scale $2^1$, this computation can only be done down to the scale $2^2$. If we remove all modulus maxima at the scale $2^1$, we restore a blurred signal. Instead, we create a maximum at the scale $2^1$ at any abscissa where there exists a maximum at the scale $2^2$. To compute the amplitude of this new maximum, we estimate the Lipschitz regularity of the corresponding singularity by computing the best Lipschitz exponent $\alpha$ that matches the decay of the modulus maxima at scales larger than $2^2$, at the corresponding location. We then impose that the ratio of the maxima amplitudes at the scales $2^2$ and $2^1$ is equal to $2^\alpha$, in order to keep the same amplitude decay up to the finest scale. Wherever the signal has no modulus maximum at the scale $2^2$, we do not create a modulus maximum at the scale $2^1$ and thus restore a smooth signal. At the maxima locations, we restore a singularity which is Lipschitz $\alpha$. This nonlinear interpolation algorithm recovers an approximation of the original signal singularities.

Fig. 11(a) shows the wavelet transform modulus maxima that are selected by the denoising algorithm. Most of the modulus maxima of the Brownian motion part are not selected because they are completely dominated by the noise. The position and amplitude of the modulus maxima that have been selected are mostly influenced by the sharp variations of the original signal, but are also affected by the white noise at the corresponding locations. At scales larger than $2^4$, the signal dominates the noise. Hence, we compute the wavelet transform up to the scale $2^4$, and leave untouched the noise within the low-frequency signal $Sf(2^4, x)$.

After the maxima selection, we reconstruct a "denoised" signal with the alternate projection algorithm described in Section VI. The function shown in Fig. 11(b) is obtained with 20 alternate projections. The SNR of the noisy signal was 5.4 dB, and after denoising the SNR is 11.6 dB. We restore a signal with sharp singularities, but many original details have disappeared because they were dominated by the noise. The restored signal singularities are also distorted because of the influence of the white noise on the positions and values of the modulus maxima that are selected. Table I gives the evolution of the SNR of the denoised signal when the amount of white noise added to the signal of Fig. 8(a) is modified. The gain is of the order of 7 dB at low SNR. When the SNR increases, the gain decreases because this technique is not able to recover the Brownian texture in the right part of the signal. Indeed, at the scales $2^1$ and $2^2$, we select the modulus maxima that propagate at least up to the scale $2^3$, and this is not the case for all modulus maxima created by this texture.
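The creation of a maximum at the finest scale can be sketched as a small least-squares fit. The example assumes that the exponent $\alpha$ is fitted as the slope of $\log_2$ of the maxima amplitudes against $j$ for the scales $2^2$, $2^3$, $2^4$, and that the first amplitude passed in is the one at the scale $2^2$. The function name, its arguments, and the choice of a straight-line fit are illustrative assumptions, since the text does not specify how the "best" exponent is computed.

```python
import numpy as np

def extrapolate_finest_maximum(amplitudes, j_values=(2, 3, 4)):
    """Estimate alpha from the amplitude decay and extrapolate to the scale 2^1.

    amplitudes : modulus maxima values along one maxima line, amplitudes[0] at 2^2
    j_values   : dyadic exponents of the scales at which they were measured
    """
    a = np.asarray(amplitudes, dtype=float)
    j = np.asarray(j_values, dtype=float)
    # Slope of log2|a_j| versus j is the fitted Lipschitz exponent.
    alpha, _ = np.polyfit(j, np.log2(np.abs(a)), 1)
    # Impose a(2^2) / a(2^1) = 2^alpha, keeping the sign of the 2^2 maximum.
    a_finest = a[0] / 2.0**alpha
    return alpha, a_finest

# Synthetic maxima line following 3 * (2^j)^0.5.
amps = 3.0 * 2.0 ** (0.5 * np.array([2, 3, 4]))
alpha, a1 = extrapolate_finest_maximum(amps)
print(alpha, a1)
```

With noisy amplitudes the fitted slope is only an estimate, and a negative slope would correspond to a maximum that the selection step above removes.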

Fig. 11. (a) Modulus maxima kept by the denoising algorithm. (b) Signal reconstructed from the modulus maxima shown in (a). SNR is 11.6 dB. Quality of the denoising can be appreciated by comparing this graph with the graphs in (c) and (d). (c) Original signal. (d) Noisy signal.

TABLE I
Noisy SNR | Denoised SNR | Gain
The first column gives the SNR after adding a Gaussian white noise to the signal in Fig. 8(a). The second column is the SNR of the signal after the noise removal. The third column gives the difference between the two previous values.

This simple algorithm shows the feasibility of discriminating a signal from its noise with an analysis of the behavior of the modulus maxima across scales. Better strategies for selecting the maxima can certainly be developed, depending upon the applications. The denoising procedure does not require that the noise is white, but only that its singularities have Lipschitz exponents that can be differentiated from those of the signal singularities.

VIII. WAVELET TRANSFORM OF IMAGES

The edges of the different structures that appear in an image are often the most important features for pattern recognition. This is well illustrated by our visual ability to recognize an object from a drawing that only outlines edges. Edge points are often located where the image intensity has sharp transitions. Hence, in computer vision, a large class of edge detectors look for points where the gradient of the image intensity has a modulus which is locally maximum. Canny's [7] edge detector is a multiscale version of this approach. The extension of the wavelet maxima representation to two dimensions is mostly inspired by Canny's multiscale edge detector. Since we want to characterize sharp transitions of the signal itself, we choose wavelets with only one vanishing moment. We call a two-dimensional smoothing function any function whose double integral is nonzero. We define two wavelets that are, respectively, the partial derivatives along $x$ and $y$ of a two-dimensional smoothing function $\theta(x, y)$:

$$ \psi^1(x, y) = \frac{\partial \theta(x, y)}{\partial x} \quad \text{and} \quad \psi^2(x, y) = \frac{\partial \theta(x, y)}{\partial y}. \tag{44} $$

Let $\psi^1_s(x, y) = (1/s)^2 \psi^1(x/s, y/s)$ and $\psi^2_s(x, y) = (1/s)^2 \psi^2(x/s, y/s)$. For any function $f(x, y) \in L^2(R^2)$, the wavelet transform defined with respect to $\psi^1(x, y)$ and $\psi^2(x, y)$ has two components:

$$ W^1 f(s, x, y) = f * \psi^1_s(x, y) \quad \text{and} \quad W^2 f(s, x, y) = f * \psi^2_s(x, y). $$

Similarly to (21), one can easily prove that

$$ \begin{pmatrix} W^1 f(s, x, y) \\ W^2 f(s, x, y) \end{pmatrix} = s \begin{pmatrix} \frac{\partial}{\partial x}(f * \theta_s)(x, y) \\ \frac{\partial}{\partial y}(f * \theta_s)(x, y) \end{pmatrix} = s \vec{\nabla}(f * \theta_s)(x, y). \tag{45} $$

Hence, the two components of the wavelet transform are proportional to the coordinates of the gradient vector of $f(x, y)$ smoothed by $\theta_s(x, y)$. Canny [7] defines the edge points of $f(x, y)$ at the scale $s$ as the points where the modulus of the gradient vector of $f * \theta_s(x, y)$ is maximum in the direction in which the gradient vector points. The orientation of the gradient vector indicates the direction along which the partial derivative of $f(x, y)$ has an absolute value which is maximum. It is the direction in which $f(x, y)$ has the sharpest variation. Edge points are inflection points of the surface $f * \theta_s(x, y)$. We use the same approach to define the wavelet transform modulus maxima. Before studying these modulus maxima in more detail, let us briefly review the properties of a two-dimensional wavelet transform.

In two dimensions, the scale space is a three-dimensional space $(s, x, y)$, and it is crucial to keep as few scales as possible in order to limit the computations as well as the memory requirements. We thus define a two-dimensional dyadic wavelet transform, where the scale $s$ varies only along the dyadic sequence $(2^j)_{j \in Z}$. We call the two-dimensional dyadic wavelet transform of $f(x, y)$ the set of functions

$$ \left( W^1 f(2^j, x, y),\ W^2 f(2^j, x, y) \right)_{j \in Z}. \tag{46} $$

Let $\hat{\psi}^1(\omega_x, \omega_y)$ and $\hat{\psi}^2(\omega_x, \omega_y)$ be the Fourier transforms of $\psi^1(x, y)$ and $\psi^2(x, y)$. The Fourier transforms of $W^1 f(2^j, x, y)$ and $W^2 f(2^j, x, y)$ are respectively given by

$$ \hat{W}^1 f(2^j, \omega_x, \omega_y) = \hat{f}(\omega_x, \omega_y)\, \hat{\psi}^1(2^j \omega_x, 2^j \omega_y), \tag{47} $$

and

$$ \hat{W}^2 f(2^j, \omega_x, \omega_y) = \hat{f}(\omega_x, \omega_y)\, \hat{\psi}^2(2^j \omega_x, 2^j \omega_y). \tag{48} $$

A dyadic wavelet transform is a complete and stable representation of $f(x, y)$ if and only if the two-dimensional Fourier plane is covered by the dyadic dilations of $\hat{\psi}^1(\omega_x, \omega_y)$ and $\hat{\psi}^2(\omega_x, \omega_y)$. This means that there exist two strictly positive constants $A$ and $B$ such that

$$ A \le \sum_{j=-\infty}^{+\infty} \left( |\hat{\psi}^1(2^j \omega_x, 2^j \omega_y)|^2 + |\hat{\psi}^2(2^j \omega_x, 2^j \omega_y)|^2 \right) \le B. $$

In two dimensions, a dyadic wavelet transform is also overcomplete. Any sequence of two-dimensional functions $(g^1_j(x, y), g^2_j(x, y))_{j \in Z}$ is not a priori the dyadic wavelet transform of some two-dimensional function $f(x, y)$. In order to be a dyadic wavelet transform, such a sequence must satisfy reproducing kernel equations similar to (40) [18].

In two dimensions, Lipschitz exponents are defined with a simple extension of Definition 1. Let $0 \le \alpha \le 1$. A function $f(x, y)$ is said to be Lipschitz $\alpha$ in the neighborhood of $(x_0, y_0)$ if and only if there exist $h_0 > 0$ and $k_0 > 0$ as well as $A > 0$ such that for any $|h| < h_0$ and $|k| < k_0$,

$$ |f(x_0 + h, y_0 + k) - f(x_0, y_0)| \le A (h^2 + k^2)^{\alpha/2}. \tag{49} $$

If there exists a constant $A$ such that (49) is satisfied for any pair of points $(x_0, y_0)$ and $(x_0 + h, y_0 + k)$ within an open set of $R^2$, the function $f(x, y)$ is uniformly Lipschitz $\alpha$ over this open set.
Theorems 1 and 2, which characterize Lipschitz exponents from the asymptotic decay across scales of the wavelet transform, remain valid in two dimensions. The local Lipschitz regularity of a function $f(x, y)$ is estimated from the evolution across scales of both $|W^1 f(2^j, x, y)|$ and $|W^2 f(2^j, x, y)|$. The value of each of these components is bounded by

$$ Mf(2^j, x, y) = \sqrt{|W^1 f(2^j, x, y)|^2 + |W^2 f(2^j, x, y)|^2}. \tag{50} $$

The function $Mf(2^j, x, y)$ is called the modulus of the wavelet transform at the scale $2^j$. For wavelets defined by (44), equation (45) proves that $Mf(2^j, x, y)$ is proportional to the modulus of the gradient vector $\vec{\nabla}(f * \theta_{2^j})(x, y)$. Theorem 1 is extended as follows. We suppose that the wavelets $\psi^1(x, y)$ and $\psi^2(x, y)$ are continuously differentiable and have a compact support.

Theorem 6: Let $f(x, y) \in L^2(R^2)$ and $0 < \alpha < 1$. For any $\epsilon > 0$, $f(x, y)$ is uniformly Lipschitz $\alpha$ over $]a + \epsilon, b - \epsilon[ \times ]c + \epsilon, d - \epsilon[$, if and only if for any $\epsilon > 0$ there exists a constant $A_\epsilon$ such that for $(x, y) \in ]a + \epsilon, b - \epsilon[ \times ]c + \epsilon, d - \epsilon[$ and any scale $2^j$,

$$ |Mf(2^j, x, y)| \le A_\epsilon (2^j)^\alpha. \tag{51} $$

The proof of this theorem is a simple extension of the proof of Theorem 1. Equation (51) yields

$$ \log_2 |Mf(2^j, x, y)| \le \log_2(A_\epsilon) + \alpha j. $$

Theorem 6 proves that uniform Lipschitz exponents can be measured from the slope of the decay of $\log_2(|Mf(2^j, x, y)|)$ when the scale $2^j$ tends to 0. Like in one dimension, integer Lipschitz exponents have a particular behavior. For $\alpha = 0$ and $\alpha = 1$, (51) is a necessary condition which is not sufficient. The two-dimensional extension of Theorem 2 is similar and is left to the reader.

To recover the two components $W^1 f(2^j, x, y)$ and $W^2 f(2^j, x, y)$ from the modulus $Mf(2^j, x, y)$, we also need to compute the angle

$$ Af(2^j, x, y) = \text{argument}\left( W^1 f(2^j, x, y) + i\, W^2 f(2^j, x, y) \right). $$

Equation (45) proves that $Af(2^j, x, y)$ is the angle between the gradient vector $\vec{\nabla}(f * \theta_{2^j})(x, y)$ and the horizontal. It indicates locally the direction in which the signal has the sharpest variation. This orientation component is the main difference between one- and two-dimensional wavelet transforms.
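The passage between the two components and the modulus and angle representation can be sketched as follows. This is a minimal illustration that assumes the two components at one scale are already available as NumPy arrays; the function names are hypothetical, and the angle is mapped to $[0, 2\pi)$ to match the range quoted in the caption of Fig. 12.

```python
import numpy as np

def modulus_angle(w1, w2):
    """Modulus (50) and angle of the pair (W^1 f, W^2 f) at one scale."""
    modulus = np.hypot(w1, w2)
    angle = np.mod(np.arctan2(w2, w1), 2 * np.pi)   # argument of w1 + i*w2 in [0, 2*pi)
    return modulus, angle

def components(modulus, angle):
    """Inverse mapping: recover (W^1 f, W^2 f) from the modulus and the angle."""
    return modulus * np.cos(angle), modulus * np.sin(angle)

w1 = np.array([1.0, 0.0, -2.0, 0.0])
w2 = np.array([0.0, -1.0, 0.0, 3.0])
m, a = modulus_angle(w1, w2)
r1, r2 = components(m, a)
print(m, a)
```

The round trip returns the original components up to floating-point error, which is why recording the modulus and the angle at the maxima locations is equivalent to recording both wavelet components there.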
Fast algorithms are described in [18] to compute the two-dimensional dyadic wavelet transform of an image. The first two rows of Fig. 12 show the dyadic wavelet transform of the image of a house. We can recognize the effect of the partial derivative along $x$ and $y$ in each component of the wavelet transform. The modulus and angle images are shown in the third and fourth rows of Fig. 12. The angles indicate the local orientation of edges.

IX. MODULUS MAXIMA OF AN IMAGE WAVELET TRANSFORM

Like in Canny edge detection, we detect the points where the modulus of $\vec{\nabla}(f * \theta_s)(x, y)$ is locally maximum in the direction in which the gradient vector points. At each scale $2^j$, the modulus maxima of the wavelet transform are thus defined as points $(x, y)$ where the modulus image $Mf(2^j, x, y)$ is locally maximum along the gradient direction given by $Af(2^j, x, y)$. These modulus maxima are


Fig. 12. Original image is at the beginning of the third row (9). Low frequencies below the scale $2^4$ are carried by the image $Sf(2^4, x, y)$ that is shown in (14). First row (1-4) gives the images $(W^1 f(2^j, x, y))_{1 \le j \le 4}$, and the scale increases from left to right. Second row (5-8) displays $(W^2 f(2^j, x, y))_{1 \le j \le 4}$. Black, grey, and white pixels indicate, respectively, negative, zero, and positive sample values. Third row (10-13) displays the modulus images $(Mf(2^j, x, y))_{1 \le j \le 4}$. Black pixels indicate zero values and white ones correspond to the highest values. Fourth row (15-18) gives the angle images $(Af(2^j, x, y))_{1 \le j \le 4}$. Angle values range from 0 (black) to $2\pi$ (white). Fifth row (19-22) displays in black the position of the points that are local maxima of $Mf(2^j, x, y)$ in the direction given by the angle images $Af(2^j, x, y)$. Sixth row (23-26) shows the modulus maxima where the modulus value $Mf(2^j, x, y)$ is larger than a given threshold.

inflection points of $f * \theta_{2^j}(x, y)$. We record the position of each modulus maximum and the values of $Mf(2^j, x, y)$ and $Af(2^j, x, y)$ at the corresponding location. The fifth row of Fig. 12 shows the position of the modulus maxima at different scales. At fine scales, there are many modulus maxima created by the image noise and the light textures. The last row gives the positions of the modulus maxima where the value of $Mf(2^j, x, y)$ is larger than a given threshold. The modulus maxima created by the light noise and textures have disappeared, and only the important edges remain.

Let us study the inverse problem that consists in reconstructing images from the local maxima of their wavelet transform modulus. Meyer proved [23] that these modulus maxima do not uniquely characterize images. However, it is possible to reconstruct a close approximation of an image with an extension of the one-dimensional reconstruction algorithm, as described in [18]. Let $f(x, y) \in L^2(R^2)$ and $(W^1 f(2^j, x, y), W^2 f(2^j, x, y))_{j \in Z}$ be its dyadic wavelet transform. For each scale $2^j$, we detect the local maxima of $Mf(2^j, x, y)$ along the direction given by the angle image $Af(2^j, x, y)$. We record the positions of the modulus maxima $((x_{j,v}, y_{j,v}))_{v \in R}$, as well as the values $Mf(2^j, x_{j,v}, y_{j,v})$ and $Af(2^j, x_{j,v}, y_{j,v})$ at the maxima locations. In two dimensions, the number of modulus maxima is not countable anymore. From $Mf(2^j, x_{j,v}, y_{j,v})$ and $Af(2^j, x_{j,v}, y_{j,v})$, we can compute $W^1 f(2^j, x_{j,v}, y_{j,v})$ and $W^2 f(2^j, x_{j,v}, y_{j,v})$, and vice versa. The inverse problem consists in finding the set of functions $h(x, y)$ that satisfy the following two constraints.

a) At each scale $2^j$, the modulus maxima obtained from $W^1 h(2^j, x, y)$ and $W^2 h(2^j, x, y)$ are located at the positions $((x_{j,v}, y_{j,v}))_{v \in R}$.

b) At each scale $2^j$ and for each modulus maximum location $(x_{j,v}, y_{j,v})$, $W^1 h(2^j, x_{j,v}, y_{j,v}) = W^1 f(2^j, x_{j,v}, y_{j,v})$ and $W^2 h(2^j, x_{j,v}, y_{j,v}) = W^2 f(2^j, x_{j,v}, y_{j,v})$.
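The detection step used above (local maxima of the modulus along the direction given by the angle image) can be sketched on a pixel grid. The sketch assumes that the angle is quantized to the nearest of the eight neighbor directions, that $x$ runs along columns and $y$ along rows, and that borders are handled by edge replication. These discretization choices and the function name are illustrative assumptions, not the implementation of [18].

```python
import numpy as np

def directional_maxima(modulus, angle):
    """Boolean mask of pixels where the modulus is a local maximum along the
    gradient direction (angle in radians, x along columns, y along rows)."""
    k = np.rint(angle / (np.pi / 4)).astype(int) % 8        # nearest of 8 directions
    dx = np.array([1, 1, 0, -1, -1, -1, 0, 1])[k]
    dy = np.array([0, 1, 1, 1, 0, -1, -1, -1])[k]
    rows, cols = np.indices(modulus.shape)
    padded = np.pad(modulus, 1, mode="edge")                # replicate borders
    forward = padded[rows + 1 + dy, cols + 1 + dx]
    backward = padded[rows + 1 - dy, cols + 1 - dx]
    not_smaller = (modulus >= forward) & (modulus >= backward)
    strictly_larger = (modulus > forward) | (modulus > backward)
    return not_smaller & strictly_larger

# Modulus of a smoothed vertical edge at column 5, gradient pointing along x.
profile = np.exp(-(np.arange(11) - 5.0) ** 2 / 4.0)
m = np.tile(profile, (7, 1))
mask = directional_maxima(m, np.zeros_like(m))

# Same edge rotated by 90 degrees: gradient pointing along y.
mask_t = directional_maxima(m.T, np.full(m.T.shape, np.pi / 2))
print(mask.sum(), mask_t.sum())
```

A threshold on the modulus, as in the last row of Fig. 12, can then be applied with an expression such as `mask & (m > threshold)`, where `threshold` is a value chosen by the user.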
Like in one dimension, we replace the maxima condition a) by the minimization of a Sobolev norm, which is defined by

$$ \left| (W^1 h(2^j, x, y), W^2 h(2^j, x, y))_{j \in Z} \right|^2 = \sum_{j=-\infty}^{+\infty} \left( \|W^1 h(2^j, x, y)\|^2 + \|W^2 h(2^j, x, y)\|^2 + 2^{2j} \left( \left\| \frac{\partial W^1 h(2^j, x, y)}{\partial x} \right\|^2 + \left\| \frac{\partial W^2 h(2^j, x, y)}{\partial y} \right\|^2 \right) \right). $$

In conjunction with the constraint b), the minimization of this norm creates a wavelet transform whose horizontal and vertical components reach local maximum values in the neighborhood of the points $(x_{j,v}, y_{j,v})$, and which have as few spurious oscillations as possible. We use a partial derivative along $x$ for $W^1 h(2^j, x, y)$ because it oscillates mostly along the $x$ direction, since it is computed by smoothing the signal and taking a partial derivative along $x$ (see Fig. 12). The transposed result is valid for $W^2 h(2^j, x, y)$. By imposing a simple condition on the Fourier transform of the two wavelets $\psi^1(x, y)$ and $\psi^2(x, y)$, one can prove [18] that for any function $h(x, y) \in L^2(R^2)$, $|(W^1 h(2^j, x, y), W^2 h(2^j, x, y))_{j \in Z}|$ is finite.

We can compute the wavelet transform that satisfies the condition b) and minimizes the norm $|\cdot|$ with an alternate projection algorithm. Let $K$ be the space of all sequences of functions $(g^1_j(x, y), g^2_j(x, y))_{j \in Z}$ such that $|(g^1_j(x, y), g^2_j(x, y))_{j \in Z}| < +\infty$. The space $V$ of all dyadic wavelet transforms of functions in $L^2(R^2)$ is included in $K$. We define the set $\Gamma$ of all sequences of functions $(g^1_j(x, y), g^2_j(x, y))_{j \in Z} \in K$ such that for any index $j$ and all maxima positions $(x_{j,v}, y_{j,v})$,

$$ g^1_j(x_{j,v}, y_{j,v}) = W^1 f(2^j, x_{j,v}, y_{j,v}) \quad \text{and} \quad g^2_j(x_{j,v}, y_{j,v}) = W^2 f(2^j, x_{j,v}, y_{j,v}). $$

The set $\Gamma$ is an affine space which is closed with respect to the norm $|\cdot|$. The wavelet transforms that satisfy the condition b) are the sequences of functions that belong to $\Lambda = V \cap \Gamma$. To reconstruct the element of $\Gamma \cap V$ that minimizes the norm $|\cdot|$, we alternate orthogonal projections on $\Gamma$ and $V$. The projectors $P_\Gamma$ on $\Gamma$ and $P_V$ on $V$ are defined in [18]. For a discrete image of $N^2$ pixels, the implementations of $P_V$ and $P_\Gamma$ require $O(N^2 \log_2(N))$ operations. If we begin the iteration from the zero element of $K$, the alternate projections converge strongly to the element of $\Lambda$ whose norm $|\cdot|$ is minimum (see Fig. 9).
We then reconstruct an image by applying the inverse wavelet transform operator. This algorithm does not reconstruct the original wavelet transform but an approximation. Numerical experiments indicate that we always recover a close approximation of the original wavelet transform. After fewer than ten iterations, the algorithm reconstructs an image which has no visual differences with the original one. Fig. 13 shows the reconstruction of the house image from the wavelet transform modulus maxima. The signal-to-noise ratio of this reconstruction is 31 dB. Most of the errors are concentrated at high frequencies and are too small to be visible. Like in one dimension, it seems that the modulus maxima representation is complete, modulo small errors that can be neglected since they are below our visual sensitivity. There is, however, no mathematical proof that these errors remain small for any reconstructed image.

The stability of the reconstruction algorithm enables us to process the modulus maxima before the reconstruction. The bottom left image of Fig. 13 is reconstructed from the thresholded modulus maxima shown in the last row of Fig. 12. Since we suppressed the modulus maxima corresponding to noise and textures, these have disappeared in the reconstructed image, but the image remains sharp and the other strong singularities are not degraded. The next section describes a more sophisticated technique to remove noise from images.

Fig. 13. Top left: original image. Top right: image reconstructed from the wavelet transform modulus maxima shown in the fifth row of Fig. 12, with 10 iterations. SNR is 31 dB. Bottom left: image reconstructed with 10 iterations from the thresholded wavelet transform modulus maxima shown in the last row of Fig. 12.

X. IMAGE DENOISING BASED ON WAVELET MAXIMA

Like in one dimension, we separate the image information from the noise by trying to discriminate the image singularities from the noise singularities. The Lipschitz exponents of the image singularities are measured from the evolution of the wavelet transform maxima across scales. When the image does not include irregular textures, the worst singularities are discontinuities. All singularities have nonnegative Lipschitz exponents, so the wavelet transform modulus maxima do not increase when the scale decreases. On the contrary, the realization of a white noise random field is a distribution which is almost everywhere singular, with negative Lipschitz exponents. Let $n(x, y)$ be a wide-sense stationary, white noise random field of variance $\sigma^2$. Let $Mn(2^j, x, y)$ be the modulus of the wavelet transform of $n(x, y)$. With a proof similar to that of (42), one can show that, on average, the square of the wavelet transform modulus increases by a factor 2 when the scale decreases by a factor 2. The modulus maxima created by the noise therefore behave differently from the modulus maxima that are mainly affected by a signal singularity.

The noise can also be discriminated by using some a priori knowledge on the spatial coherence of the image components. For a large class of images, the borders of important structures are regular curves in the image plane $(x, y)$. Along these curves, the image intensity is singular in one direction but varies smoothly in the perpendicular direction. For example, the contours of the house elements in Fig. 12 are mostly smooth curves. We reorganize the maxima representation into chains of modulus maxima to recover these edge curves. At any point of a smooth edge curve, the direction of the gradient vector of $f(x, y)$ is perpendicular to the tangent of the curve in the image plane $(x, y)$. To chain a modulus maximum with its neighbors, we use this orientation information at each scale $2^j$, as well as the fact that the modulus of the gradient vector varies smoothly along such a curve [29]. A white noise instead creates randomly distributed edge points in the image plane. It does not produce smooth singular curves. We can, therefore, also discriminate the modulus maxima created by the image structures by analyzing the geometrical properties of the edge curves that they produce in the image plane.

At the top left of Fig. 14 is the original house image, and at the right is an image to which we added a Gaussian white noise, with an SNR of 2.9 dB. The first column of Fig. 15 gives the modulus maxima of this noisy image, between the scales $2^1$ and $2^4$. At the finest scale, the noise dominates the signal and the geometrical coherence of the house edges cannot be discerned anymore. The white noise has essentially destroyed all the image information at this scale. This is coherent with the fact that when looking at the noisy image, one has the impression that the noise is a uniform texture that blurs our view of the image. At the next scale, we can distinguish the contours of the house among all the modulus maxima created by the noise, which means that the image information is not completely dominated by the noise. At the scale $2^3$, the image structures clearly stand out from the noise, and the values of the modulus maxima at the borders of most image components are much less affected by the noise.
In order to estimate the evolution of the modulus maxima amplitude across scales, we need to relate, at different scales $2^j$ and $2^{j+1}$, the modulus maxima that belong to the same maxima line in the scale-space cube $(s, x, y)$. We relate the modulus maxima across scales with the same ad hoc algorithm as in one dimension. We consider that at a scale $2^j$, only the maxima of largest amplitude propagate to the next scale. Any such modulus maximum propagates to a modulus maximum at the scale $2^{j+1}$ which has a close position in the image plane and a similar angle value $Af(2^{j+1}, x, y)$. The denoising algorithm removes all modulus maxima that do not propagate to coarser scales or whose modulus values increase, on average, when the scale decreases. Among the remaining maxima, we keep the modulus maxima that belong to maxima chains that are longer than a given threshold. This simple geometrical criterion selects modulus maxima that are part of coherent geometrical structures, which most probably belong to the original image rather than the noise. The maxima removal is performed at scales larger than or equal to $2^2$, because at the finest scale $2^1$ the noise dominates the signal. The position, modulus, and angle values of the maxima that are kept are mostly influenced by the original image singularities, but also depend upon the noise values in the corresponding neighborhoods. The geometrical coherence hypothesis that we used at the selection stage supposes that important singularities belong to regular curves, and that the


singularity type varies smoothly along these curves. Hence, the irregular variations of the position, angle, and modulus values of the maxima along the remaining maxima chains are mostly due to the noise influence. We remove part of the noise by applying low-pass filters on these values along each maxima chain. This regularization does not affect the sharpness of the image singularities, but it smooths their evolution in the image plane. The second column of Fig. 15 shows the maxima curves obtained after this regularization. At the finest scale $2^1$, if we remove all modulus maxima, we restore a blurred image. Instead, as in one dimension, we create a modulus maximum at the scale $2^1$ at each position where there exists a modulus maximum at the scale $2^2$. The edge maps at the scales $2^1$ and $2^2$ are thus the same. The angle value of a maximum created at the scale $2^1$ is copied from the angle value of the corresponding maximum at the scale $2^2$. Its modulus value is computed by estimating the Lipschitz regularity $\alpha$ from the decay of the modulus maxima at scales larger than $2^2$, at the same location. The ratio of the modulus values at the scales $2^2$ and $2^1$ is set to $2^\alpha$, in order to keep the same modulus decay at the finest scale. As in one dimension, at scales larger than $2^4$, the SNR is large enough that we do not need to process the low-frequency image $Sf(2^4, x, y)$. The lower left of Fig. 14 is the image reconstructed from the cleaned modulus maxima representation.

Fig. 14. Top left: original image. Top right: image contaminated by a Gaussian white noise. SNR is 2.9 db. Lower left: image reconstructed from the regularized maxima shown in the second column of Fig. 15. SNR is 14.3 db. Lower right: image reconstructed from the same modulus maxima, but without regularization along the maxima curves. SNR is 14.2 db.
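The regularization along a chain and the creation of a finest-scale modulus value can be illustrated with a short Python sketch. It rests on several assumptions not stated in the text: the low-pass filter is taken to be a simple moving average, the Lipschitz regularity is estimated by a least-squares fit of $\log_2$ of the modulus against the scale index $j$, and the function names are hypothetical. If the modulus behaves like $K (2^j)^\alpha$ along a maxima line, the slope of that fit is $\alpha$, and the modulus at the scale $2^1$ is then the modulus at $2^2$ divided by $2^\alpha$.

```python
import math

def smooth_along_chain(values, width=3):
    """Moving-average low-pass filter over values sampled along one chain.

    Intended for modulus or position values; angle values would need a
    circular average because of the wrap-around at 2*pi.
    """
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def estimate_alpha(moduli_by_j):
    """Least-squares slope of log2(modulus) against the scale index j.

    moduli_by_j maps j to the modulus of the maximum at scale 2^j.
    """
    js = sorted(moduli_by_j)
    logs = [math.log2(moduli_by_j[j]) for j in js]
    j_mean = sum(js) / len(js)
    l_mean = sum(logs) / len(logs)
    num = sum((j - j_mean) * (l - l_mean) for j, l in zip(js, logs))
    den = sum((j - j_mean) ** 2 for j in js)
    return num / den

def finest_scale_modulus(moduli_by_j):
    """Modulus created at scale 2^1 so that M(2^2) / M(2^1) = 2^alpha."""
    alpha = estimate_alpha(moduli_by_j)
    return moduli_by_j[2] / 2 ** alpha

# Synthetic maxima line with alpha = 0.5 observed at scales 2^2, 2^3, 2^4.
moduli = {j: 3.0 * 2 ** (0.5 * j) for j in (2, 3, 4)}
alpha = estimate_alpha(moduli)
finest = finest_scale_modulus(moduli)

# Noisy modulus values along a chain, before and after smoothing.
smoothed = smooth_along_chain([1.0, 4.0, 1.0, 4.0, 1.0])
```

Which scales enter the fit is left to the caller; the example uses the three scales from $2^2$ to $2^4$ only for illustration.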
The denoising algorithm suppressed a large portion of the noise but also all the image textures. The SNR of this image is 14.3 db. The lower right of Fig. 14 is the image obtained if we do not regularize the angle and modulus values along the maxima chains at the different scales. The SNR is 14.2 db, which is approximately equal to the SNR with regularization. However, the edges are more irregular and, qualitatively, the image is of lower visual quality. The regularization improvement is mostly qualitative because, after smoothing, the edge locations might be slightly shifted, so the SNR is not much improved.

Fig. 15. First column gives the position of the wavelet transform modulus maxima of the noisy image shown in Fig. 14. Second column displays the position of the modulus maxima selected by the denoising algorithm, after regularization along the maxima chains.

Table II gives the evolution of the SNR of the denoised images when we vary the amount of Gaussian white noise added to the original image. At low SNR, the gain is over 10 db. As in one dimension, this gain decreases when the SNR increases because we cannot recover the original image textures. Indeed, at the scales $2^1$ and $2^2$, we only select the modulus maxima that propagate at least up to the scale $2^3$. The removal of textures has relatively less impact on the error at low SNR. Fig. 16 gives another example of image denoising. The SNR of the noisy image is 5.5 db. We applied the same denoising procedure, with the same threshold parameters as for Fig. 15. The SNR of the


SINGULARITY DETECTION AND PROCESSING WITH WAVELETS Stephane Mallat and Wen Liang Hwang Courant Institute of Mathematical Sciences New York University, New York, NY 10012 Technical Report March 1991 Abstract