A Method for Blur and Similarity Transform Invariant Object Recognition
Ville Ojansivu and Janne Heikkilä
Machine Vision Group, Department of Electrical and Information Engineering, University of Oulu, P.O. Box 4500, 90014, Finland

Abstract

In this paper, we propose novel blur and similarity transform (i.e. rotation, scaling and translation) invariant features for the recognition of objects in images. The features are based on blur invariant forms of the log-polar sampled phase-only bispectrum and are invariant to centrally symmetric blur, including linear motion and out of focus blur. An additional advantage of using the phase-only bispectrum is invariance to uniform illumination changes. To the best of our knowledge, the invariants of this paper are the first blur and similarity transform invariants in the Fourier domain. We have compared our features to the blur invariants based on complex image moments using simulated and real data. The moment invariants had not previously been evaluated under the similarity transform. The results show that our invariants recognize objects better in the presence of noise.

1. Introduction

Recognition of objects and patterns in images is a fundamental part of computer vision, with numerous applications. The task is difficult because objects rarely look exactly the same under different conditions. In real applications, images contain various artifacts such as geometrical and convolutional degradations, and image analysis systems should be able to operate under these non-ideal conditions as well. There has been a vast amount of research in the field of invariant pattern and object recognition [14]. However, the invariant recognition of objects degraded by blur is a much less studied topic. In various applications, images may contain blur, which can result, for example, from atmospheric turbulence, defocus, or relative motion between the camera and the scene.
This degradation process can be modeled as a linear shift-invariant system in which the relation between an ideal image f(x) and an observed image g(x) is given by

g(x) = f(x) * h(x) + n(x),   (1)

where x is a 2-D spatial coordinate vector, h(x) is the point spread function (PSF) of the system, n(x) is additive noise, and * denotes 2-D convolution. The point spread function h(x) represents the blur, while other degradations are captured by the noise term n(x).

The analysis of blurred images is often carried out by first deblurring the images and then applying standard methods for further analysis. Unfortunately, image deblurring is a difficult problem. Conventional solutions include the estimation of the PSF of the blur and deconvolution of the image using that PSF. When the PSF is known, the latter, ill-posed problem can be solved using regularization-based approaches [1]. In practice, the PSF is often unknown and very hard to estimate accurately; in this case, blind image restoration algorithms are used [8].

The analysis of blurred images can also be performed without deblurring, using features which are invariant to blur. Flusser and Suk proposed the first blur invariant features in [5]. These invariants are based on geometric moments or the spectrum of the image. Of these, the invariants based on central moments are also invariant to translation. The moment invariants have been applied to template matching [5], recognition of defocused objects [6] and registration of X-ray images [2]. In [9], new Fourier domain invariants based on the phase of the image spectrum or bispectrum were proposed. The bispectrum invariants are also invariant to a shift of the image. In [7], blur, rotation, scale and shift invariants based on complex moments were proposed, and in [15, 12] the theory was extended to blur and affine moment invariants. A shortcoming of the moment invariants is their sensitivity to noise and background clutter.
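The degradation model (1) can be simulated directly. The following sketch uses an illustrative random test pattern, a disk PSF and an arbitrary noise level; none of these values come from the paper.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# Ideal image f(x): a random test pattern standing in for a real image.
f = rng.random((64, 64))

# Centrally symmetric PSF h(x): a normalized disk approximating out of focus blur.
y, x = np.mgrid[-4:5, -4:5]
h = (x**2 + y**2 <= 16).astype(float)
h /= h.sum()

# Observed image g(x) = f(x) * h(x) + n(x), Eq. (1); the noise level is arbitrary.
n = 0.01 * rng.standard_normal(f.shape)
g = convolve2d(f, h, mode="same", boundary="wrap") + n
```

Because the PSF is normalized to unit sum, the blur preserves the mean intensity of the image while smoothing out local variation.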
On the other hand, the invariance of the Fourier domain features to geometrical transformations is limited to image shift.

In this paper, we propose novel Fourier domain blur invariants which are invariant to rotation, scaling and translation, i.e. the similarity transform. We call them blur and similarity transform invariants (BSI). Like the aforementioned invariants, the BSIs are invariant to centrally symmetric convolution. Many practical blur types, such as linear motion and defocus blur, fall into this category. The invariants are based on the blur invariant form of the log-polar sampled phase-only bispectrum.

2. Blur invariants in the Fourier domain

In this section, we first show the conditions of blur invariance in the Fourier domain in Section 2.1 and then present the blur-similarity transform invariants in Section 2.2. The implementation of the invariants is discussed in Section 2.3.

2.1. Blur invariance

If noise n(x) is neglected, (1) can be expressed in the Fourier domain using the convolution theorem as

G(u) = F(u) H(u),

where G(u), F(u), and H(u) are the Fourier transforms of the observed image, the ideal image, and the PSF, and u is a vector in the 2-D frequency space. In phasor form, the equation becomes G(u) = |G(u)| e^{iφ_g(u)}. If the Fourier transform G(u) is normalized by its magnitude, only the complex exponential containing the phase remains, namely

G(u) / |G(u)| = e^{iφ_g(u)} = e^{i[φ_f(u) + φ_h(u)]},   (2)

where φ_f(u) is the phase of the original image f(x) and φ_h(u) the phase of the blur PSF h(x). Since h(x) is assumed to be centrally symmetric, its Fourier transform H(u) is real and its phase φ_h(u) has only two possible values: φ_h(u) = 0 or φ_h(u) = π. It follows from this, and from the periodicity of the complex argument, that for any integer n

[e^{iφ_g(u)}]^{2n} = e^{i2nφ_g(u)} = e^{i2nφ_f(u)} e^{i2nφ_h(u)} = [e^{iφ_f(u)}]^{2n},   (3)

since e^{i2nφ_h(u)} = 1. Thus, any even power of the normalized Fourier transform, i.e.
e^{i2nφ_g(u)}, of the observed image is invariant to the convolution of the original image with any centrally symmetric PSF.

2.2. Blur and similarity transform invariants

The phase spectrum of an image, as used in Section 2.1, is not invariant to translation. The amplitude spectrum is invariant to translation, but blur invariants cannot be constructed from it, as they are based on phase. For this reason, we turn to the higher order spectra defined by

Ψ_m(u_1, u_2, ..., u_m) = F*(s) ∏_{i=1}^{m} F(u_i),   (4)

where u_i with i = 1, ..., m are vectors in the 2-D frequency space, s = u_1 + u_2 + ... + u_m, and F* denotes the complex conjugate. It can be easily shown that Ψ_m is shift invariant [3]. With m = 1 in (4) we get the power spectrum, and with m = 2 we get the bispectrum, namely

Ψ_2(u_1, u_2) = F(u_1) F(u_2) F*(u_1 + u_2).   (5)

The bispectrum contains both amplitude and phase information and is shift invariant. It is possible to construct a blur invariant phase-only bispectrum in which the exponentials containing the phase are raised to the second power, similarly to (3), namely

P(u_1, u_2) = [F(u_1)/|F(u_1)|]^2 [F(u_2)/|F(u_2)|]^2 [F*(u_1 + u_2)/|F(u_1 + u_2)|]^2
            = e^{i2φ(u_1)} e^{i2φ(u_2)} e^{−i2φ(u_1 + u_2)}
            = e^{i2[φ(u_1) + φ(u_2) − φ(u_1 + u_2)]}.   (6)

The bispectrum (5), as well as its blur invariant form (6), is a function of two vector arguments containing a total of four scalar variables. Assuming that F(u) is an N-by-N discrete Fourier transform (DFT) of an image f(x), the bispectrum becomes a four-dimensional N-by-N-by-N-by-N matrix. Fortunately, it is not necessary to evaluate the whole bispectrum. It is possible to take only 2-D slices of it, which contain essentially the same information. There are various ways of defining the slices [4, 10]. We define the slices as

S_k(u) = Ψ_2(u, ku),  k ∈ ℝ.   (7)

Similarly to (6), but using only one slice (7), we form a blur-translation invariant phase-only bispectrum slice

P_k(u) = [S_k(u)/|S_k(u)|]^2 = e^{i2[φ(u) + φ(ku) − φ((k+1)u)]}.   (8)
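The blur invariance of (8) can be checked numerically. In the sketch below the test image, PSF and slice index are illustrative choices; the convolution is circular so that the DFT convolution theorem holds exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
f = rng.random((N, N))

# A centrally symmetric PSF (3-tap horizontal blur with h(-x) = h(x), taken
# circularly); its DFT is real, so its phase is only ever 0 or pi.
h = np.zeros((N, N))
h[0, 0] = h[0, 1] = h[0, -1] = 1.0 / 3.0

# Circular convolution g = f * h via the DFT convolution theorem.
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))

def phase_bispectrum_slice(img, k=1):
    """Blur invariant phase-only bispectrum slice P_k(u) of Eq. (8)."""
    F = np.fft.fft2(img)
    n = img.shape[0]
    U, V = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # S_k(u) = F(u) F(ku) F*((k+1)u); DFT periodicity handles the index k*u.
    S = (F * F[(k * U) % n, (k * V) % n]
           * np.conj(F[((k + 1) * U) % n, ((k + 1) * V) % n]))
    S = S / np.maximum(np.abs(S), 1e-12)   # keep the phase only
    return S ** 2                          # the even power cancels the 0/pi blur phase

# The slices of the sharp and the blurred image agree up to numerical error.
assert np.allclose(phase_bispectrum_slice(f), phase_bispectrum_slice(g), atol=1e-6)
```

The final assertion holds because the blur contributes only a real factor to the slice, whose sign the squaring removes.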
The blur-translation invariants we proposed in [9] were based on (8). A pure bispectrum has also been used for rotation, scale and translation invariant object recognition [13]. An advantage of using the phase-only bispectrum is its robustness to illumination changes. The bispectrum is also fairly insensitive to noise.

The bispectrum slices (7) have the following scaling and rotation properties:

f(ax, by) →_B (1/(ab)^3) S_k(u/a, v/b),   (9)
f(r, θ + α) →_B S_k(ω, φ + α),   (10)

where →_B denotes the transformation from the spatial image f(x, y) to its bispectrum slice S_k(u, v), f(r, θ) and S_k(ω, φ) are the corresponding representations in polar coordinates, and a, b, and α are constants. Properties (9) and (10) correspond to the scaling and rotation properties of the Fourier transform, except for the factor 1/(ab)^3, which is 1/(ab) for the Fourier transform. This means that the bispectrum slices can be used to form a rotation, scaling and translation invariant image representation, similarly to the amplitude spectrum in the case of Fourier-Mellin invariant descriptors [11].

Using the log-polar representation, property (9) transforms a scale difference between images into a displacement of their logarithmically sampled bispectrum slices. Property (10) converts a rotation of the image into a displacement along the φ-axis in polar coordinates. As a result, both scale and orientation changes of the image turn into shifts in the log-polar sampled bispectrum slice. We denote this log-polar resampled version of the k-th blur invariant bispectrum slice (8) by R_k(r, θ). The shift of R_k(r, θ) can be eliminated by computing its bispectrum slice, which is shift invariant. We denote the l-th bispectrum slice (7) of R_k(r, θ) by ζ_kl(v), where k and l identify the first and second successive slices. ζ_kl(v) is now invariant to blur and similarity transform (rotation, scaling and translation) of the original image.
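The rotation-to-shift behaviour of the log-polar map can be demonstrated with a small resampler. The radial and angular resolutions below are illustrative (the paper's own lattice is much denser), and a 90-degree rotation is used because it is exact on the pixel grid.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar_sample(img, n_r=32, n_theta=72):
    """Log-polar resampling with bilinear interpolation (a sketch; the radial
    parameterization differs from the exact lattice used in Section 2.3)."""
    N = img.shape[0]
    c = (N - 1) / 2.0                  # centre that np.rot90 rotates about
    t = N / 2 - 1                      # maximum sampled radius
    R, T = np.meshgrid(t ** (np.arange(n_r) / (n_r - 1)),
                       np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False),
                       indexing="ij")
    rows = c + R * np.sin(T)
    cols = c + R * np.cos(T)
    return map_coordinates(img, [rows, cols], order=1)

rng = np.random.default_rng(3)
A = rng.random((64, 64))

lp_a = log_polar_sample(A)
lp_rot = log_polar_sample(np.rot90(A))     # image rotated by 90 degrees

# Property (10): rotating the image circularly shifts the log-polar map along
# the theta axis (90 degrees = 18 bins of 5 degrees here).
assert np.allclose(np.roll(lp_a, -18, axis=1), lp_rot, atol=1e-10)
```

The shift is exact here because bilinear interpolation commutes with quarter-turn rotations of the grid; for arbitrary rotation angles the correspondence holds only up to interpolation error.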
The final BSI features that we use for the object classification task are obtained by normalizing the slice, namely

I_kl(v) = ζ_kl(v) / |ζ_kl(v)|.   (11)

The computation of the BSIs can be summarized in the following steps, which are discussed in detail in Section 2.3:

1) compute the Fourier transform of the image: f(x) → F(u),
2) compute the k-th bispectrum slice: F(u) → S_k(u),
3) normalize and raise to the 2nd power: S_k(u) → P_k(u),
4) log-polar transform: P_k(u) → R_k(r, θ),
5) compute the Fourier transform: R_k(r, θ) → F{R_k(r, θ)},
6) compute the l-th bispectrum slice: F{R_k(r, θ)} → ζ_kl(v),
7) normalize to get the invariants: ζ_kl(v) → I_kl(v).

The resulting BSI features are invariant to the similarity transform and to centrally symmetric blur, similarly to the moment invariants of [7]. This includes the common cases of linear motion blur and out of focus blur. The features are also invariant to uniform illumination changes and, because of the nature of the bispectrum, fairly insensitive to noise.

2.3. Implementation

To compute an arbitrary bispectrum slice S_k(u) in step 2, we need frequency samples not only at points u but also at ku and (k+1)u. The samples in the two latter cases can be extracted from the DFT corresponding to F(u) by utilizing its conjugate symmetry and periodicity. The same holds in step 6, when another bispectrum slice is computed. The computation of the blur invariant phase-only slice P_k(u) is followed by log-polar sampling using bilinear interpolation. The sampling points are selected in the same way as in [11], namely

u_i = t^{r_i/t} cos θ_i,
v_i = t^{r_i/t} sin θ_i,

where t = N/2 − 1 and (r_i, θ_i) are the points of the new uniform sampling lattice. The size of the DFT is N × N. The resulting log-polar sampled bispectrum slice is denoted by R_k(r_i, θ_i). The accuracy of the final invariants depends on the resolution of the DFT and the interpolation.
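The seven steps above can be sketched end-to-end in NumPy. The slice indices, sampling resolutions and numerical guards below are illustrative assumptions, not the exact settings of the paper (which oversamples the DFT and uses a much denser lattice).

```python
import numpy as np
from scipy.ndimage import map_coordinates

def bsi_features(img, k=1, l=1, n_r=64, n_theta=72):
    """Sketch of BSI steps 1-7; parameters are illustrative, not the paper's."""
    # 1) Fourier transform of the image.
    N = img.shape[0]
    F = np.fft.fft2(img)
    # 2) k-th bispectrum slice S_k(u) = F(u) F(ku) F*((k+1)u).
    U, V = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    S = (F * F[(k * U) % N, (k * V) % N]
           * np.conj(F[((k + 1) * U) % N, ((k + 1) * V) % N]))
    # 3) normalize and square: blur invariant phase-only slice P_k(u).
    P = (S / np.maximum(np.abs(S), 1e-12)) ** 2
    # 4) log-polar resampling R_k(r, theta) of the centred slice (bilinear).
    Pc = np.fft.fftshift(P)
    t = N / 2 - 1
    R, T = np.meshgrid(t ** (np.arange(n_r) / (n_r - 1)),
                       np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False),
                       indexing="ij")
    rows, cols = N / 2 + R * np.sin(T), N / 2 + R * np.cos(T)
    Rk = (map_coordinates(Pc.real, [rows, cols], order=1)
          + 1j * map_coordinates(Pc.imag, [rows, cols], order=1))
    # 5) Fourier transform of R_k and 6) its l-th bispectrum slice zeta_kl(v).
    Z = np.fft.fft2(Rk)
    n0, n1 = Z.shape
    U2, V2 = np.meshgrid(np.arange(n0), np.arange(n1), indexing="ij")
    zeta = (Z * Z[(l * U2) % n0, (l * V2) % n1]
              * np.conj(Z[((l + 1) * U2) % n0, ((l + 1) * V2) % n1]))
    # 7) normalize to obtain the invariants I_kl(v).
    return zeta / np.maximum(np.abs(zeta), 1e-12)

img = np.random.default_rng(2).random((32, 32))
I11 = bsi_features(img)   # feature map on the (r, theta) frequency grid
```

Each step maps directly onto the corresponding numbered step in the text; a practical implementation would also oversample the DFT and prune the redundant conjugate-symmetric half.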
In the experiments, the DFT dimensions were eight times the corresponding image dimensions, with values r_i ∈ {0, 2, ..., t} and θ_i ∈ {0, 0.25, ..., 360} degrees. The computation of the l-th bispectrum slice ζ_kl(v) of R_k(r, θ) is carried out using a complex DFT and (7). Slightly better classification results can be obtained by using invariants from multiple slices k and l in steps 2 and 6, respectively. In the experiments, we used only one slice in each step, with values k = 1 and l = 1. If the DFT size is N × N, this results in approximately N²/2 non-redundant invariants I_11(v). It seems that the invariants corresponding to the lower frequencies have a higher signal-to-noise ratio. In the experiments, we used the invariants of (11) within a radius ρ from the zero frequency, excluding the conjugate symmetric and zero frequency components.
Figure 1. Top row: images of the first four letters of the alphabet. Bottom row: motion blurred, rotated, scaled, translated and noisy versions of the same images. PSNRs are, from left to right: 50, 40, 30 and 20 dB. Noise is removed from the background using thresholding and connectivity analysis.

Figure 2. Classification accuracy [%] of the nearest neighbor classification of distorted images of the letters of the alphabet, for the BSIs and the moment invariants, as a function of PSNR [dB].

The distance between the images f(x) and g(x) to be classified is computed as

D = Σ_{v_1} Σ_{v_2} e(v),   (12)

where e(v) = |I_11^(f)(v) − I_11^(g)(v)|², I_11^(f)(v) and I_11^(g)(v) are the BSIs of (11) for the images f and g, respectively, v = [v_1, v_2], and √(v_1² + v_2²) ≤ ρ.

3. Experiments

In the experiments, we classified blurred, rotated, scaled and shifted images using the nearest neighbor classifier. We evaluated the classification accuracy of the BSIs against the complex moment invariants, which exhibit similar invariance properties. The moment invariants, although possessing invariance to the similarity transform, had not previously been evaluated with scaling present. The moment invariants were implemented as proposed in [7] using nine invariants of orders 3, 5 and 7. To ensure roughly the same range of values, the moment invariants were normalized by their standard deviations over the whole test data in each experiment. The BSIs were implemented as explained in Section 2.3. We used the values k = 1, l = 1 and ρ = 30.

In the first experiment, the target classes were formed by images of the letters of the alphabet. The test images were distorted versions of the same images. Some of the original and distorted images are shown in Figure 1. The distortions included motion blur of length 30 pixels in a random direction, rotation by {0, 15, 30, 45, 60, 75, 90} degrees, scaling by a factor of {0.8, 0.9, 1.0, 1.1, 1.2}, random translation in the range [−5, 5], and noise.
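The nearest neighbor distance (12) can be sketched as follows. For simplicity this sketch assumes the feature maps are arranged with the zero frequency at the centre (e.g. via np.fft.fftshift) and does not prune the conjugate-symmetric half.

```python
import numpy as np

def bsi_distance(I_f, I_g, rho):
    """Distance (12) between two BSI feature maps I_11, summed over the
    frequencies within radius rho of the zero frequency (DC assumed central;
    the conjugate-symmetric half is not pruned in this sketch)."""
    n0, n1 = I_f.shape
    v1, v2 = np.meshgrid(np.arange(n0) - n0 // 2,
                         np.arange(n1) - n1 // 2, indexing="ij")
    mask = (v1**2 + v2**2 <= rho**2) & ~((v1 == 0) & (v2 == 0))
    e = np.abs(I_f - I_g) ** 2        # e(v) = |I_11^(f)(v) - I_11^(g)(v)|^2
    return float(np.sum(e[mask]))
```

A query image is then assigned the class of the target whose feature map minimizes this distance.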
Because the moment invariants are known to be sensitive to noise, we removed the noise from the background by thresholding and connectivity analysis. At the same time, this distorts the image information at the border of the object. Each letter underwent all combinations of the above mentioned disturbances and was classified each time using both the BSIs and the moment invariants. The resulting percentage of correct classifications is shown in Figure 2 for five different peak signal-to-noise ratios (PSNR). As expected, both methods classify the images correctly at low noise levels. When the noise level is increased, the BSIs prove more robust than the moment invariants.

In the second experiment, the test material consisted of images of fish. The original images formed the target classes into which the distorted versions of the images were to be classified. Some original and distorted fish images are shown in Figure 3. The images underwent distortions similar to those in the first experiment, but the blur was now circular, created by a disk shaped PSF with a radius of 15 pixels. The blur was also smaller relative to the image size than in the first experiment. We performed thresholding and connectivity analysis similar to the first experiment to remove the background noise. The classification results are shown in Figure 4. Again, both methods classify all cases correctly when the noise level is low, but as the noise level increases, the performance of the BSIs remains almost unchanged while the
moment invariants break down below 30 dB.

Figure 3. Top row: four examples of the 90 fish images. Bottom row: circularly blurred, rotated, scaled, translated and noisy versions of the same images. PSNRs are, from left to right: 50, 40, 30 and 20 dB. Noise is removed from the background using thresholding and connectivity analysis. (The images are cropped from the images that were used.)

Figure 4. Classification accuracy [%] of the nearest neighbor classification of distorted images of the 90 fish, for the BSIs and the moment invariants, as a function of PSNR [dB].

In the final experiment, we used real images of four playing cards, the number seven of each suit (Seven of Spades, Hearts, Diamonds, and Clubs), which look very similar. Sharp images of these cards formed the four target classes of the classification. The cards were photographed on a black background using a Sony DFM-X710 video camera with a zoom lens. The image size was pixels. The background clutter was removed from the images using thresholding and connectivity analysis, as in the simulated experiments. At the same time, this operation causes inevitable but realistic distortion at the image borders. We photographed four motion blurred and four out of focus blurred versions of each card, rotating, zooming and translating the camera between the shots. Motion blur was generated by linearly panning the camera and out of focus blur by defocusing the lens. Altogether, this resulted in 32 degraded images to be classified into one of the four classes. Each row in Figure 5 shows a sharp image and two degraded images of one class. As can be seen, the blur is so severe that it is impossible to recognize the cards visually. Nevertheless, the BSI features classified all the distorted images into the correct classes, performing perfectly. The moment invariants classified only 47% of the images correctly. This result is in line with the results of the simulated experiments.
4. Conclusions

In this paper, we have proposed novel Fourier domain blur and similarity transform invariants called BSIs. The features are invariant to blur, or to any other convolution of the image with a centrally symmetric PSF; linear motion blur and out of focus blur are special cases. As demonstrated, the invariants can be used to recognize blurred images and objects. Because the features are based on the phase-only bispectrum, they are also robust to illumination changes and, like the bispectrum, fairly insensitive to noise. We compared the BSIs to the complex moment invariants proposed in [7] for the classification of blurred and noisy objects under the similarity transform. The results show that our BSIs perform better when the objects are noisy.

5. Acknowledgments

The authors would like to thank the Academy of Finland (project no. ), and Prof. Petrou and Dr. Kadyrov for providing the fish image database.
Figure 5. The rows contain one sharp and two blurred and transformed versions of the cards Seven of Hearts and Seven of Diamonds, respectively. Cards are photographed on a black background. Background clutter is removed using thresholding and connectivity analysis.

References

[1] M. R. Banham and A. K. Katsaggelos. Digital image restoration. IEEE Signal Processing Magazine, 14(2):24–41, Mar.
[2] Y. Bentoutou, N. Taleb, M. C. E. Mezouar, M. Taleb, and L. Jetto. An invariant approach for image registration in digital subtraction angiography. Pattern Recognition, 35(12).
[3] V. Chandran, B. Carswell, B. Boashash, and S. Elgar. Pattern recognition using invariants defined from higher order spectra: 2-D image inputs. IEEE Trans. Image Processing, 6(5).
[4] S. A. Dianat and R. M. Rao. Fast algorithms for phase and magnitude reconstruction from bispectra. Optical Engineering, 29(5).
[5] J. Flusser and T. Suk. Degraded image analysis: An invariant approach. IEEE Trans. Pattern Anal. Machine Intell., 20(6).
[6] J. Flusser, T. Suk, and S. Saic. Recognition of blurred images by the method of moments. IEEE Trans. Image Processing, 5(3).
[7] J. Flusser and B. Zitová. Combined invariants to linear filtering and rotation. Int. J. Pattern Recognition and Artificial Intelligence, 13(8).
[8] D. Kundur and D. Hatzinakos. Blind image deconvolution. IEEE Signal Processing Magazine, 13(3):43–64, May.
[9] V. Ojansivu and J. Heikkilä. Object recognition using frequency domain blur invariant features. In Proc. Scandinavian Conference on Image Analysis (SCIA 07), Aalborg, Denmark, June.
[10] A. P. Petropulu and H. Pozidis. Phase reconstruction from bispectrum slices. IEEE Trans. Signal Processing, 46(2).
[11] Q.-S. Chen, M. Defrise, and F. Deconinck. Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition. IEEE Trans. Pattern Anal. Machine Intell., 16(12).
[12] T. Suk and J. Flusser. Combined blur and affine moment invariants and their use in pattern recognition. Pattern Recognition, 26(12), Dec.
[13] M. K. Tsatsanis and G. B. Giannakis. Object and texture classification using higher order statistics. IEEE Trans. Pattern Anal. Machine Intell., 14(7).
[14] J. Wood. Invariant pattern recognition: A review. Pattern Recognition, 29(1):1–17.
[15] Y. Zhang, C. Wen, Y. Zhang, and Y. C. Soh. Determination of blur and affine combined invariants by normalization. Pattern Recognition, 35(1), Jan.
Fast and Automatic Watermark Resynchronization based on Zernike Moments Xiangui Kang a, Chunhui Liu a, Wenjun Zeng b, Jiwu Huang a, Congbai Liu a a Dept. of ECE., Sun Yat-Sen Univ. (Zhongshan Univ.), Guangzhou
More informationAdvances in Computer Vision. Prof. Bill Freeman. Image and shape descriptors. Readings: Mikolajczyk and Schmid; Belongie et al.
6.869 Advances in Computer Vision Prof. Bill Freeman March 3, 2005 Image and shape descriptors Affine invariant features Comparison of feature descriptors Shape context Readings: Mikolajczyk and Schmid;
More informationEnhanced Fourier Shape Descriptor Using Zero-Padding
Enhanced ourier Shape Descriptor Using Zero-Padding Iivari Kunttu, Leena Lepistö, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553, I-330 Tampere inland {Iivari.Kunttu,
More informationIntroduction to Computer Vision. 2D Linear Systems
Introduction to Computer Vision D Linear Systems Review: Linear Systems We define a system as a unit that converts an input function into an output function Independent variable System operator or Transfer
More informationImproved system blind identification based on second-order cyclostationary statistics: A group delay approach
SaÅdhanaÅ, Vol. 25, Part 2, April 2000, pp. 85±96. # Printed in India Improved system blind identification based on second-order cyclostationary statistics: A group delay approach P V S GIRIDHAR 1 and
More informationSuper-Resolution. Dr. Yossi Rubner. Many slides from Miki Elad - Technion
Super-Resolution Dr. Yossi Rubner yossi@rubner.co.il Many slides from Mii Elad - Technion 5/5/2007 53 images, ratio :4 Example - Video 40 images ratio :4 Example Surveillance Example Enhance Mosaics Super-Resolution
More informationLecture 8: Interest Point Detection. Saad J Bedros
#1 Lecture 8: Interest Point Detection Saad J Bedros sbedros@umn.edu Last Lecture : Edge Detection Preprocessing of image is desired to eliminate or at least minimize noise effects There is always tradeoff
More informationINTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY
[Gaurav, 2(1): Jan., 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Face Identification & Detection Using Eigenfaces Sachin.S.Gurav *1, K.R.Desai 2 *1
More informationImage enhancement. Why image enhancement? Why image enhancement? Why image enhancement? Example of artifacts caused by image encoding
13 Why image enhancement? Image enhancement Example of artifacts caused by image encoding Computer Vision, Lecture 14 Michael Felsberg Computer Vision Laboratory Department of Electrical Engineering 12
More informationFeature extraction: Corners and blobs
Feature extraction: Corners and blobs Review: Linear filtering and edge detection Name two different kinds of image noise Name a non-linear smoothing filter What advantages does median filtering have over
More informationECG782: Multidimensional Digital Signal Processing
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Filtering in the Frequency Domain http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Background
More informationVlad Estivill-Castro (2016) Robots for People --- A project for intelligent integrated systems
1 Vlad Estivill-Castro (2016) Robots for People --- A project for intelligent integrated systems V. Estivill-Castro 2 Perception Concepts Vision Chapter 4 (textbook) Sections 4.3 to 4.5 What is the course
More informationNo. of dimensions 1. No. of centers
Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......
More informationA Comparison of Multi-Frame Blind Deconvolution and Speckle Imaging Energy Spectrum Signal-to-Noise Ratios-- Journal Article (Preprint)
AFRL-RD-PS- TP-2008-1005 AFRL-RD-PS- TP-2008-1005 A Comparison of Multi-Frame Blind Deconvolution and Speckle Imaging Energy Spectrum Signal-to-Noise Ratios-- Journal Article (Preprint) Charles L. Matson
More informationRole of Assembling Invariant Moments and SVM in Fingerprint Recognition
56 Role of Assembling Invariant Moments SVM in Fingerprint Recognition 1 Supriya Wable, 2 Chaitali Laulkar 1, 2 Department of Computer Engineering, University of Pune Sinhgad College of Engineering, Pune-411
More information1. INTRODUCTION. Yen-Chung Chiu. Wen-Hsiang Tsai.
Copyright Protection by Watermarking for Color Images against Rotation and Scaling Attacks Using Coding and Synchronization of Peak Locations in DFT Domain Yen-Chung Chiu Department of Computer and Information
More information! # %& () +,,. + / 0 & ( , 3, %0203) , 3, &45 & ( /, 0203 & ( 4 ( 9
! # %& () +,,. + / 0 & ( 0111 0 2 0+, 3, %0203) 0111 0 2 0+, 3, &45 & ( 6 7 2. 2 0111 48 5488 /, 0203 & ( 4 ( 9 : BLIND IMAGE DECONVOLUTION USING THE SYLVESTER RESULTANT MATRIX Nora Alkhaldi and Joab Winkler
More informationEvaluation of OCR Algorithms for Images with Different Spatial Resolutions and Noises
Evaluation of OCR Algorithms for Images with Different Spatial Resolutions and Noises by Qing Chen A thesis submitted to the School of Graduate Studies and Research in partial fulfillment of the requirements
More informationSampling in 1D ( ) Continuous time signal f(t) Discrete time signal. f(t) comb
Sampling in 2D 1 Sampling in 1D Continuous time signal f(t) Discrete time signal t ( ) f [ k] = f( kt ) = f( t) δ t kt s k s f(t) comb k 2 Nyquist theorem (1D) At least 2 sample/period are needed to represent
More informationWhat is Image Deblurring?
What is Image Deblurring? When we use a camera, we want the recorded image to be a faithful representation of the scene that we see but every image is more or less blurry, depending on the circumstances.
More informationUsing Entropy and 2-D Correlation Coefficient as Measuring Indices for Impulsive Noise Reduction Techniques
Using Entropy and 2-D Correlation Coefficient as Measuring Indices for Impulsive Noise Reduction Techniques Zayed M. Ramadan Department of Electronics and Communications Engineering, Faculty of Engineering,
More informationShape Classification Using Zernike Moments
Shape Classification Using Zernike Moments Michael Vorobyov icamp at University of California Irvine August 5, 20 Abstract Zernike moments have mathematical properties, make them ideal image features to
More informationRESTORATION OF VIDEO BY REMOVING RAIN
RESTORATION OF VIDEO BY REMOVING RAIN Sajitha Krishnan 1 and D.Venkataraman 1 1 Computer Vision and Image Processing, Department of Computer Science, Amrita Vishwa Vidyapeetham University, Coimbatore,
More informationSimultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors
Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Sean Borman and Robert L. Stevenson Department of Electrical Engineering, University of Notre Dame Notre Dame,
More informationFeature Vector Similarity Based on Local Structure
Feature Vector Similarity Based on Local Structure Evgeniya Balmachnova, Luc Florack, and Bart ter Haar Romeny Eindhoven University of Technology, P.O. Box 53, 5600 MB Eindhoven, The Netherlands {E.Balmachnova,L.M.J.Florack,B.M.terHaarRomeny}@tue.nl
More informationTensor Method for Constructing 3D Moment Invariants
Tensor Method for Constructing 3D Moment Invariants Tomáš Suk and Jan Flusser Institute of Information Theory and Automation of the ASCR {suk,flusser}@utia.cas.cz Abstract. A generalization from 2D to
More informationIMAGE ENHANCEMENT II (CONVOLUTION)
MOTIVATION Recorded images often exhibit problems such as: blurry noisy Image enhancement aims to improve visual quality Cosmetic processing Usually empirical techniques, with ad hoc parameters ( whatever
More informationAN INVESTIGATION ON EFFICIENT FEATURE EXTRACTION APPROACHES FOR ARABIC LETTER RECOGNITION
AN INVESTIGATION ON EFFICIENT FEATURE EXTRACTION APPROACHES FOR ARABIC LETTER RECOGNITION H. Aboaisha 1, Zhijie Xu 1, I. El-Feghi 2 z.xu@hud.ac.uk Idrisel@ee.edu.ly 1 School of Computing and Engineering,
More informationFARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS
FARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS Fataneh Alavipour 1 and Ali Broumandnia 2 1 Department of Electrical, Computer and Biomedical Engineering, Qazvin branch, Islamic
More informationSpatial Transformation
Spatial Transformation Presented by Liqun Chen June 30, 2017 1 Overview 2 Spatial Transformer Networks 3 STN experiments 4 Recurrent Models of Visual Attention (RAM) 5 Recurrent Models of Visual Attention
More informationINTEREST POINTS AT DIFFERENT SCALES
INTEREST POINTS AT DIFFERENT SCALES Thank you for the slides. They come mostly from the following sources. Dan Huttenlocher Cornell U David Lowe U. of British Columbia Martial Hebert CMU Intuitively, junctions
More informationMichal Kuneš
A (Zernike) moment-based nonlocal-means algorithm for image denoising Michal Kuneš xkunes@utia.cas.cz ZOI UTIA, ASCR, Friday seminar 13.03.015 Introduction - Uses nonlocal (NL) means filter - Introduce
More information3-D Projective Moment Invariants
Journal of Information & Computational Science 4: * (2007) 1 Available at http://www.joics.com 3-D Projective Moment Invariants Dong Xu a,b,c,d,, Hua Li a,b,c a Key Laboratory of Intelligent Information
More informationIEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 30, NO. 1, FEBRUARY I = N
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 30, NO. 1, FEBRUARY 2000 125 Translation, Rotation, and Scale-Invariant Object Recognition L. A. Torres-Méndez,
More informationMultiscale Image Transforms
Multiscale Image Transforms Goal: Develop filter-based representations to decompose images into component parts, to extract features/structures of interest, and to attenuate noise. Motivation: extract
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationFeature Extraction Using Zernike Moments
Feature Extraction Using Zernike Moments P. Bhaskara Rao Department of Electronics and Communication Engineering ST.Peter's Engineeing college,hyderabad,andhra Pradesh,India D.Vara Prasad Department of
More informationHigh-Resolution. Transmission. Electron Microscopy
Part 4 High-Resolution Transmission Electron Microscopy 186 Significance high-resolution transmission electron microscopy (HRTEM): resolve object details smaller than 1nm (10 9 m) image the interior of
More informationarxiv: v1 [cs.cv] 10 Feb 2016
GABOR WAVELETS IN IMAGE PROCESSING David Bařina Doctoral Degree Programme (2), FIT BUT E-mail: xbarin2@stud.fit.vutbr.cz Supervised by: Pavel Zemčík E-mail: zemcik@fit.vutbr.cz arxiv:162.338v1 [cs.cv]
More informationDESIGN OF MULTI-DIMENSIONAL DERIVATIVE FILTERS. Eero P. Simoncelli
Published in: First IEEE Int l Conf on Image Processing, Austin Texas, vol I, pages 790--793, November 1994. DESIGN OF MULTI-DIMENSIONAL DERIVATIVE FILTERS Eero P. Simoncelli GRASP Laboratory, Room 335C
More informationBananagrams Tile Extraction and Letter Recognition for Rapid Word Suggestion
Bananagrams Tile Extraction and Letter Recognition for Rapid Word Suggestion Steven Leung, Steffi Perkins, and Colleen Rhoades Department of Bioengineering Stanford University Stanford, CA Abstract Bananagrams
More informationToday. MIT 2.71/2.710 Optics 11/10/04 wk10-b-1
Today Review of spatial filtering with coherent illumination Derivation of the lens law using wave optics Point-spread function of a system with incoherent illumination The Modulation Transfer Function
More informationNotes on Regularization and Robust Estimation Psych 267/CS 348D/EE 365 Prof. David J. Heeger September 15, 1998
Notes on Regularization and Robust Estimation Psych 67/CS 348D/EE 365 Prof. David J. Heeger September 5, 998 Regularization. Regularization is a class of techniques that have been widely used to solve
More informationConvolution Spatial Aliasing Frequency domain filtering fundamentals Applications Image smoothing Image sharpening
Frequency Domain Filtering Correspondence between Spatial and Frequency Filtering Fourier Transform Brief Introduction Sampling Theory 2 D Discrete Fourier Transform Convolution Spatial Aliasing Frequency
More information1. Abstract. 2. Introduction/Problem Statement
Advances in polarimetric deconvolution Capt. Kurtis G. Engelson Air Force Institute of Technology, Student Dr. Stephen C. Cain Air Force Institute of Technology, Professor 1. Abstract One of the realities
More informationOn New Radon-Based Translation, Rotation, and Scaling Invariant Transform for Face Recognition
On New Radon-Based Translation, Rotation, and Scaling Invariant Transform for Face Recognition Tomasz Arodź 1,2 1 Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059 Kraków, Poland 2 Academic
More informationSupplementary Figure 1: Example non-overlapping, binary probe functions P1 (~q) and P2 (~q), that add to form a top hat function A(~q).
Supplementary Figures P(q) A(q) + Function Value P(q) qmax = Supplementary Figure : Example non-overlapping, binary probe functions P (~q) and P (~q), that add to form a top hat function A(~q). qprobe
More informationImage Processing 1 (IP1) Bildverarbeitung 1
MIN-Fakultät Fachbereich Informatik Arbeitsbereich SAV/BV (KOGS) Image Processing 1 (IP1) Bildverarbeitung 1 Lecture 7 Spectral Image Processing and Convolution Winter Semester 2014/15 Slides: Prof. Bernd
More informationHalf-Pel Accurate Motion-Compensated Orthogonal Video Transforms
Flierl and Girod: Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms, IEEE DCC, Mar. 007. Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Markus Flierl and Bernd Girod Max
More informationImage Degradation Model (Linear/Additive)
Image Degradation Model (Linear/Additive),,,,,,,, g x y h x y f x y x y G uv H uv F uv N uv 1 Source of noise Image acquisition (digitization) Image transmission Spatial properties of noise Statistical
More informationOriginal Article Design Approach for Content-Based Image Retrieval Using Gabor-Zernike Features
International Archive of Applied Sciences and Technology Volume 3 [2] June 2012: 42-46 ISSN: 0976-4828 Society of Education, India Website: www.soeagra.com/iaast/iaast.htm Original Article Design Approach
More informationFourier Series. Combination of Waves: Any PERIODIC function f(t) can be written: How to calculate the coefficients?
Fourier Series Combination of Waves: Any PERIODIC function f(t) can be written: How to calculate the coefficients? What is the Fourier Transform It is the representation of an arbitrary NON-periodic signal
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationImage preprocessing in spatial domain
Image preprocessing in spatial domain Sampling theorem, aliasing, interpolation, geometrical transformations Revision: 1.4, dated: May 25, 2006 Tomáš Svoboda Czech Technical University, Faculty of Electrical
More informationCS 3710: Visual Recognition Describing Images with Features. Adriana Kovashka Department of Computer Science January 8, 2015
CS 3710: Visual Recognition Describing Images with Features Adriana Kovashka Department of Computer Science January 8, 2015 Plan for Today Presentation assignments + schedule changes Image filtering Feature
More informationNonparametric Rotational Motion Compensation Technique for High-Resolution ISAR Imaging via Golden Section Search
Progress In Electromagnetics Research M, Vol. 36, 67 76, 14 Nonparametric Rotational Motion Compensation Technique for High-Resolution ISAR Imaging via Golden Section Search Yang Liu *, Jiangwei Zou, Shiyou
More information