SIMPLE GABOR FEATURE SPACE FOR INVARIANT OBJECT RECOGNITION

Size: px

Start display at page:

Download "SIMPLE GABOR FEATURE SPACE FOR INVARIANT OBJECT RECOGNITION"

Amie Adams
6 years ago
Views:

1 SIMPLE GABOR FEATURE SPACE FOR INVARIANT OBJECT RECOGNITION Ville Kyrki Joni-Kristian Kamarainen Laboratory of Information Processing, Department of Information Technology, Lappeenranta University of Technology, P.O. Box 0, FIN Lappeenranta, Finland Abstract Invariant object recognition is one of the most challenging problems in computer vision. Authors propose a simple Gabor feature space, which has been successfully applied to facial feature extraction in invariant face detection in demanding environments (Kamarainen et al., 00c; Hamouz et al., 003). In the proposed feature space, illumination, rotation, scale, and translation invariant recognition of objects can be established within a reasonable amount of computation. In this study, the fundamental properties of Gabor features, construction of the simple feature space, and invariant search operations in the feature space are discussed in more detail. Key words: Gabor filtering, invariant object recognition 1 Introduction Invariant object recognition has been one of the most important topics in the computer vision research for decades. Often the research has focused on invariant features. They make invariant recognition possible in a straightforward manner but also present some pitfalls as an invariant feature is always a generalization and may lack some useful information. In addition, a single feature often does not allow accurate recognition and the relationships between several features need to be examined. It seems that the requirements of a truly invariant system cannot be exclusively met with a single invariant feature, Corresponding author. Tel.: ; fax: addresses: Ville.Kyrki@lut.fi (Ville Kyrki), Joni.Kamarainen@lut.fi (Joni-Kristian Kamarainen).

2 such as the moment invariants; at least not within a reasonable computing complexity. Since the study of a general image processing operator by Granlund (1978), Gabor filters have been used in the feature extraction from images. With properly chosen filter parameters, the Gabor filters have similar characteristics to multiresolution techniques, such as the Gabor expansion and wavelets (e.g. Bastiaans, 1980; Daubechies, 1990; Lee, 1996). Thus, it is reasonable to assume that similar properties can be established also for the Gabor filters. The authors have recently proposed a novel framework utilizing Gabor filter based features in rotation, scale, and translation invariant recognition of objects (Kamarainen et al., 00c; Hamouz et al., 003). The framework is based on a feature space constructed from the Gabor filter responses and invariant search operations established in the feature space. In this study the research results are concluded by showing the most important properties of the Gabor filters for invariance and introducing the feature space in more detail. The feature space assumes that the objects can be distinguished by features at only one spatial location. Despite this restriction, the method performs outstandingly well in the applications, and on the present knowledge, there is no evidence that the method could not be extended to handle several simultaneous locations. Feature Extraction with Gabor Filters There are two basic approaches to use Gabor functions in the feature extraction: 1) Gabor expansion and ) Gabor filtering. In the expansion, the Gabor functions represent distinct, not necessarily orthogonal, pieces of image information. A distinct advantage of the Gabor functions is their optimality in the time frequency, or space spatial-frequency in two dimensions, plane, providing the smallest possible pieces of information about time-frequency events (Gabor, 196). Any well-behaving function can be represented as a linear combination of the Gabor functions (Bastiaans, 1980; Gabor, 196). However, the Gabor expansion requires a computation of biorthogonal analysis functions, which may turn to be a very time consuming task. Thus, it has been more common to use Gabor functions as analysis filters in the image processing. Lades et al. (1993) use Gabor filter responses at single spatial location but do not take advantage of the invariance properties presented here. The use of Gabor filters is also motivated by the physiology of the mammalian visual system (Daugman, 1985). Nevertheless, image analysis with the biorthogonals can be thought as a dual to the analysis with the Gabor filters, and thus, also the Gabor filters extract space spatial-frequency events from images. The Gabor s original 1-D function of time and frequency has been generalized

3 to a -D function of space and spatial-frequency and several forms have been proposed (e.g. Granlund, 1978; Daugman, 1985). The authors have defined the following form of a normalized -D Gabor filter function in the continuous spatial domain (Kyrki, 00) ψ(x, y; f, θ) = f ( f γ x + f πγη e x = x cos θ + y sin θ, y = x sin θ + y cos θ, η y )e jπfx, where f is the frequency of a sinusoidal plane wave, θ is the anti-clockwise rotation of the Gaussian envelope and the sinusoid, γ is the spatial width of the filter along the plane wave, and η the spatial width perpendicular to the wave. The form in Eq. (1) is centered to the origin and a filter response for an image function ξ(x, y) can be calculated at any location (x, y) with the convolution (1) r ξ (x, y; f, θ) = ψ(x, y; f, θ) ξ(x, y) = ψ(x x τ, y y τ ; f, θ)ξ(x τ, y τ )dx τ dy τ. () The response of the Gabor filter in Eq. () is the low-level Gabor feature used in this study. Next it is shown that the Gabor features can be used to extract exact information of pose and scale changes when the geometric manipulation of an input image ξ(x, y) is a function of the Gabor filter parameters; (x, y) for translation, θ for rotation, and f for scale. In a feature space constructed from the Gabor filter responses, invariant search operations can be established based on the rotation, scale, and translation properties of the Gabor features, which are considered next. 3 Invariance properties of Gabor features The invariance theorems in this section are not complicated or completely novel, but will be briefly revisited as the literature of Gabor filtering research seems to lack them. Similar theorems generally hold for any function, which maintains the same shape in all translations, scales, and orientations; probably the most popular ones being the wavelets over scales and translations (e.g. Daubechies, 1990). The Gabor features provide properties similar to the multiresolution analysis, but which behave smoothly (Kamarainen et al., 00b), which is difficult to achieve with wavelets. In this sense the Gabor filters approximate the joint shiftability conditions better than the orthogonal wavelets (Simoncelli et al., 199). The shiftability concept becomes important since the 3

4 following theorems are given in the continuous domain, where for example the translation invariance is trivial, but in the discrete case the sampling of the parameters destroys the invariance of non-shiftable functions and transforms. 3.1 Rotation For an image ξ 1 (x, y), the Gabor feature, the response of the normalized Gabor filter in Eq. (), at location (x 0, y 0 ) is r ξ1 (x 0, y 0 ; f, θ) = ψ(x 0 x τ, y 0 y τ ; f, θ)ξ 1 (x τ, y τ )dx τ dy τ. (3) Next, a rotated version ξ (x, y) of an image ξ 1 (x, y) is considered, where the image ξ 1 (x, y) is rotated anti-clockwise around the spatial location (x 0, y 0 ) by the angle φ as ξ (x, y) = ξ 1 (ˆx, ŷ) ˆx = (x x 0 ) cos φ + (y y 0 ) sin φ + x 0 ŷ = (x x 0 ) sin φ + (y y 0 ) cos φ + y 0. () By substituting the rotated image in Eq. () into the convolution formula in Eq. (3) and by integrating over integration axes (x τ, y τ) rotated around the same point (x 0, y 0 ) by an angle φ it is straightforward to show that r ξ (x 0, y 0 ; f, θ) = ψ(x 0 x τ, y 0 y τ; f, θ φ)ξ 1 (x τ, y τ)dx τdy τ = r ξ1 (x 0, y 0 ; f, θ φ), (5) which is due to the rotation property of the Gabor features. The rotation property appearing as the equivalence in Eq. (5) proves that the response of the Gabor filter for a rotated image is equal to the response of the correspondingly rotated filter for the original image. 3. Scaling Next, an image which is homogeneously scaled version of the image ξ 1 (x, y) is considered. The new image is scaled by a factor a as ξ 3 = ξ 1 (ax, ay). A response for the scaled image is r ξ3 (x, y; f, θ) = ψ(x x τ, y y τ ; f, θ)ξ 3 (x τ, y τ )dx τ dy τ = ξ 3 (x x τ, y y τ )ψ(x τ, y τ ; f, θ)dx τ dy τ (6) = ξ 1 (ax ax τ, ay ay τ )ψ(x τ, y τ ; f, θ)dx τ dy τ.

5 By making substitutions ˆx τ = ax τ dx τ = dˆx τ a, ŷ τ = ay τ dy τ = dŷ τ a, (7) and by respectively re-ordering the variables in Eq. (1), Eq. (6) can be rewritten as r ξ3 (x, y; f, θ) = ξ 1 (ax ˆx τ, ay ŷ τ )ψ(ˆx τ, ŷ τ ; f a, θ)dˆx τdŷ τ = r ξ1 (ax, ay; f (8) a, θ). The scale property of the Gabor features, shown by the equality in Eq. (8), proves the response of the Gabor filter for a scaled image to be equal to the response of the correspondingly scaled Gabor filter at the same relative location of the original image. The scale property is fundamental for the wavelets (Daubechies, 1990) and holds also for the normalized Gabor filters in Eq. (1) (Porat and Zeevi, 1988). 3.3 Translation In the case of the centered filters, Eq. (1), the translation invariant search is trivial as the features in Eq. () can be calculated at any location (x, y). For a translated image ξ = ξ 1 (x x 1, x y 1 ) (9) the response at (x, y) is r ξ (x, y; f, θ) = ξ (x x τ, y y τ )ψ(x τ, y τ ; f, θ)dx τ dy τ = ξ 1 (x x 1 x τ, y y 1 y τ )ψ(x τ, y τ ; f, θ)dx τ dy τ = r ξ1 (x x 1, y y 1 ; f, θ), (10) which implies the translation property. 3. Illumination Uniform illumination change can be modeled as a multiplication by a constant. Let ξ 5 (x, y) = c ξ 1 (x, y). By the linearity of the convolution, it can be written r ξ5 (x, y; f, θ) = c r ξ1 (x, y; f, θ). (11) 5

6 Simple Gabor feature space According to the properties defined in the previous section it can be stated for an image ξ 6, which equals ξ 1 rotated by φ, scaled by a and intensity multiplied by c, that r ξ6 (x 0, y 0 ; f, θ) = c r ξ1 (ax 0, ay 0 ; f, θ φ). (1) a In general, an equality similar to Eq. (1) can be obtained for all functions, which retain the same shape regardless of the translation, scale or orientation parameters. Yet the result is not easily obtained by e.g. orthogonal wavelets but it can be easier to inherit the wavelet properties to the Gabor functions (Lee, 1996). So far only features at a single location (x, y) have been considered, but even then the Gabor filters capture information from a remarkable large area, and thus, by combining the responses of several filters in different orientations and frequencies, surprisingly complex objects can be represented (e.g. Krüger and Sommer, 00). For more complicated object structures and objects where the discriminative information at one location is not sufficient, the features must be combined from several spatial locations (Lades et al., 1993; Krüger and Sommer, 00). In that case the invariant properties are more difficult to use, yet still possible. Since the presented method is restricted to a single spatial location, it is referred as simple, but still being surprisingly efficient (Kamarainen et al., 00c; Hamouz et al., 003)..1 Sampling filter parameters It is clear that a filter bank, consisting of several filters, needs to be used, as the relationships between responses of filters provide the basis for distinguishing objects. Next, it is considered how the filter parameters should be chosen for the bank of filters. The selection of discrete rotation angles θ k has already been demonstrated by Kyrki et al. (001) and Park and Yang (001), where it was shown how the orientations must be spaced uniformly, that is, θ k = kπ n k = {0,..., n 1}, (13) where θ k is the kth orientation and n is the number of orientations to be used. However, the computation can be reduced to half since responses on angles [π, π[ are 90 phase shifted from responses on [0, π[ in a case of a real valued input. 6

7 In the selection of discrete frequencies f k, exponential sampling must be used (e.g. Kamarainen et al., 00c; Daugman, 1988), that is, f k = a k f max k = {0,..., m 1}, (1) where f k is the kth frequency, f 0 = f max is the highest frequency desired, and a is the frequency scaling factor (a > 1). Useful values for a include a = for octave spacing and a = for half-octave spacing.. Feature matrix Now, using the features in Eq. () and the parameter selection schemes in Eq. (13) and Eq. (1) to cover frequencies of interest f 0,..., f m 1 and the orientations for desired angular discrimination, one can construct a feature matrix G at an image location (x 0, y 0 ) G = r(x 0, y 0 ; f 0, θ 0 ) r(x 0, y 0 ; f 0, θ 1 ) r(x 0, y 0 ; f 0, θ n 1 ) r(x 0, y 0 ; f 1, θ 0 ) r(x 0, y 0 ; f 1, θ 1 ) r(x 0, y 0 ; f 1, θ n 1 ) r(x 0, y 0 ; f m 1, θ 0 ) r(x 0, y 0 ; f m 1, θ 1 ) r(x 0, y 0 ; f m 1, θ n 1 ). (15) For convenience, typical matrix indexing is used in G, i.e., G(1, 1) = g 1,1 = r(x 0, y 0 ; f 0, θ 0 ) and G(m, n) = g m,n = r(x 0, y 0 ; f m 1, θ n 1 ). For illumination invariance, the response matrix can be normalized as G = G i,j g i,j. (16) The feature matrix in Eq. (15) or Eq. (16) can be used as an input feature for any classifier, e.g., as used in face evidence detection in Kamarainen et al. (00c). If features are extracted using the Gabor feature matrices from objects in a standard pose, it is possible to introduce matrix manipulations that allow invariant search. First, a column-wise circular shift of the feature matrix can be defined as ( ) G (θ+k) = G(1 : m, k : n) G(1 : m, 1 : k 1), (17) where G(i : j, u : v) represents the sub-matrix of G containing rows i... j and columns u... v. It should be noted that if the responses are calculated only for half orientation space, e.g., [0, π[, the phase wrapping must be taken 7

8 into consideration in the column-wise shift. Similarly a row-wise shift for scale manipulation can be defined G (f+k) = ( ) G(k + 1 : m, 1 : n) G(m + 1 : m + k, 1 : n). (18) The column-wise circular shift in Eq. (17) corresponds to search over all rotation angles, and the row-wise shift in Eq. (18) corresponds to search over all down-scales. Note that the row-wise shift is not circular but the highest frequencies (f 0,..., f k 1 ) vanish and new lower frequencies (f m+1,..., f m+k ) are mapped to the Gabor feature matrix as replacements. The shifts for the feature matrix manipulation are results from the rotation and translation properties of the Gabor filter. To provide a smooth behavior of the features, the sharpness parameters γ and η must be adjusted to have a sufficient overlap of the Gabor filters in the feature space (Daugman, 1988; Lee, 1996). As a conclusion it can be said that a very versatile recognition is possible, as rotation, scaling and illumination invariance can be achieved. 5 Experimental results The simple Gabor feature space introduced in this study has already been used in several successful applications, e.g., in rotation invariant and noise robust recognition of electric components by Kamarainen et al. (00a), and illumination, scale, translation, and orientation invariant detection of facial evidences by Kamarainen et al. (00c) and Hamouz et al. (003). In this study, the main contributions are the formulation of the response matrix, representing the simple Gabor feature space, and the simple row-wise and column-wise shifts of the matrix representing the orientation and scale manipulation of an image. The response matrix and the shift operations are demonstrated in Fig. 1, which shows the application in the invariant detection of facial evidences. Rotation of an image induces a circular column-wise shift of the response matrix, Fig. 1(b), and scaling induces a row-wise shift, Fig. 1(c). 6 Discussion Motivated by the excellent results in applications, the authors have introduced the simple Gabor feature space and its theoretical framework in this study. The proposed feature space is especially useful in the scale, rotation, 8

1 50 100 150 00 50 300 350 00 scale 1 3 11 10 9 8 7 6 5 50 500 3 550 1 1 3 orientation 100 00 300 00 500 600 700 (a) original image 1 50 100 150 00 50 300 350 00 scale 1 3 11 10 9 8 7 6 5 50 500 3

orientation 0 (c) rotated and scaled by 0.5 Fig. 1.

9 scale orientation (a) original image scale () (3) 3 () (1) orientation (b) rotated anti-clockwise by () (3) scale 3 () (5) () (3) 3 () (1) orientation 0 (c) rotated and scaled by 0.5 Fig. 1. Original image and its rotated and scaled versions, and corresponding feature matrices computed at the centroid of the right eye (f k = 1/1, 1/8, 1/56, 1/11, γ = 1, η = 1, θ k = 0, 5, 90, 135 ). For rotation note that the image origin is in the upper left corner. and translation invariant recognition of objects and furthermore it provides robustness to noise and illumination changes. The feature space is considered simple since it occupies only one spatial location and thus is aimed to be 9

10 used with sufficiently distinguishable objects. Sometimes the presence of the objects is too general to be distinguished in the simple Gabor feature space, but even then it may be possible to distinguish salient sub-parts, which can be used as input to a more general classification systems. This approach has already been successfully utilized in the face detection presented by Hamouz et al. (003). In this study, the motivation was to inspect the feature space itself and to present the theory behind the methods, including the invariant properties in Eqs. (5), (6) and (10), the construction of the feature matrix at one location in Eq. (15), and the rotation and scale invariant search operations by formulas in Eqs. (17) and (18). The proposed Gabor features lie somewhere between the shiftable operators (Simoncelli et al., 199) and the wavelets (Daubechies, 1990). In the future it would be intriguing to study what kind of advantages can be achieved with the features, which approximate the shiftability of the steerable functions and the orthogonality of the separable wavelets. References Bastiaans, M. J., Gabor s expansion of a signal into Gaussian elementary signals. Proceedings of the IEEE 68 (). Daubechies, I., The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory 36 (5), Daugman, J. G., July Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A (7), Daugman, J. G., Complete discrete -d Gabor transforms by neural networks for image analysis and compression. IEEE Transactions on Acoustics, Speech, and Signal Processing 36 (7), Gabor, D., 196. Theory of communications. Journal of Institution of Electrical Engineers 93, Granlund, G. H., In search of a general picture processing operator. Computer Graphics and Image Processing 8, Hamouz, M., Kittler, J., Kamarainen, J.-K., Kälviäinen, H., 003. Hypothesesdriven affine invariant localization of faces in verification systems. In: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication (to appear). Kamarainen, J.-K., Kyrki, V., Kälviäinen, H., 00a. Fundamental frequency Gabor filters for object recognition. In: Proceedings of the 16th International Conference on Pattern Recognition. Vol. 1. Quebec, Canada, pp Kamarainen, J.-K., Kyrki, V., Kälviäinen, H., 00b. Robustness of Gabor 10

11 feature parameter selection. In: Proceedings of the IAPR Workshop on Machine Vision Applications. pp Kamarainen, J.-K., Kyrki, V., Kälviäinen, H., Hamouz, M., Kittler, J., 00c. Invariant Gabor features for evidence extraction. In: Proceedings of the IAPR Workshop on Machine Vision Applications. pp Krüger, V., Sommer, G., 00. Gabor wavelet networks for efficient head pose estimation. Image and Vision Computing 0 (9-10), Kyrki, V., 00. Local and global feature extraction for invariant object recognition. Ph.D. thesis, Lappeenranta University of Technology. Kyrki, V., Kamarainen, J.-K., Kälviäinen, H., 001. Content-based image matching using Gabor filtering. In: Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems Theory and Applications. Baden-Baden, Germany, pp Lades, M., Vorbrüggen, J., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R., Konen, W., Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers (3), Lee, T. S., Image representation using D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (10), Park, H. J., Yang, H. S., 001. Invariant object detection based on evidence accumulation and gabor features. Pattern Recognition Letters, Porat, M., Zeevi, Y. Y., The generalized Gabor scheme of image representation in biological and machine vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (), Simoncelli, E. P., Freeman, W. T., Adelson, E. H., Heeger, D. J., Mar Shiftable multiscale transforms. IEEE Transactions on Information Theory 38 (),

INVARIANCE of features has been an active area in object

INVARIANCE of features has been an active area in object IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. Y, MONTH Invariance Properties of Gabor Filter Based Features Overview and Applications Joni-Kristian Kamarainen, Ville Kyrki, Member, IEEE, Heikki Kälviäinen,