Object Recognition Using Local Characterisation and Zernike Moments


Object Recognition Using Local Characterisation and Zernike Moments

A. Choksuriwong, H. Laurent, C. Rosenberger, and C. Maaoui
Laboratoire Vision et Robotique - UPRES EA 2078, ENSI de Bourges - Université d'Orléans, 10 boulevard Lahitolle, 18020 Bourges Cedex, France
anant.choksuriwong@ensi-bourges.fr
http://www.bourges.univ-orleans.fr/rech/lvr/

Abstract. Although many invariant object descriptors have been proposed in the literature, putting them into practice to obtain a system that is robust to several perturbations is still an open problem. Comparative studies of the most commonly used descriptors have highlighted the invariance of Zernike moments under simple geometric transformations and their ability to discriminate objects. These moments can nevertheless prove insufficiently robust to perturbations such as partial occlusion of the object or the presence of a complex background. In order to improve performance, we propose in this article to combine Zernike descriptors with a local approach based on the detection of interest points in the image. We present the Zernike invariant moments, the Harris keypoint detector and the support vector machine. Experimental results comparing the local approach with the global one are given in the last part of this article.

1 Introduction

A fundamental stage of scene interpretation is the development of tools able to consistently describe objects appearing at different scales or orientations in images. Such processes, developed for pattern recognition applications like robot navigation, should make it possible to identify known objects in a scene, permitting robots to be teleoperated with high-level orders such as "move towards the chair". Many works have been devoted to the definition of object descriptors that are invariant under simple geometric transformations [1], [2]. However, this invariance is not the only desired property.
A suitable descriptor should indeed make it possible to recognize objects that appear truncated in the image, with a different color or luminance, or on a complex background (with noise or texture). Among the available invariant descriptors, the Zernike moments [3], [4] were developed to overcome the major drawbacks of regular geometric moments regarding noise effects and image quantization error. Based on a complete and orthonormal set of polynomials defined on the unit circle, these moments achieve a near-zero redundancy measure. In [5], a comparative study shows the relative efficiency of Zernike moments compared with other invariant descriptors such as the Fourier-Mellin descriptors or the Hu moments. Nevertheless, Zernike moments can fail when objects appear partially hidden in the image or when a complex background is present. In order to improve the performances of the method, we propose to combine the Zernike moments with the keypoint detector proposed by Harris [6]. The Zernike moments are then calculated in a neighborhood of each detected keypoint, a computation that is more robust to partial occlusion of the object or to its appearance in a complex scene.

In the first part of this article, the Zernike moments and the Harris keypoint detector are briefly presented. The method used for the training and recognition steps, based on a support vector machine [7], is also described. Experimental results, computed on different objects of the COIL-100 database [8], are then presented, permitting a comparison of the performances of the global and local approaches. Finally, some conclusions and perspectives are given.

J. Blanc-Talon et al. (Eds.): ACIVS 2005, LNCS 3708, pp. 108-115, 2005. © Springer-Verlag Berlin Heidelberg 2005

2 Developed Method

2.1 Zernike Moments

Zernike moments [3], [4] belong to the algebraic class, for which the features are directly computed on the image. These moments use a set of Zernike polynomials that is complete and orthonormal in the interior of the unit circle. The Zernike moment of order m with repetition n is given by:

\[ A_{mn} = \frac{m+1}{\pi} \sum_{x} \sum_{y} I(x, y)\,[V_{mn}(x, y)]^{*} \quad (1) \]

with \(x^2 + y^2 \le 1\), where I(x, y) is the grey level of pixel (x, y) of the image I over which the moment is computed, \(m \ge 0\), \(|n| \le m\) and \(m - |n|\) is even. The Zernike polynomials \(V_{mn}(x, y)\) are expressed in radial-polar form:

\[ V_{mn}(r, \theta) = R_{mn}(r)\, e^{jn\theta} \quad (2) \]

where \(R_{mn}(r)\) is the radial polynomial given by:

\[ R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} \frac{(-1)^{s}\,(m-s)!}{s!\,\left(\frac{m+|n|}{2}-s\right)!\,\left(\frac{m-|n|}{2}-s\right)!}\; r^{m-2s} \quad (3) \]

These moments yield invariance with respect to translation, scale and rotation. For this study, the Zernike moments from order 1 to 15 have been computed, which represents 72 descriptors.
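Equations (1)-(3) translate almost directly into NumPy. The sketch below is our illustrative implementation (the paper provides no code): it maps the pixel grid onto the unit disc, evaluates the radial polynomial, and stacks the rotation-invariant magnitudes |A_mn| into the 72-value descriptor.

```python
import numpy as np
from math import factorial

def radial_poly(m, n, r):
    """R_mn(r) of Eq. (3); requires |n| <= m and m - |n| even."""
    n = abs(n)
    out = np.zeros_like(r)
    for s in range((m - n) // 2 + 1):
        c = ((-1) ** s * factorial(m - s)
             / (factorial(s)
                * factorial((m + n) // 2 - s)
                * factorial((m - n) // 2 - s)))
        out = out + c * r ** (m - 2 * s)
    return out

def zernike_moment(img, m, n):
    """A_mn of Eq. (1), with the pixel grid mapped onto the unit disc."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xn = (2 * x - (w - 1)) / (w - 1)          # columns -> [-1, 1]
    yn = (2 * y - (h - 1)) / (h - 1)          # rows    -> [-1, 1]
    r = np.hypot(xn, yn)
    theta = np.arctan2(yn, xn)
    mask = r <= 1.0                           # keep x^2 + y^2 <= 1 only
    V = radial_poly(m, n, r) * np.exp(1j * n * theta)   # Eq. (2)
    return (m + 1) / np.pi * np.sum(img[mask] * np.conj(V[mask]))

def zernike_descriptor(img, max_order=15):
    """All |A_mn| with 0 <= n <= m <= max_order and m - n even: 72 values."""
    return np.array([abs(zernike_moment(img, m, n))
                     for m in range(max_order + 1)
                     for n in range(m % 2, m + 1, 2)])
```

Note that |A_mn| is invariant under image rotation; translation and scale invariance additionally require normalising the object's position and size inside the unit disc, which this sketch does not do.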

2.2 Harris Keypoint Detector

Many keypoint detectors have been proposed in the literature [9]. They are either based on a preliminary contour detection or computed directly on grey-level images. The Harris detector [6] used in this article belongs to the second class; it consequently does not depend on the prior success of a contour extraction step. This detector is based on image statistics and relies on the detection of average changes of the auto-correlation function. Figure 1 presents the interest points obtained for one object extracted from the COIL-100 database under different geometric transformations. Not all points are systematically detected; nevertheless, this example shows the good repeatability of the detector.

Fig. 1. Keypoint detection for the same object under different geometric transformations

The average number of detected keypoints is around 25 for the images used. In the local approach, the Zernike moments are computed on a neighborhood of each detected keypoint (see Figure 2).

Fig. 2. Detected keypoints and associated neighborhood
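The Harris response can be written from the structure tensor, i.e. the Gaussian-smoothed products of the image gradients. The following is a generic textbook sketch, not the authors' parameter setting; k = 0.04 and the 5×5 non-maximum-suppression window are conventional choices of ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma=1.0, k=0.04):
    """Corner response R = det(M) - k * trace(M)^2, where M is the
    Gaussian-windowed matrix of gradient products (auto-correlation)."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)                 # axis 0 = rows (y), axis 1 = cols (x)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def harris_keypoints(img, sigma=1.0, k=0.04, rel_thresh=0.1):
    """Keypoints = local maxima of the response above a relative threshold."""
    R = harris_response(img, sigma, k)
    peaks = (R == maximum_filter(R, size=5)) & (R > rel_thresh * R.max())
    return np.argwhere(peaks)                 # (row, col) coordinates

# demo: the corners of a white square give strong, positive responses
square = np.zeros((40, 40))
square[10:30, 10:30] = 1.0
print(harris_keypoints(square))
```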

2.3 Training and Recognition Method

Suppose we have a training set {x_i, y_i}, where x_i is the invariant descriptor vector described previously (x_i is composed of N_KPi × N_ZM values, N_KPi being the number of keypoints of image i and N_ZM the number of Zernike moments, which depends on the chosen order) and y_i is the object class. For two-class problems, y_i ∈ {-1, 1}, and Support Vector Machines implement the following algorithm. First, the training points {x_i} are projected into a space H (of possibly infinite dimension) by means of a function Φ(·). The goal is then to find, in this space, an optimal decision hyperplane in the sense of a criterion defined below. Note that for the same training set, different transformations Φ(·) lead to different decision functions. The transformation is achieved implicitly using a kernel K(·,·), and the decision function can consequently be defined as:

\[ f(x) = \langle w, \Phi(x) \rangle + b = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \quad (4) \]

with \(\alpha_i \in \mathbb{R}\). The values w and b are the parameters defining the linear decision hyperplane. We use a radial basis function as the kernel in the proposed system. In SVMs, the optimality criterion to maximize is the margin, that is, the distance between the hyperplane and the nearest point Φ(x_i) of the training set. The α_i optimizing this criterion are obtained by solving the following problem:

\[ \max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad (5) \]

subject to \(0 \le \alpha_i \le C\) and \(\sum_{i=1}^{l} \alpha_i y_i = 0\), where C is a penalization coefficient for data points located in or beyond the margin; it provides a compromise between their number and the width of the margin (C = 1 in this study). SVMs were originally developed for two-class problems; however, several approaches can be used to extend them to multiclass problems. The method used in this communication is called one-against-one.
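With an off-the-shelf SVM, the RBF kernel, C = 1 and one-against-one voting of Eqs. (4)-(5) can be set up as below. scikit-learn is our choice for illustration (the paper does not name an implementation), and the toy 2-D clusters stand in for the real keypoint descriptor vectors.

```python
import numpy as np
from sklearn.svm import SVC

# toy 3-class problem standing in for the 100 COIL objects;
# each row would normally be a descriptor vector (e.g. 72 Zernike magnitudes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)

# RBF kernel and C = 1 as in the paper; 'ovo' trains N(N-1)/2 pairwise classifiers
clf = SVC(kernel="rbf", C=1.0, decision_function_shape="ovo").fit(X, y)

print(clf.decision_function(X).shape)        # (60, 3): N(N-1)/2 = 3 pairwise scores
print(clf.predict([[0.1, -0.2], [2.9, 0.3]]))  # two points near clusters 0 and 1
```

For the 100 COIL objects, one-against-one therefore trains 100 × 99 / 2 = 4950 pairwise decision functions, each of which votes for the class of a new point.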
Instead of learning N decision functions, each class is here discriminated from each other class. Thus, N(N-1)/2 decision functions are learned, and each of them votes for the assignment of a new point x; the point x is then assigned to the majority class.

3 Experimental Results

The experimental results presented below correspond to a test database composed of 100 objects extracted from the Columbia Object Image Library (COIL-100) [8]. For each object of this grey-level image database, we have 72 views (128×128 pixels) presenting orientation and scale changes (see Figure 3).

Fig. 3. Three objects of the COIL-100 database presented with different orientations and scales

We first used different percentages of the image database as the learning set (namely 25%, 50% and 75%): for each object, respectively 18, 36 and 54 views were randomly chosen. The Zernike moments from order 1 to 15 (that is, 72 descriptors) were computed on an 11×11-pixel neighborhood of each detected keypoint of these images. For each experiment we report the recognition rate for the neighborhood of a keypoint. The parameter of the Harris detector was tuned to obtain about 25 keypoints per object sample. Each experiment was repeated 10 times in order to make the results independent of the random draw of the learning set. Table 1 presents the results obtained for the global and local approaches. The larger the learning set, the higher the recognition rate, and the best results are obtained with the local approach.

Table 1. Recognition rate for different training database sizes

Size of training database | 25%   | 50%   | 75%
Global approach           | 70.0% | 84.6% | 91.9%
Local approach            | 94.0% | 94.1% | 97.7%

In order to measure the influence of the neighborhood size on the recognition rate, we tested four window sizes (7×7, 11×11, 15×15 and 19×19 pixels). For this experiment the learning set consisted of 50% of the image database (36 views per object). Table 2 presents the results obtained in each case; the best recognition rate is obtained with a 15×15-pixel window.

Table 2. Influence of the neighborhood size on the recognition rate

Neighborhood size | 7×7   | 11×11 | 15×15 | 19×19
Recognition rate  | 91.2% | 94.1% | 98.6% | 97.3%

In order to evaluate the robustness of the proposed approach, we created 75 altered images for each object (see Figure 4): 10 with a uniform background, 10 with a noisy background, 10 with a textured background, 10 with an occluding black box, 10 with an occluding grey-level box, 10 with a luminance modification and 15 with added Gaussian noise (standard deviation 5, 10 and 20).

Fig. 4. Examples of alterations

We kept the same Harris parameter setting; Figure 5 presents an example of detected keypoints and their associated 11×11 neighborhoods for three alterations (textured background, occluding box and added noise).

Fig. 5. Detected keypoints and associated neighborhood for three alterations (textured background, occluding box and added noise)

Table 3 presents the robustness results for the global and local approaches with different neighborhood sizes. The whole database was used for the learning phase and we tried to recognize the altered objects.

Table 3. Robustness of the proposed approach to alterations

Alteration           | Global | Local 7×7 | Local 11×11 | Local 15×15 | Local 19×19
uniform background   | 31.8%  | 83.1%     | 85.7%       | 86.2%       | 86.1%
noisy background     | 34.9%  | 62.5%     | 63.0%       | 63.5%       | 62.6%
textured background  | 7.5%   | 54.3%     | 54.9%       | 55.1%       | 56.8%
black occlusion      | 74.7%  | 78.0%     | 78.5%       | 79.1%       | 80.2%
grey-level occlusion | 71.2%  | 79.4%     | 80.3%       | 80.9%       | 81.2%
luminance            | 95.9%  | 87.7%     | 88.35%      | 80.0%       | 89.8%
noise (σ = 5)        | 100%   | 70.5%     | 73.0%       | 73.4%       | 73.1%
noise (σ = 10)       | 100%   | 68.3%     | 69.9%       | 70.1%       | 69.4%
noise (σ = 20)       | 100%   | 62.2%     | 62.5%       | 62.9%       | 61.2%

These results show the benefit of the local approach except for added noise and luminance modification: in these cases, many spurious keypoints are extracted because of the noise, which penalizes the local approach.

4 Conclusion and Perspectives

We have presented a study on object recognition using Zernike moments computed in the neighborhood of Harris keypoints. Experimental results show the benefit of the local approach compared with the global one. We also studied the influence of the neighborhood size on object recognition: a 15×15-pixel neighborhood is, in our view, a good compromise between recognition rate and robustness to alterations. Perspectives of this study first concern the computation of the recognition rate: currently, the percentage of correctly labeled keypoints is taken into account. To improve the method, the recognition of an object could be realized by a majority vote over the image keypoints. We finally plan to apply the proposed approach to the navigation of mobile robots.
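The per-image majority vote proposed as a perspective amounts to counting the per-keypoint predictions; in the sketch below, `votes` is a hypothetical array of SVM class labels for the keypoints of one image (our illustration, not the authors' code).

```python
import numpy as np

def classify_image(keypoint_labels):
    """Assign the image to the class receiving the most keypoint votes."""
    labels, counts = np.unique(keypoint_labels, return_counts=True)
    return labels[np.argmax(counts)]

# e.g. 25 keypoints, most of them individually recognised as object 7
votes = np.array([7] * 14 + [3] * 6 + [42] * 5)
print(classify_image(votes))   # 7
```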

References

1. A. K. Jain, R. P. W. Duin and J. Mao: Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1) (2000) 4-37
2. M. Petrou and A. Kadyrov: Affine Invariant Features from the Trace Transform. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(1) (2004) 30-44
3. A. Khotanzad and Y. Hua Hong: Invariant Image Recognition by Zernike Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(5) (1990) 489-497
4. C.-W. Chong, P. Raveendran and R. Mukundan: A Comparative Analysis of Algorithms for Fast Computation of Zernike Moments. Pattern Recognition 36 (2003) 731-742
5. A. Choksuriwong, H. Laurent and B. Emile: Comparison of Invariant Descriptors for Object Recognition. To appear in Proc. of ICIP-05 (2005)
6. C. Harris and M. Stephens: A Combined Corner and Edge Detector. Alvey Vision Conference (1988) 147-151
7. C. Cortes and V. Vapnik: Support Vector Networks. Machine Learning 20 (1995) 1-25
8. COIL-100: http://www1.cs.columbia.edu/cave/research/softlib/coil-100.html
9. C. Schmid, R. Mohr and C. Bauckhage: Evaluation of Interest Point Detectors. International Journal of Computer Vision 37(2) (2000) 151-172