Scale & Affine Invariant Interest Point Detectors Krystian Mikolajczyk and Cordelia Schmid Presented by Hunter Brown & Gaurav Pandey, February 19, 2009
Roadmap: Motivation · Scale Invariant Detector · Affine Invariant Detector · Applications · Conclusion
Problem
It is easy to find good interest points. It is hard to find good interest points that are invariant to changing viewing conditions.
Related Work
Kadir & Brady [1], 2001: scale selection
Lowe [2], 1999: SIFT
Lindeberg [3], 1998: scale invariant detectors (LoG)
Lindeberg & Garding [4], 1997: blob affine features
Harris & Stephens [5], 1988: Harris corner detector
Scale Invariant Detector
Idea: scale-adapted Harris corner detector + automatic scale selection
Harris Corner Detector Derivation
Let the autocorrelation function be:
c(x, y) = Σ_W [ I(x_i, y_i) − I(x_i + Δx, y_i + Δy) ]²   (1)
Now approximate the second term with a Taylor series, using the partial derivatives I_x and I_y:
I(x_i + Δx, y_i + Δy) ≈ I(x_i, y_i) + I_x(x_i, y_i) Δx + I_y(x_i, y_i) Δy   (2)
Then substitute (2) back into (1):
c(x, y) = Σ_W [ I_x(x_i, y_i) Δx + I_y(x_i, y_i) Δy ]²   (3)
Harris Corner Derivation (cont.)
Now we have:
c(x, y) = [Δx Δy] M [Δx Δy]ᵀ,  with M = Σ_W [ I_x²  I_x I_y ; I_x I_y  I_y² ]   (4)
And finally, the scale-adapted version, smoothed with a weighted Gaussian kernel of size σ_I and scale-normalized by the factor σ_D²:
μ(x, σ_I, σ_D) = σ_D² g(σ_I) ∗ [ L_x²(x, σ_D)  L_x L_y(x, σ_D) ; L_x L_y(x, σ_D)  L_y²(x, σ_D) ]   (5)
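The second moment matrix in (4) can be illustrated in a few lines of numpy; the toy image, the 5×5 window, and the displacement vector below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.random((32, 32))  # toy image

# Image gradients via central differences (axis 0 = y, axis 1 = x).
Iy, Ix = np.gradient(I)

# Second moment matrix summed over a 5x5 window W centered at (16, 16).
win = np.s_[14:19, 14:19]
M = np.array([
    [np.sum(Ix[win] * Ix[win]), np.sum(Ix[win] * Iy[win])],
    [np.sum(Ix[win] * Iy[win]), np.sum(Iy[win] * Iy[win])],
])

# Autocorrelation approximation (4): c(delta) = delta^T M delta.
# M is a sum of outer products, hence positive semidefinite, so c >= 0.
delta = np.array([1.0, 0.5])
c = delta @ M @ delta
```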
Automatic Scale Selection
LoG: Laplacian-of-Gaussian. Smooth via a Gaussian kernel, then apply the second-order differential operator (the Laplacian).
Sneak peek: only computes one scale per pixel.
Courtesy Image Metrology A/S, Denmark. Courtesy Don Matthys, S.J., Marquette University.
Scale Invariant Algorithm
Build an image pyramid for pre-selected scales σ_n = 1.4ⁿ σ_0.
For each level, compute Harris corners.
For every detected point, evaluate the LoG across scales.
Keep points for which the LoG is a local maximum.
Courtesy Berend Engelbrecht, TheCodeProject.com
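The steps above can be sketched in Python with scipy filtering. The thresholds, the choice σ_I = 1.4·σ_D, the value α = 0.06, and the 5×5 non-maximum-suppression neighborhood are illustrative assumptions, not the paper's exact parameters:

```python
import numpy as np
from scipy import ndimage

def harris_response(img, sigma_d, sigma_i, alpha=0.06):
    """Scale-adapted Harris response: derivatives at scale sigma_d,
    Gaussian window at scale sigma_i, scale-normalized by sigma_d**2."""
    Lx = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))  # d/dx
    Ly = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))  # d/dy
    A = ndimage.gaussian_filter(sigma_d**2 * Lx * Lx, sigma_i)
    B = ndimage.gaussian_filter(sigma_d**2 * Ly * Ly, sigma_i)
    C = ndimage.gaussian_filter(sigma_d**2 * Lx * Ly, sigma_i)
    return A * B - C * C - alpha * (A + B) ** 2  # det(mu) - alpha*trace(mu)^2

def harris_laplace(img, sigma0=1.0, k=1.4, n_levels=4):
    """Harris corners per pyramid level, kept only where the
    scale-normalized LoG is maximal across the pre-selected scales."""
    sigmas = [sigma0 * k**n for n in range(n_levels)]
    logs = [s**2 * np.abs(ndimage.gaussian_laplace(img, s)) for s in sigmas]
    keypoints = []
    for n, s in enumerate(sigmas):
        R = harris_response(img, s, 1.4 * s)
        Rmax = ndimage.maximum_filter(R, size=5)  # 5x5 non-max suppression
        for y, x in zip(*np.where((R == Rmax) & (R > 0.01 * R.max()))):
            if logs[n][y, x] == max(L[y, x] for L in logs):
                keypoints.append((y, x, s))
    return keypoints
```

On a synthetic image of a bright square, the response is high at the square's corners and near zero in the flat interior, and the surviving keypoints carry a characteristic scale.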
Affine Transformation: Characteristic Shape
An affine transformation = rotation + scale in x + scale in y.
Cool math alert: the second moment matrix can be used to find the affine deformation of an isotropic (direction-invariant) structure.
(Figure slides: courtesy Silvio Savarese, EECS442 Lecture 17.)
Affine Transformation of the 2nd Moment Matrix: Goal
Second Moment Matrix
Recall the gradient: ∇L = [L_x  L_y]ᵀ
Define:
μ(x, Σ_I, Σ_D) = det(Σ_D) g(Σ_I) ∗ ( ∇L(x, Σ_D) ∇L(x, Σ_D)ᵀ )   (4)
where Σ_I and Σ_D are the integration and differentiation covariance matrices.
Adjoint Transformation
P = Dᵀ S D is an adjoint transformation [6]. Given x_R = A x_L, the second moment matrix transforms as:
μ(x_L, Σ_I,L, Σ_D,L) = Aᵀ μ(A x_L, A Σ_I,L Aᵀ, A Σ_D,L Aᵀ) A = Aᵀ μ(x_R, Σ_I,R, Σ_D,R) A
so M_L = Aᵀ M_R A. A scale-adapted 2nd moment matrix!!
Adjoint Transformation
M_L = Aᵀ M_R A,  M_R = A⁻ᵀ M_L A⁻¹,  Σ_R = A Σ_L Aᵀ   (6)
Suppose that M_L can be computed such that:
Σ_I,L = σ_I M_L⁻¹,  Σ_D,L = σ_D M_L⁻¹   (7)
Adjoint Transformation
Then we can derive the following:
Σ_I,R = σ_I M_R⁻¹,  Σ_D,R = σ_D M_R⁻¹   (8)
If we estimate Σ_R and Σ_L such that (7) and (8) hold, then (6) must hold.
Affine Transformation
Finally, define:
A = M_R^(−1/2) R M_L^(1/2)
where R is orthogonal and represents an arbitrary rotation or mirror transformation. Since M is symmetric positive definite, its square roots follow from the eigendecomposition M = Rᵀ diag(λ_1, λ_2) R.
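A quick numerical check of the relation above, using made-up positive definite matrices M_L, M_R and an arbitrary rotation angle; the normalized isotropy measure Q = λ_min/λ_max is computed the same way from the eigenvalues:

```python
import numpy as np

def mat_power(M, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w**p) @ V.T

# Toy symmetric positive definite second moment matrices (illustrative values).
M_L = np.array([[2.0, 0.4], [0.4, 1.0]])
M_R = np.array([[1.5, -0.3], [-0.3, 0.8]])

theta = 0.7  # arbitrary rotation
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

# A = M_R^(-1/2) R M_L^(1/2); substituting back gives A^T M_R A = M_L.
A = mat_power(M_R, -0.5) @ Rot @ mat_power(M_L, 0.5)

# Normalized isotropy measure: ratio of smallest to largest eigenvalue.
w = np.linalg.eigvalsh(M_L)  # ascending order
Q = w[0] / w[-1]
```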
Normalized Isotropy Measure
Q = λ_min(μ) / λ_max(μ),  with Q ∈ [0, 1]; Q = 1 for a perfectly isotropic structure.
Feature Matching
Estimated scale factor: 4.9; estimated rotation: 19°.
(Figure slides: courtesy Silvio Savarese, EECS442 Lecture 17.)
Conclusion
Scale invariance: Harris + automatic scale selection (LoG).
Affine invariance: Harris + affine adaptation.
Computational Complexity
Future Work
Stability and convergence. Invariance under occlusion. New applications.
References
1. Kadir, T. and Brady, M. 2001. Scale, saliency and image description. International Journal of Computer Vision, 45(2):83–105.
2. Lowe, D.G. 1999. Object recognition from local scale-invariant features. In Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, pp. 1150–1157.
3. Lindeberg, T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):79–116.
4. Lindeberg, T. and Garding, J. 1997. Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure. Image and Vision Computing, 15(6):415–434.
5. Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, pp. 147–152.
6. Hartley, R.I. and Zisserman, A. 2004. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN 0521540518.
Evaluation of Feature Detectors and Descriptors based on 3D Objects
By Pierre Moreels and Pietro Perona, California Institute of Technology, Pasadena, CA
Presented by Hunter Brown & Gaurav Pandey, January 20, 2009
Objective: to explore the performance of a number of popular feature detectors and descriptors in matching 3D object features across viewpoints and lighting conditions.
Motivation
Critical issues in the detection, description and matching of features are:
Robustness with respect to viewpoint and lighting changes.
The number of features detected in a typical image.
The frequency of false alarms and mismatches.
The computational cost of each step.
Different applications weigh these requirements differently. For example, object recognition, SLAM and wide-baseline stereo all demand robustness to viewpoint changes, while the frequency of false matches is more critical in object recognition, where thousands of potentially matching images are considered, than in wide-baseline stereo and mosaicing, where only a few images are present.
Previous Work
The first extensive study of feature stability as a function of the feature detector was performed by Schmid et al. (2000). The database consisted of drawings and paintings photographed from a number of viewpoints. The key point is that all scenes were planar, so the transformation between two images taken from different viewpoints was a homography. Ground truth (the homography) was computed from a grid of artificial points projected onto the paintings. Performance was measured by the repeatability rate, i.e. the percentage of locations selected as features in both images.
Previous Work (Schmid et al.): Repeatability Criterion
3D points detected in one image should also be detected at approximately corresponding positions in subsequent images:
x_i = H_1i x_1, where H_1i = P_i P_1⁻¹
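The repeatability criterion can be sketched as a simple counting procedure; the homography, point lists, and tolerance eps below are made-up illustrative values:

```python
import numpy as np

def project(H, x):
    """Apply a 3x3 homography to a 2D point (inhomogeneous in and out)."""
    p = H @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]

def repeatability(pts1, pts_i, H_1i, eps=1.5):
    """Fraction of features in image 1 re-detected within eps pixels in image i."""
    hits = 0
    for x in pts1:
        proj = project(H_1i, x)
        if any(np.linalg.norm(proj - np.asarray(y)) <= eps for y in pts_i):
            hits += 1
    return hits / len(pts1)

# Toy example: a pure-translation homography (shift by +5 in x, -2 in y).
H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]])
pts1 = [(10.0, 10.0), (20.0, 30.0)]
pts2 = [(15.0, 8.0), (40.0, 40.0)]  # only the first feature is re-detected
rep = repeatability(pts1, pts2, H)  # → 0.5
```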
Previous Work (Schmid et al.): Ground Truth / Homography Estimation
Previous Work
Mikolajczyk et al. (2004) performed a similar study of affine-invariant feature detectors. Again planar scenes were used, and the ground truth homography was computed from manually selected correspondences.
Note: they completely excluded the descriptor from their study. Under their methodology, a detector that fires at any/most points of the image would appear to produce stable features, which is misleading.
Mikolajczyk and Schmid (2005) provided a complementary study where the focus was not on the detector stage but on the descriptor.
In This Paper
Focus is on 3D objects instead of planar surfaces, which allows greater variability in viewpoint.
Focus is on both the detector and the descriptor, rather than only one of the two.
Ground truth is estimated from the epipolar constraint, because the dataset is no longer planar.
Ground Truth Estimation
Potential matches for p have to lie on the corresponding epipolar line l.
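The epipolar ground-truth test reduces to a point-to-line distance: given the fundamental matrix F, a candidate match q must lie near the line l = F p. The fundamental matrix below, corresponding to a pure horizontal translation (epipolar lines are image rows), is a made-up example:

```python
import numpy as np

def epipolar_distance(F, p, q):
    """Distance from q (image 2) to the epipolar line l = F p of p (image 1)."""
    l = F @ np.array([p[0], p[1], 1.0])          # line (a, b, c): a*u + b*v + c = 0
    return abs(l[0] * q[0] + l[1] * q[1] + l[2]) / np.hypot(l[0], l[1])

# F = [t]_x for translation t = (1, 0, 0): epipolar lines are horizontal rows.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

d_on = epipolar_distance(F, (10.0, 5.0), (30.0, 5.0))   # q on the same row
d_off = epipolar_distance(F, (10.0, 5.0), (30.0, 8.0))  # q three rows away
```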
Feature Detectors
Based on the second moment matrix: the motivation for these detectors is to select points where the image intensity has high variability in both the x and y directions.
Förstner detector (1986): selects as features the local maxima of a function of the second moment matrix.
Harris corner detector (1988): selects as features the extrema of the cornerness function.
Lucas-Tomasi-Kanade feature detector (1994): very similar to Harris, but with a greedy corner selection criterion.
Lucas-Tomasi-Kanade Corner Detector
Very similar to Harris, but with a greedy corner selection criterion. A corner is detected where the eigenvalues of the second moment matrix are large.
Put all points for which λ_1 > thresh in a list L.
Sort the list in decreasing order of λ_1.
Declare the highest-ranked pixel p in L to be a corner, then remove all points from L that are within a D×D neighborhood of p.
Continue until L is empty.
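The greedy loop above can be sketched as follows. The slide's λ_1 is taken here as the smaller eigenvalue of the second moment matrix (the Shi-Tomasi strength measure), and the smoothing scale, window size, and threshold are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def min_eig(img, sigma=1.0):
    """Smaller eigenvalue of the 2x2 second moment matrix at every pixel."""
    Lx = ndimage.gaussian_filter(img, sigma, order=(0, 1))
    Ly = ndimage.gaussian_filter(img, sigma, order=(1, 0))
    A = ndimage.uniform_filter(Lx * Lx, 3)
    B = ndimage.uniform_filter(Ly * Ly, 3)
    C = ndimage.uniform_filter(Lx * Ly, 3)
    # Closed form for the smaller eigenvalue of [[A, C], [C, B]].
    return (A + B) / 2 - np.sqrt(((A - B) / 2) ** 2 + C**2)

def greedy_corners(img, thresh, D=5):
    """Greedy selection: strongest pixel first, suppress its DxD neighborhood."""
    lam = min_eig(img)
    ys, xs = np.where(lam > thresh)
    order = np.argsort(-lam[ys, xs])  # decreasing strength
    corners, taken = [], np.zeros(lam.shape, dtype=bool)
    for i in order:
        y, x = ys[i], xs[i]
        if not taken[y, x]:
            corners.append((y, x))
            y0, x0 = max(0, y - D // 2), max(0, x - D // 2)
            taken[y0:y + D // 2 + 1, x0:x + D // 2 + 1] = True
    return corners
```

By construction, any two accepted corners are separated by more than D//2 pixels.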
Feature Detectors
Interest point detection performed at different scales:
The Difference-of-Gaussian detector (1994): selects scale-space extrema of the image filtered by a difference of Gaussians.
The Kadir-Brady detector (2004): selects locations where the local entropy has a maximum over scale and where the intensity probability density function varies fastest.
MSER features (2002): based on watershed flooding performed on the image intensities.
Scale and affine interest point detectors (2004): compute a multi-scale representation for the Harris interest point detector and then select points at which a local measure (the Laplacian) is maximal over scales.
Kadir-Brady Detector
The method consists of three steps:
1. Calculate the Shannon entropy H_D(x, s) of local image attributes at each x over a range of scales.
2. Select the scales s_p at which the entropy-over-scale function exhibits a peak.
3. Calculate the magnitude of change of the PDF as a function of scale at each peak, W_D(x, s_p).
The final saliency Y_D(x, s_p) is the product of H_D(x, s_p) and W_D(x, s_p).
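Steps 1 and 2 can be sketched as below. The square patch (rather than a circular one), the 16-bin histogram, and the [0, 1] intensity range are simplifying assumptions:

```python
import numpy as np

def local_entropy(img, y, x, scale, bins=16):
    """Shannon entropy of the intensity histogram in a (2*scale+1) square patch."""
    patch = img[max(0, y - scale):y + scale + 1, max(0, x - scale):x + scale + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins: 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

def entropy_peak_scale(img, y, x, scales):
    """Scale at which the entropy-over-scale function is largest (step 2)."""
    H = [local_entropy(img, y, x, s) for s in scales]
    return scales[int(np.argmax(H))], H
```

A flat patch has zero entropy; a highly textured patch has high entropy, which is what makes the peak over scale a saliency cue.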
MSER: Maximally Stable Extremal Regions
Threshold on intensity, form blobs, and look for the intensity range over which a watershed basin remains relatively stable.
Descriptors
The role of the descriptor is to characterize the local image appearance around the location identified by the feature detector. Some popular descriptors are:
SIFT (Lowe, 2004): computed from gradient information.
PCA-SIFT (Ke and Sukthankar, 2004): computes a primary orientation similarly to SIFT; local patches are then projected onto a lower-dimensional space using PCA.
Steerable filters (Freeman and Adelson, 1991): steer derivatives in a particular direction given the components of the local jet. E.g., steering derivatives in the direction of the gradient makes them invariant to rotation. Scale invariance is achieved by using various filter sizes.
Shape context (Belongie et al., 2002): comparable to SIFT but based on edges. Edges are extracted with the Canny filter; their locations and orientations are then quantized into histograms using log-polar coordinates.
Steerable Filters
Consider the 2D Gaussian filter G. Its first derivative in the x direction is G_x; the same function rotated by 90° is G_y. For an arbitrary angle θ this can be written as:
G_θ = cos(θ) G_x + sin(θ) G_y
Convolving these filters with the image in the direction of the gradient makes them invariant to rotation. Scale invariance is achieved by using various filter sizes.
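A minimal sketch of the steering property, building the two Gaussian derivative kernels on a grid and combining them; the grid size and σ are arbitrary:

```python
import numpy as np

def gaussian_derivatives(size=21, sigma=3.0):
    """First x- and y-derivatives of a 2D Gaussian sampled on a square grid."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    G = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    Gx = -X / sigma**2 * G  # derivative in x
    Gy = -Y / sigma**2 * G  # derivative in y (Gx rotated by 90 degrees)
    return Gx, Gy

def steer(Gx, Gy, theta):
    """Derivative filter steered to angle theta: a linear combination of Gx, Gy."""
    return np.cos(theta) * Gx + np.sin(theta) * Gy
```

Only the two basis filters ever need to be convolved with the image; the response at any angle is then the same linear combination of the two basis responses.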
Performance Evaluation
Distance Measure in Appearance Space
Results
SIFT and shape context descriptors with different detectors
Hessian/Harris-Affine and DoG detectors with different descriptors
Best detector for a given descriptor
Stable keypoints as the complexity of the object increases
Conclusion
The best overall choice is an affine-rectified detector (Harris/Hessian-Affine by Mikolajczyk and Schmid) followed by a SIFT (Lowe) or shape context (Belongie et al.) descriptor.