Feature detectors and descriptors. Fei-Fei Li

Feature detectors and descriptors Fei-Fei Li

Feature Detection e.g. DoG detected points (~300) coordinates, neighbourhoods Feature Description e.g. SIFT local descriptors (invariant) vectors database of local descriptors Matching / Indexing / Recognition e.g. Mahalanobis distance + Voting algorithm [Mikolajczyk & Schmid 01]

Some of the challenges Geometry Rotation Similarity (rotation + uniform scale) Affine (scale dependent on direction) valid for: orthographic camera, locally planar object Photometry Affine intensity change (I a I + b)

Detector Descriptor Intensity Rotation Scale Affine Harris corner 2 nd moment(s) Mikolajczyk & Schmid 01, 02 2 nd moment(s) Tuytelaars, 00 2 nd moment(s) Lowe 99 (DoG) SIFT, PCA-SIFT Kadir & Brady, 01 Matas, 02 others others

An introductory example: Harris corner detector C.Harris, M.Stephens. A Combined Corner and Edge Detector. 1988

Harris Detector: Basic Idea flat region: no change in all directions edge : no change along the edge direction corner : significant change in all directions

Harris Detector: Mathematics Change of intensity for the shift [u,v]: xy, [ ] 2 Euv (, ) = wxy (, ) I( x+ u, y + v) I( x, y) Window function Shifted intensity Intensity Window function w(x,y) = or 1 in window, 0 outside Gaussian

Harris Detector: Mathematics For small shifts [u,v] we have a bilinear approximation: u Euv (, ) [ uv, ] M v where M is a 2 2 matrix computed from image derivatives: M 2 I x II x y = w( x, y) 2 xy, I xiy Iy

Harris Detector: Mathematics Intensity change in shifting window: eigenvalue analysis u Euv (, ) [ uv, ] M v λ 1, λ 2 eigenvalues of M Ellipse E(u,v) = const direction of the fastest change (λ max ) -1/2 (λ min )-1/2 direction of the slowest change

Harris Detector: Mathematics Classification of image points using eigenvalues of M: λ 2 Edge λ 2 >> λ 1 Corner λ 1 and λ 2 are large, λ 1 ~ λ 2 ; E increases in all directions λ 1 and λ 2 are small; E is almost constant in all directions Flat region Edge λ 1 >> λ 2 λ 1

Harris Detector: Mathematics Measure of corner response: R = det M k M ( trace ) 2 det M trace M = λ λ 1 2 = λ + λ 1 2 (k empirical constant, k = 0.04-0.06)

Harris Detector: Mathematics R depends only on eigenvalues of M λ 2 Edge R < 0 Corner R is large for a corner R is negative with large magnitude for an edge R > 0 R is small for a flat region Flat R small Edge R < 0 λ 1

Harris Detector The Algorithm: Find points with large corner response function R (R > threshold) Take the points of local maxima of R

Harris Detector: Workflow

Harris Detector: Workflow Compute corner response R

Harris Detector: Workflow Find points with large corner response: R>threshold

Harris Detector: Workflow Take only the points of local maxima of R

Harris Detector: Workflow

Harris Detector: Summary Average intensity change in direction [u,v] can be expressed as a bilinear form: u Euv (, ) [ uv, ] M v Describe a point in terms of eigenvalues of M: measure of corner response ( ) 2 R = λ λ k λ + λ 1 2 1 2 A good (corner) point should have a large intensity change in all directions, i.e. R should be large positive

Harris Detector: Some Rotation invariance Properties Ellipse rotates but its shape (i.e. eigenvalues) remains the same Corner response R is invariant to image rotation

Harris Detector: Some Properties Partial invariance to affine intensity change Only derivatives are used => invariance to intensity shift I I + b Intensity scale: I a I R R threshold x (image coordinate) x (image coordinate)

Harris Detector: Some Properties But: non-invariant to image scale! All points will be classified as edges Corner!

Harris Detector: Some Properties Quality of Harris detector for different scale changes Repeatability rate: # correspondences # possible correspondences C.Schmid et.al. Evaluation of Interest Point Detectors. IJCV 2000

Detector Descriptor Intensity Rotation Scale Affine Harris corner 2 nd moment(s) No No Mikolajczyk & Schmid 01, 02 2 nd moment(s) Tuytelaars, 00 2 nd moment(s) Lowe 99 (DoG) SIFT, PCA- SIFT Kadir & Brady, 01 Matas, 02 others others

Interest point detectors Harris-Laplace [Mikolajczyk & Schmid 01] Adds scale invariance to Harris points Set s i = λs d Detect at several scales by varying s d Only take local maxima (8-neighbourhood) of scale adapted Harris points Further restrict to scales at which Laplacian is local maximum

Interest point detectors Harris-Laplace [Mikolajczyk & Schmid 01] Selected scale determines size of support region Laplacian justified experimentally compared to gradient squared & DoG [Lindeberg 98] gives thorough analysis of scale-space

Interest point detectors Harris-Affine [Mikolajczyk & Schmid 02] Adds invariance to affine image transformations Initial locations and isotropic scale found by Harris-Laplace Affine invariant neighbourhood evolved iteratively using the 2 nd moment matrix μ: ) )), ( ))(, ( (( ) ( ),, ( T D D I D I L L g Σ Σ Σ = Σ Σ x x x μ ) 2 exp( 2 1 ) ( 1 x x Σ Σ = Σ T g π ) ( ) ( ), ( x x I g L Σ = Σ

Interest point detectors Harris-Affine [Mikolajczyk & Schmid 02] For affinely related points: x = L Ax R If μ ( x L, I, L, Σ D, L Σ ) = M L Σ I, L = tm 1 L Σ D, L = dm 1 L and μ ( xr, I, R, ΣD, R Σ ) = M R Σ I, R = tm 1 R Σ D, R = dm 1 R Then by normalising: We get: 1/ 2 2 L L and x R M 1/ R xr x L M x x L Rx R so the normalised regions are related by a pure rotation See also [Lindeberg & Garding 97] and [Baumberg 00]

Interest point detectors Harris-Affine [Mikolajczyk & Schmid 02] Algorithm iteratively adapts shape of support region spatial location x (k) integration scale σ I (based on Laplacian) derivation scale σ D = sσ I

Affine Invariant Detection Take a local intensity extremum as initial point Go along every ray starting from this point and stop when extremum of function f is reached f It () I points along the ray We will obtain approximately corresponding regions f () t = t 1 t o 0 I () t I dt 0 T.Tuytelaars, L.V.Gool. Wide Baseline Stereo Matching Based on Local, Affinely Invariant Regions. BMVC 2000.

Affine Invariant Detection The regions found may not exactly correspond, so we approximate them with ellipses Geometric Moments: mpq p q = x y f( x, y) dxdy Fact: moments m pq uniquely 2 determine the function f Taking f to be the characteristic function of a region (1 inside, 0 outside), moments of orders up to 2 allow to approximate the region by an ellipse This ellipse will have the same moments of orders up to 2 as the original region

Affine Invariant Detection Covariance matrix of region points defines an ellipse: q = Ap T p Σ = 1 1 p 1 T q Σ = 1 2 q 1 Σ = 1 T pp region 1 Σ = 2 T qq region 2 ( p = [x, y] T is relative to the center of mass) Σ = AΣ 2 1 A T Ellipses, computed for corresponding regions, also correspond!

Affine Invariant Detection Algorithm summary (detection of affine invariant region): Start from a local intensity extremum point Go in every direction until the point of extremum of some function f Curve connecting the points is the region boundary Compute geometric moments of orders up to 2 for this region Replace the region with ellipse T.Tuytelaars, L.V.Gool. Wide Baseline Stereo Matching Based on Local, Affinely Invariant Regions. BMVC 2000.

Detector Descriptor Intensity Rotation Scale Affine Harris corner 2 nd moment(s) No No Mikolajczyk & Schmid 01, 02 2 nd moment(s) Tuytelaars, 00 2 nd moment(s) No ( 04?) Lowe 99 (DoG) SIFT, PCA-SIFT Kadir & Brady, 01 Matas, 02 others others

Interest point detectors Difference of Gaussians [Lowe 99] Difference of Gaussians in scale-space detects blob -like features Can be computed efficiently with image pyramid Approximates Laplacian for correct scale factor Invariant to rotation and scale changes

Res ample Blur Subtra ct Key point localization Detect maxima and minima of difference-of-gaussian in scale space (Lowe, 1999) Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002) Taylor expansion around point: Offset of extremum (use finite differences for derivatives):

Select canonical orientation Create histogram of local gradient directions computed at selected scale Assign canonical orientation at peak of smoothed histogram Each key specifies stable 2D coordinates (x, y, scale, orientation) 0 2π

Example of keypoint detection Threshold on value at DOG peak and on ratio of principle curvatures (Harris approach) (a) 233x189 image (b) 832 DOG extrema (c) 729 left after peak value threshold (d) 536 left after testing ratio of principle curvatures

Creating features stable to viewpoint change Edelman, Intrator & Poggio (97) showed that complex cell outputs are better for 3D recognition than simple correlation

SIFT vector formation Thresholded image gradients are sampled over 16x16 array of locations in scale space Create array of orientation histograms 8 orientations x 4x4 histogram array = 128 dimensions

Feature stability to noise Match features after random change in image scale & orientation, with differing levels of image noise Find nearest neighbor in database of 30,000 features

Feature stability to affine change Match features after random change in image scale & orientation, with 2% image noise, and affine distortion Find nearest neighbor in database of 30,000 features

Other interest point detectors Scale Saliency [Kadir & Brady 01, 03]

Other interest point detectors Scale Saliency [Kadir & Brady 01, 03] Uses entropy measure of local pdf of intensities: Takes local maxima in scale Weights with change of distribution with scale: To get saliency measure: = D d D d s d p s d p s H d ).,, ( ) log,, ( ), ( 2 x x x = D d D d s d p s s s W d ).,, ( ), ( x x ), ( ), ( ), ( x x x s W s H s Y D D D =

Other interest point detectors Scale Saliency [Kadir & Brady 01, 03] H D (s,x) W D (s,x) Just using H D (s,x) Using Y D (s,x) = H D W D Most salient parts detected

Other interest point detectors maximum stable extremal regions [matas et al. 02] Sweep threshold of intensity from black to white Locate regions based on stability of region with respect to change of threshold

Affine-invariant texture recognition Texture recognition under viewpoint changes and non-rigid transformations Use of affine-invariant regions invariance to viewpoint changes spatial selection => more compact representation, reduction of redundancy in texton dictionary [A sparse texture representation using affine-invariant regions, S. Lazebnik, C. Schmid and J. Ponce, CVPR 2003]

Overview of the approach

Region extraction Harris detector Laplace detector

Descriptors Spin images

Spatial selection clustering each pixel clustering selected pixels

Signature and EMD Hierarchical clustering => Signature : S = { ( m 1, w 1 ),, ( m k, w k ) } Earth movers distance D( S, S ) = [ i,j f ij d( m i, m j )] / [ i,j f ij ] robust distance, optimizes the flow between distributions can match signatures of different size not sensitive to the number of clusters

Database with viewpoint changes 20 samples of 10 different textures

Results Spin images Gabor-like filters

Feature detectors and descriptors

Widely used descriptors SIFT Gray-scale intensity values Steerable filters GLOH Shape context & geometric blur

SIFT

Gray-scale intensity Normalize 11x11 patch Projection onto PCA basis c 1 c 2.. c 15

Steerable filters

GLOH

Shape context Belongie et al. 2002

Geometric blur Berg et al. 2001

Geometric blur