Deformation and Viewpoint Invariant Color Histograms


Justin Domke and Yiannis Aloimonos
Computer Vision Laboratory, Department of Computer Science
University of Maryland, College Park, MD 20742, USA
domke@cs.umd.edu, yiannis@cfar.umd.edu

Abstract

We develop a theoretical basis for creating color histograms that are invariant under deformation or changes in viewpoint. The gradients in different color channels weight the influence of a pixel on the histogram so as to cancel out the changes induced by deformations. Experiments show these histograms to be invariant under a variety of distortions and changes in viewpoint.

1 Introduction

Color histograms have been widely used for object recognition. Though in practice these histograms often vary slowly under changes in viewpoint, it is clear that the color histogram generated from an image surface is intimately tied to the geometry of that surface and to the viewing position. In this paper, we suggest a method to create color histograms based on the color gradients. We will see that our method is invariant under any mapping of the surface which is locally affine, and thus under a very wide class of viewpoint changes or deformations.

1.1 Related Work

The use of color histograms for recognition was initiated in a famous paper by Swain and Ballard [4]. This paper used simple color histograms for object recognition, with surprisingly strong results. Among the work that followed, Funt and Finlayson [1] reduce sensitivity to lighting by considering changes among neighboring pixels, similarly to Land's Retinex theory. Stricker and Orengo [3] provide a measure of color similarity between images, storing only the dominant features of the histogram. They interpret the color histogram as a probability distribution, and then store only the central moments.

2 Derivation

Our derivation will proceed as follows. First, we will explore how small areas change in size under a given local affine transformation. Next, we will show that, given two corresponding points, the derivatives in two color channels may be used to recover the relevant parameters of the homography, and thus the ratio of differential areas.¹ Finally, we will show that, on this basis, the derivatives can be used to weight pixels when constructing a color histogram so as to cancel out viewpoint dependent effects.

¹ We emphasize that no correspondence is used in our technique; this is merely an artifact of the derivation.

2.1 Local Affine Approximation

Given two corresponding points in two images, one can change coordinate systems such that both points lie at the origin in their respective systems. Consider this done. Assuming that the transformation between the two images is continuous, it is locally linear. Note that this is a weak assumption, satisfied by transformations such as homographies, changes in viewpoint, or smooth deformations. Thus, for infinitesimal $x$ we can write the local affine transformation as

$$x' = Hx \quad (1)$$

where

$$H = \begin{bmatrix} A & B \\ C & D \end{bmatrix}. \quad (2)$$

Here, $x$ and $x'$ are the points in the first and second coordinate systems, respectively.

2.2 Differential Area From Gradients

By definition, the differential area in the first coordinate system is

$$da = dx\,dy. \quad (3)$$

We can then find the corresponding area in the second coordinate system,

$$da' = \left( \frac{\partial x'}{\partial x}\frac{\partial y'}{\partial y} - \frac{\partial x'}{\partial y}\frac{\partial y'}{\partial x} \right) dx\,dy = (AD - BC)\,dx\,dy. \quad (4)$$

So, the ratio of differential areas is

$$\frac{da'}{da} = AD - BC. \quad (5)$$
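To make Equation (5) concrete, here is a minimal numerical sketch in Python; the particular values of $A$, $B$, $C$, $D$ are illustrative choices of our own, not from the paper. The area scaling of a local affine map is simply its determinant.

```python
import numpy as np

# Illustrative local affine map H and the induced scaling of a
# differential area element (Equations 4 and 5).
H = np.array([[1.3, 0.2],    # [[A, B],
              [0.4, 0.9]])   #  [C, D]]

A, B, C, D = H[0, 0], H[0, 1], H[1, 0], H[1, 1]
print(A * D - B * C)          # da'/da written out as in Equation (5)
print(np.linalg.det(H))       # the same quantity, as a determinant: 1.09
```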

2.3 Invariant Histograms

Denote the image intensity function in one color channel by $f(x,y) = f'(x',y')$. Using standard calculus, we can relate the image derivatives:

$$f_x = f'_{x'} \frac{\partial x'}{\partial x} + f'_{y'} \frac{\partial y'}{\partial x} \quad (6)$$

$$f_y = f'_{x'} \frac{\partial x'}{\partial y} + f'_{y'} \frac{\partial y'}{\partial y} \quad (7)$$

Now, write the derivative of $f$ with respect to $x$, evaluated at the origin, as $f_x$, and similarly $f'_{x'}$ for $f'$. Evaluating the coordinate derivatives yields

$$f_x = f'_{x'} A + f'_{y'} C \quad (8)$$

$$f_y = f'_{x'} B + f'_{y'} D. \quad (9)$$

Notice that, given only the derivatives of $f$ in both images, it will not be possible to recover $AD - BC$, and thus the differential area. Two solutions to this problem are apparent. First, one could use higher-order derivatives. Second, one can use other color channels. Given the difficulty of accurately measuring higher-order derivatives, we pursue the second option in this work. Denote the image intensity function in a second color channel by $g$. Its derivatives obey the same constraints as above:

$$g_x = g'_{x'} A + g'_{y'} C \quad (10)$$

$$g_y = g'_{x'} B + g'_{y'} D. \quad (11)$$

Now, given all the derivatives, it is possible to solve for each of $A$, $B$, $C$, and $D$:

$$A = \frac{f_x g'_{y'} - g_x f'_{y'}}{f'_{x'} g'_{y'} - g'_{x'} f'_{y'}}, \qquad B = \frac{f_y g'_{y'} - g_y f'_{y'}}{f'_{x'} g'_{y'} - g'_{x'} f'_{y'}},$$

$$C = \frac{g_x f'_{x'} - f_x g'_{x'}}{f'_{x'} g'_{y'} - g'_{x'} f'_{y'}}, \qquad D = \frac{g_y f'_{x'} - f_y g'_{x'}}{f'_{x'} g'_{y'} - g'_{x'} f'_{y'}}. \quad (12)$$

Substituting these expressions into Equation (5) and simplifying gives

$$\frac{da'}{da} = AD - BC = \frac{f_x g_y - f_y g_x}{f'_{x'} g'_{y'} - f'_{y'} g'_{x'}}. \quad (13)$$

We can now see a direct relationship between the color derivatives and the differential areas:

$$da\,|f_x g_y - f_y g_x| = da'\,|f'_{x'} g'_{y'} - f'_{y'} g'_{x'}|. \quad (14)$$

This is the key relationship that we will use to construct invariant color histograms. Notice that one can consider the gradients of the two color channels as vectors. Then, Equation (14) says that the differential area times the norm of the cross product of the color gradients is invariant. If we take integrals over the areas in both images with a given color, these integrals will be equal, and so no correspondence is required. If the color of pixel $x$ is some integer $x_c$, then

$$\int_{x : x_c = c} |f_x(x) g_y(x) - f_y(x) g_x(x)|\,da = \int_{x' : x'_c = c} |f'_{x'}(x') g'_{y'}(x') - f'_{y'}(x') g'_{x'}(x')|\,da'. \quad (15)$$

Intuitively, the contribution to the integral in Equation (15) will be largest when there are significant gradient magnitudes in both color channels, and these gradients are in different directions.
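The key relationship can be checked numerically. The following sketch is our own illustration, not from the paper: the two smooth "color channel" functions, the random affine map, and all names are arbitrary choices. For $f(x) = f'(Hx)$, the quantity $|f_x g_y - f_y g_x|$ should equal $|\det H|\,|f'_{x'} g'_{y'} - f'_{y'} g'_{x'}|$ at corresponding points, which is Equation (14) after dividing by $da$.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(2, 2))            # a random local affine map

# Two smooth "color channels" defined in the second image's coordinates.
def fp(p): return np.sin(p[0]) + np.cos(2.0 * p[1])
def gp(p): return np.exp(0.3 * p[0]) * np.sin(p[1])

# The corresponding channels in the first image: f(x) = f'(Hx).
def f(p): return fp(H @ p)
def g(p): return gp(H @ p)

def grad(func, p, eps=1e-6):
    """Central-difference gradient of a scalar function of a 2D point."""
    e0, e1 = np.array([eps, 0.0]), np.array([0.0, eps])
    return np.array([(func(p + e0) - func(p - e0)) / (2 * eps),
                     (func(p + e1) - func(p - e1)) / (2 * eps)])

x = np.array([0.2, -0.1])              # a point in the first image
xp = H @ x                             # its correspondence in the second

fx, fy = grad(f, x)
gx, gy = grad(g, x)
fxp, fyp = grad(fp, xp)
gxp, gyp = grad(gp, xp)

lhs = abs(fx * gy - fy * gx)
rhs = abs(np.linalg.det(H)) * abs(fxp * gyp - fyp * gxp)
print(lhs, rhs)                        # agree up to finite-difference error
```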

3 Algorithm

Using Equation (14), we can create a very simple algorithm for computing deformation invariant color histograms. In our notation, the histogram component for color $c$, $h_c$, in a traditional color histogram is

$$h_c = \sum_{s : s_c = c} 1. \quad (16)$$

The only difference in our method is that, rather than each pixel getting one vote, the influence of each pixel is weighted by a function of the derivatives:

$$h_c = \sum_{s : s_c = c} w(s) \quad (17)$$

$$w(s) = |f_x(s)\,g_y(s) - f_y(s)\,g_x(s)|. \quad (18)$$

4 Implementation

In the experiments in this paper, we used 5 color bins in each of the three color channels, resulting in a total of 125 bins. Using such a coarse sampling of the color space is principally for display purposes; in our experiments, it was possible to accurately recover histograms with many more colors. This method requires derivatives in two color channels, $f$ and $g$. Here, we have used red as one channel, and the average of green and blue as the other. Derivatives are taken in the simplest possible way, through convolution with the filters

$$\begin{bmatrix} -1 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \quad (19)$$

Experimentation with more complex methods of measuring the derivatives has shown the above filters to be preferable.
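The procedure of Sections 3 and 4 amounts to only a few lines in practice. The following Python/NumPy sketch is our own rendering under assumed names (the paper provides no code); it uses the forward-difference filters of Equation (19), the weight of Equation (18), and 5 bins per channel as above.

```python
import numpy as np

def invariant_histogram(rgb, bins=5):
    """Deformation invariant color histogram of a float RGB image in [0, 1].

    rgb has shape (H, W, 3); the return value has bins**3 components.
    """
    f = rgb[..., 0]                        # first channel: red
    g = 0.5 * (rgb[..., 1] + rgb[..., 2])  # second: mean of green and blue

    # Forward differences, i.e. convolution with [-1 1] and its transpose
    # (Equation 19); the last row/column is dropped so all arrays align.
    fx = (f[:, 1:] - f[:, :-1])[:-1, :]
    fy = (f[1:, :] - f[:-1, :])[:, :-1]
    gx = (g[:, 1:] - g[:, :-1])[:-1, :]
    gy = (g[1:, :] - g[:-1, :])[:, :-1]

    w = np.abs(fx * gy - fy * gx)          # per-pixel weight (Equation 18)

    # Quantize each channel into `bins` levels and form a joint color index.
    q = np.minimum((rgb[:-1, :-1] * bins).astype(int), bins - 1)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

    # Weighted vote per color cell (Equation 17); a traditional histogram
    # is the same call with all weights equal to 1 (Equation 16).
    return np.bincount(idx.ravel(), weights=w.ravel(), minlength=bins ** 3)
```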

5 Experiments

In all the synthetic experiments shown in this paper, the initial images are first processed with a low-pass filter to make derivatives more reliable. This filter is applied once, before warping, and so no similar process can be applied to real images. See Section 6 for further discussion of this issue.

As a first example, Figure 1 shows the computation of an invariant histogram and compares it to a traditional histogram. We emphasize that the only difference in the computation of the two histograms is that contrasted in Equations (16) and (17).

[Figure 1: First row: initial image, and $h(\cdot)$ mapped over it. Second row: image warped under a homography, and $h(\cdot)$ mapped over it. Third row: traditional (unweighted) color histogram. Fourth row: invariant color histogram.]

In our experiments, we seek to answer two questions. First, how do invariant histograms change under deformation or change in viewpoint? We will see that invariant histograms are nearly unchanging under distortions or viewpoint changes that lead to large changes in traditional histograms. Second, how do invariant histograms change under true changes in the scene? If the invariance were gained at the price of simply being insensitive to the images, we would not have a useful technique. We will see that this is not the case: under true changes in the scene, invariant histograms appear to have similar sensitivity to traditional histograms.

To investigate changes under deformation, we warped an image under a range of radial distortions, and computed histograms for each resulting image. To compute the difference of two histograms we used the sum of squared differences. To make the comparison between traditional and invariant histograms meaningful despite traditional histograms having vastly higher magnitudes, the error is scaled by the sum of squares of the first histogram:

$$\text{difference}(h^A, h^B) = \frac{\sum_c (h^A_c - h^B_c)^2}{\sum_c (h^A_c)^2}. \quad (20)$$

Results are shown in Figure 2. We can observe that though traditional histograms increase in difference with larger distortions, the invariant histograms are essentially unchanged.
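Equation (20) translates directly into code; this small helper (our own naming, consistent with the sketch above) is the comparison the plots assume.

```python
import numpy as np

def histogram_difference(h_a, h_b):
    """Equation (20): SSD between two histograms, normalized by the first."""
    h_a, h_b = np.asarray(h_a, dtype=float), np.asarray(h_b, dtype=float)
    return np.sum((h_a - h_b) ** 2) / np.sum(h_a ** 2)

# Example: the difference of any nonzero histogram with itself is 0.
```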

[Figure 2: Histograms under radial distortion. First row: the first image, and the result of warping it under the minimum and maximum distortion coefficients. Second row: the sum of absolute differences of the first image and each of the warped images. Bottom rows show traditional and invariant histograms comparing images as labeled.]

Since the images we have used are taken from a stereo sequence [2], we can also compare how histograms change under a small change in structure. Here, the actual surfaces in view are changing, so invariance is neither expected nor desired. In Figure 3 we compare the histograms resulting from images taken with a steadily increasing change in viewpoint. Here, the invariant histogram has similar performance to the traditional histogram. Notice particularly that, despite the small change in viewpoint, the differences for the invariant histogram quickly become larger than those resulting from the major distortions in Figure 2.

[Figure 3: Effects of changing scene. Left: first image. Center: final image. Right: histogram differences computed with respect to the first image in the scene. Compare to Figure 2.]

This suggests that invariance with respect to deformation is possible without giving up sensitivity to true changes in the scene.

In Figure 4 we see the behavior of histograms under a range of projective transformations. There, the first image was warped under a range of homographies, $H = \lambda H_1 + (1 - \lambda) H_2$, where $0 \le \lambda \le 1$. We should note that for $\lambda = 0.5$, $H = I$. We see again here that the invariant histogram remains very similar despite the distortions.
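This family of warps is straightforward to reproduce. The matrices below are illustrative stand-ins of our own choosing, since the paper does not give $H_1$ and $H_2$; they are picked only to satisfy the stated constraint that $\lambda = 0.5$ yields the identity.

```python
import numpy as np

# Hypothetical endpoint homographies with 0.5*H1 + 0.5*H2 = I.
E = 0.1 * np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])
H1, H2 = np.eye(3) + E, np.eye(3) - E

for lam in np.linspace(0.0, 1.0, 5):
    H = lam * H1 + (1.0 - lam) * H2    # the interpolated warp of Figure 4
    print(lam, H.ravel().round(3))
    # ...warp the image by H with an image library of choice, then compare
    # invariant_histogram(original) to invariant_histogram(warped).
```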

[Figure 4: Histograms under projective transformations. First row: the first image, and the result of warping it under $H_1$ and $H_2$. Second row: the sum of absolute differences of the first image and each of the warped images. Bottom rows show traditional and invariant histograms comparing images as labeled.]

Finally, Figure 5 shows traditional and invariant histograms derived from a pair of real images. The errors in the invariant histogram are likely due to the fact that this technique accounts for deformation changes only; no attempt is made to account for changes due to lighting conditions or image noise.

[Figure 5: Traditional and invariant histograms derived from segmented regions of two real images.]

6 Conclusions

In this paper we have seen that, given two corresponding points in two images, if the gradients in two color channels are available, the local affine transformation can be recovered. On this basis, it is possible to construct a measure using the gradients which cancels out the changes in image area resulting from viewpoint. Using this measure, we can integrate over areas in two images having a given color. The resulting value will be invariant to any transformation which is locally affine, and hence to essentially any smooth transformation.

Future work should further address practical issues with computing these histograms in real images. One could incorporate techniques to overcome color changes due to lighting [1]. Another issue is the measure of difference between two invariant histograms. Images, and specifically gradient measurements, are inevitably corrupted by noise. If many measurements are taken, the noise is likely to average out, but this will not occur for few measurements. A close observation of some of our results suggests that errors appear in the histogram precisely at those points that have the smallest components in the traditional histogram, and thus at those points derived from the fewest image measurements. Thus, when measuring the distance between two invariant histograms, it would be natural to use the number of pixels having each value to weight the importance of each component. This could be naturally phrased in a probabilistic view.

The technique developed in this paper will only work so far as the mathematical model of the problem is accurate. We emphasize that this is often not the case. The method is invariant only to pure deformations of the scene. In real images, however, other effects will often be present. These include changes due to reflectance and illumination, for which the literature offers several potential solutions. Perhaps more problematic is the fact that filtering is present in the imaging process. This means that the signals in two images are not just transformed versions of each other; different signals are present. Intuitively, our model of the imaging process suggests that an image of a physical object is simply the projection of the intensity function onto the image plane. Crucially, our model states that if the scale of the observed object is halved, all derivatives will be doubled. However, if a real image is taken of a rich texture from far away, rather than resulting in high derivatives, the image will often be nearly a constant color, with very low derivatives. If a technique is to work for arbitrary objects and scales, it will be necessary to properly account for the real behavior of images under scale changes.

The present work is extendable in two major directions. First, one could use measurements other than gradients to do the balancing required to create invariance. Using other measurements here could significantly increase robustness to image effects. Second, one could create histograms of quantities other than color. By using histograms of more sophisticated filters, it is likely that recognition of regions could be made significantly more powerful.

References

[1] Brian V. Funt and Graham D. Finlayson. Color constant color indexing. IEEE Trans. Pattern Anal. Mach. Intell., 17(5):522-529, 1995.

[2] Daniel Scharstein and Richard Szeliski. High-accuracy stereo depth maps using structured light. In CVPR (1), pages 195-202, 2003.

[3] Markus A. Stricker and Markus Orengo. Similarity of color images. In Storage and Retrieval for Image and Video Databases (SPIE), pages 381-392, 1995.

[4] Michael J. Swain and Dana H. Ballard. Color indexing. Int. J. Comput. Vision, 7(1):11-32, 1991.