CS 3710: Visual Recognition Describing Images with Features Adriana Kovashka Department of Computer Science January 8, 2015
Plan for Today Presentation assignments + schedule changes Image filtering Feature detection Feature description Feature matching Next time: Classification and detection Adriana's research
Announcements Open door policy Fixed office hours? Adriana's travel Clarification of experiment presentations
Presentation Assignments
Image Description
An image is a set of pixels What we see vs. what a computer sees Source: S. Narasimhan
Problems with pixel representation Not invariant to small changes Translation Illumination etc. Some parts of an image are more important than others What do we want to represent?
Preprocessing: Image filtering
Image filtering Compute a function of the local neighborhood at each pixel in the image Function specified by a filter or mask saying how to combine values from neighbors. Uses of filtering: Enhance an image (denoise, resize, etc) Extract information (texture, edges, etc) Detect patterns (template matching) Derek Hoiem, Kristen Grauman
Motivation: noise reduction Even multiple images of the same static scene will not be identical. Kristen Grauman
Common types of noise Salt and pepper noise: random occurrences of black and white pixels Impulse noise: random occurrences of white pixels Gaussian noise: variations in intensity drawn from a Gaussian (normal) distribution Kristen Grauman, Steve Seitz
Motivation: noise reduction How could we reduce the noise, i.e., give an estimate of the true intensities? What if there's only one image? Kristen Grauman
First attempt at a solution Let's replace each pixel with an average of all the values in its neighborhood Assumptions: Expect pixels to be like their neighbors Expect noise processes to be independent from pixel to pixel Kristen Grauman
First attempt at a solution Let's replace each pixel with an average of all the values in its neighborhood Moving average in 1D: Kristen Grauman, S. Marschner
Weighted Moving Average Can add weights to our moving average Weights [1, 1, 1, 1, 1] / 5 Non-uniform weights [1, 4, 6, 4, 1] / 16 Kristen Grauman, S. Marschner
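The two weightings above can be tried directly; a minimal NumPy sketch (the signal values are made up for illustration):

```python
import numpy as np

# A noisy 1D step signal (illustrative values)
signal = np.array([0., 0., 0., 90., 90., 90., 90., 90., 0., 0.])

# Uniform weights [1, 1, 1, 1, 1] / 5: a plain moving average.
uniform = np.convolve(signal, np.ones(5) / 5, mode='same')

# Non-uniform weights [1, 4, 6, 4, 1] / 16: center samples count more.
weighted = np.convolve(signal, np.array([1, 4, 6, 4, 1]) / 16, mode='same')

print(uniform)
print(weighted)
```

Both kernels sum to 1, so flat regions of the signal are preserved; the non-uniform weights blur the step edge less aggressively.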
Moving Average In 2D [Figure: a 3x3 box filter slides across a noisy binary image (background 0s, a block of 90s, one dropped pixel inside the block, one stray 90 outside it); the output fills in value by value — 0, 10, 20, 30, 30, ... — until the whole smoothed result appears, with the bright square blurred at its borders, the hole filled in, and the isolated noise pixel averaged away.] Kristen Grauman, Steve Seitz
Correlation filtering Say the averaging window size is (2k+1) x (2k+1). Uniform weights: G[i,j] = 1/(2k+1)^2 * sum over u,v = -k..k of F[i+u, j+v] — loop over all pixels in the neighborhood around image pixel F[i,j]. Now generalize to allow different weights depending on the neighboring pixel's relative position (non-uniform weights): G[i,j] = sum over u,v = -k..k of H[u,v] F[i+u, j+v]. Kristen Grauman
Correlation filtering This is called cross-correlation, denoted G = H ⊗ F. Filtering an image: replace each pixel with a linear combination of its neighbors. The filter kernel or mask H[u,v] is the prescription for the weights in the linear combination. Kristen Grauman
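Cross-correlation is just this weighted-sum loop over every pixel; a minimal NumPy sketch (the zero padding at the borders and the test image are my choices, not from the slides):

```python
import numpy as np

def cross_correlate(F, H):
    """Cross-correlation of image F with a (2k+1)x(2k+1) kernel H (zero padding)."""
    k = H.shape[0] // 2
    Fp = np.pad(F, k)                      # zero-pad the borders
    G = np.zeros_like(F, dtype=float)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            # Linear combination of the neighborhood around pixel (i, j)
            G[i, j] = np.sum(H * Fp[i:i + 2*k + 1, j:j + 2*k + 1])
    return G

F = np.zeros((5, 5)); F[2, 2] = 90.0       # single bright (noise) pixel
box = np.ones((3, 3)) / 9                  # uniform 3x3 averaging kernel
G = cross_correlate(F, box)
print(G)                                   # the spike is spread into a dim 3x3 patch
```

Real code would use a vectorized routine (e.g. `scipy.ndimage.correlate`); the explicit loop mirrors the definition on the slide.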
Gaussian filter What if we want the nearest neighboring pixels to have the most influence on the output? Use a kernel such as 1/16 x [1 2 1; 2 4 2; 1 2 1]. This kernel is an approximation of a 2D Gaussian function. It removes high-frequency components from the image (a "low-pass filter"). Kristen Grauman, Steve Seitz
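To see why the slide's 3x3 kernel approximates a Gaussian, one can compare it against a Gaussian sampled on the same grid; a small sketch (the value of sigma is a free choice, not from the slides):

```python
import numpy as np

# The 3x3 kernel from the slide, normalized to sum to 1:
binomial = np.outer([1, 2, 1], [1, 2, 1]) / 16.0

# A 2D Gaussian sampled on the same 3x3 grid (sigma chosen for illustration)
sigma = 0.85
xs = np.arange(-1, 2)
X, Y = np.meshgrid(xs, xs)
gauss = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
gauss /= gauss.sum()                       # normalize so weights sum to 1

print(binomial)
print(gauss)                               # numerically close to the binomial kernel
```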
Kristen Grauman Smoothing with a Gaussian
Smoothing with a Gaussian Dali Antonio Torralba
Aude Oliva Marilyn Einstein
Describing images with features: Feature detection
Kristen Grauman What points would you choose?
Local features: desired properties Repeatability The same feature can be found in several images despite geometric and photometric transformations Saliency Each feature has a distinctive description Compactness and efficiency Many fewer features than image pixels Locality A feature occupies a relatively small area of the image; robust to clutter and occlusion Kristen Grauman
Goal: interest operator repeatability We want to detect (at least some of) the same points in both images. We have to be able to run the detection procedure independently per image — otherwise there is no chance of finding true matches. Kristen Grauman
Goal: descriptor distinctiveness We want to be able to reliably determine which point goes with which. The descriptor must provide some invariance to geometric and photometric differences between the two views. Kristen Grauman
Corners as distinctive interest points We should easily recognize the point by looking through a small window; shifting the window in any direction should give a large change in intensity. "Flat" region: no change in all directions. "Edge": no change along the edge direction. "Corner": significant change in all directions. Alyosha Efros, Darya Frolova, Denis Simakov
Harris Detector: Mathematics Window-averaged squared change of intensity induced by shifting the image data by [u,v]: E(u,v) = sum over (x,y) of w(x,y) [I(x+u, y+v) − I(x,y)]^2, where w(x,y) is the window function (1 in window, 0 outside; or a Gaussian), I(x+u, y+v) the shifted intensity, and I(x,y) the intensity. Darya Frolova, Denis Simakov
Harris Detector: Mathematics Expanding I(x,y) in a Taylor series, we have, for small shifts [u,v], a quadratic approximation to the error surface between a patch and itself, shifted by [u,v]: E(u,v) ≈ [u v] M [u v]^T, where M is a 2x2 matrix computed from image derivatives. Darya Frolova, Denis Simakov
Harris Detector: Mathematics Notation: Ix = ∂I/∂x, Iy = ∂I/∂y, so M = sum over (x,y) of w(x,y) [Ix^2, Ix Iy; Ix Iy, Iy^2]. Kristen Grauman
What does this matrix reveal? Since M is symmetric, we can diagonalize it: M = R^-1 [λ1 0; 0 λ2] R, with M x_i = λ_i x_i. The eigenvalues λ1, λ2 of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window. Kristen Grauman
Corner response function "Edge": λ1 >> λ2 or λ2 >> λ1. "Corner": λ1 and λ2 are large, λ1 ~ λ2. "Flat" region: λ1 and λ2 are small. Alyosha Efros, Darya Frolova, Denis Simakov, Kristen Grauman
Harris Detector: Mathematics Measure of corner response: R = det(M) − k (trace M)^2 = λ1 λ2 − k (λ1 + λ2)^2 (k is an empirical constant, k = 0.04-0.06). Darya Frolova, Denis Simakov
Harris Detector: Summary Compute image gradients Ix and Iy for all pixels. For each pixel, compute the matrix M by summing gradient products over the neighboring pixels (x,y), then compute the response R. Find points with a large corner response function R (R > threshold). Take the points of locally maximum R as the detected feature points (i.e., pixels where R is bigger than for all the 4 or 8 neighbors). Darya Frolova, Denis Simakov
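The summary above can be sketched directly in NumPy (window size, k, and the synthetic test image are illustrative choices; this omits the thresholding and non-maximum suppression steps):

```python
import numpy as np

def harris_response(I, k=0.05, win=1):
    """Harris corner response R = det(M) - k*trace(M)^2 at each interior pixel."""
    # Image gradients via central differences
    Iy, Ix = np.gradient(I.astype(float))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    R = np.zeros_like(I, dtype=float)
    H, W = I.shape
    for i in range(win, H - win):
        for j in range(win, W - win):
            # Sum gradient products over the (2*win+1)^2 window -> matrix M
            s = np.s_[i - win:i + win + 1, j - win:j + win + 1]
            M = np.array([[Ixx[s].sum(), Ixy[s].sum()],
                          [Ixy[s].sum(), Iyy[s].sum()]])
            R[i, j] = np.linalg.det(M) - k * np.trace(M) ** 2
    return R

# A synthetic corner: bright quadrant in the lower-right, corner at (4, 4)
I = np.zeros((9, 9)); I[4:, 4:] = 1.0
R = harris_response(I)
print(np.unravel_index(R.argmax(), R.shape))  # response peaks at the corner
```

Along the edges of the bright square only one eigenvalue of M is large, so R is small or negative there; only at the corner are both large.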
Kristen Grauman Example of Harris application
Harris Detector: Some Properties Partial invariance to additive and multiplicative intensity changes. Only derivatives are used, so there is invariance to an intensity shift. Intensity scaling: fine, except for the threshold that's used to specify when R is large enough. Darya Frolova, Denis Simakov
Harris Detector: Some Properties Invariant to image scale? image zoomed image Antonio Torralba
Harris Detector: Some Properties Not invariant to image scale! When the corner is zoomed in, all points along it are classified as edges; only at the right scale is it detected as a corner. Darya Frolova, Denis Simakov
Scale Invariant Detection The problem: how do we choose corresponding circles independently in each image? Do objects in the image have a characteristic scale that we can identify? Darya Frolova, Denis Simakov
Solution: Scale Invariant Detection Design a function on the region (circle) which is scale invariant (the same for corresponding regions, even if they are at different scales), and take a local maximum of this function over region size: the region sizes s1 and s2 at which f peaks in the two images correspond (here the images are related by scale = 1/2). Antonio Torralba
Scale Invariant Detection A good function for scale detection has one stable, sharp peak as a function of region size (a flat or multi-peaked response is bad). For usual images, a good function is one which responds to contrast (sharp local intensity change). Antonio Torralba
Scale Invariant Detection Functions for determining scale — kernels: the Laplacian of Gaussian L = σ^2 (G_xx(x,y,σ) + G_yy(x,y,σ)) (2nd derivative of Gaussian) and the Difference of Gaussians DoG = G(x,y,kσ) − G(x,y,σ), where G(x,y,σ) is the Gaussian kernel. Note: both kernels are invariant to scale and rotation. Darya Frolova, Denis Simakov
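A Difference-of-Gaussians kernel can be built by subtracting two normalized Gaussians; a sketch (k = 1.6 is a common but not mandated choice, and the kernel radius is arbitrary):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian sampled on a (2*radius+1)^2 grid."""
    xs = np.arange(-radius, radius + 1)
    X, Y = np.meshgrid(xs, xs)
    G = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return G / G.sum()

sigma, k, radius = 1.0, 1.6, 5
dog = gaussian_kernel(k * sigma, radius) - gaussian_kernel(sigma, radius)
print(dog.sum())   # ~0: the DoG is a band-pass kernel with zero DC response
```

Because each Gaussian sums to 1, the DoG sums to 0, so it responds to local intensity change (contrast) but not to constant regions — exactly the property the slide asks for.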
Scale Invariant Detectors Harris-Laplacian: find points that are a local maximum of the Harris corner measure in space (image coordinates) and of the Laplacian in scale. K. Mikolajczyk, C. Schmid. Indexing Based on Scale Invariant Interest Points. ICCV 2001
Describing images with features: Feature description
Raw patches as local descriptors The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. But this is very sensitive to even small shifts, rotations. Kristen Grauman
Geometric transformations Kristen Grauman e.g. scale, translation, rotation
Photometric transformations Kristen Grauman, Tinne Tuytelaars
SIFT descriptor [Lowe 2004] Use histograms to bin pixels within sub-patches according to their gradient orientation (0 to 2π). Kristen Grauman
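A much-simplified sketch of the orientation-histogram idea behind SIFT (it omits Gaussian weighting, trilinear interpolation, and descriptor normalization; the 16x16 patch with 4x4 sub-patches and 8 orientation bins follows the usual layout):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations in [0, 2pi), weighted by magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)           # map angles to [0, 2pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())       # magnitude-weighted votes
    return hist

# SIFT-style layout: 4x4 grid of sub-patches x 8 bins = 128 values
rng = np.random.default_rng(0)
patch = rng.random((16, 16))
desc = np.concatenate([orientation_histogram(patch[i:i+4, j:j+4])
                       for i in range(0, 16, 4) for j in range(0, 16, 4)])
print(desc.shape)   # -> (128,)
```

Histogramming orientations within sub-patches is what buys tolerance to small shifts: a pixel can move within its sub-patch without changing the descriptor.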
Making the descriptor rotation invariant Rotate the patch according to its dominant gradient orientation. This puts the patches into a canonical orientation. Kristen Grauman, Matthew Brown, CSE 576: Computer Vision
Image features: Histograms of oriented gradients (HOG) Bin gradients from 8x8 pixel neighborhoods into 9 orientations Deva Ramanan (Dalal & Triggs CVPR 05)
http://web.mit.edu/vondrick/ihog/ What is this?
Kristen Grauman Filter Banks
Image from http://www.texasexplorer.com/austincap2.jpg Kristen Grauman
Kristen Grauman Showing magnitude of responses
Can you match the texture to the response? Filters A 1 B 2 C 3 Derek Hoiem Mean responses
Representing texture by mean response Filters Derek Hoiem Mean responses
[r1, r2, ..., r38] We can form a feature vector from the list of filter responses at each pixel. Kristen Grauman
Shape Context Belongie, Malik and Puzicha, PAMI 2002 Representation of the local shape around a feature location as histogram of edge points in an image relative to that location. Computed by counting edge points in log polar space. Tamara Berg
Color Histograms Representation of the distribution of colors in an image, derived by counting the number of pixels in each of a given set of color ranges in a (typically 3D) color space (RGB, HSV, etc.). Tamara Berg
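A minimal sketch of a 3D RGB histogram (the bin count and the random test image are illustrative choices):

```python
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """3D RGB histogram: count pixels falling in each color-range bin."""
    # Quantize each 0-255 channel into bins_per_channel ranges
    q = (img.astype(float) / 256 * bins_per_channel).astype(int)
    hist = np.zeros((bins_per_channel,) * 3)
    np.add.at(hist, (q[..., 0].ravel(), q[..., 1].ravel(), q[..., 2].ravel()), 1)
    return hist / hist.sum()   # normalize so images of different sizes compare

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))
h = color_histogram(img)
print(h.shape, h.sum())        # a 4x4x4 distribution summing to 1
```

Note the histogram discards all spatial layout: two images with the same colors in different arrangements get identical descriptors.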
Gist Oliva and Torralba, IJCV 2001 Captures the global energy of the scene. Computes edge orientation responses for multiple orientations and scales. Tamara Berg
Describing images with features: Feature matching
Correspondence and alignment Correspondence: matching points, patches, edges, or regions across images James Hays
Correspondence and alignment Alignment: find the parameters of the transformation that best align matched points Fitting: find the parameters of a model that best fit the data James Hays
Hough Transform Given a set of points, find the curve or line that explains the data points best. A point (x, y) in image space maps to a line in Hough space (m, b): y = mx + b, i.e., b = y − mx. P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. James Hays, Silvio Savarese
Hough Transform [Figure: each image point (x, y) casts votes along its line b = y − mx in (m, b) space; the accumulator cell where the lines intersect collects the most votes and identifies the best-fitting line.] James Hays, Silvio Savarese
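The voting scheme can be sketched with a simple (m, b) accumulator (parameter ranges, grid resolution, and test points are arbitrary choices; practical implementations use the polar (rho, theta) parameterization to handle vertical lines):

```python
import numpy as np

def hough_lines(points, m_range=(-5, 5), b_range=(-10, 10), n=101):
    """Vote in (m, b) space: each point (x, y) votes along b = y - m*x."""
    ms = np.linspace(*m_range, n)
    bs = np.linspace(*b_range, n)
    acc = np.zeros((n, n), dtype=int)
    for x, y in points:
        b_vals = y - ms * x                     # the point's line in Hough space
        b_idx = np.round((b_vals - b_range[0]) /
                         (b_range[1] - b_range[0]) * (n - 1))
        ok = (b_idx >= 0) & (b_idx < n)         # keep votes inside the grid
        acc[np.arange(n)[ok], b_idx[ok].astype(int)] += 1
    return ms, bs, acc

# Four points on y = 2x + 1, plus one outlier
pts = [(0, 1), (1, 3), (2, 5), (3, 7), (4, -4)]
ms, bs, acc = hough_lines(pts)
mi, bi = np.unravel_index(acc.argmax(), acc.shape)
print(ms[mi], bs[bi])   # the peak cell recovers (m, b) near (2, 1)
```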
RANSAC (RANdom SAmple Consensus): Fischler & Bolles, 1981. Algorithm: 1. Sample (randomly) the number of points required to fit the model 2. Solve for model parameters using samples 3. Score by the fraction of inliers within a preset threshold of the model Repeat 1-3 until the best model is found with high confidence James Hays, Silvio Savarese
RANSAC Line fitting example Algorithm: 1. Sample (randomly) the number of points required to fit the model (#=2) 2. Solve for model parameters using samples 3. Score by the fraction of inliers within a preset threshold of the model Repeat 1-3 until the best model is found with high confidence James Hays, Silvio Savarese
RANSAC Line fitting example (this sample yields N_I = 6 inliers). Algorithm: 1. Sample (randomly) the number of points required to fit the model (#=2) 2. Solve for model parameters using samples 3. Score by the fraction of inliers within a preset threshold of the model Repeat 1-3 until the best model is found with high confidence James Hays, Silvio Savarese
RANSAC (this sample yields N_I = 14 inliers). Algorithm: 1. Sample (randomly) the number of points required to fit the model (#=2) 2. Solve for model parameters using samples 3. Score by the fraction of inliers within a preset threshold of the model Repeat 1-3 until the best model is found with high confidence James Hays, Silvio Savarese
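The three RANSAC steps can be sketched for line fitting (the iteration count, inlier threshold, and test points are illustrative choices):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.5, seed=0):
    """Fit y = m*x + b by RANSAC: sample 2 points, score by inlier count."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        # 1. Sample the minimum number of points needed for a line (#=2)
        (x1, y1), (x2, y2) = pts[rng.choice(len(pts), size=2, replace=False)]
        if x1 == x2:
            continue                              # skip vertical samples
        # 2. Solve for model parameters from the sample
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 3. Score: count points within thresh of the candidate line
        inliers = np.sum(np.abs(pts[:, 1] - (m * pts[:, 0] + b)) < thresh)
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Five points near y = 2x + 1, plus two gross outliers
pts = [(0, 1.1), (1, 2.9), (2, 5.0), (3, 7.1), (4, 9.0), (1, 9.0), (3, -2.0)]
model, n_inliers = ransac_line(pts)
print(model, n_inliers)   # slope near 2, intercept near 1, 5 inliers
```

Unlike least squares, the outliers never corrupt the fit: a line through an outlier simply scores few inliers and is discarded.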
Example: solving for translation Given matched points A_i = (x_i^A, y_i^A) and B_i = (x_i^B, y_i^B), estimate the translation (t_x, t_y) of the object: x_i^B = x_i^A + t_x, y_i^B = y_i^A + t_y. Derek Hoiem
Example: solving for translation Least squares solution: 1. Write down the objective function. 2. Write it in the form Ax = b, stacking one pair of rows per match: [1 0; 0 1; ...; 1 0; 0 1] [t_x; t_y] = [x_1^B − x_1^A; y_1^B − y_1^A; ...; x_n^B − x_n^A; y_n^B − y_n^A]. 3. Solve using the pseudo-inverse or eigenvalue decomposition. Derek Hoiem
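The stacked least-squares system for translation can be solved in a few lines (the matched points here are synthetic, made up for illustration):

```python
import numpy as np

# Matched points: B_i = A_i + t for a translation t = (t_x, t_y)
A = np.array([[0., 0.], [1., 0.], [0., 2.]])
B = A + np.array([3., -1.])              # true translation (3, -1)

# Stack the system A_mat @ t = b: rows alternate [1 0] and [0 1] per match
n = len(A)
A_mat = np.tile(np.eye(2), (n, 1))
b = (B - A).ravel()                      # stacked (x_B - x_A, y_B - y_A)
t, *_ = np.linalg.lstsq(A_mat, b, rcond=None)
print(t)                                 # -> [ 3. -1.]
```

With noise-free matches the solution is exact; with noisy matches `lstsq` returns the translation minimizing the sum of squared residuals, which is simply the mean of the per-match displacements.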
Example: solving for translation Problem: outliers, multiple objects, and/or many-to-one matches. Hough transform solution: 1. Initialize a grid of parameter values. 2. Each matched pair casts a vote for consistent values (t_x = x^B − x^A, t_y = y^B − y^A). 3. Find the parameters with the most votes. 4. Solve using least squares with the inliers. Derek Hoiem
Example: solving for translation Problem: outliers. RANSAC solution: 1. Sample a set of matching points (1 pair). 2. Solve for the transformation parameters (t_x = x^B − x^A, t_y = y^B − y^A). 3. Score the parameters with the number of inliers. 4. Repeat steps 1-3 N times. Derek Hoiem
Local features: main components 1) Detection: identify the interest points. 2) Description: extract a vector feature descriptor around each interest point, x^(1) = [x_1^(1), ..., x_d^(1)]. 3) Matching: determine correspondence between descriptors x^(1), x^(2) in the two views. Kristen Grauman
Next Time Classification and detection Adriana's research