Visual Object Recognition

Visual Object Recognition. Lecture 2: Image Formation. Per-Erik Forssén, docent, Computer Vision Laboratory, Department of Electrical Engineering, Linköping University.

Lecture 2: Image Formation. Outline: pin-hole and thin lens cameras; projective geometry, lens distortion, vignetting, intensity, colour; geometric and photometric invariance; colour constancy, colour spaces, affine illumination model, homographies, epipolar geometry, canonical frames.

The Pin-Hole Camera. A brightly illuminated scene is projected onto the wall opposite the pin-hole. The image is rotated 180º.

The Pin-Hole Camera. With the optical centre at the origin, the image plane at distance $f$, and a scene point $\mathbf{X} = (X, Y, Z)^\top$, similar triangles give $x = f\frac{X}{Z}$, $y = f\frac{Y}{Z}$, or in homogeneous form

$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \frac{1}{Z}\begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$

The Pin-Hole Camera. More generally, we write

$\underbrace{\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}}_{\mathbf{x}} = \frac{1}{Z}\underbrace{\begin{bmatrix} f & s & c_x \\ 0 & fa & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K}\underbrace{\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}}_{\mathbf{X}}, \qquad \text{i.e.}\quad \mathbf{x} \simeq K\mathbf{X}$

where $f$ is the focal length, $s$ the skew, $a$ the aspect ratio, and $c = (c_x, c_y)$ the projection of the optical centre. Motivation (figure): the optical centre, the image plane at distance $f$, the field of view (FOV), and an image grid of size $w \times h$ with principal point $c$.

The Pin-Hole Camera. We also use normalized image coordinates:

$\mathbf{u} = \begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix}, \qquad \mathbf{x} = K\mathbf{u}, \qquad \mathbf{u} = K^{-1}\mathbf{x}$

The Pin-Hole Camera. For a general position of the world coordinate system (WCS) we have

$\mathbf{u} \simeq \underbrace{\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}}_{[R\,|\,\mathbf{t}]}\underbrace{\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}}_{\mathbf{X}}, \qquad \text{i.e.}\quad \mathbf{u} \simeq [R\,|\,\mathbf{t}]\,\mathbf{X}$

and thus $\mathbf{x} \simeq K[R\,|\,\mathbf{t}]\,\mathbf{X}$.
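
To make the projection chain above concrete, here is a minimal numpy sketch of the pin-hole model; the focal length, principal point, and pose are made-up illustration values, not taken from the lecture.

```python
import numpy as np

# Intrinsics K (assumed example values: f = 800 px, no skew, unit aspect ratio,
# principal point at (320, 240))
f, s, a = 800.0, 0.0, 1.0
cx, cy = 320.0, 240.0
K = np.array([[f, s,     cx],
              [0, f * a, cy],
              [0, 0,      1]])

# Extrinsics [R | t]: a small rotation about the y-axis and a translation (assumed)
theta = np.deg2rad(5.0)
R = np.array([[ np.cos(theta), 0, np.sin(theta)],
              [ 0,             1, 0            ],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.1, 0.0, 2.0])
Rt = np.hstack([R, t[:, None]])   # 3x4 pose matrix [R | t]

# A world point in homogeneous coordinates
X = np.array([0.2, -0.1, 3.0, 1.0])

u = Rt @ X                        # camera coordinates (defined up to scale)
x = K @ u                         # homogeneous pixel coordinates
print(x[:2] / x[2])               # dehomogenize to get the pixel position
```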

Epipolar geometry. The epipolar geometry of two cameras (figure): a scene point $\mathbf{X} = (X, Y, Z)^\top$, the two optical centres $o_1$ and $o_2$ joined by the baseline, and the image points $x_1$, $x_2$ lie in a common epipolar plane, which cuts the two image planes in the epipolar lines. $e_1$, $e_2$ are called epipoles; $o_1$, $o_2$ are the optical centres.

Homographies. For a planar object, we can imagine a world coordinate system fixed to the object, so that object points have the form $\mathbf{X} = (X, Y, 0)^\top$ (figure):

$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \simeq K\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$

Homographies. Since $Z = 0$ on the object plane, the third column of $[R\,|\,\mathbf{t}]$ drops out:

$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \simeq K\begin{bmatrix} r_{11} & r_{12} & t_1 \\ r_{21} & r_{22} & t_2 \\ r_{31} & r_{32} & t_3 \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}}_{H}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$

Homographies. Projections into two cameras: each view is related to the object plane by its own homography, $\mathbf{x}_1 \simeq H_1\mathbf{X}$ and $\mathbf{x}_2 \simeq H_2\mathbf{X}$. As a homography is invertible, we can map from camera 2 back to the object and on to camera 1: $\mathbf{x}_1 \simeq H_1 H_2^{-1}\mathbf{x}_2$.
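
A small numpy sketch of the plane-induced homography and its composition across two views; the intrinsics and poses reuse the style of the projection sketch above and are purely illustrative.

```python
import numpy as np

def plane_homography(K, R, t):
    """Homography from the Z = 0 object plane to an image: H = K [r1 r2 t]."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

# Assumed example intrinsics and two camera poses observing the same plane
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R1, t1 = np.eye(3), np.array([0.0, 0.0, 2.0])
th = np.deg2rad(10.0)
R2 = np.array([[ np.cos(th), 0, np.sin(th)],
               [ 0,          1, 0         ],
               [-np.sin(th), 0, np.cos(th)]])
t2 = np.array([0.3, 0.0, 2.0])

H1 = plane_homography(K, R1, t1)    # object plane -> camera 1
H2 = plane_homography(K, R2, t2)    # object plane -> camera 2

# Map a point seen in camera 2 back over the plane and into camera 1
H21 = H1 @ np.linalg.inv(H2)
x2 = np.array([400.0, 300.0, 1.0])  # homogeneous pixel in camera 2
x1 = H21 @ x2
print(x1[:2] / x1[2])               # corresponding pixel in camera 1
```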

Epipolar Geometry. So in general, two-view geometry only tells us that a corresponding point lies somewhere along a line. In practice we often know more, since objects often have planar or near-planar surfaces, i.e. we are close to the homography case. Also, if the views have a short relative baseline, we can use even simpler models.

Thin Lens Camera An actual pinhole lets in too little light, and a bigger hole blurs the picture. Real cameras instead use lenses to obtain a sharp image using more light.

Thin Lens Camera. A thin lens is a (positive) lens whose thickness satisfies $d \ll f$ (figure: optical centre and focal points at distance $f$). Parallel rays converge at the focal points; rays through the optical centre are not refracted.

Thin Lens Camera. With object distance $Z$ and image-plane distance $l$ (figure: image plane, optical centre, focal points at distance $f$), the thin lens relation follows from similar triangles:

$\frac{1}{f} = \frac{1}{Z} + \frac{1}{l}$
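
As a quick worked example of the relation (values chosen here for illustration, not from the slides): with $f = 50\,\mathrm{mm}$ and an object at $Z = 2\,\mathrm{m}$,

$\frac{1}{l} = \frac{1}{f} - \frac{1}{Z} = \frac{1}{0.05} - \frac{1}{2} = 19.5\,\mathrm{m}^{-1} \quad\Rightarrow\quad l \approx 51.3\,\mathrm{mm},$

so the sensor must sit slightly behind the focal point to bring that depth into focus.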

Thin Lens Camera (figure: image plane, optical centre, focal points). The lens focuses at one depth only; objects at other depths are blurred.

Thin Lens Camera (figure: aperture placed at the lens). Adding an aperture increases the depth-of-field, the depth range that appears sharp in the image. It is a compromise between the pinhole and the thin lens.

Lens distortion (figure: correct image, barrel distortion, pin-cushion distortion). For zoom lenses: barrel distortion at wide FoV, pin-cushion distortion at narrow FoV.

Lens distortion (figure: correct image → distorted image). Modelling: $\mathbf{x} \simeq K f(\mathbf{u}, \theta)$, i.e. a distortion function $f$ with parameters $\theta$ is applied before the intrinsics. Used in optimisation such as bundle adjustment (BA).

Lens distortion (figure: distorted image → correct image). Rectification: $\mathbf{x}' \simeq f^{-1}(K\mathbf{u}, \theta)$, i.e. the inverse of the distortion function is applied to produce the corrected image. Used in dense stereo.
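
The slides do not commit to a particular distortion function $f$; as one common choice, here is a sketch of a simple polynomial radial model (the coefficients and intrinsics are made-up illustration values).

```python
import numpy as np

def radial_distort(u, k1, k2):
    """Apply a simple polynomial radial distortion to normalized coordinates.

    u      : (N, 2) array of ideal (undistorted) normalized image coordinates
    k1, k2 : radial distortion coefficients
    """
    r2 = np.sum(u**2, axis=1, keepdims=True)    # squared radius of each point
    factor = 1.0 + k1 * r2 + k2 * r2**2         # radial scaling factor
    return u * factor

# Distort a few normalized points, then map them through assumed intrinsics
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
u = np.array([[0.1,  0.05],
              [0.3, -0.20]])
ud = radial_distort(u, k1=-0.25, k2=0.05)       # barrel-like distortion
x = (K @ np.hstack([ud, np.ones((2, 1))]).T).T  # to homogeneous pixel coordinates
print(x[:, :2])
```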

Lens Effects (figure: correct image vs. image with darkened periphery). Vignetting and the cos⁴ law: more severe on wide-angle cameras.

Lens Effects. Vignetting and the cos⁴ law: a dampening with $\cos^4(\omega)$. http://software.canon-europe.com/files/documents/ef_lens_work_book_10_en.pdf

Image intensity. Sensor activation is linear:

$a(\mathbf{x}) = \int s(\lambda)\, r(\lambda, \mathbf{x})\, e(\lambda)\, d\lambda$

$s$ = sensor absorption spectrum, $r$ = reflectance spectrum of the object, $e$ = emission spectrum of the light source (attenuated by the atmosphere).

Image intensity. Sensor activation is linear, $a(\mathbf{x}) = \int s(\lambda)\, r(\lambda, \mathbf{x})\, e(\lambda)\, d\lambda$. However, most images are gamma corrected: $i(\mathbf{x}) = a(\mathbf{x})^{\gamma}$.
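
A minimal sketch of undoing the gamma correction under the slide's pure power-law model (real camera response curves are more complex, and the exponent value used here is only illustrative).

```python
import numpy as np

def linearize(i, gamma):
    """Invert the model i(x) = a(x)**gamma to recover the linear activation a(x).

    i     : image with values in [0, 1]
    gamma : assumed power-law exponent of the camera's gamma correction
    """
    return np.clip(i, 0.0, 1.0) ** (1.0 / gamma)

# Example: a synthetic gamma-corrected ramp, linearized again
a_true = np.linspace(0.0, 1.0, 5)    # linear sensor activations
i = a_true ** 0.45                   # assumed encoding exponent (illustrative)
a_est = linearize(i, gamma=0.45)
print(np.allclose(a_est, a_true))    # True
```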

Image intensity Sensor activation is linear Z a(x) = s( )r(, x)e( )d HVS handles a 10 10 dynamic range on Exposure time and aperture size also scale the activation. Unless we know all aspects of image formation, we cannot trust the absolute intensity value.

Colour Colour perception is done using three different activation functions http://publiclab.org/wiki/ndvi-plots-ir-kit

Colour. Colour perception is done using three different activation functions. Sensor activation is not colour: colour is an object property, i.e. a representation of the reflectance $r(\lambda, \mathbf{x})$. In order to estimate colour, we need to somehow compensate for the illumination.

Invariance Transformations. Two categories of nuisance factors for recognition/matching: varying illumination and varying camera pose.

Invariance Transformations. For matching we need either to know the changes (in illumination and camera pose), or an invariance transformation. Ideally, an invariance transformation should keep information intrinsic to the object, but remove all influence from the imaging process.

Invariance Transformations. Photometric invariance gives robustness to illumination changes; geometric invariance gives robustness to view changes.

Colour Constancy http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html

Colour Constancy. The colours we perceive are not the activations of the cones in the retina. Colour constancy is an attempt by the HVS to transform the retinal activation into a normalized (white) reference illumination. It is a complex process that takes place at many levels (retina, V2, …) and uses high-level information (e.g. known object colours). White balancing in cameras is a low-level technical equivalent.

Colour Constancy. Object-based colour transfer: if you know what you are looking at, you also know something about the illumination. Forssén & Moe, View Matching with Blob Features, JIVC 09.

Photometric invariance. Model the two inputs as scalings of a common underlying image: $I(\mathbf{x}) = k_1 I_0(\mathbf{x})$, $J(\mathbf{x}) = k_2 I_0(\mathbf{x})$. If the illumination changes, direct matching fails:

$\sum_{\mathbf{x}} |I(\mathbf{x}) - J(\mathbf{x})| = \text{large}$

We seek a function $f$ that is invariant to such scalings:

$\sum_{\mathbf{x}} |f(I(\mathbf{x})) - f(J(\mathbf{x}))| = \text{small}$

Photometric invariance. For cameras with a non-linear radiometric response (and e.g. gamma correction), or if two different cameras are used, we may use the affine model $I(\mathbf{x}) = k_1 I_0(\mathbf{x}) + k_0$. How should we choose $f$? We want $\sum_{\mathbf{x}} |f(I(\mathbf{x})) - f(J(\mathbf{x}))| = \text{small}$.

Photometric invariance. Mean subtraction, derivatives, and other DC-free linear filters remove a constant offset in intensity. Normalising a patch by e.g. its standard deviation removes scalings of the intensity. Affine invariance is obtained by combining both:

$f(I(\mathbf{x})) = (I(\mathbf{x}) - \mu_I)/\sigma_I, \qquad \mu_I = \text{mean of patch}, \quad \sigma_I = \text{std of patch}$
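
A minimal numpy sketch of this affine photometric normalisation; the small epsilon guarding against flat patches is my addition, not part of the slides.

```python
import numpy as np

def normalize_patch(patch, eps=1e-8):
    """Affine photometric normalisation: subtract the patch mean, divide by its std."""
    return (patch - patch.mean()) / (patch.std() + eps)

# Two patches related by an affine intensity change match after normalisation
rng = np.random.default_rng(0)
I0 = rng.random((16, 16))
I = 1.7 * I0            # illumination scaling k1
J = 0.6 * I0 + 0.2      # different scaling k2 plus offset k0
print(np.abs(I - J).sum())                                    # large
print(np.abs(normalize_patch(I) - normalize_patch(J)).sum())  # ~0
```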

Photometric invariance. Illustration (figure): two inputs $I(\mathbf{x})$ and $J(\mathbf{x})$, their gradients, normalised gradients, and normalised inputs.

Other colour spaces. Transform each pixel separately, moving the intensity change into one dimension, e.g. HSV space. In matching, V is then downweighted (or discarded).
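
A small sketch of that idea, using Python's standard colorsys conversion per pixel and simply dropping the V channel; keeping only H and S as the descriptor is an illustrative choice, not a prescription from the slides.

```python
import colorsys
import numpy as np

def hs_descriptor(rgb_patch):
    """Convert an RGB patch (values in [0, 1]) to HSV and keep only H and S,
    so pure intensity changes (which mainly affect V) are discarded."""
    h, w, _ = rgb_patch.shape
    hs = np.zeros((h, w, 2))
    for i in range(h):
        for j in range(w):
            hh, ss, _vv = colorsys.rgb_to_hsv(*rgb_patch[i, j])
            hs[i, j] = (hh, ss)
    return hs.ravel()

# A patch and a globally darkened copy give (nearly) the same H/S descriptor
rng = np.random.default_rng(1)
patch = rng.random((4, 4, 3))
darker = 0.5 * patch    # global intensity scaling
print(np.abs(hs_descriptor(patch) - hs_descriptor(darker)).max())  # ~0
```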

Other colour spaces Other popular choices are the CIE Lab space (left) and the Yuv space. Colour space changes do not handle changes in the illumination colour.

Geometric invariance. The geometric invariances we use make a locally planar assumption (see also today's paper). They can thus be described using homographies.

Geometric invariance. A homography is a transformation between points $\mathbf{x}$ on one plane and points $\mathbf{y}$ on another:

$\begin{bmatrix} y_1 \\ y_2 \\ 1 \end{bmatrix} = H\begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix}$

It has at most 8 degrees of freedom (dof), as $H$ and $kH$, $k \in \mathbb{R} \setminus \{0\}$, define the same transformation. See e.g. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision.
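
A two-line check of the scale ambiguity just mentioned; the homography values are arbitrary made-up numbers.

```python
import numpy as np

def apply_h(H, p):
    """Apply a homography to a 2D point p and dehomogenize."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

H = np.array([[1.1,  0.02,  5.0],
              [0.01, 0.9,  -3.0],
              [1e-4, 2e-4,  1.0]])
p = (120.0, 80.0)
print(apply_h(H, p))        # some mapped point
print(apply_h(3.0 * H, p))  # identical: H and kH define the same transformation
```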

Geometric invariance. A hierarchy of transformations:

scale + translation (3 dof): $\begin{bmatrix} s & 0 & t_1 \\ 0 & s & t_2 \\ 0 & 0 & 1 \end{bmatrix}$

similarity (4 dof, scale + translation + rotation): $\begin{bmatrix} c & -s & t_1 \\ s & c & t_2 \\ 0 & 0 & 1 \end{bmatrix}$

affine (6 dof, similarity + skew): $\begin{bmatrix} a_{11} & a_{12} & t_1 \\ a_{21} & a_{22} & t_2 \\ 0 & 0 & 1 \end{bmatrix}$

plane projective (8 dof, affine + foreshortening): $\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$

Canonical Frames. Also known as covariant frames or invariant frames. Resample patches to a canonical frame. Points come from e.g. the Harris detector or DoG maxima.

Canonical Frames After resampling matching is much easier!

Canonical Frames. Combinatorial issues: from Harris or DoG we get images full of keypoints. Using the points, we want to generate frames in both the reference and the query view and match them. We don't want to miss a combination in one of the images, but we don't want to generate too many combinations either.

Canonical Frames. Solutions: use each point as a reference point; restrict frame construction to the k-nearest neighbours in scale space (or in the image plane); remove duplicate groupings and reflections (see the sketch below).
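
A rough numpy sketch of the neighbour-restricted grouping idea; the pairing strategy and the value of k are illustrative assumptions, not the exact scheme used in the lecture or in Brown & Lowe.

```python
import numpy as np

def knn_groups(points, k=3):
    """For each keypoint, form candidate groups with its k nearest neighbours
    in the image plane, removing duplicate (unordered) pairs."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # exclude self-matches
    groups = set()
    for i in range(len(points)):
        for j in np.argsort(d[i])[:k]:               # k nearest neighbours of point i
            groups.add(tuple(sorted((i, int(j)))))   # unordered pair -> no duplicates
    return sorted(groups)

# Example: a handful of detected keypoint positions (made-up coordinates)
pts = np.array([[10.0, 12.0], [14.0, 15.0], [80.0, 40.0],
                [82.0, 44.0], [85.0, 38.0], [50.0, 90.0]])
print(knn_groups(pts, k=2))
```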

Summary. Photometric invariance is needed to handle changes in illumination; common approaches are colour constancy, other colour spaces, and normalisation. Geometric invariance handles changes in object pose; a locally planar assumption is very useful for geometric invariance.

Discussion. Questions/comments on the paper: M. Brown and D. Lowe, "Invariant Features from Interest Point Groups", BMVC 2002. PhD students only, round-robin scheduling.