
Structure from Motion. CS4670/CS5670 - Kevin Matzen - April 15, 2016. Video credit: Agarwal et al., Building Rome in a Day, ICCV 2009.

Roadmap. What we've seen so far: single view modeling (1 camera), stereo modeling (2 cameras), multi-view stereo (3+ cameras). How do we recover the camera parameters necessary for MVS?

Wednesday's Lecture: assume we are always given the camera calibration. [Figure: two calibrated cameras with focal lengths f1, f2 and poses T1, T2.]

Today's Lecture: assume we are never given the camera calibration. [Figure: the same two cameras with all parameters unknown.]

Calibration makes 3D reasoning possible! [Figure: stereo pair with focal lengths f1, f2 and baseline b triangulating a point.]

Today's outline: How can we calibrate our cameras? How can we calibrate a camera without photos of a calibration target? How can we automate this calibration at scale?

Projection Model. Some 3D world-space point maps to a 2D image-space projection. Calibration gives us the mapping between them.

Camera Calibration. [Figure: calibration target with known world coordinates, e.g. (0, 0, 0) and (10, 12, 0), and their image projections.]

DLT Method: set up a homogeneous linear system in the 12 entries of the projection matrix, with two equations per 2D-3D correspondence, and solve it with the SVD.

Question: Is a single plane enough? Assume the plane is at Z = 0 (rotate and translate coordinates to make it so). Then every term of the system that multiplies a z-coordinate vanishes: three columns of the DLT matrix are all zero, so its rank is at most 9. No, the calibration target cannot be planar with the DLT method. But we can combine many planes.
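As a concrete sketch (our own illustration, not from the lecture): given n >= 6 non-coplanar world points and their pixel projections, the DLT solve in numpy could look like this; `dlt_calibrate` is a hypothetical helper name.

```python
import numpy as np

def dlt_calibrate(points_3d, points_2d):
    """Estimate the 3x4 projection matrix P from n >= 6 world-image
    correspondences via the Direct Linear Transform (DLT).

    points_3d: (n, 3) array of world points (not all coplanar).
    points_2d: (n, 2) array of their pixel projections.
    """
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        X = [x, y, z, 1.0]
        # Each correspondence gives two linear equations in the 12
        # unknown entries of P (from u = P1.X / P3.X, v = P2.X / P3.X).
        rows.append(np.hstack([X, np.zeros(4), [-u * c for c in X]]))
        rows.append(np.hstack([np.zeros(4), X, [-v * c for c in X]]))
    A = np.array(rows)                      # (2n, 12)
    # The solution is the right singular vector with smallest singular
    # value -- the (approximate) null space of A.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    return P / np.linalg.norm(P)            # fix the arbitrary scale
```

With coplanar points the three z-columns of A are zero, exactly the rank deficiency argued above.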

Non-Linear Method. The DLT method does not automatically give a decomposition into extrinsics and intrinsics. We may wish to impose additional constraints on the camera model (e.g. isotropic focal length, square pixels). Non-linearities such as radial distortion are not easily modeled with DLT.

The full projection model:

$$\begin{bmatrix} u_i w_i \\ v_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}$$

Reading right to left: a 3D world-space point is rotated and translated into camera space, then projected into the image plane.

Let's work through a simpler 2D version (a 2D world point with a 1D image projection):

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$

Multiplying out the extrinsics,

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos(\theta)x_i - \sin(\theta)y_i + t_x \\ \sin(\theta)x_i + \cos(\theta)y_i + t_y \end{bmatrix}$$

then the intrinsics,

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y) \\ \sin(\theta)x_i + \cos(\theta)y_i + t_y \end{bmatrix}$$

and finally dividing by $w_i$,

$$u_i = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

Call the right-hand side $h$ and define a sum-of-squares loss over the observed projections $u_i$:

$$h(f,c,\theta,t_x,t_y,x_i,y_i) = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

$$L(f,c,\theta,t_x,t_y) = \sum_i \left(u_i - h(f,c,\theta,t_x,t_y,x_i,y_i)\right)^2$$

$$\operatorname*{argmin}_{f,c,\theta,t_x,t_y} L(f,c,\theta,t_x,t_y)$$

Apply a non-linear optimization method. Exercise: derive $\frac{\partial L}{\partial f}$, $\frac{\partial L}{\partial c}$, $\frac{\partial L}{\partial \theta}$, $\frac{\partial L}{\partial t_x}$, $\frac{\partial L}{\partial t_y}$.
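In practice one rarely derives those gradients by hand; a library optimizer will do the job. A minimal sketch, assuming SciPy is available (the function names and the synthetic-data check are our own, not from the lecture):

```python
import numpy as np
from scipy.optimize import least_squares

def h(params, x, y):
    """The toy 2D projection h(f, c, theta, tx, ty, x_i, y_i) from above."""
    f, c, th, tx, ty = params
    num = f * (np.cos(th) * x - np.sin(th) * y + tx) \
        + c * (np.sin(th) * x + np.cos(th) * y + ty)
    den = np.sin(th) * x + np.cos(th) * y + ty
    return num / den

def residuals(params, x, y, u_obs):
    # least_squares minimizes the sum of squared residuals, i.e. L.
    return u_obs - h(params, x, y)

# Synthetic check: project points with known parameters, then recover
# them starting from a perturbed initial guess.
rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20)
true = np.array([2.0, 0.1, 0.3, 0.2, 3.0])   # f, c, theta, tx, ty
u_obs = h(true, x, y)
fit = least_squares(residuals, true + 0.1, args=(x, y, u_obs))
print(fit.x)   # close to `true`; a good initialization matters
```

Note that non-linear least squares only finds the local minimum nearest the starting point, which is why the initialization question keeps coming back below.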

What if we don't have a target? The world is our calibration target! But we don't know the position of all points in the world.

Structure from Motion. Key goals of SfM: use approximate camera calibrations to match features and triangulate approximate 3D points; use those approximate 3D points to improve the approximate camera calibrations. It's a chicken-and-egg problem. We can extend and use our non-linear optimization framework, but it requires a good initialization.

SfM building blocks. What do we need from our CV toolbox? Keypoint detection, descriptor matching, F-matrix estimation, ray triangulation, camera projection, and non-linear optimization. Useful metadata: a focal length guess (from EXIF tags).

Given: images 1 and 2 and focal length guesses.

1. Compute feature matches and the F-matrix.
2. Use the approximate $K$'s to get the E-matrix: $E = K_2^T F K_1$.
3. Decompose $E$ into a relative pose: $E = R[t]_\times$.
4. Triangulate features.
5. Apply non-linear optimization.
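A hedged sketch of steps 1-4 with OpenCV (our assumptions: ORB features stand in for whichever detector the lecture used, and `recoverPose` is handed a single intrinsics matrix, a simplification when the two cameras differ):

```python
import cv2
import numpy as np

def two_view_bootstrap(img1, img2, K1, K2):
    """Match features, fit F, upgrade to E, recover the relative
    pose, and triangulate an initial point cloud."""
    # 1. Detect keypoints and match descriptors.
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 1b. Robust F-matrix fit; RANSAC rejects bad matches.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    pts1, pts2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]

    # 2. Upgrade F to the essential matrix: E = K2^T F K1.
    E = K2.T @ F @ K1

    # 3. Decompose E into a relative pose (single-K simplification).
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K1)

    # 4. Triangulate the inlier matches into 3D points.
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
    return R, t, (X[:3] / X[3]).T   # dehomogenized point cloud
```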

Recall the single-camera objective:

$$L(f,c,\theta,t_x,t_y) = \sum_i \left(u_i - h(f,c,\theta,t_x,t_y,x_i,y_i)\right)^2$$

Now treat the point positions as unknowns too:

$$L(f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n)) = \sum_i \left(u_i - h(f,c,\theta,t_x,t_y,x_i,y_i)\right)^2$$

$$\operatorname*{argmin}_{f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n)} L(f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n))$$

This doesn't make sense for 1 camera. With $m$ cameras $K_1,\ldots,K_m$ and $n$ points,

$$L(K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n)) = \sum_i \sum_j w_{i,j}\left(u_i - h(K_j,(x_i,y_i))\right)^2$$

where $w_{i,j}$ selects which points are observed by which cameras.

$$\operatorname*{argmin}_{K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n)} L(K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n))$$

This is called Bundle Adjustment.
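A sketch of this objective for the toy 2D camera, assuming SciPy; the packing conventions and names (`ba_residuals`, `cam_idx`, `pt_idx`) are ours. Listing only the observed (camera, point) pairs plays the role of $w_{i,j}$:

```python
import numpy as np
from scipy.optimize import least_squares

def ba_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs):
    """Reprojection residuals for toy 2D bundle adjustment: each
    observation k compares obs[k] against h(K_j, (x_i, y_i)) with
    j = cam_idx[k] and i = pt_idx[k]."""
    cams = params[:5 * n_cams].reshape(n_cams, 5)   # f, c, theta, tx, ty
    pts = params[5 * n_cams:].reshape(n_pts, 2)     # x_i, y_i
    f, c, th, tx, ty = cams[cam_idx].T              # per-observation cameras
    x, y = pts[pt_idx].T                            # per-observation points
    num = f * (np.cos(th) * x - np.sin(th) * y + tx) \
        + c * (np.sin(th) * x + np.cos(th) * y + ty)
    den = np.sin(th) * x + np.cos(th) * y + ty
    return obs - num / den

# x0 stacks initial camera guesses and triangulated points; the
# initialization comes from the two-view bootstrap above.
# fit = least_squares(ba_residuals, x0,
#                     args=(n_cams, n_pts, cam_idx, pt_idx, obs))
```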

Camera sets can be incrementally built up: bootstrap two cameras with the essential matrix, then register each additional camera against the already-triangulated points with the Perspective-n-Point (PnP) method.
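Registering one new camera could look like the following OpenCV sketch (our own; `register_new_camera` is a hypothetical helper, and `pts3d`/`pts2d` are the 2D-3D matches between the model and the new image):

```python
import cv2
import numpy as np

def register_new_camera(pts3d, pts2d, K):
    """Estimate the pose of a new image from matches between its 2D
    features (pts2d) and already-triangulated 3D points (pts3d)."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d), np.float32(pts2d), K, None)  # None: no distortion
    if not ok:
        raise RuntimeError("PnP failed; need more 2D-3D matches")
    R, _ = cv2.Rodrigues(rvec)   # axis-angle -> rotation matrix
    return R, tvec               # pose of the new camera
```

Each newly registered camera lets us triangulate new points, which in turn let us register more cameras, with bundle adjustment interleaved to keep errors from accumulating.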

Dubrovnik - Incremental Bundle Adjustment

Dubrovnik

Sacré-Cœur

SfM Ambiguities. A camera $P$ projects a point $X$ to $x$:

$$x = PX$$
$$x = (PQ)(Q^{-1}X)$$
$$x = K(TQ)(Q^{-1}X)$$

$T$ is a rigid-body transformation (RBT). If we want $TQ$ to be an RBT, then $Q$ can be an RBT → if we rotate and translate all the points, everything works out if we rotate and translate all the cameras.

SfM Ambiguities. Now insert a uniform scaling $S$ instead:

$$x = (PS^{-1})(SX)$$
$$x = K(TS^{-1})(SX)$$
$$x = K(S^{-1}TT')(SX)$$
$$x = (KS^{-1})(TT')(SX)$$
$$x = (S^{-1}K)(TT')(SX)$$
$$Sx = K(TT')(SX)$$

for a suitable rigid transform $T'$. But in homogeneous coordinates $Sx$ is the same image point:

$$Sx = \begin{bmatrix} suw \\ svw \\ sw \end{bmatrix} \equiv \begin{bmatrix} uw \\ vw \\ w \end{bmatrix} = x$$

→ if we scale all the points, everything works out if we move the camera positions.

SfM Ambiguities. Putting the two together:

$$x = PX = (PQ)(Q^{-1}X)$$

In this case $Q$ is a general similarity transform. We often resolve the ambiguity by placing one camera at the origin facing some direction and a second camera at a fixed offset from the first.
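A quick numeric check of the ambiguity (our own illustration, not from the lecture): transform every point by $Q^{-1}$ and every camera by $Q$, and all projections are unchanged.

```python
import numpy as np

# Gauge ambiguity check: (PQ)(Q^-1 X) projects to the same x as PX.
rng = np.random.default_rng(1)
P = rng.normal(size=(3, 4))                    # an arbitrary camera
X = np.append(rng.normal(size=3), 1.0)         # homogeneous 3D point
s, R, t = 2.0, np.eye(3), np.array([[1.0], [2.0], [3.0]])
Q = np.block([[s * R, t],                      # a similarity transform
              [np.zeros((1, 3)), np.ones((1, 1))]])
x1 = P @ X
x2 = (P @ Q) @ (np.linalg.inv(Q) @ X)
print(np.allclose(x1, x2))                     # True
```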

Applications

Internet-scale 3D

Snavely et al., Finding Paths through the World's Photos. SIGGRAPH 2008.