Structure from Motion CS4670/CS5670 - Kevin Matzen - April 15, 2016 Video credit: Agarwal et al. Building Rome in a Day, ICCV 2009
Roadmap What we've seen so far: single-view modeling (1 camera), stereo modeling (2 cameras), multi-view stereo (3+ cameras). How do we recover the camera parameters necessary for MVS?
Wednesday's Lecture Assume we are always given the camera calibration. (figure: two calibrated cameras with focal lengths f1, f2 and poses T1, T2)
Today's Lecture Assume we are never given the camera calibration.
Calibration makes 3D reasoning possible! (figure: two-view geometry with focal lengths f1, f2, depth z, and baseline b)
Today's outline: How can we calibrate our cameras? How can we calibrate a camera without photos of a calibration target? How can we automate this calibration at scale?
Projection Model
Projection Model Calibration gives us the mapping from some 3D world-space point to its 2D image-space projection
Camera Calibration
Camera Calibration (figure: calibration target with known 3D point coordinates, e.g. (10, 12, 0), measured relative to a chosen origin (0, 0, 0))
DLT Method: given correspondences between known 3D world points and their 2D projections, solve linearly for the 12 entries of the projection matrix P (Direct Linear Transform)
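The DLT equations on these slides did not survive extraction, so here is a minimal numpy sketch of the standard formulation: each 2D-3D correspondence contributes two linear equations in the 12 entries of P, and P is recovered (up to scale) as the right singular vector of the stacked system with the smallest singular value. The function name and the synthetic test camera are illustrative choices, not from the slides.

```python
import numpy as np

def dlt_calibrate(pts3d, pts2d):
    """Estimate the 3x4 projection matrix P from 2D-3D correspondences.

    Each correspondence gives two linear equations in the 12 entries
    of P; the solution (up to scale) is the right singular vector of A
    with the smallest singular value. Needs 6+ non-coplanar points.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

# Synthetic check: project random points with a known P, then recover P.
rng = np.random.default_rng(0)
P_true = np.array([[800., 0., 320., 10.],
                   [0., 800., 240., 20.],
                   [0., 0., 1., 5.]])
pts3d = rng.uniform(-1.0, 1.0, (10, 3))
proj = (P_true @ np.c_[pts3d, np.ones(10)].T).T
pts2d = proj[:, :2] / proj[:, 2:]
P_est = dlt_calibrate(pts3d, pts2d)
P_est *= P_true[2, 3] / P_est[2, 3]   # fix the arbitrary projective scale
```

Since homogeneous projection is only defined up to scale, the last line normalizes the estimate before comparing it to the ground-truth matrix.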
Question: Is a single plane enough?
Question: Is a single plane enough? Assume the plane is at Z = 0 (rotate and translate coordinates to make it so)
Question: Is a single plane enough? With Z = 0, the columns of the DLT matrix that multiply p13, p23, p33 are all 0 → rank is at most 9. No, the calibration target cannot be planar with the DLT method. But we can combine many planes.
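The rank claim is easy to check numerically: build the DLT system from correspondences whose 3D points all lie on Z = 0 and inspect its rank. The camera matrix and point counts below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[700., 0., 320., 1.],
              [0., 700., 240., 2.],
              [0., 0., 1., 4.]])

# Many correspondences, but every 3D point lies on the plane Z = 0.
pts3d = np.c_[rng.uniform(-1, 1, (20, 2)), np.zeros(20)]
proj = (P @ np.c_[pts3d, np.ones(20)].T).T
pts2d = proj[:, :2] / proj[:, 2:]

rows = []
for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
    rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
    rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
A = np.asarray(rows)

# The three columns multiplying p13, p23, p33 are identically zero,
# so rank(A) stays at most 9 no matter how many planar points we add.
rank = np.linalg.matrix_rank(A)
```

With 12 unknowns and rank at most 9, the null space is too large to pin down P, which is exactly why a single planar target fails for DLT.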
Non-Linear Method The DLT method does not automatically give a decomposition into extrinsics and intrinsics We may wish to impose additional constraints on the camera model (e.g. isotropic focal length, square pixels) Non-linearities such as radial distortion are not easily modeled with DLT
The full projection model:

$$\begin{bmatrix} u_i w_i \\ v_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}$$

Reading right to left: a 3D world-space point is rotated and translated into camera space, then projected onto the image plane. Let's work through a simpler 2D version:

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$
A 2D point maps to a 1D projection:

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$$

Multiplying out:

$$\begin{bmatrix} u_i w_i \\ w_i \end{bmatrix} = \begin{bmatrix} f & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos(\theta)x_i - \sin(\theta)y_i + t_x \\ \sin(\theta)x_i + \cos(\theta)y_i + t_y \end{bmatrix}$$

$$u_i w_i = f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y), \qquad w_i = \sin(\theta)x_i + \cos(\theta)y_i + t_y$$

Dividing out $w_i$:

$$u_i = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$
$$h(f,c,\theta,t_x,t_y,x_i,y_i) = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

$$L(f,c,\theta,t_x,t_y) = \sum_i (u_i - h(f,c,\theta,t_x,t_y,x_i,y_i))^2$$

$$\operatorname*{argmin}_{f,c,\theta,t_x,t_y} L(f,c,\theta,t_x,t_y)$$

Apply a non-linear optimization method. Exercise: derive $\partial L/\partial f$, $\partial L/\partial c$, $\partial L/\partial \theta$, $\partial L/\partial t_x$, $\partial L/\partial t_y$.
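As a concrete sketch of "apply a non-linear optimization method," here is the 2D toy model minimized by Gauss-Newton with a forward-difference Jacobian. The function names, the choice of Gauss-Newton, and the synthetic parameter values are assumptions for illustration; the slides do not prescribe a particular solver.

```python
import numpy as np

def h(params, pts):
    """Toy 1D projection of 2D points: rotate/translate, then pinhole."""
    f, c, th, tx, ty = params
    x, y = pts[:, 0], pts[:, 1]
    num = f * (np.cos(th)*x - np.sin(th)*y + tx) + c * (np.sin(th)*x + np.cos(th)*y + ty)
    den = np.sin(th)*x + np.cos(th)*y + ty
    return num / den

def calibrate(pts, u_obs, params0, iters=50, eps=1e-6):
    """Gauss-Newton on L = sum_i (u_i - h(params, p_i))^2 with a numeric Jacobian."""
    p = np.asarray(params0, float)
    for _ in range(iters):
        r = u_obs - h(p, pts)
        J = np.empty((len(r), len(p)))
        for k in range(len(p)):
            dp = p.copy()
            dp[k] += eps
            J[:, k] = (h(dp, pts) - h(p, pts)) / eps
        step, *_ = np.linalg.lstsq(J, r, rcond=None)  # solve J step ~= r
        p = p + step
    return p

# Synthetic data from known parameters, then recover from a perturbed start.
true = np.array([2.0, 0.1, 0.3, 0.5, 4.0])   # f, c, theta, tx, ty
rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, (30, 2))
u_obs = h(true, pts)
est = calibrate(pts, u_obs, true + 0.05)
```

Like most non-linear least-squares methods, this only converges from a reasonable initialization, which foreshadows why SfM needs a good starting guess.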
What if we don't have a target? The world is our calibration target! But we don't know the position of all points in the world.
Structure from Motion Key goals of SfM: Use approximate camera calibrations to match features and triangulate approximate 3D points Use approximate 3D points to improve the approximate camera calibrations A chicken-and-egg problem We can extend and use our non-linear optimization framework, but it requires a good initialization
SfM building blocks What do we need from our CV toolbox? Keypoint detection Descriptor matching F-matrix estimation Ray triangulation Camera projection Non-linear optimization Useful metadata Focal length guess (EXIF tags)
Given: images 1 and 2, and focal length guesses
1. Compute feature matches and the F-matrix
2. Use approximate K's to get the E-matrix: $E = K_2^T F K_1$
3. Decompose E into relative pose: $E = R[t]_\times$
4. Triangulate features
5. Apply non-linear optimization
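Steps 2 and 4 above can be sketched directly in numpy: the essential matrix is $E = K_2^T F K_1$, and linear (DLT-style) triangulation recovers a 3D point as the null vector of a small system built from both views. The helper names and the synthetic two-camera setup are illustrative, not from the slides.

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    # Step 2: E = K2^T F K1, using approximate intrinsics (e.g. from EXIF).
    return K2.T @ F @ K1

def triangulate(P1, P2, x1, x2):
    """Step 4: linear triangulation of one correspondence.

    Each view contributes two rows (u * P[2] - P[0], v * P[2] - P[1]);
    the homogeneous 3D point X is the null vector of the stacked system.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Synthetic check: two cameras observing a known 3D point.
K = np.diag([500., 500., 1.])
K[0, 2], K[1, 2] = 320., 240.
P1 = K @ np.c_[np.eye(3), np.zeros(3)]            # camera 1 at the origin
P2 = K @ np.c_[np.eye(3), np.array([-1., 0., 0.])]  # camera 2 offset along x
X_true = np.array([0.3, -0.2, 4.0])
x1h = P1 @ np.append(X_true, 1.0)
x2h = P2 @ np.append(X_true, 1.0)
X_est = triangulate(P1, P2, x1h[:2] / x1h[2], x2h[:2] / x2h[2])
```

With noisy matches the triangulated points are only approximate, which is why step 5 refines everything jointly.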
$$h(f,c,\theta,t_x,t_y,x_i,y_i) = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

$$L(f,c,\theta,t_x,t_y) = \sum_i (u_i - h(f,c,\theta,t_x,t_y,x_i,y_i))^2$$

$$\operatorname*{argmin}_{f,c,\theta,t_x,t_y} L(f,c,\theta,t_x,t_y)$$
$$h(f,c,\theta,t_x,t_y,x_i,y_i) = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

$$L(f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n)) = \sum_i (u_i - h(f,c,\theta,t_x,t_y,x_i,y_i))^2$$

$$\operatorname*{argmin}_{f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n)} L(f,c,\theta,t_x,t_y,(x_1,y_1),\ldots,(x_n,y_n))$$

Doesn't make sense for 1 camera
$$h(f,c,\theta,t_x,t_y,x_i,y_i) = \frac{f(\cos(\theta)x_i - \sin(\theta)y_i + t_x) + c(\sin(\theta)x_i + \cos(\theta)y_i + t_y)}{\sin(\theta)x_i + \cos(\theta)y_i + t_y}$$

$$L(K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n)) = \sum_i \sum_j w_{i,j}\,(u_{i,j} - h(K_j,(x_i,y_i)))^2$$

$$\operatorname*{argmin}_{K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n)} L(K_1,\ldots,K_m,(x_1,y_1),\ldots,(x_n,y_n))$$

Called Bundle Adjustment
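The bundle-adjustment objective above can be written out directly: a weighted sum over all (point, camera) pairs, where $w_{i,j} = 0$ for points a camera does not see. This is a minimal sketch of the objective only (not a solver), using the toy 1D-camera model from the earlier slides; the function names and numbers are illustrative assumptions.

```python
import numpy as np

def h(cam, pt):
    """Toy 1D projection of a 2D point (same model as the earlier slides)."""
    f, c, th, tx, ty = cam
    x, y = pt
    den = np.sin(th)*x + np.cos(th)*y + ty
    return (f*(np.cos(th)*x - np.sin(th)*y + tx) + c*den) / den

def bundle_loss(cams, pts, obs):
    """L = sum_{i,j} w_ij (u_ij - h(K_j, p_i))^2.

    `obs` maps (point index i, camera index j) -> observed u_ij;
    missing entries act as w_ij = 0 (point not seen by that camera).
    """
    return sum((u - h(cams[j], pts[i]))**2 for (i, j), u in obs.items())

# Two cameras, three points, full visibility; both the camera parameters
# and the point coordinates are unknowns in the real optimization.
cams = np.array([[2.0, 0.0, 0.0, 0.0, 4.0],
                 [2.0, 0.0, 0.4, 1.0, 4.0]])
pts = np.array([[0.5, 1.0], [-0.3, 0.8], [0.1, 1.5]])
obs = {(i, j): h(cams[j], pts[i]) for i in range(3) for j in range(2)}
loss = bundle_loss(cams, pts, obs)   # zero at the true parameters
```

In practice this objective is minimized with sparse non-linear least-squares solvers, since the Jacobian couples each observation to only one camera and one point.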
Camera sets can be incrementally built up Essential matrix
Camera sets can be incrementally built up Perspective-n-Point (PnP) method
Dubrovnik - Incremental Bundle Adjustment
Dubrovnik
Sacré-Cœur
SfM Ambiguities

$$x = PX$$
$$x = (PQ)(Q^{-1}X)$$
$$x = K(TQ)(Q^{-1}X)$$

T is a rigid-body transformation. If we want TQ to be an RBT, then Q can be an RBT → if we rotate and translate all the points, everything works out as long as we also rotate and translate all the cameras.
SfM Ambiguities

$$x = PX$$
$$x = (PS^{-1})(SX)$$
$$x = K(TS^{-1})(SX)$$
$$x = K(S^{-1}TT')(SX)$$
$$x = (KS^{-1})(TT')(SX)$$
$$x = (S^{-1}K)(TT')(SX)$$
$$Sx = K(TT')(SX)$$

$$Sx = \begin{bmatrix} suw \\ svw \\ sw \end{bmatrix} = \begin{bmatrix} uw \\ vw \\ w \end{bmatrix} = x \quad \text{(in homogeneous coordinates)}$$

→ If we scale all the points, everything works out as long as we move the camera positions.
SfM Ambiguities $x = PX = (PQ)(Q^{-1}X)$ In this case Q is a general similarity transform. We often resolve the ambiguity by placing one camera at the origin facing some direction and a second camera at a fixed offset from the first.
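The gauge ambiguity above is easy to verify numerically: inserting any invertible $Q$ between camera and points leaves every projection unchanged. The camera, point, and rigid transform below are arbitrary illustrative values.

```python
import numpy as np

# Any camera P and homogeneous 3D point X.
K = np.diag([500., 500., 1.])
P = K @ np.c_[np.eye(3), np.array([0., 0., 2.])]
X = np.array([0.2, -0.1, 3.0, 1.0])

# A 4x4 rigid-body transform Q: rotation about z plus a translation.
th = 0.7
Q = np.eye(4)
Q[:3, :3] = [[np.cos(th), -np.sin(th), 0.],
             [np.sin(th),  np.cos(th), 0.],
             [0., 0., 1.]]
Q[:3, 3] = [1.0, -2.0, 0.5]

x1 = P @ X                                # original projection
x2 = (P @ Q) @ (np.linalg.inv(Q) @ X)     # transformed cameras and points
```

Since `x1` and `x2` agree, reprojection error alone cannot distinguish the two reconstructions, which is why SfM must fix the gauge by convention.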
Applications
Internet-scale 3D
Snavely et al. Finding Paths through the World's Photos. SIGGRAPH 2008.