Visual SLAM Tutorial: Bundle Adjustment

Frank Dellaert
June 27, 2014

1 Minimizing Re-projection Error in Two Views

In a two-view setting, we are interested in finding the most likely camera poses T^w_1 and T^w_2, both in the space SE(3) of 3D poses, and 3D points P^w_j ∈ R³, with j ∈ 1..n. We can do so by minimizing the re-projection errors

$$ f(T^w_c, P^w; p) = \left\| h(T^w_c, P^w) - p \right\|^2_{\Sigma} $$

associated with each measurement p ∈ R². Above, the measurement function h : SE(3) × R³ → R² predicts the measurement p from the unknowns T^w_c = (R^w_c, t^w_c) and P^w, and the notation ||e||²_Σ ≜ eᵀ Σ⁻¹ e is shorthand for the squared Mahalanobis distance with covariance Σ. The objective function is then the negative log-likelihood of the unknowns given all such measurements p in the two views. Under Gaussian noise assumptions, this is exactly

$$ E(T^w_1, T^w_2, \{P_j\}) = \sum_{i=1,2} \sum_{j} \left\| h(T^w_i, P^w_j) - p_{ij} \right\|^2_{\Sigma} \tag{1} $$

2 In a Linear World...

In a linear world, the measurement function h would be a linear function of the unknowns. For reasons that will become clear, let us introduce a 6-DOF unknown ξ_i associated with camera i, and a 3-DOF unknown δ_j associated with point j. Then linearity implies

$$ \hat{p}_{ij} = h_{ij}(\xi_i, \delta_j) = F_{ij}\,\xi_i + G_{ij}\,\delta_j $$

and the objective function becomes

$$ E(\xi_1, \xi_2, \{\delta_j\}) = \sum_{i=1,2} \sum_{j} \left\| F_{ij}\,\xi_i + G_{ij}\,\delta_j - b_{ij} \right\|^2_{\Sigma} \tag{2} $$

where we also introduced b_ij, the linear measurement. We can then easily re-write the objective function (2) as

$$ E(\xi_1, \xi_2, \{\delta_j\}) = \left\| A x - b \right\|^2_{\Sigma} \tag{3} $$

with x and b vectorized forms of our unknowns and measurements, respectively, and Σ a suitably defined (block-diagonal) covariance matrix.
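
The squared Mahalanobis norm above is straightforward to evaluate in code. The following is a minimal Python/NumPy sketch, assuming a user-supplied function project(T, P) that implements the measurement function h(T^w_c, P^w); the function names here are illustrative, not part of the tutorial.

    import numpy as np

    def mahalanobis_sq(e, Sigma):
        """Squared Mahalanobis norm ||e||^2_Sigma = e^T Sigma^{-1} e."""
        return float(e.T @ np.linalg.solve(Sigma, e))

    def reprojection_error(project, T, P, p, Sigma):
        """f(T, P; p) = ||h(T, P) - p||^2_Sigma for a single 2D measurement p."""
        return mahalanobis_sq(project(T, P) - p, Sigma)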

3 Sparse Matters

The matrix A will be a block-sparse matrix [Hartley and Zisserman, 2004]. For example, with 3 points we have

$$ A = \begin{bmatrix}
F_{11} &        & G_{11} &        &        \\
F_{12} &        &        & G_{12} &        \\
F_{13} &        &        &        & G_{13} \\
       & F_{21} & G_{21} &        &        \\
       & F_{22} &        & G_{22} &        \\
       & F_{23} &        &        & G_{23}
\end{bmatrix}, \quad
x = \begin{bmatrix} \xi_1 \\ \xi_2 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix}, \quad
b = \begin{bmatrix} b_{11} \\ b_{12} \\ b_{13} \\ b_{21} \\ b_{22} \\ b_{23} \end{bmatrix} $$

A linear least-squares criterion of the form (3) is easy to solve: we just take the derivative with respect to x and set it to zero, obtaining the condition

$$ \left( A^{\top} A \right) x = A^{\top} b \tag{4} $$

also known as the normal equations. In the above, the symmetric matrix AᵀA is known as the Hessian of the system. Because A is sparse, it is imperative, for the sake of computational efficiency, that a sparse solver is used to solve for x, e.g., GTSAM or CHOLMOD. The latter is built into MATLAB and is invoked automatically simply by typing x = A\b.

4 Back to Reality

In reality, the measurement function h(T^w_c, P^w) is quite non-linear, and generates the predicted measurement p̂ by first transforming the point P^w into camera coordinates P^c, as specified by the camera pose T^w_c, and then projecting the point so obtained into the image:

$$ \hat{p} = h(T^w_c, P^w) = \pi\!\left( g(T^w_c, P^w) \right) $$

In detail, the point P^c in camera coordinates is obtained by the coordinate transformation g, as in

$$ P^c = g(T^w_c, P^w) = (R^w_c)^{\top} \left( P^w - t^w_c \right) \tag{5} $$

Then the predicted image coordinates p̂ are obtained by the camera projection function π:

$$ \hat{p} = (\hat{x}, \hat{y}) = \pi(X^c, Y^c, Z^c) = \left( \frac{X^c}{Z^c}, \frac{Y^c}{Z^c} \right) \tag{6} $$

Above we assumed that the measurements p are pre-calibrated to normalized image coordinates p = (x, y), as we are interested in the setting where the calibration matrix K and the radial distortion function f(r) are known.

So how can we get back to the easy, linear setting? Maybe we can do a Taylor expansion, as in

$$ h(T^w_c + \xi, P^w + \delta) \approx h(T^w_c, P^w) + F\xi + G\delta. $$

However, here we run into trouble, as + is not defined for elements of SE(3). The next section introduces a more general view of the Taylor expansion that does work.
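
As a concrete illustration of the sparse solve, the following Python/SciPy sketch assembles the block-sparse A from given blocks F_ij, G_ij and right-hand sides b_ij, then solves the normal equations (4). SciPy's sparse solver stands in here for CHOLMOD or MATLAB's backslash; the function name and block layout are illustrative assumptions, not part of the tutorial.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def solve_normal_equations(blocks, n_cams, n_pts):
        """blocks: list of (i, j, F_ij (2x6), G_ij (2x3), b_ij (2-vector)).
        Assumes the blocks are already whitened by Sigma, as in Eq. (4)."""
        n_cols = 6 * n_cams + 3 * n_pts
        rows, rhs = [], []
        for i, j, F, G, b_ij in blocks:
            row = sp.lil_matrix((2, n_cols))
            row[:, 6 * i:6 * i + 6] = F                            # camera block
            row[:, 6 * n_cams + 3 * j:6 * n_cams + 3 * j + 3] = G  # point block
            rows.append(row)
            rhs.append(b_ij)
        A = sp.vstack(rows).tocsr()
        b = np.concatenate(rhs)
        return spla.spsolve((A.T @ A).tocsc(), A.T @ b)            # (A^T A) x = A^T b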

5 A Twist of Li(m)e

The notion of an incremental update to an element a ∈ M on a manifold is provided by the notion of a tangent space at a, denoted T_a M, which is a vector space R^n with n equal to the dimensionality of the manifold. Any vector x ∈ T_a M in the tangent space can be mapped back to the manifold, with x = 0 mapping to a itself. Hence, vectors x correspond to updates around a, and this provides a way for our linearized problem to update the solution on the manifold [Absil et al., 2007].

In the special case that the manifold is also a group, i.e., a Lie group, the mapping is provided by the exponential map. In what follows, I use the notation from [Murray et al., 1994], directly specialized to SE(3): a vector ξ ∈ R⁶ is mapped to an element of the Lie algebra se(3), in this case a 4 × 4 matrix, which is then mapped to a Lie group element by matrix exponentiation:

$$ T^w_c \leftarrow T^w_c \exp\left( \hat{\xi} \right) $$

where the hat operator ˆ is the mapping from R⁶ to se(3):

$$ \xi = \begin{bmatrix} \omega \\ v \end{bmatrix} \theta \;\longrightarrow\; \hat{\xi} = \begin{bmatrix} [\omega]_{\times} & v \\ 0 & 0 \end{bmatrix} \theta $$

For SE(3) the 6-dimensional quantity (ω, v) is also called a twist, whereas θ ∈ R⁺ is the amount of twist applied, resulting in a finite screw motion. In general, the elements of the vector space isomorphic to the Lie algebra are called canonical coordinates. More details about the Lie algebra se(3) and its corresponding exponential map can be found in Appendix A.

6 A Generalized Taylor Expansion

Equipped with this notion, we can generalize the Taylor expansion to locally approximate a function f : G → R^m, with G an n-dimensional Lie group, as follows:

$$ f\!\left( a\, e^{\hat{\xi}} \right) \approx f(a) + f'(a)\, \xi $$

where ξ̂ ∈ g, the Lie algebra of G, and f'(a) is an m × n Jacobian matrix, linearly mapping the n-dimensional exponential coordinates ξ to m-dimensional corrections on f(a). Note that f'(a) is itself a function of a, i.e., the linear mapping is in general different for different values of a.

In the image projection case, we similarly need to generalize the notion of partial derivatives, to approximate the two-argument measurement function h:

$$ h\!\left( T^w_c\, e^{\hat{\xi}},\, P^w + \delta \right) \approx h(T^w_c, P^w) + h^{[1]}(T^w_c, P^w)\, \xi + h^{[2]}(T^w_c, P^w)\, \delta $$

Above, h^[1](·) and h^[2](·) are the 2 × 6 and 2 × 3 Jacobian matrices corresponding to the partial derivatives of h in its first and second arguments, respectively. Note that the incremental pose exp(ξ̂) is applied on the camera side, and hence the exponential coordinates ξ = (ω, v) are expressed in the camera frame.
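
For concreteness, here is a minimal Python/NumPy sketch of the hat operator and the update T ← T exp(ξ̂). It uses scipy.linalg.expm for the matrix exponential rather than the closed form of Appendix A, and the function names hat and retract are illustrative assumptions.

    import numpy as np
    from scipy.linalg import expm

    def hat(xi):
        """Map xi = (omega, v) in R^6 to the 4x4 twist matrix in se(3)."""
        wx, wy, wz, vx, vy, vz = xi
        return np.array([[0., -wz,  wy, vx],
                         [wz,  0., -wx, vy],
                         [-wy, wx,  0., vz],
                         [0.,  0.,  0., 0.]])

    def retract(T, xi):
        """Right-multiplicative update T <- T exp(xi^)."""
        return T @ expm(hat(xi))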

7 Nonlinear Optimization

By defining F_ij = h^[1](T^w_i, P^w_j), G_ij = h^[2](T^w_i, P^w_j), and the prediction error at the linearization point, b_ij = p_ij − h(T^w_i, P^w_j), we recover exactly the linear objective function in (2), which we can minimize for the optimal exponential coordinates ξ_i and δ_j. Gauss-Newton nonlinear optimization simply iterates between linearization, solving, and updating, until a predefined convergence criterion is satisfied:

1. Start with a good initial estimate θ⁰ = (T^w_1, T^w_2, {P_j}).

2. Linearize (1) around θ^{it} to obtain A and b.

3. Solve for x = (ξ_1, ξ_2, {δ_j}) using the normal equations (4)

$$ \left( A^{\top} A \right) x = A^{\top} b $$

where AᵀA is the Gauss-Newton approximation to the Hessian of the actual non-linear system.

4. Update the nonlinear estimate θ^{it+1} using the exponential map:

a) T^w_1 ← T^w_1 exp(ξ̂_1)
b) T^w_2 ← T^w_2 exp(ξ̂_2)
c) P_j ← P_j + δ_j

5. If not converged, go to 2.

8 Too Much Freedom!

A problem that will pop up when using Gauss-Newton for the two-view case is that the Hessian AᵀA will be singular. The reason is that there is a so-called gauge freedom: the solution is not uniquely determined, as we can displace the cameras by an arbitrary 7-DOF similarity transform without affecting the objective function. There are several ways to get around this:

- Switch to a minimal representation of a scale-free relative transform between the two cameras. This works well for two cameras but is not so easy to apply in the multi-camera case.

- Use the so-called inner constraints from photogrammetry, which constrain the centroid and scale of the solution. This generalizes to multiple cameras, but a downside is that the inner constraints destroy some of the sparsity of the system, which affects computational efficiency.

- Add prior terms on the first pose and on the distance of the second pose from the first (e.g., unity). This works well and scales to multiple cameras, but in a monocular setting there will still be scale drift.

- Fuse in information from other sensors, e.g., an IMU and/or GPS.

In the case of a calibrated stereo rig the problem of scale disappears, but one still has to put a prior on, or clamp down, the first pose in order to eliminate the gauge freedom. A common but inadvisable way to make the problem disappear is to use the technique from the next section.
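
The loop in Section 7 is short enough to sketch in full. The Python sketch below assumes a hypothetical linearize(poses, points) that returns the sparse A and stacked b of Eq. (3) for the current estimate (including, in the two-view case, a gauge-fixing prior as discussed above, since otherwise AᵀA is singular), and re-uses the retract helper from the earlier sketch; none of these names come from the tutorial itself.

    import numpy as np
    import scipy.sparse.linalg as spla

    def gauss_newton(poses, points, linearize, retract, max_iters=20, tol=1e-8):
        """Iterate linearize / solve / update until the step size is small."""
        n_c = len(poses)
        for _ in range(max_iters):
            A, b = linearize(poses, points)                   # step 2
            x = spla.spsolve((A.T @ A).tocsc(), A.T @ b)      # step 3: normal equations (4)
            for i in range(n_c):                              # steps 4a), 4b): poses on SE(3)
                poses[i] = retract(poses[i], x[6 * i:6 * i + 6])
            for j in range(len(points)):                      # step 4c): points in R^3
                points[j] = points[j] + x[6 * n_c + 3 * j:6 * n_c + 3 * j + 3]
            if np.linalg.norm(x) < tol:                       # step 5: convergence check
                break
        return poses, points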

9 A Matter of Trust

When far from the solution, the Hessian approximation AᵀA might be valid only in a very small region, and taking the quadratic steps given by the normal equations can lead to divergence. A well-known way to deal with this issue is to limit the steps based on the amount of trust we have in the current linearization, switching to a more cautious, gradient-descent-like approach when far from the local minimum. The most well-known of these so-called trust-region methods is Levenberg-Marquardt (LM), which solves

$$ \left( A^{\top} A + \lambda D \right) x = A^{\top} b, $$

where D is either the diagonal of AᵀA or the identity matrix. In the latter case, with λ high, the solution is given by x ≈ λ⁻¹ Aᵀb, which is gradient descent with step size λ⁻¹. When λ is zero or small, LM reverts to quadratic Gauss-Newton steps. Simple implementations of LM start with a high value of λ and gradually decrease it, backing off whenever a step does not decrease the error. More sophisticated variants put a lot more work into finding a good λ at every step. What LM is not designed for is masking out gauge freedoms, although in many implementations it is used that way: taking smaller steps when they are not needed only leads to slower convergence.
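
A minimal sketch of this damping strategy in Python/SciPy is shown below. It assumes the same hypothetical linearize function as before plus a cost(poses, points) that evaluates Eq. (1) and an apply_update helper, and it uses D = I for simplicity; it illustrates the λ schedule only, not a production LM implementation.

    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def lm_step(A, b, lam):
        """Solve the damped normal equations (A^T A + lambda I) x = A^T b."""
        H = (A.T @ A).tocsc()
        D = sp.identity(H.shape[0], format="csc")
        return spla.spsolve(H + lam * D, A.T @ b)

    def levenberg_marquardt(poses, points, linearize, cost, apply_update,
                            lam=1e4, max_iters=50):
        """Very simple LM: accept a step only if it decreases the error."""
        for _ in range(max_iters):
            A, b = linearize(poses, points)
            x = lm_step(A, b, lam)
            new_poses, new_points = apply_update(poses, points, x)
            if cost(new_poses, new_points) < cost(poses, points):
                poses, points = new_poses, new_points
                lam *= 0.1        # more trust: move toward Gauss-Newton
            else:
                lam *= 10.0       # back off: move toward gradient descent
        return poses, points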

10 Finally, the Jacobians...

To apply all of the above, one last task has to be undertaken, and that is deriving the Jacobians of the measurement function h. Because h itself is the composition of the coordinate transformation g followed by the projection π, the following chain rules apply:

$$ h^{[1]}(T^w_c, P^w) = \pi'(P^c)\, g^{[1]}(T^w_c, P^w) $$
$$ h^{[2]}(T^w_c, P^w) = \pi'(P^c)\, g^{[2]}(T^w_c, P^w) $$

We now need to compute three Jacobian matrices:

1. The 2 × 3 Jacobian matrix π'(X^c, Y^c, Z^c) of the projection function π is easily computed as

$$ \pi'(P^c) = \frac{1}{Z^c} \begin{bmatrix} 1 & 0 & -X^c/Z^c \\ 0 & 1 & -Y^c/Z^c \end{bmatrix} = \frac{1}{Z^c} \begin{bmatrix} 1 & 0 & -\hat{x} \\ 0 & 1 & -\hat{y} \end{bmatrix} \tag{7} $$

2. The 3 × 3 Jacobian matrix g^[2](T^w_c, P^w) of the coordinate transformation g with respect to a change δ in the point P^w is also easy to compute, as g is clearly linear in δ:

$$ g(T^w_c, P^w + \delta) = (R^w_c)^{\top}(P^w - t^w_c) + (R^w_c)^{\top}\delta \;\Longrightarrow\; g^{[2]}(T^w_c, P^w) = (R^w_c)^{\top} \tag{8} $$

3. The change in pose is a bit more challenging. It can be shown, however, that

$$ g\!\left( T^w_c\, e^{\hat{\xi}},\, P^w \right) \approx P^c + P^c \times \omega - v $$

This is linear in both ω and v, and hence the 3 × 6 Jacobian matrix g^[1](T^w_c, P^w) of g with respect to a change in pose easily follows as

$$ g^{[1]}(T^w_c, P^w) = \begin{bmatrix} [P^c]_{\times} & -I_3 \end{bmatrix} \tag{9} $$

In other words, the effect of rotating the camera frame is magnified by the lever arm of the point in camera coordinates, whereas moving the camera simply moves the point's coordinates in the opposite direction.

In summary, using Eqns. (7), (8), and (9), the derivatives of image projection are

$$ h^{[1]}(T^w_c, P^w) = \frac{1}{Z^c} \begin{bmatrix} 1 & 0 & -\hat{x} \\ 0 & 1 & -\hat{y} \end{bmatrix} \begin{bmatrix} [P^c]_{\times} & -I_3 \end{bmatrix} \tag{10} $$

$$ h^{[2]}(T^w_c, P^w) = \frac{1}{Z^c} \begin{bmatrix} 1 & 0 & -\hat{x} \\ 0 & 1 & -\hat{y} \end{bmatrix} (R^w_c)^{\top} \tag{11} $$

Note that, for rays near the optical axis, the lever arm of P^c is nearly canceled by the inverse-depth factor 1/Z^c in π'(P^c), i.e., we expect optical flow nearly proportional to the incremental rotation ω.

11 More Than One Camera

Many readers will have already figured out that the entire story above generalizes to many cameras almost at once. The general case is known as Structure from Motion (SfM) or bundle adjustment. The total re-projection error now sums over all measurements p_k:

$$ E\!\left( \{T^w_i\}, \{P_j\} \right) = \sum_{k} \left\| h(T^w_{i_k}, P^w_{j_k}) - p_k \right\|^2_{\Sigma} \tag{12} $$

where i_k and j_k encode the data association, assumed known here. Techniques such as Levenberg-Marquardt and obtaining a good initial estimate are of even greater importance in the many-camera case, as the objective function above is very non-linear, especially when points move close to the image planes and cross over to the wrong side, in which case local minimizers such as Gauss-Newton and LM will almost always converge to a local minimum.

Another often-used technique in batch SfM is the Schur complement trick, where in the linear step one first eliminates the point updates δ_j as a function of the pose updates {ξ_i}. However, the utility of that trick in an online, visual SLAM scenario is debatable - a discussion for a different part of the tutorial.
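
Since Eqs. (10) and (11) are the building blocks needed to assemble A for either the two-view objective (1) or the many-camera objective (12), here is a minimal Python/NumPy sketch of the measurement function and its two analytic Jacobians. The pose is passed as the pair (R^w_c, t^w_c) and the function names are illustrative assumptions.

    import numpy as np

    def skew(p):
        """[p]_x such that [p]_x q = p x q."""
        return np.array([[0., -p[2], p[1]],
                         [p[2], 0., -p[0]],
                         [-p[1], p[0], 0.]])

    def h_and_jacobians(R_wc, t_wc, P_w):
        """Return (p_hat, H1 (2x6), H2 (2x3)) following Eqs. (5)-(11)."""
        P_c = R_wc.T @ (P_w - t_wc)                         # Eq. (5): world -> camera
        Xc, Yc, Zc = P_c
        p_hat = np.array([Xc / Zc, Yc / Zc])                # Eq. (6): projection
        Dpi = (1.0 / Zc) * np.array([[1., 0., -Xc / Zc],
                                     [0., 1., -Yc / Zc]])   # Eq. (7)
        G1 = np.hstack([skew(P_c), -np.eye(3)])             # Eq. (9)
        H1 = Dpi @ G1                                       # Eq. (10)
        H2 = Dpi @ R_wc.T                                   # Eqs. (8), (11)
        return p_hat, H1, H2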

A 3D Rigid Transformations

The Lie group SE(3) is a subgroup of the general linear group GL(4) of 4 × 4 invertible matrices, consisting of those of the form

$$ T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} $$

where R ∈ SO(3) is a rotation matrix and t ∈ R³ is a translation vector. An alternative way of writing down elements of SE(3) is as the ordered pair (R, t), with composition defined as

$$ (R_1, t_1)(R_2, t_2) = (R_1 R_2,\; R_1 t_2 + t_1) $$

Its Lie algebra se(3) is the vector space of 4 × 4 twists ξ̂ parameterized by the twist coordinates ξ ∈ R⁶, with the mapping [Murray et al., 1994]

$$ \xi = \begin{bmatrix} \omega \\ v \end{bmatrix} \theta \;\longrightarrow\; \hat{\xi} = \begin{bmatrix} [\omega]_{\times} & v \\ 0 & 0 \end{bmatrix} \theta = \sum_{i=1}^{6} \xi_i G^i $$

where the 4 × 4 matrices G^i are called the generators of se(3). Note that I follow a different convention from Murray and reserve the first three components for rotation and the last three for translation. Hence, with this parameterization, the generators for SE(3) are

$$ G^1 = \begin{bmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{bmatrix}, \quad
   G^2 = \begin{bmatrix} 0&0&1&0 \\ 0&0&0&0 \\ -1&0&0&0 \\ 0&0&0&0 \end{bmatrix}, \quad
   G^3 = \begin{bmatrix} 0&-1&0&0 \\ 1&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix} $$

$$ G^4 = \begin{bmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}, \quad
   G^5 = \begin{bmatrix} 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}, \quad
   G^6 = \begin{bmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{bmatrix} $$

Applying the exponential map to a twist ξ, scaled with θ ∈ R⁺, yields a screw motion in SE(3):

$$ T(\theta) = \exp\left( \hat{\xi}\, \theta \right) $$

A closed-form solution for the exponential map is given in [Murray et al., 1994, page 42]:

$$ \exp\left( \hat{\xi}\, \theta \right) = \begin{bmatrix} e^{[\omega]_{\times}\theta} & \left( I - e^{[\omega]_{\times}\theta} \right)(\omega \times v) + \omega\,\omega^{\top} v\, \theta \\ 0 & 1 \end{bmatrix} $$

References

[Absil et al., 2007] Absil, P.-A., Mahony, R., and Sepulchre, R. (2007). Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, USA.

[Hartley and Zisserman, 2004] Hartley, R. I. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge University Press, second edition.

[Murray et al., 1994] Murray, R., Li, Z., and Sastry, S. (1994). A Mathematical Introduction to Robotic Manipulation. CRC Press.