Sparse Levenberg-Marquardt algorithm.

Based in part on Appendix 6 of R. I. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision, Cambridge University Press, second edition, 2004.

The Levenberg-Marquardt (LM) algorithm minimizes the following error function in an iterative procedure:

$$ e_{res}^2 = \|X - \hat{X}\|_{\Sigma_X}^2 = (X - \hat{X})^\top \Sigma_X^{-1} (X - \hat{X}) = \epsilon^\top \Sigma_X^{-1} \epsilon. $$

In the $t$-th iteration $X - \hat{X}_t = \epsilon_t$, an $N$-dimensional vector in Cartesian coordinates. With $\epsilon_t$ written this way, no minus sign appears in the minimizations. It represents the error between the measured values and the current estimate of the measurements. The parameters and the measurements are minimized simultaneously, so from now on $M$ counts all the unknowns, not only the parameters.

The first order Taylor series gives for the next iteration

$$ \epsilon_{t+1} = X - f(\hat{P}_{t+1}) = X - f(\hat{P}_t) - J_{f,P_t} (\hat{P}_{t+1} - \hat{P}_t) = \epsilon_t - J_{f,P_t}\, \delta $$

where $\delta = \hat{P}_{t+1} - \hat{P}_t$ is the adjustment of all the estimates. The Jacobian $J_{f,P_t}$ is an $N \times M$ matrix: the number of measurements in Cartesian coordinates times the number of estimates, i.e. the dimension of the parameter space plus the measurements in homogeneous coordinates. Therefore $N < M$. The variables to be determined are collected in the $M$-dimensional vector $\delta$. In each iteration the constraints on the parameters have to be taken into account so that only the essential parameters remain. Also, if the last element of an estimated measurement is not one, all its coordinates are divided by it to compute the new $\epsilon_{t+1}$ in Cartesian coordinates.

The normal equation, with a shorter notation for the Jacobian, is

$$ (J_t^\top \Sigma_X^{-1} J_t)\, \delta = J_t^\top \Sigma_X^{-1} \epsilon_t. $$

The LM algorithm uses two different ways to converge to a minimum which is hopefully close to the global optimum. An augmented normal equation decides whether the next iteration should be a Gauss-Newton or a gradient descent step:

$$ \left[ J_t^\top \Sigma_X^{-1} J_t + \lambda_t\, \mathrm{diag}(J_t^\top \Sigma_X^{-1} J_t) \right] \delta_t = J_t^\top \Sigma_X^{-1} \epsilon_t. $$
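As a minimal sketch (not part of the lecture), the augmented normal equation above could be assembled and solved with NumPy as follows; the names `jac`, `sigma_x_inv`, `eps` and `lam` are hypothetical placeholders for $J_t$, $\Sigma_X^{-1}$, $\epsilon_t$ and $\lambda_t$.

```python
import numpy as np

def lm_step(jac, sigma_x_inv, eps, lam):
    """One damped update delta from the augmented normal equations.

    jac         : (N, M) Jacobian J_t of the measurement function
    sigma_x_inv : (N, N) inverse measurement covariance Sigma_X^{-1}
    eps         : (N,)   current residual epsilon_t = X - f(P_t)
    lam         : scalar damping factor lambda_t
    """
    jtw = jac.T @ sigma_x_inv                             # J^T Sigma_X^{-1}
    normal = jtw @ jac                                    # J^T Sigma_X^{-1} J, an (M, M) matrix
    augmented = normal + lam * np.diag(np.diag(normal))   # damp only the diagonal entries
    rhs = jtw @ eps                                       # J^T Sigma_X^{-1} epsilon_t
    return np.linalg.solve(augmented, rhs)                # delta, the (M,) update vector
```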

In the augmented equation, $\lambda_t$ is a scalar multiplier with the typical initial value $\lambda_0 = 10^{-3}$. The augmentation changes only the $M$ diagonal entries of the $M \times M$ square matrix. In the $(t+1)$-th iteration:

If the error decreased in the previous iteration, $\lambda$ is reduced, $\lambda_{t+1} = \lambda_t / 10$. For small values of $\lambda$ the LM performs a Gauss-Newton iteration with the updated Jacobian,

$$ (J^\top \Sigma_X^{-1} J)\, \delta = J^\top \Sigma_X^{-1} \epsilon. $$

The matrix $J^\top \Sigma_X^{-1} J$ is always positive semi-definite, and near a minimum the convergence appears to be close to locally linear. If the increment $\delta$ keeps decreasing the error, the Gauss-Newton phase continues.

If the error increased in the previous iteration, $\lambda$ is increased, $\lambda_{t+1} = 10 \lambda_t$. For large values of $\lambda$ the LM performs gradient descent,

$$ \lambda\, \delta = J^\top \Sigma_X^{-1} \epsilon. $$

The right-hand side is proportional to the gradient of the objective function $\epsilon^\top \Sigma_X^{-1} \epsilon$, and since $\epsilon = X - \hat{X}$, the sign is negative, so the step moves downhill. If the increment $\delta$ decreases the error, the method switches back to Gauss-Newton; if not, $\lambda$ is increased further. If the off-diagonal terms are insignificant with respect to the diagonal ones, each of the estimates $P = \{p_j\}_{j=1}^M$ can be estimated separately. The process is repeated until $\delta$ becomes smaller than a threshold.

The LM algorithm needs an initial estimate of the parameters, which can be obtained by total least squares. From the initial estimate of the parameters the initial estimate of the measurements is obtained. The LM algorithm is sensitive to initialization: if the initial estimate is very far from the solution, LM may not converge to the desired result.

In computer vision the parameters and the measurements can be separated. For example, suppose you want to recover 10,000 points in 3D from 100 views of 2D images. That is $3 \times 10{,}000 = 30{,}000$ unknown 3D coordinates. You also need the camera parameters, $12 \times 100 = 1{,}200$ unknowns. (In fact each camera has only 11 parameters and the first camera can be taken as given.) There are more than 30,000 unknowns, requiring the inversion of a matrix larger than $30{,}000 \times 30{,}000$. However, if at each iteration the cameras are solved for first and only afterwards the 3D measurements, only about a $1{,}100 \times 1{,}100$ matrix has to be handled, and the 3D measurements are obtained by back-substitution after transforming the system into upper-triangular form.
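Returning to the accept/reject schedule described above, a schematic outer loop might look like the following sketch; `residual` and `jacobian` are hypothetical callbacks returning $\epsilon$ and $J$ for a given estimate, and the 10x / 0.1x schedule is the one from the text.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, sigma_x_inv, p0,
                        lam0=1e-3, tol=1e-8, max_iter=100):
    """Sketch of the LM accept/reject schedule; p0 is the initial estimate."""
    p, lam = p0.copy(), lam0
    eps = residual(p)
    err = eps @ sigma_x_inv @ eps                     # squared Mahalanobis error
    for _ in range(max_iter):
        jac = jacobian(p)
        jtw = jac.T @ sigma_x_inv
        normal = jtw @ jac
        augmented = normal + lam * np.diag(np.diag(normal))
        delta = np.linalg.solve(augmented, jtw @ eps)
        if np.linalg.norm(delta) < tol:               # update smaller than threshold
            break
        eps_new = residual(p + delta)
        err_new = eps_new @ sigma_x_inv @ eps_new
        if err_new < err:          # error decreased: accept, move toward Gauss-Newton
            p, eps, err = p + delta, eps_new, err_new
            lam /= 10.0
        else:                      # error increased: reject, move toward gradient descent
            lam *= 10.0
    return p
```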

The sparsity assumption separates the parameters from the estimated measurements,

$$ P = \begin{pmatrix} a \\ b \end{pmatrix} $$

where $a$ is the vector of parameters and $b$ is the vector of measurements. The Jacobian matrix has a block structure, with $A$ covering the parameters and $B$ the measurements in homogeneous coordinates:

$$ J = \frac{\partial \hat{X}}{\partial P} = \begin{pmatrix} A & B \end{pmatrix}, \qquad A = \frac{\partial \hat{X}}{\partial a}, \qquad B = \frac{\partial \hat{X}}{\partial b}. $$

We would like the error of the next iteration of the estimates to be zero. The first order Taylor series becomes

$$ J_{f,P_t} (P_{t+1} - P_t) = X - f(P_t), \qquad \begin{pmatrix} A_t & B_t \end{pmatrix} \begin{pmatrix} \delta_{a,t} \\ \delta_{b,t} \end{pmatrix} = \epsilon_t $$

where $\epsilon_t$ is the error we want to reduce in the squared Mahalanobis distance. Discarding the iteration number, the normal equation is

$$ \begin{pmatrix} A^\top \Sigma_X^{-1} A & A^\top \Sigma_X^{-1} B \\ B^\top \Sigma_X^{-1} A & B^\top \Sigma_X^{-1} B \end{pmatrix} \begin{pmatrix} \delta_a \\ \delta_b \end{pmatrix} = \begin{pmatrix} A^\top \Sigma_X^{-1} \epsilon \\ B^\top \Sigma_X^{-1} \epsilon \end{pmatrix}. $$

The augmentation acts only along the diagonal of the $M \times M$ matrix, and therefore only the starred matrices change with $\lambda$:

$$ \begin{pmatrix} U^* & W \\ W^\top & V^* \end{pmatrix} \begin{pmatrix} \delta_a \\ \delta_b \end{pmatrix} = \begin{pmatrix} \epsilon_A \\ \epsilon_B \end{pmatrix}. $$

Identification of the matrices is immediate: the upper-left corner corresponds to the parameters, the lower-right corner to the estimated measurements, and the upper-right block (equal to the transpose of the lower-left) holds the cross terms between them. We assume that $V^*$, built from $B^\top \Sigma_X^{-1} B$, is always full rank and therefore invertible. Multiplying both sides on the left with the matrix

$$ \begin{pmatrix} I_{par \times par} & -W V^{*-1} \\ 0_{mhc \times par} & I_{mhc \times mhc} \end{pmatrix} $$

(where "par" is the number of parameters and "mhc" the number of measurement homogeneous coordinates) results in

$$ \begin{pmatrix} U^* - W V^{*-1} W^\top & 0 \\ W^\top & V^* \end{pmatrix} \begin{pmatrix} \delta_a \\ \delta_b \end{pmatrix} = \begin{pmatrix} \epsilon_A - W V^{*-1} \epsilon_B \\ \epsilon_B \end{pmatrix}. $$
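A small sketch of this Schur-complement solve, assuming the blocks $U^*$, $V^*$, $W$ and the right-hand sides $\epsilon_A$, $\epsilon_B$ have already been assembled (all names here are hypothetical):

```python
import numpy as np

def solve_sparse_normal(U_star, V_star, W, eps_a, eps_b):
    """Solve the partitioned augmented normal equations: first the
    parameter update delta_a from the Schur complement, then the
    measurement update delta_b by back-substitution."""
    V_inv = np.linalg.inv(V_star)                 # V* assumed full rank
    schur = U_star - W @ V_inv @ W.T              # U* - W V*^{-1} W^T
    rhs = eps_a - W @ V_inv @ eps_b               # eps_A - W V*^{-1} eps_B
    delta_a = np.linalg.pinv(schur) @ rhs         # pseudo-inverse handles gauge freedoms
    delta_b = V_inv @ (eps_b - W.T @ delta_a)     # back-substitution
    return delta_a, delta_b
```

In practice $V^*$ is block diagonal, so its inverse is computed block by block rather than as one dense matrix.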

At each iteration the parameters are estimated from

$$ \delta_{a,(t+1)} = (U_t^* - W_t V_t^{*-1} W_t^\top)^+ (\epsilon_{A,t} - W_t V_t^{*-1} \epsilon_{B,t}) $$

and the measurements by back-substitution, one coordinate block after the other, from

$$ V_t^*\, \delta_{b,(t+1)} = \epsilon_{B,t} - W_t^\top \delta_{a,(t+1)}. $$

The next iteration starts from

$$ P_{(t+1)} = \begin{pmatrix} a + \delta_{a,(t+1)} \\ b + \delta_{b,(t+1)} \end{pmatrix} $$

and, depending on the error value, the algorithm takes one of the two branches (Gauss-Newton or gradient descent) until convergence.

Let's illustrate the process with an example. The simultaneous estimation of multiple projective cameras and of the 3D points from several views of 2D data is called bundle adjustment. Assume we have three cameras, $j = 1, 2, 3$, and in each view there are four 2D points, $i = 1, 2, 3, 4$, coming from 3D. The measurements of the $i$-th 3D point across the views can be written $X_i = (x_{i1}, \ldots, x_{i3})$. The points in the $j$-th view have the same, independent noise. The unknown vector contains the estimated cameras and the estimated 3D points: $P = (a_1, a_2, a_3, b_1, b_2, b_3, b_4)$.

Take the $i$-th 2D point in the $j$-th view; it depends only on the $j$-th camera. The Jacobian with respect to the camera parameters $a = (a_1, \ldots, a_3)$ for the $i$-th point is $A_i = \partial \hat{X}_i / \partial a$. Its block $A_{ij} = \partial \hat{x}_{ij} / \partial a_j$ involves only the $j$-th camera, and the derivative with respect to any other camera $k \neq j$ is zero. The matrix is therefore block diagonal in the cameras, $A_i = \mathrm{diag}(A_{i1}, A_{i2}, A_{i3})$, for each $i = 1, \ldots, 4$. The Jacobian with respect to the $i$-th 3D point $b_i$ is $B_i = \partial \hat{X}_i / \partial b_i$, which decomposed over the views, $B_i = (B_{i1}, \ldots, B_{i3})$, has the elements $B_{ij} = \partial \hat{x}_{ij} / \partial b_i$. Within the $j$-th view, $\partial \hat{x}_{kj} / \partial b_k$ is zero if the point $k \neq i$, so the Jacobian over the points is block diagonal, $\mathrm{diag}(B_{1j}, \ldots, B_{4j})$, and this block-diagonal structure is repeated for each of the $j = 1, 2, 3$ cameras.
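The block sparsity of this bundle adjustment Jacobian for the 3-camera, 4-point example can be visualized with a short sketch (purely illustrative occupancy pattern, not actual derivatives; 12 parameters per camera, 3 per point, 2 coordinates per image measurement):

```python
import numpy as np

n_cams, n_pts = 3, 4
cam_dim, pt_dim, meas_dim = 12, 3, 2   # per camera, per 3D point, per 2D measurement

# One block row per measurement x_ij of point i in view j.
rows = []
for i in range(n_pts):
    for j in range(n_cams):
        A_blocks = [np.ones((meas_dim, cam_dim)) if k == j else np.zeros((meas_dim, cam_dim))
                    for k in range(n_cams)]   # dx_ij/da_k is zero unless k == j
        B_blocks = [np.ones((meas_dim, pt_dim)) if k == i else np.zeros((meas_dim, pt_dim))
                    for k in range(n_pts)]    # dx_ij/db_k is zero unless k == i
        rows.append(np.hstack(A_blocks + B_blocks))

J_pattern = np.vstack(rows)                   # (24, 48) occupancy pattern of the Jacobian
print((J_pattern != 0).astype(int))           # 1 marks a possibly nonzero entry
```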

A total of $12 \times 3 + 3 \times 4 = 48$ estimates have to be recovered in each iteration. (The first camera can be fixed to a $3 \times 3$ identity matrix and a zero 3-vector for the translation.) In general there are 12 unknowns per camera and 3 per 3D measurement.

The covariance of the estimated quantities is

$$ \Sigma_{\hat{P}} = (J_{f,\hat{P}}^\top \Sigma_X^{-1} J_{f,\hat{P}})^+ $$

where no variation of the parameters is allowed in directions perpendicular to the constraint surface. Since the final estimates are used, the augmentation is no longer needed. The matrix can be block-diagonalized in this way:

$$ J_{f,\hat{P}}^\top \Sigma_X^{-1} J_{f,\hat{P}} = \begin{pmatrix} U & W \\ W^\top & V \end{pmatrix} = \begin{pmatrix} I_{par \times par} & W V^{-1} \\ 0_{mhc \times par} & I_{mhc \times mhc} \end{pmatrix} \begin{pmatrix} U - W V^{-1} W^\top & 0 \\ 0 & V \end{pmatrix} \begin{pmatrix} I & 0 \\ V^{-1} W^\top & I \end{pmatrix}. $$

For two matrices, $G$ invertible and $H$ arbitrary, the identity $(G H G^\top)^+ = G^{-\top} H^+ G^{-1}$ holds under very relaxed null-space conditions. Write

$$ X = (U - W V^{-1} W^\top)^+, \qquad Y = W V^{-1}. $$
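The block diagonalization above can be verified numerically on random, hypothetical blocks ($V$ is kept full rank by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 4, 6                                    # parameter and measurement block sizes
U = (lambda M: M @ M.T)(rng.standard_normal((p, p)))
V = (lambda M: M @ M.T + np.eye(m))(rng.standard_normal((m, m)))   # full rank V
W = rng.standard_normal((p, m))

N = np.block([[U, W], [W.T, V]])               # plays the role of J^T Sigma_X^{-1} J
V_inv = np.linalg.inv(V)
L = np.block([[np.eye(p), W @ V_inv], [np.zeros((m, p)), np.eye(m)]])
D = np.block([[U - W @ V_inv @ W.T, np.zeros((p, m))], [np.zeros((m, p)), V]])
R = np.block([[np.eye(p), np.zeros((p, m))], [V_inv @ W.T, np.eye(m)]])
print(np.allclose(N, L @ D @ R))               # True: the factorization holds
```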

With these abbreviations we obtain

$$ \Sigma_{\hat{P}} = \begin{pmatrix} X & -X Y \\ -Y^\top X & Y^\top X Y + V^{-1} \end{pmatrix}. $$

See the matrix inversion in block form in the second part of the planar homography material. All the covariances are based on the measured data and not on other computed covariances. The covariance of the (essential) parameters is

$$ \Sigma_{\hat{a}} = (U - W V^{-1} W^\top)^+, $$

the covariance of the estimated measurements is

$$ \Sigma_{\hat{b}} = Y^\top \Sigma_{\hat{a}} Y + V^{-1}, $$

and the cross-covariance between the parameters and the estimated measurements is

$$ \Sigma_{\hat{a}\hat{b}} = -\Sigma_{\hat{a}} Y. $$

Sparse LM for 2D homography estimation. The corresponding homogeneous image points in the two images are $x_i \leftrightarrow x_i'$, connected through a $3 \times 3$ nonsingular matrix $H$. The measurements are $X_i = (\bar{x}_i^\top, \bar{x}_i'^\top)^\top$, and in total the $n$ measurements form a $4n$-long column vector. The noise covariance of each point is independent, isotropic and equal; therefore $\Sigma_{x_i}^{-1} = \Sigma_{x_i'}^{-1} = S_2$, and $\Sigma_X^{-1}$ is $n$ copies of $\mathrm{diag}(S_2, S_2)$, a $4n \times 4n$ matrix.

The Jacobian with respect to the homography consists of $n$ blocks, each a $4 \times 9$ matrix,

$$ A_i = \frac{\partial \hat{X}_i}{\partial h} = \begin{pmatrix} 0_{2 \times 9} \\ \partial \hat{x}_i' / \partial h \end{pmatrix} $$

since $\hat{x}_i$ does not depend on $h$. Note that $\hat{X}_i$ is in Cartesian coordinates, and the precise formulation of the derivative was given at the beginning of the lecture. The Jacobian with respect to the first-image homogeneous coordinates consists of $n$ blocks, each a $4 \times 3$ matrix,

$$ B_i = \frac{\partial \hat{X}_i}{\partial x_i} = \begin{pmatrix} I_{2 \times 3} \\ \partial \hat{x}_i' / \partial x_i \end{pmatrix}. $$
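Given the final, unaugmented blocks $U$, $V$, $W$, the three covariance formulas above could be assembled as in this sketch (names hypothetical; here `V` stands for the full block-diagonal matrix built from the $V_i$):

```python
import numpy as np

def estimate_covariances(U, V, W):
    """Covariance of the parameters, of the measurements, and their
    cross-covariance from the blocks of J^T Sigma_X^{-1} J at the
    final estimate."""
    V_inv = np.linalg.inv(V)
    Y = W @ V_inv                                    # Y = W V^{-1}
    sigma_a = np.linalg.pinv(U - W @ V_inv @ W.T)    # (U - W V^{-1} W^T)^+
    sigma_b = Y.T @ sigma_a @ Y + V_inv              # Y^T Sigma_a Y + V^{-1}
    sigma_ab = -sigma_a @ Y                          # cross-covariance
    return sigma_a, sigma_b, sigma_ab
```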

The estimated homogeneous measurements for a pair are $\hat{X}_i = (\hat{x}_i, \hat{x}_i') = (\hat{x}_i, \hat{H}\hat{x}_i)$, and it is enough to estimate the parameters (9) and the points in the first image ($3n$), because the points in the second image are then determined: $P = (h, \hat{x}_1, \ldots, \hat{x}_n)$. The eight essential parameters, the $2n$ Cartesian coordinates in the first image, and from these the coordinates in the second image, are computed in each iteration.

We will compute the covariance of the homography transformation in two simple cases. The first case is $H = sI$. Then

$$ A_i = \begin{pmatrix} 0_{2 \times 9} \\ J_i \end{pmatrix}, \qquad B_i = \begin{pmatrix} I_{2 \times 3} \\ s I_{2 \times 3} \end{pmatrix} = \begin{pmatrix} I_{2 \times 2} \\ s I_{2 \times 2} \end{pmatrix}, $$

where $J_i = \partial \hat{x}_i' / \partial h$, since the last column of $B_i$ is always zero. The matrix $U$ from the Jacobian is a $9 \times 9$ matrix because $A_i$ reduces to the $2 \times 9$ block $J_i$,

$$ U = \sum_i A_i^\top \mathrm{diag}(S_2, S_2) A_i = \sum_i J_i^\top S_2 J_i. $$

The matrix $V = \mathrm{diag}(V_1, \ldots, V_n)$ is a $2n \times 2n$ matrix because $B_i$ is only $4 \times 2$ and only the first-image coordinates are sought,

$$ V_i = B_i^\top \mathrm{diag}(S_2, S_2) B_i = (s^2 + 1) S_2. $$

The matrix $W = (W_1, \ldots, W_n)$ is a $9 \times 2n$ matrix, each element being $9 \times 2$,

$$ W_i = A_i^\top \mathrm{diag}(S_2, S_2) B_i = s J_i^\top S_2. $$

We have to recover $9 + 2n$ estimates at each iteration. Since $H$ is very simple, the estimation can be done directly in Cartesian coordinates. The covariance of the homography is

$$ \Sigma_{\hat{h}} = (U - W V^{-1} W^\top)^+ = \left( \sum_i J_i^\top S_2 J_i - \sum_i (s J_i^\top S_2)\, \frac{1}{s^2 + 1} S_2^{-1}\, (s S_2 J_i) \right)^+ $$

where we took into account the symmetry of the noise covariance. This gives a $9 \times 9$ matrix of rank eight,

$$ \Sigma_{\hat{h}} = (s^2 + 1) \left( \sum_i J_i^\top S_2 J_i \right)^+. $$

In the second case $H$ is an affine transformation,

$$ H = \begin{pmatrix} D & t \\ 0^\top & 1 \end{pmatrix}, $$

and the noise covariance is a unit matrix, $S_2 = I_2$. In this case the second-image point is also directly in Cartesian coordinates and $B_i$ can be expressed in Cartesian coordinates too:

$$ A_i = \begin{pmatrix} 0_{2 \times 9} \\ J_i \end{pmatrix}, \qquad B_i = \begin{pmatrix} I_{2 \times 2} \\ D \end{pmatrix}. $$

The matrix $U$ is

$$ U = \sum_i A_i^\top \mathrm{diag}(I_2, I_2) A_i = \sum_i J_i^\top J_i. $$

The matrix $V_i$ of the $i$-th measurement is

$$ V_i = B_i^\top \mathrm{diag}(I_2, I_2) B_i = I + D^\top D. $$

The matrix $W_i$, the cross term between the parameters and the $i$-th measurement, is

$$ W_i = A_i^\top \mathrm{diag}(I_2, I_2) B_i = J_i^\top D. $$

The covariance of the homography is

$$ \Sigma_{\hat{h}} = \left( \sum_i J_i^\top J_i - \sum_i (J_i^\top D)(I + D^\top D)^{-1} (D^\top J_i) \right)^+ $$

or

$$ \Sigma_{\hat{h}} = \left( \sum_i J_i^\top \left( I - D (I + D^\top D)^{-1} D^\top \right) J_i \right)^+. $$
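As a sanity check, the two expressions for $\Sigma_{\hat{h}}$ in the affine case can be compared numerically with random $2 \times 9$ blocks $J_i$ and a random $2 \times 2$ matrix $D$ (purely illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
D = rng.standard_normal((2, 2))
J = [rng.standard_normal((2, 9)) for _ in range(n)]

M = np.linalg.inv(np.eye(2) + D.T @ D)    # (I + D^T D)^{-1}
form1 = sum(Ji.T @ Ji for Ji in J) - sum((Ji.T @ D) @ M @ (D.T @ Ji) for Ji in J)
form2 = sum(Ji.T @ (np.eye(2) - D @ M @ D.T) @ Ji for Ji in J)
print(np.allclose(form1, form2))                                       # True
print(np.allclose(np.linalg.pinv(form1), np.linalg.pinv(form2)))        # True
```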

where we took into account the symmetry of the noise covariance. Gives a 9 9 matrix of rank eight ( ) + Σĥ = (s 2 + 1) J i S 2 J i. In the second case H is an affine transformation D t H = 0 1 and the noise covariance is a unit matrix S 2 = I 2. In this case the second image is also directly in Cartesian coordinates and B i can be expressed in Cartesian coordinates too 02 9 I2 2 A i = B i =. D J i The matrix U is U = A i diag(i 2, I 2 )A i = The matrix V i of the i-th measurements is J i J i. V i = B i diag(i 2, I 2 )B i = I + D D. The matrix W i, the cross-covariance of the parameters and the i-th measurement, is W i = A i diag(i 2, I 2 )B i = J i D. The covariance of the homography is ( Σĥ = J i J i (J i D)(I + D D) 1 (D J i ) or Σĥ = ( J i ( I D(I + D D) 1 D ) J i ) +. ) + 8