Mutual information for multi-modal, discontinuity-preserving image registration


Giorgio Panin
German Aerospace Center (DLR), Institute for Robotics and Mechatronics, Münchner Straße 20, 82234 Weßling

Abstract. Multi-sensory data fusion and medical image analysis often pose the challenging task of aligning dense, non-rigid and multi-modal images. However, optical sequences may also present illumination variations and noise. The above problems can be addressed by an invariant similarity measure, such as mutual information. However, in a variational setting convex formulations are generally recommended for efficiency reasons, especially when discontinuities at the motion boundaries have to be preserved. In this paper we propose the TV-MI approach, addressing for the first time all of the above issues, through a primal-dual estimation framework and a novel approximation of the pixel-wise Hessian matrix, which decouples pixel dependencies while being asymptotically correct. At the same time, we keep a high computational efficiency by means of pre-quantized kernel density estimation and differentiation. Our approach is demonstrated on ground-truth data from the Middlebury database, as well as on medical and visible-infrared image pairs.

1 Introduction

An important problem in computer vision is to find visual correspondences between two views of a scene, possibly acquired by multi-modal sensors, or under different illumination conditions. The former is a preliminary step for multi-sensory data fusion, as well as medical image analysis and visualization. However, robustness to illumination and image noise is also a vital requirement for motion estimation in optical sequences.

In the optical flow literature, we can first distinguish between global and local methods, dating back to [1] and [2] respectively, or combinations of both [3]. The former minimize a global energy that combines a pixel-wise data term, assessing the quality of matching, with a regularization prior, coping with the ill-posedness of the problem. The latter extend data terms to local windows of a given aperture, increasing robustness to noise and avoiding further regularization, but are usually limited to a sparse set of features in textured areas, roughly undergoing planar homographies.

Global energies are efficiently minimized through locally convex approximations of the nonlinear cost function, typically obtained by linearizing residuals under an $L_p$-norm or a convex M-estimator.

For differentiable cost functions, discretized Euler-Lagrange equations are employed: for example, in [1] a linearized $L_2$-norm data term is regularized by the $L_2$-norm of the motion field $f$, and the resulting quadratic problem is solved by Jacobi iterations. These algorithms are also suitable for graphics hardware implementation, because of their highly parallel structure.

For preserving motion discontinuities at the surface boundaries, the total variation (TV) regularizer employs instead the $L_1$-norm, which allows non-differentiable solutions, at the cost of non-trivial issues in the optimization procedure. Earlier works in this direction [4] use the approximate $L_1$ regularizer $\sqrt{|\nabla f|^2 + \epsilon^2}$, where $\epsilon$ is a small positive constant, thus keeping the Euler-Lagrange framework. However, this procedure introduces ill-conditioning, especially for small $\epsilon$. More recently, careful studies have shown how to directly and efficiently address convex TV-$L_1$ problems [5], including optical flow [6], by means of primal-dual formulations, which introduce a dual variable and solve a saddle-point problem in two alternating steps (min-max), coupled by a quadratic penalty.

Considering the data term, the simplest and most common assumption is brightness constancy, which may be violated in the presence of photometric changes. This happens in case of a variable camera exposure, as well as environment light variations, and especially for multi-modal data (such as medical or multispectral images), which bear nonlinear and many-to-one relationships. Since the $L_p$-norm is not robust to such variations, several alternatives have been proposed.

To cope with smooth, additive illumination fields, in [6] both images are pre-processed by a structure-texture decomposition [7], which amounts to an $L_1$ denoising (the ROF model [8]), producing a structure image that is afterwards removed, so that only texture components are used for matching. Other works introduce additional terms such as image gradients [4], which are robust to additive changes, but are also noisier and require a proper relative weighting; while others estimate smooth, additive illumination fields [5], or complex parametrized models [9].

A different class of approaches looks instead for more robust and invariant matching indices. For example, normalized cross-correlation (NCC) is invariant to brightness mean and variance, thus allowing linear photometric relationships; it has recently been included in the convex variational framework [10], through local correlation windows and a second-order Taylor expansion with numerical differentiation. Another index is the correlation ratio (CR) [11], which is invariant to a class of nonlinear, one-to-one relationships.

So far, the most general index is mutual information (MI), defined in information theory to express the statistical dependency between two random variables, in this case the corresponding grey-value pairs: in this way, any photometric relationship, including nonlinear and many-to-one ones, can be accommodated. Due to this property, as well as a higher robustness to outliers and noise, MI was initially proposed for medical image registration [12, 13].

Later on, it has been applied to stereo, in [14] and in the semi-global matching (SGM) algorithm [15], to object tracking [16] and to visual navigation [17]. Notably, [11] considered a unified variational formulation of global NCC, CR and MI, as well as of their local counterparts, for multi-modal and non-rigid registration. That approach, however, relies only on gradient descent through the nonlinear Euler-Lagrange equations. Thus, although MI has already been used for variational registration, we are not aware of any locally convex formulation, which, as we have seen, is the key to an efficient optimization with discontinuity-preserving priors.

Our main contribution is, therefore, the integration of global MI into the primal-dual TV framework through a locally convex, second-order Taylor expansion. Furthermore, we adopt a particular approximation of the Hessian matrix, motivated by the following insights. It is well known that MI is a cascade of two mappings: one at the level of grey-value statistics (Sec. 3.1) and one at pixel level (Sec. 3.2), where both Hessian contributions contain first- and second-order terms. We choose to retain only second-order terms at the upper level, and only first-order terms at the lower level. This leads to a block-diagonal, negative-semidefinite approximation, resulting in directional searches along image gradients, while being asymptotically correct. By contrast, the traditional approximation first proposed in [18], intuitively following the Gauss-Newton approach, neglects second-order terms everywhere. However, this choice has recently been put under discussion [17], as also suggested by its seldom usage even in low-dimensional problems (e.g. Levenberg-Marquardt strategies [19] prove less efficient than the least-squares counterpart). At pixel level, instead, the $(2 \times 2)$ rank-1 structure tensors are consistent with the aperture problem of global approaches. By comparison, the second-order approximation of local NCC [10] also neglects off-diagonal terms, further decoupling the horizontal and vertical flow components, by assuming in most places diagonally-dominant, full-rank blocks, thanks to the extended sampling windows. In our case, this assumption would clearly be incorrect.

The remainder of the paper is organized as follows: in Sec. 2 we review the primal-dual variational approach. Sec. 3 describes our formulation of the MI data term and the optimization strategy, finally summarizing the TV-MI algorithm. Sec. 4 shows experimental results on the Middlebury training dataset and on multi-modal images, and Sec. 5 proposes future developments.

2 TV-regularized motion estimation

Given two images $I_0, I_1$, a motion field $f = (u(x, y), v(x, y))$ is sought in order to match corresponding points $I_0(x, y), I_1(x + u, y + v)$ with possibly sub-pixel accuracy, such that some similarity index is maximized, at the same time keeping a smooth field, while preserving discontinuities at the motion boundaries.

The first requirement can be expressed, omitting the $x, y$ coordinates for brevity, by a global data term $E_{data}(I_0, I_1(u, v))$.

The other constraints are usually incorporated into a smoothness (or soft penalty) term $E_{smooth}(u, v)$, which is a function of the local behaviour of the field, typically through the spatial gradients:

$$\arg\min_{(u,v)} E_{smooth}(\nabla u, \nabla v) + \lambda E_{data}(I_0, I_1(u, v)) \quad (1)$$

with a proper weighting factor $\lambda$. Following [5], letting $F = E_{smooth}$ and $G = \lambda E_{data}$, we have the general problem

$$\arg\min_{f \in X} F(Df) + G(f) \quad (2)$$

where $f : \Omega \to \mathbb{R}^2$ belongs to a Euclidean space $X$ of functions with open domain, $D : X \to Y$ is a linear operator such as the component-wise gradient, mapping onto another space $Y$, and $F : Y \to \mathbb{R}^+$, $G : X \to \mathbb{R}^+$ are the prior and data terms, for example given by an integral over $\Omega$ of the respective $L_p$-norm. Both spaces are endowed with the scalar product and induced norm

$$\langle f, g \rangle = \sum_i \int_\Omega f_i\, g_i \, dx\, dy; \qquad \|f\| = \sqrt{\langle f, f \rangle} \quad (3)$$

summed over the vector field components $i = \{1, 2\}$. If both $F, G$ are convex and lower semi-continuous [5], then (2) can be cast into a saddle-point problem

$$\min_{f \in X} \max_{p \in Y} \; \langle Df, p \rangle + G(f) - F^*(p) \quad (4)$$

where $p \in Y$ is the dual variable, and $F^*$ is the Legendre-Fenchel conjugate

$$F^*(p^*) \equiv \sup_{p \in Y} \langle p^*, p \rangle - F(p) \quad (5)$$

In order to solve (4), first-order algorithms alternate descent and ascent steps in the respective variables $f, p$, by means of the resolvent, or proximal, operators

$$f = (I + \tau\, \partial G)^{-1}(\tilde f); \qquad p = (I + \sigma\, \partial F^*)^{-1}(\tilde p) \quad (6)$$

where $\tau, \sigma$ are two parameters, $I$ is the identity mapping, and $\partial F^*$ is the subgradient of $F^*$, which extends the (variational) gradient to non-differentiable but convex functions, being well-defined over the whole domain $Y$. This operator is given by

$$(I + \tau\, \partial G)^{-1}(\tilde f) = \arg\min_f \left\{ \frac{1}{2\tau} \|f - \tilde f\|^2 + G(f) \right\} \quad (7)$$

and similarly for $F^*$. Then, an efficient algorithm (Alg. 1 in [5], with $\theta = 1$) iterates the following steps.

Initialization: choose $\tau, \sigma > 0$ such that $\tau\sigma \|D\|^2 \leq 1$, set initial values $f^0, p^0$, and the auxiliary variable $\bar f^0 = f^0$.

Iterate: for $n = 1, 2, \dots$

$$p^n = (I + \sigma\, \partial F^*)^{-1}\!\left( p^{n-1} + \sigma D \bar f^{n-1} \right)$$
$$f^n = (I + \tau\, \partial G)^{-1}\!\left( f^{n-1} - \tau D^* p^n \right)$$
$$\bar f^n = 2 f^n - f^{n-1} \quad (8)$$

where $D^*$ is the dual operator: $\langle Df, p \rangle_Y = \langle f, D^* p \rangle_X$. In particular, the total variation regularizer

$$F_{TV} = \int_\Omega |Df| \, dx\, dy \quad (9)$$

is the isotropic $L_1$-norm of the distributional derivative, which is defined also for discontinuous fields, and reduces to the gradient $D = \nabla$ when $f$ is sufficiently smooth, so that $|Df| = \sqrt{f_x^2 + f_y^2}$. The corresponding dual operator is the (negative) divergence, $D^* p = -\operatorname{div} p$. Thus, the proximal operators in (8) are applied to

$$\tilde p^n \equiv p^{n-1} + \sigma \nabla \bar f^{n-1}; \qquad \tilde f^n \equiv f^{n-1} + \tau \operatorname{div} p^n \quad (10)$$

In the following, we will consider the problem in a discrete setting, where $f, p$ are defined on pixel grids, and the discretized operators are given in [5]. Then it can be shown that $\|D\|^2 \leq 8$, and a common choice is $\tau = \sigma = 1/\sqrt{8}$. Furthermore, $(I + \sigma\, \partial F_{TV}^*)^{-1}$ is the point-wise Euclidean projection

$$p = (I + \sigma\, \partial F_{TV}^*)^{-1}(\tilde p) \;\Longleftrightarrow\; p_{x,y} = \frac{\tilde p_{x,y}}{\max(1, |\tilde p_{x,y}|)} \quad (11)$$

that is, the temporal variation between $I_0$ and the warped image $I_1(f)$ is assumed to be a zero-mean white noise process.

3 Mutual information data term

Formally, MI is the Kullback-Leibler divergence between $P(i_0, i_1)$ and the product of the marginals $P(i_0) P(i_1)$:

$$MI(I_0, I_1|f) = H(I_0) + H(I_1|f) - H(I_0, I_1|f)$$
$$= \int_0^1 \!\! \int_0^1 P(i_0, i_1|f)\, \log \frac{P(i_0, i_1|f)}{P(i_0)\, P(i_1|f)} \, di_0\, di_1 \quad (12)$$

where $H$ denotes the marginal and joint entropies, and we emphasize the dependency of the $I_1$ sample on $f$. This quantity must be maximized with respect to $f$, so we can write $E_{data} = -MI(I_0, I_1|f)$. In order to introduce our Hessian approximation, we will first consider the statistical dependency of MI on the grey values, and then the lower-level dependency upon the flow vectors.
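Before detailing the Hessian approximation, it may help to see the discrete TV machinery of Sec. 2 in code. The following Python/NumPy sketch is illustrative only (the paper's implementation is in Matlab, and all function names here are our own): it implements the forward-difference gradient, its adjoint, the dual step (10)-(11) and the iteration (8), applied for simplicity to one scalar component of the flow. The primal resolvent prox_G is left abstract, since for the MI data term it takes the closed form derived in Sec. 3.2.

```python
import numpy as np

def grad(f):
    """Forward-difference gradient with Neumann boundary (zero at the last row/column)."""
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]
    gy[:, :-1] = f[:, 1:] - f[:, :-1]
    return np.stack((gx, gy))

def div(p):
    """Backward-difference divergence, satisfying <grad(f), p> = <f, -div(p)>."""
    px, py = p
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def dual_step(p, f_bar, sigma):
    """Eqs. (10)-(11): gradient ascent on the dual variable, then point-wise projection."""
    p_tilde = p + sigma * grad(f_bar)
    norm = np.maximum(1.0, np.sqrt(p_tilde[0] ** 2 + p_tilde[1] ** 2))
    return p_tilde / norm

def primal_dual_tv(f, p, prox_G, tau=1/np.sqrt(8), sigma=1/np.sqrt(8), n_iter=50):
    """Iteration (8) with the TV dual step; prox_G is the data-term resolvent (7)."""
    f_bar = f.copy()
    for _ in range(n_iter):
        p = dual_step(p, f_bar, sigma)            # dual variable p^n
        f_tilde = f + tau * div(p)                # prox centre, eq. (10)
        f_new = prox_G(f_tilde, tau)              # primal step, e.g. via (20) below
        f_bar = 2.0 * f_new - f                   # over-relaxation (theta = 1)
        f = f_new
    return f, p
```

For a quadratic data term $G(f) = \frac{\lambda}{2}\|f - g\|^2$, for instance, the resolvent (7) has the closed form prox_G(x, tau) = (x + tau * lam * g) / (1 + tau * lam); the MI data term leads instead to the directional update (20) of Sec. 3.2.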

3.1 Approximating the Hessian: grey-value statistics

For a given density estimate $P(i_0, i_1)$, obtained from a sample of grey pairs $I_{0,h}, I_{1,h};\; h = 1, \dots, N$, let us consider the dependency of MI on the $I_1$ sample¹ (suppressing the subscript 1):

$$\frac{\partial MI}{\partial I_h} = \sum_{i_0, i_1} \frac{\partial P(i_0, i_1)}{\partial I_h}\, \log \frac{P(i_0, i_1)}{P(i_1)}$$

$$\frac{\partial^2 MI}{\partial I_h\, \partial I_k} = \sum_{i_0, i_1} \frac{\partial^2 P(i_0, i_1)}{\partial I_h\, \partial I_k}\, \log \frac{P(i_0, i_1)}{P(i_1)} + \frac{\partial P(i_0, i_1)}{\partial I_h}\, \frac{\partial P(i_0, i_1)}{\partial I_k} \left( \frac{1}{P(i_0, i_1)} - \frac{1}{P(i_1)} \right) \quad (13)$$

This Hessian is generally not diagonal since, although sampling schemes for $P(i_0, i_1)$ ensure that the mixed partials are zero, the last term is generally nonzero for $h \neq k$, leading to a problem of intractable complexity. In order to reduce MI to a sum of independent terms, [14] and [15] linearize $P \log P$ around the previous density estimate $\bar P = P(I_0, I_1|\bar f)$, leading to $P \log P \approx P \log \bar P$. Although these methods are derivative-free, this corresponds to neglecting the first-order terms in the Hessian, which cause the undesired coupling.

We can see that the resulting accuracy is mainly related to the finite sample size $N$ and to the kernel bandwidth: in fact, because of the products, first-order terms decay as $1/N^2$, while second-order terms decay as $1/N$. Moreover, we observed that the approximation is always best at the optimum, i.e. when the joint density is maximally clustered. Finally, the eigenvalues of our approximation always have a larger magnitude than those of the true Hessian, which can be seen from the fact that the first-order terms on the diagonal are always non-negative.

Among the many existing non-parametric procedures for entropy estimation, we decided to follow the efficient strategy used in [15, 14], extended to our derivative-based framework. Briefly summarized, it consists of a Parzen-based estimation, with pre-quantized kernels assigned to the cells of a $(256 \times 256)$ joint histogram $P$. The density is estimated, after warping $I_1(f)$, by collecting the histogram of $I_{0,h}, I_{1,h}$, and subsequently convolving it with an isotropic Gaussian $K_w$ of bandwidth $w$. Afterwards, a further convolution of $\log P$ with the same kernel, evaluated at the same sample points², produces the desired data terms, whose sum is the entropy:

$$H(I_0, I_1) = -\frac{1}{N} \sum_h \left[ K_w * \log(K_w * P) \right](I_{0,h}, I_{1,h}) \quad (14)$$

and similarly for the marginal entropy $H(I_1)$, this time with one-dimensional convolutions and a possibly different bandwidth $w_1$.

¹ Notice that we write $\log$ instead of $(1 + \log)$, as is often found, because derivatives of a twice-differentiable density integrate to 0.
² In order to keep a sub-pixel/sub-grey precision, we perform bilinear interpolation at non-integer histogram positions.
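The estimate (14) can be sketched in a few lines of Python (illustrative only: the paper performs the convolutions via FFT in a Matlab implementation, whereas here scipy.ndimage.gaussian_filter stands in for the Gaussian convolution; grey values are assumed to be normalized to [0, 1], and all names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def joint_entropy(I0, I1, bins=256, w=5.0):
    """Pre-quantized Parzen estimate of the joint entropy, cf. (14).
    Grey values of I0, I1 are assumed to be normalized to [0, 1]."""
    i0 = np.clip(I0.ravel() * (bins - 1), 0, bins - 1)       # histogram coordinates
    i1 = np.clip(I1.ravel() * (bins - 1), 0, bins - 1)
    hist, _, _ = np.histogram2d(i0, i1, bins=bins, range=[[0, bins], [0, bins]])
    P = gaussian_filter(hist / hist.sum(), sigma=w)           # K_w * P
    table = gaussian_filter(np.log(P + 1e-12), sigma=w)       # K_w * log(K_w * P)
    # evaluate the table at the sample grey pairs (bilinear, cf. footnote 2)
    values = map_coordinates(table, np.vstack((i0, i1)), order=1)
    return -values.mean(), P
```

The marginal entropy $H(I_1)$ is obtained in the same way with one-dimensional convolutions of the marginal histogram, possibly with a different bandwidth.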

From (14) we obtain derivatives in a straightforward way, again by convolution of the $\log P$ table:

$$\frac{\partial H}{\partial I_{1,h}} = -\frac{1}{N} \left[ K_w' * \log(K_w * P) \right](I_{0,h}, I_{1,h}) \quad (15)$$
$$\frac{\partial^2 H}{\partial I_{1,h}^2} \approx -\frac{1}{N} \left[ K_w'' * \log(K_w * P) \right](I_{0,h}, I_{1,h})$$

where $K_w', K_w''$ are the first and second derivatives along $I_1$, and the last equation follows from the previously explained approximation of (13). All of these operations are efficiently carried out by the FFT. The bandwidths $w, w_1$ for (14) are estimated in a maximum-likelihood way, according to cross-validation rules [11], which can be shown to require a convolution by $\partial K / \partial w$.

In practice, performing the above convolutions is still an expensive operation; therefore, we update those tables only once per pyramid level, while interpolating them at new values of $I_1(f)$ for computing the Hessian and gradient. The latter operations are performed in an intermediate warp loop (Fig. 1), while the innermost loop alternates primal-dual steps (8) until convergence.

3.2 Approximating the Hessian: directional derivatives

At the pixel level, the aperture problem results in rank-deficient $(2 \times 2)$ diagonal blocks of the overall Hessian. In fact, after decoupling the pixel-wise dependencies, we can compute derivatives of MI with respect to the flow:

$$\frac{\partial MI}{\partial f_h} = \frac{\partial MI}{\partial I_{1,h}}\, \nabla I_{1,h} \quad (16)$$
$$\frac{\partial^2 MI}{\partial f_h^2} = \frac{\partial^2 MI}{\partial I_{1,h}^2}\, \nabla I_{1,h} \nabla I_{1,h}^T + \frac{\partial MI}{\partial I_{1,h}}\, \frac{\partial^2 I_{1,h}}{\partial f_h^2}$$

where, hereafter dropping the $h$ index,

$$\nabla I\, \nabla I^T = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}; \qquad \frac{\partial^2 I}{\partial f^2} = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} \quad (17)$$

are the (rank-1) structure tensor and the Hessian of $I_1$, respectively. At the optimum, in the absence of noise, $\partial MI / \partial I_1$ vanishes from (16), so we approximately keep the rank-1 term, scaled by the second derivative of MI. The image Hessian is seldom used in the literature, because it may be indefinite and consists of possibly noisy values.

However, we have to further check the factor $\partial^2 MI / \partial I_1^2$ in order to ensure a negative-semidefinite matrix. In fact, during the initial stages the density is spread out, and some places may have a positive (or almost zero) curvature. Therefore, we threshold each factor to a maximum value $D^2_{max} < 0$.

In order to cope with the rank deficiency, the primal step then relies on the regularizing prior, whose strict convexity ensures a unique minimum.
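The per-pixel quantities of this rank-1, clamped approximation can be assembled as in the following sketch, which feeds the directional search described next. It is an illustrative sketch with our own names: dMI_dI1 and d2MI_dI1 are assumed to be per-pixel values already interpolated from the pre-computed tables (combining joint- and marginal-entropy contributions), and the default clamping value D2_max is an arbitrary choice.

```python
import numpy as np

def directional_terms(dMI_dI1, d2MI_dI1, I1_warp, D2_max=-1e-3):
    """Pixel-wise ingredients of the rank-1 approximation (16)-(17).

    dMI_dI1, d2MI_dI1 : per-pixel first/second derivative of MI w.r.t. the
                        warped grey value, interpolated from the tables;
    I1_warp           : warped image I1(f);
    D2_max            : clamping threshold (< 0) for the curvature (assumed value).
    """
    gy, gx = np.gradient(I1_warp)                  # image gradient of the warped I1
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-12       # |grad I1|
    n = np.stack((gx, gy)) / mag                   # unit direction n = grad I1 / |grad I1|
    curv = np.minimum(d2MI_dI1, D2_max)            # enforce negative curvature
    g_dir = dMI_dI1 * mag                          # dMI/dI1 * |grad I1|
    h_dir = curv * mag ** 2                        # clamped d2MI/dI1^2 * |grad I1|^2
    return n, g_dir, h_dir
```

In SciPy terms, the derivative tables of (15) themselves can be obtained by passing order=(0, 1) and order=(0, 2) to gaussian_filter, i.e. convolving the log-density table with the first and second Gaussian derivatives along $i_1$.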

Initialization: Let $I_0, I_1$ be two images, set $f = 0$ and an initial guess for $w, w_1$. Compute the two pyramids at $L$ levels, including sub-octaves and related subsampling.
Outer loop: let $f^{l-1}$ be the result at the previous level
1. Upsample $f^{l-1} \to f^l$ (and the dual field $p^{l-1} \to p^l$)
2. Warp $I_1$ and $\nabla I_1$ at $f^l$, and collect the joint histogram
3. Adapt $w, w_1$ with maximum-likelihood ascent
4. Compute the entropy tables for MI (14), (15)
5. Warp loop: initialize $f_0 = f^l$, and repeat
   (a) Warp $I_1$ at $f_0$ and compute the MI gradient and Hessian, by interpolating the tables at $(I_0, I_1)$
   (b) Inner loop: iterate $n = 1, 2, \dots$
       i. Perform the dual step (10), (11) to obtain $\tilde f^n$
       ii. Solve (20) and update the primal variable $f^n$
   (c) Apply median filtering to $f^n$, and update the expansion point $f_0 = f^n$

Fig. 1. The TV-MI algorithm.

Since the prior $\|f - \tilde f\|^2$ is isotropic in $(x, y)$, the problem reduces to a one-dimensional search along $n = \nabla I_1 / |\nabla I_1|$. For this purpose, first- and second-order directional derivatives are given by

$$\frac{\partial MI}{\partial n} = \frac{\partial MI}{\partial I_1}\, |\nabla I_1| \quad (18)$$
$$\frac{\partial^2 MI}{\partial n^2} = \frac{\partial^2 MI}{\partial I_1^2}\, |\nabla I_1|^2 + \frac{\partial MI}{\partial I_1}\, \frac{\partial^2 I_1}{\partial n^2}$$

where, once again, the last term of the second derivative is neglected. Thus, we look for $\rho \equiv n^T (f - \tilde f)$, the projection of the motion field along $n$, and conversely $f = \tilde f + \rho\, n$. Several primal-dual steps (Fig. 1) are needed for the TV-regularized optimization, so that the prior values $\tilde f^n$ will differ from the initial expansion point $f_0$. Therefore, defining $\tilde\rho^n \equiv n^T (f_0 - \tilde f^n)$ and dropping the $n$ index, the primal step becomes

$$\arg\min_\rho \left\{ \frac{\rho^2}{2\tau} - \lambda \left[ \frac{\partial MI}{\partial I_1} |\nabla I_1|\, (\rho - \tilde\rho) + \frac{1}{2}\, \frac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2\, (\rho - \tilde\rho)^2 \right] \right\} \quad (19)$$

where the derivatives are computed at $f_0$, which is solved by

$$\rho = \frac{\dfrac{\partial MI}{\partial I_1} |\nabla I_1| - \dfrac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2\, \tilde\rho}{\dfrac{1}{\lambda\tau} - \dfrac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2} \quad (20)$$
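A sketch of this primal step, i.e. the closed form (20) followed by the flow update along $n$, could look as follows (again an illustrative Python sketch with our own names, using the quantities returned by the previous snippet):

```python
import numpy as np

def primal_step(f_tilde, f0, n, g_dir, h_dir, lam, tau):
    """One primal update: solve the 1-D problem (19) via the closed form (20)
    and move the flow along the local gradient direction, f = f_tilde + rho * n.

    f_tilde : prox centre from the dual step (10), shape (2, H, W)
    f0      : expansion point of the Taylor approximation, shape (2, H, W)
    n       : per-pixel unit direction grad(I1) / |grad(I1)|, shape (2, H, W)
    g_dir   : per-pixel dMI/dI1 * |grad I1|
    h_dir   : per-pixel clamped d2MI/dI1^2 * |grad I1|^2 (negative)
    """
    # projection of the expansion point onto n, relative to the prox centre
    rho_tilde = n[0] * (f0[0] - f_tilde[0]) + n[1] * (f0[1] - f_tilde[1])
    rho = (g_dir - h_dir * rho_tilde) / (1.0 / (lam * tau) - h_dir)   # eq. (20)
    return f_tilde + rho * n
```

Since h_dir is clamped to be negative, the denominator stays strictly positive and the update is always well defined.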

Fig. 2. Photometric variations of different types (see text), added to the second image of the RubberWhale sequence. First row: images; second row: result of TV-$L_1$ with illumination field estimation [5] ($\beta = 0.05$); third row: result of TV-MI.

Table 1. Ground-truth comparison on the Middlebury dataset. In each entry, results for TV-$L_1$ (left) and TV-MI (right) are shown. Optimization failures are marked.

Dataset      | Dimetrodon   | Grove2       | Grove3        | Hydrangea     | RubberWhale  | Urban2        | Urban3        | Venus
Average angular error (AE)
Original     | 3.39, 3.16   | 2.95, 2.66   | 7.86, 6.64    | 2.83, 2.57    | 4.95, 4.43   | 3.17, 2.65    | 5.83, 5.14    | 4.74, 4.59
Noise        | 26.09, 26.52 | 19.47, 18.33 | 27.72, 16.19  | 44.39, 10.41  | 33.34, 31.05 | 24.84, 11.59  | 27.47, 15.72  | 30.69, 16.86
Linear       | 6.08, 3.13   | 4.49, 2.57   | 8.96, 6.53    | 3.48, 2.52    | 7.32, 4.29   | 5.24, 2.62    | 17.77, 5.47   | 7.03, 4.67
Square       | 5.80, 3.15   | 5.14, 2.68   | 9.38, 6.71    | 4.35, 2.58    | 8.68, 4.42   | 10.34, 2.73   | 15.61, 5.53   | 7.47, 4.38
Neg. square  | 80.79, 3.15  | 53.44, 2.71  | 115.67, 6.73  | 125.83, 2.59  | 72.65, 4.43  | 105.00, 2.74  | 133.18, 5.66  | 92.03, 4.45
Two-to-one   | 88.56, 3.31  | 59.12, 3.01  | 115.10, 7.54  | 96.92, 4.24   | 66.74, 4.61  | 88.64, 2.92   | 118.66, 5.54  | 96.18, 11.32
Average end-point error (EE)
Original     | 0.18, 0.17   | 0.21, 0.19   | 0.77, 0.66    | 0.23, 0.22    | 0.16, 0.14   | 0.40, 0.35    | 0.82, 0.67    | 0.32, 0.30
Noise        | 1.14, 1.18   | 1.32, 1.19   | 2.19, 1.47    | 2.73, 1.07    | 0.98, 0.87   | 2.36, 1.09    | 3.42, 1.86    | 2.17, 1.16
Add. field   | 0.22, 0.53   | 1.71, 0.44   | 0.79, 1.00    | 0.20, 0.77    | 0.14, 0.56   | 1.52, 1.81    | 1.61, 1.63    | 0.38, 0.63
Linear       | 0.30, 0.17   | 0.31, 0.18   | 0.85, 0.65    | 0.39, 0.21    | 0.24, 0.13   | 0.55, 0.36    | 1.44, 0.69    | 0.43, 0.30
Square       | 0.28, 0.17   | 0.36, 0.19   | 0.89, 0.67    | 0.44, 0.23    | 0.27, 0.13   | 1.03, 0.37    | 1.67, 0.75    | 0.48, 0.31
Neg. square  | 31.04, 0.17  | 32.40, 0.19  | 77.91, 0.67   | 66.03, 0.23   | 43.24, 0.13  | 24.18, 0.37   | 35.27, 0.77   | 43.85, 0.31
Two-to-one   | 22.65, 0.18  | 32.21, 0.21  | 44.61, 0.73   | 27.52, 0.42   | 32.51, 0.14  | 46.93, 0.67   | 75.56, 0.69   | 40.79, 0.71

4 Experimental results

In order to assess the quality of the TV-MI algorithm, we tested it first on optical sequences with ground truth, using the Middlebury datasets³, and compared it with the illumination-robust TV-$L_1$ algorithm [5], which estimates additive fields $q(x, y)$:

$$I_t \approx \nabla I_1^n \cdot (u - u^n,\, v - v^n) + I_t^n + \beta q \quad (21)$$

with an additional coefficient $\beta$, so that $f$ is augmented to $f = (u, v, q)$. This over-parametrization leads to a compromise between robustness and precision: a high $\beta$ tends to estimate strong brightness variations and suppress motion, while a low $\beta$ cannot deal with the actual illumination changes, increasing the risk of divergence.

For this comparison, we ran the TV-$L_1$ Matlab implementation available at the TU-Graz computer vision website⁴.

³ http://vision.middlebury.edu/flow/
⁴ http://www.gpu4vision.org

Our algorithm is currently in Matlab code, showing roughly the same timing: for example, the RubberWhale sequence takes about 45 sec. for TV-MI and 51 sec. for TV-$L_1$. Throughout all sequences, the parameters were set as follows: data term weight $\lambda = 1$ for TV-MI ($\lambda = 50$ for TV-$L_1$), initial guess for the kernel size $w = 5$, 30 pyramid levels (with reduction factor 0.9), primal-dual coefficients $\tau = \sigma = 1/\sqrt{8}$, 1 warp iteration and 50 inner-loop iterations.

Fig. 3. Average angular error (left) and average end-point error (right) on RubberWhale at different levels of additive Gaussian noise, for TV-$L_1$ ($\beta = 0$), TV-$L_1$ ($\beta = 0.05$) and TV-MI.

In the first set of experiments, we also set $\beta = 0$, obtaining the results marked Original in Table 1. As we can see, for a constant illumination our algorithm shows similar performances or slight improvements.

Subsequently, we create more challenging conditions, by making photometric changes to the second image $I_1$ of each sequence (Fig. 2 shows the RubberWhale example) in the following order: additive Gaussian noise ($\sigma = 0.1$), linear map $0.7 I_1 + 0.3$, nonlinear one-to-one map $I_1^2$, the same with color inversion $1 - I_1^2$, and two-to-one map $2\,|I_1 - 0.5|$. In order to cope with these changes, we set $\beta = 0.05$ for TV-$L_1$. We can see how MI copes with linear and nonlinear maps, outperforming $L_1$ most of the time, and showing an improved robustness to random noise (see also Fig. 3). Examples of MRI/CT and near-infrared (NIR)/optical pairs, bearing more complex photometric relationships, are shown in Fig. 4.

5 Conclusions

In this paper, we presented the TV-MI approach for multi-modal and discontinuity-preserving variational image registration.

Future developments may follow several directions. For example, the TV regularizer can be replaced by a more robust, anisotropic Huber term [10]. Moreover, as for any global data term, MI performance degrades in the presence of a slowly varying illumination field, which creates a one-to-many relationship by spreading out the joint histogram.

Fig. 4. Multi-modal registration of medical and infrared-optical images. From left to right: original images; superimposed images, before and after warping. Optical/NIR pictures reprinted with permission (© James McCreary, www.dpfwiw.com).

For this purpose, one may resort either to a local formulation of the statistics [11], or to an additional parametric field. Finally, a GPU-based implementation can largely improve the speed of histogram sampling, FFT convolution, gradient and Hessian interpolation, and of the solution of the primal problem.

References

1. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17 (1981) 185-203

2. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 1981 DARPA Image Understanding Workshop (1981) 121-130
3. Bruhn, A., Weickert, J., Schnörr, C.: Lucas-Kanade meets Horn-Schunck: combining local and global optic flow methods. International Journal of Computer Vision 61 (2005) 211-231
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Computer Vision - ECCV 2004, 8th European Conference on Computer Vision, Prague, Czech Republic, May 11-14, 2004, Proceedings, Part IV (2004) 25-36
5. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40 (2011) 120-145
6. Wedel, A., Pock, T., Zach, C., Bischof, H., Cremers, D.: An improved algorithm for TV-L1 optical flow. Springer-Verlag, Berlin, Heidelberg (2009) 23-45
7. Aujol, J.F., Gilboa, G., Chan, T.F., Osher, S.: Structure-texture image decomposition - modeling, algorithms, and parameter selection. International Journal of Computer Vision 67 (2006) 111-136
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60 (1992) 259-268
9. Haussecker, H.W., Fleet, D.J.: Computing optical flow with physical models of brightness variation. IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 661-673
10. Werlberger, M., Pock, T., Bischof, H.: Motion estimation with non-local total variation regularization. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA (2010)
11. Hermosillo, G., Chefd'Hotel, C., Faugeras, O.D.: Variational methods for multimodal image matching. International Journal of Computer Vision 50 (2002) 329-343
12. Gaens, T., Maes, F., Vandermeulen, D., Suetens, P.: Non-rigid multimodal image registration using mutual information. Volume 1496, Springer (1998) 1099-1106
13. Wells, W., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by maximization of mutual information. Medical Image Analysis 1 (1996) 35-51
14. Kim, J., Kolmogorov, V., Zabih, R.: Visual correspondence using energy minimization and mutual information. In: 9th IEEE International Conference on Computer Vision (ICCV 2003), 14-17 October 2003, Nice, France (2003) 1033-1040
15. Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 328-341
16. Panin, G., Knoll, A.: Mutual information-based 3D object tracking. International Journal of Computer Vision 78 (2008) 107-118
17. Dame, A., Marchand, E.: Accurate real-time tracking using mutual information. In: IEEE Int. Symp. on Mixed and Augmented Reality (ISMAR '10), Seoul, Korea (2010) 47-56
18. Thevenaz, P., Unser, M.: Optimization of mutual information for multiresolution image registration. IEEE Transactions on Image Processing 9 (2000) 2083-2099
19. Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22 (2003) 986-1004