Mutual information for multi-modal, discontinuity-preserving image registration

Giorgio Panin
German Aerospace Center (DLR), Institute for Robotics and Mechatronics
Münchner Straße 20, 82234 Weßling

Abstract. Multi-sensory data fusion and medical image analysis often pose the challenging task of aligning dense, non-rigid and multi-modal images. Moreover, optical sequences may present illumination variations and noise. These problems can be addressed by an invariant similarity measure, such as mutual information. In a variational setting, however, convex formulations are generally recommended for efficiency reasons, especially when discontinuities at the motion boundaries have to be preserved. In this paper we propose the TV-MI approach, addressing for the first time all of the above issues, through a primal-dual estimation framework and a novel approximation of the pixel-wise Hessian matrix, which decouples pixel dependencies while being asymptotically correct. At the same time, we maintain high computational efficiency by means of pre-quantized kernel density estimation and differentiation. Our approach is demonstrated on ground-truth data from the Middlebury database, as well as on medical and visible-infrared image pairs.

1 Introduction

An important problem in computer vision is to find visual correspondences between two views of a scene, possibly acquired by multi-modal sensors, or under different illumination conditions. The former is a preliminary step for multi-sensory data fusion, as well as for medical image analysis and visualization. Robustness to illumination changes and image noise is also a vital requirement for motion estimation in optical sequences. In the optical flow literature, we can first distinguish between global and local methods, dating back to [1] and [2] respectively, or combinations of both [3].
The former minimize a global energy that combines a pixel-wise data term, assessing the quality of matching, with a regularization prior coping with the ill-posedness of the problem. The latter extend data terms to local windows of a given aperture, increasing robustness to noise and avoiding further regularization, but are usually limited to a sparse set of features in textured areas, roughly undergoing planar homographies. Global energies are efficiently minimized through locally convex approximations of the nonlinear cost function, typically obtained by linearizing residuals,
under an L_p-norm or a convex M-estimator. For differentiable cost functions, discretized Euler-Lagrange equations are employed: for example, in [1] a linearized L_2-norm data term is regularized by the L_2-norm of the motion field gradient \nabla f, and the resulting quadratic problem is solved by Jacobi iterations. These algorithms are also suitable for graphics hardware implementation, because of their highly parallel structure.

For preserving motion discontinuities at the surface boundaries, the total variation (TV) regularizer instead employs the L_1-norm, which allows non-differentiable solutions, however adding non-trivial issues to the optimization procedure. Earlier works in this direction [4] use the approximate L_1 regularizer \sqrt{|\nabla f|^2 + \epsilon^2}, where \epsilon is a small positive constant, thus keeping the Euler-Lagrange framework. However, this approximation introduces ill-conditioning, especially for small \epsilon. More recently, careful studies have shown how to directly and efficiently address convex TV-L_1 problems [5], including optical flow [6], by means of primal-dual formulations, which introduce a dual variable and solve a saddle-point problem in two alternating steps (min-max), coupled by a quadratic penalty.

Considering the data term, the simplest and most common assumption is brightness constancy, which may be violated in the presence of photometric changes. This happens in case of variable camera exposure, as well as environment light variations, and especially for multi-modal data (such as medical or multi-spectral images), which bear nonlinear and many-to-one relationships. Since the L_p-norm is not robust to such variations, several alternatives have been proposed. To cope with smooth, additive illumination fields, in [6] both images are pre-processed by a structure-texture decomposition [7], which amounts to total variation denoising (the ROF model [8]), producing a structure image that is afterwards subtracted, so that only texture components are used for matching.
Other works introduce additional terms such as image gradients [4], which are robust to additive changes, but also noisier and requiring a proper relative weighting; others estimate smooth, additive illumination fields [5], or complex parametrized models [9].

A different class of approaches looks instead for more robust, invariant matching indices. For example, normalized cross-correlation (NCC) is invariant to brightness mean and variance, thus allowing linear photometric relationships; it has recently been included in the convex variational framework [10], through local correlation windows and a second-order Taylor expansion with numerical differentiation. Another index is the correlation ratio (CR) [11], which is invariant to a class of nonlinear, one-to-one relationships. So far, the most general index is mutual information (MI), defined in information theory to express the statistical dependency between two random variables, in this case the corresponding grey-value pairs: in this way, any photometric relationship is allowed, including nonlinear and many-to-one ones. Due to this property, as well as a higher robustness to outliers and noise, MI was initially proposed for medical image registration [12, 13]. Later on,
it has been applied to stereo matching, in [14] and in the semi-global matching (SGM) algorithm [15], as well as to object tracking [16] and visual navigation [17]. Notably, [11] considered a unified variational formulation of global NCC, CR and MI, as well as their local counterparts, for multi-modal and non-rigid registration. That approach relies solely on gradient descent, through the nonlinear Euler-Lagrange equations. Although MI has thus been used for variational registration, we are not aware of any locally convex formulation of it, which, as we have seen, is the key to efficient optimization with discontinuity-preserving priors.

Our main contribution is, therefore, the integration of global MI into the primal-dual TV framework through a locally convex, second-order Taylor expansion. Furthermore, we adopt a particular approximation of the Hessian matrix, motivated by the following insights. It is well-known that MI is a cascade of two mappings: one at the level of grey-value statistics (Sec. 3.1) and one at pixel level (Sec. 3.2), where both Hessian contributions contain first- and second-order terms. We choose to retain only second-order terms at the upper level, and only first-order terms at the lower level. This leads to a block-diagonal, negative-semidefinite approximation, resulting in directional searches along image gradients, while being asymptotically correct. By contrast, the traditional approximation first proposed in [18], intuitively following the Gauss-Newton approach, neglects second-order terms everywhere. However, this choice has recently been put under discussion [17], as already suggested by its seldom usage even in low-dimensional problems (e.g. Levenberg-Marquardt strategies [19] show less efficiency than the LSE counterpart). At pixel level, instead, the (2 x 2) rank-1 structure tensors are consistent with the aperture problem of global approaches.
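Before detailing the method, the invariance property that motivates MI can be illustrated with a minimal numeric check (a sketch with a hypothetical `histogram_mi` helper, not the pre-quantized estimator used later in the paper): a plug-in histogram estimate of MI remains high under a many-to-one grey-value map, where any L_p residual would be large.

```python
import numpy as np

def histogram_mi(a, b, bins=32):
    """Plug-in MI estimate from the joint grey-value histogram of two images in [0, 1]."""
    P, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins, range=[[0, 1], [0, 1]])
    P /= P.sum()
    outer = np.outer(P.sum(axis=1), P.sum(axis=0))   # product of the marginals
    nz = P > 0
    return float(np.sum(P[nz] * np.log(P[nz] / outer[nz])))

rng = np.random.default_rng(0)
img = rng.random((128, 128))
mapped = 2.0 * np.abs(img - 0.5)   # a many-to-one photometric map
# histogram_mi(img, mapped) stays comparable to histogram_mi(img, img),
# whereas the L2 residual np.mean((img - mapped) ** 2) is far from zero.
```

Any one-to-one or many-to-one grey-value relationship leaves the joint histogram maximally clustered, which is exactly what MI rewards.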
By comparison, the second-order approximation of local NCC [10] neglects off-diagonal terms, further decoupling the horizontal and vertical flow components, under the assumption that most blocks are diagonally dominant and full-rank, due to the extended sampling windows. In our case, this assumption would clearly be incorrect.

The remainder of the paper is organized as follows: in Sec. 2 we review the primal-dual variational approach; Sec. 3 describes our formulation of the MI data term and the optimization strategy, finally summarizing the TV-MI algorithm; Sec. 4 shows experimental results on the Middlebury training dataset and on multi-modal images, and Sec. 5 proposes future developments.

2 TV-regularized motion estimation

Given two images I_0, I_1, a motion field f = (u(x, y), v(x, y)) is sought in order to match corresponding points I_0(x, y), I_1(x + u, y + v) with possibly sub-pixel accuracy, such that some similarity index is maximized, while at the same time keeping the field smooth and preserving discontinuities at the motion boundaries.

The first requirement can be expressed, omitting the (x, y) coordinates for brevity, by a global data term E_data(I_0, I_1(u, v)). The other constraints are usually incorporated into a smoothness (or soft penalty) term E_smooth(u, v), which is a function of the local behaviour of the field, typically through the spatial gradients:

    \arg\min_{(u,v)} E_{smooth}(\nabla u, \nabla v) + \lambda E_{data}(I_0, I_1(u, v))    (1)

with a proper weighting factor \lambda. Following [5], let F = E_{smooth} and G = \lambda E_{data}; we then have the general problem

    \arg\min_{f \in X} F(Df) + G(f)    (2)

where f : \Omega \to R^2 belongs to a Euclidean space X of functions with open domain \Omega, D : X \to Y is a linear operator such as the component-wise gradient, mapping onto another space Y, and F : Y \to R^+, G : X \to R^+ are the prior and data terms, for example given by an integral over \Omega of the respective L_p-norm. Both spaces are endowed with the scalar product and induced norm

    \langle f, g \rangle = \sum_i \int_\Omega f_i g_i \, dx \, dy; \quad \|f\| = \sqrt{\langle f, f \rangle}    (3)

summed over the vector field components i \in \{1, 2\}. If both F and G are convex and lower semi-continuous [5], then (2) can be cast into a saddle-point problem

    \min_{f \in X} \max_{p \in Y} \langle Df, p \rangle + G(f) - F^*(p)    (4)

where p \in Y is the dual variable, and F^* is the Legendre-Fenchel conjugate

    F^*(p^*) = \sup_{p \in Y} \langle p^*, p \rangle - F(p)    (5)

In order to solve (4), first-order algorithms alternate descent and ascent steps in the respective variables f, p, by defining the resolvent, or proximal, operators

    f = (I + \tau \partial G)^{-1}(\bar f); \quad p = (I + \sigma \partial F^*)^{-1}(\bar p)    (6)

where \tau, \sigma are two step-size parameters, I is the identity mapping, and \partial F^* is the subgradient of F^*, which extends the (variational) gradient to non-differentiable but convex functions, being well-defined over the whole domain Y. The first operator is given by

    (I + \tau \partial G)^{-1}(\bar f) = \arg\min_f \left\{ \frac{1}{2\tau} \|f - \bar f\|^2 + G(f) \right\}    (7)

and similarly for F^*. Then, an efficient algorithm (Alg. 1 in [5], with \theta = 1) iterates the following steps:

Initialization: choose \tau, \sigma > 0 s.t. \tau \sigma \|D\|^2 \le 1, set initial values f^0, p^0, and the auxiliary variable \bar f^0 = f^0
Iterate: for n = 1, 2, ...

    p^n = (I + \sigma \partial F^*)^{-1} (p^{n-1} + \sigma D \bar f^{n-1})
    f^n = (I + \tau \partial G)^{-1} (f^{n-1} - \tau D^* p^n)
    \bar f^n = 2 f^n - f^{n-1}    (8)

where D^* is the adjoint operator: \langle Df, p \rangle_Y = \langle f, D^* p \rangle_X. In particular, the total variation regularizer

    F_{TV} = \int_\Omega |Df| \, dx \, dy    (9)

is the isotropic L_1-norm of the distributional derivative, which is defined also for discontinuous fields, and reduces to the gradient D = \nabla when f is sufficiently smooth, so that |Df| = \sqrt{f_x^2 + f_y^2}. The corresponding adjoint operator is the negative divergence, D^* p = -\mathrm{div}\, p. Thus, the proximal operators in (8) are applied to

    \bar p^n = p^{n-1} + \sigma \nabla \bar f^{n-1}; \quad \tilde f^n = f^{n-1} + \tau \, \mathrm{div}\, p^n    (10)

In the following, we consider the problem in a discrete setting, where f, p are defined on pixel grids, and the discretized operators are given in [5]. It can then be shown that \|D\|^2 \le 8, and a common choice is \tau = \sigma = 1/\sqrt{8}. Furthermore, (I + \sigma \partial F_{TV}^*)^{-1} is the point-wise Euclidean projection

    p = (I + \sigma \partial F_{TV}^*)^{-1}(\bar p): \quad p_{x,y} = \frac{\bar p_{x,y}}{\max(1, |\bar p_{x,y}|)}    (11)

In the standard TV-L_1 model, the data term directly penalizes the residual between I_0 and the warped image I_1(f); that is, the temporal variation between I_0 and I_1(f) is assumed to be a zero-mean white noise process.

3 Mutual information data term

Formally, MI is the Kullback-Leibler divergence between the joint density P(i_0, i_1) and the product of marginals P(i_0) P(i_1):

    MI(I_0, I_1 | f) = H(I_0) + H(I_1 | f) - H(I_0, I_1 | f)
                     = \int_0^1 \int_0^1 P(i_0, i_1 | f) \log \frac{P(i_0, i_1 | f)}{P(i_0) P(i_1 | f)} \, di_0 \, di_1    (12)

where H are the marginal and joint entropies, and we emphasize the dependency of the I_1 sample on f. This quantity must be maximized with respect to f, so we can write E_data = -MI(I_0, I_1 | f). In order to introduce our Hessian approximation, we will first consider the statistical dependency of MI on grey values, and then the lower-level dependency upon flow vectors.
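As a concrete sketch of the iteration (8)-(11), the following Python fragment (an illustration, not the paper's Matlab code) implements Alg. 1 of [5] for the simplest quadratic data term G(f) = (lam/2)||f - g||^2, i.e. the ROF model; in TV-MI, only the resolvent of G changes.

```python
import numpy as np

def grad(f):
    """Forward-difference gradient D (Neumann boundary: last row/column zero)."""
    fx = np.zeros_like(f)
    fy = np.zeros_like(f)
    fx[:, :-1] = f[:, 1:] - f[:, :-1]
    fy[:-1, :] = f[1:, :] - f[:-1, :]
    return fx, fy

def div(px, py):
    """Discrete divergence; -div is the adjoint D* of grad for dual fields with
    zero last row/column (as produced by grad and preserved by the iteration)."""
    d = np.zeros_like(px)
    d[:, 0] += px[:, 0]
    d[:, 1:] += px[:, 1:] - px[:, :-1]
    d[0, :] += py[0, :]
    d[1:, :] += py[1:, :] - py[:-1, :]
    return d

def tv_denoise(g, lam=8.0, n_iter=200):
    """Alg. 1 of [5] (theta = 1) with a quadratic data term (ROF denoising)."""
    tau = sigma = 1.0 / np.sqrt(8.0)          # tau * sigma * ||D||^2 <= 1
    f = g.copy()
    f_bar = g.copy()
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        # dual ascent + point-wise Euclidean projection, eq. (11)
        gx, gy = grad(f_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px, py = px / norm, py / norm
        # primal descent: closed-form resolvent of the quadratic data term
        f_prev = f
        f_tilde = f + tau * div(px, py)
        f = (f_tilde + tau * lam * g) / (1.0 + tau * lam)
        f_bar = 2.0 * f - f_prev              # over-relaxation, last line of (8)
    return f
```

The same loop structure carries over to TV-MI, where the primal resolvent is replaced by the one-dimensional MI step derived in Sec. 3.2.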
3.1 Approximating the Hessian: grey-value statistics

Given a density estimate P(i_0, i_1), obtained from a sample of grey pairs (I_{0,h}, I_{1,h}), h = 1, ..., N, let us consider the dependency of MI on the I_1 sample¹ (suppressing the 1 index):

    \frac{\partial MI}{\partial I_h} = \sum_{i_0, i_1} \frac{\partial P(i_0, i_1)}{\partial I_h} \log \frac{P(i_0, i_1)}{P(i_1)}

    \frac{\partial^2 MI}{\partial I_h \partial I_k} = \sum_{i_0, i_1} \left[ \frac{\partial^2 P(i_0, i_1)}{\partial I_h \partial I_k} \log \frac{P(i_0, i_1)}{P(i_1)} + \frac{\partial P(i_0, i_1)}{\partial I_h} \frac{\partial P(i_0, i_1)}{\partial I_k} \left( \frac{1}{P(i_0, i_1)} - \frac{1}{P(i_1)} \right) \right]    (13)

This Hessian is generally not diagonal since, although sampling schemes for P(i_0, i_1) ensure that the mixed second partials of P are zero, the last term is generally non-zero for h \ne k, leading to a problem of intractable complexity. In order to reduce MI to a sum of independent terms, [14] and [15] linearize P \log P around the previous density estimate \bar P = P(I_0, I_1 | \bar f), leading to P \log P \approx P \log \bar P. Although these methods are derivative-free, this corresponds to neglecting the first-order terms in the Hessian, which cause the undesired coupling.

We can see that the resulting accuracy is mainly related to the finite sample size N, and to the kernel bandwidth: in fact, because of the products, first-order terms decay as 1/N^2, while second-order terms decay as 1/N. Moreover, we observed that the approximation is always best at the optimum, i.e. when the joint density is maximally clustered. Finally, the eigenvalues of our approximation always have a larger magnitude than those of the true Hessian, as can be seen from the fact that the neglected first-order terms on the diagonal are always non-negative.

Among the many existing non-parametric procedures for entropy estimation, we decided to follow the efficient strategy used in [15, 14], extended to our derivative-based framework. Briefly summarized, it consists of a Parzen-based estimation, with pre-quantized kernels assigned to the cells of a (256 x 256) joint histogram P.
The density is estimated, after warping I_1(f), by collecting the joint histogram of (I_{0,h}, I_{1,h}) and subsequently convolving it with an isotropic Gaussian kernel K_w of bandwidth w. Afterwards, a further convolution of log P with the same kernel, evaluated at the same sample points², produces the desired data terms, whose negated average is the entropy:

    H(I_0, I_1) = -\frac{1}{N} \sum_h [K_w * \log(K_w * P)] (I_{0,h}, I_{1,h})    (14)

and similarly for the marginal entropy H(I_1), this time with one-dimensional convolutions and a possibly different bandwidth w_1.

¹ Notice that we write log instead of (1 + log), as is often found, because the derivatives of a twice-differentiable density integrate to 0.
² In order to keep sub-pixel/sub-grey precision, we perform bilinear interpolation at non-integer histogram positions.
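The pre-quantized estimate (14) can be sketched in a few lines (illustrative Python, with nearest-bin lookup instead of the paper's bilinear interpolation; `gaussian_filter` plays the role of the convolution with K_w):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def joint_entropy_table(I0, I1, w=5.0, bins=256, eps=1e-12):
    """Pre-quantized Parzen estimate of H(I0, I1), eq. (14).

    I0, I1: grey images with values in [0, 1]. Returns (entropy, table) where
    table = K_w * log(K_w * P) can later be re-interpolated at warped samples.
    eps guards the logarithm in empty histogram regions."""
    i0 = np.clip((I0 * (bins - 1)).round().astype(int).ravel(), 0, bins - 1)
    i1 = np.clip((I1 * (bins - 1)).round().astype(int).ravel(), 0, bins - 1)
    N = i0.size
    P = np.zeros((bins, bins))
    np.add.at(P, (i0, i1), 1.0 / N)          # joint histogram -> density
    table = gaussian_filter(np.log(gaussian_filter(P, w) + eps), w)
    return -table[i0, i1].mean(), table
```

Identical images concentrate the joint density on the diagonal, so their joint entropy comes out lower than that of two independent images, which is the behaviour the MI data term exploits.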
From (14) we obtain derivatives in a straightforward way, again by convolution of the log P table:

    \frac{\partial H}{\partial I_{1,h}} = -\frac{1}{N} [K'_w * \log(K_w * P)] (I_{0,h}, I_{1,h})

    \frac{\partial^2 H}{\partial I_{1,h}^2} \approx -\frac{1}{N} [K''_w * \log(K_w * P)] (I_{0,h}, I_{1,h})    (15)

where K'_w, K''_w are the first and second derivatives of the kernel along the I_1 axis, and the second equation follows from the previously explained approximation to (13). All of these operations are efficiently carried out by the FFT. The bandwidths w, w_1 for (14) are estimated in a maximum-likelihood way, according to cross-validation rules [11], which can be shown to require a convolution by \partial K / \partial w. In practice, performing the above convolutions is still an expensive operation; therefore, we update these tables only once per pyramid level, while interpolating them at new values of I_1(f) for computing the Hessian and gradient. The latter operations are performed in an intermediate warp loop (Fig. 1), while the innermost loop alternates primal-dual steps (8) until convergence.

3.2 Approximating the Hessian: directional derivatives

At the pixel level, the aperture problem results in rank-deficient (2 x 2) diagonal blocks of the overall Hessian. In fact, after decoupling pixel-wise dependencies, we can compute derivatives of MI with respect to the flow:

    \frac{\partial MI}{\partial f_h} = \frac{\partial MI}{\partial I_{1,h}} \nabla I_{1,h}

    \frac{\partial^2 MI}{\partial f_h^2} = \frac{\partial^2 MI}{\partial I_{1,h}^2} \nabla I_{1,h} \nabla I_{1,h}^T + \frac{\partial MI}{\partial I_{1,h}} \frac{\partial^2 I_{1,h}}{\partial f_h^2}    (16)

where, hereafter dropping the h index,

    \nabla I \nabla I^T = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}; \quad \frac{\partial^2 I}{\partial f^2} = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}    (17)

are the (rank-1) structure tensor and the Hessian of I_1, respectively. At the optimum, in the absence of noise, \partial MI / \partial I_1 vanishes from (16), so we approximately keep only the rank-1 term, scaled by the second derivative of MI. The image Hessian is seldom used in the literature, because it may be indefinite and consists of possibly noisy values. However, we still have to check the factor \partial^2 MI / \partial I_1^2, in order to ensure a negative-semidefinite matrix.
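The derivative tables (15) can be sketched along the same lines (again illustrative: `gaussian_filter` with `order=(0, 1)` and `order=(0, 2)` stands in for the convolutions with K'_w and K''_w along the I_1 axis):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def entropy_derivative_tables(P, w=5.0, eps=1e-12):
    """Tables for eq. (15). P: (bins x bins) joint histogram, normalised to 1.
    Lookups at the sample bins must still be divided by the sample size N:
        dH/dI1_h ~ t1[i0_h, i1_h] / N,  d2H/dI1_h^2 ~ t2[i0_h, i1_h] / N."""
    logS = np.log(gaussian_filter(P, w) + eps)
    t1 = -gaussian_filter(logS, w, order=(0, 1))   # -K'_w  * log(K_w * P)
    t2 = -gaussian_filter(logS, w, order=(0, 2))   # -K''_w * log(K_w * P)
    return t1, t2
```

For a density concentrated at a single grey pair, the first-derivative table vanishes at the mode and changes sign across it, while the second-derivative table is positive there, consistent with the entropy being minimal for a maximally clustered density.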
In fact, during the initial stages the density is spread out, and some places may have a positive (or almost zero) curvature. Therefore, we threshold each factor to a maximum value D^2_max < 0. In order to cope with the rank deficiency, the primal step thus relies on the regularizing prior, whose strict convexity ensures a unique minimum. Since the
Initialization: Let I_0, I_1 be two images; set f = 0 and an initial guess for w, w_1. Compute the two pyramids with L levels, including sub-octaves and related subsampling.
Outer loop: let f^{l-1} be the result at the previous level
  1. Upsample f^{l-1} -> f^l (and the dual field p^{l-1} -> p^l)
  2. Warp I_1 and \nabla I_1 at f^l, and collect the joint histogram
  3. Adapt w, w_1 by maximum-likelihood ascent
  4. Compute the entropy tables for MI (14), (15)
  5. Warp loop: initialize f_0 = f^l, and repeat
     (a) Warp I_1 at f_0 and compute the MI gradient and Hessian, by interpolating the tables at (I_0, I_1)
     (b) Inner loop: iterate n = 1, 2, ...
         i. Perform the dual step (10), (11) to obtain \tilde f^n
         ii. Solve (20) and update the primal variable f^n
     (c) Apply median filtering to f^n, and update the expansion point f_0 = f^n

Fig. 1. The TV-MI algorithm.

prior \|f - \tilde f\|^2 is isotropic in (x, y), the problem reduces to a one-dimensional search along n = \nabla I_1 / |\nabla I_1|. For this purpose, first- and second-order directional derivatives are given by

    \frac{\partial MI}{\partial n} = \frac{\partial MI}{\partial I_1} |\nabla I_1|

    \frac{\partial^2 MI}{\partial n^2} = \frac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2 + \frac{\partial MI}{\partial I_1} \frac{\partial^2 I_1}{\partial n^2}    (18)

where, once again, the last term of the second derivative is neglected. Thus, we look for \rho \doteq n^T (f - \tilde f), the projection of the motion update along n, so that conversely f = \tilde f + \rho n. Several primal-dual steps (Fig. 1) are needed for the TV-regularized optimization, so that the prior values \tilde f^n will differ from the initial expansion point f_0. Therefore, by defining \bar\rho \doteq n^T (f_0 - \tilde f) (dropping the n index), the primal step becomes

    \arg\min_\rho \left\{ \frac{\rho^2}{2\tau} - \lambda \left[ \frac{\partial MI}{\partial I_1} |\nabla I_1| \, (\rho - \bar\rho) + \frac{1}{2} \frac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2 \, (\rho - \bar\rho)^2 \right] \right\}    (19)

where the derivatives are computed at f_0, which is solved by

    \rho = \frac{ \frac{\partial MI}{\partial I_1} |\nabla I_1| - \frac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2 \, \bar\rho }{ \frac{1}{\lambda\tau} - \frac{\partial^2 MI}{\partial I_1^2} |\nabla I_1|^2 }    (20)
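The per-pixel step (20) is a one-line computation; the sketch below (illustrative naming, not the paper's code) makes the roles of the terms explicit, with the curvature factor assumed already clamped below D^2_max < 0 as required in Sec. 3.2:

```python
import numpy as np

def primal_step_rho(mi_grad, mi_curv, grad_norm, rho_bar, lam, tau):
    """Closed-form solution (20) of the one-dimensional primal step (19).

    mi_grad   = dMI/dI1 at the expansion point f_0
    mi_curv   = d2MI/dI1^2, assumed already clamped to be negative (Sec. 3.2)
    grad_norm = |grad I1|, rho_bar = n^T (f_0 - f_tilde)."""
    a = mi_grad * grad_norm           # first-order directional term of (18)
    b = mi_curv * grad_norm ** 2      # (negative) curvature term of (18)
    return (a - b * rho_bar) / (1.0 / (lam * tau) - b)
```

Since b < 0, the denominator is strictly positive, so the quadratic model (19) is strictly convex and the returned rho is its unique minimizer; the function also vectorizes directly over per-pixel arrays.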
Fig. 2. Photometric variations of different types (see text), added to the second image of the RubberWhale sequence. First row: image; second row: result of TV-L_1 with illumination field estimation [5] (\beta = 0.05); third row: result of TV-MI.

Average angular error (AE):

Condition    Dimetrodon    Grove2        Grove3        Hydrangea     RubberWhale   Urban2        Urban3        Venus
Original     3.39, 3.16    2.95, 2.66    7.86, 6.64    2.83, 2.57    4.95, 4.43    3.17, 2.65    5.83, 5.14    4.74, 4.59
Noise        26.09, 26.52  19.47, 18.33  27.72, 16.19  44.39, 10.41  33.34, 31.05  24.84, 11.59  27.47, 15.72  30.69, 16.86
Linear       6.08, 3.13    4.49, 2.57    8.96, 6.53    3.48, 2.52    7.32, 4.29    5.24, 2.62    17.77, 5.47   7.03, 4.67
Square       5.80, 3.15    5.14, 2.68    9.38, 6.71    4.35, 2.58    8.68, 4.42    10.34, 2.73   15.61, 5.53   7.47, 4.38
Neg. square  80.79, 3.15   53.44, 2.71   115.67, 6.73  125.83, 2.59  72.65, 4.43   105.00, 2.74  133.18, 5.66  92.03, 4.45
Two-to-one   88.56, 3.31   59.12, 3.01   115.10, 7.54  96.92, 4.24   66.74, 4.61   88.64, 2.92   118.66, 5.54  96.18, 11.32

Average end-point error (EE):

Condition    Dimetrodon    Grove2        Grove3        Hydrangea     RubberWhale   Urban2        Urban3        Venus
Original     0.18, 0.17    0.21, 0.19    0.77, 0.66    0.23, 0.22    0.16, 0.14    0.40, 0.35    0.82, 0.67    0.32, 0.30
Noise        1.14, 1.18    1.32, 1.19    2.19, 1.47    2.73, 1.07    0.98, 0.87    2.36, 1.09    3.42, 1.86    2.17, 1.16
Add. field   0.22, 0.53    1.71, 0.44    0.79, 1.00    0.20, 0.77    0.14, 0.56    1.52, 1.81    1.61, 1.63    0.38, 0.63
Linear       0.30, 0.17    0.31, 0.18    0.85, 0.65    0.39, 0.21    0.24, 0.13    0.55, 0.36    1.44, 0.69    0.43, 0.30
Square       0.28, 0.17    0.36, 0.19    0.89, 0.67    0.44, 0.23    0.27, 0.13    1.03, 0.37    1.67, 0.75    0.48, 0.31
Neg. square  31.04, 0.17   32.40, 0.19   77.91, 0.67   66.03, 0.23   43.24, 0.13   24.18, 0.37   35.27, 0.77   43.85, 0.31
Two-to-one   22.65, 0.18   32.21, 0.21   44.61, 0.73   27.52, 0.42   32.51, 0.14   46.93, 0.67   75.56, 0.69   40.79, 0.71

Table 1. Ground-truth comparison on the Middlebury dataset. Each entry shows the results for TV-L_1 (left) and TV-MI (right). Optimization failures are marked.
4 Experimental results

In order to assess the quality of the TV-MI algorithm, we first tested it on optical sequences with ground truth, using the Middlebury datasets³, and compared it with the illumination-robust TV-L_1 algorithm [5], which estimates an additive field q(x, y) through the linearized residual

    I_t \approx \nabla I_1^n \cdot (u - u^n, v - v^n)^T + I_t^n + \beta q    (21)

with an additional coefficient \beta, so that the unknown field is augmented to f = (u, v, q). This over-parametrization leads to a compromise between robustness and precision: a high \beta tends to explain strong brightness variations while suppressing motion, whereas a low \beta cannot deal with the actual illumination changes, increasing the risk of divergence.

For this comparison, we ran the TV-L_1 Matlab implementation available at the TU-Graz computer vision website⁴.

³ http://vision.middlebury.edu/flow/
⁴ http://www.gpu4vision.org

Our algorithm is currently in Matlab
code, showing roughly the same timing: for example, the RubberWhale sequence takes about 45 s for TV-MI and 51 s for TV-L_1. Throughout all sequences, the parameters were set as follows: data term weight \lambda = 1 for TV-MI (\lambda = 50 for TV-L_1), initial guess for the kernel size w = 5, 30 pyramid levels (with reduction factor 0.9), primal-dual coefficients \tau = \sigma = 1/\sqrt{8}, 1 warp iteration and 50 inner-loop iterations.

Fig. 3. Average estimation errors (RubberWhale average angular and end-point error, for TV-L_1 with \beta = 0, TV-L_1 with \beta = 0.05, and TV-MI) at different levels of additive Gaussian noise.

In the first set of experiments, we also set \beta = 0, obtaining the results marked Original in Table 1. As we can see, for constant illumination our algorithm shows similar performance or slight improvements. Subsequently, we create more challenging conditions by applying photometric changes to the second image I_1 of each sequence (Fig. 2 shows the RubberWhale example), in the following order: additive Gaussian noise (\sigma = 0.1), linear map 0.7 I_1 + 0.3, nonlinear one-to-one map I_1^2, its color inversion 1 - I_1^2, and two-to-one map 2|I_1 - 0.5|. In order to cope with these changes, we set \beta = 0.05 for TV-L_1. We can see how MI copes with linear and nonlinear maps, outperforming L_1 most of the time, and shows improved robustness to random noise (see also Fig. 3). Examples of MRI/CT and near-infrared (NIR)/optical pairs, bearing more complex photometric relationships, are shown in Fig. 4.

5 Conclusions

In this paper, we presented the TV-MI approach for multi-modal and discontinuity-preserving variational image registration. Future developments may follow several directions. For example, the TV regularizer can be replaced by a more robust, anisotropic Huber term [10].
Moreover, as for any global data term, MI performance degrades in the presence of a slowly
varying illumination field, which creates a one-to-many relationship by spreading out the joint histogram. Here, one may resort either to a local formulation of the statistics [11], or to an additional parametric field. Finally, a GPU-based implementation can largely improve the speed of histogram sampling, FFT convolution, gradient and Hessian interpolation, and the solution of the primal problem.

Fig. 4. Multi-modal registration of medical and infrared-optical images. From left to right: original images; superimposed images, before and after warping. Optical/NIR pictures reprinted with permission (© James McCreary, www.dpfwiw.com).

References

1. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17 (1981) 185-203
2. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 1981 DARPA Image Understanding Workshop (1981) 121-130
3. Bruhn, A., Weickert, J., Schnörr, C.: Lucas-Kanade meets Horn-Schunck: combining local and global optic flow methods. International Journal of Computer Vision 61 (2005) 211-231
4. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Computer Vision - ECCV 2004, 8th European Conference on Computer Vision, Prague, Czech Republic (2004) 25-36
5. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40 (2011) 120-145
6. Wedel, A., Pock, T., Zach, C., Bischof, H., Cremers, D.: An improved algorithm for TV-L1 optical flow. Springer-Verlag, Berlin, Heidelberg (2009) 23-45
7. Aujol, J.F., Gilboa, G., Chan, T.F., Osher, S.: Structure-texture image decomposition - modeling, algorithms, and parameter selection. International Journal of Computer Vision 67 (2006) 111-136
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60 (1992) 259-268
9. Haussecker, H.W., Fleet, D.J.: Computing optical flow with physical models of brightness variation. IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 661-673
10. Werlberger, M., Pock, T., Bischof, H.: Motion estimation with non-local total variation regularization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA (2010)
11. Hermosillo, G., Chefd'Hotel, C., Faugeras, O.D.: Variational methods for multimodal image matching. International Journal of Computer Vision 50 (2002) 329-343
12. Gaens, T., Maes, F., Vandermeulen, D., Suetens, P.: Non-rigid multimodal image registration using mutual information.
Volume 1496. Springer (1998) 1099-1106
13. Wells, W., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by maximization of mutual information. Medical Image Analysis 1 (1996) 35-51
14. Kim, J., Kolmogorov, V., Zabih, R.: Visual correspondence using energy minimization and mutual information. In: 9th IEEE International Conference on Computer Vision (ICCV 2003), Nice, France (2003) 1033-1040
15. Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30 (2008) 328-341
16. Panin, G., Knoll, A.: Mutual information-based 3D object tracking. International Journal of Computer Vision 78 (2008) 107-118
17. Dame, A., Marchand, E.: Accurate real-time tracking using mutual information. In: IEEE Int. Symp. on Mixed and Augmented Reality (ISMAR 2010), Seoul, Korea (2010) 47-56
18. Thevenaz, P., Unser, M.: Optimization of mutual information for multiresolution image registration. IEEE Transactions on Image Processing 9 (2000) 2083-2099
19. Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22 (2003) 986-1004