Variational Methods for Multimodal Image Matching


International Journal of Computer Vision 50(3), 329–343, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Variational Methods for Multimodal Image Matching

GERARDO HERMOSILLO, CHRISTOPHE CHEFD'HOTEL AND OLIVIER FAUGERAS
INRIA Sophia Antipolis, Odyssée Lab, 2004 route des Lucioles, BP 93, Sophia Antipolis Cedex, France
Gerardo.Hermosillo@inria.fr
Christophe.Chefdhotel@inria.fr
Olivier.Faugeras@inria.fr

Received December 8, 2000; Revised July 25, 2002; Accepted July 25, 2002

Abstract. Matching images of different modalities can be achieved by the maximization of suitable statistical similarity measures within a given class of geometric transformations. Handling complex, nonrigid deformations in this context turns out to be particularly difficult and has attracted much attention in the last few years. The thrust of this paper is that many of the existing methods for nonrigid monomodal registration that use simple criteria for comparing the intensities (e.g. SSD) can be extended to the multimodal case, where more complex intensity similarity measures are necessary. To this end, we perform a formal computation of the variational gradient of a hierarchy of statistical similarity measures, and use the results to generalize a recently proposed and very effective optical flow algorithm (L. Alvarez, J. Weickert, and J. Sánchez, 2000, Technical Report, and IJCV 39(1):41–56) to the case of multimodal image registration. Our method readily extends to the case of locally computed similarity measures, thus providing the flexibility to cope with spatial non-stationarities in the way the intensities in the two images are related. The well-posedness of the resulting equations is proved in a complementary work (O.D. Faugeras and G. Hermosillo, 2001, Technical Report 4235, INRIA) using well-established techniques in functional analysis. We briefly describe our numerical implementation of these equations and show results on real and synthetic data.

Keywords: image matching, cross correlation, correlation ratio, mutual information, partial differential equations, regularization, variational methods

1. Introduction

The problem of establishing correspondences between two or more images is fundamental in computer vision. It is one of the building blocks for a number of challenging problems such as 3D reconstruction, camera motion estimation, template matching and camera calibration. When images have been acquired through similar sensors, they can be realigned by a direct comparison of their intensities. This results in matching algorithms that essentially look for the geometric transformation between two images which minimizes the sum of the squared differences between their intensity values.* There are several situations in which the hypothesis of the invariance of the intensity is no longer valid. One may consider for instance varying illumination conditions, or sensors with different spectral sensitivities. The same situation is encountered in medical imaging, where several acquisition modalities must be realigned to allow for an accurate fusion of complementary information. To cope with this new difficulty, statistical similarity measures between the image intensities have been proposed. Two such similarity measures are the mutual information, proposed by Viola and Wells (1997) and independently by Maes et al. (1997), and the correlation ratio, proposed by Roche et al. (1998a).

*Part of this work was done while the second author was visiting the MIT Artificial Intelligence Laboratory.

Initially, these criteria were used through their maximization over a low-dimensional space of parametric transformations, e.g. rigid or affine. Recent extensions to larger sets of deformations rely on more complex parametric models (Meyer et al., 1998; Rückert et al., 1998) or block-matching strategies (Maintz et al., 1998; Hata et al., 1998; Gaens et al., 1998). Another interesting approach to multimodal, nonrigid matching was proposed by Roche et al. (2000a), where parametric intensity corrections are combined with standard monomodal algorithms. In this paper, we propose to extend these approaches by analyzing three statistical criteria in a variational setting. The idea is to recover the underlying deformation by estimating a dense displacement field through the maximization of the similarity measures over a suitable, infinite-dimensional functional space. We believe that this approach opens the possibility of generalizing many nonrigid matching algorithms that are expressed under similar formalisms to the case of multimodal matching. Our main contribution in this paper is the derivation of the Euler-Lagrange equations for the statistical criteria, a task which is nontrivial due to their intricate definition. We consider the cases of mutual information, correlation ratio and cross correlation, since they provide a suitable hierarchy of statistical criteria going from the most robust (cross correlation) to the most general (mutual information) (Roche et al., 2000b). Our approach relies on smooth estimates of the joint probability density function (pdf) of the intensities in the first and the warped second image. We extend our previous work reported in Chefd'hotel et al. (2001) by also considering locally estimated pdfs, thus accounting for possible spatial non-stationarities in the intensity relations.

The paper is organized as follows. Section 2 discusses the variational formalism and its application to defining the matching problem. Section 3 describes the choice of a suitable functional to perform regularization. In Section 4, we start with a simple registration method which assumes that the joint intensity pdf between different modalities is known (e.g. learned from sample datasets). This idea is extended in Section 5 to the minimization of statistical dissimilarity measures when the joint intensity distribution is unknown. Theoretical results on the existence and uniqueness of a generic matching flow which embodies both the dissimilarity and the regularization components are presented in Section 6. In Section 7 we briefly describe the numerical implementation of the flows, and show results of experiments on synthetic and real images in Section 8. In Section 9, we conclude and provide some perspectives.

2. Dense Matching and the Variational Framework

We now state our modeling assumptions and define the matching problem in the context of the calculus of variations. We consider two images $I_1^\sigma = I_1 * G_\sigma$ and $I_2^\sigma = I_2 * G_\sigma$ at a given scale $\sigma$, i.e. resulting from the convolution of two square-integrable functions $I_1 : \mathbb{R}^n \to \mathbb{R}$ and $I_2 : \mathbb{R}^n \to \mathbb{R}$ with a Gaussian kernel of standard deviation $\sigma$ (we restrict ourselves to the cases $n = 2, 3$). Given a region of interest $\Omega$, a bounded region of $\mathbb{R}^n$, we look for a function $h : \Omega \to \mathbb{R}^n$ assigning to each point $x$ in $\Omega$ a displacement vector $h(x) \in \mathbb{R}^n$. This function is searched for in a set $F$ of admissible functions such that it minimizes an energy functional $\mathcal{I} : F \to \mathbb{R}$ of the form $\mathcal{I}(h) = \mathcal{J}(h) + \mathcal{R}(h)$.
Generally speaking, the set $F$ is assumed to be a linear subspace of a Hilbert space $H$, the scalar product of which is denoted $(\cdot,\cdot)_H$. The term $\mathcal{J}(h)$ is designed to measure the dissimilarity between the reference image ($I_1^\sigma$) and the $h$-warped second image ($I_2^\sigma \circ (\mathrm{Id} + h)$), Id denoting the identity mapping of $\mathbb{R}^n$. The term $\mathcal{R}(h)$ is designed to penalize fast variations of the function $h$. To summarize, we define the matching problem as that of minimizing $\mathcal{I}$ with respect to $h$, the matching being a solution $\hat h$ of this minimization problem:

$$\hat h = \arg\min_{h \in F} \mathcal{I}(h) = \arg\min_{h \in F} \big(\mathcal{J}(h) + \mathcal{R}(h)\big).$$

Assuming that $\mathcal{I}$ is sufficiently regular, its first variation at $h \in F$ in the direction of $k \in F$ is defined by (see e.g. Courant, 1946)

$$\delta_k \mathcal{I}(h) = \left.\frac{\partial\,\mathcal{I}(h + \epsilon k)}{\partial \epsilon}\right|_{\epsilon = 0}. \qquad (1)$$

The gradient $\nabla_H \mathcal{I}(h)$ of $\mathcal{I}$ is defined by requiring the equality $\delta_k \mathcal{I}(h) = (\nabla_H \mathcal{I}(h), k)_H$ to hold for every $k \in F$ (see Note 1). If a minimizer $\hat h$ of $\mathcal{I}$ exists, then the set of equations $\delta_k \mathcal{I}(\hat h) = 0$ must hold for every $k \in F$, which is equivalent to $\nabla_H \mathcal{I}(\hat h) = 0$. These are called the Euler-Lagrange equations associated with the energy functional $\mathcal{I}$.
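As a sanity check on definition (1), the following Python/NumPy sketch (ours, not the authors') verifies the identity $\delta_k \mathcal{R}(h) = (\nabla_H \mathcal{R}(h), k)_H$ numerically for the simple regularizer used in Section 3 below, discretized with forward differences; all names are illustrative.

```python
import numpy as np

# R(h) = alpha * sum of squared forward differences of h (a discrete version
# of alpha * int |Dh|^2 dx); its gradient for the L2 scalar product is
# -2 * alpha * laplace(h), applied component-wise.

def laplace(u):
    # 5-point Laplacian with replicated borders (the exact adjoint of the
    # forward-difference energy below, up to the factor -2)
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u

def R(h, alpha):
    return alpha * sum(np.sum(np.diff(c, axis=a) ** 2) for c in h for a in (0, 1))

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 32, 32))      # a 2D displacement field (2 components)
k = rng.standard_normal((2, 32, 32))      # an arbitrary direction
alpha, eps = 0.1, 1e-6
numeric = (R(h + eps * k, alpha) - R(h, alpha)) / eps          # Eq. (1)
analytic = np.sum(-2 * alpha * np.stack([laplace(c) for c in h]) * k)
print(numeric, analytic)                  # the two values agree up to O(eps)
```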

Rather than solving them directly (which is usually impossible), the search for a minimizer of $\mathcal{I}$ is done using a gradient descent strategy. Given an initial estimate $h_0 \in H$, a time-dependent, differentiable function (also noted $h$) from the interval $[0, +\infty[$ into $H$ is computed as the solution of the following initial value problem:

$$\begin{cases} \dfrac{dh}{dt} = -\big(\nabla_H \mathcal{J}(h) + \nabla_H \mathcal{R}(h)\big), \\ h(0)(\cdot) = h_0(\cdot). \end{cases} \qquad (2)$$

The asymptotic state (i.e. when $t \to \infty$) of $h(t)$ is then chosen as the solution of the matching problem, provided that $h(t) \in F$ for all $t$'s.

3. Partial Differential Equations for Image Matching

In this section, we summarize a method for optical flow estimation that was recently proposed by Alvarez et al. (2000) and point out its relation with the previously described framework. We then give some examples of differential operators $\nabla_H \mathcal{R}(h)$. Assuming the images have been acquired through similar sensors (possibly corrupted by a Gaussian noise), a suitable dissimilarity functional is given by the sum of squared differences of the image intensities (Toga, 1998):

$$\mathcal{J}(h) = \frac{1}{2}\int_\Omega \big(I_1^\sigma(x) - I_2^\sigma(x + h(x))\big)^2\,dx.$$

Besides, one of the simplest ways of penalizing fast variations of $h$ is by defining $\mathcal{R}(h) = \alpha \int_\Omega \|Dh\|^2\,dx$. In this case, Eq. (2) yields a set of reaction-diffusion partial differential equations (PDEs) with initial condition $h_0$ and isotropic diffusion:

$$\begin{cases} \dfrac{\partial h}{\partial t}(t, x) = F(h)(x) + \alpha\,\Delta h(x), \\ h(0, x) = h_0(x), \end{cases} \qquad (3)$$

where

$$F(h)(x) = \big(I_1^\sigma(x) - I_2^\sigma(x + h(x))\big)\,\nabla I_2^\sigma(x + h(x)). \qquad (4)$$

Considering a more general regularization functional of the form $\mathcal{R}(h) = \alpha \int_\Omega \mathrm{Tr}(Dh\,T\,Dh^T)\,dx$ yields an anisotropic diffusion term $\alpha\,\mathrm{div}(T\,Dh)$. Alvarez et al. (2000) proposed an optical flow estimation method using this type of PDEs. They propose an image-driven anisotropic diffusion term based on previous work by Nagel and Enkelmann (1986). Their diffusion tensor is designed to prevent fast variations of $h$ across the contours of $I_1^\sigma$. We propose here an $n$-dimensional generalization of this tensor:

$$T_{\nabla I_1^\sigma} = \frac{1}{(n-1)\,\|\nabla I_1^\sigma\|^2 + n\lambda}\Big(\big(\lambda + \|\nabla I_1^\sigma\|^2\big)\,\mathrm{Id} - \nabla I_1^\sigma\,\nabla I_1^{\sigma\,T}\Big). \qquad (5)$$

It has one eigenvector equal to $\nabla I_1^\sigma$, while the remaining eigenvectors span the plane perpendicular to $\nabla I_1^\sigma$. The corresponding eigenvalues are respectively

$$\kappa_n = \frac{\lambda}{(n-1)\,\|\nabla I_1^\sigma\|^2 + n\lambda} \qquad\text{and}\qquad \kappa_{1,\dots,n-1} = \frac{\lambda + \|\nabla I_1^\sigma\|^2}{(n-1)\,\|\nabla I_1^\sigma\|^2 + n\lambda}.$$

In homogeneous regions, as $\|\nabla I_1^\sigma\|^2 \ll \lambda$, all the eigenvalues tend to $1/n$ and the diffusion is nearly isotropic. Along the contours of $I_1^\sigma$, when $\|\nabla I_1^\sigma\|^2 \gg \lambda$, we have $\kappa_n \to 0$, while the remaining eigenvalues tend to $1/(n-1)$. The diffusion becomes anisotropic in this case, taking place mainly along the contours. As in Christensen et al. (1994), another possible regularization term is obtained by minimizing an approximate strain energy borrowed from linear elasticity. This approach yields a regularization operator of the form

$$\lambda\,\Delta h + (\mu + \lambda)\,\nabla(\nabla\cdot h). \qquad (6)$$

The constants $\lambda$ and $\mu$ are known as the Lamé coefficients. In the following two sections, we generalize Eq. (3) to the case of nonrigid multimodal registration by focusing on the matching term $F(h)$. We replace it by a set of functions designed for the minimization of statistical dissimilarity measures.
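As a concrete illustration of Eqs. (3) and (4), here is a minimal explicit time-stepping sketch (our own illustration under assumed conventions, not the authors' implementation); `I1` and `I2` are 2D float images, and `h` stores row/column displacements with shape (2, H, W):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(I2, h):
    # Resample I2 at x + h(x) by bilinear interpolation
    ys, xs = np.mgrid[0:I2.shape[0], 0:I2.shape[1]].astype(float)
    return map_coordinates(I2, [ys + h[0], xs + h[1]], order=1, mode="nearest")

def laplace(u):
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u

def ssd_step(I1, I2, h, alpha=1.0, tau=0.1):
    Iw = warp(I2, h)                        # I2(x + h(x))
    g = np.array(np.gradient(Iw))           # gradient of the warped image
    F = (I1 - Iw) * g                       # matching term, Eq. (4)
    D = np.stack([laplace(h[0]), laplace(h[1])])
    return h + tau * (F + alpha * D)        # one explicit step of Eq. (3)
```

Note that `np.gradient(Iw)` differentiates the warped image, a common simplification; the paper instead interpolates $\nabla I_2^\sigma$ and evaluates it at $x + h(x)$ (tri-linear interpolation, Section 7.2), which differs by the Jacobian of the warp.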

4. Supervised Registration

Our approach relies on regarding the intensity values of two different modalities as samples of two random processes. Within this probabilistic framework, the link between the two modalities is characterized by their joint probability density function (pdf). In this section, we assume that reference templates, or sets of manually pre-registered images, are available to estimate the real joint pdf $P$, which is assumed to be differentiable and strictly positive. From this knowledge, we can derive a supervised registration principle (SR). We borrow from information theory the notion of uncertainty, or information, defined for an event with respect to a probability measure. In this case, an event is the co-occurrence of two intensity values $i_1 = I_1^\sigma(x)$ and $i_2 = I_2^\sigma(x + h(x))$ at any point $x$. For conciseness, we will use the notations $i = (i_1, i_2)$ and $I_h(x) = (I_1^\sigma(x), I_2^\sigma(x + h(x)))$. The amount of information conveyed by this event is given by $-\log P(i)$. A global dissimilarity measure follows by computing the total amount of information for a given displacement field $h$:

$$\mathcal{J}_{SR}(h) = -\int_\Omega \log P(I_h(x))\,dx.$$

This expression can be viewed as a continuous form of the maximum likelihood principle developed by Leventon and Grimson (1998). By applying (1), the first variation of $\mathcal{J}_{SR}$ is readily found to be

$$\delta_k \mathcal{J}_{SR}(h) = -\int_\Omega \frac{\partial_2 P(I_h(x))}{P(I_h(x))}\,\nabla I_2^\sigma(x + h(x)) \cdot k(x)\,dx,$$

where $\partial_2 P$ denotes the partial derivative of $P$ with respect to its second variable. Choosing $H$ as the set of vector-valued square-integrable functions on $\Omega$, this expression is of the form $\delta_k \mathcal{J}_{SR}(h) = (\nabla_H \mathcal{J}_{SR}(h), k)_H$. We define the function

$$L_{SR}(i) = -\frac{\partial_2 P(i)}{P(i)},$$

so that the gradient $\nabla_H \mathcal{J}_{SR}$ of our first dissimilarity measure $\mathcal{J}_{SR}$ is given by

$$\nabla_H \mathcal{J}_{SR}(h)(x) = L_{SR}(I_h(x))\,\nabla I_2^\sigma(x + h(x)). \qquad (7)$$

The mapping $L_{SR} : \mathbb{R}^2 \to \mathbb{R}$ plays the role of an intensity comparison function. It is the so-called score function, often encountered in the derivation of maximum likelihood principles. Notice that a functional dependence between the intensities is not required: any intensity in the first image may have several corresponding intensities in the second one, and conversely. The knowledge of the nearest most likely intensity correspondence is implicitly given by this function.

Let us now assume that we do not have any training set available for the learning process. One can develop the following argument: if the initial pose is close to the solution (small deformations), a sufficiently robust estimate of the joint intensity distribution of the two images may be a good approximation of the real joint density for these two modalities. The incorrectly matched image values are considered as noise. Subsequently, $h$ could be recovered from this initial joint density estimate. This is one of the underlying ideas of the methods presented in the following section.
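The following sketch (ours; the bin count, the intensity range [0, 1] and the smoothing strength are assumptions) tabulates the score function $L_{SR}(i) = -\partial_2 P(i)/P(i)$ from a pre-registered training pair and evaluates it on a new image pair by nearest-bin lookup:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def learn_score(I1_train, I2_train, bins=64, beta=2.0, eps=1e-8):
    # Smooth joint pdf P of the training intensities (Parzen-like estimate)
    H, _, _ = np.histogram2d(I1_train.ravel(), I2_train.ravel(),
                             bins=bins, range=[[0, 1], [0, 1]])
    P = gaussian_filter(H / H.sum(), beta)
    dP = np.gradient(P, axis=1)              # derivative in the i2 variable
    return -dP / (P + eps)                   # score table L_SR(i1, i2)

def apply_score(L, I1, I2w, bins=64):
    # L_SR(I1(x), I2(x + h(x))); multiplies grad I2 at x + h(x) in the flow
    q = lambda I: np.clip((I * bins).astype(int), 0, bins - 1)
    return L[q(I1), q(I2w)]
```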
5. Statistical Dissimilarity Functionals

In this section, we drop the assumption of the prior knowledge of the function $P$ and compute the variational gradient of three statistical dissimilarity functionals. Among many possible criteria, the cross correlation, the correlation ratio and the mutual information provide us with a convenient hierarchy because of the relation they can enforce between the intensities of the two images (see Fig. 1). To be able to evaluate these criteria for a given field $h$, we consider a nonparametric Parzen estimator (Parzen, 1962) for the joint pdf, as described below. We call $X$ the random variable associated to the intensity values of $I_1^\sigma$, and $Y_h$ that associated to the values of $I_2^\sigma \circ (\mathrm{Id} + h)$.

5.1. Density Estimation

Our estimator is based on a normalized Gaussian kernel of variance $\beta$, noted $G_\beta$:

$$P(i, h) = \frac{1}{|\Omega|}\int_\Omega G_\beta(I_h(x) - i)\,dx.$$

Note that for each pair of intensities $i \in \mathbb{R}^2$, the value of the estimated joint pdf is a nonlinear functional of $h$.
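For concreteness, a direct (and deliberately naive) evaluation of this estimator at a single intensity pair reads as follows; the fast histogram-based version actually used appears in Section 7.3. This is our sketch, with the isotropic two-dimensional kernel $G_\beta(u) = \exp(-\|u\|^2/2\beta)/2\pi\beta$ assumed:

```python
import numpy as np

def parzen_joint_pdf(i, I1, I2w, beta=0.01):
    # P(i, h) = (1/|Omega|) * integral over Omega of G_beta(I_h(x) - i) dx,
    # approximated by the average over all pixels; I2w is the warped image.
    d1 = I1.ravel() - i[0]
    d2 = I2w.ravel() - i[1]
    g = np.exp(-(d1**2 + d2**2) / (2 * beta)) / (2 * np.pi * beta)
    return g.mean()
```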

Figure 1. The three criteria provide a hierarchy of dependence-levels between two random variables. The cross correlation (CC) measures their affine dependence, so that maximizing this criterion amounts to finding an affine function best fitting the joint pdf. This is illustrated in the left figure by the straight line superimposed over the schematic joint pdf P, where dark values represent high probabilities. The correlation ratio (CR) measures their functional dependence, so that the optimal density can have the shape of a nonlinear function (middle). Finally, the mutual information (MI) gives an estimate of the statistical dependence, and maximizing this criterion tends to cluster P (right).

Its first variation is obtained by applying (1):

$$\delta_k P(i, h) = \frac{1}{|\Omega|}\int_\Omega \partial_2 G_\beta(I_h(x) - i)\,\nabla I_2^\sigma(x + h(x)) \cdot k(x)\,dx. \qquad (8)$$

An interesting generalization is to make the density estimator local, since it allows us to take into account non-stationarities in the relation between intensities. To do this, we build an estimate in the neighborhood of each point $x_0$ in $\Omega$. This is achieved by weighting our previous estimate with a normalized Gaussian of variance $\gamma$. This means that to each point $x_0$ we associate a joint pdf defined by:

$$P(i, h, x_0) = \frac{1}{\bar G_\gamma(x_0)}\int_\Omega G_\beta(I_h(x) - i)\,G_\gamma(x - x_0)\,dx, \qquad \bar G_\gamma(x_0) = \int_\Omega G_\gamma(x - x_0)\,dx.$$

This new pdf is along the lines of the ideas discussed in Koenderink and van Doorn (1999), except that we now have a bidimensional local histogram at each point.

5.2. Mutual Information

Using these tools, we first consider the maximization of mutual information, a concept which is borrowed from information theory. Given two random variables $X$ and $Y$, their mutual information is defined as $MI(X, Y) = H(X) + H(Y) - H(X, Y)$, where $H$ stands for the differential entropy. The mutual information is positive and symmetric, and measures how the intensity distributions of two images fail to be independent. It can be defined in terms of the joint pdf $P$ and its marginals $p(i_1) = \int_{\mathbb{R}} P(i, h)\,di_2$ and $p(i_2, h) = \int_{\mathbb{R}} P(i, h)\,di_1$. The following short notation will be useful:

$$E_{MI}(i, h) = \log \frac{P(i, h)}{p(i_1)\,p(i_2, h)}. \qquad (9)$$

The dissimilarity functional based on mutual information is then defined as the negative of the expected value of the function $E_{MI}$:

$$\mathcal{J}^g_{MI}(h) = -MI(X, Y_h) = -\int_{\mathbb{R}^2} P(i, h)\,E_{MI}(i, h)\,di.$$

We do an explicit computation of its first variation by directly applying (1). Using the fact that

$$\int_{\mathbb{R}^2} \delta_k P(i, h)\,di = \int_{\mathbb{R}} \delta_k p(i_2, h)\,di_2 = 0,$$

this yields

$$\delta_k \mathcal{J}^g_{MI}(h) = -\int_{\mathbb{R}^2} \delta_k P(i, h)\,E_{MI}(i, h)\,di. \qquad (10)$$

We then apply (8) to obtain

$$\delta_k \mathcal{J}^g_{MI}(h) = -\frac{1}{|\Omega|}\int_{\mathbb{R}^2} E_{MI}(i, h) \int_\Omega \partial_2 G_\beta(I_h(x) - i)\,\nabla I_2^\sigma(x + h(x)) \cdot k(x)\,dx\,di.$$

A convolution with respect to the intensity variable $i$ appears in this expression. It commutes with the derivative $\partial_2$ with respect to the second intensity variable $i_2$, and therefore

$$\delta_k \mathcal{J}^g_{MI}(h) = -\frac{1}{|\Omega|}\int_\Omega \big(G_\beta \star \partial_2 E_{MI}\big)(I_h(x), h)\,\nabla I_2^\sigma(x + h(x)) \cdot k(x)\,dx.$$

We define the function $L^g_{MI} : \mathbb{R}^2 \times H \to \mathbb{R}$:

$$L^g_{MI}(i, h) = -\frac{1}{|\Omega|}\,G_\beta \star \partial_2 E_{MI}(i, h) = -\frac{1}{|\Omega|}\,G_\beta \star \left(\frac{\partial_2 P(i, h)}{P(i, h)} - \frac{p'(i_2, h)}{p(i_2, h)}\right).$$

The variational gradient $\nabla_H \mathcal{J}^g_{MI}(h)$ of our second dissimilarity functional $\mathcal{J}^g_{MI}$ is thus given by

$$\nabla_H \mathcal{J}^g_{MI}(h)(x) = L^g_{MI}(I_h(x), h)\,\nabla I_2^\sigma(x + h(x)).$$

Similarly to the case of supervised registration (recall Eq. (7)), the function $L^g_{MI}$ plays the role of an intensity comparison function. Its first term, $\partial_2 P(i, h)/P(i, h)$, tends to cluster the joint pdf, while the term $p'(i_2, h)/p(i_2, h)$ tries to prevent the marginal law $p(i_2, h)$ from becoming too clustered, i.e. keeps the intensities of $I_2^\sigma \circ (\mathrm{Id} + h)$ as unpredictable as possible.

5.2.1. Extension to Locally Estimated Densities. The extension of the previous dissimilarity criterion using the local version of the density estimate is carried out in a straightforward manner. All the definitions remain unchanged, up to their dependence on the position $x$. For example, Eq. (9) becomes

$$E_{MI}(i, h, x) = \log \frac{P(i, h, x)}{p(i_1, x)\,p(i_2, h, x)}.$$

The local version of the dissimilarity functional based on mutual information is then defined as

$$\mathcal{J}^l_{MI}(h) = \int_\Omega \mathcal{J}_{MI}(h, x)\,dx = -\int_\Omega \int_{\mathbb{R}^2} P(i, h, x)\,E_{MI}(i, h, x)\,di\,dx.$$

The computation of its first variation is carried out in the same way as that of $\mathcal{J}^g_{MI}$ in the previous section. The only difference is the additional integral in the space variable which, combined with $G_\gamma$, yields a spatial convolution. The intensity comparison function becomes a function from $\mathbb{R}^2 \times H \times \Omega$ into $\mathbb{R}$:

$$L^l_{MI}(i, h, x) = -G_\gamma \star \left[\frac{1}{\bar G_\gamma(x)}\,G_\beta \star \left(\frac{\partial_2 P(i, h, x)}{P(i, h, x)} - \frac{p'(i_2, h, x)}{p(i_2, h, x)}\right)\right].$$

5.3. Correlation Ratio

We now consider the correlation ratio. This criterion relies on a slightly different notion of similarity. Given two random variables $X$ and $Y$, the correlation ratio is defined as

$$CR(X, Y) = \frac{\mathrm{Var}(E(Y \mid X))}{\mathrm{Var}(Y)}.$$

From this formula, the correlation ratio can be described as the proportion of energy in $Y$ which is explained by $X$. More formally, this measure is bounded ($0 \le CR \le 1$) and expresses the level of functional dependence between $Y$ and $X$:

$$\begin{cases} CR(X, Y) = 1 \iff \exists\,\phi : Y = \phi(X), \\ CR(X, Y) = 0 \iff E(Y \mid X) = E(Y). \end{cases}$$

To define the dissimilarity functional based on the correlation ratio and compute its variational gradient, we need the conditional expectation $E(Y_h \mid X)$. We note the value of this random variable $\mu_2(i_1, h)$, indicating that it depends on the intensity value $i_1$ and on the field $h$:

$$\mu_2(i_1, h) = \int_{\mathbb{R}} i_2\,\frac{P(i, h)}{p(i_1)}\,di_2.$$

We also use the conditional variance $\mathrm{Var}(Y_h \mid X)$, which is again a random variable. Its value is abbreviated $v_2(i_1, h)$:

$$v_2(i_1, h) = \int_{\mathbb{R}} i_2^2\,\frac{P(i, h)}{p(i_1)}\,di_2 - \mu_2(i_1, h)^2.$$

Finally, the mean and variance of $Y_h$ will also be used:

$$\mu_2(h) = \int_{\mathbb{R}} i_2\,p(i_2, h)\,di_2, \qquad v_2(h) = \int_{\mathbb{R}} i_2^2\,p(i_2, h)\,di_2 - \mu_2(h)^2.$$

Instead of working with the original definition of CR, we use the total variance theorem to obtain

$$1 - CR(X, Y) = \frac{E(\mathrm{Var}(Y \mid X))}{\mathrm{Var}(Y)}.$$

This transformation was suggested by Roche et al. (1998b).
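To make these quantities concrete, the sketch below (ours, assuming a discrete joint pdf `P[k, l]` over bin centers `i2`) evaluates $1 - CR(X, Y_h) = E(\mathrm{Var}(Y \mid X))/\mathrm{Var}(Y)$, i.e. the ratio that defines the dissimilarity functional introduced next:

```python
import numpy as np

def one_minus_cr(P, i2, eps=1e-12):
    # P[k, l]: discrete joint pdf (rows: i1 bins, cols: i2 bins), sums to 1
    p1 = P.sum(axis=1)                   # marginal p(i1)
    p2 = P.sum(axis=0)                   # marginal p(i2, h)
    cond = P / (p1[:, None] + eps)       # conditional pdf of i2 given i1
    mu2_i = cond @ i2                    # mu_2(i1, h) = E(Y_h | X = i1)
    v2_i = cond @ i2**2 - mu2_i**2       # v_2(i1, h) = Var(Y_h | X = i1)
    mu2 = p2 @ i2                        # mu_2(h)
    v2 = p2 @ i2**2 - mu2**2             # v_2(h)
    return (p1 @ v2_i) / (v2 + eps)      # w(h) / v_2(h) = 1 - CR
```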

We then define the dissimilarity measure based on the correlation ratio as

$$\mathcal{J}^g_{CR}(h) = \frac{w(h)}{v_2(h)}, \qquad w(h) = \int_{\mathbb{R}} v_2(i_1, h)\,p(i_1)\,di_1.$$

The basic apparatus for computing the first variation of $\mathcal{J}^g_{CR}$ is similar to the one we use for the mutual information. We have

$$\delta_k \mathcal{J}^g_{CR}(h) = \frac{\delta_k w(h) - \mathcal{J}^g_{CR}(h)\,\delta_k v_2(h)}{v_2(h)},$$

where

$$\delta_k w(h) = \int_{\mathbb{R}^2} \big(i_2^2 - 2 i_2\,\mu_2(i_1, h)\big)\,\delta_k P(i, h)\,di$$

and

$$\delta_k v_2(h) = \int_{\mathbb{R}^2} \big(i_2^2 - 2 i_2\,\mu_2(h)\big)\,\delta_k P(i, h)\,di.$$

As in (10), the first variation of $\mathcal{J}^g_{CR}(h)$ is of the form

$$\delta_k \mathcal{J}^g_{CR}(h) = \int_{\mathbb{R}^2} \delta_k P(i, h)\,E_{CR}(i, h)\,di.$$

The discussion starting at Eq. (10) remains identical in this case, and therefore

$$\nabla_H \mathcal{J}^g_{CR}(h)(x) = L^g_{CR}(I_h(x), h)\,\nabla I_2^\sigma(x + h(x)), \qquad L^g_{CR}(i, h) = \frac{1}{|\Omega|}\,G_\beta \star \partial_2 E_{CR}(i, h).$$

The expanded expression of the function $L^g_{CR}$ and its extension using the local version of the joint pdf are given in Table 1. Like in the previous two cases, $L^g_{CR}$ plays the role of an intensity-comparison function. Minimizing $\mathcal{J}^g_{CR}$ amounts to making $i_2$ lie as close as possible to $\mu_2(i_1, h)$ (which gives the backbone of $P$), while keeping the value of $\mu_2(i_1, h)$ away from $\mu_2(h)$.

Table 1. Comparison functions.

Global intensity comparison functions $L^g(i, h)$:

MI: $L^g_{MI}(i, h) = -\dfrac{1}{|\Omega|}\,G_\beta \star \left(\dfrac{\partial_2 P(i, h)}{P(i, h)} - \dfrac{p'(i_2, h)}{p(i_2, h)}\right)$

CR: $L^g_{CR}(i, h) = \dfrac{2}{|\Omega|\,v_2(h)}\,G_\beta \star \Big(\mu_2(h) - \mu_2(i_1, h) + \big(1 - \mathcal{J}^g_{CR}(h)\big)\big(i_2 - \mu_2(h)\big)\Big)$

CC: $L^g_{CC}(i, h) = -\dfrac{2}{|\Omega|\,v_2(h)}\left(\dfrac{v_{1,2}(h)}{v_1}\,(i_1 - \mu_1) + \mathcal{J}^g_{CC}(h)\,\big(i_2 - \mu_2(h)\big)\right)$

Local intensity comparison functions $L^l(i, h, x)$:

MI: $L^l_{MI}(i, h, x) = -G_\gamma \star \left[\dfrac{1}{\bar G_\gamma(x)}\,G_\beta \star \left(\dfrac{\partial_2 P(i, h, x)}{P(i, h, x)} - \dfrac{p'(i_2, h, x)}{p(i_2, h, x)}\right)\right]$

CR: $L^l_{CR}(i, h, x) = G_\gamma \star \left[\dfrac{2}{\bar G_\gamma(x)\,v_2(h, x)}\,G_\beta \star \Big(\mu_2(h, x) - \mu_2(i_1, h, x) + \big(1 - \mathcal{J}_{CR}(h, x)\big)\big(i_2 - \mu_2(h, x)\big)\Big)\right]$

CC: $L^l_{CC}(i, h, x) = -G_\gamma \star \left[\dfrac{2}{\bar G_\gamma(x)\,v_2(h, x)}\left(\dfrac{v_{1,2}(h, x)}{v_1(x)}\,(i_1 - \mu_1(x)) + \mathcal{J}_{CC}(h, x)\,\big(i_2 - \mu_2(h, x)\big)\right)\right]$
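As an illustration of the first row of Table 1, the following sketch (ours; constant factors such as $1/|\Omega|$ are dropped, and the intensity convolution $G_\beta\star$ is approximated by a discrete Gaussian filter on the bin grid) computes $L^g_{MI}$ on a discrete joint histogram:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def L_mi_global(hist2d, beta=2.0, eps=1e-8):
    P = gaussian_filter(hist2d / hist2d.sum(), beta)   # Parzen joint pdf
    p2 = P.sum(axis=0)                                 # marginal p(i2, h)
    d2P = np.gradient(P, axis=1)                       # partial_2 P
    dp2 = np.gradient(p2)                              # p'(i2, h)
    E = d2P / (P + eps) - (dp2 / (p2 + eps))[None, :]
    return -gaussian_filter(E, beta)                   # -G_beta * (d2 E_MI)
```

Its value at the bin of $(I_1^\sigma(x), I_2^\sigma(x + h(x)))$ gives $L^g_{MI}$ at $x$, and $-L^g_{MI}\,\nabla I_2^\sigma(x + h(x))$ is the corresponding descent force.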

5.4. Cross Correlation

To conclude this section, we consider the case of the cross correlation, which has been widely used as a robust comparison function for image matching. Within recent energy-minimization approaches relying on the computation of its gradient, we can mention for instance the works of Faugeras and Keriven (1998), Cachier and Pennec (2000), and Netsch et al. (2001). The cross correlation, being a measure of the affine dependency between the intensities, is the most constrained of the three criteria. Besides the quantities already introduced, its definition relies on the mean and variance of $X$:

$$\mu_1 = \int_{\mathbb{R}} i_1\,p(i_1)\,di_1, \qquad v_1 = \int_{\mathbb{R}} i_1^2\,p(i_1)\,di_1 - \mu_1^2,$$

as well as on the covariance of $X$ and $Y_h$, noted $v_{1,2}(h)$:

$$v_{1,2}(h) = \int_{\mathbb{R}^2} i_1 i_2\,P(i, h)\,di - \mu_1\,\mu_2(h).$$

The dissimilarity functional based on the cross correlation is simply defined in terms of these quantities:

$$\mathcal{J}^g_{CC}(h) = -\frac{v_{1,2}(h)^2}{v_1\,v_2(h)}.$$

Its gradient has the same general expression:

$$\nabla_H \mathcal{J}^g_{CC}(h)(x) = L^g_{CC}(I_h(x), h)\,\nabla I_2^\sigma(x + h(x)), \qquad L^g_{CC}(i, h) = \frac{1}{|\Omega|}\,G_\beta \star \partial_2 E_{CC}(i, h),$$

where

$$E_{CC}(i, h) = -\frac{1}{v_1\,v_2(h)}\Big(2\,v_{1,2}(h)\,i_2\,(i_1 - \mu_1) + \mathcal{J}^g_{CC}(h)\,v_1\,i_2\,\big(i_2 - 2\mu_2(h)\big)\Big).$$

The expanded expression of $L^g_{CC}$ and that of its extension using the local version of the joint pdf ($L^l_{CC}$) are given in Table 1. Notice that in this case the convolution with respect to the intensity variable yields the same expression and is thus not required, since only linear terms are involved. Again, the $L_{CC}$ functions play the role of comparing intensities. They show that minimizing $\mathcal{J}^g_{CC}$ amounts to making the pairs of intensities lie on a straight line in $\mathbb{R}^2$ (not necessarily passing through the origin).

6. Well-Posedness

The main idea that we have pushed forward is that many of the methods already developed for dense, nonrigid matching which assume the invariance of the intensities can be generalized to the case where this assumption is no longer valid. This has been achieved by computing the variational gradient of some statistical similarity measures, which have been shown to be very well suited for solving the rigid, multimodal registration problem. We have proposed a general framework for formally computing this variational gradient (i.e. through nonparametric local and global joint pdf estimates) and carried over these calculations in the case of three statistical dissimilarity measures which, in our opinion, provide a convenient trade-off between flexibility and robustness. We have used these results to generalize a recently proposed and very effective optical flow estimation method (Alvarez et al., 2000) based on solving a set of partial differential equations of the type of (3). These equations have the general form

$$\begin{cases} \dfrac{dh}{dt} + A h(t) = F(h(t)), \\ h(0)(\cdot) = h_0(\cdot), \end{cases}$$

where the linear operator $A$ corresponds to the regularization term and $F$ is the (nonlinear) matching function. Our study of statistical measures yields a set of matching functions given by

$$F(h)(x) = -L(I_h(x), h, x)\,\nabla I_2^\sigma(x + h(x)),$$

where $L(i, h, x)$ is an intensity comparison term whose expression for the different criteria is summarized in Table 1. Alvarez et al. showed the well-posedness of their method by proving that the set of PDEs they consider has a unique generalized solution. The proof relies on showing two conditions:

1. $A$ is a maximal monotone operator from a linear subspace of $H$ into $H$.
2. The function $F : H \to H$ is bounded and Lipschitz continuous.

We have extended their work to the cases considered in this paper, for the six different matching functions and the two regularization criteria. One of the difficulties of this generalization is the fact that in our case the equations (2) are not PDEs, since $F(h)(x)$ depends not only on the value of $h(x)$, as in (4), but on its values in a neighborhood of $x$ (which can be as large as the whole image in the case of the global criteria). Moreover, we have shown that the unique solution of the initial value problem (2) is classical and regular, i.e. more than a generalized solution. Showing these properties (especially the Lipschitz continuity) for the matching functions that we present here is an important aspect of our work.
The proofs are however quite long and technical, and we refer the interested reader to Faugeras and Hermosillo (2001).

7. Numerical Implementation

The numerical implementation of the described continuous matching flows involves estimating the matching term, which depends on one of the six intensity-comparison functions of Table 1, and the regularization operator, which in our case is either $\mathrm{div}(T_{\nabla I_1^\sigma}\,Dh)$, with $T_{\nabla I_1^\sigma}$ given by Eq. (5), or the elasticity operator, given by Eq. (6). For the discretization in time, we adopt an explicit forward scheme. Implicit schemes are difficult to devise due to the high nonlinearity of the matching functions.

7.1. Regularization Operators

Alvarez et al. (2000) propose a very efficient scheme for estimating the Nagel-Enkelmann operator, which we adopt in our experiments. Concerning that of linear elasticity, we use standard centered finite-difference schemes based on first-order Taylor expansions of $\Delta h$ and $\nabla(\nabla\cdot h)$. Assuming the pixel size in both directions to be equal to one, the corresponding scheme for $\Delta h$ in 2D is

$$L_k^{i,j} = \frac{1}{4}\big(h_k^{i+1,j} + h_k^{i-1,j} + h_k^{i,j+1} + h_k^{i,j-1} - 4\,h_k^{i,j}\big),$$

where $L_k^{i,j}$ and $h_k^{i,j}$ denote respectively the components ($k = 1, 2$) of $\Delta h$ and $h$ at a grid point $(i, j)$. Using a stencil notation, with $i$ increasing from left to right and $j$ from top to bottom, this scheme may be written as

$$L_k^{i,j} = \frac{1}{4}\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} h_k,$$

where the mask is applied to the grid of the component $h_k$ of $h$. Using this notation, our scheme for $\nabla(\nabla\cdot h)$ in 2D, whose components we denote as $A_k^{i,j}$, is:

$$A_1^{i,j} = \begin{bmatrix} 1 & -2 & 1 \end{bmatrix} h_1 + \frac{1}{4}\begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} h_2, \qquad A_2^{i,j} = \frac{1}{4}\begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} h_1 + \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} h_2.$$

7.2. Intensity Comparison Functions

Concerning the intensity comparison functions, we approximate convolutions by a Gaussian kernel through recursive filtering, using the smoothing operator introduced in Deriche (1990). Terms of the form $I_2^\sigma(x + h(x))$ and $\nabla I_2^\sigma(x + h(x))$ are calculated by tri-linear interpolation. The global functions $L^g_{MI}$, $L^g_{CR}$ and $L^g_{CC}$ are estimated by explicitly computing the global density estimate $P(i, h)$ through recursive smoothing of the discrete joint histogram of intensities, as detailed in Section 7.3. Values such as $\mu_2(h)$ are then estimated using finite sums on $P$. For the local functions, special implementations have been developed, as detailed in Sections 7.4 and 7.5.

7.3. Density Estimation

Parzen density estimates are obtained by smoothing the discrete joint histogram of intensities. We define the piecewise constant function $v : \Omega \to \{0, \dots, N\}^2$ by quantification of $I_h(x)$ into $N + 1$ intensity levels (bins) between 0 and $A$:

$$v(x) = \begin{pmatrix} \lfloor \zeta\,I_1^\sigma(x) \rfloor \\ \lfloor \zeta\,I_2^\sigma(x + h(x)) \rfloor \end{pmatrix} = \begin{cases} (0, 0)^T & \text{on } \Omega_{0,0}, \\ \quad\vdots \\ (N, N)^T & \text{on } \Omega_{N,N}, \end{cases}$$

where $\zeta = N/A$, $\lfloor\cdot\rfloor$ denotes the floor operator in $\mathbb{R}_+$, i.e. the function $\mathbb{R}_+ \to \mathbb{N}$ such that $\lfloor x \rfloor = \max\{n \in \mathbb{N} : n \le x\}$, and $\{\Omega_{k,l}\}_{(k,l)\in[0,N]^2}$ is a partition of $\Omega$. We then compute, setting $\beta' = \zeta^2\beta$,

$$P(i, h) = \frac{1}{|\Omega|}\int_\Omega G_\beta(I_h(x) - i)\,dx = \frac{\zeta^2}{|\Omega|}\int_\Omega G_{\beta'}\big(\zeta\,(I_h(x) - i)\big)\,dx \approx \frac{\zeta^2}{|\Omega|}\int_\Omega G_{\beta'}(v(x) - \zeta i)\,dx = \zeta^2 \sum_{k=0}^{N}\sum_{l=0}^{N} G_{\beta'}(k - \zeta i_1,\, l - \zeta i_2)\,\underbrace{\frac{|\Omega_{k,l}|}{|\Omega|}}_{K(k,l)} = \zeta^2\,(K \star G_{\beta'})(\zeta i),$$

$K$ being the discrete joint histogram. The convolution is performed by recursive filtering. Note that this way of computing $P$ is quite efficient, since only one pass on the images is required, followed by the convolution.
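A hedged sketch of this computation (ours; scipy's separable Gaussian filter stands in for the recursive filtering, and its `sigma` argument is the standard deviation $\sqrt{\beta'}$ in bins):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def joint_pdf(I1, I2w, A=255.0, N=63, beta=4.0):
    zeta = N / A
    k = np.clip(np.floor(zeta * I1).astype(int), 0, N)    # quantified I1
    l = np.clip(np.floor(zeta * I2w).astype(int), 0, N)   # quantified I2(x+h)
    K = np.zeros((N + 1, N + 1))
    np.add.at(K, (k, l), 1.0)            # discrete joint histogram
    K /= K.sum()                         # K(k, l) = |Omega_kl| / |Omega|
    beta_p = zeta**2 * beta              # beta' = zeta^2 * beta
    return zeta**2 * gaussian_filter(K, np.sqrt(beta_p))  # zeta^2 (K * G_beta')
```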

7.4. Implementation of $L^l_{CC}$

The function $L^l_{CC}$ is estimated as

$$L^l_{CC}(i, x) = (G_\gamma \star f_1)(x)\,i_1 + (G_\gamma \star f_2)(x)\,i_2 + (G_\gamma \star f_3)(x),$$

where

$$f_1(x) = -\frac{2\,v_{1,2}(h, x)}{\bar G_\gamma(x)\,v_1(x)\,v_2(h, x)}, \qquad f_2(x) = -\frac{2\,\mathcal{J}_{CC}(h, x)}{\bar G_\gamma(x)\,v_2(h, x)}$$

and

$$f_3(x) = -\big(f_1(x)\,\mu_1(x) + f_2(x)\,\mu_2(h, x)\big).$$

All the required space-dependent quantities, like $\mu_1(x)$, are computed through recursive spatial smoothing. This algorithm is similar to the one proposed in Cachier and Pennec (2000).

7.5. Approximate Implementations of $L^l_{MI}$ and $L^l_{CR}$

The functions $L^l_{MI}$ and $L^l_{CR}$ are much more difficult to compute than $L^l_{CC}$, because they involve two convolutions, one with respect to the intensity variable $i$ and the other with respect to the space variable $x$. Applying a smoothing filter would require a dense data structure of dimension $(n + 1)$ for $L^l_{CR}$ and $(n + 2)$ for $L^l_{MI}$. With 3D images, it becomes extremely difficult to maintain these four- and five-dimensional structures. Our implementations rely on computing un-smoothed versions of these functions, i.e. on eliminating both convolutions. Although the resulting functions are no longer the gradient of the considered dissimilarity functionals, this approximation makes the computation tractable and has given good results in practice. Quantities like $\mu_2(h, x)$ are estimated using sums, weighted by Gaussian functions, over a large neighborhood around each pixel. This corresponds to a non-stationary filtering procedure which is very computationally expensive. However, the resulting algorithms are very well adapted to parallelization, and we have implemented them using the MPI library for parallel execution on a cluster of up to thirty processors, obtaining up to twenty times faster execution with respect to the sequential version. We refer the interested reader to Hermosillo (2002) for more details concerning implementation issues.
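A sketch of the local cross-correlation algorithm of Section 7.4 (ours; the Gaussian filter replaces the recursive smoothing, the boundary normalization $\bar G_\gamma(x)$ is folded into the filter's normalized kernel, and `gI2w` is the interpolated gradient of $I_2^\sigma$ at $x + h(x)$ with shape (2, H, W)):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_cc_force(I1, I2w, gI2w, gamma=5.0, eps=1e-8):
    S = lambda u: gaussian_filter(u, np.sqrt(gamma))   # smoothing, variance gamma
    mu1, mu2 = S(I1), S(I2w)                           # local means
    v1 = S(I1 * I1) - mu1**2                           # local variance of I1
    v2 = S(I2w * I2w) - mu2**2                         # local variance of I2(x+h)
    v12 = S(I1 * I2w) - mu1 * mu2                      # local covariance
    Jcc = -(v12**2) / (v1 * v2 + eps)                  # local J_CC(h, x)
    f1 = -2 * v12 / (v1 * v2 + eps)
    f2 = -2 * Jcc / (v2 + eps)
    L = S(f1) * I1 + S(f2) * I2w - S(f1 * mu1 + f2 * mu2)  # L^l_CC(I_h(x), x)
    return -L[None] * gI2w              # descent force -L * grad I2 at x + h(x)
```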
7.6. Parameters

We now discuss the way in which the different parameters of the algorithms are determined.

γ: This is the variance of the spatial Gaussian for local density estimates. In the case of the cross correlation, its value does not affect the computation time, since the local statistics are calculated using the recursive smoothing filter. Thanks to this, we have conducted some experiments with different values of this parameter, which have shown that the algorithms are not very sensitive to it. Qualitatively speaking, the local window has to be large enough for the statistics to be significant, and small enough to account for non-stationarities of the density. It is fixed to 5, with a local window size of 9 × 9, for the mutual information and the correlation ratio.

β: This is the variance of the Gaussian for the Parzen estimates. Unlike γ, determining a good value for β is important for obtaining good results. In our case, it is determined automatically as follows (we refer to Bosq (1998) for a recent comprehensive study on nonparametric density estimation). We adopt a cross-validation technique based on an empirical maximum likelihood method. We note $\{i_k\}$ a set of $m$ intensity pair samples ($k = 1, \dots, m$) and take the value of β which maximizes the empirical likelihood:

$$L(\beta) = \prod_{k=1}^{m} \hat P_{\beta,k}(i_k), \qquad \hat P_{\beta,k}(i_k) = \frac{1}{m - n_k}\sum_{\{s\,:\,i_s \ne i_k\}} G_\beta(i_k - i_s),$$

where $n_k$ is the number of data samples for which $i_s = i_k$.

α: This parameter determines the weight of the regularization term in the energy functional. Since the range of the different matching functions varies considerably, we replace it by another one, noted $C$, such that $\alpha = C\kappa$,

where $\kappa = \|F(h_0)\|$, $h_0$ being the initial field and $F$ any of the matching functions.

σ: This is the scale parameter. We adopt a multi-resolution approach, smoothing the images at each stage by a small amount. Within each stage of the multi-resolution pyramid, the parameter σ is fixed to a small value, typically 0.25 voxels.

Besides these global parameters, one extra parameter is needed for each family of regularization operators.

ξ: Instead of the Lamé coefficients in the linear elasticity operator (Eq. (6)), we use a single parameter ξ controlling the relative weight between the two operators $\Delta h$ and $\nabla(\nabla\cdot h)$, as $\xi\,\Delta h + (1 - \xi)\,\nabla(\nabla\cdot h)$. For ξ close to 1, the Laplacian operator becomes dominant, while the operator $\nabla(\nabla\cdot h)$ becomes dominant for ξ close to zero. In practice, we fix the value of ξ to 0.5, thus giving the same weight to both operators.

λ: This is the parameter controlling the anisotropic behavior of the Nagel-Enkelmann tensor. We adopt the method proposed by Alvarez et al. (2000). Given $s$, which in practice is fixed to 0.1, we take the value of λ such that $s = \int_0^\lambda H_{\nabla I_1^\sigma}(z)\,dz$, where $H_{\nabla I_1^\sigma}$ is the normalized histogram of $\|\nabla I_1^\sigma\|$.

8. Experiments

We present results of experiments using the previously described algorithms. To recover large deformations, we use a multi-resolution approach by applying the gradient descent to a set of smoothed and subsampled images. Since the considered functionals are not convex, this coarse-to-fine strategy helps avoiding irrelevant extrema while reducing the computational cost of the algorithms.

8.1. Supervised Registration Method (Fig. 2)

In this experiment, a synthetic learning set is produced by applying a sine function and adding Gaussian noise (zero mean, 0.1 variance) to four images (whose intensity has been normalized on [0, 2π]). We use the comparison function estimated from this learning set to match two views of a face. The functional dependence between these two intensity maps is clearly recovered from the estimated comparison function, and a visually correct realignment is achieved. Some small artefacts appear, mainly due to some elements in the background which are not present in both views.

Figure 2. SR matching: image mosaics (a) and (b) represent the learning sets: 4 pairs of images. (c) is the reference image $I_1^\sigma$ and (d) the image to register $I_2^\sigma$. The surface (e) represents the estimated score function $L_{SR}(i) = -\partial_2 P(i)/P(i)$. (f) shows the resulting image $I_2^\sigma(\mathrm{Id} + \hat h)$. (g) and (h) represent the contours of $I_2^\sigma$ and $I_2^\sigma(\mathrm{Id} + \hat h)$ superimposed on $I_1^\sigma$.

8.2. Different Regularization Operators (Fig. 3)

In this example, we use the global mutual information criterion to compute the displacement field between two synthetic images. Two experiments are carried out, one using linear elasticity regularization (second row of Fig. 3) and the other one using the Nagel-Enkelmann operator (third row). In this case, we clearly see the advantage of the latter regularization method: it preserves the natural discontinuities of the displacement field. This example underlines the role of the prior knowledge which is embedded in the regularization functional. It also illustrates the fact that validating with synthetic deformation fields may not always be appropriate, since the solution depends on assumed properties of the field. Another interesting feature of this experiment is the fact that the discrete histogram (i.e. before Gaussian smoothing) contains only six non-zero entries, which underlines the importance of the Parzen-window regularization.

Figure 3. Global MI criterion with a synthetic example. The first row shows the reference image $I_1^\sigma$ on the left and the image to register $I_2^\sigma$. The second and third rows represent the warped image $I_2^\sigma(\mathrm{Id} + \hat h)$ and the corresponding displacement field $\hat h$, for two different types of regularization operators (linear elasticity model for the second row, geometry-driven anisotropic diffusion for the third row).

8.3. Global Correlation Ratio (Fig. 4)

We use the global correlation ratio criterion to realign two slices from a Proton Density (PD) and a T2 MRI volume (same patient). An artificial geometric distortion (based on a set of three Gaussian kernels) has been applied to the original pre-registered dataset. In order to evaluate the accuracy of the realignment, we superimposed some contours of the T2 image (initial and recovered pose) over the reference image (PD). This gives a good qualitative indication of the quality of the registration. Most of the anatomical structures seem correctly realigned.

Figure 4. PD/T2 MRI registration using the global CR comparison function. The first row shows the reference image $I_1^\sigma$ on the left and the image to register $I_2^\sigma$. The second row represents the warped image $I_2^\sigma(\mathrm{Id} + \hat h)$ and the corresponding displacement field $\hat h$. The last row represents the contours of $I_2^\sigma$ and $I_2^\sigma(\mathrm{Id} + \hat h)$ superimposed on $I_1^\sigma$.

8.4. Registration Between MRI and fMRI Modalities Using Global Mutual Information (Fig. 5)

This example shows an experiment with real MR data of the brain of a macaque monkey. The reference image is a T1-weighted anatomical volume, and the image to register is a functional, MION contrast MRI (fMRI).

The contrast in this modality is related to blood oxygenation level. This registration was obtained using global mutual information. Notice that the alignment of the main axis of the volume has been corrected.

Figure 5. MRI-fMRI registration using the global MI criterion. The two columns show two different regions of interest within the image volume. The first and second rows show the reference anatomical MRI and the initial fMRI volume, respectively, while the third row shows the final, corrected fMRI (to be compared with the second row).

8.5. Local Mutual Information and Correlation Ratio (Fig. 6)

This experiment shows results using the dissimilarity measures based on the local mutual information and the local correlation ratio on synthetic data. The reference and deformed images were both taken from the same 2D plane in an MRI data volume. The reference image $J$ was transformed as

$$J'(x, y) = \sin\big(2\pi J(x, y)\big)\,\cos\big(2\pi(x + y)\big),$$

and a nonrigid smooth deformation was then applied to the second image. As expected, the global similarity criteria failed to align these two images, due to the severe non-stationarity in the intensity distributions, while a correct realignment was obtained using the local versions. Similar results were obtained using mutual information and correlation ratio.

Figure 6. Matching using local mutual information and correlation ratio. The first row shows $I_1^\sigma$ on the left and $I_2^\sigma$ on the right. The second row shows $I_2^\sigma(\mathrm{Id} + \hat h)$ on the left and its superposition with $I_1^\sigma$ on the right. The third and fourth rows show respectively the applied and recovered displacement fields (x component on the left). Similar results were obtained using both criteria.

8.6. Face Template Matching Using Local Cross Correlation (Fig. 7)

This last experiment shows template matching of human faces. The different albedos of the two skins create a multimodal situation, and the transformation is truly nonrigid due to the different shapes of the noses and mouths. Notice the excellent matching of the different features. This result was obtained using local cross correlation. The running time was approximately five minutes on a PC at 900 MHz. With the correspondences, one can interpolate the displacement field and the texture to perform fully automatic morphing.

Figure 7. Human face template matching: the left column shows the reference image ($I_1^\sigma$) with some reference points at the bottom. The right column shows from top to bottom: $I_2^\sigma$, $I_2^\sigma(\mathrm{Id} + \hat h)$ and the corresponding reference points according to the found displacement field $\hat h$. Local cross correlation was used as similarity criterion.

9. Conclusion

In this paper, we proposed a variational framework to address the problem of dense matching between images when the hypothesis of intensity preservation is not valid. Our approach relies on the computation of the first variation of a hierarchy of statistical criteria, computed either globally or locally within corresponding regions. We illustrated the efficiency of the algorithms that we propose through several real and synthetic examples. Our main contributions consist in:

1. Modeling a real problem as a variational problem.
2. Deriving the necessary equations satisfied by the solutions (if any) of this variational problem.
3. Showing that the resulting equations are well-posed (existence and uniqueness of a solution).
4. Showing that the resulting computational theory provides interesting predictions (results) on real images.

Our ongoing research concerns the application of these results in the context of the diffeomorphic matching framework described by Trouvé (1998), in which large deformations are more naturally handled.

Acknowledgments

We thank Jacques Bride for providing us with his implementation of the exponential recursive filters used in our experiments. This work was partially supported by NSF grant DMS, EC grant Mapawamo QLG3-CT-2-36, INRIA ARCs IRMf and MC2, and the Mexican National Council for Science and Technology, Conacyt.

Note

1. The uniqueness of $\nabla_H \mathcal{I}(h)$ is guaranteed by the Riesz representation theorem (Evans, 1998). It depends on the choice of the scalar product $(\cdot,\cdot)_H$, a fact which explains our notation.

References

Alvarez, L., Weickert, J., and Sánchez, J. 2000. Reliable estimation of dense optical flow fields with large displacements. Technical Report, Cuadernos del Instituto Universitario de Ciencias y Tecnologías Cibernéticas. A revised version has appeared in IJCV 39(1):41–56.
Bosq, D. 1998. Nonparametric Statistics for Stochastic Processes, 2nd edn., vol. 110 of Lecture Notes in Statistics. Springer-Verlag: Berlin.
Cachier, P. and Pennec, X. 2000. 3D non-rigid registration by gradient descent on a Gaussian-weighted similarity measure using convolutions. In Proceedings of MMBIA.
Chefd'hotel, C., Hermosillo, G., and Faugeras, O. 2001. A variational approach to multi-modal image matching. In IEEE Workshop on Variational and Level Set Methods, University of British Columbia, Vancouver, Canada. IEEE Computer Society.
Christensen, G., Miller, M., and Vannier, M. 1994. A 3D deformable magnetic resonance textbook based on elasticity. In Proceedings of the American Association for Artificial Intelligence, Symposium: Applications of Computer Vision in Medical Image Processing.
Courant, R. 1946. Calculus of Variations. New York.
Deriche, R. 1990. Fast algorithms for low-level vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):78–88.

Evans, L. 1998. Partial Differential Equations, vol. 19 of Graduate Studies in Mathematics. American Mathematical Society.
Faugeras, O. and Keriven, R. 1998. Variational principles, surface evolution, PDE's, level set methods and the stereo problem. IEEE Transactions on Image Processing, 7(3):336–344.
Faugeras, O.D. and Hermosillo, G. 2001. Well-posedness of eight problems of multi-modal statistical image-matching. Technical Report 4235, INRIA.
Gaens, T., Maes, F., Vandermeulen, D., and Suetens, P. 1998. Nonrigid multimodal image registration using mutual information. In First International Conference on Medical Image Computing and Computer-Assisted Intervention, G. Goos and J. Hartmanis (Eds.), vol. 1496 of Lecture Notes in Computer Science. Springer: Berlin.
Hata, N., Dohi, T., Warfield, S., Wells III, W., Kikinis, R., and Jolesz, F.A. 1998. Multi-modality deformable registration of pre- and intra-operative images for MRI-guided brain surgery. In First International Conference on Medical Image Computing and Computer-Assisted Intervention, G. Goos and J. Hartmanis (Eds.), vol. 1496 of Lecture Notes in Computer Science. Springer: Berlin.
Hermosillo, G. 2002. Variational Methods for Multimodal Image Matching. PhD thesis, INRIA. The document is accessible at ftp://ftp-sop.inria.fr/robotvis/html/papers/hermosillo:2.ps.gz.
Koenderink, J.J. and van Doorn, A.J. 1999. Blur and disorder. In Scale-Space Theories in Computer Vision, Second International Conference, Scale-Space'99, M. Nielsen, P. Johansen, O.F. Olsen, and J. Weickert (Eds.), vol. 1682 of Lecture Notes in Computer Science. Springer: Berlin, pp. 1–9.
Leventon, M. and Grimson, W. 1998. Multi-modal volume registration using joint intensity distributions. In Medical Image Computing and Computer-Assisted Intervention—MICCAI'98, W. Wells, A. Colchester, and S. Delp (Eds.), Cambridge, MA, USA, vol. 1496 of Lecture Notes in Computer Science. Springer: Berlin.
Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., and Suetens, P. 1997. Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging, 16(2):187–198.
Maintz, J., Meijering, H., and Viergever, M. 1998. General multimodal elastic registration based on mutual information. In Medical Imaging 1998—Image Processing, vol. 3338. SPIE.
Meyer, C., Boes, J., Kim, B., and Bland, P. 1998. Evaluation of control point selection in automatic, mutual information driven, 3D warping. In First International Conference on Medical Image Computing and Computer-Assisted Intervention, G. Goos and J. Hartmanis (Eds.), vol. 1496 of Lecture Notes in Computer Science.
Nagel, H. and Enkelmann, W. 1986. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(5):565–593.
Netsch, T., Rosch, P., van Muiswinkel, A., and Weese, J. 2001. Towards real-time multi-modality 3D medical image registration. In Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada. IEEE Computer Society Press.
Parzen, E. 1962. On estimation of a probability density function and mode. Ann. Math. Statist., 33:1065–1076.
Roche, A., Guimond, A., Meunier, J., and Ayache, N. 2000a. Multimodal elastic matching of brain images. In Proceedings of the 6th European Conference on Computer Vision, Dublin, Ireland.
Roche, A., Malandain, G., and Ayache, N. 2000b. Unifying maximum likelihood approaches in medical image registration.
International Journal of Imaging Systems and Technology: Special Issue on 3D Imaging, 11(1):71–80.
Roche, A., Malandain, G., Pennec, X., and Ayache, N. 1998a. The correlation ratio as a new similarity measure for multimodal image registration. In Medical Image Computing and Computer-Assisted Intervention—MICCAI'98, W. Wells, A. Colchester, and S. Delp (Eds.), Cambridge, MA, USA, vol. 1496 of Lecture Notes in Computer Science. Springer: Berlin, pp. 1115–1124.
Roche, A., Malandain, G., Pennec, X., and Ayache, N. 1998b. Multimodal image registration by maximization of the correlation ratio. Technical Report 3378, INRIA.
Rückert, D., Hayes, C., Studholme, C., Summers, P., Leach, M., and Hawkes, D. 1998. Non-rigid registration of breast MR images using mutual information. In Medical Image Computing and Computer-Assisted Intervention—MICCAI'98, W. Wells, A. Colchester, and S. Delp (Eds.), Cambridge, MA, USA, vol. 1496 of Lecture Notes in Computer Science. Springer: Berlin.
Toga, A. (Ed.). 1998. Brain Warping. Academic Press.
Trouvé, A. 1998. Diffeomorphisms groups and pattern matching in image analysis. International Journal of Computer Vision, 28(3):213–221.
Viola, P. and Wells III, W.M. 1997. Alignment by maximization of mutual information. The International Journal of Computer Vision, 24(2):137–154.


More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Riemannian Metric Learning for Symmetric Positive Definite Matrices

Riemannian Metric Learning for Symmetric Positive Definite Matrices CMSC 88J: Linear Subspaces and Manifolds for Computer Vision and Machine Learning Riemannian Metric Learning for Symmetric Positive Definite Matrices Raviteja Vemulapalli Guide: Professor David W. Jacobs

More information

A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration

A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration Vincent Arsigny, Olivier Commowick, Nicholas Ayache, Xavier Pennec To cite this version: Vincent Arsigny, Olivier Commowick,

More information

Nonlinear Dimensionality Reduction

Nonlinear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap

More information

HST 583 FUNCTIONAL MAGNETIC RESONANCE IMAGING DATA ANALYSIS AND ACQUISITION A REVIEW OF STATISTICS FOR FMRI DATA ANALYSIS

HST 583 FUNCTIONAL MAGNETIC RESONANCE IMAGING DATA ANALYSIS AND ACQUISITION A REVIEW OF STATISTICS FOR FMRI DATA ANALYSIS HST 583 FUNCTIONAL MAGNETIC RESONANCE IMAGING DATA ANALYSIS AND ACQUISITION A REVIEW OF STATISTICS FOR FMRI DATA ANALYSIS EMERY N. BROWN AND CHRIS LONG NEUROSCIENCE STATISTICS RESEARCH LABORATORY DEPARTMENT

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Non-parametric Diffeomorphic Image Registration with the Demons Algorithm

Non-parametric Diffeomorphic Image Registration with the Demons Algorithm Non-parametric Image Registration with the Demons Algorithm Tom Vercauteren, Xavier Pennec, Aymeric Perchant, Nicholas Ayache To cite this version: Tom Vercauteren, Xavier Pennec, Aymeric Perchant, Nicholas

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors

Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Simultaneous Multi-frame MAP Super-Resolution Video Enhancement using Spatio-temporal Priors Sean Borman and Robert L. Stevenson Department of Electrical Engineering, University of Notre Dame Notre Dame,

More information

Vlad Estivill-Castro (2016) Robots for People --- A project for intelligent integrated systems

Vlad Estivill-Castro (2016) Robots for People --- A project for intelligent integrated systems 1 Vlad Estivill-Castro (2016) Robots for People --- A project for intelligent integrated systems V. Estivill-Castro 2 Perception Concepts Vision Chapter 4 (textbook) Sections 4.3 to 4.5 What is the course

More information

Beyond the Point Cloud: From Transductive to Semi-Supervised Learning

Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

Self-Tuning Spectral Clustering

Self-Tuning Spectral Clustering Self-Tuning Spectral Clustering Lihi Zelnik-Manor Pietro Perona Department of Electrical Engineering Department of Electrical Engineering California Institute of Technology California Institute of Technology

More information

CIND Pre-Processing Pipeline For Diffusion Tensor Imaging. Overview

CIND Pre-Processing Pipeline For Diffusion Tensor Imaging. Overview CIND Pre-Processing Pipeline For Diffusion Tensor Imaging Overview The preprocessing pipeline of the Center for Imaging of Neurodegenerative Diseases (CIND) prepares diffusion weighted images (DWI) and

More information

Deformation and Viewpoint Invariant Color Histograms

Deformation and Viewpoint Invariant Color Histograms 1 Deformation and Viewpoint Invariant Histograms Justin Domke and Yiannis Aloimonos Computer Vision Laboratory, Department of Computer Science University of Maryland College Park, MD 274, USA domke@cs.umd.edu,

More information

A new algorithm for the computation of the group logarithm of diffeomorphisms

A new algorithm for the computation of the group logarithm of diffeomorphisms A new algorithm for the computation of the group logarithm of diffeomorphisms Matias Bossa and Salvador Olmos GTC, I3A, University of Zaragoza, Spain, {bossa,olmos}@unizar.es Abstract. There is an increasing

More information

No. of dimensions 1. No. of centers

No. of dimensions 1. No. of centers Contents 8.6 Course of dimensionality............................ 15 8.7 Computational aspects of linear estimators.................. 15 8.7.1 Diagonalization of circulant andblock-circulant matrices......

More information

2800 Lyngby, Denmark, Abstract

2800 Lyngby, Denmark,    Abstract Comparison of three lters in the solution of the Navier-Stokes equation in registration C. Gramkow 1;3 and M. Bro-Nielsen 1;2;3 1 IMM - Department of Mathematical Modelling, Technical University of Denmark,

More information

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction

More information

Computing Covariances for Mutual Information Co-registration.

Computing Covariances for Mutual Information Co-registration. Tina Memo No. 2001-013 Internal Report Computing Covariances for Mutual Information Co-registration. N.A. Thacker, P.A. Bromiley and M. Pokric Last updated 23 / 3 / 2004 Imaging Science and Biomedical

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network

Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network S M Masudur Rahman Al Arif 1, Karen Knapp 2 and Greg Slabaugh 1 1 City, University of London 2 University of Exeter

More information

Relaxation methods and finite element schemes for the equations of visco-elastodynamics. Chiara Simeoni

Relaxation methods and finite element schemes for the equations of visco-elastodynamics. Chiara Simeoni Relaxation methods and finite element schemes for the equations of visco-elastodynamics Chiara Simeoni Department of Information Engineering, Computer Science and Mathematics University of L Aquila (Italy)

More information

Diffeomorphic Demons Using ITK s Finite Difference Solver Hierarchy Release 0.02

Diffeomorphic Demons Using ITK s Finite Difference Solver Hierarchy Release 0.02 Diffeomorphic Demons Using ITK s Finite Difference Solver Hierarchy Release 0.02 Tom Vercauteren 1,2, Xavier Pennec 1, Aymeric Perchant 2 and Nicholas Ayache 1 September 10, 2007 1 Asclepios Research Group,

More information

Entropy Manipulation of Arbitrary Non I inear Map pings

Entropy Manipulation of Arbitrary Non I inear Map pings Entropy Manipulation of Arbitrary Non I inear Map pings John W. Fisher I11 JosC C. Principe Computational NeuroEngineering Laboratory EB, #33, PO Box 116130 University of Floridaa Gainesville, FL 326 1

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

The Euler-Lagrange Equation for Interpolating Sequence of Landmark Datasets

The Euler-Lagrange Equation for Interpolating Sequence of Landmark Datasets The Euler-Lagrange Equation for Interpolating Sequence of Landmark Datasets Mirza Faisal Beg 1, Michael Miller 1, Alain Trouvé 2, and Laurent Younes 3 1 Center for Imaging Science & Department of Biomedical

More information

CS 231A Section 1: Linear Algebra & Probability Review

CS 231A Section 1: Linear Algebra & Probability Review CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability

More information

Lecture 8: Interest Point Detection. Saad J Bedros

Lecture 8: Interest Point Detection. Saad J Bedros #1 Lecture 8: Interest Point Detection Saad J Bedros sbedros@umn.edu Review of Edge Detectors #2 Today s Lecture Interest Points Detection What do we mean with Interest Point Detection in an Image Goal:

More information

Linear Diffusion and Image Processing. Outline

Linear Diffusion and Image Processing. Outline Outline Linear Diffusion and Image Processing Fourier Transform Convolution Image Restoration: Linear Filtering Diffusion Processes for Noise Filtering linear scale space theory Gauss-Laplace pyramid for

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

A Pseudo-distance for Shape Priors in Level Set Segmentation

A Pseudo-distance for Shape Priors in Level Set Segmentation O. Faugeras, N. Paragios (Eds.), 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision, Nice, 2003. A Pseudo-distance for Shape Priors in Level Set Segmentation Daniel Cremers

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications

Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications Quantitative Neuro-Anatomic and Functional Image Assessment Recent progress on image registration and its applications Guido Gerig Sarang Joshi Tom Fletcher Applications of image registration in neuroimaging

More information

Diffeomorphic Warping. Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel)

Diffeomorphic Warping. Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel) Diffeomorphic Warping Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel) What Manifold Learning Isn t Common features of Manifold Learning Algorithms: 1-1 charting Dense sampling Geometric Assumptions

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM BY DAVID LOWE

SIFT: SCALE INVARIANT FEATURE TRANSFORM BY DAVID LOWE SIFT: SCALE INVARIANT FEATURE TRANSFORM BY DAVID LOWE Overview Motivation of Work Overview of Algorithm Scale Space and Difference of Gaussian Keypoint Localization Orientation Assignment Descriptor Building

More information

DT-REFinD: Diffusion Tensor Registration With Exact Finite-Strain Differential

DT-REFinD: Diffusion Tensor Registration With Exact Finite-Strain Differential DT-REFinD: Diffusion Tensor Registration With Exact Finite-Strain Differential The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

BLIND SOURCE SEPARATION IN POST NONLINEAR MIXTURES

BLIND SOURCE SEPARATION IN POST NONLINEAR MIXTURES BLIND SURCE SEPARATIN IN PST NNLINEAR MIXTURES Sophie Achard, Dinh Tuan Pham Univ of Grenoble Laboratory of Modeling Computation, IMAG, CNRS BP 53X, 384 Grenoble Cedex, France SophieAchard@imagfr, DinhTuanPham@imagfr

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Where is the bark, the tree and the forest, mathematical foundations of multi-scale analysis

Where is the bark, the tree and the forest, mathematical foundations of multi-scale analysis Where is the bark, the tree and the forest, mathematical foundations of multi-scale analysis What we observe is not nature itself, but nature exposed to our method of questioning. -Werner Heisenberg M.

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter

More information

Introduction to Nonlinear Image Processing

Introduction to Nonlinear Image Processing Introduction to Nonlinear Image Processing 1 IPAM Summer School on Computer Vision July 22, 2013 Iasonas Kokkinos Center for Visual Computing Ecole Centrale Paris / INRIA Saclay Mean and median 2 Observations

More information

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.

In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1 Introduction and Problem Formulation In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1.1 Machine Learning under Covariate

More information

Camera Models and Affine Multiple Views Geometry

Camera Models and Affine Multiple Views Geometry Camera Models and Affine Multiple Views Geometry Subhashis Banerjee Dept. Computer Science and Engineering IIT Delhi email: suban@cse.iitd.ac.in May 29, 2001 1 1 Camera Models A Camera transforms a 3D

More information

Human Pose Tracking I: Basics. David Fleet University of Toronto

Human Pose Tracking I: Basics. David Fleet University of Toronto Human Pose Tracking I: Basics David Fleet University of Toronto CIFAR Summer School, 2009 Looking at People Challenges: Complex pose / motion People have many degrees of freedom, comprising an articulated

More information

Scale and Statistics in Variable Conductance Diffusion

Scale and Statistics in Variable Conductance Diffusion Scale and Statistics in Variable Conductance Diffusion Terry S. Yoo Department of Computer Science University of North Carolina Chapel Hill, NC 7599-3175 Introduction Non-linear diffusion or Variable Conductance

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

Fraunhofer Institute for Computer Graphics Research Interactive Graphics Systems Group, TU Darmstadt Fraunhoferstrasse 5, Darmstadt, Germany

Fraunhofer Institute for Computer Graphics Research Interactive Graphics Systems Group, TU Darmstadt Fraunhoferstrasse 5, Darmstadt, Germany Scale Space and PDE methods in image analysis and processing Arjan Kuijper Fraunhofer Institute for Computer Graphics Research Interactive Graphics Systems Group, TU Darmstadt Fraunhoferstrasse 5, 64283

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Medical Image Analysis

Medical Image Analysis Medical Image Analysis CS 593 / 791 Computer Science and Electrical Engineering Dept. West Virginia University 23rd January 2006 Outline 1 Recap 2 Edge Enhancement 3 Experimental Results 4 The rest of

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models JMLR Workshop and Conference Proceedings 6:17 164 NIPS 28 workshop on causality Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models Kun Zhang Dept of Computer Science and HIIT University

More information

Independent Component Analysis. Contents

Independent Component Analysis. Contents Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle

More information

II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES

II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES II. DIFFERENTIABLE MANIFOLDS Washington Mio Anuj Srivastava and Xiuwen Liu (Illustrations by D. Badlyans) CENTER FOR APPLIED VISION AND IMAGING SCIENCES Florida State University WHY MANIFOLDS? Non-linearity

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 8 Continuous Latent Variable

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

NONLINEAR DIFFUSION PDES

NONLINEAR DIFFUSION PDES NONLINEAR DIFFUSION PDES Erkut Erdem Hacettepe University March 5 th, 0 CONTENTS Perona-Malik Type Nonlinear Diffusion Edge Enhancing Diffusion 5 References 7 PERONA-MALIK TYPE NONLINEAR DIFFUSION The

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Model Complexity of Pseudo-independent Models

Model Complexity of Pseudo-independent Models Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract

More information

Mutual information for multi-modal, discontinuity-preserving image registration

Mutual information for multi-modal, discontinuity-preserving image registration Mutual information for multi-modal, discontinuity-preserving image registration Giorgio Panin German Aerospace Center (DLR) Institute for Robotics and Mechatronics Münchner Straße 20, 82234 Weßling Abstract.

More information

Kernel Correlation for Robust Distance Minimization

Kernel Correlation for Robust Distance Minimization Chapter 2 Kernel Correlation for Robust Distance Minimization We introduce kernel correlation between points, between a point and a set of points, and among a set of points. We show that kernel correlation

More information

Spatially Invariant Classification of Tissues in MR Images

Spatially Invariant Classification of Tissues in MR Images Spatially Invariant Classification of Tissues in MR Images Stephen Aylward and James Coggins Department of Computer Science University of North Carolina at Chapel Hill Ted Cizadlo and Nancy Andreasen Department

More information

DT-MRI Segmentation Using Graph Cuts

DT-MRI Segmentation Using Graph Cuts DT-MRI Segmentation Using Graph Cuts Yonas T. Weldeselassie and Ghassan Hamarneh Medical Image Analysis Lab, School of Computing Science Simon Fraser University, Burnaby, BC V5A 1S6, Canada ABSTRACT An

More information