Variational Methods for Multimodal Image Matching


Variational Methods for Multimodal Image Matching. Gerardo Hermosillo Valadez. To cite this version: Gerardo Hermosillo Valadez. Variational Methods for Multimodal Image Matching. Human-Computer Interaction [cs.hc]. Université Nice Sophia Antipolis, English. <tel > HAL Id: tel Submitted on 7 Feb 200. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L'archive ouverte pluridisciplinaire HAL est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d'enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

THÈSE présentée à l'UNIVERSITÉ DE NICE - SOPHIA ANTIPOLIS pour obtenir le titre de DOCTEUR EN SCIENCES
École Doctorale Sciences et Technologies de l'Information et de la Communication, Spécialité Image & Vision
soutenue par Gerardo Hermosillo Valadez le 3 mai 2002
Titre : Variational Methods for Multimodal Image Matching (Méthodes Variationnelles pour le Recalage Multimodal)
Directeur de thèse : Olivier Faugeras
Jury : Président : Nicholas Ayache. Rapporteurs : Luis Alvarez, Joachim Weickert, Laurent Younes. Examinateurs : Michel Barlaud, Luc Robert.


Acknowledgments
This thesis was funded by the Mexican National Council for Science and Technology (CONACYT) through the scholarship offered in conjunction with the French Society for the Exportation of Educational Resources (SFERE). I am grateful to Luis Alvarez, Joachim Weickert and Laurent Younes for accepting to review the thesis dissertation. Their comments have been highly encouraging to me. I also thank Michel Barlaud and Luc Robert for accepting to be examiners of the thesis, and Nicholas Ayache, who was the chairman of the committee and whose kind suggestions helped improve the manuscript. I especially thank my thesis advisor Olivier Faugeras, for welcoming me as part of the ROBOTVIS team and for the many hours of fruitful and exciting research discussions. I have learned a lot from his example during these years and my thesis has largely benefited from his deep insight and rigorous work. I would like to acknowledge numerous friends and colleagues with whom I have shared many aspects of life over the last few years. I cannot possibly mention them all, but they will recognize themselves. Each of them brings warm memories to my mind. Perhaps more special thanks are due to José Gomes, Jacques Bride and Christophe Chefd'Hotel, with whom I worked more closely. I am thankful for their help and for what they have taught me. I must also mention the people who first welcomed me four years ago, of whom Nour, Imad, Diane, Nikos and Robert are just a few. They all occupy special places in my memory. Thanks go also to our friends Jacques, Fred, Marie-Cécile, Pierre, David, and many others who have shared with us most of our social activities during our stay in France. I want to express my deep thankfulness to my wife Alejandra, who has helped me a lot with her strength and courage. Her love and unwavering support have been indispensable pillars to my work, and I truly share this accomplishment with her. Throughout my life, my parents Jorge and Socorro have helped me in every possible way, frequently not without self-sacrifice. I dedicate this achievement to them in gratefulness for their unconditional love. Gerardo Hermosillo, May 2002.


Abstract
During the past few years, the use of the theory of partial differential equations has provided a solid formal approach to image processing and analysis research, and has yielded provably well-posed algorithms within a set of clearly defined hypotheses. These algorithms are the state of the art in a large number of application fields such as image de-noising, segmentation and matching. At the same time, the combination of stochastic and variational approaches has led to powerful algorithms which may also be described in terms of partial differential equations. This is the approach followed in the present work, which studies the problem of dense matching between two images using statistical dissimilarity criteria. Two classes of algorithms are considered, corresponding to these criteria being calculated either globally for the entire image or locally within corresponding regions. In each case, three dissimilarity criteria are studied, defined as the opposite of the following similarity measures: mutual information (well adapted to a general statistical dependence between the grey-level intensities), correlation ratio (adapted to a functional dependence), and cross correlation (adapted to an affine dependence). The minimization of the sum of the dissimilarity term and a regularization term defines, through the associated Euler-Lagrange equations, a set of coupled functional evolution equations. Particular emphasis is put on establishing the conditions under which these evolution equations are well posed, i.e. have a unique solution. It is shown that the proposed algorithms satisfy these conditions for two classes of linear regularization terms, including one which encourages discontinuities of the solution at the contours of the reference image. The discretization and the numerical implementation of the matching algorithms are discussed in detail and their performance is illustrated through several real and synthetic examples, both with 2D and 3D images. As these examples show, the described algorithms are of interest in applications which do not necessarily involve sensors of multiple modalities. They are also of special interest to the medical imaging community, where data fusion between different imaging sensors often requires correcting for nonlinear distortions.


Résumé
Depuis quelques années, l'utilisation des équations aux dérivées partielles a pourvu la recherche en traitement d'images d'une approche formelle solide, et a abouti à des algorithmes dont on peut montrer le caractère bien posé, étant donné un ensemble d'hypothèses clairement définies. Ces algorithmes forment l'état de l'art dans beaucoup de domaines d'application tels que le débruitage, la segmentation et la mise en correspondance. En parallèle, des approches combinant des principes variationnels et stochastiques ont amené à de puissants algorithmes qui peuvent aussi être décrits en termes d'équations aux dérivées partielles. C'est l'approche suivie dans ce travail, où est étudié le problème de mise en correspondance dense entre deux images, en utilisant des critères statistiques de dissemblance. Deux classes d'algorithmes sont considérées, selon que ces critères sont calculés globalement pour toute l'image, ou localement entre des régions correspondantes. Dans chaque cas, trois critères de dissemblance sont étudiés, définis comme l'opposé des critères de ressemblance suivants : information mutuelle (bien adaptée à une dépendance statistique très générale entre les niveaux de gris), rapport de corrélation (adapté à une dépendance fonctionnelle), et corrélation croisée (adaptée à une dépendance affine). La minimisation de la somme du terme de dissemblance et d'un terme de régularisation définit, à travers les équations d'Euler-Lagrange, un système d'équations fonctionnelles d'évolution. Nous étudions les conditions sous lesquelles ces équations d'évolution sont bien posées, c'est-à-dire ont une solution unique, et montrons que les algorithmes proposés satisfont ces conditions pour deux classes d'opérateurs linéaires régularisants, dont une est conçue pour encourager des variations rapides de la solution le long des contours de l'image de référence. La performance de ces algorithmes est illustrée à travers plusieurs exemples synthétiques et réels, aussi bien sur des images 2D que 3D. Comme le montrent ces exemples, les algorithmes décrits sont applicables à des problèmes qui ne font pas nécessairement intervenir des capteurs de modalités différentes. Ils sont aussi spécialement intéressants pour la communauté de l'imagerie médicale, où le problème de fusionner des données provenant de différentes modalités d'imagerie nécessite souvent de corriger des distorsions non linéaires.


Contents

Méthodes Variationnelles pour le Recalage Multimodal
Introduction
  Contributions
  Document Layout

Part I: A Generic Image Matching Problem
1 Overview
  Definition of Images
  Image Matching
  Multimodality and Statistical Similarity Criteria
  Dense Matching and the Variational Framework
2 Study of the Abstract Matching Flow
  Definitions and Notations
  Basic Properties
  Semigroups of Linear Operators
  Solutions of the Abstract Matching Flow (Mild and Strong Solutions; Classical Solution)
3 Regularization Operators
  Functional Spaces
  Notations
  Image Driven Anisotropic Diffusion
  The Linearized Elasticity Operator
  Existence of Minimizers

Part II: Study of Statistical Similarity Measures
4 Definition of the Statistical Measures
  Global Criteria
  Local Criteria
  Continuity of MI_g and MI_l
5 The Euler-Lagrange Equations
  Global Criteria (Mutual Information; Correlation Ratio; Cross Correlation)
  Local Criteria (Mutual Information; Correlation Ratio; Cross Correlation)
  Summary
6 Properties of the Matching Terms
  Preliminary Results
  Global Criteria (Mutual Information; Correlation Ratio; Cross Correlation)
  Local Criteria (Mutual Information; Correlation Ratio; Cross Correlation)

Part III: Implementation Aspects
7 Numerical Schemes
  Regularization Operators (The Linearized Elasticity Operator; The Nagel-Enkelmann Operator)
  Dissimilarity Terms
  Approximate Implementations of F_MI^l(h) and F_CR^l(h) (Mutual Information; Correlation Ratio)
  Parallel Implementation
8 Determining Parameters
  Determining the Smoothing Parameter
9 Experimental Results
  Classification
  Description of the Experiments

Appendices
A Other Applications
  A.1 Entropy Minimization for Image Segmentation
  A.2 Diffeomorphic Matching
B Library Description
  B.1 General Remarks
  B.2 C++ Listing: Global Matching Functions (Mutual Information; Correlation Ratio; Cross Correlation)
  B.3 C++ Listing: Local Matching Functions (Mutual Information; Correlation Ratio; Cross Correlation)
  B.4 C++ Listing: 2D Matching Flow
  B.5 C++ Listing: Main Program and Multiscale Handling

Bibliography


List of Figures

Examples of different image modalities
Monomodal matching examples
Nonrigid multimodal matching examples
Synthetic sensors and the support of their joint intensity distributions
Schematic joint intensity distribution
The complex plane and the sector of Definition 2.2
The complex plane and the sectors Δ_φ and Σ_φ defined in Theorem 2.7
The complex plane and the sectors Σ_φ and Δ_φ
Local joint intensity distribution
Interpretation of the gradient of the global mutual information criterion
Interpretation of the gradient of the global correlation ratio criterion
Interpretation of the gradient of the global cross correlation criterion
Execution flow for the master processor
Execution flow for each slave processor in the parallel implementation of the matching flow
Density estimation example
Density estimation example
Behavior of the two regularization operators
Displacement fields with linearized elasticity and anisotropic diffusion
Linearized elasticity with the elasticity parameter close to zero and close to one
Determinant of D(Id + h*)
Proton density image matching against T2-weighted MRI
Deformation field recovered in the previous experiment
Components of the deformation field
Determinant of the Jacobian of the deformation
Matching with local mutual information and correlation ratio
Realigned image and its superposition with the reference image
Components of the applied deformation field
Components of the recovered deformation field
Determinant of the Jacobian for the deformation
Global mutual information with fMRI data
Matching of T2-weighted anatomical MRI against EPI functional MRI
Matching of anatomical vs diffusion-tensor-related MRI
Stereo matching using global mutual information
Deformed and reference images
Components of the obtained deformation field
Determinant of the Jacobian for the obtained deformation
Human template matching: reference (left) and target (right) images
Deformed template
Some corresponding points
Components of the obtained deformation
Determinant of the Jacobian for the obtained deformation
A.1 Segmentation example
A.2 Segmentation example
A.3 Segmentation example
A.4 Diffeomorphic matching using local mutual information
A.5 Diffeomorphic matching using local cross correlation: first example
A.6 Diffeomorphic matching using local cross correlation: second example
A.7 Diffeomorphic matching using local cross correlation: third example

16 Méthodes Variationnelles pour le Recalage Multimodal


18 Méthodes Variationnelles pour le Recalage Multimodal 7 Introduction Cette thèse porte sur le problème de la mise en correspondance dense entre deux images, et en particulier lorsqu une comparaison directe des intensités s avère impossible. Résoudre automatiquement ce problème est une étape fondamentale dans l exploitation et l étude du contenu des images. Par exemple, c est un pré-requis essentiel dans plusieurs problèmes de vision par ordinateur tels que l étalonage de caméras et la reconstruction 3D à partir de (au moins) deux vues d une scène. Le problème peut être vu génériquement comme celui de la fusion de données, c est-à-dire celui de la mise en correspondance d informations provenant de plusieures sources. Quand les sources sont d une nature complémentaire, elles partagent par dénition très peu d information comune, et il s avère donc difcile de fusioner leur sorties respectives. Ceci est un problème très courant dans l analyse des images médicales, où l on est souvent confronté à des multiples modalités d imagerie (Tomographie par rayons X, Résonance Magnetique Nucléaire, Emission de Positrons, etc.). Dans ce contexte, le problème est souvent appellé recalage multimodal. D autres situations où une comparison directe des intensités devient inutile apparaissent en vision par ordinateur. Ainsi, mettre en correspondence des structures similaires sous des conditions d illumination variantes ou lorsque les objets ont des propriétés de réflectance ou de diffusion différents (albédos différents) sont deux exemples ou les méthodes de recalage multimodal peuvent également s appliquer. Nous proposons une approche variationnelle pour le recalage multimodal nonrigide. Les techniques décrites reposent sur le calcul de mesures statistiques de dissemblence entre les intensités de régions correspondantes. Deux familles d algorithmes sont considérées, correspondant au calul global ou local de ces critères. L approche suivie est celle d une modélisation continue du problème et le calul de la première variation des critères statistiques. L existence et l unicité d une solution aux flots de minimisation est démontrée pour les trois critères étudiés (dans leur version locale et globale) ainsi que pour deux familles d opérateurs différentiels de régularisation. Plan du manuscrit Le document est divisé en trois parties. La première partie (chapitres à3)est consacrée à la description des concepts essentiels mis en jeu dans l appariement de deux images en utilisant des critères statistiques de dissemblance, et donne une vue d ensemble de l approche proposée. Les conditions nécessaires à l exsistence et unicité de la solution du problème de minimisation sont établies et deux opérateurs de régularisation sont étudiés en montrant qu ils satisfont les propriétés requises. La seule partie des algorithmes qui n est pas traitée est celle qui concerne le terme de mise en correspondance, issue du critère de dissemblance. C est là l objet de la deuxième partie (chapitres 4 à 6), qui étudie en détail ce terme des équations fonctionnelles,

19 8 Méthodes Variationnelles pour le Recalage Multimodal en calculant tout d abord la première variation des six critères de dissemblance et en établissant ensuite leurs bonnes propriétés pour le caractère bien posé du processus de minimisation. Finalement, la troisième partie (chapitres 7 à9) décrit en détail la discrétisation et l implémentation numérique des algorithmes qui résultent de ces équations, et présente des résultats expérimentaux avec des déformations synthétiques et réelles, mettant en jeu des images 2D ou 3D. Un résumé de chaque chapitre est donné à continuation. PARTIE I: Un Problème Générique de Mise en Correspondance Dense CHAPITRE Ce chapitre introduit les concepts de base mis en jeu dans la mise en correspondance de deux images. Il commence par une introduction de la théorie de l espace d échelle (scale-space) et donne la dénition d image adoptée dans la suite. Il continue en dénissant le problème de mise en correspondance en fonction du type de transformation recherchée et décrit ensuite les critères statistiques de ressemblance en général. Il termine en décrivant le formalisme du calcul de variations et résume l approche suivie dans cette thèse en donnant la forme générale des équations d évolution qui gouvernent le processus de minimisation. CHAPITRE 2 Ce chapitre est consacré à l étude du problème de minimisation introduit au chapitre, dans le cadre abstrait de l analyse fonctionnelle. L existence et l unicité de plusieurs types (faible, forte, classique) de solutions au problème d évolution est démontrée en supposant un terme de mise en correspondance Lipschitz-continu et des opérateurs de régularisation générant des semi-groupes (continus, analytiques) de contractions. CHAPITRE 3 Ce chapitre étudie la partie régularisation des algorithmes de mise en correspondance. Deux familles différentes d opérateurs linéaires sont considérées, dont une conçue pour encourager les discontinuités du champ de déplacement le long des contours de l image de référence. Il est montré que ces opérateurs génèrent des semi-groupes uniformément continus et analytiques de contractions et satisfont donc les conditions nécessaires établies au chapitre 2. Après une discussion sur les espaces fonctionnels considérés, des preuves générales d existence de fonctions minimisantes pour les fonctionnelles d énergie proposées sont décrites.

20 Méthodes Variationnelles pour le Recalage Multimodal 9 PARTIE II: Étude des Mesures Statistiques de Similarité CHAPITRE 4 Ce chapitre introduit les deux classes de termes de dissemblance considérées, appelées globales et locales. Leur dénition est donnée en termes d estimations nonparamètriques de la densité jointe des intensités à partir soit des images dans leur ensemble (globales) soit de régions correspondantes autour de chaque point (locales). Dans chaque cas, trois mesures de similarité sont dénies: corrélation croisée, rapport de corrélation et information mutuelle. CHAPITRE 5 Dans ce chapitre, les équations d Euler-Lagrange associées sont dérivées pour les six mesures de dissemblance. Étant donnée la forme complexe de ces fonctionnelles, un calcul explicite de leur dérivée de Gâteaux est nécessaire pour le calcul des équations d Euler-Lagrange. CHAPITRE 6 Ce chapitre est consacré à montrer que les gradients calculés au chapitre 5 dénissent des fonctions satisfaisant les conditions de continuité nécessaires à l existence et unicité de la solution du problème de minimisation, telles qu elles sont établies au chapitre 2. PARTIE III: Aspects Numériques CHAPITRE 7 Ce chapitre décrit les schémas numériques utilisés pour discrétiser les équations d évolution continues, ainsi que pour l interpolation des images et leurs gradients. Des schémas en temps explicites et implicites sont considérés. Le schéma d implémentation de l estimation de densité basé sur du ltrage récursif est décrit en détail. CHAPITRE 8 Ce chapitre étudie l estimation des différents paramètres des alogrithmes, et en particulier du paramètre de lissage dans l estimation de la densité jointe d intensités.

21 20 Méthodes Variationnelles pour le Recalage Multimodal CHAPITRE 9 Ce chapitre présente des résultats expérimentaux pour tous les algorithmes décrits en utilisant des données aussi bien synthétiques que réelles. Les exemples incluent des images 2D pour des applications en vision par ordinateur, et des images 3D provenant de différentes modalités d imagerie médicale. Conclusion Dans cette thèse, nous étudions le problème de mise en correspondance dense entre deux images, en utilisant des critères statistiques de dissemblance. Deux classes d algorithmes sont considérées, selon que ces critères soient calculés globalement pour toute l image, ou localement entre des régions correspondantes. Dans chaque cas, trois critères de dissemblance sont étudiés, dénis comme l opposé des critères de ressemblance suivants: information mutuelle (bien adaptée à une dépendance statistique très générale entre les niveaux de gris), rapport de corrélation (adapté à une dépendance fonctionnelle), et corrélation croisée (adaptée à une dépendance afne). La minimisation de la somme du terme de dissemblance et un terme de régularisation dénit, à travers les équations d Euler-Lagrange, un système d équations fonctionnelles d évolution. Nous étudions les conditions sous lesquelles ces équations d évolution sont bien posées, c est à dire ont une solution unique et montrons que les algorithmes proposés satisfont ces conditions pour deux classes d opérateurs linéaires régularisants, dont une est conçue pour encourager des variations rapides de la solution le long des contours de l image de référence. La performance de ces algorithmes est illustrée à travers plusieurs exemples synthétiques et réels, aussi bien sur des images 2D que 3D. Comme le montrent ces exemples, les algorithmes décrits sont applicables à des problèmes qui ne font pas nécessairement intervenir des capteurs de modalités différentes. Ils sont aussi spécialement intéressants pour la communauté de l imagerie médicale, où le problème de fusionner des données provenant de différentes modalités d imagerie nécessite souvent de corriger des distorsions non-linéaires.

22 Variational Methods for Multimodal Image Matching


Introduction
The present thesis deals with a specific problem in the field of image analysis, namely image matching. Loosely speaking, the problem is that of establishing correspondences between points in two different images. Solving this problem is a fundamental prerequisite for understanding and exploiting the contents of images. For instance, it is essential in middle-level computer vision tasks such as camera calibration and 3D reconstruction from two or more views. The problem may be viewed in a generic fashion as that of data fusion, i.e. that of putting in correspondence information from several sources. When the sources are of a complementary nature, they share by definition little or no common information, and therefore fusing their outputs becomes particularly difficult. This is a common problem in medical imaging, where several modalities are widely used (X-ray Tomography, Magnetic Resonance Imaging (MRI), functional MRI (fMRI), Positron Emission Tomography, etc.). In this context, the problem is called multimodal image matching. Other situations in which a comparison of the source outputs becomes difficult are matching under varying illumination conditions, or matching images produced by physical objects with different responses to the same illumination (e.g. different albedos). A possible approach to the solution of this problem is to define meaningful structures in the image, invariant under transformations of the grey-level intensities, for example edges, corners, etc., and then design low-level methods to extract them from the image. Methods to match these structures can then be devised. This thesis proposes a variational framework for dense multimodal matching. Rather than working with extracted features, the described techniques rely on the computation of statistical dissimilarity measures between the intensities of corresponding regions. The approach which is followed is that of a continuous modeling of the problem, based on the theory of the calculus of variations and partial differential equations (PDEs). This formalism has proved very fruitful in image processing and analysis through its application in image de-noising, segmentation, and matching, mainly because it de-emphasizes the role of discretization and allows many results from these mathematical disciplines to be exploited. The proposed algorithms are divided into two families, corresponding respectively to global and local statistical dissimilarity criteria. The well-posedness of the proposed algorithms is proved by showing existence and uniqueness of the solution to the evolution equations that describe the maximization of the similarity criteria.

Contributions
This section aims at situating the present work in its context with respect to existing methods, so that a clear appreciation of its contributions and limits may be established. The amount of literature on the subject of image matching is very large. A good survey in the computer vision domain is provided by Mitiche and Bouthemy [59]. Barron et al. give a performance evaluation of some popular optical flow algorithms in [3]. In the domain of medical image registration, a good and recent survey is provided by Maintz and Viergever [9]. At a conceptual level, most of the existing methods for the recovery of motion rely on the minimization of an energy which encompasses two sources of a priori knowledge: (a) what a good matching should satisfy, and (b) a model of the transformation or some other constraint allowing the search for possible matches to be limited. As for the first point we can mention the optical flow constraint, the local image differences, or more general block matching strategies [79, 66], which allow the use of richer, non-local similarity measures (cross-correlation [35, 36, 2, 64], mutual information [87, 93, 88, 52], correlation ratio [76], among several others [94, 43, 70, 50]). As for the second point, we can find for instance the search for low-dimensional transformations (e.g. affine, quadratic, or spline interpolation between a set of control points [57, 78]). Another example of a constrained deformation is the stereo case, in which the knowledge of the fundamental matrix allows the search for the matching point to be restricted to the epipolar line [3, 95]. If the transformation searched for is a more general function (i.e. not described by parameters), the constraint may consist in requiring some smoothness of the displacement field, possibly preserving discontinuities [8, 3, 72, 6, 55, 56, 0, 32]. Statistical similarity measures have been widely used in the context of image registration through their maximization over a set of low-dimensional parametric transformations [9]. Mutual information was introduced by Viola et al. [87, 93, 88] and independently by Maes et al. [52]. The correlation ratio was first proposed as a similarity measure for image matching by Roche et al. [76]. Other statistical approaches rely on learning the joint distribution of intensities, as done for instance by Leventon et al. [50]. Extensions to more complex (non-rigid) transformations using statistical similarity measures include approaches relying on more complex parametric transformations [57, 78], block-matching strategies [53, 4, 37], or parametric intensity corrections [74]. Some recent approaches rely on the computation of the gradient of the local cross correlation [2, 64]. Concerning the regularization of dense displacement fields, we distinguish the approaches based on explicit smoothing of the field, as in Thirion's demons algorithm [8] (we refer to [69] for a variational interpretation of this algorithm), from those considering an additive term in the global energy, yielding (possibly anisotropic) diffusion terms [2, 9].

For a comparison of these two approaches, we refer to the work of Cachier and Ayache [9, 20]. Typically, differential methods are valid only for small displacements and special techniques are required in order to recover large deformations. For instance Alvarez et al. [6] use a scale-space focusing strategy. Christensen et al. [25] adopt a different approach. They look for a continuously invertible mapping which is obtained by the composition of small displacements. Each small displacement is calculated as the solution of an elliptic PDE describing the non-linear kinematics of fluid-elastic materials under deforming forces given by the matching term (in their case, the image differences). Trouvé [84] has generalized this approach using Lie group ideas on diffeomorphisms. Under a similar formalism, a very general framework which also allows for changes in the intensity values is proposed by Miller and Younes [58]. In this thesis, we focus on the study of a family of functional equations resulting from the minimization of global and local statistical dissimilarity measures. The emphasis is put on the computation of the first variation of these criteria and on the study of the properties of their gradient operators which are important to establish the well-posedness of the minimization flows. Concerning smoothness of the solution, we consider an energy functional composed of the sum of a matching and a regularization term and restrict our study to regularization terms yielding linear operators. We obtain a large family of matching algorithms, each one implying different a priori knowledge about the smoothness of the deformation and the relation between image intensities. We prove that all these problems have a global solution and that the functional equations governing the minimization are well posed in the sense of Hadamard. Interesting generalizations of these results may be obtained for more complex regularization schemes. In this respect we refer to the work of Weickert and Schnörr [9], Trouvé [84] and Miller and Younes [58]. The main contributions of our work are listed below.
- We propose a unifying framework for a family of variational problems for multimodal image matching. This framework subsumes block matching algorithmic approaches as well as techniques for non-rigid matching based on the global estimation of the intensity relations.
- We formally compute the gradient of local and global statistical dissimilarity measures, which is an essential step in defining and studying the well-posedness of their minimization. Contrary to more standard matching terms like intensity differences or the optical flow constraint, these matching terms are non-local, which makes the standard method of the calculus of variations inapplicable.
- We show that the operators defined by the gradients of these criteria satisfy some Lipschitz-continuity conditions which are required for the well-posedness of the associated matching flows.

Document Layout
This manuscript is divided into three parts. Part I (chapters 1 to 3) is devoted to the description of the basic concepts involved in matching two images using statistical dissimilarity measures and provides an overview of the proposed approach. The conditions for the existence and uniqueness of a solution to the minimization problem are established and two regularization operators are studied by showing that they satisfy the required properties. The only part of the algorithms that is not treated is the study of the matching term, coming from the dissimilarity measure. This is the object of part II (chapters 4 to 6), which studies this term of the functional equations in detail, computing the first variation of the six dissimilarity criteria and establishing their good properties in the sense of the well-posedness of the minimization process. Finally, part III (chapters 7 to 9) describes in detail the numerical implementation of the resulting algorithms, and presents several experimental results with real and synthetic deformations, involving both 2D and 3D images. In the following, a detailed summary of each chapter is given.
PART I: The Generic Image Matching Problem
CHAPTER 1. This chapter gives an overview of the type of algorithms studied in the thesis. After providing the formal definition of an image adopted in the sequel, the general matching problem is defined. The chapter continues with a discussion of the statistical similarity criteria and their intuitive behavior. It ends by describing the general framework of the calculus of variations and summarizes the approach followed in the thesis by giving the general form of the functional equations which describe the minimization flows.
CHAPTER 2. This chapter is devoted to the study of the minimization problem introduced in chapter 1, within the abstract framework of functional analysis. The chapter starts with a discussion of the functional spaces considered. Then the existence and uniqueness of several kinds of solutions (weak, strong, classical) to the generic evolution problem is shown, assuming Lipschitz continuity of the matching term and regularization operators generating certain types of contraction semigroups of operators (uniformly continuous, analytical).

CHAPTER 3. This chapter studies the regularization part of the algorithms. Two different families of linear operators are considered, including one which is designed to encourage discontinuities of the displacement field along the edges of the reference image. It is shown that these operators generate uniformly continuous, as well as analytical, semigroups of contractions and therefore satisfy the required conditions established in chapter 2.
PART II: Study of Statistical Similarity Measures
CHAPTER 4. This chapter introduces the two classes of matching terms considered, which are called local and global. Their definition is given in terms of non-parametric Parzen-window estimates of the joint intensity distribution from either the whole image or corresponding regions around each pixel (voxel). In each case, three similarity measures are defined: cross correlation, correlation ratio and mutual information. Existence of minimizers for the energy functionals obtained is then shown.
CHAPTER 5. In this chapter, the Euler-Lagrange equations are derived for the six dissimilarity measures. Due to the non-standard form of these functionals, an explicit computation of their Gâteaux derivative is necessary.
CHAPTER 6. This chapter is devoted to showing that the gradients of the statistical criteria computed in chapter 5 satisfy the Lipschitz-continuity conditions established in chapter 2, which are necessary to assert the well-posedness of the evolution equations.
PART III: Implementation Aspects
CHAPTER 7. This chapter describes the numerical schemes employed in implementing the continuous evolution equations, as well as those used for interpolating image and gradient values.

29 28 Introduction CHAPTER 8 This chapter discusses the way in which the different parameters of the algorithms are determined, particularly the smoothing parameter for the Parzen window estimates. CHAPTER 9 This chapter presents experimental results for all the described algorithms using both real and synthetic data. Examples include 2D images for applications in computer vision and 3D images concerning different medical image modalities.
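Several of the chapters summarized above (chapters 4, 7 and 8 in particular) revolve around Parzen-window estimates of the joint intensity distribution and the choice of their smoothing parameter. The following C++ sketch illustrates the basic idea only: it bins two intensity arrays into a joint histogram and smooths it with a Gaussian window. It is not the recursive-filtering implementation described in chapter 7, and the function name, the fixed number of bins and the border handling are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: a B x B joint Parzen estimate of the intensity
// distribution of two images given as flat arrays of equal size.
// Intensities are assumed to lie in [0, A]; beta is the smoothing
// parameter (standard deviation of the Gaussian window), in bins.
std::vector<double> jointParzenEstimate(const std::vector<double>& I1,
                                        const std::vector<double>& I2,
                                        double A, int B, double beta)
{
    std::vector<double> hist(B * B, 0.0);
    // 1. Accumulate the discrete joint histogram.
    for (std::size_t k = 0; k < I1.size(); ++k) {
        int a = std::min(B - 1, static_cast<int>(I1[k] / A * (B - 1)));
        int b = std::min(B - 1, static_cast<int>(I2[k] / A * (B - 1)));
        hist[a * B + b] += 1.0;
    }
    // 2. Smooth it with a separable Gaussian kernel (the Parzen window).
    int r = static_cast<int>(std::ceil(3.0 * beta));
    std::vector<double> g(2 * r + 1);
    double gsum = 0.0;
    for (int i = -r; i <= r; ++i) {
        g[i + r] = std::exp(-0.5 * i * i / (beta * beta));
        gsum += g[i + r];
    }
    for (double& w : g) w /= gsum;

    std::vector<double> tmp(B * B, 0.0), out(B * B, 0.0);
    for (int a = 0; a < B; ++a)          // smooth along the first axis
        for (int b = 0; b < B; ++b)
            for (int i = -r; i <= r; ++i) {
                int aa = std::min(B - 1, std::max(0, a + i));
                tmp[a * B + b] += g[i + r] * hist[aa * B + b];
            }
    for (int a = 0; a < B; ++a)          // smooth along the second axis
        for (int b = 0; b < B; ++b)
            for (int i = -r; i <= r; ++i) {
                int bb = std::min(B - 1, std::max(0, b + i));
                out[a * B + b] += g[i + r] * tmp[a * B + bb];
            }
    // 3. Normalize so that the bins sum to one.
    double total = 0.0;
    for (double v : out) total += v;
    for (double& v : out) v /= total;
    return out;
}
```

In this sketch the parameter beta plays the role of the smoothing parameter whose automatic determination is discussed in chapter 8.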

Part I: A Generic Image Matching Problem


Chapter 1: Overview
This chapter gives an overview of the type of algorithms studied in the thesis. After providing the formal definition of an image adopted in the sequel, the general matching problem is defined. The chapter continues with a discussion of the statistical similarity criteria and their intuitive behavior. It ends by describing the general framework of the calculus of variations and summarizes the approach followed in the thesis by giving the general form of the functional equations which describe the minimization flows.
1.1 Definition of Images
Physically, an image is a set of measurements obtained by integration of some density field, for example irradiance or water concentration, over a finite area (pixel) or volume (voxel). Sometimes images are vector valued, as color images for example. We shall restrict ourselves to scalar images. In a computer, an image appears as a set of scalar values ordered in a two or three-dimensional array. The grey-value obtained involves a neighborhood of a point, and the idea of resolution, or scale, is captured by modeling the physical field as a tempered distribution. In practice, this amounts to defining image derivatives by convolution with the derivative of an appropriate kernel. We will view images as functions defined over a two or three dimensional manifold, usually a bounded domain Ω of R^n (n = 2, 3) with smooth boundary. The range of an image will be considered to be the interval [0, A].
1.2 Image Matching
In many applications, one needs to integrate information coming from different types of sensors, compare data acquired at different times, or put similar structures of two different images into correspondence. These tasks are known respectively as data fusion, image registration and template matching, and they are all based upon the ability to automatically map points between the respective domains of the images.

Additionally, computing the optical flow, reconstructing a 3D scene from (at least) two views, tracking a feature or a region in a video sequence and calibrating a camera also require the ability to establish point correspondences between two images.
Figure 1.1: Examples of different image modalities. (a) Black and white photography. (b) Magnetic resonance angiography. (c) T1-weighted magnetic resonance. (d) Functional magnetic resonance.
The problem can be formulated as follows. Given two sets of points on a manifold (for instance R^n), we want to be able to automatically put them into correspondence, say by finding a function f : R^n → R^n. This function can be constrained in many ways, depending on how much we know about the relation between the two sets. For instance, when matching points from a stereo pair we know that corresponding points should belong to epipolar lines, and that from two views taken with the same center of perspective (but different viewing orientations), the transformation is a homography [34].

In other cases, more complicated functions are needed, but some a priori knowledge may still be available, like the fact that the transformation should be smooth and invertible. Consider for instance the images shown in Figure 1.2 on the following page. The first image pair represents a three-dimensional scene with no moving objects, viewed by a projective camera from two different points in space. Consequently, the transformation which links the points in both images is a homography within each plane of the scene. The regions where occlusions occur are regions where the transformation f is not invertible. The two images of the second example were constructed by calculating and assigning to each pixel its signed distance from two given curves (the curves outlined in red). One possible way of matching points in the first curve with points in the second curve is by matching all the points in these two images. This would require finding a highly nonlinear mapping, which however should be smooth and invertible, at least for the points near the curves.
1.3 Multimodality and Statistical Similarity Criteria
The second component in the matching problem (the first one was the nature of the transformation f) is the knowledge about what should be satisfied when two points are to be associated with one another. Coming back to the examples of Figure 1.2, it is clear that for the first image pair a reasonable way of matching the images is by simply comparing the intensities of corresponding pixels. For the second case, since the value of the images is zero for points lying on the curves, it seems also reasonable to match the images by a comparison of the local image intensities. However, images may be produced by a variety of sensors (Figure 1.1 on the facing page gives some examples), and this simple way of measuring their similarity is no longer adapted. More general ways of comparing the images are therefore needed. This is the role of statistical similarity measures, which have been widely used to cope with the problem of registering different medical image modalities (see the first example of Figure 1.3 on page 35). Nevertheless, these criteria can be used in other situations in which no intensity comparison can be made, even though the acquiring sensors are of a similar kind. This is the case for instance when matching images of similar objects which however have different responses to similar lighting conditions (e.g. the two skins with different albedos in the second example of Figure 1.3). Let us try to give an intuition behind these similarity measures by picturing artificial imaging sensors. The described situation is admittedly far simpler than reality, but the idea behind the similarity criteria can be better grasped in this ideal situation. Formal definitions will be postponed until chapter 4. Suppose our detectors are sensitive to a physical quantity Q. To fix ideas, we may picture Q as the intensities of a given image (see Figure 1.4 on page 37).

Figure 1.2: Between (a) and (b) the camera has undergone a rigid 3D movement so that, within each plane of the scene, the matching function is a homography. On the other hand, (c) and (d) are constructed as the signed distance functions to the red curves. The matching of these curves requires a highly nonlinear mapping between the two images. The occlusions in the top-row example are regions where the mapping function is not invertible. The mapping between the two curves should on the contrary be invertible everywhere.

Figure 1.3: Nonrigid multimodal matching examples. (a) and (b): T1-weighted anatomical magnetic resonance image (MRI) against functional MRI. (c) and (d): two human faces (with different skin albedos) under similar illuminating conditions.

We note the outputs i1 and i2 of two given sensors. If their response is a smooth function of Q, the support of the joint distribution of intensities is generally a curve in the plane [i1, i2]. A particular case is obtained when one of the responses is an invertible function of Q (say i1). In this case, the support of the joint distribution has a functional form i2 = f(i1). When both i1 and i2 are invertible functions of Q, the support of the joint distribution is also an invertible function and the outputs of the two sensors may be equalized to yield the same image. This suggests that looking at the joint distribution of intensities and somehow constraining it to be clustered is an appropriate way of matching related outputs. As will be clear from their expressions, the gradients of the three similarity measures that we consider define three types of clustering processes of the joint distribution, according to a hierarchy of constraints on the intensity relations. Roche et al. [75, 73] have clarified the assumptions on which these similarity measures rely by looking for optimal measures under various sensor models. At the most general level, mutual information is a measure of the statistical dependency between i1 and i2. A more constrained criterion is the correlation ratio, which measures the functional dependency between the intensities. Finally, the cross correlation is still more constrained, as it measures their affine dependency (see Figure 1.5).
1.4 Dense Matching and the Variational Framework
We now summarize the modeling assumptions used in the sequel and define the matching problem in the context of the calculus of variations. We consider two images I1^σ = I1 ∗ G_σ and I2^σ = I2 ∗ G_σ at a given scale σ, i.e. resulting from the convolution of two square-integrable functions I1 : R^n → R and I2 : R^n → R (n = 2, 3) with a Gaussian kernel of standard deviation σ. Given a region of interest Ω, a bounded region of R^n (we may require its boundary ∂Ω to fulfill some regularity constraints, e.g. that of being of class C^2), we look for a function h : Ω → R^n assigning to each point x in Ω a displacement vector h(x) ∈ R^n. This function is searched for in a set F of admissible functions such that it minimizes an energy functional I : F → R of the form I(h) = J(h) + R(h), where J(h) measures the dissimilarity between I1^σ and I2^σ ∘ (Id + h) and R(h) is a measure of the irregularity of h (Id is the identity mapping of R^n). The dissimilarity term will be defined in terms of global or local statistical measures on the intensities of I1^σ and I2^σ ∘ (Id + h), and the irregularity term will generally be a measure of the variations of h in Ω.

Figure 1.4: Synthetic sensors and the support of the joint intensity distribution (SJID) of their outputs. The second and third rows represent the response and output of three synthetic sensors. (Panels: same input for the three sensors; Sensor a, Sensor b, Sensor c; joint histograms SJID a-b, SJID a-c, SJID b-c.)
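Staying with the synthetic-sensor picture of Figure 1.4 (and anticipating the hierarchy summarized in Figure 1.5 below), the following C++ fragment shows one way the three similarity measures could be evaluated once a discrete joint intensity distribution is available. It is only an illustration: the thesis works with continuous Parzen-window densities and local versions of these criteria (chapter 4), and every name below is a hypothetical placeholder.

```cpp
#include <cmath>
#include <vector>

// P is a B x B discrete joint distribution (row index: i1, column: i2),
// assumed normalized to sum to one; c[k] is the intensity of bin k.
struct Similarities { double mutualInformation, correlationRatio, crossCorrelation; };

Similarities evaluate(const std::vector<double>& P, const std::vector<double>& c, int B)
{
    std::vector<double> p1(B, 0.0), p2(B, 0.0);            // marginals
    for (int a = 0; a < B; ++a)
        for (int b = 0; b < B; ++b) { p1[a] += P[a*B+b]; p2[b] += P[a*B+b]; }

    // Mutual information: MI = sum P log(P / (p1 p2)).
    double mi = 0.0;
    for (int a = 0; a < B; ++a)
        for (int b = 0; b < B; ++b)
            if (P[a*B+b] > 0.0)
                mi += P[a*B+b] * std::log(P[a*B+b] / (p1[a] * p2[b]));

    // Means, variances and covariance of i1 and i2 under P.
    double m1 = 0, m2 = 0, v1 = 0, v2 = 0, cov = 0;
    for (int a = 0; a < B; ++a) m1 += p1[a] * c[a];
    for (int b = 0; b < B; ++b) m2 += p2[b] * c[b];
    for (int a = 0; a < B; ++a) v1 += p1[a] * (c[a]-m1)*(c[a]-m1);
    for (int b = 0; b < B; ++b) v2 += p2[b] * (c[b]-m2)*(c[b]-m2);
    for (int a = 0; a < B; ++a)
        for (int b = 0; b < B; ++b) cov += P[a*B+b] * (c[a]-m1) * (c[b]-m2);

    // Squared normalized cross correlation: affine dependence.
    double cc = (v1 > 0 && v2 > 0) ? (cov*cov) / (v1*v2) : 0.0;

    // Correlation ratio of i2 given i1: functional dependence,
    // eta^2 = 1 - E[ Var(i2 | i1) ] / Var(i2).
    double expCondVar = 0.0;
    for (int a = 0; a < B; ++a) {
        if (p1[a] <= 0.0) continue;
        double mc = 0.0, vc = 0.0;
        for (int b = 0; b < B; ++b) mc += P[a*B+b] / p1[a] * c[b];
        for (int b = 0; b < B; ++b) vc += P[a*B+b] / p1[a] * (c[b]-mc)*(c[b]-mc);
        expCondVar += p1[a] * vc;
    }
    double cr = (v2 > 0) ? 1.0 - expCondVar / v2 : 0.0;

    return {mi, cr, cc};
}
```

The three quantities correspond to the hierarchy discussed above: the cross correlation is maximal under an affine relation between the intensities, the correlation ratio under a functional relation, and the mutual information under a general statistical dependence.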

Figure 1.5: Schematic joint intensity distribution. The three criteria give a hierarchy of measures to compare image intensities. The cross correlation measures their affine dependency, so that maximizing this criterion amounts to trying to fit an affine function to the joint density. The correlation ratio measures their functional dependency, so that the optimal density can have the shape of a nonlinear function. Finally, their mutual information gives an estimate of their statistical dependency; maximizing this criterion tends to cluster P. (Panel annotations: joint intensity distribution P(i1, i2); marginals p(i1), p(i2); conditionals p(i1 | i2(x)), p(i2 | i1(x)).)
For example, if h is differentiable, R(h) could be defined as a certain norm of its Jacobian Dh. In summary, the matching problem is defined as the solution of the following minimization problem:

h* = arg min_{h ∈ F} I(h) = arg min_{h ∈ F} (J(h) + R(h)).   (1.1)

Assuming that I is sufficiently regular, its first variation at h ∈ F in the direction of k ∈ F is defined by

δI(h; k) = lim_{ε → 0} [ I(h + εk) − I(h) ] / ε = dI(h + εk)/dε |_{ε=0}.

If a minimizer h* of I exists, then δI(h*; k) = 0 must hold for every k ∈ F. The equations δI(h*; k) = 0 are called the Euler-Lagrange equations associated with the energy functional I. Assuming that F is a linear subspace of a Hilbert space H, endowed with a scalar product (·,·)_H, we define the gradient ∇_H I(h) of I by requiring that, for all k ∈ F,

dI(h + εk)/dε |_{ε=0} = (∇_H I(h), k)_H.

The Euler-Lagrange equations are then equivalent to ∇_H I(h*) = 0. Rather than solving them directly (which is usually impossible), the search for a minimizer of I is done using a gradient descent strategy. Given an initial estimate h_0 ∈ H, we introduce time and a differentiable function, also noted h, from the interval [0, T] into H (we say that h ∈ C^1([0, T]; H)), and we solve the following initial value problem:

dh/dt = −∇_H I(h) = −( ∇_H J(h) + ∇_H R(h) ),   h(0)(·) = h_0(·).   (1.2)

That is, we start from the initial field h_0 and follow the gradient of the functional I (the minus sign is because we are minimizing). The solution of the matching problem is then taken as the asymptotic state (i.e. when t → ∞) of h(t), provided that h(t) ∈ F for a sufficiently large t. The boundary conditions, i.e. the values of h(t) on the boundary ∂Ω, must also be specified. This will be done along with the choice of the space F of admissible functions in Chapter 3. We assume for the moment (since this is the case we shall treat) that ∇_H R(h) is a linear application from a linear subspace of H into H. In Chapter 3, concrete functional spaces F and H will be chosen and two families of regularization operators will be studied. The computation and study of the properties of ∇_H J(h) for a set of statistical dissimilarity measures will be the object of Part II of this manuscript. In the following chapter, we study the existence and uniqueness of a solution of (1.2) from an abstract viewpoint, by borrowing tools from the theory of semigroups generated by unbounded linear operators on a Hilbert space.
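As a purely schematic illustration of a discrete counterpart of (1.2), an explicit gradient-descent loop could look as follows. The actual schemes used in the thesis, including semi-implicit ones and the multiscale handling, are described in part III; the Field container and the two gradient routines below are hypothetical stubs, not the thesis's implementation.

```cpp
#include <cstddef>
#include <vector>

// A displacement field sampled on the image grid, stored as a flat array
// of vector components (hypothetical minimal container).
struct Field { std::vector<double> v; };

// Placeholders for the two terms of the flow (1.2): the first variation of
// the dissimilarity criterion and of the regularization term. Their actual
// computation is the subject of parts II and III of the thesis.
Field gradJ(const Field& h) { return Field{std::vector<double>(h.v.size(), 0.0)}; } // stub
Field gradR(const Field& h) { return Field{std::vector<double>(h.v.size(), 0.0)}; } // stub

// Explicit Euler integration of dh/dt = -(gradJ(h) + gradR(h)),
// starting from h0, with time step dt and nIter iterations.
Field matchingFlow(Field h, double dt, int nIter)
{
    for (int it = 0; it < nIter; ++it) {
        Field gj = gradJ(h), gr = gradR(h);
        for (std::size_t k = 0; k < h.v.size(); ++k)
            h.v[k] -= dt * (gj.v[k] + gr.v[k]);   // follow minus the gradient
    }
    return h;   // taken as an approximation of the asymptotic state
}
```

In practice a small time step dt and a stopping criterion on the update norm stand in for the asymptotic state t → ∞ mentioned above.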


Chapter 2: Study of the Abstract Matching Flow
In the previous chapter, ∇_H I was defined by assuming that h belongs to a Hilbert space denoted H. Consequently, equation (1.2) may be viewed as a first-order ordinary differential equation with values in H. It turns out that studying it from such an abstract viewpoint allows us to prove the existence and uniqueness of several types of solutions (mild, strong, classical) of (1.2), by borrowing tools from functional analysis and the theory of semigroups of linear operators. We refer to the books of Brezis [8] and Pazy [68] for formal studies of these subjects. In the present chapter, we study the generic minimization flow (1.2) within this abstract framework. The linear operator −∇_H R defined by the regularization term will simply be noted A and the non-linear matching term −∇_H J will be generically noted F. The unknown of the problem is an H-valued function h : [0, +∞[ → H defined on R^+. The goal of this chapter is to establish the properties required of A and F in order for equation (1.2), which is now written as a semilinear abstract initial value problem of the form

dh/dt − Ah(t) = F(h(t)), t > 0;   h(0) = u_0 ∈ H,   (2.1)

to have a unique solution (in a sense to be defined). That these conditions are met will be the object of Chapter 3 concerning two different families of linear regularization operators A, and of Chapter 6 concerning six different matching functions F.
2.1 Definitions and Notations
We begin by introducing some definitions and notations. H will denote a complex Hilbert space with scalar product (·,·)_H ∈ C, i.e. satisfying, for u and v in H,

(u, v)_H = (v, u)*_H, where * denotes complex conjugation. The real and imaginary parts of λ ∈ C will be noted Re(λ) and Im(λ). The norm of H, induced by the Hilbert product, will be noted ‖·‖_H = (·,·)_H^{1/2}. If E and F denote two Banach spaces, a linear operator is any linear application A : D(A) ⊂ E → F from its domain D(A), a linear subspace of E, into F. We shall restrict ourselves to densely defined linear operators, i.e. those for which D(A) is dense in E. In the following, we consider a linear operator A : D(A) ⊂ H → H. The range of A is the linear subspace of H

Ran(A) = { f ∈ H : f = Au, u ∈ D(A) },

and its graph is the set of pairs

Γ(A) = { [u, Au] : u ∈ D(A) } ⊂ H × H.

A is said to be closed if Γ(A) is a closed subset of H × H. It is said to be bounded if there exists c ≥ 0 such that ‖Au‖_H ≤ c ‖u‖_H for all u ∈ D(A). The smallest such c will be denoted ‖A‖. The graph norm of A is the norm |||·|||_A defined, for u ∈ D(A), as |||u|||_A = ‖u‖_H + ‖Au‖_H, and its numerical range is the set Q(A) = { (Au, u)_H : ‖u‖_H = 1 } ⊂ C. A is said to be invertible if, for all f ∈ H, there exists a unique u ∈ D(A) such that Au = f. This implies that Ran(A) = H. We note u = A^{-1} f and readily verify that A^{-1} is a linear application from H into D(A). If an invertible operator A is closed, it follows (Proposition 2.3) that A^{-1} is a bounded linear operator. Finally, if I denotes the identity operator on H, the resolvent set ρ(A) of a closed linear operator A is the set of all λ ∈ C for which λI − A is invertible, i.e. (λI − A)^{-1} is a bounded linear operator. The family R(λ : A) = (λI − A)^{-1}, λ ∈ ρ(A), of bounded linear operators is called the resolvent of A.
2.2 Basic Properties
We now state some basic properties of densely defined closed linear operators that will be useful in the sequel. In all of this section, A denotes a densely defined closed linear operator from D(A) ⊂ H into H.
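As a concrete example of these notions (it is not one of the regularization operators studied later in the thesis, which are introduced in chapter 3), one may keep in mind the Dirichlet Laplacian, a standard densely defined, closed, unbounded operator. Take $H = L^2(\Omega)$ for a bounded smooth domain $\Omega \subset \mathbb{R}^n$ and set $Au = \Delta u$ with $\mathcal{D}(A) = H^2(\Omega) \cap H^1_0(\Omega)$. Then $\mathcal{D}(A)$ is dense in $H$ and $A$ is closed but not bounded; integration by parts gives $(Au, u)_H = -\int_\Omega |\nabla u|^2\,dx \le 0$, so that $-A$ is monotone; $A$ is invertible with bounded inverse; and $A$ generates the analytic contraction semigroup $S_A(t) = e^{t\Delta}$ (the heat semigroup), whose orbits $h(t) = S_A(t)u_0$ solve $dh/dt - Ah(t) = 0$, $h(0) = u_0$.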

Proposition 2.1. D(A), endowed with the graph norm of A, is a Banach space.
Proof: Consider a Cauchy sequence {u_n} in D(A), i.e. such that

‖u_n − u_p‖_H + ‖Au_n − Au_p‖_H → 0 as n, p → ∞.   (2.2)

We must prove that {u_n} converges to some u ∈ D(A). Because of (2.2), we have that ‖u_n − u_p‖_H → 0 and ‖Au_n − Au_p‖_H → 0, i.e. we have two Cauchy sequences in H, which are convergent since H is complete. We therefore have [u_n, Au_n] → [u, f], where u ∈ H and f ∈ H. Since Γ(A) is closed, we have that [u, f] ∈ Γ(A). This means that (a) f = Au, which implies that |||u_n − u|||_A → 0, and (b) u ∈ D(A), which completes the proof. □
We next recall the closed graph theorem.
Theorem 2.2 (Closed graph theorem). Let E and F be two Banach spaces and let T : E → F be a linear operator. If the graph of T is closed, then there exists c > 0 such that ‖Tu‖_F ≤ c ‖u‖_E, i.e. T is continuous.
Proof: The proof can be found for example in Theorem II.7 of the book of Brezis [8]. □
The closed graph theorem allows us to prove the following.
Proposition 2.3. If A is invertible then A^{-1} is a bounded linear operator.
Proof: A^{-1} : H → D(A) is a linear application. Since A is closed, D(A) endowed with the graph norm of A is a Banach space (Proposition 2.1). Now since Ran(A) = H and, for all f ∈ H, A A^{-1} f = f, we have that Γ(A) = { [u, Au] : u ∈ D(A) } = { [A^{-1}f, f] : f ∈ H }, which, up to swapping the two components of each pair, is the graph of A^{-1}; thus A^{-1} is closed. We can therefore apply the closed graph theorem to A^{-1}, viewed as an operator from H into D(A) equipped with its graph norm, which says that there exists c > 0 such that

‖A^{-1}u‖_H + ‖u‖_H ≤ c ‖u‖_H for all u ∈ H.

This implies that ‖A^{-1}u‖_H ≤ c ‖u‖_H and thus A^{-1} is a bounded linear operator. □
From Proposition 2.3, the following result readily follows.
Proposition 2.4. If A is invertible then there exists c > 0 such that ‖Au‖_H ≥ c ‖u‖_H for all u ∈ D(A).

Proof: Since A is invertible, A^{-1} is a bounded linear operator. Therefore there exists c > 0 such that ‖A^{-1}Au‖_H ≤ c ‖Au‖_H for all u ∈ D(A). This completes the proof since A^{-1}Au = u. □
As a direct consequence of Proposition 2.4, we have the following useful result.
Proposition 2.5. If A is invertible then the graph norm of A, |||·|||_A, and the norm ‖A·‖_H are equivalent, i.e. there exist c_1 > 0 and c_2 > 0 such that c_1 |||u|||_A ≤ ‖Au‖_H ≤ c_2 |||u|||_A for all u ∈ D(A).
Proof: We have |||u|||_A = ‖Au‖_H + ‖u‖_H and therefore the right part of the inequality is obvious (c_2 = 1). For the left part, since A is invertible we apply Proposition 2.4, which says that there exists c > 0 such that ‖Au‖_H ≥ c ‖u‖_H. Adding c ‖Au‖_H to both sides of this inequality yields the desired estimate. □
2.3 Semigroups of Linear Operators
Consider a one-parameter family S(t), 0 ≤ t < +∞, of bounded linear operators from H to H.
Definition 2.1. This family is said to be a C_0 semigroup of bounded linear operators if:
1. S(0) = I;
2. S(t_1 + t_2) = S(t_1) S(t_2) for all t_1, t_2 ≥ 0 (the semigroup property);
3. lim_{t→0+} S(t)u = u for all u ∈ H.
The Hille-Yosida theorem says that there is a one-to-one correspondence between C_0 semigroups of contractions (‖S(t)‖ ≤ 1 for all t ≥ 0) and maximal monotone operators in a Hilbert space. A linear operator A is maximal monotone if and only if
1. A is monotone: Re(Au, u)_H ≥ 0 for all u ∈ D(A);
2. and maximal: Ran(I + A) = H.
That is, a linear operator A is maximal if for every f ∈ H there exists u ∈ D(A) such that u + Au = f. If −A is a maximal monotone operator, A is said to be the infinitesimal generator of the corresponding C_0 semigroup, noted S_A(t), t ≥ 0. The relation between A and S_A(t) is the following. Given u_0 ∈ D(A), consider the initial value problem

dh/dt − Ah(t) = 0, t > 0;   h(0) = u_0.   (2.3)

If −A is a maximal monotone operator, the Hille-Yosida theorem asserts that there exists a unique solution of (2.3), i.e. a unique function h : [0, +∞[ → H such that:
1. h(t) is continuous and h(t) ∈ D(A) for t ≥ 0;
2. h(t) is continuously differentiable for t ≥ 0;
3. h(t) satisfies (2.3) for t ≥ 0.
Moreover, the solution satisfies ‖h(t)‖_H ≤ ‖u_0‖_H for t ≥ 0. The first two points above are summarized by saying that h ∈ C([0, +∞[; D(A)) ∩ C^1([0, +∞[; H). The linear application S_A(t) : D(A) → D(A) is defined by S_A(t)u_0 = h(t), where h(t) is the solution of (2.3) at time t. Since ‖S_A(t)u_0‖_H ≤ ‖u_0‖_H, it is possible, using the Hahn-Banach theorem and the fact that H is a Hilbert space [8], to extend S_A(t) by continuity and density to a linear continuous operator H → H. This family of operators, also noted S_A(t), is the C_0 semigroup of contractions corresponding to A. A property of the C_0 semigroups of bounded operators that we will need later is given next.
Proposition 2.6. For all u ∈ D(A), S_A(t)u ∈ D(A) and

d/dt S_A(t)u = A S_A(t)u = S_A(t) A u.

Proof: The proof can be found for example in Theorem 1.2.4 of the book of Pazy [68]. □
We will also make use of analytic semigroups of operators, which are defined as follows. For more details, the interested reader is referred to [68].
Definition 2.2. Let Δ = { z ∈ C : φ_1 < arg z < φ_2, φ_1 < 0 < φ_2 } and, for z ∈ Δ, let S(z) be a bounded linear operator. The family S(z), z ∈ Δ, is an analytic semigroup in Δ if:
1. z → S(z) is analytic in Δ;
2. S(0) = I and lim_{z→0, z∈Δ} S(z)u = u for all u ∈ H;
3. S(z_1 + z_2) = S(z_1) S(z_2) for all z_1, z_2 ∈ Δ (the semigroup property).
A semigroup S(t) will be called analytic if it is analytic in some sector containing the nonnegative real axis (Figure 2.1). Clearly, the restriction of an analytic semigroup to the real axis is a C_0 semigroup. We will make use of the following characterization of the infinitesimal generator of an analytic semigroup.

47 46 Chapter 2: Study of the Abstract Matching Flow C Im 0 ' ' 2 Re Figure 2.: The complex plane and the sector of Denition 2.2, containing [0; +[. A semigroup S(t) will be called analytic if it is analytic in. Theorem 2.7 Let A be the innitesimal generator of a uniformly bounded C 0 semigroup S(t) and assume 0 2 ρ(a). The following statements are equivalent.. S(t) can be extended to an analytic semigroup in a sector f = fz : j arg zj < fg and ks(z)k is uniformly bounded in every closed sub-sector μ f0, f 0 <f,of f. 2. There exist 0 <f<ß=2 and M > 0 such that ρ(a) ff ± f = f : j arg j < ß 2 + fg[f0g and kr( : A)k» M j j for 2 ± f ; 6= 0: Proof : The proof is found in Theorem of the book of Pazy [68]. 2 Figure 2.2 illustrates the relation between the sectors ± f and f of Theorem 2.7. C f Im ± f ρ ρ(a) 0 f f f Re f Figure 2.2: The complex plane and the sectors f and ± f dened in Theorem 2.7. A C 0 semigroup S(t) can be extended to an analytic semigroup in f if the resolvent set ρ(a) of A includes the sector ± f for some 0 <f<ß=2.
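A standard concrete example, not taken from the thesis, may help fix the ideas of this section (it is the classical textbook case treated, e.g., in Brezis [8] and Pazy [68]): take
$$H = L^2(\Omega), \qquad A = -\Delta, \qquad D(A) = H^2(\Omega) \cap H^1_0(\Omega).$$
Then $(Au, u)_H = \|\nabla u\|_{L^2}^2 \ge 0$ and $\mathrm{Ran}(I + A) = H$ by the Lax-Milgram theorem, so $A$ is maximal monotone. If $(\varphi_k, \lambda_k)$ denote the Dirichlet eigenpairs, $-\Delta \varphi_k = \lambda_k \varphi_k$ with $\lambda_k > 0$, the corresponding $C_0$ semigroup of contractions is the heat semigroup
$$S(t) u_0 = \sum_{k \ge 1} e^{-\lambda_k t}\, (u_0, \varphi_k)_{L^2}\, \varphi_k,$$
and $h(t) = S(t) u_0$ solves $dh/dt + A h = 0$, $h(0) = u_0$, with $\|h(t)\|_{L^2} \le \|u_0\|_{L^2}$. Since the numerical range of $-A = \Delta$ is contained in $(-\infty, -\lambda_1]$, this semigroup extends to an analytic semigroup in a sector around the non-negative real axis; this is the prototype of the situation addressed by Theorem 2.7 above and, later, by Theorem 2.13. The regularization operators $A_1$ and $A_2$ introduced in Chapter 3 are elliptic operators of the same type.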

48 2.4 Solutions of the Abstract Matching Flow Solutions of the Abstract Matching Flow We now consider the initial value problem (2.): 8 >< >: dh dt Ah(t) =F (h(t)); t>0 h(0) = u 0 2 H; (2.4) and start by dening four different kinds of solutions. Denition 2.3 (Global classical solution) A function h : [0; +[! H is a global classical solution of (2.4) if h 2 C([0; +[; H) C (]0; +[; H) C(]0; +[; D(A)); and (2.4) is satised for t>0. Denition 2.4 (Local classical solution) A function h :[0;T[! H is a local classical solution of (2.4) if h 2 C([0;T[; H) C (]0;T[; H) C(]0;T[; D(A)); and (2.4) is satised for 0 <t<t. Denition 2.5 (Strong solution) A function h which is differentiable almost everywhere on [0;T] such that dh=dt 2 L (]0;T[; H) is called a strong solution of the initial value problem (2.4) if h(0) = u 0 and dh=dt Ah(t) =F (h(t)) almost everywhere on [0;T]. Denition 2.6 (Mild solution) A continuous solution h of the integral equation h(t) =S A (t)u 0 + t is called a mild solution of the initial value problem (2.4). 0 S A (t s)f (h(s)) ds (2.5) The last denition is motivated by the following argument. If (2.4) has a classical solution then the H valued function k(s) = S A (t s)h(s) is differentiable for 0 < s<tand (Proposition 2.6): dk ds = AS A(t s)h(s) +S A (t s)h 0 (s) = AS A (t s)h(s) +S A (t s)ah(s) +S A (t s)f (h(s)) = S A (t s)f (h(s)): (2.6)

49 48 Chapter 2: Study of the Abstract Matching Flow If F f h 2 L ([0;T[; H) then S A (t s)f (h(s)) is integrable and integrating (2.6) from 0 to t yields hence k(t) k(0) = h(t) S A (t)u 0 = Denition 2.6 is thus natural. h(t) =S A (t)u 0 + t 0 t 0 S A (t s)f (h(s)) ds; S A (t s)f (h(s)) ds: The main goal of this chapter is to establish sufcient conditions on A (in view of the regularization operators that will be studied in the next chapter) and on F in order for the initial value problem (2.4) to have a unique global classical solution Mild and Strong Solutions Sufcient conditions on A and F for (2.4) to have a unique mild solution are given by the following theorem. Theorem 2.8 Let F : H! H be uniformly Lipschitz continuous on H and let A be a maximal monotone operator. Then the initial value problem (2.4) has a unique mild solution h 2 C([0;T]; H) (given by equation (2.5)). Moreover, the mapping u 0! h is Lipschitz continuous from H into C([0;T]; H). Proof : The proof can be found for example in Theorem 6..2 of [68]. 2 Since H is a Hilbert space, taking an initial value u 0 2D(A) sufces to obtain existence and uniqueness of a strong solution. Theorem 2.9 Let F, A and h be those of Theorem 2.8. Then, if u 0 2D(A), h is the unique strong solution of (2.4). Proof : This is a direct consequence of Theorem 6..6 in [68] since H, being a Hilbert space, is a reflexive Banach space Classical Solution To show the existence of a classical solution of (2.4), we will make use of analytic semigroups. If A generates an analytic semigroup of operators and 0 2 ρ(a) (i.e. A is invertible), it is shown in Section of the book of Pazy [68] that A ff can be dened for 0 <ff» and that A ff is a closed linear invertible operator with domain dense in H.

50 2.4 Solutions of the Abstract Matching Flow 49 The closedness of A ff implies that its domain, endowed with the graph norm of A ff, is a Banach space (Proposition 2.). Moreover, since A ff is invertible, its graph norm is equivalent to the norm k k ff = ka ff k H (Proposition 2.5). Thus D(A ff ), equipped with the norm k k ff, is a Banach space which we denote by H ff. Proposition 2.0 Let H ff be the Banach space dened above. Then H ff ρ H with continuous embedding. Proof : Since A ff is a densely dened closed linear invertible operator from D(A ff ) ρ H into H, we may apply Proposition 2.4 to A ff, which says that there exists c>0 such that ka ff uk H c kuk H, 8u 2D(A ff ). Therefore there exists c>0 such that 2 kuk H» c kuk ff ; 8u 2 H ff : (2.7) The importance of the continuous embedding of H ff into H lies in the fact that if the function F in (2.4) is Lipschitz continuous in H, i.e. if it satises for some L F > 0: kf (u ) F (u 2 )k H» L F ku u 2 k H ; 8u ;u 2 2 H; then it follows from equation (2.7) that it is also Lipschitz continuous in H ff. Moreover, if F is bounded in H, i.e. if it satises for some K F > 0: kf (u)k H» K F ; 8u 2 H; then F is well dened in H ff. The main result that we will use is the following, which is a special case of Theorems 6.3. and in [68]. Theorem 2. Assume that A generates an analytic semigroup S(t) satisfying ks(t)k» M and that 0 2 ρ(a), so that the Banach space H ff above is well dened. Assume further that for some L F > 0 and K F > 0 and for 0» ff 0 <ff<, the function F satises. kf (u ) F (u 2 )k H» L F ku u 2 k ff 8u ;u 2 2 H ff. 2. kf (u)k H» K F 8u 2 H ff. Then for every u 0 2 H ff, the initial value problem (2.4) has a unique global classical solution h 2 C([0; +[; H) C(]0; +[; D(A)) C (]0; +[; H): Moreover, the function t! dh=dt from ]0; +[ into H ff is Hölder continuous.

51 50 Chapter 2: Study of the Abstract Matching Flow Proof : This follows directly from Theorem 6.3. (existence of a local classical solution) and Theorem (extension to a global solution using the boundedness of F ) in [68], with k(t) = K F in Theorem The Hölder continuity follows from corollary in [68] which also shows that the Hölder exponent veries 0 << ff. 2 We are thus interested in the possibility of dening A ff, i.e. that of extending a given C 0 semigroup to an analytic semigroup in some sector around the nonnegative real axis. In order to do that, we will use Theorem 2.7 on page 46, together with the following one. Theorem 2.2 Let A be a densely dened closed linear operator in H. Let Q(A) be the closure in C of the numerical range of A and ± its complement, i.e. ±=C nq(a). If ± 0 is a connected component of ± satisfying ρ(a) ± 0 6= ; then ρ(a) ± 0 and kr( : A)k» d( : Q(A)) ; where d( : Q(A)) is the distance of from Q(A). Proof : The proof is found in Theorem.3.9 of [68]. 2 In view of the regularization operators studied in the next chapter, the following theorem establishes sufcient assumptions for A to be the innitesimal generator of an analytic semigroup. Theorem 2.3 Let A be the innitesimal generator of a C 0 semigroup of contractions on H (i.e. let A be a maximal monotone operator). We assume that A is invertible, i.e. that 0 2 ρ(a) and that:. (Au; v) H =(u; Av) H, 8u; v 2D(A) (A is called symmetric). 2. Re(Au; u) H» c kuk H for some c>0 (coerciveness). Then A is the innitesimal generator of an analytic semigroup of operators on H. Proof : From the two assumptions about (Au; u) H, it follows that the numerical range Q(A) =f(au; u) H ; kuk H =gof A is a subset of the interval ( ; c] for some c>0 (since the rst assumption implies, by the denition of the scalar product, that (Au; u) H 2 R; 8u 2 H). Choosing 0 <f<ß=2and denoting ± f = f 2 C : jarg j <ß=2+fg (see Figure 2.3 on the facing page), there exists a constant C f such that d( : Q(A)) C f j j for all 2 ± f : (2.8) This is clear from Figure 2.3, where we see that d( : Q(A)) d d 0 = j j cos f, so we can set C f = cos f. Moreover, d(0 : Q(A)) = c and therefore ± f ρ C nq(a).

52 2.4 Solutions of the Abstract Matching Flow 5 C Im ± f f f d d 0 ( ; c] ff Q(A) f c 0 Re Figure 2.3: The complex plane and the sectors ± f and f dened in Theorem 2.3. The fact that 0 2 ρ(a) shows that ± f, which contains 0, is a connected component of C nq(a) that has a nonempty intersection with ρ(a). This implies by Theorem 2.2 on the facing page that ρ(a) ± f and that for every in ± f, 6= 0, kr( : A)k» d( : Q(A))» C j j : We can therefore apply Theorem 2.7 on page 46, which allows us to conclude that the C 0 semigroup generated by A can be extended to an analytic semigroup S(z) in the sector f = fz 2 C : jargzj < fg (see Figure 2.3), and that ks(z)k is uniformly bounded in every closed sub-sector μ f0, f 0 <f,of f. 2 As a summary of the results of this chapter, we end it by stating the main result arrived at in the form of a single theorem. Theorem 2.4 (Main result) If the following assumptions are satised,. The linear operator A : D(A) ρ H! H is the innitesimal generator of a C 0 semigroup of contractions on H ( A is maximal monotone) and A is invertible. 2. 8u; v 2 D(A), (Au; v) H = (v; Au) H and there exists c > 0 such that (Au; u) H» ckuk H. 3. F is bounded and Lipschitz continuous in H. then, for each u 0 2 H ff, the initial value problem (2.4) has a unique global classical solution as dened in Denition 2.3 on page 47.

Proof: Assumptions 1 and 2 are the assumptions of Theorem 2.13 and therefore $A$ generates an analytic semigroup $S(t)$ satisfying $\|S(t)\| \le M$. This is the first assumption of Theorem 2.11 and, since we assume that $A$ is invertible, we also have that $0 \in \rho(A)$. Now assumption 3 together with equation (2.7) implies the two remaining assumptions of Theorem 2.11 and the proof is complete. $\Box$

Chapter 3

Regularization Operators

This chapter studies the regularization part of the initial value problem (1.2), i.e. the term $\nabla_H R(h)$. Two families of regularization operators are considered, including one which encourages the preservation of edges of the displacement field along the edges of the reference image. In view of the results of the previous chapter, we choose concrete functional spaces $F$ and $H$ and specify the domain of the regularization operators. We then show that these operators satisfy the properties of $A$ which are sufficient to assert the existence of a classical solution of (2.1) according to the main result of the previous chapter.

3.1 Functional Spaces

We begin by a brief description of the functional spaces that will be appropriate for our purposes. In doing this, we will make reference to Sobolev spaces, denoted $W^{k,p}(\Omega)$. We refer to the books of Evans [33] and Brezis [8] for formal definitions and in-depth studies of the properties of these functional spaces. For the definition of $\nabla_H I$, we use the Hilbert space
$$H = L^2(\Omega) = (W^{0,2}(\Omega))^n.$$
The regularization functionals that we consider are of the form
$$R(h) = \alpha \int_\Omega \varphi(Dh(x))\, dx, \qquad (3.1)$$
where $Dh(x)$ is the Jacobian of $h$ at $x$, $\varphi$ is a quadratic form of the elements of the matrix $Dh(x)$ and $\alpha > 0$. Therefore the set of admissible functions $F$ will be contained in the space
$$H^1(\Omega) = (W^{1,2}(\Omega))^n.$$

55 54 Chapter 3: Regularization Operators Additionally, the boundary conditions for h will be specied in F. Assuming for deniteness that h i h =0almost everywhere we set F = H ;2 0 () = (W0 ()) n : As will be seen, due to the special form of R(h), the regularization operators are second order differential operators, and we therefore will need the space for the denition of their domain. H 2 () = (W 2;2 ()) n 3.2 Notations We introduce in this section some notations that will be used in the sequel. Recall the general form of R(h) given by (3.). The quadratic form ' : M n n! R + is dened on the set M n n of n n matrices with real coefcients. The components of a vector x 2 R n will be noted x i, i f will denote the i th partial derivative of a scalar function f, so that its gradient rf is given by rf = [@ f;:::;@ n f ] T. The mapping '(Dh(x)) is given by '(Dh(x)) = X i;j;k;l a ijkl i h j k h l (x); where a ijkl are n 4 scalar functions dened in. The divergence of a vector eld h : R n! R n is denoted div(h) = r h = P ih i. For a matrix T 2 M n n, composed of row vectors t fg :::t fng, i.e. T =[t fg :::t fng ] T, we note div(t) =[r t fg ; :::; r t fng ] T ; so that the following relations hold: 8 >< >: div Dh T = div (r h) Id = r r h ; div Dh =[ h ; :::; h n ] T h: (3.2) Given R(h) as in (3.), the computation of r H R(h) is standard: r H R(h) = ff div(d'(dh)): 3.3 Image Driven Anisotropic Diffusion The rst regularization functional that we consider is dened by ' (Dh) = 2 Tr DhT I ff Dh T ; (3.3)

where $T_{I_1}$ is an $n \times n$ symmetric matrix defined at every point of $\Omega$ by the following expression:
$$T_f = \frac{(\lambda^2 + |\nabla f|^2)\,\mathrm{Id} - \nabla f\, \nabla f^T}{(n-1)\,|\nabla f|^2 + n\lambda^2}, \qquad \text{for } f : \Omega \subset \mathbb{R}^n \to \mathbb{R}.$$
This matrix is a regularized projector onto the plane perpendicular to $\nabla f$. It was first proposed by Nagel and Enkelmann [62] for computing optical flow while preserving the discontinuities of the deforming template. As pointed out by Alvarez et al. [6], applying the smoothness constraint to the reference image (here $I_1$) instead of the deforming one (here $I_2$) allows to avoid artifacts which appear when recovering large displacements. The matrix $T_f$ has one eigenvector equal to $\nabla f$, while the remaining eigenvectors span the plane perpendicular to $\nabla f$. Its eigenvalues $\lambda_i$ verify $\sum_i \lambda_i = 1$ independently of $\nabla f$. It is straightforward to verify that
$$\mathrm{div}(D\varphi_1(Dh)) = \big(\, \mathrm{div}(T_{I_1} \nabla h_1),\; \dots,\; \mathrm{div}(T_{I_1} \nabla h_n)\, \big)^T.$$
Thus, the regularization operator $\nabla_H R(h)$ yields a linear diffusion term with $T_f$ as diffusion tensor. In regions where $|\nabla f|$ is small compared to the parameter $\lambda$ in $T_f$, the diffusion tensor is almost isotropic and so is the regularization. At the edges of $f$, where $|\nabla f| \gg \lambda$, the diffusion takes place mainly along these edges. This operator is thus well suited for encouraging large variations of $h$ along the edges of the reference image $I_1$.

We define our first regularization operator as follows.

Definition 3.1 The linear operator $A_1 : D(A_1) \to H$ is defined as
$$D(A_1) = H_0^1(\Omega) \cap H^2(\Omega), \qquad A_1 h = \big(\, \mathrm{div}(T_{I_1} \nabla h_1),\; \dots,\; \mathrm{div}(T_{I_1} \nabla h_n)\, \big)^T.$$
We now check that $-A_1$ is a symmetric maximal monotone invertible operator, applying the standard variational approach [33].

Proposition 3.1 The operator $(I - A_1)$ defines a bilinear form $B_1$ on the space $H_0^1(\Omega)$ which is continuous and coercive (elliptic).

Proof: Because of the form of the operator $A_1$, it is sufficient to work on one of the coordinates and consider the operator $a_1 : D(a_1) \to L^2(\Omega)$ defined by
$$a_1 u = \mathrm{div}(T_{I_1} \nabla u),$$

57 56 Chapter 3: Regularization Operators and to show that the operator u! (u a u) denes a bilinear form b on the space H 0 () which is continuous and coercive. Indeed, we dene b (u; v) = uv vdiv(ti ff ru) dx: We integrate by parts the divergence term, use the fact that v 2 H0 (), and obtain b (u; v) = Because the coefcients of T I ff Schwarz: uv + rv T T I ff ru dx: are all bounded, we obtain, by applying Cauchy- jb (u; v)j»c kuk H ()kvk H (); where c is a positive constant, hence continuity. Because the eigenvalues of the symmetric matrix T I ff are strictly positive, we have c T Id, where c T is a positive constant. This implies that T I ff from which it follows that ru T T I ff rudx = b (u; u) kuk 2 L 2 () c Tkruk 2 L 2 () ; b (u; u) c 3 kuk 2 H () ; for some positive constant c 3 > 0 and hence we have coerciveness. 2 We can therefore apply the Lax-Milgram theorem and state the existence and uniqueness of a weak solution in H 0 () to the equation h A h = f for all f 2 L 2 (). Since is regular (in particular C 2 ), the coefcients of T I ff in C (), the solution is in H 0 () H2 () and is a strong solution (see e.g. [33]). Proposition 3.2 A is a maximal monotone self-adjoint operator from D(A ) = H 0 () H2 () into L 2 (). Proof : Monotonicity follows from the coerciveness of B proved in the previous proposition. Maximality also follows from the proof of proposition 3.. According to the same proposition, we have D(A ) = H 0 () H2 () and Ran(Id A ) = H (application of the Lax-Milgram theorem). In order to prove that the operator is self-adjoint, it is sufcient, since it is maximal monotone, to prove that it is symmetric ([8], proposition VII.6), i.e. that ( A h; k) L 2 () = (h; A k) L 2 () and this is obvious from the proof of proposition Lemma 3.3 The linear operator ffa is invertible for all ff>0.

Proof: It is sufficient to show that the equation $-\alpha A_1 h = f$ has a unique solution for all $f \in L^2(\Omega)$. The proof of Proposition 3.1 shows that the bilinear form associated to the operator $-\alpha A_1$ is continuous and coercive in $H^1(\Omega)$, hence the Lax-Milgram theorem tells us that the equation $-\alpha A_1 h = f$ has a unique weak solution in $H_0^1(\Omega)$ for all $f \in L^2(\Omega)$. Since $\Omega$ is regular, the weak solution is in $H_0^1(\Omega) \cap H^2(\Omega)$ and is a strong solution. $\Box$

3.4 The Linearized Elasticity Operator

The second regularization operator that we propose is inspired from the equilibrium equations of linearized elasticity (we refer to Ciarlet [27] for a formal study of three-dimensional elasticity theory), which are of the form:
$$\mu\, \Delta h + (\lambda + \mu)\, \nabla(\nabla \cdot h) = 0. \qquad (3.4)$$
The constants $\lambda$ and $\mu$ are known as the Lamé coefficients. Rather than modeling the domain $\Omega$ as an elastic material¹, the idea in this section is simply to view the left-hand side of (3.4) as a kind of diffusion operator and use it as an instance of $\nabla_H R(h)$ in (1.2). What interests us is the flexibility gained by the relative weight which we can give to the two operators $\Delta h$ and $\nabla(\nabla \cdot h)$, so that a single parameter (controlling this weight) is a priori needed. Also, in order to assert the existence of a minimizer of the functional $I$ obtained, it is desirable to define $\varphi(Dh)$ in such a way that it is convex in the variable $Dh$. To this end, we consider the one-parameter family ($\tfrac{1}{2} < \xi \le 1$) of functions of the form
$$\varphi_2(Dh) = \tfrac{1}{2}\Big[\, \xi\, \mathrm{Tr}(Dh^T Dh) + (1 - \xi)\, \mathrm{Tr}(Dh^2) \,\Big], \qquad (3.5)$$
for which we have
$$\mathrm{div}(D\varphi_2(Dh)) = \xi\, \Delta h + (1 - \xi)\, \nabla(\nabla \cdot h).$$
Thus, we define the second regularization operator as follows.

Definition 3.2 The linear operator $A_2 : D(A_2) \to H$ is defined as
$$D(A_2) = H_0^1(\Omega) \cap H^2(\Omega), \qquad A_2 h = \xi\, \Delta h + (1 - \xi)\, \nabla(\nabla \cdot h),$$
for $\tfrac{1}{2} < \xi \le 1$.

¹ This would require a more complex modeling since true elasticity is always non-linear [27].
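A minimal sketch of the operator of Definition 3.2, assuming a two-dimensional displacement field sampled on a pixel grid; the function name, the default value of $\xi$ and the central-difference discretization are choices of this sketch, and the homogeneous boundary condition of $D(A_2)$ is not enforced here:

    import numpy as np

    def elastic_operator(h, xi=0.75):
        """Apply A_2 h = xi * Laplacian(h) + (1 - xi) * grad(div h) to a 2-D
        displacement field h of shape (2, H, W), with 1/2 < xi <= 1.
        Central differences via np.gradient; boundary handling is one-sided."""
        assert 0.5 < xi <= 1.0
        def d(f, ax):                     # partial derivative along one axis
            return np.gradient(f, axis=ax)
        lap = np.stack([d(d(h[c], 0), 0) + d(d(h[c], 1), 1) for c in range(2)])
        div = d(h[0], 0) + d(h[1], 1)     # divergence of h
        grad_div = np.stack([d(div, 0), d(div, 1)])
        return xi * lap + (1.0 - xi) * grad_div

With $\xi = 1$ the operator reduces to a component-wise Laplacian, i.e. pure diffusion of each component of $h$; values of $\xi$ closer to $1/2$ give more weight to the coupling term $\nabla(\nabla \cdot h)$, which is the flexibility mentioned above.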

59 58 Chapter 3: Regularization Operators We now check that A 2 is a symmetric maximal monotone invertible operator. Proposition 3.4 The operator (I A 2 ) denes a bilinear form B 2 on the space H 0 which is continuous and coercive (elliptic). Proof : We consider the bilinear form C 2 dened as C 2 (h; k) = k T A 2 h dx; where h and k are functions in H 0. Integrating by parts, we nd C 2 (h; k) = and B 2 (h; k) =C 2 (h; k) + ο Tr(Dh T Dk) + ( ο) Tr(Dh Dk) dx; jc 2 (h; k)j» X ijkl h(x) k(x) dx. Wehave ja ijkl j j@ i h k h l j dx; where the constants a ijkl are all bounded. Thus, by applying several times Cauchy- Schwarz, we nd that jc 2 (h; k)j»c 2 khk H ()kkk H (); c 2 > 0; and hence, using Cauchy-Schwarz again, jb 2 (h; k)j»b 2 khk H ()kkk H (); b 2 > 0: This proves the continuity of B 2. Next we note that which proves the coerciveness of B 2. 2 B 2 (h; h) οkhk 2 H () ; Proposition 3.5 A 2 is a maximal monotone self-adjoint operator from D(A 2 ) = H 0 () H2 () into L 2 (). Proof : Monotonicity follows from the coerciveness of B 2 proved in the previous proposition. More precisely, since ( A 2 h; h) L 2 () = C 2(h; h), the proof shows that ( A 2 h; h) L 2 () ο R Tr(DhT Dh) dx 0. Regarding maximality, proposition 3.4 shows that the bilinear form B 2 associated to the operator Id A 2 is continuous and coercive in H (). We can therefore apply the Lax-Milgram theorem and state the existence and uniqueness of a weak solution in H 0 () of the equation h A 2h = f for all f 2 L 2 (). Since is regular (in

60 3.5 Existence of Minimizers 59 particular C 2 ), the solution is in H 0 () H2 () and is a strong solution (see e.g. [27], Theorem 6.3-6). Therefore we have D(A 2 ) = H 0 () H2 () and Ran(I A 2 ) = H. Finally, A is self-adjoint for the same reasons as those indicated in the proof of proposition Lemma 3.6 The linear operator ffa 2 is invertible for all ff>0. Proof : It is sufcient to show that the equation ffa 2 h = f has a unique solution for all f 2 L 2 (). The proof of proposition 3.4 shows that the bilinear form associated to the operator ffa 2 is continuous and coercive in H (), hence the Lax-Milgram theorem tells us that the equation ffa 2 h = f has a unique weak solution in H 0 () for all f 2 L 2 (). Since is regular the weak solution is in H 0 () H2 () and is a strong solution Existence of Minimizers Having dened the regularization functionals, we discuss in this section the existence of minimizers of the global energy functional I(h) =J (h) +ff '(Dh(x)) dx; (3.6) where ' is either ' dened in Equation (3.3) or ' 2 dened in Equation (3.5). We assume that J (h) is continuous in h, and bounded below. These properties will be shown for the statistical dissimilarity functionals J (h) that we study in Part II. In the following, we use the notion of weak convergence, dened as follows (see e.g. [33]). Denition 3.3 Let E be a Banach space and let E Λ be its dual. We say that a sequence fh k gρe weakly converges to h 2 E, written h k * h; if k Λ ; h k ff! k Λ ; h ff for each bounded linear functional k Λ 2 E Λ. The main result that we will use is given by the following theorem, found in Ciarlet [27]. Theorem 3.7 Let be a bounded open subset of R n and 2 R. Assume that the function ' : R μ! [;] satises the following two conditions:. '(x; ) :u 2 R μ! '(x; u) is convex and continuous for almost all x 2.

61 60 Chapter 3: Regularization Operators 2. '( ; u) :x 2! '(x; u) is measurable for all u 2 R μ. Then u k * u in L () ) '(u) dx» lim inf k! Proof : The proof is found in Theorem 7.3- of [27]. 2 '(u k ) dx: The following theorem is a slight modication of Theorem in [27], which assumes that J (h) is a linear continuous functional. Theorem 3.8 Given I(h) as in (3.6), assume that ' is convex and coercive, i.e. that there exist ff>0 and such that '(F) ffjjfjj 2 + ; for all F 2M n n : Assume further that J (h) is continuous in h and bounded below, and that inf k2h 0 () I(k) < : Then there exists at least one function h 2 H 0 () satisfying h = inf k2h 0 () I(k): Proof : First, by the coerciveness of the function ' and the fact that J (h) is bounded below, we have (using the inequality of Poincaré) I(h) c khk 2 H () + d; for all h 2 H 0 () and some constants c>0and d. Let fh k g be a minimizing sequence for I, i.e. h k 2 H 0 () 8k; and lim I(h k)=inf k! k2h 0 () I(k) =m: The assumption that m < and the relation I(k)! as kkk H ()! imply together that fh k g is bounded in the reflexive Banach space H (). Hence fh k g contains a subsequence fh p g that weakly converges to an element h 2 H (). The closed convex set H 0 () is weakly closed and thus the weak limit h belongs to H 0 (). The fact that h p * h in H () implies that Dh p * Dh in L 2 () and, since is bounded (which implies that L () ρ L 2 ()), we have Dh p *Dh in L 2 () ) Dh p *Dh in L (): We conclude from Theorem 3.7 on the preceding page that '(Dh) dx» lim inf p! '(Dh p ) dx:

62 3.5 Existence of Minimizers 6 Since J is continuous, J (h) = lim J (h p) and thus I(h)» lim inf I(h p) = m: p! p! But since h 2 H 0 (), we also have I(h) m and consequently 2 I(h) =m = inf k2h 0 () I(k): We now check that ' and ' 2 satisfy the hypotheses of Theorem 3.8. For the case of ', we consider each of its scalar components since this separation is possible. As pointed out in [6], because of the smoothness i I ff, T I ff has strictly positive eigenvalues and therefore, clearly, Proposition 3.9 The mapping is convex. The coerciveness of ' readily follows. ' : R n 7! R + X 7! XT I ff X T Proposition 3.0 The functional R (h) = ' (Dh(x)) dx; satises the coerciveness inequality, i.e. 9 c > 0; c 2 0 such that: ' (Dh(x)) c jdhj 2 c 2 : Proof : We have ru T T I ff ru jruj 2 8x 2 Where >0 is the smallest eigenvalue of T I ff. 2 We now turn to ' 2. Proposition 3. The mapping is convex. ' 2 : M n n 7! R + X 7! ο Tr(X T X) + ( ο) Tr(X 2 );

63 62 Chapter 3: Regularization Operators Proof : We write ' 2 as a quadratic form of the components X k of X, ' 2 (X) = Xn 2 n 2 i X j a ij X i X j and notice that the smallest eigenvalue of the matrix a ij is equal to 2ο. The result follows from the fact that <ο». 2 2 Proposition 3.2 The functional R 2 (h) = ' 2 (Dh(x)) dx; satises the coerciveness inequality, i.e. 9 c > 0; c 2 0 such that: ' 2 (Dh(x)) c jdhj 2 c 2 : Proof : We choose c equal to the smallest eigenvalue of ' 2 and c 2 =0. 2
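As a closing illustration for this chapter (not part of the thesis), the sketch below assembles the tensor $T_f$ of Section 3.3 for a two-dimensional image and applies $\mathrm{div}(T_f \nabla u)$, i.e. one component of the operator $A_1$ of Definition 3.1. The function names, the value of the contrast parameter $\lambda$ (lam) and the central-difference discretization via np.gradient are all assumptions of this sketch.

    import numpy as np

    def nagel_enkelmann_tensor(f, lam=1.0):
        """Regularized projector T_f perpendicular to grad f, for n = 2.
        f is a 2-D image array; lam plays the role of the parameter lambda."""
        fy, fx = np.gradient(f)                   # gradient along rows / columns
        g2 = fx**2 + fy**2                        # |grad f|^2
        denom = g2 + 2.0 * lam**2                 # (n-1)|grad f|^2 + n*lam^2, n = 2
        t11 = (lam**2 + g2 - fx * fx) / denom     # ((lam^2 + |grad f|^2) Id - grad f grad f^T) / denom
        t12 = -fx * fy / denom
        t22 = (lam**2 + g2 - fy * fy) / denom
        return t11, t12, t22

    def div_T_grad(u, t11, t12, t22):
        """One component of A_1 h: div(T_f grad u), central differences."""
        uy, ux = np.gradient(u)
        jx = t11 * ux + t12 * uy                  # flux T_f grad u
        jy = t12 * ux + t22 * uy
        return np.gradient(jx, axis=1) + np.gradient(jy, axis=0)

One can check numerically that $t_{11} + t_{22} = 1$ everywhere, reflecting the property that the eigenvalues of $T_f$ sum to one independently of $\nabla f$.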

Part II

Study of Statistical Similarity Measures


66 Chapter 4 Denition of the Statistical Measures As mentioned before, a general way of comparing the intensities of two images is by using some statistical or information-theoretic similarity measures. Among numerous criteria, the cross correlation, the correlation ratio and the mutual information provide us with a convenient hierarchy in the relation they assume between intensities [75, 73]. The cross correlation has been widely used as a robust comparison function for image matching. Within recent energy-minimization approaches relying on the computation of its gradient, we can mention for instance the works of Faugeras and Keriven [36], Cachier and Pennec [2] and Netsch et al. [64]. The cross correlation is the most constrained of the three criteria, as it is a measure of the afne dependency between the intensities. The correlation ratio was introduced by Roche et.al [76, 77] as a similarity measure for multi-modal registration. This criterion relies on a slightly different notion of similarity. From its denition given two random variables X and Y, CR = Var[E[XjY ]] ; (4.) Var[X] the correlation ratio can intuitively be described as the proportion of energy in X which is explained byy. More formally, this measure is bounded (0» CR» )) and expresses the level of functional dependence between X and Y : ( CR =, 9f X = f(y ) CR =0, E[XjY ]=E[X]: The concept of mutual information is borrowed from information theory, and was introduced in the context of multi-modal registration by Viola and Wells III [88]. Given

two random variables $X$ and $Y$, their mutual information is defined as
$$MI = H[X] + H[Y] - H[X, Y],$$
where $H$ stands for the differential entropy. The mutual information is positive and symmetric, and measures how the intensity distributions of two images fail to be independent.

We analyze these criteria from two different perspectives, namely that of computing them globally for the entire image, or locally within corresponding regions. Both types of similarity functionals are based upon the use of an estimate of the joint probability of the grey levels in the two images. This joint probability, noted $P_h(i_1, i_2)$, is estimated by the Parzen window method [67]. It depends upon the mapping $h$ since we estimate the joint probability distribution between the images $I_2 \circ (\mathrm{Id} + h)$ and $I_1$. To be compatible with the scale-space idea and for computational convenience, we choose a Gaussian window with variance $\beta > 0$ as the Parzen kernel. We will often use the notation $i \equiv (i_1, i_2)$ and
$$G_\beta(i) = g_\beta(i_1)\, g_\beta(i_2) = \frac{1}{2\pi\beta} \exp\!\left(-\frac{|i|^2}{2\beta}\right) = \frac{1}{\sqrt{2\pi\beta}} \exp\!\left(-\frac{i_1^2}{2\beta}\right) \frac{1}{\sqrt{2\pi\beta}} \exp\!\left(-\frac{i_2^2}{2\beta}\right).$$
Notice that $G_\beta$ and all its partial derivatives are bounded and Lipschitz continuous. We will need in Chapter 6 the infinite norms $\|G_\beta\|_\infty$ and $\|G_\beta'\|_\infty$. For conciseness, we will sometimes use the following notation when making reference to a pair of grey-level intensities at a point $x$:
$$I_h(x) \equiv \big(I_1(x),\; I_2(x + h(x))\big).$$

4.1 Global Criteria

We note $X^g_{I_1}$ the random variable whose samples are the values $I_1(x)$ and $X^g_{I_2, h}$ the random variable whose samples are the values $I_2(x + h(x))$. The joint probability density function of $X^g_{I_1}$ and $X^g_{I_2, h}$ (the upper index $g$ stands for global) is defined by the function $P_h : \mathbb{R}^2 \to [0, 1]$:
$$P_h(i) = \frac{1}{|\Omega|} \int_\Omega G_\beta(I_h(x) - i)\, dx. \qquad (4.2)$$
Notice that the usual property $\int_{\mathbb{R}^2} P_h(i)\, di = 1$ holds true. With the help of the estimate (4.2), we define the cross correlation between the two images $I_2 \circ (\mathrm{Id} + h)$ and $I_1$, noted $CC^g(h)$, the correlation ratio, noted $CR^g(h)$, and the mutual information, noted $MI^g(h)$. In order to do this we need to introduce more random variables besides $X^g_{I_1}$ and $X^g_{I_2, h}$. They are summarized in Table 4.1. We have introduced in this table the conditional law of $X^g_{I_2, h}$ with respect to $X^g_{I_1}$, noted $P_h(i_2 | i_1)$:
$$P_h(i_2 | i_1) = \frac{P_h(i)}{p(i_1)}, \qquad (4.3)$$

  Random variable                              | Value               | PDF
  $(X^g_{I_1}, X^g_{I_2, h})$                  | $i$                 | $P_h(i)$
  $X^g_{I_1}$                                  | $i_1$               | $p(i_1) = \int_{\mathbb{R}} P_h(i)\, di_2$
  $X^g_{I_2, h}$                               | $i_2$               | $p_h(i_2) = \int_{\mathbb{R}} P_h(i)\, di_1$
  $E[X^g_{I_2, h} \mid X^g_{I_1}]$             | $\mu_{2|1}(i_1, h)$ | $\int_{\mathbb{R}} i_2\, P_h(i_2 | i_1)\, di_2$
  $\mathrm{Var}[X^g_{I_2, h} \mid X^g_{I_1}]$  | $v_{2|1}(i_1, h)$   | $\int_{\mathbb{R}} i_2^2\, P_h(i_2 | i_1)\, di_2 - \mu_{2|1}(i_1, h)^2$

Table 4.1: Random variables: global case.

and the conditional expectation $E[X^g_{I_2, h} \mid X^g_{I_1}]$ of the intensity in the second image $I_2 \circ (\mathrm{Id} + h)$ conditionally to the intensity in the first image $I_1$. We note the value of this random variable $\mu_{2|1}(i_1, h)$, indicating that it depends on the intensity value $i_1$ and on the field $h$. Similarly, the conditional variance of the intensity in the second image conditionally to the intensity in the first image is noted $\mathrm{Var}[X^g_{I_2, h} \mid X^g_{I_1}]$ and its value is abbreviated $v_{2|1}(i_1, h)$. The mean and variance of the images will also be used. Note that these are not random variables and that, for the second image, they are functions of $h$:
$$\mu_2(h) \equiv \int_{\mathbb{R}} i_2\, p_h(i_2)\, di_2, \qquad (4.4)$$
$$v_2(h) \equiv \int_{\mathbb{R}} i_2^2\, p_h(i_2)\, di_2 - (\mu_2(h))^2. \qquad (4.5)$$
Their counterparts for the first image do not depend on $h$:
$$\mu_1 \equiv \int_{\mathbb{R}} i_1\, p(i_1)\, di_1, \qquad (4.6)$$
$$v_1 \equiv \int_{\mathbb{R}} i_1^2\, p(i_1)\, di_1 - (\mu_1)^2. \qquad (4.7)$$
The covariance of $X^g_{I_1}$ and $X^g_{I_2, h}$ will be noted
$$v_{1,2}(h) \equiv \int_{\mathbb{R}^2} i_1 i_2\, P_h(i)\, di - \mu_1\, \mu_2(h). \qquad (4.8)$$
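A minimal discrete sketch of the Parzen estimate (4.2), not part of the thesis: the joint histogram of the intensity pairs $I_h(x)$ is blurred with a Gaussian, which is the discrete counterpart of the convolution with $G_\beta$. The helper name joint_density, the binning and the use of scipy are assumptions of this sketch.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def joint_density(i1, i2_warped, bins=64, beta=2.0, amax=255.0):
        """Parzen estimate of P_h on a bins x bins grid over [0, amax]^2.
        i1 and i2_warped are the reference image and the warped second image
        sampled at the same pixels; beta is the variance of the Gaussian
        Parzen window, given in intensity units before rescaling to bins."""
        scale = (bins - 1) / amax
        hist, _, _ = np.histogram2d(i1.ravel() * scale, i2_warped.ravel() * scale,
                                    bins=bins, range=[[0, bins - 1]] * 2)
        # smoothing the discrete joint histogram with a Gaussian plays the
        # role of the convolution with G_beta
        p = gaussian_filter(hist, sigma=np.sqrt(beta) * scale)
        return p / p.sum()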

The three similarity measures may now be defined in terms of these quantities:
$$CC^g(h) = \frac{v_{1,2}(h)^2}{v_1\, v_2(h)}, \qquad (4.9)$$
$$CR^g(h) = 1 - \frac{1}{v_2(h)} \int_{\mathbb{R}} v_{2|1}(i_1, h)\, p(i_1)\, di_1, \qquad (4.10)$$
$$MI^g(h) = \int_{\mathbb{R}^2} P_h(i)\, \log \frac{P_h(i)}{p(i_1)\, p_h(i_2)}\, di. \qquad (4.11)$$
The three criteria are positive and should be maximized with respect to the field $h$. Therefore we propose the following definition.

Definition 4.1 The three global dissimilarity measures based on the cross correlation, the correlation ratio and the mutual information are as follows:
$$J_{CC^g}(h) = -CC^g(h), \qquad J_{CR^g}(h) = -CR^g(h) + 1, \qquad J_{MI^g}(h) = -MI^g(h).$$
Note that this definition shows that the mappings $h \to J_{CC^g}(h)$, $h \to J_{CR^g}(h)$ and $h \to J_{MI^g}(h)$ are not of the form $h \to \int_\Omega L(h(x))\, dx$ for some smooth function $L : \mathbb{R}^n \to \mathbb{R}$. Therefore the Euler-Lagrange equations will be slightly more complicated to compute than in this classical case.

4.2 Local Criteria

An interesting generalization of the ideas developed in the previous section is to make the estimator (4.2) local. This allows us to take into account non-stationarities in the distributions of the intensities. We weight our estimate (4.2) with a spatial Gaussian of variance $\gamma > 0$ centered at $x_0$. This means that for each point $x_0$ in $\Omega$ we have two random variables, noted $X^l_{I_1, x_0}$ and $X^l_{I_2, x_0, h}$ (the upper index $l$ stands for local), whose joint pdf is defined by:
$$P_h(i; x_0) = \frac{1}{G^\Omega_\gamma(x_0)} \int_\Omega G_\beta(I_h(x) - i)\, G_\gamma(x - x_0)\, dx, \qquad (4.12)$$
where
$$G_\gamma(x - x_0) = \frac{1}{(\sqrt{2\pi\gamma}\,)^{n}} \exp\!\left(-\frac{|x - x_0|^2}{2\gamma}\right)$$
and
$$G^\Omega_\gamma(x_0) = \int_\Omega G_\gamma(x - x_0)\, dx \le |\Omega|\, G_\gamma(0). \qquad (4.13)$$

Note that instead of using the original definition of CR, we use the total variance theorem to obtain
$$CR = 1 - \frac{E[\mathrm{Var}[X|Y]]}{\mathrm{Var}[X]}.$$
This transformation was suggested in [77], and turns out to be more convenient.
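Before turning to the local quantities, here is a sketch (not from the thesis) of how the three global measures (4.9)-(4.11) can be evaluated on a discretized joint density such as the one returned by the joint_density sketch above; intensities are identified with bin indices, the array is assumed to sum to one, and the function name is an assumption.

    import numpy as np

    def global_criteria(p, eps=1e-12):
        """Compute MI^g, CR^g and CC^g from a discretized joint density p,
        where p[a, b] ~ P_h(i1 = a, i2 = b) and p sums to 1."""
        i1 = np.arange(p.shape[0], dtype=float)
        i2 = np.arange(p.shape[1], dtype=float)
        p1 = p.sum(axis=1)                        # marginal of i1
        p2 = p.sum(axis=0)                        # marginal of i2
        # mutual information (4.11)
        mi = np.sum(p * (np.log(p + eps) - np.log(np.outer(p1, p2) + eps)))
        # means, variances and covariance (4.4)-(4.8)
        mu1, mu2 = i1 @ p1, i2 @ p2
        v1 = i1**2 @ p1 - mu1**2
        v2 = i2**2 @ p2 - mu2**2
        v12 = i1 @ p @ i2 - mu1 * mu2
        cc = v12**2 / (v1 * v2 + eps)             # cross correlation (4.9)
        # correlation ratio (4.10): 1 - E[Var[I2 | I1]] / Var[I2]
        mu2_given1 = (p @ i2) / (p1 + eps)
        var2_given1 = (p @ i2**2) / (p1 + eps) - mu2_given1**2
        cr = 1.0 - (p1 @ var2_given1) / (v2 + eps)
        return mi, cr, cc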

70 4.2 Local Criteria 69 I 2 i 2 A Joint local intensity distribution P (i ;i 2 ; x 0 ) I 0 A i Figure 4.: Local joint intensity distribution. The pdf dened by expression (4.2) is in the line of the ideas discussed by Koenderink and Van Doorn in [47], except that we now have a bidimensional histogram calculated around each point (see gure 4.). With the help of this estimate, we dene at every point x 0 of the local cross correlation between the two images I ff and Iff 2 f (Id +h), noted CC l (h; x 0 ), the local correlation ratio, noted CR l (h; x 0 ) and the local mutual information, noted MI l (h; x 0 ). In order to do this, just as in the global case, we need to introduce more random variables besides XI l ff; x and Xl 0 I2 ff; x 0; h. We summarize our notations and denitions in Table 4.2. As in the global case, we dene the mean and variance of XI l ff; x (note that they 0 are not random variables but they are functions of x 0 ): μ (x 0 ) i p h (i ; x 0 ) di ; (4.4) R v (x 0 ) i 2 p h (i ; x 0 ) di (μ (x 0 )) 2 ; (4.5) R the mean and variance of XI l 2 ff; x 0; h (note that these quantities depend additionally on the displacement eld h): μ 2 (h; x 0 ) i 2 p h (i 2 ; x 0 ) di 2 ; (4.6) R

71 70 Chapter 4: Denition of the Statistical Measures Random variable Value PDF (X l I ff ; x 0 ;Xl I ff 2 ; x 0; h ) (i) P h(i ;i 2 ; x 0 ) X l I ff ; x 0 i p(i ; x 0 )= X l I ff 2 ; x 0; h i 2 p h (i 2 ; x 0 )= P h (i ;i 2 ; x 0 ) di 2 R P h (i ;i 2 ; x 0 ) di R E[X l I ff 2 ; x 0; h jxl I ff ; x 0 ] μ 2j(i ; h; x 0 ) p(i ; x 0 ) i 2 P h (i 2 ; x 0 ji ) di 2 R Var[X l I ff 2 ; x 0; h jxl I ff ; x 0 ] v 2j(i ; h; x 0 ) p(i ; x 0 ) i 2 2 P h(i 2 ; x 0 ji ) di 2 R μ 2j (i ; h; x 0 ) 2 Table 4.2: Random variables: local case. v 2 (h; x 0 ) i 2 2 p h (i 2 ; x 0 ) di 2 (μ 2 (h; x 0 )) 2 ; (4.7) R as well as their covariance: v ;2 (h; x 0 ) i i 2 P h (i 2 ; x 0 ) di μ (x 0 ) μ 2 (h; x 0 ): (4.8) R The semi-local similarity measures (i.e. depending on x 0 ) can be written in terms of these quantities: CC l (h; x 0 ) = CR l (h; x 0 ) = v 2 (h; x 0 ) MI l (h; x 0 ) = v ;2 (h; x 0 ) 2 v (x 0 ) v 2 (h; x 0 ) ; (4.9) R 2 P h (i; x 0 )log v 2j (i ; h; x 0 ) p(i ; x 0 ) di ; (4.20) R P h (i; x 0 ) di: (4.2) p(i ; x 0 )p h (i 2 ; x 0 )

72 4.3 Continuity of MI g and MI l 7 We dene global similarity functionals by aggregating these local measures: CC l (h) = CR l (h) = MI l (h) = CC l (h; x 0 ) dx 0 ; CR l (h; x 0 ) dx 0 ; MI l (h; x 0 ) dx 0 : The three criteria are positive and should be maximized with respect to the eld h. In order to dene a minimization problem, we propose the following denition. Denition 4.2 The three local dissimilarity measures based on the cross correlation, the correlation ratio and the mutual information are as follows: J CC l(h) = CC l (h); J CR l(h) = CR l (h) +jj; J MI l(h) = MI l (h): Note that, just as in the global case, this denition shows that the mappings h! J CC l(h), h!j CR l(h) and h!j MI l(h) are not of the form h! R L(h(x)) dx, for some differentiable function L : R n! R. Therefore the Euler-Lagrange equations will be more complicated to compute than in this classical case. This will be the object of the next chapter. 4.3 Continuity of MI g and MI l Recall that the existence of minimizers for I(h) was discussed in the end of Chapter 3 by assuming continuity and boundedness of J (h). This is proved in Theorems 6.29 on page 04 and 6.6 on page 7 for the cross correlation in the global and local cases, respectively, and in Theorems 6.2 on page 00 and 6.53 on page 5 for the correlation ratio in the global and local cases, respectively. In the case of the mutual information, we have the following. Proposition 4. Let h n ;n = ; ; be a sequence of functions of H such that h n! h almost everywhere in. Then MI g (h n )! MI g (h). Proof : Because I ff 2 and g are continuous, G (i hn (x) i)! G (i h (x) i) a.e. in R 2. Since G (i hn (x) i)» g (0) 2, the dominated convergence theorem implies

73 72 Chapter 4: Denition of the Statistical Measures that P hn (i)! P h (i) for all i 2 R 2. A similar reasoning shows that p hn (i 2 )! p h (i 2 ) for all i 2 2 R. Hence, the logarithm being continuous, P hn (i) log P hn (i) p(i )p hn (i! 2 ) P P h (i) h(i) log p(i )p h (i 2 ) We next consider three cases to nd an upper bound for P hn (i) log i 2» 0 This is the case where Hence This yields and and therefore 8i 2 R 2 : P h n (i) p(i )p h n (i 2) 0»ji 2 j»ji 2 I ff 2 (x + h n(x))j»ji 2 Aj n : g (i 2 A)» g (i 2 I ff 2 (x + h n(x)))» g (i 2 ) n : g (i 2 A) g (i 2 ) log» P hn (i) p(i )p hn (i 2 ) P hn (i) log P hn (i) p(i )p hn (i 2 ) P hn (i) p(i )p hn (i 2 )» g (i 2 ) g (i 2 A)» g (i log 2 ) g (i 2 A) ;» g (i 2 )p(i ) log g (i 2 ) g (i 2 A) : The function on the right-hand side is continuous and integrable in R ] ; A]. 0» i 2»A We have Hence This yields and and therefore 0»ji 2 I ff 2 (x + h n(x))j»a n : g (A)» g (i 2 I ff 2 (x + h n(x)))» g (0) n : g (A) g (0)» log P hn (i) p(i )p hn (i 2 )» g (0) g (A) ; P hn (i) p(i )p hn (i 2 ) P hn (i) log P hn (i) p(i )p hn (i 2 )» log g (0) g (A) ;» g (0)p(i ) log g (0) g (A) : The function on the right-hand side is continuous and integrable in R [0; A]. :

74 4.3 Continuity of MI g and MI l 73 i 2 A This is the case where Hence This yields and and therefore 0» i 2 A»i 2 I ff 2 (x + h n(x))» i 2 n : g (i 2 )» g (i 2 I ff 2 (x + h n(x)))» g (i 2 A) n : g (i 2 ) g (i 2 A)» log P hn (i) p(i )p hn (i 2 ) P hn (i) log P hn (i) p(i )p hn (i 2 ) P hn (i) p(i )p hn (i» g (i 2 A) 2 ) g (i 2 )» log g (i 2 A) g (i 2 )» g (i 2 A)p(i )log g (i 2 A) ; ; g (i 2 ) The function on the right-hand side is continuous and integrable in R ]A; +]. The dominated convergence theorem implies that 2 MI g (h n )= R 2 P hn (i)log P hn (i) p(i )p hn (i 2 ) di! MI g (h) = R 2 P h (i)log P h (i) p(i )p h (i 2 ) di: : Concerning the local case, a similar result holds true. Proposition 4.2 Let h n ;n = ; ; be a sequence of functions of H such that h n! h almost everywhere in then MI l (h n )! MI l (h). Proof : The proof is similar to that of proposition 4. 2
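As a small complement (not in the thesis): the local cross correlation $CC^l(h; x_0)$ of (4.19) admits a particularly simple discrete sketch, because for this criterion the Parzen smoothing in the intensity variables only adds the constant $\beta$ to the variances (neglected below) and leaves means and covariances unchanged, so the local moments reduce to Gaussian spatial averages playing the role of the window $G_\gamma$. The function name and the use of scipy.ndimage are assumptions of the sketch.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def local_cc(i1, i2_warped, gamma=4.0, eps=1e-8):
        """Map of the local cross correlation CC^l(h; x) computed with
        Gaussian windows of variance gamma (spatial window G_gamma)."""
        s = np.sqrt(gamma)
        w = lambda f: gaussian_filter(f, sigma=s)   # local Gaussian average
        m1, m2 = w(i1), w(i2_warped)
        v1 = w(i1 * i1) - m1**2
        v2 = w(i2_warped * i2_warped) - m2**2
        v12 = w(i1 * i2_warped) - m1 * m2
        return v12**2 / (v1 * v2 + eps)

Integrating (summing) the returned map over the image gives the aggregated criterion $CC^l(h)$ defined at the beginning of Section 4.2.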


76 Chapter 5 The Euler-Lagrange Equations In this chapter, the computation of the rst variation of the statistical dissimilarity measures dened in the previous chapter is carried out by considering the variations of the joint density estimates. This provides us with a simple way to quantify the contribution of local innitesimal variations of h to these intrinsically non-local dissimilarity criteria. 5. Global Criteria We start with the global criteria, studying them in decreasing order of generality, i.e. starting with mutual information. 5.. Mutual Information We do an explicit computation of the rst variation of J MI g (h) (denition 4. on page 68). For this purpose, we recall that the rst variation is dened MI g (h + fflk) ffl=0 : From the denition of J MI g (h), we readily MI g (h + = = R 2 P h+fflk (i) + log P h+fflk (i) p(i ) p h+fflk (i 2 ) di P h+fflk (i) di di 2 p(i ) p h+fflk (i 2 R 2 P h+fflk h+fflk (i 2 ) di: p h+fflk (i 2

77 76 Chapter 5: The Euler-Lagrange Equations We notice that the second term on the right-hand side is zero, since we have h+fflk (i 2 p h+fflk (i 2 ) P h+fflk (i) di di 2 z } p h+fflk (i 2 ) Thus, we may write the rst variation of J MI g (h) as MI g (h + ffl=0 = Eh MI (i) R 2 E MI h (i) = + log The function P h+fflk (i) is given by equation (4.2): and h+fflk Thus, we nally MI g (h + P h+fflk (i) = P h (i) : p(i ) p h (i 2 ) p h+fflk (i 2 ) di 2 =0: R z } ffl=0 di; (5.) G (I h+fflk (x) i) dx; (5.2) jj 2 G (I h+fflk (x) i) ri ff 2 (x + h(x) +fflk(x)) k(x) dx: jj ffl=0 = R2 Eh MI 2G (I(x)h i) ri ff 2 (x + h(x)) k(x) dx di: A convolution appears with respect to the intensity variable i. This convolution commutes with the 2 with respect to the second intensity variable i 2, and MI g (h + = ffl=0 jj G?@ 2 Eh MI Ih (x) ri ff 2 (x + h(x)) k(x) dx: By identifying this expression with a scalar product in H = L 2 (), we dene the gradient of J MI g (h), denoted r H J MI g (h), with the property that: Thus, MI g (h + ffl=0 =(r H J MI g (h); k)l 2 (): r H J MI g (h)(x) = G jj?@ 2 Eh MI Ih (x) ri ff 2 (x + 2 E MI h (i) P h (i) P h (i) p0 h (i 2) : p h (i 2 )

78 5. Global Criteria 77 We dene the function R 2! R: L g MI;h (i) 2 E MI h (i): The gradient of J MI g (h) is a smoothed version of this function, evaluated at the intensity pair I h (x), times the vector pointing to the direction of local maximum increase of i 2, namely ri ff 2 (x +h(x)). It is therefore of interest to interpret the behavior of this function. Given a point x, the pair I h (x) lies somewhere in the square [0; A] within the domain of intensities, i.e. R 2 (see Figure 5.). The rst term in L g MI;h, 2 P h (i) P, tries to make the intensity i h (i) 2 move closer to a local maximum of P h. It thus tends to cluster P h. On the contrary, the second term, namely p0 h (i 2) p h (i 2, tries to prevent ) the marginal law p h (i 2 ) from becoming too clustered, i.e keep X I ff 2 ; h as unpredictable as possible. The fact that only the value of i 2 is changed implies that these movements take place only along one of the axes. This lack of symmetry is a general problem coming from the way in which the problem is posed. We refer to the works of Trouvé and Younes [83], Cachier and Rey [22], Christensen and He [24] and Alvarez et al. [] for some recent approaches to overcome this lack of symmetry. The red sketch in Figure 5. depicts a possible state of the function P h after minimization of J MI g (h). i i i Joint intensity distribution P (i ;i2) I(x) p(i) marginals I2(x) i2 p(i ji2(x)) conditionals p(i2) i2 p(i2 ji(x)) i2 Figure 5.: Sketch of a possible state of the joint pdf of intensities after minimization of J MI g (h) (see text).

79 78 Chapter 5: The Euler-Lagrange Equations 5..2 Correlation Ratio In this section, we compute the rst variation of J CR g (h) (denition 4. on page 68). To do this, we note so that where w(h) E[Var[X I ff 2 ;hjx I ff ]] = J CR g (h) = w(h) v 2 (h) = v 2 (h) Thus, we readily CR g (h + where 2 (h + = = = v 2 (h + fflk) R 2 i 2 2 R 2 i 2 2 μ 2j (i ; h) h+fflk R 2 i 2 2 P h(i) di + v 2j (i ; h) p(i ) di ; R i 2 P h (i) p(i ) di 2: μ 2j (i ; h) 2 p(i ) di ; R J CR g (h + 2(h + di 2μ 2j (i ; h h+fflk (i) + fflk) i 2 di; R di 2μ 2 (h + h+fflk (i) i 2 R di: (5.3) Similarly to the case of the mutual information (see equation (5.)), the rst variation of J CR g (h) can be put in the form CR g (h + ffl=0 = Eh CR (i) R ffl=0 Eh CR (i) = i 2 i 2 2μ v 2j (i ; h) J CR g (h)(i 2 2μ 2 (h)) : 2 (h) di; The discussion starting before equation (5.2) remains identical in this case. Thus, the gradient of J CR g (h) is given by: where r H J CR g (h)(x) = jj G?@ 2 E CR h Ih (x) ri ff 2 (x + 2 Eh CR (i) = 2 μ 2 (h) μ v 2j (i ; h) +CR g (h) (i 2 μ 2 (h)) : 2 (h)

80 5. Global Criteria 79 As for the mutual information, we dene the function R 2! R: L g CR;h (i) 2 E CR h (i); and interpret its behavior as an intensity comparison function. The function μ 2j (i ; h) gives the backbone ofp h. We see that trying to decrease the value of J CR g (h) amounts to making i 2 lie as close as possible to μ 2j (i ; h), while keeping this backbone as complex as possible (away from μ 2 (h)). The red sketch in Figure 5.2 depicts a possible state of the function P h after minimization of J CR g (h). i i i Joint intensity distribution P (i ;i2) I(x) p(i) marginals I2(x) i2 p(i ji2(x)) conditionals p(i2) i2 p(i2 ji(x)) i2 Figure 5.2: Sketch of a possible state of the joint pdf of intensities after minimization of J CR g (h) (see text) Cross Correlation In this section, we compute the rst variation of J CC g (h) (denition 4. on page 68). This case is extremely similar to the previous two. From the denition of J CC g (h), we readily CC g (h + fflk) v v 2 (h + fflk) 2 v ;2 (h + ;2(h + CC (h 2 (h + fflk) + fflk) v

81 80 Chapter 5: The Euler-Lagrange Equations ;2 (h + h+fflk (i) i i 2 R di h+fflk (i) i 2 R v 2(h + fflk) is given by equation (5.3). Thus, one more time, we may put the rst variation of J CC g (h) in the form CC g (h + Eh CC (i) = v v 2 (h) ffl=0 = Eh CC (i) R ffl=0 2 v ;2 (h) i 2 (i μ ) CC g (h) v i 2 (i 2 2 μ 2 (h)) : Again, the discussion starting before equation (5.2) remains valid, and therefore the gradient of J CC g (h) is given by: r H J CC g (h)(x) = jj G?@ 2 E CC h di; Ih (x) ri ff 2 (x + h(x)); where» 2 Eh CC (i) = 2 (h) i μ CC g (h) v 2 (h) v i2 μ 2 (h) v 2 (h) : Notice that in this simple case the convolution may be applied formally, yielding the same expression (since only linear terms are involved), so that in fact r H J CC g (h)(x) = 2 Eh CC Ih (x) ri ff 2 (x + h(x)): As for the previous two criteria, we dene the function R 2! R: L g CC;h (i) 2 E CR h (i); and interpret its behavior as an intensity comparison function. Decreasing the value of J CC g (h) amounts to making the pair of intensities lie on a non-vertical straight line in R 2 (not necessarily passing through the origin). Again, the lack of symmetry in the problem limits the change of intensities to a single direction. The red sketch in Figure 5.3 depicts a possible state of the function P h after minimization of J CC g (h). 5.2 Local Criteria We now analyse the case of the local criteria. As will be noticed, the reasoning is completely analog to that of the global case, but the functions obtained are signicantly more complex.

82 5.2 Local Criteria 8 i i i Joint intensity distribution P (i ;i2) I(x) p(i) marginals I2(x) i2 p(i ji2(x)) conditionals p(i2) i2 p(i2 ji(x)) i2 Figure 5.3: Sketch of a possible state of the joint pdf of intensities after minimization of J CC g (h) (see text) Mutual Information We compute the rst variation of J MI l(h) (denition 4.2 on page 68). From this denition, we MI l(h + P P h+fflk (i; x h+fflk (i; x 0 ) 0 )log p(i ; x 0 0 )p h+fflk (i 2 ) We notice that Q = = R 2 R2 = + log h+fflk (i 2 ; x 0 P h+fflk (i; x 0 ) p(i ; x 0 ) p h+fflk (i 2 ; x 0 ) p h+fflk (i 2 ; x 0 (i; x 0 ) di dx P h+fflk (i; x 0 h+fflk (i 2 ; x 0 ) didx R 2 p h+fflk (i 2 ; x 0 : 0 z } Q P h+fflk (i; x 0 ) di di 2 = R z } p h+fflk (i 2 ;x p h+fflk (i 2 ; x 0 ) di 2 =0: R z }

83 82 Chapter 5: The Euler-Lagrange Equations Thus, the rst variation of J MI l(h) may be written as MI l(h + ffl=0 = Eh MI (i; x h+fflk(i; x 0 ) R di dx 0 ; ffl=0 Eh MI (i; x P h (i; x 0 ) 0)= + log : p(i ; x 0 ) p h (i 2 ; x 0 ) The law P h+fflk (i; x 0 ) is given by equation (4.2): Therefore P h+fflk (i; x 0 )= G fl (x 0 ) G (I h+fflk (x) i) G fl (x x 0 ) dx: h+fflk (i; x 0 ) G G fl (x fl (x x 0 )@ 2 G (I h+fflk (x) i)ri ff 2 (x + h(x) +fflk(x)) k(x) dx: 0 ) Thus, we nally MI l(h + ffl=0 = R2 G fl (x 0 ) EMI h (i; x 0) G fl (x x 0 2 G (I h (x) i) ri ff 2 (x + h(x)) k(x) dx di dx 0: Two convolutions appear, one with respect to the space variable x and the other one with respect to the intensity variable i. This last convolution commutes with the partial 2 with respect to the second intensity variable i 2, and MI l(h + = ffl=0 Gfl? G? G 2 Eh MI This expression gives the gradient of J MI l(h): where r H J MI l(h)(x) = G fl? G? Ih (x); x ri ff 2 G 2 Eh 2 E MI h (i; x) P h (i; x) P h (i; x) We dene the function R 2 R n! R: (x + h(x)) k(x) dx: Ih (x); x ri ff 2 (x + h(x)); p0 h (i 2; x) : p h (i 2 ; x) L l MI;h (i; x) 2 Eh MI (i; x); fl (x) which plays exactly the same role as its global counterpart, L g MI;h (i).

84 5.2 Local Criteria Correlatio Ratio In this section, we compute the rst variation of J CR l(h) (denition 4.2 on page 68). If we write w(h; x 0 ) v 2j (i ; h; x 0 ) p(i ; x 0 ) di ; R so that J CR l(h) = v 2 (h; x 0 ) w(h; x 0 ) v 2 (h; x dx 0 = 0 ) R 2 i 2 2 P h (i; x 0 ) di μ 2j (i ; h; x 0 ) 2 p(i ; x 0 ) di R dx 0 ; we immediately see that we are in a situation completely analog to the global case, with the same modication as that of mutual information in the previous section, i.e. that we can write the rst variation of J CR l(h) as CR l(h + E CR h (i; x 0)= ffl=0 = Eh CR (i; x h+fflk(i; x 0 ) R di dx 0 ; ffl=0 i 2 i v 2 (h; x 2 2μ 2j (i ; h; x 0 ) w(h; x 0) 0 ) v 2 (h; x 0 ) (i 2 2μ 2 (h; x 0 )) : The discussion starting before equation (5.4) applies directly to this case. Thus, the gradient of J CR l(h) is given by: where r H J CR l(h)(x) = G fl? 2 E CR h (i; x) = 2 v 2 (h; x) We dene the function R 2 R n! R: G 2 Eh CR Ih (x); x ri ff 2 (x + h(x)); μ 2 (h; x) μ 2j (i ; h; x) +CR l (h; x) i 2 μ 2 (h; x) : L l CR;h (i; x) G fl 2 Eh CR (i; x): which plays exactly the same role as its global counterpart, L g CR;h (i) Cross Correlation Again, the situation is quite similar in the case of J CC l(h) (denition 4. on page 68). We can put its rst variation in the CC l(h + ffl=0 = Eh CC (i; x h+fflk(i; x 0 ) R di dx 0 ; ffl=0

85 84 Chapter 5: The Euler-Lagrange Equations where E CC h (i; x 0)= v (x 0 ) v 2 (h; x 0 ) 2 v ;2 (h; x 0 ) i 2 (i μ (x 0 )) CC l (h; x 0 ) v (x 0 ) i 2 (i 2 2 μ 2 (h; x 0 )) : The discussion starting at equation (5.4) remains valid, and therefore the gradient of J CC l(h) is given by: where r H J CC l(h)(x) = G fl? 2 E CC h (i; x) = 2» v;2 (h; x) v 2 (h; x) G 2 Eh CC i μ (x) v (x) Ih (x); x ri ff 2 (x + h(x)); CC l (h; x) i2 μ 2 (h; x) v 2 (h; x) : Like in the global case, the convolution in the intensity domain yields the same expression (since it is linear in i). Thus, r H J CC l(h)(x) = G fl? G 2 E CC h We dene the function R 2 R n! R: L l CC;h (i; x) G fl 2 Eh CC (i; x): (I h (x); x) ri ff 2 (x + h(x)): which plays the same role as its global counterpart, L g CC;h (i). 5.3 Summary We now summarize the results of this chapter by dening the functions that will be used to specify the function F in the generic matching flow (equation (2.4) on page 47), depending on the various dissimilarity criteria. Theorem 5. The innitesimal gradient of the global dissimilarity criteria is given by F g (h)(x) =r H J g (h)(x) =(G?L g h )(I h(x)) ri ff 2 (x + h(x)); (5.5) where the function L g h (i) is equal to L g MI;h (i) = jj in the case of the mutual information, P h (i) P h (i) p0 h (i 2) p h (i 2 ) L g CR;h (i) =μ 2(h) μ 2j (i ; h) +CR g (h) i 2 μ 2 (h) 2 jj v 2(h) (5.6) (5.7)

86 5.3 Summary 85 in the case of the correlation ratio and to v;2 (h) L g 2 CC;h (i) = jj i μ CC g (h) v i2 μ 2 (h) v 2 (h) v 2 (h) in the case of the cross correlation. This last case is especially simple since (5.8) so that no convolution is required. This denes three functions H! H: G?L g CC;h = Lg CC;h ; F g MI (h) = (G?LMI;h g )(Iff ;Iff(Id 2 + h)) riff 2 (Id + h); F g CR (h) = (G?LCR;h g )(Iff ;Iff(Id 2 + h)) riff 2 (Id + h); (5.9) F g CC (h) = Lg CC;h (Iff ;Iff(Id 2 + h)) riff 2 (Id + h): Proof : The only point that has not been proved is the fact that the functions F g MI (h), F g CR (h) and F g CC (h) belong to H. This is a consequence of theorems 6.3, 6.25 and 6.3, respectively. 2! R It is worth clarifying how equation (5.5) is interpreted. The function L g h : R2 is convolved with the 2D gaussian G and the result is evaluated at the intensity pair (I ff (x);iff 2 (x + h(x))). The value of the gradient at the point x is then obtained by multiplying this value by the gradient of the second image at the point x + h(x). For the global cross correlation, no convolution is required. The case of the local criteria is very similar. Theorem 5.2 The innitesimal gradient of the local criteria is given by F l (h)(x) =r H J l (h)(x) =(G fl? (G?L l h ))(I h(x); x) ri ff 2 (x + h(x)); (5.0) where the function L l h (i ;i 2 ; x) is equal P h (i; x) L l MI;h (i; x) = G fl (x) in the case of the mutual information, to P h (i; x) p0 h (i 2; x) ; (5.) p h (i 2 ; x) L l CR;h (i; x) =μ 2(h; x) μ 2j (i ; h; x) +CR l (h; x) i 2 μ 2 (h; x) G 2 fl(x) v 2 (h; x) in the case of the correlation ratio, and to (5.2) LCC;h l (i; x) = 2 G fl (x) v;2 (h; x) v 2 (h; x) i μ (x) v (x) CC l i2 μ 2 (h; x) (h; x) v 2 (h; x) (5.3)

87 86 Chapter 5: The Euler-Lagrange Equations in the case of the cross correlation. The case of the cross correlation is especially simple since G?L l CC;h = Ll CC;h ; so that only the spatial convolution is necessary. This denes three functions H! H: FMI l (h) = (G fl?g?l l MI;h )(Iff ;Iff(Id 2 + h); Id) riff 2 (Id + h); FCR l (h) = (G fl?g?l l CR;h )(Iff ;Iff(Id 2 + h); Id) riff 2 (Id + h); (5.4) FCC l (h) = (G fl?lcc;h l )(Iff ;Iff(Id 2 + h); Id) riff 2 (Id + h): Proof : The fact that FMI l (h), F CR l (h) and F CC l (h) belong to H is a consequence of theorems 6.4, 6.57 and 6.64, respectively. 2 It is worth clarifying how equation (5.0) is interpreted. The function L l h : R2 R n! R is convolved with the 2D gaussian G for the rst two variables (intensities) and the nd gaussian G fl for the remaining n variables (spatial), and the result is evaluated at the point ((I ff (x);iff 2 (x + h(x))); x) of R2 R n. The value of the gradient at the point x is then obtained by multiplying this value by the gradient of the second image at the point x + h(x). For the local cross correlation, only the spatial convolution is required.
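To make the summary theorems concrete, here is a rough sketch, not from the thesis, of the global mutual-information matching term as reconstructed from (5.5)-(5.6), namely $F^g_{MI}(h)(x) = (G_\beta \star L^g_{MI,h})(I_h(x))\, \nabla I_2(x + h(x))$ with $L^g_{MI,h}(i) = \tfrac{1}{|\Omega|}\big[\partial_2 P_h(i)/P_h(i) - p_h'(i_2)/p_h(i_2)\big]$. It reuses the hypothetical joint_density helper sketched in Chapter 4; the bin grid, the interpolation orders and all function names are assumptions, and normalization constants are only indicative.

    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    def mi_matching_term(i1, i2, h, bins=64, beta=2.0, amax=255.0):
        """Sketch of F^g_MI(h)(x), the statistical term driving the matching
        flow for the global mutual information criterion; i1, i2 are 2-D
        images and h is a displacement field of shape (2, H, W)."""
        eps = 1e-12
        scale = (bins - 1) / amax
        coords = np.indices(i1.shape, dtype=float) + h            # x + h(x)
        i2_w = map_coordinates(i2, coords, order=1, mode='nearest')
        p = joint_density(i1, i2_w, bins, beta, amax)             # P_h on the bin grid
        p2 = p.sum(axis=0)                                        # marginal p_h(i2)
        # L^g_{MI,h}(i) = (1/|Omega|) [ d_2 P_h / P_h - p_h' / p_h ]
        L = (np.gradient(p, axis=1) / (p + eps)
             - (np.gradient(p2) / (p2 + eps))[None, :]) / i1.size
        L = gaussian_filter(L, sigma=np.sqrt(beta) * scale)       # G_beta * L
        # evaluate at the intensity pair I_h(x), multiply by grad I_2(x + h(x))
        Lx = map_coordinates(L, [i1 * scale, i2_w * scale], order=1, mode='nearest')
        g0 = map_coordinates(np.gradient(i2, axis=0), coords, order=1, mode='nearest')
        g1 = map_coordinates(np.gradient(i2, axis=1), coords, order=1, mode='nearest')
        return Lx * np.stack([g0, g1])

In a complete registration loop this term would be combined with one of the regularization operators of Chapter 3 in an explicit or semi-implicit discretization of the flow (2.4).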

88 Chapter 6 Properties of the Matching Terms This chapter is devoted to showing that the gradients of the statistical criteria computed in the previous chapter satisfy the Lipschitz-continuity conditions established in Chapter 2, necessary to assert the well-posedness of the evolution equations. 6. Preliminary Results We begin by some elementary results on Lipschitz-continuous functions that will be used very often in the sequel. Proposition 6. Let H be a Banach space and let us denote its norm by k:k H. Let f i ;i=; 2:H! R be two Lipschitz continuous functions. We have the following:. f + f 2 is Lipschitz continuous. 2. If f and f 2 are bounded then the product f f 2 is Lipschitz continuous. 3. If f 2 > 0 and if f and f 2 are bounded, then the ratio f f 2 is Lipschitz continuous. Proof : We prove only 2 and 3. Let h and h 0 be two vectors of H: jf (h)f 2 (h) f (h 0 )f 2 (h 0 )j = j(f (h) f (h 0 ))f 2 (h) +f (h 0 )(f 2 (h) f 2 (h 0 )j» from which point 2 above follows. Similarly f (h) f f (h 0 ) 2 (h) f 2 (h 0 ) jf 2 (h)jjf (h) f (h 0 )j + jf (h 0 )jjf 2 (h) f 2 (h 0 )j; = jf (h)f 2 (h 0 ) f 2 (h)f (h 0 )j» f 2 (h)f 2 (h 0 ) jf (h) f (h 0 )jf 2 (h 0 )+jf (h 0 )jjf 2 (h) f 2 (h 0 )j f 2 (h)f 2 (h 0 : )

89 88 Chapter 6: Properties of the Matching Terms If f 2 > 0, there exists a>0 such that f 2 >a. Hence f (h) f f (h 0 ) 2 (h) f 2 (h 0 )» from which point 3 above follows. 2 a 2 jf (h) f (h 0 )jf 2 (h 0 ) + jf (h 0 )jjf 2 (h) f 2 (h 0 )j; In the following, we will need the following denitions and notations. Denition 6. We note H =[0; A] H and H 2 =[0; A] 2 H the Banach spaces equipped with the norms k(z;h)k H = jzj+khk H and k(z ;z 2 ; h)k H2 = jz j+jz 2 j+ khk H, respectively. We will use several times the following result. Lemma 6.2 Let f : H 2! R be such that (z ;z 2 )! f (z ;z 2 ; h) is Lipschitz continuous with a Lipschitz constant l f independent of h and such that h! f (z ;z 2 ; h) is Lipschitz continuous with a Lipschitz constant L f independent of (z ;z 2 ). Then f is Lipschitz continuous. Proof : We have 2 jf (z ;z 2 ; h) f (z 0 ;z 0 2; h 0 )j» jf (z ;z 2 ; h) f (z 0 ;z 0 2; h)j + jf (z 0 ;z 0 2; h) f (z 0 ;z 0 2; h 0 )j» l f (jz z 0 j + jz 2 z 0 2j) +L f kh h 0 k H» max(l f ;L f )(jz z 0 j + jz 2 z 0 2j + kh h 0 k H ): In Section 6.3, we will need a slightly more general version of this lemma. Lemma 6.3 Let f :[0; A] 2 H! R be such that (z ;z 2 )! f (z ;z 2 ; h; x) is Lipschitz continuous with a Lipschitz constant l f independent of x and h and such that h! f (z ;z 2 ; h; x) is Lipschitz continuous with a Lipschitz constant L f independent of (z ;z 2 ; x). Then f is Lipschitz continuous on [0; A] 2 H uniformly on. Proof : Indeed, jf (z ;z 2 ; h; x) f (z 0 ;z0 2 ; h; x0 )j» jf (z ;z 2 ; h; x) f (z 0 ;z 0 2; h; x)j + jf (z 0 ;z 0 2; h; x) f (z 0 ;z 0 2; h 0 )j» l f (jz z 0 j + jz 2 z 0 2j) +L f kh h 0 k H» max(l f ;L f )(jz z 0 j + jz 2 z 0 2j + kh h 0 k H ) 8x 2 ; and the Lipschitz constant max(l f ;L f ) is independent of x 2. 2

90 6.2 Global Criteria Global Criteria We show in this section the Lipschitz continuity of the gradients of the global criteria, as dened in (5.9) Mutual Information We rst prove that in the mutual information case, there is a neat separation in the denition of the function F between its local and global dependency in the eld h. More precisely we have the following. Proposition 6.4 The function q h : R! R dened by satises the following equation: where the function 0» a(i 2 ; h)» A. Proof : p h is dened by p h (i 2 )= q h (i 2 )= p0 h (i 2) p h (i 2 ) q h (i 2 )=a(i 2 ; h) i 2 ; g (I ff 2 (x + h(x)) i 2) dx; hence p 0 h (i 2)= (I ff (x 2 + h(x)) i 2)g (I ff (x 2 + h(x)) i 2) dx: The function a(i 2 ; h) is equal to a(i 2 ; h) = I ff (x 2 + h(x))g (I ff (x 2 + h(x)) i 2) dx ; (6.) g (I ff (x 2 + h(x)) i 2) dx and the result follows from the fact that I ff 2 (x + h(x)) 2 [0; A]. 2 A simple variation of the previous proof shows the truth of the following. Proposition 6.5 The function Q h : R 2! R dened by Q h (i) 2P h (i) P h (i)

satisfies the following equation:

Q_h(i) = (A(i; h) − i_2)/β,

where the function

A(i; h) = ∫_Ω I_2^σ(x + h(x)) G_β(I_h(x) − i) dx / ∫_Ω G_β(I_h(x) − i) dx   (6.2)

satisfies 0 ≤ A(i; h) ≤ A.

In the following, we will use the function L^g_{MI,h} : R² → R defined as (see theorem 5.1)

L^g_{MI,h}(i) = (1/|Ω|) (Q_h(i) − q_h(i_2)) = (1/(β|Ω|)) (A(i; h) − a(i_2; h)).   (6.3)

We then consider the result f^g_MI of convolving L^g_{MI,h} with G_β, i.e. the two functions b : R × H → R defined as

b(z_2; h) = (g_β ⋆ a)(z_2; h) = ∫_R g_β(z_2 − i_2) a(i_2; h) di_2,   (6.4)

and B : R² × H → R defined as

B(z_1, z_2; h) = (G_β ⋆ A)(z; h) = ∫_{R²} G_β(z − i) A(i; h) di.   (6.5)

We prove a series of propositions.

Proposition 6.6 The function R → R⁺ defined by z_2 → b(z_2; h) is Lipschitz continuous with a Lipschitz constant l_b^g which is independent of h. Moreover, it is bounded by A.

Proof: The second part of the proposition follows from the fact that 0 ≤ a(i_2; h) ≤ A for all i_2 ∈ R and all h ∈ H (proposition 6.4). In order to prove the first part, we prove that the magnitude of the derivative of the function is bounded independently of h. Indeed,

|b'(z_2; h)| = (1/β) | ∫_{−∞}^{+∞} (z_2 − i_2) g_β(z_2 − i_2) a(i_2; h) di_2 | ≤ (A/β) ∫_{−∞}^{+∞} |z_2 − i_2| g_β(z_2 − i_2) di_2.

The function on the right-hand side of the inequality is independent of h and continuous on [0, A], therefore upper-bounded. □

92 6.2 Global Criteria 9 Proposition 6.7 The function h! b(z 2 ; h) :L 2 ()! R is Lipschitz continuous on L 2 () with Lipschitz constant L g b, which is independent of z 2 2 [0; A]. Proof : We consider b(z 2 ; h ) b(z 2 ; h 2 )= g (z 2 i 2 )(a(i 2 ; h ) a(i 2 ; h 2 )) di 2 (6.6) R According to equation (6.), a(i 2 ; h) is the ratio N (i 2 ; h)=d(i 2 ; h) of the two functions N (i 2 ; h) = I ff (x 2 + h(x))g (I ff (x 2 + h(x)) i 2) dx; and D(i 2 ; h) = g (I ff 2 (x + h(x)) i 2) dx: We ignore the factor = which is irrelevant in the proof. We write jb(z 2 ; h ) b(z 2 ; h 2 )j» R g (z 2 i 2 ) jn (i 2; h 2 )jjd(i 2 ; h 2 ) D(i 2 ; h )j D(i 2 ; h )D(i 2 ; h 2 ) R and consider the rst term of the right-hand side. D(i 2 ; h 2 ) D(i 2 ; h )= di 2 + g (z 2 i 2 ) D(i 2; h 2 ) jn (i 2 ; h 2 ) N (i 2 ; h )j D(i 2 ; h )D(i 2 ; h 2 ) di 2 ; (6.7) (g (i 2 I ff 2 (x + h 2(x)) ) g (i 2 I ff 2 (x + h (x)) )) dx We use the rst order Taylor expansion with integral remainder of the C function g. This says that g (i + t) =g (i) +t as the reader will easily verify. We can therefore write 0 g 0 (i + tff) dff; g (i 2 I ff 2 (x + h 2(x)) ) g (i 2 I ff 2 (x + h (x)) ) = ( I ff 2 (x + h (x)) I ff 2 (x + h 2(x)) ) g 0 0 i 2 ffi ff 2 (x + h 2(x)) + ( ff) I ff 2 (x + h (x)) dff We use the fact that I ff 2 is Lipschitz continuous and write jd(i 2 ; h 2 ) D(i 2 ; h )j» Lip(I ff 2 ) jh (x) h 2 (x)j 0 g 0 (i 2 (ffi ff 2 (x + h 2(x)) +( ff) I ff 2 (x + h (x)) )) dff dx

93 92 Chapter 6: Properties of the Matching Terms Schwarz inequality implies jd(i 2 ; h 2 ) D(i 2 ; h )j»lip(i ff 2 )kh h 2 k H ψ 0 g 0 (i 2 (ffi ff 2 (x + h 2(x)) +( ff) I ff 2 (x + h (x)) )) dff 2 dx! 2 We introduce the function r(i 2 ; h ; h 2 )= 0 ji 2 ffi ff 2 (x + h 2(x)) +( ff) I ff 2 (x + h (x)) j g i 2 ffi ff (x 2 + h 2(x)) +( ff) I ff (x 2 (x)) + h 2 2 dff dx We notice that ψ 0 g 0 (i 2 (ffi ff 2 (x + h 2(x)) +( ff) I ff 2 (x + h (x)) )) dff 2 dx! 2 Sofarwehave R g (z 2 i 2 ) jn (i 2; h 2 )jjd(i 2 ; h 2 ) D(i 2 ; h )j D(i 2 ; h )D(i 2 ; h 2 ) di 2»» r(i 2; h ; h 2 ): Lip(I ff 2 ) kh h 2 k H g (z 2 i 2 ) jn (i 2; h 2 )j r H (i 2 ; h ; h 2 ) di R D(i 2 ; h )D(i 2 ; h 2 ) 2 (6.8) We study the function of z 2 that is on the right-hand side of this inequality. First we note that the function is well dened since no problems occur when i 2 goes to innity because there are three gaussians in the numerator and two in the denominator. We then show that this function is bounded independently of h and h 2 for all z 2 2 [0; A]. We consider three cases: i 2» 0 This is the case where Hence 0»ji 2 j» ji 2 I ff 2 (x + h j(x))j»ji 2 Aj j =; 2 0»ji 2 j» ji 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x)))j»ji 2 Aj 0» ff» g (i 2 A)» g (i 2 I ff 2 (x + h j(x)))» g (i 2 ) j =; 2 g (i 2 A)» g (i 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x))))» g (i 2 ) 0» ff»

94 6.2 Global Criteria 93 This yields 0 g (z 2 i 2 ) jn (i 2; h 2 )j r(i 2 ; h ; h 2 ) D(i 2 ; h )D(i 2 ; h 2 ) jj =2 A 0 di 2» g (z 2 i 2 )ji 2 Aj g (i 2 ) g (i 2 A) 2 di 2 ; The integral on the right-hand side is well-dened and denes a continuous function of z 2. 0» i 2»A 0» ji 2 I ff 2 (x + h j(x))j»a j =; 2 0»ji 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x)))j»a 0» ff» Hence This yields A 0 g (A)» g (i 2 I ff 2 (x + h j(x)))» g (0) j =; 2 g (A)» g (i 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x))))» g (0) 0» ff» g (z 2 i 2 ) jn (i 2; h 2 )j r(i 2 ; h ; h 2 ) D(i 2 ; h )D(i 2 ; h 2 ) di 2» jj =2 Ag (0) g (A) 2 A 0 g (z 2 i 2 ) di 2 ; The integral on the right-hand side is convergent and denes a continuous function of z 2. i 2 A This is the case where Hence 0»i 2 A» i 2 I ff 2 (x + h j(x))» i 2 j =; 2 0»i 2 A» i 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x)))» i 2 0» ff» g (i 2 )» g (i 2 I ff 2 (x + h j(x)))» g (i 2 A) j =; 2 g (i 2 )» g (i 2 (ffi ff 2 (x + h 2(x)) + ( ff)i ff 2 (x + h (x))))» g (i 2 A) 0» ff»

95 94 Chapter 6: Properties of the Matching Terms This yields + A g (z 2 i 2 ) jn (i 2; h 2 )j r(i 2 ; h ; h 2 ) D(i 2 ; h )D(i 2 ; h 2 ) jj =2 A + A di 2» g (z 2 i 2 )i 2 g (i 2 A) g (i 2 ) The integral on the right-hand side is convergent and denes a continuous function of z 2. In all three cases, the functions of z 2 appearing on the right-hand side are continuous, independent of h and h 2, therefore upperbounded on [0; A] by a constant independent of h and h 2. Returning to inequality (6.8), we have proved that there existed a positive constant C independent of z 2 such that R g (z 2 i 2 ) jn (i 2; h 2 )jjd(i 2 ; h 2 ) D(i 2 ; h )j D(i 2 ; h )D(i 2 ; h 2 ) di 2» 2 di 2 ; Ckh h 2 k H 8z 2 2 [0; A] 8h ; h 2 2 H A similar proof can be developed for the second term in the right-hand side of the inequality (6.7). In conclusion we have proved that there existed a constant L g b, independent of z 2 such that 2 jb(z 2 ; h ) b(z 2 ; h 2 )j»l g b kh h 2 k H 8z 2 2 [0; A] 8h ; h 2 2 H Thus, we can state the following. Proposition 6.8 The function b : H! R is Lipschitz continuous. Proof : The proof follows from propositions 6.6, 6.7 and lemma We now proceed with showing the same kind of properties for the function B. Proposition 6.9 The function [0; A] 2! R + dened by (z ;z 2 )! B(z ;z 2 ; h) is Lipschitz continuous with a Lipschitz constant l g B which is independent of h. Moreover, it is bounded by A. Proof : The second part of the proposition follows from the fact that, 8i ;i 2 2 R 2 and 8h 2 H, we have (proposition 6.5): 0» A(i ;i 2 ; h)» A : The rst part follows the same pattern as the proof of proposition

96 6.2 Global Criteria 95 Proposition 6.0 The function h! B(z ;z 2 ; h), H! R is Lipschitz continuous with a Lipschitz constant L g B which is independent of (z ;z 2 ) 2 [0; A] 2. Proof : The proof follows the same pattern as the one of proposition Therefore we can state the following result. Proposition 6. The function B : H 2! R is Lipschitz continuous. Proof : The proof follows of propositions 6.9, 6.0 and lemma From propositions 6.8, 6. and 6. we obtain the following. Corollary 6.2 The function f g MI : H 2! R dened by (z ;z 2 ; h)! jj (B(z ;z 2 ; h) b(z 2 ; h)) is Lipschitz continuous and bounded by 2A=jj. We note Lip(f g MI ) the corresponding Lipschitz constant. We can now state the main result of this section: Theorem 6.3 The function F g MI : H! H dened by F g MI (h) =f g MI (Iff ;I ff (Id 2 + h); h)riff 2 (Id + h) = jj (B(Iff ;Iff(Id 2 + h); h) b(iff(id 2 + h); h))riff 2 (Id + h); is Lipschitz continuous and bounded. Proof : Boundedness comes from the fact that b and B are bounded (propositions 6.6 and 6.9, respectively) and that jri ff 2 j is bounded. This implies that F g MI (h) 2 H = L 2 () 8h 2 H. We consider the ith component F gi MI of F g MI : with F gi MI (h )(x) F gi MI (h 2)(x) = jj (S T S 2 T 2 ); S j = B(I ff (x);iff 2 (x + h j(x)); h j ) b(i ff 2 (x + h j(x)); h j ) T j i I ff 2 (x + h j(x)); and j =; 2. We continue with jf gi MI (h )(x) F gi MI (h 2)(x)j» jj (js S 2 jjt j + js 2 jjt T 2 j) i I ff 2 is bounded, jt jj» k@ i I ff 2 k. Because of propositions 6.6 and 6.9, js 2 j» 2 A. ii ff 2 is Lipschitz continuous jt T 2 j» Lip(@ i I ff 2 )jh (x) h 2 (x)j.

97 96 Chapter 6: Properties of the Matching Terms Finally, because of corollary 6.2 and the fact that I ff 2 is Lipschitz continuous, js S 2 j»lip(f g MI )(Lip(Iff 2 )jh (x) h 2 (x)j + kh h 2 k H ) : Collecting all terms we obtain jf gi MI (h )(x) F gi MI (h 2)(x)j»C i (jh (x) h 2 (x)j + kh h 2 k H ); for some positive constant C i ;i= ; ;n. The last inequality yields, through the application of Cauchy-Schwarz: kf g MI (h ) F g MI (h 2)k H» L g F kh h 2 k H for some positive constant L g F and this completes the proof. 2 The following proposition will be needed later. Proposition 6.4 The function! R n such that x! F g MI (h(x)) satises jf g MI (h(x)) F g MI (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0. Proof : We write F g MI (h(x)) F g MI (h(y)) = Hence f g MI (h(x))riff(x 2 + h(x)) f g MI (h(x))riff 2 (y + h(y))+ jf g MI (h(x)) F g MI (h(y))j» f g MI (h(x))riff(y 2 + h(y)) f g MI (h(y))riff 2 (y + h(y)): jf g MI (h(x))jjriff(x 2 + h(x)) riff 2 (y + h(y))j+ Corollary 6.2 and the fact that the functions I ff, Iff 2 Lipschitz continuous imply jri ff 2 (y + h(y))jjf g MI (h(x)) f g MI (h(y))j: and its rst order derivative, are jf g MI (h(x)) F g (h(y))j» MI 2A jj Lip(rIff)(jx yj 2 + jh(x) h(y)j) +kjriff 2 jk Lip(f g MI )jh(x) h(y)j; hence the result. 2
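In the notation reconstructed above (with g_β and G_β the one- and two-dimensional Gaussian Parzen kernels of variance β, a convention assumed here, which fixes the overall constant), the result of theorem 6.13 can be summarized compactly as

\[
F^g_{\mathrm{MI}}(h)(x) \;=\; \frac{1}{\beta\,|\Omega|}\Big(B\big(I_1^\sigma(x),\,I_2^\sigma(x+h(x));\,h\big)-b\big(I_2^\sigma(x+h(x));\,h\big)\Big)\,\nabla I_2^\sigma\big(x+h(x)\big),
\]

with b = g_β ⋆ a and B = G_β ⋆ A, where a and A are the bounded conditional-average ratios of propositions 6.4 and 6.5. The boundedness and Lipschitz continuity of b, B and ∇I_2^σ are what make F^g_MI Lipschitz continuous and bounded on H.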

6.2.2 Correlation Ratio

We produce a simple expression of CR^g(h) in terms of the two images I_1^σ and I_2^σ and use it to prove that the correlation ratio is Lipschitz continuous as a function of h. In the sequel, we drop the upper index g. We begin with some estimates.

Lemma 6.15

0 ≤ μ_2(h) = E[X_{I_2^σ,h}] = (1/|Ω|) ∫_Ω I_2^σ(x + h(x)) dx ≤ A,   (6.9)
β ≤ v_2(h) = Var[X_{I_2^σ,h}] = β + θ(h) ≤ β + A²,   (6.10)

where

θ(h) = (1/|Ω|) ∫_Ω I_2^σ(x + h(x))² dx − ( (1/|Ω|) ∫_Ω I_2^σ(x + h(x)) dx )².   (6.11)

Proof: Because of equation (4.4) we have

μ_2(h) = (1/|Ω|) ∫_R i_2 ∫_Ω g_β(i_2 − I_2^σ(x + h(x))) dx di_2 = (1/|Ω|) ∫_Ω I_2^σ(x + h(x)) dx.

This yields the first part of the lemma. For the second part, we use equation (4.5):

v_2(h) = (1/|Ω|) ∫_R i_2² ∫_Ω g_β(i_2 − I_2^σ(x + h(x))) dx di_2 − μ_2(h)²,

and hence

Var[X_{I_2^σ,h}] = β + (1/|Ω|) ∫_Ω I_2^σ(x + h(x))² dx − ( (1/|Ω|) ∫_Ω I_2^σ(x + h(x)) dx )²,

from which the upper and lower bounds of the lemma follow. □

We next take care of E[Var[X_{I_2^σ,h} | X_{I_1^σ}]] with the following lemma.

Lemma 6.16

E[Var[X_{I_2^σ,h} | X_{I_1^σ}]] = β + (1/|Ω|) ∫_Ω I_2^σ(x + h(x))² dx − M(h),

where

M(h) = ∫_{Ω×Ω} f(x, x') I_2^σ(x + h(x)) I_2^σ(x' + h(x')) dx dx',

and

f(x, x') = (1/|Ω|²) ∫_R g_β(i_1 − I_1^σ(x)) g_β(i_1 − I_1^σ(x')) / p(i_1) di_1

is such that ∫_Ω f(x, x') dx = ∫_Ω f(x, x') dx' = 1/|Ω|.

99 98 Chapter 6: Properties of the Matching Terms Proof : According to Table 4. we have v 2j (i ; h) =S(i ) T 2 (i ) and with and E[Var[X I ff 2 ;hjx I ff ]] = T (i )= It is straightforward to show that R S(i )= R R S(i )p(i ) di = + jj It is also straightforward to show that and hence that R T (i )= jjp(i ) T 2 (i )p(i ) di = jj 2 R We next write p(i ) v 2j (i ; h)p(i ) di ; R i 2 P h (i) 2 p(i di 2; ) i 2 P h (i) p(i ) di 2 = μ 2j (i ; h): I ff 2 (x + h(x))2 dx: g (i I ff (x))iff 2 (x + h(x)) dx; g (i I ff (x))iff 2 (x + h(x)) dx 2 = g (i I ff (x))iff 2 (x + h(x)) dx 2 di : g (i I ff (x))iff 2 (x + h(x))g (i I ff (x0 ))I ff 2 (x0 + h(x 0 )) dx dx 0 ; commute the integration with respect to i with that with respect to x and x 0 to obtain the result. 2 We pursue with another lemma. Lemma 6.7 The function M : H! R dened in lemma 6.6 is bounded and Lipschitz continuous. Proof : lemma 6.6. For the second part we compute For the rst part, jm (h)j» A 2 R f (x; x0 ) dx dx 0 = A 2, according to M [h ] M [h 2 ]= f (x; x 0 ) I ff (x 2 + h (x))i ff 2 (x0 + h (x 0 )) I ff 2 (x + h 2(x))I ff 2 (x0 + h 2 (x 0 )) dx dx 0 :

100 6.2 Global Criteria 99 jm [h ] M [h 2 ]j» f (x; x 0 )(ji ff 2 (x + h (x)) I ff 2 (x + h 2(x))jI ff 2 (x0 + h (x 0 ))+ ji ff 2 (x0 + h (x 0 )) I ff 2 (x0 + h 2 (x 0 ))ji ff 2 (x + h 2(x))) dx dx 0 : Because I ff 2 is Lipschitz continuous and bounded jm [h ] M [h 2 ]j» ki ff 2 k Lip(I ff 2 ) ki ff 2 k Lip(I ff 2 ) jj f (x; x 0 ) jh (x) h 2 (x)j + jh (x 0 ) h 2 (x )j 0 dx dx 0 = jh (x) h 2 (x)j dx + 2kI ff 2 k Lip(I ff 2 ) jj jh (x 0 ) h 2 (x 0 )j dx 0 = jh (x) h 2 (x)j dx: The previous to the last equality is obtained from lemma 6.6. Therefore we have from Cauchy-Schwarz: 2 jm [h ] M [h 2 ]j» 2kIff 2 k Lip(I ff 2 ) jj =2 kh h 2 k H : Lemma 6.8 The functions H! R + dened by h! jj are bounded and Lipschitz continuous. I ff 2 (x + h(x)) dx and h! jj I ff 2 (x + h(x))2 dx Proof : Boundedness has been proved in lemma 6.5 for the rst function. For the second we have jj Next, for Lipschitz continuity: jj because I ff 2 jj I ff 2 (x + h (x)) dx jj I ff 2 (x + h(x))2 dx»a 2 : I ff (x 2 + h 2(x)) dx» Lip(I ff 2 ) jj is Lipschitz, and hence (Cauchy-Schwarz): I ff 2 (x + h (x)) dx jj jh (x) h 2 (x)j dx; I ff (x 2 + h 2(x)) dx» Lip(Iff 2 ) jj kh =2 h 2 k H :

101 00 Chapter 6: Properties of the Matching Terms Similarly (Cauchy-Schwarz), 2 jj I ff 2 (x + h (x)) 2 dx jj I ff (x 2 + h 2(x)) 2 dx» 2 ALip(Iff 2 ) jj =2 kh h 2 k H : From this lemma, proposition 6. and lemma 6.5 we deduce the following. Corollary 6.9 The function H! R dened by h! Var[X I ff 2 ;h] is Lipschitz continuous. We also prove the following. Lemma 6.20 The function H! R dened by h! E[Var[X I ff 2 ;hjx g I ff bounded and Lipschitz continuous. Proof : This follows from lemmas 6.6, 6.7, and ]] is We can now prove the important intermediary result that the correlation ratio, as a function of the eld h, is Lipschitz continuous. Theorem 6.2 The function H! R dened by h! E[Var[X I 2 ff;h jx g I ff ]] Var[X is Lipschitz I ff 2 ;h ] continuous. Proof : This follows from proposition 6. and from lemmas 6.20, 6.5 and corollary We pursue with another lemma. Lemma 6.22 The function f g CR = G?L g CR : H 2 equation (5.7), is equal to the following expression:! R, where L g CR is given by f g CR (z ;z 2 ; h) = 2 jj v 2 (h) (μ 2(h) d(z ; h) +CR(h)(z 2 μ 2 (h))) ; where R d(z ; h) = g (z i ) R g (i I ff (x))iff 2 (x + h(x)) dx R g (i I ff (x)) dx di :

102 6.2 Global Criteria 0 Proof : We use equation (5.7) and apply the convolution to it. The value of d is obtained from: 2 d(z ; h) = g (z i )μ 2j (i ; h) di = R g (z i ) i 2 P h (i) di R p(i ) 2 di = R g (z i ) R p(i ) I ff (x 2 + h(x))g (i I ff (x)) dx di : We next prove the following. Lemma 6.23 The function d : H! R is bounded and Lipschitz continuous. Proof : The proof of the rst part uses exactly the same ideas as those of the second part of proposition 6.7. For the second part, we rst prove that the function H! R, h! d(z ; h) is Lipschitz continuous with a Lipschitz constant L d that is independent of z 2 [0; A] and second prove that the function [0; A]! upperbounded independently of h 2 H. Indeed, jd(z ; h ) d(z ; h 2 )j» g (z i ) R Because I ff 2 R g (i I ff (x))jiff(x 2 + h (x)) I ff (x 2 + h 2(x))j dx R g (i I ff (x)) dx is Lipschitz continuous and of Schwarz inequality, we have jd(z ; h ) d(z ; h 2 )j» jjlip(i ff 2 )kh h 2 k H g (z i ) R R g (i I ff dx (x))2 2 R g (i I ff (x)) dx di : A di : The function of z that appears on the right-hand side of this inequality does not depend on h, is continuous and therefore bounded on [0; A]. We now = and, since R (i z )g (z i ) R R g (i I ff (x))iff 2 (x + h(x)) dx g (i I ff (x))iff 2 (x + h(x)) dx R g (i I ff (x)) dx R g (i I ff (x)) R jz i jg (z i ) di : di ; is

103 02 Chapter 6: Properties of the Matching Terms The right-hand side of this inequality is equal to 2 A conclusion follows. 2 R + 0 zg (z) dz, from where the This allows us to prove the following result. Theorem 6.24 The function f g CR : H 2! R is Lipschitz continuous and bounded. Proof : The denominator jjv 2 (h) is > 0 and bounded (lemma 6.5), and Lipschitz continuous on H and K (corollary 6.9). The numerator is bounded because CR(h) is bounded by, d is bounded (lemma 6.23) and μ 2 (h) is bounded (lemma 6.5). The product μ 2 (h)cr(h) is Lipschitz continuous on H as the product of two bounded Lipschitz continuous functions (proposition 6. and theorem 6.2). Hence we have proved the boundedness of f g CR. The function d : H! R is Lipschitz continuous (lemma 6.23). The function r : H! R dened by (z 2 ; h)! z 2 CR(h) is Lipschitz continuous because of theorem 6.2 and jz 2 CR(h) z 0 2CR(h 0 )j = jz 2 (CR(h) CR(h 0 )) + CR(h 0 )(z 2 z 0 2 )j» AjCR(h) CR(h 0 )j + jz 2 z 0 2j: Hence the numerator is also Lipschitz continuous and, from proposition 6., so is f g CR. 2 We nally obtain the main result of this section. Theorem 6.25 The function F g CR : H! H dened by is Lipschitz continuous and bounded. F g CR (h) =f g CR (Iff ;I ff (Id 2 + h))riff 2 (Id + h) Proof : bounded. It implies that F g CR the same pattern as the proof of theorem 6.3 and uses theorem The boundedness follows from theorem 6.24 and the fact that jri ff 2 j is (h) 2 H 8h 2 H. The rest of the proof follows exactly We nish this section with the following proposition. Proposition 6.26 The function! R n such that x! F g CR (h(x)) satises jf g CR (h(x)) F g CR (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0.

Proof: The proof is similar to that of proposition 6.14 and follows from theorem 6.24 and the fact that the functions I_1^σ, I_2^σ and all their derivatives are Lipschitz continuous. □

6.2.3 Cross Correlation

The goal of this section is to prove the Lipschitz-continuity of the function F^g_CC(h) (equation (5.9)).

Lemma 6.27 The following inequalities are verified for μ_1 and v_1:

0 ≤ μ_1 ≤ A,   β ≤ v_1 ≤ β + A².

Proof: We have

μ_1 = (1/|Ω|) ∫_{R²} i_1 ∫_Ω G_β(I_h(x) − i) dx di_1 di_2 = (1/|Ω|) ∫_Ω I_1^σ(x) dx,   (6.12)

and the first inequality follows from the fact that I_1^σ(x) ∈ [0, A]. Similarly, for v_1 we have

v_1 = (1/|Ω|) ∫_{R²} i_1² ∫_Ω G_β(I_h(x) − i) dx di_1 di_2 − μ_1² = β + (1/|Ω|) ∫_Ω I_1^σ(x)² dx − ( (1/|Ω|) ∫_Ω I_1^σ(x) dx )²,   (6.13)

from which the second inequality follows. □

Proposition 6.28 The function H → R defined by h ↦ v_{1,2}(h) is bounded and Lipschitz continuous.

Proof: We have

v_{1,2}(h) = (1/|Ω|) ∫_{R²} (i_1 − μ_1)(i_2 − μ_2(h)) ∫_Ω G_β(I_h(x) − i) dx di_1 di_2
  = (1/|Ω|) ∫_Ω [ ∫_R (i_1 − μ_1) g_β(I_1^σ(x) − i_1) di_1 ] [ ∫_R (i_2 − μ_2(h)) g_β(I_2^σ(x + h(x)) − i_2) di_2 ] dx
  = (1/|Ω|) ∫_Ω (I_1^σ(x) − μ_1)(I_2^σ(x + h(x)) − μ_2(h)) dx,

105 04 Chapter 6: Properties of the Matching Terms that is v ;2 (h) = I jj ff (x) Iff(x 2 + h(x)) dx μ μ 2 (h): (6.4) Hence, we have jv ;2 (h)j»a 2, which proves the rst part of the proposition. For the second part, since μ 2 (h) is Lipschitz continuous (lemma 6.8), it sufces to show the Lipschitz continuity of the rst term in the right-hand side. For this term, we have jj I ff (x) Iff 2 (x + h (x)) dx I ff (x) Iff 2 (x + h 2(x)) dx» ji jj ff (x)jjiff(x 2 + h (x)) I ff (x 2 + h 2(x))j dx» A Lip(Iff 2 ) jj Hence (Cauchy-Schwarz) 2 jj I ff (x) Iff 2 (x + h (x)) dx j h (x) h 2 (x) j dx I ff (x) Iff 2 (x + h 2(x)) dx» A Lip(Iff 2 ) jj =2 jjh h 2 jj H : Theorem 6.29 The function H! R dened by h 7! CC g (h) is bounded and Lipschitz continuous. Proof : CC g (h) is bounded by. The mapping h 7! CC g (h) is Lipschitz because we have CC g (h) = v ;2(h) 2 v v 2 (h) ; (6.5) with v >0 (proposition 6.27), v ;2 (h) bounded and Lipschitz (proposition 6.28), v 2 (h) Lipschitz (corollary 6.9) and v 2 (h) (lemma 6.5), which allows us to apply proposition Theorem 6.30 The function f g CC : H 2! R dened by (z ;z 2 ; h) 7! 2 jj» v;2 (h) v 2 (h) is bounded and Lipschitz continuous. z μ CC g (h) v z2 μ 2 (h) v 2 (h) :

106 6.3 Local Criteria 05 Proof : Taking into account the properties mentioned in the proof of the previous proposition, plus the boundedness and Lipschitz continuity of CC g (h) (proposition 6.29) and of μ 2 (lemma 6.8), we see that f g CC may be written as f g CC (z ;z 2 ; h) =f (h) z + f 2 (h) z 2 + f 3 (h); where the functions H! R f ;f 2 and f 3 are Lipschitz continuous and bounded (proposition 6.). The result readily follows since 2 jz i f i (h) zi 0 f i(h 0 )j = jz i (f i (h) f i (h 0 )) + f i (h 0 )(z i zi 0 )j» Ajf i (h) f i (h 0 )j + kf i k jz i zij 0» A Lip(f i )kh h 0 k H + kf i k jz i zij 0 i =; 2: Theorem 6.3 The function F g CC : H! H, dened by is Lipschitz continuous and bounded. F g CC (h) =f g CC (Iff ;Iff(Id 2 + h)) riff 2 (Id + h) Proof : bounded. It implies that F g CC the same pattern as the proof of theorem 6.3 and uses theorem The boundedness follows from theorem 6.30 and the fact that jri ff 2 j is (h) 2 H 8h 2 H. The rest of the proof follows exactly We nish this section with the following proposition. Proposition 6.32 The function! R n such that x! F g CC (h(x)) satises jf g CC (h(x)) F g CC (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0. Proof : The proof is similar to that of proposition 6.4 and follows from theorem 6.30 and the fact that the functions I ff, Iff 2 and all its derivatives, are Lipschitz continuous Local Criteria We now study the Lipschitz continuity of the gradients of the local criteria, as dened in (5.4).

107 06 Chapter 6: Properties of the Matching Terms The analysis of the local criteria follows pretty much directly from the analysis of the global ones and from theorem 5.2. The main difference with the global case is that we have an extra spatial dependency. In the next lemma we introduce a constant that is needed in the sequel. Lemma 6.33 Let diam() be the diameter of the open bounded set : diam() = sup kx yk: x ;y2 We note G fl (diam()) the value infx ;y2 G fl (x y) and dene We say that Proof : Therefore 2 K = G fl (0) G fl (diam()) : (6.6) G fl (x x 0 ) G fl (x dx 0» K 8x 2 : 0 ) Since G fl (x 0 ) = R G fl(y x 0 ) dy, wehaveg fl (x 0 ) jjg fl (diam()). G fl (x x 0 ) G fl (x dx 0» 0 ) jjg fl (diam()) G fl (x x 0 )dx 0» jjg fl (diam()) jjg fl(0) =K : We note k jjg fl (diam()), i.e. the constant such that 8x 2, G fl (x) k Mutual Information The functions q h and Q h dened in propositions 6.4 and 6.5 are now functions of x 0 but the propositions are unchanged. The function a(i 2 ; x 0 ; h), the local version of equation (6.), is given by a(i 2 ; x 0 ; h) = I ff 2 (x + h(x))g (I ff 2 (x + h(x)) i 2) G fl (x x 0 ) dx g (I ff 2 (x + h(x)) i 2) G fl (x x 0 ) dx while the function A(i; h; x 0 ), the local version of equation (6.2), is given by A(i; h; x 0 )= I ff 2 (x + h(x))g (I h (x) i) G fl (x x 0 ) dx G (I h (x) i) G fl (x x 0 ) dx ; (6.7) : (6.8) Similarly, the function L g MI; h (i) dened by equation (6.3) must be modied as follows: L l MI; h (i; x 0)= G fl (x 0 ) (A(i; x 0; h) a(i 2 ; x 0 ; h)); (6.9)

108 6.3 Local Criteria 07 as well as the function b of equation (6.4): b(z 2 ; h; x) =(G fl?g? a G fl )(z 2 ; h; x) = and the function B of equation (6.5): R G fl (x x 0 )g (z 2 i 2 ) G fl (x 0 ) a(i 2; x 0 ; h) di 2 dx 0 ; (6.20) B(z; h; x) =(G fl?g? A G fl )(z; h; x) = R 2 G fl (x x 0 )G (z i) G fl (x 0 ) A(i; x 0; h) di dx 0 : (6.2) This being done, propositions 6.6 and 6.7 can be adapted to the present case as follows. Proposition 6.34 The function R! R + dened by z 2! b(z 2 ; h; x) is Lipschitz continuous with a Lipschitz constant lb which is independent of h and x. Moreover, it is bounded by A. Proof : We give the proof in this particular simple case to give the flavor of the ideas which extend to the more complicated cases that come later. The second part of the proposition follows from the fact that 0» a(i 2 ; h; x)» A 8i 2 2 R and 8h 2 H (local version of proposition 6.4) and lemma In order to prove the rst part, we prove that the magnitude of the derivative of the function is bounded independently of h and x. 2; h; x) 2 R (i 2 z 2 )G fl (x x 0 ) G fl (x 0 ) g (z 2 i 2 )a(i 2 ; x 0 ; h)di 2 dx 0» AK R jz 2 i 2 jg (z 2 i 2 ) di 2 : In order to derive the last inequality we have used the local version of proposition 6.4 and lemma The function on the right-hand side of the last inequality is independent of h and x and continuous on [0; A], therefore bounded. 2 Proposition 6.35 The function H! R dened by h! b(z 2 ; h; x) is Lipschitz continuous with a Lipschitz constant L l b which is independent of z 2 2 R and x 2. Proof : The proof is similar to that of proposition We also have the following.

109 08 Chapter 6: Properties of the Matching Terms Proposition 6.36 The function! R dened by x! b(z 2 ; h; x) is Lipschitz continuous uniformly on H. Proof : Because of equation (6.20) and proposition 6.34 we have jb(z 2 ; h; x) b(z 2 ; y; h)j» A The proof of lemma 6.33 allows us to write jb(z 2 ; h; x) b(z 2 ; y; h)j» and therefore A jjg fl (diam()) R R jg fl (x x 0 ) G fl (y x 0 )j g G fl (x (z 2 i 2 ) di 2 dx 0 : 0 ) jg fl (x x 0 ) G fl (y x 0 )jg (z 2 i 2 ) di 2 dx 0 ; 2 jb(z 2 ; h; x) b(z 2 ; y; h)j» ALip(G fl) jx yj: G fl (diam()) As a consequence of lemma 6.3 we can state the following. Proposition 6.37 The function b : H! R is Lipschitz continuous on H uniformly on. Proof : The proof follows from lemma 6.3 and proposition 6.34 and Similarly we have the Proposition 6.38 The function! R dened by x! B(z ;z 2 ; h; x) is Lipschitz continuous uniformly on H 2. Proof : The proof is similar to that of proposition The following proposition is also needed. Proposition 6.39 The function B : H 2! R is Lipschitz continuous on H 2 uniformly on. It is bounded by A. Proof : The proof is similar to that of proposition And therefore

110 6.3 Local Criteria 09 Corollary 6.40 The function f l MI : H 2! R dened by (z ;z 2 ; h; x)! (B(z ;z 2 ; h; x) b(z 2 ; h; x)) is Lipschitz continuous on H 2 uniformly on and bounded. From this follows the local version of theorem 6.3 Theorem 6.4 The function FMI l : H! H dened by FMI l (h) =f MI l (Iff ;Iff(Id 2 + h); Id; h) riff 2 (Id + h) = (B(I ff ;Iff(Id 2 + h); Id; h) b(iff(id 2 + h); Id; h))riff 2 (Id + h) is Lipschitz continuous and bounded. Proof : Boundedness follows from corollary 6.40 and the fact that jri ff 2 j is bounded. It implies that FMI l (h) 2 H 8h 2 H. We next consider the ith component FMI li of F MI l : with F li MI (h )(x) F li MI (h 2)(x) =S T S 2 T 2 ; S j = (B(I ff (x);iff 2 (x + h j(x)); h j ; x) b(i ff 2 (x + h j(x)); h j ; x)) T j i I ff 2 (x + h j(x)) j =; 2; j =; 2. We continue with jf li MI (h )(x) F li MI (h 2)(x)j»jS S 2 jjt j + js 2 jjt T 2 j i I ff 2 is bounded, jt jj» k@ i I ff 2 k. Because of propositions 6.34 and 6.39, js 2 j» 2 A. ii ff 2 is Lipschitz continuous jt T 2 j» Lip(@ i I ff )jh 2 (x) h 2 (x)j. Finally, because of corollary 6.40 and the fact that I ff 2 is Lipschitz continuous, js S 2 j»lip(f l MI )(Lip(Iff 2 )jh (x) h 2 (x)j + kh h 2 k H ) : The conclusion of the theorem follows from these inequalities through the same procedures as in the proof of theorem We nish this section with the Proposition 6.42 The function! R n such that x! FMI l (h(x)) satises jfmi l (h(x)) F MI l (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0. Proof : The proof is similar to that of proposition 6.4 and uses propositions 6.36 and

111 0 Chapter 6: Properties of the Matching Terms Correlation Ratio In this case also, the proofs follow pretty much the same pattern as those in the global case. In detail, the analog of lemma 6.5 is the Lemma 6.43 where 0» μ 2 (h; x 0 )= (h; x 0 )= I G ff fl (x (x 2 0 ) + h(x))g fl(x x 0 ) dx»a (6.22)» v 2 (h; x 0 )= + (h; x 0 )» + A 2 ; (6.23) I G ff fl (x (x 2 0 ) + h(x))2 G fl (x x 0 ) dx G fl (x 0 ) 2 I ff (x 2 + h(x))g fl(x x 0 ) dx : (6.24) Proof : Because of equation (4.6) we have μ 2 (h; x 0 )= R i 2 G fl (x 0 ) g (i 2 I ff 2 (x + h(x)))g fl(x x 0 ) dx G fl (x 0 ) di 2 = I ff 2 (x + h(x))g fl(x x 0 ) dx: This yields the rst part of the lemma. For the second part, we use equation (4.7): v 2 (h; x 0 )= R i 2 2 and hence G fl (x 0 ) v 2 (h; x 0 )= + G fl (x 0 ) g (i 2 I ff (x 2 + h(x))g fl (x x 0 ) dx di 2 μ 2 (x 0 ; h) 2 ; I ff (x 2 + h(x))2 G fl (x x 0 ) dx G fl (x 0 ) from which the upper and lower bounds of the lemma follow. 2 2 I ff (x 2 + h(x))g fl(x x 0 ) dx ; We next take care of E[Var[X I ff 2 ;hjx I ff ](x 0 )] with the analog of lemma 6.6 Lemma 6.44 E[Var[X I ff 2 ;hjx I ff ](x 0 )] = + G fl (x 0 ) I ff 2 (x + h(x))2 G fl (x x 0 ) dx M [x 0 ; h];

112 6.3 Local Criteria where M [x 0 ; h] = f (x 0 ; x; x 0 )I ff (x 2 + h(x))iff 2 (x0 + h(x 0 )) dx dx 0 ; and f (x 0 ; x; x 0 )= G fl(x x 0 )G fl (x 0 x 0 ) g (i I G fl (x 0 ) R ff (x))g (i I ff (x0 )) di 2 p(i ; x ; 0 ) is such that and f (x 0 ; x; x 0 ) dx = G fl(x 0 x 0 ) G fl (x 0 ) f (x 0 ; x; x 0 ) dx 0 = G fl(x x 0 ) : G fl (x 0 ) f (x 0 ; x; x 0 ) dx dx 0 =: Proof : According to Table 4.2 we have v 2j (i ; h; x 0 )=S(i ; x 0 ) T 2 (i ; x 0 ) and with and E[Var[X I ff 2 ;hjx I ff ](x 0 )] = T (i ; x 0 )= S(i ; x 0 )= It is straightforward to show that R R R i 2 P h (i ;i 2 ; x 0 ) p(i ; x 0 ) S(i ; x 0 )p(i ; x 0 ) di = + G fl (x 0 ) It is also straightforward to show that T (i ; x 0 )= and hence that G fl (x 0 )p(i ; x 0 ) T 2 (i ; x 0 )p(i ; x 0 ) di = R G fl (x 0 ) 2 R p(i ) v 2j (i ; h; x 0 )p(i ; x 0 ) di ; R i 2 P h (i ;i 2 ; x 0 ) 2 di p(i ; x 2 ; 0 ) di 2 = μ 2j (i ; h; x 0 ): I ff 2 (x + h(x))2 G fl (x x 0 ) dx: g (i I ff (x))iff 2 (x + h(x))g fl (x x 0 ) dx; 2 g (i I ff (x))iff(x 2 + h(x))g fl(x x 0 ) dx di : (6.25)

113 2 Chapter 6: Properties of the Matching Terms We next write 2 g (i I ff (x))iff(x 2 + h(x))g fl (x x 0 ) dx = g (i I ff (x))iff 2 (x + h(x))g fl(x x 0 ) g (i I ff (x0 ))I ff 2 (x0 + h(x 0 ))G fl (x 0 x 0 ) dx dx 0 ; commute the integration with respect to i with that with respect to x and x 0 to obtain the result. 2 We continue with the analog of lemma 6.22: Lemma 6.45 The function h l CR = G?L l CR : H 2! R, where L l CR equation (5.2), is equal to the following expression is given by hcr l (z 2 ;z 2 ; h; x 0 )= G fl (x 0 ) v 2 (h; x 0 ) μ 2 (h; x 0 ) d(z ; h; x 0 )+CR l (h; x 0 )(z 2 μ 2 (h; x 0 )) where d(z ; h; x 0 )= g (z i ) R R g (i I ff (x))iff(x 2 + h(x))g fl(x x 0 ) dx R g (i I ff (x))g fl(x x 0 ) dx Proof : We use equation (5.2) and apply the convolution to it. The value of d is obtained from: d(z ; h; x 0 )= R g (z i ) G fl (x 0 )p(i ; x 0 ) g (z i )μ 2j (i ; h; x 0 ) di = R g (z i ) i R p(i ; x 2 P h (i ;i 2 ; x 0 ) di 0 ) 2 di = R from where the result follows. 2 I ff 2 (x + h(x))g (i I ff (x))g fl(x x 0 ) dx ; di : di Our goal is to prove that f l CR = G fl?h l CR, a function from H 2 in R is Lipschitz continuous in H 2 uniformly on. In order to prove this, it is sufcient to prove that the numerator and the denominator of h l CR are bounded and Lipschitz continuous in H 2 uniformly on, and that the denominator is strictly positive. Indeed, we have the following

114 6.3 Local Criteria 3 Lemma 6.46 Let N : H 2! R be bounded and Lipschitz continuous in H 2 uniformly on. Let also D : H 2! R + be bounded, strictly positive, and Lipschitz continuous in H 2 uniformly on. Then the function G fl? N D : H 2! R is Lipschitz continuous in H 2 uniformly on. Proof : We form N (x0 )[z G fl (x x 0 ; h 0 ] 0 ) D(x 0 )[z 0 ; h N (x 0)[z; h] 0 dx ] D(x 0» 0 )[z; h] jn (x 0 )[z 0 ; h 0 ] N (x 0 )[z; h]j G fl (0) D(x 0 )[z 0 ; h 0 dx 0 + ] jn (x 0 )[z; h]jjd(x 0 )[z 0 ; h 0 ] D(x 0 )[z; h]j G fl (0) D(x 0 )[z 0 ; h 0 dx ]D(x 0 0 )[z; h] According to the hypotheses, there exists a>0 such that a» D(x 0 )[z; h]8z; x 0 ; h, there exists K N 0 such that jn (x 0 )[z; h]j»k N 8z; x 0 ; h, and there exists L N and L D such that jn (x 0 )[z 0 ; h 0 ] N (x 0 )[z; h]j»l N (jz z 0 j + kh h 0 k H ) jd(x 0 )[z 0 ; h 0 ] D(x 0 )[z; h]j»l D (jz z 0 j + kh h 0 k H ) 8z; z 0 ; x 0 ; h; h 0 : We therefore have through Cauchy-Schwarz inequality: G fl (x x 0 ) for some positive constant C. 2 N (x0 )[z 0 ; h 0 ] D(x 0 )[z 0 ; h N (x 0)[z; h] 0 ] D(x 0 )[z; h] dx 0» and C(jz z 0 j + kh h 0 k H ) We prove these properties for the numerator and the denominator of h l CR. Lemma 6.47 We have jjdiam()»g fl (x 0 ) v 2 (h; x 0 )»jjg fl (0)( + A 2 ) Proof : The proof is a direct consequence of the denition of G(x 0 ) and of lemma We then prove the following Lemma 6.48 The function H! R + such that (h; x 0 )! G fl (x 0 ) v 2 (h; x 0 ) is Lipschitz continuous in H uniformly in.

115 4 Chapter 6: Properties of the Matching Terms Lemma 6.49 The function H 2! R such that (z; h; x 0 )! μ 2 (h; x 0 ) d(z ; h; x 0 )+CR l (h; x 0 )(z 2 μ 2 (h; x 0 )); is bounded by 3A. Proof : This is because 0» CR(h; x 0 )», 0» μ 2 (h; x 0 )»A, and, according to lemma 6.45, because 0» d(z ; h; x 0 )»A. 2 At this point we can prove the following Proposition 6.50 The function! R n such that x! fcr l (z; h; x) is Lipschitz continuous uniformly in H 2. Proof : With the notations of lemma 6.46 we have jg fl?h l CR (x) G fl?h l CR (y)j» Because of lemmas 6.47 and 6.49 we have jg fl?h l CR (x) G fl?h l CR (y)j» hence the result. 2 3A jjdiam() jg fl (x x 0 ) G fl (y x 0 )j jn (x 0)[z; h]j D(x 0 )[z; h] jg fl (x x 0 ) G fl (y x 0 )j dx 0» dx 0 3ALip(G fl ) jx yj; diam() We also have the analog of lemmas 6.23, 6.8, and theorem 6.2. Lemma 6.5 The function d : H! R is bounded and Lipschitz continuous in H uniformly in. Lemma 6.52 The functions H! R dened by and (h; x 0 )! G fl (x 0 ) I ff 2 (x + h(x)) G fl(x x 0 ) dx (h; x 0 )! I G ff fl (x (x 2 0 ) + h(x))2 G fl (x x 0 ) dx are bounded and Lipschitz continuous in H uniformly in.

116 6.3 Local Criteria 5 Theorem 6.53 The function H! R dened by (h; x 0 )! E[Var[X I ff 2 ;h jx l I ff ](x 0)] v 2 (h; x 0 ) is Lipschitz continuous in H uniformly in. From which we deduce the Theorem 6.54 The function H 2! R such that (z; h; x 0 )! μ 2 (h; x 0 ) d(z ; h; x 0 )+CR l (h; x 0 )(z 2 μ 2 (h; x 0 )); is Lipschitz continuous in H 2 uniformly in. We can now prove the Theorem 6.55 The function fcr l : H 2! R such that (z; h; x)! fcr l (z; h; x) is Lipschitz continuous in H 2 uniformly in. Proof : The proof is just an application of lemma 6.46 to f l CR. 2 The combination of proposition 6.50 and theorem 6.55 yields the following Theorem 6.56 The function fcr l : H 2! R such that (z; h; x)! fcr l (z; h; x) is Lipschitz continuous. And we can conclude with the following theorem and proposition. Theorem 6.57 The function FCR l : H! H dened by FCR l (h) =f CR l (Iff ;Iff(Id 2 + h); Id)rIff 2 (Id + h) is Lipschitz continuous and bounded. Proof : The proof follows exactly the same pattern as the proof of theorem 6.4 and uses theorem Proposition 6.58 The function! R n such that x! FCR l (h(x)) satises jfcr l (h(x)) F CR l (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0. Proof : The proof is similar to that of proposition 6.26 and follows from theorem 6.56 and the fact that the functions I ff, Iff 2 and all its derivatives, are Lipschitz continuous. 2

117 6 Chapter 6: Properties of the Matching Terms Cross Correlation In this section we prove the Lipschitz-continuity of the mapping H! H dened by FCC l (h) (equation (5.4)). The reasoning is completely analog to that of the global case. We start with some estimates which guaranty that the variance of X I ff ; x 0 is strictly positive. Lemma x 0 2, the following inequalities are veried Proof : μ (x 0 )= R 2 i G fl (x 0 ) 0» μ (x 0 )» A;» v (x 0 )» + A 2 : G (I h (x) i) G fl (x x 0 ) dx di di 2 = I G ff fl (x 0 ) (x) G fl(x x 0 ) dx (6.26) and the rst inequalities follow from the fact that I ff (x) 2 [0; A]. Similarly, for v (x 0 ), we have v (x 0 )= R 2 i 2 G fl (x 0 ) + G fl (x 0 ) G (I h (x) i) G fl (x x 0 ) dx I ff (x)2 G fl (x x 0 ) dx G fl (x 0 ) from which the second inequalities follow. 2 di di 2 μ 2 (x 0)= 2 I ff (x) G fl(x x 0 ) dx ; (6.27) We now show the boundedness and Lipschitz-continuity of the covariance of X I ff ; x 0 and X I ff 2 ; x 0 ; h. Proposition 6.60 The function H! R dened by (h; x 0 ) 7! v ;2 (h; x 0 ) is bounded and Lipschitz continuous in H, uniformly in. Proof : Indeed, we have v ;2 (h; x 0 )= Hence v ;2 (h; x 0 )= G fl (x 0 ) R 2 (i μ (x 0 )) (i 2 μ 2 (h; x 0 )) R G fl (x 0 ) G (I h (x) i) G fl (x x 0 ) dx di di 2 : (i μ (x 0 )) g (I ff (x) i ) di R (i 2 μ 2 (h; x 0 )) g (I ff (x 2 + h(x)) i 2) di 2 G fl (x x 0 ) dx;

118 6.3 Local Criteria 7 that is v ;2 (h; x 0 )= = G fl (x 0 ) G fl (x 0 ) (I ff (x) μ (x 0 )) (I ff 2 (x + h(x)) μ 2(h; x 0 )) G fl (x x 0 ) dx I ff (x) Iff 2 (x + h(x)) G fl(x x 0 ) dx μ (x 0 ) μ 2 (h; x 0 ): (6.28) Thus we have, 8x 0 2, jv ;2 (h; x 0 )j»a 2, which proves the rst part of the proposition. For the second part, since μ 2 (h; x 0 ) is Lipschitz continuous uniformly in (lemma 6.52), it sufces to show the Lipschitz continuity of the rst term in the righthand side. For this term we have, G fl (x 0 )» I ff (x) Iff 2 (x + h (x)) G fl (x x 0 ) dx I ff (x) Iff 2 (x + h 2(x)) G fl (x x 0 ) dx G fl (x 0 )» A Lip(I ff 2 ) G fl(0) k Hence (by Cauchy-Schwarz): G fl (x 0 ) ji ff (x)jjiff 2 (x + h (x)) I ff 2 (x + h 2(x))j G fl (x x 0 ) dx j h (x) h 2 (x) j dx: I ff (x) Iff 2 (x + h (x)) G fl (x x 0 ) dx I ff (x) Iff(x 2 + h 2(x)) G fl (x x 0 ) dx»ljjh h 2 jj H ; where the constant L is independent of x Theorem 6.6 The function H! R dened by (h; x) 7! CC l (h; x) is bounded and Lipschitz continuous in H, uniformly in. Proof : The cross-correlation function CC l is bounded by. Moreover, we have with: CC l (h; x) = v ;2(h; x) 2 v (x) v 2 (h; x) ; (6.29) ffl v ;2 (h; x) bounded and Lipschitz-continuous in H uniformly in (proposition 6.60).

119 8 Chapter 6: Properties of the Matching Terms ffl v 2 (h; x) bounded and Lipschitz-continuous in H, uniformly in (readily seen from lemmas 6.5 and 6.52). ffl v 2 (h; x) > 0 (lemma 6.5). ffl v (x) bounded and > 0 (lemma 6.59). We may therefore apply proposition Theorem 6.62 The function H 2! R dened by (z ;z 2 ; h; x) 7! L l CC;h (z ;z 2 ; x) is bounded and Lipschitz continuous in H 2, uniformly in. Proof : We have L l CC;h (z ;z 2 ; x) = 2 G fl (x)» v;2 (h; x) v 2 (h; x) z μ (x) v (x) CC g (h; x) z2 μ 2 (h; x) v 2 (h; x) : Taking into account the properties mentioned in the proof of proposition 6.6, the boundedness and Lipschitz continuity of CC l (proposition 6.6) and of μ 2 (lemma 6.52), plus the fact that G fl (x) > 0, we see that L l CC;h may be written as L l CC;h (z ;z 2 ; x) =f (h; x) z + f 2 (h; x) z 2 + f 3 (h; x); where the functions H! R f ;f 2 and f 3 are bounded and Lipschitz continuous in H uniformly in, from where the result readily follows. 2 Theorem 6.63 The function f l CC : H 2! R, dened by is bounded and Lipschitz continuous. f l CC (z; x) = G fl?l l CC;h (z; x) Proof : Indeed, let f : H 2! R be bounded by B f and Lipschitz continuous in H 2 uniformly in, of constant L f (these hypotheses are veried by L l CC;h according to theorem 6.62). We have G fl (x x 0 ) f (z; h; x 0 ) f (z 0 ; h 0 ; x 0 ) dx 0» G fl (0) jj L f jz z 0 j + kh h 0 k H ;

120 6.3 Local Criteria 9 and jg fl?f(z; h; x) G fl?f(z; h; y)j» hence the result. 2 jg fl (x x 0 ) G fl (y x 0 )jjf (z; h; x 0 )j dx 0» B f Lip(G fl )jjjx yj; Theorem 6.64 The function FCC l : H! H dened by FCC l (h) =f CC l (Iff ;Iff(Id 2 + h); Id)rIff 2 (Id + h) is Lipschitz continuous and bounded. Proof : The proof follows exactly the same pattern as the proof of theorem 6.4 and uses theorem Proposition 6.65 The function! R n such that x! FCC l (h(x)) satises jfcc l (h(x)) F CC l (h(y))j»k(jx yj + jh(x) h(y)j); for some constant K>0. Proof : The proof is similar to that of proposition 6.26 and follows from theorem 6.63 and the fact that the functions I ff, Iff 2 and all its derivatives, are Lipschitz continuous. 2


Part III

Implementation Aspects


Chapter 7

Numerical Schemes

This chapter describes the numerical schemes employed in discretizing the continuous evolution equations (see equation (1.2) on page 39) for the different matching and regularization operators. We use spatial difference schemes to evaluate differential operators and an explicit forward discretization in time (Euler method). Parzen window estimates are computed by recursive filtering [3] of the discrete joint intensity histogram. These elements are detailed in the following sections.

7.1 Regularization Operators

We begin by describing the schemes for the differential operators used for regularization. We use a schematic notation for the description of the finite-difference schemes. For instance, let us denote by L^{i,j,k}_p and h^{i,j,k}_p the components (p = 1, 2, 3) of the discretized operator L(h) and of h at a grid point (i, j, k) in the discrete image domain. The voxel size in all directions is assumed to be equal to one. A possible scheme for αΔh would be

L^{i,j}_p = α ( h_p^{i+1,j} + h_p^{i−1,j} + h_p^{i,j−1} + h_p^{i,j+1} − 4 h_p^{i,j} ),   (7.1)

in the 2D case (p = 1, 2) and

L^{i,j,k}_p = α ( h_p^{i+1,j,k} + h_p^{i−1,j,k} + h_p^{i,j−1,k} + h_p^{i,j+1,k} + h_p^{i,j,k−1} + h_p^{i,j,k+1} − 6 h_p^{i,j,k} ),   (7.2)

in the 3D case (p = 1, 2, 3), which we write schematically as

L^{i,j}_p =

      .   1   .
      1  −4   1
      .   1   .

     (h_p, global weight α)

and

L^{i,j,k}_p =

      .   .   .
      .   1   .
      .   .   .

      .   1   .
      1  −6   1
      .   1   .

      .   .   .
      .   1   .
      .   .   .

     (h_p, global weight α)

respectively. In this notation, the tables represent the discrete grid and contain the weights associated to each pixel (voxel). The weight is zero if the voxel is empty. The function to which the grid corresponds is written at the bottom, together with any global weight (for example, "h_p, global weight α" above means the grid is that of h_p, weighted globally by α). In each table, the index i is assumed to increase from left to right and the index j from top to bottom, the center being the coordinates (i, j). In the 3D case, each of the three stacked tables represents a different index k, which increases in the bottom-up direction, the one in the middle being that of index k. Although less compact than the notations (7.1) and (7.2), these schematic representations have the advantage of clearly showing the position of the weights within the neighborhood of each voxel.

7.1.1 The Linearized Elasticity Operator

In this section, we describe our numerical schemes for computing the linearized elasticity operator:

A(h) = c ( ξ Δh + (1 − ξ) ∇(∇ · h) ).   (7.3)

Our schemes are based on a first-order Taylor expansion of (7.3). For n = 2, it yields the following scheme (the weights of the centered-difference stencils are written out explicitly):

A^{i,j}_1 = ξ ( h_1^{i+1,j} + h_1^{i−1,j} + h_1^{i,j−1} + h_1^{i,j+1} − 4 h_1^{i,j} )
          + (1−ξ) ( h_1^{i+1,j} − 2 h_1^{i,j} + h_1^{i−1,j} )
          + ((1−ξ)/4) ( h_2^{i+1,j+1} + h_2^{i−1,j−1} − h_2^{i+1,j−1} − h_2^{i−1,j+1} ),

A^{i,j}_2 = ξ ( h_2^{i+1,j} + h_2^{i−1,j} + h_2^{i,j−1} + h_2^{i,j+1} − 4 h_2^{i,j} )
          + (1−ξ) ( h_2^{i,j+1} − 2 h_2^{i,j} + h_2^{i,j−1} )
          + ((1−ξ)/4) ( h_1^{i+1,j+1} + h_1^{i−1,j−1} − h_1^{i+1,j−1} − h_1^{i−1,j+1} ).

Similarly, for n = 3 we obtain:

A^{i,j,k}_1 = ξ ( h_1^{i+1,j,k} + h_1^{i−1,j,k} + h_1^{i,j−1,k} + h_1^{i,j+1,k} + h_1^{i,j,k−1} + h_1^{i,j,k+1} − 6 h_1^{i,j,k} )
            + (1−ξ) ( h_1^{i+1,j,k} − 2 h_1^{i,j,k} + h_1^{i−1,j,k} )
            + ((1−ξ)/4) ( h_2^{i+1,j+1,k} + h_2^{i−1,j−1,k} − h_2^{i+1,j−1,k} − h_2^{i−1,j+1,k} )
            + ((1−ξ)/4) ( h_3^{i+1,j,k+1} + h_3^{i−1,j,k−1} − h_3^{i+1,j,k−1} − h_3^{i−1,j,k+1} ),

A^{i,j,k}_2 = ξ ( h_2^{i+1,j,k} + h_2^{i−1,j,k} + h_2^{i,j−1,k} + h_2^{i,j+1,k} + h_2^{i,j,k−1} + h_2^{i,j,k+1} − 6 h_2^{i,j,k} )
            + (1−ξ) ( h_2^{i,j+1,k} − 2 h_2^{i,j,k} + h_2^{i,j−1,k} )
            + ((1−ξ)/4) ( h_1^{i+1,j+1,k} + h_1^{i−1,j−1,k} − h_1^{i+1,j−1,k} − h_1^{i−1,j+1,k} )
            + ((1−ξ)/4) ( h_3^{i,j+1,k+1} + h_3^{i,j−1,k−1} − h_3^{i,j+1,k−1} − h_3^{i,j−1,k+1} ),

and

A^{i,j,k}_3 = ξ ( h_3^{i+1,j,k} + h_3^{i−1,j,k} + h_3^{i,j−1,k} + h_3^{i,j+1,k} + h_3^{i,j,k−1} + h_3^{i,j,k+1} − 6 h_3^{i,j,k} )
            + (1−ξ) ( h_3^{i,j,k+1} − 2 h_3^{i,j,k} + h_3^{i,j,k−1} )
            + ((1−ξ)/4) ( h_1^{i+1,j,k+1} + h_1^{i−1,j,k−1} − h_1^{i+1,j,k−1} − h_1^{i−1,j,k+1} )
            + ((1−ξ)/4) ( h_2^{i,j+1,k+1} + h_2^{i,j−1,k−1} − h_2^{i,j+1,k−1} − h_2^{i,j−1,k+1} ).
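To make the discretization concrete, here is a minimal NumPy sketch of the two 2D regularization operators above. It assumes the field is stored as an array h[p, j, i] with unit voxel size and replicated borders (the boundary handling is an assumption, as is the overall code organization; the thesis code itself is not reproduced here). The factor c of (7.3) is omitted, as in the stencils above.

import numpy as np

def laplacian_2d(h, alpha):
    """alpha times the discrete Laplacian (7.1) of each component of h,
    where h has shape (2, ny, nx)."""
    hp = np.pad(h, ((0, 0), (1, 1), (1, 1)), mode='edge')
    return alpha * (hp[:, 2:, 1:-1] + hp[:, :-2, 1:-1] +
                    hp[:, 1:-1, 2:] + hp[:, 1:-1, :-2] - 4.0 * h)

def linear_elasticity_2d(h, xi):
    """xi*Laplacian(h_p) + (1 - xi)*d_p(div h) with centered differences."""
    hp = np.pad(h, ((0, 0), (1, 1), (1, 1)), mode='edge')
    lap = (hp[:, 2:, 1:-1] + hp[:, :-2, 1:-1] +
           hp[:, 1:-1, 2:] + hp[:, 1:-1, :-2] - 4.0 * h)
    dxx = hp[:, 1:-1, 2:] - 2.0 * h + hp[:, 1:-1, :-2]   # second x-derivative
    dyy = hp[:, 2:, 1:-1] - 2.0 * h + hp[:, :-2, 1:-1]   # second y-derivative
    dxy = 0.25 * (hp[:, 2:, 2:] + hp[:, :-2, :-2] -
                  hp[:, 2:, :-2] - hp[:, :-2, 2:])       # mixed derivative
    a = np.empty_like(h)
    a[0] = xi * lap[0] + (1.0 - xi) * (dxx[0] + dxy[1])
    a[1] = xi * lap[1] + (1.0 - xi) * (dyy[1] + dxy[0])
    return a

Both functions act on the whole grid at once, which is the vectorized equivalent of applying the voxel-wise stencils above.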

7.1.2 The Nagel-Enkelmann Operator

2D case. Our implementation corresponds to the scheme proposed by Alvarez et al. [5]. Let now A = div(T_{I_1^σ} Dh), where

T_{I_1^σ} = ( a  b
              b  c ).

The scheme combines, for p = 1, 2, two-point stencils weighted by the tensor entries: terms in a along the x direction, terms in c along the y direction, and terms in b along the two diagonals.

3D case. This scheme generalizes readily to the 3D case. In order to write the 3D scheme explicitly in a compact way, we take advantage of the very simple form of this scheme and introduce a more compact notation. We shall write S_2(a; x⁺) for the stencil whose non-null weights sit at the center voxel and at its x⁺ neighbor, applied to h_p with global weight a/2; here x⁺ indicates the direction defined by the voxels with non-null weights, starting at the center, and S_4(·; ·) is defined likewise with global weight 1/4. With this notation, we write the 2D Nagel-Enkelmann operator above as

A^{i,j}_p = S_2(a; x⁺) + S_2(a; x⁻) + S_2(c; y⁺) + S_2(c; y⁻)
          + S_4(b; x⁺y⁺) + S_4(b; x⁻y⁻) − S_4(b; x⁺y⁻) − S_4(b; x⁻y⁺).
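For comparison, the following NumPy sketch evaluates the divergence-form operator div(T ∇u) for one component u with a symmetric tensor field (a, b, c) using plain centered differences. It is a generic discretization given here only as an illustration; it is not claimed to coincide with the S_2/S_4 stencils of the scheme above, and the border handling is an assumption.

import numpy as np

def div_T_grad(u, a, b, c):
    """Centered-difference approximation of div(T grad u) in 2D,
    T = [[a, b], [b, c]] given per pixel; index order is (j, i) = (y, x)."""
    def dx(f):
        fp = np.pad(f, ((0, 0), (1, 1)), mode='edge')
        return 0.5 * (fp[:, 2:] - fp[:, :-2])
    def dy(f):
        fp = np.pad(f, ((1, 1), (0, 0)), mode='edge')
        return 0.5 * (fp[2:, :] - fp[:-2, :])
    ux, uy = dx(u), dy(u)
    # flux components (T grad u), followed by the divergence
    fx = a * ux + b * uy
    fy = b * ux + c * uy
    return dx(fx) + dy(fy)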

In the 3D case, we have

T_{I_1^σ} = ( a  b  c
              b  d  e
              c  e  f ),

and the corresponding scheme is, for p = 1, 2, 3,

A^{i,j,k}_p = S_2(a; x⁺) + S_2(a; x⁻) + S_2(d; y⁺) + S_2(d; y⁻) + S_2(f; z⁺) + S_2(f; z⁻)
            + S_4(b; x⁺y⁺) + S_4(b; x⁻y⁻) − S_4(b; x⁺y⁻) − S_4(b; x⁻y⁺)
            + S_4(c; x⁺z⁺) + S_4(c; x⁻z⁻) − S_4(c; x⁺z⁻) − S_4(c; x⁻z⁺)
            + S_4(e; y⁺z⁺) + S_4(e; y⁻z⁻) − S_4(e; y⁺z⁻) − S_4(e; y⁻z⁺).

7.2 Dissimilarity Terms

This section discusses implementation issues concerning the three global matching functions F^g_MI(h), F^g_CR(h) and F^g_CC(h) as defined in (5.9) on page 85, and the local matching function F^l_CC(h) as defined in (5.14) on page 86. The remaining two functions will be treated in Section 7.3. The reason for this separation is that the global functions and F^l_CC(h) can be computed as direct estimations of their respective definitions, while F^l_MI(h) and F^l_CR(h) require two convolutions, which makes them very hard to implement with limited memory. By observing their respective definitions, it appears that some specific operations are required to implement the global functions and F^l_CC(h). These operations are the following.

• Convolutions

The convolutions by a Gaussian kernel are approximated by recursive filtering using the smoothing operator introduced by Deriche [3]. Given a discrete 1D input sequence x(n), n = 1, ..., M, its convolution by the smoothing operator S_α(n) = k (α|n| + 1) e^{−α|n|} is calculated efficiently as (see [3]):

y(n) = (S_α ⋆ x)(n) = y_1(n) + y_2(n),

where

y_1(n) = k [ x(n) + e^{−α}(α − 1) x(n−1) ] + 2 e^{−α} y_1(n−1) − e^{−2α} y_1(n−2),
y_2(n) = k [ e^{−α}(α + 1) x(n+1) − e^{−2α} x(n+2) ] + 2 e^{−α} y_2(n+1) − e^{−2α} y_2(n+2).

The normalization constant k is chosen by requiring that ∫_R S_α(t) dt = 1, which yields k = α/4. This scheme is very efficient since the number of operations

required is independent of the smoothing parameter α. The smoothing filter can be readily generalized to n dimensions by defining the separable filter T_α(x) = ∏_{i=1}^n S_α(x_i). We can determine an optimal value of α to approximate a Gaussian of variance β. Requiring that G_β(t) and S_α(t) have the same L² norm yields the relation

α = 16 / (5 √(πβ)) ≈ 1.8/√β,

while minimizing the L² norm of (G_β(t) − S_α(t)) yields α ≈ 1.695/√β.

As shown by Alvarez et al. in [2], the recursive procedure above can be seen as a numerical implementation of the heat equation. The convolution by S_α presents the advantage of being computable exactly by a recursive filter of order two, giving very precise results and fast computations. By computing the derivative of S_α(t),

S'_α(t) = −(α³/4) t e^{−α|t|},

we obtain a derivative filter of which a recursive realization can be similarly obtained:

y(n) = (S'_α ⋆ x)(n) = (S_α ⋆ x')(n) = (α³/4) e^{−α} ( y_2(n) − y_1(n) ),

where now

y_1(n) = x(n−1) + 2 e^{−α} y_1(n−1) − e^{−2α} y_1(n−2),
y_2(n) = x(n+1) + 2 e^{−α} y_2(n+1) − e^{−2α} y_2(n+2).

• Interpolation

Terms of the form ∇I_2^σ(x + h(x)) are calculated as follows. Convolution of I_2 with the derivative filter is used to compute the components of ∇I_2^σ at the grid points (i, j, k). Then their value at an arbitrary position (not necessarily on the grid) is computed using a trilinear interpolation scheme, defined as follows. Let f be the function to be evaluated at (i + x, j + y, k + z), where (i, j, k) is a point on the grid and (x, y, z) ∈ [0, 1]³. We set

V_1 = f(i, j, k);      V_2 = f(i+1, j, k);      V_3 = f(i+1, j+1, k);      V_4 = f(i, j+1, k);
V_5 = f(i, j, k+1);   V_6 = f(i+1, j, k+1);   V_7 = f(i+1, j+1, k+1);   V_8 = f(i, j+1, k+1).

Then the value f(i + x, j + y, k + z) is estimated as

V = V_1 (1−x)(1−y)(1−z) + V_2 x (1−y)(1−z) + V_3 x y (1−z) + V_4 (1−x) y (1−z)
  + V_5 (1−x)(1−y) z + V_6 x (1−y) z + V_7 x y z + V_8 (1−x) y z.

The same interpolation scheme is used for estimating the value of I_2^σ(x + h(x)).

• Density estimation

Parzen density estimates are obtained by smoothing the discrete joint histogram of intensities. To describe this procedure, we define the piecewise constant function v : Ω → [0, N]² ⊂ N² by quantification of I_h(x) into N + 1 intensity levels (bins):

v(x) = ( ⌊λ I_1^σ(x)⌋, ⌊λ I_2^σ(x + h(x))⌋ )ᵀ = (0, 0)ᵀ on Ω_{0,0}, ..., (N, N)ᵀ on Ω_{N,N},

where λ = N/A, ⌊·⌋ denotes the floor operator in R⁺, i.e. the function R⁺ → N such that ⌊x⌋ = max{n ∈ N : n ≤ x}, and {Ω_{k,l}}_{(k,l) ∈ [0,N]²} is a partition of Ω. We then compute, setting β' = λ²β,

P_h(i) = (1/|Ω|) ∫_Ω G_β(I_h(x) − i) dx = (1/|Ω|) ∫_Ω G_{β'}(λ I_h(x) − λ i) dx
       ≈ (1/|Ω|) Σ_{k=0}^{N} Σ_{l=0}^{N} ∫_{Ω_{k,l}} G_{β'}(v(x) − λ i) dx
       = Σ_{k=0}^{N} Σ_{l=0}^{N} (|Ω_{k,l}|/|Ω|) G_{β'}(k − λ i_1, l − λ i_2)
       = Σ_{k=0}^{N} Σ_{l=0}^{N} H(k, l) G_{β'}(k − λ i_1, l − λ i_2) = (H ⋆ G_{β'})(λ i),

H being the discrete joint histogram, H(k, l) = |Ω_{k,l}|/|Ω|. The convolution is performed by recursive filtering as described above. Note that this way of computing P_h is quite efficient since only one pass on the images is required, followed by the convolution.

These basic tools being described, we now state pseudo-algorithms of the way the different matching functions are computed.
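Before listing the algorithms, here is a small Python transcription of the recursive smoothing filter described in the Convolutions paragraph, applied to a 1D sequence. It follows the recursions given above literally; samples outside the sequence are taken as zero, which is an assumption (the thesis does not specify the boundary handling), and the code layout is ours.

import numpy as np

def deriche_smooth(x, alpha):
    """Recursive approximation of Gaussian smoothing with the operator
    S_alpha(n) = k*(alpha*|n| + 1)*exp(-alpha*|n|), with k = alpha/4."""
    x = np.asarray(x, dtype=float)
    M = len(x)
    k = alpha / 4.0
    e1, e2 = np.exp(-alpha), np.exp(-2.0 * alpha)
    y1 = np.zeros(M)  # causal part
    y2 = np.zeros(M)  # anticausal part
    for n in range(M):
        xm1 = x[n - 1] if n >= 1 else 0.0
        y1[n] = (k * (x[n] + e1 * (alpha - 1.0) * xm1)
                 + 2.0 * e1 * (y1[n - 1] if n >= 1 else 0.0)
                 - e2 * (y1[n - 2] if n >= 2 else 0.0))
    for n in range(M - 1, -1, -1):
        xp1 = x[n + 1] if n + 1 < M else 0.0
        xp2 = x[n + 2] if n + 2 < M else 0.0
        y2[n] = (k * (e1 * (alpha + 1.0) * xp1 - e2 * xp2)
                 + 2.0 * e1 * (y2[n + 1] if n + 1 < M else 0.0)
                 - e2 * (y2[n + 2] if n + 2 < M else 0.0))
    return y1 + y2

Smoothing an n-dimensional array (for example the joint histogram H) is obtained by applying this 1D filter along each axis in turn, which is the separable filter T_α mentioned above.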

Algorithm 7.1 (F^g_MI(h))

• Estimate P_h(i) and its marginals.
• Estimate L^g_{MI,h}(i) (equation (5.6) on page 84) using centered finite differences for the derivatives.
• Estimate f^g_MI(i; h) = (G_β ⋆ L^g_{MI,h})(i) by recursive smoothing.
• Estimate F^g_MI(h)(x) = f^g_MI(I_h(x); h) ∇I_2^σ(x + h(x)).

Algorithm 7.2 (F^g_CR(h))

• Estimate P_h(i) and its marginals.
• Estimate μ_2(h), v_2(h), μ_{2|1}(i_1; h) and CR^g(h) using equations (4.3), (4.4), (4.5), (4.10) and Table 4.1. Here the integrals are estimated by finite sums on the interval [0, A].
• Estimate L^g_{CR,h}(i) (equation (5.7) on page 84).
• Estimate f^g_CR(i; h) = (G_β ⋆ L^g_{CR,h})(i) by recursive smoothing.
• Estimate F^g_CR(h)(x) = f^g_CR(I_h(x); h) ∇I_2^σ(x + h(x)).

Algorithm 7.3 (F^g_CC(h))

• Estimate the values μ_2(h) and v_2(h) using equations (6.9), (6.10) and (6.11) on page 97, and the values μ_1, v_1, v_{1,2}(h) and CC^g(h) using equations (6.12), (6.13), (6.14) and (6.15) on pages 103 and 104.
• Estimate L^g_{CC,h}(i) (equation (5.8) on page 85).
• Estimate F^g_CC(h)(x) = L^g_{CC,h}(I_h(x)) ∇I_2^σ(x + h(x)).

Algorithm 7.4 (F^l_CC(h))

• Estimate the function G_γ(x) using equation (4.3) on page 68, the functions μ_2(h; x) and v_2(h; x) using equations (6.22), (6.23) and (6.24) on page 110, and the functions μ_1(x), v_1(x), v_{1,2}(h; x) and CC^l(h; x) using equations (6.26), (6.27), (6.28) and (6.29) on pages 116 and 117. These estimations are computed by convolutions using recursive filtering.

• Estimate (G_γ ⋆ L^l_{CC,h})(i; x) (see equation (5.13) on page 85) as:

(G_γ ⋆ L^l_{CC,h})(i; x) = (G_γ ⋆ f_1)(x) i_1 + (G_γ ⋆ f_2)(x) i_2 + (G_γ ⋆ f_3)(x),

where

f_1(x) = 2 v_{1,2}(h; x) / ( G_γ(x) v_1(x) v_2(h; x) ),
f_2(x) = −2 CC^l(h; x) / ( G_γ(x) v_2(h; x) ),
f_3(x) = −( f_1(x) μ_1(x) + f_2(x) μ_2(h; x) ).

• Estimate F^l_CC(h)(x) = (G_γ ⋆ L^l_{CC,h})(I_h(x); x) ∇I_2^σ(x + h(x)).

Note that algorithm 7.4 is similar to the algorithm proposed by Cachier and Pennec in [2], modulo the adding of a positive multiple of β in all the denominators. This factor is crucial for the Lipschitz-continuity of F^l_CC(h).

7.3 Approximate Implementations of F^l_MI(h) and F^l_CR(h)

This section discusses the implementation of the functions F^l_MI(h) and F^l_CR(h), defined in theorem 5.2 on page 86. These two functions are much more difficult to compute than those of the previous section. The reason for this is that they involve two convolutions, one with respect to the intensity variable i and the other with respect to the space variable x. A possible way to implement them would be to estimate the functions L^l_{MI,h} and L^l_{CR,h}, and then smooth these functions, e.g. by recursive filtering. The problem is that this would require a dense data structure of dimension (n + 1) for L^l_{CR,h} and (n + 2) for L^l_{MI,h}. With 3D images, it becomes extremely difficult to maintain these four- and five-dimensional structures due to memory space limitations, not counting the computational effort of smoothing them, which has to be done at each iteration of the minimization flow. Our implementations rely on using directly the unsmoothed versions of L^l_{MI,h} and L^l_{CR,h}, i.e. on eliminating both convolutions. We note the two functions thus obtained F̃^l_MI(h) and F̃^l_CR(h). Their implementation is described in more detail in the following sections.

7.3.1 Mutual Information

We define F̃^l_MI(h) by eliminating the convolutions in the definition of F^l_MI(h). It remains to estimate the function L^l_{MI,h}(I_1^σ, I_2^σ(Id + h); Id), which is done using equations (6.17), (6.18) and (6.19), starting on page 106. Note that clearly, since all the denominators involved are strictly positive and bounded, the function h → L^l_{MI,h}(I_1^σ, I_2^σ(Id + h); Id) is also Lipschitz-continuous, and therefore so is F̃^l_MI(h).
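As an illustration of the per-voxel computation this involves, the following NumPy sketch evaluates the approximate local MI force at a single pixel in 2D, using finite sums over a neighborhood in place of the integrals of equations (6.17) to (6.19). The function and variable names are ours, the Gaussians are parameterized by their variances, and the overall normalization constant is only indicative; note that the spatial weights change with the centre pixel, which is exactly the non-stationary filtering issue discussed next.

import numpy as np

def local_mi_force_at_pixel(I1, I2w, grad_I2w, y, x, gamma=5.0, beta=100.0, rad=4):
    """Pointwise sketch of F~_MI at pixel (y, x). I2w is the warped image
    I2(x + h(x)) and grad_I2w its gradient (shape (2, ny, nx)), both assumed
    precomputed on the grid."""
    ys = slice(max(y - rad, 0), min(y + rad + 1, I1.shape[0]))
    xs = slice(max(x - rad, 0), min(x + rad + 1, I1.shape[1]))
    J, I = I2w[ys, xs], I1[ys, xs]                                   # neighborhood values
    yy, xx = np.mgrid[ys, xs]
    w = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * gamma))     # spatial weights (G_gamma)
    g2 = np.exp(-(J - I2w[y, x]) ** 2 / (2.0 * beta))                # g_beta in i2
    g1 = np.exp(-(I - I1[y, x]) ** 2 / (2.0 * beta))                 # g_beta in i1
    a = np.sum(w * g2 * J) / np.sum(w * g2)            # conditional average, cf. eq. (6.17)
    A = np.sum(w * g1 * g2 * J) / np.sum(w * g1 * g2)  # joint conditional average, cf. eq. (6.18)
    L = (A - a) / (beta * np.sum(w))                   # cf. eq. (6.19), up to normalization
    return L * grad_I2w[:, y, x]

With gamma = 5 and rad = 4 (a 9 × 9 window), the window matches the values quoted in the footnote below.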

Note however that the integrals involved can no longer be approximated by recursive filtering: the filtering becomes non-stationary. We therefore propose the following algorithm.

Algorithm 7.5 (F̃^l_MI(h))

• Estimate for each y on the discrete grid the value of a(I_2^σ(y + h(y)); y; h) using equation (6.17) on page 106, approximating the integrals by a finite sum on a sufficiently large¹ neighborhood around y.
• Estimate in a similar way the value of A(I_1^σ(y), I_2^σ(y + h(y)); y; h) using equation (6.18) on page 106, of G_γ(x) using equation (4.3) on page 68, and of L^l_{MI,h}(I_1^σ(y), I_2^σ(y + h(y)); y) using equation (6.19) on page 106.
• Estimate F̃^l_MI(h)(y) = L^l_{MI,h}(I_1^σ(y), I_2^σ(y + h(y)); y) ∇I_2^σ(y + h(y)).

7.3.2 Correlation Ratio

Similarly to the previous case, we eliminate the two convolutions in the definition of F^l_CR(h). It remains to estimate the function L^l_{CR,h}(I_1^σ, I_2^σ(Id + h); Id), where μ_{2|1}(i_1; h; x_0) is given by (see lemma 6.45 on page 112):

μ_{2|1}(i_1; h; x_0) = ∫_Ω g_β(i_1 − I_1^σ(x)) I_2^σ(x + h(x)) G_γ(x − x_0) dx / ∫_Ω g_β(i_1 − I_1^σ(x)) G_γ(x − x_0) dx.   (7.4)

However, there is still another difficulty in this case. The value of CR^l(h; x), which is needed in the computation of L^l_{CR,h}, itself requires a convolution with respect to the intensity variable (see e.g. the proof of lemma 6.44 and in particular equation (6.25) on page 111). For this reason, we make another approximation, namely that of replacing

CR^l(h; x) = 1 − ∫_R v_{2|1}(i_1; h; x) p(i_1; x) di_1 / v_2(h; x)

in the definition of L^l_{CR,h} by the function

ρ(i_1; h; x) = 1 − v_{2|1}(i_1; h; x) / v_2(h; x),   (7.5)

where the function v_{2|1}(i_1; h; x) is given by (see the proof of lemma 6.44):

v_{2|1}(i_1; h; x) = S(i_1; h; x) − μ_{2|1}(i_1; h; x)².   (7.6)

¹ In practice we use γ = 5 and a neighborhood of size 9 × 9.

The function S(i_1; h; x), i.e. the conditional second moment ∫_R i_2² P_h(i; x_0) di_2 / p(i_1; x_0), is given by

S(i_1; h; x_0) = β + ∫_Ω g_β(i_1 − I_1^σ(x)) I_2^σ(x + h(x))² G_γ(x − x_0) dx / ∫_Ω g_β(i_1 − I_1^σ(x)) G_γ(x − x_0) dx.   (7.7)

With these approximations, we define

L̃^l_{CR,h}(i; x) = (2 / (G_γ(x) v_2(h; x))) [ μ_2(h; x) − μ_{2|1}(i_1; h; x) + ρ(i_1; h; x) (i_2 − μ_2(h; x)) ],   (7.8)

and

F̃^l_CR(h) = L̃^l_{CR,h}(I_1^σ, I_2^σ(Id + h); Id) ∇I_2^σ(Id + h).

Again, we notice that the Lipschitz-continuity of F̃^l_CR(h) is preserved. We also have in this case a non-stationary filtering process. We therefore propose the following algorithm.

Algorithm 7.6 (F̃^l_CR(h))

• Estimate for each y on the discrete grid the value of G_γ(y) using equation (4.3) on page 68, of μ_2(h; y) and v_2(h; y) using equations (6.22), (6.23) and (6.24) on page 110, of S(I_1^σ(y); h; y) using equation (7.7), and of μ_{2|1}(I_1^σ(y); h; y) using equation (7.4), approximating the integrals by a finite sum on a sufficiently large² neighborhood around y.
• Estimate v_{2|1}(I_1^σ(y); h; y) using equation (7.6), ρ(I_1^σ(y); h; y) using equation (7.5), and L̃^l_{CR,h}(I_1^σ(y), I_2^σ(y + h(y)); y) using equation (7.8).
• Estimate F̃^l_CR(h)(y) = L̃^l_{CR,h}(I_1^σ(y), I_2^σ(y + h(y)); y) ∇I_2^σ(y + h(y)).

7.3.3 Parallel Implementation

Algorithms 7.5 and 7.6 are very well adapted to parallelization, as they both describe a non-stationary filtering procedure. The operations required for computing F̃^l_MI(h) and F̃^l_CR(h) at each voxel are defined from the knowledge of the functions I_1^σ, I_2^σ and h in a relatively large neighborhood around it.

² In practice we use γ = 5 and a neighborhood of size 9 × 9.

We have implemented these two algorithms using the MPI library for parallel execution on a cluster of N_p processors. A master-slave architecture is used, i.e. one of the processors handles a certain amount of global work and distributes local work to the remaining N_p − 1 processors. The distributed work is the computation of F̃^l_MI(h) and F̃^l_CR(h) at each iteration. The remaining operations (mainly the computation of the regularization term and the time-step update) are handled by the master processor. The execution flows for the master processor and for a slave processor are illustrated in figures 7.1 and 7.2. It is assumed that all the processors have access to the data corresponding to I_1^σ, I_2^σ and h at each iteration. This is achieved in practice by special synchronization routines of the MPI library. With this architecture and using a cluster of 24 processors, we have achieved 5 times faster execution times than with the sequential version.

[Flow chart: the master does the initialization work, gives N_v voxels to treat to each of the N_p − 1 slave processors, waits for the results of the next free processor and hands it the next N_v voxels until all voxels have been treated, then does the global work for the iteration, and repeats until the last iteration before outputting the result.]

Figure 7.1: Execution flow for the master processor in the parallel implementation of the matching flow.

Figure 7.2: Execution flow for each slave processor in the parallel implementation of the matching flow. Each slave waits for its next task (a set of N_v voxels to treat), applies Algorithm 7.5 or 7.6 to each of these voxels, and stops when the minimization is over.
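As a rough illustration of the master/slave organization of Figures 7.1 and 7.2, the following sketch distributes blocks of voxels with mpi4py. It is a simplified, hypothetical variant of the scheme described above: it uses one static split per iteration instead of handing the next N_v voxels to the next free processor, and compute_force() and global_update() are placeholders for Algorithms 7.5/7.6 and for the regularization and time-step update.

```python
# Run with at least two MPI processes (one master, one or more slaves).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def compute_force(I1, I2, h, voxels):
    # Placeholder for Algorithm 7.5 or 7.6 applied to a block of voxel indices.
    return voxels, np.zeros((len(voxels), h.shape[-1]))

def global_update(h, force):
    # Placeholder for the regularization term and the time-step update.
    return h

def run(I1, I2, h, n_iter):
    # h is assumed flattened to shape (n_voxels, dim) on every processor.
    for _ in range(n_iter):
        h = comm.bcast(h, root=0)                      # all processors see the current field
        if rank == 0:                                  # master
            blocks = np.array_split(np.arange(h.shape[0]), comm.Get_size() - 1)
            for dest, blk in enumerate(blocks, start=1):
                comm.send(blk, dest=dest)              # give a block of voxels to each slave
            force = np.zeros_like(h)
            for dest in range(1, comm.Get_size()):
                idx, vals = comm.recv(source=dest)     # collect the local forces
                force[idx] = vals
            h = global_update(h, force)                # global work for the iteration
        else:                                          # slave
            blk = comm.recv(source=0)
            comm.send(compute_force(I1, I2, h, blk), dest=0)
    return h
```

The broadcast at the top of each iteration plays the role of the synchronization step mentioned above, so that every processor works on the same I_1^σ, I_2^σ and h.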


Chapter 8
Determining Parameters

This chapter discusses the way in which the different parameters of the algorithms are determined, particularly the smoothing parameter for the Parzen window estimates. The matching algorithms use the following parameters.

• \gamma: This parameter is fixed to 5, with a local window size of 9×9, for the mutual information and the correlation ratio. For the cross correlation, the value of this parameter does not affect the computation time. This important property is due to the fact that the local statistics are calculated using the recursive smoothing filter. Thanks to this property, we have conducted some experiments with different values of this parameter, which have shown that the algorithms are not very sensitive to it. This is the reason why we have fixed it for the mutual information and the correlation ratio. Qualitatively speaking, the local window has to be large enough for the statistics to be significant, and small enough to account for non-stationarities of the density. The value chosen has given good results in practice.

• \beta: This is the smoothing parameter for the Parzen estimates. Unlike the parameter \gamma, determining a good value for this parameter is crucial for the matching results. The determination of this parameter is discussed in more detail in the next section.

• \alpha: This parameter determines the weight given to the regularization term in the energy functional. Since the range of the different matching functions varies considerably, we replace this parameter by another one, noted C, such that \alpha = C\,\kappa, where \kappa is given by \kappa = \|F(h_0)\|,

h_0 being the initial field and F any of the matching functions.

• \sigma: This is the scale parameter. We adopt a multi-resolution approach, smoothing the images at each stage by a small amount. Within each stage of the multi-resolution pyramid, the parameter \sigma is fixed to a small value, typically 0.25 voxels.

Besides these global parameters, one extra parameter is needed for each family of regularization operators.

• \xi: This is the parameter controlling the behavior of the linearized elasticity operator. For \xi close to one, the Laplacian operator becomes dominant, while the operator \nabla(\mathrm{div}(h)) becomes dominant for \xi close to zero. Two experiments in the next chapter show qualitatively the behavior of the elasticity operator with respect to that of the Laplacian. In practice, we fix the value of \xi to 0.5, thus giving the same weight to both operators.

• \lambda: This is the parameter controlling the anisotropic behavior of the Nagel-Enkelmann tensor. We adopt the method proposed by Alvarez et al. [6]. Given s, which in practice is fixed to 0.1, we take the value of \lambda such that

s = \int_\lambda^{+\infty} H_{|\nabla I_1^\sigma|}(z)\, dz,

where H_{|\nabla I_1^\sigma|}(z) is the normalized histogram of |\nabla I_1^\sigma|.

8.1 Determining the Smoothing Parameter

The parameter \beta of the Gaussian kernel is determined automatically. A very large amount of literature has been published on the problem of determining an adequate value for \beta (we refer to [7] for a recent comprehensive study on non-parametric density estimation, containing many references to the aforementioned literature). We adopt a cross-validation technique based on an empirical maximum-likelihood method. We denote by {i_k} a set of m intensity samples (k = 1, ..., m) and take the value of \beta which maximizes the empirical likelihood:

L(\beta) = \prod_{k=1}^{m} \hat P_{\beta,k}(i_k), \quad \text{where} \quad \hat P_{\beta,k}(i_k) = \frac{1}{m - n_k} \sum_{\{s \,:\, i_s \neq i_k\}} G_\beta(i_k - i_s)

and n_k is the number of data samples for which i_s = i_k. We present two examples of the determination of \beta by maximization of the empirical likelihood. Figures 8.1 and 8.2 show the value of the empirical likelihood of the estimated density as a function of the parameter \beta for two different images. Note how different the optimal values are for these two examples. The parameter \beta for the joint intensity function is taken as max(\beta_1, \beta_2), where \beta_1 (resp. \beta_2) is the optimal value obtained by maximization of the empirical likelihood for I_1^\sigma (resp. I_2^\sigma).
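A minimal sketch of this empirical-likelihood criterion, assuming a brute-force O(m²) evaluation on a (sub)sample of intensities and a simple grid search over candidate values of β; the grid search itself is an assumption, since the text only states that the maximizer of L(β) is retained.

```python
import numpy as np

def log_likelihood(samples, beta):
    """Empirical log-likelihood of Section 8.1: for each sample i_k, the leave-out
    Parzen estimate P_hat_{beta,k}(i_k) = (1/(m - n_k)) * sum_{i_s != i_k} G_beta(i_k - i_s).
    samples is a 1-D array of intensity samples."""
    i = np.asarray(samples, dtype=float)
    m = len(i)
    diff = i[:, None] - i[None, :]                        # i_k - i_s for every pair
    G = np.exp(-0.5 * (diff / beta) ** 2) / (beta * np.sqrt(2 * np.pi))
    same = np.isclose(diff, 0.0)                          # pairs with i_s == i_k
    n_k = same.sum(axis=1)
    P_hat = np.where(~same, G, 0.0).sum(axis=1) / np.maximum(m - n_k, 1)
    return np.sum(np.log(np.maximum(P_hat, 1e-300)))

def best_beta(samples, betas):
    # Pick the beta maximizing the empirical likelihood, as in Figures 8.1 and 8.2.
    # Example (hypothetical): best_beta(image.ravel()[::200], np.arange(1.0, 51.0))
    return max(betas, key=lambda b: log_likelihood(samples, b))
```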

Figure 8.1: Density estimation: example of determining \beta by maximization of the empirical likelihood. The curve in the second row shows the value of the empirical likelihood of the estimated density as a function of the parameter \beta, which attains a maximum for \beta ≈ 8. The bottom row presents the raw histogram of the image on the left, and its smoothed version with the optimal value of \beta on the right.

Figure 8.2: Density estimation: second example using maximization of the empirical likelihood. The curve in the second row shows the value of the empirical likelihood as a function of the parameter \beta, which reaches a maximum for \beta ≈ 30. The bottom row presents the raw histogram of the image on the left, and its smoothed version with the optimal value of \beta on the right.


Chapter 9
Experimental Results

This chapter presents experimental results for all the described algorithms using both real and synthetic data. Examples include 2D images for applications in computer vision and 3D images concerning different medical image modalities.

9.1 Classification

We present a total of eight experiments. The six dissimilarity criteria and the two regularization operators are tested. Both 2D and 3D problems are included. Some of the experiments are completely synthetic, others are completely real, and yet others are partly synthetic. Table 9.1 on the following page summarizes how each of these categories is represented in the eight experiments shown. We use the following abbreviations to refer to the three global criteria: GMI, GCR, GCC; the three local criteria: LMI, LCR, LCC; and the two regularization operators: LE (linearized elasticity) and AD (anisotropic diffusion using the Nagel-Enkelmann tensor).

9.2 Description of the Experiments

Experiment 9.1
Similarity measure used: GMI.
Intensity transformation: known.
Geometric transformation: unknown.
Regularization used: LE, AD.
Parameters: \alpha = 10, number of scales = 3.
Computation time: ≈ 3 minutes.
Matching program: MatchPDE.
Related figures: 9.1 to 9.4.

[Table 9.1 indicates, for each of the eight experiments, which of the following characteristics apply: 2D, 3D, known geometric transformation, known intensity transformation, global mutual information, global correlation ratio, global cross correlation, local mutual information, local correlation ratio, local cross correlation, linearized elasticity, anisotropic diffusion.]

Table 9.1: Summary of the characteristics of each of the experiments.

Comments: This experiment shows the behavior of the two different families of regularization operators. Figure 9.1 shows on the first row the images I_1 (on the left) and I_2 (on the right), and on the second row the image I_2 \circ (Id + h^*), where h^* is the displacement field obtained with linearized elasticity (on the left) and anisotropic diffusion (on the right). The displacement fields are shown in Figure 9.2. Figure 9.3 shows the result obtained with the linearized elasticity operator with a value of \xi close to zero on the left, and close to one on the right. Finally, Figure 9.4 shows the determinant of the Jacobian of (Id + h^*), where h^* is the field of Figure 9.3 on the left. The interest of this function is that if it is everywhere positive, then the transformation x \mapsto x + h^*(x) is invertible. This is the case for all the displacement fields shown in this experiment.

Experiment 9.2
Similarity measure used: GCR.
Intensity transformation: unknown.
Geometric transformation: unknown.
Regularization: LE.
Parameters: \alpha = 20, number of scales: 3.
Computation time: ≈ 10 minutes.
Matching program: MatchPDE.
Related figures: 9.5 to 9.8.

Figure 9.1: Experiment showing the behavior of the two regularization operators. See the explanation of Experiment 9.1 on page 143.

Figure 9.2: Displacement field obtained with linearized elasticity (left) and anisotropic diffusion (right).

Figure 9.3: Displacement field obtained with linearized elasticity with a value of \xi close to zero (left), and close to one (right).

Figure 9.4: Determinant of the Jacobian of (Id + h^*), for the displacement field of Figure 9.3 on the left.
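The positivity check on the Jacobian determinant used in this experiment (and in Figures 9.8, 9.13, 9.20 and 9.25) can be performed directly on the discrete field. Below is a minimal sketch for a 2D displacement field; the (ny, nx, 2) array layout and the use of centred finite differences are assumptions.

```python
import numpy as np

def jacobian_determinant(h):
    """det of the Jacobian of x -> x + h(x) for a 2-D displacement field h of
    shape (ny, nx, 2), using centred finite differences (np.gradient).
    The transformation is locally invertible wherever the result is positive."""
    hy, hx = h[..., 0], h[..., 1]
    dhy_dy, dhy_dx = np.gradient(hy)
    dhx_dy, dhx_dx = np.gradient(hx)
    # Jacobian of Id + h is I + Dh.
    return (1.0 + dhy_dy) * (1.0 + dhx_dx) - dhy_dx * dhx_dy
```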

Comments: This experiment shows the matching of a 2D plane extracted from a 3D proton-density (PD) image against a similar T2-weighted 2D plane. An artificial warp was applied to the T2-weighted 2D plane. The deformation is well recovered using the global correlation ratio.

Figure 9.5: Proton density image matching against T2-weighted MRI.

Figure 9.6: Deformation field recovered in the experiment of Figure 9.5.

Figure 9.7: Components of the deformation field recovered in the experiment of Figure 9.5.

Figure 9.8: Determinant of the Jacobian of the deformation field recovered in the experiment of Figure 9.5.

Experiment 9.3
Similarity measure used: LMI, LCR.
Intensity transformation: known.
Geometric transformation: known.
Regularization: LE.
Parameters: \alpha = 10, number of scales: 2.
Computation time: ≈ 30 minutes (2 processors).
Matching program: mpi9pde.
Related figures: 9.9 to 9.13.

Comments: This experiment shows the result of the local mutual information and local correlation ratio on synthetic data. The reference and target image were both taken from the same 2D plane in an MRI data volume. The reference image J was then transformed in the following way (|\Omega|_x is the size of the domain in the x direction):

J'(x, y) = \sin\!\big(2\pi J(x, y)\big)\, \cos\!\left( 2\pi\, \frac{x + y}{|\Omega|_x} \right),

and then linearly renormalized to [0, 1]. Notice that the effect of this manipulation produces a bias in the intensities of the reference image which resembles the real image modality of Experiment 9.4, plus a sort of spatial bias.
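A minimal sketch of this synthetic intensity transformation, assuming the input image J is already scaled to [0, 1], that |Ω|_x is the number of columns, and that the final step linearly renormalizes the result back to [0, 1]; the coordinate convention is an assumption.

```python
import numpy as np

def intensity_bias(J):
    """Apply J'(x, y) = sin(2*pi*J) * cos(2*pi*(x + y)/|Omega|_x) to a 2-D image J
    in [0, 1], then renormalize linearly to [0, 1]."""
    ny, nx = J.shape
    x, y = np.meshgrid(np.arange(nx), np.arange(ny))   # column and row indices
    Jp = np.sin(2 * np.pi * J) * np.cos(2 * np.pi * (x + y) / nx)
    return (Jp - Jp.min()) / (Jp.max() - Jp.min() + 1e-12)
```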

A non-rigid smooth deformation was then applied to the target image. As expected, the global algorithms failed to align these two images, due to the severe non-stationarity of the intensity distributions.

Figure 9.9: Matching with local mutual information and correlation ratio. Reference image (left), deformed image (right).

Figure 9.10: Realigned image and its superposition with the reference image in the experiment of Figure 9.9.

Experiment 9.4
Similarity measure: GMI.
Intensity transformation: unknown.
Geometric transformation: unknown.
Regularization: LE.
Parameters: \alpha = 10, N_s = 4.
Computation time: 25 minutes.
Command line: MatchPDE3D.
Related figure: 9.14.

Figure 9.11: Components of the deformation field applied in the experiment of Figure 9.9.

Figure 9.12: Components of the deformation field recovered in the experiment of Figure 9.9.

Figure 9.13: Determinant of the Jacobian for the deformation field recovered in the experiment of Figure 9.9.

Comments: This example shows an experiment with real MR data of the brain of a macaque monkey. The reference image is a T1-weighted anatomical volume and the target image is a functional, MION-contrast MRI (fMRI). The contrast in this modality is related to the blood oxygenation level. The figure shows the result of the global algorithm. Notice that the alignment of the main axes of the volume has been corrected.

Experiment 9.5
Similarity measure: LCC.
Intensity transformation: unknown.
Geometric transformation: unknown.
Regularization: LE.
Parameters: \alpha = 10, number of scales: 4.
Computation time: 25 minutes.
Command line: MatchPDE3D.
Related figure: 9.15.

Comments: Matching of T2-weighted anatomical MRI against EPI functional MRI, using local cross correlation. We obtain deformations of an amplitude up to 5 voxels, mostly in the phase-encoding direction (horizontal axis of the lower-right view). This solution seems consistent with previous results obtained by Kybic, Thévenaz et al. [49] on the same dataset.

Figure 9.14: Global mutual information with fMRI data. Top row: reference anatomical MRI. Middle row: initial fMRI volume. Bottom row: final (corrected) fMRI volume. The two columns show two different points in the volumes.

Figure 9.15: Matching of T2-weighted anatomical MRI against EPI functional MRI, using local cross correlation (images are courtesy of Jan Kybic, data provided by Arto Nirkko, Inselspital Bern). Top row: reference and deformed image. Bottom row: reference and realigned image.

Experiment 9.6
Similarity measure: LMI, LCR.
Intensity transformation: unknown.
Geometric transformation: unknown.
Regularization: LE.
Parameters: \alpha = 10, number of scales: 1.
Computation time: 30 minutes (24 processors).
Command line: mpi9pde.
Related figure: 9.16.

Comments: Our next experiment shows the result of the local algorithms with real 3D MR data. The reference image is a T1-weighted anatomical MRI of a human brain. The target image is an MRI of the same patient, acquired using a special magnetic field gradient as part of the process of obtaining an image of the water diffusion tensor at each point. Notice that the intensities in this modality are qualitatively close to our simulated experiment. The estimated deformation field has a dominant y component, a property which is physically coherent with the applied gradient. Both MI and CR yielded similar results in this case.

Figure 9.16: Matching of anatomical vs. diffusion-tensor-related MRI, using local mutual information and correlation ratio.

Experiment 9.7
Similarity measure: GMI.
Intensity transformation: known.
Geometric transformation: unknown.
Regularization: LE, AD.
Parameters: \alpha = 10, number of scales: 5.
Computation time: 25 minutes.
Command line: MatchPDE.
Related figures: 9.17 to 9.20.

Comments: This experiment shows a real stereo pair in which the intensities in one of the images were artificially transformed using a sine function. The matching is performed using global mutual information.

Figure 9.17: Stereo matching using global mutual information.

Experiment 9.8
Similarity measure: LCC, GCC.
Intensity transformation: unknown.
Geometric transformation: unknown.
Regularization: LE, AD.
Parameters: \alpha = 10, number of scales: 3.
Computation time: 3 minutes.
Command line: MatchPDE.
Related figures: 9.21 to 9.25.

Comments: This last experiment shows the use of the global and local cross correlation criteria to perform template matching of human faces. In this case the illuminating conditions are the same in both photographs; if they differed, the local algorithms should be used. The different albedos of the two skins create a multimodal situation, and the transformation is truly non-rigid due to the different shapes of the noses and mouths. Notice the excellent matching of the different features. This result was obtained completely automatically, with the same set of parameters as the rest of the experiments, using global mutual information.

Figure 9.18: Deformed image with the displacement field obtained, and its superposition with the reference image.

Figure 9.19: Components of the obtained deformation field in the experiment of Figure 9.17.

Figure 9.20: Determinant of the Jacobian for the obtained deformation field in the experiment of Figure 9.17.

Figure 9.21: Human template matching. Reference (left) and target (right) images.

Figure 9.22: Reference image and deformed target image using the obtained deformation field in the experiment of Figure 9.21.

Figure 9.23: Some corresponding points using the obtained deformation field in the experiment of Figure 9.21.

Figure 9.24: Components of the obtained deformation field in the experiment of Figure 9.21.

Figure 9.25: Determinant of the Jacobian for the obtained deformation field in the experiment of Figure 9.21.

Appendix A
Other Applications

The gradients of the statistical similarity measures studied in the previous chapters may be used in other contexts where the same criteria are applicable. In this appendix we focus on two such applications, namely image segmentation by entropy minimization (Section A.1) and diffeomorphic matching (Section A.2). The goal is more to give the flavour of how the gradients are used in these contexts than to propose specific methods.

A.1 Entropy Minimization for Image Segmentation

The minimization of the entropy associated to an image yields an algorithm which can be viewed as a mean-shift process [29], useful for segmentation purposes. Although we restrict the discussion to the global entropy in order to keep it as simple as possible, it can be generalized to the local case in a straightforward manner, yielding an algorithm which is very close to bilateral filtering [82]. The experimental results are shown using the local version of the energy.

Given an image (we restrict ourselves to the scalar case) I : \Omega \to R, we may associate with it a random variable X_I, whose values are noted i and whose samples are given by the values of I(x), for x \in \Omega. The probability density of X_I may be estimated by

P(i) = \frac{1}{|\Omega|} \int_\Omega G_\beta(I(x) - i)\, dx,

this definition being in agreement with the usual property

\int_R P(i)\, di = \frac{1}{|\Omega|} \int_\Omega \underbrace{\int_R G_\beta(I(x) - i)\, di}_{=\,1}\, dx = 1.

The entropy of X_I is given by

Ent(X_I) = -\int_R P(i) \log(P(i))\, di.

This quantity is now viewed as a functional of I, and the segmentation problem may be defined as finding the function I^* : \Omega \to R satisfying

I^* = \arg\min_{I \in F} Ent(I).

Clearly the global minimum is attained for a constant image, but rather than adding a data-attachment term, we use the non-convexity of this energy to find the closest local minimum, starting with the initial image I_0 as first estimate. Computing the first variation of Ent(I), we obtain

\frac{d}{d\epsilon} Ent(I + \epsilon J)\Big|_{\epsilon = 0} = -\frac{1}{|\Omega|} \int_R \int_\Omega \big(1 + \log(P(i))\big)\, G_\beta'(I(x) - i)\, J(x)\, dx\, di = \int_\Omega \underbrace{\left( -\frac{1}{|\Omega|}\, G_\beta \star \frac{P'}{P} \right)(I(x))}_{\nabla_H(Ent(I))(x)}\, J(x)\, dx,

using \int_R G_\beta'(t - i)\, di = 0 and an integration by parts in i. At the steady state, \nabla_H(Ent(I)) = 0 and thus, necessarily, \big(G_\beta \star \frac{P'}{P}\big)(i) = 0 for all i \in R. Now since we have

\frac{P'(i)}{P(i)} = \frac{1}{\beta^2} \left( \frac{\int_\Omega I(x)\, G_\beta(I(x) - i)\, dx}{\int_\Omega G_\beta(I(x) - i)\, dx} - i \right),

we may try to solve the minimization problem by introducing time and a differentiable function I : R^+ \times \Omega \to R, and defining the solution as the steady state of the initial value problem

\frac{\partial I}{\partial t}(t, y) = -I(t, y) + \frac{\int_\Omega I(t, x)\, G_\beta(I(t, x) - I(t, y))\, dx}{\int_\Omega G_\beta(I(t, x) - I(t, y))\, dx}, \quad \forall y \in \Omega, \qquad I(0, \cdot) = I_0(\cdot).

The second term on the right-hand side of the evolution equation is a weighted mean around the image intensity at y. Using an explicit time discretization with time step equal to one, the resulting algorithm describes a mean-shift process [29]. Figures A.1, A.2 and A.3 show the results obtained by applying this procedure to two different images. The results are shown with the local version of the algorithm, so that two parameters must be specified: \sigma, controlling the size of the local window, and \beta, the smoothing parameter of the Parzen window estimate. The results shown are obtained after convergence for the specified parameters.
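A minimal sketch of the resulting global mean-shift iteration, assuming a brute-force O(N²) evaluation of the weighted means (practical only for small images or for a subsample of pixels); the local version discussed above would additionally weight the sums by a spatial Gaussian of width σ.

```python
import numpy as np

def mean_shift_step(I, beta):
    """One explicit step (time step equal to one) of the entropy-minimizing flow:
    every intensity I(y) is replaced by the G_beta-weighted mean of the image
    intensities around I(y), i.e. a mean-shift update."""
    v = I.ravel().astype(float)
    W = np.exp(-0.5 * ((v[None, :] - v[:, None]) / beta) ** 2)  # G_beta(I(x) - I(y))
    new = (W @ v) / W.sum(axis=1)                               # weighted mean at each y
    return new.reshape(I.shape)

def segment(I, beta, n_iter=20):
    # Iterate until the intensities settle onto a few modes of the Parzen density.
    for _ in range(n_iter):
        I = mean_shift_step(I, beta)
    return I
```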

Figure A.1: Segmentation example. From left to right: original image, noisy image, result with \sigma = 1, \beta = 5.

Figure A.2: Segmentation example. From left to right: original image, noisy image, result with \sigma = 5, \beta = 5.

A.2 Diffeomorphic Matching

The contents of this section are described in more detail in Chef d'Hotel et al. [23]. The gradients of the statistical criteria computed in Chapter 5 may be used in the context of the template matching equations introduced by Christensen [26] and recently generalized by Trouvé [84]. (We also refer to the work of Miller and Younes [58], who describe a general framework related to this type of approach.) In this context, the unknown is considered to be the transformation f = Id + h, rather than the displacement field h. The matching problem is then solved by constructing a one-parameter family of diffeomorphisms f(t) (0 \le t \le 1) and taking f(1) as the solution. This family of diffeomorphisms is constructed as the solution to the initial value problem

\frac{\partial f}{\partial t} = Df\, v, \qquad v(t) = \psi \star F_{f(t)}, \qquad f(0) = Id,   (A.1)

where \psi is a smoothing kernel which ensures the appropriate regularity of the time-dependent vector field v, and F_f is the gradient of the similarity criterion, i.e. it may be replaced by one of the matching functions that we have studied (Equations (5.9) and (5.4) at the end of Chapter 5). Intuitively, one can construct this family by considering at each iteration k the deformed template I_2^{\sigma,k} = I_2^\sigma \circ f_k and computing F_f^k|_{f = Id} between I_1^\sigma and I_2^{\sigma,k}. After the regularization, which yields v_k = \psi \star F^k, and for a sufficiently small \delta t, the transformation Id + \delta t\, v_k is a diffeomorphism. Composing f_k (by right-composition) with this transformation gives an efficient scheme which is consistent with the continuous flow (A.1):

f_{k+1} = f_k \circ (Id + \delta t\, v_k).

Figure A.3: Segmentation example. From left to right: original image, result with \sigma = 0.5, \beta = 5.

In summary, we have the following algorithm.
1. Set k = 0, f_0 = Id.
2. Set I_2^{\sigma,k} = I_2^\sigma \circ f_k.
3. Set b_k = F_{crit}(I_1^\sigma, I_2^{\sigma,k})|_{h=0} for some statistical criterion crit.
4. Set v_k = \psi \star b_k (letting for instance \psi be a Gaussian kernel).
5. Set f_{k+1} = f_k \circ (Id + \delta t\, v_k).
6. Set k = k + 1 and go to step 2.

Figures A.4 to A.7 show the result of applying this algorithm for two different criteria, namely local mutual information and local cross correlation. The first experiment (Figure A.4) is the same as Experiment 9.3, using local mutual information. As shown in the figure, the applied displacements are well recovered. Figures A.5, A.6 and A.7 show results obtained with the local cross correlation criterion in conjunction with the algorithm described above. Three different random smooth deformations are applied to an inversion recovery echo planar image of the brain of a macaque monkey. This deformed image is superimposed in green over the anatomical MRI of the same monkey (in magenta). As shown in the figures, large displacements are consistently recovered with this approach.
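A minimal sketch of steps 4 and 5 of this algorithm for 2D images, assuming f is stored as a coordinate map of shape (2, ny, nx), that ψ is a Gaussian kernel, and that the right-composition is evaluated by linear interpolation; these representation choices are assumptions and not the implementation of [23].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def smooth(b, sigma_psi):
    # Step 4: v_k = psi * b_k with a Gaussian psi, applied componentwise.
    return np.array([gaussian_filter(b[c], sigma_psi) for c in range(2)])

def compose_step(f, v, dt):
    """Step 5: f_{k+1} = f_k o (Id + dt * v_k), i.e. sample the coordinate map f
    at the points x + dt * v(x), using linear interpolation."""
    grid = np.array(np.mgrid[0:f.shape[1], 0:f.shape[2]], dtype=float)
    coords = grid + dt * v
    return np.array([map_coordinates(f[c], coords, order=1, mode='nearest')
                     for c in range(2)])
```

Iterating smooth() on the force b_k of the chosen criterion and then compose_step() reproduces the compositional update f_{k+1} = f_k ∘ (Id + δt v_k) above.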

Figure A.4: Diffeomorphic matching using local mutual information. First row: reference and target images. Second row: corrected target image (left) and its superposition with the reference image (right). Third and fourth rows: horizontal and vertical components of the estimated (left) and true (right) deformation fields (iso-level 3.4 is outlined).

Figure A.5: Diffeomorphic matching using local cross correlation. Left: first random smooth deformation used as initial state. Right: deformed template after convergence of the algorithm.

Figure A.6: Diffeomorphic matching using local cross correlation. Left: second random smooth deformation used as initial state. Right: deformed template after convergence of the algorithm.

Figure A.7: Diffeomorphic matching using local cross correlation. Left: third random smooth deformation used as initial state. Right: deformed template after convergence of the algorithm.
