
École Normale Supérieure de Cachan

THESIS presented by Agnès DESOLNEUX for the degree of Docteur de l'École Normale Supérieure de Cachan, speciality: Mathematics.

Événements significatifs et applications à l'analyse d'images
(Meaningful events and applications to image analysis)

Defended on 8 December 2000 before the jury composed of:
M. Jean BRETAGNOLLE (referee), M. Guy DEMOMENT, M. Yves MEYER, M. Lionel MOISAN, M. Jean-Michel MOREL (advisor), M. David MUMFORD (referee), M. Alain TROUVÉ (referee).


Abstract

Gestalt theory

At the beginning of the twentieth century in Germany, the Gestalt school undertook the analysis of visual perception: the aim was to state laws for the constitution of objects in the visual field, explaining how and why we see shapes, things and their motions. For Wertheimer [79], the founder of geometric gestaltism, the laws of constitution of visual objects are determined by groupings of points according to geometric criteria, this grouping principle being itself recursive. The objects formed in this way are called gestalts (we keep the German term gestalt; its translation would be, at once, shape, geometric structure, group). The main geometric grouping criteria are the following: alignment, identical colour, proximity, symmetry, good continuation, parallelism, common orientation, common shape, constant width, closedness, convexity, etc.

Figure 1: Example of points grouped according to the criteria of alignment and identical colour: vertical segments are formed on the left and horizontal ones on the right, which are in turn grouped by parallelism.

The books by Metzger (Gesetze des Sehens, 1975, [53]) and Kanizsa (La grammaire du voir, 1996, [40]) present the work of the Gestalt school and illustrate it with a large number of sketches and figures of visual illusions. Most of these figures are very simply made of circles, triangles and rectangles. The grouping criteria and the visual objects they form are all purely geometric: the point here is the analysis of visual perception, which precedes any recognition process.

Figure 1 shows sketches taken from Kanizsa's book, illustrating a few grouping criteria: alignment, identical colour and parallelism.

In the research work presented here, we try to give a mathematical framework to the geometric grouping phenomenon described by Gestalt theory. A grouping such as aligned points or parallel segments will be called a meaningful event. The events considered will always be geometric. We first study the detection of one of the simplest gestalts: alignment. Then, since the segments formed in this way (by grouping points according to the alignment criterion) each have a length and an orientation, we study, through the definition of the meaningful modes of a histogram, the possibility of grouping them according to criteria such as parallelism or equal length. In the last chapter, we also study the detection of the strong contrast gestalt.

The Gestalt theories of perception have sometimes been summarized by the formula the whole masks its parts, a reference to the masking principle. When points are assembled according to the stated grouping laws, they form a whole; only the parts of this whole that take part in the visual construction of the object are perceptible. The other parts can be thought of, but they are not visible: they are masked. For instance, the letters D and P are parts of the letter R, but they are not perceived, because they do not take part in the global explanation of the object. Likewise, in Figure 2 the triangle is no longer perceived, because its base is masked by a set of parallel segments. In the present work, we also try to give a mathematical model of the masking principle, by introducing the notion of maximal meaningful event. For instance, in the case of a square, (long enough) parts of one of its sides are meaningful alignments, but only the side itself is a maximal meaningful alignment.

Figure 2: Example of the masking phenomenon: in the right-hand figure, the base of the triangle is no longer perceived, because it is masked by the set of parallel segments.

The genericity principle

Our visual perception works in such a way that, when we observe images of objects, we consider that they result from projections of objects of space seen from a generic viewpoint.

For instance, when we observe two parallel lines in the plane and try to reconstruct their relative position in space, our first answer is that these two lines are the projection of two coplanar, parallel lines of space. Indeed, for two arbitrary lines, only a very particular viewpoint, called accidental, makes them appear parallel; the other, so-called generic views show two intersecting lines (see Figure 3). Likewise, for a solid cube, the generic views show seven of its vertices; an accidental view is, for instance, the one obtained by looking at the cube perpendicularly to one of its faces, so that only four vertices remain visible.

In their article entitled On view likelihood and stability [78], Daphna Weinshall and Michael Werman propose a way to measure the genericity of a viewpoint. To do so, they define the notion of stability of a view of a fixed object: the most stable (most generic) views are those that vary the least when the viewing angle is changed slightly. In the case of the cube, for example, the view perpendicular to one of its faces is highly unstable, since as soon as we move our head a little, new vertices appear.

Figure 3: Only an accidental view of two arbitrary lines of space makes them appear parallel.

This genericity principle is already present in Helmholtz's book Physiological Optics (original edition published in 1867), first in the form of the idea of searching for the best viewing position of an object: "If, moreover, it matters to us to grasp as well as possible the general shape and the dimensions of the object, we place ourselves so that, without moving the head, we can run our gaze over its whole surface, and so that, in addition, the dimensions we wish to compare present themselves as symmetrically as possible." Helmholtz then defends the idea that our perceptions are guided by experience.

The fundamental basis of the inductions we make is the relation of cause and effect, which he also calls the causal law (the same causes must produce the same effects). For our perception, this translates into: "When we have very often seen two natural phenomena occur together, they appear to us to be necessarily linked to one another, and we conclude that they have a common cause."

Let us come back to the example of the two parallel lines. Since it is unlikely that they come from the projection of two arbitrary, independent lines of space, we perceive them as a single group coming from one and the same solid object: the lines are no longer independent, they are bound together. From Helmholtz's point of view, this can also be expressed as follows: we know from experience that, most of the time, when we see two parallel lines they come from one and the same object.

Another example of application of the genericity principle is given by Figure 4: the Necker cube and the Kopfermann illusion (see [2]). The left-hand figure A shows the perspective view of a transparent cube (Necker observed that there are two possible interpretations, depending on which face is put forward in space). A singular viewpoint of the cube can lead the observer to see figure B, but in that case the cube is no longer recognized as such (Kopfermann illusion). The explanation is given by the genericity principle: in the accidental view of the cube, some edges become aligned and are therefore spontaneously perceived as parts of a single straight line. The description of the figure then becomes a regular hexagon divided by three lines.

Figure 4: The Necker cube and the Kopfermann illusion. Figure A shows a perspective view of a cube. A singular viewpoint of the cube can lead to figure B, but then the cube is no longer recognized as such. This can be explained by the genericity principle.

In what follows, we shall also call the genericity principle the Helmholtz principle. It can be stated as follows: we group two elements together if their placement has a low probability of resulting from an accidental arrangement. This principle is qualitative and does not directly yield the corresponding mathematical model: how should the probability of resulting from an accidental arrangement be computed? What does low mean?

These are the questions we propose to answer. The idea of giving a quantitative framework to this principle has already been put forward several times in image processing, notably by David Lowe in his book Perceptual Organization and Visual Recognition [48], where he explains that one must determine the probability that each relation in the image could have arisen by accident. Naturally, the smaller this value, the more likely the relation is to have a causal interpretation. (...) The most general and obvious assumption we can make is to suppose that the objects are independently distributed in space, which implies that their projections are independently distributed in the image. (...) For instance, if two lines are parallel up to a precision of 5 degrees, then the probability that this relation arose by accident from two independent objects is 5/180 = 1/36.

Shannon theory for images

In Lowe's example above, if we try to compute the probability that the two lines are exactly parallel, we find zero. But this assumes that the angle between the lines can be measured with infinite precision! Neither our eye nor the images at our disposal have such infinite resolution. Indeed, natural digital images are generated as follows: let s(x) be a source image, assumed to have infinite resolution. An optical convolution is applied to s, leading to the smoothed image k ∗ s. This image is then sampled on the regular grid formed by the sensors (the human retina is covered with a hexagonal array of sensors that count photons). The final image u can therefore be modelled as u = (k ∗ s) · Π, where Π is the Dirac comb associated with the sensor grid. By the Shannon-Whittaker theorem, if the sampling grid is fine enough, the continuous image k ∗ s can then be recovered from the discrete image u, using Shannon interpolation with a basis of cardinal sine functions.

Thus, in all that follows, we shall always attach a certain precision to the geometric events considered and to the measured values (a precision due to quantization and to sensor noise). For instance, line segments will have a certain thickness (that of the pixel), and the precision of their orientation will consequently be inversely proportional to their length. The observed events will therefore be of the type: points aligned up to a certain precision p, or lines parallel up to a certain angular precision α. Moreover, the fact that images are quantized both in space and in grey level gives a finite value to quantities such as the number of lines, the number of circles or the number of level lines, which allows us to work within the framework of discrete probabilities.

Meaningful events

The genericity principle invites us to reason as follows: let us compute probabilities as if the positions of the objects were independent and uniformly distributed.

For example, consider a set of ten small black disks on a white sheet of paper, and suppose that six of these disks are aligned (up to some precision). We can then compute the probability P0 of such an event, under the preliminary assumption that the positions of the disks are independent and uniformly distributed on the sheet. If this probability P0 is small enough, we shall say that the event is meaningful. Of course, it remains to fix the threshold T such that P0 ≤ T implies the event is meaningful. To do so, we shall adopt the following general definition: an event of the type such a configuration of points has such a property is ε-meaningful if the expectation of the number of occurrences of this event in an image is less than ε. Why such a definition? Because it guarantees that, on average, we shall observe fewer than ε ε-meaningful events occurring by chance in an image; in other words, we want a number of false alarms smaller than ε. Thus, meaningful events are rare events, which cannot occur by chance more than ε times on average per image. By default, we shall set ε equal to 1 (i.e. we want less than one error per image).

The preceding definition is very general and abstract, so let us illustrate it with a simple example, which already shows the difficulties we shall have to face. Consider a binary image of size 12 × 12: each pixel is either white or black. Consider then the following event: a 4 × 4 square of pixels all of the same colour, black (see Figure 5).

Figure 5: A black 4 × 4 square in a binary 12 × 12 image.

By the genericity principle, we compute the probability of this event as if the pixels were independent and took the values black and white at random, each with probability 1/2. The probability that a given 4 × 4 square is entirely black is then (1/2)^16, and the expectation of the number of occurrences of this event in the image is simply the number of 4 × 4 squares in the image multiplied by (1/2)^16, that is, 9 · 9 · (1/2)^16. Since this number is much smaller than 1, we say that the event is meaningful. Yet this very simple example already raises the following problems.

1. Too many meaningful events? Indeed, for instance, every 3 × 3 square included in the black square is also meaningful. We shall see how the notion of maximal meaningful event can be a way to solve this problem.

2. The problem of the a priori/a posteriori definition of the event: if we consider an arbitrary 4 × 4 pattern in a binary image of size 12 × 12, then the expectation of the number of occurrences of this event is also 9 · 9 · (1/2)^16, so this event is meaningful too. The answer to this problem is the following: we must define the geometric event a priori. The event cannot be defined from the image itself; it must instead belong to a fixed list of geometric events, namely those determined by Gestalt theory.

3. The problem of the geometric definition of the event: if we regard the black square no longer as a square but as a convex set, then the expectation of the number of occurrences of this event in the image becomes (1/2)^16 multiplied by the number of convex sets of size 4 × 4 in the image. The event may then no longer be meaningful!
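The expectations in this toy example are easy to check numerically. The short Python sketch below is an added illustration (not part of the thesis); it simply evaluates the expected number of monochrome squares under the independence hypothesis.

```python
def expected_occurrences(n, s):
    """Expected number of all-black s x s squares in an n x n binary image
    whose pixels are independently black or white with probability 1/2."""
    return (n - s + 1) ** 2 * 0.5 ** (s * s)

print(expected_occurrences(12, 4))  # 9 * 9 * (1/2)**16 ~ 0.0012 < 1: the event is meaningful
print(expected_occurrences(12, 3))  # 10 * 10 * (1/2)**9 ~ 0.195 < 1: 3 x 3 squares are meaningful too
```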

A different statistical approach

One of the main problems in all statistical approaches to image analysis is the choice of the a priori probability distribution followed by natural images. For instance, in the Bayesian model of S. Geman and D. Geman [29], given an observed degraded image, denoted obs, the aim is to recover the original image I, defined as the most probable image given the observed one. Using Bayes' rule, this problem reduces to finding the Maximum A Posteriori (MAP) of
$$P[I \mid obs] = \frac{P[obs \mid I]\, P[I]}{P[obs]}.$$
The term P[obs | I] represents the degradation model (for instance a Gaussian noise); it is the counterpart of the fidelity term in variational approaches to image segmentation or restoration (Mumford and Shah [58], Blake and Zisserman [6]). The term P[I] is the a priori probability distribution on images, the counterpart of the regularity term of the variational formulations. Its choice is generally arbitrary and depends on the kind of regularity desired: for instance, the L1 norm of the gradient, the L2 norm of the gradient, or curvature terms...

The probability distribution P[I] can also be learned by studying statistics over a large set of natural images. This approach is in particular the one recently developed by Zhu, Wu and Mumford ([83], [84], [85]) in their model called FRAME (Filter, Random field, And Minimax Entropy): from a set of statistics observed on a large set of natural images (obtained through a family of appropriate filters), they build a probability distribution p(I) using the minimax entropy principle. Similarly, Zhu, in an article entitled Embedding Gestalt Laws in Markov Random Fields [86], seeks to capture the characteristics of the shapes of 2D objects (in particular the characteristics described by the gestaltists, such as co-linearity or co-circularity), in order to then derive an a priori probability distribution on the set of planar shapes.

Our approach is different, indeed completely opposite. Instead of trying to determine which probability distribution natural images follow, we shall on the contrary consider an a priori model that is naive and wrong, and objects will then be defined as rare events with respect to this model. This a priori model is provided by the genericity principle: we reason as if the points had independent, uniformly distributed characteristics (grey level, orientation). We then define gestalts as counterexamples (that is, exceptional events of very low probability) to this uniform-noise hypothesis.

Programme

The general goal of our research programme is to be able, given an image, to produce the list of the gestalts it contains. The first step is to build elementary groupings of points from the criteria defined by Gestalt theory. This step already yields a large number of objects: line segments, curves, convex curves, spots of homogeneous grey level, etc. For each grouping criterion (alignment, good continuation, same colour), we have to determine the appropriate mathematical framework. Indeed, we must first define precisely the event under consideration: which configuration of points and which grouping criterion. Then we must state the a priori probabilistic hypothesis whose best counterexamples we shall look for (of the type independence and uniform distribution). From this model, we can then compute the probability of the observed event, as well as the expectation of the number of occurrences of such an event in the image: if this number is less than ε, the event is said to be ε-meaningful. Since a very meaningful event entails the existence of many sub-events that are themselves meaningful, we need to define a notion of maximality. This is done using the natural inclusion relation determined by the type of configuration of points considered: an event is said to be maximal meaningful if it is meaningful and if it neither contains nor is contained in an event more meaningful than itself. At the end of this first step, we shall therefore have a list of elementary, partial gestalts.

The second step must be devoted to the construction of higher-order gestalts, formed in a pyramidal way from the elementary objects detected previously. Indeed, the elementary objects each have one or several characteristics (position, orientation, grey level, shape), so they can be grouped in the same way as the points were in the first step. A group formed in this way has in turn its own characteristics: those proper to it, but also those it has inherited. For example, in the small figure below:

we first detect the small elementary vertical segments, then we group them together by parallelism, by constant width (the distance between two consecutive segments is constant), and also by the fact that they all have the same black colour. The result of this grouping is the formation of a larger segment, which inherits the black colour but has an orientation of its own, different from that of the elementary segments composing it.

Detailed summary of the thesis

In the first, introductory chapter, we present the motivations of this work on meaningful events in an image. In particular, we explain why variational approaches to image segmentation, as well as methods such as the Hough transform, are not fully satisfactory: they fail to prove that the objects or regions they detect really exist. We then present the theories of the Gestalt school on visual perception and the genericity principle, also called the Helmholtz principle, and we end the introduction with the general definition of the notion of meaningful event.

The second chapter is entirely devoted to one of the most important gestalts: alignment. At each point of an N × N image, we compute a direction (defined as orthogonal to the gradient computed on a 2 × 2 neighbourhood). In Section 2.1, we give the definition of a meaningful segment. For a segment S of length l counted in independent points (i.e. points at distance 2 from one another), we count the number k of points, among the l, whose direction is aligned with that of the segment S up to a fixed precision p. Let P(k, l) be the probability of the event at least k points aligned among l, at precision p. Since the points are assumed to be independent and their directions uniformly distributed, the probability of this event is simply given by the binomial law:
$$P(k, l) = \sum_{i=k}^{l} \binom{l}{i} p^i (1-p)^{l-i}.$$
Since the total number of segments in the image is N^4, we obtain the following definition: the segment S is ε-meaningful if and only if N^4 P(k, l) < ε.

In Section 2.2, we associate with each segment S its number of false alarms, defined by NF(S) = N^4 P(k, l), and we give some of its elementary properties. The number of false alarms is a measure of meaningfulness: the smaller it is, the more meaningful the segment.
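As an added illustration (the function names are ours and the code is only meant to mirror the definitions above, not the thesis implementation), the number of false alarms of a segment and the corresponding meaningfulness threshold can be computed directly from the binomial tail:

```python
from math import comb

def binomial_tail(l, k, p):
    """P(k, l): probability of at least k aligned points among l independent points,
    each aligned with probability p."""
    return sum(comb(l, i) * p**i * (1 - p)**(l - i) for i in range(k, l + 1))

def nfa_segment(N, l, k, p=1/16):
    """Number of false alarms NF(S) = N^4 P(k, l) of a segment of an N x N image."""
    return N**4 * binomial_tail(l, k, p)

def threshold_k(N, l, p=1/16, eps=1.0):
    """Smallest k such that the segment is eps-meaningful, i.e. N^4 P(k, l) < eps."""
    for k in range(l + 1):
        if nfa_segment(N, l, k, p) < eps:
            return k
    return None  # no value of k makes a segment of this length meaningful

# Example: minimal number of aligned points for a segment of 100 independent
# points to be 1-meaningful in a 512 x 512 image.
print(threshold_k(512, 100))
```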

In Section 2.3, we look for necessary and sufficient conditions on k and l for the segment S to be meaningful. We also give asymptotic estimates (when l is large), as well as inequalities, for the threshold k(l), defined as the smallest integer k such that N^4 P(k, l) < ε (that is, the minimal number of aligned points that a segment of length l must contain in order to be ε-meaningful). The result roughly states that
$$k(l) \simeq p\,l + C \sqrt{l \ln \frac{N^4}{\varepsilon}}.$$

In Section 2.4, we extend the function (k, l) → P(k, l) to continuous values of k and l. This allows us to derive properties of meaningful segments, and in particular the following result about the change of scale. Let S be a meaningful segment of an N × N image, let l be its length and k the number of aligned points it contains. We then increase the resolution of the image by a factor λ > 1, so that the new image has size λN × λN. The considered segment becomes S_λ, of length λl and containing λk aligned points (we assume here that the density of aligned points is invariant under the change of scale). We then prove the following result:
$$NF(S_\lambda) = (\lambda N)^4 P(\lambda k, \lambda l) < N^4 P(k, l) = NF(S).$$

In Section 2.5, we introduce the notion of maximal meaningful segment: a segment is maximal meaningful if and only if it is meaningful and it neither contains nor is contained in a more meaningful segment. Our main conjecture is then that two maximal meaningful segments belonging to the same line have an empty intersection. More precisely, our conjecture is: if A and B are two meaningful segments of the same line, then
$$\min\big(NF(A \cup B),\, NF(A \cap B)\big) < \max\big(NF(A),\, NF(B)\big).$$
This result is proved asymptotically. It is also proved for the normal approximation and for the large deviations approximation of the binomial law.

In Section 2.6, we present experimental results on various kinds of images, and we also describe the detection algorithm used. In Section 2.7, we discuss the problem of the choice of the precision p. We show in particular that when the precision p is decreased, alignments become less and less meaningful; consequently, we shall always consider alignments at the coarse precision p = 1/16. Finally, in Section 2.8, we show how everything that has been done for segments extends immediately to arcs of circles.

In the third chapter, we study the problem of computing the orientation at each point of a discrete, quantized image. The gradient at a point is computed on a 2 × 2 neighbourhood, as the gradient of the bilinear interpolation at the centre of the window. We first show (Section 3.2) that this way of computing the gradient creates no bias in the histogram of orientations (for a Gaussian noise, the orientation is uniformly distributed on [0, 2π]). We then show that, on the contrary, grey-level quantization has a disastrous effect, the extreme case being the binary case, where the only possible orientations are the multiples of π/4. In Section 3.3, we propose a solution to dequantize the orientation: a translation by (1/2, 1/2) performed in the Fourier domain, under the assumption, by Shannon's theorem, that the image is well sampled.
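A possible numpy implementation of this (1/2, 1/2) Fourier translation is sketched below. It is an added illustration, not the thesis code, and it assumes the image is well sampled so that the sub-pixel translation is legitimate.

```python
import numpy as np

def fourier_translate_half_pixel(u):
    """Translate the image u by (1/2, 1/2) by multiplying its Fourier coefficients
    by the corresponding phase factors (valid for a well-sampled, periodic image)."""
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx)   # horizontal frequencies, in cycles per pixel
    fy = np.fft.fftfreq(ny)   # vertical frequencies
    phase = np.exp(-2j * np.pi * (0.5 * fx[None, :] + 0.5 * fy[:, None]))
    return np.real(np.fft.ifft2(np.fft.fft2(u) * phase))

# After this translation the grey levels are no longer quantized, which removes
# the bias of the orientation histogram described above.
```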

We study the effects of this translation, and we explain why it makes it possible to dequantize the orientation. The main bias on the orientation comes from the regions of the image where the gradient has a small norm. We show that, in these regions, the (1/2, 1/2) translation transforms the quantized values u(x+1) − u(x) into values distributed over all of R according to a quasi-Gaussian law (which restores a uniform orientation). The chapter ends with a few experiments (Section 3.4), which clearly show the necessity of this study of the computation of orientation in a quantized image.

In the fourth chapter, we study histograms and, more precisely, the possibility of defining the notion of meaningful modes of a histogram. This step is important in the pyramidal construction of gestalts, since it will allow us to group objects already built (for instance segments) according to criteria such as same colour, same length or same orientation. We first consider discrete histograms: let there be a finite number M of points, each with a value in the set {1, 2, ..., L}. Our a priori model then rests on the following probabilistic hypothesis: the M points have independent values, uniformly distributed on the set {1, 2, ..., L}. For every interval [a, b] ⊂ [1, L], we define its relative length by p(a, b) = (b − a + 1)/L (it represents the a priori probability that a point has its value in [a, b]), and its relative mass by r(a, b) = k(a, b)/M, where k(a, b) is the number of points, among the M, whose value lies in [a, b] (this relative mass r(a, b) represents the a posteriori probability that a point has its value in [a, b]). The probability that at least k of the M points have their value in the interval [a, b] is the tail of the binomial law with parameters M and p(a, b), denoted B(M, k, p(a, b)). The total number of intervals is L(L + 1)/2, so we define the number of false alarms of an interval [a, b] by
$$NF([a, b]) = \frac{L(L+1)}{2}\, B(M, k(a, b), p(a, b)).$$
Using Hoeffding's inequality as well as the large deviations theorem (justified by the fact that the number M is usually very large, for instance when it is the number of pixels of an image), we adopt the following definitions in all that follows. For every interval [a, b], we define its relative entropy H([a, b]) by
$$H([a, b]) = r(a, b) \log \frac{r(a, b)}{p(a, b)} + (1 - r(a, b)) \log \frac{1 - r(a, b)}{1 - p(a, b)}.$$
This relative entropy can also be seen as the Kullback-Leibler distance between the Bernoulli distributions with respective parameters r(a, b) and p(a, b). An interval [a, b] is said to be ε-meaningful if and only if r(a, b) > p(a, b) and
$$H([a, b]) > \frac{1}{M} \log \frac{L(L + 1)}{2\varepsilon}.$$
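The following Python sketch is an added illustration of the criterion just stated (the function names and the toy histogram are ours); it lists the ε-meaningful intervals of a discrete histogram using the relative entropy threshold.

```python
from math import log

def relative_entropy(r, p):
    """Kullback-Leibler divergence between Bernoulli(r) and Bernoulli(p)."""
    h = 0.0
    if r > 0:
        h += r * log(r / p)
    if r < 1:
        h += (1 - r) * log((1 - r) / (1 - p))
    return h

def meaningful_intervals(counts, eps=1.0):
    """eps-meaningful intervals [a, b] (0-based, inclusive) of a discrete histogram,
    under the hypothesis of M values independently and uniformly spread over L bins."""
    L, M = len(counts), sum(counts)
    threshold = log(L * (L + 1) / (2 * eps)) / M
    result = []
    for a in range(L):
        k = 0
        for b in range(a, L):
            k += counts[b]
            p = (b - a + 1) / L   # relative length: prior probability of [a, b]
            r = k / M             # relative mass: observed proportion of points in [a, b]
            if r > p and relative_entropy(r, p) > threshold:
                result.append((a, b))
    return result

# Example: a histogram with a clear peak on bins 2 and 3.
print(meaningful_intervals([1, 0, 40, 42, 1, 0, 2, 1]))  # (2, 3) is among the detected intervals
```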

In the same way as we needed to introduce the notion of maximal meaningful segment, we define here the notion of maximal meaningful interval: an interval [a, b] is maximal meaningful if and only if it is meaningful and if, for every interval J included in [a, b] or containing [a, b], H(J) < H([a, b]). We then prove the following theorem: if I and J are two maximal meaningful intervals, then I ∩ J = ∅. Having thus defined intervals that contain significantly more points than the average, we also define the notion of an interval containing significantly fewer points than the average: an interval [a, b] is an ε-meaningful gap if and only if r(a, b) < p(a, b) and
$$H([a, b]) > \frac{1}{M} \log \frac{L(L + 1)}{2\varepsilon}.$$
The final definition is then that of a meaningful mode: it is a meaningful interval that contains no meaningful gap. We then show experimental results (a comparison between meaningful intervals and meaningful modes) on grey-level histograms coming from various images.

In Section 4.2, we extend the preceding definitions to the case of continuous histograms. In Section 4.3, we prove some properties of meaningful intervals. We show in particular that if [a, b] is a maximal meaningful interval of a continuous histogram h, then h(a) = h(b), and h is increasing at the point a and decreasing at the point b. We also give the analogue of this result in the discrete case. Then, in Section 4.4, we raise the problem of the choice of the reference interval: should the support of the histogram be taken as the reference interval or not? Section 4.5 is devoted to experiments: we show there how the notion of meaningful mode plays a fundamental role in the gestalt grouping process. We use the maximal meaningful segments defined in the second chapter and group them, by means of the histogram of their lengths or of their orientations, according to criteria such as equal length or parallelism.

In the fifth chapter, we study a third type of meaningful event: strong contrast along a curve. The curves considered are the level lines: at every point they are orthogonal to the gradient, so they are natural candidates when one is interested in contrast. Let u be a digital image and let N_ll be the number of closed level lines it contains. The contrast at a point x of a level line is defined as |∇u(x)|. We then consider the following event: at every point of a closed level line of length l, the contrast is larger than some value µ. The problem of the choice of the a priori probability then arises: what is the probability P(µ) for a point to have a contrast larger than µ? Since the uniform distribution makes no sense here, we adopt the empirical distribution provided by the image itself, that is,
$$P(\mu) = \frac{1}{N^2}\, \#\{x : |\nabla u(x)| \ge \mu\}.$$
For a closed level line C, of length l and minimal contrast µ, we define its number of false alarms by
$$NF(C) = N_{ll}\, P(\mu)^l.$$
The curve C is then ε-meaningful if and only if its number of false alarms is smaller than ε.
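As an added illustration (the function name and arguments are ours, not the thesis code), the number of false alarms of a contrasted level line can be evaluated from the empirical contrast distribution of the image:

```python
import numpy as np

def contrast_nfa(grad_norms, n_level_lines, mu, length):
    """NF(C) = N_ll * P(mu)^length for a closed level line of 'length' pixels whose
    contrast is everywhere >= mu, where P(mu) is the empirical proportion of pixels
    of the image with gradient norm >= mu."""
    grad_norms = np.asarray(grad_norms, dtype=float).ravel()
    p_mu = np.mean(grad_norms >= mu)   # empirical P(mu)
    return n_level_lines * p_mu ** length

# The level line is eps-meaningful when contrast_nfa(...) < eps.
```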

In the same way, we define the notion of meaningful edge, the candidates here being pieces of level lines. The inclusion relation between level sets induces a natural comparison relation between level lines, which allows us to define, in Section 5.2, the notion of maximal meaningful curve. In Section 5.3, we show the results obtained on various images, and we compare them with those obtained by methods such as segmentation with the Mumford-Shah functional or the Canny-Deriche edge detector.

Perspectives

In the thesis work presented here, we describe a method for detecting sure geometric structures in an image. To do so, we use a genericity principle (also called the Helmholtz principle), which allows us to avoid the problematic question: which probability distribution do natural images follow? Indeed, the purpose of the Helmholtz principle is to provide not a model of images, but a model against which our perception analyses images, a contrario. We introduce the notions of ε-meaningful event and of maximal meaningful event for several types of gestalts: alignment, histogram modes and contrast. The results obtained are encouraging: in every case, the detected gestalts are perceptually present and, conversely, no perceptually present gestalt is missed by the algorithms.

Before we can really start the pyramidal construction of gestalts, we shall first need a larger number of elementary objects: in particular curves, corners, convex sets, regions of homogeneous grey level, etc. Some of these objects can already be defined from elementary segments: for instance, a smooth curve can be defined as a curve admitting at each point an elementary segment as tangent, and a convex curve as a smooth curve whose tangent segments always turn in the same direction.

Histogram modes are an important step for the recursive construction of gestalts, but the definition of meaningful mode we have given is not yet fully satisfactory. Indeed, the modes obtained do not allow a fine segmentation of the histogram. For instance, a small isolated peak will not be detected, whereas being isolated should on the contrary make it very meaningful: this happens, for example, when a horizontal segment stands alone among vertical segments (see Figure 6), or when a red square stands alone among many black squares. We shall therefore have to define the modes of a histogram more finely. We shall probably also have to extend our definitions to two-dimensional histograms, in order to create groupings according to two characteristics at the same time (for instance same colour and same orientation).

Figure 6: The vertical segment stands alone among a set of horizontal segments, which makes it very meaningful with respect to the rest of the image.

During the detection of alignments, we encountered the problem of bitangency: when two smooth convex curves are close to each other, we detect alignments that are bitangent to these two curves. These alignments are not false, but the reason for their detection is not the presence of a pure alignment; it is rather the existence of the two smooth curves. Similarly, for a blurry image, or an image containing a smooth gradation, we find many alignments which, here again, are not the best explanation, since they are consequences of the presence of a more global gestalt. Only an analysis of all the gestalts present can eventually allow us to find the best explanation of the image. This principle is present in Gestalt theory under the name of the principle of articulation without remainder: according to it, the best explanation of an image is the most global one, in the sense that it is the one that explains the whole image with the smallest possible number of gestalts. This problem of finding the best global explanation will arise only once the list of all the gestalts present in the image, from the most elementary to the most complex, has been established. It should lead us to a variational formulation, as conjectured by Köhler [45].

Contents

1 Introduction
   1.1 Theories of image analysis
   1.2 Gestalt Theory
   1.3 Helmholtz principle and meaningful events
   1.4 Detailed plan of the thesis

2 Meaningful Alignments
   2.1 Definition of meaningful segments
       The discrete nature of applied geometry
       Definition of meaning
   2.2 Number of false alarms
       Definition
       Properties of the number of false alarms
   2.3 Thresholds and asymptotic estimations
       Sufficient condition of meaningfulness
       Necessary conditions for meaningfulness
       Asymptotics for the meaningfulness threshold k(l)
       Lower bound for the meaningfulness threshold k(l)
   2.4 Properties of meaningful segments
       Continuous extension of the binomial tail
       Increasing the resolution
   2.5 Maximal meaningful segments
       Definition
       A conjecture about maximality
       A simpler conjecture
       Proof of Conjecture 1 under Conjecture 2
       Partial results about Conjecture 2
   2.6 Experiments
   2.7 About the precision p
   2.8 Extension to circles
   2.9 Partial conclusion

3 Dequantizing image orientation
   3.1 Introduction
   3.2 Local computation of gradient and orientation
       White noise
       Computation of orientation on nonquantized images
       Bias of quantization
   3.3 Orientation dequantization
       The proposed solution: Fourier translation
       Study of the dequantized noise
       Posterior independence
       The flat regions model: final explanation of the dequantization effect
   3.4 Experiments and application to the detection of alignments

4 Modes of a histogram
   4.1 Discrete histograms
       Meaningful intervals
       Maximal meaningful intervals
       Meaningful gaps and modes
   4.2 Extension to continuous histograms
   4.3 Properties of meaningful intervals
       Mean value of an interval
       Structure of maximal meaningful intervals
   4.4 The reference interval
   4.5 Applications and experimental results

5 Contrast
   5.1 Contrasted Boundaries
       Definitions
       Thresholds
   5.2 Maximality
   5.3 Experiments
   5.4 Some more experimentation

Conclusion and Perspectives

Chapter 1

Introduction

We address the problem of developing a mathematical model for the detection of geometric structures in an image, based on a statistical criterion. We will first point out the inadequacy of the variational methods, inasmuch as they do not demonstrate the existence of the geometric structures they compute. We will then explain what kind of geometric events we have to look for: the basic visual objects defined by Gestalt theory. A grouping principle of perception due to Helmholtz will then lead us to our main definition, that of a meaningful event.

1.1 Theories of image analysis

Most theories of image analysis aim to find geometric structures (regions, contours, lines, convex sets, junctions, etc.) in a given image. These theories generally assume that the images contain such structures, and then try to compute their best description. The variational framework is quite well adapted to this viewpoint (for a complete review, see e.g. [56]). The general idea is to minimize a functional of the kind F(u, u_0) + R(u), where u_0 is the given image defined on a domain Ω ⊂ R², F(u, u_0) is a fidelity term and R(u) is a regularity term. F and R define an a priori model. Let us give two examples.

The Mumford-Shah model (see [58]), where the energy functional to be minimized is
$$E(u, K) = \lambda^2 \int_{\Omega \setminus K} |\nabla u|^2 \, dx + \mu \lambda^2 \, \mathrm{length}(K) + \int_{\Omega \setminus K} (u - u_0)^2 \, dx, \qquad (1.1)$$
where u is the estimated image, K its discontinuity set, and the result (u, K) is called a segmentation of u_0, i.e. a piecewise smooth function u with a set of contours K.

The Bayesian model for segmentation or restoration (see [29] and [30]): let us denote by y = (y_s)_{s \in S} the observation (the degraded image). The aim is to find the real image x = (x_s)_{s \in S}, knowing that the degradation model is given by a conditional probability Π(y | x), and that the a priori law of x is given by a Gibbs distribution Π(x) = Z^{-1} exp(−U(x)) (for binary images, the main example is the Ising model).

We then have to find the M.A.P. (Maximum A Posteriori) of
$$\Pi(x \mid y) = \frac{\Pi(y \mid x)\, \Pi(x)}{\Pi(y)}. \qquad (1.2)$$
Assume that Π(y | x) = C exp(−V(x, y)). For example, in the case of a Gaussian noise,
$$\Pi(y \mid x) = \frac{1}{(2\pi\sigma^2)^{|S|/2}} \exp\Big(-\frac{1}{2\sigma^2} \sum_{s \in S} (y_s - x_s)^2\Big),$$
and finding the MAP is equivalent to seeking the minimum of the functional
$$V(x, y) + U(x). \qquad (1.3)$$

A main drawback of the variational methods is that they introduce normalization constants (λ, µ, ...), and the resulting segmentation depends a lot on the value of these constants. For example, in the case of the Mumford-Shah piecewise constant model, the parameter λ may be seen as a scale parameter: it is related to the desired accuracy of the segmentation, i.e. to a more or less detailed description of the image. The other point is that these methods will always deliver a minimum of their functional, and so they assume that any image may be segmented (even a white noise). Indeed, they do not yield any criterion to decide whether a segmentation is relevant or not. This fact is well illustrated in Chapter 5 by Figures 5.3 (top right) and 5.4 (top right): the resulting segmentation creates boundaries that are not relevant (the front side of the desk, or the background of the cheetah image); these boundaries do not correspond to perceptual regions. A further step is missing: once some regions have been detected by a variational segmentation method, we still have to check in some way that these regions are relevant. Of course, the probabilistic framework leading to variational methods should in principle give a way to estimate the parameters of the segmentation functional. In the deterministic framework, these parameters can sometimes be estimated as Lagrange multipliers when (e.g.) a noise model is at hand, as in the Rudin-Osher-Fatemi method (see [69]). It is nonetheless easy to check that, first, most variational methods propose a very rough and inaccurate model for the image and, second, their parameters are generally not correctly estimated anyway, leading to supervised methods.

Another possibility, which turns out to be a significant improvement on MAP methods, is the Minimum Description Length (MDL) method introduced by Rissanen [66] and first applied to image segmentation by Yvon Leclerc [47]. Applied to the detection of regions and their boundaries in an image, this last method makes it possible to fix automatically the weight parameters whose presence we criticized in the Mumford-Shah model. Still, the resulting segmentation model remains just as unproved: the MDL principle does not prove the existence of regions; it only gives their best description, provided the image indeed is segmentable into constancy regions. This fact is easily explained: the MDL principle assumes that a model, or a class of models, is given, and then computes the best choice of the model parameters and of the model explaining the image.
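To make the MAP criterion of equations (1.2) and (1.3) concrete, here is a small added sketch under the simplest assumptions (a binary image in {-1, +1}, Gaussian noise, and an Ising prior, which the text mentions as the main example); σ and β play exactly the role of the normalization constants criticized above.

```python
import numpy as np

def map_energy(x, y, sigma=0.1, beta=1.0):
    """Energy V(x, y) + U(x) of equation (1.3): Gaussian-noise fidelity term plus
    an Ising prior U(x) = -beta * sum of x_s * x_t over neighbouring pixels."""
    fidelity = np.sum((y - x) ** 2) / (2 * sigma ** 2)        # V(x, y)
    ising = -beta * (np.sum(x[:, 1:] * x[:, :-1]) +           # horizontal neighbours
                     np.sum(x[1:, :] * x[:-1, :]))            # vertical neighbours
    return fidelity + ising

# The MAP estimate is argmin_x map_energy(x, y): its output depends heavily on the
# chosen sigma and beta, which is the parameter dependence discussed in this section.
```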

As far as perception theory is concerned, we request more, namely a proof that the model is the right one. (Once detection of geometric structures in an image has been achieved, the resulting set of detected structures may be very redundant, and we may need the MDL principle as a further step, in order to give the best explanation of what has previously been detected. We shall briefly develop this point of view in the experimental section of Chapter 2.)

Another drawback of most segmentation methods is their locality. Despite the gestaltist theories, they rather look for local structure. Let us mention some nonlocal theories of image analysis: the Hough transform (see [49]), the detection of globally salient structures by Sha'ashua and Ullman (see [73]), the Extension Field of Guy and Medioni (see [34]) and the Parent and Zucker curve detector (see [61]). These methods have the same drawback as the variational models of segmentation described above. The main point is that they suppose a priori that what they want to find (lines, circles, curves...) is in the image. They may find too many or too few such structures in the image, and they do not yield an existence proof for the structures they find. As a main example, let us describe the Hough transform. We assume that the image under analysis is made of dots that may or may not create aligned patterns. We then compute, for each straight line in the image, the number of dots lying on the line. In fact, the Hough transform describes a fast algorithm to do so. The result of the Hough transform is then a map associating with each line a number of dots. Peaks of the Hough transform may then be computed: they indicate the lines that contain the most dots. Which peaks are significant? Clearly, a threshold must be used. With today's technology, this threshold is generally given by a user or learned. The work of Kiryati, Eldar and Bruckstein [44] and of Shaked, Yaron and Kiryati [72] is, however, very close to what we develop here: these authors prove by large deviations estimates that lines detected in an image by the Hough transform could be detected as well in an undersampled image without significantly increasing the false alarm rate. They view this method as an acceleration tool, while we shall develop it here as a geometric definition tool. The work of D. Geman and B. Jedynak [31], [32] is also related to what we present here. They have a finite list of possible hypotheses (shape classes and/or spatial positionings, including also a null hypothesis representing the event that no object is present) and wish to determine which one is true based on the results of various tests, using a decision tree. They use this for several applications in shape recognition: tracking roads in satellite images, face recognition, character recognition. We will see that our work is also closely related to statistical hypothesis testing.

1.2 Gestalt Theory

The Hough transform is nothing but a particular kind of grouping. At the beginning of the twentieth century in Germany, the gestaltist school tried to define the laws of visual perception, that is, to understand how visual objects are formed. According to Gestalt theory, grouping is the law of visual perception: grouping laws are described by Wertheimer in [79] and also in the books of Kanizsa [40] and Metzger [53]. Their main idea is that whenever points (or previously formed visual objects) have a characteristic in common, they get grouped and form a new, larger visual object, called a gestalt.

Some of the main grouping characteristics are colour constancy, good continuation, alignment, parallelism, common orientation, convexity and closedness (for a curve), ... In addition, the grouping principle is recursive: for example, if points have been grouped into lines, then these lines may again be grouped according (e.g.) to parallelism.

Our purpose is not to propose a new segmentation method. We rather propose a computational method to decide whether a given geometric structure (obtained by any segmentation or grouping method) is sure or not. The aim of the general research programme is, given an image, to be able to get the list of all the gestalts it contains. We will begin with the detection of elementary geometric objects and then, in a second step, we will group these objects according to a recursive and pyramidal algorithm. In a first step, we will therefore have to study each grouping criterion one after the other, in order to give it a correct mathematical framework.

1.3 Helmholtz principle and meaningful events

We first give here a general definition of what we will call a meaningful event. Our main idea is that a meaningful event is an event that, according to probabilistic estimates, should not happen in an image and is therefore significant. In that sense, we shall say that it is a proven event. This informal definition immediately raises an objection: if we make probabilistic estimates in an image, this means that we have an a priori model; we are therefore losing any generality in the approach, unless the probabilistic model can be proven to be the right one for any image. In fact, we shall make statistical estimates related not to a model of the images, but to a general model of perception. We shall apply the so-called Helmholtz principle. This principle attempts to describe when perception decides to group objects according to some quality (colour, alignment, etc.). It can be stated in the following way: we have a high probability of grouping two elements if the placement of the two elements has a low likelihood of resulting from an accidental arrangement (see [36]). This principle, which may also be called a genericity principle, can be generalized as follows. Assume that objects O_1, O_2, ..., O_n are present in an image, and that k of them, say O_1, ..., O_k, have a common feature: same colour, same orientation, etc. We are then facing the following dilemma: is this common feature happening by chance, or is it significant? In order to answer this question, we make the following mental experiment: we assume that the considered quality has been randomly and uniformly distributed on all objects O_1, ..., O_n. Notice that this quality may be spatial (like position or orientation); we then (mentally) assume that the observed position of the objects in the image is a random realization of this uniform process. We may then ask: is the observed distribution probable or not? The Helmholtz principle states that if the probability in the image of the observed configuration O_1, ..., O_k is very small, then the grouping of these objects makes sense. Now, what does very small mean?

We will define an ε-meaningful event in the following way.

Definition 1 (ε-meaningful event) We say that an event of the type such a configuration of points has such a property is ε-meaningful if the expectation in an image of the number of occurrences of this event is less than ε.

This definition means that in an image, we will observe fewer than ε meaningful events happening just by chance, i.e. the expected number of errors (also called the number of false alarms) will be less than ε. When ε ≤ 1, we simply talk about meaningful events. This seems to contradict our notion of a parameter-less theory. It does not, since we will generally take ε = 1 (we want less than one error in an image), and the ε-dependency of meaningfulness will in any case be low (it will in fact be a log ε-dependency). The probability that a meaningful event is observed by accident will be very small. In such a case, our perception is liable to see the event, no matter whether it is true or not. Our term ε-meaningful is related to the classical p-significance in statistics; however, as we shall see further on, we must use expectations in our estimates, and not probabilities.

The program we state here has been proposed several times in Computer Vision. We know of at least two instances: David Lowe [48] and Witkin-Tenenbaum [82]. Let us quote extensively David Lowe's program, whose mathematical consequences we shall try to develop: "we need to determine the probability that each relation in the image could have arisen by accident, P(a). Naturally, the smaller that this value is, the more likely the relation is to have a causal interpretation. If we had completely accurate image measurements, the probability of accidental occurrence could become vanishingly small. For example, the probability of two image lines being exactly parallel by accident of viewpoint and position is zero. However, in real images there are many factors contributing to limit the accuracy of measurements. Even more important is the fact that we do not want to limit ourselves to perfect instances of each relation in the scene - we want to be able to use the information available from even approximate instances of a relation. Given an image relation that holds within some degree of accuracy, we wish to calculate the probability that it could have arisen by accident to within that level of accuracy. This can only be done in the context of some assumption regarding the surrounding distribution of objects, which serves as the null hypothesis against which we judge significance. One of the most general and obvious assumptions we can make is to assume a background of independently positioned objects in three-space, which in turn implies independently positioned projections of the objects in the image. This null hypothesis has much to recommend it. (...) Given the assumption of independence in three-space position and orientation, it is easy to calculate the probability that a relation would have arisen to within a given degree of accuracy by accident. For example if two straight lines are parallel to within 5 degrees, we can calculate that the chance is only 5/180 = 1/36 that the relation would have arisen by accident from two independent objects."

Some main points of the program we shall mathematically develop are contained in the preceding quotation: particularly the idea that significant geometric objects are the ones with small probability, and the idea that this probability is never exactly zero, because of the necessary lack of accuracy of observations in an image.
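Lowe's 1/36 is the probability for a single pair of lines; the tiny computation below (an added illustration, anticipating the discussion that follows) shows why the expected number of accidentally parallel pairs, rather than this probability alone, is the relevant quantity.

```python
from math import comb

def expected_parallel_pairs(n_lines, precision_deg=5.0):
    """Expected number of pairs of lines parallel up to precision_deg degrees, when
    the orientations of the n_lines lines are independent and uniformly distributed."""
    p = precision_deg / 180.0          # 5/180 = 1/36 in Lowe's example
    return comb(n_lines, 2) * p

print(expected_parallel_pairs(2))     # ~0.028: a single pair is unlikely to be accidental
print(expected_parallel_pairs(50))    # ~34: among many lines, such pairs do occur by chance
```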

Now, the preceding program is not accurate enough to give the right principles for computing gestalts. The example mentioned above is, for instance, not complete enough to be convincing. Indeed, we simply cannot fix a priori an event such as these two lines are parallel without merging it into the set of all events of the same kind, that is, all parallelisms. The space of straight lines in an image depends on the accuracy of the observations, but also on the size of the image itself. The fact that the mentioned probability is low (1/36) does not imply that few such events will occur in the image: we have to look at the number of possible pairs of parallel lines. If this number is large, then we will in fact detect many nonsignificant pairs of parallel lines. Only if the expected number of such pairs is much below 1 can we decide that the observed parallelism makes sense.

Before proceeding to the mathematical theory, let us give another toy example and discuss our definition of ε-meaningfulness.

Example and discussion: let us consider a binary image, i.e. an image whose grey level at each pixel is 0 or 1. Our main assumption is that if two points do not belong to the same object, then their grey levels are independent (and equally distributed if the image is equalized). Now, imagine that we observe the following event: a black square of 10 × 10 pixels. The expectation of the number of such black squares in the image is simply the number of 10 × 10 squares in the image times the probability that each pixel of a given square is black, that is, the number of 10 × 10 squares times (1/2)^100, which is much less than 1. We conclude that this event is meaningful.

Remarks: 1) (Large enough) subsquares are also meaningful, and so they are also candidates to be gestalts. 2) Interaction of gestalts: if we take into account that we observe a black square on a white background, then the probability of the square-on-background event also involves the surrounding white pixels, so its expected number of occurrences is far smaller still, and we get a much more meaningful event.

This is rather a toy example, but it immediately shows which kinds of difficulties and aporias are associated with meaningfulness:

1. Too many meaningful events: by the same argument as above, all large enough parts of the black square are meaningful. If (e.g.) we take all parts of this square with cardinality larger than 50, they are all meaningful, and their number is larger than 2^50! We will see how to solve the problem of having too many meaningful events by defining the notion of maximal meaningful event.

The answer is that we need an a priori geometric definition of the event, as done in Gestaltism. The event cannot be defined from the observed image itself!

3. Moreover, the definition of the geometric event changes its meaningfulness. For example, if we consider our black square as a convex set with area 100, then the expectation becomes (1/2)^100 times the number of convex sets with area 100, and the event may lose its meaningfulness.

4. Abstract geometrical character of the information, lack of localization. Ex. 1: if we observe a meaningful black patch, all we can say is: "there is a black patch and the indicated dots may belong to it". We do not know which points belong for sure to the event. Ex. 2: if we observe a meaningful alignment of points, then we can say "on that line, there are aligned points", but we are not able to define the endpoints.

5. How many gestalts? If we make a list of pregnant gestalts, following Gestalt theory, the longer the list, the higher the expectation of finding false gestalts. Thus, perception, and also computer vision, will at some point meet the following problem: to find the best trade-off between the number of gestalts (which might a priori be as high as possible) and the false detection rate. For the time being, we shall not address this problem; it will be addressed only when we are in a position to build a correct theory for many gestalts.

1.4 Detailed plan of the thesis

In Chapter 2, we focus on alignments, one of the most basic gestalts. At each point of the image (of size N × N), we compute a direction, defined as the direction orthogonal to the gradient computed on a 2 × 2 neighbourhood. In Section 2.1, we first define the notion of meaningful alignment. For a segment S made of l independent points (i.e. points at a distance larger than 2 from each other), we count the number k of points among the l whose direction is aligned with the direction of S up to a given precision p. This precision is fixed, and we denote by P(k, l) the probability of the event "at least k aligned points among l". This probability is simply the tail of the binomial distribution:

P(k, l) = \sum_{i=k}^{l} \binom{l}{i} p^i (1 - p)^{l-i}.

Since the total number of segments in the image is about N^4, we adopt the following definition: the segment S is ε-meaningful iff N^4 P(k, l) ≤ ε. In Section 2.2, we associate with each segment S of the image its number of false alarms, defined as NF(S) = N^4 P(k, l), and we derive some of its elementary properties.

This number of false alarms is a measure of meaningfulness, in the sense that the smaller it is, the more meaningful the segment is. In Section 2.3, we give some necessary and sufficient conditions on k and l for meaningfulness. We also give asymptotic estimates (as l goes to infinity) and inequalities for the meaningfulness threshold k(l), defined as the smallest integer k such that N^4 P(k, l) ≤ ε. The result roughly says that

k(l) ≈ pl + C \sqrt{l \ln \frac{N^4}{ε}}.

In Section 2.4, we extend the function (k, l) → P(k, l) to a continuous domain. This is useful to derive some properties of meaningful alignments, and particularly the following result about change of scale, which is a consistency check for our model. Let S be a segment of an image of size N × N, let l be the length of S and k the number of aligned points it contains; we increase the resolution of the image so that the new image has size λN × λN, with λ > 1, and the considered segment becomes S_λ, with length λl and containing λk aligned points (we admit that the density of aligned points on the segment is scale-invariant). The main result is then

NF(S_λ) = (λN)^4 P(λk, λl) < N^4 P(k, l) = NF(S).

In Section 2.5, we define the notion of maximal meaningful segment: a segment is maximal meaningful if it is meaningful and if it neither contains nor is contained in a more meaningful segment. The main conjecture is that if two segments lying on the same line are maximal meaningful, then they have an empty intersection. More precisely, our conjecture is: if A and B are two meaningful segments on the same line, then

min( NF(A ∪ B), NF(A ∩ B) ) < max( NF(A), NF(B) ).

This result is proved asymptotically. It is also proved for the Gaussian approximation of the binomial distribution and for the large deviations estimate. In Section 2.6, we show some experiments on different images and we describe the detection algorithm. In Section 2.7, we discuss the choice of the precision p. Finally, in Section 2.8, we show that what has been done for alignments extends directly to the detection of arcs of circles.

In Chapter 3, we address the problem of the computation of the orientation in an image. We compute the gradient at a point on a 2 × 2 window, as the gradient of the bilinear interpolate at the center of the window. We first show (Section 3.2) that this way of computing the orientation does not create any bias in the orientation map. We then show that, on the contrary, the histogram of orientations in the image is very sensitive to the quantization of grey levels (the extreme case is a binary image: the possible orientations are only the multiples of π/4). In Section 3.3, we propose a solution for orientation dequantization: the Fourier (1/2, 1/2) translation, assuming that the image is a Shannon signal, i.e. well sampled. We study the effect of this translation and explain why it performs a dequantization of the orientation. We end this chapter (Section 3.4) with some experiments proving that the dequantization works for the detection of alignments.

In Chapter 4, we define the notion of meaningful modes of a histogram. This is an important step towards the pyramidal construction of gestalts, since it will be used for the recursive grouping of objects. As an example, we consider the histogram of the orientations of the maximal meaningful segments detected in the first part. We group these segments according to the mode they belong to, which corresponds to a grouping according to the parallelism criterion. In Section 4.1, we first consider the case of a discrete histogram: M points take values in the set {1, 2, ..., L}. Our hypothesis here is: the M points have independent values, uniformly distributed on {1, ..., L}. For an interval [a, b] ⊂ [1, L], we define its relative length by p(a, b) = (b − a + 1)/L (this is the prior probability for a point to have its value in [a, b]), and its relative mass by r(a, b) = k(a, b)/M, where k(a, b) is the number of points among the M with value in [a, b] (this relative mass represents the empirical posterior probability for a point to have its value in [a, b]). We then define the relative entropy of the interval [a, b] by

H([a, b]) = r(a, b) \log \frac{r(a, b)}{p(a, b)} + (1 − r(a, b)) \log \frac{1 − r(a, b)}{1 − p(a, b)}.

An ε-meaningful interval is an interval that contains significantly more points than expected; more precisely, it is defined by the two conditions

r(a, b) > p(a, b)   and   H([a, b]) > \frac{1}{M} \log \frac{L(L + 1)}{2ε}.

Thanks to Hoeffding's inequality, this implies that the expected number of ε-meaningful intervals is less than ε. For the same reasons that led us to define maximal meaningful segments, we also define maximal meaningful intervals: an interval [a, b] is maximal meaningful if it is meaningful and if, for every interval J such that J ⊊ [a, b] or [a, b] ⊊ J, one has H(J) < H([a, b]). The main result we prove is that maximal meaningful intervals cannot meet. We then define ε-meaningful gaps: the definition is the same as for ε-meaningful intervals, where we simply replace the first condition by r(a, b) < p(a, b) (i.e. a gap has to contain significantly fewer points than expected). The final definition is then that of a meaningful mode: it is a meaningful interval that does not contain any meaningful gap. In Section 4.2, we extend the previous definitions to the case of a continuous histogram. In Section 4.3, we show some properties of meaningful intervals. In particular, we prove that if [a, b] is a maximal meaningful interval of a continuous histogram h, then h(a) = h(b), h is increasing at the point a and decreasing at the point b. We also give the analogous result for a discrete histogram. In Section 4.4, we address the problem of the choice of the reference interval, and finally in Section 4.5 we show some applications and experimental results.

In Chapter 5, we are interested in a third type of gestalt: strong contrast along a curve. The considered curves are the level lines of the image. Let u be a discrete image and let N_ll be the number of closed level lines of u. The contrast at a point x is defined as |Du(x)|, the norm of the gradient of u at x.

We consider the following event: at each point of a closed level line of length l, the contrast is larger than µ. Now, what is the prior probability P(µ) for a point to have a contrast larger than µ? Since the uniform distribution does not make any sense here, we use the empirical distribution given by the image itself, which means that

P(µ) = \frac{1}{N^2} \#\{ x : |Du(x)| ≥ µ \}.

Then, for a closed level line C (also called a boundary) of length l and of minimal contrast µ, we define its number of false alarms by

NF(C) = N_ll · P(µ)^l.

The curve C is said to be an ε-meaningful boundary as soon as its number of false alarms is less than ε. In the same way, we also define ε-meaningful edges, where we replace the number N_ll by the number of pieces of level lines. The natural inclusion relation between level sets allows us to define a comparison relation between level lines, which is then used (in Section 5.2) to define maximal meaningful boundaries. In Section 5.3, we show experimental results on several images, and we compare these results with the ones obtained with the Mumford-Shah functional and with the Canny-Deriche edge detector.
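To fix ideas before moving on to Chapter 2, here is a minimal numerical sketch of the boundary criterion just summarized (in Python; the function name, the synthetic contrast values and the chosen numbers are illustrative assumptions, not taken from the thesis). It evaluates NF(C) = N_ll P(µ)^l with the empirical contrast distribution of the image.

import numpy as np

def boundary_nfa(grad_norm, n_level_lines, line_length, min_contrast):
    # empirical probability P(mu) that a pixel has contrast >= mu
    p_mu = np.mean(grad_norm >= min_contrast)
    # NF(C) = N_ll * P(mu)^l ; C is an eps-meaningful boundary as soon as NF(C) <= eps
    return n_level_lines * p_mu ** line_length

# toy usage on synthetic contrast values (all numbers are arbitrary)
rng = np.random.default_rng(0)
grad = rng.exponential(scale=10.0, size=(256, 256))   # plays the role of |Du|
print(boundary_nfa(grad, n_level_lines=5000, line_length=120, min_contrast=20.0))

The only image-dependent ingredient is the empirical distribution of |Du|, which is exactly what makes the definition parameter-free.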

Chapter 2

Meaningful Alignments

2.1 Definition of meaningful segments

2.1.1 The discrete nature of applied geometry

Although mathematicians and even computer vision scientists sometimes allude to or presuppose the fact that an image has a potentially infinite resolution, it must be recalled here that all images we physically have at our disposal are discrete data containing a finite amount of information. Perceptual and digital images are the result of a convolution followed by a spatial sampling, as described by the Shannon-Whittaker theory. From the samples, a continuous image may be recovered by Shannon interpolation, but the samples by themselves contain all of the image information. From this point of view, one could claim that no absolute geometric structure is present in an image: no straight line, no circle, no convex set, etc. We claim in fact the opposite, and the definition to follow will explain in which sense we can be sure that a line is present in a digital image.

Let us first explain what basic local information we have at our disposal in a digital image. Consider a grey-level image of size N × N (that is, a regular grid of N^2 pixels). At each point x, or pixel, of the discrete grid, we have a grey level u(x), which is quantized and therefore inaccurate. We may compute at each point the direction of the gradient, which is the simplest local contrast-invariant information (local contrast invariance is a necessary requirement in image analysis and perception theory [79]). We compute a direction, namely the direction of the level line passing through the point, calculated on a q × q pixel neighbourhood (generally q = 2). No previous smoothing of the image is performed, and no restoration: such processes would lose the a priori independence of directions which is required for the detection method.

The computation of the gradient direction is based on an interpolation (we take q = 2). We define the direction at pixel (i, j) by rotating by π/2 the direction of the gradient of the order 2 interpolation at the center of the 2 × 2 window made of the pixels (i, j), (i + 1, j), (i, j + 1) and (i + 1, j + 1).

We get

dir(i, j) = \frac{1}{D} (D_1, D_2),

where

D_1 = \frac{1}{2} ( [u(i, j) + u(i + 1, j)] − [u(i, j + 1) + u(i + 1, j + 1)] ),
D_2 = \frac{1}{2} ( [u(i + 1, j) + u(i + 1, j + 1)] − [u(i, j) + u(i, j + 1)] ),

and D = \sqrt{D_1^2 + D_2^2} is the normalizing factor. Then we say that two points X and Y have the same direction with precision 1/n if

Angle(dir(X), dir(Y)) ≤ π/n.

In agreement with psychophysics and numerical experimentation, we generally take n = 16. According to the Helmholtz principle, we treat the direction at all points of an image as a uniformly distributed random variable. In the following, we assume that n > 2 and we set p = 1/n < 1/2; p is the accuracy of the direction. We interpret p as the probability that two independent points have the same direction with the given accuracy.

In a structureless image, when two pixels are at a distance larger than 2, the directions computed at these two pixels should be independent random variables. By the Helmholtz principle, every deviation from this randomness assumption will lead to the detection of a structure (gestalt) in the image. Alignments provide a concrete way to understand the Helmholtz principle. We know (by experience) that images have contours and therefore meaningful alignments. This is mainly due to the smoothness of the contours of solid objects and to the generation of geometric structure by most physical and biological laws.

From now on, we do the computation as though each pixel had a direction which is uniformly distributed, two points at a distance larger than q = 2 having independent directions. Let A be a segment in the image made of l independent pixels (meaning that the distance between two consecutive points of A is 2, so that the real length of A is 2l). We are interested in the number of points of A having their direction aligned with the direction of A. Such points of A will simply be called aligned points of A. The question is to know what is the minimal number k(l) of aligned points that we must observe on a length l segment so that this event becomes meaningful when it is observed in an image.

Definition of meaning

Let A be a straight segment with length l and let x_1, x_2, ..., x_l be the l (independent) points of A. Let X_i be the random variable whose value is 1 when the direction at pixel x_i is aligned with the direction of A, and 0 otherwise. We then have the following Bernoulli distribution for X_i:

P[X_i = 1] = p   and   P[X_i = 0] = 1 − p.

The random variable representing the number of x_i having the good direction is

S_l = X_1 + X_2 + · · · + X_l.
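Before computing the law of S_l, here is a small sketch of the direction field and of the alignment test just described (in Python; the array convention u[i, j] and the function names are my own assumptions, not from the text). It computes, on each 2 × 2 window, the gradient of the bilinear interpolate rotated by π/2, and compares two directions up to the angular precision pπ.

import numpy as np

def direction_field(u):
    # gradient of the bilinear interpolate at the center of each 2x2 window
    # (assumed convention: u[i, j], first index = i, second index = j)
    gx = 0.5 * ((u[1:, :-1] + u[1:, 1:]) - (u[:-1, :-1] + u[:-1, 1:]))   # this is D_2
    gy = 0.5 * ((u[:-1, 1:] + u[1:, 1:]) - (u[:-1, :-1] + u[1:, :-1]))   # this is -D_1
    theta = np.arctan2(gx, -gy)            # angle of the vector (D_1, D_2) = (-gy, gx)
    theta[(gx == 0) & (gy == 0)] = np.nan  # no direction where the gradient vanishes
    return theta

def same_direction(theta1, theta2, p=1.0 / 16):
    # two directions are equal with precision p if their angular distance is <= p*pi
    d = abs(theta1 - theta2) % (2 * np.pi)
    return min(d, 2 * np.pi - d) <= p * np.pi

Note that the comparison is made modulo 2π, since directions are oriented; with directions uniform on [0, 2π), the test above is satisfied by two independent points with probability exactly p.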

Because of the independence of the X_i, the law of S_l is given by the binomial distribution

P[S_l = k] = \binom{l}{k} p^k (1 − p)^{l−k}.

When we consider a length l segment, we want to know whether it is ε-meaningful or not among all the segments of the image (and not only among the segments having the same length l). Let m(l) be the number of oriented segments of length l in an N × N image. We define the total number of oriented segments in an N × N image as the number of pairs (x, y) of points in the image (an oriented segment is given by its starting point and its ending point), so that

\sum_{l=1}^{l_max} m(l) ≤ N^4.

Definition 2 (detection thresholds) We call detection thresholds a family of positive values w(l, ε, N), 1 ≤ l ≤ l_max, such that

\sum_{l=1}^{l_max} w(l, ε, N) m(l) ≤ ε.

Definition 3 (ε-meaningful segment - general definition) A length l segment is ε-meaningful in an N × N image if it contains at least k(l) points having their direction aligned with the one of the segment, where k(l) is given by

k(l) = min { k ∈ ℕ : P[S_l ≥ k] ≤ w(l, ε, N) }.

Let us develop and explain this definition. For 1 ≤ i ≤ N^4, let e_i be the following event: "the i-th segment is ε-meaningful", and let χ_{e_i} denote the characteristic function of the event e_i. We have

P[χ_{e_i} = 1] = P[S_{l_i} ≥ k(l_i)],

where l_i is the length of the i-th segment. Notice that if l_i is small we may have P[S_{l_i} ≥ k(l_i)] = 0. Let R be the random variable representing the exact number of the e_i occurring simultaneously in a trial. Since R = χ_{e_1} + χ_{e_2} + · · · + χ_{e_{N^4}}, the expectation of R is

E(R) = E(χ_{e_1}) + E(χ_{e_2}) + · · · + E(χ_{e_{N^4}}) = \sum_{l=1}^{l_max} m(l) P[S_l ≥ k(l)].

We compute here the expectation of R but not its law, because the law depends a lot upon the dependence relations between the e_i. The main point is that segments may intersect and overlap, so that the events e_i are not independent, and may even be strongly dependent. By definition we have P[S_l ≥ k(l)] ≤ w(l, ε, N), so that

E(R) ≤ \sum_{l=1}^{l_max} w(l, ε, N) m(l) ≤ ε.

This means that the expectation of the number of ε-meaningful segments in an image is less than ε. This notion of ε-meaningful segment has to be related to the classical α-significance in statistics, where α is simply w(l, ε, N). The difference, which leads us to adopt a slightly different terminology, is the following: we are not in a position to assume that the segments detected as ε-meaningful are independent in any way. Indeed, if (e.g.) a segment is meaningful, it may be contained in many larger segments which are also ε-meaningful. Thus, it is convenient to compare the number of detected segments with the expectation of this number. This is not exactly the same situation as in failure detection, where the failures are somehow disjoint events. See Remark (*) below.

The question of how to fix the detection thresholds is widely open. Our definition of ε-meaningful segment will be a restriction of the above general definition. Since there is a priori no reason to favour small or large segments, we choose a uniform family of detection thresholds:

for all l ≥ 1,   w(l, ε, N) = ε / N^4.

Our definition of ε-meaningful segment is then the following one.

Definition 4 (ε-meaningful segment) A length l segment is ε-meaningful in an N × N image if it contains at least k(l) points having their direction aligned with the one of the segment, where k(l) is given by

k(l) = min { k ∈ ℕ : P[S_l ≥ k] ≤ ε / N^4 }.

In the following, we write P(k, l) for P[S_l ≥ k].

Remark: We could have defined an ε-meaningful length l segment as a segment that is ε-meaningful only among the set of length l segments. It would then be a segment with at least k'(l) points having the good direction, where k'(l) is defined by m(l) P[S_l ≥ k'(l)] ≤ ε. Notice that m(l) ≈ N^3, because there are approximately N^2 possible discrete straight lines in an N × N image and, on each discrete line, about N choices for the starting point of the segment. But we did not keep this definition because, when looking for alignments, we cannot know a priori the length of the segment we are looking for. In the same way, we never consider events like "a segment has exactly k aligned points", but rather "a segment has at least k aligned points", and k must be given, as we do, by a detectability criterion and not fixed a priori.

2.2 Number of false alarms

Definition

Definition 5 (Number of false alarms) Let A be a segment of length l_0 with k_0 points having their direction aligned with the direction of A. We define the number of false alarms of A as

NF(k_0, l_0) = N^4 · P[S_{l_0} ≥ k_0] = N^4 \sum_{k=k_0}^{l_0} \binom{l_0}{k} p^k (1 − p)^{l_0−k}.

Interpretation of this definition: the number of false alarms NF(k_0, l_0) of the segment A represents an upper bound on the expectation, in an image, of the number of α-meaningful segments, where α = NF(k_0, l_0).

Remark: (*) (relative notion) Let A be a segment and NF(k_0, l_0) its number of false alarms. Then A is ε-meaningful if and only if NF(k_0, l_0) ≤ ε, but it is worth noticing that we could have compared NF(k_0, l_0) not to ε but to the number of segments with probability less than that of A actually observed in the image. For example, if we observe 100 segments of probability less than α, and if the expected value of the number R of segments of probability less than α was 10, we are able to say that this 100-segment event could happen with probability less than 1/10, since 10 = E(R) ≥ 100 · P[R ≥ 100]. Now, each one of these 100 segments by itself is only 10-meaningful! Of course, we cannot deduce in any way that each one of these segments is meaningful.

Properties of the number of false alarms

Proposition 1 The number of false alarms NF(k_0, l_0) has the following properties:
1. NF(0, l_0) = N^4, which shows that the event "a segment has at least zero aligned points" is never meaningful!
2. NF(l_0, l_0) = N^4 p^{l_0}, which shows that a segment all of whose points have the good direction is ε-meaningful as soon as its length is larger than (4 ln N − ln ε) / ln(1/p).
3. NF(k_0 + 1, l_0) < NF(k_0, l_0). This can be interpreted by saying that, of two segments with the same length l_0, the more meaningful is the one with more aligned points.
4. NF(k_0, l_0) < NF(k_0, l_0 + 1). This property can be illustrated by picturing a segment as a row of aligned and misaligned points: if we remove the last point (on the right), which is misaligned, the new segment is less probable and therefore more meaningful than the considered one.
5. NF(k_0 + 1, l_0 + 1) < NF(k_0, l_0). Again, picturing the segment in the same way: if we remove the last point (on the right), which is aligned, the new segment is more probable and therefore less meaningful than the considered one.
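As a numerical companion to Definitions 4 and 5, the following sketch (Python; the helper names and the log-scale evaluation of the binomial tail are implementation choices of mine, not taken from the text) computes the number of false alarms NF(k, l) and the threshold k(l) for given N, p and ε.

import math

def binom_tail(k, l, p):
    # P(k, l) = P[S_l >= k] for the binomial law B(l, p), summed in log scale
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, l + 1):
        log_term = (math.lgamma(l + 1) - math.lgamma(i + 1) - math.lgamma(l - i + 1)
                    + i * math.log(p) + (l - i) * math.log(1 - p))
        total += math.exp(log_term)
    return min(total, 1.0)

def nfa(k, l, N, p=1.0 / 16):
    # number of false alarms NF(k, l) = N^4 P(k, l)  (Definition 5)
    return N ** 4 * binom_tail(k, l, p)

def k_threshold(l, N, p=1.0 / 16, eps=1.0):
    # smallest k such that a length-l segment with k aligned points is eps-meaningful
    for k in range(l + 1):
        if nfa(k, l, N, p) <= eps:
            return k
    return None   # segments of this length can never be eps-meaningful

# example: 1-meaningfulness threshold for a segment of l = 100 independent points
# in a 512 x 512 image; compare with the estimates of Section 2.3
print(k_threshold(100, 512), nfa(50, 100, 512))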

Proposition 1 is a consequence of the definition and of classical properties of the binomial distribution (see [23]). If we consider a length l segment (made of l independent pixels), then the expectation of the number of points of the segment having the same direction as the segment is simply the expectation of the random variable S_l, that is,

E(S_l) = \sum_{i=1}^{l} E(X_i) = \sum_{i=1}^{l} P[X_i = 1] = pl.

We are interested in ε-meaningful segments, which are the segments whose number of false alarms is less than ε. These segments have a small probability (less than ε/N^4), and since they represent alignments (a deviation from randomness), they should contain more aligned points than the expected number computed above. That is the main point of the following proposition.

Proposition 2 Let A be a segment of length l_0 ≥ 1, containing k_0 points having the same direction as A. If P[S_{l_0} ≥ k_0] ≤ p (which is the case when A is meaningful, because N is very large and thus pN^4 > 1), then

k_0 ≥ pl_0 + (1 − p).

This is a sanity check for the model. This proposition will be proved by Lemma 4, where we extend the discrete function P(k, l) = P[S_l ≥ k] to a continuous domain.

2.3 Thresholds and asymptotic estimations

In this section, we shall give precise asymptotic and non-asymptotic estimates of the thresholds k(l), which roughly say that

k(l) ≈ pl + C \sqrt{l \ln \frac{N^4}{ε}},   where   \sqrt{2p(1 − p)} ≤ C ≤ \frac{1}{\sqrt{2}}.

Some of these results are illustrated by Figure 2.1. These estimates are not necessary for the algorithm (because P(k, l) is easy to compute), but they provide an interesting order of magnitude for k(l).

Sufficient condition of meaningfulness

In this subsection, we will see how the central limit theorem and other inequalities concerning the tail of the binomial distribution provide a sufficient condition of meaningfulness. The key point is the following result, due to Hoeffding (see [37]).

Figure 2.1: Estimates for the threshold of meaningfulness k(l) (minimal number of aligned points k(l), plotted against the length l). The middle (staircase) curve represents the exact value of the minimal number of aligned points k(l) to be observed on a 1-meaningful segment of length l in an image of size 512, for a direction precision of 1/16. The upper and lower curves represent estimates of this threshold obtained by Proposition 4 and Proposition 7.

Proposition 3 (Hoeffding's inequality) Let k, l be positive integers with k ≤ l, and let p be a real number such that 0 < p < 1. Then, if r = k/l ≥ p, we have the inequalities

P(k, l) ≤ exp( −l [ r \ln \frac{r}{p} + (1 − r) \ln \frac{1 − r}{1 − p} ] ) ≤ exp( −l (r − p)^2 h(p) ) ≤ exp( −2 l (r − p)^2 ),

where h is the function defined on ]0, 1[ by

h(p) = \frac{1}{1 − 2p} \ln \frac{1 − p}{p}   for 0 < p < 1/2,      h(p) = \frac{1}{2p(1 − p)}   for 1/2 ≤ p < 1.

The function h defined above is continuous on ]0, 1[, decreasing on ]0, 1/2] and increasing on [1/2, 1[. Its minimal value is 2. We plot this function on Figure 2.2.

Using the previous proposition, we deduce a sufficient condition for a segment to be meaningful. The size N of the image and the precision p < 1/2 are fixed.

Proposition 4 (sufficient condition of ε-meaningfulness) Let S be a length l segment containing at least k aligned points. If

k ≥ pl + \sqrt{\frac{l (4 \ln N − \ln ε)}{h(p)}},

36 36 CHAPTER 2. MEANINGFUL ALIGNMENTS Figure 2.2: The graph of the function p h(p). then S is ε-meaningful. Proof : are such that Let S be a length l segment, containing at least k aligned points, where k and l If we denote r = k/l, then r p and By Proposition 3 we deduce that k pl + l(r p) 2 4 ln N ln ε l. h(p) 4 ln N ln ε. h(p) P (k, l) exp( l(r p) 2 h(p)) exp( 4 ln N + ln ε) = which means, by definition, that the segment S is ε-meaningful. ε N 4, Corollary 1 Let S be a length l segment, containing at least k aligned points. If then S is ε-meaningful. k pl + l (4 ln N ln ε), 2 Proof : This result is a simple consequence of Proposition 4 and of the fact that for p in ]0, 1[, h(p) 2 (see Hoeffding [37]) Necessary conditions for meaningfulness The first simple necessary condition we can get is a threshold on the length l. For an ε- meaningful segment, we have p l P [S l k(l)] ε N 4,

37 2.3. THRESHOLDS AND ASYMPTOTIC ESTIMATIONS 37 so that l 4 ln N + ln ε. (2.1) ln p Let us give a numerical example : if the size of the image is N = 512, and if p = 1/16 (which corresponds to 16 possible directions), the minimal length of a 1-meaningful segment is l min = 9. This necessary condition is only on l, so we now look for more precise conditions involving both k and l. Lemma 1 Let 0 < r < 1 be a real number, and g r the function defined on ]0, 1[ by g r (x) = r ln x + (1 r) ln(1 x), then g r is concave and has its maximum at point x = r. Moreover if 0 < p r then 2(r p) 2 g r (r) g r (p) (r p)2 p(1 p). Lemma 2 If N 5 and if S = (k, l) is a ε-meaningful segment with 1 k l, then if we denote r = k/l, g r (r) g r (p) > 3 ln N ln ε. l Proof : Let us assume first that 1 k l 1. Let S = (k, l) be an ε- meaningful segment, then ( ) l p k (1 p) l k P (k, l) k ε N 4. If n is an integer larger than 1, by the Stirling s formula refined to (see [23] for example), we have We then deduce that ( ) l k n n e n 2πne 1/(12n+1) n! n n e n 2πne 1/12n. We assumed that 1 k l 1 and so we get l l 2πl k k 2πk(l k) (l k) 2π(l k) e 1 12l k 1 12(l k). 1 12k (l k) 1 12l k (l k) = 1 6. On the other hand, we notice that e lgr(p) = p k (1 p) l k and e lgr(r) = ( ) k k ( 1 k ) l k. l l

38 38 CHAPTER 2. MEANINGFUL ALIGNMENTS And we also have k(l k) = l r(1 r) l/2, and we then obtain And consequently ε P (k, l) N 4 ( ) l p k (1 p) l k 2 e 1/6 e l(gr(p) gr(r)). k 2πl l(g r (r) g r (p)) 4 ln N ln ε 1 2 ln l + ln 2 2π 1 6. Since the size of the considered image is N N and l is a length of a segment of the image, we have l 2N. And so l(g r (r) g r (p)) ( 4 1 ) ln N ln ε ln π This inequality permits to conclude since the hypothesis N 5 ensures that 7 2 ln N 1 2 ln π 1 > 3 ln N. 2 6 If k = l, then r = 1 and we simply have g 1 (1) g 1 (p) = ln p. Now, since S = (k, l) is ε-meaningful, we have and therefore g r (r) g r (p) = ln p p l ε N 4, 4 ln N ln ε l 3 ln N ln ε. l Asymptotics for the meaningfulness threshold k(l) In this section, we still consider that ε and p are fixed. We will work on asymptotic estimates of P (k, l) when l is large. particular case of the binomial distribution (see [23]). We first recall a version of the central limit theorem in the Proposition 5 (De Moivre-Laplace limit theorem) If α is a fixed positive number, then as l tends to +, [ P S l pl + α ] l p(1 p) 1 + 2π α e x2 /2 dx. Our aim is to get the asymptotic behaviour of the threshold k(l) when l is large. The problem is that if l gets to infinity, we also have to consider that N tends to infinity (because, since l is the length of a segment in a N N image, necessarily l 2N). And so the α used in the De Moivre-Laplace theorem will depend on N. This is the reason why we use the following stronger version of the previous theorem (see [23]).

39 2.3. THRESHOLDS AND ASYMPTOTIC ESTIMATIONS 39 Proposition 6 (Feller) If α(l) + and α(l) 6 /l 0 as l +, then [ P S l pl + α(l) ] l p(1 p) 1 + 2π α(l) e x2 /2 dx. Proposition 7 (asymptotic behaviour of k(l)) When N + and l + in such a way that l/(ln N) 3 +, one has k(l) = pl + 2p(1 p) l ( ln N 4 ) + O(ln ln N). ε Proof : We define, for i {0, 1}, α i (l, N) = k(l) i pl lp(1 p). Lemmas 1 and 2 imply that α 0 (l, N) 3 ln N, so that α i (l, N) + as l. Conversely, Corollary 1 implies that l k(l) pl + (4 ln N ln ε) + 1, 2 from which we deduce that α 6 i (l, N) (4 ln N ln ε)3 C, l l where C is a constant. Since ε is fixed and l/(ln N) 3 +, we get that α 6 i (l, N)/l 0. Hence, we can apply Feller s Theorem to obtain i {0, 1}, [ P S l pl + α i (l, N) ] l p(1 p) 1 + e x2 /2 dx. (2.2) 2π α i (l,n) For i = 0 (resp. for i = 1), the left hand term of (2.2) is smaller (resp. larger) than ε/n 4. Besides, the right hand term is equivalent to For i = 0, we deduce that 1 2παi (l, N) e α2 i (l,n)/ π α 0 (l, N) e α2 0 (l,n)/2 (1 + o(1)) ε N 4, which implies O(1) + O(ln(α 0 (l, N))) α2 0 (l, N) + o(1) ln ε 2 N 4,

40 40 CHAPTER 2. MEANINGFUL ALIGNMENTS and finally that is k(l) pl + α 0 (l, N) 2 2 ln N 4 ε 2p(1 p) l + O(ln ln N), ( ln N 4 ) + O(ln ln N). (2.3) ε The case i = 1 gives in a similar way k(l) 1 pl + 2p(1 p) l ( ln N 4 ) + O(ln ln N). (2.4) ε Finally (2.3) and (2.4) yield the estimation of k(l) announced in Proposition Lower bound for the meaningfulness threshold k(l) In this part, we refine the necessary condition of ε-meaningfulness obtained in Section 4.2 by using a comparison between the binomial and the gaussian laws given by the following Proposition 8 (Slud 1977, [74]) If 0 < p 1/4 and pl k l, then P [S l k] 1 2π + α(k,l) e x2 /2 dx where α(k, l) = k pl lp(1 p). Proposition 9 (necessary condition of meaningfulness) We assume that 0 < p 1/4 and N are fixed. If a segment S = (k, l) is ε-meaningful then where α(n) is uniquely defined by k pl + α(n) lp(1 p), 1 + e x2 /2 dx = 2π α(n) ε N 4. This proposition is a direct consequence of Slud s Theorem. The assumption 0 < p 1/4 is not a strong condition since it is equivalent to consider that the number of possible oriented directions is larger than Properties of meaningful segments Continuous extension of the binomial tail We first extend the discrete function P (k, l) to a continuous domain (see [23]).

41 2.4. PROPERTIES OF MEANINGFUL SEGMENTS 41 Lemma 3 The map P : (k, l) p x k 1 (1 x) l k dx x k 1 (1 x) l k dx (2.5) is continuous on the domain { (k, l) R 2, 0 k l < + }, decreasing with respect with k, increasing with respect with l, and for all integer values of k and l one has P (k, l) = P (k, l). Proof : The continuity results from classical theorems on the regularity of parameterized integrals. Notice that the continuous extension of P when k = 0 is P (0, l) = 1. Now, we prove that P (k, l) is decreasing with respect with k. For that purpose, we introduce the map A(k, l) = p 0 1 p x k 1 (1 x) l k dx. x k 1 (1 x) l k dx Since 1/ P = 1 + 1/A, we need to prove that A decreases with respect with k. We compute 1 A (k, l) = A k p 0 x k 1 (1 x) l k ln p 0 x k 1 (1 x) l k dx x 1 x dx 1 p x k 1 (1 x) l k x ln 1 x dx 1, x k 1 (1 x) l k dx and we apply the mean value theorem to obtain the existence of (α, β) such that 0 < α < p < β < 1 and 1 A A α (k, l) = ln k p 1 α ln β 1 β. The right hand term being negative, the proof is complete. The proof that P increases with respect with l is similar, the increasing map x ln x 1 x being replaced by the decreasing map x ln(1 x). Finally, the fact that P (k, l) = P (k, l) for integer values of k and l is a consequence of the relation P (k +1, l +1) = p P (k, l)+(1 p) P (k +1, l) (see [23] for example). Remark : Properties (2) and (3) guarantee that P is a good interpolate of P in the sense that the monotonicity of P in both variables k and l is extended to the continuous domain. Notice that a proof based on the same method (using that x ln x is increasing) will establish that P k + P 0, l which is the natural extension of the property P (k + 1, l + 1) P (k, l) previously established in Proposition 1. From now on, we shall assume that p < 1/2. The following property is a good example of the interest of the continuous extension of P. This yields a proof of the announced Proposition 2.

42 42 CHAPTER 2. MEANINGFUL ALIGNMENTS Lemma 4 If l 1, then p P (p(l 1) + 1, l) < 1 2. The right-hand side of this inequality is a known result : it has been proved by Kaas and Buhrman [42]; indeed, they showed that for the binomial distribution the median and the mean are distant by no more than max(p, 1 p) (which is 1 p in our case, and thus this implies that P (pl + 1 p, l) < 1/2.) Proof : Using A(k, l) as in Lemma 3 we see that it is sufficient to prove that if k 1 = p(l 1), then p 1 p 1 p x k 1 (1 x) l k dx p 0 x k 1 (1 x) l k dx < For that purpose, we write f(x) = x k 1 (1 x) l k and we study the map g(x) = f(p x) f(p + x). 1 p x k 1 (1 x) l k dx. (2.6) A simple computation gives that up to a positive multiplicative term, g (x) writes 2x 2 (k 1 (1 p)(l 1)) 2p(1 p)(k 1 p(l 1)), and since k 1 = p(l 1) and p < 1/2, we have g < 0 on ]0, p]. Hence, g(x) < g(0) = 1 on ]0, p], which implies p 0 f(x)dx = p 0 f(p x)dx < and the right-hand side of (2.6) is proved. p 0 f(p + x)dx = 2p p f(x)dx < For the left-hand side, we follow the same reasoning with the map 1 p f(x)dx g(x) = f(p x) f(p + 1 p p x). After a similar computation, we obtain that g 0 on ]0, p], so that f(p x) f(p + 1 p p x) on ]0, p]. We integrate this inequality to obtain p 0 f(x)dx = p 0 f(p x)dx p 0 f(p + 1 p p x)dx = p 1 p 1 p f(x)dx, which proves the left-hand side of (2.6) Increasing the resolution In general, it is not easy to compare P (k, l) and P (k, l ) by performing simple computations on k, k, l and l. Assume that we have observed a meaningful segment S = (k, l) in a N N image. We increase the resolution of the image in such a way that the new image has size

43 2.4. PROPERTIES OF MEANINGFUL SEGMENTS 43 λn λn, with λ > 1, and the considered segment is now S λ = (λk, λl) (we admit that the density of aligned points on the segment is scale-invariant). Our aim is to compare the number of false alarms of S and of S λ, i.e. compare N 4 P (k, l) and (λn) 4 P (λk, λl). The result is given by the following proposition, and it shows that NF (S λ ) < NF (S). This is a consistency check for our model, since otherwise it would turn out that to get a better view does not increase the detection! Theorem 1 Let S = (k, l) be a 1-meaningful segment of a N N image (with N 6), then the function defined for λ 1 by is decreasing. λ (λn) 4 P (λk, λl) This theorem has the following corollary, which gives a way to compare the meaningfulness of two segments of the same image. Corollary 2 Let A = (k, l) and B = (k, l ) be two 1-meaningful segments of a N N image (with N 6) such that k and l > l. l Then, B is more meaningful than A, that is NF (B) < NF (A). k l Proof : Indeed, we can take λ = l /l > 1, so that k λk. We then have, by Theorem 1, (λn) 4 P (k, l ) N 4 P (k, l), and therefore N 4 P (k, l ) < N 4 P (k, l), i.e NF (B) < NF (A). An interesting application of Corollary 2 is the concatenation of meaningful segments. Let A = (k, l) and B = (k, l ) be two meaningful segments lying on the same line. Moreover we assume that A and B are consecutive, so that A B is simply a (k + k, l + l ) segment. Then, since k + k l + l we deduce, thanks to the above corollary, that This shows the following corollary. ( ) k min l, k l, NF (A B) < max(nf (A), NF (B)).

44 44 CHAPTER 2. MEANINGFUL ALIGNMENTS Corollary 3 The concatenation of two meaningful segments is more meaningful than the least meaningful of both. The next lemma is useful to prove Theorem 1. Lemma 5 Define for p < r 1, B(r, l) = P (rl, l). Then, one has ln B l where g r is the map defined in Lemma 1. < 1 l (g r(r) g r (p)), Proof : We first write the Beta integral in terms of the Gamma function (see [4]), Thanks to (2.5), this yields 1 0 t x 1 (1 t) y 1 dt = Γ(x)Γ(y) Γ(x + y). B(r, l) = We now use the expansion (see [4]) Γ(l + 1) Γ(rl) Γ((1 r)l + 1) p 0 x rl 1 (1 x) (1 r)l dx. (2.7) d ln Γ(x) dx = γ 1 + x + ( 1 n 1 ), (2.8) x + n n=1 where γ is Euler s constant. Using (2.7) and (2.8), we obtain [ ] 1 B = γ 1 + B l l ( 1 n 1 l n ) r γ 1 + rl + ( 1 n 1 rl + n ) n=1 n=1 [ ] 1 + (1 r) γ (1 r)l ( 1 n 1 (1 r)l n ) + p 0 n=1 (r ln x + (1 r) ln(1 x))x rl 1 (1 x) (1 r)l dx p. x rl 1 (1 x) (1 r)l dx 0 The function x r ln x + (1 r) ln(1 x) is increasing on ]0, r[, and we have p < r, so Then p 0 (r ln x + (1 r) ln(1 x))x rl 1 (1 x) (1 r)l dx 1 B B l p 0 x rl 1 (1 x) (1 r)l dx 1 + l + r ( rl + n + n=1 r ln p + (1 r) ln(1 p). 1 r (1 r)l + n 1 ) + r ln p + (1 r) ln(1 p). l + n

45 2.4. PROPERTIES OF MEANINGFUL SEGMENTS 45 Now, let us consider the function f : x r rl + x + 1 r (1 r)l + x 1 l + x, defined for all x > 0. Since 0 < r 1 we have rl + x l + x and (1 r)l + x l + x, so that f(x) 0 and f r (x) = (rl + x) 2 1 r ((1 r)l + x) (l + x) 2 We deduce that for N integer larger than 1, A simple integration gives N 0 Finally f(x) dx = r ln(1 + rl N which yields 1 B B l + n=1 N f(n) n=1 N 0 f(x) dx. (1 r)l ) + (1 r) ln(1 + N ) ln(1 + l ) r ln r (1 r) ln(1 r). N r ( rl + n + 1 r (1 r)l + n 1 ) r ln r (1 r) ln(1 r), l + n 1 l r ln r (1 r) ln(1 r) + r ln p + (1 r) ln(1 p) = 1 l g r(r) + g r (p). Proof of Theorem 1 : also, thanks to Lemma 2, Let us define r = k/l. Since S is 1-meaningful we have r > p and g r (r) g r (p) 3 ln N. l Let f be the function defined for λ 1 by f(λ) = (λn) 4 P (λk, λl) = (λn) 4 B(r, λl). If we compute the derivative of f and use Lemma 5, we get ln f λ = 4 λ + l ln B (r, λl) l < 4 λ + l( 1 λl g r(r) + g r (p)) < 5 λ 3 ln N which is negative thanks to the hypothesis N 6. Remark : For the approximation of P (k, l) given by the Gaussian Law G(k, l) = 1 2π + α(k,l) e x2 2 dx where α(k, l) = ( k l p ) l p(1 p), we immediately have the result that G(k, l ) < G(k, l) when k /l k/l > p and l > l.

2.5 Maximal meaningful segments

Definition

Suppose that on a straight line we have found a meaningful segment S with a very small number of false alarms (i.e. NF(S) ≪ 1). Then, if we add some spurious points at one end of the segment, we obtain another segment whose probability is higher than that of S but whose number of false alarms is still less than 1, which means that this new segment is still meaningful (see figure). In the same way, it is likely in general that many subsegments of S, although they have a probability higher than that of S, will still be meaningful (see the experimental section, where this problem obviously occurs for the pencil strokes image). These remarks justify the introduction of the following notion of maximal segment.

Definition 6 (Maximal segment) A segment A is maximal if
1. it does not contain a strictly more meaningful segment: ∀ B ⊂ A, NF(B) ≥ NF(A);
2. it is not contained in a more meaningful segment: ∀ B ⊃ A, NF(B) > NF(A).
We then say that a segment is maximal meaningful if it is both maximal and meaningful.

This notion of maximal meaningful segment is linked to what Gestaltists called the masking phenomenon. According to this phenomenon, most parts of an object are masked by the object itself, except the parts which are significant from the point of view of the construction of the whole object. For example, if one considers a square, the only significant segments of this square are its four sides, and not large parts of the sides. With our definition, long enough parts of a side may be meaningful segments, but only the whole side itself will be a maximal meaningful segment.

Proposition 10 (Properties of maximal segments) Let A be a maximal segment. Then
1. the two endpoints of A have their direction aligned with the direction of A,
2. the two points next to A (one on each side) do not have their direction aligned with the direction of A.
This is an easy consequence of Proposition 1.
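As an illustration of Definition 6, here is a small brute-force sketch (Python) that extracts the maximal ε-meaningful segments of a single line of independent points; the 0/1 encoding of the line, the exhaustive search and the toy data are my own choices, and the efficient tabulated procedure actually used is the one described in Section 2.6.

import math

def binom_tail(k, l, p):
    # P[S_l >= k] for the binomial law B(l, p)
    return sum(math.comb(l, i) * p ** i * (1 - p) ** (l - i) for i in range(max(k, 0), l + 1))

def maximal_meaningful(aligned, N, p=1.0 / 16, eps=1.0):
    # aligned: 0/1 list, one value per independent point of the line
    n = len(aligned)
    cum = [0]
    for a in aligned:
        cum.append(cum[-1] + a)
    # number of false alarms of every segment [i, j] of the line
    nf = {(i, j): N ** 4 * binom_tail(cum[j + 1] - cum[i], j - i + 1, p)
          for i in range(n) for j in range(i, n)}
    maximal = []
    for (i, j), v in nf.items():
        if v > eps:
            continue
        # condition 1: no strictly more meaningful segment inside [i, j]
        inside = all(nf[(a, b)] >= v for a in range(i, j + 1) for b in range(a, j + 1))
        # condition 2: no more meaningful segment containing [i, j]
        outside = all(nf[(a, b)] > v for a in range(i + 1) for b in range(j, n) if (a, b) != (i, j))
        if inside and outside:
            maximal.append((i, j, v))
    return maximal

# toy line: a run of aligned points surrounded by mostly random ones (made-up data)
print(maximal_meaningful([0, 1, 0] + [1] * 12 + [0, 0, 1, 0], N=512))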

A conjecture about maximality

Up to now, we have established some properties that allow us to characterize or compare meaningful segments. We now study the structure of maximal segments, and give some evidence that two distinct maximal segments lying on the same straight line have no common point.

Conjecture 1 If (l, l', l'') ∈ [1, +∞)^3 and (k, k', k'') ∈ [0, l] × [0, l'] × [0, l''], then

min( p, \tilde{P}(k', l'), \tilde{P}(k + k' + k'', l + l' + l'') ) < max( \tilde{P}(k + k', l + l'), \tilde{P}(k' + k'', l' + l'') ).   (2.9)

This conjecture can be deduced from a stronger (but simpler) conjecture: the concavity, in a particular domain, of the level lines of the natural continuous extension \tilde{P} of P involving the incomplete Beta function. Let us state immediately some relevant consequences of Conjecture 1.

Corollary 4 (Union and Intersection) If A and B are two segments lying on the same straight line, then, under Conjecture 1,

min( pN^4, NF(A ∩ B), NF(A ∪ B) ) < max( NF(A), NF(B) ).

This is a direct consequence of Conjecture 1 for integer values of k, k', k'', l, l' and l''. Numerically, we checked this property for all segments A and B such that |A ∪ B| ≤ 256. For p = 1/16, we obtained

min_{|A ∪ B| ≤ 256} \frac{ max(NF(A), NF(B)) − min(pN^4, NF(A ∩ B), NF(A ∪ B)) }{ max(NF(A), NF(B)) + min(pN^4, NF(A ∩ B), NF(A ∪ B)) } > 0,

this minimum (independent of N) being attained for A = (23, 243), B = (23, 243) and A ∩ B = (22, 230) (as before, the couple (k, l) that we attach to each segment records its number of aligned points k and its length l).

Theorem 2 (maximal segments are disjoint under Conjecture 1) Suppose that Conjecture 1 is true. Then any two maximal segments lying on the same straight line have an empty intersection.

Notice that this property applies to maximal segments, and not only to maximal meaningful segments.

Proof: Suppose that one can find two maximal segments (k + k', l + l') and (k' + k'', l' + l'') that have a non-empty intersection (k', l'). Then, according to Conjecture 1, we have

min( p, \tilde{P}(k', l'), \tilde{P}(k + k' + k'', l + l' + l'') ) < max( \tilde{P}(k + k', l + l'), \tilde{P}(k' + k'', l' + l'') ).

48 48 CHAPTER 2. MEANINGFUL ALIGNMENTS If the left hand term is equal to p, then we have a contradiction since one of (k + k, l + l ) or (k + k, l + l ) is strictly less meaningful than the segment (1, 1) it contains. If not, we have another contradiction because one of (k + k, l + l ) or (k + k, l + l ) is strictly less meaningful than one of (k, l) or (k + k + k, l + l + l ). Remark : The numerical checking of Conjecture 1 ensures that for p = 1/16 (but we could have checked for another value of p), two maximal meaningful segments with total length smaller than 256 are disjoint, which is enough for most practical applications A simpler conjecture In this subsection, we state a simple geometric property entailing Conjecture 1. Conjecture 2 The map (k, l) P (k, l) defined in Lemma 3 has negative curvature on the domain D p = {(k, l) R 2, p(l 1) + 1 k l}. It is equivalent to say that the level curves l k(l, λ) of P defined by P (k(l, λ), l) = λ are concave, i.e. satisfy (k 0, l 0 ) D p, 2 k l 2 (l 0, P (k 0, l 0 )) < 0. Remark : All numerical computations we have realized so far for the function P (k, l) have been in agreement with Conjecture 1. Concerning theoretical results, we shall see in the next section that this conjecture is asymptotically true. For now, the following results show that Conjecture 1 is satisfied for the Gaussian approximation of the binomial tail (correct for small deviations, that is k pl + C l) and also for large deviations estimate. Proposition 11 The approximation of P (k, l) given by the Gaussian law G(k, l) = 1 2π + α(k,l) has negative curvature on the domain D p. e x2 2 dx where α(k, l) = k pl lp(1 p) Proof : The level lines G(k, l) = λ of G(k, l) can be written under the form k(l, λ) = pl + f(λ) l, with f > 0 on the domain {k > pl}. Hence, we have and consequently curv(g) < 0 on D p. 2 k f(λ) (l, λ) = l2 4l 3/2 We shall investigate Conjecture 1 with several large deviations arguments. Cramér s theorem about large deviations (see [15], for example) applied to Bernoulli random variables yields to the following:

49 2.5. MAXIMAL MEANINGFUL SEGMENTS 49 Proposition 12 (Cramér) Let r be a real number such that 1 r > p, then lim l + 1 l ln P [S l rl] = r ln r 1 r (1 r) ln p 1 p = g r(r) + g r (p). Notice that Proposition 12 gives the asymptotic estimate of ln P [S l rl] but not the asymptotic estimate of P [S l rl]. Notice also that the limit given by Proposition 12 was the upper bound of ln P [S l rl] given by Hoeffding s inequality (see Proposition 3). Theorem 3 The large deviations estimate of ln P (k, l) (see Proposition 12) given by H(k, l) = has negative curvature on the domain {pl k l}. [ k ln k (l k) ln l k ] pl (1 p)l Proof : The level lines of H(k, l) are defined by k(l, λ) ln k(l, λ) pl + (l k(l, λ)) ln l k(l, λ) (1 p)l = λ. We fix λ and we just write k(l, λ) = k(l). If we compute the first derivative of the above equation and then simplify we get: k (l) ln k(l) k (l) ln(pl) + (1 k (l)) ln(l k(l)) (1 k (l)) ln((1 p)l) = 0. Now, again by differentiation, we get k (l) ln (1 p)k(l) p(l k(l)) 1 l + k (l) 2 k(l) + (1 k (l)) 2 l k(l) = 0. It is equivalent to: k (l) ln (1 p)k(l) p(l k(l)) = (k(l) k (l)l) 2 lk(l)(l k(l)), which shows that H(k, l) has negative curvature on the domain pl k l Proof of Conjecture 1 under Conjecture 2 Lemma 6 (under Conjecture 2) If k 1 > p(l 1) and µ > 0, then the map x P (k + µx, l + x) has no local minimum at x = 0.

50 50 CHAPTER 2. MEANINGFUL ALIGNMENTS Proof : Call f this map, it is sufficient to prove that either f (0) 0 or (f (0) = 0 and f (0) < 0). If f (0) = 0, then so that thanks to Conjecture 2. µ = P l P k (k, l), f (0) = µ 2 Pkk + 2µ P kl + P ll = curv( P )(k, l) ( P k 2 + P l 2 ) 3/2 P k 2 We now can prove Conjecture 1 under Conjecture 2. Proof : Because the inequality we want to prove is symmetric in k and k, we can suppose that k /l k /l. If k + k 1 p(l + l 1), then P (k + k, l + l ) > p and we have finished. Thus, in the following we assume k + k 1 > p(l + l 1). Let us define the map f(x) = P ( k + x(k + k ), l + x(l + l ) ) for x [0, 1]. We remark that for x 0 = l /(l + l ) ]0, 1[, we have k + x 0 (k + k ) = k + l l + l (k + k ) k + l l + l (k + k l l ) = k + k, which implies that P (k + k, l + l ) f(x 0 ). Hence, it is sufficient to prove that ( ) min p, f(0), f(1) < f(x 0 ). The set S = < 0 { x [0, 1], k + x(k + k ) 1 p ( l + x(l + l ) 1 ) } > 0 is a connected segment that contains x 0 because k + x 0 (k + k ) 1 k + k 1 > p(l + l 1) = p ( l + x 0 (l + l ) 1 ). Moreover, S contains 0 or 1 because the linear function involved in the definition of S is either 0 or vanishes only once. Since f has no local minimum on S thanks to Lemma 6, we conclude as announced that f(x 0 ) > min f(x) = min f(x) min (p, f(0), f(1)), x S x S since if x S ]0, 1[, then f(x) p thanks to Lemma 4. Remark : This proof (and the proof of Lemma 6) only relies on the fact that there exists some smooth interpolation of the discrete P (k, l) that has negative curvature on the domain D p. There are good reasons to think that the P (k, l) approximation satisfies this property, but it could be that another approximation also does, though we did not find any (for example, the piecewise bilinear interpolation of P (k, l) is not appropriate). On Figure 2.3, we give the geometric idea underlying the proof of Conjecture 1 under Conjecture 2.

51 2.5. MAXIMAL MEANINGFUL SEGMENTS 51 k. (k+k +k,l+l +l ). (k+k,l+l ). (k+k,l+l ).. (k,l) k-1=p(l-1) 0 l Figure 2.3: Geometric idea of the proof of Conjecture 1 under Conjecture 2. We assume that P (k + k, l + l ) P (k + k, l + l ). We represent the concave level line of P passing by (k + k, l + l ). The point (k + k, l + l ) is above this level line (indeed, P k < 0). Since the segments [(k + k, l + l ), (k + k, l + l )] and [(k, l), (k + k + k, l + l + l )] have the same middle point, one sees that one of the points (k, l) and (k + k + k, l + l + l ) must lie above the concave level line Partial results about Conjecture 1 In this section, we shall give an asymptotic proof of Conjecture 2. In all the following, we assume that p and r satisfy 0 < p < r < 1 and p < 1/2. The proof relies on the two following technical propositions: Proposition 13 and Proposition 14, both proved by Lionel Moisan in [18] and [54]. Proposition 13 (precise large deviations estimate) Let [ exp l p(1 p) D(rl + 1, l + 1) = (r p) 2πlr(1 r) ( r ln r 1 r + (1 r) ln p 1 p Then, for any positive p, r, l such that p < r < 1 and p < 1/2, one has )]. (2.10) 4r 1 (r p) 2 l(1 p) r(1 r) 2πlr(1 r) P (rl + 1, l + 1) D(rl + 1, l + 1) 1. (2.11) 2 1 2πlr(1 r) In particular, on has P (rl + 1, l + 1) D(rl + 1, l + 1) l + uniformly with respect to r in any compact subset of ]p, 1[.

52 52 CHAPTER 2. MEANINGFUL ALIGNMENTS Notice that the exponential term in (2.10) corresponds to Hoeffding s inequality (see Theorem 3). Proposition 14 For any λ [0, 1] and l > 0, there exists a unique k(l, λ) such that P (k(l, λ) + 1, l + 1) = λ. (2.12) Moreover, one has 2 k ( ) l, P (rl + 1, l + 1) l 2 l + ( r ln r 1 r + (1 r) ln p 1 p l r(1 r) ( ln r(1 p) (1 r)p ) 2 ) 3. (2.13) uniformly with respect to r in any compact subset of ]p, 1[. It is interesting to notice that (2.13) remains true when k(l, λ) is defined not from P but from its estimate D given by (2.10). In the same way, one can prove that k ( ) l, P (rl + 1, l + 1) l l + ln 1 p 1 r r(1 p) ln (1 r)p is satisfied by both definitions of k(l, λ). This proves that (2.10) actually gives a very good estimate of P, since it not only approximates the values of P but also its level lines up to second order. Theorem 4 (asymptotic proof of Conjecture 2) There exists a continuous map L :]p, 1[ R such that (k, l) P (k, l) has negative curvature on the domain { } Dp L = (rl + 1, l + 1), r ]p, 1[, l [L(r), + [. This result is illustrated on Figure 2.4. Proof : Define k(l, λ) by (2.12). Thanks to Proposition 14, the function r 2 k l 2 ( l, P (rl + 1, l + 1) ) ( r(1 p) l r(1 r) ln (1 r)p ( r ln r 1 r + (1 r) ln p 1 p ) 3 ) 2 tends to 1 as l goes to infinity, and the convergence is uniform with respect to r in any compact subset of ]p, 1[. Thus, we deduce that the map { r l(r) = inf l 0 > 0, l l 0, curv P } (rl + 1, l + 1) < 0

Figure 2.4: Conjecture 2 is proven on a subdomain D_p^L of D_p (the figure shows, in the (l, k) plane, the lines k − 1 = p(l − 1) and k − 1 = r(l − 1), the domain D_p and the subdomain D_p^L).

is bounded on any compact subset of ]p, 1[. Now, defining L(r) as a continuous upper bound of l(r) yields the desired result. For example, one can take

L(r) = sup_{n ∈ ℤ} d_n(r),

where d_n is the unique linear function passing through the points (a_{n−1}, max_{t ∈ [a_{n−2}, a_n]} l(t)) and (a_n, max_{t ∈ [a_{n−1}, a_{n+1}]} l(t)), and (a_n)_{n ∈ ℤ} is an increasing sequence such that lim_{n → −∞} a_n = p and lim_{n → +∞} a_n = 1.

2.6 Experiments

In all the following experiments, the direction at a pixel of the image is computed on a 2 × 2 neighborhood with the method described in Section 2.1 (q = 2), and the precision is p = 1/16. The direction is computed at all pixels, except where the gradient is strictly equal to zero (up to machine precision). Let N denote the size of the considered image. The algorithm used to find the meaningful segments is the following. For each of the four sides of the image, and for each pixel of that side, we consider the lines starting at this pixel and having an orientation multiple of π/n, with n larger than 48. Then, on each line, we compute the meaningful segments. Let Lmax be the length of the considered line. We first compute, for each 1 ≤ i ≤ Lmax, the number of aligned points K(i) contained in the segment beginning at the first point of the line and ending at its i-th point; for simplicity we denote this segment by [1, i]. This is recursive and needs just one run along the line. We then build an array of size Lmax × Lmax, denoted by NF, containing the numbers of false alarms of all the segments of the line: for 1 ≤ i ≤ j ≤ Lmax, we have NF(i, j) = N^4 P(K(j) − K(i − 1), j − i + 1), and if this value is less than ε, then the segment [i, j] is ε-meaningful. Notice that P(k, l) can simply be tabulated at the beginning of the algorithm, using the property P(k + 1, l + 1) = p P(k, l) + (1 − p) P(k + 1, l).

In order to find the maximal meaningful segments, we build two further arrays, denoted NFinf and NFsup, defined as follows: NFinf(i, j) is the minimal number of false alarms over the segments contained in the segment [i, j], and NFsup(i, j) is the minimal number of false alarms over the segments containing the segment [i, j]. We compute them using the recursive properties

NFinf(i, j) = min( NF(i, j), NFinf(i + 1, j), NFinf(i, j − 1) )
and
NFsup(i, j) = min( NF(i, j), NFsup(i − 1, j), NFsup(i, j + 1) ),

so that this is done with two runs (one for each array) over the initial array NF. Finally, a maximal meaningful segment is characterized by

NF(i, j) ≤ ε,   NF(i, j) ≤ NFinf(i, j)   and   NF(i, j) ≤ NFsup(i, j).

The complexity of this algorithm is about nN^3: nN is the number of lines we consider, and for each line the number of points is less than N, so that the number of segments on the line is less than N^2, and we just need four runs over the set of segments. Typical CPU time on a Pentium II at 350 MHz is ten seconds for the larger images presented below, and one second for the smaller ones. The value 48N^3 is a low estimate of the number of considered segments. Because of the angle precision 2π/16 (to be compared with π/48), the sampling of directions is enough to cover all possible alignments in the image.

It must be made clear that we applied exactly the same algorithm to all the presented images, which have very different origins. The only parameter of the algorithm is the precision. We fixed it equal to 1/16 in all experiments; this value corresponds to the very rough accuracy of 360/16 = 22.5 degrees; this means that (e.g.) two points can be considered as aligned with, say, the 0 direction if their angles with this direction are up to ±11.25 degrees! These bounds are clearly very rough, but they are in agreement with the more pessimistic estimates of vision accuracy in psychophysics, and with numerical experience as well. Moreover, in all experiments, we only keep the meaningful segments having, in addition, the property that their endpoints have their direction aligned with that of the segment: black points represent points on a meaningful segment which have the same direction as the segment (with the precision p), and grey points represent points on a meaningful segment which do not have the same direction as the segment.

For each one of the following images, we compute:
1. all the meaningful segments;
2. the maximal meaningful segments;
3. for some of them: the meaningful segments with length less than 30 or 20. These segments have a small length (close to the minimal length l_min = 4 ln N / ln(1/p)), and consequently a density of aligned points close to 1.

As a general comment on all the experiments, we shall see that the (non-maximal) meaningful events are too long: indeed, if we find a very meaningful segment (and this happens very systematically in the experiments), then much larger segments containing this very meaningful one will still be meaningful.
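Before turning to the individual experiments, here is a compact sketch of the per-line procedure just described (in Python): the cumulative counts K, the array NF, and the two arrays NFinf and NFsup computed by the recursions above. The function names, the 0/1 encoding of the aligned points and the list-based tables are implementation choices of mine, not taken from the text.

import math

def tabulate_tail(lmax, p):
    # table P[k][l] = P(k, l) = P[S_l >= k], filled with the recursion
    # P(k + 1, l + 1) = p * P(k, l) + (1 - p) * P(k + 1, l)
    P = [[0.0] * (lmax + 1) for _ in range(lmax + 1)]
    for l in range(lmax + 1):
        P[0][l] = 1.0
    for l in range(1, lmax + 1):
        for k in range(1, l + 1):
            P[k][l] = p * P[k - 1][l - 1] + (1 - p) * P[k][l - 1]
    return P

def maximal_meaningful_on_line(aligned, N, p=1.0 / 16, eps=1.0):
    # aligned[i] = 1 if the i-th point of the line has its direction aligned
    # with the direction of the line (precision p), 0 otherwise
    n = len(aligned)
    P = tabulate_tail(n, p)
    K = [0]                                   # K[i] = aligned points among the first i points
    for a in aligned:
        K.append(K[-1] + a)
    NF = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            NF[i][j] = N ** 4 * P[K[j + 1] - K[i]][j - i + 1]
    # NFinf[i][j]: minimal NF over the segments contained in [i, j]
    NFinf = [row[:] for row in NF]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            NFinf[i][j] = min(NF[i][j], NFinf[i + 1][j], NFinf[i][j - 1])
    # NFsup[i][j]: minimal NF over the segments containing [i, j]
    NFsup = [row[:] for row in NF]
    for length in range(n - 2, -1, -1):
        for i in range(n - length):
            j = i + length
            if i > 0:
                NFsup[i][j] = min(NFsup[i][j], NFsup[i - 1][j])
            if j < n - 1:
                NFsup[i][j] = min(NFsup[i][j], NFsup[i][j + 1])
    # a segment is maximal meaningful if its NF is <= eps and minimal both ways
    return [(i, j, NF[i][j]) for i in range(n) for j in range(i, n)
            if NF[i][j] <= eps and NF[i][j] <= NFinf[i][j] and NF[i][j] <= NFsup[i][j]]

On a line of n points this uses O(n^2) memory and a constant number of passes over the set of segments, which matches the complexity discussion above.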

We display for the first image all the meaningful alignments. We then display the maximal meaningful alignments, as a way to check by comparison that these retain the whole alignment information while being far more accurate. We think the experiments clearly demonstrate the necessity of maximality. We also display, for several images, only the alignments whose length is smaller than a given threshold (20 or 30). This is a way to check that, in natural images, most alignments can be detected locally. Indeed, we see that most detected maximal alignments are a concatenation of small, still meaningful, alignments.

Image 1: Pencil strokes. This digital image was first drawn with a ruler and a pencil on a standard A4 white sheet of paper, and then scanned into a digital image (a); the scanner's apparent blurring kernel is about two pixels wide and some aliasing is perceptible, making the lines somewhat blurry and dashed. Two pairs of pencil strokes are aligned on purpose. We display in the first experiment all the meaningful segments (b). Four phenomena occur, which are very apparent in this simple example but will be perceptible in all further experiments.

1. Too long meaningful alignments: we commented on this above; clearly, the boundaries of the pencil strokes are very meaningful, thus generating larger meaningful segments which contain them.

2. Multiplicity of detected segments. On both sides of the strokes, we find several parallel lines (reminder: the orientation of lines is taken modulo 2π). These parallel lines are due to the blurring effect of the scanner's optical convolution. Classical edge detection theory would typically select the best, in terms of contrast, of these parallel lines.

3. Lack of accuracy of the detected directions: we do not check that the directions along a meaningful segment are distributed on both sides of the line's direction. Thus, it is to be expected that we detect lines which are actually slanted with respect to the edge's true direction. Typically, a blurry edge will generate several parallel and more or less slanted alignments. It is not the aim of the present algorithm to filter out this redundant information; indeed, we do not know at this point whether the detected parallel or slanted alignments are due to an edge or not: this must be the object of a more complex algorithm. Everything indicates that an edge is in no way an elementary phenomenon in Gestalt.

We display in the second experiment for this image all the maximal meaningful segments (c), which show for each stroke two bundles of parallel lines, one on each side of the stroke. In the third one, we display all the meaningful segments whose length is less than 60 pixels (d). This achieves a kind of localization of the segments. Now, a visual comparison between this experiment and the former one (c) shows that maximality achieves a better, more accurate localization.

56 56 CHAPTER 2. MEANINGFUL ALIGNMENTS Image 2 : White noise blurred images. Image (a) is a white noise (size ), all pixels values being independent and identically distributed with a gaussian law. Image (b) is Image (a) convolved with a gaussian kernel with standard deviation 4 pixels. We apply the same algorithm as before to these images. The outcome was : no alignment detected! This experiment was devised to show that the local independence of pixels can be widely violated without affecting the final outcome. Indeed, a blurring creates local alignments but not global ones. Image 3 : Uccello s painting. This image (a), with size , is a result of the scan of an Uccello s painting: Presentazione della Vergine al tempio (from the book L opera completa di Paolo Uccello, Classici dell arte, Rizzoli). In image (b) we display all maximal meaningful segments and in image (c) all meaningful segments with length less than 60. Notice how maximal segments are detected on the staircase in spite of the occlusion by the going up child. Compare with the small meaningful segments. All remarks made in Image 1 apply here (parallelisms due to the blur, etc.) Image 4 : Airport image (size ). This digital image also has a noticeable aliasing which creates horizontal and vertical dashes along the edges. We display in image (b) all maximal detectable segments, always for ε = 1/10. We compare in image (c) and (d) with the same image with ε = 1/100 and ε = 1/1000. Image 5 : A road (courtesy of INRETS), size We display all maximal meaningful segments (b) and all meaningful segments with length less than 60 (c). Notice the detected horizontal lines in (b): they correspond to horizon lines, that is, lines parallel to the horizon. They tend to accumulate towards the horizon of the image. Such lines correspond to nonlocal alignments (they are not present in Image (c)). They are due to a perspective effect : all visual objects on the road (shadows, spots, etc.) are seen in very slanted view. Thus, their contours are mostly parallel to the horizon, thus generating what we should call perspective alignments. Image 6 : Two disks. This image (a) has size and it represents two slightly blurred disks. On Image (b), we compute the maximal meaningful segments. The boundaries of the disks are found as the union of small tangent segments. We also observe two bitangent alignments between the two disks. We will discuss this at the end of this chapter. Image 7: Building in Cachan (Figure 2.11 (a)). The size of this image is On Figure (b), we display all maximal ε-meaningful segments for ε = Notice that we find a lot of diagonal alignments. The explanation of this phenomenon is the fact that when we have many long and parallel alignments (for example at the top of the building), we also detect slanted (with angle less than the precision p) alignments. In Figure (c) we only display a minimal length description of the same segments. This means that, once detected, the alignments must be given their best explanation. One point of the image may belong to many maximal meaningful alignments. We say that a point x is maximal for a segment S if x belongs to S, the direction at point x is aligned (up to precision p) with the direction of the segment S and if S is the most meaningful (smallest number of false alarms) segment

57 2.7. ABOUT THE PRECISION P 57 containing x and aligned with the direction at x. Finally, on Figure (c), we only display the maximal meaningful segments of (b) having the property that they are still meaningful when we only count as aligned the maximal points they contain. The definition of maximal meaningful segments was already a first minimal description of the set of all meaningful alignments. Maximality is a way, along a line, to keep the best representatives of the set of meaningful segments. The MDL described above is a second step, in some sense more general since it is allowed to compare segments that are not necessarily on the same line. The property that maximal meaningful segments lying on the same line have an empty intersection is not a direct consequence of the definition of maximality: quite the contrary, it is still a conjecture in the general case, and it has only been checked numerically for the usual values of the numbers k and l. We could have taken another definition of maximality, where the fact that maximal segments cannot meet would have been part of the definition. For example, considering the point of view of coding (see [13]), we may want a description of a line as a sequence of "nothing to declare" and "alignment" intervals, with minimal code length. Indeed, a line may be represented by a sequence of ones and zeros, where a 1 is an aligned point and a 0 is a non-aligned point. An optimal code for this sequence will give a description of the line as a succession of segments of zeros (i.e. nothing particular to declare) and segments of ones (i.e. an alignment is present), with as few segments as possible and as close as possible to the original sequence (the relative weight of these two opposite requirements is automatically fixed by the code). This kind of description will directly provide a set of disjoint alignments. With such a definition, we would probably obtain a result close to the one we obtain by maximality. Now, the complexity of our algorithm seems to be lower since we do not compute any global minimization along the whole line. 2.7 About the precision p In this subsection, we address the problem of the choice of the precision p. We show that it is useless to artificially decrease the precision p: this yields no better detection rates. We consider a segment S of length l. We can assume that the direction of the segment is θ = 0. Suppose that among the l points, we observe k aligned points with given precision p (i.e. k points having their direction in [−pπ, +pπ]). Now, what happens if we change the precision p into p/10 (for example)? Knowing that there are k points with direction in [−pπ, +pπ], we can assume (by the Helmholtz principle) that the average number of points having their direction in [−(p/10)π, +(p/10)π] is k/10. The aim now is to compare B(l, k, p) and B(l, k/10, p/10), where B(l, k, p) = P(k, l) for precision p (in the notation P(k, l), we omitted the precision p because it was fixed).
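As a quick numerical illustration of this comparison, one can evaluate both binomial tails directly. This is only a sketch (it assumes SciPy is available, and the values of l, k and p are arbitrary):

from scipy.stats import binom

def B(l, k, p):
    # B(l, k, p) = P(k, l): probability of at least k aligned points among l
    return binom.sf(k - 1, l, p)

l, k, p = 100, 20, 1.0 / 16
print(B(l, k, p))             # tail at precision p
print(B(l, k // 10, p / 10))  # expected tail at precision p/10, with k/10 aligned points

The first value is by far the smaller of the two, in agreement with the conclusion of this section: the coarser precision provides the more meaningful explanation.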

58 58 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original image. (b) All meaningful segments. (c) Maximal meaningful segments. (d) Meaningful segments whose length is less than 60 pixels. Figure 2.5: Pencil strokes

59 2.7. ABOUT THE PRECISION P 59 (a) White noise : no alignment detected (ε = 1). (b) Convolution of image (a) with a gaussian kernel with S.D.=4 pixels : no alignment detected (ε = 1). Figure 2.6: White noise blurred images

60 60 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original image. (b) Maximal meaningful segments. (c) Meaningful segments whose length is less than 60 pixels. Figure 2.7: Uccello s painting

61 2.7. ABOUT THE PRECISION P 61 (a) The original image. (b) Maximal ε-meaningful segments, for ε = 1/10. (c) Maximal ε-meaningful segments, for ε = 1/100. (d) Maximal ε-meaningful segments, for ε = 1/1000. Figure 2.8: Airport image

62 62 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original image. (b) Maximal meaningful segments. (c) Meaningful segments whose length is less than 60 pixels. Figure 2.9: A road

63 2.7. ABOUT THE PRECISION P 63 (a) The original image (b) Maximal meaningful segments. Figure 2.10: Two disks

64 64 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original Cachan building image (b) Maximal ε-meaningful alignments with ε = (c) Minimal description of the set of maximal meaningful segments Figure 2.11: Building image: maximal meaningful segments and their minimal description.

65 2.8. EXTENSION TO CIRCLES 65 Remark : A non-aligned point for precision p is also non-aligned for precision p/10. Since we are interested in meaningful segments, we will only consider the case λ = k/(lp) = (k/10)/(l(p/10)) > 1. We then have to study the function p ↦ B(l, λlp, p). Is it increasing, decreasing, ...? If we consider the large deviations estimate given by G(l, λlp, p) = −l ( λp log λ + (1 − λp) log((1 − λp)/(1 − p)) ), we can easily prove that the function p ↦ λp log λ + (1 − λp) log((1 − λp)/(1 − p)) is increasing (for λ > 1). Consequently p ↦ G(l, λlp, p) decreases. Thus G(l, k, p) < G(l, k/10, p/10). This inequality has several consequences: if the observed alignment at precision p/10 is meaningful, then the original alignment at precision p is more meaningful. The previous argument shows that we must always take the precision as coarse as possible, because when we observe a meaningful alignment at a very good precision (i.e. p very small), then the best explanation of this alignment is maybe at a larger precision. This result is experimentally checked on Figure 2.12, where we compute the maximal meaningful alignments of the same image (original image (a): the Megawave image, size ), first for the usual precision p = 1/16 and then for p = 1/32. Almost all the alignments detected at precision 1/32 are already detected at precision 1/16. Remark : A natural question is: is the function p ↦ B(l, λlp, p) itself also decreasing? If we consider very small precisions, like for example p = 1/64, then we observe a new phenomenon: we detect artefactual meaningful alignments due to the quantization of the grey-levels of the image, which creates a quantization of the orientations. This problem is addressed in the next chapter. 2.8 Extension to circles In this section, we will see that what we did for segments can be straightforwardly applied to the detection of meaningful arcs of circles. We first compute the total number of arcs of circles in an image of given size N × N: this number is about N^5 (N^2 for the choice of the center of the circle, N for its radius, N for the starting point of the arc and N for the ending point).
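Since the number of tests grows from about N^4 for segments (in agreement with the minimal length 4 ln N / ln(1/p) recalled earlier) to about N^5 for arcs (the count computed above), the minimal length at which a completely aligned event becomes meaningful also grows. The following small sketch (plain Python; it assumes that the number of false alarms of a completely aligned event of length l is the number of tests times p^l) computes that minimal length:

def minimal_meaningful_length(N, p, n_tests_exponent):
    # smallest l such that N**n_tests_exponent * p**l <= 1, i.e. such that a
    # completely aligned segment (exponent 4) or arc (exponent 5) of length l
    # is 1-meaningful
    l = 1
    while N ** n_tests_exponent * p ** l > 1:
        l += 1
    return l

for N in (256, 512, 1024):
    print(N, minimal_meaningful_length(N, 1 / 16, 4), minimal_meaningful_length(N, 1 / 16, 5))

For the precision p = 1/16 used throughout, the minimal length for arcs is only a few pixels larger than the one for segments.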

66 66 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original image. (b) Maximal meaningful segments for precision p = 1/16. (c) Maximal meaningful segments for precision p = 1/32. Figure 2.12: Decreasing the precision: p = 1/16 and p = 1/32.

67 2.9. PARTIAL CONCLUSION 67 At each point of the image we compute, as for alignments, an orientation. Let C be an arc of circle, and let l be its length (counted in independent points). We count the number of points, among the l, having the property that their orientation is aligned (up to the fixed precision p = 1/16) with the tangent to the circle at the considered point. An example of the result we obtain is shown on Figure 2.14. On this figure, we observe that many detected circles are not due to the effective presence of circles in the image but are explained by bitangency: see for example on the coat, just under the M. This is the same problem we met with alignments (see the image with the two disks); see also Figure 2.13. Figure 2.13: The problem of bitangency for arcs of circles and also for alignments. These detected arcs or alignments (dashed lines) are not wrong since they have an explanation. For instance, the alignment is due to the presence of two smooth curves in the image. In such a case, a better explanation is given by both curves as gestalts. These examples fall under the general concept of masking in Gestalt theory. Masking means that partial gestalts can be masked (unseen) if a more complete global explanation is at hand. 2.9 Partial conclusion This preliminary study about Gestalt has tried to build the correct mathematical framework for the widespread idea that significant geometric structures in an image correspond to very low probability events. There are two ways to interpret this statement: the most common one is to define a probabilistic functional which is minimized, thus yielding the most likely geometric structures. Now, we emphasized the fact that the detection of structure has an intermediate stage, clearly missed in the variational framework: before we look for the most likely structures, we have to make a list of all proven structures. The experiments show well the difference between both approaches: where edge detection algorithms (which always look for the best position of an edge) directly yield a single edge, we find multiple alignments. In many cases, it is plain from the experiments that edge detection could be interpreted as a selection procedure among the alignments. To summarize, we have two different qualities which are mixed in the variational framework: feasibility and optimality. By looking for optimality only, we forget to prove that the found, optimal structures indeed exist. Next, we proposed an alternative to global variational principles: the notion of maximal event. To some extent, maximal alignments are local minimizers of a probability functional. The main

68 68 CHAPTER 2. MEANINGFUL ALIGNMENTS (a) The original image (b) Meaningful arcs of circles. Figure 2.14: Meaningful arc of circles.

69 2.9. PARTIAL CONCLUSION 69 difference is first that we do a minimization among feasible structures only; second, that we get additional structural properties from maximality, such as the fact that maximal alignments do not intersect. It may well be asked at this point what we can do further. We have considered one Gestalt quality only: the alignment. A first question is: to which other qualities do the notions developed here apply? We have already seen that this theory of meaningfulness can be directly extended to the case of arcs of circles. In the next chapters, we will develop other instances: the modes of a histogram (useful for the gestalts "same length" and parallelism, i.e. "same orientation") and the contrast. A second question, which was raised by Lowe, is the combination of several Gestalt qualities to generate more elaborate geometric structures. Edge detection is such an elaborate geometric structure: it is a combination of alignment (or curviness), of contrast along the edge curve, of homogeneity on both sides, of maximality of the slope and finally of stability across scales! (see the scale-space theory). All of these criteria contribute to more and more sophisticated edge detectors. In this chapter, we have shown that one of the qualities involved, the alignment, can be proved separately. The other qualities can receive analogous, if not sometimes identical, theories of meaningfulness. Now, the question of how we should let such qualities collaborate remains open.


71 Chapter 3 Dequantizing image orientation In this chapter, we address the problem of computing a local orientation map in a digital image. We show that standard image grey-level quantization causes a strong bias in the repartition of orientations, hindering any accurate geometric analysis of the image. In continuation, a simple dequantization algorithm is proposed, which maintains all of the image information and transforms the quantization noise in a nearby gaussian white noise (we actually prove that only gaussian noise can maintain isotropy of orientations!) Mathematical arguments are used to show that this results in the restoration of a high quality image isotropy. In contrast with other classical methods, it turns out that this property can be obtained without smoothing the image or increasing the SNR. As an application, it is shown in the experimental section that, thanks to this dequantization of orientations, such geometric algorithms as the detection of meaningful alignments can be performed efficiently. We also point out (experimentally) similar improvements of orientation quality for aliased images by the same algorithm. 3.1 Introduction Let u be a grey-level image, where x denotes the pixel and u(x) is a real value. Most natural (non synthetic) images are generated in the following way : a source image s(x) is assumed to be of infinite resolution. A band limited optical smoothing is performed on s, yielding a smoothed version k s. By Shannon-Whittaker theory, the band-limited image can be sampled on a regular and fine enough grid. Let us denote by Π the Dirac Comb of this grid. Then u is roughly obtained as u = (s k).π, which yields the discrete, digital image. According to Shannon-Whittaker Theorem, s k can be recovered from u by the so called Shannon interpolation, using a basis of sinc functions. Actually, this model is significantly idealized, since other operations result in a substantial image degradation, namely a white photonic and/or electronic noise n, a windowing (Π is not infinite, but restricted to a rectangle) and, last but not least, a quantization Q. Thus, the realistic image model is u = Q[(k s).π + n], 71

72 72 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION in which we neglect the windowing effect as affecting essentially the image boundary. In this chapter, we address the problem of computing accurately and in an unbiased way the orientation of the gradient of u, defined as the number θ [0, 2π[ such that exp(iθ) = Du/ Du, where Du = (u x, u y ) denotes the image gradient. When we refer to the gradient of u, we wish to refer to the gradient of the smooth subjacent image, in as much as we consider u to be Shannon interpolable. If we assume, which is realistic enough, that k and n are isotropic, we are led to address the effect of the quantization Q on the field of orientations. We discovered that this effect is strong and leads to a very biased field of orientations (see Figure 3.6). It can hinder any faithful geometric analysis of the image, unless some previous restoration is performed. Before explaining how we shall address this restoration, let us give an example where this restoration is crucial in order to perform a correct geometric analysis in the image. This is a particular instance, but let it be mentioned that all probabilistic methods using local pixel interactions (e.g. Markov random field models) would suffer, knowingly or not the same effect. In the previous chapter meaningful alignments, we proposed a grouping, nonlocal, method for the detection of alignments in an image. In a few words, the principle of the method is as follows (see also [17]). We assume that each point in the image has an orientation θ(x) with relative precision p > 0. We call independent points any set of points in the image whose relative distances are larger than the Nyquist distance, so that we may consider them as independent observations, not affected by the optical smoothing. If the image is decently sampled, this distance is about 2 pixels. We consider a segment S of aligned independent points in the image, with length l. Assume we have observed k points on S having their orientation equal, with precision p, to the orientation of S. We say that this alignment is meaningful if, under the assumption that the orientations at each point are independent isotropic random variables, the expectation of the event at least k points aligned among l is much smaller than 1. We tried to prove that all perceptible alignments are detectable as such large deviations from the white noise assumption. In Figure 3.10b), we show all segments detected in a natural image by this method at precision p = 1/16. It can be visually checked that no detected segment seems to be artefactual, i.e. due to image generation. Let us now choose a precision of orientation p = 1/64. This precision may seem exaggerate, but can be successfully used in an image with strong gradients. Indeed, at points with strong gradients, the SNR of orientation is obviously good. Figure 3.10c) shows the detected alignments, who are, according to our definition, highly noncasual. Clearly, such detections are artefactual and are the result of image generation. After some inquiry, it turned out that the grey-level quantization is responsible for such artefactual detections. Actually, this does not mean that the alignment detection is wrong, but only that the detected alignments are image generation artefacts. In Figure 3.10d), we show the result of alignment detection (at precision p = 1/64) on the same picture, after the dequantization we propose here has been performed. Let us therefore go back to the problem of computing a reliable orientation. 
The first good answer to this problem is known as the dithering method [21], which consists in adding noise before quantization and then subtracting the same noise sample from the quantized

73 3.1. INTRODUCTION 73 image. This results in decreasing the SNR of the image, but turns out to maintain better the image aspect and its isotropy under strong quantization. Unfortunately, the dithering method has been to the best of our knowledge fully abandoned in image generation devices. To summarize, image isotropy can be restored by dithering to the cost of decreasing the SNR, but this is a degradation and should anyway be performed in the image generation process itself: this is not generally the case. A second easy answer, very in use, consists in smoothing the image by some convolution kernel, and only retaining the orientation at points where the gradient is high and stable across scales. This is the classical edge detection method [81], [51], [7] and, for more up to date methods, [50]. There is nothing to object to this method, since at the end it retains edge points which are very local, confirmed at larger scales though. Now, clearly, many orientations in the image are usable for detecting alignments, which are not computed at edge points: the edge points simply are a particularly good selection, but sparse. Another way, addressed recently and successfully by several authors [63], [71] consists in defining an orientation scalespace. Also, the affine scale space ([70], [3]) provides a way to compute a multiscale orientation of level lines. In all cases, the objective is different and wider than just computing a local orientation : the aim of these methods is to compute a multiscale orientation map which has to be considered by itself as a nonlocal analysis of the image. These methods are better than edge maps methods in the sense that they provide an orientation at all points. They are all the same not appropriate for image analysis models based on local observations (e.g. most probabilistic methods) as the one we outlined before. Indeed, they do not preserve the independence of points at Nyquist distance. The solution which we propose to dequantize the image should, according to the preceding discussion, satisfy the following requirements : to maintain the independence of local observations (no smoothing), to maintain all of the image information (thus the method must be invertible, and, to ask more, be an isometry in Hilbert space), it should give an unbiased orientation map, where quantization noise has been made isotropic. We shall actually prove that a simple and invertible operation, namely a Shannon (1/2, 1/2)- translation of the image, permits to remove the quantization effects on the orientation map. More precisely, we shall prove experimentally and mathematically that this translation transforms the quantization noise into a nearby gaussian white noise. We shall also prove that all reasonable local computations of the gradient, applied to the dequantized image, yield an unbiased orientation, even at points where the gradient is small. This remains true even when the quantization step is large. As a consequence, we point out the possibility of using in geometric analysis of images a very local estimate of the gradient, using therefore the full

74 74 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION image accuracy. The plan of this chapter is as follows. In Section 2, we consider a wide set of classical local computation methods for the gradient and show that they preserve an excellent isotropy, under the assumption that the image noise is uniform or gaussian. We also prove a converse statement: the image orientation will be isotropic if and only if the noise is gaussian! We analyze the bias introduced by quantization and show that its effect on orientation can be disastrous. In Section 3, we detail the proposed solution and make an accurate mathematical and practical analysis of the dequantization effect (using a flat regions model). We end with some experiments. 3.2 Local computation of gradient and orientation We consider a grey-level image u of size N × M (that is, N M pixels). At each point, we can compute an orientation which is the direction of the level line passing through the point, calculated on a 2 × 2 neighbourhood, which is the smallest possible to preserve locality. We define the orientation at pixel (n, m) by rotating by π/2 the direction of the gradient of the order 2 interpolation at the center of the 2 × 2 window made of pixels (n, m), (n + 1, m), (n, m + 1) and (n + 1, m + 1). Thus, we get (when the gradient is not zero) dir(n, m) = G^⊥(n, m) / |G(n, m)|, where G^⊥ denotes the rotation of G by π/2 and the gradient G(n, m) at point (n, m) is defined by G(n, m) = ( [u(n + 1, m) + u(n + 1, m + 1)] − [u(n, m) + u(n, m + 1)] , [u(n, m + 1) + u(n + 1, m + 1)] − [u(n, m) + u(n + 1, m)] ). If we denote G(n, m) = R(n, m) exp(iθ(n, m)), then the direction at point (n, m) is θ(n, m) + π/2. The question is now to decide whether such a way of computing the orientation is valid or not (i.e. whether or not it gives some privilege to some directions). We will first study the case of a gaussian noise, and then the case of a uniform noise. In this section we prove that if the image u is a white noise, then there is no bias on the orientations (this means that, at each point, all orientations have an equal probability), and that, if u is a uniform noise, there is a small bias (orientations multiple of π/4 are slightly favoured). In the following, we shall use these notations: we denote by X_1, X_2, X_3 and X_4 the grey-level values at the four neighbouring pixels of a point, and by u_x and u_y the components of the gradient: G = (u_x, u_y) = (X_2 + X_4 − X_1 − X_3 , X_1 + X_2 − X_3 − X_4).
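As a concrete illustration of this 2 × 2 computation, the gradient and the level-line direction can be obtained for a whole image in a few lines. This is a sketch in Python/NumPy; the convention u[n, m] for the pixel (n, m) is a choice of the sketch, and no normalizing factor is applied since it does not affect the orientation.

import numpy as np

def gradient_and_direction(u):
    # ux(n, m) = [u(n+1, m) + u(n+1, m+1)] - [u(n, m) + u(n, m+1)]
    # uy(n, m) = [u(n, m+1) + u(n+1, m+1)] - [u(n, m) + u(n+1, m)]
    ux = (u[1:, :-1] + u[1:, 1:]) - (u[:-1, :-1] + u[:-1, 1:])
    uy = (u[:-1, 1:] + u[1:, 1:]) - (u[:-1, :-1] + u[1:, :-1])
    R = np.hypot(ux, uy)              # modulus of the gradient
    theta = np.arctan2(uy, ux)        # orientation of the gradient
    return R, theta + np.pi / 2       # level-line direction = theta + pi/2

At points where R is zero the returned direction is meaningless and should be discarded, as in the text.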

75 3.2. LOCAL COMPUTATION OF GRADIENT AND ORIENTATION 75 We then define R and θ by: u x + iu y = R exp(iθ). (3.1) Our aim is to study the behaviour of θ as a function of the four values X 1, X 2, X 3 and X 4. X1 X2 X3 X4 Figure 3.1: Local computation of the gradient: the four neighbouring pixels White noise We first show that if the image u is a white noise, then there is no bias on the orientations (i.e. all orientations have an equal probability). Proposition 15 Let X 1, X 2, X 3 and X 4 be independent identically gaussian N (0, σ 2 ) distributed random variables. Then θ is uniformly distributed on [0, 2π]. Proof : The first point is to notice that if we denote A = X 2 X 3 and B = X 1 X 4, then A and B are independent and u x = A B and u y = A + B. Thus, from (3.1): A + ib = R 2 exp(i(θ π/4)). Now, A is gaussian with mean 0 and variance 2σ 2 because it is the sum of two independent gaussian random variables (with mean 0 and variance σ 2 ). And B is also gaussian for the same reason. Since A and B are independent, the law of the couple (A, B) is given by the density function f(a, b) = 1 4πσ 2 exp ( a2 + b 2 This function does not depend on the angle θ (it only involves the radius). This shows that the distribution of θ is uniform on [0, 2π]. 4σ 2 ). Proposition 16 (Converse proposition) Let X 1, X 2, X 3 and X 4 be four independent identically distributed random variables. Assume that their common law is given by a probability density f(x)dx, where f is square integrable and even. If the law of θ is uniform on [0, 2π], then the probability density f(x)dx is gaussian.

76 76 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION Proof : As we did in the proof of Proposition 15, we denote A = X 2 X 3 and B = X 1 X 4. A and B are independent and identically distributed. They have the same density function g(x)dx given by the convolution of x f(x) with itself: g(x) = + f(x y)f(y)dy. Since f is square integrable, the function x g(x) is continuous. The law of (A, B) is given by the density g(x)g(y)dxdy. Since the law of θ is uniform on [0, 2π], this shows that g(x)g(y) only depends on x 2 + y 2. Thus, there exists a function h : R R + such that (x, y) R 2 g(x)g(y) = h(x 2 + y 2 ). Consequently, for all x R, we have g(x)g(0) = h(x 2 ) (in particular, this shows that g(0) 0 and that g(x) = g( x)) and g(x) 2 = h(2x 2 ). Thus g(x) 2 = g(x 2)g(0). We consider now the function g defined for x 0 by g(x) = ln(g( x)/g(0)). Then, g(2x) = ln(g( 2x)/g(0)) = 2 ln(g( x)/g(0)) = 2 g(x). Since g is continuous and g(0) = 0, this shows that g is linear. Consequently, there exists σ R such that x R g(x) = g(0)e x2 /2σ 2. The constant g(0) is defined by the property g = 1. Thus, the law of A (and also B) is the gaussian distribution N (0, σ 2 ). We now prove that the law of the X i is also gaussian. Since g = f f, considering the Fourier Transform, we get ĝ(t) = f(t) 2 = Ce t2 σ 2 /2. Thus, f is gaussian. Since the inverse Fourier transform of a gaussian distribution is also gaussian, it shows that f is the gaussian distribution with mean 0 and variance σ 2 /2. This result has a strong practical consequence: if we wish to have a non biased orientation map for digital image, we must process the image in such a way that its noise becomes as gaussian as possible. We shall see that it is feasible with quantization noise. The following proposition is the generalization of Proposition 15, when the gradient is computed on a larger neighbourhood. Proposition 17 (Generalization) Assume that the components (u x, u y ) of the gradient are computed on n neighbourhing pixels: X 1, X 2,...X n, i.e. n u x = λ i X i and u y = i=1 n µ i X i. i=1

77 3.2. LOCAL COMPUTATION OF GRADIENT AND ORIENTATION 77 where λ i and µ i are real numbers such that n λ i µ i = 0 and i=1 n λ 2 i = i=1 n µ 2 i. i=1 If the X i are independent identically gaussian N (0, σ 2 ) distributed, then the angle θ is uniformly distributed on [0, 2π]. Proof : Every linear combination of u x and u y is gaussian because it is also a linear combination of the X i, which are independent and gaussian distributed. Thus (u x, u y ) is a gaussian vector. Since λ i µ i = 0, this implies that the correlation between u x and u y is 0. Since (u x, u y ) is a gaussian vector, this shows (see [24] for example) that u x and u y are independent. Moreover, the property λ 2 i = µ 2 i shows that u x and u y are gaussian with same mean and same variance. Finally, as in the proof of Proposition 15, the law of the couple (u x, u y ) is given by a density function f(x, y) which depends only on the radius x 2 + y 2 and not on the angle θ. Thus, θ is uniformly distributed on [0, 2π] Computation of orientation on nonquantized images In this section, we address the effect on the orientation histogram of applying the former described computation of the gradient. We shall see that the bias introduced by the method is small. It is not always realistic to assume that the local repartition of the grey-levels of an image is gaussian. Instead, we can roughly assume that the values at neighbouring points differ by a uniform random variable. Thus it is licit, or at least very indicative, to compute the orientation map of a uniform noise, in order to have an estimate of the bias on orientation provoked by this gradient computation. Let us therefore perform the computations in the following framework : consider an image whose values at pixels are independent random variables, identically uniformly distributed on [ 1/2, 1/2]. We then get a small bias on the orientation θ. More precisely, we have the following proposition. Proposition 18 Let X 1, X 2, X 3 and X 4 be independent random variables, identically uniformly distributed on [ 1/2, 1/2]. Then the law of θ is given by the density function g defined on [0, π/4] by g(θ) = 1 12 (1 + tan2 ( π 4 θ))(2 tan(π 4 θ)), and then defined on the remainder of the interval by symmetries with respect to π/4 and 0. Proof : We use the same notations as in the proof of proposition 15, i.e. we denote A = X 2 X 3 and B = X 1 X 4. The density probability of A = X 1 X 4 is given by the

78 78 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION convolution of the characteristic function of the interval [−1/2, 1/2] with itself. The result is the triangle function h defined by h(x) = 1 + x if −1 ≤ x ≤ 0, h(x) = 1 − x if 0 ≤ x ≤ 1, and h(x) = 0 otherwise. The law of B is the same as the law of A. Moreover A and B are independent. We want to know the law of θ, where θ is defined by A + iB = (R/√2) exp(i(θ − π/4)). We denote α = θ − π/4. In order to compute the law of α, thanks to the symmetries, we first consider the case 0 ≤ α ≤ π/4. The probability of having α with precision dα is the probability that y/x = tan α with precision (1 + tan²α) dα. Thus, we have x in [0, 1] with density (1 − x) dx and y = x tan α with density (1 − x tan α) dy, where dy = x (1 + tan²α) dα (see also Figure 3.2). Thus, the law of α ∈ [0, π/4] is given by the density function f(α) dα = [ ∫_{x=0}^{1} x (1 + tan²α)(1 − x tan α)(1 − x) dx ] dα = (1/12)(1 + tan²α)(2 − tan α) dα. Figure 3.2: Relation between α, x, y = x tan α and dy. And finally, since α = θ − π/4 and by the symmetries, we obtain the announced law for θ. Proposition 18 shows that if the pixels of the image have independent uniformly distributed values, then the orientations are not uniformly distributed. The law of the orientation θ is given by Figure 3.3. It shows that the orientations multiple of π/4 are favoured. If we want to measure the bias, we can compute the relative deviation from the uniform distribution on [−π, π]. We get: bias = 2π max_θ |g(θ) − 1/(2π)| ≈ 0.047. This shows, however, that the bias is small, about 4.7%.
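This 4.7% figure is easy to reproduce numerically. The following Monte Carlo sketch (NumPy; the sample size and number of bins are arbitrary choices) draws the four pixel values, computes θ as above and measures the relative deviation of its empirical density from the uniform one:

import numpy as np

rng = np.random.default_rng(0)
X1, X2, X3, X4 = rng.uniform(-0.5, 0.5, size=(4, 10**6))
ux = X2 + X4 - X1 - X3
uy = X1 + X2 - X3 - X4
theta = np.arctan2(uy, ux)
g, _ = np.histogram(theta, bins=90, range=(-np.pi, np.pi), density=True)
print(2 * np.pi * np.abs(g - 1 / (2 * np.pi)).max())
# of the order of 0.047 (plus a small Monte Carlo fluctuation)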

79 3.2. LOCAL COMPUTATION OF GRADIENT AND ORIENTATION Figure 3.3: Law of θ on [ π, π] when the image is a uniform noise, and comparison with the uniform distribution on [ π, π] (dotted line) Bias of quantization We saw in the previous section that the way we compute the orientation at a point of the image from the gradient does not create artefacts. Now, on the contrary, we will see that the histogram of orientations in the image is very sensitive to a quantization of grey-levels. Let us first consider the simplest case: a binary image. We assume that the grey-level at each pixel is 0 (black) or 1 (white). Then, the orientation θ of the gradient only takes a finite number of possible values: the multiples of π/4. On Figure 3.4, we describe for each possible local configuration of (X 1, X 2, X 3, X 4 ) the corresponding orientation θ of the gradient. The binary case is an extreme case. Let us now consider the case of an image quantized on a finite number n of grey-levels: {0, 1, 2,..., n 1}. Again, we denote A = X 2 X 3 and B = X 1 X 4. Then A and B have discrete values in { n + 1,.., 1, 0, 1,..., n 1}. Since tan(θ π 4 ) = B A, this shows that θ only takes a finite number of values: where a, b { n + 1,..., n 1}. θ = π 4 + arctan 2 If we assume that the image is a uniform discrete noise (i.e. the X i are independent and for all k in {0,.., n 1}, P [X i = k] = 1/n), then we can ask what is the distribution law of θ. First, we compute the probability distribution of A (and B): n 1 k { n + 1,..., n 1}, P [A = k] = P [X 3 = j] P [X 2 = k + j] = n k n 2. j=0 Then, we compute the probability distribution of B/A. Notice that if B = 0 and A = 0, then the orientation θ is undefined. For each possible discrete value b/a Q (with a b = 1) of b a,

80 80 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION Figure 3.4: For each local binary configuration, the corresponding gradient orientation (a multiple of π/4, or undefined). B/A, we have P[B/A = b/a] = Σ_{λ∈Z} P[B = λb] P[A = λa]. In particular, we can compute the probability of the event θ = π/4 (it corresponds to the event B = 0 and A > 0). Notice that, thanks to the symmetries, this probability also is the probability of the events θ = −π/4, θ = 3π/4 and θ = −3π/4. P[θ = π/4] = Σ_{a=1}^{n−1} P[A = a] P[B = 0] = (n − 1)/(2n²). Moreover, we have: for all α ∉ π/4 + (π/2)Z, P[θ = α] < P[θ = π/4]. This shows that the orientations multiple of π/4 are highly favoured. On Figure 3.5, we plot the probability distribution of θ when the number of grey-levels is n = 6. Most quantized images are not binary images, but the effect of quantization on the computation of the gradient orientation is always very significant. The reason for this is that in an image, there are usually many flat regions. In these regions, the grey-levels take a small number of values, and consequently the orientation is very quantized. On Figure 3.6, we present an image, quantized on 256 grey levels. We show the histogram of the orientation of the gradient, and also the histogram of the modulus of the gradient.
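The value (n − 1)/(2n²) is easy to confirm by brute force. The following sketch enumerates all n^4 local configurations for n = 6 grey levels (the case plotted on Figure 3.5) and compares the empirical frequency of θ = π/4 with the formula above:

from fractions import Fraction
from itertools import product

n = 6
hits = 0
for x1, x2, x3, x4 in product(range(n), repeat=4):
    A, B = x2 - x3, x1 - x4
    if B == 0 and A > 0:                 # the event theta = pi/4
        hits += 1
print(Fraction(hits, n ** 4))            # 5/72
print(Fraction(n - 1, 2 * n * n))        # (n - 1)/(2 n^2) = 5/72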

81 3.3. ORIENTATION DEQUANTIZATION Figure 3.5: Probability distribution of θ on [−π/2, π/2], when the grey-levels are uniformly distributed in {0, 1, ..., 5}. 3.3 Orientation dequantization The proposed solution: Fourier translation We assume that the original image, denoted by s (before quantization), is a Shannon signal (i.e. we can reconstruct the whole signal from the samples): s(x) = Σ_{k∈Z} s(k) sin(π(x − k))/(π(x − k)). Now, we do not know the exact values of the s(k). We only have the quantized signal S. At each point, s(k) = S(k) + X_k, where X_k is the quantization noise. In the following, we assume that the X_k are independent, and uniformly distributed on [−1/2, 1/2]. This independence assumption is correct above the Nyquist distance. The proposed solution for dequantization is the following one. We replace the quantized values S(n) by the Shannon interpolates S(n + 1/2): S(n + 1/2) = Σ_{k∈Z} S(n + k) sin(π(1/2 − k))/(π(1/2 − k)) = Σ_{k∈Z} (−1)^k s(n + k)/(π(1/2 − k)) − Σ_{k∈Z} (−1)^k X_{n+k}/(π(1/2 − k)) = s(n + 1/2) − Σ_{k∈Z} (−1)^k X_{n+k}/(π(1/2 − k)). Remark : The previous formula is valid for an infinite, one-dimensional signal. For a two-

82 82 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION (a) The original building image, quantized on 256 grey-levels. (b) The histogram of the orientation of the gradient. (c) The histogram of the norm of the gradient. Figure 3.6: The effect of quantization on the histogram of orientations

83 3.3. ORIENTATION DEQUANTIZATION 83 dimensional image u, the formula becomes: u(x, y) = Σ_{k,l∈Z} u(k, l) [sin(π(x − k))/(π(x − k))] [sin(π(y − l))/(π(y − l))]. For a finite image of size N × N, we have: u(x, y) = Σ_{k,l=0}^{N−1} u(k, l) sinc_d(x − k) sinc_d(y − l), where sinc_d is the discrete version of the sinc function: sinc_d(t) = sin(((N − 1)/N) πt) / (N sin(πt/N)) if t ∉ NZ, and sinc_d(t) = (N − 1)/N if t ∈ NZ. Study of the dequantized noise By the dequantization method, we aim at replacing the structured quantization noise by a noise as white as possible. Obviously, the Shannon interpolate being an isometry in Hilbert space, we do not reduce or enlarge the variance of the noise. Thus, we can already claim that the method is at any rate harmless. We can of course reconstruct the original digital image by the inverse translation. Our aim in this subsection will be to study the dequantized noise Y defined in 1-D by Y = Σ_{k∈Z} (−1)^k/(π(1/2 − k)) X_k, where the X_k can be assumed, in a first approximation, to be independent, and are uniformly distributed on [−1/2, 1/2]. In dimension 2, the dequantized noise, denoted by Y_2, is defined by Y_2 = Σ_{k,l∈Z} [(−1)^k (−1)^l / (π(1/2 − k) π(1/2 − l))] X_{k,l}. On Figure 3.7, we show the distribution of Y in dimensions 1 and 2. On the same figure we plot the gaussian distribution which has the same mean value and variance as Y. These probability distributions seem to be very close. We shall prove it. Let us first introduce some notations. For k ∈ Z, we set c_k = (−1)^k/(π(1/2 − k)). The mean value and the variance of X_k are: E(X_k) = 0 and V(X_k) = 1/12.
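In practice, the half-sample Shannon translation is most easily performed in the Fourier domain, since a translation is a phase shift of the Fourier coefficients. The following sketch (NumPy; it uses the periodic Shannon interpolation implemented by the FFT, so its behaviour at the highest frequency may differ marginally from the discrete sinc convention written above) implements the (1/2, 1/2) translation:

import numpy as np

def shannon_translate(u, dx=0.5, dy=0.5):
    # phase shift in the Fourier domain: an isometry, hence invertible
    # (use (-dx, -dy) to recover the original samples)
    N, M = u.shape
    fx = np.fft.fftfreq(N)[:, None]      # frequencies in cycles per pixel
    fy = np.fft.fftfreq(M)[None, :]
    phase = np.exp(-2j * np.pi * (dx * fx + dy * fy))
    return np.real(np.fft.ifft2(np.fft.fft2(u) * phase))

Applying shannon_translate to a quantized image and recomputing the orientation map of Section 3.2 is all that is needed to observe the dequantization effect studied below.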

84 84 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION (a) distribution of Y in dimension (b) distribution of Y in dimension 2. Figure 3.7: Distribution of Y in dimension 1 and 2 and comparison with the gaussian distribution with mean value 0 and variance 1/12 (dotted curve).

85 3.3. ORIENTATION DEQUANTIZATION 85 Since c 2 k is convergent, this shows that the random variable series c k X k is convergent in L 2. Moreover, we get E(Y ) = 0 and V (Y ) = 1 12 c 2 k = The variance of Y also is 1/12 because the Fourier 1/2-translation is an isometry of L 2. Let us denote by g the gaussian distribution with expectation 0 and variance 1/12: k Z x R g(x) = 1 σ /2σ2 e x2 2π where σ 2 = Let f be the density probability of Y : f is the convolution of the density probabilities of the c k X k. Kurtosis Comparison One way to compare the distribution of Y to the gaussian distribution is to compare their fourth order moment (notice that they have same mean value 0, same variance 1/12 and same third order moment 0). More precisely, we will compare their normalized fourth order moment (called the kurtosis, see [57]). Definition 7 (Kurtosis) The kurtosis κ of a random variable X with average x is defined by κ = E((X x)4 ) E((X x) 2 ) 2. Proposition 19 For any gaussian distribution, we have κ = 3, i.e. the kurtosis does not depend on the average and variance. We will now compute the kurtosis of the distribution of Y, in dimension 1 and 2. This is a very useful way to check wether a distribution is gaussian like. Proposition 20 Let κ 1 be the kurtosis of Y 1 defined by Y 1 = k c kx k, where the X k are independent identically uniformly distributed on [ 1/2, 1/2], then κ 1 = Let κ 2 be the kurtosis of Y 2 defined by Y 2 = k,l c kc l X k,l, where the X k,l are independent identically uniformly distributed on [ 1/2, 1/2], then κ 2 =

86 86 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION More generally, let Y n be the dequantized noise in dimension n: Y n = And let κ n denote the kurtosis of Y n, then Proof : i 1,i 2,...,i n Z n c i1 c i2...c in X i1,i 2,...,i n. κ n = n 1. Let f be the probability density of Y 1. One way to compute the fourth order moment of Y 1 is to compute the Fourier transform F of f and then to compute the fourth order derivative of F at 0. We first compute F : F (t) = E(e ity 1 ) = k Z E(e itc kx k ) = k Z Φ(c k t), where Φ is the Fourier transform of the uniform distribution on [ 1/2, 1/2], i.e. Φ(t) = 1/2 1/2 e itx dx = 2 t sin ( t 2 ). Consequently, we get F (t) = k Z ( ) 2 c k t sin ck t = ( ) ck t sin c. 2 2 k Z For x close to 0, we have Thus, for t close to 0, we get log F (t) = k Z = k Z log 1 6 sin c (x) = 1 x2 3! + x4 5! + O(x6 ). ( 1 1 ( ck ) 2 t ( ck ) 4 t 4 + O(c 6 k )) t6 ( ck 2 ) 2 t ( ck ) 4 t 4 1 ( ck ) 4 t O(c6 k t6 ) = t2 6 S 2 + t4 120 S 4 t4 72 S 4 + O(t 6 ), where S 2 = k Z (c k/2) 2 and S 4 = k Z (c k/2) 4. Finally, since for x close to 0, exp(x) = 1 + x + x 2 /2 + O(x 3 ), we get: F (t) = 1 t2 6 S The fourth order moment of Y 1 is then, E(Y 4 1 ) = F (4) (0) = 24 t 4 36 S2 2 t4 180 S 4 + O(t 6 ). ( 1 72 S2 2 1 ) 180 S 4.

87 3.3. ORIENTATION DEQUANTIZATION 87 On the other hand, we can compute S 2 and S 4 using Bernoulli numbers and the zeta function (see [20] for example), and get: Finally, we obtain S 2 = 2 π 2 k 0 S 4 = 2 π 4 k 0 1 (2k + 1) 2 = 1 4, 1 (2k + 1) 4 = E(Y1 4 ) = and κ 1 = = In the same way, we can compute the fourth order moment of Y 2 = k,l c kc l X k,l : E(Y2 4 ) = 4! 1 72 k,l Z Notice that the variance of Y 2 is also 1/12: Finally, the kurtosis κ 2 of Y 2 is ( ck c l 2 2 ) k,l Z ( 16 E(Y2 4 ) = S ) 180 S2 4 = V (Y 2 ) = 1 12 k,l Z c 2 k c2 l = S2 2 = κ 2 = = ( ck c ) l 4. 2 More generally, we can compute the fourth order moment of dequantized noise Y n in dimension n: E(Yn 4 ) = 4! 1 72 i 1,..,i n Z n ( ci1...c in 2 2 ) i 1,..,i n Z n ( ci1...c ) in 4. 2 We also notice that the variance of Y n is always 1/12. Thus, ( 2 κ n = n 4 ) (Sn 2 ) 2 24n Sn 4 = n 1. Estimating L 1 distance to gaussian distribution On Figure 3.7, we remark that the probability densities f and g seem to be very close on the average. In this subsection, we will estimate the L 1 distance between f and g: f g L 1= f(x) g(x) dx. R

88 88 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION We have Y = k Z c kx k, and the c k are symmetric around 1/2, i.e. c 1 k = c k. Thus, we get Y = k 1 c k(x k + X 1 k ). Let us denote n Z n = c k (X k + X 1 k ), k=1 and let f n be the probability density of Z n. We then have f n = f n 1 h n, where h n is the probability density of c n (X n + X 1 n ). Proposition 21 Let f be the probability density of Y = k Z c kx k, and let g be the gaussian distribution with mean 0 and variance 1/12, then n 1, f g L 1 f n g L g L 1 c 2 k. k n+1 Proof : For n 2, we have f n = f n 1 h n, where h n is the probability density of c n (X n + X 1 n ). We notice that h n is a positive even function, with compact support [ c n, c n ] and satisfying h n = 1. Thus, f n g L 1 f n 1 h n g h n L 1 + g h n g L 1 f n 1 g L 1 + g h n g L 1. We now compute g h n g L 1. For x R, using the definition of g h n and the integral Taylor formula, we get, yh n (y) being odd, g h n (x) g(x) = g(x y)h n (y)dy g(x) = = = R cn c n cn c n cn c n [g(x + y) g(x)]h n (y)dy 1 ) (yg (x) + y 2 (1 t)g (x + ty)dt h n (y)dy 0 ( 1 ) y 2 (1 t)g (x + ty)dt h n (y)dy. Then, we can estimate the L 1 distance: cn 1 g h n g L 1 y 2 (1 t) g (x + ty) h n (y)dtdydx R = g L 1 = 1 2 g L 1 0 c n 0 cn 1 c n 0 cn c n = 1 12 g L 1 c 2 n y 2 (1 t)h n (y)dtdy y 2 h n (y)dy = 1 2 g L 1 V (c n (X n + X 1 n ))

89 3.3. ORIENTATION DEQUANTIZATION 89 We add these inequalities and obtain n 1, f g L 1 f n g L g L 1 k n+1 In order to have a numerical estimate of f g, we can use a computational software to compute numerically the first terms f 1, f 2,...f 10,... Thus, we can compute an upper-bound for the L 1 distance between the density function of Y and the gaussian distribution. We can also do this for the random variable Y 2 (dequantized noise in dimension 2) defined by Y 2 = c k c l X k,l. k,l Z Proposition 22 (Estimate L 1 distance) Let f be the probability density of Y = k c kx k, and let f be the probability density of Y 2 = k,l c kc l X k,l. Let g be the gaussian distribution with mean 0 and variance 1/12, then Proof : f g L and f g L We first compute g L 1. For x R, we have g (x) = (x 2 /σ 4 1/σ 2 )g(x), where σ 2 = 1/12 is the variance of g. Thus, using an integration by parts and the properties R g = 1 and R x2 g(x) = σ 2, we get g L 1 = 1 σ 4 = 2 σ 4 x σ x σ (x 2 σ 2 )g(x)dx + 1 σ 4 x 2 g(x)dx 2 σ 2 = 4 σ g(σ) = 4 σ 2 2π e 1/2 1 σ 2 = 12 Then, we compute the rest of the sum c 2 k : k N+1 Thus, we obtain c 2 k = 1 π 2 k N+1 1 (k 0.5) 2 1 π g L 1 k N+1 x σ + N+1/2 x σ g(x)dx c 2 k 1 π 2 N. c 2 k. (σ 2 x 2 )g(x)dx 1 (x 0.5) 2 dx = 1 π 2 N. In order to have an estimate of the L 1 distance with an error less than 1%, we take N = 25. We then have to numerically compute f N g L 1. Let [ A N, A N ] be the support of the density function f N, we have A N = N c k = k=1 N k=1 1 π(k 0.5).

90 90 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION Then, AN f N g g(x)dx + f N (x) g(x) dx. x >A N A N On the other hand, when we numerically compute the integral of a lipschitz function F using a Riemann sum with discretization δ, we have an error estimated by: b F (x)dx δf (kδ) (2 max F + (b a)l)δ, a k/a kδ b where L is the lipschitz constant of F. For the gaussian function g, we have 12 max g = g(0) = and max g = g (σ) = 12 e 1/2. 2π 2π And for the density function f n we have max f n max f 1 = 1 c 1 = π 2 and max f n max f 1 = 1 c 2 1 = π2 4. Since at each step, we compute f n+1 as the convolution of f n with the uniform distribution on [ c n /2, c n /2], we have an error less than 2δ (2 π2 + c n π2 4 Finally, if we denote f N g num,δ the numerical estimate of f N g L 1 when the discretization length is δ, we get ( π 2 12 f N g L 1 f N g num,δ +δ 2Nπ + A N A N e )+ 1/2 g(x)dx. 2π 2π x >A N Thus, when we compute this for N = 25 and δ = 10 5, we get f N g num,δ with an error less than 1%, this shows that f g L ). In the same way we can compute f g L Posterior independence In our study of the dequantized noise, we made the assumption that the X k are independent (and uniformly distributed on [ 1/2, 1/2]). We address here the problem of the posterior independence of the dequantized noise, i.e. if we consider the dequantized noise Y (n) and Y (m) at two different points n and m: Y (n) = c k X n+k and Y (m) = c k X m+k, k Z k Z we are interested in the correlation of Y (n) and Y (m). The result, which shows that we do not increase the correlation, is given by the following proposition.

91 3.3. ORIENTATION DEQUANTIZATION 91 Proposition 23 Let the X k, for k Z, be random variables uniformly distributed on [ 1/2, 1/2]. Assume that for all δ Z, the correlation between X k and X k+δ is the same for all k Z, i.e. δ Z, C δ s.t. k Z, Cor(X k, X k+δ ) = E(X kx k+δ ) E(X k )E(X k+δ ) σ(x k )σ(x k+δ ) = C δ. Then, the correlation between Y (n) and Y (m) is Cor(Y (n), Y (m)) = C m n. In particular, this shows that if the X k are independent, then the correlation between Y (n) and Y (m) is 0 when n m. Proof : Since, for all k, we have E(X k ) = 0, this implies that E(Y (n)) = E(Y (m)) = 0. We also have, for all k Z, σ 2 (X k ) = 1/12 = σ 2 (Y (n)) = σ 2 (Y (m)). Thus, if we compute the posterior correlation, we get: For δ Z, we have k Z c k c k+δ = ( 1)δ π 2 Cor(Y (n), Y (m)) = 12[E(Y (n)y (m)) E(Y (n))e(y (m))] = 12 c k c l E(X k+n X l+m ) k Z l Z = 12 c k c k+δ E(X k+n X k+δ+m ) k Z δ Z = c k c k+δ C δ+m n δ Z k Z k Z 1 (1/2 k)(1/2 k δ) = ( 1)δ δπ 2 On the other hand, for δ = 0, we already saw that ( k Z c 2 k = 4 1 π 2 (1 2k) 2 = 1. k Z k Z 1 (1/2 k δ) 1 (1/2 k) ) = 0. Notice that these properties of the c k are explained by the fact that the 1/2 Fourier translation is an isometry of L 2. Finally, we obtain the announced result: Cor(Y (n), Y (m)) = k Z c 2 k C m n = C m n.
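Before turning to the flat regions model, the variance 1/12, the kurtosis of Proposition 20 and the absence of posterior correlation can all be checked numerically. This is a rough sketch (NumPy); the only approximation is the truncation of the series to |k| ≤ 50, and the printed kurtosis, about 2.6, is indeed below the gaussian value 3 but close to it:

import numpy as np

rng = np.random.default_rng(0)
k = np.arange(-50, 51)
c = (-1.0) ** k / (np.pi * (0.5 - k))        # coefficients c_k
X = rng.uniform(-0.5, 0.5, size=10**6)       # model of the quantization noise
Y = np.convolve(X, c, mode="valid")          # Y(n) = sum_k c_k X_{n+k}, truncated
print(Y.var())                               # close to 1/12
print(np.mean(Y**4) / Y.var() ** 2)          # kurtosis of Y, about 2.6
print(np.corrcoef(Y[:-1], Y[1:])[0, 1])      # close to 0: no posterior correlation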

92 92 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION The flat regions model: final explanation of the dequantization effect Let us summarize. We had written the dequantized signal as S(n ) = s(n ) k Z X n+k ( 1) k π(1/2 k), (3.2) where s(n + 1/2) is the original signal computed by Shannon interpolate at point n + 1/2, S(n + 1/2) is the dequantized signal at the same point and Y (n) = k Z X n+k ( 1) k π(1/2 k) (3.3) is the so-called dequantized noise. We have proven that Y (n) is nearby gaussian. Since in Proposition 15 we also prove that the addition of a gaussian noise to the signal does not create any bias in the orientation map, we might be contented with this result. Now, we claim that the above explanation does not give an account of the change in the orientation histogram obtained by dequantization. Indeed, we prove in Proposition 18 that the addition to the signal of a uniformly distributed noise on an interval does not create a bias on the orientation larger than 4.7%. Thus something must be inaccurate in our assumptions. Actually, we notice that when the gradient of s is small, then the quantized values of s(n) around 0 (i.e. n = ±1, n = ±2,...) are a very discrete signal. In other terms, assuming for simplicity that s(0) = 0, we have that S(k) = s(k) X k {0, 1, 1, 2, 2,..} where the first integer values are very majoritary. This means that s(k) and X k are highly correlated when the gradient is small. Thus, our model (3.2) explaining the good behaviour of S(n + 1/2) S(n ) = original signal + gaussian noise = s(n ) + Y (n) will make sense only if we can point out that the dequantization process implies: Y (n) and s(n + 1/2) decorrelated. Now, using the same proof as in Proposition 23, we can show that this is not true. In fact, more precisely, by this result we see that Cor(Y (n), s(n )) = Cor(s(n), X n), under the sound assumption of stationarity. Thus, we gain or lose no independence of the signal and the noise obtained by dequantization. The final explanation will however come out of the technique developed above. We first point out (see Figures 3.9.c and 3.9.d) that all the bias in orientation histogram, in an image u, is due to low values of the quantized gradient, namely u 4, i.e. u 2 = 1, 2, 4, 5, 8, 9, 10, 13, 16. The reason for this is the following: at a point (x, y) of an image where the gradient is large, the orientation is not much affected by the quantization. In fact, the angle error between the true orientation at the point and the orientation computed after the quantization of the image is proportional to 1/ u. Let us show this. We denote by u the original image, then u = u x + iu y = u e iθ,

93 3.3. ORIENTATION DEQUANTIZATION 93 where θ is the direction of the gradient. Let ũ denote the quantized image, and let q denote the quantization step. If θ denotes the direction of the gradient of the quantized image, we obtain ũ = ũ e i θ = u e iθ + z, where z is a complex number (it represents the gradient of the difference between the true image and the quantized image) with modulus smaller than q. Thus, we get (see also Figure 3.8): sin() q u. u s Figure 3.8: the effect of quantization on the gradient orientation. The points with small gradient are in the majority (about 60%, see Figure 3.9.b) in the gradient norm histogram. Thus, we must focus on the points where 1 u 4. We notice that in a neighbourhood of such a point n, the histogram of values of S(k) are a discrete uniform process centered a S(n). Taking, without lost of generality, S(n) = 0, we can model the values around these points as discrete independent random values. See Figures 3.9.e and 3.9.f for the histogram and correlations. In flat regions, the gradient is quantized on a small number of values and we will see that the proposed Fourier translation has a strong dequantization effect. Let S denote the quantized signal. At a point n, we replace the quantized value S(n), by the Shannon interpolate S(n + 1/2), and then, we compute the gradient by S(n + 1/2) S(n 1/2) = ( 1) k + k) S(n + k 1)] π(1/2 k) k Z[S(n. In flat regions, we can assume that the difference S(n) S(n 1) takes a small number of discrete values. For example, if we assume that it only takes the values 0, 1 or 1, then the following proposition shows in particular that S(n + 1/2) S(n 1/2) is no more quantized. Proposition 24 Let Z be the random variable defined by Z = k Z ( 1) k π(1/2 k) Q k,

94 94 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION u s percentage of pixels u s norm of the gradient (a) The original image (size ). (b) The histogram of the norm of the gradient number of pixels u s orientation of the gradient number of pixels u s orientation of the gradient (c) The histogram of the gradient orientation for all points of the image. (d) The histogram of the gradient orientation for points with gradient norm larger than u s correlation coefficient u s distance (e) Local histogram (window of size 3 3) of u(x) u(x 0) for points x 0 such that u(x 0) < 3. (f) Correlation coefficient of u(x 0 + d) u(x 0) and u(x 0 d) u(x 0) as a function of the distance d, for points x 0 such that u(x 0) < 3. Figure 3.9: Empirical observations on histograms and correlation.

95 3.3. ORIENTATION DEQUANTIZATION 95 where the Q k are independent discrete random variables, taking the values 0, 1 or 1, each one with probability 1/3. Then Z follows the same probability distribution as T 3 = k 2[3] ( 1) k π(1/2 k) 3X k, where the X k are independent, uniformly distributed on [ 1/2, 1/2]. Thus, a < b, P [a Z b] = P [a T 3 b]. Remark : In fact, T 3 is nearby gaussian. In particular, we have a nearby perfect dequantization since the previous proposition implies that Proof : a < b, P [a Z b] > 0. As in the previous section, we denote c k = ( 1) k /π(1/2 k). Let δ(x) denote the dirac function. Then, the probability distribution of the Q k is d(x) = 1/3 (δ(x 1) + δ(x) + δ(x + 1)). The main point is to notice that the result of the convolution of d(x) with the uniform distribution on [ 1/2, 1/2], is the uniform distribution on [ 3/2, 3/2]. This means that Q k +X k has the same probability distribution as 3X k. And consequently, k c k(q k +X k ) has the same probability distribution as k 3X k. We now consider the Fourier transform of the previous distributions (it will convert the convolution into a product). Let F 1 (t) denote the Fourier transform of k c kx k. We already saw in the previous section that F 1 (t) = ( ) ck t sin c. 2 k Z We denote by F 3 (t) the Fourier transform of k 3c kx k. We then have F 3 (t) = F 1 (3t). On the other hand, the Fourier transform of d(x) is (1 + 2 cos t)/3. Thus, if we denote by G(t) the Fourier transform of the law of k c kq k, we have G(t) = ( ) cos(ck t), 3 k Z where in both cases the convergence of products is uniform on every compact subset of R. Since k c k(q k + X k ) has the same probability distribution as k 3X k, this shows that t R, G(t)F 1 (t) = F 3 (t). We now show that there exists a continuous function H 1 such that for all t R, F 3 (t) = F 1 (t)h 1 (t). In fact, we have F 3 (t) = ( ) 3t sin c = π(2k 1) k Z k 2[3] ( ) 3t sin c π(2k 1) k 3Z+2 ( ) 3t sin c. π(2k 1)

96 96 CHAPTER 3. DEQUANTIZING IMAGE ORIENTATION For k 3Z + 2, we can write k = 3k 1, and thus 3/(2k 1) = 1/(2k 1). This shows that k 3Z+2 ( ) 3t sin c = F 1 (t), π(2k 1) and consequently, we have F 3 (t) = F 1 (t)h 1 (t), where H 1 (t) = k 2[3] ( ) 3t sin c, π(2k 1) and H 1 is continuous on R as the uniform limit on compact subsets of continuous functions. Thus, we have for all t R, G(t)F 1 (t) = H 1 (t)f 1 (t). Since the zeros of F 1 are discrete, and G and H 1 are continuous, this shows that for all t R, G(t) = H 1 (t) = ( ) 3t sin c. π(2k 1) k 2[3] And thus, Z = k c kq k has the same probability distribution as k 2[3] 3c kx k. Moreover, thanks to Levy Theorem, we have a convergence in law of the partial sums k n c kq k to T Experiments and application to the detection of alignments In this section, we present some applications of the proposed solution for dequantization. The first application is the detection of meaningful alignments in an image. Generally, we compute meaningful alignments with the precision p = 1/16. But, sometimes, we are interested in alignments at a better precision, say for example p = 1/64. In Figure 3.10, we first present the original image (a): this is a result of the scan of an Uccello s painting: Presentazione della Vergine al tempio (from the book L opera completa di Paolo Uccello, Classici dell arte, Rizzoli). This image is quantized on 32 grey-levels. We first compute (Figure(b)) the meaningful alignments at precision p = 1/16. Then, we compute (Figure(c)) the meaningful alignments at precision p = 1/64: it shows many diagonal alignments. These alignments are artefacts, their explanation is the quantization effect on the computation of orientations: directions multiple of π/4 are highly favoured. On Figure (d), we show the detection of meaningful alignments at precision p = 1/64, after the proposed solution for dequantization: (1/2, 1/2) Fourier translation. The result shows that artefactual diagonal alignments are no more detected. On Figures 3.11, 3.12 and 3.13, we show the results of the (1/2, 1/2) translation on images quantized on 20 grey-levels : for each image, we compute the histogram of orientations and the maximal meaningful alignments for precision p = 1/64, before and after translation.
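Putting together the two sketches given earlier in this chapter (the 2 × 2 orientation of Section 3.2 and the Fourier half-pixel translation of Section 3.3), these experiments can be reproduced qualitatively in a few lines. This is only a sketch: image stands for any grey-level array supplied by the reader, and 20 grey levels are used as in the figures.

import numpy as np

def orientation_histogram(u, bins=64):
    # histogram of the 2x2 gradient orientation of Section 3.2
    ux = (u[1:, :-1] + u[1:, 1:]) - (u[:-1, :-1] + u[:-1, 1:])
    uy = (u[:-1, 1:] + u[1:, 1:]) - (u[:-1, :-1] + u[1:, :-1])
    keep = np.hypot(ux, uy) > 0
    return np.histogram(np.arctan2(uy[keep], ux[keep]),
                        bins=bins, range=(-np.pi, np.pi))[0]

def half_pixel_translation(u):
    # Shannon (1/2, 1/2) translation as a Fourier phase shift (Section 3.3)
    fx = np.fft.fftfreq(u.shape[0])[:, None]
    fy = np.fft.fftfreq(u.shape[1])[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.exp(-1j * np.pi * (fx + fy))))

# image = ...                              # any grey-level image (float array)
# q = np.floor(20 * (image - image.min()) / (image.max() - image.min() + 1e-9))
# h_before = orientation_histogram(q)      # artefactual peaks at multiples of pi/4
# h_after = orientation_histogram(half_pixel_translation(q))   # peaks removed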

We also noticed (but we have, for the time being, no theoretical argument to justify it) that the same method yields a significant improvement in the orientation map of aliased images: see Figure 3.14. This is particularly true for aliasing due to direct undersampling, a barbarous but usual zoom-in method in much image processing software.

Figure 3.10: The effect of quantization on the detection of alignments. (a) The original painting image, quantized on 32 grey levels. (b) The maximal meaningful alignments for precision p = 1/16. (c) The maximal meaningful alignments for precision p = 1/64. (d) The maximal meaningful alignments for precision p = 1/64, after the (1/2, 1/2)-translation.

Figure 3.11: the La Cornouaille boat. First line: left, the original image; right, the image quantized on 20 grey levels. Second line: left, the maximal ε-meaningful alignments of the quantized image for precision p = 1/64 and ε = 1; right, the maximal ε-meaningful alignments after the (1/2, 1/2)-translation, for precision p = 1/64 and ε = 1. Third line: left, the histogram of the orientations (in degrees) in the quantized image; right, the histogram of the orientations after the (1/2, 1/2)-translation.

Figure 3.12: the church of Valbonne (from the INRIA-Robotvis database). First line: left, the original image; right, the image quantized on 20 grey levels. Second line: left, the maximal ε-meaningful alignments of the quantized image for precision p = 1/64 and ε = 1; right, the maximal ε-meaningful alignments after the (1/2, 1/2)-translation, for precision p = 1/64 and ε = 1. Third line: left, the histogram of the orientations (in degrees) in the quantized image; right, the histogram of the orientations after the (1/2, 1/2)-translation.

Figure 3.13: the famous Lena image. First line: left, the original image; right, the image quantized on 20 grey levels. Second line: left, the maximal ε-meaningful alignments of the quantized image for precision p = 1/64 and ε = 1; right, the maximal ε-meaningful alignments after the (1/2, 1/2)-translation, for precision p = 1/64 and ε = 1. Third line: left, the histogram of the orientations (in degrees) in the quantized image; right, the histogram of the orientations after the (1/2, 1/2)-translation.

Figure 3.14: The effect of the Fourier translation on an aliased image. (a) The aliased image. (b) The image after the (1/2, 1/2) Fourier translation. (c) The histogram of the gradient orientation in the aliased image. (d) The histogram of the gradient orientation after translation. (e) The meaningful alignments of the aliased image. (f) The meaningful alignments after translation.

Chapter 4

Modes of a histogram

When we observe a histogram (for example the histogram of grey-levels of an image), we usually observe peaks. But peaks are not well defined: their width and height can vary a lot. We will try here to define the notion of meaningful peaks, using the same methodology as for meaningful alignments.

In histogram analysis, we can distinguish several classes of algorithms computing modes. First of all, a parametric model may be at hand, stating e.g. that the histogram is an instance of k Gaussian random variables whose averages and variances have to be estimated from the histogram ([22], [77], [65]). Clearly, optimization algorithms can be defined for this problem and, if k is unknown, it may be found by using variants of the Minimal Description Length principle. The theory of model selection also addresses the problem of choosing a histogram estimator from a sample distribution, by minimizing a given risk (see for example [10] and [9]). For instance, given a histogram (i.e. an empirical distribution) and a class of models with various complexities (for example, the set of piecewise constant functions, where the number of pieces measures the complexity), a selection criterion to be minimized is defined using, for example, the Kullback-Leibler distance and a penalty term (typically, the number of pieces).

Many theories intend to threshold a histogram in an optimal way, that is, to divide the histogram into two modes according to some criterion. The most popular criterion is entropy (see [64], [1], [41], [11]): the authors try to find a threshold value m such that some entropy term of the bimodal histogram is maximal; a generalization leads to finding multiple thresholds by entropy criteria. This thresholding problem turns out to be very useful and relevant in image analysis, since it leads to the problem of optimal quantization of the grey levels. Here again, we can repeat the same criticism as for segmentation algorithms: the thresholds found are not proved to be relevant, nor to separate meaningful modes of the histogram. For instance, if the histogram is constant, the optimal threshold given by the mentioned methods is the median value. Now, a constant histogram is not bimodal.

As in the alignment detection theory, we shall adopt the Helmholtz principle (we give up any

a priori knowledge about the histogram model). Thus, we compute as though all samples were uniformly and independently distributed. Meaningful modes will be defined as counterexamples to this uniformity assumption, and we define the actual modes as the maximal meaningful modes. We shall give a theorem proving that, in the large deviation framework, maximal meaningful intervals of the histogram are disjoint. We shall immediately apply the resulting algorithm to image analysis. Our goal is to show the ability of the detection theory to give an account of the so-called visual pyramid, according to which geometric events (gestalts) are grouped recursively at different scales of complexity. This hypothesis of Gestalt theory [53] can be valid only if the detection of geometric events is robust enough to allow one to compute modes of properties of these events. We shall first compute all maximal meaningful alignments in several images, and then group them according to the mode of length or orientation they belong to.

4.1 Discrete histograms

We first consider a discrete histogram, that is, a finite number M of points and a finite number L of values. This is for example the case of the grey-level histogram of a discrete image. We assume that the set of possible values is {1, ..., L}. For each discrete interval of values [a, b], let k(a, b) be the number of points (among the M) with value in [a, b], and let p(a, b) = (b − a + 1)/L. It represents the prior probability for a point to have value in [a, b]. We are now interested in peaks, or modes, of the histogram, that is, intervals [a, b] which contain significantly more points than expected.

4.1.1 Meaningful intervals

The total number of possible intervals is L(L + 1)/2. We fix an interval [a, b]; the probability that it contains at least k(a, b) points among the M is B(M, k(a, b), p(a, b)), where
$$B(n,k,p) = \sum_{j=k}^{n} \binom{n}{j} p^j (1-p)^{n-j}$$
denotes the tail of the binomial distribution of parameters n and p. If we adopt the same definition as for alignments, the number of false alarms of the interval [a, b] is then
$$NF([a,b]) = \frac{L(L+1)}{2}\, B(M, k(a,b), p(a,b)).$$
Thus, an interval [a, b] is ε-meaningful if NF([a, b]) ≤ ε, that is,
$$B(M, k(a,b), p(a,b)) \le \frac{2\varepsilon}{L(L+1)}.$$
Notice that in the above definition, compared to the definition of meaningful alignments, we use the binomial distribution in different ways:
- For histograms: B(M, k(a, b), (b − a + 1)/L).
- For alignments: B(l, k, p).

In the first case M is fixed, and the other arguments depend on the considered interval, including the probability p(a, b) = (b − a + 1)/L. In the second case, the precision p is fixed and the length l of the segment is a variable first argument of B. Thus, our variables are used in quite different places of B. Now, as we shall see, meaningfulness and maximal meaningfulness will receive a quite analogous treatment.

In the following, for an interval [a, b] which contains k(a, b) points among the M, we will denote
$$r(a,b) = \frac{k(a,b)}{M}.$$
It represents the posterior probability for a point to have value in [a, b].

Proposition 25 Let [a, b] be a meaningful interval; then
$$r(a,b) = \frac{k(a,b)}{M} > p(a,b),$$
and by Hoeffding's inequality we have
$$B(M, k(a,b), p(a,b)) \le e^{-M\left[r(a,b)\log\frac{r(a,b)}{p(a,b)} + (1-r(a,b))\log\frac{1-r(a,b)}{1-p(a,b)}\right]}.$$

Proof : This is a direct application of Lemma 4 and Hoeffding's inequality (Proposition 3). Notice that Lemma 4 provides some inequalities for the binomial distribution when p ≤ 1/2. In order to have inequalities for p > 1/2, we use the property B(l, k, p) = 1 − B(l, l − k + 1, 1 − p), and also the fact that, for a meaningful interval,
$$B\left(M, k(a,b), \frac{b-a+1}{L}\right) < \frac{2}{L(L+1)} < \min\left(p(a,b), \frac{1}{2}\right).$$

We will be interested in experiments on the histogram of grey-levels of a discrete N × N image. We will consider an image of size N = 256 with grey-level values in {0, 1, ..., 255}. We fix M = N² = 65536 and L = 256, and we first give a table of detection thresholds. For each length l such that 1 ≤ l ≤ L, we compute the minimal number k(l) of points (among the N² = M) that an interval of length l has to contain in order to become 1-meaningful. This means that k(l) is defined as the smallest integer such that
$$B\left(M, k(l), \frac{l}{L}\right) < \frac{2}{L(L+1)}.$$
We also compute the detection thresholds k_d(l) given by the large deviations estimate of the binomial tail. This means that k_d(l) is defined as the smallest integer above Ml/L such that
$$\frac{k_d(l)}{M}\log\frac{k_d(l)\,L}{M\,l} + \left(1-\frac{k_d(l)}{M}\right)\log\frac{1-k_d(l)/M}{1-l/L} > \frac{1}{M}\log\frac{L(L+1)}{2}.$$
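The two thresholds can be computed directly from their definitions. The following Python sketch is only illustrative (the function names are ours, and scipy is assumed for the exact binomial tail); it is a brute-force transcription of the two definitions above.

```python
import math
from scipy.stats import binom

def exact_threshold(M, L, l, eps=1.0):
    """k(l): smallest k such that B(M, k, l/L) < 2*eps / (L*(L+1))."""
    p = l / L
    bound = 2 * eps / (L * (L + 1))
    for k in range(int(M * p), M + 1):
        if binom.sf(k - 1, M, p) < bound:   # sf(k-1) = P[S_M >= k] = B(M, k, p)
            return k
    return M + 1

def large_deviation_threshold(M, L, l, eps=1.0):
    """k_d(l): smallest integer above M*l/L satisfying the relative entropy bound."""
    p = l / L
    bound = math.log(L * (L + 1) / (2 * eps)) / M
    for k in range(int(M * p) + 1, M + 1):
        r = k / M
        h = r * math.log(r / p)
        if r < 1.0:
            h += (1 - r) * math.log((1 - r) / (1 - p))
        if h > bound:
            return k
    return M + 1
```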

Thanks to Hoeffding's inequality, we have k_d(l) ≥ k(l) > Ml/L. On Figure 4.1, we first plot the graph of the function l ↦ k(l) for l in [1, 256]. Then, we plot the graph of the relative error l ↦ (k_d(l) − k(l))/k(l). Notice that the maximal value of the relative error is about 3%, attained for small values of l. Finally, on the third subfigure, we give k(l), k_d(l) (dotted curve) and Ml/L (dashed line) for l in [1, 10]. These experiments justify the adoption of the large deviation estimate in order to define meaningful and maximal meaningful intervals¹.

Definition 8 (relative entropy) We define the relative entropy of an interval [a, b] (with respect to the prior distribution) by
$$H([a,b]) = \begin{cases} 0 & \text{if } r(a,b) \le p(a,b),\\[4pt] r(a,b)\log\dfrac{r(a,b)}{p(a,b)} + (1-r(a,b))\log\dfrac{1-r(a,b)}{1-p(a,b)} & \text{otherwise.}\end{cases}$$

In the case r(a, b) > p(a, b), the relative entropy H([a, b]) is also called the Kullback-Leibler distance between the two Bernoulli distributions of respective parameters r(a, b) and p(a, b) (see [13]).

Remark : This definition is related to coding and Information Theory (see also [13]). Let us explain in which sense. We consider the histogram of a set of M points distributed on a length-L interval (called the reference interval). We fix an interval I of length l ≤ L. Let k be the number of points, among the M, that it contains. We want to encode a binarization of the histogram defined in the following way: for each point we only keep the information of whether it belongs to the fixed interval I or not. Since the prior probability for a point to be in I is l/L, the prior expected bit-length needed to encode the histogram is
$$-k\log_2\frac{l}{L} - (M-k)\log_2\left(1-\frac{l}{L}\right).$$
On the other hand, the posterior probability for a point to be in I is k/M. Thus, the posterior expected bit-length needed to encode the histogram is
$$-k\log_2\frac{k}{M} - (M-k)\log_2\left(1-\frac{k}{M}\right).$$
This shows that the code gain is
$$-k\log_2\frac{l}{L} - (M-k)\log_2\left(1-\frac{l}{L}\right) + k\log_2\frac{k}{M} + (M-k)\log_2\left(1-\frac{k}{M}\right) = M\left(r\log_2\frac{r}{p} + (1-r)\log_2\frac{1-r}{1-p}\right),$$
where r = k/M and p = l/L. Thus, our measure of meaningfulness of an interval is directly related to the gain between the prior and the posterior coding of the interval. The higher the gain, the more meaningful the interval.

¹ In practice, B(M, k, p) is no longer exactly computable when M becomes too large.
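The relative entropy of Definition 8 is the quantity manipulated throughout the rest of the chapter. As a reference, here is a direct Python transcription (ours, not the thesis code); multiplying its value by M/log 2 gives the code gain in bits discussed in the remark above.

```python
import math

def relative_entropy(r, p):
    """H([a, b]) as in Definition 8, with r = k(a,b)/M and p = (b-a+1)/L."""
    if r <= p:
        return 0.0
    h = r * math.log(r / p)
    if r < 1.0:                       # the second term vanishes when r = 1
        h += (1 - r) * math.log((1 - r) / (1 - p))
    return h
```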

Figure 4.1: The detection thresholds k(l) and k_d(l). (a) Exact detection thresholds k(l), computed with the binomial law. (b) Relative error (k_d(l) − k(l))/k(l); the maximal value is 3.1%, attained for small values of l. (c) Detection thresholds k(l), k_d(l) (dotted curve) and Ml/L (dashed line) for 1 ≤ l ≤ 10.

Definition 9 (meaningful interval) We say that an interval [a, b] is ε-meaningful if its relative entropy H([a, b]) is such that
$$H([a,b]) > \frac{1}{M}\log\frac{L(L+1)}{2\varepsilon}.$$

4.1.2 Maximal meaningful intervals

For the same reasons that led us to introduce the notion of maximal meaningful alignment, we have here to define maximal meaningful intervals.

Definition 10 (maximal meaningful interval) We say that an interval I = [a, b] is maximal meaningful if it is meaningful and if
∀J ⊂ I, H(J) ≤ H(I), and ∀J ⊋ I, H(J) < H(I).

The question is: can two maximal meaningful intervals have a non-empty intersection? We will see that the answer is no. But notice that we are not in the same situation as for alignments, and so we cannot apply the same results. In the case of alignments, the probability p was a fixed number and the variables were the length l of the segment and the number k of aligned points on the considered segment. Now, in the case of histograms, the total number of points is a fixed number M and the variables are the prior probability p(I) of the interval I and the number k(I) of points in I.

Theorem 5 Let I₁ and I₂ be two meaningful intervals such that I₁ ∩ I₂ ≠ ∅; then
max(H(I₁ ∪ I₂), H(I₁ ∩ I₂)) ≥ min(H(I₁), H(I₂)),
and the inequality is strict when I₁ ∩ I₂ ≠ I₁ and I₁ ∩ I₂ ≠ I₂.

Proof : For an interval I, we denote by r(I) the proportion of points it contains and by p(I) its relative length. Then the entropy of the interval is H(I) = 0 if r(I) ≤ p(I), and H(I) = F(r(I), p(I)) otherwise, where F is defined on [0, 1] × [0, 1] by
$$F(r,p) = r\log r + (1-r)\log(1-r) - r\log p - (1-r)\log(1-p).$$
For all (r, p) ∈ [0, 1] × [0, 1], F(r, p) is nonnegative and it is 0 if and only if r = p. Indeed, F(r, p) = g_r(r) − g_r(p), and we know by Lemma 1 that g_r attains its maximum at r. We first prove that F is a convex function. The partial derivatives of F are
$$\frac{\partial F}{\partial r} = \log\frac{r}{1-r} - \log\frac{p}{1-p} \qquad\text{and}\qquad \frac{\partial F}{\partial p} = \frac{p-r}{p(1-p)}.$$

Then, we get
$$\frac{\partial^2 F}{\partial r^2} = \frac{1}{r(1-r)}, \qquad \frac{\partial^2 F}{\partial r\,\partial p} = -\frac{1}{p(1-p)} \qquad\text{and}\qquad \frac{\partial^2 F}{\partial p^2} = \frac{r}{p^2} + \frac{1-r}{(1-p)^2}.$$
Thus
$$\frac{\partial^2 F}{\partial r^2} > 0 \qquad\text{and}\qquad \det(D^2F) = \frac{\partial^2 F}{\partial r^2}\,\frac{\partial^2 F}{\partial p^2} - \left(\frac{\partial^2 F}{\partial r\,\partial p}\right)^2 = \frac{(r-p)^2}{r(1-r)p^2(1-p)^2} \ge 0,$$
which shows that F is convex. Then the continuous function H(r, p), defined by F(r, p) if r ≥ p and 0 otherwise, is also convex (the partial derivatives are continuous).

By hypothesis, we have I₁ ∩ I₂ ≠ ∅. We denote I = I₁ ∩ I₂ and J = I₁ ∪ I₂. Then
r(I) + r(J) = r(I₁) + r(I₂) and p(I) + p(J) = p(I₁) + p(I₂)   (4.1)
and
r(I) ≤ min(r(I₁), r(I₂)) ≤ max(r(I₁), r(I₂)) ≤ r(J),
p(I) ≤ min(p(I₁), p(I₂)) ≤ max(p(I₁), p(I₂)) ≤ p(J).   (4.2)
Now, we want to show that
min(H(I₁), H(I₂)) ≤ max(H(I), H(J)),
and that the inequality is strict when I₁ ∩ I₂ ≠ I₁ and I₁ ∩ I₂ ≠ I₂.

In the plane R², we consider the set R of points (r, p) such that r(I) ≤ r ≤ r(J) and p(I) ≤ p ≤ p(J). Then R is a rectangle and, by (4.2), it contains the points X₁ = (r(I₁), p(I₁)) and X₂ = (r(I₂), p(I₂)). Let A be the following set of points:
A = {(r, p) : H(r, p) ≤ max(H(I), H(J))}.
A is a convex set because H is a convex function. Let X = (r(I), p(I)) and Y = (r(J), p(J)); then A contains the segment [X, Y]. Since ∂F/∂r ≥ 0 for r ≥ p, the set A contains R ∩ ({r ≤ p} ∪ P₊), where P₊ is the half-plane above the line (X, Y) (see Figure 4.2).

Figure 4.2: the line r = p, the rectangle R and the points X, Y, X₁, X₂ in the (r, p) plane.

Since I₁ and I₂ are meaningful, we get X₁ and X₂ in R ∩ {r > p}. And then, since the middle point of the segment [X₁, X₂] is also the middle point of the segment [X, Y] by (4.1), one of

X₁ and X₂ is in P₊. Consequently, X₁ or X₂ is in A, which shows that min(H(I₁), H(I₂)) ≤ max(H(I), H(J)). If I ≠ I₁ and I ≠ I₂ then the inequality is strict, thanks to the fact that ∂F/∂r > 0 for r > p, and we have the announced result.

Proposition 26 Let I₁ and I₂ be two different maximal meaningful intervals; then I₁ ∩ I₂ = ∅.

Proof : Assume that I = I₁ ∩ I₂ ≠ ∅. If I ≠ I₁ and I ≠ I₂, then by Theorem 5 we have
max(H(I₁ ∪ I₂), H(I₁ ∩ I₂)) > min(H(I₁), H(I₂)),
which contradicts the fact that I₁ and I₂ are maximal meaningful. If, for example, I = I₁ ∩ I₂ = I₁, then I₁ ⊂ I₂. Since by hypothesis I₁ and I₂ are maximal meaningful, we get by definition of maximality H(I₁) ≤ H(I₂) and H(I₂) < H(I₁), which is again a contradiction.

Corollary 5 Let I and J be two meaningful intervals such that
H(I) = H(J) = max_{K ⊂ [0, L]} H(K).
Then either I ⊂ J, or J ⊂ I, or I ∩ J = ∅.

Proof : By Theorem 5, if I ∩ J ≠ ∅, I ⊄ J and J ⊄ I, we deduce that H(I ∪ J) or H(I ∩ J) exceeds H(I) = H(J), which is a contradiction.

We now show some results on the histograms of grey-level images. On Figure 4.3, we first show the original image (a). We then give the histogram of grey-levels of the image and compute the maximal meaningful intervals (b). We find only one: the interval [69, 175] (delimited by dotted lines). Finally, on (c), we show the thresholded image: black points represent points of the original image with grey-level value in the maximal meaningful interval [69, 175].

Let us show another example. On Figure 4.4, we first give the original image (a). Then, we compute the histogram of grey-levels (b) and we look for maximal meaningful intervals. We only find one: the interval [88, 247]. Notice that the peak [18, 19] is meaningful but not maximal meaningful, because NF([88, 247]) < NF([18, 247]) < NF([18, 19]).
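The search used in these experiments can be reproduced, for small histograms, by a direct brute-force transcription of Definitions 9 and 10. The sketch below is ours (it is quadratic in the number of intervals and only practical for didactic purposes), not the thesis implementation.

```python
import math

def maximal_meaningful_intervals(hist, eps=1.0):
    """Return the maximal eps-meaningful intervals [a, b] (0-based bins) of a histogram."""
    L, M = len(hist), sum(hist)
    threshold = math.log(L * (L + 1) / (2 * eps)) / M

    def entropy(r, p):
        if r <= p:
            return 0.0
        h = r * math.log(r / p)
        if r < 1.0:
            h += (1 - r) * math.log((1 - r) / (1 - p))
        return h

    # Relative entropy of every interval [a, b].
    H = {}
    for a in range(L):
        k = 0
        for b in range(a, L):
            k += hist[b]
            H[(a, b)] = entropy(k / M, (b - a + 1) / L)

    maximal = []
    for (a, b), h in H.items():
        if h <= threshold:                      # not eps-meaningful
            continue
        ok = True
        for (c, d), h2 in H.items():
            if (c, d) == (a, b):
                continue
            if a <= c and d <= b and h2 > h:    # a sub-interval is strictly better
                ok = False
                break
            if c <= a and b <= d and h2 >= h:   # a super-interval is at least as good
                ok = False
                break
        if ok:
            maximal.append((a, b))
    return maximal
```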

Figure 4.3: The original road image and the corresponding histogram. (a) The original road image. (b) Histogram of grey-levels and maximal meaningful interval: we find only one, the interval [69, 175] (delimited by dotted lines). (c) Thresholded image: black points represent points of the original image with grey-level value in [0, 68], grey points represent points with grey-level value in the maximal meaningful interval [69, 175], and white points represent points with grey-level value larger than 176.

This is a fact that we often observe when looking for maximal meaningful intervals: when the histogram has two peaks, one of them is not maximal meaningful because the union of both is more meaningful (but still less meaningful than the other peak). We will see in the next section how we can solve this problem by defining meaningful gaps and meaningful modes. Finally, as we did for the previous image, we compute the thresholded image (c): black points represent points of the original image with grey-level value in the maximal meaningful interval [88, 247].

We next show some experiments on a very small image: see Figure 4.5. The original image (a) is a small subimage (size 18 × 12) of the previous airport image. We compute the histogram of this small image (c) and we find only one maximal meaningful interval: the interval [114, 195]. The problem here is that the support of the histogram is not the whole interval [1, 256]. Then, on Figure 4.5(d), we compute the maximal meaningful intervals of the histogram when the reference interval is the support of the histogram (for a discussion about the reference interval, see Section 4.4). We find two maximal meaningful intervals: [119, 129] and [179, 191]. This means that we have two peaks: the first one at grey-level value 124 (width ±5) and the second one at grey-level value 185 (width ±6). We finally obtain the thresholded image (b): black points represent pixels of the original image with value less than the middle point between the two modes, (129 + 179)/2 = 154.

4.1.3 Meaningful gaps and modes

In the previous section, we were interested in meaningful intervals, i.e. intervals which contain more points than the expected average, in the sense that
$$B(M, k(a,b), p(a,b)) < \frac{2}{L(L+1)}.$$
We are now interested in gaps, i.e. intervals which contain fewer points than the expected average. Let us define this more precisely. Let [a, b] be an interval with prior probability p(a, b) = (b − a + 1)/L. Let k be an integer such that 0 ≤ k ≤ M. Then the probability that the interval [a, b] contains at most k points (among the total number M of points) is
$$\sum_{j=0}^{k} \binom{M}{j}\, p(a,b)^j (1-p(a,b))^{M-j} = B(M, M-k, 1-p(a,b)) = 1 - B(M, k+1, p(a,b)).$$
An interval [a, b] containing k(a, b) points is a meaningful gap if
$$B(M, M-k(a,b), 1-p(a,b)) < \frac{2}{L(L+1)}.$$

Proposition 27 An interval cannot be at the same time a meaningful interval and a meaningful gap.

Figure 4.4: The original airport image and the corresponding histogram. (a) The original airport image. (b) Histogram of grey-levels and maximal meaningful interval. (c) Thresholded image: black points represent points of the original image with grey-level value in the maximal meaningful interval.

Figure 4.5: A small image and the corresponding histogram. (a) The original image. (b) Thresholded image. (c) Histogram of grey-levels and maximal meaningful interval. (d) Maximal meaningful intervals when the reference interval is the support of the histogram.

Proof : Let [a, b] be a meaningful gap; then, thanks to Proposition 2, we have M − k(a, b) > M(1 − p(a, b)), i.e. r(a, b) = k(a, b)/M < p(a, b). This shows that [a, b] cannot be a meaningful interval.

From now on, and by the same arguments as in the previous section, we adopt the large deviations estimate.

Definition 11 (meaningful gap) We say that an interval [a, b] containing k(a, b) points is a meaningful gap if and only if r(a, b) = k(a, b)/M < p(a, b) and
$$r(a,b)\log\frac{r(a,b)}{p(a,b)} + (1-r(a,b))\log\frac{1-r(a,b)}{1-p(a,b)} > \frac{1}{M}\log\frac{L(L+1)}{2}.$$

Definition 12 (meaningful mode) We say that an interval is a meaningful mode if it is a meaningful interval and if it does not contain any meaningful gap.

Definition 13 (maximal meaningful mode) We say that an interval I is a maximal meaningful mode if it is a meaningful mode and if, for all meaningful modes J ⊂ I, H(J) ≤ H(I), and for all meaningful modes J ⊋ I, H(J) < H(I).

On Figure 4.6, we present some experimental results. Subfigure (a) is the original histogram. We have L = 60 and M = 920. We first compute the maximal meaningful intervals (subfigure (b)). We find only one: the interval [10, 22]. The second peak [40, 50] is not maximal meaningful because, when we compute the numbers of false alarms, we find that NF([10, 22]) < NF([10, 50]) < NF([40, 50]). Next, we compute the maximal meaningful modes (subfigure (c)) and we find the two modes [10, 22] and [40, 50].

See also Figure 4.7 for further results. On the histogram of grey-levels of the airport image (Figure 4.4), we only found one maximal meaningful interval: the interval [88, 247]. Now, if we look for maximal meaningful modes (subfigure (a)), we find three: the intervals [18, 19], [93, 232] and [246, 246]. Subfigure (b) is the resulting thresholded image: the thresholds are chosen as the middle points between the maximal meaningful modes. Thus, black points represent points with value in [0, 56], grey points have value in [57, 238] and white points have value in [239, 255].

4.2 Extension to continuous histograms

In this section, we will extend the definitions of meaningful interval and of maximal meaningful interval to the case of a continuous histogram. The continuous formalism is not usable numerically, but it provides very useful qualitative information about meaningful intervals.

Figure 4.6: Comparison between maximal meaningful intervals and maximal meaningful modes. (a) The original histogram. (b) Only one maximal meaningful interval. (c) The two maximal meaningful modes.

Figure 4.7: Maximal meaningful modes of the airport image. On the histogram of grey-levels of the airport image (Figure 4.4), we find three maximal meaningful modes (subfigure (a)): the intervals [18, 19], [93, 232] and [246, 246]. Subfigure (b) is the resulting thresholded image: black points represent points with value in [0, 56], grey points have value in [57, 238] and white points have value in [239, 255].
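Before turning to the continuous case, the computation of maximal meaningful modes illustrated in Figures 4.6 and 4.7 can be sketched in a few lines of Python. This is only an illustrative brute-force transcription of Definitions 11-13 (ours, not the thesis code), practical only for small histograms such as the L = 60 example above.

```python
import math

def kl_bernoulli(r, p):
    """Relative entropy between Bernoulli(r) and Bernoulli(p); used for intervals and gaps."""
    h = 0.0
    if r > 0:
        h += r * math.log(r / p)
    if r < 1:
        h += (1 - r) * math.log((1 - r) / (1 - p))
    return h

def maximal_meaningful_modes(hist):
    L, M = len(hist), sum(hist)
    thr = math.log(L * (L + 1) / 2.0) / M

    def stats(a, b):
        return sum(hist[a:b + 1]) / M, (b - a + 1) / L   # (r, p)

    def is_gap(a, b):
        r, p = stats(a, b)
        return r < p and kl_bernoulli(r, p) > thr

    def entropy(a, b):
        r, p = stats(a, b)
        return kl_bernoulli(r, p) if r > p else 0.0

    # Meaningful modes: meaningful intervals containing no meaningful gap.
    modes = []
    for a in range(L):
        for b in range(a, L):
            if entropy(a, b) <= thr:
                continue
            if any(is_gap(c, d) for c in range(a, b + 1) for d in range(c, b + 1)):
                continue
            modes.append((a, b, entropy(a, b)))

    # Keep the maximal ones (Definition 13), comparing only nested modes.
    maximal = []
    for a, b, h in modes:
        dominated = any(
            (a <= c and d <= b and (c, d) != (a, b) and h2 > h) or
            (c <= a and b <= d and (c, d) != (a, b) and h2 >= h)
            for c, d, h2 in modes)
        if not dominated:
            maximal.append((a, b))
    return maximal
```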

A continuous histogram will be defined in the following sense. Let f be a density function defined on [0, 1], such that ∫₀¹ f = 1. For each interval [a, b] ⊂ [0, 1], we can define the prior probability p(a, b) and the relative mass r(a, b) of the interval (which represents a posterior probability) by
$$p(a,b) = b-a \qquad\text{and}\qquad r(a,b) = \int_a^b f.$$
This is a direct extension of the discrete histogram case. Now, in the continuous case, the histogram has infinite resolution, so that the large deviation estimate can be considered as infinitely accurate. Thus, we define meaningful intervals in the following way.

Definition 14 (meaningful interval in the continuous case) We say that an interval [a, b] is meaningful as soon as r(a, b) > p(a, b).

We adopt the same definitions as in the discrete case. Thus, the relative entropy of an interval [a, b] is defined by
$$H(a,b) = \begin{cases} 0 & \text{if } r(a,b) \le p(a,b),\\[4pt] r(a,b)\log\dfrac{r(a,b)}{p(a,b)} + (1-r(a,b))\log\dfrac{1-r(a,b)}{1-p(a,b)} & \text{otherwise,}\end{cases}$$
and we say that an interval I = [a, b] is maximal meaningful if and only if it is meaningful and if
∀J ⊂ I, H(J) ≤ H(I), and ∀J ⊋ I, H(J) < H(I).
We also have the same property of maximal meaningful intervals, namely that if I₁ and I₂ are two different maximal meaningful intervals, then I₁ ∩ I₂ = ∅ (the proof of Theorem 5 is valid in the continuous case). The following property is specific to the continuous case.

Lemma 7 If f is continuous on [0, 1], then H is continuous on {(a, b) : 0 ≤ a ≤ b ≤ 1}. In particular, H(a, b) goes to 0 when [a, b] tends to reduce to a single point {x₀}, and H(a, b) goes to 0 when (a, b) goes to (0, 1).

Proof : We first consider the case [a, b] → {x₀}.
If f(x₀) = 0: for a and b close enough to x₀, we have ∫_a^b f < b − a. Thus, by definition of H, H(a, b) is zero.

If f(x₀) > 0: when a → x₀ and b → x₀, then r(a, b) → 0, p(a, b) → 0 and r(a, b)/p(a, b) → f(x₀). Consequently, H(a, b) goes to zero.
Let us now consider the case a → 0 and b → 1. In this case r(a, b) and p(a, b) both go to 1. We have
$$(1-r(a,b))\log\frac{1-r(a,b)}{1-p(a,b)} = \left(1-\int_a^b f\right)\log\left(1-\int_a^b f\right) - \left(1-\int_a^b f\right)\log(1-b+a).$$
The first term of the difference goes to 0, and so does the second term, because
$$1-\int_a^b f = \int_0^a f + \int_b^1 f \le (a+1-b)\,\max f.$$
Finally, we get H(a, b) → 0.

The relative entropy measures the meaningfulness of an interval: the higher the relative entropy, the more meaningful the interval. As previously, let F(r, p) be the function defined on [0, 1] × [0, 1] by
$$F(r,p) = r\log\frac{r}{p} + (1-r)\log\frac{1-r}{1-p};$$
then, for 0 ≤ p ≤ r ≤ 1,
$$\frac{\partial F}{\partial r} \ge 0 \qquad\text{and}\qquad \frac{\partial F}{\partial p} \le 0.$$
This means that if two intervals have the same length (i.e. the same prior probability), then the more meaningful one is the one with the larger mass. And conversely, if two intervals I and J have the same mass (∫_I f = ∫_J f), then the more meaningful one is the one with the smaller length.

Remark : If f is a positive function defined on an interval I (with length l(I) > 0) and such that 0 < ∫_I f < +∞, then in the same way we can define the entropy of any sub-interval [a, b] of I just by setting
$$p(a,b) = \frac{l([a,b])}{l(I)} \qquad\text{and}\qquad r(a,b) = \frac{\int_a^b f}{\int_I f}.$$

4.3 Properties of meaningful intervals

4.3.1 Mean value of an interval

Our aim here is to compare the relative entropies of two intervals which have the same mean value, the mean value of an interval [a, b] being defined as r(a, b)/p(a, b). In the discrete case, if the histogram, denoted h, is defined on {1, ..., L}, the mean value of an interval [a, b] is
$$\frac{r(a,b)}{p(a,b)} = \frac{k(a,b)}{M}\,\frac{L}{b-a+1} = \frac{\sum_{x=a}^{b} h(x)}{\sum_{x=1}^{L} h(x)}\,\frac{L}{b-a+1}.$$

In the continuous case, if we have a density function f defined on [0, 1], the mean value of an interval [a, b] ⊂ [0, 1] is
$$\frac{r(a,b)}{p(a,b)} = \frac{1}{b-a}\int_a^b f.$$
We are only interested in meaningful intervals, which means that we will consider intervals with mean value larger than 1.

Proposition 28 Let I and J be two intervals with the same mean value
$$\lambda = \frac{r(I)}{p(I)} = \frac{r(J)}{p(J)} > 1.$$
If p(I) > p(J), then H(I) > H(J), which means that when the average is fixed, the more meaningful interval is the longer one.

Remark : This result has to be compared with the consistency result about the density of aligned points for meaningful segments.

Proof : Let λ > 1 be fixed. For p in ]0, 1[ such that r = λp ≤ 1, we consider the function
$$g(p) = F(\lambda p, p) = \lambda p\log\lambda + (1-\lambda p)\log\frac{1-\lambda p}{1-p}.$$
We want to show that g is increasing. We have
$$g'(p) = \lambda\left[\frac{1-\lambda p}{\lambda-\lambda p} - 1 - \log\frac{1-\lambda p}{\lambda-\lambda p}\right],$$
which shows that g'(p) > 0, because (1 − λp)/(λ − λp) < 1 and, for x ≠ 1, we have log x < x − 1.

The previous proposition has the following corollary, which is a result about the concatenation of meaningful intervals.

Corollary 6 Let [a, b] and [b, c] be two consecutive intervals; then
H(a, c) ≥ min[H(a, b), H(b, c)],
which means that the interval [a, c] is more meaningful than [a, b] or [b, c].

Proof : Since r(a, c) = r(a, b) + r(b, c) and p(a, c) = p(a, b) + p(b, c), we get
$$\frac{r(a,c)}{p(a,c)} \ge \min\left[\frac{r(a,b)}{p(a,b)}, \frac{r(b,c)}{p(b,c)}\right],$$
and the result is then a direct consequence of the previous proposition.

One possible application of this corollary is the fact that two maximal meaningful intervals cannot be consecutive.

4.3.2 Structure of maximal meaningful intervals

We first consider the continuous case: let f be a density function defined on [0, 1], and let [a, b] ⊂ [0, 1]. We recall that the interval I = [a, b] is said to be maximal meaningful if r(a, b) > p(a, b) and if
∀J ⊂ I, H(J) ≤ H(I), and ∀J ⊋ I, H(J) < H(I).
In the following, we will assume that the function f is regular (continuous or C¹ on [0, 1]), and we will derive some properties of the endpoints of a maximal meaningful interval.

Theorem 6 Let [a, b] ⊂ [0, 1] be a maximal meaningful interval such that a ≠ 0 and b ≠ 1. Then
$$f(a) = f(b), \qquad f(a) < \frac{1}{b-a}\int_a^b f, \qquad f'(a) > 0 \qquad\text{and}\qquad f'(b) < 0.$$

This theorem shows that a maximal meaningful interval [a, b] necessarily has the configuration shown on Figure 4.8. From a computational point of view, it also shows that if we want to find maximal meaningful intervals, we do not have to test all intervals, but only a small part of them (those such that f(a) = f(b), f'(a) > 0 and f'(b) < 0).

Figure 4.8: Structure of a maximal meaningful interval.

Proof : The relative entropy of the interval [a, b] is
$$H(a,b) = \left(\int_a^b f\right)\log\frac{\int_a^b f}{b-a} + \left(1-\int_a^b f\right)\log\frac{1-\int_a^b f}{1-(b-a)}.$$

Since [a, b] is maximal, the function x ↦ H(x, b) has a local maximum at x = a and the function y ↦ H(a, y) has a local maximum at y = b. Since a and b are in ]0, 1[, we get
$$\frac{\partial H}{\partial a}(a,b) = \frac{\partial H}{\partial b}(a,b) = 0, \qquad \frac{\partial^2 H}{\partial a^2}(a,b) \le 0 \qquad\text{and}\qquad \frac{\partial^2 H}{\partial b^2}(a,b) \le 0.$$
A simple computation gives
$$\frac{\partial H}{\partial a}(a,b) = f(a)\log\frac{(b-a)\left(1-\int_a^b f\right)}{(1-b+a)\int_a^b f} + \frac{\int_a^b f}{b-a} - \frac{1-\int_a^b f}{1-b+a} = 0,$$
$$\frac{\partial H}{\partial b}(a,b) = f(b)\log\frac{(1-b+a)\int_a^b f}{(b-a)\left(1-\int_a^b f\right)} - \frac{\int_a^b f}{b-a} + \frac{1-\int_a^b f}{1-b+a} = 0.$$
And then, since
$$\frac{\int_a^b f}{b-a} > 1 > \frac{1-\int_a^b f}{1-b+a},$$
we can compute f(a) and f(b) from these two equations, and get f(a) = f(b).
For the second derivative, after computation and simplification, we get
$$\frac{\partial^2 H}{\partial a^2}(a,b) = f'(a)\log\frac{(b-a)\left(1-\int_a^b f\right)}{(1-b+a)\int_a^b f} + \frac{\left[(b-a)f(a)-\int_a^b f\right]^2}{\left(\int_a^b f\right)(b-a)^2} + \frac{\left[(1-b+a)f(a)-\left(1-\int_a^b f\right)\right]^2}{\left(1-\int_a^b f\right)(1-b+a)^2} \le 0.$$
Now, also using the fact that ∫_a^b f > b − a, we get f'(a) > 0. The same kind of computation gives f'(b) < 0.
For ε > 0 small enough, we consider the interval [a − ε, b]. Then, thanks to Proposition 28 and to the fact that the interval [a − ε, b] is less meaningful than [a, b], we have
$$\frac{r(a-\varepsilon,b)}{p(a-\varepsilon,b)} < \frac{r(a,b)}{p(a,b)}.$$
It implies that
$$\frac{1}{\varepsilon}\int_{a-\varepsilon}^{a} f < \frac{1}{b-a}\int_a^b f.$$
We let ε go to 0, and finally get
$$f(a) \le \frac{1}{b-a}\int_a^b f.$$

The previous theorem can be extended to the case of a discrete histogram; the result is then the following one (see also Figure 4.9).

Theorem 7 Let h be a histogram defined on a finite set of values {1, ..., L}. If [a, b] is a maximal meaningful interval such that 1 < a < b < L, then
h(a − 1) < h(a), h(b + 1) < h(b), h(a) > h(b + 1) and h(b) > h(a − 1).

Figure 4.9: Maximal meaningful interval of a discrete histogram.

Proof : Let M = Σ_{i=1}^{L} h(i) be the total weight. For an interval [i, j] we have
$$p(i,j) = \frac{j-i+1}{L} \qquad\text{and}\qquad r(i,j) = \frac{\sum_{x=i}^{j} h(x)}{M}.$$
The relative entropy H([i, j]) = H(r(i, j), p(i, j)) of the interval [i, j] is 0 if r(i, j) < p(i, j), and
r(i, j) log r(i, j) + (1 − r(i, j)) log(1 − r(i, j)) − r(i, j) log p(i, j) − (1 − r(i, j)) log(1 − p(i, j))
otherwise. We will use the fact that the function (r, p) ↦ H(r, p) is convex (see the proof of Theorem 5) and that ∂H/∂r ≥ 0 for r ≥ p.
Let [a, b] be a maximal meaningful interval; we will prove that h(a − 1) < h(a) (the proof is exactly the same for the other inequalities). Assume that h(a − 1) ≥ h(a). Since [a, b] is meaningful, we have r(a, b) > p(a, b). Using the strict convexity of H(r, p) for r > p, we have
$$H(a,b) < \max\left( H\!\left(r(a,b)-\frac{h(a)}{M},\, p(a,b)-\frac{1}{L}\right),\ H\!\left(r(a,b)+\frac{h(a)}{M},\, p(a,b)+\frac{1}{L}\right)\right).$$
Since [a, b] is maximal, we have
$$H(a+1,b) = H\!\left(r(a,b)-\frac{h(a)}{M},\, p(a,b)-\frac{1}{L}\right) \le H(a,b).$$
Thus,
$$H(a,b) < H\!\left(r(a,b)+\frac{h(a)}{M},\, p(a,b)+\frac{1}{L}\right).$$
This shows that r(a, b) + h(a)/M > p(a, b) + 1/L. Using the fact that ∂H/∂r ≥ 0 for r ≥ p, we get
$$H(a-1,b) = H\!\left(r(a,b)+\frac{h(a-1)}{M},\, p(a,b)+\frac{1}{L}\right) \ge H\!\left(r(a,b)+\frac{h(a)}{M},\, p(a,b)+\frac{1}{L}\right).$$
Thus, H(a − 1, b) > H(a, b), which contradicts the maximality of [a, b].
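Theorem 7 suggests a simple pruning of the search for maximal meaningful intervals of a discrete histogram: only the endpoint pairs it allows need to be tested. The sketch below is our own illustration of this remark (indices are 0-based, and intervals touching the ends of the range, which the theorem does not cover, are left out and would have to be tested separately).

```python
def candidate_endpoints(h):
    """Pairs (a, b) compatible with the necessary conditions of Theorem 7."""
    L = len(h)
    pairs = []
    for a in range(1, L - 1):
        if h[a - 1] >= h[a]:                 # requires h(a-1) < h(a)
            continue
        for b in range(a + 1, L - 1):
            if h[b + 1] < h[b] and h[a] > h[b + 1] and h[b] > h[a - 1]:
                pairs.append((a, b))
    return pairs
```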

4.4 The reference interval

In the previous sections, we always considered that the reference interval was fixed (for example [0, 1] in the continuous case). We now consider the problem of the choice of the reference interval. Assume for example that we observe the histogram of grey-levels of an image, knowing a priori that the grey-levels take values in an interval of length L (with L > 1). Now, suppose that the resulting histogram has support in [0, 1] (the support is defined as the interval [α, β] where α = inf{x : f(x) ≠ 0} and β = sup{x : f(x) ≠ 0}). If we want to detect meaningful and maximal meaningful intervals, which reference interval shall we consider? Shall we work on the length-L interval or on the support of the histogram? In order to answer this question, we first have to know what happens when the length L of the reference interval gets very large. Is the support of the histogram meaningful? Maximal meaningful?

Let L ≥ 1 be the length of the reference interval, and let f be the histogram with support [0, 1]. In the following, for an interval [a, b] we denote by p_L(a, b) its prior probability (i.e. its relative length) and by r_L(a, b) its relative weight:
$$p_L(a,b) = \frac{b-a}{L} \qquad\text{and}\qquad r_L(a,b) = \frac{\int_a^b f}{\int_0^1 f} = \int_a^b f = r(a,b).$$
Notice that r_L(a, b) is independent of L. The relative entropy H_L(a, b) of an interval [a, b] is 0 if r(a, b) ≤ (b − a)/L, and otherwise
$$H_L(a,b) = r(a,b)\log\frac{L\,r(a,b)}{b-a} + (1-r(a,b))\log\frac{1-r(a,b)}{1-\frac{b-a}{L}}.$$
In particular, for the support of the histogram we have r_L(0, 1) = 1 and p_L(0, 1) = 1/L. This shows that the support is meaningful as soon as L > 1 and that H_L(0, 1) = log L.

Proposition 29 (continuous case) If f is continuous and f(0) > 0 and f(1) > 0, then there exists L₀ such that
∀L ≥ L₀, ∀[a, b] ⊊ [0, 1], H_L(a, b) < H_L(0, 1).
This means that when the length of the reference interval is large enough, the support of the histogram is maximal meaningful (and it is the only maximal meaningful interval).

Proof : For an interval [a, b] ⊂ [0, 1] and a length L > 1, we have
$$H_L(a,b) = H(a,b) + r(a,b)\log L + (1-r(a,b))\log\frac{1-(b-a)}{1-\frac{b-a}{L}}.$$
The last term of the sum being negative, we get
$$H_L(a,b) \le H(a,b) + r(a,b)\log L.$$

We now show that there is a constant C > 0 such that
$$\forall\, [a,b] \subsetneq [0,1], \qquad \frac{H(a,b)}{1-r(a,b)} < C.$$
This will imply that for all L such that log L ≥ C,
H_L(a, b) ≤ H_L(0, 1) = log L,
with equality only if r(a, b) = 1, that is, if [a, b] = [0, 1].
Thanks to Lemma 7, we already know that H is continuous, and consequently bounded. If we want to show that H/(1 − r) is bounded, we just have to prove that it is bounded when r(a, b) goes to 1, that is, when a goes to 0 and b goes to 1. We have
$$\frac{H(a,b)}{1-r(a,b)} = \frac{\int_a^b f}{1-\int_a^b f}\,\log\frac{\int_a^b f}{b-a} + \log\frac{1-\int_a^b f}{1-b+a}.$$
When a → 0 and b → 1, then (b − a) → 1 and ∫_a^b f → 1, and so
$$\frac{\int_a^b f}{1-\int_a^b f}\,\log\frac{\int_a^b f}{b-a} \le \frac{\int_a^b f}{1-\int_a^b f}\left(\frac{\int_a^b f}{b-a}-1\right) = \frac{\int_a^b f}{b-a}\cdot\frac{\int_a^b f-(b-a)}{1-\int_a^b f} \le \frac{\int_a^b f}{b-a}\cdot\frac{1-b+a}{1-\int_a^b f}.$$
Since f(0) > 0 and f(1) > 0, there is m > 0 such that, for a close to 0 and b close to 1,
$$1-\int_a^b f = \int_0^a f + \int_b^1 f \ge m(1-b+a).$$
And then, since f is continuous on [0, 1], it is bounded by a constant M, and finally
$$\frac{1}{M} \le \frac{1-b+a}{1-\int_a^b f} \le \frac{1}{m},$$
which shows that H/(1 − r) is bounded as (a, b) → (0, 1).

Corollary 7 (discrete histogram) Let h be a discrete histogram defined on a finite set of values {i/n}, 0 ≤ i ≤ n. We assume that h(0) > 0 and h(1) > 0 (i.e. the support of the histogram is [0, 1]). Then there exists L₀ such that
∀L ≥ L₀, ∀[a, b] ⊊ [0, 1], H_L(a, b) < H_L(0, 1).
This means that when the length of the reference interval is large enough, the support of a discrete histogram is maximal meaningful (and it is the only maximal meaningful interval).

Proof : Since the histogram is defined on a finite set of values, the number of discrete intervals [a, b] ⊂ [0, 1] is finite. We also notice that if [a, b] ⊊ [0, 1], then 1 − r(a, b) > 0. Thus, H(a, b)/(1 − r(a, b)) is bounded over all [a, b] ⊊ [0, 1]. As in the proof of Proposition 29, this shows that when the length L of the reference interval is large enough, we have
∀[a, b] ⊊ [0, 1], H_L(a, b) ≤ H(a, b) + r(a, b) log L < log L = H_L(0, 1).

In Proposition 29, we considered the case of a continuous density function defined on [0, 1] and such that f(0) > 0 and f(1) > 0. Now what happens when f(0) = 0 or f(1) = 0? The first thing we can say is that the support [0, 1] of the histogram is asymptotically maximal meaningful, in the following sense.

Proposition 30 Let f be a continuous density function on [0, 1]. Then for all ε > 0, there exists L₀ > 1 such that for every interval [a, b] ⊂ [0, 1] of length less than 1 − ε, and for all L ≥ L₀, we have H_L(a, b) < H_L(0, 1). This means that the support of the histogram is asymptotically maximal meaningful.

Proof : This is a simple consequence of the fact that H/(1 − r) is bounded over the intervals [a, b] of length less than 1 − ε.

Figure 4.10: On the left, the support of the histogram is maximal meaningful when the length L of the reference interval is large enough. On the right, the support is asymptotically maximal meaningful.

4.5 Applications and experimental results

In this section, we present some joint applications of meaningful alignments in an image and of meaningful modes of a histogram. For each image, in a first step, we find the maximal meaningful alignments of the image. We obtain a finite set of segments. Each of these segments has an orientation (valued in [0, 2π[ because segments are oriented). The precision of the direction of a segment is related to its length: if l denotes the length of the segment, the precision of its direction is 1/l. The second step is to build the discrete histogram of the orientations of the detected alignments. The interval [0, 2π[ is decomposed into n = 2πl_min bins, where l_min is the minimal length of the detected segments. Thus, the size of a bin is 1/l_min. The third step is to look for maximal meaningful modes of the histogram of orientations. Notice that the framework is a little different. Let us explain why: a histogram of orientations is defined on the circular interval [0, 2π[. Thus, when we look for meaningful intervals [a, b], we do not only consider intervals with 0 ≤ a ≤ b < 2π, but also intervals such that 0 ≤ b < a < 2π. We define an interval [a, b] such that 0 ≤ b < a < 2π as the union [a, 2π[ ∪ [0, b].
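Handling the circular histogram therefore only requires scanning arcs that may wrap around 2π. The small Python sketch below (ours; the bin indexing and the function name are assumptions) shows how the relative entropy of such an arc can be computed; the scan over all arcs then proceeds exactly as in the linear case, with one loop over the starting bin and one over the arc length.

```python
import math

def circular_arc_entropy(hist, start, length):
    """Relative entropy of the arc of `length` bins beginning at bin `start`,
    wrapping around the end of the circular orientation histogram if needed."""
    n, M = len(hist), sum(hist)
    k = sum(hist[(start + i) % n] for i in range(length))
    r, p = k / M, length / n
    if r <= p:
        return 0.0
    h = r * math.log(r / p)
    if r < 1.0:
        h += (1 - r) * math.log((1 - r) / (1 - p))
    return h
```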

Image 1 : Pencil strokes (see Figure 2.5(a)). We again consider this image, and more precisely the maximal meaningful alignments of this image (shown in Figure 2.5(c)). On Figure 4.11 we first present the histogram of the lengths of the obtained maximal meaningful segments (Figure 4.11(a)). We compute the maximal meaningful modes of this histogram and we find the interval [22, 51]. On Figure 4.11(b), we present the histogram of the orientations modulo π of the obtained maximal meaningful segments. We measure the orientation in degrees: the interval [−90, 90] degrees is divided into 2πl_min = 112 bins. We then compute the maximal meaningful modes of this histogram. We find five intervals: [85, 93], [−44, −39], [−5, 18], [28, 31] and [39, 45]. Finally, on Figure 4.12, for each of the five maximal meaningful modes of the histogram, we show the segments which have their orientation in the mode.

Image 2: Uccello's painting (see Figure 4.13). This image (a) is a scan of a painting by Uccello, Presentazione della Vergine al tempio (from the book L'opera completa di Paolo Uccello, Classici dell'arte, Rizzoli). In Figure 4.13(b), we display all maximal ε-meaningful segments with ε = 10⁻¹⁰. On Figure 4.13(c), we compute the histogram of the orientations modulo 2π of the obtained maximal meaningful segments. We measure the orientation in degrees, and the interval [−180, 180] degrees is divided into 2πl_min = 138 bins. We compute the maximal meaningful modes of the histogram and we find five intervals: [175, −175], [−92, −87], [−4, 9], [87, 95] and [156, 162]. The mode [156, 162] corresponds to the left side of the roof of the temple. The four other modes correspond to the oriented vertical and horizontal lines. For each mode, we show on Figure 4.14 the segments which have their orientation in the mode.

Figure 4.11: Histogram of length and histogram of orientation with maximal meaningful modes. (a) The histogram of the lengths of the maximal meaningful segments: one mode, (22, 51). (b) The histogram of the orientation modulo π of the maximal meaningful segments, measured in degrees; we find five modes: (85, 93), (−44, −39), (−5, 18), (28, 31), (39, 45).

Figure 4.12: Grouping of segments according to common orientation. (a) The maximal meaningful segments of the pencil strokes image. (b) Segments with orientation in the mode (85, 93). (c) Segments with orientation in the mode (−44, −39). (d) Segments with orientation in the mode (−5, 18). (e) Segments with orientation in the mode (28, 31). (f) Segments with orientation in the mode (39, 45).

Figure 4.13: Uccello's painting: maximal meaningful alignments and histogram of orientations. (a) The original image: Uccello's painting. (b) Maximal ε-meaningful segments for ε = 10⁻¹⁰. (c) Histogram of orientations modulo 2π of the maximal meaningful segments: five maximal meaningful modes.

Figure 4.14: Grouping of segments according to common orientation. (a) Segments with orientation in the mode (−4, 9). (b) Segments with orientation in the mode (175, 185). (c) Segments with orientation in the mode (−92, −87). (d) Segments with orientation in the mode (87, 95). (e) Segments with orientation in the mode (156, 162).


Chapter 5

Contrast

In this chapter, we will apply to edge detection the same methodology we have already developed for meaningful alignments and for meaningful modes of a histogram. We define an edge as a level line along which the contrast of the image is strong. We call boundary a closed edge. We shall in the following give a definition of meaningfulness and of maximal meaningfulness for both objects. Then, we shall show experiments and discuss them. A comparison with the classical Mumford-Shah segmentation method will be made, and also with the Canny-Deriche edge detector. We shall give a (very simple in this case) proof of the existence of maximal detectable gestalts, applied to edges. What we do for edges won't be a totally straightforward extension of the method we developed for alignments. Indeed, we cannot treat edge or boundary strength as we treated orientation, i.e. we cannot assume that the modulus of the gradient of an image is uniformly distributed.

5.1 Contrasted Boundaries

We call contrasted boundary any closed curve, long enough, with strong enough contrast, and which fits well the geometry of the image, namely which is orthogonal to the gradient of the image at each of its points. We will first define ε-meaningful contrasted boundaries, and then maximal meaningful contrasted boundaries. Notice that this definition depends upon two parameters ("long enough", "contrasted enough") which are usually fixed by thresholds in a computer vision algorithm, unless we have something better to say. In addition, most boundary detection methods will, like the snake method [43], introduce regularity parameters for the boundary being sought [56]. If we remove the condition "long enough", we can have boundaries everywhere, as is patent in the classical Canny filter [7]. The work of Konishi, Coughlan, Yuille and Zhu [46] is related to our work in the sense that they also treat the problem of edge detection as one of statistical inference: they want to evaluate the effectiveness of different edge detectors. The main difference is that they have a learning phase, where they learn the statistics of edge detector responses both on and off edges on a dataset of pre-segmented images.

The geometric event considered here will be: a strong contrast along a level line of the image. Level lines are curves directly provided by the image itself. They are a fast and obvious way to define global, contrast-insensitive candidates for edges [8]. Actually, it is well acknowledged that edges, whatever their definition might be, are as orthogonal as possible to the gradient [7, 14, 22, 52, 67]. As a consequence, we can claim that level lines are adequate candidates for following up local edges. The converse statement is false: not all level lines are edges. The claim that image boundaries (i.e. closed edges) in the senses proposed in the literature [87, 62] also are level lines is a priori wrong. How wrong it is will come out of the experiments, where we compare an edge detector with a boundary detector. Surprisingly enough, we will see that they can give comparable results.

We now proceed to define precisely the geometric event: at each point of a part of a level line of length l (counted in independent points), the contrast is larger than µ. Then, we compute the expectation of the number of occurrences of such an event (i.e. the number of false alarms). This will define the thresholds: the minimal length of the level line, and also the minimal contrast required in order to be meaningful. We will give some examples of typical numerical values of these thresholds in digital images. Then, as has been done for other gestalts like alignments and histograms, we will define a notion of maximality and derive some properties.

5.1.1 Definitions

Let u be a discrete image of size N × N. The number of grey-levels is finite, and for each grey-level λ, we can define the level sets
χ_λ = {x : u(x) ≥ λ} and χ^λ = {x : u(x) ≤ λ}.
A level line is then defined as the boundary of a connected component of a level set. Let L be a level line of the image u. We denote by l its length counted in independent points. In the following, we will consider that points at a geodesic distance (along the curve) larger than 2 are independent (i.e. the contrasts at these points are independent random variables). Let x₁, x₂, ..., x_l denote the l considered points of L. For a point x ∈ L, we will denote by c(x) the contrast at x. It is defined by
$$c(x) = |\nabla u|(x), \qquad (5.1)$$
where ∇u is computed by a standard finite difference scheme on a 2 × 2 neighborhood (the same computation as for alignments). For µ ∈ R⁺, we consider the event: for all 1 ≤ i ≤ l, c(x_i) ≥ µ, i.e. each point of L has a contrast larger than µ. From now on, all computations are performed in the Helmholtz framework explained in the introduction: we make all computations as though the contrast observations at the x_i were mutually independent. Since the l points are independent, the probability of this event is
$$P[c(x_1) \ge \mu]\cdot P[c(x_2) \ge \mu]\cdots P[c(x_l) \ge \mu] = H(\mu)^l, \qquad (5.2)$$

where H(µ) is the probability for a point of the image to have a contrast larger than µ. An important question here is the choice of H(µ). Shall we consider that H(µ) is given by an a priori probability distribution, or is it given by the image itself (i.e. by the histogram of the gradient norm in the image)? In the case of alignments, we took, by the Helmholtz principle, the orientation at each point of the image to be a random variable, uniformly distributed on [0, 2π]. Here, in the case of contrast, it does not seem sound at all to consider that the contrast is uniformly distributed. In fact, when we observe the histogram of the gradient norm of a natural image (see Figure 5.1), we notice that most of the points have a small contrast (between 0 and 3), and that only a few points are highly contrasted. This is explained by the fact that a natural image contains many flat regions (the so-called "blue sky effect" [39]). In the following, we will consider that H(µ) is given by the image itself, which means that
$$H(\mu) = \frac{1}{N^2}\,\#\{x : |\nabla u|(x) \ge \mu\}. \qquad (5.3)$$
In order to define a meaningful event, we have to compute the expectation of the number of occurrences of this event in the observed image. Thus, we first define the number of false alarms.

Definition 15 (Number of false alarms) Let L be a level line of length l, counted in independent points. Let µ be the minimal contrast of the points x₁, ..., x_l of L. The number of false alarms of this event is defined by
$$NF(L) = N_{ll}\,[H(\mu)]^l, \qquad (5.4)$$
where N_ll is the number of level lines in the image.

Notice that the number N_ll of level lines is provided by the image itself. We now define ε-meaningful level lines. The definition is analogous to the definition of ε-meaningful modes of a histogram or to the definition of ε-meaningful alignments: the number of false alarms of the event must be less than ε.

Definition 16 (ε-meaningful boundary) A level line L with length l and minimal contrast µ is ε-meaningful if
$$NF(L) = N_{ll}\,[H(\mu)]^l \le \varepsilon. \qquad (5.5)$$

The above definition involves two variables: the length l of the level line and its minimal contrast µ. The number of false alarms of an event measures the meaningfulness of this event: the smaller it is, the more meaningful the event. Let us now proceed to define edges. We denote by N_llp the number of pieces of level lines in the image.

Definition 17 (ε-meaningful edge) A piece of level line L with length l and minimal contrast µ is ε-meaningful if
$$NF(L) = N_{llp}\,[H(\mu)]^l \le \varepsilon. \qquad (5.6)$$
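As a reference, the empirical tail H(µ) of equation (5.3) and the number of false alarms of Definitions 15-17 can be sketched as follows. This is only an illustrative Python sketch under our own assumptions (the function names are ours, and the 2 × 2 finite-difference stencil below is one standard choice; the thesis describes the scheme but gives no code).

```python
import numpy as np

def contrast_tail(u):
    """Return H(mu) = fraction of pixels whose gradient norm is >= mu (equation 5.3)."""
    ux = 0.5 * (u[1:, 1:] - u[1:, :-1] + u[:-1, 1:] - u[:-1, :-1])
    uy = 0.5 * (u[1:, 1:] - u[:-1, 1:] + u[1:, :-1] - u[:-1, :-1])
    grad = np.sort(np.sqrt(ux ** 2 + uy ** 2).ravel())
    def H(mu):
        # number of points with gradient norm >= mu, divided by the total number of points
        return (grad.size - np.searchsorted(grad, mu, side="left")) / grad.size
    return H

def number_of_false_alarms(H, n_lines, l, mu):
    """NF(L) = N_ll * H(mu)^l for a level line of l independent points with minimal contrast mu."""
    return n_lines * H(mu) ** l
```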

Figure 5.1: From left to right: 1. original image; 2. histogram of the norm of the gradient; 3. its repartition function (µ ↦ P[|∇u| ≥ µ]).

Here is how N_llp is computed: we first compute all the level lines at uniformly quantized levels (the grey-level quantization step is 1, the levels generally ranging from 1 to 255). For each level line L_i, of length l_i, we compute its number of pieces, sampled at pixel rate, the length unit being the pixel side. We then have
$$N_{llp} = \sum_i \frac{l_i(l_i-1)}{2}.$$
This fixes the number of samples used. This number of samples is fair for a 1-pixel-accurate edge detector. Clearly, we do detection and not optimization of the detected edge: in fact, according to the Shannon conditions, edges have a width of between two and three pixels. Thus, the question of finding the best edge representative among the ones found is not addressed here, but it has been widely addressed in the literature [7, 14].

5.1.2 Thresholds

In the following we will denote by F the function defined by
$$F(\mu, l) = N_{ll}\,[H(\mu)]^l. \qquad (5.7)$$
Thus, the number of false alarms of a level line of length l and minimal contrast µ is simply F(µ, l). Since the function µ ↦ H(µ) = P[c(x) ≥ µ] is decreasing, and since for all µ we have H(µ) ≤ 1, we obtain the following elementary properties.

We fix µ and l ≥ l'; then F(µ, l) ≤ F(µ, l'), which shows that if two level lines have the same minimal contrast, the more meaningful one is the longer one.

We fix l and µ ≥ µ'; then F(µ, l) ≤ F(µ', l), which shows that if two level lines have the same length, the more meaningful one is the one with the higher contrast.

When the contrast µ is fixed, the minimal length l_min(µ) of an ε-meaningful level line with minimal contrast µ is
$$l_{\min}(\mu) = \frac{\log\varepsilon - \log N_{ll}}{\log H(\mu)}. \qquad (5.8)$$
Conversely, if we fix the length l, the minimal contrast µ_min(l) needed to become ε-meaningful is such that
$$\mu_{\min}(l) = H^{-1}\left(\left[\varepsilon/N_{ll}\right]^{1/l}\right). \qquad (5.9)$$
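For instance, the detection curve shown in Figure 5.2 below can be reproduced from equation (5.8). The following one-function Python sketch (ours, with defensive guards that are not part of the thesis) uses the empirical tail H returned by the sketch of Section 5.1.1.

```python
import math

def minimal_length(H, n_lines, mu, eps=1.0):
    """l_min(mu): minimal length (in independent points) of an eps-meaningful level line
    whose minimal contrast is mu (equation 5.8)."""
    h = H(mu)
    if h <= 0.0:
        return 1.0                 # no pixel reaches this contrast: NF = 0 for any length
    if h >= 1.0:
        return math.inf            # zero contrast: such a line can never be meaningful
    return (math.log(eps) - math.log(n_lines)) / math.log(h)
```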

Figure 5.2: Minimal length of a 1-meaningful boundary, as a function of the minimal contrast µ, for µ ∈ [0, 50].

Numerical example: For the previous image (the house image), we compute N_ll. On Figure 5.2, we plot the function µ ↦ l_min(µ), where l_min(µ) is the minimal length of a 1-meaningful level line with minimal contrast µ. Notice that l_min(µ) is the length counted in independent points, and thus the real length of the level line is double.

5.2 Maximality

In this section, we address two kinds of maximality, for edges and for boundaries. Let us start with boundaries. A natural relation between closed level lines is given by their inclusion [55]. If C and C' are two different closed level lines, then C and C' cannot intersect. Let D and D' denote the bounded domains surrounded by C and C'. Then either D ∩ D' = ∅, or D ⊂ D', or D' ⊂ D. We can consider, as proposed by Pascal Monasse, the inclusion tree of all level lines. From now on, we work on the subtree of the detected meaningful level curves, that is, the ones for which F(µ, l) ≤ ε, where ε is our a priori fixed expectation of the number of false alarms. (In practice, we take ε = 1 in all experiments.) On this subtree, we can, following Pascal Monasse, define what we shall call a maximal monotone level curve interval, that is, a sequence of level curves C_i, i ∈ [1, k], such that:
- for i ≥ 2, C_i is the unique son of C_{i−1},
- the interval is maximal (not contained in a longer one),
- the grey-levels of the detected curves of the interval are either decreasing from 1 to k, or increasing from 1 to k.

We can see many such maximal monotone intervals of detected curves in the experiments: they roughly correspond to "fat edges", made of several well-contrasted level lines. The edge detection ideology tends to define an edge by a single curve. This is easily done by selecting the best contrasted edges along a series of parallel ones.

Definition 18 We associate with each maximal monotone interval its optimal level curves, that is, the ones for which the false alarm number F(µ, l) is minimal along the interval. We call optimal boundary map of an image the set of all optimal level curves.

This optimal boundary map will be compared in the experiments with classical edge detectors and segmentation algorithms. We now address the problem of finding optimal edges among the detected ones. We won't be able to proceed as for the boundaries. Although the pieces of level lines inherit the same inclusion structure as the level lines, we cannot compare for detectability two of them belonging to different level curves, since they can have different positions and lengths. We can instead compare two edges belonging to the same level curve. Our main aim is to define on each curve a set of disjoint maximally detectable edges. In the following, we denote by F(E) = F(µ, l) the false alarm number of a given edge E with minimal gradient norm µ and length l.

Definition 19 We call maximal meaningful edge any edge E such that, for any other edge E' on the same level curve such that E ⊊ E' (resp. E' ⊊ E), we have F(E') > F(E) (resp. F(E') ≥ F(E)).

This definition follows the methodology developed in Chapter 2 and in Chapter 4, where we applied it to the definition of maximal alignments and of maximal modes of a histogram.

Proposition 31 Two distinct maximal edges cannot meet.

Proof : Let E and E' be two distinct, non-disjoint maximal meaningful edges on a given level curve, and let µ and µ' be the respective minima of the gradient norm of the image on E and E'. Assume e.g. that µ ≤ µ'. Then E ∪ E' has the same minimum µ as E but is longer. Thus, by the remark of the preceding subsection, F(E ∪ E') < F(E), which implies that E ∪ E' has a smaller number of false alarms than E. Thus, E is not maximal. As a consequence, two maximal edges cannot meet.

5.3 Experiments

INRIA desk image (Figure 5.3). In this experiment, we compare our method with two other methods: the Mumford-Shah image segmentation and the Canny-Deriche edge detector.

In the Mumford and Shah model [58], given an observed image u, one looks for the piecewise approximation v of u that minimizes the functional

E(v) = λ ‖v − u‖² + length(K(v)),

where length(K(v)) is the one-dimensional measure of the discontinuity set of v. Hence, this energy is a balance between a fidelity term (the approximation error in L² norm) and a regularity term (the total length of the boundaries); a small discrete sketch of this energy balance is given after the experiment descriptions below. The result v, called a segmentation of u, depends on the parameter λ, which sets the relative weight of the two terms. As shown on Figure 5.3, the Mumford-Shah model generally produces reasonable boundaries, except in flat zones where spurious boundaries often appear (see the front side of the desk for example).

The Canny-Deriche filter [7, 16] is an optimization of Canny's well-known edge detector, roughly consisting in detecting the maxima of the gradient norm in the direction of the gradient. Notice that, in contrast with the Mumford-Shah model and with our model, it does not produce a set of boundaries (i.e. one-dimensional structures) but a discrete set of points that still have to be connected. It depends on two parameters: the width of the impulse response, generally set to 1 pixel, and a threshold on the norm of the gradient that selects candidates for edge points. As we can see on Figure 5.3, the result is very dependent on this threshold.

Cheetah image (Figure 5.4). This experiment compares our edge detector with the Mumford-Shah model. As before, we observe that the Mumford-Shah model produces some spurious boundaries on the background.

DNA image (Figure 5.5). This experiment illustrates the concept of optimal boundaries introduced previously. When we compute the boundaries of the original image, each spot produces several parallel boundaries because of the strong blur. With the definition of maximality we adopted, we select exactly one boundary for each spot.

Segments image (Figure 5.6). As in the DNA experiment, the optimal boundaries allow us to select exactly one boundary per object (here, hand-drawn segments). In particular, the number of boundaries we find (21) is exactly the number of segments.
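To make the balance between the two terms concrete, here is a minimal discrete sketch of the piecewise-constant Mumford-Shah energy for a candidate segmentation given as a label map. The discretisation choices (region means for v, boundary length counted as 4-neighbour label changes) and the function names are illustrative assumptions, not the scheme used to produce Figure 5.3.

```python
import numpy as np

def mumford_shah_energy(u, labels, lam):
    """Discrete piecewise-constant Mumford-Shah energy of a candidate segmentation.

    u      : 2-D array, the observed image.
    labels : 2-D integer array of the same shape, one label per region.
    lam    : weight lambda of the fidelity term.

    The approximation v is the mean of u over each region, and the length of
    the discontinuity set K(v) is approximated by the number of 4-neighbour
    pixel pairs carrying different labels.
    """
    u = u.astype(float)
    v = np.zeros_like(u)
    for lab in np.unique(labels):
        mask = (labels == lab)
        v[mask] = u[mask].mean()
    fidelity = np.sum((v - u) ** 2)
    length = np.count_nonzero(labels[1:, :] != labels[:-1, :]) \
           + np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    return lam * fidelity + length
```

With this convention, a larger λ puts more weight on the fidelity term and therefore tolerates more boundaries, while a smaller λ penalizes boundary length more strongly.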

Figure 5.3: First row: left, original image; right, boundaries obtained with the Mumford-Shah model (1000 regions). Second row: edges obtained with the Canny-Deriche edge detector, for two different threshold values (2 and 15). Third row: edges (left) and boundaries (right) obtained with our model (ε = 1). Fourth row: reconstruction with the Mumford-Shah model (left) and with our model (right).

Figure 5.4: First row: original image (left) and boundaries obtained with the Mumford-Shah model with 1000 regions (right). Second row: edges (left) and boundaries (right) obtained with our method (ε = 1).

Figure 5.5: From top to bottom: 1. original image; 2. boundaries; 3. optimal boundaries.

Figure 5.6: Top left: original image. Top right: all the level lines with grey-level step equal to 5. Bottom left: boundaries. Bottom right: optimal boundaries.

Chapter 6

Some more experimentation

On all the following images, we apply exactly the same algorithms. For each image, we display: top left, the original image; top right, the maximal ε-meaningful alignments (p = 1/16 and ε = 10⁻³); bottom left, the optimal meaningful boundaries (ε = 1); and bottom right, the minimal description of the set of maximal meaningful alignments.

We recall that we define the minimal description of the set of maximal meaningful alignments of an image in the following way: we say that a point x is maximal for a segment S if x belongs to S, if the orientation at x is aligned (up to precision p) with the direction of S, and if S is the most meaningful segment (smallest number of false alarms) containing x and aligned with the orientation at x. We then display only the maximal meaningful segments that remain meaningful when only their maximal points are counted as aligned (a small computational sketch of this selection is given at the end of this introduction).

As a general comment on these experiments, the detection of maximal meaningful alignments is relevant when straight geometric structures are present in the image (for example in Figures 6.1, 6.2, 6.3 and 6.8). On the other hand, for images of faces and textures, we detect alignments but they are not always relevant, since they do not provide the right explanation of the image: most of them are due to smooth curves or smooth structures, which create tangent alignments. The minimal description is generally a good way to reduce the set of all maximal meaningful alignments without losing important information. Because of the precision p = 1/16, we find many slanted alignments, and the minimal description keeps only the best ones. The results are generally relevant (see for example Figure 6.6). It would certainly be interesting to pursue this minimal description principle further.

As expected, the optimal meaningful boundaries are relevant when the image contains homogeneous patches on a contrasted background (see for example the INRIA pattern, Figure 6.3, or the openings of the church, Figure 6.2). On the other hand, for textures (the muscle image 6.5, the straw image 6.6 or the forest image 6.7), we detect many meaningful boundaries; textured regions will certainly require a special treatment, since level lines alone do not provide a good description of textures.
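Here is the minimal sketch of the minimal description selection announced above. The segment representation and all names are hypothetical, and the meaningfulness test is written with the binomial tail used for alignments (number of tests times the probability of at least k aligned points among l independent ones); it is meant to illustrate the two steps, assigning each point to its best segment and then re-testing each segment on its maximal points only, not to reproduce the exact implementation.

```python
from math import comb

def binomial_tail(l, k, p):
    """P(S_l >= k) for S_l a binomial(l, p) variable: probability of observing
    at least k aligned points among l independent ones."""
    return sum(comb(l, i) * p**i * (1 - p)**(l - i) for i in range(k, l + 1))

def minimal_description(segments, n_tests, p=1.0 / 16, eps=1e-3):
    """Two-step selection sketched in the text.

    segments : list of objects with attributes
               - aligned : set of points whose orientation is aligned with the segment
               - length  : number of independent points l of the segment
               - nfa     : false alarm number of the segment
    n_tests  : total number of tested segments.
    """
    # Step 1: a point is 'maximal' for the most meaningful (smallest NFA)
    # segment containing it among those it is aligned with.
    best = {}
    for s in segments:
        for x in s.aligned:
            if x not in best or s.nfa < best[x].nfa:
                best[x] = s

    # Step 2: keep only the segments that are still eps-meaningful when only
    # their maximal points are counted as aligned.
    kept = []
    for s in segments:
        k_max = sum(1 for x in s.aligned if best.get(x) is s)
        if n_tests * binomial_tail(s.length, k_max, p) <= eps:
            kept.append(s)
    return kept
```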

Figure 6.1: aerial image of buildings (from the INRIA-Robotvis database).

Figure 6.2: the church of Valbonne (from the INRIA-Robotvis database).

Figure 6.3: the INRIA pattern (from the INRIA-Robotvis database).

Figure 6.4: the famous Lena image.

Figure 6.5: an image of muscle.

Figure 6.6: an image of straw.

Figure 6.7: a forest image (from the database of the Berkeley Digital Library Project).

Figure 6.8: the La Cornouaille boat.
