Incorporating stereo information within the graph kernel framework


Incorporating stereo information within the graph kernel framework
Pierre-Anthony Grenier, Luc Brun, Didier Villemin

To cite this version: Pierre-Anthony Grenier, Luc Brun, Didier Villemin. Incorporating stereo information within the graph kernel framework. 2013. <hal-00809066v2>

HAL Id: hal-00809066
https://hal.archives-ouvertes.fr/hal-00809066v2
Submitted on 7 Oct 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Incorporating stereo information within the graph kernel framework

Pierre-Anthony Grenier, Luc Brun, and Didier Villemin
GREYC UMR CNRS 6072, LCMT UMR CNRS 6507, Caen, France
{pierre-anthony.grenier,didier.villemin}@ensicaen.fr, luc.brun@greyc.ensicaen.fr

Abstract. Since molecules are often described using a graph representation, graph kernels provide an interesting framework which combines machine learning and graph theory in order to predict the properties of molecules. However, some of these properties are induced both by the covalent bond relationships between atoms and by constraints on the relative positioning of these atoms. Graph kernels based solely on the graph representation of a molecule do not encode the relative positioning of atoms and are consequently unable to accurately predict the properties connected with this relative positioning. In this report, ordered structured objects are introduced in order to incorporate spatial constraints within the graph kernel framework. Incorporating these new features within the graph kernel framework makes it possible to accurately predict stereo information, hence overcoming the previous limitation.

Keywords: Graph kernel, Chemoinformatics, Chirality.

1 Introduction

The purpose of chemoinformatics is to predict properties of molecules in order to facilitate drug design. Chemoinformatics is based on the similarity principle: two structurally similar molecules should have similar properties. One common method to predict chemical properties consists in designing a vector of descriptors from a molecule and using statistical machine learning algorithms to predict the molecule's properties. Such methods [4, 3] can use structural information, physical properties or biological activities in order to compute vectors of descriptors. However, such an approach requires either selecting a random set of predefined descriptors (before a variable selection step) or using a heuristic definition of appropriate descriptors provided by a chemical expert. In both cases, the transformation of the graph into a finite vector of features induces a loss of information.
Another approach consists in encoding a molecule by a graph and using this graph to predict properties.

Definition 1. Molecular graph

A molecular graph is a labeled graph $G = (V, E, \mu, \nu)$ representing a molecule. The unlabeled graph $(V, E)$ encodes the structure of the molecule, each node $v \in V$ encoding an atom and each edge $e = (v, w) \in E$ a bond between two atoms $v$ and $w$. $\mu$ associates to each vertex $v \in V$ a label $\mu(v)$ encoding the nature of the atom, and $\nu$ associates to each edge $e$ a type of bond $\nu(e)$ (single, double, triple or aromatic).

Several methods based on graph theory use this representation to predict properties. One approach consists in searching for subgraphs with a large difference of frequencies between a set of positive and a set of negative examples [5]. Another approach consists in encoding each class of molecules by a graph prototype and measuring the structural similarity between each prototype and an input molecule [6]. However, these methods cannot be easily combined with machine learning algorithms. This is not the case of graph kernel methods, which can be coupled to machine learning algorithms provided that the kernel is positive definite. Let $\mathcal{G}$ be the set of graphs. A positive definite kernel is a symmetric function $k : \mathcal{G} \times \mathcal{G} \to \mathbb{R}$ such that:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \, k(G_i, G_j) \geq 0$$

where $n > 0$, $G_1, \dots, G_n \in \mathcal{G}$ and $c_1, \dots, c_n \in \mathbb{R}$. Such a positive definite kernel corresponds to a scalar product between two vectors $\psi(G)$ and $\psi(G')$ in a Hilbert space.

A large family of graph kernel methods associates a bag of patterns to each graph and defines the kernel value from a measure of similarity between those bags [7-9, 1]. In [7] a graph kernel is defined as a measure of similarity between sets of walks extracted from each graph. But those walks are linear features and thus have limited expressiveness. An infinite set of tree patterns is used in [8] to define kernels. However, the similarity between two graphs is based on an implicit enumeration of their common tree patterns, which does not allow one to readily analyze the influence of a pattern on the prediction. Finally, [9] and [1] are based on an explicit enumeration of patterns.
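The positive-definiteness condition and the bag-of-patterns idea can be illustrated with a minimal sketch. It assumes a deliberately simple pattern set (atom labels and labeled bonds) that is not the treelet or graphlet enumeration of [1] or [9]; the three toy molecules and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# A toy molecular graph: (vertex labels, bond types keyed by vertex pair).
# These three molecules are illustrative inputs, not data from the paper.
METHANOL = ({0: "C", 1: "O"}, {(0, 1): 1})
ETHANOL = ({0: "C", 1: "C", 2: "O"}, {(0, 1): 1, (1, 2): 1})
ETHENE = ({0: "C", 1: "C"}, {(0, 1): 2})


def bag_of_patterns(graph):
    """Count simple patterns: single atoms and labeled bonds (an assumption)."""
    labels, bonds = graph
    bag = Counter(labels.values())
    for (v, w), bond_type in bonds.items():
        a, b = sorted((labels[v], labels[w]))
        bag[(a, bond_type, b)] += 1
    return bag


def kernel(g1, g2):
    """k(G, G') = <psi(G), psi(G')>, with psi the vector of pattern counts."""
    b1, b2 = bag_of_patterns(g1), bag_of_patterns(g2)
    return sum(count * b2[pattern] for pattern, count in b1.items())


def quadratic_form(graphs, coeffs):
    """Evaluate sum_i sum_j c_i c_j k(G_i, G_j)."""
    pairs = list(zip(coeffs, graphs))
    return sum(ci * cj * kernel(gi, gj)
               for (ci, gi), (cj, gj) in product(pairs, repeat=2))


graphs = [METHANOL, ETHANOL, ETHENE]
for coeffs in [(1, 1, 1), (1, -1, 0.5), (-2, 1, 3)]:
    # Each value is a squared norm in feature space, hence non-negative.
    print(coeffs, quadratic_form(graphs, coeffs))
```

Because this kernel is an explicit inner product of count vectors, the quadratic form is a squared norm and cannot be negative, which is exactly the condition required above.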
In [9], a predefined set of unlabeled subgraphs, called graphlets, is enumerated for each graph, and in [1] all subtrees of a labeled graph up to size 6, called treelets, are enumerated. One advantage of [1] is that, unlike [9], the labels of the graph are taken into account by the graph similarity measure.

However, some molecules may have the same molecular formula and the same molecular graph but a different relative positioning of their atoms. Such molecules are said to be stereoisomers. Different stereoisomers may be associated with different properties. However, usual graph kernels based on the molecular graph representation are not able to capture any dissimilarity between these molecules. From a more local point of view, an atom or two connected atoms are called stereocenters if a permutation of the positions of two atoms belonging to the union of their neighborhoods produces a different stereoisomer.

In order to get an intuition of stereoisomerism, let us consider an acyclic molecular graph rooted on an atom of carbon with four neighbors, each neighbor being associated with a different subtree. Such an atom, called an asymmetric carbon, is a stereocenter and has two different spatial configurations of its neighbors encoded by the same molecular graph (Figure 1(a)). Using the molecule represented in Figure 1(a), one configuration corresponds to the case where the three atoms (…, F, Br), considered from the fourth neighbor, are encountered in this order when turning counter-clockwise around the central carbon atom. The alternative stereoisomer corresponds to the case where this sequence of atoms is encountered clockwise when considered from the same position. This example corresponds to a particular form of stereoisomerism, called chirality, where the molecule has neither a center nor a plane of symmetry. In this case, molecules are said to be chiral.

[Figure 1. Two types of stereocenters. (a) Asymmetric carbon. (b) Carbons connected by a double bond.]

Two carbons connected by a double bond can also define stereoisomers (Figure 1(b)). Indeed, on the left side of Figure 1(b) both hydrogen atoms are located on the same side of the double bond, while they are located on opposite sides on the stereoisomer represented on the right. In this case both carbon atoms of the double bond correspond to a stereocenter. This example corresponds to another form of stereoisomerism, called geometric isomerism, where stereoisomers have at least one center or one plane of symmetry.

To distinguish those configurations, we introduce the two following subsets of the set of vertices $V$ of a molecular graph:

Definition 2. Potential Asymmetric Carbons
Let us denote by $V_{PAC}$ the subset of $V$ containing all vertices encoding atoms of carbon with four neighbors:

$$V_{PAC} = \{ v \in V \mid \mu(v) = C \text{ and } |V(v)| = 4 \}$$

where $V(v)$ denotes the neighborhood of $v$. Since being an atom with four neighbors is a necessary condition to define an asymmetric carbon, the set $V_{PAC}$ contains all vertices which may encode such atoms.

Definition 3. Set of double-bonds connecting carbon atoms

The subset of $V$ containing all atoms of carbon which share a double bond with another carbon is denoted $V_{DB}$:

$$V_{DB} = \left\{ v \in V \;\middle|\; \exists e(v, w) \in E,\ \nu(e) = 2,\ |V(v)| = |V(w)| = 3 \text{ and } \mu(v) = \mu(w) = C \right\}$$

An atom of carbon with two double bonds must have a degree equal to two. Hence, each vertex $v$ belonging to $V_{DB}$ is incident to a single double bond, and we denote by $n_{=}(v)$ the other carbon connected by this double bond. Note that $n_{=}(v) \in V_{DB}$.

Brown et al. described in [2] a method which includes information related to the spatial configuration of atoms within the tree-pattern kernel [8]. However, this method only considers the direct neighbors of a stereocenter while, as shown by Figure 2, the difference between two subtrees of a stereocenter may not be located on the root of the subtree. In this last case, [2] considers two different stereocenters as identical and is thus unable to recover their different properties.

[Figure 2. Asymmetric carbons with identical neighborhood.]

In this paper we propose a method to incorporate the spatial configuration of atoms within a graph kernel based on a subtree enumeration [1]. This method remains valid when the spatial configuration is not encoded in the direct neighborhood of a stereocenter. In Section 2, we define a graph encoding of stereoisomers and we introduce stereo vertices as vertices encoding stereocenters. Next, in Section 3, we restrict our attention to acyclic molecules. Such a restriction allows us to efficiently characterize a stereo vertex by a rooted tree. In Section 4, we define the smallest tree characterizing a stereo vertex and use this information to design a graph kernel between molecules. Finally, we demonstrate the validity of our kernel through experiments in Section 5.

2 Encoding of stereoisomers

2.1 Ordered Structured Object

Definition 4. Structured Objects
A structured object $S$ is an object to which we can associate a unique labeled graph $G(S) = (V, E, \mu, \nu)$. A structured object can be, for example, a graph (associated with itself) or a rooted tree (associated with an acyclic graph).
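Returning to the vertex subsets of Definitions 2 and 3, the following minimal sketch computes $V_{PAC}$ and $V_{DB}$ for a molecular graph stored as two dictionaries. The data layout, the function names and the two example molecules (bromochlorofluoromethane and 1,2-dibromoethene) are assumptions made for illustration and are not taken from the paper's figures.

```python
def neighbors(bonds, v):
    """Neighborhood V(v): every vertex sharing an edge with v."""
    return {b if a == v else a for (a, b) in bonds if v in (a, b)}


def potential_asymmetric_carbons(labels, bonds):
    """V_PAC: carbon vertices with exactly four neighbors (Definition 2)."""
    return {v for v, label in labels.items()
            if label == "C" and len(neighbors(bonds, v)) == 4}


def double_bond_carbons(labels, bonds):
    """V_DB: degree-3 carbons linked to another degree-3 carbon by a double bond."""
    v_db = set()
    for (v, w), bond_type in bonds.items():
        if (bond_type == 2
                and labels[v] == labels[w] == "C"
                and len(neighbors(bonds, v)) == 3
                and len(neighbors(bonds, w)) == 3):
            v_db.update((v, w))
    return v_db


# Hypothetical example 1: a carbon bonded to H, F, Cl and Br.
chfclbr_labels = {0: "C", 1: "H", 2: "F", 3: "Cl", 4: "Br"}
chfclbr_bonds = {(0, 1): 1, (0, 2): 1, (0, 3): 1, (0, 4): 1}

# Hypothetical example 2: two carbons joined by a double bond, each with H and Br.
dbe_labels = {0: "C", 1: "C", 2: "H", 3: "Br", 4: "H", 5: "Br"}
dbe_bonds = {(0, 1): 2, (0, 2): 1, (0, 3): 1, (1, 4): 1, (1, 5): 1}

print(potential_asymmetric_carbons(chfclbr_labels, chfclbr_bonds))
print(double_bond_carbons(dbe_labels, dbe_bonds))
```

Both sets only collect candidates: membership in $V_{PAC}$ is a necessary condition for being an asymmetric carbon, not a sufficient one.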

A usual method in chemistry to encode stereoisomerism consists in encoding a relative order on the neighborhood of each vertex. In order to encode such information, we introduce the notion of order on structured objects.

Definition 5. Ordered Structured Objects
An ordered structured object $S = (\hat{S}, ord)$ is a structured object $\hat{S}$, associated to a graph $G(S) = (V, E, \mu, \nu)$, together with a function $ord$ which maps each vertex $v$ belonging to a subset $V_{ord}$ of $V$ onto an ordered list of a subset of its neighborhood $V(v)$:

$$ord : \begin{cases} V_{ord} \to V^{*} \\ v \mapsto v_1 \dots v_n \end{cases}$$

where $\{v_1, \dots, v_n\} \subseteq V(v)$ denotes a subset of the neighborhood of $v$. We denote by $|ord(v)| = n$ the length of the ordered list for any $v \in V_{ord}$. We thus have $0 < |ord(v)| \leq |V(v)|$. Note that the notation $V_{ord}$, which denotes the subset of $V$ on which the function $ord$ is defined, will be used in the remaining part of this document.

Definition 6. Set of Ordered Structured Objects
A set of ordered structured objects $\mathcal{S}$ is a set $\mathcal{S} = \{S = (\hat{S}, ord)\}$ from which we can define a set of isomorphisms $Isom(\hat{S}, \hat{S}') \subseteq Isom(G(S), G(S'))$ between any two structured objects $\hat{S}$ and $\hat{S}'$. This set of isomorphisms must respect the following properties:

1. $\forall (S_1, S_2) \in \mathcal{S}^2$, $f \in Isom(\hat{S}_1, \hat{S}_2) \Rightarrow f^{-1} \in Isom(\hat{S}_2, \hat{S}_1)$
2. $\forall (S_1, S_2, S_3) \in \mathcal{S}^3$, $f \in Isom(\hat{S}_1, \hat{S}_2)$, $g \in Isom(\hat{S}_2, \hat{S}_3) \Rightarrow g \circ f \in Isom(\hat{S}_1, \hat{S}_3)$
3. $\forall S \in \mathcal{S}$, $Isom(\hat{S}, \hat{S})$ is a group.
4. $\forall (S, S') \in \mathcal{S}^2$, $\forall f \in Isom(\hat{S}, \hat{S}')$: $f(V_{ord}) = V'_{ord}$ and $\forall v \in V_{ord}$, $|ord(v)| = |ord'(f(v))|$

The first three conditions impose that our restricted set of isomorphism relationships satisfies the usual properties of isomorphisms: the inverse and the composition of two isomorphisms are still isomorphisms (conditions 1 and 2) and, considering a set of automorphisms, the identity belongs to our valid set of isomorphisms (condition 3). The last condition imposes that the set of vertices on which the function $ord$ is defined remains stable under an isomorphism. It further imposes that an isomorphism does not modify the number of vertices on which the order relationship is defined for each vertex.

Definition 7. Isomorphism between ordered structured objects
Let us consider a set of ordered structured objects $\mathcal{S}$.
Two ordered structured objects $S = (\hat{S}, ord)$ and $S' = (\hat{S}', ord')$ are said to be isomorphic, written $S \simeq_o S'$, iff there is an isomorphism between the structured objects $\hat{S}$ and $\hat{S}'$ which is coherent with the order on the subsets of the neighborhoods:

$$S \simeq_o S' \iff \exists f \in Isom(\hat{S}, \hat{S}') \text{ s.t. } \forall v \in V_{ord} \text{ with } ord(v) = v_1 \dots v_n,\ ord'(f(v)) = f(v_1) \dots f(v_n)$$

In this case, $f$ is called an ordered isomorphism between $S$ and $S'$, and we denote by $IsomOrd(S, S') \subseteq Isom(\hat{S}, \hat{S}')$ the set of ordered isomorphisms between $S$ and $S'$.

Proposition 1. Ordered structured object isomorphism induces an equivalence relationship.

Proof. Let us consider a set of ordered structured objects $\mathcal{S}$. We have to show that the ordered structured object isomorphism relationship is reflexive, symmetric and transitive:

1. The isomorphism between ordered structured objects is reflexive.
Let $S = (\hat{S}, ord) \in \mathcal{S}$ be an ordered structured object associated to a graph $G(S) = (V, E, \mu, \nu)$, and let $f$ denote the identity function on $V$ ($f(v) = v$, $\forall v \in V$). Then $f \in Isom(\hat{S}, \hat{S})$ (Definition 6, condition 3) and:

$$\forall v \in V_{ord} \subseteq V,\quad ord(f(v)) = ord(v) = v_1 \dots v_n = f(v_1) \dots f(v_n)$$

where $\{v_1, \dots, v_n\} \subseteq V(v)$ denotes a subset of the neighborhood of $v$. Therefore, $S \simeq_o S$.

2. The isomorphism between ordered structured objects is symmetric.
Let $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$ and $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$ be two ordered structured objects, respectively associated to $G(S_a) = (V_a, E_a, \mu_a, \nu_a)$ and $G(S_b) = (V_b, E_b, \mu_b, \nu_b)$, such that $S_a \simeq_o S_b$, and let us further denote by $f$ the ordered isomorphism between $S_a$ and $S_b$. By definition, $f$ is also an isomorphism between $\hat{S}_a$ and $\hat{S}_b$; therefore there exists (Definition 6, condition 1) an isomorphism $f^{-1}$ between $\hat{S}_b$ and $\hat{S}_a$. Let us consider a vertex $v_b$ in $V_{ord_b} \subseteq V_b$ and $v_a = f^{-1}(v_b) \in V_{ord_a}$ (Definition 6, condition 4). Since $S_a \simeq_o S_b$, we have:

$$ord_a(v_a) = v_{a_1} \dots v_{a_n} \quad\text{and}\quad ord_b(v_b) = ord_b(f(v_a)) = v_{b_1} \dots v_{b_n} \quad\text{with } v_{b_i} = f(v_{a_i}),\ \forall i \in \{1, \dots, n\}$$

Hence:

$$ord_b(v_b) = v_{b_1} \dots v_{b_n} \quad\text{and}\quad ord_a(v_a) = ord_a(f^{-1}(v_b)) = v_{a_1} \dots v_{a_n} = f^{-1}(v_{b_1}) \dots f^{-1}(v_{b_n})$$

Thus $S_b \simeq_o S_a$ and $f^{-1}$ is an ordered isomorphism between $S_b$ and $S_a$.

3. The isomorphism between ordered structured objects is transitive.
Let $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$, $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$ and $S_c = (\hat{S}_c, ord_c) \in \mathcal{S}$ be three ordered structured objects, respectively associated to $G(S_a) = (V_a, E_a, \mu_a, \nu_a)$, $G(S_b) = (V_b, E_b, \mu_b, \nu_b)$ and $G(S_c) = (V_c, E_c, \mu_c, \nu_c)$, such that $S_a \simeq_o S_b$ and $S_b \simeq_o S_c$. We denote by $f$ the ordered isomorphism between $S_a$ and $S_b$, and by $g$ the ordered isomorphism between $S_b$ and $S_c$. As the isomorphism between structured objects is transitive, we have $g \circ f \in Isom(\hat{S}_a, \hat{S}_c)$ (Definition 6, condition 2). Let us consider a vertex $v_a$ in $V_{ord_a} \subseteq V_a$ with $v_b = f(v_a) \in V_{ord_b} \subseteq V_b$ and $v_c = g(v_b) = g \circ f(v_a) \in V_{ord_c} \subseteq V_c$. Since $S_a \simeq_o S_b$ and $S_b \simeq_o S_c$, we have:

$$ord_a(v_a) = v_{a_1} \dots v_{a_n} \quad\text{and}\quad ord_b(f(v_a)) = ord_b(v_b) = f(v_{a_1}) \dots f(v_{a_n}) = v_{b_1} \dots v_{b_n} \text{ (by notation)},$$
$$ord_c(g(v_b)) = g(v_{b_1}) \dots g(v_{b_n})$$

Therefore:

$$ord_c(g(v_b)) = ord_c(g \circ f(v_a)) = g \circ f(v_{a_1}) \dots g \circ f(v_{a_n})$$

Thus $S_a \simeq_o S_c$ and $g \circ f$ is an ordered isomorphism between $S_a$ and $S_c$.

In conclusion, the isomorphism between ordered structured objects is reflexive, symmetric and transitive. It is therefore an equivalence relationship.

2.2 Re-ordering function

A spatial configuration of atoms may be encoded by several equivalent orders. We thus introduce the notion of re-ordering function, which associates to each vertex of an ordered structured object a permutation on a subset of its neighborhood.

Definition 8. Re-ordering functions
Let us consider a set of ordered structured objects $\mathcal{S}$. A re-ordering function $\sigma_S$ on an ordered structured object $S = (\hat{S}, ord)$, associated to a graph $G(S) = (V, E, \mu, \nu)$, associates to each vertex $v \in V_{ord}$ a permutation $\varphi_v$ on $\{1, \dots, |ord(v)|\}$:

$$\sigma_S : \begin{cases} V_{ord} \to P \\ v \mapsto \varphi_v \in \Pi_{|ord(v)|} \end{cases}$$

where $\Pi_n$ is the group of permutations of $n$ elements and $P$ is the union of $\Pi_n$ for all $n \in \mathbb{N}$.

Application of a re-ordering function on an ordered structured object provides a new ordered structured object defined as follows:

Definition 9. Re-ordered structured objects
Let us consider a set of ordered structured objects $\mathcal{S}$. Let $S = (\hat{S}, ord)$ denote an ordered structured object. $\sigma_S(S) = (\hat{S}, ord_{\sigma_S})$ is defined as the ordered structured object obtained after applying the re-ordering function $\sigma_S$ on the order of the object:

$$\forall v \in V_{ord} \text{ s.t. } ord(v) = v_1, \dots, v_n \text{ and } \sigma_S(v) = \varphi_v:\quad ord_{\sigma_S}(v) = v_{\varphi_v(1)}, \dots, v_{\varphi_v(n)}$$
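Definitions 7 and 9 can be made concrete with a small sketch. It assumes that an order is stored as a dictionary from ordered vertices to lists of neighbors, that permutations are 0-based tuples (the paper uses 1-based indices), and that the candidate mapping `f` is already known to be an isomorphism of the underlying graphs; all names are hypothetical.

```python
def apply_reordering(ord_map, sigma):
    """Definition 9: ord_sigma(v) = v_{phi(1)}, ..., v_{phi(n)} (0-based here)."""
    return {v: [lst[i] for i in sigma[v]] for v, lst in ord_map.items()}


def is_ordered_isomorphism(f, ord_a, ord_b):
    """Definition 7, order part only: f is assumed to be a graph isomorphism.

    Checks f(V_ord) = V'_ord and ord_b(f(v)) = f(v_1) ... f(v_n).
    """
    if {f[v] for v in ord_a} != set(ord_b):
        return False
    return all(ord_b[f[v]] == [f[u] for u in lst] for v, lst in ord_a.items())


# One ordered vertex v with four neighbors, and a renamed copy of it.
ord_a = {"v": ["v1", "v2", "v3", "v4"]}
ord_b = {"u": ["u1", "u2", "u3", "u4"]}
f = {"v": "u", "v1": "u1", "v2": "u2", "v3": "u3", "v4": "u4"}

# Keep the first neighbor and rotate the three others.
sigma = {"v": (0, 3, 1, 2)}
reordered = apply_reordering(ord_a, sigma)

print(reordered)
print(is_ordered_isomorphism(f, ord_a, ord_b))
print(is_ordered_isomorphism(f, reordered, ord_b))
```

The re-ordered object keeps the same underlying graph, but `f` is no longer coherent with its order, which is why a notion of equivalent orders is needed on top of ordered isomorphism.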

Note that $\hat{S} = \widehat{\sigma_S(S)}$. In other words, a re-ordering of an ordered structured object does not change the associated structured object. Since re-ordering operations are defined as functions, they may be combined using composition operations:

Definition 10. Composition of re-ordering functions
Let us consider a set of ordered structured objects $\mathcal{S}$. Let $\sigma_S$ and $\sigma'_S$ denote two re-ordering functions on an ordered structured object $S = (\hat{S}, ord)$. The composition of $\sigma_S$ and $\sigma'_S$ is a re-ordering function denoted by $\sigma_S \circ \sigma'_S$ and defined as follows:

$$\sigma_S \circ \sigma'_S : \begin{cases} V_{ord} \to P \\ v \mapsto \sigma_S(v) \circ \sigma'_S(v) \in \Pi_{|ord(v)|} \end{cases}$$

where $\Pi_n$ is the group of permutations of $n$ elements and $P$ is the union of $\Pi_n$ for all $n \in \mathbb{N}$. The identity for the composition is the re-ordering function $Id_S$ such that $\forall v \in V_{ord}$, $Id_S(v) = Id_{|ord(v)|}$, where $Id_n$ is the identity permutation of $\Pi_n$.

2.3 Structured objects having equivalent order

The re-ordering functions previously defined may apply any re-ordering on a structured object, hence removing the notion of order on these objects. In order to obtain a useful notion of re-ordering, we have to define more precisely which properties a valid family of re-ordering functions should satisfy.

Definition 11. Valid re-orderings
Let us consider a set of ordered structured objects $\mathcal{S}$. For each $S = (\hat{S}, ord) \in \mathcal{S}$, let us denote by $\Sigma_S$ a set of re-ordering functions on $S$. A valid family of re-ordering functions is a set $\Sigma = \{\Sigma_S, S \in \mathcal{S}\}$ which satisfies the two following properties:

- For any $S \in \mathcal{S}$, $\Sigma_S$ is a group for the composition.
- For any two ordered structured objects $S = (\hat{S}, ord)$ and $S' = (\hat{S}', ord')$ whose associated un-ordered structured objects are isomorphic by a function $f$, any re-ordering function $\sigma \in \Sigma_S$ is equal, up to the isomorphism $f^{-1}$, to a re-ordering function of $\Sigma_{S'}$:

$$\forall f \in Isom(\hat{S}, \hat{S}'),\ \forall \sigma \in \Sigma_S,\quad \sigma \circ f^{-1} \in \Sigma_{S'}$$

The first constraint of Definition 11 states that the re-ordering functions of an ordered structured object may be combined freely using composition operations.
The second constraint implies that two ordered structured objects with isomorphic un-ordered structured objects should have, up to the isomorphism function, equivalent sets of re-ordering functions. Note that this last constraint is equivalent to the following equation:

$$\forall f \in Isom(\hat{S}, \hat{S}'),\ \forall \sigma \in \Sigma_S,\ \exists \sigma' \in \Sigma_{S'} \text{ s.t. } \sigma' \circ f = \sigma$$
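The composition of Definition 10 and the group structure required by Definition 11 can be sketched as follows. The sketch assumes 0-based permutation tuples and a dictionary representation of re-ordering functions; the helper names are hypothetical.

```python
def apply_reordering(ord_map, sigma):
    """ord_sigma(v)[k] = ord(v)[sigma(v)[k]] (0-based version of Definition 9)."""
    return {v: [lst[i] for i in sigma[v]] for v, lst in ord_map.items()}


def compose(sigma, sigma_prime):
    """Definition 10: (sigma o sigma')(v) = sigma(v) o sigma'(v), vertex by vertex."""
    return {v: tuple(sigma[v][i] for i in sigma_prime[v]) for v in sigma}


def identity(ord_map):
    """Id_S: the identity permutation on every ordered vertex."""
    return {v: tuple(range(len(lst))) for v, lst in ord_map.items()}


def inverse(sigma):
    """Vertex-wise inverse permutation."""
    result = {}
    for v, perm in sigma.items():
        inv = [0] * len(perm)
        for position, image in enumerate(perm):
            inv[image] = position
        result[v] = tuple(inv)
    return result


ord_map = {"v": ["v1", "v2", "v3", "v4"]}
s = {"v": (0, 3, 1, 2)}
t = {"v": (3, 0, 2, 1)}

# Applying s and then t gives the same order as applying the composition s o t.
print(apply_reordering(apply_reordering(ord_map, s), t))
print(apply_reordering(ord_map, compose(s, t)))
print(compose(s, inverse(s)) == identity(ord_map))
```

With this indexing convention, applying $\sigma$ and then $\sigma'$ to an order yields the order of $\sigma \circ \sigma'$, so closure of $\Sigma_S$ under composition means that chaining valid re-orderings never leaves the valid set.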

Proposition 2. Let us consider a set of ordered structured objects $\mathcal{S}$ and a valid family of re-ordering functions $\Sigma$. The group of re-ordering functions $\Sigma_{\sigma(S)}$ of any re-ordered structured object $\sigma(S)$ is equal to the group $\Sigma_S$ of re-ordering functions of $S$:

$$\forall S \in \mathcal{S},\ \forall \sigma \in \Sigma_S,\quad \Sigma_{\sigma(S)} = \Sigma_S$$

Proof. Let us consider $\sigma \in \Sigma_S$. Since $S$ and $\sigma(S)$ only differ by the order defined on each vertex, the identity function $Id$ is a valid isomorphism between the un-ordered structured objects associated to $\sigma(S)$ and $S$. Then, using Definition 11, for any $\sigma' \in \Sigma_{\sigma(S)}$, there exists a re-ordering function $\sigma'' \in \Sigma_S$ such that:

$$\forall v \in V_{ord},\quad \sigma'(v) = \sigma''(Id(v)) = \sigma''(v)$$

We thus have $\sigma' = \sigma''$ and hence $\Sigma_{\sigma(S)} \subseteq \Sigma_S$. The reverse inclusion is shown in the same way.

Definition 12. Equivalent orders
Let us consider a set of ordered structured objects $\mathcal{S}$ and two of its ordered structured objects $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$ and $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$. These structured objects are said to be equivalent, written $S_a \simeq_\Sigma S_b$, according to a valid family of re-ordering functions $\Sigma$ if:

$$\exists \sigma \in \Sigma_{S_a},\quad \sigma(S_a) \simeq_o S_b \qquad (1)$$

In other words, we consider that two ordered structured objects are equivalent if, up to a valid re-ordering $\sigma$, we can establish an ordered structured object isomorphism $f$ between them. In that case the ordered isomorphism $f$ is called an equivalent ordered isomorphism through $\sigma$ between $S_a$ and $S_b$, and we denote by $IsomEqOrd_\sigma(S_a, S_b)$ the set of equivalent ordered isomorphisms through $\sigma$ between $S_a$ and $S_b$. We further denote by $IsomEqOrd(S_a, S_b)$ the union of all $IsomEqOrd_\sigma(S_a, S_b)$ for all $\sigma \in \Sigma_{S_a}$:

$$IsomEqOrd(S_a, S_b) = \bigcup_{\sigma \in \Sigma_{S_a}} IsomEqOrd_\sigma(S_a, S_b)$$

We will now prove that the equivalent order relationship is, as suggested by its name, an equivalence relationship.

Proposition 3. Let $\mathcal{S}$ be a set of ordered structured objects and let $\Sigma$ denote a valid family of re-ordering functions. The equivalent order relationship based on this family is reflexive:

$$\forall S \in \mathcal{S},\quad S \simeq_\Sigma S$$

Proof. Let $\Sigma$ denote a valid family of re-ordering functions and $S = (\hat{S}, ord)$ an ordered structured object.

By Definition 11, $\Sigma_S$ is a group and therefore $Id_S \in \Sigma_S$. By definition of $Id_S$ (Definition 10), we have: $\forall v \in V_{ord}$, $ord_{Id_S}(v) = ord(v)$. By Definition 7, $Id_S(S) \simeq_o S$ with $Id_S \in \Sigma_S$. Thus $S \simeq_\Sigma S$.

Lemma 1. Let $\mathcal{S}$ be a set of ordered structured objects and $\Sigma$ a valid family of re-ordering functions. Let us consider two ordered structured objects $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$ and $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$ such that $S_a \simeq_o S_b$. Let $\sigma_a \in \Sigma_{S_a}$ and $\sigma_b \in \Sigma_{S_b}$ be two re-ordering functions such that:

$$\forall v \in V_{ord_a},\quad \sigma_a(v) = \sigma_b(f(v)),$$

where $f$ is an ordered isomorphism between $S_a$ and $S_b$. Then we have $\sigma_a(S_a) \simeq_o \sigma_b(S_b)$.

Proof. Let us consider $f \in IsomOrd(S_a, S_b)$ and a vertex $v \in V_{ord_a}$ with $u = f(v)$. By Definition 7, we have:

$$\forall v \in V_{ord_a}:\quad ord_a(v) = v_1 \dots v_n \quad\text{and}\quad ord_b(u) = u_1 \dots u_n, \quad\text{with } u_i = f(v_i),\ \forall i \in \{1, \dots, n\}$$

Let us further denote by $\varphi_v$ the permutation defined on vertex $v$ both by $\sigma_a$ and $\sigma_b$: $\varphi_v = \sigma_a(v) = \sigma_b(f(v))$. Given the re-ordered structured objects $\sigma_a(S_a) = (\hat{S}_a, ord_{\sigma_a})$ and $\sigma_b(S_b) = (\hat{S}_b, ord_{\sigma_b})$, we have by Definition 9:

$$ord_{\sigma_a}(v) = v_{\varphi_v(1)} \dots v_{\varphi_v(n)} \quad\text{and}\quad ord_{\sigma_b}(u) = u_{\varphi_v(1)} \dots u_{\varphi_v(n)}.$$

Since the sequences $v_1 \dots v_n$ and $u_1 \dots u_n$ satisfy $u_i = f(v_i)$, and since the same permutation $\varphi_v$ is applied on both sequences, we have:

$$\forall i \in \{1, \dots, n\},\quad u_{\varphi_v(i)} = f(v_{\varphi_v(i)})$$

The isomorphism $f$ thus maps the order encoded by $\sigma_a(S_a)$ around each vertex of $S_a$ onto the order defined by $\sigma_b(S_b)$ on the corresponding vertex of $S_b$. Moreover, since $f$ is an isomorphism between ordered structured objects, it also corresponds to an isomorphism between un-ordered structured objects, and we have by Definition 7, $\sigma_a(S_a) \simeq_o \sigma_b(S_b)$.

Proposition 4. Let $\mathcal{S}$ be a set of ordered structured objects and $\Sigma$ a valid family of re-ordering functions. The equivalent order relationship based on this family is symmetric:

$$\forall (S_a, S_b) \in \mathcal{S}^2,\quad S_a \simeq_\Sigma S_b \iff S_b \simeq_\Sigma S_a.$$

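Proof. Let us consider a set of ordered structured objects $\mathcal{S}$, a valid family of re-ordering functions $\Sigma$ and two ordered structured objects $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$ and $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$ such that $S_a \simeq_\Sigma S_b$. By Definition 12, we have:

$$\exists \sigma \in \Sigma_{S_a} \text{ s.t. } \sigma(S_a) \simeq_o S_b.$$

As $\Sigma_{S_a}$ is a group (Definition 11), there exists a re-ordering function $\sigma^{-1} \in \Sigma_{S_a}$ such that $\sigma^{-1}(\sigma(S_a)) = S_a$. Let us denote by $f$ the ordered isomorphism between $\sigma(S_a)$ and $S_b$. By Definition 11, since $\sigma(S_a) \simeq_o S_b$, the re-ordering function $\sigma^{-1} \in \Sigma_{S_a}$ should be equivalent to some re-ordering function $(\sigma^{-1})'$ in $\Sigma_{S_b}$. In other words:

$$\exists (\sigma^{-1})' \in \Sigma_{S_b} \text{ such that } \forall v \in V_{ord_a},\ \sigma^{-1}(v) = (\sigma^{-1})'(f(v)).$$

We thus have by Lemma 1: $\sigma^{-1}(\sigma(S_a)) \simeq_o (\sigma^{-1})'(S_b)$. Therefore $S_a \simeq_o (\sigma^{-1})'(S_b)$ and, by symmetry of the ordered isomorphism, $(\sigma^{-1})'(S_b) \simeq_o S_a$. So by Definition 12, $S_b \simeq_\Sigma S_a$. Both cases being symmetric, the reverse implication is proved in the same way.

Proposition 5. Let $\mathcal{S}$ be a set of ordered structured objects and $\Sigma$ a valid family of re-ordering functions. The equivalent order relationship based on this family is transitive:

$$\forall (S_a, S_b, S_c) \in \mathcal{S}^3,\quad \left( S_a \simeq_\Sigma S_b \text{ and } S_b \simeq_\Sigma S_c \right) \Rightarrow S_a \simeq_\Sigma S_c$$

Proof. Let us consider a set of ordered structured objects $\mathcal{S}$, a valid family of re-ordering functions $\Sigma$ and three ordered structured objects $S_a = (\hat{S}_a, ord_a) \in \mathcal{S}$, $S_b = (\hat{S}_b, ord_b) \in \mathcal{S}$ and $S_c = (\hat{S}_c, ord_c) \in \mathcal{S}$ such that $S_a \simeq_\Sigma S_b$ and $S_b \simeq_\Sigma S_c$. Using Definition 12, there exist two re-ordering functions $\sigma_a \in \Sigma_{S_a}$ and $\sigma'_b \in \Sigma_{S_b}$ such that $\sigma_a(S_a) \simeq_o S_b$ and $\sigma'_b(S_b) \simeq_o S_c$. Let us denote by $f_{ba}$ the isomorphism between the un-ordered structured objects $\hat{S}_b$ and $\widehat{\sigma_a(S_a)}$. Since $S_b$ and $\sigma_a(S_a)$ are isomorphic and since $\sigma'_b \in \Sigma_{S_b}$, there must exist (by Definition 11) a re-ordering function $\sigma'_a \in \Sigma_{\sigma_a(S_a)}$ such that:

$$\forall v \in V_{ord_b},\quad \sigma'_b(v) = \sigma'_a(f_{ba}(v)).$$

Then by Lemma 1, we have $\sigma'_b(S_b) \simeq_o \sigma'_a(\sigma_a(S_a))$. Since the ordered isomorphism is an equivalence relationship (Proposition 1) and $\sigma'_b(S_b) \simeq_o S_c$, we have $\sigma'_a(\sigma_a(S_a)) \simeq_o S_c$. Since $\Sigma_{S_a}$ is a group (Definition 11) and since $\sigma_a \in \Sigma_{S_a}$ and $\sigma'_a \in \Sigma_{\sigma_a(S_a)} = \Sigma_{S_a}$ (Proposition 2), we have: $\sigma'_a \circ \sigma_a \in \Sigma_{S_a}$.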
So by Definition 12, $S_a \simeq_\Sigma S_c$.

Theorem 1. Let $\mathcal{S}$ be a set of ordered structured objects and $\Sigma$ a valid family of re-ordering functions. The equivalent order relationship based on this family is an equivalence relationship.

Proof. The equivalent order relationship is reflexive (Proposition 3), symmetric (Proposition 4) and transitive (Proposition 5).

2.4 Ordered graphs

We now apply the definition of ordered structured objects to labeled graphs, in order to define ordered graphs.

Definition 13. Set of Ordered Graphs
An ordered graph $S = (G = (V, E, \mu, \nu), ord)$ is an ordered structured object with $G(S) = G$ and a function $ord$ which maps each vertex of $V_{ord} \subseteq V$ to an ordered list of its neighbors:

$$ord : \begin{cases} V_{ord} \to V^{*} \\ v \mapsto v_1 \dots v_n \end{cases}$$

where $V(v) = \{v_1, \dots, v_n\}$ denotes the neighborhood of $v$. The set of ordered graphs is denoted $OG$. The set of isomorphisms defined between two un-ordered graphs (Definition 6) is the usual set of isomorphisms between labeled graphs.

As the set of ordered graphs is a set of structured objects, we can define re-ordering functions for ordered graphs (Definition 8). The definition of orders and re-ordering functions depends on the application at hand. Nevertheless, if the re-orderings fulfill the definition of a valid family of re-ordering functions (Definition 11), we can define the equivalence relationship between ordered graphs given by Definition 12 (Theorem 1). Let us now define a stereo vertex, which encodes a stereocenter when ordered graphs represent molecules.

Definition 14. Stereo vertices
Let $\Sigma$ be a valid family of re-ordering functions on $OG$. Let $G = (V, E, \mu, \nu, ord)$ be an ordered graph. A vertex $v \in V_{ord} \subseteq V$ of degree $n$ is called a stereo vertex iff:

$$\forall (i, j) \in \{1, \dots, n\}^2 \text{ with } i \neq j,\ \nexists f \in IsomEqOrd(G, \tau_{i,j}(G)) \text{ with } f(v) = v.$$

where $\tau_{i,j}$ is a re-ordering function equal to the identity on all vertices except $v$, for which it permutes the vertices of index $i$ and $j$ in $ord(v)$. In other words, a vertex is a stereo vertex if any permutation of two of its neighbors produces an ordered graph with a non-equivalent order.

2.5 Ordered graphs and re-ordering functions encoding of a molecule

We now restrict our attention to molecular graphs (Definition 1) and define from them molecular ordered graphs. The molecular ordered graph of a molecule is built from its molecular graph (Definition 1) $G = (V, E, \mu, \nu)$, which encodes relationships between atoms together with the type of atom and bond respectively associated to each vertex and each edge.

Definition 15. Molecular ordered graph. A molecular ordered graph is a couple $S = (G, ord)$ where $G$ corresponds to a molecular graph. The function $ord$ is defined on a set $V_{ord}$ defined as:

$V_{ord} = V_{PAC} \cup V_{DB}$

where $V_{PAC}$ and $V_{DB}$ denote respectively the set of potential asymmetric carbons (Definition 2) and the set of carbons of degree 3 connected by a double bond (Definition 3). The function $ord$ is defined as follows for each vertex $v \in V_{ord}$:

If $v \in V_{PAC}$: We set randomly one of its neighbors $v_1$ at the first position. The three other neighbors of $v$ are ordered such that if we look at $v$ from $v_1$, the three remaining neighbors are ordered clockwise (Section 1). One of the three orders (defined up to circular permutations) fulfilling this condition is chosen randomly (Figure 3(a)).

If $v \in V_{DB}$: Let us consider $w = n_=(v)$ and the two neighborhoods $V(v) = \{w, a, b\}$ and $V(w) = \{v, c, d\}$. The order on the neighborhood of $v$ is set as $ord(v) = w, a, b$ and the order on the neighborhood of $w$ is set as $ord(w) = v, c, d$, whereby $a, b, c, d$ are traversed clockwise when turning around the double bond for a given plane embedding (Figure 3(b)).

We denote by OM the set of ordered molecular graphs.

Definition 16. Set of molecular re-ordering functions. We define for each molecular ordered graph $S$ a set of re-ordering functions $\Sigma^M_S$. $\Sigma^M_S$ contains all the re-ordering functions $\sigma$ such that:

For each $v$ in $V_{PAC}$, $\sigma(v)$ is an even permutation: $\forall v \in V_{PAC},\ \epsilon(\sigma(v)) = 1$.

For each $v$ in $V_{DB}$, $\sigma(v)$ and $\sigma(n_=(v))$ have the same parity: $\forall v \in V_{DB},\ \epsilon(\sigma(v)) = \epsilon(\sigma(w))$ with $w = n_=(v)$.

where $\epsilon$ denotes the signature of a permutation.
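The two parity conditions of Definition 16 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the names `signature` and `in_molecular_family`, and the representation of a re-ordering function as a dictionary mapping each vertex to a tuple of 0-based positions, are assumptions made for illustration.

```python
def signature(perm):
    """Signature (+1 even, -1 odd) of a permutation given as a tuple of 0-based images."""
    seen = [False] * len(perm)
    sign = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start`; a cycle of even length is an odd permutation.
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign


def in_molecular_family(sigma, v_pac, double_bonds):
    """Check the two parity conditions of Definition 16 (hypothetical helper).

    sigma: dict vertex -> permutation of the positions of ord(vertex)
    v_pac: vertices that are potential asymmetric carbons
    double_bonds: pairs (v, w) with w = n_=(v)
    """
    if any(signature(sigma[v]) != 1 for v in v_pac):
        return False
    return all(signature(sigma[v]) == signature(sigma[w]) for v, w in double_bonds)
```

For instance, the re-ordering $v_1, v_4, v_2, v_3$ of Figure 3(a) is encoded here as `(0, 3, 1, 2)`, a 3-cycle and therefore even, whereas exchanging only the first two neighbors, `(1, 0, 2, 3)`, is odd and is rejected for a vertex of $V_{PAC}$.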

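[Figure 3: (a) Element of $V_{PAC}$: $ord(v) = v_1, v_2, v_3, v_4$; $ord_\sigma(v) = v_1, v_4, v_2, v_3$; $ord_{\sigma'}(v) = v_4, v_1, v_3, v_2$; ... (b) Two elements of $V_{DB}$: $ord(v) = w, a, b$ and $ord(w) = v, c, d$; $ord_\sigma(v) = w, b, a$ and $ord_\sigma(w) = v, d, c$; $ord_{\sigma'}(v) = a, b, w$ and $ord_{\sigma'}(w) = v, c, d$.]

Fig. 3. Example of elements of $V_{PAC}$ and of $V_{DB}$ with their ordered list (top) and the ordered lists obtained using two permutations $\sigma \in \Sigma^M_S$ and $\sigma' \in \Sigma^M_S$.

Proposition 6. For any molecular ordered graph $S$, $\Sigma^M_S$ is a group for the composition.

Proof. We have to show that $\Sigma^M_S$ admits an identity element, is closed under composition and admits for each re-ordering function an inverse element.

1. $\Sigma^M_S$ admits an identity element $Id_S$. Let us consider the re-ordering function $Id_S$ such that: $\forall v \in V_{ord},\ Id_S(v) = Id_{|V(v)|}$, where $Id_n$ is the identity permutation of $\Pi_n$. Since the identity is an even permutation we have:

$\forall v \in V_{PAC},\ \epsilon(Id_S(v)) = 1$
$\forall v \in V_{DB},\ \epsilon(Id_S(v)) = \epsilon(Id_S(n_=(v))) = 1$

Thus by definition of $\Sigma^M_S$, $Id_S \in \Sigma^M_S$.

2. $\forall (\sigma, \sigma') \in (\Sigma^M_S)^2,\ \sigma \circ \sigma' \in \Sigma^M_S$. Let $\sigma$ and $\sigma'$ denote two re-ordering functions of $\Sigma^M_S$.

If $v \in V_{PAC}$: Since $\epsilon$ is a morphism between $\Pi_{|ord(v)|}$ and $(\{-1, 1\}, \times)$ we have: $\forall v \in V_{PAC},\ \epsilon(\sigma(v) \circ \sigma'(v)) = \epsilon(\sigma(v))\,\epsilon(\sigma'(v)) = 1 \cdot 1 = 1$.

If $v \in V_{DB}$: Let us consider $w = n_=(v)$. Since $\sigma(v)$ and $\sigma(w)$ on one hand and $\sigma'(v)$ and $\sigma'(w)$ on the other hand have the same signature, we have: $\forall v \in V_{DB},\ \epsilon(\sigma(v) \circ \sigma'(v)) = \epsilon(\sigma(v))\,\epsilon(\sigma'(v)) = \epsilon(\sigma(w))\,\epsilon(\sigma'(w)) = \epsilon(\sigma(w) \circ \sigma'(w))$. Permutations $\sigma(v) \circ \sigma'(v)$ and $\sigma(w) \circ \sigma'(w)$ have thus the same parity.

Thus by definition of $\Sigma^M_S$, $\sigma \circ \sigma' \in \Sigma^M_S$.

3. $\forall \sigma \in \Sigma^M_S,\ \exists \sigma^{-1} \in \Sigma^M_S$ such that $\sigma \circ \sigma^{-1} = Id_S$, where $Id_S$ is the identity element of $\Sigma^M_S$. Let us consider $\sigma^{-1}$ such that: $\forall v \in V_{ord},\ \sigma^{-1}(v) = (\sigma(v))^{-1}$. We have by Definition 10, $\sigma \circ \sigma^{-1} = Id_S$. We have to prove that, for each $\sigma \in \Sigma^M_S$, $\sigma^{-1} \in \Sigma^M_S$.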

If $v \in V_{PAC}$: $\epsilon(\sigma^{-1}(v)) = \epsilon(\sigma(v)^{-1}) = \epsilon(\sigma(v)) = 1$. Thus $\sigma^{-1}(v)$ is even.

If $v \in V_{DB}$: Let us consider $w = n_=(v)$. Since $\sigma(v)$ and $\sigma(w)$ have the same parity and

$\epsilon(\sigma^{-1}(v)) = \epsilon(\sigma(v))$
$\epsilon(\sigma^{-1}(w)) = \epsilon(\sigma(w))$,

$\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ also have the same parity.

Thus by definition of $\Sigma^M_S$, $\sigma^{-1} \in \Sigma^M_S$.

As the composition of functions is associative and $\Sigma^M_S$ admits an identity element for the composition, is closed under composition, and admits for each re-ordering function an inverse element, we can conclude that $\Sigma^M_S$ is a group for the composition.

Proposition 7. For any two molecular ordered graphs $S$ and $S'$ whose associated un-ordered graphs are isomorphic by a function $f$, and for any re-ordering function $\sigma \in \Sigma^M_S$, we have $\sigma \circ f^{-1} \in \Sigma^M_{S'}$:

$\forall f \in Isom(\hat{S}, \hat{S}'),\ \forall \sigma \in \Sigma^M_S,\quad \sigma' = \sigma \circ f^{-1} \in \Sigma^M_{S'}$.

Proof. Let us consider two molecular ordered graphs $S$ and $S'$ whose associated un-ordered graphs $G$ and $G'$ are isomorphic by a function $f$. We denote by $f^{-1}$ the isomorphism between $G'$ and $G$ and define the re-ordering function $\sigma'$ as $\sigma \circ f^{-1}$. Let us prove that $\sigma' \in \Sigma^M_{S'}$.

If $v' \in V_{PAC}$: We have by definition of $\sigma'$, $\sigma'(v') = \sigma(v)$ with $v = f^{-1}(v')$. As $v' \in V_{PAC}$ we have $\mu(v') = C$ and $|V(v')| = 4$. Since $f^{-1}$ is an isomorphism between $G'$ and $G$ we have:

$\mu(v) = \mu(f^{-1}(v')) = \mu(v') = C$
$|V(v)| = |V(f^{-1}(v'))| = |V(v')| = 4$

Thus by definition of $V_{PAC}$, $v \in V_{PAC}$. Moreover, since $\sigma'(v') = \sigma(v)$, $\sigma'(v')$ has the same even parity as $\sigma(v)$.

If $v' \in V_{DB}$: Given $w' = n_=(v')$, we have by definition of $\sigma'$:

$\sigma'(v') = \sigma(v)$ with $v = f^{-1}(v')$
$\sigma'(w') = \sigma(w)$ with $w = f^{-1}(w')$

As $v'$ and $w' \in V_{DB}$ we have $\mu(v') = \mu(w') = C$ and $|V(v')| = |V(w')| = 3$. Since $f^{-1}$ is an isomorphism between $G'$ and $G$ we have:

$\mu(v) = \mu(f^{-1}(v')) = \mu(v') = C$ and $\mu(w) = \mu(f^{-1}(w')) = \mu(w') = C$
$|V(v)| = |V(f^{-1}(v'))| = |V(v')| = 3$ and $|V(w)| = |V(f^{-1}(w'))| = |V(w')| = 3$

As $v' \in V_{DB}$, there exists an edge $e'(v', w')$ connecting $v'$ and $w'$ with a label $\nu(e') = 2$. Such an edge is preserved by the isomorphism $f^{-1}$ between $G'$ and $G$, and there thus exists an edge $e(v, w)$ in $G$ between $v = f^{-1}(v')$ and $w = f^{-1}(w')$ with $\nu(e) = 2$. We have thus by definition of $V_{DB}$, $\{v, w\} \subseteq V_{DB}$, and since $\sigma \in \Sigma^M_S$ we have by definition of $\Sigma^M_S$:

$\epsilon(\sigma(v)) = \epsilon(\sigma(w))$

Therefore, by definition of $\sigma'$:

$\epsilon(\sigma'(v')) = \epsilon(\sigma(v)) = \epsilon(\sigma(w)) = \epsilon(\sigma'(w'))$

Permutations $\sigma'(v')$ and $\sigma'(w')$ have thus the same parity.

Thus $\sigma' = \sigma \circ f^{-1} \in \Sigma^M_{S'}$.

Theorem 2. The set of molecular re-ordering functions $\Sigma^M = \{\Sigma^M_S,\ S \in OM\}$ is a valid family of re-ordering functions. Therefore the equivalent order relationship based on this family is an equivalence relationship.

Proof. The set of molecular re-ordering functions $\Sigma^M$ is a valid family of re-ordering functions by Propositions 6 and 7. Thus by Theorem 1, the equivalent order relationship based on this family is an equivalence relationship.

Remark 1. Let $S = (G, ord)$ with $G = (V, E, \mu, \nu)$ denote a molecular ordered graph. We have, by construction of the re-ordering functions, the following property: if we select for each vertex $v \in V_{ord}$ a neighbor $n_v$, we can always find a re-ordering function $\sigma$ of $\Sigma^M_S$ such that the ordered list of each vertex $v$ of $V_{ord}$ in $\sigma(S)$ starts with its selected neighbor $n_v$.

With the orders defined in this section, we encode the spatial configuration of the atoms within the neighborhood of each vertex of a molecular graph. Our equivalence relationship between molecular ordered graphs allows us to check whether two molecules have the same spatial configuration. Stereocenters are defined as stereo vertices (Definition 14).
Indeed, a vertex is a stereo vertex if any transposition of two of its neighbors produces an ordered graph with a non-equivalent order, which is called a different stereoisomer within the chemistry framework.

3 Equivalence order relationship between ordered trees

Let us now restrict our attention to acyclic graphs in order to obtain an efficient way to determine if two molecular graphs have equivalent orders. Given a rooted tree, the father of each node $v$ is denoted by $p_v$. The tree itself is denoted by $\hat{T} = (r, G)$ where $r$ denotes the root of the tree and $G = (V, E, \mu, \nu)$ the acyclic graph associated to $\hat{T}$.

Definition 17. Ordered rooted tree. An ordered rooted tree $T = (\hat{T}, ord)$ with $\hat{T} = (r, G)$ is an ordered structured object. Its associated acyclic labeled graph is $G = (V, E, \mu, \nu)$. The function $ord$ maps each internal vertex to an ordered list of its children:

$ord \colon V_{ord} \to V^*,\quad v \mapsto v_1 \dots v_n$

where $\{v_1, \dots, v_n\}$ denotes the children of $v$. We denote by OT the set of ordered trees.

Note that the function $ord$ is defined on all internal vertices of the tree. We have thus:

$\forall T = (\hat{T}, ord) \in OT$ with $\hat{T} = (r, G)$ and $G = (V, E, \mu, \nu)$: $V_{ord} = V - Leaf(T)$

where $Leaf(T)$ denotes the set of leaves of $T$. Moreover, the order relationship of each vertex of an ordered rooted tree is defined on all its children, i.e. all its neighbors but its parent. We have thus:

$\forall T = (\hat{T}, ord) \in OT$ with $\hat{T} = (r, G)$: $|ord(r)| = |V(r)|$ and $\forall v \in V_{ord},\ v \neq r,\ |ord(v)| = |V(v)| - 1$

An isomorphism between rooted trees may be considered as an isomorphism between graphs which maps the roots of both trees onto each other. Hence the set of isomorphisms $Isom(\hat{T}, \hat{T}')$ between two rooted trees may be considered as a subset of the isomorphisms between the associated acyclic graphs, $Isom(\hat{G}, \hat{G}')$ (Definition 6). Given these isomorphisms we define for OT ordered isomorphisms between ordered rooted trees (Definition 7). Such isomorphisms preserve both the structure of the two trees and their orderings. The order defined on each vertex of the trees belonging to OT depends on the considered application. The valid family of re-ordering functions (Definition 11) which may be defined on OT also depends on this application.
Given both orders and a valid family of re-ordering functions, we can build an equivalence relationship (Definition 12 and Theorem 1) between ordered trees encoding the fact that, up to re-orderings, two rooted trees are structurally similar and have the same order.

Following [10], we associate to each ordered rooted tree T a unique depth-first string encoding DFSE(T). This string is based on the sequence of node and edge labels obtained by traversing the tree in a depth-first order, and uses respectively $ and # to represent backtracks and the end of the string encoding.
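As an illustration of such an encoding, here is a minimal sketch. The tree representation, a pair (label, list of (edge label, subtree)), and the exact string layout are assumptions made for this example; in particular, this sketch writes every backtrack, including the trailing ones before the end marker, which may differ from the layout used in [10].

```python
def dfse(tree):
    """Depth-first string encoding of an ordered rooted tree (illustrative layout).

    tree: (label, [(edge_label, subtree), ...]) with children listed in the
    order given by the function ord.
    """
    def body(node):
        label, children = node
        parts = [label]
        for edge_label, child in children:
            parts.append(edge_label)   # label of the edge going down
            parts.append(body(child))  # encoding of the child subtree
            parts.append("$")          # backtrack to the current node
        return "".join(parts)

    return body(tree) + "#"  # end-of-string marker


# O -1- C =2= C, the last carbon carrying N and Br (in that order).
inner = ("C", [("1", ("N", [])), ("1", ("Br", []))])
tree = ("O", [("1", ("C", [("2", inner)]))])
print(dfse(tree))  # prints O1C2C1N$1Br$$$#
```

Exchanging N and Br in the ordered list of the last carbon yields a different string, which is what makes the encoding sensitive to the order.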

Definition 18. Depth-First String Encoding. The depth-first string encoding of an ordered rooted tree $T = (\hat{T}, ord)$, denoted by $DFSE(T)$, is defined as the sequence of node and edge labels encountered when traversing $T$ using a depth-first method based on the order defined by the function $ord$. Each backtrack during this traversal is encoded by the symbol \$ while the end of the string is encoded by the symbol \#.

Remark 2. As shown by [10] (Lemma 2.2), two isomorphic ordered trees have the same depth-first string encoding and conversely:

$T_1 \simeq_o T_2 \Leftrightarrow DFSE(T_1) = DFSE(T_2)$

Definition 19. Depth-first canonical string and depth-first canonical form of a tree. Let us consider an ordered tree $T$ and a valid family of re-ordering functions $\Sigma$:

The depth-first canonical string $DFCS_\Sigma(T)$ of $T$ is the minimal depth-first string encoding among all possible ordered trees $\sigma(T)$ obtained by applying $\sigma \in \Sigma_T$ on $T$: $DFCS_\Sigma(T) = \min_{\sigma \in \Sigma_T} DFSE(\sigma(T))$

The depth-first canonical form $DFCF_\Sigma(T)$ of $T$ according to $\Sigma$ is the ordered tree $\sigma(T)$ whose depth-first string encoding is minimal (and thus equal to $DFCS_\Sigma(T)$): $DFSE(DFCF_\Sigma(T)) = DFCS_\Sigma(T)$

The depth-first canonical form is unique up to ordered isomorphism (Remark 2).

Proposition 8. Two ordered rooted trees $T_a = (\hat{T}_a, ord_a)$ and $T_b = (\hat{T}_b, ord_b)$ have equivalent orders according to a valid family of re-ordering functions $\Sigma$ iff their depth-first canonical strings according to $\Sigma$ are equal:

$T_a \simeq_\Sigma T_b \Leftrightarrow DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b)$

Proof. Let us first prove that $DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b) \Rightarrow T_a \simeq_\Sigma T_b$ and then the reverse implication.

1. We suppose that $DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b)$. Let us denote by $\sigma_a \in \Sigma_{T_a}$ the re-ordering function such that $\sigma_a(T_a) = DFCF_\Sigma(T_a)$ and by $\sigma_b \in \Sigma_{T_b}$ the one such that $\sigma_b(T_b) = DFCF_\Sigma(T_b)$. By definition $DFCS_\Sigma(T_a) = DFSE(\sigma_a(T_a))$ and $DFCS_\Sigma(T_b) = DFSE(\sigma_b(T_b))$. Since $DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b)$ we have $DFSE(\sigma_a(T_a)) = DFSE(\sigma_b(T_b))$, and thus $\sigma_a(T_a) \simeq_o \sigma_b(T_b)$ (Remark 2).

As $\Sigma_{T_a}$ is a group (Definition 11), there exists a re-ordering function $\sigma_a^{-1} \in \Sigma_{T_a}$ such that $\sigma_a^{-1}(\sigma_a(T_a)) = T_a$. Let us denote by $f$ the ordered isomorphism between $\sigma_a(T_a)$ and $\sigma_b(T_b)$. By Definition 11, since $\sigma_a(T_a) \simeq_o \sigma_b(T_b)$, the re-ordering function $\sigma_a^{-1} \in \Sigma_{T_a}$ should be equivalent to some re-ordering function ${\sigma'_b}^{-1}$ in $\Sigma_{T_b}$. In other words: $\exists {\sigma'_b}^{-1} \in \Sigma_{T_b}$ such that $\forall v \in V_a,\ \sigma_a^{-1}(v) = {\sigma'_b}^{-1}(f(v))$. We have thus by Lemma 1: $\sigma_a^{-1}(\sigma_a(T_a)) \simeq_o {\sigma'_b}^{-1}(\sigma_b(T_b))$. Therefore $T_a \simeq_o {\sigma'_b}^{-1}(\sigma_b(T_b))$. As $\Sigma_{T_b}$ is a group, ${\sigma'_b}^{-1} \circ \sigma_b \in \Sigma_{T_b}$, so $T_a \simeq_\Sigma T_b$ (Definition 12 and Proposition 4).

2. We suppose that $T_a \simeq_\Sigma T_b$. We denote by $\sigma_b \in \Sigma_{T_b}$ the re-ordering function such that $\sigma_b(T_b) = DFCF_\Sigma(T_b)$. As $T_a \simeq_\Sigma T_b$, $\exists \sigma \in \Sigma_{T_a}$ such that $\sigma(T_a) \simeq_o T_b$. Let us denote by $f$ the ordered isomorphism between $T_b$ and $\sigma(T_a)$. By Definition 11, since $T_b \simeq_o \sigma(T_a)$, the re-ordering function $\sigma_b \in \Sigma_{T_b}$ should be equivalent to some re-ordering function $\sigma'_a$ in $\Sigma_{T_a}$. In other words: $\exists \sigma'_a \in \Sigma_{T_a}$ such that $\forall v \in V_b,\ \sigma_b(v) = \sigma'_a(f(v))$. We have thus by Lemma 1: $\sigma'_a(\sigma(T_a)) \simeq_o \sigma_b(T_b)$. Therefore

$DFSE(\sigma'_a(\sigma(T_a))) = DFSE(\sigma_b(T_b)) = DFCS_\Sigma(T_b)$

Since $\Sigma_{T_a}$ is a group, we have $\sigma'_a \circ \sigma \in \Sigma_{T_a}$. Therefore:

$DFCS_\Sigma(T_b) = DFSE(\sigma'_a(\sigma(T_a))) \geq DFCS_\Sigma(T_a)$.

Both cases being symmetric, the reverse inequality is shown by considering $\sigma_a \in \Sigma_{T_a}$ such that $\sigma_a(T_a) = DFCF_\Sigma(T_a)$. Therefore $DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b)$.

By 1 and 2 we have proven that: $T_a \simeq_\Sigma T_b \Leftrightarrow DFCS_\Sigma(T_a) = DFCS_\Sigma(T_b)$.

Let $\Sigma$ be a valid family of re-ordering functions. An ordered tree $T$ can have two vertices connected to the same parent and whose associated subtrees are equivalent according to $\Sigma$. Any path from a leaf of $T$ passing through one of these two vertices is equivalent to another path passing through the other vertex. From a more global point of view, a permutation exchanging these two subtrees on the depth-first canonical form of $T$ would lead to an isomorphic ordered tree. We thus consider that these two vertices are equivalent.
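The test provided by Proposition 8 can be illustrated by a brute-force sketch that enumerates every allowed re-ordering and keeps the minimal string. Several points are assumptions made for this example and not part of the paper: the node representation (label, kind, children), the two kinds "even" and "free", the string layout, and the omission of the parity coupling between the two carbons of a double bond. The enumeration is exponential and is only meant for very small trees.

```python
from itertools import permutations


def is_even(perm):
    """True if the permutation (tuple of 0-based indices) has an even number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return inversions % 2 == 0


def encodings(node):
    """Yield the encoding of every allowed re-ordering of the subtree rooted at node.

    node: (label, kind, [(edge_label, child), ...]); kind "even" restricts the
    children to even permutations, kind "free" allows any permutation.
    """
    label, kind, children = node
    for perm in permutations(range(len(children))):
        if kind == "even" and not is_even(perm):
            continue
        partial = [label]
        for idx in perm:
            edge_label, child = children[idx]
            partial = [p + edge_label + c + "$" for p in partial for c in encodings(child)]
        yield from partial


def dfcs(tree):
    """Minimal encoding over all allowed re-orderings, closed by the end marker."""
    return min(encodings(tree)) + "#"


def leaf(label):
    return (label, "free", [])


centre = ("C", "even", [("1", leaf(x)) for x in ("Br", "Cl", "F", "N")])
mirror = ("C", "even", [("1", leaf(x)) for x in ("Cl", "Br", "F", "N")])
rotated = ("C", "even", [("1", leaf(x)) for x in ("Br", "F", "N", "Cl")])

print(dfcs(centre) == dfcs(rotated))  # even re-ordering: equivalent orders
print(dfcs(centre) == dfcs(mirror))   # single transposition: not equivalent
```

Under these assumptions the first comparison prints True and the second False: a circular permutation of three neighbors stays in the same class, while a single transposition of two distinct neighbors does not.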

Definition 20. Equivalent ordered sub-tree. Let us consider an ordered rooted tree $T$ and a valid family of re-ordering functions $\Sigma$. Two children of the same parent whose associated rooted sub-trees are equivalent according to $\Sigma$ are said to be equivalent:

$v_i \sim v_j \Leftrightarrow \exists (v, \sigma) \in V \times \Sigma_T$ s.t. $p_{v_i} = p_{v_j} = v$ and $(\sigma(v))(i) = j$ and $DFCF_\Sigma(\sigma(T)) \simeq_o DFCF_\Sigma(T)$  (2)

The representative of each class is defined as the vertex with the minimal index within the ordered list of children of its parent:

$\forall i \in \{1, \dots, n\},\ rep(v_i) = \min\{j \mid v_j \sim v_i\}$.  (3)

The representative of a class is properly defined since two equivalent nodes must have the same parent.

3.1 Ordered trees and re-ordering functions encoding of an acyclic molecule

To define a molecular ordered tree $T = (\hat{T}, ord_T)$ from an acyclic molecular ordered graph $G = (\hat{G}, ord_G)$, we have to define a root and, for each vertex, an order on its children. By definition, the function $ord_T$ of $T$ is defined on (Definition 17):

$V^T_{ord} = V - Leaf(T)$

where $Leaf(T)$ denotes the set of leaves of $T$. On the other hand, the set of vertices of an ordered molecular graph $G = (\hat{G}, ord_G)$ on which the function $ord_G$ is defined (Definition 15) is equal to:

$V^M_{ord} = V_{PAC} \cup V_{DB}$

Since a molecular ordered graph does not provide an order for all vertices, while all vertices of an ordered tree but its leaves are ordered, the definition of an order on a molecular tree requires fixing a priori the order on the children of some vertices when this order is not provided by the molecular ordered graph.

Definition 21. Molecular ordered tree. A molecular ordered tree $T = (\hat{T}, ord_T)$ with $\hat{T} = (r, G_T)$ is defined from a molecular ordered graph $G = (\hat{G}, ord_G)$ with $\hat{G} = (V, E, \mu, \nu)$ by setting $G_T = \hat{G}$, $r \in V$ and by defining $ord_T$ from $ord_G$ as follows:

Given the root $r$, let us consider a re-ordering function $\sigma$ of $\Sigma^M_G$ such that (Remark 1): $\forall v \in V^M_{ord} - \{r\},\ ord_{\sigma(G)}(v) = p_v.v_1.\dots.v_n$

If $V \neq \{r\}$, the function $ord_T$ on $r$ is set equal to:

$ord_T(r) = ord_{\sigma(G)}(r)$ if $r \in V^M_{ord}$, and $ord_T(r) = random(child(r))$ otherwise,

where $random(child(r))$ denotes a random ordering of the children of $r$ ($child(r)$).

For all other vertices:

If $v \in V^T_{ord} - V^M_{ord} - \{r\}$: $ord_T(v) = random(child(v))$.

If $v \in (V^T_{ord} \cap V^M_{ord}) - \{r\}$: We have $ord_{\sigma(G)}(v) = p_v.v_1.\dots.v_n$ where $v_1.\dots.v_n$ corresponds to an ordering of $child(v)$. We thus set $ord_T(v)$ as: $ord_T(v) = v_1.\dots.v_n$

Definition 22. Set of molecular re-ordering functions for trees. The set of re-ordering functions $\Sigma^M_T$ is defined by:

If $v \in V^T_{ord} - V^M_{ord}$: the permutation $\sigma(v)$ can be any permutation. The order on those vertices therefore has no influence; it only allows us to determine the depth-first string encoding and the depth-first canonical string of a molecular ordered tree.

If $v \in V_{PAC}$: $\sigma(v)$ is an even permutation: $\epsilon(\sigma(v)) = 1$.

If $v \in V_{DB}$: permutations $\sigma(v)$ and $\sigma(n_=(v))$ have the same parity: $\epsilon(\sigma(v)) = \epsilon(\sigma(w))$ with $w = n_=(v)$.

Given a unique code associated to an ordered rooted tree, the chirality of a vertex may be efficiently tested if one can transpose Definition 14 to ordered rooted trees:

Proposition 9. Let $T = (\hat{T}, ord_T)$ with $\hat{T} = (r, G)$ be an ordered rooted tree encoding an acyclic molecule, and $\Sigma^M_T$ the set of molecular re-ordering functions for $T$. $r$ is a stereo vertex if and only if:

$\forall (i, j) \in \{1, \dots, |V(r)|\}^2$ with $i \neq j$, $T \not\simeq_{\Sigma^M} \tau_{i,j}(T)$

where $\tau_{i,j}$ is a re-ordering function equal to the identity on any vertex but $r$, where it permutes the children of index $i$ and $j$ in the ordered list of $r$.

Proof. Using acyclic molecular graphs, an equivalent ordered isomorphism between ordered rooted trees corresponds to an equivalent ordered isomorphism between ordered graphs with an additional constraint on the mapping of both roots. If we can find an isomorphism between $T$ and $\tau_{i,j}(T)$, such an isomorphism $f$ satisfies $f(r) = r$ and also corresponds to an isomorphism between ordered graphs. The conditions of Definition 14 are thus violated and $r$ is not a stereo vertex. The reverse implication may be demonstrated using the same type of reasoning.

4 From a global to a local characterization of stereo information

Proposition 9 allows us to determine if a vertex induces a stereo property for a molecule. Such a proposition, which concerns the whole molecule, induces a global characterization of stereo information. However, it does not allow us to characterize the minimal subgraph of a molecule which induces the stereo property of a vertex. Using acyclic graphs, such a minimal subgraph corresponds to the smallest ordered sub-tree, rooted on a stereo vertex $v$, which allows us to characterize $v$ using Proposition 9.

4.1 Minimal stereo subtree of an asymmetric carbon

Let $v$ be a stereo vertex representing an asymmetric carbon ($v \in V_{PAC}$). We denote its neighbors by $v_1, \dots, v_4$. We consider the ordered tree $T$ rooted on $v$ and described in Section 3, and the family of re-ordering functions for molecular trees $\Sigma^M_T$. We denote by $T_1, \dots, T_4$ the subtrees of $T$ rooted on the children of $v$. For any $i \in \{1, 2, 3, 4\}$ we denote by $T_i^j$ the subtree of $T_i$ composed of all nodes with a depth lower than $j$. According to Proposition 9, the stereo information of $v$ may be characterized from its subtrees $T_i^j$ iff all pairs of subtrees are non-equivalent. Indeed, in such a case no transposition of two subtrees $T_i^j$ and $T_k^j$ can induce a rooted tree with an equivalent order. Therefore for each $i \in \{1, 2, 3, 4\}$, we define the minimal subtree associated to $v_i$ as $T_i^{j^*(i)}$ with:

$j^*(i) = \min\{j \mid \forall k \in \{1, \dots, 4\} - \{i\},\ T_i^j \not\simeq_{\Sigma^M} T_k^j\}$.
For example in Figure 4, the root of $T_1$ is a chlorine atom while the root of each other $T_i$ is an oxygen atom. The subtree $T_1^1$, reduced to the chlorine atom, thus has a sufficient depth to be distinguished from all other subtrees $T_i^k$, $i \neq 1$, and we have $j^*(1) = 1$. The minimal stereo subtree of $v$ is the subtree of $T$ rooted on $v$, where $v$ has for children $T_1^{j^*(1)}, \dots, T_4^{j^*(4)}$. The asymmetric carbon is then represented by the depth-first canonical string of this tree according to $\Sigma^M_T$. To find $j^*(i)$, we increase $j$ for each $T_i^j$ until $T_i^j \not\simeq_{\Sigma^M} T_k^j$ for each $k \in \{1, \dots, 4\}$, $k \neq i$. At each iteration we compute $DFCS_{\Sigma^M}(T_i^j)$ for each $i \in \{1, \dots, 4\}$. Therefore the computation of the minimal stereo subtree of $v$ is performed in $O\big((\max_i |T_i^{j^*(i)}|)^2\big)$, which is bounded by $O(|V|^2)$.
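The iterative search for $j^*(i)$ can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure of the paper: subtrees are (label, children) pairs without edge labels, and the equivalence test is replaced by a canonical string in which the children of every node may be freely permuted (`canon`), whereas the paper compares depth-first canonical strings according to $\Sigma^M$. The function names are hypothetical.

```python
def truncate(tree, depth):
    """Keep only the nodes whose depth is lower than `depth` (the root has depth 0)."""
    label, children = tree
    if depth <= 1:
        return (label, [])
    return (label, [truncate(c, depth - 1) for c in children])


def canon(tree):
    """Canonical string of a tree whose children may be freely re-ordered (stand-in for DFCS)."""
    label, children = tree
    return label + "(" + "".join(sorted(canon(c) for c in children)) + ")"


def depth_of(tree):
    """Number of levels of the tree (a single node has 1 level)."""
    _, children = tree
    return 1 + max((depth_of(c) for c in children), default=0)


def minimal_depths(subtrees):
    """For each subtree T_i, smallest j such that T_i^j differs from every other T_k^j.

    Returns None for a subtree that is never distinguished from another one.
    """
    max_depth = max(depth_of(t) for t in subtrees)
    result = []
    for i, t in enumerate(subtrees):
        found = None
        for j in range(1, max_depth + 1):
            code = canon(truncate(t, j))
            others = (canon(truncate(u, j)) for k, u in enumerate(subtrees) if k != i)
            if all(code != other for other in others):
                found = j
                break
        result.append(found)
    return result


subtrees = [
    ("Cl", []),
    ("O", [("H", [])]),
    ("O", [("C", [("C", [("N", [])])])]),
    ("O", [("C", [("C", [("Br", [])])])]),
]
print(minimal_depths(subtrees))  # prints [1, 2, 4, 4]
```

In this toy example the chlorine subtree is distinguished at depth 1, the second subtree at depth 2, and the last two only at depth 4, where the nitrogen and bromine atoms appear.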

[Figure 4: an asymmetric carbon, its four minimal subtrees $T_i^{j^*(i)}$ with their depth-first canonical strings, and the table of minimal depths: $j^*(i) = 1, 2, 4, 4$ for $i = 1, 2, 3, 4$.]

Fig. 4. Left: An asymmetric carbon with its minimal stereo subtree (surrounded by a dotted line). Right: minimal subtrees rooted on its children.

4.2 Minimal stereo subtree of a double bond

Let $v_a$ be one carbon of a double bond, $v_a \in V_{DB}$. We denote $n_=(v_a) = v_b$ and $e = (v_a, v_b)$ the double bond between them. Let us denote by $v_a^1$ and $v_a^2$ the two remaining neighbors of $v_a$. Considering the ordered tree $T$ rooted on $v_a$, $v_a$ is a stereo vertex only if the subtrees rooted on the children of $v_a$ do not have equivalent orders (Proposition 9). This implies that the two subtrees rooted on $v_a^1$ and $v_a^2$ do not have equivalent orders. This last necessary condition is however not sufficient. Indeed, if the subtrees rooted on the remaining neighbors $v_b^1$ and $v_b^2$ of $v_b$ have equivalent orders, then one can apply a re-ordering function $\sigma \in \Sigma^M_T$ on $T$ which simultaneously permutes the subtrees rooted on $v_a^1$ and $v_a^2$ and the subtrees rooted on $v_b^1$ and $v_b^2$ (by definition of $V_{DB}$ and $\Sigma^M$). The resulting rooted tree $\sigma(T)$ has an equivalent order to $T$ (Definition 12) but also to $\tau(T)$, where $\tau$ permutes only vertices $v_a^1$ and $v_a^2$ in the ordered list of children of $v_a$. In such a case, $v_a$ is not a stereo vertex (Proposition 9). Therefore, if $v_b$ is not a stereo vertex, $v_a$ is also not a stereo vertex, and conversely. Hence $v_a$ and $v_b$ are stereo vertices only if the two following conditions are satisfied: the subtrees rooted on $v_a^1$ and $v_a^2$ do not have equivalent orders, and the subtrees rooted on $v_b^1$ and $v_b^2$ also do not have equivalent orders. In order to encode this constraint, we define as in Section 4.1 the minimal subtrees rooted on $v_a^1$ ($T_a^1$) and $v_a^2$ ($T_a^2$) with non-equivalent orders, together with the minimal subtrees rooted on $v_b^1$ ($T_b^1$) and $v_b^2$ ($T_b^2$) with non-equivalent orders. We denote by $T_a$ and $T_b$ the two ordered rooted trees rooted on $v_a$ and $v_b$.
The subtrees of these two roots are respectively $(T_a^1, T_a^2)$ and $(T_b^1, T_b^2)$. The tree encoding the chirality of the double bond is then defined as an ordered rooted tree whose root corresponds to a virtual vertex (not corresponding to any atom) connected to the two subtrees $T_a$ and $T_b$. As in Section 4.1, the computation of the minimal stereo subtree is bounded by $O(|V|^2)$. Figure 5a represents a double bond between two carbon atoms, and Figure 5b its minimal stereo subtree.

[Figure 5: panels (a), (b) and (c); in panel (c) the contracted node is labeled $c_s$ and its incident edges carry the labels 1,2 and 2,2,1.]

Fig. 5. A double bond (a), its minimal stereo subtree (b) and its contraction (c).

4.3 Graph Contraction

Using the results of Sections 4.1 and 4.2, each stereo vertex may be associated to a minimal stereo subtree and to a depth-first canonical string according to $\Sigma^M$ representing it (Section 3). However, the properties of a molecule are determined both by its set of minimal stereo subtrees and by the relationships between these trees and the remaining part of the molecule. In order to obtain a local characterization of such relationships, we propose to contract the minimal stereo subtree of each stereo vertex. Let us consider a stereo vertex $s$ and its minimal stereo subtree $T = (\hat{T}, ord_T)$, with $\hat{T} = (r, G_T)$ and $G_T = (V_T, E_T, \mu, \nu)$, associated to a depth-first canonical string according to $\Sigma^M$. We denote this string by $c_s = DFCS_{\Sigma^M}(T)$. We define for this tree a set of connection vertices:

$V_{con} = \{v \in Leaf(T) \mid d(v) > 1\}$

and a set of edges to contract:

$K_T = E_T - E_{con}$ with $E_{con} = \{(v, p_v) \in V_{con} \times V_T\}$.

The contraction of $K_T$ creates a new graph $G_s = (V_s, E_s)$, with a contracted node $n_s$ labeled by $c_s$ and $V_s = (V - (V_T - V_{con})) \cup \{n_s\}$; $E_s = E - K_T$ (Figure 5c). Each edge of $E_{con}$ connects an element $l$ of $V_{con}$ to $n_s$ in $G_s$. The label of $e = (n_s, l)$ has to encode the position of $l$ in the minimal stereo subtree. We thus consider the path connecting $r$ to $l$ in the minimal stereo subtree: $CP(l) = v_1, \dots, v_n$ with $v_1 = r$ and $v_n = l$. Let us denote by $i_j$ the index of $v_j$ in the ordered list of children of $p_{v_j}$. The sequence $i_2 \dots i_n$ defines a unique path in the stereo subtree associated to $n_s$. Such a sequence may thus be considered as a proper label of edge $e$. However, as mentioned in Section 3, some paths may pass through equivalent subtrees and should thus be considered as equivalent. In order to encode such an equivalence relationship we define the label of $e$ as:

$\nu(e) = \bigodot_{i=2}^{n} rep(v_i)$

where $rep$ is defined by Equation 3 (Definition 20) and $\odot$ denotes the concatenation operator.
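The construction of this edge label can be sketched as follows. The input format is an assumption made for this example: each vertex $v_2, \dots, v_n$ of the path is described by its 1-based index among the children of its parent, together with a list giving an equivalence-class identifier for each of these children. The comma separator follows the labels visible in Figure 5(c); the function names are hypothetical.

```python
def rep(index, sibling_classes):
    """Smallest 1-based index among the siblings equivalent to sibling `index` (Equation 3)."""
    target = sibling_classes[index - 1]
    return min(j for j, c in enumerate(sibling_classes, start=1) if c == target)


def edge_label(path):
    """Concatenate rep(v_i) along the path v_2, ..., v_n from the root to a connection vertex.

    path: list of (index_in_parent, sibling_classes) pairs, one per vertex v_2..v_n.
    """
    return ",".join(str(rep(index, classes)) for index, classes in path)


# The second child of the root is equivalent to the first one, so both paths get the same label.
print(edge_label([(2, ["a", "a", "b"]), (1, ["x", "y"])]))  # prints 1,1
print(edge_label([(1, ["a", "a", "b"]), (1, ["x", "y"])]))  # prints 1,1
```

Replacing each index by the representative of its class is what makes two paths through equivalent subtrees receive the same label.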

[Figure 6: the stereotreelet patterns, labeled from $G_0$ to $G_{12}$.]

Fig. 6. The set of stereotreelets, distinguishing $n_s$, the elements of $V_{con}$ and the elements of $V - V_{con}$.

4.4 StereoTreelet

For each stereo vertex $s$ we have a graph $G_s$. The stereotreelets of $G_s$ are defined as all subtrees of $G_s$ whose size is lower than 6 and which include $n_s$. Since each neighbor $v$ of $n_s$ corresponds to a leaf of the minimal stereo tree of $s$, the edge $(v, n_s)$ is already encoded within the code $c_s$ of $n_s$. Consequently, we impose that each neighbor $v$ of $n_s$ in a stereotreelet must have at least one other neighbor (different from $n_s$). This constraint induces the set of stereotreelets represented in Fig. 6. The set of stereotreelets $\mathcal{T}(G)$ of $G$ is defined as the union of the stereotreelets of each $G_s$. When all stereotreelets of $G$ have been enumerated, we compute its spectrum $s(G)$, which corresponds to a vector representing the treelet distribution. Each component of this vector is equal to the frequency of a given stereotreelet $t$:

$s(G) = (f_t(G))_{t \in \mathcal{T}(G)}$ with $f_t(G) = |(t \sqsubseteq G)|$.

The kernel between two graphs $G$ and $G'$ is defined as a sum of kernels between the numbers of occurrences of the treelets common to both graphs:

$k(G, G') = \sum_{t \in \mathcal{T}(G) \cap \mathcal{T}(G')} K(f_t(G), f_t(G'))$.

5 Experiments

We have tested our method on a dataset of acyclic chiral molecules [11] related to a regression problem. This dataset is composed of 90 molecules together with their optical rotations. In practice, we only select 35 molecules, since almost all molecules have only one stereocenter, and for 55 molecules this stereocenter is unique in the dataset. Such molecules correspond to a property represented only once in the dataset, which can thus not be accurately predicted. The property to predict, the optical rotation, is connected with chirality and has a standard deviation of 38.25 for the 35 selected molecules. For our experiment we use a leave-one-out cross-validation on the dataset to predict the optical rotation of each molecule.
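The kernel of Section 4.4, on which these predictions rely, can be sketched as follows. The spectrum is represented here as a counter of stereotreelet codes, and the base kernel $K$ is taken to be a Gaussian kernel; both choices, as well as the function names and the parameter `gamma`, are assumptions made for this illustration, since the choice of $K$ is not specified in this section.

```python
import math
from collections import Counter


def spectrum(treelet_codes):
    """Spectrum of a graph: number of occurrences of each stereotreelet code."""
    return Counter(treelet_codes)


def gaussian(x, y, gamma=1.0):
    """Gaussian kernel between two occurrence counts (assumed choice for K)."""
    return math.exp(-gamma * (x - y) ** 2)


def stereo_kernel(spec_a, spec_b, base=gaussian):
    """Sum of base kernels over the stereotreelets common to both spectra."""
    common = spec_a.keys() & spec_b.keys()
    return sum(base(spec_a[t], spec_b[t]) for t in common)


spec_g = spectrum(["t1", "t1", "t2"])
spec_h = spectrum(["t1", "t1", "t3"])
print(stereo_kernel(spec_g, spec_h))  # prints 1.0: only t1 is common, with equal counts
```

Only the treelets present in both graphs contribute, so two molecules without any common stereotreelet have a kernel value of 0.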
The predicted rotations are computed by using both kernel ridge regression and the weighted mean of known