Functor Semantics for Refactoring-Induced Data Migration

Functor Semantics for Refactoring-Induced Data Migration. Harald König, Michael Löwe, Christoph Schulz. Report No.: 02007/01

Imprint. Forschungsberichte der FHDW Hannover (Research Reports of FHDW Hannover): publications from the research and development division of FHDW Hannover. Publisher: the professors of FHDW Hannover, Fachhochschule für die Wirtschaft Hannover, Freundallee 15, 30173 Hannover. Contact: techrep@fhdw.de. ISSN 1863-7043

Harald König, Michael Löwe, and Christoph Schulz
Fachhochschule für die Wirtschaft (FHDW) Hannover, Freundallee 15, 30173 Hannover, Germany
harald.koenig@fhdw.de

Abstract. The refactoring of complete information systems requires an integrated change of the model and the existing data that conforms to the model. In this paper, we present a categorical approach that canonically extends a model refactorisation to a migration of the data. The meta-model can be any topos-like category $C$. A model $M$ is just an object of $C$. The information systems conforming to a model $M$ are represented by suitable epi-reflective subcategories of the comma category under $M$. Model refactorisations are given by arbitrary relations in $C$, i.e. spans of morphisms $M \leftarrow K \rightarrow M'$. We show that many practically relevant induced migrations can be defined as free constructions between (subcategories of) the comma categories under $M$, $K$, and $M'$, respectively. The free construction is composed of the pullback functor of the rule's left-hand side, the left-adjoint to the pullback functor of the rule's right-hand side, and the epi-reflection into the subcategories under consideration. This semantics can be applied not only to typed data, but also to rules, direct derivations, and complete derivation sequences in pushout-based operational semantics (like the double or single pushout approach to graph transformation), since free constructions preserve co-limits. Hence it is an important prerequisite for the integrated refactoring of a system's static structure and dynamical behavior.

1 Introduction

Refactoring of information systems requires the integrated change of the data model and the data itself. Ideally, the system designer changes the model only and the development framework provides an automatic migration of the existing data. Up to now, workbenches of this kind are rare and do not support all relevant practical refactorisations.
This is mainly due to a lack of theoretical work showing which kinds of model changes can be canonically extended to the data without loss of information. In [1], we started to work out such a theory in a categorical setting. We showed by examples that most of the practical requirements are satisfied by the framework (compare also Section 2 below). One important theoretical result was that refactorisations and induced migrations are closed under sequential composition. In order to achieve these practical and theoretical results, we introduced some ad-hoc categorical constructions like the notion of folding and abstract folding. In this paper, we show that the same results can be achieved in a standard categorical framework using topos-like categories $C$. In order to achieve this, we consider data $I$ that is typed in a model $M$ by some morphism $i : I \to M$ of $C$ as an object of the comma category $C \downarrow M$ or of a suitable epi-reflective subcategory of $C \downarrow M$. The constructions of [1] can be reformulated and generalized as free constructions in this setting.

The paper is organized as follows. Section 2 presents some practical refactoring examples and informally introduces the constructions which are needed to handle these cases. Section 2 thus provides the motivation for the rest of the paper. Section 3 presents the categorical preliminaries for the reformulation of our framework in Section 4. It is self-contained,

only proofs that can be found in the literature are omitted and referenced. Section 4 defines refactoring-induced migrations as free constructions and shows that the construction of [1] fits into this more general framework. Section 5 shows that refactorisations and migrations are closed under sequential composition.

2 Examples

Refactorings occur in a variety of ways. E.g. a refactoring of class models 1 might require the migration of objects that conform to the model 2. The question of migration arises if objects persist throughout the time of refactorisation. This is the case if one considers schema evolution with necessary data migration or transformation of XML documents that are structured according to changing DTDs or XML Schemas. These examples can formally be described using graphs as a meta-model for arbitrary modelling structures together with graph morphisms for structure transformation. To be able to relate nodes to edges (i.e. to objectify projections or associations 3), we consider the following algebraic specification:

  $Graph$ =
    sorts $O$(bject)
    opns $s$(ource), $t$(arget) $: O \to O$
    axioms $s(s(x)) = s(x)$
           $t(t(x)) = t(x)$
           $s(x) = x \Rightarrow t(x) = x$
           $t(x) = x \Rightarrow s(x) = x$

which expresses that graphs consist of objects only. However, the axioms distinguish between those $x \in O$ for which $s(x) = x$ (and hence $t(x) = x$), which can be considered as nodes, and those for which this is not the case (edges) 4. Using this assignment, it is easy to see that this category is equivalent to the category of algebras that conform to the signature

  $Graph'$ =
    sorts $E$, $V$
    opns $s, t : E \to V$
         $self : V \to E$
    axioms $s(self(x)) = x$
           $t(self(x)) = x$

1 Model refactorisations already enjoy extensive tool support: e.g. the IDE Eclipse offers a lot of possibilities.
2 ... for which tool support is still rare, see [2].
3 For the moment, the cardinality of associations shall not be restricted. Later we will mention concepts for aggregation and composition of objects.
4 Sources and targets of these edges are nodes because of the first two equations.

which describes the usual graph structure but forces each node to have a self-edge, see Figure 1. Equivalence arises using the forgetful functor $V_\sigma : \mathrm{Alg}(Graph) \to \mathrm{Alg}(Graph')$ which corresponds to the specification morphism $\sigma : Graph' \to Graph$. $\sigma$ sends $E$ and $V$ to $O$, preserves the source and target symbols, and maps $self$ to the (auxiliary) symbol $id : O \to O$, for which the equation $id(x) = x$ has to be added. In the following, we will refer to whichever specification best fits the considerations at hand.

[Fig. 1. Quotient Term Category for Graphs with Self-Edges]

Models 5 can be formalised as algebras of either one of these specifications. Our first example considers unfolding of model objects (see e.g. the "Split Table" pattern in [2] or the "Extracting Class" pattern in [3]). Since we want to use graph morphisms, the best way of expressing this procedure is to introduce a morphism $l$ from the new model $K$ to the old model $M$, which identifies objects. A typing of data $I$ in the old model is then a morphism $i$ mapping objects in $I$ to their type object in $M$, compare Figure 2. In general, model objects are deleted where $l$ is not surjective, and are unfolded where $l$ is not injective.

[Fig. 2. Unfolding and deleting requires pullback for migration]

In order to migrate the data $I$, we need to unfold each object in $I$ according to the behaviour of $l$. We should not produce more or less than that. Thus the best choice is to perform a migration by constructing the pullback $(i' : P \to K,\ l' : P \to I)$ of $(i, l)$, see Figure 2.

Other refactorisations merge and add model objects. These can be formalised analogously using a graph morphism $r : K \to N$, where $K$ is an old model and $N$ is the new model. Of course, a data collection $I$ that conforms to the old model must not be touched if we do not merge elements in $K$.
But the question arises whether association instances should be added, because without them attributes remain null. [2] discusses the introduction of default values, but also mentions that identifying a true default value could be difficult. Here, we do not discuss default values, so that no elements are added during migration.

5 Class or object models, but also database schemas and their existing data.

Example 1 (Adding of model objects). Let $K \xrightarrow{r} N$ be a refactorisation which only adds elements. Then the migration of typed data $I \xrightarrow{i} K$ is given by $I$. Hence the diagram in

Figure 3 describes the migration. To obtain a typing of $I$ in the new model we just have to compose the old typing and the refactorisation $r$. It can easily be checked that a pullback is produced.

[Fig. 3. Adding model objects causes the migration to be the identity]

It seems reasonable to be able to combine both the "Unfolding and Deleting" procedure and the "Merging and Adding" procedure, see the examples in [1]. Hence a complete refactorisation should consist of a span $M \xleftarrow{l} K \xrightarrow{r} N$ of two graph morphisms. This is very similar to graph transformation rules (see [4]), where a graph transformation instance is then realised using a double-pushout approach. This symmetric semantics is advantageous because one gains the ability to easily reverse transformations. However, the next examples will show that migrations cannot be based on a symmetric approach in general, i.e. the right-hand side will not always become a pullback.

Example 2 (No pullback on the right-hand side). This example (see Figure 4) shows that in the general case we cannot expect to obtain a pullback on the right-hand side of the migration 6. This is due to the fact that the new model is the terminal object in $\mathrm{Alg}(Graph)$, such that the pullback must be the product of the old model and the new data. Because the old model contains three elements, the number of elements in the pullback must be a multiple of three, but the data to be migrated contains exactly one element.

[Fig. 4. No reasonable migration]

In this example the question arises whether the refactorisation is reasonable at all. If model objects A and B shall be merged, there should be no objects of type A or B, because without additional information, one cannot determine how these objects shall be combined 7. The same effect causes the ambiguity in the next example.
6 We depict elements of Alg(Graph) using dots for elements x with s(x) = x (nodes) and arrows for the other elements (edges), see the earlier interpretation of the Graph-specification. 7 Of course, an exception is the situation, where objects of A are in a 1:1-correspondence to B-objects.
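The node/edge reading of the single-sorted Graph specification can be made concrete with a small sketch. It is only an illustration under assumptions that are not part of the report: finite carriers, the operations `s` and `t` stored as Python dictionaries, and the helper names `is_graph_algebra`, `nodes` and `edges` are all hypothetical.

```python
def is_graph_algebra(objects, s, t):
    """Check the four axioms of the single-sorted Graph specification."""
    for x in objects:
        # s(s(x)) = s(x) and t(t(x)) = t(x)
        if s[s[x]] != s[x] or t[t[x]] != t[x]:
            return False
        # s(x) = x  =>  t(x) = x   and   t(x) = x  =>  s(x) = x
        if (s[x] == x) != (t[x] == x):
            return False
    return True


def nodes(objects, s):
    """Elements fixed by s (and hence by t) are read as nodes."""
    return {x for x in objects if s[x] == x}


def edges(objects, s):
    """All remaining elements are read as edges."""
    return set(objects) - nodes(objects, s)


# A model with two nodes A, B and one edge a from A to B
# (the names follow Figure 4; the encoding is an assumption).
objects = {"A", "B", "a"}
s = {"A": "A", "B": "B", "a": "A"}
t = {"A": "A", "B": "B", "a": "B"}

print(is_graph_algebra(objects, s, t))  # axioms hold for this example
print(sorted(nodes(objects, s)), sorted(edges(objects, s)))
```

In this encoding the sources and targets of edges are automatically nodes, which is the observation made in footnote 4.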

[Fig. 5. Two non-isomorphic, non-comparable pullback complements]

Example 3 (Ambiguous pullback complement). If model objects 2 and 3 in $K$, which, in addition to the previous example, are differently structured, shall be merged, two different object pairings are possible. This results in non-isomorphic graphs $I'$ and $I''$, see Figure 5. Note that $(I, i, r')$ is the pullback of $(r, i')$ and $(I, i, r'')$ is the pullback of $(r, i'')$.

The previous examples show that pullback complements cannot be the correct semantics for data migration. In order to be able to define a unique semantics, we have to add information to the relations between model objects. In fact, in Examples 2 and 3 it should only be possible to merge model elements if objects correspond to each other, for example in a 1:1 manner. E.g., in Figure 4, association a must be either a composition of elements A and B, i.e. each A-object contains exactly one B-object and each B-object is contained in exactly one A-object, or an inheritance relation, where B might be an abstract class 8. The way objects are composed defines an equivalence relation resulting in a partition of graphs. In the sequel, equivalence classes will be referred to as components of the graphs. Moreover, the morphism which types data maps components bijectively. We will refer to these morphisms as typings. But in the case where B is not an abstract class or B has more than one concrete subclass, this requirement must be slightly weakened. Then, inheritance trees are only partially instantiated, such that typings only have to be injective on components, see [1] for a more detailed discussion of this approach. We will call these morphisms weak typings. The following example illustrates such a situation.

8 Then, as in object-oriented databases, we consider each A-object as consisting of a (non-independent) B-object and additional data.
9 For a claim participant, one might be interested in the level of alcohol; a policy holder has a special address for contract correspondence.

Example 4 (Injectively instantiated Components). Assume that an old model provides a class Role (abbreviated by R), which is used for instance in insurance entities like claims or contracts to reference business partner data that is characteristic for the entity 9. An already performed

refactorisation in the way proposed in Example 1 added subclasses of Role like Policy Holder (P) and Claimant. Since this refactorisation did not change the instance structure, there are still objects of runtime type Role. Note that R and P together with their connecting edge belong to the same component. This is where our example starts (see the graph $M$ in Figure 6, where the Role class is denoted by RH1). Non-singleton components are framed. In the old model, a contract (not shown in the figure) referenced a Bank account (B) (used for premium debits) via the role object (named RH1 in graph $M$). Now, assume that some customers want the insurance to use different accounts for premiums and for claim credits, i.e., the premium account reference must be redirected to the policy holder class P. Assume that the affected part of the instance level looks like graph $T$ in Figure 6. The redirection can be carried out by introducing a H(elper) class in an intermediate model $K$ which is connected to R and P by 1 and 2, respectively. The span $(l, r)$ unfolds the role class RH1 to R, H and 1, moving the bank account reference to H (morphism $l$), and merges P, H and 2 to PH2, moving the bank account association to the policy holder class (morphism $r$). 10 Note that both $l$ and $r$ do not identify across equivalence classes.

[Fig. 6. Redirection of association]

Pulling back $(l, t)$ creates a new object $H'$ in $U$. From here a reasonable migration according to $r$ is to leave the data as it is: the identification of H and P by $r$ cannot be reflected because there is no P-object in $U$. Thus, with the exception of the renaming $r'(H') = P'$, $r'$ is an identity. Obviously this construction does not create a pullback in the right part of the diagram. But the migration was successful, because it created a new P-object $P'$ which now references B exactly as the refactorisation prescribes.
Once again, this shows that a double-pullback approach is not appropriate.

10 Mapping edges to nodes is possible in $\mathrm{Alg}(Graph)$.
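The pullback used on the left-hand side of a migration (Figure 2, and the first step of Example 4) can be sketched on plain finite sets. This is a simplification and not the report's construction in $\mathrm{Alg}(Graph)$: source and target operations are ignored, maps are Python dictionaries, and the function name `pullback` as well as all element names are hypothetical. The second half replays the counting argument of Example 2.

```python
def pullback(l, i):
    """Pullback of l: K -> M and i: I -> M in the category of finite sets.

    Returns the pullback object P together with the two projections
    i_prime: P -> K and l_prime: P -> I.
    """
    P = {(k, x) for k in l for x in i if l[k] == i[x]}
    i_prime = {p: p[0] for p in P}
    l_prime = {p: p[1] for p in P}
    return P, i_prime, l_prime


# Unfolding: model object X is split into X1 and X2, Y is kept.
l = {"X1": "X", "X2": "X", "Y": "Y"}
i = {"o1": "X", "o2": "X", "o3": "Y"}  # typed data I -> M
P, i_prime, l_prime = pullback(l, i)
# Every X-object is unfolded into an X1-part and an X2-part.
print(len(P))

# The square commutes: l after i_prime equals i after l_prime.
print(all(l[i_prime[p]] == i[l_prime[p]] for p in P))

# Example 2: r maps the three-element old model onto a terminal model.
r = {"A": "*", "a": "*", "B": "*"}
sizes = [len(pullback(r, {f"x{j}": "*" for j in range(n)})[0]) for n in range(5)]
print(sizes)  # every size is a multiple of three, so never exactly one
```

Because the pullback along a map into a terminal object is a product, no choice of new data yields a one-element pullback, which is the obstruction described in Example 2.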

If we take a closer look at the previous example we can derive a reasonable procedure to deal with these cases. We do not recall the general construction here (see [1]), but describe the algorithm intuitively:

Construction 5 (Componentwise Merging). Let $M \xleftarrow{l} K \xrightarrow{r} N$ be a refactorisation span and $T \xrightarrow{t} M$ be a weak typing.
1. Carry out the pullback of $l$ and $t$ on the left-hand side of the diagram, resulting in intermediate typed data $U \xrightarrow{u} K$.
2. Componentwise merging: Let the components of $V$ consist exactly of the disjoint union of the components $(r \circ u)(c)$ for all components $c$ of $U$. Let $U \xrightarrow{r'} V$ be this mapping.
3. Close the square on the right-hand side by defining
$$v : V \to N, \quad x \mapsto (r \circ u)\big((r')^{-1}(x)\big),$$
which is unique since $\ker(r') \subseteq \ker(r \circ u)$.

This algorithm turns out to fit into the general framework introduced in Section 4.

Example 6 (Cross-Component Merging). In the last example we merged objects within the same component only, e.g. P, 2 and H were merged to PH2. However, it is sometimes necessary to merge model objects belonging to different components. Consider the following situation, which is essentially the reverse process of the last example. Let us assume we have two subclasses of R(ole), namely P(olicy holder) and C(laimant), which both contain a separate reference to a B(ank account). Typically, the two associations are in different (singleton) components. If we want to shift and merge these associations into a single association between R and B, which is essentially abstracting the two associations into a single one, this can be achieved in two steps: 11
1. We let the role class R be the new source of both associations. This is technically done as described in the last example, except that the movement of the association source is performed in the opposite direction. This results in the role class R having two separate associations to the bank account class B (Fig. 7).
2.
Now we simply merge both associations on the right side of the second refactorisation by merging the associations as well as their components (Fig. 8). Note that, in this case, the left side of the refactorisation is the identity morphism. Having done that on the model level, the question What happens to the instance level? remains. The first step results in similar effects as the last example moving the association source, with the exception that no new objects are introduced at the instance level. 12 The second step is more interesting, as the instance level remains completely unchanged. In particular, the merge of the two components at the model level does not cause a similar component merge at the instance level, i.e., the various existing association instances at the instance level remain in different components and are not merged. This is necessary, as merging two components at the instance level would necessitate a pairing of instance objects, which neither has to exist nor need to be unique (compare the discussion on object-level merging in Example 3). It follows that merging components cannot be done at the instance level if one wants to ensure that the resulting graph is essentially unique. 11 Eventually, both steps can be combined into a single step, as every sequence of refactorisations can be composed into a single step (see [1]) and Chap. 5. 12 Helper objects arise temporarily in the middle graph, but they disappear again on the right side of the refactorisation.
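Steps 2 and 3 of Construction 5 can be sketched as follows. The sketch rests on assumptions that go beyond the report: carriers are finite sets without graph structure, the component of each element of $U$ is given explicitly as a dictionary, and the name `merge_componentwise` and all element names are hypothetical. The disjoint union is realised by tagging each image element with its component.

```python
def merge_componentwise(component, u, r):
    """Componentwise merging on finite sets (steps 2 and 3 of Construction 5).

    component: element of U -> name of its component
    u:         typing U -> K (assumed injective on each component)
    r:         right-hand side of the refactorisation, K -> N
    Returns (V, r_prime, v) with r_prime: U -> V and v: V -> N.
    """
    # Disjoint union of the images (r o u)(c): tag images with the component.
    r_prime = {x: (component[x], r[u[x]]) for x in u}
    V = set(r_prime.values())
    # v forgets the tag; it is well defined because elements with the
    # same image under r_prime also have the same image under r o u.
    v = {y: y[1] for y in V}
    return V, r_prime, v


# Model level: P, H and the edge 2 are merged to PH2 (as in Example 4).
r = {"R": "R", "1": "1", "H": "PH2", "2": "PH2", "P": "PH2", "B": "B"}

# Instance level: component c0 holds an H-object, a 2-edge and a P-object,
# component c1 holds a lonely H-object.
component = {"h0": "c0", "e0": "c0", "p0": "c0", "h1": "c1"}
u = {"h0": "H", "e0": "2", "p0": "P", "h1": "H"}

V, r_prime, v = merge_componentwise(component, u, r)
print(sorted(V))
print(all(v[r_prime[x]] == r[u[x]] for x in u))  # right-hand square commutes
```

Objects are merged inside `c0`, while the two H-objects in different components stay apart, in line with the observation in Example 6 that a model-level merge of components is not repeated at the instance level.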

[Fig. 7. Cross-Component Merging: Step 1]

3 Categorical preliminaries

In this section, we present some important auxiliary results that are the basis for the existence of refactoring-induced migrations defined in Section 4. The main outcome is Theorem 15. It uses the fact that certain closure properties and smallness properties of a category $C$ carry over to the category of arrows over an object, such that pullbacks can be seen as simple functors between these slice categories admitting a left-adjoint. We also discuss the general arrow category, but will use this result only for the formal treatment of some examples in Section 2 (see Examples 4 and 6).

In the sequel, $C$ denotes an arbitrary category. For the time being, no assumptions are made, but the choice of category will be restricted to reasonable settings later on. $\mathrm{Ob}_C$ and $\mathrm{Mor}_C$ denote the objects and morphisms of $C$, respectively, and, more precisely, for $A, B \in \mathrm{Ob}_C$ the set 13 of all arrows from $A$ to $B$ is denoted by $\mathrm{Mor}_C(A, B)$. If an object $A$ belongs to a category $C$ we sometimes write $A \in C$ instead of $A \in \mathrm{Ob}_C$. A functor between categories is always denoted by a capital letter. We frequently use the following categories.

13 Here, we follow the definition of [5].

[Fig. 8. Cross-Component Merging: Step 2]

Definition 7 (Slice Category). Let $\mathcal{A}, \mathcal{B}, C$ be categories and $F : \mathcal{A} \to C$, $G : \mathcal{B} \to C$ be functors. Then the slice category $F \downarrow G$ is the category with objects and morphisms
$$\mathrm{Ob}_{F \downarrow G} = \{(A, f, B) : A \in \mathrm{Ob}_{\mathcal{A}},\ B \in \mathrm{Ob}_{\mathcal{B}},\ f \in \mathrm{Mor}_C(FA, GB)\}$$
$$\mathrm{Mor}_{F \downarrow G}\big((A, f, B), (A', f', B')\big) = \{(a, b) \in \mathrm{Mor}_{\mathcal{A}}(A, A') \times \mathrm{Mor}_{\mathcal{B}}(B, B') : f' \circ Fa = Gb \circ f\}$$
i.e., the diagram in Figure 9 commutes. Identities are identity pairs and composition is defined componentwise.

[Fig. 9. Morphisms in the slice category]

If $1$ is the one-object category (with only the identity morphism) and $D$ is any category with fixed $A \in \mathrm{Ob}_D$, there is the constant functor $\mathrm{Const}_A : 1 \to D$ mapping the object to $A$. Special slice categories emerge from the following settings.

$\mathcal{A} = C$, $\mathcal{B} = 1$: $C \downarrow A = (\mathrm{Id} \downarrow \mathrm{Const}_A)$ (the slice category over $A$)
$\mathcal{B} = C$, $\mathcal{A} = 1$: $A \downarrow C = (\mathrm{Const}_A \downarrow \mathrm{Id})$ (the slice category under $A$)
$\mathcal{A} = \mathcal{B} = C$: $C^2 = (\mathrm{Id} \downarrow \mathrm{Id})$ (the arrow category of $C$)
where $\mathrm{Id}$ is the identity functor.

From now on, we assume that $C$ has pullbacks. It can easily be seen that the following mapping indeed defines a functor between slice categories.

Definition 8 (Pullback Functor). Let $M, N \in C$ and $r \in \mathrm{Mor}_C(M, N)$. Then for any arrow $(I \xrightarrow{i} N) \in C \downarrow N$ the pullback $(P \xrightarrow{P_r(i)} M,\ P \xrightarrow{r'} I)$ of $(r, i)$ defines a functor
$$P_r : C \downarrow N \to C \downarrow M$$
sending $\alpha \in \mathrm{Mor}_{C \downarrow N}(i_1, i_2)$ to the unique completion morphism of $(P_r(i_1), \alpha \circ r')$ into the domain of $P_r(i_2)$.

In the following, we will always write $(I \xrightarrow{i} N) \in C \downarrow N$, i.e. the domain name of the object is always given by the capital letter of the object name. It is convenient to write $P_r(I \xrightarrow{i} N) = (P_r(I) \xrightarrow{P_r(i)} M)$ although the notation $P_r(I)$ is not correct (since $P_r$ is not a functor on the object level in $C$). The following result can be found in [6], Section 15.3.

Theorem 9 (Left-Adjoint of Pullback Functor). Let $M, N \in C$ and $r \in \mathrm{Mor}_C(M, N)$. Then the functor
$$F_r : C \downarrow M \to C \downarrow N, \quad (j_1 \xrightarrow{\alpha} j_2) \mapsto (r \circ j_1 \xrightarrow{\alpha} r \circ j_2)$$
is the left-adjoint of $P_r$.

In the sequel, we will write $F \dashv G$ to denote the fact that $F$ is left-adjoint to $G$. Theorem 9 shows that in Example 1 from Section 2 the migration is the composition $F_r \circ P_l$ of applying the pullback functor (on the left-hand side of the refactorisation) and the left-adjoint from Theorem 9 (on the right-hand side). But, as discussed in Section 2, typing morphisms often have to be restricted. Thus, we will not always consider $P_r$ as a functor on the whole category $C \downarrow N$, but also on appropriate subcategories. If, e.g., each model object shall possess a default value, it is reasonable to consider the subcategory of epic arrows together with all morphisms between them.
Because we want to pursue the adjointness aspects we have to look for reflective subcategories, such that the left-adjoint functor from Theorem 9 can be composed with the left-adjoint of an inclusion. In order to obtain useful results, it is now necessary to establish certain restrictions on the underlying categories.
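The adjunction $F_r \dashv P_r$ of Theorem 9 amounts to a bijection between morphisms $F_r(j) \to i$ over $N$ and morphisms $j \to P_r(i)$ over $M$. The following brute-force sketch counts both sides for small finite sets. It is a sanity check under stated assumptions, not a proof: objects of a slice category are encoded as dictionaries from elements to the base, and the names `functions`, `hom_over`, `F` and `P` are hypothetical.

```python
from itertools import product


def functions(dom, cod):
    """Enumerate all functions dom -> cod as dictionaries."""
    dom, cod = list(dom), list(cod)
    for values in product(cod, repeat=len(dom)):
        yield dict(zip(dom, values))


def hom_over(j1, j2):
    """Morphisms j1 -> j2 in a slice category: maps alpha with j2 o alpha = j1."""
    return [a for a in functions(j1, j2) if all(j2[a[x]] == j1[x] for x in j1)]


def F(r, j):
    """F_r on objects: post-composition with r."""
    return {x: r[j[x]] for x in j}


def P(r, i):
    """P_r on objects: the pullback projection P -> M."""
    return {(m, x): m for m in r for x in i if r[m] == i[x]}


r = {"m1": "n1", "m2": "n1"}            # r: M -> N
j = {"a": "m1", "b": "m2"}              # j: J -> M
i = {"x": "n1", "y": "n1", "z": "n2"}   # i: I -> N

left = hom_over(F(r, j), i)    # morphisms F_r(j) -> i over N
right = hom_over(j, P(r, i))   # morphisms j -> P_r(i) over M
print(len(left), len(right))   # both hom-sets have the same size
```

Equal cardinalities are of course weaker than naturality of the bijection; the sketch only illustrates why post-composition with $r$ is the natural candidate for the left-adjoint.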

Definition 10 (Subobjects and Quotient Objects). Let $A \in \mathrm{Ob}_C$. If $B \xrightarrow{m} A$ is a monic arrow, we call $(B, m)$ a subobject of $A$. If $A \xrightarrow{e} Q$ is an epic arrow, we call $(Q, e)$ a quotient object of $A$. Two subobjects $(B, m)$ and $(B', m')$ of $A$ are said to be isomorphic if there is a $C$-isomorphism $B \xrightarrow{i} B'$ such that $m = m' \circ i$. Two quotient objects $(Q, e)$ and $(Q', e')$ are called isomorphic if there is an isomorphism $Q \xrightarrow{i} Q'$ such that $i \circ e = e'$.

We require that the following conditions hold:
1. $C$ is complete (i.e. $C$ possesses all small limits).
2. $C$ is wellpowered and co-wellpowered (i.e., for each object $A \in C$ each class of non-isomorphic subobjects (resp. quotient objects) of $A$ is a set).

Later on, we will show that these restrictions cover all structures needed for our purposes. Thus, in connection with our analysis of subcategories, the following result is important (see [5]) 14.

Lemma 11 (Characterisation of Epi-Reflection). Let $C$ be a category that fulfills conditions 1 and 2 above and $U$ be a full subcategory of $C$. The inclusion functor $I : U \to C$ has a left adjoint with epimorphic unit if and only if $U$ is closed under the formation of products and extremal subobjects 15.

We now want to investigate in which way conditions 1 and 2 carry over from $C$ to $C \downarrow A$ for any $A \in \mathrm{Ob}_C$. To demonstrate some functorial properties in this context, we give a more or less detailed proof for this fact.

Lemma 12. Any morphism $\alpha \in \mathrm{Mor}_{C \downarrow A}$ is an isomorphism (monomorphism, epimorphism resp.) in $C \downarrow A$ if and only if it is an isomorphism (monomorphism, epimorphism resp.) in $C$.

Proof. It is straightforward to show that $\alpha \in \mathrm{Mor}_{C \downarrow A}$ is an isomorphism (monomorphism, resp.) in $C \downarrow A$ if and only if $\alpha$ is an isomorphism (monomorphism, resp.) in $C$, and that if $\alpha$ is an epimorphism in $C$, it is also an epimorphism in $C \downarrow A$. To show the reverse statement for epimorphisms, we define the forgetful functor
$$\Sigma_A : C \downarrow A \to C, \quad (j_1 \xrightarrow{\alpha} j_2) \mapsto (J_1 \xrightarrow{\alpha} J_2)$$
where $(J_1 \xrightarrow{j_1} A), (J_2 \xrightarrow{j_2} A) \in C \downarrow A$. In [7], it is shown that $\Sigma_A \dashv (\_ \times A)$, where the functor $\_ \times A$ sends each $B \in C$ to the projection of $B \times A$ into $A$ and each morphism $f : B \to C$ to $(f, \mathrm{id}) : B \times A \to C \times A$. 16 Thus, as a left-adjoint, $\Sigma_A$ also preserves epimorphisms. Hence, $\alpha \in \mathrm{Mor}_{C \downarrow A}$ is an epimorphism in $C \downarrow A$ if and only if $\alpha$ is an epimorphism in $C$.

14 The result is a combination of statements in Chapters 12 and 16 of [5].
15 $A$ is an extremal subobject of $B$ if there is an extremal monomorphism $A \xrightarrow{m} B$, i.e., a monomorphism $m$ with the property: $m = f \circ e$ and $e$ is epic $\Rightarrow$ $e$ is an isomorphism.
16 See Proposition 1.33 in [7].

Lemma 13. Let $b_1, b_2, b \in C \downarrow A$.
If $b_1 \xrightarrow{m_1} b$, $b_2 \xrightarrow{m_2} b$ are monomorphisms in $C \downarrow A$ such that $(B_1, m_1)$, $(B_2, m_2)$ are isomorphic subobjects of $B$ in $C$, then $(b_1, m_1)$, $(b_2, m_2)$ are isomorphic subobjects of $b$ in $C \downarrow A$.
If $b \xrightarrow{e_1} b_1$, $b \xrightarrow{e_2} b_2$ are epimorphisms in $C \downarrow A$ such that $(B_1, e_1)$, $(B_2, e_2)$ are isomorphic quotient objects of $B$ in $C$, then $(b_1, e_1)$, $(b_2, e_2)$ are isomorphic quotient objects of $b$ in $C \downarrow A$.

Proof. Let $i : B_1 \to B_2$ be an isomorphism with $m_2 \circ i = m_1$. It suffices to show that $i$ is also a $C \downarrow A$-morphism, because then, by Lemma 12, it is a $C \downarrow A$-isomorphism commuting with $m_1$ and $m_2$. Because $m_1, m_2 \in \mathrm{Mor}_{C \downarrow A}$ we have
$$b \circ m_1 = b_1 \quad (1)$$
$$b \circ m_2 = b_2 \quad (2)$$
such that
$$b_2 \circ i = b \circ m_2 \circ i \quad \text{by (2)}$$
$$\phantom{b_2 \circ i} = b \circ m_1 \quad \text{because } (B_1, m_1), (B_2, m_2) \text{ are isomorphic}$$
$$\phantom{b_2 \circ i} = b_1 \quad \text{by (1)}$$
which yields $i \in \mathrm{Mor}_{C \downarrow A}$.
If $i$ is an isomorphism of the quotient objects $(B_1, e_1)$ and $(B_2, e_2)$, we have $i \circ e_1 = e_2$. Once again, it suffices to show that $i \in \mathrm{Mor}_{C \downarrow A}$. Because
$$b = b_1 \circ e_1 \quad (3)$$
$$b = b_2 \circ e_2 \quad (4)$$
we obtain
$$b_2 \circ i \circ e_1 = b_2 \circ e_2 \quad \text{because } (B_1, e_1), (B_2, e_2) \text{ are isomorphic}$$
$$\phantom{b_2 \circ i \circ e_1} = b \quad \text{by (4)}$$
$$\phantom{b_2 \circ i \circ e_1} = b_1 \circ e_1 \quad \text{by (3).}$$
Because $e_1$ is epic, $b_2 \circ i = b_1$, i.e. $i \in \mathrm{Mor}_{C \downarrow A}$.

Lemma 14 (Completeness and Smallness carry over to Slice Category). Let $C$ be a category that fulfills conditions 1 and 2 and $A \in \mathrm{Ob}_C$. Then conditions 1 and 2 hold in $C \downarrow A$ and, if $C$ has finite colimits, 1 and 2 hold in $C^2$.

Proof. The second statement follows from the fact that $C^2$ is isomorphic to the category of functors from $2$ to $C$, where $2$ is the two-object category with a single non-identity arrow. These functor categories are known to have limits if $C$ has (by componentwise limit construction, see [8]) and are wellpowered and co-wellpowered if $C$ possesses these properties and has finite colimits (see [9]).

Now we consider the first statement: It is shown in [10] that closure w.r.t. limits carries over to the slice category $C \downarrow A$. It remains to show that the smallness property (Condition 2) is transferred from $C$ to $C \downarrow A$. Let therefore $(b_i \xrightarrow{m_i} b)_{i \in I}$ be a family of non-isomorphic subobjects of $b \in C \downarrow A$ indexed by some class $I$. By Lemma 12 the monic arrows are monic in $C$, producing a family
$$(B_i \xrightarrow{m_i} B)_{i \in I} \quad (5)$$
of subobjects of $B \in C$. Assume now that $(B_i, m_i)$ and $(B_j, m_j)$ are isomorphic for some pair $i, j$. Then, by Lemma 13, $(b_i, m_i)$ and $(b_j, m_j)$ are isomorphic subobjects of $b$ in $C \downarrow A$, which yields $i = j$. Thus (5) is a family of pairwise non-isomorphic $B$-subobjects indexed by the class $I$ and thus, by Condition 2, $I$ is a set. Hence the original family of $b$-subobjects is a set as well. The same argument can be used for a family of non-isomorphic quotient objects, which concludes the proof of the Lemma.

The following result concludes our considerations of adjointness of the pullback functor.

Theorem 15 (Free Construction into Subcategories). Let $C$ be a category which fulfills conditions 1 and 2, $M, N \in \mathrm{Ob}_C$, $r \in \mathrm{Mor}_C(M, N)$, and $U$ be a full subcategory of $C \downarrow N$ that is closed under the formation of products and extremal subobjects. Let $I_U : U \to C \downarrow N$ be the corresponding inclusion functor. Then the restriction of the pullback functor $P_r \circ I_U : U \to C \downarrow M$ has a left-adjoint.

Proof. Theorem 9 gives $F_r \dashv P_r$. From Lemma 14, we deduce that $C \downarrow N$ also fulfills conditions 1 and 2, such that by Lemma 11 there is a functor $F_U$ such that $F_U \dashv I_U$. Since left-adjoints compose, $F_U \circ F_r \dashv P_r \circ I_U$.

A comfortable notation for the free functor into the subcategory will be $F_r$ (ignoring that this was already the notation in Theorem 9), if it is clear which subcategory is considered. We conclude this section with two specialisations of the developed theory.
Let $Spec = (S, Op, I)$ be an implicational algebraic specification, i.e., a family $S$ of sorts together with a family $Op$ of $(S^* \times S)$-indexed operation symbols and a family $I$ of implications. E.g., the specification $Graph$ in Section 2 is structured like this. It is shown in [11] that the category $C = \mathrm{Alg}(Spec)$ of all algebras that conform to such an implicational specification is closed under the formation of small products and subalgebras. Since for the existence of limits it suffices to have equalisers (which are subobjects, i.e., subalgebras) and small products, this shows that $\mathrm{Alg}(Spec)$ fulfills Condition 1. Moreover, $\mathrm{Alg}(Spec)$ admits a forgetful functor $V : \mathrm{Alg}(Spec) \to SET$. In [5] it is shown that each category that admits a faithful functor to $SET$ 17 (which is true for $V$) is regular wellpowered and regular co-wellpowered. Since monomorphisms (epimorphisms resp.) are regular monomorphisms (regular epimorphisms, resp.) in $\mathrm{Alg}(Spec)$, we conclude that $\mathrm{Alg}(Spec)$ also fulfills Condition 2.

17 A functor is called faithful if it is injective on the hom-sets.

14 Harald König, Michael Löwe, Christoph Schulz In addition to that, Alg(Spec) has all colimits. This can be seen, using the fact, that Alg(Spec) is a full subcategory with the necessary closure properties (see above) to admit a left-adjoint of the inclusion functor I : Alg(Spec) Alg(S, Op) where Alg(S, Op) denotes the category of all algebras w.r.t. to the signature (S, Op). Alg(S, Op) has pushouts [12] and, since there is an initial object, Alg(S, Op) has colimits which are preserved by the above mentioned left-adjoint of the inclusion. Taking Lemma 14 into account, these considerations can be summarised as follows. Corollary 16. Let Spec be an arbitrary implicational algebraic specification. Then Theorem 15 holds for C = Alg(Spec) and for C = (Alg(Spec)) 2. 4 Migration framework From now on, we always assume that C satisfies conditions 1 and 2 on page 11. Definition 17 (Refactorisation Rule). Let M, K, N Ob C. The span M l r K N is called a refactorisation (rule). Definition 18 (Migration and Migration Functor). Let M l K r N be a refactorisation, and U M, U K, U N be subcategories of C M, C K, C N resp. Assume that the pullback functors conform with the subcategory structures, i.e., P l : U M U K and P r : U N U K. If the left-adjoint F r : U K U N of P r : U N U K exists, then { C M C N M : i (F r P l )(i) is called the Migration Functor. If i C M is typed data the Migration of i is M(i). Theorem 19. The migration from Definition 18 exists, if U N is a full subcategory of C N that is closed under the formation of products and extremal subobjects. It is unique up to isomorphism. Proof. The existence statement follows from Theorem 15. Uniqueness is guaranteed, because pullbacks and left-adjoints are unique up to isomorphism. 
In the case $C = \mathrm{Alg}(Spec)$ (or in the corresponding arrow category), the proof for the existence of the free construction into the subcategory is constructive: Given $A \in \mathrm{Alg}(Spec)$, in the first step one determines the set of all non-isomorphic quotient objects of $A$ that lie in the subcategory. In every practical example the carrier sets of $A$ are finite, so this set is also finite. The solution then is the epi-part in the epi-mono-factorisation of the product of these quotient objects, which can be computed. Moreover, the pullback functor $P_l$ and the functor $F_r$ are computable. Thus, by Theorem 15, $\mathcal{M}$ is computable, which can be used for tool support to compute refactoring-induced migrations.
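The computability remark can be illustrated in the simplest setting. The following is a minimal sketch, assuming $C = \mathrm{SET}$ with finite sets and no restriction on the subcategories, so that the left-adjoint $F_r$ is plain postcomposition with $r$ and no epi-reflection step is needed. Functions are encoded as Python dicts; the names `pullback_along`, `postcompose`, `migrate` and the example model are hypothetical and not taken from the report.

```python
def pullback_along(l, i):
    """Pullback functor P_l on finite sets (assumed encoding: dicts).

    l : dict K -> M (left-hand side of the rule)
    i : dict I -> M (typed data over M)
    Returns typed data over K: every pair (k, x) with l[k] == i[x],
    typed by its first component k.
    """
    return {(k, x): k for k in l for x in i if l[k] == i[x]}


def postcompose(r, typed):
    """Left adjoint of P_r on the full slice category: retype along r."""
    return {element: r[k] for element, k in typed.items()}


def migrate(l, r, i):
    """Migration i |-> (F_r . P_l)(i) for the span M <-l- K -r-> N."""
    return postcompose(r, pullback_along(l, i))


# Hypothetical rule: drop the type "Address", rename "Person" to "Customer".
l = {"Person": "Person"}      # K -> M
r = {"Person": "Customer"}    # K -> N
i = {"alice": "Person", "bob": "Person", "a1": "Address"}  # data typed over M

migrated = migrate(l, r, i)
print(migrated)
```

In this sketch, instances whose type has no preimage under `l` are deleted, and a type with several preimages under `l` would have its instances copied once per preimage.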

We now want to show that all those examples in section 2 where a successful migration was defined are migrations in the sense of Definition 18. We already mentioned that this is true for Example 1, see Theorem 9. Consider now Example 4, where the difficulty originates from the treatment of the right-hand side of the construction. Components can be formalised using arrows $A \xrightarrow{f} B$, which induce an equivalence relation $\ker(f)$ on $A$ if there is a forgetful functor $V : C \to \mathrm{SET}$. Thus, the arrow category is a good choice for generalising graphs that are equipped with a component structure. In the following, elements of $\mathrm{Ob}\,C^2$ will always be denoted in the form $G \xrightarrow{g} \underline{G}$, i.e. the domain name of such an object is the capital letter of the object name (always a small letter) and the codomain is the underlined domain name. A morphism in $\mathrm{Mor}_{C^2}(g, h)$ is denoted by a pair of the form $(a, \underline{a})$. The weak typing property can then formally be expressed as follows.

Definition 20 (Weak typing). $(f, \underline{f}) \in \mathrm{Mor}_{C^2}(g, h)$ is called a weak typing if the unique completion $u$ of $(G \xrightarrow{f} H,\ G \xrightarrow{g} \underline{G})$ into the pullback $(P \xrightarrow{f'} H,\ P \xrightarrow{h'} \underline{G})$ of $(\underline{G} \xrightarrow{\underline{f}} \underline{H},\ H \xrightarrow{h} \underline{H})$ is a monomorphism, see Figure 10.

Fig. 10. General definition of Weak Typing

The following Lemma gives a convenient characterisation of weak typings.

Lemma 21 (Characterisation of Weak Typings). $(f, \underline{f}) \in \mathrm{Mor}_{C^2}(g, h)$ is a weak typing if and only if for all $X \in \mathrm{Ob}\,C$ and arrows $x, y : X \to G$
$$f \circ x = f \circ y \;\wedge\; g \circ x = g \circ y \;\Longrightarrow\; x = y \tag{6}$$

Proof. "$\Longrightarrow$": If $f \circ x = f \circ y \wedge g \circ x = g \circ y$, then from Figure 10 we obtain
$$f' \circ u \circ x = f' \circ u \circ y \;\wedge\; h' \circ u \circ x = h' \circ u \circ y.$$
In pullback squares, the two arrows whose domain is the pullback object are jointly monic, which gives $u \circ x = u \circ y$ and thus, because $u$ is monic, $x = y$.

"$\Longleftarrow$": Let two arrows $x, y$ be given such that $u \circ x = u \circ y$. Then also
$$f' \circ u \circ x = f' \circ u \circ y \;\wedge\; h' \circ u \circ x = h' \circ u \circ y,$$
which yields $f \circ x = f \circ y \wedge g \circ x = g \circ y$. Hence, by assumption, $x = y$, so $u$ is monic.
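For finite sets, condition (6) of Lemma 21 reduces to injectivity of the map $a \mapsto (f(a), g(a))$ (take $X$ to be a one-point set). The sketch below checks this, assuming $C = \mathrm{SET}$ and a dict encoding of functions; the function names and the example data are hypothetical.

```python
def is_arrow_morphism(f, f_under, g, h):
    """Check that (f, f_under) : g -> h commutes, i.e. f_under . g == h . f.

    g : dict G -> G_ (component map of the instance)
    h : dict H -> H_ (component map of the model)
    f : dict G -> H, f_under : dict G_ -> H_
    """
    return all(f_under[g[a]] == h[f[a]] for a in g)


def is_weak_typing(f, g):
    """Condition (6) in finite sets: a |-> (f[a], g[a]) must be injective."""
    pairs = [(f[a], g[a]) for a in g]
    return len(pairs) == len(set(pairs))


# Hypothetical instance: three objects in two components "A" and "B".
g = {"c1": "A", "c2": "A", "c3": "B"}
h = {"Customer": "T", "Order": "T"}      # model with one component "T"
f_under = {"A": "T", "B": "T"}

f_ok = {"c1": "Customer", "c2": "Order", "c3": "Customer"}
f_bad = {"c1": "Customer", "c2": "Customer", "c3": "Customer"}

print(is_arrow_morphism(f_ok, f_under, g, h), is_weak_typing(f_ok, g))
print(is_arrow_morphism(f_bad, f_under, g, h), is_weak_typing(f_bad, g))
```

The second typing fails because component "A" contains two instances of "Customer", which matches the reading that each class is instantiated at most once per component.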

Vividly speaking, in algebraic categories with carrier sets, property (6) describes the situation where classes in $H$ are instantiated by $f$ at most once in each component of $G$. This is the case in object composition hierarchies and in inheritance trees, see section 2. In [1] the case of (true) typings (pure 1:1-correspondences), which in fact lead to pullback complements, is also considered.

Theorem 22 (Epi-reflectiveness of Weak Typings). Let $n \in \mathrm{Ob}\,C^2$ and let $U$ be the subcategory of $C^2 \downarrow n$ which consists of all weak typings together with all morphisms between them. Then $U$ is closed under the formation of products and extremal subobjects.

Proof. Consider first a weak typing $i \xrightarrow{(t,\underline{t})} n$ and assume that $j \xrightarrow{(s,\underline{s})} n$ is an extremal subobject of $(t, \underline{t})$. Thus there is a monic arrow $(m, \underline{m}) : (s, \underline{s}) \to (t, \underline{t})$ (the additional property for extremality is not needed). The situation is as in Figure 11.

Fig. 11. Looking at a tent from below

To show that $(s, \underline{s})$ is a weak typing, let two arrows $x, y : X \to J$ be given with
$$j \circ x = j \circ y \;\wedge\; s \circ x = s \circ y \tag{7}$$
By Lemma 21 we have to show that $x = y$. Observe first that by (7)
$$i \circ m \circ x = i \circ m \circ y \;\wedge\; t \circ m \circ x = t \circ m \circ y.$$
Since $(t, \underline{t})$ is a weak typing, by Lemma 21 it follows that
$$m \circ x = m \circ y \tag{8}$$
Now $(x, j \circ x), (y, j \circ y) : id_X \to j$ are two $C^2$-morphisms and we have by assumption $\underline{m} \circ j \circ x = \underline{m} \circ j \circ y$

in Figure 11. Together with (8) this gives
$$(m, \underline{m}) \circ (x, j \circ x) = (m, \underline{m}) \circ (y, j \circ y).$$
By Lemma 12 the $(C^2 \downarrow n)$-monomorphism $(m, \underline{m})$ is also monic in $C^2$, which yields $(x, j \circ x) = (y, j \circ y)$ and hence $x = y$.

Let now $\big(j_i \xrightarrow{(t_i,\underline{t}_i)} n\big)_{i \in I}$ be a family of weak typings in $U$ and $q \xrightarrow{(p,\underline{p})} n$ their product with projections
$$(\pi_i, \underline{\pi}_i) : (p, \underline{p}) \to (t_i, \underline{t}_i).$$
Since the projections are $(C^2 \downarrow n)$-morphisms, the equations
$$t_i \circ \pi_i = p \tag{9}$$
are valid for all $i \in I$. Again, by Lemma 21 we have to show that from the hypothesis
$$p \circ x = p \circ y \;\wedge\; q \circ x = q \circ y \tag{10}$$
one can conclude $x = y$. To do this, observe first that from (9) and (10) we get
$$t_i \circ \pi_i \circ x = t_i \circ \pi_i \circ y \tag{11}$$
for all $i$. Moreover,
$$j_i \circ \pi_i \circ x = \underline{\pi}_i \circ q \circ x \quad \text{(because } (\pi_i, \underline{\pi}_i) \in \mathrm{Mor}_{C^2}(q, j_i)\text{)}$$
$$= \underline{\pi}_i \circ q \circ y \quad \text{(by (10))}$$
$$= j_i \circ \pi_i \circ y \quad \text{(see first conversion)}$$
Since the $(t_i, \underline{t}_i)$ are weak typings, this together with (11) gives $\pi_i \circ x = \pi_i \circ y$ for all $i \in I$ (using again Lemma 21). Since limits are constructed componentwise, $Q$ (the domain of $q$) is the product of all $J_i$ (the domains of the $j_i$) with projections $\pi_i$. This yields $x = y$.

The next Lemma shows that weak typings are stable under pullbacks in any category where Conditions 1 and 2 are valid.

Lemma 23 (Pullbacks preserve Weak Typings). Let $(r, \underline{r}) : a \to b$ be an arrow in $C^2$ and $(i, \underline{i}) : c \to b$ be a weak typing. Then $P_{(r,\underline{r})}(i, \underline{i})$ is a weak typing, too.

Proof. Define $(i', \underline{i}') = P_{(r,\underline{r})}(i, \underline{i}) : p \to a$, such that the pullback situation is as in Figure 12. Let $x, y : X \to P$ be given with
$$p \circ x = p \circ y \quad\text{and}\quad i' \circ x = i' \circ y \tag{12}$$
Because $(r', \underline{r}') \in \mathrm{Mor}_{C^2}(p, c)$ we obtain $c \circ r' \circ x = c \circ r' \circ y$ from the first equation in (12). The commutativity of the diagram in Figure 12 together with the second equation in (12) gives $i \circ r' \circ x = i \circ r' \circ y$. Because $(i, \underline{i})$ is a weak typing, by Lemma 21, $r' \circ x = r' \circ y$. In the pullback, $r'$ and $i'$ are jointly monic, so $x = y$. Thus, by Lemma 21 and (12), $(i', \underline{i}')$ is a weak typing.

Fig. 12. Pullback in arrow category

Suppose that a refactorisation rule $m \xleftarrow{(l,\underline{l})} k \xrightarrow{(r,\underline{r})} n$ in $(\mathrm{Alg}(Spec))^2$ and a weak typing $(t, \underline{t}) : i \to m$ are given. Constructing the migration as in the first step of Construction 5 produces $h$ such that the square (1) in Figure 13 is a pullback diagram. By Lemma 23, $(u, \underline{u})$ is a weak typing, and in [1] it is shown that (concerning the componentwise merging in the second and the third step) the $C^2$-morphisms $(r', \underline{r}')$, $(v, \underline{v})$ have the following properties:

(2) in Figure 13 commutes (13)

$(v, \underline{v})$ is a weak typing (14)

$\underline{r}'$ is an isomorphism, $r'$ is an epimorphism and
$$\forall x, y : X \to H:\ (h \circ x = h \circ y) \Longrightarrow (r \circ u \circ x = r \circ u \circ y \iff r' \circ x = r' \circ y) \tag{15}$$

Fig. 13. Migration in arrow category

Moreover, the diagram (2) enjoys the following universal property: For each triple $\big(h \xrightarrow{(w,\underline{w})} b,\ b \xrightarrow{(t,\underline{t})} a,\ n \xrightarrow{(z,\underline{z})} a\big)$ of $C^2$-morphisms where $(t, \underline{t})$ is a weak typing and $(t, \underline{t}) \circ (w, \underline{w}) = (z, \underline{z}) \circ (r, \underline{r}) \circ (u, \underline{u})$, there is a unique arrow $j \to b$ such that the diagram parts (3) and (4) commute in Figure 14. Moreover, any $j'$ with this property coincides with $j$ up to isomorphism. In [1] we referred to this property as an abstract folding.

Theorem 24 (Componentwise Merging is Free Construction). Let $C = (\mathrm{Alg}(Spec))^2$, $m \xleftarrow{(l,\underline{l})} k \xrightarrow{(r,\underline{r})} n$

be a refactorisation rule and $U_m$, $U_k$, $U_n$ be the subcategories of weak typings in $C \downarrow m$, $C \downarrow k$, $C \downarrow n$, resp. If $i \xrightarrow{(t,\underline{t})} m \in U_m$, then in Figure 13 $(v, \underline{v}) = \mathcal{M}(t, \underline{t})$.

Fig. 14. Universal property of componentwise merging

Proof. We abbreviate $P = P_{(r,\underline{r})}$ and simplify the notation of $C^2$-morphisms by writing $f$ instead of $(f, \underline{f})$. Let $\eta_h : h \to P(j)$ be the universal completion from $h$ into the pullback $\big(P(j) \xrightarrow{P(v)} k,\ P(j) \xrightarrow{\bar{r}} j\big)$, which can be constructed in Figure 13. $\eta_h$ exists because of (13). Thus
$$\bar{r} \circ \eta_h = r' \tag{16}$$
$$P(v) \circ \eta_h = u \tag{17}$$
By Lemma 23 the pullback functor maps weak typings to weak typings in Alg(Spec). By Theorem 22 and Corollary 16 the pullback functor has a left-adjoint. Thus, by the essential uniqueness of left-adjoints, it suffices to show that $j$ in Figure 13 meets the requirements of left-adjointness.

Fig. 15. Componentwise merging is a free construction

Let, therefore, an object $j' \xrightarrow{x} n \in U_n$ be given and let $\varphi : u \to P(x)$

be an arrow, compare Figure 15. Let $\bar{r}' : P(j') \to j'$ be the $j'$-projection in this pullback. Since
$$x \circ \bar{r}' \circ \varphi = r \circ P(x) \circ \varphi \quad \text{(pullback property)}$$
$$= r \circ u \quad \text{(because } \varphi \in \mathrm{Mor}_C(u, P(x))\text{)}$$
and $x$ is a weak typing, from the universal property in Figure 14 we obtain a unique arrow $\varphi^* : j \to j'$ with
$$\varphi^* \circ r' = \bar{r}' \circ \varphi \tag{18}$$
$$x \circ \varphi^* = v \tag{19}$$
Now we can derive
$$\bar{r}' \circ P(\varphi^*) \circ \eta_h = \varphi^* \circ \bar{r} \circ \eta_h \ \text{(by Definition 8)} = \varphi^* \circ r' \ \text{(by (16))} = \bar{r}' \circ \varphi \ \text{(by (18))} \tag{20}$$
and
$$P(x) \circ P(\varphi^*) \circ \eta_h = P(v) \circ \eta_h \ \text{(by (19))} = u \ \text{(by (17))} = P(x) \circ \varphi \ \text{(because } \varphi \in \mathrm{Mor}_C(u, P(x))\text{)} \tag{21}$$
Because we have a pullback situation, $\bar{r}'$ and $P(x)$ are jointly monic, so by (20) and (21)
$$P(\varphi^*) \circ \eta_h = \varphi \tag{22}$$
To show that $\varphi^*$ is the universal lift of $\varphi$, we observe that any other $\varphi'$ with (19) and (22) meets the prerequisites of the universal property in Figure 14, which gives $\varphi' = \varphi^*$. Hence the functorial mapping $u \mapsto v$ is left-adjoint to $P$ and $\eta$ is the unit in this construction.

For later use, we state an important implication of this result. From the well-known homomorphism theorem it follows that (15) is a characterisation of the whole free construction $U_k \to U_n$ of weak typing subcategories, i.e., $(r', \underline{r}')$ is the unit in this construction if and only if (15) holds.¹⁸ In the special case $(u, \underline{u}) = id$ (certainly a weak typing), we obtain a characterisation of the part $F_n : (C^2 \downarrow n) \to U_n$ of this construction: For all $h \xrightarrow{(r,\underline{r})} n \in \mathrm{Ob}(C^2 \downarrow n)$, the unit $(r', \underline{r}') : (r, \underline{r}) \to F_n(r, \underline{r})$ of $F_n$ is characterised by the following properties: $\underline{r}'$ is an isomorphism, $r'$ is an epimorphism and
$$\forall x, y : X \to H:\ (h \circ x = h \circ y) \Longrightarrow (r \circ x = r \circ y \iff r' \circ x = r' \circ y) \tag{23}$$

¹⁸ Note that the epimorphism property of $r'$ also follows from Theorem 11.

5 Sequential Composition

In this section, we investigate under which circumstances refactorisations can be sequentially composed. Consider two refactorisation rules in a category where Conditions 1 and 2 hold. It is clear that
$$R_1 = (m_0 \xleftarrow{\lambda_1} k_0 \xrightarrow{\rho_1} m_1) \quad\text{and}\quad R_2 = (m_1 \xleftarrow{\lambda_2} k_1 \xrightarrow{\rho_2} m_2) \tag{24}$$

can naturally be combined to one span by using the pullback of $\rho_1$ and $\lambda_2$¹⁹:

Definition 25 (Sequential Composition of Refactorisation Rules). Let two refactorisation rules $R_1$ and $R_2$ be given as above. The sequential composition is defined by
$$R_2 \circ R_1 = (m_0 \xleftarrow{\lambda_1 \circ \lambda_2'} m_1' \xrightarrow{\rho_2 \circ \rho_1'} m_2)$$
where $(m_1' \xrightarrow{\lambda_2'} k_0,\ m_1' \xrightarrow{\rho_1'} k_1)$ is the pullback of $(k_0 \xrightarrow{\rho_1} m_1,\ k_1 \xrightarrow{\lambda_2} m_1)$, see Figure 16.

Fig. 16. Sequential Composition

It is now important to establish the connection between a composed migration and a migration of a composition: if $\mathcal{M}_{R_1}$ and $\mathcal{M}_{R_2}$ are the migration functors for $R_1$, $R_2$, resp., the question is whether
$$\mathcal{M}_{R_2} \circ \mathcal{M}_{R_1} \overset{?}{=} \mathcal{M}_{R_2 \circ R_1}$$
holds, where the latter is the migration functor of $R_2 \circ R_1$. It is obvious that the validity of this statement depends on the subcategories under consideration: If, e.g., the target subcategory into which $\mathcal{M}_{R_1}$ constructs is in a certain sense smaller than the target of $\mathcal{M}_{R_2}$, one cannot expect that the composition (which omits the intermediate step into the target of $\mathcal{M}_{R_1}$) determines the same result. Thus, we have to find a criterion to compare subcategories of different slice categories. Since we only work with epi-reflective subcategories, we can always assume that the left-adjoints of the inclusion functors exist.

For the investigation of the composition procedure, we consider two sequentially composed refactorisation rules as in Figure 17. In this diagram, $R_1$ and $R_2$ are given as in (24),
$$\alpha_1 = \mathcal{M}_{R_1}(\alpha_0) \quad\text{and}\quad \alpha_2 = \mathcal{M}_{R_2}(\alpha_1), \tag{25}$$
$(i_1' \xrightarrow{\lambda_2'} h_0,\ i_1' \xrightarrow{\rho_1'} h_1)$ is the pullback of $(h_0 \xrightarrow{\rho_1} i_1,\ h_1 \xrightarrow{\lambda_2} i_1)$, and $\gamma$ is the unique completion of $\beta_0 \circ \lambda_2'$ and $\beta_1 \circ \rho_1'$ into the top face pullback. Assume that in Figure 17 the construction takes place on appropriate subcategories of the involved slice categories according to the requirements of Theorem 15.
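Definition 25 can be made concrete for finite sets. The sketch below assumes $C = \mathrm{SET}$ and encodes each leg of a span as a dict; the apex of the composed span is the pullback $\{(a, b) \mid \rho_1(a) = \lambda_2(b)\}$. The function name `compose_spans` and the example rules are hypothetical.

```python
def compose_spans(lam1, rho1, lam2, rho2):
    """Sequential composition R2 . R1 of two spans of finite sets.

    R1 = (m0 <-lam1- k0 -rho1-> m1), R2 = (m1 <-lam2- k1 -rho2-> m2).
    The apex is the pullback of rho1 and lam2; the composed legs are
    lam1 . (first projection) and rho2 . (second projection).
    """
    apex = [(a, b) for a in rho1 for b in lam2 if rho1[a] == lam2[b]]
    left = {pair: lam1[pair[0]] for pair in apex}
    right = {pair: rho2[pair[1]] for pair in apex}
    return left, right


# Hypothetical rules: R1 merges the types A and B into C,
# R2 splits C into the two copies E and F and renames D to G.
lam1 = {"A": "A", "B": "B"}
rho1 = {"A": "C", "B": "C"}
lam2 = {"C1": "C", "C2": "C", "D": "D"}
rho2 = {"C1": "E", "C2": "F", "D": "G"}

left, right = compose_spans(lam1, rho1, lam2, rho2)
print(left)
print(right)
```

The element "D" of $k_1$ does not appear in the apex because nothing in $k_0$ is mapped to "D" by $\rho_1$.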
¹⁹ In this section we denote objects of the category $C$ with small letters and morphisms between them with Greek letters.

Let especially $I_m : U_m \to C \downarrow m_1$

and $I_k : U_k \to C \downarrow k_1$ be the inclusion functors for these subcategories, where $F_m$ and $F_k$ are their left-adjoints.

Fig. 17. Migration sequence

Theorem 26 (Sequential Composition of Migrations). Let
$$R_1 = (m_0 \xleftarrow{\lambda_1} k_0 \xrightarrow{\rho_1} m_1) \quad\text{and}\quad R_2 = (m_1 \xleftarrow{\lambda_2} k_1 \xrightarrow{\rho_2} m_2)$$
be two refactorisation rules with migration functors $\mathcal{M}_{R_1}$ and $\mathcal{M}_{R_2}$. Let $R_2 \circ R_1$ be defined as in Definition 25, $\mathcal{M}_{R_2 \circ R_1}$ its migration functor, and $i_0 \xrightarrow{\alpha_0} m_0$ be an arbitrary element of the subcategory of $C \downarrow m_0$, such that $\alpha_2 = \mathcal{M}_{R_2}(\mathcal{M}_{R_1}(\alpha_0))$ (compare (25)). If
$$P_{\lambda_2} \circ F_m = F_k \circ P_{\lambda_2}, \tag{26}$$
then
$$\alpha_2 = \mathcal{M}_{R_2 \circ R_1}(\alpha_0).$$

Proof. We claim that
$$\beta_1 = F_{\rho_1'}(\gamma) \tag{27}$$
where $F_{\rho_1'}$ is the free construction of the restriction of the pullback functor $P_{\rho_1'}$ into $U_k$ as in Theorem 15. Assume for the moment that (27) holds. Then we can argue as follows: By the composition and decomposition properties of pullbacks, we can easily deduce that the left face of the cube in Figure 17 is a pullback, too. By construction $\alpha_2 = F_{\rho_2}(\beta_1)$ and thus
$$\alpha_2 = (F_{\rho_2} \circ F_{\rho_1'})(\gamma) \quad \text{(by (27))}$$
$$= F_{\rho_2 \circ \rho_1'}(\gamma) \quad \text{(because left-adjoints compose)}$$
$$= F_{\rho_2 \circ \rho_1'}(P_{\lambda_1 \circ \lambda_2'}(\alpha_0)) \quad \text{(because pullbacks compose)}$$
$$= \mathcal{M}_{R_2 \circ R_1}(\alpha_0) \quad \text{(by Definition 18 for the composed migration)}$$

It remains to show (27). To do this, we extract the cube of Figure 17 and consider it in more detail, see Figure 18.

Fig. 18. Extract from Figure 17

As in the proof of Theorem 15 we split the construction $\alpha_1 = F_{\rho_1}(\beta_0)$ into its two parts: $F_{\rho_1} = F_m \circ F$ where $F(\beta_0) = \rho_1 \circ \beta_0$ as in Theorem 9. If we now construct the pullback of $k_1 \xrightarrow{\lambda_2} m_1$ and $h_0 \xrightarrow{\rho_1 \circ \beta_0} m_1$, we obtain an object which turns out to be $i_1'$, because in the cube the right, top and left faces are pullbacks and thus the bottom face as well. Hence $h_0 \xrightarrow{id} h_0$ is transferred by this pullback procedure to the identity arrow at the lower back edge. This gives
$$P_{\lambda_2}(\rho_1 \circ \beta_0) = \rho_1' \circ \gamma \tag{28}$$
which, by Theorem 9, is the image of $\gamma$ under the left-adjoint of the pullback functor $P_{\rho_1'}$. Thus, we obtain (27) from
$$\beta_1 = P_{\lambda_2}(\alpha_1) \quad \text{(by construction)}$$
$$= P_{\lambda_2}(F_m(\rho_1 \circ \beta_0)) \quad \text{(by (25) and Theorem 15)}$$
$$= F_k(P_{\lambda_2}(\rho_1 \circ \beta_0)) \quad \text{(by assumption (26))}$$
$$= F_k(\rho_1' \circ \gamma) \quad \text{(by (28))}$$
$$= F_{\rho_1'}(\gamma) \quad \text{(by Theorem 15)}$$

Thus, we are able to compose a sequence of several refactorisation steps into one migration, which captures the effect of the whole sequence. On the other hand, refactorisation rules are decomposable into atomic actions, supporting reuse of already constructed migration steps.

Now the question arises in which cases the strong assumption (26) holds. The first observation is

Lemma 27. Let $C = (\mathrm{Alg}(Spec))^2$ and let all subcategories be weak typings in Theorem 26. Then (26) is satisfied.

Fig. 19. The tent from Figure 18

Proof. A relevant situation is shown in Figure 19 with slightly different notations. If the left and right sides are pullbacks along $\lambda$, the bottom face is a pullback as well. Let $\eta = (f, \underline{f})$ be the unit of the free construction into the subcategory of $C \downarrow m$ of weak typings. By (23), $\underline{f}$ is an isomorphism and $f$ is epic. Because limits are constructed pointwise in functor categories and pullbacks preserve epimorphisms in Alg(Spec), these properties transfer to $\underline{g}$ and $g$ in $P_\lambda(\eta) = (g, \underline{g})$. By Lemma 23, the weak typing property of $\tau'$ carries over to $P_\lambda(\tau')$. Let now two arrows $x, y : X \to P$ be given such that $p \circ x = p \circ y$. Using Lemma 21 we obtain
$$P_\lambda(\tau) \circ x = P_\lambda(\tau) \circ y$$
$$\iff P_\lambda(\tau') \circ P_\lambda(\eta) \circ x = P_\lambda(\tau') \circ P_\lambda(\eta) \circ y$$
$$\iff P_\lambda(\eta) \circ x = P_\lambda(\eta) \circ y \quad \text{(because } P_\lambda(\tau') \text{ is a weak typing)}$$
By (23) this characterises the unit in the free construction into the subcategory of $C \downarrow k$, such that
$$F_k \circ P_\lambda = P_\lambda \circ F_m.$$

Weak typings also cover the case where no restriction is put on the typing morphisms and no componentisation of models and instances is performed, because each $C$ is a subcategory of $C^2$ by the embedding functor
$$(A \xrightarrow{f} B) \;\mapsto\; \big((A \xrightarrow{id_A} A) \xrightarrow{(f,f)} (B \xrightarrow{id_B} B)\big).$$
Thus Lemma 27 and Theorem 26 can be summarised by

Corollary 28. In the two cases

- $C = \mathrm{Alg}(Spec)$ with no restriction on the subcategories
- $C = (\mathrm{Alg}(Spec))^2$ with the subcategories of weak typings

each migration sequence is equal to the migration of the sequential composition of two refactorisation rules, i.e.,
$$\mathcal{M}_{R_2} \circ \mathcal{M}_{R_1} = \mathcal{M}_{R_2 \circ R_1}.$$

Another characterisation for compatible migration sequences can be formulated in cartesian closed categories. [10] (chapter 11) shows that for any $k, m \in \mathrm{Ob}\,C$ and $k \xrightarrow{\lambda} m$ the pullback functor $P_\lambda : C \downarrow m \to C \downarrow k$ possesses a right adjoint $\Pi_\lambda : C \downarrow k \to C \downarrow m$ in cartesian closed categories. An example for a cartesian closed category is Alg(Graph), because this category is isomorphic to $\mathrm{SET}^S$, the category of functors from the term category $S$ in Figure 1 to SET. [10] (chapter 20) shows that cartesian closedness carries over to the functor category $(\mathrm{Alg}(Graph))^2$.

Theorem 29 (Sequential Composition of Migrations in Cartesian Closed Categories). Let $C$ be cartesian closed and
$$R_1 = (m_0 \xleftarrow{\lambda_1} k_0 \xrightarrow{\rho_1} m_1) \quad\text{and}\quad R_2 = (m_1 \xleftarrow{\lambda_2} k_1 \xrightarrow{\rho_2} m_2)$$
be two refactorisation rules with migration functors $\mathcal{M}_{R_1}$ and $\mathcal{M}_{R_2}$ as in Theorem 26. Let $R_2 \circ R_1$ be defined as in Definition 25 and $\mathcal{M}_{R_2 \circ R_1}$ its migration functor. If
$$P_{\lambda_2} : U_m \to U_k \quad\text{and}\quad \Pi_{\lambda_2} : U_k \to U_m \tag{29}$$
then
$$\mathcal{M}_{R_2} \circ \mathcal{M}_{R_1} = \mathcal{M}_{R_2 \circ R_1}.$$

Proof. We recall the inclusion functors $I_m : U_m \to C \downarrow m_1$ and $I_k : U_k \to C \downarrow k_1$. The second part of the assumption (29) can be reformulated as $I_m \circ \Pi_{\lambda_2} = \Pi_{\lambda_2} \circ I_k$. Because of the first part of assumption (29), the left-adjoints of the functors in this equation are equal as well, yielding $P_{\lambda_2} \circ F_m = F_k \circ P_{\lambda_2}$. This is just (26), so the result follows from Theorem 26.

This reflects the fact that the subcategories in the middle of the composed migration have to be somehow similar. Here, the condition expresses that the pullback functor and its reverse (the right-adjoint in this case) respect the subcategories.
We see that the difference between $U_m$ and $U_k$ must not be too big to allow composition compatibility.
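The first case of Corollary 28 can be checked on a small example. The sketch below assumes $C = \mathrm{SET}$ (an Alg(Spec) with one sort and no operations), finite sets, and no restriction on the subcategories, so that each migration is a pullback followed by postcomposition. Two typed data sets over the same finite set are isomorphic in the slice category exactly when all fibres have the same size, which is what `fibre_sizes` compares. All names and the example rules are hypothetical.

```python
from collections import Counter


def pullback_along(l, i):
    """P_l on finite sets: pairs (k, x) with l[k] == i[x], typed by k."""
    return {(k, x): k for k in l for x in i if l[k] == i[x]}


def migrate(l, r, i):
    """Unrestricted migration in SET: pull back along l, then retype along r."""
    return {element: r[k] for element, k in pullback_along(l, i).items()}


def compose_spans(lam1, rho1, lam2, rho2):
    """Composed span R2 . R1 via the pullback of rho1 and lam2."""
    apex = [(a, b) for a in rho1 for b in lam2 if rho1[a] == lam2[b]]
    return ({p: lam1[p[0]] for p in apex}, {p: rho2[p[1]] for p in apex})


def fibre_sizes(typed):
    """Number of instances per type; equal counts mean isomorphic typed data."""
    return Counter(typed.values())


# Hypothetical rules: R1 merges A and B into C, R2 copies C into E and F.
lam1 = {"A": "A", "B": "B"}
rho1 = {"A": "C", "B": "C"}
lam2 = {"C1": "C", "C2": "C", "D": "D"}
rho2 = {"C1": "E", "C2": "F", "D": "G"}

alpha0 = {"x1": "A", "x2": "A", "y1": "B"}   # data typed over m0

stepwise = migrate(lam2, rho2, migrate(lam1, rho1, alpha0))
left, right = compose_spans(lam1, rho1, lam2, rho2)
at_once = migrate(left, right, alpha0)

print(fibre_sizes(stepwise))
print(fibre_sizes(at_once))
```

Both routes yield three instances of "E" and three of "F", so the results agree up to isomorphism in this example. The comparison is only up to isomorphism because the two constructions name the resulting elements differently.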