UTILITY FUNCTIONS FOR CETERIS PARIBUS PREFERENCES


Computational Intelligence, Volume 20, Number 2, 2004

UTILITY FUNCTIONS FOR CETERIS PARIBUS PREFERENCES

MICHAEL MCGEACHIE, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
JON DOYLE, Department of Computer Science, North Carolina State University, Raleigh, NC

Ceteris paribus (all-else-equal) preference statements concisely represent preferences over outcomes or goals in a way natural to human thinking. Although deduction in a logic of such statements can compare the desirability of specific conditions or goals, many decision-making methods require numerical measures of degrees of desirability. To permit ceteris paribus specifications of preferences while providing quantitative comparisons, we present an algorithm that compiles a set of qualitative ceteris paribus preferences into an ordinal utility function. Our algorithm is complete for a finite universe of binary features. Constructing the utility function can, in the worst case, take time exponential in the number of features, but common independence conditions reduce the computational burden. We present heuristics using utility independence and constraint-based search to obtain efficient utility functions.

Key words: qualitative decision theory, rationality, ceteris paribus preferences.

1. INTRODUCTION

Some works on what one can call qualitative decision theories (Doyle and Thomason 1999) study reasoning and algorithmic methods over logical languages of qualitative preference rankings, especially methods that work by decomposing outcomes into their component qualities or features and then reasoning about the relative desirability of alternatives possessing different features (see, for example, Doyle, Shoham, and Wellman 1991; Bacchus and Grove 1996; La Mura and Shoham 1999). These theories provide mechanisms for stating conditions under which possible outcomes in one set are preferred to outcomes in another set by a given decision maker.
Quantitative approaches to decision theory, in contrast, typically start with numeric valuations of each attribute, direct estimation of a decision maker's utility function, or detailed estimations of the probabilities of different event occurrences given certain actions. In many domains, qualitative rankings of desirability provide a more natural and appropriate starting point than quantitative rankings (Wellman and Doyle 1991). As with qualitative representations of probabilistic information (Wellman 1990), the primary qualitative relationships oftentimes immediately suggest themselves. Primary qualitative relationships also remain unchanged despite fluctuating details of how one condition trades off against others, and can determine some decisions without detailed knowledge of such tradeoffs. For example, in the domain of configuration problems one seeks to assemble a complicated system according to constraints described in terms of qualitative attributes, such as the type of a computer motherboard determining the kind of processors possible. Some user of a configuration system might prefer fast processors over slow ones without any special concern for the exact speeds. Purely qualitative comparisons do not suffice in all cases, especially when the primary attributes of interest are quantitative measures, or when outcomes differ in respect to some easily measurable quantity. For example, in standard economic models the utility of money is proportional to the logarithm of the amount of money. Quantitative comparisons are also needed when the decision requires assessing probabilistic expectations, as in maximizing expected utility. Such expectations require cardinal as opposed to ordinal measures of desirability. In other cases, computational costs drive the demand for quantitative comparisons.

© 2004 Blackwell Publishing, 350 Main Street, Malden, MA 02148, USA, and 9600 Garsington Road, Oxford OX4 2DQ, UK.

For example, systems manipulating lists of alternatives sorted by desirability might be best organized by computing degrees of desirability at the start and then working with these easily sorted keys throughout the rest of the process, rather than repeatedly re-determining pairwise comparisons. These considerations suggest that different decision-making processes place different demands on the representation of preference information, just as demands on representations of other types of information vary across activities. Although some processes might get by with a purely quantitative or purely qualitative representation, one expects that the activities of automated rational agents will require both representations for different purposes, and will need to transfer preference information from one representation to the other. In formulating a new decision, for example, the agent might need to reason like a decision analyst: starting by identifying general preference information relevant to the decision at hand, continuing by assembling this information into a quantitative form, and finishing by applying the quantitative construction to test problems or to the actual problem. These applications might reveal flaws in the decision formulation. Minor flaws might require only the choice of a different quantitative representation of the qualitative information, much as a physical scientist might adjust parameters in a differential equation. Major flaws might require augmenting or emending the general qualitative information. All in all, both forms of representation seem required for different parts of the process, so we look for means to support such activities. In this work, we give methods for translating preference information from a qualitative representation to a quantitative representation. This process preserves many of the desirable properties of both representations.
We maintain an intuitive, flexible, under-constrained, and expressive input language from our qualitative representation. From our quantitative representation, we gain an ordinal utility function: a function u that takes as input an assignment m of values to features F and returns a real number u(m) such that if m is preferred to m′, then u(m) > u(m′). Because we focus on ordinal utilities, the utility function does not provide meaningful relative magnitudes; e.g., if u(m) > u(m′), neither u(m) − u(m′) nor u(m)/u(m′) need be a meaningful quantity. As such, the functions we construct are unsuitable for certain applications that compare ratios of utilities or compute probabilistic expectations of utilities. We will consider a propositional language L augmented with an ordering relation ≽ used to express preferences over propositional combinations of a set F of elementary features. Given a set C of preferences in L over the subset F(C) of features appearing in C, and M the set of all models of F(C), we compute a utility function u : M → ℝ such that the preference ordering implied by u is consistent with C. We give a more formal explanation of this task after first providing some background about preferences and introducing some notation. Wellman and Doyle (1991) have observed that human preferences for many types of goals can be interpreted as qualitative representations of preferences. Doyle, Shoham, and Wellman (1991) present a theoretical formulation of this generalization of human preferences in terms of ceteris paribus preferences, i.e., all-else-equal preferences. Ceteris paribus relations express a preference over sets of possible worlds. We consider all possible worlds (or outcomes) to be describable by some (large) set F of binary features. Then each ceteris paribus preference statement specifies a preference over some features of outcomes while ignoring the remaining features.
The specified features are instantiated to either true or false, while the ignored features are fixed, or held constant. A ceteris paribus preference might be "we prefer programming tutors receiving an A in Software Engineering to tutors not receiving an A, other things being equal." In this example, we can imagine a universe of computer science tutors, each describable by some set of binary features F. Perhaps F = {Graduated, SoftwareEngineering_A, ComputerSystems_A, Cambridge_resident, Willing_to_work_on_Tuesdays, ...}. The preference expressed above states that, for a particular computer science tutor,

they are more desirable if they received an A in the Software Engineering course, all other features being equal.

TABLE 1. Properties of Possible Computer Science Tutors

    Feature                     Alice   Bob     Carol
    Graduated                   False   False   True
    A in Software Engineering   True    False   False
    A in Computer Systems       True    True    False
    Cambridge resident          True    True    True
    Will work Tuesdays          False   False   True
    ...

Specifically, this makes the statement that a tutor Alice, of the form shown in Table 1, is preferred to another tutor Bob, also in Table 1, assuming the elided features are identical, since they differ only on the feature over which we have expressed a preference (grade in Software Engineering). The ceteris paribus preference makes no statement about the relationship between tutor Alice and tutor Carol because they differ with regard to other features.

1.1. Approach

In broad terms, we generate an ordinal utility function from a set C of ceteris paribus preferences over features F in the following steps. First, we examine the preferences stated in C and determine which features might be assumed utility independent of which other features. Utility independence (UI) is the idea that it makes sense to speak of the utility contributed by some features independently of the values assumed by other features. This provides computational benefits, since utility-independent sets of features can be considered without regard to the other features. Next, again using C, we define subutility functions for each utility-independent set of features. These methods are based on representing the preorders consistent with the preferences C by building a graph over assignments to the features. Finally, to assign relative weights to the satisfaction of different features, we solve a linear programming problem. In the end, we have built a utility function u that can be used to quickly evaluate the utility of different assignments of values to F.
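The tutor comparison above can be sketched in a few lines of code. This is a toy illustration of a single all-else-equal comparison, not the paper's algorithm; the dict encoding and the abbreviated feature names (SE_A for the Software Engineering grade, etc.) are our own.

```python
# Toy sketch (ours, not the paper's implementation) of one ceteris paribus
# comparison. A tutor is a dict of binary features, as in Table 1.
ALICE = {"Graduated": False, "SE_A": True,  "CS_A": True,  "Cambridge": True,  "Tuesdays": False}
BOB   = {"Graduated": False, "SE_A": False, "CS_A": True,  "Cambridge": True,  "Tuesdays": False}
CAROL = {"Graduated": True,  "SE_A": False, "CS_A": False, "Cambridge": True,  "Tuesdays": True}

def cp_prefers(m1, m2, feature):
    """m1 is preferred to m2 under 'prefer feature=True, ceteris paribus'
    iff m1 has the feature, m2 lacks it, and all other features agree."""
    others_equal = all(m1[f] == m2[f] for f in m1 if f != feature)
    return m1[feature] and not m2[feature] and others_equal

print(cp_prefers(ALICE, BOB, "SE_A"))    # True: they differ only on SE_A
print(cp_prefers(ALICE, CAROL, "SE_A"))  # False: other features differ too
```

As in the text, the rule says nothing about Alice versus Carol: the comparison simply does not apply when the "all else" differs.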
Since the work of Doyle and Wellman, other researchers have proposed methods of computing with different representations of ceteris paribus preferences. The works of Bacchus and Grove (1995) and La Mura and Shoham (1999) use quantitative preference statements, in contrast to our qualitative preference statements. Both systems use an adaptation of Bayesian networks to utility as their computational paradigm. As with other heavily quantitative representations, their representations require a priori knowledge of many of the utilities and many of the probabilities of combinations of states. Although our method and the methods of Bacchus and Grove (1995) and La Mura and Shoham (1999) all model ceteris paribus preference statements, the efforts are quite different. Their approaches assume that subutility functions are known and that these can be combined into a utility function; our work endeavors to build subutility functions from the raw preference statements themselves. Our work is similar to that of Boutilier, Bacchus, and Brafman (2001), who compute an additively decomposed utility function from quantified preferences. Earlier work by Boutilier et al. (1999) uses qualitative conditional ceteris paribus preferences, and presents some strong heuristics for determining if one outcome is preferred to another, but uses a very

restricted representation for ceteris paribus preferences. Their methodology allows complicated logical conditions determining under what conditions a preference holds, but the preference itself is limited to only one feature, and the representation cannot express statements referencing several features at once, such as p ∧ q. Our language of preference statements, in contrast, can reference more than one feature at a time and express general comparisons. As we will show, their method has computational costs similar to those of our own methods when our methods operate under favorable conditions. Since our method allows arbitrary preference statements as input, we sometimes run into cases that are computationally intractable. We now give a formal account of the ceteris paribus formulation we will use, then a statement of our task using that formalism.

1.2. A Formal All-Else-Equal Logic

We employ a restricted logical language L, patterned after Doyle et al. (1991). We simplify the presentation by using only the standard logical operators ¬ (negation) and ∧ (conjunction) to construct finite sentences over a set of atoms A, thus requiring one to translate sentences using disjunction, implication, and equivalence into (possibly larger) equivalent sentences using only negation and conjunction. Each atom a ∈ A corresponds to a feature f ∈ F, a space of binary features describing possible worlds. We write f(a) for the feature corresponding to atom a. By literals(A) we denote the atoms of A and their negations: literals(A) = A ∪ {¬a | a ∈ A}. A complete consistent set of literals m is a model. That is, m is a model iff exactly one of a and ¬a is in m, for each a ∈ A. We use M for the set of all models of L. A model of L assigns truth values to all atoms of L, and therefore to all formulas in L and all features in F. We write f_i(m) for the truth value assigned to feature f_i by model m.
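As a concrete reading of these definitions, here is a minimal evaluator for the restricted language: sentences built only from negation and conjunction, with a model given as the set of atoms it makes true. This is our own sketch (the nested-tuple encoding is an assumption, not the paper's notation).

```python
# Minimal sketch (ours): sentences over atoms use only negation
# ("not", s) and conjunction ("and", s1, s2), per the restricted L.
def satisfies(model, sentence):
    """model: the set of atoms assigned true (absent atoms are false)."""
    if isinstance(sentence, str):            # an atom a in A
        return sentence in model
    op = sentence[0]
    if op == "not":
        return not satisfies(model, sentence[1])
    if op == "and":
        return satisfies(model, sentence[1]) and satisfies(model, sentence[2])
    raise ValueError("only negation and conjunction are allowed in L")

# Disjunction must first be rewritten: p ∨ q becomes ¬(¬p ∧ ¬q).
p_or_q = ("not", ("and", ("not", "p"), ("not", "q")))
print(satisfies({"q"}, p_or_q))  # True
```

The rewrite of p ∨ q in the example is exactly the translation step the text requires before a sentence enters L.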
A model m satisfies a sentence p of L if the truth values m assigns to the atoms of p make p true. We write m ⊨ p when m satisfies p. We define the proposition expressed by a sentence p by [p] = {m ∈ M | m ⊨ p}. A preference order is a complete preorder (reflexive and transitive relation) ≽ over M. When m ≽ m′, we say that m is weakly preferred to m′. If m ≽ m′ and m′ ⋡ m, we write m ≻ m′ and say that m is strictly preferred to m′. If m ≽ m′ and m′ ≽ m, then we say m is indifferent to m′, written m ~ m′. The support of a sentence p is the minimal set of atoms determining the truth of p, denoted s(p). The support of p is the same as the set of atoms appearing in an irredundant sum-of-products sentence logically equivalent to p. Two models m and m′ are equivalent modulo p if they are the same outside the support of p. Formally, m ≡ m′ mod p iff m \ literals(s(p)) = m′ \ literals(s(p)). Model modification is defined as follows. The set of model modifications of m making p true, written m[p], comprises those models satisfying p which assign the same truth values to atoms outside the support of p as m does. That is, m[p] = {m′ ∈ [p] | m ≡ m′ mod p}. A statement of desire is an expression of ceteris paribus preferences. Desires are defined in terms of model modification. We write p ≽ q when p is desired at least as much as q. Formally, we interpret this as: p ≽ q if and only if for all m ∈ M, m′ ∈ m[p ∧ ¬q], and m′′ ∈ m[¬p ∧ q], we have m′ ≽ m′′. This is just a statement that p is desired over q exactly when any model making p true and q false is weakly preferred to any model making p false and q true, whenever the two models assign the same truth values to all atoms not in the

support of p or of q. If p is weakly preferred to q and there is some pair of models m′, m′′, where m′ makes p true and q false, m′′ makes p false and q true, m′ and m′′ assign the same truth values to atoms outside the support of p and q, and m′ is strictly preferred to m′′, we instead have a strict preference for p over q, written p ≻ q.

1.3. Preference Specification

Ceteris paribus preferences are specified by a set of preference rules using the language L. A preference rule is any statement of the form p ≽ q or p ≻ q, where p and q are statements in L. If a preference rule c implies that m ≽ m′, for m, m′ ∈ M, then we write m ≽_c m′. A set C of preference rules is said to be consistent just in case for all m, m′ ∈ M, it is not the case that both m is strictly preferred to m′ and m′ is strictly preferred to m. Given a set C of ceteris paribus preference rules, let [C] be the set of all weak preference orders ≽ over models such that the weak preference order satisfies the preference specification C.

1.4. Conditional Ceteris Paribus Preferences

Conditional ceteris paribus preferences can be represented as well. A conditional ceteris paribus preference is a preference of the form: if r then p ≽ q, where p, q, and r are statements in L. This is taken to mean that p ≽ q, a ceteris paribus preference, holds just in case r holds. Such a preference is equivalent to the unconditional ceteris paribus preference r ∧ p ≽ r ∧ q whenever the support of r is disjoint from that of p and q, but need not be equivalent when these supports overlap (Doyle et al. 1991). Partly for this reason, conditional ceteris paribus preferences have attracted attention as an important extension of unconditional ceteris paribus preferences (Boutilier et al. 1999).
In the following, however, we regard unconditional ceteris paribus preference statements as able to capture most of the conditional ceteris paribus preferences of interest, and use the correspondence between conditional ceteris paribus preferences and our simplified ceteris paribus preferences in L to obtain results that apply to both types of ceteris paribus preferences when conditions and preferences do not share support.

1.5. Utility Functions

We have described the language L in the preceding section. A utility function u : M → ℝ maps each model in M to a real number. Each utility function implies a particular preorder over the models M by reflecting the ordering of utility values onto models. We denote the preorder implied by u with p(u). Given a finite set C of ceteris paribus preferences in a language L over a set of atoms A representing features in F, we attempt to find a utility function u such that p(u) ∈ [C]. We will carry out this task by demonstrating several methods for constructing ordinal utility functions u; this involves demonstrating two things. First, we show what the function u computes: how it takes a model and produces a number. Second, we show how to compute u given a finite set of ceteris paribus preferences C. These methods are sound and complete. We then demonstrate heuristic means of making the computation of u more efficient. Before we discuss the function u, we define another representation of ceteris paribus preferences, which we refer to as the feature vector representation. We discuss this in the following section.
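One simple way to realize an ordinal u consistent with a finite set of strict preference pairs is a longest-path rank over the graph of preferences between assignments (Section 1.1). The sketch below is our own illustration of that general idea, not the paper's full construction, which must also handle utility independence and weight assignment.

```python
# Sketch (ours): given strict preference pairs (better, worse) over models,
# assign u(m) = length of the longest preference chain below m. If the pairs
# are consistent (the graph is acyclic), better ≻ worse implies
# u(better) > u(worse), which is all an ordinal utility must guarantee.
from functools import lru_cache

def ordinal_utility(pairs):
    worse_than = {}                      # model -> set of models it dominates
    for better, worse in pairs:
        worse_than.setdefault(better, set()).add(worse)
        worse_than.setdefault(worse, set())

    @lru_cache(maxsize=None)
    def u(m):
        below = worse_than[m]
        return 0 if not below else 1 + max(u(x) for x in below)

    return {m: u(m) for m in worse_than}

u = ordinal_utility([("0001", "0010"), ("0010", "0110"), ("0001", "0110")])
print(u["0001"], u["0010"], u["0110"])  # 2 1 0
```

As the text warns, the resulting numbers carry no cardinal meaning: only their order matters.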

2. FEATURE VECTOR REPRESENTATION

The utility construction methods developed in the following sections employ an intermediate representation of preferences in terms of simple rules that relate paired patterns of specified and "don't care" feature values. We will first define this feature vector representation. Then we will show how to convert between statements in the new representation and statements in the previously discussed logical representation (of Section 1.2). Our translation is designed so that it can be implemented automatically and used as a preprocessing step in systems computing with ceteris paribus preference statements. Finally, we provide proofs of the expressive equivalence of the two representations. Let C be a finite set of preference rules. We define F(C) ⊆ F to be the set of features corresponding to the atoms in the union of the supports of the rules in C. That is:

    F(C) = {f(a) | a ∈ ⋃_{c ∈ C} s(c)}.

Because each preference rule mentions only finitely many features, F(C) is also finite, and we write N to mean |F(C)|. We construct utility functions representing the constraints in C in terms of model features. Features not specified in any rule in C are not relevant to computing the utility of a model, since there is no preference information about them in the set C. Accordingly, we focus our attention on F(C).

2.1. Definition

We define the feature vector representation relative to an enumeration V = (f_1, ..., f_N) of F(C). We define the language L_r(V) of feature vector rules in terms of a language L(V) of propositions over the ternary alphabet Γ = {0, 1, ∗}. A statement in L(V) consists of a sequence of N letters drawn from the alphabet Γ, so that L(V) consists of words of length N over Γ. For example, if V = (f_1, f_2, f_3), we have 10∗ ∈ L(V). Given a statement p ∈ L(V) and a feature f ∈ F(C), we write f(p) for the value in Γ assigned to f in p. In particular, if f = V_i, then f(p) = p_i.
A feature vector rule in L_r(V) consists of a pair p ≽ q in which p, q ∈ L(V) have matching ∗ values. That is, p ≽ q is in L_r(V) just in case p_i = ∗ if and only if q_i = ∗, for all 1 ≤ i ≤ N. For example, if V = (f_1, f_2, f_3), L_r(V) contains the expression 10∗ ≽ 00∗ but not the expression 10∗ ≽ 0∗0. We refer to the statement in L(V) left of the ≽ symbol in a rule r as the left-hand side of r, and denote it LHS(r). We define the right-hand side RHS(r) analogously. Thus p = LHS(p ≽ q) and q = RHS(p ≽ q). We regard statements of L(V) containing no ∗ letters as models of L(V), and write M(V) to denote the set of all such models. We say a model m satisfies a statement s, written m ⊨ s, just in case m assigns the same truth value as s to each non-∗ feature of s. That is, m ⊨ s iff f(m) = f(s) for each f ∈ F(C) such that f(s) ≠ ∗. For example, 0011 satisfies both 0∗1∗ and 00∗∗. We project models in M to models in M(V) by a mapping α:

Definition 1 (Model projection). The translation α : M → M(V) is defined for each m ∈ M and f ∈ F(C) by α(m) = m′, m′ ∈ M(V), where for all f_i ∈ V, f_i(α(m)) = 1 if f_i ∈ m and f_i(α(m)) = 0 if ¬f_i ∈ m.
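These definitions translate directly into code. The sketch below is our own, using plain strings over {'0', '1', '*'} for statements of L(V) and a set of true features for a model of L.

```python
# Sketch (ours): statements in L(V) are strings of length N over {'0','1','*'};
# models of L(V) are such strings with no '*'.
def satisfies_fv(model, stmt):
    """m ⊨ s: the model matches the statement on every non-* position."""
    return all(s == "*" or m == s for m, s in zip(model, stmt))

def alpha(true_features, V):
    """Project a model of L (given as the set of features it makes true)
    onto a model of L(V), per Definition 1."""
    return "".join("1" if f in true_features else "0" for f in V)

print(satisfies_fv("0011", "0*1*"))                   # True
print(satisfies_fv("0011", "00**"))                   # True
print(alpha({"f2", "f4"}, ("f1", "f2", "f3", "f4")))  # 0101
```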

This projection induces an equivalence relation on M, and we write [m] to mean the set of models in M mapped to the same model in M(V) as m:

    [m] = {m′ ∈ M | α(m′) = α(m)}.    (1)

The translation α specifies that m and m′ must assign the same truth values to features that appear in L(V), but on features not appearing therein there is no restriction. When the feature vector of L(V) covers the full set of features F, there is a one-to-one correspondence between the models of L(V) and those of L. We say that a pair of models (m, m′) of L(V) satisfies a rule r in L_r(V), and write (m, m′) ⊨ r, if m satisfies LHS(r), m′ satisfies RHS(r), and m, m′ have the same value for those features represented by ∗ in r; that is, m_i = m′_i for each 1 ≤ i ≤ N such that LHS(r)_i = ∗. For example, (100, 010) ⊨ 10∗ ≽ 01∗, but (101, 010) ⊭ 10∗ ≽ 01∗. The meaning [r] of a rule r in L_r(V) is the set of all preference orders ≽ over M such that for each m, m′ ∈ M, if (α(m), α(m′)) ⊨ r, then m ≽ m′. The meaning of a set R of rules consists of the set of preference orders consistent with each rule in the set; that is, [R] = ⋂_{r ∈ R} [r]. Thus a rule ∗∗01 ≽ ∗∗10 represents four specific preferences:

    0001 ≽ 0010    0101 ≽ 0110    1001 ≽ 1010    1101 ≽ 1110.

Note that this says nothing at all about the preference relationship between, e.g., 0101 and 1010. The support features of a statement p in L(V), written s(p), are exactly those features in p that are assigned value either 0 or 1; these represent the least set of features needed to determine whether a model of L(V) satisfies p. The support features of a rule r in L_r(V), denoted s(r), are the features in s(LHS(r)). The definition of L_r(V) implies that s(LHS(r)) = s(RHS(r)).
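The expansion of a rule into its concrete induced preferences can be sketched as follows; the helper is our own, using the same string encoding with '*' for the don't-care letter.

```python
from itertools import product

# Sketch (ours): enumerate the concrete model pairs induced by a feature
# vector rule lhs ≽ rhs, filling each '*' position identically on both
# sides (the ceteris paribus "held equal" features).
def induced_preferences(lhs, rhs):
    stars = [i for i, c in enumerate(lhs) if c == "*"]
    pairs = []
    for bits in product("01", repeat=len(stars)):
        m1, m2 = list(lhs), list(rhs)
        for i, b in zip(stars, bits):
            m1[i] = m2[i] = b            # '*' features held equal
        pairs.append(("".join(m1), "".join(m2)))
    return pairs

print(induced_preferences("**01", "**10"))
# [('0001', '0010'), ('0101', '0110'), ('1001', '1010'), ('1101', '1110')]
```

This reproduces exactly the four preferences listed above, and by construction never pairs models that differ on a ∗ feature, such as 0101 and 1010.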
2.2. L(V) Compared to L

These two languages of ceteris paribus preference have similar expressive power. Many of the terms used in the definition of the feature vector representation have direct analogs in the ceteris paribus representation of Doyle and Wellman. We now show a translation from

a set C of ceteris paribus rules into sets R of feature vector rules in a way that guarantees compatibility of meaning in the sense that [R] ⊆ [C]. This translation will allow us to develop algorithms and methods in the following sections that use the language L(V), knowing that our conclusions hold over statements in L. We then give the opposite translation.

2.3. Forward Translation

The translation involves considering models restricted to subsets of features. We write M[S] to denote the set of models over a feature set S ⊆ F, so that M = M[F]. If m ∈ M[S] and S′ ⊆ S, we write m|S′ to denote the restriction of m to S′, that is, the model m′ ∈ M[S′] assigning the same values as m to all features in S′. We say that a model m ∈ M[S] satisfies a model m′ ∈ M[S′], written m ⊨ m′, just in case S′ ⊆ S and m′ = m|S′. A set of rules R of the intermediate representation is compatible with a rule c = p_c ≽ q_c in the ceteris paribus representation of the previous section just in case [R] ⊆ [c]. If m_1 ≽_c m_2, this means that for some r ∈ R we have (m′_1, m′_2) ⊨ r, where m_1, m_2 are models of L and m′_1, m′_2 are models of L(V) such that m_1 ∈ [m′_1] and m_2 ∈ [m′_2]. We give a construction for such an r from an arbitrary c. Before delving into the details of our construction, we define two important translation functions. The characteristic statement σ(m) of a model m is the statement in L(V) that indicates the same truth values on features as m does. Similarly, the characteristic model µ(p) of a statement p ∈ L(V) is the model in M[s(p)] that indicates the same truth values on features as p does. We give formal definitions below.

Definition 2 (Characteristic statement σ). For any S ⊆ F, σ is a function σ : M[S] → L(V). Let m be a model of S. We set f_i(σ(m)) as follows: f_i(σ(m)) = 1 iff f_i(m) = true; f_i(σ(m)) = 0 iff f_i(m) = false; f_i(σ(m)) = ∗ iff f_i ∉ S.

An important property of σ is as follows.
If a is a model of M[S] and m a model in M(V), then m satisfies a implies that m satisfies σ(a). More formally,

m ⊨ a → m ⊨ σ(a). (2)

This can be seen by considering each of the N elements of F(C) one at a time. If f_i ∈ S, then a assigns a truth value to f_i, which is consistent with f_i(m) and f_i(σ(a)). If f_i ∉ S, then a does not assign f_i a truth value, nor does σ(a) require a particular truth value for f_i(m) for a model m to satisfy σ(a). Note that if f_i ∈ (S \ F(C)), it is (again) not required to be of a particular truth value for a model to satisfy σ(a).

Definition 3 (Characteristic model μ). Let p be a statement in L(V). Then the characteristic model μ(p) of a statement p in L(V) is the model in M[s(p)] defined by μ(p) = {f | f(p) = 1} ∪ {¬f | f(p) = 0}.

Note that for m ∈ M we have m ⊨ μ(α(m)), that is, μ(α(m)) = m↾F(C). This follows directly from the definitions of model satisfaction, in a similar fashion to the proof of equation (2). We translate a single ceteris paribus rule c ∈ L into a set of intermediate representation rules R by way of the support of c. If c is of the form p_c ≻ q_c, where p_c, q_c are sentences in L, then models that satisfy p_c ∧ ¬q_c are preferred to models that satisfy ¬p_c ∧ q_c, other things being equal. For brevity, let s_c = s(p_c ∧ ¬q_c) ∪ s(¬p_c ∧ q_c), so that s_c ⊆ F(C) is the set of

166 COMPUTATIONAL INTELLIGENCE

support features for each rule r ∈ R, and consider models in M[s_c]. Let W_l be the set of such models satisfying p_c ∧ ¬q_c, that is, W_l = {w ∈ M[s_c] | w ⊨ p_c ∧ ¬q_c}, and define W_r as the corresponding set of models satisfying the right-hand side, W_r = {w ∈ M[s_c] | w ⊨ ¬p_c ∧ q_c}. We construct new sets W_l′ and W_r′ of statements in L(V) from W_l and W_r by augmenting each member and translating into L(V). We define these new sets by W_l′ = {w′ ∈ L(V) | σ(w) = w′, w ∈ W_l} and W_r′ = {w′ ∈ L(V) | σ(w) = w′, w ∈ W_r}. Note that the members of W_l′ and W_r′ are of length |F(C)|, while those of W_l and W_r are of size |s_c|. We now construct a set of rules in L_r(V) to include a rule for each pair of augmented statements, that is, a set R = {w_l′ ≻ w_r′ | w_l′ ∈ W_l′, w_r′ ∈ W_r′}. This completes the translation.

Consider a simple example. In the following, we assume V = (f1, f2, f3, f4). A ceteris paribus rule might be of the form f2 ∧ ¬f4 ≻ f3. This expresses the preference for models satisfying (f2 ∧ ¬f4) ∧ ¬f3 over models satisfying ¬(f2 ∧ ¬f4) ∧ f3, other things being equal. Note s_c = {f2, f3, f4}. Then, following the above construction, W_l = {{f2, ¬f3, ¬f4}} and W_r = {{f2, f3, f4}, {¬f2, f3, f4}, {¬f2, f3, ¬f4}}. In this case we translate these into three intermediate representation rules: ∗100 ≻ ∗111, ∗100 ≻ ∗011, and ∗100 ≻ ∗010. The above exposition and construction gives us the necessary tools to state and prove the following lemma.

Lemma 1 (Feature vector representation of ceteris paribus rules). Given a ceteris paribus rule c in L, and any m1, m2 models of L where m1 ≻_c m2, there exists a finite set of rules R in L_r(V) such that there exist m1′, m2′ models of L(V) and r ∈ R with (m1′, m2′) ⊨ r such that m1 ∈ [m1′] and m2 ∈ [m2′].

Proof. Without loss of generality, we take c = p_c ≻ q_c. The case of c = p_c ⪰ q_c is the same.
Using the definition of a ceteris paribus rule, we note that Lemma 1 is satisfied when

∀m ∈ M, ∀m′ ∈ m[p_c ∧ ¬q_c], ∀m″ ∈ m[¬p_c ∧ q_c]: ∃r ∈ R, m′ ⊨ LHS(r), m″ ⊨ RHS(r). (3)
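The forward construction just illustrated can be sketched concretely. The following is our own illustration, not part of the paper: the helper names are hypothetical, the example rule is read as f2 ∧ ¬f4 ≻ f3 over V = (f1, f2, f3, f4), and ∗ marks the feature (f1) outside the rule's support.

```python
from itertools import product

def forward_translate(v, support, lhs_pred, rhs_pred):
    """Translate one ceteris paribus rule into feature-vector rules.
    v: all features in order; support: the rule's support features s_c;
    lhs_pred / rhs_pred: tests for p & ~q and ~p & q on an assignment."""
    assigns = [dict(zip(support, bits))
               for bits in product('10', repeat=len(support))]
    sigma = lambda a: ''.join(a.get(f, '*') for f in v)  # augment with '*'
    w_l = [sigma(a) for a in assigns if lhs_pred(a)]     # models of p & ~q
    w_r = [sigma(a) for a in assigns if rhs_pred(a)]     # models of ~p & q
    return [(l, r) for l in w_l for r in w_r]            # one rule per pair

# Example rule f2 & ~f4 > f3:
rules = forward_translate(
    ['f1', 'f2', 'f3', 'f4'], ['f2', 'f3', 'f4'],
    lambda a: a['f2'] == '1' and a['f4'] == '0' and a['f3'] == '0',
    lambda a: not (a['f2'] == '1' and a['f4'] == '0') and a['f3'] == '1')
```

This reproduces the three rules derived in the example, one per pair in W_l′ × W_r′.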

We show that if m, m′, m″ are such that m′ ∈ m[p_c ∧ ¬q_c], m″ ∈ m[¬p_c ∧ q_c], then our translation above constructs an r such that m′ ⊨ LHS(r), m″ ⊨ RHS(r). Given such m, m′, m″, from the definition of m′ ∈ m[p_c ∧ ¬q_c], we have m′ ⊨ p_c ∧ ¬q_c. Let w_a be the member of W_l such that w_a = m′↾s_c, and w_b the member of W_r such that w_b = m″↾s_c. By definition of restriction, m′ ⊨ w_a and m″ ⊨ w_b. Let w_a′ be in W_l′ such that w_a′ = σ(w_a). Since m′ ⊨ w_a, by equation (2) m′ ⊨ w_a′. Similarly, with w_b′ = σ(w_b) in W_r′, we have m″ ⊨ w_b′. Since w_a′ = LHS(r) for some r in R according to the construction, we have m′ ⊨ LHS(r) for some r. A similar argument shows that m″ ⊨ RHS(r) for some (possibly different) r. Since for all a ∈ W_l′ and b ∈ W_r′ there exists r with a = LHS(r) and b = RHS(r), we choose r such that m′ ⊨ LHS(r) = w_a′ and m″ ⊨ RHS(r) = w_b′. This completes the proof of Lemma 1.

Thus, we have shown, for a given rule c = p_c ≻ q_c in the ceteris paribus representation, that there exists a set of rules R in the feature vector representation such that (m1′, m2′) ⊨ r for some r ∈ R iff there exists m such that m1 ∈ m[p_c ∧ ¬q_c] and m2 ∈ m[¬p_c ∧ q_c]. Thus, the construction preserves the meaning of a rule in the ceteris paribus representation.

2.4. Reverse Translation

We give a translation from one rule in L_r(V) to one rule in L. This is a construction taking a rule r in L_r(V) to a ceteris paribus rule c in L, such that if (m1′, m2′) ⊨ r, there exist c = p_c ≻ q_c and m such that m1 ∈ m[p_c ∧ ¬q_c] and m2 ∈ m[¬p_c ∧ q_c]; and similarly for rules using ⪰. Suppose we have a general feature vector representation rule r = LHS(r) ≻ RHS(r). This means that models satisfying LHS(r) are preferred to those satisfying RHS(r), all else being equal. A ceteris paribus rule is a comparison of formulae a ≻ b, where a, b are formulae in the language L. Thus, identifying a model with the conjunction of its literals, we must look for formulae a, b such that

a ∧ ¬b ≡ μ(LHS(r)) (4)

and ¬a ∧ b ≡ μ(RHS(r)). In the following, m denotes a model in M.
Consider the sets of models [LHS(r)] = {m | α(m) ⊨ LHS(r)} and [RHS(r)] = {m | α(m) ⊨ RHS(r)}. Note that [LHS(r)] and [RHS(r)] are disjoint. Disjointness follows from the support features of LHS(r) being equal to the support features of RHS(r), and from LHS(r) ≠ RHS(r). [LHS(r)] ⊆ [RHS(r)]^c follows from disjointness, where [RHS(r)]^c = {m | α(m) ⊭ RHS(r)} is the complement of [RHS(r)]. Thus, [LHS(r)] ∩ [RHS(r)]^c = [LHS(r)] and [RHS(r)] ∩ [LHS(r)]^c = [RHS(r)]. This suggests a solution to equation (4). We let a = μ(LHS(r)), so we have [a] = [LHS(r)], and similarly b = μ(RHS(r)), so [b] = [RHS(r)]. Then we have [a ∧ ¬b] = [a] ∩ [b]^c = [a] and [¬a ∧ b] = [a]^c ∩ [b] = [b], as required. Thus, our ceteris paribus rule c is a ≻ b.
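A sketch of this reverse direction (our own illustration; function names are hypothetical), with μ realized as the conjunction of literals fixed by a feature-vector pattern:

```python
def mu(pattern, features):
    """Characteristic formula of a feature-vector statement: the conjunction
    of literals fixing each supported feature; '*' contributes nothing."""
    return ' & '.join(f if v == '1' else '~' + f
                      for f, v in zip(features, pattern) if v != '*')

def reverse_translate(lhs, rhs, features):
    """One rule LHS > RHS in L_r(V) becomes the ceteris paribus rule a > b,
    with a = mu(LHS) and b = mu(RHS)."""
    return mu(lhs, features), mu(rhs, features)

a, b = reverse_translate('*100', '*010', ['f1', 'f2', 'f3', 'f4'])
```

Because the two patterns share their support and differ on it, [a] and [b] are disjoint, so a ∧ ¬b collapses to a and ¬a ∧ b to b, as in the argument above.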

Lemma 2 (Ceteris paribus rule expressed by L_r(V)). Given a rule r in L_r(V), for each pair of models (m1′, m2′) ⊨ r, there exists a ceteris paribus rule c = p_c ≻ q_c, or c = p_c ⪰ q_c, with p_c, q_c in L, such that there exists an m such that m1 ∈ m[p_c ∧ ¬q_c] and m2 ∈ m[¬p_c ∧ q_c].

Proof. The proof follows from the construction. We must show that for each pair (m1′, m2′) ⊨ r in L(V), our construction creates a rule c = p_c ≻ q_c, with p_c, q_c in L, such that there exists m such that m1 ∈ m[p_c ∧ ¬q_c] and m2 ∈ m[¬p_c ∧ q_c]. The argument for c = p_c ⪰ q_c is similar. Let m1, m2 model L and be such that α(m1) = m1′ and α(m2) = m2′. Then (α(m1), α(m2)) ⊨ r. We know that α(m1) ⊨ LHS(r) and α(m2) ⊨ RHS(r). This implies that m1 is in [LHS(r)] and m2 is in [RHS(r)]. Since we define a = μ(LHS(r)), [a] = [LHS(r)] follows from the definition of μ. Similarly, [b] = [RHS(r)], so we have m1 ∈ [a] and m2 ∈ [b]. That m1 ∈ [a ∧ ¬b] and m2 ∈ [¬a ∧ b] follows from [a] = [a ∧ ¬b] and [b] = [¬a ∧ b]. Now we show that m1, m2 are the same outside the support of a ∧ ¬b and ¬a ∧ b, respectively. Note that any two logically equivalent statements have the same support, which follows from how we define support. Thus, s(LHS(r)) = s(a) = s(a ∧ ¬b), and s(RHS(r)) = s(b) = s(¬a ∧ b). Note also, from the definition of the feature vector representation, we have s(LHS(r)) = s(RHS(r)), as a consequence of the requirement that the support features of LHS(r) be the same as the support features of RHS(r). Thus s(a ∧ ¬b) = s(¬a ∧ b). We are given that (m1′, m2′) ⊨ r; this implies m1, m2 are the same outside the support of LHS(r) and RHS(r). Thus, m1, m2 are the same outside the support of a ∧ ¬b. This implies there exists m such that m1 ∈ m[a ∧ ¬b] and m2 ∈ m[¬a ∧ b], which completes the proof. We note that m = m1 or m = m2 satisfies the condition on m.

2.5.
Summary

We have shown a translation from the feature vector representation to the ceteris paribus representation, and a similar translation from the ceteris paribus representation to the feature vector representation. Combining Lemmas 1 and 2, we can state the following theorem:

Theorem 1 (Expressive equivalence). Any preference over models representable in either the ceteris paribus representation or the feature vector representation can be expressed in the other representation.

Using this equivalence and the construction established in Section 2.3, we define an extension of the translation function σ (defined in Definition 2) to sets of ceteris paribus preference rules in L. This translation converts a set of ceteris paribus preference rules to a set of feature vector representation statements. That is, σ: C → C′, where C is a set of ceteris paribus rules, and C′ is a set of rules in L_r(V). This translation is accomplished exactly as described in Section 2.3. Despite the demonstrated equivalence, the two preference representations are not equal in every way. Clearly, the translation from the ceteris paribus representation can result in the creation of more than one feature vector representation rule. This leads us to believe that the language of the ceteris paribus representation is more complicated than the other. This is naturally the case, since we can have complicated logical sentences in the language L in our ceteris paribus rules. Thus, it can take many more rules in L(V) to represent a rule in L. This gives us some further insight. The length, in characters, of a ceteris paribus rule in the logical language L is arbitrary. The length of a rule in L_r(V) is always 2|V|. We might also ask how quickly we can tell if a model m of L(V) satisfies a statement r in L_r(V). This verification is essentially a string parsing problem, where we check elements of m against elements of r.
Thus, the determination of satisfaction takes time proportional to the length of the formula.
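This linear-time check can be sketched as follows. This is our own illustration, not the paper's code: the function names are hypothetical, rules are assumed to pair two full-length patterns over {0, 1, ∗}, and the earlier example rule is written here with explicit ∗s.

```python
def matches(model, pattern):
    """True if binary string `model` matches `pattern` over {0, 1, *}."""
    return all(p == '*' or p == b for b, p in zip(model, pattern))

def pair_satisfies(m1, m2, lhs, rhs):
    """(m1, m2) |= (lhs > rhs): m1 matches lhs, m2 matches rhs, and the
    two models agree on every feature outside the rule's support."""
    support = {i for i, (a, b) in enumerate(zip(lhs, rhs))
               if a != '*' or b != '*'}
    return (matches(m1, lhs) and matches(m2, rhs) and
            all(m1[i] == m2[i] for i in range(len(m1)) if i not in support))

# The rule **01 > **10 yields exactly four model pairs over 4 features:
models = [format(i, '04b') for i in range(16)]
pairs = [(a, b) for a in models for b in models
         if pair_satisfies(a, b, '**01', '**10')]
```

Each check walks the strings once, matching the observation that satisfaction takes time proportional to formula length.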

3. SOME SIMPLE ORDINAL UTILITY FUNCTIONS

In this section, we illustrate simple utility functions u: M(V) → ℝ consistent with an input set of feature vector representation preferences C′, where each rule r ∈ C′ is a statement in L_r(V). It is possible to combine this with the translation defined in the previous section. Thus, we may take as input a set of ceteris paribus preference rules C, and then let C′ = σ(C). Similarly, one can use these functions u to define utility functions over models in L by composition with the model-projection mapping α. Specifically, one finds the utility of a model m ∈ M by computing the projection α(m) ∈ M(V) and using one of the functions defined in the following.

We consider a model graph, G(C′), a directed graph which represents the preferences expressed in C′. Each node in the graph represents one of the possible models over the features F(C). The graph G(C′) always has exactly 2^|F(C)| nodes. The graph has two different kinds of directed edges. A strict edge in G(C′), e_s(m1, m2) from source m1 to sink m2, exists if and only if (m1, m2) ⊨ r for some strict preference rule r ∈ C′. Similarly, a weak edge e_w(m1, m2) from source m1 to sink m2 exists if and only if (m1, m2) ⊨ r for some weak preference rule r ∈ C′. Each edge, e_s and e_w, is an explicit representation of a preference for the source over the sink. It is possible to consider each rule r, consider each pair of models (m1, m2) ⊨ r, and create an edge e for each such pair. After this process is completed, we can determine whether m_i is preferred to m_j according to C′ by looking for a path from m_i to m_j in G(C′) and a path from m_j to m_i. If both paths exist and are composed entirely of weak edges e_w, then we can conclude that m_i ∼ m_j. If only a path from m_i to m_j exists, or a path exists using at least one strict edge, then we can conclude that m_i ≻ m_j according to C′.
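These path queries (including the symmetric case treated next) can be sketched as a reachability search. This is our own code, not the paper's: we track whether some path to the target uses a strict edge by searching over (node, seen-strict) states, and we assume the two queried models are distinct.

```python
def reach(adj, src, dst):
    """(found, found_strict): is dst reachable from src, and is it reachable
    along some path using at least one strict edge?  adj maps a node to a
    list of (successor, is_strict) edges."""
    seen, stack = set(), [(src, False)]
    found = found_strict = False
    while stack:
        node, s = stack.pop()
        if (node, s) in seen:
            continue
        seen.add((node, s))
        if node == dst:
            found, found_strict = True, found_strict or s
        for succ, strict in adj.get(node, []):
            stack.append((succ, s or strict))
    return found, found_strict

def compare(adj, a, b):
    """Classify the preference between distinct models a and b."""
    fwd, fwd_s = reach(adj, a, b)
    back, back_s = reach(adj, b, a)
    if fwd and back and (fwd_s or back_s):
        return 'inconsistent'   # cycle containing a strict edge
    if fwd and back:
        return 'indifferent'    # all-weak cycle: equal utility
    if fwd:
        return 'a > b'
    if back:
        return 'b > a'
    return 'unspecified'

# Illustrative graph: m2, m3, m5 form a weak cycle; other edges strict.
adj = {'m1': [('m2', True)], 'm2': [('m3', False)],
       'm3': [('m4', True), ('m5', False)],
       'm5': [('m2', False)], 'm4': [('m6', True)]}
```

On this graph, compare(adj, 'm2', 'm5') reports indifference, while compare(adj, 'm1', 'm6') reports a strict preference.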
Similarly, if only a path from m_j to m_i exists, or one using at least one strict edge, then we can conclude that m_j ≻ m_i according to C′. If neither path exists, then a preference has not been specified between m_i and m_j in C′. Cycles in the model graph are consistent if all the edges in the cycle are weak edges (e_w). In this case, all the models in the cycle are similarly preferred to each other, and should be assigned the same utility. If there were a cycle using strict edges (e_s), this would indicate that some model is preferred to itself according to C′, which we consider inconsistent. Thus, it is possible to construct the graph and then look for cycles, with the presence of strict cycles indicating that the input preferences are inconsistent.

3.1. Graphical Utility Functions

In this section, we define four different graphical utility functions (GUFs). Each GUF uses the same graph, a model graph G(C′), but defines a different measure of utility using this graph. We define GUFs that compute the utility of a model m by checking the length of the longest path originating at m, counting the number of descendants of m in G(C′), checking the length of the longest path originating elsewhere and ending at m, or counting the number of nodes following m in the topological-sorted order of G(C′). We present a summary of each GUF in Table 2.

Definition 4 (Minimizing graphical utility function). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), the minimizing GUF is the utility function u_M where u_M(m) is equal to the number of unique nodes on the longest path, including cycles, originating from m in G(C′).

The intuition is that in a graph, the distance between nodes, measured in unique nodes, can indicate their relative utilities. Note that if neither m1 ⪰ m2 nor m2 ⪰ m1 in C′, then we do not

TABLE 2. Four Different Ordinal Graphical Utility Functions (GUFs)

Function name    Symbol  Method
Minimizing GUF   u_M     Longest path from m
Descendant GUF   u_D     Number of descendants of m
Maximizing GUF   u_X     Longest path ending at m
Topological GUF  u_T     Rank of m in topological-sorted order

require u(m1) = u(m2). Only when m1 ⪰ m2 and m2 ⪰ m1 in C′ do we require u(m1) = u(m2). This implies members of a cycle should receive the same utility. However, in a graph with cycles the concept of longest path needs more discussion. Clearly, several infinitely long paths can be generated by looping forever on different cycles. We mean the longest nonbacktracking path wherein all nodes on any cycle the path intersects are included on the path. It is important, however, that each member of a cycle be counted once among the unique nodes on this longest path, since this assures that a set of nodes on the same cycle will receive the same utility. For example, suppose the rules C′ are such that only the following relations are implied over models (the edges among m2, m3, m5 being weak):

m1 ≻ m2, m2 ⪰ m3, m3 ≻ m4, m3 ⪰ m5, m5 ⪰ m2, m4 ≻ m6. (5)

It is clear that m2, m3, m5 form a cycle (see Figure 1). The longest path from m1 clearly goes through this cycle; in fact this path visits nodes m1, m2, m3, m4, m6. However, since this path intersects the cycle {m2, m3, m5}, we add all nodes on the cycle to the path, in this case only the node m5. Thus, the longest path we are interested in from m1 passes

FIGURE 1. Model graph for preferences in equation (5).

FIGURE 2. Model graph for preferences in equation (6).

through all six nodes, and we have u_M(m1) = 6. Similarly, the other nodes receive utility as follows: u_M(m1) = 6, u_M(m2) = 5, u_M(m3) = 5, u_M(m4) = 2, u_M(m5) = 5, u_M(m6) = 1.

Consider the relation between unrelated models provided by the minimizing GUF. Our different graphical utility functions define different relationships for such models; in fact, the relationship is somewhat arbitrary. Consider the following example. Suppose the rules in C′ are such that only the following pairs of models satisfy any rule in C′:

m1 ≻ m2, m3 ≻ m4, m4 ≻ m5, m4 ≻ m6. (6)

The relationship between several models (see Figure 2), for example m1 and m3, is unspecified. A utility function is free to order these two any way convenient, and the utility function will still be consistent with the input preferences. The minimizing GUF gives the following values to the models: u_M(m1) = 2, u_M(m2) = 1, u_M(m3) = 3, u_M(m4) = 2, u_M(m5) = 1, u_M(m6) = 1.

An interesting property of this utility function is that it is minimizing; that is, it assigns minimal utility to models that are not explicitly preferred to other models according to the preference set. Thus, suppose there exists an m7 in the above domain, about which no preferences are specified. This model will receive minimal utility (1) from the utility function.

Definition 5 (Descendant graphical utility function). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), the descendant GUF is the

utility function u_D where u_D(m) is equal to the total number of unique nodes on any path originating from m in G(C′).

Definition 6 (Maximizing graphical utility function). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), we let max(G(C′)) be the length of the longest path in G(C′). The maximizing GUF is the utility function u_X where u_X(m) is equal to max(G(C′)) minus the number of unique nodes on the longest path originating at any node other than m and ending at m in G(C′).

Definition 7 (Topological sort graphical utility function). Given the input set of strict ceteris paribus preferences C′, and the corresponding model graph G(C′), let n = 2^|F(C)| be the number of nodes in G(C′). The topological sort GUF is the utility function u_T where u_T(m) is equal to n minus the rank of m in the topological-sorted order of G(C′).

Each of the above utility functions is ordinal. Each function except the topological sort GUF handles both weak and strict preferences. Our maximizing GUF must make use of the same technique for a path intersecting a cycle that we use for the minimizing GUF. The descendant function has no such difficulty with cycles, but note that a node counts as one of its own descendants. Each function behaves differently toward models not mentioned by the preferences. We have already discussed the minimizing property, where a utility function gives very low or zero utility to models not mentioned by the preferences. These are models that are neither preferred to other models nor have other models preferred to them. We therefore have no information about them and, as we have mentioned, are free to order them as convenient, while preserving the ordinal property of the utility function. In contrast to minimizing, we call such functions maximizing when they assign high utility to models about which we have no information. Consider the example given in equation (6).
Under the descendant GUF (Definition 5), we get the following utilities for m1, ..., m7: u_D(m1) = 2, u_D(m2) = 1, u_D(m3) = 4, u_D(m4) = 3, u_D(m5) = 1, u_D(m6) = 1, u_D(m7) = 1. We give slightly higher utility to m3, m4 than under Definition 4, since the former counts all descendants, while the latter counts only the longest path. The maximizing graphical utility function, Definition 6, gives the following values: u_X(m1) = 3, u_X(m2) = 2, u_X(m3) = 3, u_X(m4) = 2, u_X(m5) = 1, u_X(m6) = 1, u_X(m7) = 3.
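The u_M and u_D values above can be reproduced mechanically. The following is our own sketch, not the paper's algorithm: it realizes the "count each cycle member once" convention by condensing the graph's strongly connected components (the weak cycles), then measuring longest weighted paths for u_M and reachable-node counts for u_D.

```python
from functools import lru_cache

def sccs(adj, nodes):
    """Kosaraju's algorithm: map each node to its strongly connected
    component (a frozenset).  Components correspond to weak cycles."""
    seen, order = set(), []
    def dfs(u, graph, out):
        stack = [(u, iter(graph.get(u, [])))]
        seen.add(u)
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                out.append(v)          # postorder finish
            elif w not in seen:
                seen.add(w)
                stack.append((w, iter(graph.get(w, []))))
    for n in nodes:
        if n not in seen:
            dfs(n, adj, order)
    radj = {}                          # reversed graph
    for u in nodes:
        for v in adj.get(u, []):
            radj.setdefault(v, []).append(u)
    seen.clear()
    comp = {}
    for n in reversed(order):
        if n not in seen:
            members = []
            dfs(n, radj, members)
            c = frozenset(members)
            for m in members:
                comp[m] = c
    return comp

def u_min(adj, nodes):
    """Minimizing GUF: unique nodes on the longest path from m, counting
    every member of each cycle (SCC) the path meets."""
    comp = sccs(adj, nodes)
    @lru_cache(maxsize=None)
    def longest(c):
        succ = {comp[v] for u in c for v in adj.get(u, []) if comp[v] != c}
        return len(c) + max(map(longest, succ), default=0)
    return {m: longest(comp[m]) for m in nodes}

def u_desc(adj, nodes):
    """Descendant GUF: distinct nodes reachable from m, with m counted
    as one of its own descendants."""
    def count(m):
        seen, stack = set(), [m]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(adj.get(u, []))
        return len(seen)
    return {m: count(m) for m in nodes}
```

On the cycle example of equation (5), u_min reproduces u_M(m1) = 6 and the shared value 5 for the cycle {m2, m3, m5}; on equation (6), u_desc reproduces the u_D values listed above.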

For some applications, it can be useful to find groups of models for which no model is clearly better; these models are called nondominated, or Pareto-optimal. The maximizing GUF assigns each such nondominated model maximal utility. Thus, it assigns u(m7) = 3. In some sense, this gives the function a speculative characteristic, since it is willing to assign high utility to models which are unmentioned in the input preferences. Topological sort gives different answers depending on how it is implemented. However, each model gets a unique utility value. It could give the following utility values for the example: u_T(m1) = 7, u_T(m2) = 6, u_T(m3) = 5, u_T(m4) = 4, u_T(m5) = 3, u_T(m6) = 2, u_T(m7) = 1. This is a good example of how the utility of models can vary hugely under different utility functions, while each utility function is consistent with C′.

Now that we have discussed some of the properties and behaviors of each of the GUFs defined above, we prove their consistency.

Theorem 2 (Consistency of minimizing GUF). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), the utility function that assigns u_M(m) equal to the number of unique nodes on the longest path originating from m in G(C′) is consistent with C′.

Proof. Let u_M(m) be equal to the number of nodes on the longest path originating from m in G(C′). For u_M to be consistent with C′, we require that the preference order p(u_M) induced by u_M be in [C′]. Specifically, this requires that u_M(m1) ≥ u_M(m2) whenever m1 ⪰_C′ m2, and u_M(m1) > u_M(m2) whenever m1 ≻_C′ m2. Choose a pair m1, m2 such that m1 ≻_C′ m2 according to C′. By construction of G(C′), there exists an edge from m1 to m2 in G(C′).
Thus, u_M(m1) > u_M(m2), because there exists a path from m1 that contains m2, and therefore contains the longest path from m2, plus at least one node, namely the node m1. If m1 ⪰_C′ m2, then it is possible that there is both a path from m1 to m2 and a path from m2 to m1. In this case, these two paths define a cycle containing both m1 and m2. Since m1 and m2 lie on the same cycle, u_M(m1) = u_M(m2), since any longest path reachable from one model is also reachable from the other.

Theorem 3 (Consistency of descendant GUF). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), the utility function that assigns u_D(m) equal to the total number of unique nodes on any paths originating from m in G(C′) is consistent with C′.

Proof. Following the proof of Theorem 2, if we know that m1 ≻ m2 according to C′, then by construction of G(C′), there exists an edge from m1 to m2 in G(C′). Since m2 is on a path from m1, m1 has at least one more descendant than m2, namely, m1 itself. Therefore u_D(m1) > u_D(m2). If there is a weak edge from m1 to m2, then there might also be a path from m2 to m1, in which case both models have the same set of descendants, and u_D(m1) = u_D(m2).

Theorem 4 (Consistency of maximizing GUF). Given the input set of ceteris paribus preferences C′, and the corresponding model graph G(C′), we let max(G(C′)) be the length

of the longest path in G(C′). The utility function that assigns u_X(m) equal to max(G(C′)) minus the number of unique nodes on the longest path originating at any node other than m and ending at m in G(C′) is consistent with C′.

Proof. Omitted.

Theorem 5 (Consistency of topological sort GUF). Given the input set of strict ceteris paribus preferences C′, and the corresponding model graph G(C′), let n = 2^|F(C)| be the number of nodes in G(C′). The utility function that assigns u_T(m) equal to n minus the rank of m in the topological-sorted order of G(C′) is consistent with C′.

Proof. Omitted.

3.2. Complexity

The utility functions outlined in the previous section, while conceptually simple, have worst-case complexity exponential in the number of relevant features N = |F(C)|. As noted earlier, the model graph G(C′) has 2^N nodes, but this exponential size does not in itself imply exponential cost in computing utility functions, because the utility functions derive from graph edges rather than graph nodes. The descendant utility function u_D, for example, requires counting the number of descendants of nodes, a number at worst linear in the number of edges. The other utility functions measure the number of ancestors, or the longest path from or to a node. Clearly, counting the number of ancestors presents the same computational burden as counting the number of descendants. Computing the longest path originating at a node and ending elsewhere has the same bound, since searching all descendants can determine the longest path. Accordingly, the number of edges in the model graph provides a basic complexity measure for these utility computations. In fact, a simple and familiar example shows that the model graph can contain a number of edges exponential in the size of F(C). Suppose, for instance, that F(C) consists of four features and that the derived intermediate preference rules C′ consist of those displayed in Table 3.
These rules order all models lexicographically in a complete linear order, the same ordering we give models if we interpret them as binary representations of the integers from 0 to 15. The longest path through G(C′) has length 2^|F(C)|, so the number of edges is exponential in |C′| = |F(C)|. One should note that this example does not imply utility dependence among the features, but it does imply that the preferences over some features dominate the preferences over other features. Moreover, the example does not show that derivation of a utility function must take exponential time, because lexicographic utility functions can be expressed in much simpler ways than counting path length. The true complexity of this problem remains an open question. In fact, one can trade computation cost between construction and evaluation of the utility function. The evaluation of specific utility values can be reduced by significant preprocessing in the function-construction stage. Clearly, the utility value of m ∈ M(V) could be cached

TABLE 3. Lexicographic Preference Rules

∗∗∗1 ≻ ∗∗∗0
∗∗10 ≻ ∗∗01
∗100 ≻ ∗011
1000 ≻ 0111
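The exponential edge count can be checked directly. The following is our own sketch, assuming the lexicographic rules of Table 3 are read with ∗-padding on the left (so rule k constrains only the last k features); under this reading, rule k contributes 2^(n−k) edges, for 2^n − 1 edges in total.

```python
from itertools import product

def lex_edge_count(n):
    """Count the model-graph edges induced by the n lexicographic rules
    ...1 > ...0, ..10 > ..01, ..., 10..0 > 01..1 over n binary features."""
    models = [''.join(bits) for bits in product('01', repeat=n)]
    edges = set()
    for k in range(1, n + 1):
        lhs, rhs = '1' + '0' * (k - 1), '0' + '1' * (k - 1)
        for a in models:
            for b in models:
                # The suffix must match the rule; the prefix (the ceteris
                # paribus part) must agree between the two models.
                if a.endswith(lhs) and b.endswith(rhs) and a[:n - k] == b[:n - k]:
                    edges.add((a, b))
    return len(edges)
```

Under this reading, the edges are exactly the adjacent integer pairs i + 1 ≻ i, so the longest path chains all 2^n models, as the text observes.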