An Argumentation-Based Interpreter for Golog Programs


Michelle L. Blom and Adrian R. Pearce
NICTA Victoria Research Laboratory
Department of Computer Science and Software Engineering
The University of Melbourne
{mlblom,

Abstract

This paper presents an argumentation-based interpreter for Golog programs. Traditional Golog interpreters are not designed to find the most preferred executions of a program from the perspective of an agent. Existing techniques developed to discover these executions are limited in the ways the preferences of an agent can be expressed, and in the variety of preference types that can be used to guide the search for a solution. The work presented here combines argumentation, used to compare executions relative to a set of general comparison principles, with the theory behind best first search to reduce the cost of the search process. To the best of our knowledge, this is the first work to integrate argumentation and the interpretation of Golog programs, and to use argumentation as a tool for best first search.

1 Introduction

Golog [Levesque et al., 1997] is a logic programming language for agents, designed to represent complex actions and procedures in the situation calculus. Each execution of a Golog program represents a way of achieving a particular goal. A traditional Golog interpreter is designed to find the possible executions of a program, not to identify those that are most preferred. Several approaches have been developed to achieve this task and to extend Golog with preferences. DTGolog, a decision-theoretic extension of Golog, maximises utility in the interpretation of Golog programs [Boutilier et al., 2000]. In [Fritz and McIlraith, 2006], DTGolog is extended with qualitative preferences in the form of temporal logic constraints of varying priority. In [Sohrabi et al., 2006], best first search is applied in the interpretation of Golog programs to discover a web service composition that is most preferred given temporal logic preference formulae.

The variety of preference types that can be expressed in these existing approaches is limited. The utilities of [Boutilier et al., 2000] and the qualitative preferences of [Fritz and McIlraith, 2006; Sohrabi et al., 2006] do not express all the ways in which executions of a program can be compared. These approaches, for example, cannot capture comparative preference derived from the relationships that exist between one choice and another. Examples of such preferences include those based on: intangible, immeasurable factors such as taste and perception; decision heuristics encapsulating past decision-making experience; and lexicographic or semi-lexicographic [Tversky, 1969] comparison rules.

In this paper, we present an argumentation-based Golog interpreter that aims to address this limitation, while maintaining the advantage of reduced search cost through best first search. Argumentation techniques have gained widespread use in both agent and multi-agent systems as tools for supporting more human-like decision-making processes [Ouerdane et al., 2007]. We first develop an argumentation framework for decision-making (ADF) that constructs a strict preference profile (a dominance graph) over a set of executions A of a Golog program. This framework allows an agent to compare executions according to whichever principles it desires (denoted comparison principles), including those describing comparative preference.
We do not assume that the resulting profile is transitive or acyclic, and as such we apply choice theory techniques to define the most preferred subset of A, MPS(A). We apply a choice rule, such as the Schwartz criterion [Schwartz, 1972], to a dominance graph over A to select these most preferred alternatives. We then characterise the interpretation of a Golog program as best first search (using the transition semantics of [de Giacomo et al., 2000]). At each stage in the search for a most preferred execution of a program δ, an instantiation of our ADF provides an estimate of which available transitions will lead to an execution in MPS(A), where A denotes the executions of δ. The concept of speculation (a tool for predicting future opportunities in search) and a prioritised selection algorithm are used to select a transition to perform at each stage of the search process. To select a transition, an agent speculates on (estimates) the nature of the preferences that exist between complete executions of a program, based on the preferences that exist between partial executions at each stage in search. Our argumentation-based best first search interpreter is shown to always find an execution in the most preferred subset of a program's execution set A (provided A ≠ ∅).

The remainder of this paper is structured as follows. In Section 2, we provide an introduction to the Golog programming language. In Section 3, we describe in more detail the existing preference-based Golog interpreters, their limitations, and our motivation for using argumentation-based best first search to address them.

Our ADF, defining MPS(A) for a set of program executions A, is outlined in Section 4. In Section 5, we describe argumentation-based best first search.

2 Golog and the Situation Calculus

Golog programs are described using axioms of the situation calculus, together with constructs to encode procedures, actions, iteration, and non-deterministic choice [Levesque et al., 1997]. The situation calculus enables reasoning about actions, which change the state of the world, and situations, which describe a sequence of actions applied to an initial state. In the situation calculus, a collection of action precondition, successor state, foundational domain-independent, and initial state axioms characterise the domain of an application.

The constructs of the Golog programming language are described in [de Giacomo et al., 2000; Levesque et al., 1997] and are: a (primitive action); φ? (test that φ holds in the current situation); δ₁; δ₂ (execute program δ₁ then δ₂); δ₁ | δ₂ (choose to execute program δ₁ or δ₂); π v.δ (choice of arguments v in δ); δ* (execute δ zero or more times); if φ then δ₁ else δ₂ (synchronised conditional); while φ do δ (synchronised loop); and proc P(v) δ end (procedure).

Given a domain theory D expressed in the situation calculus, a legal execution of a Golog program δ is a sequence of actions ā such that D ⊨ Do(δ, s₀, do(ā, s₀)), where do(ā, s₀) is a legal terminating situation (execution) of δ when executed in the situation s₀.¹ Each pair (δ, s), where δ is a program to be executed in situation s, is called a configuration. The transition semantics for the interpretation of Golog programs [de Giacomo et al., 2000] (providing an implementation of Do) defines when an agent can transition from one configuration (δ, s) to another (δ′, s′) by performing one step in the program δ, resulting in the situation s′ and the program δ′ left to perform. For each form δ_f a Golog program may assume, a Trans axiom is defined to determine which configurations can be transitioned to from (δ_f, s) by performing a single step of δ_f in situation s. A configuration (δ_f, s) is final (Final(δ_f, s)) if no steps remain to be performed in δ_f. A selection of Trans and Final axioms is shown below.²

Given a program of the form δ_f = a (where a is a primitive action), the configuration (δ_f, s) can transition if it is possible to perform action a in situation s (Poss(a[s], s)),

  Trans(a, s, δ′, s′) ≡ Poss(a[s], s) ∧ δ′ = nil ∧ s′ = do(a[s], s)

and is not final, as the action a remains to be performed. For the sequence δ_f = δ₁; δ₂ (where δ₁ is performed first and then δ₂), the configuration (δ_f, s) transitions as follows,

  Trans(δ₁; δ₂, s, δ′, s′) ≡ ∃γ. δ′ = (γ; δ₂) ∧ Trans(δ₁, s, γ, s′) ∨ Final(δ₁, s) ∧ Trans(δ₂, s, δ′, s′)

and is final if both δ₁ and δ₂ are final in s:

  Final(δ₁; δ₂, s) ≡ Final(δ₁, s) ∧ Final(δ₂, s)

¹ do({a₁, a₂, ..., aₙ₋₁, aₙ}, s) is shorthand for do(aₙ, do(aₙ₋₁, ... do(a₂, do(a₁, s)))).
² A full account of the transition semantics can be found in [de Giacomo et al., 2000].
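To make the transition semantics concrete, the following is a minimal Python sketch of Trans and Final for primitive actions, tests, sequence, and non-deterministic branch. It is an illustration only, not the authors' implementation (Golog interpreters are conventionally written in Prolog); the tuple-based program and situation representations and the domain-supplied poss and holds functions are assumptions made for this sketch.

# A minimal sketch of the Golog transition semantics (Trans/Final) in Python.
# Programs: 'nil', ('act', a), ('test', phi), ('seq', d1, d2), ('choice', d1, d2).
# Situations: tuples of performed actions; `poss` and `holds` come from the domain.

def trans(prog, s, poss, holds):
    """Yield every configuration (prog', s') reachable by one step of `prog` in `s`."""
    if prog == 'nil':
        return
    kind = prog[0]
    if kind == 'act':                      # primitive action a
        if poss(prog[1], s):
            yield ('nil', s + (prog[1],))
    elif kind == 'test':                   # phi? -- no action is performed
        if holds(prog[1], s):
            yield ('nil', s)
    elif kind == 'seq':                    # d1 ; d2
        d1, d2 = prog[1], prog[2]
        for g, s1 in trans(d1, s, poss, holds):
            yield (('seq', g, d2), s1)
        if final(d1, s):
            yield from trans(d2, s, poss, holds)
    elif kind == 'choice':                 # d1 | d2
        yield from trans(prog[1], s, poss, holds)
        yield from trans(prog[2], s, poss, holds)

def final(prog, s):
    """A configuration (prog, s) is final if no steps of `prog` remain."""
    if prog == 'nil':
        return True
    if prog[0] == 'seq':
        return final(prog[1], s) and final(prog[2], s)
    if prog[0] == 'choice':
        return final(prog[1], s) or final(prog[2], s)
    return False

Enumerating the legal executions of a program then amounts to repeatedly applying trans from (δ, s₀) until a final configuration is reached.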
3 Related Work

In rational choice theory, it is assumed that the preferences of a decision maker can be expressed as a utility function that assigns a value to each decision alternative. Utility maximisation produces a complete and transitive ranking from which a most preferred alternative can be selected.

Golog and Preferences
Developed in this vein is DTGolog, a decision-theoretic extension of Golog [Boutilier et al., 2000]. DTGolog, a prominent approach for finding the most preferred executions of a Golog program, combines the theory of Markov decision processes (MDPs) with the Golog programming language. Solving an MDP involves finding an optimal policy: an optimal mapping between the states an agent might find itself in and the actions that should be performed in those states. DTGolog specifies an MDP with an action theory (expressed in the situation calculus) augmented with stochastic actions (actions with uncertain outcomes) and a reward function mapping numerical rewards to situations. A policy for a program δ is optimal if it maximises the expected reward or utility achieved over a given horizon (a maximum number of performable actions), and the probability with which δ will be successfully executed.

The incorporation of qualitative preferences in the interpretation of Golog programs [Fritz and McIlraith, 2006; Sohrabi et al., 2006], and in planning in general [Son and Pontelli, 2004; Bienvenu et al., 2006], is an advantage when quantitative preferences (such as utilities or rewards) are not available or do not capture all of the preferences of an agent. Fritz and McIlraith extend DTGolog with qualitative preferences in [Fritz and McIlraith, 2006]. These preferences are expressed in terms of basic desire formulae (temporal logic constraints on situations) of varying priority. These formulae are compiled into a DTGolog program to be executed synchronously with the program under interpretation, δ. This synchronous execution aims to find the most quantitatively preferred policies of δ amongst those that satisfy (if not all, at least some of) the qualitative constraints.

Sohrabi et al. develop GologPref [Sohrabi et al., 2006], a Golog interpreter designed to discover a most preferred web service composition using the best first search strategy of PPLAN [Bienvenu et al., 2006]. In [Bienvenu et al., 2006], user preferences are expressed in a language that extends PP [Son and Pontelli, 2004]. PP describes a hierarchy of preference constructs: basic desire formulae outlining temporal logic constraints on plan trajectories; atomic preferences specifying an ordering on basic desires according to their importance; and general preference formulae allowing the combination of atomic preferences with operators analogous to the boolean connectives. PPLAN introduces additional mechanisms of preference combination and assigns weights (denoting degrees of importance) to basic desires in atomic preference formulae. These weights are used to compute a value for each partial plan on a search frontier, where preferences not already violated are assumed to be satisfied at a future point on the trajectory of a partial plan. PPLAN selects the most promising partial plan for exploration at each stage in search.

In contrast with the work of [Sohrabi et al., 2006], DTGolog-based approaches [Boutilier et al., 2000; Fritz and McIlraith, 2006] consider the entire execution space in the search for one that is most preferred.

This need arises, however, from their incorporation of stochastic actions. In this paper, we do not consider the use of stochastic actions.

Limitations of Existing Work
In [Delgrande et al., 2007], a distinction is made between absolute preference (describing the desirability of a particular state of affairs) and comparative preference (describing the desirability of one state of affairs relative to others). The approaches described above support only absolute preference, and disallow preferences that imply the merit of one partial plan or execution is dependent on the alternatives it is being compared with. In addition to the varieties of preference that can often only be described comparatively, such as preference over tastes and perceptions, preference over alternatives involving multiple attributes is often expressed in terms of comparative rules to reduce the cognitive load on a decision maker [Bar-Hillel and Margalit, 1988]. An example is the application of a majority rule over multiple attributes: one alternative is preferred to another if it is better on a majority of attributes. Approximation methods used to ascertain a preference for one alternative over another, such as the majority rule described above and the cancelling out of dimensions with similar differences, are common in human decision-making [Tversky, 1969].

Argumentation and Best First Search
Argumentation techniques provide a general means by which preference between alternatives can be specified or ascertained. In argumentation-based decision-making, preference amongst alternatives is constructed as a part of the decision-making process, and is based on the competing interests, motivations, values, or goals of the decision maker. Given a set of program executions, reasons may be formed in support of preferences between its members, based on the variety of principles an agent may use to compare them. In this paper, we combine the advantages of argumentation as a tool for expressing complex preferences with best first search, a technique applied in existing work to reduce the cost of the search for a most preferred execution. In place of independently evaluating partial executions with respect to a set of absolute preferences, arguments supporting preferences between partial executions are used to guide the search.

4 Argumentation over Executions

In this section, we define an argumentation framework for decision-making (ADF) that constructs preference relationships between decision alternatives given a general set of principles by which they are compared. We apply this framework, in conjunction with choice theory, to find the most preferred subset of an execution set A for a Golog program δ. We first provide the following definition of a comparison principle, and assume that an agent a = ⟨C, I⟩ is equipped with a finite set of such principles C, and a (transitive, irreflexive, and asymmetric) partial ordering I over the principles in C. Given c₁, c₂ ∈ C, I(c₁, c₂) denotes that principle c₁ is more important to a than principle c₂.

Definition 1 Given decision alternatives d₁, d₂ ∈ D, a comparison principle is a relation C(d₁, d₂, φ) where φ is a conditional expression involving d₁ and d₂. If φ(d₁, d₂) holds, we say that d₁ is preferred to d₂ (P(d₁, d₂)).

We now introduce a running example that will be used throughout the remainder of this paper.
Example 1 (The Travelling Agent) An agent a has the following program (denoted δ) outlining what to do to travel from point A to B to C. Agent a can drive or walk from A to B, and then take the train (preceded by eating cake) or skip (followed by eating fruit) from B to C.

  proc travel(A, B, C)
    [drive(A, B) | walk(A, B)];
    [[eat(cake); train(B, C)] | [skip(B, C); eat(fruit)]]
  end

We make the assumption that all actions are possible in any situation, with the exception of taking the train, which is not possible in a situation that has involved driving. Given an initial situation s₀, A = home (h), B = university (u), and C = cinema (c), the program δ has three executions:

  s₁ = do({drive(h, u), skip(u, c), eat(fruit)}, s₀),
  s₂ = do({walk(h, u), eat(cake), train(u, c)}, s₀),
  s₃ = do({walk(h, u), skip(u, c), eat(fruit)}, s₀).

Agent a has three principles Cᵢ(sⱼ, sₖ, φᵢ) by which it compares these executions. By the first comparison principle, agent a prefers one execution over another if it involves eating tastier food,

  φ₁(do(ā₁, s₀), do(ā₂, s₀)) ≡ eat(x) ∈ ā₁ ∧ eat(y) ∈ ā₂ ∧ x >_taste y

where x >_taste y denotes that x tastes better than y. The second principle expresses that walking and skipping together is preferred to driving,

  φ₂(do(ā₁, s₀), do(ā₂, s₀)) ≡ walk(xᵢ, yᵢ) ∈ ā₁ ∧ skip(xⱼ, yⱼ) ∈ ā₁ ∧ drive(xₖ, yₖ) ∈ ā₂

By the third principle, agent a prefers executions involving less tiring activities,

  φ₃(do(ā₁, s₀), do(ā₂, s₀)) ≡ x ∈ ā₁ ∧ y ∈ ā₂ ∧ x <_tiring y

where x <_tiring y denotes that x is less tiring than y.
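As an illustration of how such principles might be operationalised, the following Python sketch encodes φ₁, φ₂, and φ₃ as predicates over the action sequences of two executions. The tuple representation of actions and the contents of the taste and tiredness relations are assumptions made for the example, not part of the formalism.

# Illustrative encoding of the comparison principles of Example 1.
# An execution is represented by its action sequence; actions are tuples such as
# ('eat', 'cake') or ('drive', 'h', 'u').

TASTIER = {('fruit', 'cake')}          # x >_taste y
LESS_TIRING = {('drive', 'walk')}      # x <_tiring y

def phi1(a1, a2):
    """C1: prefer executions that involve eating tastier food."""
    return any(x[0] == 'eat' and y[0] == 'eat' and (x[1], y[1]) in TASTIER
               for x in a1 for y in a2)

def phi2(a1, a2):
    """C2: walking and skipping together is preferred to driving."""
    return (any(x[0] == 'walk' for x in a1) and any(x[0] == 'skip' for x in a1)
            and any(y[0] == 'drive' for y in a2))

def phi3(a1, a2):
    """C3: prefer executions involving less tiring activities."""
    return any((x[0], y[0]) in LESS_TIRING for x in a1 for y in a2)

# The three executions of Example 1 (the initial situation s0 is left implicit).
s1 = (('drive', 'h', 'u'), ('skip', 'u', 'c'), ('eat', 'fruit'))
s2 = (('walk', 'h', 'u'), ('eat', 'cake'), ('train', 'u', 'c'))
s3 = (('walk', 'h', 'u'), ('skip', 'u', 'c'), ('eat', 'fruit'))

print(phi1(s1, s2), phi2(s3, s1), phi3(s1, s2))   # True True True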

There are several existing frameworks that have been developed to argumentatively determine the best decisions in an alternative set based on the preferences of an agent (see [Ouerdane et al., 2007] for a review). Each of these approaches is designed to induce a preference ordering over a set of available decisions, from which the best (most preferred) decisions can be inferred. This preference ordering is formed by analysing and comparing a set of arguments (pro and con each decision, for example) constructed in light of the positive and negative aspects of each decision. Our aim is to provide an agent with the flexibility to compare executions with whichever principles it desires. Rather than construct arguments in support of and against individual decisions (pros and cons), comparison principles are instantiated in our ADF to form arguments in support of preference relationships between decisions (executions).

Each argument in our ADF represents a reason why one execution sᵢ ∈ A is preferred to another sⱼ ∈ A (P(sᵢ, sⱼ)). Each argument takes the form ⟨J, P(sᵢ, sⱼ)⟩, where sᵢ, sⱼ ∈ A and J is a justification for the truth of the conclusion P(sᵢ, sⱼ). Each justification J represents the instantiation of a comparison principle c ∈ C used by an agent to determine if one execution is preferred to another. An argument a₁ = ⟨J₁, P(sᵢ, sⱼ)⟩ ∈ AR is attacked (rebutted) by all arguments with an opposing conclusion a₂ = ⟨J₂, P(sⱼ, sᵢ)⟩ ∈ AR (denoted attacks(a₂, a₁)). An argument a₁ defeats an argument a₂ iff attacks(a₁, a₂) holds and the principle that derives a₂ is not more important to the decision maker (agent) than the principle that derives a₁. The structure of our ADF is based on the structure of an audience-specific value-based argumentation framework [Bench-Capon, 2003]. The conclusions of undefeated arguments (arguments not defeated by another argument) in an ADF represent a strict preference profile (a dominance graph) P over the available decisions (executions).

Definition 2 An ADF is a 5-tuple ADF = ⟨AR, attacks, C, pri, I⟩, where AR is a finite set of arguments, attacks is an irreflexive binary attack relation on AR, C is a finite set of comparison principles, pri maps each argument in AR to the principle in C it is derived from, and I ⊆ C × C is a (transitive, irreflexive, and asymmetric) partial ordering of the principles in C. An argument a ∈ AR defeats an argument b ∈ AR iff attacks(a, b) and ¬I(pri(b), pri(a)) hold.

We assume that an agent has its own choice rule (Ch) for selecting a most preferred subset of A, MPS(A), based on the preferences that have been found to exist between its elements (Ch(A, P) = MPS(A)). As preference is based on a set of general comparison principles, we do not assume that P is transitive or acyclic. An agent may, for example, represent Ch(A, P) with the Schwartz set [Schwartz, 1972].

Definition 3 Given a set of alternatives D, and a dominance graph P over D: (i) an alternative γ ∈ D dominates an alternative β ∈ D iff (γ, β) ∈ P; (ii) a Schwartz set component of D is a subset S ⊆ D such that: (a) no alternative γ ∈ D where γ ∉ S dominates an alternative β ∈ S; and (b) no non-empty proper subset S′ ⊂ S satisfies property (a); and (iii) the Schwartz set Sc of D is the union of all Schwartz set components of D.

If a preference cycle exists between decision alternatives, the Schwartz set infers that they are each at least as good as each other [Schwartz, 1972].

Example 2 (The Travelling Agent cont.) For agent a: fruit >_taste cake, drive <_tiring walk, and I = {(C₁, C₂), (C₁, C₃), (C₂, C₃)}. In an ADF defined over A = {s₁, s₂, s₃}, a is able to instantiate its comparison principles to produce five arguments in support of preference relationships between elements of A (AR = {A₁, ..., A₅}). Argument 1, A₁ = ⟨C₁, P(s₁, s₂)⟩, denotes that s₁ is preferred to s₂ as it involves eating tastier food. Similarly, A₂ = ⟨C₁, P(s₃, s₂)⟩. Argument 3, A₃ = ⟨C₂, P(s₃, s₁)⟩, denotes that s₃ is preferred to s₁ as s₃ involves walking and skipping while s₁ involves driving. Argument 4, A₄ = ⟨C₃, P(s₁, s₂)⟩, denotes that s₁ is preferred to s₂ as it is less tiring. Similarly, A₅ = ⟨C₃, P(s₁, s₃)⟩. By Definition 2, A₃ and A₅ attack each other, with A₃ defeating A₅ (as ¬I(C₃, C₂)), and P = {(s₁, s₂), (s₃, s₂), (s₃, s₁)}. Given the Schwartz set (Definition 3) as Ch, the most preferred subset of A, MPS(A), is Ch(A, P) = {s₃}.
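The following Python sketch illustrates Definitions 2 and 3 on the data of Example 2: arguments with opposing conclusions attack each other, an attack succeeds as a defeat unless the attacked argument is derived from a more important principle, the conclusions of undefeated arguments form the dominance graph, and the Schwartz set is computed directly from Definition 3. The tuple encoding of arguments and the brute-force subset enumeration are assumptions made for illustration only.

# Illustrative sketch of Definitions 2 and 3 on the data of Example 2.
# Each argument is a pair (principle, conclusion); a conclusion (x, y) stands for
# P(x, y): "x is preferred to y".
from itertools import combinations

AR = [('C1', ('s1', 's2')), ('C1', ('s3', 's2')), ('C2', ('s3', 's1')),
      ('C3', ('s1', 's2')), ('C3', ('s1', 's3'))]
I = {('C1', 'C2'), ('C1', 'C3'), ('C2', 'C3')}   # importance ordering over principles

def attacks(a, b):
    """a rebuts b iff their conclusions are opposed: P(x, y) versus P(y, x)."""
    return a[1] == (b[1][1], b[1][0])

def defeats(a, b):
    """Definition 2: a defeats b iff attacks(a, b) and not I(pri(b), pri(a))."""
    return attacks(a, b) and (b[0], a[0]) not in I

def dominance_graph(args):
    """Conclusions of arguments that are not defeated by any other argument."""
    return {a[1] for a in args if not any(defeats(b, a) for b in args)}

def schwartz_set(D, P):
    """Definition 3, implemented directly (exponential; illustration only)."""
    def undominated_from_outside(S):
        return not any((g, b) in P for g in D - S for b in S)
    subsets = [frozenset(c) for r in range(1, len(D) + 1)
               for c in combinations(D, r)]
    qualifying = [S for S in subsets if undominated_from_outside(S)]
    components = [S for S in qualifying if not any(T < S for T in qualifying)]
    return set().union(*components) if components else set()

P = dominance_graph(AR)
print(P)                                      # {('s1','s2'), ('s3','s2'), ('s3','s1')}
print(schwartz_set({'s1', 's2', 's3'}, P))    # {'s3'}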
5 Argumentation-Based Best First Search

In this section, we use the transition semantics defined in [de Giacomo et al., 2000] to characterise the interpretation of a Golog program as best first search. Given a program δ to interpret (and an initial situation s₀), the state-space of the problem is the set of configurations (δᵢ, sᵢ) that can be transitioned to from (δ, s₀) by performing a sequence of actions in δ. Given a configuration (δᵢ, sᵢ), a legal moves function MOVES(δᵢ, sᵢ) returns the set of configurations that may be transitioned to from (δᵢ, sᵢ) by performing the first step of δᵢ in the situation sᵢ. Search proceeds by building a tree (with root node (δ, s₀)) whose paths represent partial or complete executions of δ. For each fringe (set of leaf configurations) of this tree, Fringe, an evaluation function EVAL(Fringe, P) selects which configuration in Fringe to transition to next, given a dominance graph P over Fringe. If a final configuration (δᵣ, sᵣ) is selected by EVAL, the execution represented by this configuration, sᵣ, forms the result of search. Our aim is to allow an agent a = ⟨C, I⟩ to discover a final configuration representing an execution in MPS(A), where A are the executions of δ given s₀.

Algorithm 1 outlines the program interpretation process, where: EXPAND(Fringe, γ) removes the configuration γ = (δᵢ, sᵢ) from the search fringe and replaces it with MOVES(δᵢ, sᵢ); ARGUE-ADF(Fringe, C, I) determines the undefeated conclusions P of an ADF defined over the configurations in Fringe; and EVAL(Fringe, P) selects a configuration in Fringe to transition to next based on the dominance graph P.

Algorithm 1 PROGRAM INTERPRETATION PROCESS
  Input: (δ, s₀), C, I    Output: Best
  Best ← (δ, s₀), Fringe ← {(δ, s₀)}
  while ¬Final(Best) do
    Fringe ← EXPAND(Fringe, Best)
    P ← ARGUE-ADF(Fringe, C, I)
    Best ← EVAL(Fringe, P)
  end while
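A thin Python rendering of Algorithm 1 follows, assuming that MOVES, ARGUE-ADF, and EVAL are supplied as callables (for instance, built from the sketches given elsewhere in this paper); the function name and signatures are illustrative, not the authors' implementation.

def interpret(delta, s0, moves, argue_adf, eval_select, is_final):
    """Best first interpretation loop (Algorithm 1): returns a final configuration."""
    best = (delta, s0)
    fringe = {best}
    while not is_final(best):
        fringe = (fringe - {best}) | set(moves(best))   # EXPAND(Fringe, Best)
        dominance = argue_adf(fringe)                    # ARGUE-ADF(Fringe, C, I)
        best = eval_select(fringe, dominance)            # EVAL(Fringe, P)
    return best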

ARGUE-ADF(Fringe, C, I)
Each fringe in the search tree constructed by EXPAND consists of a set of configurations (δᵢ, sᵢ), where each (δᵢ, sᵢ) is reachable from (δ, s₀) by performing a sequence of actions in δ. ARGUE-ADF constructs a dominance graph over the configurations in a fringe, with the objective of providing advice to EVAL in its selection of the next configuration to transition to (expand). An agent a = ⟨C, I⟩ determines P by forming an ADF (Definition 2) over the configurations in Fringe. Each argument in the ADF expresses a preference for one configuration γ₁ ∈ Fringe over another γ₂ ∈ Fringe (based on the executions or partial executions they represent), and describes a reason why γ₁ is more likely to lie on a path to a most preferred final configuration (execution) than γ₂.

If arguments in the ADF over Fringe are based solely on actions that have been performed on a path from (δ, s₀) to a configuration in Fringe, the search for a most preferred execution of δ becomes greedy. To achieve admissibility, we must ensure that an agent can construct an argument preferring one configuration γ₁ ∈ Fringe to another γ₂ ∈ Fringe if, according to C, γ₁ leads to a configuration that is preferred to γ₂. We do not want an agent to search ahead in a program to determine which actions are actually possible, as this would defeat the purpose of best first search. Instead, we allow an agent to speculate on (estimate) which actions could be performed after making a transition, and construct arguments accordingly.

To formally define a speculation, we introduce a relaxation of the transition semantics of [de Giacomo et al., 2000]. In this relaxed semantics, a TransR axiom is defined for each of the constructs in Section 2, such that: if a configuration γ can transition to a configuration γ′ by Trans, then γ can transition to γ′ by TransR; and if a configuration γ cannot transition to a configuration γ′ by Trans, it may still transition to γ′ by TransR. The relaxed semantics thus overestimates what is possible in a program. For each construct defined in Section 2, with the exception of primitive actions, TransR and Trans are equivalent. Given a primitive action δ = a, (δ, s) transitions as follows,

  TransR(a, s, δ′, s′) ≡ δ′ = nil ∧ s′ = do(a[s], s)

and differs from its Trans counterpart in that it does not check whether the action is possible.

Definition 4 (Speculation) For a configuration γ′ = (δ′, s′), γₛ = (δₛ, sₛ) is a speculation of γ′ if it is final (Final(δₛ, sₛ)) and a member of the reflexive, transitive closure of TransR (TransR*), where TransR*(δ′, s′, δₛ, sₛ) holds if (δₛ, sₛ) is reachable from (δ′, s′) by repeated application of TransR.³

³ We consider only Golog programs that admit a finite number of executions, and for which there is no situation in which iteration and loop constructs admit an infinite number of iterations.

With the introduction of speculations, the set of arguments AR in the ADF defined over a fringe is formed as follows: for each pair of configurations (γ₁, γ₂) on a fringe (where both γ₁ and γ₂ are final or non-final, or γ₁ is non-final and γ₂ final), an argument ⟨c, P(γ₁, γ₂)⟩ (as described in Section 4) is constructed in support of P(γ₁, γ₂) if a speculation of γ₁ is preferred to γ₂ according to a comparison principle c ∈ C.
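The following Python sketch illustrates Definition 4 under the same assumed tuple-based program representation as the earlier Trans sketch: TransR behaves like Trans except that it never consults Poss, and the speculations of a configuration are the final configurations reachable by repeatedly applying it. The worked call, including the action names, is an illustrative assumption.

# A sketch of Definition 4. TransR over-estimates what is executable because
# primitive actions are never blocked by Poss.

def trans_r(prog, s):
    """Relaxed one-step transitions of a program `prog` in situation `s`."""
    if prog == 'nil':
        return
    kind = prog[0]
    if kind == 'act':
        yield ('nil', s + (prog[1],))          # no possibility check
    elif kind == 'seq':
        d1, d2 = prog[1], prog[2]
        for g, s1 in trans_r(d1, s):
            yield (('seq', g, d2), s1)
        if final_r(d1):
            yield from trans_r(d2, s)
    elif kind == 'choice':
        yield from trans_r(prog[1], s)
        yield from trans_r(prog[2], s)

def final_r(prog):
    if prog == 'nil':
        return True
    if prog[0] == 'seq':
        return final_r(prog[1]) and final_r(prog[2])
    if prog[0] == 'choice':
        return final_r(prog[1]) or final_r(prog[2])
    return False

def speculations(prog, s):
    """Final configurations in the reflexive, transitive closure of TransR."""
    frontier, seen, found = [(prog, s)], set(), []
    while frontier:
        conf = frontier.pop()
        if conf in seen:
            continue
        seen.add(conf)
        if final_r(conf[0]):
            found.append(conf)
        frontier.extend(trans_r(*conf))
    return found

# The remaining program after the first step of Example 1's travel procedure:
delta_rest = ('choice',
              ('seq', ('act', 'eat(cake)'), ('act', 'train(u,c)')),
              ('seq', ('act', 'skip(u,c)'), ('act', 'eat(fruit)')))
print(speculations(delta_rest, ('drive(h,u)',)))   # the two speculations sp1 and sp2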
EVAL(Fringe, P)
EVAL computes a set E_MPS representing the configurations in Fringe that an agent estimates to lie on a path from (δ, s₀) to a final configuration (δ′, s′) where s′ ∈ MPS(A) and A are the executions of δ. E_MPS is the result of applying a choice rule Ch to the dominance graph P induced by an ADF over Fringe in ARGUE-ADF:

  E_MPS ≝ Ch(Fringe, P) ⊆ Fringe

EVAL selects a configuration in Fringe to transition to next, using the advice in E_MPS. As P is based on speculations that may not represent legally reachable configurations, some of the preferences in P are transient. A transient preference between configurations γ₁ and γ₂ does not exist between all configurations reachable from γ₁ and all those reachable from γ₂, limiting an agent's ability to predict the nature of a P over all final program configurations from the P over a search fringe. To address the problems arising from transient preferences, EVAL is implemented using prioritised configuration selection. This prioritisation is based on the following assumption regarding the nature of all choice rules an agent may adopt.

Assumption 1 Given a non-empty set of alternatives D, and a dominance graph P over those alternatives, any choice rule Ch adopted by an agent: (i) finds a non-empty subset of D, where Ch(D, P) contains (at least) all alternatives γ ∈ D for which there is no alternative γ′ ∈ D such that P(γ′, γ); and (ii) satisfies the property that an alternative γ ∈ D being in Ch(D, P) is independent of any alternative γ′ ∈ D not connected to γ by an undirected path in P.

Assumption 1 provides an agent with a stopping condition: an indication of when a final configuration in the E_MPS of a fringe represents a most preferred execution.

The Prioritisation Algorithm
Given a search fringe Fringe, a dominance graph P over Fringe, and the set E_MPS = Ch(Fringe, P), EVAL(Fringe, P) is defined to: (i) select a final configuration γ ∈ E_MPS for which ¬∃γ′ ∈ Fringe. P(γ′, γ), if one exists; (ii) if (i) is not satisfied, select a final configuration γ ∈ E_MPS which is not connected by an undirected path of preferences in P to a non-final configuration γ′ ∈ Fringe, if one exists; (iii) if (i)-(ii) are not satisfied, select a non-final configuration γ ∈ E_MPS, if one exists; and (iv) if (i)-(iii) are not satisfied, select any non-final configuration γ ∈ Fringe.

Steps (i) and (ii) specify the conditions that must be satisfied by a final configuration to demonstrate that it represents an execution in MPS(A). Transient preferences can only exist between configuration pairs in which at least one member is non-final. If a final configuration γ ∈ Fringe is not independent of all members of such pairs (according to Assumption 1), an agent cannot predict the status of the execution it represents when a P is constructed over all executions in A. In this case, steps (iii)-(iv) tell the agent to continue its search.
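A minimal Python sketch of the prioritisation steps (i)-(iv) is given below; such a function could serve as the EVAL step of the interpretation loop. Configurations are treated as opaque labels; the finality test, the dominance graph P over the fringe, and E_MPS = Ch(Fringe, P) are assumed to be supplied by the caller, and the function and parameter names are illustrative.

def connected(x, y, P):
    """True if x and y are linked by an undirected path of preferences in P."""
    edges = set(P) | {(b, a) for (a, b) in P}
    frontier, seen = [x], {x}
    while frontier:
        u = frontier.pop()
        if u == y:
            return True
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

def prioritised_select(fringe, P, e_mps, is_final):
    """Prioritised configuration selection, following steps (i)-(iv)."""
    # (i) a final configuration in E_MPS that nothing on the fringe dominates
    for g in e_mps:
        if is_final(g) and not any((h, g) in P for h in fringe):
            return g
    # (ii) a final configuration in E_MPS not connected to any non-final one
    for g in e_mps:
        if is_final(g) and not any(connected(g, h, P)
                                   for h in fringe if not is_final(h)):
            return g
    # (iii) any non-final configuration in E_MPS
    for g in e_mps:
        if not is_final(g):
            return g
    # (iv) otherwise, any non-final configuration on the fringe
    for g in fringe:
        if not is_final(g):
            return g
    return None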

Theorem 1 Given an agent a = ⟨C, I⟩, using a choice rule satisfying Assumption 1, any execution sᵣ ∈ A of a program δ, given an initial situation s₀, found as a result of argumentation-based best first search is in MPS(A).

Proof: We demonstrate that a final configuration γ = (δᵣ, sᵣ) of δ cannot be selected from any search fringe Fringe, given a dominance graph P over its members, if sᵣ ∉ MPS(A). EVAL, using the prioritisation algorithm, will only select γ ∈ Ch(Fringe, P) as a result of search if: (i) there is no configuration γ′ ∈ Fringe such that P(γ′, γ); or (ii) there is no non-final configuration γ′ ∈ Fringe where γ and γ′ are connected by an undirected path in P.

If (i) holds, then there is no undefeated argument (based on a speculation or not) that can be formed in support of P(γ′, γ) for γ′ ∈ Fringe. As speculation over-estimates what is possible after expanding a configuration, there is no final configuration of δ given s₀ that is preferred to γ. According to part (i) of Assumption 1, sᵣ must be a member of MPS(A).

Let U be the set of configurations in Fringe connected to γ by an undirected path in P. If (ii) holds, each configuration in U is final, and no configuration in U is connected to a non-final configuration by an undirected path in P. Thus, there is no argument (based on a speculation) in support of a preference between a configuration γ′ ∈ Fringe and a configuration in U. As speculation over-estimates what is possible, there are no final configurations of δ outside of U that could be connected to γ (or any configuration in U) by an undirected path in P. By part (ii) of Assumption 1, the question of whether sᵣ is in MPS(A) is independent of any execution in A represented by a configuration outside of U. Let U′ be the set of executions represented by configurations in U. As U′ ⊆ A and sᵣ is in MPS(U′), sᵣ must be in MPS(A).

Moreover, the assumption that Ch(Fringe, P) is non-empty for a non-empty Fringe (Assumption 1) ensures that if a program δ has at least one legal execution, argumentation-based best first search is guaranteed to find a result.

Example 3 (The Travelling Agent cont.) In the first step of the interpretation of δ in Example 1, Fringe = {γ₁ = (δ′, s′₁), γ₂ = (δ′, s′₂)}, where s′₁ = do(drive(h, u), s₀), s′₂ = do(walk(h, u), s₀), and δ′ = [eat(cake); train(u, c)] | [skip(u, c); eat(fruit)]. The speculations that can be formed of (δ′, s′₁) and (δ′, s′₂) are {sp₁, sp₂} and {sp₃, sp₄}, respectively:

  sp₁ = (nil, do(train(u, c), do(eat(cake), s′₁)))
  sp₂ = (nil, do(eat(fruit), do(skip(u, c), s′₁)))
  sp₃ = (nil, do(train(u, c), do(eat(cake), s′₂)))
  sp₄ = (nil, do(eat(fruit), do(skip(u, c), s′₂)))

In an ADF defined over Fringe (given C and I as described in Examples 1 and 2), AR = {A₁, A₂}, where A₁ = ⟨C₃, P(γ₁, γ₂)⟩ and A₂ = ⟨C₂, P(γ₂, γ₁)⟩. A₁ is based on principle C₃ (driving is less tiring than walking). A₂ is based on sp₄ and principle C₂ (walking and skipping together is preferred over driving). A₂ defeats A₁ (by Definition 2 and ¬I(C₃, C₂)), and P = {(γ₂, γ₁)}. Given the Schwartz set [Schwartz, 1972] as Ch, EVAL selects walk as the first step (E_MPS = Ch(Fringe, P) = {γ₂} and ¬∃γ ∈ Fringe. P(γ, γ₂)), resulting in the fringe:

  Fringe = {γ₁ = (δ′, s′₁),
            γ₃ = (train(u, c), do(eat(cake), s′₂)),
            γ₄ = (eat(fruit), do(skip(u, c), s′₂))}

Performing the ARGUE-ADF step on the new fringe results in a new set of arguments AR = {A₁, ..., A₅}, where: A₁ = ⟨C₁, P(γ₄, γ₃)⟩ (based on sp₄), A₂ = ⟨C₂, P(γ₄, γ₁)⟩ (based on sp₄), A₃ = ⟨C₃, P(γ₁, γ₃)⟩ (based on sp₁ and sp₂), A₄ = ⟨C₃, P(γ₁, γ₄)⟩ (based on sp₁ and sp₂), and A₅ = ⟨C₁, P(γ₁, γ₃)⟩ (based on sp₂). The conclusions of undefeated arguments in AR form the dominance graph P = {(γ₄, γ₃), (γ₄, γ₁), (γ₁, γ₃)}. EVAL selects the configuration γ₄ to transition to, replacing γ₄ on the fringe with sp₄. Repeating the ARGUE-ADF and EVAL steps on this new fringe results in the execution represented by sp₄ being chosen as the final result. In Example 2, MPS(A = {s₁, s₂, s₃}) = {s₃}; in this example, sp₄ = (nil, s₃).

6 Conclusions

This paper has presented an argumentation-based best first search algorithm for the interpretation of Golog programs.
Argumentation allows the flexible specification (and elicitation) of preferences, including comparative preference, while the incorporation of best first search reduces the execution space traversed in finding a most preferred result. Our approach uses the concept of speculation (a tool for predicting future opportunities at each stage of search), and arguments in support of preferences between configurations, to guide search toward a most preferred program execution. The proposed argumentation-based interpretation of Golog programs is shown to be admissible. As future work, we intend to consider the implicit speculation of programs, in place of the explicit generation of speculations by a TransR relation.

References

[Bar-Hillel and Margalit, 1988] M. Bar-Hillel and A. Margalit. Theory and Decision, 24, 1988.
[Bench-Capon, 2003] T. J. M. Bench-Capon. Persuasion in practical argument using value based argumentation frameworks. Journal of Logic and Computation, 13, 2003.
[Bienvenu et al., 2006] M. Bienvenu, C. Fritz, and S. McIlraith. Planning with qualitative temporal preferences. In KR, 2006.
[Boutilier et al., 2000] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In AAAI, 2000.
[de Giacomo et al., 2000] G. de Giacomo, Y. Lespérance, and H. J. Levesque. ConGolog, a concurrent programming language based on the situation calculus. Artificial Intelligence, 121(1-2), 2000.
[Delgrande et al., 2007] James Delgrande, Torsten Schaub, and Hans Tompits. Journal of Logic and Computation, 17(5), 2007.
[Fritz and McIlraith, 2006] Christian Fritz and Sheila McIlraith. Decision-theoretic Golog with qualitative preferences. In KR, 2006.
[Levesque et al., 1997] Hector Levesque, Ray Reiter, Yves Lespérance, Fangzhen Lin, and Richard B. Scherl. Golog: A logic programming language for dynamic domains. Journal of Logic Programming, 31(1-3):59-83, 1997.
[Ouerdane et al., 2007] W. Ouerdane, N. Maudet, and A. Tsoukias. Arguing over actions that involve multiple criteria: A critical review. In ECSQARU, 2007.
[Schwartz, 1972] T. Schwartz. Rationality and the myth of the maximum. Noûs, 6(2):97-117, 1972.
[Sohrabi et al., 2006] S. Sohrabi, N. Prokoshyna, and S. McIlraith. In ISWC, 2006.
[Son and Pontelli, 2004] T. Son and E. Pontelli. Planning with preferences using logic programming. In LPNMR, 2004.
[Tversky, 1969] A. Tversky. Intransitivity of preferences. Psychological Review, 76(1):31-48, 1969.


More information

Automata-based Verification - III

Automata-based Verification - III COMP30172: Advanced Algorithms Automata-based Verification - III Howard Barringer Room KB2.20: email: howard.barringer@manchester.ac.uk March 2009 Third Topic Infinite Word Automata Motivation Büchi Automata

More information

Polarization and Bipolar Probabilistic Argumentation Frameworks

Polarization and Bipolar Probabilistic Argumentation Frameworks Polarization and Bipolar Probabilistic Argumentation Frameworks Carlo Proietti Lund University Abstract. Discussion among individuals about a given issue often induces polarization and bipolarization effects,

More information

CHAPTER 10. Gentzen Style Proof Systems for Classical Logic

CHAPTER 10. Gentzen Style Proof Systems for Classical Logic CHAPTER 10 Gentzen Style Proof Systems for Classical Logic Hilbert style systems are easy to define and admit a simple proof of the Completeness Theorem but they are difficult to use. By humans, not mentioning

More information

Common Sense Reasoning About Complex Actions

Common Sense Reasoning About Complex Actions Common Sense Reasoning About Complex Actions Abstract Reasoning about complex actions in partially observed and dynamic worlds requires imposing restrictions on the behavior of the external world during

More information

Reasoning by Regression: Pre- and Postdiction Procedures for Logics of Action and Change with Nondeterminism*

Reasoning by Regression: Pre- and Postdiction Procedures for Logics of Action and Change with Nondeterminism* Reasoning by Regression: Pre- and Postdiction Procedures for Logics of Action and Change with Nondeterminism* Marcus Bjareland and Lars Karlsson Department of Computer and Information Science Linkopings

More information

Reasoning about Actions in a Multiagent Domain

Reasoning about Actions in a Multiagent Domain Reasoning about Actions in a Multiagent Domain Laura Giordano 1, Alberto Martelli 2, and Camilla Schwind 3 1 DISTA, Università del Piemonte Orientale A. Avogadro laura@mfn.unipmn.it 2 Dipartimento di Informatica

More information

Intention and Rationality for PRS-Like Agents

Intention and Rationality for PRS-Like Agents Intention and Rationality for PRS-Like Agents Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 2052, Australia wobcke@cse.unsw.edu.au Abstract. In this paper,

More information

Motivation for introducing probabilities

Motivation for introducing probabilities for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.

More information

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University Intelligent Agents Formal Characteristics of Planning Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University Extensions to the slides for chapter 3 of Dana Nau with contributions by

More information

3 Propositional Logic

3 Propositional Logic 3 Propositional Logic 3.1 Syntax 3.2 Semantics 3.3 Equivalence and Normal Forms 3.4 Proof Procedures 3.5 Properties Propositional Logic (25th October 2007) 1 3.1 Syntax Definition 3.0 An alphabet Σ consists

More information

On simulations and bisimulations of general flow systems

On simulations and bisimulations of general flow systems On simulations and bisimulations of general flow systems Jen Davoren Department of Electrical & Electronic Engineering The University of Melbourne, AUSTRALIA and Paulo Tabuada Department of Electrical

More information

Mechanism Design for Argumentation-based Information-seeking and Inquiry

Mechanism Design for Argumentation-based Information-seeking and Inquiry Mechanism Design for Argumentation-based Information-seeking and Inquiry Xiuyi Fan and Francesca Toni Imperial College London, London, United Kingdom, {xf309,ft}@imperial.ac.uk Abstract. Formal argumentation-based

More information

Coarticulation in Markov Decision Processes

Coarticulation in Markov Decision Processes Coarticulation in Markov Decision Processes Khashayar Rohanimanesh Department of Computer Science University of Massachusetts Amherst, MA 01003 khash@cs.umass.edu Sridhar Mahadevan Department of Computer

More information

An Introduction to Description Logics

An Introduction to Description Logics An Introduction to Description Logics Marco Cerami Palacký University in Olomouc Department of Computer Science Olomouc, Czech Republic Olomouc, 21.11.2013 Marco Cerami (UPOL) Description Logics 21.11.2013

More information

Adding Time and Intervals to Procedural and Hierarchical Control Specifications

Adding Time and Intervals to Procedural and Hierarchical Control Specifications Adding Time and Intervals to Procedural and Hierarchical Control Specifications Tran Cao Son Computer Science Department New Mexico State University Las Cruces, NM 88003, USA tson@cs.nmsu.edu Chitta Baral

More information

Reinforcement Learning II

Reinforcement Learning II Reinforcement Learning II Andrea Bonarini Artificial Intelligence and Robotics Lab Department of Electronics and Information Politecnico di Milano E-mail: bonarini@elet.polimi.it URL:http://www.dei.polimi.it/people/bonarini

More information

Progression of Situation Calculus Action Theories with Incomplete Information

Progression of Situation Calculus Action Theories with Incomplete Information Progression of Situation Calculus Action Theories with Incomplete Information Stavros Vassos and Hector Levesque Department of Computer Science University of Toronto Toronto, ON, CANADA {stavros,hector}@cs.toronto.edu

More information

ConGolog, a concurrent programming language based on the situation calculus

ConGolog, a concurrent programming language based on the situation calculus Artificial Intelligence 121 (2000) 109 169 ConGolog, a concurrent programming language based on the situation calculus Giuseppe De Giacomo a,, Yves Lespérance b, Hector J. Levesque c a Dipartimento di

More information

From: AAAI Technical Report FS Compilation copyright 1994, AAAI ( All rights reserved. Forget It!

From: AAAI Technical Report FS Compilation copyright 1994, AAAI (  All rights reserved. Forget It! From: AAAI Technical Report FS-94-02. Compilation copyright 1994, AAAI (www.aaai.org). All rights reserved. Forget It! Fangzhen Lin and Ray Reiter* Department of Computer Science University of Toronto

More information

Axiomatic Semantics. Semantics of Programming Languages course. Joosep Rõõmusaare

Axiomatic Semantics. Semantics of Programming Languages course. Joosep Rõõmusaare Axiomatic Semantics Semantics of Programming Languages course Joosep Rõõmusaare 2014 Direct Proofs of Program Correctness Partial correctness properties are properties expressing that if a given program

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important

More information

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam: Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,

More information

arxiv: v1 [cs.lo] 19 Mar 2019

arxiv: v1 [cs.lo] 19 Mar 2019 Turing-Completeness of Dynamics in Abstract Persuasion Argumentation Ryuta Arisaka arxiv:1903.07837v1 [cs.lo] 19 Mar 2019 ryutaarisaka@gmail.com Abstract. Abstract Persuasion Argumentation (APA) is a dynamic

More information

Automata and Languages

Automata and Languages Automata and Languages Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Mathematical Background Mathematical Background Sets Relations Functions Graphs Proof techniques Sets

More information

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability

CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)

More information

The Expressivity of Universal Timed CCP: Undecidability of Monadic FLTL and Closure Operators for Security

The Expressivity of Universal Timed CCP: Undecidability of Monadic FLTL and Closure Operators for Security The Expressivity of Universal Timed CCP: Undecidability of Monadic FLTL and Closure Operators for Security Carlos Olarte and Frank D. Valencia INRIA /CNRS and LIX, Ecole Polytechnique Motivation Concurrent

More information

On the Use of an ATMS for Handling Conflicting Desires

On the Use of an ATMS for Handling Conflicting Desires On the Use of an ATMS for Handling Conflicting Desires Leila Amgoud and Claudette Cayrol Institut de Recherche en Informatique de Toulouse 118, route de Narbonne 31062 Toulouse, France {amgoud, ccayrol}@irit.fr

More information

An Adaptive Clustering Method for Model-free Reinforcement Learning

An Adaptive Clustering Method for Model-free Reinforcement Learning An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at

More information

Logic and Automata I. Wolfgang Thomas. EATCS School, Telc, July 2014

Logic and Automata I. Wolfgang Thomas. EATCS School, Telc, July 2014 Logic and Automata I EATCS School, Telc, July 2014 The Plan We present automata theory as a tool to make logic effective. Four parts: 1. Some history 2. Automata on infinite words First step: MSO-logic

More information

Markov decision processes

Markov decision processes CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only

More information