
Learning by Questions and Answers: From Belief-Revision Cycles to Doxastic Fixed Points

Alexandru Baltag¹ and Sonja Smets²,³

¹ Computing Laboratory, University of Oxford, Alexandru.Baltag@comlab.ox.ac.uk
² Dep. of Artificial Intelligence & Dep. of Philosophy, University of Groningen
³ IEG Research Group, University of Oxford, S.J.L.Smets@rug.nl

1 Introduction

We investigate the long-term behavior of iterated belief revision with higher-level doxastic information. While the classical literature on iterated belief revision [13, 11] deals only with propositional information, we are interested in learning (by an introspective agent, of some partial information about the) answers to various questions Q_1, Q_2, ..., Q_n, ... that may refer to the agent's own beliefs (or even to her belief-revision plans). Here, "learning" can be taken either in the hard sense (of becoming absolutely certain of the answer) or in the soft sense (accepting some answers as more plausible than others). If the questions are binary ("is ϕ true or not?"), the agent learns a sequence of true doxastic sentences ϕ_1, ..., ϕ_n, .... Investigating the long-term behavior of this process means that we are interested in whether or not the agent's beliefs, her knowledge and her conditional beliefs stabilize eventually or keep changing forever.

The initial beliefs are given by a finite plausibility structure as in [5, 7, 6]: a finite set of possible worlds with a total preorder, representing the worlds' relative plausibility. These structures are standard in Belief Revision: they are special cases of Halpern's preferential models [20], Grove's sphere models [19], Spohn's ordinal-ranked models [25] and Board's belief-revision structures [10]. A (pointed plausibility) model is a plausibility structure with a designated state (representing the actual state of the real world) and a valuation map (capturing the ontic "facts" in each possible state of the world).

We adopt our finiteness assumption for three reasons. First, all the above-mentioned semantic approaches are equivalent in the finite case: this gives a certain semantic robustness to our results. Secondly, it seems realistic to assume that an agent starts with a belief base consisting of finitely many sentences (believed on their own merits, even if, by logical omniscience, all their infinitely many consequences are also believed). Since all the languages considered here have the finite model property, any finite base can be represented in a finite model. Thirdly, our problem as formulated above makes sense primarily for finite structures: only for finite models is it reasonable to expect that revision might stabilize after finitely many iterations.

A belief upgrade will be a type of model transformation, i.e. a (partial) function taking (pointed plausibility) models as inputs and returning upgraded models as outputs.

Examples of upgrades have been proposed in the literature on Belief Revision, e.g. by Boutilier [12] and Rott [24], in the literature on dynamic semantics for natural language by Veltman [26], and in Dynamic Epistemic Logic by J. van Benthem [9]. Three of them ("update", "radical upgrade" and "conservative upgrade") are mentioned in this paper, and our results bear specific relevance to them. But we also generalize these proposals to a natural notion of "belief upgrade", in the spirit of Nayak [23], which turns out to be a special case of our Action Priority Update [7]. So our results have a general significance: they are applicable to any reasonable notion of iterated belief revision.

Not every model transformation makes sense as a way to change beliefs. The fact that we are dealing with purely doxastic events imposes some obvious restrictions: we assume nothing else happens in the world except for our agent's change of beliefs; so the upgrade should not change the facts of the world but only the doxastic state. Hence, the actual state and the valuation of atomic sentences should stay the same (since we interpret atomic sentences as ontic, non-doxastic "facts"). The only things that might change are the set S of possible worlds and the order ≤: the first reflects a change in the agent's knowledge, while the second reflects a change in her (conditional) beliefs. Moreover, in the single-agent case, it is enough to consider only changes in which the new set S′ of states is a subset S′ ⊆ S of the old set: since our agent has perfect recall and nothing else happens in the world except for her own belief change, there is no increase in uncertainty; hence, the set of possibilities can only shrink or stay the same.⁴ So single-agent upgrades correspond essentially to a shrinking (or maintaining) of the set of possible worlds, combined with a relational transformation.

Still, not even every transformation of this type will make sense as a doxastic change. Our idea is that, in general, a belief upgrade comes by learning some (certain or uncertain) information about the answer to some specific question. There is a large literature on the semantics and pragmatics of questions, starting with the classical work of Grice [17]. Here, we adopt Groenendijk's approach to questions-as-partitions [18]. Questions in a given language are in this view partitions of the state space, such that each cell of the partition is definable by a sentence in the language. In particular, a binary question is given by a two-cell partition {A, ¬A} definable by a sentence A, while a general question ("which of the following is true...?") may have more cells {A_1, ..., A_n}, each corresponding to a possible answer A_i.

As underlying languages, we consider the language of basic doxastic logic (with operators for belief) and its extension with conditional belief operators, as in [10, 9, 5, 7, 6]. In addition, we use dynamic operators for learning answers to questions: these operators are eliminable, but they provide a compact way of expressing properties of dynamic belief revision.

In the hard sense, the action of learning the correct answer to a question Q = {A_1, ..., A_n} restricts the state space to the cell A := A_i that contains the real state of the world; the plausibility relation and the valuation are correspondingly restricted, while the real world is left the same.

⁴ As shown in the Dynamic Epistemic Logic literature, this is not the case for multi-agent upgrades; see e.g. [4, 3, 7, 6].

This operation (as a possible semantics for the AGM revision operator T ∗ A) is known as conditioning in the literature on Belief Revision (as a qualitative analogue of Bayesian conditioning), and as update (or "public announcement") in the Dynamic Epistemic Logic literature [8]. Intuitively, it represents the kind of learning in which the agent becomes absolutely certain that the correct answer is A: she comes to know that A was true, in an irrevocable, un-revisable way. Such learning can never be undone: the deleted states can never be considered possible again.

However, one can also learn the answer to a question Q = {A_1, ..., A_n} in a softer (and more realistic) sense of "learning": one may come to believe that the correct answer A := A_i is more plausible than the others; in addition, one may come to rank the plausibility of the various other possible answers. This means that what is learned is in fact a plausibility order on the cells of the partition Q. The effect of this type of learning is that the order between states in different cells A_i, A_j (i ≠ j) is changed (in accordance with the newly learned plausibility relation on cells), while the order between states in the same cell is kept the same. This is the obvious qualitative analogue of Jeffrey Conditioning. An example is the "radical" (or "lexicographic") upgrade ⇑A from [24, 9]: the A-worlds are "promoted", becoming more plausible than the non-A-worlds, while within the two zones the old order is kept the same. This is a soft learning of the answer to a binary question Q = {A, ¬A}. Our setting can also capture another type of doxastic transformation proposed in the literature: the "conservative upgrade" ↑A, promoting only the most plausible A-worlds (making them the most plausible overall), and leaving everything else the same.

In practice, we may combine soft "Jeffreyan" learning with hard "Bayesian" learning: while ranking some of the possible answers in their order of plausibility, one may simultaneously come to exclude other answers as impossible. In pragmatic terms: although receiving only soft information about the correct answer, one may also learn in the hard sense that the correct answer is not one of the answers to be excluded. So the most general notion of "learning the answer to a question Q = {A_1, ..., A_n}" is an action that throws out (all the states lying in) some of the cells of the partition, and imposes some (new) plausibility order between (the states in) any two remaining cells A_i, A_j (i ≠ j), while leaving unchanged the order between the states in the same cell.

Any such action of learning an answer to a question Q = {A_1, ..., A_n} can thus be encoded as a plausibility order (or preorder) on some set {A_{i_1}, ..., A_{i_k}} of possible answers (a subset of the set of all possible answers {A_1, ..., A_n}). We call actions of this type "belief upgrades": they represent our proposal for a general notion of natural single-agent belief-revision event. One can easily recognize them as a special case of the doxastic "action models" (or doxastic "event models"), introduced (as ordinal-ranked models) in [2, 14] and (as qualitative plausibility models) in [7, 6]: namely, the special case in which the preconditions of every two events are mutually disjoint. But the important thing is the meaning of this representation. The intended effect (as described in the paragraph above) of this action on any initial plausibility model is a kind of lexicographic refinement of the old total preorder by the new preorder.
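Concretely (anticipating the formal definition in Section 2, and writing ≤ for the old order and ≤′ for the refined one), the effect on the order can be displayed as follows; the display is our paraphrase of the clause given later for upgrades:

s ≤′ t   iff   either s ∈ A_i and t ∈ A_j for two answers with A_i < A_j,
               or else s and t belong to the same answer A_i and s ≤ t.

So the comparison between answers takes priority, and the old order is consulted only inside a single cell.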

This can be recognized as a special case (the single-agent, partitional case) of the Action Priority Rule introduced in our previous work [7, 6] on multi-agent belief revision. While this move was seen there as the natural generalization to a belief-revision-friendly setting of the so-called update product from Dynamic Epistemic Logic [4, 3, 15], the special case considered here is even more closely related to mainstream work in Belief Revision. Indeed, it is a semantic counterpart of Nayak's "Jeffryzation" of belief revision [23], as expressed in the following quote: "(...) one of the morals we have learnt from Richard Jeffrey's classical work (...) is that in belief updating, the representation of the belief state, the representation of the new information and the result of the update should be of the same type." ([23], page 361).

An upgrade is said to be correct if the believed answer is true: the actual state belongs to the most plausible cell (according to the newly learnt plausibility relation on possible answers). In the case of radical upgrades ⇑A, this means that the upgrade is truthful, i.e. A is true in the real world. We are interested in iterated belief revision processes induced by sequences of correct upgrades. We formalize this in our notion of correct upgrade streams. A particular case is the one of repeated (correct) upgrades: the same (partial information about the same) answer is repeatedly learnt in the same way, so the same transformation is iterated. The reason this is a non-superfluous repetition is that the learnt information may be doxastic: it may refer to the agent's own (conditional) beliefs, which are subject to change during the process. Hence, the truth value of an answer may change; as a consequence, it may be non-redundant to announce again the same answer to the same question!

This phenomenon is well known in the multi-agent case, see e.g. the Muddy Children Puzzle [16]. But, despite the existence of Moore sentences [22] (e.g. "p is the case, but you don't believe it!"), many authors tend to think that the problem becomes trivial in the (introspective) single-agent case: since the agent already knows what she believes and what not, it may appear that whenever she learns a new piece of information, she can separate its ontic content from its (already known) doxastic part, and can thus simply learn a non-doxastic sentence. This is indeed the case for a fixed initial model, but in general the process of eliminating the doxastic part is highly complex and model-dependent: in different contexts, the same doxastic sentence may convey different information. And we cannot simply disregard such sentences: new information may come packaged in such a way that it explicitly refers to the agent's beliefs in order to implicitly convey important new information about reality, in a way that is highly context-dependent. An example is the announcement: "You are wrong about p: whatever you believe about (whether or not) p is false!" This announcement cannot be truthfully repeated, since (like a Moore sentence) it becomes false after being learnt.

Our counterexamples in this paper are of a similar nature, but with added subtleties, designed to keep the announcement truthful after it was learned. They decisively show that Introspection does not trivialize the problem of repeated upgrades: even when applied on an initial finite model, a correct upgrade (with the same true sentence) may be repeated ad infinitum, without ever reaching a fixed point of the belief-revision process! Finite models may keep oscillating, changing cyclically, and so do the agent's conditional beliefs.
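One natural formalization of the "wrong about p" announcement (ours; the paper leaves it informal), in the doxastic language introduced below, is the sentence

μ := (Bp ∧ ¬p) ∨ (B¬p ∧ p),

read: whatever opinion the agent holds on p, it is mistaken. If μ is true and is learnt by a hard update !μ, the surviving worlds are exactly the ¬p-worlds (in case Bp held) or exactly the p-worlds (in case B¬p held); either way the agent's new belief about p is true, so μ is false in the upgraded model and, like a Moore sentence, cannot be truthfully announced a second time.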

However, we also have some positive convergence results: when iterating correct upgrades (starting on a finite initial model), the agent's simple (non-conditional) beliefs (as well as her knowledge) will eventually stabilize, reaching a fixed point. Moreover, if we apply correct repeated upgrades with (the same) new information referring only to simple (non-conditional) beliefs (i.e. we repeatedly revise with the same sentence in basic doxastic logic), then the whole model always stabilizes after a finite number of repeated upgrades: so in this case even the agent's conditional beliefs reach a fixed point. The above stabilization results form the main technical contribution of this paper (besides the above-mentioned easy, but rather surprising and conceptually important, counterexamples); their proofs are non-trivial and are relegated to the Appendix. These results are applicable in particular to the important case of iterating (or repeating) truthful radical upgrades ⇑A. In contrast (as our second counterexample shows), even the simple beliefs may never stabilize when repeating truthful conservative upgrades ↑A. The reason for which the above result doesn't apply is that conservative upgrades, no matter how truthful, can still be deceiving (i.e. fail to be "correct").

2 Questions and Upgrades on Preferential Models

A (finite, single-agent) plausibility frame is a structure (S, ≤), consisting of a finite set S of states and a total preorder (i.e. a reflexive, transitive and connected⁵ binary relation on S) ≤ ⊆ S × S. The usual reading of s ≤ t is that "state s is at least as plausible as state t". We write s < t for the strict plausibility relation ("s is more plausible than t"): s < t iff s ≤ t but t ≰ s. Similarly, we write s ≅ t for the equi-plausibility (or indifference) relation ("s and t are equally plausible"): s ≅ t iff both s ≤ t and t ≤ s. The connectedness assumption is not actually necessary for any of the results in this paper⁶, but it is convenient to have, as it simplifies the definition of (conditional) beliefs. For infinite models, it is usually assumed that the relation ≤ is well-founded, but in the finite case (the only one of concern here) this condition is automatically satisfied.

A (pointed) plausibility model is a structure S = (S, ≤, ‖·‖, s_0), consisting of a plausibility frame (S, ≤) together with a valuation map ‖·‖ : Φ → P(S), mapping every element p of some given set Φ of atomic sentences into a set of states ‖p‖ ⊆ S, and together with a designated state s_0 ∈ S, called the actual state.

Knowledge, Belief and Conditional Belief. Given a plausibility model S, and sets P, R ⊆ S, we put:

best P = Min_≤ P := {s ∈ P : s ≤ s′ for all s′ ∈ P},     best := best S,
KP := {s ∈ S : P = S},     BP := {s ∈ S : best ⊆ P},     B^R P := {s ∈ S : best R ⊆ P}.

⁵ A binary relation R ⊆ S × S is called connected (or "complete") if for all s, t ∈ S we have either sRt or tRs. A connected (pre)order is usually called a total (pre)order.
⁶ And in fact for multi-agent frames it needs to be replaced by "local connectedness", see [7, 6].

Interpretation. The states of S represent possible descriptions of the real world. The correct description of the world is given by the actual state s_0. Intuitively, our (implicit) agent considers a description s as possible iff s ∈ S. The atomic sentences p ∈ Φ represent ontic (i.e. non-doxastic) facts. The valuation tells us which facts hold in which worlds. The plausibility relations capture the agent's (conditional) beliefs about the world: if the agent learnt that the real state is either s or t, she would believe (conditional on this information) that it was the most plausible of the two. best P is the set of most plausible states satisfying P, while KP, BP, B^R P represent respectively the propositions "P is known", "P is believed" and "P is believed conditional on R". So knowledge corresponds to truth in all the worlds that the agent considers as possible. Belief corresponds to truth in all the most plausible worlds; while conditional belief given a condition R corresponds to truth in all the most plausible R-worlds. So conditional beliefs B^R describe the agent's contingency plans for belief revision in case she learns R. To quote J. van Benthem [9], conditional beliefs "pre-encode" the agent's potential belief changes. Note that the sets KP, BP, B^R P are always equal either to the empty set ∅ or to the whole state space S; this reflects our implicit Introspection assumption: the agent knows what she knows (and what she believes) and what she doesn't know (or doesn't believe).

Example 1. Consider a pollster (Charles) with the following beliefs about how a voter (Mary) will vote:

w : d   <   t   <   s : r

(Each state is labelled with the atoms true at it; s < t means "s is more plausible than t", so the most plausible state is leftmost.)

There are three possible worlds: s (in which Mary votes Republican), w (in which she votes Democrat) and t (in which she doesn't vote). We assume there are no other options: i.e. there are no other candidates and it is impossible to vote for both candidates. The atomic sentences are r (for "voting Republican") and d (for "voting Democrat"). The valuation is given in the diagram: ‖r‖ = {s}, ‖d‖ = {w}. We assume the real world is s, so in fact Mary will vote Republican! But Charles believes that she will vote Democrat (d); and in case this turns out wrong, he'd rather believe that she won't vote (¬d ∧ ¬r) than accept that she may vote Republican: so t is more plausible than s.

Doxastic Propositions. A doxastic proposition is a map P assigning to each model S some set P_S ⊆ S of states in S. We write s |=_S P, and say that P is true at s in model S, iff s ∈ P_S. We skip the subscript and write s |= P when the model is fixed. We say that P is true in the (pointed) model S, and write S |= P, if it is true at the actual state s_0 in the model S (i.e. s_0 |=_S P). We denote by Prop the family of all doxastic propositions. In particular, the "always true" and "always false" propositions are given pointwise by ⊤_S := S, ⊥_S := ∅. For each atomic sentence p there exists a corresponding proposition p, given by p_S = ‖p‖_S. All the Boolean operations on sets can be lifted pointwise to operations on Prop: negation (¬P)_S := S \ P_S, conjunction (P ∧ R)_S := P_S ∩ R_S, etc. Similarly, we can define pointwise the best operator, a special doxastic proposition best, and the epistemic and (conditional) doxastic modalities: (best P)_S := best(P_S), best_S := best S, (KP)_S := K(P_S), etc. Finally, the relation of entailment P |= R is given pointwise by set-inclusion: P |= R iff P_S ⊆ R_S for all S.
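To make the definitions above concrete, here is a small Python sketch (ours, purely illustrative; the paper contains no code). A model is encoded by integer ranks, lower rank meaning more plausible, together with a valuation and the actual state; best, K, B and B^R then become one-liners, and the asserts check them against Example 1. Since KP, BP and B^R P are always ∅ or the whole state space, the sketch simply returns the corresponding truth value.

def best(rank, P):
    """Most plausible states satisfying P (Min_<= P); lower rank = more plausible."""
    P = [s for s in rank if s in P]
    return {s for s in P if rank[s] == min(rank[t] for t in P)} if P else set()

def K(rank, P):
    """P is known iff it holds at every state the agent considers possible."""
    return set(rank) <= set(P)

def B(rank, P):
    """P is believed iff it holds at all most plausible states."""
    return best(rank, set(rank)) <= set(P)

def B_cond(rank, R, P):
    """P is believed conditional on R iff it holds at all most plausible R-states."""
    return best(rank, R) <= set(P)

# Example 1: s (Mary votes Republican), t (she does not vote), w (she votes Democrat);
# plausibility order w < t < s, valuation r = {s}, d = {w}, actual state s.
rank = {'w': 0, 't': 1, 's': 2}
r, d = {'s'}, {'w'}
not_r, not_d = set(rank) - r, set(rank) - d

assert B(rank, d) and not K(rank, d)          # Charles believes d but does not know it
assert B_cond(rank, not_d, not_d & not_r)     # given not-d, he would believe she won't vote
assert B_cond(rank, not_d, not_r)             # ... and hence still believe not-r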

Questions and Answers. A question is a (finite) family Q = {A_1, ..., A_n} of exhaustive and mutually disjoint propositions, i.e. ⋁_{i=1,...,n} A_i = ⊤ and A_i ∧ A_j = ⊥ for all i ≠ j. Every question Q induces a partition {(A_1)_S, ..., (A_n)_S} on any model S. Any of the sets A_i is a possible answer to question Q. The correct answer to Q in a pointed model S is the unique true answer A_i (i.e. such that S |= A_i).

Example 1, continued. In the above example, the relevant question is "how will Mary vote?" This can be represented as a question Q = {r, d, ¬r ∧ ¬d}. Another relevant question is "will Mary vote Democrat?", given by Q = {d, ¬d}.

Learning the Answer with Certainty: Updates. The action by which the agent learns with absolute certainty the correct answer A = A_i to a question Q is usually denoted by !A. This operation is known as update in Dynamic Epistemic Logic, and as conditioning in the Belief Revision literature (where the term "update" is used for a different operation, the Katsuno-Mendelzon update [21]). This action deletes all the non-A-worlds from the pointed model, while leaving everything else the same. The update !A is executable on a pointed model S iff it is truthful, i.e. S |= A. So formally, an update !A is a partial function that takes as input any pointed model S = (S, ≤, ‖·‖, s_0) satisfying A and returns a new pointed model !A(S) = (S′, ≤′, ‖·‖′, s_0′), given by: S′ = A_S; s ≤′ t iff s ≤ t and s, t ∈ S′; ‖p‖′ = ‖p‖ ∩ S′, for all atomic p; and s_0′ = s_0.

Example 2. Consider the case in which Charles learns for sure that Mary will not vote Democrat: this is the update !(¬d), whose result is the updated model

t   <   s : r

Learning Uncertain Information: Upgrades. If the agent only learns some uncertain information about the answer to a question Q, then what is actually learnt is a plausibility relation on a subset A ⊆ {A_1, ..., A_n} of the set of all possible answers. Intuitively, the agent learns that the correct answer belongs to A, and that some answers are more plausible than others. We encode this as a plausibility frame (A, ≤), called a belief upgrade, whose "worlds" A ∈ A are mutually disjoint propositions. This is a special case of event models [4, 3, 2, 14, 7, 6] (in which every two events have mutually disjoint preconditions). A belief upgrade α = (A, ≤) is executable on a model S if the disjunction of all its answers is true (S |= ⋁ A). As an operation on (pointed) models, the upgrade α = (A, ≤) is formally defined to be a (partial) function α taking as input any model S = (S, ≤, ‖·‖, s_0) satisfying ⋁ A, and returning a new model α(S) = (S′, ≤′, ‖·‖′, s_0′), given by:

S′ = (⋁ A)_S;
s ≤′ t iff either s ∈ (A_i)_S, t ∈ (A_j)_S for some answers such that A_i < A_j, or else s ≤ t and s, t ∈ (A_i)_S for the same answer A_i;
‖p‖′ = ‖p‖ ∩ S′, for all atomic p;
s_0′ = s_0.

So this operation deletes all the worlds not satisfying any of the answers in the list A, and promotes the worlds satisfying more plausible answers, making them more plausible than the worlds satisfying less plausible answers; for the rest, everything else stays the same.
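The following Python sketch (ours, continuing the rank-based encoding used earlier) implements a belief upgrade acting on a model in exactly this way, with the update !A recovered as the one-answer case; the answers are taken to be strictly ordered, which is the "standard form" introduced below. The representation choices are assumptions of the sketch, not of the paper.

def upgrade(model, answers):
    """Apply an upgrade: answers is a list of pairwise disjoint sets of states,
    most plausible answer first. Model = (rank, val, actual)."""
    rank, val, actual = model
    keep = set().union(*answers) & set(rank)           # delete worlds in no answer
    assert actual in keep, "not executable: the disjunction of the answers is false"
    cell = {s: i for i, ans in enumerate(answers) for s in ans if s in keep}
    # New order: compare by answer first (anti-lexicographic refinement), and keep
    # the old order between states inside the same answer cell.
    keys = sorted({(cell[s], rank[s]) for s in keep})
    new_rank = {s: keys.index((cell[s], rank[s])) for s in keep}
    new_val = {p: P & keep for p, P in val.items()}    # valuation merely restricted
    return new_rank, new_val, actual                   # the actual world is unchanged

def update(model, A):
    """Hard learning !A: the one-answer upgrade (A)."""
    return upgrade(model, [A])

# Example 2: Charles learns for sure that Mary will not vote Democrat, i.e. !(not-d).
example1 = ({'w': 0, 't': 1, 's': 2}, {'r': {'s'}, 'd': {'w'}}, 's')
rank2, _, _ = update(example1, {'s', 't'})
assert rank2 == {'t': 0, 's': 1}                       # resulting order: t < s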

It is easy to see that the model α(S) obtained by applying a belief upgrade α = (A, ≤) to an initial model S is precisely the anti-lexicographic product update S ⊗ α of the state model S and the event model (A, ≤), as prescribed by the Action Priority Rule in [7, 6].

Important Special Case: Standard Upgrades. An upgrade (A, ≤) is standard if the preorder ≤ is actually an order; in other words, if the plausibility relation between any two distinct answers A_i ≠ A_j in A is strict (i.e. A_i ≇ A_j).

Lemma 1. Every belief upgrade is equivalent to a standard upgrade: for every upgrade α there is a standard upgrade α′ s.t. α(S) = α′(S) for all models S.

Standard Form. So we can assume (and from now on we will assume) that all our upgrades are standard. This allows us to denote them using a special notation, called standard form: (A_1, ..., A_n) represents the upgrade having {A_1, ..., A_n} as its set of answers and in which the order is A_1 < A_2 < ... < A_n.

Standard Upgrade Induced by a Propositional Sequence. Any sequence P_1, ..., P_n of propositions induces a standard upgrade [P_1, ..., P_n], given by

[P_1, ..., P_n] := (P_1, ¬P_1 ∧ P_2, ..., (⋀_{1 ≤ i ≤ n−1} ¬P_i) ∧ P_n).

Special Cases: Updates, Radical Upgrades, Conservative Upgrades.

(1) Updates: !P is a special case of upgrade: !P = [P] = (P).

(2) Radical Upgrades: An operation ⇑P called radical upgrade (or "lexicographic upgrade") was introduced by various authors [23], [24], [9], as one of the natural ways to give a dynamic semantics to the AGM belief revision operator [1]. This operation can be described as promoting all the P-worlds so that they become "better" (more plausible) than all ¬P-worlds, while keeping everything else the same: the valuation, the actual world and the relative ordering between worlds within either of the two zones (P and ¬P) stay the same. Radical upgrade is a special case of upgrade: ⇑P = [P, ⊤] = [P, ¬P] = (P, ¬P).

(3) Conservative Upgrades: Following [12, 24, 9], consider another operation ↑P, called conservative upgrade. This corresponds to promoting only the "best" (most plausible) P-worlds, so that they become the most plausible overall, while keeping everything else the same. This operation seems a more appropriate semantics for AGM revision, since it performs the minimal model-revision that is forced by believing P. It is also a special case of our "upgrades": ↑P = ⇑(best P) = [best P, ⊤] = (best P, ¬ best P).

Correctness. A standard upgrade (A_1, ..., A_n) is said to be correct with respect to a pointed model S if its most plausible answer is correct, i.e. if A_1 is true in the actual state: S |= A_1. A correct upgrade induces a true belief with respect to its underlying question: the agent comes to believe the correct answer.
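The induced standard upgrade [P_1, ..., P_n] is just set-difference bookkeeping on the sequence of (possibly overlapping) propositions; the short Python sketch below (ours, in the same set-based encoding as the earlier sketches) makes this explicit and checks the identities !P = [P] = (P) and ⇑P = [P, ⊤] = (P, ¬P) on the state space of Example 1.

def induced_standard_upgrade(props):
    """From propositions P_1, ..., P_n (sets of states) to the standard upgrade
    [P_1, ..., P_n] = (P_1, ~P_1 & P_2, ..., ~P_1 & ... & ~P_{n-1} & P_n)."""
    answers, seen = [], set()
    for P in props:
        answers.append(set(P) - seen)   # states satisfying P but none of the earlier P_i
        seen |= set(P)
    return answers                      # pairwise disjoint answers, most plausible first

S = {'s', 't', 'w'}                     # the state space of Example 1
not_d = {'s', 't'}
assert induced_standard_upgrade([not_d]) == [not_d]             # update !P = [P] = (P)
assert induced_standard_upgrade([not_d, S]) == [not_d, {'w'}]   # radical: [P, T] = (P, ~P)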

Correctness and Truthfulness. An update !P, a radical upgrade ⇑P or a conservative upgrade ↑P is said to be truthful in a pointed model S if S |= P. It is easy to see that an update !P, seen as an upgrade (P), is correct (with respect to S) iff it is truthful. Similarly, a radical upgrade ⇑P, seen as an upgrade (P, ¬P), is correct iff it is truthful. However, for conservative upgrades, the two notions differ: seen as an upgrade (best P, ¬ best P), ↑P is correct iff S |= best P (which is not equivalent to S |= P). As a result, correctness of a conservative upgrade implies truthfulness, but the converse is false. So a truthful conservative upgrade can still be "deceiving" (i.e. incorrect)!

Examples. In Example 1, suppose that a trusted informer tells Charles that Mary will not vote Democrat. Charles is aware that the informer is not infallible: so this is not an update! Nevertheless, if Charles has a strong trust in the informer, then this learning event is a radical upgrade ⇑(¬d), whose result is:

t   <   s : r   <   w : d

Now, Charles believes that Mary will not vote at all (and so that she won't vote Democrat)! But if he later learned that this is not the case, he'd still keep his new belief that she won't vote Democrat (so he'd conclude she votes Republican).

Contrast this situation with the case in which Charles hears a rumor that Mary will not vote Democrat. Charles believes the rumor, but only barely: if he later needs to revise his beliefs, he'd immediately give up his belief in the rumor. This is a conservative upgrade ↑¬d of the model in Example 1, resulting in:

t   <   w : d   <   s : r

Now, Charles believes Mary will not vote, but if he later learned this was not the case, he'd dismiss the rumor and revert to believing that Mary votes Democrat.

Operations on Upgrades: Dynamic Modalities and Sequential Composition. Given an upgrade α, we can define a dynamic upgrade modality [α], associating to any doxastic proposition P another proposition [α]P, given by ([α]P)_S := P_{α(S)}. Since upgrades are functions, we can compose them functionally α′ ∘ α or relationally α; α′ (= α′ ∘ α). The resulting function is again an upgrade:

Proposition 2. Upgrades are closed under composition.

The Logic of Upgrades and Conditional Beliefs. We can use the standard form to introduce a nice syntax for the logic of upgrades and conditional beliefs:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | B^ϕ ϕ | [[ϕ, ..., ϕ]]ϕ

Here [[ϕ_1, ..., ϕ_n]] denotes the dynamic modality for the standard upgrade [ϕ_1, ..., ϕ_n]. This logic can be embedded in the logic in [7, 6]. The semantics is defined compositionally, in the obvious way: to each sentence ϕ we associate a doxastic proposition ‖ϕ‖, by induction on the structure of ϕ, e.g. ‖B^ϕ ψ‖ = B^{‖ϕ‖} ‖ψ‖, etc. The operators K, [!ϕ] and [⇑ϕ] can be defined in this logic. There exists a complete axiomatization (to be presented in an extended version of this paper).
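The two Examples, and the gap between truthfulness and correctness for conservative upgrades, can be checked mechanically. The Python sketch below (ours, not the authors' code) repeats the rank-based helpers so that it runs on its own.

def best(rank, P):
    P = [s for s in rank if s in P]
    return {s for s in P if rank[s] == min(rank[t] for t in P)} if P else set()

def upgrade(model, answers):
    rank, val, actual = model
    keep = set().union(*answers) & set(rank)
    cell = {s: i for i, ans in enumerate(answers) for s in ans if s in keep}
    keys = sorted({(cell[s], rank[s]) for s in keep})
    return ({s: keys.index((cell[s], rank[s])) for s in keep},
            {p: P & keep for p, P in val.items()}, actual)

def radical(model, P):        # the standard upgrade (P, ~P)
    return upgrade(model, [P & set(model[0]), set(model[0]) - P])

def conservative(model, P):   # the standard upgrade (best P, ~best P)
    b = best(model[0], P)
    return upgrade(model, [b, set(model[0]) - b])

example1 = ({'w': 0, 't': 1, 's': 2}, {'r': {'s'}, 'd': {'w'}}, 's')   # w < t < s, actual s
not_d = {'s', 't'}

# The informer (strong trust): radical upgrade with not-d yields t < s < w.
assert radical(example1, not_d)[0] == {'t': 0, 's': 1, 'w': 2}
# The rumor (bare trust): conservative upgrade with not-d yields t < w < s.
assert conservative(example1, not_d)[0] == {'t': 0, 'w': 1, 's': 2}

# Truthfulness vs correctness: not-d is true at the actual state s, so both upgrades
# are truthful; but the conservative one is correct only if s lies in best(not-d) = {t},
# which it does not: a truthful yet "deceiving" (incorrect) upgrade.
assert 's' in not_d
assert best(example1[0], not_d) == {'t'}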

3 Iterated Upgrades

Since in this paper we are concerned with the problem of long-term learning via iterated belief revision, we need to look at sequences of upgrades. We are particularly interested in learning true information, and so in iterating correct upgrades. Our main problem is whether or not the iterated revision process induced by correct upgrades converges to a stable set of (conditional) beliefs.

An upgrade stream α is an infinite sequence of upgrades (α_n)_{n∈N}. An upgrade stream is definable in a logic L if for every n all the answers of α_n are propositions that are definable in L. A repeated upgrade is an upgrade stream of the form (α, α, ...), i.e. one in which α_n = α for all n ∈ N. Any upgrade stream induces a function mapping every model S into a (possibly infinite) sequence α(S) = (S_n) of models, defined inductively by: S_0 = S, and S_{n+1} = α_n(S_n), if α_n is executable on S_n. The upgrade stream α is said to be executable on a pointed model S if every α_n is executable on S_n (for all n ∈ N). Similarly, α is said to be correct with respect to S if α_n is correct with respect to S_n, for every n ∈ N. If an upgrade stream is an update stream (i.e. one consisting only of updates), or a radical upgrade stream (consisting only of radical upgrades), or a conservative upgrade stream, then we call it truthful if every α_n is truthful with respect to S_n.

We say that an upgrade stream α stabilizes a (pointed) model S if the process of model-changing induced by α reaches a fixed point; i.e. there exists some n ∈ N such that S_n = S_m for all m > n. We say α stabilizes the agent's (simple, non-conditional) beliefs on the model S if the process of belief-changing induced by α on S reaches a fixed point; i.e. if there exists some n ∈ N such that S_n |= BP iff S_m |= BP, for all m > n and all doxastic propositions P. Equivalently, iff there exists some n ∈ N such that best_{S_n} = best_{S_m} for all m > n. Similarly, we say α stabilizes the agent's conditional beliefs on the model S if the process of conditional-belief-changing induced by α on S reaches a fixed point; i.e. if there exists n ∈ N such that S_n |= B^R P iff S_m |= B^R P, for all m > n and all doxastic propositions P, R. Equivalently, iff there exists n ∈ N such that (best P)_{S_n} = (best P)_{S_m} for all m > n and all doxastic propositions P. Finally, α stabilizes the agent's knowledge on the model S if the knowledge-changing process induced by α on S reaches a fixed point; i.e. if there exists n ∈ N such that S_n |= KP iff S_m |= KP for all m > n and all propositions P. Equivalently, iff there exists n ∈ N such that the state sets S_n and S_m are equal for all m > n.

Lemma 3. An upgrade stream stabilizes a pointed model iff it stabilizes the agent's conditional beliefs on that model.

Lemma 4. If an upgrade stream stabilizes the agent's conditional beliefs then it also stabilizes her knowledge and her (simple) beliefs.

Proposition 5. Every upgrade stream stabilizes the agent's knowledge.

Corollary 6 [8]. Every executable update stream stabilizes the model (and thus it stabilizes the agent's knowledge, beliefs and conditional beliefs).
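On finite models these stabilization notions can be tested directly, by iterating a stream and comparing state sets, best-sets, and whole models along the way. The Python sketch below (ours, same encoding as before) does this for a repeated truthful update, illustrating Corollary 6 in miniature; the helper names and the representation of a stream as a function from stages and models to answer lists are assumptions of the sketch.

def best(rank, P):
    P = [s for s in rank if s in P]
    return {s for s in P if rank[s] == min(rank[t] for t in P)} if P else set()

def upgrade(model, answers):
    rank, val, actual = model
    keep = set().union(*answers) & set(rank)
    cell = {s: i for i, ans in enumerate(answers) for s in ans if s in keep}
    keys = sorted({(cell[s], rank[s]) for s in keep})
    return ({s: keys.index((cell[s], rank[s])) for s in keep},
            {p: P & keep for p, P in val.items()}, actual)

def iterate(model, stream, n):
    """Models S_0, ..., S_n obtained by applying the first n upgrades of the stream;
    stream maps (stage, current model) to a list of answers, most plausible first."""
    models = [model]
    for k in range(n):
        models.append(upgrade(models[-1], stream(k, models[-1])))
    return models

# Knowledge is stable from stage n on once the state sets stop changing (Proposition 5),
# simple beliefs once the best-sets stop changing, and the whole model (hence the
# conditional beliefs, by Lemma 3) once rank and valuation stop changing.
def knowledge_stable(ms, n):
    return all(set(m[0]) == set(ms[n][0]) for m in ms[n:])

def beliefs_stable(ms, n):
    return all(best(m[0], set(m[0])) == best(ms[n][0], set(ms[n][0])) for m in ms[n:])

def model_stable(ms, n):
    return all(m == ms[n] for m in ms[n:])

# Corollary 6 in miniature: repeatedly updating Example 1 with the same true answer not-d.
example1 = ({'w': 0, 't': 1, 's': 2}, {'r': {'s'}, 'd': {'w'}}, 's')
ms = iterate(example1, lambda k, m: [{'s', 't'}], 5)
assert model_stable(ms, 1) and knowledge_stable(ms, 1) and beliefs_stable(ms, 1)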

The analogue of Corollary 6 is not true for arbitrary upgrade streams, not even for correct upgrade streams. It is not even true for repeated correct upgrades:

Counterexample. In Example 1, suppose that the trusted informer tells Charles the following true statement A: "If you'd truthfully learn that Mary won't vote Republican, then your resulting belief about whether or not she votes Democrat would be wrong." So A is the sentence r ∨ (d ∧ B^{¬r} ¬d) ∨ (¬d ∧ B^{¬r} d). This radical upgrade ⇑A is truthful (since A is true in s), and so correct, yielding

t   <   s : r   <   w : d

In this model, A is true in s (and in w), so another correct upgrade ⇑A yields:

s : r   <   w : d   <   t

Yet another correct upgrade with the same proposition produces

s : r   <   t   <   w : d

then another correct upgrade ⇑A gets us back to the previous model:

s : r   <   w : d   <   t

Clearly, from now on the last two models will keep reappearing, in an endless cycle: hence in this example, conditional beliefs never stabilize! But note that the simple beliefs are the same in the last two models: so the set of simple (non-conditional) beliefs stays the same from now on. This is not an accident, but a symptom of a more general convergence phenomenon:

Theorem 7. Every correct upgrade stream stabilizes the agent's beliefs.

This is the main result of this paper (proved in the Appendix), and it has a number of important consequences.

Corollary 8. Every correct repeated upgrade α, α, ..., α, ... whose answers are definable in doxastic-epistemic logic (i.e. the language of simple belief and knowledge operators) stabilizes every model, and thus stabilizes the conditional beliefs.

Corollary 9. Every truthful radical upgrade stream (⇑P_i)_{i∈N} stabilizes the agent's beliefs.

Corollary 10. Every repeated stream of truthful radical upgrades ⇑ϕ, ..., ⇑ϕ, ... of a sentence definable in doxastic-epistemic logic stabilizes every model (with respect to which it is truthful), and thus stabilizes the agent's conditional beliefs.
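Before turning to the conservative case, the radical-upgrade cycle above can be replayed mechanically. In the Python sketch below (ours, same encoding as before), the doxastic sentence A is re-evaluated in each model before the next radical upgrade ⇑A is applied; the asserts record the orders displayed above, the eventual two-model cycle, and the fact that the simple beliefs (the best-set) nevertheless stay fixed, as Theorem 7 predicts.

def best(rank, P):
    P = [s for s in rank if s in P]
    return {s for s in P if rank[s] == min(rank[t] for t in P)} if P else set()

def upgrade(model, answers):
    rank, val, actual = model
    keep = set().union(*answers) & set(rank)
    cell = {s: i for i, ans in enumerate(answers) for s in ans if s in keep}
    keys = sorted({(cell[s], rank[s]) for s in keep})
    return ({s: keys.index((cell[s], rank[s])) for s in keep},
            {p: P & keep for p, P in val.items()}, actual)

def radical(model, P):
    return upgrade(model, [P & set(model[0]), set(model[0]) - P])

def A(model):
    """States where  r or (d and B^{not-r} not-d) or (not-d and B^{not-r} d)  holds."""
    rank, val, _ = model
    S = set(rank)
    r, d = val['r'] & S, val['d'] & S
    b = best(rank, S - r)                        # most plausible not-r states
    believes_not_d_given_not_r = b <= S - d      # B^{not-r} not-d (a global proposition)
    believes_d_given_not_r = b <= d              # B^{not-r} d
    return (r | (d if believes_not_d_given_not_r else set())
              | ((S - d) if believes_d_given_not_r else set()))

model = ({'w': 0, 't': 1, 's': 2}, {'r': {'s'}, 'd': {'w'}}, 's')   # Example 1: w < t < s
orders, bests = [], []
for _ in range(6):
    assert model[2] in A(model)                  # each upgrade is truthful, hence correct
    model = radical(model, A(model))
    orders.append(model[0])
    bests.append(best(model[0], set(model[0])))

assert orders[0] == {'t': 0, 's': 1, 'w': 2}     # t < s < w
assert orders[1] == {'s': 0, 'w': 1, 't': 2}     # s < w < t
assert orders[2] == {'s': 0, 't': 1, 'w': 2}     # s < t < w
assert orders[3] == orders[1] and orders[4] == orders[2]   # the last two models cycle forever
assert all(b == {'s'} for b in bests[1:])        # ... yet the simple beliefs are stable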

The (analogues of the) last two Corollaries fail for conservative upgrades:

Counterexample. In the situation S from Example 1, suppose Charles hears a rumor B saying that: "Either Mary will vote Republican or else your beliefs about whether or not she votes Democrat are wrong." So B is the sentence r ∨ (d ∧ B¬d) ∨ (¬d ∧ Bd). The conservative upgrade ↑B is truthful: s ∈ B_S. (But this is not a correct upgrade: s ∉ (best B)_S = {t}.) The result of the upgrade is:

t   <   w : d   <   s : r

Again, the sentence B is true in the actual state s (as well as in w), so the rumor B can again be "truthfully" spread, resulting in... the original model once again:

w : d   <   t   <   s : r

So from now on, the two models (which support opposite beliefs!) will keep reappearing, in an endless cycle: the agent's beliefs will never stabilize!

4 Conclusions

This paper is an investigation of the long-term behavior of iterated belief revision with higher-level doxastic information. We propose a general notion of correct belief upgrade, based on the idea that non-deceiving belief revision is induced by learning (partial, but true information about) answers to questions. The surprising conclusion of our investigation is that iterated revision with doxastic information is highly non-trivial, even in the single-agent case. More concretely, the revision is not guaranteed to reach a fixed point, even when indefinitely repeating the same correct upgrade: neither the models, nor the conditional beliefs are necessarily stabilized. But we also have some important stabilization results: both knowledge and beliefs are eventually stabilized by iterated correct upgrades; moreover, both the models and the conditional beliefs are stabilized by repeated correct upgrades with non-conditional information (expressible in doxastic-epistemic logic). These positive results have a wide scope, being applicable in particular to truthful radical upgrades (as well as to "updates"). However, their range of applicability is limited by the existence of truthful-but-deceiving (incorrect) upgrades: in particular, the repetition of the conservative upgrade in our last counterexample leads to infinite cycles of non-stabilizing beliefs.

Acknowledgments. The authors give special thanks to J. van Benthem for stimulating discussions and seminal ideas. We thank D. Makinson for his insightful commentary on the second author's LSE presentation of preliminary work leading to this paper. We thank an anonymous WOLLIC referee for useful references. The research of the first author was partially supported by the Netherlands Organization for Scientific Research, grant number B 62-635, which is herewith gratefully acknowledged. The second author acknowledges the support by the University of Groningen via a Rosalind Franklin research position.

References

[1] C.E. Alchourrón, P. Gärdenfors, and D. Makinson. On the logic of theory change: partial meet contraction and revision functions. JSL, 50:510-530, 1985.
[2] G. Aucher. A combined system for update logic and belief revision. Master's thesis, ILLC, University of Amsterdam, Amsterdam, the Netherlands, 2003.
[3] A. Baltag and L.S. Moss. Logics for epistemic programs. Synthese, 139:165-224, 2004. Knowledge, Rationality & Action 1-60.

[4] A. Baltag, L.S. Moss, and S. Solecki. The logic of common knowledge, public announcements, and private suspicions. In I. Gilboa, editor, Proc. of TARK 98, pages 43-56, 1998.
[5] A. Baltag and S. Smets. Conditional doxastic models: a qualitative approach to dynamic belief revision. ENTCS, 165:5-21, 2006.
[6] A. Baltag and S. Smets. The logic of conditional doxastic actions. Texts in Logic and Games, 4:9-32, 2008.
[7] A. Baltag and S. Smets. A qualitative theory of dynamic interactive belief revision. Texts in Logic and Games, 3:9-58, 2008.
[8] J.F.A.K. van Benthem. One is a lonely number. In Z. Chatzidakis, P. Koepke, and W. Pohlers, editors, Logic Colloquium 2002, pages 96-129. ASL and A.K. Peters, Wellesley MA, 2006.
[9] J.F.A.K. van Benthem. Dynamic logic of belief revision. JANCL, 17(2):129-155, 2007.
[10] O. Board. Dynamic interactive epistemology. Games and Economic Behaviour, 49:49-80, 2002.
[11] R. Booth and T. Meyer. Admissible and restrained revision. Journal of Artificial Intelligence Research, 26:127-151, 2006.
[12] C. Boutilier. Iterated revision and minimal change of conditional beliefs. JPL, 25(3):262-305, 1996.
[13] A. Darwiche and J. Pearl. On the logic of iterated belief revision. Artificial Intelligence, 89:1-29, 1997.
[14] H.P. van Ditmarsch. Prolegomena to dynamic logic for belief revision. Synthese, 147:229-275, 2005.
[15] H.P. van Ditmarsch, W. van der Hoek, and B.P. Kooi. Dynamic Epistemic Logic. Synthese Library 337. Springer, 2007.
[16] R. Fagin, J.Y. Halpern, Y. Moses, and M.Y. Vardi. Reasoning about Knowledge. MIT Press, Cambridge MA, 1995.
[17] P. Grice. Logic and conversation. In Studies in the Ways of Words. Harvard University Press, Cambridge MA, 1989.
[18] J. Groenendijk and M. Stokhof. Questions. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language, pages 1055-1124. Elsevier, 1997.
[19] A. Grove. Two modellings for theory change. JPL, 17:157-170, 1988.
[20] J.Y. Halpern. Reasoning about Uncertainty. MIT Press, Cambridge MA, 2003.
[21] H. Katsuno and A. Mendelzon. On the difference between updating a knowledge base and revising it. In Cambridge Tracts in Theoretical Computer Science, pages 183-203, 1992.
[22] G.E. Moore. A reply to my critics. In P.A. Schilpp, editor, The Philosophy of G.E. Moore, volume 4 of The Library of Living Philosophers, pages 535-677. Northwestern University, Evanston IL, 1942.
[23] A.C. Nayak. Iterated belief change based on epistemic entrenchment. Erkenntnis, 41:353-390, 1994.
[24] H. Rott. Conditionals and theory change: revisions, expansions, and additions. Synthese, 81:91-113, 1989.
[25] W. Spohn. Ordinal conditional functions: a dynamic theory of epistemic states. In W.L. Harper and B. Skyrms, editors, Causation in Decision, Belief Change, and Statistics, volume II, pages 105-134, 1988.
[26] F. Veltman. Defaults in update semantics. JPL, 25:221-261, 1996.

Appendix: Proofs

Proof of Lemma 1: We change the question A to a different question A′, consisting of all the disjunctions of equi-plausible answers from A. We change the answers accordingly, taking the order induced on A′: ⋁_{i∈I} A_i < ⋁_{j∈J} A_j iff A_i < A_j for some i ∈ I, j ∈ J.

Proof of Prop. 2: Given two standard upgrades α = (A_1, ..., A_n) and α′ = (B_1, ..., B_m), the composition α; α′ is the upgrade having {A_i ∧ [A_i]B_j : i = 1, ..., n, j = 1, ..., m} as set of answers, endowed with the anti-lexicographic order: (A_i ∧ [A_i]B_j) < (A_k ∧ [A_k]B_l) iff either j < l, or j = l and i < k.

Proof of Lemma 3: One direction is obvious: if the model stabilizes, then so do conditional beliefs. For the other direction: suppose conditional beliefs stabilize at stage n. Then we have S_n = S_m (else, if t ∈ S_n \ S_m, then S_m |= B^{{t}} ⊥ but S_n ⊭ B^{{t}} ⊥, contradicting the stabilization of conditional beliefs at n). For s, t ∈ S_n = S_m, we have the equivalences: s < t in S_n iff S_n |= B^{{s,t}} ¬{t} iff S_m |= B^{{s,t}} ¬{t} iff s < t in S_m.

Proof of Lemma 4: The conclusion follows from the observation that knowledge and simple belief are definable in terms of conditional beliefs: KP = B^{¬P} P, BP = B^⊤ P.

Proof of Prop. 5: α(S) ⊆ S for every upgrade (A, ≤) and every model S. So S_0 ⊇ S_1 ⊇ ... ⊇ S_n ⊇ ... is an infinite descending chain of finite sets, hence it must stabilize: there exists n such that S_n = S_m for all m > n.

Proof of Cor. 6: An update changes only the set of possible states. By Prop. 5, the set of states stabilizes at some finite stage, so the model itself stabilizes at that stage.

To prove Theorem 7, we first fix the setting and prove some preliminary results. Let α = (α_n)_{n∈N} be an upgrade stream in standard form α_n = (A^1_n, ..., A^{m_n}_n), and let A(n) := A^1_n be the best (most plausible) answer of α_n. Let S = (S, ≤, ‖·‖, s_0) be a model, and let (S_n)_{n∈N} be the sequence of pointed models S_n = (S_n, ≤_n, ‖·‖, s_0) defined by applying in turn each of the upgrades in α: S_0 = S, and S_{n+1} = α_n(S_n). In the following we assume α is correct with respect to S, i.e. s_0 ∈ A(n)_{S_n} for all n ∈ N.

Lemma 11. For all n ∈ N, we have s <_n t for all s ∈ ⋂_{i<n} A(i)_{S_i} and all t ∈ S_n \ (⋂_{i<n} A(i)_{S_i}).

Proof. This is by induction on n: for n = 0, it is trivially true (since S_0 \ ⋂_{i<0} A(i)_{S_i} = ∅). For the induction step, we assume it true for n. After applying α_n to S_n, we have (by the definition of an upgrade) that s <_{n+1} w for all s ∈ A(n)_{S_n} and all w ∈ S_{n+1} \ A(n)_{S_n} (since all A(n)-worlds get "promoted"), and also that

s <_{n+1} t for all s ∈ A(n)_{S_n} ∩ (⋂_{i<n} A(i)_{S_i}) and all t ∈ A(n)_{S_n} \ (⋂_{i<n} A(i)_{S_i})

(because of the induction assumption together with the fact that inside the A(n)_{S_n}-zone the old order ≤_n is preserved by applying α_n). Putting these two together, and using the transitivity of <_{n+1}, we obtain the desired conclusion.

Lemma 12. For all n ∈ N, we have best S_n ⊆ ⋂_{i<n} A(i)_{S_i}.

Proof. Suppose towards a contradiction that, for some n, there is some state t ∈ best S_n such that t ∉ ⋂_{i<n} A(i)_{S_i}. By the correctness assumption, we have s_0 ∈ A(i)_{S_i} for all i, so s_0 ∈ ⋂_{i<n} A(i)_{S_i}. By Lemma 11, s_0 <_n t, which contradicts t ∈ best S_n.

Lemma 13. For all n ∈ N, we have best S_n = Min_{≤_n} (⋂_{i<n} A(i)_{S_i}).

Proof. This follows immediately from the previous Lemma, together with the definitions of best S_n and of Min_{≤_n} P.

Lemma 14. There exists a number n_0 s.t. ⋂_{i<n_0} A(i)_{S_i} = ⋂_{i<m} A(i)_{S_i} for all m ≥ n_0.

Proof. The sequence S = S_0 ⊇ A(0)_{S_0} ⊇ A(0)_{S_0} ∩ A(1)_{S_1} ⊇ ... ⊇ ⋂_{i<n} A(i)_{S_i} ⊇ ... is an infinite descending sequence of finite sets, so it must stabilize at some stage n_0.

Proof of Theorem 7: By definition of upgrades (and by A(m) being the most plausible answer of α_m), we know that for all m, the order inside the A(m)_{S_m}-zone is left the same by α_m. Let now n_0 be as in Lemma 14. So we must have ⋂_{i<n_0} A(i)_{S_i} ⊆ A(m)_{S_m}, for all m ≥ n_0. Hence, the order inside ⋂_{i<n_0} A(i)_{S_i} is left the same by all future upgrades α_m, with m ≥ n_0. As a consequence, we have:

Min_{≤_{n_0}} (⋂_{i<n_0} A(i)_{S_i}) = Min_{≤_m} (⋂_{i<n_0} A(i)_{S_i}) for all m ≥ n_0.

This, together with the previous two Lemmas, gives us that:

best S_{n_0} = Min_{≤_{n_0}} (⋂_{i<n_0} A(i)_{S_i}) = Min_{≤_m} (⋂_{i<n_0} A(i)_{S_i}) = Min_{≤_m} (⋂_{i<m} A(i)_{S_i}) = best S_m

for all m ≥ n_0. So the sequence of most plausible states stabilizes at n_0.

Proof of Corollary 8: Suppose we have a repeated upgrade α, α, ..., where α = [ϕ_1, ..., ϕ_k] is such that all sentences ϕ_i belong to doxastic-epistemic logic. Putting together Proposition 5 and Theorem 7, we know that both the knowledge (the set of all states) and the beliefs (the set of most plausible states) stabilize: there exists some n_0 such that S_{n_0} = S_m and best S_{n_0} = best S_m for all m > n_0. We also know that the valuation is stable. Using these (together with the full introspection of knowledge and beliefs), we can check (by induction on ϕ) that, for every (single-agent) doxastic-epistemic sentence ϕ, we have ‖ϕ‖_{S_{n_0}} = ‖ϕ‖_{S_m} for all m > n_0. (Here, ‖ϕ‖_{S_n} is the interpretation of ϕ in the model S_n, i.e. the set {s ∈ S_n : s |=_{S_n} ϕ}.) So the interpretations of all the answers ϕ_i stabilize at stage n_0. Hence (by the definition of the model transformation induced by α), applying α after this stage will not produce any changes.

Proofs of Corollaries 9 and 10: These follow immediately from Theorem 7 and Corollary 8, together with the fact that a radical upgrade is correct iff it is truthful.