Decision Theory Intro: Preferences and Utility

Horatio Hutchinson

Decision Theory Intro: Preferences and Utility
CPSC 322, Lecture 29
March 22, 2006
Textbook 9.5

Lecture Overview
- Recap
- Decision Theory Intro
- Preferences
- Utility

Markov chain
- A Markov chain is a special sort of belief network: $S_0 \to S_1 \to S_2 \to S_3 \to S_4$
- Thus $P(S_{t+1} \mid S_0, \ldots, S_t) = P(S_{t+1} \mid S_t)$.
- The past is independent of the future given the present.
- A stationary Markov chain is when for all $t > 0$, $t' > 0$, $P(S_{t+1} \mid S_t) = P(S_{t'+1} \mid S_{t'})$.
- We specify $P(S_0)$ and $P(S_{t+1} \mid S_t)$.
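As a rough illustration of how $P(S_0)$ and $P(S_{t+1} \mid S_t)$ together determine the distribution over later states, here is a minimal Python sketch. The two-state chain and all of its numbers are invented for illustration and do not come from the lecture.

```python
def step(belief, transition):
    """One time step: P(S_{t+1}=j) = sum_i P(S_t=i) * P(S_{t+1}=j | S_t=i)."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical stationary chain with two states, 0 and 1 (assumed numbers).
initial = [1.0, 0.0]          # P(S_0)
transition = [[0.9, 0.1],     # P(S_{t+1} | S_t = 0)
              [0.5, 0.5]]     # P(S_{t+1} | S_t = 1)

# Because the chain is stationary, the same table is reused at every step.
belief = initial
for t in range(3):
    belief = step(belief, transition)
```

After the loop, `belief` holds the distribution over $S_3$ for this made-up chain.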

Hidden Markov Model
- A Hidden Markov Model (HMM) starts with a Markov chain, and adds a noisy observation about the state at each time step.
  [Figure: chain $S_0 \to S_1 \to S_2 \to S_3 \to S_4$, with an arc from each state $S_t$ to its observation $O_t$.]
- $P(S_0)$ specifies initial conditions
- $P(S_{t+1} \mid S_t)$ specifies the dynamics
- $P(O_t \mid S_t)$ specifies the sensor model
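The three ingredients of an HMM can be combined into a belief update. The sketch below shows one predict-then-observe step under assumed numbers; the function name `filter_step` and the tables are hypothetical, chosen only to make the roles of the dynamics and the sensor model concrete.

```python
def filter_step(belief, transition, sensor, observation):
    """Advance the belief one time step, then condition on the new observation."""
    n = len(belief)
    # Dynamics: push the current belief through P(S_{t+1} | S_t).
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Sensor model: weight each state by P(O_{t+1} = observation | S_{t+1}).
    weighted = [predicted[j] * sensor[j][observation] for j in range(n)]
    total = sum(weighted)  # zero only if the observation is impossible under the model
    return [w / total for w in weighted]

# Hypothetical two-state model (all numbers are assumptions).
belief = [0.5, 0.5]                      # current belief over the state
transition = [[0.7, 0.3], [0.3, 0.7]]    # P(S_{t+1} | S_t)
sensor = [[0.9, 0.1], [0.2, 0.8]]        # P(O_t | S_t), rows indexed by state

updated = filter_step(belief, transition, sensor, observation=0)
```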

Decisions Under Uncertainty
- In the first part of the course we focused on decision making in domains where the environment was understood with certainty
  - Search/CSPs: single decisions
  - Planning: sequential decisions
- In uncertain domains, we've so far only considered how to represent and update beliefs
- What if an agent has to make decisions in a domain that involves uncertainty?
  - this is likely: one of the main reasons to represent the world probabilistically is to be able to use these beliefs as the basis for making decisions

Decisions Under Uncertainty
An agent's decision will depend on:
1. what actions are available
2. what beliefs the agent has
   - note: this replaces state from the deterministic setting
3. the agent's goals
We've spoken quite a lot about (1) and (2). Today let's consider (3):
- we'll move from all-or-nothing goals to a richer notion: rating how happy the agent is in different situations

Preferences
- Actions result in outcomes
- Agents have preferences over outcomes
- A rational agent will take the action that leads to the outcome which he most prefers
- Sometimes agents don't know the outcomes of the actions, but they still need to compare actions
- Agents have to act (doing nothing is (often) an action).

Preferences Over Outcomes
If $o_1$ and $o_2$ are outcomes:
- $o_1 \succeq o_2$ means $o_1$ is at least as desirable as $o_2$.
  - read this as "the agent weakly prefers $o_1$ to $o_2$"
- $o_1 \sim o_2$ means $o_1 \succeq o_2$ and $o_2 \succeq o_1$.
  - read this as "the agent is indifferent between $o_1$ and $o_2$"
- $o_1 \succ o_2$ means $o_1 \succeq o_2$ and $o_2 \not\succeq o_1$.
  - read this as "the agent strictly prefers $o_1$ to $o_2$"
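Indifference and strict preference are both defined purely in terms of weak preference, which a short Python sketch can make explicit. The outcomes and the `score` table are hypothetical; the scores are only a convenient way to write down some weak preference relation for the example.

```python
# Hypothetical agent: a higher score means a more desirable outcome (assumed values).
score = {"coffee": 2, "tea": 2, "water": 1}

def weakly_prefers(a, b):
    """a is at least as desirable as b."""
    return score[a] >= score[b]

def indifferent(a, b):
    """Weak preference holds in both directions."""
    return weakly_prefers(a, b) and weakly_prefers(b, a)

def strictly_prefers(a, b):
    """Weak preference holds one way but not the other."""
    return weakly_prefers(a, b) and not weakly_prefers(b, a)
```

With these assumed scores, the agent is indifferent between coffee and tea and strictly prefers either to water.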

Lotteries
- An agent may not know the outcomes of his actions, but may instead only have a probability distribution over the outcomes.
- A lottery is a probability distribution over outcomes. It is written $[p_1 : o_1, p_2 : o_2, \ldots, p_k : o_k]$ where the $o_i$ are outcomes and $p_i > 0$ such that $\sum_i p_i = 1$.
- The lottery specifies that outcome $o_i$ occurs with probability $p_i$.
- We will consider lotteries to be outcomes.
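One possible way to represent a lottery in code is as a list of (probability, outcome) pairs, checked against the two conditions in the definition. This is a sketch under that assumed representation; `make_lottery` is a hypothetical helper, not something from the lecture.

```python
import math

def make_lottery(pairs):
    """Validate and return a lottery given as (probability, outcome) pairs."""
    if any(p <= 0 for p, _ in pairs):
        raise ValueError("every probability must be positive")
    if not math.isclose(sum(p for p, _ in pairs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return list(pairs)

# Example with made-up outcomes: a 25% chance of "win", otherwise "lose".
lottery = make_lottery([(0.25, "win"), (0.75, "lose")])
```

Since lotteries are themselves outcomes, an outcome in this representation could in turn be another such list.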

Our Goal
- We want to reason about preferences mathematically
- To do this, we must give some rules that allow us to relate and transform expressions involving the symbols $\succeq$, $\succ$ and $\sim$, as well as lotteries.
- Just as we did with probabilities, we will axiomatize preferences.
- These rules will allow us to derive consequences of preference statements
- In the end, one has to either accept these consequences or reject one of the axioms

Preference Axioms
Completeness: A preference relationship must be defined between every pair of outcomes:
$\forall o_1\ \forall o_2:\ o_1 \succeq o_2$ or $o_2 \succeq o_1$

Preference Axioms
Transitivity: Preferences must be transitive:
if $o_1 \succeq o_2$ and $o_2 \succeq o_3$ then $o_1 \succeq o_3$
- This makes good sense: otherwise $o_1 \succeq o_2$ and $o_2 \succeq o_3$ and $o_3 \succ o_1$.
- An agent should be prepared to pay some amount to swap between an outcome they prefer less and an outcome they prefer more
- Intransitive preferences mean we can construct a "money pump"!
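The money pump argument can be simulated directly. The sketch below assumes an agent with a cycle of strict preferences over three made-up items and a fixed fee per swap; the names `prefers` and `run_money_pump` are hypothetical.

```python
# Hypothetical intransitive strict preferences: a over b, b over c, c over a.
prefers = {("a", "b"), ("b", "c"), ("c", "a")}

def run_money_pump(start, fee, rounds):
    """Repeatedly sell the agent, for a fee, the item it strictly prefers to its own."""
    holding, paid = start, 0.0
    for _ in range(rounds):
        # Find the item the agent strictly prefers to what it currently holds.
        better = next(x for (x, y) in prefers if y == holding)
        holding, paid = better, paid + fee
    return holding, paid

final_item, total_paid = run_money_pump("a", fee=1.0, rounds=3)
```

After three swaps the agent holds the item it started with and has paid three fees, and nothing stops the cycle from repeating.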

Preference Axioms
Monotonicity: An agent prefers a larger chance of getting a better outcome than a smaller chance:
If $o_1 \succ o_2$ and $p > q$ then $[p : o_1, 1-p : o_2] \succ [q : o_1, 1-q : o_2]$

Consequence of axioms
- Suppose $o_1 \succ o_2$ and $o_2 \succ o_3$. Consider whether the agent would prefer
  - $o_2$, or
  - the lottery $[p : o_1, 1-p : o_3]$
  for different values of $p \in [0, 1]$.
- You can plot which one is preferred as a function of $p$:
  [Figure: which of $o_2$ and the lottery is preferred, as a function of $p$.]

Preference Axioms
Continuity: Suppose $o_1 \succ o_2$ and $o_2 \succ o_3$, then there exists a $p \in [0, 1]$ such that
$o_2 \sim [p : o_1, 1-p : o_3]$

Preference Axioms
Decomposability ("no fun in gambling"): An agent is indifferent between lotteries that have the same probabilities and outcomes. This includes lotteries over lotteries. For example:
$[p : o_1, 1-p : [q : o_2, 1-q : o_3]] \sim [p : o_1, (1-p)q : o_2, (1-p)(1-q) : o_3]$
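Decomposability says a lottery over lotteries can be reduced to a single lottery over simple outcomes by multiplying probabilities along each branch. A minimal Python sketch of that reduction follows, assuming lotteries are lists of (probability, outcome) pairs and that simple outcomes are strings; the function `flatten` is a hypothetical helper.

```python
def flatten(lottery):
    """Reduce a lottery over lotteries to a lottery over simple outcomes."""
    flat = {}
    for p, outcome in lottery:
        if isinstance(outcome, list):
            # Nested lottery: multiply probabilities along the branch.
            for q, inner in flatten(outcome):
                flat[inner] = flat.get(inner, 0.0) + p * q
        else:
            flat[outcome] = flat.get(outcome, 0.0) + p
    return [(prob, o) for o, prob in flat.items()]

# The example from the axiom with assumed values p = 0.5 and q = 0.4.
nested = [(0.5, "o1"), (0.5, [(0.4, "o2"), (0.6, "o3")])]
reduced = flatten(nested)
```

For these assumed values, the reduced lottery assigns $p = 0.5$ to $o_1$, $(1-p)q = 0.2$ to $o_2$ and $(1-p)(1-q) = 0.3$ to $o_3$.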

Preference Axioms
Substitutivity: if $o_1 \sim o_2$ then the agent is indifferent between lotteries that only differ by $o_1$ and $o_2$:
$[p : o_1, 1-p : o_3] \sim [p : o_2, 1-p : o_3]$

What we would like
- We would like a measure of preference that can be combined with probabilities, so that
  $\mathrm{value}([p : o_1, 1-p : o_2]) = p \times \mathrm{value}(o_1) + (1-p) \times \mathrm{value}(o_2)$
- Can we use money as this measure of preference?
  - Would you prefer \$1,000,000 or $[0.5 : \$0,\ 0.5 : \$2{,}000{,}000]$?
- Money is not going to work. Let's invent an abstract concept called utility.

Utility as a function of money
[Figure: utility (vertical axis, 0 to 1) as a function of money (horizontal axis, $0 to $2,000,000), with three curves labelled risk averse, risk neutral and risk seeking.]
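To see how the three risk attitudes answer the earlier question about a sure \$1,000,000 versus an even gamble between \$0 and \$2,000,000, here is a minimal Python sketch. The specific curve shapes (square root, linear, square) are assumptions picked to be concave, linear and convex; the lecture does not specify them.

```python
import math

MAX_MONEY = 2_000_000

# Hypothetical utility curves, each scaled so that utility($0) = 0 and
# utility($2,000,000) = 1. The functional forms are assumptions.
curves = {
    "risk averse": lambda x: math.sqrt(x / MAX_MONEY),   # concave
    "risk neutral": lambda x: x / MAX_MONEY,             # linear
    "risk seeking": lambda x: (x / MAX_MONEY) ** 2,      # convex
}

def compare(utility):
    """Compare a sure $1,000,000 with the lottery [0.5 : $0, 0.5 : $2,000,000]."""
    sure = utility(1_000_000)
    gamble = 0.5 * utility(0) + 0.5 * utility(MAX_MONEY)
    if math.isclose(sure, gamble):
        return "indifferent"
    return "sure thing" if sure > gamble else "gamble"

choices = {name: compare(u) for name, u in curves.items()}
```

Under these assumed curves the risk-averse agent takes the sure thing, the risk-neutral agent is indifferent, and the risk-seeking agent takes the gamble, even though both options have the same expected monetary value.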

Theorem
- Is it possible that preferences are too complex and multi-faceted to be represented by single numbers?
- If preferences satisfy the preceding axioms, then preferences can be measured by a function $\mathrm{utility} : \mathit{outcomes} \to [0, 1]$ such that
  1. $o_1 \succeq o_2$ if and only if $\mathrm{utility}(o_1) \geq \mathrm{utility}(o_2)$.
  2. Utilities are linear with probabilities:
     $\mathrm{utility}([p_1 : o_1, p_2 : o_2, \ldots, p_k : o_k]) = \sum_{i=1}^{k} p_i \times \mathrm{utility}(o_i)$
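The linearity property is what lets an agent score a lottery directly. A minimal Python sketch, assuming lotteries are lists of (probability, outcome) pairs and using a made-up utility table:

```python
def expected_utility(lottery, utility):
    """Sum of p_i * utility(o_i) over the (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

# Hypothetical utilities in [0, 1] (assumed values).
utility = {"best": 1.0, "ok": 0.6, "worst": 0.0}

risky = [(0.2, "best"), (0.5, "ok"), (0.3, "worst")]
safe = [(1.0, "ok")]

# By the first property, the lottery with the higher utility is weakly preferred.
preferred = max([risky, safe], key=lambda lot: expected_utility(lot, utility))
```

With these numbers the risky lottery has utility $0.2 \times 1.0 + 0.5 \times 0.6 + 0.3 \times 0.0 = 0.5$, so the agent prefers the safe one, whose utility is 0.6.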

Proof
Part 1: $o_1 \succeq o_2$ if and only if $\mathrm{utility}(o_1) \geq \mathrm{utility}(o_2)$.
- If all outcomes are equally preferred, set $\mathrm{utility}(o_i) = 0$ for all outcomes $o_i$.
- Otherwise, suppose the best outcome is $\mathit{best}$ and the worst outcome is $\mathit{worst}$.
- For any outcome $o_i$, define $\mathrm{utility}(o_i)$ to be the number $u_i$ such that
  $o_i \sim [u_i : \mathit{best}, 1-u_i : \mathit{worst}]$
  This exists by the Continuity property.

Proof (cont.)
Part 1: $o_1 \succeq o_2$ if and only if $\mathrm{utility}(o_1) \geq \mathrm{utility}(o_2)$.
- Suppose $o_1 \succeq o_2$ and $\mathrm{utility}(o_i) = u_i$, then by Substitutivity,
  $[u_1 : \mathit{best}, 1-u_1 : \mathit{worst}] \succeq [u_2 : \mathit{best}, 1-u_2 : \mathit{worst}]$
- Which, by completeness and monotonicity, implies $u_1 \geq u_2$.

Proof (cont.)
Part 2: $\mathrm{utility}([p_1 : o_1, \ldots, p_k : o_k]) = \sum_{i=1}^{k} p_i \times \mathrm{utility}(o_i)$
- Suppose $p = \mathrm{utility}([p_1 : o_1, p_2 : o_2, \ldots, p_k : o_k])$.
- Suppose $\mathrm{utility}(o_i) = u_i$. We know:
  $o_i \sim [u_i : \mathit{best}, 1-u_i : \mathit{worst}]$
- By substitutivity, we can replace each $o_i$ by $[u_i : \mathit{best}, 1-u_i : \mathit{worst}]$, so
  $p = \mathrm{utility}([p_1 : [u_1 : \mathit{best}, 1-u_1 : \mathit{worst}], \ldots, p_k : [u_k : \mathit{best}, 1-u_k : \mathit{worst}]])$

Proof (cont.)
Part 2: $\mathrm{utility}([p_1 : o_1, \ldots, p_k : o_k]) = \sum_{i=1}^{k} p_i \times \mathrm{utility}(o_i)$
- By decomposability, this is equivalent to:
  $p = \mathrm{utility}([p_1 u_1 + \cdots + p_k u_k : \mathit{best},\ p_1 (1-u_1) + \cdots + p_k (1-u_k) : \mathit{worst}])$
- Thus, by definition of utility,
  $p = p_1 u_1 + \cdots + p_k u_k$
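A worked instance may make the two proof steps easier to follow. The numbers below are assumptions chosen for illustration: $k = 2$, $p_1 = 0.3$, $p_2 = 0.7$, $u_1 = 0.5$, $u_2 = 0.9$.

```latex
% Illustrative values (assumed): k = 2, p_1 = 0.3, p_2 = 0.7, u_1 = 0.5, u_2 = 0.9
\begin{align*}
[0.3 : o_1,\ 0.7 : o_2]
  &\sim [0.3 : [0.5 : \mathit{best},\ 0.5 : \mathit{worst}],\
         0.7 : [0.9 : \mathit{best},\ 0.1 : \mathit{worst}]]
  && \text{(substitutivity)} \\
  &\sim [0.3 \times 0.5 + 0.7 \times 0.9 : \mathit{best},\
         0.3 \times 0.5 + 0.7 \times 0.1 : \mathit{worst}]
  && \text{(decomposability)} \\
  &= [0.78 : \mathit{best},\ 0.22 : \mathit{worst}]
\end{align*}
```

By the definition of utility, the lottery's utility is the probability on $\mathit{best}$, here $0.78 = p_1 u_1 + p_2 u_2$.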
