Game playing. Chapter 6.

1 Game playing. Chapter 6

2 Outline
Minimax
α–β pruning
UCT for games

3 Game tree (2-player, deterministic, turns)

4 Minimax
Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play.
E.g., 2-ply game: [figure]

5 Minimax algorithm
function Minimax-Decision(state) returns an action
    inputs: state, current state in game
    return the a in Actions(state) maximizing Min-Value(Result(a, state))
function Max-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for a, s in Successors(state) do v ← Max(v, Min-Value(s))
    return v
function Min-Value(state) returns a utility value
    if Terminal-Test(state) then return Utility(state)
    v ← +∞
    for a, s in Successors(state) do v ← Min(v, Max-Value(s))
    return v
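The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own code: the toy tree `TREE`, its action names, and its leaf utilities are hypothetical, and a state is assumed to be either a dict of successors or a number (a terminal utility for MAX).

```python
import math

# Hypothetical 2-ply toy tree (an assumption, not taken from the slides):
# inner nodes map an action name to a successor, leaves are utilities for MAX.
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}

def terminal_test(state):
    """A state is terminal when it is a bare utility value."""
    return not isinstance(state, dict)

def utility(state):
    return state

def successors(state):
    """Yield (action, successor_state) pairs."""
    return state.items()

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _action, s in successors(state):
        v = max(v, min_value(s))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _action, s in successors(state):
        v = min(v, max_value(s))
    return v

def minimax_decision(state):
    """Return the action whose successor has the highest Min-Value."""
    return max(successors(state), key=lambda pair: min_value(pair[1]))[0]

print(minimax_decision(TREE), max_value(TREE))
```

On this toy tree the three MIN nodes are worth 3, 2 and 2, so MAX picks `a1` with minimax value 3.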

10 Properties of minimax
Complete? Yes, if the tree is finite (chess has specific rules for this).
Optimal? Yes, against an optimal opponent. Otherwise?
Time complexity? $O(b^m)$
Space complexity? $O(bm)$ (depth-first exploration)
For chess, $b \approx 35$, $m \approx 100$ for "reasonable" games ⇒ exact solution completely infeasible.
But do we need to explore every path?
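To get a feel for why an exact solution is infeasible, the size of $b^m$ for the chess estimates above can be computed directly. This is only a back-of-the-envelope check using Python's exact integers; the variable names are illustrative.

```python
branching_factor = 35   # b, chess estimate from the slide
max_depth = 100         # m, chess estimate from the slide

# Exact integer value of b^m, the order of the number of nodes minimax visits.
node_count = branching_factor ** max_depth
digits = len(str(node_count))

print(f"35^100 has {digits} decimal digits")  # 155 digits, i.e. about 10^154
```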

11–15 α–β pruning example [five figure-only slides]

16 Why is it called α–β?
α is the best value (to max) found so far off the current path.
If V is worse than α, max will avoid it ⇒ prune that branch.
Define β similarly for min.

17 The α–β algorithm
function Alpha-Beta-Decision(state) returns an action
    return the a in Actions(state) maximizing Min-Value(Result(a, state), −∞, +∞)
function Max-Value(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for max along the path to state
            β, the value of the best alternative for min along the path to state
    if Terminal-Test(state) then return Utility(state)
    v ← −∞
    for a, s in Successors(state) do
        v ← Max(v, Min-Value(s, α, β))
        if v ≥ β then return v
        α ← Max(α, v)
    return v
function Min-Value(state, α, β) returns a utility value
    same as Max-Value but with roles of α, β reversed
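A minimal Python sketch of α–β search, under stated assumptions: the toy tree `TREE` is hypothetical (not from the slides), a state is either a dict of successors or a terminal utility, and the root loop carries α across sibling moves so that pruning also applies at the first level. The function additionally counts evaluated leaves to make the effect of pruning visible.

```python
import math

# Hypothetical 2-ply toy tree (an assumption, not taken from the slides).
TREE = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}

def alpha_beta_decision(tree):
    """Return (best_action, value, leaves_evaluated) for MAX at the root."""
    leaves = 0

    def is_terminal(state):
        return not isinstance(state, dict)

    def max_value(state, alpha, beta):
        nonlocal leaves
        if is_terminal(state):
            leaves += 1
            return state
        v = -math.inf
        for s in state.values():
            v = max(v, min_value(s, alpha, beta))
            if v >= beta:          # MIN already has a better alternative
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta):
        nonlocal leaves
        if is_terminal(state):
            leaves += 1
            return state
        v = math.inf
        for s in state.values():
            v = min(v, max_value(s, alpha, beta))
            if v <= alpha:         # MAX already has a better alternative
                return v
            beta = min(beta, v)
        return v

    best_action, alpha = None, -math.inf
    for action, s in tree.items():
        v = min_value(s, alpha, math.inf)
        if v > alpha:
            best_action, alpha = action, v
    return best_action, alpha, leaves

print(alpha_beta_decision(TREE))
```

On this toy tree the result is the same move and value as plain minimax (`a1`, value 3), but only 7 of the 9 leaves are evaluated: after seeing the leaf 2 under `a2`, the remaining leaves of that branch are pruned.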

18 Properties of α–β
Pruning does not affect the final result.
Good move ordering improves the effectiveness of pruning.
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).

19 Resource limits
Standard approach:
Use Cutoff-Test instead of Terminal-Test, e.g., depth limit (perhaps add quiescence search).
Use Eval instead of Utility, i.e., an evaluation function that estimates the desirability of a position.
Suppose we have 100 seconds and explore $10^4$ nodes/second ⇒ $10^6$ nodes per move $\approx 35^{8/2}$ ⇒ α–β reaches depth 8 ⇒ pretty good chess program.
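The arithmetic behind the depth-8 estimate can be checked with a few lines of Python. The sketch assumes, as the exponent $8/2$ suggests, that α–β with good move ordering examines roughly $b^{d/2}$ nodes to reach depth $d$; the variable names are illustrative.

```python
import math

seconds_per_move = 100      # from the slide
nodes_per_second = 10 ** 4  # from the slide
branching_factor = 35       # chess estimate used in the slides

node_budget = seconds_per_move * nodes_per_second  # 10^6 nodes per move

# Assumption: about b^(d/2) nodes are needed for depth d, so
# d = 2 * log(budget) / log(b).
reachable_depth = 2 * math.log(node_budget) / math.log(branching_factor)

print(node_budget, branching_factor ** 4, round(reachable_depth, 2))
```

Under that assumption the budget of $10^6$ nodes is close to $35^4 = 1{,}500{,}625$, and the estimated reachable depth comes out just under 8.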

20 Evaluation functions
For chess, typically a linear weighted sum of features:
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
e.g., $w_1 = 9$ with $f_1(s)$ = (number of white queens) − (number of black queens), etc.
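A linear evaluation function of this form might look as follows in Python. Only the queen weight $w_1 = 9$ comes from the slide; the other weights are the conventional material values and are an assumption of this sketch, as are the names `WEIGHTS` and `evaluate` and the piece-count dictionaries.

```python
# Queen weight 9 is from the slide; the remaining weights are assumed
# conventional material values, used here only for illustration.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def evaluate(white_counts, black_counts):
    """Weighted sum of features f_k = (white pieces) - (black pieces)."""
    return sum(
        weight * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )

# Hypothetical position: White has a queen, Black has a rook, pawns are equal.
white = {"queen": 1, "pawn": 6}
black = {"rook": 1, "pawn": 6}
print(evaluate(white, black))  # 9 - 5 = 4
```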

21 Upper Confidence Tree (UCT) for games
Standard backup updates all parents of $v_l$ as
$n(v) \leftarrow n(v) + 1$ (count how often it has been played)
$Q(v) \leftarrow Q(v) + \Delta$ (sum of rewards received)
In games use a negamax backup: while iterating upward, flip the sign of $\Delta$ in each iteration.
Survey of MCTS applications: Browne et al.: A Survey of Monte Carlo Tree Search Methods
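The negamax backup can be sketched in Python as below. The `Node` class and `backup_negamax` are hypothetical names for illustration, and the sketch assumes the rollout reward is given from the perspective of the player associated with the leaf, so that each parent (the opponent's node) receives the negated value.

```python
class Node:
    """Minimal tree node holding only the statistics used by the backup."""

    def __init__(self, parent=None):
        self.parent = parent
        self.n = 0      # visit count n(v)
        self.Q = 0.0    # sum of rewards Q(v)

def backup_negamax(leaf, reward):
    """Walk from the leaf to the root, flipping the reward sign at each level."""
    node = leaf
    while node is not None:
        node.n += 1
        node.Q += reward
        reward = -reward   # the parent belongs to the other player
        node = node.parent

# Hypothetical three-level path: root -> child -> leaf.
root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backup_negamax(leaf, 1.0)
print(root.Q, child.Q, leaf.Q)  # 1.0 -1.0 1.0
```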

22 Brief notes on game theory
(Small) zero-sum games can be represented by a payoff matrix.
$U_{ji}$ denotes the utility of player 1 if she chooses the pure (= deterministic) strategy $i$ and player 2 chooses the pure strategy $j$.
Zero-sum games: $U_{ji} = -U_{ij}$, $U^T = -U$
Finding a minimax optimal mixed strategy $p$ is a Linear Program:
$\max_w w \quad \text{s.t.} \quad Up \ge w, \quad \sum_i p_i = 1, \quad p \ge 0$
Note that $Up \ge w$ implies $\min_j (Up)_j \ge w$.
Gainable payoff of player 1: $\max_p \min_q q^T U p$
Minimax-Theorem: $\max_p \min_q q^T U p = \min_q \max_p q^T U p$
Minimax-Theorem ⇔ optimal $p$ with $w \ge 0$ exists
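As a small sanity check of these quantities, the following Python sketch evaluates $\min_j (Up)_j$ for rock-paper-scissors, a hypothetical example game not taken from the slides. It assumes the index convention above ($U_{ji}$ with player 1 choosing $i$) and a skew-symmetric payoff matrix, $U^T = -U$. It does not solve the linear program; it only evaluates the payoff a given mixed strategy $p$ guarantees.

```python
from fractions import Fraction

# U[j][i]: utility of player 1 when she plays i and player 2 plays j
# (0 = rock, 1 = paper, 2 = scissors). Hypothetical example game.
U = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]

def guaranteed_payoff(U, p):
    """Return min_j (U p)_j, the payoff p secures against any pure reply j."""
    return min(sum(u_ji * p_i for u_ji, p_i in zip(row, p)) for row in U)

uniform = [Fraction(1, 3)] * 3
always_rock = [1, 0, 0]

print(guaranteed_payoff(U, uniform))      # 0
print(guaranteed_payoff(U, always_rock))  # -1
```

The uniform strategy attains $w = 0$, consistent with the statement that an optimal $p$ with $w \ge 0$ exists, whereas a pure strategy such as always playing rock can be exploited down to $-1$.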
