Minimax strategies, alpha beta pruning. Lirong Xia

Minimax strategies, alpha beta pruning Lirong Xia

Reminder ØProject 1 due tonight Makes sure you DO NOT SEE ERROR: Summation of parsed points does not match ØProject 2 due in two weeks 2

How to find good heuristics? ØNo really mechanical way art more than science ØGeneral guideline: relaxing constraints e.g. Pacman can pass through the walls ØMimic what you would do 3

Arc Consistency of a CSP Ø A simple form of propagation makes sure all arcs are consistent: X X X Delete Ø If V loses a value, neighbors of V need to be rechecked! from tail! Ø Arc consistency detects failure earlier than forward checking Ø Can be run as a preprocessor or after each assignment Ø Might be time-consuming 4

Limitations of Arc Consistency ØAfter running arc consistency: Can have one solution left Can have multiple solutions left Can have no solutions left (and not know it) 5

Sum to 2 game Ø Player 1 moves, then player 2, finally player 1 again Ø Move = 0 or 1 Ø Player 1 wins if and only if all moves together sum to 2 Player 1 0 1 Player 2 Player 2 0 1 0 1 Player 1 Player 1 Player 1 Player 1 0 1 0 1 0 1 0 1-1 -1-1 1-1 1 1-1 Player 1 s utility is in the leaves; player 2 s utility is the negative of this

Today s schedule ØAdversarial game ØMinimax search ØAlpha-beta pruning algorithm 7

Adversarial Games Ø Deterministic, zero-sum games: Tic-tac-toe, chess, checkers The MAX player maximizes result The MIN player minimizes result Ø Minimax search: A search tree Players alternate turns Each node has a minimax value: best achievable utility against a rational adversary 8

Computing Minimax Values Ø This is DFS Ø Two recursive functions: max-value maxes the values of successors min-value mins the values of successors Ø Def value (state): If the state is a terminal state: return the state s utility If the agent at the state is MAX: return max-value(state) If the agent at the state is MIN: return min-value(state) Ø Def max-value(state): Initialize max = - For each successor of state: Compute value(successor) Update max accordingly return max Ø Def min-value(state): similar to max-value 9

Minimax Example 3 3 2 2 10

Tic-tac-toe Game Tree 11

Renju 15*15 5 horizontal, vertical, or diagonal in a row win no double-3 or double-4 moves for black otherwise black s winning strategy was computed L. Victor Allis 1994 (PhD thesis) 12

Ø Time complexity? ( m ) Ob Ø Space complexity? Minimax Properties Obm ( ) Ø For chess, Exact solution is completely infeasible b 35, m 100 But, do we need to explore the whole tree? 13

Ø Cannot search to leaves Ø Depth-limited search Resource Limits Instead, search a limited depth of tree Replace terminal utilities with an evaluation function for non-terminal positions Ø Guarantee of optimal play is gone 14

Evaluation Functions Ø Functions which scores non-terminals Ø Ideal function: returns the minimax utility of the position Ø In practice: typically weighted linear sum of features: Ø e.g. Evals s ( ) = w 1 f ( 1 s) + w 2 f ( 2 s) + + w n f ( n s) f ( s ) = (# white queens - # black queens) 1, etc. 15

Minimax with limited depth ØSuppose you are the MAX player ØGiven a depth d and current state ØCompute value(state, d) that reaches depth d at depth d, use a evaluation function to estimate the value if it is non-terminal 16

Improving minimax: pruning 17

Pruning in Minimax Search ØAn ancestor is a MAX node already has an option than my current solution my future solution can only be smaller 18

Alpha-beta pruning Ø Pruning = cutting off parts of the search tree (because you realize you don t need to look at them) When we considered A* we also pruned large parts of the search tree Ø Maintain α = value of the best option for the MAX player along the path so far β = value of the best option for the MIN player along the path so far Initialized to be α = - and β = + Ø Maintain and update α and β for each node α is updated at MAX player s nodes β is updated at MIN player s nodes

Alpha-Beta Pruning Ø General configuration We re computing the MIN-VALUE at n We re looping over n s children n s value estimate is dropping α is the best value that MAX can get at any choice point along the current path If n becomes worse than α, MAX will avoid it, so can stop considering n s other children Define β similarly for MIN α is usually smaller than β Once α >= β, return to the upper layer 20

Alpha-Beta Pruning Example α β is MAX s best alternative here or above is MIN s best alternative here or above 21

Alpha-Beta Pruning Example starting α/ β α = - β = + raising α α = - β = + α = - β = + α = 3 β = + α = 3 β = + lowering β α = - β = + α = - β = 3 α = - β = 3 α = - β = 3 α = 3 β = + α = 3 β = 2 α = 3 β = + α = 3 β = 14 α = 3 β = 5 α = 3 β = 1 raising α α = - β = 3 α = 8 β = 3 α β is MAX s best alternative here or above is MIN s best alternative here or above 22

Alpha-Beta Pseudocode 23

Alpha-Beta Pruning Properties Ø This pruning has no effect on final result at the root Ø Values of intermediate nodes might be wrong! Important: children of the root may have the wrong value Ø Good children ordering improves effectiveness of pruning Ø With perfect ordering : Time complexity drops to O(b m/2 ) Doubles solvable depth! Your action looks smarter: more forward-looking with good evaluation function Full search of, e.g. chess, is still hopeless 24

Project 2 ØQ1: write an evaluation function for (state,action) pairs the evaluation function is for this question only ØQ2: minimax search with arbitrary depth and multiple MIN players (ghosts) evaluation function on states has been implemented for you ØQ3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts) 25

Recap ØMinimax search with limited depth evaluation function ØAlpha-beta pruning ØProject 1 due midnight today ØProject 2 due in two weeks 26