Summary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game

Size: px

Start display at page:

Download "Summary. Agenda. Games. Intelligence opponents in game. Expected Value Expected Max Algorithm Minimax Algorithm Alpha-beta Pruning Simultaneous Game"

Amberly Fisher
5 years ago
Views:

1 Summary rtificial Intelligence and its applications Lecture 4 Game Playing Search onstraint Satisfaction Problems From start state to goal state onsider constraints Professor Daniel Yeung danyeung@ieee.org Dr. Patrick han patrickchan@ieee.org South hina University of Technology, hina Difficulty Game Playing Markov Decision Processes onsider an adversary onsider an uncertainty Reinforcement Learning No information is given 1 2 genda Expected Value Expected Max lgorithm Minimax lgorithm lpha-beta Pruning Simultaneous Game Games Intelligence opponents in game 3 4

Search and Game Information of Game Search Problem Independent from your decision Games 5 ompetition with an adversary Game opponent acts according to your decision Decision should be made in limited

2 Search and Game Information of Game Search Problem Independent from your decision Games 5 ompetition with an adversary Game opponent acts according to your decision Decision should be made in limited time in game: approximation is needed traditional hallmark of intelligence Model for many applications: Military confrontations, negotiation, auctions, State: s s start : starting state IsEnd(s): whether s is an end state (game over) ctions(s): possible actions from state s Succ(s, a): resulting state if choose action a in state s (s): agent s utility for end state s Player(s) ϵ Players: player who controls state s Players = {agent, opp} Information of Game Example Question Players: State s: IsEnd(s): ctions(s): {white, black} position of all pieces checkmate or draw legal chess moves that Player(s) can make Rules: You choose one of the bin Your opponent chooses a number in your chosen bin Your goal is to maximize the chosen number (s): if white wins 0 if draw, 1 if black wins Depend on attitude of opponent Stochastic (base on probability) gainst you e helpful (unlikely) 7 8

3 Expected Value Similar to a search problem, build a tree hance node is denoted by a circle Take an action with probability Each node is a decision point for a player you opponent outcome Each root-to-leaf path is a possible outcome of the game Expected Value Stochastic policies p (s, a) [0, 1] Probability of player p taking action a in state s For a Two-player Game Value of the game V agent,opp (s) agent opp agent Σ aϵctions(s) π agent (s,a) V agent,opp (Succ(s,a)) Σ aϵctions(s) π opp (s,a) V agent,opp (Succ(s,a)) 9 10 Expected Value Example 1 Expected Value V agent,opp (s) Σ aϵctions(s) π agent (s,a) V agent,opp (Succ(s,a)) Σ aϵctions(s) π opp (s,a) V agent,opp (Succ(s,a)) V agent,opp (s) Σ aϵctions(s) π agent (s,a) V agent,opp (Succ(s,a)) Σ aϵctions(s) π opp (s,a) V agent,opp (Succ(s,a)) ssume agent(s start,)= agent (S start,)= agent (S start,)=1/3 opp(s, L) = opp (s, R) = 0.5, for any s What is the value of the game? V agent,opp (S 1 ) 1/3 1/3 = 1/3 V agent,opp (S 11 ) + 1/3 V agent,opp (S 12 ) S 11 S 12 S /3 V agent,opp (S 13 ) = 1/3 x (0.5 x (-50) x (50)) + 1/3 x (0.5 x (1) x (3)) /3 x (0.5 x (-5) x (15)) = 14/3 11 S /3 ssume agent(s start, ) = 1 opp(s, L) = opp (s, R) = 0.5, for any s What is the value of the game: V agent,opp (S 1 ) = 1 V agent,opp (S 11 ) + 0 V agent,opp (S 12 ) + 0 V agent,opp (S 13 ) = 1 x (0.5 x (-50) x (50)) + 0 x (0.5 x (1) x (3)) + 0 x (0.5 x (-5) x (15)) = S S 11 S 12 S 13

4 Expected Value Example 3 ssume opp(s 11, L) = 0.4, opp (s 11, R) = 0. opp(s 12, L) = 0.5, opp (s 12, R) = 0.5 opp(s 13, L) = 0.2, opp (s 13, R) = 0.8 Which action we should choose? 13 V agent,opp () = 10 V agent,opp () = 2 V agent,opp () = 11 We should take the one with the maximum value ction 10 S ??? S 11 S 12 2 S Expected Max lgorithm Expected Max lgorithm selects action maximizing value over actions Max nodes is denoted by an upward-pointing triangle V max,opp (s) 14 agent opp agent max aϵctions(s) V max,opp (Succ(s,a)) Σ aϵctions(s) π opp (s,a) V max,opp (Succ(s,a)) Expected Max lgorithm Example ssume opp(s, L) = opp (s, R) = 0.5, for any s Which action an agent will choose in Expected Max lgorithm? max aϵctions(s) V max,opp (Succ(s,a)) ction = V max,opp (S start ) = 20 5?? ? 10 Minimax Unfortunately, we never know what our opponent will do ssume they take action randomly may be too optimistic The worst case should be considered 15 1

5 Minimax Minimax assumes opponent selects the worst action to an agent Minimax Example 1 Which action an agent will choose in minimax? agent opp agent min aϵctions(s) V max,min (Succ(s,a)) max aϵctions(s) V max,min (Succ(s,a)) Min nodes is denoted by an upside-down triangle V max,min (s) max aϵctions(s) V max,min (Succ(s,a)) min aϵctions(s) V max,min (Succ(s,a)) ction = V max,min (S start ) = Node Summary Example Which action an agent will choose in minimax? hance node weighted sum Max node max Min node min

6 New Game: Rules You choose one of the three bins Then Flip a coin; if heads, then move one bin to the left (with wrap around) If not, just stick on your choice Your opponent chooses a number from that bin Your goal is to maximize the chosen number Now, we have three parties Players = {agent, opp, coin} V max,coin,min (s) agent coin opp max aϵctions(s) V max,coin,min (Succ(s,a)) min aϵctions(s) V max,coin,min (Succ(s,a)) agent Σ aϵctions(s) π coin (s,a) V max,coin,min (Succ(s,a)) Player(s) = coin You choose one of the three bins Flip a coin; if heads, then move one bin to the left (with wrap around) If not, just stick on your choice Your opponent chooses a number from that bin Your goal is to maximize the chosen number V max,coin,min (s)= max( E(min(-50,50), min(-5,15)), E(min(1,3), min(-50,50), E(min(-5,15), min(1,3) ) = max(e(-50, -5), E(1, -50), E(-5,1))= max(-27.5,-24.5,-2) -2 =

Time omplexity Time omplexity fter a game is modeled as a tree, the search technique can be used omplexity: Space: O(d) Time: O(b2d) However, even a simple game like Tic Tac Toe, the tree is very

7 Time omplexity Time omplexity fter a game is modeled as a tree, the search technique can be used omplexity: Space: O(d) Time: O(b2d) However, even a simple game like Tic Tac Toe, the tree is very complicated where b: branching factor, d: depth Example: hess b 35, d 50 long path to get the utility Time omplexity = Time / Space complexity is large in practice utility dvanced Method dvanced Methods Evaluation Function Original How to speed up minimax? Evaluation Functions Do not access TRUE utility but approximate it use domain-specific knowledge lpha-beta Pruning general-purpose Ignore unnecessary path compute exact answer Evaluation Functions Sstart dmax s Very tall Send =??? Evaluation! = 1 (win) 27 28

dvanced Method Evaluation Function Limited depth tree search (stop at maximum depth d max ) Eval(s) evaluates the value of V

min aϵctions(s) V max,min (Succ(s,a), d-1) dvanced Method: Evaluation Function Example Example: hess Eval(s) = material +

Queen, R: rook, : bishop, N : Knight, P : Pawn : the difference in due to the move Mobility: 0.

30 dvanced Method lpha-beta Pruning dvanced Method lpha-beta Pruning In some cases, visiting some branches is not necessary

cannot be more than 2 No need to further investigate 3 3 5 2 10 2 Prune a node if its value is not in the interval bounded by

8 dvanced Method Evaluation Function Limited depth tree search (stop at maximum depth d max ) Eval(s) evaluates the value of V max,min (s) at d max (may be very inaccurate) V max,min (s,d) d max Eval(s) d=0 max aϵctions(s) V max,min (Succ(s,a), d-1) min aϵctions(s) V max,min (Succ(s,a), d-1) dvanced Method: Evaluation Function Example Example: hess Eval(s) = material + mobility + king-safety + center-control Material: (K K ) + 9(Q Q ) + 5(R R )+ 3( ) + 3(N N ) + 1(P P ) K : King, Q : Queen, R: rook, : bishop, N : Knight, P : Pawn : the difference in due to the move Mobility: 0.1 x (legal_move# - legal_move# ) King-safety: keeping the king safe is good enter-control: control the center of the board dvanced Method lpha-beta Pruning dvanced Method lpha-beta Pruning In some cases, visiting some branches is not necessary in minimax algorithm For example s opp always take minimal value, after finding utility 2, the value of action on the right cannot be more than 2 No need to further investigate Prune a node if its value is not in the interval bounded by and, (i.e. ~( ), v is value of node) where a s : lower bound on value of max node s where b s : upper bound on value of min node s No need to check this branch 31 32

9 dvanced Method: lpha-beta Pruning Example 1 dvanced Method: lpha-beta Pruning Example: The last node can be pruned α α β No overlap with every ancestor where a s : lower bound on value of max node s No need to check this branch as the value cannot be bigger than 8 3 Still need to check the rest as the value may be equal to 8 where b s : upper bound on value of min node s dvanced Method: lpha-beta Pruning dvanced Method: lpha-beta Pruning Example Prune here!

Simultaneous Game Turn-based Games Simultaneous Game

Example Simultaneous Game Type of Strategy

2-3 -3 4 Pure Strategy lways do the same action If

lways 1: π = [1, 0] General Strategy Take an action

10 Simultaneous Game Turn-based Games Simultaneous Game Simultaneous Game Example Two-finger Morra Rules Players and each show 1 or 2 fingers. If both show 1, gives 2 dollars. If both show 2, gives 4 dollars. Otherwise, gives 3 dollars Simultaneous Game Example Simultaneous Game Type of Strategy If both show 1, gives 2 dollars. If both show 2, gives 4 dollars. Otherwise, gives 3 dollars V(a,b), where a,bϵctions Pure Strategy lways do the same action If π(b) = 1 and π(a) = 0, where b a and a ϵ ctions E.g. lways 1: π = [1, 0] General Strategy Take an action with probability 0 π(a) 1 for a ϵ ctions E.g. Uniformly random: π = [ 1/2, 1/2] 39 40

Simultaneous Game Expected Value Summary Value of the game if follows π and follows π

Satisfaction Problems onsider constraints Example: π = [1, 0], π = [1/2, 1/2] V(π, π

1/2 x 4) = -1/2 2-3 -3 4 Difficulty Game Playing Markov Decision Processes

11 Simultaneous Game Expected Value Summary Value of the game if follows π and follows π is Search From start state to goal state V(π, π ) = Σ a,b π (a)π (b) V(a,b) onstraint Satisfaction Problems onsider constraints Example: π = [1, 0], π = [1/2, 1/2] V(π, π ) = Σ a,b π (a)π (b) V(a,b) = (1 x 1/2 x 2) + (1 x 1/2 x -3) + (0 x 1/2 x -3) + (0 x 1/2 x 4) = -1/ Difficulty Game Playing Markov Decision Processes Reinforcement Learning onsider an adversary onsider an uncertainty No information is given 41 42

CS 4100 // artificial intelligence. Recap/midterm review!

CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks