Artificial Intelligence and Its Applications
Lecture 4: Game Playing
Professor Daniel Yeung (danyeung@ieee.org)
Dr. Patrick Chan (patrickchan@ieee.org)
South China University of Technology, China

Summary (course topics, in order of difficulty):
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given

Agenda:
- Expected Value
- Expected Max Algorithm
- Minimax Algorithm
- Alpha-beta Pruning
- Simultaneous Games

Games: playing against intelligent opponents.
Search and Games

- Search problem: the information is independent of your decision.
- Game: competition with an adversary; the opponent acts according to your decision.
- Decisions must be made in limited time in a game, so approximation is needed.
- Game playing is a traditional hallmark of intelligence, and a model for many applications: military confrontations, negotiation, auctions, ...

A game is defined much like a search problem:
- State: s
- s_start: starting state
- IsEnd(s): whether s is an end state (game over)
- Actions(s): possible actions from state s
- Succ(s, a): resulting state if action a is chosen in state s
- Utility(s): the agent's utility for end state s
- Player(s) ∈ Players: the player who controls state s, with Players = {agent, opp}

Example: chess
- Players: {white, black}
- State s: position of all pieces
- IsEnd(s): checkmate or draw
- Actions(s): legal chess moves that Player(s) can make
- Utility(s): +1 if white wins, 0 if draw, -1 if black wins

Question (running example):
- Rules: you choose one of three bins, A = {-50, 50}, B = {1, 3}, or C = {-5, 15}. Your opponent then chooses a number in your chosen bin. Your goal is to maximize the chosen number.
- The answer depends on the attitude of the opponent: stochastic (acting based on probability), against you, or helpful (unlikely).
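To make the definition concrete, here is a minimal Python sketch of the game interface, instantiated on the bin game. The tuple-based state encoding and all names are illustrative assumptions, not from the slides:

```python
# A sketch of the game interface on the bin game (assumed encoding:
# () = start, ("A",) = opponent to move, ("A", -50) = end state).

BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

START = ()                        # s_start: nothing chosen yet

def is_end(s):                    # IsEnd(s): a bin and a number were chosen
    return len(s) == 2

def player(s):                    # Player(s): agent picks a bin, opp a number
    return "agent" if len(s) == 0 else "opp"

def actions(s):                   # Actions(s): bin names, then bin contents
    return list(BINS) if len(s) == 0 else list(BINS[s[0]])

def succ(s, a):                   # Succ(s, a): record the chosen item
    return s + (a,)

def utility(s):                   # Utility(s): the number the opponent chose
    return s[1]
```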
Expected Value

Similar to a search problem, build a game tree:
- Each node is a decision point for a player (you or the opponent).
- Each root-to-leaf path is a possible outcome of the game.
- A chance node, denoted by a circle, takes each action with some probability.

Stochastic policies: π_p(s, a) ∈ [0, 1] is the probability of player p taking action a in state s.

For a two-player game, the value of the game is

V_{agent,opp}(s) =
- Utility(s), if IsEnd(s)
- Σ_{a ∈ Actions(s)} π_agent(s, a) V_{agent,opp}(Succ(s, a)), if Player(s) = agent
- Σ_{a ∈ Actions(s)} π_opp(s, a) V_{agent,opp}(Succ(s, a)), if Player(s) = opp

Example 1: assume π_agent(s_start, A) = π_agent(s_start, B) = π_agent(s_start, C) = 1/3 and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. What is the value of the game?

V_{agent,opp}(s_1) = 1/3 V_{agent,opp}(s_11) + 1/3 V_{agent,opp}(s_12) + 1/3 V_{agent,opp}(s_13)
= 1/3 × (0.5 × (-50) + 0.5 × 50) + 1/3 × (0.5 × 1 + 0.5 × 3) + 1/3 × (0.5 × (-5) + 0.5 × 15)
= 1/3 × (0 + 2 + 5) = 7/3

Example 2: assume π_agent(s_start, A) = 1 and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. What is the value of the game?

V_{agent,opp}(s_1) = 1 V_{agent,opp}(s_11) + 0 V_{agent,opp}(s_12) + 0 V_{agent,opp}(s_13)
= 1 × (0.5 × (-50) + 0.5 × 50) + 0 × (0.5 × 1 + 0.5 × 3) + 0 × (0.5 × (-5) + 0.5 × 15)
= 0
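A short sketch of this recursion, reproducing Example 1; the uniform policies and the tuple states are assumptions matching the interface sketch above:

```python
# Expected value of the bin game under the Example 1 policies.

BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def pi_agent(s, a):               # agent: each bin with probability 1/3
    return 1 / 3

def pi_opp(s, x):                 # opponent: each number with probability 1/2
    return 1 / 2

def value(s=()):
    if len(s) == 2:                                    # IsEnd(s): Utility(s)
        return s[1]
    if len(s) == 0:                                    # agent's chance node
        return sum(pi_agent(s, a) * value(s + (a,)) for a in BINS)
    return sum(pi_opp(s, x) * value(s + (x,)) for x in BINS[s[0]])

print(value())                    # 2.333... = 7/3, the value from Example 1
```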
Expected Value, Example 3: assume
- π_opp(s_11, L) = 0.4, π_opp(s_11, R) = 0.6
- π_opp(s_12, L) = 0.5, π_opp(s_12, R) = 0.5
- π_opp(s_13, L) = 0.2, π_opp(s_13, R) = 0.8

Which action should we choose?
- V_{agent,opp}(A) = 0.4 × (-50) + 0.6 × 50 = 10
- V_{agent,opp}(B) = 0.5 × 1 + 0.5 × 3 = 2
- V_{agent,opp}(C) = 0.2 × (-5) + 0.8 × 15 = 11

We should take the action with the maximum value: action C.

Expected Max Algorithm

The Expected Max algorithm selects the action maximizing the expected value over actions. A max node is denoted by an upward-pointing triangle.

V_{max,opp}(s) =
- Utility(s), if IsEnd(s)
- max_{a ∈ Actions(s)} V_{max,opp}(Succ(s, a)), if Player(s) = agent
- Σ_{a ∈ Actions(s)} π_opp(s, a) V_{max,opp}(Succ(s, a)), if Player(s) = opp

Example: assume π_opp(s, L) = π_opp(s, R) = 0.5 for any s, with bins {-10, 20}, {10, 30}, and {5, 15}. Which action will the agent choose under Expected Max? The expected values of the three bins are 5, 20, and 10, so the agent chooses action B: V_{max,opp}(s_start) = 20. (A sketch of the recursion follows below.)

Minimax

Unfortunately, we never know what our opponent will do. Assuming they act randomly may be too optimistic; the worst case should be considered.
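A sketch of the Expected Max recursion on the example tree above (bins {-10, 20}, {10, 30}, {5, 15}); the state encoding is the same illustrative assumption as before:

```python
# Expected Max: maximize over the agent's actions, average over the
# opponent's uniformly random choices.

BINS = {"A": [-10, 20], "B": [10, 30], "C": [5, 15]}

def expectimax(s=()):
    if len(s) == 2:                                    # IsEnd(s): Utility(s)
        return s[1]
    if len(s) == 0:                                    # max node (agent)
        return max(expectimax(s + (a,)) for a in BINS)
    # chance node: the opponent picks either number with probability 0.5
    return sum(0.5 * expectimax(s + (x,)) for x in BINS[s[0]])

print(expectimax())               # 20.0: bin B has the largest expected value
```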
Minimax

Minimax assumes the opponent selects the worst action for the agent. A min node is denoted by an upside-down triangle.

V_{max,min}(s) =
- Utility(s), if IsEnd(s)
- max_{a ∈ Actions(s)} V_{max,min}(Succ(s, a)), if Player(s) = agent
- min_{a ∈ Actions(s)} V_{max,min}(Succ(s, a)), if Player(s) = opp

Example 1: which action will the agent choose in minimax? The worst cases of the three bins are min(-50, 50) = -50, min(1, 3) = 1, and min(-5, 15) = -5, so the agent chooses action B: V_{max,min}(s_start) = 1. (A code sketch follows below.)

Node summary:
- Chance node (circle): weighted sum
- Max node (upward-pointing triangle): max
- Min node (upside-down triangle): min

Example 2: a deeper minimax tree, worked on the slide (figure: alternating max and min levels over leaf utilities; the agent again chooses the action whose worst case is best).
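A sketch of minimax on the original bins; the encoding is the same illustrative assumption:

```python
# Minimax: maximize over the agent's actions, minimize over the opponent's.

BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def minimax(s=()):
    if len(s) == 2:                                    # IsEnd(s): Utility(s)
        return s[1]
    if len(s) == 0:                                    # max node (agent)
        return max(minimax(s + (a,)) for a in BINS)
    return min(minimax(s + (x,)) for x in BINS[s[0]])  # min node (opponent)

print(minimax())   # 1: bin B's worst case beats those of bins A and C
```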
New Game: Rules
- You choose one of the three bins.
- Then flip a coin: if heads, move one bin to the left (with wrap-around); if tails, stick with your choice.
- Your opponent chooses a number from that bin.
- Your goal is to maximize the chosen number.

Now, we have three parties: Players = {agent, coin, opp}.

V_{max,coin,min}(s) =
- Utility(s), if IsEnd(s)
- max_{a ∈ Actions(s)} V_{max,coin,min}(Succ(s, a)), if Player(s) = agent
- Σ_{a ∈ Actions(s)} π_coin(s, a) V_{max,coin,min}(Succ(s, a)), if Player(s) = coin
- min_{a ∈ Actions(s)} V_{max,coin,min}(Succ(s, a)), if Player(s) = opp

Worked example (bins A = {-50, 50}, B = {1, 3}, C = {-5, 15}; write E(x, y) = 0.5x + 0.5y for the coin's average):

V_{max,coin,min}(s) = max( E(min(-5, 15), min(-50, 50)),
                           E(min(-50, 50), min(1, 3)),
                           E(min(1, 3), min(-5, 15)) )
= max(E(-5, -50), E(-50, 1), E(1, -5))
= max(-27.5, -24.5, -2)
= -2

So the agent chooses bin C, for a game value of -2.
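A sketch of the three-player recursion for the coin game; the state encoding, including which bin counts as "one to the left", is an assumption reconstructed from the worked example:

```python
# Expectiminimax for the coin game: max (agent), chance (coin), min (opponent).

BINS = ["A", "B", "C"]
NUMS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
LEFT = {"A": "C", "B": "A", "C": "B"}   # one bin to the left, with wrap-around

def value(s=("agent",)):
    turn = s[0]
    if turn == "agent":                 # max node: agent picks a bin
        return max(value(("coin", b)) for b in BINS)
    if turn == "coin":                  # chance node: heads shifts the bin
        b = s[1]
        return 0.5 * value(("opp", LEFT[b])) + 0.5 * value(("opp", b))
    return min(NUMS[s[1]])              # min node: opponent picks the number

print(value())                          # -2.0: pick bin C, as worked above
```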
Time Complexity

After a game is modeled as a tree, the standard search techniques can be used. With branching factor b and depth d (moves per player, i.e., 2d plies), the complexity is:
- Space: O(d)
- Time: O(b^{2d})

The path from the root to each utility is long, so the time complexity is very large in practice. Even for a simple game like Tic-Tac-Toe, the tree is very complicated (full game tree: https://commons.wikimedia.org/wiki/file:tic-tac-toe-full-game-tree-x-rational.jpg).

Example: chess has b ≈ 35 and d ≈ 50, giving about 35^100 ≈ 2.55 × 10^154 game paths.

Advanced Methods: how can we speed up minimax?
- Evaluation functions: do not access the TRUE utility but approximate it, using domain-specific knowledge.
- Alpha-beta pruning: general-purpose; ignores unnecessary paths yet still computes the exact answer.

Evaluation Functions: the original tree from s_start to s_end (where Utility = 1 for a win) is very tall; instead, search only to a maximum depth d_max and evaluate there.
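A quick check of the chess estimate, treating b and d as the slide's rough figures:

```python
# Number of game paths with b = 35 legal moves and d = 50 moves per player
# (2d = 100 plies): b**(2*d).

b, d = 35, 50
print(b ** (2 * d))               # a 155-digit number, about 2.55e154
```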
Advanced Method: Evaluation Function

Limited-depth tree search stops at a maximum depth d_max. Eval(s) estimates the value of V_{max,min}(s) at the depth limit (and may be very inaccurate).

V_{max,min}(s, d) =
- Utility(s), if IsEnd(s)
- Eval(s), if d = 0
- max_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d - 1), if Player(s) = agent
- min_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d - 1), if Player(s) = opp

Example: chess. Eval(s) = material + mobility + king-safety + center-control, where
- Material: 10^100 (K - K') + 9 (Q - Q') + 5 (R - R') + 3 (B - B') + 3 (N - N') + 1 (P - P'), with K: king, Q: queen, R: rook, B: bishop, N: knight, P: pawn, and each difference taken between our piece count and the opponent's after the move.
- Mobility: 0.1 × (our number of legal moves - the opponent's number of legal moves)
- King-safety: keeping the king safe is good.
- Center-control: controlling the center of the board is good.

A code sketch of the depth-limited recursion follows below.

Advanced Method: Alpha-beta Pruning

In some cases, visiting some branches of the minimax tree is unnecessary. For example, suppose the left min node evaluates to 3 (from leaves 3 and 5) and the right min node has already found the leaf 2. Since the opponent always takes the minimal value, the right action's value cannot exceed 2, so there is no need to investigate that branch further.

Prune a node if its value cannot lie in the interval (α, β), where
- α_s: a lower bound on the value of max node s
- β_s: an upper bound on the value of min node s
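A sketch of depth-limited minimax on the bin game; the evaluation function used here (the bin's average) is a hypothetical stand-in, chosen only to show the cut-off behavior:

```python
# Depth-limited minimax: at d = 0 the true utility is replaced by Eval(s).

BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def eval_fn(s):                   # hypothetical Eval: the average of the bin
    return sum(BINS[s[0]]) / 2

def minimax_limited(s=(), d=2):
    if len(s) == 2:               # IsEnd(s): return the true Utility(s)
        return s[1]
    if d == 0:                    # cut off: approximate with Eval(s)
        return eval_fn(s)
    if len(s) == 0:               # max node (agent)
        return max(minimax_limited(s + (a,), d - 1) for a in BINS)
    return min(minimax_limited(s + (x,), d - 1) for x in BINS[s[0]])

print(minimax_limited(d=2))       # 1: deep enough, same answer as minimax
print(minimax_limited(d=1))       # 5.0: Eval overestimates bin C's worst case
```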
Advanced Method: Alpha-beta Pruning, Example 1 (figure: a max node over min nodes, with leaves including 8, 4, 9, and 7):
- Once one branch is known to be worth 8, a sibling branch whose value cannot be bigger than 8 need not be checked.
- The remaining branches must still be checked, as their values may equal 8.

Example 2 (figure): the last node can be pruned, because its possible values have no overlap with the (α, β) interval of any ancestor, where
- α_s: lower bound on the value of max node s
- β_s: upper bound on the value of min node s

Example 3 (figure: a larger tree with leaves including 9, 7, 8, 3, 20, 50, -2, 99, 1, 5, and 15): the walkthrough marks the branches where pruning occurs ("Prune here!").
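A sketch of minimax with alpha-beta pruning, demonstrated on the small introductory tree (max over two min nodes with leaves 3, 5 and 2, 10). The nested-list tree encoding is an assumption for illustration:

```python
# Alpha-beta pruning: skip children whose value cannot fall in (alpha, beta).

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), is_max=True):
    if isinstance(node, (int, float)):  # leaf: utility value
        return node
    if is_max:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)       # raise the lower bound alpha_s
            if alpha >= beta:           # interval empty: prune the rest
                break
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)             # lower the upper bound beta_s
        if alpha >= beta:
            break
    return v

print(alphabeta([[3, 5], [2, 10]]))     # 3; the 10-leaf is never visited
```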
Simultaneous Games

The games so far have been turn-based; in a simultaneous game, both players act at the same time.

Example: Two-finger Morra. Rules: players A and B each show 1 or 2 fingers.
- If both show 1, B gives A 2 dollars.
- If both show 2, B gives A 4 dollars.
- Otherwise, A gives B 3 dollars.

The payoff V(a, b) to player A, where a, b ∈ Actions:

             B shows 1   B shows 2
A shows 1        2          -3
A shows 2       -3           4

Types of strategy:
- Pure strategy: always take the same action; π(b) = 1 and π(a) = 0 for every a ≠ b, a ∈ Actions. E.g., "always 1": π = [1, 0].
- General (mixed) strategy: take each action with some probability, 0 ≤ π(a) ≤ 1 for a ∈ Actions. E.g., uniformly random: π = [1/2, 1/2].
Simultaneous Game: Expected Value

The value of the game if A follows π_A and B follows π_B is

V(π_A, π_B) = Σ_{a,b} π_A(a) π_B(b) V(a, b)

Example: π_A = [1, 0], π_B = [1/2, 1/2]:

V(π_A, π_B) = (1 × 1/2 × 2) + (1 × 1/2 × (-3)) + (0 × 1/2 × (-3)) + (0 × 1/2 × 4) = -1/2

Summary (course topics, in order of difficulty):
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given
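A sketch of the simultaneous-game value on two-finger Morra, using the payoff matrix above (A's winnings); the dictionary representation of strategies is an illustrative choice:

```python
# V(pi_A, pi_B) for two-finger Morra: sum payoffs weighted by both strategies.

V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}

def game_value(pi_a, pi_b):
    # pi_a, pi_b map actions (1 or 2 fingers) to probabilities
    return sum(pi_a[a] * pi_b[b] * V[a, b] for (a, b) in V)

print(game_value({1: 1.0, 2: 0.0}, {1: 0.5, 2: 0.5}))   # -0.5, as computed
```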