Summary. Agenda. Games. Intelligent opponents in games. Expected Value, Expected Max Algorithm, Minimax Algorithm, Alpha-beta Pruning, Simultaneous Games


1 Artificial Intelligence and its Applications
Lecture 4: Game Playing
Professor Daniel Yeung danyeung@ieee.org
Dr. Patrick Chan patrickchan@ieee.org
South China University of Technology, China

Summary of course topics, in order of increasing difficulty:
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given

Agenda:
- Games and intelligent opponents in games
- Expected Value
- Expected Max Algorithm
- Minimax Algorithm
- Alpha-beta Pruning
- Simultaneous Games

2 Search and Game
In a search problem, the environment is independent of your decisions. In a game, you compete with an adversary: the opponent acts in response to your decisions, and decisions must be made in limited time, so approximation is needed. Games are a traditional hallmark of intelligence and a model for many applications: military confrontations, negotiation, auctions, ...

Information of a game:
- State s; s_start: starting state
- IsEnd(s): whether s is an end state (game over)
- Actions(s): possible actions from state s
- Succ(s, a): resulting state if action a is chosen in state s
- Utility(s): agent's utility for end state s
- Player(s) ∈ Players: player who controls state s; Players = {agent, opp}

Example (chess):
- Players: {white, black}
- State s: position of all pieces
- IsEnd(s): checkmate or draw
- Actions(s): legal chess moves that Player(s) can make
- Utility(s): +1 if white wins, 0 if draw, -1 if black wins

Question. Rules: you choose one of the bins; your opponent chooses a number in your chosen bin; your goal is to maximize the chosen number. What you should do depends on the attitude of your opponent: stochastic (acts based on probability), against you, or helpful (unlikely).
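As a concrete anchor for the formulas that follow, here is a minimal Python sketch of this interface, instantiated on the three-bin game used in the later examples. The bin contents (-50/50, 1/3, -5/15) come from those slides; the state encoding and function names are illustrative assumptions, not the lecture's code.

BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def start_state():
    return ("agent", None)                 # agent is to move, no bin chosen yet

def is_end(s):
    return s[0] == "end"

def player(s):
    return s[0]                            # "agent", "opp" or "end"

def actions(s):
    who, chosen = s
    if who == "agent":
        return sorted(BINS)                # choose a bin: "A", "B" or "C"
    return list(range(len(BINS[chosen])))  # opponent picks an index in the bin

def succ(s, a):
    who, chosen = s
    if who == "agent":
        return ("opp", a)                  # opponent moves in the chosen bin
    return ("end", BINS[chosen][a])        # game over; leaf holds the number

def utility(s):
    return s[1]                            # the number the opponent chose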

3 Expected Value
As in a search problem, we build a tree. Each node is a decision point for a player: you, your opponent, or chance. A chance node is denoted by a circle and takes each action with some probability. Each root-to-leaf path is a possible outcome of the game.

Stochastic policies: π_p(s, a) ∈ [0, 1] is the probability of player p taking action a in state s.

For a two-player game, the value of the game is
V_agent,opp(s) =
- Utility(s) if IsEnd(s)
- Σ_{a ∈ Actions(s)} π_agent(s, a) V_agent,opp(Succ(s, a)) if Player(s) = agent
- Σ_{a ∈ Actions(s)} π_opp(s, a) V_agent,opp(Succ(s, a)) if Player(s) = opp

Example 1. Assume π_agent(s_start, A) = π_agent(s_start, B) = π_agent(s_start, C) = 1/3 and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. What is the value of the game?
V_agent,opp(s_1) = 1/3 V_agent,opp(s_11) + 1/3 V_agent,opp(s_12) + 1/3 V_agent,opp(s_13)
= 1/3 × (0.5 × (-50) + 0.5 × 50) + 1/3 × (0.5 × 1 + 0.5 × 3) + 1/3 × (0.5 × (-5) + 0.5 × 15)
= 1/3 × (0 + 2 + 5) = 7/3

Example 2. Assume π_agent(s_start, A) = 1 and π_opp(s, L) = π_opp(s, R) = 0.5 for any s. What is the value of the game?
V_agent,opp(s_1) = 1 × V_agent,opp(s_11) + 0 × V_agent,opp(s_12) + 0 × V_agent,opp(s_13)
= 1 × (0.5 × (-50) + 0.5 × 50) + 0 × (0.5 × 1 + 0.5 × 3) + 0 × (0.5 × (-5) + 0.5 × 15)
= 0
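The recursion above maps directly to code. A minimal sketch, reusing the bin-game helpers defined earlier; pi_agent and pi_opp are assumed to be functions from (state, action) to a probability, as on the slide.

def game_value(s, pi_agent, pi_opp):
    """Expected value V_agent,opp(s) under two stochastic policies."""
    if is_end(s):
        return utility(s)
    pi = pi_agent if player(s) == "agent" else pi_opp
    return sum(pi(s, a) * game_value(succ(s, a), pi_agent, pi_opp)
               for a in actions(s))

# Example 1: both players uniformly random.
def uniform(s, a):
    return 1.0 / len(actions(s))

print(game_value(start_state(), uniform, uniform))   # (0 + 2 + 5) / 3 = 7/3

# Example 2: the agent always picks bin A.
def always_a(s, a):
    return 1.0 if a == "A" else 0.0

print(game_value(start_state(), always_a, uniform))  # 0.5*(-50) + 0.5*50 = 0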

4 Expected Value, Example 3
Assume π_opp(s_11, L) = 0.4, π_opp(s_11, R) = 0.6; π_opp(s_12, L) = 0.5, π_opp(s_12, R) = 0.5; π_opp(s_13, L) = 0.2, π_opp(s_13, R) = 0.8. Which action should we choose?
V_agent,opp(A) = 0.4 × (-50) + 0.6 × 50 = 10
V_agent,opp(B) = 0.5 × 1 + 0.5 × 3 = 2
V_agent,opp(C) = 0.2 × (-5) + 0.8 × 15 = 11
We should take the action with the maximum value: action C.

Expected Max Algorithm
The Expected Max algorithm selects the action maximizing the expected value over actions. A max node is denoted by an upward-pointing triangle.
V_max,opp(s) =
- Utility(s) if IsEnd(s)
- max_{a ∈ Actions(s)} V_max,opp(Succ(s, a)) if Player(s) = agent
- Σ_{a ∈ Actions(s)} π_opp(s, a) V_max,opp(Succ(s, a)) if Player(s) = opp

Example. Assume π_opp(s, L) = π_opp(s, R) = 0.5 for any s. Which action will the agent choose under Expected Max? As computed in Example 1, the three bins are worth 0, 2 and 5 against this opponent, so the agent chooses action C and V_max,opp(s_start) = 5.

Minimax
Unfortunately, we never know what our opponent will do. Assuming they act randomly may be too optimistic; the worst case should be considered.
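A sketch of the Expected Max recursion, again on the helpers defined earlier: the agent maximizes while the opponent is still modeled by a stochastic policy.

def expectimax(s, pi_opp):
    """Expected Max: the agent maximizes, the opponent follows pi_opp."""
    if is_end(s):
        return utility(s)
    if player(s) == "agent":
        return max(expectimax(succ(s, a), pi_opp) for a in actions(s))
    return sum(pi_opp(s, a) * expectimax(succ(s, a), pi_opp)
               for a in actions(s))

# Against a uniformly random opponent the bins are worth 0, 2 and 5,
# so the agent picks bin C:
print(expectimax(start_state(), uniform))  # 5.0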

5 Minimax
Minimax assumes the opponent selects the worst action for the agent. A min node is denoted by an upside-down triangle.
V_max,min(s) =
- Utility(s) if IsEnd(s)
- max_{a ∈ Actions(s)} V_max,min(Succ(s, a)) if Player(s) = agent
- min_{a ∈ Actions(s)} V_max,min(Succ(s, a)) if Player(s) = opp

Example 1. Which action will the agent choose in minimax? On the three-bin tree, the worst-case values are min(-50, 50) = -50, min(1, 3) = 1 and min(-5, 15) = -5, so the agent chooses action B and V_max,min(s_start) = 1.

Node summary:
- Chance node: weighted sum
- Max node: max
- Min node: min

Example 2. Which action will the agent choose in minimax? (Worked on the slide's tree in the same way.)
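The worst-case recursion as a short sketch on the same helpers:

def minimax(s):
    """Worst-case value V_max,min(s): the opponent minimizes."""
    if is_end(s):
        return utility(s)
    values = [minimax(succ(s, a)) for a in actions(s)]
    return max(values) if player(s) == "agent" else min(values)

# Worst cases per bin: min(-50,50) = -50, min(1,3) = 1, min(-5,15) = -5,
# so the minimax-optimal choice is bin B:
print(minimax(start_state()))  # 1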

6 New Game
Rules: you choose one of the three bins. Then flip a coin: if heads, move one bin to the left (with wrap-around); if tails, stick with your choice. Your opponent then chooses a number from the resulting bin. Your goal is to maximize the chosen number.

Now we have three parties: Players = {agent, coin, opp}.
V_max,coin,min(s) =
- Utility(s) if IsEnd(s)
- max_{a ∈ Actions(s)} V_max,coin,min(Succ(s, a)) if Player(s) = agent
- min_{a ∈ Actions(s)} V_max,coin,min(Succ(s, a)) if Player(s) = opp
- Σ_{a ∈ Actions(s)} π_coin(s, a) V_max,coin,min(Succ(s, a)) if Player(s) = coin

V_max,coin,min(s_start)
= max( E[min(-50, 50), min(-5, 15)], E[min(1, 3), min(-50, 50)], E[min(-5, 15), min(1, 3)] )
= max( E[-50, -5], E[1, -50], E[-5, 1] )
= max( -27.5, -24.5, -2 )
= -2
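Because this tree is tiny, the three-party value can be checked directly in code (a sketch reusing BINS from earlier; the wrap-around indexing implements "move one bin to the left"):

BIN_ORDER = ["A", "B", "C"]

def coin_game_value():
    worst = {name: min(vals) for name, vals in BINS.items()}  # min nodes
    values = []
    for i, name in enumerate(BIN_ORDER):
        left = BIN_ORDER[(i - 1) % 3]                         # heads: wrap left
        values.append(0.5 * worst[name] + 0.5 * worst[left])  # chance node
    return max(values)                                        # max node

print(coin_game_value())  # max(-27.5, -24.5, -2.0) = -2.0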

7 Time Complexity
After a game is modeled as a tree, search techniques can be used. Complexity: space O(d), time O(b^(2d)), where b is the branching factor and d the depth (number of moves per player). However, even for a simple game like Tic-Tac-Toe the tree is very complicated, and the path down to a utility is long. Example: chess has b ≈ 35 and d ≈ 50, so the time and space complexity are far too large in practice.

Advanced Methods
How can minimax be sped up?
- Evaluation functions: do not access the TRUE utility but approximate it, using domain-specific knowledge.
- Alpha-beta pruning: general-purpose; ignores unnecessary paths while still computing the exact answer.

Evaluation functions: the tree from s_start down to the end states s_end (where the true utility, e.g. 1 for a win, lives) is very tall, so search only to a maximum depth d_max and evaluate the states there instead.

8 Advanced Method: Evaluation Function
Use a limited-depth tree search (stop at maximum depth d_max). Eval(s) estimates the value V_max,min(s) at the depth limit (and may be very inaccurate).
V_max,min(s, d) =
- Utility(s) if IsEnd(s)
- Eval(s) if d = 0
- max_{a ∈ Actions(s)} V_max,min(Succ(s, a), d-1) if Player(s) = agent
- min_{a ∈ Actions(s)} V_max,min(Succ(s, a), d-1) if Player(s) = opp

Example (chess): Eval(s) = material + mobility + king-safety + center-control.
- Material: (K - K') + 9(Q - Q') + 5(R - R') + 3(B - B') + 3(N - N') + 1(P - P'), where K: kings, Q: queens, R: rooks, B: bishops, N: knights, P: pawns, and X - X' is the difference between our piece count and the opponent's.
- Mobility: 0.1 × (our number of legal moves - the opponent's number of legal moves)
- King-safety: keeping the king safe is good
- Center-control: controlling the center of the board is good

Advanced Method: Alpha-beta Pruning
In some cases, visiting some branches is unnecessary in the minimax algorithm. For example, since the opponent always takes the minimal value, after finding a utility of 2 the value of the action on the right cannot be more than 2, so there is no need to investigate that branch further. In general, prune a node with value v if v cannot lie in the interval (α, β), i.e. if not α ≤ v ≤ β, where α_s is a lower bound on the value of max node s and β_s is an upper bound on the value of min node s.
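A sketch of the depth-limited recursion, plus an illustrative material counter in the spirit of the chess Eval above. The counts input format is an assumption for illustration, and the king's coefficient is not legible on the slide, so a weight of 1 is assumed here.

def depth_limited_minimax(s, d, eval_fn):
    """Minimax cut off at depth d; eval_fn approximates the value there."""
    if is_end(s):
        return utility(s)
    if d == 0:
        return eval_fn(s)                  # approximation, not the true utility
    values = [depth_limited_minimax(succ(s, a), d - 1, eval_fn)
              for a in actions(s)]
    return max(values) if player(s) == "agent" else min(values)

# Illustrative material term; counts maps piece letter -> (ours, theirs).
WEIGHTS = {"K": 1, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(counts):
    return sum(w * (counts[p][0] - counts[p][1]) for p, w in WEIGHTS.items())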

9 Advanced Method: Alpha-beta Pruning, Examples
Example 1. Keep, for each max node s, a lower bound α_s on its value, and for each min node s, an upper bound β_s. A branch can be pruned when its possible values no longer overlap the interval of every ancestor. In the slide's tree, the last node can be pruned: once a bound shows that a branch's value cannot be bigger than 8, there is no need to check that branch, while the remaining branches must still be checked because their value may be equal to 8.

Example 2. The same bookkeeping on a larger tree: the search walks the tree left to right, tightening α and β, and prunes at the marked node ("Prune here!").
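A compact Python sketch of the pruning rule described above, again reusing the bin-game helpers; the α ≥ β test is exactly the "intervals no longer overlap" condition.

def alphabeta(s, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning; returns the same value as minimax."""
    if is_end(s):
        return utility(s)
    if player(s) == "agent":                    # max node: raises alpha
        value = float("-inf")
        for a in actions(s):
            value = max(value, alphabeta(succ(s, a), alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # interval empty: prune
                break
        return value
    value = float("inf")                        # min node: lowers beta
    for a in actions(s):
        value = min(value, alphabeta(succ(s, a), alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                       # interval empty: prune
            break
    return value

print(alphabeta(start_state()))  # 1, matching plain minimax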

10 Simultaneous Game
So far the games were turn-based. In a simultaneous game, the players act at the same time.

Example: Two-finger Morra. Rules: players A and B each show 1 or 2 fingers. If both show 1, B gives A 2 dollars. If both show 2, B gives A 4 dollars. Otherwise, A gives B 3 dollars. The payoff is V(a, b), where a, b ∈ Actions.

Types of strategy:
- Pure strategy: always take the same action, i.e. π(b) = 1 and π(a) = 0 for every a ≠ b, a ∈ Actions. E.g. always show 1: π = [1, 0].
- General (mixed) strategy: take each action with some probability, 0 ≤ π(a) ≤ 1 for a ∈ Actions. E.g. uniformly random: π = [1/2, 1/2].

11 Simultaneous Game: Expected Value
The value of the game if A follows π_A and B follows π_B is
V(π_A, π_B) = Σ_{a,b} π_A(a) π_B(b) V(a, b)
Example: π_A = [1, 0], π_B = [1/2, 1/2]:
V(π_A, π_B) = (1 × 1/2 × 2) + (1 × 1/2 × (-3)) + (0 × 1/2 × (-3)) + (0 × 1/2 × 4) = -1/2

Summary, in order of increasing difficulty:
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given
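A sketch that reproduces this computation. The payoff matrix encodes the Morra rules above from A's point of view; index 0 means "show 1" and index 1 means "show 2".

V = [[ 2, -3],
     [-3,  4]]

def strategy_value(pi_a, pi_b):
    """V(pi_A, pi_B) = sum over a, b of pi_A(a) * pi_B(b) * V(a, b)."""
    return sum(pi_a[a] * pi_b[b] * V[a][b]
               for a in range(2) for b in range(2))

print(strategy_value([1, 0], [0.5, 0.5]))      # pure "always 1" vs uniform: -0.5
print(strategy_value([0.5, 0.5], [0.5, 0.5]))  # both uniform: 0.0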
