Summary. Agenda. Games: intelligent opponents in a game. Expected Value, Expected Max Algorithm, Minimax Algorithm, Alpha-beta Pruning, Simultaneous Game
- Amberly Fisher
- 5 years ago
Artificial Intelligence and its Applications
Lecture 4: Game Playing
Professor Daniel Yeung (danyeung@ieee.org)
Dr. Patrick Chan (patrickchan@ieee.org)
South China University of Technology, China

Summary (topics in order of increasing difficulty):
- Search: from start state to goal state
- Constraint Satisfaction Problems: consider constraints
- Game Playing: consider an adversary
- Markov Decision Processes: consider uncertainty
- Reinforcement Learning: no information is given

Agenda:
- Expected Value
- Expected Max Algorithm
- Minimax Algorithm
- Alpha-beta Pruning
- Simultaneous Game

Games: intelligent opponents in a game.
Search and Game
- Search problem: the environment does not react to your decision.
- Game: competition with an adversary. The opponent acts according to your decision.
- Decisions in a game must be made in limited time, so approximation is needed.
- Games are a traditional hallmark of intelligence and a model for many applications: military confrontations, negotiation, auctions, ...

Information of a Game
- State s, with $s_{start}$ the starting state
- IsEnd(s): whether s is an end state (game over)
- Actions(s): possible actions from state s
- Succ(s, a): resulting state if action a is chosen in state s
- Utility(s): the agent's utility for end state s
- Player(s) ∈ Players: the player who controls state s, where Players = {agent, opp}

Example: Chess
- Players: {white, black}
- State s: position of all pieces
- Actions(s): legal chess moves that Player(s) can make
- IsEnd(s): checkmate or draw
- Utility(s): +∞ if white wins, 0 if draw, −∞ if black wins

Question: the bin game
- You choose one of three bins: A = {−50, 50}, B = {1, 3} or C = {−5, 15}.
- Your opponent chooses a number from your chosen bin.
- Your goal is to maximize the chosen number.

Which bin should you choose? It depends on the attitude of the opponent:
- stochastic (chooses based on probability),
- against you, or
- helpful (unlikely).
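The game ingredients listed above (start state, IsEnd, Actions, Succ, Utility, Player) can be sketched as a small Python class for the bin game. This is a minimal illustration under assumed conventions: the tuple state encoding, the class name `BinGame` and its method names are hypothetical and not from the lecture. The bin contents are taken from the worked examples.

```python
from typing import List, Tuple

# Hypothetical state encoding for the bin game (not from the lecture):
#   ()           -> start state, agent to move
#   (bin,)       -> agent has chosen a bin, opponent to move
#   (bin, side)  -> opponent picked the Left or Right number: game over
State = Tuple[str, ...]
BINS = {"A": (-50, 50), "B": (1, 3), "C": (-5, 15)}


class BinGame:
    def start_state(self) -> State:
        return ()

    def is_end(self, s: State) -> bool:
        return len(s) == 2

    def utility(self, s: State) -> int:
        # Only defined for end states: the number the opponent picked.
        bin_name, side = s
        return BINS[bin_name][0 if side == "L" else 1]

    def actions(self, s: State) -> List[str]:
        # The agent picks a bin; the opponent picks Left or Right in it.
        return list(BINS) if len(s) == 0 else ["L", "R"]

    def succ(self, s: State, a: str) -> State:
        return s + (a,)

    def player(self, s: State) -> str:
        return "agent" if len(s) == 0 else "opp"


game = BinGame()
s = game.succ(game.start_state(), "C")  # agent picks bin C
s = game.succ(s, "R")                   # opponent picks the right number
print(game.is_end(s), game.utility(s))  # True 15
```

Keeping each ingredient as its own method mirrors the list above, so the value recursions that follow can be written against these functions.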
Expected Value
- As in a search problem, build a tree. Each node is a decision point for a player (you or the opponent), and each root-to-leaf path is a possible outcome of the game.
- A chance node, denoted by a circle, takes an action with some probability.

Stochastic policies: $\pi_p(s, a) \in [0, 1]$ is the probability of player p taking action a in state s.

For a two-player game, the value of the game is

$$V_{agent,opp}(s) = \begin{cases} \text{Utility}(s) & \text{IsEnd}(s) \\ \sum_{a \in \text{Actions}(s)} \pi_{agent}(s,a)\, V_{agent,opp}(\text{Succ}(s,a)) & \text{Player}(s) = agent \\ \sum_{a \in \text{Actions}(s)} \pi_{opp}(s,a)\, V_{agent,opp}(\text{Succ}(s,a)) & \text{Player}(s) = opp \end{cases}$$

Expected Value Example 1
Assume $\pi_{agent}(s_{start}, A) = \pi_{agent}(s_{start}, B) = \pi_{agent}(s_{start}, C) = 1/3$ and $\pi_{opp}(s, L) = \pi_{opp}(s, R) = 0.5$ for any s. What is the value of the game? Let $S_{11}$, $S_{12}$, $S_{13}$ be the states reached by choosing A, B and C:

$$V_{agent,opp}(s_{start}) = \tfrac{1}{3} V_{agent,opp}(S_{11}) + \tfrac{1}{3} V_{agent,opp}(S_{12}) + \tfrac{1}{3} V_{agent,opp}(S_{13}) = \tfrac{1}{3}(0.5 \times (-50) + 0.5 \times 50) + \tfrac{1}{3}(0.5 \times 1 + 0.5 \times 3) + \tfrac{1}{3}(0.5 \times (-5) + 0.5 \times 15) = \tfrac{1}{3}(0 + 2 + 5) = \tfrac{7}{3}$$

Expected Value Example 2
Assume $\pi_{agent}(s_{start}, A) = 1$ and $\pi_{opp}(s, L) = \pi_{opp}(s, R) = 0.5$ for any s. What is the value of the game?

$$V_{agent,opp}(s_{start}) = 1 \cdot V_{agent,opp}(S_{11}) + 0 \cdot V_{agent,opp}(S_{12}) + 0 \cdot V_{agent,opp}(S_{13}) = 1 \times (0.5 \times (-50) + 0.5 \times 50) + 0 \times (0.5 \times 1 + 0.5 \times 3) + 0 \times (0.5 \times (-5) + 0.5 \times 15) = 0$$
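The recursive value $V_{agent,opp}$ can be checked against Examples 1 and 2 with a short sketch. It assumes a hypothetical state encoding: `()` at the start, `(bin,)` after the agent moves, and `(bin, side)` at the end. The names `value`, `pi_agent` and `pi_opp` are illustrative. Exact `Fraction` arithmetic is used so that Example 1 comes out as 7/3 rather than a rounded float.

```python
from fractions import Fraction

# Bin contents from the worked examples: (Left number, Right number).
BINS = {"A": (-50, 50), "B": (1, 3), "C": (-5, 15)}


def value(state, pi_agent, pi_opp):
    """V_{agent,opp}(state) for the two-move bin game.

    pi_agent[a]: probability that the agent picks bin a.
    pi_opp[bin][side]: probability that the opponent picks side "L" or "R".
    """
    if len(state) == 2:  # IsEnd(s): return Utility(s)
        bin_name, side = state
        return BINS[bin_name][0 if side == "L" else 1]
    if len(state) == 0:  # Player(s) = agent: weight children by pi_agent
        return sum(pi_agent[a] * value((a,), pi_agent, pi_opp) for a in BINS)
    bin_name = state[0]  # Player(s) = opp: weight children by pi_opp
    return sum(pi_opp[bin_name][side] * value((bin_name, side), pi_agent, pi_opp)
               for side in ("L", "R"))


half = Fraction(1, 2)
pi_opp = {b: {"L": half, "R": half} for b in BINS}
uniform = {b: Fraction(1, 3) for b in BINS}
print(value((), uniform, pi_opp))                    # Example 1: 7/3
print(value((), {"A": 1, "B": 0, "C": 0}, pi_opp))  # Example 2: 0
```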
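Expected Value Example 3
Assume
- $\pi_{opp}(S_{11}, L) = 0.4$, $\pi_{opp}(S_{11}, R) = 0.6$
- $\pi_{opp}(S_{12}, L) = 0.5$, $\pi_{opp}(S_{12}, R) = 0.5$
- $\pi_{opp}(S_{13}, L) = 0.2$, $\pi_{opp}(S_{13}, R) = 0.8$

Which action should we choose?

$$V_{agent,opp}(A) = 0.4 \times (-50) + 0.6 \times 50 = 10, \quad V_{agent,opp}(B) = 0.5 \times 1 + 0.5 \times 3 = 2, \quad V_{agent,opp}(C) = 0.2 \times (-5) + 0.8 \times 15 = 11$$

We should take the action with the maximum value: Action = C.

Expected Max Algorithm
The Expected Max Algorithm selects the action that maximizes the value over actions. A max node is denoted by an upward-pointing triangle.

$$V_{max,opp}(s) = \begin{cases} \text{Utility}(s) & \text{IsEnd}(s) \\ \max_{a \in \text{Actions}(s)} V_{max,opp}(\text{Succ}(s,a)) & \text{Player}(s) = agent \\ \sum_{a \in \text{Actions}(s)} \pi_{opp}(s,a)\, V_{max,opp}(\text{Succ}(s,a)) & \text{Player}(s) = opp \end{cases}$$

Expected Max Algorithm Example
Assume $\pi_{opp}(s, L) = \pi_{opp}(s, R) = 0.5$ for any s. Which action will an agent choose in the Expected Max Algorithm? The values of A, B and C are 0, 2 and 5 (as in Example 1), so Action = $\arg\max_{a \in \text{Actions}(s_{start})} V_{max,opp}(\text{Succ}(s_{start},a))$ = C and $V_{max,opp}(s_{start}) = 5$.

Minimax
Unfortunately, we never know what our opponent will do. Assuming that the opponent takes actions randomly may be too optimistic; the worst case should be considered.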
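For the Expected Max Algorithm on the bin game, the opponent node is a weighted sum and the agent node takes the maximum. The sketch below reproduces Example 3 and the Expected Max example. The helper name `expected_max` and the `(P(L), P(R))` tuple encoding of the opponent policy are assumptions made for illustration.

```python
from fractions import Fraction

BINS = {"A": (-50, 50), "B": (1, 3), "C": (-5, 15)}


def expected_max(pi_opp):
    """Return (best action, V_max,opp(s_start), per-action values).

    pi_opp[bin] = (P(L), P(R)) is the opponent's stochastic policy.
    """
    values = {}
    for a, (left, right) in BINS.items():
        p_left, p_right = pi_opp[a]
        values[a] = p_left * left + p_right * right  # opp node: weighted sum
    best = max(values, key=values.get)               # agent node: max
    return best, values[best], values


example3 = {"A": (Fraction(2, 5), Fraction(3, 5)),
            "B": (Fraction(1, 2), Fraction(1, 2)),
            "C": (Fraction(1, 5), Fraction(4, 5))}
best, v, values = expected_max(example3)
print(best, v)  # C 11
```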
Minimax
Minimax assumes that the opponent selects the worst action for the agent. A min node is denoted by an upside-down triangle.

$$V_{max,min}(s) = \begin{cases} \text{Utility}(s) & \text{IsEnd}(s) \\ \max_{a \in \text{Actions}(s)} V_{max,min}(\text{Succ}(s,a)) & \text{Player}(s) = agent \\ \min_{a \in \text{Actions}(s)} V_{max,min}(\text{Succ}(s,a)) & \text{Player}(s) = opp \end{cases}$$

Minimax Example 1
Which action will an agent choose in minimax? The opponent would pick −50 from A, 1 from B and −5 from C, so Action = B and $V_{max,min}(s_{start}) = 1$.

Node Summary
- Chance node: weighted sum
- Max node: max
- Min node: min

Example (figure): Which action will an agent choose in minimax?
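Minimax on a tree whose levels alternate between max and min nodes takes only a few lines. In this sketch the tree is written as nested lists with leaf utilities. That representation and the function name `minimax` are assumptions. Applied to the bin game, it reproduces Minimax Example 1.

```python
from typing import List, Union

Tree = Union[int, float, List["Tree"]]  # a leaf utility or a list of subtrees


def minimax(node: Tree, maximizing: bool = True) -> float:
    """Minimax value of a tree whose levels alternate max and min nodes."""
    if not isinstance(node, list):  # IsEnd(s): the leaf holds Utility(s)
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
# Each bin is a min node (opponent's choice); the agent then maximizes.
action_values = {a: minimax(tree, maximizing=False) for a, tree in bins.items()}
best_action = max(action_values, key=action_values.get)
print(best_action, action_values[best_action])  # B 1
```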
New Game
Rules:
- You choose one of the three bins.
- Then flip a coin: if heads, move one bin to the left (with wrap around); if tails, stick with your choice.
- Your opponent chooses a number from that bin.
- Your goal is to maximize the chosen number.

Now we have three parties: Players = {agent, opp, coin}.

$$V_{max,coin,min}(s) = \begin{cases} \text{Utility}(s) & \text{IsEnd}(s) \\ \max_{a \in \text{Actions}(s)} V_{max,coin,min}(\text{Succ}(s,a)) & \text{Player}(s) = agent \\ \min_{a \in \text{Actions}(s)} V_{max,coin,min}(\text{Succ}(s,a)) & \text{Player}(s) = opp \\ \sum_{a \in \text{Actions}(s)} \pi_{coin}(s,a)\, V_{max,coin,min}(\text{Succ}(s,a)) & \text{Player}(s) = coin \end{cases}$$

With a fair coin, where E(x, y) = 0.5x + 0.5y is the expectation over the coin flip:

$$V_{max,coin,min}(s_{start}) = \max\big(E(\min(-50,50), \min(-5,15)),\ E(\min(1,3), \min(-50,50)),\ E(\min(-5,15), \min(1,3))\big) = \max\big(E(-50,-5),\ E(1,-50),\ E(-5,1)\big) = \max(-27.5, -24.5, -2) = -2$$

so the agent chooses C.
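The three-party calculation above (max node, then coin node, then min node) can be sketched directly. The bin order A, B, C is read left to right, so moving left from A wraps around to C. The helper names `left_of` and `choice_value` are hypothetical.

```python
from fractions import Fraction

ORDER = ["A", "B", "C"]  # left-to-right order of the bins
BINS = {"A": (-50, 50), "B": (1, 3), "C": (-5, 15)}


def left_of(b):
    """Bin one step to the left, wrapping around (A -> C)."""
    return ORDER[(ORDER.index(b) - 1) % len(ORDER)]


def choice_value(b):
    """Coin node after the agent chooses b; the opponent (min node) then picks."""
    half = Fraction(1, 2)
    heads = min(BINS[left_of(b)])  # heads: move one bin to the left
    tails = min(BINS[b])           # tails: stick with the choice
    return half * heads + half * tails


values = {b: choice_value(b) for b in ORDER}
best = max(values, key=values.get)  # agent (max) node
print({b: float(v) for b, v in values.items()}, best)
# {'A': -27.5, 'B': -24.5, 'C': -2.0} C
```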
Time Complexity
After a game is modeled as a tree, search techniques can be used.
- Space: $O(d)$
- Time: $O(b^{2d})$

Here b is the branching factor and d is the depth. The exponent is 2d because each level of depth consists of one agent move and one opponent move. However, even for a simple game like Tic-Tac-Toe, the tree is very large.

Example: Chess, $b \approx 35$, $d \approx 50$. A long path is needed to reach a utility.
Time complexity $= 35^{2 \times 50} = 35^{100} \approx 10^{154}$.
In practice, the time and space complexity are far too large.

Advanced Methods
How can we speed up minimax?
- Evaluation functions: do not access the TRUE utility but approximate it using domain-specific knowledge.
- Alpha-beta pruning: general-purpose; ignores unnecessary paths and still computes the exact answer.

Evaluation Functions
(Figure: a very tall tree from $s_{start}$ down to $s_{end}$ with Utility = 1 (win). The search is cut off at depth $d_{max}$, and the unknown value of the state s there is replaced by an evaluation.)
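Cutting the search off at a maximum depth and returning an estimate Eval(s) there can be sketched as depth-limited minimax. To keep the example self-contained, it uses a hypothetical toy game that is not from the lecture: players alternately take 1 or 2 sticks, and whoever takes the last stick wins. The evaluation function `neutral_eval` is a placeholder that always returns 0. Here d counts the remaining plies, and d is decremented at every move.

```python
from typing import Callable, List, Tuple

# Hypothetical toy game: State = (sticks_left, player_to_move),
# with player_to_move in {"agent", "opp"}.
State = Tuple[int, str]


def is_end(s: State) -> bool:
    return s[0] == 0


def utility(s: State) -> float:
    # At an end state the player "to move" did not take the last stick.
    return -1.0 if s[1] == "agent" else 1.0


def actions(s: State) -> List[int]:
    return [a for a in (1, 2) if a <= s[0]]


def succ(s: State, a: int) -> State:
    return (s[0] - a, "opp" if s[1] == "agent" else "agent")


def v_max_min(s: State, d: int, evaluate: Callable[[State], float]) -> float:
    """Depth-limited minimax: d counts the remaining plies before cutoff."""
    if is_end(s):
        return utility(s)
    if d == 0:
        return evaluate(s)  # cutoff: estimate instead of searching deeper
    values = [v_max_min(succ(s, a), d - 1, evaluate) for a in actions(s)]
    return max(values) if s[1] == "agent" else min(values)


def neutral_eval(s: State) -> float:
    """Placeholder Eval(s): no opinion about who is winning."""
    return 0.0


print(v_max_min((4, "agent"), 10, neutral_eval))  # reaches end states: 1.0
print(v_max_min((4, "agent"), 1, neutral_eval))   # cut off after one ply: 0.0
```

With a deep enough limit the exact value is recovered. With a shallow limit the result is only as good as the evaluation function.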
Advanced Method: Evaluation Function
Limited-depth tree search stops at a maximum depth $d_{max}$. Eval(s) estimates the value of $V_{max,min}(s)$ at $d_{max}$ (the estimate may be very inaccurate).

$$V_{max,min}(s, d) = \begin{cases} \text{Utility}(s) & \text{IsEnd}(s) \\ \text{Eval}(s) & d = 0 \\ \max_{a \in \text{Actions}(s)} V_{max,min}(\text{Succ}(s,a), d-1) & \text{Player}(s) = agent \\ \min_{a \in \text{Actions}(s)} V_{max,min}(\text{Succ}(s,a), d-1) & \text{Player}(s) = opp \end{cases}$$

Advanced Method: Evaluation Function Example (Chess)
Eval(s) = material + mobility + king-safety + center-control
- Material: $(K - K') + 9(Q - Q') + 5(R - R') + 3(B - B') + 3(N - N') + 1(P - P')$, where K: King, Q: Queen, R: Rook, B: Bishop, N: Knight, P: Pawn. Unprimed symbols count the agent's pieces and primed symbols count the opponent's.
- Mobility: 0.1 × (legal_move# − legal_move#')
- King-safety: keeping the king safe is good.
- Center-control: controlling the center of the board is good.

Advanced Method: Alpha-beta Pruning
In some cases, the minimax algorithm does not need to visit some branches. For example, the opponent always takes the minimal value. So once a utility of 2 has been found below the action on the right, that action's value cannot be more than 2, and there is no need to investigate it further.

Prune a node if its value is not in the interval bounded by α and β, i.e. if $\neg(\alpha \le v \le \beta)$, where v is the value of the node, and
- $a_s$: lower bound on the value of max node s,
- $b_s$: upper bound on the value of min node s.

α is the largest $a_s$ and β the smallest $b_s$ among the node's ancestors.
Advanced Method: Alpha-beta Pruning Example 1
The last node can be pruned: its interval has no overlap with every ancestor's interval [α, β] ($a_s$: lower bound on the value of max node s; $b_s$: upper bound on the value of min node s).
- No need to check this branch, as its value cannot be bigger than 8.
- Still need to check the rest, as the value may be equal to 8.

Advanced Method: Alpha-beta Pruning Example 2 (figure): Prune here!
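Alpha-beta pruning over a nested-list game tree can be sketched as follows. The function also records which leaves it actually evaluates. The tree in the usage line is a common textbook example, not one from these slides. After the leaf 2 is found in the middle subtree, its leaves 4 and 6 are skipped, which mirrors the "after finding utility 2" reasoning above. This sketch prunes as soon as α ≥ β (the usual textbook convention). Pruning only when the interval is strictly empty prunes less but gives the same root value.

```python
import math
from typing import List, Optional, Union

Tree = Union[float, List["Tree"]]  # a leaf utility or a list of subtrees


def alpha_beta(node: Tree, maximizing: bool, alpha: float = -math.inf,
               beta: float = math.inf,
               visited: Optional[List[float]] = None) -> float:
    """Minimax value of `node` computed with alpha-beta pruning.

    alpha: lower bound guaranteed to a max ancestor.
    beta: upper bound guaranteed to a min ancestor.
    visited: collects the leaves that are actually evaluated.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):  # leaf: Utility(s)
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # remaining children cannot change the result
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:  # remaining children cannot change the result
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen: List[float] = []
print(alpha_beta(tree, True, visited=seen), seen)  # 3 [3, 12, 8, 2, 14, 5, 2]
```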
Simultaneous Game
Unlike in turn-based games, in a simultaneous game the players move at the same time.

Simultaneous Game Example: Two-finger Morra
Rules: Players A and B each show 1 or 2 fingers.
- If both show 1, B gives A 2 dollars.
- If both show 2, B gives A 4 dollars.
- Otherwise, A gives B 3 dollars.

V(a, b), where a, b ∈ Actions, is A's payoff: V(1, 1) = 2, V(2, 2) = 4, V(1, 2) = V(2, 1) = −3.

Simultaneous Game: Types of Strategy
- Pure strategy: always do the same action, i.e. π(b) = 1 and π(a) = 0 for all a ≠ b, a ∈ Actions. E.g. always 1: π = [1, 0].
- General (mixed) strategy: take each action with some probability, 0 ≤ π(a) ≤ 1 for a ∈ Actions. E.g. uniformly random: π = [1/2, 1/2].
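The Morra payoffs and the two kinds of strategy can be written down concretely. In this sketch a strategy is a list of probabilities indexed in the order of `ACTIONS`. That list encoding and the helper names `is_strategy` and `is_pure` are assumptions for illustration.

```python
from fractions import Fraction
from typing import Sequence

ACTIONS = (1, 2)  # number of fingers shown
# V[(a, b)]: payoff to A when A shows a fingers and B shows b fingers.
V = {(1, 1): 2, (2, 2): 4, (1, 2): -3, (2, 1): -3}


def is_strategy(pi: Sequence) -> bool:
    """General (mixed) strategy: one probability per action, summing to 1."""
    return (len(pi) == len(ACTIONS)
            and all(0 <= p <= 1 for p in pi)
            and sum(pi) == 1)


def is_pure(pi: Sequence) -> bool:
    """Pure strategy: one action has probability 1 (so all others are 0)."""
    return is_strategy(pi) and sum(1 for p in pi if p == 1) == 1


always_one = [1, 0]
uniform = [Fraction(1, 2), Fraction(1, 2)]
print(is_pure(always_one), is_pure(uniform), is_strategy(uniform))  # True False True
```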
Simultaneous Game Expected Value
The value of the game if A follows $\pi_A$ and B follows $\pi_B$ is

$$V(\pi_A, \pi_B) = \sum_{a,b} \pi_A(a)\, \pi_B(b)\, V(a, b)$$

Example: $\pi_A = [1, 0]$, $\pi_B = [1/2, 1/2]$:

$$V(\pi_A, \pi_B) = \left(1 \times \tfrac{1}{2} \times 2\right) + \left(1 \times \tfrac{1}{2} \times (-3)\right) + \left(0 \times \tfrac{1}{2} \times (-3)\right) + \left(0 \times \tfrac{1}{2} \times 4\right) = -\tfrac{1}{2}$$
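The double sum $V(\pi_A, \pi_B)$ can be evaluated directly. The sketch below reproduces the example value of −1/2. It assumes strategies are lists of probabilities in the order of `ACTIONS`; `game_value` is a hypothetical helper name.

```python
from fractions import Fraction

ACTIONS = (1, 2)
V = {(1, 1): 2, (2, 2): 4, (1, 2): -3, (2, 1): -3}  # payoff to player A


def game_value(pi_a, pi_b):
    """V(pi_A, pi_B) = sum over a, b of pi_A(a) * pi_B(b) * V(a, b)."""
    return sum(pi_a[i] * pi_b[j] * V[(a, b)]
               for i, a in enumerate(ACTIONS)
               for j, b in enumerate(ACTIONS))


print(game_value([1, 0], [Fraction(1, 2), Fraction(1, 2)]))  # -1/2
```

Because each term is weighted by both players' probabilities independently, the same function covers pure strategies (a single nonzero entry) and mixed ones.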
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability
More informationDecision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag
Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:
More informationProbability. VCE Maths Methods - Unit 2 - Probability
Probability Probability Tree diagrams La ice diagrams Venn diagrams Karnough maps Probability tables Union & intersection rules Conditional probability Markov chains 1 Probability Probability is the mathematics
More informationFinal exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007.
Spring 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials
More informationGame Theory and Algorithms Lecture 2: Nash Equilibria and Examples
Game Theory and Algorithms Lecture 2: Nash Equilibria and Examples February 24, 2011 Summary: We introduce the Nash Equilibrium: an outcome (action profile) which is stable in the sense that no player
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationSF2972 Game Theory Written Exam with Solutions June 10, 2011
SF97 Game Theory Written Exam with Solutions June 10, 011 Part A Classical Game Theory Jörgen Weibull and Mark Voorneveld 1. Finite normal-form games. (a) What are N, S and u in the definition of a finite
More informationCS 188 Fall Introduction to Artificial Intelligence Midterm 2
CS 188 Fall 2013 Introduction to rtificial Intelligence Midterm 2 ˆ You have approximately 2 hours and 50 minutes. ˆ The exam is closed book, closed notes except your one-page crib sheet. ˆ Please use
More informationIntroduction to Spring 2006 Artificial Intelligence Practice Final
NAME: SID#: Login: Sec: 1 CS 188 Introduction to Spring 2006 Artificial Intelligence Practice Final You have 180 minutes. The exam is open-book, open-notes, no electronics other than basic calculators.
More information