Chapter 17 Planning Based on Model Checking


1 Lecture slides for Automated Planning: Theory and Practice. Chapter 17: Planning Based on Model Checking. Dana S. Nau, University of Maryland.

2 Motivation
Actions with multiple possible outcomes:
- Action failures, e.g., gripper drops its load
- Exogenous events, e.g., road closed
Nondeterministic systems are like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes. Useful if accurate probabilities aren't available, or if probability calculations would introduce inaccuracies.
(Figure: grasp(c) applied to a stack of blocks a, b, c, with the intended outcome, c held by the gripper, and an unintended outcome, c dropped.)

3 Nondeterministic Systems
Nondeterministic system: a triple Σ = (S, A, γ)
- S = finite set of states
- A = finite set of actions
- γ: S × A → 2^S
Like in the previous chapter, the book doesn't commit to any particular representation; it only deals with the underlying semantics, and draws the state-transition graph explicitly.
Like in the previous chapter, a policy is a function from states into actions, π: S → A. Notation: Sπ = {s : (s, a) ∈ π}.
In some algorithms, we'll temporarily have nondeterministic policies (ambiguous: multiple actions for some states): π: S → 2^A, or equivalently, π ⊆ S × A. We'll always make these policies deterministic before the algorithm terminates.
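
Since the book stays at the level of semantics, any concrete encoding is ours to choose. A minimal sketch, assuming nothing beyond the definitions above: γ as a Python dict from (state, action) pairs to sets of successor states, and a policy as a set of (state, action) pairs. All names below are our own.

```python
# Sigma = (S, A, gamma): gamma maps each (state, action) pair to the SET of
# possible successor states. S and A are implicit in gamma's keys.
Gamma = dict[tuple[str, str], set[str]]   # type alias (Python 3.9+)
Policy = set[tuple[str, str]]             # pi as a set of (state, action) pairs

def states_of(pi: Policy) -> set[str]:
    """S_pi = {s : (s, a) in pi}, the states where pi is defined."""
    return {s for (s, _a) in pi}

def is_deterministic(pi: Policy) -> bool:
    """True iff pi assigns at most one action per state."""
    return len(states_of(pi)) == len(pi)
```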

4 Example
Robot r1 starts at location l1. Objective: get r1 to location l4.
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π3 = {(s1, move(r1,l1,l4))}
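
For concreteness, here is a hypothetical encoding of this example in the representation sketched above. The transition relation is reconstructed from the preimage computations on the later slides (move(r1,l1,l4) may fail and leave r1 at l1; move(r1,l2,l3) may slip and end at l5), so treat it as an assumption about the figure, not as given.

```python
# Assumed reconstruction of the example domain; state si means "r1 is at li".
GAMMA: Gamma = {
    ("s1", "move(r1,l1,l2)"): {"s2"},
    ("s1", "move(r1,l1,l4)"): {"s1", "s4"},   # may fail: r1 stays at l1
    ("s2", "move(r1,l2,l3)"): {"s3", "s5"},   # may slip: r1 ends up at l5
    ("s3", "move(r1,l3,l4)"): {"s4"},
    ("s4", "move(r1,l4,l3)"): {"s3"},
    ("s4", "move(r1,l4,l5)"): {"s5"},
    ("s5", "move(r1,l5,l4)"): {"s4"},
}
S0, SG = {"s1"}, {"s4"}                       # initial states, goal states

PI1: Policy = {("s1", "move(r1,l1,l2)"), ("s2", "move(r1,l2,l3)"),
               ("s3", "move(r1,l3,l4)")}
PI2: Policy = PI1 | {("s5", "move(r1,l5,l4)")}
PI3: Policy = {("s1", "move(r1,l1,l4)")}
```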


7 Execution Structures
Execution structure for a policy π: the graph of all of π's execution paths.
Notation: Σπ = (Q, T), where Q ⊆ S and T ⊆ S × S.
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π3 = {(s1, move(r1,l1,l4))}
(Figures: the execution structures Σπ1, Σπ2, and Σπ3.)
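
Under the encoding above, Σπ can be computed by forward reachability from S0 along π. A small sketch, assuming GAMMA and the helpers from the earlier blocks:

```python
def execution_structure(gamma: Gamma, pi: Policy, s0: set[str]):
    """Sigma_pi = (Q, T): states and transitions reachable from S0 under pi
    (pi is assumed deterministic, as the algorithms guarantee on exit)."""
    act = dict(pi)
    q, t = set(s0), set()
    frontier = list(s0)
    while frontier:
        s = frontier.pop()
        if s in act:                      # pi may be undefined at goals/dead ends
            for s2 in gamma[(s, act[s])]:
                t.add((s, s2))
                if s2 not in q:
                    q.add(s2)
                    frontier.append(s2)
    return q, t

# e.g. execution_structure(GAMMA, PI1, S0) reaches s5, where PI1 is undefined.
```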


13 Types of Solutions
Weak solution: at least one execution path reaches a goal.
Strong solution: every execution path reaches a goal.
Strong-cyclic solution: every fair execution path reaches a goal; don't stay in a cycle forever if there's a state-transition out of it.
(Figures: three small execution structures over states s0..s3 and actions a0..a3, one per solution type.)
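
These distinctions can be tested directly on the execution structure. The checks below are our own formulation of the slide's definitions: weak means some goal is reachable; strong-cyclic means every reachable dead end is a goal and a goal stays reachable from every reachable state; strong additionally forbids cycles, tested here by searching for a run that loops or dead-ends before any goal.

```python
def is_weak(gamma: Gamma, pi: Policy, s0: set[str], sg: set[str]) -> bool:
    q, _t = execution_structure(gamma, pi, s0)
    return bool(q & sg)                  # at least one path reaches a goal

def is_strong_cyclic(gamma: Gamma, pi: Policy, s0: set[str], sg: set[str]) -> bool:
    q, t = execution_structure(gamma, pi, s0)
    act = dict(pi)
    if any(s not in sg and s not in act for s in q):
        return False                     # a non-goal dead end
    reach = set(q & sg)                  # states from which a goal is reachable
    while True:
        grown = reach | {s for (s, s2) in t if s2 in reach}
        if grown == reach:
            return q <= reach            # fair executions must reach a goal
        reach = grown

def is_strong(gamma: Gamma, pi: Policy, s0: set[str], sg: set[str]) -> bool:
    act = dict(pi)
    def ok(s, on_path):                  # every path from s must reach a goal
        if s in sg:
            return True
        if s in on_path or s not in act:
            return False                 # a cycle, or a non-goal dead end
        return all(ok(s2, on_path | {s}) for s2 in gamma[(s, act[s])])
    return all(ok(s, frozenset()) for s in s0)

# On the example: PI1 is only weak, PI2 is strong, PI3 is strong-cyclic.
```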

14 Finding Strong Solutions
Backward breadth-first search.
StrongPreImg(S) = {(s,a) : γ(s,a) ≠ ∅, γ(s,a) ⊆ S}
- all state-action pairs for which all of the successors are in S
PruneStates(π,S) = {(s,a) ∈ π : s ∉ S}
- S is the set of states we've already solved; keep only the state-action pairs for other states
MkDet(π')
- π' is a policy that may be nondeterministic; remove some state-action pairs if necessary, to get a deterministic policy
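
Transcribed into the dict encoding, the three primitives and the backward loop look roughly like this; a sketch of the procedure the slide describes, not the book's exact pseudocode:

```python
def strong_preimg(gamma: Gamma, s: set[str]) -> Policy:
    """{(s,a) : gamma(s,a) nonempty and gamma(s,a) a subset of S}."""
    return {(st, a) for (st, a), succ in gamma.items() if succ and succ <= s}

def prune_states(pi: Policy, s: set[str]) -> Policy:
    """Keep only the pairs for states not already solved."""
    return {(st, a) for (st, a) in pi if st not in s}

def mk_det(pi: Policy) -> Policy:
    """Keep one (arbitrary) action per state, making pi deterministic."""
    det: dict[str, str] = {}
    for st, a in sorted(pi):
        det.setdefault(st, a)
    return set(det.items())

def strong_plan(gamma: Gamma, s0: set[str], sg: set[str]):
    """Backward breadth-first search; returns None in place of failure."""
    pi, pi2 = None, set()                # pi2 plays the role of pi'
    while pi != pi2 and not s0 <= sg | states_of(pi2):
        pi = pi2
        solved = sg | states_of(pi2)
        pi2 = pi2 | prune_states(strong_preimg(gamma, solved), solved)
    return mk_det(pi2) if s0 <= sg | states_of(pi2) else None
```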

15 Example
π = failure; π' = ∅
Sπ' = ∅; Sg ∪ Sπ' = {s4}

16 Example
π = failure; π' = ∅; Sπ' = ∅; Sg ∪ Sπ' = {s4}
π'' ← PreImage = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

17 Example
π = failure; π' = ∅; Sπ' = ∅; Sg ∪ Sπ' = {s4}
π'' ← PreImage = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π ← π' = ∅
π' ← π' ∪ π'' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

18 Example
π = ∅
π' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s3, s5}; Sg ∪ Sπ' = {s3, s4, s5}

19 Example
π = ∅; π' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s3, s5}; Sg ∪ Sπ' = {s3, s4, s5}
PreImage ← {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4)), (s4, move(r1,l4,l3)), (s4, move(r1,l4,l5))}
π'' ← {(s2, move(r1,l2,l3))}
π ← π' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' ← {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

20 Example
π = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s2, s3, s5}; Sg ∪ Sπ' = {s2, s3, s4, s5}

21 Example
π = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s2, s3, s5}; Sg ∪ Sπ' = {s2, s3, s4, s5}
π'' ← {(s1, move(r1,l1,l2))}
π ← π' = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' ← {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

22 Example
π = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s1, s2, s3, s5}; Sg ∪ Sπ' = {s1, s2, s3, s4, s5}

23 Example
π = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s1, s2, s3, s5}; Sg ∪ Sπ' = {s1, s2, s3, s4, s5}
S0 ⊆ Sg ∪ Sπ', so return MkDet(π') = π'
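
On the assumed GAMMA from earlier, the strong_plan sketch reproduces exactly this trace:

```python
# >>> sorted(strong_plan(GAMMA, S0, SG))
# [('s1', 'move(r1,l1,l2)'), ('s2', 'move(r1,l2,l3)'),
#  ('s3', 'move(r1,l3,l4)'), ('s5', 'move(r1,l5,l4)')]
```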

24 Finding Weak Solutions
Weak-Plan is just like Strong-Plan except for this:
WeakPreImg(S) = {(s,a) : γ(s,a) ∩ S ≠ ∅}
- at least one successor is in S
(Figure: the Strong-Plan pseudocode with the two occurrences of "Strong" replaced by "Weak".)
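
In the sketch encoding, that one-line change gives Weak-Plan:

```python
def weak_preimg(gamma: Gamma, s: set[str]) -> Policy:
    """{(s,a) : gamma(s,a) intersects S}: at least one successor is in S."""
    return {(st, a) for (st, a), succ in gamma.items() if succ & s}

def weak_plan(gamma: Gamma, s0: set[str], sg: set[str]):
    """Identical to strong_plan except for the preimage used."""
    pi, pi2 = None, set()
    while pi != pi2 and not s0 <= sg | states_of(pi2):
        pi = pi2
        solved = sg | states_of(pi2)
        pi2 = pi2 | prune_states(weak_preimg(gamma, solved), solved)
    return mk_det(pi2) if s0 <= sg | states_of(pi2) else None
```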

25 Example
π = failure; π' = ∅
Sπ' = ∅; Sg ∪ Sπ' = {s4}

26 Example
π = failure; π' = ∅; Sπ' = ∅; Sg ∪ Sπ' = {s4}
π'' = PreImage = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π ← π' = ∅
π' ← π' ∪ π'' = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}

27 Example
π = ∅
π' = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
Sπ' = {s1, s3, s5}; Sg ∪ Sπ' = {s1, s3, s4, s5}
S0 ⊆ Sg ∪ Sπ', so return MkDet(π') = π'

28 Finding Strong-Cyclic Solutions
Begin with a universal policy π' that contains all state-action pairs. Repeatedly eliminate state-action pairs that take us to bad states:
- PruneOutgoing removes state-action pairs that go to states not in Sg ∪ Sπ: PruneOutgoing(π,S) = π − {(s,a) ∈ π : γ(s,a) ⊄ S ∪ Sπ}
- PruneUnconnected removes states from which it is impossible to get to Sg: with π' = ∅, compute the fixpoint of π' ← π ∩ WeakPreImg(Sg ∪ Sπ')
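
The two pruning steps, transcribed into the same encoding (PruneOutgoing as a single pass, since the outer loop repeats it; weak_preimg is from the Weak-Plan sketch):

```python
def prune_outgoing(gamma: Gamma, pi: Policy, s: set[str]) -> Policy:
    """Drop pairs that may lead outside S union S_pi."""
    ok = s | states_of(pi)
    return {(st, a) for (st, a) in pi if gamma[(st, a)] <= ok}

def prune_unconnected(gamma: Gamma, pi: Policy, s: set[str]) -> Policy:
    """Fixpoint of pi' <- pi intersect WeakPreImg(S union S_pi'):
    keep only pairs from which S remains reachable under pi."""
    pi2: Policy = set()
    while True:
        nxt = pi & weak_preimg(gamma, s | states_of(pi2))
        if nxt == pi2:
            return pi2
        pi2 = nxt
```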

29 Finding Strong-Cyclic Solutions
Once the policy stops changing: if it's not a solution, return failure.
RemoveNonProgress removes state-action pairs that don't go toward the goal; implement as backward search from the goal.
MkDet makes sure there's only one action for each state.
(Figure: the example domain, extended with a sixth location l6; the goal state s6 satisfies at(r1,l6).)
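
The outer loop, with RemoveNonProgress read as a layered backward search from the goal (our interpretation of "implement as backward search"):

```python
def remove_non_progress(gamma: Gamma, pi: Policy, s: set[str]) -> Policy:
    """Keep, layer by layer, only pairs whose state first becomes able to
    reach the goal at that layer, so every retained action makes progress."""
    pi2: Policy = set()
    while True:
        solved = s | states_of(pi2)
        nxt = pi2 | prune_states(pi & weak_preimg(gamma, solved), solved)
        if nxt == pi2:
            return pi2
        pi2 = nxt

def strong_cyclic_plan(gamma: Gamma, s0: set[str], sg: set[str]):
    pi: Policy = set(gamma)              # the universal policy
    while True:
        nxt = prune_unconnected(gamma, prune_outgoing(gamma, pi, sg), sg)
        if nxt == pi:
            break
        pi = nxt
    if s0 <= sg | states_of(pi):
        return mk_det(remove_non_progress(gamma, pi, sg))
    return None                          # failure
```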

30 Example 1
π' ← {(s,a) : a is applicable to s} (the universal policy)

31 Example 1
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π'
RemoveNonProgress(π') = ?

32 Example 1
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π'
RemoveNonProgress(π') = as shown (in the figure)

33 Example 1
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π'
RemoveNonProgress(π') = as shown
MkDet(the result) = either {(s1, move(r1,l1,l4)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l6)), (s5, move(r1,l5,l4))}
or {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l6)), (s5, move(r1,l5,l4))}

34 Example 2: no applicable actions at s5
π' ← {(s,a) : a is applicable to s}

35 Example 2: no applicable actions at s5
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = ?

36 Example 2: no applicable actions at s5
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'

37 Example 2: no applicable actions at s5
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = as shown

38 Example 2: no applicable actions at s5
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = as shown
π' ← as shown

39 Example 2: no applicable actions at s5
π' as shown (from the previous slide); π ← π'
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π', so π = π'
RemoveNonProgress(π') = ?

40 Example 2: no applicable actions at s5
π' as shown; π ← π'
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π', so π = π'
RemoveNonProgress(π') = as shown
MkDet(the result) = no change

41 Planning for Extended Goals
Here, "extended" means temporally extended: constraints that apply to some sequence of states.
Examples:
- want to move to l3, and then to l5
- want to keep going back and forth between l3 and l5
(Figure: the example domain with a wait action at each location and moves such as move(r1,l2,l1).)

42 Planning for Extended Goals
Context: the internal state of the controller.
Plan: (C, c0, act, ctxt)
- C = a set of execution contexts
- c0 is the initial context
- act: S × C → A
- ctxt: S × C × S → C
Section 17.3 extends the ideas of Sections 17.1 and 17.2 to deal with extended goals; we'll skip the details.
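
To make the controller idea concrete, here is a hypothetical simulator for such a plan; the ctxt signature (current state, current context, observed next state) follows the slide, while every name and the dict encoding are our own assumptions.

```python
import random

def run_controller(gamma, c0, act, ctxt, s, steps=10):
    """Execute a plan (C, c0, act, ctxt): act maps (state, context) to an
    action; ctxt maps (state, context, next_state) to the next context."""
    c, trace = c0, [(s, c0)]
    for _ in range(steps):
        if (s, c) not in act:
            break                                  # plan is silent here
        a = act[(s, c)]
        s2 = random.choice(sorted(gamma[(s, a)]))  # one nondeterministic outcome
        c = ctxt[(s, c, s2)]                       # context may switch
        s = s2
        trace.append((s, c))
    return trace
```

For the back-and-forth goal above, two contexts would plausibly suffice: one for "heading for l3" and one for "heading for l5", with ctxt switching whenever the corresponding location is reached.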
